Clustering in Data Science: An Introduction

Data in itself is merely a number, and so is age, yet we make a big fuss about it. No matter how old you are, you are just one step away from victory.

If you aspire to become a data scientist, a working knowledge of clustering algorithms is essential.


Clustering is a machine learning technique that falls under unsupervised learning: it works on data that has no predefined labels. It is one of the most popular statistical techniques used in machine learning to group data points that are similar to one another.


We shall explain the concept of clustering using the following illustrations.

When you walk into a gym, the most prominent feature you will observe is small groups of people taking turns working out.

The members of each group start their workouts together and finish them together. In gym terms, they are called workout buddies. In data science terms, we call them clusters.

Have you heard the sayings “Birds of a feather flock together” and “Eagles don’t fly with ducks”? Both simply tell you that like personalities and like items tend to group together. You can see this in almost every walk of life you can think of.

Definition of Clustering from a Data Science Perspective

Clustering can be defined as the process of grouping a given data set into smaller subsets, called clusters, whose members have similar characteristics. The keyword is “similarity”.
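To make "similarity" concrete, here is a minimal sketch that treats a small distance between two data points as high similarity. The gym members and their features (age, workouts per week) are hypothetical values invented for illustration, and Euclidean distance is only one of many possible measures.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical gym members described as (age, workouts per week).
member_a = (25, 5)
member_b = (27, 5)
member_c = (60, 1)

d_ab = euclidean_distance(member_a, member_b)
d_ac = euclidean_distance(member_a, member_c)

# member_a is closer to member_b than to member_c, so a clustering
# algorithm based on this measure would tend to group a and b together.
print(d_ab, d_ac)
```

In practice the features are usually rescaled first, so that a feature with a large numeric range (such as age here) does not dominate the distance.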

How a Clustering Algorithm Is Designed: A Layman’s View

Take the data set that is to be clustered. Identify the characteristic on which the data points are most similar. Design the clustering algorithm around that measure of similarity and apply it to the data. The output is the clustered data set.
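The steps above can be sketched in a few lines of Python. This is a minimal k-means-style illustration, not a production implementation: the sample points, the starting centroids, and the function name `k_means` are assumptions made for the example, and k-means is just one of several algorithms that fit this outline.

```python
import math

def k_means(points, centroids, iterations=10):
    """Repeatedly assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        # Step 1: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    return clusters, centroids

# Hypothetical 2-D data set with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

Here the similarity measure is Euclidean distance (`math.dist`, available from Python 3.8), and the output is the original data set split into two clusters.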

Application of Clustering Techniques

Clustering techniques can be applied in a wide range of industries.
Some examples include:

Banking-Credit cards

Supermarket special promotions

Car loans

Hotel reward programs

Air Travels

Below is a brief look at how a clustering technique is employed in practice.

A simple illustration

Let us illustrate the first example. Most banks offer a variety of credit cards, ranging from Normal level to Gold Standard. Of course, these cards are not offered to just anybody. Banks maintain a data set of all their clients, and the clients are grouped using a clustering technique.

The algorithm may group the clients by the balance they maintain, for example into those who hold a balance of:

greater than $100,000

around $100,000

less than $100,000

The bank can now see which clients qualify for the Gold Standard credit cards and which qualify for the Normal (brass standard) credit cards.
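As a rough sketch of how such groups could emerge from the balances themselves, instead of from a hard-coded threshold, the example below sorts the balances and splits them at the largest gaps. The balances are invented for illustration, and the helper `split_by_largest_gaps` is a hypothetical name, not a function from any library. A real bank would likely cluster on many more features than balance alone.

```python
def split_by_largest_gaps(values, k):
    """Split sorted 1-D values into k groups by cutting at the k-1 widest gaps."""
    ordered = sorted(values)
    # Rank positions by the size of the gap to the next value, widest first.
    gap_positions = sorted(range(len(ordered) - 1),
                           key=lambda i: ordered[i + 1] - ordered[i],
                           reverse=True)
    cut_points = sorted(gap_positions[:k - 1])
    groups, start = [], 0
    for cut in cut_points:
        groups.append(ordered[start:cut + 1])
        start = cut + 1
    groups.append(ordered[start:])
    return groups

# Hypothetical client balances in dollars.
balances = [12_000, 15_500, 9_800, 98_000, 101_500,
            103_000, 240_000, 310_000, 275_000]
groups = split_by_largest_gaps(balances, k=3)
for group in groups:
    print(group)
```

With this sample data, the three groups correspond to balances well below, around, and well above $100,000.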

Some Advantages of using Clustering techniques

The results obtained using clustering techniques have the following attributes:

  • Scalable
  • Interpretable
  • Comprehensible
  • Useable

Types of Clustering Methods

There are many different methods available, but we shall list only the prominent ones.

Methods suggested by Fraley and Raftery (1998)

  1. Hierarchical Clustering Method
  2. Partitioning Clustering Method

Methods suggested by Han and Kamber (2001)

  1. Density-based Clustering Method
  2. Model-based Clustering Method
  3. Grid-based Clustering Method

Hierarchical Clustering can be further divided as follows:

  1. Agglomerative Hierarchical Clustering
  2. Divisive Hierarchical Clustering
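To give a feel for the agglomerative (bottom-up) variant, here is a minimal sketch: every value starts as its own cluster, and the two closest clusters are merged repeatedly until the desired number remains. The sample values, the single-linkage distance (closest pair of members), and the function name `agglomerative` are assumptions chosen for this illustration. Divisive clustering works in the opposite direction, starting with one cluster and splitting it.

```python
def agglomerative(values, target_clusters):
    """Bottom-up clustering of 1-D values using single-linkage distance."""
    clusters = [[v] for v in values]  # every value starts as its own cluster
    while len(clusters) > target_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Hypothetical 1-D data.
result = agglomerative([1, 2, 3, 10, 11, 25], target_clusters=3)
print(result)
```

This brute-force version compares every pair of clusters on each pass, so it is only suitable for small data sets.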
