If you aspire to become a data scientist, a solid working knowledge of clustering algorithms is essential.
Clustering is a machine learning technique that falls under unsupervised learning. It is one of the most popular statistical techniques used in machine learning to group data points that are similar to one another.
We shall explain this concept of Clustering using the following illustrations.
When you walk into a gym, the most prominent feature you will observe is small groups of men working out in turns.
These groups start their workouts together and end them together. In gym terms they are called workout buddies. In data science terms, we call them clusters.
You have probably heard the proverbs “Birds of a feather flock together” and “Eagles don’t fly with ducks”. Both simply mean that like personalities or items tend to get grouped together. This holds in almost every walk of life that you can imagine or think of.
Definition of Clustering from a Data Science Perspective
Clustering can be defined as the process of grouping a given data set into smaller subsets, called clusters, whose members share similar characteristics. The keyword is “similarity”.
How a Clustering Algorithm is Designed: a Layman’s View
Take the data set that is to be clustered. Identify the similarity that features predominantly, that is, the characteristic on which the data points most resemble one another. Design the clustering algorithm around that similarity and apply it to the data. The output is the clustered data set.
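The steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that similarity means straight-line (Euclidean) distance and that k-means is the algorithm chosen; the function name k_means, the sample points and the choice of the first k points as starting centroids are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=10):
    """Group points into k clusters by assigning each point to its nearest centroid."""
    # Assumption: the first k points are used as starting centroids (simple and deterministic).
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Step 1: apply the similarity measure, each point joins its closest centroid.
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its members (unchanged if the cluster is empty).
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return clusters


# Hypothetical data set: two visibly separate groups of 2-D points.
data = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
groups = k_means(data, k=2)
for number, members in enumerate(groups, start=1):
    print("Cluster", number, members)
```

The input is the raw data set, the similarity is the distance between points, and the output is the same data split into clusters. Real projects would normally rely on a tested library rather than a hand-written loop like this one.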
Applications of Clustering Techniques
The opportunities to apply clustering techniques are nearly limitless.
Some examples include:
Supermarket special promotions
Hotel reward programs
Below is a brief look at how a clustering technique is employed.
A simple illustration
Let us illustrate with an example from banking. Most banks offer a variety of credit cards, ranging from the Normal level to the Gold standard. Of course, these credit cards are not offered to anybody and everybody. A bank maintains a data set of all its clients, and the clients are grouped using a clustering technique.
The algorithm may test and qualify the clients on a clustering criterion such as the balance they retain:
greater than $100,000
less than $100,000
Clearly, the bank now knows who qualifies for a Gold standard credit card and who qualifies for a Normal (brass) standard credit card.
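The bank illustration can be written out as a short Python sketch. The client names and balances below are invented for the example, and the text does not say what happens to a balance of exactly $100,000, so this sketch assumes such a client falls into the Normal group. Note also that a fixed cut-off like this is a hand-chosen rule; a clustering algorithm would instead discover the groups from the data.

```python
GOLD_THRESHOLD = 100_000  # balance cut-off taken from the illustration


def card_tier(balance):
    """Return the card group for a balance (assumption: exactly 100,000 counts as Normal)."""
    return "Gold" if balance > GOLD_THRESHOLD else "Normal"


# Hypothetical client data set: name -> retained balance in dollars.
clients = {"client_a": 250_000, "client_b": 40_000, "client_c": 100_000, "client_d": 175_500}

# Group the clients by the card they qualify for.
groups = {"Gold": [], "Normal": []}
for name, balance in clients.items():
    groups[card_tier(balance)].append(name)

print(groups)
```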
Some Advantages of using Clustering techniques
The results obtained using Clustering technique have the
Types of Clustering Methods
There are different methods available, but we shall list only the prominent ones.
Methods suggested by Fraley and Raftery (1998)
- Hierarchical Clustering Method
- Partitioning Clustering Method
Methods suggested by Han and Kamber (2001)
- Density based Clustering Method
- Model based Clustering Method
- Grid based Clustering Method
Hierarchical clustering can be further divided as follows:
- Agglomerative Hierarchical Clustering
- Divisive Hierarchical Clustering
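To give a feel for the agglomerative (bottom-up) variant, here is a minimal Python sketch. It assumes one-dimensional numeric data and single linkage, meaning the distance between two clusters is the distance between their closest members; the function name agglomerative and the sample values are made up for this illustration. Divisive clustering works in the opposite direction, starting from one large cluster and splitting it.

```python
def agglomerative(values, target_clusters):
    """Bottom-up clustering: start with one cluster per value, then merge the two closest."""
    clusters = [[v] for v in values]
    # Keep merging until the requested number of clusters (at least one) remains.
    while len(clusters) > max(1, target_clusters):
        best = None  # (distance, index_i, index_j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (assumption): distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical data: five values that form three natural groups.
result = agglomerative([1, 2, 9, 10, 25], target_clusters=3)
print(result)
```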