Data Science Crash Course 7/10: Clustering & Unsupervised Learning
Let’s learn Data Science in 2020
This is the 7th instalment of the Data Science Crash Course. In the previous parts we learned about supervised learning and what to do when you have a dataset with labels. In this text we’ll look at datasets without labels and talk about unsupervised learning.
What is Unsupervised Learning
Imagine we have raw data, such as social statistics related to marketing. For example, you’re trying to understand who has bought a MacBook from your e-commerce store, and you’d like to find customers who are similar to each other. Or you’re selling tickets through an online platform and want to group your clients into categories so that you can send a coherent message to each group.
To cluster your data, that is, to group it into categories that are not given a priori, you have to use a clustering algorithm. Again, sklearn will come in handy. Let’s review two basic methods with code examples from sklearn.
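To get a feel for what “grouping without labels” looks like in sklearn, here is a minimal sketch using k-means, the first of the methods discussed next. It assumes the features are already numeric. The data from make_blobs is a synthetic stand-in for customer statistics (a hypothetical choice, not real marketing data), and asking for three groups is an arbitrary assumption. Note that no target labels are passed to the model at any point.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (e.g. age, yearly spend).
# make_blobs also returns true labels, but we deliberately throw them away.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)

# Put the features on a comparable scale: k-means relies on Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# Ask for 3 groups; n_init=10 runs k-means from 10 different starts and keeps the best run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)  # one group id (0, 1 or 2) per customer

print(np.bincount(labels))       # how many customers fall into each group
print(kmeans.cluster_centers_)   # one "typical customer" per group, in scaled units
```

The group ids themselves carry no meaning; it is up to you to inspect each group (for example, its centre) and decide what kind of customer it represents.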
Clustering methods in Data Science
k-means is the most basic clustering technique. The “k” here stands for the number of clusters you want to have. This number is arbitrary: you choose this parameter yourself, but there are methods (see, for example, the elbow method) where you…
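A common way to apply the elbow method can be sketched as follows. This assumes each candidate k is judged by sklearn’s KMeans `inertia_` attribute (the sum of squared distances of samples to their closest cluster centre). The data is again synthetic (make_blobs with four centres, a hypothetical choice), and the range of k values tried is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled data; the "true" number of groups (4) is hidden from the model.
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=7)

ks = range(1, 10)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_: within-cluster sum of squared distances for this k
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting the inertias against k (for example with matplotlib) typically shows a steep drop followed by a flattening curve, and the k at the bend is a reasonable candidate. Inertia generally keeps shrinking as k grows, so the goal is the point of diminishing returns rather than the smallest value.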