Data Science Crash Course 7/10: Clustering & Unsupervised Learning

Let’s learn Data Science in 2020

Przemek Chojecki
4 min read · Jan 8, 2020

This is the 7th instalment of the Data Science Crash Course. So far we have learned about supervised learning and what to do when you have a dataset with labels. In this text we’ll look at datasets with no labels provided and talk about unsupervised learning.

Clustering in Python. Data Science Crash Course.

What is Unsupervised Learning

Imagine we have raw data, such as social statistics related to marketing. For example, you’re trying to understand who has bought a MacBook from your e-commerce store and you’d like to find similar people. Or you’re selling tickets through an online platform and want to group your clients into categories so that you can send a coherent message to each group.

To cluster your data, that is, to group it into categories that are not given a priori, you need a clustering algorithm. Again, sklearn comes in handy. Let’s review two basic methods with a code example from sklearn.
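To make the “no labels” idea concrete, here is a minimal sketch of the ticket-selling scenario. The customer features (tickets bought per year, average ticket price), the numbers, and the choice of three groups are all invented for illustration; KMeans is used only because k-means is the first method this text discusses.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer data: [tickets bought per year, average ticket price].
# There is no label column: nobody told us which group each customer belongs to.
customers = np.array([
    [1, 20.0], [2, 25.0], [1, 22.0],      # occasional buyers, cheap tickets
    [12, 30.0], [15, 28.0], [14, 35.0],   # frequent buyers, cheap tickets
    [2, 150.0], [3, 140.0], [1, 160.0],   # occasional buyers, premium tickets
])

# Put both features on a comparable scale so price does not dominate distances.
scaled = StandardScaler().fit_transform(customers)

# n_clusters=3 is our own assumption; random_state only makes the run repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
groups = model.fit_predict(scaled)

print(groups)  # one cluster id per customer; the id values themselves are arbitrary
```

Note that the model is fitted on the features alone. The cluster ids it returns only say which customers ended up together, so it is still up to you to interpret what each group means.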

Clustering methods in Data Science

k-means is the basic clustering technique. “K” here stands for the number of clusters you want to have. This number is arbitrary: you choose this parameter yourself, but there are methods (see the elbow method, for example), where you…
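As a rough illustration of the elbow method mentioned above, one common approach is to fit k-means for a range of k values and record the within-cluster sum of squares, which sklearn exposes as `inertia_`. The sketch below assumes synthetic data from `make_blobs` with four blobs; the dataset, the range of k, and the random seeds are choices made for this example only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled data with 4 blobs (the true labels are discarded on purpose).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

# Fit k-means for several values of k and store the inertia of each fit.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting inertia against k usually gives a curve that drops steeply at first and then flattens. The “elbow” where it flattens is a reasonable candidate for k, though on real data the bend is often less clear than on synthetic blobs.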

Written by Przemek Chojecki

AI & crypto, PhD in mathematics, Forbes 30 under 30, former Oxford fellow.
