K-Means++ Clustering
Review

Congratulations, now your K-Means model is improved and ready to go!

K-Means++ improves on K-Means by placing the initial centroids more strategically. As a result, it often produces better clusterings than K-Means with random initialization.

It can also outperform K-Means in speed. With very unlucky initial centroids, standard K-Means can take a long time to converge; K-Means++ often converges more quickly.

You can implement K-Means++ with the scikit-learn library in much the same way you implement K-Means.

The KMeans() function has an init parameter, which specifies the method for initialization:

  • 'random'
  • 'k-means++'

Note: scikit-learn’s KMeans() uses 'k-means++' by default, but it is a good idea to be explicit!
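The two initialization methods can be sketched side by side. This is a minimal example, assuming some 2-D data stands in for the learner data in the workspace (the variable names and data here are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative 2-D data; stands in for the learner data in the workspace
rng = np.random.default_rng(42)
points = rng.normal(size=(300, 2))

# Random initialization: starting centroids are picked from the data at random
model_random = KMeans(n_clusters=3, init='random', n_init=10, random_state=42)
model_random.fit(points)

# K-Means++ initialization (scikit-learn's default)
model_pp = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
model_pp.fit(points)

print(model_pp.cluster_centers_)  # one (x, y) centroid per cluster
print(model_pp.labels_[:10])      # cluster assignment for the first 10 points
```

Either way, the fitted model exposes the final centroids in `cluster_centers_` and each point's cluster in `labels_`.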

Instructions

The code in the workspace performs two clusterings on Codecademy learner data using K-Means. The first algorithm initializes the centroids at the x positions given on line 12 and the y positions given on line 13. The second algorithm initializes the centroids according to the K-Means++ algorithm.

Try changing the positions at which the centroids are initialized on lines 12 and 13. How does changing the initialization position affect the final clustering? And how does the first clustering compare to the K-Means++ clustering?
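Initializing centroids at fixed positions, as the first clustering in the workspace does, can be sketched by passing an array to `init`. The coordinates below are illustrative, not the values from the workspace:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative 2-D data in place of the workspace's learner data
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))

# Explicit starting centroids: one (x, y) row per cluster.
# These coordinates are made up for the sketch, not the workspace values.
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])

# n_init=1 because the starting positions are fixed, so there is
# nothing to gain from repeated random restarts
model = KMeans(n_clusters=2, init=centroids, n_init=1)
model.fit(points)

print(model.cluster_centers_)  # final centroid positions after convergence
```

Changing the rows of `centroids` and refitting is one way to see how the starting positions affect the final clustering.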

Make sure to scroll down to see the second graph!
