K-Means Clustering

Example Usage


from mltrain.unsupervised.KMeans import KMeans

# Initialize the model
model = KMeans(k=3, epochs=100)

# Train the model
cluster_labels = model.train(X_train)

# The centroids can be accessed with
centroids = model.centroids
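The usage snippet leaves `X_train` undefined. Since `train` is documented as taking a `numpy.ndarray`, one way to try it out is with synthetic data. The following is a minimal sketch that assumes a 2-D array of shape (n_samples, n_features) is expected; the names `rng` and `blob_centers` and the blob layout are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: three Gaussian blobs in 2-D, 50 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_train = np.vstack([
    center + rng.normal(scale=0.5, size=(50, 2))
    for center in blob_centers
])
# X_train now has one row per data point and one column per feature.
```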

Overview

The KMeans class implements the K-Means clustering algorithm. The algorithm partitions the dataset into k clusters, assigning each data point to the cluster whose centroid is nearest. The centroids are then updated iteratively until convergence or until the maximum number of iterations (epochs) is reached.
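To make the two alternating steps concrete, here is a single assignment-and-update iteration worked by hand in NumPy. It is only an illustration of the general algorithm with made-up points and arbitrary starting centroids, not code taken from the KMeans class.

```python
import numpy as np

# Four made-up 2-D points and two arbitrary starting centroids.
X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Assignment step: distance from every point to every centroid,
# then pick the index of the nearest centroid for each point.
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)  # [0, 0, 1, 1]

# Update step: each centroid becomes the mean of its assigned points.
new_centroids = np.array(
    [X[labels == j].mean(axis=0) for j in range(len(centroids))]
)
# new_centroids is [[1.0, 1.5], [8.5, 8.0]]
```

Repeating these two steps until the centroids stop moving yields the final partition.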

Hyperparameters

k (default=3): The number of clusters to form.
epochs (default=100): The maximum number of iterations for the algorithm.

Attributes

centroids (numpy.ndarray): The coordinates of the centroids after training.

Methods

`__init__(self, k=3, epochs=100)`

Initializes the KMeans model with the specified parameters.

Args:
- k (int): The number of clusters to form.
- epochs (int): The maximum number of iterations for the algorithm.

`euclidean(self, x1, x2)`

Computes the Euclidean distance between two points.

Args:
- x1 (numpy.ndarray): The first data point.
- x2 (numpy.ndarray): The second data point.
Returns:
- float: The Euclidean distance between x1 and x2.
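For reference, the Euclidean distance is the square root of the summed squared coordinate differences. A standalone sketch follows; it is written as a plain function rather than a method, and the library's actual body may differ.

```python
import numpy as np

def euclidean(x1, x2):
    """Euclidean distance between two points of equal dimension."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

print(euclidean(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # 5.0
```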

`get_cluster_labels(self, clusters, X)`

Assigns a label to each data point: the index of the cluster it belongs to.

Args:
- clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
- X (numpy.ndarray): The dataset.
Returns:
- numpy.ndarray: An array of cluster labels for each data point in X.
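The conversion from index lists to a flat label array could look roughly like this. The sketch assumes every data point index appears in exactly one cluster; the `-1` placeholder for unassigned points is a hypothetical choice, not documented behavior.

```python
import numpy as np

def get_cluster_labels(clusters, X):
    """Turn per-cluster index lists into one label per row of X."""
    # -1 marks points that appear in no cluster (hypothetical convention).
    labels = np.full(X.shape[0], -1, dtype=int)
    for cluster_idx, member_indices in enumerate(clusters):
        for point_idx in member_indices:
            labels[point_idx] = cluster_idx
    return labels

# Points 0 and 2 are in cluster 0; points 1 and 3 are in cluster 1.
print(get_cluster_labels([[0, 2], [1, 3]], np.zeros((4, 2))))  # [0 1 0 1]
```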

`closest_centroid(self, x1)`

Finds the index of the closest centroid to the given data point.

Args:
- x1 (numpy.ndarray): The data point.
Returns:
- int: The index of the closest centroid.
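A possible shape for this lookup is sketched below. The documented method presumably reads the centroids from the model itself; in this standalone version they are passed in as an explicit `centroids` argument, which is an assumption made so the example can run on its own.

```python
import numpy as np

def closest_centroid(x1, centroids):
    """Index of the centroid nearest to x1 (ties go to the lowest index)."""
    centroids = np.asarray(centroids, dtype=float)
    # One Euclidean distance per centroid row.
    distances = np.linalg.norm(centroids - np.asarray(x1, dtype=float), axis=1)
    return int(np.argmin(distances))

print(closest_centroid([9.0, 8.0], [[0.0, 0.0], [10.0, 10.0]]))  # 1
```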

`create_clusters(self, X)`

Creates clusters by assigning each data point to the closest centroid.

Args:
- X (numpy.ndarray): The dataset.
Returns:
- list of lists: A list of clusters, where each cluster is a list of indices of the data points.
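The assignment step can be sketched as follows. As an assumption for the sake of a self-contained example, the centroids are passed explicitly instead of being read from the model, and the nearest-centroid search is inlined.

```python
import numpy as np

def create_clusters(X, centroids):
    """Group row indices of X by their nearest centroid."""
    centroids = np.asarray(centroids, dtype=float)
    clusters = [[] for _ in range(len(centroids))]
    for point_idx, point in enumerate(X):
        distances = np.linalg.norm(centroids - point, axis=1)
        clusters[int(np.argmin(distances))].append(point_idx)
    return clusters

X = np.array([[0.0, 0.0], [10.0, 10.0], [1.0, 1.0]])
print(create_clusters(X, [[0.0, 0.0], [10.0, 10.0]]))  # [[0, 2], [1]]
```

Note that a cluster can end up empty if no point is nearest to its centroid.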

`create_new_centroids(self, clusters, X)`

Updates the centroids by calculating the mean of all data points in each cluster.

Args:
- clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
- X (numpy.ndarray): The dataset.
Returns:
- numpy.ndarray: An array of updated centroids.
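The update step might be sketched like this. The extra `previous_centroids` argument and the rule that an empty cluster keeps its previous centroid are assumptions of this sketch; the documentation does not say how the class handles empty clusters.

```python
import numpy as np

def create_new_centroids(clusters, X, previous_centroids):
    """Mean of each cluster's members; empty clusters keep their old centroid."""
    new_centroids = np.array(previous_centroids, dtype=float)  # makes a copy
    for cluster_idx, member_indices in enumerate(clusters):
        if member_indices:  # skip empty clusters (assumed behavior)
            new_centroids[cluster_idx] = X[member_indices].mean(axis=0)
    return new_centroids

X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
old = [[9.0, 9.0], [9.0, 9.0], [7.0, 7.0]]
print(create_new_centroids([[0, 1], [2], []], X, old))
# rows: [1, 1], [10, 10], [7, 7]
```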

`train(self, X)`

Trains the KMeans model by repeatedly assigning data points to the closest centroid and updating the centroids.

Args:
- X (numpy.ndarray): The dataset.
Returns:
- numpy.ndarray: An array of cluster labels for each data point in X.
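Putting the steps together, a training loop in this style could look like the sketch below. Several details are assumptions rather than documented behavior: the random choice of k data points as initial centroids, the `seed` parameter, the "centroids stopped moving" stopping rule, and returning the centroids alongside the labels (the documented `train` returns only the labels and exposes the centroids as an attribute). It also assumes `epochs` is at least 1.

```python
import numpy as np

def train_sketch(X, k=3, epochs=100, seed=0):
    """Illustrative K-Means training loop; not the mltrain implementation."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)

    for _ in range(epochs):
        # Step 1: group point indices by nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, point in enumerate(X):
            nearest = int(np.argmin(np.linalg.norm(centroids - point, axis=1)))
            clusters[nearest].append(idx)

        # Step 2: recompute each centroid as the mean of its members.
        previous = centroids.copy()
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = X[members].mean(axis=0)

        # Step 3: stop early once no centroid has moved.
        if np.allclose(previous, centroids):
            break

    # Convert the index lists into one label per data point.
    labels = np.empty(X.shape[0], dtype=int)
    for j, members in enumerate(clusters):
        if members:
            labels[members] = j
    return labels, centroids
```

Because the result depends on the initial centroids, different seeds can produce different cluster numbering, and on harder data different partitions.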
