    from mltrain.unsupervised.KMeans import KMeans

    # Initialize the model
    model = KMeans(k=3, epochs=100)

    # Train the model
    cluster_labels = model.train(X_train)

    # The centroids can be accessed with
    centroids = model.centroids

The KMeans class implements the K-Means clustering algorithm. This algorithm partitions the dataset into k clusters, where each data point belongs to the cluster with the nearest centroid. The centroids are updated iteratively until convergence or until the maximum number of iterations is reached.

k (default=3): The number of clusters to form.
epochs (default=100): The maximum number of iterations for the algorithm.
centroids (numpy.ndarray): The coordinates of the centroids after training.

__init__(self, k=3, epochs=100)
Initializes the KMeans model with the specified parameters.
k (int): The number of clusters to form.
epochs (int): The maximum number of iterations for the algorithm.

euclidean(self, x1, x2)
Computes the Euclidean distance between two points.
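
The distance computation described above can be sketched with NumPy as follows. This is a standalone illustration, not the library's source: the function name euclidean_distance is hypothetical, and the actual method may be implemented differently.

```python
import numpy as np


def euclidean_distance(x1, x2):
    """Euclidean (L2) distance between two points of equal length.

    Hypothetical standalone helper, not the library's own method.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Square root of the sum of squared coordinate differences.
    return float(np.sqrt(np.sum((x1 - x2) ** 2)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```
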
x1 (numpy.ndarray): The first data point.
x2 (numpy.ndarray): The second data point.
Returns (float): The Euclidean distance between x1 and x2.

get_cluster_labels(self, clusters, X)
Assigns labels to each data point based on the cluster it belongs to.
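
One way to picture this step is a conversion from per-cluster index lists to a flat label array. The sketch below assumes that clusters[i] holds the row indices assigned to cluster i; the helper name labels_from_clusters and the -1 placeholder for unassigned points are illustrative choices, not taken from the library.

```python
import numpy as np


def labels_from_clusters(clusters, X):
    """Turn per-cluster index lists into one label per row of X.

    Hypothetical helper: clusters[i] lists the row indices in cluster i.
    Rows that appear in no cluster keep the placeholder label -1.
    """
    labels = np.full(len(X), -1, dtype=int)
    for cluster_idx, sample_indices in enumerate(clusters):
        for sample_idx in sample_indices:
            labels[sample_idx] = cluster_idx
    return labels


X = np.zeros((5, 2))  # only the number of rows matters here
clusters = [[0, 3], [1], [2, 4]]
print(labels_from_clusters(clusters, X))  # [0 1 2 0 2]
```
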
clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
X (numpy.ndarray): The dataset.
Returns (numpy.ndarray): An array of cluster labels for each data point in X.

closest_centroid(self, x1)
Finds the index of the closest centroid to the given data point.
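
A minimal sketch of the lookup, assuming Euclidean distance and a centroid array with one row per centroid. The documented method takes only the data point, so it presumably reads the centroids from the model; this hypothetical standalone version takes them as an argument instead.

```python
import numpy as np


def closest_centroid_index(x, centroids):
    """Index of the centroid nearest to point x (Euclidean distance).

    Hypothetical standalone version; centroids has shape (k, n_features).
    """
    distances = np.linalg.norm(np.asarray(centroids) - np.asarray(x), axis=1)
    return int(np.argmin(distances))


centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
print(closest_centroid_index([4.0, 6.0], centroids))  # 1
```

When two centroids are equally close, np.argmin returns the lower index.
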
x1 (numpy.ndarray): The data point.
Returns (int): The index of the closest centroid.

create_clusters(self, X)
Creates clusters by assigning each data point to the closest centroid.
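
The assignment step might look roughly like this. The function name assign_to_clusters and the explicit centroids argument are assumptions made so the sketch can run on its own.

```python
import numpy as np


def assign_to_clusters(X, centroids):
    """Group the row indices of X by their nearest centroid.

    Hypothetical sketch: returns one index list per centroid.
    """
    clusters = [[] for _ in range(len(centroids))]
    for sample_idx, x in enumerate(X):
        nearest = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        clusters[nearest].append(sample_idx)
    return clusters


X = np.array([[0.0, 0.0], [9.0, 9.0], [1.0, 0.0], [10.0, 10.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
print(assign_to_clusters(X, centroids))  # [[0, 2], [1, 3]]
```
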
X (numpy.ndarray): The dataset.
Returns (list of lists): A list of clusters, where each cluster is a list of indices of the data points.

create_new_centroids(self, clusters, X)
Updates the centroids by calculating the mean of all data points in each cluster.
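
The update step can be sketched as a per-cluster mean. The source does not say what happens to a cluster with no points, so this hypothetical version assumes an empty cluster simply keeps its previous centroid, which is why it takes old_centroids as an extra argument.

```python
import numpy as np


def update_centroids(clusters, X, old_centroids):
    """Move each centroid to the mean of the points assigned to it.

    Assumption: an empty cluster keeps its previous centroid.
    """
    new_centroids = np.array(old_centroids, dtype=float)  # makes a copy
    for cluster_idx, sample_indices in enumerate(clusters):
        if len(sample_indices) > 0:
            new_centroids[cluster_idx] = X[sample_indices].mean(axis=0)
    return new_centroids


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
clusters = [[0, 1], [2]]
# Expected rows: (1, 0) for the first cluster and (10, 10) for the second.
print(update_centroids(clusters, X, old_centroids=np.zeros((2, 2))))
```
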
clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
X (numpy.ndarray): The dataset.
Returns (numpy.ndarray): An array of updated centroids.

train(self, X)
Trains the KMeans model by repeatedly assigning data points to the closest centroid and updating the centroids.
X (numpy.ndarray): The dataset.
Returns (numpy.ndarray): An array of cluster labels for each data point in X.
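
To show how the assignment and update steps fit together, here is a compact, vectorized sketch of a K-Means training loop. It is not the library's implementation. The function name kmeans_sketch, the seed parameter, the random choice of initial centroids from the rows of X, and the early stop once the centroids no longer move are all assumptions made for illustration.

```python
import numpy as np


def kmeans_sketch(X, k=3, epochs=100, seed=0):
    """Minimal K-Means loop: assign points, update centroids, repeat.

    Assumptions (not taken from the library): centroids start as k randomly
    chosen rows of X, and the loop stops early once they stop moving.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(epochs):
        # Assignment step: distance from every point to every centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of each cluster; empty clusters keep their centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids did not move
        centroids = new_centroids
    return labels, centroids


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans_sketch(X, k=2, epochs=100)
print(labels)
print(centroids)
```

On this toy dataset the two nearby pairs of points end up in separate clusters; which cluster receives label 0 depends on the random initialization.
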