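```python
import numpy as np
from mltrain.unsupervised.KMeans import KMeans

# Example data (illustrative): 100 samples with 2 features each
X_train = np.random.rand(100, 2)

# Initialize the model
model = KMeans(k=3, epochs=100)

# Train the model; this returns one cluster label per sample
cluster_labels = model.train(X_train)

# After training, the centroids are available as an attribute
centroids = model.centroids
```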
The KMeans class implements the K-Means clustering algorithm. This algorithm partitions the dataset into k clusters, where each data point belongs to the cluster with the nearest centroid. The centroids are updated iteratively until convergence or until the maximum number of iterations is reached.
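The two alternating steps described above are the standard formulation of K-Means (Lloyd's algorithm). The notation below is introduced here for illustration and does not come from the mltrain API: $\mu_j$ is centroid $j$, $c_i$ is the cluster index assigned to point $x_i$, and $C_j$ is the set of point indices assigned to cluster $j$.

```latex
\begin{aligned}
c_i &= \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert_2
  && \text{(assignment step)} \\
\mu_j &\leftarrow \frac{1}{\lvert C_j \rvert} \sum_{i \in C_j} x_i,
  \qquad C_j = \{\, i : c_i = j \,\}
  && \text{(update step)} \\
J &= \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert_2^2
  && \text{(within-cluster sum of squares)}
\end{aligned}
```

Neither step can increase $J$, so the iterations settle. The result is a local minimum that depends on the initial centroids, not necessarily the global one.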
Attributes

- k (default=3): The number of clusters to form.
- epochs (default=100): The maximum number of iterations for the algorithm.
- centroids (numpy.ndarray): The coordinates of the centroids after training.

Methods

__init__(self, k=3, epochs=100)
Initializes the KMeans model with the specified parameters.

Parameters:
- k (int): The number of clusters to form.
- epochs (int): The maximum number of iterations for the algorithm.

euclidean(self, x1, x2)
Computes the Euclidean distance between two points.

Parameters:
- x1 (numpy.ndarray): The first data point.
- x2 (numpy.ndarray): The second data point.

Returns:
- float: The Euclidean distance between x1 and x2.

get_cluster_labels(self, clusters, X)
Assigns labels to each data point based on the cluster it belongs to.

Parameters:
- clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
- X (numpy.ndarray): The dataset.

Returns:
- numpy.ndarray: An array of cluster labels for each data point in X.

closest_centroid(self, x1)
Finds the index of the closest centroid to the given data point.

Parameters:
- x1 (numpy.ndarray): The data point.

Returns:
- int: The index of the closest centroid.

create_clusters(self, X)
Creates clusters by assigning each data point to the closest centroid.

Parameters:
- X (numpy.ndarray): The dataset.

Returns:
- list of lists: A list of clusters, where each cluster is a list of indices of the data points.

create_new_centroids(self, clusters, X)
Updates the centroids by calculating the mean of all data points in each cluster.

Parameters:
- clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
- X (numpy.ndarray): The dataset.

Returns:
- numpy.ndarray: An array of updated centroids.

train(self, X)
Trains the KMeans model by repeatedly assigning data points to the closest centroid and updating the centroids.

Parameters:
- X (numpy.ndarray): The dataset.

Returns:
- numpy.ndarray: An array of cluster labels for each data point in X.
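To show how the documented methods could fit together, here is a minimal NumPy sketch that mirrors their names and signatures. It is not the mltrain source. The class name KMeansSketch, the seed parameter, initializing centroids from k distinct random samples, stopping when the centroids no longer move (checked with np.allclose), and letting an empty cluster keep its previous centroid are all assumptions made for illustration.

```python
import numpy as np


class KMeansSketch:
    """Hypothetical re-implementation mirroring the documented KMeans methods."""

    def __init__(self, k=3, epochs=100, seed=None):
        self.k = k
        self.epochs = epochs
        self.centroids = None
        # Assumption: a seeded generator makes initialization reproducible.
        self.rng = np.random.default_rng(seed)

    def euclidean(self, x1, x2):
        # Straight-line distance between two points.
        return np.sqrt(np.sum((x1 - x2) ** 2, axis=-1))

    def closest_centroid(self, x1):
        # Index of the centroid with the smallest distance to x1.
        distances = [self.euclidean(x1, c) for c in self.centroids]
        return int(np.argmin(distances))

    def create_clusters(self, X):
        # One list of sample indices per centroid.
        clusters = [[] for _ in range(self.k)]
        for idx, x in enumerate(X):
            clusters[self.closest_centroid(x)].append(idx)
        return clusters

    def create_new_centroids(self, clusters, X):
        # Start from a copy of the current centroids.
        new_centroids = np.array(self.centroids, dtype=float)
        for i, cluster in enumerate(clusters):
            # Assumption: an empty cluster keeps its previous centroid.
            if cluster:
                new_centroids[i] = X[cluster].mean(axis=0)
        return new_centroids

    def get_cluster_labels(self, clusters, X):
        # labels[i] is the index of the cluster that contains sample i.
        labels = np.empty(len(X), dtype=int)
        for cluster_idx, cluster in enumerate(clusters):
            labels[cluster] = cluster_idx
        return labels

    def train(self, X):
        X = np.asarray(X, dtype=float)
        # Assumption: initial centroids are k distinct samples drawn at random.
        start = self.rng.choice(len(X), size=self.k, replace=False)
        self.centroids = X[start]
        for _ in range(self.epochs):
            clusters = self.create_clusters(X)
            new_centroids = self.create_new_centroids(clusters, X)
            converged = np.allclose(new_centroids, self.centroids)
            self.centroids = new_centroids
            if converged:
                break
        # Label every sample against the final centroids.
        return self.get_cluster_labels(self.create_clusters(X), X)


# Usage: two well-separated groups of points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
model = KMeansSketch(k=2, epochs=100, seed=0)
cluster_labels = model.train(X)
```

The per-point loops keep each step aligned with one documented method. A vectorized variant would compute all point-to-centroid distances in a single array operation, which is faster on large datasets but no longer maps one-to-one onto closest_centroid and create_clusters.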