K-Means Clustering

    Example Usage

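```python
import numpy as np
from mltrain.unsupervised.KMeans import KMeans

# Placeholder data for the example: 100 random points in 2 dimensions
X_train = np.random.rand(100, 2)

# Initialize the model
model = KMeans(k=3, epochs=100)

# Train the model; returns one cluster label per data point in X_train
cluster_labels = model.train(X_train)

# After training, the centroids are available as an attribute
centroids = model.centroids
```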

    Overview

    The KMeans class implements the K-Means clustering algorithm. This algorithm partitions the dataset into k clusters, where each data point belongs to the cluster with the nearest centroid. The centroids are updated iteratively until convergence or until the maximum number of iterations is reached.
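    In symbols, the loop described above is the standard assignment/update iteration (often called Lloyd's algorithm). The notation is introduced here for illustration and does not come from the library: x_i is the i-th data point, μ_j^(t) is the j-th centroid at iteration t, c_i^(t) is the cluster index assigned to x_i, and S_j^(t) is the set of points assigned to centroid j.

```latex
% Assignment step: each point goes to its nearest centroid
c_i^{(t)} = \arg\min_{j \in \{1, \dots, k\}} \lVert x_i - \mu_j^{(t)} \rVert_2

% Update step: each centroid moves to the mean of its assigned points
S_j^{(t)} = \{\, i : c_i^{(t)} = j \,\}, \qquad
\mu_j^{(t+1)} = \frac{1}{\lvert S_j^{(t)} \rvert} \sum_{i \in S_j^{(t)}} x_i
```

    Iteration stops once the centroids stop moving, or after `epochs` rounds.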


    Hyperparameters

    • k (default=3): The number of clusters to form.
    • epochs (default=100): The maximum number of iterations for the algorithm.

    Attributes

    • centroids (numpy.ndarray): The coordinates of the centroids after training.

    Methods

    __init__(self, k=3, epochs=100)

    Initializes the KMeans model with the specified parameters.

    • Args:
      • k (int): The number of clusters to form.
      • epochs (int): The maximum number of iterations for the algorithm.

    euclidean(self, x1, x2)

    Computes the Euclidean distance between two points.

    • Args:
      • x1 (numpy.ndarray): The first data point.
      • x2 (numpy.ndarray): The second data point.
    • Returns:
      • float: The Euclidean distance between x1 and x2.
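    A minimal standalone sketch of what `euclidean` computes, written as a free function with numpy rather than as the library's method (the actual implementation may differ):

```python
import numpy as np


def euclidean(x1: np.ndarray, x2: np.ndarray) -> float:
    """Euclidean (L2) distance between two points of equal dimension.

    Standalone sketch, not the library's source.
    """
    # Square the per-coordinate differences, sum them, take the square root.
    return float(np.sqrt(np.sum((x1 - x2) ** 2)))


# Usage: the classic 3-4-5 right triangle.
print(euclidean(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # 5.0
```

    The result matches `np.linalg.norm(x1 - x2)`, which is a common one-line alternative.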

    get_cluster_labels(self, clusters, X)

    Assigns labels to each data point based on the cluster it belongs to.

    • Args:
      • clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
      • X (numpy.ndarray): The dataset.
    • Returns:
      • numpy.ndarray: An array of cluster labels for each data point in X.
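    The conversion from "clusters as lists of indices" to "one label per point" can be sketched as below. This is a hypothetical standalone version; filling unassigned points with `-1` is an assumption of the sketch, not documented library behavior.

```python
import numpy as np


def get_cluster_labels(clusters, X):
    """Map a list of index lists to a per-point label array (sketch)."""
    # Assumption: points that appear in no cluster keep the label -1.
    labels = np.full(len(X), -1, dtype=int)
    for cluster_idx, sample_indices in enumerate(clusters):
        for sample_idx in sample_indices:
            # The position of the cluster in the list becomes the label.
            labels[sample_idx] = cluster_idx
    return labels


X = np.array([[0.0], [10.0], [0.5], [9.5]])
print(get_cluster_labels([[0, 2], [1, 3]], X))  # [0 1 0 1]
```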

    closest_centroid(self, x1)

    Finds the index of the closest centroid to the given data point.

    • Args:
      • x1 (numpy.ndarray): The data point.
    • Returns:
      • int: The index of the closest centroid.
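    A vectorized sketch of the nearest-centroid lookup. Unlike the method, which reads the centroids from the model, this hypothetical free function takes them as an explicit `centroids` argument of shape (k, n_features).

```python
import numpy as np


def closest_centroid(x, centroids):
    """Index of the centroid nearest to x (standalone sketch)."""
    # Distance from x to every centroid at once; one value per centroid row.
    distances = np.linalg.norm(centroids - x, axis=1)
    # np.argmin returns the first index on ties.
    return int(np.argmin(distances))


centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
print(closest_centroid(np.array([4.0, 6.0]), centroids))  # 1
```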

    create_clusters(self, X)

    Creates clusters by assigning each data point to the closest centroid.

    • Args:
      • X (numpy.ndarray): The dataset.
    • Returns:
      • list of lists: A list of clusters, where each cluster is a list of indices of the data points.
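    The assignment step can be sketched as follows, again as a hypothetical free function that receives `centroids` explicitly instead of reading them from the model. It produces the same list-of-index-lists shape described above.

```python
import numpy as np


def create_clusters(X, centroids):
    """Group point indices by nearest centroid (standalone sketch)."""
    # One (initially empty) index list per centroid.
    clusters = [[] for _ in range(len(centroids))]
    for idx, x in enumerate(X):
        # Nearest centroid by Euclidean distance.
        nearest = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        clusters[nearest].append(idx)
    return clusters


X = np.array([[0.0, 0.0], [0.2, 0.1], [9.0, 9.0], [8.8, 9.1]])
centroids = np.array([[0.0, 0.0], [9.0, 9.0]])
print(create_clusters(X, centroids))  # [[0, 1], [2, 3]]
```

    A centroid that attracts no points yields an empty list, which the update step then has to handle.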

    create_new_centroids(self, clusters, X)

    Updates the centroids by calculating the mean of all data points in each cluster.

    • Args:
      • clusters (list of lists): A list of clusters, where each cluster is a list of indices of the data points.
      • X (numpy.ndarray): The dataset.
    • Returns:
      • numpy.ndarray: An array of updated centroids.
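    A sketch of the update step: each new centroid is the mean of the points in its cluster. The documented method does not say what happens to an empty cluster, so the optional `previous` argument here is a hypothetical addition: when given, an empty cluster keeps its old centroid; otherwise it is left at zeros as a placeholder.

```python
import numpy as np


def create_new_centroids(clusters, X, previous=None):
    """Mean of each cluster's points (standalone sketch)."""
    new_centroids = np.zeros((len(clusters), X.shape[1]))
    for j, sample_indices in enumerate(clusters):
        if sample_indices:
            # Fancy-index the cluster's rows and average them column-wise.
            new_centroids[j] = X[sample_indices].mean(axis=0)
        elif previous is not None:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids[j] = previous[j]
    return new_centroids


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
print(create_new_centroids([[0, 1], [2, 3]], X).tolist())  # [[1.0, 0.0], [11.0, 10.0]]
```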

    train(self, X)

    Trains the KMeans model by repeatedly assigning data points to the closest centroid and updating the centroids.

    • Args:
      • X (numpy.ndarray): The dataset.
    • Returns:
      • numpy.ndarray: An array of cluster labels for each data point in X.
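    Putting the steps together, a compact sketch of the training loop might look like the function below. The name `train_kmeans`, the `seed` parameter, the initialization (sampling k distinct rows of X), and the `np.allclose` convergence test are all assumptions for illustration; the page does not say how `KMeans` initializes or detects convergence. Because it is not a class, the sketch returns the centroids alongside the labels instead of storing them on a model.

```python
import numpy as np


def train_kmeans(X, k=3, epochs=100, seed=0):
    """Lloyd-style K-Means sketch: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids with k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        # Assignment step: (n, k) distance matrix via broadcasting.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop early once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final labels with respect to the final centroids.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(distances, axis=1), centroids


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = train_kmeans(X, k=2)
print(labels)
```

    On two well-separated groups like these, the first three and last three points end up with matching labels; which group gets label 0 depends on the random initialization.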