  DBSCAN Clustering

    Example Usage

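```python
import numpy as np
from mltrain.unsupervised.DBSCAN import DBSCAN

# Example data: 200 random 2-D points (replace with your own dataset)
X_train = np.random.rand(200, 2)

# Initialize the model
model = DBSCAN(epsilon=0.5, min_points=5)

# Train the model; returns one label per point (-1 marks noise)
cluster_labels = model.train(X_train)

# Retrieve the assigned labels from the trained model
labels = model.get_labels()
```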

    Overview

    The DBSCAN class implements the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. DBSCAN groups together points that are closely packed and marks points that lie alone in low-density regions as outliers or noise.


    Hyperparameters

    • epsilon (default=0.5): The maximum distance between two samples for one to be considered to be in the neighborhood of the other.
    • min_points (default=5): The minimum number of samples in a point's neighborhood for that point to be considered a core point.
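    To make the two hyperparameters concrete, the sketch below labels each point of a small dataset as a core point, a border point (not dense itself, but within epsilon of a core point), or noise. It is a standalone illustration, not part of mltrain: `classify_points` is a hypothetical helper, and it assumes Euclidean distance, an inclusive boundary (distance <= epsilon), and that min_points counts the point itself. The documentation above does not state these details.

```python
import numpy as np

def classify_points(X, epsilon=0.5, min_points=5):
    """Label each point 'core', 'border', or 'noise' (illustrative only).

    Assumptions (not confirmed by the mltrain docs): Euclidean distance,
    inclusive boundary (<= epsilon), and min_points counts the point itself.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each point is in its own neighbourhood (distance 0), so it counts itself
    neighbour_counts = (dists <= epsilon).sum(axis=1)
    is_core = neighbour_counts >= min_points

    kinds = []
    for i in range(len(X)):
        if is_core[i]:
            kinds.append("core")
        elif np.any(is_core & (dists[i] <= epsilon)):
            # Not dense itself, but within epsilon of at least one core point
            kinds.append("border")
        else:
            kinds.append("noise")
    return kinds

points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.25, 0.0], [5.0, 5.0]]
print(classify_points(points, epsilon=0.2, min_points=3))
# ['core', 'core', 'core', 'border', 'noise']
```

    Here the point at (0.25, 0.0) has only one neighbour besides itself, so it is not a core point, but it lies within 0.2 of the core point at (0.1, 0.0) and therefore counts as a border point.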

    Attributes

    • labels (numpy.ndarray): The labels assigned to each point in the dataset after training. A label of -1 indicates noise.
    • cluster_id (int): The current cluster ID used during the clustering process.
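    Because -1 is reserved for noise, the number of clusters and the number of noise points can be read directly from a label array. The helper below is a hypothetical, minimal sketch that works on any array following that convention, such as the one returned by train() or get_labels(); the example labels are made up for illustration.

```python
import numpy as np

def summarize_labels(labels):
    """Count clusters and noise points in a DBSCAN label array.

    Assumes -1 marks noise and every other value is a cluster ID.
    """
    labels = np.asarray(labels)
    # Distinct cluster IDs, ignoring noise
    cluster_ids = np.unique(labels[labels != -1])
    n_noise = int(np.sum(labels == -1))
    return len(cluster_ids), n_noise

# Hypothetical labels for six points
example_labels = np.array([0, 0, 1, 1, 1, -1])
n_clusters, n_noise = summarize_labels(example_labels)
print(n_clusters, n_noise)  # 2 1
```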

    Methods

    __init__(self, epsilon=0.5, min_points=5)

    Initializes the DBSCAN model with the specified parameters.

    • Args:
      • epsilon (float): The maximum distance between two samples for one to be considered to be in the neighborhood of the other.
      • min_points (int): The minimum number of samples in a point's neighborhood for that point to be considered a core point.

    train(self, X)

    Trains the DBSCAN model using the provided dataset.

    • Args:
      • X (numpy.ndarray): The input data points.
    • Returns:
      • numpy.ndarray: The labels assigned to each point in the dataset. A label of -1 indicates noise.
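    The work train() performs can be sketched as one standalone function. This illustrates the standard DBSCAN procedure and is not mltrain's source: the name `dbscan` is hypothetical, and the sketch assumes Euclidean distance, inclusive neighbourhoods (<= epsilon), and that min_points counts the query point itself. Comments mark where the documented helper methods would plausibly fit.

```python
import numpy as np
from collections import deque

def dbscan(X, epsilon=0.5, min_points=5):
    """Minimal DBSCAN sketch; returns an int label per point, -1 for noise.

    Assumptions: Euclidean distance, <= epsilon, min_points includes the point.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Precompute all pairwise distances (cf. compute_distance_matrix)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    labels = np.full(n, -1, dtype=int)  # every point starts as noise
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbours = np.flatnonzero(dist[i] <= epsilon)  # cf. region_query
        if len(neighbours) < min_points:
            continue  # not a core point; may still be claimed as a border point later
        # Start a new cluster and grow it outward (cf. append_cluster)
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id  # claim unassigned or noise points
            if visited[j]:
                continue
            visited[j] = True
            j_neighbours = np.flatnonzero(dist[j] <= epsilon)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
        cluster_id += 1
    return labels

X = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
     [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
     [10.0, 10.0]]
print(dbscan(X, epsilon=0.3, min_points=3).tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

    A point first found to be noise can still be claimed later as a border point of a cluster, so labels are only final once the loop completes. In this sketch, a border point reachable from two clusters keeps whichever label it received first.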

    compute_distance_matrix(self, X)

    Computes the pairwise distance matrix for the dataset.

    • Args:
      • X (numpy.ndarray): The input data points.
    • Returns:
      • numpy.ndarray: A distance matrix of shape (n, n), where n is the number of data points and entry (i, j) is the distance between point i and point j.
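    One common way to build the pairwise matrix is NumPy broadcasting. The sketch below is a standalone function, not the method itself, and it assumes Euclidean distance because the documentation does not name the metric.

```python
import numpy as np

def compute_distance_matrix(X):
    """Pairwise Euclidean distance matrix via broadcasting (sketch).

    Assumes Euclidean distance; the documented method does not state its metric.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]       # shape (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))   # shape (n, n)

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = compute_distance_matrix(X)
print(D)
# [[ 0.  5. 10.]
#  [ 5.  0.  5.]
#  [10.  5.  0.]]
```

    The intermediate difference array holds n × n × d values, so memory grows quadratically with the number of points; this approach suits small to medium datasets.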

    region_query(self, i, distance_matrix)

    Finds all points within epsilon distance from point i.

    • Args:
      • i (int): The index of the point to query.
      • distance_matrix (numpy.ndarray): The precomputed distance matrix.
    • Returns:
      • numpy.ndarray: An array of indices of all points within epsilon distance from point i.
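    With a precomputed matrix, a region query reduces to thresholding one row. In this standalone sketch epsilon is passed as an argument (the method presumably reads it from the model), and the inclusive comparison <= is an assumption, since the documentation does not say whether a point at exactly epsilon counts.

```python
import numpy as np

def region_query(i, distance_matrix, epsilon):
    """Indices of all points within epsilon of point i, including i itself."""
    # Row i holds the distances from point i to every other point
    return np.flatnonzero(distance_matrix[i] <= epsilon)

D = np.array([[0.0, 0.4, 0.9],
              [0.4, 0.0, 0.5],
              [0.9, 0.5, 0.0]])
print(region_query(0, D, epsilon=0.5).tolist())  # [0, 1]
print(region_query(1, D, epsilon=0.5).tolist())  # [0, 1, 2]
```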

    append_cluster(self, i, neighbours, visited_samples, distance_matrix, cluster_id)

    Expands the cluster by adding all density-reachable points.

    • Args:
      • i (int): The index of the initial core point.
      • neighbours (numpy.ndarray): The initial set of neighbors within epsilon distance.
      • visited_samples (numpy.ndarray): A boolean array indicating whether each point has been visited.
      • distance_matrix (numpy.ndarray): The precomputed distance matrix.
      • cluster_id (int): The current cluster ID to assign to points.
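    Cluster expansion is typically a breadth-first traversal: every core point reached adds its own neighbours to a queue, while border points are labeled but not expanded further. The sketch below follows the documented parameter names but adds three hypothetical parameters, `labels`, `epsilon`, and `min_points`, which the method would presumably read from the model; it modifies `visited_samples` and `labels` in place.

```python
import numpy as np
from collections import deque

def append_cluster(i, neighbours, visited_samples, distance_matrix,
                   cluster_id, labels, epsilon, min_points):
    """Grow cluster `cluster_id` outward from core point i (sketch).

    labels, epsilon and min_points are hypothetical extra parameters.
    """
    labels[i] = cluster_id
    queue = deque(neighbours)
    while queue:
        j = queue.popleft()
        if labels[j] == -1:
            # Unassigned or previously marked as noise: join this cluster
            labels[j] = cluster_id
        if visited_samples[j]:
            continue
        visited_samples[j] = True
        j_neighbours = np.flatnonzero(distance_matrix[j] <= epsilon)
        if len(j_neighbours) >= min_points:
            # j is also a core point, so its neighbours are density-reachable
            queue.extend(j_neighbours)

# A 1-D chain of points plus one isolated point at 1.0
X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0]])
D = np.abs(X - X.T)                      # pairwise 1-D distances, shape (5, 5)
labels = np.full(len(X), -1)
visited = np.zeros(len(X), dtype=bool)
visited[1] = True                        # the caller marks the seed as visited
seed_neighbours = np.flatnonzero(D[1] <= 0.15)
append_cluster(1, seed_neighbours, visited, D, cluster_id=0,
               labels=labels, epsilon=0.15, min_points=3)
print(labels.tolist())  # [0, 0, 0, 0, -1]
```

    Points 1 and 2 are core points, so the cluster spreads through them to the border points 0 and 3, while the isolated point at 1.0 is never reached and stays labeled -1.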

    get_labels(self)

    Returns the labels assigned to the data points after training.

    • Returns:
      • numpy.ndarray: An array of labels where each label is a cluster ID, or -1 for noise.