DBSCAN Clustering

Example Usage


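```python
import numpy as np

from mltrain.unsupervised.DBSCAN import DBSCAN

# Example data: 100 random 2-D points (replace with your own dataset)
X_train = np.random.rand(100, 2)

# Initialize the model
model = DBSCAN(epsilon=0.5, min_points=5)

# Run clustering; returns one label per point (-1 marks noise)
cluster_labels = model.train(X_train)

# Retrieve the labels stored by the model after training
labels = model.get_labels()
```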

Overview

The DBSCAN class implements the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. DBSCAN groups together points that are closely packed and marks points that lie alone in low-density regions as outliers or noise.

Hyperparameters

epsilon (default=0.5): The maximum distance between two samples for one to be considered to be in the neighborhood of the other.
min_points (default=5): The minimum number of samples in a point's epsilon-neighborhood required for that point to be considered a core point.
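To make the interaction between the two hyperparameters concrete, here is a minimal sketch that flags core points for a small dataset. It assumes Euclidean distance and that a point counts toward its own neighborhood; both are common conventions but are not confirmed for this class, and `find_core_points` is a hypothetical helper, not part of `mltrain`.

```python
import numpy as np

def find_core_points(X, epsilon=0.5, min_points=5):
    """Return a boolean mask marking core points (hypothetical helper).

    Assumes Euclidean distance and that each point lies in its own
    neighborhood; the mltrain class may use different conventions.
    """
    X = np.asarray(X, dtype=float)
    diffs = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=-1))
    # Count the points within epsilon of each point (itself included).
    neighbor_counts = (distances <= epsilon).sum(axis=1)
    return neighbor_counts >= min_points

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
print(find_core_points(X, epsilon=0.5, min_points=3))  # [ True  True  True False]
```

The three nearby points each have three neighbors (themselves included), so they qualify as core points; the isolated point at (5, 5) does not.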

Attributes

labels (numpy.ndarray): The labels assigned to each point in the dataset after training. A label of -1 indicates noise.
cluster_id (int): The current cluster ID used during the clustering process.

Methods

`__init__(self, epsilon=0.5, min_points=5)`

Initializes the DBSCAN model with the specified parameters.

Args:
- epsilon (float): The maximum distance between two samples for one to be considered to be in the neighborhood of the other.
- min_points (int): The minimum number of samples in a point's epsilon-neighborhood required for that point to be considered a core point.

`train(self, X)`

Runs DBSCAN clustering on the provided dataset, stores the resulting labels on the model, and returns them.

Args:
- X (numpy.ndarray): The input data points, one row per sample.
Returns:
- numpy.ndarray: The labels assigned to each point in the dataset. A label of -1 indicates noise.
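The overall procedure behind a method like `train` can be sketched as follows. This is a minimal standalone version, not the `mltrain` source: it assumes Euclidean distance, inclusive neighborhoods (`<= epsilon`, point included), cluster IDs starting at 0, and a hypothetical `UNASSIGNED` sentinel for points not yet visited.

```python
from collections import deque

import numpy as np

NOISE = -1
UNASSIGNED = -2  # hypothetical sentinel for "not yet labelled"

def dbscan_sketch(X, epsilon=0.5, min_points=5):
    """Minimal DBSCAN sketch (assumed conventions; not the mltrain code)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    diffs = X[:, None, :] - X[None, :, :]
    distance_matrix = np.sqrt((diffs ** 2).sum(axis=-1))

    labels = np.full(n, UNASSIGNED, dtype=int)
    cluster_id = 0
    for i in range(n):
        if labels[i] != UNASSIGNED:
            continue
        neighbours = np.flatnonzero(distance_matrix[i] <= epsilon)
        if len(neighbours) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: start a new cluster and grow it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point previously marked noise
            if labels[j] != UNASSIGNED:
                continue  # already labelled; do not expand from it
            labels[j] = cluster_id
            j_neighbours = np.flatnonzero(distance_matrix[j] <= epsilon)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is also core: keep expanding
        cluster_id += 1
    return labels

X = np.array([[0, 0], [0, 0.1], [0.1, 0],
              [5, 5], [5, 5.1], [5.1, 5],
              [10, 10]])
print(dbscan_sketch(X, epsilon=0.5, min_points=3))  # [ 0  0  0  1  1  1 -1]
```

Points first marked as noise can still be absorbed later as border points of a cluster, which is why the noise label is only final once the loop finishes.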

`compute_distance_matrix(self, X)`

Computes the pairwise distance matrix for the dataset.

Args:
- X (numpy.ndarray): The input data points.
Returns:
- numpy.ndarray: A distance matrix where the entry (i, j) represents the distance between point i and point j.
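A pairwise distance matrix of this kind can be built with NumPy broadcasting. The sketch below assumes Euclidean distance, since the metric used by the class is not stated, and is written as a standalone function rather than the class method.

```python
import numpy as np

def compute_distance_matrix(X):
    """Pairwise Euclidean distances via broadcasting (assumed metric)."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # shape (n, n, d)
    return np.sqrt(np.sum(diffs ** 2, axis=-1))        # shape (n, n)

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = compute_distance_matrix(X)
# D is symmetric with zeros on the diagonal:
# [[ 0.  5. 10.]
#  [ 5.  0.  5.]
#  [10.  5.  0.]]
```

The intermediate array has shape (n, n, d), so memory grows quadratically with the number of points; `scipy.spatial.distance.cdist(X, X)` computes the same matrix without that intermediate if SciPy is available.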

`region_query(self, i, distance_matrix)`

Finds all points within epsilon distance from point i.

Args:
- i (int): The index of the point to query.
- distance_matrix (numpy.ndarray): The precomputed distance matrix.
Returns:
- numpy.ndarray: An array of indices of all points within epsilon distance from point i.
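Given a precomputed distance matrix, a region query reduces to selecting one row and thresholding it. This sketch passes `epsilon` explicitly (the class presumably reads it from the instance) and assumes an inclusive `<=` comparison; the actual method may use `<`.

```python
import numpy as np

def region_query(i, distance_matrix, epsilon):
    """Indices of all points within epsilon of point i (i itself included)."""
    # Inclusive comparison is an assumption; the class may use a strict one.
    return np.flatnonzero(distance_matrix[i] <= epsilon)

D = np.array([[0.0, 0.3, 2.0],
              [0.3, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
print(region_query(0, D, epsilon=0.5))  # [0 1]
```

Because the diagonal of the matrix is zero, the queried point always appears in its own neighborhood under this convention.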

`append_cluster(self, i, neighbours, visited_samples, distance_matrix, cluster_id)`

Expands the cluster seeded by core point i by adding every point that is density-reachable from it.

Args:
- i (int): The index of the initial core point.
- neighbours (numpy.ndarray): The initial set of neighbors within epsilon distance.
- visited_samples (numpy.ndarray): A boolean array indicating whether each point has been visited.
- distance_matrix (numpy.ndarray): The precomputed distance matrix.
- cluster_id (int): The current cluster ID to assign to points.
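Cluster expansion is typically a breadth-first search over core points. The sketch below mirrors the documented arguments but, as a standalone function, also takes hypothetical `labels`, `epsilon`, and `min_points` parameters that the class would hold as attributes. It assumes labels start at -1 (the documented noise value) and that only unlabelled or noise points are claimed.

```python
from collections import deque

import numpy as np

def expand_cluster(i, neighbours, visited_samples, labels,
                   distance_matrix, cluster_id, epsilon, min_points):
    """Grow cluster `cluster_id` from core point i (hypothetical signature)."""
    labels[i] = cluster_id
    queue = deque(neighbours)
    while queue:
        j = queue.popleft()
        if not visited_samples[j]:
            visited_samples[j] = True
            j_neighbours = np.flatnonzero(distance_matrix[j] <= epsilon)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is a core point: keep growing
        if labels[j] == -1:
            labels[j] = cluster_id  # claim points that are still noise

# A chain of points 0.4 apart plus one distant outlier.
X = np.array([[0.0, 0.0], [0.4, 0.0], [0.8, 0.0], [1.2, 0.0], [5.0, 5.0]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
labels = np.full(len(X), -1)
visited = np.zeros(len(X), dtype=bool)
visited[0] = True
expand_cluster(0, np.flatnonzero(D[0] <= 0.5), visited, labels,
               D, cluster_id=0, epsilon=0.5, min_points=2)
print(labels)  # [ 0  0  0  0 -1]
```

Point 3 is 1.2 units from the seed, well beyond epsilon, yet joins the cluster because each intermediate point is itself a core point; this chaining is what "density-reachable" means.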

`get_labels(self)`

Returns the labels assigned to the data points after training.

Returns:
- numpy.ndarray: An array of labels where each label corresponds to a cluster ID, or -1 for noise.
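The returned array can be summarized with plain NumPy. The `labels` array below is a made-up illustration standing in for the output of `get_labels()`, not a real result.

```python
import numpy as np

# Illustrative labels, as get_labels() might return them (-1 is noise).
labels = np.array([0, 0, 1, 1, 1, -1, 0, -1])

n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
cluster_sizes = {int(c): int(np.sum(labels == c))
                 for c in np.unique(labels) if c != -1}

print(n_clusters, n_noise, cluster_sizes)  # 2 2 {0: 3, 1: 3}
```

Excluding -1 before counting matters: noise points do not form a cluster, even though they share a label value.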
