    from mltrain.unsupervised.DBSCAN import DBSCAN

    # Initialize the model
    model = DBSCAN(epsilon=0.5, min_points=5)

    # Train the model (X_train is a numpy.ndarray holding the input data points)
    cluster_labels = model.train(X_train)

    # Get the assigned labels
    labels = model.get_labels()

The DBSCAN class implements the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. DBSCAN groups together points that are closely packed and marks points that lie alone in low-density regions as outliers or noise.

Parameters:
- epsilon (default=0.5): The maximum distance between two samples for one to be considered in the neighborhood of the other.
- min_points (default=5): The minimum number of samples in a neighborhood for a point to be considered a core point.

Attributes:
- labels (numpy.ndarray): The labels assigned to each point in the dataset after training. A label of -1 indicates noise.
- cluster_id (int): The current cluster ID used during the clustering process.

__init__(self, epsilon=0.5, min_points=5)
Initializes the DBSCAN model with the specified parameters.

Parameters:
- epsilon (float): The maximum distance between two samples for one to be considered in the neighborhood of the other.
- min_points (int): The minimum number of samples in a neighborhood for a point to be considered a core point.

train(self, X)
Trains the DBSCAN model using the provided dataset.

Parameters:
- X (numpy.ndarray): The input data points.

Returns:
- numpy.ndarray: The labels assigned to each point in the dataset. A label of -1 indicates noise.

compute_distance_matrix(self, X)
Computes the pairwise distance matrix for the dataset.
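
A pairwise distance matrix of this kind can be sketched with NumPy broadcasting. This is only an illustration: it assumes Euclidean distance (the documentation does not name the metric), and `pairwise_distance_sketch` is a hypothetical standalone function, not the class's own code.

```python
import numpy as np


def pairwise_distance_sketch(X):
    """Return the (n, n) matrix of pairwise distances between the rows of X.

    Assumption: Euclidean distance; the documented method may use another metric.
    """
    X = np.asarray(X, dtype=float)
    # Broadcasting: diff[i, j] is the vector X[i] - X[j].
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Three toy 2-D points (made up for illustration).
X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
D = pairwise_distance_sketch(X)
print(D[0, 1])  # 5.0
```

The intermediate `diff` array has shape (n, n, d), so this approach trades memory for simplicity and suits small datasets best.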

Parameters:
- X (numpy.ndarray): The input data points.

Returns:
- numpy.ndarray: A distance matrix where entry (i, j) is the distance between point i and point j.

region_query(self, i, distance_matrix)
Finds all points within epsilon distance of point i.
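
With a precomputed distance matrix, a neighborhood lookup reduces to a comparison on one row. The sketch below is a hypothetical standalone version that takes `epsilon` as an explicit argument instead of reading it from the model. It assumes an inclusive comparison (`<=`) and that a point counts as its own neighbor; the actual method may differ on both.

```python
import numpy as np


def region_query_sketch(i, distance_matrix, epsilon):
    """Return the indices of all points within epsilon of point i.

    Assumptions: inclusive comparison, and point i is included in its own
    neighborhood because its distance to itself is 0.
    """
    return np.flatnonzero(distance_matrix[i] <= epsilon)


# Toy data: three points on a line at positions 0, 0.4, and 2.
positions = np.array([0.0, 0.4, 2.0])
distance_matrix = np.abs(positions[:, None] - positions[None, :])

print(region_query_sketch(0, distance_matrix, epsilon=0.5).tolist())  # [0, 1]
print(region_query_sketch(2, distance_matrix, epsilon=0.5).tolist())  # [2]
```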

Parameters:
- i (int): The index of the point to query.
- distance_matrix (numpy.ndarray): The precomputed distance matrix.

Returns:
- numpy.ndarray: An array of indices of all points within epsilon distance of point i.

append_cluster(self, i, neighbours, visited_samples, distance_matrix, cluster_id)
Expands the cluster by adding all density-reachable points.
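
Cluster expansion in DBSCAN is usually a worklist traversal: start from the neighbors of a core point, and let every newly discovered core point contribute its own neighbors. The following is a minimal sketch of that textbook procedure, not the class's actual code. `expand_cluster_sketch` is a hypothetical standalone function, so `labels`, `epsilon`, and `min_points` are passed explicitly here, and it assumes that unassigned points carry the label -1 and that a point counts toward its own neighborhood.

```python
import numpy as np


def expand_cluster_sketch(i, neighbours, visited, distance_matrix, cluster_id,
                          labels, epsilon, min_points):
    """Grow cluster `cluster_id` outward from core point i, updating labels in place."""
    labels[i] = cluster_id
    queue = list(neighbours)
    while queue:
        j = queue.pop()
        if not visited[j]:
            visited[j] = True
            j_neighbours = np.flatnonzero(distance_matrix[j] <= epsilon)
            # Only core points pass their neighbors on; border points end the chain.
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)
        # A point not yet in any cluster (still labeled -1) joins this one.
        if labels[j] == -1:
            labels[j] = cluster_id


# Toy data: three chained points and one far-away point on a line.
positions = np.array([0.0, 0.3, 0.6, 5.0])
distance_matrix = np.abs(positions[:, None] - positions[None, :])
epsilon, min_points = 0.5, 2

labels = np.full(len(positions), -1)
visited = np.zeros(len(positions), dtype=bool)

# The caller marks the seed point as visited and supplies its neighborhood.
visited[0] = True
seed_neighbours = np.flatnonzero(distance_matrix[0] <= epsilon)
expand_cluster_sketch(0, seed_neighbours, visited, distance_matrix, 0,
                      labels, epsilon, min_points)
print(labels.tolist())  # [0, 0, 0, -1]
```

Point 2 is farther than epsilon from the seed, yet it joins cluster 0 because it is reachable through point 1. The isolated point keeps the label -1.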

Parameters:
- i (int): The index of the initial core point.
- neighbours (numpy.ndarray): The initial set of neighbors within epsilon distance.
- visited_samples (numpy.ndarray): A boolean array indicating whether each point has been visited.
- distance_matrix (numpy.ndarray): The precomputed distance matrix.
- cluster_id (int): The current cluster ID to assign to points.

get_labels(self)
Returns the labels assigned to the data points after training.

Returns:
- numpy.ndarray: An array of labels where each label is a cluster ID, or -1 for noise.
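
Once the labels are available, a common next step is to separate noise from clusters and list the members of each cluster. The snippet below is a sketch that uses a made-up label array in place of real `get_labels()` output and relies only on the documented convention that -1 marks noise.

```python
import numpy as np

# Hypothetical label array, standing in for the output of get_labels().
labels = np.array([0, 0, 1, -1, 1, 0, -1])

noise_mask = labels == -1
cluster_ids = np.unique(labels[~noise_mask])

print(len(cluster_ids))       # 2
print(int(noise_mask.sum()))  # 2

# Indices of the points belonging to each cluster.
members = {int(cid): np.flatnonzero(labels == cid).tolist() for cid in cluster_ids}
print(members)  # {0: [0, 1, 5], 1: [2, 4]}
```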