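```python
import numpy as np
from mltrain.unsupervised.DBSCAN import DBSCAN

# Example input (illustrative only): 100 random points in 2-D
X_train = np.random.rand(100, 2)

# Initialize the model
model = DBSCAN(epsilon=0.5, min_points=5)

# Train the model; train() returns the cluster labels
cluster_labels = model.train(X_train)

# Get the assigned labels (-1 marks noise)
labels = model.get_labels()
```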
The DBSCAN class implements the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. DBSCAN groups together points that are closely packed and marks points that lie alone in low-density regions as outliers or noise.
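To make the density terms concrete, here is a small standalone sketch (not part of `mltrain`) that labels each point as core, border, or noise. It assumes Euclidean distance and counts a point as its own neighbor; `classify_points` is a hypothetical helper name.

```python
import numpy as np

def classify_points(X, epsilon=0.5, min_points=5):
    """Return 'core', 'border', or 'noise' for each row of X (illustrative only)."""
    X = np.asarray(X, dtype=float)
    # Euclidean distance between every pair of points (assumed metric)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Boolean neighborhood matrix; each point counts as its own neighbor
    within = dist <= epsilon
    is_core = within.sum(axis=1) >= min_points
    kinds = []
    for i in range(len(X)):
        if is_core[i]:
            kinds.append("core")      # dense enough to seed or grow a cluster
        elif np.any(within[i] & is_core):
            kinds.append("border")    # not dense, but next to a core point
        else:
            kinds.append("noise")     # alone in a low-density region
    return kinds

X = np.array([[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1], [0.05, 0.05],
              [0.5, 0.05], [5, 5]])
kinds = classify_points(X, epsilon=0.5, min_points=5)
```

A border point is not dense enough to be a core point but lies within `epsilon` of one, so DBSCAN attaches it to that core point's cluster instead of marking it as noise.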
Attributes:

- `epsilon` (default=0.5): The maximum distance between two samples for one to be considered as in the neighborhood of the other.
- `min_points` (default=5): The number of samples in a neighborhood for a point to be considered as a core point.
- `labels` (numpy.ndarray): The labels assigned to each point in the dataset after training. A label of -1 indicates noise.
- `cluster_id` (int): The current cluster ID used during the clustering process.

`__init__(self, epsilon=0.5, min_points=5)`

Initializes the DBSCAN model with the specified parameters.
Parameters:

- `epsilon` (float): The maximum distance between two samples for one to be considered as in the neighborhood of the other.
- `min_points` (int): The number of samples in a neighborhood for a point to be considered as a core point.

`train(self, X)`

Trains the DBSCAN model using the provided dataset.
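How `train` works internally is not shown here. As a rough guide, a conventional DBSCAN training loop can be written as the standalone function below. It assumes Euclidean distance, an inclusive neighborhood test (`<=`) that counts the point itself, and that a border point keeps the first cluster that reaches it; `dbscan_train` is a hypothetical name, not the library's API.

```python
import numpy as np

def dbscan_train(X, epsilon=0.5, min_points=5):
    """Standalone DBSCAN sketch: returns one label per row of X (-1 = noise)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (assumed metric)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    labels = np.full(n, -1)            # every point starts as noise
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbours = np.flatnonzero(dist[i] <= epsilon)
        if len(neighbours) < min_points:
            continue                   # not a core point; may still become a border point later
        # i is a core point: start a new cluster and grow it
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # claim points not yet in any cluster
            if visited[j]:
                continue
            visited[j] = True
            j_neighbours = np.flatnonzero(dist[j] <= epsilon)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is core: its neighbours are density-reachable
        cluster_id += 1
    return labels

# Two dense blobs, one border point next to the first blob, and one isolated point
A = [[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1], [0.05, 0.05]]
X = np.array(A + [[0.5, 0.05]] + [[x + 5, y + 5] for x, y in A] + [[10, 10]])
labels = dbscan_train(X, epsilon=0.5, min_points=5)
```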
Parameters:

- `X` (numpy.ndarray): The input data points.

Returns:

- `numpy.ndarray`: The labels assigned to each point in the dataset. A label of -1 indicates noise.

`compute_distance_matrix(self, X)`

Computes the pairwise distance matrix for the dataset.
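The distance metric is not stated above. Assuming Euclidean distance, a pairwise matrix can be computed with NumPy broadcasting; `pairwise_euclidean` is a hypothetical name used only for this sketch.

```python
import numpy as np

def pairwise_euclidean(X):
    """Return the (n, n) matrix of Euclidean distances between rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # shape (n, n, d)
    return np.sqrt(np.sum(diff ** 2, axis=-1))         # shape (n, n)

D = pairwise_euclidean([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
```

The matrix has n-by-n entries, so memory grows quadratically with the number of points; for large datasets, neighbor searches usually rely on a spatial index such as a k-d tree instead.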
Parameters:

- `X` (numpy.ndarray): The input data points.

Returns:

- `numpy.ndarray`: A distance matrix where entry (i, j) is the distance between point i and point j.

`region_query(self, i, distance_matrix)`

Finds all points within `epsilon` distance of point i.
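With a precomputed matrix, a region query reduces to reading one row and applying a threshold. This sketch assumes an inclusive comparison (`<=`) and passes `epsilon` explicitly instead of reading it from the model; `region_query_sketch` is hypothetical. Point i always appears in its own result because its distance to itself is 0.

```python
import numpy as np

def region_query_sketch(i, distance_matrix, epsilon=0.5):
    """Indices of all points within epsilon of point i (point i included)."""
    return np.flatnonzero(distance_matrix[i] <= epsilon)

D = np.array([[0.0, 0.3, 0.9],
              [0.3, 0.0, 0.6],
              [0.9, 0.6, 0.0]])
near_0 = region_query_sketch(0, D, epsilon=0.5)
```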
Parameters:

- `i` (int): The index of the point to query.
- `distance_matrix` (numpy.ndarray): The precomputed distance matrix.

Returns:

- `numpy.ndarray`: The indices of all points within `epsilon` distance of point i.

`append_cluster(self, i, neighbours, visited_samples, distance_matrix, cluster_id)`

Expands the cluster by adding all points that are density-reachable from core point i.
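One way to expand a cluster is a breadth-first queue over neighbors. The function below is a standalone sketch with the hypothetical name `expand_cluster`; unlike `append_cluster`, it takes the label array as an argument, and it assumes an inclusive neighborhood test (`<=`). The demo uses 1-D points to show density-reachability: point 3 is 1.2 away from the seed, more than `epsilon`, yet joins the cluster through a chain of core points.

```python
import numpy as np
from collections import deque

def expand_cluster(i, neighbours, visited, labels, distance_matrix, cluster_id,
                   epsilon=0.5, min_points=5):
    """Grow cluster `cluster_id` from core point i; updates visited and labels in place."""
    visited[i] = True
    labels[i] = cluster_id
    queue = deque(neighbours)
    while queue:
        j = queue.popleft()
        if labels[j] == -1:
            labels[j] = cluster_id          # border or previously-noise point joins
        if visited[j]:
            continue
        visited[j] = True
        j_neighbours = np.flatnonzero(distance_matrix[j] <= epsilon)
        if len(j_neighbours) >= min_points:  # j is a core point, so keep expanding
            queue.extend(j_neighbours)

x = np.array([0.0, 0.4, 0.8, 1.2, 5.0])
D = np.abs(x[:, None] - x[None, :])
labels = np.full(len(x), -1)
visited = np.zeros(len(x), dtype=bool)
seed_neighbours = np.flatnonzero(D[0] <= 0.5)
expand_cluster(0, seed_neighbours, visited, labels, D, 0, epsilon=0.5, min_points=2)
```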
Parameters:

- `i` (int): The index of the initial core point.
- `neighbours` (numpy.ndarray): The initial set of neighbors within `epsilon` distance of point i.
- `visited_samples` (numpy.ndarray): A boolean array indicating whether each point has been visited.
- `distance_matrix` (numpy.ndarray): The precomputed distance matrix.
- `cluster_id` (int): The current cluster ID to assign to points.

`get_labels(self)`

Returns the labels assigned to the data points after training.
Returns:

- `numpy.ndarray`: An array of labels where each label is a cluster ID, or -1 for noise.
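Because -1 marks noise, a label array such as the one returned by `get_labels` can be summarized in a few lines. `summarize_labels` is a hypothetical helper, not part of the documented class.

```python
import numpy as np

def summarize_labels(labels):
    """Return (number of clusters, number of noise points, {cluster_id: size})."""
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels[labels != -1])  # ignore the noise label
    n_noise = int(np.sum(labels == -1))
    sizes = {int(c): int(np.sum(labels == c)) for c in cluster_ids}
    return len(cluster_ids), n_noise, sizes

n_clusters, n_noise, sizes = summarize_labels([0, 0, 1, -1, 1, 1])
```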