Random Forest Model

Example Usage


from mltrain.supervised.RandomForest import RandomForest

# Initialize the model
model = RandomForest(n_trees=100, max_depth=10, min_samples_split=2, criteria='gini')

# Train the model
model.train(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = model.accuracy(y_test, predictions)

# Generate confusion matrix
conf_matrix = model.confusion_matrix(y_test, predictions)

Overview

The RandomForest class implements a random forest model for classification tasks. It trains multiple decision trees, each on a bootstrap sample of the training data, and aggregates their predictions to improve accuracy and robustness compared with a single tree. Hyperparameters control the number of trees in the forest and how each individual tree is grown.

Hyperparameters

n_trees (default=100): The number of decision trees in the forest.
max_depth (default=10): The maximum depth of each decision tree.
min_samples_split (default=2): The minimum number of samples required to split a node in each tree.
criteria (default='gini'): The criterion used to evaluate splits ('gini' or 'entropy').
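
The two split criteria can be illustrated with a minimal sketch. The function names `gini_impurity` and `entropy` are hypothetical and are not taken from the library; the formulas are the standard definitions, and the library's own implementation may differ in detail.

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def entropy(y):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

labels = np.array([0, 0, 1, 1])   # perfectly mixed two-class node
print(gini_impurity(labels))      # 0.5
print(entropy(labels))            # 1.0
```

Both measures are zero for a node that contains a single class and grow as the classes become more evenly mixed, so a split is better when it lowers the impurity of the resulting child nodes.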

Attributes

trees (list): A list of trained DecisionTree instances.

Methods

`__init__(self, n_trees=100, max_depth=10, min_samples_split=2, criteria='gini')`

Initializes the Random Forest model with the specified hyperparameters.

Args:
- n_trees (int): Number of trees in the forest.
- max_depth (int): Maximum depth of each tree.
- min_samples_split (int): Minimum number of samples required to split a node.
- criteria (str): Criterion used to evaluate splits ('gini' or 'entropy').

`bootstrap_sample(self, X, y)`

Generates a bootstrap sample (random sample with replacement) from the dataset.

Args:
- X (numpy.ndarray): The input features.
- y (numpy.ndarray): The target labels.
Returns:
- Tuple[numpy.ndarray, numpy.ndarray]: Bootstrap sample of features and target labels.
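
As a rough illustration of sampling with replacement, the sketch below draws as many row indices as there are samples and uses them to select matching rows of `X` and `y`. It is a standalone function rather than the library's method, and the `rng` parameter is an assumption added here for reproducibility; the documented signature takes only `X` and `y`.

```python
import numpy as np

def bootstrap_sample(X, y, rng=None):
    """Return a sample of (X, y) drawn with replacement, same size as the input."""
    # rng is a hypothetical extra argument, not part of the documented signature
    rng = np.random.default_rng() if rng is None else rng
    n_samples = X.shape[0]
    # Indices may repeat, and some rows may not be drawn at all
    idx = rng.integers(0, n_samples, size=n_samples)
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 1])
X_boot, y_boot = bootstrap_sample(X, y, rng=np.random.default_rng(0))
print(X_boot.shape, y_boot.shape)  # (5, 2) (5,)
```

Using the same index array for both `X` and `y` keeps each feature row paired with its label.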

`most_common_label(self, y)`

Determines the most common label in the target array.

Args:
- y (numpy.ndarray): The array of target labels.
Returns:
- Any: The most common label in the target array.
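
One plausible way to find the most common label is with `collections.Counter`. This is a sketch of the idea, not necessarily how the library implements it; in particular, the tie-breaking behaviour shown here (the label seen first wins) is an assumption.

```python
from collections import Counter

import numpy as np

def most_common_label(y):
    """Return the label that occurs most often in y."""
    counts = Counter(np.asarray(y).tolist())
    # most_common(1) returns a list with one (label, count) pair
    return counts.most_common(1)[0][0]

print(most_common_label(np.array([1, 2, 2, 3])))  # 2
```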

`train(self, X, y)`

Trains the random forest by creating and training multiple decision trees.

Args:
- X (numpy.ndarray): The training dataset.
- y (numpy.ndarray): The target labels for the training dataset.
Returns:
- None
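
The training loop can be sketched as follows: draw a bootstrap sample, fit one tree on it, and repeat. The interface of the library's DecisionTree is not documented here, so the sketch uses a hypothetical `StubTree` stand-in with assumed `train` and `predict` methods; `train_forest`, `make_tree`, and `seed` are likewise illustrative names.

```python
import numpy as np

class StubTree:
    """Hypothetical stand-in for a decision tree: it only memorizes the majority class."""

    def train(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label = values[np.argmax(counts)]

    def predict(self, X):
        return np.full(X.shape[0], self.label)

def train_forest(X, y, n_trees=100, make_tree=StubTree, seed=0):
    """Fit n_trees trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Sample row indices with replacement for this tree
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = make_tree()
        tree.train(X[idx], y[idx])
        trees.append(tree)
    return trees

X = np.zeros((6, 2))
y = np.array([0, 0, 0, 0, 1, 1])
trees = train_forest(X, y, n_trees=5)
print(len(trees))  # 5
```

Because every tree sees a different resample of the data, the trees differ from one another, which is what makes aggregating their predictions useful.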

`predict(self, X)`

Predicts class labels for the given dataset using the trained random forest.

Args:
- X (numpy.ndarray): The dataset for which to make predictions.
Returns:
- numpy.ndarray: An array of predicted class labels.
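
A common way to aggregate tree predictions is a majority vote per sample, and the presence of `most_common_label` suggests this class does something similar, though that is an assumption. The sketch below takes an array of per-tree predictions and returns one label per sample; the name `majority_vote` is hypothetical.

```python
import numpy as np

def majority_vote(tree_predictions):
    """Combine predictions of shape (n_trees, n_samples) into shape (n_samples,)."""
    tree_predictions = np.asarray(tree_predictions)
    voted = []
    # Transpose so that each iteration sees all tree votes for one sample
    for sample_votes in tree_predictions.T:
        values, counts = np.unique(sample_votes, return_counts=True)
        # On a tie, argmax picks the smallest label because np.unique sorts values
        voted.append(values[np.argmax(counts)])
    return np.array(voted)

votes = np.array([
    [0, 1, 1],  # predictions from tree 1
    [0, 1, 0],  # predictions from tree 2
    [1, 1, 0],  # predictions from tree 3
])
print(majority_vote(votes))  # [0 1 0]
```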

`accuracy(self, y_true, y_pred)`

Calculates the accuracy of the model based on true and predicted labels.

Args:
- y_true (numpy.ndarray): True target labels.
- y_pred (numpy.ndarray): Predicted target labels.
Returns:
- float: The accuracy of the predictions.
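
Accuracy is the fraction of predictions that match the true labels. A minimal standalone sketch, assuming both inputs have the same length:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```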

`confusion_matrix(self, y_true, y_pred)`

Generates a confusion matrix that counts, for each pair of true and predicted labels, how many samples fall into that pair.

Args:
- y_true (numpy.ndarray): True target labels.
- y_pred (numpy.ndarray): Predicted target labels.
Returns:
- numpy.ndarray: A confusion matrix.
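
For illustration, a confusion matrix can be built by counting (true, predicted) label pairs. The layout used below, with rows for true labels and columns for predicted labels in sorted label order, is an assumption; the library's orientation is not stated in this page.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Use every label that appears in either array, in sorted order
    labels = np.unique(np.concatenate([y_true, y_pred]))
    index = {label: i for i, label in enumerate(labels.tolist())}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true.tolist(), y_pred.tolist()):
        matrix[index[t], index[p]] += 1
    return matrix

cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(cm)
# [[1 1]
#  [1 2]]
```

The diagonal holds the correctly classified samples, so the sum of the diagonal divided by the total count equals the accuracy.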
