PCA (Principal Component Analysis)

Example Usage


from mltrain.unsupervised.PCA import PCA
import numpy as np

# Placeholder data for illustration: 100 samples, 5 features
X_train = np.random.rand(100, 5)

# Initialize the model
pca = PCA(n_components=2)

# Fit the model, transform the data, and plot the result
transformed_X = pca.train_transform(X_train, plot_graph=True)

# Get the principal components
principal_components = pca.pc

Overview

The PCA class implements Principal Component Analysis (PCA), a technique for dimensionality reduction. PCA transforms data into a new coordinate system where the axes (principal components) are ordered by the amount of variance they capture from the data.
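To make the variance ordering concrete, here is a minimal NumPy sketch, independent of the PCA class and not taken from its source. The helper name `explained_variance_ratio` and the synthetic data are assumptions made for illustration. It computes the share of total variance along each principal axis, largest first.

```python
import numpy as np


def explained_variance_ratio(X):
    """Return the fraction of total variance along each principal axis, largest first."""
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (rowvar=False: columns are features).
    cov = np.cov(X_centered, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
    # so reverse them to put the largest variance first.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    return eigenvalues / eigenvalues.sum()


rng = np.random.default_rng(0)
# Hypothetical data: three features with very different spreads.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
ratios = explained_variance_ratio(X)
print(ratios)
```

With data like this, the first ratio dominates, which is why keeping only the first few components can preserve most of the variance.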

Hyperparameters

n_components (int, default=2): The number of principal components to retain after dimensionality reduction.

Attributes

pc (numpy.ndarray): The principal components (eigenvectors) after fitting the model.
mean (numpy.ndarray): The mean of the features in the original data.

Methods

`__init__(self, n_components=2)`

Initializes the PCA model with the specified number of components.

Args:
- n_components (int): Number of principal components to retain.

`train(self, X)`

Fits the PCA model to the input data.

Args:
- X (numpy.ndarray): The input data to perform PCA on, with shape (n_samples, n_features).
Returns:
- numpy.ndarray: The principal components after fitting the model.
Raises:
- ValueError: If the number of components is greater than the number of features.
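The fitting step described above could look roughly like the following sketch. It is not the library's implementation: the function name `fit_pca`, the use of an eigendecomposition of the covariance matrix, and the `(n_features, n_components)` layout of the returned components are all assumptions.

```python
import numpy as np


def fit_pca(X, n_components=2):
    """Hypothetical helper: return (components, mean) for the data X."""
    n_features = X.shape[1]
    if n_components > n_features:
        raise ValueError(
            f"n_components={n_components} exceeds n_features={n_features}"
        )
    mean = X.mean(axis=0)
    # atleast_2d keeps the single-feature case a 1x1 matrix.
    cov = np.atleast_2d(np.cov(X - mean, rowvar=False))
    # eigh returns ascending eigenvalues; reorder so the largest comes first.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    # Assumed layout: one column per retained component.
    components = eigenvectors[:, order[:n_components]]
    return components, mean


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
components, mean = fit_pca(X, n_components=2)
print(components.shape)  # (4, 2)
```

Requesting more components than there are features, for example `fit_pca(X, n_components=5)` here, raises the `ValueError`.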

`transform(self, X)`

Applies the dimensionality reduction to the input data.

Args:
- X (numpy.ndarray): The input data to transform, with shape (n_samples, n_features).
Returns:
- numpy.ndarray: The data transformed into the principal component space.
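A projection of this kind is typically a centering step followed by a matrix product. The sketch below assumes that the stored mean is subtracted first and that the components are laid out as `(n_features, n_components)`; neither detail is confirmed by this page. The helper `project` and the hand-picked values are hypothetical.

```python
import numpy as np


def project(X, components, mean):
    """Center X with the stored mean, then project onto the components."""
    return (X - mean) @ components


# Hand-picked values for illustration: one unit-length component along the diagonal.
mean = np.array([1.0, 1.0])
components = np.array([[1.0], [1.0]]) / np.sqrt(2.0)
X_new = np.array([[2.0, 2.0], [0.0, 0.0], [2.0, 0.0]])
projected = project(X_new, components, mean)
print(projected.shape)  # (3, 1)
```

Points that differ only along the diagonal keep their separation, while the third point, which differs from the mean only across the diagonal, projects to zero.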

`train_transform(self, X, plot_graph=False)`

Fits the PCA model and transforms the input data in one step. Optionally, plots the data in the reduced principal component space.

Args:
- X (numpy.ndarray): The input data to fit and transform, with shape (n_samples, n_features).
- plot_graph (bool, optional): Whether to plot the transformed data. Plotting is supported only for 1, 2, or 3 components. Defaults to False.
Returns:
- numpy.ndarray: The data transformed into the principal component space.