lnb package

Submodules

lnb.classifiers module

lnb.classifiers.drop_zero_cols(X_train: DataFrame, X_test: DataFrame | None = None) → tuple

Drops columns from the input DataFrames where all column values are zero.

Parameters:

X_train (pd.DataFrame) – The training data.
X_test (pd.DataFrame, optional) – The test data (default is None).

Returns:

If X_test is not provided, returns X_train with its all-zero columns dropped. If X_test is provided, returns both X_train and X_test with these columns dropped.

Return type:

pd.DataFrame or tuple of pd.DataFrame
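The behavior described above can be sketched with pandas as follows. This is a minimal illustration, not the lnb source: `drop_zero_cols_sketch` is a hypothetical name. It also assumes that the columns to drop are chosen from X_train and that the same columns are then removed from X_test, which keeps both frames aligned for a downstream classifier.

```python
from typing import Optional, Tuple, Union

import pandas as pd


def drop_zero_cols_sketch(
    X_train: pd.DataFrame, X_test: Optional[pd.DataFrame] = None
) -> Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]:
    """Hypothetical re-implementation of lnb.classifiers.drop_zero_cols."""
    # Keep a column if at least one of its training values is non-zero.
    keep = X_train.columns[(X_train != 0).any(axis=0).to_numpy()]
    if X_test is None:
        return X_train[keep]
    # Assumption: X_test keeps exactly the columns kept in X_train.
    return X_train[keep], X_test[keep]


X_train = pd.DataFrame({"a": [0, 0], "b": [1, 0], "c": [0, 2]})
X_test = pd.DataFrame({"a": [5, 0], "b": [0, 0], "c": [1, 1]})
print(drop_zero_cols_sketch(X_train).columns.tolist())  # ['b', 'c']
```

Under this assumption, column "b" is kept in X_test even though it is all zero there, because the selection depends only on the training data.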

lnb.classifiers.fit_classifier(X_train: DataFrame, y_train: DataFrame, model: str, cv=False)

Trains a classifier based on the specified model type using the provided training data.

Parameters:

X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
model (str) – The type of classifier to train. Supported values are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.
cv (bool, optional) – If True, performs cross-validation during model training to tune hyperparameters. If False, trains the model using fixed hyperparameters (default is False).

Returns:

object: A trained classifier object based on the specified model.
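The dispatch on the model name can be sketched with scikit-learn as shown below. It covers only the cv=False path. This is a hypothetical illustration: `fit_classifier_sketch` and `_MODEL_FACTORIES` are invented names, and the hyperparameters are placeholders rather than the values lnb actually uses.

```python
import pandas as pd
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Hypothetical mapping from the documented model names to scikit-learn
# estimators; the hyperparameters are placeholders, not lnb's defaults.
_MODEL_FACTORIES = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "mlp": lambda: MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}


def fit_classifier_sketch(X_train: pd.DataFrame, y_train, model: str) -> ClassifierMixin:
    """Build the estimator named by `model` and fit it on the training data."""
    try:
        clf = _MODEL_FACTORIES[model]()
    except KeyError:
        raise ValueError(f"Unsupported model: {model!r}") from None
    # ravel() turns a one-column target DataFrame into the 1-D array sklearn expects.
    return clf.fit(X_train, pd.DataFrame(y_train).to_numpy().ravel())
```

With a lookup table, adding a new model type requires only one new dictionary entry. An unknown name raises a clear error instead of silently returning nothing.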

lnb.classifiers.fit_classifiers(X_train: DataFrame, y_train: DataFrame, models: list, cv=False)

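Trains one classifier for each model name in models, using the provided training data.

Parameters:

X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
models (list) – A list of model names (as strings) to train. Supported models are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.
cv (bool, optional) – If True, performs cross-validation during model training to tune hyperparameters. If False, trains the models using fixed hyperparameters (default is False).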

Returns:

The trained classifiers, one for each model name in models.

lnb.classifiers.fit_validate_classifiers(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame, models: list, cv: bool = False) → tuple

Trains and validates multiple classifiers on the provided training and test sets.

Parameters:

X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
X_test (pd.DataFrame) – The feature data for the test set.
y_test (pd.DataFrame) – The target labels for the test set.
models (list) – A list of model names (as strings) to be trained and validated. Supported models are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.
cv (bool, optional) – If True, performs cross-validation during model training to tune hyperparameters. If False, trains the models using fixed hyperparameters (default is False).

Returns:

tuple: A tuple containing:

A list of trained models.
A list of the results (training and test accuracy, AUC, etc.) for each model.

lnb.classifiers.scale_features(X_train: DataFrame, X_test: DataFrame | None = None) → tuple

Scales the features in X_train by standardizing them. If X_test is provided, it is scaled using the mean and standard deviation computed on X_train.

Parameters:

X_train (pd.DataFrame) – The training data.
X_test (pd.DataFrame, optional) – The test data (default is None).

Returns:

If X_test is not provided, returns the standardized X_train. If X_test is provided, returns both the standardized X_train and X_test.
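Fitting the scaler on the training set only and reusing its statistics for the test set can be sketched as follows. This assumes scikit-learn's `StandardScaler`, which uses the population standard deviation (ddof=0). Whether lnb uses this class is an assumption, and `scale_features_sketch` is a hypothetical name.

```python
from typing import Optional

import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features_sketch(X_train: pd.DataFrame, X_test: Optional[pd.DataFrame] = None):
    """Hypothetical stand-in for lnb.classifiers.scale_features."""
    # Mean and standard deviation are estimated on the training data only.
    scaler = StandardScaler().fit(X_train)
    # Wrap the NumPy output back into DataFrames so column names and index survive.
    train_scaled = pd.DataFrame(
        scaler.transform(X_train), columns=X_train.columns, index=X_train.index
    )
    if X_test is None:
        return train_scaled
    # The test set reuses the training statistics, so nothing about X_test
    # leaks into the scaling.
    test_scaled = pd.DataFrame(
        scaler.transform(X_test), columns=X_test.columns, index=X_test.index
    )
    return train_scaled, test_scaled
```

Because the test set is transformed with the training mean and scale, its scaled columns generally do not have mean 0 and unit variance.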

lnb.classifiers.train_LogisticRegression(X_train: DataFrame, y_train: DataFrame, cv: bool = False) → LogisticRegression

Trains a logistic regression model using either a fixed regularization parameter or cross-validation.

Parameters:

X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
cv (bool, optional) – If True, performs cross-validation to find the best regularization parameter. If False, trains the model using a fixed regularization parameter (default is False).

Returns:

LogisticRegression: A trained logistic regression model.

If cv=True, the function also prints the best regularization parameter (C) found during cross-validation.
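The two paths (fixed C versus a cross-validated search over C) can be sketched with scikit-learn's `GridSearchCV`. The grid values, the fixed C=1.0, `cv=3`, and the name `train_logistic_regression_sketch` are all hypothetical and are not taken from lnb.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def train_logistic_regression_sketch(
    X_train: pd.DataFrame, y_train, cv: bool = False, C: float = 1.0
) -> LogisticRegression:
    """Hypothetical stand-in for lnb.classifiers.train_LogisticRegression."""
    y = pd.DataFrame(y_train).to_numpy().ravel()
    if not cv:
        # Fixed inverse regularization strength (placeholder value).
        return LogisticRegression(C=C, max_iter=1000).fit(X_train, y)
    # Hypothetical grid of C values, each scored with 3-fold cross-validation.
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=3,
    )
    search.fit(X_train, y)
    print(f"Best C: {search.best_params_['C']}")
    # With refit=True (the default), the best model is retrained on all of X_train.
    return search.best_estimator_
```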

lnb.classifiers.train_MLP(X_train: DataFrame, y_train: DataFrame, cv: bool = False) → MLPClassifier

Trains a Multi-layer Perceptron (MLP) classifier using either fixed hyperparameters or cross-validation for hyperparameter tuning.

Parameters:

X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
cv (bool, optional) – If True, performs cross-validation to find the best hyperparameters. If False, trains the model using fixed hyperparameters (default is False).

Returns:

MLPClassifier: A trained MLP classifier.

If cv=True, the function performs a grid search to tune the hyperparameters and prints the best parameters found.
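A grid search over MLP hyperparameters might look like the sketch below. The grid (hidden layer sizes and the L2 penalty `alpha`), the fixed values, and the name `train_mlp_sketch` are hypothetical placeholders, not the parameters lnb actually searches.

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier


def train_mlp_sketch(X_train: pd.DataFrame, y_train, cv: bool = False) -> MLPClassifier:
    """Hypothetical stand-in for lnb.classifiers.train_MLP."""
    y = pd.DataFrame(y_train).to_numpy().ravel()
    base = MLPClassifier(max_iter=2000, random_state=0)
    if not cv:
        # Placeholder fixed architecture and L2 penalty.
        return base.set_params(hidden_layer_sizes=(32,), alpha=1e-4).fit(X_train, y)
    # Hypothetical grid over network shape and penalty strength; GridSearchCV
    # clones `base` for every candidate, so the template itself stays unfitted.
    search = GridSearchCV(
        base,
        param_grid={"hidden_layer_sizes": [(16,), (32, 16)], "alpha": [1e-4, 1e-2]},
        cv=3,
    )
    search.fit(X_train, y)
    print(f"Best parameters: {search.best_params_}")
    return search.best_estimator_
```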

lnb.classifiers.train_RandomForest(X_train: DataFrame, y_train: DataFrame, cv: bool = False) → RandomForestClassifier

Trains a random forest classifier using either fixed hyperparameters or cross-validation for hyperparameter tuning.

Parameters:

X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
cv (bool, optional) – If True, performs cross-validation to find the best hyperparameters. If False, trains the model using fixed hyperparameters (default is False).

Returns:

RandomForestClassifier: A trained random forest classifier.

If cv=True, the function performs a grid search to tune the hyperparameters and prints the best parameters found.
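The random forest variant follows the same pattern and can be sketched with scikit-learn's `GridSearchCV`. The grid over `n_estimators` and `max_depth`, the fixed defaults, and the name `train_random_forest_sketch` are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def train_random_forest_sketch(X_train: pd.DataFrame, y_train, cv: bool = False) -> RandomForestClassifier:
    """Hypothetical stand-in for lnb.classifiers.train_RandomForest."""
    y = pd.DataFrame(y_train).to_numpy().ravel()
    if not cv:
        # Placeholder fixed hyperparameters, not lnb's actual defaults.
        return RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y)
    # Small hypothetical grid; each combination is scored with 3-fold CV.
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [25, 50], "max_depth": [None, 3]},
        cv=3,
    )
    search.fit(X_train, y)
    print(f"Best parameters: {search.best_params_}")
    return search.best_estimator_
```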

lnb.classifiers.validate_clf(clf: ClassifierMixin, X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame) → tuple

Evaluates the classifier’s performance on both the training and test datasets. Prints the accuracy and AUC (Area Under the Curve) for both training and test sets.

Parameters:

clf (ClassifierMixin) – The classifier to be evaluated (any estimator with predict and predict_proba methods).
X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
X_test (pd.DataFrame) – The feature data for the test set.
y_test (pd.DataFrame) – The target labels for the test set.

Returns:

tuple: A tuple containing:

Training accuracy
Training AUC
Test accuracy
Test AUC (if computable)
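One way to compute these four values, including an AUC that may not be computable, is sketched below with scikit-learn metrics. It assumes binary labels, with the positive class in column 1 of `predict_proba`. An AUC is reported as None whenever `roc_auc_score` raises `ValueError`, for example when y contains a single class. The names `validate_clf_sketch` and `_acc_auc` are hypothetical.

```python
from typing import Optional, Tuple

import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score


def _acc_auc(clf, X, y) -> Tuple[float, Optional[float]]:
    """Accuracy and (binary) AUC of clf on one dataset."""
    y = pd.DataFrame(y).to_numpy().ravel()
    acc = accuracy_score(y, clf.predict(X))
    try:
        # Score with the predicted probability of the positive class.
        auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
    except ValueError:
        # Raised e.g. when only one class is present in y.
        auc = None
    return acc, auc


def validate_clf_sketch(clf, X_train, y_train, X_test, y_test):
    """Hypothetical stand-in for lnb.classifiers.validate_clf."""
    train_acc, train_auc = _acc_auc(clf, X_train, y_train)
    test_acc, test_auc = _acc_auc(clf, X_test, y_test)
    print(f"Train accuracy: {train_acc:.3f}, AUC: {train_auc}")
    print(f"Test accuracy: {test_acc:.3f}, AUC: {test_auc}")
    return train_acc, train_auc, test_acc, test_auc
```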

lnb.data_prep module

lnb.data_prep.discretize_dataset(df: DataFrame, columns: list) → DataFrame: Converts the dataset so that all categories in the categorical columns are integers instead of class-name strings.
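The string-to-integer conversion can be sketched with pandas categorical codes. Assigning codes in the sorted order of the distinct values is an assumption about how lnb numbers categories, and `discretize_dataset_sketch` is a hypothetical name.

```python
import pandas as pd


def discretize_dataset_sketch(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical stand-in for lnb.data_prep.discretize_dataset."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        # Codes follow the sorted order of the distinct values; missing values become -1.
        out[col] = out[col].astype("category").cat.codes
    return out


df = pd.DataFrame({"color": ["red", "blue", "red", "green"], "n": [1, 2, 3, 4]})
print(discretize_dataset_sketch(df, ["color"])["color"].tolist())  # [2, 0, 2, 1]
```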

lnb.data_prep.get_target_record(df: DataFrame, index: int) → DataFrame: Given an index, returns the single-record DataFrame corresponding to that index.
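In pandas, returning a one-row DataFrame rather than a Series comes down to passing the index inside a list. The sketch assumes that `index` is an index label, not a position (for a position, `df.iloc[[index]]` would be the analogue). `get_target_record_sketch` is a hypothetical name.

```python
import pandas as pd


def get_target_record_sketch(df: pd.DataFrame, index: int) -> pd.DataFrame:
    """Hypothetical stand-in for lnb.data_prep.get_target_record."""
    # Double brackets keep the result a DataFrame; df.loc[index] would return a Series.
    return df.loc[[index]]


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}, index=[10, 20, 30])
print(get_target_record_sketch(df, 20).shape)  # (1, 2)
```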

lnb.data_prep.load_data(path_to_data: str, path_to_metadata: str, cols_to_select: list = ['all'])

lnb.data_prep.merge_datasets(df_secant: DataFrame) → DataFrame: Merges the list of datasets given as input into a single global dataset.
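The signature annotates `df_secant` as a DataFrame, but the description mentions a list of datasets. The sketch below assumes a list of DataFrames that share the same columns and that "merge" means stacking them row-wise with `pd.concat`. Both points are assumptions, and `merge_datasets_sketch` is a hypothetical name.

```python
import pandas as pd


def merge_datasets_sketch(datasets: list) -> pd.DataFrame:
    """Hypothetical stand-in for lnb.data_prep.merge_datasets."""
    # Row-wise concatenation; ignore_index renumbers the rows 0..n-1.
    return pd.concat(datasets, axis=0, ignore_index=True)


merged = merge_datasets_sketch([pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3]})])
print(merged["x"].tolist())  # [1, 2, 3]
```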

lnb.data_prep.normalize_cont_cols(df: DataFrame, meta_data: list, df_aux: DataFrame, types: tuple = ('Float',)) → DataFrame

lnb.data_prep.read_data(data_path: str, categorical_cols: list, continuous_cols: list) → DataFrame: Reads the CSV file at data_path and returns a pandas DataFrame. If all columns are categorical, ensures that all column values are strings.
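A minimal sketch of this behavior with `pandas.read_csv` follows. It assumes that "all categorical" means `continuous_cols` is empty. `read_data_sketch` is a hypothetical name, and the real function may use the column lists in other ways, for example to select or type columns.

```python
import pandas as pd


def read_data_sketch(data_path, categorical_cols: list, continuous_cols: list) -> pd.DataFrame:
    """Hypothetical stand-in for lnb.data_prep.read_data."""
    df = pd.read_csv(data_path)
    if not continuous_cols:
        # Every column is categorical: cast all values to str so that numeric-looking
        # category labels (e.g. 1, 2) are not treated as numbers downstream.
        df = df.astype(str)
    return df
```

`pd.read_csv` also accepts a file-like object, so the sketch can be exercised with `io.StringIO` instead of a file on disk.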

lnb.data_prep.read_metadata(metadata_path: str) → tuple: Reads metadata from a JSON file (required by the reprosyn generators).

lnb.data_prep.select_columns(df: DataFrame, categorical_cols: list, continuous_cols: list, cols_to_select: list, meta_data_og: list) → tuple

lnb.data_prep.split_data(df: DataFrame, path_to_ids: str)

lnb.distance module

lnb.feature_extractors module

lnb.generators module

lnb.mia module

lnb.plots module

lnb.shadow_data module

lnb.utils module

lnb.utils.blockPrint()

lnb.utils.enablePrint()

lnb.utils.ignore_depreciation()
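These three helpers have no docstrings. A common idiom for such functions is to redirect `sys.stdout` to `os.devnull` and to filter `DeprecationWarning` with the `warnings` module. The sketch below assumes that idiom. The function names are hypothetical, and saving the previous stdout (rather than restoring `sys.__stdout__`) is a design choice of the sketch, not something the source confirms.

```python
import os
import sys
import warnings

_saved_stdout = None  # stdout that was active before block_print_sketch()


def block_print_sketch() -> None:
    """Hypothetical stand-in for lnb.utils.blockPrint."""
    global _saved_stdout
    _saved_stdout = sys.stdout
    # Subsequent print() calls write to the null device.
    sys.stdout = open(os.devnull, "w")


def enable_print_sketch() -> None:
    """Hypothetical stand-in for lnb.utils.enablePrint."""
    global _saved_stdout
    if _saved_stdout is not None:
        sys.stdout.close()  # close the null-device handle
        sys.stdout = _saved_stdout
        _saved_stdout = None


def ignore_deprecation_sketch() -> None:
    """Hypothetical stand-in for lnb.utils.ignore_depreciation."""
    warnings.filterwarnings("ignore", category=DeprecationWarning)
```

Restoring the previously active stream keeps the pair usable under test runners or notebooks that replace `sys.stdout`. For scoped silencing, `contextlib.redirect_stdout` is a standard-library alternative.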

async lnb.utils.save_metrics_to_file(file_path, data)

lnb.utils.str2bool(s)

lnb.utils.str2list(s)
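`str2bool` and `str2list` read like argparse type converters for command-line flags such as `cv` and `models`. The sketch below assumes that usage, a common set of true/false spellings, and comma-separated list input. All of these are assumptions, and the `_sketch` names are hypothetical.

```python
import argparse


def str2bool_sketch(s) -> bool:
    """Hypothetical stand-in for lnb.utils.str2bool."""
    if isinstance(s, bool):
        return s
    value = str(s).strip().lower()
    if value in {"yes", "true", "t", "y", "1"}:
        return True
    if value in {"no", "false", "f", "n", "0"}:
        return False
    # argparse reports ArgumentTypeError as a clean usage error.
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {s!r}")


def str2list_sketch(s: str) -> list:
    """Hypothetical stand-in for lnb.utils.str2list (comma-separated input)."""
    return [item.strip() for item in s.split(",") if item.strip()]


parser = argparse.ArgumentParser()
parser.add_argument("--cv", type=str2bool_sketch, default=False)
parser.add_argument("--models", type=str2list_sketch, default=["logistic_regression"])
args = parser.parse_args(["--cv", "yes", "--models", "random_forest, mlp"])
```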

Module contents