lnb package

Submodules

lnb.classifiers module

lnb.classifiers.drop_zero_cols(X_train: DataFrame, X_test: DataFrame | None = None) → tuple

Drops columns from the input DataFrames where all column values are zero.

Parameters:
  • X_train (pd.DataFrame) – The training data.

  • X_test (pd.DataFrame, optional) – The test data, optional.

Returns:

If X_test is not provided, returns the modified X_train with the all-zero columns dropped. If X_test is provided, returns both X_train and X_test with these columns dropped.

Return type:

pd.DataFrame or tuple of pd.DataFrame
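A minimal pandas sketch of the behaviour described above. It assumes that a column is dropped when every value in X_train is zero, and that X_test is then reduced to the same columns. The name drop_zero_cols_sketch is hypothetical, and the actual lnb implementation may decide which columns to remove differently.

```python
from typing import Optional, Tuple, Union

import pandas as pd


def drop_zero_cols_sketch(
    X_train: pd.DataFrame, X_test: Optional[pd.DataFrame] = None
) -> Union[pd.DataFrame, Tuple[pd.DataFrame, pd.DataFrame]]:
    """Hypothetical sketch: drop columns whose values are all zero in X_train."""
    # True for every column that contains at least one non-zero value
    keep = (X_train != 0).any(axis=0)
    X_train = X_train.loc[:, keep]
    if X_test is None:
        return X_train
    # Assumption: the test set keeps exactly the columns kept for training
    return X_train, X_test[X_train.columns]


train = pd.DataFrame({"a": [0, 0], "b": [1, 0]})
test = pd.DataFrame({"a": [5, 0], "b": [2, 3]})
train_out, test_out = drop_zero_cols_sketch(train, test)
print(list(train_out.columns), list(test_out.columns))  # ['b'] ['b']
```

The mask is derived from X_train alone, so both sets stay aligned column for column. This holds even when a test column such as `a` contains non-zero values.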

lnb.classifiers.fit_classifier(X_train: DataFrame, y_train: DataFrame, model: str, cv=False)

Trains a classifier based on the specified model type using the provided training data.

Parameters:

X_train: pd.DataFrame

The feature data for the training set.

y_train: pd.DataFrame

The target labels for the training set.

model: str

The type of classifier to train. Supported values are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.

cv: bool, optional

If True, performs cross-validation during model training to tune hyperparameters. If False, trains the model using fixed hyperparameters (default is False).

Returns:

object: A trained classifier object based on the specified model.
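One plausible way to implement this dispatch is a dictionary that maps the supported model names to scikit-learn estimators. scikit-learn is assumed here because the documented return types (LogisticRegression, RandomForestClassifier, MLPClassifier) are scikit-learn classes. The factory dictionary, its default hyperparameters and the name fit_classifier_sketch are hypothetical, and the cv option is left out of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Hypothetical defaults for each documented model name
MODEL_FACTORIES = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(random_state=0),
    "mlp": lambda: MLPClassifier(max_iter=500, random_state=0),
}


def fit_classifier_sketch(X_train: pd.DataFrame, y_train: pd.DataFrame, model: str) -> ClassifierMixin:
    """Fit the estimator registered under `model` (cross-validation omitted)."""
    if model not in MODEL_FACTORIES:
        raise ValueError(f"Unsupported model {model!r}; expected one of {sorted(MODEL_FACTORIES)}")
    # Flatten a one-column label DataFrame to the 1-D array scikit-learn expects
    y = np.asarray(y_train).ravel()
    return MODEL_FACTORIES[model]().fit(X_train, y)


X = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]})
y = pd.DataFrame({"label": [0, 0, 1, 1]})
clf = fit_classifier_sketch(X, y, "logistic_regression")
print(type(clf).__name__)  # LogisticRegression
```

Using factories (lambdas) rather than shared estimator instances means each call trains a fresh model.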

lnb.classifiers.fit_classifiers(X_train: DataFrame, y_train: DataFrame, models: list, cv=False)

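Trains one classifier for each model name in models, using the provided training data.

Parameters:
  • X_train (pd.DataFrame) – The feature data for the training set.

  • y_train (pd.DataFrame) – The target labels for the training set.

  • models (list) – The names of the classifiers to train. Supported values are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.

  • cv (bool, optional) – If True, performs cross-validation during model training to tune hyperparameters, defaults to False

Returns:

The trained classifiers, one for each entry in models.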

lnb.classifiers.fit_validate_classifiers(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame, models: list, cv: bool = False) → tuple

Trains and validates multiple classifiers on the provided training and test sets.

Parameters:

X_train: pd.DataFrame

The feature data for the training set.

y_train: pd.DataFrame

The target labels for the training set.

X_test: pd.DataFrame

The feature data for the test set.

y_test: pd.DataFrame

The target labels for the test set.

models: list

A list of model names (as strings) to be trained and validated. Supported models are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.

cv: bool, optional

If True, performs cross-validation during model training to tune hyperparameters. If False, trains the model using fixed hyperparameters (default is False).

Returns:

tuple: A tuple containing:
  • A list of trained models.

  • A list of the results (training and test accuracy, AUC, etc.) for each model.

lnb.classifiers.scale_features(X_train: DataFrame, X_test: DataFrame | None = None) → tuple

Standardizes the features in X_train. If X_test is provided, it is scaled with the mean and standard deviation computed on X_train.

Parameters:

X_train: pd.DataFrame

The training data.

X_test: pd.DataFrame, optional

The test data (default is None).

Returns:

If X_test is not provided, returns the standardized X_train. If X_test is provided, returns both the standardized X_train and X_test.
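A minimal sketch of this scaling step using scikit-learn's StandardScaler. The scaler is fitted on the training set only and then reused to transform the test set. The name scale_features_sketch and the choice to return DataFrames rather than NumPy arrays are assumptions.

```python
from typing import Optional

import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features_sketch(X_train: pd.DataFrame, X_test: Optional[pd.DataFrame] = None):
    """Hypothetical sketch: z-score features using statistics from X_train only."""
    scaler = StandardScaler().fit(X_train)  # learns per-column mean and std on the training set

    def transform(X: pd.DataFrame) -> pd.DataFrame:
        # Keep column names and index instead of returning a bare NumPy array
        return pd.DataFrame(scaler.transform(X), columns=X.columns, index=X.index)

    if X_test is None:
        return transform(X_train)
    return transform(X_train), transform(X_test)


train = pd.DataFrame({"f": [1.0, 3.0]})
test = pd.DataFrame({"f": [2.0, 5.0]})
train_scaled, test_scaled = scale_features_sketch(train, test)
print(train_scaled["f"].tolist(), test_scaled["f"].tolist())  # [-1.0, 1.0] [0.0, 3.0]
```

Here the training column has mean 2 and (population) standard deviation 1. The test value 5.0 therefore maps to 3.0, which shows that test data is not re-centred on its own statistics.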

lnb.classifiers.train_LogisticRegression(X_train: DataFrame, y_train: DataFrame, cv: bool = False) → LogisticRegression

Trains a logistic regression model using either a fixed regularization parameter or cross-validation.

Parameters:

X_train: pd.DataFrame

The feature data for the training set.

y_train: pd.DataFrame

The target labels for the training set.

cv: bool, optional

If True, performs cross-validation to find the best regularization parameter. If False, trains the model using a fixed regularization parameter (default is False).

Returns:

LogisticRegression: A trained logistic regression model.

If cv=True, the function also prints the best regularization parameter (C) found during cross-validation.
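A sketch of the two modes described above, using scikit-learn's LogisticRegression for the fixed-C case and LogisticRegressionCV for the cross-validated case. The fixed value C=1.0, the 10-value grid, the 5 folds and the name train_logistic_regression_sketch are assumptions, not documented lnb settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV


def train_logistic_regression_sketch(X_train, y_train, cv: bool = False) -> LogisticRegression:
    """Hypothetical sketch: fixed C, or C chosen by cross-validation."""
    y = np.asarray(y_train).ravel()  # accept a one-column label DataFrame
    if not cv:
        return LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y)
    # Search 10 log-spaced values of C between 1e-4 and 1e4 with 5-fold CV
    search = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000).fit(X_train, y)
    best_c = float(search.C_[0])
    print(f"Best C: {best_c:.4g}")
    # Refit a plain LogisticRegression so the return type matches the documented one
    return LogisticRegression(C=best_c, max_iter=1000).fit(X_train, y)


X = np.arange(20, dtype=float).reshape(-1, 1)
y = (X.ravel() >= 10).astype(int)
fixed = train_logistic_regression_sketch(X, y)
tuned = train_logistic_regression_sketch(X, y, cv=True)
```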

lnb.classifiers.train_MLP(X_train: DataFrame, y_train: DataFrame, cv: bool = False) → MLPClassifier

Trains a Multi-layer Perceptron (MLP) classifier using either fixed hyperparameters or cross-validation for hyperparameter tuning.

Parameters:

X_train: pd.DataFrame

The feature data for the training set.

y_train: pd.DataFrame

The target labels for the training set.

cv: bool, optional

If True, performs cross-validation to find the best hyperparameters. If False, trains the model using fixed hyperparameters (default is False).

Returns:

MLPClassifier: A trained MLP classifier.

If cv=True, the function performs a grid search to tune the hyperparameters and prints the best parameters found.

lnb.classifiers.train_RandomForest(X_train: DataFrame, y_train: DataFrame, cv: bool = False) → RandomForestClassifier

Trains a random forest classifier using either fixed hyperparameters or cross-validation for hyperparameter tuning.

Parameters:

X_train: pd.DataFrame

The feature data for the training set.

y_train: pd.DataFrame

The target labels for the training set.

cv: bool, optional

If True, performs cross-validation to find the best hyperparameters. If False, trains the model using fixed hyperparameters (default is False).

Returns:

RandomForestClassifier: A trained random forest classifier.

If cv=True, the function performs a grid search to tune the hyperparameters and prints the best parameters found.
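Both train_MLP and train_RandomForest describe the same pattern: fit with fixed hyperparameters, or run a grid search and print the best parameters found. The sketch below shows that pattern with scikit-learn's GridSearchCV. The helper name train_with_optional_grid_search, the parameter grids and the 3-fold setting are assumptions, not the values lnb uses.

```python
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Hypothetical search spaces; the grids lnb actually searches are not documented
PARAM_GRIDS = {
    RandomForestClassifier: {"n_estimators": [50, 100], "max_depth": [None, 5]},
    MLPClassifier: {"hidden_layer_sizes": [(32,), (64,)], "alpha": [1e-4, 1e-2]},
}


def train_with_optional_grid_search(
    estimator: ClassifierMixin, X_train, y_train, cv: bool = False
) -> ClassifierMixin:
    """Fit `estimator` with its current hyperparameters, or tune it by grid search if cv=True."""
    y = np.asarray(y_train).ravel()  # accept a one-column label DataFrame
    if not cv:
        return estimator.fit(X_train, y)
    search = GridSearchCV(estimator, PARAM_GRIDS[type(estimator)], cv=3)
    search.fit(X_train, y)
    print("Best parameters:", search.best_params_)
    # refit=True (the default) retrains the best configuration on all of X_train
    return search.best_estimator_


X = np.arange(30, dtype=float).reshape(-1, 1)
y = (X.ravel() >= 15).astype(int)
forest = train_with_optional_grid_search(RandomForestClassifier(random_state=0), X, y, cv=True)
```

Passing `MLPClassifier(max_iter=500, random_state=0)` instead gives the MLP variant. With an integer `cv`, GridSearchCV uses stratified folds for classifiers, so every fold contains both classes.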

lnb.classifiers.validate_clf(clf: ClassifierMixin, X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame) → tuple

Evaluates the classifier’s performance on both the training and test datasets. Prints the accuracy and AUC (Area Under the Curve) for both training and test sets.

Parameters:

clf: ClassifierMixin

The classifier to be evaluated (any estimator with predict and predict_proba methods).

X_train: pd.DataFrame

The feature data for the training set.

y_train: pd.DataFrame

The target labels for the training set.

X_test: pd.DataFrame

The feature data for the test set.

y_test: pd.DataFrame

The target labels for the test set.

Returns:

tuple: A tuple containing:
  • Training accuracy

  • Training AUC

  • Test accuracy

  • Test AUC (if computable)
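A sketch of this evaluation, assuming binary labels and the ROC AUC computed from `predict_proba(X)[:, 1]`. Following the "(if computable)" note, it returns None for the AUC when a split contains only one class. The None convention and the name validate_clf_sketch are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score


def validate_clf_sketch(clf, X_train, y_train, X_test, y_test) -> tuple:
    """Hypothetical sketch: accuracy and ROC AUC on the train and test sets (binary labels)."""
    scores = []
    for X, y in ((X_train, y_train), (X_test, y_test)):
        y = np.asarray(y).ravel()
        acc = accuracy_score(y, clf.predict(X))
        # ROC AUC is undefined when only one class is present; None marks "not computable"
        auc = roc_auc_score(y, clf.predict_proba(X)[:, 1]) if len(np.unique(y)) > 1 else None
        scores.extend([acc, auc])
    train_acc, train_auc, test_acc, test_auc = scores
    print(f"Train: accuracy={train_acc:.3f}, AUC={train_auc}")
    print(f"Test:  accuracy={test_acc:.3f}, AUC={test_auc}")
    return train_acc, train_auc, test_acc, test_auc


X_tr = np.array([[0.0], [1.0], [2.0], [3.0]])
y_tr = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_tr, y_tr)
results = validate_clf_sketch(model, X_tr, y_tr, np.array([[0.5], [2.5]]), np.array([1, 1]))
```

In this example the test labels contain a single class, so the test AUC comes back as None while the accuracy is still reported.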

lnb.data_prep module

lnb.data_prep.discretize_dataset(df: DataFrame, columns: list) → DataFrame

Converts the dataset so that all categories in the categorical columns are integers instead of class-name strings.
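A minimal pandas sketch of this encoding, using the category dtype and its integer codes. The name discretize_dataset_sketch is hypothetical, and lnb may map labels to integers differently (for example from its metadata).

```python
import pandas as pd


def discretize_dataset_sketch(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical sketch: replace category labels with integer codes."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        # Categories are sorted, so codes follow the lexical order of the labels;
        # missing values become -1
        out[col] = out[col].astype("category").cat.codes
    return out


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
encoded = discretize_dataset_sketch(df, ["color"])
print(encoded["color"].tolist())  # [1, 0, 1]
```

Because the codes depend on the labels present in each DataFrame, encoding two splits separately can give inconsistent codes. Passing a fixed category list through `pd.CategoricalDtype` avoids that.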

lnb.data_prep.get_target_record(df: DataFrame, index: int) → DataFrame

Given an index, returns the one-record DataFrame corresponding to that index.
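A one-line pandas sketch, assuming `index` is a row label. If it is a position instead, `df.iloc[[index]]` is the analogue. The name get_target_record_sketch is hypothetical.

```python
import pandas as pd


def get_target_record_sketch(df: pd.DataFrame, index: int) -> pd.DataFrame:
    """Hypothetical sketch: return the row labelled `index` as a one-row DataFrame."""
    # Passing a list of labels keeps the result 2-D; df.loc[index] would return a Series
    return df.loc[[index]]


df = pd.DataFrame({"age": [34, 51, 27]}, index=[10, 11, 12])
record = get_target_record_sketch(df, 11)
print(record.shape)  # (1, 1)
```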

lnb.data_prep.load_data(path_to_data: str, path_to_metadata: str, cols_to_select: list = ['all'])

lnb.data_prep.merge_datasets(df_secant: DataFrame) → DataFrame

Merges the list of datasets given as input into a single global dataset.
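A sketch of one common reading of "merge into a global dataset": stacking datasets that share the same columns with `pd.concat`. Treating the merge as a row-wise concatenation and the name merge_datasets_sketch are assumptions. The documented signature takes a single DataFrame argument, so lnb may combine its input differently.

```python
import pandas as pd


def merge_datasets_sketch(datasets: list) -> pd.DataFrame:
    """Hypothetical sketch: stack datasets with the same columns into one DataFrame."""
    # ignore_index=True renumbers rows 0..n-1 instead of keeping duplicated labels
    return pd.concat(datasets, ignore_index=True)


part_a = pd.DataFrame({"x": [1, 2]})
part_b = pd.DataFrame({"x": [3]})
merged = merge_datasets_sketch([part_a, part_b])
print(merged["x"].tolist(), merged.index.tolist())  # [1, 2, 3] [0, 1, 2]
```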

lnb.data_prep.normalize_cont_cols(df: DataFrame, meta_data: list, df_aux: DataFrame, types: tuple = ('Float',)) → DataFrame

lnb.data_prep.read_data(data_path: str, categorical_cols: list, continuous_cols: list) → DataFrame

Reads the CSV file at data_path and returns a pandas DataFrame. If all columns are categorical, ensures that every column value is a string.
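A sketch of the string handling described above, using the `dtype` argument of `pd.read_csv`. As a simplification, this version always reads categorical columns as strings, not only when every column is categorical. It also reads continuous columns as floats. The name read_data_sketch is hypothetical. `pd.read_csv` accepts a path or a file-like object, which lets the example use `io.StringIO`.

```python
import io

import pandas as pd


def read_data_sketch(data_path, categorical_cols: list, continuous_cols: list) -> pd.DataFrame:
    """Hypothetical sketch: read a CSV, keeping categorical columns as strings."""
    dtypes = {col: str for col in categorical_cols}
    dtypes.update({col: float for col in continuous_cols})
    # dtype=str stops pandas from turning labels such as "01" into the integer 1
    return pd.read_csv(data_path, dtype=dtypes)


csv_text = "zip,income\n01,1000\n02,2500.5\n"
data = read_data_sketch(io.StringIO(csv_text), ["zip"], ["income"])
print(data["zip"].tolist(), data["income"].tolist())  # ['01', '02'] [1000.0, 2500.5]
```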

lnb.data_prep.read_metadata(metadata_path: str) → tuple

Reads metadata from a JSON file (required by the reprosyn generators).

lnb.data_prep.select_columns(df: DataFrame, categorical_cols: list, continuous_cols: list, cols_to_select: list, meta_data_og: list) → tuple

lnb.data_prep.split_data(df: DataFrame, path_to_ids: str)

lnb.distance module

lnb.feature_extractors module

lnb.generators module

lnb.mia module

lnb.plots module

lnb.shadow_data module

lnb.utils module

lnb.utils.blockPrint()
lnb.utils.enablePrint()
lnb.utils.ignore_depreciation()
async lnb.utils.save_metrics_to_file(file_path, data)
lnb.utils.str2bool(s)
lnb.utils.str2list(s)

Module contents