lnb package
Submodules
lnb.classifiers module
- lnb.classifiers.drop_zero_cols(X_train: DataFrame, X_test: DataFrame | None = None) -> tuple
Drops columns from the input DataFrames where all column values are zero.
- Parameters:
X_train (pd.DataFrame) – The training data.
X_test (pd.DataFrame, optional) – The test data. Defaults to None.
- Returns:
If X_test is not provided, returns the modified X_train with zero-sum columns dropped. If X_test is provided, returns both X_train and X_test with zero-sum columns dropped.
- Return type:
pd.DataFrame or tuple of pd.DataFrame
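The behaviour described above can be sketched as follows. This is a minimal illustration, not the lnb implementation: the name drop_zero_cols_sketch is hypothetical, and the sketch assumes that a column is dropped when every training value is zero (the summary says "all column values are zero", while the Returns text says "zero-sum", and the two differ for columns with negative values). It also assumes that the columns chosen from X_train are the ones removed from X_test.

```python
from typing import Optional

import pandas as pd


def drop_zero_cols_sketch(X_train: pd.DataFrame, X_test: Optional[pd.DataFrame] = None):
    """Illustrative stand-in for drop_zero_cols, not the lnb implementation."""
    # Assumption: a column is dropped when every training value equals zero.
    all_zero = (X_train == 0).all(axis=0)
    zero_cols = all_zero[all_zero].index
    X_train = X_train.drop(columns=zero_cols)
    if X_test is None:
        return X_train
    # Assumption: the same columns are removed from the test set, even if
    # the test set has nonzero values there, so both frames stay aligned.
    return X_train, X_test.drop(columns=zero_cols)


train = pd.DataFrame({"a": [1, 0, 2], "b": [0, 0, 0], "c": [0, 3, 0]})
test = pd.DataFrame({"a": [5, 6], "b": [7, 0], "c": [0, 0]})

train_out, test_out = drop_zero_cols_sketch(train, test)
train_only = drop_zero_cols_sketch(train)
print(list(train_out.columns), list(test_out.columns))
```

Deciding from the training set alone keeps the feature layout identical across both frames, which is what a downstream classifier needs.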
- lnb.classifiers.fit_classifier(X_train: DataFrame, y_train: DataFrame, model: str, cv=False)
Trains a classifier based on the specified model type using the provided training data.
Parameters:
- X_train: pd.DataFrame
The feature data for the training set.
- y_train: pd.DataFrame
The target labels for the training set.
- model: str
The type of classifier to train. Supported values are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.
- cv: bool, optional
If True, performs cross-validation during model training to tune hyperparameters. If False, trains the model using fixed hyperparameters (default is False).
Returns:
object: A trained classifier object based on the specified model.
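A dispatch on the model string, as described above, might look like the sketch below. It assumes the three names map to the scikit-learn estimators of the same kind. The registry MODEL_FACTORIES, the function name fit_classifier_sketch, and all hyperparameter values are hypothetical placeholders rather than lnb's actual settings, and the cv branch is omitted.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Hypothetical registry; the hyperparameters are placeholders, not lnb's.
MODEL_FACTORIES = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(n_estimators=20, random_state=0),
    "mlp": lambda: MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0),
}


def fit_classifier_sketch(X_train: pd.DataFrame, y_train: pd.DataFrame, model: str):
    """Illustrative stand-in for fit_classifier with cv=False."""
    if model not in MODEL_FACTORIES:
        raise ValueError(f"Unsupported model: {model!r}")
    clf = MODEL_FACTORIES[model]()
    # scikit-learn expects a 1-D label array, so flatten the one-column frame.
    clf.fit(X_train, y_train.to_numpy().ravel())
    return clf


X = pd.DataFrame({"f1": [0.0, 0.2, 0.9, 1.0], "f2": [1.0, 0.8, 0.1, 0.0]})
y = pd.DataFrame({"label": [0, 0, 1, 1]})

clf = fit_classifier_sketch(X, y, "random_forest")
print(type(clf).__name__)
```

Raising on an unknown name is a design choice of the sketch; the documentation does not say how unsupported values are handled.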
- lnb.classifiers.fit_classifiers(X_train: DataFrame, y_train: DataFrame, models: list, cv=False)
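Trains multiple classifiers on the provided training data.
- Parameters:
X_train (pd.DataFrame) – The feature data for the training set.
y_train (pd.DataFrame) – The target labels for the training set.
models (list) – A list of model names (as strings) to be trained. Supported models are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.
cv (bool, optional) – If True, performs cross-validation during model training to tune hyperparameters; defaults to False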
- Returns:
The trained classifiers, presumably one per entry in models.
- Return type:
presumably list
- lnb.classifiers.fit_validate_classifiers(X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame, models: list, cv: bool = False) -> tuple
Trains and validates multiple classifiers on the provided training and test sets.
Parameters:
- X_train: pd.DataFrame
The feature data for the training set.
- y_train: pd.DataFrame
The target labels for the training set.
- X_test: pd.DataFrame
The feature data for the test set.
- y_test: pd.DataFrame
The target labels for the test set.
- models: list
A list of model names (as strings) to be trained and validated. Supported models are: ‘logistic_regression’, ‘random_forest’, ‘mlp’.
- cv: bool, optional
If True, performs cross-validation during model training to tune hyperparameters. If False, trains the model using fixed hyperparameters (default is False).
Returns:
- tuple: A tuple containing:
A list of trained models.
A list of the results (training and test accuracy, AUC, etc.) for each model.
- lnb.classifiers.scale_features(X_train: DataFrame, X_test: DataFrame | None = None) -> tuple
Scales the features in X_train by standardizing them. If X_test is provided, it scales the test set using the same mean and standard deviation as X_train.
Parameters:
- X_train: pd.DataFrame
The training data.
- X_test: pd.DataFrame, optional
The test data (default is None).
Returns:
If X_test is not provided, returns the standardized X_train. If X_test is provided, returns both the standardized X_train and X_test.
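The key point above is that the test set is scaled with statistics learned from the training set only. A minimal sketch of that pattern, assuming scikit-learn's StandardScaler is an acceptable stand-in (the documentation does not say which scaler lnb uses) and using the hypothetical name scale_features_sketch:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features_sketch(X_train: pd.DataFrame, X_test=None):
    """Illustrative stand-in for scale_features, not the lnb implementation."""
    # Fit on the training data only, so no test-set statistics leak in.
    # StandardScaler uses the population standard deviation (ddof=0).
    scaler = StandardScaler().fit(X_train)
    train_scaled = pd.DataFrame(
        scaler.transform(X_train), columns=X_train.columns, index=X_train.index
    )
    if X_test is None:
        return train_scaled
    # Reuse the training mean and standard deviation for the test set.
    test_scaled = pd.DataFrame(
        scaler.transform(X_test), columns=X_test.columns, index=X_test.index
    )
    return train_scaled, test_scaled


train = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"x": [2.0, 4.0]})

train_scaled, test_scaled = scale_features_sketch(train, test)
print(train_scaled)
print(test_scaled)
```

Because the training mean here is 2.0, the test value 2.0 maps to zero, while the scaled test set as a whole is not centred on zero. That is expected when the training statistics are reused.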
- lnb.classifiers.train_LogisticRegression(X_train: DataFrame, y_train: DataFrame, cv: bool = False) -> LogisticRegression
Trains a logistic regression model using either a fixed regularization parameter or cross-validation.
Parameters:
- X_train: pd.DataFrame
The feature data for the training set.
- y_train: pd.DataFrame
The target labels for the training set.
- cv: bool, optional
If True, performs cross-validation to find the best regularization parameter. If False, trains the model using a fixed regularization parameter (default is False).
Returns:
LogisticRegression: A trained logistic regression model.
If cv=True, the function also prints the best regularization parameter (C) found during cross-validation.
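The two modes described above can be sketched with scikit-learn as follows. The function name train_logreg_sketch, the fixed value C=1.0, the candidate grid, and the 3-fold setting are all assumptions made for illustration. The documentation does not state which values or which search procedure lnb uses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

C_GRID = [0.01, 0.1, 1.0, 10.0]  # hypothetical candidate values


def train_logreg_sketch(X_train: pd.DataFrame, y_train: pd.DataFrame, cv: bool = False):
    """Illustrative stand-in for train_LogisticRegression."""
    y = y_train.to_numpy().ravel()
    if not cv:
        # Fixed regularization strength (placeholder value).
        return LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y)
    # Cross-validated search over C; refit=True (the default) retrains
    # the best configuration on the full training set.
    search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": C_GRID}, cv=3)
    search.fit(X_train, y)
    print("Best C:", search.best_params_["C"])
    return search.best_estimator_


rng = np.random.default_rng(0)
features = rng.normal(size=(60, 2))
X = pd.DataFrame(features, columns=["f1", "f2"])
y = pd.DataFrame({"label": (features[:, 0] + features[:, 1] > 0).astype(int)})

fixed_model = train_logreg_sketch(X, y, cv=False)
tuned_model = train_logreg_sketch(X, y, cv=True)
```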
- lnb.classifiers.train_MLP(X_train: DataFrame, y_train: DataFrame, cv: bool = False) -> MLPClassifier
Trains a Multi-layer Perceptron (MLP) classifier using either fixed hyperparameters or cross-validation for hyperparameter tuning.
Parameters:
- X_train: pd.DataFrame
The feature data for the training set.
- y_train: pd.DataFrame
The target labels for the training set.
- cv: bool, optional
If True, performs cross-validation to find the best hyperparameters. If False, trains the model using fixed hyperparameters (default is False).
Returns:
MLPClassifier: A trained MLP classifier.
If cv=True, the function performs a grid search to tune the hyperparameters and prints the best parameters found.
- lnb.classifiers.train_RandomForest(X_train: DataFrame, y_train: DataFrame, cv: bool = False) -> RandomForestClassifier
Trains a random forest classifier using either fixed hyperparameters or cross-validation for hyperparameter tuning.
Parameters:
- X_train: pd.DataFrame
The feature data for the training set.
- y_train: pd.DataFrame
The target labels for the training set.
- cv: bool, optional
If True, performs cross-validation to find the best hyperparameters. If False, trains the model using fixed hyperparameters (default is False).
Returns:
RandomForestClassifier: A trained random forest classifier.
If cv=True, the function performs a grid search to tune the hyperparameters and prints the best parameters found.
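A grid search over several hyperparameters at once, as described for cv=True, could be sketched like this. The grid RF_GRID, the fixed settings, and the name train_random_forest_sketch are hypothetical. They only illustrate the pattern and are not the values used by lnb.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search space; every combination is evaluated.
RF_GRID = {"n_estimators": [10, 30], "max_depth": [2, None]}


def train_random_forest_sketch(X_train: pd.DataFrame, y_train: pd.DataFrame, cv: bool = False):
    """Illustrative stand-in for train_RandomForest."""
    y = y_train.to_numpy().ravel()
    if not cv:
        # Fixed hyperparameters (placeholder values).
        return RandomForestClassifier(n_estimators=30, random_state=0).fit(X_train, y)
    search = GridSearchCV(RandomForestClassifier(random_state=0), RF_GRID, cv=3)
    search.fit(X_train, y)
    print("Best parameters:", search.best_params_)
    return search.best_estimator_


rng = np.random.default_rng(1)
features = rng.normal(size=(60, 3))
X = pd.DataFrame(features, columns=["f1", "f2", "f3"])
y = pd.DataFrame({"label": (features[:, 0] > 0).astype(int)})

forest = train_random_forest_sketch(X, y, cv=True)
```

The same pattern would apply to an MLPClassifier by swapping the estimator and the grid keys, for example hidden_layer_sizes and alpha.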
- lnb.classifiers.validate_clf(clf: ClassifierMixin, X_train: DataFrame, y_train: DataFrame, X_test: DataFrame, y_test: DataFrame) -> tuple
Evaluates the classifier’s performance on both the training and test datasets. Prints the accuracy and AUC (Area Under the Curve) for both training and test sets.
Parameters:
- clf: ClassifierMixin
The classifier to be evaluated (any estimator with predict and predict_proba methods).
- X_train: pd.DataFrame
The feature data for the training set.
- y_train: pd.DataFrame
The target labels for the training set.
- X_test: pd.DataFrame
The feature data for the test set.
- y_test: pd.DataFrame
The target labels for the test set.
Returns:
- tuple: A tuple containing:
Training accuracy
Training AUC
Test accuracy
Test AUC (if computable)
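The evaluation above, including the "if computable" caveat on AUC, can be sketched as follows. The sketch assumes binary labels and treats AUC as not computable when a label set contains a single class, returning None in that case. The name validate_clf_sketch and that None convention are assumptions, not documented lnb behaviour.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score


def validate_clf_sketch(clf, X_train, y_train, X_test, y_test):
    """Illustrative stand-in for validate_clf (binary labels assumed)."""

    def evaluate(X, y):
        y_true = np.asarray(y).ravel()
        acc = accuracy_score(y_true, clf.predict(X))
        # AUC is undefined when only one class is present, so report None.
        if len(np.unique(y_true)) < 2:
            return acc, None
        # Column 1 of predict_proba holds the positive-class probability.
        return acc, roc_auc_score(y_true, clf.predict_proba(X)[:, 1])

    train_acc, train_auc = evaluate(X_train, y_train)
    test_acc, test_auc = evaluate(X_test, y_test)
    print(f"train: accuracy={train_acc}, AUC={train_auc}")
    print(f"test:  accuracy={test_acc}, AUC={test_auc}")
    return train_acc, train_auc, test_acc, test_auc


X_train = pd.DataFrame({"x": [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]})
y_train = pd.DataFrame({"label": [0, 0, 0, 1, 1, 1]})
X_test = pd.DataFrame({"x": [2.5, 4.0]})
y_test = pd.DataFrame({"label": [1, 1]})  # single class: AUC not computable

model = LogisticRegression().fit(X_train, y_train.to_numpy().ravel())
results = validate_clf_sketch(model, X_train, y_train, X_test, y_test)
```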
lnb.data_prep module
- lnb.data_prep.discretize_dataset(df: DataFrame, columns: list) -> DataFrame
Converts the dataset so that every category in the given categorical columns is represented by an integer instead of a class-name string.
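One common way to perform this kind of conversion is with pandas category codes, sketched below. The name discretize_sketch is hypothetical, and the assumption that codes follow the sorted order of the category names belongs to the sketch. The mapping actually used by lnb is not documented here.

```python
import pandas as pd


def discretize_sketch(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Illustrative stand-in for discretize_dataset."""
    out = df.copy()  # leave the caller's frame untouched
    for col in columns:
        # Assumption: codes follow the sorted order of the category names.
        out[col] = out[col].astype("category").cat.codes
    return out


df = pd.DataFrame(
    {"color": ["red", "blue", "red"], "size": ["S", "M", "S"], "age": [31, 45, 27]}
)
encoded = discretize_sketch(df, ["color", "size"])
print(encoded)
```

Columns not listed in columns, such as age here, pass through unchanged.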
- lnb.data_prep.get_target_record(df: DataFrame, index: int) -> DataFrame
Given an index, returns the single-record DataFrame corresponding to that index.
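Returning a one-row DataFrame rather than a Series is a small pandas detail worth illustrating. The sketch below assumes that index is a row label. Whether lnb selects by label or by position is not stated, and the name get_target_record_sketch is hypothetical.

```python
import pandas as pd


def get_target_record_sketch(df: pd.DataFrame, index: int) -> pd.DataFrame:
    """Illustrative stand-in for get_target_record."""
    # Assumption: `index` is a row label; df.iloc[[index]] would select by position.
    # Passing a list keeps the result a DataFrame instead of a Series.
    return df.loc[[index]]


df = pd.DataFrame(
    {"age": [31, 45, 27], "city": ["Oslo", "Lima", "Pune"]}, index=[10, 11, 12]
)
record = get_target_record_sketch(df, 11)
print(record)
```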
- lnb.data_prep.load_data(path_to_data: str, path_to_metadata: str, cols_to_select: list = ['all'])
- lnb.data_prep.merge_datasets(df_secant: DataFrame) -> DataFrame
Merges the list of datasets given as input into a single global dataset.
- lnb.data_prep.normalize_cont_cols(df: DataFrame, meta_data: list, df_aux: DataFrame, types: tuple = ('Float',)) -> DataFrame
- lnb.data_prep.read_data(data_path: str, categorical_cols: list, continuous_cols: list) -> DataFrame
Reads the CSV file at data_path and returns a pandas DataFrame. If all columns are categorical, ensures that all column values are strings.
- lnb.data_prep.read_metadata(metadata_path: str) -> tuple
Reads metadata from a JSON file (required by the reprosyn generators).
- lnb.data_prep.select_columns(df: DataFrame, categorical_cols: list, continuous_cols: list, cols_to_select: list, meta_data_og: list) -> tuple
- lnb.data_prep.split_data(df: DataFrame, path_to_ids: str)
lnb.distance module
lnb.feature_extractors module
lnb.generators module
lnb.mia module
lnb.plots module
lnb.shadow_data module
lnb.utils module
- lnb.utils.blockPrint()
- lnb.utils.enablePrint()
- lnb.utils.ignore_depreciation()
- async lnb.utils.save_metrics_to_file(file_path, data)
- lnb.utils.str2bool(s)
- lnb.utils.str2list(s)