Source: factory.core/ObjMLDatasets.py
Manages the training and testing data for the machine learning models.
| Method | Signature | Description |
|---|---|---|
| get_dataset_config | get_dataset_config(dataset_name: str) | Returns the configuration for a given dataset. |
| download_data | download_data(dataset_name: str) | Downloads a dataset and stores it in local.documents/ml/{dataset_name}. |
| load_data_from_url | load_data_from_url(dataset_name: str, data_url: str, column_names: list, doc_url: str = None, sep: str = ',') | Loads a dataset from a URL. |
| load_data_from_kaggle | load_data_from_kaggle(dataset_name: str, kaggle_dataset: str, kaggle_file: str, column_names: list = None, sep: str = ',') | Loads a dataset from Kaggle using the Kaggle API. |
| load_data_from_file | load_data_from_file(file_path: str, column_names: list = None, sep: str = ',') | Loads a dataset from a local file. |
| extract_features | extract_features(dataset_name: str) | Extracts features from a specified dataset. |
| display_unique_values | display_unique_values(dataset_name: str, column_name: str) | Displays unique values for a specified column in a dataset. |
| list_featurestore | list_featurestore(package: str = None, dataset_name: str = None) | Lists all features stored in the def_featurestore table. |
| list_model_evaluations | list_model_evaluations(package: str = None, dataset_name: str = None) | Lists all model evaluations stored in the def_model_evaluation table. |
| build_scorecard | build_scorecard(dataset_name: str, scorecard_name: str, pdo: int = 20, base_odds: float = 50.0, base_score: int = 500) | Builds a credit scorecard for a given dataset. |
| train_cost_sensitive_model | train_cost_sensitive_model(dataset_name: str, model_type: str = 'LogisticRegression', class_weights: str = '{1: 1, 0: 5}', test_size: float = 0.3, random_state: int = 42) | Trains a cost-sensitive classification model for a given dataset. |
| generate_model_plots | generate_model_plots(trained_model: Any, X_test: pd.DataFrame, y_test: pd.Series, y_pred: np.ndarray, dataset_name: str, model_name: str) | Generates and saves plots illustrating the classification model's performance. |
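
The `pdo`, `base_odds`, and `base_score` parameters of `build_scorecard` match the names used in the conventional "points to double the odds" scorecard scaling. The source does not show how `build_scorecard` applies them, so the following is only a sketch of that conventional scaling. The helper names `scorecard_scaling` and `odds_to_score` are hypothetical, and treating `base_odds` as good:bad odds is an assumption.

```python
import math


def scorecard_scaling(pdo=20, base_odds=50.0, base_score=500):
    """Return (factor, offset) for the conventional scorecard scaling.

    Assumption: score = offset + factor * ln(odds), where the score rises
    by `pdo` points each time the odds double and equals `base_score`
    when the odds equal `base_odds`.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def odds_to_score(odds, pdo=20, base_odds=50.0, base_score=500):
    """Map odds (assumed good:bad) to a score under the scaling above."""
    factor, offset = scorecard_scaling(pdo, base_odds, base_score)
    return offset + factor * math.log(odds)


if __name__ == "__main__":
    # With the defaults from the signature, odds of 50 map to about 500
    # and odds of 100 map to about 520 (one pdo higher).
    print(round(odds_to_score(50.0), 6))
    print(round(odds_to_score(100.0), 6))
```

Under this scaling, the defaults in the signature would place odds of 50 at a score of about 500, with each doubling of the odds adding 20 points.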
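
`train_cost_sensitive_model` takes `class_weights` as a string (default `'{1: 1, 0: 5}'`) rather than a dict. The source does not say how the string is parsed, so the sketch below shows one safe way a caller could turn such a string into a mapping. The helper name `parse_class_weights` is hypothetical, and the use of `ast.literal_eval` is an assumption, not the module's confirmed behavior.

```python
import ast


def parse_class_weights(class_weights: str) -> dict:
    """Parse a string such as '{1: 1, 0: 5}' into {label: weight}.

    Assumption: the string is a Python dict literal. ast.literal_eval
    only evaluates literals, so it will not execute arbitrary code.
    """
    parsed = ast.literal_eval(class_weights)
    if not isinstance(parsed, dict):
        raise ValueError("class_weights must be a dict literal")
    # Normalise labels to int and weights to float.
    return {int(label): float(weight) for label, weight in parsed.items()}


if __name__ == "__main__":
    # The default from the signature weights class 0 five times
    # as heavily as class 1.
    print(parse_class_weights("{1: 1, 0: 5}"))
```

A mapping in this form is the kind of value scikit-learn classifiers such as `LogisticRegression` accept through their `class_weight` parameter; whether `train_cost_sensitive_model` passes it through that way is not stated in the source.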