
NOTICE: All information contained herein is, and remains, the property of TechnoCore. The intellectual and technical concepts contained herein are proprietary to TechnoCore, and dissemination of this information or reproduction of this material is strictly forbidden unless prior written permission is obtained from TechnoCore.
The ObjML class provides a foundational framework for machine learning operations within the system. It encapsulates functionalities for data preprocessing, model training, prediction, and evaluation, with a focus on integrating seamlessly with various data sources and machine learning algorithms.
The ObjML class supports cost-sensitive classification, particularly through its train_cost_sensitive_classifier method. This is crucial in scenarios where the cost of misclassifying one class is significantly higher than misclassifying another (e.g., in fraud detection, medical diagnosis).
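As a quick illustration of why this matters, the total business cost of a classifier can be computed from its confusion matrix and a cost matrix. The numbers below are hypothetical and only meant to show the arithmetic:

```python
import numpy as np

# Rows = actual class, columns = predicted class (0 = good, 1 = bad).
# Hypothetical confusion matrix for 1000 loan applications.
confusion = np.array([[600, 100],   # goods: 600 correctly accepted, 100 wrongly rejected
                      [ 50, 250]])  # bads: 50 wrongly accepted, 250 correctly rejected

# Cost matrix: rejecting a good applicant costs 1, accepting a bad one costs 5.
costs = np.array([[0, 1],
                  [5, 0]])

total_cost = (confusion * costs).sum()  # 100 * 1 + 50 * 5 = 350
print(f"Total misclassification cost: {total_cost}")
```

A cost-sensitive classifier trades some raw accuracy for a lower total cost of this kind.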
Logistic Regression (sklearn.linear_model.LogisticRegression)

Logistic Regression is a linear model used for binary classification. Despite its name, it is a classification algorithm rather than a regression algorithm. It models the probability that a given input belongs to a particular class.
Role in Cost-Sensitive Classification:
In cost-sensitive scenarios, LogisticRegression can be configured using the class_weight parameter. This parameter assigns different weights to classes, effectively penalizing misclassifications of certain classes more heavily during the model training process. This helps the model to prioritize correctly classifying the more "costly" class.
Key Parameters for Cost-Sensitive Use:
- class_weight: A dictionary mapping class labels to penalty weights. For example, {0: 1, 1: 5} means that misclassifying class 1 is 5 times more costly than misclassifying class 0.
- solver: The 'liblinear' solver is a good choice when class_weight is used, as it supports L1 and L2 regularization.
- max_iter: The maximum number of solver iterations; increase it (e.g., to 1000) if the model does not converge.
- random_state: Seed for reproducible results.
Example Usage (within ObjML.train_cost_sensitive_classifier):
```python
# Example of how LogisticRegression is instantiated within ObjML
model = LogisticRegression(
    class_weight={1: 1, 0: 5},  # Example: class 0 is 5 times more costly to misclassify
    max_iter=1000,
    solver='liblinear',
    random_state=42
)
model.fit(X_train, y_train)
```
This configuration guides the LogisticRegression model to be more sensitive to errors on class 0 (e.g., "bad credit" in the German credit dataset, which was mapped from original class 2), thereby aligning the model's predictions with the business costs of misclassification.
Random Forest (sklearn.ensemble.RandomForestClassifier)

Random Forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output is the class selected by most trees.
Role in Cost-Sensitive Classification:
Similar to Logistic Regression, RandomForestClassifier can be made cost-sensitive through its class_weight parameter. This allows the model to give more importance to samples from certain classes, which is particularly useful for imbalanced datasets or when misclassification costs differ.
Key Parameters for Cost-Sensitive Use:
- class_weight: As with LogisticRegression, a dictionary such as {0: 1, 1: 5} means that misclassifying class 1 is 5 times more costly than misclassifying class 0.
- n_estimators: The number of trees in the forest.
- random_state: Seed for reproducible results.
Example Usage (within ObjML.train_cost_sensitive_classifier):
```python
# Example of how RandomForestClassifier is instantiated within ObjML
model = RandomForestClassifier(
    class_weight={1: 1, 0: 5},  # Example: class 0 is 5 times more costly
    random_state=42
)
model.fit(X_train, y_train)
```
Gradient Boosting (sklearn.ensemble.GradientBoostingClassifier)

Gradient Boosting is a powerful ensemble technique that builds a model in a stage-wise fashion, typically using decision trees as weak learners. It iteratively improves by fitting new models to the residuals of previous models.
Role in Cost-Sensitive Classification:
GradientBoostingClassifier does not have a direct class_weight parameter in its constructor. Instead, cost-sensitivity is typically introduced by providing sample_weight to the fit method. By assigning higher weights to samples from more costly classes, the algorithm focuses more on correctly predicting those samples.
Key Parameters for Cost-Sensitive Use:
- sample_weight (passed to the fit method): When sample_weight is derived from class_weight (e.g., samples from class 0 get a weight of 5, while samples from class 1 get a weight of 1), the boosting algorithm will prioritize reducing errors on the more costly samples.
- n_estimators: The number of boosting stages to perform.
- learning_rate: Shrinks the contribution of each tree; a smaller learning rate requires more n_estimators but can lead to more robust models.
- random_state: Seed for reproducible results.
Example Usage (within ObjML.train_cost_sensitive_classifier):
```python
# Example of how GradientBoostingClassifier is instantiated and fitted within ObjML
model = GradientBoostingClassifier(random_state=42)
# sample_weights would be computed based on class_weights and y_train
model.fit(X_train, y_train, sample_weight=sample_weights)
```
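For illustration, here is one way such per-sample weights can be derived from a class-weight mapping. This is a minimal sketch with stand-in labels; the actual derivation inside ObjML.train_cost_sensitive_classifier may differ:

```python
import numpy as np

class_weights = {0: 5, 1: 1}         # class 0 is 5 times more costly
y_train = np.array([0, 1, 1, 0, 1])  # stand-in labels for illustration

# Each sample receives the weight of its class.
sample_weights = np.array([class_weights[label] for label in y_train])
print(sample_weights)  # [5 1 1 5 1]
```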
Multi-layer Perceptron (sklearn.neural_network.MLPClassifier)

A Multi-layer Perceptron (MLP) is a type of artificial neural network. It consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. MLPs are capable of learning non-linear functions and are widely used for classification tasks.
Role in Cost-Sensitive Classification:
Unlike GradientBoostingClassifier, MLPClassifier accepts neither a class_weight parameter in its constructor nor a sample_weight argument in its fit method. Cost-sensitivity therefore has to be introduced before training, typically by oversampling: repeating samples from the more costly classes in proportion to their class weights, so that the network's optimization process (backpropagation) spends more of its effort minimizing errors on those samples.
Key Parameters for Cost-Sensitive Use:
- Oversampling (applied before fit): When the training set is resampled according to class_weight, the network's weights and biases are adjusted more strongly by the errors made on the more costly samples.
- hidden_layer_sizes: The number of neurons in each hidden layer, e.g., (100,) for a single hidden layer of 100 neurons.
- max_iter: The maximum number of training iterations.
- early_stopping: Stops training when the validation score stops improving.
- random_state: Seed for reproducible results; also controls the validation split when early_stopping is true.

Example Usage (within ObjML.train_cost_sensitive_classifier):
```python
import numpy as np

# Example of how MLPClassifier is instantiated and fitted within ObjML
model = MLPClassifier(
    hidden_layer_sizes=(100,),
    max_iter=500,
    random_state=42,
    early_stopping=True
)
# MLPClassifier.fit does not accept sample_weight, so each sample is
# repeated in proportion to its (integer) class weight instead
repeats = np.array([class_weights[label] for label in y_train])
model.fit(np.repeat(X_train, repeats, axis=0), np.repeat(y_train, repeats))
```
Weight of Evidence (WOE) and Information Value (IV) are statistical techniques used in credit scoring and risk modeling to measure how well a feature separates "good" outcomes from "bad" outcomes and to transform categorical variables into numeric inputs suitable for linear models.
WOE measures the strength of a categorical variable in separating "good" outcomes from "bad" outcomes.
Formula:

WOE = ln(% of Goods / % of Bads)

Where:
- % of Goods: the category's share of all good outcomes
- % of Bads: the category's share of all bad outcomes

Interpretation:
- WOE > 0: the category contains proportionally more goods than bads
- WOE < 0: the category contains proportionally more bads than goods
- WOE ≈ 0: the category barely separates goods from bads
Example:
Suppose we're analyzing "Employment Status" for credit default prediction:
| Employment Status | Total | Bad (Default) | Good (No Default) | % Bad | % Good | WOE |
|---|---|---|---|---|---|---|
| Full-time | 500 | 50 | 450 | 25% | 56.25% | ln(56.25/25) = 0.81 |
| Part-time | 200 | 80 | 120 | 40% | 15% | ln(15/40) = -0.98 |
| Unemployed | 300 | 70 | 230 | 35% | 28.75% | ln(28.75/35) = -0.20 |
In this example:
- Full-time workers are over-represented among goods (positive WOE)
- Part-time workers are strongly over-represented among bads (negative WOE)
- Unemployed applicants are mildly over-represented among bads (slightly negative WOE)
IV measures the overall predictive power of a feature. It aggregates WOE across all categories.
Formula:
IV = Σ [(% of Goods - % of Bads) × WOE]
Interpretation Rules:
| IV Range | Predictive Power |
|---|---|
| < 0.02 | Useless |
| 0.02 - 0.1 | Weak |
| 0.1 - 0.3 | Medium |
| 0.3 - 0.5 | Strong |
| > 0.5 | Suspicious (may be overfitting) |
Example Calculation:
Using the employment status example above:
| Category | % Good - % Bad | WOE | IV Contribution |
|---|---|---|---|
| Full-time | 0.5625 - 0.25 = 0.3125 | 0.81 | 0.3125 × 0.81 = 0.253 |
| Part-time | 0.15 - 0.40 = -0.25 | -0.98 | -0.25 × (-0.98) = 0.245 |
| Unemployed | 0.2875 - 0.35 = -0.0625 | -0.20 | -0.0625 × (-0.20) = 0.0125 |
Total IV = 0.253 + 0.245 + 0.0125 = 0.51 → Strong predictor (but borderline suspicious)
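The numbers above can be reproduced with a few lines of pandas. This is a standalone sketch, independent of ObjML's calculate_woe_iv_multiclass:

```python
import numpy as np
import pandas as pd

# Counts from the employment-status example
tbl = pd.DataFrame({
    "category": ["Full-time", "Part-time", "Unemployed"],
    "good": [450, 120, 230],
    "bad": [50, 80, 70],
})

pct_good = tbl["good"] / tbl["good"].sum()  # distribution of goods per category
pct_bad = tbl["bad"] / tbl["bad"].sum()     # distribution of bads per category

tbl["woe"] = np.log(pct_good / pct_bad)
iv = ((pct_good - pct_bad) * tbl["woe"]).sum()

print(tbl.round(2))
print(f"IV = {iv:.2f}")  # ≈ 0.51, matching the table above
```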
For multi-class targets (e.g., Credit Score: Good, Standard, Poor), we use a one-vs-rest approach:
```python
# Example: Calculate WOE/IV for "Good" vs. "Standard + Poor"
obj_ml = ObjML()
woe_map_good, iv_good = obj_ml.calculate_woe_iv_multiclass(
    df=df,
    feature='employment_status',
    target='credit_score',
    target_class='Good'
)

# Repeat for each class
woe_map_standard, iv_standard = obj_ml.calculate_woe_iv_multiclass(
    df=df,
    feature='employment_status',
    target='credit_score',
    target_class='Standard'
)
```
A credit scorecard translates model coefficients into easy-to-interpret point values. Each feature contributes a certain number of points, and the total score determines the credit decision.
- PDO (Points to Double the Odds): how many points it takes to double the odds of being "good"
- Base Score: the score assigned at the base odds
- Base Odds: the odds of being "good" at the base score
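The scaling behind scorecard_scaling is presumably the standard PDO formulation; the arithmetic below is shown from first principles, not taken from ObjML's source:

```python
import numpy as np

pdo, base_odds, base_score = 20, 50.0, 500

factor = pdo / np.log(2)                          # ≈ 28.85 points per unit of log-odds
offset = base_score - factor * np.log(base_odds)  # ≈ 387.1

# A logistic regression coefficient b for feature x contributes
# factor * b * x points; the intercept contributes offset + factor * b0.
log_odds = 4.2  # hypothetical model output: ln(odds of being "good")
score = offset + factor * log_odds
print(round(score))  # ≈ 508
```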
```python
from sklearn.linear_model import LogisticRegression

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Create scorecard
obj_ml = ObjML()
scorecard = obj_ml.scorecard_scaling(
    model=model,
    pdo=20,
    base_odds=50.0,
    base_score=500
)
# Result: {'Intercept': 485.2, 'age': 12.5, 'income': 8.3, ...}
```
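Applying the resulting dictionary to a single applicant is then just a weighted sum. The applicant values below are hypothetical, reusing the illustrative point values from the comment above:

```python
scorecard = {'Intercept': 485.2, 'age': 12.5, 'income': 8.3}
applicant = {'age': 0.4, 'income': -1.1}  # transformed feature values

score = scorecard['Intercept'] + sum(
    points * applicant[feat]
    for feat, points in scorecard.items() if feat != 'Intercept'
)
print(round(score, 1))  # 485.2 + 12.5*0.4 + 8.3*(-1.1) = 481.1
```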
Multi-class scorecards output separate scores for each class, allowing ranking (e.g., "Good", "Standard", "Poor").
This example demonstrates building a 3-class scorecard using the Credit Score Classification dataset (100,000 samples with classes: Good, Standard, Poor).
```python
import sys
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Add paths
base_path = os.getcwd()
paths = ["", "/factory.core", "/factory.service", "/factory.learn"]
for relative_path in paths:
    if (base_path + relative_path) not in sys.path:
        sys.path.append(base_path + relative_path)

from ObjMLDatasets import ObjMLDatasets

# Initialize and download dataset
obj_datasets = ObjMLDatasets(0)
obj_datasets.download_data(dataset_name="credit_score_classification")

# Load dataset using Kaggle API
df = obj_datasets.load_data_from_kaggle(
    dataset_name="credit_score_classification",
    kaggle_dataset="parisrohan/credit-score-classification",
    kaggle_file="train.csv"
)

print(f"Dataset shape: {df.shape}")
print(f"Target distribution:\n{df['Credit_Score'].value_counts()}")
```
```python
from ObjML import ObjML

obj_ml = ObjML()

# Select categorical features for WOE transformation
categorical_features = ['Occupation', 'Credit_Mix', 'Payment_of_Min_Amount',
                        'Payment_Behaviour']
target_column = 'Credit_Score'

# Calculate WOE for each class using one-vs-rest
classes = ['Good', 'Standard', 'Poor']
woe_maps = {}

for class_label in classes:
    woe_maps[class_label] = {}
    print(f"\n=== Calculating WOE/IV for class: {class_label} ===")
    for feature in categorical_features:
        woe_map, iv = obj_ml.calculate_woe_iv_multiclass(
            df=df,
            feature=feature,
            target=target_column,
            target_class=class_label
        )
        woe_maps[class_label][feature] = woe_map
        print(f"{feature}: IV = {iv:.4f} ", end="")
        if iv < 0.02:
            print("(Useless)")
        elif iv < 0.1:
            print("(Weak)")
        elif iv < 0.3:
            print("(Medium)")
        elif iv < 0.5:
            print("(Strong)")
        else:
            print("(Very Strong/Suspicious)")
```
```python
# Apply WOE transformation for one class (e.g., 'Good')
df_woe = df.copy()
for feature in categorical_features:
    woe_col_name = f"{feature}_WOE_Good"
    df_woe[woe_col_name] = df_woe[feature].map(woe_maps['Good'][feature])
    df_woe[woe_col_name] = df_woe[woe_col_name].fillna(0)  # Handle unseen categories

# Prepare features
numeric_features = ['Age', 'Annual_Income', 'Monthly_Inhand_Salary',
                    'Num_Bank_Accounts', 'Num_Credit_Card', 'Interest_Rate',
                    'Num_of_Loan', 'Delay_from_due_date', 'Num_of_Delayed_Payment',
                    'Num_Credit_Inquiries', 'Outstanding_Debt',
                    'Credit_Utilization_Ratio', 'Total_EMI_per_month',
                    'Amount_invested_monthly', 'Monthly_Balance']
woe_features = [f"{f}_WOE_Good" for f in categorical_features]
all_features = numeric_features + woe_features

X = df_woe[all_features]
y = df_woe[target_column]

# Handle missing values
X = X.fillna(X.median())

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```
```python
# Train one-vs-rest models
models = {}
for class_label in classes:
    print(f"\nTraining model for class: {class_label}")
    # Create binary target (1 if class_label, 0 otherwise)
    y_binary = (y_train == class_label).astype(int)
    # Train logistic regression
    model = LogisticRegression(
        max_iter=1000,
        solver='liblinear',
        random_state=42
    )
    model.fit(X_train, y_binary)
    models[class_label] = model
    # Evaluate on training set
    train_accuracy = model.score(X_train, y_binary)
    print(f"Training accuracy for {class_label}: {train_accuracy:.4f}")

# Generate scorecard for all classes
scorecard_multiclass = obj_ml.scorecard_scaling_multiclass(
    models=models,
    pdo=20,
    base_odds=50.0,
    base_score=500
)
```
```python
# Display scorecard summary
for class_label, scorecard in scorecard_multiclass.items():
    print(f"\n=== Scorecard for {class_label} ===")
    print(f"Intercept: {scorecard['Intercept']:.2f}")
    print("\nTop 10 influential features:")
    # Sort features by absolute point value
    feature_points = {k: v for k, v in scorecard.items() if k != 'Intercept'}
    sorted_features = sorted(feature_points.items(),
                             key=lambda x: abs(x[1]),
                             reverse=True)[:10]
    for feature, points in sorted_features:
        print(f"  {feature}: {points:+.2f} points")
```
```python
# Apply scorecard to test set
scores = {}
for class_label, scorecard in scorecard_multiclass.items():
    # Start with intercept
    score = np.full(len(X_test), scorecard['Intercept'])
    # Add feature contributions
    for feature, points in scorecard.items():
        if feature != 'Intercept' and feature in X_test.columns:
            score += X_test[feature].values * points
    scores[class_label] = score

# Create DataFrame with scores
scores_df = pd.DataFrame(scores, index=X_test.index)
scores_df['Predicted_Class'] = scores_df.idxmax(axis=1)  # Highest score wins
scores_df['Actual_Class'] = y_test

print("\n=== Sample Predictions ===")
print(scores_df[['Good', 'Standard', 'Poor', 'Predicted_Class', 'Actual_Class']].head(10))

# Calculate accuracy
accuracy = (scores_df['Predicted_Class'] == scores_df['Actual_Class']).mean()
print(f"\nMulti-class scorecard accuracy: {accuracy:.4f}")

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test, scores_df['Predicted_Class'], labels=classes)
print("\n=== Confusion Matrix ===")
print(pd.DataFrame(cm, index=classes, columns=classes))
print("\n=== Classification Report ===")
print(classification_report(y_test, scores_df['Predicted_Class']))
```
```python
# Analyze score distributions by class
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for idx, class_label in enumerate(classes):
    scores_df_class = scores_df[scores_df['Actual_Class'] == class_label]
    axes[idx].hist(scores_df_class['Good'], alpha=0.5, bins=30, label='Good Score')
    axes[idx].hist(scores_df_class['Standard'], alpha=0.5, bins=30, label='Standard Score')
    axes[idx].hist(scores_df_class['Poor'], alpha=0.5, bins=30, label='Poor Score')
    axes[idx].set_title(f'Score Distribution for Actual {class_label}')
    axes[idx].set_xlabel('Score')
    axes[idx].set_ylabel('Frequency')
    axes[idx].legend()

plt.tight_layout()
plt.savefig('multiclass_scorecard_distributions.png')
print("\nScore distribution plot saved as 'multiclass_scorecard_distributions.png'")
```
- Class Separation: Good scorecards show clear separation between score distributions for different classes.
- Feature Importance: Features with larger absolute point values have more influence on the final score.
- Decision Rules: Instead of a binary accept/reject, multi-class scorecards enable tiered decisions such as approve, refer for manual review, or decline, based on which class scores highest.
- Interpretability: Stakeholders can see which features contributed to a score and by how many points each moved it.
- Feature Selection: Only include features with IV > 0.02 (avoid useless predictors).
- WOE Stability: Ensure WOE values are monotonic across ordered bins (consistently increasing or decreasing); see the sketch after this list.
- Sample Size: Each category should have sufficient samples (at least 5% of the total).
- Missing Values: Handle with a separate category or domain-specific imputation.
- Validation: Always validate on out-of-time samples to ensure scorecard stability.
- Documentation: Document the WOE mappings, binning decisions, and scaling parameters (PDO, base score, base odds) used to build the scorecard.
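For the WOE stability check mentioned above, a simple monotonicity test over ordered bins might look like this (a hypothetical helper, not part of ObjML):

```python
import numpy as np

def is_monotonic(woe_values):
    """Return True if WOE values consistently increase or decrease across ordered bins."""
    diffs = np.diff(woe_values)
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))

print(is_monotonic([-0.9, -0.4, 0.1, 0.6]))  # True: strictly increasing
print(is_monotonic([-0.9, 0.3, -0.2, 0.6]))  # False: WOE oscillates across bins
```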
The ObjML class provides robust model persistence functionality, allowing you to save trained models to the database with full metadata tracking, version control, and easy retrieval. This is essential for production deployments, model monitoring, and regulatory compliance.
Models are stored in the def_ml_models table:
```sql
CREATE TABLE def_ml_models (
    model_id VARCHAR(36) NOT NULL,
    model_name VARCHAR(255) NOT NULL,
    version VARCHAR(50) NOT NULL,
    package VARCHAR(255) NOT NULL,
    model_type VARCHAR(100),
    model_binary LONGBLOB NOT NULL,
    encoder_binary LONGBLOB,
    metadata_json JSON,
    feature_names JSON,
    training_metrics JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(255),
    is_active BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (model_id, package),
    UNIQUE KEY unique_name_version (model_name, version, package)
);
```
Note: The package field is automatically populated using self.get_package() and is part of the primary key to support multi-package environments.
```python
from ObjML import ObjML
from sklearn.linear_model import LogisticRegression
import numpy as np

# Train a model
X_train = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y_train = np.array([0, 1, 0, 1])
model = LogisticRegression()
model.fit(X_train, y_train)

# Save to database
obj_ml = ObjML(0)
model_id = obj_ml.save_model_to_db(
    model=model,
    model_name="credit_risk_model",
    version="v1.0",
    feature_names=["age", "income"],
    training_metrics={
        "accuracy": 0.95,
        "precision": 0.93,
        "recall": 0.92,
        "f1_score": 0.93
    }
)
print(f"Model saved with ID: {model_id}")
```
```python
from sklearn.preprocessing import OneHotEncoder

# Create and fit encoder
encoder = OneHotEncoder()
encoder.fit([['A'], ['B'], ['C']])

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Save model with encoder
model_id = obj_ml.save_model_to_db(
    model=model,
    model_name="credit_risk_model_with_encoder",
    version="v1.0",
    feature_names=["category", "amount"],
    training_metrics={"accuracy": 0.90},
    encoder=encoder,
    created_by="data_scientist_1",
    is_active=True  # Mark as active version
)
```
```python
# Save v1.0
model_id_v1 = obj_ml.save_model_to_db(
    model=model_v1,
    model_name="fraud_detection",
    version="v1.0",
    feature_names=feature_list,
    training_metrics={"auc": 0.85}
)

# Later, save v2.0 with improvements
model_id_v2 = obj_ml.save_model_to_db(
    model=model_v2,
    model_name="fraud_detection",
    version="v2.0",
    feature_names=feature_list,
    training_metrics={"auc": 0.92},
    is_active=True  # Make v2.0 the active version
)
```
```python
# Load the most recent version
model, metadata = obj_ml.load_model_from_db(
    model_name="credit_risk_model",
    version="latest"
)

# Make predictions
X_test = np.array([[2, 3], [4, 5]])
predictions = model.predict(X_test)

# Access metadata
print(f"Model ID: {metadata['model_id']}")
print(f"Model Type: {metadata['model_type']}")
print(f"Features: {metadata['feature_names']}")
print(f"Training Metrics: {metadata['training_metrics']}")
print(f"Created: {metadata['created_at']}")
```
```python
# Load a specific version
model, metadata = obj_ml.load_model_from_db(
    model_name="fraud_detection",
    version="v1.0"
)

# Use loaded encoder if available
if metadata['encoder']:
    encoder = metadata['encoder']
    transformed = encoder.transform(categorical_data)
```
```python
def predict_credit_risk(customer_data):
    """Production prediction function."""
    obj_ml = ObjML(0)

    # Load active model
    model, metadata = obj_ml.load_model_from_db(
        model_name="credit_risk_model",
        version="latest"
    )

    # Prepare features in correct order
    features = customer_data[metadata['feature_names']]

    # Apply encoder if present
    if metadata['encoder']:
        features = metadata['encoder'].transform(features)

    # Make prediction
    prediction = model.predict(features)
    probability = model.predict_proba(features)

    return {
        "prediction": prediction[0],
        "probability": probability[0][1],
        "model_version": metadata['model_id'],
        "model_type": metadata['model_type']
    }
```
```python
# Get all models
models = obj_ml.list_models()
for m in models:
    print(f"{m['model_name']} v{m['version']} - {m['model_type']}")
    print(f"  Created: {m['created_at']}")
    print(f"  Features: {m['feature_count']}")
    print(f"  Active: {m['is_active']}")

# Get all versions of a specific model
models = obj_ml.list_models(model_name="fraud_detection")

# Display versions
for m in models:
    status = "ACTIVE" if m['is_active'] else ""
    print(f"  {m['version']} - {m['created_at']} {status}")

# Get 10 most recent models
recent_models = obj_ml.list_models(limit=10)
```
```bash
# Create the models table
python factory.core/ObjML.py create-table

# List all models
python factory.core/ObjML.py list-saved-models

# Filter by name
python factory.core/ObjML.py list-saved-models --model credit_risk_model

# Limit results
python factory.core/ObjML.py list-saved-models --limit 5
```
Note: Package is automatically determined from self.get_package() - no need to specify it.
Output example:
```
Saved ML Models
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Model Name          ┃ Version ┃ Type               ┃ Features ┃ Created             ┃ Active ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ credit_risk_model   │ v2.0    │ LogisticRegression │ 15       │ 2025-12-26 10:30:00 │ ✓      │
│ credit_risk_model   │ v1.0    │ LogisticRegression │ 12       │ 2025-12-25 09:15:00 │        │
│ fraud_detection     │ v1.5    │ RandomForest...    │ 20       │ 2025-12-24 14:20:00 │ ✓      │
└─────────────────────┴─────────┴────────────────────┴──────────┴─────────────────────┴────────┘
Total: 3 model(s)
```
```bash
# Show latest version
python factory.core/ObjML.py show-model credit_risk_model

# Show specific version
python factory.core/ObjML.py show-model fraud_detection --version v1.0
```
Output example:
```
Model: credit_risk_model
Model ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Type: LogisticRegression
Created: 2025-12-26 10:30:00

Metadata:
  training_date: 2025-12-26T10:30:00.123456
  python_version: 3.12.3
  sklearn_version: 1.3.0
  model_class: LogisticRegression

Features (15):
  1. age
  2. income
  3. credit_score
  4. employment_length
  5. debt_to_income
  ... and 10 more

Training Metrics:
  accuracy: 0.95
  precision: 0.93
  recall: 0.92
  f1_score: 0.93
```
```python
from ObjML import ObjML
from ObjMLDatasets import ObjMLDatasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

# 1. Load dataset
datasets = ObjMLDatasets()
df = datasets.load_dataset("german_credit")

# 2. Prepare data
X = df.drop("class", axis=1)
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Train model
obj_ml = ObjML(0)
model, metrics, y_pred, X_test, y_test = obj_ml.train_cost_sensitive_classifier(
    X=X_train,
    y=y_train,
    class_weights={0: 1, 1: 5},
    model_type="LogisticRegression"
)

# 4. Calculate evaluation metrics
y_pred_test = model.predict(X_test)
training_metrics = {
    "accuracy": accuracy_score(y_test, y_pred_test),
    "precision": precision_score(y_test, y_pred_test),
    "recall": recall_score(y_test, y_pred_test),
    "confusion_matrix": metrics["confusion_matrix"]
}

# 5. Save model
model_id = obj_ml.save_model_to_db(
    model=model,
    model_name="german_credit_model",
    version="v1.0",
    feature_names=list(X_train.columns),
    training_metrics=training_metrics,
    created_by="ml_engineer",
    is_active=True
)
print(f"Model saved: {model_id}")

# 6. Later: Load and use model
loaded_model, metadata = obj_ml.load_model_from_db(
    model_name="german_credit_model",
    version="latest"
)

# Make predictions with loaded model
new_predictions = loaded_model.predict(X_test)
print(f"Predictions made using model {metadata['model_id']}")
```
- Version Naming: Use semantic versioning (v1.0, v1.1, v2.0) or date-based versioning (v2025-12-26).
- Metadata Tracking: Always include comprehensive training metrics for model comparison.
- Feature Consistency: Store feature names to ensure correct order during prediction.
- Encoder Management: Save encoders with models to maintain preprocessing consistency.
- Active Flag: Use is_active=True for production models.
- Model Documentation: Document the training data, hyperparameters, intended use, and known limitations of each model.
- Model Monitoring: Regularly check prediction drift, input feature distributions (e.g., via PSI, sketched below), and performance against recent outcomes.
- Rollback Strategy: Keep previous model versions for quick rollback if needed.
- Testing: Always test loaded models on validation data before production deployment.
- Security: Limit database write access to authorized users only.
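For the monitoring item above, the Population Stability Index (PSI) compares a baseline score distribution with a recent one. A hypothetical helper (not part of ObjML) could look like this:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between baseline (expected) and recent (actual) score samples.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift."""
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf                 # cover the full range
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    e = np.clip(e, 1e-6, None)                            # avoid division by zero
    a = np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

# Example with synthetic scores: the recent population has drifted upward.
rng = np.random.default_rng(42)
baseline = rng.normal(500, 50, 10_000)
recent = rng.normal(520, 50, 10_000)
print(f"PSI = {population_stability_index(baseline, recent):.3f}")
```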
```python
try:
    model, metadata = obj_ml.load_model_from_db(
        model_name="my_model",
        version="v1.0"
    )
except ValueError as e:
    print(f"Model not found: {e}")
    # List available models
    models = obj_ml.list_models(model_name="my_model")
    print(f"Available versions: {[m['version'] for m in models]}")
```
If you get warnings about sklearn version mismatch:
```python
import sklearn

model, metadata = obj_ml.load_model_from_db(model_name="my_model")
print(f"Model trained with sklearn {metadata['metadata']['sklearn_version']}")
print(f"Current sklearn version: {sklearn.__version__}")

# Models are generally compatible across minor versions
# For major version differences, consider retraining
```
For very large models (>100MB):
```python
# Consider:
# 1. Model compression techniques
# 2. Feature selection to reduce model complexity
# 3. Increasing MySQL max_allowed_packet setting
# 4. Using separate file storage with database references
```
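As one example of point 1, gzip-compressing the pickled model before it is written to the BLOB column can reduce storage substantially. This is a sketch only; whether ObjML already compresses internally is not assumed here:

```python
import gzip
import pickle

from sklearn.linear_model import LogisticRegression

# A small stand-in model; real savings show up on large ensembles.
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

raw = pickle.dumps(model)
blob = gzip.compress(raw)                       # store this in the LONGBLOB column
print(f"{len(raw)} -> {len(blob)} bytes")

restored = pickle.loads(gzip.decompress(blob))  # inverse operation on load
```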
Note: When implementing credit scorecards in production, always consult with legal and compliance teams to ensure adherence to applicable requirements such as fair lending regulations, model risk management guidance (e.g., SR 11-7), and internal model governance policies. Model documentation, validation, and ongoing monitoring are critical for regulatory compliance and ethical AI practices.