
NOTICE: All information contained herein is, and remains
the property of TechnoCore.
The intellectual and technical concepts contained
herein are proprietary to TechnoCore and dissemination of this information or reproduction of this material
is strictly forbidden unless prior written permission is obtained
from TechnoCore.
ObjFeatureStoreEdit
ObjFeatureStoreEdit extends ObjFeatureStore with all feature editing, management, persistence, analytics, and YAML import/export capabilities. It provides the full set of CRUD operations for feature definitions, versioning for A/B testing, lineage tracking, usage analytics, quality assessment, documentation generation, and declarative YAML workflows.
Inheriting from ObjFeatureStore, ObjFeatureStoreEdit owns all write/management/persistence operations while the parent handles runtime computation. SQL queries used by this class live in ObjFeatureStoreEdit.yaml.
The parent class (ObjFeatureStore) contains no-op stubs for _store_computed_sql, _track_feature_lineage, and _insert_lineage. When compute methods are invoked via an ObjFeatureStoreEdit instance, the real implementations in this class are used, enabling lineage tracking and SQL persistence automatically. When invoked via a bare ObjFeatureStore, these operations are silently skipped.
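The stub-and-override arrangement described above can be illustrated with a minimal sketch. It is not the actual implementation: `BaseStore`, `EditStore`, `compute`, and the `lineage` list are hypothetical names. Only the idea that the parent's `_track_feature_lineage` does nothing while the subclass records lineage is taken from the text.

```python
class BaseStore:
    """Hypothetical stand-in for the parent class (runtime computation only)."""

    def _track_feature_lineage(self, feature, source_table, source_column):
        # No-op stub: a bare parent instance silently skips lineage tracking.
        return None

    def compute(self, feature, source_table, source_column):
        # The compute path always calls the hook; what happens next depends
        # on the class of the instance it was invoked on.
        self._track_feature_lineage(feature, source_table, source_column)
        return f"computed {feature}"


class EditStore(BaseStore):
    """Hypothetical stand-in for the editing subclass."""

    def __init__(self):
        self.lineage = []

    def _track_feature_lineage(self, feature, source_table, source_column):
        # Real implementation: record where the feature came from.
        self.lineage.append((feature, source_table, source_column))


base = BaseStore()
edit = EditStore()
base.compute("age_group", "customers", "age")  # nothing is recorded
edit.compute("age_group", "customers", "age")  # lineage is recorded
print(edit.lineage)
```

The parent never needs to know whether a subclass exists, so the same compute code serves both the runtime-only and the editing use.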
The YAML format captures all aspects of a feature set:
feature_code: customer_features
package: prod
module: analytics
source_query: customers
source_query_pk: customer_id
target_table: feature_customer
features:
  - feature: age_group
    feature_type: SQL_CASE
    feature_definition: "CASE WHEN age < 30 THEN 'young' WHEN age < 50 THEN 'middle' ELSE 'senior' END"
    feature_data_type: VARCHAR
    notes: "Customer age segmentation"
  - feature: total_orders
    feature_type: AGGREGATE
    feature_definition: "COUNT(orders.order_id)|orders|customer_id"
    feature_data_type: INT
    notes: "Total number of orders per customer"
  - feature: lifetime_value
    feature_type: SYMBOLIC
    feature_definition: "total_orders * avg_order_value"
    feature_data_type: FLOAT
    notes: "Estimated customer lifetime value"
  - feature: credit_limit
    feature_type: DIRECT_MAP
    feature_definition: "$max_credit$"
    feature_data_type: FLOAT
    notes: "Credit limit from customer config"
    input_table: "customer_config"
    input_table_pk: "customer_id"
    input_table_null_strategy: "USE_ZERO"
  - feature: total_with_tax
    feature_type: SYMBOLIC
    feature_definition: "st.subtotal * (1 + $tax_rate$)"
    feature_data_type: FLOAT
    notes: "Total with tax rate applied"
    input_table: "global_config"
    input_table_constants: '{"tax_rate": "TaxRate"}'
dependencies:
  - feature: lifetime_value
    depends_on: total_orders
expectations:
  - feature: age_group
    expectation_name: valid_categories
    expectation_config:
      expectation_type: expect_column_values_to_be_in_set
      column: age_group
      value_set: [young, middle, senior]
    severity: ERROR
    notes: "Age group must be one of three valid categories"
  - feature: total_orders
    expectation_name: non_negative_orders
    expectation_config:
      expectation_type: expect_column_values_to_be_between
      column: total_orders
      min_value: 0
      max_value: 10000
    severity: ERROR
    notes: "Orders must be non-negative and reasonable"
add_feature_entry(feature_code, package, module, source_query, source_query_pk, target_table)
Adds or updates a feature set entry in def_feature. This defines the source data and target table for a collection of related features.
Parameters:
- feature_code: Unique identifier for the feature set
- package: Package context (e.g., "prod", "test")
- module: Module name for organization
- source_query: Source table or query name
- source_query_pk: Primary key column(s), comma-separated for composite keys
- target_table: Optional target table name (defaults to feature_{feature_code})
Example:
editor = ObjFeatureStoreEdit()
editor.add_feature_entry(
"customer_features",
"prod",
"analytics",
"customers",
"customer_id",
"feature_customer"
)
add_feature_definition(feature_code, package, feature, feature_type, feature_definition, feature_data_type, notes, input_table, input_table_pk, input_table_null_strategy, input_table_constants)
Adds or updates an individual feature definition in def_features.
Feature Types:
- DIRECT_MAP: Direct column mapping from source table
- SQL_CASE: SQL CASE/IF expressions
- SYMBOLIC: Mathematical expressions with column references
- AGGREGATE: Aggregations from related tables (SUM, COUNT, AVG, etc.)
- WINDOW: Window functions (ROW_NUMBER, RANK, LAG, etc.)
- JSON_EXTRACT: Extract values from JSON columns
Input Table Parameters (Optional):
- input_table: Name of additional table to LEFT JOIN for enrichment
- input_table_pk: Join key column (defaults to source PK if empty)
- input_table_null_strategy: How to handle NULLs: KEEP_NULL, USE_ZERO, or SKIP_COMPUTE (default: KEEP_NULL)
- input_table_constants: JSON string mapping constant placeholders to input table columns
Basic Example:
editor.add_feature_definition(
"customer_features",
"prod",
"total_orders",
FeatureTypeEnum.AGGREGATE,
"COUNT(orders.order_id)|orders|customer_id",
FeatureDataTypeEnum.INT,
"Total number of orders per customer"
)
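The AGGREGATE definition above packs several pieces into one pipe-delimited string. The sketch below splits such a string into named parts. Reading the three fields as expression, related table, and join key is an assumption drawn from the example value only, and `AggregateSpec` and `parse_aggregate_definition` are hypothetical helpers, not part of ObjFeatureStoreEdit.

```python
from typing import NamedTuple


class AggregateSpec(NamedTuple):
    expression: str     # e.g. COUNT(orders.order_id)
    related_table: str  # e.g. orders (assumed meaning of the second field)
    join_key: str       # e.g. customer_id (assumed meaning of the third field)


def parse_aggregate_definition(definition: str) -> AggregateSpec:
    """Split an AGGREGATE definition into its three assumed parts."""
    parts = [part.strip() for part in definition.split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected 'expression|table|join_key', got: {definition!r}")
    return AggregateSpec(*parts)


spec = parse_aggregate_definition("COUNT(orders.order_id)|orders|customer_id")
print(spec.related_table, spec.join_key)
```

A check of this kind can catch a malformed definition before it is stored, rather than when the feature is computed.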
Input Table Example:
import json
# Using placeholders for row-level input table columns
editor.add_feature_definition(
"customer_features",
"prod",
"credit_limit",
FeatureTypeEnum.DIRECT_MAP,
"$max_credit$", # Placeholder for input table column
FeatureDataTypeEnum.FLOAT,
notes="Customer credit limit from config",
input_table="customer_config",
input_table_pk="customer_id"
)
# Using constants for global configuration
editor.add_feature_definition(
"customer_features",
"prod",
"total_with_tax",
FeatureTypeEnum.SYMBOLIC,
"st.subtotal * (1 + $tax_rate$)",
FeatureDataTypeEnum.FLOAT,
notes="Total with tax applied",
input_table="global_config",
input_table_constants=json.dumps({
"tax_rate": "TaxRate" # Maps $tax_rate$ to global_config.TaxRate
})
)
See ObjFeatureStore.md "Input Table Feature" section for comprehensive documentation on input tables, placeholders, constants, and performance optimization.
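To make the placeholder convention concrete, the following sketch rewrites `$name$` placeholders into qualified column references. How the library resolves placeholders internally is not described here, so treat this as an assumption-laden illustration: `resolve_placeholders` is a hypothetical helper, and the rule that an unmapped placeholder names the input table column directly is inferred from the `$max_credit$` example.

```python
import json
import re

PLACEHOLDER = re.compile(r"\$(\w+)\$")


def resolve_placeholders(definition, input_table, constants_json=""):
    """Replace each $name$ with a qualified input-table column reference."""
    constants = json.loads(constants_json) if constants_json else {}

    def substitute(match):
        name = match.group(1)
        # A constants mapping wins; otherwise assume the placeholder is
        # itself the column name in the input table.
        column = constants.get(name, name)
        return f"{input_table}.{column}"

    return PLACEHOLDER.sub(substitute, definition)


row_level = resolve_placeholders("$max_credit$", "customer_config")
with_constant = resolve_placeholders(
    "st.subtotal * (1 + $tax_rate$)",
    "global_config",
    json.dumps({"tax_rate": "TaxRate"}),
)
print(row_level)
print(with_constant)
```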
add_expectation_definition(feature_code, package, feature, expectation_name, expectation_config_json, severity, notes)
Adds a Great Expectations validation rule for a feature.
Example:
editor.add_expectation_definition(
"customer_features",
"prod",
"total_orders",
"min_orders_check",
'{"expectation_type": "expect_column_values_to_be_between", "column": "total_orders", "min_value": 0, "max_value": 1000}',
severity="ERROR",
notes="Orders must be non-negative and reasonable"
)
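Hand-written JSON strings are easy to get wrong (quoting, trailing commas). One option is to build the configuration as a dictionary and serialize it with `json.dumps` before passing it as `expectation_config_json`. The sketch below reuses the values from the example above, and the variable names are illustrative.

```python
import json

expectation_config = {
    "expectation_type": "expect_column_values_to_be_between",
    "column": "total_orders",
    "min_value": 0,
    "max_value": 1000,
}

# Serialize once; the result is the string passed as expectation_config_json.
expectation_config_json = json.dumps(expectation_config)

# Round-trip check: the string parses back to the same structure.
assert json.loads(expectation_config_json) == expectation_config
print(expectation_config_json)
```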
add_feature_dependency(feature_code, package, feature, depends_on_feature)
Defines that one feature must be computed before another.
Example:
editor.add_feature_dependency(
"customer_features",
"prod",
"lifetime_value",
"total_orders"
)
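Declared dependencies imply a computation order. The sketch below shows one way to derive such an order with the standard library's `graphlib`. It does not describe how ObjFeatureStore orders features internally, and `compute_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def compute_order(dependencies):
    """Return features ordered so that each one follows its prerequisites.

    `dependencies` maps a feature to the set of features it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"Circular feature dependency: {exc.args[1]}") from exc


order = compute_order({
    "lifetime_value": {"total_orders"},
    "total_orders": set(),
    "age_group": set(),
})
print(order)
```

Features without a dependency relationship may appear in any relative order. Only the constraint that `total_orders` precedes `lifetime_value` is guaranteed.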
create_feature_version(feature_code, package, created_by="", notes="")
Creates a versioned snapshot of all feature definitions as JSON. Returns the new version number.
Example:
version = editor.create_feature_version(
"customer_features",
"prod",
created_by="data_scientist@company.com",
notes="Added lifetime value features for churn model v2"
)
print(f"Created version {version}")
activate_version(feature_code, package, version)
Activates a specific feature version. Only one version can be active at a time.
get_active_version(feature_code, package)
Returns the currently active version number.
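The snapshot and activation semantics described above (JSON snapshots, increasing version numbers, a single active version) can be modelled in memory as follows. `VersionRegistry` is a hypothetical stand-in for illustration only. The real methods persist to the database, and the sequential numbering shown here is an assumption.

```python
import json


class VersionRegistry:
    """In-memory model of versioned feature snapshots (hypothetical)."""

    def __init__(self):
        self._snapshots = {}  # version number -> JSON snapshot
        self._active = None   # at most one version is active

    def create_version(self, definitions, notes=""):
        version = len(self._snapshots) + 1
        payload = {"features": definitions, "notes": notes}
        self._snapshots[version] = json.dumps(payload, sort_keys=True)
        return version

    def activate(self, version):
        if version not in self._snapshots:
            raise KeyError(f"Unknown version: {version}")
        # Assigning here implicitly deactivates the previous version.
        self._active = version

    def active_version(self):
        return self._active


registry = VersionRegistry()
v1 = registry.create_version({"age_group": "SQL_CASE"}, notes="baseline")
v2 = registry.create_version({"age_group": "SQL_CASE", "total_orders": "AGGREGATE"})
registry.activate(v1)
registry.activate(v2)
print(registry.active_version())
```

For an A/B test, two snapshots can coexist while activation decides which one is treated as current.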
track_feature_lineage(feature_code, package, feature, source_table, source_column, lineage_type="DIRECT")
Records the source table and column that a feature depends on.
get_feature_lineage(feature_code, package, feature)
Returns lineage records (source table, column, type) for a feature.
get_affected_features(source_table, source_column)
Finds all features that depend on a specific source column. Critical for impact analysis when schema changes are planned.
Example:
affected = editor.get_affected_features("customers", "email")
print(f"Features using customers.email: {affected}")
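Impact analysis is essentially a reverse lookup from a source column to the features recorded against it. The sketch below models that lookup in memory. `LineageIndex` and the feature names are hypothetical, and the real methods read lineage from the database rather than from a dictionary.

```python
from collections import defaultdict


class LineageIndex:
    """In-memory reverse index from source column to features (hypothetical)."""

    def __init__(self):
        self._by_source = defaultdict(set)

    def track(self, feature, source_table, source_column):
        self._by_source[(source_table, source_column)].add(feature)

    def affected_features(self, source_table, source_column):
        # Use .get so that a lookup does not create an empty entry.
        return sorted(self._by_source.get((source_table, source_column), set()))


index = LineageIndex()
index.track("email_domain", "customers", "email")
index.track("has_email", "customers", "email")
index.track("age_group", "customers", "age")
affected_by_email = index.affected_features("customers", "email")
print(affected_by_email)
```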
generate_feature_documentation(feature_code, package, include_stats=True)
Generates comprehensive markdown documentation for a feature set including feature definitions, types, dependencies, lineage, and statistics.
Example:
doc = editor.generate_feature_documentation("customer_features", "prod")
with open("customer_features_docs.md", "w") as f:
    f.write(doc)
save_to_yaml(feature_code, package, output_file)
Exports a complete feature set to a YAML file. Creates parent directories automatically if they don't exist.
Parameters:
- feature_code: The FeatureCode to export
- package: The Package to export
- output_file: Path to output YAML file (absolute or relative)
Exports: the feature set entry together with its feature definitions, dependencies, and expectations, in the YAML format shown earlier.
Example:
from ObjFeatureStoreEdit import ObjFeatureStoreEdit
editor = ObjFeatureStoreEdit()
editor.save_to_yaml(
"customer_features",
"prod",
"features/customer_features.yaml"
)
load_from_yaml(input_file, overwrite=False)
Imports a complete feature set from a YAML file. Validates required fields and handles existing definitions based on the overwrite parameter.
Parameters:
- input_file: Path to input YAML file
- overwrite: If True, updates existing definitions. If False (default), skips existing definitions
Example:
editor = ObjFeatureStoreEdit()
# Initial load - creates new features
editor.load_from_yaml("features/customer_features.yaml")
# Update existing features with new definitions
editor.load_from_yaml("features/customer_features_v2.yaml", overwrite=True)
Validation:
- Required fields: feature_code, package, module, source_query, source_query_pk
- Raises FileNotFoundError if the input file doesn't exist
- Raises ValueError if required fields are missing
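The required-field check and the overwrite rule can be sketched on plain dictionaries, that is, on the result of parsing the YAML file. `validate_feature_set` and `merge_features` are hypothetical helpers. The error message mirrors the one shown in the error-handling examples, while the merge logic is an assumption about what "skips existing" means.

```python
REQUIRED_FIELDS = ("feature_code", "package", "module", "source_query", "source_query_pk")


def validate_feature_set(parsed):
    """Raise ValueError for the first required field that is missing or empty."""
    for field in REQUIRED_FIELDS:
        if not parsed.get(field):
            raise ValueError(f"Missing required field: {field}")


def merge_features(existing, incoming, overwrite=False):
    """Combine stored and incoming definitions according to the overwrite flag."""
    merged = dict(existing)
    for name, definition in incoming.items():
        if name in merged and not overwrite:
            continue  # default behaviour: keep what is already stored
        merged[name] = definition
    return merged


stored = {"age_group": "old definition"}
incoming = {
    "age_group": "new definition",
    "total_orders": "COUNT(orders.order_id)|orders|customer_id",
}
kept = merge_features(stored, incoming)
replaced = merge_features(stored, incoming, overwrite=True)
print(kept["age_group"], "/", replaced["age_group"])
```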
ObjFeatureStoreEdit.py provides CLI commands for feature editing and management.
Usage: python factory.core/ObjFeatureStoreEdit.py [COMMAND]
For runtime commands (compute, validate, statistics), see ObjFeatureStore.py.
add-entry --feature-code CODE --package PKG --module MODULE [--source-query QUERY] [--source-query-pk PK] [--target-table TABLE]
Adds a new feature set entry to def_feature.
add-feature --feature-code CODE --package PKG --feature NAME --feature-type TYPE --feature-definition DEF --feature-data-type DTYPE [--notes NOTES]
Adds or updates a feature definition.
add-expectation --feature-code CODE --package PKG --feature NAME --expectation-name NAME --expectation-config-json JSON [--severity LEVEL] [--notes NOTES]
Adds a Great Expectations validation rule.
add-dependency --feature-code CODE --package PKG --feature NAME --depends-on NAME
Adds a dependency between two features.
load --feature-code CODE --package PKG
Loads and displays feature set definition and all features.
display --feature-code CODE --package PKG
Displays the feature set in a rich table format.
save-yaml --feature-code CODE --package PKG --output-file FILE
Exports a feature set to YAML file.
load-yaml --input-file FILE [--overwrite]
Imports a feature set from YAML file. Use --overwrite to update existing definitions.
create-version --feature-code CODE --package PKG [--created-by USER] [--notes NOTES]
Creates a versioned snapshot of feature definitions.
activate-ver --feature-code CODE --package PKG --version VER
Activates a specific feature version.
generate-docs --feature-code CODE --package PKG
Generates markdown documentation for a feature set.
show-column-impact --feature-code CODE --package PKG --column COL
Shows all features that depend on a specific source column.
show-feature-lineage --feature-code CODE --package PKG --feature NAME
Shows all source columns that a specific feature depends on.
export-lineage --feature-code CODE --package PKG [--format text|json] [--output FILE]
Exports complete lineage for a feature set as JSON or text.
usage-analytics --feature-code CODE --package PKG [--days 30] [--output FILE]
Generates JSON report of feature usage analytics over specified days.
assess-quality --feature-code CODE --package PKG [--feature NAME] [--output FILE]
Assesses feature quality across 5 dimensions and generates JSON report.
list-input-tables --feature-code CODE --package PKG
Lists all features using input tables with their configuration.
validate-input-tables --feature-code CODE --package PKG
Validates all input table configurations for a feature set.
input-table-stats --feature-code CODE --package PKG
Shows statistics about input table usage for a feature set.
input_table_metrics --feature-code CODE --package PKG
Computes features and displays detailed performance metrics for input table operations.
# 1. Add feature set entry
python factory.core/ObjFeatureStoreEdit.py add-entry \
--feature-code customer_features \
--package prod \
--module analytics \
--source-query customers \
--source-query-pk customer_id
# 2. Add features
python factory.core/ObjFeatureStoreEdit.py add-feature \
--feature-code customer_features \
--package prod \
--feature age_group \
--feature-type SQL_CASE \
--feature-definition "CASE WHEN age < 30 THEN 'young' ELSE 'senior' END" \
--feature-data-type VARCHAR
# 3. Add dependency
python factory.core/ObjFeatureStoreEdit.py add-dependency \
--feature-code customer_features \
--package prod \
--feature lifetime_value \
--depends-on total_orders
# 4. Export to YAML for version control
python factory.core/ObjFeatureStoreEdit.py save-yaml \
--feature-code customer_features \
--package prod \
--output-file features/customer_features.yaml
# 5. Import from YAML
python factory.core/ObjFeatureStoreEdit.py load-yaml \
--input-file features/customer_features.yaml \
--overwrite
# 6. Create a version snapshot
python factory.core/ObjFeatureStoreEdit.py create-version \
--feature-code customer_features \
--package prod \
--notes "Initial release"
# 7. Generate documentation
python factory.core/ObjFeatureStoreEdit.py generate-docs \
--feature-code customer_features \
--package prod
ObjFeatureStoreEdit enables a complete GitOps workflow for feature engineering:
# Developer creates features in dev database using the ObjFeatureStoreEdit CLI
python factory.core/ObjFeatureStoreEdit.py add-feature \
--feature-code customer_features \
--package dev \
--feature age_group \
--feature-type SQL_CASE \
--feature-definition "CASE WHEN age < 30 THEN 'young' WHEN age < 50 THEN 'middle' ELSE 'senior' END" \
--feature-data-type VARCHAR
# Export to YAML for version control
python factory.core/ObjFeatureStoreEdit.py save-yaml \
--feature-code customer_features \
--package dev \
--output-file features/customer_features.yaml
# Commit to Git
git add features/customer_features.yaml
git commit -m "Add age_group feature for customer segmentation"
git push origin feature/add-age-group
# Create pull request in GitHub/Bitbucket
# Reviewers examine YAML changes:
# - Feature definitions are readable
# - Dependencies are correct
# - Validation rules are appropriate
# - No sensitive data is exposed
# After PR approval, deploy to staging
git checkout develop
git merge feature/add-age-group
# Import to staging database
python factory.core/ObjFeatureStoreEdit.py load-yaml \
--input-file features/customer_features.yaml \
--overwrite
# Compute and validate features
python factory.core/ObjFeatureStore.py compute-all-features \
--feature-code customer_features \
--package staging \
--batch
python factory.core/ObjFeatureStore.py validate-features \
--feature-code customer_features \
--package staging
# After staging validation, deploy to production
git checkout main
git merge develop
# Import to production database
python factory.core/ObjFeatureStoreEdit.py load-yaml \
--input-file features/customer_features.yaml \
--overwrite
# Compute features
python factory.core/ObjFeatureStore.py compute-all-features \
--feature-code customer_features \
--package prod \
--batch
features/
├── customer_features.yaml
├── product_features.yaml
├── transaction_features.yaml
└── archived/
├── customer_features_v1.yaml
└── customer_features_v2.yaml
Backup Before Updates: Export current state before importing changes
python factory.core/ObjFeatureStoreEdit.py save-yaml \
--feature-code customer_features \
--package prod \
--output-file backups/customer_features_$(date +%Y%m%d).yaml
python factory.core/ObjFeatureStoreEdit.py load-yaml \
--input-file features/customer_features_new.yaml \
--overwrite
Test in Dev First: Always test YAML imports in dev before staging/prod
Use Overwrite Carefully: Default overwrite=False prevents accidental updates
Validate After Import: Run feature computation and validation after every import
Use package-specific YAML files for environment-specific configurations:
# features/customer_features_dev.yaml
feature_code: customer_features
package: dev
module: analytics
source_query: customers_sample # Smaller dataset in dev
# ...
# features/customer_features_prod.yaml
feature_code: customer_features
package: prod
module: analytics
source_query: customers # Full dataset in prod
# ...
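With one file per package, a deployment script needs a consistent way to pick the right file. The helper below assumes the `features/<feature_code>_<package>.yaml` naming used in the two examples above. `yaml_path_for` is a hypothetical name, not part of ObjFeatureStoreEdit.

```python
from pathlib import Path


def yaml_path_for(feature_code, package, root="features"):
    """Build the package-specific YAML path for a feature set."""
    return Path(root) / f"{feature_code}_{package}.yaml"


dev_file = yaml_path_for("customer_features", "dev")
prod_file = yaml_path_for("customer_features", "prod")
print(dev_file.as_posix())
print(prod_file.as_posix())
```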
Never include sensitive data in YAML files.
Recommendation: Use both approaches together:
- ObjFeatureStore CLI for rapid iteration in dev
- ObjFeatureStoreEdit for promotion to staging/prod
Missing Required Field:
# Raises: ValueError: Missing required field: source_query
editor.load_from_yaml("incomplete_features.yaml")
File Not Found:
# Raises: FileNotFoundError: Input file not found: nonexistent.yaml
editor.load_from_yaml("nonexistent.yaml")
Invalid Feature Type:
# Raises: ValueError: 'INVALID_TYPE' is not a valid FeatureTypeEnum
# YAML contains: feature_type: INVALID_TYPE
editor.load_from_yaml("invalid_features.yaml")
Feature Set Not Found:
# Logs error: Feature entry not found for nonexistent_feature/prod
editor.save_to_yaml("nonexistent_feature", "prod", "output.yaml")
Example GitHub Actions workflow for feature deployment:
name: Deploy Features
on:
  push:
    branches: [main]
    paths:
      - 'features/**'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Deploy customer features
        run: |
          python factory.core/ObjFeatureStoreEdit.py load-yaml \
            --input-file features/customer_features.yaml \
            --overwrite
      - name: Compute features
        run: |
          python factory.core/ObjFeatureStore.py compute-all-features \
            --feature-code customer_features \
            --package prod \
            --batch
      - name: Validate features
        run: |
          python factory.core/ObjFeatureStore.py validate-features \
            --feature-code customer_features \
            --package prod
ObjFeatureStoreEdit inherits thread-safe database operations from ObjFeatureStore and ObjData. File I/O operations (reading/writing YAML) are not thread-safe and should not be executed concurrently on the same files.
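If several threads in one process may export to the same path, one possible mitigation is to serialize file access with a lock per file. The sketch below is generic Python, not something ObjFeatureStoreEdit is documented to provide. `lock_for` and `write_exclusive` are hypothetical helpers, and the approach does not protect against concurrent writers in other processes.

```python
import tempfile
import threading
from collections import defaultdict
from pathlib import Path

_registry_lock = threading.Lock()
_file_locks = defaultdict(threading.Lock)


def lock_for(path):
    """Return the lock shared by every caller that targets the same file."""
    key = str(Path(path).resolve())
    with _registry_lock:  # protect the lock dictionary itself
        return _file_locks[key]


def write_exclusive(path, text):
    """Write text to path while holding that file's lock."""
    with lock_for(path):
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "features" / "customer_features.yaml"
    workers = [
        threading.Thread(target=write_exclusive, args=(out, f"package: run{i}\n"))
        for i in range(4)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    final_text = out.read_text(encoding="utf-8")

print(final_text)
```

Whichever thread writes last determines the file contents. The lock only ensures that each write completes without interleaving with another.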
SQL queries for ObjFeatureStoreEdit are defined in ObjFeatureStoreEdit.yaml. The MRO-based query loading in ObjData.load_queries() merges queries from both ObjFeatureStore.yaml (parent) and ObjFeatureStoreEdit.yaml (child), so all queries are available at runtime.
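The merging behaviour can be pictured as a walk over the class's method resolution order. The sketch below uses class-level dictionaries in place of the per-class YAML files. `DataBase`, `Store`, `StoreEdit`, and the `QUERIES` attribute are hypothetical stand-ins, and the rule that a child entry overrides a parent entry with the same name is an assumption.

```python
class DataBase:
    """Hypothetical stand-in for the base class that loads queries."""

    QUERIES = {}

    @classmethod
    def load_queries(cls):
        merged = {}
        # Walk the MRO from the most general class to the most specific one,
        # so entries defined lower in the hierarchy are applied last.
        for klass in reversed(cls.__mro__):
            merged.update(vars(klass).get("QUERIES", {}))
        return merged


class Store(DataBase):
    QUERIES = {"select_features": "parent query text"}


class StoreEdit(Store):
    QUERIES = {"insert_feature": "child query text"}


queries = StoreEdit.load_queries()
print(sorted(queries))
```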
Table schemas are defined in ObjFeatureStore.yaml only. The __init__ method calls create_tables_from_yaml("ObjFeatureStore") to load schemas from the parent YAML.
- ObjFeatureStore: Parent class providing runtime computation, validation, and statistics
- ObjData: Base class for database operations
- ObjEnum: Defines FeatureTypeEnum and FeatureDataTypeEnum