This guide is for users who maintain Feature Store metadata directly in database tables.
It covers:
def_featuredef_featuresdef_feature_dependencydef_feature_expectationdef_feature_versiondef_feature_lineagedef_feature_statisticsAll SQL below uses explicit schema quoting, e.g. `hcplanning.dev`.`def_feature`.
def_feature: One row per feature set (FeatureCode + Package).def_features: One row per feature in that feature set.def_feature_dependency: Directed dependency edges between features.def_feature_expectation: Great Expectations-style rules per feature.def_feature_version: Version snapshots for rollback/audit.def_feature_lineage: Source table/column lineage per feature.def_feature_statistics: Computed quality/profile stats per run.def_feature (header row).def_features rows.def_feature_dependency rows.def_feature_expectation rows.def_featurePrimary key: (FeatureCode, Package)
Important columns:
FeatureCode: feature set key, e.g. handover_corePackage: package key, e.g. HCCOLLECTIONSSourceQuery: source table/view nameSourceQueryPk: source PK column(s), comma-separated for composite keysInputTable1: optional input table; empty means direct mode (features computed on source table)InputFields1: optional TEXT field listing columns to use from InputTable1InputPK1: optional PK column for joining InputTable1InputJoinMode1: join type for InputTable1, e.g. INNER JOIN, LEFT JOINInputTable2, InputFields2, InputPK2, InputJoinMode2: second input table (same pattern)InputTable3, InputFields3, InputPK3, InputJoinMode3: third input table (same pattern)lastcomputed: runtime-updated timestampdef_featuresPrimary key: (FeatureCode, Package, Feature)
Important columns:
Feature: target column name in feature tableFeatureType:
DIRECT_MAPSQL_CASESQL_VALUETEXTSYMBOLICAGGREGATEWINDOWJSON_EXTRACTFeatureType meaning:
DIRECT_MAP: Copies a single source column directly into the feature column.FeatureDefinition='PersonNo'.SQL_CASE: Uses a CASE WHEN ... THEN ... ELSE ... END expression for rule-based mapping.HIGH/MEDIUM/LOW.SQL_VALUE: Uses a general SQL scalar expression evaluated per source row.LOWER(SUBSTRING_INDEX(Email, '@', -1)).TEXT: Writes a constant/literal value for all rows.FeatureDefinition='handover'.SYMBOLIC: Computes from other features and/or source columns, including $feature_name$ references.CASE WHEN $balance_outstanding$ > 5000 THEN 'Y' ELSE 'N' END.AGGREGATE: Computes from a related table via aggregation.agg_expr|related_table|join_key.COUNT(order_id)|data_orders|PersonNo.WINDOW: Uses SQL window functions over the source dataset.ROW_NUMBER() OVER (PARTITION BY Queue_code ORDER BY AxionImportTime DESC).JSON_EXTRACT: Extracts a value from a JSON column.json_column|json_path.metadata|$.score.FeatureDefinition: expression/payload interpreted by typeFeatureDataType: one of
INT, FLOAT, DOUBLE, DECIMAL, VARCHAR, TEXT, DATETIME, DATE, TIMESTAMP, BOOLEAN, JSONcomputeorder: lower runs earlierdef_feature_dependencyPrimary key: (FeatureCode, Package, Feature, DependsOnFeature)
Use when feature expression references other feature columns (e.g. $balance_outstanding$).
def_feature_expectationPrimary key: (FeatureCode, Package, Feature, ExpectationName)
ExpectationConfigJson must be valid JSON string.Severity: usually ERROR, WARNING, or INFO.def_feature_versionPrimary key: (FeatureCode, Package, Version)
Stores snapshot JSON in ConfigSnapshot.
def_feature_lineagePrimary key: (FeatureCode, Package, Feature, SourceTable, SourceColumn)
Tracks column-level impact mapping.
def_feature_statisticsPrimary key: (FeatureCode, Package, Feature, ComputedAt)
Runtime profiling output per feature.
Assume source table: `hcplanning.dev`.`data_handover` and package HCCOLLECTIONS.
INSERT INTO `hcplanning.dev`.`def_feature`
(`FeatureCode`, `Package`, `Module`, `SourceQuery`, `SourceQueryPk`, `InputTable1`)
VALUES
('handover_core', 'HCCOLLECTIONS', 'collections', 'data_handover', 'PersonNo', 'feature_handover_core')
ON DUPLICATE KEY UPDATE
`Module` = VALUES(`Module`),
`SourceQuery` = VALUES(`SourceQuery`),
`SourceQueryPk` = VALUES(`SourceQueryPk`),
`InputTable1` = VALUES(`InputTable1`);
FeatureTypeINSERT INTO `hcplanning.dev`.`def_features`
(`FeatureCode`,`Package`,`Feature`,`Module`,`FeatureType`,`FeatureDefinition`,`FeatureDataType`,`Notes`,`computeorder`)
VALUES
-- 1) DIRECT_MAP: copy source column directly
('handover_core','HCCOLLECTIONS','person_no','collections','DIRECT_MAP','PersonNo','VARCHAR','Direct map of source PK',10),
-- 2) SQL_CASE: CASE expression
('handover_core','HCCOLLECTIONS','collection_bucket','collections','SQL_CASE',
'CASE WHEN CAST(COALESCE(NULLIF(BalOuts, ''''), ''0'') AS DECIMAL(12,2)) >= 5000 THEN ''HIGH'' WHEN CAST(COALESCE(NULLIF(BalOuts, ''''), ''0'') AS DECIMAL(12,2)) >= 1000 THEN ''MEDIUM'' ELSE ''LOW'' END',
'VARCHAR','Bucketized balance',20),
-- 3) SQL_VALUE: generic SQL expression evaluated against source row
('handover_core','HCCOLLECTIONS','email_domain','collections','SQL_VALUE',
'LOWER(SUBSTRING_INDEX(COALESCE(Email, ''''), ''@'', -1))',
'VARCHAR','Email domain extraction',30),
-- 4) TEXT: static text/value assignment
('handover_core','HCCOLLECTIONS','source_system','collections','TEXT','handover','VARCHAR','Static literal value',40),
-- 5) SYMBOLIC: expression using source columns and/or prior features
('handover_core','HCCOLLECTIONS','risk_flag','collections','SYMBOLIC',
'CASE WHEN $balance_outstanding$ > 5000 OR UPPER(COALESCE(Profile, '''')) IN (''HIGH'',''RISK'') THEN ''Y'' ELSE ''N'' END',
'VARCHAR','Uses previously computed feature + source column',50),
-- 6) AGGREGATE: agg_expr|related_table|join_key
('handover_core','HCCOLLECTIONS','orders_count','collections','AGGREGATE',
'COUNT(order_id)|data_orders|PersonNo',
'INT','Aggregate from related table',60),
-- 7) WINDOW: full SQL window definition over source table
('handover_core','HCCOLLECTIONS','row_rank','collections','WINDOW',
'ROW_NUMBER() OVER (PARTITION BY Queue_code ORDER BY AxionImportTime DESC)',
'INT','Window rank within queue',70),
-- 8) JSON_EXTRACT: json_column|json_path
('handover_core','HCCOLLECTIONS','profile_score','collections','JSON_EXTRACT',
'metadata|$.score',
'VARCHAR','Extract score from JSON payload',80),
-- supporting feature used by SYMBOLIC
('handover_core','HCCOLLECTIONS','balance_outstanding','collections','SQL_VALUE',
'CAST(COALESCE(NULLIF(BalOuts, ''''), ''0'') AS DECIMAL(12,2))',
'DECIMAL','Numeric cast of BalOuts',25)
ON DUPLICATE KEY UPDATE
`Module` = VALUES(`Module`),
`FeatureType` = VALUES(`FeatureType`),
`FeatureDefinition` = VALUES(`FeatureDefinition`),
`FeatureDataType` = VALUES(`FeatureDataType`),
`Notes` = VALUES(`Notes`),
`computeorder` = VALUES(`computeorder`);
INSERT INTO `hcplanning.dev`.`def_feature_dependency`
(`FeatureCode`,`Package`,`Feature`,`DependsOnFeature`)
VALUES
('handover_core','HCCOLLECTIONS','risk_flag','balance_outstanding')
ON DUPLICATE KEY UPDATE
`DependsOnFeature` = VALUES(`DependsOnFeature`);
INSERT INTO `hcplanning.dev`.`def_feature_expectation`
(`FeatureCode`,`Package`,`Feature`,`ExpectationName`,`ExpectationConfigJson`,`Severity`,`Notes`)
VALUES
(
'handover_core','HCCOLLECTIONS','balance_outstanding','non_negative_balance',
'{"expectation_type":"expect_column_values_to_be_between","column":"balance_outstanding","min_value":0}',
'ERROR','Balance should be >= 0'
),
(
'handover_core','HCCOLLECTIONS','email_domain','valid_email_domain',
'{"expectation_type":"expect_column_values_to_match_regex","column":"email_domain","regex":"^[a-z0-9.-]+$"}',
'WARNING','Domain format check'
)
ON DUPLICATE KEY UPDATE
`ExpectationConfigJson` = VALUES(`ExpectationConfigJson`),
`Severity` = VALUES(`Severity`),
`Notes` = VALUES(`Notes`);
INSERT INTO `hcplanning.dev`.`def_feature_lineage`
(`FeatureCode`,`Package`,`Feature`,`SourceTable`,`SourceColumn`,`LineageType`)
VALUES
('handover_core','HCCOLLECTIONS','email_domain','data_handover','Email','DIRECT'),
('handover_core','HCCOLLECTIONS','balance_outstanding','data_handover','BalOuts','DIRECT'),
('handover_core','HCCOLLECTIONS','risk_flag','feature_handover_core','balance_outstanding','DERIVED')
ON DUPLICATE KEY UPDATE
`LineageType` = VALUES(`LineageType`);
INSERT INTO `hcplanning.dev`.`def_feature_version`
(`FeatureCode`,`Package`,`Version`,`CreatedBy`,`IsActive`,`Notes`,`ConfigSnapshot`)
VALUES
(
'handover_core','HCCOLLECTIONS',1,'michael@technocore.co.za',1,
'Initial handover_core feature set',
JSON_OBJECT(
'feature_code','handover_core',
'package','HCCOLLECTIONS',
'target_table','feature_handover_core'
)
);
Use comma-separated SourceQueryPk in def_feature:
UPDATE `hcplanning.dev`.`def_feature`
SET `SourceQueryPk` = 'PersonNo,Queue_code'
WHERE `FeatureCode`='handover_core' AND `Package`='HCCOLLECTIONS';
UPDATE `hcplanning.dev`.`def_feature`
SET `InputTable1` = ''
WHERE `FeatureCode`='handover_core' AND `Package`='HCCOLLECTIONS';
Runtime will default to feature_handover_core.
UPDATE `hcplanning.dev`.`def_features`
SET `computeorder` = 5
WHERE `FeatureCode`='handover_core'
AND `Package`='HCCOLLECTIONS'
AND `Feature`='balance_outstanding';
No dedicated Active flag exists. Standard approach is to remove row from def_features and keep historical stats/version rows.
DELETE FROM `hcplanning.dev`.`def_features`
WHERE `FeatureCode`='handover_core'
AND `Package`='HCCOLLECTIONS'
AND `Feature`='row_rank';
SELECT d.*
FROM `hcplanning.dev`.`def_feature_dependency` d
LEFT JOIN `hcplanning.dev`.`def_features` f
ON f.`FeatureCode` = d.`FeatureCode`
AND f.`Package` = d.`Package`
AND f.`Feature` = d.`Feature`
WHERE f.`Feature` IS NULL;
SELECT `FeatureCode`,`Package`,`computeorder`,COUNT(*) AS cnt
FROM `hcplanning.dev`.`def_features`
GROUP BY `FeatureCode`,`Package`,`computeorder`
HAVING COUNT(*) > 1;
SELECT e.*
FROM `hcplanning.dev`.`def_feature_expectation` e
LEFT JOIN `hcplanning.dev`.`def_features` f
ON f.`FeatureCode`=e.`FeatureCode`
AND f.`Package`=e.`Package`
AND f.`Feature`=e.`Feature`
WHERE f.`Feature` IS NULL;
SELECT `FeatureCode`,`Package`,COUNT(*) AS active_cnt
FROM `hcplanning.dev`.`def_feature_version`
WHERE `IsActive` = 1
GROUP BY `FeatureCode`,`Package`
HAVING COUNT(*) > 1;
FeatureCode, Package, Feature lowercase/snake_case for consistency.ON DUPLICATE KEY UPDATE for idempotent deploy scripts.computeorder + dependency rows together; do not rely on only one.SYMBOLIC expressions referencing features, ensure referenced features are computed earlier.INT, FLOAT, DECIMAL, etc.).ExpectationConfigJson before insert.START TRANSACTION;
INSERT INTO `hcplanning.dev`.`def_feature`
(`FeatureCode`,`Package`,`Module`,`SourceQuery`,`SourceQueryPk`,`InputTable1`)
VALUES
('sample_fs','HCCOLLECTIONS','collections','data_handover','PersonNo','feature_sample_fs')
ON DUPLICATE KEY UPDATE
`Module`=VALUES(`Module`),`SourceQuery`=VALUES(`SourceQuery`),`SourceQueryPk`=VALUES(`SourceQueryPk`),`InputTable1`=VALUES(`InputTable1`);
INSERT INTO `hcplanning.dev`.`def_features`
(`FeatureCode`,`Package`,`Feature`,`FeatureType`,`FeatureDefinition`,`FeatureDataType`,`computeorder`,`Notes`)
VALUES
('sample_fs','HCCOLLECTIONS','email_domain','SQL_VALUE','LOWER(SUBSTRING_INDEX(COALESCE(Email, ''''), ''@'', -1))','VARCHAR',10,'email domain'),
('sample_fs','HCCOLLECTIONS','balance_outstanding','SQL_VALUE','CAST(COALESCE(NULLIF(BalOuts, ''''), ''0'') AS DECIMAL(12,2))','DECIMAL',20,'balance'),
('sample_fs','HCCOLLECTIONS','risk_flag','SYMBOLIC','CASE WHEN $balance_outstanding$ > 5000 THEN ''Y'' ELSE ''N'' END','VARCHAR',30,'risk')
ON DUPLICATE KEY UPDATE
`FeatureType`=VALUES(`FeatureType`),`FeatureDefinition`=VALUES(`FeatureDefinition`),`FeatureDataType`=VALUES(`FeatureDataType`),`computeorder`=VALUES(`computeorder`),`Notes`=VALUES(`Notes`);
INSERT INTO `hcplanning.dev`.`def_feature_dependency`
(`FeatureCode`,`Package`,`Feature`,`DependsOnFeature`)
VALUES ('sample_fs','HCCOLLECTIONS','risk_flag','balance_outstanding')
ON DUPLICATE KEY UPDATE `DependsOnFeature`=VALUES(`DependsOnFeature`);
COMMIT;
Unknown column ... in field list during compute:
FeatureDefinition.SourceQuery table.doesn't exist for def_feature* tables:
FeatureDefinition expression invalid for source data.def_feature_dependency and/or adjust computeorder.