NOTICE: All information contained herein is, and remains
the property of TechnoCore Automate.
Updated : 2026-03-16
ObjDataTransfer handles structured data movement between any combination
of supported database backends. It supports both the workflow-driven
transfer / transfer_structure methods (configuration stored in
def_datatransfer) and the ad-hoc transfer_query method which
requires no database configuration.
ObjDataTransfer supports 18 database and storage backends. Each can be
used as both a source and a target, except those marked source-only.
MariaDB / MySQL. The default and most mature backend. Uses the pymysql driver.
MariaDB is treated as MySQL-compatible throughout. Connection identifiers
"", "primary", "mariadb", or "mysql" all resolve to the primary
MariaDB instance configured in config.yaml. Supports both local Docker
containers (via terraform.mariadb) and remote/cloud-hosted instances.
TiDB Cloud Serverless is a MySQL-compatible distributed database. Uses
the pymysql driver with mandatory TLS (certifi CA bundle). Connection
identifiers "tidb" or "tidbcloud" resolve credentials via the 3-tier
provider lookup. Default port is 4000.
Microsoft SQL Server. Uses the pymssql driver. Connection identifiers "mssql" or
"sqlserver" resolve credentials from database.connections.mssql or
terraform.mssql. Supports local Docker instances and cloud-hosted
instances (Azure SQL, AWS RDS).
PostgreSQL / Neon. Uses the psycopg2 driver. Connection identifiers "postgres",
"postgresql", or "neon" resolve via the 3-tier provider lookup.
Supports optional SSL (sslmode=require) and channel_binding for
Neon cloud. Works with local PostgreSQL and any cloud-hosted instance
(Neon, Supabase, AWS RDS, Azure).
MongoDB. Uses the pymongo driver. Connection identifiers "mongo" or
"mongodb" resolve credentials from database.connections.mongo or
terraform.mongo. Connects via standard mongodb:// URI. Works with
local Docker containers and cloud-hosted instances (MongoDB Atlas).
SQLite. Uses the built-in sqlite3 module. Connection identifier is a file
path ending in .sqlite or .db, or ":memory:" for an in-memory
database. No network connectivity — files must be on the local
filesystem.
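As an illustration only (the file path is a placeholder, and dt is assumed to be an already constructed ObjDataTransfer instance), a table can be copied from the primary MariaDB instance into a local SQLite file simply by passing the file path as the target connection:

```python
# Hypothetical sketch: the SQLite file path is a placeholder.
count = dt.transfer_query(
    source_connection="primary",                # MariaDB primary
    source_query="SELECT * FROM def_service",
    target_connection="/tmp/reporting.sqlite",  # file path resolves to the SQLite backend
    target_table="def_service",
    primary_keys=["ServiceCode", "Package"],
    create_target=True,
)
```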
Turso is a cloud-hosted libSQL (SQLite fork) edge database. Uses the
libsql_experimental driver. Connection identifier "turso" resolves
URL and auth token via the 3-tier provider lookup. A direct
"libsql://..." URL can also be passed. Requires a temporary local
file for WAL-based sync.
InfluxDB. Uses the influxdb_client driver. Connection identifier "influxdb"
resolves to a local instance (via terraform.influxdb or
database.connections.influxdb). "influxdb_cloud" resolves to a
cloud-hosted instance via the 3-tier provider lookup. Source queries
use Flux query syntax.
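For illustration, a hedged sketch of pulling recent points out of the local InfluxDB instance into MariaDB; the bucket, measurement, and key column are placeholder names, not values from this project:

```python
# Hypothetical sketch: bucket, measurement, and key column are placeholders.
flux = '''
from(bucket: "metrics")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "service_usage")
'''
count = dt.transfer_query(
    source_connection="influxdb",        # local instance
    source_query=flux,                   # Flux query syntax
    target_connection="primary",
    target_table="service_usage_recent",
    primary_keys=["_time"],
    create_target=True,
)
```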
Redis / Upstash. Uses the redis driver. Connection identifiers "redis" or
"upstash" resolve credentials via the 3-tier provider lookup, with
terraform.redis as the local fallback. Supports optional SSL for
cloud providers like Upstash. Data is stored as Redis hashes keyed by
{table}:{pk_values}.
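A short sketch (table and key names are placeholders) of how a transferred row ends up keyed in Redis under the {table}:{pk_values} scheme described above:

```python
# Hypothetical sketch: table and key names are placeholders.
count = dt.transfer_query(
    source_connection="primary",
    source_query="SELECT * FROM def_service",
    target_connection="redis",            # or "upstash" for the cloud instance
    target_table="def_service",
    primary_keys=["ServiceCode"],
)
# A row with ServiceCode "SVC001" would be stored as the Redis hash
# "def_service:SVC001", with one hash field per column.
```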
ODBC. Uses the pyodbc driver. Accepts a full ODBC connection string
("DRIVER={...};..."), a registered DSN ("DSN=..."), or the named
identifier "odbc" which reads base.odbc.connection_string from
config.yaml. Can connect to any ODBC-accessible data source, local
or remote.
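The three accepted identifier forms, shown as a hedged sketch; the driver, server, DSN, table, and key names are placeholders:

```python
# Hypothetical sketch: driver, server, DSN, table, and key names are placeholders.
src = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=legacy01;DATABASE=erp;UID=reader;PWD=secret"
# src = "DSN=legacy_erp"    # a DSN registered on this machine
# src = "odbc"              # resolved from base.odbc.connection_string in config.yaml

count = dt.transfer_query(
    source_connection=src,
    source_query="SELECT * FROM invoices",
    target_connection="primary",
    target_table="invoices",
    primary_keys=["InvoiceID"],
    create_target=True,
)
```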
YAML. File-based storage using plain YAML files. Connection identifier is a
file path ending in .yaml or .yml. No real connection object is
created — records are read from and written to the file directly. No
network connectivity.
SAS. Reads SAS data files using pandas.read_sas(). Supports both
.sas7bdat (SAS7BDAT binary format) and .xpt (SAS Transport / XPORT
format). Connection identifier is a file path ending in .sas7bdat or
.xpt. This is a source-only backend — SAS files cannot be used as a
transfer target. An optional "field=value" query string filters
records after reading. No additional dependencies beyond pandas.
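A hedged sketch of loading a SAS file into PostgreSQL with the optional field=value filter; the file path, filter column, and key are placeholders:

```python
# Hypothetical sketch: path, filter column, and key are placeholders.
count = dt.transfer_query(
    source_connection="/data/claims_2025.sas7bdat",  # SAS files are source-only
    source_query="Region=North",                     # optional field=value filter
    target_connection="postgres",
    target_table="claims_2025",
    primary_keys=["ClaimID"],
    create_target=True,
)
```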
Apache Parquet columnar format. Uses pandas.read_parquet() and
DataFrame.to_parquet() via the pyarrow engine. Connection identifier
is a file path ending in .parquet. Supports both read and write — can
be used as a source or target. Parquet is the standard interchange
format for data lakes, Spark, and dbt pipelines. An optional
"field=value" query string filters records when used as a source.
Apache Arrow IPC format (Feather v2). Uses pandas.read_feather() and
DataFrame.to_feather() via pyarrow. Connection identifier is a file
path ending in .feather. Supports both read and write. Feather is
optimized for fast inter-process data exchange and is significantly
faster than CSV or Parquet for local workflows. An optional
"field=value" query string filters records when used as a source.
Stata .dta data files. Uses pandas.read_stata() and
DataFrame.to_stata(). Connection identifier is a file path ending in
.dta. Supports both read and write — no additional dependencies beyond
pandas. Common in economics, social science, and policy research. An
optional "field=value" query string filters records when used as a
source.
SPSS .sav data files. Uses pandas.read_spss() which requires the
pyreadstat package. Connection identifier is a file path ending in
.sav. This is a source-only backend — SPSS files cannot be used as a
transfer target. Common in survey analytics and research. An optional
"field=value" query string filters records after reading.
Apache ORC (Optimized Row Columnar) format. Uses pandas.read_orc()
via pyarrow. Connection identifier is a file path ending in .orc.
This is a source-only backend — ORC files cannot be used as a transfer
target. ORC is common in Hadoop/Hive ecosystems. An optional
"field=value" query string filters records after reading.
Fixed-width text files as produced by mainframes and legacy banking
systems. Uses pandas.read_fwf() which auto-detects column boundaries.
Connection identifier is a file path ending in .fwf or .dat. This
is a source-only backend. No additional dependencies beyond pandas. An
optional "field=value" query string filters records after reading.
transfer_query

Copies the result of any source query into a target table with upsert
semantics. No configuration table is required — all parameters are passed
directly.
count = dt.transfer_query(
    source_connection="primary",           # MariaDB primary
    source_query="SELECT * FROM def_service",
    target_connection="neon",              # Neon (cloud Postgres)
    target_table="def_service",
    primary_keys=["ServiceCode", "Package"],
    create_target=True,                    # auto-create table if missing
)
| Identifier | Database | Credential source |
|---|---|---|
| "" / "primary" | MariaDB primary | database.connections.primary → flat keys fallback |
| "mariadb" / "mysql" | MariaDB (local docker) | database.connections.mariadb → terraform.mariadb |
| "tidb" / "tidbcloud" | TiDB Cloud Serverless (MySQL-compatible) | 3-tier lookup — see below |
| "mssql" / "sqlserver" | SQL Server | database.connections.mssql → terraform.mssql |
| "neon" / "postgres" / "postgresql" | PostgreSQL / Neon cloud | 3-tier lookup — see below |
| "mongo" / "mongodb" | MongoDB | database.connections.mongo → terraform.mongo |
| "redis" / "upstash" | Redis / Upstash cloud | 3-tier lookup — see below |
| "turso" | Turso (cloud libSQL) | 3-tier lookup — see below |
| "libsql://..." | Turso (direct URL) | URL used directly |
| "influxdb" | InfluxDB local | database.connections.influxdb → terraform.influxdb |
| "influxdb_cloud" | InfluxDB Cloud | 3-tier lookup — see below |
| "/path/file.db" / ":memory:" | SQLite | File path |
| "/path/file.yaml" | YAML file | File path |
| "/path/file.sas7bdat" | SAS data file (source only) | File path |
| "/path/file.xpt" | SAS transport file (source only) | File path |
| "/path/file.parquet" | Apache Parquet | File path |
| "/path/file.feather" | Feather / Arrow IPC | File path |
| "/path/file.dta" | Stata data file | File path |
| "/path/file.sav" | SPSS data file (source only) | File path |
| "/path/file.orc" | Apache ORC (source only) | File path |
| "/path/file.fwf" / ".dat" | Fixed-width format (source only) | File path |
| "DRIVER={...};..." | Any ODBC source | Full ODBC connection string passed directly |
| "DSN=..." | Any ODBC DSN | System-registered ODBC DSN |
| "odbc" | Named ODBC source | base.odbc.connection_string in config.yaml |
For every named connection string the system resolves credentials in
this order:
1. {package}.database.connections.{name} — package-level DB connections
2. {package}.{name} — provider section directly under the active package
3. base.{name} — shared provider section in the base config
4. terraform.{key} — local docker service config (only for providers mongo, mssql, mariadb, influxdb, redis)

This means a package can override any provider's credentials without
touching the base config:
homechoice:
  neon:
    host: project.neon.tech
    username: hcuser
    password: secret
    database: hcdb
    ssl: true
| Backend | Query format |
|---|---|
| SQL (MariaDB / MSSQL / Postgres / SQLite / Turso) | Standard SELECT statement |
| MongoDB | "collection" — all docs; "collection\|{json}" — filtered; "collection\|[{pipeline}]" — aggregation |
| YAML | Empty (all records) or "field=value" filter |
| SAS / Parquet / Feather / Stata / SPSS / ORC / FWF | Empty (all records) or "field=value" filter |
| Redis | Key pattern e.g. "users:*" |
| InfluxDB | Flux query string |
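As a sketch of the MongoDB forms above (collection, field, and key names are placeholders):

```python
# Hypothetical sketch: collection, field, and key names are placeholders.
q_all      = "customers"                                     # every document
q_filtered = 'customers|{"Status": "active"}'                # JSON filter
q_pipeline = 'customers|[{"$match": {"Status": "active"}}]'  # aggregation pipeline

count = dt.transfer_query(
    source_connection="mongo",
    source_query=q_filtered,
    target_connection="primary",
    target_table="customers",
    primary_keys=["CustomerID"],
    create_target=True,
)
```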
Each target type uses its native upsert mechanism:
| Target | Mechanism |
|---|---|
| PostgreSQL / Neon | INSERT ... ON CONFLICT (pk) DO UPDATE SET ... |
| MySQL / MariaDB / TiDB Cloud | SELECT-then-INSERT-or-UPDATE |
| MSSQL | SELECT-then-INSERT-or-UPDATE |
| SQLite / Turso | SELECT-then-INSERT-or-UPDATE + conn.commit() |
| MongoDB | update_one(filter, $set, upsert=True) |
| YAML | Load file, find by PK, replace-or-append, save |
| Parquet / Feather / Stata | Read file → DataFrame, find by PK, replace-or-append, write back |
| SAS / SPSS / ORC / FWF | Source-only — cannot be used as target |
| Redis | HSET {table}:{pk_values} field1 val1 ... |
| InfluxDB | Write as tagged point (pk_fields → tags, rest → fields) |
| ODBC | SELECT-then-INSERT-or-UPDATE (ANSI SQL, works across all ODBC drivers) |
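To make the PostgreSQL row concrete, this is roughly the shape of statement an ON CONFLICT upsert produces for a two-column primary key; the column names are placeholders and the exact SQL emitted by _tq_upsert may differ:

```python
# Illustrative only: column names are placeholders and the generated SQL may differ.
example_postgres_upsert = """
INSERT INTO "def_service" ("ServiceCode", "Package", "Price")
VALUES (%s, %s, %s)
ON CONFLICT ("ServiceCode", "Package")
DO UPDATE SET "Price" = EXCLUDED."Price"
"""
```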
When create_target=True (the default), the target table is created from
the schema of the first source record. Column types are inferred from
Python value types, and a primary key constraint is included where supported.
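A hedged sketch of what that auto-creation amounts to: given a first source record, something like the following DDL is derived (the record, column names, and type mapping shown are illustrative, not the exact output of _tq_ensure_table / _tq_col_type):

```python
# Illustrative only: record, column names, and type mapping are placeholders.
first_record = {"ServiceCode": "SVC001", "Package": "basic", "Price": 9.99, "Active": 1}

example_create_table = """
CREATE TABLE "def_service" (
    "ServiceCode" VARCHAR(255),
    "Package"     VARCHAR(255),
    "Price"       DOUBLE PRECISION,
    "Active"      INTEGER,
    PRIMARY KEY ("ServiceCode", "Package")
)
"""
```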
Add credentials to config.yaml under base: (or under the active
package to override per-deployment):
base:
  tidb:
    host: YOUR_HOST.tidbcloud.com
    username: YOUR_USERNAME
    password: YOUR_PASSWORD
    database: YOUR_DATABASE
    ssl: true
  neon:
    host: ep-xxx.eu-west-2.aws.neon.tech
    username: neondb_owner
    password: YOUR_PASSWORD
    database: neondb
    port: 5432
    ssl: true
    channel_binding: require
  influxdb_cloud:
    url: https://YOUR_REGION.cloud2.influxdata.com
    token: YOUR_TOKEN
    org: YOUR_ORG
    bucket: YOUR_BUCKET
  upstash:
    host: YOUR_HOST.upstash.io
    port: 6379
    password: YOUR_PASSWORD
    ssl: true
  turso:
    url: libsql://YOUR_DB.turso.io
    auth_token: YOUR_AUTH_TOKEN
Seeder methods (seed_credit_profiles,
seed_credit_score_history, generate_test_data) have moved to
factory.core/ObjDataSeed.py. See ObjDataSeed.md for details.
Internal helpers (_tq_*)

Low-level methods used by transfer_query. Not intended for direct use.
| Method | Purpose |
|---|---|
| _tq_detect_type(connection_str) | Map a connection string to a DatabaseType value |
| _tq_open(connection_str) | Open a raw connection; returns (conn, conn_type) |
| _tq_close(conn, conn_type) | Close connection (no-op for YAML and pandas file formats) |
| _tq_fetch(conn, conn_type, query, connection_str) | Execute source query → list[dict]; pandas file formats use a dispatch dict mapping DatabaseType → pd.read_* |
| _tq_col_type(value, conn_type) | Python value → SQL column type string |
| _tq_quote(name, conn_type) | DB-appropriate identifier quoting |
| _tq_placeholder(conn_type) | ? (SQLite/Turso) or %s (all others) |
| _tq_ensure_table(conn, conn_type, table, sample, pk_fields) | Auto-create target table from sample record (no-op for schemaless and pandas file formats) |
| _tq_upsert(conn, conn_type, table, record, pk_fields) | Upsert a single record; source-only formats raise NotImplementedError, writable pandas formats (Parquet, Feather, Stata) do read-modify-write |
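Roughly how transfer_query composes these helpers; a simplified sketch only, not the actual implementation:

```python
# Simplified sketch of the flow; the real transfer_query differs in detail.
def transfer_query_sketch(self, source_connection, source_query,
                          target_connection, target_table,
                          primary_keys, create_target=True):
    src, src_type = self._tq_open(source_connection)
    dst, dst_type = self._tq_open(target_connection)
    try:
        records = self._tq_fetch(src, src_type, source_query, source_connection)
        if records and create_target:
            self._tq_ensure_table(dst, dst_type, target_table, records[0], primary_keys)
        for record in records:
            self._tq_upsert(dst, dst_type, target_table, record, primary_keys)
        return len(records)
    finally:
        self._tq_close(src, src_type)
        self._tq_close(dst, dst_type)
```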
The connection system was refactored from flat config keys to a nested
database.connections structure. See ConfigIni.get_db_config() for
the lookup logic.
Old format (still supported via fallback):
homechoice:
  database:
    primaryip: 10.0.0.1
    primaryuser: root
    primarypassword: secret
    primarydb: mydb
    usedb: primary
New format:
homechoice:
  database:
    active: simulation
    connections:
      primary:
        type: mariadb
        host: 10.0.0.1
        user: root
        password: secret
        db: mydb
      simulation:
        type: mariadb
        host: 10.0.0.1
        user: root
        password: secret
        db: mydb
| Test Suite | Tests | Type | Purpose |
|---|---|---|---|
| test_ObjDataTransfer.py | 30+ | Unit | Type compatibility, YAML, scheduling, metadata, CLI |
| test_ObjDataTransfer.py::TestObjDataTransferNeon | 4 | Live integration | Neon connection, transfer, idempotency, readback |
| test_ObjDataTransfer_Comprehensive.py | — | Integration | Full transfer pipeline |
The Neon integration tests run only when base.neon in config.yaml
contains real credentials (host and password not starting with YOUR_).
They are skipped automatically on machines without credentials.
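A hedged sketch of that guard (the config path and structure here are assumptions; the actual test code may differ):

```python
import yaml

# Illustrative guard only; the real test suite's config access may differ.
def neon_configured(config_path: str = "config.yaml") -> bool:
    try:
        with open(config_path) as fh:
            cfg = yaml.safe_load(fh) or {}
    except FileNotFoundError:
        return False
    neon = (cfg.get("base") or {}).get("neon") or {}
    host = str(neon.get("host", ""))
    password = str(neon.get("password", ""))
    return bool(host) and not host.startswith("YOUR_") and not password.startswith("YOUR_")
```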
# Run all transfer tests
pytest resource.test/pytests/factory.core/test_ObjDataTransfer.py -v
# Run Neon tests only (skipped if not configured)
pytest resource.test/pytests/factory.core/test_ObjDataTransfer.py::TestObjDataTransferNeon -v
To compile the module in place with Cython:

cythonize -3 -a -i ObjDataTransfer.py

Compiling /home/axion/projects/axion/factory.core/ObjDataTransfer.py because it changed.
[1/1] Cythonizing /home/axion/projects/axion/factory.core/ObjDataTransfer.py