NOTICE: All information contained herein is, and remains, the property of TechnoCore. The intellectual and technical concepts contained herein are proprietary to TechnoCore, and dissemination of this information or reproduction of this material is strictly forbidden unless prior written permission is obtained from TechnoCore.
The Axion data import system is a powerful, database-driven tool for orchestrating file-based data imports. This guide provides a detailed walkthrough of the configuration process, with a focus on a complex scenario: importing multiple data files from a single ZIP archive.
## The `def_dataimport` Table

All import processes are defined by records in the `def_dataimport` table. The system's `ServeImport.py` process reads this table to determine which directories to watch, which files to process, and how to process them.
| Column | Description | Example |
|---|---|---|
| `DataImportCode` | A unique identifier for the import configuration. | `CUSTOMER_MASTER_IMPORT` |
| `Description` | A human-readable description of the import. | `Imports customer master data.` |
| `Directory` | The absolute path to the directory to monitor for new files. | `/home/axion/axion/local.documents/inbound` |
| `FileMask` | A glob pattern to identify files to be imported. | `*.zip` |
| `ProcessedDir` | The directory where files are moved after a successful import. | `/home/axion/axion/local.documents/processed` |
| `ArchiveDirectory` | The directory where files are moved if an error occurs. | `/home/axion/axion/local.documents/archive` |
| `TableName` | The base name for the temporary table created for the import. | `imp_customer_master` |
| `PreSQL` | SQL script to run before the import begins. Typically used to create temporary tables. | `CREATE TABLE IF NOT EXISTS imp_customer_master_temp...` |
| `PostSQL` | SQL script to run after the data has been loaded into the temporary table. Used to merge data into production tables. | `INSERT INTO customers SELECT * FROM imp_customer_master_temp...` |
| `WorkflowCode` | The code of a workflow (from `def_workflow`) to trigger after a successful import. | `NOTIFY_ADMIN_SUCCESS` |
| `Unzip` | Flag (`Y`/`N`) indicating whether matched archives should be extracted before import. | `Y` |
| `UnzipMask` | A glob pattern to select specific files from within the zip archive. | `*.csv` |
| `TriggerMask` | A glob pattern for a trigger file that must exist before processing. | `_READY_` |
| `MaxBlockSize` | The maximum number of rows to commit in a single database transaction. | `50000` |
| `ImportMaxRows` | The maximum number of rows to import from a single file. | `10000000` |
| `FolderScanInterval` | The interval (in seconds) at which the watcher scans directories. | `15` |
| `Active` | Flag (`Y`/`N`) indicating whether this configuration is enabled. | `Y` |
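The actual matching logic in `ServeImport.py` is not shown in this guide, but the interplay of `FileMask` and `TriggerMask` can be sketched with Python's standard `fnmatch` module. The function name `find_importable_files` is illustrative, not part of the real codebase:

```python
import fnmatch
import os

def find_importable_files(directory, file_mask, trigger_mask=None):
    """Return the files in `directory` that match `file_mask`.

    If `trigger_mask` is given, an entry matching it must also be
    present in the directory before anything is returned; otherwise
    the scan yields nothing and the watcher waits for the next pass.
    """
    entries = os.listdir(directory)
    if trigger_mask and not fnmatch.filter(entries, trigger_mask):
        return []  # trigger file not yet present
    return sorted(fnmatch.filter(entries, file_mask))
```

Note that a pattern without wildcards, such as `_READY_`, matches only a file of exactly that name, so it works as a simple "data is complete, go ahead" marker.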
## External Configuration (`config.yaml`)

Beyond the `def_dataimport` table, several critical parameters for the data import process are externalized into the project's `config.yaml` file. This allows for flexible adjustments without database modifications.
`config.yaml` parameters (under the `dataimport` section):

| Parameter | Description | Default Value |
|---|---|---|
| `folder_scan_interval` | The frequency (in seconds) at which the `ServeImport.py` process scans watched directories for new files. | `15` |
| `import_max_rows` | The maximum number of rows to import from a single file. This acts as a safeguard against excessively large files. | `10000000` |
| `max_block_size` | The number of rows processed and committed in a single database transaction. Tuning this value can improve performance for large imports. | `50000` |
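Assuming standard YAML syntax (the exact layout of the project's `config.yaml` is not shown in this guide), the section might look like this, with the default values from the table above:

```yaml
dataimport:
  folder_scan_interval: 15     # seconds between directory scans
  import_max_rows: 10000000    # hard cap on rows read from one file
  max_block_size: 50000        # rows committed per database transaction
```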
To provide continuous feedback during long-running scan operations, the `ServeImport.py` process emits heartbeat messages. These messages are displayed in the console and indicate that the system is actively monitoring directories. The messages are randomly selected from a predefined list in `factory.core/ObjApi.yaml`.
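The selection logic amounts to picking a random entry from the configured list each time a heartbeat is due. A minimal sketch follows; the message strings here are invented placeholders, since the real list lives in `factory.core/ObjApi.yaml`:

```python
import random
import time

# Hypothetical stand-in for the message list in factory.core/ObjApi.yaml
HEARTBEAT_MESSAGES = [
    "Watching inbound directories...",
    "No new files yet; still scanning.",
    "Import watcher alive and well.",
]

def emit_heartbeat():
    """Print a randomly chosen heartbeat message with a timestamp."""
    message = random.choice(HEARTBEAT_MESSAGES)
    print(f"[{time.strftime('%H:%M:%S')}] {message}")
    return message
```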
## Advanced Scenario: Importing Multiple Files from a Single ZIP

This scenario demonstrates how to configure the system to handle a single ZIP file containing multiple CSV files that need to be imported into different temporary tables.
A partner sends a single ZIP file, master_data_20251103.zip, to the inbound directory. This file contains two CSV files:
- `customers.csv`
- `addresses.csv`

We need to import the data from both CSV files into two separate temporary tables before merging them into the main production tables.
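For local testing it can be handy to fabricate an archive with the same shape as the partner's file. The sketch below builds one in memory with Python's standard `zipfile` module; the column values are invented sample data, and `build_sample_archive` is not part of the Axion codebase:

```python
import io
import zipfile

def build_sample_archive() -> bytes:
    """Create an in-memory ZIP shaped like master_data_20251103.zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "customers.csv",
            "customer_id,first_name,last_name,email,signup_date\n"
            "C001,Ada,Lovelace,ada@example.com,2025-01-15\n",
        )
        zf.writestr(
            "addresses.csv",
            "address_id,customer_id,street_address,city,postal_code,country\n"
            "A001,C001,1 Analytical Way,London,EC1A,UK\n",
        )
    return buf.getvalue()
```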
### Step 1: The Import Configuration (`def_dataimport`)

First, we create a record in `def_dataimport` to manage this process.
**SQL `INSERT` statement:**

```sql
INSERT INTO `def_dataimport` (
    `DataImportCode`,
    `Description`,
    `Directory`,
    `FileMask`,
    `ProcessedDir`,
    `ArchiveDirectory`,
    `TableName`,
    `PreSQL`,
    `PostSQL`,
    `WorkflowCode`,
    `Unzip`,
    `UnzipMask`,
    `Active`
) VALUES (
    'CUSTOMER_MASTER_ZIP',
    'Imports customer and address data from a single zip file.',
    '/home/axion/axion/local.documents/inbound/customer_master',
    '*.zip',
    '/home/axion/axion/local.documents/processed/customer_master',
    '/home/axion/axion/local.documents/archive/customer_master',
    'imp_customer_zip',
    'CALL sp_create_customer_import_tables();',
    'CALL sp_merge_customer_import_data();',
    'WF_CUSTOMER_IMPORT_COMPLETE',
    'Y',
    '*.csv',
    'Y'
);
```
### Step 2: Table Creation (`PreSQL`)

The `PreSQL` field points to a stored procedure, `sp_create_customer_import_tables()`. Using a stored procedure is a best practice for complex setup scripts. This procedure creates the two temporary tables required for the data from `customers.csv` and `addresses.csv`.
The import orchestrator automatically names the temporary tables based on the `TableName` from the config and the name of the file from the zip archive.
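The exact naming code in the orchestrator is not shown here, but a rule consistent with the mapping described in this guide is "configured `TableName`, underscore, file name without extension". A one-function sketch (`temp_table_name` is an illustrative name, not the real API):

```python
from pathlib import Path

def temp_table_name(base_table: str, archived_file: str) -> str:
    """Derive a temporary table name from the configured TableName and
    the stem (name without extension) of a file inside the archive."""
    return f"{base_table}_{Path(archived_file).stem}"
```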
- `customers.csv` -> `imp_customer_zip_customers`
- `addresses.csv` -> `imp_customer_zip_addresses`

**Stored Procedure `sp_create_customer_import_tables()`:**
```sql
-- DELIMITER statements are needed when defining the procedure from the
-- MySQL command-line client, so the semicolons inside the body do not
-- terminate the CREATE PROCEDURE statement early.
DELIMITER $$

CREATE PROCEDURE sp_create_customer_import_tables()
BEGIN
    -- Drop tables if they exist to ensure a clean slate
    DROP TABLE IF EXISTS `imp_customer_zip_customers`;
    DROP TABLE IF EXISTS `imp_customer_zip_addresses`;

    -- Create table for customers.csv
    CREATE TABLE `imp_customer_zip_customers` (
        `customer_id` VARCHAR(50),
        `first_name` VARCHAR(100),
        `last_name` VARCHAR(100),
        `email` VARCHAR(150),
        `signup_date` DATE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

    -- Create table for addresses.csv
    CREATE TABLE `imp_customer_zip_addresses` (
        `address_id` VARCHAR(50),
        `customer_id` VARCHAR(50),
        `street_address` VARCHAR(255),
        `city` VARCHAR(100),
        `postal_code` VARCHAR(20),
        `country` VARCHAR(50)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
END$$

DELIMITER ;
```
When `master_data_20251103.zip` arrives, processing proceeds as follows:

1. The `ServeImport.py` watcher, monitoring the `/home/axion/axion/local.documents/inbound/customer_master` directory, finds `master_data_20251103.zip`.
2. The file matches the `*.zip` `FileMask` of the `CUSTOMER_MASTER_ZIP` configuration.
3. The system executes the `PreSQL` script, `CALL sp_create_customer_import_tables();`, creating the `imp_customer_zip_customers` and `imp_customer_zip_addresses` tables.
4. Because `Unzip` is `'Y'`, the system extracts the contents of the zip file into a temporary location.
5. The extracted files are `customers.csv` and `addresses.csv`, which both match the `UnzipMask` of `*.csv`.
6. The system imports `customers.csv`, loading its data into the `imp_customer_zip_customers` table.
7. It then imports `addresses.csv`, loading its data into the `imp_customer_zip_addresses` table.
8. Finally, the system executes the `PostSQL` script, `CALL sp_merge_customer_import_data();`.

### Step 3: Data Merging (`PostSQL`)

The `PostSQL` stored procedure is responsible for validating and merging the data from the temporary tables into the final production tables.
**Stored Procedure `sp_merge_customer_import_data()`:**

```sql
DELIMITER $$

CREATE PROCEDURE sp_merge_customer_import_data()
BEGIN
    -- Use a transaction for data integrity
    START TRANSACTION;

    -- Merge customer data (update existing, insert new)
    INSERT INTO `customers` (customer_id, first_name, last_name, email, signup_date)
    SELECT customer_id, first_name, last_name, email, signup_date
    FROM `imp_customer_zip_customers`
    ON DUPLICATE KEY UPDATE
        first_name = VALUES(first_name),
        last_name = VALUES(last_name),
        email = VALUES(email),
        signup_date = VALUES(signup_date);

    -- Merge address data
    INSERT INTO `customer_addresses` (address_id, customer_id, street_address, city, postal_code, country)
    SELECT address_id, customer_id, street_address, city, postal_code, country
    FROM `imp_customer_zip_addresses`
    ON DUPLICATE KEY UPDATE
        street_address = VALUES(street_address),
        city = VALUES(city),
        postal_code = VALUES(postal_code),
        country = VALUES(country);

    COMMIT;

    -- Clean up the temporary tables
    DROP TABLE `imp_customer_zip_customers`;
    DROP TABLE `imp_customer_zip_addresses`;
END$$

DELIMITER ;
```
Finally:

1. After the `PostSQL` script completes successfully, the original `master_data_20251103.zip` is moved to the `ProcessedDir`. If any step had failed, it would have been moved to the `ArchiveDirectory` instead.
2. The system then triggers the `WorkflowCode` (`WF_CUSTOMER_IMPORT_COMPLETE`), which might send an email notification or start another business process.
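The success and failure file moves can be sketched with Python's standard `shutil` module. This is not the real `ServeImport.py` code; `finalize_import` and its parameters are illustrative names for the behavior described above:

```python
import shutil
from pathlib import Path

def finalize_import(source_file: str, processed_dir: str,
                    archive_dir: str, succeeded: bool) -> str:
    """Move the original file to ProcessedDir on success or to
    ArchiveDirectory on failure, returning the file's new path."""
    dest_dir = Path(processed_dir if succeeded else archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(source_file).name
    shutil.move(source_file, str(dest))
    return str(dest)
```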