Axion supports 22 data formats and transports for import
and export. This guide describes each format, when to use
it, and the libraries that power them.
| Format | Import | Export | Extension | Library |
|---|---|---|---|---|
| CSV | Yes | Yes | .csv | stdlib csv |
| JSON | Yes | Yes | .json | orjson / json |
| NDJSON | Yes | Yes | .ndjson | line-delimited JSON |
| XLSX | Yes | Yes | .xlsx | openpyxl |
| XLS | Yes | Yes | .xls | xlrd / xlwt |
| ODS | Yes | Yes | .ods | odfpy |
| XML | Yes | Yes | .xml | lxml / defusedxml |
| Parquet | Yes | Yes | .parquet | pyarrow |
| ORC | Yes | Yes | .orc | pyorc |
| Arrow | Yes | Yes | .arrow | pyarrow.ipc |
| Avro | Yes | Yes | .avro | fastavro |
| YAML | Yes | Yes | .yaml | pyyaml |
| SQLite | Yes | Yes | .db / .sqlite | stdlib sqlite3 |
| HTML | Yes | Yes | .html | pandas / stdlib |
| Google Sheets | Yes | Yes | URL/ID | gspread |
| SFTP | Yes | Yes | (any) | pysftp + delegate |
| FWF | Yes | — | .fwf / .txt | pandas |
| Log | Yes | — | .log | regex parser |
| SQL Query | Yes | — | (query) | database cursor |
| Markdown | — | Yes | .md | string formatting |
| MySQL Table | — | Yes | (table) | CREATE TABLE DDL |
| CSV Process | — | Yes | .csv | chunked streaming |
## CSV

The most common data exchange format: plain text with
one record per line, fields separated by a delimiter
(comma by default).
When to use: Universal interchange between any two
systems. Human-readable. Supported by every spreadsheet
and database tool.
Limitations: No data types (everything is a string),
no nested data, no schema. Large files are slow to parse.
Encoding issues are common with non-ASCII data.
Import: ObjDataImportCsv — auto-detects separator
(comma, semicolon, tab, pipe). Handles quoted fields
and embedded newlines.
Export: ObjDataExportCsv — configurable delimiter
and quote character. ObjDataExportCsvProcess provides
chunked streaming for very large datasets.
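The ObjData classes are Axion-internal; as an illustration only, here is a minimal sketch of the kind of delimiter sniffing the importer performs, using just the stdlib (the function name is hypothetical):

```python
import csv

def read_csv_rows(path):
    """Minimal sketch: auto-detect the delimiter, then yield dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        sample = fh.read(4096)
        fh.seek(0)
        # Restrict the sniffer to the delimiters the importer supports.
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
        yield from csv.DictReader(fh, dialect=dialect)
```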
## JSON

Structured data format with nested objects and arrays.
The standard for web APIs and modern data exchange.
When to use: API integrations, configuration data,
documents with nested structure. Self-describing — no
external schema needed.
Limitations: Verbose for large tabular data. No
native date/time type. Entire file must fit in memory.
Import: ObjDataImportJson — expects a JSON array
of objects. Column names from the keys of the first
record.
Export: ObjDataExportJson — writes a JSON array
of objects with datetime serialisation support.
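A minimal sketch of that import contract — a top-level array of objects, columns taken from the keys of the first record — assuming orjson is an optional fast path:

```python
import json

def load_json_records(path):
    """Read a JSON array of objects; derive columns from the first record."""
    try:
        import orjson  # optional fast path
        with open(path, "rb") as fh:
            records = orjson.loads(fh.read())
    except ImportError:
        with open(path, encoding="utf-8") as fh:
            records = json.load(fh)
    if not isinstance(records, list) or not records:
        raise ValueError("expected a non-empty JSON array of objects")
    return list(records[0].keys()), records
```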
## NDJSON

One JSON object per line. Also called JSON Lines
(.jsonl). Used for streaming and log data.
When to use: Streaming data pipelines, log files,
append-only data. Can be processed line by line without
loading the entire file.
Limitations: No schema. Each line is independent —
inconsistent keys across lines are possible.
Import: ObjDataImportNdjson — reads line by line.
Export: ObjDataExportNdjson — writes one JSON
object per line.
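The line-by-line property is what makes the format stream-friendly; a sketch of constant-memory reading:

```python
import json

def iter_ndjson(path):
    """Yield one dict per non-empty line; memory use stays constant."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```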
## XLSX

Modern Excel format (2007+): XML files inside a ZIP
container. Supports multiple sheets, formatting,
formulas, and data types.
When to use: Business reports, data delivery to
non-technical users, spreadsheets with multiple sheets.
Limitations: Large files are slow. Max 1,048,576
rows per sheet. Binary format — not diff-friendly.
Import: ObjDataImportXlsx — reads specified
sheet by name or number. Supports _Sheetname and
_Sheetnumber attributes.
Export: ObjDataExportXlsx — writes with auto-sized
columns and header styling via openpyxl.
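A sketch of name-or-number sheet selection using openpyxl directly; read_sheet is a hypothetical helper, not the Axion API:

```python
from openpyxl import load_workbook

def read_sheet(path, sheet=None):
    """Select a worksheet by name (str), index (int), or default (active)."""
    wb = load_workbook(path, read_only=True, data_only=True)
    if isinstance(sheet, int):
        ws = wb.worksheets[sheet]
    elif sheet:
        ws = wb[sheet]
    else:
        ws = wb.active
    rows = ws.iter_rows(values_only=True)
    header = [str(c) for c in next(rows)]
    return header, [dict(zip(header, row)) for row in rows]
```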
## XLS

Pre-2007 Excel format. Limited to 65,536 rows and
256 columns.
When to use: Legacy system compatibility only.
Prefer XLSX for new work.
Import: ObjDataImportXls via xlrd.
Export: ObjDataExportXls via xlwt.
## ODS

Open standard spreadsheet format used by LibreOffice
and OpenOffice. XML-based inside a ZIP container.
When to use: Open-source toolchains, government
data (ODS is often a compliance requirement), Linux
environments without Excel.
Import: ObjDataImportOds via odfpy.
Export: ObjDataExportOds via odfpy.
## XML

Structured, hierarchical markup format. Verbose but
self-describing with schemas (XSD) and namespaces.
When to use: SOAP APIs, legacy enterprise systems,
configuration files, document interchange (PMML,
SVG, RSS).
Limitations: Verbose. Parsing is slower than JSON.
Namespaces add complexity.
Import: ObjDataImportXml via lxml with
defusedxml for XXE protection.
Export: ObjDataExportXml — generates well-formed
XML with configurable root and row element names.
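A sketch of safe parsing with defusedxml; since the exporter's element names are configurable, the "row" tag default below is an assumption:

```python
from defusedxml import ElementTree  # ElementTree API with XXE protection

def iter_xml_rows(path, row_tag="row"):
    """Yield one dict per row element, child tags as column names."""
    tree = ElementTree.parse(path)
    for elem in tree.getroot().iter(row_tag):
        yield {child.tag: child.text for child in elem}
```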
## FWF (Fixed-Width)

Each field occupies a fixed number of characters per
line. No delimiters. Common in mainframe and banking
data feeds.
When to use: Mainframe data interchange, COBOL
system exports, bank transaction files.
Import only: ObjDataImportFwf via
pandas.read_fwf().
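Since pandas.read_fwf() does the parsing, a sketch with an illustrative layout (the column boundaries and names here are invented, not a real feed):

```python
import pandas as pd

df = pd.read_fwf(
    "transactions.txt",
    colspecs=[(0, 10), (10, 18), (18, 30)],  # (start, end) per field
    names=["account", "date", "amount"],
    dtype=str,  # keep leading zeros intact
)
records = df.to_dict(orient="records")
```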
## Log

Parses structured log files using configurable regex
patterns. Extracts fields from each log line.
When to use: Server access logs, application logs,
syslog files.
Import only: ObjDataImportLog — regex-based
field extraction.
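A sketch of named-group extraction; the pattern below matches common Apache access-log lines and is purely illustrative:

```python
import re

ACCESS_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log(path):
    """Yield one dict of named groups per matching line."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = ACCESS_LOG.match(line)
            if m:
                yield m.groupdict()
```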
## Parquet

Columnar storage format designed for analytics. Highly
compressed with predicate pushdown support. The standard
for data lakes (S3, HDFS, Databricks).
When to use: Large analytical datasets, data lake
storage, Spark/Pandas interchange, archival. 10-100x
smaller than CSV for typical data.
Limitations: Not human-readable. Requires pyarrow
or fastparquet. Not suitable for streaming writes.
Import: ObjDataImportParquet via pyarrow.
Export: ObjDataExportParquet via pyarrow.
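The round trip through pyarrow is short; a sketch:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
pq.write_table(table, "data.parquet", compression="snappy")

loaded = pq.read_table("data.parquet")
records = loaded.to_pylist()  # back to list-of-dicts
```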
## ORC

Columnar format from the Hadoop ecosystem. Similar to
Parquet but optimised for Hive queries. Better
compression for string-heavy data.
When to use: Hadoop/Hive integration, data
warehousing, when ORC is the ecosystem standard.
Limitations: Requires pyorc. Linux only — no
Windows wheels. Less widely adopted than Parquet
outside Hadoop.
Import: ObjDataImportOrc via pyorc.
Export: ObjDataExportOrc via pyorc.
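A sketch of the pyorc round trip; unlike pyarrow, pyorc takes a struct schema string and works with plain tuples (the schema here is illustrative):

```python
import pyorc

with open("data.orc", "wb") as fh:
    with pyorc.Writer(fh, "struct<id:int,name:string>") as writer:
        writer.write((1, "a"))
        writer.write((2, "b"))

with open("data.orc", "rb") as fh:
    rows = list(pyorc.Reader(fh))  # tuples in schema order
```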
## Arrow

In-memory columnar format for zero-copy data exchange
between processes. The IPC file format (Feather v2)
serialises Arrow tables to disk.
When to use: High-performance inter-process data
sharing, Pandas/Polars interchange, ML model input
pipelines. Fastest possible read/write for columnar
data.
Limitations: Files are large (minimal compression
by default). Schema must be known at write time.
Import: ObjDataImportArrow via pyarrow.ipc.
Export: ObjDataExportArrow via pyarrow.ipc.
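A sketch of the IPC (Feather v2) file round trip with pyarrow.ipc:

```python
import pyarrow as pa

table = pa.table({"id": [1, 2], "name": ["a", "b"]})

# Write the Arrow IPC file format; the schema is fixed at write time.
with pa.OSFile("data.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# open_file accepts a path or a (memory-mapped) file for zero-copy reads.
loaded = pa.ipc.open_file("data.arrow").read_all()
```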
## Avro

Row-oriented binary format with an embedded schema.
The standard for Kafka message serialisation and
schema evolution in event-driven systems.
When to use: Kafka producers/consumers, event
streams, schema registry integration. Compact binary
encoding with schema evolution support.
Limitations: Not human-readable. Requires
fastavro. Schema must be defined at write time.
Import: ObjDataImportAvro via fastavro.reader.
Export: ObjDataExportAvro via fastavro.writer.
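A sketch of the fastavro round trip; note that the schema must be supplied to the writer, while the reader recovers it from the file:

```python
from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "type": "record",
    "name": "Row",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
})

with open("data.avro", "wb") as fh:
    writer(fh, schema, [{"id": 1, "name": "a"}])

with open("data.avro", "rb") as fh:
    records = list(reader(fh))  # schema is embedded in the file
```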
## SQLite

Self-contained, serverless, zero-configuration SQL
database in a single file. The most widely deployed
database engine in the world.
When to use: Portable data exchange as a single
file, offline/mobile applications, testing, embedded
analytics. Better than CSV when you need types,
indexes, or multiple tables.
Limitations: Single-writer concurrency. Not
suitable for multi-user server workloads.
Import: ObjDataImportSqlite — reads from
specified table (auto-detects first table if not
specified). Uses stdlib sqlite3.
Export: ObjDataExportSqlite — creates table
with columns, inserts all rows. Configurable table
name via self.table_name.
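A sketch of the auto-detect-first-table behaviour using stdlib sqlite3 (read_sqlite is a hypothetical helper, not the Axion API):

```python
import sqlite3

def read_sqlite(path, table=None):
    """Read all rows; fall back to the first table if none is named."""
    con = sqlite3.connect(path)
    try:
        if table is None:
            first = con.execute(
                "SELECT name FROM sqlite_master WHERE type='table' LIMIT 1"
            ).fetchone()
            if first is None:
                raise ValueError("database contains no tables")
            table = first[0]
        cur = con.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    finally:
        con.close()
```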
## SQL Query

Imports data by executing a SQL query against the
connected database. Not a file format — reads directly
from the live database.
When to use: Database-to-database transfers,
extracting subsets of data with WHERE clauses, joining
tables during import.
Import only: ObjDataImportQuery — executes the
query and iterates over the result cursor.
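Because it reads from a live cursor, the pattern works with any DB-API 2.0 driver; a sketch:

```python
def iter_query(cursor, query, params=()):
    """Stream rows as dicts; column names come from cursor.description."""
    cursor.execute(query, params)
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))
```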
## MySQL Table

Exports the table structure as a CREATE TABLE
statement. Not data — just the schema definition.
Export only: ObjDataExportMysqltable.
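A sketch of what a schema-only export amounts to; the explicit (name, type) pairs are an assumption — the real exporter may infer column types:

```python
def create_table_ddl(table, columns):
    """Render a CREATE TABLE statement from (name, sql_type) pairs."""
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE `{table}` (\n{cols}\n);"

print(create_table_ddl("orders", [("id", "INT"), ("note", "TEXT")]))
```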
## HTML

Exports data as an HTML `<table>` with inline CSS.
Imports by extracting `<table>` elements from HTML
files or web pages.
When to use: Email reports, web page embedding,
scraping data from web pages.
Import: ObjDataImportHtml — uses
pandas.read_html() to extract the first table.
Supports both local files and URLs.
Export: ObjDataExportHtml — generates a complete
HTML document with `<thead>`, `<tbody>`, and inline
CSS for readability.
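On the import side, pandas does the heavy lifting; a sketch (the URL is a placeholder):

```python
import pandas as pd

# read_html returns one DataFrame per <table> element;
# the importer described above takes the first.
tables = pd.read_html("https://example.com/report.html")
records = tables[0].to_dict(orient="records")
```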
## Markdown

Exports data as a Markdown table. Human-readable in
plain text and renders nicely on GitHub, Bitbucket,
and documentation sites.
When to use: Documentation, README files, wiki
content, Slack/Teams messages.
Export only: ObjDataExportMarkdown.
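The export is plain string formatting; a sketch:

```python
def to_markdown_table(columns, rows):
    """Render a list of rows as a GitHub-flavoured Markdown table."""
    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "---|" * len(columns),
    ]
    lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)
```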
## YAML

Human-readable data serialisation format. Superset of
JSON with cleaner syntax. The standard for Kubernetes
configs, CI/CD pipelines, and Ansible playbooks.
When to use: Configuration data, definition
imports, human-editable data files. Preferred over
JSON when humans need to read/write the file.
Limitations: Indentation-sensitive (easy to break).
Slower to parse than JSON. Security risk with unsafe
loaders (always use safe_load).
Import: ObjDataImportYaml — expects list-of-dicts
at top level or nested inside a dict key. Uses
yaml.safe_load() only.
Export: ObjDataExportYaml — writes list-of-dicts
with yaml.dump().
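A sketch of the safe round trip the importer insists on:

```python
import yaml

with open("data.yaml", encoding="utf-8") as fh:
    records = yaml.safe_load(fh)  # never yaml.load() on untrusted input

with open("out.yaml", "w", encoding="utf-8") as fh:
    yaml.dump(records, fh, default_flow_style=False, allow_unicode=True)
```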
## Google Sheets

Read from and write to Google Spreadsheets via the
Google Sheets API using a service account.
When to use: Shared team data, live dashboards,
data collection forms, business user collaboration.
Setup: place the service account key at
data.config/google_service_account.json.

Config:

```yaml
base:
  google:
    service_account_file: data.config/google_service_account.json
```
Import: ObjDataImportGsheets — reads all records
from the first worksheet (or named sheet via
_Sourcetable). All values returned as strings.
Export: ObjDataExportGsheets — clears the
worksheet and writes header + data rows. Creates or
opens spreadsheet by URL or name.
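A sketch of both directions with gspread, assuming the service-account file from the config above (the spreadsheet URL is a placeholder):

```python
import gspread

gc = gspread.service_account(
    filename="data.config/google_service_account.json"
)
ws = gc.open_by_url("https://docs.google.com/spreadsheets/d/...").sheet1

records = ws.get_all_records()  # import: list of dicts, values as strings

ws.clear()                      # export: clear, then header + data rows
ws.append_rows([["id", "name"], ["1", "a"]])
```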
## SFTP

Transport wrapper — not a data format itself. Downloads
a file from an SFTP server, detects its format from the
extension, and delegates to the matching format handler.
When to use: Importing files from external partners,
bank data feeds, scheduled file drops on remote servers.
Connection methods (in priority order):

1. Inline URL: sftp://user:pass@host:port/path/file.csv
2. _Host, _Port, _Username, _Password attributes
3. _Remoteconnection — a connection name from the
   def_remoteconnections table
4. The traderftp section in config.yaml

Import: ObjDataImportSftp — downloads to a temp
file, delegates to the matching format handler (CSV,
JSON, XLSX, etc.), and cleans up the temp file on close.
Export: ObjDataExportSftp — writes to local temp
file via format handler, uploads via SFTP, cleans up.
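A sketch of the download-then-delegate pattern with pysftp, assuming URL-style credentials (the host, paths, and delegate step are placeholders):

```python
import os
import tempfile
import pysftp

with pysftp.Connection("sftp.example.com",
                       username="user", password="pass") as sftp:
    fd, local_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        sftp.get("/path/file.csv", local_path)
        # ...hand local_path to the CSV import handler here...
    finally:
        os.remove(local_path)
```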
## Format Selection Guide

| Use Case | Recommended Format |
|---|---|
| General data exchange | CSV or JSON |
| Business reports | XLSX |
| Data lake / analytics | Parquet |
| Kafka / event streams | Avro |
| API integration | JSON or NDJSON |
| Database portable copy | SQLite |
| Human-editable config | YAML |
| Web scraping | HTML import |
| Email reports | HTML export |
| Documentation | Markdown export |
| Mainframe / banking | FWF import |
| Team collaboration | Google Sheets |
| External partner files | SFTP + any format |
## Performance and Compatibility

Each column ranks formats best-first.

| Fastest Read | Fastest Write | Smallest File |
|---|---|---|
| Arrow | Arrow | Parquet |
| Parquet | Parquet | ORC |
| SQLite | SQLite | Avro |
| CSV | CSV | CSV (gzipped) |

| Most Compatible | Most Portable | Most Structured |
|---|---|---|
| CSV | SQLite | Parquet |
| JSON | JSON | Avro |
| XLSX | CSV | Arrow |
## Adding a New Format

To add a new import format:

1. Create factory.import/ObjDataImportXxx.py.
2. Define class ObjImportApi(ObjData.ObjData).
3. Implement prep_file, open_file, close_file, column_list, next_row.
4. Document it in factory.import/ObjDataImportXxx.md.
5. Add tests under resource.test/pytests/factory.import/.

To add a new export format:

1. Create factory.export/ObjDataExportXxx.py.
2. Define class ObjExportApi(ObjExportBase).
3. Implement OpenFile, CloseFile, WriteHeader, WriteData, EscapeCSV, DefaultExtension.
4. Document it in factory.export/ObjDataExportXxx.md.
5. Add tests under resource.test/pytests/factory.export/.

The format is auto-discovered by ObjDataImport.factory_object()
via case-insensitive filename matching against the file extension.