NOTICE: All information contained herein is, and remains
the property of TechnoCore Automate.
The ObjImportApi class in ObjDataImportArrow.py imports data from Apache
Arrow IPC files. It supports both the Arrow IPC file format (.arrow) and
Feather v2 (.feather) — these are the same on-disk format.
Arrow is a columnar in-memory format designed for zero-copy reads. It is
significantly faster to read than Parquet for repeated access because it
requires no decompression or decoding. Use it when data is produced and
consumed within the same pipeline (e.g., Spark → Axion import).
Requires: pyarrow (already in requirements.txt as a Parquet dependency).
prep_file(filename) -> str
    No-op; returns the filename unchanged.

open_file(filename)
    Reads the Arrow IPC file into memory using pyarrow.ipc.open_file. The
    full table is loaded once; columns are taken from the Arrow schema.

close_file()
    Releases the in-memory record list.

column_list() -> list
    Returns column names from the Arrow schema.

next_row() -> list | str
    Returns the next record as a list of Python-native values in schema
    column order. Returns "EOF" when all records have been read. Native
    Arrow types are converted to Python equivalents via table.to_pylist()
    at open time (e.g., int64 → int, utf8 → str, float64 → float).
importer = ObjImportApi()
importer.open_file("dataset.arrow")
columns = importer.column_list()
while True:
    row = importer.next_row()
    if row == "EOF":
        break
    print(dict(zip(columns, row)))
importer.close_file()
import pyarrow as pa
import pyarrow.ipc as ipc
table = pa.table({"id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]})
# new_file writes the Arrow IPC file format; .arrow and .feather name
# the same on-disk format.
with ipc.new_file("output.arrow", table.schema) as writer:
    writer.write_table(table)
Updated: 2026-03-13