NOTICE: All information contained herein is, and remains
the property of TechnoCore Automate.
The ObjImportApi class in ObjDataImportOrc.py imports data from Apache ORC
(Optimized Row Columnar) files. ORC is a columnar storage format widely used in
the Hadoop/Hive ecosystem and is the native format for many data warehouse exports.
Unlike the Parquet and Arrow importers, this importer streams rows from the file
one at a time without loading the entire dataset into memory, which makes it
suitable for large files.
Requires: pyorc
prep_file(filename) -> str
No-op; returns the filename unchanged.

open_file(filename)
Opens the ORC file and prepares a streaming row iterator. Column names are
taken from the ORC struct schema field names.

close_file()
Closes the ORC reader and releases references.

column_list() -> list
Returns column names from the ORC schema.

next_row() -> list | str
Returns the next row as a list of Python-native values in schema column order.
Returns "EOF" when all rows have been read.
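
To make the method contract above concrete, here is a minimal sketch of how a class with these five methods could sit on top of pyorc. It is not the actual ObjImportApi source: the class name OrcStreamSketch and the _attach helper are hypothetical, and the sketch assumes the top-level ORC type is a struct and that pyorc.Reader yields rows as tuples (its default representation).

```python
from typing import Any, Iterable, Iterator, List, Optional, Union


class OrcStreamSketch:
    """Hypothetical stand-in for ObjImportApi (not the real implementation)."""

    def __init__(self) -> None:
        self._handle: Optional[Any] = None
        self._rows: Optional[Iterator[Any]] = None
        self._columns: List[str] = []

    def prep_file(self, filename: str) -> str:
        return filename  # no-op, as documented

    def open_file(self, filename: str) -> None:
        import pyorc  # third-party; imported lazily so the class loads without it

        self._handle = open(filename, "rb")
        reader = pyorc.Reader(self._handle)
        # Assumption: the top-level ORC type is a struct whose `fields`
        # mapping is keyed by column name.
        self._attach(reader, list(reader.schema.fields))

    def _attach(self, rows: Iterable[Any], columns: List[str]) -> None:
        # Hypothetical seam: lets any row iterable stand in for pyorc.Reader.
        self._rows = iter(rows)
        self._columns = list(columns)

    def close_file(self) -> None:
        if self._handle is not None:
            self._handle.close()
        self._handle = None
        self._rows = None

    def column_list(self) -> List[str]:
        return list(self._columns)

    def next_row(self) -> Union[List[Any], str]:
        if self._rows is None:
            raise RuntimeError("open_file() must be called first")
        try:
            return list(next(self._rows))  # one row per call, never the whole file
        except StopIteration:
            return "EOF"  # sentinel described in the next_row() contract
```

Because next_row() pulls a single item from the underlying iterator on each call, memory use stays bounded by one row (plus whatever the reader buffers internally) regardless of file size.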
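from ObjDataImportOrc import ObjImportApi  # adjust the import path to your project layout

importer = ObjImportApi()
importer.open_file("warehouse_export.orc")
columns = importer.column_list()
while True:
    row = importer.next_row()
    if row == "EOF":  # sentinel returned once every row has been read
        break
    print(dict(zip(columns, row)))  # pair each value with its column name
importer.close_file()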
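
The loop above can also be wrapped in a generator so that callers use an ordinary for loop and close_file() runs even if processing fails partway through. The helper below is a sketch, not part of the module: iter_records is a hypothetical name, and it relies only on the five methods documented here.

```python
from typing import Any, Dict, Iterator


def iter_records(importer: Any, filename: str) -> Iterator[Dict[str, Any]]:
    """Yield each row as a {column: value} dict until the "EOF" sentinel appears."""
    importer.open_file(importer.prep_file(filename))
    try:
        columns = importer.column_list()
        while True:
            row = importer.next_row()
            if row == "EOF":
                return
            yield dict(zip(columns, row))
    finally:
        # Runs on normal exhaustion, on an exception, and when the
        # generator is closed after the caller stops iterating early.
        importer.close_file()
```

With this helper, the example reduces to a single loop over iter_records(ObjImportApi(), "warehouse_export.orc"). The file is opened lazily, on the first iteration rather than when the generator is created.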
Updated: 2026-03-13