Source: factory.import/ObjDataImportOrc.py
API for importing data from Apache ORC files.
ORC (Optimized Row Columnar) is a columnar storage format common in the
Hadoop/Hive ecosystem. Rows are streamed from the file one at a time
without loading the entire dataset into memory.
...
| Method | Signature | Description |
|---|---|---|
| prep_file | prep_file(filename: str) -> str | No preparation needed for ORC files. |
| open_file | open_file(filename: str) -> None | Opens the ORC file and prepares a streaming row iterator. |
| close_file | close_file() -> None | Closes the ORC reader. |
| column_list | column_list() -> list \| None | Returns column names from the ORC schema. |
| next_row | next_row() -> list \| str \| None | Returns the next row as a list of Python-native values in schema |
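
The calling sequence implied by the table above can be sketched as follows. This is an illustration under stated assumptions, not the module's own code: `FakeOrcImporter` is a hypothetical in-memory stand-in that only mimics the five method signatures, and `stream_rows` is a hypothetical helper. The sketch assumes that `next_row()` returns `None` at end of data and that a `str` return value carries an error or status message; neither behavior is confirmed by the table. The return value of `prep_file` is ignored because its meaning is not documented here.

```python
from typing import Iterator, Optional, Union


class FakeOrcImporter:
    """Hypothetical stand-in with the same method names as the table above.

    It reads from an in-memory list instead of an ORC file, so the
    driver loop below can run without any ORC library installed.
    """

    def __init__(self, columns, rows):
        self._columns = list(columns)
        self._source_rows = list(rows)
        self._iter = None  # set by open_file, cleared by close_file

    def prep_file(self, filename: str) -> str:
        # Nothing to prepare; echoing the filename is an assumption.
        return filename

    def open_file(self, filename: str) -> None:
        self._iter = iter(self._source_rows)

    def close_file(self) -> None:
        self._iter = None

    def column_list(self) -> Optional[list]:
        return list(self._columns) if self._iter is not None else None

    def next_row(self) -> Union[list, str, None]:
        if self._iter is None:
            return None
        return next(self._iter, None)


def stream_rows(importer, filename: str) -> Iterator[dict]:
    """Yield one dict per row, holding only the current row in memory."""
    importer.prep_file(filename)  # return value ignored (meaning undocumented)
    importer.open_file(filename)
    try:
        columns = importer.column_list() or []
        while True:
            row = importer.next_row()
            if row is None:
                # Assumed end-of-data signal.
                break
            if isinstance(row, str):
                # Assumed to be an error or status message.
                raise RuntimeError(row)
            yield dict(zip(columns, row))
    finally:
        # Runs when the generator is exhausted, closed, or raises.
        importer.close_file()


# Example with two in-memory rows standing in for ORC data.
demo = FakeOrcImporter(["id", "name"], [[1, "a"], [2, "b"]])
demo_rows = list(stream_rows(demo, "example.orc"))
```

Wrapping the loop in a generator keeps the one-row-at-a-time behavior visible to the caller, and the `try`/`finally` ensures `close_file()` is called even if a row fails partway through.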