
NOTICE: All information contained herein is, and remains
the property of TechnoCore.
The intellectual and technical concepts contained
herein are proprietary to TechnoCore and dissemination of this information or reproduction of this material
is strictly forbidden unless prior written permission is obtained
from TechnoCore.
The ObjImportApi class in ObjDataImportParquet.py is designed for importing data from Apache Parquet files. It leverages the pandas library to read Parquet files into a DataFrame, making it easy to process columnar data efficiently.
It uses pandas.read_parquet for fast and efficient reading of Parquet files.

__init__(self, DB=0)
Initializes the ObjImportApi object, setting up properties such as the DataFrame (df), row iterator, and column list.

PrepFile(self, filename: str) -> str
Prepares the file for import. Parquet files need no special preparation, so it returns the filename unchanged.

OpenFile(self, filename: str)
Opens a Parquet file and prepares it for reading. It reads the file into a pandas DataFrame, extracts and sanitizes the column names, and sets up an iterator over the rows.

CloseFile(self)
Releases the resources used by the Parquet file by clearing the DataFrame and row iterator from memory.

ColumnList(self) -> list
Returns the list of sanitized column names from the Parquet file.

NextRow(self)
Retrieves the next row from the Parquet file and returns it as a list of values. If there are no more rows, it returns the string 'EOF'.
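The methods described above could be implemented roughly as follows. This is a minimal sketch, not the actual ObjDataImportParquet.py source: the class name ParquetImporterSketch, the helper sanitize_column, the _read_frame hook, and the sanitization rule itself (runs of non-word characters replaced by an underscore) are all assumptions made for illustration. Only pandas.read_parquet and DataFrame.itertuples are real pandas calls.

```python
import re
from typing import Any, Iterator, List, Optional, Union


def sanitize_column(name: Any) -> str:
    """Hypothetical rule: collapse runs of non-word characters into '_'."""
    cleaned = re.sub(r"\W+", "_", str(name)).strip("_")
    return cleaned or "col"


class ParquetImporterSketch:
    """Illustrative stand-in for ObjImportApi (assumed behavior, not the real class)."""

    EOF = "EOF"

    def __init__(self, DB: int = 0) -> None:
        self.DB = DB                      # kept only to mirror the documented signature
        self.df: Optional[Any] = None     # DataFrame holding the file contents
        self.row_iter: Optional[Iterator[tuple]] = None
        self.columns: List[str] = []

    def PrepFile(self, filename: str) -> str:
        # Parquet needs no preprocessing, so the name is returned unchanged.
        return filename

    def _read_frame(self, filename: str) -> Any:
        # Imported here so the module loads even where pandas is not installed.
        # pandas.read_parquet needs a Parquet engine (pyarrow or fastparquet).
        import pandas as pd
        return pd.read_parquet(filename)

    def OpenFile(self, filename: str) -> None:
        self.df = self._read_frame(filename)
        self.columns = [sanitize_column(c) for c in self.df.columns]
        # index=False, name=None yields plain tuples of the row values.
        self.row_iter = self.df.itertuples(index=False, name=None)

    def CloseFile(self) -> None:
        # Drop the references so the DataFrame can be garbage collected.
        self.df = None
        self.row_iter = None

    def ColumnList(self) -> List[str]:
        return list(self.columns)

    def NextRow(self) -> Union[List[Any], str]:
        if self.row_iter is None:
            return self.EOF
        try:
            return list(next(self.row_iter))
        except StopIteration:
            return self.EOF
```

Note that pandas.read_parquet loads the whole file into memory, so in a design like this NextRow iterates over data that is already loaded rather than streaming it from disk.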
# Import the class from its module
from ObjDataImportParquet import ObjImportApi

# Create an instance of the importer
parquet_importer = ObjImportApi()

# Open a Parquet file
parquet_importer.OpenFile("userdata.parquet")

# Get the column list
columns = parquet_importer.ColumnList()
print("Columns:", columns)

# Iterate over the rows until NextRow() signals the end of the file
while True:
    row = parquet_importer.NextRow()
    if row == 'EOF':
        break
    print("Row data:", row)

# Close the file
parquet_importer.CloseFile()
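
The open, read, and close sequence above can be wrapped in a generator so that callers use an ordinary for loop and the file is closed even if processing fails partway through. The sketch below is one possible approach: iter_rows is a hypothetical helper that is not part of ObjDataImportParquet.py, and it assumes only the documented methods PrepFile, OpenFile, NextRow, and CloseFile and the 'EOF' return value.

```python
from typing import Any, Iterator, List


def iter_rows(importer: Any, filename: str, eof: str = "EOF") -> Iterator[List[Any]]:
    """Yield rows from an ObjImportApi-style importer until it returns the EOF marker."""
    importer.OpenFile(importer.PrepFile(filename))
    try:
        while True:
            row = importer.NextRow()
            # Rows are lists, so only a string equal to the marker ends the loop.
            if isinstance(row, str) and row == eof:
                break
            yield row
    finally:
        # Runs on normal exhaustion, on errors, and when the generator is closed early.
        importer.CloseFile()
```

With this helper the loop reduces to `for row in iter_rows(ObjImportApi(), "userdata.parquet"): print("Row data:", row)`.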