verify_parquet_file

Documentation for the eth_defi.vault.base.verify_parquet_file function.

verify_parquet_file(path, expected_rows=None, expected_schema=None, required_columns=None)

Read back a parquet file after writing and verify its integrity.

Performs a metadata read-back (not a full table load) to check:

  1. The file can be opened and its metadata read without errors

  2. Row count matches expected_rows if provided

  3. All columns in expected_schema are present with the correct types (extra columns, e.g. native protocol columns, are permitted)

  4. All required_columns are present

Uses pq.read_metadata() and pq.read_schema() instead of pq.read_table() to avoid loading the full dataset into memory.

Call this function on the temporary file before the atomic replace, so that the previous good file is preserved if verification fails.

Parameters
  • path (Union[pathlib.Path, str]) – Path to the parquet file to verify.

  • expected_rows (Optional[int]) – If set, assert the file contains exactly this many rows.

  • expected_schema (pyarrow.Schema | None) – If set, verify that all columns in this schema are present with the correct types. Extra columns are permitted.

  • required_columns (Optional[list[str]]) – If set, verify these column names are present.

Returns

Verification result with metadata about the file.

Raises

ParquetVerificationError – If any verification check fails or the file cannot be read.

Return type

eth_defi.vault.base.ParquetVerificationResult