ParquetDatasetBlockDataStore
Documentation for the eth_defi.event_reader.parquet_block_data_store.ParquetDatasetBlockDataStore Python class.
- class ParquetDatasetBlockDataStore[source]
Store block data as Parquet dataset.
Partitions are keyed by block number.
Partitioning allows fast incremental updates, by overwriting only the last partitions with new data instead of rewriting the whole dataset.
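A minimal usage sketch. The directory path and the DataFrame layout with a block_number column are assumptions for illustration, not part of the documented signatures:

```python
from pathlib import Path

import pandas as pd

from eth_defi.event_reader.parquet_block_data_store import ParquetDatasetBlockDataStore

# Assumed layout: one row per block, keyed by a block_number column,
# which the store uses to work out partition boundaries.
df = pd.DataFrame({
    "block_number": range(1, 250_001),
    "gas_used": [0] * 250_000,
})

# Hypothetical directory; with the default partition_size of 100,000
# this writes partitions starting at blocks 0, 100,000 and 200,000.
store = ParquetDatasetBlockDataStore(Path("/tmp/block-data"))
store.save(df)

# Later, after reading more blocks, pass the grown frame;
# per the summary below, only the partitions missing from
# the data on disk are rewritten.
more = pd.DataFrame({
    "block_number": range(250_001, 260_001),
    "gas_used": [0] * 10_000,
})
store.save_incremental(pd.concat([df, more], ignore_index=True))
```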
Methods summary
__init__(path[, partition_size]) – Set up the store in the given directory.
floor_block_number_to_partition_start(n) – Floor a block number to the start of its partition.
is_virgin() – Has this store any stored data.
load([since_block_number]) – Load data from parquet.
peak_last_block() – Return the last block number stored on the disk.
save(df[, since_block_number, ...]) – Save all data to parquet.
save_incremental(df) – Write all partitions we are missing from the data.
- __init__(path, partition_size=100000)[source]
- Parameters
path (pathlib.Path) – Directory where the dataset partitions and their metadata file are stored
partition_size – Number of blocks per partition
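For example, to use 50,000-block partitions instead of the default (the path is hypothetical):

```python
from pathlib import Path

from eth_defi.event_reader.parquet_block_data_store import ParquetDatasetBlockDataStore

# Smaller partitions mean finer-grained rewrites on incremental saves.
store = ParquetDatasetBlockDataStore(Path("./uniswap-v2-trades"), partition_size=50_000)
```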
- load(since_block_number=0)[source]
Load data from parquet.
- Parameters
since_block_number (int) – May return rows earlier than this block number if the block falls in the middle of a partition, because whole partitions are read at once
- Return type
pandas.core.frame.DataFrame
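Because whole partitions are read, a sketch of trimming the result to an exact starting block. The path and the block_number column are assumptions about how the data was stored:

```python
from pathlib import Path

from eth_defi.event_reader.parquet_block_data_store import ParquetDatasetBlockDataStore

store = ParquetDatasetBlockDataStore(Path("/tmp/block-data"))

# With the default partition size, block 1,234,567 falls inside the
# partition starting at 1,200,000, so rows from 1,200,000 onwards
# may come back.
df = store.load(since_block_number=1_234_567)

# Trim to an exact cut-off if needed.
df = df[df["block_number"] >= 1_234_567]
```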
- save(df, since_block_number=0, check_contains_all_blocks=True)[source]
Save all data to parquet.
If there are existing block headers written, data will be overwritten on a per-partition basis.
- Parameters
since_block_number (int) – Write only data from this block number onwards (inclusive)
check_contains_all_blocks – Check that there is at least one data record for every block. Note that trades might not happen on every block, so this check may need to be disabled for sparse event data.
df (pandas.core.frame.DataFrame) –
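A sketch of saving sparse event data; the trade frame and the path are made up for illustration:

```python
from pathlib import Path

import pandas as pd

from eth_defi.event_reader.parquet_block_data_store import ParquetDatasetBlockDataStore

store = ParquetDatasetBlockDataStore(Path("/tmp/trade-data"))

# Trades do not happen on every block, so the per-block
# completeness check is disabled here.
trades = pd.DataFrame({
    "block_number": [100, 105, 230_000],
    "amount": [1.0, 2.5, 0.7],
})
store.save(trades, check_contains_all_blocks=False)

# Rewrite only the partitions covering blocks >= 200,000;
# earlier partitions on disk stay untouched.
store.save(trades, since_block_number=200_000, check_contains_all_blocks=False)
```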