event_reader.parquet_block_data_store
Documentation for eth_defi.event_reader.parquet_block_data_store Python module.
Parquet dataset backed block data storage like block headers or trades.
Classes
ParquetDatasetBlockDataStore – Store block data as Parquet dataset.
Exceptions
NoGapsWritten – Do not allow gaps in data.
- exception NoGapsWritten
Bases: Exception
Do not allow gaps in data.
- __init__(*args, **kwargs)
- __new__(**kwargs)
- add_note()
Exception.add_note(note) – add a note to the exception
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
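A handling sketch, assuming this exception is raised when save() finds missing block numbers while the gap check is enabled (the exact trigger is an assumption based on the name). Here store is a previously constructed ParquetDatasetBlockDataStore (see the class below) and header_df a hypothetical block-header frame:

```python
from eth_defi.event_reader.parquet_block_data_store import NoGapsWritten

try:
    store.save(header_df)  # check_contains_all_blocks defaults to True
except NoGapsWritten:
    # Some block numbers are missing from the frame; either backfill them
    # first, or pass check_contains_all_blocks=False when gaps are expected.
    raise
```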
- class ParquetDatasetBlockDataStore
Bases: eth_defi.event_reader.block_data_store.BlockDataStore
Store block data as Parquet dataset.
Partitions are keyed by block number.
Partitioning allows fast incremental updates by overwriting only the last two partitions.
- Parameters
path – Directory where the Parquet partitions and the dataset metadata file are stored
partition_size – Number of blocks per partition
- __init__(path, partition_size=100000)
- Parameters
path (pathlib.Path) – Directory where the Parquet partitions and the dataset metadata file are stored
partition_size – Number of blocks per partition
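A minimal construction sketch; the target directory is illustrative, and the partition size is kept explicit at its documented default:

```python
from pathlib import Path

from eth_defi.event_reader.parquet_block_data_store import ParquetDatasetBlockDataStore

# Keep Parquet partitions and the dataset metadata under this directory;
# partition_size matches the documented default of 100,000 blocks.
store = ParquetDatasetBlockDataStore(
    Path("./block-data/headers"),
    partition_size=100_000,
)
```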
- load(since_block_number=0)
Load data from parquet.
- Parameters
since_block_number (int) – May return rows earlier than this block if the block falls in the middle of a partition
- Return type
pandas.core.frame.DataFrame
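A load sketch, continuing from the construction example above; the starting block number is illustrative, and the block_number column is an assumption about the frame layout:

```python
# Rows from blocks before 15_000_000 may also come back if that block
# sits in the middle of a partition, so trim to the exact range afterwards.
df = store.load(since_block_number=15_000_000)

# Assumes the frame carries a block_number column; adjust to your schema.
df = df[df["block_number"] >= 15_000_000]
```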
- save(df, since_block_number=0, check_contains_all_blocks=True)
Save all data to the Parquet dataset.
If block headers have already been written, existing data is overwritten on a per-partition basis.
- Parameters
since_block_number (int) – Write only data from this block number onwards (inclusive)
check_contains_all_blocks – Check that we have at least one data record for every block. Note that trades might not happen on every block.
df (pandas.core.frame.DataFrame) – Block data to write
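A save sketch with a hypothetical trade-like frame; the column names are illustrative, and the gap check is disabled because trades do not occur on every block:

```python
import pandas as pd

# Hypothetical trade rows; the real column layout depends on what you store.
df = pd.DataFrame({
    "block_number": [100, 100, 104],
    "tx_hash": ["0xaa", "0xbb", "0xcc"],
    "amount": [1.5, 0.25, 3.0],
})

# The rows skip blocks 101-103, so the per-block completeness check is disabled.
store.save(df, check_contains_all_blocks=False)
```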
- save_incremental(df)
Write all partitions that are missing from the data already on disk.
At least the last two partitions are always rewritten.
The incoming data may contain gaps, and the data already written on disk may contain gaps as well, so some heuristics are applied to decide which partitions to write.
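An incremental update sketch, assuming the full combined frame is passed and the store works out which partitions need rewriting; new_rows_df is a hypothetical frame of freshly read block data:

```python
import pandas as pd

# Combine what is already on disk with newly read rows.
existing = store.load()
combined = pd.concat([existing, new_rows_df], ignore_index=True)

# Only the partitions affected by the new rows (at least the last two)
# get rewritten on disk.
store.save_incremental(combined)
```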