ERC-4626: scanning vault data

This tutorial shows how to read ERC-4626 vault historical data. We need this data to run the other notebooks here, which analyse the performance of the vaults. The tutorial shows how to create a local vault database with a list of vaults across all chains. It is based on the open-source pipeline in the web3-ethereum-defi repository.

You will need:

  • Expert Python knowledge to work with complex Python projects

  • JSON-RPC archive nodes for various chains, e.g. from dRPC

  • Hypersync account

  • UNIX or Windows Subsystem for Linux (WSL) environment

  • Preferably screen, tmux or a similar utility to run long-running processes in the background on servers

  • Some hours of patience

This is a three-step scripted process: discovering vaults, scanning their historical prices, and cleaning the data.
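Assuming the default repository layout, the three steps condense to the commands below, each covered in its own section later in this tutorial:

```shell
# Overview of the three steps for a single chain.
# Point JSON_RPC_URL to an HTTPS archive node for the chain first.
export JSON_RPC_URL=...

python scripts/erc-4626/scan-vaults.py   # 1. discover vaults on the chain
python scripts/erc-4626/scan-prices.py   # 2. scan historical prices
python scripts/erc-4626/clean-prices.py  # 3. clean and denormalise the data
```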

Note

The open-source pipeline code must be updated to accommodate new chains, with chain ids, names and so on.

Scanning vaults for a single chain

Discovering vaults

To scan a single chain, we first need to discover the vaults on it. This is done with the scan-vaults.py script.

# Point to HTTPS RPC server for your chain
export JSON_RPC_URL=...
python scripts/erc-4626/scan-vaults.py

This script creates the file ~/.tradingstrategy/vaults/vault-db.pickle containing the vaults found on the chain, plus all vaults from other chains we have scanned so far.
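The pickle file is a plain Python object, so it can be inspected directly. A minimal sketch of loading it — the example entry below is hypothetical; the real schema is defined by the web3-ethereum-defi pipeline code and may differ:

```python
import pickle
from pathlib import Path

# Path where scan-vaults.py writes its database
db_path = Path.home() / ".tradingstrategy" / "vaults" / "vault-db.pickle"

# Hypothetical example entry, assumed to be keyed by (chain id, address).
# The real structure is defined by the pipeline code, not documented here.
example_db = {
    (999, "0x0000000000000000000000000000000000000001"): {
        "name": "Example vault",
        "chain_id": 999,
    },
}

if db_path.exists():
    # Load the real database written by scan-vaults.py
    with db_path.open("rb") as f:
        vault_db = pickle.load(f)
else:
    # Fall back to the toy example so the snippet runs anywhere
    vault_db = example_db

print(f"{len(vault_db)} vaults in the database")
```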


Scanning historical prices

After discovering the vaults on a chain, we scan their historical performance. This is done with the scan-prices.py script. It reads the vaults from the database file created by the previous step, then polls JSON-RPC archive nodes to extract historical prices and parameters such as performance fees.

The scan process is stateful:

  • It can resume: if you rerun the script, it continues from where the last scan ended.

  • Using the state, we filter out vaults that are not interesting, e.g. vaults that go dead after a certain point in time, to keep the number of JSON-RPC calls lower. This means some vault data might be incorrectly discarded if it does not pass our filters for being a viable vault.

The default scan interval is 1h.
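The resume behaviour can be sketched as follows. This is a simplified illustration, not the pipeline's actual implementation: the idea is that per-vault state records the last block scanned, and a rerun continues from there:

```python
# Simplified sketch of a resumable per-vault scan (illustrative only,
# not the actual pipeline code): each vault remembers the last block it
# was scanned to, and a rerun picks up from that block.

def scan_vault(state: dict, vault: str, head_block: int) -> int:
    """Scan `vault` up to `head_block`, resuming from stored state.

    Returns the number of blocks covered in this run.
    """
    start = state.get(vault, 0) + 1
    scanned = 0
    for block in range(start, head_block + 1):
        # ...here the real scanner would issue multicalls for this block...
        scanned += 1
    state[vault] = max(state.get(vault, 0), head_block)
    return scanned

state = {}
print(scan_vault(state, "0xabc", 10))  # first run covers blocks 1..10
print(scan_vault(state, "0xabc", 15))  # rerun resumes, covers only 11..15
```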

This will write:

  • ~/.tradingstrategy/vaults/vault-prices-1h.parquet with the historical prices

  • ~/.tradingstrategy/vaults/vault-reader-state-1h.parquet to store the latest block scanned for each vault

export JSON_RPC_URL=...
python scripts/erc-4626/scan-prices.py

The output looks like:

Scanning vault historical prices on chain 999: Hyperliquid
Chain Hyperliquid has 12 vaults in the vault detection database
After filtering vaults for non-interesting entries, we have 6 vaults left
Loading token metadata for 6 addresses using 8 workers:   0%|                                                                                                                                                                      | 0/1 [00:00<?, ?it/s]
Preparing historical multicalls for 6 readers using 12 workers: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00,  2.92 readers/s]
Reading historical vault price data for chain 999 with 12 workers, blocks 68,843 - 2,206,919: 3it [00:02,  1.15it/s, Active vaults=2, Last block at=2025-03-11 01:12:36]
Token cache size is 802,816
Scan complete
{'chain_id': 999,
'chunks_done': 1,
'existing': True,
'existing_row_count': 119592,
'file_size': 1164518,
'output_fname': PosixPath('/Users/moo/.tradingstrategy/vaults/vault-prices.parquet'),
'rows_deleted': 0,
'rows_written': 15}

Cleaning data

The raw vault data contains many abnormalities: near-infinite profits, broken smart contracts, missing names and so on.

  • Cleaning only supports stablecoin-nominated vaults, i.e. vaults whose denomination token is a stablecoin. The cleaning process currently discards the data for other denominations. If you need to access e.g. ETH-nominated vaults, you need to clean the data yourself

  • The vault data is denormalised into a single Parquet file/DataFrame that can be handled without the vault-db.pickle file, in any programming environment

  • We calculate 1h returns for each vault

  • We calculate rolling returns and similar performance metrics

The script will:

  • Read ~/.tradingstrategy/vaults/vault-prices-1h.parquet

  • Write ~/.tradingstrategy/cleaned-vaults/vault-prices-1h.parquet

python scripts/erc-4626/clean-prices.py
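The returns calculation in the cleaning step can be illustrated with pandas. This is a sketch under assumed column names (an hourly share price series per vault); the real script's schema may differ:

```python
import pandas as pd

# Toy hourly share price series for one vault. Column names are
# illustrative; the real cleaned Parquet schema may differ.
prices = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=5, freq="h"),
    "share_price": [1.00, 1.01, 1.01, 1.02, 1.00],
}).set_index("timestamp")

# 1h returns: percentage change between consecutive hourly samples
prices["returns_1h"] = prices["share_price"].pct_change()

# Rolling cumulative return over a 3-hour window, compounded from
# the hourly returns
prices["rolling_3h"] = (
    (1 + prices["returns_1h"]).rolling(3).apply(lambda x: x.prod() - 1)
)

print(prices)
```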

Scanning all chains

There is a `scan-vaults-all-chains.sh <https://github.com/tradingstrategy-ai/web3-ethereum-defi/blob/master/scripts/erc-4626/scan-vaults-all-chains.sh>`__ shell script to scan vaults across multiple chains.

You need to feed it multiple RPC endpoints like:

export JSON_RPC_ETHEREUM=...
export JSON_RPC_BASE=...
SCAN_PRICES=true scripts/erc-4626/scan-vaults-all-chains.sh
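Conceptually, such a script maps per-chain environment variables onto JSON_RPC_URL and runs the single-chain scan in a loop. A bash sketch of the idea (variable names assumed from the example above; the real script may be structured differently):

```shell
#!/usr/bin/env bash
# Sketch of a multi-chain scan loop (illustrative only; the real
# scan-vaults-all-chains.sh may differ).
set -e
for chain in ETHEREUM BASE; do
    var="JSON_RPC_${chain}"
    rpc="${!var:-}"   # bash indirect expansion: value of JSON_RPC_ETHEREUM etc.
    if [ -z "$rpc" ]; then
        echo "Skipping ${chain}: ${var} is not set"
        continue
    fi
    echo "Scanning ${chain}"
    # JSON_RPC_URL="$rpc" python scripts/erc-4626/scan-vaults.py
done
```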

Further reading