ERC-4626: scanning vault data

This tutorial shows how to read ERC-4626 vault historical data. We need this data to run the other notebooks here, which analyse the performance of the vaults. The tutorial shows how to create a local vault database with a list of vaults across all chains. It is based on the open-source pipeline in the web3-ethereum-defi repository.

You will need:

  • Expert Python knowledge to work with complex Python projects

  • JSON-RPC archive nodes for various chains, e.g. from dRPC

  • Hypersync account

  • UNIX or Windows Subsystem for Linux (WSL) environment

  • Preferably screen, tmux or a similar utility to run long-running processes in the background on servers

  • Some hours of patience

This is a three-step scripted process: discovering vaults, scanning their historical prices, and cleaning the data.
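Assuming the default repository layout, the three steps condense to the commands below, each covered in its own section later in this tutorial:

```shell
# Overview of the three steps for a single chain.
# Point JSON_RPC_URL to an HTTPS archive node for the chain first.
export JSON_RPC_URL=...

python scripts/erc-4626/scan-vaults.py   # 1. discover vaults on the chain
python scripts/erc-4626/scan-prices.py   # 2. scan historical prices
python scripts/erc-4626/clean-prices.py  # 3. clean and denormalise the data
```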

Note

The open-source pipeline code must be updated to accommodate new chains, with chain ids, names and so on.

Scanning vaults for a single chain

Discovering vaults

To scan a single chain, we first need to discover the vaults on it. This is done with the scan-vaults.py script.

# Point to HTTPS RPC server for your chain
export JSON_RPC_URL=...
python scripts/erc-4626/scan-vaults.py

This script creates the file ~/.tradingstrategy/vaults/vault-db.pickle containing the vaults found on the chain, plus all vaults from other chains we have scanned so far.
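The pickle file is a plain Python object, so it can be inspected directly. A minimal sketch of loading it — the example entry below is hypothetical; the real schema is defined by the web3-ethereum-defi pipeline code and may differ:

```python
import pickle
from pathlib import Path

# Path where scan-vaults.py writes its database
db_path = Path.home() / ".tradingstrategy" / "vaults" / "vault-db.pickle"

# Hypothetical example entry, assumed to be keyed by (chain id, address).
# The real structure is defined by the pipeline code, not documented here.
example_db = {
    (999, "0x0000000000000000000000000000000000000001"): {
        "name": "Example vault",
        "chain_id": 999,
    },
}

if db_path.exists():
    # Load the real database written by scan-vaults.py
    with db_path.open("rb") as f:
        vault_db = pickle.load(f)
else:
    # Fall back to the toy example so the snippet runs anywhere
    vault_db = example_db

print(f"{len(vault_db)} vaults in the database")
```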


Scanning historical prices

After discovering the vaults on a chain, we scan their historical performance. This is done with the scan-prices.py script. It reads the vaults from the database file created by the previous step, then polls JSON-RPC archive nodes to extract historical prices and parameters such as performance fees.

The scan process is stateful:

  • It can resume: if you rerun the script, it continues from where the last scan ended.

  • Using the state, we filter out vaults that are not interesting, e.g. vaults that go dead after a certain point in time, to keep the number of JSON-RPC calls lower. This means some vault data might be incorrectly discarded if it does not pass our filters for being a viable vault.

The default scan interval is 1h.
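The resume behaviour can be sketched as follows. This is a simplified illustration, not the pipeline's actual implementation: the idea is that per-vault state records the last block scanned, and a rerun continues from there:

```python
# Simplified sketch of a resumable per-vault scan (illustrative only,
# not the actual pipeline code): each vault remembers the last block it
# was scanned to, and a rerun picks up from that block.

def scan_vault(state: dict, vault: str, head_block: int) -> int:
    """Scan `vault` up to `head_block`, resuming from stored state.

    Returns the number of blocks covered in this run.
    """
    start = state.get(vault, 0) + 1
    scanned = 0
    for block in range(start, head_block + 1):
        # ...here the real scanner would issue multicalls for this block...
        scanned += 1
    state[vault] = max(state.get(vault, 0), head_block)
    return scanned

state = {}
print(scan_vault(state, "0xabc", 10))  # first run covers blocks 1..10
print(scan_vault(state, "0xabc", 15))  # rerun resumes, covers only 11..15
```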

This will write:

  • ~/.tradingstrategy/vaults/vault-prices-1h.parquet with the historical prices

  • ~/.tradingstrategy/vaults/vault-reader-state-1h.parquet to store the latest block scanned for each vault

export JSON_RPC_URL=...
python scripts/erc-4626/scan-prices.py

The output looks like:

Scanning vault historical prices on chain 999: Hyperliquid
Chain Hyperliquid has 12 vaults in the vault detection database
After filtering vaults for non-interesting entries, we have 6 vaults left
Loading token metadata for 6 addresses using 8 workers:   0%|                                                                                                                                                                      | 0/1 [00:00<?, ?it/s]
Preparing historical multicalls for 6 readers using 12 workers: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00,  2.92 readers/s]
Reading historical vault price data for chain 999 with 12 workers, blocks 68,843 - 2,206,919: 3it [00:02,  1.15it/s, Active vaults=2, Last block at=2025-03-11 01:12:36]
Token cache size is 802,816
Scan complete
{'chain_id': 999,
'chunks_done': 1,
'existing': True,
'existing_row_count': 119592,
'file_size': 1164518,
'output_fname': PosixPath('/Users/moo/.tradingstrategy/vaults/vault-prices.parquet'),
'rows_deleted': 0,
'rows_written': 15}

Cleaning data

The raw vault data contains many abnormalities: near-infinite profits, broken smart contracts, missing names and so on.

  • Cleaning only supports stablecoin-nominated vaults, i.e. vaults whose denomination token is a stablecoin. The cleaning process currently discards the data for other denominations. If you need to access e.g. ETH-nominated vaults, you need to clean the data yourself

  • The vault data is denormalised into a single Parquet file/DataFrame that can be handled without the vault-db.pickle file, in any programming environment

  • We calculate 1h returns for each vault

  • We calculate rolling returns and similar performance metrics

The script will:

  • Read ~/.tradingstrategy/vaults/vault-prices-1h.parquet

  • Write ~/.tradingstrategy/cleaned-vaults/vault-prices-1h.parquet

python scripts/erc-4626/clean-prices.py
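The returns calculation in the cleaning step can be illustrated with pandas. This is a sketch under assumed column names (an hourly share price series per vault); the real script's schema may differ:

```python
import pandas as pd

# Toy hourly share price series for one vault. Column names are
# illustrative; the real cleaned Parquet schema may differ.
prices = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=5, freq="h"),
    "share_price": [1.00, 1.01, 1.01, 1.02, 1.00],
}).set_index("timestamp")

# 1h returns: percentage change between consecutive hourly samples
prices["returns_1h"] = prices["share_price"].pct_change()

# Rolling cumulative return over a 3-hour window, compounded from
# the hourly returns
prices["rolling_3h"] = (
    (1 + prices["returns_1h"]).rolling(3).apply(lambda x: x.prod() - 1)
)

print(prices)
```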

Scanning all chains

There is a `scan-vaults-all-chains.sh <https://github.com/tradingstrategy-ai/web3-ethereum-defi/blob/master/scripts/erc-4626/scan-vaults-all-chains.sh>`__ shell script to scan vaults across multiple chains.

You need to feed it multiple RPC endpoints like:

export JSON_RPC_ETHEREUM=...
export JSON_RPC_BASE=...
SCAN_PRICES=true scripts/erc-4626/scan-vaults-all-chains.sh
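Conceptually, such a script maps per-chain environment variables onto JSON_RPC_URL and runs the single-chain scan in a loop. A bash sketch of the idea (variable names assumed from the example above; the real script may be structured differently):

```shell
#!/usr/bin/env bash
# Sketch of a multi-chain scan loop (illustrative only; the real
# scan-vaults-all-chains.sh may differ).
set -e
for chain in ETHEREUM BASE; do
    var="JSON_RPC_${chain}"
    rpc="${!var:-}"   # bash indirect expansion: value of JSON_RPC_ETHEREUM etc.
    if [ -z "$rpc" ]; then
        echo "Skipping ${chain}: ${var} is not set"
        continue
    fi
    echo "Scanning ${chain}"
    # JSON_RPC_URL="$rpc" python scripts/erc-4626/scan-vaults.py
done
```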

Further reading