research.wrangle_vault_prices
Documentation for eth_defi.research.wrangle_vault_prices Python module.
Clean vault price data.
Denormalise data to a single DataFrame
Remove abnormalities in the price data
Reduce data by removing hourly changes that are below our epsilon threshold
Generate returns data
The input is the raw scanner parquet conforming to
RawVaultPriceRow.
The output is a cleaned DataFrame conforming to
CleanedVaultPriceRow, consumed by
calculate_lifetime_metrics().
Functions
|
Add denormalised data to the prices DataFrame. |
|
Ensure all vaults have unique human-readable name. |
|
Calculate returns for each vault. |
|
Cap share prices for Hypercore (Hyperliquid native) vaults. |
|
Check that we have metadata for all vaults in the prices DataFrame. |
|
TVL-based threshold filtering of returns. |
|
Clean returns data by removing rows with NaN or infinite values. |
|
Derive unified |
|
Ensure vault state columns are present in the DataFrame. |
|
Dedpulicate data rows with epsilon. |
|
Reduce vaults to stablecoin vaults only. |
|
Fix out rows with share price that is too high. |
|
Forward fill missing vault prices up to max_gap_hours. |
A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed. |
|
|
Build a dictionary of vaults by their chain-address id. |
|
Preprocess vault data for further analysis. |
|
Remove initial inactive period from each vault's price history. |
|
Set up the order of vaults for processing. |
Classes
Schema for a single row in the cleaned vault price DataFrame. |
- class CleanedVaultPriceRow
Bases:
TypedDictSchema for a single row in the cleaned vault price DataFrame.
This is the enriched format produced by the cleaning pipeline in this module and consumed by
calculate_lifetime_metrics().It extends
RawVaultPriceRowwith denormalised metadata columns (id,name,event_count,protocol) and computed columns (returns_1h). The DataFrame uses aDatetimeIndexbuilt from thetimestampcolumn.Columns are grouped by availability:
General columns are present for all vault protocols.
ERC-4626 only columns come from on-chain ERC-4626 calls and are NaN / empty for native protocols.
Lending only columns are populated for lending protocol vaults (IPOR, Euler, Morpho, Gearbox, etc.) and NaN for others.
Hypercore only columns come from the Hyperliquid native vault API and are NaN for all other protocols.
Native protocol flow columns are populated for native protocols that provide daily deposit/withdrawal data (Hypercore, GRVT, Lighter, Hibachi) and NaN for ERC-4626 vaults.
- __init__(*args, **kwargs)
- __new__(**kwargs)
- clear()
Remove all items from the dict.
- copy()
Return a shallow copy of the dict.
- fromkeys(value=None, /)
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)
Return the value for key if key is in the dictionary, else default.
- items()
Return a set-like object providing a view on the dict’s items.
- keys()
Return a set-like object providing a view on the dict’s keys.
- pop(k[, d]) v, remove specified key and return the corresponding value.
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem()
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) None. Update D from mapping/iterable E and F.
If E is present and has a .keys() method, then does: for k in E.keys(): D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
- values()
Return an object providing a view on the dict’s values.
- add_denormalised_vault_data(rows, prices_df, logger=<built-in function print>)
Add denormalised data to the prices DataFrame.
Take data from vault database and duplicate it across every row
Add protocol name and event count columns
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.DataFrame) –
- Return type
- assign_unique_names(rows, prices_df, logger=<built-in function print>, duplicate_nav_threshold=1000)
Ensure all vaults have unique human-readable name.
Rerwrite metadata rows
Find duplicate vault names
Add a running counter to the name to make it unique
- Parameters
rows (dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.DataFrame) –
- Return type
- calculate_vault_returns(prices_df, logger=<built-in function print>)
Calculate returns for each vault.
Filter out reads for which we did not get a proper share price
Add
returns_1hcolumns
Example of input data:
chain address block_number timestamp share_price ... errors id name event_count protocol 207 42161 0x487cdc7d21ac8765eff6c0e681aea36ae1594471 13294721 2022-05-30 19:59:22 1.0 ... 42161-0x487cdc7d21ac8765eff6c0e681aea36ae1594471 LDAI 17 <unknown ERC-4626>
- Parameters
prices_df (pandas.DataFrame) –
Cap share prices for Hypercore (Hyperliquid native) vaults.
Hypercore share prices are derived synthetically from portfolio history in
_calculate_share_price(). Whentotal_supplyapproaches zero whiletotal_assetsremains nonzero (e.g. after most depositors withdrew from a leveraged trading vault), share prices can overflow to absurd values (trillions+).This step caps those overflow values before the standard
fix_outlier_share_prices()smoothing runs. Only applies to Hypercore vaults (chain == 9999).- Parameters
prices_df (pandas.DataFrame) – Price data with
chainandshare_pricecolumns.logger – Logging function.
max_share_price (float) – Maximum allowed share price for Hypercore vaults.
- Returns
DataFrame with capped share prices.
- Return type
- check_missing_metadata(rows, price_ids, prices_df, logger=<built-in function print>)
Check that we have metadata for all vaults in the prices DataFrame.
Vault id is in format:
56-0x10c90bfcfb3d2a7ae814da1548ae3a7fc31c35a0If there are vaults with price data but no metadata, they are logged at error level and their IDs returned so the caller can drop them.
- Parameters
rows (dict) – Metadata rows from vault database
prices_df (pandas.DataFrame) – The full prices DataFrame, used to extract context for missing vaults.
price_ids (pandas.Series) –
- Returns
Set of vault IDs that are missing from the metadata. These should be dropped from the price data before further processing.
- Return type
- clean_by_tvl(rows, prices_df, logger=<built-in function print>, tvl_threshold_min=1000.0, tvl_threshold_max=99000000000, tvl_threshold_min_dynamic=0.02, returns_col='returns_1h')
TVL-based threshold filtering of returns.
Clean returns from TVL-manipulation outliers
See https://x.com/moo9000/status/1914746350216077544 for manipulation example
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.DataFrame) –
- Return type
- clean_returns(rows, prices_df, logger=<built-in function print>, outlier_threshold=0.5, display=<function <lambda>>, returns_col='returns_1h')
Clean returns data by removing rows with NaN or infinite values.
- In returns data we have outliers that are likely not real returns, or one-time events that cannot repeat.
Floating point errors: [Share price may jumps wildly when a vault TVL is near zero](https://x.com/0xSEM/status/1914748782102630455)
Bugs: Vault share price method to estimate returns does not work for a particular airdrop
Airdrops: Vault gets an irregular rewards that will not repeat, and thus are not good to estimate the future performance
We clean returns by doing an assumptions - Daily returns higher than static outlier - Daily TVL max does not make sense - Daily TVL min does not make sense - Daily TVL % below lifetime average TVL
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.DataFrame) –
display (Callable) –
- Return type
- derive_deposit_closed_reason(prices_df)
Derive unified
deposit_closed_reasonfrom protocol-specific columns.For Hyperliquid vaults,
deposit_closed_reasonis already set bybuild_raw_prices_dataframe()with specific reason strings.For ERC-4626 vaults, the
deposits_openstring column (“true”/”false”/””) is converted to a generic reason.- Parameters
prices_df (pandas.DataFrame) – DataFrame with
deposit_closed_reasonanddeposits_opencolumns.- Returns
DataFrame with
deposit_closed_reasonfilled in for both vault types.- Return type
- ensure_vault_state_columns(prices_df)
Ensure vault state columns are present in the DataFrame.
Adds missing columns with default values for backward compatibility with raw scan data generated before these fields were added.
- Parameters
prices_df (pandas.DataFrame) –
- Return type
- filter_unneeded_row(prices_df, logger=<built-in function print>, epsilon=0.0025)
Dedpulicate data rows with epsilon.
Reduce data size by elimating rows where the value changes is too little
Remove rows where the total asset/share price/total supply change has been too small
Note
This filter conly yields 2% savings in row count, so it turned out not to be worth of the problems.
- Parameters
prices_df (pandas.DataFrame) – Assume sorted by timestsamp
epsilon – Tolerance for floating point comparison
- Return type
- filter_vaults_by_stablecoin(rows, prices_df, logger=<built-in function print>)
Reduce vaults to stablecoin vaults only.
In this notebooks, we focus on stablecoin yield
Do not consider WETH, other native token vaults, as their returns calculation would need to match the appreciation of underlying assets
[is_stablecoin_like](https://web3-ethereum-defi.readthedocs.io/api/core/_autosummary/eth_defi.token.is_stablecoin_like.html?highlight=is_stablecoin_like#eth_defi.token.is_stablecoin_like) supports GHO, crvUSD and other DeFi/algorithmic stablecoins
Note that this picks up very few EUR and other fiat-nominated vaults
- Parameters
rows (dict[eth_typing.evm.HexAddress, eth_defi.vault.vaultdb.VaultRow]) –
prices_df (pandas.DataFrame) –
- Return type
Fix out rows with share price that is too high.
Sometimes share price jump to an outlier value and back
This caused abnormal returns in returns calculations, messing all volatility numbers, sharpe, charts, etc.
The root cause is bad oracles, fat fingers, MEV trades, etc.
The lookback window is time-based (hours), not row-based, so it works correctly for vaults with non-hourly polling intervals
See
check-share-pricescript for inspecting individual prices
Case Fluegel DAO:
timestamp
chain
address
block_number
share_price
total_assets
total_supply
2024-07-16 15:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17176415
1.60
373,740.21
232,929.92
2024-07-16 16:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17178215
1.63
379,832.59
232,929.92
2024-07-16 17:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17180015
0.33
75,744.97
232,929.92
2024-07-16 18:02:57
8453
0x277a3c57f3236a7d458576074d7c3d7046eb26c
17181815
1.64
382,282.78
232,929.92
Case Untangle Finance:
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-12 23:14:19 (3206): fixing: 1.038721 -> 1.038721, prev: 1.038827, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 00:14:13 (3207): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 01:14:09 (3208): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 02:14:03 (3209): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 03:14:01 (3210): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 04:13:56 (3211): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 05:13:53 (3212): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 06:13:51 (3213): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 07:13:45 (3214): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 08:13:40 (3215): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 09:13:37 (3216): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 10:13:27 (3217): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 11:13:21 (3218): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 00:11:51 (3230): fixing: 0.444865 -> 1.0405275, prev: 1.038721, next: 1.042334 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 01:11:42 (3231): fixing: 0.444865 -> 1.0406325, prev: 1.038931, next: 1.042334 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 02:11:33 (3232): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 03:11:36 (3233): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 04:11:26 (3234): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 05:11:17 (3235): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 06:11:10 (3236): fixing: 0.468629 -> 1.0407335, prev: 1.038931, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 07:11:01 (3237): fixing: 0.468629 -> 1.040835, prev: 1.039134, next: 1.042536 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 08:10:52 (3238): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 11:10:26 (3239): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 12:10:18 (3240): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 13:10:09 (3241): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 14:10:01 (3242): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 04:08:39 (3254): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 05:08:33 (3255): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 06:08:29 (3256): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 07:08:24 (3257): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 08:08:20 (3258): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 09:08:12 (3259): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 10:08:07 (3260): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 11:08:03 (3261): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 12:07:55 (3262): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 13:07:51 (3263): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 14:07:49 (3264): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 15:07:42 (3265): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 16:07:36 (3266): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042354 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-23 02:04:05 (3433): fixing: 1.036482 -> 1.036482, prev: 1.126302, next: 0.487429 Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-24 06:01:21 (3457): fixing: 0.487429 -> 1.041995, prev: 1.036482, next: 1.047508
- Parameters
prices_df (pandas.DataFrame) –
- Return type
- forward_fill_vault(vault_df)
Forward fill missing vault prices up to max_gap_hours.
For displaying, calculating metrics, etc. we want continuous time series
Align random sample interval to 1h
- Parameters
vault_df (pandas.DataFrame) –
Price data for a single vault.
Assume 1h price data.
- Return type
- generate_cleaned_vault_datasets(vault_db_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-metadata-db.pickle'), price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-prices-1h.parquet'), cleaned_price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/cleaned-vault-prices-1h.parquet'), logger=<built-in function print>, display=<function display>, diagnose_vault_id=None)
A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed.
Reads
vault-prices-1h.parquetand generatesvault-prices-1h-cleaned.parquetCalculate returns and various performance metrics to be included with prices data
Clean returns from abnormalities
Note
Drops non-stablecoin vaults. The cleaning is currently applicable for stable vaults only.
- get_vaults_by_id(rows)
Build a dictionary of vaults by their chain-address id.
- Parameters
rows (dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]) – Metadata rows from vault database
- Returns
Dictionary of vaults by their chain-address id
- Return type
- process_raw_vault_scan_data(rows, prices_df, logger=<built-in function print>, display=<function <lambda>>, diagnose_vault_id=None)
Preprocess vault data for further analysis.
Assign unique names to vaults
Add denormalised vault data to prices DataFrame
Filter out non-stablecoin vaults
Calculate returns, rolling metrics
- Parameters
rows (Union[dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow], eth_defi.vault.vaultdb.VaultDatabase]) – Metadata rows from vault database
logger – Notebook / console printer function
display (Callable) – Display Pandas DataFrame function
prices_df (pandas.DataFrame) –
- Return type
- remove_inactive_lead_time(prices_df, logger=<built-in function print>)
Remove initial inactive period from each vault’s price history.
At the beginning of a vault’s lifecycle, total supply may remain constant while the vault is inactive (e.g., 1, 1000, etc.)
When the vault activates, the share price may jump, causing abnormal returns
This function removes the initial rows where total_supply hasn’t changed
Uses exact equality for comparison
Skips initial rows with zero or NaN total_supply to find first valid value
- Parameters
prices_df (pandas.DataFrame) – Price data with ‘id’ and ‘total_supply’ columns. Assumes data is sorted by timestamp within each vault.
- Returns
DataFrame with inactive lead time removed for each vault
- Return type
- sort_and_index_vault_prices(prices_df, priority_ids)
Set up the order of vaults for processing.
If we do debugging we want vaults we debug go first, as the pipeline takes several minutes to run
- Parameters
prices_df (pandas.DataFrame) –