research.wrangle_vault_prices

Documentation for eth_defi.research.wrangle_vault_prices Python module.

Clean vault price data.

  • Denormalise data to a single DataFrame

  • Remove abnormalities in the price data

  • Reduce data by removing hourly changes that are below our epsilon threshold

  • Generate returns data

The input is the raw scanner parquet conforming to RawVaultPriceRow. The output is a cleaned DataFrame conforming to CleanedVaultPriceRow, consumed by calculate_lifetime_metrics().

Functions

add_denormalised_vault_data(rows, prices_df)

Add denormalised data to the prices DataFrame.

assign_unique_names(rows, prices_df[, ...])

Ensure all vaults have unique human-readable name.

calculate_vault_returns(prices_df[, logger])

Calculate returns for each vault.

cap_hypercore_share_prices(prices_df[, ...])

Cap share prices for Hypercore (Hyperliquid native) vaults.

check_missing_metadata(rows, price_ids, ...)

Check that we have metadata for all vaults in the prices DataFrame.

clean_by_tvl(rows, prices_df[, logger, ...])

TVL-based threshold filtering of returns.

clean_returns(rows, prices_df[, logger, ...])

Clean returns data by removing rows with NaN or infinite values.

derive_deposit_closed_reason(prices_df)

Derive unified deposit_closed_reason from protocol-specific columns.

ensure_vault_state_columns(prices_df)

Ensure vault state columns are present in the DataFrame.

filter_unneeded_row(prices_df[, logger, epsilon])

Dedpulicate data rows with epsilon.

filter_vaults_by_stablecoin(rows, prices_df)

Reduce vaults to stablecoin vaults only.

fix_outlier_share_prices(prices_df[, ...])

Fix out rows with share price that is too high.

forward_fill_vault(vault_df)

Forward fill missing vault prices up to max_gap_hours.

generate_cleaned_vault_datasets([...])

A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed.

get_vaults_by_id(rows)

Build a dictionary of vaults by their chain-address id.

process_raw_vault_scan_data(rows, prices_df)

Preprocess vault data for further analysis.

remove_inactive_lead_time(prices_df[, logger])

Remove initial inactive period from each vault's price history.

sort_and_index_vault_prices(prices_df, ...)

Set up the order of vaults for processing.

Classes

CleanedVaultPriceRow

Schema for a single row in the cleaned vault price DataFrame.

class CleanedVaultPriceRow

Bases: TypedDict

Schema for a single row in the cleaned vault price DataFrame.

This is the enriched format produced by the cleaning pipeline in this module and consumed by calculate_lifetime_metrics().

It extends RawVaultPriceRow with denormalised metadata columns (id, name, event_count, protocol) and computed columns (returns_1h). The DataFrame uses a DatetimeIndex built from the timestamp column.

Columns are grouped by availability:

  • General columns are present for all vault protocols.

  • ERC-4626 only columns come from on-chain ERC-4626 calls and are NaN / empty for native protocols.

  • Lending only columns are populated for lending protocol vaults (IPOR, Euler, Morpho, Gearbox, etc.) and NaN for others.

  • Hypercore only columns come from the Hyperliquid native vault API and are NaN for all other protocols.

  • Native protocol flow columns are populated for native protocols that provide daily deposit/withdrawal data (Hypercore, GRVT, Lighter, Hibachi) and NaN for ERC-4626 vaults.

__init__(*args, **kwargs)
__new__(**kwargs)
clear()

Remove all items from the dict.

copy()

Return a shallow copy of the dict.

fromkeys(value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items()

Return a set-like object providing a view on the dict’s items.

keys()

Return a set-like object providing a view on the dict’s keys.

pop(k[, d]) v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) None.  Update D from mapping/iterable E and F.

If E is present and has a .keys() method, then does: for k in E.keys(): D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values()

Return an object providing a view on the dict’s values.

add_denormalised_vault_data(rows, prices_df, logger=<built-in function print>)

Add denormalised data to the prices DataFrame.

  • Take data from vault database and duplicate it across every row

  • Add protocol name and event count columns

Parameters
Return type

pandas.DataFrame

assign_unique_names(rows, prices_df, logger=<built-in function print>, duplicate_nav_threshold=1000)

Ensure all vaults have unique human-readable name.

  • Rerwrite metadata rows

  • Find duplicate vault names

  • Add a running counter to the name to make it unique

Parameters
Return type

pandas.DataFrame

calculate_vault_returns(prices_df, logger=<built-in function print>)

Calculate returns for each vault.

  • Filter out reads for which we did not get a proper share price

  • Add returns_1h columns

Example of input data:

     chain                                     address  block_number           timestamp  share_price  ...  errors                                                id  name  event_count            protocol
207  42161  0x487cdc7d21ac8765eff6c0e681aea36ae1594471      13294721 2022-05-30 19:59:22          1.0  ...          42161-0x487cdc7d21ac8765eff6c0e681aea36ae1594471  LDAI           17  <unknown ERC-4626>
Parameters

prices_df (pandas.DataFrame) –

cap_hypercore_share_prices(prices_df, logger=<built-in function print>, max_share_price=1000000.0)

Cap share prices for Hypercore (Hyperliquid native) vaults.

Hypercore share prices are derived synthetically from portfolio history in _calculate_share_price(). When total_supply approaches zero while total_assets remains nonzero (e.g. after most depositors withdrew from a leveraged trading vault), share prices can overflow to absurd values (trillions+).

This step caps those overflow values before the standard fix_outlier_share_prices() smoothing runs. Only applies to Hypercore vaults (chain == 9999).

Parameters
  • prices_df (pandas.DataFrame) – Price data with chain and share_price columns.

  • logger – Logging function.

  • max_share_price (float) – Maximum allowed share price for Hypercore vaults.

Returns

DataFrame with capped share prices.

Return type

pandas.DataFrame

check_missing_metadata(rows, price_ids, prices_df, logger=<built-in function print>)

Check that we have metadata for all vaults in the prices DataFrame.

Vault id is in format: 56-0x10c90bfcfb3d2a7ae814da1548ae3a7fc31c35a0

If there are vaults with price data but no metadata, they are logged at error level and their IDs returned so the caller can drop them.

Parameters
  • rows (dict) – Metadata rows from vault database

  • prices_df (pandas.DataFrame) – The full prices DataFrame, used to extract context for missing vaults.

  • price_ids (pandas.Series) –

Returns

Set of vault IDs that are missing from the metadata. These should be dropped from the price data before further processing.

Return type

set[str]

clean_by_tvl(rows, prices_df, logger=<built-in function print>, tvl_threshold_min=1000.0, tvl_threshold_max=99000000000, tvl_threshold_min_dynamic=0.02, returns_col='returns_1h')

TVL-based threshold filtering of returns.

Parameters
Return type

pandas.DataFrame

clean_returns(rows, prices_df, logger=<built-in function print>, outlier_threshold=0.5, display=<function <lambda>>, returns_col='returns_1h')

Clean returns data by removing rows with NaN or infinite values.

  • In returns data we have outliers that are likely not real returns, or one-time events that cannot repeat.
    • Floating point errors: [Share price may jumps wildly when a vault TVL is near zero](https://x.com/0xSEM/status/1914748782102630455)

    • Bugs: Vault share price method to estimate returns does not work for a particular airdrop

    • Airdrops: Vault gets an irregular rewards that will not repeat, and thus are not good to estimate the future performance

  • We clean returns by doing an assumptions - Daily returns higher than static outlier - Daily TVL max does not make sense - Daily TVL min does not make sense - Daily TVL % below lifetime average TVL

Parameters
Return type

pandas.DataFrame

derive_deposit_closed_reason(prices_df)

Derive unified deposit_closed_reason from protocol-specific columns.

For Hyperliquid vaults, deposit_closed_reason is already set by build_raw_prices_dataframe() with specific reason strings.

For ERC-4626 vaults, the deposits_open string column (“true”/”false”/””) is converted to a generic reason.

Parameters

prices_df (pandas.DataFrame) – DataFrame with deposit_closed_reason and deposits_open columns.

Returns

DataFrame with deposit_closed_reason filled in for both vault types.

Return type

pandas.DataFrame

ensure_vault_state_columns(prices_df)

Ensure vault state columns are present in the DataFrame.

  • Adds missing columns with default values for backward compatibility with raw scan data generated before these fields were added.

Parameters

prices_df (pandas.DataFrame) –

Return type

pandas.DataFrame

filter_unneeded_row(prices_df, logger=<built-in function print>, epsilon=0.0025)

Dedpulicate data rows with epsilon.

  • Reduce data size by elimating rows where the value changes is too little

  • Remove rows where the total asset/share price/total supply change has been too small

Note

This filter conly yields 2% savings in row count, so it turned out not to be worth of the problems.

Parameters
  • prices_df (pandas.DataFrame) – Assume sorted by timestsamp

  • epsilon – Tolerance for floating point comparison

Return type

pandas.DataFrame

filter_vaults_by_stablecoin(rows, prices_df, logger=<built-in function print>)

Reduce vaults to stablecoin vaults only.

Parameters
Return type

pandas.DataFrame

fix_outlier_share_prices(prices_df, logger=<built-in function print>, max_diff=0.33, look_back_hours=24, look_ahead_hours=24)

Fix out rows with share price that is too high.

  • Sometimes share price jump to an outlier value and back

  • This caused abnormal returns in returns calculations, messing all volatility numbers, sharpe, charts, etc.

  • The root cause is bad oracles, fat fingers, MEV trades, etc.

  • The lookback window is time-based (hours), not row-based, so it works correctly for vaults with non-hourly polling intervals

  • See check-share-price script for inspecting individual prices

Case Fluegel DAO:

timestamp

chain

address

block_number

share_price

total_assets

total_supply

2024-07-16 15:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17176415

1.60

373,740.21

232,929.92

2024-07-16 16:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17178215

1.63

379,832.59

232,929.92

2024-07-16 17:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17180015

0.33

75,744.97

232,929.92

2024-07-16 18:02:57

8453

0x277a3c57f3236a7d458576074d7c3d7046eb26c

17181815

1.64

382,282.78

232,929.92

Case Untangle Finance:

Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-12 23:14:19 (3206): fixing: 1.038721 -> 1.038721, prev: 1.038827, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 00:14:13 (3207): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 01:14:09 (3208): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 02:14:03 (3209): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 03:14:01 (3210): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 04:13:56 (3211): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.444865
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 05:13:53 (3212): fixing: 1.038931 -> 1.038931, prev: 1.038801, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 06:13:51 (3213): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 07:13:45 (3214): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 08:13:40 (3215): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.468629
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 09:13:37 (3216): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 10:13:27 (3217): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-13 11:13:21 (3218): fixing: 1.039134 -> 1.039134, prev: 1.038439, next: 0.482511
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 00:11:51 (3230): fixing: 0.444865 -> 1.0405275, prev: 1.038721, next: 1.042334
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 01:11:42 (3231): fixing: 0.444865 -> 1.0406325, prev: 1.038931, next: 1.042334
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 02:11:33 (3232): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 03:11:36 (3233): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 04:11:26 (3234): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 05:11:17 (3235): fixing: 0.444865 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 06:11:10 (3236): fixing: 0.468629 -> 1.0407335, prev: 1.038931, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 07:11:01 (3237): fixing: 0.468629 -> 1.040835, prev: 1.039134, next: 1.042536
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 08:10:52 (3238): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 11:10:26 (3239): fixing: 0.468629 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 12:10:18 (3240): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 13:10:09 (3241): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-14 14:10:01 (3242): fixing: 0.482511 -> 1.0406445, prev: 1.039134, next: 1.042155
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 04:08:39 (3254): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 05:08:33 (3255): fixing: 1.042334 -> 1.042334, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 06:08:29 (3256): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 07:08:24 (3257): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 08:08:20 (3258): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 09:08:12 (3259): fixing: 1.042536 -> 1.042536, prev: 0.444865, next: 1.04251
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 10:08:07 (3260): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 11:08:03 (3261): fixing: 1.042536 -> 1.042536, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 12:07:55 (3262): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 13:07:51 (3263): fixing: 1.042155 -> 1.042155, prev: 0.468629, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 14:07:49 (3264): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 15:07:42 (3265): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042519
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-15 16:07:36 (3266): fixing: 1.042155 -> 1.042155, prev: 0.482511, next: 1.042354
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-23 02:04:05 (3433): fixing: 1.036482 -> 1.036482, prev: 1.126302, next: 0.487429
Abnormal share price detected for 42161-0x4a3f7dd63077cde8d7eff3c958eb69a3dd7d31a9 at index 2025-10-24 06:01:21 (3457): fixing: 0.487429 -> 1.041995, prev: 1.036482, next: 1.047508
Parameters

prices_df (pandas.DataFrame) –

Return type

pandas.DataFrame

forward_fill_vault(vault_df)

Forward fill missing vault prices up to max_gap_hours.

  • For displaying, calculating metrics, etc. we want continuous time series

  • Align random sample interval to 1h

Parameters

vault_df (pandas.DataFrame) –

Price data for a single vault.

Assume 1h price data.

Return type

pandas.DataFrame

generate_cleaned_vault_datasets(vault_db_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-metadata-db.pickle'), price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/vault-prices-1h.parquet'), cleaned_price_df_path=PosixPath('/home/runner/.tradingstrategy/vaults/cleaned-vault-prices-1h.parquet'), logger=<built-in function print>, display=<function display>, diagnose_vault_id=None)

A command line script entry point to take raw scanned vault price data and clean it up to a format that can be analysed.

  • Reads vault-prices-1h.parquet and generates vault-prices-1h-cleaned.parquet

  • Calculate returns and various performance metrics to be included with prices data

  • Clean returns from abnormalities

Note

Drops non-stablecoin vaults. The cleaning is currently applicable for stable vaults only.

Parameters

diagnose_vault_id (Optional[str]) –

get_vaults_by_id(rows)

Build a dictionary of vaults by their chain-address id.

Parameters

rows (dict[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]) – Metadata rows from vault database

Returns

Dictionary of vaults by their chain-address id

Return type

dict[str, eth_defi.vault.vaultdb.VaultRow]

process_raw_vault_scan_data(rows, prices_df, logger=<built-in function print>, display=<function <lambda>>, diagnose_vault_id=None)

Preprocess vault data for further analysis.

  • Assign unique names to vaults

  • Add denormalised vault data to prices DataFrame

  • Filter out non-stablecoin vaults

  • Calculate returns, rolling metrics

Parameters
Return type

pandas.DataFrame

remove_inactive_lead_time(prices_df, logger=<built-in function print>)

Remove initial inactive period from each vault’s price history.

  • At the beginning of a vault’s lifecycle, total supply may remain constant while the vault is inactive (e.g., 1, 1000, etc.)

  • When the vault activates, the share price may jump, causing abnormal returns

  • This function removes the initial rows where total_supply hasn’t changed

  • Uses exact equality for comparison

  • Skips initial rows with zero or NaN total_supply to find first valid value

Parameters

prices_df (pandas.DataFrame) – Price data with ‘id’ and ‘total_supply’ columns. Assumes data is sorted by timestamp within each vault.

Returns

DataFrame with inactive lead time removed for each vault

Return type

pandas.DataFrame

sort_and_index_vault_prices(prices_df, priority_ids)

Set up the order of vaults for processing.

  • If we do debugging we want vaults we debug go first, as the pipeline takes several minutes to run

Parameters