hyperliquid.vault_data_export

Documentation for eth_defi.hyperliquid.vault_data_export Python module.

Export Hyperliquid vault data into the ERC-4626 pipeline format.

This module bridges the Hyperliquid-specific DuckDB data into the formats consumed by the existing ERC-4626 vault metrics pipeline:

  • Synthetic VaultRow entries for the VaultDatabase pickle

  • Raw price DataFrames matching the uncleaned Parquet schema, so that Hypercore data goes through the same cleaning pipeline as EVM vaults

  • Merge functions to append Hyperliquid data into existing files

Example:

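from pathlib import Path
from eth_defi.hyperliquid.daily_metrics import HyperliquidDailyMetricsDatabase
from eth_defi.hyperliquid.vault_data_export import merge_into_vault_database, merge_into_uncleaned_parquet

# Placeholder paths: point these at your own VaultDatabase pickle and uncleaned Parquet file
vault_db_path = Path("vault-db.pickle")
uncleaned_parquet_path = Path("vault-prices-uncleaned.parquet")

db = HyperliquidDailyMetricsDatabase(Path("daily-metrics.duckdb"))
try:
    merge_into_vault_database(db, vault_db_path)
    merge_into_uncleaned_parquet(db, uncleaned_parquet_path)
finally:
    # Always release the DuckDB handle, even if a merge fails
    db.close()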

Functions

build_raw_prices_dataframe(db)

Build a raw prices DataFrame from the Hyperliquid DuckDB.

build_raw_prices_dataframe_hf(db)

Build a raw prices DataFrame from the HF DuckDB.

create_hyperliquid_vault_row(vault_address, ...)

Create a synthetic VaultRow for a Hyperliquid native vault.

merge_hypercore_prices_to_parquet(parquet_path)

Merge Hypercore price data from one or both DuckDB databases into the Parquet.

merge_into_uncleaned_parquet(db, parquet_path)

Merge Hyperliquid daily prices into the uncleaned Parquet file.

merge_into_vault_database(db, vault_db_path)

Merge Hyperliquid vault metadata into an existing VaultDatabase pickle.

open_and_merge_hypercore_prices(parquet_path)

Open whichever Hyperliquid databases exist and merge into the parquet.

build_raw_prices_dataframe(db)

Build a raw prices DataFrame from the Hyperliquid DuckDB.

Produces rows matching the schema of the EVM vault scanner (export()), so Hypercore data can go through the same cleaning pipeline (process_raw_vault_scan_data()) as ERC-4626 vaults.

The output has timestamp as a column (not index), matching the raw uncleaned Parquet format.

Includes per-row deposit_closed_reason (str or None) and deposits_open (str “true”/“false” or None) columns derived from forward-filled is_closed, allow_deposits, and leader_fraction state columns in the DuckDB.
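The per-row deposit columns can be pictured with a small pandas sketch. It assumes a long-format frame with address, timestamp, is_closed and allow_deposits columns. It also uses a simplified rule: deposits are open only when the vault is not closed and allows deposits. The real module also looks at leader_fraction and fills deposit_closed_reason, and this sketch leaves both out. derive_deposits_open is a hypothetical helper and not part of eth_defi.

```python
from typing import Optional

import pandas as pd


def derive_deposits_open(df: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill vault state per address and add a deposits_open column.

    Simplified, assumed rule: deposits are open when the vault is not closed
    and allows deposits. Rows before any state is known get None.
    """
    df = df.sort_values(["address", "timestamp"])
    state_cols = ["is_closed", "allow_deposits"]
    # Carry the last known state forward, never across vault boundaries
    df[state_cols] = df.groupby("address")[state_cols].ffill()

    def to_flag(row: pd.Series) -> Optional[str]:
        if pd.isna(row["is_closed"]) or pd.isna(row["allow_deposits"]):
            return None  # State unknown at this point in history
        is_open = (not row["is_closed"]) and bool(row["allow_deposits"])
        return "true" if is_open else "false"

    df["deposits_open"] = df.apply(to_flag, axis=1)
    return df
```

The forward-fill is grouped by address so that one vault's state never leaks into the next vault's rows.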

Also exposes Hyperliquid’s raw cumulative account PnL as account_pnl so downstream consumers can compare the website-style account PnL against the cleaned share-price based return series. follower_count and cumulative_volume are exported as scalar historical fields when available.

Parameters

db (eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase) – The Hyperliquid daily metrics database.

Returns

DataFrame with columns matching the uncleaned Parquet schema.

Return type

pandas.DataFrame

build_raw_prices_dataframe_hf(db)

Build a raw prices DataFrame from the HF DuckDB.

Exports raw API timestamps without resampling. The spacing is therefore irregular and reflects the per-period resolution of the vaultDetails API. That resolution is roughly 20 minutes for the last 24 hours and coarsens to roughly 3 hours, 10.5 hours and weekly for older data. About 20 minutes is the finest resolution the API ever serves. The downstream cleaning pipeline computes returns_1h with pct_change() on consecutive rows, which already works for irregular timestamps. The daily pipeline has always produced ~24h returns labelled returns_1h for Hypercore. The downstream forward_fill_vault() resamples to 1h when needed.
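A short pandas sketch shows why irregular spacing is harmless. pct_change() simply compares consecutive rows, and an hourly grid can be built afterwards by forward-filling. The prices are made up. The reindex-with-ffill step is only an assumption about what an hourly resample looks like, not the actual forward_fill_vault() implementation.

```python
import pandas as pd

# Irregularly spaced share prices for one vault (made-up values)
prices = pd.Series(
    [1.00, 1.02, 1.0302],
    index=pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:20", "2024-01-01 03:20"]),
    name="share_price",
)

# pct_change() compares each row with the previous one, whatever the time gap
returns_1h = prices.pct_change()

# Hourly grid: each hour takes the last price known at or before it (no look-ahead)
grid = pd.date_range(prices.index[0], prices.index[-1], freq="1h")
hourly = prices.reindex(grid, method="ffill")
```

Here the second "1h" return actually spans three hours. This matches the text: the label describes the column, not the true interval between rows.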

Parameters

db (eth_defi.hyperliquid.high_freq_metrics.HyperliquidHighFreqMetricsDatabase) – The HF metrics database.

Returns

DataFrame matching the uncleaned Parquet schema with raw timestamps.

Return type

pandas.DataFrame

create_hyperliquid_vault_row(vault_address, name, description, tvl, create_time, follower_count=None, is_closed=False, allow_deposits=True, relationship_type='normal', leader_fraction=None, manual_review_status=None)

Create a synthetic VaultRow for a Hyperliquid native vault.

Builds a VaultRow that matches what calculate_vault_record() expects, using the Hypercore synthetic chain ID.

User-created vaults (relationship_type="normal") use the fixed platform performance fee HYPERLIQUID_VAULT_PERFORMANCE_FEE. Protocol vaults (HLP and its children with relationship_type="parent" or "child") have zero fees.
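The fee rule above reduces to a lookup on relationship_type. In this sketch, PLATFORM_PERFORMANCE_FEE is a hypothetical placeholder standing in for HYPERLIQUID_VAULT_PERFORMANCE_FEE, whose real value lives in eth_defi. Rejecting unknown relationship types is a design choice of the sketch, not documented behaviour.

```python
# Hypothetical placeholder standing in for HYPERLIQUID_VAULT_PERFORMANCE_FEE;
# the real constant is defined in eth_defi and may differ.
PLATFORM_PERFORMANCE_FEE = 0.10


def performance_fee_for(relationship_type: str, platform_fee: float = PLATFORM_PERFORMANCE_FEE) -> float:
    """Return the performance fee for a Hyperliquid vault by relationship type."""
    if relationship_type == "normal":
        return platform_fee  # User-created vault: fixed platform fee
    if relationship_type in ("parent", "child"):
        return 0.0  # HLP and its sub-vaults: no fees
    # Design choice of this sketch: fail loudly on unexpected API values
    raise ValueError(f"Unknown relationship_type: {relationship_type!r}")
```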

Parameters
  • vault_address (eth_typing.evm.HexAddress) – Vault hex address (will be lowercased).

  • name (str) – Vault display name.

  • description (Optional[str]) – Vault description text.

  • tvl (float) – Current TVL in USD.

  • create_time (Optional[datetime.datetime]) – Vault creation timestamp.

  • follower_count (Optional[int]) – Number of vault depositors.

  • is_closed (bool) – Whether the vault is closed for new deposits.

  • allow_deposits (bool) – Whether the vault allows deposits. A vault can have is_closed=False but allow_deposits=False.

  • relationship_type (str) – Vault relationship type from the API: "normal" for user-created vaults, "parent" for HLP, "child" for HLP sub-vaults.

  • leader_fraction (Optional[float]) – Leader’s fraction of total vault capital (e.g. 0.10 = 10%). Used by _get_deposit_closed_reason() to warn when the leader’s share is close to the Hyperliquid 5% minimum.

  • manual_review_status (Optional[eth_defi.hyperliquid.vault_review_sync.ReviewStatus]) – Manual review decision for this vault captured from the Hyperliquid review Google Sheet. Stored on the row so downstream exports (calculate_vault_record → JSON) can surface the decision without re-reading the sheet on every invocation.
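The leader_fraction check could look roughly like the following. The 5% minimum comes from the parameter description above. The one-percentage-point warning margin, the message strings and the helper name leader_fraction_warning are all hypothetical and are not taken from _get_deposit_closed_reason().

```python
from typing import Optional

# The Hyperliquid 5% minimum leader share mentioned above
MIN_LEADER_FRACTION = 0.05
# Hypothetical margin for "close to the minimum"; not taken from eth_defi
WARN_MARGIN = 0.01


def leader_fraction_warning(leader_fraction: Optional[float]) -> Optional[str]:
    """Return a warning string, or None when no warning applies."""
    if leader_fraction is None:
        return None  # Unknown leader share: nothing to report
    if leader_fraction < MIN_LEADER_FRACTION:
        return "leader fraction below 5% minimum"
    if leader_fraction < MIN_LEADER_FRACTION + WARN_MARGIN:
        return "leader fraction close to 5% minimum"
    return None
```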

Returns

Tuple of (VaultSpec, VaultRow).

Return type

tuple[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]

merge_hypercore_prices_to_parquet(parquet_path, daily_db=None, hf_db=None)

Merge Hypercore price data from one or both DuckDB databases into the Parquet.

Reads data from whichever databases are provided, combines them (deduplicating on (address, timestamp)), removes old chain-9999 rows from the Parquet, and writes the combined result.

This is safe for mode switches. If you have switched to the HF database but the daily database still exists, pass both to preserve all historical data. When both databases contain a row for the same vault at the same timestamp, the HF row wins because it holds the more recent data.

Daily rows have midnight timestamps (from pd.to_datetime(date)), while HF rows have raw API timestamps, so the two rarely collide.
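The deduplication rule can be sketched with pandas. Concatenate the daily rows first and the HF rows last, then keep the last occurrence of each (address, timestamp) pair so that the HF row wins. combine_hypercore is a hypothetical name, and the column set is reduced to the minimum needed to show the rule.

```python
import pandas as pd


def combine_hypercore(daily: pd.DataFrame, hf: pd.DataFrame) -> pd.DataFrame:
    """Combine daily and HF rows, preferring HF rows on (address, timestamp) collisions."""
    # HF rows go last so keep="last" lets them win on duplicates
    combined = pd.concat([daily, hf], ignore_index=True)
    combined = combined.drop_duplicates(subset=["address", "timestamp"], keep="last")
    return combined.sort_values(["address", "timestamp"]).reset_index(drop=True)
```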

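Parameters
  • parquet_path (pathlib.Path) – Path to the uncleaned Parquet file.

  • daily_db (Optional[eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase]) – The daily metrics database, or None to skip daily data.

  • hf_db (Optional[eth_defi.hyperliquid.high_freq_metrics.HyperliquidHighFreqMetricsDatabase]) – The HF metrics database, or None to skip HF data.
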
Returns

The combined DataFrame (EVM + Hypercore rows).

Return type

pandas.DataFrame

merge_into_uncleaned_parquet(db, parquet_path)

Merge Hyperliquid daily prices into the uncleaned Parquet file.

Writes Hypercore raw data in the same format as the EVM vault scanner, so the standard cleaning pipeline (process_raw_vault_scan_data()) can process all vaults together.

Reads the existing Parquet, removes any prior Hypercore rows (chain == 9999), appends fresh Hyperliquid daily price rows, and writes back. Idempotent: running twice produces the same result.

If the Parquet file does not exist, creates a new one.
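The idempotent strip-and-append step is sketched below on in-memory DataFrames. It assumes a chain column in which Hypercore rows carry 9999, as described above. Reading and writing the file (for example with pandas.read_parquet() and DataFrame.to_parquet()) is left out, and replace_hypercore_rows is a hypothetical helper.

```python
from typing import Optional

import pandas as pd

HYPERCORE_CHAIN_ID = 9999  # Chain id marking Hypercore rows, per the text above


def replace_hypercore_rows(existing: Optional[pd.DataFrame], fresh: pd.DataFrame) -> pd.DataFrame:
    """Drop any prior Hypercore rows and append fresh ones; EVM rows are untouched."""
    if existing is None:
        # No Parquet file yet: the fresh rows are the whole result
        return fresh.reset_index(drop=True)
    evm_only = existing[existing["chain"] != HYPERCORE_CHAIN_ID]
    return pd.concat([evm_only, fresh], ignore_index=True)
```

Running the function again on its own output removes exactly the rows it appended the first time and re-appends them. That is why the operation is idempotent.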

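Parameters
  • db (eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase) – The Hyperliquid daily metrics database.

  • parquet_path (pathlib.Path) – Path to the uncleaned Parquet file (created if it does not exist).
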
Returns

The combined DataFrame.

Return type

pandas.DataFrame

merge_into_vault_database(db, vault_db_path, review_statuses=None)

Merge Hyperliquid vault metadata into an existing VaultDatabase pickle.

Reads the existing pickle, upserts Hyperliquid VaultRow entries (keyed by VaultSpec), and writes back. Idempotent: running twice produces the same result.

If the pickle file does not exist, creates a new VaultDatabase.

The review_statuses argument is how the Hyperliquid review Google Sheet feeds human-entered OK / Avoid decisions into the pickle so downstream consumers (calculate_vault_record → JSON export) can surface them without re-reading the sheet on every invocation.

Behaviour:

  • review_statuses is None (sheet unreachable, credentials missing, or the caller explicitly opted out): the existing _manual_review_status value is carried forward from the previous pickle entry for each vault. This is the “persist if Google Sheets is down” contract — the last known manual review survives an outage.

  • review_statuses is a mapping: the mapped value (including an explicit None for “no review”) is written for every address present in the mapping. Addresses absent from the mapping fall back to the carry-forward path above.
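The carry-forward contract can be sketched with plain dicts. In this sketch, rows are keyed by address instead of VaultSpec, and review statuses are plain strings instead of ReviewStatus values. resolve_review_status and upsert_rows are hypothetical helpers. A sentinel distinguishes “address missing from the mapping” from an explicit None.

```python
from typing import Any, Dict, Mapping, Optional

# Sentinel that tells "address absent from the mapping" apart from an explicit None
_MISSING = object()


def resolve_review_status(
    address: str,
    previous_status: Optional[str],
    review_statuses: Optional[Mapping[str, Optional[str]]],
) -> Optional[str]:
    if review_statuses is None:
        return previous_status  # Sheet unavailable: keep last known decision
    value = review_statuses.get(address, _MISSING)
    if value is _MISSING:
        return previous_status  # Address not in the sheet: carry forward
    return value  # Explicit decision, including None for "no review"


def upsert_rows(
    existing: Dict[str, Dict[str, Any]],
    fresh: Dict[str, Dict[str, Any]],
    review_statuses: Optional[Mapping[str, Optional[str]]],
) -> Dict[str, Dict[str, Any]]:
    merged = dict(existing)
    for address, row in fresh.items():
        previous = existing.get(address, {}).get("_manual_review_status")
        new_row = dict(row)
        new_row["_manual_review_status"] = resolve_review_status(address, previous, review_statuses)
        merged[address] = new_row  # Upsert: replace or add
    return merged
```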

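Parameters
  • db (eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase) – The Hyperliquid daily metrics database.

  • vault_db_path (pathlib.Path) – Path to the VaultDatabase pickle (created if it does not exist).

  • review_statuses – Optional mapping of vault address to manual review status from the Hyperliquid review Google Sheet; see Behaviour above.
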
Returns

The updated VaultDatabase.

Return type

eth_defi.vault.vaultdb.VaultDatabase

open_and_merge_hypercore_prices(parquet_path, daily_db_path=None, hf_db_path=None)

Open whichever Hyperliquid databases exist and merge into the parquet.

Convenience wrapper around merge_hypercore_prices_to_parquet() that handles opening and closing both databases. Used by standalone scripts and post-processing to avoid duplicating the open/close pattern.

Parameters
  • parquet_path (pathlib.Path) – Path to the uncleaned Parquet file.

  • daily_db_path (Optional[pathlib.Path]) – Path to the daily DuckDB (None uses default, skipped if not on disc).

  • hf_db_path (Optional[pathlib.Path]) – Path to the HF DuckDB (None uses default, skipped if not on disc).

Returns

The combined DataFrame (EVM + Hypercore rows).

Return type

pandas.DataFrame
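The pattern of opening whichever databases exist and always closing them maps naturally onto contextlib.ExitStack. In this sketch, the opener callable and the action are hypothetical stand-ins for the database constructors and merge_hypercore_prices_to_parquet(). Unlike the real wrapper, the sketch simply skips a None path instead of falling back to a default location.

```python
from contextlib import ExitStack
from pathlib import Path
from typing import Any, Callable, Mapping, Optional


def open_existing_and_run(
    paths: Mapping[str, Optional[Path]],
    opener: Callable[[Path], Any],
    action: Callable[..., Any],
) -> Any:
    """Open every database whose file exists, run action, then close them all.

    Missing or None paths are passed to action as None. ExitStack guarantees
    close() is called on every opened database even if action raises.
    """
    with ExitStack() as stack:
        handles = {}
        for name, path in paths.items():
            if path is not None and Path(path).exists():
                db = opener(path)
                stack.callback(db.close)  # Closed on exit, in reverse order
                handles[name] = db
            else:
                handles[name] = None  # Skipped: not on disc
        return action(**handles)
```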