hyperliquid.vault_data_export

Documentation for eth_defi.hyperliquid.vault_data_export Python module.

Export Hyperliquid vault data into the ERC-4626 pipeline format.

This module bridges the Hyperliquid-specific DuckDB data into the formats consumed by the existing ERC-4626 vault metrics pipeline:

Synthetic VaultRow entries for the VaultDatabase pickle
Raw price DataFrames matching the uncleaned Parquet schema, so that Hypercore data goes through the same cleaning pipeline as EVM vaults
Merge functions to append Hyperliquid data into existing files

Example:

from pathlib import Path
from eth_defi.hyperliquid.daily_metrics import HyperliquidDailyMetricsDatabase
from eth_defi.hyperliquid.vault_data_export import merge_into_vault_database, merge_into_uncleaned_parquet

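# Example paths (assumed names); point these at your own pipeline files
vault_db_path = Path("vault-db.pickle")
uncleaned_parquet_path = Path("vault-prices-1h.parquet")

db = HyperliquidDailyMetricsDatabase(Path("daily-metrics.duckdb"))

merge_into_vault_database(db, vault_db_path)
merge_into_uncleaned_parquet(db, uncleaned_parquet_path)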

db.close()

Module Attributes

LEADER_FRACTION_WARNING_THRESHOLD

If the leader's share of vault capital drops below this threshold, we warn that new deposits may not be accepted because the leader must maintain at least 5% of total vault capital.

Functions

`build_hypercore_prices_dataframe`([daily_db, ...])	Build a deduplicated Hypercore price DataFrame from available databases.
`build_raw_prices_dataframe`(db)	Build a raw prices DataFrame from the Hyperliquid DuckDB.
`build_raw_prices_dataframe_hf`(db)	Build a raw prices DataFrame from the HF DuckDB.
`create_hyperliquid_vault_row`(vault_address, ...)	Create a synthetic VaultRow for a Hyperliquid native vault.
`merge_hypercore_prices_to_parquet`(parquet_path)	Merge Hypercore price data from one or both DuckDB databases into the Parquet.
`merge_into_uncleaned_parquet`(db, parquet_path)	Merge Hyperliquid daily prices into the uncleaned Parquet file.
`merge_into_vault_database`(db, vault_db_path)	Merge Hyperliquid vault metadata into an existing VaultDatabase pickle.
`open_and_merge_hypercore_prices`(parquet_path)	Open whichever Hyperliquid databases exist and merge into the parquet.

LEADER_FRACTION_WARNING_THRESHOLD: float = 0.055

If the leader’s share of vault capital drops below this threshold, we warn that new deposits may not be accepted because the leader must maintain at least 5% of total vault capital.

The threshold is set 0.5 percentage points above the Hyperliquid minimum (5%) to give an early warning before deposits are actually blocked.

Source: https://hyperliquid.gitbook.io/hyperliquid-docs/hypercore/vaults/for-vault-leaders-legacy

Verified: 2026-03-09
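The threshold comparison can be illustrated with a minimal sketch. The helper name should_warn_about_leader_fraction is hypothetical and is not part of the module; only the 0.055 value and the 5% minimum come from the text above. Treating a missing fraction as "no warning" is also an assumption.

```python
# Value copied from the module attribute documented above.
LEADER_FRACTION_WARNING_THRESHOLD = 0.055


def should_warn_about_leader_fraction(leader_fraction):
    """Return True when the leader's share is below the warning threshold.

    Hypothetical helper for illustration only. A missing fraction (None)
    is treated as "no warning" here, which is an assumption.
    """
    if leader_fraction is None:
        return False
    return leader_fraction < LEADER_FRACTION_WARNING_THRESHOLD


# 5.2% is still above the 5% minimum but already inside the warning band.
warn_close_to_minimum = should_warn_about_leader_fraction(0.052)
# 10% is comfortably above the threshold.
warn_healthy = should_warn_about_leader_fraction(0.10)
```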

create_hyperliquid_vault_row(vault_address, name, description, tvl, create_time, follower_count=None, is_closed=False, allow_deposits=True, relationship_type='normal', leader_fraction=None, manual_review_status=None)

Create a synthetic VaultRow for a Hyperliquid native vault.

Builds a VaultRow that matches what calculate_vault_record() expects, using the Hypercore synthetic chain ID.

User-created vaults (relationship_type="normal") use the fixed platform performance fee HYPERLIQUID_VAULT_PERFORMANCE_FEE. Protocol vaults (HLP and its children with relationship_type="parent" or "child") have zero fees.
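The fee rule above can be sketched as a small lookup. The function name and the 0.10 value are assumptions made for illustration; the real value lives in HYPERLIQUID_VAULT_PERFORMANCE_FEE, and how the library handles unknown relationship types is not documented here.

```python
# Placeholder value (assumption); the library uses HYPERLIQUID_VAULT_PERFORMANCE_FEE.
ASSUMED_PERFORMANCE_FEE = 0.10


def performance_fee_for(relationship_type):
    """Pick the performance fee for a vault by its relationship type.

    Hypothetical helper: protocol vaults ("parent", "child") pay no fee,
    user-created vaults ("normal") pay the fixed platform fee.
    """
    if relationship_type in ("parent", "child"):
        return 0.0
    if relationship_type == "normal":
        return ASSUMED_PERFORMANCE_FEE
    # Rejecting unknown types is a choice made for this sketch only.
    raise ValueError(f"Unknown relationship type: {relationship_type!r}")


user_vault_fee = performance_fee_for("normal")
hlp_fee = performance_fee_for("parent")
```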

Parameters

vault_address (eth_typing.evm.HexAddress) – Vault hex address (will be lowercased).
name (str) – Vault display name.
description (Optional[str]) – Vault description text.
tvl (float) – Current TVL in USD.
create_time (Optional[datetime.datetime]) – Vault creation timestamp.
follower_count (Optional[int]) – Number of vault depositors.
is_closed (bool) – Whether the vault is closed for new deposits.
allow_deposits (bool) – Whether the vault allows deposits. A vault can have is_closed=False but allow_deposits=False.
relationship_type (str) – Vault relationship type from the API: "normal" for user-created vaults, "parent" for HLP, "child" for HLP sub-vaults.
leader_fraction (Optional[float]) – Leader’s fraction of total vault capital (e.g. 0.10 = 10%). Used by _get_deposit_closed_reason() to warn when the leader’s share is close to the Hyperliquid 5% minimum.
manual_review_status (Optional[eth_defi.hyperliquid.vault_review_sync.ReviewStatus]) – Manual review decision for this vault captured from the Hyperliquid review Google Sheet. Stored on the row so downstream exports (calculate_vault_record → JSON) can surface the decision without re-reading the sheet on every invocation.

Returns

Tuple of (VaultSpec, VaultRow).

Return type

tuple[eth_defi.vault.base.VaultSpec, eth_defi.vault.vaultdb.VaultRow]

build_raw_prices_dataframe(db)

Build a raw prices DataFrame from the Hyperliquid DuckDB.

Produces rows matching the schema of the EVM vault scanner (export()), so Hypercore data can go through the same cleaning pipeline (process_raw_vault_scan_data()) as ERC-4626 vaults.

The output has timestamp as a column (not index), matching the raw uncleaned Parquet format.

Includes per-row deposit_closed_reason (str or None) and deposits_open (str "true"/"false" or None) columns derived from forward-filled is_closed, allow_deposits, and leader_fraction state columns in the DuckDB.
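As a rough illustration of forward-filling sparse state and deriving a string flag from it, consider the sketch below. It uses plain lists rather than a DataFrame, and the rule "open when not closed and deposits allowed" is an assumption; the actual derivation, including the leader_fraction check and the deposit_closed_reason text, is not reproduced here.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value before it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def deposits_open_flag(is_closed, allow_deposits):
    """Return "true", "false", or None when the state is still unknown.

    Assumed rule for this sketch: deposits are open only when the vault
    is not closed and allows deposits.
    """
    if is_closed is None or allow_deposits is None:
        return None
    return "true" if (not is_closed and allow_deposits) else "false"


# Sparse per-row state: None means "no new observation on this row".
is_closed = [None, False, None, None, True]
allow_deposits = [None, True, None, False, None]

flags = [
    deposits_open_flag(closed, allowed)
    for closed, allowed in zip(forward_fill(is_closed), forward_fill(allow_deposits))
]
```

Rows before the first observation stay None, which mirrors the "or None" part of the column description.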

Also exposes Hyperliquid’s raw cumulative account PnL as account_pnl so downstream consumers can compare the website-style account PnL against the cleaned share-price based return series. follower_count and cumulative_volume are exported as scalar historical fields when available.

Parameters: db (eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase) – The Hyperliquid daily metrics database.
Returns: DataFrame with columns matching the uncleaned Parquet schema.
Return type: pandas.DataFrame

merge_into_vault_database(db, vault_db_path, review_statuses=None)

Merge Hyperliquid vault metadata into an existing VaultDatabase pickle.

Reads the existing pickle, upserts Hyperliquid VaultRow entries (keyed by VaultSpec), and writes back. Idempotent: running twice produces the same result.

If the pickle file does not exist, creates a new VaultDatabase.

The review_statuses argument is how the Hyperliquid review Google Sheet feeds human-entered OK / Avoid decisions into the pickle so downstream consumers (calculate_vault_record → JSON export) can surface them without re-reading the sheet on every invocation.

Behaviour:

review_statuses is None (sheet unreachable, credentials missing, or the caller explicitly opted out): the existing _manual_review_status value is carried forward from the previous pickle entry for each vault. This is the “persist if Google Sheets is down” contract — the last known manual review survives an outage.
review_statuses is a mapping: the mapped value (including an explicit None for “no review”) is written for every address present in the mapping. Addresses absent from the mapping fall back to the carry-forward path above.
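The two behaviours above reduce to a small per-vault decision, sketched below. The function name is hypothetical and plain strings stand in for ReviewStatus values; this is not the library's implementation.

```python
def resolve_review_status(address, previous_status, review_statuses):
    """Decide which manual review status to store for one vault.

    Hypothetical helper. `previous_status` is the value carried in the
    existing pickle entry; `review_statuses` is the optional mapping
    keyed by lowercased address.
    """
    key = address.lower()
    if review_statuses is not None and key in review_statuses:
        # The mapped value wins, including an explicit None ("no review").
        return review_statuses[key]
    # Sheet unavailable, or address absent from the mapping: carry forward.
    return previous_status


sheet = {"0xaaa": "avoid", "0xbbb": None}

carried_when_sheet_down = resolve_review_status("0xAAA", "ok", None)
overwritten = resolve_review_status("0xAAA", "ok", sheet)
cleared = resolve_review_status("0xBBB", "ok", sheet)
carried_when_absent = resolve_review_status("0xCCC", "ok", sheet)
```

Checking membership with `key in review_statuses` rather than `.get()` is what lets an explicit None clear a previous decision instead of being mistaken for a missing entry.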

Parameters

db (eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase) – The Hyperliquid daily metrics database.
vault_db_path (pathlib.Path) – Path to the VaultDatabase pickle file.
review_statuses (Optional[collections.abc.Mapping[eth_typing.evm.HexAddress, Optional[eth_defi.hyperliquid.vault_review_sync.ReviewStatus]]]) – Optional mapping from lowercased vault address to the latest manual review decision read from the Google Sheet.

Returns

The updated VaultDatabase.

Return type

eth_defi.vault.vaultdb.VaultDatabase

merge_into_uncleaned_parquet(db, parquet_path)

Merge Hyperliquid daily prices into the uncleaned Parquet file.

Writes Hypercore raw data in the same format as the EVM vault scanner, so the standard cleaning pipeline (process_raw_vault_scan_data()) can process all vaults together.

Reads the existing Parquet, removes any prior Hypercore rows (chain == 9999), appends fresh Hyperliquid daily price rows, and writes back. Idempotent: running twice produces the same result.

If the Parquet file does not exist, creates a new one.
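The remove-then-append pattern is what makes the merge idempotent. The sketch below shows it on plain lists of dicts rather than a Parquet-backed DataFrame; the function name and the sample rows are made up for illustration.

```python
HYPERCORE_CHAIN_ID = 9999  # synthetic chain ID used for Hypercore rows


def replace_hypercore_rows(existing_rows, fresh_hypercore_rows):
    """Drop prior Hypercore rows, then append the fresh ones.

    Hypothetical stand-in for the DataFrame logic: rows from other
    chains are kept untouched.
    """
    kept = [row for row in existing_rows if row["chain"] != HYPERCORE_CHAIN_ID]
    return kept + list(fresh_hypercore_rows)


existing = [
    {"chain": 1, "address": "0x111", "share_price": 1.01},
    {"chain": 9999, "address": "0xaaa", "share_price": 0.98},  # stale
]
fresh = [{"chain": 9999, "address": "0xaaa", "share_price": 1.02}]

once = replace_hypercore_rows(existing, fresh)
twice = replace_hypercore_rows(once, fresh)
```

Running the merge a second time removes the rows written by the first run before appending them again, so `once` and `twice` are equal.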

Parameters

db (eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase) – The Hyperliquid daily metrics database.
parquet_path (pathlib.Path) – Path to the uncleaned Parquet file (typically vault-prices-1h.parquet).

Returns

The combined DataFrame.

Return type

pandas.DataFrame

build_raw_prices_dataframe_hf(db)

Build a raw prices DataFrame from the HF DuckDB.

Exports raw API timestamps without resampling, so the spacing is irregular and reflects the vaultDetails API’s per-period resolution: about 20 minutes for the last 24 hours (the finest resolution the API ever serves), coarsening to about 3 hours, 10.5 hours, and weekly for older data.

The downstream cleaning pipeline computes returns_1h via pct_change() on consecutive rows. This already works for irregular timestamps: the daily pipeline has always produced ~24h returns labelled returns_1h for Hypercore. The downstream forward_fill_vault() resamples to 1h when needed.
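To see why a row-to-row percentage change is indifferent to timestamp spacing, here is a minimal sketch in plain Python. It mimics what pct_change() computes on consecutive rows (using None where pandas would produce NaN); the sample timestamps and share prices are invented.

```python
from datetime import datetime

# Irregularly spaced (timestamp, share_price) rows, oldest first.
rows = [
    (datetime(2026, 3, 1, 0, 0), 1.00),
    (datetime(2026, 3, 1, 10, 30), 1.05),  # ~10.5h gap
    (datetime(2026, 3, 1, 13, 30), 1.05),  # ~3h gap
    (datetime(2026, 3, 1, 13, 50), 1.26),  # ~20min gap
]


def consecutive_returns(rows):
    """Return the change between each row and the one before it.

    The first row has no predecessor, so its return is None.
    The elapsed time between rows plays no part in the calculation.
    """
    returns = [None]
    for (_, previous_price), (_, current_price) in zip(rows, rows[1:]):
        returns.append(current_price / previous_price - 1)
    return returns


returns = consecutive_returns(rows)
gaps_in_hours = [
    (later - earlier).total_seconds() / 3600
    for (earlier, _), (later, _) in zip(rows, rows[1:])
]
```

Each return covers whatever period separates the two rows, which is why the column name returns_1h is only a label for this data.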

Parameters: db (eth_defi.hyperliquid.high_freq_metrics.HyperliquidHighFreqMetricsDatabase) – The HF metrics database.
Returns: DataFrame matching the uncleaned Parquet schema with raw timestamps.
Return type: pandas.DataFrame

build_hypercore_prices_dataframe(daily_db=None, hf_db=None)

Build a deduplicated Hypercore price DataFrame from available databases.

Daily and high-frequency data are combined without writing a Parquet file. This lets the full post-processing pipeline batch Hypercore with other native-protocol sources in one Parquet rewrite, while standalone callers can still use merge_hypercore_prices_to_parquet().

When both databases contain a row for the same (address, timestamp), the high-frequency row wins because it is the more granular source.
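The "high-frequency row wins" rule can be sketched with a dictionary keyed by (address, timestamp). The function name and sample rows are hypothetical, and the real function works on DataFrames rather than lists of dicts.

```python
from datetime import datetime


def combine_price_rows(daily_rows, hf_rows):
    """Merge two row lists, keeping the HF row on key collisions.

    Daily rows are inserted first, so an HF row with the same
    (address, timestamp) key overwrites the daily one.
    """
    merged = {}
    for row in list(daily_rows) + list(hf_rows):
        merged[(row["address"], row["timestamp"])] = row
    return sorted(merged.values(), key=lambda r: (r["address"], r["timestamp"]))


midnight = datetime(2026, 3, 1, 0, 0)
daily = [{"address": "0xaaa", "timestamp": midnight, "share_price": 1.00, "source": "daily"}]
hf = [
    {"address": "0xaaa", "timestamp": midnight, "share_price": 1.01, "source": "hf"},
    {"address": "0xaaa", "timestamp": datetime(2026, 3, 1, 0, 20), "share_price": 1.02, "source": "hf"},
]

combined = combine_price_rows(daily, hf)
```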

Parameters

daily_db (Optional[eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase]) – Open daily metrics database, if available.
hf_db (Optional[eth_defi.hyperliquid.high_freq_metrics.HyperliquidHighFreqMetricsDatabase]) – Open high-frequency metrics database, if available.

Returns

Deduplicated Hypercore raw prices, or an empty DataFrame when neither database has price data.

Return type

pandas.DataFrame

merge_hypercore_prices_to_parquet(parquet_path, daily_db=None, hf_db=None)

Merge Hypercore price data from one or both DuckDB databases into the Parquet.

Reads data from whichever databases are provided, combines them (deduplicating on (address, timestamp)), removes old chain-9999 rows from the Parquet, and writes the combined result.

This is safe for mode switches: if you are running in HF mode but a daily database also exists, pass both to preserve all historical data. When both databases contain a row for the same vault at the same timestamp, the HF row wins (more recent data).

Daily rows have midnight timestamps (from pd.to_datetime(date)) and HF rows have raw API timestamps, so they rarely collide.

Parameters

parquet_path (pathlib.Path) – Path to the uncleaned Parquet file.
daily_db (Optional[eth_defi.hyperliquid.daily_metrics.HyperliquidDailyMetricsDatabase]) – Daily metrics database (optional).
hf_db (Optional[eth_defi.hyperliquid.high_freq_metrics.HyperliquidHighFreqMetricsDatabase]) – High-frequency metrics database (optional).

Returns

The combined DataFrame (EVM + Hypercore rows).

Return type

pandas.DataFrame

open_and_merge_hypercore_prices(parquet_path, daily_db_path=None, hf_db_path=None)

Open whichever Hyperliquid databases exist and merge into the parquet.

Convenience wrapper around merge_hypercore_prices_to_parquet() that handles opening and closing both databases. Used by standalone scripts and post-processing to avoid duplicating the open/close pattern.
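The open/close pattern this wrapper removes from callers can be sketched as follows. FakeDatabase and run_with_existing_databases are hypothetical names used for illustration, and the sketch does not model the default-path lookup; it only shows skipping missing files and guaranteeing that every opened database is closed.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path


class FakeDatabase:
    """Hypothetical stand-in for a DuckDB-backed metrics database."""

    def __init__(self, path):
        self.path = Path(path)
        self.closed = False

    def close(self):
        self.closed = True


def run_with_existing_databases(paths, action, opener=FakeDatabase):
    """Open each database whose file exists, call action, always close."""
    with ExitStack() as stack:
        databases = []
        for path in paths:
            if path is not None and Path(path).exists():
                db = opener(path)
                stack.callback(db.close)  # runs even if action raises
                databases.append(db)
            else:
                databases.append(None)  # missing file: skipped
        return action(*databases)


with tempfile.TemporaryDirectory() as tmp:
    daily_path = Path(tmp) / "daily.duckdb"
    daily_path.touch()
    hf_path = Path(tmp) / "hf.duckdb"  # never created, so it is skipped

    opened = []

    def action(daily_db, hf_db):
        opened.append(daily_db)
        return (daily_db is not None, hf_db is not None)

    available = run_with_existing_databases([daily_path, hf_path], action)
```

Registering `close` on the ExitStack immediately after opening means a failure while opening the second database still closes the first.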

Parameters

parquet_path (pathlib.Path) – Path to the uncleaned Parquet file.
daily_db_path (Optional[pathlib.Path]) – Path to the daily DuckDB. None uses the default path; the database is skipped if the file does not exist on disk.
hf_db_path (Optional[pathlib.Path]) – Path to the HF DuckDB. None uses the default path; the database is skipped if the file does not exist on disk.

Returns

The combined DataFrame (EVM + Hypercore rows).

Return type

pandas.DataFrame