feed.database

Documentation for eth_defi.feed.database Python module.

DuckDB persistence for vault post tracking.

Functions

resolve_feed_database_path()

Resolve the vault post feed DuckDB database path.

Classes

CollectedPost

A single normalised post ready for database insertion.

VaultPostDatabase

DuckDB database for tracked sources and collected posts.

class CollectedPost

Bases: object

A single normalised post ready for database insertion.

__init__(external_post_id, title, post_url, published_at, fetched_at, short_description, full_text, ai_summary=None, raw_payload=None)
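Parameters
  • external_post_id –

  • title –

  • post_url –

  • published_at –

  • fetched_at –

  • short_description –

  • full_text –

  • ai_summary –

  • raw_payload –

Return type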

None
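
The constructor signature above suggests a plain record type. A minimal sketch of such a record follows. The class name CollectedPostSketch and every type annotation are assumptions made for illustration, since the parameter types are not listed here. published_at is treated as optional because fetch_recent_posts_by_feeder() falls back to fetched_at when it is missing.

```python
import datetime
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CollectedPostSketch:
    """Illustrative stand-in for CollectedPost; all field types are assumed."""

    external_post_id: str
    title: str
    post_url: str
    published_at: Optional[datetime.datetime]  # assumed nullable
    fetched_at: datetime.datetime
    short_description: str
    full_text: str
    ai_summary: Optional[str] = None
    raw_payload: Optional[Dict[str, Any]] = None


# Build one post the way a collector might, leaving the optional fields unset.
example_post = CollectedPostSketch(
    external_post_id="post-1",
    title="Example title",
    post_url="https://example.com/post-1",
    published_at=None,
    fetched_at=datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
    short_description="Short text",
    full_text="Longer text",
)
```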

class VaultPostDatabase

Bases: object

DuckDB database for tracked sources and collected posts.

__init__(path)
Parameters

path (pathlib.Path) –

close()

Close the database connection.

Return type

None

fetch_recent_posts_by_feeder(feeder_ids, max_per_feeder=10)

Fetch the most recent posts for each feeder across all source types.

Joins tracked_sources and posts on source_id, ranks posts per feeder by COALESCE(published_at, fetched_at) DESC, and returns the max_per_feeder newest posts per feeder.

Parameters
  • feeder_ids (Iterable[str]) – Iterable of feeder-id slugs to look up.

  • max_per_feeder (int) – Maximum number of posts to return per feeder.

Returns

Dict mapping feeder_id to a list of post dicts with keys title, short_description, full_text, post_url, source_type, published_at (always set via COALESCE fallback to fetched_at). Lists are ordered newest-first.

Return type

dict[str, list[dict]]
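
The join-and-rank step described above can be sketched with a window function. This is a hedged illustration rather than the module's actual query: it uses the standard library sqlite3 module as a stand-in for DuckDB (SQLite 3.25 or newer is needed for window functions), the table layouts are reduced to a few columns, and the feeder_id column name and the function name fetch_recent_sketch are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE tracked_sources (source_id INTEGER PRIMARY KEY, feeder_id TEXT, source_type TEXT);
    CREATE TABLE posts (source_id INTEGER, title TEXT, published_at TEXT, fetched_at TEXT);
    INSERT INTO tracked_sources VALUES (1, 'alpha', 'rss'), (2, 'alpha', 'twitter'), (3, 'beta', 'rss');
    INSERT INTO posts VALUES
        (1, 'a-old', '2024-01-01', '2024-01-05'),
        (1, 'a-new', '2024-03-01', '2024-03-02'),
        (2, 'a-undated', NULL, '2024-02-01'),
        (3, 'b-only', '2024-01-15', '2024-01-16');
    """
)


def fetch_recent_sketch(con, feeder_ids, max_per_feeder=10):
    """Return {feeder_id: [post dict, ...]} with the newest posts first."""
    feeder_ids = list(feeder_ids)
    if not feeder_ids:
        return {}
    placeholders = ",".join("?" for _ in feeder_ids)
    sql = f"""
        SELECT feeder_id, title, source_type, effective_at
        FROM (
            SELECT s.feeder_id, p.title, s.source_type,
                   COALESCE(p.published_at, p.fetched_at) AS effective_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY s.feeder_id
                       ORDER BY COALESCE(p.published_at, p.fetched_at) DESC
                   ) AS rn
            FROM posts p
            JOIN tracked_sources s ON s.source_id = p.source_id
            WHERE s.feeder_id IN ({placeholders})
        ) AS ranked
        WHERE rn <= ?
        ORDER BY feeder_id, rn
    """
    result = {}
    for feeder_id, title, source_type, effective_at in con.execute(sql, [*feeder_ids, max_per_feeder]):
        result.setdefault(feeder_id, []).append(
            {"title": title, "source_type": source_type, "published_at": effective_at}
        )
    return result


recent = fetch_recent_sketch(con, ["alpha", "beta"], max_per_feeder=2)
```

In this sketch the undated post from the second source is ranked by its fetched_at value, so it sits between the two dated posts of the same feeder even though it comes from a different source type.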

get_known_post_ids(source_id=None)

Return all known external_post_id values, optionally filtered by source.

Parameters

source_id (Optional[int]) –

Return type

set[str]
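
One plausible use of the returned set is to skip posts that are already stored before inserting. The helper below is a hypothetical sketch of that check in plain Python; filter_new_post_ids is not part of the module, and the real collector may deduplicate differently.

```python
def filter_new_post_ids(candidate_ids, known_ids):
    """Keep candidate IDs that are not already known, preserving input order."""
    seen = set(known_ids)
    fresh = []
    for post_id in candidate_ids:
        if post_id not in seen:
            seen.add(post_id)  # also drops duplicates within the same batch
            fresh.append(post_id)
    return fresh


# known_ids stands in for the set returned by get_known_post_ids().
known_ids = {"p1", "p2"}
new_ids = filter_new_post_ids(["p2", "p3", "p3", "p4"], known_ids)
```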

get_posts_df()

Return stored posts for diagnostics.

Return type

pandas.DataFrame

get_source_last_post_timestamps(source_ids)

Return the stored last_post_published_at for the given source IDs.

Used to gate backfill fallbacks: a source whose stored timestamp is not None has already been seen and does not need a fallback individual timeline read.

Parameters

source_ids (Iterable[int]) – Iterable of numeric source IDs to look up.

Returns

Mapping of source_id → last_post_published_at (None when the column has never been set for that row).

Return type

dict[int, datetime.datetime | None]
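
The gating rule above can be illustrated with a small filter over the returned mapping. This is a sketch under the stated rule only: sources_needing_fallback is a hypothetical helper name, and the sample source IDs and timestamp are made up.

```python
import datetime
from typing import Dict, List, Optional


def sources_needing_fallback(
    timestamps: Dict[int, Optional[datetime.datetime]],
) -> List[int]:
    """Return source IDs whose last_post_published_at has never been set."""
    return sorted(source_id for source_id, ts in timestamps.items() if ts is None)


# Shape of the mapping returned by get_source_last_post_timestamps().
stored = {
    1: datetime.datetime(2024, 3, 1, tzinfo=datetime.timezone.utc),
    2: None,
    3: None,
}
fallback_ids = sources_needing_fallback(stored)
```

Only sources 2 and 3 would get the fallback individual timeline read here, because source 1 already has a stored timestamp.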

get_sync_state(key)

Read a value from the feed_sync_state table.

Parameters

key (str) –

Return type

Optional[str]

get_tracked_sources_df()

Return tracked source rows for diagnostics.

Return type

pandas.DataFrame

insert_posts(source_id, posts)

Insert posts for a source and return the number of new rows.

Parameters
  • source_id –

  • posts –

Return type

int

mark_source_failure(source_id, error, *, checked_at=None)

Update sync state for a failed or skipped source fetch.

Parameters
  • source_id –

  • error –

  • checked_at –

Return type

None

mark_source_success(source_id, *, checked_at=None, last_post_published_at=None)

Update sync state for a successful source fetch.

Parameters
  • source_id –

  • checked_at –

  • last_post_published_at –

Return type

None

prune_posts(max_post_age_days)

Delete posts older than the configured retention period.

Parameters

max_post_age_days (int) –

Return type

int
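
A retention period in days translates into a cutoff timestamp. The sketch below shows that arithmetic on an in-memory list. It assumes that post age is measured from published_at with a fallback to fetched_at, and that the returned integer is the number of deleted posts; neither detail is stated here, and prune_sketch is a hypothetical name.

```python
import datetime

UTC = datetime.timezone.utc


def prune_sketch(posts, max_post_age_days, now=None):
    """Return (kept_posts, deleted_count) for a list of post dicts."""
    now = now or datetime.datetime.now(UTC)
    cutoff = now - datetime.timedelta(days=max_post_age_days)
    # Assumed rule: use published_at when present, otherwise fetched_at.
    kept = [p for p in posts if (p["published_at"] or p["fetched_at"]) >= cutoff]
    return kept, len(posts) - len(kept)


sample_posts = [
    {"title": "recent", "published_at": datetime.datetime(2024, 5, 25, tzinfo=UTC),
     "fetched_at": datetime.datetime(2024, 5, 26, tzinfo=UTC)},
    {"title": "undated-old", "published_at": None,
     "fetched_at": datetime.datetime(2024, 1, 1, tzinfo=UTC)},
    {"title": "old", "published_at": datetime.datetime(2024, 4, 1, tzinfo=UTC),
     "fetched_at": datetime.datetime(2024, 4, 2, tzinfo=UTC)},
]
kept_posts, deleted_count = prune_sketch(
    sample_posts, max_post_age_days=30, now=datetime.datetime(2024, 6, 1, tzinfo=UTC)
)
```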

save()

Force a checkpoint.

Return type

None

set_sync_state(key, value)

Write a value to the feed_sync_state table.

Parameters
  • key (str) –

  • value (str) –

Return type

None

upsert_tracked_source(source)

Insert or update one tracked source and return its source ID.

Parameters

source (eth_defi.feed.sources.TrackedPostSource) –

Return type

int

upsert_tracked_sources(sources)

Insert or update tracked sources and return source IDs by logical key.

Parameters

sources (Iterable[eth_defi.feed.sources.TrackedPostSource]) –

Return type

dict[tuple[str, str, str, str], int]

resolve_feed_database_path()

Resolve the vault post feed DuckDB database path.

Mirrors the resolution used by the post scanner (scripts/erc-4626/scan-vault-posts.py) so the JSON export reads the same database the feed collector writes. Keeping the resolver next to DEFAULT_VAULT_POST_DATABASE avoids each caller repeating the same environment lookup and path expansion logic.

The FEED_DB_PATH override takes precedence, falling back to the DB_PATH variable consumed by the post scanner, then the default path.

Returns

Path from FEED_DB_PATH, then DB_PATH, then the default vault post database path.

Return type

pathlib.Path
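
The precedence described above can be sketched as follows. The function name resolve_path_sketch, the optional environ argument, and the default path value are assumptions for illustration only. The real default is DEFAULT_VAULT_POST_DATABASE, whose value is not shown here.

```python
import os
from pathlib import Path

# Hypothetical placeholder; not the real DEFAULT_VAULT_POST_DATABASE value.
DEFAULT_PATH_SKETCH = Path("~/.example/vault-post-feed.duckdb")


def resolve_path_sketch(environ=None) -> Path:
    """Pick FEED_DB_PATH, then DB_PATH, then the default, and expand ~."""
    environ = os.environ if environ is None else environ
    raw = environ.get("FEED_DB_PATH") or environ.get("DB_PATH")
    path = Path(raw) if raw else DEFAULT_PATH_SKETCH
    return path.expanduser()


both_set = resolve_path_sketch({"FEED_DB_PATH": "/tmp/feed.duckdb", "DB_PATH": "/tmp/scan.duckdb"})
only_db_path = resolve_path_sketch({"DB_PATH": "/tmp/scan.duckdb"})
```

Passing the environment as an argument keeps the sketch testable without mutating os.environ. Whether the real resolver treats an empty variable as unset is not stated; this sketch does.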