feed.collector
Documentation for eth_defi.feed.collector Python module.
Vault post collection and feed normalisation.
Functions
build_linkedin_rss_feed_urls – Build live feed URLs for a LinkedIn company id.
build_twitter_rss_feed_urls – Build live feed URLs for a Twitter handle.
collect_posts – Collect posts for all configured sources and persist them in DuckDB.
collect_posts_for_source – Collect posts for one tracked source.
collect_twitter_list_posts – Collect Twitter/X posts through a single X list timeline read.
fetch_feed_proxy_rotator – Fetch an optional Webshare proxy rotator for feed fetching.
load_feed_proxy_rotator – Backwards-compatible alias for fetch_feed_proxy_rotator().
Classes
CollectedSourceResult – Detailed collection result for one tracked source.
CollectorRunSummary – Summary counters for one collector run.
Exceptions
AllBridgesFailedError – Raised when every bridge URL for a social feed source fails.
- exception AllBridgesFailedError
Bases: RuntimeError
Raised when every bridge URL for a social feed source fails.
- Parameters
source_label – Human-readable source type label for error messages.
canonical_url – Canonical source URL for diagnostics.
bridge_errors – List of (url, http_status_or_none) for each attempt. None for the status code indicates a non-HTTP failure such as a timeout.
- __init__(source_label, canonical_url, bridge_errors)
- __new__(**kwargs)
- add_note(note, /)
Add a note to the exception
- property indicates_auth_block: bool
Return True when at least one bridge returned HTTP 503 (LinkedIn auth barrier).
When all bridges fail and at least one of them returns 503, LinkedIn is most likely redirecting unauthenticated requests to the login page for this company. Bridges that are simply down (502 or connection error) say nothing about LinkedIn’s stance on the company page, so the check does not require every bridge to return 503.
- with_traceback(tb, /)
Set self.__traceback__ to tb and return self.
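The 503 rule described for indicates_auth_block can be sketched as follows. This is a minimal illustration built only from the documented constructor arguments; BridgeFailureSketch and the example URLs are hypothetical stand-ins, not the library's own class or real bridge endpoints.

```python
from typing import List, Optional, Tuple


class BridgeFailureSketch(RuntimeError):
    """Hypothetical stand-in for AllBridgesFailedError, not the library class."""

    def __init__(
        self,
        source_label: str,
        canonical_url: str,
        bridge_errors: List[Tuple[str, Optional[int]]],
    ):
        super().__init__(f"All {source_label} bridges failed for {canonical_url}")
        self.source_label = source_label
        self.canonical_url = canonical_url
        # Each entry is (url, http_status_or_none); None marks a non-HTTP failure.
        self.bridge_errors = bridge_errors

    @property
    def indicates_auth_block(self) -> bool:
        # A single 503 is enough; timeouts (None) and other codes are ignored.
        return any(status == 503 for _url, status in self.bridge_errors)


# One bridge is down (502), one reports the auth barrier (503).
mixed = BridgeFailureSketch(
    "LinkedIn",
    "https://www.linkedin.com/company/example",
    [("https://bridge-a.invalid/feed", 502), ("https://bridge-b.invalid/feed", 503)],
)

# Both bridges are simply unavailable: a 502 and a timeout.
down = BridgeFailureSketch(
    "LinkedIn",
    "https://www.linkedin.com/company/example",
    [("https://bridge-a.invalid/feed", 502), ("https://bridge-b.invalid/feed", None)],
)
```

Here mixed reports an auth block while down does not, which matches the distinction drawn above between a LinkedIn login redirect and bridges that are merely offline.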
- class CollectedSourceResult
Bases: object
Detailed collection result for one tracked source.
- __init__(feeder_id, name, role, source_type, status, posts_fetched=0, posts_inserted=0, last_post_published_at=None, error=None, auth_blocked=False)
- class CollectorRunSummary
Bases: object
Summary counters for one collector run.
- __init__(sources_loaded=0, sources_succeeded=0, sources_failed=0, feeders_skipped=0, posts_fetched=0, posts_inserted=0, source_results=None, twitter_method=None, rss_duration_seconds=None, linkedin_duration_seconds=None, twitter_duration_seconds=None, total_duration_seconds=None)
- build_linkedin_rss_feed_urls(company_id, url_templates)
Build live feed URLs for a LinkedIn company id.
- build_twitter_rss_feed_urls(handle, base_urls, *, url_templates=None)
Build live feed URLs for a Twitter handle.
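As a rough illustration of how a list of URL templates might expand into candidate feed URLs, the sketch below substitutes an identifier into each template. The `{id}` placeholder, the function name expand_feed_templates and the example hosts are all assumptions made for this example; the template syntax the two builder functions actually accept is not documented here.

```python
from typing import List, Sequence
from urllib.parse import quote


def expand_feed_templates(
    identifier: str,
    url_templates: Sequence[str],
    placeholder: str = "{id}",  # assumed placeholder syntax, not confirmed
) -> List[str]:
    """Substitute one identifier into every template, keeping order."""
    # Drop surrounding whitespace and a leading "@" from a handle,
    # then percent-encode so the value is safe inside a URL.
    cleaned = quote(identifier.strip().lstrip("@"), safe="")
    urls: List[str] = []
    for template in url_templates:
        url = template.replace(placeholder, cleaned)
        if url not in urls:  # skip duplicates produced by repeated templates
            urls.append(url)
    return urls


candidate_urls = expand_feed_templates(
    "@example",
    [
        "https://bridge-a.invalid/{id}/rss",
        "https://bridge-b.invalid/feed?user={id}",
        "https://bridge-a.invalid/{id}/rss",
    ],
)
```

Returning several URLs per source is what allows a caller to try each bridge in turn before giving up.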
- collect_posts(db, sources, *, max_posts_per_source=20, max_workers=8, request_timeout=20.0, request_delay_seconds=1.0, twitter_rss_base_urls=None, twitter_url_templates=None, linkedin_url_templates=None, proxy_rotator=None, max_proxy_rotations=3, twitter_bearer_token=None, twitter_user_cache=None, label='')
Collect posts for all configured sources and persist them in DuckDB.
- Parameters
sources (Sequence[eth_defi.feed.sources.TrackedPostSource]) –
max_posts_per_source (int) –
max_workers (int) –
request_timeout (float) –
request_delay_seconds (float) –
proxy_rotator (Optional[eth_defi.event_reader.webshare.ProxyRotator]) –
max_proxy_rotations (int) –
twitter_user_cache (Optional[eth_defi.feed.twitter_api.TwitterUserCache]) –
label (str) –
- Return type
- collect_posts_for_source(source, *, max_posts_per_source, request_timeout, twitter_rss_base_urls, twitter_url_templates=None, linkedin_url_templates=None, proxy_rotator=None, max_proxy_rotations=3, twitter_bearer_token=None, twitter_user_cache=None)
Collect posts for one tracked source.
- Parameters
source (eth_defi.feed.sources.TrackedPostSource) –
max_posts_per_source (int) –
request_timeout (float) –
proxy_rotator (Optional[eth_defi.event_reader.webshare.ProxyRotator]) –
max_proxy_rotations (int) –
twitter_user_cache (Optional[eth_defi.feed.twitter_api.TwitterUserCache]) –
- Return type
- collect_twitter_list_posts(db, sources, *, list_id, bearer_token, twitter_user_cache, max_tweets, fallback_max_tweets=5, label='Twitter list')
Collect Twitter/X posts through a single X list timeline read.
The list timeline API returns tweets across all list members in reverse chronological order. This lets production collection avoid one API call per tracked account while still storing posts under the account-specific tracked source rows.
When a handle has no tweets in the list timeline and has no stored last_post_published_at (i.e. it is a brand-new handle whose first scan returned nothing), the collector falls back to a single individual timeline read. This seeds the timestamp and stores a small number of recent posts without firing per-account API calls on steady-state runs where the list stopped early because all recent tweets were already known.
- Parameters
db (eth_defi.feed.database.VaultPostDatabase) – Vault post database.
sources (Sequence[eth_defi.feed.sources.TrackedPostSource]) – Twitter tracked sources whose handles are represented in the X list.
list_id (str) – Numeric X list ID.
bearer_token (str) – X API bearer token used for list timeline reads.
twitter_user_cache (eth_defi.feed.twitter_api.TwitterUserCache) – Cache containing handle-to-user-ID mappings.
max_tweets (int) – Maximum tweets to read from the list timeline.
fallback_max_tweets (int) – Maximum tweets to fetch per account when the list timeline returns zero results for that handle. Used to populate last_post_published_at for inactive accounts. Defaults to 5.
label (str) – Dashboard label for this collection phase.
- Returns
Collector run summary with per-source insert counters.
- Return type
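The fallback rule described for collect_twitter_list_posts can be sketched in plain Python: group the list timeline by author, then pick only the handles that have no tweets in the timeline and no stored timestamp. TrackedHandleSketch, group_by_handle and handles_needing_fallback are hypothetical names for this illustration and make no API calls; the library's own data structures may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class TrackedHandleSketch:
    """Hypothetical stand-in for a tracked Twitter source row."""

    handle: str
    last_post_published_at: Optional[datetime] = None


def group_by_handle(timeline: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (author_handle, tweet_id) pairs by lower-cased handle."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for handle, tweet_id in timeline:
        grouped[handle.lower()].append(tweet_id)
    return dict(grouped)


def handles_needing_fallback(
    tracked: Iterable[TrackedHandleSketch],
    grouped: Dict[str, List[str]],
) -> List[str]:
    """Return handles with no list tweets and no stored timestamp."""
    return [
        source.handle
        for source in tracked
        if not grouped.get(source.handle.lower())
        and source.last_post_published_at is None
    ]


tracked = [
    TrackedHandleSketch("alice"),  # has tweets in the list timeline
    TrackedHandleSketch("bob"),  # brand new, nothing in the timeline
    TrackedHandleSketch("carol", datetime(2024, 1, 1, tzinfo=timezone.utc)),
]
timeline = [("Alice", "t3"), ("alice", "t1")]

grouped = group_by_handle(timeline)
fallback = handles_needing_fallback(tracked, grouped)
```

Only bob qualifies for an individual timeline read here: alice already has tweets from the list, and carol has a stored timestamp, so a steady-state run would issue no extra per-account calls for either of them.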
- fetch_feed_proxy_rotator()
Fetch an optional Webshare proxy rotator for feed fetching.
- Return type
Optional[eth_defi.event_reader.webshare.ProxyRotator]
- load_feed_proxy_rotator()
Backwards-compatible alias for fetch_feed_proxy_rotator().
- Return type
Optional[eth_defi.event_reader.webshare.ProxyRotator]