Is OpenCorporates data fresh?
OpenCorporates' value proposition is breadth, and keeping ~145 jurisdictions current is hard. The model OC ships is scrape-and-cache. Each register has its own crawl schedule, and what you query is the most recent crawl, not the upstream record at the moment of your call. For some workflows that is fine. For onboarding, sanctions screening, fraud checks, and post-incident KYB refresh, it is not. OpenRegistry queries the upstream registry at request time instead.
What "live" means here
Every OpenRegistry tool call issues a real-time HTTP request to the
official government registry's API or portal at the moment your client
calls it. A short performance cache keeps a hot company page off the
upstream when a hundred concurrent requests want it; that cache is
measured in minutes and you can bypass it with fresh=true. There is no
daily, weekly, or monthly crawl. The data is as current as the
registry's own record at the moment you ask, or at most a few minutes
behind it when a cached response is served.
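The cache-and-bypass behaviour described above can be sketched in a few lines. This is an illustration only, not OpenRegistry's implementation: the class name, the 300-second window, and the injected fetch function are all assumptions (the text says only that the cache is "measured in minutes"), and the sketch does not coalesce concurrent requests.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ShortTTLCache:
    """Serve repeat lookups from memory for a few minutes.

    Hypothetical sketch: passing fresh=True always goes to the upstream.
    """

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 300.0,  # assumed window; the real value is not stated
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fresh: bool = False) -> Any:
        now = self._clock()
        cached = self._store.get(key)
        # Reuse the cached value only if the caller did not ask for a
        # fresh read and the entry is still inside the TTL window.
        if not fresh and cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = self._fetch(key)  # real-time call to the upstream registry
        self._store[key] = (now, value)
        return value
```

The point of the design is that staleness is bounded by the TTL and is always opt-out: a caller who cannot tolerate even a few minutes of lag passes fresh=True and pays for one upstream round trip.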
When the lag bites
- KYB at customer onboarding. A company that was struck off this morning should not still read "active" because the aggregator's crawler runs on Tuesdays.
- Sanctions and PEP screening. A new director appointment from two days ago has to surface. A crawl older than that misses it.
- Adverse-event response. A counterparty just filed for liquidation. The analyst running the post-incident review cannot wait for the next refresh cycle.
- Time-sensitive M&A diligence. Filing currency, the latest charges, and the latest CS01 can all change from one day to the next, so the aggregator's crawl interval is the failure mode.
When it doesn't
- Aggregate research. "How many companies in jurisdiction X file under SIC code Y?" A snapshot from last quarter is fine.
- Bulk lead lists and B2B targeting. A few weeks of staleness rarely changes the result.
- Historical analysis. When you are looking at how a sector evolved over years, crawl lag is irrelevant.
For those workloads, OpenCorporates' breadth and bulk-export licence are a better fit than ours. We are not pretending otherwise.
How freshness composes with raw payload
A normalised aggregator that is also stale has two failure modes layered on top of each other. The data may be old, and the schema mapping may have lost a field. OpenRegistry returns the raw upstream payload on every call, so the agent sees the registry's own field names and values at the moment of query. One layer of opinion, not two.
The reasoning behind keeping the payload raw is in "Why OpenRegistry returns raw upstream data".
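To make the second failure mode concrete, here is a minimal sketch of how a fixed normalised schema can silently drop upstream fields. The schema mapping and the sample payload are hypothetical and are not OpenCorporates' actual schema or a real company record; the field names follow the style of a Companies House company profile.

```python
from typing import Any, Dict, Set

# Hypothetical normalised schema: upstream field name -> aggregator field name.
NORMALISED_SCHEMA: Dict[str, str] = {
    "company_name": "name",
    "company_number": "company_number",
    "company_status": "current_status",
}


def normalise(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw registry payload onto the fixed schema, dropping unmapped fields."""
    return {
        target: raw[source]
        for source, target in NORMALISED_SCHEMA.items()
        if source in raw
    }


def lost_fields(raw: Dict[str, Any]) -> Set[str]:
    """Upstream fields that do not survive normalisation."""
    return set(raw) - set(NORMALISED_SCHEMA)


# Illustrative payload only; values are made up.
raw_payload = {
    "company_name": "EXAMPLE LTD",
    "company_number": "00000000",
    "company_status": "active",
    "has_insolvency_history": True,
    "last_full_members_list_date": "2024-05-01",
}
```

Calling lost_fields(raw_payload) names the two fields the mapping never carried across. A client that receives the raw payload has nothing to lose at this step, which is the sense in which there is one layer of opinion rather than two.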
How to check for yourself
Pick a UK company that filed a CS01 today. Read its
last_full_members_list_date on the live Companies House
service, then on OpenCorporates, then through OpenRegistry's
get_company_profile. The first and third should agree to within
minutes. The second depends on when OC last crawled the source.
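The check can be scripted along these lines. This is a sketch under stated assumptions: the Companies House call uses the public data API with an API key sent as the HTTP Basic username, while the OpenCorporates and OpenRegistry readings are passed in by hand because their client calls are not specified here. The helper names and the dates in the usage note are illustrative.

```python
import base64
import json
import urllib.request
from typing import Dict, List, Optional

FIELD = "last_full_members_list_date"


def fetch_companies_house_profile(company_number: str, api_key: str) -> dict:
    """Fetch a company profile from the Companies House public data API."""
    url = f"https://api.company-information.service.gov.uk/company/{company_number}"
    # The API key is the Basic-auth username; the password is left empty.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def sources_behind_register(
    readings: Dict[str, Optional[str]],
    reference: str = "companies_house",
) -> List[str]:
    """Return the sources whose value differs from the reference register."""
    expected = readings[reference]
    return sorted(
        name
        for name, value in readings.items()
        if name != reference and value != expected
    )
```

For example, with made-up readings of "2024-05-01" from Companies House and OpenRegistry and "2023-05-01" from OpenCorporates, sources_behind_register returns ["opencorporates"]. An empty list means every source matched the register at the time you looked.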