Documents

fetch_document

Download the raw filed document: PDF, XHTML (iXBRL), or XML, returned inline as base64 when small enough.

The primary tool for reading a filing's content. Pass a document_id from list_filings or get_financials. Call it for any substantive answer, since filing metadata (dates, form codes, descriptions) alone is rarely enough. Small documents are inlined as bytes; oversized ones return a resource_link and are best read with the navigation tools (get_document_navigation, fetch_document_pages, search_document).

Parameters

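| Name | Type | Required | Description |
| --- | --- | --- | --- |
| jurisdiction | string | yes | ISO code (for example GB or FR). |
| document_id | string | yes | From list_filings or get_financials. |
| format | string | no | xhtml / xbrl / pdf. Defaults to the server's best fit. |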
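
A minimal sketch of assembling the arguments for a call. The helper name build_fetch_args, the client-side validation, and the placeholder document ID are illustrative assumptions; only the three parameter names and the three format values come from this page.

```python
from typing import Optional

# Format values listed for the format parameter on this page.
ALLOWED_FORMATS = {"xhtml", "xbrl", "pdf"}


def build_fetch_args(jurisdiction: str, document_id: str,
                     format: Optional[str] = None) -> dict:
    """Return an arguments dict for fetch_document (hypothetical helper)."""
    if not jurisdiction or not document_id:
        raise ValueError("jurisdiction and document_id are required")
    args = {"jurisdiction": jurisdiction, "document_id": document_id}
    # Leave format out entirely to get the server's best-fit default.
    if format is not None:
        if format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {format!r}")
        args["format"] = format
    return args


# "example-document-id" is a placeholder, not a real ID.
print(build_fetch_args("GB", "example-document-id", "xhtml"))
```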

Supported jurisdictions (12)

Related tools

Frequently asked questions

Where do document_ids come from?

list_filings (every filing record), get_financials (annual accounts), or get_charges (charge filings, GB only). Document IDs are jurisdiction-scoped: passing a GB id with jurisdiction='FR' returns 404.

What does the response look like?

One of two shapes. Small documents come back as inline bytes (base64-encoded under bytes_base64, with chosen_format and size_bytes). Documents too large for one tool response come back as a resource_link to fetch externally. The cutoff depends on the agent's context window, typically ~5–10 MB for Claude / GPT-4. Call get_document_metadata to learn the size beforehand.
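
A minimal sketch of branching on the two response shapes. It assumes the tool result has already been parsed into a Python dict and that the link appears under a resource_link key; the exact envelope is not specified on this page, and read_fetch_result is a hypothetical helper.

```python
import base64


def read_fetch_result(result: dict):
    """Return ("inline", bytes) or ("link", resource_link)."""
    if "bytes_base64" in result:
        # Small document: decode the inlined payload.
        return "inline", base64.b64decode(result["bytes_base64"])
    if "resource_link" in result:
        # Oversized document: the caller fetches or navigates it separately.
        return "link", result["resource_link"]
    raise ValueError("result has neither bytes_base64 nor resource_link")


# Made-up sample result for illustration.
sample = {
    "bytes_base64": base64.b64encode(b"%PDF-1.7").decode("ascii"),
    "chosen_format": "pdf",
    "size_bytes": 8,
}
kind, payload = read_fetch_result(sample)
print(kind, len(payload))
```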

What formats exist for one document?

Varies. GB annual accounts since 2014: iXBRL (xhtml+xml) AND PDF. Pre-2014 GB: PDF only. FI: iXBRL only. NL: XBRL only. KR DART: PDF + structured JSON (audit report). Call get_document_metadata first to see available formats and pick deliberately.
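
Picking deliberately can be as simple as walking a preference order over whatever the metadata reports. This sketch assumes get_document_metadata exposes the available formats as a list of strings that use the same names as the format parameter; that shape, and the pick_format helper, are assumptions.

```python
def pick_format(available, preference=("xhtml", "xbrl", "pdf")):
    """Return the first preferred format that is available, else None."""
    for fmt in preference:
        if fmt in available:
            return fmt
    return None


# Financials: prefer machine-readable formats when both exist.
print(pick_format(["pdf", "xhtml"]))
# Text-only filings: put pdf first in the preference order.
print(pick_format(["pdf", "xhtml"], preference=("pdf", "xhtml")))
```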

Why iXBRL instead of PDF for financials?

iXBRL is machine-readable: every revenue / profit / asset figure is tagged with an XBRL element. Parsing it gives you typed numbers, not OCR. For text-only filings (resolutions, board changes) PDF is fine — XBRL adds no value.
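
A minimal sketch of what "typed numbers" looks like in practice, using only the Python standard library. It reads ix:nonFraction elements from the Inline XBRL namespace and applies their scale and sign attributes. The sample markup and the uk-core names are made up for illustration, and real filings use ix format transformations that this sketch reduces to stripping thousands separators.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Made-up iXBRL fragment, not taken from a real filing.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <p>Turnover: <ix:nonFraction name="uk-core:Turnover" contextRef="FY2023"
        unitRef="GBP" scale="3" decimals="-3">1,250</ix:nonFraction></p>
    <p>Loss: (<ix:nonFraction name="uk-core:ProfitLoss" contextRef="FY2023"
        unitRef="GBP" sign="-" decimals="0">4,200</ix:nonFraction>)</p>
  </body>
</html>"""


def extract_facts(xhtml: str) -> dict:
    """Map (name, contextRef) to a Decimal value for each numeric fact."""
    root = ET.fromstring(xhtml)
    facts = {}
    for el in root.iter(f"{{{IX_NS}}}nonFraction"):
        # Simplification: only comma thousands separators are handled.
        text = "".join(el.itertext()).replace(",", "").strip()
        value = Decimal(text) * (Decimal(10) ** int(el.get("scale", "0")))
        if el.get("sign") == "-":
            value = -value
        facts[(el.get("name"), el.get("contextRef"))] = value
    return facts


for key, value in extract_facts(SAMPLE).items():
    print(key, value)
```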

How do I handle documents > 5 MB?

Three options. (1) get_document_navigation to find the outline + recommended page ranges. (2) fetch_document_pages for a specific page range. (3) search_document to locate a phrase, then fetch only those pages. The 'fetch the whole 200-page annual report' anti-pattern wastes context window and is rarely needed.
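
For option (3), search hits usually cluster, so it helps to merge them into a few page ranges before calling fetch_document_pages. The sketch below assumes search_document yields 1-based page numbers; that, the padding of one page on each side, and the pages_to_ranges helper are illustrative choices rather than documented behavior.

```python
def pages_to_ranges(hit_pages, pad=1, last_page=None):
    """Merge hit pages (plus padding) into sorted, non-overlapping ranges."""
    ranges = []
    for page in sorted(set(hit_pages)):
        start = max(1, page - pad)
        end = page + pad
        if last_page is not None:
            end = min(last_page, end)
        # Extend the previous range when this one touches or overlaps it.
        if ranges and start <= ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


# Four hits collapse into two fetches instead of four.
print(pages_to_ranges([12, 3, 4, 10]))
```

Each resulting tuple maps to one page-range fetch, which keeps both context usage and call count down.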

Why is the format parameter ignored?

It is treated as a preference, not a hard requirement. The server returns the requested format when the filing has it and falls back to an available format when it does not. Requesting format='pdf' on an XBRL-only filing returns the XBRL with a warning rather than an error. To check upfront, call get_document_metadata, which lists the available formats.
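
Because a fallback does not raise an error, a caller that needs one specific format has to check the result itself. A minimal sketch, assuming the result is a dict carrying the chosen_format field described earlier; how the warning is surfaced is not specified here, so this compares formats only.

```python
def format_was_honored(requested, result: dict) -> bool:
    """True if no format was requested or the server returned that format."""
    if requested is None:
        return True
    return result.get("chosen_format") == requested


# Made-up result for a pdf request against an XBRL-only filing.
print(format_was_honored("pdf", {"chosen_format": "xbrl"}))
```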

Are documents cached?

Yes. A given document_id is served from the edge cache (TTL 30 days for closed filings, 1 day for currently open accounting periods). Most filings are immutable once accepted by the registry, so the cache is safe. Passing fresh=true to the upstream tool that returned the document_id forces a cache bypass.

What if the registry returns 404?

It means the filing exists in the index but the document file was withdrawn or never uploaded. Pre-electronic (paper-only) filings frequently fail this way. The has_document field on the list_filings record is your upfront indicator: if it is false, fetch_document will return 404.
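
A minimal sketch of filtering on has_document before fetching, so no calls are spent on guaranteed 404s. It assumes list_filings records arrive as dicts carrying document_id and has_document; the sample records and IDs are made up.

```python
def fetchable(filings):
    """Keep only filing records whose document can actually be fetched."""
    return [f for f in filings if f.get("has_document")]


# Made-up records for illustration.
records = [
    {"document_id": "doc-a", "has_document": True},
    {"document_id": "doc-b", "has_document": False},  # e.g. paper-only filing
    {"document_id": "doc-c"},  # flag missing: treated as not fetchable
]
print([f["document_id"] for f in fetchable(records)])
```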

Can I get OCR / extracted text from a scanned PDF?

Not server-side — we return the bytes as-is. To OCR scanned filings, run them through your own OCR pipeline (Tesseract, AWS Textract, etc.). For native PDFs (text layer present), use fetch_document_pages with format='xhtml' to get rendered text without OCR.

Is access metered?

Yes. Every call counts against the per-tool rate-limit budget. A large document costs one call regardless of size, while each page-range fetch via fetch_document_pages counts separately. The Enterprise tier removes per-minute caps, but a page-by-page sweep of a 100-page document still runs as 100 serial calls.
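
The arithmetic behind that trade-off, as a small sketch. The pages_per_call values are hypothetical; this page does not state how many pages one fetch_document_pages call may cover.

```python
import math


def sweep_calls(total_pages: int, pages_per_call: int) -> int:
    """Number of page-range calls needed to cover a whole document."""
    if total_pages < 0 or pages_per_call < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_call >= 1")
    return math.ceil(total_pages / pages_per_call)


# Page-by-page sweep versus wider (hypothetical) ranges for 100 pages.
print(sweep_calls(100, 1), sweep_calls(100, 25))
```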