fetch_document
Download the raw filed document - PDF / XHTML iXBRL / XML / base64 inline.
The primary tool for reading a filing's content. Pass a document_id from list_filings / get_financials. Mandatory for any substantive answer - filing metadata (dates, form codes, descriptions) alone is rarely enough. Small documents are inlined as bytes; oversized ones return a resource_link plus navigation tools.
| Name | Type | Required | Description |
|---|---|---|---|
| jurisdiction | string | yes | ISO code. |
| document_id | string | yes | From list_filings or get_financials. |
| format | string | no | xhtml / xbrl / pdf - defaults to server's best-fit. |
ES - Spain
FI - Finland
GB - United Kingdom
IS - Iceland
JP - Japan
KR - South Korea
MC - Monaco
MX - Mexico
NZ - New Zealand
PL - Poland
RU - Russia
SE - Sweden
Document IDs come from list_filings (every filing record), get_financials (annual accounts), or get_charges (charge filings on GB). They are jurisdiction-scoped: passing a GB id with jurisdiction='FR' returns 404.
Either inline bytes (base64-encoded under bytes_base64, with chosen_format and size_bytes) for small documents, OR a resource_link to fetch externally if the doc is too large for one tool response. The cutoff depends on the agent's context window — typically ~5–10 MB for Claude / GPT-4. get_document_metadata tells you the size beforehand.
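A minimal sketch of handling both response shapes, assuming the tool result has already been parsed into a Python dict. The field names bytes_base64, chosen_format, size_bytes and resource_link come from the description above; treating resource_link as a plain string, and size_bytes as the decoded length, are assumptions.

```python
import base64
from typing import Optional, Tuple


def extract_document(response: dict) -> Tuple[str, Optional[bytes], Optional[str]]:
    """Return (kind, data, link) for a fetch_document response.

    Assumed shape: small documents carry 'bytes_base64' (plus
    'chosen_format' and 'size_bytes'); oversized ones carry a
    'resource_link', treated here as a plain string.
    """
    if "bytes_base64" in response:
        data = base64.b64decode(response["bytes_base64"])
        # Assumption: size_bytes describes the decoded payload.
        expected = response.get("size_bytes")
        if expected is not None and expected != len(data):
            raise ValueError(f"size mismatch: expected {expected}, got {len(data)}")
        return "inline", data, None
    if "resource_link" in response:
        return "link", None, response["resource_link"]
    raise ValueError("response has neither bytes_base64 nor resource_link")
```

The caller can then branch on the first element: write the bytes to disk for "inline", or hand the link to whatever fetches external resources for "link".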
Varies. GB annual accounts since 2014: iXBRL (xhtml+xml) AND PDF. Pre-2014 GB: PDF only. FI: iXBRL only. NL: XBRL only. KR DART: PDF + structured JSON (audit report). Call get_document_metadata first to see available formats and pick deliberately.
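Picking a format deliberately can be as simple as a preference order applied to whatever the metadata reports. The sketch below assumes get_document_metadata yields a list of format strings matching the values of the format parameter (xhtml / xbrl / pdf); that list shape is an assumption, and pick_format is a hypothetical helper.

```python
def pick_format(available, preferred=("xhtml", "xbrl", "pdf")):
    """Return the first preferred format the filing offers, or None."""
    offered = {fmt.lower() for fmt in available}
    for fmt in preferred:
        if fmt in offered:
            return fmt
    return None


# A filing offering both iXBRL and PDF resolves to the structured format.
choice = pick_format(["pdf", "xhtml"])
```

Putting the machine-readable formats first means a PDF is only requested when nothing structured exists.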
iXBRL is machine-readable: every revenue / profit / asset figure is tagged with an XBRL element. Parsing it gives you typed numbers, not OCR. For text-only filings (resolutions, board changes) PDF is fine — XBRL adds no value.
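To illustrate what "typed numbers" means in practice, here is a rough sketch that pulls numeric facts out of an iXBRL document with the standard library. It relies on the Inline XBRL 1.1 namespace and the ix:nonFraction element with its name, contextRef, unitRef, scale and sign attributes. It assumes well-formed XHTML and comma thousands separators; real filings declare a format attribute that can change how the text should be parsed, which this sketch ignores. The sample document and the ex:Revenue concept name are made up.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"  # Inline XBRL 1.1 namespace


def extract_facts(xhtml: str):
    """Collect numeric facts tagged with ix:nonFraction."""
    root = ET.fromstring(xhtml)
    facts = []
    for el in root.iter(f"{{{IX_NS}}}nonFraction"):
        # Assumption: commas are thousands separators (format attribute ignored).
        text = "".join(el.itertext()).strip().replace(",", "")
        if not text:
            continue
        # scale="3" means the displayed figure is in thousands.
        value = Decimal(text) * (Decimal(10) ** int(el.get("scale", "0")))
        if el.get("sign") == "-":
            value = -value
        facts.append({
            "name": el.get("name"),
            "context": el.get("contextRef"),
            "unit": el.get("unitRef"),
            "value": value,
        })
    return facts


# Made-up sample; "ex:Revenue" is a placeholder concept name.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <p>Revenue: <ix:nonFraction name="ex:Revenue" contextRef="FY2023"
        unitRef="GBP" decimals="0" scale="3">1,250</ix:nonFraction></p>
  </body>
</html>"""

facts = extract_facts(SAMPLE)
```

For production use, a dedicated XBRL processor is the safer choice, since it also resolves contexts, units and transformation formats.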
Three options. (1) get_document_navigation to find the outline + recommended page ranges. (2) fetch_document_pages for a specific page range. (3) search_document to locate a phrase, then fetch only those pages. The 'fetch the whole 200-page annual report' anti-pattern wastes context window and is rarely needed.
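For option (3), the search hits usually need to be turned into a small number of page ranges before fetching. The helper below is a sketch of that step only; it assumes search_document yields 1-based page numbers, and the padding of one page on each side is an arbitrary choice, not something the server prescribes.

```python
def pages_to_ranges(hit_pages, padding=1, last_page=None):
    """Merge hit pages into inclusive (start, end) ranges, padded on each side."""
    ranges = []
    for page in sorted(set(hit_pages)):
        start = max(1, page - padding)
        end = page + padding
        if last_page is not None:
            end = min(end, last_page)
        # Extend the previous range when this one overlaps or touches it.
        if ranges and start <= ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


# Hits on pages 1, 12, 13 and 47 of a 200-page report.
ranges = pages_to_ranges([12, 13, 47, 1], padding=1, last_page=200)
# ranges == [(1, 2), (11, 14), (46, 48)]
```

Each resulting range would then map to one fetch_document_pages call, which keeps both context usage and call count low.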
The server returns the requested format when it is available and falls back to another format when it is not. Requesting format='pdf' on an XBRL-only filing returns the XBRL with a warning rather than an error. To check upfront, call get_document_metadata, which lists the available formats.
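Because a fallback does not raise an error, client code may want to compare what it asked for with what it got. A small sketch, assuming the response dict carries the chosen_format field described earlier; the name and location of the warning itself are not specified here, so the check relies on chosen_format alone. detect_fallback is a hypothetical helper.

```python
def detect_fallback(requested, response):
    """Return (chosen_format, fell_back) for a fetch_document response."""
    chosen = response.get("chosen_format")
    # With no explicit request, or no reported format, nothing can be compared.
    if requested is None or chosen is None:
        return chosen, False
    return chosen, chosen.lower() != requested.lower()


# A PDF request against an XBRL-only filing.
chosen, fell_back = detect_fallback("pdf", {"chosen_format": "xbrl"})
```

When fell_back is true, the caller should route the payload to a parser for the chosen format instead of the one it originally expected.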
Yes — same document_id served from edge cache (TTL 30 days for closed filings, 1 day for currently-open accounting periods). Most filings are immutable once accepted by the registry, so the cache is safe. fresh=true on the upstream tool that returned the document_id will force-bypass.
A 404 means the filing exists in the index but the document file was withdrawn or never uploaded. Pre-electronic (paper-only) filings frequently fail this way. The has_document field on the list_filings record is your upfront indicator: if it is false, fetch_document will return 404.
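Filtering on has_document before fetching avoids spending calls on documents that cannot be served. The sketch assumes each list_filings record is a dict with has_document and document_id keys; the exact record shape is an assumption, as are the sample IDs.

```python
def fetchable_ids(filings):
    """Return document_ids of filings whose document file is available."""
    return [
        f["document_id"]
        for f in filings
        if f.get("has_document") and f.get("document_id")
    ]


# Made-up records: only the first has a retrievable document.
records = [
    {"document_id": "doc-001", "has_document": True},
    {"document_id": "doc-002", "has_document": False},
    {"has_document": True},
]
ids = fetchable_ids(records)
```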
OCR is not performed server-side; we return the bytes as-is. To OCR scanned filings, run them through your own OCR pipeline (Tesseract, AWS Textract, etc.). For native PDFs (text layer present), use fetch_document_pages with format='xhtml' to get rendered text without OCR.
Yes, each call counts against the per-tool rate-limit budget. A large document uses one call regardless of its size. Page-range fetches via fetch_document_pages each count separately. The Enterprise tier removes per-minute caps, but a page-by-page sweep of 100 pages still runs serially.