SEC EDGAR Scraper — Filings, Full-Text & XBRL Financials

Extract SEC EDGAR filing metadata, full-text 10-K/10-Q sections, EDGAR full-text search results, and XBRL financial facts as clean, RAG-ready JSON. No API key required.

TypeScript Cheerio United States

Key Features

Four modes in one actor — filing metadata, full document text, EDGAR full-text search, and XBRL financial facts

RAG-ready chunking — section, paragraph (~2000 chars), or none; every chunk tagged with its source Item and order

Automatic 'Item N' section parsing for 10-K/10-Q (Item 1A Risk Factors, Item 7 MD&A, and more)

XBRL facts with taxonomy, tag, label, unit, value, fiscal year/period, form, and accession number

Fact deduplication collapses XBRL restatements to one row per period, keeping the earliest disclosure

Company resolution by ticker, CIK, or name against SEC's official ticker file, with fuzzy fallback

EDGAR full-text search with quoted phrases, form-type, and date-range filters (2001 → present)

SEC-compliant rate limiting and User-Agent — no API key and no login required
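
The fact deduplication listed above can be pictured with a minimal sketch. It assumes a hypothetical `XbrlFact` row shape and a grouping key of tag, unit, fiscal period, and period end; the actor's real field names and key may differ.

```typescript
// Hypothetical shape of one XBRL fact row. Field names are assumptions
// loosely based on the attributes listed in the features, not the actor's schema.
interface XbrlFact {
  tag: string;          // XBRL tag, e.g. "Revenues"
  unit: string;         // e.g. "USD"
  fiscalPeriod: string; // e.g. "FY" or "Q1"
  periodEnd: string;    // ISO date the fact applies to
  value: number;
  filedAt: string;      // ISO date of the filing that disclosed the fact
  accessionNo: string;
}

// Keep one row per (tag, unit, fiscalPeriod, periodEnd), preferring the
// earliest filing, so later restatements of the same period are dropped.
function dedupeFacts(facts: XbrlFact[]): XbrlFact[] {
  const earliest = new Map<string, XbrlFact>();
  for (const fact of facts) {
    const key = [fact.tag, fact.unit, fact.fiscalPeriod, fact.periodEnd].join("|");
    const seen = earliest.get(key);
    // ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    if (seen === undefined || fact.filedAt < seen.filedAt) {
      earliest.set(key, fact);
    }
  }
  return Array.from(earliest.values());
}
```

Keeping the earliest disclosure gives the number as originally reported; a consumer that wants the latest restated value would flip the comparison.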

Use Cases

  • Financial RAG / LLM pipelines needing section-chunked, embedding-ready 10-K and 10-Q text
  • Investment research — pull revenue, net income, and diluted EPS time series as clean XBRL rows
  • Compliance & ownership monitoring — Form 4, SC 13D/13G, and risk-factor language across companies
  • Fintech products that need structured EDGAR data without building a custom crawler
  • Market and industry research via full-text search — who discloses a phrase, and since when
  • BI and spreadsheet workflows consuming clean XBRL financial fact rows

Input Parameters

Parameter | Type | Required | Description
mode | string | Yes | Extraction mode: filings, fulltext, search, or facts (default: filings).
ticker | string | No | Stock ticker, e.g. AAPL. One of ticker/cik/companyName is required at runtime.
cik | string | No | SEC Central Index Key, e.g. 320193; takes precedence over ticker and companyName.
companyName | string | No | SEC registrant name, exact or fuzzy-matched against the official ticker file.
query | string | No | EDGAR full-text search expression, e.g. "supply chain disruption"; required for search mode.
formTypes | array | No | Form types to include, e.g. 10-K, 10-Q, 8-K, S-1, DEF 14A, Form 4. Empty = all forms.
chunking | string | No | Fulltext RAG strategy: section, paragraph (~2000 chars), or none (default: section).
maxItems | number | No | Max items saved; each saved item is one billed event (default: 100).
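
As a rough illustration of how these parameters fit together, the sketch below types the input and checks the two runtime requirements stated above. The `ActorInput` and `validateInput` names are hypothetical, and two rules are assumptions rather than documented behavior: that search mode needs only `query` (no company identifier), and that `maxItems` must be at least 1.

```typescript
type Mode = "filings" | "fulltext" | "search" | "facts";

// Mirrors the parameter list above; optional fields are the ones marked "No".
interface ActorInput {
  mode: Mode;
  ticker?: string;
  cik?: string;
  companyName?: string;
  query?: string;
  formTypes?: string[];
  chunking?: "section" | "paragraph" | "none";
  maxItems?: number;
}

// Returns a list of problems; an empty list means the input looks usable.
function validateInput(input: ActorInput): string[] {
  const problems: string[] = [];
  const hasCompany = Boolean(input.ticker || input.cik || input.companyName);
  if (input.mode === "search") {
    // Assumption: search mode is driven by the query alone.
    if (!input.query) problems.push("search mode requires `query`");
  } else if (!hasCompany) {
    problems.push("one of ticker, cik, or companyName is required");
  }
  // Assumption: a cap below 1 is treated as invalid.
  if (input.maxItems !== undefined && input.maxItems < 1) {
    problems.push("maxItems must be at least 1");
  }
  return problems;
}

// Example: section-chunked text of up to five Apple 10-K filings.
const exampleInput: ActorInput = {
  mode: "fulltext",
  ticker: "AAPL",
  formTypes: ["10-K"],
  chunking: "section",
  maxItems: 5,
};
```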

Output Example

{
  "itemType": "filing-fulltext",
  "title": "Apple Inc. — 10-K 2025-10-31",
  "company": "Apple Inc.",
  "ticker": "AAPL",
  "cik": "0000320193",
  "formType": "10-K",
  "filedAt": "2025-10-31",
  "accessionNo": "0000320193-25-000123",
  "documentUrl": "https://www.sec.gov/Archives/edgar/data/320193/...",
  "sections": [
    { "name": "Item 1A — Risk Factors", "charCount": 38241 },
    { "name": "Item 7 — Management's Discussion and Analysis", "charCount": 21077 }
  ],
  "chunks": [
    { "text": "The Company's business, reputation, results of operations...", "section": "Item 1A — Risk Factors", "order": 12 }
  ],
  "textLength": 220151
}
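
For RAG pipelines, an item like the one above would typically be flattened into one record per chunk before embedding. The sketch below types the example's fields and does that flattening. `EmbeddingRecord` and `toEmbeddingRecords` are hypothetical names, and the record id assumes that `order` is unique within a filing, which the example suggests but does not guarantee.

```typescript
interface FilingChunk {
  text: string;
  section: string;
  order: number;
}

// Field names copied from the output example above.
interface FilingFulltextItem {
  itemType: string;
  title: string;
  company: string;
  ticker: string;
  cik: string;
  formType: string;
  filedAt: string;
  accessionNo: string;
  documentUrl: string;
  sections: { name: string; charCount: number }[];
  chunks: FilingChunk[];
  textLength: number;
}

// Hypothetical record shape for a vector store.
interface EmbeddingRecord {
  id: string;
  text: string;
  metadata: {
    ticker: string;
    formType: string;
    filedAt: string;
    section: string;
    documentUrl: string;
  };
}

// One record per chunk, carrying enough filing metadata to cite the source.
function toEmbeddingRecords(item: FilingFulltextItem): EmbeddingRecord[] {
  return item.chunks.map((chunk) => ({
    // Assumption: accession number plus chunk order identifies a chunk.
    id: `${item.accessionNo}#${chunk.order}`,
    text: chunk.text,
    metadata: {
      ticker: item.ticker,
      formType: item.formType,
      filedAt: item.filedAt,
      section: chunk.section,
      documentUrl: item.documentUrl,
    },
  }));
}
```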

Pricing

Pay-per-event — you’re billed only for items actually saved:

Event | Price | What it covers
Filing metadata / search hit | $0.001 | One filing metadata record or full-text search result
Full-text filing | $0.005 | One filing extracted, section-parsed, and chunked for RAG
XBRL fact | $0.0002 | One XBRL financial fact row

A 1,000-filing metadata pull is ~$1.00; 1,000 XBRL facts is ~$0.20. maxItems caps both volume and cost.
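
The arithmetic behind these estimates can be sketched as a small helper. The prices come from the pricing list above; the event keys (`metadataOrSearchHit`, `fulltextFiling`, `xbrlFact`) are hypothetical names chosen for illustration, not the actor's billing identifiers.

```typescript
// Prices expressed in millionths of a dollar so the sums stay in integers.
const PRICE_MICRO_USD = {
  metadataOrSearchHit: 1000, // $0.001
  fulltextFiling: 5000,      // $0.005
  xbrlFact: 200,             // $0.0002
} as const;

type EventKind = keyof typeof PRICE_MICRO_USD;

// Estimate the bill in US dollars for a planned run, given item counts per event.
function estimateCostUsd(counts: Partial<Record<EventKind, number>>): number {
  let totalMicro = 0;
  for (const kind of Object.keys(PRICE_MICRO_USD) as EventKind[]) {
    totalMicro += (counts[kind] ?? 0) * PRICE_MICRO_USD[kind];
  }
  return totalMicro / 1e6;
}

// 1,000 metadata records plus 1,000 XBRL facts.
const plannedCost = estimateCostUsd({ metadataOrSearchHit: 1000, xbrlFact: 1000 });
console.log(plannedCost.toFixed(2)); // "1.20"
```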

Tips

  • Start with facts mode for financials. Pulling XBRL rows (revenue, net income, EPS) is far cheaper and cleaner than parsing full filings when you only need the numbers.
  • Use chunking: section for RAG. It keeps each Item (Risk Factors, MD&A) intact so retrieval returns coherent, citable passages.
  • Full-text search covers 2001 onward. For older disclosures, resolve the company by CIK and pull filing metadata directly.
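
When resolving by CIK, note that the input example uses the short form (320193) while the output example shows the 10-digit zero-padded form ("0000320193"). A minimal sketch for normalizing between the two follows; `normalizeCik` is a hypothetical helper for your own pipeline, not part of the actor.

```typescript
// Convert a CIK given as a number, a digit string, or a "CIK"-prefixed string
// into the 10-digit zero-padded form shown in the output example.
function normalizeCik(cik: string | number): string {
  const digits = String(cik).trim().replace(/^CIK/i, "");
  if (!/^\d{1,10}$/.test(digits)) {
    throw new Error(`Not a valid CIK: ${cik}`);
  }
  // Left-pad with zeros to exactly 10 characters.
  return ("0000000000" + digits).slice(-10);
}

console.log(normalizeCik("320193")); // "0000320193"
```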

Frequently Asked Questions

Do I need an SEC API key?
No. SEC EDGAR is free public data. The actor identifies itself with a compliant User-Agent and automatically throttles itself to stay under SEC's rate limit, so no key and no login are needed.

How far back does the data go?
EDGAR full-text search (EFTS) covers filings from 2001-05-04 onward and is capped at 10,000 hits per query. Filing metadata goes back to 1994, and XBRL financial facts cover everything the company has reported in XBRL.

Is the output ready for LLMs and RAG?
Yes. In fulltext mode the actor parses 10-K/10-Q into 'Item N' sections and emits embedding-ready chunks (section or ~2000-char paragraph), each tagged with its source Item and order index.

Why are some filings skipped in fulltext mode?
Filings whose primary document isn't HTML or TXT (for example, XBRL-only Form 4 XML) can't be section-parsed, so they're skipped and not charged.
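
To give a feel for what 'Item N' section parsing involves, here is a deliberately naive sketch that splits plain filing text at lines beginning with "Item N". It is not the actor's parser: `splitIntoItems` is a hypothetical function, and real filings need extra handling (for example, the table of contents repeats the same Item headings).

```typescript
interface ItemSection {
  item: string; // e.g. "Item 1A"
  text: string; // heading line plus body, up to the next Item heading
}

// Split text at lines that start with "Item <number><optional letter>".
function splitIntoItems(text: string): ItemSection[] {
  const heading = /^[ \t]*Item[ \t]+(\d+[A-Z]?)\b/gim;
  const starts: { item: string; index: number }[] = [];
  let match: RegExpExecArray | null;
  while ((match = heading.exec(text)) !== null) {
    starts.push({ item: (match[1] ?? "").toUpperCase(), index: match.index });
  }
  return starts.map((start, i) => {
    const next = starts[i + 1];
    const end = next ? next.index : text.length;
    return { item: `Item ${start.item}`, text: text.slice(start.index, end).trim() };
  });
}
```

Text before the first heading is discarded in this sketch; a production parser would likely keep it as a preamble section.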

Ready to Extract Data?

Start using SEC EDGAR Scraper — Filings, Full-Text & XBRL Financials on Apify, or hire me for a custom solution.