YouTube Subtitle & Transcript Scraper — JSON, SRT, VTT, LLM

Extract YouTube subtitles and transcripts from videos, Shorts, playlists, and channels as JSON, SRT, VTT, plain text, or clean LLM-ready text. 100+ languages, rich metadata, no API key — and failed extractions are free.

TypeScript InnerTube Global

Key Features

One input handles videos, Shorts, youtu.be links, playlists, and channels — mixed in a single run

Five output formats — JSON (timestamped), SRT, VTT, plain text, and LLM-ready (strips [Music], [Applause], and speaker labels)

100+ languages with a priority-ordered language list and toggleable auto-caption fallback

Rich metadata — title, channel, description, publish date, view count, thumbnail, duration, and available languages

Batch entire playlists and channels with a maxVideos cap and 1–10 videos processed in parallel (maxConcurrency)

Residential proxy support plus optional cookies to reduce bot-check blocks

Multi-layer extraction — up to nine fallbacks across InnerTube clients, with a yt-dlp PO-token last resort

Circuit breaker and per-item error handling keep large batches running

Use Cases

  • AI/ML teams building RAG or fine-tuning datasets from spoken video (LLM-ready text output)
  • Content teams repurposing transcripts into blog posts, show notes, and social captions
  • SEO marketers extracting searchable video text for indexing and keyword research
  • Editors and publishers needing standard SRT/VTT subtitle files
  • Researchers batch-collecting transcripts across an entire channel or playlist
  • Developers needing structured, timestamped captions without a YouTube API key

Input Parameters

Parameter | Type | Required | Description
urls | array | No | YouTube URLs or bare IDs: videos, Shorts, youtu.be links, playlists, or channels. Required at runtime.
outputFormat | string | No | Transcript format: json, srt, vtt, text, or llm (default: json).
languages | array | No | Preferred subtitle languages in priority order, ISO 639-1 codes (default: en).
includeAutoGenerated | boolean | No | Fall back to auto-generated captions when manual ones are missing (default: true).
maxVideos | number | No | Cap on videos processed per run, e.g. for playlists/channels (default: 0 = unlimited).
maxConcurrency | number | No | Videos processed in parallel, 1–10 (default: 3).
proxyConfiguration | object | No | Proxy settings; defaults to Apify Residential pinned to the US.
youtubeCookies | string | No | Optional YouTube cookies (Cookie header or cookies.txt) to reduce bot-check blocks.
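
To make the defaults in the parameter list concrete, here is a minimal TypeScript sketch that builds an input object and fills in the documented default values. The `ScraperInput` type and the `withDefaults` helper are hypothetical names introduced for illustration; they are not part of the actor or of any Apify SDK, and the actor applies its own defaults regardless.

```typescript
// Hypothetical type mirroring the documented parameters (not an official SDK type).
type OutputFormat = "json" | "srt" | "vtt" | "text" | "llm";

interface ScraperInput {
  urls: string[];
  outputFormat?: OutputFormat;
  languages?: string[];
  includeAutoGenerated?: boolean;
  maxVideos?: number;
  maxConcurrency?: number;
  proxyConfiguration?: Record<string, unknown>;
  youtubeCookies?: string;
}

// Fill in the documented defaults and reject values outside the documented ranges.
function withDefaults(input: ScraperInput): ScraperInput {
  if (!Array.isArray(input.urls) || input.urls.length === 0) {
    throw new Error("urls must contain at least one YouTube URL or video ID");
  }
  const maxConcurrency = input.maxConcurrency ?? 3;
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1 || maxConcurrency > 10) {
    throw new Error("maxConcurrency must be an integer from 1 to 10");
  }
  return {
    ...input,
    outputFormat: input.outputFormat ?? "json",
    languages: input.languages ?? ["en"],
    includeAutoGenerated: input.includeAutoGenerated ?? true,
    maxVideos: input.maxVideos ?? 0, // 0 means unlimited
    maxConcurrency,
  };
}

// Example: a single video URL plus a bare ID, requesting LLM-ready text.
const exampleInput = withDefaults({
  urls: ["https://www.youtube.com/watch?v=dQw4w9WgXcQ", "dQw4w9WgXcQ"],
  outputFormat: "llm",
  languages: ["en", "de"],
  maxVideos: 50,
});
```

The example leaves proxyConfiguration unset so that the actor's own default (Apify Residential, US) applies.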

Output Example

{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video)",
  "channelName": "Rick Astley",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "publishDate": "2009-10-25",
  "viewCount": 1761003712,
  "availableLanguages": ["en", "de-DE", "ja", "pt-BR", "es-419"],
  "language": "en",
  "isAutoGenerated": false,
  "duration": 213,
  "wordCount": 487,
  "segmentCount": 61,
  "text": "We're no strangers to love, you know the rules and so do I...",
  "segments": [{ "text": "We're no strangers to love", "start": 18.64, "end": 21.88 }],
  "error": null
}
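
The timestamped segments in the JSON output carry everything needed to build a subtitle file yourself. The sketch below converts a segments array into SRT cues, assuming start and end are seconds as the sample suggests. `Segment`, `formatSrtTime`, and `segmentsToSrt` are hypothetical names for illustration; the actor's own srt output may differ in details such as line wrapping.

```typescript
// Shape of one entry in the "segments" array, as seen in the output example.
interface Segment {
  text: string;
  start: number; // seconds (assumed from the sample values)
  end: number;   // seconds (assumed from the sample values)
}

// Left-pad a non-negative integer with zeros to at least the given width.
function zeroPad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)},${zeroPad(ms, 3)}`;
}

// Build numbered SRT cues separated by blank lines.
function segmentsToSrt(segments: Segment[]): string {
  const cues = segments.map(
    (seg, i) =>
      `${i + 1}\n${formatSrtTime(seg.start)} --> ${formatSrtTime(seg.end)}\n${seg.text}`
  );
  return cues.join("\n\n") + "\n";
}

const srt = segmentsToSrt([
  { text: "We're no strangers to love", start: 18.64, end: 21.88 },
]);
// srt is:
// 1
// 00:00:18,640 --> 00:00:21,880
// We're no strangers to love
```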

Pricing

Pay-per-event: billed once per successfully extracted transcript — failed videos are never charged. Feed a whole channel or playlist and pay only for the captions you actually get back.
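
Since only successful extractions are billed, a run's results can be split into billable and failed items by checking the error field. This is a minimal sketch that assumes error is null on success, as in the output example, and non-null on failure. `ResultItem` and `partitionResults` are hypothetical names, and the per-event price is not stated here, so no cost is computed.

```typescript
// Minimal shape needed for the split; real items carry many more fields.
interface ResultItem {
  videoId: string;
  error?: string | null; // assumed: null (or absent) on success, a message on failure
}

// Separate successfully extracted transcripts from failed ones.
function partitionResults<T extends ResultItem>(items: T[]): { billable: T[]; failed: T[] } {
  const billable: T[] = [];
  const failed: T[] = [];
  for (const item of items) {
    if (item.error == null) {
      billable.push(item);
    } else {
      failed.push(item);
    }
  }
  return { billable, failed };
}

// Illustrative data only; the second ID and its error text are made up.
const { billable, failed } = partitionResults([
  { videoId: "dQw4w9WgXcQ", error: null },
  { videoId: "exampleFailedId", error: "No captions available" },
]);
```

The failed list is a natural input for a retry run, for example with cookies supplied or a different language list.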

Tips

  • Use outputFormat: llm for RAG and fine-tuning — it removes non-speech annotations so your embeddings see clean prose.
  • Keep a residential proxy on. YouTube aggressively blocks datacenter IPs; the actor defaults to US residential for a reason.
  • Start maxConcurrency at 1–3 for big jobs and raise it gradually — high concurrency increases rate-limit risk on large channel scrapes.
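
To illustrate what the llm format's cleanup amounts to, here is a rough sketch that strips bracketed annotations and line-leading speaker labels from raw caption text. It is a heuristic written for this example, not the actor's actual implementation: `cleanForLlm` is a hypothetical name, and the speaker-label pattern (an all-caps name followed by a colon at the start of a line) is an assumption about how labels commonly appear.

```typescript
// Heuristic cleanup of caption text for embedding or fine-tuning use.
function cleanForLlm(text: string): string {
  return text
    // Drop bracketed non-speech annotations such as [Music] or [Applause].
    .replace(/\[[^\]]*\]/g, " ")
    // Drop all-caps speaker labels like "HOST:" at the start of the text or of a line.
    .replace(/(^|\n)\s*[A-Z][A-Z .'-]{1,30}:\s*/g, "$1")
    // Collapse runs of whitespace (including newlines) into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

const cleaned = cleanForLlm(
  "[Music] HOST: Welcome back. [Applause]\nGUEST: Thanks for having me."
);
// cleaned === "Welcome back. Thanks for having me."
```

A pattern this simple will also remove legitimate line-leading text such as "NOTE:", so it is best treated as a starting point when post-processing json or text output yourself.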

Frequently Asked Questions

Do I need a YouTube API key or to log in?
No. The actor extracts captions directly, with no YouTube Data API key and no login. An optional cookies input can be supplied to reduce "Sign in to confirm you're not a bot" blocks on some videos.
Which languages and caption types are supported?
Over 100 languages. Provide a priority-ordered list of ISO 639-1 codes (default: en); the actor prefers manually-created captions and falls back to auto-generated ones unless you disable includeAutoGenerated.
Am I billed for videos that fail to extract?
No. Billing is pay-per-event with a single event type, transcript-extracted, which is charged only after a transcript is successfully saved. Failed videos appear in the output with an error field and aren't charged.
Can I get clean, LLM-ready text for RAG?
Yes. Set outputFormat to 'llm' to strip [Music]/[Applause] annotations and speaker labels, producing clean prose ideal for embeddings and fine-tuning.
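
The language preference described above (priority-ordered codes, manual captions first, auto-generated as a fallback) can be sketched as follows. This is one plausible reading, not the actor's confirmed logic: it assumes all manual tracks are tried across the whole priority list before any auto-generated track, and that a code like de also matches regional variants such as de-DE. `CaptionTrack` and `pickTrack` are hypothetical names.

```typescript
// Hypothetical description of one available caption track.
interface CaptionTrack {
  languageCode: string; // e.g. "en", "de-DE"
  isAutoGenerated: boolean;
}

// True when the track's code equals the requested code or is a regional variant of it.
function matchesLanguage(track: CaptionTrack, lang: string): boolean {
  return track.languageCode === lang || track.languageCode.indexOf(lang + "-") === 0;
}

// Return the first track of the wanted kind, following the language priority order.
function firstMatch(tracks: CaptionTrack[], languages: string[], auto: boolean): CaptionTrack | null {
  for (const lang of languages) {
    for (const track of tracks) {
      if (track.isAutoGenerated === auto && matchesLanguage(track, lang)) return track;
    }
  }
  return null;
}

// Prefer manual captions; fall back to auto-generated ones only if allowed.
function pickTrack(
  tracks: CaptionTrack[],
  languages: string[] = ["en"],
  includeAutoGenerated: boolean = true
): CaptionTrack | null {
  const manual = firstMatch(tracks, languages, false);
  if (manual) return manual;
  return includeAutoGenerated ? firstMatch(tracks, languages, true) : null;
}

const availableTracks: CaptionTrack[] = [
  { languageCode: "en", isAutoGenerated: true },
  { languageCode: "de-DE", isAutoGenerated: false },
];
// Under the assumptions above, the manual German track wins over auto-generated English.
const chosen = pickTrack(availableTracks, ["en", "de"]);
```

If a strict per-language order matters more than caption quality, the two loops could be merged so that each language is checked for manual and then auto-generated tracks before moving to the next.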

Ready to Extract Data?

Start using YouTube Subtitle & Transcript Scraper — JSON, SRT, VTT, LLM on Apify, or hire me for a custom solution.