
YouTube Subtitle & Transcript Scraper — JSON, SRT, VTT, LLM

Extract YouTube subtitles and transcripts from videos, Shorts, playlists, and channels as JSON, SRT, VTT, plain text, or clean LLM-ready text. 100+ languages, rich metadata, no API key — and failed extractions are free.

TypeScript InnerTube Global


Key Features

One input handles videos, Shorts, youtu.be links, playlists, and channels — mixed in a single run

Five output formats — JSON (timestamped), SRT, VTT, plain text, and LLM-ready (strips [Music], [Applause], and speaker labels)

100+ languages with a priority-ordered language list and toggleable auto-caption fallback

Rich metadata — title, channel, description, publish date, view count, thumbnail, duration, and available languages

Batch entire playlists and channels with a maxVideos cap and 1–10 concurrency

Residential proxy support plus optional cookies to reduce bot-check blocks

Multi-layer extraction — up to nine fallbacks across InnerTube clients, with a yt-dlp PO-token last resort

Circuit breaker and per-item error handling keep large batches running

Use Cases

AI/ML teams building RAG or fine-tuning datasets from spoken video (LLM-ready text output)
Content teams repurposing transcripts into blog posts, show notes, and social captions
SEO marketers extracting searchable video text for indexing and keyword research
Editors and publishers needing standard SRT/VTT subtitle files
Researchers batch-collecting transcripts across an entire channel or playlist
Developers needing structured, timestamped captions without a YouTube API key

Input Parameters

Parameter	Type	Required	Description
`urls`	array	No	YouTube URLs or bare IDs: videos, Shorts, youtu.be links, playlists, or channels. Not marked as required in the input schema, but a run needs at least one entry.
`outputFormat`	string	No	Transcript format: json, srt, vtt, text, or llm (default: json).
`languages`	array	No	Preferred subtitle languages in priority order, ISO 639-1 codes (default: en).
`includeAutoGenerated`	boolean	No	Fall back to auto-generated captions when manual ones are missing (default: true).
`maxVideos`	number	No	Cap on videos processed per run, e.g. for playlists/channels (default: 0 = unlimited).
`maxConcurrency`	number	No	Videos processed in parallel, 1–10 (default: 3).
`proxyConfiguration`	object	No	Proxy settings; defaults to Apify Residential pinned to the US.
`youtubeCookies`	string	No	Optional YouTube cookies (Cookie header or cookies.txt) to reduce bot-check blocks.
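The table above can be sketched as a typed input object. This is a minimal sketch, not the actor's own code: the `ScraperInput` interface and the `normalizeInput` helper are hypothetical names, and the only rules encoded are the defaults and the 1–10 concurrency range stated in the table.

```typescript
// Hypothetical types mirroring the Input Parameters table.
type OutputFormat = "json" | "srt" | "vtt" | "text" | "llm";

interface ScraperInput {
  urls: string[];
  outputFormat?: OutputFormat;
  languages?: string[];
  includeAutoGenerated?: boolean;
  maxVideos?: number;
  maxConcurrency?: number;
  proxyConfiguration?: Record<string, unknown>;
  youtubeCookies?: string;
}

// Hypothetical helper: applies the documented defaults and checks the
// documented limits before the input is handed to a run.
function normalizeInput(input: ScraperInput): ScraperInput {
  if (input.urls.length === 0) {
    throw new Error("urls must contain at least one YouTube URL or ID");
  }
  const maxConcurrency = input.maxConcurrency ?? 3;
  if (maxConcurrency < 1 || maxConcurrency > 10) {
    throw new Error("maxConcurrency must be between 1 and 10");
  }
  return {
    ...input,
    outputFormat: input.outputFormat ?? "json",
    languages: input.languages ?? ["en"],
    includeAutoGenerated: input.includeAutoGenerated ?? true,
    maxVideos: input.maxVideos ?? 0, // 0 = unlimited, per the table
    maxConcurrency,
  };
}

// Example: one full URL and one bare video ID, LLM-ready output,
// English preferred with German as the second choice.
const exampleInput = normalizeInput({
  urls: ["https://www.youtube.com/watch?v=dQw4w9WgXcQ", "dQw4w9WgXcQ"],
  outputFormat: "llm",
  languages: ["en", "de"],
});
```

Omitted fields fall back to the defaults from the table, so `exampleInput` ends up with `maxConcurrency` 3 and `includeAutoGenerated` true.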

Output Example

{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video)",
  "channelName": "Rick Astley",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "publishDate": "2009-10-25",
  "viewCount": 1761003712,
  "availableLanguages": ["en", "de-DE", "ja", "pt-BR", "es-419"],
  "language": "en",
  "isAutoGenerated": false,
  "duration": 213,
  "wordCount": 487,
  "segmentCount": 61,
  "text": "We're no strangers to love, you know the rules and so do I...",
  "segments": [{ "text": "We're no strangers to love", "start": 18.64, "end": 21.88 }],
  "error": null
}
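If you take the JSON output and need subtitle cues later, the `segments` array can be turned into SRT on the client side. The sketch below assumes that `start` and `end` are offsets in seconds, which the example values suggest but the listing does not state. The `Segment` interface and both function names are hypothetical, and this is not the actor's own SRT writer.

```typescript
// Hypothetical type for one entry of the "segments" array.
interface Segment {
  text: string;
  start: number; // assumed: seconds from the start of the video
  end: number;   // assumed: seconds from the start of the video
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Builds numbered SRT cues separated by blank lines.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text}`
    )
    .join("\n\n");
}

// The single segment from the output example above.
const srt = segmentsToSrt([
  { text: "We're no strangers to love", start: 18.64, end: 21.88 },
]);
```

For the example segment this yields one cue running from `00:00:18,640` to `00:00:21,880`. If you only need SRT, setting `outputFormat` to `srt` avoids the conversion entirely.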

Pricing

Pay-per-event: billed once per successfully extracted transcript — failed videos are never charged. Feed a whole channel or playlist and pay only for the captions you actually get back.

Tips

Use outputFormat: llm for RAG and fine-tuning — it removes non-speech annotations so your embeddings see clean prose.
Keep a residential proxy on. YouTube aggressively blocks datacenter IPs; the actor defaults to US residential for a reason.
Start maxConcurrency at 1–3 for big jobs and raise it gradually — high concurrency increases rate-limit risk on large channel scrapes.

Frequently Asked Questions

Do I need a YouTube API key or to log in?

No. The actor extracts captions directly, with no YouTube Data API key and no login. The optional youtubeCookies input can be supplied to reduce "Sign in to confirm you're not a bot" blocks on some videos.

Which languages and caption types are supported?

Over 100 languages. Provide a priority-ordered list of ISO 639-1 codes (default: en); the actor prefers manually created captions and falls back to auto-generated ones unless you disable includeAutoGenerated.

Am I billed for videos that fail to extract?

No. Billing is pay-per-event on a single transcript-extracted event, charged only after a transcript is successfully saved. Failed videos appear in the output with an error field and aren't charged.
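Because failed videos stay in the output with an `error` field, a run can be split into billable and failed items after the fact. The following is a rough sketch under two assumptions: successful items carry `"error": null` as in the output example, and failed items carry a string message plus a `videoId` (the listing does not show a failed item). `ResultItem` and `summarizeRun` are hypothetical names.

```typescript
// Hypothetical minimal view of one dataset item.
interface ResultItem {
  videoId: string;
  error: string | null; // assumed: null on success, a message on failure
}

interface RunSummary {
  billableEvents: number;   // successfully extracted transcripts
  failedVideoIds: string[]; // candidates for a retry run
}

// Counts successes (one billed event each, per the pricing note) and
// collects the IDs of failed videos.
function summarizeRun(items: ResultItem[]): RunSummary {
  const failed = items.filter((item) => item.error !== null);
  return {
    billableEvents: items.length - failed.length,
    failedVideoIds: failed.map((item) => item.videoId),
  };
}

// Example data: the error text and the second video ID are made up.
const summary = summarizeRun([
  { videoId: "dQw4w9WgXcQ", error: null },
  { videoId: "xxxxxxxxxxx", error: "No captions available" },
]);
```

The `failedVideoIds` list can be fed back into `urls` for a retry, for example with cookies supplied or a lower `maxConcurrency`.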

Can I get clean, LLM-ready text for RAG?

Yes. Set outputFormat to 'llm' to strip [Music]/[Applause] annotations and speaker labels, producing clean prose ideal for embeddings and fine-tuning.
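To illustrate the kind of cleanup described here, the sketch below approximates it with regular expressions. It is not the actor's implementation. The patterns are assumptions about what counts as an annotation or speaker label: any bracketed tag, `>>` speaker-change markers, and all-caps `NAME:` prefixes. The function name `stripNonSpeech` is hypothetical.

```typescript
// Heuristic cleanup of caption text. The all-caps label pattern can also
// remove legitimate text such as "NOTE: ", so treat it as a starting point.
function stripNonSpeech(text: string): string {
  return text
    .replace(/\[[^\]]*\]/g, " ")                      // [Music], [Applause], ...
    .replace(/(^|\s)>>\s*/g, "$1")                    // ">>" speaker-change markers
    .replace(/(^|\s)[A-Z][A-Z .'-]{1,30}:\s/g, "$1")  // all-caps "NAME: " labels
    .replace(/\s+/g, " ")                             // collapse leftover whitespace
    .trim();
}

const cleaned = stripNonSpeech(
  "[Music] HOST: Welcome back. [Applause] >> Thanks for having me."
);
// cleaned === "Welcome back. Thanks for having me."
```

A helper like this is mainly useful when you already hold `json` or `text` output and want a similar result without rerunning the actor.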
