yttranscript
Fetch any YouTube video transcript to a .txt file — no API key, no auth, no async runtime
v1.0.0
Quick Start
Install via jcli (recommended)
jcli install yttranscript
Or via the installer
curl -fsSL https://cli.johlem.net/tools/yttranscript/install.sh | bash
One-shot smoke test
yttranscript https://www.youtube.com/watch?v=9C7SS019CY4
# → writes 9C7SS019CY4.txt with one segment per line
yttranscript -t -l en -o talk.txt 9C7SS019CY4
# → timestamped lines [HH:MM:SS], explicit output path, bare ID accepted
yttranscript --list-langs https://youtu.be/9C7SS019CY4
# → CODE / TYPE / LANGUAGE table; no file written
What it does
yttranscript pulls the transcript of any captioned YouTube video by scraping the public
watch page, locating the caption-track base URL inside
ytInitialPlayerResponse, then downloading the timedtext payload in YouTube's
JSON3 format. No API key, no OAuth, no async runtime — a single Rust binary built on
blocking reqwest. Output is a UTF-8 .txt file with a small header
and one segment per line, optionally timestamped.
- Five input formats. Standard watch?v=, youtu.be/ short links, /shorts/, /embed/, and bare 11-character IDs all resolve to the same video.
- Language fallback. A manual transcript matching --lang wins; otherwise an auto-generated match; otherwise the first manual track (with a stderr [WARN]); otherwise the first auto-generated track.
- JSON3 over XML. Appending &fmt=json3 to the timedtext URL gives a much simpler payload than the legacy XML. The tool drops formatting-only events and collapses inner newlines to spaces.
- Deterministic output. The header is fixed (YouTube Transcript / Video ID / Language), the separator is U+2500 × 60, and the body is one segment per line.
- No gates. Read-only HTTPS GETs to two public endpoints; --i-am-authorized is not required.
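The first two bullets above (input normalization and language fallback) can be sketched in plain Rust. This is an illustration only, not the tool's actual source: the `Track` struct, the `MARKERS` list, and the function names are hypothetical, and the sketch assumes video IDs use only the characters `A-Z`, `a-z`, `0-9`, `_`, and `-`. The wording of the warning message is also made up; the document only says a `[WARN]` goes to stderr.

```rust
/// Hypothetical caption-track record; the field names are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Track {
    lang: String,
    auto_generated: bool,
}

/// Convenience constructor used by the demo below.
fn track(lang: &str, auto_generated: bool) -> Track {
    Track { lang: lang.to_string(), auto_generated }
}

/// URL fragments that directly precede the video ID in the supported URL shapes.
const MARKERS: [&str; 5] = ["?v=", "&v=", "youtu.be/", "/shorts/", "/embed/"];

/// True when `s` looks like a bare ID: 11 characters from the assumed ID alphabet.
fn is_video_id(s: &str) -> bool {
    s.len() == 11 && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Resolves a URL or bare ID to the 11-character video ID.
fn extract_video_id(input: &str) -> Option<String> {
    let input = input.trim();
    if is_video_id(input) {
        return Some(input.to_string());
    }
    for marker in MARKERS.iter() {
        if let Some(pos) = input.find(*marker) {
            // Take the 11 characters that follow the marker and validate them.
            let candidate: String = input[pos + marker.len()..].chars().take(11).collect();
            if is_video_id(&candidate) {
                return Some(candidate);
            }
        }
    }
    None
}

/// Applies the four-step fallback. The bool is true only for step 3, the one
/// case where the document says a warning is printed.
fn pick_track<'a>(tracks: &'a [Track], lang: &str) -> Option<(&'a Track, bool)> {
    // 1. Manual transcript in the requested language.
    if let Some(t) = tracks.iter().find(|t| !t.auto_generated && t.lang == lang) {
        return Some((t, false));
    }
    // 2. Auto-generated transcript in the requested language.
    if let Some(t) = tracks.iter().find(|t| t.auto_generated && t.lang == lang) {
        return Some((t, false));
    }
    // 3. First manual track in any language (caller should warn).
    if let Some(t) = tracks.iter().find(|t| !t.auto_generated) {
        return Some((t, true));
    }
    // 4. First auto-generated track in any language.
    tracks.iter().find(|t| t.auto_generated).map(|t| (t, false))
}

fn main() {
    let inputs = [
        "https://www.youtube.com/watch?v=9C7SS019CY4",
        "https://youtu.be/9C7SS019CY4",
        "9C7SS019CY4",
    ];
    for input in inputs.iter() {
        println!("{} -> {:?}", input, extract_video_id(input));
    }

    let tracks = vec![track("fr", true), track("de", false)];
    if let Some((t, warn)) = pick_track(&tracks, "en") {
        if warn {
            eprintln!("[WARN] requested language not available, using {}", t.lang);
        }
        println!("selected {} (auto_generated: {})", t.lang, t.auto_generated);
    }
}
```

Returning the warning flag alongside the track keeps the selection logic free of I/O, so the caller decides how to report the fallback.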
Options
| Flag | What it does |
|---|---|
| <URL> | YouTube URL or bare 11-character video ID. Required. |
| -o, --output <FILE> | Output file path. Default: <video_id>.txt in CWD. |
| -l, --lang <CODE> | Preferred language code (ISO 639-1). Default: en. |
| -t, --timestamped | Prefix each line with [HH:MM:SS]. |
| --list-langs | Print available transcript languages and exit. |
| -h, --help | Print help (with the suite footer). |
| -V, --version | Print version. |
Output file shape
YouTube Transcript
Video ID : 9C7SS019CY4
Language : en (manual)
────────────────────────────────────────────────────────────
Hello world this is the first segment.
This is another line of the transcript.
With --timestamped, each line is prefixed with [HH:MM:SS]
derived from the segment's start offset. The separator is U+2500
(─) repeated 60 times.
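The file shape described above can be reproduced with a short rendering routine. The sketch below is an assumption-laden illustration rather than the tool's code: the `Segment` struct and both function names are hypothetical, the start offset is assumed to be in milliseconds with sub-second precision truncated, and a single space is assumed between the timestamp and the text.

```rust
/// Hypothetical transcript segment; the start offset is assumed to be in milliseconds.
struct Segment {
    start_ms: u64,
    text: String,
}

/// Formats a start offset as [HH:MM:SS], truncating fractions of a second.
fn format_timestamp(start_ms: u64) -> String {
    let total_secs = start_ms / 1000;
    format!(
        "[{:02}:{:02}:{:02}]",
        total_secs / 3600,
        (total_secs % 3600) / 60,
        total_secs % 60
    )
}

/// Renders the header, the 60-character U+2500 separator, and one line per segment.
fn render(video_id: &str, lang: &str, kind: &str, segments: &[Segment], timestamped: bool) -> String {
    let mut out = String::new();
    out.push_str("YouTube Transcript\n");
    out.push_str(&format!("Video ID : {}\n", video_id));
    out.push_str(&format!("Language : {} ({})\n", lang, kind));
    out.push_str(&"\u{2500}".repeat(60));
    out.push('\n');
    for seg in segments {
        if timestamped {
            out.push_str(&format_timestamp(seg.start_ms));
            out.push(' '); // assumed separator between timestamp and text
        }
        // Collapse inner newlines so each segment stays on one line.
        out.push_str(&seg.text.replace('\n', " "));
        out.push('\n');
    }
    out
}

fn main() {
    let segments = vec![
        Segment { start_ms: 0, text: "Hello world this is the first segment.".to_string() },
        Segment { start_ms: 4_200, text: "This is another line\nof the transcript.".to_string() },
    ];
    print!("{}", render("9C7SS019CY4", "en", "manual", &segments, true));
}
```

Because the header and separator are fixed strings, two runs against the same track should produce byte-identical files, which makes the output easy to diff.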
Exit codes
| Code | Meaning |
|---|---|
| 0 | Ok |
| 1 | Runtime failure: network, parse, no transcripts available, or write failure |
| 2 | Usage error (clap) |
Touches / Produces / Gates
- Touches (network, read-only): https://www.youtube.com/watch?v=<id> (HTML) and the caption track's baseUrl on the timedtext endpoint (JSON3).
- Touches (filesystem, write): the file at --output (default <video_id>.txt in CWD). Overwrites without prompting.
- Produces: the .txt file plus an [OK] Transcript saved → … (N bytes, N segments) line on stdout, followed by the suite footer.
- Gates: none. --i-am-authorized is not required; both endpoints are public and read-only.
Bot-detection caveat
YouTube occasionally serves a bot-detection challenge instead of the normal watch HTML —
more common from datacenter IPs (CI runners, cloud shells) than residential connections.
When that happens you'll see Failed to parse YouTube player data. on a video
you know has captions. v1.0 does not bypass the challenge: there is no cookie injection,
no proxy support. Try again from a different network. Cookie / proxy support is on the
v1.1 roadmap.
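To make the failure mode concrete: a challenge page carries no ytInitialPlayerResponse object, so any extraction step comes back empty. The sketch below shows one way to locate that object with brace matching and to build the JSON3 URL from a track's baseUrl. It is a simplified illustration, not the tool's parser (which is not shown in this document): the function names are hypothetical, and it assumes the first `{` after the marker opens the player-response object.

```rust
/// Returns the JSON object text that follows the `ytInitialPlayerResponse`
/// marker, or None when the marker is absent (as on a challenge page).
fn extract_player_response(html: &str) -> Option<&str> {
    const MARKER: &str = "ytInitialPlayerResponse";
    let after_marker = html.find(MARKER)? + MARKER.len();
    // Assumption: the first '{' after the marker opens the object.
    let start = after_marker + html[after_marker..].find('{')?;

    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, b) in html.as_bytes()[start..].iter().enumerate() {
        if in_string {
            // Inside a JSON string, braces do not count; track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if *b == b'\\' {
                escaped = true;
            } else if *b == b'"' {
                in_string = false;
            }
            continue;
        }
        match *b {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&html[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced braces: treat as a parse failure
}

/// Appends the JSON3 format parameter to a caption track's base URL.
fn json3_url(base_url: &str) -> String {
    format!("{}&fmt=json3", base_url)
}

fn main() {
    let challenge = "<html><body>Please verify you are human</body></html>";
    match extract_player_response(challenge) {
        Some(json) => println!("player data: {} bytes", json.len()),
        None => eprintln!("Failed to parse YouTube player data."),
    }
}
```

Returning `None` instead of panicking lets the caller map the missing object to exit code 1 with the error message quoted above.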
Build from source
cd tools/yttranscript/rust
cargo build --release
cargo test # 17 unit tests
Toolchain pin: Rust 1.85.0. Dependencies are all in the suite-permitted allowlist
(clap, reqwest [blocking + rustls-tls],
serde, serde_json, anyhow,
regex).