yttranscript

Fetch any YouTube video transcript to a .txt file — no API key, no auth, no async runtime

v1.0.0

Linux

Quick Start

Install via jcli (recommended)

jcli install yttranscript

Or via the installer

curl -fsSL https://cli.johlem.net/tools/yttranscript/install.sh | bash

One-shot smoke test

yttranscript https://www.youtube.com/watch?v=9C7SS019CY4
# → writes 9C7SS019CY4.txt with one segment per line

yttranscript -t -l en -o talk.txt 9C7SS019CY4
# → timestamped lines [HH:MM:SS], explicit output path, bare ID accepted

yttranscript --list-langs https://youtu.be/9C7SS019CY4
# → CODE / TYPE / LANGUAGE table; no file written

What it does

yttranscript pulls the transcript of any YouTube video that has captions. It scrapes the public watch page, locates the caption track's base URL inside ytInitialPlayerResponse, then downloads the timedtext payload in YouTube's JSON3 format. No API key, no OAuth, no async runtime: a single Rust binary built on blocking reqwest. Output is a UTF-8 .txt file with a small header and one segment per line, optionally timestamped.
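The scrape step described above can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual code (the dependency list includes regex, so the real lookup may differ). The function names extract_player_response and json3_url are hypothetical, and the sketch assumes the first "{" after the ytInitialPlayerResponse marker opens the JSON object.

```rust
/// Returns the JSON object that follows the `ytInitialPlayerResponse`
/// marker in the watch-page HTML, or None if the marker or a balanced
/// object cannot be found (for example on a bot-detection page).
/// Assumption: the first `{` after the marker opens the object.
fn extract_player_response(html: &str) -> Option<&str> {
    const MARKER: &str = "ytInitialPlayerResponse";
    let after_marker = &html[html.find(MARKER)? + MARKER.len()..];
    let body = &after_marker[after_marker.find('{')?..];

    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, b) in body.bytes().enumerate() {
        if in_string {
            // Braces inside JSON strings must not affect the depth count.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&body[..=i]);
                }
            }
            _ => {}
        }
    }
    None
}

/// Requests the JSON3 payload by appending `&fmt=json3` to a caption
/// track's base URL.
fn json3_url(base_url: &str) -> String {
    format!("{}&fmt=json3", base_url)
}
```

A None result here corresponds to the situation in which the tool reports a player-data parse failure; the returned slice would still need to be parsed as JSON to reach the caption tracks.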

Five input formats. Standard watch?v=, youtu.be/ short links, /shorts/, /embed/, and bare 11-character IDs all resolve to the same video.
Language fallback. A manual transcript matching --lang wins; otherwise an auto-generated track in that language; otherwise the first manual track (with a [WARN] on stderr); otherwise the first auto-generated track.
JSON3 over XML. Appending &fmt=json3 to the timedtext URL gives a much simpler payload than the legacy XML. The tool drops formatting-only events and collapses newlines inside a segment to spaces.
Deterministic output. Header is fixed (YouTube Transcript / Video ID / Language), separator is U+2500 × 60, body is one segment per line.
No gates. Read-only HTTPS GETs to two public endpoints — --i-am-authorized is not required.
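The language fallback order listed above can be written as a short chain of lookups. The following is a sketch under assumptions: Track, TrackKind, first, and pick_track are hypothetical names, and a track is reduced to a language code plus a manual/auto flag. The boolean in the result only records whether the requested language was matched; when and how a warning is printed is left to the caller.

```rust
/// Whether a caption track was written by hand or auto-generated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrackKind {
    Manual,
    Auto,
}

/// Hypothetical, simplified view of one caption track.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Track {
    lang: String,
    kind: TrackKind,
}

/// First track of the given kind; if `lang` is Some, it must also match.
fn first<'a>(tracks: &'a [Track], kind: TrackKind, lang: Option<&str>) -> Option<&'a Track> {
    tracks
        .iter()
        .find(|t| t.kind == kind && lang.map_or(true, |l| t.lang == l))
}

/// Applies the four-step fallback. The bool is true when the chosen
/// track is in the requested language, false when it is a fallback.
fn pick_track<'a>(tracks: &'a [Track], wanted: &str) -> Option<(&'a Track, bool)> {
    first(tracks, TrackKind::Manual, Some(wanted))
        .or_else(|| first(tracks, TrackKind::Auto, Some(wanted)))
        .map(|t| (t, true))
        .or_else(|| first(tracks, TrackKind::Manual, None).map(|t| (t, false)))
        .or_else(|| first(tracks, TrackKind::Auto, None).map(|t| (t, false)))
}
```

With a manual "fr" track and an auto-generated "en" track, requesting "en" selects the auto-generated track, while requesting "de" falls back to the manual "fr" track.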

Options

Flag	What it does
`<URL>`	YouTube URL or bare 11-character video ID. Required.
`-o, --output <FILE>`	Output file path. Default: `<video_id>.txt` in CWD.
`-l, --lang <CODE>`	Preferred language code (ISO 639-1). Default: `en`.
`-t, --timestamped`	Prefix each line with `[HH:MM:SS]`.
`--list-langs`	Print available transcript languages and exit.
`-h, --help`	Print help (with the suite footer).
`-V, --version`	Print version.
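To illustrate how the <URL> argument can accept both URLs and bare IDs, here is a minimal std-only sketch of ID extraction. It is an assumption-laden illustration rather than the tool's parser: extract_video_id and is_id_byte are hypothetical names, the marker list is a guess at the accepted URL shapes, and the ID alphabet (ASCII letters, digits, "-" and "_") is assumed.

```rust
/// Assumed ID alphabet: ASCII letters, digits, '-' and '_'.
fn is_id_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'-' || b == b'_'
}

/// Extracts an 11-character video ID from a watch, youtu.be, /shorts/,
/// or /embed/ URL, or accepts a bare ID. Returns None if nothing matches.
fn extract_video_id(input: &str) -> Option<&str> {
    let input = input.trim();
    if input.len() == 11 && input.bytes().all(is_id_byte) {
        return Some(input); // bare ID
    }
    // Substrings that directly precede the ID in each URL shape.
    const MARKERS: [&str; 5] = ["?v=", "&v=", "youtu.be/", "/shorts/", "/embed/"];
    for marker in MARKERS {
        if let Some(pos) = input.find(marker) {
            let rest = &input[pos + marker.len()..];
            if let Some(candidate) = rest.get(..11) {
                // The ID must end at a delimiter or at the end of the input.
                let ends_cleanly = rest.as_bytes().get(11).map_or(true, |b| !is_id_byte(*b));
                if candidate.bytes().all(is_id_byte) && ends_cleanly {
                    return Some(candidate);
                }
            }
        }
    }
    None
}
```

All five input shapes from the Quick Start examples resolve to 9C7SS019CY4 under this sketch, and an unrelated URL yields None.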

Output file shape

YouTube Transcript
Video ID : 9C7SS019CY4
Language : en (manual)
────────────────────────────────────────────────────────────

Hello world this is the first segment.
This is another line of the transcript.

With --timestamped, each line is prefixed with [HH:MM:SS] derived from the segment's start offset. The separator is U+2500 (─) repeated 60 times.
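The timestamp prefix and separator described above are simple to reproduce. The sketch below assumes the start offset is given in milliseconds and that a single space separates the prefix from the segment text; neither detail is stated here, and the function names are hypothetical.

```rust
/// Formats a segment start offset as `[HH:MM:SS]`.
/// Assumption: the offset is expressed in milliseconds.
fn format_timestamp(start_ms: u64) -> String {
    let total_secs = start_ms / 1000;
    format!(
        "[{:02}:{:02}:{:02}]",
        total_secs / 3600,
        (total_secs / 60) % 60,
        total_secs % 60
    )
}

/// The header separator: U+2500 repeated 60 times.
fn separator() -> String {
    "\u{2500}".repeat(60)
}

/// Renders one body line, with or without the timestamp prefix.
/// Assumption: one space between the prefix and the text.
fn render_line(start_ms: u64, text: &str, timestamped: bool) -> String {
    if timestamped {
        format!("{} {}", format_timestamp(start_ms), text)
    } else {
        text.to_string()
    }
}
```

For example, an offset of 3,725,000 ms (1 hour, 2 minutes, 5 seconds) formats as [01:02:05].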

Exit codes

Code	Meaning
`0`	Ok
`1`	Runtime failure — network, parse, no transcripts available, write failure
`2`	Usage error (clap)
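A calling program can branch on these codes. The following is a hedged sketch of a wrapper that runs the binary and classifies the result; Outcome, classify, and fetch are hypothetical names, and the wrapper assumes yttranscript is on PATH.

```rust
use std::process::Command;

/// Caller-side interpretation of the documented exit codes.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Ok,
    RuntimeFailure, // network, parse, no transcripts, write failure
    UsageError,     // bad arguments
    Other(Option<i32>), // undocumented code, or no code (e.g. killed by a signal)
}

/// Maps a raw exit code onto the table above.
fn classify(code: Option<i32>) -> Outcome {
    match code {
        Some(0) => Outcome::Ok,
        Some(1) => Outcome::RuntimeFailure,
        Some(2) => Outcome::UsageError,
        other => Outcome::Other(other),
    }
}

/// Runs `yttranscript <video>` and classifies how it exited.
/// Returns Err if the process could not be started at all.
fn fetch(video: &str) -> std::io::Result<Outcome> {
    let status = Command::new("yttranscript").arg(video).status()?;
    Ok(classify(status.code()))
}
```

A script built on this might retry on RuntimeFailure, since that covers transient network errors, but treat UsageError as a bug in the caller.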

Touches / Produces / Gates

Touches (network, read-only): https://www.youtube.com/watch?v=<id> (HTML) and the caption track's baseUrl on the timedtext endpoint (JSON3).
Touches (filesystem, write): the file at --output (default <video_id>.txt in CWD). Overwrites without prompting.
Produces: the .txt file plus an [OK] Transcript saved → … (N bytes, N segments) line on stdout, followed by the suite footer.
Gates: none. --i-am-authorized is not required — both endpoints are public and read-only.

Bot-detection caveat

YouTube occasionally serves a bot-detection challenge instead of the normal watch HTML — more common from datacenter IPs (CI runners, cloud shells) than residential connections. When that happens you'll see Failed to parse YouTube player data. on a video you know has captions. v1.0 does not bypass the challenge: there is no cookie injection, no proxy support. Try again from a different network. Cookie / proxy support is on the v1.1 roadmap.

Build from source

cd tools/yttranscript/rust
cargo build --release
cargo test   # 17 unit tests

Toolchain pin: Rust 1.85.0. Dependencies are all in the suite-permitted allowlist (clap, reqwest [blocking + rustls-tls], serde, serde_json, anyhow, regex).