mediagrab
Generic web media downloader: finds video and audio on any standards-compliant page and saves it in an open-source container.
v1.0.0
Quick Start
Install via jcli (recommended)
jcli install mediagrab
Or via the installer
curl -fsSL https://cli.johlem.net/tools/mediagrab/install.sh | bash
One-shot smoke test
mediagrab probe https://archive.org/download/BigBuckBunny_124/Content/big_buck_bunny_720p_surround.mp4
# → detects direct .mp4, shows file size, no download
mediagrab video -o bbb.mkv https://archive.org/.../big_buck_bunny_720p_surround.mp4
# → downloads + transcodes mp4 → mkv via ffmpeg + progress bar on stderr
mediagrab audio -f mp3 -b 192k -o bbb.mp3 https://archive.org/.../big_buck_bunny_720p_surround.mp4
# → downloads + extracts audio + transcodes to MP3 at 192kbps
What it does
mediagrab detects video and audio on any standards-compliant web page or
supported video host (YouTube, Vimeo), picks the highest-quality stream by default, and
writes the result to an open-source container of your choice. It ships as a single Rust binary:
no async runtime, no Python, no yt-dlp dependency.
The architecture is an extractor trait with a registry of site-specific and generic-fallback implementations. Adding a new site is one new file; the dispatch loop walks the registry, first match wins.
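A minimal Rust sketch of the extractor-registry design, assuming a shape the README does not spell out. The names `Extractor`, `Extracted`, `Candidate`, `dispatch`, and the toy extractors are hypothetical, not mediagrab's actual API, and a real `extract` would do network I/O. One assumption to flag: to reconcile "first match wins" with direct-file URLs that serve HTML falling through to the next extractor (see Coverage), an extractor here may decline after matching, and dispatch then moves on.

```rust
// Hypothetical shapes: mediagrab's real trait and types are not shown in this README.

/// One downloadable stream an extractor found.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub url: String,
}

/// Result of one extractor attempt.
#[derive(Debug)]
pub enum Extracted {
    /// Recognised the source and found streams.
    Found(Vec<Candidate>),
    /// Looked, but this is not its kind of page: fall through to the next extractor.
    Decline,
}

pub trait Extractor {
    fn name(&self) -> &'static str;
    /// Cheap, URL-only check (no network).
    fn matches(&self, url: &str) -> bool;
    /// May do network I/O; an Err (e.g. a DRM rejection) stops dispatch.
    fn extract(&self, url: &str) -> Result<Extracted, String>;
}

/// Walk the registry in order; the first extractor that matches and does not decline wins.
pub fn dispatch(
    registry: &[Box<dyn Extractor>],
    url: &str,
) -> Result<(&'static str, Vec<Candidate>), String> {
    for ex in registry {
        if !ex.matches(url) {
            continue;
        }
        match ex.extract(url)? {
            Extracted::Found(found) => return Ok((ex.name(), found)),
            Extracted::Decline => continue,
        }
    }
    Err(format!("no extractor produced media candidates for {url}"))
}

/// Toy site extractor: matches any URL containing `host` ("" matches everything).
struct HostStub {
    name: &'static str,
    host: &'static str,
}

impl Extractor for HostStub {
    fn name(&self) -> &'static str {
        self.name
    }
    fn matches(&self, url: &str) -> bool {
        url.contains(self.host)
    }
    fn extract(&self, url: &str) -> Result<Extracted, String> {
        Ok(Extracted::Found(vec![Candidate { url: url.to_string() }]))
    }
}

/// Toy stand-in for a direct-file extractor whose HEAD check found HTML: always declines.
struct AlwaysDecline;

impl Extractor for AlwaysDecline {
    fn name(&self) -> &'static str {
        "direct"
    }
    fn matches(&self, _url: &str) -> bool {
        true
    }
    fn extract(&self, _url: &str) -> Result<Extracted, String> {
        Ok(Extracted::Decline)
    }
}

/// Site-specific extractors first, generic fallback last.
pub fn demo_registry() -> Vec<Box<dyn Extractor>> {
    vec![
        Box::new(AlwaysDecline),
        Box::new(HostStub { name: "youtube", host: "youtube.com" }),
        Box::new(HostStub { name: "generic", host: "" }),
    ]
}
```

With this registry, a YouTube URL falls past the declining direct-file stub and lands on the "youtube" stub, while any other URL reaches the generic fallback.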
Coverage
| Source | How it's detected | Notes |
|---|---|---|
| Direct file URLs (.mp4, .webm, .mkv, .m3u8, .mpd, .mp3, .opus, …) | URL extension + HEAD Content-Type gate | URLs that look like media files but serve HTML are rejected and fall through to the next extractor. |
| YouTube (youtube.com, youtu.be, /shorts/, /embed/) | ytInitialPlayerResponse.streamingData.formats + adaptiveFormats | Streams that require signatureCipher / nsig decoding are skipped with a clear error. Many older videos and Shorts work; some newer ones don't. |
| Vimeo | player.vimeo.com/video/<id>/config JSON | Progressive + HLS variants. |
| Any standards-compliant page | <video src>, <source src>, og:video, twitter:player:stream, <link rel="alternate"> with type application/x-mpegURL or application/dash+xml, JSON-LD VideoObject.contentUrl, inline .m3u8/.mpd URLs | The long tail: Wikipedia, news sites, blogs, and CMS embeds generally work. |
Out of scope: DRM and encrypted streams (HLS AES-128, Widevine) and YouTube signatureCipher streams. Each produces a clear error message; nothing fails silently.
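The first and last rows of the Coverage table can be sketched with the standard library alone: an extension check plus a Content-Type gate for direct URLs, and a scan for inline .m3u8/.mpd URLs in page text. The function names and the accepted MIME list are assumptions. In particular, the README does not say whether generic types such as application/octet-stream pass the gate; this sketch rejects them. It also does not decode JSON-escaped (https:\/\/) or HTML-entity-encoded URLs.

```rust
// Hypothetical helpers sketching the direct-file gate and the inline-manifest scan.

/// Extensions treated as direct media (the table's list, minus its elided tail).
const MEDIA_EXTS: &[&str] = &["mp4", "webm", "mkv", "m3u8", "mpd", "mp3", "opus"];

/// Lower-cased extension of the URL's last path segment, ignoring query and fragment.
pub fn url_extension(url: &str) -> Option<String> {
    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
    let last = path.rsplit('/').next().unwrap_or(path);
    let (_, ext) = last.rsplit_once('.')?;
    Some(ext.to_ascii_lowercase())
}

/// Step 1 of the gate: does the URL look like a media file?
pub fn looks_like_direct_media(url: &str) -> bool {
    url_extension(url).is_some_and(|e| MEDIA_EXTS.contains(&e.as_str()))
}

/// Step 2 of the gate: does the HEAD Content-Type confirm it?
/// text/html (or anything unlisted) is rejected so dispatch can fall through.
pub fn content_type_is_media(content_type: &str) -> bool {
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    mime.starts_with("video/")
        || mime.starts_with("audio/")
        || mime == "application/x-mpegurl"
        || mime == "application/vnd.apple.mpegurl"
        || mime == "application/dash+xml"
}

/// Absolute http(s) URLs in raw page text whose path ends in .m3u8 or .mpd, deduplicated.
pub fn inline_manifest_urls(html: &str) -> Vec<String> {
    let mut found: Vec<String> = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("http") {
        let tail = &rest[start..];
        // A URL runs until the first quote, angle bracket, or whitespace.
        let end = tail
            .find(|c: char| matches!(c, '"' | '\'' | '<' | '>') || c.is_whitespace())
            .unwrap_or(tail.len());
        let candidate = &tail[..end];
        let absolute = candidate.starts_with("http://") || candidate.starts_with("https://");
        let manifest = matches!(url_extension(candidate).as_deref(), Some("m3u8" | "mpd"));
        if absolute && manifest && !found.iter().any(|f| f == candidate) {
            found.push(candidate.to_string());
        }
        // `end` is at least 4 because "http" contains no delimiter, so this always advances.
        rest = &tail[end..];
    }
    found
}
```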
Subcommands
| Command | What it does |
|---|---|
| mediagrab probe <URL> | Detects media, prints a table of candidates ranked best-first, and exits. No download. |
| mediagrab video <URL> [-f mkv\|mp4\|webm\|ogg] [-q best\|720\|1080\|...] | Downloads the highest-quality video by default. Default container: mkv. |
| mediagrab audio <URL> [-f mp3\|opus\|ogg\|flac\|wav\|m4a] [-b 192k] | Extracts audio. Default: mp3 at 192 kbps. |
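A sketch of how "highest quality by default" and -q might map onto a ranking. The Stream fields, the tie-break on bitrate, and the rule that -q 720 means "the best stream at most 720 px tall, else the smallest available" are all assumptions; the README does not specify them.

```rust
// Hypothetical stream metadata; mediagrab's real candidate fields are not documented here.
#[derive(Debug, Clone, PartialEq)]
pub struct Stream {
    pub url: String,
    pub height: u32, // vertical resolution in pixels
    pub bitrate_kbps: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Quality {
    Best,
    AtMost(u32),
}

/// Parse the value of -q: "best" or a pixel height such as "720".
pub fn parse_quality(s: &str) -> Result<Quality, String> {
    if s == "best" {
        return Ok(Quality::Best);
    }
    s.parse::<u32>()
        .map(Quality::AtMost)
        .map_err(|_| format!("invalid -q value: {s}"))
}

/// Best-first order (as probe might list it): higher resolution, then higher bitrate.
pub fn rank(mut streams: Vec<Stream>) -> Vec<Stream> {
    streams.sort_by(|a, b| (b.height, b.bitrate_kbps).cmp(&(a.height, a.bitrate_kbps)));
    streams
}

/// Pick one stream for download.
pub fn select(streams: Vec<Stream>, q: Quality) -> Option<Stream> {
    let ranked = rank(streams);
    match q {
        Quality::Best => ranked.into_iter().next(),
        Quality::AtMost(cap) => {
            // Best stream at or below the cap; if every stream is taller,
            // fall back to the smallest one instead of failing.
            let smallest = ranked.last().cloned();
            ranked.into_iter().find(|s| s.height <= cap).or(smallest)
        }
    }
}
```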
Output templates
-o takes a literal path or a template. Tokens:
| Token | Expands to |
|---|---|
| {title} | Slug of the page or video title |
| {id} | Source ID (video ID, hash) |
| {site} | Host (e.g. youtube.com) |
| {ext} | Chosen container extension |
Default template: {title}.{ext}.
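A sketch of template expansion for the four tokens above. The slug rules (lowercase ASCII letters and digits, every other run collapsed to a single "-", "untitled" when nothing is left) and the error on unknown tokens are assumptions; note that these rules drop non-Latin titles entirely, which the real tool may handle differently. Expansion is single-pass, so a value that happens to contain "{id}" is not substituted a second time.

```rust
/// Values available to an -o template (hypothetical struct).
pub struct TemplateVars<'a> {
    pub title: &'a str,
    pub id: &'a str,
    pub site: &'a str,
    pub ext: &'a str,
}

/// Lowercase ASCII letters/digits; each run of other characters becomes one '-'.
pub fn slugify(s: &str) -> String {
    let mut out = String::new();
    let mut pending_dash = false;
    for c in s.chars() {
        if c.is_ascii_alphanumeric() {
            // Emit a separator only between words, never at the start.
            if pending_dash && !out.is_empty() {
                out.push('-');
            }
            pending_dash = false;
            out.push(c.to_ascii_lowercase());
        } else {
            pending_dash = true;
        }
    }
    if out.is_empty() {
        "untitled".to_string()
    } else {
        out
    }
}

/// Replace {title}, {id}, {site}, {ext} in one left-to-right pass.
pub fn expand(template: &str, v: &TemplateVars) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let close = after
            .find('}')
            .ok_or_else(|| format!("unclosed token in template {template:?}"))?;
        let value = match &after[..close] {
            "title" => slugify(v.title),
            "id" => v.id.to_string(),
            "site" => v.site.to_string(),
            "ext" => v.ext.to_string(),
            other => return Err(format!("unknown template token {{{other}}}")),
        };
        out.push_str(&value);
        rest = &after[close + 1..];
    }
    out.push_str(rest); // literal paths without tokens pass through unchanged
    Ok(out)
}
```

Under these rules, the default template turns a title like "Big Buck Bunny (720p)!" with ext mkv into big-buck-bunny-720p.mkv.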
Progress bars
Every download writes a colored indicatif progress bar to
stderr, so stdout stays clean for the [OK] line and for JSON piping.
Three bar styles:
- Bytes-known downloads: █████ 47.3 MB / 120.5 MB · 8.4 MB/s · ETA 9s
- HLS segment loops: 45/120 segments
- Transcoding / unknown total: spinner with bytes-so-far + rate
The bar is hidden automatically when stderr isn't a TTY (CI logs, pipes).
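mediagrab draws these bars with indicatif; the std-only sketch below only reproduces the arithmetic of the bytes-known line and the TTY check, and is not indicatif's API. Decimal megabytes, the 10-cell width, the ░ empty cell, and rounding the ETA up are assumptions. With those choices the sample line is self-consistent: (120.5 - 47.3) MB at 8.4 MB/s is about 8.7 s, which rounds up to ETA 9s. The TTY check uses std::io::IsTerminal (stable since Rust 1.70, within the 1.85 pin).

```rust
use std::io::IsTerminal;

/// Decimal megabytes, one decimal place (assumes 1 MB = 1_000_000 bytes).
pub fn mb(bytes: u64) -> String {
    format!("{:.1} MB", bytes as f64 / 1_000_000.0)
}

/// Seconds left at the current rate, rounded up; None while the rate is still 0.
pub fn eta_secs(done: u64, total: u64, rate_bps: u64) -> Option<u64> {
    if rate_bps == 0 {
        return None;
    }
    Some(total.saturating_sub(done).div_ceil(rate_bps))
}

/// Text of a bytes-known bar: `width` cells, sizes, rate, ETA.
pub fn bytes_line(done: u64, total: u64, rate_bps: u64, width: usize) -> String {
    let filled = if total == 0 {
        0
    } else {
        // u128 avoids overflow for very large downloads; never exceeds `width`.
        (done.min(total) as u128 * width as u128 / total as u128) as usize
    };
    let bar = format!("{}{}", "█".repeat(filled), "░".repeat(width - filled));
    let eta = eta_secs(done, total, rate_bps).map_or_else(|| "?".to_string(), |s| format!("{s}s"));
    format!("{bar} {} / {} · {}/s · ETA {eta}", mb(done), mb(total), mb(rate_bps))
}

/// Redraw in place on stderr, but only when stderr is an interactive terminal.
pub fn draw(line: &str) {
    if std::io::stderr().is_terminal() {
        eprint!("\r{line}");
    }
}
```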
Touches / Produces / Gates
- Touches (network, read-only): HTTPS GET to the input URL and any media URL(s) it points at; HEAD on direct-file URLs.
- Touches (subprocess): ffmpeg shell-out when transcoding to a different container or extracting audio.
- Touches (filesystem, write): the file at --output, written via an atomic .part → rename. Intermediate temp files are cleaned up.
- Produces: the media file, plus an [OK] Saved → … (N bytes) line on stdout, followed by the suite footer.
- Gates: none. --i-am-authorized is NOT required; all endpoints used are public.
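The filesystem bullet's .part → rename pattern, sketched with std only. `part_path` and `save_atomically` are hypothetical names, and calling sync_all before the rename is this sketch's choice rather than something the README states. Because the rename is the last step, an error mid-download never leaves a truncated file at the destination.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// `<dest>.part` in the same directory, so the final rename stays on one filesystem.
pub fn part_path(dest: &Path) -> PathBuf {
    let mut name = dest.as_os_str().to_owned();
    name.push(".part");
    PathBuf::from(name)
}

fn write_part<R: Read>(src: &mut R, part: &Path) -> io::Result<u64> {
    let mut file = File::create(part)?;
    let written = io::copy(src, &mut file)?;
    file.sync_all()?; // data on disk before the rename makes it visible
    Ok(written)
}

/// Stream `src` into `<dest>.part`, then rename it over `dest`.
/// On any error the partial file is removed and `dest` is left as it was.
pub fn save_atomically<R: Read>(mut src: R, dest: &Path) -> io::Result<u64> {
    let part = part_path(dest);
    let result = write_part(&mut src, &part).and_then(|n| fs::rename(&part, dest).map(|()| n));
    if result.is_err() {
        let _ = fs::remove_file(&part); // best effort; the .part may not exist
    }
    result
}
```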
Exit codes
| Code | Meaning |
|---|---|
0 | Ok |
1 | Runtime failure — network, parse, no candidates, ffmpeg failure, DRM rejection |
2 | Usage error (clap) |
Why not yt-dlp?
yt-dlp is excellent and covers far more sites, with 1700+ dedicated
extractors. mediagrab targets a different point in the trade-off
space: a single ~4 MB Rust binary that covers roughly 80% of what you need from
a generic media downloader, without a Python runtime or the ongoing work of
keeping per-site extractors up to date as sites change. When you hit a site mediagrab can't
handle, fall back to yt-dlp; the two coexist fine.
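The exit-code table is enough for a wrapper to decide when to fall back. This sketch runs mediagrab video <URL> and, only on exit code 1, retries with a bare yt-dlp <URL>; a usage error (2) or termination by a signal is returned as-is, since yt-dlp would not fix either. `next_step` and `grab_video` are hypothetical. Code 1 also covers DRM rejections, which yt-dlp will not get past either, and the exit code alone cannot tell those apart.

```rust
use std::process::{Command, ExitStatus};

#[derive(Debug, PartialEq)]
pub enum Next {
    Done,
    TryYtDlp,
    GiveUp,
}

/// Map mediagrab's exit status to the wrapper's next step.
pub fn next_step(code: Option<i32>) -> Next {
    match code {
        Some(0) => Next::Done,
        // Runtime failure (network, parse, no candidates, ffmpeg, DRM): worth one retry.
        Some(1) => Next::TryYtDlp,
        // Usage error (2), any other code, or None (killed by a signal): stop here.
        _ => Next::GiveUp,
    }
}

/// Try mediagrab first; hand the URL to yt-dlp only if mediagrab exited with 1.
pub fn grab_video(url: &str) -> std::io::Result<ExitStatus> {
    let status = Command::new("mediagrab").args(["video", url]).status()?;
    match next_step(status.code()) {
        Next::TryYtDlp => Command::new("yt-dlp").arg(url).status(),
        Next::Done | Next::GiveUp => Ok(status),
    }
}
```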
Build from source
cd tools/mediagrab/rust
cargo build --release
cargo test # 62 unit tests
Toolchain pin: Rust 1.85.0. Dependencies are all in the suite-permitted allowlist
(clap, reqwest [blocking + rustls-tls],
serde, serde_json, anyhow,
regex, url, indicatif).
Runtime prereq for transcoding and HLS muxing:
ffmpeg on PATH. Without ffmpeg you can still download direct files
in their source container.
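A sketch of the ffmpeg shell-out behind mediagrab audio, plus a PATH lookup. The options used (-hide_banner, -y, -i, -vn, -c:a, -b:a) are standard ffmpeg flags, but the format-to-encoder mapping and argument order are assumptions, not mediagrab's actual command line. libmp3lame, libopus, and libvorbis must also be compiled into the local ffmpeg build, and the PATH lookup looks only for a file named ffmpeg, so it ignores Windows' ffmpeg.exe.

```rust
use std::path::PathBuf;

/// Assumed mapping from -f to an ffmpeg audio encoder, plus whether -b applies.
fn audio_codec(format: &str) -> Option<(&'static str, bool)> {
    Some(match format {
        "mp3" => ("libmp3lame", true),
        "opus" => ("libopus", true),
        "ogg" => ("libvorbis", true),
        "m4a" => ("aac", true),
        "flac" => ("flac", false),     // lossless: bitrate ignored
        "wav" => ("pcm_s16le", false), // uncompressed PCM: bitrate ignored
        _ => return None,
    })
}

/// Build the ffmpeg argument list that drops video (-vn) and encodes the audio track.
pub fn audio_args(input: &str, output: &str, format: &str, bitrate: &str) -> Option<Vec<String>> {
    let (codec, lossy) = audio_codec(format)?;
    let mut args: Vec<String> = ["-hide_banner", "-y", "-i", input, "-vn", "-c:a", codec]
        .iter()
        .map(|s| s.to_string())
        .collect();
    if lossy {
        args.push("-b:a".to_string());
        args.push(bitrate.to_string());
    }
    args.push(output.to_string());
    Some(args)
}

/// First file named `ffmpeg` in a PATH directory (no Windows .exe handling).
pub fn ffmpeg_on_path() -> Option<PathBuf> {
    let path = std::env::var_os("PATH")?;
    std::env::split_paths(&path)
        .map(|dir| dir.join("ffmpeg"))
        .find(|candidate| candidate.is_file())
}
```

The resulting vector can be handed to std::process::Command::new("ffmpeg").args(...) once ffmpeg_on_path confirms a binary exists; without one, only the direct-download path remains available.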