mediagrab

Generic web media downloader — find video/audio on any standards-compliant page, save as open-source container

v1.0.0

Linux

Quick Start

Install via jcli (recommended)

jcli install mediagrab

Or via the installer

curl -fsSL https://cli.johlem.net/tools/mediagrab/install.sh | bash

One-shot smoke test

mediagrab probe https://archive.org/download/BigBuckBunny_124/Content/big_buck_bunny_720p_surround.mp4
# → detects direct .mp4, shows file size, no download

mediagrab video -o bbb.mkv https://archive.org/.../big_buck_bunny_720p_surround.mp4
# → downloads + transcodes mp4 → mkv via ffmpeg + progress bar on stderr

mediagrab audio -f mp3 -b 192k -o bbb.mp3 https://archive.org/.../big_buck_bunny_720p_surround.mp4
# → downloads + extracts audio + transcodes to MP3 at 192kbps

What it does

mediagrab detects video and audio media on any standards-compliant web page or major video host, picks the highest-quality stream by default, and writes the result to an open-source container of your choice. Single Rust binary. No async runtime. No Python. No yt-dlp dependency.

The architecture is an extractor trait with a registry of site-specific and generic-fallback implementations. Adding a new site is one new file; the dispatch loop walks the registry, first match wins.

Coverage

Source	How it's detected	Notes
Direct file URLs (.mp4, .webm, .mkv, .m3u8, .mpd, .mp3, .opus, …)	URL extension + HEAD `Content-Type` gate	Pages that look like media URLs but serve HTML are correctly rejected and fall through to the next extractor.
YouTube (youtube.com, youtu.be, /shorts/, /embed/)	`ytInitialPlayerResponse.streamingData.formats` + `adaptiveFormats`	Streams that require `signatureCipher` / nsig decoding are skipped with a clear error. Many older or shorts videos work; some newer ones don't.
Vimeo	`player.vimeo.com/video/<id>/config` JSON	Progressive + HLS variants.
Any standards-compliant page	`<video src>`, `<source src>`, `og:video`, `twitter:player:stream`, `<link rel="alternate" type="application/x-mpegURL\|dash+xml">`, JSON-LD `VideoObject.contentUrl`, inline `.m3u8`/`.mpd` URLs	The long tail. Wikipedia, news sites, blogs, CMS embeds all generally work.

Out of scope: DRM (HLS AES-128, Widevine), YouTube signatureCipher streams. These produce a clear error message — no silent failure.

Subcommands

Command	What it does
`mediagrab probe <URL>`	Detect media, print a table of candidates ranked best-first, exit. No download.
`mediagrab video <URL> [-f mkv\|mp4\|webm\|ogg] [-q best\|720\|1080\|...]`	Download highest-quality video by default. Container default: mkv.
`mediagrab audio <URL> [-f mp3\|opus\|ogg\|flac\|wav\|m4a] [-b 192k]`	Extract audio. Container default: mp3 at 192kbps.

Output templates

-o takes a literal path or a template. Tokens:

Token	Expands to
`{title}`	Slug of the page or video title
`{id}`	Source ID (video id, hash)
`{site}`	Host (e.g. `youtube.com`)
`{ext}`	Chosen container extension

Default template: {title}.{ext}.

Progress bars

Every download writes a colored indicatif progress bar to stderr so stdout stays clean for the [OK] line and JSON-piping. Three bar styles:

Bytes-known downloads: █████ 47.3 MB / 120.5 MB · 8.4 MB/s · ETA 9s
HLS segment loops: 45/120 segments
Transcoding / unknown total: spinner with bytes-so-far + rate

The bar is hidden automatically when stderr isn't a TTY (CI logs, pipes).

Touches / Produces / Gates

Touches (network, read-only): HTTPS GET to the input URL and any media URL(s) it points at. HEAD on direct-file URLs.
Touches (subprocess): ffmpeg shell-out when transcoding to a different container or extracting audio.
Touches (filesystem, write): the file at --output via atomic .part → rename. Intermediate temp files are cleaned up.
Produces: the media file plus an [OK] Saved → … (N bytes) line on stdout, followed by the suite footer.
Gates: none. --i-am-authorized is NOT required — all endpoints used are public.

Exit codes

Code	Meaning
`0`	Ok
`1`	Runtime failure — network, parse, no candidates, ffmpeg failure, DRM rejection
`2`	Usage error (clap)

Why not yt-dlp?

yt-dlp is excellent and covers far more sites — 1700+ dedicated extractors. mediagrab targets a different point in the trade-off space: a single ~4 MB Rust binary that does ~80% of what you actually need from a generic media downloader, without a Python runtime or the ongoing extractor-rotation maintenance treadmill. When you hit a site mediagrab can't handle, fall through to yt-dlp — they coexist fine.

Build from source

cd tools/mediagrab/rust
cargo build --release
cargo test                                  # 62 unit tests

Toolchain pin: Rust 1.85.0. Dependencies are all in the suite-permitted allowlist (clap, reqwest [blocking + rustls-tls], serde, serde_json, anyhow, regex, url, indicatif).

Runtime prereq for transcoding and HLS muxing: ffmpeg on PATH. Without ffmpeg you can still download direct files in their source container.