urlrecon

Async multi-module URL / domain reconnaissance

v1.7.0

Linux

Quick Start

Install via jcli (recommended)

jcli install urlrecon

If you don't have jcli yet, install it first with curl -fsSL https://cli.johlem.net/tools/jcli/install.sh | bash.

Scan a target

urlrecon --target example.com                                 # default modules, terminal output
urlrecon --target example.com --output json                   # pipe-clean JSON
urlrecon --target example.com --modules headers,tls,waf       # subset
urlrecon --target example.com --modules jsendpoints           # opt-in JS endpoint + secret scan
urlrecon --target example.com --output json --out-file r.json # write to file
urlrecon --file targets.txt --output json                     # bulk: one report per target

Absorbed subcommands

# Certificate Transparency / chain validation (former certchain tool)
urlrecon certchain query example.com
urlrecon certchain validate example.com:443
urlrecon certchain inspect ./leaf.pem

# Historical subdomain inventory + takeover detection (former domaindrift tool)
urlrecon domaindrift history example.com
urlrecon domaindrift takeover example.com

What it does

urlrecon is a largely passive recon CLI: feed it a URL or domain and it runs a fleet of 19 analysis modules concurrently, returning a structured report. Built in Rust, single binary, zero runtime dependencies. It re-implements and extends the capabilities of the retired headerscan, certchain, and domaindrift tools: security headers, WAF, TLS, redirect-chain, CT-log query, chain validation, and subdomain-takeover detection are now part of urlrecon.

Concurrent module orchestrator. All selected modules run in parallel against a single target. Per-module failures degrade gracefully — one timed-out module doesn't abort the run.
Pipe-clean JSON output. --output json emits a stable schema suitable for jq / CI pipelines / dashboards. CSV and Markdown are also available.
Configurable. --modules selects a subset, --timeout bounds slow probes, --concurrency caps async workers, --rate-limit throttles requests, and --no-color gives log-friendly output.
Opt-in heavy modules. jsendpoints and domain_history are excluded from --modules all because they add real network cost; run them explicitly when you want that depth.
Post-recon file fetch. --download <buckets> fetches the inventory.file.* URLs for the listed buckets and writes them to ./urlrecon-downloads/ (override with --download-dir).
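The graceful-degradation behaviour of the orchestrator can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not urlrecon's code: it uses std threads and one overall channel deadline instead of the async runtime and per-request --timeout the tool actually uses, and `run_modules`, `ModuleFn`, and `ModuleOutcome` are hypothetical names. A module that errors or misses the deadline is recorded as such, and the remaining modules still report.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Outcome recorded for each module (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum ModuleOutcome {
    Ok(String),
    Failed(String),
    /// No result before the deadline (timed out, or the worker panicked).
    TimedOut,
}

/// A module is modelled as a plain function from target to a report fragment.
pub type ModuleFn = fn(&str) -> Result<String, String>;

/// Run every module concurrently; one failing or slow module never aborts the others.
pub fn run_modules(
    target: &str,
    modules: &[(&'static str, ModuleFn)],
    deadline_after: Duration,
) -> Vec<(&'static str, ModuleOutcome)> {
    let (tx, rx) = mpsc::channel();
    for (idx, (_, module)) in modules.iter().enumerate() {
        let tx = tx.clone();
        let module = *module;
        let target = target.to_string();
        thread::spawn(move || {
            // Send errors are ignored: the receiver may have stopped waiting.
            let _ = tx.send((idx, module(&target)));
        });
    }
    drop(tx); // only worker threads hold senders now

    let mut results: Vec<Option<ModuleOutcome>> = modules.iter().map(|_| None).collect();
    let deadline = Instant::now() + deadline_after;
    let mut pending = modules.len();
    while pending > 0 {
        let now = Instant::now();
        if now >= deadline {
            break;
        }
        match rx.recv_timeout(deadline - now) {
            Ok((idx, Ok(report))) => results[idx] = Some(ModuleOutcome::Ok(report)),
            Ok((idx, Err(err))) => results[idx] = Some(ModuleOutcome::Failed(err)),
            Err(_) => break, // deadline hit, or every worker has exited
        }
        pending -= 1;
    }

    // Anything still missing is reported as timed out instead of failing the run.
    modules
        .iter()
        .zip(results)
        .map(|((name, _), outcome)| (*name, outcome.unwrap_or(ModuleOutcome::TimedOut)))
        .collect()
}
```

Threads that miss the deadline keep running in the background here, because std threads cannot be cancelled; an async runtime could drop the pending task instead.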

Modules

Modules marked opt-in are skipped by --modules all; pass the name explicitly (e.g. --modules jsendpoints) to run them.
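The opt-in rule can be sketched as a small resolver for the --modules value. `resolve_modules` is a hypothetical helper, not urlrecon's parser; the module names come from the table below, while de-duplication, rejecting unknown names, and letting `all` combine with explicit opt-in names are assumptions of this sketch.

```rust
/// Every module name from the table, in table order.
const ALL_MODULES: [&str; 19] = [
    "cors", "dns", "dnssec", "domain_history", "emailauth", "geo", "headers",
    "inventory", "jsendpoints", "ports", "redirects", "robots", "shodan",
    "sitemap", "subdomains", "tech", "tls", "waf", "whois",
];

/// Modules that `all` skips because of their extra network cost.
const OPT_IN: [&str; 2] = ["jsendpoints", "domain_history"];

/// Resolve a comma-separated `--modules` value into the modules to run (hypothetical).
pub fn resolve_modules(spec: &str) -> Result<Vec<&'static str>, String> {
    let mut selected: Vec<&'static str> = Vec::new();
    for raw in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        if raw == "all" {
            // `all` expands to every module except the opt-in ones.
            for name in ALL_MODULES.iter().copied().filter(|m| !OPT_IN.contains(m)) {
                if !selected.contains(&name) {
                    selected.push(name);
                }
            }
        } else {
            // Explicit names, including opt-in modules, are accepted as-is.
            let name = ALL_MODULES
                .iter()
                .copied()
                .find(|m| *m == raw)
                .ok_or_else(|| format!("unknown module: {raw}"))?;
            if !selected.contains(&name) {
                selected.push(name);
            }
        }
    }
    Ok(selected)
}
```

For example, `resolve_modules("headers, tls,waf")` yields the three named modules, while `resolve_modules("all")` yields 17 of the 19.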

Module	What it does
`cors`	Sends a probe with a synthetic Origin and classifies `Access-Control-Allow-Origin` / `-Credentials`: reflected origin or `*`+creds → Critical, `null` → High, `*` alone → Low
`dns`	A / AAAA / MX / NS / TXT / SOA records via hickory-resolver (system config → Cloudflare fallback)
`dnssec`	Direct DNSKEY + DS queries with raw-bytes RDATA parsing. Reports key count, KSK/ZSK split, algorithms, and chain posture (signed+delegated, DS-only → bogus, etc.)
`domain_history` (opt-in)	Historical control signal: crt.sh issuer eras (flags CA churn) + Wayback CDX first/last/max-gap. No API key required
`emailauth`	SPF + DMARC + DKIM posture over DNS TXT. SPF `+all` → Critical, DMARC `p=none` → Medium; multi-string TXT records are concatenated per RFC 7208 §3.3
`geo`	IP geolocation + ASN / ISP via ipwho.is. Pre-resolves the domain to an A record first for reliability
`headers`	HTTP security header analysis (HSTS, CSP, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, among others) with a 0–10 risk score
`inventory`	Root-page link + media + file-download census. Buckets file links (`pdf`, `doc`, `spreadsheet`, `archive`, …) so `--download` can pick them up
`jsendpoints` (opt-in)	Fetches every `<script src>` from the root (cap 20, 2 MB each) and regex-scans bodies for API paths plus AWS / GitHub / Slack / Stripe / Google / JWT secret shapes. Secrets are redacted in the report
`ports`	Common-port TCP probe (26 ports). Telnet → Critical, exposed databases → High, RDP/VNC → Medium
`redirects`	Hop-by-hop redirect-chain walk. HTTPS → HTTP downgrade flagged Critical; loops flagged High
`robots`	`/robots.txt` fetch + parse. Disallow paths matching admin/api/.env/etc. escalate to Medium
`shodan`	Optional API lookup (`~/.urlrecon/keys.toml`: `shodan_key`). Open ports + banners + ASN + CVE/CVSS. Degrades to an Info finding when no key is set
`sitemap`	`/sitemap.xml` fetch with `/sitemap_index.xml` fallback. Same interesting-path flagging
`subdomains`	crt.sh certificate-transparency log enumeration. 500-subdomain emit cap with truncation note
`tech`	Passive Wappalyzer-style fingerprinting against 33 technologies (CMS, frameworks, JS libs, servers)
`tls`	TLS handshake via tokio-rustls. Protocol + cipher + cert (subject/issuer/SAN/expiry) with strength tiers
`waf`	Passive WAF / CDN fingerprinting against 10 products (Cloudflare, Akamai, Imperva, Sucuri, Fastly, etc.)
`whois`	Domain registration via RDAP (modern WHOIS replacement). Registrar, expiry, DNSSEC, EPP status flags
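The cors severity rule can be expressed as a small classifier. This is a hedged sketch, not urlrecon's code: it reads the table as "a reflected origin, or `*` together with `Access-Control-Allow-Credentials: true`, is Critical", and treats combinations the table does not list (no header, a fixed allow-listed origin) as Info, which is an assumption. `Severity` and `classify_cors` are hypothetical names.

```rust
/// Finding severities, lowest to highest (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Classify the CORS headers returned for a probe sent with `Origin: probe_origin`.
/// `acao` / `acac` are the raw Access-Control-Allow-Origin / -Credentials values.
pub fn classify_cors(probe_origin: &str, acao: Option<&str>, acac: Option<&str>) -> Severity {
    let creds = acac.map_or(false, |v| v.trim().eq_ignore_ascii_case("true"));
    match acao.map(str::trim) {
        // The server echoed the synthetic origin: it likely reflects arbitrary origins.
        Some(origin) if origin == probe_origin => Severity::Critical,
        // Wildcard combined with credentials.
        Some("*") if creds => Severity::Critical,
        // `null` can be sent by sandboxed iframes, so it is often attacker-reachable.
        Some("null") => Severity::High,
        // Wildcard alone: publicly readable, but without credentials.
        Some("*") => Severity::Low,
        // No header, or a fixed allow-listed origin: nothing to flag (assumption).
        None | Some(_) => Severity::Info,
    }
}
```

Deriving `Ord` on `Severity` lets a report sort findings or keep the worst one with `max()`.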

Absorbed subcommands

urlrecon ships the former certchain and domaindrift tools as separate subcommands instead of folding them into the flat recon path. The terminal, json, csv, and markdown output formats apply uniformly to them via --format.

Subcommand	What it does
`certchain query <domain>`	Query crt.sh CT logs for certificates issued for a domain. `--include-expired` opt-in
`certchain validate <host[:port]>`	Fetch the live TLS chain and walk it: trust path, expiry, key strength, SAN coverage
`certchain inspect <path.pem>`	Parse a local PEM file (DER support is a follow-up): subject, issuer, validity, SANs, key type
`certchain batch <file>`	Newline-delimited domain list — one CT query per domain
`domaindrift history <domain>`	Historical subdomain inventory via CT logs + Wayback Machine. `--include-expired`, `--limit`
`domaindrift takeover <domain>`	Subdomain takeover detection against ~45 service fingerprints (S3, GitHub Pages, Heroku, …)
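`certchain validate` lists SAN coverage among its checks. One common way to decide whether a SAN list covers a hostname is the strict wildcard rule from RFC 6125 §6.4.3: `*` may only be the entire left-most label and matches exactly one label. The std-only sketch below assumes that rule; `san_matches` and `san_coverage` are hypothetical names, and it is not necessarily how certchain implements the check (it ignores IP SANs and public-suffix restrictions, for example).

```rust
/// Does a single SAN DNS entry cover `host`? Strict wildcard rule: `*.` must be the
/// whole left-most label and matches exactly one non-empty label.
pub fn san_matches(san: &str, host: &str) -> bool {
    // DNS names compare case-insensitively; a trailing root dot is ignored.
    let san = san.trim_end_matches('.').to_ascii_lowercase();
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    match san.strip_prefix("*.") {
        Some(suffix) => match host.split_once('.') {
            // Left-most host label must be non-empty; the rest must equal the suffix.
            Some((label, rest)) => !label.is_empty() && rest == suffix,
            None => false,
        },
        None => san == host,
    }
}

/// True if any SAN in the certificate covers `host`.
pub fn san_coverage(sans: &[&str], host: &str) -> bool {
    sans.iter().any(|san| san_matches(san, host))
}
```

Under this rule `*.example.com` covers `www.example.com` but neither the apex `example.com` nor `a.b.example.com`, which is why certificates often list both the apex and the wildcard.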

CLI

Flag	What it does
`-t, --target <URL>`	Target host, URL, or IP
`-f, --file <PATH>`	Bulk mode — newline-delimited target list (`#` comments, blank lines ignored)
`-M, --modules <LIST>`	Comma-separated module subset, or `all` (default; opt-in modules excluded)
`-o, --output <FMT>`	`terminal` / `json` / `csv` / `markdown`
`-O, --out-file <PATH>`	Write structured output to file (banner suppressed)
`--timeout <SECS>`	Per-request timeout (default 10s)
`--concurrency <N>`	Max concurrent async workers (default 10)
`--rate-limit <MS>`	Minimum milliseconds between requests (default 0)
`--download <BUCKETS>`	Post-recon: fetch `inventory.file.*` URLs for the listed buckets (`pdf,doc,spreadsheet,…`)
`--download-dir <PATH>`	Download root (default `./urlrecon-downloads/`)
`-T, --tui`	Full-screen ratatui dashboard
`-m, --minimal`	One finding per line, pipe-friendly
`-v, --verbose`	Per-module timings on stderr, raw data dump
`--no-color`	Disable ANSI colors
`--self-check`	Check `cli.johlem.net` for an available update
`--self-update`	Download + SHA-256-verify + atomic-rename the latest signed binary (previous → `.bak`)
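--rate-limit <MS> is described as a minimum gap between requests. A minimal blocking sketch of that rule follows. `MinIntervalLimiter` is hypothetical, and urlrecon's async implementation would have to share such state across its workers (for example behind a mutex), which this single-threaded version does not do.

```rust
use std::time::{Duration, Instant};

/// Enforces a minimum gap between request starts (hypothetical helper).
pub struct MinIntervalLimiter {
    min_gap: Duration,
    last: Option<Instant>,
}

impl MinIntervalLimiter {
    /// `min_gap_ms = 0` (the documented default) never sleeps.
    pub fn new(min_gap_ms: u64) -> Self {
        Self { min_gap: Duration::from_millis(min_gap_ms), last: None }
    }

    /// Block until at least `min_gap` has passed since the previous call, then record now.
    pub fn wait(&mut self) {
        if let Some(last) = self.last {
            let elapsed = last.elapsed();
            if elapsed < self.min_gap {
                std::thread::sleep(self.min_gap - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}
```

Calling `wait()` before each request with a 50 ms gap spaces three requests at least 100 ms apart overall, since the first call never sleeps.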

Design choices

rustls everywhere. HTTP goes through reqwest with rustls-tls, and the TLS handshake through tokio-rustls. There is no openssl-sys dependency, so the binary deploys on hardened systems without OpenSSL.
RDAP, not legacy WHOIS. The whois module queries rdap.org over HTTPS for structured JSON — no fragile text parsing, no TCP-43 firewalling concerns.
Pre-resolution for geo. ipwho.is's own resolver is unreliable for many domains; the geo module pre-resolves via hickory-resolver and queries with the resolved IP for clean results.
Bounded port scan. 26 well-known ports, 3s per-port timeout, 32 concurrent connects. Full TCP handshakes (not stealth scan) — connections appear in target logs. Operator authorisation expected.
Embedded signatures. The WAF (10), security-header (9), TLS-cipher, and tech (33) signatures are all compiled into the binary via include_str!. Updating them requires a rebuild, but matching works offline, with no signature downloads at runtime.
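The bounded port scan (full TCP handshakes, a per-port timeout, capped concurrency) can be sketched with std-only blocking connects. This is a thread-per-connect illustration, not urlrecon's async implementation; `probe_ports` is hypothetical, the caller supplies the port list (the 26 ports are not enumerated here), and the documented defaults map to a 3-second timeout and `max_concurrent = 32`. Run it only against hosts you are authorised to scan.

```rust
use std::net::{IpAddr, SocketAddr, TcpStream};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Probe `ports` on `ip` with full TCP connects, at most `max_concurrent` at a time.
/// Returns the open ports in ascending order (hypothetical helper).
pub fn probe_ports(ip: IpAddr, ports: &[u16], timeout: Duration, max_concurrent: usize) -> Vec<u16> {
    let mut open = Vec::new();
    // Simple batching: each batch of up to `max_concurrent` connects finishes
    // before the next starts (a sliding window would keep the pool fuller).
    for batch in ports.chunks(max_concurrent.max(1)) {
        let (tx, rx) = mpsc::channel();
        for &port in batch {
            let tx = tx.clone();
            thread::spawn(move || {
                // A completed three-way handshake means the port is open; the
                // connection shows up in the target's logs (not a stealth scan).
                let is_open = TcpStream::connect_timeout(&SocketAddr::new(ip, port), timeout).is_ok();
                let _ = tx.send((port, is_open));
            });
        }
        drop(tx); // the receive loop ends once every worker in this batch has reported
        for (port, is_open) in rx {
            if is_open {
                open.push(port);
            }
        }
    }
    open.sort_unstable();
    open
}
```

Each result only says whether a handshake completed; mapping ports to risk tiers (Telnet, databases, RDP/VNC) would be a separate lookup on the returned list.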

Authorisation: urlrecon's ports module performs full TCP handshakes against the target host. Run it only against systems you own or have explicit written authorisation to scan. The subdomains, domain_history, geo, whois, and shodan modules call third-party services (crt.sh, the Wayback Machine, ipwho.is, rdap.org, and Shodan); review their ToS before automating against client targets.