credsweep
Credential / secret scanner for files, dirs, stdin, and git history
v1.2.2
Quick Start
Install via jcli (recommended)
jcli install credsweep
If you don't have jcli yet, install it first:
curl -fsSL https://cli.johlem.net/tools/jcli/install.sh | bash
Scan a codebase
credsweep --path ./src # walk a directory
credsweep --stdin < suspicious.log # read from stdin
credsweep --git --path . # scan git history for secrets ever committed
credsweep --path . --output sarif > cs.sarif # SARIF for GitHub Code Scanning
credsweep --install-hook --path . # write a .git/hooks/pre-commit
What it does
credsweep is a credential / secret scanner: feed it a file tree, stdin, or a git
repository's history and it reports accidentally committed AWS keys, GitHub PATs,
private keys, JWTs, database connection strings, and ~45 other credential types.
Built in Rust, single binary. It absorbed the retired credaudit tool: that tool's
full 53-pattern catalogue plus its sensitivity filter, entropy validation,
git-history scan, and false-positive filtering now live here.
- 53 patterns across 9 kinds: cloud_credential, token, api_key, connection_string, private_key, webhook, password, secret, auth_header. Covers AWS, GitHub, Slack, Stripe, Google, Azure, Mailgun, Datadog, Cloudflare, Vault, Terraform Cloud, Heroku, Twilio, SendGrid, DigitalOcean, Shopify, GitLab, NPM, Docker Hub, and the generic patterns. All embedded in the binary via include_str!.
- Three sensitivity levels. --sensitivity high runs only prefixed/high-confidence patterns (lowest FP rate). medium (default) adds DB connection strings, JWTs, bearer tokens. low adds the generic password = "..." heuristics (highest yield, highest FP rate).
- Value-scoped false-positive filtering. Generic-assignment patterns capture the value in group 1; entropy + FP-keyword checks run against the value, not the whole match. So password = "password" is suppressed but password = "<real value>" fires.
- Shannon-entropy floor per pattern. Each pattern declares an entropy_min; matches below the threshold are dropped as likely placeholders (xxxxxxxx, your_api_key_here, etc.).
- Git history scan. --git walks every commit's diff (with --root so initial commits are included) and stamps the SHA on every finding. Catches credentials that were committed and later removed.
- SARIF v2.1.0 output. --output sarif emits GitHub Code Scanning-compatible JSON with stable partialFingerprints so GitHub deduplicates findings across runs.
- Baseline / update. --baseline snapshots current findings; --update only reports new ones. Ship credsweep against a codebase with existing known leaks without flooding CI.
- Allow-list. --allow-list <FILE> reads a TOML file with file_glob / pattern / snippet matchers, all ANDed. Documented test fixtures get a clean pass without disabling patterns globally.
- Pre-commit hook installer. --install-hook writes .git/hooks/pre-commit (respecting core.hooksPath for monorepos). Refuses to clobber an existing hook unless --force is given.
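The value-scoped filtering and entropy floor from the list above can be illustrated with a minimal Python sketch. This is not credsweep's implementation (the tool is written in Rust); the regex, the `ENTROPY_MIN` value of 3.0 bits per character, and the `FP_KEYWORDS` set are assumptions chosen only to show the idea.

```python
import math
import re
from collections import Counter

# Hypothetical generic-assignment pattern: the value is captured in group 1.
GENERIC_PASSWORD = re.compile(r'password\s*=\s*"([^"]+)"', re.IGNORECASE)

# Assumed values for illustration; credsweep's real thresholds may differ.
ENTROPY_MIN = 3.0
FP_KEYWORDS = {"password", "changeme", "example", "your_api_key_here"}


def shannon_entropy(value: str) -> float:
    """Shannon entropy of `value` in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    probabilities = (count / total for count in Counter(value).values())
    return sum(-p * math.log2(p) for p in probabilities)


def is_finding(line: str) -> bool:
    """Return True if the line holds a value that survives both checks."""
    match = GENERIC_PASSWORD.search(line)
    if match is None:
        return False
    value = match.group(1)  # checks run on the value, not the whole match
    if value.lower() in FP_KEYWORDS:
        return False  # known placeholder keyword
    return shannon_entropy(value) >= ENTROPY_MIN
```

Running the checks against the captured group rather than the whole match matters because the literal text `password = ` would otherwise both raise the entropy and trip the keyword filter on every hit.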
CLI
| Flag | What it does |
|---|---|
| --path <PATH> | Scan a file or directory tree (with skip list: .git, node_modules, target, …) |
| --stdin | Read the input from stdin (mutually exclusive with --path) |
| --git | Scan git history of the repo at --path (or CWD). Catches credentials introduced and later removed |
| --sensitivity {high,medium,low} | Pattern subset (default medium) |
| -o, --output <FMT> | terminal / json / csv / markdown / sarif |
| -O, --out-file <PATH> | Write structured output to file |
| --install-hook | Write a .git/hooks/pre-commit at --path (or CWD) |
| --force | Overwrite an existing hook (use with --install-hook) |
| --baseline | Record current findings to the baseline file, exit 0 |
| --update | Filter against an existing baseline: report only new findings |
| --baseline-file <PATH> | Where to read/write the baseline (default .credsweep-baseline.json) |
| --allow-list <FILE> | Path to a TOML allow-list. Findings matching any rule are suppressed |
| -X, --exclude-path <GLOB> | Skip paths matching GLOB (repeatable, relative to scan root). Examples: -X 'vendor/**', --exclude-path 'tests/fixtures/*.env'. Short-circuits the walker so excluded subtrees are never opened. |
| --max-file-size <MB> | Skip files larger than N MB (default 5) |
| --follow-symlinks | Follow symlinks during directory walk (default off, which prevents loops) |
| --include-hidden | Include dotfiles and dot-directories |
| --summary | After the scan, print a stderr block: totals, severity breakdown, top 5 patterns by hit count, files scanned/skipped. Stderr-only so it doesn't pollute structured-output pipelines. |
| -m, --minimal | One finding per line, pipe-friendly |
| -v, --verbose | Per-file timings on stderr |
Subcommand: credsweep patterns list
Show every embedded detection pattern with its severity, sensitivity, kind, and confidence. Useful for confirming what the scanner is looking for before committing to a sensitivity tier, or building per-team allow-lists from the catalogue.
credsweep patterns list # full catalogue (text)
credsweep patterns list --severity critical # filter by severity
credsweep patterns list --sensitivity high --format md # markdown table
credsweep patterns list --format json | jq '.[].name' # pipe to jq
Flags: --format {text,json,md}, --severity {info,low,medium,high,critical}, --sensitivity {high,medium,low}.
Allow-list TOML format
# All present matchers are ANDed.
# At least one of (file_glob, pattern, snippet) must be set.
[[allow]]
file_glob = "tests/**"
pattern = "JWT Token"
note = "test fixtures — checked manually"
[[allow]]
snippet = "AKIAJSIRBVE5YQQTESTX"
note = "intentional fixture key in docs"
[[allow]]
pattern = "Generic Password Assignment"
note = "this codebase's password=... lines are template variables"
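The matching semantics (matchers within a rule are ANDed; any single matching rule suppresses a finding) can be sketched in Python as follows. The finding fields `file`, `pattern`, and `text`, the substring interpretation of `snippet`, and the use of `fnmatchcase` for `file_glob` are all assumptions made for illustration; credsweep's actual matching may differ.

```python
from fnmatch import fnmatchcase

# Rules as they might look after parsing the TOML above (hypothetical shape).
RULES = [
    {"file_glob": "tests/**", "pattern": "JWT Token"},
    {"snippet": "AKIAJSIRBVE5YQQTESTX"},
    {"pattern": "Generic Password Assignment"},
]

MATCHER_KEYS = ("file_glob", "pattern", "snippet")


def rule_matches(rule: dict, finding: dict) -> bool:
    """Every matcher present in the rule must match (logical AND)."""
    if "file_glob" in rule and not fnmatchcase(finding["file"], rule["file_glob"]):
        return False
    if "pattern" in rule and rule["pattern"] != finding["pattern"]:
        return False
    if "snippet" in rule and rule["snippet"] not in finding["text"]:
        return False
    # A rule with no matchers at all is invalid and suppresses nothing.
    return any(key in rule for key in MATCHER_KEYS)


def is_allowed(finding: dict, rules: list = RULES) -> bool:
    """A finding is suppressed when any single rule matches it."""
    return any(rule_matches(rule, finding) for rule in rules)
```

With these rules, a JWT under tests/ is suppressed while the same JWT under src/ is still reported, because the first rule needs both its `file_glob` and its `pattern` to match.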
CI workflow
Common pattern: baseline once, then update on every PR. Pre-existing leaks don't break CI; new ones do.
# One-time setup (committed alongside the codebase)
credsweep --path . --sensitivity medium --baseline
git add .credsweep-baseline.json
git commit -m "ci: credsweep baseline"
# CI step (.github/workflows/ci.yml etc.)
credsweep --path . --sensitivity medium --update --output sarif --out-file cs.sarif
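Conceptually, the baseline/update flow is a set difference over stable finding identifiers. The Python sketch below shows that idea only; the fingerprint fields (`file`, `pattern`, `snippet`), the SHA-256 hashing, and the JSON layout are hypothetical and are not taken from the real .credsweep-baseline.json format.

```python
import hashlib
import json


def fingerprint(finding: dict) -> str:
    """Stable identifier for a finding (assumed fields: file, pattern, snippet)."""
    key = "\x00".join((finding["file"], finding["pattern"], finding["snippet"]))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def make_baseline(findings: list) -> str:
    """Serialise the fingerprints of the current findings as JSON."""
    return json.dumps(sorted({fingerprint(f) for f in findings}), indent=2)


def new_findings(findings: list, baseline_json: str) -> list:
    """Keep only findings whose fingerprint is absent from the baseline."""
    known = set(json.loads(baseline_json))
    return [f for f in findings if fingerprint(f) not in known]
```

In this sketch the line number is deliberately left out of the fingerprint, so a known leak that merely moves within its file is not reported as new; whether credsweep makes the same trade-off is not stated in this document.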
Design choices
- Subprocess to git, not git2. The --git scanner shells out to git diff-tree -p --no-commit-id --root -r <commit>. The --root flag is critical: without it, the initial commit (no parent) produces empty diff output and the credential introduced there is missed. Saves ~10 MB of binary vs vendoring libgit2.
- Snippet redaction is mandatory. The full matched string never leaves the binary. "AKIA1234567890ABCDEF" becomes "AKIA…CDEF" in every output mode. Operators can grep their codebase for the same pattern but can't accidentally paste a real secret into logs / Slack / a JIRA ticket.
- Pipe-clean structured output. When --output json|csv|markdown|sarif is set without --out-file, the banner and per-finding terminal display are suppressed so stdout is pure structured output.
- Exit codes. 0 = clean, 1 = findings, 2 = runtime error, 3 = bad arguments. Suitable for CI gating.
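The redaction rule from the list above (keep the first and last four characters, elide the middle) can be sketched in a few lines of Python. The function name `redact`, the `keep` parameter, and the handling of very short values are assumptions; only the "AKIA…CDEF" example comes from this document.

```python
def redact(secret: str, keep: int = 4) -> str:
    """Keep the first and last `keep` characters and elide the middle."""
    if len(secret) <= 2 * keep:
        # Assumed behaviour: too short to show both ends without
        # revealing the whole value, so show nothing.
        return "…"
    return f"{secret[:keep]}…{secret[-keep:]}"
```

Keeping a short prefix and suffix leaves enough context to identify which credential was found, while the elided middle makes the output useless as a working secret.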
Not a silver bullet: credsweep catches credentials that match its
53 patterns plus value-scoped entropy heuristics. Custom organisational secret formats
won't match — add them to data/patterns.toml (one TOML entry per pattern,
no code changes needed). For supply-chain scenarios (CycloneDX/SPDX SBOMs containing
credentials in dependency metadata) credsweep currently treats them as plain JSON —
an SBOM-aware mode is on the backlog per the 2026-05-29 suite audit.