369 Rules, Zero Trust: CredVigil, an Open-Source Credential Scanner in Go

CredVigil: Real-Time Credential Protection
369 detection rules
75+ credential categories
3-signal detection
0 raw secrets stored

The Problem That Wouldn’t Stop Bugging Me

If you’ve worked in any engineering role long enough, you’ve seen it happen: someone commits an AWS key to a repo. A .env file with production database credentials gets pushed. A Slack token shows up in a config file that was “only supposed to be local.”

I saw this constantly in my work as a performance test engineer. API keys scattered across JMeter scripts. Database connection strings hardcoded in test configs. Tokens embedded in CI/CD pipelines. And every time, the response was the same — somebody caught it (hopefully), rotated the credentials (hopefully quickly), and said “we should really scan for this stuff.”

So I built the thing.

What Is CredVigil?

CredVigil is an open-source secrets scanner written in Go. It scans codebases, config files, git history, and live file changes for exposed credentials — API keys, tokens, passwords, private keys, connection strings — across 75+ platforms and services.

But the part I’m most proud of isn’t the number of rules. It’s how it detects secrets.

Why Regex Alone Isn’t Enough

Most secrets scanners rely on regex patterns: if a string matches AKIA[0-9A-Z]{16}, it’s probably an AWS access key. That works great for well-known formats. But what about custom API keys, internal tokens, or passwords that follow no published format?

Regex misses all of these. That’s why CredVigil uses triple detection.
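To make the regex signal concrete, here is a minimal Go sketch built on the AWS pattern quoted above. The helper name `findAWSKeys` is hypothetical, and this is an illustration of the technique, not CredVigil's rule engine.

```go
package main

import (
	"fmt"
	"regexp"
)

// awsAccessKeyID is the AWS access key ID pattern quoted in the text.
// The \b word boundaries keep it from firing inside longer alphanumeric runs.
var awsAccessKeyID = regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)

// findAWSKeys returns every candidate AWS access key ID in text.
// Hypothetical helper for illustration only.
func findAWSKeys(text string) []string {
	return awsAccessKeyID.FindAllString(text, -1)
}

func main() {
	cfg := "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\nregion = us-east-1"
	fmt.Println(findAWSKeys(cfg))                        // [AKIAIOSFODNN7EXAMPLE]
	fmt.Println(findAWSKeys("api_token = 9f8e7d6c5b4a")) // []
}
```

The second call shows the limitation: a custom token with no known prefix produces no match at all.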

The Three Signals

| Method | What it does | What it catches |
|---|---|---|
| Regex (369 rules) | Pattern matching against compiled regexes for every known credential format | AWS, GCP, Azure, GitHub, Stripe, OpenAI, Slack, JWT, private keys, DB URIs, 75+ categories |
| Shannon entropy | Measures the information density of a string; high randomness suggests a secret | Custom API keys, random tokens, high-entropy strings with no known format |
| BPE token efficiency | Measures how poorly a string compresses under Byte Pair Encoding; secrets compress poorly | Passwords, opaque tokens, anything that looks random to a language-model tokenizer |

These three signals combine into a confidence score from 0–100% per finding. Not a binary “this is a secret.” A score that lets you set thresholds and eliminate noise.
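The post doesn't spell out how the signals are weighted, so the following is only a sketch of one way to fold three signals into a 0-100 score and filter by threshold. The weights (0.5/0.3/0.2), the normalization ranges, the `Signals` type, and the `confidence` function are all assumptions for illustration.

```go
package main

import "fmt"

// Signals holds the three per-candidate measurements. Field names,
// ranges, and the weights below are illustrative assumptions, not
// CredVigil's actual scoring model.
type Signals struct {
	RegexMatch    bool    // matched a known credential pattern
	Entropy       float64 // Shannon entropy, bits per character
	BPEEfficiency float64 // characters per token; lower = less compressible
}

// clamp01 restricts v to the range [0, 1].
func clamp01(v float64) float64 {
	if v < 0 {
		return 0
	}
	if v > 1 {
		return 1
	}
	return v
}

// confidence maps the three signals to an integer score from 0 to 100.
func confidence(s Signals) int {
	regex := 0.0
	if s.RegexMatch {
		regex = 1
	}
	// Assumed ceiling of 6 bits/char, roughly log2(64) for base64-like alphabets.
	ent := clamp01(s.Entropy / 6)
	// Assumed scale: ~4 chars/token for English text, ~1 for random strings.
	bpe := clamp01((4 - s.BPEEfficiency) / 3)
	score := 0.5*regex + 0.3*ent + 0.2*bpe
	return int(score*100 + 0.5) // round to nearest integer
}

func main() {
	const threshold = 60 // hypothetical user-chosen cut-off
	candidates := []Signals{
		{RegexMatch: true, Entropy: 4.9, BPEEfficiency: 1.2},   // score 93
		{RegexMatch: false, Entropy: 2.75, BPEEfficiency: 4.0}, // score 14
	}
	for _, c := range candidates {
		score := confidence(c)
		fmt.Printf("score=%d reported=%v\n", score, score >= threshold)
	}
}
```

Raising or lowering `threshold` trades recall for noise without touching the detectors themselves, which is the point of a graded score over a yes/no verdict.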

Why BPE? Byte Pair Encoding, in a byte-level variant, is the tokenization scheme used by GPT models. A BPE tokenizer learns to merge character sequences that are frequent in its training text, so normal English encodes into relatively few tokens. Random secret strings (high entropy, no recurring structure) break apart into many short tokens instead. Low BPE efficiency is a signal I haven’t seen in any other secrets scanner, and it’s genuinely useful for catching opaque tokens that look like noise.

Zero-Trust by Design

Here’s something that always bothered me about other tools: they store the raw secret in their output. If your scanner’s report file leaks, you’ve now leaked the secrets again.

CredVigil never stores raw secrets. The raw match exists only in memory during the scan. Before a finding reaches output, it runs through a five-stage post-processing pipeline: hash, redact, enrich, fingerprint, sanitize.

Each stage is a pure function that transforms a finding: easy to test, easy to reason about, easy to extend. If the report file leaks, no secrets leak with it. That was a non-negotiable design goal.
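A minimal sketch of a pure-function pipeline with the five stage names from the post. What each stage does internally is assumed here (SHA-256 for the hash, a four-character prefix for redaction, confidence-based severity for enrichment), and the `Finding` type and stage function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a hypothetical finding record; only Raw ever holds the secret.
type Finding struct {
	RuleID      string
	File        string
	Line        int
	Confidence  int
	Raw         string // in-memory only; cleared by sanitizeStage
	SecretHash  string
	Redacted    string
	Severity    string
	Fingerprint string
}

// Stage is a pure transformation: it receives a copy and returns a modified copy.
type Stage func(Finding) Finding

// hashStage records a SHA-256 digest so findings can be compared without the secret.
func hashStage(f Finding) Finding {
	sum := sha256.Sum256([]byte(f.Raw))
	f.SecretHash = hex.EncodeToString(sum[:])
	return f
}

// redactStage keeps a short prefix for human triage and masks the rest.
func redactStage(f Finding) Finding {
	if len(f.Raw) > 4 {
		f.Redacted = f.Raw[:4] + "****"
	} else {
		f.Redacted = "****"
	}
	return f
}

// enrichStage adds metadata; here, an assumed severity derived from confidence.
func enrichStage(f Finding) Finding {
	switch {
	case f.Confidence >= 80:
		f.Severity = "high"
	case f.Confidence >= 50:
		f.Severity = "medium"
	default:
		f.Severity = "low"
	}
	return f
}

// fingerprintStage derives a stable ID from location plus secret hash, for deduplication.
func fingerprintStage(f Finding) Finding {
	key := fmt.Sprintf("%s|%s|%d|%s", f.RuleID, f.File, f.Line, f.SecretHash)
	sum := sha256.Sum256([]byte(key))
	f.Fingerprint = hex.EncodeToString(sum[:8])
	return f
}

// sanitizeStage drops the raw secret. Go strings cannot be zeroed in place,
// so this removes the reference rather than wiping memory.
func sanitizeStage(f Finding) Finding {
	f.Raw = ""
	return f
}

// runPipeline threads a finding through each stage in order.
func runPipeline(f Finding, stages ...Stage) Finding {
	for _, s := range stages {
		f = s(f)
	}
	return f
}

func main() {
	in := Finding{RuleID: "aws-access-key", File: "config/.env", Line: 3, Confidence: 92, Raw: "AKIAIOSFODNN7EXAMPLE"}
	out := runPipeline(in, hashStage, redactStage, enrichStage, fingerprintStage, sanitizeStage)
	// The caller's `in` still holds Raw; only `out` should ever reach a report.
	fmt.Printf("redacted=%s severity=%s fingerprint=%s raw=%q\n", out.Redacted, out.Severity, out.Fingerprint, out.Raw)
}
```

Because every stage is a plain `func(Finding) Finding`, each one can be unit-tested in isolation and a new stage is just another argument to `runPipeline`.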

Architecture: Five Components, Each Independently Tested

| Component | What it does |
|---|---|
| Core Detection Engine | Regex + entropy + BPE scanning, concurrent file processing, confidence scoring |
| Secure Pipeline | Hash → redact → enrich → fingerprint → sanitize (5-stage post-processing) |
| Git Integration | Clone repos, walk commit history, diff branches, incremental scanning |
| File System Watcher | Real-time monitoring with fsnotify, debounced events, smart exclusions |
| Event Bus | Internal pub/sub for decoupled communication between components |
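To illustrate what an internal pub/sub bus buys you, here is a minimal channel-based sketch. The `Bus` and `Event` types, the topic names, and the blocking delivery policy are assumptions, not CredVigil's actual event bus.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical message passed between components.
type Event struct {
	Topic   string
	Payload string
}

// Bus is a minimal in-process pub/sub bus: publishers never need to know
// who, if anyone, is subscribed to a topic.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string][]chan Event)}
}

// Subscribe returns a buffered channel that receives events for topic.
func (b *Bus) Subscribe(topic string, buf int) <-chan Event {
	ch := make(chan Event, buf)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers e to every subscriber of e.Topic. It blocks if a subscriber's
// buffer is full; a production bus might drop or time out instead.
func (b *Bus) Publish(e Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[e.Topic] {
		ch <- e
	}
}

// Close closes every subscriber channel; Publish must not be called afterwards.
func (b *Bus) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for topic, chans := range b.subs {
		for _, ch := range chans {
			close(ch)
		}
		delete(b.subs, topic)
	}
}

func main() {
	bus := NewBus()
	findings := bus.Subscribe("finding", 8)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // e.g. a reporter component
		defer wg.Done()
		for e := range findings {
			fmt.Println("reporter got:", e.Payload)
		}
	}()

	// Watcher and scanner publish without importing the reporter.
	bus.Publish(Event{Topic: "file.changed", Payload: "config/.env"}) // no subscribers: dropped
	bus.Publish(Event{Topic: "finding", Payload: "aws-access-key in config/.env:3"})
	bus.Close()
	wg.Wait() // prints: reporter got: aws-access-key in config/.env:3
}
```

The decoupling is the design payoff: a component can be tested by subscribing to its topics directly, with no other component running.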

Each component has its own test suite, and the full suite passes with Go’s race detector enabled. There are 14 end-to-end tests covering real-world scanning scenarios.
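The race detector matters because the detection engine processes files concurrently. Below is a minimal sketch of one race-free pattern, assuming a worker pool over in-memory stand-ins for files; the `file`/`hit` types and `scanConcurrently` are hypothetical. Workers share nothing mutable and report through a channel, so running it with `go run -race` should report no races.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"sync"
)

// file is a hypothetical in-memory stand-in for a file on disk.
type file struct{ name, content string }

// hit is one match found in one file.
type hit struct{ file, match string }

// A compiled *regexp.Regexp is safe for concurrent use by multiple goroutines.
var awsKey = regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)

// scanConcurrently fans files out to `workers` goroutines and collects hits
// over a channel, so no goroutine writes to shared state directly.
func scanConcurrently(files []file, workers int) []hit {
	jobs := make(chan file)
	results := make(chan hit)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				for _, m := range awsKey.FindAllString(f.content, -1) {
					results <- hit{f.name, m}
				}
			}
		}()
	}

	go func() { // feed jobs, then signal there are no more
		for _, f := range files {
			jobs <- f
		}
		close(jobs)
	}()

	go func() { // close results once every worker has exited
		wg.Wait()
		close(results)
	}()

	var hits []hit
	for h := range results {
		hits = append(hits, h)
	}
	// Workers finish in nondeterministic order; sort for stable output.
	sort.Slice(hits, func(i, j int) bool { return hits[i].file < hits[j].file })
	return hits
}

func main() {
	files := []file{
		{"b.env", "KEY=AKIAIOSFODNN7EXAMPLE"},
		{"a.go", "// nothing here"},
		{"c.yaml", "id: AKIAIOSFODNN7EXAMPLE"},
	}
	for _, h := range scanConcurrently(files, 4) {
		fmt.Println(h.file, h.match) // b.env ..., then c.yaml ...
	}
}
```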

One thing I learned: building components in isolation and testing them separately made the whole system dramatically easier to debug. When something broke, I knew exactly which layer to look at.

Try It

```bash
git clone https://github.com/svemulapati/CredVigil-Secrets-Scanner.git
cd CredVigil-Secrets-Scanner
go build -o credvigil ./cmd/credvigil

# Scan a directory
./credvigil scan ./your-project/

# Scan from stdin
echo 'AKIAIOSFODNN7EXAMPLE' | ./credvigil scan --stdin

# Scan full git history
./credvigil scan --git ./your-repo/
```



What’s Next

CredVigil currently has five core components fully built and tested, with more on the roadmap.

View CredVigil on GitHub: https://github.com/svemulapati/CredVigil-Secrets-Scanner

Open source, Apache 2.0 licensed. PRs, issues, and feedback welcome.

Sudeep Nag Vemulapati

Senior Site Reliability Engineer with 15+ years building scalable, resilient production systems. Building DevSecOps tooling in Go. Reach out at svemulapati@gmx.com or @svemulapati on GitHub.