The Problem That Wouldn’t Stop Bugging Me
If you’ve worked in any engineering role long enough, you’ve seen it happen: someone commits an AWS key to a repo. A .env file with production database credentials gets pushed. A Slack token shows up in a config file that was “only supposed to be local.”
I saw this constantly in my work as a performance test engineer. API keys scattered across JMeter scripts. Database connection strings hardcoded in test configs. Tokens embedded in CI/CD pipelines. And every time, the response was the same — somebody caught it (hopefully), rotated the credentials (hopefully quickly), and said “we should really scan for this stuff.”
So I built the thing.
What Is CredVigil?
CredVigil is an open-source secrets scanner written in Go. It scans codebases, config files, git history, and live file changes for exposed credentials — API keys, tokens, passwords, private keys, connection strings — across 75+ platforms and services.
But the part I’m most proud of isn’t the number of rules. It’s how it detects secrets.
Why Regex Alone Isn’t Enough
Most secrets scanners rely on regex patterns: if a string matches `AKIA[0-9A-Z]{16}`, it's probably an AWS access key. That works great for well-known formats. But what about:
- A custom internal API key that doesn’t match any known pattern?
- A base64-encoded token that looks like random noise?
- A password stored as `db_pass=Xk9$mP2!qR7@nL4`, with no standard format, pure entropy?
Regex misses all of these. That’s why CredVigil uses triple detection.
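To make the gap concrete, here's a minimal Go sketch (my own illustration, not CredVigil's actual code) showing how a classic AWS-key regex catches the well-known format but sails right past the entropy-only password above:

```go
package main

import (
	"fmt"
	"regexp"
)

// awsKeyPattern matches the classic AWS access key ID format:
// "AKIA" followed by 16 uppercase alphanumeric characters.
var awsKeyPattern = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

func main() {
	// AWS's documentation-only example key ID: matched.
	fmt.Println(awsKeyPattern.MatchString("aws_key=AKIAIOSFODNN7EXAMPLE")) // true

	// A high-entropy password with no recognizable format: missed.
	fmt.Println(awsKeyPattern.MatchString("db_pass=Xk9$mP2!qR7@nL4")) // false
}
```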
The Three Signals
| Method | What it does | What it catches |
|---|---|---|
| Regex (369 rules) | Pattern matching against compiled regexes for every known credential format | AWS, GCP, Azure, GitHub, Stripe, OpenAI, Slack, JWT, private keys, DB URIs, 75+ categories |
| Shannon Entropy | Measures information density of a string; high randomness = likely a secret | Custom API keys, random tokens, high-entropy strings with no known format |
| BPE Token Efficiency | Measures how poorly a string compresses under Byte Pair Encoding; secrets compress poorly | Passwords, opaque tokens, anything that looks random to a language model tokenizer |
These three signals combine into a confidence score from 0–100% per finding. Not a binary “this is a secret.” A score that lets you set thresholds and eliminate noise.
Zero-Trust by Design
Here’s something that always bothered me about other tools: they store the raw secret in their output. If your scanner’s report file leaks, you’ve now leaked the secrets again.
CredVigil never stores raw secrets. Every finding includes:
- A SHA-256 hash (for deduplication and tracking, not reversibility)
- A redacted preview, e.g. `wJal****EKEY`
- A fingerprint for cross-referencing across scans
- Severity, confidence score, file type, environment classification
The raw match exists only in memory during the scan. A five-stage post-processing pipeline runs on every finding before it reaches output; no raw secret leaves that sequence.
Each stage is a pure function that transforms a finding. Easy to test, easy to reason about, easy to extend. If the report file ever leaks, the secrets don't leak with it. That was a non-negotiable design goal.
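Here's a rough sketch of that pipeline shape in Go, covering three of the five stages. The `Finding` struct and stage bodies are my simplification (CredVigil's real fields and enrichment logic differ); the point is that each stage is a pure transformation and the raw secret is gone by the end:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Finding is a simplified stand-in for a scanner finding.
type Finding struct {
	Raw      string // present only during in-memory processing
	Hash     string
	Redacted string
}

// A stage is a pure function from finding to finding.
type stage func(Finding) Finding

// hashStage records a SHA-256 digest of the raw match for deduplication.
func hashStage(f Finding) Finding {
	sum := sha256.Sum256([]byte(f.Raw))
	f.Hash = hex.EncodeToString(sum[:])
	return f
}

// redactStage keeps only the first and last four characters of the match.
func redactStage(f Finding) Finding {
	if len(f.Raw) > 8 {
		f.Redacted = f.Raw[:4] + "****" + f.Raw[len(f.Raw)-4:]
	} else {
		f.Redacted = "****"
	}
	return f
}

// sanitizeStage drops the raw secret before the finding leaves memory.
func sanitizeStage(f Finding) Finding {
	f.Raw = ""
	return f
}

// run threads a finding through each stage in order.
func run(f Finding, stages ...stage) Finding {
	for _, s := range stages {
		f = s(f)
	}
	return f
}

func main() {
	f := run(Finding{Raw: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"},
		hashStage, redactStage, sanitizeStage)
	fmt.Println(f.Redacted, f.Raw == "") // wJal****EKEY true
}
```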
Architecture: Five Components, Each Independently Tested
| Component | What it does |
|---|---|
| Core Detection Engine | Regex + entropy + BPE scanning, concurrent file processing, confidence scoring |
| Secure Pipeline | Hash → redact → enrich → fingerprint → sanitize (5-stage post-processing) |
| Git Integration | Clone repos, walk commit history, diff branches, incremental scanning |
| File System Watcher | Real-time monitoring with fsnotify, debounced events, smart exclusions |
| Event Bus | Internal pub/sub for decoupled communication between components |
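The event bus is what keeps those components decoupled. A minimal synchronous version of the pattern (topic names and handler shapes here are illustrative, not CredVigil's API) looks like this:

```go
package main

import (
	"fmt"
	"sync"
)

// Bus is a minimal in-process pub/sub bus: subscribers register a
// handler per topic, and Publish fans each event out to all of them.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(event string)
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string][]func(string))}
}

// Subscribe registers a handler for a topic.
func (b *Bus) Subscribe(topic string, h func(event string)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], h)
}

// Publish delivers an event to every handler registered for the topic.
func (b *Bus) Publish(topic, event string) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, h := range b.subs[topic] {
		h(event)
	}
}

func main() {
	bus := NewBus()
	// A reporter can react to findings without importing the scanner.
	bus.Subscribe("finding", func(e string) { fmt.Println("report:", e) })
	bus.Publish("finding", "aws-key in config.yaml")
}
```

The mutex keeps concurrent subscribers and publishers race-free, which is what lets the whole system pass under `go test -race`.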
Each component has its own test suite. The whole system passes with Go’s race detector enabled. There are 14 end-to-end tests covering real-world scanning scenarios.
One thing I learned: building components in isolation and testing them separately made the whole system dramatically easier to debug. When something broke, I knew exactly which layer to look at.
Try It
Here’s what a real scan looks like — three runs, three different severity thresholds:
What I Learned Building This
- Regex is deep. Writing 369 detection rules that are precise enough to catch real secrets but not so greedy they flag every string that starts with `sk_` taught me more about regex than years of using them casually.
- Entropy isn't magic. Shannon entropy is a powerful signal, but it's not sufficient alone. Base64-encoded JSON has high entropy but isn't a secret. You need context — keyword proximity, file type, pattern structure — to make entropy useful. That's why it's one signal among several in the confidence score, not the whole answer.
- Concurrency in Go is a superpower. Scanning thousands of files needs to be fast. Go's goroutines and channels made it natural to process files concurrently while keeping the race detector happy. Getting to zero race conditions with `go test -race` was satisfying.
- The pipeline pattern is underrated. Hash → redact → enrich → fingerprint → sanitize. Each stage is a pure function that transforms a finding. Easy to test, easy to reason about, easy to extend.
- False positives are the real enemy. A scanner that fires on every string that looks vaguely like a secret is useless. Detecting placeholders (`EXAMPLE`, `changeme`, `TODO`), penalizing test fixtures, and computing confidence scores were just as important as the detection rules themselves.
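The concurrency point deserves a sketch. A worker-pool scan in Go looks roughly like this (the detector here is a stand-in substring count, not the real triple-detection engine, and the function names are mine):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// scanFiles fans file contents out to a fixed pool of worker goroutines
// and collects per-file match counts over a channel, so thousands of
// files can be scanned in parallel without shared mutable state.
func scanFiles(files map[string]string, workers int) map[string]int {
	type result struct {
		name    string
		matches int
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range jobs {
				// Stand-in detector: count a marker substring.
				results <- result{name, strings.Count(files[name], "AKIA")}
			}
		}()
	}
	// Close results once every worker has drained the job queue.
	go func() { wg.Wait(); close(results) }()
	// Feed file names to the pool, then signal no more work.
	go func() {
		for name := range files {
			jobs <- name
		}
		close(jobs)
	}()

	counts := make(map[string]int)
	for r := range results {
		counts[r.name] = r.matches
	}
	return counts
}

func main() {
	files := map[string]string{
		"a.cfg": "key=AKIAIOSFODNN7EXAMPLE",
		"b.txt": "nothing here",
	}
	fmt.Println(scanFiles(files, 4))
}
```

Because results flow back over a channel rather than into a shared map, this shape stays clean under the race detector by construction.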
What’s Next
CredVigil currently has five core components, fully built and tested. The roadmap includes:
- REST API server for CI/CD integrations
- Web dashboard for visual risk overview
- GitHub Actions pre-commit hook
- Slack / email / webhook notifications
- ML-based anomaly detection to catch secret patterns no regex rule could
View CredVigil on GitHub
Open source, Apache 2.0 licensed. PRs, issues, and feedback welcome.