The 30-second pitch
ccglass is a local reverse proxy that captures LLM API traffic from coding agent CLIs (Claude Code, Codex, DeepSeek, Kimi, etc.) and shows you a real-time dashboard of prompts, costs, and cache hit rates.
It's open source. It's 5,000 lines of Node. It's MIT licensed.
GitHub: https://github.com/jianshuo/ccglass
The constraint that shaped everything
The hardest part wasn't building a proxy. It was making it work with coding agent CLIs that deliberately bypass HTTP_PROXY.
Every native CLI (Claude Code is Node, Codex is Node, DeepSeek's CLI is Go, etc.) opens HTTPS sockets directly and doesn't honor the HTTP_PROXY env vars. So the standard "man-in-the-middle" pattern (mitmproxy, Charles) doesn't apply: those tools can only decrypt HTTPS if the client trusts their CA cert, and the CLI isn't going to trust your CA.
The trick: intercept the local loopback hop, not the wire.
The CLI's API base URL defaults to https://api.anthropic.com. We override it to http://127.0.0.1:8123. Now the local hop is plain HTTP: no cert, no TLS to intercept. The CLI's HTTP client sends its request to http://127.0.0.1:8123, where our proxy receives it, logs it, and forwards it over HTTPS to the real https://api.anthropic.com.
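To make the loopback hop concrete, here is a minimal sketch of a plain-HTTP listener that forwards to an HTTPS upstream. It is not ccglass's actual code: `upstreamOptions`, `createProxy`, and the `onRequest` logging hook are names invented for illustration, and the upstream host is simply the one used as the example above.

```javascript
const http = require('node:http');
const https = require('node:https');

// Assumption: a single upstream, taken from the example in the text.
const UPSTREAM_HOST = 'api.anthropic.com';

// Build the outbound HTTPS request options from the inbound plain-HTTP request.
function upstreamOptions(req, host = UPSTREAM_HOST) {
  return {
    host,
    port: 443,
    method: req.method,
    path: req.url,
    // Rewrite Host so the upstream sees its own name, not 127.0.0.1:8123.
    headers: { ...req.headers, host },
  };
}

// Create (but do not start) the proxy server. onRequest is a hypothetical logging hook.
function createProxy(onRequest = () => {}) {
  return http.createServer((req, res) => {
    onRequest({ method: req.method, url: req.url });
    const upstream = https.request(upstreamOptions(req), (upRes) => {
      res.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(res); // stream the response back without buffering
    });
    upstream.on('error', (err) => {
      if (!res.headersSent) res.writeHead(502);
      res.end(`upstream error: ${err.message}`);
    });
    req.pipe(upstream); // stream the request body upstream
  });
}

// Usage: createProxy(console.log).listen(8123, '127.0.0.1');
```

TLS only happens on the outbound leg, using Node's normal certificate validation, so nothing on the machine has to trust a custom CA.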
Architecture
┌─────────────┐   plain HTTP    ┌─────────────┐    HTTPS    ┌─────────────┐
│   Claude    │ ──────────────▶ │   ccglass   │ ──────────▶ │  Anthropic  │
│  Code CLI   │ 127.0.0.1:8123  │    proxy    │             │     API     │
└─────────────┘                 └─────────────┘             └─────────────┘
                                       │
                                       │ log + dashboard
                                       ▼
                                ┌─────────────┐
                                │   Browser   │
                                │  UI :8123   │
                                └─────────────┘
Three components:
- Spawn wrapper: overrides *_BASE_URL env vars, spawns the CLI as a child process
- Proxy server: logs requests, forwards upstream, captures responses (SSE streaming included)
- Web UI: real-time dashboard, fed over a WebSocket
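The spawn wrapper is the smallest of the three, and a rough sketch might look like the following. The variable names in `BASE_URL_VARS` are an assumption for illustration (each CLI documents its own), and `buildEnv` and `launch` are hypothetical helpers, not ccglass's real functions.

```javascript
const { spawn } = require('node:child_process');

// Assumed variable names; treat this list as illustrative, not exhaustive.
const BASE_URL_VARS = ['ANTHROPIC_BASE_URL', 'OPENAI_BASE_URL'];

// Return a copy of the environment with every base-URL variable pointed at the proxy.
function buildEnv(baseEnv, proxyUrl) {
  const env = { ...baseEnv };
  for (const name of BASE_URL_VARS) env[name] = proxyUrl;
  return env;
}

// Spawn the CLI as a child process that shares our terminal.
function launch(command, args = [], proxyUrl = 'http://127.0.0.1:8123') {
  const child = spawn(command, args, {
    env: buildEnv(process.env, proxyUrl),
    stdio: 'inherit', // the CLI stays fully interactive
  });
  // Mirror the child's exit code so the wrapper is transparent to the shell.
  child.on('exit', (code) => process.exit(code ?? 1));
  return child;
}

// Usage: launch('claude', process.argv.slice(3));
```

Because the override lives only in the child's environment, running the CLI without the wrapper still talks to the real API directly.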
What I learned about streaming
The trickiest part: LLM APIs use Server-Sent Events (SSE) for streaming. The CLI expects an OpenAI-style or Anthropic-style SSE stream. We need to:
- Proxy the response as a stream (no buffering)
- Tee the stream to the log file (we need every chunk)
- Compute the cost incrementally as chunks arrive (token counts come in the final chunk)
In Node, this is pipeline() with a Transform stream that hashes each chunk and writes it to a side channel. The CLI gets the original stream unchanged.
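A minimal sketch of that pass-through tee, assuming a generic `onChunk` callback as the side channel (the hashing step is left out for brevity). `tee` and `relay` are illustrative names, not ccglass's own.

```javascript
const { Transform, pipeline } = require('node:stream');

// Pass-through Transform: hand a copy of each chunk to onChunk, forward it unchanged.
function tee(onChunk) {
  return new Transform({
    transform(chunk, _encoding, callback) {
      onChunk(chunk);        // side channel: log file, cost tracker, dashboard
      callback(null, chunk); // the client receives exactly what upstream sent
    },
  });
}

// Wire upstream -> tee -> client; pipeline() propagates errors and cleans up all three.
function relay(upstream, client, log) {
  pipeline(upstream, tee((chunk) => log.write(chunk)), client, (err) => {
    log.end();
    if (err) console.error('stream failed:', err.message);
  });
}
```

Because the Transform pushes each chunk as soon as it arrives, SSE events reach the CLI with no added buffering.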
Cost calculation
Each provider has a different pricing model. Prompt caching (cache hits are billed at a different rate) and the batch API both change the math.
I extracted pricing into a JSON file (data/pricing.json) keyed by provider:model and updated monthly. The cost is computed during the response stream so you see cost accumulating in real time on the dashboard.
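As a rough illustration of the lookup, here is a sketch of a provider:model keyed table and a cost function. The entry, its field names (`input`, `output`, `cacheRead`), and the numbers are placeholders invented for this example. They are not real prices and not the actual schema of data/pricing.json.

```javascript
// Hypothetical table shaped like a "provider:model" keyed pricing file.
// Rates are placeholder values in USD per million tokens, NOT real prices.
const pricing = {
  'example:model-a': { input: 3.0, output: 15.0, cacheRead: 0.3 },
};

// Compute the cost of one request from its token usage.
// The usage field names are assumptions; real APIs report them differently per provider.
function costUsd(key, usage, table = pricing) {
  const rate = table[key];
  if (!rate) return null; // unknown model: report no cost rather than a wrong one
  const perToken = (tokens, usdPerMillion) => (tokens * usdPerMillion) / 1e6;
  return (
    perToken(usage.inputTokens ?? 0, rate.input) +
    perToken(usage.cachedInputTokens ?? 0, rate.cacheRead) +
    perToken(usage.outputTokens ?? 0, rate.output)
  );
}
```

Calling `costUsd` again each time the stream reports updated usage is one way to get a running total on the dashboard.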
MCP integration
The wild feature: ccglass has its own MCP (Model Context Protocol) server. When Claude Code starts, it can call our MCP tools. One of them is get_recent_requests — Claude can query its own request history from inside the chat.
User: what did I prompt you with 3 turns ago?
Claude: [calls ccglass MCP get_recent_requests]
Claude: You prompted me with "refactor the user service to use the new repository pattern".
It's recursive and weird. I love it.
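Under the hood, an MCP tool call is a JSON-RPC 2.0 request with method tools/call. Here is a sketch of how a get_recent_requests handler could answer one, without the official SDK. The in-memory buffer, the `record` helper, and the `limit` argument are assumptions made for this example, not ccglass's real interface.

```javascript
// Hypothetical in-memory history of proxied requests (oldest first).
const recent = [];

// Keep at most `max` entries, dropping the oldest.
function record(entry, max = 100) {
  recent.push(entry);
  if (recent.length > max) recent.shift();
}

// Answer a JSON-RPC 2.0 "tools/call" request for the get_recent_requests tool.
function handleToolCall(rpc) {
  const fail = (code, message) => ({ jsonrpc: '2.0', id: rpc.id, error: { code, message } });
  if (rpc.method !== 'tools/call') return fail(-32601, 'Method not found');

  const { name, arguments: args = {} } = rpc.params ?? {};
  if (name !== 'get_recent_requests') return fail(-32602, `Unknown tool: ${name}`);

  const limit = args.limit ?? 5; // assumed optional argument
  const items = recent.slice(Math.max(0, recent.length - limit));
  // MCP tool results carry a list of content items; here, one text item holding JSON.
  return {
    jsonrpc: '2.0',
    id: rpc.id,
    result: { content: [{ type: 'text', text: JSON.stringify(items) }] },
  };
}
```

The proxy would call `record` for every request it logs, so the tool only ever reads data that is already on the local machine.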
What's next
- More providers — every new coding agent CLI that ships will need a config
- Cost forecasting — given your usage pattern, predict next month's bill
- Team sharing: local mode stays local, but you can opt in to share specific sessions with teammates (end-to-end encrypted)
Try it
npm i -g ccglass
ccglass claude
Open the dashboard. Run a few prompts. The first time you see your own cache hit rate, you'll get it.