侯垒

Posted on Jun 17

Building ccglass: the architecture of a local LLM reverse proxy

#node #claudecode #opensource #tutorial

The 30-second pitch

ccglass is a local reverse proxy that captures LLM API traffic from coding agent CLIs (Claude Code, Codex, DeepSeek, Kimi, etc.) and shows you a real-time dashboard of prompts, costs, and cache hit rates.

It's open source. It's 5,000 lines of Node. It's MIT licensed.

GitHub: https://github.com/jianshuo/ccglass

The constraint that shaped everything

The hardest part wasn't building a proxy. It was making it work with coding agent CLIs that ignore HTTP_PROXY.

Every native CLI (Claude Code is Node, Codex is Node, DeepSeek's CLI is Go, etc.) opens HTTPS sockets directly. They don't honor HTTP_PROXY env vars. So the standard "man-in-the-middle" pattern (mitmproxy, Charles) doesn't apply: those tools need the client to route traffic through them and to trust their CA cert before they can intercept HTTPS, and the CLI isn't going to do either.

The trick: intercept the local loopback hop, not the wire.

The CLI's API base URL is https://api.anthropic.com. We override it to http://127.0.0.1:8123. Now the local hop is plain HTTP: no cert, no TLS, nothing to decrypt. The CLI's HTTP client sends its request to http://127.0.0.1:8123, where our proxy receives it, logs it, and forwards it over HTTPS to the real https://api.anthropic.com.
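To make the loopback hop concrete, here is a minimal sketch of a plain-HTTP listener that forwards to an HTTPS upstream. It is not ccglass's actual code: the names upstreamOptions, createProxy, and the onRequest logging hook are made up for illustration, and it assumes the upstream is always HTTPS.

```javascript
const http = require('node:http');
const https = require('node:https');

// Upstream from the example above; other CLIs would point elsewhere.
const UPSTREAM = new URL('https://api.anthropic.com');

// Map an incoming plain-HTTP request onto options for the HTTPS upstream call.
function upstreamOptions(req, upstream = UPSTREAM) {
  // Copy the headers, but point Host at the real API instead of 127.0.0.1.
  const headers = { ...req.headers, host: upstream.host };
  return {
    protocol: upstream.protocol,
    hostname: upstream.hostname,
    port: Number(upstream.port) || 443,
    method: req.method,
    path: req.url,
    headers,
  };
}

// Hypothetical proxy factory: onRequest is an illustrative logging hook.
function createProxy(upstream = UPSTREAM, onRequest = () => {}) {
  return http.createServer((req, res) => {
    onRequest({ method: req.method, url: req.url });
    const upstreamReq = https.request(upstreamOptions(req, upstream), (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res); // stream the body back without buffering
    });
    upstreamReq.on('error', (err) => {
      if (!res.headersSent) res.writeHead(502, { 'content-type': 'text/plain' });
      res.end(`upstream error: ${err.message}`);
    });
    req.pipe(upstreamReq); // stream the request body upstream
  });
}

// Usage: createProxy().listen(8123, '127.0.0.1');
```

Binding to 127.0.0.1 rather than 0.0.0.0 keeps the unencrypted hop on the local machine, which is what makes dropping TLS on that hop acceptable.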

Architecture

┌─────────────┐   plain HTTP    ┌─────────────┐    HTTPS    ┌─────────────┐
│  Claude     │ ──────────────▶ │  ccglass    │ ──────────▶ │ Anthropic   │
│  Code CLI   │  127.0.0.1:8123 │  proxy      │             │ API         │
└─────────────┘                 └─────────────┘             └─────────────┘
                                       │
                                       │ log + dashboard
                                       ▼
                                ┌─────────────┐
                                │  Browser    │
                                │  UI :8123   │
                                └─────────────┘

3 components:

Spawn wrapper — overrides *_BASE_URL env vars, spawns the CLI as a child process
Proxy server — logs requests, forwards upstream, captures responses (SSE streaming included)
Web UI — real-time dashboard, web-socket fed
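The spawn wrapper is the simplest of the three, so here is a rough sketch of the idea. The CLI-to-variable mapping is an assumption for illustration (check each CLI's docs for the variable it actually reads), and buildEnv and runWrapped are hypothetical names, not ccglass's API.

```javascript
const { spawn } = require('node:child_process');

// Assumed mapping from CLI name to the env var that sets its API base URL.
const BASE_URL_VARS = {
  claude: 'ANTHROPIC_BASE_URL',
  codex: 'OPENAI_BASE_URL',
};

// Build the child's environment: everything from the parent, plus the override.
function buildEnv(cli, proxyUrl, baseEnv = process.env) {
  const name = BASE_URL_VARS[cli];
  if (!name) throw new Error(`no base-URL variable known for "${cli}"`);
  return { ...baseEnv, [name]: proxyUrl };
}

// Spawn the CLI as a child process that talks to the local proxy.
function runWrapped(cli, args = [], proxyUrl = 'http://127.0.0.1:8123') {
  const child = spawn(cli, args, {
    env: buildEnv(cli, proxyUrl),
    stdio: 'inherit', // the CLI keeps full control of the terminal
  });
  child.on('error', (err) => {
    console.error(`failed to start ${cli}: ${err.message}`);
    process.exitCode = 1;
  });
  child.on('exit', (code, signal) => {
    process.exitCode = code ?? (signal ? 1 : 0);
  });
  return child;
}

// Usage: runWrapped('claude', process.argv.slice(2));
```

Because the override lives only in the child's environment, the user's shell is untouched and running the CLI without the wrapper still goes straight to the real API.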

What I learned about streaming

The trickiest part: LLM APIs use Server-Sent Events (SSE) for streaming. The CLI expects an OpenAI-style or Anthropic-style SSE stream, delivered chunk by chunk as the model generates. We need to:

Proxy the response as a stream (no buffering)
Tee the stream to the log file (we need every chunk)
Compute the cost incrementally as chunks arrive (the final token counts only arrive at the end of the stream, so the total settles there)
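Network chunks don't line up with SSE events, so the third step needs a parser that copes with events split across chunks. The sketch below is one way to do it, not necessarily how ccglass does it. createSseParser and extractUsage are hypothetical names, and where the usage object sits in the payload is an assumption that varies by provider.

```javascript
// Incremental SSE parser: feed() accepts arbitrary text chunks and returns
// the events completed so far. Only "event:" and "data:" fields are handled.
function createSseParser() {
  let buffer = '';
  return function feed(text) {
    // Normalize CRLF on the whole buffer so a split "\r" + "\n" still works.
    buffer = (buffer + text).replace(/\r\n/g, '\n');
    const events = [];
    let end;
    while ((end = buffer.indexOf('\n\n')) !== -1) {
      const block = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      let event = 'message'; // SSE default when no "event:" line is present
      const data = [];
      for (const line of block.split('\n')) {
        if (line.startsWith('event:')) event = line.slice(6).trim();
        else if (line.startsWith('data:')) data.push(line.slice(5).replace(/^ /, ''));
      }
      if (data.length) events.push({ event, data: data.join('\n') });
    }
    return events;
  };
}

// Assumption: usage appears either at the top level of the JSON payload
// or nested under "message". Non-JSON payloads such as "[DONE]" yield null.
function extractUsage(sseEvent) {
  try {
    const payload = JSON.parse(sseEvent.data);
    return payload?.usage ?? payload?.message?.usage ?? null;
  } catch {
    return null;
  }
}
```

The parser only ever reads a copy of the bytes. The CLI still receives the original stream, so a parsing bug can break the dashboard but not the agent.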

In Node, this is stream.pipeline() with a Transform stream that hashes each chunk, writes it to a side channel, and passes it through untouched. The CLI gets the original stream unchanged.
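A minimal sketch of that tee, assuming SHA-256 for the per-chunk hash and a plain callback as the side channel. Both are my choices for illustration, and createTee and teeResponse are hypothetical names.

```javascript
const { Transform, pipeline } = require('node:stream');
const crypto = require('node:crypto');

// Pass-through Transform: every chunk is hashed and handed to onChunk
// (the side channel), then forwarded downstream byte for byte.
function createTee(onChunk) {
  return new Transform({
    transform(chunk, _encoding, callback) {
      const sha256 = crypto.createHash('sha256').update(chunk).digest('hex');
      onChunk(chunk, sha256);
      callback(null, chunk); // forward the original chunk unmodified
    },
  });
}

// Wire upstream -> tee -> client. pipeline() propagates errors and
// destroys all three streams if any one of them fails.
function teeResponse(upstreamRes, clientRes, onChunk, done = () => {}) {
  return pipeline(upstreamRes, createTee(onChunk), clientRes, done);
}

// Usage inside a proxy handler (logStream is a hypothetical fs.WriteStream):
//   teeResponse(upstreamRes, res, (chunk) => logStream.write(chunk));
```

Using pipeline() instead of chained .pipe() calls matters here. If the CLI disconnects mid-stream, the upstream request is torn down too instead of continuing to stream tokens nobody reads.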

Cost calculation

Each provider has a different pricing model. Cache hits, prompt caching, and the batch API all change the math.

I extracted pricing into a JSON file (data/pricing.json), keyed by provider:model, which I update monthly. Cost is computed while the response streams, so you see it accumulate in real time on the dashboard.
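As an illustration of the lookup, here is a small sketch of a provider:model pricing table and the cost function on top of it. The schema (USD per million tokens, separate cache read and write rates) and every number in it are placeholders I made up, not the contents of data/pricing.json and not real prices.

```javascript
// Hypothetical pricing table, USD per million tokens. Placeholder values.
const PRICING = {
  'example-provider:example-model': { input: 2, output: 10, cacheRead: 1, cacheWrite: 4 },
};

// usage holds token counts; missing fields count as zero.
// Returns the cost in USD, or null when the model is not in the table.
function costUSD(pricing, key, usage) {
  const rates = pricing[key];
  if (!rates) return null;
  const total =
    (usage.input || 0) * rates.input +
    (usage.output || 0) * rates.output +
    (usage.cacheRead || 0) * rates.cacheRead +
    (usage.cacheWrite || 0) * rates.cacheWrite;
  return total / 1e6; // rates are per million tokens
}

// Usage: call again whenever a new usage update arrives on the stream.
const cost = costUSD(PRICING, 'example-provider:example-model', {
  input: 1_000_000,
  output: 500_000,
  cacheRead: 2_000_000,
});
console.log(cost); // 9
```

Returning null for an unknown model, rather than 0, lets the dashboard show "price unknown" instead of silently under-reporting the bill.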

MCP integration

The wild feature: ccglass has its own MCP (Model Context Protocol) server. When Claude Code starts, it can call our MCP tools. One of them is get_recent_requests — Claude can query its own request history from inside the chat.

User: what did I prompt you with 3 turns ago?
Claude: [calls ccglass MCP get_recent_requests]
Claude: You prompted me with "refactor the user service to use the new repository pattern".

It's recursive and weird. I love it.

What's next

More providers — every new coding agent CLI that ships will need a config
Cost forecasting — given your usage pattern, predict next month's bill
Team sharing — local mode stays local, but opt-in to share specific sessions with teammates (encrypted, E2E)

Try it

npm i -g ccglass
ccglass claude

Open the dashboard. Run a few prompts. The first time you see your own cache hit rate, you'll get it.