Product @ Pre6 AI · I build the machinery behind AI products. · hritikd.github.io
Product by title, builder by craft — I design AI products and ship the engineering behind them: from-scratch LLM internals, agent systems, and the infrastructure that makes them reliable.
Three repos that build a language model's core machinery from first principles — pure Python/NumPy, no frameworks, every piece verified against ground truth, each with a live in-browser demo.
| # | Project | What it builds | Proof |
|---|---|---|---|
| 1 | mosaic | The tokenizer — train a real byte-pair-encoding tokenizer on your own text and watch strings break into token tiles. | lossless decode(encode(x)) == x invariant · live studio |
| 2 | nabla | The autograd engine — reverse-mode automatic differentiation, with a visualizer that animates gradients flowing backward through the graph. | gradient-checked to ~1e-10 · live demo |
| 3 | loom | The GPT itself — a transformer in pure NumPy, every gradient derived by hand, trained on Shakespeare. Generate text and inspect what each attention head looked at. | hand-derived gradients checked to ~1e-8 · KV-cache decode proven identical to full forward · live demo |
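
For readers unfamiliar with the "gradient-checked" claims above, here is a minimal sketch of the general technique. It is not code from nabla or loom: the toy function `f`, the helper names, and the step size `eps` are all illustrative assumptions. A hand-derived gradient is compared against a central-difference estimate, and the check passes when the relative error is tiny.

```python
import math


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx_i for each coordinate of x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def f(x):
    # Toy scalar function (an assumption for this sketch): sum_i tanh(x_i)^2
    return sum(math.tanh(v) ** 2 for v in x)


def analytic_grad(x):
    # Hand-derived: d/dx tanh(x)^2 = 2 * tanh(x) * (1 - tanh(x)^2)
    return [2 * math.tanh(v) * (1 - math.tanh(v) ** 2) for v in x]


def max_rel_error(a, b):
    """Largest relative difference between two gradient vectors."""
    return max(abs(p - q) / max(1e-12, abs(p) + abs(q)) for p, q in zip(a, b))


x = [0.3, -1.2, 2.0]
err = max_rel_error(analytic_grad(x), numerical_grad(f, x))
print(f"max relative error: {err:.2e}")
```

The tolerance that counts as a pass depends on the step size and floating-point precision, so the thresholds quoted in the table belong to those repos' own test setups, not to this sketch.
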
The half of AI products that decides whether they survive real users. Zero-dependency, deterministic, tested — clone and run in minutes, no API keys.
| Project | What it does |
|---|---|
| tinycoder | A real coding agent in ~570 lines — the agentic loop behind Cursor/Claude Code, from scratch: tool use, sandboxed workspace, confirmation gates. Fully unit-tested via an injected fake model client. |
| stencil | Constrained decoding — compiles a JSON Schema to a DFA and masks tokens so invalid LLM output is impossible (100% valid by construction vs ~0% unconstrained). Live demo |
| winnow | Budget-aware context compression for RAG/agents: BM25 relevance plus MMR diversity packs the highest-signal chunks into a token budget (0.75 gold recall vs 0.19 for plain truncation). Live demo |
| introspect-mcp | MCP server that stops coding agents hallucinating APIs — introspects the exact package versions installed in your project and serves real signatures and source over MCP. |
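
To make the constrained-decoding idea in the stencil row concrete, here is a rough sketch of DFA-based token masking. It is not stencil's API: the DFA is written by hand rather than compiled from a JSON Schema, it accepts only the strings `true` and `false`, and the vocabulary and scores stand in for a real model's logits. At each step, only tokens that keep the DFA alive are eligible, so the output cannot leave the accepted language.

```python
# Hypothetical hand-written DFA accepting exactly "true" or "false".
TRANSITIONS = {
    ("start", "t"): "t", ("t", "r"): "tr", ("tr", "u"): "tru", ("tru", "e"): "done",
    ("start", "f"): "f", ("f", "a"): "fa", ("fa", "l"): "fal",
    ("fal", "s"): "fals", ("fals", "e"): "done",
}
ACCEPTING = {"done"}

# Toy vocabulary with multi-character tokens, plus fake "model" scores.
VOCAB = ["yes", "no", "tr", "ue", "fal", "se"] + list("truefals")
SCORES = {"yes": 9.0, "no": 8.0, "fal": 3.0, "tr": 2.0, "se": 1.5}


def score(token):
    return SCORES.get(token, 0.0)


def advance(state, token):
    """Run the DFA over each character of a token; None means the token is invalid here."""
    for ch in token:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return None
    return state


def allowed_tokens(state, vocab):
    """The mask: tokens that do not drive the DFA into a dead state."""
    return [tok for tok in vocab if advance(state, tok) is not None]


def constrained_decode(vocab, max_steps=10):
    state, out = "start", ""
    for _ in range(max_steps):
        if state in ACCEPTING:
            break
        # Greedy choice among the tokens the mask allows.
        tok = max(allowed_tokens(state, vocab), key=score)
        out += tok
        state = advance(state, tok)
    return out, state in ACCEPTING


unconstrained = max(VOCAB, key=score)       # the fake model's favorite token
text, valid = constrained_decode(VOCAB)
print(unconstrained, text, valid)
```

Here the unconstrained pick is `yes`, which the DFA rejects, while the masked decode can only emit a string the DFA accepts. The sketch relies on every live state having at least one allowed token; a production implementation would have to guarantee that for its own vocabulary.
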
More — retrieval, safety, evals, and systems work
- warren — from-scratch HNSW vector index (the algorithm behind FAISS/Qdrant): recall@10 ≥ 0.99 scanning ~5% of the database · live demo
- mend — repairs malformed LLM JSON (fences, trailing commas, truncation): 16/16 real-world defects recovered vs stdlib's 0 · live playground
- semcache — semantic cache for LLM calls: similar prompt in, cached response out; pluggable embedders/stores, TTL, LRU
- verdict — adversarial LLM red-teaming: PAIR, Crescendo, and injection attacks with attack-success-rate reports
- hermes — test-time compute scaling: o1-style reasoning search via process reward models, MCTS, beam search
- agent-evals-lab — evaluation workbench for agent reliability with a trace-inspection dashboard · live demo
- rag-safety-gateway — scans RAG context for prompt injection, secrets, and PII before it reaches a model · live demo
- gemma4-multi-agent — supervisor + 4 specialist agents with live reasoning traces (LangGraph + Gemini)
- tally — streaming statistics in bounded memory: Welford, reservoir sampling, Count-Min sketch, Misra-Gries
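
As an illustration of the bounded-memory idea in the tally entry, here is a minimal sketch of Welford's online algorithm for mean and variance. The class and method names are hypothetical and not taken from tally. It keeps three numbers no matter how many values stream through.

```python
class RunningStats:
    """Welford's online mean/variance: O(1) memory, one pass over the data."""

    def __init__(self):
        self.n = 0       # number of values seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the updated mean, which keeps the update numerically stable.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.push(value)
print(stats.mean, stats.variance)
```

The single-pass update avoids storing the stream and sidesteps the cancellation error of the naive sum-of-squares formula.
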
Derived, not imported → if I claim to understand it, I can build it from scratch
Verified, not vibes → gradients checked numerically, benchmarks reproducible, invariants unit-tested
Honest READMEs → every project states its limitations before you find them
Reviewable in 60 seconds → clone, run, understand — zero API keys to start
Python · NumPy · TypeScript · LangGraph · React · Streamlit · MCP · pytest · GitHub Actions · uv
Open to conversations on AI agent engineering, LLM internals, and evals.

