Hritikd/README.md

Hritik Datta

Product @ Pre6 AI  ·  I build the machinery behind AI products.  ·  hritikd.github.io

Product by title, builder by craft — I design AI products and ship the engineering behind them: from-scratch LLM internals, agent systems, and the infrastructure that makes them reliable.


The LLM stack, from scratch

Three repos that build a language model's core machinery from first principles — pure Python/NumPy, no frameworks, every piece verified against ground truth, each with a live in-browser demo.

|   | Project | What it builds | Proof |
|---|---------|----------------|-------|
| 1 | mosaic | The tokenizer: train a real byte-pair-encoding tokenizer on your own text and watch strings break into token tiles. | lossless `decode(encode(x)) == x` invariant · live studio |
| 2 | nabla | The autograd engine: reverse-mode automatic differentiation, with a visualizer that animates gradients flowing backward through the graph. | gradient-checked to ~1e-10 · live demo |
| 3 | loom | The GPT itself: a transformer in pure NumPy, every gradient derived by hand, trained on Shakespeare. Generate text and inspect what each attention head looked at. | hand-derived gradients checked to ~1e-8 · KV-cache decode proven identical to full forward · live demo |
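The lossless round-trip invariant listed for mosaic can be illustrated with a minimal byte-level BPE sketch. This is not mosaic's code: the function names (`train_bpe`, `merge`, `encode`, `decode`) and the choice to start from raw UTF-8 bytes are assumptions made for illustration.

```python
from collections import Counter

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text: str, num_merges: int) -> list[tuple[int, int]]:
    """Learn up to `num_merges` merges over UTF-8 bytes; returns the ordered merge list."""
    ids = list(text.encode("utf-8"))
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        ids = merge(ids, pair, 256 + len(merges))
        merges.append(pair)
    return merges

def encode(text: str, merges) -> list[int]:
    """Apply the learned merges in training order."""
    ids = list(text.encode("utf-8"))
    for k, pair in enumerate(merges):
        ids = merge(ids, pair, 256 + k)
    return ids

def decode(ids, merges) -> str:
    """Expand each token id back to its byte string, then decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for k, (a, b) in enumerate(merges):
        vocab[256 + k] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")

corpus = "the cat sat on the mat"
merges = train_bpe(corpus, num_merges=10)
for sample in [corpus, "unseen text", "héllo 🎯", ""]:
    assert decode(encode(sample, merges), merges) == sample
```

Because every merged token expands to exactly the bytes it replaced, the round trip holds for any input string, including text the tokenizer never saw during training.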

LLM engineering, the unglamorous parts

The half of AI products that decides whether they survive real users. Zero-dependency, deterministic, tested — clone and run in minutes, no API keys.

| Project | What it does |
|---------|--------------|
| tinycoder | A real coding agent in ~570 lines, rebuilding from scratch the agentic loop behind Cursor/Claude Code: tool use, sandboxed workspace, confirmation gates. Fully unit-tested via an injected fake model client. |
| stencil | Constrained decoding: compiles a JSON Schema to a DFA and masks tokens so invalid LLM output is impossible (100% valid by construction vs ~0% unconstrained). Live demo |
| winnow | Budget-aware context compression for RAG/agents: BM25 relevance + MMR diversity packs the highest-signal chunks into a token budget (0.75 vs 0.19 gold recall against truncation). Live demo |
| introspect-mcp | MCP server that stops coding agents hallucinating APIs: introspects the exact package versions installed in your project and serves real signatures and source over MCP. |
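The token-masking idea behind constrained decoding can be sketched as follows. This is an illustration under loud assumptions, not stencil's implementation: a tiny trie over the words `true` and `false` stands in for a DFA compiled from a JSON Schema, and the vocabulary and the `score` function (a fake model that simply prefers longer tokens) are hypothetical.

```python
def build_dfa(words):
    """Build a trie-shaped DFA: transitions map (state, char) -> state."""
    trans, accept, next_state = {}, set(), 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in trans:
                trans[(state, ch)] = next_state
                next_state += 1
            state = trans[(state, ch)]
        accept.add(state)
    return trans, accept

def advance(trans, state, token):
    """Feed a multi-character token through the DFA; None means it hit a dead end."""
    for ch in token:
        state = trans.get((state, ch))
        if state is None:
            return None
    return state

def allowed_tokens(trans, state, vocab):
    """The mask: keep only tokens that leave the DFA in a live state."""
    return [t for t in vocab if advance(trans, state, t) is not None]

def constrained_decode(trans, accept, vocab, score):
    """Greedy decoding restricted to the mask at every step."""
    state, out = 0, []
    while state not in accept:
        # Assumes the vocabulary can always extend a live state (true here,
        # because every single character of both words is in the vocabulary).
        token = max(allowed_tokens(trans, state, vocab), key=score)
        out.append(token)
        state = advance(trans, state, token)
    return "".join(out)

trans, accept = build_dfa(["true", "false"])
vocab = ["tr", "ue", "fal", "se", "t", "r", "u", "e", "f", "a", "l", "s", "maybe", "{"]
score = len  # hypothetical stand-in for model logits

unconstrained = max(vocab, key=score)                        # "maybe" (invalid)
constrained = constrained_decode(trans, accept, vocab, score)  # "false"
```

The fake model's top choice is an invalid token, yet the masked decode can only ever emit a string the automaton accepts, which is the sense in which output is valid by construction.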
More — retrieval, safety, evals, and systems work
  • warren — from-scratch HNSW vector index (the algorithm behind FAISS/Qdrant): recall@10 ≥ 0.99 scanning ~5% of the database · live demo
  • mend — repairs malformed LLM JSON (fences, trailing commas, truncation): 16/16 real-world defects recovered vs stdlib's 0 · live playground
  • semcache — semantic cache for LLM calls: similar prompt in, cached response out; pluggable embedders/stores, TTL, LRU
  • verdict — adversarial LLM red-teaming: PAIR, Crescendo, and injection attacks with attack-success-rate reports
  • hermes — test-time compute scaling: o1-style reasoning search via process reward models, MCTS, beam search
  • agent-evals-lab — evaluation workbench for agent reliability with a trace-inspection dashboard · live demo
  • rag-safety-gateway — scans RAG context for prompt injection, secrets, and PII before it reaches a model · live demo
  • gemma4-multi-agent — supervisor + 4 specialist agents with live reasoning traces (LangGraph + Gemini)
  • tally — streaming statistics in bounded memory: Welford, reservoir sampling, Count-Min sketch, Misra-Gries
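As an illustration of the bounded-memory statistics listed for tally, here is a minimal sketch of Welford's online algorithm for mean and variance. The class name and interface are hypothetical and not taken from the tally repo.

```python
class Welford:
    """Running mean and variance in O(1) memory (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0        # number of values seen
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 with fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = Welford()
for value in stream:
    stats.update(value)
```

Only three numbers are stored no matter how long the stream is, and the update avoids the cancellation problems of the naive sum-of-squares formula.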

How I build

Derived, not imported     →  if I claim to understand it, I can build it from scratch
Verified, not vibes       →  gradients checked numerically, benchmarks reproducible, invariants unit-tested
Honest READMEs            →  every project states its limitations before you find them
Reviewable in 60 seconds  →  clone, run, understand — zero API keys to start
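"Gradients checked numerically" usually means comparing a hand-derived gradient against central finite differences. A minimal sketch, assuming a small hypothetical function `f` and helper names that do not come from any of the repos above:

```python
import math

def f(w):
    """Hypothetical scalar function of two parameters."""
    x, y = w
    return math.tanh(x * y + x)

def grad_f(w):
    """Hand-derived gradient of f via the chain rule."""
    x, y = w
    s = 1.0 - math.tanh(x * y + x) ** 2  # derivative of tanh at the pre-activation
    return [s * (y + 1.0), s * x]

def numerical_grad(func, w, eps=1e-5):
    """Central-difference estimate of the gradient, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        hi, lo = list(w), list(w)
        hi[i] += eps
        lo[i] -= eps
        grad.append((func(hi) - func(lo)) / (2 * eps))
    return grad

def max_rel_error(a, b):
    """Largest relative difference between two gradient vectors."""
    return max(abs(p - q) / max(1e-12, abs(p) + abs(q)) for p, q in zip(a, b))

w = [0.3, -0.7]
error = max_rel_error(grad_f(w), numerical_grad(f, w))
assert error < 1e-6
```

The tolerance here is deliberately loose; how tight a check can be depends on the step size and floating-point precision.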

Stack

Python · NumPy · TypeScript · LangGraph · React · Streamlit · MCP · pytest · GitHub Actions · uv


Open to conversations on AI agent engineering, LLM internals, and evals.

Pinned

  1. ai-code-reviewer (Public)

    AI-powered code review CLI tool — get structured feedback on any code file using GPT-4o

    Python

  2. agent-evals-lab (Public)

    AI agent evaluation workbench for reliability, safety, tool-use, latency, and cost

    TypeScript

  3. contract-watch (Public)

    CLI that diffs two OpenAPI contracts and flags breaking API changes before they reach clients — CI-friendly

    TypeScript

  4. gemma4-multi-agent (Public)

    Production-ready multi-agent AI system — Supervisor + 4 specialist agents — Google Gemini, LangGraph & Streamlit

    Python

  5. rag-safety-gateway (Public)

    AI security gateway for scanning RAG context for prompt injection, secrets, PII, and exfiltration risk

    TypeScript

  6. repo-pulse (Public)

    CLI that turns any Git repo into an engineering-health report — churn × complexity hotspot scoring

    Python