You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We now have a precise interprocedural call graph available two ways: in-memory as a NetworkX DiGraph, and — with the new TypeScript Neo4j backend (TSNeo4jBackend) — as a property graph queryable with Cypher. On top of that call graph we should expose a small, composable family of reachability queries whose purpose is source↔sink matching.
This is intentionally broader than a single "reverse reachability" method. The objective is a coherent API in which sources and sinks are just sets of call-graph nodes, and we can answer the three fundamental questions between them (forward, reverse, and witness paths). Source↔sink matching is the headline use case; the three primitives compose into it.
Motivation
"Can untrusted input reach a dangerous operation, and how?" is the core question behind taint/data-flow-style analysis, attack-surface mapping, and security triage. With the call graph as the substrate we can answer the call-graph-reachability approximation of this cheaply — especially over Neo4j, where transitive reachability is exactly what the engine is built for.
Terminology
Source — a callable where interesting/untrusted data originates: an HTTP route handler / entrypoint, process.env, req.body, a deserialization entry, etc.
Sink — a callable or external (phantom/library) symbol where it is dangerous/interesting for data to arrive: child_process.exec, fs.writeFile, a DB driver call, node:crypto.*, etc.
Reaches — a reaches b iff there is a directed path a -CALLS*-> b in the call graph.
Sources and sinks should be expressible flexibly: explicit signature sets, predicates (e.g. "every External symbol whose module is node:child_process"), or built-in catalogs (entrypoints as sources; a curated list of dangerous externals as sinks). The graph already distinguishes internal callables from External phantom symbols and marks entrypoints, so these catalogs are largely already present.
Many-to-many matching(composed) — given a set of sources and a set of sinks, return the connected (source, sink) pairs, optionally each with a witness path. This is the headline "source↔sink matching" capability; it is built on 1–3.
Why this fits the graph backends
The key insight is that reverse reachability needs no new data — it is the forward query with the edge direction flipped. The same holds across backends:
Operation
Neo4j (Cypher)
In-memory (NetworkX)
Forward
MATCH (s {signature:$src})-[:CALLS*1..]->(t)
nx.descendants(G, src)
Reverse
MATCH (s)-[:CALLS*1..]->(t {signature:$sink})
nx.ancestors(G, sink)
Paths
shortestPath(...), allShortestPaths(...), APOC apoc.path.* for bounded k-paths
Neo4j makes the transitive case a one-liner and scales it; the in-memory backends give identical semantics for languages without a graph DB.
Design notes / open questions
Where it lives: these belong on the backend contract (TSAnalysisBackend / JavaAnalysisBackend / PythonAnalysisBackend) and surfaced on the façades, so every language with a call graph gets them — Neo4j just makes them cheap. (The backend ABCs landed in Add Neo4j-backed TypeScript analysis backend and formalize backend contracts #159.)
Parameters worth supporting: max_depth / hop bound; max_paths; shortest-only vs. all; include/exclude External (phantom) nodes; filter by edge provenance / dispatch tags; restrict to a module subtree.
Return shapes: matched node signatures, and where relevant the witness path as a list of signatures plus per-edge metadata (so a result is auditable, not just a boolean).
Sources/sinks as predicates and named catalogs, not just explicit sets (proposed: yes).
Bounding: simple-path enumeration can explode; default to shortest/k-bounded with an explicit opt-in for exhaustive enumeration, and always surface when results were truncated.
Summary
We now have a precise interprocedural call graph available two ways: in-memory as a NetworkX
DiGraph, and — with the new TypeScript Neo4j backend (TSNeo4jBackend) — as a property graph queryable with Cypher. On top of that call graph we should expose a small, composable family of reachability queries whose purpose is source↔sink matching.This is intentionally broader than a single "reverse reachability" method. The objective is a coherent API in which sources and sinks are just sets of call-graph nodes, and we can answer the three fundamental questions between them (forward, reverse, and witness paths). Source↔sink matching is the headline use case; the three primitives compose into it.
Motivation
"Can untrusted input reach a dangerous operation, and how?" is the core question behind taint/data-flow-style analysis, attack-surface mapping, and security triage. With the call graph as the substrate we can answer the call-graph-reachability approximation of this cheaply — especially over Neo4j, where transitive reachability is exactly what the engine is built for.
Terminology
process.env,req.body, a deserialization entry, etc.child_process.exec,fs.writeFile, a DB driver call,node:crypto.*, etc.areachesbiff there is a directed patha -CALLS*-> bin the call graph.Sources and sinks should be expressible flexibly: explicit signature sets, predicates (e.g. "every
Externalsymbol whose module isnode:child_process"), or built-in catalogs (entrypoints as sources; a curated list of dangerous externals as sinks). The graph already distinguishes internal callables fromExternalphantom symbols and marks entrypoints, so these catalogs are largely already present.The family of operations
(source, sink)pairs, optionally each with a witness path. This is the headline "source↔sink matching" capability; it is built on 1–3.Why this fits the graph backends
The key insight is that reverse reachability needs no new data — it is the forward query with the edge direction flipped. The same holds across backends:
MATCH (s {signature:$src})-[:CALLS*1..]->(t)nx.descendants(G, src)MATCH (s)-[:CALLS*1..]->(t {signature:$sink})nx.ancestors(G, sink)shortestPath(...),allShortestPaths(...), APOCapoc.path.*for bounded k-pathsnx.shortest_path,nx.all_shortest_paths,nx.all_simple_paths(cutoff=…)Neo4j makes the transitive case a one-liner and scales it; the in-memory backends give identical semantics for languages without a graph DB.
Design notes / open questions
TSAnalysisBackend/JavaAnalysisBackend/PythonAnalysisBackend) and surfaced on the façades, so every language with a call graph gets them — Neo4j just makes them cheap. (The backend ABCs landed in Add Neo4j-backed TypeScript analysis backend and formalize backend contracts #159.)max_depth/ hop bound;max_paths; shortest-only vs. all; include/excludeExternal(phantom) nodes; filter by edgeprovenance/ dispatch tags; restrict to a module subtree.k-bounded with an explicit opt-in for exhaustive enumeration, and always surface when results were truncated.Tracking
The composed many-to-many "matching matrix" is the end goal and can be a thin convenience over the three once they exist.