A Python static-analysis toolkit: the CLDK backend that emits a canonical symbol table and call graph, as analysis.json or a Neo4j property graph.
canpy is a static analyzer for Python built on Jedi, with optional CodeQL-resolved call edges and
Tree-sitter parsing. It produces the canonical CodeLLM-DevKit (CLDK) analysis.json (a symbol table
plus a call graph) and can project that same analysis into a Neo4j property graph. It is the Python
backend behind CLDK, mirroring its TypeScript (cants) and Java siblings.
Every run produces a symbol table and a call graph. Edges come from Jedi's lexical resolution by
default; --codeql resolves additional edges (RPC / third-party / dynamically-dispatched targets)
and merges them with the Jedi-derived edges, also backfilling callees Jedi could not resolve.
- Symbol table: modules, classes, functions, methods, variables, decorators, imports, and docstrings, with precise source spans.
- Call graph: Jedi's lexical resolver by default, with optional CodeQL-resolved edges (--codeql) for RPC / third-party / dynamically-dispatched targets, merged with the Jedi edges; CodeQL also backfills callees Jedi could not resolve.
- Neo4j output: project the analysis into a labeled property graph, either a self-contained graph.cypher snapshot or an incremental push to a live database over Bolt.
- Versioned schema: a machine-readable, version-stamped Neo4j schema contract (--emit schema), checked in as schema.neo4j.json and shipped with every release.
- Incremental cache: per-file results are cached under .codeanalyzer; --lazy (default) reuses them, --eager forces a clean rebuild. --ray distributes the work across cores.
- Compact output: canonical analysis.json, or binary analysis.msgpack for smaller artifacts.
- Python 3.10 or newer.
- A C toolchain and the venv / development headers. The analyzer builds an isolated virtual environment per project (via Python's venv) so Jedi can resolve types and imports:

  # Ubuntu / Debian
  sudo apt install python3-venv python3-dev build-essential
  # Fedora / RHEL / CentOS
  sudo dnf group install "Development Tools" && sudo dnf install python3-venv python3-devel
  # macOS
  xcode-select --install

pip install codeanalyzer-python
canpy --help

For the optional live Neo4j push (--emit neo4j --neo4j-uri …), install the neo4j extra:

pip install 'codeanalyzer-python[neo4j]'

Install the CLI as an isolated tool with the one-line installer (provisions via uv / pipx / pip):

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/codellm-devkit/codeanalyzer-python/releases/latest/download/canpy-installer.sh | sh

brew install codellm-devkit/tap/codeanalyzer-python

The formula depends on uv and installs canpy as an isolated, version-pinned uv tool (the package
and its dependencies are resolved and cached on first run).
This project uses uv for dependency management.
git clone https://github.com/codellm-devkit/codeanalyzer-python
cd codeanalyzer-python
uv sync --all-groups
uv run canpy --help

canpy --input /path/to/python/project

With no --output, the analysis is printed to stdout as compact JSON; with --output <dir> it is
written to analysis.json (or graph.cypher for --emit neo4j, or analysis.msgpack with
--format msgpack) in that directory.
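
The mapping from flags to artifact file names described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not code taken from canpy: the function name expected_artifact is hypothetical, and the sketch ignores the live push mode (--neo4j-uri), which writes to a database rather than a file.

```python
from pathlib import Path
from typing import Optional


def expected_artifact(output_dir: Optional[str], emit: str = "json", fmt: str = "json") -> Optional[Path]:
    """Path the artifact should be written to, or None when no --output is given."""
    if output_dir is None:
        return None  # no --output: nothing is written to a directory
    if emit == "neo4j":
        name = "graph.cypher"      # --format only applies to --emit json
    elif emit == "schema":
        name = "schema.json"
    elif fmt == "msgpack":
        name = "analysis.msgpack"
    else:
        name = "analysis.json"
    return Path(output_dir) / name


print(expected_artifact("./out"))
print(expected_artifact("./out", emit="neo4j"))
```

A script that wraps canpy could use a helper like this to locate the file it should read after the run finishes.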
$ canpy --help
Usage: canpy [OPTIONS] COMMAND [ARGS]...
Static Analysis on Python source code using Jedi, CodeQL and Tree sitter.

Options:
  --input -i PATH
      Path to the project root directory (not required for --emit schema).
  --output -o PATH
      Output directory for artifacts.
  --format -f [json|msgpack]
      Output format for --emit json: json or msgpack. [default: json]
  --emit [json|neo4j|schema]
      Output target: json (analysis.json, default) | neo4j (graph.cypher or live Bolt push) | schema (the Neo4j schema.json contract). [default: json]
  --app-name TEXT
      Logical application name for the graph :PyApplication anchor (default: input dir name).
  --neo4j-uri TEXT
      Push the graph to a live Neo4j over Bolt (incremental); omit to write graph.cypher. [env var: NEO4J_URI]
  --neo4j-user TEXT
      Neo4j username. [env var: NEO4J_USERNAME] [default: neo4j]
  --neo4j-password TEXT
      Neo4j password. Prefer the env var over the flag (the flag is visible in shell history / process list). [env var: NEO4J_PASSWORD] [default: neo4j]
  --neo4j-database TEXT
      Neo4j database name (default: server default). [env var: NEO4J_DATABASE]
  --codeql --no-codeql
      Enable CodeQL-based analysis. [default: no-codeql]
  --ray --no-ray
      Enable Ray for distributed analysis. [default: no-ray]
  --eager --lazy
      Enable eager or lazy analysis. Defaults to lazy. [default: lazy]
  --skip-tests --include-tests
      Skip test files in analysis. [default: skip-tests]
  --no-venv --venv
      Skip virtualenv creation and dependency installation; resolve imports against the ambient Python environment instead. [default: venv]
  --file-name PATH
      Analyze only the specified file (relative to input directory).
  --cache-dir -c PATH
      Directory to store analysis cache. Defaults to '.codeanalyzer' in the input directory.
  --clear-cache --keep-cache
      Clear cache after analysis. By default, cache is retained. [default: keep-cache]
  -v INTEGER
      Increase verbosity: -v, -vv, -vvv [default: 0]
  --help
      Show this message and exit.
- Basic analysis to stdout, or to a file:

  canpy --input ./my-python-project                   # compact JSON on stdout
  canpy --input ./my-python-project --output ./out    # → ./out/analysis.json

- Binary output (msgpack):

  canpy --input ./my-python-project --output ./out --format msgpack    # → ./out/analysis.msgpack

- Resolve extra call edges with CodeQL:

  canpy --input ./my-python-project --codeql

  By default, edges come from Jedi's lexical analysis. Adding --codeql resolves additional edges (including RPC / third-party / dynamically-dispatched targets) and merges them with the Jedi-derived edges; CodeQL also backfills callees Jedi could not resolve. CodeQL integration is experimental; the CLI is downloaded into <cache_dir>/codeql/ on first use.

- Emit a Neo4j snapshot, or push to a live database:

  canpy --input ./my-python-project --emit neo4j --output ./out    # → ./out/graph.cypher
  canpy --input ./my-python-project --emit neo4j \
    --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password secret

- Emit the Neo4j schema contract:

  canpy --emit schema                   # print schema.json to stdout (no project needed)
  canpy --emit schema --output ./out    # → ./out/schema.json

- Force a clean rebuild with a custom cache directory:

  canpy --input ./my-python-project --eager --cache-dir /path/to/custom-cache

canpy builds one analysis in memory and can emit it three ways (--emit):
A PyApplication document, the canonical CLDK contract, holding a symbol table and a call graph.
By default this is printed to stdout in JSON; with --output it is written to analysis.json (or
analysis.msgpack with --format msgpack, a more compact binary format).
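
As a rough illustration of consuming the JSON output, the sketch below counts modules and call edges in a loaded document. It assumes only the top-level keys symbol_table and call_graph and the edge fields source, target, weight, and provenance; the helper name summarize, the sample signatures, and the provenance values "jedi" and "codeql" are hypothetical placeholders, not values confirmed by the tool.

```python
import json
from collections import Counter


def summarize(analysis: dict) -> dict:
    """Count modules and call edges in a loaded analysis document."""
    edges = analysis.get("call_graph", [])
    return {
        "modules": len(analysis.get("symbol_table", {})),
        "edges": len(edges),
        # Provenance is stringified because its exact type is an assumption here.
        "by_provenance": dict(Counter(str(e.get("provenance")) for e in edges)),
    }


# Hypothetical sample standing in for the contents of ./out/analysis.json.
sample = json.loads("""
{
  "symbol_table": {"app/main.py": {}, "app/util.py": {}},
  "call_graph": [
    {"source": "app.main.run", "target": "app.util.load", "weight": 1, "provenance": "jedi"},
    {"source": "app.main.run", "target": "requests.get", "weight": 2, "provenance": "codeql"}
  ]
}
""")

summary = summarize(sample)
print(summary)
```

For a real run, the sample could be replaced by reading the file written with --output, for example json.loads(Path("out/analysis.json").read_text()).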
--emit neo4j projects the same analysis into a labeled property graph. Every node label is
Py-prefixed and every relationship type is PY_-prefixed (e.g. :PyClass, PY_CALLS) so multiple
language analyzers can share one database without label or relationship-type collisions. Declarations
are keyed by their signature under a shared :PySymbol label; calls, imports, inheritance,
decorators, and call sites are relationships:
- Without --neo4j-uri: writes a self-contained graph.cypher (constraints + indexes, a scoped wipe, then batched MERGEs). Load it with cypher-shell < graph.cypher. Needs no extra dependencies.
- With --neo4j-uri: pushes to a live Neo4j over Bolt incrementally; only modules whose content hash changed are rewritten, and on a full run modules whose source file vanished are pruned. Requires the neo4j extra.

Every graph carries a schema_version on its :PyApplication node.
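
The incremental push can be pictured as a diff between previously stored content hashes and the current sources. The following is a minimal sketch of that idea, not canpy's implementation: the function names are hypothetical, and SHA-256 is an assumption, since the text above only says "content hash".

```python
import hashlib


def content_hash(source: bytes) -> str:
    """Hex digest of a module's source (SHA-256 is an assumed choice)."""
    return hashlib.sha256(source).hexdigest()


def plan_update(previous: dict[str, str], current_sources: dict[str, bytes]):
    """Return (modules to rewrite, modules to prune) for an incremental push."""
    current = {path: content_hash(src) for path, src in current_sources.items()}
    # Rewrite modules that are new or whose hash differs from the stored one.
    changed = sorted(p for p, h in current.items() if previous.get(p) != h)
    # Prune modules that were stored before but no longer exist on disk.
    vanished = sorted(p for p in previous if p not in current)
    return changed, vanished


previous = {
    "a.py": content_hash(b"x = 1\n"),
    "b.py": content_hash(b"y = 2\n"),
    "gone.py": content_hash(b"z = 3\n"),
}
current_sources = {"a.py": b"x = 1\n", "b.py": b"y = 20\n", "new.py": b"n = 0\n"}

changed, vanished = plan_update(previous, current_sources)
print(changed, vanished)
```

Unchanged modules (a.py here) appear in neither list, which is what keeps a repeated push cheap.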
Call-graph endpoints that aren't present in the symbol table (third-party / framework / RPC targets)
are materialized as :PyExternal ghost nodes, mirroring the analyzer's own ghost-node behaviour.
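
One way to think about ghost nodes: any call-graph endpoint whose signature is not declared in the symbol table is external. The sketch below shows that set difference under stated assumptions; the function name external_endpoints and the sample signatures are hypothetical, and the edge fields source and target are the only parts of the edge shape it relies on.

```python
def external_endpoints(known_signatures: set, edges: list) -> set:
    """Signatures that appear on call edges but not in the symbol table."""
    endpoints = set()
    for edge in edges:
        endpoints.add(edge["source"])
        endpoints.add(edge["target"])
    return endpoints - known_signatures


# Hypothetical signatures declared in the analyzed project.
known = {"app.main.run", "app.util.load"}
edges = [
    {"source": "app.main.run", "target": "app.util.load"},
    {"source": "app.main.run", "target": "requests.get"},
]

ghosts = external_endpoints(known, edges)
print(ghosts)
```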
The connection options also read from the standard Neo4j environment variables (NEO4J_URI,
NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE) when the corresponding flag is omitted (an
explicit flag wins). Prefer the env var for the password so it doesn't land in shell history or the
process list:
export NEO4J_URI=bolt://localhost:7687
export NEO4J_PASSWORD=secret
canpy -i ./my-project --emit neo4j   # credentials picked up from the environment

--emit schema writes the machine-readable, version-stamped Neo4j schema (schema.json: node labels,
relationships, properties, constraints, and indexes). It needs no project and is checked into the repo
as schema.neo4j.json and bundled in every release as a GitHub Release asset, so a consumer can
validate producer/consumer compatibility without invoking the tool. The shape of the contract matches
the codeanalyzer-typescript backend.
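
A consumer-side compatibility check might compare the schema_version stamped on a graph with the version the consumer was built against. The sketch below is purely illustrative: it assumes dotted numeric versions such as "1.2.0" and a same-major, not-newer policy, neither of which is stated by the project, and the function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Split an assumed dotted numeric version like '1.2.0' into integers."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(graph_version: str, supported_version: str) -> bool:
    """True when the major versions match and the graph is not newer than supported."""
    graph = parse_version(graph_version)
    supported = parse_version(supported_version)
    return graph[0] == supported[0] and graph <= supported


print(is_compatible("1.2.0", "1.3.0"))
print(is_compatible("2.0.0", "1.3.0"))
```

The actual version format and compatibility rules should be taken from the checked-in schema.neo4j.json rather than from this sketch.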
A UML of the analysis.json schema (the PyApplication containment tree) is checked in as
schema-uml.drawio, and the property-graph schema as
neo4j-schema.drawio.
This project uses uv.
uv sync --all-groups
uv run canpy --input /path/to/project # run from source
uv run canpy --emit schema > schema.neo4j.json # regenerate the checked-in schema contract
uv run python scripts/update_readme.py # regenerate the canpy --help block above
uv run pytest                                    # run the test suite

The Neo4j schema-conformance test always runs. The Neo4j Bolt integration test spins up a real Neo4j via Testcontainers and is opt-in: it needs a container runtime (Docker or Podman) and is enabled with an environment variable:

RUN_CONTAINER_TESTS=1 uv run pytest test/test_neo4j_bolt.py -s

Apache 2.0. See LICENSE.

Shape of the analysis.json document (PyApplication):

{
  "symbol_table": { /* file path → module (classes, functions, variables, imports, …) */ },
  "call_graph": [ /* CALL_DEP edges: { source, target, weight, provenance } keyed by callable signature */ ]
}