
Releases: codellm-devkit/codeanalyzer-python

v0.2.1

22 Jun 21:29
c02b92d

Install codeanalyzer-python v0.2.1

Shell script (installs the canpy CLI via uv / pipx / pip):

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/codellm-devkit/codeanalyzer-python/releases/latest/download/canpy-installer.sh | sh

PyPI:

pip install codeanalyzer-python==0.2.1

For the optional live Neo4j push (--emit neo4j --neo4j-uri ...):

pip install 'codeanalyzer-python[neo4j]==0.2.1'

Download

File                                          Description
codeanalyzer_python-0.2.1-py3-none-any.whl    Python wheel
codeanalyzer_python-0.2.1.tar.gz              Source distribution
canpy-installer.sh                            Shell installer (uv / pipx / pip)
schema.json                                   Neo4j schema contract

What's Changed

🚀 Features

  • Analysis venv (uv + Jedi wiring), external_symbols, app-scoped prune, --no-venv (#44 #45 #46 #47) by @rahlk in #48

Full Changelog: v0.2.0...v0.2.1

v0.2.0

20 Jun 19:57
73e62a0

Install codeanalyzer-python v0.2.0

Shell script (installs the canpy CLI via uv / pipx / pip):

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/codellm-devkit/codeanalyzer-python/releases/latest/download/canpy-installer.sh | sh

PyPI:

pip install codeanalyzer-python==0.2.0

For the optional live Neo4j push (--emit neo4j --neo4j-uri ...):

pip install 'codeanalyzer-python[neo4j]==0.2.0'

Download

File                                          Description
codeanalyzer_python-0.2.0-py3-none-any.whl    Python wheel
codeanalyzer_python-0.2.0.tar.gz              Source distribution
canpy-installer.sh                            Shell installer (uv / pipx / pip)
schema.json                                   Neo4j schema contract

What's Changed

🚀 Features

  • feature/neo4j: property-graph output with Py/PY_-namespaced schema; rename CLI to canpy by @rahlk in #33

🛠 Other Changes

  • Fix CodeQL call-graph edges dropped on (file, start_line) join miss (#25) by @rahlk in #26

New contributors and full diff: https://github.com/codellm-devkit/codeanalyzer-python/releases/tag/v0.2.0

v0.1.14

13 May 23:58
787daa3

Release Notes (from CHANGELOG.md)

Added

  • Call graph in analysis output: PyApplication.call_graph: List[PyCallEdge]. Every run now produces a call graph in addition to the symbol table. Edges carry source, target (both PyCallable.signature), weight, and provenance (jedi / codeql / joern).
  • call_graph module (codeanalyzer.semantic_analysis.call_graph) with to_digraph / from_digraph networkx adapters, jedi_call_graph_edges, and merge_edges. Endpoints absent from the symbol table become ghost nodes so RPC / third-party / framework edges are preserved.
  • CodeQL Python query rewritten against the CodeQL Python library (it previously used Java idioms). Resolves direct calls and constructor calls via ClassValue.lookup("__init__"), using the modern Value.getACall() predicate (CodeQL Python 7.x).
  • augment_call_sites: when --codeql is enabled, CodeQL backfills PyCallsite.callee_signature entries Jedi left unresolved.
  • resolve_unresolved_constructors: heuristic fallback that walks the symbol table by class short-name and scope to fill in constructor sites neither Jedi nor CodeQL resolved (common for classes nested inside functions/methods). Synthesizes <class>.__init__ signatures.
  • iter_classes_in_symbol_table: full recursive walker over classes — including inner classes, classes nested in functions, and classes nested in class methods.
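The merge and ghost-node behaviour described above can be pictured with a small stdlib-only sketch. `CallEdge`, `merge_call_edges` and `ghost_nodes` are hypothetical names, not the package's `PyCallEdge` or `merge_edges`, and the rule for duplicate edges (keep the larger weight, union the provenance tags) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class CallEdge:
    """Hypothetical stand-in for PyCallEdge; endpoints are callable signatures."""

    source: str
    target: str
    weight: int = 1
    provenance: Set[str] = field(default_factory=set)  # e.g. {"jedi", "codeql"}


def merge_call_edges(*edge_lists: Iterable[CallEdge]) -> List[CallEdge]:
    """Combine edges from several resolvers, keyed by (source, target).

    Assumption: when two resolvers report the same edge, the larger weight
    is kept and the provenance tags are unioned.
    """
    merged: Dict[Tuple[str, str], CallEdge] = {}
    for edges in edge_lists:
        for edge in edges:
            key = (edge.source, edge.target)
            if key not in merged:
                # Copy so the caller's edge objects are never mutated.
                merged[key] = CallEdge(edge.source, edge.target, edge.weight, set(edge.provenance))
            else:
                existing = merged[key]
                existing.weight = max(existing.weight, edge.weight)
                existing.provenance |= edge.provenance
    return list(merged.values())


def ghost_nodes(edges: Iterable[CallEdge], known_signatures: Set[str]) -> Set[str]:
    """Endpoints missing from the symbol table (third-party, RPC, framework code).

    They are reported as ghost nodes instead of dropping the edge.
    """
    endpoints = {sig for edge in edges for sig in (edge.source, edge.target)}
    return endpoints - known_signatures


jedi_edges = [CallEdge("app.main", "app.util.load", provenance={"jedi"})]
codeql_edges = [
    CallEdge("app.main", "app.util.load", provenance={"codeql"}),
    CallEdge("app.util.load", "json.loads", provenance={"codeql"}),
]
edges = merge_call_edges(jedi_edges, codeql_edges)
ghosts = ghost_nodes(edges, {"app.main", "app.util.load"})  # {"json.loads"}
```

Keying on the (source, target) pair means an edge found by both Jedi and CodeQL appears once, carrying both provenance tags.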

Changed

  • BREAKING: Removed --analysis-level / analysis_level. The call graph is built unconditionally; use --codeql/--no-codeql to control CodeQL participation. Jedi-derived edges are always available.
  • Jedi constructor calls now resolve to <class>.__init__ (was: bare <class>). When script.infer() returns a class, the qualified name is rewritten to point at the constructor — matching where method PyCallables actually live in the symbol table. PyCallsite.is_constructor_call now reflects Jedi's type inference (was: method_name == "__init__", only true for explicit obj.__init__() calls).
  • _call_sites scope correctness: replaced naive ast.walk with _iter_calls_in_scope, which stops at nested FunctionDef / AsyncFunctionDef / ClassDef bodies (those have their own PyCallable.call_sites). Decorators, default arguments, return annotations, base classes and class keyword args are still walked since they execute in the enclosing scope. Previously, outer functions over-attributed every call from every nested definition.
  • CodeQL CLI binary is now downloaded into <cache_dir>/codeql/bin/ (per-project, respecting --cache-dir) and discovered before any CodeQL operation — including when the database cache is reused. The downloaded archive is removed after extraction.
  • CodeQLQueryRunner now accepts the resolved binary path instead of relying on PATH. The temporary .ql file is written inside a per-project qlpack (<cache_dir>/codeql/qlpack/) whose codeql/python-all dependency is resolved once via codeql pack install, eliminating the lockfile / search-path gymnastics.

Fixed

  • zipfile extraction dropped Unix permissions on the CodeQL CLI launcher, causing PermissionError on first query run. Entries are now extracted with their stored external_attr mode applied, plus a defensive chmod +x on the resolved binary.
  • rglob("codeql") matched the bundled codeql/codeql/ directory before the launcher file, returning a directory instead of an executable. Both CodeQLLoader and _ensure_codeql_bin now filter to is_file().
  • CodeQLQueryRunner crashed on subprocess errors with 'NoneType' object has no attribute 'stderr', because communicate() returns None for stderr when it is not redirected (stderr=None). It now captures stderr=PIPE and decodes the bytes safely.
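The two launcher fixes (restoring the stored mode on extraction, and filtering `rglob` matches to files) can be sketched with `zipfile` and `pathlib`. The helper names `extract_with_permissions` and `find_launcher` are hypothetical; the sketch assumes the archive was built on a Unix host, where the upper 16 bits of `ZipInfo.external_attr` carry the file mode.

```python
import os
import stat
import zipfile
from pathlib import Path
from typing import Optional


def extract_with_permissions(archive: Path, dest: Path) -> None:
    """Extract a zip archive and re-apply the Unix mode stored in each entry.

    ZipFile.extract() does not restore permission bits, so an executable
    launcher would otherwise lose its +x bit.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            extracted = zf.extract(info, dest)
            mode = (info.external_attr >> 16) & 0o7777  # st_mode lives in the high 16 bits
            if mode and not info.is_dir():
                os.chmod(extracted, mode)


def find_launcher(root: Path, name: str = "codeql") -> Optional[Path]:
    """Return the first *file* called `name`; rglob also yields directories."""
    for candidate in root.rglob(name):
        if candidate.is_file():
            # Defensive chmod +x on the resolved binary.
            executable = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            candidate.chmod(candidate.stat().st_mode | executable)
            return candidate
    return None
```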

Detailed Changes (auto-generated)

🚀 Features

  • Release 0.1.14: call graph + CodeQL integration

v0.1.13

22 Jul 21:53
a1a3ca0

Release Notes (from CHANGELOG.md)

Improved

  • CLI Help Documentation: Comprehensive help text added for all command-line options
    • Added descriptive help messages for all CLI parameters including --output, --format, --analysis-level, etc.
    • Enhanced user experience with clear option descriptions in --help output
    • Improved CLI parameter organization using Annotated type hints for better maintainability
    • Added case-insensitive matching for the --format option
    • Updated the verbosity option help to state that the flag can be repeated (-v, -vv, -vvv)

Technical Details

  • Refactored CLI function signature to use consistent Annotated type hint pattern
  • Added comprehensive help text for all 12 command-line options
  • Improved code organization and type safety in CLI parameter definitions

Detailed Changes (auto-generated)

v0.1.12

21 Jul 19:40
133baa8

Release Notes (from CHANGELOG.md)

Changed

  • BREAKING CHANGE: Refactored Codeanalyzer constructor to use AnalysisOptions dataclass in response to #12
    • Replaced multiple individual parameters with single AnalysisOptions object for cleaner API
    • Improved type safety and configuration management through centralized options structure
    • Enhanced maintainability and extensibility for future configuration additions
    • Updated CLI integration to create and pass AnalysisOptions instance
    • Analysis behaviour is unchanged; only the constructor signature and internal structure changed

Added

  • New AnalysisOptions dataclass in codeanalyzer.options module in response to #12
    • Centralized configuration structure with all analysis parameters
    • Type-safe configuration with proper defaults and validation
    • Support for OutputFormat enum integration
    • Clean separation between CLI and library configuration handling
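A parameter object along these lines might look like the sketch below. The field names are hypothetical, borrowed from CLI flags mentioned in these notes (--output, --format, --analysis-level, --ray, --skip-tests, --file-name, -v), and the defaults are illustrative; the `Codeanalyzer` here is a stub, not the real class.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional, Union


class OutputFormat(str, Enum):
    JSON = "json"  # other members omitted in this sketch


@dataclass
class AnalysisOptions:
    """Hypothetical parameter object; fields mirror CLI flags named in these notes."""

    input: Path
    output: Optional[Path] = None
    format: Union[OutputFormat, str] = OutputFormat.JSON
    analysis_level: int = 1
    using_ray: bool = False
    skip_tests: bool = True
    file_name: Optional[Path] = None
    verbosity: int = 0

    def __post_init__(self) -> None:
        # Normalise plain strings (e.g. "JSON") to the enum member.
        if not isinstance(self.format, OutputFormat):
            self.format = OutputFormat(self.format.lower())
        if self.verbosity < 0:
            raise ValueError("verbosity must be >= 0")


class Codeanalyzer:
    """Stub showing the single-parameter constructor style."""

    def __init__(self, options: AnalysisOptions) -> None:
        self.options = options


analyzer = Codeanalyzer(AnalysisOptions(input=Path("."), format="JSON", using_ray=True))
```

Adding a new option then means adding one field with a default, rather than threading another argument through every call site.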

Technical Details

  • Added new codeanalyzer.options package with AnalysisOptions dataclass
  • Updated Codeanalyzer.__init__() to accept single options parameter instead of 9 individual parameters
  • Modified CLI handler in __main__.py to create AnalysisOptions instance from command line arguments
  • Improved code organization and maintainability for configuration management
  • Enhanced API design following best practices for parameter object patterns

Detailed Changes (auto-generated)

v0.1.11

21 Jul 19:07
09185c0

Release Notes (from CHANGELOG.md)

Fixed

  • CRITICAL: Fixed NumPy build failure on Python 3.12+ (addresses #19)
    • Updated NumPy dependency constraints to handle Python 3.12+ compatibility
    • Split NumPy version constraints into three tiers:
      • numpy>=1.21.0,<1.24.0 for Python < 3.11
      • numpy>=1.24.0,<2.0.0 for Python 3.11.x
      • numpy>=1.26.0,<2.0.0 for Python 3.12+ (requires NumPy 1.26+ which supports Python 3.12)
    • Resolves ModuleNotFoundError: No module named 'distutils' errors on Python 3.12+
    • Ensures compatibility with Python 3.12 which removed distutils from the standard library
  • Fixed Pydantic v1/v2 compatibility issues in JSON serialization throughout codebase
    • Added comprehensive Pydantic version detection and compatibility layer
    • Introduced model_dump_json() and model_validate_json() helper functions for cross-version compatibility
    • Fixed PyApplication.parse_raw() deprecated method usage (replaced with model_validate_json())
    • Updated CLI output methods to use compatible serialization functions
    • Forward-reference resolution now runs only under Pydantic v1 (v2 resolves forward references automatically)
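One way to express the three NumPy tiers is with PEP 508 environment markers. The requirement strings below are an assumption about how the constraints could be written, not a copy of the project's pyproject.toml; `numpy_spec_for` is a hypothetical helper showing which tier a given interpreter would select.

```python
import sys
from typing import Tuple

# The three tiers from the notes above, as PEP 508 requirements with markers.
NUMPY_REQUIREMENTS = [
    'numpy>=1.21.0,<1.24.0; python_version < "3.11"',
    'numpy>=1.24.0,<2.0.0; python_version == "3.11"',
    'numpy>=1.26.0,<2.0.0; python_version >= "3.12"',
]


def numpy_spec_for(version: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Return the version specifier (marker stripped) selected for (major, minor)."""
    if version < (3, 11):
        tier = 0
    elif version < (3, 12):
        tier = 1
    else:
        tier = 2
    return NUMPY_REQUIREMENTS[tier].split(";")[0].strip()
```

The markers are mutually exclusive, so exactly one NumPy constraint applies to any interpreter version.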

Changed

  • Enhanced Pydantic compatibility infrastructure in schema module
    • Added runtime Pydantic version detection using importlib.metadata
    • Created compatibility abstraction layer for JSON serialization/deserialization
    • Improved forward reference resolution logic to work with both Pydantic v1 and v2
    • Updated all JSON serialization calls to use new compatibility functions
    • Better error handling for missing Pydantic dependency
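The shape of such a compatibility layer is sketched below. Note the swap: the notes describe detecting the installed Pydantic version via importlib.metadata, whereas this sketch dispatches on which methods the model exposes (model_dump_json / model_validate_json in v2, json / parse_raw in v1). The helper names match those in the notes, but their bodies here are illustrative only.

```python
from typing import Any, Type, TypeVar

T = TypeVar("T")


def model_dump_json(model: Any, **kwargs: Any) -> str:
    """Serialise a Pydantic model under either major version."""
    if hasattr(model, "model_dump_json"):  # Pydantic v2
        return model.model_dump_json(**kwargs)
    return model.json(**kwargs)  # Pydantic v1


def model_validate_json(cls: Type[T], data: str) -> T:
    """Parse a JSON string into an instance of `cls`."""
    if hasattr(cls, "model_validate_json"):  # Pydantic v2
        return cls.model_validate_json(data)
    return cls.parse_raw(data)  # Pydantic v1
```

Checking for the v2 method first matters: v2 models still carry the deprecated json() and parse_raw(), so testing for the v1 names would never reach the v2 path.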

Technical Details

  • Added packaging dependency for robust version comparison
  • Enhanced schema module with runtime version detection and compatibility helpers
  • Updated core analysis caching system to use compatible Pydantic JSON methods
  • Improved CLI output formatting with cross-version Pydantic support

Detailed Changes (auto-generated)

🐛 Fixes

v0.1.10

21 Jul 17:28
9cc33d0

Release Notes (from CHANGELOG.md)

Added

  • Ray distributed processing support for parallel symbol table generation (addresses #16)
  • --ray/--no-ray CLI flag to enable/disable Ray-based distributed analysis
  • --skip-tests/--include-tests CLI flag to control whether test files are analyzed (improves analysis performance)
  • --file-name CLI flag for single file analysis (addresses part of #16)
  • Incremental caching system with SHA256-based file change detection
    • Automatic caching of analysis results to analysis_cache.json
    • File-level caching with content hash validation to avoid re-analyzing unchanged files
    • Significant performance improvements for subsequent analysis runs
    • Cache reuse statistics logging
  • Custom exception classes for better error handling in symbol table building:
    • SymbolTableBuilderException (base exception)
    • SymbolTableBuilderFileNotFoundError (file not found errors)
    • SymbolTableBuilderParsingError (parsing errors)
    • SymbolTableBuilderRayError (Ray processing errors)
  • Enhanced PyModule schema with metadata fields for caching:
    • last_modified timestamp tracking
    • content_hash for precise change detection
  • Progress bar support for both serial and parallel processing modes
  • Enhanced test fixtures including xarray project for comprehensive testing
  • Comprehensive __init__.py exports for syntactic analysis module
  • Smart dependency installation with conditional logic:
    • Only installs requirements files when they exist (requirements.txt, requirements-dev.txt, dev-requirements.txt, test-requirements.txt)
    • Only performs editable installation when package definition files are present (pyproject.toml, setup.py, setup.cfg)
    • Improved virtual environment setup with better dependency detection and installation logic
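The SHA256 change detection can be sketched with hashlib alone. The flat {path: hash} JSON layout and the helper names are assumptions for illustration; the real analysis_cache.json stores a serialised PyApplication with content_hash on each PyModule.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, List


def content_hash(path: Path) -> str:
    """SHA-256 of the file contents, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_hashes(cache_file: Path) -> Dict[str, str]:
    """Read {path: sha256} from the cache; a missing cache means nothing is cached."""
    if not cache_file.exists():
        return {}
    return json.loads(cache_file.read_text(encoding="utf-8"))


def save_hashes(files: Iterable[Path], cache_file: Path) -> None:
    """Record the current hash of every analysed file."""
    hashes = {str(f): content_hash(f) for f in files}
    cache_file.write_text(json.dumps(hashes, indent=2), encoding="utf-8")


def changed_files(files: Iterable[Path], cached: Dict[str, str]) -> List[Path]:
    """Files that are new, or whose contents differ from the cached hash."""
    return [f for f in files if cached.get(str(f)) != content_hash(f)]
```

Hashing contents rather than comparing modification times avoids re-analysing files that were merely touched or checked out again.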

Changed

  • BREAKING CHANGE: Updated Python version requirement from >=3.10 to >=3.9 for broader compatibility (closes #17)
  • BREAKING CHANGE: Updated dependency versions with more conservative constraints for better stability:
    • pydantic downgraded from >=2.11.7 to >=1.8.0,<2.0.0 for stability
    • pandas constrained to >=1.3.0,<2.0.0
    • numpy constrained to >=1.21.0,<1.24.0
    • rich constrained to >=12.6.0,<14.0.0
    • typer constrained to >=0.9.0,<1.0.0
    • Other dependencies updated with conservative version ranges for better compatibility
  • Major Architecture Enhancement: Complete rewrite of analysis caching system
    • analyze() method now implements intelligent caching with PyApplication serialization
    • Symbol table building redesigned to support incremental updates and cache reuse
    • File change detection using SHA256 content hashing for maximum accuracy
  • Enhanced Codeanalyzer constructor signature to accept file_name parameter for single file analysis
  • Refactored symbol table building from monolithic build() method to cache-aware file-level processing
  • Enhanced Codeanalyzer constructor signature to accept skip_tests and using_ray parameters
  • Improved error handling with proper context managers in core analyzer
  • Updated CLI to use Pydantic v1 compatible JSON serialization methods
  • Reorganized syntactic analysis module structure with proper exception handling and exports
  • Enhanced virtual environment detection with better fallback mechanisms
  • Symbol table builder now sets metadata fields (last_modified, content_hash) for all PyModule objects

Fixed

  • Fixed critical symbol table bug for nested functions (closes #15)
    • Corrected _callables() method recursion logic to properly capture both outer and inner functions
    • Previously, only inner/nested functions were being captured in the symbol table
    • Now correctly processes module-level functions, class methods, and all nested function definitions
  • Fixed nested method/function signature generation in symbol table builder
    • Corrected _callables() method to properly build fully qualified signatures for nested structures
    • Fixed issue where nested functions and methods were getting incorrect signatures (e.g., main.__init__ instead of main.outer_function.NestedClass.__init__)
    • Added prefix parameter to _callables() and _add_class() methods to maintain proper nesting context
    • Signatures now correctly reflect the full nested hierarchy (e.g., main.outer_function.NestedClass.nested_class_method.method_nested_function)
    • Updated class method processing to pass class signature as prefix to nested callable processing
    • Improved path relativization to project directory for cleaner signature generation
  • Fixed Pydantic v2 compatibility issues by reverting to v1 API (json() instead of model_dump_json())
  • Fixed missing import statements and type annotations throughout the codebase
  • Fixed symbol table builder to support individual file processing for distributed execution
  • Improved error handling in virtual environment detection and Python interpreter resolution
  • Fixed schema type annotations to use proper string keys for better serialization
  • Enhanced import ordering and removed unnecessary blank lines in CLI module
  • Improved virtual environment setup reliability:
    • Fixed unnecessary pip installs by adding conditional logic to only install when dependencies are available
    • Only attempts to install requirements files if they actually exist in the project
    • Only performs editable installation when package definition files are present
    • Prevents errors and warnings from attempting to install non-existent dependencies

Technical Details

  • Added Ray as a core dependency for distributed computing capabilities (addresses #16)
  • Implemented @ray.remote decorator for parallel file processing
  • Comprehensive caching system implementation:
    • _load_pyapplication_from_cache() and _save_analysis_cache() methods for PyApplication serialization
    • _file_unchanged() method with SHA256 content hash validation
    • Cache-aware symbol table building with selective file processing
    • Automatic cache statistics and performance reporting
  • Enhanced progress tracking for both serial and parallel execution modes with Rich progress bars
  • Updated schema to use Dict[str, PyModule] instead of dict[Path, PyModule] for better serialization
  • Extended PyModule schema with optional last_modified and content_hash fields for caching metadata
  • Added comprehensive exception hierarchy for better error classification and handling
  • Refactored symbol table building into modular, file-level processing suitable for distribution
  • Enhanced Python interpreter detection with support for multiple version managers (pyenv, conda, asdf)
  • Added hashlib integration for file content hashing throughout the codebase
  • Nested signature tracking in the symbol table builder:
    • Modified _add_class() method to accept prefix parameter and pass class signature to method processing
    • Updated _callables() method signature to include prefix parameter for nested context tracking
    • Enhanced signature building logic to use prefix when available, falling back to Jedi resolution for top-level definitions
    • Fixed recursive calls to pass current signature as prefix for proper nesting hierarchy
  • Enhanced virtual environment setup logic:
    • Implemented conditional dependency installation with existence checks for requirements files and package definition files
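The conditional installation rule reads roughly like the sketch below. `plan_install_commands` is a hypothetical helper that only builds pip command lines (a caller could pass each one to subprocess.run(cmd, check=True)); whether the project invokes pip exactly this way is an assumption.

```python
import sys
from pathlib import Path
from typing import List

REQUIREMENTS_FILES = (
    "requirements.txt",
    "requirements-dev.txt",
    "dev-requirements.txt",
    "test-requirements.txt",
)
PACKAGE_FILES = ("pyproject.toml", "setup.py", "setup.cfg")


def plan_install_commands(project_dir: Path, python: str = sys.executable) -> List[List[str]]:
    """Build install commands only for files that actually exist."""
    commands: List[List[str]] = []
    for name in REQUIREMENTS_FILES:
        req = project_dir / name
        if req.is_file():
            commands.append([python, "-m", "pip", "install", "-r", str(req)])
    # Editable install only when the project defines a package.
    if any((project_dir / name).is_file() for name in PACKAGE_FILES):
        commands.append([python, "-m", "pip", "install", "-e", str(project_dir)])
    return commands
```

Returning the commands instead of running them keeps the existence checks easy to test without touching a real environment.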

Notes

  • This release significantly addresses the performance improvements requested in #16:
    • ✅ Ray parallelization implemented
    • ✅ Incremental caching with SHA256-based change detection implemented
    • ✅ --file-name option for single-file analysis implemented
    • --nproc option not yet included (Ray still uses all available cores)
  • ✅ Critical bug fix for nested function detection (#15) is now included in this version
  • Expected performance improvements: 2-10x faster on subsequent runs depending on code change frequency
  • Enhanced symbol table accuracy ensures all function definitions are properly captured
  • Virtual environment setup is now more robust and only installs dependencies when they are actually available

Detailed Changes (auto-generated)

  • no changes

v0.1.9

14 Jul 18:51
4af838c

Release Notes (from CHANGELOG.md)

Fixed

  • Fixed AttributeError: 'OutputFormat' object has no attribute 'casefold' when using --format flag with case-insensitive options
  • Changed OutputFormat enum to inherit from str to support typer's case-insensitive string processing
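Why inheriting from str matters: members of a str-based enum are themselves strings, so string methods such as casefold() work on them. The sketch assumes a single JSON member (other formats are omitted) and uses a hypothetical `parse_format` helper to show a case-insensitive lookup.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Members are str instances, so they support casefold(), lower(), etc."""

    JSON = "json"  # other members omitted in this sketch


def parse_format(value: str) -> OutputFormat:
    """Case-insensitive lookup by comparing casefolded values."""
    wanted = value.casefold()
    for member in OutputFormat:
        if member.value.casefold() == wanted:
            return member
    raise ValueError(f"unknown format: {value!r}")
```

With a plain Enum base, calling casefold() on a member raises AttributeError, which matches the error this release fixes.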

Detailed Changes (auto-generated)

  • no changes

v0.1.8

14 Jul 18:10
30316a8

Release Notes (from CHANGELOG.md)

Added

  • Added missing config module with OutputFormat enum for better code organization
  • Added proper __init__.py and config.py files to the config directory

Fixed

  • Fixed missing config directory files that were not previously tracked in git

Detailed Changes (auto-generated)

  • no changes

v0.1.7

14 Jul 18:03
b32af01

Release Notes (from CHANGELOG.md)

Changed

  • Relaxed Python version requirement from ==3.10.* to >=3.10 for improved flexibility
  • Enhanced compatibility to support Python 3.10+ versions while maintaining backward compatibility

Detailed Changes (auto-generated)

  • no changes