-
Notifications
You must be signed in to change notification settings - Fork 511
Add basic assumptions to AGENTS.md #3552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
rambleraptor
wants to merge
1
commit into
apache:main
Choose a base branch
from
rambleraptor:agents-md
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+66
−3
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,14 +17,77 @@ | |
| under the License. | ||
| --> | ||
|
|
||
| # Apache Iceberg Python — Agent Instructions | ||
| # PyIceberg — Agent Instructions | ||
|
|
||
| This file provides repository-specific guidance for automated agents working | ||
| in this repository. | ||
| Project conventions, architecture, and coding patterns synthesized from PyIceberg developers. | ||
|
|
||
| ## Security Model | ||
|
|
||
| When assessing potential vulnerabilities or calibrating automated security | ||
| findings, use [`SECURITY-THREAT-MODEL.md`](SECURITY-THREAT-MODEL.md) as the | ||
| authoritative detailed description of this repository's security boundaries, | ||
| trust assumptions, and non-boundaries. | ||
|
|
||
| ## Architecture | ||
|
|
||
| PyIceberg is a pure-Python library — it has no separate engine modules. The code | ||
| lives under `pyiceberg/`, organized by concern rather than by engine: | ||
|
|
||
| - **`schema.py`, `types.py`, `transforms.py`, `partitioning.py`, `conversions.py`**: The table spec core — schemas, types, partition specs, and value conversions. Must stay engine- and catalog-agnostic. | ||
| - **`table/`**: Table abstraction, metadata (`metadata.py`), snapshots, refs, sort orders, and the transaction/update machinery (`table/update/`). The commit path lives here. | ||
| - **`catalog/`**: Catalog implementations — `rest/`, `hive`, `glue`, `dynamodb`, `sql`, `bigquery_metastore`, `memory`, `noop`. New catalogs subclass the base `Catalog` in `catalog/__init__.py`. Catalog-specific assumptions must not leak into `table/` or the spec core. | ||
| - **`io/`**: The `FileIO` abstraction over storage. `pyarrow.py` and `fsspec.py` are the two backends. Never hard-code a storage SDK where `FileIO` exists. | ||
| - **`expressions/`**: The expression DSL and its visitors (predicate binding, projection, evaluation). | ||
| - **`avro/`, `manifest.py`**: Manifest and Avro read/write — performance-sensitive, partly accelerated by Cython. | ||
| - **`cli/`**: The `pyiceberg` command-line interface (Click + Rich). | ||
| - **`utils/`**: Shared helpers (`deprecated.py`, `concurrent.py`, `config.py`, `bin_packing.py`, `singleton.py`, etc.). Check here before writing new utility code. | ||
|
|
||
| ## Coding Conventions | ||
|
|
||
| ### Style & Typing | ||
|
|
||
| - Formatting and linting are enforced by `ruff` via `prek` (pre-commit): line length **130**, double-quoted strings, isort with `pyiceberg`/`tests` as first-party. Run `make lint` — ruff autofixes most issues. | ||
| - Full type annotations are required (`mypy` runs in strict mode: `disallow_untyped_defs`, `no_implicit_optional`, `warn_unused_ignores`). Avoid Any types. | ||
| - Docstrings follow the project's pydocstyle config; one-line summaries on public functions. No personal pronouns in comments. | ||
| - Define typed exceptions in `pyiceberg/exceptions.py` and raise the most specific one; don't raise bare `Exception`. | ||
| - Comments should be succinct and follow the same style of comments found in the rest of the codebase. | ||
|
|
||
| ### Dependencies | ||
|
|
||
| - Large/integration libraries must be **optional extras** in `pyproject.toml`, not core `dependencies`. | ||
|
|
||
| ## Testing | ||
|
|
||
| - Bias towards adding tests to existing files, rather than creating new files. | ||
| - Use existing test fixtures when possible. | ||
|
|
||
| ## Commands | ||
|
|
||
| - **Install / set up dev env:** `make install` (installs `uv`, syncs all extras, builds Cython, installs pre-commit hooks) | ||
| - **Run unit tests:** `make test` | ||
| - **Run a subset:** `make test PYTEST_ARGS="-v -k <pattern>"` | ||
| - **Integration tests (Spark/Docker):** `make test-integration` (rebuild with `make test-integration-rebuild`) | ||
| - **Cloud storage suites:** `make test-s3` / `make test-adls` / `make test-gcs` | ||
| - **Lint & format:** `make lint` | ||
| - **Docs preview / build:** `make docs-serve` / `make docs-build` | ||
| - **Clean build artifacts:** `make clean` | ||
| - **Use a specific Python:** prefix with `PYTHON=3.12`, e.g. `PYTHON=3.12 make install` | ||
|
|
||
| ## PR & Commit Conventions | ||
|
|
||
| - Ensure that there are no existing PRs for this feature before beginning development. | ||
| - One concern per PR. Keep unrelated formatting/import churn out of feature PRs. | ||
| - Keep the first version of a PR minimal; defer optimizations and edge cases to follow-ups. | ||
| - Commit messages explain the *what* and *why*, not line-by-line implementation. Be as succinct as possible. | ||
| - The Apache License header is required on every new source file (enforced by `./dev/check-license` and pre-commit). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The license header isn't enforced by pre-commit as far as I confirmed. |
||
| - Run `make lint` and `make test` before pushing; CI runs both plus the lockfile check. | ||
|
|
||
| ## Boundaries | ||
|
|
||
| - **Never** add a hard (non-optional) dependency without discussion — keep heavy/integration libraries as optional extras. | ||
| - **Never** edit `uv.lock` by hand — regenerate it with `uv lock`. | ||
| - **Never** reach for a storage SDK (`boto3`, `s3fs`, `gcsfs`, `adlfs`) outside its `FileIO` backend module. | ||
| - **Never** break the public API or remove a public method without a `@deprecated` cycle. | ||
| - **Never** leak catalog- or engine-specific assumptions into the spec core (`schema.py`, `types.py`, `table/metadata.py`). | ||
| - **Never** commit secrets, credentials, or cloud tokens (integration tests can delete catalog data — never point them at production). | ||
| - **Ask first** before adding any third-party dependency or promoting internal (`_`-prefixed) APIs to public. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ValueError is widely used in this project. We could rephrase it like this: