Sorted collections #17
Open
codeboost wants to merge 14 commits into
Implements on-disk sorted maps wrapping the Java SortedMap (rank-augmented B-tree), behaving like Clojure's sorted-map for string and keyword keys.

- xitdb.util.sorted-key: order-preserving, reversible key codec (1-byte type tag + UTF-8) for strings and keywords.
- xitdb.util.sorted-operations: bridges wrapper types to Read/WriteSortedMap.
- xitdb.sorted-map: XITDBSortedMap (read) and XITDBWriteSortedMap (write), modelled on xitdb.hash-map, with ordered seq and print-method.
- conversion/v->slot!: detect PersistentTreeMap before the generic map? branch, persist as SORTED_MAP; reject custom comparators.
- xitdb-types/read-from-cursor: SORTED_MAP read dispatch.

Sorted/Indexed/Reversible (subseq, nth, rseq) and numeric/temporal keys are deferred to Issues 2 and 3. Includes PRD and issue breakdown.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
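The "1-byte type tag + UTF-8" codec described above can be illustrated with a minimal sketch. It is written in Python purely for brevity and is not the PR's Clojure code; the tag values and function names are hypothetical, and only the relative order of the tags (string before keyword) is taken from the commit messages.

```python
TAG_STRING = 0x10   # hypothetical tag value
TAG_KEYWORD = 0x20  # hypothetical; only "string tag < keyword tag" matters


def encode_string(s: str) -> bytes:
    # The tag byte comes first so all strings sort before all keywords.
    return bytes([TAG_STRING]) + s.encode("utf-8")


def encode_keyword(name: str) -> bytes:
    return bytes([TAG_KEYWORD]) + name.encode("utf-8")


def decode_key(b: bytes):
    # Reversible: the tag identifies the type, the rest is the UTF-8 body.
    kind = {TAG_STRING: "string", TAG_KEYWORD: "keyword"}[b[0]]
    return kind, b[1:].decode("utf-8")
```

Python compares `bytes` values lexicographically as unsigned bytes, which is the same kind of ordering an unsigned byte-array comparator gives on the encoded keys.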
Extend the order-preserving sorted-key codec with tagged encodings for long, double, java.time.Instant and java.util.Date, so numeric and temporal keys sort in their natural order on disk.

- long: 8-byte big-endian with the sign bit flipped (XOR 0x80) so signed integers sort correctly under unsigned byte comparison.
- double: IEEE-754 big-endian with the order-preserving bit flip (flip all bits when sign set, else flip sign bit). NaN is rejected (ordering undefined).
- Instant: epoch-second (sign-flipped) + nano-of-second (4-byte BE).
- Date: epoch-millis, sign-flipped 8-byte BE; distinct tag from Instant.

Cross-type tag order: long < double < instant < date < string < keyword, making heterogeneous-key comparison total (never throws).

Tests: round-trip + order-preservation property tests (deterministic randomized loops, fixed seeds; test.check is not a dependency), boundary values, cross-type-never-throws, and :memory-DB integration showing numeric/temporal keys iterate in natural order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
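The sign-flip and bit-flip tricks listed above can be sketched as follows. This is an illustrative Python version, not the PR's implementation: the function names are hypothetical and the leading type tag byte is omitted, since the commit message does not give the tag values.

```python
import struct


def encode_long(n: int) -> bytes:
    # 8-byte big-endian two's complement, then flip the sign bit (top byte XOR 0x80).
    raw = struct.pack(">q", n)
    return bytes([raw[0] ^ 0x80]) + raw[1:]


def decode_long(b: bytes) -> int:
    return struct.unpack(">q", bytes([b[0] ^ 0x80]) + b[1:])[0]


def encode_double(x: float) -> bytes:
    if x != x:  # NaN is the only value not equal to itself
        raise ValueError("NaN has no defined ordering")
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits >> 63:
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit
    else:
        bits ^= 1 << 63             # non-negative: flip only the sign bit
    return struct.pack(">Q", bits)


def decode_double(b: bytes) -> float:
    bits = struct.unpack(">Q", b)[0]
    if bits >> 63:
        bits ^= 1 << 63             # encoded top bit set: value was non-negative
    else:
        bits ^= 0xFFFFFFFFFFFFFFFF  # encoded top bit clear: value was negative
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def encode_instant(epoch_second: int, nano: int) -> bytes:
    # Sign-flipped seconds followed by a 4-byte big-endian nano-of-second.
    return encode_long(epoch_second) + struct.pack(">I", nano)
```

After encoding, plain unsigned byte comparison agrees with numeric (or chronological) order, which is what lets the B-tree compare keys without decoding them.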
Make XITDBSortedMap a fully sorted Clojure collection so sorted?, subseq, rsubseq, seq, rseq and indexed nth work against disk.

- clojure.lang.Sorted: comparator (Arrays.compareUnsigned over encoded keys, consistent with the engine and total across types), entryKey, seq(ascending?), seqFrom(k, ascending?). Ascending seqFrom uses the native O(log n) iteratorFrom lower-bound seek; descending uses rank + a lazy descending getIndexKeyValuePair index walk (no native reverse iterator exists).
- clojure.lang.Indexed: nth via getIndexKeyValuePair (negative indices count from the end; nth/2 returns not-found, nth/1 throws IndexOutOfBoundsException out of range).
- clojure.lang.Reversible: rseq (lazy descending walk).

seqFrom/rseq helpers return nil (not an empty lazy-seq) when empty so clojure.core/subseq's when-let short-circuits instead of NPEing. Added sorted-key/key-comparator and sorted-operations helpers (smap-seq-from, smap-nth, smap-rank, smap-rseq).

Tests assert subseq / rsubseq / rseq / nth entry-for-entry against a plain sorted-map oracle for all bound forms, plus empty-map and negative-index edge cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
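The nth and descending seqFrom behaviour described above can be sketched against an in-memory stand-in. This is a Python illustration under stated assumptions: a sorted list plays the role of the B-tree, `bisect` stands in for the engine's rank lookup, and the function names are hypothetical.

```python
from bisect import bisect_right

_MISSING = object()  # sentinel distinguishing "no default" from a default of None


def smap_nth(keys, i, not_found=_MISSING):
    # Negative indices count from the end, as in the commit message.
    n = len(keys)
    idx = i + n if i < 0 else i
    if 0 <= idx < n:
        return keys[idx]
    if not_found is _MISSING:
        raise IndexError(i)  # analogue of nth/1 throwing when out of range
    return not_found         # analogue of nth/2 returning not-found


def seq_from_desc(keys, k):
    # Index of the greatest key <= k; -1 when every key is greater than k.
    start = bisect_right(keys, k) - 1
    if start < 0:
        return None  # nil rather than an empty sequence
    # Lazy descending index walk; nothing is read until the caller iterates.
    return (keys[i] for i in range(start, -1, -1))
```

Returning `None` for the empty case mirrors the nil-versus-empty-lazy-seq distinction the commit calls out: a caller that tests the result for truthiness can stop immediately.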
Persist clojure.lang.PersistentTreeSet as a SORTED_SET (a SortedMap with no values, member-as-key) and expose it as a fully ordered Clojure set.

- conversion/v->slot!: detect PersistentTreeSet before the generic set? branch; reject non-default comparators. Generalise default-sorted-comparator? to any clojure.lang.Sorted.
- sorted-operations: sset-* helpers (count/contains/assoc/disj/empty/seq/seq-from/nth/rank/rseq), decoding members from each entry's key cursor.
- xitdb.sorted-set: XITDBSortedSet (read) implementing IPersistentSet, Counted, Seqable, IFn, Iterable plus Sorted/Indexed/Reversible (subseq/rsubseq/rseq/nth/sorted?); XITDBWriteSortedSet (write) with mutating conj/disj/empty.
- read-from-cursor: SORTED_SET tag dispatch.

Members of string/keyword/long/double/Instant/Date all iterate in natural order; ordered ops verified entry-for-entry against a plain sorted-set oracle.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New public namespace xitdb.sorted exposing the rank-augmented B-tree's
superpowers over both XITDBSortedMap and XITDBSortedSet:
- (rank coll k) - O(log n) count of entries strictly less than k; the
index of a present key/member, or the would-be
insertion index of an absent one. Inverse of nth.
- (from-index coll n) - lazy ordered seq starting at rank n, backed by the
engine's iteratorFromIndex (O(log n) seek + streaming
walk); does not materialise the whole collection.
- (page coll offset limit) - lazy ordered page, stops cleanly at the end.
For a sorted map elements are MapEntry pairs; for a sorted set, members. The
public ns is a thin dispatch over common/-unwrap; the streaming work lives in
sorted-operations (smap-seq-from-index / sset-seq-from-index).
Documented and tested as a timestamp->id secondary index that is paged
chronologically. rank/nth verified as inverses; pagination verified lazy
against a 2000-entry collection.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
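The semantics of rank, from-index and page can be sketched with a sorted list standing in for the on-disk B-tree. This is a Python illustration only; the real functions live in the Clojure namespace xitdb.sorted and are backed by the engine, whereas the names and list-based storage here are assumptions.

```python
from bisect import bisect_left
from itertools import islice


def rank(keys, k):
    # Count of entries strictly less than k: the index of a present key,
    # or the would-be insertion index of an absent one.
    return bisect_left(keys, k)


def from_index(keys, n):
    if n < 0:
        raise ValueError("rank must be non-negative")
    # Generator expression: entries are produced one at a time on demand.
    return (keys[i] for i in range(n, len(keys)))


def page(keys, offset, limit):
    # islice stops after `limit` items or at the end, whichever comes first.
    return islice(from_index(keys, offset), limit)
```

For a timestamp index such as `[10, 20, 30, 40, 50]`, `rank(keys, 35)` gives the position of the first entry at or after time 35, and `page(keys, that_rank, limit)` walks forward from there without touching earlier entries.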
The array-list and linked-list element writers predate sorted-collection support and dispatched a PersistentTreeMap/PersistentTreeSet through their generic map?/set? branches, silently persisting it as a HASH_MAP/HASH_SET. Ordering was then lost (sorted? false, subseq/rank broken); it only looked correct when key hashes happened to iterate in order.

Delegate the tree-type cases to v->slot! (which already checks the tree types before the hash branches and rejects custom comparators) in both coll->ArrayListCursor! and list->LinkedArrayListCursor!.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These were planning artifacts for the sorted map/set work, not intended to live in the repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eys, cursor writes, negative offsets

- Keyword keys now encode with a namespace-presence flag + ns/name layout so byte order matches Clojure's default comparator (non-namespaced before namespaced) and (keyword nil "a/b") no longer collides with :a/b.
- Materialization/printing/empty rebuild with (sorted-map-by key-comparator) / (sorted-set-by key-comparator) so heterogeneous key types don't throw.
- write-cursor-for-key handles Tag/SORTED_MAP so keypath writes can traverse into sorted maps; xit-tag->keyword maps the sorted tags.
- from-index/page reject negative ranks eagerly with IllegalArgumentException.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
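One way a namespace-presence flag removes the collision and fixes the ordering is sketched below in Python. The exact byte layout used by the PR is not shown on this page, so the flag values and the 0x00 separator between namespace and name are assumptions made for illustration (the sketch also assumes namespaces contain no NUL byte).

```python
def encode_keyword(ns, name):
    # Flag 0x00 = no namespace, 0x01 = namespaced (hypothetical values).
    # Non-namespaced keywords therefore sort before all namespaced ones.
    if ns is None:
        return b"\x00" + name.encode("utf-8")
    # The 0x00 separator makes a shorter namespace sort before a longer one
    # sharing the same prefix, so order is (namespace, then name).
    return b"\x01" + ns.encode("utf-8") + b"\x00" + name.encode("utf-8")


def decode_keyword(b):
    if b[0] == 0:
        return (None, b[1:].decode("utf-8"))
    ns, _, name = b[1:].partition(b"\x00")
    return (ns.decode("utf-8"), name.decode("utf-8"))
```

With this layout a keyword whose name is literally "a/b" and no namespace encodes differently from namespace "a" with name "b", so the two can coexist as distinct keys.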
…sorted-set member error

- default-sorted-comparator? now also accepts sorted-key/key-comparator (the engine's own byte ordering), so a materialized sorted map/set (which materialize stamps with that comparator) can be written back into a db instead of being rejected as a custom comparator.
- write-cursor-for-key now handles Tag/SORTED_SET with a clear, specific error: sorted-set members are immutable B-tree keys (the engine only exposes a writeable value slot, unused by sets), so there is no in-place member cursor. The message points to conj/disj on the set, replacing the confusing generic "Cannot get cursor ... tag ':sorted-set'".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements the six items from the code review, developed test-first (TDD).

1. Numeric keys now share one order-preserving space so longs and doubles interleave by value (1 < 1.5 < 2) instead of being split into two adjacent type-tagged ranges. Previously a double bound on a long-keyed map (e.g. `(subseq m >= 1.5)`) returned nothing, and storing a sorted-map with mixed numeric keys silently reordered it on disk. A single `tag-number` now carries a double-precision sort key + subtype + exact bytes (reversible, type-preserving). Same-type ordering is exact; the only residual is cross-type ordering of values differing beyond 2^53, documented in `number-body`. NOTE: changes the on-disk encoding of numeric sorted keys (unreleased feature).
2. Remove unused `xitdb.util.conversion` require from `xitdb.sorted-map`.
3. `smap-empty!` / `sset-empty!` now mirror `operations/map-empty!` and write a fresh empty SORTED_MAP/SET slot. The previous discarded-constructor approach silently degraded the collection to a hash map/set after `(swap! db empty)`, so keys reinserted afterwards lost sorted semantics. Now verified by tests.
4. `XITDBWriteSortedMap` / `XITDBWriteSortedSet` implement Sorted/Indexed/Reversible, so `nth`/`subseq`/`rseq`/`comparator` work on the value handed to `swap!` (the write types are Read* subclasses, so ops delegate directly).
5. Integer keys outside the signed 64-bit long range now fail fast with a clear, key-specific message instead of a raw numeric-cast error.
6. README: new "Sorted collections" section (ordered ops + `xitdb.sorted` rank/page/from-index) and sorted maps/sets added to Supported Data Types.

Full suite: 164 tests, 1050 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XprXbzsgFtSJnXpR9G12dH
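Item 1's "double-precision sort key + subtype + exact bytes" idea can be sketched as follows. This is a Python illustration, not the PR's `number-body`: the subtype byte values, field widths and helper names are assumptions, and the leading `tag-number` byte is left out.

```python
import struct

LONG_MIN, LONG_MAX = -2**63, 2**63 - 1
SUBTYPE_LONG, SUBTYPE_DOUBLE = 0, 1  # hypothetical subtype bytes


def _sortable_double(x: float) -> bytes:
    # Order-preserving flip: all bits for negatives, sign bit otherwise.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    bits ^= 0xFFFFFFFFFFFFFFFF if bits >> 63 else 1 << 63
    return struct.pack(">Q", bits)


def encode_number(x) -> bytes:
    if isinstance(x, int):
        if not LONG_MIN <= x <= LONG_MAX:
            raise ValueError(f"integer key {x} is outside the signed 64-bit range")
        # Adding 2**63 is the same as flipping the sign bit of the two's complement form.
        exact = struct.pack(">Q", x + 2**63)
        return _sortable_double(float(x)) + bytes([SUBTYPE_LONG]) + exact
    if x != x:
        raise ValueError("NaN keys have no defined ordering")
    return _sortable_double(x) + bytes([SUBTYPE_DOUBLE]) + struct.pack(">d", x)


def decode_number(b: bytes):
    # The sort key (first 8 bytes) is ignored; the exact bytes restore value and type.
    subtype, exact = b[8], b[9:17]
    if subtype == SUBTYPE_LONG:
        return struct.unpack(">Q", exact)[0] - 2**63
    return struct.unpack(">d", exact)[0]
```

Because the shared sort key comes first, 1, 1.5 and 2 interleave by value. Two longs that round to the same double (for example 2^53 and 2^53 + 1) tie on the sort key and subtype and are then ordered by their exact bytes, which is why same-type ordering stays exact in this sketch.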
Type-hint the per-element decode helpers so `seq`/iteration over sorted maps and sets no longer reflect:

- sorted_operations.clj: hint `kvpair->entry`, `member-from-cursor` and `kvpair->member` params with `ReadCursor$KeyValuePairCursor` / `ReadCursor` (added the inner class to the import). These run once per element.
- sorted_key.clj: coerce the `String(byte[], int, int, Charset)` offset/length args to int in `decode-keyword`'s namespaced branch.

Both feature namespaces now compile reflection-clean. Full suite: 164 tests, 1050 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XprXbzsgFtSJnXpR9G12dH
No description provided.