
Sorted collections #17

Open
codeboost wants to merge 14 commits into master from sorted-collections

No description provided.

codeboost and others added 14 commits June 23, 2026 17:54
Implements on-disk sorted maps wrapping the Java SortedMap (rank-augmented
B-tree), behaving like Clojure's sorted-map for string and keyword keys.

- xitdb.util.sorted-key: order-preserving, reversible key codec (1-byte
  type tag + UTF-8) for strings and keywords.
- xitdb.util.sorted-operations: bridges wrapper types to Read/WriteSortedMap.
- xitdb.sorted-map: XITDBSortedMap (read) and XITDBWriteSortedMap (write),
  modelled on xitdb.hash-map, with ordered seq and print-method.
- conversion/v->slot!: detect PersistentTreeMap before the generic map?
  branch, persist as SORTED_MAP; reject custom comparators.
- xitdb-types/read-from-cursor: SORTED_MAP read dispatch.
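The tag-plus-UTF-8 layout can be pictured with a minimal Java sketch (Java being the language of the wrapped engine). It is an illustration, not the code in `xitdb.util.sorted-key`. The tag values `TAG_STRING`/`TAG_KEYWORD` are hypothetical, and a keyword is encoded here as a bare name purely for illustration; a later commit in this PR changes the keyword layout. Because the tag is the first byte, unsigned byte comparison keeps every string below every keyword, and UTF-8 bytes with the same tag compare in code-point order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of a "1-byte type tag + UTF-8" sorted-key codec.
// TAG_STRING / TAG_KEYWORD are hypothetical values, not xitdb's real tags.
final class TaggedKeyCodec {
    static final byte TAG_STRING = 0x10;
    static final byte TAG_KEYWORD = 0x11;

    // Prefix the UTF-8 bytes with the type tag so each type occupies its
    // own contiguous byte range.
    static byte[] encode(byte tag, String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[1 + body.length];
        out[0] = tag;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    // Reversible: drop the tag byte and decode the rest as UTF-8.
    static String decode(byte[] key) {
        return new String(key, 1, key.length - 1, StandardCharsets.UTF_8);
    }

    // Lexicographic unsigned byte comparison; a proper prefix sorts first.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```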

Sorted/Indexed/Reversible (subseq, nth, rseq) and numeric/temporal keys
are deferred to Issues 2 and 3. Includes PRD and issue breakdown.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Extend the order-preserving sorted-key codec with tagged encodings for
long, double, java.time.Instant and java.util.Date, so numeric and
temporal keys sort in their natural order on disk.

- long: 8-byte big-endian with the sign bit flipped (XOR 0x80 on the most
  significant byte) so signed integers sort correctly under unsigned byte
  comparison.
- double: IEEE-754 big-endian with the order-preserving bit flip (flip all
  bits when the sign bit is set, else flip only the sign bit). NaN is
  rejected because its ordering is undefined.
- Instant: epoch-second (sign-flipped) + nano-of-second (4-byte BE).
- Date: epoch-millis, sign-flipped 8-byte BE; distinct tag from Instant.
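A minimal Java sketch of the long, double and Instant bodies described above, with type tags omitted. The class and method names are hypothetical, and the epoch-second part of an Instant is assumed to be 8 bytes (the commit only states the nano-of-second width). XOR with `Long.MIN_VALUE` flips exactly the sign bit, which is the same as XOR 0x80 on the first big-endian byte.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.util.Arrays;

// Sketch of order-preserving numeric/temporal key bodies (type tags omitted).
final class NumericKeyCodec {
    // long: 8-byte big-endian (ByteBuffer's default order) with the sign bit
    // flipped, so Long.MIN_VALUE encodes as 0x00.. and Long.MAX_VALUE as 0xFF..
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    // double: negative values flip every bit (reversing their order);
    // non-negative values flip only the sign bit (lifting them above negatives).
    static byte[] encodeDouble(double d) {
        if (Double.isNaN(d)) {
            throw new IllegalArgumentException("NaN has no defined sort order");
        }
        long bits = Double.doubleToLongBits(d);
        long ordered = bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(ordered).array();
    }

    static double decodeDouble(byte[] b) {
        long ordered = ByteBuffer.wrap(b).getLong();
        long bits = ordered < 0 ? ordered ^ Long.MIN_VALUE : ~ordered;
        return Double.longBitsToDouble(bits);
    }

    // Instant: sign-flipped epoch-second (8 bytes, assumed) + nano-of-second
    // (4-byte BE). getNano() is always in [0, 999_999_999], so its plain
    // big-endian bytes already sort correctly.
    static byte[] encodeInstant(Instant t) {
        return ByteBuffer.allocate(12)
                .putLong(t.getEpochSecond() ^ Long.MIN_VALUE)
                .putInt(t.getNano())
                .array();
    }

    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```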

Cross-type tag order: long < double < instant < date < string < keyword,
making heterogeneous-key comparison total (never throws).

Tests: round-trip + order-preservation property tests (deterministic
randomized loops, fixed seeds; test.check is not a dependency), boundary
values, cross-type-never-throws, and :memory-DB integration showing
numeric/temporal keys iterate in natural order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make XITDBSortedMap a fully sorted Clojure collection so sorted?, subseq,
rsubseq, seq, rseq and indexed nth work directly against the on-disk data.

- clojure.lang.Sorted: comparator (Arrays.compareUnsigned over encoded
  keys, consistent with the engine and total across types), entryKey,
  seq(ascending?), seqFrom(k, ascending?). Ascending seqFrom uses the
  native O(log n) iteratorFrom lower-bound seek; descending uses rank +
  a lazy descending getIndexKeyValuePair index walk (no native reverse
  iterator exists).
- clojure.lang.Indexed: nth via getIndexKeyValuePair (negative indices
  count from the end; nth/2 returns not-found, nth/1 throws
  IndexOutOfBoundsException out of range).
- clojure.lang.Reversible: rseq (lazy descending walk).
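The descending path can be sketched in Java with a sorted `List<String>` as a hypothetical stand-in for the B-tree's index access (`getIndexKeyValuePair`). For `rsubseq` with `<= k`, the walk must start at the greatest key not above `k`: since rank is the count of keys strictly below `k`, the start index is the rank itself when the key at that index equals `k`, and rank minus one otherwise. The walk fetches one index per step, so it stays lazy. The `nth` overloads assume `-1` names the last element.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch: descending seqFrom and nth over a rank-indexed sorted sequence.
// A sorted List stands in for the B-tree's index-based access.
final class RankWalk {
    // rank(k): count of keys strictly less than k (lower-bound binary search).
    static int rank(List<String> keys, String k) {
        int lo = 0, hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys.get(mid).compareTo(k) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Lazy descending walk from the greatest key <= k, one index per next().
    static Iterator<String> descendingFrom(List<String> keys, String k) {
        int r = rank(keys, k);
        int start = (r < keys.size() && keys.get(r).equals(k)) ? r : r - 1;
        return new Iterator<String>() {
            private int i = start;

            @Override public boolean hasNext() { return i >= 0; }

            @Override public String next() {
                if (i < 0) throw new NoSuchElementException();
                return keys.get(i--);
            }
        };
    }

    // nth/2: negative indices count from the end; out of range -> notFound.
    static String nth(List<String> keys, int i, String notFound) {
        int idx = i < 0 ? keys.size() + i : i;
        return (idx >= 0 && idx < keys.size()) ? keys.get(idx) : notFound;
    }

    // nth/1: same indexing, but out of range throws.
    static String nth(List<String> keys, int i) {
        int idx = i < 0 ? keys.size() + i : i;
        if (idx < 0 || idx >= keys.size()) throw new IndexOutOfBoundsException("index " + i);
        return keys.get(idx);
    }
}
```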

seqFrom/rseq helpers return nil (not an empty lazy-seq) when empty so
clojure.core/subseq's when-let short-circuits instead of NPEing.

Added sorted-key/key-comparator and sorted-operations helpers
(smap-seq-from, smap-nth, smap-rank, smap-rseq). Tests assert subseq /
rsubseq / rseq / nth entry-for-entry against a plain sorted-map oracle
for all bound forms, plus empty-map and negative-index edge cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Persist clojure.lang.PersistentTreeSet as a SORTED_SET (a SortedMap with no
values, member-as-key) and expose it as a fully ordered Clojure set.

- conversion/v->slot!: detect PersistentTreeSet before the generic set? branch;
  reject non-default comparators. Generalise default-sorted-comparator? to any
  clojure.lang.Sorted.
- sorted-operations: sset-* helpers (count/contains/assoc/disj/empty/seq/
  seq-from/nth/rank/rseq), decoding members from each entry's key cursor.
- xitdb.sorted-set: XITDBSortedSet (read) implementing IPersistentSet, Counted,
  Seqable, IFn, Iterable plus Sorted/Indexed/Reversible (subseq/rsubseq/rseq/
  nth/sorted?); XITDBWriteSortedSet (write) with mutating conj/disj/empty.
- read-from-cursor: SORTED_SET tag dispatch.

Members of type string, keyword, long, double, Instant and Date all iterate in
natural order; ordered ops are verified entry-for-entry against a plain
sorted-set oracle.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

New public namespace xitdb.sorted exposing the rank-augmented B-tree's
superpowers over both XITDBSortedMap and XITDBSortedSet:

- (rank coll k)        - O(log n) count of entries strictly less than k; the
                          index of a present key/member, or the would-be
                          insertion index of an absent one. Inverse of nth.
- (from-index coll n)  - lazy ordered seq starting at rank n, backed by the
                          engine's iteratorFromIndex (O(log n) seek + streaming
                          walk); does not materialise the whole collection.
- (page coll offset limit) - lazy ordered page, stops cleanly at the end.
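The three operations can be sketched over a sorted, duplicate-free `long[]` (for example timestamps) standing in for the rank-augmented B-tree; the class and method names are hypothetical. `rank` is a lower-bound binary search, `fromIndex` streams lazily from a rank, and `page` is `fromIndex` plus a limit, so only the requested elements are read. The sketch also rejects negative ranks eagerly, which a later commit in this PR adds to the real `from-index`/`page`.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Sketch of rank / from-index / page over a sorted, duplicate-free long[].
final class RankPaging {
    // Count of keys strictly less than k: the index of a present key, or the
    // insertion index of an absent one (the inverse of nth for present keys).
    static int rank(long[] sorted, long k) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < k) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Lazy ordered stream starting at rank n; elements are read only as consumed.
    static Stream<Long> fromIndex(long[] sorted, int n) {
        if (n < 0) throw new IllegalArgumentException("rank must be non-negative: " + n);
        return IntStream.range(n, sorted.length).mapToObj(i -> sorted[i]);
    }

    // A page is from-index plus a limit; it stops cleanly at the end.
    static Stream<Long> page(long[] sorted, int offset, int limit) {
        return fromIndex(sorted, offset).limit(limit);
    }
}
```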

For a sorted map elements are MapEntry pairs; for a sorted set, members. The
public ns is a thin dispatch over common/-unwrap; the streaming work lives in
sorted-operations (smap-seq-from-index / sset-seq-from-index).

Documented and tested as a timestamp->id secondary index that is paged
chronologically. rank/nth verified as inverses; pagination verified lazy
against a 2000-entry collection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The array-list and linked-list element writers predate sorted-collection
support and dispatched a PersistentTreeMap/PersistentTreeSet through their
generic map?/set? branches, silently persisting it as a HASH_MAP/HASH_SET.
Ordering was then lost (sorted? false, subseq/rank broken); it only looked
correct when key hashes happened to iterate in order.

Delegate the tree-type cases to v->slot! (which already checks the tree
types before the hash branches and rejects custom comparators) in both
coll->ArrayListCursor! and list->LinkedArrayListCursor!.
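The ordering bug is easy to reproduce in any type-dispatch chain. In this Java analog (not the project's Clojure code), `java.util.SortedMap`/`SortedSet` stand in for `PersistentTreeMap`/`PersistentTreeSet`; they are unrelated to the engine's own `SortedMap` class, and the `Tag` names are illustrative. Testing the sorted types before the generic branches, and rejecting non-natural comparators there, mirrors the shape of the fix.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;

// Java analog of the element-writer dispatch order.
final class SlotDispatch {
    enum Tag { SORTED_MAP, SORTED_SET, HASH_MAP, HASH_SET, OTHER }

    static Tag classify(Object v) {
        // Sorted checks come first: every SortedMap is also a Map (and every
        // SortedSet a Set), so a generic branch tested earlier would swallow it.
        if (v instanceof SortedMap) {
            requireNaturalOrder(((SortedMap<?, ?>) v).comparator());
            return Tag.SORTED_MAP;
        }
        if (v instanceof SortedSet) {
            requireNaturalOrder(((SortedSet<?>) v).comparator());
            return Tag.SORTED_SET;
        }
        if (v instanceof Map) return Tag.HASH_MAP;
        if (v instanceof Set) return Tag.HASH_SET;
        return Tag.OTHER;
    }

    // TreeMap/TreeSet report a null comparator when using natural ordering.
    static void requireNaturalOrder(Comparator<?> c) {
        if (c != null) {
            throw new IllegalArgumentException("custom comparators cannot be persisted");
        }
    }
}
```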

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

These were planning artifacts for the sorted map/set work, not intended
to live in the repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…eys, cursor writes, negative offsets

- Keyword keys now encode with a namespace-presence flag + ns/name layout so
  byte order matches Clojure's default comparator (non-namespaced before
  namespaced) and (keyword nil "a/b") no longer collides with :a/b.
- Materialization, printing and empty now rebuild with
  (sorted-map-by key-comparator) / (sorted-set-by key-comparator) so
  heterogeneous key types don't throw.
- write-cursor-for-key handles Tag/SORTED_MAP so keypath writes can traverse
  into sorted maps; xit-tag->keyword maps the sorted tags.
- from-index/page reject negative ranks eagerly with IllegalArgumentException.
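The Java sketch below shows one layout with the stated properties. It is an assumption, not necessarily the byte layout `xitdb.util.sorted-key` uses: the tag, flag and terminator values are hypothetical, and the layout assumes namespaces never contain U+0000 (the `0x00` terminator would otherwise be ambiguous). The flag byte puts every non-namespaced keyword first. The terminator, being lower than any other UTF-8 byte, makes namespace `a` sort before `ab`, as Clojure's comparator does. `(keyword nil "a/b")` and `:a/b` get different flag bytes, so they cannot collide.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of a namespace-flagged keyword layout. Tag, flag and terminator
// byte values are hypothetical; assumes namespaces never contain U+0000.
final class KeywordKeyCodec {
    static final int TAG_KEYWORD = 0x20;
    static final int NO_NS = 0x00;   // non-namespaced keywords sort first
    static final int HAS_NS = 0x01;
    static final int NS_END = 0x00;  // lowest byte, so ns "a" sorts before "ab"

    static byte[] encode(String ns, String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(TAG_KEYWORD);
        if (ns == null) {
            out.write(NO_NS);
        } else {
            out.write(HAS_NS);
            byte[] nsBytes = ns.getBytes(StandardCharsets.UTF_8);
            out.write(nsBytes, 0, nsBytes.length);
            out.write(NS_END);
        }
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        out.write(nameBytes, 0, nameBytes.length);
        return out.toByteArray();
    }

    // Returns {ns, name}; ns is null for a non-namespaced keyword.
    static String[] decode(byte[] key) {
        if (key[1] == NO_NS) {
            return new String[] {null, new String(key, 2, key.length - 2, StandardCharsets.UTF_8)};
        }
        int end = 2;
        while (key[end] != NS_END) end++;
        String ns = new String(key, 2, end - 2, StandardCharsets.UTF_8);
        String name = new String(key, end + 1, key.length - end - 1, StandardCharsets.UTF_8);
        return new String[] {ns, name};
    }

    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```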

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…sorted-set member error

- default-sorted-comparator? now also accepts sorted-key/key-comparator (the
  engine's own byte ordering), so a materialized sorted map/set — which
  materialize stamps with that comparator — can be written back into a db
  instead of being rejected as a custom comparator.
- write-cursor-for-key now handles Tag/SORTED_SET with a clear, specific error:
  sorted-set members are immutable B-tree keys (the engine only exposes a
  writeable value slot, unused by sets), so there is no in-place member cursor.
  The message points to conj/disj on the set, replacing the confusing generic
  "Cannot get cursor ... tag ':sorted-set'".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Implements the six items from the code review, developed test-first (TDD).

1. Numeric keys now share one order-preserving space so longs and doubles
   interleave by value (1 < 1.5 < 2) instead of being split into two adjacent
   type-tagged ranges. Previously a double bound on a long-keyed map (e.g.
   `(subseq m >= 1.5)`) returned nothing, and storing a sorted-map with mixed
   numeric keys silently reordered it on disk. A single `tag-number` now carries
   a double-precision sort key + subtype + exact bytes (reversible and
   type-preserving). Same-type ordering is exact; the only residual is the
   cross-type ordering of long and double values above 2^53 in magnitude,
   where a long can collide with a double at double precision (documented in
   `number-body`).
   NOTE: changes the on-disk encoding of numeric sorted keys (unreleased feature).

2. Remove unused `xitdb.util.conversion` require from `xitdb.sorted-map`.

3. `smap-empty!` / `sset-empty!` now mirror `operations/map-empty!` and write a
   fresh empty SORTED_MAP/SET slot. The previous discarded-constructor approach
   silently degraded the collection to a hash map/set after `(swap! db empty)`,
   so keys reinserted afterwards lost sorted semantics. Now verified by tests.

4. `XITDBWriteSortedMap` / `XITDBWriteSortedSet` implement Sorted/Indexed/
   Reversible, so `nth`/`subseq`/`rseq`/`comparator` work on the value handed to
   `swap!` (the write types are Read* subclasses, so ops delegate directly).

5. Integer keys outside the signed 64-bit long range now fail fast with a clear,
   key-specific message instead of a raw numeric-cast error.

6. README: new "Sorted collections" section (ordered ops + `xitdb.sorted`
   rank/page/from-index) and sorted maps/sets added to Supported Data Types.
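Items 1 and 5 can be sketched together in Java. The layout (tag, 8-byte order-preserving double sort key, subtype byte, 8 exact bytes) follows item 1's description, but the byte values, field widths and names are hypothetical, and `BigInteger` stands in for Clojure's big integers. Longs and doubles share one sort-key space, so `1 < 1.5 < 2`. Equal sort keys fall back to the subtype and then to the exact bytes, which keeps same-type order exact even above 2^53, where `(double) v` rounds.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of a single numeric key space: tag + order-preserving double sort
// key + subtype + exact bytes. Byte values and widths are hypothetical.
final class UnifiedNumberKey {
    static final byte TAG_NUMBER = 0x01;
    static final byte SUBTYPE_LONG = 0x00;
    static final byte SUBTYPE_DOUBLE = 0x01;
    static final BigInteger LONG_MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Order-preserving double bits: flip all bits if negative, else the sign bit.
    static long sortBits(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
    }

    static byte[] encode(Number n) {
        ByteBuffer buf = ByteBuffer.allocate(18).put(TAG_NUMBER);
        if (n instanceof Double) {
            double d = n.doubleValue();
            if (Double.isNaN(d)) throw new IllegalArgumentException("NaN cannot be a sorted key");
            // Exact bytes are raw IEEE-754 bits; they are only compared when
            // the sort keys tie, i.e. for identical doubles.
            buf.putLong(sortBits(d)).put(SUBTYPE_DOUBLE).putLong(Double.doubleToLongBits(d));
        } else {
            long v = toLong(n);
            // (double) v rounds above 2^53; sign-flipped exact bytes break ties.
            buf.putLong(sortBits((double) v)).put(SUBTYPE_LONG).putLong(v ^ Long.MIN_VALUE);
        }
        return buf.array();
    }

    // Integer keys outside the signed 64-bit range fail fast with a clear message.
    static long toLong(Number n) {
        if (n instanceof BigInteger) {
            BigInteger b = (BigInteger) n;
            if (b.compareTo(LONG_MIN) < 0 || b.compareTo(LONG_MAX) > 0) {
                throw new IllegalArgumentException(
                        "sorted key " + b + " is outside the signed 64-bit long range");
            }
            return b.longValue();
        }
        if (n instanceof Long || n instanceof Integer || n instanceof Short || n instanceof Byte) {
            return n.longValue();
        }
        throw new IllegalArgumentException("unsupported numeric key type: " + n.getClass());
    }

    // Reversible and type-preserving: the subtype selects Long or Double.
    static Number decode(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        buf.position(9); // skip tag (1 byte) + sort key (8 bytes)
        byte subtype = buf.get();
        long exact = buf.getLong();
        if (subtype == SUBTYPE_DOUBLE) {
            return Double.longBitsToDouble(exact);
        }
        return exact ^ Long.MIN_VALUE;
    }

    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

With this layout the residual case from item 1 is visible: a long just above 2^53 ties with the nearest double on the sort key and is then ordered by subtype rather than by value.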

Full suite: 164 tests, 1050 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XprXbzsgFtSJnXpR9G12dH

Type-hint the per-element decode helpers so `seq`/iteration over sorted
maps and sets no longer goes through reflection:

- sorted_operations.clj: hint `kvpair->entry`, `member-from-cursor` and
  `kvpair->member` params with `ReadCursor$KeyValuePairCursor` / `ReadCursor`
  (added the inner class to the import). These run once per element.
- sorted_key.clj: coerce the `String(byte[], int, int, Charset)` offset/length
  args to int in `decode-keyword`'s namespaced branch.

Both feature namespaces now compile reflection-clean. Full suite: 164 tests,
1050 assertions, 0 failures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XprXbzsgFtSJnXpR9G12dH

3 participants