Slot-based C# syntax tree via a Roslyn source generator#3523
Conversation
19cac08 to
cd395a7
Compare
b3ae4a2 to
4aeaf62
Compare
The C# AST inherited NRefactory's freezable model (IFreezable, Freeze, IsFrozen, a frozen flag bit, and ThrowIfFrozen guards on every mutator), but the decompiler never uses it: nothing calls Freeze(), not even the generated null-node singletons, so every IsFrozen guard only ever evaluated false. The decompiler is single-threaded and never shares or freezes nodes. Remove the whole apparatus as preparation for the slot-based AST rewrite, which has no place for it. Roles are untouched here, so the flags word keeps its role index; only the freed frozen bit goes away.
212ddc5 to
2ab01a2
Compare
Introduce a Roslyn source generator that emits the visitor boilerplate for the C# AST from [DecompilerAstNode]-tagged node declarations: the IAstVisitor interface, the AcceptVisitor overloads, the pattern-placeholder nodes, and the initial DoMatch support. AccessorKind lets an accessor's keyword be chosen independently of its role, an early step toward shedding the NRefactory role model.
The generator emits the IAstVisitor interface, the AcceptVisitor overloads, and the null-node and pattern-placeholder nodes from [DecompilerAstNode] declarations, so drop the hand-written equivalents across the C# AST: per-node AcceptVisitor/DoMatch, the #region Null / #region PatternPlaceholder blocks, IAstVisitor.cs, and now-dead usings. Also adds AccessorKind and moves IdentifierExpressionBackreference into the PatternMatching folder. Assisted-by: Claude:claude-opus-4-8:Claude Code
Each child of a C# AST node is declared as a [Slot] partial property, and the source generator emits the accessor bodies and an ordered slot schema (SlotCount/GetSlotRole/IsCollectionSlot) from them. Generating the schema keeps slot order from being mis-stated by hand and lets a DEBUG invariant check declared slot order against document order on every decompile. The node hierarchy is converted family by family; the EntityDeclaration leaves flatten their inherited Attributes/ReturnType/NameToken into each leaf's ordered slot set. Storage stays the NRefactory linked list at this stage, so only the declaration model changes and output is unchanged. Assisted-by: Claude:claude-opus-4-8:Claude Code
The InsertMissingTokensDecorator path (TokenWriter.CreateWriterThatSetsLocationsInAST) reconstructs token nodes and assigns source locations onto the AST, feeding PDB sequence points and GUI navigation. The Pretty suite never drives it, so it had no coverage at all. Before reworking the token model, lock its observable consequences: the located path emits the same text as the plain path, real nodes receive ordered locations, location-based navigation resolves into the method body, and sequence points are produced for a method body. Assisted-by: Claude:claude-opus-4-8:Claude Code
Member and local modifiers were stored as modifier token children, and a ComposedType's ref/readonly/nullable/pointer specifiers and an array rank as token and comma children. The output visitor already derived all of these from scalar accessors, so move them to plain enum/bool/int fields. This removes another dependency on token children ahead of deleting the token nodes; the emitted keyword and specifier sequences are unchanged. Assisted-by: Claude:claude-opus-4-8:Claude Code
2ab01a2 to
13768ac
Compare
There was a problem hiding this comment.
Posted by an AI agent (Claude) on Siegfried's behalf.
Reviewed the full diff with focused passes over the source generator, the AST runtime, the pattern matcher, output/locations, the null-object→nullable migration, and the node declarations.
Built clean in Release and Debug (0 warnings / 0 errors, no packages.lock.json churn). Ran the in-memory AST tests locally: PatternMatching and LocationsInAst are green (35/36; the one failure is a missing legacy reference-assembly in my environment, not the change). The matcher's index-alignment, checkpoint save/restore, and null handling also pass an out-of-process probe harness.
No correctness blockers. The inline notes are minor: one stale comment, two DoMatch tightenings that have no live consumer, an enumerator-semantics narrowing, a narrowed backreference contract, and latent (currently-unread) metadata in the generator.
Non-blocking coverage gaps (all pass when probed, just not pinned): the matcher's absent-single-optional-slot, null-capture, Repeat+OptionalNode, and backreference-across-Repeat-backtrack paths in PatternMatchingTests; and the bare catch { (type-less) sequence-point span in LocationsInAstSamples.
e5a9407 to
f6c2dbc
Compare
Source locations were virtual, computed by recursing to the first and last child, whose leftmost and rightmost leaves are token nodes; sequence-point coordinates likewise came from reconstructed token nodes. Store locations as fields assigned while printing, and derive sequence-point coordinates from the surrounding real nodes plus the decompiler's fixed formatting, so neither depends on token children. The using/foreach await modifier becomes a plain bool field. Characterization gates lock the emitted locations and PDB coordinates, which are unchanged. Assisted-by: Claude:claude-opus-4-8:Claude Code
Comments and preprocessor directives were positional children interleaved into the child list, and punctuation, keywords and operators were token-node children. Add a leading/trailing trivia side-channel for comments and directives, emit it from the output visitor, and re-home every comment receiver onto it (including inside-block comments as comment-only empty statements and undecodable attribute arguments as an ErrorExpression). With locations and sequence points no longer sourced from token nodes, stop reconstructing them on the locations path and delete CSharpTokenNode, CSharpModifierToken and InsertSpecialsDecorator. The AST no longer carries token children or positional comments; output is byte-identical. Assisted-by: Claude:claude-opus-4-8:Claude Code
Children were kept in a per-node doubly-linked list with the slot accessors layered over it as a view. Storage now is the slot model: each node stores its children in generated backing fields, AstNodeCollection<T> is backed by a List<T>, and the flattened child-index space is owned by generated GetChildCount/GetChild/SetChild/GetChildSlot members, with sibling navigation, the role API and Clone re-expressed over them and indices renumbered lazily. A DEBUG CheckInvariant runs after each transform, the analog of the IL pipeline's per-transform check, so a transform that corrupts the tree fails at that transform. Output is unchanged. Assisted-by: Claude:claude-opus-4-8:Claude Code
The pattern matcher walked collections through INode.Role/FirstChild/ NextSibling, skipping siblings of a different role. Now that each AstNodeCollection<T> is already the per-role child list, the engine matches two collections by list index, and INode sheds Role/FirstChild/NextSibling entirely. A collection exposes its IReadOnlyList<INode> view through a cached adapter rather than implementing the interface directly, so a typed collection does not become ambiguous for LINQ. Characterization tests pin the matcher's behavior first. Assisted-by: Claude:claude-opus-4-8:Claude Code
Introduce the successor to node.Role for child-slot identity: the generator emits a CSharpSlotInfo per [Slot], exposed as node.Slot, plus a shared SlotKind enum for the polymorphic "is this node in an embedded-statement / condition / base-type slot?" comparisons a per-node identity cannot express. Migrate the printer and transform position checks from node.Role to node.Slot and node.Slot.Kind, and read identifier children and role-keyed writes through the typed properties. Role is still present and is removed later; output is unchanged. Assisted-by: Claude:claude-opus-4-8:Claude Code
The flags word existed to back per-node bit state, but its only user was Identifier.IsVerbatim, and it was retained as the seed for a deferred flag-packing optimization. Measuring that optimization showed it saves essentially nothing -- the CLR already packs a node's bools and narrow enums into the object's existing alignment padding (ParameterDeclaration and BinaryOperatorExpression: 0 bytes saved by hand-packing). So IsVerbatim becomes a plain bool (free, it lands in padding) and the flags field is removed. Removing it shrinks every AstNode by 8 bytes -- the field occupied its own aligned slot rather than pairing with childIndex as assumed (a minimal node measured 64 -> 56 bytes), a universal win across the whole tree that the per-node packing never delivered. Assisted-by: Claude:claude-opus-4-8:Claude Code
…kind
From a generated-code review. InterpolatedStringExpression.Content carried a leftover
[Slot("Role")] from the old role system -- the only generic 'Role' kind among 141 generated files;
it becomes [Slot("Content")], which also drops 'Role' from the generated SlotKind enum (nothing
referenced it). The generator's internal NameSlots/NullOnEmpty identifiers were stale -- the [NameSlot]
attribute and nullOnEmpty flag they were named after no longer exist (optionality is inferred from the
nullable annotation) -- so they become NameAccessors/IsOptional, matching the optional/required wording
used elsewhere. The *AstType visit-method rename uses EndsWith instead of a Contains substring match.
Assisted-by: Claude:claude-opus-4-8:Claude Code
A blank line as the first line inside a node class body (right after the opening brace) or the last line before the closing brace, left over from the migration. Cosmetic only -- deletions of blank lines at class boundaries; method bodies are untouched. Assisted-by: Claude:claude-opus-4-8:Claude Code
Assisted-by: Claude:claude-opus-4-8:Claude Code
Assisted-by: Claude:claude-opus-4-8:Claude Code
Assisted-by: Claude:claude-opus-4-8:Claude Code
Assisted-by: Claude:claude-opus-4-8:Claude Code
Assisted-by: Claude:claude-opus-4-8:Claude Code
Completes the nullable migration for ICSharpCode.Decompiler/CSharp/Syntax: every hand-written file now carries #nullable enable. The pattern-matching API (INode/Pattern/PatternExtensions DoMatch/Match/IsMatch) takes a nullable 'other', since matching legitimately compares a pattern against a missing child. Members that genuinely return or accept null are annotated to match (AttributeSection's target token, IAnnotatable.Annotation<T>, Statement/Expression.ReplaceWith, SyntaxExtensions.GetNextStatement), and a latent null dereference in Backreference.DoMatch is fixed. Assisted-by: Claude:claude-opus-4-8:Claude Code
A DEBUG-only structural check, the analog of ILInstruction.CheckInvariant, run after every AST transform in RunTransforms. It recursively asserts that each required (non-optional) single slot holds a child, every child's Parent points back, the stored flattened index matches the slot position, and the runtime type fits the slot, so a transform that corrupts the tree fails at that transform instead of as a downstream output diff. CSharpSlotInfo gains an IsOptional flag (emitted by the generator) so the check can tell required from optional slots. Assisted-by: Claude:claude-opus-4-8:Claude Code
Add a generic CSharpSlotInfo<T> (T is the child type, the element type for a collection) and have the generator emit each per-node slot field as the typed variant. This lets the typed child accessors infer the result type from the slot, so the explicit type argument can drop out of the SlotKind-based child API in the following steps. No behavior change -- the typed slots are still consumed through the non-generic CSharpSlotInfo base. Assisted-by: Claude:claude-opus-4-8:Claude Code
GetChild/GetChildren/SetChild take a typed CSharpSlotInfo<T> and infer the child type from the slot, delegating to the existing kind-based lookups. Call sites can pass a node's static slot (e.g. node.GetChild(PropertyDeclaration.GetterSlot)) instead of an explicit type argument plus a SlotKind. Assisted-by: Claude:claude-opus-4-8:Claude Code
The generator emits a Slots holder of typed CSharpSlotInfo<T> kind constants (T is the child type, widened to AstNode for the few kinds reused with several child types). Call sites whose kind maps to a single type switch from GetChildByRole<T>(SlotKind.X) to GetChild(Slots.X), inferring the result type from the slot. The ByRole methods and SlotKind remain for the multi-type Attribute accessor and the internal add/insert plumbing, removed in following steps. Assisted-by: Claude:claude-opus-4-8:Claude Code
A declaration's attribute collection (AstNodeCollection<AttributeSection>) and an
attribute section's own attributes (AstNodeCollection<Attribute>) both carried
[Slot("Attribute")], so the kind mapped to two child types and the shared
Slots.Attribute widened to AstNode. Give the declaration-level slot its own
"AttributeSection" kind so every kind maps to a single type; EntityDeclaration's
Attributes accessor now reads through the typed Slots.AttributeSection. This was
the last child-access keyed on the SlotKind matching enum.
Assisted-by: Claude:claude-opus-4-8:Claude Code
The slot kind is now the canonical typed CSharpSlotInfo<T> in Slots, matched by object identity (the IL SlotInfo model), instead of a parallel SlotKind enum. CSharpSlotInfo.Kind points at the canonical shared slot; a Slots constant is its own kind (null Kind, never read -- only per-node slots are asked for their kind), so there is no self-reference. Child access (GetChild/GetChildren/SetChild, GetCollectionByKind, AddChild/InsertChild) and the polymorphic node.Slot.Kind == Slots.X comparisons key on the canonical reference; the generated SlotKind enum is removed. No behavior change. Assisted-by: Claude:claude-opus-4-8:Claude Code
A contributor guide next to the node classes: the [DecompilerAstNode]/[Slot]/ [ExcludeFromMatch] attributes, the slot-and-kind model (per-node CSharpSlotInfo vs the canonical Slots constants), scalars and generated constructors, CheckInvariant, and a step-by-step for adding a node. Assisted-by: Claude:claude-opus-4-8:Claude Code
InsertMissingTokensDecorator removes a pending node by value and uses whether it was still present, so a HashSet keys that membership-removal on O(1); iteration order is irrelevant because every pending node receives the same location. Rename the field to nodesAwaitingStartLocation for clarity. Assisted-by: Claude:claude-opus-4-8:Claude Code
The generated GetChildNodes materialized a List<AstNode> (plus a boxed List.Enumerator at every foreach) for each node, so AstNode.Children and the visitor's per-node child walk allocated three objects per traversal. Decompiling System.Private.CoreLib that came to ~1.7 GB of extra garbage, roughly +7% over the linked-list model the slot tree replaced. A yield iterator removes the List but trades it for an equally costly per-node state machine, so it is not enough on its own. Enumerate children through a by-value struct enumerator over the existing FirstChild/NextSibling primitives, capturing each child's successor before it is yielded so a transform may still remove or replace the current child mid-traversal. AstNodeCollection<T> gets the same struct treatment for a direct foreach. Child enumeration now allocates nothing, bringing total allocations back to the linked-list baseline at byte-identical output (full Pretty suite green with CheckInvariant active). Assisted-by: Claude:claude-opus-4-8:Claude Code
AstNodeCollection<T> created its List<T> eagerly in the constructor, and the collection itself is created on first access of its slot property. So every collection slot that is read but stays empty -- Attributes, TypeArguments, type-parameter constraints and the like, which are absent on the vast majority of nodes -- still allocated a List that never held an element. Decompiling System.Private.CoreLib that was ~100 MB of short-lived empty Lists churning gen0. Make the list nullable and allocate it on first Add; an accessed-but-empty collection now costs only the wrapper. The backing array was already lazy (List defers it to the first add), so this drops the redundant List object for the empty case. Output is byte-identical and the Pretty suite stays green with CheckInvariant active. Assisted-by: Claude:claude-opus-4-8:Claude Code
A node's children carry a cached flattened childIndex, rebuilt by EnsureChildIndices after any structural mutation invalidates it. The single-slot setter and the collection indexer invalidated the whole set on every in-place replace, so the next sibling navigation (e.g. the visitor's NextSibling walk) rebuilt all indices in O(children). A transform that replaces each element of a block while traversing it was therefore O(N^2). But replacing a child in place does not move anything: a single slot always occupies the same flattened index, and a replaced collection element keeps its position. So carry the old child's index to the new child and skip the invalidation. Setting or clearing a slot still invalidates, since the new child's index is not known locally; a stale carried value is corrected by the next renumber anyway. Microbenchmark (replace every statement in an N-statement block): at N=32000, 2158 ms -> 1 ms. Output is byte-identical and the Pretty suite stays green with CheckInvariant active. Assisted-by: Claude:claude-opus-4-8:Claude Code
Removing a collection element invalidated the parent's whole flattened index set, so the next sibling navigation rebuilt it in O(children) -- and that renumber, not the array shift, was the dominant cost: a reverse (tail-first) removal, which shifts nothing, was still quadratic, isolating EnsureChildIndices as the culprit. When a node's only collection is also its last slot it owns the contiguous range [base, base + Count) with nothing after it, so an element's flattened index is just base + its local position (base is the slot's declaration position, since the preceding slots are all single children). On that fast-path -- which the generator now flags, passing the base index -- Add indexes only the appended element, Insert/Remove renumber only the shifted suffix, and IndexOf is base-relative O(1); none of them invalidate. Other shapes (several collections, or a slot after the collection) keep the invalidate-and-rebuild fallback. Tail and scattered removal, and removal during traversal, no longer pay the per-operation renumber. Microbenchmark, removing every element of an N-element block: tail-first at N=32000 went 1493 ms -> 0.5 ms (now O(N)); front-first is ~1.6x faster and no longer renumbers (its residual cost is the inherent array shift). Output is byte-identical and the Pretty suite stays green with CheckInvariant validating the maintained indices after every transform. Assisted-by: Claude:claude-opus-4-8:Claude Code
A single slot always occupies the same flattened index, so filling, clearing, or replacing it moves no other child -- only the new child's own index needs setting. The generated single-slot setters now pass that index to SetChildNode (a compile-time constant when no collection precedes the slot; SetChild forwards its argument), which assigns childIndex directly and never invalidates -- mirroring the IL AST's SetChildInstruction(ref, value, index). Previously a set-from-null could not know the index here and fell back to invalidating, forcing a later O(children) EnsureChildIndices rebuild. With the indices now kept current by construction, NextSibling/PrevSibling inline the validity check and skip the (non-inlinable) EnsureChildIndices call in the overwhelmingly common already-valid case. Together these idle the renumber machinery on a System.Private.CoreLib decompile: EnsureChildIndices calls 53.4M -> 1.6M, actual rebuilds 1.97M -> 183K, elements renumbered 3.40M -> 757K. Output is byte-identical and the Pretty suite stays green with CheckInvariant validating the directly-assigned indices after every transform. Assisted-by: Claude:claude-opus-4-8:Claude Code
These helpers are documented to return null when nothing matches, but returned `null!`, hiding the very nullable warnings #nullable enable is meant to surface: a miss handed back null typed as non-null and would NRE downstream with no compile-time signal. Returning T? lets the compiler enforce the guard at each call site (both existing callers already null-check). GetVariable forwards the result, so it becomes VariableInitializer? to match. Assisted-by: Claude:claude-opus-4-8:Claude Code
f6c2dbc to
cdf1c22
Compare
A slot kind names one child position, so the generator emits a single typed Slots.X constant for it. A kind used with two different child types had to widen that constant to AstNode, and the typed GetChildren<T> accessor then cast the live AstNodeCollection<Concrete> to AstNodeCollection<AstNode> -- an InvalidCastException waiting for the first caller (latent today, but the model permitted it). Make the loose state unrepresentable instead of guarding it at runtime: DSTG001 errors when a [Slot] kind is declared with more than one child type. The six kinds that only coincidentally shared a name (Body, True, False, Initializer, SwitchSection, Variable) are split into precisely typed kinds; a genuinely either/or position (a lambda body) stays one declared type, AstNode. Every Slots.X now carries its real element type, so the GetChildren cast can no longer fail. Output is unchanged: the Pretty suite stays byte-identical with CheckInvariant green in DEBUG. Assisted-by: Claude:claude-opus-4-8:Claude Code
After the Role and TokenRole hierarchies were removed, the Roles class no longer held any roles -- only the punctuation and keyword text the output visitor writes (LPar, Arrow, ClassKeyword, ...). The name was a misleading vestige of the deleted system. Rename it to Tokens, which is what it now is; the constants and their values are unchanged. Also refresh the CSharpSlotInfo doc, which still described slot identity as the successor to the removed node.Role == Roles.X comparison. Assisted-by: Claude:claude-opus-4-8:Claude Code
Overview
This reworks the C# AST in
ICSharpCode.Decompiler/CSharp/Syntax/from the NRefactoryrole + doubly-linked-list child model to a slot-based model (in the spirit of the ILAst),
with the per-node boilerplate emitted by a Roslyn source generator, and sheds the NRefactory
machinery the decompiler does not need. The end state also turns on nullable reference types across
the AST and its consumers.
Each child of a node is declared once as a
[Slot]partial property; the generator owns everythingmechanical derived from those declarations.
What changes
[DecompilerAstNode]/[Slot]declarations it emits thevisitor interface,
AcceptVisitor,DoMatch, the slot accessors and flattened child-index API(
GetChildCount/GetChild/SetChild/GetChildSlotInfo/GetCollectionByKind/CloneChildrenInto),a single-pass child enumerator, the per-node constructors (parameters in source order), and
node.Slot, the per-nodeCSharpSlotInfo<T>, and a generatedSlotsholder of the canonical kinds. A[Slot]on anAstNode-typed propertyis a child slot; a
[Slot]on astringproperty is a name -- a convenience string over abacking
Identifiertoken slot the generator owns (the property type disambiguates, so no separateattribute is needed).
AstNodeCollection<T>backed by aList<T>.CheckInvariant(DEBUG only, the analog ofILInstruction.CheckInvariant): runs before thetransform pipeline and after every transform. It recursively verifies the slot structure (each
child's parent back-pointer, its flattened index, and its slot type) and that every required
(non-optional) slot is filled, and node types override it to assert their own scalar constraints --
the invariants formerly enforced by throwing setters (e.g.
ComposedType.PointerRank >= 0,ArraySpecifier.Dimensions >= 1) now read as auto-properties checked there.CSharpTokenNode/CSharpModifierTokenandInsertSpecialsDecoratoraregone. Punctuation/keyword/operator text lives on the nodes; modifiers, specifiers, the await flag,
etc. become scalar fields. Source locations are stored at print time instead of recomputed from
child token nodes.
space (like annotations), rather than positional children.
T?(with the C#grammar as the oracle for optionality), the null-object pattern is deleted, and
#nullable enableis turned on across the transforms, the AST consumer layer (output visitor, the IL-to-C# builders,
TypeSystemAstBuilder, the translation-result wrappers, sequence-point/namespace collectors, theannotation helpers), and the syntax node files themselves. Optional names follow suit: a name that
may be absent is typed
string?and reads asnullwhen absent (rather than a non-null empty-stringsentinel); required names stay non-null
string.Role/Role<T>andTokenRole: child positions are keyed on typed slot objects --node.GetChild(Slots.Left)infers the child type andnode.Slot.Kind == Slots.Initializeris matchedby object identity (there is no kind enum); token text is carried as plain constants.
INode, list-by-index collectionmatching, with a nullable
otherso a pattern can match against an absent child); generatedDoMatchmatches every structural member, which fixed a few latent under-matching bugs.properties become auto-properties; leaf node classes are sealed; dead parsing-only state and the
NodeTypeenum are removed. The now-unused per-nodeflagsword was dropped (its only user,Identifier.IsVerbatim, is a plainbool), which the CLR's field-padding makes free on the nodecarrying it while shrinking every other node by 8 bytes -- a measured win that a planned bit-packing
scheme did not deliver (the CLR already packs scalars into padding).
Behavior and testing
This is a representation change, not an output change: the Pretty test suite stays byte-identical
throughout, and the full decompiler suite is green on top of current
master. Each commit builds thedecompiler libraries,
ilspycmd, and the ILSpy UI, and the suite is green per logical step. GeneratedDoMatchmade several matchers stricter where the hand-written ones under-matched; those are capturedby the pattern-matching tests.
Notes
generator bring-up, the later ones the migration on top.
CSharp/consumers; the semanticresolver layer (
CSharp/Resolver/) is intentionally out of scope here.CSharp/Syntax/README.mddocuments the generator,its attributes, the slot/kind model,
CheckInvariant, and how to add a node.