Skip to content

Latest commit

 

History

History
290 lines (219 loc) · 12.5 KB

File metadata and controls

290 lines (219 loc) · 12.5 KB

Schemata Pattern Reference

Schemata uses 80+ unique regex patterns across hundreds of JSON Schema files to validate Nostr data structures. This document catalogs every pattern, organized by category, with examples and codegen classification.

Table of Contents


1. Cryptographic

Hex-encoded cryptographic values used throughout Nostr events.

Pattern Description Used For Codegen Op
^[a-f0-9]{64}$ 32-byte lowercase hex pubkey, event id, shared secret hex(64, lower)
^[a-fA-F0-9]{64}$ 32-byte mixed-case hex SHA-256 hash (case-insensitive) hex(64, mixed)
^[a-f0-9]{128}$ 64-byte lowercase hex Schnorr signature hex(128, lower)
^[a-f0-9]{40}$ 20-byte lowercase hex Git SHA-1 hash hex(40, lower)
^[a-fA-F0-9]{40}$ 20-byte mixed-case hex SHA-1 hash (case-insensitive) hex(40, mixed)
^[a-f0-9]{7,40}$ Abbreviated git SHA Git short hash hex_range(7, 40, lower)
^0x[0-9a-f]{4}$ 4-digit hex with 0x prefix Color codes, identifiers hex_prefixed("0x", 4)
`^(02 03)[a-f0-9]{64}$` Compressed secp256k1 public key (33 bytes) NIP-61 P2PK pubkey tag

Examples:

  • Valid pubkey (x-only): a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2
  • Valid compressed pubkey (P2PK): 0229f1fad861410f3dcabb3cd75ceb0e8b7cc6a8d1fa17dbd10e8133c000326a96
  • Valid signature: 128 hex chars
  • Invalid: A1B2C3... (uppercase in lowercase-only field), xyz123 (non-hex)

2. Network

URL patterns for relay WebSocket connections, HTTP endpoints, and streaming protocols.

Pattern Description Used For Codegen Op
^wss?://[a-zA-Z0-9._-]+(?::[0-9]+)?(?:/.*)?$ Strict relay URL Centralized relay URL type (@/relay-url.yaml) relay_url
`^(ws:// wss://).+$` Permissive relay URL Legacy relay positions (being migrated)
^wss?://.+ Relay URL (no anchor) Relay hints starts_with_any
^(https?://).+$ HTTP(S) URL Web resources starts_with_any
^https?://.+$ HTTP(S) URL (anchored) Image/media URLs starts_with_any
^https?://\S+$ HTTP(S) no whitespace Strict URL references prefix_no_whitespace
^(https?://|rtmp://|ws://|wss://).+$ Multi-protocol Streaming URLs starts_with_any
^/.+ Absolute path Resource paths starts_with_any

Valid relay URLs (strict pattern):

  • wss://relay.damus.io -- standard relay
  • wss://relay.ditto.pub -- NIP-50 search relay
  • wss://nos.lol -- public relay
  • ws://localhost:7777 -- local dev
  • wss://relay.example_host.com -- underscore in hostname
  • wss://abc123def456.onion -- Tor hidden service

Invalid relay URLs (strict pattern):

  • wss://!!! -- invalid hostname chars
  • wss:// has spaces -- spaces not allowed
  • http://relay.example.com -- wrong protocol

3. Nostr Coordinates

Event address patterns used in a tags and kind-specific references.

Pattern Description Used For Codegen Op
^\d+:[a-f0-9]{64}:.+$ Generic address a tag (kind:pubkey:dtag) a_tag
^30311:[a-f0-9]{64}:.+$ Kind 30311 address Live event a tag a_tag
^30312:[a-f0-9]{64}:.+$ Kind 30312 address Room a tag a_tag
^31990:[a-f0-9]{64}:.+$ Kind 31990 address NIP-89 handler address a_tag
^(31922|31923):[a-f0-9]{64}:.+$ Calendar kinds Calendar event address a_tag

Example: 30311:a1b2c3...64hex...:my-live-event


4. Numeric

Number patterns for timestamps, dimensions, and amounts.

Pattern Description Used For Codegen Op
^[0-9]+$ Unsigned integer string Timestamps, amounts, expiration all_digits
^\d+$ Unsigned integer (shorthand) Equivalent to above all_digits
^-?[0-9]+$ Signed integer string Relative values all_digits(allowNeg)
^\d+(?:\.\d+)?$ Decimal number Prices, coordinates decimal
^\d+x\d+$ / ^[0-9]+x[0-9]+$ Dimensions Image/video dimensions (WxH) dim
^[0-9]+(\.[0-9]+)*$ Version number Semantic versioning dotted_digits

Examples:

  • 1711234567 -- valid timestamp
  • -100 -- valid signed integer
  • 19.99 -- valid decimal
  • 1920x1080 -- valid dimensions

5. Structured Metadata (imeta)

Content prefix patterns for imeta tag entries (NIP-92).

Pattern Description Codegen Op
^url https?://\S+$ Media URL entry prefix_no_whitespace
^m [a-zA-Z].*/.* MIME type entry mime_type
^dim [0-9]{1,5}x[0-9]{1,5}$ Dimensions entry imeta_dim
^blurhash [A-Za-z0-9+/]+.*$ Blurhash entry prefix_nonempty
^x [a-f0-9]{64}$ SHA-256 hash entry hex_prefixed
^alt .+$ Alt text entry prefix_nonempty
^ox [a-f0-9]{64}$ Original hash entry hex_prefixed
^size \d+$ File size entry prefix_delim_rest
^fallback https?://\S+$ Fallback URL entry prefix_no_whitespace
^thumb \d+x\d+$ Thumbnail dimensions dim (prefixed)

6. Identity

Patterns for identifiers, bech32-encoded entities, and user references.

Bech32 / NIP-19

Pattern Description Used For Codegen Op
^npub1[02-9ac-hj-np-z]{58}$ Public key (fixed 63 chars) npub display format bech32("npub", 58)
^note1[02-9ac-hj-np-z]{58}$ Event ID (fixed 63 chars) note display format bech32("note", 58)
^nprofile1[02-9ac-hj-np-z]+$ Profile (TLV, variable) nprofile entities bech32("nprofile")
^nevent1[02-9ac-hj-np-z]+$ Event (TLV, variable) nevent entities bech32("nevent")
^naddr1[02-9ac-hj-np-z]+$ Address (TLV, variable) naddr entities bech32("naddr")
^nostr:(npub|note)1... NIP-27 URI nostr: protocol links nostr_uri
^lnurl1[02-9ac-hj-np-z]+$ LNURL (bech32) Lightning URL encoding bech32("lnurl")

Note: These patterns validate character set, prefix, and length only. Bech32 checksum validation requires actual decoding and is not expressible in regex. Use a bech32 library for full validation.

The bech32 character set [02-9ac-hj-np-z] is the standard Bech32 alphabet, which excludes 1 (separator), b, i, o (confusable with digits) from lowercase alphanumeric.

Other Identity

Pattern Description Codegen Op
^(([_A-Za-z0-9.-]+)|_)@... NIP-05 identifier nip05_identifier
^[a-z0-9._-]+$ Simple identifier chars_in("a-z0-9._-")
^[A-Za-z0-9]+$ Alphanumeric chars_in("A-Za-z0-9")
^[A-Za-z]+$ Alpha only chars_in("A-Za-z")
^[A-Z]+$ Uppercase alpha chars_in("A-Z")
^[A-Za-z]{3,6}$ Short alpha code chars_in("A-Za-z", 3, 6)

7. Protocol-Specific

Lightning

Pattern Description Codegen Op
^lnbc[a-z0-9]*1[02-9ac-hj-np-z]+$ BOLT11 invoice ln_invoice
^lnurl1[02-9ac-hj-np-z]+$ LNURL bech32 bech32("lnurl")

Git (NIP-34)

Pattern Description Codegen Op
^[a-f0-9]{40}$ SHA-1 commit hash hex(40, lower)
^[a-f0-9]{7,40}$ Abbreviated commit hex_range(7, 40)
^refs/(heads|tags)/[^\s]+$ Git branch/tag ref prefix_no_whitespace
^refs/.* Any git ref starts_with_any

Dates & Times

Pattern Description Codegen Op
^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ISO 8601 date date_iso
^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}...)?$ Full ISO 8601 datetime datetime_iso

Content Types

Pattern Description Codegen Op
^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/... MIME type (RFC 2045) mime_type_strict

Cryptographic Signatures

Pattern Description Codegen Op
^-----BEGIN PGP SIGNATURE-----... PGP signature block wrapped

Other

Pattern Description Codegen Op
^(?:[A-Za-z0-9+/]{4})*... Base64 (standard) base64
^[A-Za-z0-9+/]+={0,2}\?iv=... Base64 with IV (NIP-04/44) nip04_encrypted
^10\.\d{4,9}/[-._;()/:A-Z0-9]+$ DOI (Digital Object Identifier) doi

8. Codegen Pipeline

The schemata-codegen tool converts these regex patterns into native, regex-free code for multiple target languages.

How It Works

  1. Extract: Codegen reads dist/ JSON schemas and extracts all pattern fields
  2. Classify: classify-pattern.ts converts each regex into a PatternCheck IR node
  3. Emit: Language-specific emitters render each PatternCheck as native code

PatternCheck IR

PatternCheck =
  | hex(len, case)           -- fixed-length hex string
  | hex_range(min, max, case) -- variable-length hex string
  | hex_prefixed(prefix, len) -- hex with literal prefix
  | all_digits(allowNeg?)    -- numeric string
  | starts_with_any(prefixes) -- prefix check
  | chars_in(charset, min?, max?) -- character set validation
  | bech32(hrp, dataLen?)    -- bech32-encoded string
  | date_iso                 -- ISO 8601 date (YYYY-MM-DD)
  | datetime_iso             -- ISO 8601 datetime with optional time/timezone
  | decimal                  -- decimal number (digits with optional .digits)
  | relay_url                -- WebSocket relay URL with hostname/port/path validation
  | a_tag                    -- Nostr coordinate (kind:hex64:dtag)
  | nostr_uri                -- NIP-27 nostr: URI with bech32 entity
  | ln_invoice               -- BOLT11 Lightning invoice
  | nip05_identifier         -- NIP-05 internet identifier
  | nip04_encrypted          -- NIP-04 encrypted payload (base64?iv=base64)
  | mime_type / mime_type_strict -- RFC 2045 content type
  | base64                   -- standard Base64
  | doi                      -- Digital Object Identifier
  | dim                      -- dimensions (WxH)
  | imeta_dim                -- prefixed dimensions (dim WxH)
  | prefix_no_whitespace     -- prefix + no-whitespace tail
  | prefix_nonempty          -- prefix + non-empty tail
  | prefix_delim_rest        -- charset + delimiter + tail
  | wrapped(open, close)     -- delimited block (e.g., PGP signature)
  | compound(checks[])       -- AND combination
  | regex(pattern)           -- fallback (preserved as-is)

Classification Example

Input:  ^[a-f0-9]{64}$
Output: { op: 'hex', len: 64, case: 'lower' }

 C:    check_hex_64(s)
       --> iterates 64 chars, checks each is 0-9 or a-f, verifies null terminator

 Rust: check_hex_64(s)
       --> s.len() == 64 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))

 Go:   checkHex64(s)
       --> len(s) == 64 && all chars in [0-9a-f]

Coverage Status

100% of patterns are classified into native ops (no regex dependency needed). All 80 unique patterns in the current schema set have dedicated native implementations across 13 languages: C, Rust, Go, TypeScript, Python, Java, Kotlin, Swift, Dart, C#, C++, PHP, Ruby.

This means generated validators can run without importing any regex library. Each native op is implemented with exhaustive compiler guards (TypeScript switch with never default, Rust match exhaustiveness, etc.) so adding a new op is a compile error until all 13 emitters handle it.

Adding a New Pattern

  1. Add the regex to the appropriate schema YAML in nips/
  2. Run pnpm build to generate dist/ JSON
  3. If the pattern should be native, add a classifier in classify-pattern.ts
  4. Add rendering in each emitter (emit-c.ts, emit-rust.ts, etc.)
  5. Add tests in tests/classify-pattern.test.ts

See Also