Roadmap

Six phases from Python interpreter to custom silicon.

Phase 1 — Python Interpreter

Status: ✅ Complete

All interpreter features complete — lexer, parser, knowledge graphs, inference, Markov chains, functions, loops, variables, math, conditionals, tensors, compression, ethics rules, REPL, error handling.

Phase 1.5 — Semantic Compression Algorithm

Status: ✅ Complete · Compression v2 planned (Phase 3.4.5)

The original four-phase compression algorithm for knowledge graphs: structural deduplication, weight scoring, inference pruning, and SHA256 signatures. It reaches a 70.59% storage ratio on the test graph. Compression is slow; expansion is fast.

A planned Compression v2 in Phase 3.4.5 will introduce iterative inference pruning (multi-step derivation chains, not just one-step), tighten the utility scoring in the algorithm's own second phase (weight scoring, not roadmap Phase 2), and document type-coercion behavior. The architecture stays the same: these are refinements, not redesigns.

Phase 1.6 — Runtime Integrity & Encryption

Status: ✅ Complete

The sealed keyword. Cryptographic protection for ethics rules and core knowledge. Any modification to a sealed block breaks the signature and prevents boot. Ethics becomes mathematical rather than social.

sealed ethics_core
  ethics rule protect_children
    when target is minor
    when action is harmful
    then deny with reason: "Absolute limit"
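The "modification breaks the signature" behavior can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the signature is a plain SHA256 digest of the block's source text (SHA256 is named for compression signatures in Phase 1.5, but the actual sealing scheme is not described here), and the names `seal`, `verify_or_refuse_boot`, and `SealBroken` are hypothetical.

```python
import hashlib


class SealBroken(Exception):
    """Raised when a sealed block no longer matches its recorded signature."""


def seal(block_source: str) -> str:
    # Assumption: the signature is a SHA256 hex digest of the block text.
    return hashlib.sha256(block_source.encode("utf-8")).hexdigest()


def verify_or_refuse_boot(block_source: str, recorded_signature: str) -> None:
    # Any edit to the block changes the digest, so the comparison fails.
    if seal(block_source) != recorded_signature:
        raise SealBroken("sealed block modified; refusing to boot")


ETHICS_CORE = '''sealed ethics_core
  ethics rule protect_children
    when target is minor
    when action is harmful
    then deny with reason: "Absolute limit"
'''

signature = seal(ETHICS_CORE)
verify_or_refuse_boot(ETHICS_CORE, signature)  # unchanged block passes

# Flipping a single word in the rule is enough to break the seal.
tampered = ETHICS_CORE.replace("deny", "allow")
try:
    verify_or_refuse_boot(tampered, signature)
    boot_blocked = False
except SealBroken:
    boot_blocked = True
```

A bare digest only detects changes if the recorded signature itself cannot be rewritten, so a real scheme would probably need a keyed signature or protected storage for it.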

Phase 2 — C Transpiler

Status: ✅ Complete

Terse source compiles to valid C via an IR layer. The IR is a flat list of Python dataclass ops. The C emitter walks the IR and writes one C function call per op. A single header terse_runtime.h ships with every compiled program.

Pipeline:

Terse source -> lexer -> parser -> AST
    -> ir_compiler.py -> IR op list
    -> c_emitter.py -> .c file
    -> terse_runtime.h
    -> gcc/clang -> native binary

The IR layer is the key design decision. Phase 3 reuses the entire frontend and IR — only the emitter changes.
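To make the "flat list of dataclass ops, one C call per op" design concrete, here is a minimal sketch. The op classes `AssertFact` and `QueryFact` and the runtime functions `terse_assert_fact` and `terse_query_fact` are invented for illustration; the real contents of ir_compiler.py, c_emitter.py, and terse_runtime.h are not shown in this roadmap.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AssertFact:
    # Hypothetical op: record that `subject` is a `category`.
    subject: str
    category: str


@dataclass
class QueryFact:
    # Hypothetical op: print whether `subject` is a `category`.
    subject: str
    category: str


Op = Union[AssertFact, QueryFact]


def emit_c(ops: List[Op]) -> str:
    """Walk the flat op list and write exactly one C call per op."""
    lines = ['#include "terse_runtime.h"', "", "int main(void) {"]
    for op in ops:
        # Names are assumed to be plain identifiers, so no C escaping is done.
        if isinstance(op, AssertFact):
            lines.append(f'    terse_assert_fact("{op.subject}", "{op.category}");')
        elif isinstance(op, QueryFact):
            lines.append(f'    terse_query_fact("{op.subject}", "{op.category}");')
        else:
            raise TypeError(f"unknown IR op: {op!r}")
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"


program = [AssertFact("dog", "mammal"), QueryFact("dog", "mammal")]
c_source = emit_c(program)
```

Because the emitter only sees the op list, a second emitter can consume the same `program` value unchanged, which is the property the later LLVM phase relies on.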

Phase 2.5 — First Native Binary

Status: ✅ Complete

GCC installed via MSYS2 on Windows. output.c compiled and run — the first ever native Terse program. Inference engine running as compiled machine code.

gcc output.c -o hello -I. -Wall
./hello.exe
# dog is mammal: 1

Also completed this phase: Stage 1 Terse/NCI integration. The Terse interpreter is now a live dependency of NCI, running in production on Oracle. Ethics rules written in Terse fire in the NCI Ethics Engine at confidence 1.0.

Phase 3 — LLVM Compiler

Status: ✅ Complete

llvm_emitter.py is complete and producing correct native binaries. The full pipeline from Terse source to native binary via LLVM is confirmed working.

Phase 2:  IR -> c_emitter.py    -> .c  -> gcc   -> binary
Phase 3:  IR -> llvm_emitter.py -> .ll -> clang -> binary

Variables, arithmetic, control flow, facts, functions, loops, inference, ethics rules, and sealed blocks all compile to native machine code via LLVM IR through clang.

Phase 3.1 — Float/Numeric Support

Status: ✅ Complete

Real floating-point numbers as a first-class type. The existing i64 integer arithmetic is joined by f64 storage, float arithmetic, and float comparison in both the interpreter and the LLVM emitter.

Required for NCI resonance scores — NCI uses floating-point confidence values throughout its inference and compression pipeline.

score = 0.92
weight = 0.85 * 1.2
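The split between integer and float arithmetic comes down to choosing a different LLVM instruction and type per operand kind. The sketch below shows that selection for literals only; `emit_binop` and its helpers are hypothetical names, not the API of llvm_emitter.py, and mixed int/float operands are rejected here because the roadmap does not say how Terse promotes them. LLVM spells the f64 type `double`, and the hexadecimal literal form is used so that every float is written exactly.

```python
import struct

# LLVM instruction names for each operator, per operand type.
INT_OPS = {"+": "add", "-": "sub", "*": "mul"}
FLOAT_OPS = {"+": "fadd", "-": "fsub", "*": "fmul"}


def llvm_type(value) -> str:
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        raise TypeError("booleans are not numeric operands in this sketch")
    if isinstance(value, int):
        return "i64"
    if isinstance(value, float):
        return "double"
    raise TypeError(f"unsupported operand: {value!r}")


def llvm_literal(value) -> str:
    # Doubles are written as 0x followed by the 16 hex digits of the IEEE 754 bits.
    if isinstance(value, float):
        return "0x" + struct.pack(">d", value).hex().upper()
    return str(value)


def emit_binop(dest: str, op: str, lhs, rhs) -> str:
    left_type, right_type = llvm_type(lhs), llvm_type(rhs)
    if left_type != right_type:
        raise TypeError("mixed int/float operands are not modeled in this sketch")
    table = INT_OPS if left_type == "i64" else FLOAT_OPS
    return f"%{dest} = {table[op]} {left_type} {llvm_literal(lhs)}, {llvm_literal(rhs)}"


int_line = emit_binop("n", "+", 1, 2)
float_line = emit_binop("weight", "*", 0.85, 1.2)
```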

Phase 3.2 — String Manipulation

Status: ✅ Complete

Dynamic string handling — building, concatenating, slicing, searching, comparing, and substituting strings at runtime. All seven string operations compile to native machine code and link against a small C runtime (terse_runtime_llvm.c) specifically for the LLVM pipeline.

greeting = "hello world"
size = length(greeting)
slice = substring(greeting, 6, 11)
joined = concat("hello", "world")
found = contains(greeting, "world")
match = equals(greeting, "hello world")
shouted = replace(greeting, "world", "WORLD")

Phase 3.2 also locked in two foundational language decisions:

Terse is statically typed. Booleans use LLVM's native i1 type, not fake-bool integers. The compiler tracks types through the pipeline; type checking enforcement arrives in Phase 3.4.
Function call expressions. Generic identifier(arg, arg, ...) parsing means new functions don't require lexer changes. Argument lists are comma-separated and accept string literals, identifiers, and number literals.
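The generic call syntax can be parsed without knowing any function names in advance. The following is a minimal sketch of such a parser, not the Terse parser itself: `Call`, `tokenize`, and `parse_call` are hypothetical names, and the token rules (for example, what counts as a number literal) are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# One token per match: string literal, number literal, identifier, or punctuation.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<string>"(?:[^"\\]|\\.)*")
      | (?P<number>\d+(?:\.\d+)?)
      | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
      | (?P<punct>[(),])
    )''', re.VERBOSE)


@dataclass
class Call:
    name: str
    args: List[Tuple[str, str]]  # (token kind, source text)


def tokenize(src: str) -> List[Tuple[str, str]]:
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def parse_call(src: str) -> Call:
    tokens = tokenize(src)
    well_formed = (
        len(tokens) >= 3
        and tokens[0][0] == "ident"
        and tokens[1] == ("punct", "(")
        and tokens[-1] == ("punct", ")")
    )
    if not well_formed:
        raise SyntaxError("expected identifier(arg, ...)")
    inner = tokens[2:-1]
    args, expect_arg = [], True
    for kind, text in inner:
        if expect_arg:
            if kind == "punct":
                raise SyntaxError("expected an argument")
            args.append((kind, text))
        elif (kind, text) != ("punct", ","):
            raise SyntaxError("expected ',' between arguments")
        expect_arg = not expect_arg
    if inner and expect_arg:
        raise SyntaxError("trailing comma in argument list")
    return Call(tokens[0][1], args)


call = parse_call('replace(greeting, "world", "WORLD")')
```

Since the function name is just an identifier token, adding a new built-in only requires teaching later stages about it, which matches the "no lexer changes" goal stated above.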

Documented function semantics (substring's defensive clamping, replace's all-occurrences behavior, etc.) are normative across all Terse backends. Future targets (FPGA emitter, alternative compilers) reproduce these exactly.
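A small executable reference model is one way to pin such semantics down across backends. The sketch below is an assumption about what "defensive clamping" means: indices are half-open, out-of-range values are clamped into the string, and an inverted range yields an empty string. The authoritative rules are in the syntax reference, not here.

```python
def substring(s: str, start: int, end: int) -> str:
    # Assumed semantics: half-open range [start, end), clamped to the string.
    n = len(s)
    start = max(0, min(start, n))
    end = max(start, min(end, n))  # an inverted range collapses to empty
    return s[start:end]


def replace(s: str, old: str, new: str) -> str:
    # All occurrences are replaced, which is what str.replace does by default.
    return s.replace(old, new)


greeting = "hello world"
in_range = substring(greeting, 6, 11)
past_end = substring(greeting, 6, 99)
negative_start = substring(greeting, -3, 5)
inverted = substring(greeting, 4, 2)
all_replaced = replace("a-b-c", "-", "+")
```

Running the same inputs through each backend and comparing against a model like this would be one way to check that a new target reproduces the behavior exactly.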

See Strings in the syntax reference for full operation documentation.

Phase 3.2.5 — Lists

Status: ✅ Complete

Multi-element collections. Required before file I/O can land cleanly (a file's contents read as a list of lines).

Lists:

List literal syntax: nums = [1, 2, 3]
Runtime functions: terse_list_create, terse_list_append, terse_list_at, terse_list_length
Index access via list_at(list, index) — function-style, consistent with all other operations
Iteration via each keyword — each x in nums iterates over every element in order

nums = [1, 2, 3]
n = list_length(nums)
first = list_at(nums, 0)

Also landed this phase:

String escape sequences — \n, \t, \", \\ supported in string literals. Unknown escapes raise TerseSyntaxError with line number.
Windows target triple — LLVM preamble emits target triple = "x86_64-w64-windows-gnu" for correct native binary generation on Windows.
Identifier AST node — Variable references in assignments (b = a) now produce an explicit Identifier node in the AST rather than falling through to the string fallback.
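The escape-sequence behavior described above can be sketched as a small decoder. `TerseSyntaxError` is the exception named in this roadmap, but its constructor signature here is an assumption, and `decode_string_body` is a hypothetical helper name.

```python
class TerseSyntaxError(Exception):
    # Assumed shape: a message plus the source line number.
    def __init__(self, message: str, line: int):
        super().__init__(f"line {line}: {message}")
        self.line = line


# The four escapes listed in the roadmap: \n, \t, \", \\
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def decode_string_body(body: str, line: int) -> str:
    """Decode the text between the quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise TerseSyntaxError("dangling backslash", line)
        code = body[i + 1]
        if code not in ESCAPES:
            raise TerseSyntaxError(f"unknown escape \\{code}", line)
        out.append(ESCAPES[code])
        i += 2
    return "".join(out)


decoded = decode_string_body(r'say \"hi\"\n', 1)
```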

Phase 3.3 — File I/O

Status: ⬜ Planned

Read and write files from Terse programs. Enables loading knowledge graphs from disk rather than hard-coding them in source, and writing inference results or logs out to files. Reading a file returns a list of strings now that lists exist.

read_lines("path") returns a list of strings
write("path", content) writes a string to a file
append("path", content) appends to a log file
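Since this phase is still planned, the following Python model is only a guess at the intended behavior of the three functions. It assumes UTF-8 text, that `read_lines` strips line terminators, and that `write` overwrites an existing file; none of that is specified above.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def read_lines(path: str) -> List[str]:
    # Assumption: line terminators are stripped from each returned string.
    return Path(path).read_text(encoding="utf-8").splitlines()


def write(path: str, content: str) -> None:
    # Assumption: an existing file is overwritten.
    Path(path).write_text(content, encoding="utf-8")


def append(path: str, content: str) -> None:
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(content)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "facts.txt")
    write(log_path, "dog is mammal\n")
    append(log_path, "mammal is animal\n")
    lines = read_lines(log_path)
```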

Phase 3.4 — Standard Library, Type Checking, Concept Bridge

Status: ⬜ Planned

A minimal built-in standard library plus the architectural pieces deferred from earlier phases.

Standard library: print, range, math primitives (abs, min, max, floor, ceil)
split(s, delimiter) — returns a list of strings; deferred from Phase 3.2 because lists didn't exist yet
Static type checking pass — over the AST or IR before LLVM emission. This unblocks true/false literals, variable-numeric arguments, and several other features deferred from earlier phases.
Strings-as-concept-nodes bridge — formalize the mechanism for using a string literal as a knowledge graph node reference. The dual nature has been in the design from the start; Phase 3.4 is where it lands as syntax.
Function dispatch refactor — move from hardcoded branches to a registry of known functions with metadata.
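The registry idea in the last item can be sketched as a dictionary of function metadata populated by a decorator. Everything here is hypothetical: the `FunctionInfo` fields, the type names, and the `register` and `call` helpers illustrate the pattern rather than the planned Terse implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class FunctionInfo:
    name: str
    param_types: Tuple[str, ...]
    return_type: str
    impl: Callable


REGISTRY: Dict[str, FunctionInfo] = {}


def register(name: str, param_types, return_type: str):
    """Decorator that records a built-in and its metadata in the registry."""
    def decorator(fn: Callable) -> Callable:
        REGISTRY[name] = FunctionInfo(name, tuple(param_types), return_type, fn)
        return fn
    return decorator


@register("length", ["string"], "int")
def _length(s):
    return len(s)


@register("concat", ["string", "string"], "string")
def _concat(a, b):
    return a + b


@register("contains", ["string", "string"], "bool")
def _contains(haystack, needle):
    return needle in haystack


def call(name: str, *args):
    # One lookup replaces a chain of hardcoded if/elif branches.
    info = REGISTRY.get(name)
    if info is None:
        raise NameError(f"unknown function {name!r}")
    if len(args) != len(info.param_types):
        raise TypeError(f"{name} expects {len(info.param_types)} argument(s)")
    return info.impl(*args)


size = call("length", "hello world")
```

The same metadata could also feed a static type checking pass, since each entry already carries parameter and return types.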

Phase 3.4.5 — Compression v2

Status: ⬜ Planned

Refinements to the Phase 1.5 semantic compression algorithm. The current algorithm works correctly; v2 makes it work better, particularly for rule-rich knowledge graphs where multi-step inference chains exist.

Three changes:

Iterative inference pruning. The algorithm's third phase (inference pruning) currently catches only one-step derivations. v2 iterates until it reaches a fixed point, capturing rule chains. This is expected to improve the compression ratio on graphs with layered inference rules.
Token-set utility scoring. The algorithm's second phase (weight scoring) currently uses substring matching to detect rule relevance, which can produce false positives. v2 tokenizes the rule text and checks set membership.
Type-coercion documentation. The algorithm's first phase (structural deduplication) currently coerces values to strings for pool keys, which means 42 (int) and "42" (string) collide. Once Terse has typed fact values (roadmap Phase 3.4), this needs revisiting. v2 adds inline documentation noting the limitation.
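The difference between one-step and fixed-point pruning shows up on even a two-rule chain. The sketch below uses a deliberately simplified model that is not the real algorithm's data structures: facts are (subject, category) pairs, rules are (premise, conclusion) pairs meaning "anything that is a premise is also a conclusion", and `closure` and `prune` are hypothetical names.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str]   # (subject, category)
Rule = Tuple[str, str]   # (premise category, conclusion category)


def closure(facts: Set[Fact], rules: Set[Rule], max_steps: Optional[int] = None) -> Set[Fact]:
    """Apply the rules repeatedly; max_steps=None runs to a fixed point."""
    known = set(facts)
    steps = 0
    while max_steps is None or steps < max_steps:
        derived = {
            (subject, conclusion)
            for (subject, category) in known
            for (premise, conclusion) in rules
            if category == premise
        }
        if derived <= known:
            break  # nothing new was derived: fixed point reached
        known |= derived
        steps += 1
    return known


def prune(facts: Set[Fact], rules: Set[Rule], max_steps: Optional[int] = None) -> Set[Fact]:
    """Drop every stored fact that the remaining facts can re-derive."""
    kept = set(facts)
    for fact in sorted(facts):
        rest = kept - {fact}
        if fact in closure(rest, rules, max_steps):
            kept = rest
    return kept


rules = {("dog", "mammal"), ("mammal", "animal")}
facts = {("rex", "dog"), ("rex", "animal")}

one_step = prune(facts, rules, max_steps=1)   # cannot see the two-step chain
fixed_point = prune(facts, rules)             # follows dog -> mammal -> animal
```

With a single derivation step, "rex is animal" has to stay stored because "rex is mammal" is never materialized; iterating to a fixed point removes it, and expanding the pruned set with the same rules recovers it.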

The architecture stays the same — same four phases, same SemanticBundle output, same expand_semantic interface. NCI's brain files compress better without any external API change.
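The false positives from substring matching and the int/string key collision are both easy to reproduce. The snippet below is an illustration only: the rule text, the tokenizer regex, and the typed-key alternative are assumptions, not the scoring or pooling code used by the compressor.

```python
import re


def substring_relevant(concept: str, rule_text: str) -> bool:
    # Substring matching: "cat" matches inside "category".
    return concept in rule_text


def token_relevant(concept: str, rule_text: str) -> bool:
    # Token-set matching: split the rule into words, then test membership.
    tokens = set(re.findall(r"[A-Za-z0-9_]+", rule_text.lower()))
    return concept.lower() in tokens


rule = "when target is category then allow"
false_positive = substring_relevant("cat", rule)
token_match = token_relevant("cat", rule)
real_match = token_relevant("category", rule)

# String-coerced pool keys make the int 42 and the string "42" collide.
coerced_keys = {str(42), str("42")}

# One possible alternative (an assumption): include the type in the key.
typed_keys = {(type(42).__name__, 42), (type("42").__name__, "42")}
```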

Phase 3.5 — pyproject.toml — Proper Python Packaging

Status: ⬜ Planned

Package Terse as a proper Python package installable via pip. This replaces the current file-copy dependency in NCI — lib/terse/ in the NCI repo is a manual copy of src/interpreter/ in Terse. Any change to the Terse interpreter currently requires a manual sync to NCI.

With a published package, NCI adds terse-lang as a dependency in its own pyproject.toml and imports it cleanly: from terse import Interpreter.

pyproject.toml with build metadata
Package name: terse-lang
Entry point: terse CLI
Published to PyPI or a private registry
NCI lib/terse/ directory removed; replaced with package import

Phase 3.6 — Self-Hosting Prep

Status: ⬜ Future

Groundwork for writing the Terse compiler in Terse itself. The language needs to be expressive enough to implement its own lexer, parser, and IR emitter before this is attempted.

Phases 3.1–3.5 are prerequisites. Self-hosting is a milestone, not a phase with a fixed scope — work begins when the language is capable enough to attempt it.

Phase 4 — FPGA Prototype

Status: ⬜ Future

Prototype NCI-1 hardware primitives on a Xilinx Artix-7 FPGA.

Target hardware: Digilent Arty A7 (Artix-7)
Toolchain: Xilinx Vivado + CIRCT
Compression engine in hardware
Tensor multiply unit
Ethics Core prototype — immutable rule enforcement in silicon
Terse -> CIRCT -> Verilog compiler target

Phase 5 — Long-Running Runtime + NCI-1 Custom Chip

Status: ⬜ Future

Two parallel tracks:

Long-running runtime memory management. Terse's current allocation strategy leaks heap memory — fine for short-lived programs, fatal for processes that run for years. Phase 5 replaces this with a strategy fit for NCI hosting Terse for its operational lifetime: likely reference counting for short-lived strings, interning for repeated concept references, generational GC for general allocations.

NCI-1 chip tape-out. Based on the validated FPGA design from Phase 4.

FPGA design validated and stable
RTL design finalized
Foundry selection
NCI-1 tape-out
NCI Box appliance — first product shipping NCI-1
Terse self-hosting — Terse compiler written in Terse

Phase 6 — NCI Ethics Core Chip

Status: ⬜ Future

A dedicated hardware ethics enforcement chip. Ethics rules written in Terse, executed in silicon.

The Principle

"You can't jailbreak silicon. A hardware ethics engine is not a filter — it is a physical constraint. No software layer can override it because it is upstream of all software."

Ethics rule engine in Verilog
Immutable rule storage — one-time programmable memory
Tamper-evident audit log in hardware
Hardware attestation — private key in silicon
PCIe Ethics Card — plugs into any server
Integrated on all NCI Box appliances
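One common way to make a log tamper-evident is a hash chain, where each entry's digest covers the previous digest. Whether the chip will use this technique is not stated here, so the Python model below is only a software illustration of the property; `append_entry`, `verify`, and the all-zero genesis value are hypothetical.

```python
import hashlib
from typing import List, Tuple

GENESIS = "0" * 64  # assumed starting value for the chain

Entry = Tuple[str, str]  # (message, digest)


def _digest(previous: str, message: str) -> str:
    return hashlib.sha256((previous + message).encode("utf-8")).hexdigest()


def append_entry(log: List[Entry], message: str) -> None:
    # Each digest covers the previous digest, linking the entries together.
    previous = log[-1][1] if log else GENESIS
    log.append((message, _digest(previous, message)))


def verify(log: List[Entry]) -> bool:
    previous = GENESIS
    for message, digest in log:
        if _digest(previous, message) != digest:
            return False
        previous = digest
    return True


audit_log: List[Entry] = []
append_entry(audit_log, "rule protect_children: deny")
append_entry(audit_log, "rule protect_children: deny")
append_entry(audit_log, "boot: seal verified")
intact = verify(audit_log)

# Rewriting an earlier message without recomputing the chain is detected.
forged = list(audit_log)
forged[0] = ("rule protect_children: allow", forged[0][1])
forged_ok = verify(forged)
```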

The Vertical Stack

NCI Ethics Core Chip  <-  sealed Terse ethics programs
        |
    NCI-1 Chip        <-  Terse compiler targets this silicon
        |
      Terse            <-  semantic compression, runtime integrity
        |
      NCI              <-  Terse is NCI's native language

Authored by Lesley Ancion · Goatface Tech · Sundre, Alberta · 2026