Roadmap

Six phases from Python interpreter to custom silicon.


Phase 1 — Python Interpreter

Status: ✅ Complete

All interpreter features complete — lexer, parser, knowledge graphs, inference, Markov chains, functions, loops, variables, math, conditionals, tensors, compression, ethics rules, REPL, error handling.


Phase 1.5 — Semantic Compression Algorithm

Status: ✅ Complete · Compression v2 planned (Phase 3.4.5)

Original four-phase compression algorithm for knowledge graphs. Structural deduplication, weight scoring, inference pruning, SHA256 signatures. 70.59% storage ratio on test graph. Compress slow, expand fast.

A planned Compression v2 in Phase 3.4.5 will introduce iterative inference pruning (multi-step derivation chains, not just one-step), tighten the utility scoring in the algorithm's second phase (weight scoring), and document type-coercion behavior. The architecture stays the same: these are refinements, not redesigns.


Phase 1.6 — Runtime Integrity & Encryption

Status: ✅ Complete

The sealed keyword. Cryptographic protection for ethics rules and core knowledge. Any modification to a sealed block breaks the signature and prevents boot. Ethics becomes mathematical rather than social.

sealed ethics_core
  ethics rule protect_children
    when target is minor
    when action is harmful
    then deny with reason: "Absolute limit"
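
As a rough illustration of "any modification breaks the signature and prevents boot", here is a minimal Python sketch. It assumes the signature is an HMAC-SHA256 over the exact text of the sealed block; the roadmap does not specify the actual scheme. The names seal, verify_or_refuse_boot, SealError, and the demo key are hypothetical and are not part of Terse.

```python
import hashlib
import hmac


class SealError(Exception):
    """Hypothetical error: a sealed block no longer matches its signature."""


def seal(block_source: str, key: bytes) -> str:
    # Assumption: the signature covers the exact source text of the block.
    return hmac.new(key, block_source.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_or_refuse_boot(block_source: str, signature: str, key: bytes) -> None:
    expected = seal(block_source, key)
    # compare_digest runs in constant time, so it does not leak match length.
    if not hmac.compare_digest(expected, signature):
        raise SealError("sealed block modified; refusing to boot")


ETHICS_CORE = '''sealed ethics_core
  ethics rule protect_children
    when target is minor
    when action is harmful
    then deny with reason: "Absolute limit"
'''

KEY = b"demo-key-not-for-production"  # hypothetical key, for illustration only
SIGNATURE = seal(ETHICS_CORE, KEY)
```

Changing a single word of the block (for example, "deny" to "allow") produces a different digest, so verification fails before any rule can run.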

Phase 2 — C Transpiler

Status: ✅ Complete

Terse source compiles to valid C via an IR layer. The IR is a flat list of Python dataclass ops. The C emitter walks the IR and writes one C function call per op. A single header terse_runtime.h ships with every compiled program.

Pipeline:

Terse source -> lexer -> parser -> AST
    -> ir_compiler.py -> IR op list
    -> c_emitter.py -> .c file
    -> terse_runtime.h
    -> gcc/clang -> native binary

The IR layer is the key design decision. Phase 3 reuses the entire frontend and IR — only the emitter changes.
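
The "flat list of dataclass ops, one C call per op" design can be sketched as follows. This is an illustration under assumptions, not the contents of ir_compiler.py or c_emitter.py: the op classes AddFact and Query and the runtime functions terse_add_fact and terse_query are hypothetical names. Only the header name terse_runtime.h comes from the roadmap.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AddFact:  # hypothetical IR op
    subject: str
    relation: str
    obj: str


@dataclass
class Query:  # hypothetical IR op
    subject: str
    relation: str
    obj: str


Op = Union[AddFact, Query]


def c_string(value: str) -> str:
    # Escape backslashes and quotes so the value is a valid C string literal.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def emit_c(ops: List[Op]) -> str:
    lines = ['#include "terse_runtime.h"', "", "int main(void) {"]
    for op in ops:  # the IR is flat, so emission is a single pass
        args = ", ".join(c_string(a) for a in (op.subject, op.relation, op.obj))
        if isinstance(op, AddFact):
            lines.append(f"    terse_add_fact({args});")  # hypothetical runtime call
        elif isinstance(op, Query):
            lines.append(f"    terse_query({args});")  # hypothetical runtime call
        else:
            raise TypeError(f"unknown IR op: {op!r}")
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"
```

Because the emitter only sees the op list, a different backend can consume the same list without touching the lexer, parser, or IR compiler.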


Phase 2.5 — First Native Binary

Status: ✅ Complete

GCC installed via MSYS2 on Windows. output.c compiled and run — the first ever native Terse program. Inference engine running as compiled machine code.

gcc output.c -o hello -I. -Wall
./hello.exe
# dog is mammal: 1

Also completed this phase: Stage 1 Terse/NCI integration. The Terse interpreter is now a live dependency of NCI, running in production on Oracle. Ethics rules written in Terse fire in the NCI Ethics Engine at confidence 1.0.


Phase 3 — LLVM Compiler

Status: ✅ Complete

llvm_emitter.py is complete and producing correct native binaries. The full pipeline from Terse source to native binary via LLVM is confirmed working.

Phase 2:  IR -> c_emitter.py    -> .c  -> gcc   -> binary
Phase 3:  IR -> llvm_emitter.py -> .ll -> clang -> binary

Variables, arithmetic, control flow, facts, functions, loops, inference, ethics rules, and sealed blocks all compile to native machine code via LLVM IR through clang.


Phase 3.1 — Float/Numeric Support

Status: ✅ Complete

Real floating-point numbers as a first-class type. Integer arithmetic (i64) is joined by f64 storage, float arithmetic, and float comparison in both the interpreter and the LLVM emitter.

Required for NCI resonance scores — NCI uses floating-point confidence values throughout its inference and compression pipeline.

score = 0.92
weight = 0.85 * 1.2
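
A minimal sketch of how an emitter might choose between integer and float instructions for the assignments above. It is not taken from llvm_emitter.py; the helper names are hypothetical, and promoting mixed int/float operands to f64 is an assumption. Two details are standard LLVM: f64 is spelled double in LLVM IR, and the assembler rejects decimal constants that are not exactly representable (such as 1.2), so floats are written in the 16-digit hexadecimal form.

```python
import struct
from typing import Union

Number = Union[int, float]


def llvm_type(value: Number) -> str:
    # bool is a subclass of int in Python; reject it so it never becomes i64.
    if isinstance(value, bool):
        raise TypeError("booleans are not numeric literals")
    return "double" if isinstance(value, float) else "i64"


def llvm_literal(value: Number) -> str:
    if isinstance(value, float):
        # Big-endian IEEE-754 bits as 16 hex digits, the form LLVM accepts
        # for any double regardless of its decimal representation.
        return "0x" + struct.pack(">d", value).hex().upper()
    return str(value)


def emit_mul(dst: str, lhs: Number, rhs: Number) -> str:
    for operand in (lhs, rhs):
        llvm_type(operand)  # validates both operands
    # Assumption: a mixed int/float multiply promotes both sides to f64.
    if isinstance(lhs, float) or isinstance(rhs, float):
        lhs, rhs = float(lhs), float(rhs)
    ty = llvm_type(lhs)
    opcode = "fmul" if ty == "double" else "mul"
    return f"%{dst} = {opcode} {ty} {llvm_literal(lhs)}, {llvm_literal(rhs)}"
```

With this sketch, emit_mul("n", 2, 3) yields an integer mul on i64, while emit_mul("weight", 0.85, 1.2) yields an fmul on double with hexadecimal operands.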

Phase 3.2 — String Manipulation

Status: ✅ Complete

Dynamic string handling: building, measuring, concatenating, slicing, searching, comparing, and substituting strings at runtime. All seven string operations compile to native machine code and link against a small C runtime (terse_runtime_llvm.c) written specifically for the LLVM pipeline.

greeting = "hello world"
size = length(greeting)
slice = substring(greeting, 6, 11)
joined = concat("hello", "world")
found = contains(greeting, "world")
match = equals(greeting, "hello world")
shouted = replace(greeting, "world", "WORLD")

Phase 3.2 also locked in two foundational language decisions:

  • Terse is statically typed. Booleans use LLVM's native i1 type, not fake-bool integers. The compiler tracks types through the pipeline; type checking enforcement arrives in Phase 3.4.
  • Function call expressions. Generic identifier(arg, arg, ...) parsing means new functions don't require lexer changes. Argument lists are comma-separated and accept string literals, identifiers, and number literals.
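
The generic identifier(arg, arg, ...) rule can be sketched as a small standalone parser. This is an illustration only, not the Terse parser: the token pattern, the (kind, text) tuple format, and the error messages are assumptions. It accepts the three argument kinds listed above and nothing else.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); a hypothetical token shape

TOKEN = re.compile(r'''
    \s*(?:
        (?P<string>"(?:[^"\\]|\\.)*")
      | (?P<number>\d+(?:\.\d+)?)
      | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
      | (?P<punct>[(),])
    )''', re.VERBOSE)


def tokenize(src: str) -> List[Token]:
    src = src.rstrip()
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at column {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out


def parse_call(src: str) -> Tuple[str, List[Token]]:
    toks = tokenize(src)
    if (len(toks) < 3 or toks[0][0] != "ident"
            or toks[1] != ("punct", "(") or toks[-1] != ("punct", ")")):
        raise SyntaxError("expected identifier(arg, ...)")
    inner = toks[2:-1]
    args: List[Token] = []
    expect_arg = True  # alternate between an argument and a comma
    for kind, text in inner:
        if expect_arg:
            if kind == "punct":
                raise SyntaxError("expected an argument")
            args.append((kind, text))
        elif (kind, text) != ("punct", ","):
            raise SyntaxError("expected ','")
        expect_arg = not expect_arg
    if inner and expect_arg:
        raise SyntaxError("trailing comma")
    return toks[0][1], args
```

Nothing in the parser knows the name replace or substring, which is the point: a new built-in needs a runtime implementation, not a grammar change.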

Documented function semantics (substring's defensive clamping, replace's all-occurrences behavior, etc.) are normative across all Terse backends. Future targets (FPGA emitter, alternative compilers) reproduce these exactly.
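
One way to keep backends aligned is a small executable reference model that each backend is tested against. The sketch below is such a model in Python. The exact clamping rules are an assumption (indices clamped into the string, an inverted range yields an empty string), as is the handling of an empty search string; the normative definitions live in the syntax reference.

```python
def substring(s: str, start: int, end: int) -> str:
    # Assumption: start is inclusive, end is exclusive, and both are clamped
    # into [0, len(s)] instead of raising on out-of-range values.
    n = len(s)
    start = max(0, min(start, n))
    end = max(start, min(end, n))  # an inverted range collapses to ""
    return s[start:end]


def replace(s: str, old: str, new: str) -> str:
    # Assumption: an empty search string leaves the input unchanged.
    if old == "":
        return s
    # str.replace substitutes every occurrence, matching the documented
    # all-occurrences behavior.
    return s.replace(old, new)
```

For the example above, substring("hello world", 6, 11) gives "world", and so does substring("hello world", 6, 99) under the clamping assumption.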

See Strings in the syntax reference for full operation documentation.


Phase 3.2.5 — Lists

Status: ✅ Complete

Multi-element collections. Required before file I/O can land cleanly (a file's contents read as a list of lines).

Lists:

  • List literal syntax: nums = [1, 2, 3]
  • Runtime functions: terse_list_create, terse_list_append, terse_list_at, terse_list_length
  • Index access via list_at(list, index) — function-style, consistent with all other operations
  • Iteration via the each keyword: each x in nums iterates over every element in order

nums = [1, 2, 3]
n = list_length(nums)
first = list_at(nums, 0)

Also landed this phase:

  • String escape sequences: \n, \t, \", and \\ are supported in string literals. Unknown escapes raise TerseSyntaxError with the line number.
  • Windows target triple — LLVM preamble emits target triple = "x86_64-w64-windows-gnu" for correct native binary generation on Windows.
  • Identifier AST node — Variable references in assignments (b = a) now produce an explicit Identifier node in the AST rather than falling through to the string fallback.
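
The escape-sequence rule can be sketched as a single pass over the body of a string literal. TerseSyntaxError is named in the roadmap, but its constructor signature here is an assumption, and unescape is a hypothetical helper rather than the lexer's real function.

```python
class TerseSyntaxError(Exception):
    # Assumption: the error carries the source line number.
    def __init__(self, message: str, line: int):
        super().__init__(f"line {line}: {message}")
        self.line = line


# The four escapes listed above, keyed by the character after the backslash.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(body: str, line: int) -> str:
    """Decode escapes in the text between the quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise TerseSyntaxError("dangling backslash", line)
        code = body[i + 1]
        if code not in ESCAPES:
            raise TerseSyntaxError(f"unknown escape \\{code}", line)
        out.append(ESCAPES[code])
        i += 2  # skip the backslash and the escape character
    return "".join(out)
```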

Phase 3.3 — File I/O

Status: ⬜ Planned

Read and write files from Terse programs. Enables loading knowledge graphs from disk rather than hard-coding them in source, and writing inference results or logs out to files. Reads naturally return lists of strings now that lists exist.

  • read_lines("path") returns a list of strings
  • write("path", content) writes a string to a file
  • append("path", content) appends to a log file
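
Since this phase is still planned, the following Python model is only a guess at the intended behavior of the three functions. It assumes UTF-8 text, that read_lines strips line terminators, and that write replaces any existing file; none of these details are stated in the roadmap. The demo function and file name are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def read_lines(path: str) -> List[str]:
    # Assumption: each element is one line without its terminator.
    return Path(path).read_text(encoding="utf-8").splitlines()


def write(path: str, content: str) -> None:
    # Assumption: write truncates and replaces any existing file.
    Path(path).write_text(content, encoding="utf-8")


def append(path: str, content: str) -> None:
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(content)


def demo() -> List[str]:
    # Round trip: write a fact, append another, read both back as a list.
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "facts.txt")
        write(log, "dog is mammal\n")
        append(log, "cat is mammal\n")
        return read_lines(log)
```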

Phase 3.4 — Standard Library, Type Checking, Concept Bridge

Status: ⬜ Planned

A minimal built-in standard library plus the architectural pieces deferred from earlier phases.

  • Standard library: print, range, math primitives (abs, min, max, floor, ceil)
  • split(s, delimiter) — returns a list of strings; deferred from Phase 3.2 because lists didn't exist yet
  • Static type checking pass — over the AST or IR before LLVM emission. This unblocks true/false literals, variable-numeric arguments, and several other features deferred from earlier phases.
  • Strings-as-concept-nodes bridge — formalize the mechanism for using a string literal as a knowledge graph node reference. The dual nature has been in the design from the start; Phase 3.4 is where it lands as syntax.
  • Function dispatch refactor — move from hardcoded branches to a registry of known functions with metadata.
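
A registry of known functions with metadata could look roughly like the sketch below, which also shows how the same table can drive a simple static check of call arguments. Everything here is hypothetical: the FunctionInfo fields, the type names, and the runtime symbol names are placeholders, not the planned design.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class FunctionInfo:
    runtime_symbol: str           # hypothetical C runtime entry point
    param_types: Tuple[str, ...]  # expected argument types, in order
    return_type: str


# One table entry per built-in replaces one hardcoded branch per built-in.
REGISTRY: Dict[str, FunctionInfo] = {
    "length": FunctionInfo("terse_length", ("string",), "int"),
    "substring": FunctionInfo("terse_substring", ("string", "int", "int"), "string"),
    "contains": FunctionInfo("terse_contains", ("string", "string"), "bool"),
}


def check_call(name: str, arg_types: Sequence[str]) -> str:
    """Return the call's result type, or raise if the call is ill-typed."""
    info = REGISTRY.get(name)
    if info is None:
        raise NameError(f"unknown function {name!r}")
    if tuple(arg_types) != info.param_types:
        raise TypeError(
            f"{name} expects {info.param_types}, got {tuple(arg_types)}"
        )
    return info.return_type
```

Adding a function then means adding one registry entry, and the type checker and emitter pick it up from the same place.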

Phase 3.4.5 — Compression v2

Status: ⬜ Planned

Refinements to the Phase 1.5 semantic compression algorithm. The current algorithm works correctly; v2 makes it work better, particularly for rule-rich knowledge graphs where multi-step inference chains exist.

Three changes:

  • Iterative inference pruning. Compression phase 3 (inference pruning) currently catches only one-step derivations. v2 iterates until it reaches a fixed point, capturing rule chains. This is expected to improve the compression ratio on graphs with layered inference rules.
  • Token-set utility scoring. Compression phase 2 (weight scoring) currently uses substring matching to detect rule relevance, which can produce false positives. v2 tokenizes the rule text and checks set membership.
  • Type-coercion documentation. Compression phase 1 (structural deduplication) currently coerces values to strings for pool keys, which means 42 (int) and "42" (string) collide. Once Terse has typed fact values (roadmap Phase 3.4), this needs revisiting. v2 adds inline documentation noting the limitation.
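
The difference between one-step and fixed-point pruning can be shown on a toy model. This is not the Terse compression code: facts are reduced to (subject, category) pairs, rules to "anything that is A is also B", and the greedy pruning order is an assumption made to keep the example deterministic.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str]  # (subject, category), e.g. ("dog", "mammal")
Rule = Tuple[str, str]  # (a, b): anything that is a is also b


def derive_once(facts: Set[Fact], rules: Iterable[Rule]) -> Set[Fact]:
    # Apply every rule exactly once to the given facts.
    return {(s, b) for (s, c) in facts for (a, b) in rules if c == a}


def closure(facts: Set[Fact], rules: Iterable[Rule]) -> Set[Fact]:
    rules = list(rules)
    known = set(facts)
    while True:
        new = derive_once(known, rules) - known
        if not new:  # fixed point: another pass adds nothing
            return known
        known |= new


def prune(facts: Set[Fact], rules: Iterable[Rule], iterative: bool) -> Set[Fact]:
    rules = list(rules)
    kept = set(facts)
    for fact in sorted(facts):  # fixed order keeps the result deterministic
        rest = kept - {fact}
        derivable = closure(rest, rules) if iterative else derive_once(rest, rules)
        if fact in derivable:
            kept = rest  # expansion can re-derive it, so do not store it
    return kept


FACTS = {("dog", "mammal"), ("dog", "animal"), ("dog", "living")}
RULES = [("mammal", "animal"), ("animal", "living")]
```

On this input, one-step pruning keeps two facts because ("dog", "living") is two rule applications away from ("dog", "mammal"), while fixed-point pruning keeps only ("dog", "mammal"). In both cases the closure of the kept facts still contains every original fact, so nothing is lost on expansion.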

The architecture stays the same — same four phases, same SemanticBundle output, same expand_semantic interface. NCI's brain files compress better without any external API change.
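
The false-positive problem with substring matching, and the token-set alternative, can be illustrated in a few lines. The function names, the identifier pattern, and the sample rule text are hypothetical; the actual scoring code may tokenize differently.

```python
import re

# Assumption: rule text is split into identifier-like tokens.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def relevant_by_substring(concept: str, rule_text: str) -> bool:
    # Current behavior as described: a plain substring test.
    return concept in rule_text


def relevant_by_tokens(concept: str, rule_text: str) -> bool:
    # v2 behavior as described: exact membership in the set of tokens.
    return concept in set(IDENTIFIER.findall(rule_text))


RULE = "when target is category_a then deny"
```

Here relevant_by_substring("cat", RULE) is True because "cat" occurs inside "category_a", while relevant_by_tokens("cat", RULE) is False. Both agree that "target" is relevant.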


Phase 3.5 — pyproject.toml — Proper Python Packaging

Status: ⬜ Planned

Package Terse as a proper Python package installable via pip. This replaces the current file-copy dependency in NCI — lib/terse/ in the NCI repo is a manual copy of src/interpreter/ in Terse. Any change to the Terse interpreter currently requires a manual sync to NCI.

With a published package, NCI adds terse-lang as a dependency in its own pyproject.toml and imports it cleanly: from terse import Interpreter.

  • pyproject.toml with build metadata
  • Package name: terse-lang
  • Entry point: terse CLI
  • Published to PyPI or a private registry
  • NCI lib/terse/ directory removed; replaced with package import

Phase 3.6 — Self-Hosting Prep

Status: ⬜ Future

Groundwork for writing the Terse compiler in Terse itself. The language needs to be expressive enough to implement its own lexer, parser, and IR emitter before this is attempted.

Phases 3.1–3.5 are prerequisites. Self-hosting is a milestone, not a phase with a fixed scope — work begins when the language is capable enough to attempt it.


Phase 4 — FPGA Prototype

Status: ⬜ Future

Prototype NCI-1 hardware primitives on a Xilinx Artix-7 FPGA.

  • Target hardware: Digilent Arty A7 (Artix-7)
  • Toolchain: Xilinx Vivado + CIRCT
  • Compression engine in hardware
  • Tensor multiply unit
  • Ethics Core prototype — immutable rule enforcement in silicon
  • Terse -> CIRCT -> Verilog compiler target

Phase 5 — Long-Running Runtime + NCI-1 Custom Chip

Status: ⬜ Future

Two parallel tracks:

Long-running runtime memory management. Terse's current allocation strategy leaks heap memory — fine for short-lived programs, fatal for processes that run for years. Phase 5 replaces this with a strategy fit for NCI hosting Terse for its operational lifetime: likely reference counting for short-lived strings, interning for repeated concept references, generational GC for general allocations.
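
Of the three strategies mentioned, interning is the simplest to sketch. The model below is in Python for illustration only (the real runtime is C) and the InternPool class is hypothetical. Each distinct concept name is stored once and referred to by a small integer, so repeating a name does not allocate again.

```python
from typing import Dict, List


class InternPool:
    """Hypothetical intern table: one stored copy per distinct concept name."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._names: List[str] = []

    def intern(self, name: str) -> int:
        ident = self._ids.get(name)
        if ident is None:
            # First sighting: store the name once and assign the next id.
            ident = len(self._names)
            self._ids[name] = ident
            self._names.append(name)
        return ident

    def name_of(self, ident: int) -> str:
        return self._names[ident]

    def __len__(self) -> int:
        return len(self._names)
```

A side benefit is that comparing two concept references becomes an integer comparison instead of a string comparison.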

NCI-1 chip tape-out. Based on the validated FPGA design from Phase 4.

  • FPGA design validated and stable
  • RTL design finalized
  • Foundry selection
  • NCI-1 tape-out
  • NCI Box appliance — first product shipping NCI-1
  • Terse self-hosting — Terse compiler written in Terse

Phase 6 — NCI Ethics Core Chip

Status: ⬜ Future

A dedicated hardware ethics enforcement chip. Ethics rules written in Terse, executed in silicon.

The Principle

"You can't jailbreak silicon. A hardware ethics engine is not a filter — it is a physical constraint. No software layer can override it because it is upstream of all software."

  • Ethics rule engine in Verilog
  • Immutable rule storage — one-time programmable memory
  • Tamper-evident audit log in hardware
  • Hardware attestation — private key in silicon
  • PCIe Ethics Card — plugs into any server
  • Integrated on all NCI Box appliances
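
A tamper-evident log is commonly built as a hash chain, where each entry commits to the one before it. The Python sketch below is a software model of that idea, not the planned hardware design; the entry layout, the use of SHA-256 over a JSON payload, and the function names are all assumptions.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def _digest(prev: str, event: str) -> str:
    # sort_keys makes the serialized payload deterministic.
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, str]], event: str) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify(log: List[Dict[str, str]]) -> bool:
    prev = GENESIS
    for entry in log:
        # Each entry must point at its predecessor and match its own digest.
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Editing or deleting any earlier entry changes the chain of digests, so verify returns False unless every later entry is rewritten as well, which is what one-time programmable storage is meant to prevent.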

The Vertical Stack

NCI Ethics Core Chip  <-  sealed Terse ethics programs
        |
    NCI-1 Chip        <-  Terse compiler targets this silicon
        |
      Terse            <-  semantic compression, runtime integrity
        |
      NCI              <-  Terse is NCI's native language

Authored by Lesley Ancion · Goatface Tech · Sundre, Alberta · 2026