Roadmap
Six phases from Python interpreter to custom silicon.
Phase 1 — Python Interpreter
Status: ✅ Complete
All interpreter features complete — lexer, parser, knowledge graphs, inference, Markov chains, functions, loops, variables, math, conditionals, tensors, compression, ethics rules, REPL, error handling.
Phase 1.5 — Semantic Compression Algorithm
Status: ✅ Complete · Compression v2 planned (Phase 3.4.5)
Original four-phase compression algorithm for knowledge graphs: structural deduplication, weight scoring, inference pruning, and SHA-256 signatures. It reaches a 70.59% storage ratio on the test graph. Compression is slow; expansion is fast.
Compression v2, planned for Phase 3.4.5, will introduce iterative inference pruning (multi-step derivation chains, not just one-step), tighten the utility scoring in the algorithm's second phase (weight scoring), and document type-coercion behavior. The architecture stays the same: these are refinements, not redesigns.
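The difference between one-step and iterative inference pruning can be sketched as follows. This is a minimal illustration, not the Terse implementation: it assumes facts are plain strings, rules are (premises, conclusion) pairs, and a set of axioms that must always be stored. The names derive_closure and prune_derivable are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

# Hypothetical rule shape for this sketch: (premises, conclusion).
Rule = Tuple[FrozenSet[str], str]


def derive_closure(base: Set[str], rules: Iterable[Rule],
                   max_rounds: Optional[int] = None) -> Set[str]:
    """Apply rules round by round until nothing new appears or max_rounds is hit."""
    known = set(base)
    rule_list = list(rules)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        new = {conclusion for premises, conclusion in rule_list
               if premises <= known and conclusion not in known}
        if not new:
            break  # fixed point reached: another round would add nothing
        known |= new
        rounds += 1
    return known


def prune_derivable(facts: Set[str], rules: Iterable[Rule], axioms: Set[str],
                    max_rounds: Optional[int] = None) -> Set[str]:
    """Drop stored facts that expansion could re-derive from the axioms."""
    derivable = derive_closure(axioms, rules, max_rounds) - axioms
    return facts - derivable


# Layered rules: a -> b, then b -> c.
RULES = [(frozenset({"a"}), "b"), (frozenset({"b"}), "c")]
FACTS = {"a", "b", "c"}

one_step = prune_derivable(FACTS, RULES, {"a"}, max_rounds=1)
fixed_point = prune_derivable(FACTS, RULES, {"a"})
```

With a single round, only b is recognised as derivable, so c is still stored. Iterating to a fixed point also removes c, which is the gain the planned refinement targets on graphs with layered rules.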
Phase 1.6 — Runtime Integrity & Encryption
Status: ✅ Complete
The sealed keyword. Cryptographic protection for ethics rules and core knowledge.
Any modification to a sealed block breaks the signature and prevents boot.
Ethics becomes mathematical rather than social.
sealed ethics_core
ethics rule protect_children
when target is minor
when action is harmful
then deny with reason: "Absolute limit"
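The "modification breaks the signature and prevents boot" behaviour can be sketched in Python as below. This is an assumption-laden illustration: it uses a plain SHA-256 digest of the block's source text, and the names sign_block, is_intact, boot, and SealBrokenError are hypothetical. The actual sealing scheme (keyed signatures, what exactly is hashed, where the signature is stored) is not described here.

```python
import hashlib
import hmac


class SealBrokenError(Exception):
    """Raised when a sealed block no longer matches its recorded signature."""


def sign_block(block_source: str) -> str:
    """Return a hex SHA-256 digest of the sealed block's source text."""
    return hashlib.sha256(block_source.encode("utf-8")).hexdigest()


def is_intact(block_source: str, recorded_signature: str) -> bool:
    """Compare digests in constant time to avoid leaking match position."""
    return hmac.compare_digest(sign_block(block_source), recorded_signature)


def boot(block_source: str, recorded_signature: str) -> None:
    """Refuse to start if the sealed block has been modified."""
    if not is_intact(block_source, recorded_signature):
        raise SealBrokenError("sealed block modified; refusing to boot")


ETHICS_CORE = 'then deny with reason: "Absolute limit"'
SIGNATURE = sign_block(ETHICS_CORE)
```

A bare digest only helps if the recorded signature lives somewhere an attacker cannot rewrite, which is presumably why later phases move it into hardware.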
Phase 2 — C Transpiler
Status: ✅ Complete
Terse source compiles to valid C via an IR layer. The IR is a flat list of Python
dataclass ops. The C emitter walks the IR and writes one C function call per op.
A single header terse_runtime.h ships with every compiled program.
Pipeline:
Terse source -> lexer -> parser -> AST
-> ir_compiler.py -> IR op list
-> c_emitter.py -> .c file
-> terse_runtime.h
-> gcc/clang -> native binary
The IR layer is the key design decision. Phase 3 reuses the entire frontend and IR — only the emitter changes.
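A flat list of dataclass ops walked by an emitter might look like the sketch below. The op classes (AssignInt, AddFact) and the runtime functions terse_set_int and terse_add_fact are hypothetical stand-ins; only the overall shape (dataclass ops, one C call per op, terse_runtime.h) comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AssignInt:  # hypothetical op
    name: str
    value: int


@dataclass
class AddFact:  # hypothetical op
    subject: str
    relation: str
    obj: str


Op = Union[AssignInt, AddFact]


def c_string(text: str) -> str:
    """Quote text as a C string literal (backslashes and quotes only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def emit_c(ops: List[Op]) -> str:
    """Walk the flat op list and write exactly one C call per op."""
    lines = ['#include "terse_runtime.h"', "", "int main(void) {"]
    for op in ops:
        if isinstance(op, AssignInt):
            lines.append(f"    terse_set_int({c_string(op.name)}, {op.value});")
        elif isinstance(op, AddFact):
            args = ", ".join(c_string(a) for a in (op.subject, op.relation, op.obj))
            lines.append(f"    terse_add_fact({args});")
        else:
            raise TypeError(f"unknown IR op: {op!r}")
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"
```

Because the emitter only sees the op list, a second backend can replace emit_c without touching the lexer, parser, or IR compiler.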
Phase 2.5 — First Native Binary
Status: ✅ Complete
GCC installed via MSYS2 on Windows. output.c compiled and run — the first ever
native Terse program. Inference engine running as compiled machine code.
Also completed this phase: Stage 1 Terse/NCI integration. The Terse interpreter is now a live dependency of NCI, running in production on Oracle. Ethics rules written in Terse fire in the NCI Ethics Engine at confidence 1.0.
Phase 3 — LLVM Compiler
Status: ✅ Complete
llvm_emitter.py is complete and producing correct native binaries. The full pipeline from Terse source to native binary via LLVM is confirmed working.
Phase 2: IR -> c_emitter.py -> .c -> gcc -> binary
Phase 3: IR -> llvm_emitter.py -> .ll -> clang -> binary
Variables, arithmetic, control flow, facts, functions, loops, inference, ethics rules, and sealed blocks all compile to native machine code via LLVM IR through clang.
Phase 3.1 — Float/Numeric Support
Status: ✅ Complete
Real floating-point numbers as a first-class type. Integer arithmetic via i64
is joined by f64 storage, float arithmetic, and float comparison in both the
interpreter and the LLVM emitter.
Required for NCI resonance scores — NCI uses floating-point confidence values throughout its inference and compression pipeline.
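In LLVM IR, integer and floating-point values use different instructions, so an emitter has to select by type. The sketch below shows that selection for textual IR. The function names, the operator tables, and the choice of signed integer ops and ordered float comparisons are assumptions for illustration, not a description of llvm_emitter.py.

```python
# Instruction tables; signed and ordered variants are assumed here.
INT_OPS = {"+": "add", "-": "sub", "*": "mul", "/": "sdiv"}
FLOAT_OPS = {"+": "fadd", "-": "fsub", "*": "fmul", "/": "fdiv"}
INT_CMP = {"<": "slt", ">": "sgt", "==": "eq"}
FLOAT_CMP = {"<": "olt", ">": "ogt", "==": "oeq"}


def emit_binop(dest: str, op: str, ty: str, lhs: str, rhs: str) -> str:
    """Emit one arithmetic instruction; f64 values are LLVM 'double'."""
    if ty == "i64":
        return f"  {dest} = {INT_OPS[op]} i64 {lhs}, {rhs}"
    if ty == "f64":
        return f"  {dest} = {FLOAT_OPS[op]} double {lhs}, {rhs}"
    raise ValueError(f"unsupported numeric type: {ty}")


def emit_compare(dest: str, op: str, ty: str, lhs: str, rhs: str) -> str:
    """Emit one comparison; the result is an i1 in both cases."""
    if ty == "i64":
        return f"  {dest} = icmp {INT_CMP[op]} i64 {lhs}, {rhs}"
    if ty == "f64":
        return f"  {dest} = fcmp {FLOAT_CMP[op]} double {lhs}, {rhs}"
    raise ValueError(f"unsupported numeric type: {ty}")
```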
Phase 3.2 — String Manipulation
Status: ✅ Complete
Dynamic string handling — building, concatenating, slicing, searching, comparing,
and substituting strings at runtime. All seven string operations compile to
native machine code and link against a small C runtime (terse_runtime_llvm.c)
specifically for the LLVM pipeline.
greeting = "hello world"
size = length(greeting)
slice = substring(greeting, 6, 11)
joined = concat("hello", "world")
found = contains(greeting, "world")
match = equals(greeting, "hello world")
shouted = replace(greeting, "world", "WORLD")
Phase 3.2 also locked in two foundational language decisions:
- Terse is statically typed. Booleans use LLVM's native i1 type, not fake-bool integers. The compiler tracks types through the pipeline; type checking enforcement arrives in Phase 3.4.
- Function call expressions. Generic identifier(arg, arg, ...) parsing means new functions don't require lexer changes. Argument lists are comma-separated and accept string literals, identifiers, and number literals.
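Generic call parsing of this kind can be sketched as a small tokenizer plus one parse function. Everything here (the Call and Identifier classes, the token regex, parse_call) is hypothetical and far simpler than a real parser: it handles one call with no nesting and leaves escape sequences inside strings undecoded.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Identifier:
    name: str


@dataclass
class Call:
    name: str
    args: List[Union[str, int, float, Identifier]]


TOKEN = re.compile(
    r'\s*(?:(?P<str>"(?:[^"\\]|\\.)*")|(?P<num>-?\d+(?:\.\d+)?)'
    r'|(?P<id>[A-Za-z_]\w*)|(?P<punct>[(),]))'
)


def tokenize(src: str) -> List[Tuple[str, str]]:
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_call(src: str) -> Call:
    """Parse identifier(arg, arg, ...) with literal or identifier arguments."""
    tokens = tokenize(src)
    if (len(tokens) < 3 or tokens[0][0] != "id"
            or tokens[1] != ("punct", "(") or tokens[-1] != ("punct", ")")):
        raise SyntaxError("expected identifier(arg, ...)")
    args: List[Union[str, int, float, Identifier]] = []
    body = tokens[2:-1]
    expect_arg = True
    for kind, text in body:
        if expect_arg:
            if kind == "str":
                args.append(text[1:-1])  # strip quotes; escapes left as-is
            elif kind == "num":
                args.append(float(text) if "." in text else int(text))
            elif kind == "id":
                args.append(Identifier(text))
            else:
                raise SyntaxError(f"expected argument, got {text!r}")
        elif (kind, text) != ("punct", ","):
            raise SyntaxError(f"expected ',', got {text!r}")
        expect_arg = not expect_arg
    if body and expect_arg:
        raise SyntaxError("trailing comma in argument list")
    return Call(tokens[0][1], args)
```

The function name is just an identifier token, so adding a new built-in needs no new keyword in the lexer.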
Documented function semantics (substring's defensive clamping, replace's all-occurrences behavior, etc.) are normative across all Terse backends. Future targets (FPGA emitter, alternative compilers) reproduce these exactly.
See Strings in the syntax reference for full operation documentation.
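As an illustration of what "normative semantics" means for a backend, here is a Python reference sketch of two operations. The specifics are assumptions, since the authoritative rules live in the syntax reference: indices are taken as zero-based with an exclusive end, out-of-range indices are clamped into the string, inverted ranges yield an empty string, and an empty search pattern leaves the input unchanged.

```python
def substring(s: str, start: int, end: int) -> str:
    """Slice with defensive clamping (assumed rules, see lead-in)."""
    start = max(0, min(start, len(s)))
    end = max(0, min(end, len(s)))
    return s[start:end] if start < end else ""


def replace(s: str, old: str, new: str) -> str:
    """Replace every occurrence of old with new."""
    if old == "":
        return s  # assumption: empty pattern is a no-op
    return s.replace(old, new)
```

A second backend could be checked against a table of such cases to confirm it reproduces the same results.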
Phase 3.2.5 — Lists
Status: ✅ Complete
Multi-element collections. Required before file I/O can land cleanly (a file's contents read as a list of lines).
Lists:
- List literal syntax: nums = [1, 2, 3]
- Runtime functions: terse_list_create, terse_list_append, terse_list_at, terse_list_length
- Index access via list_at(list, index): function-style, consistent with all other operations
- Iteration via the each keyword: each x in nums iterates over every element in order
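One way to picture how each could lower onto the four runtime functions is the Python model below. The function names match the runtime list above, but their signatures, the bounds-check behaviour, and the run_each helper are assumptions made for this sketch.

```python
from typing import Any, Callable, List


def terse_list_create() -> List[Any]:
    return []


def terse_list_append(lst: List[Any], value: Any) -> None:
    lst.append(value)


def terse_list_at(lst: List[Any], index: int) -> Any:
    # Assumption: out-of-range access is an error rather than a clamp.
    if not 0 <= index < len(lst):
        raise IndexError(f"list index {index} out of range")
    return lst[index]


def terse_list_length(lst: List[Any]) -> int:
    return len(lst)


def run_each(lst: List[Any], body: Callable[[Any], None]) -> None:
    """Model of 'each x in nums': an index loop over length and at."""
    i = 0
    n = terse_list_length(lst)
    while i < n:
        body(terse_list_at(lst, i))
        i += 1


# nums = [1, 2, 3] expressed as runtime calls.
nums = terse_list_create()
for v in (1, 2, 3):
    terse_list_append(nums, v)
```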
Also landed this phase:
- String escape sequences: \n, \t, \", \\ supported in string literals. Unknown escapes raise TerseSyntaxError with line number.
- Windows target triple: LLVM preamble emits target triple = "x86_64-w64-windows-gnu" for correct native binary generation on Windows.
- Identifier AST node: variable references in assignments (b = a) now produce an explicit Identifier node in the AST rather than falling through to the string fallback.
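The escape-sequence rule in the list above can be sketched as a small decoder. The TerseSyntaxError class here is a stand-in for the real one, and decode_string_body and the message format are hypothetical; only the four supported escapes and the line-number requirement come from the text.

```python
class TerseSyntaxError(Exception):
    """Stand-in for the interpreter's syntax error type."""


ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def decode_string_body(body: str, line: int) -> str:
    """Decode the text between the quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise TerseSyntaxError(f"line {line}: dangling backslash")
        code = body[i + 1]
        if code not in ESCAPES:
            raise TerseSyntaxError(f"line {line}: unknown escape \\{code}")
        out.append(ESCAPES[code])
        i += 2
    return "".join(out)
```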
Phase 3.3 — File I/O
Status: ⬜ Planned
Read and write files from Terse programs. Enables loading knowledge graphs from disk rather than hard-coding them in source, and writing inference results or logs out to files. File reads naturally return lists of strings now that lists exist.
- read_lines("path") returns a list of strings
- write("path", content) writes a string to a file
- append("path", content) appends to a log file
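Since this phase is still planned, the following is only a guess at the intended semantics, written as a Python model. It assumes UTF-8 text, that read_lines strips line terminators, that write overwrites, and that append adds no newline of its own.

```python
from pathlib import Path
from typing import List


def read_lines(path: str) -> List[str]:
    """Return the file's lines without their terminators (assumed)."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def write(path: str, content: str) -> None:
    """Create or overwrite the file with content (assumed)."""
    Path(path).write_text(content, encoding="utf-8")


def append(path: str, content: str) -> None:
    """Add content to the end of the file, creating it if needed (assumed)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(content)
```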
Phase 3.4 — Standard Library, Type Checking, Concept Bridge
Status: ⬜ Planned
A minimal built-in standard library plus the architectural pieces deferred from earlier phases.
- Standard library: print, range, math primitives (abs, min, max, floor, ceil)
- split(s, delimiter): returns a list of strings; deferred from Phase 3.2 because lists didn't exist yet
- Static type checking pass: runs over the AST or IR before LLVM emission. This unblocks true/false literals, variable-numeric arguments, and several other features deferred from earlier phases.
- Strings-as-concept-nodes bridge: formalize the mechanism for using a string literal as a knowledge graph node reference. The dual nature has been in the design from the start; Phase 3.4 is where it lands as syntax.
- Function dispatch refactor: move from hardcoded branches to a registry of known functions with metadata.
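A registry of known functions with metadata, as opposed to hardcoded branches, might be shaped like this. The FunctionInfo fields, the register decorator, and the type names are hypothetical; the planned refactor may carry different metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class FunctionInfo:
    name: str
    param_types: Tuple[str, ...]
    return_type: str
    impl: Callable[..., object]


REGISTRY: Dict[str, FunctionInfo] = {}


def register(name: str, param_types: Sequence[str], return_type: str):
    """Decorator that records a built-in together with its metadata."""
    def decorator(fn: Callable[..., object]) -> Callable[..., object]:
        REGISTRY[name] = FunctionInfo(name, tuple(param_types), return_type, fn)
        return fn
    return decorator


@register("length", ["string"], "int")
def _length(s: str) -> int:
    return len(s)


@register("contains", ["string", "string"], "bool")
def _contains(haystack: str, needle: str) -> bool:
    return needle in haystack


def dispatch(name: str, *args: object) -> object:
    """Look the function up instead of branching on its name."""
    info = REGISTRY.get(name)
    if info is None:
        raise NameError(f"unknown function: {name}")
    if len(args) != len(info.param_types):
        raise TypeError(f"{name} expects {len(info.param_types)} argument(s)")
    return info.impl(*args)
```

The same metadata could also feed the static type checking pass, since each entry already records parameter and return types.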
Phase 3.4.5 — Compression v2
Status: ⬜ Planned
Refinements to the Phase 1.5 semantic compression algorithm. The current algorithm works correctly; v2 makes it work better, particularly for rule-rich knowledge graphs where multi-step inference chains exist.
Three changes:
- Iterative inference pruning. The algorithm's third phase (inference pruning) currently catches only one-step derivations. v2 iterates until reaching a fixed point, capturing rule chains. Expected to improve compression ratio meaningfully on graphs with layered inference rules.
- Token-set utility scoring. The algorithm's second phase (weight scoring) currently uses substring matching to detect rule-relevance, which can produce false positives. v2 tokenizes the rule text and checks set membership.
- Type-coercion documentation. The algorithm's first phase (structural deduplication) currently coerces values to strings for pool keys, which means 42 (int) and "42" (string) collide. Once Terse has typed fact values (roadmap Phase 3.4), this needs revisiting. v2 adds inline documentation noting the limitation.
The architecture stays the same — same four phases, same SemanticBundle
output, same expand_semantic interface. NCI's brain files compress better
without any external API change.
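The false-positive and collision problems described in the three changes above are easy to reproduce in a few lines. The function names, the tokenizer regex, and the example rule text below are hypothetical and only illustrate the general technique.

```python
import re


def substring_relevant(concept: str, rule_text: str) -> bool:
    """Current style: any substring hit counts as relevance."""
    return concept in rule_text


def token_relevant(concept: str, rule_text: str) -> bool:
    """v2 style: split the rule into identifier tokens, then test membership."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", rule_text))
    return concept in tokens


def pool_key(value: object) -> str:
    """String-coerced pool key, which makes 42 and "42" indistinguishable."""
    return str(value)


RULE = "when target is category_leader"
```

Here "cat" matches RULE by substring because it occurs inside "category_leader", while the token-set check rejects it. pool_key shows why an integer and its string form share one pool slot.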
Phase 3.5 — pyproject.toml — Proper Python Packaging
Status: ⬜ Planned
Package Terse as a proper Python package installable via pip. This replaces
the current file-copy dependency in NCI — lib/terse/ in the NCI repo is a
manual copy of src/interpreter/ in Terse. Any change to the Terse interpreter
currently requires a manual sync to NCI.
With a published package, NCI adds terse-lang as a dependency in its own
pyproject.toml and imports it cleanly: from terse import Interpreter.
- pyproject.toml with build metadata
- Package name: terse-lang
- Entry point: terse CLI
- Published to PyPI or a private registry
- NCI lib/terse/ directory removed; replaced with package import
Phase 3.6 — Self-Hosting Prep
Status: ⬜ Future
Groundwork for writing the Terse compiler in Terse itself. The language needs to be expressive enough to implement its own lexer, parser, and IR emitter before this is attempted.
Phases 3.1–3.5 are prerequisites. Self-hosting is a milestone, not a phase with a fixed scope — work begins when the language is capable enough to attempt it.
Phase 4 — FPGA Prototype
Status: ⬜ Future
Prototype NCI-1 hardware primitives on a Xilinx Artix-7 FPGA.
- Target hardware: Digilent Arty A7 (Artix-7)
- Toolchain: Xilinx Vivado + CIRCT
- Compression engine in hardware
- Tensor multiply unit
- Ethics Core prototype — immutable rule enforcement in silicon
- Terse -> CIRCT -> Verilog compiler target
Phase 5 — Long-Running Runtime + NCI-1 Custom Chip
Status: ⬜ Future
Two parallel tracks:
Long-running runtime memory management. Terse's current allocation strategy leaks heap memory, which is fine for short-lived programs but fatal for processes that run for years. Phase 5 replaces this with a strategy fit for NCI hosting Terse for its operational lifetime: likely reference counting for short-lived strings, interning for repeated concept references, and generational GC for general allocations.
NCI-1 chip tape out. Based on validated FPGA design from Phase 4.
- FPGA design validated and stable
- RTL design finalized
- Foundry selection
- NCI-1 tape out
- NCI Box appliance — first product shipping NCI-1
- Terse self-hosting — Terse compiler written in Terse
Phase 6 — NCI Ethics Core Chip
Status: ⬜ Future
A dedicated hardware ethics enforcement chip. Ethics rules written in Terse, executed in silicon.
The Principle
"You can't jailbreak silicon. A hardware ethics engine is not a filter — it is a physical constraint. No software layer can override it because it is upstream of all software."
- Ethics rule engine in Verilog
- Immutable rule storage — one-time programmable memory
- Tamper-evident audit log in hardware
- Hardware attestation — private key in silicon
- PCIe Ethics Card — plugs into any server
- Integrated on all NCI Box appliances
The Vertical Stack
NCI Ethics Core Chip <- sealed Terse ethics programs
|
NCI-1 Chip <- Terse compiler targets this silicon
|
Terse <- semantic compression, runtime integrity
|
NCI <- Terse is NCI's native language
Authored by Lesley Ancion · Goatface Tech · Sundre, Alberta · 2026