Session 11: Phase 3 Complete, Full LLVM Pipeline Working
Date: April 2026
Status: Complete ✅
Milestone
Phase 3 is complete. Terse programs now compile to native binaries via LLVM with no intermediate C step. The full pipeline from .trs source through LLVM IR to a native Windows binary is confirmed working end-to-end.
python test_llvm_pipeline.py
/c/msys64/mingw64/bin/clang.exe output.ll -o hello_llvm.exe
./hello_llvm.exe
Two complete phases shipped today:
| Phase | Description | Status |
|---|---|---|
| 2.5 | GCC install — first native binary | ✅ Complete |
| 3 | LLVM compiler — native via LLVM | ✅ Complete |
Phase 2.5: First Native Binary
GCC was installed via MSYS2 MINGW64 on Windows, and output.c was compiled and run: the first native Terse program. The inference engine ran as compiled machine code, not Python.
Output: dog is mammal: 1. The inference was derived from the fact "dog has fur" and the rule "when has fur then is mammal", running as a native binary.
Phase 3: LLVM Compiler
What Was Built
llvm_emitter.py — a complete LLVM IR emitter that takes the same IR produced by ir_compiler.py and emits valid LLVM IR instead of C.
Phase 2: IR -> c_emitter.py -> .c -> gcc -> binary
Phase 3: IR -> llvm_emitter.py -> .ll -> clang -> binary
The frontend (lexer, parser, AST, IR compiler) is identical. Only the emitter changed.
Handlers Implemented
Variables and arithmetic:
- emit_store_var: alloca + store i64
- emit_math_op: load, add/sub/mul/sdiv, store result
- emit_compare: load, icmp sgt/slt/sge/sle/eq/ne
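The three handlers above might look roughly like the following. This is a minimal sketch, not the real llvm_emitter.py: the class MiniEmitter, its method signatures, and the %tN temporary naming are all assumptions made for illustration. It also spells pointers with the opaque ptr type, which may differ from the spelling the real emitter uses.

```python
class MiniEmitter:
    """Hypothetical, simplified emitter; not the real llvm_emitter.py."""

    MATH_OPS = {"+": "add", "-": "sub", "*": "mul", "/": "sdiv"}
    CMP_OPS = {">": "sgt", "<": "slt", ">=": "sge", "<=": "sle",
               "==": "eq", "!=": "ne"}

    def __init__(self):
        self.lines = []      # emitted LLVM IR, one instruction per entry
        self.slots = set()   # variables that already have a stack slot
        self.counter = 0     # source of unique temporary names

    def _temp(self):
        self.counter += 1
        return f"%t{self.counter}"

    def _load(self, name):
        reg = self._temp()
        self.lines.append(f"  {reg} = load i64, ptr %{name}")
        return reg

    def emit_store_var(self, name, value):
        # Allocate the stack slot once, then store into it.
        if name not in self.slots:
            self.lines.append(f"  %{name} = alloca i64")
            self.slots.add(name)
        self.lines.append(f"  store i64 {value}, ptr %{name}")

    def emit_math_op(self, op, left, right, dest):
        # Load both operands, apply the integer op, store the result.
        a, b = self._load(left), self._load(right)
        result = self._temp()
        self.lines.append(f"  {result} = {self.MATH_OPS[op]} i64 {a}, {b}")
        self.emit_store_var(dest, result)

    def emit_compare(self, op, left, right):
        # Signed comparison; returns the i1 register for a later br.
        a, b = self._load(left), self._load(right)
        flag = self._temp()
        self.lines.append(f"  {flag} = icmp {self.CMP_OPS[op]} i64 {a}, {b}")
        return flag
```

Keeping every variable in an alloca slot and loading it on each use avoids having to build SSA form by hand; LLVM's own optimization passes can promote such slots to registers later.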
Control flow:
- emit_label: basic block labels with block_terminated flag for implicit fallthrough br
- emit_jump: unconditional br label
- emit_jump_if_false: conditional br i1, opens then-block
- emit_jump_if_true: conditional br i1, opens else-block
Facts:
- emit_store_fact: interns fact string as LLVM global constant, calls printf directly
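String interning for facts could be sketched as below. The names StringPool, llvm_c_string, and the @.str.N global naming are hypothetical, as is the exact text printed for a fact; the sketch only shows the general shape of deduplicating strings, sizing the array, and emitting a printf call. It uses the opaque ptr spelling, under which a global's name can be passed as the pointer directly.

```python
def llvm_c_string(text):
    """Return (escaped body, byte length) for a NUL-terminated c"..." literal."""
    data = text.encode("utf-8") + b"\x00"
    parts = []
    for byte in data:
        ch = chr(byte)
        if 32 <= byte < 127 and ch not in '"\\':
            parts.append(ch)               # printable ASCII passes through
        else:
            parts.append(f"\\{byte:02X}")  # everything else as a hex escape
    return "".join(parts), len(data)


class StringPool:
    """Hypothetical intern table: one global per distinct string."""

    def __init__(self):
        self.names = {}

    def intern(self, text):
        if text not in self.names:
            self.names[text] = f"@.str.{len(self.names)}"
        return self.names[text]

    def definitions(self):
        lines = []
        for text, name in self.names.items():
            body, size = llvm_c_string(text)
            lines.append(
                f'{name} = private unnamed_addr constant [{size} x i8] c"{body}"'
            )
        return lines


def emit_store_fact(pool, subject, relation, obj):
    # Assumption: the fact text itself is used as the printf format string.
    name = pool.intern(f"{subject} {relation} {obj}\n")
    return f"  call i32 (ptr, ...) @printf(ptr {name})"
```

Because the definitions are generated from the pool after all handlers have run, a string that appears in many facts is emitted only once.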
Functions:
- emit_func_def: hoisted out of @main into a real LLVM function definition
- emit_func_call: interns arg string, GEPs pointer, calls the function
- emit_return: ret i8* %param
Loops:
- emit_each_start / emit_each_end: compile-time unroll; one concrete copy of the loop body per known fact subject
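Compile-time unrolling can be illustrated with a small sketch. The tuple-shaped ops used here, such as ("EachStart", var, category), are an assumed stand-in for the real IR classes, and nested loops are not handled.

```python
def collect_facts(ops):
    """Map each category to the subjects declared with an "is" fact."""
    facts = {}
    for op in ops:
        if op[0] == "StoreFact" and op[2] == "is":
            _, subject, _, category = op
            facts.setdefault(category, []).append(subject)
    return facts


def unroll_each(ops, facts):
    """Replace every EachStart..EachEnd span with one body copy per subject."""
    out = []
    i = 0
    while i < len(ops):
        op = ops[i]
        if op[0] != "EachStart":
            out.append(op)
            i += 1
            continue
        _, var, category = op
        end = ops.index(("EachEnd",), i)  # first EachEnd after the start
        body = ops[i + 1:end]
        for subject in facts.get(category, []):
            for body_op in body:
                # Substitute the loop variable with the concrete subject.
                out.append(tuple(subject if field == var else field
                                 for field in body_op))
        i = end + 1
    return out
```

With two mammal facts in scope, a loop body of one op becomes two concrete ops, so the emitter never sees a loop at all.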
Architecture: generate() Pre-Processing Pipeline
ir_program.ops
│
▼
_collect_facts() ← build facts dict from StoreFact "is" ops
│
▼
_preprocess()
├── pass 1: extract FuncDef..Return → func_blocks (hoisted out of @main)
└── pass 2: expand EachStart..EachEnd → concrete ops per subject
│
▼
_emit_func_section() ← emit define ... for each func_block
│
▼
emit_op() loop ← emit @main body from main_ops
│
▼
_emit_string_constants() ← collect all interned strings after all emit calls
│
▼
assemble: PREAMBLE + strings + functions + @main
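Pass 1 of _preprocess in the diagram above could be sketched as follows. The function name hoist_functions and the tuple-shaped ops are assumptions for illustration, and the sketch assumes each function body ends at its first Return op.

```python
def hoist_functions(ops):
    """Split ops into FuncDef..Return blocks and the remaining @main ops."""
    func_blocks = []
    main_ops = []
    current = None  # the function block being collected, if any
    for op in ops:
        if op[0] == "FuncDef":
            current = [op]
        elif current is not None:
            current.append(op)
            if op[0] == "Return":
                func_blocks.append(current)
                current = None
        else:
            main_ops.append(op)
    return func_blocks, main_ops
```

Each entry in func_blocks can then be emitted as its own define, while main_ops feeds the emit_op() loop.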
Bug Fixes
LLVM String Escape Sequences
Python string escaping was mangling LLVM hex escape sequences in LLVM_PREAMBLE:
- c"\0a\00": Python consumed \0 as a null byte, leaving a\00, which produced chr(0)+a+chr(0). Fix: c"\\0a\\00"
- c"%f\00": Python consumed \00 as chr(0). Fix: c"%f\\00"
- _emit_string_constants escaped \n to \\n (two printable chars). Fix: escape to \\0a
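The escaping behaviour is easy to reproduce in isolation. The snippet below is a small demonstration rather than code from the project; the helper name escape_newlines is hypothetical, and the raw-string form is shown only as one alternative to doubling backslashes, not as what the fix used.

```python
# Python treats \0 and \00 in a normal string literal as octal escapes.
broken = "\0a\00"
assert broken == chr(0) + "a" + chr(0)
assert len(broken) == 3

# Doubling the backslashes keeps the LLVM hex escapes as literal text.
fixed = "\\0a\\00"
assert len(fixed) == 6

# A raw string yields the same six characters without doubling.
raw = r"\0a\00"
assert fixed == raw


def escape_newlines(text):
    """Turn a real newline into the three characters backslash, 0, a."""
    return text.replace("\n", "\\0a")


assert escape_newlines("hi\n") == r"hi\0a"
```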
Basic Block Terminators
LLVM IR requires every basic block to end with a terminator (br, ret) before the next label. Two structural bugs fixed:
- emit_jump_if_false was pre-adding a br label %else to the then-block: wrong label, wrong position. Removed. The then-block's terminator comes from the explicit Jump op that follows in the IR.
- emit_label had no awareness of whether the preceding block was terminated. A block_terminated boolean was added. emit_label checks the flag and emits an implicit fallthrough br only when the prior block has no terminator.
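The interaction between the flag and the label handler can be sketched as below. The class BlockEmitter and the argument lists are assumptions; only the block_terminated idea and the handler names come from the notes.

```python
class BlockEmitter:
    """Hypothetical emitter tracking whether the open block has a terminator."""

    def __init__(self):
        self.lines = []
        self.block_terminated = False  # the entry block starts open

    def emit_label(self, name):
        # Close the previous block with a fallthrough br only if needed.
        if not self.block_terminated:
            self.lines.append(f"  br label %{name}")
        self.lines.append(f"{name}:")
        self.block_terminated = False  # the new block is open

    def emit_jump(self, target):
        self.lines.append(f"  br label %{target}")
        self.block_terminated = True

    def emit_jump_if_false(self, cond, then_label, else_label):
        # The conditional br terminates the current block and opens the
        # then-block; no extra br is pre-added to the then-block.
        self.lines.append(
            f"  br i1 {cond}, label %{then_label}, label %{else_label}"
        )
        self.lines.append(f"{then_label}:")
        self.block_terminated = False


e = BlockEmitter()
e.emit_jump_if_false("%c", "then1", "else1")
e.emit_jump("end1")    # explicit Jump op terminates the then-block
e.emit_label("else1")  # previous block terminated: no implicit br
e.emit_label("end1")   # else-block is open: implicit fallthrough br
```

In this run the only implicit br is the one closing the else-block, which is the case the flag exists to detect.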
Parser Bug: with reason:
with and reason were not in KEYWORDS in lexer.py. The relationship handler consumed them as identifiers before the ethics parser could reach them, producing stray output: linked: with -> reason -> :.
Fix: added with and reason to KEYWORDS.
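The effect of the fix can be shown with a toy tokenizer. This is not the real lexer.py: the tokenize function, the token tuples, and the keyword subset listed here are all hypothetical, chosen only to show how membership in KEYWORDS changes classification.

```python
import re

# Hypothetical subset of keywords; "with" and "reason" are the additions.
KEYWORDS = {"when", "then", "has", "is", "with", "reason"}


def tokenize(source, keywords=KEYWORDS):
    """Classify each word as KEYWORD or IDENT, and other chars as SYMBOL."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|\S", source):
        if word in keywords:
            tokens.append(("KEYWORD", word))
        elif word[0].isalpha() or word[0] == "_":
            tokens.append(("IDENT", word))
        else:
            tokens.append(("SYMBOL", word))
    return tokens
```

Without the two entries, both words come out as IDENT tokens, which is how an identifier-consuming handler could swallow them before the ethics parser ran.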
SEALED_SIGNATURES Bug
seal.py prints "Paste this into your project" but interpreter.py initialized self.sealed_signatures = {} (empty dict). Sealed block verification always failed with "has no registered signature".
Fix: SEALED_SIGNATURES added as a module-level constant in interpreter.py. __init__ uses it instead of an empty dict.
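The shape of that fix might look like the following. Everything beyond the names SEALED_SIGNATURES and sealed_signatures is assumed: the placeholder entry, the verify_sealed method, the optional constructor argument, and the exception type are illustrative and say nothing about how seal.py actually computes signatures.

```python
# Module-level registry; the entry below is a placeholder, not a real signature.
SEALED_SIGNATURES = {
    "example_block": "0" * 64,
}


class Interpreter:
    """Hypothetical, cut-down interpreter showing only signature lookup."""

    def __init__(self, sealed_signatures=None):
        # Default to the module-level constant instead of an empty dict.
        source = SEALED_SIGNATURES if sealed_signatures is None else sealed_signatures
        self.sealed_signatures = dict(source)

    def verify_sealed(self, name, signature):
        if name not in self.sealed_signatures:
            raise KeyError(f"{name} has no registered signature")
        return self.sealed_signatures[name] == signature
```

Copying the constant into the instance keeps later per-interpreter changes from mutating the shared module-level registry.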
Toolchain
- LLVM 22: lli, llc, opt via MSYS2 MINGW64
- Clang 22: full .ll → native binary pipeline on Windows
- Note: the gcc assembler does not understand Windows SEH directives from LLVM output; use clang, not gcc, for LLVM IR compilation on Windows
Design Decisions
- Functions extracted from main op stream before emit: LLVM IR doesn't allow nested function definitions; _preprocess hoists every FuncDef..Return block out of @main
- Each loops unrolled at compile time: one concrete copy of the body per known fact subject; no runtime collection needed
- Facts pre-collected in first pass: _collect_facts runs before _preprocess; the loop unroller needs the complete fact set before scanning for EachStart
- block_terminated flag: tracks whether the current basic block has a terminator; emit_label uses it to add implicit fallthrough branches only where needed
- Use clang not gcc for LLVM IR: the gcc assembler does not understand Windows SEH directives from LLVM output
- LLVM \0a for newlines: LLVM c-string literals use hex escapes; Python \n must be escaped to \\0a in _emit_string_constants
Related Projects
| Project | Status |
|---|---|
| NCI | Session 32 complete — profile system live, Stage 1 Terse integration confirmed |
| Terse | Phase 3 complete — LLVM pipeline working, Phase 4 FPGA next |
Next Session Goals
- emit_infer: make inference work in compiled binary
- emit_register_rule: make inference rules work in compiled binary
- emit_ethics / emit_sealed: ethics rules in compiled binary
- Float support: real floats for NCI resonance scores
- pyproject.toml: so NCI can import Terse as a proper package