
Session 11 — Phase 3 Complete — Full LLVM Pipeline Working

Date: April 2026
Status: Complete ✅


Milestone

Phase 3 is complete. Terse programs now compile to native binaries via LLVM with no intermediate C step. The full pipeline from .trs source through LLVM IR to a native Windows binary is confirmed working end-to-end.

python test_llvm_pipeline.py
/c/msys64/mingw64/bin/clang.exe output.ll -o hello_llvm.exe
./hello_llvm.exe
dog is animal
dog has fur
cat is animal
cat has fur
rabbit is animal
rabbit has fur
size is big

Two complete phases shipped today:

Phase   Description                          Status
2.5     GCC install — first native binary    ✅ Complete
3       LLVM compiler — native via LLVM      ✅ Complete

Phase 2.5 — First Native Binary

GCC installed via MSYS2 MINGW64 on Windows. output.c compiled and run — the first ever native Terse program. The inference engine ran as compiled machine code, not Python.

gcc output.c -o hello -I. -Wall
./hello.exe
# dog is mammal: 1

dog is mammal: 1 — inference derived from dog has fur and when has fur then is mammal, running as a native binary.


Phase 3 — LLVM Compiler

What Was Built

llvm_emitter.py — a complete LLVM IR emitter that takes the same IR produced by ir_compiler.py and emits valid LLVM IR instead of C.

Phase 2:  IR -> c_emitter.py    -> .c  -> gcc   -> binary
Phase 3:  IR -> llvm_emitter.py -> .ll -> clang -> binary

The frontend (lexer, parser, AST, IR compiler) is identical. Only the emitter changed.

Handlers Implemented

Variables and arithmetic:

  • emit_store_var — alloca + store i64
  • emit_math_op — load, add/sub/mul/sdiv, store result
  • emit_compare — load, icmp sgt/slt/sge/sle/eq/ne
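
A minimal sketch of how handlers like these could produce IR text. The MiniEmitter class, its lines list, and the %t0, %t1 temporary naming are hypothetical and are not taken from llvm_emitter.py. The i64* pointer spelling follows the typed-pointer form that appears in this log (ret i8* %param); whether the real emitter writes every instruction that way is an assumption.

```python
class MiniEmitter:
    """Hypothetical stand-in for the real emitter; collects LLVM IR lines."""

    # Map source operators to LLVM signed-integer instructions.
    MATH_OPS = {"+": "add", "-": "sub", "*": "mul", "/": "sdiv"}

    def __init__(self):
        self.lines = []
        self.allocated = set()  # variables that already have a stack slot
        self.tmp = 0

    def fresh(self):
        """Return a new temporary name such as %t0."""
        name = f"%t{self.tmp}"
        self.tmp += 1
        return name

    def slot(self, name):
        """Emit an alloca the first time a variable is seen."""
        if name not in self.allocated:
            self.lines.append(f"  %{name} = alloca i64")
            self.allocated.add(name)

    def emit_store_var(self, name, value):
        self.slot(name)
        self.lines.append(f"  store i64 {value}, i64* %{name}")

    def emit_math_op(self, dest, left, op, right):
        # Load both operands, apply the instruction, store the result.
        a, b, r = self.fresh(), self.fresh(), self.fresh()
        self.lines.append(f"  {a} = load i64, i64* %{left}")
        self.lines.append(f"  {b} = load i64, i64* %{right}")
        self.lines.append(f"  {r} = {self.MATH_OPS[op]} i64 {a}, {b}")
        self.slot(dest)
        self.lines.append(f"  store i64 {r}, i64* %{dest}")
```

Calling emit_store_var("x", 2), emit_store_var("y", 3) and emit_math_op("z", "x", "+", "y") in that order leaves nine lines in lines, ending with the store into %z.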

Control flow:

  • emit_label — basic block labels with block_terminated flag for implicit fallthrough br
  • emit_jump — unconditional br label
  • emit_jump_if_false — conditional br i1, opens then-block
  • emit_jump_if_true — conditional br i1, opens else-block

Facts:

  • emit_store_fact — interns fact string as LLVM global constant, calls printf directly
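
String interning of this kind can be sketched as follows. StringTable, llvm_c_string, and the @.str.N naming are hypothetical; the log only states that fact strings become LLVM global constants. The sketch escapes every non-printable byte, the quote, and the backslash as two hex digits, which is one way to keep the c"..." literal valid.

```python
def llvm_c_string(text):
    """Return (literal, byte_count) for an LLVM c"..." constant, NUL-terminated."""
    data = text.encode("utf-8") + b"\x00"
    parts = []
    for byte in data:
        ch = chr(byte)
        # Printable ASCII passes through, except the quote and the backslash.
        if 32 <= byte < 127 and ch not in '"\\':
            parts.append(ch)
        else:
            parts.append(f"\\{byte:02X}")
    return "".join(parts), len(data)


class StringTable:
    """Hypothetical intern table: one global per distinct string."""

    def __init__(self):
        self.names = {}

    def intern(self, text):
        # Reuse the existing global if this string was seen before.
        if text not in self.names:
            self.names[text] = f"@.str.{len(self.names)}"
        return self.names[text]

    def emit(self):
        lines = []
        for text, name in self.names.items():
            literal, size = llvm_c_string(text)
            lines.append(
                f'{name} = private unnamed_addr constant [{size} x i8] c"{literal}"'
            )
        return lines
```

Because the table is only emitted after every handler has run, the globals can be collected lazily, which matches the ordering described for _emit_string_constants.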

Functions:

  • emit_func_def — hoisted out of @main into a real LLVM function definition
  • emit_func_call — interns arg string, GEPs pointer, calls the function
  • emit_return — ret i8* %param

Loops:

  • emit_each_start / emit_each_end — compile-time unroll; one concrete copy of the loop body per known fact subject
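
A rough sketch of compile-time unrolling, under stated assumptions: ops are modelled as plain tuples, the facts dict maps a category to its subjects, and the loop variable is substituted by simple field comparison. None of these shapes are confirmed by the log, and nested loops are not handled.

```python
def collect_facts(ops):
    """Map each category to its subjects, from ("StoreFact", subject, "is", category)."""
    facts = {}
    for op in ops:
        if op[0] == "StoreFact" and op[2] == "is":
            facts.setdefault(op[3], []).append(op[1])
    return facts


def unroll_each(ops, facts):
    """Replace every EachStart..EachEnd span with one body copy per subject."""
    out, i = [], 0
    while i < len(ops):
        op = ops[i]
        if op[0] != "EachStart":
            out.append(op)
            i += 1
            continue
        _, var, category = op
        end = i + 1
        while ops[end][0] != "EachEnd":  # sketch only: assumes no nested loops
            end += 1
        body = ops[i + 1:end]
        for subject in facts.get(category, []):
            for body_op in body:
                # Substitute the loop variable with the concrete subject.
                out.append(tuple(subject if f == var else f for f in body_op))
        i = end + 1
    return out
```

Since the result contains no EachStart or EachEnd ops, the emitter needs no runtime collection or loop counter for these loops.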

Architecture — generate() Pre-Processing Pipeline

ir_program.ops
_collect_facts()          ← build facts dict from StoreFact "is" ops
_preprocess()
    ├── pass 1: extract FuncDef..Return → func_blocks (hoisted out of @main)
    └── pass 2: expand EachStart..EachEnd → concrete ops per subject
_emit_func_section()      ← emit define ... for each func_block
emit_op() loop            ← emit @main body from main_ops
_emit_string_constants()  ← collect all interned strings after all emit calls
assemble: PREAMBLE + strings + functions + @main
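
Pass 1 of _preprocess could look roughly like the sketch below. The tuple-shaped ops and the hoist_functions name are hypothetical, and the sketch assumes each function body ends at its first Return and that definitions are not nested.

```python
def hoist_functions(ops):
    """Split an op stream into (main_ops, func_blocks)."""
    main_ops, func_blocks, current = [], [], None
    for op in ops:
        if op[0] == "FuncDef":
            current = [op]              # start collecting a function body
        elif current is not None:
            current.append(op)
            if op[0] == "Return":       # body ends at the first Return
                func_blocks.append(current)
                current = None
        else:
            main_ops.append(op)         # everything else stays in @main
    return main_ops, func_blocks
```

Each entry of func_blocks can then be emitted as its own define before the @main body is generated from main_ops.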

Bug Fixes

LLVM String Escape Sequences

Python string escaping was mangling LLVM hex escape sequences in LLVM_PREAMBLE:

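  • c"\0a\00" — in a normal Python string literal, \0 is read as an octal escape for NUL, a stays a literal character, and \00 is read as another NUL, so the text became chr(0) + "a" + chr(0). Fix: c"\\0a\\00"
  • c"%f\00" — Python read \00 as chr(0). Fix: c"%f\\00"
  • _emit_string_constants escaped a newline to \\n, which reaches the .ll file as two printable characters (backslash and n) rather than a line feed. Fix: escape to \\0a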
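
The mangling is easy to reproduce in isolation. The variable names and the escape_newlines helper are illustrative only; the raw-string form is shown as an equivalent alternative to doubling the backslashes, not as what the project uses.

```python
# Plain literal: Python interprets \0 and \00 as octal escapes for NUL.
mangled = 'c"\0a\00"'

# Doubled backslashes (the fix recorded above) keep the LLVM text intact.
escaped = 'c"\\0a\\00"'

# A raw string yields the same characters without doubling.
raw = r'c"\0a\00"'


def escape_newlines(text):
    """Turn real newline characters into the LLVM hex escape \\0a."""
    return text.replace("\n", "\\0a")
```

Here mangled is six characters long and contains two NUL characters, while escaped and raw are the same nine-character string.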

Basic Block Terminators

LLVM IR requires every basic block to end with a terminator (br, ret) before the next label. Two structural bugs fixed:

  • emit_jump_if_false was pre-adding a br label %else to the then-block — wrong label, wrong position. Removed. The then-block's terminator comes from the explicit Jump op that follows in the IR.
  • emit_label had no awareness of whether the preceding block was terminated. A block_terminated boolean was added. emit_label checks the flag and emits an implicit fallthrough br only when the prior block has no terminator.
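
The flag logic can be sketched as below. BlockEmitter and emit_ret are hypothetical names, and the real emit_label may track more state; the point is only that a fallthrough br is added when, and only when, the previous block is still open.

```python
class BlockEmitter:
    """Hypothetical emitter that tracks whether the current block is closed."""

    def __init__(self):
        self.lines = []
        # The entry block is open and has no terminator yet.
        self.block_terminated = False

    def emit_label(self, name):
        # Close the previous block with an implicit fallthrough if needed.
        if not self.block_terminated:
            self.lines.append(f"  br label %{name}")
        self.lines.append(f"{name}:")
        self.block_terminated = False  # the new block is open

    def emit_jump(self, target):
        self.lines.append(f"  br label %{target}")
        self.block_terminated = True

    def emit_ret(self):
        self.lines.append("  ret i32 0")
        self.block_terminated = True
```

With this shape, a Jump followed directly by a label produces a single br, while a label that follows ordinary instructions gets the fallthrough branch inserted for it.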

Parser Bug — with reason:

with and reason were not in KEYWORDS in lexer.py. The relationship handler consumed them as identifiers before the ethics parser could reach them, producing stray output: linked: with -> reason -> :.

Fix: added with and reason to KEYWORDS.
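
The effect of the fix can be illustrated with a toy classifier. The KEYWORDS contents other than with and reason, and the classify function itself, are hypothetical and do not reflect the actual lexer.py.

```python
# Hypothetical subset of the keyword table; only "with" and "reason"
# are confirmed by this log as the entries that were added.
KEYWORDS = {"when", "then", "with", "reason"}


def classify(word, keywords=KEYWORDS):
    """Tag a word as KEYWORD or IDENT, as a lexer would before parsing."""
    return ("KEYWORD", word) if word in keywords else ("IDENT", word)
```

When a word is missing from the table it is tagged IDENT, which is how a handler that accepts identifiers could consume it before a later parser saw it.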

SEALED_SIGNATURES Bug

seal.py prints "Paste this into your project", but interpreter.py initialized self.sealed_signatures = {} (an empty dict), so no signature was ever registered and sealed block verification always failed with "has no registered signature".

Fix: SEALED_SIGNATURES added as a module-level constant in interpreter.py. __init__ uses it instead of an empty dict.
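
In outline, the fix might look like the sketch below. The dictionary entry, the Interpreter class body, and verify_sealed are placeholders for illustration; only the SEALED_SIGNATURES name, the sealed_signatures attribute, and the error text come from this log.

```python
# Module-level constant: the place to paste the output of seal.py.
# The entry below is a placeholder, not a real signature.
SEALED_SIGNATURES = {
    "example_block": "placeholder-signature",
}


class Interpreter:
    def __init__(self):
        # Copy the module constant instead of starting from an empty dict.
        self.sealed_signatures = dict(SEALED_SIGNATURES)

    def verify_sealed(self, name, signature):
        """Hypothetical check: compare against the registered signature."""
        expected = self.sealed_signatures.get(name)
        if expected is None:
            raise KeyError(f"{name} has no registered signature")
        return expected == signature
```

Copying the constant in __init__ keeps per-instance changes from leaking back into the module-level table.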


Toolchain

  • LLVM 22 — lli, llc, opt via MSYS2 MINGW64
  • Clang 22 — full .ll → native binary pipeline on Windows
  • Note: gcc assembler does not understand Windows SEH directives from LLVM output — use clang, not gcc, for LLVM IR compilation on Windows
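
The final compile step can be scripted with a small wrapper. The function names are hypothetical, and the sketch assumes a clang executable is reachable on PATH (on this setup the full path is /c/msys64/mingw64/bin/clang.exe); the argument list mirrors the command shown at the top of this log.

```python
import shutil
import subprocess


def build_clang_command(ll_path, exe_path, clang="clang"):
    """Return the argv list for compiling a .ll file to a native binary."""
    return [clang, ll_path, "-o", exe_path]


def compile_ll(ll_path, exe_path, clang="clang"):
    """Run clang on the IR file; raises if clang is missing or fails."""
    if shutil.which(clang) is None:
        raise FileNotFoundError(f"{clang} not found on PATH")
    subprocess.run(build_clang_command(ll_path, exe_path, clang), check=True)
```

Passing check=True makes a failed compile raise subprocess.CalledProcessError rather than silently leaving a stale binary in place.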

Design Decisions

  • Functions extracted from main op stream before emit — LLVM IR doesn't allow nested function definitions; _preprocess hoists every FuncDef..Return block out of @main
  • Each loops unrolled at compile time — one concrete copy of the body per known fact subject; no runtime collection needed
  • Facts pre-collected in first pass — _collect_facts runs before _preprocess; loop unroller needs the complete fact set before scanning for EachStart
  • block_terminated flag — tracks whether the current basic block has a terminator; emit_label uses it to add implicit fallthrough branches only where needed
  • Use clang not gcc for LLVM IR — gcc assembler does not understand Windows SEH directives from LLVM output
  • LLVM \0a for newlines — LLVM c-string literals use hex escapes; Python \n must be escaped to \\0a in _emit_string_constants

Project Status
NCI Session 32 complete — profile system live, Stage 1 Terse integration confirmed
Terse Phase 3 complete — LLVM pipeline working, Phase 4 FPGA next

Next Session Goals

  1. emit_infer — make inference work in compiled binary
  2. emit_register_rule — make inference rules work in compiled binary
  3. emit_ethics / emit_sealed — ethics rules in compiled binary
  4. Float support — real floats for NCI resonance scores
  5. pyproject.toml — so NCI can import Terse as a proper package