Session 10 - Phase 3 Complete - LLVM Compiler, Inference, and Ethics Running Natively

Date: April 2026
Status: Complete ✅

Milestone

Phase 3 is complete. Terse programs now compile from source to native binary via LLVM — with inference rules evaluated at compile time and ethics enforcement code baked directly into the binary.

python test_llvm_pipeline.py
/c/msys64/mingw64/bin/clang.exe output.ll -o hello_llvm.exe
./hello_llvm.exe

dog is animal
dog has fur
cat is animal
cat has fur
rabbit is animal
rabbit has fur
intent is harm
dog is mammal
cat is mammal
rabbit is mammal
size is big
[sealed:ethics_core] verified
ETHICS DENY [no_harm]: Law II violation

Every feature in the pipeline is now working natively: facts, inference, conditionals, functions, each-loops, sealed blocks, and ethics.

What We Built

emit_infer and emit_register_rule: Inference Running Natively ✅

emit_infer evaluates inference rules at compile time and bakes the results into the binary as unconditional printf calls. There is no runtime rule engine — the compiler does the work.

How it works:

_collect_facts was extended to also populate two new data structures:

self.fact_set — a Python set of (subject, property, value) tuples for every StoreFact op
self.rules — a list of every RegisterRule op

emit_register_rule returns "" — the rule is compile-time metadata, no LLVM IR is needed.

emit_infer(op) loops over self.rules and for each rule checks whether (op.subject, rule.condition_property, rule.condition_value) is in self.fact_set. If it matches, it emits a printf for the inferred fact using the same intern_string + getelementptr + printf pattern as emit_store_fact.
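The matching step can be sketched in a few lines of Python. This is only an illustration under assumptions: the Rule class is a hypothetical stand-in for a RegisterRule op, and its result_property and result_value fields are invented names for the "then" half of a rule. The real handler would go on to intern each returned string and emit the getelementptr + printf pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical stand-in for a RegisterRule op.
    condition_property: str   # e.g. "has"
    condition_value: str      # e.g. "fur"
    result_property: str      # assumed field name, e.g. "is"
    result_value: str         # assumed field name, e.g. "mammal"


def infer_at_compile_time(subject, fact_set, rules):
    """Return the fact strings that would be baked in for `subject`."""
    inferred = []
    for rule in rules:
        # A rule fires only if its condition is already a known fact.
        if (subject, rule.condition_property, rule.condition_value) in fact_set:
            inferred.append(f"{subject} {rule.result_property} {rule.result_value}")
    return inferred


fact_set = {("dog", "has", "fur"), ("intent", "is", "harm")}
rules = [Rule("has", "fur", "is", "mammal")]

print(infer_at_compile_time("dog", fact_set, rules))     # ['dog is mammal']
print(infer_at_compile_time("intent", fact_set, rules))  # []
```

Because the lookup is a set membership test, the cost per infer op is proportional to the number of rules, and nothing about the rules survives into the binary.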

Terse source:

know dog has fur
when has fur then is mammal
infer dog

IR generated:

STORE_FACT      dog has fur
REGISTER_RULE   rule_has_fur when has fur then is mammal
INFER           dog

LLVM IR emitted by emit_infer:

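%r12 = getelementptr [15 x i8], [15 x i8]* @str_12, i64 0, i64 0
call i32 @printf(i8* %r12)

Here @str_12 is "dog is mammal\0a\00", the inferred fact string baked in as a constant: 13 characters plus the newline and the NUL terminator, 15 bytes in total, which is the length the array type must declare.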
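The array length in the getelementptr type has to equal the byte length of the constant, counting the trailing newline and NUL. A minimal sketch of how an interning helper could compute it follows; llvm_string_constant and escape_bytes are hypothetical names, and the `private unnamed_addr constant` linkage is an assumption rather than what intern_string is known to emit.

```python
def escape_bytes(data: bytes) -> str:
    """Render bytes as the body of an LLVM c"..." string literal."""
    out = []
    for b in data:
        ch = chr(b)
        if 32 <= b < 127 and ch not in '"\\':
            out.append(ch)             # printable ASCII is kept as-is
        else:
            out.append(f"\\{b:02X}")   # everything else becomes a \XX hex escape
    return "".join(out)


def llvm_string_constant(name: str, text: str) -> tuple[str, int]:
    """Return (global definition, byte length) for text + newline + NUL."""
    data = text.encode("utf-8") + b"\n\x00"
    ir = (f"@{name} = private unnamed_addr constant "
          f'[{len(data)} x i8] c"{escape_bytes(data)}"')
    return ir, len(data)


ir, length = llvm_string_constant("str_0", "dog has fur")
print(ir)      # @str_0 = private unnamed_addr constant [13 x i8] c"dog has fur\0A\00"
print(length)  # 13
```

Deriving the length from the encoded bytes, rather than from the source text, keeps the declared array type and the literal in step even when the text contains non-ASCII characters.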

emit_register_ethics, emit_sealed_begin, emit_sealed_end: Ethics in Binary ✅

Ethics rules are now enforced in the native binary. Each rule's condition is checked against the known facts at compile time, and when it holds, the deny message is emitted as a printf call at sealed block entry.

How it works:

_collect_facts also collects RegisterEthics ops into self.ethics_rules.

emit_register_ethics returns "" — compile-time metadata only.

A new helper method emit_ethics_check(subject, key, value) checks whether (subject, key, value) is in self.fact_set. If it is, the helper emits a printf of the deny message for every ethics rule with that condition.

emit_sealed_begin(op) does two things:

Emits a printf announcing the sealed block is verified: [sealed:ethics_core] verified
Loops over self.ethics_rules and calls emit_ethics_check(rule.condition_key, "is", rule.condition_value) for each rule, so the rule's condition key is passed as the subject and "is" as the key

emit_sealed_end returns "".
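The control flow at sealed block entry can be sketched as below. This is a simplified model, not the real handlers: EthicsRule is a hypothetical stand-in for a RegisterEthics op (the name and reason fields are assumed), the functions return message strings instead of emitting IR, and the duplicate check is an assumption of this sketch for the case where two rules share a condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthicsRule:
    # Hypothetical stand-in for a RegisterEthics op.
    name: str              # assumed field, e.g. "no_harm"
    condition_key: str     # e.g. "intent"
    condition_value: str   # e.g. "harm"
    reason: str            # assumed field, e.g. "Law II violation"


def ethics_check(subject, key, value, fact_set, ethics_rules):
    """Return deny messages for every rule whose condition is a known fact."""
    if (subject, key, value) not in fact_set:
        return []
    return [
        f"ETHICS DENY [{rule.name}]: {rule.reason}"
        for rule in ethics_rules
        if (rule.condition_key, rule.condition_value) == (subject, value)
    ]


def sealed_begin_messages(block_name, fact_set, ethics_rules):
    """Return the lines printed on entry to a sealed block, in order."""
    lines = [f"[sealed:{block_name}] verified"]
    for rule in ethics_rules:
        for msg in ethics_check(rule.condition_key, "is", rule.condition_value,
                                fact_set, ethics_rules):
            if msg not in lines:  # sketch-only: skip duplicates
                lines.append(msg)
    return lines


rules = [EthicsRule("no_harm", "intent", "harm", "Law II violation")]
print(sealed_begin_messages("ethics_core", {("intent", "is", "harm")}, rules))
# ['[sealed:ethics_core] verified', 'ETHICS DENY [no_harm]: Law II violation']
print(sealed_begin_messages("ethics_core", set(), rules))
# ['[sealed:ethics_core] verified']
```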

Terse source:

know intent is harm

sealed ethics_core
  ethics rule no_harm
    when intent is harm
    then deny with reason: "Law II violation"

LLVM IR emitted by emit_sealed_begin:

; [sealed:ethics_core] verified
%r20 = getelementptr [31 x i8], [31 x i8]* @str_20, i64 0, i64 0
call i32 @printf(i8* %r20)

; ETHICS DENY [no_harm]: Law II violation
%r21 = getelementptr [41 x i8], [41 x i8]* @str_21, i64 0, i64 0
call i32 @printf(i8* %r21)

Parser Fix: sealed blocks inside each bodies ✅

parse_each had 'sealed' and 'know' missing from its body break-condition keyword list. Without them, the each body parser would encounter a sealed keyword, call parse_statement(), and consume the entire sealed block as part of the loop body. The sealed block would then be unrolled once per fact subject.

Fix: added 'sealed' and 'know' to the break list in parse_each. One line, semantically correct — both are top-level constructs that should never appear inside an each body without explicit indentation.
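The effect of the break list is easy to reproduce with a toy body parser. Everything here is hypothetical: statements are modelled as whole lines, the each syntax shown is invented for the example, and the real break list in parse_each contains other keywords as well.

```python
def parse_each_body(statements, start, break_keywords):
    """Collect body statements from `start` until a break keyword begins a line."""
    body = []
    i = start
    while i < len(statements):
        words = statements[i].split()
        if words and words[0] in break_keywords:
            break                      # top-level construct: the body ends here
        body.append(statements[i])
        i += 1
    return body, i                     # i is the index of the first unconsumed line


program = [
    "each animal",          # invented syntax, for illustration only
    "infer animal",
    "sealed ethics_core",
    "know size is big",
]

old_breaks = {"each"}                      # before the fix (illustrative)
new_breaks = {"each", "sealed", "know"}    # after the fix (illustrative)

print(parse_each_body(program, 1, old_breaks))
# (['infer animal', 'sealed ethics_core', 'know size is big'], 4)
print(parse_each_body(program, 1, new_breaks))
# (['infer animal'], 2)
```

With the old list the sealed and know lines are swallowed into the loop body, which is exactly what would then be unrolled once per subject.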

Architecture: _collect_facts Extended

_collect_facts now builds four data structures in a single pass over all ops before any IR emission:

self.facts     = {}    # {collection: [subjects]}  — for each-loop unrolling
self.fact_set  = set() # {(subject, property, value)} — for inference and ethics checks
self.rules     = []    # [RegisterRule]  — for emit_infer
self.ethics_rules = [] # [RegisterEthics] — for emit_sealed_begin

All four are populated before _preprocess runs, which means the loop unroller, inference engine, and ethics engine all have the complete picture at emit time.
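A single-pass collector along these lines would produce all four structures. The op classes below are minimal hypothetical stand-ins for the real IR ops, and the rule that an "X is C" fact places X in collection C is an assumption about how self.facts is keyed.

```python
from dataclasses import dataclass


@dataclass
class StoreFact:            # hypothetical stand-in for the StoreFact op
    subject: str
    property: str
    value: str


@dataclass
class RegisterRule:         # hypothetical stand-in for the RegisterRule op
    condition_property: str
    condition_value: str


@dataclass
class RegisterEthics:       # hypothetical stand-in for the RegisterEthics op
    condition_key: str
    condition_value: str


def collect_facts(ops):
    """Walk the ops once and return (facts, fact_set, rules, ethics_rules)."""
    facts, fact_set, rules, ethics_rules = {}, set(), [], []
    for op in ops:
        if isinstance(op, StoreFact):
            fact_set.add((op.subject, op.property, op.value))
            if op.property == "is":   # assumption: "is" facts define collections
                facts.setdefault(op.value, []).append(op.subject)
        elif isinstance(op, RegisterRule):
            rules.append(op)
        elif isinstance(op, RegisterEthics):
            ethics_rules.append(op)
    return facts, fact_set, rules, ethics_rules


ops = [
    StoreFact("dog", "is", "animal"),
    StoreFact("dog", "has", "fur"),
    StoreFact("cat", "is", "animal"),
    RegisterRule("has", "fur"),
    RegisterEthics("intent", "harm"),
]
facts, fact_set, rules, ethics_rules = collect_facts(ops)
print(facts)  # {'animal': ['dog', 'cat']}
```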

generate() Pre-Processing Pipeline: Complete

ir_program.ops
    │
    ▼
_collect_facts()
    ├── facts        ← {collection: [subjects]} for each-loop unrolling
    ├── fact_set     ← {(subject, property, value)} for inference and ethics
    ├── rules        ← [RegisterRule] for emit_infer
    └── ethics_rules ← [RegisterEthics] for emit_sealed_begin
    │
    ▼
_preprocess()
    ├── pass 1: extract FuncDef..Return → func_blocks (hoisted out of @main)
    └── pass 2: expand EachStart..EachEnd → concrete ops per subject
    │
    ▼
_emit_func_section()   ← emit define ... for each func_block
    │
    ▼
emit_op() loop         ← emit @main body
    ├── emit_infer       → checks fact_set against rules, prints inferred facts
    ├── emit_sealed_begin → prints verified, checks fact_set against ethics_rules
    └── ... all other handlers
    │
    ▼
_emit_string_constants()  ← collect all interned strings after all emit calls
    │
    ▼
assemble: PREAMBLE + strings + functions + @main
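Pass 2 of _preprocess can be pictured as a plain list rewrite. The sketch below models ops as tuples, which is an assumption made for brevity (the real ops are objects), and it handles neither nested each blocks nor a missing each_end.

```python
def unroll_each(ops, facts):
    """Replace each each_start..each_end span with one body copy per subject."""
    out = []
    i = 0
    while i < len(ops):
        op = ops[i]
        if op[0] == "each_start":
            _, collection, var = op
            end = i + 1
            while ops[end][0] != "each_end":   # find the matching end marker
                end += 1
            body = ops[i + 1:end]
            for subject in facts.get(collection, []):
                for body_op in body:
                    # Substitute the loop variable with the concrete subject.
                    out.append(tuple(subject if part == var else part
                                     for part in body_op))
            i = end + 1
        else:
            out.append(op)
            i += 1
    return out


ops = [
    ("store_fact", "dog", "is", "animal"),
    ("each_start", "animal", "x"),
    ("infer", "x"),
    ("each_end",),
    ("infer", "cat"),
]
print(unroll_each(ops, {"animal": ["dog", "cat"]}))
# [('store_fact', 'dog', 'is', 'animal'), ('infer', 'dog'), ('infer', 'cat'), ('infer', 'cat')]
```

After this pass the emit loop never sees an each op with work left to do, which is why the loop handlers can stay trivial.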

Design Decisions Made This Session

Functions, rules, and ethics are compile-time metadata — emit_register_rule, emit_register_ethics, emit_func_def all return "" and pre-processing collects them for use during emit
Inference is compile-time — _collect_facts builds fact_set and rules; emit_infer checks conditions statically and bakes printf calls directly into the binary; no runtime rule engine
Ethics checks run at sealed block entry — emit_sealed_begin iterates self.ethics_rules and calls emit_ethics_check for each rule's condition against fact_set
emit_ethics_check(subject, key, value) — checks a single (subject, "is", value) triple against fact_set and emits deny messages for all matching ethics rules; keeps emit_sealed_begin clean
sealed and know added to parse_each break condition — sealed blocks and top-level fact declarations are not valid each-body content; adding them to the break list prevents silent greedy parsing bugs

Full Phase 3 Summary

| Handler | Description | Status |
| --- | --- | --- |
| `emit_store_var` | alloca + store i64 | ✅ |
| `emit_math_op` | load, add/sub/mul/sdiv, store | ✅ |
| `emit_compare` | load, icmp | ✅ |
| `emit_label` | basic block label with implicit fallthrough | ✅ |
| `emit_jump` | unconditional `br label` | ✅ |
| `emit_jump_if_false` | conditional `br i1`, opens then-block | ✅ |
| `emit_jump_if_true` | conditional `br i1`, opens else-block | ✅ |
| `emit_store_fact` | intern string constant, `printf` call | ✅ |
| `emit_func_def` | hoisted to real LLVM function definition | ✅ |
| `emit_func_call` | GEP arg string, call function | ✅ |
| `emit_return` | `ret i8* %param` | ✅ |
| `emit_each_start` / `emit_each_end` | compile-time loop unrolling | ✅ |
| `emit_infer` | compile-time inference, bakes `printf` per matched rule | ✅ |
| `emit_register_rule` | compile-time metadata only, returns `""` | ✅ |
| `emit_register_ethics` | compile-time metadata only, returns `""` | ✅ |
| `emit_sealed_begin` | prints verified, runs ethics checks | ✅ |
| `emit_sealed_end` | no runtime IR needed, returns `""` | ✅ |

Toolchain

LLVM 22 — lli, llc, opt via MSYS2 MINGW64
Clang 22 — full .ll → native binary pipeline on Windows
Note: gcc assembler does not understand Windows SEH directives from LLVM output — use clang for .ll compilation on Windows

Next Session Goals

Float/numeric support — real floats for NCI resonance scores
pyproject.toml — so NCI can import Terse as a proper package
YouTube demo planning — record first public Terse demo
Gumroad product planning — Terse course concept

| Project | Status |
| --- | --- |
| NCI | Session 31+: Stage 1 Terse integration live on Oracle |
| Terse | Phase 3 complete: LLVM pipeline working, Phase 4 FPGA next |