Session 10: Phase 3 Complete (LLVM Compiler, Inference, and Ethics Running Natively)
Date: April 2026
Status: Complete ✅
Milestone
Phase 3 is complete. Terse programs now compile from source to native binary via LLVM — with inference rules evaluated at compile time and ethics enforcement code baked directly into the binary.
python test_llvm_pipeline.py
/c/msys64/mingw64/bin/clang.exe output.ll -o hello_llvm.exe
./hello_llvm.exe
dog is animal
dog has fur
cat is animal
cat has fur
rabbit is animal
rabbit has fur
intent is harm
dog is mammal
cat is mammal
rabbit is mammal
size is big
[sealed:ethics_core] verified
ETHICS DENY [no_harm]: Law II violation
Every feature in the pipeline is now working natively: facts, inference, conditionals, functions, each-loops, sealed blocks, and ethics.
What We Built
emit_infer and emit_register_rule: Inference Running Natively ✅
emit_infer evaluates inference rules at compile time and bakes the results into the binary as unconditional printf calls. There is no runtime rule engine — the compiler does the work.
How it works:
_collect_facts was extended to also populate two new data structures:
- self.fact_set: a Python set of (subject, property, value) tuples for every StoreFact op
- self.rules: a list of every RegisterRule op
emit_register_rule returns "" — the rule is compile-time metadata, no LLVM IR is needed.
emit_infer(op) loops over self.rules and for each rule checks whether (op.subject, rule.condition_property, rule.condition_value) is in self.fact_set. If it matches, it emits a printf for the inferred fact using the same intern_string + getelementptr + printf pattern as emit_store_fact.
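The matching step described above can be sketched in plain Python. This is a minimal illustration, not the compiler's actual code: the Rule class stands in for the RegisterRule op, and its result_property and result_value fields are assumed names, since the text only names condition_property and condition_value. The function returns the inferred fact strings instead of emitting IR.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the RegisterRule op. Only condition_property and
# condition_value are named in the text; the result_* fields are assumptions.
@dataclass(frozen=True)
class Rule:
    condition_property: str  # e.g. "is"
    condition_value: str     # e.g. "animal"
    result_property: str     # e.g. "is"
    result_value: str        # e.g. "mammal"


def infer(subject, fact_set, rules):
    """Return the fact strings inferred for one subject, decided statically."""
    inferred = []
    for rule in rules:
        key = (subject, rule.condition_property, rule.condition_value)
        if key in fact_set:
            inferred.append(f"{subject} {rule.result_property} {rule.result_value}")
    return inferred


fact_set = {("dog", "is", "animal"), ("dog", "has", "fur"), ("size", "is", "big")}
rules = [Rule("is", "animal", "is", "mammal")]

print(infer("dog", fact_set, rules))   # ['dog is mammal']
print(infer("size", fact_set, rules))  # []
```

Because the membership test runs while the compiler is generating code, a subject with no matching rule contributes nothing to the binary at all.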
Terse source:
IR generated:
LLVM IR emitted by emit_infer:
Where @str_12 is "dog is mammal\0a\00" — the inferred fact string, baked as a constant.
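As a rough sketch of how such a constant could be produced, the snippet below appends the newline and NUL terminator, hex-escapes non-printable bytes, and computes the array length that the getelementptr needs. The helper names encode_c_string and emit_printf are hypothetical, and the "private constant" definition line is an assumption; only the getelementptr and printf lines follow the shape shown in this document.

```python
def encode_c_string(text):
    """Return (escaped_body, byte_length) for an LLVM c"..." string constant.

    A newline and a NUL terminator are appended, matching the
    "dog is mammal\\0a\\00" form described above.
    """
    raw = (text + "\n").encode("utf-8") + b"\x00"
    parts = []
    for byte in raw:
        # Keep printable ASCII except the double quote and backslash;
        # everything else becomes a two-digit hex escape.
        if 0x20 <= byte < 0x7F and byte not in (0x22, 0x5C):
            parts.append(chr(byte))
        else:
            parts.append(f"\\{byte:02x}")
    return "".join(parts), len(raw)


def emit_printf(index, text):
    """Hypothetical helper: constant definition plus the GEP + printf pair."""
    body, size = encode_c_string(text)
    constant = f'@str_{index} = private constant [{size} x i8] c"{body}"'
    call = [
        f"%r{index} = getelementptr [{size} x i8], [{size} x i8]* @str_{index}, i64 0, i64 0",
        f"call i32 @printf(i8* %r{index})",
    ]
    return constant, call


body, size = encode_c_string("dog is mammal")
print(body, size)  # dog is mammal\0a\00 15
```

The array length counts bytes after escaping is undone, so the newline and the terminator each add one. Since the string is passed as printf's format argument, a fact containing a literal % would need it doubled.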
emit_register_ethics, emit_sealed_begin, emit_sealed_end: Ethics in Binary ✅
Ethics rules now compile to native machine code. The deny message is emitted as a printf call at sealed block entry, conditioned on a compile-time fact check.
How it works:
_collect_facts also collects RegisterEthics ops into self.ethics_rules.
emit_register_ethics returns "" — compile-time metadata only.
A new helper method emit_ethics_check(subject, key, value) checks whether (subject, key, value) is in self.fact_set. If it matches, it finds the corresponding ethics rule and emits a printf for the deny message.
emit_sealed_begin(op) does two things:
- Emits a printf announcing the sealed block is verified: [sealed:ethics_core] verified
- Loops over self.ethics_rules and calls emit_ethics_check(rule.condition_key, "is", rule.condition_value) for each rule
emit_sealed_end returns "".
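The sealed-block flow above can be approximated as follows. This is a simplified sketch under stated assumptions: EthicsRule stands in for the RegisterEthics op (its name and reason fields are assumed), ethics_check only answers whether the triple is a known fact, and the functions return the lines that would be printed rather than emitting IR.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the RegisterEthics op. Only condition_key and
# condition_value appear in the text; name and reason are assumed fields.
@dataclass(frozen=True)
class EthicsRule:
    name: str
    condition_key: str
    condition_value: str
    reason: str


def ethics_check(subject, key, value, fact_set):
    """Return True when the (subject, key, value) triple is a known fact."""
    return (subject, key, value) in fact_set


def sealed_begin(block_name, fact_set, ethics_rules):
    """Return the lines a sealed block would print, in order."""
    lines = [f"[sealed:{block_name}] verified"]
    for rule in ethics_rules:
        if ethics_check(rule.condition_key, "is", rule.condition_value, fact_set):
            lines.append(f"ETHICS DENY [{rule.name}]: {rule.reason}")
    return lines


fact_set = {("intent", "is", "harm")}
ethics_rules = [EthicsRule("no_harm", "intent", "harm", "Law II violation")]

for line in sealed_begin("ethics_core", fact_set, ethics_rules):
    print(line)
# [sealed:ethics_core] verified
# ETHICS DENY [no_harm]: Law II violation
```

With an empty fact_set the same call returns only the verified line, which corresponds to a binary that contains no deny printf.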
Terse source:
know intent is harm
sealed ethics_core
ethics rule no_harm
when intent is harm
then deny with reason: "Law II violation"
LLVM IR emitted by emit_sealed_begin:
; [sealed:ethics_core] verified
%r20 = getelementptr [27 x i8], [27 x i8]* @str_20, i64 0, i64 0
call i32 @printf(i8* %r20)
; ETHICS DENY [no_harm]: Law II violation
%r21 = getelementptr [38 x i8], [38 x i8]* @str_21, i64 0, i64 0
call i32 @printf(i8* %r21)
Parser Fix: sealed blocks inside each bodies ✅
parse_each was missing 'sealed' and 'know' in the keyword list that ends an each body. Without them, the each body parser would reach a sealed keyword, call parse_statement(), and consume the entire sealed block as part of the loop body. The sealed block would then be unrolled once per fact subject.
Fix: added 'sealed' and 'know' to the break list in parse_each. It is a one-line change and semantically correct: both are top-level constructs that should never appear inside an each body without explicit indentation.
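A toy model of the break list makes the greedy-parsing bug easier to see. Everything here is illustrative: the real parser works on tokens rather than lines, the "show item" statement is a made-up body statement, and the other entries in the break set are assumptions. Only 'sealed' and 'know' come from the fix described above.

```python
# Keywords that end an each body. 'sealed' and 'know' are the two added by
# the fix; "each" and "end" are assumed entries for illustration.
BODY_BREAK_KEYWORDS = {"each", "end", "sealed", "know"}


def parse_each_body(lines, start, break_keywords):
    """Collect body lines from `start` until a break keyword or end of input.

    Returns (body_lines, index_of_first_unconsumed_line).
    """
    body = []
    i = start
    while i < len(lines):
        words = lines[i].split()
        if words and words[0] in break_keywords:
            break
        body.append(lines[i])
        i += 1
    return body, i


source = [
    "show item",            # hypothetical each-body statement
    "sealed ethics_core",
    "ethics rule no_harm",
]

fixed_body, fixed_next = parse_each_body(source, 0, BODY_BREAK_KEYWORDS)
buggy_body, buggy_next = parse_each_body(
    source, 0, BODY_BREAK_KEYWORDS - {"sealed", "know"}
)

print(fixed_body)  # ['show item']
print(buggy_body)  # ['show item', 'sealed ethics_core', 'ethics rule no_harm']
```

With the two keywords missing, the sealed block ends up inside the loop body, so the unroller would repeat it once per subject.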
Architecture: _collect_facts Extended
_collect_facts now builds four data structures in a single pass over all ops before any IR emission:
self.facts = {} # {collection: [subjects]} — for each-loop unrolling
self.fact_set = set() # {(subject, property, value)} — for inference and ethics checks
self.rules = [] # [RegisterRule] — for emit_infer
self.ethics_rules = [] # [RegisterEthics] — for emit_sealed_begin
All four are populated before _preprocess runs, which means the loop unroller, inference engine, and ethics engine all have the complete picture at emit time.
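A single-pass collector along these lines might look like the sketch below. The op classes are hypothetical stand-ins with assumed fields, and the rule that "X is C" places X in collection C is an assumption about how the facts dictionary is keyed.

```python
from dataclasses import dataclass


# Hypothetical op shapes; the real IR classes may carry different fields.
@dataclass(frozen=True)
class StoreFact:
    subject: str
    property: str
    value: str


@dataclass(frozen=True)
class RegisterRule:
    condition_property: str
    condition_value: str


@dataclass(frozen=True)
class RegisterEthics:
    condition_key: str
    condition_value: str


def collect_facts(ops):
    """Build all four structures in one pass, before any IR is emitted."""
    facts, fact_set, rules, ethics_rules = {}, set(), [], []
    for op in ops:
        if isinstance(op, StoreFact):
            fact_set.add((op.subject, op.property, op.value))
            # Assumption: "X is C" places subject X in collection C.
            if op.property == "is":
                facts.setdefault(op.value, []).append(op.subject)
        elif isinstance(op, RegisterRule):
            rules.append(op)
        elif isinstance(op, RegisterEthics):
            ethics_rules.append(op)
    return facts, fact_set, rules, ethics_rules


ops = [
    StoreFact("dog", "is", "animal"),
    StoreFact("dog", "has", "fur"),
    StoreFact("cat", "is", "animal"),
    RegisterRule("is", "animal"),
    RegisterEthics("intent", "harm"),
]
facts, fact_set, rules, ethics_rules = collect_facts(ops)
print(facts)  # {'animal': ['dog', 'cat']}
```

Collecting everything up front is what lets a rule or ethics check see facts that appear later in the source than the op that triggers it.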
generate() Pre-Processing Pipeline: Complete
ir_program.ops
│
▼
_collect_facts()
├── facts ← {collection: [subjects]} for each-loop unrolling
├── fact_set ← {(subject, property, value)} for inference and ethics
├── rules ← [RegisterRule] for emit_infer
└── ethics_rules ← [RegisterEthics] for emit_sealed_begin
│
▼
_preprocess()
├── pass 1: extract FuncDef..Return → func_blocks (hoisted out of @main)
└── pass 2: expand EachStart..EachEnd → concrete ops per subject
│
▼
_emit_func_section() ← emit define ... for each func_block
│
▼
emit_op() loop ← emit @main body
├── emit_infer → checks fact_set against rules, prints inferred facts
├── emit_sealed_begin → prints verified, checks fact_set against ethics_rules
└── ... all other handlers
│
▼
_emit_string_constants() ← collect all interned strings after all emit calls
│
▼
assemble: PREAMBLE + strings + functions + @main
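Pass 2 of _preprocess, the each-loop expansion, can be illustrated with a small unroller. The tuple-based op encoding is hypothetical and chosen only to keep the sketch short; it assumes well-formed input with a matching each_end and does not handle nested loops.

```python
def unroll_each(ops, facts):
    """Expand each_start..each_end spans into one body copy per subject.

    Ops are hypothetical tuples: ("each_start", collection, var),
    ("each_end",), and ("store_fact", subject, property, value).
    """
    out = []
    i = 0
    while i < len(ops):
        op = ops[i]
        if op[0] != "each_start":
            out.append(op)
            i += 1
            continue
        _, collection, var = op
        # Find the matching end marker (input is assumed well-formed).
        end = i + 1
        while ops[end][0] != "each_end":
            end += 1
        body = ops[i + 1:end]
        for subject in facts.get(collection, []):
            for body_op in body:
                # Substitute the loop variable in the operands, not the op name.
                operands = tuple(subject if f == var else f for f in body_op[1:])
                out.append((body_op[0],) + operands)
        i = end + 1
    return out


facts = {"animal": ["dog", "cat"]}
ops = [
    ("each_start", "animal", "x"),
    ("store_fact", "x", "has", "fur"),
    ("each_end",),
    ("store_fact", "size", "is", "big"),
]
for op in unroll_each(ops, facts):
    print(op)
# ('store_fact', 'dog', 'has', 'fur')
# ('store_fact', 'cat', 'has', 'fur')
# ('store_fact', 'size', 'is', 'big')
```

After this pass the emit loop sees only concrete ops, so no loop construct needs a handler that produces IR.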
Design Decisions Made This Session
- Functions, rules, and ethics are compile-time metadata: emit_register_rule, emit_register_ethics, emit_func_def all return "" and pre-processing collects them for use during emit
- Inference is compile-time: _collect_facts builds fact_set and rules; emit_infer checks conditions statically and bakes printf calls directly into the binary; no runtime rule engine
- Ethics checks run at sealed block entry: emit_sealed_begin iterates self.ethics_rules and calls emit_ethics_check for each rule's condition against fact_set
- emit_ethics_check(subject, key, value): checks a single (subject, "is", value) triple against fact_set and emits deny messages for all matching ethics rules; keeps emit_sealed_begin clean
- sealed and know added to parse_each break condition: sealed blocks and top-level fact declarations are not valid each-body content; adding them to the break list prevents silent greedy parsing bugs
Full Phase 3 Summary
| Handler | Description | Status |
|---|---|---|
| emit_store_var | alloca + store i64 | ✅ |
| emit_math_op | load, add/sub/mul/sdiv, store | ✅ |
| emit_compare | load, icmp | ✅ |
| emit_label | basic block label with implicit fallthrough | ✅ |
| emit_jump | unconditional br label | ✅ |
| emit_jump_if_false | conditional br i1, opens then-block | ✅ |
| emit_jump_if_true | conditional br i1, opens else-block | ✅ |
| emit_store_fact | intern string constant, printf call | ✅ |
| emit_func_def | hoisted to real LLVM function definition | ✅ |
| emit_func_call | GEP arg string, call function | ✅ |
| emit_return | ret i8* %param | ✅ |
| emit_each_start / emit_each_end | compile-time loop unrolling | ✅ |
| emit_infer | compile-time inference, bakes printf per matched rule | ✅ |
| emit_register_rule | compile-time metadata only, returns "" | ✅ |
| emit_register_ethics | compile-time metadata only, returns "" | ✅ |
| emit_sealed_begin | prints verified, runs ethics checks | ✅ |
| emit_sealed_end | no runtime IR needed, returns "" | ✅ |
Toolchain
- LLVM 22: lli, llc, opt via MSYS2 MINGW64
- Clang 22: full .ll → native binary pipeline on Windows
- Note: the gcc assembler does not understand Windows SEH directives from LLVM output; use clang for .ll compilation on Windows
Next Session Goals
- Float/numeric support: real floats for NCI resonance scores
- pyproject.toml: so NCI can import Terse as a proper package
- YouTube demo planning: record first public Terse demo
- Gumroad product planning: Terse course concept
Related Projects
| Project | Status |
|---|---|
| NCI | Session 31+ — Stage 1 Terse integration live on Oracle |
| Terse | Phase 3 complete — LLVM pipeline working, Phase 4 FPGA next |