
Session 10 — Phase 3 Complete — LLVM Compiler, Inference, and Ethics Running Natively

Date: April 2026
Status: Complete ✅


Milestone

Phase 3 is complete. Terse programs now compile from source to native binary via LLVM — with inference rules evaluated at compile time and ethics enforcement code baked directly into the binary.

$ python test_llvm_pipeline.py
$ /c/msys64/mingw64/bin/clang.exe output.ll -o hello_llvm.exe
$ ./hello_llvm.exe
dog is animal
dog has fur
cat is animal
cat has fur
rabbit is animal
rabbit has fur
intent is harm
dog is mammal
cat is mammal
rabbit is mammal
size is big
[sealed:ethics_core] verified
ETHICS DENY [no_harm]: Law II violation

Every feature in the pipeline is now working natively: facts, inference, conditionals, functions, each-loops, sealed blocks, and ethics.


What We Built

emit_infer and emit_register_rule — Inference Running Natively ✅

emit_infer evaluates inference rules at compile time and bakes the results into the binary as unconditional printf calls. There is no runtime rule engine — the compiler does the work.

How it works:

_collect_facts was extended to also populate two new data structures:

  • self.fact_set — a Python set of (subject, property, value) tuples for every StoreFact op
  • self.rules — a list of every RegisterRule op

emit_register_rule returns "" — the rule is compile-time metadata, no LLVM IR is needed.

emit_infer(op) loops over self.rules and for each rule checks whether (op.subject, rule.condition_property, rule.condition_value) is in self.fact_set. If it matches, it emits a printf for the inferred fact using the same intern_string + getelementptr + printf pattern as emit_store_fact.
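The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not the compiler's actual code: the dataclasses, the result_property and result_value field names, and the function name infer_at_compile_time are all assumptions, and the sketch returns the inferred strings instead of emitting LLVM IR.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the compiler's IR ops. Only condition_property
# and condition_value are named in the text; the other fields are assumptions.
@dataclass(frozen=True)
class RegisterRule:
    name: str
    condition_property: str
    condition_value: str
    result_property: str
    result_value: str


@dataclass(frozen=True)
class Infer:
    subject: str


def infer_at_compile_time(op, rules, fact_set):
    """Return the fact strings that would be baked into the binary as printf calls."""
    inferred = []
    for rule in rules:
        # A rule fires only if the subject already has the condition fact.
        if (op.subject, rule.condition_property, rule.condition_value) in fact_set:
            inferred.append(f"{op.subject} {rule.result_property} {rule.result_value}")
    return inferred


fact_set = {("dog", "has", "fur")}
rules = [RegisterRule("rule_has_fur", "has", "fur", "is", "mammal")]
print(infer_at_compile_time(Infer("dog"), rules, fact_set))  # ['dog is mammal']
```

Because the lookup happens while the compiler runs, a subject with no matching fact simply produces no output code at all.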

Terse source:

know dog has fur
when has fur then is mammal
infer dog

IR generated:

STORE_FACT      dog has fur
REGISTER_RULE   rule_has_fur when has fur then is mammal
INFER           dog

LLVM IR emitted by emit_infer:

%r12 = getelementptr [15 x i8], [15 x i8]* @str_12, i64 0, i64 0
call i32 @printf(i8* %r12)

Where @str_12 is "dog is mammal\0a\00" — the inferred fact string, baked as a constant.
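The array length in the getelementptr has to equal the byte length of the constant, counting the trailing newline and NUL. A small sketch of how such a constant could be built follows; llvm_string_constant is a hypothetical helper, not the project's intern_string, and the "private constant" linkage is an assumption.

```python
def llvm_string_constant(name, text):
    """Build an LLVM global for `text` plus a trailing newline and NUL byte."""
    data = text.encode("utf-8") + b"\n\x00"
    # Keep printable ASCII as-is; escape everything else (and the quote and
    # backslash characters) as a two-digit hex escape, which LLVM accepts.
    body = "".join(
        chr(b) if 32 <= b < 127 and b not in (0x22, 0x5C) else f"\\{b:02X}"
        for b in data
    )
    return len(data), f'@{name} = private constant [{len(data)} x i8] c"{body}"'


length, decl = llvm_string_constant("str_12", "dog is mammal")
print(length)  # 15
print(decl)    # @str_12 = private constant [15 x i8] c"dog is mammal\0A\00"
```

Deriving the length from the encoded bytes rather than from the source text keeps the array type and the data in agreement even for non-ASCII strings.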


emit_register_ethics, emit_sealed_begin, emit_sealed_end — Ethics in Binary ✅

Ethics rules now compile to native machine code. The deny message is emitted as a printf call at sealed block entry, conditioned on a compile-time fact check.

How it works:

_collect_facts also collects RegisterEthics ops into self.ethics_rules.

emit_register_ethics returns "" — compile-time metadata only.

A new helper method emit_ethics_check(subject, key, value) checks whether (subject, key, value) is in self.fact_set. If it matches, it finds the corresponding ethics rule and emits a printf for the deny message.

emit_sealed_begin(op) does two things:

  1. Emits a printf announcing the sealed block is verified: [sealed:ethics_core] verified
  2. Loops over self.ethics_rules and calls emit_ethics_check(rule.condition_key, "is", rule.condition_value) for each rule

emit_sealed_end returns "".
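As a rough sketch of the flow above, the following Python models the sealed-block entry logic and returns the lines that would be printed instead of emitting IR. The RegisterEthics fields name and reason, and the function names ethics_check and sealed_begin, are assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the RegisterEthics op; condition_key and
# condition_value come from the text, name and reason are assumed fields.
@dataclass(frozen=True)
class RegisterEthics:
    name: str
    condition_key: str
    condition_value: str
    reason: str


def ethics_check(subject, key, value, ethics_rules, fact_set):
    """Return deny messages for rules whose condition is a known fact."""
    if (subject, key, value) not in fact_set:
        return []
    return [
        f"ETHICS DENY [{rule.name}]: {rule.reason}"
        for rule in ethics_rules
        if rule.condition_key == subject and rule.condition_value == value
    ]


def sealed_begin(block_name, ethics_rules, fact_set):
    """Return the lines printed at sealed block entry."""
    lines = [f"[sealed:{block_name}] verified"]
    for rule in ethics_rules:
        for message in ethics_check(
            rule.condition_key, "is", rule.condition_value, ethics_rules, fact_set
        ):
            if message not in lines:  # avoid repeats when rules share a condition
                lines.append(message)
    return lines


rule = RegisterEthics("no_harm", "intent", "harm", "Law II violation")
print(sealed_begin("ethics_core", [rule], {("intent", "is", "harm")}))
```

With the fact `intent is harm` present, the sketch yields the verified line followed by the deny line; without it, only the verified line remains.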

Terse source:

know intent is harm

sealed ethics_core
  ethics rule no_harm
    when intent is harm
    then deny with reason: "Law II violation"

LLVM IR emitted by emit_sealed_begin:

; [sealed:ethics_core] verified
%r20 = getelementptr [31 x i8], [31 x i8]* @str_20, i64 0, i64 0
call i32 @printf(i8* %r20)

; ETHICS DENY [no_harm]: Law II violation
%r21 = getelementptr [41 x i8], [41 x i8]* @str_21, i64 0, i64 0
call i32 @printf(i8* %r21)

Parser Fix — sealed blocks inside each bodies ✅

parse_each was missing 'sealed' and 'know' in the keyword list that ends an each body. Without them, when the each-body parser reached a sealed keyword it called parse_statement() and consumed the entire sealed block as part of the loop body. The sealed block was then unrolled once per fact subject.

Fix: added 'sealed' and 'know' to the break list in parse_each. One line, semantically correct — both are top-level constructs that should never appear inside an each body without explicit indentation.
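The effect of the break list can be shown with a toy body parser. Everything here is hypothetical: statements are modelled as lists of words, the pre-fix break list is a guess, and `infer item` is only a placeholder body statement.

```python
def parse_each_body(statements, start, break_keywords):
    """Collect body statements until one starts with a break keyword."""
    body = []
    i = start
    while i < len(statements) and statements[i][0] not in break_keywords:
        body.append(statements[i])
        i += 1
    return body, i


statements = [
    ["infer", "item"],              # intended loop body
    ["sealed", "ethics_core"],      # top-level construct
    ["ethics", "rule", "no_harm"],  # belongs to the sealed block
]

old_breaks = {"each"}                          # hypothetical pre-fix list
new_breaks = old_breaks | {"sealed", "know"}   # the fix described above

body_before, _ = parse_each_body(statements, 0, old_breaks)
body_after, next_index = parse_each_body(statements, 0, new_breaks)
print(len(body_before), len(body_after), next_index)  # 3 1 1
```

Before the fix the body greedily swallows all three statements; after it, the body stops at the sealed keyword and parsing resumes there.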


Architecture — _collect_facts Extended

_collect_facts now builds four data structures in a single pass over all ops before any IR emission:

self.facts     = {}    # {collection: [subjects]}  — for each-loop unrolling
self.fact_set  = set() # {(subject, property, value)} — for inference and ethics checks
self.rules     = []    # [RegisterRule]  — for emit_infer
self.ethics_rules = [] # [RegisterEthics] — for emit_sealed_begin

All four are populated before _preprocess runs, which means the loop unroller, inference engine, and ethics engine all have the complete picture at emit time.
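A single-pass collector along these lines might look like the sketch below. The op classes are simplified stand-ins, and the rule for filling facts (grouping subjects by the value of an "is" fact) is an assumption, since the text does not say how a collection is derived.

```python
from dataclasses import dataclass


# Simplified, hypothetical op types; the real ops carry more fields.
@dataclass(frozen=True)
class StoreFact:
    subject: str
    prop: str
    value: str


@dataclass(frozen=True)
class RegisterRule:
    name: str


@dataclass(frozen=True)
class RegisterEthics:
    name: str


def collect_facts(ops):
    """Build all four structures in one pass over the ops."""
    facts, fact_set, rules, ethics_rules = {}, set(), [], []
    for op in ops:
        if isinstance(op, StoreFact):
            fact_set.add((op.subject, op.prop, op.value))
            if op.prop == "is":
                # Assumption: "dog is animal" puts dog in the "animal" collection.
                facts.setdefault(op.value, []).append(op.subject)
        elif isinstance(op, RegisterRule):
            rules.append(op)
        elif isinstance(op, RegisterEthics):
            ethics_rules.append(op)
    return facts, fact_set, rules, ethics_rules


ops = [
    StoreFact("dog", "is", "animal"),
    StoreFact("dog", "has", "fur"),
    StoreFact("cat", "is", "animal"),
    RegisterRule("rule_has_fur"),
    RegisterEthics("no_harm"),
]
facts, fact_set, rules, ethics_rules = collect_facts(ops)
print(facts)  # {'animal': ['dog', 'cat']}
```

Collecting everything up front means an infer or sealed block can see facts declared later in the source, since emission only starts after the pass finishes.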


generate() Pre-Processing Pipeline — Complete

ir_program.ops
_collect_facts()
    ├── facts        ← {collection: [subjects]} for each-loop unrolling
    ├── fact_set     ← {(subject, property, value)} for inference and ethics
    ├── rules        ← [RegisterRule] for emit_infer
    └── ethics_rules ← [RegisterEthics] for emit_sealed_begin
_preprocess()
    ├── pass 1: extract FuncDef..Return → func_blocks (hoisted out of @main)
    └── pass 2: expand EachStart..EachEnd → concrete ops per subject
_emit_func_section()   ← emit define ... for each func_block
emit_op() loop         ← emit @main body
    ├── emit_infer       → checks fact_set against rules, prints inferred facts
    ├── emit_sealed_begin → prints verified, checks fact_set against ethics_rules
    └── ... all other handlers
_emit_string_constants()  ← collect all interned strings after all emit calls
assemble: PREAMBLE + strings + functions + @main
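Pass 2 of _preprocess, the each-loop expansion, can be illustrated with tuple-based ops. This is a sketch under stated assumptions: the op tuples, the loop variable name, and the function unroll_each are invented for the example, nested loops are not handled, and every EACH_START is assumed to have a matching EACH_END.

```python
def unroll_each(ops, facts):
    """Replace each EACH_START..EACH_END span with one body copy per subject."""
    out = []
    i = 0
    while i < len(ops):
        op = ops[i]
        if op[0] == "EACH_START":
            _, collection, var = op
            end = i + 1
            while ops[end][0] != "EACH_END":  # assumes a matching EACH_END exists
                end += 1
            body = ops[i + 1:end]
            for subject in facts.get(collection, []):
                for body_op in body:
                    # Substitute the loop variable with the concrete subject.
                    out.append(tuple(subject if part == var else part for part in body_op))
            i = end + 1
        else:
            out.append(op)
            i += 1
    return out


ops = [
    ("EACH_START", "animal", "x"),
    ("STORE_FACT", "x", "has", "fur"),
    ("EACH_END",),
    ("INFER", "dog"),
]
for op in unroll_each(ops, {"animal": ["dog", "cat"]}):
    print(op)
```

After unrolling, the emit loop only ever sees concrete ops, so no loop construct needs to exist in the generated IR.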

Design Decisions Made This Session

  • Functions, rules, and ethics are compile-time metadata: emit_register_rule, emit_register_ethics, emit_func_def all return "" and pre-processing collects them for use during emit
  • Inference is compile-time: _collect_facts builds fact_set and rules; emit_infer checks conditions statically and bakes printf calls directly into the binary; no runtime rule engine
  • Ethics checks run at sealed block entry: emit_sealed_begin iterates self.ethics_rules and calls emit_ethics_check for each rule's condition against fact_set
  • emit_ethics_check(subject, key, value): checks a single (subject, "is", value) triple against fact_set and emits deny messages for all matching ethics rules; keeps emit_sealed_begin clean
  • sealed and know added to parse_each break condition: sealed blocks and top-level fact declarations are not valid each-body content; adding them to the break list prevents silent greedy parsing bugs

Full Phase 3 Summary

Handler                          Description
emit_store_var                   alloca + store i64
emit_math_op                     load, add/sub/mul/sdiv, store
emit_compare                     load, icmp
emit_label                       basic block label with implicit fallthrough
emit_jump                        unconditional br label
emit_jump_if_false               conditional br i1, opens then-block
emit_jump_if_true                conditional br i1, opens else-block
emit_store_fact                  intern string constant, printf call
emit_func_def                    hoisted to real LLVM function definition
emit_func_call                   GEP arg string, call function
emit_return                      ret i8* %param
emit_each_start / emit_each_end  compile-time loop unrolling
emit_infer                       compile-time inference, bakes printf per matched rule
emit_register_rule               compile-time metadata only, returns ""
emit_register_ethics             compile-time metadata only, returns ""
emit_sealed_begin                prints verified, runs ethics checks
emit_sealed_end                  no runtime IR needed, returns ""

Toolchain

  • LLVM 22: lli, llc, opt via MSYS2 MINGW64
  • Clang 22 — full .ll → native binary pipeline on Windows
  • Note: gcc assembler does not understand Windows SEH directives from LLVM output — use clang for .ll compilation on Windows

Next Session Goals

  1. Float/numeric support — real floats for NCI resonance scores
  2. pyproject.toml — so NCI can import Terse as a proper package
  3. YouTube demo planning — record first public Terse demo
  4. Gumroad product planning — Terse course concept

Project Status
NCI Session 31+ — Stage 1 Terse integration live on Oracle
Terse Phase 3 complete — LLVM pipeline working, Phase 4 FPGA next