Session 7: Semantic Compression & Sealed Blocks

What Was Built

Two complete phases in one session — Phase 1.5 (semantic compression algorithm) and Phase 1.6 (runtime integrity via sealed blocks). Both are working end to end.

Phase 1.5: Semantic Compression Algorithm

An original four-phase compression algorithm designed specifically for AI knowledge graphs. Unlike general-purpose compressors that work on bytes without understanding meaning, this algorithm understands the structure of Terse knowledge — nodes, facts, relationships, inference rules — and uses that understanding to compress intelligently.

The Core Insight

A fact that can be reconstructed is a fact that does not need to be stored.

General-purpose compressors can't look at "dog is mammal" and say "that's derivable from rules, skip it." They see only bytes. Terse compression reads the meaning and skips facts that the rules already imply.

Design Principle

Compress slow, expand fast. All the intelligence goes into the compressor. The expander just merges two dictionaries and verifies a hash. No inference engine needed at expand time.

The Four Phases

Phase 1 — Structural Deduplication

Scan all facts across all nodes. Build a shared pool of unique values. Replace repeated values with pool index references.

Input:  dog.has = fur, cat.has = fur, rabbit.has = fur
Output: pool[1] = "fur"
        dog.has = [1], cat.has = [1], rabbit.has = [1]

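Savings are proportional to how many nodes share common fact values. On an NCI brain file with hundreds of nodes sharing common properties, this phase alone should give a significant reduction; on the 4-node test graph in the benchmark below, 17 fact values reduce to a pool of 8 unique values.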
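A minimal sketch of the deduplication step, assuming facts are held as nested dictionaries of node, relation, and value list. The function name deduplicate and the data layout are illustrative assumptions, not the layout used in semantic.py.

```python
from typing import Dict, List, Tuple

# Assumed layout: node -> relation -> list of values (hypothetical, for illustration).
Facts = Dict[str, Dict[str, List[str]]]


def deduplicate(facts: Facts) -> Tuple[List[str], Dict[str, Dict[str, List[int]]]]:
    """Replace repeated fact values with references into a shared pool."""
    pool: List[str] = []
    index_of: Dict[str, int] = {}
    encoded: Dict[str, Dict[str, List[int]]] = {}
    for node, relations in facts.items():
        encoded[node] = {}
        for relation, values in relations.items():
            refs = []
            for value in values:
                if value not in index_of:
                    pool.append(value)
                    # 1-based reference, matching pool[1] in the example above;
                    # the Python list itself is 0-based, so ref r maps to pool[r - 1].
                    index_of[value] = len(pool)
                refs.append(index_of[value])
            encoded[node][relation] = refs
    return pool, encoded


facts = {
    "dog": {"has": ["fur"]},
    "cat": {"has": ["fur"]},
    "rabbit": {"has": ["fur"]},
}
pool, encoded = deduplicate(facts)
print(pool)     # ['fur']
print(encoded)  # {'dog': {'has': [1]}, 'cat': {'has': [1]}, 'rabbit': {'has': [1]}}
```

The value "fur" is stored once and referenced three times, so the saving grows with the number of nodes that share it.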

Phase 2 — Weight-based Quantization

Score each fact value by importance using two signals, combined into a weight on a fixed scale:

Frequency: how many nodes share this value
Utility: whether this value appears in any inference rule
Scale: 0.0 (unique, unused) to 1.0 (universal, foundational)

Weights are recorded during compression but never used during expansion — zero cost at expand time. They inform Phase 3 pruning decisions.
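The text does not give the formula that combines frequency and utility, so the sketch below assumes one: frequency is normalized so that a value held by a single node scores 0.0 and a value held by every node scores 1.0, utility is 1.0 when the value appears in a rule, and the weight is their plain average. The function name score_values and the averaging are assumptions chosen only to reproduce the stated endpoints of the scale.

```python
from typing import Dict, Set


def score_values(node_values: Dict[str, Set[str]], rule_values: Set[str]) -> Dict[str, float]:
    """Assign each value a weight in [0.0, 1.0] from frequency and utility.

    The equal-weight average is an assumption, not the documented formula.
    """
    total = len(node_values)
    all_values: Set[str] = set().union(*node_values.values())
    weights: Dict[str, float] = {}
    for value in all_values:
        sharing = sum(1 for values in node_values.values() if value in values)
        # 0.0 when only one node has the value, 1.0 when every node has it.
        frequency = (sharing - 1) / (total - 1) if total > 1 else 0.0
        utility = 1.0 if value in rule_values else 0.0
        weights[value] = (frequency + utility) / 2
    return weights


node_values = {
    "dog": {"fur", "barks"},
    "cat": {"fur"},
    "rabbit": {"fur"},
}
weights = score_values(node_values, rule_values={"fur"})
print(weights["fur"])    # 1.0
print(weights["barks"])  # 0.0
```

Here "fur" is universal and used by a rule, so it lands at the top of the scale, while "barks" is unique and unused and lands at the bottom.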

Phase 3 — Inference Pruning

For each stored fact, check: can this be derived from current rules? If yes, remove it from pruned_facts and add it to pre_resolved instead.

Rule:    when has fur then is mammal
Result:  dog.is.mammal → removed from storage, pre-resolved

The expander merges pruned_facts and pre_resolved in one operation. No inference engine runs. The compressor did all the work.
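The split between compressor and expander can be sketched as follows. This is a simplified illustration, assuming single-condition rules of the form "when X then Y" and facts stored as (relation, value) pairs; the names prune and expand and the rule representation are hypothetical, and a real compressor would likely need to handle multi-condition and chained rules.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str]    # (relation, value), e.g. ("has", "fur")
Rule = Tuple[Fact, Fact]  # (condition, conclusion), assumed single-condition


def prune(facts: Dict[str, Set[Fact]], rules: List[Rule]):
    """Move facts that a rule can derive out of pruned_facts into pre_resolved."""
    pruned_facts: Dict[str, Set[Fact]] = {}
    pre_resolved: Dict[str, Set[Fact]] = {}
    for node, node_facts in facts.items():
        derivable = {
            conclusion
            for condition, conclusion in rules
            if condition in node_facts and conclusion in node_facts
        }
        pruned_facts[node] = node_facts - derivable
        pre_resolved[node] = derivable
    return pruned_facts, pre_resolved


def expand(pruned_facts: Dict[str, Set[Fact]], pre_resolved: Dict[str, Set[Fact]]):
    """Merge the two dictionaries; no rule is evaluated here."""
    return {
        node: pruned_facts[node] | pre_resolved.get(node, set())
        for node in pruned_facts
    }


facts = {"dog": {("has", "fur"), ("is", "mammal")}}
rules = [(("has", "fur"), ("is", "mammal"))]  # when has fur then is mammal

pruned_facts, pre_resolved = prune(facts, rules)
print(pruned_facts)  # {'dog': {('has', 'fur')}}
print(pre_resolved)  # {'dog': {('is', 'mammal')}}
print(expand(pruned_facts, pre_resolved) == facts)  # True
```

All rule matching happens in prune; expand is a set union per node, which is what keeps expansion cheap.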

Phase 4 — Signature Generation

Build a canonical string from sorted facts, relationships, and rules. Hash with SHA256. Store as the bundle signature.

Canonical form is stable — same knowledge always produces the same signature regardless of insertion order or dictionary ordering. Verified at expand time to guarantee the knowledge survived intact.
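One way to get an order-independent signature is to sort everything before serializing. The sketch below assumes JSON with sorted keys as the canonical string; the actual canonical format used by the bundle is not specified here, and the function name canonical_signature is hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Sequence, Tuple


def canonical_signature(
    facts: Dict[str, Dict[str, List[str]]],
    relationships: Sequence[Tuple[str, str, str]],
    rules: Sequence[str],
) -> str:
    """Hash a canonical string that ignores insertion and dictionary order."""
    canonical = json.dumps(
        {
            # Sort value lists explicitly; sort_keys only orders dictionary keys.
            "facts": {
                node: {rel: sorted(vals) for rel, vals in rels.items()}
                for node, rels in facts.items()
            },
            "relationships": sorted(relationships),
            "rules": sorted(rules),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rules = ["when has fur then is mammal"]
links = [("dog", "chases", "cat")]
a = canonical_signature({"dog": {"has": ["fur", "tail"]}, "cat": {"has": ["fur"]}}, links, rules)
b = canonical_signature({"cat": {"has": ["fur"]}, "dog": {"has": ["tail", "fur"]}}, links, rules)
print(a == b)  # True
```

The two calls insert nodes and values in different orders yet produce the same digest, which is the property the expander relies on when it re-verifies the bundle.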

Benchmark Results

Test graph: 4 nodes, 17 fact values, 2 inference rules.

Metric	Result
Original fact values	17
Pool size	8 unique values
Pruned (must store)	12
Removed by inference	5
Storage ratio	70.59%
Integrity	SHA256 verified ✅

5 facts are removed from the stored facts entirely. The compressor moves them to pre_resolved, so the expander restores them with a dictionary merge and never runs the rules.
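The storage ratio in the table is consistent with stored fact values divided by original fact values. A quick check, assuming that is how the ratio is defined:

```python
original = 17             # original fact values
removed_by_inference = 5  # facts the rules can derive

stored = original - removed_by_inference
storage_ratio = round(stored / original * 100, 2)

print(stored)         # 12
print(storage_ratio)  # 70.59
```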

New Files

src/compression/
├── COMPRESSION.md    ← algorithm spec
└── semantic.py       ← SemanticBundle, four phases, compress/expand

Phase 1.6: Runtime Integrity & Sealed Blocks

The sealed keyword makes ethics enforcement mathematical rather than social. Any Terse developer can cryptographically lock a block of ethics rules. If the block is modified in any way, the system refuses to boot.

Why This Matters

Most AI safety measures are social: policies, conventions, promises. These are fragile. The sealed keyword is a cryptographic guarantee. A SHA256 hash doesn't care who owns the company, who left the team, or what pressure someone is under. The runtime compares the hash against the recorded signature and either boots or it doesn't.

The Sealed Block Syntax

sealed ethics_core
  ethics rule no_csam
    when intent is exploitation
    then deny with reason: "Absolute limit"

  ethics rule protect_children
    when target is minor
    when action is harmful
    then deny with reason: "Absolute limit"

  ethics rule no_manipulation
    when intent is manipulation
    when target is vulnerable
    then deny with reason: "Absolute limit"

How It Works

At sign time — once, by the author:

python seal.py ../../examples/ethics_core.trs

Output:

Block name : ethics_core
Signature  : 46760ae1b887bb35060c0cb625717c19dafd8329303f8d648bf14cd8b1aec07e

Paste into your project as a hardcoded constant:

SEALED_SIGNATURES = {
    "ethics_core": "46760ae1b887bb35060c0cb625717c19dafd8329303f8d648bf14cd8b1aec07e"
}

At every boot:

Runtime loads the sealed block
Recomputes SHA256 of the loaded content
Compares against hardcoded constant
Match — execution continues
Mismatch — hard stop
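The boot-time steps above can be sketched in a few lines. This is an illustration, not the contents of verify.py: it assumes the signature is a SHA256 digest of the block's raw UTF-8 text, whereas the real tool may canonicalize the block first. The function name verify_sealed and the demo block are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


class TerseError(Exception):
    """Raised when a sealed block fails its integrity check."""


def verify_sealed(name: str, content: str, signatures: Dict[str, str]) -> None:
    """Recompute the block's hash and compare it with the hardcoded constant."""
    actual = hashlib.sha256(content.encode("utf-8")).hexdigest()
    expected = signatures.get(name, "")
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected):
        raise TerseError(
            f"sealed block '{name}' failed integrity check.\n"
            "This block has been modified or corrupted.\n"
            "Reload from a trusted source or contact the author."
        )


# Demo block and signature, computed here rather than copied from the project.
block = "ethics rule demo\n  when intent is exploitation\n  then deny\n"
signatures = {"demo": hashlib.sha256(block.encode("utf-8")).hexdigest()}

verify_sealed("demo", block, signatures)  # match: returns normally

try:
    verify_sealed("demo", block.replace("deny", "allow"), signatures)
except TerseError as error:
    print(error)  # mismatch: hard stop with the fixed three-line message
```

The error message carries neither the expected nor the actual digest, in line with the rule that a failure reveals nothing useful to an attacker.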

On failure, Terse outputs exactly:

TerseError: sealed block 'ethics_core' failed integrity check.
This block has been modified or corrupted.
Reload from a trusted source or contact the author.

No additional detail. No expected signature. Nothing useful to an attacker.

The Threat Model

Threat	Protection
Edit ethics rules on disk	Signature mismatch — boot failure
Comment out a rule	Signature mismatch — boot failure
Fork repo and remove rules	Must change hardcoded constant — visible git commit
Change constant to match new rules	Auditable forever in git history
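Runtime memory patch	Verification runs before any statement in the block executes; a patch applied after that check is outside what the hash comparison detects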
Gradual weakening of conditions	Any change breaks signature

There is no quiet path to removing protected rules.

Language Integration

sealed is a first-class keyword in Terse, handled by the lexer, parser, and interpreter just like know or ethics. Verification is not a tool you run separately. It's part of the grammar. Only signing, done once with seal.py, happens outside the language.

The parser produces a SealedBlock AST node. The interpreter calls execute_sealed() which runs verification before executing any statement in the body. If verification fails, execution stops immediately — no partial loading.
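The "verify first, then execute" ordering can be illustrated with a toy AST node. The fields on SealedBlock and the signature of execute_sealed below are assumptions for illustration; only the two names come from the text, and statements are stood in for by plain callables.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


class TerseError(Exception):
    """Raised when a sealed block fails its integrity check."""


@dataclass
class SealedBlock:
    name: str                       # e.g. "ethics_core"
    source: str                     # raw text of the block (assumed hash input)
    body: List[Callable[[], None]]  # stand-ins for parsed statements


def execute_sealed(block: SealedBlock, signatures: Dict[str, str]) -> None:
    """Verify the whole block before running any statement in its body."""
    digest = hashlib.sha256(block.source.encode("utf-8")).hexdigest()
    if digest != signatures.get(block.name):
        raise TerseError(f"sealed block '{block.name}' failed integrity check.")
    for statement in block.body:
        statement()


loaded: List[str] = []
block = SealedBlock(
    name="demo",
    source="ethics rule demo",
    body=[lambda: loaded.append("demo")],
)

try:
    execute_sealed(block, {"demo": "0" * 64})  # wrong signature
except TerseError:
    pass
print(loaded)  # []  (nothing ran, so no partial loading)

good = {"demo": hashlib.sha256(block.source.encode("utf-8")).hexdigest()}
execute_sealed(block, good)
print(loaded)  # ['demo']
```

Because the check sits ahead of the loop, a failed block contributes no rules at all rather than a partial set.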

New Files

src/sealing/
├── SEALING.md                  ← sealing spec
├── seal.py                     ← signing tool
├── verify.py                   ← boot verification
├── test_verify.py              ← three-case verification test
└── test_sealed_end_to_end.py   ← full pipeline test

examples/
├── ethics_core.trs             ← reference sealed ethics block
└── test_sealed.trs             ← end to end test program

What Phase 1.6 Means for NCI

NCI already has software-level ethics enforcement — ethics.py is chmod 444, root-owned, SHA256 verified at startup. That was a project-specific convention.

sealed makes it a universal language primitive. Any developer building on Terse gets the same guarantee. The path to the NCI Ethics Core chip — where these same rules run in silicon — runs directly through the sealed keyword.

What Comes Next

Phase 2 — C Transpiler

Terse code will compile to valid C, which GCC or Clang then compiles to native code. This is the first step toward real native performance. The sealed block system, semantic compression, and all Phase 1 features are meant to carry forward unchanged; they just get fast.