# Session 1 — Lexer & Parser

## What Was Built
The foundation of the Terse interpreter — the two components that turn raw source code into a structured representation the interpreter can execute.
## The Lexer

File: `src/interpreter/lexer.py`
The lexer reads raw Terse source code character by character and produces a flat list of labeled tokens. A token is the smallest meaningful unit of the language — a keyword, an identifier, a number, a string.
For example, the line `know dog is animal` becomes:
```
Token(KEYWORD, 'know', line=1)
Token(IDENTIFIER, 'dog', line=1)
Token(KEYWORD, 'is', line=1)
Token(IDENTIFIER, 'animal', line=1)
Token(NEWLINE, '\n', line=1)
```
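A token like the ones above is just a small immutable record. A minimal sketch of the shape involved — the actual class in `lexer.py` may use different field names:

```python
from typing import NamedTuple

class Token(NamedTuple):
    """One lexical unit: its category, raw text, and source line."""
    type: str    # e.g. 'KEYWORD', 'IDENTIFIER', 'NUMBER', 'NEWLINE'
    value: str   # the exact text matched in the source
    line: int    # 1-based line number, kept for error messages

tok = Token('KEYWORD', 'know', line=1)
```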
### What the Lexer handles
- **Keywords** — `know`, `is`, `has`, `when`, `then`, `infer`, `each`, `in`, `to`, `return`, `while`, `learn`, `predict`, `generate`, `ethics`, `rule`, `deny`, `allow`, and more
- **Identifiers** — any word that isn't a keyword
- **Numbers** — integers and decimals
- **Strings** — text in double quotes
- **Comments** — `//` to end of line, silently skipped
- **Arrows** — the `->` operator
- **Newlines** — tracked for line counting and statement separation
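The rules above can be sketched as a small regex-driven scanner. This is a simplified stand-in for `lexer.py`, not the real implementation, and the keyword set is abbreviated:

```python
import re

# Abbreviated keyword set; the real lexer recognizes more.
KEYWORDS = {'know', 'is', 'has', 'when', 'then', 'infer', 'each',
            'in', 'to', 'return', 'while'}

TOKEN_SPEC = [
    ('COMMENT', r'//[^\n]*'),        # skipped silently
    ('NUMBER',  r'\d+(?:\.\d+)?'),   # integers and decimals
    ('STRING',  r'"[^"\n]*"'),       # text in double quotes
    ('ARROW',   r'->'),
    ('WORD',    r'[A-Za-z_]\w*'),    # keyword or identifier
    ('NEWLINE', r'\n'),
    ('SKIP',    r'[ \t]+'),          # whitespace, discarded
]
MASTER = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in TOKEN_SPEC))

def tokenize(source):
    """Scan source text into a flat list of (type, value, line) tuples."""
    line, tokens = 1, []
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind in ('COMMENT', 'SKIP'):
            continue                      # comments and spaces vanish
        if kind == 'WORD':                # keyword vs. identifier
            kind = 'KEYWORD' if text in KEYWORDS else 'IDENTIFIER'
        tokens.append((kind, text, line))
        if kind == 'NEWLINE':             # track lines for error messages
            line += 1
    return tokens
```

The ordering of `TOKEN_SPEC` matters: `COMMENT` must come before other rules so `//` text is consumed whole, and one combined alternation lets a single pass classify each match by which named group fired.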
## The Parser

File: `src/interpreter/parser.py`
The parser reads the token list from the lexer and builds an Abstract Syntax Tree (AST) — a tree of structured objects representing the program's meaning.
**Analogy:** The lexer is like breaking a sentence into individual words. The parser is like diagramming the sentence — identifying the subject, verb, and object, and understanding how they relate.
### AST Nodes
Each statement type in Terse becomes a specific AST node:
| Node | Terse Syntax | What It Represents |
|---|---|---|
| `KnowStatement` | `know dog is animal` | A fact about a node |
| `RelationshipStatement` | `dog chases cat` | An edge between nodes |
| `WhenStatement` | `when has fur then is mammal` | An inference rule |
| `InferStatement` | `infer dog` | Apply rules to a node |
| `FunctionDefinition` | `to classify thing` | Define a function |
| `FunctionCall` | `classify dog` | Call a function |
| `ReturnStatement` | `return thing` | Return from a function |
| `Program` | (root) | The whole program |
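The node types in the table are naturally plain data holders. A sketch of a few of them as dataclasses — field names are illustrative and may differ from `parser.py`:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class KnowStatement:
    subject: str   # e.g. 'dog'
    verb: str      # 'is' or 'has'
    obj: str       # e.g. 'animal'

@dataclass
class RelationshipStatement:
    subject: str   # e.g. 'dog'
    verb: str      # e.g. 'chases'
    obj: str       # e.g. 'cat'

@dataclass
class FunctionCall:
    name: str      # e.g. 'classify'
    arg: str       # e.g. 'dog'

@dataclass
class Program:
    """Root node: an ordered list of top-level statements."""
    statements: List[object] = field(default_factory=list)

prog = Program([KnowStatement('dog', 'is', 'animal')])
```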
### How the Parser decides what kind of statement it's reading

- Line starts with `know` → `KnowStatement`
- Line starts with `when` → `WhenStatement`
- Line starts with `infer` → `InferStatement`
- Line starts with `to` → `FunctionDefinition`
- Line starts with an identifier → `RelationshipStatement` or `FunctionCall`
Two-word identifier lines (`classify dog`) are function calls. Three-word identifier lines (`dog chases cat`) are relationships.
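That dispatch amounts to a lookahead on a line's first token. A simplified illustration, assuming one line's tokens arrive as `(type, value)` pairs — the real parser builds AST nodes rather than returning names:

```python
def classify_statement(tokens):
    """Decide a statement's kind from its leading token(s)."""
    kind, value = tokens[0]
    if kind == 'KEYWORD':
        return {'know': 'KnowStatement',
                'when': 'WhenStatement',
                'infer': 'InferStatement',
                'to': 'FunctionDefinition'}.get(value, 'Unknown')
    if kind == 'IDENTIFIER':
        # Two identifier words -> function call; three -> relationship.
        words = [t for t in tokens if t[0] == 'IDENTIFIER']
        return ('FunctionCall' if len(words) == 2
                else 'RelationshipStatement')
    return 'Unknown'
```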
## The Pipeline So Far

```
Terse source code
        ↓
      Lexer
"know dog is animal" → [Token, Token, Token, Token]
        ↓
      Parser
[tokens] → KnowStatement(subject='dog', verb='is', obj='animal')
        ↓
Program([KnowStatement, RelationshipStatement, ...])
```
The interpreter (built in Sessions 2 and 3) takes this Program and executes it.
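The two stages compose mechanically. A toy end-to-end version — deliberately oversimplified, handling only `know` facts, with none of the real modules' code — just to make the data flow concrete:

```python
import re

def tokenize(source):
    # Toy lexer: words and newlines only.
    return re.findall(r'\w+|\n', source)

def parse(tokens):
    # Toy parser: each 'know X is Y' line becomes a dict node.
    program, line = [], []
    for tok in tokens:
        if tok == '\n':
            if len(line) == 4 and line[0] == 'know':
                program.append({'type': 'KnowStatement',
                                'subject': line[1], 'verb': line[2],
                                'obj': line[3]})
            line = []
        else:
            line.append(tok)
    return program

program = parse(tokenize('know dog is animal\n'))
```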