Session 1 — Lexer & Parser

What Was Built

The foundation of the Terse interpreter — the two components that turn raw source code into a structured representation the interpreter can execute.


The Lexer

File: src/interpreter/lexer.py

The lexer reads raw Terse source code character by character and produces a flat list of labeled tokens. A token is the smallest meaningful unit of the language — a keyword, an identifier, a number, a string.

know dog is animal

Becomes:

Token(KEYWORD, 'know', line=1)
Token(IDENTIFIER, 'dog', line=1)
Token(KEYWORD, 'is', line=1)
Token(IDENTIFIER, 'animal', line=1)
Token(NEWLINE, '\n', line=1)

What the Lexer handles

  • Keywords — know, is, has, when, then, infer, each, in, to, return, while, learn, predict, generate, ethics, rule, deny, allow, and more
  • Identifiers — any word that isn't a keyword
  • Numbers — integers and decimals
  • Strings — text in double quotes
  • Comments — // to end of line, silently skipped
  • Arrows — the -> operator
  • Newlines — tracked for line counting and statement separation
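The behavior above can be sketched as a minimal character-by-character lexer. This is an illustrative sketch only — the names Token, KEYWORDS, and tokenize are assumptions, and the real src/interpreter/lexer.py may be organized differently:

```python
from dataclasses import dataclass

# Illustrative subset of the keyword list above (assumed, not the real table).
KEYWORDS = {"know", "is", "has", "when", "then", "infer", "each", "in",
            "to", "return", "while", "learn", "predict", "generate",
            "ethics", "rule", "deny", "allow"}

@dataclass
class Token:
    kind: str    # KEYWORD, IDENTIFIER, NUMBER, STRING, ARROW, or NEWLINE
    value: str
    line: int

def tokenize(source: str) -> list[Token]:
    tokens, line, i = [], 1, 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":                            # newlines separate statements
            tokens.append(Token("NEWLINE", "\n", line))
            line += 1
            i += 1
        elif ch in " \t":                         # whitespace between tokens
            i += 1
        elif source.startswith("//", i):          # comment: skip to end of line
            while i < len(source) and source[i] != "\n":
                i += 1
        elif source.startswith("->", i):          # arrow operator
            tokens.append(Token("ARROW", "->", line))
            i += 2
        elif ch == '"':                           # string in double quotes
            j = source.index('"', i + 1)
            tokens.append(Token("STRING", source[i + 1:j], line))
            i = j + 1
        elif ch.isdigit():                        # integer or decimal number
            j = i
            while j < len(source) and (source[j].isdigit() or source[j] == "."):
                j += 1
            tokens.append(Token("NUMBER", source[i:j], line))
            i = j
        else:                                     # word: keyword or identifier
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            if j == i:                            # unknown character: skip in this sketch
                i += 1
                continue
            word = source[i:j]
            kind = "KEYWORD" if word in KEYWORDS else "IDENTIFIER"
            tokens.append(Token(kind, word, line))
            i = j
    return tokens
```

Running tokenize("know dog is animal\n") yields the five tokens shown in the example above, each tagged with line 1.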

The Parser

File: src/interpreter/parser.py

The parser reads the token list from the lexer and builds an Abstract Syntax Tree (AST) — a tree of structured objects representing the program's meaning.

Analogy

The lexer is like breaking a sentence into individual words. The parser is like diagramming the sentence — identifying the subject, verb, and object, and understanding how they relate.

AST Nodes

Each statement type in Terse becomes a specific AST node:

  • KnowStatement — know dog is animal — a fact about a node
  • RelationshipStatement — dog chases cat — an edge between nodes
  • WhenStatement — when has fur then is mammal — an inference rule
  • InferStatement — infer dog — apply rules to a node
  • FunctionDefinition — to classify thing — define a function
  • FunctionCall — classify dog — call a function
  • ReturnStatement — return thing — return from a function
  • Program — (root) — the whole program
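One plausible shape for these nodes is a set of small dataclasses. The field names here (subject, verb, obj, and so on) are illustrative assumptions, not necessarily those used in src/interpreter/parser.py:

```python
from dataclasses import dataclass, field

@dataclass
class KnowStatement:           # know dog is animal
    subject: str
    verb: str
    obj: str

@dataclass
class RelationshipStatement:   # dog chases cat
    subject: str
    verb: str
    obj: str

@dataclass
class FunctionCall:            # classify dog
    name: str
    argument: str

@dataclass
class Program:                 # root node holding every top-level statement
    statements: list = field(default_factory=list)
```

Keeping each statement type as its own class lets the interpreter dispatch on node type when executing the tree.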

How the Parser decides what kind of statement it's reading

  • Line starts with know → KnowStatement
  • Line starts with when → WhenStatement
  • Line starts with infer → InferStatement
  • Line starts with to → FunctionDefinition
  • Line starts with an identifier → RelationshipStatement or FunctionCall

Among identifier-led lines, two-word lines (classify dog) are function calls, while three-word lines (dog chases cat) are relationships.
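The dispatch rules above can be sketched as a hypothetical classifier that looks at the words of a statement line — a simplification of what the real parser does with tokens, with assumed names throughout:

```python
def classify_statement(line: str) -> str:
    """Return the AST node name a statement line should become (sketch)."""
    words = line.split()
    head = words[0]
    if head == "know":
        return "KnowStatement"
    if head == "when":
        return "WhenStatement"
    if head == "infer":
        return "InferStatement"
    if head == "to":
        return "FunctionDefinition"
    # Identifier-led line: two words are a call, three words a relationship.
    return "FunctionCall" if len(words) == 2 else "RelationshipStatement"
```

So classify_statement("classify dog") gives "FunctionCall", while classify_statement("dog chases cat") gives "RelationshipStatement".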


The Pipeline So Far

Terse source code
   ↓ Lexer
   "know dog is animal" → [Token, Token, Token, Token]
   ↓ Parser
   [tokens] → KnowStatement(subject='dog', verb='is', obj='animal')
   Program([KnowStatement, RelationshipStatement, ...])
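As a toy end-to-end illustration of this pipeline — splitting on whitespace in place of the full character-level lexer, with assumed names that need not match the real modules:

```python
from dataclasses import dataclass

@dataclass
class KnowStatement:
    subject: str
    verb: str
    obj: str

@dataclass
class Program:
    statements: list

def run_pipeline(source: str) -> Program:
    """Turn Terse source into a Program of AST nodes (know statements only)."""
    statements = []
    for line in source.splitlines():
        words = line.split()            # stand-in for the real lexer
        if words and words[0] == "know":
            # know <subject> is <object>
            statements.append(KnowStatement(words[1], words[2], words[3]))
    return Program(statements)
```

Feeding in "know dog is animal" produces a Program whose single statement is KnowStatement(subject='dog', verb='is', obj='animal').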

The interpreter (built in Sessions 2 and 3) takes this Program and executes it.