Lexer generator · LR(1) & LALR(1) parser generator · NFA engine
Unlike traditional generators such as lex and flex, clex needs no separate code-generation step: define patterns at runtime and start tokenizing immediately.
Thompson NFA construction provides reliable matching with support for grouping, alternation, character classes, ranges, and quantifiers.
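To make the construction concrete, here is a minimal sketch of Thompson's construction plus NFA simulation in C. It is not clex's code. The names (`nfa_match`, `State`, `Frag`) are hypothetical, and character classes and ranges are left out. States also come from a fixed pool for brevity. Each operator yields a fragment with one entry state and one free exit state. Matching then advances the whole set of active states per input character, so there is no backtracking.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Thompson NFA sketch: literals, (), |, *, +, ? and \-escapes only. */
#define EPS (-1)        /* marks a state whose edges are all epsilon edges */
#define MAX_STATES 256  /* fixed pool for brevity; keep patterns small */

typedef struct { int c, out1, out2; } State;  /* c: input char or EPS; -1 = no edge */
typedef struct { int start, end; } Frag;      /* end: EPS state with free edges */

static State st[MAX_STATES];
static int nstates;
static const char *re;  /* parse cursor into the pattern */

static int new_state(int c) {
    if (nstates >= MAX_STATES) abort();
    st[nstates] = (State){c, -1, -1};
    return nstates++;
}

static Frag parse_alt(void);

static Frag parse_atom(void) {
    if (*re == '(') {
        re++;
        Frag f = parse_alt();
        if (*re == ')') re++;
        return f;
    }
    if (*re == '\\' && re[1]) re++;          /* escaped char is taken literally */
    int s = new_state((unsigned char)*re++);
    int e = new_state(EPS);
    st[s].out1 = e;                          /* s --char--> e */
    return (Frag){s, e};
}

static Frag parse_repeat(void) {
    Frag a = parse_atom();
    while (*re == '*' || *re == '+' || *re == '?') {
        char op = *re++;
        int e = new_state(EPS);              /* new free exit */
        int s = a.start;
        if (op != '+') {                     /* * and ? may skip a entirely */
            s = new_state(EPS);
            st[s].out1 = a.start;
            st[s].out2 = e;
        }
        st[a.end].out1 = (op == '?') ? e : a.start;  /* * and + loop back */
        if (op != '?') st[a.end].out2 = e;           /* ...or leave */
        a = (Frag){s, e};
    }
    return a;
}

static Frag parse_concat(void) {
    int s = new_state(EPS);                  /* empty fragment matches "" */
    Frag f = {s, s};
    while (*re && *re != '|' && *re != ')') {
        Frag b = parse_repeat();
        st[f.end].out1 = b.start;            /* glue f's free exit to b */
        f.end = b.end;
    }
    return f;
}

static Frag parse_alt(void) {
    Frag a = parse_concat();
    while (*re == '|') {
        re++;
        Frag b = parse_concat();
        int s = new_state(EPS), e = new_state(EPS);
        st[s].out1 = a.start; st[s].out2 = b.start;
        st[a.end].out1 = e;   st[b.end].out1 = e;
        a = (Frag){s, e};
    }
    return a;
}

/* Add s plus everything reachable from it by epsilon edges. */
static void add_state(bool *set, int s) {
    if (s < 0 || set[s]) return;
    set[s] = true;
    if (st[s].c == EPS) {
        add_state(set, st[s].out1);
        add_state(set, st[s].out2);
    }
}

/* True if the whole of `text` matches `pattern`. */
bool nfa_match(const char *pattern, const char *text) {
    bool cur[MAX_STATES], next[MAX_STATES];
    nstates = 0;
    re = pattern;
    Frag f = parse_alt();
    memset(cur, 0, sizeof cur);
    add_state(cur, f.start);
    for (; *text; text++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < nstates; s++)
            if (cur[s] && st[s].c == (unsigned char)*text)
                add_state(next, st[s].out1);
        memcpy(cur, next, sizeof cur);
    }
    return cur[f.end];
}
```

For example, `nfa_match("a(b|c)*d", "abcbd")` returns true. Each character is processed once against the current state set, so running time grows with pattern size times input length rather than exponentially.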
Full LR(1) parser construction with optional LALR(1) state merging. Handles left-recursive grammars and complex language constructs.
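LALR(1) merging folds together LR(1) states whose cores are identical. A core is the set of items with lookaheads ignored, and merging unions the lookahead sets of matching items. Below is a sketch of just that merge step. It uses a hypothetical representation in which each state lists its items grouped by core, each with a 64-bit lookahead mask. cparse's actual data structures may differ, and rewriting goto entries through `remap` is only indicated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical LR(1) state: one entry per core (prod, dot), no duplicates,
 * each with a bitmask of lookahead terminal IDs (up to 64 terminals). */
#define MAX_ITEMS 16

typedef struct { int prod, dot; uint64_t la; } Item;
typedef struct { int n; Item items[MAX_ITEMS]; } State;

static int find_core(const State *s, int prod, int dot) {
    for (int i = 0; i < s->n; i++)
        if (s->items[i].prod == prod && s->items[i].dot == dot) return i;
    return -1;
}

/* Same core: identical sets of (prod, dot) pairs. */
static bool same_core(const State *a, const State *b) {
    if (a->n != b->n) return false;
    for (int i = 0; i < a->n; i++)
        if (find_core(b, a->items[i].prod, a->items[i].dot) < 0) return false;
    return true;
}

/* Fold b into a by unioning lookaheads item by item. */
static void merge_into(State *a, const State *b) {
    for (int i = 0; i < b->n; i++) {
        int j = find_core(a, b->items[i].prod, b->items[i].dot);
        a->items[j].la |= b->items[i].la;
    }
}

/* Compacts `states` in place, merging each state into the first earlier state
 * with the same core. Returns the new count; remap[i] is old state i's new
 * index (goto entries would then be rewritten through remap; not shown). */
int lalr_merge(State *states, int n, int *remap) {
    int m = 0;
    for (int i = 0; i < n; i++) {
        int k = -1;
        for (int j = 0; j < m && k < 0; j++)
            if (same_core(&states[j], &states[i])) k = j;
        if (k < 0) {
            states[m] = states[i];   /* keep as a new LALR(1) state */
            k = m++;
        } else {
            merge_into(&states[k], &states[i]);
        }
        remap[i] = k;
    }
    return m;
}
```

Merging can introduce reduce/reduce conflicts that the unmerged LR(1) automaton does not have. That is the usual price of LALR(1)'s smaller tables.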
Dynamic NFA transitions in clex and symbol-ID-indexed action/goto tables in cparse remove fixed slot limits and speed up parse-time lookups.
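The sketch below shows what a symbol-ID-indexed table can look like. It is one flat array sized `nstates × nsymbols`, so every action or goto lookup is a single index computation and no per-state slot limit exists. The layout and names are assumptions for illustration, not cparse's internals. The table for the tiny grammar is filled in by hand rather than generated. That grammar is left-recursive, and the standard LR driver handles it without any rewriting.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical symbol-ID-indexed LR table (not cparse's internals).
 * Hand-built for the left-recursive grammar
 *   0: S' -> E    1: E -> E + n    2: E -> n
 * Symbol IDs: n = 0, + = 1, $ = 2 (terminals), E = 3 (nonterminal). */
enum { SYM_N, SYM_PLUS, SYM_END, SYM_E, NUM_SYMBOLS };
enum { ACT_ERR = 0, ACT_SHIFT, ACT_REDUCE, ACT_ACCEPT, ACT_GOTO };

typedef struct { unsigned char kind, arg; } Entry;
typedef struct { int nstates, nsymbols; Entry *cells; } Table;

static const int rhs_len[] = {1, 3, 1};        /* symbols on each RHS */
static const int lhs[] = {-1, SYM_E, SYM_E};   /* LHS symbol ID (S' unused) */

/* Action and goto share one flat array: row = state, column = symbol ID. */
static Entry *cell(const Table *t, int state, int sym) {
    return &t->cells[state * t->nsymbols + sym];
}

static Table build_table(void) {
    Table t = {5, NUM_SYMBOLS, calloc(5 * NUM_SYMBOLS, sizeof(Entry))};
    if (!t.cells) return t;                     /* zeroed cells mean ACT_ERR */
    *cell(&t, 0, SYM_N)    = (Entry){ACT_SHIFT, 2};
    *cell(&t, 0, SYM_E)    = (Entry){ACT_GOTO, 1};
    *cell(&t, 1, SYM_PLUS) = (Entry){ACT_SHIFT, 3};
    *cell(&t, 1, SYM_END)  = (Entry){ACT_ACCEPT, 0};
    *cell(&t, 2, SYM_PLUS) = (Entry){ACT_REDUCE, 2};
    *cell(&t, 2, SYM_END)  = (Entry){ACT_REDUCE, 2};
    *cell(&t, 3, SYM_N)    = (Entry){ACT_SHIFT, 4};
    *cell(&t, 4, SYM_PLUS) = (Entry){ACT_REDUCE, 1};
    *cell(&t, 4, SYM_END)  = (Entry){ACT_REDUCE, 1};
    return t;
}

/* Standard LR loop over symbol IDs; `input` must end with SYM_END. */
bool lr_accept(const int *input) {
    Table t = build_table();
    int stack[64], top = 0;   /* state stack; overflow is treated as an error */
    bool ok = false;
    if (!t.cells) return false;
    stack[0] = 0;
    for (;;) {
        int sym = *input;
        if (sym < 0 || sym > SYM_END) break;       /* only terminals are input */
        Entry a = *cell(&t, stack[top], sym);
        if (a.kind == ACT_SHIFT && top + 1 < 64) {
            stack[++top] = a.arg;
            input++;
        } else if (a.kind == ACT_REDUCE) {
            top -= rhs_len[a.arg];                       /* pop the handle */
            Entry g = *cell(&t, stack[top], lhs[a.arg]); /* goto on the LHS */
            stack[++top] = g.arg;
        } else {
            ok = (a.kind == ACT_ACCEPT);
            break;
        }
    }
    free(t.cells);
    return ok;
}
```

Growing the table for a larger grammar only changes `nstates` and `nsymbols`. No lookup code depends on a fixed number of entries per state.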
Lexer and parser operations return typed status codes. Structured errors include exact position, expected token(s), offending lexeme, and detailed LR conflict diagnostics during parser generation.
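One common way for an LR parser to fill in the "expected token(s)" part of an error is to list every terminal that has a non-error entry in the action row of the state where parsing stopped. Below is a minimal sketch of that idea. `describe_expected` and the row encoding are hypothetical and not part of cparse's API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding: 0 means "error"; any other value is shift/reduce/accept. */
#define ACTION_ERROR 0

/* Writes the names of all terminals with a non-error action in `action_row`
 * (one action-table row, indexed by terminal ID) into buf as a comma-separated
 * list. Returns the number of expected terminals (text may be truncated). */
size_t describe_expected(const int *action_row, const char *const *names,
                         size_t nterminals, char *buf, size_t bufsize) {
    size_t count = 0;
    if (bufsize == 0) return 0;
    buf[0] = '\0';
    for (size_t t = 0; t < nterminals; t++) {
        if (action_row[t] == ACTION_ERROR) continue;
        size_t used = strlen(buf);             /* always < bufsize */
        snprintf(buf + used, bufsize - used, "%s%s", count ? ", " : "", names[t]);
        count++;
    }
    return count;
}
```

With the arithmetic grammar's terminals, a state that expects the start of a `Factor` would yield `NUMBER, LPAREN`.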
Get structured parse trees with symbol values, matched tokens for terminals, and child node vectors for easy traversal.
A tiny, battle-tested lexer generator for C. Feed it regular expressions, get tokens back.
| Function | Description |
|---|---|
| clexInit() | Allocate and return a new lexer |
| clexRegisterKind() | Register a regex pattern for a token kind (returns clexStatus) |
| clexReset() | Point lexer at a new input string |
| clex() | Lex the next token into an out-parameter (returns clexStatus) |
| clexGetLastError() | Retrieve structured lexer error details |
| clexDeleteKinds() | Clear all rules for reuse |
| clexLexerDestroy() | Free all lexer resources |
Supported pattern syntax:

| Syntax | Meaning |
|---|---|
| `(ab)` | Grouping |
| `a\|b` | Alternation |
| `[a-z]` | Character classes |
| `[A-Z]` | Ranges |
| `a*` | Zero or more |
| `a+` | One or more |
| `a?` | Optional |
| `\(` | Escape sequences |

Example:

```c
#include "clex.h"
#include <stdio.h>
#include <stdlib.h>

typedef enum TokenKind { INT, OPARAN, CPARAN, IDENTIFIER, CONSTANT, SEMICOL } TokenKind;

int main(void) {
  /* Register one pattern per token kind */
  clexLexer *lexer = clexInit();
  clexRegisterKind(lexer, "int", INT);
  clexRegisterKind(lexer, "\\(", OPARAN);
  clexRegisterKind(lexer, "\\)", CPARAN);
  clexRegisterKind(lexer, "[1-9][0-9]*", CONSTANT);
  clexRegisterKind(lexer, ";", SEMICOL);
  clexRegisterKind(lexer, "[a-zA-Z_]([a-zA-Z_]|[0-9])*", IDENTIFIER);

  /* Point the lexer at the input and pull tokens until EOF or an error */
  clexReset(lexer, "int main()");
  clexToken tok;
  clexTokenInit(&tok);
  while (1) {
    clexStatus st = clex(lexer, &tok);
    if (st == CLEX_STATUS_EOF) break;
    if (st != CLEX_STATUS_OK) {
      const clexError *err = clexGetLastError(lexer);
      fprintf(stderr, "lexical error at %zu:%zu near '%s'\n",
              err->position.line, err->position.column,
              err->offending_lexeme ? err->offending_lexeme : "");
      break;
    }
    printf("kind=%d lexeme='%s' @ %zu:%zu\n", tok.kind, tok.lexeme,
           tok.span.start.line, tok.span.start.column);
  }

  /* Release the token and the lexer */
  clexTokenClear(&tok);
  clexLexerDestroy(lexer);
  return 0;
}
```
An LR(1) and LALR(1) parser generator for C. Define grammars in plain text, get parse trees out.
| Function | Description |
|---|---|
| cparseGrammar() | Parse a grammar string into internal representation |
| cparseCreateLR1Parser() | Build an LR(1) parser from a grammar and token-name map (array + count) |
| cparseCreateLALR1Parser() | Build an LALR(1) parser (LR(1) states with identical cores merged) from a grammar, lexer, and token-name map (array + count) |
| cparseAccept() | Check whether an input string is accepted by the grammar (returns cparseStatus) |
| cparse() | Parse input into an out-parameter parse tree (returns cparseStatus) |
| cparseGetLastError() | Retrieve parser error details: position, expected terminals, offending lexeme |
| cparseFreeParseTree() | Recursively release a parse tree |
| cparseFreeParser() | Release parser state |
| cparseFreeGrammar() | Release grammar data structures |
Grammars are written in plain text:

- `NonTerminal -> symbol1 symbol2 | alt` defines productions
- `|` separates alternative productions
- `epsilon` denotes an empty production
- `#` marks a comment

```
# Arithmetic expression grammar
Expr -> Term ExprTail
ExprTail -> PLUS Term ExprTail | epsilon
Term -> Factor TermTail
TermTail -> STAR Factor TermTail | epsilon
Factor -> NUMBER | LPAREN Expr RPAREN
```
```c
typedef struct ParseTreeNode {
  char *value;          /* grammar symbol */
  clexToken token;      /* matched token (terminals) */
  clexSourceSpan span;  /* source range covered by the node */
  PtrVec children;      /* ParseTreeNode* */
} ParseTreeNode;
```
Parse tree for `8 + 5 * 2`:

```
Expr
├─ Term
│  ├─ Factor
│  │  └─ NUMBER "8"
│  └─ TermTail
│     └─ epsilon
└─ ExprTail
   ├─ PLUS "+"
   ├─ Term
   │  ├─ Factor
   │  │  └─ NUMBER "5"
   │  └─ TermTail
   │     ├─ STAR "*"
   │     ├─ Factor
   │     │  └─ NUMBER "2"
   │     └─ TermTail
   │        └─ epsilon
   └─ ExprTail
      └─ epsilon
```
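Below is a sketch of walking a tree of this shape to evaluate the expression. The `PtrVec` and `clexToken` APIs are not described here, so it uses a hypothetical `Node` type that mirrors `ParseTreeNode`: a symbol name, the lexeme for terminals, and a child array. Adapting it to the real struct means reading children from `children` and terminal text from `token.lexeme`. The `mk` helper is likewise hypothetical and exists only to build test trees by hand.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ParseTreeNode. */
typedef struct Node {
    const char *value;       /* grammar symbol, e.g. "Expr" or "NUMBER" */
    const char *lexeme;      /* matched text for terminals, else NULL */
    struct Node **children;
    size_t nchildren;
} Node;

/* Build a node with n children (allocation checks omitted; nodes are leaked). */
Node *mk(const char *value, const char *lexeme, size_t n, ...) {
    Node *nd = calloc(1, sizeof *nd);
    nd->value = value;
    nd->lexeme = lexeme;
    nd->nchildren = n;
    nd->children = calloc(n ? n : 1, sizeof *nd->children);
    va_list ap;
    va_start(ap, n);
    for (size_t i = 0; i < n; i++) nd->children[i] = va_arg(ap, Node *);
    va_end(ap);
    return nd;
}

long eval_node(const Node *n);

/* ExprTail / TermTail: fold "op operand tail" into the running value. */
static long eval_tail(const Node *tail, long acc) {
    if (tail->nchildren < 3) return acc;          /* epsilon production */
    long rhs = eval_node(tail->children[1]);
    acc = strcmp(tail->children[0]->value, "PLUS") == 0 ? acc + rhs : acc * rhs;
    return eval_tail(tail->children[2], acc);
}

long eval_node(const Node *n) {
    if (strcmp(n->value, "NUMBER") == 0) return strtol(n->lexeme, NULL, 10);
    if (strcmp(n->value, "Factor") == 0)          /* NUMBER | LPAREN Expr RPAREN */
        return eval_node(n->children[n->nchildren == 3 ? 1 : 0]);
    /* Expr -> Term ExprTail, Term -> Factor TermTail (tail may be omitted) */
    long head = eval_node(n->children[0]);
    return n->nchildren > 1 ? eval_tail(n->children[1], head) : head;
}
```

Each `*Tail` node carries an operator and its right operand. Folding them left to right therefore gives the usual left-associative reading, and the tree for `8 + 5 * 2` evaluates to 18.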
```c
#include "cparse.h"
#include "clex/clex.h"
#include <stdio.h>

int main(void) {
  /* ── Step 1: Set up the lexer ─────────────────────────────── */
  clexLexer *lexer = clexInit();
  clexRegisterKind(lexer, "[0-9]+", 0); /* NUMBER */
  clexRegisterKind(lexer, "\\+", 1);    /* PLUS */
  clexRegisterKind(lexer, "\\*", 2);    /* STAR */
  clexRegisterKind(lexer, "\\(", 3);    /* LPAREN */
  clexRegisterKind(lexer, "\\)", 4);    /* RPAREN */

  /* ── Step 2: Define the grammar ──────────────────────────── */
  const char *grammar_src =
      "Expr -> Term ExprTail\n"
      "ExprTail -> PLUS Term ExprTail | epsilon\n"
      "Term -> Factor TermTail\n"
      "TermTail -> STAR Factor TermTail | epsilon\n"
      "Factor -> NUMBER | LPAREN Expr RPAREN";
  Grammar *grammar = cparseGrammar(grammar_src);

  /* ── Step 3: Build the parser ───────────────────────────── */
  /* Terminal names, in the same order as the token kinds registered above */
  const char *names[] = {"NUMBER", "PLUS", "STAR", "LPAREN", "RPAREN"};
  LALR1Parser *parser = cparseCreateLALR1Parser(
      grammar, lexer, names, sizeof(names) / sizeof(names[0]));

  /* ── Step 4: Parse input ───────────────────────────────── */
  const char *input = "8 + 5 * 2";
  if (cparseAccept(parser, input) == CPARSE_STATUS_OK) {
    ParseTreeNode *tree = NULL;
    if (cparse(parser, input, &tree) == CPARSE_STATUS_OK) {
      /* ... traverse or inspect the parse tree ... */
    }
    cparseFreeParseTree(tree);
  } else {
    const cparseError *err = cparseGetLastError(parser);
    /* err->position, err->expected_tokens, err->offending_lexeme */
    (void)err; /* silence unused-variable warnings in this skeleton */
  }

  /* ── Cleanup ────────────────────────────────────────────── */
  cparseFreeParser(parser);
  cparseFreeGrammar(grammar);
  clexLexerDestroy(lexer);
  return 0;
}
```
cparse bundles clex as a git submodule.
Pull in the clex lexer dependency with `git submodule update --init`.
Builds libcparse.a and runs the test suite.
Build and run the expression parser demo.
If you only need the lexer, you can use clex on its own.
```sh
# Clone clex standalone
git clone https://github.com/h2337/clex.git
cd clex

# Run the test suite
make test-all

# Build for library use
make lib

# Or compile directly
gcc your_app.c fa.c clex.c -o your_app
```
```sh
# After building, link the static library
gcc your_parser.c -L. -lcparse clex/clex.o clex/fa.o -o your_parser

# Or embed the sources directly
gcc your_parser.c grammar.c lr1_lalr1.c util.c \
    clex/clex.c clex/fa.c -o your_parser
```