in Courses

Parsing Algorithms

Available coupons:

Parsing or syntactic analysis is one of the first stages in designing and implementing a compiler. A well-designed syntax of your programming language is a big motivation why users would prefer and choose exactly your language.

Follow the Hacker news thread for details.


The problem with “parsers theory” in classic compiler schools and books is that this theory is often considered as “too advanced”, going right into complicated formal descriptions from the Theory of Computation and formal grammars. As a result students may lose an interest in building a compiler already at parsing stage.

The opposite problem often seen in describing a parser is a superficial approach describing only manual (usually recursive descent) parsing, leaving the students with issues understanding the actual techniques behind the automated parsers.


I believe this deep dive into the parsing theory should be combined together with a hands-on approach, which goes in parallel and allows seeing all the learned theoretical material on practice.

In the Essentials of Parsing (aka Parsing Algorithms) class we dive into different aspects of the parsing theory, describing in detail the LL and LR parsers. However at the same time to make the learning process and understanding easy and fun, we build in parallel an automatic parser for a full programming language, similar to JavaScript or Python, from scratch.

After this class not only you will be able to use a parser generator to build parsers for programming languages, but will also understand how the parser generators work under the hood themselves.

Implementing a parser for a programing language would also make your practical usage of other programming languages more professional.

You can watch preview lectures, and also enroll to the full course, describing LL and LR parsing algorithms and covering implementation of an automatic parser from scratch, in animated and live-annotated format. See details below what is in the course.

– or –

This class is for any curious engineer, who would like to gain skills of building complex systems (and building a parser for a programing language is a pretty advanced engineering task!), and obtain a transferable knowledge for building such systems.

If you are interested specifically in compilers, interpreters, and source code transformation tools, then this class is also for you.

The only pre-requisite for this class is basic data structures and algorithms: trees, lists, traversal.

If you took or going to take class on Building an Interpreter from scratch, the parsers class can be a syntax frontend for the interpreter built in that class.

Since we build a language very similar in semantics to JavaScript or Python (the two most popular programming languages today) we use specifically JavaScript — its elegant multi-paradigm structure which combines functional programming, class-based, and prototype-based OOP fits ideal for that.

Many engineers are familiar with JavaScript so it should be easier to start coding right away. To generate the automated parser we use Syntax tool which is a language-agnostic parser generator, and supports plugins for Python, Ruby, C#, PHP, Java, Rust, etc. That is, the implementation of this parser can easily be transferred to any other language of your choice and taste.

Note: we want our students to actually follow, understand and implement every detail of the parser themselves, instead of just copy-pasting from final solution. The full source code for the language is available in video lectures, showing and guiding how to structure specific modules.

The main features of these lectures are:

  • Concise and straight to the point. Each lecture is self-sufficient, concise, and describes information directly related to the topic, not distracting on unrelated materials or talks.
  • Animated presentation combined with live-editing notes. This makes understanding of the topics easier, and shows how (and when at time) the object structures are connected. Static slides simply don’t work for a complex content.
  • Live coding session end-to-end with assignments. The full source code, starting from scratch, and up to the very end is presented in video lectures of the class

The course is divided into four parts, in total of 22 lectures, and many sub-topics in each lecture. Below is the table of contents and curriculum.

In this part we describe different parsing pipelines, talk about formal grammars, derivations, what is ambiguous and unambitious grammar, and start building our programming language.

  • Lecture 1: Formal grammars, context-free
    • Course overview
    • Parsing pipeline
    • Tokenizer module
    • Parser module
    • AST: Abstract syntax tree
    • Hand-written vs. Automatic parsers
    • Recursive descent
    • LL and LR parsing
    • Formal grammars
    • Terminal, non-terminals and productions
    • Chomsky grammar hierarchy
    • Context-free grammars

  • Lecture 2: Grammar derivations
    • BNF notation (Backus-Naur form)
    • RegExp notation
    • Tokenizer and Parser
    • Derivation process
    • Left-most and Right-most derivations
    • Parse trees
    • Depth-first traversal — string on leaves
    • Ambiguous grammars

  • Lecture 3: Ambiguous grammars
    • Ambiguous grammars
    • Left- and Right-associativity
    • Operator precedence
    • Left recursion
    • Non-terminal levels
    • Unambiguous grammars

  • Lecture 4: Syntax tool | Letter
    • Syntax tool introduction
    • Parser generators
    • BNF grammar
    • Associativity and precedence
    • LALR(1) parsing mode
    • Letter programming language

  • Lecture 5: Abstract Syntax Trees
    • CST: Concrete Syntax Tree (aka Parse Tree)
    • AST: Abstract Syntax Tree
    • Semantic actions
    • Inline interpreter for simple DSLs
    • AST nodes generation
    • Parenthesized expression

In this part we talk in detail about Top-down parsing, describing manual recursive and backtracking parser, and also dive into the LL(1) parsing algorithm.

  • Lecture 6: Backtracking parser
    • Top-down parsers
    • Bottom-up parsers aka Shift-Reduce
    • Left recursion
    • Recursive descent parser
    • Backtracking parser
    • Left factoring

  • Lecture 7: Left-recursion and Left-factoring
    • Common prefix rules
    • Why backtracking is slow
    • Left-factoring
    • Left-recursion
    • Indirect recursion

  • Lecture 8: Predictive Recursive descent parser
    • Predictive parsing
    • Concept of Lookahead tokens
    • Recursive descent parser
    • First & Follow sets

  • Lecture 9: LL(1) parsing: First & Follow sets
    • Predictive parsing
    • Concept of Lookahead tokens
    • Structure of the LL(1) parser
    • First set calculation
    • Follow set calculation
    • Syntax tool example

  • Lecture 10: Construction of LL(1) parsing table
    • Lookahead tokens
    • Structure of LL(1) parser
    • First & Follow sets
    • LL(1) parsing table
    • Predict set
    • LL(1) conflicts

  • Lecture 11: LL(1) parsing algorithm
    • Structure of LL(1) parser
    • LL(1) parsing table
    • Parsing stack (Push-down automata)
    • Left-most derivation
    • Abstract algorithm of the LL(1) parser

In this part we describe Bottom-up parsers and LR parsing algorithm. In parallel we continue building our programming language, analyzing shift-reduce conflicts and fixing them.

  • Lecture 12: Back to practice: Statements | Blocks
    • Module include: reusing helper functions
    • AST formats: explicit and S-expression format
    • Statements and Statement lists
    • Program: main entry point
    • Blocks: groups of statements

  • Lecture 13: Function Declarations
    • Optional statements
    • Empty blocks
    • Function declarations
    • If-else statements
    • Shift-reduce conflict example

  • Lecture 14: LR parsing: Canonical Collection of LR-items
    • LR-parser structure
    • Canonical Collection of LR-items
    • LR-items
    • Closure & Goto operations
    • DFA: Deterministic Finite Automata (State Machine)
    • PDA: Push-Down Automata

  • Lecture 15: LR parsing table: LR(0) and SLR(1)
    • LR parsing table
    • Action & Goto
    • LR(0) parsing mode
    • SLR(1) parsing mode
    • Shift/Reduce conflicts
    • Reduce/Reduce conflicts

  • Lecture 16: CLR(1) and LALR(1) parsing tables
    • LR(0) vs. LR(1) items
    • Lookahead sets
    • CLR(1) parsing table
    • LALR(1) parsing table
    • Comparison of parsers: LL and LR

  • Lecture 17: LR(1) parsing algorithm
    • LR(1) parsing process
    • Shift-reduce algorithm
    • RHS handles
    • Parsing stack
    • Abstract LR algorithm description

The final part of the course is completely practical, we’re finishing our Letter programming language, building variables, functions, cycles, control structures, object-oriented programming, and the final parser.

  • Lecture 18: Control structures: If-statement
    • If-statement
    • Shift-Reduce conflict
    • Relational expression
    • Equality expression
    • Boolean literals

  • Lecture 19: Variables | Assignment
    • Logical AND expression
    • Logical OR expression
    • Assignment expression
    • Chained assignment
    • Variable declaration

  • Lecture 20: Function calls | Unary expression
    • Call expression and Function calls
    • Unary expression
    • Chained calls
    • Arguments list

  • Lecture 21: Member expression | Iteration
    • Member expression
    • Property access
    • Array indicies
    • String literals
    • Iteration statement
    • While, Do, For loops

  • Lecture 22: OOP | Final parser
    • Object-oriented programming
    • Class declaration
    • Super calls
    • New expression
    • Parser generation
    • Final executable

I hope you’ll enjoy the class, and will be glad to discuss any questions and suggestion in comments.

Write a Comment

Comment

  1. finally! thank you, sir
    i took your interpreters class and been waiting for this one

  2. @srinavas: thanks for your feedback, glad you liked the Interpreters and hope you’ll enjoy the Parsers class.

    @Sam: yes, coming soon, likely next week.