
Building a Parser from scratch

Available coupons:
  • None at the moment

Course overview

Parsing, or syntactic analysis, is one of the first stages in designing and implementing a compiler. A well-designed syntax is a strong reason for users to choose your programming language over others.

Note: this is a practical class on building a manual Recursive-descent parser. If you’re interested in parsing theory and automated algorithms you may also consider the [ Parsing Algorithms ] class.

Recursive descent parsers are widely used in practice in many production programming languages. In contrast with automated parsing algorithms, a manual implementation gives you full control over the parsing process and lets you handle complex constructs that may not be possible with automatically generated parsers.

Besides, implementing a full manual parser from scratch lets you see the process from the inside, demystifies its internal structures, and turns building parsers into an interesting engineering task.

In the Building a Parser from scratch class we dive into pure practical implementation, building and learning different aspects of parsers.

In this class you will learn the concept of Recursive descent parsing, understand what a Tokenizer is and how it cooperates with the Parser module, learn what an Abstract Syntax Tree (AST) is and how to produce different formats of these ASTs, learn what “lookahead” and predictive parsing are, and eventually build a parser for a full programming language, similar to Java or JavaScript.
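
To make these terms concrete, here is a minimal sketch of how a Tokenizer and a Parser might cooperate through a single lookahead token. It is not the course's source code: the class names, the `_eat` helper, and the AST node shapes are assumptions chosen for illustration, and the grammar covers only a single number or string literal.

```javascript
// Illustrative sketch only: names and node shapes are assumptions.

class Tokenizer {
  constructor(source) {
    this._source = source;
    this._cursor = 0;
  }

  // Returns the next token, or null at the end of input.
  getNextToken() {
    const src = this._source;
    // Skip whitespace.
    while (this._cursor < src.length && /\s/.test(src[this._cursor])) {
      this._cursor++;
    }
    if (this._cursor >= src.length) return null;

    const start = this._cursor;
    const ch = src[start];

    // Numbers: one or more digits.
    if (/\d/.test(ch)) {
      while (this._cursor < src.length && /\d/.test(src[this._cursor])) {
        this._cursor++;
      }
      return {type: 'NUMBER', value: src.slice(start, this._cursor)};
    }

    // Strings: everything up to the closing double quote.
    if (ch === '"') {
      this._cursor++; // opening quote
      while (this._cursor < src.length && src[this._cursor] !== '"') {
        this._cursor++;
      }
      if (this._cursor >= src.length) {
        throw new SyntaxError('Unterminated string literal');
      }
      this._cursor++; // closing quote
      return {type: 'STRING', value: src.slice(start, this._cursor)};
    }

    throw new SyntaxError(`Unexpected character: "${ch}"`);
  }
}

class Parser {
  parse(source) {
    this._tokenizer = new Tokenizer(source);
    // Prime the lookahead: the single token the parser may peek at.
    this._lookahead = this._tokenizer.getNextToken();
    return this.Program();
  }

  // Program : Literal ;
  Program() {
    return {type: 'Program', body: this.Literal()};
  }

  // Literal : NumericLiteral | StringLiteral ;
  // The lookahead type alone selects the production (predictive parsing).
  Literal() {
    if (this._lookahead === null) {
      throw new SyntaxError('Unexpected end of input');
    }
    switch (this._lookahead.type) {
      case 'NUMBER': return this.NumericLiteral();
      case 'STRING': return this.StringLiteral();
    }
    throw new SyntaxError(`Unexpected token type: ${this._lookahead.type}`);
  }

  NumericLiteral() {
    const token = this._eat('NUMBER');
    return {type: 'NumericLiteral', value: Number(token.value)};
  }

  StringLiteral() {
    const token = this._eat('STRING');
    return {type: 'StringLiteral', value: token.value.slice(1, -1)};
  }

  // Consumes a token of the expected type and advances the lookahead.
  _eat(tokenType) {
    const token = this._lookahead;
    if (token === null || token.type !== tokenType) {
      throw new SyntaxError(`Expected token of type ${tokenType}`);
    }
    this._lookahead = this._tokenizer.getNextToken();
    return token;
  }
}
```

Under these assumptions, `new Parser().parse('42')` would produce a `Program` node whose body is a `NumericLiteral`. Each grammar rule maps to one method, which is the defining trait of recursive descent.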

Implementing a parser will also make your practical use of other programming languages more professional.

How to?

You can watch the preview lectures, and also enroll in the full course, which describes the implementation of a Recursive descent parser from scratch in an animated and live-annotated format. See below for details of what is in the course.

Who is this class for?

This class is for any curious engineer who would like to gain skills in building complex systems (and building a parser for a programming language is a pretty advanced engineering task!) and obtain transferable knowledge for building such systems.

If you are interested specifically in compilers, interpreters, and source code transformation tools, then this class is also for you.

The prerequisites for this class are basic data structures and algorithms: trees, lists, traversal, and regular expressions.

If you took or are going to take the class on Building an Interpreter from scratch, the parser from this class can serve as a syntax frontend for the interpreter built in that class.

What is used for implementation?

Since we build a language very similar in syntax to JavaScript or Java, we use JavaScript specifically. Its elegant multi-paradigm structure, which combines functional programming with class-based and prototype-based OOP, is an ideal fit for that.

Many engineers are familiar with JavaScript, so it should be easier to start coding right away. However, we do not use very JS-specific constructs, so the implementation of the parser can easily be transferred to any other language of your choice.

Note: we want our students to actually follow, understand and implement every detail of the parser themselves, instead of just copy-pasting from the final solution. The full source code for the language is presented in the video lectures, which show and guide how to structure specific modules.

What’s specific to this class?

The main features of these lectures are:

  • Concise and straight to the point. Each lecture is self-sufficient, concise, and describes information directly related to the topic, without getting distracted by unrelated materials or talks.
  • Animated presentation combined with live-editing notes. This makes the topics easier to understand, and shows how (and at what point in time) the object structures are connected. Static slides simply don’t work for complex content.
  • Live coding session end-to-end with assignments. The full source code, from scratch up to the very end, is presented in the video lectures of the class.
What is in the course?

The course is divided into four parts, with a total of 18 lectures and many sub-topics in each lecture. Below is the table of contents and curriculum.

Part 1: Basic expressions and Tokenizer

In this part we describe basic expressions, such as Numbers and Strings, and also build the Tokenizer module, which operates on regular expressions.

  • Lecture 1: Tokenizer | Parser
    • Course overview and agenda
    • Parsing pipeline
    • Tokenizer module (Lexical analysis)
    • Parser module (Syntactic analysis)
    • Abstract Syntax Tree (AST)
    • Regular expression notation
    • Backus-Naur form (BNF) notation
    • Grammars and productions
    • Hand-written and Automatic parsers
    • Syntax: language-agnostic parser generator
    • The Letter programming language
    • Numeric literals

  • Lecture 2: Numbers | Strings
    • Tokenizer module (Lexical analysis)
    • Number and String tokens
    • Program AST node
    • Lookahead
    • Numeric literals
    • String literals
    • Finite state machine

  • Lecture 3: From State Machines to Regular Expressions
    • Tokenizers as Finite state machines
    • Regular Expressions notation
    • Tokenizer spec
    • Generic getNextToken
    • Single-line and Multi-line comments
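
As a rough sketch of the Lecture 3 topics, a regular-expression tokenizer spec can be an ordered list of rules that one generic `getNextToken` loops over. The `Spec` table, the `SpecTokenizer` class, and the convention that a `null` type means "skip" are assumptions made for this example, not necessarily the course's exact code.

```javascript
// Hypothetical tokenizer spec: ordered [regexp, tokenType] pairs.
// A null token type means the matched text is skipped.
const Spec = [
  [/^\s+/, null],               // whitespace
  [/^\/\/.*/, null],            // single-line comment
  [/^\/\*[\s\S]*?\*\//, null],  // multi-line comment (lazy match)
  [/^\d+/, 'NUMBER'],
  [/^"[^"]*"/, 'STRING'],
  [/^'[^']*'/, 'STRING'],
];

class SpecTokenizer {
  constructor(source) {
    this._source = source;
    this._cursor = 0;
  }

  hasMoreTokens() {
    return this._cursor < this._source.length;
  }

  // Generic: knows nothing about specific tokens, only walks the spec.
  getNextToken() {
    while (this.hasMoreTokens()) {
      const rest = this._source.slice(this._cursor);
      let matchedRule = false;

      for (const [regexp, tokenType] of Spec) {
        const matched = regexp.exec(rest);
        if (matched === null) continue;

        this._cursor += matched[0].length;
        matchedRule = true;

        // Skipped text: restart matching from the first rule.
        if (tokenType === null) break;

        return {type: tokenType, value: matched[0]};
      }

      if (!matchedRule) {
        throw new SyntaxError(`Unexpected token: "${rest[0]}"`);
      }
    }
    return null; // end of input
  }
}
```

With this layout, supporting a new token is a one-line change to the spec, while the matching loop stays untouched. Rule order matters: comment rules sit above any rule that could also match a leading `/`.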
Part 2: Program structure

In this part we talk about program structures, such as statements and statement lists, blocks and recursive production rules. In addition we discuss different AST formats and start building more complex expressions.

Part 3: Control flow and Functions

In this part we implement variables, assignment, work with operator precedence, and introduce function abstraction. In addition we define control structures such as If-statement and iteration loops.
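
A common way to handle operator precedence in a recursive descent parser is to layer the productions, so that lower-precedence operators sit closer to the top of the grammar. The sketch below assumes this approach. The `tokenize` helper, `ExpressionParser`, and `_binary` are hypothetical names, and the regular-expression tokenizer is a shortcut that silently ignores unknown characters.

```javascript
// Illustrative only: splits input into numbers, operators, and parentheses.
function tokenize(source) {
  return source.match(/\d+|[-+*\/()]/g) || [];
}

class ExpressionParser {
  parse(source) {
    this._tokens = tokenize(source);
    this._pos = 0;
    const ast = this.AdditiveExpression();
    if (this._pos < this._tokens.length) {
      throw new SyntaxError(`Unexpected token: ${this._peek()}`);
    }
    return ast;
  }

  _peek() { return this._tokens[this._pos]; }
  _next() { return this._tokens[this._pos++]; }

  // AdditiveExpression
  //   : MultiplicativeExpression (('+' | '-') MultiplicativeExpression)* ;
  AdditiveExpression() {
    return this._binary(() => this.MultiplicativeExpression(), ['+', '-']);
  }

  // MultiplicativeExpression
  //   : PrimaryExpression (('*' | '/') PrimaryExpression)* ;
  MultiplicativeExpression() {
    return this._binary(() => this.PrimaryExpression(), ['*', '/']);
  }

  // PrimaryExpression : NUMBER | '(' AdditiveExpression ')' ;
  PrimaryExpression() {
    const token = this._next();
    if (token === '(') {
      const expression = this.AdditiveExpression();
      if (this._next() !== ')') throw new SyntaxError('Expected ")"');
      return expression;
    }
    if (token === undefined || !/^\d+$/.test(token)) {
      throw new SyntaxError(`Unexpected token: ${token}`);
    }
    return {type: 'NumericLiteral', value: Number(token)};
  }

  // Shared left-associative loop for both binary levels.
  _binary(parseOperand, operators) {
    let left = parseOperand();
    while (operators.includes(this._peek())) {
      const operator = this._next();
      const right = parseOperand();
      left = {type: 'BinaryExpression', operator, left, right};
    }
    return left;
  }
}
```

Because `AdditiveExpression` asks `MultiplicativeExpression` for its operands, `2 + 3 * 4` groups as `2 + (3 * 4)` without any explicit precedence table, and parentheses override the grouping through `PrimaryExpression`.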

  • Lecture 8: Assignment Expression
    • Identifiers: variable names
    • Chained assignment
    • Left-hand side expression

  • Lecture 9: Variable Statement
    • Variable statement
    • Keyword tokens
    • Variable declarations
    • Name and optional Initializer

  • Lecture 10: If-Statement
    • Control flow
    • If-else statement
    • Consequent and Alternate parts
    • Relational expression

  • Lecture 11: Equality | Logical
    • Equality expression
    • Logical AND expression
    • Logical OR expression
    • Boolean literals
    • Null literal
  • Lecture 12: Unary Expression
    • Unary expression
    • Logical NOT operator
    • Minus operator
    • Single operand

  • Lecture 13: Iteration Statement
    • Control flow
    • Iteration Statement
    • While loop
    • Do-while loop
    • For-loop
    • Inline variable declaration

  • Lecture 14: Function Declaration
    • Function declaration
    • Return statement
    • Formal parameters
    • Function body
    • Optional return
Part 4: Object-oriented programming

In the final part of the course we implement classes and objects, and talk about property and array access. In addition we implement generic function and method calls, and build the final parser executable.

  • Lecture 15: Member Expression
    • Member Expression
    • Property access
    • Objects and Properties
    • Computed vs. Non-computed properties
    • Chained objects
    • Assigning to object properties

  • Lecture 16: Call Expression
    • Call Expression
    • Function calls
    • Method calls
    • Call | Arguments
    • Chained calls

  • Lecture 17: OOP | Classes
    • Object-oriented programming
    • Class declaration
    • New expression
    • Super calls
    • Methods
  • Lecture 18: Final Executable
    • Parser CLI
    • Parsing expressions and files
    • Project overview
    • Next steps
    • Further related classes

I hope you’ll enjoy the class, and I will be glad to discuss any questions and suggestions in the comments.




  1. Hello Mr Dmitry,
    You have done a great explanation!!! But there are only 4 videos on YouTube, and we can’t find the other videos of this course. Can you please tell us where we can find them?

  2. Hi Dmitry,

    I checked out your course last night and I saw there was a coupon for half price. When I went to purchase it this afternoon, I saw that the coupon was no longer there. Can you confirm whether I just stumbled upon a different link, or has the offer expired?


  3. Hi Grant Tapp, thanks for the interest. Yeah, unfortunately Udemy’s coupons are all redeemed at the moment. I’ll try having more coupons next month.

  4. Hi Dmitry.
    I’ve watched all the videos on YouTube and your series hyped me a lot !
    I see that the following videos are on paid platforms like Udemy, but I’m only in high school, and I cannot afford the course.
    Do you have some other free references or something like that for me to continue the project or not ?

    Best regards,

  5. Hi RedsTom, thanks for the feedback, glad you liked the courses. Yes, we do support students in high school, and periodically publish special coupons for this. You can find them on the main page of this site.

  6. Hello Dmitry, I’m interested in two of your courses. Is it possible to get a discount on Udemy?

  7. Hi Christian, thanks for the interest, and yes, periodically we have coupons appear for the courses, including on Udemy. Please monitor the main page, usually the discounts are there.

  8. Hello,
    I was looking to purchase some of your courses on Udemy and wondering if you plan to put them on sale? Most Udemy courses are on sale now but yours are not, could you let me know if you plan to have a sale?

  9. @chris – thanks for the interest, and yes – we have coupons available right now. You can always find them in the top right corner.

  10. Hello,
    I have completed the RDP course and I will implement many of the concepts presented in a compiler I’m currently working on. I use the C programming language for that. But I have a very hard time with POSIX regular expressions for comments (multi-line and single-line comments). Apparently PCRE behaves differently with the same regex. On another subject, it would have been interesting to add to the Tokenizer a way of knowing which line the token comes from. All the compilers I know of are able to mention which line of code a faulty token comes from.

  11. Guillaume – great call, different versions of regexp may have different implementations, so you have to accommodate those specifics. In addition, if you want to capture line numbers, which is a requirement in a production programming language, just stripping out comments might be problematic, since you need to explicitly account for \n chars.
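
    As a rough illustration of the line-counting point (a sketch, not code from the course), a tokenizer can count newlines in every piece of matched text, including the whitespace and comments it skips. The `LineSpec` table and the `tokenizeWithLines` function are hypothetical names, and only number tokens are recognized here.

```javascript
// Hypothetical spec: a null type means the match is skipped,
// but its newlines are still counted.
const LineSpec = [
  [/^\s+/, null],               // whitespace
  [/^\/\/.*/, null],            // single-line comment
  [/^\/\*[\s\S]*?\*\//, null],  // multi-line comment
  [/^\d+/, 'NUMBER'],
];

function tokenizeWithLines(source) {
  const tokens = [];
  let cursor = 0;
  let line = 1; // 1-based line of the text at the cursor

  while (cursor < source.length) {
    const rest = source.slice(cursor);
    const rule = LineSpec.find(([regexp]) => regexp.test(rest));
    if (rule === undefined) {
      throw new SyntaxError(`Unexpected character "${rest[0]}" at line ${line}`);
    }

    const [regexp, type] = rule;
    const text = regexp.exec(rest)[0];

    // Record the line where the token starts.
    if (type !== null) tokens.push({type, value: text, line});

    // Count newlines in every match, skipped or not.
    for (const ch of text) {
      if (ch === '\n') line++;
    }
    cursor += text.length;
  }
  return tokens;
}
```

    The key design choice is that skipped text still passes through the same newline counter, so a multi-line comment advances the line number even though it produces no token.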