Building a Transpiler from scratch


Course overview

In modern compiler implementation it has become popular to translate from one high-level language to another. Examples include desugaring a newer version of JavaScript into an older one, or compiling languages such as TypeScript. Such high-level compilers are known today as transpilers, and they are the topic of our course.

This approach has proved very practical, since we can fully rely on an existing runtime such as a JavaScript engine and avoid dealing with lower-level constructs such as memory management, bytecode instructions, etc.

If you would like to explore the world of compilers and implement your own programming language, starting from high-level compilation is a good, hands-on practical approach, and it is the one we apply in this class.

Related books on compilers often dive into theoretical aspects, getting stuck in parsing, regular grammars, etc., without explaining how to actually build a practical transpiler. I believe we should be able to build a transpiler for a full programming language, end-to-end, in 2-4 hours — with content that goes straight to the point, shown in live coding sessions as pair programming, and described in a comprehensible way.

In the Building a Transpiler from scratch class we focus on compiling a high-level concurrent programming language with message-passing processes down to JavaScript. That is, in addition to a deep dive into transpilation, you will get a better understanding of how concurrent functional languages such as Erlang, or even an operating system itself with its processes and threads, work today.
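
To give a flavor of what message-passing processes on top of JavaScript can look like, here is a minimal, hypothetical sketch of the kind of runtime helper a transpiler might emit. The names "spawn" and "send" are illustrative only and are not the actual API developed in the course.

    // Minimal sketch of a process abstraction a transpiler could target.
    // Each process owns a mailbox and handles its messages asynchronously.
    class Process {
      constructor(handler) {
        this.mailbox = [];
        this.handler = handler;
      }
      send(message) {
        this.mailbox.push(message);
        // Defer handling so that senders never block (cooperative scheduling).
        queueMicrotask(() => {
          while (this.mailbox.length > 0) {
            this.handler(this.mailbox.shift());
          }
        });
      }
    }

    const spawn = (handler) => new Process(handler);

    // Usage: a "logger" process receiving messages from other code.
    const logger = spawn((msg) => console.log('[logger]', msg));
    logger.send('process started');
    logger.send('hello from another process');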

Implementing a transpiler will also raise your engineering level, as it touches on several aspects of data structures and algorithms.


Programming Language with LLVM


Course overview

How do programming languages work under the hood? What’s the difference between a compiler and an interpreter? What are a virtual machine and a JIT compiler? And what about the difference between functional and imperative programming?

There are so many questions when it comes to implementing a programming language!

The problem with “compiler classes” in school is that such classes are usually presented as “hardcore rocket science” meant only for advanced engineers.

Moreover, classic compiler books start with the least significant topic, such as lexical analysis, and go straight into the theoretical aspects of formal grammars. By the time the first Tokenizer module is implemented, students simply lose interest in the topic and never get a chance to start implementing the programming language itself. And all of this is spread over a whole semester of wrestling with tokenizers and BNF grammars, without ever reaching the actual semantics of programming languages.

I believe we should be able to build and understand the full semantics of a programming language, end-to-end, in 4-6 hours — with content that goes straight to the point, shown in live coding sessions as pair programming, and described in a comprehensible way.

In the Programming Language with LLVM class we focus on compiling our language to LLVM IR and building a lower-level programming language. Working closely with the LLVM compiler infrastructure, you will understand how lower-level compilation works and how production languages such as C++, Rust, etc. work today.

Implementing a programming language will also make your practical use of other programming languages more professional.


Building a Typechecker from scratch


Course overview

Untyped programs are often prone to errors and runtime exceptions, and they can be much harder to debug. That’s why many production languages implement a static typechecker — an extra module aimed at increasing program safety and making development simpler.

Type checking or type inference? What are Type theory and type judgements? Is my language weakly or strongly typed? And how am I actually going to implement a typechecker?

There are so many questions when it comes to implementing this module. If you’ve been asking these questions while implementing your programming language, or just want to understand how typecheckers work under the hood through a hands-on practical implementation, this course is for you.

Related books on Type theory and type judgements often dive into theoretical aspects, viewing types as mathematical sets, without explaining how to actually build a practical typechecker. I believe we should be able to build and understand a typechecker for a full programming language, end-to-end, in 2-4 hours — with content that goes straight to the point, shown in live coding sessions as pair programming, and described in a comprehensible way.

In the Building a Typechecker from scratch class we focus specifically on a static typechecker, and build one similar to the typecheckers of TypeScript, Java, etc. We only briefly touch on Type theory, and move to the practical implementation from the very first lecture.

Implementing a typechecker will also raise your engineering level, as it touches on several aspects of data structures and algorithms.


Compiler Engineer Path


Many books on engineering a compiler start with the topic of Lexical Analysis and go deep into the formalism of Automata Theory. Stuck in formal state-machine semantics, students may lose interest in building programming languages, attributing them to “compiler engineers” only. I believe it is always better to start with a language runtime, to understand the bigger picture of its semantics, and only after that to go deeper into the theoretical aspects.

In this note we cover the courses from the Programming Languages Design and Implementation series and explain in which order to take them for the best practical outcome.

Courses in this series are divided into the practical “… from scratch” classes and the “… Theory” and “… Algorithms” classes. We recommend starting with the practical “from scratch” classes and moving on to the theory afterwards.

Note: you may also find all courses combined in the single Programming Languages: Ultimate – 3rd Edition bundle.

Building an Interpreter from scratch

In this course we start our journey into the interesting world of programming languages, discussing different parsing and transformation pipelines and VMs, and eventually implement an AST interpreter from scratch.


To skip the parsing stage altogether and focus specifically on the runtime (that is, the actual semantics of a programming language), we use the S-expression format as a lightweight syntax. We implement the interpreter in one of the most popular programming languages, JavaScript — and the semantics of our language — Eva — is heavily inspired by JavaScript itself, Python, and other dynamic PLs.
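
As a rough illustration (the concrete forms are, of course, developed in the course itself), an S-expression program maps directly onto nested JavaScript arrays, which is exactly the AST the interpreter walks:

    // The Eva-like program (begin (var x 10) (print (* x x)))
    // represented as a nested JavaScript array -- no separate parser needed:
    const program =
      ['begin',
        ['var', 'x', 10],
        ['print', ['*', 'x', 'x']]];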

By the end of this course students know how to build their own programming language from scratch at the higher (AST) level. You will learn and understand how closures work, what activation records and lexical environments are, how function calls work, and how to add support for OOP.
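
As a minimal sketch of one of these ideas (not the full implementation from the course), a lexical environment can be modeled as a record of bindings with a link to its parent scope, and a closure is simply a function value that captures the environment it was defined in:

    // Minimal sketch of a lexical environment (scope) with a parent link.
    class Environment {
      constructor(record = {}, parent = null) {
        this.record = record;   // variable bindings of this scope
        this.parent = parent;   // enclosing (parent) environment
      }
      define(name, value) {
        this.record[name] = value;
        return value;
      }
      lookup(name) {
        if (name in this.record) return this.record[name];
        if (this.parent !== null) return this.parent.lookup(name);
        throw new ReferenceError(`Variable "${name}" is not defined.`);
      }
    }

    // Identifier resolution walks the parent chain -- exactly what closures rely on:
    const globalEnv = new Environment({ x: 10 });
    const fnEnv = new Environment({ y: 32 }, globalEnv);
    console.log(fnEnv.lookup('x') + fnEnv.lookup('y')); // 42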

You may find more info and enroll in the course on the class page.

Building a Transpiler from scratch

Once you have implemented the interpreter runtime manually, it is worth exploring the high-level compilation process covered by transpilers.


In this class we build a concurrent, process-based programming language by compiling it down to JavaScript.

By the end of this class students understand how the translation pipeline works at the higher (AST) level. You will learn how code generation works, understand the semantics of concurrent processes, and compile your language to JS.

You may find more info and enroll in the course on the class page.

Building a Parser from scratch

Once you have a fully working runtime of a programming language, it is a good time to think about ergonomic syntax.

Yes, the syntax and runtime of a language often deeply intersect and depend on each other; however, for the same runtime you may have multiple syntaxes. This also enables the concept of syntactic sugar, i.e. inventing ergonomic syntactic constructs which delegate to the same runtime semantics.
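
As a small, hypothetical sketch of the idea, a sugared form can be rewritten into core constructs before evaluation, so the runtime itself stays unchanged:

    // Hypothetical sketch: desugar (inc x) into the core form (set x (+ x 1)),
    // so the runtime only needs to implement `set` and `+`.
    function desugar(exp) {
      if (Array.isArray(exp) && exp[0] === 'inc') {
        const [, name] = exp;
        return ['set', name, ['+', name, 1]];
      }
      return exp;
    }

    console.log(desugar(['inc', 'counter']));
    // -> ['set', 'counter', ['+', 'counter', 1]]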


In this second course we build a manual Recursive-descent parser, also from scratch. You will learn how to construct Abstract Syntax Tree (AST) nodes, how the Tokenizer and the parsing process work, and also see different AST formats. For the implementation we again use JavaScript.
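
For example, the tokenizer produces a flat stream of tokens which the parser then assembles into nested AST nodes; the node shape below is just one possible format:

    // One possible AST format for the expression: 2 + 3;
    // (the tokenizer would first emit the tokens: NUMBER(2), '+', NUMBER(3), ';')
    const ast = {
      type: 'BinaryExpression',
      operator: '+',
      left:  { type: 'NumericLiteral', value: 2 },
      right: { type: 'NumericLiteral', value: 3 },
    };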

Again, you may find more info and enroll in the course on the class page.

In addition, the Interpreter and the Parser classes may be taken as the Programming Language Bundle.

Building a Typechecker from scratch

On top of your dynamic programming language, you may consider adding a static typechecker module.

Untyped programs are often prone to errors and may fail at runtime. Adding a type checker can improve the robustness of your programs, as well as their documentation and ergonomics.


In this course we build a Typechecker for the Eva programming language, also from scratch. You will learn about type systems and Type theory, see the difference between static and dynamic type checking, and get familiar with checking and inference algorithms. For the implementation we use JavaScript.
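
As a tiny sketch of what such a checker does (the algorithms in the course are, of course, richer), type checking a binary expression boils down to inferring the operand types and verifying that they agree:

    // Minimal sketch: infer the type of an addition node in an Eva-like AST.
    function typeOf(exp) {
      if (typeof exp === 'number') return 'number';
      if (typeof exp === 'string') return 'string';
      if (Array.isArray(exp) && exp[0] === '+') {
        const [, left, right] = exp;
        const t1 = typeOf(left);
        const t2 = typeOf(right);
        if (t1 !== t2) {
          throw new TypeError(`Type mismatch: ${t1} + ${t2}`);
        }
        return t1;
      }
      throw new Error('Unknown expression.');
    }

    console.log(typeOf(['+', 1, 2])); // 'number'
    // typeOf(['+', 1, 'hello'])      // would throw: Type mismatch: number + string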

You may find more info and enroll in the course on the class page.

Building a Virtual Machine from scratch

By this point you should already have a fully working programming language with ergonomic syntax, and understand how languages work at the higher level. It is now time to move on to production-level VMs, improving performance and understanding low-level semantics.


In this course we build a Stack-based Virtual Machine, from scratch. Basic C++ experience and the Interpreters course are required as prerequisites for this class. However, we do not use overly specific C++ constructs, and the code should be easily transferable to other languages.

Students will learn the concepts of bytecode, the compilation process, and lower-level interpretation. Stack- and heap-allocated values, as well as Garbage Collection, are also covered. As a result we implement a VM similar in semantics to Python, JavaScript, and other languages – with full support for functional programming, closures, and OOP.
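
To illustrate the core idea of a stack machine (a sketch only; the VM in the course is implemented in C++ with a real bytecode format), evaluating 2 + 3 is a matter of pushing the operands and executing an add instruction:

    // Hypothetical bytecode for the expression 2 + 3:
    const code = [
      ['PUSH', 2],
      ['PUSH', 3],
      ['ADD'],
    ];

    // A tiny stack-machine loop executing it:
    const stack = [];
    for (const [op, arg] of code) {
      if (op === 'PUSH') stack.push(arg);
      if (op === 'ADD')  stack.push(stack.pop() + stack.pop());
    }
    console.log(stack.pop()); // 5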

You may find more info and enroll in the course on the class page.

In addition, the runtime classes can be taken as the Interpreters and Virtual Machines bundle.

Programming Language with LLVM

After implementing a custom low-level VM, it is worth adopting a production-level compiler infrastructure: LLVM.


In this course we compile our programming language down to LLVM IR. Basic C++ experience and the Interpreters course are required as prerequisites for this class. However, we do not use overly specific C++ constructs, and the code should be easily transferable to other languages.

Students will learn the concepts of an LLVM frontend, the IR, and lower-level compilation. Details of stack- and heap-allocated values, as well as Garbage Collection, are also covered. As a result you will understand how Clang, Rust, and other compilers work.
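
As a rough sketch of what emitting textual LLVM IR can look like (illustrative only; the course works with the real LLVM infrastructure in C++ rather than ad-hoc strings), the IR for a simple squaring function could be produced like this:

    // Hypothetical sketch: emit textual LLVM IR for a square(x) function.
    function emitSquare() {
      return [
        'define i32 @square(i32 %x) {',
        '  %result = mul i32 %x, %x',
        '  ret i32 %result',
        '}',
      ].join('\n');
    }

    console.log(emitSquare());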

You may find more info and enroll in the course on the class page.

By the end of these practical courses you should have a fully working programming language. The following classes may be taken in any order to extend the learned concepts and to understand the topics more deeply.

Parsing Algorithms

This course focuses on the theory of parsing, going deeper into the automated LL and LR parsers. The class combines theory and practice.


Students will learn and understand the full algorithms of LL(1), LALR(1), and other parsers, and will also implement a parser for a full programming language using the Syntax parser generator tool. For the implementation we use JavaScript.

You may find more info and enroll in the course on the class page.

Garbage Collection Algorithms

In the Virtual Machines class we implemented a Mark-Sweep garbage collector. Now it is a good time to go deeper into GC algorithms and learn the theory and implementation strategies behind them.


Students will learn all the major GC techniques used in practice today. The class is mainly theoretical, explaining the concepts of Tracing and Direct collectors, and covering Generational and Region-based collectors, the Tri-color abstraction, and other topics. The specific algorithms of the Mark-Sweep, Mark-Compact, Reference Counting, Copying, Immix, and G1 collectors are considered.
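
To make the tracing idea concrete, here is a minimal sketch of the mark phase of a Mark-Sweep collector over a plain JavaScript object graph (illustrative only; the course covers the full algorithms and their trade-offs):

    // Minimal sketch of the "mark" phase: traverse everything reachable
    // from the roots and flag it as live; unmarked objects are garbage.
    function mark(roots) {
      const live = new Set();
      const worklist = [...roots];
      while (worklist.length > 0) {
        const obj = worklist.pop();
        if (obj === null || typeof obj !== 'object' || live.has(obj)) continue;
        live.add(obj);
        for (const child of Object.values(obj)) worklist.push(child);
      }
      return live; // the sweep phase would reclaim everything not in this set
    }

    const b = { name: 'b' };
    const a = { name: 'a', child: b };
    const unreachable = { name: 'lost' };   // not connected to any root
    console.log(mark([a]).size);            // 2 -- `unreachable` is not marked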

The course will be interesting to any curious programmer and is also recommended for professional compiler engineers.

You can find more info and enroll in the course on the class page.

Automata Theory: inside a RegExp machine

This theoretical course is devoted to the formalism of Automata, covering the State machines used to implement the concept of Regular Expressions.


The concepts of NFAs (Non-deterministic Finite Automata) and DFAs (Deterministic Finite Automata) are considered. These machines are also used to implement tokenizer modules in the parsing process. In addition, the topic of DFA minimization is described, and an implementation skeleton is presented showing how to build a RegExp machine.
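
As a small sketch of the end result, a DFA is just a transition table plus an acceptance check. The machine below (hypothetical and built by hand rather than derived from a regexp) recognizes the language of the pattern ab*:

    // Hand-built DFA for the regexp ab*: state 1 --a--> state 2 --b--> state 2.
    const dfa = {
      start: 1,
      accepting: new Set([2]),
      table: {
        1: { a: 2 },
        2: { b: 2 },
      },
    };

    function matches(machine, string) {
      let state = machine.start;
      for (const ch of string) {
        state = (machine.table[state] || {})[ch];
        if (state === undefined) return false;
      }
      return machine.accepting.has(state);
    }

    console.log(matches(dfa, 'abbb')); // true
    console.log(matches(dfa, 'ba'));   // false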

Again, you can find more info and enroll in the course on the class page.

Recommended literature

The following classic and practical books are recommended for further reading:

  • “Compilers: Principles, Techniques, and Tools” (aka the “Dragon Book”), Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman
  • “The Garbage Collection Handbook: The Art of Automatic Memory Management”, Richard Jones, Antony Hosking, and Eliot Moss
  • “Parsing Techniques: A Practical Guide”, Dick Grune and Ceriel J.H. Jacobs
  • “Structure and Interpretation of Computer Programs” (SICP), Harold Abelson and Gerald Jay Sussman
  • “Programming Languages: Application and Interpretation” (PLAI), Shriram Krishnamurthi
  • “Types and Programming Languages”, Benjamin C. Pierce

I hope you find the journey into the world of compilers, virtual machines, and programming languages interesting and engaging — and that these courses serve as great learning material for the programming language you invent yourself.

Thanks, and see you in the class!

Sincerely,
Dmitry Soshnikov

Building a Virtual Machine for Programming Language


Course overview

How do programming languages work under the hood? What’s the difference between a compiler and an interpreter? What are a virtual machine and a JIT compiler? And what about the difference between functional and imperative programming?

There are so many questions when it comes to implementing a programming language!

The problem with “compiler classes” in school is that such classes are usually presented as “hardcore rocket science” meant only for advanced engineers.

Moreover, classic compiler books start with the least significant topic, such as lexical analysis, and go straight into the theoretical aspects of formal grammars. By the time the first Tokenizer module is implemented, students simply lose interest in the topic and never get a chance to start implementing the programming language itself. And all of this is spread over a whole semester of wrestling with tokenizers and BNF grammars, without ever reaching the actual semantics of programming languages.

I believe we should be able to build and understand the full semantics of a programming language, end-to-end, in 4-6 hours — with content that goes straight to the point, shown in live coding sessions as pair programming, and described in a comprehensible way.

In the Building a Virtual Machine class we focus specifically on runtime semantics, and build a stack-based VM for a programming language very similar to JavaScript or Python. Working closely with the bytecode level, you will understand how lower-level interpretation works in production VMs today.

Implementing a programming language will also make your practical use of other programming languages more professional.


Building a Parser from scratch


Course overview

Parsing, or syntactic analysis, is one of the first stages in designing and implementing a compiler. A well-designed syntax is a big part of why users would prefer and choose exactly your language.

Note: this is a practical class on building a manual Recursive-descent parser. If you’re interested in parsing theory and automated algorithms, you may also consider the [ Parsing Algorithms ] class.

Recursive descent parsers are a group of parsers widely used in practice in many production programming languages. In contrast to automated parsing algorithms, a manual implementation gives full control over the parsing process and allows handling complex constructs which may not be possible in automatic parsers.

Besides, implementing a full manual parser from scratch lets you see and understand this process from the inside, demystifying its internal structures and turning parser building into an interesting engineering task.

In the Building a Parser from scratch class we dive into a purely practical implementation, building and learning different aspects of parsers.

In this class you will learn the concept of Recursive descent parsing, understand what a Tokenizer is and how it cooperates with the Parser module, learn what an Abstract Syntax Tree (AST) is and how to produce different AST formats, see what “lookahead” and predictive parsing are, and eventually build a parser for a full programming language similar to Java or JavaScript.
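
The “lookahead” mentioned above is simply the next, not-yet-consumed token which the parser peeks at to decide which production to apply. A minimal sketch (with a hypothetical token shape) looks like this:

    // Minimal sketch of predictive parsing with a single lookahead token.
    // The token shape { type, value } is hypothetical.
    function parsePrimary(tokens) {
      const lookahead = tokens[0]; // peek at the next token, do not consume yet
      switch (lookahead.type) {
        case 'NUMBER':
          tokens.shift(); // consume the token
          return { type: 'NumericLiteral', value: Number(lookahead.value) };
        case 'STRING':
          tokens.shift();
          return { type: 'StringLiteral', value: lookahead.value };
        default:
          throw new SyntaxError(`Unexpected token: ${lookahead.type}`);
      }
    }

    console.log(parsePrimary([{ type: 'NUMBER', value: '42' }]));
    // -> { type: 'NumericLiteral', value: 42 }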

Implementing a parser will also make your practical use of other programming languages more professional.


Parsing Algorithms


Course overview

Parsing, or syntactic analysis, is one of the first stages in designing and implementing a compiler. A well-designed syntax is a big part of why users would prefer and choose exactly your language.

Note: this is a class on the theory of parsers and parsing algorithms. If you’re interested in a practical, manual parsing class, you may also consider the [ Building a Parser from scratch ] class, where we build a Recursive descent parser.

Follow the Hacker news thread for details.


The problem with “parser theory” in classic compiler schools and books is that this theory is often considered “too advanced”, going straight into complicated formal descriptions from the Theory of Computation and formal grammars. As a result, students may lose interest in building a compiler already at the parsing stage.

The opposite problem, often seen when describing parsers, is a superficial approach that covers only manual (usually recursive descent) parsing, leaving students struggling to understand the actual techniques behind automated parsers.

I believe this deep dive into parsing theory should be combined with a hands-on approach that goes in parallel and lets you see all the learned theoretical material in practice.

In the Essentials of Parsing (aka Parsing Algorithms) class we dive into different aspects of parsing theory, describing the LL and LR parsers in detail. At the same time, to make the learning process easy and fun, we build in parallel an automatic parser for a full programming language, similar to JavaScript or Python, from scratch.

After this class you will not only be able to use a parser generator to build parsers for programming languages, but will also understand how parser generators themselves work under the hood.

Implementing a parser for a programming language will also make your practical use of other programming languages more professional.


Building an Interpreter from scratch


Course overview

How do programming languages work under the hood? What’s the difference between a compiler and an interpreter? What are a virtual machine and a JIT compiler? And what about the difference between functional and imperative programming?

There are so many questions when it comes to implementing a programming language!

The problem with “compiler classes” in school is that they are usually presented as “hardcore rocket science” meant only for advanced engineers.

Moreover, classic compiler books start with the least significant topic, such as lexical analysis, and go straight into the theoretical aspects of formal grammars. By the time the first Tokenizer module is implemented, students simply lose interest in the topic and never get a chance to start implementing the programming language itself. And all of this is spread over a whole semester of wrestling with tokenizers and BNF grammars, without ever reaching the actual semantics of programming languages.

I believe we should be able to build and understand the full semantics of a programming language, end-to-end, in 4-6 hours — with content that goes straight to the point, shown in live coding sessions as pair programming, and described in a comprehensible way.

In the Essentials of Interpretation class we focus specifically on runtime semantics, and build an interpreter for a programming language very similar to JavaScript or Python.

Implementing a programming language will also make your practical use of other programming languages more professional.


Automata Theory: inside a RegExp machine

Course overview


State machines are a fundamental concept used today in many practical applications: from UI programming in React, automated reply systems, and lexical analysis in parsers and formal language theory (i.e. RegExp machines), up to real-life use cases such as simple traffic lights, vending machines, and others.
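
As a toy illustration of the concept (for intuition only; the course works with the formal NFA and DFA models), a traffic light can be modeled as a state machine with three states and a single “next” transition:

    // A toy finite-state machine: a traffic light cycling through its states.
    const transitions = {
      green:  'yellow',
      yellow: 'red',
      red:    'green',
    };

    let state = 'green';
    for (let i = 0; i < 4; i++) {
      console.log(state);
      state = transitions[state]; // take the single "next" transition
    }
    // Output: green, yellow, red, green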

State machines are backed by the larger theoretical field of computer science known as the Theory of Computation, and by its direct theoretical model — Automata Theory.

In this class we study Automata Theory through the practical example of implementing a Regular Expressions machine.

See also: the Essentials of Garbage Collectors class, devoted to automatic memory management.


Garbage Collection Algorithms


Course overview

Memory leaks and dangling pointers are the main issues of manual memory management. You delete a parent node in a linked list, forgetting to delete all its children first — and your memory leaks. You delete an object chain in the correct order — but suddenly your program crashes, since you forgot about the second owner of this resource, which now tries to dereference a null pointer.
