in Courses

Essentials of Garbage Collectors

Memory leaks and dangling pointers are the main issues of the manual memory management. You delete a parent node in a linked list, forgetting to delete all its children first — and your memory is leaking. You delete an object chain in correct order — but suddenly your program crashes since you forgot about second owner of this resource, which now tries to dereference a null-pointer.

To avoid these issues, most of the modern high-level programming languages implement automatic memory management. You allocate objects manually, however don’t bother with their deallocation: a special program, garbage collector, knows how to automatically deallocate them correctly, and reclaim for future reuse.

In the Essentials of Garbage Collectors class we study all different techniques and algorithms related to the automatic memory management, which are used today on practice.

See also: Automata Theory. Building a RegExp machine class devoted to state machines, and regular expressions.

You can watch preview lectures, and also enroll to the full course, covering all the aspects of the memory management in the animated and live-annotated format. See details below what is in the course.

This class is on Christmas promotion now for a limited time. Use coupon link for a great discount.

First of all, for compiler engineers.

In implementing your programming language, there is a very high chance you’ll need to implement a garbage collector. Even languages which initially were positioned as “memory-safe”, such as Rust, eventually implemented automatic reference counting (ARC) and other collectors.

To reiterate: in most of the modern high-level programming languages, a garbage collector module (or multiple GC modules, like in Java) is pretty much a requirement today.

If you are not a compiler engineer in your day-to-day work — then this class can still be interesting for you. Implementing a garbage collector or a memory manager in general, is a pretty advanced engineering task.

And it’s a simple trick: you take some complex project (such as a garbage collector, compiler, interpreter, etc), and implement it from scratch in a language you’re familiar with. And while building it, you learn all different data structures and algorithms. And then come back to “every-day programming”, improved as a better engineer, with the transferable generic knowledge of complex systems.

In general, GC algorithms are the abstract concept, which can be applied on other, higher-level, data-structures (and not only on lower-level heap manipulations). For example, one can automatically clean up some collection or storage with specific postponed objects deletion. So, this knowledge is fully transferable to other systems.

Not really! Of course, C and C++ are probably the best languages for raw memory manipulations and fit well here, however in the course we study generic design algorithms and focus mainly on theoretical aspects of garbage collectors and memory allocators. This means you can implement them in any language you want. For example, you can allocate an ArrayBuffer in JavaScript for a virtual heap, or similarly bytearray in Python, Rust, etc.

Most of the algorithms in the course are described in generic pseudo-code, so you can port them to any language.

The main features of these lectures are:

  • Concise and straight to the point. Each lecture is self-sufficient, concise, and describes information directly related to the topic, not distracting on unrelated materials or talks.
  • Animated presentation combined with live-editing notes. This makes understanding of the topics easier, and shows how (and when at time) the object structures are connected. Static slides simply don’t work for a complex content.

The course is divided into three parts, in total of 17 lectures, and many sub-topics in each lecture. Below is the table of contents and curriculum.

In this part we discuss the history of memory allocation, see how and when the objects are created, and how they become a garbage. We talk about specific issues, and types of the garbage, discuss in detail memory layout. In addition we implement a memory allocator, similar to malloc function.

  • Lecture 1: Allocation types

    • Static, Stack, Heap allocations
    • Compile vs. Runtime allocation
    • Fixed size vs. Dynamic size allocation
    • Recursion
    • Stack frame
    • Stack overflow
    • Memory leaks, dangling pointers
  • Lecture 2: Manual memory management

    • Memory leaks: forget to deallocate
    • Dangling pointers: deallocate too early
    • Human factor in memory management
  • Lecture 3: Object header

    • Memory alignment
    • Object header
    • Collector meta-information
  • Lecture 4: Virtual memory and Memory layout

    • Virtual address space
    • Kernel space vs. User space
    • Code segment, data segment
    • Stack and stack pointer
    • Heap and break pointer
    • Memory mapping
    • Memory management unit (MMU)
    • Translation lookaside buffer (TLB)
  • Lecture 5: Mutator, Allocator, Collector

    • Life cycle of a GC’ed program
    • Mutator: user program
    • Allocator: memory allocation
    • Collector: garbage collector
    • Mutator vs. Collector object graph view
    • GC pause
    • “Stop the World” state
    • System calls to invoke GC from Mutator
  • Lecture 6: Allocators: Free-list vs. Sequential

    • Sequential (aka “Bump”) allocator
    • Free-list allocator
    • First fit, Next-fit, Best-fit searches
    • Segregated list
    • Blocks coalescing and splitting
    • Lab session to implement malloc and free
  • Lecture 7: Semantic vs. Syntactic garbage

    • Semantic: objects that will not be reached
    • Syntactic: objects that cannot be reached

This part is devoted to actual garbage collection algorithms. We discuss tracing and direct collectors, moving to detailed description of the main four collectors: Mark-Sweep, Mark-Compact, Reference counting, and Copying collector.

  • Lecture 8: Tracing vs. Direct collectors

    • Tracing: trace for alive objects
    • Direct: directly identify the garbage
    • Main four collection algorithms
  • Lecture 9: Mark-Sweep collector

    • Mark phase: objects trace
    • Sweep phase: reclamation
    • Detailed algorithm description with code example
    • Non-moving collector
    • Free-list allocation
    • Heap fragmentation
  • Lecture 10: Mark-Compact collector

    • Mark phase: objects trace
    • Compact phase: objects relocation
    • Fixing fragmentation issue
    • Moving collector
    • Forwarding address
    • Lisp 2 algorithm detailed description
    • Sequential allocator
  • Lecture 11: Copying collector

    • Semi-space collector
    • Heap reservation
    • Forwarding address
    • Objects evacuation
    • Fixing child pointers
    • Bump-allocator
    • Detailed algorithm description with code example
  • Lecture 12: Reference counting collector

    • Directly identify the garbage
    • Strong vs. Weak references
    • Deeply integrated into Mutator
    • Barrier pointer operations
    • Pauses on deep nested structures removal
    • Cyclic references
    • Detailed algorithm description with code example

In the advanced part we talk about optimization techniques to reduce GC pauses, and improve GC performance. We discuss Generational, Parallel, Incremental, and Concurrent collectors, describing Tri-color marking and GC barriers.

  • Lecture 13: Generational collector

    • Heap partitioning
    • Young vs. Old generations
    • Minor vs. Major collection cycles
    • Objects promotion
    • Intergenerational pointers
    • Write barrier
  • New! Lecture 14: Mark-Region CG: Immix collector

    • Mark-Region collector
    • Mark-Sweep + Copying + Heap partitioning
    • Immix collector
    • Heap segregation: Blocks and Lines
    • Free, Recyclable, Occupied blocks
    • Bump-allocation in recyclable blocks
    • Sweep-to-region technique
    • Heap defragmentation
    • Evacuation from Source to Destination blocks
  • Lecture 15: Parallel, Incremental, Concurrent GC

    • “Stop the World” state
    • Multiple collection threads
    • Collector interruption by Mutator
    • Concurrently running collection thread
    • Synchronization barriers
  • Lecture 16: Tri-color abstraction

    • Black, Gray, and White objects
    • Collection time slices
    • Illegal BW-pointers
    • Legal BW-pointers
    • GC barriers
  • Lecture 17: GC barriers

    • Illegal BW-pointers
    • Read and Write barriers
    • Snapshot-at-Beginning
    • Incremental update (Dijkstra and Steel’s)
    • Baker’s Copying method
    • Brook’s method
    • Final notes

I hope you’ll enjoy the class, and will be glad to discuss any questions and suggestion in comments.

Write a Comment

Comment

  1. Awesome!

    Have been tracking this class since you published the first lecture. Question: are you planning to create reading materials for this class?

    Thanks for the great content!

  2. watched your lecture on Generational GC (on youtube) – clarified my mind around garbage collectors in Java.

    thank you, your website is a great source of information

  3. nice work,sir, thank you
    this site is a gem for advanced content, especially on javascript

  4. Thanks a lot! Have purchased it. Hope to provide feedback soon.

  5. @Jeremy Chen, thanks for the feedback, and yes, there will be an accompanying book for this class.

    @Alex Levine, xian_bao, thanks guys, glad to see the materials are useful.

    @Srikanth Suresh, awesome, thanks! Look forward for your feedback on the course page.

  6. Just saw your post on HN. I am happy to buy the course but I don’t like video format? Is there a ebook or a transcript that I can buy? if you can give your email, I can directly send you payment.

  7. @Ram, thanks for the feedback. Currently all the lectures have full captions/transcripts available, which I manually edited, and verified for correctness, so they directly reflect the contents. In addition, I’m planning later a companion book for this course with similar 16 chapters.