Essentials of interpretation. Checkpoint: part 1

This article describes in some details our interpreter which we have created during the course Essentials of interpretation. We summarize intermediate results and the main parts of the evaluator making notes which were omitted in the code articles.

Those of you who actively followed the code articles on GitHub will find this summary as a good repetition of the passed material which will help to improve the knowledge and understanding. Others, who just have joined the topic, may first read this article and already after that analyze the code articles in detail (though, the vice-versa variant is also possible).

During this stage of the course I was glad to see several forks of the project’s repository and to analyze interesting implementations and solutions for the exercises. Besides, I received some questions and alternative solutions via e-mail.

OK, so let’s summarize our small language. It’s already turned out powerful enough since supports things such as first-class functions and closures, thus making it easy to use closure pattern for OOP.

We all know the eval function in different languages which evaluates the arbitrary code passed to it. Exactly the processes of evaluation (that is, retrieving the value of an expression) eval‘s short name reflects. We used the full name of this core process and call the function as evaluate.

The main task of the evaluate is (not surprisingly) to evaluate a passed expression. That is, to obtain the value which corresponds to the result of the expression.

In the most top level, the expression is the whole program itself. However, while we go recursively deeper into the program parts, we evaluate its sub-expressions which are functions, declarations of variables, conditional expressions, etc.

And in essence evaluate is a very simple and primitive function. It just determines the type of an expression and according to this type executes corresponding handler procedure which knows how to evaluate this particular expression. That’s it. Seriously, there is no any other “magic” in interpretation of a programming language.

The abstract signature of this core function is:

function evaluate(expression, environment) {
  
  if (expression is of "Foo" type)
    execute Foo type handler
  
  else if (expression is of "Bar" type)
    then execute Bar type handler
  
  // etc.
  
}

The environment parameter passes the environment (which stores variables and functions) in which the expression is evaluated.

Let’s see the real definition of our evaluate function (not just "Foo" and "Bar" abstract examples) from the lesson 7:

function evaluate(expression, environment) {

  environment || (environment = TheGlobalEnvironment);

  if (isSelfEvaluating(expression))
    return expression;

  if (isVariable(expression))
    return lookupVariableValue(expression, environment);

  if (isDefinition(expression))
    return evalDefinition(expression, environment);

  if (isAssignment(expression))
    return evalAssignment(expression, environment);

  if (isLambda(expression))
    return makeFunction(
      getLambdaParameters(expression),
      getLambdaBody(expression),
      environment
    );

  if (isBlock(expression))
    return evalSequence(
      getBlockActions(expression),
      environment
    );

  if (isIf(expression))
    return evalIf(expression, environment);

  if (isCond(expression))
    return evaluate(
      condToIf(expression),
      environment
    );

  if (isApplication(expression))
    return apply(
      evaluate(getOperator(expression), environment),
      getArgumentValues(getArguments(expression), environment)
    );

}

In this part we’ll talk only about first four expression types: self-evaluating, variables look up and also variables definition and assignment. We leave functions and other expressions for the next part.

Our evaluator (or interpreter — these are synonyms in this case) works with so-called AST (Abstract syntax tree) format.

AST format is usually simplified and convenient to recursively traverse the AST via the evaluator. It also helps us not to bother with some syntactic complexities of a concrete syntax. The concrete syntax though is processed by parsers which produce as their result exactly the AST. We considered example of a parser in lessons 2 and 3.

As we said the process of concrete syntax parsing in general isn’t related with the process of interpretation (the parsing is just a previous stage of it; we may have any concrete syntax which can be transformed into our AST), so in the course we work exactly with interpretation of AST and not parsers.

We have chosen a very simple expressions format and use JavaScript arrays for it. The first element of this array is exactly the expression type which is called the type-tag and other elements are already depended on this type:

[<type-tag>, ... other fields ...]

For example:

["define", "x", 10]

The expression above is the variable definition. Its type-tag (define) directly says about it and we also see the name of the variable — x and its value, 10.

We have a common type-tag tester function which is used by all highly-abstracted predicates. It’s called isTaggedList and accepts the type-tag with the expression and checks whether the expression is of the passed type:

function isTaggedList(tag, expression) {
  return expression[0] == tag;
}

The function is again very simple and straightforward but exactly it is used during the whole cycle of evaluation, and exactly it helps to route the evaluator to the needed handler procedure.

Below we describe main expression types in our language and their related helper procedures (such as highly-abstracted type-tag testers, expression chunks splitters, etc).

This is, for example, how a higher abstracted predicate of an expression type checking may look like:

function isDefinition(expression) {
  return isTaggedList("define", expression);
}

The function above as it follows from its name checks whether an expression is the variable definition. A logical question may follow — why do we need at all these wrappers? Why not just simply to test something like this:

if (expression[0] == "define") {
  // execute variable definition handler
}

The answer is also simple: abstracting such checks into separate functions, we give ourselves the ability not to make commitment about again the exact format of the AST. If later, we will want to change this exact form of some expression, we have to change only related with it procedures. Moreover, abstracted predicates allow us to process some expressions individually, e.g. having alternative type-tags:

if (isDefinition(expression)) {
  // execute variable definition handler
}

function isDefinition(expression) {
  return isTaggedList("define", expression) || isTaggedList("var", expression);
}

So, having armed with this information, let’s see which expressions we have in our language.

This is the simplest type of expressions. Self-evaluating expressions do not require additional evaluation, but directly return their values as a result of evaluation.

In the language we have only two types of self-evaluating expressions: numbers and strings.

Thus, strings in contrast with other languages are quoted with only one single quote (it’s quite easy to do, since expression parts are separated being elements of the expression array and quoted with double quotes at JavaScript level). And numbers are the same as in other languages.

Examples:

10
'hello-world

Their predicate is very simple:

function isSelfEvaluating(expression) {
  return !isNaN(expression) ||
    (typeof expression == "string" && expression[0] == "'");
}

// examples

isSelfEvaluating(10); // true
isSelfEvaluating("'hello-world"); // true

And the eval‘s part related with this type of expression:

// the simplest expression type,
// doesn't require additional evaluation

if (isSelfEvaluating(expression))
  return expression;

OK, let’s move forward. Variables.

Variables are represented by their names and values. Names are specially formed identifiers and values can be different: numbers, strings, user-defined function, built-in functions, etc.

Variables are stored in so-called environments. Every peace of evaluation is proceeded in some environment. And the eval must look up variables in the passed environment to find their values.

Let’s see on the predicate function:

function isVariable(expression) {
  return /^([a-zA-Z0-9_$+*/\-?!=><])+$/.test(expression);
}

The function checks whether an expression is the name of a variable. Examples: x, null?, +, etc.

Any value can be assigned (bound) and reassigned (rebound) to the name. This is why association of a variable name and its value is called a binding.

We represent environments as simple JavaScript objects, so the look up and assignment are done in very simple manner — checking whether a property corresponding to the variable name exists in the object and if so — we simply get its value.

There is a special global environment which is stored in the TheGlobalEnvironment object and which exists in the single exemplar and is available prior the program execution. The global environment is prepopulated with some built-in bindings, among which e.g. math-functions, such as addition, multiplication, print function, etc.

var TheGlobalEnvironment = {

  "print": function (/*arguments*/) {
    return console.log.apply(console, arguments);
  },

  "+": function (/*arguments*/) {
    return [].reduce.call(arguments, function(a, b) {
      return a + b;
    });
  },

  "null?": function (a) {
    return a === null;
  },

  "VERSION": 0.1,

  "null": null,
  "true": true,
  "false": false
  
  // etc. -- any built-in binding
  
};

The lookup procedure is also simple enough:

function lookupVariableValue(variable, environment) {
  if (variable in environment) {
    return environment[variable];
  }
  throw ReferenceError('"' + variable + '" is not defined');
}

And the part of the eval related with variables lookup accordingly is:

if (isVariable(expression))
  return lookupVariableValue(variable, environment);

Examples:

isVariable("print"); // true
lookupVariableValue("print", TheGlobalEnvironment); // "print" built-in function

isVariable("x"); // true
lookupVariableValue("x", TheGlobalEnvironment); // Error: "x" is not defined

As example above shows, sometimes a variable may not be defined. This leads us to the ability of variable definition, that is extending environments with new bindings at runtime.

The simplest definition is the definition of a variable. The expression (AST node) has the format:

["define", <name>, <value>]

From evaluate procedure, it’s dispatched with this code:

if (isDefinition(expression))
  return evalDefinition(expression, environment);

Its predicate is:

function isDefinition(expression) {
  return isTaggedList("define", expression);
}

And its handler procedure, evalDefinition must recursively call evaluate to compute the value to be associated with the variable. The environment must be modified to change (or create) the binding of the variable:

function evalDefinition(expression, environment) {
  return defineVariable(
    getDefinitionName(expression),
    evaluate(getDefinitionValue(expression), environment),
    environment
  );
}

The main part which creates the needed binding in the environment is the defineVariable function and is quite simple again:

function defineVariable(variable, value, environment) {
  return environment[variable] = value;
}

Since the definition is already a complex expression, we have separator procedures which get specific parts of an expression. Thus, in this particular case we have getDefinitionName and getDefinitionValue getter functions.

The former stratifies the name of the variable in this case:

function getDefinitionName(expression) {
  if (isVariable(expression[1]))
    return expression[1];

  // else name of a function
  return expression[1][0];

}

The functions is a little bit complicated, since we use define for both: definition of simple variables and also for definitions of functions. The signature of a variable definition was already shown above:

["define", <variable-name>, <variable-value>]

And for functions it has the form:

["define", [<funсtion-name>, ... arguments ...], <funсtion-body>]

So getDefinitionName considers both of them. Examples:

// variable definition expression
var varDefExp = ["define", "x", 10];
getDefinitionName(varDefExp); // "x"

// function definition expression
var funcDefExp = ["define", ["square", "x"], ["*", "x", "x"]];
getDefinitionName(funcDefExp); // "square"

Its companion, getDefinitionValue function, separates the “raw” (i.e. without evaluation yet) value of the definition:

function getDefinitionValue(expression) {
  // a simple variable
  if (isVariable(expression[1]))
    return expression[2];

  // else, it's the definition of a function,
  var parameters = expression[1].slice(1);
  var body = expression.slice(2);
  return makeLambda(parameters, body);
}

It again takes into account both: variables and functions, and has separated logic for this. We don’t touch getting values of user-defined functions in this part, so we skip this makeLambda part for now, and test the function with only simple variables.

Notice, in evalDefinition above we wrap the result of getDefinitionValue into the recursive evaluate again — to obtain the real value from the “raw” definition value. It can be seen on the example of assigning to a variable the value of another variable:

var yVarDefExp = ["define", "y", 10];

// defines in the global environment the
// variable "y" with the value 10
evalDefinition(yVarDefExp, TheGlobalEnvironment);

// and now let's show what happens inside "evalDefinition";
// define variable "x" with the value of "y"

var xVarDefExp = ["define", "x", "y"];

var xName = getDefinitionName(xVarDefExp); // "x"
var xRawValue = getDefinitionValue(xVarDefExp); // "y"
var xRealValue = evaluate(xRawValue, TheGlobalEnvironment); // 10

defineVariable(xName, xRealValue, TheGlobalEnvironment);

So the evaluate in the example above receiving the y expression recognizes it as a variable name and returns its value looking up in the passed environment.

Besides the definition, we may also assign a new value to the variable. For this we have assignment expression.

The part of the assignment in the eval looks like:

if (isAssignment(expression))
  return evalAssignment(expression, environment);

Its predicate part is trivial again:

function isAssignment(expression) {
  return isTaggedList("set!", expression);
}

Notice the exclamation mark in the name of the type-tag. In general, we use variable names ending with ! for functions which modifies the original data. Assignment to a variable — set!, does exactly this. Also we use “question mark” names, such as e.g. null? for predicates, that is for functions which return boolean value.

The set! node has similar to define format, but it’s easier, since shouldn’t consider complex case with function definition — it just assign a passed value to the variable:

["set!", <variable-name>, <variable-value>]

The handler procedure of assignment is similar to definition handler:

function evalAssignment(expression, environment) {
  return setVariableValue(
    getAssignmentName(expression),
    evaluate(getAssignmentValue(expression), environment),
    environment
  );
}

Thus its getters are also the same:

// gets the name of variable from
// the assignment expression
function getAssignmentName(expression) {
  return expression[1];
}

// gets the "raw" value (without evaluation yet)
// of assignment expression
function getAssignmentValue(expression) {
  return expression[2];
}

Taking x and y from variable definition example above, we see the following work of the getters:

var assignExp = ["set!", "x", "y"];
var assignExpName = getAssignmentName(assignExp); // "x"
var assignRawValue = getAssignmentValue(assignExp); // "y"

// and evaluate again already gets real value
evaluate(assignRawValue, TheGlobalEnvironment); // 10

And inside the evalAssignment function these stratified parts are already passed to the setVariableValue function which does the main job:

function setVariableValue(variable, value, environment) {

  if (!(variable in environment))
    throw ReferenceError('"' + variable + '" is not defined');

  do {
    // if found the frame, exit
    if (environment.hasOwnProperty(variable)) break;
    // else continue to search in parent environments
  } while (environment = Object.getPrototypeOf(environment));

  return environment[variable] = value;

}

Let’s consider it in some details since it’s a bit complicated.

First, we again check whether the variable is defined in the environment, and if it’s not, then we throw the exception about it.

Secondly, in contrast with definition, assignment may modify some parent environment (whereas the definition always creates a binding in the own environment). To see a simple concrete example, think of a (possibly inner) JavaScript function:

// global "x"
var x = 10;

function foo() {
  var y = 20;
  // update global "x"
  x = 100;
}

To be able to resolve variable x, the function foo should have access to the parent, global environment. And exactly x from the global environment should be updated, but not the local x should be created.

The same is in our interpreter. Later we’ll describe functions in detail, but for now just mention, that every function, when is called creates new fresh environment (a new scope) to bind its arguments to parameter names and also to bind local variables.

Thus, this fresh activation environment extends the parent environment. If it’s a global function, then its activation environment extends the global environment. If it’s an inner function, then it extends the parent frame — that is, the activation environment of the parent function.

Therefore functions form something that we may call a scope chain. These are chained environment frames where every frame corresponds to its scope.

To extend parent environment frame, we use native JavaScript inheritance. Function Object.create helps us with it.

So if a function updates a variable from some parent frame (in this case such variables are called “free variables“), we should first find the parent frame where the variable is actually defined. This is for we use do-while loop above in the setVariableValue function.

Let’s manually describe this case:

// global variable "x"
defineVariable("x", 10, TheGlobalEnvironment);

// create inner environment which inherits
// from the global environment
var newEnvironment = Object.create(TheGlobalEnvironment);

// and define a local variable "y" on it
defineVariable("y", 20, newEnvironment);

// thus, assignment to "y" updates the local frame
setVariableValue("y", 200, newEnvironment);
console.log(newEnvironment); // {y: 200}

// and assignment to "x" evaluated in the same
// "newEnvironment" modifies the global frame
// since "x" is found exactly there
setVariableValue("x", 100, newEnvironment);
console.log(newEnvironment); // still {y: 200}
console.log(TheGlobalEnvironment); // {x: 100, ...}

The manual algorithm shown above is applied behind the scene at every function activation, which we will consider in the next part of this “checkpoint”.

At this step we discussed the first part of the evaluate (or eval) function. We have seen that eval is a simple case-analysis function which dispatches to the needed expression handlers depending on the expression type.

We considered several expression types: self-evaluating (the simples expressions), variables loop up, and also definition of or assignment to variables. The later three are related with the concept of environments. Thus, assignment also works with the concept of nested environments or, another naming, the scope chains.

In the next part we’ll talk about functions and other expressions (such as if expression, etc) in our language.

Lectures:

Essentials of interpretation:

Written by: Dmitry Soshnikov
Published on: 2011-11-21

Tags: , , , , , ,

 
 
 

8 Comments:

  1. Gravatar of Sandy Sandy
    21. November 2011 at 19:08

    Excellent!!


  2. Gravatar of Robert Polovsky Robert Polovsky
    21. November 2011 at 20:30

    Dmitry, thanks for this summary! I follow all your work and this series is very interesting.

    On your format of AST nodes: I’ve seen that usually a separate class for each AST-node is used with own eval method which knows how to evaluate the result. Do you think such approach is easier for representation?


  3. Gravatar of Dmitry Soshnikov Dmitry Soshnikov
    22. November 2011 at 11:28

    @Sandy, thanks!

    @Robert Polovsky, glad to hear, thanks.

    I’ve seen that usually a separate class for each AST-node is used with own eval method which knows how to evaluate the result.

    Yes, such mostly OOP approach is also very convenient in this case.

    You correctly notice that for each AST-node we may have a separate class with its eval method which already is responsible for exactly its peace of evaluation.

    For example:

    // variable expression class
    
    function Variable(name) {
      this.name = name;
    }
    
    Variable.prototype.eval = function (environment) {
      return this.lookupValue(environment);
    };
    
    Variable.prototype.lookupValue = function (environment) {
      if (this.name in environment) {
        return environment[this.name];
      }
      throw ReferenceError('"' + this.name + '" is not defined');
    };
    
    

    Then our AST could be built from objects of such classes. And main evaluate function when traverses the AST just calls eval on each node in the tree.

    In fact, approach shown in the article does the same, though mostly in procedural way — a separate handler function such as lookupVariableValue (which is lookupValue method in the OOP way).

    So both approaches are OK; the later, with OOP and own eval is near to the visitor pattern.


  4. Gravatar of Alexander Alexander
    22. November 2011 at 21:13

    Great job, Dmitry! Thank you.
    I follow your work also since you’ve posted some articles on javascript-ru

    Is it possible to see this article on Russian language? I couldn’t find it.


  5. Gravatar of Dmitry Soshnikov Dmitry Soshnikov
    23. November 2011 at 12:25

    @Alexander, thanks. Yes, perhaps Russian translation of this article (as well as of the articles from ES5 series) will be available soon also.


  6. Gravatar of Joe Soliko Joe Soliko
    27. November 2011 at 18:40

    Hi Dmitry!

    Great series, thanks! I just started to follow the sources on github and have some questions.

    You show how to evaluate a single expression. But how do we evaluate a whole program? Could you show how a simple program with definition of variables and using functions may look like in your language?

    What is the analog of the following JS program?

    var
        x = 10,
        y = 20;
    
    function square(z) {
        return z * z;
    }
    
    if (square(x) == 100) {
        console.log(x);
    } else {
        console.log(y);
    }

    Should we sequentially run expression by expression to get the result of the program (I specially used square function since it appears in this article)?

    Thanks!


  7. Gravatar of Dmitry Soshnikov Dmitry Soshnikov
    28. November 2011 at 19:11

    @Joe Soliko

    Should we sequentially run expression by expression to get the result of the program

    Yes, absolutely correct. We have to sequentially execute every expression (which possibly affects the environment).

    What is the analog of the following JS program?

    I guess you already have checked some later lessons — since you mention functions, etc.

    (By the way, in those later lessons we also introduce begin expression which is exactly this sequence of expressions)

    You may treat the program as a sequence of expressions and to evaluate each expression to get the result of the program. So the analogue of your JS program in our AST view is:

    ["define", "x", 10]
    
    ["define", "y", 20]
    
    ["define", ["square", "z"],
               ["*", "z", "z"]]
    
    ["if", ["=", ["square", "x"], 100]
           ["print", "x"]
           ["print", "y"]]

    To simplify the tests we also use s-expression format instead of direct AST. The program above directly maps to the s-expressions (the parser of which we discuss in note-2)

    [lisp](define x 10)

    (define y 20)

    (define (square z)
    (* z z))

    (if (= (square x) 100)
    (print x)
    (print y))[/lisp]

    Moreover, having lambda we may even directly translate such JS constructions as:

    (function (x) {
      return x * x;
    })(10);

    Which in our language is (s-expression view):

    [lisp]((lambda (x) (* x x)) 10)[/lisp]

    Lambdas are described in the lesson-6.


  8. Gravatar of Joe Soliko Joe Soliko
    29. November 2011 at 01:00

    Perfect! Thanks for clarifications Dmitry! And immediate lambdas are awesome — you have almost the same semantics as in JavaScript, can’t wait for the next explanations!


Leave a Reply

Code: For code you can use tags [js], [text], [ruby] and other.

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>