Note 4. Two words about “hoisting”.

My recent playground was the toy Scheme interpreter written in CoffeeScript. In the chapter of the metacircular evaluator of the SICP book, there is a section — 4.1.6 Internal Definitions suggesting to solve one interesting and subtle issue related with the (inner) definitions of functions.

That section of the SICP book points out that the following example:

(define (f x)
 
  ; first inner function
  (define (even? n)
    (if (= n 0)
        true
        (odd? (- n 1))))
 
  ; second inner function     
  (define (odd? n)
    (if (= n 0)
        false
        (even? (- n 1)))))

will work only if we call even? or odd? function after their both definitions, since both functions are dependent on each other. This is the main rule — a function should be defined before its usage. However we can’t call even? right after its definition even if it’s already defined. And the reason is — the mentioned dependency.

At the same time, our perception is that the scope of both functions is the whole body of function f (as if both inner functions are defined simultaneously). From this viewpoint, we should be able to call a function from any place of the f‘s body.

However, if these two definitions are not simultaneous but sequential (that is, the one follows another), then the odd? will be placed into the scope object only at the time it’s defined at runtime evaluation. And in this case we cannot call even? function right after its definition, i.e. before the definition of the odd? function. That is, we get the “issue” that “not from any place a function can be called in its scope”.

One of the techniques to solve this issue is to create all internal definitions before the execution of the code. It of course can be done manually and explicitly by the programmer, but also this case can be handled implicitly by the system. And exactly this model is used in ECMAScript — that known for us the entering the context stage at which variable object (or the variable environment in ES5) is filled with function declarations, variables and function arguments. After the all data of the execution context are ready, the next stage — code execution can be applied.

Let’s rewrite the example in JS to make it easier to understand for those who not familiar with Lisp syntax (actually we may not use the function f, but define isEven and isOdd in the global context):

function isEven(n) {
  if (n == 0) {
    return true;
  }
  return isOdd(n - 1);
}

alert(isEven(2)); // true

function isOdd(n) {
  if (n == 0) {
    return false;
  }
  return isEven(n - 1);
}

We won’t discuss now whether it’s a good solution to test for parity/odd (this example is mostly academical), but we see that it correctly alerts true for testing whether value 2 is even. Thus, repeat again, isEven is dependent on the function isOdd, which in turn is dependent back on the isEven. This sort of dependency is called a mutual recursion. And exactly this is one of the reasons of creating all the data “on entering the context”.

In order not to repeat each time this long “there are two stages of the code handling: (1) entering the context, where all the data are created, and (2) the code execution stage”, often the term “hoisting” is used to describe this behavior (though, it’s not a standard term). The fact that a function is already available at the moment of the code execution brings the picture, where the function looks like it was hoisted to the top of the context’s code.

The example:

foo(); // undefined ("foo" and "bar" exist)

function foo() {
  alert(bar);
}

var bar;

visually if to accept the concept of hoisting transforms into:

var bar;

function foo() {
  alert(bar);
}

foo(); // undefined (now we see that they exist)

Another “hoisting” reason is an optimization.

Consider e.g. the following example:

var n = 100;

while (n-- > 0) {
  var foo = n;
}

Obviously, in the system with “hoisting” all var‘s are again lifted up to the top and therefore there is no variable definition on every iteration of the while loop:

var n = 100;
var foo;

while (n-- > 0) {
  foo = n;
}

However, this case can be optimized even in the system without hoisting. E.g. in the same toy Scheme interpreter mentioned above — on evaluating a variable definition, there can be a check, whether the variable already exists in the environment, and if it is so, we do not create the variable, but just update its value. Though, if it had been the system with “hoisting”, even this check wouldn’t be needed.

So the main rule that a function (or a variable) should be defined before its usage — in the system with hoisting is done implicitly and in the system without hoisting — should be done explicitly by the programmer.

On the other hand, mutual recursion case is not so frequent and probably it’s not enough to “satisfy” the bugs which are related with “hoisting”. Because of these bugs some implementations try to avoid the “hoisting” in the language completely.

For example:

var a = 10;

(function () {
  alert(a); // undefined ?
  var a = 20;
})();

This code results undefined which may confuse some non-experienced programmers. One could thought that the alert‘s result should be 10, which seems quite logical from the “top-bottom” code viewing. But it’s not 10, since we do have “hoisting” here. However, it’s not even 20, since only the definition of the variable is “hoisted”. And the assignment, as we know, is already made at the code execution stage. This code can be transformed into:

var a = 10;

(function () {

  // shadow global "a" and
  // initialize it with (default)
  // "undefined" value

  var a;
 
  alert(a); // undefined, now we see why
  a = 20;

})();

There is another strategy of the complete lexical (static) scope where the variable a from above would be taken from the global scope and no local a is created. This strategy is used e.g. in CoffeeScript:

a = 10

foo = ->
  alert([a, b]); # 10, undefined
  a = 20
  b = 30

foo()

This Coffee’s code compiles to the following JS code (you may check it in Try CoffeeScript console):

var a, foo; // definitions are "hoisted" explicitly

a = 10;

foo = function () {
  var b; // only "b" variable is local
 
  alert([a, b]); // 10, undefined
 
  a = 20; // and a is from global scope
  return b = 30;

};

foo();

And we see that a is always taken (only if it’s not a formal parameter name) from the global scope. It can be arguable whether such a strategy is also very convenient — after all we should remember names of all already declared variables in the parent scopes (although it doesn’t touch the variable from the other scripts, since the whole complied code is wrapped into the IIFE). But such a strategy is chosen — and mainly because there is no special keyword for declaring a variable; it’s done automatically via assignment.

Backing to hoisting — we see that CoffeeScript (CS) avoids hoisting of function declarations. Functions there are compiled into the function expressions (with assigning an anonymous function to a variable) which are created at code execution stage (in other words, functions are not “hoisted”). However, vars are also hoisted and it turns out that functions are nevertheless hoisted in the Coffee, thought, not the functions themselves, but “their variables”:

alert foo # undefined, "foo" is available

foo = -> alert 1 # and here the function is assigned to the variable

foo() # 1

However, CS elegantly avoids the issue with calling of the “declared below” function. We can’t call foo before it becomes a “real” function (which though also means that we cannot call mutually recursive isEven function right after its definition). To see the problem of a hoisted FD in action, we can imagine the case with two separate JS-sources and with using of some concatenation tool to make a single file from these two (it’s quite a normal practice in the web — a single obfuscated file). Thus, in the first file (let’s call it a.js) we have the following function declaration and its usage:

// a.js

function foo() {
  alert(1);
}

foo(); // 2 ?

Why does it alerts 2? We tested the code before and saw that it alerted 1. But it was before the concatenation with the b.js:

// b.js

function foo() {
  alert(2);
}

The issue is that the last found function declaration is hoisted to the top of the whole (currently concatenated) source. Of course here the naming conflicts issue also takes place (and if a programmer combines several files, he should consider it), however if we would used function expressions, we wouldn’t get this error:

// a.js

var foo = function () {
  alert(1);
};

foo(); // 1

// b.js

var foo = function () {
  alert(2);
};

This is the reason why we often see (e.g. in Node.js sources and in CoffeeScript’s compilation) the usage of FEs instead of FDs. Although, ubiquitous such a usage of FEs may also look a bit strange. FEs nevertheless ideologically are for exactly using functions as expressions, e.g. passing as an anonymous “funarg” to another higher-order function, not polluting the outer scope with the named declaration and creating these functions literally at runtime. If you work in your own project in which code you are completely sure, you may easily use FDs to define simple helper functions (because ideologically FDs purpose is exactly this — they are casual helper procedures). However, the described above issue can take place, so it’s completely in yours, as a programmer, responsibility which way to choose and which way is more convenient.

So what do we have at the end? Is “hoisting” a good or bad feature? There is no exact answer. The main thing is that we, as programmers, should be aware about such a technique and understand the reasons of why it can be useful. Two main reasons of why this technique can be needed we have described here. These are: mutual recursion and optimization.

By the way, recently I raised this topic on Twitter and also mentioned as one of the reasons the mutual recursion. Brendan Eich also gave an acknowledgment that FDs hoisting is “for mutual recursion & generally to avoid painful bottom-up ML-like order”.

So you may choose any way of building your application, just know that such an ability (and the feature) currently is in ECMAScript and know how to handle it correctly.

P.S.:

Yeah, about that example with mutual recursion of isEven and isOdd. Don’t use such an approach in current ECMAScript. As I said, the example was only academical. The thing is that currently ES doesn’t support proper tail calls (or tail recursion).

It’s the way, when the recursive process (that is, the process which completely uses the call stack) is transformed into the iterative process (that is, the recursion is transformed into a simple loop without allocating a new call-stack frame, i.e. closer to ES — without entering a new execution context).

The call:

isEven(1000000); // error: "to much recursion"

of the function from above will fail with the error “to much recursion” since, repeat, there is no tail calls in JS yet (though, there are some special code transformations to implement such calls).

However, ES6 (aka Harmony) should have this optimization. But in non-strict mode of ES5 (or ES3) it was hard to implement tail calls.

Currently, the simplest, straightforward and practical way of the isEven and isOdd is:

function isEven(n) {
  return n % 2 == 0;
}

function isOdd(n) {
  return !isEven(n); // or just "n % 2 != 0"
}


Written by: Dmitry Soshnikov
Published on: 2011-02-07

Tags: , , , , ,

 
 
 

9 Comments:

  1. Gravatar of Ilya Ilya
    7. February 2011 at 22:04

    In Issues and Avoiding hoisting paragraphs you’ve wrote that hoisting like

    var a = 10;
     
    (function () {
      alert(a); // undefined ?
      var a = 20;
    })();
    

    can be avoided with code like

    var a, foo; // definitions are "hoisted" explicitly
     
    a = 10;
     
    foo = function () {
      var b; // only "b" variable is local
     
      alert([a, b]); // 10, undefined
     
      a = 20; // and a is from global scope
      return b = 30;
     
    };
     
    foo();
    

    But it seems that these examples have one big distinction – in the second case a variable is not local inside the function. In other words it is the same if I’ve wrote

    var a = 10;
    
    (function(){
        alert(a);
    
        // var a = 20; // no local a-variable
    })();
    

    So, it seems that the main thing to avoid unwanted hoisting is to try to avoid any “hidden”, “buried” variables declarations.
    The best way is to declare all variables in the top the scope.


  2. Gravatar of Ron Marco Ron Marco
    7. February 2011 at 22:57

    Thank you, Dmitry!

    Your articles are just priceless!

    Ron


  3. Gravatar of joseanpg joseanpg
    7. February 2011 at 23:17

    As always … a great analysis!


  4. Gravatar of Dmitry A. Soshnikov Dmitry A. Soshnikov
    8. February 2011 at 12:54

    @Ilya

    But it seems that these examples have one big distinction – in the second case a variable is not local inside the function

    Yes, that’s right. The example explains CoffeeScript’s behavior of the completely static scoping, that is, where the variable appears in the source code, in that scope it will be resolved. The exception of this rule in Coffee is a formal parameter — if it has the same name, it will be resolved in the local scope of the function of course.

    And this approach is shown only to describe an ad-hoc solution of the a is undefined as in the system with hoisting.

    Actually, CoffeeScript (being compilted into JavaScript) also has a hoisting of variables. And this can be seen on the same variable b of the example.

    So, it seems that the main thing to avoid unwanted hoisting is to try to avoid any “hidden”, “buried” variables declarations.

    The best way is to declare all variables in the top the scope.

    I wouldn’t say that all variable should be declared in the global scope — this is exactly the bad practice.

    As also said, hoisting is not bad by itself (since have sensible reasons). Backing to Coffee’s completely static scoping, as I mentioned, it also can be arguable whether it’s a very convenient way — we should remember names of all variables from parent scopes — for to not override them from local scopes.

    So the main goal is to be aware about hoisting feature in the language. And moreover, there is a simple and straightforward way to avoid it explicitly — watch out for your code and declare vars before their usage. That’s it.

    @Ron Marco, @joseanpg

    Yep, thanks guys!

    Dmitry.


  5. Gravatar of Ilya Ilya
    8. February 2011 at 16:07

    @Dmitry

    Sure, I’m completely agree with you – it’s a bad idea to declare all variables in the global scope. When I’ve wrote The best way is to declare all variables in the top the scope., I’ve just ment the top of the local scope.

    Anyway, I just can’t see the practical application of hoisting. May be the reason is that I’m using ECMAscript (javascript, actionscript) mostly for web-animation…

    Nevertheless, thanks a lot for this article. The main thing I’ve learned was the place-dependent (if I can call it so) nature of the ECMA.

    BR,
    Ilya


  6. Gravatar of Dmitry A. Soshnikov Dmitry A. Soshnikov
    8. February 2011 at 16:19

    @Ilya

    I’ve just ment the top of the local scope.

    Ah, I see. “One-var” pattern. Yes, it’s useful in ECMAScript and helpful for parser.

    On the other hand, if the context’s code is too long, for e.g. loops’ counter, I prefer to declare variable “in-place”, i.e.:

    var 
        data = [],
        otherThing = 10,
        third = 3;
    
    /* many code */
    
    for (var k = data.length; k--;) {
      /* handling */
    }

    We of course may define k also at the top and reuse this variable name for all loops, but probably it’s just a habit to define it in-loop.

    Anyway, I just can’t see the practical application of hoisting.

    I’m sure you mean practical application of using vars before their definition and not the “hoisting” concept itself (i.e. defining all data before execution).

    As mentioned, it’s a good for optimization. However, yes, I also think that normal “top-bottom” viewing and declaration is better and avoids many bugs.

    Let’s the system behind the scene uses hoisting (i.e. creates/optimizes data before running) and the programmer explicitly just will put all the data before their usage.

    Dmitry.


  7. Gravatar of gniavaj gniavaj
    17. March 2011 at 18:56

    hi Dmitry~
    your articles are very greate and i have learned a lot from them.
    i recently read the Regular Expression of ECMAScript,but i did’t understand it very well.
    e.g.

    /((a)|(ab))((c)|(bc))/.exec("abc")

    i don’t understabd why the result is ["abc", "a", "a", undefined, "bc", undefined, "bc"].
    could you write an article about the Regular Expression of ECMAScript?
    i will be very greatful if you do that!


  8. Gravatar of michaelficarra michaelficarra
    21. March 2011 at 21:01

    Better isEven and isOdd functions:

    function isEven(n) {
      return !(+n&1);
    }
    
    function isOdd(n) {
      return !!(+n&1);
    }
    

    Parity checking of integers on a computer should be done using the bitwise-and operator instead of the modulo operator. Modulo is not a nearly computation-free operation like bitwise-and. Please replace your example to promote good practices.


  9. Gravatar of orldly orldly
    15. July 2012 at 20:00

    Great article!


Leave a Reply

Code: For code you can use tags [js], [text], [ruby] and other.

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>