ECMA-262-3 in detail. Chapter 5. Functions.


In this article we take a closer look at one of the fundamental ECMAScript objects: functions. In particular, we will go through the different types of functions, see how each type affects the variable object of a context, and what the scope chain of each type contains. We will also answer questions that come up frequently on forums, such as: “is there any difference (and if so, what is it) between functions created as follows:

var foo = function () {
  ...
};
 

and functions defined in the “usual” way?”:

function foo() {
  ...
}
 

Or, “why does the function in the following call have to be surrounded with parentheses?”:

(function () {
  ...
})();
 

Because this article builds on earlier chapters, it is worth reading Chapter 2. Variable object and Chapter 4. Scope chain first if needed, since we will actively use terminology from these chapters.

But let us take things one at a time. We begin with the types of functions.

In total, ECMAScript has three function types, and each of them has its own features.

Function Declaration (abbreviated FD) is a function which:

  • has an obligatory name;
  • is positioned in the source code either at the Program level or directly in the body of another function (FunctionBody);
  • is created on entering the context stage;
  • influences the variable object;
  • and is declared in the following form:
function exampleFunc() {
  ...
}
 

The main feature of this type of function is that only such functions influence the variable object (i.e. they are stored in the VO of the context). This leads to the second important point, which follows from the nature of the variable object: at the code execution stage they are already available (since FDs are stored in the VO on entering the context stage).

Example (the function is called before its declaration in the source code):

foo();

function foo() {
  alert('foo');
}
 

Another important point is the second bullet of the definition, the position of a function declaration in the source code:

// function declaration
// is directly in:
// either the global context
// at Program level
function globalFD() {
  // or directly inside the body
  // of another function
  function innerFD() {}
}
 

A function declaration cannot appear in any other position in the code, i.e. it is impossible to declare it, for example, in an expression position or inside a code block.

The alternative to function declarations (one could even say their opposite) is function expressions.

Function Expression (abbreviated FE) is a function which:

  • can only appear in the source code at the expression position;
  • has an optional name;
  • does not influence the variable object;
  • and is created at the code execution stage.

The main feature of this type of function is that it is always at the expression position in the source code. One of the simplest examples of an expression is an assignment expression:

var foo = function () {
  ...
};
 

In this case an anonymous FE is assigned to the “foo” variable. After that the function is available via the “foo” name: foo().
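The difference between the two forms asked about at the start of the article can be sketched as follows. This is a minimal illustration rather than one of the original examples: the names declared, expressed and results are hypothetical, and console.log stands in for alert so that the snippet also runs outside a browser.

```javascript
// Hypothetical names; console.log is used instead of alert.
var results = [];

// Both names are looked up before their definitions in the source.
results.push(typeof declared);   // FD: already created on entering the context
results.push(typeof expressed);  // the variable exists, but the FE is not created yet

function declared() {
  return 'FD';
}

var expressed = function () {
  return 'FE';
};

// After the assignment has executed, the FE is available via the variable.
results.push(typeof expressed);

console.log(results.join(', ')); // function, undefined, function
```

So the FD can be called anywhere in its context, while the FE can be called only after the line that creates it has executed.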

Also, as noted in the definition, an FE can have an optional name:

var foo = function _foo() {
  ...
};

Note that from the outside the FE is accessible via the variable “foo” (foo()), while inside the function (for example, in a recursive call) it is also possible to use the “_foo” name.
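One practical consequence is that a recursive call through the inner name keeps working even if the outer variable is later rebound. A small sketch, assuming a conforming implementation; factorial, fact and saved are hypothetical names, and console.log replaces alert.

```javascript
// Hypothetical example; "fact" is the optional FE name.
var factorial = function fact(n) {
  // the recursion goes through the inner name, not the outer variable
  return n <= 1 ? 1 : n * fact(n - 1);
};

var saved = factorial;
factorial = null; // the outer variable no longer refers to the function

console.log(saved(5));    // 120
console.log(typeof fact); // "undefined": the name is not visible outside
```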

If an FE has a name, it can be difficult to distinguish it from an FD. However, if you know the definition, the distinction becomes simple: an FE is always at the expression position. In the following example we can see various ECMAScript expressions in which the functions are all FEs:

(function foo() {}); // in parentheses (grouping operator) can be only an expression;
[function bar() {}]; // in the array initialiser – also only expressions;
1, function baz() {}; // "comma" also operates with expressions;
 

The definition also says that an FE is created at the code execution stage and is not stored in the variable object. Let’s see this behavior in an example:

// FE is available neither before the definition
// (because it is created at code execution phase),

alert(foo); // "foo" is not defined

(function foo() {});

// nor after, because it is not in the VO

alert(foo);  // "foo" is not defined
 

A fair question arises: what are they needed for at all? The answer is obvious: to use them in expressions without “polluting” the variable object. The simplest example is passing a function as an argument to another function:

function foo(callback) {
  callback();
}

foo(function bar() {
  alert('foo.bar');
});

foo(function baz() {
  alert('foo.baz');
});
 

When a variable stores a reference to the FE, the function remains in memory and is accessible via this variable name (because variables, as we know, influence the VO):

var foo = function () {
  alert('foo');
};

foo();
 

Another example is the creation of an encapsulated scope to hide auxiliary helper data from the external context (in the following example we use an FE which is called right after its creation):

var foo = {};

(function initialize() {

  var x = 10;

  foo.bar = function () {
    alert(x);
  };

})();

foo.bar(); // 10;

alert(x); // "x" is not defined
 

We see that function foo.bar (via its [[Scope]] property) has access to the internal variable “x” of function initialize. And at the same time “x” is not accessible directly from the outside. This strategy is used in many libraries to create “private” data and hide auxiliary entities. Often in this pattern the name of initializing FE is omitted:

(function () {

  // initializing scope

})();
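As a slightly fuller sketch of the same pattern, the following keeps a counter private to the initializing scope. The names counter, count and next are hypothetical and chosen only for illustration; console.log replaces alert.

```javascript
var counter = {};

(function () {

  var count = 0; // visible only inside this function

  // the method keeps access to "count" via its [[Scope]]
  counter.next = function () {
    count += 1;
    return count;
  };

})();

var first = counter.next();
var second = counter.next();

console.log(first, second); // 1 2
console.log(typeof count);  // "undefined": not accessible from the outside
```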
 

Another example is FEs which are created conditionally at the code execution stage and do not pollute the VO:

var foo = 10;

var bar = (foo % 2 == 0
  ? function () { alert(0); }
  : function () { alert(1); }
);

bar(); // 0
 

Let’s go back and answer the question from the beginning of the article: “why is it necessary to surround a function with parentheses when it is called right after its creation?”. The answer follows from the restrictions on the expression statement.

According to the standard, the expression statement (ExpressionStatement) cannot begin with an opening curly brace — { since then it would be indistinguishable from the block, and also the expression statement cannot begin with a function keyword since then it would be indistinguishable from the function declaration. I.e., if we define a function call right after function creation as follows:

function foo() {
  ...
}();
 

the parser will produce a parse error. A statement that begins with the function keyword is treated as a function declaration (which should be created on entering the context), not as a function expression (which should be created at the code execution stage), so the trailing () is parsed separately as a grouping operator with no expression inside. Accordingly, the parser “fails” with an error message.

The simplest way to fix this is to turn the function into an FE, for example with the grouping operator, which can contain only an expression. The parser then treats the code as a function expression (FE), and no ambiguity arises.

Note that in the following example of a function called right after its creation the surrounding parentheses are not required, since the function is already at the expression position and the parser knows that it deals with an FE which should be created at the code execution stage:

var foo = {

  bar: function (x) {
    return x % 2 != 0 ? 'yes' : 'no';
  }(1)

};

alert(foo.bar); // 'yes'
 

As we see, foo.bar is a string and not a function, as it may seem at a careless glance at the code. The function here is used only to initialize this property depending on a conditional parameter; it is created and called right away.

Therefore the complete answer to the question “about parentheses” is the following: parentheses are needed when the function is not at the expression position and we want to call it right after its creation; in that case we manually transform the function into an FE. When the parser already knows that it deals with an FE, parentheses are not required.

Besides the surrounding parentheses, it is possible to use any other way of transforming the function into an FE, for example:

1, function () {
  alert('anonymous function is called');
}();
 

but parentheses are in this case the most elegant way.
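A few other operators that expect an expression as their operand have the same effect. The sketch below is only an illustration of that point; the calls array is a hypothetical helper used to record which functions ran, and console.log replaces alert.

```javascript
var calls = [];

// Each operator below expects an expression as its operand,
// so the function that follows is parsed as an FE and can be called at once.
!function () { calls.push('!'); }();
void function () { calls.push('void'); }();
+function () { calls.push('+'); }();

console.log(calls.join(' ')); // ! void +
```

The value produced by the operator itself is simply discarded here; only the call matters.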

By the way, the grouping operator can surround either the function description alone or the function description together with the call parentheses, i.e. both expressions below are correct FEs:

(function () {})();
(function () {}());
 

In the following example we see code which no implementation (at the time of writing) processes according to the specification:

if (true) {

  function foo() {
    alert(0);
  }

} else {

  function foo() {
    alert(1);
  }

}

foo(); // 1 or 0 ? test in different implementations
 

It should be said that, according to the standard, this syntactic construction is incorrect in general because, as we remember, a function declaration (FD) cannot appear inside a code block (here if and else contain code blocks). As mentioned, an FD can appear only in two places: at the Program level or directly inside the body of another function.

It is incorrect because a code block can contain only statements. The only statement through which a function could appear in a block is the expression statement considered above. But by definition it cannot begin with an opening curly brace (since it would be indistinguishable from a code block) or with the function keyword (since it would be indistinguishable from an FD).

However, in the section on error processing the standard allows implementations to extend the program syntax. Functions in blocks are one such extension. None of the implementations existing today throws an exception in this case; they all process it, but each in its own way.

The presence of if-else branches implies a dynamic choice, i.e. logically it should be a function expression (FE) created dynamically at the code execution stage. However, the majority of implementations simply create a function declaration (FD) here on entering the context stage and use the last declared function. I.e. function foo will show 1 even though the else branch is never executed.

However, the SpiderMonkey (and TraceMonkey) implementation treats this case in two ways: on the one hand it does not consider such functions declarations (i.e. the function is created conditionally at the code execution stage), but on the other hand they are not real function expressions, since they cannot be called without surrounding parentheses (again a parse error, “indistinguishable from FD”) and they are stored in the variable object.

In my opinion SpiderMonkey handles this case correctly by introducing its own intermediate function type (FE + FD). Such functions are created at the right time and according to the conditions, but unlike FEs, and like FDs, they can be called from the outside. SpiderMonkey calls this syntactic extension a Function Statement (abbreviated FS); this terminology is mentioned on MDC. JavaScript inventor Brendan Eich has also noted this type of function provided by the SpiderMonkey implementation.
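Since the behavior of functions in blocks depends on the implementation, a portable way to get a conditionally created function is to use FEs assigned to a variable. The following is a sketch of that approach, not something the standard prescribes; chooseFoo is a hypothetical helper, and the functions return values instead of calling alert.

```javascript
// Hypothetical helper: picks an implementation at run time.
function chooseFoo(condition) {
  var foo;

  if (condition) {
    // FE: created only when this branch executes
    foo = function () { return 0; };
  } else {
    foo = function () { return 1; };
  }

  return foo;
}

console.log(chooseFoo(true)());  // 0
console.log(chooseFoo(false)()); // 1
```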

When an FE has a name (a named function expression, abbreviated NFE), one important feature arises. As we know from the definition (and as we saw in the examples above), function expressions do not influence the variable object of a context, which means they cannot be called by name either before or after the definition. However, an FE can call itself by name in a recursive call:

(function foo(bar) {

  if (bar) {
    return;
  }

  foo(true); // "foo" name is available

})();

// but from the outside, correctly, is not

foo(); // "foo" is not defined
 

Where is the name “foo” stored? In the activation object of foo? No, since nobody has defined any “foo” name inside foo. In the parent variable object of the context which creates foo? Also no, since by definition an FE does not influence the VO, which is exactly what we see when calling foo from the outside. Where then?

Here is how it works. When the interpreter meets a named FE at the code execution stage, before creating the FE it creates an auxiliary special object and adds it to the front of the current scope chain. Then it creates the FE itself, and at this moment (as we know from Chapter 4. Scope chain) the function gets the [[Scope]] property: the scope chain of the context which creates the function (i.e. [[Scope]] contains that special object). After that, a single property is added to the special object: the name of the FE, whose value is a reference to this FE. The last action is removing the special object from the parent scope chain. Let’s see this algorithm in pseudo-code:

specialObject = {};

Scope = specialObject + Scope;

foo = FunctionExpression;
foo.[[Scope]] = Scope;
specialObject.foo = foo; // {DontDelete}, {ReadOnly}

delete Scope[0]; // remove specialObject from the front of scope chain
 

Thus, outside the function this name is not available (since it is not present in the parent scope), but the special object has been saved in the [[Scope]] of the function, and there the name is available.
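The {ReadOnly} attribute from the pseudo-code can be observed from inside the function. This is a minimal sketch, assuming an ES5-or-later engine: in non-strict code the assignment to the FE name is silently ignored, while in strict code it throws a TypeError, hence the try/catch. The names getSelf and inner are hypothetical.

```javascript
var getSelf = function inner() {
  try {
    inner = null; // attempt to overwrite the FE name from inside
  } catch (e) {
    // strict code reports the failed assignment as a TypeError
  }
  return inner; // the name still refers to the function itself
};

console.log(getSelf() === getSelf); // true
console.log(typeof inner);          // "undefined": not visible outside
```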

It should be noted, however, that some implementations, for example Rhino, save this optional name not in the special object but in the activation object of the FE. The implementation from Microsoft, JScript, completely breaks the FE rules: it keeps this name in the parent variable object, so the function becomes available outside.

Concerning implementations, some versions of SpiderMonkey have a feature related to this special object which can be treated as a bug (although everything is implemented according to the standard, so it is more of an editorial defect of the specification). It is related to the mechanism of identifier resolution: the scope chain analysis is two-dimensional, and when resolving an identifier it also considers the prototype chain of every object in the scope chain.

We can see this mechanism in action if we define a property in Object.prototype and refer to a “nonexistent” variable. In the following example, when resolving the name “x” we reach the global object, but the name “x” is not found there either. However, in SpiderMonkey the global object inherits from Object.prototype and, accordingly, the name “x” is resolved there:

Object.prototype.x = 10;

(function () {
  alert(x); // 10
})();
 

Activation objects do not have prototypes. With the same starting conditions, we can see this behavior in an example with an inner function. If we define a local variable “x”, declare an inner function (FD or anonymous FE) and then refer to “x” from the inner function, this variable is resolved in the parent function context (i.e. where it should be and is), and not in Object.prototype:

Object.prototype.x = 10;

function foo() {

  var x = 20;

  // function declaration  

  function bar() {
    alert(x);
  }

  bar(); // 20, from AO(foo)

  // the same with anonymous FE

  (function () {
    alert(x); // 20, also from AO(foo)
  })();

}

foo();
 

Some implementations are an exception, however, and do set a prototype for activation objects. For example, in the Blackberry implementation the value “x” from the above example is resolved to 10. I.e. the lookup does not reach the activation object of foo, since the value is found in Object.prototype:

AO(bar FD or anonymous FE) -> no ->
AO(bar FD or anonymous FE).[[Prototype]] -> yes - 10
 

We can see exactly the same situation in SpiderMonkey in the case of the special object of a named FE. This special object is (by the standard) a usual object, created “as if by expression new Object()”, and accordingly it should inherit from Object.prototype, which is exactly what we see in the SpiderMonkey implementation (up to version 1.7). Other implementations (including the newer TraceMonkey) do not set a prototype for that special object:

function foo() {

  var x = 10;

  (function bar() {

    alert(x); // 20, but not 10, as we don't reach AO(foo)

    // "x" is resolved by the chain:
    // AO(bar) - no -> __specialObject(bar) -> no
    // __specialObject(bar).[[Prototype]] - yes: 20

  })();
}

Object.prototype.x = 20;

foo();
 

The ECMAScript implementation from Microsoft, JScript, which is currently built into Internet Explorer (up to JScript 5.8, IE8), has a number of bugs related to named function expressions (NFE). Each of these bugs completely contradicts the ECMA-262-3 standard; some of them may cause serious errors.

First, JScript breaks the main rule of FEs: that they should not be stored in the variable object by function name. The optional FE name, which should be stored in the special object and be accessible only inside the function itself (and nowhere else), is stored here directly in the parent variable object. Moreover, a named FE is treated in JScript as a function declaration (FD), i.e. it is created on entering the context stage and is available before the definition in the source code:

// FE is available in the variable object
// via optional name before the
// definition like a FD
testNFE();

(function testNFE() {
  alert('testNFE');
});

// and also after the definition
// like FD; optional name is
// in the variable object
testNFE();
 

As we see, a complete violation of the rules.

Secondly, when a named FE is assigned to a variable at declaration, JScript creates two different function objects. It is difficult to call such behavior logical (especially considering that outside of the NFE its name should not be accessible at all):

var foo = function bar() {
  alert('foo');
};

alert(typeof bar); // "function", NFE again in the VO – already mistake

// but, further is more interesting
alert(foo === bar); // false!

foo.x = 10;
alert(bar.x); // undefined

// but both function make
// the same action

foo(); // "foo"
bar(); // "foo"
 

Again we see complete disorder.

However, note that if we describe the NFE separately from the assignment to a variable (for example via the grouping operator), and only after that assign it to a variable, then the equality check returns true, as if it were one object:

(function bar() {});

var foo = bar;

alert(foo === bar); // true

foo.x = 10;
alert(bar.x); // 10
 

This can be explained. Actually, two objects are created again, but only one remains afterwards. If we again consider that the NFE is treated here as a function declaration (FD), then FD bar is created on entering the context stage. After that, at the code execution stage, the second object, function expression (FE) bar, is created and is not saved anywhere. Accordingly, since there is no reference to FE bar, it is removed. Thus there is only one object, FD bar, a reference to which is assigned to the foo variable.

Thirdly, regarding the indirect reference to a function via arguments.callee: it references the object by whose name the function is activated (to be exact, functions, since there are two objects):

var foo = function bar() {

  alert([
    arguments.callee === foo,
    arguments.callee === bar
  ]);

};

foo(); // [true, false]
bar(); // [false, true]
 

Fourthly, as JScript treats an NFE as a usual FD, it is not subject to the rules of conditional operators, i.e. just like an FD, an NFE is created on entering the context and the last definition in the code is used:

var foo = function bar() {
  alert(1);
};

if (false) {

  foo = function bar() {
    alert(2);
  };

}
bar(); // 2
foo(); // 1
 

This behavior can also be “logically” explained. On entering the context stage the last FD met with the name bar is created, i.e. the function with alert(2). After that, at the code execution stage, a new function, FE bar, is created, and a reference to it is assigned to the foo variable. Thus (as the if-block with the false condition is unreachable further in the code), activating foo produces alert(1). The logic is clear, but taking IE bugs into account, I have put the word “logically” in quotes, since such an implementation is obviously broken and depends on JScript bugs.

The fifth NFE bug in JScript is related to the creation of properties of the global object via assigning a value to an unqualified identifier (i.e. without the var keyword). Since an NFE is treated here as an FD and, accordingly, stored in the variable object, an assignment to an unqualified identifier (i.e. not to a variable but to a usual property of the global object) does not create a global property when the function name is the same as the unqualified identifier.

(function () {

  // without var not a variable in the local
  // context, but a property of global object

  foo = function foo() {};

})();

// however from the outside of
// anonymous function, name foo
// is not available

alert(typeof foo); // undefined
 

Again, the “logic” is clear: the function declaration foo gets into the activation object of the local context of the anonymous function on entering the context stage. At the code execution stage the name foo already exists in the AO, i.e. it is treated as local. Accordingly, the assignment simply updates the property foo that already exists in the AO, instead of creating a new property of the global object as it should according to the logic of ECMA-262-3.

Functions created via the Function constructor are separated from FD and FE since this type of function object also has its own features. The main feature is that the [[Scope]] property of such functions contains only the global object:

var x = 10;

function foo() {

  var x = 20;
  var y = 30;

  var bar = new Function('alert(x); alert(y);');

  bar(); // 10, "y" is not defined

}
 

We see that the [[Scope]] of the bar function does not contain the AO of the foo context: the variable “y” is not accessible and the variable “x” is taken from the global context. By the way, note that the Function constructor can be used both with the new keyword and without it; in this case the variants are equivalent.
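The same point can be checked without relying on an error being thrown, by comparing a function built with the Function constructor against a usual FE created in the same context. This is a sketch under the assumption that no global variable named onlyInOuter exists; outer, onlyInOuter, scopes and add are hypothetical names, and console.log replaces alert.

```javascript
function outer() {

  var onlyInOuter = 'local'; // hypothetical local variable

  // [[Scope]] of this function contains only the global object
  var viaConstructor = new Function('return typeof onlyInOuter;');

  // [[Scope]] of a usual FE contains AO(outer)
  var viaExpression = function () { return typeof onlyInOuter; };

  return [viaConstructor(), viaExpression()];
}

var scopes = outer();

console.log(scopes[0]); // "undefined"
console.log(scopes[1]); // "string"

// With and without "new" the constructor behaves the same way.
var add = Function('a', 'b', 'return a + b;');

console.log(add(2, 3)); // 5
```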

The other feature of such functions is related to Equated Grammar Productions and Joined Objects. The specification provides this mechanism as a suggestion for optimization (however, implementations have the right not to use it). For example, if we have an array of 100 elements which is filled with functions in a loop, an implementation can use the mechanism of joined objects. As a result, a single function object can be used for all elements of the array:

var a = [];

for (var k = 0; k < 100; k++) {
  a[k] = function () {}; // possibly, joined objects are used
}
 

But functions created via Function constructor are never joined:

var a = [];

for (var k = 0; k < 100; k++) {
  a[k] = Function(''); // always 100 different functions
}
 

Another example related with joined objects:

function foo() {

  function bar(z) {
    return z * z;
  }

  return bar;
}

var x = foo();
var y = foo();
 

Here the implementation also has the right to join objects x and y (and to use one object) because the functions (including their internal [[Scope]] property) are physically indistinguishable. Therefore, functions created via the Function constructor always require more memory resources.

The pseudo-code of the function creation algorithm (except the steps with joined objects) is described below. This description helps to understand in more detail what function objects are in ECMAScript. The algorithm applies to all function types.

F = new NativeObject();

// property [[Class]] is "Function"
F.[[Class]] = "Function"

// a prototype of a function object
F.[[Prototype]] = Function.prototype

// reference to function itself
// [[Call]] is activated by call expression F()
// and creates a new execution context
F.[[Call]] = <reference to function>

// built in general constructor of objects
// [[Construct]] is activated via "new" keyword
// and exactly it allocates memory for new
// objects; then it calls F.[[Call]]
// to initialize created objects passing as
// this value newly created object
F.[[Construct]] = internalConstructor

// scope chain of the current context
// i.e. context which creates function F
F.[[Scope]] = activeContext.Scope
// if this function is created
// via new Function(...), then
F.[[Scope]] = globalContext.Scope

// quantity of formal parameters
F.length = countParameters

// a prototype of created by F objects
__objectPrototype = new Object();
__objectPrototype.constructor = F // {DontEnum}, is not enumerable in loops
F.prototype = __objectPrototype

return F
 

Note that F.[[Prototype]] is the prototype of the function (constructor), while F.prototype is the prototype of objects created by this function (the terminology is often confused, and in some articles F.prototype is called the “prototype of the constructor”, which is incorrect).
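Several steps of the pseudo-code can be observed from ordinary code. The following sketch assumes an ES5-or-later engine, since Object.getPrototypeOf is not part of ECMA-262-3 and is used here only to read the internal [[Prototype]]; the names F, instance and checks are hypothetical.

```javascript
function F(a, b) {}

var instance = new F();

var checks = {
  // F.[[Prototype]] is Function.prototype
  functionProto: Object.getPrototypeOf(F) === Function.prototype,
  // objects created by F get F.prototype as their [[Prototype]]
  instanceProto: Object.getPrototypeOf(instance) === F.prototype,
  // F.prototype.constructor refers back to F ...
  constructorLink: F.prototype.constructor === F,
  // ... and is not enumerable in loops ({DontEnum})
  constructorEnumerable: F.prototype.propertyIsEnumerable('constructor'),
  // quantity of formal parameters
  length: F.length
};

console.log(checks.functionProto, checks.instanceProto); // true true
console.log(checks.constructorLink);                     // true
console.log(checks.constructorEnumerable);               // false
console.log(checks.length);                              // 2
```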

This article has turned out rather long; however, we will return to functions when we discuss their work as constructors in one of the following chapters about objects and prototypes. As always, I am glad to answer your questions in the comments.


Translated by: Dmitry A. Soshnikov.
Published on: 2010-04-05

Originally written by: Dmitry A. Soshnikov [ru, read »]
Originally published on: 2009-07-08



 
 
 

7 Comments:

  1. joseanpg
    10. April 2010 at 16:36

    This is a very good article. Congratulations Dmitry!


  2. Dmitry A. Soshnikov
    11. April 2010 at 00:52

    @joseanpg

    thanks Jose.


  3. Alejandro Moreno
    13. April 2010 at 21:12

    I *love* this series!

    If you allow me to nit-pick, you left one instance of нет and да:

    AO(bar FD or anonymous FE) -> нет ->
    AO(bar FD or anonymous FE).[[Prototype]] -> да -> 10
    

    And “their” -> “they are” in the following line:
    “(to be exact — functions since their two objects)”

    Thanks again for this fantastic series.


  4. Dmitry A. Soshnikov
    13. April 2010 at 21:40

    @Alejandro Moreno

    Thanks Alejandro, fixed. You can send me all wording corrections which you find via mail.

    Dmitry.


  5. jomaras
    29. April 2010 at 18:04

    Dmitry, great series of articles!

    Just a question – do you have any thoughts on code set as HTML node attribute value.

    e.g.

    if the onclick is a property of a HTML node object, do you have any idea how do they handle this case of setting the property of HTMLNode object to, in this case two statements?

    Do they create a function expression as a wrapper??

    Thanks and keep up the good work!

    Josip


  6. Dmitry A. Soshnikov
    29. April 2010 at 18:53

    @jomaras

    thanks.

    if the onclick is a property of a HTML node object, do you have any idea how do they handle this case of setting the property of HTMLNode object to, in this case two statements?

    Do they create a function expression as a wrapper??

    Actually, this isn’t a part of ECMAScript and relates already to the implementation of the host environment — in this case to the DOM.

    So, to know how exactly some implementation manages this case, we should examine the sources of those implementations.

    But in general, yep, there is nothing else as creating of the corresponding method (based on attributes value, i.e. source code) for the node object.

    For example, in Firefox:

    <input type="text" id="el" onclick="alert(event);" />
    var el = document.getElementById('el');
    
    // function object is built and bound
    // to the el node object
    //
    // function onclick(event) {
    //   alert(event);
    // }
    
    alert(el.onclick);

    So you see, that Gecko engine creates function with the same name “onclick” and even hardcoded by default parameter name — “event” (in IE it is global property). And the body of this function — is the source code taken from the “onclick” attribute. Therefore, we can use name “event” in that attribute value in FF.

    And “onclick” attribute is still string:

    var onClickAttr = el.getAttribute("onclick");
    
    // "string", "alert(event);"
    alert([typeof onClickAttr, onClickAttr]);

    But setting corresponding method doesn’t update in FF “onclick” attribute:

    el.onclick = function (e) {
      alert(e);
    };
    
    // source code of the
    // new function
    alert(el.onclick);
    
    // get again attribute value after that
    onClickAttr = el.getAttribute("onclick");
    
    // still old "alert(event);"
    alert(onClickAttr);

    And setting the attribute doesn’t update “onclick” method. Meanwhile on real clicking on the element after that new event set via attribute — we see reaction of this new source code evaluated:

    el.setAttribute("onclick", "alert(1);");
    
    onClickAttr = el.getAttribute("onclick");
    
    // "alert(1);"
    alert(onClickAttr);
    
    // still old with "alert(e);"
    alert(el.onclick);

    But that just about FF. I.e. it depends on implementation and it depends twice because it is even not implementation of JS but of the host environment.

    In Safari (WebKit) and Chrome (V8) the last alert shows updated via the attribute setting state:

    // function onclick(event) {alert(1);}
    alert(el.onclick);

    IE vice-versa — shows correctly all updates of “onclick” attribute/method, but doesn’t set handler at all if you set it via dynamic attribute setting.

    So, the host environment world has own privileges (and can implement such cases in own manner) and also has own bugs.

    Dmitry.


  7. jomaras
    30. April 2010 at 10:38

    Thanks for the effort and the in-depth answer, it is very helpful.

