in ECMAScript

ECMA-262-3 in detail. Chapter 5. Functions.

Read this article in: Russian, Chinese (version 1, version 2), French.

In this article we will talk about one of the general ECMAScript objects — about functions. In particular, we will go through various types of functions, will define how each type influences variables object of a context and what is contained in the scope chain of each function. We will answer the frequently asked questions such as: “is there any difference (and if there are, what are they?) between functions created as follows:

var foo = function () {
  ...
};
 

from functions defined in a “habitual” way?”:

function foo() {
  ...
}
 

Or, “why in the next call, the function has to be surrounded with parentheses?”:

(function () {
  ...
})();
 

Since these articles relay on earlier chapters, for full understanding of this part it is desirable to read Chatper 2. Variable object and Chapter 4. Scope chain, since we will actively use terminology from these chapters.

But let us give one after another. We begin with consideration of function types.

In ECMAScript there are three function types and each of them has its own features.

A Function Declaration (abbreviated form is FD) is a function which:

  • has a required name;
  • in the source code position it is positioned: either at the Program level or directly in the body of another function (FunctionBody);
  • is created on entering the context stage;
  • influences variable object;
  • and is declared in the following way:
function exampleFunc() {
  ...
}
 

The main feature of this type of functions is that only they influence variable object (they are stored in the VO of the context). This feature defines the second important point (which is a consequence of a variable object nature) — at the code execution stage they are already available (since FD are stored in the VO on entering the context stage — before the execution begins).

Example (function is called before its declaration in the source code position):

foo();
 
function foo() {
  console.log('foo');
}
 

What’s also important is the position at which the funcion is defined in the source code (see the second bullet in the Function declaration definition above):

// function can be declared:
// 1) directly in the global context
function globalFD() {
  // 2) or inside the body
  // of another function
  function innerFD() {}
}
 

These are the only two positions in code where a function may be declared (i.e. it is impossible to declare it in an expression position or inside a code block).

There’s one alternative to function declarations which is called function expressions, which we are about to cover.

A Function Expression (abbreviated form is FE) is a function which:

  • in the source code can only be defined at the expression position;
  • can have an optional name;
  • it’s definition has no effect on variable object;
  • and is created at the code execution stage.

The main feature of this type of functions is that in the source code they are always in the expression position. Here’s a simple example such assignment expression:

var foo = function () {
  ...
};
 

This example shows how an anonymous FE is assigned to foo variable. After that the function is available via foo name — foo().

The definition states that this type of functions can have an optional name:

var foo = function _foo() {
  ...
};

What’s important here to note is that from the outside FE is accessible via variable foofoo(), while from inside the function (for example, in the recursive call), it is also possible to use _foo name.

When a FE is assigned a name it can be difficult to distinguish it from a FD. However, if you know the definition, it is easy to tell them apart: FE is always in the expression position. In the following example we can see various ECMAScript expressions in which all the functions are FE:

// in parentheses (grouping operator) can be only an expression
(function foo() {});

// in the array initialiser – also only expressions
[function bar() {}];

// comma also operates with expressions 
1, function baz() {}; 
 

The definition also states that FE is created at the code execution stage and is not stored in the variable object. Let’s see an example of this behavior:

// FE is not available neither before the definition
// (because it is created at code execution phase),
 
console.log(foo); // "foo" is not defined
 
(function foo() {});
 
// nor after, because it is not in the VO
 
console.log(foo);  // "foo" is not defined
 

The logical question now is why do we need this type of functions at all? The answer is obvious — to use them in expressions and “not pollute” the variables object. This can be demonstrated in passing a function as an argument to another function:

function foo(callback) {
  callback();
}
 
foo(function bar() {
  console.log('foo.bar');
});
 
foo(function baz() {
  console.log('foo.baz');
});
 

In case a FE is assigned to a variable, the function remains stored in memory and can later be accessed via this variable name (because variables as we know influence VO):

var foo = function () {
  console.log('foo');
};
 
foo();
 

Another example is creation of encapsulated scope to hide auxiliary helper data from external context (in the following example we use FE which is called right after creation):

var foo = {};
 
(function initialize() {
 
  var x = 10;
 
  foo.bar = function () {
    console.log(x);
  };
 
})();
 
foo.bar(); // 10;
 
console.log(x); // "x" is not defined
 

We see that function foo.bar (via its [[Scope]] property) has access to the internal variable x of function initialize. And at the same time x is not accessible directly from the outside. This strategy is used in many libraries to create “private” data and hide auxiliary entities. Often in this pattern the name of initializing FE is omitted:

(function () {
 
  // initializing scope
 
})();
 

Here’s another examples of FE which are created conditionally at runtime and do not pollute VO:

var foo = 10;
 
var bar = (foo % 2 == 0
  ? function () { console.log(0); }
  : function () { console.log(1); }
);
 
bar(); // 0
Notice, ES5 standardized bound functions. This type of functions correctly binds the value of this, making it locked in any function invocation.

var boundFn = function () {
  return this.x;
}.bind({x: 10});

boundFn(); // 10
boundFn.call({x: 20}); // still 10

The most usage bound functions find in attaching them as event listeners, or in postponed (setTimeout) functions, that still should operate on some objects as their this value.

You can get more information on the bound functions in the appropriate chapter from ES5 series.

Let’s go back and answer the question from the beginning of the article — “why is it necessary to surround a function in parentheses if we want to call it right from it’s definition”. Here’s an answer to this question: restrictions of the expression statement.

According to the standard, the expression statement (ExpressionStatement) cannot begin with an opening curly brace — { since it would be indistinguishable from the block, and also the expression statement cannot begin with a function keyword since then it would be indistinguishable from the function declaration. I.e., if we try to define an immediately invoked function the following way (starting with a function keyword):

function () {
  ...
}();

// or even with a name

function foo() {
  ...
}();
 

we deal with function declarations, and in both cases a parser will produce a parse error. However, the reasons of these parse errors vary.

If we put such a definition in the global code (i.e. on the Program level), the parser should treat the function as declaration, since it starts with a function keyword. And in first case we get a SyntaxError because of absence of the function’s name (a function declaration as we said should always have a name).

In the second case we do have a name (foo) and the function declaration should be created normally. But it doesn’t since we have another syntax error there — a grouping operator without an expression inside it. Notice, in this case it’s exactly a grouping operator which follows the function declaration, but not the parentheses of a function call! So if we had the following source:

// "foo" is a function declaration
// and is created on entering the context

console.log(foo); // function

function foo(x) {
  console.log(x);
}(1); // and this is just a grouping operator, not a call!

foo(10); // and this is already a call, 10

everything is fine since here we have two syntactic productions — a function declaration and a grouping operator with an expression (1) inside it. The example above is the same as:

// function declaration
function foo(x) {
  console.log(x);
}

// a grouping operator
// with the expression
(1);

// another grouping operator with
// another (function) expression
(function () {});

// also - the expression inside
("foo");

// etc

In case we had such a definition inside a statement, then as we said, there because of ambiguity we would get a syntax error:

if (true) function foo() {console.log(1)}

The construction above by the specification is syntactically incorrect (an expression statement cannot begin with a function keyword), but as we will see below, none of the implementations provide the syntax error, but handle this case, though, every in it’s own manner.

Having all this, how should we tell the parser that what we really want, is to call a function immediately after its creation? The answer is obvious. It’s should be a function expression, and not a function declaration. And the simplest way to create an expression is to use mentioned above grouping operator. Inside it always there is an expression. Thus, the parser distinguishes a code as a function expression (FE) and there is no ambiguity. Such a function will be created during the execution stage, then executed, and then removed (if there are no references to it).

(function foo(x) {
  console.log(x);
})(1); // OK, it's a call, not a grouping operator, 1

In the example above the parentheses at the end (Arguments production) are already call of the function, and not a grouping operator as it was in case of a FD.

Notice, in the following example of the immediate invocation of a function, the surrounding parentheses are not required, since the function is already in the expression position and the parser knows that it deals with a FE which should be created at code execution stage:

var foo = {
 
  bar: function (x) {
    return x % 2 != 0 ? 'yes' : 'no';
  }(1)
 
};
 
console.log(foo.bar); // 'yes'
 

As we see, foo.bar is a string but not a function as can seem at first inattentive glance. The function here is used only for initialization of the property — depending on the conditional parameter — it is created and called right after that.

Therefore, the complete answer to the question “about parentheses” is the following:

Grouping parentheses are needed when a function is not at the expression position and if we want to call it immediately right after its creation — in this case we just manually transform the function to FE.

In case when a parser knows that it deals with a FE, i.e. the function is already at the expression position — the parentheses are not required.

Apart from surrounding parentheses it is possible to use any other way of transformation of a function to FE type. For example:

1, function () {
  console.log('anonymous function is called');
}();

// or this one
!function () {
  console.log('ECMAScript');
}();

// and any other manual
// transformation

...

 

However, grouping parentheses are just the most widespread and the elegant way to do it.

By the way, the grouping operator can surround the function description as without call parentheses, and also including call parentheses. I.e. both expressions below are correct FE:

(function () {})();
(function () {}());
 

The following example shows a code in which none of implementations processes accordingly to the specification:

if (true) {
 
  function foo() {
    console.log(0);
  }
 
} else {
 
  function foo() {
    console.log(1);
  }
 
}
 
foo(); // 1 or 0 ? test in different implementations
 

Here it is necessary to say that according to the standard this syntactic construction in general is incorrect, because as we remember, a function declaration (FD) cannot appear inside a code block (here if and else contain code blocks). As it has been said, FD can appear only in two places: at the Program level or directly inside a body of another function.

The above example is incorrect because the code block can contain only statements. And the only place in which function can appear within a block is one of such statements — the expression statement. But by definition it cannot begin with an opening curly brace (since it is indistinguishable from the code block) or a function keyword (since it is indistinguishable from FD).

However in section of errors processing the standard allows for implementations extensions of program syntax. And one of such extensions can be seen in case of functions which appear in blocks. All implementations existing today do not throw an exception in this case and process it. But every in its own way.

Presence of ifelse branches assumes a choice is being made which of the two function will be defined. Since this decision is to be made at runtime, that implies that a function expression (FE) should be used. However the majority of implementations will simply create both of the function declarations (FD) on entering the context stage, but since both of the functions use the same name, only the last declared function will get called. In this example the function foo shows 1 although the else branch never executes.

However, SpiderMonkey implementation treats this case in two ways: on the one hand it does not consider such functions as declarations (i.e. the function is created on the condition at the code execution stage), but on the other hand they are not real function expressions since they cannot be called without surrounding parentheses (again the parse error — “indistinguishably from FD”) and they are stored in the variable object.

My opinion is that SpiderMonkey handles this case correctly, separating the own middle type of function — (FE + FD). Such functions are correctly created due the time and according to conditions, but also unlike FE, and more like FD, are available to be called from the outside. This syntactic extension SpiderMonkey names as Function Statement (in abbreviated form FS); this terminology is mentioned in MDC. JavaScript inventor Brendan Eich also noticed this type of functions provided by SpiderMonkey implementation.

In case FE has a name (named function expression, in abbreviated form NFE) one important feature arises. As we know from definition (and as we saw in the examples above) function expressions do not influence variable object of a context (this means that it’s impossible to call them by name before or after their definition). However, FE can call itself by name in the recursive call:

(function foo(bar) {
 
  if (bar) {
    return;
  }
 
  foo(true); // "foo" name is available
 
})();
 
// but from the outside, correctly, is not
 
foo(); // "foo" is not defined
 

Where is the name “foo” stored? In the activation object of foo? No, since nobody has defined any “foo” name inside foo function. In the parent variable object of a context which creates foo? Also not, remember the definition — FE does not influence the VO — what is exactly we see when calling foo from the outside. Where then?

Here’s how it works: when the interpreter at the code execution stage meets named FE, before creating FE, it creates auxiliary special object and adds it in front of the current scope chain. Then it creates FE itself at which stage the function gets the [[Scope]] property (as we know from the Chapter 4. Scope chain) — the scope chain of the context which created the function (i.e. in [[Scope]] there is that special object). After that, the name of FE is added to the special object as unique property; value of this property is the reference to the FE. And the last action is removing that special object from the parent scope chain. Let’s see this algorithm on the pseudo-code:

specialObject = {};
 
Scope = specialObject + Scope;
 
foo = new FunctionExpression;
foo.[[Scope]] = Scope;
specialObject.foo = foo; // {DontDelete}, {ReadOnly}
 
delete Scope[0]; // remove specialObject from the front of scope chain
 

Thus, from the outside this function name is not available (since it is not present in parent scope), but special object which has been saved in [[Scope]] of a function and there this name is available.

It is necessary to note however, that some implementations, for example Rhino, save this optional name not in the special object but in the activation object of the FE. Implementation from Microsoft — JScript, completely breaking FE rules, keeps this name in the parent variables object and the function becomes available outside.

Let’s have a look at how different implementations handle this problem. Some versions of SpiderMonkey have one feature related to special object which can be treated as a bug (although all was implemented according to the standard, so it is more of an editorial defect of the specification). It is related to the mechanism of the identifier resolution: the scope chain analysis is two-dimensional and when resolving an identifier it considers the prototype chain of every object in the scope chain as well.

We can see this mechanism in action if we define a property in Object.prototype and use a “nonexistent” variable from the code. In the following example when resolving the name x the global object is reached without finding x. However since in SpiderMonkey the global object inherits from Object.prototype the name x is resolved there:

Object.prototype.x = 10;
 
(function () {
  console.log(x); // 10
})();
 

Activation objects do not have prototypes. With the same start conditions, it is possible to see the same behavior in the example with inner function. If we were to define a local variable x and declare inner function (FD or anonymous FE) and then to reference x from the inner function, this variable would be resolved normally in the parent function context (i.e. there, where it should be and is), instead of in Object.prototype:

Object.prototype.x = 10;
 
function foo() {
 
  var x = 20;
 
  // function declaration  
 
  function bar() {
    console.log(x);
  }
 
  bar(); // 20, from AO(foo)
 
  // the same with anonymous FE
 
  (function () {
    console.log(x); // 20, also from AO(foo)
  })();
 
}
 
foo();
 

Some implementations set a prototype for activation objects, which is an exception compared to most of other implementations. So, in the Blackberry implementation value x from the above example is resolved to 10. I.e. do not reach activation object of foo since value is found in Object.prototype:

AO(bar FD or anonymous FE) -> no ->
AO(bar FD or anonymous FE).[[Prototype]] -> yes - 10
 

And we can see absolutely the same situation in older SpiderMonkey versions (before ES5) in case of special object of a named FE. This special object (by the standard) is a normal object“as if by expression new Object(), and accordingly it should be inherited from Object.prototype, what is exactly what can be seen in SpiderMonkey implementation (but only up to version 1.7). Other implementations (including newer versions of SpiderMonkey) do not set a prototype for that special object:

function foo() {
 
  var x = 10;
 
  (function bar() {
 
    console.log(x); // 20, but not 10, as don't reach AO(foo) 
 
    // "x" is resolved by the chain:
    // AO(bar) - no -> __specialObject(bar) -> no
    // __specialObject(bar).[[Prototype]] - yes: 20
 
  })();
}
 
Object.prototype.x = 20;
 
foo();
 

Notice, in ES5 this behavior was changed and in current Firefox versions the environment which stores the name of the FE does not inherit from Object.prototype.

ECMAScript implementation from Microsoft — JScript which is currently built into Internet Explorer (up to JScript 5.8 — IE8) has a number of bugs related with named function expressions (NFE). Every of these bugs completely contradicts ECMA-262-3 standard; some of them may cause serious errors.

First, JScript in this case breaks the main rule of FE that they should not be stored in the variable object by name of functions. An optional FE name which should be stored in the special object and be accessible only inside the function itself (and nowhere else) here is stored directly in the parent variable object. Moreover, named FE is treated in JScript as the function declaration (FD), i.e. is created on entering the context stage and is available before the definition in the source code:

// FE is available in the variable object
// via optional name before the
// definition like a FD
testNFE();
 
(function testNFE() {
  console.log('testNFE');
});
 
// and also after the definition
// like FD; optional name is
// in the variable object
testNFE();
 

As we see, complete violation of rules.

Secondly, in case of assigning the named FE to a variable at declaration, JScript creates two different function objects. It is difficult to name such behavior as logical (especially considering that outside of NFE its name should not be accessible at all):

var foo = function bar() {
  console.log('foo');
};
 
console.log(typeof bar); // "function", NFE again in the VO – already mistake
 
// but, further is more interesting
console.log(foo === bar); // false!
 
foo.x = 10;
console.log(bar.x); // undefined
 
// but both function make 
// the same action
 
foo(); // "foo"
bar(); // "foo"
 

Again we see the full disorder.

However it is necessary to notice that if to describe NFE separately from assigning to variable (for example via the grouping operator), and only after that to assign it to a variable, then check on equality returns true just like it would be one object:

(function bar() {});
 
var foo = bar;
 
console.log(foo === bar); // true
 
foo.x = 10;
console.log(bar.x); // 10
 

This moment can be explained. Actually, again two objects are created but after that remains, really, only one. If again to consider that NFE here is treated as the function declaration (FD) then on entering the context stage FD bar is created. After that, already at code execution stage the second object — function expression (FE) bar is created and is not saved anywhere. Accordingly, as there is no any reference on FE bar it is removed. Thus there is only one object — FD bar, the reference on which is assigned to foo variable.

Thirdly, regarding the indirect reference to a function via arguments.callee, it references that object with which name a function is activated (to be exact — functions since there are two objects):

var foo = function bar() {
 
  console.log([
    arguments.callee === foo,
    arguments.callee === bar
  ]);
 
};
 
foo(); // [true, false]
bar(); // [false, true]
 

Fourthly, as JScript treats NFE as usual FD, it is not submitted to conditional operators rules, i.e. just like a FD, NFE is created on entering the context and the last definition in a code is used:

var foo = function bar() {
  console.log(1);
};
 
if (false) {
 
  foo = function bar() {
    console.log(2);
  };
 
}
bar(); // 2
foo(); // 1
 

This behavior can also be “logically” explained. On entering the context stage the last met FD with name bar is created, i.e. function with console.log(2). After that, at code execution stage already new function — FE bar is created, the reference on which is assigned to foo variable. Thus (as further in the code the if-block with a condition false is unreachable), foo activation produces console.log(1). The logic is clear, but taking into account IE bugs, I have quoted “logically” word since such implementation is obviously broken and depends on JScript bugs.

And the fifth NFE bug in JScript is related with creation of properties of global object via assigning value to an unqualified identifier (i.e. without var keyword). Since NFE is treated here as FD and, accordingly, stored in the variable object, assignment to unqualified identifier (i.e. not to variable but to usual property of global object) in case when the function name is the same as unqualified identifier, this property does not become global.

(function () {
 
  // without var not a variable in the local
  // context, but a property of global object
 
  foo = function foo() {};
 
})();
 
// however from the outside of
// anonymous function, name foo
// is not available
 
console.log(typeof foo); // undefined
 

Again, the “logic” is clear: the function declaration foo gets to the activation object of a local context of anonymous function on entering the context stage. And at the moment of code execution stage, the name foo already exists in AO, i.e. is treated as local. Accordingly, at assignment operation there is simply an update of already existing in AO property foo, but not creation of new property of global object as should be according to the logic of ECMA-262-3.

This type of function objects is discussed separately from FD and FE since it also has its own features. The main feature is that the [[Scope]] property of such functions contains only global object:

var x = 10;
 
function foo() {
 
  var x = 20;
  var y = 30;
 
  var bar = new Function('console.log(x); console.log(y);');
 
  bar(); // 10, "y" is not defined
 
}
 

We see that the [[Scope]] of bar function does not contain AO of foo context — the variable “y” is not accessible and the variable “x” is taken from the global context. By the way, pay attention, the Function constructor can be used both with new keyword and without it, in this case these variants are equivalent.

The other feature of such functions is related with Equated Grammar Productions and Joined Objects. This mechanism is provided by the specification as suggestion for the optimization (however, implementations have the right not to use such optimization). For example, if we have an array of 100 elements which is filled in a loop with functions, then implementation can use this mechanism of joined objects. As a result only one function object for all elements of an array can be used:

var a = [];
 
for (var k = 0; k < 100; k++) {
  a[k] = function () {}; // possibly, joined objects are used
}
 

But functions created via Function constructor are never joined:

var a = [];
 
for (var k = 0; k < 100; k++) {
  a[k] = Function(''); // always 100 different funcitons
}
 

Another example related with joined objects:

function foo() {
 
  function bar(z) {
    return z * z;
  }
 
  return bar;
}
 
var x = foo();
var y = foo();
 

Here also implementation has the right to join objects x and y (and to use one object) because functions physically (including their internal [[Scope]] property) are not distinguishable. Therefore, the functions created via Function constructor always require more memory resources.

The pseudo-code of function creation algorithm (except steps with joined objects) is described below. This description helps to understand in more detail which function objects exist in ECMAScript. The algorithm is identical for all function types.

F = new NativeObject();
 
// property [[Class]] is "Function"
F.[[Class]] = "Function"
 
// a prototype of a function object
F.[[Prototype]] = Function.prototype
 
// reference to function itself
// [[Call]] is activated by call expression F()
// and creates a new execution context
F.[[Call]] = <reference to function>
 
// built in general constructor of objects
// [[Construct]] is activated via "new" keyword
// and it is the one who allocates memory for new
// objects; then it calls F.[[Call]]
// to initialize created objects passing as
// "this" value newly created object 
F.[[Construct]] = internalConstructor
 
// scope chain of the current context
// i.e. context which creates function F
F.[[Scope]] = activeContext.Scope
// if this functions is created 
// via new Function(...), then
F.[[Scope]] = globalContext.Scope
 
// number of formal parameters
F.length = countParameters
 
// a prototype of created by F objects
__objectPrototype = new Object();
__objectPrototype.constructor = F // {DontEnum}, is not enumerable in loops
F.prototype = __objectPrototype
 
return F
 

Pay attention, F.[[Prototype]] is a prototype of the function (constructor) and F.prototype is a prototype of objects created by this function (because often there is a mess in terminology, and F.prototype in some articles is named as a “prototype of the constructor” that is incorrect).

This article has turned out rather big; however, we will mention functions again when will discuss their work as constructors in one of chapters about objects and prototypes which follow. As always, I am glad to answer your questions in comments.


Translated by: Dmitry A. Soshnikov.
Published on: 2010-04-05

Originally written by: Dmitry A. Soshnikov [ru, read »]
Originally published on: 2009-07-08