Demystify Functions

Part of "Demystify Programming Languages" series

In one of my previous posts I talked about the theoretical point of view on functions. In this post, we will talk about how to implement function from scratch in a programming language.

This post is part of the series: in previous posts, we constructed small languages, which for now can do +, -, define (global scope variables). In this function, we will add function operation which will create a new function. We will add a new type (function) to the list of existing types (symbol, number).

What we will cover?

This is learning exercise, which means that we will implement only limited functionality, for example, we will use dynamic variable resolution instead of lexical scope, we will not talk about recursion or stack overflow error or tail call optimization, we will not support closures yet (this is for the next post), evaluation strategy (we will use call by value for most of the time).

We will implement a function which will work like this:

> (define minus
(function (x y)
(- x y)))
> (minus 2 1)
= 1

e.g. (function ...) returns a function which we assign to a variable (minus) and later we can call it the same way as we can call built-in functions.

Implementation

What does it take to create a function? We need 3 things

• keyword function which serves as a signal that this is expression is function declaration. Other Lisp flavors may use lambda, λ or \ instead.
• list of function arguments
• body of the function

For example:

1;                 function body⤵
2(define minus (function (x y) (- x y)))
3;              arguments⤴

Function invocation will evaluate the body with an environment which will have variables named the same way as arguments e.g.

1(minus 2 1)

is the same as

1evaluate(parse(`(- x y)`), { x: 2, y: 1 });

A function is sub-program (or routine) with some local variables.

Function as value

Function is a value, so we can asign it to variable:

1(define minus (function (x y) (- x y)))

If we can assign it to a variable, it means that we need to represent a function in some way storable in memory. How we will do it?

We can store is as list:

• first will be keyword “function” (tag)
• the second is the list of arguments
• the third is the body of the function

Hm… seems familiar 🤔. We can reuse AST of function as function representation

1const evaluate = (ast, environment = {}) => {
2  // ...
3  // function call handling
4  let [name, first, second] = ast;
5  const numberOfArguments = ast.length - 1;
6  if (name === "+") {
7    // ...
8  } else if (name === "function") {
9    return ast;
10  } else {
11    // ...
12  }
13};

We can detect function like this:

1const isFunction = ast => isList(ast) && ast === "function";

Function call

Let’s add support for function calls. As we discussed earlier function call is just evaluation with aditional local variables:

1const evaluate = (ast, environment = {}) => {
2  // ...
3  if (name === "+") {
4    return evaluate(first, environment) + evaluate(second, environment);
5    //...
6  } else {
7    if (!isFunction(environment[name])) {
8      throw new RuntimeError(`"\${name}" is not a function`);
9    }
10    // take function and destructure it to arguments and body
11    const [_, argumentNames, body] = environment[name];
12    // assume all functions expect 2 arguments
13    const functionEnvironment = {
14      // take current environment
15      ...environment,
16      // add arguments to environment
17      [argumentNames]: evaluate(first, environment),
18      [argumentNames]: evaluate(second, environment)
19    };
20    // pass body and new environment to evaluate
21    return evaluate(body, functionEnvironment);
22  }
23};

That is it. We implemented functions. Now let’s talk about details.

Local variables

Why do they call it local variables? The difference between local and global variables is that global variables are accessible everywhere (once defined), but local variables are only available inside the function.

For example:

> (define z 1)
= 1
> (+ z z)
= 2

1(define minus (function (x y) (- x y)))

As you can see we use x and y variables, that means they are defined (at least inside the function). Now if we try

> (minus 2 1)
= 1
> (+ x y)

it will throw an exception about undefined variables x and y because they don’t exist globally.

Each function has its scope, but it contains all variable from the global scope.

Let’s see on more example:

> (define z 1)
= 1
> (define minuzzz (function (x y) (- (- x y) z)))
> (minuzzz 2 1)
= 0

> (define x 1)
= 1
> (define minus (function (x y) (- x y)))
> (minus 2 1)
= 1

x exists globally and locally. In this case, local version “wins”, this is called variable shadowing (local variable shadows global one).

Dynamic resolution

What would happen if we will do:

> (define getFun
(function (x y)
(function (i j)
(- (+ x y) (+ i j))
)
)
)
> (define fun (getFun 5 4))
> (fun 3 2)

getFun is a function which returns a function. We assign to fun a function returned by getFun (with x and y substituted as 5 and 4 respectively).

I would expect (fun 3 2) to extend to the following expression (- (+ 5 4) (+ 3 2)) or in arithmetic notation ((5 + 4) - (3 + 2)) and evaluate to 4. But instead, it will result in error Can't find "y" variable.... This is because we use “dynamic” resolution, we don’t preserve environments, there is one global environment and one function environment, but to support this case we need to save environment of each function when it was created and pass it around together with the function. The function passed together with an environment called closure, we will implement closures in the next post.

Native functions

Now we can define functions in our language, we see that there is some difference between + and -, for example, and user-defined functions.

+ and - use “native” functions e.g. capability of the underlying platform to perform the actual operation. If we would use assembly language instead of JS it could be some processor-specific instructions, for example:

Three-operand architecture (RISC - PowerPC)

;A:= B+C
lwz r2, [num1]
lwz r3, [num2]

Two-operand architecture (CISC - x86)

;A:=B
mov eax, [num1]
mov ebx, [num2]
;A:=A+B

Functions in environment

Now, when we can store user-created functions in the environment, we can think of storing some of the built-in functions in the environment as well, this way we can simplify code a bit.

We can move +, - functions to the environment, but not define and function. (Think why we can’t.)

By doing so we would be able to remove some code:

1const evaluate = (ast, environment = {}) => {
2  // ...
3  // function call handling
4  let [name, first, second] = ast;
5  const numberOfArguments = ast.length - 1;
6- if (name === "+") {
7-   return evaluate(first, environment) + evaluate(second, environment);
8- } else if (name === "-") {
9-   return evaluate(first, environment) - evaluate(second, environment);
10- } else if (name === "define") {
11+ if (name === "define") {
12    // ...
13    if (
14      environment[first] !== undefined ||
15-     first === "+" ||
16-     first === "-" ||
17      first === "define" ||
18      first === "function"
19    ) {
20      throw new RuntimeError(`Can't redefine "\${first}" variable`);
21    }
22    // ...
23  }
24};

Move functions to environment:

1const defaultEnvironment = {
2  "+": (a, b) => a + b,
3  "-": (a, b) => a - b
4};
5
6const evaluate = (ast, environment = { ...defaultEnvironment }) => {

Add logic to handle function call:

1const evaluate = (ast, environment = { ...defaultEnvironment }) => {
2  // ...
3  if (name === "define") {
4    // ...
5  } else {
6    if (isNativeFunction(environment[name])) {
7      return environment[name](
8        evaluate(first, environment),
9        evaluate(second, environment)
10      );
11    }
12    if (isFunction(environment[name])) {
13      // ...
14    }
15  }
16};

PS

This is just a start for functions. We still need to cover a lot of subjects, but the basic idea is in place.

Source code for this post is here and here.