More About Scheme

apply and eval

The usual way of evaluating expressions is like this:

=> (* 1 2 3)
6
=> (+ 1 2 3)
6
=> (- 1 2 3)
-4

An equivalent way is to use apply:

=> (apply * '(1 2 3))
6
=> (apply + '(1 2 3))
6
=> (apply - '(1 2 3))
-4

(apply f seq) takes a function f and evaluates it using the items in seq as its input. It’s as if f is “cons-ed” onto seq, and then that expression is evaluated.

The eval function takes an entire list an evaluates it:

=> (eval '(* 1 2 3) user-initial-environment)
6
=> (eval '(+ 1 2 3) user-initial-environment)
6
=> (eval '(- 1 2 3) user-initial-environment)
-4

The second parameter to eval is the environment in which the expression will be evaluated, i.e. all the bindings that the expression has access to. The built-in environment variable user-initial-environment is the environment used by the interpreter.

eval is a very powerful function: it is essentially Scheme implemented in Scheme!

The Scope of Names: Static Scoping vs Dynamic Scoping

A variable’s scope is where it is visible. A local variable is a variable whose scope is restricted to a block of code. A nonlocal variable is visible outside of the block in which it was declared. Global variables are a kind of nonlocal variable that can be used anywhere in a program.

Many modern languages are statically scoped (lexically scoped). This means that the scope of a variable can be determined at compile-time before the program runs just by examining the source code. Static scoping helps programmers to read source code and determine what values names are bound to without need to run the code.

Consider this Scheme code:

(define x 1)
(define f (lambda (x) (g 2)))
(define g (lambda (y) (+ x y)))  ;; Which x does this refer to?

Scheme is statically scoped, and so if you evaluate (f 5) the answer is 3 (because the x in g refers to the x whose value is 1). If Scheme were instead dynamically scoped, i.e. if the most recently encountered variable named x was used in g, then the answer would be 7.

It is useful to trace this in some detail. Before (f 5) is called, x was bound to 1 by the first define. When (f 5) is called, the 5 is bound to x in the lambda expression for f. Then (g 2) is called, and the 2 is bound y in the lambda expression for g. So at this point, there are three bound variables: x bound to 1, x bound to 5, and y bound to 2. In gs body expression (+ x y), what value should be used for x? Should it be 1, or should it be 2? Scheme is statically scoped, and so it decides on bindings before the code runs, which means it must use the x bound to 1. However, in a dynamically scoped language (notably many versions of Lisp before Scheme), the most recently bound value of x is used. If Scheme were dynamically scoped, then (f 5) would print 7.

Here’s another example of static scoping, this time from JavaScript:

function big() {
    function sub1() {
        var x = 7;   // hides the x defined in big
        sub2();
    }

    function sub2() {
        var y = x;      // which x does this refer to?
    }
    var x = 3;
    sub1();
}

Javascript is statically scoped, and so the x in sub2 is the x with the value 3 that is defined in big. If Javascript were dynamically scoped, then then x would refer to the most recently bound x at runtime, i.e. the x bound to 7.

Dynamic scoping is an alternative to static scoping that has largely fallen out of favor. Most examples of dynamic scoping occur in older languages, such as APL, SNOBOL, and Lisp (at least early versions). Perl lets you optionally declare variables that follow dynamic scoping rules.

The idea of dynamic scoping is that the meaning of a variable depends upon the value of the most recent variable with the same name in the current function call stack (as opposed to the enclosing block of source code).

Here is one more example that shows the difference between static and dynamic scoping:

const int b = 5;

int foo()
{
   int a = b + 5;  // What is b?
   return a;
}

int bar()
{
   int b = 2;
   return foo();
}

int main()
{
   foo(); // returns 10 for static scoping; 10 for dynamic scoping
   bar(); // returns 10 for static scoping; 7 for dynamic scoping
}

In general, dynamic scoping is unpopular because it makes it harder to reason about the meaning of programs from their source code alone. Under dynamic scoping, you can’t always tell for sure what a variable refers to until the code runs, because the order in which functions are called matters.

Another problem with dynamic scoping is that it exposes the local variables of a function to other functions, thus allowing the possibility that they could be modified.

One good thing about dynamic scoping is that it is easier to implement than static scoping.

Closures

A closure combines two things: a function, and an environment of (variable, value) pairs. The function is allowed to use variables from this environment.

For example, consider this code:

(define f
    (lambda (n)
        (+ n 1)
    )
)

=> (f 5)
1

f is a function, but it’s not a closure.

Now consider this:

(define make-adder
    (lambda (n)
        (lambda (x) (+ n x))  ;; n is outside of the lambda function
    )
)

(define g (make-adder 1))

=> (g 5)
6

g is a closure that includes both a function and a binding for the variable n. By itself, the function (lambda (x) (+ n x)) can’t be evaluated because n is free. But, in g, n is bound to the value 1. The function plus this binding of n is a closure.

More concretely, you can think of a closure as being similar to this:

(define g1
    (let ((n 1))
        (lambda (x)
            (+ n x)
        )
    )
)

=> (g1 5)
6

You can see that g1 not just a function, but a function along with some variable bindings that it needs.

Any programming language that lets you define functions inside functions must deal with the issue of variables the inner function uses that are not defined inside it. Closures are a common solution to this problem.

In Go, for example, you can do this:

package main

import "fmt"

type intFunc func(int) int

func makeAdder(n int) intFunc {
    result := func(x int) int { return x + n }
    return result
}

func main() {
    add2 := makeAdder(2)   // add2 is a closure

    fmt.Println(add2(5))
}

add2 is a closure. The n inside the result function is bound to the value of n passed to makeAdder, and it stays bound for the life of the result function. So even after makeAdder finishes executing, the n in the result is still there and can be used as long as add2 exists.

As an aside, notice how the typing information that Go requires makes the code longer, and more cluttered, than the same code in Scheme. Scheme is dynamically typed, and so doesn’t check for type information at compile time. Instead, it just runs the code and assumes the types are correct. If they aren’t, then there will be a run-time error.

Composing Functions

Another interesting feature of Scheme is the ability to easily compose functions. Recall how function composition works in mathematics. Given, for example, \(f(x) = x^2\) and \(g(x) = 2x + 1\), the composition of \(f\) and \(g\), denoted \(f \circ g\), is \(f(g(x)) = g(x)^2 = (2x + 1)^2 = 4x^2 + 4x + 1\).

In Scheme, we can write these functions like this:

(define f
    (lambda (x)
        (* x x)
    )
)

(define g
    (lambda (x)
        (+ (* 2 x) 1)
    )
)

=> (f 2)
4
=> (g 2)
5
=> (f (g 2))
25

One way to compose functions is to write a function called compose like this:

(define compose
    (lambda (f g)
        (lambda (x)
            (f (g x))
        )
    )
)

compose takes two functions, f and g as input, and returns the function (lambda (x) (f (g x))) as output.

Now we can write the composition of f and g without having to supply a particular x value:

(define fg (compose f g))

=> (fg 2)
25

Here’s a way to define a function that returns the second element of a list:

(define my-second (compose car cdr))

Notice that we don’t mention the list parameter anywhere — it’s implicit. We can think of this as saying that my-second is a function that first applies cdr to its input, and then applies car to the result of that.

The twice function takes a function, f, as input and returns f composed with itself, i.e. \(f \circ f\):

(define twice
    (lambda (f)
        (compose f f)
    )
)

For instance:

(define garnish
    (twice (lambda (x) (cons 'cheese x)))
)

=> (garnish '(1 2 3))
(cheese cheese 1 2 3)

We can generalize this as follows:

(define compose-n
    (lambda (f n)
        (if (= n 1)
            f
            (compose f (compose-n f (- n 1)))
        )
    )
)

=> ((compose-n 1+ 5) 1)

;Value: 6

1 ]=> (define triple-cherry (compose-n (lambda (lst) (cons 'cherry lst)) 3))

;Value: triple-cherry

1 ]=> (triple-cherry '(vanilla))

;Value 16: (cherry cherry cherry vanilla)

compose-n returns a new function that we could refer to as f to the power of n, where function composition is being used instead of multiplication.

Curried Functions

Here are two different ways to write the addition function:

(define add_a          ;; uncurried
    (lambda (x y)
        (+ x y)
    )
)

=> (add_a 3 4)
7

(define add_b          ;; curried
    (lambda (x)
        (lambda (y)
            (+ x y)
        )
    )
)

=> ((add_b 3) 4)
7

add_a takes 2 inputs, while add_b takes only one input and returns a function that takes the second input (thus the calling syntax is different).

The nice thing about add_b is that if we give it only a single input n, the we get a function that adds n to a number:

(define add5 (add_b 5))

add_b is an example of a curried function, i.e. it is a function that takes multiple inputs and processes them one at time. After it takes one input, it returns a function that processes the rest of the inputs. In practice, it can be a useful trick for creating new functions out of old ones.

In Scheme, most functions are not written in a curried style because of the awkward calling syntax, i.e. ((add_b 3) 4) is not as nice as (add_a 3 4). But you can write a function that converts a non-curried function into a curried one:

;; given a 2-parameter uncurried function, curry2 returns a curried version
(define curry2
    (lambda (f)
      (lambda (x)
        (lambda (y)
          (f x y)))))

curry2 assumes f is a function that takes exactly 2 inputs. We could use it to create a new function like this:

(define add5 ((curry2 +) 5))

=> (add5 3)
8

+ is a pre-defined 2-argument function, and (curry2 +) is this:

(lambda (x)
  (lambda (y)
    (+ x y)))

This is a curried version of +. Thus ((curry2 +) 5) is this:

(lambda (y)
  (+ 5 y))

Here’s a another example. Recall that (filter pred lst) returns a new list containing just the elements of lst for which the predicate function pred evaluates as true:

(define keep-odds ((curry2 filter) odd?))

=> (keep-odds '(1 2 3 4 5))
(1 3 5)

Notice that the definition of keep-odds does not have lambda explicitly in it anywhere. That’s because ((curry2 filter) odd?)) returns a function which is ready to accept the arguments it needs; there is no need to add any extra parameters to it.

A convenient way to use currying is to pre-define curried versions of functions for use in later definitions. For example:

;; curried versions of some standard functions
(define c_+ (curry2 +))
(define c_cons (curry2 cons))
(define c_filter (curry2 filter))

;; some definitions that use curried functions
(define inc (c_+ 1))
(define f (c_filter odd?))
(define add-cherry (c_cons 'cherry))

You can also write an uncurry2 function that takes a 2-argument curried function as input and returns a non-curried version of it:

;; converts a 2-argument curried function to a non-curried function
(define uncurry2
    (lambda (f)
      (lambda (x y)
        ((f x) y))))

How Scheme Implements Lists

Lists in Scheme are implemented as singly-linked lists. Each item in a list is represented by a cons cell, i.e. an object that contains two pointers, one to the first element of the list, and one to the rest of the list. The first pointer is called the car, and the second pointer is called the cdr.

A new cons cells is created using cons, e.g.:

=> (cons 2 3)
(2 . 3)

In Scheme, (2 . 3) is a pair, where 2 is the car of the pair, and 3 is the cdr of the pair. In a proper list, the cdr of every cons cell must point to a list, and the last cdr must be the empty list. For example, the list (1 2 3) is short-hand for the pair (1 . (2 . (3 . ()))). The car of this pair points to 1, and the cdr points to the pair (2 . (3 . ())).

The pair? function tests if an object is a pair, e.g.:

=> (pair? (cons 2 3))
#t

=> (pair? '(1 2))
#t

=> (pair? '())
#f

=> (pair? 'pear)
#f

=> (pair? '(1 . (2 . (3 . ()))))
#t

See cons cell box diagrams for examples of how to draw diagrams of pairs and lists.