I actually wanted to write something about function parameters, how these are passed as so-called “promises”, how these have associated scopes (and the consequences of that), and how they give us lazy evaluation.

It looks simple when you call a function, but there is actually a lot going on.

While I was thinking about lazy evaluation, though, I tried to come up with an example where we could do something interesting with it, which made me think of lazy lists and lazy queues.

I wrote about those in Functional Data-Structures in R, and I am not going to repeat all of it here, but I cannot resist using them as examples.

But I got sidetracked. When I wrote Functional Data-Structures, I hadn’t thought about Haskell-like pattern matching in R. That kind of pattern matching makes it a lot easier to manipulate data structures. Without them, the code I wrote for Functional Data-Structures got a little hard to read.

Well, I implemented a pattern matching DSL in Domain-Specific Languages in R. You can get the code here. With this DSL, you can define data structures and then pattern match against them when you define functions.

I will include that library for this post:


I will use patterns to define linked lists. Linked lists are more common in functional programming languages than the type of lists we have in R, because most functional programming languages do not allow data to be modified. It is easy to efficiently build new linked lists from other lists, but there is really not a lot you can do with vector-like lists if you cannot modify them.

R is a strange case when it comes to modifying data. All data is immutable, but you can modify environment, and sometimes that makes it look like you can modify data. That is an illusion. If you think this function will modify its input, you will be sadly disappointed.

set_one <- function(x) x[1] <- 1

Inside the function body we make a copy of x and assign to it. We never modify the original data.

x <- rnorm(5)
## [1] -1.4330804 -0.6122987 -0.8524831  0.1019865  2.2688074
## [1] -1.4330804 -0.6122987 -0.8524831  0.1019865  2.2688074

Because data is immutable in R, you need data structures that you can update without modifying them. Those are called functional data structures.1

Linked lists

A linked list is a recursive data structure. A list is either empty or consists of an element followed by a list. By tradition—something we inherit from the Lisp language—we call the empty list nil and for a non-empty list we call the head car and the tail cdr.

Using my pattern DSL, we can define linked lists this way:

linked_list := NIL | CONS(car, cdr)

We define the data type linked_list as the result of the constructors NIL and CONS. The constructor for the empty list, NIL doesn’t take any arguments but the constructor CONS takes two.

After we have defined the data structure, we can create lists using the constructors:

llist <- CONS(1, CONS(2, CONS(3, NIL)))
## CONS(car = 1, cdr = CONS(car = 2, cdr = CONS(car = 3, cdr = NIL)))

You can think of the list data structure as a head element, car, and then a pointer to a list, cdr.

That is how it is usually implemented, but since R doesn’t have pointers, the structure that pmatch makes is (an R) list with elements car and cdr. It behaves the same way, although less efficiently, of course.

There is a default print method defined for all types you construct using pmatch, but I want a specialised one for lists to make them easier to read:

toString.linked_list <- function(llist)
  cases(llist, NIL -> "[]",
        CONS(car, cdr) -> paste(car, "::", toString(cdr)))
print.linked_list <- function(llist)
  cat(toString(llist), "\n")

## 1 :: 2 :: 3 :: []

This makes it a lot easier to read especially long lists.

You can use purrr to construct longer lists if you want to. The constructors work just as functions. The CONS constructor takes its arguments in the wrong order for reduce, though, but a lambda expression fixes that:

llist <- purrr::reduce_right(1:6, ~ CONS(.y,.x), .init = NIL)
## 1 :: 2 :: 3 :: 4 :: 5 :: 6 :: []

A nice property of linked lists is that we have constant time prepending. Adding a new head to a list doesn’t modify the old list and in effect just puts a pointer from a new head to an existing list. Unlike R lists, where changing a list often means making a complete copy—a linear time operation—prepending is always a constant time operation. The drawback is that most other operations take linear time. We don’t have random access to a list’s elements.

Creating constructors for data structures is not the only thing that pmatch lets us do. Much more importantly, it lets us match values against structures in a switch-like way.

You do this using the cases function. The syntax for testing different patterns against an object, x, is this:

cases(x, patter1 -> result1, pattern2 -> result2, …)

The function takes a value made from data structure constructors as its first argument and then a sequence of pattern-result pairs. The patterns look like calls to constructors but actually tests if x matches the structure ein the pattern. If the pattern matches the structure, then the result of cases is what we get from result-expression. We can bind variables in the matching pattern and use that in the result expression.

If you suspect that there is a lot of non-standard evaluation going on here, you are not wrong.

A simple example, reversing a list:

reverse <- function(llist, acc = NIL)
        NIL -> acc,
        CONS(car, cdr) -> reverse(cdr, CONS(car, acc)))

This function recursively reverses a list. We use an accumulator, acc to do this. We go through each element in the input, prepending it to acc. When we reach the end of the list, acc will contain the input in reversed order.

In the reverse function, we match the input, llist, against two patterns. The pattern NIL will match empty strings, and the pattern CONS(car, cdr) will match non-empty lists (and assign the head and tail of those to the variables car and cdr).

When we see the empty list, we return acc—it should now contain the reversed list—and otherwise we update acc and continue recursively.

## 6 :: 5 :: 4 :: 3 :: 2 :: 1 :: []

We can illustrate the function this way:

Notice that the input list is not modified. We cannot do that. Instead, the parameter, llist, moves along it while we construct acc. If there is another reference to the original list, then the list it refers to isn’t changed; if there is not, the elements we scan past will be garbage-collected.

I won’t show the parts of structures that are not pointed to by variables in the figures below. I will use notation similar to this: