# Scoping Rules and NSE

Earlier this week, I wrote some tweets about how you have to be careful about scopes when you do “non-standard evaluation”. I cover this in both *Metaprogramming in R* and *Domain-Specific Languages in R*, but this tweet

Comprehensive overview of NSE. Should be of interest to @ThomasMailund @MilesMcBain @_ColinFay and @edwin_thoen https://t.co/2uI1HSnfEN

— Deemah ðŸ‡ºðŸ‡¦ ðŸ‡³ðŸ‡´ (@dmi3k) September 17, 2018

made me write about it againâ€”only this time in a twitter thread.

Well, I say threadâ€”I don’t actually know how to do that, it seems; I just replied to the tweet a bunch of times, and apparently that doesn’t make a thread, it, therefore, my reply was hard to read. Apparently, this is where I went wrong

Sorry there is only one Tweet, remember you have to reply to one of the main tweet from the thread not a comment.

— Thread Reader App (@threadreaderapp) September 17, 2018

If you think I'm buggy please let me know by DM or email ðŸ˜Š

Need help? check this link: https://t.co/dz8sHGkl3T ðŸ¤–

Anyway, the first tweet links to this page that contains a list of tools for implementing “macros” in R. This just means “tools for substituting expressions into other expressions” (preferably with some control over where they are put).^{1} If you evaluate those expressions after the substitution, either right away or at some later time, it is also called *non-standard evaluation*. Not surprisingly, it is called that because it differs (at least it can differ) from standard evaluation where you simply have an expression and evaluate it.

The motivation for the overview on macro methods was apparently that a paper on the `wrapr`

package was rejected because it didn’t compare the package with quasi-quotation from `rlang`

. I might be partially responsible for this since I reviewed the paper, but I didn’t complain about the lack of comparisonâ€”I complained that `wrapr`

doesn’t deal with scopes (which `rlang`

does) and that this makes `wrapr`

’s `let`

function very risky to use. You can *very* easily write a function that seems to work but contains subtle errors.

Erh.. but the paper they had rejected â€” mentioned at the beginning â€” was reviewed by me :-/

— Thomas Mailund (@ThomasMailund) September 17, 2018

There were some pretty serious issues with overscoping with their wrapr package. It doesnâ€™t handle it at all, so you very easily get some very hard to notice errorsâ€¦

It occurs to me that my previous tweet makes it sound like there is something especially bad with wrapr. There isnâ€™t. But NSE is hard, and wrapr doesnâ€™t make it easier â€” just easier to get scoping wrong. I find that rlangâ€™s quosures does a good job (your mileage may vary). 1/

— Thomas Mailund (@ThomasMailund) September 17, 2018

Since I didn’t manage to make my tweets into a proper thread, so they would be easy to read, I will try to repeat it here. I can add a few things here, now that I have a whole post to work with, but I cannot get around all the exciting scope and NSE topics I would like to. For that, I will send you to my books or return to those topics at a later time.

Before I get started, though, I once more want to stress that *this is not an attack on wrapr::let!* The issues with the scope exist for pretty much all tools that manipulate expressions. The `rlang`

package has “quosures”â€”expressions plus scopesâ€”that alleviates many of the issues. This is the reason I prefer it over the others. If you do not explicitly handle scopes, you are likely to get into trouble with non-standard evaluation. Regardless of how you implement it.

I am simplifying a few things below to make the explanation easier; I am not describing what R *actually* does, but how it *conceptually* works with expressions and scopes. There are more details in it, but those are not relevant for this topic.

## Quoting and evaluating expressions

Whenever you write an expression in R, you create an expression object, an abstract syntax tree if you will, and then evaluates it.

```
x <- 2 ; y <- 1 ; z <- 3
x * (y + z)
```

```
## [1] 8
```

You do not *have* to evaluate it, though. You can avoid this by *quoting* the expression

`quote(x * (y + z))`

```
## x * (y + z)
```

This gives you the abstract syntax tree instead of the result of evaluating it.

```
expr <- quote(x * (y + z))
class(expr)
```

```
## [1] "call"
```

In this particular case, it says that the syntax tree is a *call*. It is because `*`

is a function and that is what we are calling at the outermost level. There are other kinds an expression can be, e.g. constants and variables, but for this post that doesn’t matter. What matters is that you can create non-evaluated (quoted) expressions.

That, alone, isn’t that interesting. The exciting part is that you can:

- Evaluate expressions later, in alternative scopes,
- You can manipulate expressions and modify them before you evaluate them.

You evaluate an expression using the `eval`

function

`eval(expr)`

```
## [1] 8
```

and you can modify it in several ways; one way is using the built-in `substitute`

function

```
expr2 <- substitute(x * (y + z), list(x = 42))
expr2
```

```
## 42 * (y + z)
```

`eval(expr2)`

```
## [1] 168
```

The substitute function quotes its input itself, and it returns a quoted expression. This can make the function a little hard to work with. If we want to substitute `x`

with `42`

in `expr`

from beforeâ€”for example, if the expression was a function argument, we couldn’t just use

`substitute(expr, list(x = 42))`

```
## expr
```

This is because `substitute`

quotes its input, which is `expr`

. It doesn’t care that it is a variable we have assigned a value to; it only sees it as an expression. To get the value of `expr`

into the `substitute`

call, you need to substitute it:

`substitute(substitute(expr, list(x = 42)), list(expr = expr))`

```
## substitute(x * (y + z), list(x = 42))
```

Then, to do the actual substitution, you need actually to evaluate it. If you actually want to get the value of the modified expression, you need to evaluate *that*. So the whole thing can be done like this:

```
expr2 <- substitute(substitute(expr, list(x = 42)), list(expr = expr))
expr3 <- eval(expr2)
expr3
```

```
## 42 * (y + z)
```

`eval(expr3)`

```
## [1] 168
```

To make `substitute`

just a little bit more confusingâ€”because why notâ€”it works slightly different inside functions. In the global scope, we do not substitute variables with their values.

```
x <- quote(x + x)
substitute(x)
```

```
## x
```

Inside functions, we do

```
f <- function(x) {
substitute(x)
}
f(x + x)
```

```
## x + x
```

We only substitute known variables and only inside functions.

`substitute(x * y)`

```
## x * y
```

```
g <- function(x) {
substitute(x * y)
}
g(x + x)
```

```
## (x + x) * y
```

```
h <- function(x) {
y <- 42
substitute(x * y)
}
h(x + x)
```

```
## (x + x) * 42
```

You can use `substitute`

this way to translate input into an expression. Function arguments are passed as something called *promises*â€”among other things they are expressions that are not yet evaluatedâ€”so you can capture them and get the quoted expressions.

Capturing parameters and treating them as quoted expressions is at the heart of non-standard evaluation. If you do not capture parameters, they are evaluated where they are used, and you get the standard evaluation. If you capture themâ€”quote themâ€”and do something with them that doesn’t involve immediately evaluating them, then you are doing something non-standard. Well, it is called non-standard, but it is used all over R, so it is a little bit standardâ€¦

Be careful when you call non-standard evaluation functions from other functions. When you capture an input variable, you get the exact expression the function was called with. You don’t get the expression it potentially refers to.

`f(2 + 2)`

```
## 2 + 2
```

```
g <- function(x) f(x)
g(2 + 2)
```

```
## x
```

There are ways to fix thisâ€”R is very flexible, and you can always come up with different ways to solve a problemâ€”but the easiest fix is just not to do this. You can write functions that take quoted expressions as input and manipulate or evaluate them, and keep the non-standard evaluation to functions you expect not to be called by other functions. Non-standard evaluation is excellent for writing domain-specific languages, but you don’t want to combine functions that use it in other ways.

```
f_ <- function(exp) {
substitute(x * x, list(x = exp))
}
f <- function(x) f_(substitute(x))
f(2 + 2)
```

```
## (2 + 2) * (2 + 2)
```

```
g <- function(x) eval(f_(substitute(x)))
g(2 + 2)
```

```
## [1] 16
```

Substitute is not the only way to manipulate expressions;
many ways to do this that all do it in slightly different ways. (My foolbox does it for rewriting functions). But that must be a topic for another day. Now I want to get to the *evaluation* part of non-standard evaluation.

When you have a quoted expression, you can evaluate it. Writing

`x + y`

is essentially the same as writing

`eval(quote(x + y))`

For example

```
x <- 2; y <- 6
x + y
```

```
## [1] 8
```

`eval(quote(x + y))`

```
## [1] 8
```

This is also the same as

`eval(x + y)`

```
## [1] 8
```

but don’t confuse the two. When you write `eval(x + y)`

you give `eval`

a value. Well, strictly speaking, you give it a promise, but it amounts to a value because `expr`

doesn’t capture it and make it into a quoted expression.

When you use `eval(x + y)`

the expression `x + y`

is, conceptually, evaluated before it is passed to `eval`

. If you change `x`

or `y`

, it doesn’t change that value. If you create the quoted expression `quote(x + y)`

it *isn’t* evaluated before you explicitly do so, and if you change `x`

or `y`

before you evaluate the expression, that affects what the expression evaluates to.

```
val <- x + y
ex <- quote(x + y)
x <- 2 * x
eval(val)
```

```
## [1] 8
```

`eval(ex)`

```
## [1] 10
```

If you just remember that values are expressions too, this shouldn’t cause you any problems ever. If you do not quote an expression, you get its value, if you do, you get an expression. If you give `eval`

a value, you get it back, if you give it a quoted expression, that expression is evaluated.

When we run the code

`val <- x + y`

we evaluate `x + y`

before we assign it to `val`

, and when we run `eval(val)`

we get it back. With

`ex <- quote(x + y)`

we create an expression that we haven’t evaluated yet.

With functions, it gets a little more complicated. Arguments to functions are not evaluated right away so you can get the corresponding expressions using `substitute`

. It is because function arguments are not “passed by value” that this is possible, but if you evaluate an argument it doesn’t matter much. Using `substitute`

is just a way of getting a quoted expression without requiring that the caller gives you one explicitly.

Ok, so why is quoting and evaluating that interesting? Two reasons, one is that you can manipulate an expression before you evaluate itâ€”if you are into that kind of thingsâ€”and second, you can change the context an expression is evaluated in.

Consider this example:

```
x <- rnorm(5) ; y <- rnorm(5)
d <- data.frame(x = rnorm(5), y = rnorm(5))
lm(y ~ x)
```

```
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 0.06635 -0.42819
```

`lm(y ~ x, data = d)`

```
##
## Call:
## lm(formula = y ~ x, data = d)
##
## Coefficients:
## (Intercept) x
## 0.12088 -0.09828
```

We give the `lm`

function the same formula to work with, but when we give it a data frame, then `x`

and `y`

refer to values there, while otherwise, they refer to the global variables. The values that `x`

and `y`

refers to change when we give `lm`

a data frame; it seems it is evaluated in a different context when we give it the data frame.

We can do the same thing. We can make an expression and evaluate it in two different contexts:

```
ex <- quote(x + y)
eval(ex)
```

```
## [1] -0.02098503 1.15938729 0.70591802 0.85950330 -0.28848018
```

`eval(ex, d)`

```
## [1] -0.6408889 1.9256321 1.5547519 0.7177998 -0.1451555
```

When we give `eval`

the data frame `d`

, it will find `x`

and `y`

in there; when we do not give it `d`

it finds them in the global scope.

Do not forget to use a quoted expression if you want this to work. If you just give `eval`

an expressionâ€”without quoting itâ€”you get the value of that expression. Changing the context in which it is evaluated does nothing.

`eval(x + y)`

```
## [1] -0.02098503 1.15938729 0.70591802 0.85950330 -0.28848018
```

`eval(x + y, d)`

```
## [1] -0.02098503 1.15938729 0.70591802 0.85950330 -0.28848018
```

This, again, has to do with how parameters are passed as promises. Promises are not *just* expressions. For the rest of this post, just think of function arguments as passed by value *unless* we get the expressions using `substitute`

. It isn’t exactly correct, but I have to leave promises for another post. I promise.

## Scopes and how to modify them

If you do not do anything special, then `eval`

evaluates its input in the scope it is in. If you give it a value, it doesn’t matter what scope it is inâ€”a value evaluates to a valueâ€”but for quoted expressions it does.

Consider this:

```
f <- function(ex) {
x <- 2
eval(ex)
}
x <- 4 ; y <- 2 ; z <- 1
f(quote(x))
```

```
## [1] 2
```

`f(quote(y))`

```
## [1] 2
```

`f(quote(z))`

```
## [1] 1
```

Inside `f`

we set the variable `x`

and evaluate the input. The input is evaluated in the scope of the function. This means that `quote(x)`

is evaluated in a context where `x`

refers to 2, not 4. The other expressions are also evaluated inside the scope of the function, but the variables are the global variables. The scope of `f`

is enclosed by the global scope.

If you don’t quote the input, you get the value in the global scope.

`f(x)`

```
## [1] 4
```

If you use substitute inside `f`

, you get a quoted expression, and then it works as before, except that you do not need to quote the argument to f explicitly:

```
f <- function(x) {
ex <- substitute(x)
x <- 2
eval(ex)
}
f(x)
```

```
## [1] 2
```

`f(y)`

```
## [1] 2
```

`f(z)`

```
## [1] 1
```

The scoping rules in R are surprisingly simple, but you can manipulate scopes, and that can make it difficult to see how they interact with quoted expressions. The rules are simple, but so are the rules of chess. The combinations can be complex.

### Environments

Scopes, in R, are implemented through so-called *environments*. These are essentially tables that map from variables to values. Whenever you evaluate an expression, you do so in an environment. R uses this environment to get the values that correspond to the variables in the expression.

Now, an environment doesn’t necessarily know what all the variables in the expression refer to. If it doesn’t, R has to look elsewhere for it. The way this works is that all environments have a *parent*; R looks in the parent of an environment if it doesn’t find the variable it is looking for. It keeps searching up the chain of parents until it either finds the variable or runs out of parents.

In the following, I am going to assume that the global environment doesn’t have a parent. This isn’t true, but what actually happens above the global variable is a bit complicated and has to do with package namespaces. It has nothing to do with the examples I am going to show you. In all the examples, all environment chains end in the global environment. They do not have to, chains can end in the *empty environment*, but that isn’t important for the examples either.

Ok, assume that we have a global environment with nothing in it.

```
rm(list=ls())
ls()
```

```
## character(0)
```

If we assign to variables `x`

and `y`

`x <- 2; y <- 3`

it will contain those two variables:

`ls()`

```
## [1] "x" "y"
```

When you call `ls()`

without arguments, you get the list of variables in the environment you are currently in. In this case, the global environment. You can also explicitly give `ls()`

an environment. You get the global environment by calling the function `globalenv()`

:

`ls(globalenv())`

```
## [1] "x" "y"
```

Now, let’s define a function:

`f <- function(x) x + y`

We are really doing two things in this statement. We create a function

`function(x) x + y`

and we assign it to the variable `f`

. The effect of the assignment is simple; we add `f`

to the (global) environment and make it point to the function. The function definition is a bit more involved.

We don’t need to know all the gory details of how functions work, but we need to know that functions have associated environments. You can get these by calling the function `environment()`

on them.

`environment(f)`

```
## <environment: R_GlobalEnv>
```

You can change this environment if you want to, but by default, it is the environment in which you defined the function. In this case, the global environment.

The function environment is *not* a new environment, however. You do not create an environment when you define a function. You just associate an environment with it. This environment is used as the parent of environments you create when you *call* a function.

If you call a function, e.g. call

`f(5)`

you *will* create a new environment. You put function arguments and local variables into this environment, and the parent of the environment is the environment associated with the function.

When you evaluate the `x + y`

in the body of `f`

, R will first check the function instance environment, where it will find `x`

mapped to five, and then search up the parent chain for `y`

, which it will find in the global environment mapped to three.

The combination of associating environments to functions and making these the parents of function-call environment enables you to get closures.

Get rid of `x`

and `y`

and define a function that nests another function

```
rm("x", "y")
f <- function(x) {
g <- function(y) x + y
g
}
```

After we have defined the function and assigned it to `f`

, the global environment maps `f`

to the function and the function’s environment is the global variable.

It doesn’t matter what happens inside the function body, because we haven’t evaluated it yet. There is no inner function yet; it doesn’t exist before we evaluate `f`

.

Now, let us call `f`

with argument 2. This creates an instance-environment and set its parent to the global environment (the environment of function f). Inside this environment, we store the argument `x`

and the local function `g`

. The environment of `g`

is the instance environment of the `f(2)`

call.

When you return from the call to `f`

and assign the result to `h`

, you have this setup:

You do not have a direct handle on the instance-environment from the call to `f`

, but it exists as the parent of the function environment of `h`

.

If you then call `h(3)`

you create another instance environment in which you map `y`

to three. The parent environment is the function `h`

’s environmentâ€”we are applying the same rule as before about instance environments and functionsâ€”so the state of the system in the call is this:

To evaluate `x + y`

inside the `h(3)`

call, we have this chain of environments:

When we look for `y`

, we can find it directly in the current environment. To find `x`

we need to look in its parent. If we wanted `f`

, `g`

, and `h`

we would find them in the chain as well.

If you have done any functional programming in R, this behaviour shouldn’t surprise you. It is lexical scope, and pretty much all modern languages have similar rules.

When we have quoted expressions added to the mix, we do not have to put the expression inside the nested function’s body. We can make it take an expression and evaluate it in its body. This separates where we define the (quoted) expression from where we evaluate it:

```
f <- function(x) function(y) function(ex) eval(substitute(ex))
g <- f(2)
h <- g(3)
h(x + y)
```

```
## [1] 5
```

`h(x * y)`

```
## [1] 6
```

Here, I use `substitute`

to translate an argument into an expression. If you want to quote the arguments explicitly that is even simpler; I leave that as an exercise.

The expression we give the final nested function is evaluated in the scope of the innermost function. But how does `eval`

know about that environment?

The parent environment of the `eval`

call is the environment where we defined the `eval`

function, which is the global environment.^{2} That environment doesn’t know anything about the variables we captured in the different closures when we constructed the function `h`

.

When we call `eval`

, we do not want it to evaluate its input in its own environment; rather, we want it to evaluate it in the caller’s environment.

Traditionally, we call function instances on the call stack frames. Since instance environment can survive longer than function calls, we are not really working with proper frames, but the terminology persists. We want to evaluate the expression we give `eval`

in the caller’s frame, not in `eval`

’s environment.

Well, `eval`

is implemented using magic, and this is exactly what it does; it evaluates its input in the caller’s frame.^{3}

We cannot write magical functions, but we can use `eval`

to get the same effect. We can give `eval`

a second argument, and it will use that as the environment to evaluate the expression in. We can get the caller’s environment, rather than our own, through the function `parent.frame`

. This is an unusually poor choice of function name because the caller’s frame has nothing to do with parent environments, but that is what it is called. The function `rlang::caller_env`

does the same, and it has a more sensible name.

Consider this setup:

```
lexical <- function(x) function(ex) eval(ex)
dynamic <- function(x) function(ex) eval(ex, parent.frame())
lex_closure <- lexical(1)
dyn_closure <- dynamic(1)
```

Here, we have two functions that create closures, and we create one from each. I have called them `lexical`

and `dynamic`

because they implement lexical and dynamic scoping, respectively. Don’t worry about the terminology if it is unfamiliar. All sensible languages use lexical scopingâ€”it is much easier to reason aboutâ€”but dynamic scoping is something we can exploit in a non-standard evaluation. You will see the difference between the two in just a second.

After creation `lex_closure`

and `dyn_closures`

, the setup looks very similar.

Both closures have an environment that contains a value for `x`

â€”in both cases that value is 1â€”and both environments have the global environment as the parent. The only difference in the two closures is the function body we execute when we call them.

Try this:

```
caller <- function(x, f) f(quote(x))
caller(2, lex_closure)
```

```
## [1] 1
```

`caller(2, dyn_closure)`

```
## [1] 2
```

The two calls to `caller`

look the same. You get this graph of function instances with parent environments (in brown) and caller frames (in blue) regardless of whether you use the lex closure or the dyn closure.

The difference between the two calls is the path that R takes to search for `x`

. In the lexical scoping closures, `eval`

will look in its caller frame, which is the `lex_closure`

instance, and then search up the parent chain there.

With `dyn_closure`

we connect `eval`

with `dyn_closure`

’s calling frame, so we skip `dyn_closure`

’s instance environment entirely and search in `caller`

instead, where we find `x`

.

The figures are getting a bit crowded by now, but I am sure you can work your way through them, with little effort. They capture most of what you need to know about scopes and evaluation: how environments are chained together by the lexical scope and how function call-stacks are chained together through caller frames.

When you evaluate an expression, you always search for variable-value bindings in the chain of environment parents. Non-standard evaluation is all about starting the search somewhere you do not usually start it or constructing alternative parent-chains than those you would typically use.

There’s a bit more to the story. Functions are not the only objects that have environments attached to them. Promises and formulas and quosures (which are really formulas) have as well. There are also environment rules for packages, but to understand the actual language that is not so essential. There is also more to how `eval`

finds valuesâ€”it doesn’t just let you choose an alternative environment path, it also lets you find values in lists and data frames. All that must wait for another post; this one is getting long enough as it is.

In any case, to understand the tweets that I wrote a few days ago, the explanation above will suffice. I will repeat the examples from my tweets below. They use `wrapr::let`

to substitute variables in expressions with values. This function works slightly different from `substitute`

, so there are some differences, but you should easily be able to understand the examples. Maybe I will try to rewrite them using `substitute`

in a later post.

## Examples using wrapr::let

Ok, for the examples I imagined that I had some data frames, that I want to fit a linear model on the data, and that I then want to extract some property from the fitted model. The variables in the model depend on the data frame, though, and I might want to extract different properties.

So, I have this setup:

```
# Data:
d1 <- data.frame(x = rnorm(5), y = rnorm(5))
d2 <- data.frame(a = rnorm(5), b = rnorm(5))
# Not unreasonable usage:
summary(lm(y ~ x, data = d1))[["residuals"]]
```

```
## 1 2 3 4 5
## 0.04477841 -0.54955468 -0.68726607 -0.72583989 1.91788224
```

`summary(lm(b ~ a, data = d2))[["residuals"]]`

```
## 1 2 3 4 5
## 0.2206216 -0.9706430 -0.1337413 1.3052772 -0.4215145
```

The way formulas work in R, you can combine variables from data frames with variables known where you created the formula. In this case the global environment. So we can mix global variables with variables from the data frames like this:

```
xx <- rnorm(5)
summary(lm(y ~ xx, data = d1))[["residuals"]]
```

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

`summary(lm(b ~ xx, data = d2))[["residuals"]]`

```
## 1 2 3 4 5
## -0.01405177 -0.73169414 -0.49131556 1.47527203 -0.23821056
```

Now, I might want to parameterise these expressions in a function that looks like this:

```
lm_prop <- function(x, y, d, prop) {
m <- lm(y ~ x, data = d)
summary(m)[[prop]]
}
lm_prop(x, y, d1, "residuals")
```

```
## 1 2 3 4 5
## 0.04477841 -0.54955468 -0.68726607 -0.72583989 1.91788224
```

Great, this looks like it is working, but that is an unfortunate coincidence. It works because the data frame `d1`

contains variables `x`

and `y`

, and those happen to be the ones we want to fit. In the `lm(y ~ x, data = d)`

call, however, the variables are quoted, so to speak. The formula is `y ~ x`

, and we do not substitute `x`

and `y`

with the arguments to `lm_prop`

.

Because of this, we get an error if we try to fit a model from `d2`

:

`lm_prop(a, b, d2, "residuals")`

```
## Error in eval(predvars, data, env): objekt 'b' blev ikke fundet
```

To get this to work, we need to substitute arguments into the model-fitting.

The `wrapr::let`

function works that way. It lets us substitute variables, and after the substitution, we can evaluate the expression. If the parameter `eval`

is `TRUE`

the expression will be evaluated; if it is `FALSE`

, it will not.

We can define our `lm_prop`

function like this:

```
library(wrapr)
lm_prop2 <- function(x, y, d, prop, eval) {
let(c(x = x, y = y, prop = prop),
summary(lm(y ~ x, data = d))$prop,
eval = eval)
}
```

We can check that it works:

```
# Data frame variables
lm_prop2("x", "y", d1, "residuals", eval = FALSE)
```

```
## summary(lm(y ~ x, data = d))$residuals
```

`summary(lm(y ~ x, data = d1))$residuals`

```
## 1 2 3 4 5
## 0.04477841 -0.54955468 -0.68726607 -0.72583989 1.91788224
```

`lm_prop2("x", "y", d1, "residuals", eval = TRUE)`

```
## 1 2 3 4 5
## 0.04477841 -0.54955468 -0.68726607 -0.72583989 1.91788224
```

```
# Mixed variables
lm_prop2("xx", "y", d1, "residuals", eval = FALSE)
```

```
## summary(lm(y ~ xx, data = d))$residuals
```

`lm_prop2("xx", "y", d1, "residuals", eval = TRUE)`

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

`summary(lm(y ~ xx, data = d1))$residuals`

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

```
# Data frame variables
lm_prop2("a", "b", d2, "residuals", eval = FALSE)
```

```
## summary(lm(b ~ a, data = d))$residuals
```

`summary(lm(b ~ a, data = d2))$residuals`

```
## 1 2 3 4 5
## 0.2206216 -0.9706430 -0.1337413 1.3052772 -0.4215145
```

`lm_prop2("a", "b", d2, "residuals", eval = TRUE)`

```
## 1 2 3 4 5
## 0.2206216 -0.9706430 -0.1337413 1.3052772 -0.4215145
```

```
# Mixed variables
lm_prop2("xx", "b", d2, "residuals", eval = FALSE)
```

```
## summary(lm(b ~ xx, data = d))$residuals
```

`summary(lm(b ~ xx, data = d2))$residuals`

```
## 1 2 3 4 5
## -0.01405177 -0.73169414 -0.49131556 1.47527203 -0.23821056
```

`lm_prop2("xx", "b", d2, "residuals", eval = TRUE)`

```
## 1 2 3 4 5
## -0.01405177 -0.73169414 -0.49131556 1.47527203 -0.23821056
```

Great, it looks like it is working.

Whenever your code looks like it is working, if you are using non-standard evaluation, it is time to figure out why it only *looks* that way.

Surprisingly often, we get fooled by scopesâ€”which is why I spent so much time explaining them above. That is also happening here.

Consider what we are doing when we mix global variables with data frame variables. If a variable can be found in the data frame then we use it; otherwise, we find a global variable. Since `a`

isn’t found in `d1`

, and `x`

isn’t found in `d2`

we should be able to rename `xx.`

`x <- a <- xx`

and still be able to mix a data frame with global variables.

`lm_prop2("a", "y", d1, "residuals", eval = FALSE)`

```
## summary(lm(y ~ a, data = d))$residuals
```

`summary(lm(y ~ a, data = d1))$residuals`

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

`lm_prop2("a", "y", d1, "residuals", eval = TRUE)`

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

Ok, so far it looks fine.

Now try this:

`lm_prop2("x", "b", d2, "residuals", eval = FALSE)`

```
## summary(lm(b ~ x, data = d))$residuals
```

`summary(lm(b ~ x, data = d2))$residuals`

```
## 1 2 3 4 5
## -0.01405177 -0.73169414 -0.49131556 1.47527203 -0.23821056
```

`lm_prop2("x", "b", d2, "residuals", eval = TRUE)`

```
## Error in model.frame.default(formula = b ~ x, data = d, drop.unused.levels = TRUE): variable lengths differ (found for 'x')
```

Wait, what? Why doesn’t it work with `x`

for `d2`

?

The `let`

function does precisely what it is supposed to do with the substitution. The problem is when we evaluate the result.

The setup that works looks like this:

The one that fails looks like this:

They look very similar, almost identically, so why does one work while the other does not?

The function `lm`

also uses a non-standard evaluation. If we give it a data frame, it will look for variables there before it looks in the enclosing scope, i.e. the function that calls `lm`

.

When we evaluate `lm(y ~ a, data = d)`

, we search for `a`

and `y`

like this:

When we evaluate `lm(b ~ x, data = d)`

, we search for `x`

and `b`

like this:

The `x`

we find is the local variable, not the global one. That is why the function call fails.

For this function to work, you cannot have local variables that over-shadows global ones. You can fix the function like this:

```
lm_prop3 <- function(.a, .b, d, .prop, eval) {
let(c(x = .a, y = .b, prop = .prop),
summary(lm(y ~ x, data = d))$.prop,
eval = eval)
}
lm_prop3("a", "y", d1, "residuals", eval = FALSE)
```

```
## summary(lm(y ~ a, data = d))$.prop
```

`summary(lm(y ~ a, data = d1))$residuals`

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

`lm_prop3("a", "y", d1, "residuals", eval = TRUE)`

```
## NULL
```

`lm_prop3("x", "b", d2, "residuals", eval = FALSE)`

```
## summary(lm(b ~ x, data = d))$.prop
```

`summary(lm(b ~ x, data = d2))$residuals`

```
## 1 2 3 4 5
## -0.01405177 -0.73169414 -0.49131556 1.47527203 -0.23821056
```

`lm_prop3("x", "b", d2, "residuals", eval = TRUE)`

```
## NULL
```

This will work, I think.

But what if we had written the function in an almost identical way like this:

```
lm_prop4 <- function(.a, .b, .d, prop, eval) {
let(c(x = .a, y = .b, d = .d, prop = prop),
summary(lm(y ~ x, data = d))$prop,
eval = eval)
}
lm_prop4("a", "y", "d1", "residuals", eval = FALSE)
```

```
## summary(lm(y ~ a, data = d1))$residuals
```

`summary(lm(y ~ a, data = d1))$residuals`

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

`lm_prop4("a", "y", "d1", "residuals", eval = TRUE)`

```
## 1 2 3 4 5
## 0.2662110 -1.4082401 0.1057206 -0.4657714 1.5020799
```

`lm_prop4("x", "b", "d2", "residuals", eval = FALSE)`

```
## summary(lm(b ~ x, data = d2))$residuals
```

`summary(lm(b ~ x, data = d2))$residuals`

```
## 1 2 3 4 5
## -0.01405177 -0.73169414 -0.49131556 1.47527203 -0.23821056
```

`lm_prop4("x", "b", "d2", "residuals", eval = TRUE)`

```
## 1 2 3 4 5
## -0.01405177 -0.73169414 -0.49131556 1.47527203 -0.23821056
```

This looks like it is working. Can you spot where the problem is?

What happens if we call `lm_prop4`

from another function?

```
indirect <- function(xx, yy, eval) {
d <- data.frame(x = xx, y = yy)
lm_prop4("x", "y", "d", "residuals", eval = eval)
}
aa <- rnorm(5) ; bb <- rnorm(5)
summary(lm(bb ~ aa))$residuals
```

```
## 1 2 3 4 5
## -0.3157049 -0.1431962 0.7170060 0.2178049 -0.4759099
```

`indirect(aa, bb, eval = FALSE)`

```
## summary(lm(y ~ x, data = d))$residuals
```

`indirect(aa, bb, eval = TRUE)`

```
## Error in is.data.frame(data): objekt 'd' blev ikke fundet
```

It cannot find `d`

because that name only exists in the call-environment of `indirect(aa, bb, eval = TRUE)`

.

The search for `d`

goes as follows:

We do not follow the “caller frame” reference from `lm_prob4`

to `indirect`

. We do get an error; we would be worse off if the global scope had a data frame called `d`

. Then we might not get an error but instead just get the wrong result. That can be hard to debug.

If you want to fix the example, you can easily do so. You can give the `let`

function an environment in which to evaluate the expression. If you do that, however, you cannot see the local variables in the function that calls `let`

; it will look in the environment you give it instead of the caller frame.

It isn’t hard to change the current environment’s parent, and you can use that to chain the environment inside `lm_prop4`

and its caller, but then you lose the enclosing scope of `lm_prop4`

â€”that scope is found through the parent chain, and you have just changed that.

It is not entirely impossible to get values both from the enclosing scopeâ€”through the parent pointerâ€”as well as the calling scope. There are more environments around than those I have described here, and you can exploit that. I might explain that in another post. I explain it, in some detail, in my domain-specific language book, so you can always check it out there.

The take-home message of this post is not to pick on `wrapr::let`

. It does what it says it does. I just think it is problematic with quick fixes for non-standard evaluation. If it looks easy, you might think it is. Worse, you might attempt to use it.

If you do not explicitly worry about scopes, non-standard evaluation can get very complicated very quickly. If you are lucky, you will get crash-errors, but you can also very easily just get the wrong result. If you do not spot that, you can be in a lot of trouble.

Non-standard evaluation is a powerful tool, but powerful tools should be handled with care.

If you liked what you read, and want more like it, consider supporting me at Patreon.

- I don’t like the term macro here, because to me it sounds like something that would happen at compile time (which in R would be when you translate code into bytecode). This doesn’t happen. There is no way (that I am aware of) to implement your own syntactic sugar that doesn’t involve evaluating code at runtime.
^{[return]} - Actually,
`eval`

doesn’t live in the global environment, but as I said earlier, we simplify the discussion by pretending that the global environment is the root of all the environments. It doesn’t matter that it isn’t for the points I want to make in this post.^{[return]} - When I say magic, I am not far from the truth. The
`eval`

function uses a function that is built into the R runtime system, and that means that it can do more than we can do with normal R functions. We cannot implement`eval`

as a normal R function, because we need the functionality that`eval`

provides to do that. Some magic is needed. Once we have it, though, we can exploit the hell out of it.^{[return]}