Monday, June 20, 2011

Today's Assignment: Assignment

A new R user quickly discovers that there are multiple ways to store information into an object -- the technical term for this is assignment. There's = as in:

x = c(1,2,3)

and there's <- as in:

x <- c(1,2,3)

R help on assignOps offers this explanation of the difference:
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
For catching subtle coding errors with =, this is a smart feature of R. When I am tired, I sometimes find myself writing a line of code that looks like this:

if(codename = "thedoctor") watch = "yes"

In this line of code, the problem is that codename = "thedoctor" isn't a logical statement. It is a statement that would assign the string "thedoctor" to an object called codename.... if the statement were allowed by R's language. Fortunately, R prevents this.

This feature of the R programming language helps me because I -- by default -- use = for assignment rather than <-. My style is not conventional, however. Popular R coding style guides (Google, 4D pie charts) suggest that users use <- option for assignment, rather than =.

Deep down, I think I use this unconventional default because I had the = versus == distinction (more generally, assignment versus logical statement) drilled into me in my first computer science class (not in R; Java). On reflection, I have three reasons to use = rather than <-. First, = is easier to type, using one keystroke and no shift keys. Second, the equals style looks cleaner to me (completely a personal preference). Third, I think assignment should take place out in the open (i.e., exactly where = is allowed). If the line of code is out in the open, both <- and = work the same.

On a final note, the <- is not without its problems for logical statements. For example, the statement

if(agediff<-2) status = "I'm older"


will cause problems and it won't let you know about these problems until you get some really strange results later on. Here's the code from an interactive R session where I show what can go wrong.

> agediff = -3
> if(agediff<-2) status = "I'm older"
> status
[1] "I'm older"
> agediff
[1] 2
> agediff = 4
> if(agediff<-2) status = "I'm older"
> status
[1] "I'm older"
> agediff
[1] 2
> status=NULL
> if(agediff<-2) status = "I'm older"
> status
[1] "I'm older"
> agediff
[1] 2


The problem is two-fold in this code. First, the if statement isn't acting as a logical statement splitting by whether "agediff< -2" Second, the value of agediff is changed because "<-" means assignment. The result is that both status and agediff have (potentially) the wrong values at the end of running the code.

As this error is a problem whether you use = or <- for assignment, it is worth knowing regardless of your style preference. Adhering to good spacing guidelines is a good way to avoid this error.

> agediff = 4
> if(agediff < -2) status = "I'm older"
> status
NULL
> agediff
[1] 4


There are some other examples in the Stack Overflow forum discussion on this topic (actually multiple threads on this), all of which are useful. It is interesting reading to be sure.

1 comment:

  1. Nice post. I suspect that the vast majority of R users don't know the difference between "=" and "<-", and I suppose they don't really need to worry about it. Just to do a bit of self-publicity, I wrote a similar post:

    http://csgillespie.wordpress.com/2010/11/16/assignment-operators-in-r-vs/

    that nicely complements your post.

    ReplyDelete