Friday, May 20, 2011

Defaults, Lists and Classes: A Functional Post

In this post, I demonstrate a couple of useful tricks for writing functions in R. The context is a function I wrote called samp() that allows for an easy demonstration of sampling distribution properties.



Defaults

By default, this function draws K = 500 samples of size N from a normal distribution with a mean of mu = 0 and a standard deviation of std = 1. The syntax for specifying a default for an argument is to set it equal to its default value inside the function statement, as in function(N, mu = 0, std = 1, K = 500). An argument is given no default by naming it in the function statement without a value, as with N here.
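To make the defaults syntax concrete, here is a minimal sketch of what a function like samp() could look like. Only the argument names (N, mu, std, K), their defaults, the "samp" class and the element name means come from this post; the other element names (sds, ses, averages, spreads, truth) and the internals are assumptions made for illustration.

```r
# Sketch of a samp()-style function; the internals are assumed, not the original code.
samp = function(N, mu = 0, std = 1, K = 500) {
  # Draw K samples of size N, one sample per column
  draws = matrix(rnorm(N * K, mean = mu, sd = std), nrow = N, ncol = K)

  means = colMeans(draws)        # K sample means
  sds   = apply(draws, 2, sd)    # K sample standard deviations
  ses   = sds / sqrt(N)          # K estimated standard errors

  result = list(means, sds, ses,
                c(mean(means), mean(sds), mean(ses)),   # averages of the three vectors
                c(sd(means), sd(sds), sd(ses)),         # spreads of the three vectors
                c(N = N, mu = mu, std = std, K = K))    # true parameter values
  # Hypothetical element names, apart from `means`
  names(result) = c("means", "sds", "ses", "averages", "spreads", "truth")
  class(result) = "samp"
  result
}

a = samp(100)                   # same arguments as samp(100, 0, 1, 500)
b = samp(100, mu = 5, K = 50)   # override two defaults by name, keep std = 1
length(a$means)
length(b$means)
```

Because N has no default, calling samp() with no arguments fails as soon as N is needed, which is a reasonable way to force the user to choose a sample size.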

Sensibly chosen defaults can save a lot of time for people who use your function. For example, although this samp() function has four arguments, I don't need to type samp(100, 0, 1, 500) every time I wish to take 500 samples of size N=100 from a Normal(0,1) random variable; samp(100) is enough. If sampling from a Normal(0,1) is especially common, setting the defaults this way can save a lot of keystrokes.

On the other hand, using defaults instead of hard coding the Normal(0,1) choice has the distinct advantage of flexibility: users still have the option to specify arguments other than the defaults. The right use of defaults strikes a balance between ease of use and flexibility.

Returning a List... with Names

If your task warrants writing a function in the first place, you have probably performed several useful calculations that you would like to return. Maybe you have conducted several related hypothesis tests that you would like to reference later (to report and/or use in another calculation). A great way to do this is to store the results of your function in a list.

For example, in the samp() function, I wanted to store vectors of the sample means, sample standard deviations and standard errors (each of length 500 by default, one entry per replication). For good measure, I wrote the function to store the means and standard deviations of these vectors as well as the true parameter values that went into the function.

Although it is often a good idea to store the results in a list, it is a better idea to name the elements of the list for easy extraction. If I ran the command

mysamp = samp(100)

the object mysamp would contain a list of the results from the function samp(). Suppose I want to extract the vector of means (for example, to plot a histogram). Without running the names command, I would have to remember that the vector of means was in the first position in the list and use the command

hist(mysamp[[1]])

to plot a histogram of the means vector from my sampling object. (The double brackets matter: mysamp[1] returns a one-element list rather than the vector itself, and hist() cannot plot a list.) As it is easy to forget the precise order in which the elements of a list are stored, this syntax can lead to too much thinking. It is better to name the first list element something like means. Naming the list elements allows the user to use the $ extractor on objects created with the function. That is, the syntax becomes

hist(mysamp$means)

which is much easier to remember and easier to read. This latter fact is often unappreciated, but if you get in the habit of naming elements in a list, you will be a better collaborative coder. Even if you're not into collaboration, when you return to that project after two weeks of doing something else, it is much easier to remember where you left off.
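The difference between extracting by position and by name can be seen on a toy list. This is only a sketch: the list below stands in for the output of samp(), and the element name sds is an assumption.

```r
# Toy two-element list standing in for the output of samp()
set.seed(42)
res = list(rnorm(500), runif(500, 0.8, 1.2))
names(res) = c("means", "sds")   # `sds` is a hypothetical name

class(res[1])      # "list": single brackets return a sub-list
class(res[[1]])    # "numeric": double brackets return the element itself
identical(res[[1]], res$means)   # TRUE: the name and the position reach the same vector
```

The named version keeps working if the order of the elements inside the function changes later, while code that relies on position silently picks up the wrong element.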

A Benefit of Keeping It Classy

Another coding practice that often goes unappreciated is the use of classes to make your life easier (and your console cleaner). At this point, if you type mysamp into your console window, R will bombard you with more than a screen's worth of output. Most of this is output that I don't want to see printed out. I might want to save it for later, but I don't want to see it.

This is where the class(result) = "samp" command comes in handy. By "classing" my object this way and writing a short print method for this class (named print.samp()), I can cut down on the amount of output I have to see when I inspect my object. Here's code to define a print method.
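A minimal sketch of such a method follows. What it chooses to print is an assumption, and the toy object is built by hand (with only a means element) so that the example runs on its own.

```r
# Hypothetical print method for objects of class "samp"
print.samp = function(x, ...) {
  cat("Sampling distribution with", length(x$means), "replications\n")
  cat("  average of the sample means:", round(mean(x$means), 3), "\n")
  cat("  std. dev. of the sample means:", round(sd(x$means), 3), "\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

# Toy stand-in for the output of samp()
set.seed(1)
mysamp = structure(list(means = rnorm(500, sd = 0.1)), class = "samp")

print(mysamp)   # typing mysamp at the console dispatches to the same method
```

R finds the method through the naming convention generic.class, so no registration step is needed when the function is defined in your workspace.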



Now, try typing mysamp into R. You will see much less output (only the output from print.samp), but the object mysamp still has all of the information you want to keep around. If you don't believe me, just type mysamp$means to be sure.

Just as we could "class and print," we can also "class and plot." This can be handy if you want there to be a standard meaning for the command plot(mysamp). In our working example, let's make this command mean "plot three histograms": one for means, one for standard deviations and one for standard errors. Here's some code that imposes this meaning on objects with class "samp":
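One way this could be written is sketched below. The element names sds and ses, the panel titles and the side-by-side layout are assumptions, and the toy object is again built by hand so the example is self-contained.

```r
# Hypothetical plot method for objects of class "samp"
plot.samp = function(x, ...) {
  old = par(mfrow = c(1, 3))   # three panels side by side
  on.exit(par(old))            # restore the previous layout when done
  hist(x$means, main = "Sample means", xlab = "Mean")
  hist(x$sds, main = "Sample standard deviations", xlab = "SD")
  hist(x$ses, main = "Standard errors", xlab = "SE")
  invisible(x)
}

# Toy stand-in for the output of samp(100)
set.seed(1)
draws = matrix(rnorm(100 * 500), nrow = 100)   # 500 samples of size 100
sds = apply(draws, 2, sd)
mysamp = structure(list(means = colMeans(draws),
                        sds = sds,
                        ses = sds / sqrt(100)),
                   class = "samp")
```

With these definitions in place, plot(mysamp) draws the three histograms in a single row.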



In case you were wondering what this plotting output looks like, here is the picture that is produced from the command plot(mysamp).
