Just read the two data frames into R
mydata1 = read.csv(path1, header=T)
mydata2 = read.csv(path2, header=T)
Then, merge
myfulldata = merge(mydata1, mydata2)
As long as mydata1 and mydata2 have at least one common column with an identical name (that allows matching observations in mydata1 to observations in mydata2), this will work like a charm. It also takes three lines.
What if I have 20 files with data that I want to match observation-to-observation? Assuming they all have a common column that allows merging, I would still have to read 20 files in (20 lines of code) and merge() works two-by-two... so I could merge the 20 data frames together with 19 merge statements like this:
mytempdata = merge(mydata1, mydata2)
mytempdata = merge(mytempdata, mydata3)
.
.
.
mytempdata = merge(mytempdata, mydata20)
That's tedious. You may be looking for a simpler way. If you are, I wrote a function to solve your woes called multmerge().* Here's the code to define the function:
multmerge = function(mypath){
filenames=list.files(path=mypath, full.names=TRUE)
datalist = lapply(filenames, function(x){read.csv(file=x,header=T)})
Reduce(function(x,y) {merge(x,y)}, datalist)
After running the code to define the function, you are all set to use it. The function takes a path. This path should be the name of a folder that contains all of the files you would like to read and merge together and only those files you would like to merge. With this in mind, I have two tips:
- Before you use this function, my suggestion is to create a new folder in a short directory (for example, the path for this folder could be "C://R//mergeme") and save all of the files you would like to merge in that folder.
- In addition, make sure that the column that will do the matching is formatted the same way (and has the same name) in each of the files.
mymergeddata = multmerge("C://R//mergeme")
After running this command, you have a fully merged data frame with all of your variables matched to each other. Of course, most of the details in matching and merging data come down to making sure that the common column is specified correctly, but given that, this function can save you a lot of typing.
*Maybe a function like this exists out there already, but I think it is entertaining to write helpful functions. I also trust the functions more if I write them myself and test them on my own problems.
Great tip! Thanks Tony!
ReplyDeleteThis was so helpful! Self-service to the max! MegaHelpful I might say.
ReplyDeleteI'm a newbie to R and at first blush this function had me so excited because it is exactly what I needed. The problem was I kept getting all manner of error messages when I would try to load it. I finally did a count of all the paren and brackets and found the code lacked a final closing bracket. Now it works like a charm. Thanks Tony!
ReplyDeleteIt doesn't work correctly, though, if there are multiple variables to match in the merge process if some of the values aren't present in all data sets.
ReplyDeletePossibly change last line as follows can solve problem.
Delete# //ori Reduce(function(x,y) {merge(x,y)}, datalist)
datalist = do.call("rbind", datalist)
# // ZHUANSHI HE, 20121119
> R.version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 9.2
year 2009
month 08
day 24
svn rev 49384
language R
version.string R version 2.9.2 (2009-08-24)
This function is a wonderful expansion of the merge function! However, often times I'd like to merge multiple dataframes that I've already created in R (rather than import them all from csv files). I've played around a bit with this, but can't get it to work (I'm pretty new to R). Could you help?
ReplyDeletethanks a lot.
ReplyDeleteThanks.
ReplyDelete