tag:blogger.com,1999:blog-8269737761286917895.post6567049723377168910..comments2024-02-19T23:07:54.532-08:00Comments on Coffee and Econometrics in the Morning: Avoiding Loops in R: An Example with Principal MinorsTony Cooksonhttp://www.blogger.com/profile/12565713889808330198noreply@blogger.comBlogger8125tag:blogger.com,1999:blog-8269737761286917895.post-24703852385372145632011-07-19T16:12:20.793-07:002011-07-19T16:12:20.793-07:00On my system, this one-liner is a bit quicker:
&g...On my system, this one-liner is a bit quicker:<br /><br />> system.time(replicate(N, {<br />+ minors <- combn(nrow(mat), 2, function(x) {det(mat[x, x])})<br />+ }))<br /> user system elapsed <br /> 35.758 0.000 35.763 <br /><br />And here are the results of Berend's Method 2:<br /><br />> system.time(replicate(N, {<br />+ index_vec <- t(combn(nrow(mat),2))<br />+ minors.2 <- apply(index_vec, MARGIN=1, FUN=function(ix){minor2(mat,ix)})<br />+ }))<br /> user system elapsed <br /> 41.755 0.024 41.810 <br />>Jasonhttp://leftcensored.skepsi.net/noreply@blogger.comtag:blogger.com,1999:blog-8269737761286917895.post-24358344664266123132011-07-18T22:08:32.039-07:002011-07-18T22:08:32.039-07:00Some more tests.
I have tested with the following ...Some more tests.<br />I have tested with the following 4 methods<br /><br />Method 1.<br />----------------------------<br /><br />minors.1 <- NULL<br />system.time(replicate(N, {<br />for(i in 1:(nrow(mat)-1)){<br /> start <- i + 1<br /> for(j in start:nrow(mat)){<br /> thisminor <- det(mat[c(i,j),c(i,j)])<br /> minors.1 <- c(minors.1, thisminor)<br /> }<br />}<br />}))<br /><br />Method 2.<br />----------------------------<br /><br />minors.1 <- numeric(nrow(mat)*nrow(mat)/2+nrow(mat)) #max<br />system.time(replicate(N, { <br />k <- 0 <br />for(i in 1:(nrow(mat)-1)){<br /> start <- i + 1<br /> for(j in start:nrow(mat)){<br /> thisminor <- det(mat[c(i,j),c(i,j)]) <br /> k <- k+1<br /> minors.1[k] <- thisminor<br /> }<br />}<br />}))<br /><br />Method 3.<br />----------------------------<br /><br />system.time(replicate(N, {<br />index_vec <- t(combn(nrow(mat),2))<br />minors.2 <- apply(index_vec, MARGIN=1, FUN=function(ix){ minor2(mat,ix)})<br />}))<br /><br />Method 4.<br />----------------------------<br /><br />index_vec <- t(combn(nrow(mat),2))<br />system.time(replicate(N, {<br />minors.2 <- apply(index_vec, MARGIN=1, FUN=function(ix){ minor2(mat,ix)})<br />}))<br /><br />Timing results<br />------------------<br /><br />N=25 nrow(mat)=200<br /><br /> Elapsed time<br />1. 55.862<br />2. 21.816<br />3. 33.192<br />4. 25.049<br /><br />N=100 nrow(mat)=100<br /><br /> Elapsed time<br />1. 26.007<br />2. 21.968<br />3. 33.569<br />4. 24.758<br /><br />In both these cases the method 2 appears to be the quickest.<br />So I would conclude that the for loop with preallocation of result vector<br />is likely to be the quickest. <br /><br />The R provided function combn is also a bit of a bottleneck as can seen from the timings for method 3 and 4.<br />Executing combn once helps but the for loop with preallocation remains the quickest in the cases presented here.<br /><br />BerendAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-8269737761286917895.post-6811226655283156742011-07-18T14:51:14.815-07:002011-07-18T14:51:14.815-07:00Good point on preallocating the vector (copying sl...Good point on preallocating the vector (copying slows R down quite a bit). <br /><br />One caveat to your suggestion: The size you propose isn't right. There are usually many more minors of order two than rows in the matrix.Tony Cooksonhttps://www.blogger.com/profile/12565713889808330198noreply@blogger.comtag:blogger.com,1999:blog-8269737761286917895.post-5971437195117098212011-07-18T14:33:38.865-07:002011-07-18T14:33:38.865-07:00I think some speed would be gained by preallocatin...I think some speed would be gained by preallocating a vector of size nrow, so that R does not have to copy data by c(...). <br /><br />minors[i] = thisminor<br /><br />By the way, apply has an internal for loop...spicemannoreply@blogger.comtag:blogger.com,1999:blog-8269737761286917895.post-52199991555254679732011-07-18T12:44:48.943-07:002011-07-18T12:44:48.943-07:00Thanks again for another challenging comment. Mayb...Thanks again for another challenging comment. Maybe I should not have been so unequivocal in my language about avoiding for loops. For small problems, they're fine.<br /><br />I have been working with bigger matrices (on the order of 100 or 200 rows. In this setting, apply() saves time. I tested the code you provided (with combinations, rather than combn) as follows:<br /><br />> mat = matrix(rnorm(200*200), nrow = 200)<br />> N = 25<br />> <br />> system.time(replicate(N, <br />+ {<br />+ minors.1 = NULL<br />+ for(i in 1:(nrow(mat)-1)){<br />+ start <- i + 1<br />+ for(j in start:nrow(mat)){<br />+ thisminor <- det(mat[c(i,j),c(i,j)])<br />+ minors.1 <- c(minors.1, thisminor)<br />+ }<br />+ }<br />+ }))<br /> user system elapsed <br /> 30.75 0.03 30.79 <br />> <br />> <br />> minor2 = function(mat, idx){ <br />+ prin = mat[idx,idx]<br />+ return(det(prin))<br />+ }<br />> <br />> system.time(replicate(N, {<br />+ index_vec <- combinations(nrow(mat), 2)<br />+ minors.2 <- apply(index_vec, MARGIN=1, FUN=function(ix){ minor2(mat,ix)})<br />+ }))<br /> user system elapsed <br /> 21.64 0.00 21.67<br /><br />Conclusion: apply() scales better than the nested for loop. But, you're right. For smaller matrices, the for loop will be the faster option.Tony Cooksonhttps://www.blogger.com/profile/12565713889808330198noreply@blogger.comtag:blogger.com,1999:blog-8269737761286917895.post-59366668310963750072011-07-18T12:20:03.218-07:002011-07-18T12:20:03.218-07:00About the speed.
I tested as follows: number of r...About the speed.<br /><br />I tested as follows: number of replications N <- 1000<br /><br />> system.time(replicate(N, {<br />+ for(i in 1:(nrow(mat)-1)){<br />+ start <- i + 1<br />+ for(j in start:nrow(mat)){<br />+ thisminor <- det(mat[c(i,j),c(i,j)])<br />+ minors.1 <- c(minors.1, thisminor)<br />+ }<br />+ }<br />+ }))<br /> user system elapsed <br /> 0.417 0.001 0.421 <br /><br />> system.time(replicate(N, {<br />+ index_vec <- t(combn(nrow(mat),2))<br />+ minors.2 <- apply(index_vec, MARGIN=1, FUN=function(ix){ minor2(mat,ix)})<br />+ }))<br /> user system elapsed <br /> 0.824 0.002 0.829 <br /><br />Even if you move the combn to before the second system.time the for loop is still faster albeit with a much smaller margin.<br /><br />Conclusion: For loop is faster!<br /><br />BerendAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-8269737761286917895.post-7510668827973147222011-07-18T12:19:09.323-07:002011-07-18T12:19:09.323-07:00Thanks for the correction and the tip on combn. W...Thanks for the correction and the tip on combn. With regard to why to use gregmisc, I suppose that it depends on your application. It turns out that gregmisc is faster even without transposing (I tried that too).<br /><br />> system.time(combinations(400,2))<br /> user system elapsed <br /> 0.29 0.00 0.29 <br />> system.time(t(combn(400,2)))<br /> user system elapsed <br /> 1.04 0.00 1.05 <br /><br />But, gregmisc's combinations() function breaks on combinations much larger than this (I picked 1000 choose 2... which broke gregmisc's combinations, but not combn).Tony Cooksonhttps://www.blogger.com/profile/12565713889808330198noreply@blogger.comtag:blogger.com,1999:blog-8269737761286917895.post-16161407110423916482011-07-18T12:10:42.271-07:002011-07-18T12:10:42.271-07:001. first for loop has an error and should read
fo...1. first for loop has an error and should read<br /><br />for(i in 1:(nrow(mat)-1)){<br /><br />2. why use gregmisc? You can use the R provided function combn and then use t(combn(nrow(mat),2)) to get the same result.<br /><br />BerendAnonymousnoreply@blogger.com