In the example that motivated this post, I only cared that A was linked with B in my data. If B is also linked with A, that's great, but it does not make A and B any more related. In other words, the order of the link didn't matter.
In this case, you'll see that our stylized example has duplicates -- id1 = "A" and id2 = "B" is the same as id1 = "B" and id2 = "A" for this purpose. What's a simple way to get one unique identifier per pair, regardless of order? There's an apply command for that!
Thinking of each row of the identifier data as a vector, we could alphabetize it (using sort(), so c("B", "A") becomes c("A", "B")), and then paste the resulting vector together into one identifier (using paste() with the collapse argument). I call our worker function idmaker():
idmaker = function(vec){return(paste(sort(vec), collapse=""))}
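A quick sanity check of idmaker() on a couple of toy vectors is sketched below. One caveat that only matters if your real IDs have different lengths: with collapse="" two different pairs can paste to the same string. The idmaker_sep() variant is my own hypothetical addition, not part of the original workflow, and it assumes the "|" character never appears in your IDs.

```r
# Same worker function as above
idmaker = function(vec){return(paste(sort(vec), collapse=""))}

# Both orderings give the same identifier
idmaker(c("B", "A"))   # "AB"
idmaker(c("A", "B"))   # "AB"

# Hypothetical variant: put a separator between the sorted IDs
# (assumes "|" never occurs inside an ID)
idmaker_sep = function(vec, sep = "|"){return(paste(sort(vec), collapse = sep))}

# Without a separator, two different pairs collide ...
idmaker(c("A", "BC")) == idmaker(c("AB", "C"))          # TRUE
# ... with one, they stay distinct
idmaker_sep(c("A", "BC")) == idmaker_sep(c("AB", "C"))  # FALSE
```

For single-letter IDs like the ones in this post, the original version is fine as it is.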
Then, all we need to do is use the apply command to apply this function to the rows of the data, returning a vector of results. Here's how my output looks.
To get a data frame of unique links, all we need to do is cbind() the resulting vector of identifiers to the original data frame and then drop the rows whose identifier has already appeared. Here's some code:
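# Build one order-independent ID per row (the 1 means "apply over rows")
co_id = apply(as.matrix(df[, c("id1", "id2")]), 1, idmaker)
# Attach the IDs, then keep only the first row seen for each ID
df = cbind(df, co_id)
df = df[!duplicated(df[,"co_id"]),]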
Here is the resulting data frame with only unique pairs.
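If you want to run the whole thing end to end, here is a minimal sketch on made-up data. The four rows below are invented for illustration and are not the data from this post. I also store the result in unique_df (a name I picked for this sketch) instead of overwriting df, so the last line can show a vectorized alternative that relies on pmin() and pmax() and only works when there are exactly two ID columns.

```r
# Worker function from the post
idmaker = function(vec){return(paste(sort(vec), collapse=""))}

# Made-up link data: rows 1 and 2 are the same pair in opposite order
df = data.frame(id1 = c("A", "B", "A", "C"),
                id2 = c("B", "A", "C", "D"),
                stringsAsFactors = FALSE)

# One order-independent ID per row
co_id = apply(as.matrix(df[, c("id1", "id2")]), 1, idmaker)

# Attach the IDs, then keep the first row seen for each ID
df = cbind(df, co_id)
unique_df = df[!duplicated(df[, "co_id"]), ]
print(unique_df)   # rows 1, 3 and 4 remain: A-B, A-C, C-D

# Vectorized alternative for exactly two ID columns (no apply needed)
co_id2 = paste0(pmin(df$id1, df$id2), pmax(df$id1, df$id2))
```

The apply() version generalizes to three or more ID columns without changes, which is why I would still reach for it first; the pmin()/pmax() version is mostly worth considering when the data frame is large.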