Replacing values in one DF with values from another DF by matching #37

mattnuttall00 · 2020-01-27T15:27:14Z

Hi all,

What I am trying to do is quite simple, and I thought I knew how to do it, but it's not working...!

I have two dataframes. Below are reproducible dataframes which demonstrate the example:

df1 <- data.frame(year = rep(c(2007,2008,2009,2010,2011), each=2), comm_name = c("a","b",NA,"d","e",NA,"g","h",NA,"j"), code_com = c(1:10))

df2 <- data.frame(comCode = c(1:10), comm_name = letters[1:10])

All I am trying to do is to replace the NAs in df1$comm_name with the value from df2$comm_name, based on the com_code. i.e. code_com in df1 and comCode in df2 are the same variable, so I want to use those values to match the the correct comm_name from df2 to df1.

I have tried:
df1$comm_name[is.na(df1$comm_name)] <df2$comm_name[match(df1$code_com,df2$comCode)]

and

df1$comm_name <- ifelse(is.na(df1$comm_name), df2$comm_name[match(df2$comCode,df1$code_com)], df1$comm_name)

and

df1$comm_name <- replace(df1$comm_name, is.na(df1$comm_name), df2$comm_name[match(df1$code_com,df2$comCode)])

and

df1 <- df1 %>% mutate(comm_name = ifelse(is.na(df1$comm_name), df2$comm_name[match(df1$code_com,df2$comCode)], comm_name))

and

df1 <- df1 %>% mutate(comm_name = ifelse(is.na(df1$comm_name), replace(comm_name, is.na(comm_name), df2$comm_name[match(df1$code_com,df2$comCode)]), comm_name))

And none of them are doing what I want. Not sure if I am over-complicating it, or if I am just making some silly mistake in my code....but all suggestions welcome!

Thanks

The text was updated successfully, but these errors were encountered:

jejoenje · 2020-01-27T15:52:50Z

Hi Matt,

I think this does want you want it to do?

df1 <- data.frame(year = rep(c(2007,2008,2009,2010,2011), each=2), 
                  comm_name = c("a","b",NA,"d","e",NA,"g","h",NA,"j"), 
                  code_com = c(1:10))

df2 <- data.frame(comCode = c(1:10), comm_name = letters[1:10])

df1$comm_name = df2$comm_name[match(df1$code_com, df2$comCode)]
df1

Basically, match() returns indexes where it found a match between vector A and B, so you need to use the output of that function as an "indexer" for where you want to place the values.
It's a bit odd to get your head around, but have a look at what the output of match(df1$code_com, df2$comCode) is.

Hope that helps...

J

mattnuttall00 · 2020-01-27T16:09:51Z

Thanks @jejoenje. Am I correct in thinking though that
df1$comm_name = df2$comm_name[match(df1$code_com, df2$comCode)]
replaces all of the df1$comm_name values with the values from df2$comm_name? How would I then specify that I only wanted it do do that when df1$comm_name values are NA?

jejoenje · 2020-01-27T16:20:56Z

Ah, yes, it does indeed replace all of them. In the example case, though, that doesn't matter, as your "index" (i.e. df2) has all the necessary values. I assume that for your "real" case, that isn't the case (perhaps only the missing cases)?

The only way I could think of doing this in that case, would be to effectively subset df1 into two, one where df1$comm_Name == NA, and one where df1$comm_Name != NA.
Then do the above match using the former subset only, and re-combine the two subsets. Watch your factor levels (if applicable) in that case, though.

The result, though, for the given example would be identical... it would be far easier to just ensure that your "index" (df2) just has all the cases you're trying to match?
... or is my cold-addled brain missing something?

All else fails: for loop and some ifelse statements. But that would, of course, be cheating. :)

anna-deasey · 2020-01-27T17:16:40Z

I think using a combination of left_join() and coalesce() would do what you want - you get two versions of the col 'comm_name' with the join, then coalesce (merge them together) to get a new col with NA's replaced, and then delete the extra cols you don't need. It didn't work with factors though. Does that help?

df1$comm_name <- as.character(df1$comm_name)
df2$comm_name <- as.character(df2$comm_name)

df3 <- left_join(df1, df2, by = c('code_com' = 'comCode')) %>%
mutate(comm_name = coalesce(comm_name.x, comm_name.y)) %>%
select(-comm_name.x, -comm_name.y)

mattnuttall00 · 2020-01-28T08:39:22Z

Hey @jejoenje and @anna-deasey - both of those methods work. Thanks very much!

mattnuttall00 added question Further information is requested data wrangling labels Jan 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replacing values in one DF with values from another DF by matching #37

Replacing values in one DF with values from another DF by matching #37

mattnuttall00 commented Jan 27, 2020

jejoenje commented Jan 27, 2020

mattnuttall00 commented Jan 27, 2020

jejoenje commented Jan 27, 2020 •

edited

Loading

anna-deasey commented Jan 27, 2020

mattnuttall00 commented Jan 28, 2020

Replacing values in one DF with values from another DF by matching #37

Replacing values in one DF with values from another DF by matching #37

Comments

mattnuttall00 commented Jan 27, 2020

jejoenje commented Jan 27, 2020

mattnuttall00 commented Jan 27, 2020

jejoenje commented Jan 27, 2020 • edited Loading

anna-deasey commented Jan 27, 2020

mattnuttall00 commented Jan 28, 2020

jejoenje commented Jan 27, 2020 •

edited

Loading