Skip to content
This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

Best way to merge two spreadsheets with similar information? #10

Open
ghost opened this issue Nov 30, 2015 · 1 comment
Open

Best way to merge two spreadsheets with similar information? #10

ghost opened this issue Nov 30, 2015 · 1 comment

Comments

@ghost
Copy link

ghost commented Nov 30, 2015

For example, I have two spreadsheets: one lists all the OR items in the IR. The other lists all the journal articles entered into the last REF.

Assuming there's a unique idenitifer of some kind, e.g. DOI or title, that are the same in each spreadsheet, what's the best way to find out how many REF journal articles have been made OA on the IR?

@ostephens
Copy link
Contributor

There are definitely multiple ways of approaching this problem. I'm going to give an OpenRefine answer here but of course there are definitely other approaches you could take :)

So - a suggested approach:

  • Create two OpenRefine projects - one from each file. Lets say they are called 'Repository list' and 'Submitted to REF'. Lest assume both have a DOI column.
  • In the 'Submitted to REF' project do the following:
    • From the DOI column do 'Edit column->Add column based on this column'
    • In the expression box delete 'value' and type cell.cross("Repository list","DOI").length()
      • This is using the 'cross' function which allows you to look up data across two OpenRefine projects
      • The cell.cross("Repository list","DOI") bit tells it to take the content of the current cell (i.e. the DOI in your "Submitted to REF" project, and see if it can find one or more matches in the DOI column of the "Repository list" project
      • The result of a 'cross' function is an array of rows from the other project - using length() just tells us how many matching rows it found. You'd usually expect 1 or 0 in this case, but don't underestimate the messiness of data!
    • Now create a text facet on this new column - all the values >0 mean a match in the repository

For more on the 'cross' function see this recent email thread on the OpenRefine Google Group https://groups.google.com/forum/#!searchin/openrefine/cross/openrefine/unN6dQsyYUs/-p_7Eb74EAAJ

Feel free to ask for clarifications here!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant