You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 22, 2019. It is now read-only.
For example, I have two spreadsheets: one lists all the OR items in the IR. The other lists all the journal articles entered into the last REF.
Assuming there's a unique idenitifer of some kind, e.g. DOI or title, that are the same in each spreadsheet, what's the best way to find out how many REF journal articles have been made OA on the IR?
The text was updated successfully, but these errors were encountered:
There are definitely multiple ways of approaching this problem. I'm going to give an OpenRefine answer here but of course there are definitely other approaches you could take :)
So - a suggested approach:
Create two OpenRefine projects - one from each file. Lets say they are called 'Repository list' and 'Submitted to REF'. Lest assume both have a DOI column.
In the 'Submitted to REF' project do the following:
From the DOI column do 'Edit column->Add column based on this column'
In the expression box delete 'value' and type cell.cross("Repository list","DOI").length()
This is using the 'cross' function which allows you to look up data across two OpenRefine projects
The cell.cross("Repository list","DOI") bit tells it to take the content of the current cell (i.e. the DOI in your "Submitted to REF" project, and see if it can find one or more matches in the DOI column of the "Repository list" project
The result of a 'cross' function is an array of rows from the other project - using length() just tells us how many matching rows it found. You'd usually expect 1 or 0 in this case, but don't underestimate the messiness of data!
Now create a text facet on this new column - all the values >0 mean a match in the repository
For example, I have two spreadsheets: one lists all the OR items in the IR. The other lists all the journal articles entered into the last REF.
Assuming there's a unique idenitifer of some kind, e.g. DOI or title, that are the same in each spreadsheet, what's the best way to find out how many REF journal articles have been made OA on the IR?
The text was updated successfully, but these errors were encountered: