-
Notifications
You must be signed in to change notification settings - Fork 3
/
06-abstracts.Rmd
84 lines (56 loc) · 2.69 KB
/
06-abstracts.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
```{r echo=FALSE}
knitr::opts_chunk$set(
comment = "#>",
warning = FALSE,
message = FALSE
)
```
# Abstracts {#abstracts}
Fetching abstracts likely will come after searching for articles with `ft_search()`. There are a few scenarios in which simply getting abstracts in lieu of full text may be enough.
For example, if you know that a large portion of the articles you want to mine text from are closed access and you don't have access to them, you may have access to the abstracts depending on the publisher.
In addition, there are cases in which you really only need abstracts regardless of whether full text is available or not.
`ft_abstract()` gives you access to the following data sources:
```{r echo=FALSE, results='asis'}
cat(paste(" -", paste(ft_abstract_ls(), collapse = "\n - ")))
```
## Usage {#abstracts-usage}
```{r}
library(fulltext)
```
List data sources available
```{r}
ft_abstract_ls()
```
Search - by default searches against PLOS (Public Library of Science)
```{r}
res <- ft_search(query = "ecology")
(dois <- res$plos$data$id)
```
Take the output of `ft_search()` and pass directly to `ft_abstract()`:
```{r}
out <- ft_abstract(dois)
out
```
The output has slots for each data source:
```{r}
names(out)
```
Index to the data source you want to get data from, here selecting the first item:
```{r}
out$plos[[1]]
```
Which gives a named list, with the DOI as the first element, then the abstract as a single
character string.
You can then take these abstracts and use any number of R packages for text mining.
## By Ids {#abstracts-by-ids}
Instead of using `ft_search()` first, and passing those results to `ft_abstract()`, you can pass article ids (character/numeric) to `ft_abstract()`.
Here, we'll fetch abstracts for three articles from arXiv. With Semantic Scholar we need to prefix the string `arXiv` to the ids (if you use DOIs you don't need to prefix any string).
```{r eval=FALSE}
arxiv_ids <- c("0710.3491", "0804.0713", "0810.4821", "1003.0315")
out <- ft_abstract(x = paste0("arXiv:", arxiv_ids), from = "semanticscholar")
unname(vapply(out$semanticscholar, "[[", "", "abstract"))
```
## Abstracts options {#abstracts-options}
All data sources for `ft_abstract()` accept configuration options as a named list. For example, if you set `from="plos"` you can set additional PLOS specfiic options by passing a named list to `plosopts`. You can find out what PLOS options are available by looking at the documention for `?rplos::searchplos`.
The only data source that doesn't allow configuration options is Semantic Scholar.
As all functions in `fulltext`, you can pass on curl options to each function call or set them globally for the session, see the [curl options chapter](#curl-options).