---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# 🏹 arcMS
<!-- badges: start -->
<!-- badges: end -->
`arcMS` converts HDMS<sup>E</sup> data acquired with Waters UNIFI to a tabular format usable in R or Python, with a small file size when saved on disk.
Two output data file formats can be obtained:
- the [Apache Parquet](https://parquet.apache.org/) format for minimal file size and fast access. Two files are produced: one for MS data, one for metadata.
- the [HDF5](https://www.hdfgroup.org/solutions/hdf5/) format with all data and metadata in one file, also fast to access but with a larger file size.
`arcMS` stands for *accessible*, *rapid* and *compact*, and is also based on the French word *arc*, which means *bow*, to emphasize that it is compatible with the [Apache Arrow library](https://arrow.apache.org/).
## :arrow_down: Installation
You can install `arcMS` in R with the following command:
```{r eval=FALSE}
install.packages("pak")
pak::pkg_install("leesulab/arcMS")
```
To use the HDF5 format, the `rhdf5` package needs to be installed:
```{r eval=FALSE}
pak::pkg_install("rhdf5")
```
## 🚀 Usage
First load the package:
```{r eval=FALSE}
library("arcMS")
```
```{r include=FALSE}
library("arcMS")
```
Then create connection parameters for the UNIFI API (retrieve a token). See `vignette("api-configuration")` for instructions on configuring the API and registering a client app.
```{r eval=FALSE}
con = create_connection_params(apihosturl = "http://localhost:50034/unifi/v1", identityurl = "http://localhost:50333/identity/connect/token")
```
If `arcMS` and the `R` session run on a different computer from the one hosting the UNIFI API, replace `localhost` with the IP address of the UNIFI API server.
```{r eval=FALSE}
con = create_connection_params(apihosturl = "http://192.0.2.0:50034/unifi/v1", identityurl = "http://192.0.2.0:50333/identity/connect/token")
```
```{r include=FALSE}
#con = create_connection_params(apihosturl = "http://10.12.3.154:50034/unifi/v1", identityurl = "http://10.12.3.154:50333/identity/connect/token" )
con = create_connection_params(apihosturl = "http://localhost:50034/unifi/v1", identityurl = "http://localhost:50333/identity/connect/token")
```
Now these connection parameters will be used to access the UNIFI folders. The following function will show the list of folders and their IDs (e.g. `abe9c297-821e-4152-854a-17c73c9ff68c` in the example below).
```{r eval=FALSE}
folders = folders_search(con)
folders
```
```{r echo=FALSE}
folders = folders_search(con)
folders[3:4,]
```
With a folder ID, we can access the list of Analysis items in the folder:
```{r eval=FALSE}
ana = analysis_search("abe9c297-821e-4152-854a-17c73c9ff68c")
ana
```
```{r eval=FALSE, include=FALSE}
ana = analysis_search("abe9c297-821e-4152-854a-17c73c9ff68c")
ana[4:6,]
```
Finally, with an Analysis ID, we can get the list of samples (injections) acquired in this Analysis:
```{r eval=FALSE}
samples = get_samples_list("e236bf99-31cd-44ae-a4e7-74915697df65")
samples
```
```{r eval=FALSE, include=FALSE}
samples = get_samples_list("e236bf99-31cd-44ae-a4e7-74915697df65")
samples[2:5,]
```
Once we get a sample ID, we can use it to download the sample data:
```{r eval=FALSE}
convert_one_sample_data(sample_id = "0134efbf-c75a-411b-842a-4f35e2b76347")
```
This command retrieves the sample name (`sample_name`) and its parent analysis name (`analysis_name`), creates a folder named `analysis_name` in the working directory, and saves the sample data as `sample_name.parquet` and its metadata as `sample_name-metadata.parquet`.
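For instance, after converting one sample, the working directory would contain a layout like this (the names are placeholders, filled in from the analysis and sample metadata):

```
analysis_name/
├── sample_name.parquet            # MS data
└── sample_name-metadata.parquet   # sample metadata
```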
With an Analysis ID, we can convert and save all samples from the chosen Analysis:
```{r eval=FALSE}
convert_all_samples_data(analysis_id = "e236bf99-31cd-44ae-a4e7-74915697df65")
```
To use the HDF5 format instead of Parquet, set the `format` argument:
```{r eval=FALSE}
convert_one_sample_data(sample_id = "0134efbf-c75a-411b-842a-4f35e2b76347", format = "hdf5")
convert_all_samples_data(analysis_id = "e236bf99-31cd-44ae-a4e7-74915697df65", format = "hdf5")
```
This will save each sample's data and metadata together in a single `.h5` file.
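To check what a converted HDF5 file contains, `rhdf5::h5ls()` lists its datasets. The sketch below builds a tiny synthetic file just to illustrate the call (real converted files hold the full "ms1"/"ms2" tables and metadata; the file path and columns here are made up):

```{r eval=FALSE}
library(rhdf5)

# Create a minimal synthetic HDF5 file for illustration
tmp = tempfile(fileext = ".h5")
h5createFile(tmp)
h5write(data.frame(mz = c(100.1, 200.2), intensity = c(1e4, 2e4)), tmp, "ms1")

# List the datasets stored in the file
h5ls(tmp)
```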
Parquet or HDF5 files can be opened easily in `R` with the `arrow` or `rhdf5` packages. Parquet files contain both low- and high-energy spectra (HDMS<sup>E</sup>); HDF5 files store low-energy spectra in the "ms1" dataset, high-energy spectra in the "ms2" dataset, and metadata in the "samplemetadata" and "spectrummetadata" datasets. The `fromJSON` function from the `jsonlite` package imports the metadata JSON file (associated with the Parquet file) as a list of data frames.
```{r eval=FALSE}
sampleparquet = arrow::read_parquet("sample.parquet")
metadataparquet = jsonlite::fromJSON("sample-metadata.json")
samplems1hdf5 = rhdf5::h5read("sample.h5", name = "ms1")
samplems2hdf5 = rhdf5::h5read("sample.h5", name = "ms2")
samplemetadatahdf5 = rhdf5::h5read("sample.h5", name = "samplemetadata")
spectrummetadatahdf5 = rhdf5::h5read("sample.h5", name = "spectrummetadata")
```
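As a quick sketch of downstream use, the table read from a Parquet file can be summarized with `dplyr`. The example below runs on a small synthetic table standing in for real converted data; the column names `rt`, `mz`, and `intensity` are assumptions (check `names(sampleparquet)` for the actual schema):

```{r eval=FALSE}
library(arrow)
library(dplyr)

# Synthetic stand-in for a converted sample (real column names may differ)
demo = data.frame(
  rt = rep(c(0.1, 0.2), each = 3),   # retention time
  mz = runif(6, 100, 900),           # m/z values
  intensity = runif(6, 1e3, 1e6)     # ion intensities
)
tmp = tempfile(fileext = ".parquet")
arrow::write_parquet(demo, tmp)

# Read the file back and compute a total ion chromatogram (TIC)
tic = arrow::read_parquet(tmp) |>
  group_by(rt) |>
  summarise(tic = sum(intensity))
tic
```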
## ✨ Shiny App
A Shiny application is available to make the package easier to use. To run the app, just use the following command (it may prompt you to install a few additional packages):
```{r eval=FALSE}
run_app()
```