swisspalmR is a small package with one purpose: retrieval of
S-palmitoylation data from the SwissPalm database using httr2, rvest
and curl.
You can install the development version of swisspalmR from GitHub with:
# install.packages("devtools")
devtools::install_github("simpar1471/swisspalmR")
To query the SwissPalm database, first collect your protein identifiers in a character vector. Each identifier must be of a type supported by SwissPalm: a UniProt AC, UniProt secondary AC, UniProt ID, UniProt gene name, Ensembl protein, Ensembl gene, RefSeq protein ID, IPI ID, UniGene ID, PomBase ID, MGI ID, RGD ID, TAIR protein ID, or EuPathDB ID.
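Identifiers pasted from spreadsheets often carry stray whitespace, blanks, or duplicates. The following is a minimal base-R sketch of tidying such a vector before querying; `raw_ids` and `clean_ids` are hypothetical names used only for illustration, and this step is not part of swisspalmR itself.

```r
# Hypothetical raw input, e.g. copied from a spreadsheet column
raw_ids <- c(" P05067", "O00161", "", "P04899 ", "O00161", NA)

# Trim surrounding whitespace
clean_ids <- trimws(raw_ids)

# Drop missing and empty entries, then remove duplicates
clean_ids <- clean_ids[!is.na(clean_ids) & nzchar(clean_ids)]
clean_ids <- unique(clean_ids)

print(clean_ids)
```

Removing duplicates up front also avoids asking SwissPalm about the same identifier twice in one call.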
Once your identifiers are in a vector, you can query the SwissPalm database
using the swissPalm() function. You'll receive a 25-column data frame with a
row for each query ID supplied to the function, detailing various aspects of
S-palmitoylation for each protein found in SwissPalm. For example:
protein_ids <- c("P05067", "O00161", "P04899", "P98019")
# Only using 5 cols to restrict printed output
swisspalmR::swissPalm(protein_ids)[, c(1, 3, 4, 23, 24)]
#> Query_identifier UniProt_ID UniProt_status Protein_has_hits_in_SwissPalm
#> 1 P05067 A4_HUMAN Reviewed TRUE
#> 2 P04899 GNAI2_HUMAN Reviewed TRUE
#> 3 O00161 SNP23_HUMAN Reviewed TRUE
#> 4 P98019 COX2_ANAPL Reviewed FALSE
#> Orthologs_of_this_protein_have_hits_in_SwissPalm
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 FALSE
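A common next step is to separate proteins with hits from those without. The sketch below rebuilds the five printed columns as a stand-in data frame so that it runs offline; it assumes the two hit columns are logical, as the printed output suggests. With a live query you would use the data frame returned by swissPalm() instead of `res`.

```r
# Stand-in for the five columns printed above (values copied from that output)
res <- data.frame(
  Query_identifier = c("P05067", "P04899", "O00161", "P98019"),
  UniProt_ID = c("A4_HUMAN", "GNAI2_HUMAN", "SNP23_HUMAN", "COX2_ANAPL"),
  UniProt_status = "Reviewed",
  Protein_has_hits_in_SwissPalm = c(TRUE, TRUE, TRUE, FALSE),
  Orthologs_of_this_protein_have_hits_in_SwissPalm = c(TRUE, TRUE, TRUE, FALSE)
)

# which() skips NA values, so rows without a match are dropped safely
hits <- res[which(res$Protein_has_hits_in_SwissPalm), ]

# Query IDs that returned no hits
no_hits <- setdiff(res$Query_identifier, hits$Query_identifier)

print(hits$Query_identifier)
print(no_hits)
```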
You can test your protein accessions against specific datasets or
species in SwissPalm using the dataset and species parameters. Valid
values for dataset and species can be found in the package objects
swisspalmR::datasets and swisspalmR::species.
# Checking against only mallard ducks
mallard <- swisspalmR::species["Mallard duck"]
swisspalmR::swissPalm(protein_ids, species = mallard)[, c(1, 3, 4, 23, 24)]
#> Query_identifier UniProt_ID UniProt_status Protein_has_hits_in_SwissPalm
#> 1 P98019 COX2_ANAPL Reviewed FALSE
#> 2 O00161 <NA> <NA> NA
#> 3 P04899 <NA> <NA> NA
#> 4 P05067 <NA> <NA> NA
#> Orthologs_of_this_protein_have_hits_in_SwissPalm
#> 1 FALSE
#> 2 NA
#> 3 NA
#> 4 NA
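If you do not remember the exact name of a species, you can search the names of the lookup object. This sketch assumes swisspalmR::species is a named vector, as the `species["Mallard duck"]` indexing above suggests. `species_lookup` is a hypothetical stand-in: apart from "Mallard duck", its names are invented, and its values are placeholders rather than real SwissPalm codes.

```r
# Hypothetical stand-in for swisspalmR::species.
# Values are placeholders, NOT real SwissPalm species codes.
species_lookup <- c(
  "Mallard duck" = "code_1",
  "Human" = "code_2",
  "Mouse" = "code_3"
)

# Case-insensitive partial match on the names
matches <- species_lookup[grepl("duck", names(species_lookup), ignore.case = TRUE)]

print(names(matches))
```

The same pattern should work for swisspalmR::datasets if it is structured the same way.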
More information on using swissPalm() can be found in the introductory
vignette.
Note that swissPalm() is memoised: results are cached and returned if the
same inputs are provided to swissPalm() again within one session. This way,
swissPalm() can return repeated queries faster, without contacting SwissPalm
again. If you want the swissPalm() function to 'forget' previous results, use
memoise::forget(swissPalm).
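To illustrate what memoisation does, here is a small sketch using the memoise package on a toy function. `slow_lookup` is a hypothetical stand-in for a remote request and is not how swissPalm() is implemented internally; the counter only shows when the underlying function really runs.

```r
library(memoise)

# Toy stand-in for a slow remote lookup; counts how often it really runs
n_calls <- 0
slow_lookup <- function(id) {
  n_calls <<- n_calls + 1
  paste0("result for ", id)
}
cached_lookup <- memoise(slow_lookup)

first <- cached_lookup("P05067")   # computed
second <- cached_lookup("P05067")  # served from the cache
calls_before_forget <- n_calls

forget(cached_lookup)              # clear the cache
third <- cached_lookup("P05067")   # computed again
calls_after_forget <- n_calls
```

After the first two calls the toy function has run once; after forget() it runs a second time.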
Though swissPalm() is memoised, the cache only matches calls with identical
arguments. The function will therefore request data it has already received
from SwissPalm again if the same ID is supplied in a different vector, or if
different species/dataset parameters are used.
swissPalm(query_id = "P05067")
swissPalm(query_id = "P05067", species = "7")
swissPalm(query_id = c("P05067", "P04899"))
In the above calls, data for “P05067” is requested from SwissPalm three
times even though swissPalm() is memoised. I plan to implement a caching
system, separate from memoise, which caches swissPalm() outputs in
memory. These could be retrieved when necessary to further reduce the
load on the SwissPalm database.
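One way such a cache could work is sketched below. This is an illustration under stated assumptions, not the planned implementation: `fetch_one`, `cached_query` and `id_cache` are hypothetical names, and `fetch_one` is a stand-in that makes no network request. Results are stored per ID and species, since the same ID can give different results for different species.

```r
# Environment used as an in-memory key-value store
id_cache <- new.env(parent = emptyenv())
n_requests <- 0

# Hypothetical stand-in for a single-ID request to SwissPalm
fetch_one <- function(id, species) {
  n_requests <<- n_requests + 1
  data.frame(Query_identifier = id, species = species)
}

# Look up each ID in the cache and fetch only those not seen before
cached_query <- function(ids, species = "all") {
  rows <- lapply(ids, function(id) {
    key <- paste(id, species, sep = "|")
    if (!exists(key, envir = id_cache, inherits = FALSE)) {
      assign(key, fetch_one(id, species), envir = id_cache)
    }
    get(key, envir = id_cache, inherits = FALSE)
  })
  do.call(rbind, rows)
}

res_a <- cached_query("P05067")
res_b <- cached_query(c("P05067", "P04899"))  # only P04899 is fetched
print(n_requests)
```

A real version would probably send all uncached IDs in one batched request rather than one request per ID.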
Additionally, the SwissPalm database has more than just the
protein-level data accessed by swissPalm(). This includes data on
hits/sites and experiments. I plan to extend swisspalmR to access
this data.
The SwissPalm database is available under a Creative Commons BY-NC-ND license.
SwissPalm reference: SwissPalm: Protein Palmitoylation database. Mathieu Blanc*, Fabrice P.A. David*, Laurence Abrami, Daniel Migliozzi, Florence Armand, Jérôme Burgi and F. Gisou van der Goot. F1000Research.