swisspalmR: Access SwissPalm data through R

Project Status: WIP. Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

swisspalmR is a small package with one purpose: retrieval of S-palmitoylation data from the SwissPalm database using httr2, rvest and curl.

Installation

You can install the development version of swisspalmR from GitHub with:

# install.packages("devtools")
devtools::install_github("simpar1471/swisspalmR")

Examples

To query the SwissPalm database, first put your protein identifiers in a character vector. Each identifier must be of a type supported by SwissPalm: a UniProt AC, UniProt secondary AC, UniProt ID, UniProt gene name, Ensembl protein, Ensembl gene, RefSeq protein ID, IPI ID, UniGene ID, PomBase ID, MGI ID, RGD ID, TAIR protein ID, or EuPathDB ID.

Once the identifiers are in a vector, pass it to the swissPalm() function to query the SwissPalm database. It returns a 25-column data frame with rows for each query ID supplied to the function, detailing various aspects of S-palmitoylation for each protein found in SwissPalm. For example:

protein_ids <- c("P05067", "O00161", "P04899", "P98019")
# Only using 5 cols to restrict printed output
swisspalmR::swissPalm(protein_ids)[, c(1, 3, 4, 23, 24)]
#>   Query_identifier  UniProt_ID UniProt_status Protein_has_hits_in_SwissPalm
#> 1           P05067    A4_HUMAN       Reviewed                          TRUE
#> 2           P04899 GNAI2_HUMAN       Reviewed                          TRUE
#> 3           O00161 SNP23_HUMAN       Reviewed                          TRUE
#> 4           P98019  COX2_ANAPL       Reviewed                         FALSE
#>   Orthologs_of_this_protein_have_hits_in_SwissPalm
#> 1                                             TRUE
#> 2                                             TRUE
#> 3                                             TRUE
#> 4                                            FALSE
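
Selecting columns by position, as above, depends on the column order staying fixed. As a minimal sketch, the same result can be filtered by column name instead. The data frame below is a hand-built mock of the five columns printed above (a real call returns 25 columns), and the names `res` and `hits` are illustrative only.

```r
# Mock of the five columns printed above; a real swissPalm() call returns 25.
res <- data.frame(
  Query_identifier = c("P05067", "P04899", "O00161", "P98019"),
  UniProt_ID = c("A4_HUMAN", "GNAI2_HUMAN", "SNP23_HUMAN", "COX2_ANAPL"),
  UniProt_status = "Reviewed",
  Protein_has_hits_in_SwissPalm = c(TRUE, TRUE, TRUE, FALSE),
  Orthologs_of_this_protein_have_hits_in_SwissPalm = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only proteins with hits in SwissPalm.
# `%in% TRUE` drops NA rows as well as FALSE rows.
hits <- res[res$Protein_has_hits_in_SwissPalm %in% TRUE, ]
hits$Query_identifier
```

Using `%in% TRUE` rather than a bare logical index avoids all-NA rows appearing in the result when a query ID is not found.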

You can test your protein accessions against specific datasets or species in SwissPalm using the dataset and species parameters. Valid values for dataset and species can be found in the package objects swisspalmR::datasets and swisspalmR::species.

# Checking against only mallard ducks
mallard <- swisspalmR::species["Mallard duck"]
swisspalmR::swissPalm(protein_ids, species = mallard)[, c(1, 3, 4, 23, 24)]
#>   Query_identifier UniProt_ID UniProt_status Protein_has_hits_in_SwissPalm
#> 1           P98019 COX2_ANAPL       Reviewed                         FALSE
#> 2           O00161       <NA>           <NA>                            NA
#> 3           P04899       <NA>           <NA>                            NA
#> 4           P05067       <NA>           <NA>                            NA
#>   Orthologs_of_this_protein_have_hits_in_SwissPalm
#> 1                                            FALSE
#> 2                                               NA
#> 3                                               NA
#> 4                                               NA

More information on using swissPalm() can be found in the introductory vignette.

Note that swissPalm() is memoised: results are cached, and the cached result is returned if the same inputs are provided to swissPalm() again in the same session. This way, repeated calls return faster and do not send another request to SwissPalm. If you want the swissPalm() function to ‘forget’ previous results, use memoise::forget(swissPalm).
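
The memoisation behaviour described above can be illustrated without contacting SwissPalm. This is a sketch only: `slow_lookup()` is a hypothetical stand-in for a remote query, not part of swisspalmR, and a counter records how often it actually runs. It assumes the memoise package is installed.

```r
n_requests <- 0

# Hypothetical stand-in for a remote query; counts how often it really runs.
slow_lookup <- function(ids) {
  n_requests <<- n_requests + 1
  toupper(ids)
}

cached_lookup <- memoise::memoise(slow_lookup)

cached_lookup(c("p05067", "o00161"))  # first call: runs slow_lookup()
cached_lookup(c("p05067", "o00161"))  # identical inputs: served from the cache
n_requests_before_forget <- n_requests

memoise::forget(cached_lookup)        # clear the cache
cached_lookup(c("p05067", "o00161"))  # runs slow_lookup() again
n_requests_after_forget <- n_requests
```

Here the underlying function runs once before the cache is cleared and a second time afterwards.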

Planned features

Though swissPalm() is memoised, the cache is keyed on the complete set of inputs. The function will therefore re-request data it has already received from SwissPalm if the same ID is supplied as part of a different vector, or if different species/dataset parameters are used.

swissPalm(query_id = "P05067")
swissPalm(query_id = "P05067", species = "7")
swissPalm(query_id = c("P05067", "P04899"))

In the above calls, data for “P05067” is requested from SwissPalm three times even though swissPalm() is memoised. I plan to implement a caching system, separate from memoise, which caches swissPalm() outputs in memory on a finer-grained basis. These cached outputs could be retrieved when necessary to further reduce the load on the SwissPalm database.
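
One possible shape for such a cache, offered as a rough sketch rather than the planned implementation, is an in-memory environment keyed by query ID together with the species and dataset settings. Everything here is hypothetical: `fetch_remote()` stands in for the real request, and `cached_query()`, `id_cache` and the `"all"` defaults are invented for illustration. The sketch also assumes that the row returned for one ID does not depend on which other IDs are in the same query.

```r
# In-memory cache keyed by query ID plus species/dataset settings.
id_cache <- new.env(parent = emptyenv())

# Hypothetical stand-in for the remote request; records every ID it is sent.
requested <- character(0)
fetch_remote <- function(ids, species = "all", dataset = "all") {
  requested <<- c(requested, ids)
  data.frame(Query_identifier = ids, species = species, dataset = dataset,
             stringsAsFactors = FALSE)
}

cached_query <- function(ids, species = "all", dataset = "all") {
  keys <- paste(ids, species, dataset, sep = "|")
  is_cached <- vapply(keys, exists, logical(1),
                      envir = id_cache, inherits = FALSE)
  # Only IDs not yet seen with these settings go to the remote source.
  missing_ids <- unique(ids[!is_cached])
  if (length(missing_ids) > 0) {
    fresh <- fetch_remote(missing_ids, species, dataset)
    for (i in seq_len(nrow(fresh))) {
      key <- paste(fresh$Query_identifier[i], species, dataset, sep = "|")
      assign(key, fresh[i, , drop = FALSE], envir = id_cache)
    }
  }
  # Reassemble rows in the order the IDs were supplied.
  rows <- lapply(keys, get, envir = id_cache, inherits = FALSE)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

first <- cached_query("P05067")
both <- cached_query(c("P05067", "P04899"))  # only P04899 is fetched
requested
```

Because the key includes the species and dataset settings, a call with a different species would still trigger a new request for “P05067”. Results can differ between species filters, so sharing rows across settings would not be safe in this sketch.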

Additionally, the SwissPalm database has more than just the protein-level data accessed by swissPalm(). This includes data on hits/sites and experiments. I plan to extend swisspalmR for accessing this data.

Credit and copyright

The SwissPalm database is available under a Creative Commons BY-NC-ND license. SwissPalm reference: SwissPalm: Protein Palmitoylation database. Mathieu Blanc*, Fabrice P.A. David*, Laurence Abrami, Daniel Migliozzi, Florence Armand, Jérôme Burgi and F. Gisou van der Goot. F1000Research.