Dear Editor,
Biological studies often rely on complicated data analysis pipelines. Generally, these consist of several steps, each taking a larger amount of input data, which they condense and summarise into a smaller amount of output data -- until the final results can be visualised in overview plots that are easy to interpret.
To ensure that the intermediate steps are correct, most studies merely rely on ad hoc quality filters that have been chosen a priori, without knowledge of the actual data. How can we make sure that all the steps have worked as intended on the actual data at hand? We believe that manual inspection using suitable data visualisation is of utmost importance. In our manuscript, we present a very general tool that can be used for such tasks.
Checking an analysis pipeline is best done by "looking backwards": after a preliminary analysis has been completed, one should create suitable plots to double-check especially those parts of the intermediate data that gave rise to the "hits" in the outcome, i.e., to the results that seem significant or unusual, and compare how the raw data leading to them differ from the raw data behind "average-looking" results. In this way, one can ensure that one is able to reliably distinguish interesting biological effects from technical abnormalities.
Here, we present a novel tool to facilitate such spot checks of summarised biological data: "LinkedCharts" is an R library for creating interactive apps based on the concept of "linking" two or more plots. Typically, one of the plots displays data summarising the analysis result, while another plot shows details, such as intermediate results or underlying raw data, for one of the items shown in the overview plot. The user can choose, e.g., by a mouse click in the first plot, which details to see in the second plot. By chaining more than two plots, one can even walk an analysis backwards all the way from the final result to the initial raw data.
The following web page illustrates this with a few examples (which are all discussed in the manuscript): https://anders-biostat.github.io/lc-paper/
It may seem that there already are quite a few tools for interactive data visualisation. Some are very easy to use but restricted to only one very specific kind of data. Others are extremely general but require substantial effort and considerable skill to use. The great gap in between is filled by LinkedCharts. Even though it is very general and can be fitted to any specific task, it allows for extremely rapid development of draft-like interactive visualisations: it is possible to set up an interactive visualisation with no more effort than is needed for conventional data exploration with static plots.
Therefore, LinkedCharts is especially apt for the early exploratory stages of a study, for bioinformaticians creating a new analysis pipeline and needing to get a "feel" for their data -- and even more so for non-standard, innovative analyses that usually happen at the edge of scientific progress.
With a bit more effort, however, LinkedCharts is also suitable for producing "publication-quality" web apps. Initial draft apps can later be converted into polished and well-organised versions, which may then serve as an interactive supplement to a publication. By making a paper's data and analysis more open and accessible, LinkedCharts is also a tool towards more open science.
For all these reasons, we are convinced that our work will be of great interest to the readers of Nature Methods.
With best regards,
Simon Anders and Svetlana Ovchinnikova