
Review #1 #1

Open
distillpub-reviewers opened this issue Oct 19, 2018 · 5 comments
Labels
peer-review This issue contains a peer review from the Distill review process
@distillpub-reviewers

The following peer review was solicited as part of the Distill review process.

The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to review this article.


This article was a delight to read, and hits many of the right notes in what I think makes a good Distill article:

  • The writing is crisp. I enjoyed the brevity of notation and mathematical minimalism --- the essential points are communicated without burdening the reader with unnecessary formalism.
  • The figures are of exceptional quality and include many design innovations. E.g., the use of toggle-able “pre-set” data-points is very elegant and trumps conventional widgets that allow the user to put in data-points wherever they wish. The diagrams are also technically very well executed; the animations and interactions are fluid and work with minimal stutter. The presentation is overall outstanding.

What is unclear to me, however, is if this article breaks new ground in exposition. A lot of space, e.g., is devoted to exposition of basic properties of the multivariate Gaussian distribution. Many of the figures correspond to visualizations that I already have in my head (though this is probably a good thing!), and the depth of the material on GPs themselves is not beyond what is covered in introductory courses. What I’m saying is that a reader already familiar with GPs is unlikely to take away anything from this. A new reader, however, who has not developed this visual “muscle”, would likely take away a lot. I will thus leave it to the editors to decide if Distill is an appropriate venue for such an article.

Personally, however, I think it is a solid piece of writing and thus I recommend it for publication.

Specific comments

[minor] based on prior knowledge -> incorporating prior knowledge

[minor] The statement [Gaussian processes assigns a probability to each of those functions] needs a citation.

[typo] symmetric -> symmetric

[nit] I am not a fan of the notation P(X,Y) used to denote the joint distribution of X and Y. This appears to be interchangeable with p_{X,Y}, used elsewhere in the article to refer to the joint density. These two notations refer to essentially the same thing.

[nit] Though it is clear from context what the article means, the article needs to be careful about distinguishing between probabilities and probability densities.
e.g. in statements like “we are interested in the probability of X=x”

[minor] In the section “kernels”, L2-norm should be L2-distance

[minor] Periodic kernels are not invariant to translations but invariant to translations equal to the period of the function.

[major] Kernels can be combined by addition, multiplication and concatenation (see Chapter 5.4 of MacKay, Information Theory, Inference, and Learning Algorithms). The omission of addition as a way of combining kernels is surprising.

[minor] Furthermore, it would be nice to see an explanation of what it means when two kernels are combined. The resulting kernels inherit the qualitative properties of its constituents, but how, and why?
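The combination question above can be made concrete with a short sketch. The Python below is illustrative only (the kernel definitions, parameter names, and grid are my own, not from the article); it checks that both the sum and the product of two kernels still yield a symmetric, positive semi-definite covariance matrix:

```python
import numpy as np

# Illustrative kernel functions; names and parameters are chosen
# for this sketch only, not taken from the article.
def rbf(x, y, length_scale=1.0):
    return np.exp(-0.5 * (x - y) ** 2 / length_scale ** 2)

def periodic(x, y, period=1.0, length_scale=1.0):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x - y) / period) ** 2
                  / length_scale ** 2)

def combine(k1, k2, op):
    # Sums and products of kernels are again kernels
    # (both operations preserve positive semi-definiteness).
    return lambda x, y: op(k1(x, y), k2(x, y))

xs = np.linspace(0.0, 5.0, 50)
k_sum = combine(rbf, periodic, np.add)
K = np.array([[k_sum(a, b) for b in xs] for a in xs])

# The combined Gram matrix stays symmetric and PSD (up to round-off).
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-9
```

Because both operations preserve positive semi-definiteness, the combined kernel inherits structure from both constituents: e.g., an RBF term plus a periodic term gives smooth functions with a repeating component, which is the kind of "how and why" the article could spell out.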

[major] There seems to be a problem in the figure right above “Conclusion”, when the linear kernel is used alone. The linear kernel has rank 2, and thus the distribution after conditioning is not well defined if there are more than 2 points observed. The widget, however, does not seem to complain when more than two points are added. What is going on? I suspect that the regression has been done with a noise model, i.e. with a small multiple of the identity added to fix the singularity. This is completely valid, but it should be discussed explicitly.
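The rank argument is easy to verify numerically. A hypothetical sketch (Python; it assumes a linear kernel of the form k(a, b) = 1 + ab, which may differ in detail from the article's widget):

```python
import numpy as np

# A linear kernel k(a, b) = 1 + a*b spans only constant and linear
# functions, so its Gram matrix has rank at most 2.
x = np.linspace(-1.0, 1.0, 5)
K = 1.0 + np.outer(x, x)
assert np.linalg.matrix_rank(K) == 2  # singular for more than 2 points

# Adding a small multiple of the identity ("jitter", equivalently a
# noise model) restores invertibility -- presumably what the widget does.
K_jitter = K + 1e-6 * np.eye(len(x))
assert np.linalg.matrix_rank(K_jitter) == len(x)
```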

[major] To speak more broadly on this point, it is worth mentioning that GP regression is typically done in conjunction with a noise model. This allows the posterior some flexibility such that it does not need to pass through every observation (indeed, this constraint makes GPs nearly unusable). This addition requires very little extension to the GP framework and should be discussed explicitly.
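To illustrate how small this extension is: only one term changes in the standard posterior equations, namely a σ²I added to the training covariance. A minimal sketch in Python with made-up data (the kernel, data points, and noise level are assumptions for illustration, not taken from the article):

```python
import numpy as np

def rbf(X1, X2, length_scale=1.0):
    # Squared-exponential kernel on 1-D inputs.
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * d ** 2 / length_scale ** 2)

# Made-up training data for illustration.
X_train = np.array([0.0, 1.0, 3.0])
y_train = np.array([1.0, -1.0, 0.5])
X_test = np.linspace(0.0, 3.0, 7)
noise = 0.1  # observation noise variance sigma^2

# The only change from noise-free GP regression:
# add sigma^2 * I to the training covariance before solving.
K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
K_s = rbf(X_train, X_test)
mean = K_s.T @ np.linalg.solve(K, y_train)
cov = rbf(X_test, X_test) - K_s.T @ np.linalg.solve(K, K_s)

# With noise > 0 the posterior mean no longer interpolates the
# observations exactly (X_test[0] coincides with X_train[0] here).
assert abs(mean[0] - y_train[0]) > 1e-3
```

With noise = 0 the same code reduces to the interpolating posterior the article describes, so the extension really is a one-line change.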

[minor] I do think this is what the authors seem to be getting at in the conclusion, at the line "If we need to deal with real-world data, we will often find measurements that are afflicted with uncertainty and errors." The cited article, however, seems to be more sophisticated than what is needed to address the basic problem.

A final thought

I think the article could benefit from a hero figure that is a true “GP sandbox”. It would be nice to be able to combine kernels, change the parameters of the kernels, and add/remove data points in real time, all in one place.


Distill employs a reviewer worksheet as a help for reviewers.

The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.

Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Explanation of existing results

Advancing the Dialogue Score
How significant are these contributions? 3/5
Outstanding Communication Score
Article Structure 4/5
Writing Style 5/5
Diagram & Interface Style 5/5
Impact of diagrams / interfaces / tools for thought? 3/5
Readability 4/5
Scientific Correctness & Integrity Score
Are claims in the article well supported? 4/5
Does the article critically evaluate its limitations? How easily would a lay person understand them? 3/5
How easy would it be to replicate (or falsify) the results? 3/5
Does the article cite relevant work? 3/5
Does the article exhibit strong intellectual honesty and scientific hygiene? 3/5
distillpub-reviewers added the peer-review label Oct 19, 2018

grtlr commented Nov 6, 2018

Thank you very much for your detailed review! We are sorry for our late response - we agree with your feedback and will make changes to the article to reflect this. Currently, we are waiting for the other reviews before making major changes. Many of your specific comments should be straightforward to fix.

[major] To speak more broadly on this point, it is worth mentioning that GP regression is typically done in conjunction with a noise model. This allows the posterior some flexibility such that it does not need to pass through every observation (indeed, this constraint makes GPs nearly unusable). This addition requires very little extension to the GP framework and should be discussed explicitly.

[minor] I do think this is what the authors seem to be getting at in the conclusion, at the line "If we need to deal with real-world data, we will often find measurements that are afflicted with uncertainty and errors." The cited article, however, seems to be more sophisticated than what is needed to address the basic problem.

As you have mentioned, the noise model for the input is a very important aspect of Gaussian processes because this is what makes them practical. We agree that in the current version this is not covered adequately and we plan on adding an additional paragraph, together with a corresponding figure.

I think the article could benefit from a hero figure that is a true “GP sandbox”. It would be nice to be able to combine kernels, change the parameters of the kernels, and add/remove data points in real time, all in one place.

This sounds like a really interesting idea! Ideally, we would want to add an additional example that is more tangible (e.g. temperature data) and also has a larger number of training points. In that regard, we need to see what is possible with our current JavaScript implementation.


grtlr commented Dec 3, 2018

PR #2 addresses some of the comments from R1:

[minor] based on prior knowledge -> incorporating prior knowledge

[minor] The statement [Gaussian processes assigns a probability to each of those functions] needs a citation.

[typo] symmetric -> symmetric

[minor] In the section “kernels”, L2-norm should be L2-distance

I will make additional changes in the next couple of days.

grtlr added this to the revisions milestone Jan 3, 2019

grtlr commented Jan 28, 2019

Commit 48aa644 fixes the following:

[nit] I am not a fan of the notation P(X,Y) used to denote the joint distribution of X and Y. This appears to be interchangeable with p_{X,Y}, used elsewhere in the article to refer to the joint density. These two notations refer to essentially the same thing.

Commit 5752d96 fixes the following:

[nit] Though it is clear from context what the article means, the article needs to be careful about distinguishing between probabilities and probability densities.
e.g. in statements like “we are interested in the probability of X=x”


grtlr commented Jan 30, 2019

[major] To speak more broadly on this point, it is worth mentioning that GP regression is typically done in conjunction with a noise model. This allows the posterior some flexibility such that it does not need to pass through every observation (indeed, this constraint makes GPs nearly unusable). This addition requires very little extension to the GP framework and should be discussed explicitly.

[minor] I do think this is what the authors seem to be getting at in the conclusion, at the line "If we need to deal with real-world data, we will often find measurements that are afflicted with uncertainty and errors." The cited article, however, seems to be more sophisticated than what is needed to address the basic problem.

PR #18 adds more information about extending Gaussian processes to input noise.

grtlr added a commit that referenced this issue Jan 31, 2019
We provide more details on the error model for noisy inputs, as a valuable extension to Gaussian processes. (#1, #4)
@RKehlbeck (Contributor)

[minor] Periodic kernels are not invariant to translations but invariant to translations equal to the period of the function.

[major] Kernels can be combined by addition, multiplication and concatenation (see Chapter 5.4 of MacKay, Information Theory, Inference, and Learning Algorithms). The omission of addition as a way of combining kernels is surprising.

[minor] Furthermore, it would be nice to see an explanation of what it means when two kernels are combined. The resulting kernels inherit the qualitative properties of its constituents, but how, and why?

PR #11 addresses the above issues, adds additional information on kernels, and includes a new static figure.
