The following is a list of papers that I love. It's a bit of a living document, and I hope to eventually write a short motivation/comment for all of the papers. They deserve it.
The papers are not selected for their importance or impact, but for how interesting they are to me. I exclude papers that I've been involved with or where I have personal attachments to the authors. The order has no particular meaning.
- Zadeh (2006): Generalized theory of uncertainty (GTU)—principal concepts and ideas. TBD
- Robert Abrahart (2005): Neurohydrology: implementation options and a research agenda. The paper that schmidhubered our little initiative. From today's point of view it is very interesting to see how many things Abrahart was able to conceptualize.
- Horvath and Solenthaler (2013): Mass Preserving Multi-Scale SPH. TBD
- Judd (2016): Fifty years of forecasting chaos and the shadow of imperfect models. TBD
- Kalra and Paddock (2016): Driving to Safety. TBD
- Zachary Lipton (2016): The Mythos of Model Interpretability. TBD
- Ha and Schmidhuber (2018): World models. TBD
- Fort, Hu, and Lakshminarayanan (2020): Deep Ensembles: A Loss Landscape Perspective. TBD
- Surís, Vondrick (2022): Representing Spatial Trajectories as Distributions. TBD
- ⭐️ Surís, Liu, Vondrick (2021): Learning the Predictability of the Future. Exploits the properties of hyperbolic embeddings (or hyperbolic space) for processes over time.
- ⭐️ Teney et al. (2022): Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization. Develops a gradient-based regularization technique to encourage diversity in ensembles. A tiny sketch of the idea is below.
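  My reading of the core idea as a minimal PyTorch sketch (not the authors' code; `models`, `opt`, `x`, `y`, and `lam` are placeholders): train the ensemble members jointly and penalize the alignment of their input gradients so they latch onto different features.

  ```python
  import torch
  import torch.nn.functional as F

  def diversity_penalty(models, x, y):
      """Squared cosine similarity between the input gradients of each pair of members."""
      x = x.clone().requires_grad_(True)
      grads = []
      for m in models:
          loss = F.cross_entropy(m(x), y)
          (g,) = torch.autograd.grad(loss, x, create_graph=True)
          grads.append(g.flatten(1))
      penalty = 0.0
      for i in range(len(grads)):
          for j in range(i + 1, len(grads)):
              penalty = penalty + F.cosine_similarity(grads[i], grads[j], dim=1).pow(2).mean()
      return penalty

  def train_step(models, opt, x, y, lam=0.1):
      # Task loss for every member plus the weighted diversity term.
      task = sum(F.cross_entropy(m(x), y) for m in models)
      loss = task + lam * diversity_penalty(models, x, y)
      opt.zero_grad()
      loss.backward()
      opt.step()
      return loss.item()
  ```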
- ⭐️ Teney et al. (2023): ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets. Studies how different relationships between ID and OOD performance can arise (with a special emphasis on inverse correlations, since they do not seem to be reported often).
- Jeffares et al. (2023): Joint Training of Deep Ensembles Fails Due to Learner Collusion. Presents a failure mode for the joint training of deep ensembles.
- Althoff et al. (2017): Large-scale physical activity data reveal worldwide activity inequality. This is an extremely well written paper about a super interesting topic. Namely: they use smartphone data from over 700k people as a proxy for activity. Interestingly, their analysis suggests that the inequality in activity is the best (found) predictor for obesity, and that reduced activity in females accounts for a large portion of the observed inequality. Maybe less surprising is their corollary finding that aspects of the built environment, such as the walkability of a city, are associated with a smaller inequality.
- Bachman and Nagarajan (2024; arxiv): The pitfalls of next-token prediction. This is a very well written paper that provides some empirical evidence that the current generation of language models is not only limited because of the autoregressive rollout during inference, but also because of the teacher-forcing-based training. Very interesting.
- Bassi et al. (2024; HESS preprint): Learning Landscape Features from Streamflow with Autoencoders. This has shaped into a very cool paper. My review of the preprint can be found here.
- Kreibich et al. (2022): The challenge of unprecedented floods and droughts in risk management. The authors study how society adapts after large flood/drought events. They do so by compiling a “huge” data-set of event-pairs that happened at given places (afaik it is the largest of its kind, but in general terms it is still very small). The process they use to rate how small/large events are is quite interesting to me, since it is based on a consensus approach. As a nitpick, I'd mention that the overall tone is a bit too negative for what the data suggest. For example, I would argue that even if an event of the same magnitude happens a second time and the loss is reduced, then that should be interpreted as a win for society --- there are even scenarios where two events of the same magnitude with the same impact need to be counted as a win, e.g., when demographic change would induce higher vulnerability (but the risk prevention measures still lead to the same loss).
- Devitt et al. (2023): Flood hazard potential reveals global floodplain settlement patterns. This is an interesting paper that tries to map inundation risk from fluvial events globally. The authors argue that the evenness of the distribution of risk indicates that society is (currently) able to adapt to flood risks. I, however, am a bit worried about the large noise in their global map, since it suggests low locality of risk.
- Feichtenhofer et al. (2022): Masked Autoencoders As Spatiotemporal Learners. Super cool (empirical) work, following the scalable vision learners paper. I really, really like the masked autoencoder concept and all the thought the authors poured into the paper to get it running for video-like data.
- Formetta & Feyen (2019): Empirical evidence of declining global vulnerability to climate-related hazards. Very cool study! Basically, they provide empirical evidence that humanity as a whole is currently becoming less vulnerable to environmental hazards. The mechanism behind this progress is attributed to the global increase in wealth, since richer societies have better systems to cope with hazards and prevent disasters (interestingly enough, in their data heatwaves seem to be the exception to this rule (my thought: maybe because richer societies tend to have older populations?), even if not much data on them is available). It is, of course, unclear how long this relationship holds, and the study is associated with a lot of uncertainties (the authors do a really good job at mentioning them throughout their exposition) --- still, the result is cool!
- Schaefli & Gupta (2007): Do Nash values have value?. Interesting discussion of the NSE, which takes the perspective that the mean flow in the denominator acts as a reference model (see the formula below). They also introduce and discuss some other reference models. It's all fascinating, but, as far as I know, nothing from the paper is used by anyone (not even by the authors) --- it mainly seems to get citations when people need a reference that explains the limited nature of the NSE. That is also fine, I guess.
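  For reference, the NSE compares the model's squared errors against those of the mean-of-observations benchmark, which is what makes the reference-model reading natural:

  ```latex
  \mathrm{NSE} = 1 - \frac{\sum_{t} \left( Q_{\mathrm{sim}}^{t} - Q_{\mathrm{obs}}^{t} \right)^{2}}
                          {\sum_{t} \left( Q_{\mathrm{obs}}^{t} - \overline{Q}_{\mathrm{obs}} \right)^{2}}
  ```

  An NSE of 1 is a perfect fit, 0 means the model does no better than predicting the mean, and negative values mean the mean is the better predictor.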
- O and Orth (2021): Global soil moisture data derived through machine learning trained with in-situ measurements. I do not have much to say about this one. The paper is interesting for me because it uses our regional LSTM approach to derive a soil moisture product. I do not know how interesting it is for others.
- ⭐️ Singha (2023): Giving Shape to a Meaningful and Fulfilling Career in Science: Some No‐Nonsense Advice. This perspective (?) paper is great. It is written in a very light-hearted and funny style, but also gives very tangible advice for building a scientific career in geoscience. I wish there were more papers like this out there, and I wish I had read them at an earlier stage of my life.
- Chapelle et al. (2000): Vicinal Risk Minimization. Roughly speaking, the idea of vicinal risk minimization is to not only minimize the empirical risk, but also draw in information from nearby data points. I adore this idea, but so far have not found a problem where I can use this framing. So far! (A tiny illustration is below.)
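  Mixup is arguably the best-known instance of the framing; a minimal NumPy sketch (my illustration, not code from the paper): instead of training on the raw pairs, sample from a "vicinity" around them by interpolating inputs and labels.

  ```python
  import numpy as np

  def mixup_batch(x, y, alpha=0.2, rng=None):
      """Draw a batch from a vicinal distribution around (x, y) via interpolation."""
      if rng is None:
          rng = np.random.default_rng()
      lam = rng.beta(alpha, alpha)      # interpolation weight
      perm = rng.permutation(len(x))    # random partner for each sample
      x_mix = lam * x + (1 - lam) * x[perm]
      y_mix = lam * y + (1 - lam) * y[perm]
      return x_mix, y_mix
  ```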
- Mumford (1999): The Dawning of the Age of Stochasticity. It is very interesting to read this in retrospective. I guess one should just read the paper for that alone. That is, to see what Mumford conceptualized (e.g., in terms of Machine Learning) and how things progressed. Apart from that, I also really like how Mumford conceptualizes mathematics as such and how he contrasts logic with probability. Very stimulating.
- Moscovich & Rosset (2020): On the cross-validation bias due to unsupervised preprocessing. Technical discussion of how pre-processing steps, such as clustering, can induce biases in cross-validation. Very interesting. The standard remedy is sketched below.
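  The standard remedy (everyday scikit-learn practice, not code from the paper) is to fit the preprocessing inside each fold rather than on the full dataset, so the test folds cannot leak into the preprocessing statistics:

  ```python
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import cross_val_score
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import StandardScaler

  X, y = make_classification(n_samples=500, random_state=0)

  # Biased: the scaler sees all rows, including the future test folds.
  X_leaky = StandardScaler().fit_transform(X)
  leaky_scores = cross_val_score(LogisticRegression(), X_leaky, y, cv=5)

  # Unbiased: the pipeline refits the scaler on the training part of each fold.
  pipe = make_pipeline(StandardScaler(), LogisticRegression())
  clean_scores = cross_val_score(pipe, X, y, cv=5)
  ```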
- Duc & Sawada (2023): A signal-processing-based interpretation of the Nash–Sutcliffe efficiency. A very interesting technical discussion of the NSE. I am not sure how surprising their results are and I think the Gaussian assumptions are not necessarily useful for hydrology. Still, a good perspective.
- Grillakis et al. (2022): Climate drivers of global wildfire burned area. The topic is super cool, but I do not like (or understand) their binning approach. Do they not induce (pseudo) correlations by their partitioning?
- Good & Mittal (1987): The amalgamation and geometry of two-by-two contingency tables. One of the original amalgamation paradox papers. Mostly of historical interest to me.
- ⭐️ Breiman (2001): Statistical Modeling: The Two Cultures. This is a legendary paper. I would recommend it to everyone who works in machine learning and data science. It not only describes on a high level the difference between classical statistics and machine learning, but it also goes deep on a lot of topics. I particularly like the conceptualization of the Rashomon set --- for that alone the paper is a must read.
- Ghamizi et al. (2022): Adversarial Robustness in Multi-Task Learning: Promises and Illusions. This is an extremely interesting contribution on how to use multi-task learning for adversarial robustness. I am not sure how good it is in terms of the adversarial attacks topic, since I am not particularly interested in or knowledgeable about that aspect of the paper.
- Balestriero, Pesenti & LeCun (2021): Learning in High Dimension Always Amounts to Extrapolation. This is a very well written paper, which demonstrates that it is not so easy to define extrapolation for higher dimensions.
- Duede (2023): Deep Learning Opacity in Scientific Discovery. This is an extremely interesting discussion about what science can do with black box models. Specifically, Duede makes a nuanced argument that the opacity induced by black box models does not necessarily imply fewer scientific breakthroughs.
- Shwartz-Ziv & LeCun (2023): To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review. This contribution provides a very nice framework to think about deep learning in general, and self-supervised learning in particular. I think it nudges you towards a certain way of thinking, but that bias is often useful.
- ⭐️ Puchert et al. (2021): Data-driven deep density estimation. I really, really like the technique they developed --- and actually find it surprising that more people have not picked it up. I think nowadays an architecture with attention mechanisms could do the density estimation in a more elegant way. Still, the idea they had was nice.
- Klemeš (1986): Operational testing of hydrological simulation models. The paper that brought split-sample testing to hydrology. It is actually more nuanced than current practice would suggest. In terms of the split-sample test it is interesting to note that Klemeš actually proposed a two-fold cross-validation. The differential split-sample test (where you, e.g., train a model on low-flow periods only and then evaluate it on high-flow periods, and vice versa) is also interesting, since it coincides with hydrological intuition, but it is not adequate for testing ML models or for anticipating where their predictions will be bad. I think there is still work to do here to properly understand its implications.
- Nash and Sutcliffe (1970): River flow forecasting through conceptual models part I—A discussion of principles. The OG NSE paper. It is actually much more thoughtful than people give it credit for.
- Gupta & Govindaraju (2023). An okayish overview of uncertainty quantification for rainfall-runoff modelling.
- Arlot & Celisse (2010): A survey of cross-validation procedures for model selection. Technical, but very good survey on cross-validation based model selection. It provides a good overview of the different techniques, and I found the discussion on correlated data very nice.
- Chen, Cheung, & Yiu (2020): Metamorphic Testing: A New Approach for Generating Next Test Cases. Given that the geosciences seem to start framing a particular kind of counterfactual model evaluation as “metamorphic testing”, it is interesting to learn what metamorphic testing actually is. I like the idea and also how some people adapt it to other fields (a toy example of the original idea is below). I am, however, not sure how good it is for testing data-driven models. In general it does not make much sense to me to require a data-driven model to perform well outside the data manifold/envelope. Maybe it could make sense if said models have some invariance or are trained with the test in mind.
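  For flavor, a toy metamorphic test (my illustration, not from the paper): even without an oracle that knows the correct output for a given input, we can check a relation that must hold across transformed inputs.

  ```python
  import math

  def my_sin(x):
      return math.sin(x)  # stand-in for the implementation under test

  def test_sin_metamorphic():
      for x in [0.1, 0.7, 1.3, 2.9]:
          # Metamorphic relation: sin(x) == sin(pi - x), no ground truth needed.
          assert math.isclose(my_sin(x), my_sin(math.pi - x), rel_tol=1e-9)

  test_sin_metamorphic()
  ```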
- Mao et al. (2020): Multitask Learning Strengthens Adversarial Robustness. This is a very cool paper. It shows that multitask learning helps against adversarial attacks (blood-pressure-preserving note: follow-up work has shown that one cannot just add random tasks for this).
- Montero-Manso & Hyndman (2021): Principles and algorithms for forecasting groups of time series: Locality and globality. This paper provides arguments why the recent trend towards global models in time series forecasting is the real deal. In this context, global models refers to models that are constructed so that they are able to predict several time series jointly (as opposed to building an individual model per time series, which they call local). They provide theoretical and empirical arguments that more often than not one should prefer global models --- in short, except for some theoretical worst cases, global models will tend to learn more than local models.
- Clark et al. (2021): The Abuse of Popular Performance Metrics in Hydrologic Modeling. This paper tries to show the inherent uncertainties of performance metrics. The idea is good, but I disagree with some crucial choices in the empirical work (for example, they choose to evaluate over both the training and the test set). I guess this means that the results should be taken with a grain of salt, even if the argument seems solid.
- Hewitt & Liang (2019): Designing and Interpreting Probes with Control Tasks. Good paper to get a grip on probing.
- Rabanser, Günnemann, & Lipton (2019): Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. This is a really good study on the use of dimensionality reduction for distribution shift detection. This paper is good both in terms of content and style. Regarding the latter: the paper is not only very clearly written, it also presents everything so nicely. Just look at the colors and symbols.
- Teegavarapu & Elshorbagy (2005): Fuzzy set based error measure for hydrologic model evaluation. Conceptually, I do like the idea of a fuzzy performance criterion. In practice, I do believe that it introduces too many degrees of freedom.
- Maier & Dandy (2000): Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. This paper is mainly interesting from a historical perspective. Specifically, it is the first hydrological/environmental modeling paper I know that mentions the LSTM.
- Refsgaard & Henriksen (2004): Modelling guidelines––terminology and guiding principles. I feel like the hydrological community underappreciates Refsgaard. I guess this specific paper is a bit outdated by now, and I am not sure whether I agree with his notion of model falsification.
- Bergmeir, Hyndman, & Koo (2016): A note on the validity of cross-validation for evaluating autoregressive time series prediction. This paper argues that cross-validation is preferable over an out-of-sample test even if the time series shows high autocorrelations, because it is the autocorrelation of the errors that counts. For me this paper was not as enlightening as I thought it would be, because (a) for some reason I already presumed that the problem is the errors, not the time series as such, and (b) so far I have never dealt with problems of this kind.
- Shen, Tolson, & Mai (2022): Time to Update the Split-Sample Approach in Hydrological Model Calibration. This paper shows that (conceptual) hydrological models always generalize best when they are trained on all the data (rather than, say, using a split-sample approach). Chapeau for the (computational) effort that was put into this paper. Now, I think their results indicate that conceptual models tend to underfit the data. I also suspect that this is the case because the model structure simply does not allow them to fit certain patterns. The models are, however, parametrized richly enough to fit certain other (unwarranted) patterns that are spurious. These then only get averaged out as more data becomes available. Thus, calibration seems to be more data inefficient than many modelers tend to believe.
- Watson (2022): Machine learning applications for weather and climate need greater focus on extremes. A short opinion paper that succinctly argues that ML research for weather/climate should focus more on extremes.
- ⭐️ Klemeš (1986): Dilettantism in hydrology: Transition or destiny?. More an essay than a scientific treatment, and hilarious to read; Klemeš really was one of the best writers in hydrology. In essence, he lays out what kind of research hydrologists should do so the field avoids scientism and becomes more of a real science: avoid sophistication without grounding, ask epistemic questions, focus on processes. The piece is certainly a product of its time (e.g., stochastic hydrology basically does not exist anymore), but I still reread it from time to time. Over the years I got a bit weary about some of the arguments, e.g.: Are models that work well really the greatest danger to progress in hydrology? Is it not more the case that a lot of (to be fair, recent) progress was made by constructing models that work very well? Shouldn't we see fidelity and understanding on a continuum? Etc. Etc. Anyway, in general, I still like the paper a lot.
- Kratzert et al. (2019): Towards Improved Predictions in Ungauged Basins: Exploiting the Power of Machine Learning. A paper where I had the honor to contribute. This paper is important because we show how good LSTMs are in an ungauged basin setting.
- Rudin et al. (2022): Interpretable machine learning: Fundamental principles and 10 grand challenges. Very good, very opinionated position paper on interpretable machine learning. The paper provides a strong case why building models that are structurally interpretable is favorable (over first training a network and then post-hoc interpreting what it does). Although I agree with the general argument, I am not sure if I fully buy it: as of now, for many problems it seems to be much easier to build black boxes than grey/white boxes (purely empirically speaking). And, as long as this is the case, I think that post-hoc methods can provide a lot of value.
- Chauhan et al. (2023): A Brief Review of Hypernetworks in Deep Learning. Good overview of the state and use of hypernetworks.
- Roberts et al. (2017): Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. One of the few review papers on cross-validation that I know. Even better: it's a good and informative read. I would say it has a lot of breadth, but not much depth.
- Melis, Kočiský, & Blunsom (2019): Mogrifier LSTM. Interesting, but a bit too late for its time.
- Lee, Yao and Finn (2023): Diversify and Disambiguate: Out-of-Distribution Robustness via Disagreement. I like the idea --- but am probably a bit biased since I had a similar idea once. I am not sure about the actual implementation yet.
- Vahidi et al. (2023): Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning. I am not so much interested in the self-supervised part of this. However, I really like (diversified) ensembles. Tangent: it is interesting that some fields use ensembles so often, while others almost never do. I hope I can contribute at some point to the research on or with ensembles.
- Freiesleben & Grote (2023): Beyond generalization: a theory of robustness in machine learning. An interesting "let us think about a concept out loud" paper. For me this is the best discussion about the meaning of robustness in machine learning that I have read so far.
- Schölkopf et al. (2021): Toward Causal Representation Learning. This is an extremely good overview of what causal representation learning could be or become. I especially like how the authors contrast statistical and causal models.
- Jain et al. (2023): A Data-Based Perspective on Transfer Learning. Very interesting paper that provides a more nuanced view on transfer learning by looking at the data. They provide a strong argument for data curation. For example: among other things they find that more data is not in itself good, and one can also improve downstream performance by removing data from pretraining.
- Ting et al. (2023): Model Calibration and Validation From A Statistical Inference Perspective. This is a deep, thorough, but also opinionated overview of the principles of model calibration and validation in transport research (read: cars/traffic). Because of its theoretical nature and clear writing it is also a good read for people from different domains (like me). That said, in my opinion, some claims are formulated too strongly.
- Burnell et al. (2023): Rethink reporting of evaluation results in AI. Opinion paper that argues for a more thorough evaluation of machine learning models. Specifically, the authors propose to not (exclusively) focus on aggregate metrics. Strong agree.
- Frank, Fiedler, & Crevel (2021): Balancing potential of natural variability and extremes in photovoltaic and wind energy production for European countries. Thinks about how one can stabilize European renewables by cross-country compensation schemes (in a purely renewable setting). Quite technical --- and at least for me, as a non-expert, the paper was difficult to read. Still, idea, topic, and approach are truly interesting. Looking forward to reading follow-ups to this one.
- Caruana (1997): Multitask Learning. The OG multitask learning paper. It's actually surprisingly deep. Much deeper than one would think from reading current multitask papers.
- Agned et al. (2021): Deep learning hybrid model with Boruta-Random forest optimiser algorithm for streamflow forecasting with climate mode indices, rainfall, and periodicity. Crazy model that combines a special random forest with an LSTM to do streamflow forecasting. Only very few basins are used. Not sure if I buy the results.
- Bhasme, Vagadiya & Bhatia (2021): Enhancing predictive skills in physically-consistent way: Physics Informed Machine Learning for Hydrological Processes. A PINN approach for streamflow prediction of single catchments. Not very convincing.
- Liu et al. (2022): Landscape Learning for Neural Network Inversion. I am very interested in the kind of inversion they are describing and, at first sight, I like their proposed solution. I have to think more about it and learn more about this style of approaches.
- Foret et al. (2021): Sharpness-Aware Minimization for Efficiently Improving Generalization. Maybe the paper is a bit too much focused on theory to appeal to someone like me. Still, I find the idea(s) behind SAM quite cool.
- Manchingal and Cuzzolin (2022): Epistemic Deep Learning. Very broad overview of many different things. I like the intention of the proposed Epistemic Deep Learning program, but got lost in the paper. What's worse: I am a bit sceptical about the experimental section, since they do not compare themselves to the state of the art and show no ImageNet results. For many approaches that is fine, but here it seems off.
- Chouraqui et al. (2022): A Geometric Method for Improved Uncertainty Estimation in Real-time. A geometrical approach for uncertainty prediction based on the input-label relationship of classifier models. The exposition is perhaps overly technical, but I still like the approach. Also, I am a bit critical about whether the distance-based notions / hyperspheres work well on very high dimensional data. Maybe I am wrong.
- Guo et al. (2017): On Calibration of Modern Neural Networks. Interesting analysis of the uncertainty prediction capacities of many networks.
- Zhe Liu et al. (2022): A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness. I like the initial discussion and the motivation of the paper. I think all their intuitions are the right ones. However, I don't think that it is a simple approach – and, I am not super convinced by the use of GPs and the enforcement of distance awareness into the last layer.
- Fannjiang et al. (2022): Conformal Prediction for the Design Problem. Very thorough paper discussing an extremely interesting setting: the design problem, a setting where the performance on a training set influences the shift of the test set.
- ⭐️ He et al. (2022): NeMF: Neural Motion Fields for Kinematic Animation. Super clever use of implicit neural representations. I wish I had a use for something like that.
- Wolleb et al. (2022): Diffusion Models for Medical Anomaly Detection. Very clever use of diffusion models for anomaly detection. I am not sure about the evaluation though.
- Douven (2021): Scoring, Truthlikeness, and Value. A very interesting article about the interrelation between model performance in terms of scoring rules and the value the models produce. Douven argues, quite convincingly, that there exists no single scoring rule (proper or not) that perfectly reflects the value of the prediction.
- Sousa (2022): Inductive Conformal Prediction: A Straightforward Introduction with Examples in Python. A very brief but very well written introduction to conformal prediction (a type of coverage-based uncertainty prediction for arbitrary models). A minimal sketch of the split-conformal recipe is below.
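  A minimal split (inductive) conformal sketch for regression, in the spirit of the paper but my own illustration (assumes a fitted `model` with a `predict` method and a held-out calibration set): calibrate a residual threshold, then emit intervals with roughly 1 - alpha coverage.

  ```python
  import numpy as np

  def conformal_interval(model, X_cal, y_cal, X_new, alpha=0.1):
      """Split-conformal prediction intervals from absolute residuals."""
      scores = np.abs(y_cal - model.predict(X_cal))            # nonconformity scores
      n = len(scores)
      level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)     # finite-sample correction
      q = np.quantile(scores, level, method="higher")          # calibrated threshold
      pred = model.predict(X_new)
      return pred - q, pred + q                                # lower and upper bounds
  ```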
- ⭐️ Hüllermeier and Waegeman (2020): Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Perhaps the best introductory paper about uncertainty estimation for ML. The text has both breadth and depth. It is well written, with very clear explanations and examples. It even touches on conformal predictions and the like.
- Huber (2002): Approximate models. An extremely well written essay about model fitting, robustness, and simulation.
- Zhang et al. (2021): Understanding plastic degradation and microplastic formation in the environment: A review. Interesting overview for someone like me who has very little insight into the topic.
- Wortsman et al. (2022): Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. Very interesting approach to obtain flat minima (the recipe is sketched below). I think I still prefer ensembles, but in the very-large-model paradigm we are in, these might be the next-best thing.
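  A minimal "uniform soup" sketch in PyTorch (my illustration, assuming all checkpoints were fine-tuned from the same architecture; `model` and `checkpoint_paths` are placeholders): average the weights of the fine-tuned models into one set of parameters.

  ```python
  import torch

  def uniform_soup(model, checkpoint_paths):
      """Load several fine-tuned checkpoints and average their parameters."""
      state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
      soup = {
          key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
          for key in state_dicts[0]
      }
      model.load_state_dict(soup)
      return model
  ```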
- Simon (1995): Artificial Intelligence: an empirical science. An AI landmark paper from before the advent of Deep Learning. A long read and at times a bit outdated. However, on a larger level it provides some very sharp arguments that are still relevant for today's discussion.
- Post et al. (2021): Application of Laser-Induced, Deep UV Raman Spectroscopy and Artificial Intelligence in Real-Time Environmental Monitoring—Solutions and First Results. A paper that lays out pathways for ubiquitous microplastic monitoring. Interesting topic, but a boring read.
- Petropoulos et al. (2022): Forecasting: theory and practice. Because of its size and scope it seems pretty useless to me. Maybe a good reference.
- Tatsunami and Taki (2022): Sequencer: Deep LSTM for Image Classification. One of these papers that corroborate that with large amounts of data the learning algorithms matter less and less. In this case they show that LSTMs obtain competitive performance for large-scale image data. Very impressive engineering.
- Cha, Chun, Lee, Cho, Park, Lee, and Park (2021): SWAD: Domain Generalization by Seeking Flat Minima. A flat-minima paper. They present a quite interesting technique for finding flat minima and empirical evidence that flat minima tend to generalize better than sharp ones.
- Shi, Daunhawer, Vogt, Torr, and Sanyal (2022): How robust are pre-trained models to distribution shift?. Empirical evidence that unsupervised models outperform supervised models regarding OOD generalization.
- Yiou, Jézéquel, Naveau, Otto, Vautard, and Vrac (2017): A statistical framework for conditional extreme event attribution. An overview of a method that uses counterfactual analysis to assess whether certain kinds of extreme events have become more or less likely due to climate change. Some things in there are hard to swallow, but the concept is great.
- Lloyd, Oreskes (2019): Climate Change Attribution: When Does it Make Sense to Add Methods?. This contribution is easy to read and gives a great overview of different approaches to assess the impact of climate change on extreme events. For my taste it is a bit too opinionated and almost prosaic from time to time. Still worth the read.
- Katz, Parlange, and Naveau (2002): Statistics of extremes in hydrology. A good overview of the uses of extreme value theory in hydrology. Perhaps needs an update.
- Forgione, Muni, Piga, and Gallieri (2022): On the adaptation of recurrent neural networks for system identification. System identification is an interesting, but for ML unconventional, setting. The idea of the authors is to use that setting to study how RNNs that have been used for system identification purposes can be adjusted to eventual changes in the underlying system as fast as possible. They propose to approximate the updated model with a sort of Taylor expansion technique (sketched below). This approximation yields a linearization of the loss landscape around the current model parameters and lets them directly estimate a potential update in a Kalman-like forward step. The exposition is good and the results are convincing.
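  My reading of the core approximation, in my own notation (not the authors'): expand the network to first order around the nominal parameters,

  ```latex
  f(x;\,\theta_0 + \delta) \;\approx\; f(x;\,\theta_0) + J(x)\,\delta,
  \qquad J(x) = \left.\frac{\partial f(x;\,\theta)}{\partial \theta}\right|_{\theta = \theta_0}
  ```

  so that a squared loss becomes quadratic in the parameter update delta, which can then be estimated recursively with Kalman-filter-style steps.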
- Baartman, Melsen, Moore, and van der Ploeg (2020): On the complexity of model complexity: Viewpoints across the geosciences. This contribution gives an interesting take on model complexity. Instead of going through formal or classical definitions of complexity, the authors decided to design a questionnaire and actually ask geoscientists. Instead of deriving yet another definition of model complexity (which would probably be limited, like all current ones), the authors emphasize that their results reflect that complexity is a context-dependent property. I am not sure if I buy that.
- ⭐️ Gontijo-Lopes, Dauphin, and Cubuk (2021): No One Representation to Rule Them All: Overlapping Features of Training Methods. For me this was one of the most interesting papers at NeurIPS 2021. The authors show that one can build super diverse ensembles with neural networks if the same network is trained with different enough training methods (new self-supervised, state-of-the-art approaches lend themselves more to that than plain supervised ones).
- Khintchine (1934): Korrelationstheorie der stationären stochastischen Prozesse. One of the earliest papers that pins down the concept of stationarity and instationarity. Requires knowledge of German and some patience. Quite precise.
- Koutsoyiannis and Sargentis (2021): Entropy and Wealth. A mathematical essay on the connection between entropy and wealth. True out-of-the-box thinking.
- Beven (2020): The era of infiltration. Historical inquiry about infiltration theory by one of the most important hydrologists alive. Worth reading for everyone who is interested in the history of quantitative hydrology.
- Merz et al. (2021): Causes, impacts and patterns of disastrous river floods. Quite superficial review of flood risks. Maybe good as a reference.
- Saltelli (2020): Ethics of quantification or quantification of ethics?. I guess this is what "philosophy of data science" looks like when it is done by a well-read data scientist (as opposed to a data-sciency philosopher).
- Voita and Titov (2020): Information-Theoretic Probing with Minimum Description Length. Probing is cool and this is an excellent paper about how to make probes.
- ⭐️ Jain et al. (2021): DEUP: Direct Epistemic Uncertainty Prediction. A very cool paper about a quite involved method for uncertainty-estimation based active learning.
- Karandikar et al. (2021): Soft Calibration Objectives for Neural Networks. I do have a heart for soft approaches, and this seems to be a good idea. If I actually worked on calibration tasks, I would try it for sure.
- Biloš et al. (2021): Neural Flows: Efficient Alternative to Neural ODEs. The paper presents an interesting alternative to neural ODEs. I am (still) not sure if neural ODEs are a good idea in the first place, but here is already an approach that might supersede them.
- ⭐️ Judd and Stemler (2010): Forecasting: it is not about statistics, it is about dynamics. When reading it one can see that the authors had very specific model classes in mind. Still, I quite like their framing and the (historical) view they present regarding the forecasting problem.
- ⭐️ Judd and Nakamura (2006): Degeneracy of time series models: The best model is not always the correct model. A very interesting paper that demonstrates, on the basis of a simple example, that when we model a non-linear system with noisy measurements, the best model is likely not the correct (i.e., true) model. Given the few citations it has, I think the insight of the paper and the clarity of the example are underappreciated.