- About The Project
- Getting Started
- Usage
- Interactions and Contributing
- Exhibition and Examples
- License
- Contact
- Acknowledgments
To help researchers, readers, reviewers, and editors better communicate environmental ML research, we developed the EMBRACE Checklist. It aims to improve understanding of the feasibility of proposed projects, the completeness of ongoing research, and the robustness of finished work. Before adopting the checklist, we encourage you to consult the previously published review and viewpoint for specific terminology and concepts.
Note
- A comprehensive review study investigated common pitfalls and best ML practices in Environmental Science and Engineering (ESE) areas. Please see: Zhu, J.-J., Yang, M., & Ren, Z. J. (2023). Machine learning in environmental research: common pitfalls and best practices. Environmental Science & Technology, 57(46), 17671-17689. https://doi.org/10.1021/acs.est.3c00026
- A viewpoint summarizes the general usage of this checklist and advocates for building an engaged learning community in ESE. Please see: Zhu, J.-J., Boehm, A. B., & Ren, Z. J. (2024). Environmental Machine-learning, Baseline Reporting, And Comprehensive Evaluation: The EMBRACE Checklist. Environmental Science & Technology, 58(45), 19909–19912. https://doi.org/10.1021/acs.est.4c09611
The objective of this Checklist is to provide an easy-to-use tool for reporting the essential data behind your environmental ML work. Completing the 3-page checklist takes only 5-10 minutes. The ultimate goal is to foster an active learning community and promote self-beneficial practices in environmental ML research. We encourage interested researchers to join and contribute to this effort, potentially establishing standards that benefit the broader community.
- For Researchers: The Checklist provides guidance on the essential steps and requirements for developing ML models for environmental research. Clear data reporting streamlines the publication process and enhances the impact of the research work.
- For Readers: The Checklist (shared by researchers) contains key information of the research work, making it easier to follow, understand, and utilize the data, methodology, and findings of the research.
- For Reviewers: The Checklist (along with the manuscript) provides clear information that minimizes potential confusion during the review process, allowing reviewers to better understand the study’s novelty and contributions.
- For Editors: The Checklist (submitted by researchers along with their manuscripts) helps editors screen the work based on the scope and quality.
The checklist includes a "project overview" and eight sections that follow the typical workflow of ML model development:
- Project overview helps to record the general information of your ongoing or finished study. You can also use it to track potential problems during your research.
- Section 1 includes reporting study objectives and feasibility assessment, aiding researchers in evaluating their available resources more effectively.
- Section 2 covers reporting on data sources, including types, ethics, as well as details regarding the number and quality of data points.
- Section 3 outlines reporting on data cleaning, enrichment, feature engineering, data splitting, and final data descriptions.
- Section 4 focuses on reporting the supervised learning methods chosen and the overall modeling framework.
- Section 5 highlights key aspects of model evaluation and hyperparameter optimization (HPO).
- Section 6 emphasizes the need for a deeper understanding and reporting of model interpretability, explainability, and causality.
- Section 7 focuses on verifying data leakage management.
- Section 8 encourages the sharing of data and code when possible.
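As one illustration of the kind of data leakage management that Section 7 asks you to verify, the sketch below splits samples by group so that related records (for example, repeated measurements from the same monitoring site) never appear in both the training and test sets. The function name, the `group_of` key, and the sample records are hypothetical and are not part of the checklist; this is only one common safeguard, written with the Python standard library.

```python
import random
from collections import defaultdict


def group_train_test_split(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so each group lands entirely in train or entirely in test."""
    # Bucket the samples by their group key (hypothetical: a monitoring site).
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)

    # Sort before shuffling so the split is reproducible for a given seed.
    group_keys = sorted(groups)
    random.Random(seed).shuffle(group_keys)

    # Hold out whole groups, never individual records.
    n_test_groups = max(1, round(len(group_keys) * test_fraction))
    test_keys = set(group_keys[:n_test_groups])

    train = [s for k in group_keys if k not in test_keys for s in groups[k]]
    test = [s for k in group_keys if k in test_keys for s in groups[k]]
    return train, test


if __name__ == "__main__":
    # Hypothetical records: (monitoring_site, measurement)
    records = [
        ("site_A", 1.2), ("site_A", 1.3), ("site_B", 0.7), ("site_B", 0.8),
        ("site_C", 2.1), ("site_D", 1.9), ("site_E", 0.4),
    ]
    train, test = group_train_test_split(records, group_of=lambda r: r[0])
    shared = {r[0] for r in train} & {r[0] for r in test}
    print("sites shared between train and test:", shared)
```

A random row-level split of the same records could place one `site_A` measurement in training and the other in testing, which tends to inflate test performance. Whether grouping by site, time period, or another key is appropriate depends on your study.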
Tip
We strongly recommend reading the review work and viewpoint before using the checklist. When using the checklist, you can refer to the Instructions for additional information.
The checklist can be used at any stage of the life cycle of your research, including project initialization, ongoing work, finished study, and educational learning.
Important
The checklist is designed to be easy to check and fill in. You may consult the Instructions for detailed explanations.
We encourage you to share your checklist directly, so that lengthy text in a small cell can still be read interactively by the reader. However, you can also follow the instructions below if you prefer to save the checklist as a read-only document.
- When using Microsoft Windows, please follow these steps: “File” >> “Print” >> Select “Microsoft Print to PDF” in “Printer” >> Print >> Save it as a new PDF document.
- When using macOS, please follow these steps: Or check this webpage for the same directions.
- For examples of common problems and less robust applications, please refer to the Instructions.
- For shared checklists from fellow researchers, please check the following Exhibition table.
- If you find this checklist useful, please help spread the word to build an engaged community for environmental ML research. If the checklist helps your ML research, we would appreciate credit for our work. Please cite the viewpoint.
- The best way to share your checklist is to include it as supporting information when submitting your manuscript, so that researchers (yourself), editors, reviewers, and readers can all benefit from transparent and complete data reporting.
- If you'd like to share your checklist on its own rather than as supporting information, or in other situations (e.g., older studies or papers), we are also happy to share it. Please email your checklist to Dr. Junjie Zhu at Princeton University ([email protected] or [email protected]). Please name your checklist document "Finished_date EMBRACE Checklist version_number for Your_research" (e.g., "20240901 EMBRACE Checklist 1.0 for Zhu et al. (2023)") and title the email "Sharing EMBRACE Checklist from Your_name" (e.g., "Sharing EMBRACE Checklist from Junjie Zhu"). To help us classify the shared checklists, please also provide your research application domain (e.g., Water Quality and Treatment) and your contact email (if it differs from the sending address). We will do a quick data completeness inspection and share it as soon as possible. Sharing your checklist is completely voluntary: by sending your document to us, you agree to release your research information. We and other fellow researchers appreciate your sharing. If you have already shared a checklist and would like to update it, please identify the document and explain the reason. We will revise it accordingly.
- Sharing your checklist may enhance the impact of your research by increasing its visibility among colleagues. It is therefore important to ensure the accuracy of the reported data, which is a self-responsible QA/QC step. From the perspective of other researchers, the accuracy of the checklist information relies solely on the reporting researcher. While we encourage researchers to share their checklists, responsible data reporting is crucial.
- If you find that important items need to be added or that anything needs to be corrected, particularly if the issue is common and representative in ESE areas, please feel free to let us know. One straightforward way is to send an email to Dr. Junjie Zhu with your thoughts and supporting materials. Alternatively, you can post an issue with a clear description. Your suggestion may well be adopted in future development of the checklist.
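The naming convention for shared checklists described above can be sketched as a small helper. The function names are hypothetical, and the `YYYYMMDD` date format is an assumption inferred from the example "20240901"; adjust it if the maintainers specify otherwise.

```python
from datetime import date


def checklist_document_name(finished: date, version: str, research: str) -> str:
    """Build 'Finished_date EMBRACE Checklist version_number for Your_research'."""
    # Assumption: the finished date is written as YYYYMMDD, as in "20240901".
    return f"{finished:%Y%m%d} EMBRACE Checklist {version} for {research}"


def sharing_email_subject(sender_name: str) -> str:
    """Build the email title 'Sharing EMBRACE Checklist from Your_name'."""
    return f"Sharing EMBRACE Checklist from {sender_name}"


if __name__ == "__main__":
    print(checklist_document_name(date(2024, 9, 1), "1.0", "Zhu et al. (2023)"))
    print(sharing_email_subject("Junjie Zhu"))
```

With the inputs shown, the two printed lines match the examples given in the sharing instructions above.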
Sharing Date | Application Domain | Checklist Sharing | Paper Link | Publication Date | Author Name | Author Contact | Additional Info. |
---|---|---|---|---|---|---|---|
09/05/2024 | Resource Recovery | PURL | DOI | 05/13/2024 | Meiqi Yang | | Material discovery with DLM |
09/05/2024 | Resource Recovery | PURL | DOI | 03/27/2023 | Meiqi Yang; Junjie Zhu | | Separation predictions with rigorous data leakage management |
09/04/2024 | Hydrology and Water Quantity | PURL | DOI | 07/15/2022 | Junjie Zhu | | Probabilistic predictions with 95% PI |
09/04/2024 | Water Quality and Treatment | PURL | DOI | 06/15/2022 | Junjie Zhu | | Metaheuristic-optimized deep learning |
09/04/2024 | Water Quality and Treatment | PURL | DOI | 01/01/2018 | Junjie Zhu | | Multi-objective optimized data-driven |
This work is licensed under a Creative Commons Attribution 4.0 International License.
Junjie Zhu - @Jun_Jie_Zhu - [email protected] or [email protected]
Project Link: https://github.com/starfriend10/EMBRACE
Zhiyong Jason Ren. Professor, CEE department, Princeton University. Project initialization main contributor
Alexandria B. Boehm. Professor, CEE department, Stanford University. Project initialization main contributor
Meiqi Yang. Ph.D. Candidate, CEE department, Princeton University. Checklist in-house testing and verification
Zhonghua Zheng. Assistant Professor, EES department, The University of Manchester. Checklist in-house testing
Sina Borzooei. IVL Swedish Environmental Research Institute. Checklist in-house testing