A Stepwise Checklist for Environmental Researchers to Conduct a Supervised Machine Learning Study

EMBRACE Checklist: Environmental Machine-learning, Baseline Reporting, And Comprehensive Evaluation

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Interactions and Contributing
  5. Exhibition and Examples
  6. License
  7. Contact
  8. Acknowledgments

About The Project

To help researchers, readers, reviewers, and editors better communicate environmental ML research, we developed the EMBRACE Checklist, which aims to improve understanding of the feasibility of proposed projects, the completeness of ongoing research, and the robustness of finished work. Before adopting the checklist, we encourage you to consult the previously published review and viewpoint for specific terminology and concepts.

Note

  • A comprehensive review investigated common pitfalls and best ML practices in Environmental Science and Engineering (ESE) areas. Please see: Zhu, J.-J., Yang, M., & Ren, Z. J. (2023). Machine learning in environmental research: common pitfalls and best practices. Environmental Science & Technology, 57(46), 17671-17689. https://doi.org/10.1021/acs.est.3c00026
  • A viewpoint summarizes the general usage of this checklist and advocates for an engaged learning community in ESE. Please see: Zhu, J.-J., Boehm, A. B., & Ren, Z. J. (2024). Environmental Machine-learning, Baseline Reporting, And Comprehensive Evaluation: The EMBRACE Checklist. Environmental Science & Technology, 58(45), 19909–19912. https://doi.org/10.1021/acs.est.4c09611

Community-Owned Tool

The objective of this Checklist is to provide an easy-to-use tool for reporting essential data throughout your environmental ML work. Completing the 3-page checklist takes only 5-10 minutes. The ultimate goal is to foster an active learning community and promote self-beneficial practices in environmental ML research. We encourage interested researchers to join and contribute to this effort, potentially establishing standards that benefit the broader community.

  • For Researchers: The Checklist provides guidance on the essential steps and requirements for developing ML models for environmental research. Clear data reporting streamlines the publication process and enhances the impact of the research.
  • For Readers: The Checklist (shared by researchers) contains key information about the research, making it easier to follow, understand, and use its data, methodology, and findings.
  • For Reviewers: The Checklist (along with the manuscript) provides clear information that minimizes potential confusion during the review process, allowing reviewers to better understand the study’s novelty and contributions.
  • For Editors: The Checklist (submitted by researchers along with their manuscripts) helps editors screen the work based on scope and quality.

Getting Started

The checklist includes a "project overview" and eight sections that follow a typical workflow of ML model development:

  • Project overview helps to record the general information of your ongoing or finished study. You can also use it to track potential problems during your research.
  • Section 1 includes reporting study objectives and feasibility assessment, aiding researchers in evaluating their available resources more effectively.
  • Section 2 covers reporting on data sources, including types, ethics, and details regarding the number and quality of data points.
  • Section 3 outlines reporting on data cleaning, enrichment, feature engineering, data splitting, and final data descriptions.
  • Section 4 focuses on reporting the supervised learning methods chosen and the overall modeling framework.
  • Section 5 highlights key aspects of model evaluation and hyperparameter optimization (HPO).
  • Section 6 emphasizes the need for a deeper understanding and reporting of model interpretability, explainability, and causality.
  • Section 7 focuses on verifying data leakage management.
  • Section 8 encourages the sharing of data and code when possible.
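
As a rough illustration of the kind of practice that Sections 3 and 7 ask you to report, the sketch below splits a dataset before computing any preprocessing statistics, so the test data cannot influence the scaling. The function names, the 80/20 split, and the toy values are assumptions made for this example only; they are not part of the checklist, and real studies may need other splitting strategies (e.g., grouped or time-based splits).

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Compute the mean and standard deviation from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 if std is zero
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values using previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical toy measurements, for illustration only.
measurements = [1.2, 3.4, 2.2, 5.1, 4.8, 2.9, 3.3, 6.0, 1.9, 4.1]

# 1) Split first.
train, test = train_test_split(measurements, test_fraction=0.2, seed=42)

# 2) Fit the scaler on the training portion only.
mean, std = fit_scaler(train)

# 3) Apply the same training statistics to both portions.
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)
```

Fitting the scaler on the full dataset before splitting would let test-set information leak into training, which is one of the situations the data leakage section is meant to surface.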

Prerequisites

Tip

We strongly recommend reading the review and viewpoint before using the checklist. When using the checklist, you can refer to the Instructions for additional information.

Workflow

The checklist can be used at any stage of your research life cycle, including project initialization, ongoing work, finished studies, and educational learning.

Usage

Important

The checklist is designed to be easy to check and fill in. You may consult the Instructions for detailed explanations.

Format and Types

Document Save

We encourage you to share your checklist file directly, so that lengthy sentences or content in a short cell can still be read interactively by the reader. However, you can also follow the instructions below if you prefer to save the checklist as a read-only document.

  • When using Microsoft Windows, follow these steps: “File” >> “Print” >> select “Microsoft Print to PDF” under “Printer” >> “Print” >> save it as a new PDF document.

  • When using macOS, follow these steps: [screenshot: macOS steps]. Alternatively, check this webpage for the same directions.

  • For examples of common problems and less robust applications, please refer to the Instructions.

  • For shared checklists from fellow researchers, please check the following Exhibition table.

Interactions and Contributing

  • If you find this checklist useful, please help spread it to build an engaged community for environmental ML research. If the checklist helps your ML research, we would appreciate you crediting our work by citing the viewpoint.
  • The best way to share your checklist is to include it as supporting information when submitting your manuscript, so that researchers (yourself included), editors, reviewers, and readers can all benefit from transparent and complete data reporting.
  • If you would like to share your checklist on its own rather than as supporting information, or for other reasons (e.g., older studies or papers), we are also happy to post it. Please email your checklist to Dr. Junjie Zhu at Princeton University ([email protected] or [email protected]). Name your checklist document "Finished_date EMBRACE Checklist version_number for Your_research" (e.g., "20240901 EMBRACE Checklist 1.0 for Zhu et al. (2023)") and title the email "Sharing EMBRACE Checklist from Your_name" (e.g., "Sharing EMBRACE Checklist from Junjie Zhu"). To help us classify the shared checklists, please also state your research application domain (e.g., Water Quality and Treatment) and your contact email (if different from the sending address). We will quickly inspect the checklist for completeness and share it as soon as possible. Sharing your checklist is entirely voluntary; by sending us your document, you agree to the release of your research information. We and fellow researchers appreciate your sharing. If you have already shared a checklist and would like to update it, please identify the document and explain the reason, and we will revise it accordingly.
  • Sharing your checklist may enhance the impact of your research by increasing its visibility among colleagues. It is therefore important to ensure the accuracy of the reported data; this QA/QC is the responsibility of the reporting researcher. From the perspective of fellow researchers, the accuracy of the checklist information relies solely on the reporting researcher. While we encourage researchers to share their checklists, responsible data reporting is crucial.
  • If you find that important items should be added or that anything needs to be corrected, particularly if the issue is common and representative in ESE areas, please let us know. One straightforward way is to email Dr. Junjie Zhu with your thoughts and supporting materials. Alternatively, you can post an issue with a clear description. Your suggestion is likely to be accepted for future development of the checklist.
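
The document-naming and email-subject convention described above for sharing a checklist can be sketched in a few lines of Python. This is only a convenience sketch: the helper names are hypothetical, and the YYYYMMDD date format is inferred from the "20240901" example rather than stated explicitly.

```python
from datetime import date


def checklist_filename(finished, version, research):
    """Build 'Finished_date EMBRACE Checklist version_number for Your_research'.

    The YYYYMMDD date format is an assumption based on the example "20240901".
    """
    return f"{finished:%Y%m%d} EMBRACE Checklist {version} for {research}"


def email_subject(sender_name):
    """Build the email title 'Sharing EMBRACE Checklist from Your_name'."""
    return f"Sharing EMBRACE Checklist from {sender_name}"


name = checklist_filename(date(2024, 9, 1), "1.0", "Zhu et al. (2023)")
subject = email_subject("Junjie Zhu")

print(name)     # 20240901 EMBRACE Checklist 1.0 for Zhu et al. (2023)
print(subject)  # Sharing EMBRACE Checklist from Junjie Zhu
```

No file extension is appended, since the convention above does not specify one.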

Exhibition and Examples

Sharing Date | Application Domain | Checklist Sharing | Paper Link | Publication Date | Author Name | Author Contact | Additional Info.
09/05/2024 | Resource Recovery | PURL | DOI | 05/13/2024 | Meiqi Yang | Email | Material discovery with DLM
09/05/2024 | Resource Recovery | PURL | DOI | 03/27/2023 | Meiqi Yang; Junjie Zhu | Email | Separation predictions with rigorous data leakage management
09/04/2024 | Hydrology and Water Quantity | PURL | DOI | 07/15/2022 | Junjie Zhu | Email | Probabilistic predictions with 95% PI
09/04/2024 | Water Quality and Treatment | PURL | DOI | 06/15/2022 | Junjie Zhu | Email | Metaheuristic-optimized deep learning
09/04/2024 | Water Quality and Treatment | PURL | DOI | 01/01/2018 | Junjie Zhu | Email | Multi-objective optimized data-driven

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contact

Junjie Zhu - @Jun_Jie_Zhu - [email protected] or [email protected]

Project Link: https://github.com/starfriend10/EMBRACE

Acknowledgments

Zhiyong Jason Ren. Professor, CEE department, Princeton University. Project initialization main contributor

Alexandria B. Boehm. Professor, CEE department, Stanford University. Project initialization main contributor

Meiqi Yang. Ph.D. Candidate, CEE department, Princeton University. Checklist in-house testing and verification

Zhonghua Zheng. Assistant Professor, EES department, The University of Manchester. Checklist in-house testing

Sina Borzooei. IVL Swedish Environmental Research Institute. Checklist in-house testing
