Update Docs
AbdelrahmanAmr3 committed Mar 27, 2024
1 parent 9ec21f0 commit c81f769
Showing 4 changed files with 19 additions and 19 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -26,16 +26,10 @@ EarthStat's Library Workflows's Notebooks:

Inspired by my engagement with the AgML community's "Regional Crop Yield Forecasting" challenge, I created a Python library designed to set benchmarks for Machine Learning (ML) models. The library presents an efficient workflow for extracting statistical information from big remote sensing and climate datasets. Currently, The library presents two workflows. First, for dealing with GeoTIFF files, as the main workflow of EarthStat. In addition, it presents a unique workflow for AgERA5 datasets, which gives the user the power to download a huge amount of different variables using CDS API, extended to extract and aggregate all downloaded data. EarthStat's workflows provide multiprocessing and GPU for parallel computation as an option. This library is particularly suited for creating statistical information datasets for ML models or for environmental analyses and monitoring.

-## EarthStat Workflow
+## EarthStat Main Workflow
This diagram illustrates the workflow of the geospatial data processing implemented in EarthStat from the initialized dataset to the created CSV file.

![Geospatial Data Processing Workflow](docs/assests/workflow.png)
-
-## xEarthStat Workflow
-This diagram illustrates the workflow of xEearthStat for AgERA5 data processing.
-
-![Geospatial Data Processing Workflow](docs/assests/xES_workflow.png)
-
## EarthStat Main Workflow Features

EarthStat revolutionizes the extraction of statistical information from geographic data, offering a seamless workflow for effective data management:
@@ -56,13 +50,19 @@ EarthStat revolutionizes the extraction of statistical information from geograph

- **Efficient Parallel Processing:** Leverages the power of multiprocessing, significantly accelerating data processing across extensive datasets for quicker, more efficient computation.

+## xEarthStat Workflow For AgERA5
+This diagram illustrates the workflow of xEearthStat for AgERA5 data processing.
+
+![xEarthStat Workflow](docs/assests/xES_workflow.png)
+
+
## EarthStat Main Workflow Features
- **Unlimited AgERA5 Data Downloads**: The EarthStat workflow enables users to bypass the limitations of the CDS server, allowing for the download of any quantity of data for the required variables.
- **Fully Automated**: This library is entirely automated and does not require any prior Python knowledge. Users simply need to select the variables for download and aggregation, specify the start and end years to determine the data volume, and define the shapefile containing the geometry objects.
- **Parallel Computation**: EarthStat workflow intelligently detects GPU availability to shift aggregation processes for parallel computation on the GPU. It also offers users the option to leverage available CPU cores for multiprocessing (Parallel Execution), enhancing I/O-bound tasks.
- **Aggregated Data as CSV**: Ultimately, the workflow provides users with a neatly organized CSV file, compiling all downloaded and aggregated variables.

-### EarthStat Google Colab Performance
+### xEarthStat Workflow Performance on Google Colab
This table demonstrates the workflow's performance across various configurations, ranging from multiprocessing to GPU usage for parallel computation by using Google Colab.

| Data | Variables | Number of Geo-Objects | Dataset | Processing Unit | Time (Run: One Time) min |
18 changes: 9 additions & 9 deletions docs/index.md
@@ -28,16 +28,10 @@ EarthStat's Library Workflows's Notebooks:

Inspired by my engagement with the AgML community's "Regional Crop Yield Forecasting" challenge, I created a Python library designed to set benchmarks for Machine Learning (ML) models. The library presents an efficient workflow for extracting statistical information from big remote sensing and climate datasets. Currently, The library presents two workflows. First, for dealing with GeoTIFF files, as the main workflow of EarthStat. In addition, it presents a unique workflow for AgERA5 datasets, which gives the user the power to download a huge amount of different variables using CDS API, extended to extract and aggregate all downloaded data. EarthStat's workflows provide multiprocessing and GPU for parallel computation as an option. This library is particularly suited for creating statistical information datasets for ML models or for environmental analyses and monitoring.

-## EarthStat Workflow
+## EarthStat Main Workflow
This diagram illustrates the workflow of the geospatial data processing implemented in EarthStat from the initialized dataset to the created CSV file.

-![Geospatial Data Processing Workflow](docs/assests/workflow.png)
-
-## xEarthStat Workflow
-This diagram illustrates the workflow of xEearthStat for AgERA5 data processing.
-
-![Geospatial Data Processing Workflow](docs/assests/xES_workflow.png)
-
+![Geospatial Data Processing Workflow](assests/workflow.png)
## EarthStat Main Workflow Features

EarthStat revolutionizes the extraction of statistical information from geographic data, offering a seamless workflow for effective data management:
@@ -58,13 +52,19 @@ EarthStat revolutionizes the extraction of statistical information from geograph

- **Efficient Parallel Processing:** Leverages the power of multiprocessing, significantly accelerating data processing across extensive datasets for quicker, more efficient computation.

+## xEarthStat Workflow For AgERA5
+This diagram illustrates the workflow of xEearthStat for AgERA5 data processing.
+
+![xEarthStat Workflow](assests/xES_workflow.png)
+
+
## EarthStat Main Workflow Features
- **Unlimited AgERA5 Data Downloads**: The EarthStat workflow enables users to bypass the limitations of the CDS server, allowing for the download of any quantity of data for the required variables.
- **Fully Automated**: This library is entirely automated and does not require any prior Python knowledge. Users simply need to select the variables for download and aggregation, specify the start and end years to determine the data volume, and define the shapefile containing the geometry objects.
- **Parallel Computation**: EarthStat workflow intelligently detects GPU availability to shift aggregation processes for parallel computation on the GPU. It also offers users the option to leverage available CPU cores for multiprocessing (Parallel Execution), enhancing I/O-bound tasks.
- **Aggregated Data as CSV**: Ultimately, the workflow provides users with a neatly organized CSV file, compiling all downloaded and aggregated variables.

-### EarthStat Google Colab Performance
+### xEarthStat Workflow Performance on Google Colab
This table demonstrates the workflow's performance across various configurations, ranging from multiprocessing to GPU usage for parallel computation by using Google Colab.

| Data | Variables | Number of Geo-Objects | Dataset | Processing Unit | Time (Run: One Time) min |
2 changes: 1 addition & 1 deletion earthstat/xES/DailyDatasetBuilder.py
@@ -130,6 +130,6 @@ def _daily_datasets(self, folder):
 if aggregated_data:
     df = pd.DataFrame(aggregated_data)
     df.to_csv(
-        f'{self.area_name}_Aggregated_Daily/AgERA5_{self.area_name}_{ds_variable}_dekadal.csv', index=False)
+        f'{self.area_name}_aggregated_daily_csv/AgERA5_{self.area_name}_{ds_variable}_dekadal.csv', index=False)
 else:
     print(f"No data found for {ds_variable}")
2 changes: 1 addition & 1 deletion earthstat/xES/DekadalDatasetBuilder.py
@@ -160,7 +160,7 @@ def _dekadal_datasets(self, folder):
 if aggregated_data:
     df = pd.DataFrame(aggregated_data)
     df.to_csv(
-        f'{self.area_name}_Aggregated_Dekadal/AgERA5_{self.area_name}_{ds_variable}_dekadal.csv', index=False)
+        f'{self.area_name}_Aggregated_dekadal_csv/AgERA5_{self.area_name}_{ds_variable}_dekadal.csv', index=False)
 else:
     print(f"No data found for {ds_variable}")
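
Both builder hunks change only the folder segment of the path handed to `df.to_csv`. The sketch below is not the library's code; it only illustrates that path pattern under a few assumptions. `build_output_path`, `folder_suffix`, and `base_dir` are hypothetical names introduced here, and only the f-string layout (including the `_dekadal.csv` suffix) is taken from the diff. The helper also creates the folder, on the assumption that the caller has not already done so, because writing a CSV into a missing directory fails.

```python
from pathlib import Path


def build_output_path(area_name, ds_variable, folder_suffix, base_dir="."):
    """Hypothetical helper, not part of EarthStat.

    Returns <base_dir>/<area_name>_<folder_suffix>/AgERA5_<area_name>_<ds_variable>_dekadal.csv
    and makes sure the folder exists before anything is written to it.
    """
    out_dir = Path(base_dir) / f"{area_name}_{folder_suffix}"
    # Create the target folder (and any missing parents); no error if it exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    # File name layout copied from the f-string shown in the diff.
    return out_dir / f"AgERA5_{area_name}_{ds_variable}_dekadal.csv"
```

With a pandas DataFrame `df`, a call might look like `df.to_csv(build_output_path("Kenya", "Precipitation_Flux", "aggregated_daily_csv"), index=False)`, where the area and variable names are placeholder values.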
