Update climate_data_processing_with_dask_extremly_slow.md
1 parent 63c265a, commit a5a9530
Showing 1 changed file with 19 additions and 10 deletions.
UCs-lessons-learnt/climate_data_processing_with_dask_extremly_slow.md
@@ -1,10 +1,19 @@
## Climate data processing with dask extremely slow

| Section | Value |
| ------- | ----- |
| Use case | UC1 |
| Description | The goal is to process one year of hourly climate data (around 14 GB in NetCDF) to produce daily statistics for selected EU cities, using dask to speed up data loading and processing. <br> However, the script runs considerably slower on FAIRiCube Hub than locally. |
| Impact on the project | This problem substantially increases execution time and resource consumption (up to 10x in some cases). |
| Component | Storage, CPU, RAM and Network |
| Potential solution | Traditional file formats (e.g. TIFF, NetCDF) cause a lot of network traffic and slow down the computation when the file resides in the cloud. <br> Cloud-optimized formats such as COG and Zarr are designed to overcome this problem. |
| Solution benefits | Cloud-optimized formats give markedly better performance (in terms of execution time and resources consumed) than traditional formats such as NetCDF. |
# Climate data processing with dask extremely slow

## Use Case
UC1

## Description
The goal is to process one year of hourly climate data (around 14 GB in NetCDF) to produce daily statistics for selected EU cities, using dask to speed up data loading and processing. However, the script runs considerably slower on FAIRiCube Hub than locally.

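The workflow described here can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the variable name `t2m`, the dimension names `lat`/`lon`, the city coordinates and the helper `make_synthetic_hourly` are all assumptions, and the data is a tiny synthetic grid so the block runs without the real 14 GB input. The same function should work unchanged on a dask-backed dataset, for example one opened with `xr.open_mfdataset(..., chunks=...)`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def daily_city_stats(ds, cities, var="t2m"):
    """Daily mean/min/max of `var` at the grid cell nearest to each city.

    `cities` maps a city name to a (lat, lon) pair. The variable and
    dimension names are assumptions made for this sketch.
    """
    names = list(cities)
    lats = xr.DataArray([cities[n][0] for n in names], dims="city", coords={"city": names})
    lons = xr.DataArray([cities[n][1] for n in names], dims="city", coords={"city": names})

    # Pick the city cells first, so only a handful of time series are
    # reduced instead of the whole grid.
    points = ds[var].sel(lat=lats, lon=lons, method="nearest")

    daily = points.resample(time="1D")
    return xr.Dataset(
        {
            "daily_mean": daily.mean(),
            "daily_min": daily.min(),
            "daily_max": daily.max(),
        }
    )


def make_synthetic_hourly(hours=48):
    """Hypothetical stand-in for the real data: value == hour index everywhere."""
    time = pd.date_range("2020-01-01", periods=hours, freq="h")
    lat = np.array([40.0, 45.0, 50.0])
    lon = np.array([0.0, 5.0, 10.0, 15.0])
    hour_index = np.arange(hours, dtype="float32")[:, None, None]
    values = np.broadcast_to(hour_index, (hours, lat.size, lon.size)).copy()
    return xr.Dataset(
        {"t2m": (("time", "lat", "lon"), values)},
        coords={"time": time, "lat": lat, "lon": lon},
    )


# Illustrative city list; the source does not say which cities are used.
cities = {"Vienna": (48.2, 16.4), "Barcelona": (41.4, 2.2)}
stats = daily_city_stats(make_synthetic_hourly(), cities)
print(stats)
```

Selecting the city cells before resampling keeps the reduction small. Whether that also keeps the I/O small depends on how the file is chunked on disk, which is what the rest of this note is about.
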
## Impact on the project
This problem substantially increases execution time and resource consumption (up to 10x in some cases).

## Component
Storage, CPU, RAM and Network

## Potential solution
Traditional file formats (e.g. TIFF, NetCDF) cause a lot of network traffic and slow down the computation when the file resides in the cloud. <br> Cloud-optimized formats such as COG and Zarr are designed to overcome this problem.

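One plausible contributor to the extra network traffic is chunk layout: in any chunked format a read fetches whole chunks, so a per-city time series can pull in far more data than it needs. The back-of-the-envelope sketch below counts the chunks touched by such a read under two layouts. The grid size and both chunk shapes are hypothetical and are not taken from the actual dataset.

```python
from math import prod


def chunks_touched(selection, chunk_shape):
    """Number of chunks intersected by a rectangular selection.

    `selection` holds one (start, stop) index pair per dimension,
    with `stop` exclusive.
    """
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        count *= (stop - 1) // size - start // size + 1
    return count


def bytes_fetched(selection, chunk_shape, itemsize=4):
    """Upper bound on uncompressed bytes read, since whole chunks are fetched."""
    return chunks_touched(selection, chunk_shape) * prod(chunk_shape) * itemsize


# Hypothetical grid: 8760 hourly steps (one non-leap year) x 200 x 300 cells, float32.
one_city_full_year = [(0, 8760), (100, 101), (150, 151)]

layouts = {
    "map-oriented": (1, 200, 300),       # one chunk per time step, whole grid
    "series-oriented": (8760, 10, 10),   # whole year, small spatial tile
}

for name, chunk_shape in layouts.items():
    n = chunks_touched(one_city_full_year, chunk_shape)
    mb = bytes_fetched(one_city_full_year, chunk_shape) / 1e6
    print(f"{name}: {n} chunks, about {mb:.1f} MB uncompressed")
```

With these assumed numbers, the map-oriented layout touches 8760 chunks (about 2102 MB uncompressed) for one city, while the series-oriented layout touches a single chunk (about 3.5 MB). When converting to Zarr the chunk shape is a free choice, so it can be matched to the access pattern; compression and request overhead are ignored in this estimate.
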
## Solution benefits
Cloud-optimized formats give markedly better performance (in terms of execution time and resources consumed) than traditional formats such as NetCDF.