[Bug] t.rast.aggregate fails when GUI Settings → Number of threads for parallel computing set to more than 1 with active processing mask #4708

Open
Falconus opened this issue Nov 16, 2024 · 5 comments
Falconus commented Nov 16, 2024

Description

When attempting to run the t.rast.aggregate tool with a mask, it fails when the number of threads for parallel computing is set to >1 in the GUI settings (Settings → Preferences). When it is reset to 1 and saved, the tool works as expected with no errors. When the mask is removed, it also works with no errors.

t.rast.aggregate --overwrite input=uas_dsm@assignment5b output=uas_dsm_aggr basename=uas_dsm_aggr suffix=time granularity=1 months nprocs=1
WARNING: Parallel processing disabled due to active MASK.
Traceback (most recent call last):
  File "/usr/local/grass84/scripts/t.rast.aggregate", line 245, in <module>
    main()
  File "/usr/local/grass84/scripts/t.rast.aggregate", line 195, in main
    output_list = tgis.aggregate_by_topology(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/grass84/etc/python/grass/temporal/aggregation.py", line 383, in aggregate_by_topology
    process_queue.put(mod)
  File "/usr/local/grass84/etc/python/grass/pygrass/modules/interface/module.py", line 253, in put
    self.wait()
  File "/usr/local/grass84/etc/python/grass/pygrass/modules/interface/module.py", line 311, in wait
    proc.wait(),
    ^^^^^^^^^^^
  File "/usr/local/grass84/etc/python/grass/pygrass/modules/interface/module.py", line 859, in wait
    raise CalledModuleError(
grass.exceptions.CalledModuleError: Module run `r.series file=/media/christopher/Data/GIS_584/GRASS/Lake_Wheeler_NCspm/assignment5b/.tmp/christopher-desktop/771233.0 method=average nprocs=2 memory=122880 output=uas_dsm_aggr_2015_09_01T00_00_00 --o --q` ended with an error.
The subprocess ended with a non-zero return code: -11. See errors above the traceback or in the error output.
(Fri Nov 15 23:55:38 2024) Command ended with non-zero return code 1 (2 sec)
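
A note on the return code -11 in the traceback: on POSIX systems, Python's subprocess module reports a child that was killed by signal N as return code -N, and signal 11 is SIGSEGV on Linux. That suggests r.series crashed with a segmentation fault rather than exiting with an error of its own. A minimal sketch for decoding such codes (the helper name `describe_returncode` is hypothetical and not part of GRASS):

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code as reported by Python on POSIX."""
    if returncode < 0:
        # A negative value -N means the child was terminated by signal N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    if returncode == 0:
        return "exited successfully"
    return f"exited with status {returncode}"


# The code from the traceback above.
print(describe_returncode(-11))
```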

To reproduce

  1. Set a processing mask
  2. From the Settings drop-down menu, select the "Preferences" menu item.
  3. In the tools tab in the "GUI Settings" window, set the number of threads for parallel computing to a number greater than 1 (I tried 2 and 16, with the same results)
  4. Run the t.rast.aggregate tool. I used the following parameters: t.rast.aggregate --overwrite input=uas_dsm@assignment5b output=uas_dsm_aggr basename=uas_dsm_aggr suffix=time granularity=1 months nprocs=1. The nprocs option had no effect, regardless of whether it was set to 1 or 16.
  5. Observe tool failure

Expected behavior

The tool should not fail with an active mask when the default nprocs is set to >1.

Screenshots

System description

                            
System Info                                                                     
GRASS version: 8.4.1dev                                                         
Code revision: cd76f8b7d1                                                       
Build date: 2024-11-15                                                          
Build platform: x86_64-pc-linux-gnu                                             
GDAL: 3.9.3                                                                     
PROJ: 9.6.0                                                                     
GEOS: 3.12.2                                                                    
SQLite: 3.45.1                                                                  
Python: 3.12.3                                                                  
wxPython: 4.2.2                                                                 
Platform: Linux-6.8.0-48-generic-x86_64-with-glibc2.39                          
                                                                                
python3 -c import sys, wx; print(sys.version); print(wx.version())
3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
4.2.2 gtk3 (phoenix) wxWidgets 3.2.6

Workaround

Set the default nprocs to 1 or remove the mask for the t.rast.aggregate step.

@Falconus Falconus added the bug Something isn't working label Nov 16, 2024
@Falconus Falconus changed the title [Bug] t.rast.aggregate fails when GUI Settings → Number of threads for parallel computing set to more than 1 [Bug] t.rast.aggregate fails when GUI Settings → Number of threads for parallel computing set to more than 1 with active processing mask Nov 16, 2024
@veroandreo
Contributor

Thanks for your report @Falconus. If you only use commands, i.e., set the nprocs parameter in t.rast.aggregate instead of going through the GUI, does it also fail? Would you mind creating a reproducible command-line example with the North Carolina dataset?

@Falconus
Author

Using the time series dataset from the Data page:

#Note: set default processors to >1 to test
t.info input=LST_Day_monthly@modis_lst

#Set region to maximum extent of LST_Day_monthly
g.region n=760180.124115 s=-415819.875885  e=1550934.464115 w=-448265.535885 -pa

#Probably not relevant, but I did this anyway
t.rast.series -n input=LST_Day_monthly@modis_lst method=count output=intersection

#Set region to subset
g.region n=550997 s=156914 e=626823 w=-56871

#Export region to polygon
v.in.region output=mask_area

#Reset region back to max extent
g.region n=760180.124115 s=-415819.875885  e=1550934.464115 w=-448265.535885 -pa

#Set mask from polygon
r.mask vector=mask_area@modis_lst

#The following two tasks fail. Note the granularity is set to 2 months, since 1 month doesn't do anything (nothing to aggregate)
t.rast.aggregate input=LST_Day_monthly output=LST_aggr basename=LST_ suffix=time granularity="2 months" method=average
t.rast.aggregate input=LST_Day_monthly output=LST_aggr basename=LST_ suffix=time granularity="2 months" method=average nprocs=1 --overwrite

#Remove mask
r.mask -r

#The following task succeeds without mask
t.rast.aggregate input=LST_Day_monthly output=LST_aggr basename=LST_ suffix=time granularity="2 months" method=average --overwrite 

@petrasovaa
Contributor

I think this is the same as #4297. There may be other tools impacted.

@petrasovaa petrasovaa added this to the 8.4.1 milestone Nov 20, 2024
@ninsbl
Member

ninsbl commented Nov 20, 2024

Yes, I guess e.g. t.rast.series is affected the same way. But virtually every Python module that uses OpenMP parallelized modules under the hood would be affected.

Ideally, this is fixed in the Python modules, I guess. We could probably create a library function that:

  1. checks whether a mask is present and, if so, deactivates OpenMP parallelism (if the module is parallelized that way), or, if no mask is present,
  2. passes an "nprocs" parameter down to the parallelized modules, or
  3. for temporal modules, checks how many parallel module calls will be executed and then
    a. runs N modules in parallel if the number of module calls N >= nprocs, or
    b. distributes nprocs across inner processes (each single module call with nprocs > 1) and outer processes (N parallel module calls > 1) if N < nprocs

Not sure if case b could be written to find an optimal balance between inner and outer processes.

We use a very simplistic approach for something like this here:
https://github.com/NVE/actinia_modules_nve/blob/762f55bac991c1b5424e87d04340d435800c0b0c/src/temporal/t.pytorch.predict/t.pytorch.predict.py#L695
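
The distribution idea from the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not existing GRASS API: the function name `plan_parallelism` and its parameters are hypothetical, and it uses plain integer division for case b instead of searching for an optimal balance.

```python
def plan_parallelism(nprocs: int, n_calls: int, mask_active: bool = False) -> tuple[int, int]:
    """Return (outer, inner) process counts for a batch of module calls.

    outer: how many module calls run at the same time
    inner: the nprocs value handed down to each single module call
    Hypothetical helper, not part of GRASS.
    """
    nprocs = max(1, nprocs)
    n_calls = max(1, n_calls)
    # Never start more concurrent calls than there are calls or workers.
    outer = min(nprocs, n_calls)
    if mask_active:
        # Step 1: with an active mask, switch off OpenMP parallelism
        # inside the individual module calls.
        inner = 1
    else:
        # Case a (n_calls >= nprocs) yields inner == 1;
        # case b (n_calls < nprocs) gives the spare workers to each call.
        inner = max(1, nprocs // outer)
    return outer, inner


print(plan_parallelism(nprocs=4, n_calls=10))
print(plan_parallelism(nprocs=16, n_calls=3))
print(plan_parallelism(nprocs=16, n_calls=3, mask_active=True))
```

With this split, outer * inner never exceeds nprocs, at the cost of leaving some workers idle when nprocs is not a multiple of the number of calls.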

@petrasovaa
Contributor

petrasovaa commented Nov 22, 2024

I need to think this through more, but it seems to me there are two separate issues: one needs to be fixed in the C tools (#4297), and the other is how to deal with the nprocs parameter in the Python temporal tools (and there is also the environment variable).
