Idea for improving speed and memory usage #94

hdoupe · 2019-10-31T21:41:01Z

Tax-Brain has been somewhat limited on Compute Studio because it has hit memory problems when running the calculations for each year in parallel. Now that C/S supports dask clusters, we should see how much of a speedup we can get for Tax-Brain. In OG-USA, @jdebacker found that passing a Calculator object from one process to another using the distributed client causes memory problems, but things work fine if you create the calculator object in the process where the calculations will be run and just advance it to the correct year there (https://github.com/PSLmodels/OG-USA/pull/496#issuecomment-542953090). So, my question is: can this approach work for Tax-Brain, too?


andersonfrailey · 2019-11-01T21:02:09Z

@hdoupe, I'm definitely down to try this approach. If I'm understanding the process you're describing correctly, what we'd need to do is create a new function in calculator that will create each calculator object, advance/run that calculator, then pass all the results back for aggregation/presentation. Does that sound about right?
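
A minimal sketch of that flow, assuming a stand-in calculator class rather than the real Tax-Calculator API. `StandInCalculator`, `run_one_year`, and `run_all_years` are hypothetical names, and a standard-library thread pool stands in for the dask client so the example runs anywhere:

```python
from concurrent.futures import ThreadPoolExecutor


class StandInCalculator:
    """Hypothetical stand-in for a tax calculator (not the real API)."""

    def __init__(self, start_year):
        self.current_year = start_year

    def advance_to_year(self, year):
        # A real calculator would age its records forward here.
        self.current_year = year

    def calc_all(self):
        # Placeholder result so there is something to aggregate.
        return {
            "year": self.current_year,
            "total_tax": 100.0 * (self.current_year - 2000),
        }


def run_one_year(start_year, year):
    """Build the calculator inside the worker, advance it, return only results."""
    calc = StandInCalculator(start_year)
    calc.advance_to_year(year)
    return calc.calc_all()


def run_all_years(executor, start_year, years):
    # Only plain integers are sent to the workers; small dicts come back.
    futures = [executor.submit(run_one_year, start_year, y) for y in years]
    return {res["year"]: res for res in (f.result() for f in futures)}


with ThreadPoolExecutor(max_workers=2) as pool:
    results = run_all_years(pool, 2019, range(2019, 2022))
```

The point of the design is that no calculator object ever crosses a process boundary: the task arguments are just years, and each worker returns a small result for aggregation.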

hdoupe · 2019-11-01T21:34:32Z

Yep, you got it.

andersonfrailey · 2019-11-01T21:41:05Z

Sweet. Definitely down to give it a shot. Do you think this would cause any issues for users running Tax-Brain locally? That would be a lot of calculator creation for a personal computer to handle. Maybe we could add an argument to the run method of TaxBrain that would either run Tax-Brain as it currently runs (only two calculators created) or in this new method, depending on its argument. This might make maintenance a tad bit tougher, but I don't think it'd be a significant challenge.

jdebacker · 2021-02-19T03:23:42Z

@andersonfrailey You should be able to have this work well locally and on C/S. You can have an argument for the Dask client and have it default to None. Tax-Brain users running on their own machines may never touch it, but you can set it to what you want for Compute-Studio runs.

e.g. in OG-USA's execute.runner() function:

def runner(output_base, baseline_dir, test=False, time_path=True,
           baseline=True, iit_reform={}, og_spec={}, guid='',
           run_micro=True, tax_func_path=None, data=None, client=None,
           num_workers=1):

When calling functions like this for Compute Studio, we create a client in functions.py.
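
A rough sketch of the `client=None` pattern, under the assumption that the per-year work can be wrapped in one function. `run` and `compute_year` are hypothetical names, not Tax-Brain or OG-USA functions, and a standard-library thread pool stands in for the dask client here (both expose `submit`, returning futures with `result()`):

```python
from concurrent.futures import ThreadPoolExecutor


def compute_year(year):
    # Placeholder for the real per-year calculation.
    return year * 2


def run(years, client=None):
    """Run serially when client is None, otherwise submit one task per year."""
    if client is None:
        # Default path for users on their own machines.
        return [compute_year(y) for y in years]
    futures = [client.submit(compute_year, y) for y in years]
    return [f.result() for f in futures]


# Local users never pass a client.
local = run([2019, 2020])

# A hosted run passes one in (thread pool used as a stand-in).
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = run([2019, 2020], client=pool)
```

Both paths return the same results, so the parallel branch is an opt-in and the default behavior stays unchanged.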

hdoupe changed the title ~~Idea for fixing memory problems~~ Idea for improving speed and memory usage Oct 31, 2019

andersonfrailey mentioned this issue Nov 2, 2019

WIP: Create individual calculators in Compute Studio runs #95

Open
