Long non-linear run times on large models #174

jrawbits · 2022-02-28T18:03:17Z

Based on timings reported in the log files from running multiple scenarios in the H-GAC models, a few of the modules appear to have undesirable algorithmic properties (in particular, run times that suggest order N-squared or worse). We should dig into how these modules do their work and see if there is some way to reduce the complexity. All of the "victims" are doing some kind of sampling and balancing toward target proportions, and we'll easily hit order N-squared if the process requires drawing N samples N times. That may be inevitable, but perhaps there's a craftier sampling algorithm out there... I'll edit this issue with the names of the affected modules once I've downloaded the enormous set of results...

The modules I'm starting with are PredictWorkers and CalculateVehicleOwnCost; the latter's performance collapses whenever pay-as-you-drive is set up in the scenario.

jrawbits · 2022-03-03T17:03:14Z

The problem appears to be the use of R's sample function in many places to generate a set of values that are randomly distributed according to a probability / proportion. The sample function without replacement (essentially classifying the stuff we're sampling from) is excruciatingly slow and scales very badly to big samples from big populations.

There is a much faster algorithm available, but even though it appears to scale to cases with probabilities, it is really only valid as a "non-probability" sampler. To sample with probabilities and get something like the same answer requires iterating once over (on average) half the population for each required worker (removing any selected worker, then rescaling the remaining probabilities). I'll keep researching for a while, but for now I'm just going to pursue the module fixups mentioned in the next comment.
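
To make the cost of that procedure concrete, here is a minimal sketch of sequential weighted sampling without replacement. It is written in Python purely for illustration and is not the R code used in the modules; the function name `sequential_weighted_sample` and its arguments are hypothetical. Each draw scans the remaining pool (about half of it on average when weights are similar), removes the selected item, and renormalizes, so drawing k items from N costs roughly k × N steps, which becomes order N-squared when k grows with N.

```python
import random


def sequential_weighted_sample(weights, k, seed=None):
    """Draw k distinct indices, each draw proportional to the remaining weights.

    Illustrative sketch only (hypothetical helper, not the module code).
    """
    rng = random.Random(seed)
    # Keep only items that can actually be drawn.
    pool = [(i, float(w)) for i, w in enumerate(weights) if w > 0]
    if k > len(pool):
        raise ValueError("cannot draw more items than have positive weight")

    chosen = []
    for _ in range(k):
        # "Rescale" the remaining probabilities by recomputing their total.
        total = sum(w for _, w in pool)
        target = rng.random() * total
        cumulative = 0.0
        pick = len(pool) - 1  # fallback if round-off leaves target >= cumulative
        # Linear scan over the remaining pool: this is the O(N) step per draw.
        for pos, (_, w) in enumerate(pool):
            cumulative += w
            if target < cumulative:
                pick = pos
                break
        idx, _ = pool.pop(pick)  # remove the selected item from the pool
        chosen.append(idx)
    return chosen


if __name__ == "__main__":
    # Item 1 has zero weight, so it can never be selected.
    print(sequential_weighted_sample([0.1, 0.0, 0.5, 0.4], k=3, seed=1))
```

The nested loop (k draws, each scanning up to N items) is where the non-linear run time comes from; any faster replacement would need to give an equivalent selection distribution to be a safe substitute.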

jrawbits · 2022-03-08T13:48:50Z

Since I have the modules open, I'm making some other cleanups to the estimation and documentation builds (and getting rid of some R CMD check warnings along the way), and I'll put that up as a pull request soon.
