
efficiency of network editing functions #762

Open
jamesuber opened this issue Nov 22, 2023 · 6 comments

Comments

@jamesuber
Member

I've had the chance to put the network editing functions through their paces using a network skeletonization use case for large network models. This uses EPANET as the backend store of network knowledge and a Boost graph representation of the network to support the skeletonization process. As changes are made to the network links and nodes, those changes are sent to EPANET through the model editing API functions.

So this is an application that does a lot of model editing, very unlike the occasional edits made to support model editing features in a UI. I was surprised to find that over 95% of the CPU time was spent in EPANET, because I figured all the heavy lifting would be in the graph skeletonization algorithms.

Talking to @LRossman about this, he thinks that the time is probably being spent on memory re-allocation. So this ticket is asking whether those memory requests can be streamlined somehow.

@jamesuber
Member Author

By the way, I don't think that network skeletonization is the only use case that would be hit by this. Another good example is using the EPANET API to build models from scratch out of GIS data sources.

@LRossman
Collaborator

The potential bottlenecks are likely the EN_addnode, EN_deletenode, EN_addlink, and EN_deletelink functions. The EN_add... functions require a total reallocation of the Node or Link struct arrays every time just a single node or link is added to a project. One way to avoid this is to add a Capacity property to these structs. When a new node or link is added and the total number is below the Capacity, no reallocation is needed. Once the number of nodes/links reaches the Capacity, it is increased by some amount and a single reallocation is made for this new Capacity. This same strategy is currently used with Curve elements (see the declaration of Scurve in types.h and the resizecurve function in project.c).
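A minimal sketch of that capacity-growth strategy, using illustrative names (`Network`, `addnode`, `CAPACITY_INCREMENT` are placeholders, not EPANET's actual identifiers):

```c
/* Sketch of capacity-based array growth: realloc only when the
   in-use count reaches the allocated capacity, not on every add.
   Names below are illustrative, not EPANET's real structs. */
#include <stdlib.h>
#include <string.h>

#define CAPACITY_INCREMENT 50   /* extra slots to grow by each time */

typedef struct { char id[32]; } Node;   /* stand-in for EPANET's Snode */

typedef struct {
    Node *node;
    int   nnodes;     /* nodes currently in use */
    int   capacity;   /* slots actually allocated */
} Network;

/* Returns 0 on success, -1 on allocation failure. */
int addnode(Network *net, const char *id)
{
    if (net->nnodes >= net->capacity)
    {
        /* Only here do we pay for a reallocation. */
        int newcap = net->capacity + CAPACITY_INCREMENT;
        Node *tmp = realloc(net->node, newcap * sizeof(Node));
        if (tmp == NULL) return -1;
        net->node = tmp;
        net->capacity = newcap;
    }
    strncpy(net->node[net->nnodes].id, id, sizeof(net->node[0].id) - 1);
    net->node[net->nnodes].id[sizeof(net->node[0].id) - 1] = '\0';
    net->nnodes++;
    return 0;
}
```

With this policy, adding N nodes costs roughly N/CAPACITY_INCREMENT reallocations instead of N.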

EN_addnode also reallocates the NodeDemand, NodeQual, NodeHead, DemandFlow, and EmitterFlow arrays whenever a new node is added. These arrays are only used during an analysis run and can be reallocated just once when an analysis is begun. Likewise, EN_addlink reallocates LinkFlow, LinkSetting, and LinkStatus when a new link is added. These re-allocations can also be postponed until a new analysis is begun.
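The deferred allocation could look something like the sketch below. `allocate_analysis_arrays` is a hypothetical helper and the struct is heavily simplified; in EPANET the one-time sizing would happen when an analysis is opened (e.g. in EN_openH), not on every add:

```c
/* Sketch of deferring analysis-only arrays until a run starts.
   allocate_analysis_arrays is hypothetical, not an EPANET function. */
#include <stdlib.h>

typedef struct {
    int     nnodes;
    double *nodehead;     /* analysis-only: sized once per run */
    double *demandflow;   /* analysis-only: sized once per run */
} Project;

/* Size the per-node solution arrays once from the final node count,
   instead of reallocating them inside every EN_addnode call.
   Arrays are nnodes + 1 long because EPANET indexes from 1.
   Returns 0 on success, -1 on allocation failure. */
int allocate_analysis_arrays(Project *pr)
{
    pr->nodehead   = calloc(pr->nnodes + 1, sizeof(double));
    pr->demandflow = calloc(pr->nnodes + 1, sizeof(double));
    if (pr->nodehead == NULL || pr->demandflow == NULL) return -1;
    return 0;
}
```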

Regarding the EN_deletenode function, it requires shifting the entries of the Node array, and the indices stored in the NodeHashTable, down by one whenever the node at position index is deleted. The code for doing this looks as follows:

    for (i = index; i <= net->Nnodes - 1; i++)
    {
        net->Node[i] = net->Node[i + 1];
        // ... update node's entry in the hash table
        hashtable_update(net->NodeHashTable, net->Node[i].ID, i);
    }

The hashtable_update function has to search the table for each node whose position is above index. It might be more efficient to move this work out of the loop and replace it with a single pass over the table, like the following:

    void hashtable_shiftdown(HashTable *ht, int index)
    {
        DataEntry *entry;
        int i;
        // Visit every chain in the table once, decrementing any
        // stored position that lies above the deleted index.
        for (i = 0; i < HASHTABLEMAXSIZE; i++)
        {
            entry = ht[i];
            while (entry != NULL)
            {
                if (entry->data > index) (entry->data)--;
                entry = entry->next;
            }
        }
    }

One potential downside of this new function is that HASHTABLEMAXSIZE is currently set to 12800 (with most of these entries set to NULL for small to moderately sized networks), so it remains to be seen whether this approach is more efficient. If it is, a similar replacement should be made for the hashtable_update call in EN_deletelink.
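To sanity-check the single-pass idea, here is a self-contained toy version of a chained hash table with the same shift-down loop. The table size, hash function, and names are simplified stand-ins for hash.c, not the real EPANET code:

```c
/* Toy chained hash table demonstrating the shift-down pass. */
#include <stdlib.h>
#include <string.h>

#define TABLESIZE 8   /* tiny for the demo; EPANET's table is much larger */

typedef struct Entry {
    char         *key;    /* node ID */
    int           data;   /* node's position in the Node array */
    struct Entry *next;
} Entry;

typedef Entry *HashTable;

static unsigned hash(const char *key)
{
    unsigned h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % TABLESIZE;
}

static void ht_insert(HashTable *ht, const char *key, int data)
{
    unsigned h = hash(key);
    Entry *e = malloc(sizeof(Entry));
    e->key = malloc(strlen(key) + 1);
    strcpy(e->key, key);
    e->data = data;
    e->next = ht[h];   /* push onto the chain */
    ht[h] = e;
}

static int ht_find(HashTable *ht, const char *key)
{
    for (Entry *e = ht[hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e->data;
    return -1;
}

/* One pass over every chain: decrement any stored position above
   the deleted index, as in the proposed hashtable_shiftdown. */
static void ht_shiftdown(HashTable *ht, int index)
{
    for (int i = 0; i < TABLESIZE; i++)
        for (Entry *e = ht[i]; e != NULL; e = e->next)
            if (e->data > index) e->data--;
}
```

The trade-off is exactly the one noted above: this visits every bucket (mostly empty for small networks) but touches each entry once, whereas repeated hashtable_update calls re-hash and re-search once per shifted node.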

@jamesuber
Member Author

jamesuber commented Nov 22, 2023 via email

@LRossman
Collaborator

@jamesuber I pushed a new branch to dev named dev-reduced_reallocs that only re-allocates space once every 50 times EN_addnode or EN_addlink is called. (You can change the '50' to something else at the top of epanet.c.) If you try it out on your project, let us know whether it reduces your run time by any substantial amount.

@LRossman
Collaborator

I ran some performance tests on the dev-reduced_reallocs branch. For creating 100,000 junction nodes there was a 27% reduction in run time for a re-allocation capacity of 50, a 63% reduction for a capacity of 100, and a 91% reduction for a capacity of 500 as compared to the original EPANET (which has a re-allocation capacity of 1). This suggests that for optimal performance the re-allocation capacity factor should be made adjustable based on some estimate of the number of nodes to be added using the EN_addnode function (with the same holding true for EN_addlink).
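One way such an adjustable factor could work is sketched below. `grow_capacity` is purely hypothetical (no such function exists in EPANET); it only illustrates letting a caller-supplied size estimate override the fixed chunk:

```c
/* Hypothetical growth policy: jump straight to a caller-supplied
   estimate of the final element count when one is given, otherwise
   fall back to fixed-chunk growth. Not an EPANET function. */
static int grow_capacity(int current, int needed, int hint)
{
    const int chunk = 50;              /* default, as in dev-reduced_reallocs */
    int cap = current;
    if (hint > cap) cap = hint;        /* caller expects this many elements */
    while (cap < needed) cap += chunk; /* otherwise grow chunk by chunk */
    return cap;
}
```

A caller building a 100,000-node model from GIS data would pass its node count as the hint and incur a single allocation, matching the ~91% reduction seen with a capacity of 500 and extending it further.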

@jamesuber
Member Author

Thanks @LRossman -- I will plan to run some network skeletonization use-case tests in order to get some results from a practical application to add to yours. Will report back.
