Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generalize node_keys; add way_keys #629

Closed
wants to merge 9 commits into from

Commits on Dec 28, 2023

  1. lua: use non-member functions for interop

    Currently, Tilemaker uses member functions for interop:
    
    ```lua
    function node_function(node)
        node:Layer(...)
    ```
    
    This PR changes Tilemaker to use global functions:
    
    ```lua
    function node_function()
        Layer(...)
    ```
    
    The chief rationale is performance. Every member function call needs to
    push an extra pointer onto the stack when crossing the Lua/C++ boundary.
    
    Kaguya serializes this pointer as a Lua userdata. That means every
    call into Lua has to malloc some memory, and every call back from Lua
    has to dereference through this pointer.
    
    And there are a lot of calls! For OMT on the GB extract, I counted
    ~1.4B calls from Lua into C++.
    
    A secondary rationale is that a global function is a bit more honest.
    A user might believe that this is currently permissible:
    
    ```lua
    last_node = nil
    function node_function(node)
        if last_node ~= nil
            -- do something with last_node
        end
    
        -- save the current node for later, for some reason
        last_node = node
    ```
    
    But in reality, the OSM objects we pass into Lua don't behave quite
    like Lua objects. They're backed by OsmLuaProcessing, who will move
    on, invalidating whatever the user thinks they've got a reference to.
    
    This PR has a noticeable decrease in reading time for me, measured
    on the OMT profile for GB, on a 16-core computer:
    
    Before:
    ```
    real	1m28.230s
    user	19m30.281s
    sys	0m29.610s
    ```
    
    After:
    ```
    real	1m21.728s
    user	17m27.150s
    sys	0m32.668s
    ```
    
    The tradeoffs:
    
    - anyone with a custom Lua profile will need to update it, although the
      changes are fairly mechanical
    
    - Tilemaker now reserves several functions in the global namespace,
      causing the potential for conflicts
    cldellow committed Dec 28, 2023
    Configuration menu
    Copy the full SHA
    53b48e2 View commit details
    Browse the repository at this point in the history
  2. faster tag map, faster Find()/Holds(), avoid mallocs

    Cherry-picked from
    systemed@b322166,
    systemed@5c807a9,
    systemed@13b3465
    and fixed up to work with protozero's data_view structure.
    
    Original commit messages below, the timings will vary but the idea is
    the same:
    
    Faster tagmap
    =====
    
    Building a std::map for tags is somewhat expensive, especially when
    we know that the number of tags is usually quite small.
    
    Instead, use a custom structure that does a crappy-but-fast hash
    to put the keys/values in one of 16 buckets, then linear search
    the bucket.
    
    For GB, before:
    ```
    real 1m11.507s
    user 16m49.604s
    sys 0m17.381s
    ```
    
    After:
    ```
    real	1m9.557s
    user	16m28.826s
    sys	0m17.937s
    ```
    
    Saving 2 seconds of wall clock and 20 seconds of user time doesn't
    seem like much, but (a) it's not nothing and (b) having the tags
    in this format will enable us to thwart some of Lua's defensive
    copies in a subsequent commit.
    
    A note about the hash function: hashing each letter of the string
    using boost::hash_combine eliminated the time savings.
    
    Faster Find()/Holds()
    =====
    
    We (ab?)use kaguya's parameter serialization machinery. Rather than
    take a `std::string`, we take a `KnownTagKey` and teach Lua how to
    convert a Lua string into a `KnownTagKey`.
    
    This avoids the need to do a defensive copy of the string when coming
    from Lua.
    
    It provides a modest boost:
    
    ```
    real 1m8.859s
    user 16m13.292s
    sys 0m18.104s
    ```
    
    Most keys are short enough to fit in the small-string optimization, so
    this doesn't help us avoid mallocs. An exception is `addr:housenumber`,
    which, at 16 bytes, exceeds g++'s limit of 15 bytes.
    
    It should be possible to also apply a similar trick to the `Attribute(...)`
    functions, to avoid defensive copies of strings that we've seen as keys
    or values.
    
    avoid malloc for Attribute with long strings
    =====
    
    After:
    
    ```
    real	1m8.124s
    user	16m6.620s
    sys	0m16.808s
    ```
    
    Looks like we're solidly into diminishing returns at this point.
    cldellow committed Dec 28, 2023
    Configuration menu
    Copy the full SHA
    c4518f3 View commit details
    Browse the repository at this point in the history
  3. try to avoid lock contention on AttributeStore

    On a 48-core machine, I still see lots of lock contention.
    AttributeStore:add is one place.
    
    Add a thread-local cache that can be consulted without taking the shared
    lock. The intuition here is that there are 1.3B objects, and 40M
    attribute sets. Thus, on average, an attribute set is reused 32 times.
    
    However, average is probably misleading -- the distribution is likely not
    uniform, e.g. the median attribute set is probably reused 1-2 times, and
    some exceptional attribute sets (e.g. `natural=tree` are reused thousands of times).
    
    For GB on a 16-core machine, this avoids 27M of 36M locks.
    cldellow committed Dec 28, 2023
    Configuration menu
    Copy the full SHA
    89f43ea View commit details
    Browse the repository at this point in the history
  4. RelationScanStore: more granular locks

    On a 48-core machine, this phase currently achieves only 400% CPU usage,
    I think due to these locks
    cldellow committed Dec 28, 2023
    Configuration menu
    Copy the full SHA
    c87497d View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    f6807c4 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    515a021 View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    6ba38b0 View commit details
    Browse the repository at this point in the history

Commits on Dec 29, 2023

  1. generalize node_keys; add way_keys

    This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes systemed#402.
    
    I'm not too sure if this is generally useful - it's useful for one of my
    use cases, and I see someone asking about it in systemed#190
    and, elsewhere, in onthegomap/planetiler#99
    
    If you feel it complicates the maintainer story too much, please reject.
    
    The goal is to reduce memory usage for users doing thematic extracts by
    not indexing nodes that are only used by uninteresting ways.
    
    For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node
    store. By contrast, if your interest is only to build a railway map, you
    require only ~8M nodes, needing 70MB of RAM. Or, to build a map of
    national/provincial parks, 12M nodes and ~120MB of RAM.
    
    Currently, a user can achieve this by pre-filtering their PBF using
    osmium-tool. If you know exactly what you want, this is a good
    long-term solution. But if you're me, flailing about in the OSM data
    model, it's convenient to be able to tweak something in the Lua script
    and observe the results without having to re-filter the PBF and update
    your tilemaker command to use the new PBF.
    
    Sample use cases:
    
    ```lua
    -- Building a map without building polygons, ~ excludes ways whose
    -- only tags are matched by the filter.
    way_keys = {"~building"}
    ```
    
    ```lua
    -- Building a railway map
    way_keys = {"railway"}
    ```
    
    ```lua
    -- Building a map of major roads
    way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}`
    ```
    
    Nodes used in ways which are used in relations (as identified by
    `relation_scan_function`) will always be indexed, regardless of
    `node_keys` and `way_keys` settings that might exclude them.
    
    A concrete example, given a Lua script like:
    
    ```lua
    function way_function()
      if Find("railway") ~= "" then
        Layer("lines", false)
      end
    end
    ```
    
    it takes 13GB of RAM and 100 seconds to process North America.
    
    If you add:
    
    ```lua
    way_keys = {"railway"}
    ```
    
    It takes 2GB of RAM and 47 seconds.
    
    Notes:
    
    1. This is based on `lua-interop-3`, as it interacts with files that are
       changed by that. I can rebase against master after lua-interop-3 is
       merged.
    
    2. The names `node_keys` and `way_keys` are perhaps out of date, as they
       can now express conditions on the values of tags in addition to their
       keys. Leaving them as-is is nice, as it's not a breaking change.
       But if breaking changes are OK, maybe these should be
       `node_filters` and `way_filters` ?
    
    3. Maybe the value for `node_keys` in the OMT profile should be
       expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`?
       This would avoid issues like systemed#337
    
    4. This also adds a SIGUSR1 handler during OSM processing, which prints
       the ID of the object currently being processed. This is helpful for
       tracking down slow geometries.
    cldellow committed Dec 29, 2023
    Configuration menu
    Copy the full SHA
    3c1740a View commit details
    Browse the repository at this point in the history

Commits on Dec 30, 2023

  1. fix multipolygon handling

    D'oh, I thought this could be handled with fewer cases, but I don't
    think that's the case.
    
    If it's a multipolygon: whether to accept or not is a function of
    wayKeys filtering.
    
    If it's not a multipolygon: whether to accept or not is the result of
    relation_scan_function.
    cldellow committed Dec 30, 2023
    Configuration menu
    Copy the full SHA
    2a0abf0 View commit details
    Browse the repository at this point in the history