Welcome!
Read more about the 2022 planet import challenge:
- Blog post: https://www.openstreetmap.org/user/mmd/diary/400113
- Technical details: https://wiki.openstreetmap.org/wiki/User:Mmd/Planet_import_challenge_22
- Performance optimized experimental Overpass API fork 0.7.59_mmd
- Based on 0.7.57.0, plus selected functional additions and bugfixes included in 0.7.58.* releases
- Supporting CMake based build, in addition to autotools.
  - Unity builds (`USE_UNITY_BUILD = ON`) for faster compilation
  - `cpack`: creates Debian packages
  - Various `HAVE_*` flags to control libraries and features in use
- Libosmium-based output modes:
  - PBF/OPL format output: setting `[out:pbf]` and `[out:opl]`
  - LZ4 compression for PBF output: `[out:pbf(lz4)]`
  - Libosmium based XML output: `[out:osmxml]`
    - faster output; Overpass API-specific OSM XML format extensions are not supported
- Regular expressions:
  - ICU regular expressions: setting `[regexp:ICU]`
  - PCRE2 (+JIT) regular expression engine: setting `[regexp:PCRE]` or `[regexp:PCREJIT]`
- New/improved statements:
  - XAPI-like union operation for tag values (syntactic sugar): the `node[place=city|town|village];` query syntax can be used instead of `node['place'~'^(city|town|village)$'];`
  - New query statement filter 'has no key like ...': `node[amenity=recycling][!~"^recycling:"];` (Related upstream issue: drolbr#589)
  - New function `all_vertex()`, which evaluates to true if all vertices of a way fulfill a given expression. The query `way({{bbox}})[building] (if:lrs_in(1,per_vertex(abs(angle()) > 170)));` can now be rewritten as: `way({{bbox}})[building] (if:!all_vertex(abs(angle()) <= 170));`
  - Ad-hoc area creation on any closed way/relation

    Example query for ad-hoc area creation:

    ```
    way[landuse=residential]({{bbox}});
    foreach ->.pivot {
      ( .pivot; node(w.pivot); );
      ( make_area [.pivot]; .result;)->.result;
    }
    rel[type=multipolygon][landuse=residential]({{bbox}});
    foreach ->.pivot {
      ( way(r.pivot); node(w); );
      ( make_area [.pivot]; .result;)->.result;
    }
    foreach .result -> .area {
      way[building](area.area);
      if(count(ways) == 0) {
        way(pivot.area); out geom meta;
        rel(pivot.area); out geom meta;
      }
    }
    ```
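The equivalence between the XAPI-like union and its regular expression form can be illustrated outside of Overpass. The following Python sketch is not part of the fork: the helper name `union_to_regex` is hypothetical, and escaping regex metacharacters in the values is an assumption, since the text above does not say how special characters are treated.

```python
import re

def union_to_regex(values: str) -> str:
    """Turn 'city|town|village' into the anchored pattern '^(city|town|village)$'."""
    # Assumption: each alternative is taken literally, so it is escaped.
    alternatives = [re.escape(v) for v in values.split("|")]
    return "^(" + "|".join(alternatives) + ")$"

pattern = union_to_regex("city|town|village")

# Only exact tag values match; 'township' is rejected by the anchors.
candidates = ["city", "town", "village", "township", "hamlet"]
matches = [v for v in candidates if re.search(pattern, v)]
```

The anchors `^` and `$` are what make the regular expression behave like an exact value match rather than a substring search.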
- Dockerfile to facilitate building 0.7.59_mmd and 0.7.56 binaries (see `docker/` directory)
- PBF planet and diff files can be imported without any external PBF->XML conversion tools. By avoiding expensive XML parsing, and leveraging libosmium parallel file processing, imports see a significant speedup. To enable this new file importer, add command line parameter `--use-osmium` when calling `update_from_dir` or `update_database`. Command line parameter `-f` can be used to override the default input file format (PBF). See https://osmcode.org/file-formats-manual/ for permitted values.
- PBF planet initial load supports LocationsOnWays extension. When using osmium tool to create the PBF file, it's essential to use the `--keep-untagged-nodes` option (short form `-n`) to keep the untagged nodes in the output file: `osmium add-locations-to-ways -n extract.osm.pbf -o extract_low.osm.pbf`
- map_demo: alternative implementation for API 0.6 /map call for very large areas with > 100 million nodes. Returns PBF file format.
- Improved dispatcher process security for use in multi-user environments (upstream issue 247)
- Improved dispatcher signal handling for SIGTERM and SIGINT, same behavior as `--terminate` command line parameter
- Environment variables:
  - `OVERPASS_MAX_TIMEOUT`: global override for maximum permitted `[timeout:...]` value.
  - `OVERPASS_MAX_ELEMENT_LIMIT`: global override for maximum permitted `[maxsize:...]` value.
  - `OVERPASS_FCGI_MAX_REQUESTS`: number of FastCGI requests before the `interpreter` process is terminated (when idle)
  - `OVERPASS_FCGI_MAX_ELAPSED_TIME`: maximum time after which the FastCGI process is terminated (when idle)
  - `OVERPASS_REGEXP_ENGINE`: default regexp engine to use if none is specified in Overpass QL settings. Possible values include: `POSIX`, `ICU`, `PCRE` and `PCREJIT`. PCREJIT is recommended for best performance.
  - `OVERPASS_LOG_LEVEL`: define transactions.log log level. Available levels: 0 (error), 1 (warn), 2 (info), 3 (debug, default value), 4 (trace)
  - `OVERPASS_SHARED_NAME_SUFFIX`: define /dev/shm/osm3s*... shared memory file suffix, allowing multiple parallel Overpass instances on one system
  - `OVERPASS_MAX_SPACE_LIMIT`: maximum size of a FastCGI process's virtual memory (address space), in bytes. Default value: 2^33 (=8 GiB); memory will be unlimited when the parameter value is set to 0.
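The semantics of `OVERPASS_MAX_SPACE_LIMIT` (default of 2^33 bytes, and 0 meaning "no limit") can be spelled out in a few lines. This is a minimal Python sketch of those rules only; the function name `address_space_limit` is hypothetical, and how the fork itself parses and applies the variable is not shown here.

```python
import os

DEFAULT_SPACE_LIMIT = 2 ** 33  # 8 GiB, the default stated above

def address_space_limit(environ=os.environ):
    """Return the limit in bytes, or None when the limit is disabled (value 0)."""
    raw = environ.get("OVERPASS_MAX_SPACE_LIMIT")
    limit = DEFAULT_SPACE_LIMIT if raw is None else int(raw)
    return None if limit == 0 else limit

default_limit = address_space_limit({})
unlimited = address_space_limit({"OVERPASS_MAX_SPACE_LIMIT": "0"})
custom = address_space_limit({"OVERPASS_MAX_SPACE_LIMIT": "4294967296"})
```

The default works out to 8589934592 bytes, which is the value to use when setting the variable explicitly to 8 GiB.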
- Reject unsupported area filter for areas (`area(area)`)
- Reject empty poly statement (`(poly)`)
) - Print error messages in CSV output mode. Errors can be detected as empty line followed by the error message.
- Fix replication scripts apply_osc_to_db.sh and fetch_osc.sh which now handle the global state.txt file correctly.
- Fix segmentation fault that could be triggered by the Timestamp constructor (upstream issue 625)
- Fix Change_Entry comparison operator bug (upstream issue 623)
- Avoid accumulating area_blocks in foreach loop (upstream issue 568)
- Fixed some memory leaks in test classes
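A client can use the "empty line followed by the error message" convention for CSV output to separate data rows from errors. The sketch below is one possible way to do that in Python. The function name and the sample response are made up for illustration, and it assumes that regular data rows are never empty.

```python
def split_csv_response(body: str):
    """Split a CSV response into (data_rows, error_message_or_None)."""
    lines = body.splitlines()
    if "" in lines:
        # Everything after the first empty line is treated as error text.
        cut = lines.index("")
        error = "\n".join(lines[cut + 1:]).strip()
        return lines[:cut], (error or None)
    return lines, None

# Hypothetical response: two data lines, an empty line, then an error message.
rows, error = split_csv_response("@id\tname\n1\tA\n\nsome error message\n")

# Hypothetical response without any error.
ok_rows, ok_error = split_csv_response("@id\n1\n")
```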
- Validate object ids during import (when using osmium based importer), rejecting object ids which are too large to be stored in 40 bits / 32 bits respectively.
- Use of 40 bit node ids.
- Due to upstream issue 465, node ids were already limited to 42 bits before.
- LZ4 can no longer compress 40 bit node ids, resulting in a slightly worse compression ratio (average 69% before --> 75% fixed now).
- Way nodes stored as varint using protozero (original idea: drolbr#250)
- Node changelog is replaced by node change packages. They form a timestamp based delta encoded list of node ids, similar to way nodes. Also, old_idx and new_idx fields in Change_Entry have been removed (upstream issue 654). Both changes result in an overall size reduction from >60GB down to 6.5GB for node changelog details.
- Timestamp data type has been changed from 40 bit to 32 bit, with year 2000 as baseline. Supporting 16384 years for OSM object metadata seemed a bit excessive. The new data type still covers years 2000..2063.
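The year ranges quoted for the timestamp type follow from simple bit arithmetic. The Python sketch below assumes that the calendar fields are packed into fixed bit widths (4 bits month, 5 bits day, 5 bits hour, 6 bits minute, 6 bits second, 26 bits in total). This layout is an assumption chosen because it reproduces the figures above (14 remaining bits give 16384 years, 6 remaining bits give 64 years); the fork's actual encoding may differ.

```python
BASE_YEAR = 2000
# Assumed field widths in bits; together they occupy 26 bits.
FIELD_BITS = {"month": 4, "day": 5, "hour": 5, "minute": 6, "second": 6}
ORDER = ("month", "day", "hour", "minute", "second")

def pack_timestamp(year, month, day, hour, minute, second):
    """Pack a date into 32 bits: 6 bits of year offset plus 26 bits of fields."""
    year_offset = year - BASE_YEAR
    if not 0 <= year_offset < 64:
        raise ValueError("year outside 2000..2063")
    value = year_offset
    for name, field in zip(ORDER, (month, day, hour, minute, second)):
        value = (value << FIELD_BITS[name]) | field
    return value

def unpack_timestamp(value):
    """Inverse of pack_timestamp; returns (year, month, day, hour, minute, second)."""
    fields = []
    for name in reversed(ORDER):
        bits = FIELD_BITS[name]
        fields.append(value & ((1 << bits) - 1))
        value >>= bits
    second, minute, hour, day, month = fields
    return (value + BASE_YEAR, month, day, hour, minute, second)

latest = pack_timestamp(2063, 12, 31, 23, 59, 59)
```

With this layout the last representable second of 2063 still fits below 2^32, which is why the baseline of 2000 yields the range 2000..2063.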
Three conversion tools are available for easy conversion from the official database file format to the custom one.
- Conversion tool to convert 0.7.56 clone database into 0.7.59_mmd format (runtime ~3h). See upstream/README_0_7_57_1_patched.md for details.
- Conversion tool to create the tagged nodes table, which is absent in a 0.7.56 database (`create_tagged_nodes [db_dir]`)
- Conversion tool to create node_changepack.bin based on an existing node_changelog.bin (`create_node_changepack [db_dir]`)
All conversion tools need to be executed without an active dispatcher instance.
A full attic database using lz4 compression for bin+map needs about 400G on 0.7.59 (mmd), based on 01/2023 data.
(List does not include some rather technical changes)
- FastCGI support: avoid starting a new `interpreter` binary for each request, thereby enabling further caching options (upstream pull request 383)
- Index and username caching: further taking advantage of FastCGI, database indices and usernames are only updated once per minute, and can be reused for many queries.
- epoll based dispatcher processing for better scalability. This replaces round robin based unix domain socket polling with 10ms..100ms time intervals.
- Hybrid array/bitmap container for better memory efficiency of large query statements (subset of features found in roaring bitmaps, https://arxiv.org/pdf/1603.06549.pdf)
- Avoid object instantiation to access node/way/relation properties wherever possible in osm3s template db backend by leveraging CRTP.
- Attic: speed up object reconstruction, avoid expensive copying of objects
- Parallel processing support for update_from_dir/update_database. New parameter `--parallel=n`, where n denotes the max. number of parallel processes
- `around` statement:
  - bbox based pruning. Calculate bounding box for each way and avoid O(n²) great circle distance calculations, if bounding boxes don't intersect. (upstream issue 167, > 120x speedup).
  - Avoid excessive object allocations by replacing vector by tuple.
  - condense_ranges as standalone function: improve range expansion for global queries (example: https://overpass-turbo.eu/s/1azD -> 180x speedup).
  - Reduce memory consumption by releasing temporary structure immediately after statement execution.
- Area caching, use of memoization to avoid expensive calculation
- Lazy way geometry store loading for "out qt" to reduce memory consumption, also in attic mode. In addition avoid recalculation of loop invariants in Way_Geometry_Store.
- Ignore bounding box if it covers the whole globe.
- Lazy `(if: )` filter evaluation to short circuit boolean expression evaluation.
- `(if: )` expression evaluation now uses std::variant for intermediate results to avoid back and forth conversions from number to string types, resulting in significant speedups for `angle()` et al.
- Changeset based `(changed: )` filtering to process huge changesets in Achavi. Additional caching in `changed` to avoid recalculation of already available results.
Example syntax for changeset 46503970: (syntax is subject to change)
[adiff:"2017-03-01T20:28:34Z","2017-03-01T20:28:42Z"];
(node(changed!46503970);way(bn);way(changed!46503970);relation(changed!46503970););out meta geom qt;
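The bounding-box pruning listed for the `around` statement can be sketched in a few lines. The Python example below is only an illustration of the idea, not the fork's implementation: it works in plain planar coordinates instead of great circle distances, measures vertex-to-vertex distance only, and all function names are made up.

```python
import math

def bbox(points):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_within(a, b, radius):
    """True if box a, grown by radius on every side, intersects box b."""
    return not (a[2] + radius < b[0] or b[2] < a[0] - radius or
                a[3] + radius < b[1] or b[3] < a[1] - radius)

def min_vertex_distance(way_a, way_b):
    """Expensive part: compares every vertex of one way with every vertex of the other."""
    return min(math.dist(p, q) for p in way_a for q in way_b)

def ways_around(source, candidates, radius):
    """Return (matching candidates, number of exact distance computations)."""
    src_box = bbox(source)
    hits, exact_checks = [], 0
    for way in candidates:
        if not boxes_within(src_box, bbox(way), radius):
            continue  # pruned: boxes are too far apart, no vertex pair can match
        exact_checks += 1
        if min_vertex_distance(source, way) <= radius:
            hits.append(way)
    return hits, exact_checks

source = [(0, 0), (1, 0)]
near = [(1.5, 0), (2, 0)]
far = [(100, 100), (101, 100)]
hits, exact_checks = ways_around(source, [near, far], 1.0)
```

The box test is conservative: it may let through a pair that later fails the exact check, but it never discards a pair that is within the radius.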
- Separate nodes table for fast lookup of tagged nodes. This way only ~3% of all nodes need to be processed for tag node queries.
- union statement optimization to avoid unnecessary copying of so called "stack frames". Speeds up typically used `(._; .result;)->.result;` statements inside foreach loops.
- Use lz4 compression by default for bin and map files
- LZ4: fallback to uncompressed input, if result size increases during compression
- Using fmt library to print fixed decimal values
- Reduce memory consumption for attic tag queries (collect_attic_kregv et al.)
- Significant reduction in memory consumption for `out qt;` based output mode using index based prefetching.
- Expensive health check: use linear complexity approach in collect_items_range and similar functions.
- Use of SharedDataPointer to reduce cost of copying large objects (ways, relations, areas, etc.)
- Parallel clone generation. Number of parallel threads can be set using the `--clone-parallel=n` command line parameter.
- Use of C++17 string_view to execute key value checks on raw data, avoiding copying altogether. This includes ICU and PCRE(JIT) regular expressions. Note: POSIX regular expressions still require additional copying due to API limitations.
- Key value queries to validate first element of a new index only. Immediately skip all elements, in case key value pairs are not matching.
- Replace std::map based user data cache by std::vector, also avoid expensive sorting operations for millions of user id/display_name pairs.
- Use monobound_binary_search instead of std::binary_search for some hot code paths.
- Make updater and backend use std::vector instead of std::set, and std::unordered_map instead of std::map
- Added multi-threaded processing in updater
- Many more micro-optimizations, see git changelog
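The "fallback to uncompressed input, if result size increases during compression" rule from the list above can be shown with a small Python sketch. Note the substitutions: `zlib` from the standard library stands in for LZ4, and the one-byte block marker is a hypothetical format invented for this example, not the fork's on-disk layout.

```python
import zlib

RAW, COMPRESSED = 0, 1  # hypothetical one-byte block markers

def encode_block(data: bytes) -> bytes:
    """Compress a block, but keep the raw bytes if compression does not help."""
    packed = zlib.compress(data)
    if len(packed) >= len(data):
        return bytes([RAW]) + data
    return bytes([COMPRESSED]) + packed

def decode_block(block: bytes) -> bytes:
    """Inverse of encode_block, driven by the leading marker byte."""
    marker, payload = block[0], block[1:]
    return payload if marker == RAW else zlib.decompress(payload)

repetitive = b"a" * 1000          # compresses well
distinct = bytes(range(256))      # no repetition, compression only adds overhead

encoded_repetitive = encode_block(repetitive)
encoded_distinct = encode_block(distinct)
```

With such a fallback, a stored block is never larger than the input plus the marker, whatever the data looks like.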
- `[out:custom]` and `[out:popup]` output modes
- Public transport diagrams (sketch route tools)
- local (localize)
- XAPI compatibility layer (obsolete, no real world usage anymore)
- Compile time option `--disable-overpassxml` to remove support for Overpass XML dialect
- Support for jsonp trick. Obsoleted nowadays by CORS.
- clang tidy: modernize/performance (see commit log for details)
- Removed some old content in doc/
- Full attic database creation 4-5 days (currently: > 24 days)
- Initial area creation (using 0.7.57 area creation rules): 28.5 min
- Dispatcher process can handle 5000-7000 requests/s.
Test setup:
- Reprocess lz4 transactions.log, dated 2021/12/17
- 1'837'079 queries in total
- 1 CPU used to execute queries, no parallel processing
- Areas created beforehand
- No database updates
Results:
- Roughly 7x speedup on average vs. 0.7.57
- Total processing time: 100'054 s (about 27 hours, 48 minutes), based on runtime measurement in transactions.log
- Average: 1017 queries/minute, based on start/end times in Apache log
- Distribution
- 495 queries (=0.027%) account for 20% of the total processing time (queries w/ > 20s runtime)
- 8916 queries (=0.485%) account for 50% of the total processing time (queries w/ > 1.1s runtime)
- Errors during execution
- Oversized: 199 queries (vs. 110 in original transactions.log file)
- Timeout: 274 queries (vs. 1402 in original transactions.log file)
- Frequently triggered by expensive Achavi queries still lacking filtering based on changeset id, as well as a single umap map without proper limit on zoom level
Query runtime:
mean 0.054 s
min 0.000 s
max 205.513 s
quantile runtime (in s)
---------------------------
10% 0.002
50% 0.007
90% 0.042
95% 0.111
99% 0.601
99.5% 1.065
99.9% 5.829
99.95% 12.425
99.99% 30.380
99.995% 56.768
99.999% 120.544
99.9999% 148.805
(-> roughly 99.5% of all queries take about 1 second or less)
Used supervisord config settings
environment=
OVERPASS_FCGI_MAX_REQUESTS=10000,
OVERPASS_FCGI_MAX_ELAPSED_TIME=900,
OVERPASS_REGEXP_ENGINE="PCREJIT",
OVERPASS_DEFAULT_TIMEOUT=60,
OVERPASS_MAX_TIMEOUT=120,
OVERPASS_MAX_SPACE_LIMIT=8589934592,
OSMIUM_POOL_THREADS=1,
OVERPASS_LOG_LEVEL=2
Base image: Ubuntu 20.04
git clone https://github.com/mmd-osm/Overpass-API.git
cd Overpass-API
git checkout test7591
git submodule update --init
sudo apt-get update -qq || true
sudo apt-get install -y g++ git make autoconf automake ca-certificates libtool \
libfcgi-dev libxml2-dev zlib1g-dev \
expat libexpat1-dev liblz4-dev libbz2-dev libicu-dev \
libfmt-dev libpcre2-dev libcereal-dev libgoogle-perftools-dev \
--no-install-recommends
C++17 compiler support is mandatory to build binaries.
pushd src/
chmod u+x test-bin/*.sh
autoscan
aclocal
autoheader
libtoolize
automake --add-missing
autoconf
popd
mkdir -p build
cd build
../src/configure CXXFLAGS="-Werror=implicit-function-declaration -D_FORTIFY_SOURCE=2 -fexceptions -fpie -Wl,-pie -fpic -shared -fstack-protector-strong -Wl,--no-as-needed -pipe -Wl,-z,defs -Wl,-z,now -Wl,-z,relro -fno-omit-frame-pointer -flto -fwhole-program -march=native -O2 -ftree-vectorize -g3 -ggdb" LDFLAGS="-ltcmalloc -flto -fwhole-program -lpcre2-8 -lfmt" --prefix=$EXEC_DIR --enable-lz4 --enable-fastcgi --enable-tests
make V=0 -j3
make install
(also see `docker/` directory for examples)
/etc/supervisor/conf.d/overpass.conf
[fcgi-program:interpreter]
socket=unix:///var/run/interpreter.socket
socket_owner=www-data
socket_mode=0660
environment=
OVERPASS_FCGI_MAX_REQUESTS=10000,
OVERPASS_FCGI_MAX_ELAPSED_TIME=900,
OVERPASS_REGEXP_ENGINE="PCREJIT"
command=/home/user/osm3s/fcgi-bin/interpreter
numprocs=6
priority=999
process_name=%(program_name)s_%(process_num)02d
user=www-data
autorestart=true
autostart=true
startsecs=1
startretries=3
stopsignal=QUIT
stopwaitsecs=10
redirect_stderr=true
stdout_logfile=/var/log/interpreter.log
stdout_logfile_maxbytes=10MB
Forwarding calls to `/api/interpreter` to local socket managed by supervisord. Requires mod_proxy_fcgi.
ProxyPass /api/interpreter unix:///var/run/interpreter.socket|fcgi://localhost/api/interpreter
Replacing `/api/map` shell script: use Apache rewrite engine to use `/api/interpreter` endpoint instead:
<LocationMatch "^/api/map$">
RewriteEngine On
RewriteCond %{QUERY_STRING} ^bbox=([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+)$
RewriteRule ".*" "/api/interpreter?data=[timeout:300][maxsize:2000000000][bbox:%2,%1,%4,%3];(node(%2,%1,%4,%3);way(bn);node(w););(._;(rel(bn)->.a;rel(bw)->.a;);rel(br););out meta;" [PT]
</LocationMatch>
<LocationMatch "^/api/map\.pbf$">
RewriteEngine On
RewriteCond %{QUERY_STRING} ^bbox=([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+)$
RewriteRule ".*" "/api/interpreter?data=[out:pbf][timeout:300][maxsize:2000000000][bbox:%2,%1,%4,%3];(node(%2,%1,%4,%3);way(bn);node(w););(._;(rel(bn)->.a;rel(bw)->.a;);rel(br););out meta;" [PT]
</LocationMatch>
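The rewrite rules above reorder the bounding box: the API 0.6 `/api/map` call takes `bbox=left,bottom,right,top`, while Overpass QL expects south, west, north, east, hence the `%2,%1,%4,%3` back-references. The following Python sketch mirrors that translation for testing purposes. The function name is hypothetical, and the pattern and query text are copied from the first rewrite rule.

```python
import re

# Same pattern as in the RewriteCond directive.
BBOX_RE = re.compile(r"^bbox=([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+),([\-0-9\.]+)$")

QUERY_TEMPLATE = (
    "[timeout:300][maxsize:2000000000][bbox:{s},{w},{n},{e}];"
    "(node({s},{w},{n},{e});way(bn);node(w););"
    "(._;(rel(bn)->.a;rel(bw)->.a;);rel(br););out meta;"
)

def map_call_to_query(query_string):
    """Translate an /api/map query string into the Overpass QL data parameter."""
    m = BBOX_RE.match(query_string)
    if m is None:
        return None  # the rewrite rule would not fire either
    # API 0.6 order: left (west), bottom (south), right (east), top (north)
    west, south, east, north = m.groups()
    return QUERY_TEMPLATE.format(s=south, w=west, n=north, e=east)

query = map_call_to_query("bbox=13.3,52.5,13.4,52.6")
```

Running a few sample query strings through such a helper is a quick way to check the coordinate order before touching the Apache configuration.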