diff --git a/_quarto.yml b/_quarto.yml index 9cf4101..bfbbdf4 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -61,6 +61,7 @@ website: - flatgeobuf/intro.qmd - flatgeobuf/hilbert-r-tree.qmd - flatgeobuf/flatgeobuf.ipynb + - flatgeobuf/flatgeobuf-in-js.qmd - section: PMTiles contents: - pmtiles/intro.qmd diff --git a/flatgeobuf/flatgeobuf-in-js.qmd b/flatgeobuf/flatgeobuf-in-js.qmd new file mode 100644 index 0000000..baa0c3d --- /dev/null +++ b/flatgeobuf/flatgeobuf-in-js.qmd @@ -0,0 +1,189 @@ +--- +title: FlatGeobuf in JavaScript +subtitle: Example of using FlatGeobuf with Leaflet +--- + +FlatGeobuf is a cloud-native vector data format because it contains a built-in spatial index that allows reading a specific spatial region from within the file without downloading the entire file's content. + +This is very useful for browser-based applications, because it allows them to make use of large files hosted on commodity cloud object storage without maintaining a server. + +This notebook provides an example of using FlatGeobuf with spatial filtering from JavaScript. + +## Downloading vs Streaming vs Range reads + +FlatGeobuf supports a few different ways of loading data into the browser. + +_Downloading_ refers to fetching the entire FlatGeobuf file and parsing it after the full file has finished downloading. This has the downside that the user must wait for the entire download to finish before they see any interaction on the web page. This may lead to a user wondering if the web page is broken if it takes a while to download their data. + +_Streaming_ refers to making use of a file's contents incrementally as it downloads. This approach still downloads the entire file from beginning to end, but enables e.g. rendering part of the data on a map quickly, before waiting for the full file to finish downloading. This has the benefit of increased responsiveness, but the downside that a large file will be loaded in full. FlatGeobuf supports streaming because the file's metadata is located at the beginning. A good example of this in action is ["Streaming FlatGeobuf"](https://observablehq.com/@bjornharrtell/streaming-flatgeobuf) by Björn Harrtell. + +_Range reads_ refers to fetching only specific parts of the file that are required by the user. In the context of FlatGeobuf, this usually means a spatial query. FlatGeobuf enables this through its spatial index at the beginning of the file. Web clients can read the header, and then make requests only for data in a specific location. This has the benefit that very large files can be used in situations where downloading them in full would be impractical. A downside is that it takes more individual HTTP requests to understand which byte range in the file contains the desired data, leading to a longer latency before data starts to display. + +## Example + +This example uses slightly-modified JavaScript syntax used in [Observable notebooks](https://observablehq.com/). + +Load the FlatGeobuf JavaScript library: + +```{ojs} +flatgeobuf = require("flatgeobuf@3.26.2/dist/flatgeobuf-geojson.min.js") +``` + +This library has two functions: `deserialize` to fetch a remote file and parse it to GeoJSON, and `serialize`, which converts GeoJSON to FlatGeobuf. + +```{ojs} +flatgeobuf +``` + +For this demo, we'll use the same data source as in the [FlatGeobuf leaflet example](https://flatgeobuf.org/examples/leaflet/large.html). This data file represents _every census block in the USA_. + +```{ojs} +url = 'https://flatgeobuf.septima.dk/population_areas.fgb' +``` + +The above is a really big file at almost 12GB total size, so we don't want to fetch the entire file. In this demo, we'll choose a small bounding box representing an area over Manhattan in New York City. + +Beware: if you make this bounding box too big, FlatGeobuf will try to download a large amount of data into your browser and maybe crash the tab! + +```{ojs} +bbox = { + return { + minX: -74.003802, + minY: 40.725756, + maxX: -73.981481, + maxY: 40.744008, + } +} +``` + +The above `bbox` object represents a bounding box in the format required by the FlatGeobuf API, but Leaflet's API requires an array-formatted bounding box, so we'll define a function to convert between the two: + +```{ojs} +// leaflet uses lat-lon ordering +bboxObjectToArray = (obj) => [ + [obj.minY, obj.minX], + [obj.maxY, obj.maxX], +]; +``` + +Next we'll fetch all the data from the FlatGeobuf file within this bounding box. Notice how we pass the `bbox` argument into `deserialize`. + +```{ojs} +features = { + const iter = flatgeobuf.deserialize(url, bbox); + const features = []; + for await (const feature of iter) { + features.push(feature); + } + return features; +} +``` + +There are 354 individual features that match this query: + +```{ojs} +features +``` + +As in the FlatGeobuf example, we'll define a color scale based on how many people live in the census block. + +```{ojs} +colorScale = (d) => { + return d > 750 + ? "#800026" + : d > 500 + ? "#BD0026" + : d > 250 + ? "#E31A1C" + : d > 100 + ? "#FC4E2A" + : d > 50 + ? "#FD8D3C" + : d > 25 + ? "#FEB24C" + : d > 10 + ? "#FED976" + : "#FFEDA0"; +}; +``` + +Next we load the Leaflet JavaScript library and fetch its CSS styling defintions if needed. + +```{ojs} +L = { + const L = await require("leaflet@1/dist/leaflet.js"); + if (!L._style) { + const href = await require.resolve("leaflet@1/dist/leaflet.css"); + document.head.appendChild(L._style = html``); + } + return L; +} +``` + +Next we instantiate the Leaflet map and include multiple layers: + +- An `L.tileLayer` to show basemap tiles on the map for context. +- An `L.rectangle` to show the bounding box of our FlatGeobuf query. +- An `L.layerGroup` to group all the FlatGeobuf features into a single layer. +- An `L.geoJSON` item for each feature in the FlatGeobuf response. + +```{ojs} +map = { + const container = html`
`; + yield container; + const map = L.map(container).setView([40.7299, -73.9923], 13); + L.tileLayer("https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png", { + attribution: + "© OpenStreetMap contributors", + }).addTo(map); + + // Render the bounding box rectangle + L.rectangle(bboxObjectToArray(bbox), { + interactive: false, + color: "blue", + fillOpacity: 0.0, + opacity: 1.0, + }).addTo(map); + + const results = L.layerGroup().addTo(map); + for (const feature of features) { + // Leaflet styling + const defaultStyle = { + color: colorScale(feature.properties["population"]), + weight: 1, + fillOpacity: 0.4, + }; + L.geoJSON(feature, { + style: defaultStyle, + }) + .on({ + mouseover: function (e) { + const layer = e.target; + layer.setStyle({ + weight: 4, + fillOpacity: 0.8, + }); + }, + mouseout: function (e) { + const layer = e.target; + layer.setStyle(defaultStyle); + }, + }) + .bindPopup( + `${feature.properties["population"]} people live in this census block.` + ) + .addTo(results); + } +} +``` + +Voilà! We just fetched data directly from a massive FlatGeobuf file, directly from the client, without a server in between. + +### References + +This notebook was created with help from several resources + +- [FlatGeobuf Leaflet example](https://flatgeobuf.org/examples/leaflet/large.html) ([and its source code](https://github.com/flatgeobuf/flatgeobuf/blob/master/examples/leaflet/large.html)) +- [`@bjornharrtell/streaming-flatgeobuf`](https://observablehq.com/@bjornharrtell/streaming-flatgeobuf) is a useful related resource for an example of a streaming load of FlatGeobuf. +- [`@observablehq/hello-leaflet`](https://observablehq.com/@observablehq/hello-leaflet) for an example of loading and rendering a Leaflet map using Observable. + diff --git a/flatgeobuf/intro.qmd b/flatgeobuf/intro.qmd index eb180bf..7b9ac23 100644 --- a/flatgeobuf/intro.qmd +++ b/flatgeobuf/intro.qmd @@ -46,6 +46,12 @@ FlatGeobuf is a write-only format, and doesn't support appending, as that would FlatGeobuf optionally supports a spatial index at the beginning of the file, which can speed up reading portions of a file based on a spatial query. For more information on how this spatial index works, refer to the [Hilbert R Tree](./hilbert-r-tree.qmd) page. +::: {.callout-note} + +Note that because FlatGeobuf has no internal chunking, the spatial index references _every single object_ in the file. This means that for datasets with many small geometries, like points, the spatial index will be very large as a proportion of the file size. + +::: +