Technical details

The extract process

Once a day, around 21:00 CET, we update a locally held version of the planet file from the latest OpenStreetMap data and split it into a number of pre-defined regions. This is done using the Osmosis and Osmium programs (Osmosis to download updates, Osmium to apply them and split). Every couple of months we re-initialise our update process with a new planet file, just to make sure we're not carrying over potential replication errors forever. The splitting is done in a cascading fashion: first we split the world into two halves, then we cut out the continents from each half, then countries, and so on.
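
The cascading order can be pictured as a walk over a tree of regions, where each region is cut out of its parent rather than out of the full planet file. The following is only a minimal sketch of that idea: the region names and the tree itself are hypothetical, and it plans the order of the jobs without running any extract tool.

```python
from collections import deque

# Hypothetical region tree: each entry lists the regions cut out of the key.
REGION_TREE = {
    "planet": ["half-west", "half-east"],
    "half-east": ["europe", "africa"],
    "europe": ["germany", "france"],
    "germany": ["bavaria"],
}


def cascade_plan(tree, root):
    """Return (source, target) extract jobs in breadth-first order.

    Every target is cut from its direct parent, so a parent always
    appears as a target before it is used as a source.
    """
    plan = []
    queue = deque([root])
    while queue:
        source = queue.popleft()
        for target in tree.get(source, []):
            plan.append((source, target))
            queue.append(target)
    return plan


if __name__ == "__main__":
    for source, target in cascade_plan(REGION_TREE, "planet"):
        print(f"cut {target} out of {source}")
```

Cutting from the parent keeps each step's input small: a country extract only has to read its continent file, not the whole planet.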

We use polygonal boundaries for the splitting. These boundaries are sometimes derived and simplified from OSM data, sometimes just hand-drawn. The boundaries usually follow country borders, but occasionally we take liberties and include a little more of a neighbouring country if this greatly simplifies the polygon. The Osmium extract function that we use keeps ways and multipolygon relations that cross an extract border complete, so when a very large multipolygon crosses the border, an extract can occasionally contain a lot more than expected.
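
To see why keeping crossing objects complete can enlarge an extract, here is a toy model in plain Python. It is not how Osmium is implemented; it assumes a rectangular clipping region instead of a polygon, handles only ways (not relations), and uses made-up node and way IDs.

```python
def extract_complete_ways(nodes, ways, bbox):
    """Toy 'complete ways' extract.

    nodes: {node_id: (lon, lat)}
    ways:  {way_id: [node_id, ...]}
    bbox:  (min_lon, min_lat, max_lon, max_lat), a stand-in for the polygon
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    inside = {
        node_id
        for node_id, (lon, lat) in nodes.items()
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    }
    # A way is kept if at least one of its nodes lies inside the region.
    kept_ways = {
        way_id: refs
        for way_id, refs in ways.items()
        if any(ref in inside for ref in refs)
    }
    # Keeping the way complete pulls in all of its nodes, even outside ones.
    kept_nodes = set(inside)
    for refs in kept_ways.values():
        kept_nodes.update(refs)
    return kept_nodes, kept_ways


if __name__ == "__main__":
    nodes = {1: (0.5, 0.5), 2: (0.9, 0.5), 3: (5.0, 0.5), 4: (9.0, 9.0), 5: (9.5, 9.0)}
    ways = {10: [1, 2, 3], 11: [4, 5]}
    kept_nodes, kept_ways = extract_complete_ways(nodes, ways, (0, 0, 1, 1))
    print(sorted(kept_nodes), sorted(kept_ways))
```

In this example node 3 lies far outside the region but still ends up in the result, because way 10 starts inside. A large multipolygon behaves the same way on a much bigger scale.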

Polygon files

The .poly files that you can download reflect the exact clipping boundary that we use in generating the extract, and can be used with programs like Osmosis, Osmium, or Osmconvert to generate the extract from a larger file. The KML files contain the same data, just in a different format. Please note that these files are not country boundaries but a buffer around countries; go to naturalearthdata.com if you want a simple set of country boundaries.
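
If you want to inspect a .poly file yourself, the sketch below reads the commonly documented layout of the Osmosis polygon filter format: a name line, then one or more sections that each start with a section name (prefixed with "!" for a hole), list "longitude latitude" pairs, and close with END, followed by a final END. The sample data and the function name parse_poly are made up for illustration, and the parser does no validation.

```python
def parse_poly(text):
    """Parse polygon filter text into (name, rings).

    Each ring is a dict with keys 'name', 'hole' and 'points',
    where points are (lon, lat) tuples.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0]
    rings = []
    current = None
    for line in lines[1:]:
        if current is None:
            if line == "END":
                break  # final END closes the whole file
            current = {
                "name": line.lstrip("!"),
                "hole": line.startswith("!"),
                "points": [],
            }
        elif line == "END":
            rings.append(current)  # END closes the current section
            current = None
        else:
            lon, lat = map(float, line.split()[:2])
            current["points"].append((lon, lat))
    return name, rings


# Made-up sample: one outer ring and one hole.
SAMPLE = """\
sample_region
1
   10.0   50.0
   12.0   50.0
   12.0   52.0
   10.0   52.0
   10.0   50.0
END
!2
   10.5   50.5
   11.0   50.5
   11.0   51.0
   10.5   50.5
END
END
"""

if __name__ == "__main__":
    region_name, region_rings = parse_poly(SAMPLE)
    for ring in region_rings:
        kind = "hole" if ring["hole"] else "outer"
        print(region_name, ring["name"], kind, len(ring["points"]), "points")
```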

pbf files

The .osm.pbf data format is the common format for the exchange of raw OpenStreetMap data. It is fast to read and write and can be directly processed by most programs dealing with OSM data. Our .osm.pbf files are 100% pure, un-filtered OSM and contain all data and metadata available in OSM for the region; the only thing they don't contain is history, i.e. information about past edits.
We do, however, keep a couple of older files around. They are not usually shown but you can access them through the directory index; they are timestamped in the file name. We delete these older files after a while. If you are on a very slow and/or flaky internet connection, do not download the file named "...-latest"; download the timestamped file instead, so that you can resume the download even if the connection fails.
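
Resuming a download usually means asking the server for only the missing bytes with an HTTP Range header. The sketch below uses Python's standard library for this; the URL is a hypothetical placeholder, and it assumes the server answers range requests with status 206. It does not handle the case where the local file is already complete.

```python
import os
import urllib.request


def resume_request(url, local_path):
    """Build a request that continues after the bytes already on disk."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def download(url, local_path, chunk_size=1 << 20):
    """Download url to local_path, appending if a partial file exists."""
    request, offset = resume_request(url, local_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range header; otherwise start over.
        mode = "ab" if offset and response.status == 206 else "wb"
        with open(local_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)


if __name__ == "__main__":
    # Hypothetical URL; substitute the timestamped file you actually want.
    download("https://download.example.org/region-250101.osm.pbf", "region.osm.pbf")
```

Appending bytes only makes sense while the remote file stays the same, which is what a timestamped file name gives you.
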
The .osh.pbf format is for history files. We keep one history file for each region that is on offer, and that file is only updated weekly, but it contains the full history of an area and can be used to synthesize a data file for the region for any timestamp in the past...
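
One possible way to produce such a snapshot is the time-filter command of the osmium command line tool, which reads a history file and writes the data as it was at a given time. The helper below only builds the command line and does not run it; the file names are hypothetical, and whether this matches any particular workflow is an assumption.

```python
from datetime import datetime, timezone


def time_filter_command(history_file, when, output_file):
    """Build an 'osmium time-filter' command for a snapshot at 'when'.

    'when' should be a timezone-aware datetime; it is converted to UTC
    and formatted as an ISO timestamp ending in 'Z'.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return ["osmium", "time-filter", "-o", output_file, history_file, stamp]


if __name__ == "__main__":
    command = time_filter_command(
        "region.osh.pbf",  # hypothetical history file
        datetime(2015, 1, 1, tzinfo=timezone.utc),
        "region-2015.osm.pbf",  # hypothetical output file
    )
    print(" ".join(command))
    # To actually run it (requires osmium to be installed):
    # import subprocess; subprocess.run(command, check=True)
```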
