How Eqolines Used H3 and Databricks to Analyse Ride-sharing Geospatial Data

Why Geospatial Intelligence Matters in Ride-Sharing

Organisations worldwide that move people and goods are looking for ways to be more efficient. Travellers and riders want more availability and shorter travel times, while delivery services want more predictability. Location intelligence, built on geospatial and routing-based insights, is therefore imperative for any delivery, transportation, or service business. Databricks recently announced built-in H3 expressions, and H3 provides a great approach for processing mobility data.

In this blog, our analytics team, which specialises in helping customers tackle complex geospatial mobility challenges, shares its work analysing how people and services move through the world using location-based data.

 

Use Case: Ride-Sharing Analysis in London

Here, we examine ride-share movement data for one of our clients in London. London's ride-share and taxi markets are highly competitive, and this customer aims to grow market share by using insights from location data. A ride-sharing business succeeds by achieving several key goals: reducing rider wait times, increasing driver engagement, and expanding into high-priority regions. Many factors influence these outcomes, including driver placement and behaviour, retention versus attrition, and real-time supply and demand dynamics.

 

Understanding the Business Challenge

To help our client, we started looking for opportunities to use a driver's time more effectively. What are drivers doing when they do not have an active fare? Why do drivers wait in airport queues for so long, and return to them so frequently? Making drivers more productive would improve customer perception of the service and help retain drivers.

 

Analysing Driver Behaviour and Idle Time

For this project, we focused on approximately 2,000 drivers around London and surrounding areas. We conducted two types of analysis: behavioural and geospatial. The first focused on understanding driver habits—such as idle time, breaks, and preferences like accepting jobs on the way home. The second looked at unfulfilled demand and how closely idle drivers were positioned to those unmet requests. These analyses helped us assess missed opportunities. We were able to categorise how poor geographic positioning during idle time impacted overall efficiency.

Data Modeling and Geospatial Insights

The challenges we faced included aggregating and classifying billions of geolocations from pick-up requests and driver pings. We collected data from the driver app and used it to build a time-series model. This model scored each driver's performance in 15-minute intervals, across 24 hours, for every day of the week. Using a set of rules based on GPS ping data and location tracking, we identified whether a driver was working, on break, or carrying a passenger.
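
To make this concrete, here is a minimal PySpark sketch of the interval bucketing, assuming a hypothetical driver_pings table with driver_id, ping_ts, and app_status columns:

```python
from pyspark.sql import functions as F

# Hypothetical schema: driver_pings(driver_id, ping_ts, app_status, lat, lon)
pings = spark.table("driver_pings")

# Bucket each ping into a 15-minute interval, so every driver-day
# resolves to 96 scoring slots (24 hours x 4 intervals per hour).
interval_activity = (
    pings
    .withColumn("slot", F.window("ping_ts", "15 minutes"))
    .groupBy("driver_id", F.col("slot.start").alias("slot_start"), "app_status")
    .agg(F.count("*").alias("pings_in_status"))
)
```

The later classification rules then operate on these per-slot counts.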

Over time—across hours, days, and months—we built a comprehensive foundation to evaluate each driver’s relative efficiency. As expected with log data at this scale, we encountered data quality challenges. These included inaccurate ping locations and bots using the app to scrape pricing data, both of which required filtering and cleanup.

 

Data Volume and Ingestion with Databricks Delta Lake

We worked with the customer to break down the data requirements into the blocks shown below. Around 1.4 billion rows, covering a year's worth of activity, were rapidly ingested into Delta Lake tables with the help of Databricks. We had previously built a data-profiling solution on Databricks that creates 30+ data points to assess structural and data-related information, allowing engineering teams to make quick decisions about data ingestion, target data models, and quality issues.

Data Building Blocks

 

Three main data quality issues needed to be addressed during data ingestion and transformation: 

  1. Geolocation measurements taken at a fixed frequency over a long period inevitably contain anomalies. We addressed this by filtering out ping-level anomalies using administrative boundary data. 
  2. Bots impersonating customers made numerous pick-up requests in order to scrape pricing. We removed these records by analysing the bots' behaviour, frequency cycles, and route-request patterns. 
  3. Many customers explore a trip over a 5-to-15-minute period, checking alternative car types until satisfied, or lose the booking to another provider. Without additional detail, the raw data cannot identify true demand, and every browse risks being counted as a separate request; we sessionised requests to avoid double counting (see the sketch after this list). 
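
As an illustration of how the double counting can be handled, the sketch below sessionises repeated requests from the same user and origin cell within a 15-minute window; the pickup_requests table and its columns are hypothetical:

```python
from pyspark.sql import functions as F, Window

# Hypothetical schema: pickup_requests(user_id, request_ts, origin_h3)
requests = spark.table("pickup_requests")

w = Window.partitionBy("user_id", "origin_h3").orderBy("request_ts")

demand = (
    requests
    # A new session starts when more than 15 minutes have passed since the
    # previous request from the same user in the same origin cell.
    .withColumn("prev_ts", F.lag("request_ts").over(w))
    .withColumn(
        "new_session",
        (F.col("prev_ts").isNull()
         | (F.col("request_ts").cast("long") - F.col("prev_ts").cast("long") > 900)
        ).cast("int"),
    )
    .withColumn("session_id", F.sum("new_session").over(w))
    # One demand event per browse session, however many quotes were checked.
    .groupBy("user_id", "origin_h3", "session_id")
    .agg(F.min("request_ts").alias("demand_ts"))
)
```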

 

Our Self-Service Analytics Approach

We wanted to provide end users and key decision-makers with a self-service solution. The resulting solution lets users analyse unfulfilled demand versus idle drivers, and the scale of missed opportunity, at any point in the day (to see peaks) and at any time of the year (to see seasonality). It also lets users examine an individual driver's geographic performance and behaviour patterns a year at a time.

 

Driver Behaviour Modeling and Classification

We used Databricks to explore subsets of the original data and build pipelines that rearranged them into more usable structures. To support filtering at the user end, we needed to densify the time-series data for each driver. Densifying eliminates the impact of periods of inactivity, when no data is provided, and lets us account for the complete 24 hours of a driver's day.
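
A minimal sketch of this densification, reusing the interval_activity frame from the earlier snippet and an illustrative single day:

```python
from pyspark.sql import functions as F

# Full 24-hour grid of 15-minute slots for one (illustrative) day.
slots = spark.sql("""
    SELECT explode(sequence(
        to_timestamp('2022-06-01 00:00:00'),
        to_timestamp('2022-06-01 23:45:00'),
        interval 15 minutes)) AS slot_start
""")

drivers = interval_activity.select("driver_id").distinct()

# Left-joining observed activity onto the grid turns gaps in the pings
# into explicit 'INACTIVE' slots, covering the driver's complete day.
dense = (
    drivers.crossJoin(slots)
    .join(interval_activity, ["driver_id", "slot_start"], "left")
    .withColumn("app_status", F.coalesce("app_status", F.lit("INACTIVE")))
)
```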

We then created a classification of the drivers' times based on the GPS ping frequency and app working statuses.

Classification enables our client to filter drivers' time by working status and productivity. We created 15 different classification codes. To be assigned a code, a driver must be in that category for 95% of a 15-minute period; this threshold reduced noise and skew caused by rapidly changing statuses.
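
A sketch of how such a threshold rule can be applied per slot; the 15 production codes are simplified here to the dominant app status plus an 'UNCLASSIFIED' fallback:

```python
from pyspark.sql import functions as F, Window

# Share of pings that each status accounts for within a driver's slot.
w = Window.partitionBy("driver_id", "slot_start")

classified = (
    interval_activity
    .withColumn("slot_total", F.sum("pings_in_status").over(w))
    .withColumn("status_share", F.col("pings_in_status") / F.col("slot_total"))
    # Keep only the dominant status per slot...
    .withColumn("rank", F.row_number().over(w.orderBy(F.desc("status_share"))))
    .filter("rank = 1")
    # ...and apply the 95% threshold; slots with rapidly changing statuses
    # fall back to an unclassified bucket rather than adding noise.
    .withColumn(
        "slot_class",
        F.when(F.col("status_share") >= 0.95, F.col("app_status"))
         .otherwise(F.lit("UNCLASSIFIED")),
    )
)
```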

Classifying driver status and location enables teams to visualise a driver’s journey over the course of a day.

We calculated the number of jobs each driver completed in a day, making it easier to identify trends. The graph below shows the number of jobs completed over a 24-hour period, with the orange line representing the median.

 

Data Quality Checks and Profiling on Delta Lake

While processing data on the Delta Lake, we performed data profiling activities: identifying data structures, column orders, and data types; counting null records per column; detecting special characters and outliers; and automatically partitioning the Delta Lake tables for faster processing and querying. More than thirty data quality checks were run on the data.
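
A simplified sketch of this kind of profiling pass; the production solution computes 30+ data points, of which only null and distinct counts per column are shown here:

```python
from functools import reduce
from pyspark.sql import functions as F

def profile(df):
    """Collect per-column profiling metrics into a single small frame."""
    per_column = [
        df.agg(
            F.sum(F.col(c).isNull().cast("int")).alias("null_count"),
            F.countDistinct(c).alias("distinct_count"),
        )
        .withColumn("column", F.lit(c))
        .withColumn("dtype", F.lit(t))
        for c, t in df.dtypes
    ]
    return reduce(lambda a, b: a.unionByName(b), per_column)

profile(spark.table("driver_pings")).show()
```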

 

Geospatial Data Engineering and Analysis with H3

The data was subjected to a number of analyses and transformations. The most important component of the solution was to spatially aggregate all driver ping data and customer pick-up orders, so that unmet demand, idle driver supply, and proximity to an opportunity could be detected and displayed quickly.

Databricks’ built-in H3 expressions make it easy for geospatial data engineers and data scientists to aggregate and visualise geolocation data at scale. We indexed all driver and customer data to H3 resolution 8, where each cell covers an area of approximately 0.7 km². This means all driver pings or pick-up requests within the same 0.7 km² hexagon fall under a unique H3 index, which can be used for exploration, analysis, engineering, and visualisation.
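
A minimal example of the indexing step, using the documented h3_longlatash3string expression (available on Databricks Runtime 11.2+ with Photon); table and column names are illustrative:

```python
# Index every ping to an H3 resolution-8 cell (roughly 0.7 km² each).
indexed = spark.sql("""
    SELECT
        driver_id,
        ping_ts,
        h3_longlatash3string(lon, lat, 8) AS h3_cell
    FROM driver_pings
""")

# Aggregating per cell gives the counts used for exploration and mapping.
cell_counts = indexed.groupBy("h3_cell").count()
```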

For more information on what H3 is and how to use it for geospatial analysis, please refer to the documentation [AWS | ADB | GCP]. 

H3 for geospatial analysis, London map

H3 is effective when used for discrete binning. However, many organisations operate within geographical boundaries and typically need the ability to filter by geography (like a neighbourhood or city).

To support this, we ingested OSM administrative boundary data into the Databricks Delta Lake. The boundaries were joined using the centre point (lat/long) of each H3 cell at resolution 8. To achieve this, we used the h3_centeraswkt and h3_longlatash3string functions available on Databricks Photon runtimes, together with spatial functions such as st_geomfromwkt, st_astext, and st_intersects from the Databricks Labs project Mosaic. After preparing the spatial data, we merged the taxi dataset with the boundary data, allowing analysis at both county and city levels, depending on the requirements of the project.
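
A sketch of the boundary join, assuming Mosaic is installed and enabled (which registers the st_* expressions) and that the OSM boundaries are stored with their geometries as WKT; table and column names are illustrative:

```python
import mosaic as mos

# Registers Mosaic's spatial SQL expressions (st_geomfromwkt, st_intersects, ...).
mos.enable_mosaic(spark, dbutils)

cells_with_boundaries = spark.sql("""
    SELECT c.h3_cell, b.boundary_name
    FROM (
        SELECT DISTINCT
            h3_cell,
            -- Centre point of each resolution-8 cell, as a geometry.
            st_geomfromwkt(h3_centeraswkt(h3_cell)) AS cell_centre
        FROM driver_pings_indexed
    ) c
    JOIN osm_boundaries b
      ON st_intersects(st_geomfromwkt(b.boundary_wkt), c.cell_centre)
""")
```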

We used kepler.gl to visualise the H3-based analysis. The purpose of these maps was to see the demand at any given point in time. The darker the hexagon, the greater the demand.
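
A minimal sketch of such a map using the keplergl notebook widget, reusing the cell_counts frame from the indexing example; kepler.gl picks up H3 cells when the ids are supplied as strings:

```python
from keplergl import KeplerGl

# Small aggregates travel comfortably to the driver as a pandas frame.
demand_pdf = cell_counts.toPandas().rename(columns={"h3_cell": "hex_id"})

m = KeplerGl(height=600)
m.add_data(data=demand_pdf, name="demand_by_cell")
m  # renders the interactive hexagon map inline in a notebook
```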

 

Powering Tableau Self-Service with Databricks SQL Warehouse  

For our client, we used Tableau to build self-service data products powered by Databricks SQL Warehouse. The ability to query large-scale data live from Databricks, combined with Tableau's ability to visually engineer insights that answer the most complex questions, has benefits beyond the scope of this blog.

We were able to query Delta tables with hundreds of thousands of rows live from Tableau in seconds.
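
Tableau connects through its native Databricks connector, but the same SQL Warehouse can be queried programmatically as well; a sketch using the databricks-sql-connector package, with placeholder endpoint details and an illustrative driver_slots table:

```python
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        # The same live query a Tableau worksheet would issue.
        cur.execute("""
            SELECT h3_cell, count(*) AS idle_drivers
            FROM driver_slots
            WHERE slot_class = 'IDLE'
            GROUP BY h3_cell
        """)
        rows = cur.fetchall()
```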

Demand v Empty drivers, geospatial data

The visualisation below shows the H3 cells displayed in Tableau using the live SQL endpoint. The map highlights the key insight of unfulfilled demand versus idle driver supply: the greater the density shown by the heatmap, the more idle supply is available in that area, allowing business users to easily identify areas of excess supply. The complementary charts are shown at a granularity of 15 minutes, making it easy to see trends in demand versus actual bookings, total resources, and idle resources. The heatmap also shows hotspots around airports and major railway stations, which mirrored our intuition.

Driver Idle time

 

The next dashboard allows analysis of a single driver over a month, showing working behaviour and where, geographically, the driver spends time relative to demand. The Gantt chart at the bottom tracks how a driver spent working hours at specific times of interest. This insight allows our client to work at a micro level and educate new drivers on geographical demand areas at different times of day.

 

Beyond Ride-Sharing: Location Data for other sectors

This blog has focused on ride-sharing use cases, but the solution also applies across other sectors.

For instance, in emergency response situations, when someone reports an incident and requests immediate help, it's crucial that the nearest deployable resource reaches the scene quickly.

With this in mind, using our Databricks-powered solution, teams can visualise both the incident location and available response units on the same map. Just as ride-hailing companies aim to position drivers for maximum efficiency, emergency responders also need their resources strategically located. This approach allows teams to improve response times and deliver timely assistance exactly where it's needed.

Moreover, although incidents can occur anytime and anywhere, predictive analytics helps track patterns in accidents and crimes. When certain areas show consistently high call volumes during specific time periods, teams can capture this data and proactively share it with response control rooms, enabling faster and more efficient resource deployment.

In addition, given the rise in fast-moving consumer goods (FMCG) and delivery services from all retailers (particularly in the food sector), there is a clear need for location intelligence. We analyse order patterns by location and time to improve delivery efficiency, and teams must determine how many drivers they need at specific times, especially during peak demand. By collecting large volumes of data using Databricks and visualising it in Tableau, teams can achieve real-time situational awareness.


There are many more use cases to consider.

 

In Summary: Unlocking the Power of Geospatial Analytics

Our customer was able to unlock the power of their geospatial data through an intuitive, effective, and scalable solution, thanks to the ability to geographically aggregate and query data at scale using Databricks.

Using this solution, decision-makers can be proactive rather than reactive. They can quickly identify geographic areas with unfulfilled demand and idle supply. By joining the two datasets, teams can minimise missed opportunities. This also improves customer experience by reducing ETAs and boosting driver revenue. Exploratory data science and question-driven visualisations help teams pinpoint where demand exceeds supply at any given time.

Teams can also analyse the behaviour of top-performing drivers and share those insights with new drivers to improve overall performance. With a real-time approach, drivers will no longer have to second-guess demand: we can nurture and guide drivers to travel to specific locations at optimal times, based on predictive demand surfaced through the driver app. In the next phase of the project, we plan to explore streaming data capabilities in Databricks.

In addition, analytics insights don’t have to stay with resource managers alone. By sharing this information with drivers, we enable them to make smarter choices about when to be available for work and where to position themselves to secure more jobs.

To learn more about the code used to classify, explore, and visualise geospatial data in Databricks, refer to these notebooks: 1-Classification, 2-Data-Exploration, and 3-Kepler-H3-Visualization.

About the Authors

Kent Marten
Geospatial Staff Product Manager at Databricks
Kent is a Staff Product Manager at Databricks. In this role, he is responsible for crafting the vision and roadmap for all things geospatial at Databricks.

Lenka Hasova
PhD, Geospatial Data Science and Research Consultant

Mark Balcer
Lead Consultant