TechTalk: Reach Your Destination, Part I
You can’t just draw a circle to determine how close amenities are to a property
In 1944, Harold Samuel laid down the ultimate maxim about property when founding the U.K.’s Land Securities. Only three things matter: Location, location, location.
At GeoPhy we try to quantify what qualifies as a good location as accurately as possible — especially on our commercial real estate Evra platform. We build neighborhood scores by determining how many amenities are nearby, how long is the drive to the nearest grocery store, or kindergarten, or anything.
Real estate is not alone. Spatial analytics has become a major part of many tech companies. A rapidly growing suite of open source tools offer spatial capabilities to almost any database or analytics platform. Simply displaying locations on a map has become a minimum requirement for almost any product.
But doing it for hundreds of thousands of properties and cross-referencing against millions of amenities like restaurants, drug stores, and transportation options is enormously difficult. It’s too many to store one time for recall. It’s also so computationally intense that doing so on the fly almost threatens to grind a web interface to a halt.
At GeoPhy, we are not interested in what is easy. We seek to provide meaningful insights to our users. In this two-part TechTalk series we will give you a brief overview of one of our essential services: Reach, originally developed for our Evra commercial real estate data and analytics platform and now used across GeoPhy’s offerings.
Why not a buffer?
One of the most core concepts that data engineers master is the join operation. It combines different data layers and sources based on a common attribute. For spatial data engineers, this often translates into a “spatial join” – joining attributes that resemble a location. The most basic spatial join is to join a location with any number of target objects using a radius around an object (a buffer). This simple procedure conceptually links together the two data layers based on proximity in the real world.
Using a buffer is fast and easy to perform. However, it does not always make sense in the real world.
When we drive, bike, or walk, infrastructure constrains us. Distances measured as the crow flies are not always very intuitive, especially in dense urban areas or locations with topographic obstacles such as a river or a cliff.
Using a buffer, a spatial data engineer constructs a polygon of a certain radius around an object. This polygon merely acts as a means to an end (intersecting with any number of target objects). It does not represent anything that exists in the real world (except for our crow).
To better represent the catchment area’s reality, GeoPhy developed Reach – a tool to construct polygons by traversing an actual road network. (We also make Reach available as a service via an API.) The resulting polygon more accurately joins spatial data based on real-world proximity.
Components of a catchment area
In order to get a polygon that represents reality – the catchment or service area of a location for a particular mode of transport – we need three steps:
- Build a road network topology
- Traverse the road network
- Compile a concave hull
Before diving into each step, please note that there are many different ways to approach these steps. You can often find existing tools for each one. We have tried many different ways of computing these catchment areas as fast as possible. For our purposes, our current workflow provides the best speed/quality trade-off.
Each step has its own complications, but the overall idea is pretty simple. Let’s dive in.
Building a road (network)
First, we need a road network. In this blog, we focus on driving times by car, but our method supports any mode of transport.
When using linestrings that resemble a road network, we first transform them into a network topology. We connect each edge (road) to another edge, often at the beginning and end of an edge. We split any long edges so that they only have a start and end connecting to any other edge – a move that simplifies everything down the line.
After we have a working network topology, we can theoretically move on to routing. However, we find that additional preprocessing steps make routing a lot faster and more robust. First, we simplify various edges. We do not need full precision on rounded corners or, sometimes, even roundabouts.
Another important preprocessing step removes dead ends and unreachable networks. We find the strongly connected components of the network by traversing everything once. A strongly connected component is a section of the network where every node is reachable from every other node. After finding all the connected components, we discard various edges that are dead ends, and more importantly, all components that are not connected to our main graph.
No data source is perfect. Often various elements of the road network appear to be well connected but are not. Starting a traversal from within such a disconnected component can harm results, limiting their reach to just the disconnected component. Common examples are parking lots, airport service roads, and sometimes even (small) islands.
For the attentive reader: in order to not disconnect whole areas we also include ferry routes as edges so that these islands are connected to the main graph.
So far we have just been preprocessing and building a road network. But to create a catchment for a location we cannot just grab this information from a precomputed cache. There are simply too many combinations. The unfortunate consequence is that we need to traverse the road network in real time whenever a request comes in.
Most of our effort has gone into making this process a scalable and reliant service within our infrastructure. We save the road network as efficiently as possible and heavily optimize the traversal algorithm as well.
To traverse a road network, we implement one big loop. We aptly named our internal service running this computation roadrunner. In order to squeeze out as much performance as possible while having a stable service, we built roadrunner in Rust.
The conceptual idea of roadrunner is quite simple. We start off at any origin of our choosing with a travel time budget (referred to as cost) – for instance, 10 minutes (tracked in seconds). We snap our origin to the nearest road (see image on left above), and from this moment on we start iterating until our budget is spent (center image and image on left).
In the first iteration we simply subtract the cost (in time) based on the speed limit of that road from our budget and push the neighbouring edges onto a stack. In the next iteration we start on each of those neighbouring edges with a now reduced starting cost, and we repeat the process. After our stack is empty (because no new edges were pushed when the travel budget is depleted), we have completed the road traversal.
While iterating, we have been saving each vertex of the encountered edges which will now be used to construct a concave hull. Here’s an example using Douglas on the Isle of Man in the U.K:
Triangles are everywhere
While it seems we are in the home stretch, we encounter a surprisingly tricky part of computer science. We want to draw a polygon around our set of vertices, but it should roughly follow our idea of a border.
For instance, with our example of Douglas, there is a bay next to the town. Our polygon should not include the bay. Ending up with a simple convex hull including the bay after all our work would be underwhelming to say the least. Let’s build a concave hull instead!
There are various alternative ways to build a concave hull from a set of points. We will build ours by first generating a delaunay triangulation. This approach allows us to remove triangles one at a time until we are satisfied with the resulting shape.
Each of our saved points are used to construct a delaunay triangulation. Our road network essentially lives on the edges of the triangulation, which means that we can start removing faces to more closely resemble our desired shape. We compute the circumcircle of each triangle and simply start removing the biggest ones up until an internal heuristic. In practice, the largest polygons live on the edges of our shape where our road traversal has not yet reached.
After we have removed the largest triangles, all that’s left is to union the remainder and call it a catchment area! Our caller is still waiting for our request, so we quickly build our polygon, add it to a cache, and return the shape so that we can pick up any new requests.
In the next part we will walk through some more of the details on building and deploying the roadrunner application.
The series has two parts.
Part 1: We define the concept of catchment areas.
Part 2: We explain how we deploy our application to ensure quick delivery of catchments to end users.