Monday, May 30, 2016

Gurafu - ETA PHONE HOME: HOW UBER ENGINEERS AN EFFICIENT ROUTE



https://eng.uber.com/engineering-an-efficient-route/
https://segmentfault.com/a/1190000005162383
In Uber’s early days, we used a combination of routing engines (including OSRM) to produce an ETA. (We didn’t have in-app navigation at this point, so we only used it for the ETA and map matching to display vehicle locations.)
We called this service “Goldeta”, which was essentially a model that sat on top of the routing engines and made an adjustment to those original estimates using our own historical Uber Data of similar routes in time and space. This solution, which ultimately took into account hundreds of thousands of Uber trips, compared them to the initial routing engine ETA. Goldeta worked better than using any single ETA alone. However, one issue with this approach was the cold start problem: when we launch in new cities we didn’t have enough data to inform an ETA offset (for new cities, our ETA used to be less accurate than older cities for precisely this reason). Also, as we grew we periodically needed to add new features whose development was slowed by having to implement it to an open source solution (OSRM) which wasn’t built from the ground up with a dispatching system in mind.
The whole road network is modeled as a graph. Nodes represent intersections, and edges represent road segments. The edge weights represent a metric of interest: often either the road segment distance or the time take it takes to travel through it. Concepts such as one-way streets, turn restrictions, turn costs, and speed limits are modeled in the graph as well.
Of course that isn’t the only way to model the real world. Some people want to model road segments as nodes, and edges as the transition between one road segment to another. This is called edge-based representation, in contrast to the node-based representation mentioned before. Each representation has its own tradeoffs, so it’s important to know what you need before committing to one or the other.
Once you have decided on the data structure, you can use different routing algorithms to find a route. One simple example you can try at home is the Dijkstra’s search algorithm, which has become the foundation for most modern routing algorithms today. However, in a production environment, Dijkstra, or any other algorithm that works on top of an unprocessed graph, is usually too slow.
OSRM is based on contraction hierarchies. Systems based on contraction hierarchies achieve fast performance — taking just a few milliseconds to compute a route — by preprocessing the routing graph. (Below 100 milliseconds in the 99th percentile response time. We need this because this calculation is done every time before a vehicle is dispatched to a ride request.) But because the preprocessing step is very slow, it’s very difficult to make real-time traffic work. With our data, it takes roughly 12 hours to build the contracted graph using all the roads of the world, meaning we can never take into account up-to-date traffic information.
This is the reason some preprocessing and tweaking are often needed to speed up querying. (Recent examples of this class of algorithms in the literature include highway hierarchiesthe ALT-algorithm, and customizable route planning.)

1. Making Contraction Hierarchies Dynamic

In our routing graph, as new traffic information comes in, we need to be able to dynamically update those graph edge weights. As we mentioned, we used OSRM which uses contraction hierarchies as its routing algorithm. One drawback of contraction hierarchies is when edge weights are updated, the pre-processing step needs to be re-run for the whole graph, which could take a few hours for a big graph, for example one that covers the whole planet. This makes contraction hierarchies unsuitable for real-time traffic updates.
Contraction hierarchies’ preprocessing step can be significantly faster by doing dynamic updates, where the ordering of the nodes remains the same and only the edges that change due to traffic are updated. This decreases the precomputation significantly. With this approach, it takes just 10 minutes to do a dynamic update of the world graph even with 10% of the road segments changing traffic speed. However, 10 minutes is still too much of a delay for the traffic update to finally be considered for ETAs. So, this path was ultimately a dead end.

2. Sharding

We can also break up the graphs into geographic regions, aka sharding. Sharding speeds up the time it takes to build the contracted graph. However, this requires quite a bit of engineering on our infrastructure for it to work, and would introduce bottlenecks in the cluster size of servers for each region. If one region received too many requests during peak hours, the other servers couldn’t share the load. We wanted to make the most out of the limited numbers of servers we had, so we did not implement this solution either.

3. A* Algorithm

For real-time updates and in a small scale, we tried the A* search algorithm. At a high level, A* is Dijkstra’s search algorithm with heuristics, so A* prioritizes whichever nodes are most likely to find a route from A to B. This means we can update the edge weights of the graph in real-time to account for traffic conditions without needing to do any precomputation. And since most routes we need to calculate are for short trips (the en route time from drivers to riders), A* works well in those situations.
But we knew A* was a temporary solution because it’s really slow for long routes: the A* response grows geometrically in relation to the depth of nodes traversed. (For example, the response time for a route from the Presidio to the Mission District in San Francisco is ~120 milliseconds, several times longer than contraction hierarchies.)
Even A* with landmarks, which makes use of the triangle inequality and several pre-computation tricks, does not increase the time of A* traversal enough to make it a viable solution.
For long distance trips, A* simply doesn’t have a quick enough response time, so we end up falling back to using a contracted graph that doesn’t have the dynamic edge weights to begin with.
We needed the best of both worlds: we needed precomputation to make it fast, and the ability to update edge weights quickly enough to support real-time traffic. Our solution effectively handles real-time traffic updates by re-running the preprocessing step only for a small part of the whole graph. Since our solution divides the graph into layers of small cells which are relatively independent of each other, the preprocessing step is run in parallel when needed to make it even faster. (To find out more, click here!)

 The new system is based on Gurafu, our new routing engine, and Flux, Uber’s first historical traffic system based on GPS data we collect from partner phones. We used our in-house ETA system primarily for pickups, but we’ve also been tracking ETA accuracy for full trip length ETAs.

Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts