Massive Technical Interviews Tips: Google Map Architecture

http://all-things-spatial.blogspot.com/2009/06/ingenuity-of-google-map-architecture.html
The traditional way to publish maps over the Internet was to use specialised mapping servers to render GIS data into a simple image and then serve that single image to a browser for display. A new map was generated every time a user panned to a different location or changed the zoom level (since there is an infinite number of combinations of map extents and zoom levels, it was impossible to generate those maps in advance). Those maps were also always less than perfect, because every display rule had to be predefined for the entire map and for every possible zoom level. For very detailed maps, the complexity of defining those rules was very often beyond the capacity of map creators.

Google opted for a solution where maps could be produced in advance and served as small tiles to be assembled into one big image at the user's end. The advantages of this approach are consistency of appearance and graphical quality of the map and, probably more important, the enormous scalability that can be achieved. There is no need for server-side processing to generate maps, and individual map tiles are much smaller than the whole map presented to the user, so they can be delivered and displayed much faster. The trade-off was a big effort up front to generate nice-looking maps and the need to fix zoom levels rather than allow a continuous zoom, as is the case with the traditional approach.
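The size of that up-front effort can be estimated with a little arithmetic. The sketch below assumes the common quadtree tiling scheme, in which zoom level 0 is a single tile and each further zoom level splits every tile into four; the excerpt does not state the scheme, so treat this as an illustration rather than Google's actual numbers. The function names are hypothetical.

```python
def tiles_at_zoom(zoom: int) -> int:
    """Tiles needed to cover the world at one zoom level (a 2^z by 2^z grid)."""
    return 4 ** zoom


def tiles_up_to_zoom(max_zoom: int) -> int:
    """Total tiles for zoom levels 0..max_zoom (sum of a geometric series)."""
    return (4 ** (max_zoom + 1) - 1) // 3


if __name__ == "__main__":
    for z in (0, 5, 10, 15, 20):
        print(f"zoom {z:2d}: {tiles_at_zoom(z):>16,} tiles, "
              f"{tiles_up_to_zoom(z):>16,} cumulative")
```

Because the count quadruples with each level, the deepest zoom levels dominate the storage and rendering cost, which is one reason pre-rendering versus on-the-fly rendering is a real design choice.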
Limits on Dynamic Content

https://www.quora.com/Design-a-server-architecture-for-serving-Google-maps-images
In this case, you can talk about the need for scalability and elasticity in the serving platform. You could compare the benefits and drawbacks of preprocessing tiles, or rendering them on-the-fly.

It would also be prudent to discuss how you would design a system that scales the tile size based on the type of device making requests (e.g. 256px x 256px tiles may load well on a laptop/desktop, but might make for a choppy experience on mobile).

How Traffic Sensors Work
Three above-ground types have become more common in recent years: radar, active infrared and laser radar. Radar traffic sensors project a measurable area of microwave energy that is reflected back to the device when a vehicle passes through it. Active infrared and laser radar sensors operate in a similar manner, using low-power infrared energy and infrared beams to form detection areas. In all three types of devices, the time it takes for the energy to bounce back to the sensor is compared to data collected in an unobstructed field to determine the size and speed of the vehicle passing through it.
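The time-of-flight comparison described above can be sketched as follows. This is a simplified physical model, not a description of any particular sensor: it assumes the energy travels at the speed of light, that the sensor looks straight down at the road, and that speed is derived from two detection zones a known distance apart (that last part is an assumption; the excerpt does not say how speed is derived). All names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(seconds: float) -> float:
    """Distance to the reflecting surface; the signal travels there and back."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2.0


def object_height(baseline_seconds: float, measured_seconds: float) -> float:
    """Height of the object above the road.

    baseline_seconds is the round-trip time over the unobstructed road;
    a vehicle shortens the round trip because its roof is closer to the sensor.
    """
    return range_from_round_trip(baseline_seconds) - range_from_round_trip(measured_seconds)


def speed_between_zones(gap_m: float, t_first_s: float, t_second_s: float) -> float:
    """Speed in m/s from the times a vehicle enters two zones gap_m apart."""
    return gap_m / (t_second_s - t_first_s)


if __name__ == "__main__":
    baseline = 12.0 / SPEED_OF_LIGHT_M_PER_S   # sensor mounted 6 m above the road
    measured = 9.0 / SPEED_OF_LIGHT_M_PER_S    # reflection from 4.5 m away
    print("height (m):", object_height(baseline, measured))
    print("speed (m/s):", speed_between_zones(5.0, 0.0, 0.25))
```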
Sweating the Small Stuff
Covering smaller rural and neighborhood streets required a different approach:
To accomplish this, Google turned toward the very people it was gathering the information for: its customers. GPS-enabled cell phones running the Google Maps application continually pass along each user’s location and speed to Google in real time. Using a technique known as “crowdsourcing,” Google combines the information provided by thousands of active cell phones to determine how swiftly traffic is moving through a given location.

https://www.ncta.com/platform/broadband-internet/how-google-tracks-traffic/

http://prismoskills.appspot.com/lessons/System_Design_and_Big_Data/Chapter_07_-_Designing_Google_Maps.jsp

Rendering
http://stackoverflow.com/questions/204644/how-does-google-maps-work

The technology could generically be described as a map server. The map server generates a map for the requested location from a large set of pre-generated map tile images covering the entire planet. The map server may overlay data from other databases on top of this. The combination of a map viewer client and geographical database is traditionally called a Geographical Information System (GIS).

As stated, Google generated all of these 256x256 tiles and is just serving the relevant tiles.

The JavaScript code on the page and the server code use the numbers in the link to work out the location of the map you are viewing, the zoom level, and the size of your viewing window, and from those the tiles to send to your browser.
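To make the mapping from location and zoom level to a tile concrete, here is a minimal sketch. It assumes the widely used Web Mercator tiling scheme, where zoom level z divides the world into a 2^z by 2^z grid with tile (0, 0) in the north-west corner; the excerpts above do not spell out the formula, so this is an illustration and the function name is hypothetical.

```python
import math


def latlng_to_tile(lat_deg: float, lng_deg: float, zoom: int) -> tuple:
    """Return the (x, y) index of the tile containing a point at a zoom level."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lng_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    # Mercator projection of the latitude, rescaled to the range 0..n
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edges still map to a valid tile
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


if __name__ == "__main__":
    # Example coordinates chosen only for illustration
    print(latlng_to_tile(37.422, -122.084, 12))
```

The projection is only defined up to roughly 85 degrees of latitude in either direction, which is why the result is clamped instead of trusted at the poles.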

Google Maps and Google Earth use something known as KML, or "Keyhole Markup Language", which is an XML-based format for describing geographic data.

http://stackoverflow.com/questions/1204258/how-does-google-maps-render-the-map-etc-is-it-flash-a-java-applet

In a bit more detail: Google Maps uses a big div element to contain several img elements. Each of those img elements is 256 pixels square and is positioned on a regular grid. From there, the Google Maps JavaScript program calculates which grid images should be loaded into each img tag and uses regular DOM manipulation to position each img in the right place. Only the tiles of the map that would be visible inside the div are loaded. When you scroll off the side, the JavaScript library unloads the image and loads new ones as needed. Other elements, like the zoom controls, markers, and lines, are stacked or drawn on top of that as needed.
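The grid calculation described above can be sketched as follows. The real logic runs in JavaScript in the browser; Python is used here only to show the arithmetic. The sketch assumes the map centre is already expressed in world pixel coordinates at the current zoom level and that the world wraps horizontally; the function and field names are hypothetical.

```python
import math

TILE_SIZE = 256  # pixels per tile side, as stated in the excerpt


def visible_tiles(center_x: float, center_y: float,
                  viewport_w: int, viewport_h: int, zoom: int) -> list:
    """List the tiles that intersect the viewport and where to place each one.

    "left" and "top" are pixel offsets relative to the viewport's top-left
    corner, i.e. the values a script would assign to each img element.
    """
    n = 2 ** zoom
    left = center_x - viewport_w / 2
    top = center_y - viewport_h / 2
    first_col = math.floor(left / TILE_SIZE)
    last_col = math.floor((left + viewport_w - 1) / TILE_SIZE)
    first_row = max(0, math.floor(top / TILE_SIZE))
    last_row = min(n - 1, math.floor((top + viewport_h - 1) / TILE_SIZE))

    tiles = []
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            tiles.append({
                "x": col % n,                  # wrap around the antimeridian
                "y": row,
                "left": col * TILE_SIZE - left,
                "top": row * TILE_SIZE - top,
            })
    return tiles


if __name__ == "__main__":
    for tile in visible_tiles(512, 512, 300, 300, zoom=2):
        print(tile)
```

When the user pans, only the centre changes; tiles whose indices drop out of the computed range can be unloaded and newly included ones requested.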

http://gis.stackexchange.com/questions/64248/google-maps-android-vector-rendering-how-do-they-make-it-that-fast

https://plus.google.com/+EvanParker/posts/cUYRzsn5nyN
http://alistapart.com/article/takecontrolofyourmaps

Video:
Google I/O 2013 - Behind the scenes of Google Maps

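Road Segment Speed System
Whether a particular stretch of a particular highway is congested in a particular direction right now is something every driver on the road wants to know. Here is how I would design such a system.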

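The signal should come from a phone app. Rewarding phone owners for taking part will certainly cost less than installing hardware in vehicles.
The app should take a sample every 10 seconds, recording the geolocation (I only know of latitude and longitude; anyone with domain expertise is welcome to elaborate).
The app then holds both the position from 10 seconds ago and the current position, so it can quickly compute the owner's speed.
A speed of zero usually means one of two things: most often the owner is in an office; otherwise the vehicle is parked for a break, as with a taxi driver.
By calling a server API, the app can learn whether the server is interested in its current location. The server must therefore expose an API that takes a longitude and latitude and returns true/false.
If the server returns false, the app should stop sending locations. Once the speed exceeds 10 mph, however, it should report its location continuously. When the speed drops to 0 mph, it should call the server API again (complete stops happen both on highways and at traffic lights on ordinary roads).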
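The client-side steps above can be sketched as follows. The post does not say how distance is computed from two latitude/longitude fixes; the haversine great-circle formula used here is one common choice and is an assumption, as are the function names and the simplified reporting rule.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius
SAMPLE_INTERVAL_S = 10.0       # one fix every 10 seconds, per the post
MPS_TO_MPH = 2.23694


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def speed_mph(prev: tuple, curr: tuple, interval_s: float = SAMPLE_INTERVAL_S) -> float:
    """Average speed between two (lat, lng) fixes taken interval_s apart."""
    return haversine_m(prev[0], prev[1], curr[0], curr[1]) / interval_s * MPS_TO_MPH


def should_report(speed: float, server_interested: bool,
                  report_above_mph: float = 10.0) -> bool:
    """Report when the server asked for this area or the phone is clearly moving."""
    return server_interested or speed > report_above_mph


if __name__ == "__main__":
    prev, curr = (37.0000, -122.0000), (37.0020, -122.0000)
    mph = speed_mph(prev, curr)
    print(round(mph, 1), should_report(mph, server_interested=False))
```

GPS fixes are noisy, so a single 10-second pair can give a misleading speed; this is one reason the server later discards extreme values and averages over a window.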

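Another approach is to submit no matter what and let the server filter out the submissions it is not interested in. (Frankly, this calls for domain knowledge: if deciding whether a geolocation lies on a highway or ordinary road that the system monitors is cheap, this approach should be preferred, since it eliminates one server API.)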

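The information sent to the server should include the following:

The user's ID (so that submissions can be tied to rewards)
The geolocation from 10 seconds ago and the current geolocation (as a pair)
If the phone has no signal, keep at most the last 5 minutes (30 location pairs at one sample every 10 seconds)
The phone's absolute time does not matter; what matters is the relative time.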
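A possible shape for the payload and the offline buffer is sketched below. The class and field names are hypothetical. The buffer size follows from the sampling rate: a 5-minute window at one pair every 10 seconds holds 300 / 10 = 30 pairs.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple

MAX_BUFFERED_PAIRS = 300 // 10  # 5 minutes of samples at one pair per 10 seconds


@dataclass(frozen=True)
class LocationPair:
    user_id: str                 # ties the submission to the reward scheme
    prev: Tuple[float, float]    # (lat, lng) 10 seconds ago
    curr: Tuple[float, float]    # (lat, lng) now
    elapsed_s: float             # relative time between the two fixes


class OfflineBuffer:
    """Keeps only the most recent pairs while the phone has no signal."""

    def __init__(self, max_pairs: int = MAX_BUFFERED_PAIRS) -> None:
        # A bounded deque silently drops the oldest pair when full
        self._pairs = deque(maxlen=max_pairs)

    def add(self, pair: LocationPair) -> None:
        self._pairs.append(pair)

    def drain(self) -> List[LocationPair]:
        """Return everything buffered, oldest first, and empty the buffer."""
        pairs = list(self._pairs)
        self._pairs.clear()
        return pairs


if __name__ == "__main__":
    buf = OfflineBuffer()
    for i in range(40):
        buf.add(LocationPair("user-1", (37.0, -122.0), (37.0, -122.0 + i * 1e-4), 10.0))
    print(len(buf.drain()))
```

Storing the elapsed time with each pair, instead of a wall-clock timestamp, matches the point that only relative time matters and avoids trusting the phone's clock.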

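On the server, the geolocation and map data should be used to determine the road and the direction of travel (eastbound or westbound, say); I am not sure how feasible this is. The result is one speed sample for a particular segment of a particular highway in a particular direction.
With sharding by road segment, software load balancing routes the sample to the Kafka stream for the corresponding segment.
The service that processes the samples in Kafka can discard extreme values, compute the average over a 5-minute window, and send it to the display layer.
Each user's valid sample submissions are recorded and eventually written to a relational database.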
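The processing step can be sketched in memory, without Kafka, as below. The post does not say how extreme values are identified; the rule used here (drop samples more than 50% away from the median) is a placeholder assumption, and the class name and segment key are hypothetical. Samples are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from statistics import median
from typing import Optional

WINDOW_S = 300  # 5-minute window, per the post


class SegmentSpeedAggregator:
    """Sliding-window average speed per (road, segment, direction) key."""

    def __init__(self, window_s: int = WINDOW_S, tolerance: float = 0.5) -> None:
        self.window_s = window_s
        self.tolerance = tolerance            # allowed fraction away from the median
        self._samples = defaultdict(deque)    # key -> deque of (timestamp, mph)

    def add(self, segment, timestamp: float, mph: float) -> None:
        self._samples[segment].append((timestamp, mph))

    def average(self, segment, now: float) -> Optional[float]:
        queue = self._samples[segment]
        # Evict samples that have fallen out of the window
        while queue and queue[0][0] <= now - self.window_s:
            queue.popleft()
        speeds = [mph for _, mph in queue]
        if not speeds:
            return None
        mid = median(speeds)
        kept = [s for s in speeds if abs(s - mid) <= self.tolerance * mid] or speeds
        return sum(kept) / len(kept)


if __name__ == "__main__":
    agg = SegmentSpeedAggregator()
    key = ("I-80", 12, "E")  # hypothetical road, segment number, direction
    for ts, mph in [(0, 60), (10, 62), (20, 58), (30, 5)]:
        agg.add(key, ts, mph)
    print(agg.average(key, now=100))
```

Keying the state by segment is what makes sharding straightforward: each shard only needs the samples for its own segments.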

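The system can then display the 5-minute average speed for a given segment and direction. If there is no data, it can either display nothing or produce an estimate from the historical average for the same day and time slot.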
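The fallback to historical data can be sketched as below. The sketch reads "same day and time slot" as the same day of the week and the same 5-minute slot of the day, which is an interpretation and not something the post states. All names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Optional, Tuple


class HistoricalSpeeds:
    """Running average speed per segment, weekday, and 5-minute slot."""

    def __init__(self) -> None:
        self._totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum, count]

    @staticmethod
    def _bucket(segment, when: datetime):
        return (segment, when.weekday(), when.hour, when.minute // 5)

    def record(self, segment, when: datetime, mph: float) -> None:
        total = self._totals[self._bucket(segment, when)]
        total[0] += mph
        total[1] += 1

    def estimate(self, segment, when: datetime) -> Optional[float]:
        total = self._totals.get(self._bucket(segment, when))
        return total[0] / total[1] if total else None


def speed_to_display(live: Optional[float], history: HistoricalSpeeds,
                     segment, when: datetime) -> Tuple[Optional[float], str]:
    """Prefer the live average; otherwise fall back to history, or show nothing."""
    if live is not None:
        return live, "live"
    estimate = history.estimate(segment, when)
    if estimate is not None:
        return estimate, "estimated"
    return None, "no data"


if __name__ == "__main__":
    history = HistoricalSpeeds()
    history.record("I-80:12:E", datetime(2015, 10, 12, 8, 4), 40.0)
    history.record("I-80:12:E", datetime(2015, 10, 19, 8, 2), 50.0)
    print(speed_to_display(None, history, "I-80:12:E", datetime(2015, 10, 26, 8, 3)))
```

Returning a label alongside the value lets the display layer distinguish measured speeds from estimates.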

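1: Why does the phone need to record historical positions? It has already sent the earlier positions, so the server should have them, and as a sequence. Recent ones can be kept in a cache.
2: Assuming enough people use the app, the geolocation combined with the positions of other nearby users should reveal whether a vehicle is parked or stuck in traffic. If the position is inside a shopping mall, the sample certainly cannot be used; the same goes if everyone else on that stretch is moving fast.
3: Depending on data volume, an appropriate geohash level can be chosen for sharding the computation and the cache.
4: Other data sources can be used at the same time for comparison and fusion.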
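Point 3 can be illustrated with a small geohash encoder. This follows the standard public geohash algorithm (interleaved longitude and latitude bits, base-32 encoded); using a prefix of the hash as the shard key is one possible reading of the suggestion, and the function names are hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode a latitude/longitude point as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = bits * 2 + 1
                lng_lo = mid
            else:
                bits = bits * 2
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def shard_key(lat: float, lng: float, level: int) -> str:
    """Shorter prefixes cover larger cells, so level trades shard count for size."""
    return geohash_encode(lat, lng, precision=level)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5), shard_key(57.64911, 10.40744, 3))
```

A geohash cell only narrows a point down to a rectangle, so matching the point to a specific road segment within that cell is still a separate step.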

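Phones do drop offline, or have data temporarily turned off because of high usage, so there will be periods when a phone cannot send.
Combining the readings for the same segment is indeed possible; in the original post I mentioned "discard extreme values" and "average".
How to determine that a user is on a particular road is a point I have not thought through. I do not know whether geohash is any better: the segment containing a point still has to be determined, and I am not sure how much geohash would save. If you have details, please elaborate.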

Thursday, October 22, 2015
