Google Architecture - High Scalability
Information Sources
- Video: Building Large Systems at Google
- Google Lab: The Google File System
- Google Lab: MapReduce: Simplified Data Processing on Large Clusters
- Google Lab: BigTable.
- Video: BigTable: A Distributed Structured Storage System.
- Google Lab: The Chubby Lock Service for Loosely-Coupled Distributed Systems.
- How Google Works by David Carr in Baseline Magazine.
- Google Lab: Interpreting the Data: Parallel Analysis with Sawzall.
- Dare Obasanjo's Notes on the scalability conference.
- BigTable has three different types of servers:
- The Master servers assign tablets to tablet servers. They track where tablets are located and redistribute tasks as needed.
- The Tablet servers process read/write requests for tablets. They split tablets when they exceed size limits (usually 100MB - 200MB). When a tablet server fails, 100 tablet servers each pick up one new tablet and the system recovers.
- The Lock servers form a distributed lock service. Operations like opening a tablet for writing, Master arbitration, and access control checking require mutual exclusion.
- A 1,000-fold increase in computing power can be had for 33 times lower cost if you use a failure-prone infrastructure rather than an infrastructure built on highly reliable components. You must build reliability on top of unreliability for this strategy to work.
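The recovery behavior described above, where a failed tablet server's tablets are spread across many surviving servers, can be sketched roughly as follows. This is only an illustration under stated assumptions, not BigTable's actual algorithm: the function name `reassign_tablets`, the dictionary layout, and the "least-loaded server first" rule are all hypothetical choices made for the example.

```python
def reassign_tablets(assignments, failed_server):
    """Spread a failed server's tablets across the surviving servers.

    assignments: dict mapping server name -> list of tablet ids
                 (a hypothetical stand-in for the Master's tablet map).
    Returns a new dict that no longer contains the failed server.
    """
    survivors = {
        server: list(tablets)
        for server, tablets in assignments.items()
        if server != failed_server
    }
    if not survivors:
        raise RuntimeError("no surviving tablet servers")

    for tablet in assignments.get(failed_server, []):
        # Assumed policy: hand each orphaned tablet to the least-loaded
        # survivor, breaking ties by server name for determinism.
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(tablet)
    return survivors


# One server holding 100 tablets fails; 100 idle servers each pick up one.
cluster = {"ts-failed": [f"tablet-{i}" for i in range(100)]}
for i in range(100):
    cluster[f"ts-{i:03d}"] = []

after = reassign_tablets(cluster, "ts-failed")
```

Because the orphaned tablets are spread one per server, no single survivor absorbs the whole load, which is what lets the system recover quickly.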
Lessons Learned
- Infrastructure can be a competitive advantage. Many companies treat infrastructure as an expense. Each group will use completely different technologies and there will be little planning and commonality in how systems are built. Google thinks of itself as a systems engineering company, which is a very refreshing way to look at building software.
- Spanning multiple data centers is still an unsolved problem. Most websites are in one or at most two data centers. How to fully distribute a website across a set of data centers is, shall we say, tricky.
- Take a look at Hadoop
- An underappreciated advantage of a platform approach is that junior developers can quickly and confidently create robust applications on top of the platform. If every project needs to create the same distributed infrastructure wheel you'll run into difficulty because the people who know how to do this are relatively rare.
- Synergy isn't always crap. By making all parts of a system work together, an improvement in one helps them all. Improve the file system and everyone benefits immediately and transparently. If every project uses a different file system then there's no continual incremental improvement across the entire stack.
- Build self-managing systems that work without having to take the system down. This allows you to more easily rebalance resources across servers, add more capacity dynamically, bring machines offline, and gracefully handle upgrades.
- Create a Darwinian infrastructure. Perform time-consuming operations in parallel and take the winner.
- Don't ignore the Academy.
- Consider compression. Compression is a good option when you have a lot of CPU to throw around and limited IO.
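The "Darwinian infrastructure" lesson above, running the same time-consuming operation in parallel and taking the winner, might look something like this minimal sketch. It uses Python's standard `concurrent.futures` module; the helper names `first_result` and `replica`, and the simulated delays, are assumptions made for illustration and do not come from Google's systems.

```python
import concurrent.futures as cf
import time


def first_result(tasks):
    """Run all callables in parallel; return the first successful result."""
    if not tasks:
        raise ValueError("need at least one task")
    pool = cf.ThreadPoolExecutor(max_workers=len(tasks))
    futures = [pool.submit(task) for task in tasks]
    try:
        # as_completed yields futures in the order they finish.
        for future in cf.as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("all tasks failed")
    finally:
        # Do not block on the losers; they finish in the background.
        pool.shutdown(wait=False)


def replica(name, delay):
    """Hypothetical replica that answers after `delay` seconds."""
    def run():
        time.sleep(delay)
        return name
    return run


winner = first_result([replica("slow", 0.5), replica("fast", 0.01)])
```

The trade-off is extra work in exchange for lower latency: the losing attempts are wasted, which is acceptable when machines are cheap and a slow or failed machine is a normal event.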