How Twitter Stores 250 Million Tweets a Day Using MySQL - High Scalability -
Cassandra is used for high velocity writes, and lower velocity reads.
The advantage is Cassandra can run on cheaper hardware than MySQL, it can expand easier, and they like schemaless design.
Hadoop is used to process unstructured and large datasets, hundreds of billions of rows.
Vertica is being used for analytics and large aggregations and joins so they don't have to write MapReduce jobs.
Some Other Ideas:
Loose coupling. Whenever something breaks they realize it wasn't loosely coupled enough and something pushed back way to hard on something else.
Soft launches. Decouple pushing a release out and driving new traffic to something. Features can be turned on and off.
Open source.
His idea is as a developer, open sourcing code, means you can think about your software knowing it won't go away. It's not just another throw away piece of software inside a corporate blackhole. That gives you permission to think of corporate software development in a different, more lasting way, more meaningful way.
Database sharding
http://highscalability.com/unorthodox-approach-database-design-coming-shard
Rebalancing data.
http://highscalability.com/blog/category/shard
Read full article from How Twitter Stores 250 Million Tweets a Day Using MySQL - High Scalability -
Twitter's original tweet store:
- Temporally sharded tweets was a good-idea-at-the-time architecture. Temporal sharding simply means tweets from the same date range are stored together on the same shard.
- The problem is tweets filled up one machine, then a second, and then a third. You end up filling up one machine after another.
- This is a pretty common approach and one that has some real flaws:
- Load balancing. Most of the old machines didn't get any traffic because people are interested in what is happening now, especially with Twitter.
- Expensive. They filled up one machine, with all its replication slaves, every three weeks, which is an expensive setup.
- Logistically complicated. Building a whole new cluster every three weeks is a pain for the DBA team.
Cassandra is used for high velocity writes, and lower velocity reads.
The advantage is Cassandra can run on cheaper hardware than MySQL, it can expand easier, and they like schemaless design.
Hadoop is used to process unstructured and large datasets, hundreds of billions of rows.
Vertica is being used for analytics and large aggregations and joins so they don't have to write MapReduce jobs.
Some Other Ideas:
Loose coupling. Whenever something breaks they realize it wasn't loosely coupled enough and something pushed back way to hard on something else.
Soft launches. Decouple pushing a release out and driving new traffic to something. Features can be turned on and off.
Open source.
His idea is as a developer, open sourcing code, means you can think about your software knowing it won't go away. It's not just another throw away piece of software inside a corporate blackhole. That gives you permission to think of corporate software development in a different, more lasting way, more meaningful way.
Database sharding
http://highscalability.com/unorthodox-approach-database-design-coming-shard
Rebalancing data.
http://highscalability.com/blog/category/shard
Read full article from How Twitter Stores 250 Million Tweets a Day Using MySQL - High Scalability -