Friday, February 5, 2016

Scalable Web Architectures: Common Patterns and Approaches

So what is scalability?
Traffic growth
Dataset growth
Maintainability

Three goals of application architecture:
Scale
HA (high availability)
Performance

Two kinds:
Vertical (get bigger)
Horizontal (get more)

Sometimes vertical scaling is right

Buying a bigger box is quick (ish)
Redesigning software is not

Running out of MySQL performance?
Spend months on data federation
Or just buy a ton more RAM

App Servers
Sessions!
(State)
Local sessions == bad
When they move == quite bad

Centralized sessions == good
No sessions at all == awesome!

Local Sessions
Stored on disk
PHP sessions

Stored in memory
Shared memory block (APC)

Bad!
Can’t move users
Can’t avoid hotspots
Not fault tolerant
If your load balancer has sticky sessions, you can still get hotspots
Depends on volume – fewer heavier users hurt more

Remote centralized sessions
Store in a central database
Or an in-memory cache

No porting around of session data
No need for sticky sessions
No hot spots

Need to be able to scale the data store
But we’ve pushed the issue down the stack
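
One easy centralization: point PHP's session handler at memcached instead of local disk – a sketch, assuming the pecl memcached extension and placeholder hostnames:

ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'mc1:11211,mc2:11211');  // shared by every app server
session_start();
// $_SESSION now lives in the cache tier; any web box can serve any user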

No sessions
Stash it all in a cookie!

Sign it for safety
$data = $user_id . '-' . $user_name;
$time = time();
$sig  = sha1($secret . $time . $data);
$cookie = base64_encode("$sig-$time-$data");

Timestamp means it’s simple to expire it
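
Checking it on the way back in, sketched against the format above ($secret and the max age are whatever you picked):

function verify_cookie($cookie, $secret, $max_age = 3600) {
    list($sig, $time, $data) = explode('-', base64_decode($cookie), 3);
    if ($sig !== sha1($secret . $time . $data)) return false;  // tampered
    if (time() - (int)$time > $max_age) return false;          // expired
    return $data;  // e.g. "1234-cal"
}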

Super slim sessions
If you need more than the cookie (login status, user id, username), then pull their account row from the DB
Or from the account cache

None of the drawbacks of sessions
Avoids the overhead of a query per page
Great for high-volume pages which need little personalization
Turns out you can stick quite a lot in a cookie too
Pack with base64 and it’s easy to delimit fields

App servers
The Rasmus way
App server has ‘shared nothing’
Responsibility pushed down the stack
Ooh, the stack

Other services scale similarly to web apps
That is, horizontally

The canonical examples:
Image conversion
Audio transcoding
Video transcoding
Web crawling
Compute!

Amazon
Let’s talk about Amazon
S3 - Storage
EC2 – Compute! (Xen-based)
SQS – Queueing

All horizontal

Cheap when small
Not cheap at scale

Load Balancer
Queuing
Parallelizable == easy!

If we can transcode/crawl in parallel, it’s easy
But think about queuing
And asynchronous systems
The web ain’t built for slow things
But still, a simple problem

Asynchronous system
Helps with peak periods
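
A minimal sketch of the shape, with a plain database table standing in for the queue (assuming $db is a PDO handle; the jobs table and transcode() are made up, and multiple workers would need locking):

// Web request: enqueue the slow job and return immediately
$db->prepare("INSERT INTO jobs (type, payload) VALUES ('transcode', ?)")
   ->execute([$video_id]);

// Worker, running outside the web tier, drains at its own pace
while (true) {
    $job = $db->query("SELECT id, payload FROM jobs ORDER BY id LIMIT 1")->fetch();
    if (!$job) { sleep(1); continue; }   // queue empty: wait out the lull
    transcode($job['payload']);          // the slow part
    $db->prepare("DELETE FROM jobs WHERE id = ?")->execute([$job['id']]);
}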

Database
More read power
Web apps typically have a read/write ratio of somewhere between 80/20 and 90/10
If we can scale read capacity, we can solve a lot of situations
MySQL replication!

Master-Slave Replication
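
In app code this can be as small as two connection handles – a sketch with placeholder hostnames (remember slave reads can lag writes):

$master = new PDO('mysql:host=db-master;dbname=app', $user, $pass);
$slaves = ['db-slave1', 'db-slave2', 'db-slave3'];
$slave  = new PDO('mysql:host=' . $slaves[array_rand($slaves)] . ';dbname=app', $user, $pass);

// Writes always go to the master; replication fans them out
$master->prepare("UPDATE users SET name = ? WHERE id = ?")->execute([$name, $id]);

// Reads spread across the slaves – add slaves to add read capacity
$stmt = $slave->prepare("SELECT * FROM users WHERE id = ?");
$stmt->execute([$id]);
$row = $stmt->fetch();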

Caching
Caching avoids needing to scale!
Or makes it cheaper

Getting more complicated…
Write-through cache (writes update the cache and pass straight through to the DB)
Write-back cache (writes hit the cache and get flushed to the DB later)
Sideline cache (the cache sits beside the DB; the app checks and fills it explicitly)
Easy to implement
Just add app logic

Need to manually invalidate cache
Well designed code makes it easy

Memcached
From Danga (LiveJournal)
http://www.danga.com/memcached/
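
The sideline pattern in a few lines, assuming the pecl memcached client (key name and TTL are illustrative; load_user_from_db() is made up):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$user = $mc->get("user:$user_id");
if ($user === false) {                      // miss: fall through to the DB
    $user = load_user_from_db($user_id);
    $mc->set("user:$user_id", $user, 300);  // cache for 5 minutes
}

// The manual invalidation: clear the key whenever the row changes
$mc->delete("user:$user_id");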

HA Data
But what about HA?
SPOF (single point of failure)

The key to HA is avoiding SPOFs
Identify
Eliminate

Some stuff is hard to solve
Fix it further up the tree
Dual DCs solve the router/switch SPOF

Master-Master
Either hot/warm or hot/hot

Writes can go to either
But avoid collisions
No auto-inc columns for hot/hot
Bad for hot/warm too
Unless you have MySQL 5 (auto_increment_increment and auto_increment_offset interleave the keys)
But you can’t rely on the ordering!
Design schema/access to avoid collisions
Hashing users to servers

Rings
Master-master is just a small ring
With 2 nodes

Bigger rings are possible
But not a mesh!
Each slave may only have a single master
Unless you build some kind of manual replication

Dual trees
Master-master is good for HA
But we can’t scale out the reads (or writes!)

We often need to combine the read scaling with HA
We can simply combine the two models

There’s a problem here
We need to always have 200% capacity to avoid a SPOF
400% for dual sites!
This costs too much

The solution is straightforward
Make sure clusters are bigger than 2

N+M
N = nodes needed to run the system
M = nodes we can afford to lose

Having M as big as N starts to suck
If we could make each node smaller, we can increase N while M stays constant
(We assume smaller nodes are cheaper)
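
Worked through: N=2 big boxes with M=2 spares is the 200% capacity above. Spread the same workload over N=8 smaller boxes and M=2 spares still covers two failures, at just 125% capacity.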

Data Federation

At some point, you need more writes
This is tough
Each cluster of servers has limited write capacity

Just add more clusters!

Vertical partitioning
Divide tables into sets that never get joined
Split these sets onto different server clusters
Voila!

Simple things first
Logical limits
When you run out of non-joining groups
When a single table grows too large

Split up large tables, organized by some primary object
Usually users

Put all of a user’s data on one ‘cluster’
Or shard, or cell

Have one central cluster for lookups

Need more capacity?
Just add shards!
Don’t assign to shards based on user_id!

For resource leveling as time goes on, we want to be able to move objects between shards
Maybe – not everyone does this
‘Lockable’ objects
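
The lookup step, sketched (assuming $central is a PDO handle to the lookup cluster; table, column and host names are made up – the point is the mapping is data, not a hash, so users can move):

function shard_for_user(PDO $central, $user_id) {
    $stmt = $central->prepare("SELECT shard_id FROM user_shard WHERE user_id = ?");
    $stmt->execute([$user_id]);
    return $stmt->fetchColumn();
}

$shards   = [1 => 'db-shard1', 2 => 'db-shard2', 3 => 'db-shard3'];
$shard_db = new PDO('mysql:host=' . $shards[shard_for_user($central, $user_id)] . ';dbname=app', $user, $pass);
// Moving a user = copy their rows, update user_shard, drop the old copy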

The wordpress.com approach
Hash users into one of n buckets
Where n is a power of 2

Put all the buckets on one server

When you run out of capacity, split the buckets across two servers
When you run out of capacity again, split the buckets across four servers
Etc
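
Sketched out (constants are illustrative):

$n      = 256;                            // bucket count: a power of 2, fixed forever
$bucket = crc32($username) & ($n - 1);    // cheap, stable hash into 0..255

$servers = ['db-a', 'db-b'];              // grows by powers of two
$server  = $servers[(int)($bucket * count($servers) / $n)];
// With 2 servers, buckets 0-127 live on db-a and 128-255 on db-b;
// doubling the server count just re-splits the ranges – users never re-hash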

Data federation
Heterogeneous hardware is fine
Just give a larger/smaller proportion of objects depending on hardware

Bigger/faster hardware for paying users
A common approach
Can also allocate faster app servers via magic cookies at the LB

Downsides
Need to keep stuff in the right place
App logic gets more complicated
More clusters to manage
Backups, etc
More database connections needed per page
Proxy can solve this, but complicated
The dual table issue
Avoid walking the shards!

Data federation is how large applications are scaled
It’s hard, but not impossible

Good software design makes it easier
Abstraction!

Master-master pairs for shards give us HA

Master-master trees work for central cluster (many reads, few writes)

Multi-site HA
Having multiple datacenters is hard
Not just with MySQL

Hot/warm with MySQL slaved setup
But manual (reconfig on failure)

Hot/hot with master-master
But dangerous (each site has a SPOF)

Hot/hot with sync/async manual replication
But tough (big engineering task)

Serving Files
Serving lots of files is not too tough
Just buy lots of machines and load balance!

We’re IO bound – need more spindles!
But keeping many copies of data in sync is hard
And sometimes we have other per-request overhead (like auth)

Serving out of memory is fast!
And our caching proxies can have disks too
Fast or otherwise
More spindles is better
We stay in sync automatically

We can parallelize it!
50 cache servers gives us 50 times the serving rate of the origin server
Assuming the working set is small enough to fit in memory in the cache cluster

Dealing with invalidation is tricky

We can prod the cache servers directly to clear stuff out
Scales badly – need to clear asset from every server – doesn’t work well for 100 caches

We can change the URLs of modified resources
And let the old ones drop out cache naturally
Or prod them out, for sensitive data

Good approach!
Avoids browser cache staleness
Hello Akamai (and other CDNs)
Read more:
http://www.thinkvitamin.com/features/webapps/serving-javascript-fast
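
The URL change can be as small as baking the file's mtime into the name – a sketch, to be paired with a rewrite rule that maps app.v12345.js back to app.js on disk:

function versioned_url($path) {
    $mtime = filemtime($_SERVER['DOCUMENT_ROOT'] . $path);
    return preg_replace('/\.(js|css|png|gif|jpg)$/', '.v' . $mtime . '.$1', $path);
}
// versioned_url('/js/app.js') => e.g. '/js/app.v1138122800.js'
// Edit the file and the URL changes, so caches fetch the new copy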

High overhead serving
What if you need to authenticate your asset serving?
Private photos
Private data
Subscriber-only files

Two main approaches
Proxies w/ tokens
Path translation

Perlbal backhanding
Perlbal can do redirection magic
Client sends request to Perlbal
Perlbal plugin verifies user credentials
token, cookies, whatever
tokens avoid data-store access
Perlbal goes to pick up the file from elsewhere
Transparent to user

Doesn’t keep database around while serving
Doesn’t keep app server around while serving
User doesn’t find out how to access asset directly
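
From the app's side the whole trick is one header – a sketch assuming Perlbal's reproxy support, with a made-up auth check and internal URL:

if (!user_can_see($current_user, $photo_id)) {  // hypothetical auth check
    header('HTTP/1.0 403 Forbidden');
    exit;
}
// Perlbal swallows this header and streams the file itself;
// the app server is freed as soon as the headers go out
header('X-REPROXY-URL: http://internal-store/photos/1234.jpg');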

Permission URLs
But why bother!?
If we bake the auth into the URL then it saves the auth step
We can do the auth on the web app servers when creating HTML
Just need some magic to translate to paths
We don’t want paths to be guessable
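
One flavor of that magic, sketched ($secret is yours; the path scheme is made up):

$path  = '/photos/1234/original.jpg';
$token = sha1($secret . $path);  // unguessable without $secret
$url   = "/serve/$token$path";
// The file server recomputes sha1($secret . $path) from the request
// and serves the bytes only when it matches – no app or DB in the loop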

Downsides
URL gives permission for life
Unless you bake in tokens
Tokens tend to be non-expirable
We don’t want to track every token
Too much overhead
But can still expire

Upsides
It works
Scales nicely

Storing Files

Storing files is easy!
Get a big disk
Get a bigger disk
Uh oh!

Horizontal scaling is the key
Again
NFS
Stateful == Sucks
Hard mounts vs Soft mounts, INTR

SMB / CIFS / Samba
Turn off MSRPC & WINS (NetBIOS NS)
Stateful but degrades gracefully

HTTP
Stateless == Yay!
Just use Apache
Volumes are limited in total size
Except (in theory) under ZFS & others

Sometimes we need multiple volumes for performance reasons
When using RAID with single/dual parity

At some point, we need multiple volumes

Further down the road, a single host will be too small
Total throughput of machine becomes an issue
Even physical space can start to matter
So we need to be able to use multiple hosts

HA is important for assets too
We can back stuff up
But we tend to want hot redundancy

RAID is good
RAID 5 is cheap, RAID 10 is fast
But whole machines can fail
So we stick assets on multiple machines

In this case, we can ignore RAID
In failure case, we serve from alternative source
But need to weigh up the rebuild time and effort against the risk
Store more than 2 copies?

Self repairing systems
When something fails, repairing can be a pain
RAID rebuilds by itself, but machine replication doesn’t

The big appliances self heal
NetApp, StorEdge, etc

So does MogileFS (reaper)

Field Work
Flickr
Because I know it

LiveJournal
Because everyone copies it
