Saturday, February 6, 2016

Web Scalability for Startup Engineers - Book Note



Web Scalability for Startup Engineers
Chapter 2 Principles of Good Software Design
Simplicity
Hide Complexity and Build Abstractions
Avoid Overengineering
Try Test-Driven Development
Learn from Models of Simplicity in Software Design
Loose Coupling
Promoting Loose Coupling
Avoiding Unnecessary Coupling
Models of Loose Coupling
Don’t Repeat Yourself (DRY)
Copy and Paste Programming
Coding to Contract
Draw Diagrams
Use Case Diagrams
Class Diagrams
Module Diagrams
Single Responsibility
Promoting Single Responsibility
Examples of Single Responsibility
Open-Closed Principle
Dependency Injection
Inversion of Control (IOC)
Designing for Scale
Adding More Clones
Functional Partitioning
Data Partitioning
Design for Self-Healing

Chapter 3 Building the Front-End Layer
Managing State
Managing HTTP Sessions
Store session state in cookies
Delegate the session storage to an external data store
Use a load balancer that supports sticky sessions

Cookies are sent by the browser with every single request, regardless of the type of resource being requested.
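A minimal sketch of the cookie-based approach, assuming a servlet container; the secret, cookie name, and JSON payload are illustrative, and the HMAC only makes tampering detectable (it does not hide the data):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletResponse;

public class CookieSessionWriter {
    private static final String SECRET = "change-me"; // hypothetical signing secret

    // Serialize session data into a tamper-evident cookie: payload + HMAC signature.
    // Keep the payload small, because the cookie travels with every request.
    public static void writeSession(HttpServletResponse response, String sessionJson) throws Exception {
        Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();
        String payload = encoder.encodeToString(sessionJson.getBytes(StandardCharsets.UTF_8));
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(SECRET.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        String signature = encoder.encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));

        Cookie cookie = new Cookie("SESSION", payload + "." + signature);
        cookie.setHttpOnly(true);
        response.addCookie(cookie);
    }
}
```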

your web application would take the session identifier from the web request and then load session data from an external data store. At the end of the web request life cycle, just before a response is sent back to the user, the application would serialize the session data and save it back in the data store.
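For example, a rough sketch of delegating session storage to Redis with the Jedis client; the key prefix, TTL, and string serialization are assumptions, not something prescribed by the book:

```java
import redis.clients.jedis.Jedis;

public class ExternalSessionStore {
    private static final int SESSION_TTL_SECONDS = 30 * 60; // illustrative 30-minute expiry

    // Load the serialized session at the beginning of the request life cycle.
    public String loadSession(Jedis jedis, String sessionId) {
        return jedis.get("session:" + sessionId); // null if expired or never created
    }

    // Persist the re-serialized session just before the response is sent back.
    public void saveSession(Jedis jedis, String sessionId, String serializedSession) {
        jedis.setex("session:" + sessionId, SESSION_TTL_SECONDS, serializedSession);
    }
}
```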

any time a new client sends a request, the load balancer assigns the client to a particular web server and injects a new load balancer cookie into the response, allowing the load balancer to keep track of which user is assigned to which server.

sticky sessions break the fundamental principle of statelessness. Once you allow your web servers to be unique, by storing any local state, you lose flexibility. You will not be able to restart, decommission, or safely auto-scale web servers without breaking users’ sessions because their session data will be bound to a single physical machine.

Managing Files
If you want to serve a private file, you will need to download it to your front-end web application server.

GridFS is an extension built into MongoDB that splits files into smaller chunks and stores them inside MongoDB collections as if they were regular documents.
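A minimal GridFS upload sketch using the MongoDB Java driver (3.7+ client API); the database name and file are placeholders:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import org.bson.types.ObjectId;

public class GridFsUploadExample {
    public static void main(String[] args) throws Exception {
        MongoDatabase db = MongoClients.create("mongodb://localhost:27017").getDatabase("files");
        GridFSBucket bucket = GridFSBuckets.create(db);
        try (InputStream in = new FileInputStream("report.pdf")) {
            // GridFS splits the stream into chunks and stores them as ordinary documents.
            ObjectId fileId = bucket.uploadFromStream("report.pdf", in);
            System.out.println("Stored file with id " + fileId);
        }
    }
}
```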

Astyanax Chunked Object Store released as open source by Netflix. It uses Cassandra as the underlying data store, which allows you to leverage Cassandra’s core features like transparent partitioning, redundancy, and failover.
It then adds file storage–specific features on top of Cassandra’s data model. For example, it optimizes access by randomizing the download order of chunks to avoid hotspots within your cluster.

GlusterFS
Managing Other Types of State
Components of the Scalable Front End
DNS
If you were hosting your servers in multiple Amazon regions (multiple data centers), your clients would actually benefit from establishing a connection to a region that is closer to their location.

Route 53 allows you to do that easily using latency-based routing. It works similarly to geoDNS, but the data center is selected based on latency measurements rather than the location of the client.

Load Balancers
ELB can perform SSL termination, so connections coming from ELB to your web servers are HTTP, not HTTPS (Hypertext Transfer Protocol over SSL).
Nginx is also a reverse HTTP proxy, so it can cache HTTP responses from your servers.

HAProxy can be configured as either a layer 4 or layer 7 load balancer. 
When HAProxy is set up to be a layer 4 proxy, it does not inspect higher-level protocols and it depends solely on TCP/IP headers to distribute the traffic. This, in turn, allows HAProxy to be a load balancer for any protocol, not just HTTP/HTTPS.

HAProxy can also be configured as a layer 7 proxy, in which case it supports sticky sessions and SSL termination, but needs more resources to be able to inspect and track HTTP-specific information.

Web Servers
Caching
Auto-Scaling
Scalability is not just about scaling out; it is also about the ability to scale down, mainly to save cost.
Auto-scaling can take out any instance at any point in time, so you cannot store any data on your web servers.
you can create an auto-scaling group to define scaling policies. An auto-scaling group is the logical representation of your web server cluster and it can have policies like “add 2 servers when CPU utilization is over 80 percent” or “set minimum server count to 4 every day at 9 a.m.” Amazon has a powerful policy framework, allowing you to schedule scaling events and set multiple thresholds for different system metrics collected by CloudWatch.

you can also decide to use Amazon ELB. Amazon auto-scaling can launch new instances, add them to the load balancer, monitor cluster metrics coming from CloudWatch, and, based on the policies, add or remove further server instances.
Auto-scaling controls all of the instances within the auto-scaling group and updates ELB any time servers are added or removed from the cluster.

Auto-scaling is in some ways similar to self-healing, as you make your system handle difficulties without human intervention.

the application uses Route 53 as the DNS. Since Route 53 provides high availability and scalability out of the box, you will not need to worry about managing or scaling the DNS. Further down the stack, web requests hit the ELB, where you can implement SSL offloading and round-robin traffic distribution to your auto-scaling group.

When requests finally hit your web servers (EC2 instances), web servers use the web services layer, caches, queues, and shared data stores to render the response.

To avoid storing any local state, all files (public and private) are stored in S3. Public files are served directly from S3, and private files are returned by your web servers, but they are still stored on S3 for scalability and high availability.
- Use the valet key pattern to let clients fetch private files directly from S3 and avoid load on the web servers (see the sketch below)
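A hedged sketch of the valet key idea with the AWS SDK for Java v1: the web server generates a short-lived pre-signed URL and the browser downloads the private file straight from S3 (bucket, key, and expiry are placeholders):

```java
import java.net.URL;
import java.util.Date;

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class ValetKeyExample {
    // Return a short-lived URL that lets the client download the private object
    // directly from S3, so the file bytes never pass through your web servers.
    public static URL presignedDownloadUrl(String bucket, String key) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        Date expiration = new Date(System.currentTimeMillis() + 10 * 60 * 1000); // 10 minutes
        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest(bucket, key, HttpMethod.GET)
                        .withExpiration(expiration);
        return s3.generatePresignedUrl(request);
    }
}
```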

In case of a private data center, you will need to put a layer of web servers in front of your file storage to allow public access to your files via the CDN.

By having all of the code in a single application, you now have to develop and host it all together.

API-First Approach
API-first implies designing and building your API contract first and then building clients consuming that API and the actual implementation of the web service. I would say it does not matter whether you develop clients first or the API implementation first as long as you have the API contract defined beforehand.

An alternative approach to that problem is to create a layer of web services that encapsulates most of the business logic and hides complexity behind a single API contract.

you can use functional partitioning and divide your web services layer into a set of smaller independent web services.
From a scalability point of view, having a separation of concerns helps in scaling clients and services independently.

thinking of the web services layer and service-oriented architecture from day one, but implementing it only when you see that it is truly necessary.

TYPES OF WEB SERVICES
Function-Centric Services
Resource-Centric Services

Keeping Service Machines Stateless
Caching Service Responses

One way you can break the semantics of GET requests is by using local object caches on your web service machines.
REST services usually pass authentication details in request headers. These headers can then be used by the web service to verify permissions and restrict access. The problem with authenticated REST endpoints is that each user might see different data based on their permissions. That means the URL is not enough to produce the response for the particular user. Instead, the HTTP cache would need to include the authentication headers when building the caching key. This cache separation is good if your users should see different data, but it is wasteful if they should actually see the same thing.

You can implement caching of authenticated REST resources by using HTTP headers like Vary: Authorization in your web service responses. Responses with such headers instruct HTTP caches to store a separate response for each value of the Authorization header (a separate cache for each user).
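A hypothetical servlet illustrating those headers; `Cache-Control: public` is included because shared caches will not store a response to a request carrying an Authorization header unless a directive explicitly allows it:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AccountServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // "public" explicitly allows shared caches to store a response to an
        // authenticated request; "Vary: Authorization" keeps one cached copy per user.
        resp.setHeader("Cache-Control", "public, max-age=60");
        resp.setHeader("Vary", "Authorization");
        resp.setContentType("application/json");
        resp.getWriter().write("{\"account\":\"...\"}");
    }
}
```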

To truly leverage HTTP caching, you want to make as many of your resources public as possible. Making resources public allows you to have a single cached object for each URL, significantly increasing your cache efficiency and reducing the web service load.

functional partitioning can be thought of as a way to split a large system into a set of smaller, loosely coupled parts so that they can run across more machines

Data Layer
SCALING WITH MYSQL
When using MySQL replication, your application can connect to a slave to read data from it, but it can modify data only through the master server. All of the data-modifying commands like updates, inserts, deletes, or create table statements must be sent to the master. The master server records all of these statements in a log file called a binlog, together with a timestamp, and it also assigns a sequence number to each statement. Once a statement is written to a binlog, it can then be sent to slave servers.

The master server writes commands to its own binlog, regardless of whether any slave servers are connected. The slave server knows where it left off and makes sure to get the right updates, but the master server does not have to worry about its slaves at all.

You can use different slaves for different types of queries.
You can use the asynchronous nature of MySQL replication to perform zero-downtime backups.
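A rough sketch of application-level read/write splitting on top of MySQL replication, using plain JDBC; hostnames and credentials are placeholders, and a real application would use connection pools:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReplicationAwareRouter {
    private static final String MASTER_URL = "jdbc:mysql://master.db:3306/app";
    private static final String SLAVE_URL  = "jdbc:mysql://slave1.db:3306/app";

    // All data-modifying statements go to the master, which records them in its binlog.
    public Connection connectionForWrite() throws SQLException {
        return DriverManager.getConnection(MASTER_URL, "app_user", "secret");
    }

    // Reads can be served by a slave; remember that replication lag means a slave
    // may briefly return slightly stale data after a write.
    public Connection connectionForRead() throws SQLException {
        return DriverManager.getConnection(SLAVE_URL, "app_user", "secret");
    }
}
```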

Master-Master
All writes sent to Master A are recorded in its binlog. Master B replicates these writes to its relay log and executes them on its own copy of the data. Master B writes these statements to its own binlog as well in case other slaves want to replicate them. In a similar way, Master A replicates statements from Master B’s binlog by appending them to its own relay log, executing all new statements, and then logging them to its own binlog.

In case of Master A failure, or any time you need to perform long-lasting maintenance, your application can be quickly reconfigured to direct all writes to Master B.

Both masters have to perform all the writes. The fact that you distribute writes to both master servers from your application layer does not mean that each of them has less to do.
In fact, each of the masters will have to execute every single write statement either coming from your application or coming via the replication. To make it even worse, each master will need to perform additional I/O to write replicated statements into the relay log. Since each master is also a slave, it writes replicated statements to a separate relay log first and then executes the statement, causing additional disk I/O.
Both masters have the same data set size.

Although master-master replication can be useful in increasing the availability of your system, it is not a scalability tool.
you can use MySQL ring replication, where instead of two master servers, you chain three or more masters together to create a ring. Although that might seem like a great idea, in practice, it is the worst of the replication variants discussed so far.

When hosting your system on a decent network (or cloud), your replication lag should be less than a second. That means that any time you write to the master, you should expect your read replicas to have the same change less than a second later.

ring replication significantly increases your replication lag, as each write needs to jump from master to master until it makes a full circle.

It is worth pointing out that any master-master or ring topology makes your system much more difficult to reason about, as you lose a single source of truth semantics. In regular master-slave replication, you can always query the master to get the most recent data.

Using replication is only applicable to scaling reads, not writes.
if you ever hit the limit of how many slaves your master can handle, you can use multilevel replication to further distribute the load and keep adding even more slaves. By adding multiple levels of replication, your replication lag increases, as changes need to propagate through more servers, but you can increase read capacity, which may be a reasonable tradeoff.

it is not a way to scale the overall data set size.
Active data set size.
a hosted MySQL solution like Amazon RDS (Amazon Relational Database Service)

Data Partitioning (Sharding)
Choosing the Sharding Key
if you wanted to have globally unique IDs, you could use auto_increment_increment and auto_increment_offset to make sure that each shard generates different primary keys.

Sharding can be implemented in your application layer on top of any data store. All you need to do is find a way to split the data so it could live in separate databases and then find a way to route all of your queries to the right database server.
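A minimal routing sketch, assuming the user ID is the sharding key and a fixed list of shard connection strings:

```java
import java.util.List;

public class ShardRouter {
    private final List<String> shardJdbcUrls; // one entry per physical shard

    public ShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = shardJdbcUrls;
    }

    // Simple modulo mapping from the sharding key to a shard. Note that adding a
    // shard changes the mapping and forces data migration, which is why the
    // mapping-table and consistent-hashing approaches below are often preferred.
    public String shardFor(long userId) {
        int index = (int) (Math.abs(userId) % shardJdbcUrls.size());
        return shardJdbcUrls.get(index);
    }
}
```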

you should try to split your data set into buckets of similar (and small) size.
Since sharding splits data into disjoint subsets, you end up with a shared-nothing architecture.
Another advantage of sharding is that you can implement it in the application layer and then apply it to any data store.

you cannot execute queries spanning multiple shards. Any time you want to run such a query, you need to execute parts of it on each shard and then somehow merge the results in the application layer.

Another challenge with sharding in your application layer is that as your data grows, you may need to add more servers (shards).

One way to avoid the need to migrate user data and reshard every time you add a server is to keep all of the mappings in a separate database. Rather than computing server number based on an algorithm, we could look up the server number based on the sharding key value.

Consistent hashing is another way to map sharding keys to servers: both servers and keys are placed on the same hash ring, so adding or removing a server only remaps the keys adjacent to it rather than reshuffling the entire data set.
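A simplified hash-ring sketch (one point per server; real implementations place many virtual nodes per server and use a stronger hash function):

```java
import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addServer(String server) {
        ring.put(hash(server), server);
    }

    public void removeServer(String server) {
        ring.remove(hash(server));
    }

    // Walk clockwise from the key's position to the first server on the ring.
    public String serverFor(String shardingKey) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(shardingKey));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    // Placeholder hash; real implementations use a stronger function (e.g., MD5 or Murmur).
    private int hash(String value) {
        return value.hashCode() & 0x7fffffff;
    }
}
```

Because only the keys between a removed server and its ring predecessor change owners, rebalancing touches a small fraction of the data.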
Another simple alternative to generating globally unique IDs is to use atomic counters provided by some data stores.
For example, if you already use Redis, you could create a counter for each unique identifier. You would then use Redis’ INCR command to increase the value of a selected counter and return it in an atomic fashion. 

This way, you could have multiple clients requesting a new identifier in parallel and each of them would end up with a different value, guaranteeing global uniqueness. You would also ensure that there are no gaps and that each consecutive identifier is bigger than the previous ones.
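A minimal ID-generator sketch with the Jedis client; the counter key name is an assumption:

```java
import redis.clients.jedis.Jedis;

public class RedisIdGenerator {
    // INCR is atomic, so concurrent callers always receive distinct, increasing IDs.
    public long nextUserId(Jedis jedis) {
        return jedis.incr("user_id_counter");
    }
}
```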

There are three basic scalability techniques: scaling by adding copies of the same thing, functional partitioning, and data partitioning.

SCALING WITH NOSQL
Consistency ensures that all of the nodes see the same data at the same time. Availability guarantees that any available node can serve client requests even when other nodes fail. Finally, partition tolerance ensures that the system can operate even in the face of network failures where communication between nodes is impossible.

In CAP, consistency ensures that the same data becomes visible to all of the nodes at the same time, which means that all of the state changes need to be serializable, as if they happened one after another rather than in parallel. That, in turn, requires ways of coordinating across CPUs and servers to make sure that the latest data is returned.

In ACID, on the other hand, consistency is more focused on relationships within the data, like foreign keys and uniqueness.

Eventual consistency is a property of a system where different nodes may have different versions of the data, but where state changes eventually propagate to all of the servers. 
If you asked a single server for data, you would not be able to tell whether you got the latest data or some older version of it because the server you choose might be lagging behind.

the most recent write wins
Some data stores, like Dynamo, push the responsibility for conflict resolution onto their clients. The data store detects conflicts and keeps all of the conflicting values so the client can resolve them.

with the Amazon shopping cart, even if some servers were down, people would be able to keep adding items to their shopping carts. These writes would then be sent to different servers, potentially resulting in multiple versions of each shopping cart. Whenever multiple versions of a shopping cart are discovered by the client code, they are merged by adding all the items from all of the shopping carts rather than having to choose one winning version of the cart. This way, users will never lose an item that was added to a cart, making it easier to buy.
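Client-side conflict resolution for the cart example could look roughly like this; the cart representation (item ID to quantity) and the max-quantity rule are illustrative choices, not the book's code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CartMerger {
    // Merge all conflicting cart versions by taking the union of their items,
    // so nothing a user added is ever lost; quantities keep the highest value seen.
    public static Map<String, Integer> merge(List<Map<String, Integer>> conflictingVersions) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> cart : conflictingVersions) {
            for (Map.Entry<String, Integer> item : cart.entrySet()) {
                merged.merge(item.getKey(), item.getValue(), Integer::max);
            }
        }
        return merged;
    }
}
```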

Eventually consistent data stores often support ongoing data synchronization to ensure data convergence. For example, 10 percent of reads sent to Cassandra nodes trigger a background read repair mechanism.
As part of this process, after a response is sent to the client, the Cassandra node fetches the requested data from all of the replicas, compares their values, and sends updates back to any node with inconsistent or stale data. Although it might seem like overkill to keep comparing all of the data 10 percent of the time, since each of the replicas can accept writes, it is very easy for data to diverge during any maintenance or network issues. Having a fast way of repairing data adds overhead, but it makes the overall system much more resilient to failures, as clients can read and write data using any of the servers rather than having to wait for a single server to become available.

Some data stores, like Cassandra, allow clients to fine-tune the guarantees and tradeoffs by specifying the consistency level of each query independently. Rather than having a global tradeoff affecting all of your queries, you can choose which queries require more consistency and which ones can deal with stale data, gaining more availability and reducing the latency of your responses.

A quorum is a good way to trade latency for consistency in eventually consistent stores. You need to wait longer for the majority of the servers to respond, but you get the freshest data. If you write certain data using quorum consistency and then you always read it using quorum consistency, you are guaranteed to always get the most up-to-date data and thus regain the read-after-write semantics.
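With the DataStax Java driver (3.x-style API shown; adjust for your driver version and schema), per-query consistency levels might look like this:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class QuorumReadExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("app_keyspace")) {
            // QUORUM waits for a majority of replicas, trading latency for fresher data.
            SimpleStatement read = new SimpleStatement("SELECT * FROM users WHERE user_id = ?", 42L);
            read.setConsistencyLevel(ConsistencyLevel.QUORUM);
            ResultSet rows = session.execute(read);
            System.out.println(rows.one());
        }
    }
}
```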

Faster Recovery to Increase Availability
In MongoDB, data is automatically sharded and distributed among multiple servers. Each piece of data belongs to a single server, and anyone who wants to update data needs to talk to the server responsible for that data. That means any time a server becomes unavailable, MongoDB rejects all writes to the data that the failed server was responsible for.

The obvious downside of having a single server responsible for each piece of data is that any time a server fails, some of your client operations begin to fail.

To add data redundancy and increase high availability, MongoDB supports replica sets, and it is recommended to set up each of the shards as a replica set. In replica sets, multiple servers share the same data, with a single server being elected as a primary. Whenever the primary node fails, an election process is initiated to decide which of the remaining nodes should take over the primary role. Once the new primary node is elected, replication within the replica set resumes and the new primary node’s data is replicated to the remaining nodes. This way, the window of unavailability can be minimized by automatic and prompt failover.

MongoDB is a CP data store (favoring consistency and partition tolerance over availability)
Since MongoDB replica sets use asynchronous replication, your writes reach primary nodes and then they replicate asynchronously to secondary nodes. This means that if the primary node failed before your changes got replicated to secondary nodes, your changes would be permanently lost.

Cassandra
all of its nodes are functionally equal
Clients can connect to any of Cassandra’s nodes and when they connect to one, that node becomes the client’s session coordinator. Clients do not need to know which nodes have what data, nor do they have to be aware of outages, repairing data, or replication. Clients send all of their requests to the session coordinator and the coordinator takes responsibility for all of the internal cluster activities like replication or sharding.

Clients then issue their queries to the coordinator node they chose without any knowledge about the topology or state of the cluster. Since each of the Cassandra nodes knows the status of all of the other nodes and what data they are responsible for, they can delegate queries to the correct servers. 
The fact that clients know very little about the topology of the cluster is a great example of decoupling and significantly reduces complexity on the application side.

Functional partitioning of the web services layer and using different data stores based on the business needs is often referred to as polyglot persistence, and it is a growing trend among web applications.

Cassandra performs data partitioning automatically so that each of the nodes gets a subset of the overall data set. None of the servers needs to have all of the data, and Cassandra nodes communicate among one another to make sure they all know where parts of the data live.

The Cassandra data model is based on a wide column data model, similar to Google’s BigTable. In a wide column model, you create tables and then each table can have an unlimited number of rows. Unlike the relational model, tables are not connected; each table lives independently, and Cassandra does not enforce any relationships between tables or rows.

Cassandra tables are also defined in a different way than in relational databases.
Different rows may have different columns (fields), and they may live on different servers in the cluster. Rather than defining the schema up front, you dynamically create fields as they are required. This lack of upfront schema design can be a significant advantage, as you can make application changes more rapidly without the need to execute expensive alter table commands any time you want to persist a new type of information.

The flip side of Cassandra’s data model simplicity is that you have fewer tools at your disposal when it comes to searching for data.
To access data in any of the columns, you need to know which row you are looking for, and to locate the row, you need to know its row key (something like a primary key in a relational database).

When you send your query to your session coordinator, it hashes the row key (which you provided) to a number. Then, based on the number, it can find the partition range that your row key belongs to (the correct shard). Finally, the coordinator looks up which Cassandra server is responsible for that particular partition range and delegates the query to the correct server.

you can specify how many copies of each piece of data you want to keep across the cluster, and session coordinators are responsible for ensuring the correct number of replicas.

One of Cassandra’s biggest strengths is how well automated it is and how little administration it requires. For example, replacing a failed node does not require complex backup recovery or replication offset tweaking.

All you need to do to replace a broken server is add a new (blank) one and tell Cassandra which IP address this new node is replacing. All of the data transferring and consistency checking happens automatically in the background. Since each piece of data is stored on multiple servers, the cluster is fully operational throughout the server replacement procedure. Clients can read and write any data they wish even when one server is broken or being replaced. As soon as node recovery is finished, the new node begins processing requests and the cluster goes back to its original capacity.

From a scalability point of view, Cassandra is a truly horizontally scalable data store. The more servers you add, the more read and write capacity you get, and you can easily scale in and out depending on your needs.
Since data is sliced into a high number of small partition ranges, Cassandra can distribute data more evenly across the cluster. In addition, since all of the topology is hidden from the clients, Cassandra is free to move data around. As a result, adding new servers is as easy as starting up a new node and telling it to join the cluster. Again, Cassandra takes care of rebalancing the cluster and making sure that the new server gets a fair share of the data.

Cassandra loves writes
deletes are the most expensive type of operation you can perform in Cassandra.
Cassandra uses append-only data structures, which allows it to write inserts with astonishing efficiency. Data is never overwritten in place and hard disks never have to perform random write operations, greatly increasing write throughput. 

But that feature, together with the fact that Cassandra is an eventually consistent data store, forces deletes and updates to be internally persisted as inserts as well. As a result, some use cases that add and delete a lot of data can become inefficient because deletes increase the data size rather than reducing it (until the compaction process cleans them up).
