Massive Technical Interviews Tips: A Beginner's Guide To Scaling To 11 Million+ Users On Amazon's AWS

Friday, April 12, 2019

A Beginner's Guide To Scaling To 11 Million+ Users On Amazon's AWS

http://highscalability.com/blog/2016/1/11/a-beginners-guide-to-scaling-to-11-million-users-on-amazons.html

How do you scale a system from one user to more than 11 million users? Joel Williams, Amazon Web Services Solutions Architect, gives an excellent talk on just that subject: AWS re:Invent 2015 Scaling Up to Your First 10 Million Users.

Some of the interesting takeaways:

Start with SQL and only move to NoSQL when necessary.
A consistent theme is to take components and separate them out. This allows those components to scale and fail independently. It applies to breaking up tiers and creating microservices.
Only invest in tasks that differentiate you as a business; don't reinvent the wheel.
Scalability and redundancy are not two separate concepts; you can often do both at the same time.
There's no mention of costs. That would be a good addition to the talk as that is one of the major criticisms of AWS solutions.

AWS is in 12 regions around the world.
- A Region is a physical location in the world where Amazon has multiple Availability Zones. There are regions in: North America; South America; Europe; Middle East; Africa; Asia Pacific.
- An Availability Zone (AZ) is generally a single datacenter, though it can be constructed out of multiple datacenters.
- Each AZ is isolated enough that it has its own power and Internet connectivity.
- The only connection between AZs is a low latency network. AZs can be 5 or 15 miles apart, for example. The network is fast enough that your application can act like all AZs are in the same datacenter.
- Each Region has at least two Availability Zones. There are 32 AZs total.
- Using AZs it’s possible to create a high availability architecture for your application.
- At least 9 more Availability Zones and 4 more Regions are coming in 2016.
AWS has 53 edge locations around the world.
- Edge locations are used by CloudFront, Amazon’s Content Distribution Network (CDN), and by Route 53, Amazon’s managed DNS service.
- Edge locations enable users to access content with very low latency no matter where they are in the world.
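
As a rough illustration of why spreading an application across AZs improves availability, the sketch below computes the chance that at least one AZ is up. It assumes AZs fail independently, and the 99% per-AZ figure is a made-up number for the example; neither assumption comes from the talk.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Probability that at least one of az_count independent AZs is up."""
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("need at least one AZ")
    # The deployment is down only if every AZ is down at the same time.
    return 1.0 - (1.0 - per_az_availability) ** az_count


if __name__ == "__main__":
    # Hypothetical 99% per-AZ availability, deployed across 1, 2, and 3 AZs.
    for az_count in (1, 2, 3):
        print(az_count, f"{combined_availability(0.99, az_count):.6f}")
```

Real failures are not perfectly independent, so treat the result as an upper bound rather than a prediction.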

Building Block Services
- AWS has created a number of services that use multiple AZs internally to be highly available and fault tolerant. Here is a list of what services are available where.
- You can use these services in your application, for a fee, without having to worry about making them highly available yourself.
- Some of these services: CloudFront, Route 53, S3, DynamoDB, Elastic Load Balancing, EFS, Lambda, SQS, SNS, SES, SWF.
- A highly available architecture can be built on these services even when the rest of your application runs in a single AZ.

1 User

In this scenario you are the only user and you want to get a website running.
Your architecture will look something like:
- Run on a single instance, maybe a type t2.micro. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity and give you the flexibility to choose the appropriate mix of resources for your applications.
- The one instance would run the entire web stack, for example: web app, database, management, etc.
- Use Amazon Route 53 for the DNS.
- Attach a single Elastic IP address to the instance.
- Works great, for a while.

Vertical Scaling

You need a bigger box. The simplest approach to scaling is to choose a larger instance type, such as a c4.8xlarge or an m3.2xlarge.
This approach is called vertical scaling.
Just stop your instance and choose a new instance type and you’re running with more power.
There is a wide mix of hardware configurations to choose from. You can have a system with 244 GB of RAM (instance types with 2 TB of RAM are coming soon), or one with 40 cores. There are high I/O instances, high CPU instances, and high storage instances.
Some Amazon services come with a Provisioned IOPS option to guarantee performance. The idea is that you can perhaps use a smaller instance type and offload work to Amazon services like DynamoDB, which handle scaling so you don’t have to.
Vertical scaling has a big problem: there’s no failover, no redundancy. If the instance has a problem your website will die. All your eggs are in one basket.
A single instance can only get so big. Eventually you need to do something else.
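
The vertical scaling idea above can be sketched as picking the smallest instance type that covers your needs, and noticing when nothing does. The catalog below is a small illustrative sample with approximate specs, not an authoritative list; `CATALOG` and `smallest_fit` are hypothetical names made up for this example.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: float


# Illustrative sample only, ordered roughly by size.
# Check the current AWS documentation for real specs.
CATALOG = [
    InstanceType("t2.micro", 1, 1),
    InstanceType("m3.2xlarge", 8, 30),
    InstanceType("c4.8xlarge", 36, 60),
    InstanceType("r3.8xlarge", 32, 244),
]


def smallest_fit(vcpus_needed: int, memory_gib_needed: float) -> Optional[InstanceType]:
    """Return the first catalog entry that meets both requirements, or None."""
    for instance in CATALOG:
        if instance.vcpus >= vcpus_needed and instance.memory_gib >= memory_gib_needed:
            return instance
    # Ceiling reached: no single box in the catalog is big enough.
    return None


if __name__ == "__main__":
    print(smallest_fit(4, 16))
    print(smallest_fit(64, 512))
```

A `None` result is the point where vertical scaling runs out and the load has to be split across hosts.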

Users > 10

Separate out a single host into multiple hosts
- One host for the web site.
- One host for the database. Run any database you want, but you are on the hook for the database administration.
- Using separate hosts allows the web site and the database to be scaled independently of each other. Perhaps your database will need a bigger machine than your web site, for example.
Or instead of running your own database you could use a database service.
- Are you a database admin? Do you really want to worry about backups? High availability? Patches? Operating systems?
- A big advantage of using a service is you can have a multi Availability Zone database setup with a single click. You won’t have to worry about replication or any of that sort of thing. Your database will be highly available and reliable.
As you might imagine Amazon has several fully managed database services to sell you:
- Amazon RDS (Relational Database Service). There are many options: Microsoft SQL Server, Oracle, MySQL, PostgreSQL, MariaDB, Amazon Aurora.
- Amazon DynamoDB. A NoSQL managed database.
- Amazon Redshift. A petabyte scale data warehouse system.
More on Amazon Aurora:
- Automatic storage scaling up to 64TB. You no longer have to provision the storage for your data.
- Up to 15 read replicas.
- Continuous (incremental) backups to S3.
- 6-way replication across 3 AZs. This helps you handle failure.
- MySQL compatible.
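
One common way an application takes advantage of read replicas is to send writes to the primary and spread reads across the replicas. The sketch below is a minimal, hypothetical router that shows the idea; the hostnames are placeholders and the SELECT check is deliberately naive. It is not how Aurora itself routes traffic.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_cycle = itertools.cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


if __name__ == "__main__":
    router = ReadWriteRouter(
        "primary.db.example.internal",      # placeholder hostname
        ["replica-1.db.example.internal",   # placeholder hostname
         "replica-2.db.example.internal"],  # placeholder hostname
    )
    print(router.endpoint_for("SELECT * FROM users"))
    print(router.endpoint_for("INSERT INTO users (name) VALUES ('a')"))
```

Replicas can lag behind the primary, so reads that must see a just-written row may still need to go to the primary.
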
Start with a SQL database instead of a NoSQL database. The reasons:
- The technology is established.
- There’s lots of existing code, communities, support groups, books, and tools.
- You aren’t going to break a SQL database with your first 10 million users. Not even close (unless your data is huge).
- Clear patterns to scalability.
When might you need to start with a NoSQL database?
- If you need to store > 5 TB of data in year one or you have an incredibly data intensive workload.
- Your application has super low-latency requirements.
- You need really high throughput. You need to really tweak the IOs you are getting both on the reads and the writes.
- You don’t have any relational data.
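
The checklist above can be written down as a small function. This is only a sketch of the rule of thumb from the talk; the `Workload` fields and the function name are hypothetical, and the 5 TB threshold is the one quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    data_tb_year_one: float
    needs_super_low_latency: bool = False
    needs_very_high_throughput: bool = False
    has_relational_data: bool = True


def reasons_to_start_with_nosql(workload: Workload) -> List[str]:
    """Return the checklist items that apply; an empty list means start with SQL."""
    reasons = []
    if workload.data_tb_year_one > 5:
        reasons.append("more than 5 TB of data in year one")
    if workload.needs_super_low_latency:
        reasons.append("super low-latency requirements")
    if workload.needs_very_high_throughput:
        reasons.append("very high read/write throughput")
    if not workload.has_relational_data:
        reasons.append("no relational data")
    return reasons


if __name__ == "__main__":
    print(reasons_to_start_with_nosql(Workload(data_tb_year_one=0.5)))
    print(reasons_to_start_with_nosql(Workload(data_tb_year_one=8, has_relational_data=False)))
```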

https://aws.amazon.com/about-aws/global-infrastructure/

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html

An Elastic IP address is a static IPv4 address designed for dynamic cloud computing. An Elastic IP address is associated with your AWS account. With an Elastic IP address, you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account.

An Elastic IP address is a public IPv4 address, which is reachable from the internet. If your instance does not have a public IPv4 address, you can associate an Elastic IP address with your instance to enable communication with the internet; for example, to connect to your instance from your local computer.
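
The remapping described above can be sketched with the EC2 `associate_address` call. The function below takes the client as a parameter so it can be tried without AWS access; the function name and the IDs in the usage note are placeholders, and this is one possible approach rather than a recommended failover procedure.

```python
def remap_elastic_ip(ec2_client, allocation_id: str, standby_instance_id: str) -> str:
    """Point an Elastic IP at a standby instance and return the association ID."""
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        # Allow the address to move even if it is currently attached elsewhere.
        AllowReassociation=True,
    )
    return response["AssociationId"]
```

With boto3 installed and credentials configured, `ec2_client` would typically be `boto3.client("ec2")`, and the call would look like `remap_elastic_ip(ec2_client, "eipalloc-EXAMPLE", "i-EXAMPLE")` with your own allocation and instance IDs.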
