Sunday, June 28, 2015

How to design a tiny URL or URL shortener?



Related:
http://massivetechinterview.blogspot.com/2015/06/n00tc0d3r.html
https://puncsky.com/hacking-the-software-engineer-interview#designing-a-url-shortener
Design a system to take user-provided URLs and transform them into shortened URLs that redirect back to the original. Describe how the system works. How would you allocate the shorthand URLs? How would you store the shorthand-to-original URL mapping? How would you implement the redirect servers? How would you store the click stats?
Assumptions: I generally don’t include these assumptions in the initial problem presentation. Good candidates will ask about scale when coming up with a design.
  • Total number of unique domains registering redirect URLs is on the order of 10s of thousands
  • New URL registrations are on the order of 10,000,000/day (100/sec)
  • Redirect requests are on the order of 10B/day (100,000/sec)
  • Remind candidates that those are average numbers - during peak traffic (either driven by time, such as ‘as people come home from work’ or by outside events, such as ‘during the Superbowl’) they may be much higher.
  • Recent stats (within the current day) should be aggregated and available with a 5 minute lag time
  • Long look-back stats can be computed daily

Assumptions 

1B new URLs per day, 100B entries in total. The shorter the alias, the better. Show statistics (real-time and daily/monthly/yearly).

Encode URL 

Choice 1. MD5 (128 bits, 32 hex chars; by the birthday paradox, expect a collision after about 2^(n/2) = 2^64 hashes). Truncate to 64 bits (16 hex chars, collision at ~2^32)? Encode in Base64.
  • Pros: hashing is simple and horizontally scalable.
  • Cons: too long; how do we purge expired URLs?
Choice 2. Distributed Seq Id Generator. (Base62: a~z, A~Z, 0~9, 62 chars, 62^7), sharding: each node maintains a section of ids.
  • Pros: easy to expire outdated entries; shorter
  • Cons: requires coordination (e.g., ZooKeeper)
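Choice 2 can be sketched as follows; the chunk size and the reserveChunk() stub are illustrative assumptions (a real deployment would back them with ZooKeeper):

public class ChunkedIdGenerator {
    private static final long CHUNK_SIZE = 1000;
    private long next = 0;     // next ID to hand out
    private long chunkEnd = 0; // exclusive end of the currently reserved chunk

    public synchronized long nextId() {
        if (next >= chunkEnd) {          // chunk exhausted
            long start = reserveChunk(); // one coordinator round-trip per CHUNK_SIZE IDs
            next = start;
            chunkEnd = start + CHUNK_SIZE;
        }
        return next++;
    }

    // Stand-in for the coordinator call: a real node would atomically advance a
    // shared counter in ZooKeeper (or similar) by CHUNK_SIZE and use the old value.
    private static long sharedCounter = 0;
    private static synchronized long reserveChunk() {
        long start = sharedCounter;
        sharedCounter += CHUNK_SIZE;
        return start;
    }
}

Each node pays one coordination round-trip per thousand IDs instead of one per request, which is why this avoids the bottleneck of a single auto-increment database.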

KV store 

MySQL (~10k QPS, slower, the relational features go unused), KV store (~100k QPS; e.g., Redis, Memcached)
A great candidate will ask about the lifespan of the aliases and design a system that purges aliases past their expiration.

Followup 

Q: How will shortened URLs be generated?
  • A poor candidate will propose a solution that uses a single id generator (single point of failure) or a solution that requires coordination among id generator servers on every request. For example, a single database server using an auto-increment primary key.
  • An acceptable candidate will propose a solution using an md5 of the URL, or some form of UUID generator that can be done independently on any node. While this allows distributed generation of non-colliding IDs, it yields large “shortened” URLs
  • A good candidate will design a solution that utilizes a cluster of id generators that reserve chunks of the id space from a central coordinator (e.g. ZooKeeper) and independently allocate IDs from their chunk, refreshing as necessary.
Q: How to store the mappings?
  • A poor candidate will suggest a monolithic database. There are no relational aspects to this store. It is a pure key-value store.
  • A good candidate will propose using any light-weight, distributed store. MongoDB/HBase/Voldemort/etc.
  • A great candidate will ask about the lifespan of the aliases and design a system that purges aliases past their expiration
Q: How to implement the redirect servers?
  • A poor candidate will start designing something from scratch to solve an already solved problem
  • A good candidate will propose using an off-the-shelf HTTP server with a plug-in that parses the shortened URL key, looks the alias up in the DB, updates click stats and returns a 303 back to the original URL. Apache/Jetty/Netty/tomcat/etc. are all fine.
Q: How are click stats stored?
  • A poor candidate will suggest write-back to a data store on every click
  • A good candidate will suggest some form of aggregation tier that accepts clickstream data, aggregates it, and writes back to a persistent data store periodically
Q: How will the aggregation tier be partitioned?
  • A great candidate will suggest a low-latency messaging system to buffer the click data and transfer it to the aggregation tier.
  • A candidate may ask how often the stats need to be updated. If daily, storing in HDFS and running map/reduce jobs to compute stats is a reasonable approach. If near real-time, the aggregation logic should compute the stats on the fly.
Q: How to prevent visiting restricted sites?
  • A good candidate can answer with maintaining a blacklist of hostnames in a KV store.
  • A great candidate may propose some advanced scaling techniques, like a Bloom filter.
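A sketch of that idea, using Guava's BloomFilter as one concrete choice (an assumption; the source names only the technique):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class Blacklist {
    // Sized for ~1M hostnames with a 1% false-positive rate
    private final BloomFilter<String> filter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    public void ban(String hostname) {
        filter.put(hostname);
    }

    // false means definitely not banned; true means "maybe banned":
    // confirm against the authoritative KV store before rejecting.
    public boolean maybeBanned(String hostname) {
        return filter.mightContain(hostname);
    }
}

The filter lets redirect servers skip the KV-store lookup for the vast majority of benign hostnames.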
https://www.puncsky.com/notes/84-designing-a-url-shortener
https://www.interviewbit.com/problems/design-url-shortener/
We need a conditional insert (insert if not exists), i.e., a lightweight transaction.
If we want the same URL to always get the same shortened URL, the conditional insert handles concurrent requests; if we want a different shortened URL each time, we add some random number.
Q: How many URLs will we need to handle in the next 5 years?
Hint: Earlier we saw that we would get 100 million new URLs each month. Assuming the same growth rate for the next 5 years, the total number of URLs we will need to shorten will be 100 million * 12 * 5 = 6 billion.
A: 6 Billion.

Q: What is the minimum length of a shortened URL that can represent 6 billion URLs?
Hint: We will use (a-z, A-Z, 0-9) to encode our URLs. If x is the minimum number of characters needed to represent 6 billion URLs, then x is the smallest integer such that 62^x > 6 * 10^9.
A: ceil(log_62(6 * 10^9)) = 6.

Q: How much storage will we need?
A: 3 terabytes for the original URLs and 36 gigabytes for the shortened URLs.
Note that for the shortened URL, we will only store the slug (6 chars) and compute the full short URL on the fly.
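A quick sanity check of the two estimates above, as a hypothetical snippet (not from the source):

public class SlugLength {
    public static void main(String[] args) {
        double x = Math.log(6e9) / Math.log(62); // log_62(6 * 10^9)
        System.out.println(x);                   // ~5.46
        System.out.println((int) Math.ceil(x));  // 6 characters suffice
        // Storage: 6e9 URLs * ~500 B each ~= 3 TB; 6e9 slugs * 6 B ~= 36 GB
    }
}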


Q: How much data is read/written each second?
Hint: The data flow for each request contains a shortened URL and the original URL. The shortened URL, as we saw earlier, takes 6 bytes, whereas the original URL can be assumed to take at most 500 bytes.
A: Written: 40 * (500 + 6) bytes/sec; Read: 360 * (500 + 6) bytes/sec.

Q: Should we choose Consistency or Availability for our service?
A: This is a tricky one. Both are extremely important. However, the CAP theorem dictates that we choose one. Do we want a system that always answers correctly but is not available sometimes? Or do we want a system that is always available but can sometimes say that a URL does not exist even when it does?
This tradeoff is a product decision around what we are trying to optimize. Let's say, we go with consistency here.

Q: Try to list down other design goals?
A: A URL shortener by definition needs to be as short as possible. The shorter the shortened URL, the better it compares to the competition.

Q: How should we compute the hash of a URL?
Gotcha: How should we handle the case where two separate URLs get shortened to the same short URL?
A: We can use a list of salts to resolve collisions.
For each read request, we can compute all possible shortened URLs using our list of salts and query them in parallel to save time.
A: convert_to_base_62(md5(original_url + salt))[:6] (first six characters)
Links: MD5-Wiki Base 62 Conversion-Stackoverflow 
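A minimal Java sketch of this formula, assuming a fixed salt is supplied by the caller; the class and method names are illustrative:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SaltedShortener {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static String shorten(String originalUrl, String salt) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest((originalUrl + salt).getBytes(StandardCharsets.UTF_8));
        // Interpret the 128-bit digest as a non-negative integer, re-encode in base 62.
        BigInteger n = new BigInteger(1, digest);
        BigInteger base = BigInteger.valueOf(62);
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(base);
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            n = qr[0];
        }
        // First six characters form the slug; on collision, retry with the next salt.
        return sb.reverse().toString().substring(0, 6);
    }
}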
Gotchas:

  • Directly encoding a URL to base 62 would let a user check whether a URL has already been shortened, and reverse engineering could reveal the exact hash function used; this should not be allowed, so randomization has to be introduced. Also, if two users shorten the same URL, it should result in two separate shortened URLs (for analytics).
  • A database ID encoded to base 62 also won't be suitable for a production environment, because it leaks information about the database. For example, patterns can be learned to compute the growth rate (new URLs added per day) and, in the worst case, to copy the whole database.

In our case, since we will only query by the hash, which never changes, our key will be the hash, with the value being the original URL.

Q: How can we optimize read queries?
A: Since reads vastly outnumber writes, put a cache (e.g., Redis/Memcached) in front of the database for hot mappings.
Q: How would you shard the data if you were working with SQL DB?
A: We can use integer encoding of the shortened URL to distribute data among our DB shards.
Assuming we assign values from 0 to 61 to characters a to z, A to Z, and 0 to 9, we can compute the integer encoding of the shortened URL.
We can see that the maximum value of integer encoding will be less than 10^13, which we can divide among our DB shards.
We will use consistent hashing to ensure we don't have to rehash all the data again if we add new DB shards later.
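A minimal sketch of the shard-selection idea above; the alphabet assignment follows the text, while the class and method names are illustrative:

public class Sharding {
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    static long decode62(String slug) {
        long n = 0;
        for (char c : slug.toCharArray()) {
            n = n * 62 + ALPHABET.indexOf(c); // a-z -> 0..25, A-Z -> 26..51, 0-9 -> 52..61
        }
        return n;
    }

    // Simple modulo placement; the text suggests consistent hashing instead,
    // so that adding shards later does not rehash every key.
    static int shardFor(String slug, int numShards) {
        return (int) (decode62(slug) % numShards);
    }
}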

Q: How would you handle machine dying in the case of SQL DB?
A: This is a bit tricky. Obviously, for every shard, we need more than one machine.
We can use a scheme better known as master-slave, wherein one machine (the master) processes all writes and a slave machine subscribes to all of those writes and keeps updating itself. In the event that the master goes down, the slave can take over and start responding to read queries.
https://www.youtube.com/watch?v=fMZMm_0ZhK4

  • Should we keep the last part of the original link? For example, compressing http://abc.def.com/p/124/article/12306.html into http://short.url/asdfasdf/12306.html improves readability.
  • Since the protocol part is predictable, or only a few values need support (say, just http and https), can we drop it and store it as an enum alongside the short-link body?
  • As this is a web service, consider validating URL legality, and think about how to throttle request traffic and handle malicious access.
  • If a unified short-link sequence allocator is used, how do we make that allocation logic distributed? It must not become a bottleneck. For example, a common idea is to let a relational database's auto-increment index hand out unique ids; but without a relational database, how do we generate unique, collision-free sequence numbers in a distributed system? Reference: how to generate globally unique ids in a high-concurrency distributed system.
  • Think about the read/write model: reads clearly have higher priority than writes, and entries are usually never modified. Separate the read and write services, so reads stay healthy even if the write service crashes.
  • For the encoding/mapping scheme used to shorten links: how do we choose the algorithm? Which characters may a short link contain? Here we can estimate how many actual links a length-n short link can represent under given rules.

Questions:
1. If the user tries to shorten a URL that already exists, what should we do: return the existing short URL or generate a new one?

2. Do we support custom short URLs?

https://hufangyun.com/2017/short-url/
301 is a permanent redirect, 302 a temporary one. A short URL never changes once generated, so 301 matches HTTP semantics, and it also reduces server load somewhat.
But if we use 301, we can no longer count how many times the short URL is clicked, and that click count is a very interesting data source for big-data analysis, from which a great deal can be learned. So although 302 increases server load, I think it is the better choice.
Auto-increment sequence algorithm (also called the never-repeat algorithm)
Make the id auto-increment; each decimal id corresponds to exactly one base-62 value, one-to-one, so duplicates never occur. This exploits the fact that converting a number from a lower base to a higher base shortens its representation.
(The original shows a figure here: decimal 10000 represented in various bases.)

  1. MD5 the long URL into a 32-character hex signature and split it into 4 segments of 8 characters each.
  2. Process the four segments in a loop: treat each 8-character segment as a hex number and AND it with 0x3fffffff (30 one-bits), i.e., ignore everything above 30 bits.
  3. Split those 30 bits into 6 groups of 5 bits; use each 5-bit number as an index into an alphabet to pick a character, producing a 6-character string.
  4. The full MD5 thus yields 4 candidate 6-character strings; any one of them can serve as the short URL for this long URL.
With this algorithm, even though 4 candidates are generated, a chance of collision still remains.

Comparison of the two algorithms

The first algorithm is simple, easy to understand, and never collides, but the short-code length is not fixed: it grows from one character upward as the id grows. If you insist on a fixed length, just start the id at a suitably large number. Baidu's short-URL service uses this algorithm, as does the open-source short-URL project YOURLS mentioned above (source worth studying).
The second algorithm can collide (though the probability is tiny), but its codes have a fairly fixed length and don't grow from one character upward. Weibo reportedly uses this algorithm.
I used algorithm one. One drawback is that the short codes come out in order, which may be insecure; my workaround is to not arrange the base-62 alphabet in order. Because I wanted a custom-short-code feature, I further tweaked algorithm one.
But the auto-increment algorithm is tied to the id: if custom short codes are allowed, a custom code may occupy a code that a later id would generate; that later id then finds its code already taken, and the one-to-one, collision-free advantage of auto-increment is lost.
links table (field: meaning)
  • id: link_id
  • url: the long URL
  • keyword: the short-link code
  • type: system-generated ("system") or user-defined ("custom")
  • insert_at: insert time
  • updated_at: update time

Later feature extensions 

Statistics: click counts, visitors' IP regions, devices used
Admin backend: deletion, data volume
Login: permission management
Password protection: require a password before the link can be visited
How to design a tiny URL or URL shortener? - GeeksforGeeks
How to design a system that takes big URLs like "http://www.geeksforgeeks.org/count-sum-of-digits-in-numbers-from-1-to-n/" and converts them into a short 6-character URL. It is given that URLs are stored in a database and every URL has an associated integer id.

One important thing to note is that the long URL should also be uniquely identifiable from the short URL, so we need a bijective function.
http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html
On Single Machine
Suppose we have a database table with three columns: id (auto-increment), actual URL, and shortened URL.

Intuitively, we can design a hash function that maps the actual url to shorten url. But string to string mapping is not easy to compute.

Notice that in the database, each record has a unique id associated with it. What if we convert the id to a shortened URL?
Basically, we need a Bijective function f(x) = y such that
  • Each x must be associated with one and only one y;
  • Each y must be associated with one and only one x.
In our case, the set of x's are integers while the set of y's are 6-letter-long strings. Actually, each 6-letter-long string can be considered a number too: a base-62 number, if we map each distinct character to a value,
e.g. 0-0, ..., 9-9, 10-a, 11-b, ..., 35-z, 36-A, ..., 61-Z.

First insert the long URL into the DB and get the new id; convert the id to the short URL and return it.
-- we don't need to store the short URL in the DB.
The problem then becomes a base-conversion problem, which is a bijection (if nothing overflows :).
 public String shorturl(int id, int base, HashMap<Integer, Character> map) {
  StringBuilder res = new StringBuilder();
  while (id > 0) {
    int digit = id % base;
    res.append(map.get(digit)); // least-significant digit first
    id /= base;
  }
  while (res.length() < 6)  res.append(map.get(0)); // pad with the zero digit; it becomes leading after the reverse
  return res.reverse().toString();
}
For each input long url, the corresponding id is auto generated (in O(1) time). The base conversion algorithm runs in O(k) time where k is the number of digits (i.e. k=6).

On Multiple Machine
Suppose the service gets more and more traffic, so we need to distribute the data across multiple servers.

We can use Distributed Database. But maintenance for such a db would be much more complicated (replicate data across servers, sync among servers to get a unique id, etc.).

Alternatively, we can use Distributed Key-Value Datastore.
Some distributed datastore (e.g. Amazon's Dynamo) uses Consistent Hashing to hash servers and inputs into integers and locate the corresponding server using the hash value of the input. We can apply base conversion algorithm on the hash value of the input.

The basic process can be:
id-int <-> shorturl mapping
Insert
  1. Hash an input long url into a single integer;
  2. Locate a server on the ring and store the key--longUrl on the server;
  3. Compute the shorten url using base conversion (from 10-base to 62-base) and return it to the user.
Retrieve
  1. Convert the shorten url back to the key using base conversion (from 62-base to 10-base);
  2. Locate the server containing that key and return the longUrl.
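A minimal sketch of such a consistent-hash ring, assuming server names and Java's String.hashCode purely for illustration (production systems like Dynamo add virtual nodes and a stronger hash):

import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addServer(String name) {
        ring.put(hash(name), name); // real rings add many virtual nodes per server
    }

    // Walk clockwise from the key's hash to the first server; wrap around at the end.
    // Assumes at least one server has been added.
    public String serverFor(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    private static int hash(String s) {
        return s.hashCode() & 0x7fffffff; // non-negative; a real ring would use a stronger hash
    }
}

The point of the ring is that adding or removing a server only remaps the keys between it and its neighbor, not the whole keyspace.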
---------
Base Conversion
A better solution is to use the integer id stored in the database and convert the integer to a character string that is at most 6 characters long. This can basically be seen as a base-conversion problem, where we have a 10-digit input number and we want to convert it into a 6-character string.
Below is one important observation about possible characters in URL.
A URL character can be one of the following
1) A lower case alphabet [‘a’ to ‘z’], total 26 characters
2) An upper case alphabet [‘A’ to ‘Z’], total 26 characters
3) A digit ['0' to '9'], total 10 characters
Insertion:
1. Calculate the hash value of the id of the long URL
2. Locate the server by the hash value and store the id and longURL there.
3. Calculate the short URL based on base conversion and return.
Query (redirect):
1. Calculate the id of the short URL by base conversion algorithm.
2. Locate the server by hashing the id.
3. Find the long URL and redirect.
    public String convertToBase62(int input) {
        StringBuilder result = new StringBuilder();
        String alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        HashMap<Integer, Character> map = new HashMap<>();
        for (int i = 0; i < 62; i++) {
            map.put(i, alphabet.charAt(i)); // 0-9 -> '0'-'9', 10-35 -> 'a'-'z', 36-61 -> 'A'-'Z'
        }
        while (input != 0) {
            int tmp = input % 62;
            result.append(map.get(tmp));
            input /= 62;
        }
        return result.reverse().toString();
    }
https://gist.github.com/gcrfelix/ac2fd9f394d8e71e75c2
http://www.allenlipeng47.com/blog/index.php/2015/12/01/compress-url-by-base62/
https://github.com/allenlipeng47/algorithm/blob/master/src/main/java/com/pli/project/algorithm/encode/Base62.java
public class ShortURL {
    public static final String ALPHABET = "23456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ-_";
    public static final int BASE = ALPHABET.length();

    public static String encode(int num) {
        StringBuilder str = new StringBuilder();
        while (num > 0) {
            str.insert(0, ALPHABET.charAt(num % BASE)); // inserting at the front avoids a reverse; appending and reversing once at the end (as below) is cheaper
            num = num / BASE;
        }
        return str.toString();
    }

    public static int decode(String str) {
        int num = 0;
        for (int i = 0; i < str.length(); i++) {
            num = num * BASE + ALPHABET.indexOf(str.charAt(i));
        }
        return num;
    }
}
http://yuanhsh.iteye.com/blog/2197100
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int    BASE     = ALPHABET.length();

    public static String encode(int num) {
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(ALPHABET.charAt(num % BASE));
            num /= BASE;
        }
        return sb.reverse().toString();
    }

    public static int decode(String str) {
        int num = 0;
        for (int i = 0, len = str.length(); i < len; i++) {
            num = num * BASE + ALPHABET.indexOf(str.charAt(i));
        }
        return num;
    }
string idToShortURL(long int n)
{
    // Map to store 62 possible characters
    char map[] = "abcdefghijklmnopqrstuvwxyzABCDEF"
                 "GHIJKLMNOPQRSTUVWXYZ0123456789";
    string shorturl;
    // Convert given integer id to a base 62 number
    while (n)
    {
        // use above map to store actual character
        // in short url
        shorturl.push_back(map[n%62]);
        n = n/62;
    }
    // Reverse shortURL to complete base conversion
    reverse(shorturl.begin(), shorturl.end());
    return shorturl;
}
// Function to get integer ID back from a short url
long int shortURLtoID(string shortURL)
{
    long int id = 0; // initialize result
    // A simple base conversion logic
    for (int i=0; i < shortURL.length(); i++)
    {
        if ('a' <= shortURL[i] && shortURL[i] <= 'z')
          id = id*62 + shortURL[i] - 'a';
        if ('A' <= shortURL[i] && shortURL[i] <= 'Z')
          id = id*62 + shortURL[i] - 'A' + 26;
        if ('0' <= shortURL[i] && shortURL[i] <= '9')
          id = id*62 + shortURL[i] - '0' + 52;
    }
    return id;
}

https://zhuanlan.zhihu.com/p/21313382
Design URL Shortening Service
This is a classic system design question; it can be probed shallowly or in great depth. Since it is particularly suitable for beginners, every student learning system design should walk through its possible conditions and solutions at least once. For example:
  • If your website is the top URL shortening service in the world(i.e. handling 70% of world URL shortening traffic) How do you handle it?
  • How do you handle URL customization?
  • What if you have very hot URLs? How do you handle it?
  • How do you track the top N popular URLs?
https://www.hiredintech.com/classrooms/system-design/lesson/55
It has been pointed out to us that it is not very clear how the author gets to the 1 bln requests per month number around minute 6:00 of the video. Here is a short explanation. The author was considering the average time span of a URL (1-2 weeks, let's take the average ~ 10 days). Then he assumed 1 click per day, keeping in mind that the top 20% got much more traffic than the rest 80%. This makes 100 mln * 10 days * 1 click per day = 1 bln.
The important thing here is that these calculations are based largely on many assumptions and gut feeling. They may be incorrect. The point is that at the interview, if you need to come up with such numbers it's good to have some background knowledge but also to be able to do reasonable conclusions about the numbers and to explain them to the interviewer.

https://www.hiredintech.com/classrooms/system-design/lesson/62
hashedUrl = convert_to_base62(md5(originalUrl + randomSalt))
-- right?
https://stackoverflow.com/questions/8852668/what-is-the-clash-rate-for-md5
You need to hash about 2^64 values to get a single collision among them, on average, if you don't try to deliberately create collisions. Hash collisions are very similar to the Birthday problem.
If you look at two arbitrary values, the collision probability is only 2^-128.
The problem with md5 is that it's relatively easy to craft two different texts that hash to the same value. But this requires a deliberate attack, and doesn't happen accidentally. And even with a deliberate attack it's currently not feasible to get a plain text matching a given hash.
In short md5 is safe for non security purposes, but broken in many security applications.
https://soulmachine.gitbooks.io/system-design/content/cn/tinyurl.html
How long should the short URL be? There are currently about 4.5 billion web pages on the internet (see http://www.worldwidewebsize.com), which exceeds 2^32 = 4294967296, so a 64-bit integer is enough.
How do we turn a 64-bit integer into a string? If we use only upper- and lowercase letters plus digits, we can treat it as a base-62 number; log_62(2^64 - 1) ≈ 10.7, so a string of length 11 suffices.
In production it can be even shorter: Sina Weibo, for example, uses length 7, because 62^7 = 3,521,614,606,208, a magnitude that far exceeds the total number of URLs on the internet. Definitely enough.
Most modern web servers (e.g., Apache, Nginx) treat the case of URLs as significant, so using upper- and lowercase letters to distinguish URLs is fine.
So the correct answer: a string of length at most 7, drawn from the 62-character alphabet of upper- and lowercase letters plus digits.

One-to-one or one-to-many mapping? 

Should one long URL map to exactly one short URL, or possibly to several? This is another major design choice.
Generally, the same long URL shortened in different places, by different users, and so on should produce different short URLs; that way the backend database supports better analytics. If long and short URLs were strictly one-to-one, there would be only one row per long URL, with no way to distinguish different sources, making data analysis impossible.
Use the 7-character short URL as a unique ID and attach all kinds of information to it: the user who generated it, the originating site, the HTTP User-Agent header, and so on. Only by collecting this information can we do big-data analysis later and mine the data's value; this data is a major revenue source for short-URL providers.
Correct answer: one-to-many.

How to compute the short URL 

Now that we have fixed the short URL at length 7, how do we compute it?

The most obvious approach is hashing: hash the long URL to a 64-bit integer, convert it to base 62, and take the low 7 digits. But hash algorithms collide, and handling the collisions is another headache. This approach merely moves the problem instead of solving it. Discard.


How to store 

How do we store the short-to-long mapping? With the short URL as the primary key and the long URL as the value, we can use a traditional relational database (e.g., MySQL, PostgreSQL) or any distributed KV store (e.g., Redis, LevelDB).

If your hands itch to design this storage yourself, that's a different topic: you would be building a complete KV storage engine from scratch. Popular KV storage engines include LevelDB and RocksDB; go read their source :D

301 or 302 redirect 

This is also an interesting question, mainly probing your understanding of 301 vs. 302 and of browser caching.
301 is a permanent redirect, 302 a temporary one. A short URL never changes once generated, so 301 matches HTTP semantics. But with 301, search engines like Google and Baidu will directly display the real address in results, so we could no longer count clicks on the short URL or collect the user's Cookie, User-Agent, and similar information, which can feed many interesting big-data analyses and is a main revenue source for short-URL providers.
So the correct answer is a 302 redirect.

Preventing abuse 

If some ill-intentioned hackers send a huge number of requests to the TinyURL servers in a short time, they will quickly exhaust the ID space. What can we do?
First, cap each IP's total daily requests and refuse service beyond the threshold.
Limiting per-IP requests alone is not enough, though, since hackers usually control botnets of millions of machines and have plenty of IP addresses.
We can add a Redis cache server that stores not ID -> long URL but long URL -> ID, keeping only the last day's data with LRU eviction. Then if a hacker floods us with the same long URL, we simply return the short URL from the cache, and they cannot exhaust our IDs.
https://www.zhihu.com/question/29270034
The worst answer
Implement an algorithm that converts a long address into a short one, one-to-one, and also implement its inverse, converting the short address back into the long one.
This answer sounds perfect, and the candidate will add that time is short, but given enough time they would find the algorithm and solve the problem. Anyone with a little computer science or information theory background, however, can see that this algorithm is like a perpetual-motion machine: it can never be found. Even if we define short addresses to be 100 characters, there are only 62^100 of them (62 = 10 digits + 26 uppercase + 26 lowercase letters); however large that number is, it cannot exceed the number of possible long addresses in the world, so a true one-to-one mapping is fundamentally impossible.
To refute it another way: if such an algorithm and its inverse really existed, today's compression software could all retire, and all the information in the world could be compressed into 100 characters. Is that possible?
Another very bad answer
As above, find an algorithm that converts long addresses to short ones, but without an inverse; store the short-to-long relation in a DB and look it up when resolving a short address.
This doesn't change the essence. If such an algorithm existed, it would necessarily collide, i.e., multiple long addresses would map to the same short address. Since we cannot predict what long addresses will be fed into the system, an absolutely collision-free hash function is impossible.
A fairly bad answer
Use a hash algorithm; I admit it collides, and on collision I just append 1, 2, 3, and so on.
OK, but then after computing the hash we may need B-tree-style greater-than/less-than or LIKE queries just to learn whether to append 1, 2, or 3, and the unpredictability of the input set makes the time to generate a short address unpredictable. Equally bad: randomly generate a short address, check whether it's used, and if so retry, over and over until an unused one turns up.
The correct principle
The above are a few typical wrong answers; now for the correct principle.
The correct principle is an ID-issuing strategy: hand every incoming long address a sequence number. A small system can use MySQL's auto-increment index and be done with it. A large application can use any of various distributed key-value systems as the issuer, incrementing forever. The first user of the service gets the short address xx.xx/0, the second gets xx.xx/1, the 11th gets xx.xx/a, and so on: effectively a base-62 auto-increment field.
A few sub-problems
1. How do we do base 62 with a database or KV store?
We don't actually need base 62 in storage; base 10 is fine. For example, the 10000th long address gets number 9999: we fetch 9999 from the auto-increment store and then convert it from base 10 to base 62. You can implement this base-10-to-62 conversion entirely yourself.
2. How do we guarantee that the same long address always converts to the same short address?
The issuing scheme above never checks whether a long address has been converted before. Shorten the Baidu homepage now and you get xx.xx/abc; come back later and you get xx.xx/xyz. That looks bad, but bad how? It breaks one-to-one: one long address maps to many short ones. That offends our perfectionist instincts, but what else is actually wrong with it?
Some say it wastes space, which is true: the same long address producing multiple short-address records is clearly wasteful. How do we avoid the waste? "Just build a long-to-short KV store," comes the quick reply. Sounds reasonable, but that KV store itself consumes a great deal of space: we would be trading space for space, and seemingly big space for small. Is it really worth it? Worth thinking about. It's not unsolvable, though: if true one-to-one is out of reach, would a discounted version do? There are many possible answers, each with its own tricks; since so many people agonize over this, see my update at the bottom.
3. How do we make the issuer highly concurrent and highly available?
The design above appears to have a single point: the issuer. If we make it distributed, multiple nodes must stay synchronized while incrementing by one, with writes at multiple points; by the CAP theorem that cannot truly be achieved. The solution is simple: step back and run two issuers, one handing out odd numbers and one even, turning the single point into multiple points. By extension we can run 1000 logical issuers, issuing numbers ending in 0 through 999 respectively. Each issuer adds 1000 per issued number instead of 1. The issuers work independently, without any coordination. In implementation they can start as logical units and be split onto independent physical machines when the load really grows. A thousand nodes should be enough for humanity; if you truly want more, it's theoretically possible too.
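A minimal sketch of this striped-issuer idea under the stated assumptions (1000 logical issuers, each stepping by 1000; names are illustrative):

public class StripedIssuer {
    private static final long STRIPES = 1000; // number of logical issuers
    private long next;

    public StripedIssuer(int stripe) {        // stripe in 0..999: this issuer's tail number
        this.next = stripe;
    }

    public synchronized long nextId() {
        long id = next;
        next += STRIPES; // step by the issuer count, so the issuers' ranges never overlap
        return id;
    }
}

Issuer k hands out k, k + 1000, k + 2000, ..., so the 1000 issuers jointly cover every non-negative integer exactly once with no coordination between them.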
4. How do we choose the concrete storage?
I won't expand on this; everyone has their own approach. It mainly probes understanding of storage and caching principles, and of the availability, concurrency, and consistency of the DB and cache systems on the market.
5. Redirect with 301 or 302?
This is another interesting topic. It first tests a candidate's understanding of 301 vs. 302 and of browser caching, then their business experience. 301 is a permanent redirect, 302 a temporary one. A short address never changes once generated, so 301 matches HTTP semantics and also reduces server load somewhat.
But if we use 301, we can no longer count how many times the short address is clicked, and that click count is a very interesting data source for big-data analysis, from which a great deal can be learned. So although 302 increases server load, I think it is the better choice.

My scheme: use a KV store holding the "recent" long-to-short mappings. Note "recent": I do not store the full long-to-short relation, only recent entries, for example with a one-hour expiry implementing LRU eviction.
The long-to-short flow then becomes:
1. Look in this "recent" table to see whether the long address already has a short address.
1.1 If it does, return it directly, and extend the key-value pair's expiry by another hour.
1.2 If not, obtain a short address from the issuer, and put the pair into the "recent" table with a one-hour expiry.
So when an address is used frequently, it stays in the KV table and always returns the short address generated originally, and no duplicates appear. If it is used infrequently, its long-to-short key expires and the LRU mechanism evicts it automatically.
Of course, this cannot guarantee 100% that the same long address always converts to the same short address: take an obscure URL and convert it once an hour, and you will get different short addresses each time. But does that really matter?
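A minimal sketch of this "recent table", assuming capacity-based LRU via LinkedHashMap in place of the one-hour expiry described above; the IdIssuer interface is illustrative:

import java.util.LinkedHashMap;
import java.util.Map;

public class RecentTable {
    private final int capacity;
    private final Map<String, String> map;

    public RecentTable(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: every lookup refreshes the entry, like extending its expiry
        this.map = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > RecentTable.this.capacity; // LRU eviction
            }
        };
    }

    // Returns the cached short URL if present; otherwise issues a new one and caches it.
    public synchronized String longToShort(String longUrl, IdIssuer issuer) {
        return map.computeIfAbsent(longUrl, k -> issuer.nextShortUrl());
    }

    public interface IdIssuer { String nextShortUrl(); }
}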

http://www.1point3acres.com/bbs/forum.php?mod=viewthread&tid=145037&extra=page%3D7%26filter%3Dsortid%26sortid%3D311%26sortid%3D311
Long to short: check whether the long URL already exists; if so, return the existing value directly (that's the dedup check I mentioned); if not, generate a unique short id and store it in the database.
Short to long: check whether the short id exists; if not, report an error; if so, return the long URL.
Both directions need an index, loaded into cache as needed, but ordinary databases already implement all of this; we don't need to worry about it.
You could of course load the indexes into memcache/redis to speed up access a bit, though none of that is really necessary here.

58 is 62 with some vowels removed, so no foul words are produced; dedup means the duplication check; as for the index, if you often query by long URL you must add an index on it, otherwise every query is O(n).

http://www.blogchong.com/?mod=pad&act=view&id=85
1. MD5-based: short URLs computed this way are typically 5 or 6 characters; collisions can occur during computation (with very small probability); the number of representable URLs is 62^5 or 62^6. Google (http://goo.gl) and Weibo seem to use an algorithm like this (a guess); it may look nicer.
    a. Compute the long address's MD5 and split the 32-character digest into 4 segments of 8 characters each.
    b. Treat each 8-character segment from (a) as a hex number and AND it with a binary number of N * 6 ones, yielding an N * 6-bit binary number.
    c. Split the number from (b) into N groups of 6 bits; AND each 6-bit number with 61 and use the result as an INDEX into the alphabet to fetch a letter or digit; concatenating them gives a short URL of length N.
  static final char[] DIGITS = { '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
    'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
    'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
    'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
    'U', 'V', 'W', 'X', 'Y', 'Z' };
// Requires Apache Commons: DigestUtils (commons-codec) and StringUtils (commons-lang).
// Assumed constants missing from the original snippet:
//   private static final int BINARY = 2;       // radix for binary strings
//   private static final int NUMBER_61 = 0x3D; // 61, masks an index into the 62-char DIGITS table
public String shorten(String longUrl, int urlLength) {
    if (urlLength < 0 || urlLength > 6) {
        throw new IllegalArgumentException("the length of url must be between 0 and 6");
    }
    String md5Hex = DigestUtils.md5Hex(longUrl);
    // 6 binary digits can index the 62 letters & numbers 0-9a-zA-Z
    int binaryLength = urlLength * 6;
    long binaryLengthFixer = Long.valueOf(StringUtils.repeat("1", binaryLength), BINARY);
    for (int i = 0; i < 4; i++) {
        String subString = StringUtils.substring(md5Hex, i * 8, (i + 1) * 8);
        subString = Long.toBinaryString(Long.valueOf(subString, 16) & binaryLengthFixer);
        subString = StringUtils.leftPad(subString, binaryLength, "0");
        StringBuilder sbBuilder = new StringBuilder();
        for (int j = 0; j < urlLength; j++) {
            String subString2 = StringUtils.substring(subString, j * 6, (j + 1) * 6);
            int charIndex = Integer.valueOf(subString2, BINARY) & NUMBER_61;
            sbBuilder.append(DIGITS[charIndex]);
        }
        String shortUrl = sbBuilder.toString();
        if (lookupLong(shortUrl) != null) { // assumed DB lookup for an existing mapping
            continue;                       // collision: try the next md5 segment
        } else {
            return shortUrl;
        }
    }
    // if all 4 possibilities already exist
    return null;
}

NOSQL:
http://coligo.io/create-url-shortener-with-node-express-mongo/
https://github.com/coligo-io/url-shortener-node-mongo-express
Let's actually put MongoDB to work by creating a database and 2 collections: one to store our URLs that we're shortening and one to keep track of the global integer that acts as an ID for each entry in the URLs collection.
An example of our urls collection:
  _id    | long_url                                           | created_at
  ...    | ...                                                | ...
  10002  | http://stackoverflow.com/questions/tagged/node.js  | 2015-12-26 11:27
  10003  | https://docs.mongodb.org/getting-started/node/     | 2015-12-27 12:14
  10004  | http://expressjs.com/en/starter/basic-routing.html | 2015-12-28 16:12
Our simple counters collection that keeps track of the last _id we inserted into our urls collection:
  _id       | seq
  url_count | 10004
Every time a user creates a short URL:
  1. Our server will check the value of the url_count in our counters collection
  2. Increment it by 1 and insert that new globally unique ID as the _id of the new entry in the urls collection
If you're more familiar with relational databases like MySQL, Postgres, SQLServer, etc.. you might be wondering why we need a separate collection to keep track of that global auto incremented ID. Unlike MySQL, for instance, which allows you to define a column as an auto incremented integer, MongoDB doesn't support auto incremented fields. Instead, MongoDB has a unique field called _id that acts as a primary key, referred to as an ObjectID: a 12-byte BSON type.
We create a separate counters collection as opposed to just reading the last value of the urls collection to leverage the atomicity of the findAndModify method in MongoDB. This allows our application to handle concurrent URL shortening requests by atomically incrementing the seq field and returning the new value for use in our urls collection.
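A sketch of that atomic increment using the official MongoDB Java driver (an assumption; the article itself uses Node):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.inc;

public class UrlCounter {
    // Atomically increments the counter and returns the new value, so concurrent
    // shortening requests each get a distinct _id for the urls collection.
    public static long nextUrlId(MongoCollection<Document> counters) {
        Document updated = counters.findOneAndUpdate(
                eq("_id", "url_count"),
                inc("seq", 1),
                new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER));
        return updated.getLong("seq");
    }
}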


http://stackoverflow.com/questions/32798150/which-nosql-database-should-i-use-for-a-url-shortener
For a URL shortener, you'll not need a document store --- the data is too simple.
You'll not need a column store --- columns are for sorting and searching multiple attributes, like finding all Wongs in Hong Kong.
You'll not need a graph DB --- there's no graph.
You want a key/value DB. Some that you'll want to look at are the old standard MemCache, Redis, Aerospike, DynamoDB in EC2.
An example of a URL shortener, written in Node, for AerospikeDB, can be found in this GitHub repo (it's just a single file), and the technique can be applied to other key-value systems: https://github.com/aerospike/url-shortener-nodejs
https://github.com/aerospike/url-shortener-nodejs/blob/master/url_shortener.js
http://www.aerospike.com/blog/aerospike-url-shortening-service/

http://stackoverflow.com/questions/15868988/url-shortening-for-nosql-databases
Then, if I understand correctly, your question has to do with the fact that SQL DBs have auto-increment counters, which are convenient primary keys, while NoSQL DBs like MongoDB don't, which opens the question of what you should use as the basis for generating a new short URL.
If you want to generate auto-incrementing IDs in MongoDB there are two main approaches: optimistic loop & maintaining a separate counter collection/doc. Having tried both at scale exceeding 1,000 requests/second, I would recommend the approach based on findAndModify.
A good URL shortener design would also include randomization, which in this case means leaving random gaps between consecutive auto-incremented IDs. You can do this by generating a random number on the client and incrementing the counter by that number.
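A minimal sketch of the random-gap idea; names are illustrative:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class GappedCounter {
    private final AtomicLong counter = new AtomicLong();

    // Advance by a random step instead of 1, so consecutive slugs do not
    // reveal the exact creation order or growth rate.
    public long nextId() {
        long gap = ThreadLocalRandom.current().nextLong(1, 10); // random step in [1, 10)
        return counter.addAndGet(gap);
    }
}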
http://blog.gainlo.co/index.php/2016/03/08/system-design-interview-question-create-tinyurl-system/
For example, the URL http://blog.gainlo.co/index.php/2015/10/22/8-things-you-need-to-know-before-system-design-interviews/ is long and hard to remember; TinyURL can create an alias for it – http://tinyurl.com/j7ve58y. If you click the alias, it redirects you to the original URL.
Therefore, the question can be simplified like this – given a URL, how can we find hash function F that maps URL to a short alias:
F(URL) = alias
and satisfies the following conditions:
  1. Each URL can only be mapped to a unique alias
  2. Each alias can be mapped back to a unique URL easily
The second condition is the core as in the run time, the system should look up by alias and redirect to the corresponding URL quickly.

To make things easier, we can assume the alias is something like http://tinyurl.com/<alias_hash> and alias_hash is a fixed length string.
If the length is 7 containing [A-Z, a-z, 0-9], we can serve 62 ^ 7 ~= 3500 billion URLs. It’s said that there are ~644 million URLs at the time of this writing.
To begin with, let’s store all the mappings in a single database. A straightforward approach is using alias_hash as the ID of each mapping, which can be generated as a random string of length 7.
Therefore, we can first just store <ID, URL>. When a user inputs a long URL “http://www.gainlo.co”, the system creates a random 7-character string like “abcd123” as ID and inserts entry <“abcd123”, “http://www.gainlo.co”> into the database.


In the run time, when someone visits http://tinyurl.com/abcd123, we look up by ID “abcd123” and redirect to the corresponding URL “http://www.gainlo.co”.
There are quite a few follow-up questions for this problem. One thing I’d like to further discuss here is that by using GUID (Globally Unique Identifier) as the entry ID, what would be pros/cons versus incremental ID in this problem?
If you dig into the insert/query process, you will notice that using random strings as IDs may sacrifice performance a little. More specifically, when you already have millions of records, insertion can be costly: since the IDs are not sequential, every time a new record is inserted the database needs to find the correct page for that ID. With incremental IDs, insertion is much easier: just go to the last page.
So one way to optimize this is to use incremental IDs. Every time a new URL is inserted, we increment the ID by 1 for the new entry. We also need a hash function that maps each integer ID to a 7-character string. If we think each string as a 62-base numeric, the mapping should be easy (Of course, there are other ways).
On the flip side, using incremental IDs will make the mapping less flexible. For instance, if the system allows users to set custom short URL, apparently GUID solution is easier because for whatever custom short URL, we can just calculate the corresponding hash as the entry ID.


Note: In this case, we may not use a randomly generated key but a better hash function that maps any short URL into an ID, e.g., some traditional hash functions like CRC32, SHA-1, etc.

Cost

I can hardly not ask about how to evaluate the cost of the system. For insert/query, we’ve already discussed above. So I’ll focus more on storage cost.
Each entry is stored as <ID, URL> where ID is a 7-character string. Assuming a max URL length of 2083 characters and 4 bytes per character, each entry takes 7 * 4 bytes + 2083 * 4 bytes ≈ 8.4 KB. If we store a million URL mappings, we need around 8.4 GB of storage.


If we consider the size of database index and we may also store other information like user ID, date each entry is inserted etc., it definitely requires much more storage.

Multiple machines

The more general problem is how to store hash mapping across multiple machines. If you know distributed key-value store, you should know that this can be a very complicated problem. I will discuss only high-level ideas here and if you are interested in all those details, I’d recommend you read papers like Dynamo: Amazon’s Highly Available Key-value Store.
In a nutshell, if you want to store a huge amount of key-value pairs across multiple instances, you need to design a lookup algorithm that allows you to find the corresponding machine for a given lookup key.
For example, if the incoming short alias is http://tinyurl.com/abcd123, based on key “abcd123” the system should know which machine stores the database that contains entry for this key. This is exactly the same idea of database sharding.
A common approach is to have machines that act as a proxy, responsible for dispatching requests to the corresponding backend stores based on the lookup key. The backend stores are the databases that actually hold the mapping. They can be split in various ways, such as using hash(key) % 1024 to divide the mappings among 1024 stores.
There are tons of details that can make the system complicated, I’ll just name a few here:
  • Replication. Data stores can crash for various random reasons, therefore a common solution is having multiple replicas for each database. There can be many problems here: how to replicate instances? How to recover fast? How to keep read/write consistent?
  • Resharding. When the system scales to another level, the original sharding algorithm might not work well. We may need to use a new hash algorithm to reshard the system. How to reshard the database while keeping the system running can be an extremely difficult problem.
  • Concurrency. There can be multiple users inserting the same URL or editing the same alias at the same time. With a single machine, you can control this with a lock. However, things become much more complicated when you scale to multiple instances.

As we said in an earlier post, the specific techniques used are not that important. What really matters is the high-level idea of how to solve the problem. That's why I don't mention things like Redis, RebornDb, etc.
https://www.packtpub.com/books/content/url-shorteners-%E2%80%93-designing-tinyurl-clone-ruby
These are the main features of a URL shortener:
  • Users can create a short URL that represents a long URL
  • Users who visit the short URL will be redirected to the long URL
  • Users can preview a short URL to enable them to see what the long URL is
  • Users can provide a custom URL to represent the long URL
  • Undesirable words are not allowed in the short URL
  • Users are able to view various statistics involving the short URL, including the number of clicks and where the clicks come from (optional, not in TinyURL)
Creating a short URL for each long URL
One of the ways we can associate the long URL with a unique key is to hash the long URL and use the resulting hash as the unique key. However, the resulting hash might be long and hashing functions could be slow.
The faster and easier way is to use a relational database's auto-incremented row ID as the unique key. The database will help ensure the uniqueness of the ID. However, the running row ID number is base 10. To represent a million URLs would already require 7 characters, to represent 1 billion would take up 9 characters. In order to keep the number of characters smaller, we will need a larger base numbering system.
In this clone we will use base 36, which is 26 characters of the alphabet (case insensitive) and 10 numbers. Using this system, we will only need 5 characters to represent 1 million URLs:
1,000,000 base 36 = lfls
And 1 billion URLs can be represented in just six characters:
1,000,000,000 base 36 = gjdgxs
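Java's built-in radix conversion reproduces these base-36 examples, e.g.:

public class Base36Demo {
    public static void main(String[] args) {
        System.out.println(Long.toString(1_000_000L, 36));     // lfls
        System.out.println(Long.toString(1_000_000_000L, 36)); // gjdgxs
        System.out.println(Long.parseLong("gjdgxs", 36));      // 1000000000
    }
}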
Automatically redirecting from a short URL to a long URL
HTTP has a built-in mechanism for redirection. In fact, it has a whole class of HTTP status codes for this. Status codes that start with 3 (such as 301, 302) tell the browser to go look for that resource in another location; this is used where a web page has moved to another location or is no longer at the original one. The two most commonly used redirection status codes are 301 Moved Permanently and 302 Found.
301 tells the browser that the resource has moved away permanently and that it should look at that location as the permanent location of the resource. 302 on the other hand (despite its name) tells the browser that the resource it is looking for has moved away temporarily.
While the difference seems trivial, search engines (as user agents) treat the status codes differently. 301 tells the search engines that the short URL's location has moved permanently away to the long URL, so credit for the long URL goes to the long URL. However because 302 only tells the search engine that the location has moved temporarily, the credit goes to the short URL. This can cause issues with search engine marketing accounting.
Obviously, in our design we will need to use the 301 Moved Permanently HTTP status code to do the redirection. When the short URL http://tinyclone.saush.com/singapore-flyer is requested, we need to send an HTTP response:
HTTP/1.1 301 Moved Permanently
Location: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=singapore+flyer&vps=1&jsv=169c&sll=1.352083,103.819836&sspn=0.68645,1.382904&g=singapore&ie=UTF8&latlng=8354962237652576151&ei=Shh3SsSRDpb4vAPsxLS3BQ&cd=1&usq=Singapore+Flyer

Providing a customized short URL

Providing a customized short URL makes things less straightforward with the design we had in mind. Remember that our design uses the database row ID in base 36 as the unique key. A customized short URL cannot use this row ID, so it needs to be stored separately.
In Tinyclone we store the customized short URL in a separate secondary table called Links, which in turn points to the actual data in a table called Url. When a short URL is created and the user doesn't request a customized URL, we store the database record ID from the Url table as a base-36 string into the Links table. If the user requests a customized URL, we store the customized URL instead of the record ID.
A record in the Links table therefore maps a string to the actual record ID in the URL table. When a short URL is requested, we first look into the secondary table, which in turn points us to the actual record in the primary table.

Filtering undesirable words out

While we could use more complex filtering mechanisms, URL shorteners are simple web applications, so we stick to a simpler filtering mechanism. When we create the secondary table record, we compare the key with a list of banned words loaded in memory on startup.
If it is a customized short URL and the word is in the list, we prevent the user from using it. If the key was the actual record ID in the primary table, we create another record (and therefore using another record ID) to store the URL.
What happens if the new record ID is also coincidentally in the banned words list? We just create another one recursively until we find one that is not in the list. There is a probability that 2 or more consecutive record IDs are in the banned words list, but it happens rarely enough that we don't need to worry about it.

Previewing the long URL

This feature is simple to implement. When the short URL preview function is called, we will show a page that displays the long URL instead of redirecting to that page. In Tinyclone we lump the preview long URL page together with the statistics information page.

Providing statistics

To provide statistics for the usage of the short URL, we need to store the number of times the short URL has been clicked and where the user is coming from. To do this, we create a Visits table to store the number of times the short URL has been visited. Each record in the Visits table stores the information about a particular visit, including its date and where the visitor comes from.
We use the server environment variable REMOTE_ADDR to find out where the visitor comes from: it provides the remote IP address of the client accessing the short URL. We pass this IP address to an IP geocoding provider to find the visitor's country, then store the country code as well as the IP address.
Collecting the data is only the first step though. We will need to display it properly. There are plenty of visualization APIs and tools in the market; many of them are freely available. For the purpose of this article, we have chosen to use the Google Charts API to generate the following charts:
  • A bar chart showing the daily number of visits
  • A bar chart showing the total number of visits from individual countries
  • A map of the world visualizing the number of visits from individual countries
You might notice that in our design the user does not need to log into this application to use it. Tinyclone is a clone that does not have any access control on its pages. Most URL shorteners have a public and main feature that redirects short URLs to their original, long URLs. In addition to that some URL shorteners have user-specific access controlled pages that provide information to the users such as the statistics and reporting feature shown above. However, in this clone we will not be implementing any access controlled pages.

On the other hand, URL shorteners have had their fair share of criticism as well. Here is a summary of their bad side:
  • They provide an opportunity for spammers because they hide the original URL
  • They can be unreliable if you depend on them for redirection
  • Possible undesirable or vulgar short URLs
URL shorteners have security issues. When a URL shortener creates a short URL, it effectively hides the original link and this provides opportunity for spammers or other abusers to redirect users to their sites. One relatively mild form of such attack is 'rickrolling'. Rickrolling uses a classic bait-and-switch trick to redirect users to a Rick Astley music video of Never Gonna Give You Up. For example, you might feel that the URL http://tinyurl.com/singapore-flyer goes to Google Map, but when you click on it, you might be rickrolled and redirected to that Rick Astley music video instead.
Also, because most short URLs are not customized, it is quite difficult to see if the link is genuine or not just from the URL. Many prominent websites and applications have such concerns, including MySpace, Flickr and even Microsoft Live Messenger, and have at one time or another banned or restricted usage of TinyURL because of this problem. To combat spammers and fraud, URL shortening services have come up with the idea of link previews, which allow users to preview a short URL before it redirects them to the long URL. For example, TinyURL will show the user the long URL on a preview page and require the user to explicitly go to the long URL.
Another problem is performance and reliability. When you access a website, your browser goes to a few DNS servers to resolve the address, but the URL shortener adds another layer of indirection. While DNS servers have redundancy and failsafe measures, there is no such assurance from URL shorteners. If the traffic to a particular link becomes too high, will the shortening service provider be able to add more servers to improve performance or even prevent a meltdown altogether? The problem of course lies in over-dependency on the shortening service.
Finally, a negative side effect of random or even customized short URLs is that undesirable, vulgar or embarrassing short URLs can be created. Early on, TinyURL's short URLs were predictable and this was exploited; for instance, embarrassing short URLs were made to redirect to the White House websites of then U.S. Vice President Dick Cheney and Second Lady Lynne Cheney.
We have just covered significant ground on URL shorteners. If you are a programmer you might be wondering, "Why do I need to know such information? I am really interested in the programming bits, the others are just fluff to me."
https://www.bittiger.io/blog/post/JJcLtcFc8MWzSmbdW
This is a classic system design interview question and a deep application of the SNAKE principle, touching every aspect of system design. The initial requirements analysis surfaces the two basic interfaces, "long to short" and "short to long", reminding us once more that "reads and writes are the foundation of system design". The QPS estimate based on one million daily active users shows what "speak with numbers" means. Moving from the most intuitive hash-mapping algorithm to the simple and effective integer-increment algorithm shows what "don't over-design" means. From digit encoding to letter encoding and even emoji encoding, we see where the many variants of short links come from. Finally, the storage estimate shows that this seemingly complex feature hardly takes any space at all.

First consider the application Scenario. There are two: shorten a long URL and store it; look a short URL up and restore the original. Abstracted, that's two interfaces.

Then consider the Necessary constraints. Suppose we have one million daily active users, and compute the daily request counts for insert and lookup separately. Suppose 1% of daily active users use the insert feature, ten inserts per user: averaged per second, that's only 1.2 inserts/sec, so low that it's trivially satisfied. In one year we will generate 36,500,000 new short URLs. Lookup is different: 100% of users use it (otherwise how would they know what page a short URL points to), which works out to about 35 lookups/sec.

Now we can consider the Algorithm. The approach is simple: use two maps, one storing the long-to-short mapping and one storing short-to-long.
People may wonder how GenerateShortURL() is implemented. There are many ways: one is to hash the long URL; an even simpler one is to increment, i.e., just return the current size of the map.

We already computed that one year generates 36,500,000 short URLs. Represented with the incremental scheme above (digits only), that needs 8 digits, which together with the prefix may be a bit long. How to shorten it? Introduce the letters a-z and A-Z (effectively base 62), which cuts the length to 5.
Then consider data storage (Kilobit). A million users sounds scary, but is the data really that big? Let's compute: assume a long URL takes 100 B; with the algorithm above the short URL can be represented as an int, i.e., 4 B; assume we also store a state (e.g., expiry) as an int. That's 10.8 MB of new data per day, 4 GB per year. Only 4 GB! The two maps together are just 8 GB and can live entirely in memory. Not scary at all.
And with that, we have designed a Tiny URL service using the SNAKE principle. Neat, right?
http://likemyblogger.blogspot.com/2015/08/mj-9-design-tinyurl.html

https://github.com/zxqiu/leetcode-lintcode/blob/master/system%20design/Tiny_Url_I_II.java
Given a long url, make it shorter. To make it simpler, let's ignore the domain name.
You should implement two methods:
    longToShort(url). Convert a long url to a short url.
    shortToLong(url). Convert a short url to a long url that starts with http://tiny.url/.
You can design any shorten algorithm, the judge only cares about two things:
    The short key's length should equal 6 (without domain and slash), and the acceptable characters are [a-zA-Z0-9]. For example: abcD9E
    No two long urls mapping to the same short url and no two short urls mapping to the same long url.
 Example
Given url = http://www.lintcode.com/faq/?id=10, run the following code (or something similar):
short_url = longToShort(url) // may return http://tiny.url/abcD9E
long_url = shortToLong(short_url) // return http://www.lintcode.com/faq/?id=10
The short_url you return should be unique short url and start with http://tiny.url/ and 6 acceptable characters. For example "http://tiny.url/abcD9E" or something else.
The long_url should be http://www.lintcode.com/faq/?id=10 in this case.
Solution:
Converting a long URL to a short URL takes four steps:
1. Check whether it has been converted before. If so, fetch and return the stored value. For this, keep two tables, long-to-short and short-to-long.
2. Extract the key, the part of the long URL after the site address, using a regular expression.
3. Convert the key with a hash function. Here we use simple counting as the hash: each conversion increments a count; we convert the count to base 62 and left-pad it with 0 to 6 characters. Six-character short URLs give 62^6 ≈ 64^6 = 2^36 possibilities, so the count needs a long.
4. Prepend the short-URL prefix to the converted string and store the pair in both maps.
Converting a short URL back to a long URL is just a map lookup.
*/



import java.util.*;
import java.util.regex.*;

public class TinyUrl {
    /**
     * @param url a long url
     * @return a short url starts with http://tiny.url/
     */
   
    private static long count = 0;
    private static Map<String, String> s2l = new HashMap<>();
    private static Map<String, String> l2s = new HashMap<>();
    private static final String prefix = "http://tiny.url/";
 
    public String longToShort(String url) {
        if (l2s.containsKey(url)) {
            return l2s.get(url);
        }
     
        Pattern reg = Pattern.compile("https?:\\/\\/.*\\.\\w+\\/(.*)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = reg.matcher(url);
     
        if (!matcher.find()) {
            return "";
        }
     
        String key = matcher.group(1);
        String shortUrl = prefix + hashLong(key);
     
        s2l.put(shortUrl, url);
        l2s.put(url, shortUrl);
     
        return shortUrl;
    }

    /**
     * @param url a short url starts with http://tiny.url/
     * @return a long url
     */
    public String shortToLong(String url) {
        if (s2l.containsKey(url)) {
            return s2l.get(url);
        }
     
        return "";
    }
 
    private String hashLong(String url) {
        String ret = "";
        long div, res;
        int digits = 0;
     
        div = count++;
     
        do {
            res = div % 62;
            div = div / 62;
         
            char c = '0';
            if (res >= 0 && res <= 9) {
                c = (char)(res + '0');
            } else if (res >= 10 && res <= 35) {
                c = (char)(res - 10 + 'a');
            } else {
                c = (char)(res - 36 + 'A');
            }
            ret = String.valueOf(c) + ret;
            digits++;
        } while (div > 0 || digits < 6);
     
        return ret;
    }
}

/*
Tiny Url II
As a follow-up to Tiny URL, we are going to support custom tiny URLs, so that users can create their own tiny URL.
 Notice
Custom url may have more than 6 characters in path.
 Example
createCustom("http://www.lintcode.com/", "lccode")
>> http://tiny.url/lccode
createCustom("http://www.lintcode.com/", "lc")
>> error
longToShort("http://www.lintcode.com/problem/")
>> http://tiny.url/1Ab38c   // this is just an example, you can have you own 6 characters.
shortToLong("http://tiny.url/lccode")
>> http://www.lintcode.com/
shortToLong("http://tiny.url/1Ab38c")
>> http://www.lintcode.com/problem/
shortToLong("http://tiny.url/1Ab38d")
>> null
Solution:
The key to this problem is understanding the requirements.
First, custom tiny URLs are not subject to the usual constraints on ordinary tiny URLs, such as length and character set, so we simply dedup-check the input custom tiny URL and insert it into the tables.
Second, ordinary long-URL-to-short-URL conversion is exactly the same as above; copy it over directly.
Since the global-ID count cannot see custom tiny URLs, a custom tiny URL could be overwritten by, or overwrite, an ordinary one.
To strictly prevent this, we adopt the following rules:
1. When generating an ordinary tiny URL, if the long URL has been converted before, custom or not, return that value directly.
2. When generating an ordinary tiny URL, if the generated short URL already exists, regenerate: increment count and try again.
3. If a long URL already has a corresponding custom tiny URL, generating a different custom tiny URL for it returns an error; but if it only has an ordinary tiny URL, the custom one overwrites it.
    For this, add an isCustom map marking whether a long URL's short URL is customized.
*/


import java.util.*;
import java.util.regex.*;

public class TinyUrl2 {
    private static long count = 0;
    private static Map<String, String> s2l = new HashMap<>();
    private static Map<String, String> l2s = new HashMap<>();
    private static Map<String, Boolean> isCustom = new HashMap<>();
    private static final String prefix = "http://tiny.url/";
 
    /**
     * @param long_url a long url
     * @param a short key
     * @return a short url starts with http://tiny.url/
     */
    String createCustom(String long_url, String short_key) {
        String shortUrl = prefix + short_key;
        if (s2l.containsKey(shortUrl) &&
            !s2l.get(shortUrl).equals(long_url)) {
            return "error";
        }
     
        if (l2s.containsKey(long_url) &&
            isCustom.get(long_url) &&
            !l2s.get(long_url).equals(shortUrl)) {
            return "error";
        }
     
        l2s.put(long_url, shortUrl);
        s2l.put(shortUrl, long_url);
        isCustom.put(long_url, true);
     
        return prefix + short_key;
    }
 
    /**
     * @param url a long url
     * @return a short url starts with http://tiny.url/
     */
    public String longToShort(String url) {
        if (l2s.containsKey(url)) {
            return l2s.get(url);
        }
     
        Pattern reg = Pattern.compile("https?:\\/\\/.*\\.\\w+\\/(.*)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = reg.matcher(url);
     
        if (!matcher.find()) {
            return "error";
        }
     
        String key = matcher.group(1);
        String shortUrl = prefix + hashLong(key);
        // Rule 2: hashLong() advances the global counter on every call,
        // so looping regenerates a fresh key until there is no collision.
        while (s2l.containsKey(shortUrl)) {
            shortUrl = prefix + hashLong(key);
        }
     
        s2l.put(shortUrl, url);
        l2s.put(url, shortUrl);
        isCustom.put(url, false);
     
        return shortUrl;
    }

    /**
     * @param url a short url starts with http://tiny.url/
     * @return a long url
     */
    public String shortToLong(String url) {
        if (s2l.containsKey(url)) {
            return s2l.get(url);
        }
     
        return null;
    }
 
    // Same as in Tiny URL I: base62-encode the global counter (the url
    // parameter is unused) and zero-pad the result to 6 characters.
    private String hashLong(String url) {
        String ret = "";
        long div, res;
        int digits = 0;

        div = count++;

        do {
            res = div % 62;
            div = div / 62;

            char c = '0';
            if (res >= 0 && res <= 9) {
                c = (char)(res + '0');
            } else if (res >= 10 && res <= 35) {
                c = (char)(res - 10 + 'a');
            } else {
                c = (char)(res - 36 + 'A');
            }
            ret = String.valueOf(c) + ret;
            digits++;
        } while (div > 0 || digits < 6);

        return ret;
    }
}
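A usage sketch mirroring the LintCode example above; the auto-generated key shown is what a fresh counter would produce, so treat it as illustrative:

public class TinyUrl2Demo {
    public static void main(String[] args) {
        TinyUrl2 t = new TinyUrl2();
        System.out.println(t.createCustom("http://www.lintcode.com/", "lccode"));
        // -> http://tiny.url/lccode
        System.out.println(t.longToShort("http://www.lintcode.com/problem/"));
        // -> http://tiny.url/000000 (counter-based; exact key depends on prior state)
        System.out.println(t.shortToLong("http://tiny.url/lccode"));
        // -> http://www.lintcode.com/
        System.out.println(t.shortToLong("http://tiny.url/1Ab38d"));
        // -> null
    }
}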
A C++ solution to the same follow-up, from http://www.jianshu.com/p/494e0dca106b:

#include <algorithm>
#include <map>
#include <string>
using namespace std;

class TinyUrl2 {
private:
    const long long hashSize = 56800235584LL;  // 62^6 (note: "62 ^ 6" in C++ is XOR, not a power)
    map<long long, string> id2long;
    map<string, long long> long2id;             // declared in the original but unused below
    const string tinyHeader = "http://tiny.url/";
    map<string, string> custom2long;
    map<string, string> long2custom;

public:
    /**
     * @param long_url a long url
     * @param a short key
     * @return a short url starts with http://tiny.url/
     */
    string createCustom(string& long_url, string& short_key) {
        string tinyURL = tinyHeader + short_key;
        if (custom2long.count(short_key) > 0) {
            if (custom2long[short_key] == long_url) {
                return tinyURL;
            } else {
                return "error";
            }
        }

        if (long2custom.count(long_url) > 0) {
            if (long2custom[long_url] == short_key) {
                return tinyURL;
            } else {
                return "error";
            }
        }

        long long id = shortURL2id(short_key);
        if (id2long.count(id) > 0) {
            if (id2long[id] == long_url) {
                return tinyURL;
            } else {
                return "error";
            }
        }

        custom2long[short_key] = long_url;
        long2custom[long_url] = short_key;

        return tinyURL;
    }

    char number2char(int n) {
        if (n <= 9) {
            return '0' + n;
        } else if (n <= 35) {
            return 'a' + (n - 10);
        } else {
            return 'A' + (n - 36);
        }
    }

    long long char2number(char c) {
        if ('0' <= c && c <= '9') {
            return c - '0';
        } else if ('a' <= c && c <= 'z') {
            return c - 'a' + 10;
        } else if ('A' <= c && c <= 'Z') {
            return c - 'A' + 36;
        }
        return 0;
    }

    long long shortURL2id(string url) {
        long long id = 0;
        for(int i = 0; i < url.size(); i++) {
            id = id * 62 + char2number(url[i]);
        }
        return id;
    }

    string longToShort(string& url) {
        if (long2custom.count(url) > 0) {
            return tinyHeader + long2custom[url];
        }

        // Polynomial rolling hash of the long URL, taken mod 62^6.
        long long number = 0;
        for (int i = 0; i < url.size(); i++) {
            number = (number * 256 + (long long)url[i]) % hashSize;
        }

        // Linear probing: skip ids owned by other URLs; stop early if this
        // URL already owns an id, making re-registration idempotent.
        while (id2long.count(number) > 0 && id2long[number] != url) {
            number++;
        }
        id2long[number] = url;

        // Base62-encode the id, then zero-pad to 6 characters and reverse.
        string shortURL;
        while (number > 0) {
            int mod = number % 62;
            number /= 62;
            shortURL.push_back(number2char(mod));
        }

        while (shortURL.size() < 6) {
            shortURL.push_back('0');
        }

        reverse(shortURL.begin(), shortURL.end());
        return tinyHeader + shortURL;
    }

    /**
     * @param url a short url starts with http://tiny.url/
     * @return a long url
     */
    string shortToLong(string& url) {
        string urlWithoutHeader = url.substr(tinyHeader.size());

        long long id = shortURL2id(urlWithoutHeader);  // was int, which could truncate the id
        if (id2long.count(id) > 0) {
            return id2long[id];
        }

        if (custom2long.count(urlWithoutHeader) > 0) {
            return custom2long[urlWithoutHeader];
        }

        return "";
    }
};
[LintCode][System Design] Tiny Url: http://www.jianshu.com/p/6db232be35d7

https://www.jiuzhang.com/qa/87/
Q: In an interview I was asked:
1) How would you do the long -> short mapping? -- I answered it the way the instructor taught in class.
2) What components would be needed for 1 billion users at peak time?
I answered that you need a cache - memcache/redis,
a database - MySQL, NoSQL,
and multiple machines, which requires sharding, both horizontally and vertically.
Columns stored in the database table: ID, Url, shortUrl, status, MachineID, userID, timestamp.
I also mentioned a rate limiter and a load balancer, but didn't elaborate.
It seems the interviewer wasn't satisfied; I apparently never fully solved the problem.

A: The core of a system design interview is not listing key points but analyzing the concrete problem: "prove that your design can satisfy 1B requests."
For example, if you answer Redis: how much memory does it need? How many servers? Once you have Redis, do you still need NoSQL? Do those stores need to be distributed? What is the load balancer doing? What does it connect to? How do you rate-limit?
So: can you expand your answer instead of just listing keywords? A worked estimate like the sketch below is the kind of elaboration being asked for.
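To make that concrete, here is a back-of-envelope sizing sketch in Java. Every constant below (hot-set fraction, bytes per entry, per-node capacity, peak multiplier) is an assumption chosen for illustration, not a number from the thread:

public class CacheSizing {
    public static void main(String[] args) {
        long totalEntries = 100_000_000_000L;  // 100B mappings in total (from the assumptions above)
        double hotFraction = 0.01;             // assume ~1% of entries absorb most redirect traffic
        long bytesPerEntry = 6 + 100 + 50;     // short key + average long URL + per-key overhead (assumed)

        long hotEntries = (long) (totalEntries * hotFraction);
        double cacheGB = hotEntries * (double) bytesPerEntry / 1e9;
        long nodesByMemory = (long) Math.ceil(cacheGB / 64);      // assume 64 GB usable per cache node

        long peakQps = 100_000L * 3;           // assume peak is ~3x the 100k/sec average
        long nodesByQps = (long) Math.ceil(peakQps / 80_000.0);   // assume ~80k reads/sec per node

        System.out.printf("cache ~%.0f GB => %d nodes by memory, %d nodes by peak QPS%n",
                cacheGB, nodesByMemory, nodesByQps);
    }
}

Whatever numbers you pick, the point is to show the arithmetic that ties the requirement to the node count, and to provision for the larger of the two bounds.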
https://stackoverflow.com/questions/8557490/redirect-to-a-different-url
It's called "redirecting", and in the servlet world it is achieved with HttpServletResponse#sendRedirect():
response.sendRedirect(url);
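A minimal sketch of a redirect endpoint along those lines, assuming the servlet is mapped with a wildcard (so getPathInfo() yields "/<key>") and a hypothetical lookup() backed by the KV store. sendRedirect() issues a 302, so the 303 mentioned earlier in the design discussion is set explicitly:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RedirectServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String key = req.getPathInfo().substring(1);  // "/1Ab38c" -> "1Ab38c"
        String longUrl = lookup(key);                 // hypothetical KV-store read (Redis/HBase/...)
        if (longUrl == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        resp.setStatus(303);                          // "See Other", per the design discussion
        resp.setHeader("Location", longUrl);
        // (resp.sendRedirect(longUrl) would do the same but with a 302 status)
    }

    // Hypothetical stub: in a real deployment this queries the KV store.
    private String lookup(String key) {
        return null;
    }
}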
https://blog.codinghorror.com/url-shortening-hashes-in-practice/
  1. The URL no longer contains any hints whatsoever as to the content of the URL. It's completely opaque. The only way to find out what's behind that hyperlink is to actually click on it. This is not a great user experience for the person doing the clicking.
Read full article from How to design a tiny URL or URL shortener? - GeeksforGeeks
