Tuesday, October 6, 2015

Develop an API Rate-limit Throttling Client



Related: http://massivetechinterview.blogspot.com/2015/08/ratelimiter-discovering-google-guava.html

On the road of interview prep while working full-time: problem grinding + System Design.
Rate Limiter: I know of three approaches so far. The first is the token bucket: give each user a bucket holding a fixed number k of tokens; for requests arriving within the same minute or hour (depending on the requirement), we hand out one token per request until the count drops to 0. The second is the leaky bucket: give each user a buffer with a fixed capacity; once it is full, no further requests are served until an incoming request's timestamp differs from the oldest entry in the buffer by more than a set amount, at which point the oldest entry is popped and the new request is inserted. The third is a counter: record how many requests arrive in each minute (or second); for every new request, check how many requests fall within the current one-minute window and serve and record it only if the count is below k. This can be implemented with Redis or Memcached.

To avoid forgetting what I studied earlier, today I reviewed Tiny URL and the Rate Limiter.
Reviewing the Rate Limiter again brought some new understanding. Regarding the third (counter) approach: Memcached handles minute- and second-level granularity well, but for hours and days, if we only keep per-hour counts it is easy to get a burst of requests in the last second of one hour followed by another burst in the first second of the next hour. So for coarser granularity we need to aggregate the earlier fine-grained data into relatively coarse-grained data on one hand, and keep the most recent fine-grained data on the other, so the boundary case can still be handled. A related question is how to block further logins for 30 minutes or an hour; we can add a column such as next_available_time to the database to mark when the account unfreezes.
https://brandur.org/rate-limiting
$ curl --silent --head -i -H "Authorization: token $GITHUB_TOKEN" \
    https://api.github.com/users/brandur | grep RateLimit
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1442423816

The leaky bucket is intuitive to understand by comparing it to its real-world namesake: imagine a bucket partially filled with water and which has some fixed capacity (τ). The bucket has a leak so that some amount of water is escaping at a constant rate (T). Whenever an action that should be rate limited occurs, some amount of water flows into the bucket, with the amount being proportional to its relative costliness. If the amount of water entering the bucket is greater than the amount leaving through the leak, the bucket starts to fill. Actions are disallowed if the bucket is full.

The leaky bucket is normally implemented using a background process that simulates a leak. It looks for any active buckets that need to be drained, and drains each one in turn. In Redis, this might look like a hash that groups all buckets under a type of rate limit and which is dripped by iterating each key and decrementing it.
The naive leaky bucket’s greatest weakness is its “drip” process. If it goes offline or gets to a capacity limit where it can’t drip all the buckets that need to be dripped, then new incoming requests might be limited incorrectly. There are a number of strategies to help avoid this danger, but if we could build an algorithm without a drip, it would be fundamentally more stable.

This leads us to the leaky bucket variant called “Generic Cell Rate Algorithm” (GCRA). The name “cell” comes from a communications technology called Asynchronous Transfer Mode (ATM) which encoded data into small packets of fixed size called “cells” (as opposed to the variable size frames of IP). GCRA was the algorithm recommended by the ATM Forum for use in an ATM network’s scheduler so that it could either delay or drop cells that came in over their rate limit. Although today ATM is dead, we still catch occasional glimpses of its past innovation with examples like GCRA.
GCRA works by tracking remaining limit through a time called the “theoretical arrival time” (TAT), which is seeded on the first request by adding a duration representing its cost to the current time. The cost is calculated as a multiplier of our “emission interval” (T), which is derived from the rate at which we want the bucket to refill. When any subsequent request comes in, we take the existing TAT, subtract a fixed buffer representing the limit’s total burst capacity from it (τ + T), and compare the result to the current time. This result represents the next time to allow a request. If it’s in the past, we allow the incoming request, and if it’s in the future, we don’t. After a successful request, a new TAT is calculated by adding T.
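As a concrete illustration, here is a minimal single-process, in-memory sketch of the GCRA flow described above. The class and field names are mine, and the exact placement of the τ/T offsets varies between write-ups, so treat this as a sketch of the shape of the algorithm rather than a reference implementation.

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class GcraLimiter {
    private final Duration emissionInterval; // T: the time "cost" of one request
    private final Duration burstTolerance;   // tau: how far ahead of schedule a key may run
    private final Map<String, Instant> tatByKey = new HashMap<>();

    public GcraLimiter(Duration emissionInterval, Duration burstTolerance) {
        this.emissionInterval = emissionInterval;
        this.burstTolerance = burstTolerance;
    }

    public synchronized boolean allow(String key) {
        Instant now = Instant.now();
        // the theoretical arrival time (TAT), seeded with "now" on the first request
        Instant tat = tatByKey.getOrDefault(key, now);
        // the earliest instant at which the next request may be allowed
        Instant allowAt = tat.minus(burstTolerance);
        if (now.isBefore(allowAt)) {
            return false; // over the limit; note there is no background drip to maintain
        }
        // allowed: push the TAT forward by one emission interval
        Instant base = tat.isAfter(now) ? tat : now;
        tatByKey.put(key, base.plus(emissionInterval));
        return true;
    }
}

For example, new GcraLimiter(Duration.ofMillis(200), Duration.ofSeconds(1)) allows roughly 5 requests per second with a small burst allowance.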

9. Design an API Rate Limiter (e.g. for Firebase or Github)
Limit the number of requests an entity can send to an API within a time window, e.g., 15 requests per second.
The rate limiting should work for a distributed setup, as the APIs are accessible through a cluster of servers.
How would you handle throttling (soft and hard throttling, etc.)?

Consistent hashing - the same user and the same API go to the same server
sticky session
embedded in the application or separated out
Separation of concerns: separate the rate-limit server from the real app server so the app server is not overloaded - whether we need this depends on the product

Better user experience
Thinking from client/user perspective
How they use it, what they would like to know

warn/notify the user when they reach 90% of the limit
global or user-specific?

Whether we can change the limit

Non-functional features
Unique features
High availability, DDoS attacks
http://crackingjavainterviews.blogspot.com/2013/06/what-do-you-understand-by-token-bucket.html
We will choose a Leaky Bucket implementation, where a fixed number of tokens is added to the bucket after a predefined interval. If nobody uses those tokens, they do not accumulate over time; they simply overflow once the bucket's capacity is reached. Let's name this strategy FixedIntervalRefillStrategy.


http://blog.51cto.com/zhangfengzhe/2066683
Next, what are some common ways to cope with heavy traffic?
Caching: in plain terms, get data into the cache as early as possible and keep it close to the application, instead of hammering the DB with frequent queries.
Degradation: if a service is not on the critical path, degrade it. For example, today's apps all do personalized ranking of results; under heavy traffic that ranking step can simply be degraded away.
Rate limiting: everyone knows that during the Beijing subway's morning rush hour the stations do one thing, limit the flow of people. The idea is direct: cap the number of requests within a given time window so the system is not overwhelmed, while pushing throughput as high as possible.
Note that caching and degradation sometimes cannot solve the problem. For example, in an e-commerce 11/11 sale, purchases and order placement involve a huge volume of writes, sit on the critical path, and cannot be degraded; in that case rate limiting becomes essential.
The common rate-limiting techniques are: counters, sliding windows, leaky buckets, and token buckets.
Counter
The counter is a fairly simple and widely used rate-limiting algorithm, often applied at the interface level: count the requests within a time window, compare the count against a threshold, and reset the counter to 0 when the window rolls over.
(From: summary of API rate-limiting algorithms)
public class CounterDemo {
    public long timeStamp = getNowTime();
    public int reqCount = 0;
    public final int limit = 100;      // max requests within the time window
    public final long interval = 1000; // time window in ms

    public boolean grant() {
        long now = getNowTime();
        if (now < timeStamp + interval) {
            // still inside the current time window
            reqCount++;
            // check whether the window's maximum request count has been exceeded
            return reqCount <= limit;
        } else {
            // window expired: reset and count this request
            timeStamp = now;
            reqCount = 1;
            return true;
        }
    }

    private long getNowTime() {
        return System.currentTimeMillis();
    }
}

From the figure we can see the problem: suppose a malicious user sends 100 requests in the instant at 0:59 and another 100 requests in the instant at 1:00. That user has then sent 200 requests within a single second. Our rule was at most 100 requests per minute, i.e. roughly 1.7 requests per second, yet by bursting right at the reset point of the window the user can momentarily blow past the rate limit, and could even use this loophole to crush the application.
Sharp readers will have noticed that the root cause is that our counting granularity is too coarse. How can we handle this well, or at least reduce the impact of the boundary problem? Look at the sliding window algorithm below.
Sliding window, also called a rolling window. We introduce the sliding window algorithm to solve this problem; if you have studied TCP, the term will be familiar. The idea is as follows:
The entire time window (one minute in our example) is subdivided into cells; here it is split into 6 cells, so each cell covers 10 seconds. Every 10 seconds the window slides one cell to the right. Each cell has its own independent counter; for example, a request arriving at 0:35 increments the counter of the 0:30-0:39 cell.
How does the sliding window solve the boundary problem? The 100 requests arriving at 0:59 fall into one cell, and the requests arriving at 1:00 fall into the next cell. When the clock reaches 1:00 the window slides right by one cell, so the total number of requests inside the window is 200, which exceeds the limit of 100, and the rate limit is correctly triggered.
Looking back at the counter algorithm, we can see it is really just a sliding window with a single cell: it never subdivides the time window.
It follows that the more cells the sliding window is divided into, the smoother the sliding becomes and the more accurate the rate-limiting statistics are. A per-cell counter sketch follows.
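A minimal in-memory sketch of the per-cell counter described above; the class and field names are mine, and the 100-per-minute / 6-cell configuration in the usage comment just mirrors the article's example.

public class SlidingWindowLimiter {
    private final int limit;         // max requests allowed across the whole window
    private final int slotCount;     // number of cells the window is divided into
    private final long slotMillis;   // length of one cell in ms
    private final long[] counts;     // per-cell request counters
    private final long[] slotStarts; // start time of the period each cell currently represents

    public SlidingWindowLimiter(int limit, int slotCount, long windowMillis) {
        this.limit = limit;
        this.slotCount = slotCount;
        this.slotMillis = windowMillis / slotCount;
        this.counts = new long[slotCount];
        this.slotStarts = new long[slotCount];
    }

    public synchronized boolean grant() {
        long now = System.currentTimeMillis();
        long currentSlotStart = (now / slotMillis) * slotMillis;
        int idx = (int) ((now / slotMillis) % slotCount);
        // the cell we landed in may still hold counts from an older rotation: reset it first
        if (slotStarts[idx] != currentSlotStart) {
            slotStarts[idx] = currentSlotStart;
            counts[idx] = 0;
        }
        // sum every cell whose period still falls inside the trailing window
        long total = 0;
        for (int i = 0; i < slotCount; i++) {
            if (now - slotStarts[i] < (long) slotCount * slotMillis) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false; // the trailing window has already hit the limit
        }
        counts[idx]++;
        return true;
    }
}

// usage: 100 requests per minute, tracked in 6 cells of 10 seconds each
// SlidingWindowLimiter limiter = new SlidingWindowLimiter(100, 6, 60_000);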


Although the sliding window effectively avoids the boundary problem, it still depends on the notion of time slices. The leaky bucket algorithm is a step further than the sliding window in this respect.
There is a fixed bucket: water flows in at an unpredictable rate, but flows out at a constant rate, and when the bucket is full the excess overflows.
The algorithm is actually very simple. We have a bucket of fixed capacity, with water flowing in and water flowing out. We cannot predict how much water will flow in or how fast, but the rate at which water leaves the bucket is fixed, and once the bucket is full any extra water overflows.
Replace the water with requests and you can see that the leaky bucket inherently limits the rate of requests: the interface processes requests at a constant rate, so by construction the boundary problem cannot occur.









public class LeakyDemo {
    public long timeStamp = getNowTime();
    public int capacity; // bucket capacity
    public int rate;     // rate at which water leaks out
    public int water;    // current amount of water (accumulated requests)

    public boolean grant() {
        long now = getNowTime();
        // leak first: compute the remaining water based on the elapsed time
        water = Math.max(0, water - (int) ((now - timeStamp) * rate));
        timeStamp = now;
        if ((water + 1) < capacity) {
            // bucket not yet full: add a drop and accept the request
            water += 1;
            return true;
        } else {
            // bucket full: reject
            return false;
        }
    }

    private long getNowTime() {
        return System.currentTimeMillis();
    }
}

Note that the leaky bucket drains at a constant rate, which means that under a sudden burst of traffic most of the requests will be discarded (the so-called overflow). The token bucket algorithm was an improvement aimed at this problem.
Tokens are generated at a constant rate, while there is no limit on how fast requests can take them. This means that under a sudden burst the algorithm allows a large number of tokens to be taken in a short time, and taking a token is cheap. (There is a flavor of producing and consuming tokens here.)
Whether it is the token bucket rejecting requests that find no token, or the leaky bucket overflowing when full, the point is to keep the majority of traffic working normally at the cost of a small part of it. That is a reasonable trade: trying to serve that last small fraction could push the system past its limit and take everything down, which is not worth it.

As the figure shows, the token bucket is slightly more complex than the leaky bucket. We have a bucket of fixed capacity that holds tokens. The bucket starts empty, and tokens are added at a fixed rate r until the capacity is reached; extra tokens are discarded. Each incoming request tries to remove one token from the bucket; if there is no token available, the request is not allowed through.
public class TokenBucketDemo {
    public long timeStamp = getNowTime();
    public int capacity; // bucket capacity
    public int rate;     // rate at which tokens are added
    public int tokens;   // current number of tokens

    public boolean grant() {
        long now = getNowTime();
        // refill tokens first, based on the elapsed time, capped at the capacity
        tokens = Math.min(capacity, tokens + (int) ((now - timeStamp) * rate));
        timeStamp = now;
        if (tokens < 1) {
            // fewer than one token left: reject
            return false;
        } else {
            // tokens available: take one and accept
            tokens -= 1;
            return true;
        }
    }

    private long getNowTime() {
        return System.currentTimeMillis();
    }
}



Consider the boundary scenario again. At 0:59 the bucket has accumulated 100 tokens, so those 100 requests can pass instantly. But because tokens refill at a much lower rate, by 1:00 the bucket cannot possibly hold 100 tokens again, so another 100 requests cannot get through. The token bucket therefore handles the boundary problem well. The figure compared the rate around the boundary for the counter (left) and the token bucket (right): the token bucket does allow bursts, but the next burst can only happen once enough tokens have accumulated in the bucket again.

Counter vs. sliding window

The counter is the simplest algorithm and can be seen as a low-precision sliding window. Because the sliding window has to store one counter per cell, it needs more memory: the finer the sliding window's precision, the more storage it requires.

Leaky bucket vs. token bucket

The most visible difference is that the token bucket allows a certain degree of burstiness: with the default token bucket, taking a token costs no time, so if the bucket holds 100 tokens, 100 requests can pass through in an instant.
Because it is simple to implement, tolerates some bursts, and is friendly to users, the token bucket is the one most widely adopted in industry. Of course, every situation needs its own analysis: there is only the most suitable algorithm, not a single best one.

All of the rate-limiting techniques above are for a single machine, and in most scenarios single-machine limiting is enough. Distributed rate limiting usually combines several technologies, e.g. Nginx+Lua or Redis+Lua. This article focuses on single-machine limiting, so distributed limiting is not covered in detail here.
In one sentence: let traffic queue up and be rate-limited before it ever hits the system directly.
Develop an API Rate-limit Throttling Client
The task: write an API that calls a third-party API; if too many requests arrive within one second, your own API should simply drop them.
The interviewer gave this skeleton:
import time
import datetime

class GoogleMapsClient(object):
    """3rd party maps client; we CANT EDIT THIS."""

    def __init__(self):
        self.requests_made = 0

    def make_request(self):
        self.requests_made += 1
        now = datetime.datetime.now().time()
        return "%d - %s - San Francisco" % (self.requests_made, now)


At first I wasn't familiar with the units used by Python's time functions and had a bug; here is the corrected version:
class MyRequest(object):

    def __init__(self):
        self.client = GoogleMapsClient()
        self.time = time.time()
        self.request_times = 0

    def make_request(self):
        # reset the one-second window once it has elapsed
        if (time.time() - self.time) > 1:
            self.time = time.time()
            self.request_times = 0
        else:
            self.request_times += 1
        if self.request_times >= 10:
            time.sleep(1)
        return self.client.make_request()


The follow-up was how to let my client keep accepting requests. Maybe I didn't fully understand the question; the interviewer just said time.sleep(1).

Isn't the optimal solution a token bucket or a leaky bucket, though?

var rate = 5.0;            // unit: messages
var per  = 8.0;            // unit: seconds
var allowance = rate;      // unit: messages
var lastTime = new Date(); // timestamp of the previous message

var msg;
while ((msg = recv())) {
  var currentTime = new Date();
  var timeElapsed = (currentTime - lastTime) / 1000; // seconds since last message
  lastTime = currentTime;
  allowance += timeElapsed * (rate / per);           // refill the allowance

  if (allowance > rate) {
    allowance = rate; // cap at the maximum burst size
  }

  if (allowance < 1.0) {
    discard(msg);     // over the limit
  } else {
    forward(msg);
    allowance -= 1.0;
  }
}

The idea of the token bucket is that there is, again, a bucket: tokens are added to it at a constant rate, the number of tokens in the bucket has an upper bound, and each request acquires one token. If a request arrives and the bucket has no tokens left, that request is considered over the limit. Unlike the leaky bucket, the token bucket has a burst rate. For example, if tokens are added at 4 per second and the bucket holds at most 8, and there are no requests during the first second, then the second second can actually serve 8 requests: the average rate is still 4 per second, but the burst rate is 8 per second.

Although the token bucket has a burst rate, you can simply tune it to equal the steady rate, and the implementation needs no timer.
The token bucket works by computing how many tokens have accrued between the previous request and the current one, adding them to the bucket, and checking whether the bucket now holds enough tokens for this request. If not, the request is treated as over the limit; otherwise the corresponding tokens are deducted and the last-request time is set to the current request's time. Note the take() method below.
As for adjusting burst rate versus steady rate, the 4-per-second example already hints at it: if the bucket size equals the number of tokens per second, then the per-second burst rate equals the steady rate. That is exactly what the TokenBucket constructor below does.

static class TokenBucket {
  private final int capacity;
  private final int tokensPerSeconds;
  private int tokens = 0;
  private long timestamp = System.currentTimeMillis();

  public TokenBucket(int tokensPerUnit, TimeUnit unit) {
    // capacity == tokens per second, so the burst rate equals the steady rate
    capacity = tokensPerSeconds = (int) (tokensPerUnit / unit.toSeconds(1L));
  }

  public boolean take() {
    long now = System.currentTimeMillis();
    // add the tokens accrued since the last call, capped at the capacity
    tokens += (int) ((now - timestamp) * tokensPerSeconds / 1000);
    if (tokens > capacity) tokens = capacity;
    timestamp = now;
    if (tokens < 1) return false; // not enough tokens: this request is over the limit
    tokens--;
    return true;
  }
}
http://crackingjavainterviews.blogspot.com/2013/06/what-do-you-understand-by-token-bucket.html
TokenBucket's consume() method accepts the number of tokens to consume. It removes that many tokens from the bucket, refilling the bucket first if required. The method uses AtomicLong's CAS (compareAndSet) operation to make the update atomic, so no locking is required; this keeps the class thread-safe when multiple threads demand tokens simultaneously.
import java.util.concurrent.atomic.AtomicLong;

public class TokenBucket {
    private final RefillStrategy refillStrategy;
    private final long capacity;
    private AtomicLong size;

    public TokenBucket(long capacity, RefillStrategy refillStrategy) {
        this.refillStrategy = refillStrategy;
        this.capacity = capacity;
        this.size = new AtomicLong(0L);
    }

    public void consume(long numTokens) throws InterruptedException {
        if (numTokens <= 0)
            throw new RuntimeException("Number of tokens to consume must be positive");
        if (numTokens >= capacity)
            throw new RuntimeException("Number of tokens to consume must be less than the capacity of the bucket");

        long newTokens = Math.max(0, refillStrategy.refill());
        while (!Thread.currentThread().isInterrupted()) {
            long existingSize = size.get();
            long newValue = Math.max(0, Math.min(existingSize + newTokens, capacity));
            if (numTokens <= newValue) {
                newValue -= numTokens;
                if (size.compareAndSet(existingSize, newValue))
                    break;
            } else {
                Thread.sleep(refillStrategy.getIntervalInMillis());
                newTokens = Math.max(0, refillStrategy.refill());
            }
        }
    }


    @Override
    public String toString() {
        return "Capacity : " + capacity + ", Size : " + size;
    }

    public static interface RefillStrategy {
        long refill();
        long getIntervalInMillis();
    }
}


public final class TokenBuckets {

    private TokenBuckets() {}

    public static TokenBucket newFixedIntervalRefill(long capacityTokens, long refillTokens, long period, TimeUnit unit)
    {
        TokenBucket.RefillStrategy strategy = new FixedIntervalRefillStrategy(refillTokens, period, unit);
        return new TokenBucket(capacityTokens, strategy);
    }

}

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class FixedIntervalRefillStrategy implements TokenBucket.RefillStrategy {
    private final long numTokens;
    private final long intervalInMillis;
    private AtomicLong nextRefillTime;

    public FixedIntervalRefillStrategy(long numTokens, long interval, TimeUnit unit) {
        this.numTokens = numTokens;
        this.intervalInMillis = unit.toMillis(interval);
        this.nextRefillTime = new AtomicLong(-1L);
    }

    public long refill() {
        final long now = System.currentTimeMillis();
        final long refillTime = nextRefillTime.get();
        if (now < refillTime) {
            return 0;
        }

        return nextRefillTime.compareAndSet(refillTime, now + intervalInMillis) ? numTokens : 0;
    }

    public long getIntervalInMillis() {
        return intervalInMillis;
    }

}
Leaky Bucket Algorithm
  1. when the user wishes to do a controlled action (e.g. send an SMS), he tries to add a “drop” of water to the “bucket” using addDropToBucket()
  2. addDropToBucket() checks to see whether some drops should have leaked out since the last call. If so, it lets them leak
  3. then addDropToBucket() checks to see if there is space in the bucket for a single drop. If there is, it adds the drop and returns true, otherwise it returns false
  4. if the user receives “true” he carries out the controlled action, otherwise he doesn’t
    /**
     * Leaky bucket algorithm to prevent huge amounts of SMS text messages
     * from being dispatched by any insane processes. Each SMS message
     * sent adds a drop to the bucket which leaks at a constant rate.
     * Once the bucket fills, no message can be sent until a drop has
     * leaked out.
     */
    private class LeakyBucketLimiter {
        private int numDropsInBucket = 0;
        private Date timeOfLastDropLeak = null;
        private final int _BUCKET_SIZE_IN_DROPS = 20;
        private final long _MS_BETWEEN_DROP_LEAKS = 1000 * 60 * 60; // 1 hour

        public synchronized boolean addDropToBucket() {
            Date now = new Date();
            // first of all, let the bucket leak by the appropriate amount
            if (timeOfLastDropLeak != null) {
                long deltaT = now.getTime() - timeOfLastDropLeak.getTime();
                // note round down as part of integer arithmetic
                long numberToLeak = deltaT / _MS_BETWEEN_DROP_LEAKS;
                if (numberToLeak > 0) { //now go and do the leak
                    if (numDropsInBucket <= numberToLeak) {
                        numDropsInBucket = 0;
                    } else {
                        numDropsInBucket -= (int) numberToLeak;
                    }
                    timeOfLastDropLeak = now;
                }
            }
            if (numDropsInBucket < _BUCKET_SIZE_IN_DROPS) {
                numDropsInBucket++;
                return true; // drop added
            }
            return false; // overflow
        }
    }
    // here is how you use it
    bucketLimiter = new LeakyBucketLimiter();
    if (bucketLimiter.addDropToBucket()) {
        // dispatch SMS
    }
http://sharecore.net/2014/06/21/%E8%BF%87%E8%BD%BD%E4%BF%9D%E6%8A%A4%E7%AE%97%E6%B3%95%E6%B5%85%E6%9E%90/


http://blog.csdn.net/big_gutan/article/details/46413167

Counter (simple and crude)

Maintain a counter for the current time window: increment it on every request, and reset it once the window expires.
long timeStamp = getNowTime();
int reqCount = 0;
const int rate = 100;       // max requests per time window
const long interval = 1000; // time window in ms

bool grant() {
    long now = getNowTime();
    if (now < timeStamp + interval) {  // still inside the window
        reqCount++;
        return reqCount <= rate;       // reject once the window's maximum is exceeded
    } else {
        timeStamp = now;               // window expired: reset and count this request
        reqCount = 1;
        return true;
    }
}

Leaky Bucket algorithm

Water (requests) flows into the leaky bucket; the bucket drains at a constant rate, and when water flows in faster than it drains, the bucket overflows.
(diagram: leaky bucket algorithm)
long timeStamp = getNowTime();
int capacity;   // bucket capacity
int rate;       // rate at which water leaks out
int water;      // current amount of water

bool grant() {
    // leak first: since rate is fixed, "elapsed time * rate" is the amount leaked
    long now = getNowTime();
    water = max(0, water - (now - timeStamp) * rate);
    timeStamp = now;

    if (water < capacity) { // bucket not full: add a drop and accept
        water++;
        return true;
    } else {
        return false;       // bucket full: reject
    }
}

Token Bucket algorithm

Tokens are added to the bucket at a constant rate; each incoming request must take a token from the bucket, and when requests arrive too fast, taking a token fails.
(diagram: token bucket algorithm)
Google's open-source Guava library provides the rate-limiting utility class RateLimiter, which implements limiting based on the token bucket algorithm.
long timeStamp = getNowTime();
int capacity;   // bucket capacity
int rate;       // rate at which tokens are added
int tokens;     // current number of tokens

bool grant() {
    // add the accrued tokens first, capped at the capacity
    long now = getNowTime();
    tokens = min(capacity, tokens + (now - timeStamp) * rate);
    timeStamp = now;
    if (tokens < 1) {
        // no tokens left: reject
        return false;
    } else {
        // tokens available: take one and accept
        tokens--;
        return true;
    }
}
Building on the counter approach above, split the unit time into smaller slices, keep a counter per slice, and as time moves forward discard the counters that fall outside the unit time while summing all counters that remain inside it.

http://www.cnblogs.com/LBSer/p/4083131.html
Google's open-source Guava library provides the rate-limiting utility class RateLimiter, which is based on the token bucket algorithm and is very easy to use. See the RateLimiter API documentation for the interface description and the usage guides for concrete examples. A minimal usage sketch follows, and after it are excerpts from RateLimiter's internals (comments translated).
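The sketch below only exercises Guava's public RateLimiter API (create, acquire, tryAcquire); the rate of 5 permits per second is an arbitrary example, not anything prescribed by the sources above.

import com.google.common.util.concurrent.RateLimiter;

public class RateLimiterExample {
    public static void main(String[] args) {
        RateLimiter limiter = RateLimiter.create(5.0); // 5 permits per second

        for (int i = 0; i < 10; i++) {
            double waited = limiter.acquire();         // blocks until a permit is available
            System.out.printf("request %d, waited %.3fs%n", i, waited);
        }

        // non-blocking variant: drop the request instead of waiting
        if (limiter.tryAcquire()) {
            System.out.println("request allowed");
        } else {
            System.out.println("request throttled");
        }
    }
}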
public double acquire() {
    return acquire(1);
}

public double acquire(int permits) {
    checkPermits(permits);  // validate the argument (must be > 0)
    long microsToWait;
    synchronized (mutex) {  // synchronize for concurrent callers
        microsToWait = reserveNextTicket(permits, readSafeMicros()); // how long this caller must wait
    }
    ticker.sleepMicrosUninterruptibly(microsToWait); // sleep; 0 when the limit has not been reached
    return 1.0 * microsToWait / TimeUnit.SECONDS.toMicros(1L);
}

private long reserveNextTicket(double requiredPermits, long nowMicros) {
    resync(nowMicros); // top up the stored permits (tokens)
    long microsToNextFreeTicket = nextFreeTicketMicros - nowMicros;
    double storedPermitsToSpend = Math.min(requiredPermits, this.storedPermits); // permits consumed by this request
    double freshPermits = requiredPermits - storedPermitsToSpend;

    long waitMicros = storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
            + (long) (freshPermits * stableIntervalMicros);

    this.nextFreeTicketMicros = nextFreeTicketMicros + waitMicros;
    this.storedPermits -= storedPermitsToSpend; // subtract the consumed permits
    return microsToNextFreeTicket;
}

private void resync(long nowMicros) {
    // if nextFreeTicket is in the past, resync to now
    if (nowMicros > nextFreeTicketMicros) {
        storedPermits = Math.min(maxPermits,
                storedPermits + (nowMicros - nextFreeTicketMicros) / stableIntervalMicros);
        nextFreeTicketMicros = nowMicros;
    }
}
E-commerce topic I: business rate limiting in a cluster environment
Monitor access to a given URL / form submission / Ajax request in real time, find the IPs that request too frequently, and limit the request rate of those IPs.
For an open platform, calls to a given open API are bounded in frequency, e.g. a single App Key may not exceed 150 calls per hour.

A simple implementation of the token bucket algorithm is recommended.
The main difference between the leaky bucket and the token bucket: the leaky bucket strictly caps the data transmission rate, while the token bucket caps the average transmission rate yet still allows a certain degree of burst transmission.
With a token bucket, as long as tokens remain in the bucket, data may be sent in bursts up to the user-configured threshold, so it suits traffic with bursty characteristics.
Leaky bucket
Token Bucket: the token bucket algorithm is one of the most commonly used algorithms in network traffic shaping and rate limiting.
The leaky bucket algorithm is not flexible enough, so a token mechanism is added.
Basic idea:
The use of the token bucket for traffic shaping is illustrated in Figure 2.1 below.
http://images.cnblogs.com/cnblogs_com/zhengyun_ustc/255879/o_clipboard121112.png
http://images.cnblogs.com/cnblogs_com/zhengyun_ustc/255879/o_clipboard32.png
    Figure 2.1: Traffic control with CAR and CTS
We focus mainly on the Committed Access Rate (CAR) mode, that is:
a. Tokens are put into the token bucket at a specified rate;
b. Packets are first classified against preset matching rules; packets that do not match bypass the token bucket and are sent directly;
c. Packets that match the rules are handled by the token bucket: if the bucket holds enough tokens, the packet may be sent, and the token count is reduced according to the packet length;
d. When the bucket does not hold enough tokens, the packet cannot be sent (it is dropped) until new tokens are generated. This restricts the traffic to at most the rate at which tokens are generated, achieving the rate-limiting goal.

Implementation:
In terms of data structures, there is no need to actually build a token bucket.
It is enough to generate a controlled number of tokens based on the passage of time: tie the interval between two consecutive sends (or receives) to the number of tokens accrued, letting elapsed time wash away the old record.
http://www.slideshare.net/vimal25792/leaky-bucket-tocken-buckettraffic-shaping

http://zim.logdown.com/posts/300977-distributed-rate-limiter

http://blog.csdn.net/u013815546/article/details/63253495
http://xiaobaoqiu.github.io/blog/2017/02/04/api-blocking/
Rate limiting with Redis

The command reference on the Redis site uses exactly this as an example: implementing API rate limiting with the INCR command. See: https://redis.io/commands/incr/
In short, generate a key per request (from IP + API URL, or just the API URL), use the value as a counter, and set an expiry on the key (see the sketch after this list).
Note that Redis's key-expiration policy is a mix of two strategies:
1. Passive deletion: when an expired key is read or written, lazy expiration kicks in and the key is deleted on the spot;
2. Active deletion: Redis periodically (by default roughly every 100 ms) evicts a batch of expired keys, and active cleanup is also triggered when memory usage exceeds the configured limit.
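As a sketch of that INCR + EXPIRE pattern using the Jedis client (class and key names are mine; note that the INCR and EXPIRE below are two separate commands, so a strict implementation would wrap them in a Lua script or MULTI/EXEC so the expiry cannot be lost):

import redis.clients.jedis.Jedis;

public class RedisFixedWindowLimiter {
    private final Jedis jedis;
    private final int limit;         // max requests per window
    private final int windowSeconds; // window length

    public RedisFixedWindowLimiter(Jedis jedis, int limit, int windowSeconds) {
        this.jedis = jedis;
        this.limit = limit;
        this.windowSeconds = windowSeconds;
    }

    public boolean grant(String ip, String apiUrl) {
        // one counter per (ip, api, window), as suggested above
        long window = System.currentTimeMillis() / 1000 / windowSeconds;
        String key = "ratelimit:" + ip + ":" + apiUrl + ":" + window;
        long count = jedis.incr(key);
        if (count == 1) {
            // first hit in this window: set the TTL so the key cleans itself up
            jedis.expire(key, windowSeconds);
        }
        return count <= limit;
    }
}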
https://stripe.com/blog/rate-limiters
When you're running a production API, not only do you have to make it robust with techniques like idempotency, you also need to protect it from being overwhelmed by traffic, which is where rate limiting and load shedding come in.

Rate limiters are amazing for day-to-day operations, but during incidents (for example, if a service is operating more slowly than usual), we sometimes need to drop low-priority requests to make sure that more critical requests get through. This is called load shedding

A load shedder makes its decisions based on the whole state of the system rather than the user who is making the request. Load shedders help you deal with emergencies, since they keep the core part of your business working while the rest is on fire.

Request rate limiter
After analyzing our traffic patterns, we added the ability to briefly burst above the cap for sudden spikes in usage during real-time events (e.g. a flash sale.)

Concurrent requests limiter
Instead of “You can use our API 1000 times a second”, this rate limiter says “You can only have 20 API requests in progress at the same time”. Some endpoints are much more resource-intensive than others, and users often get frustrated waiting for the endpoint to return and then retry. These retries add more demand to the already overloaded resource, slowing things down even more. The concurrent rate limiter helps address this nicely.
Our concurrent request limiter is triggered much less often (12,000 requests this month), and helps us keep control of our CPU-intensive API endpoints. Before we started using a concurrent requests limiter, we regularly dealt with resource contention on our most expensive endpoints caused by users making too many requests at one time. The concurrent request limiter totally solved this.
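A minimal in-process sketch of the idea: a bounded number of in-flight requests, rejected rather than queued once the bound is reached. The Semaphore-based approach and the limit of 20 are illustrative assumptions, not Stripe's implementation, and http429()/handleRequest() in the usage comment are placeholders.

import java.util.concurrent.Semaphore;

public class ConcurrentRequestLimiter {
    private final Semaphore inFlight;

    public ConcurrentRequestLimiter(int maxConcurrent) {
        this.inFlight = new Semaphore(maxConcurrent);
    }

    public boolean tryEnter() {
        // non-blocking: reject immediately instead of queueing when saturated
        return inFlight.tryAcquire();
    }

    public void exit() {
        inFlight.release();
    }
}

// usage:
// ConcurrentRequestLimiter limiter = new ConcurrentRequestLimiter(20);
// if (!limiter.tryEnter()) { return http429(); }
// try { handleRequest(); } finally { limiter.exit(); }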
Fleet usage load shedder
Using this type of load shedder ensures that a certain percentage of your fleet will always be available for your most important API requests.
We divide up our traffic into two types: critical API methods (e.g. creating charges) and non-critical methods (e.g. listing charges.) We have a Redis cluster that counts how many requests we currently have of each type.

We always reserve a fraction of our infrastructure for critical requests. If our reservation number is 20%, then any non-critical request over their 80% allocation would be rejected with status code 503.
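A tiny sketch of that check; the class and parameter names are mine, and how the in-flight counts are tracked (e.g. in the Redis cluster mentioned above) is left out.

public class FleetUsageShedder {
    private final double criticalReservation; // e.g. 0.20 reserves 20% of capacity for critical requests

    public FleetUsageShedder(double criticalReservation) {
        this.criticalReservation = criticalReservation;
    }

    /** Returns true if this request should be rejected with HTTP 503. */
    public boolean shouldShed(boolean isCritical, long inFlightNonCritical, long totalCapacity) {
        if (isCritical) {
            return false; // critical requests are never shed by this limiter
        }
        long nonCriticalBudget = (long) (totalCapacity * (1.0 - criticalReservation));
        return inFlightNonCritical >= nonCriticalBudget; // non-critical traffic over its 80% share
    }
}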
Worker utilization load shedder
Most API services use a set of workers to independently respond to incoming requests in a parallel fashion. This load shedder is the final line of defense. If your workers start getting backed up with requests, then this will shed lower-priority traffic.
This one gets triggered very rarely, only during major incidents.
We divide our traffic into 4 categories:
  1. Critical methods
  2. POSTs
  3. GETs
  4. Test mode traffic
We track the number of workers with available capacity at all times. If a box is too busy to handle its request volume, it will slowly start shedding less-critical requests, starting with test mode traffic. If shedding test mode traffic gets it back into a good state, great! We can start to slowly bring traffic back. Otherwise, it’ll escalate and start shedding even more traffic.
It’s very important that shedding load and bringing it back happen slowly.



https://helpx.adobe.com/coldfusion/api-manager/throttling-and-rate-limiting.html
When a throttle limit is crossed, the server sends the user an HTTP 429 status with the message content "too many requests".
Throttling in API manager can be of two types: Hard and soft.
Hard: The number of API requests cannot exceed the throttle limit.
Soft: the API request limit can be exceeded by a configured percentage. For example, if you set the exceed limit to 90%, the user gets a notification once that threshold is crossed.
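One possible reading of the two modes as code; the Decision enum, the names, and the thresholds are illustrative assumptions, not the API Manager's actual behavior.

public class ThrottleCheck {
    public enum Decision { ALLOW, NOTIFY, REJECT }

    private final int limit;            // configured request limit for the window
    private final double exceedPercent; // soft mode: how far past the limit is still tolerated, e.g. 0.9 = 90%
    private final boolean hard;         // hard mode rejects at the limit outright

    public ThrottleCheck(int limit, double exceedPercent, boolean hard) {
        this.limit = limit;
        this.exceedPercent = exceedPercent;
        this.hard = hard;
    }

    public Decision check(int requestsInWindow) {
        if (requestsInWindow <= limit) {
            return Decision.ALLOW;
        }
        if (hard) {
            return Decision.REJECT; // hard throttle: answer with HTTP 429
        }
        int softCeiling = (int) (limit * (1.0 + exceedPercent));
        // soft throttle: tolerate exceeding the limit up to the configured
        // percentage, but notify the user once the limit has been crossed
        return requestsInWindow <= softCeiling ? Decision.NOTIFY : Decision.REJECT;
    }
}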


The publisher assigns multiple plans to APIs while publishing them; the subscriber selects one of those plans when consuming the APIs.
https://medium.com/smyte/rate-limiter-df3408325846
While Redis is great for quickly taking a prototype to production, it has a lot of downsides:
  • It’s hard to scan over subsets of the keyspace.
  • No built-in key compression. For applications that have many keys that share the same prefix and relatively small values (like our rate limiter), this results in much higher storage requirements.
  • Redis in production isn’t easy. We decided not to trust the new sentinel mode just yet, and the standard leader-follower replication is far from failsafe.
  • BGSAVE is problematic. It is required for leader-follower replication and uses the fork() syscall which then operates in copy-on-write memory. For write-heavy workloads such as rate-limiting, every write likely will cause that entire page to be copied for the BGSAVE process. This means the operation might not complete unless you have double the amount of memory available.

The industry standard algorithm for rate limiting is called a token bucket, sometimes called a “leaky bucket”. Each bucket has a string key and initially contains the maximum number of tokens. Every time an event occurs, you check if the bucket contains enough tokens and reduce the number of tokens in the bucket by the requested amount. After a period of time called the refill time, the number of tokens in the bucket is increased by the refill amount. Over time, these refills will fill up the bucket to the maximum number of tokens.
  • “Each user account is allowed a maximum of 10 failed logins per hour, which refill at a rate of 1 login per hour”.
  • key: a unique byte string that identifies the bucket
  • maxAmount: the maximum number of tokens the bucket can hold
  • refillTime: the amount of time between refills
  • refillAmount: the number of tokens that are added to the bucket during a refill
  • value: the current number of tokens in the bucket
  • lastUpdate: the last time the bucket was updated



https://gist.github.com/petehunt/08c9fef5703e79ca26ceed845163dcdc
The example shows it’s very easy to implement this algorithm in memory, and can be quite cheap since we’re only storing two integers for each bucket. One problem with implementing this in memory is persistence. We may have very long-lived rate limits in our application, and we can’t afford to lose them if we take down or move a rate limit shard.
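A minimal in-memory sketch of that refill-on-read bucket, using the field names from the list above; per bucket it stores only value and lastUpdate, which is the "two integers" the paragraph mentions.

import java.util.HashMap;
import java.util.Map;

public class RefillOnReadBucket {
    private static class Bucket {
        long value;      // current number of tokens
        long lastUpdate; // last time the bucket was refilled (ms)
    }

    private final long maxAmount;    // maximum number of tokens the bucket can hold
    private final long refillTimeMs; // time between refills
    private final long refillAmount; // tokens added per refill
    private final Map<String, Bucket> buckets = new HashMap<>();

    public RefillOnReadBucket(long maxAmount, long refillTimeMs, long refillAmount) {
        this.maxAmount = maxAmount;
        this.refillTimeMs = refillTimeMs;
        this.refillAmount = refillAmount;
    }

    public synchronized boolean reduce(String key, long tokens) {
        long now = System.currentTimeMillis();
        Bucket b = buckets.computeIfAbsent(key, k -> {
            Bucket fresh = new Bucket();
            fresh.value = maxAmount; // a new bucket starts full
            fresh.lastUpdate = now;
            return fresh;
        });
        // apply however many whole refills have elapsed since lastUpdate
        long refills = (now - b.lastUpdate) / refillTimeMs;
        if (refills > 0) {
            b.value = Math.min(maxAmount, b.value + refills * refillAmount);
            b.lastUpdate += refills * refillTimeMs;
        }
        if (b.value < tokens) {
            return false; // not enough tokens: rate limited
        }
        b.value -= tokens;
        return true;
    }
}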
Related: Architecture essentials: what rate limiting does and the common ways to do it (架构必备:Rate limiting 的作用和常见方式)
