Sunday, August 30, 2015

Redis Internals



https://www.xilidou.com/2018/03/20/redis-server/
Let's start with the code in server.h/redisServer:
struct redisServer {
    ...
    /* Array of databases */
    redisDb *db;
    /* Number of databases */
    int dbnum;
    ...
};
Next, the redisDb code:
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP) */
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
} redisDb;
In short, a Redis server contains a number of redisDb databases (16 by default).
Redis is a key-value database. The dict field holds all of the database's key-value pairs; this dictionary is called the keyspace.
So we can say that within a redisDb, the keyspace is maintained as a dict (dictionary).
  • A keyspace key is a database key, and every key is a string object. Note: not a raw string, but a string object.
  • A keyspace value is a database value, and it can be any Redis object type: a string object, list object, hash object, set object, or sorted set object.
  • redisDb uses dict *expires to store expiration times. Each entry's key points to the corresponding key in the keyspace (a C pointer), and its value is a long long timestamp marking the point in time at which that key expires, in milliseconds.
    If we add an expiration time to the mobile key from earlier:
    redis> PEXPIREAT mobile 1521469812000
    this adds a key-value pair to the expires dictionary.
    The expiration check logic is then simple:
    1. Does the key exist in the expires dictionary?
    2. If it does, is its timestamp value smaller than the current system timestamp?
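The two-step check above can be sketched in Python (a simplified model, not the actual C code; plain dicts stand in for Redis's dict structures):

```python
import time

def is_expired(expires, key, now_ms=None):
    """Mirror of Redis's expiry check: the key is expired iff it has an
    entry in the expires dict and that timestamp is in the past."""
    if key not in expires:          # no TTL set -> never expires
        return False
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return expires[key] < now_ms

# keyspace holds the data, expires holds millisecond deadlines
keyspace = {"mobile": "13800000000"}
expires = {"mobile": 1521469812000}  # PEXPIREAT mobile 1521469812000

print(is_expired(expires, "mobile", now_ms=1521469811999))  # False: not yet due
print(is_expired(expires, "mobile", now_ms=1521469812001))  # True: past deadline
print(is_expired(expires, "nokey"))                          # False: no TTL set
```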
There are three strategies for deleting expired keys:
  1. Timed deletion: Redis deletes every expired key-value pair on schedule. This is memory-friendly, since expired keys never linger, but if there are many keys each sweep costs CPU, so it is CPU-unfriendly.
  2. Lazy deletion: a key is only checked for expiry when it is accessed. This greatly reduces CPU load, but expired key-value pairs can pile up in memory, so it is memory-unfriendly.
  3. Periodic partial deletion: Redis periodically scans for expired keys but deletes only some of them; how many depends on the current state of the server.
These three strategies trade time against space in different ways. To balance the two, Redis adopts the latter pair: lazy deletion plus periodic partial deletion.
The periodic deletion of expired keys is implemented by expire.c/activeExpireCycle(), which server.c/serverCron() invokes on a schedule.
The guiding principle of activeExpireCycle is adaptivity: when few keys are expired it deletes conservatively, and when many keys are expired it deletes aggressively.
static unsigned int current_db = 0; /* Last DB tested. */
static int timelimit_exit = 0; /* Time limit hit in previous call? */
static long long last_fast_cycle = 0; /* When last fast cycle ran. */
  • The three static variables record, respectively, the index of the db currently being traversed, whether the previous call exited because of the time limit, and when the last fast cycle ran.
  • Compute timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100; this works out to roughly 25% of CPU time.
  • If the db's expires dict is empty, skip it.
  • If keys with an expiry make up less than 1% of all keys, skip it.
  • num = dictSize(db->expires); num is the number of keys that have an expiry set.
  • slots = dictSlots(db->expires); slots is the size of the expires dictionary.
  • If num exceeds ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP, clamp it to ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP; in other words, each iteration checks at most that many keys.
  • Randomly pick keys that have an expiry, check whether each is expired, and delete it if so.
  • Update various statistics, including the number of keys deleted and the average TTL.
  • Every sixteen iterations, check the elapsed time; if it exceeds timelimit, return.
  • The per-db loop repeats while more than 1/4 of ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP sampled keys turned out to be expired; once the expired fraction drops to 1/4 or below, the loop ends.
The steps are fairly involved, so to summarize (assuming the default configuration):
  1. Redis spends at most 25% of CPU time handling key expiration.
  2. It traverses all the redisDb instances.
  3. Within each redisDb, if there are no expired keys, or the proportion of keys with an expiry is too low, it moves straight on to the next redisDb.
  4. Otherwise it keeps sampling the redisDb's expiring keys; it moves on to the next redisDb once fewer than 25% of the sampled keys are expired, or stops entirely when the elapsed time exceeds the 25% CPU budget.
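A rough Python model of this cycle (illustrative only; the real logic lives in expire.c/activeExpireCycle and also enforces the cross-db time limit, which is omitted here):

```python
import random

LOOKUPS_PER_LOOP = 20  # ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP default

def active_expire_db(keyspace, expires, now_ms):
    """Sample up to LOOKUPS_PER_LOOP keys with a TTL, delete the expired
    ones, and repeat while more than 25% of the sample was expired.
    Returns the total number of keys deleted."""
    deleted = 0
    while expires:
        sample = random.sample(list(expires), min(len(expires), LOOKUPS_PER_LOOP))
        expired = 0
        for key in sample:
            if expires[key] < now_ms:          # deadline in the past
                del expires[key]
                keyspace.pop(key, None)
                expired += 1
                deleted += 1
        if expired <= LOOKUPS_PER_LOOP / 4:    # sample looks mostly fresh: stop
            break
    return deleted

# All 100 keys are already expired, so the loop keeps sampling until done.
ks = {f"k{i}": i for i in range(100)}
exp = {k: 1000 for k in ks}                    # all deadlines at t=1000ms
print(active_expire_db(ks, exp, now_ms=2000))  # 100
```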
The more Redis code I read, the clearer it becomes that what Redis is constantly doing is balancing: trading the program's space against its time. Everyday business design is the same balancing act at the macro level, balancing a system's time and space. Reading source code is therefore a great way to learn system architecture from the micro level, and a necessary step on the road to becoming an architect.
Redis expiration mechanism
EXPIRE key ttl (ttl in seconds)
For example: set session to 100, then set its TTL to 5 seconds. Repeated GET session calls succeed at first, but 5 seconds later they return nothing, because the TTL has elapsed and session has been deleted.

To find out how long a key has left before it is deleted, use the TTL command:

TTL key
To cancel a key's expiration time, use the PERSIST command:
PERSIST key
Besides PERSIST, the SET and GETSET commands also clear a key's expiration time, but commands that merely operate on the key's value (such as INCR, LPUSH, and so on) do not.
EXPIRE works in seconds; for more precise expiration use the PEXPIRE command, whose unit is milliseconds, and correspondingly PTTL to see the remaining time.
If WATCH monitors a key that has an expiration time and the key expires and is automatically deleted during monitoring, WATCH does not consider the key to have been changed.
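These semantics can be modeled with a tiny in-memory store (a hypothetical sketch, not redis-py; the clock is passed in explicitly so the behavior is deterministic):

```python
class TtlStore:
    """Toy model of EXPIRE / TTL / PERSIST semantics with an explicit clock."""
    def __init__(self):
        self.data, self.deadlines = {}, {}

    def set(self, key, value):
        self.data[key] = value
        self.deadlines.pop(key, None)   # SET clears any existing TTL

    def expire(self, key, ttl_s, now_s):
        if key in self.data:
            self.deadlines[key] = now_s + ttl_s

    def persist(self, key):
        self.deadlines.pop(key, None)   # remove the TTL, keep the key

    def ttl(self, key, now_s):
        if key not in self.deadlines:
            return -1                   # no TTL set (Redis convention)
        return self.deadlines[key] - now_s

    def get(self, key, now_s):
        # lazy expiry: drop the key if its deadline has passed
        if key in self.deadlines and self.deadlines[key] <= now_s:
            self.data.pop(key, None)
            self.deadlines.pop(key, None)
        return self.data.get(key)

s = TtlStore()
s.set("session", "100")
s.expire("session", 5, now_s=0)
print(s.get("session", now_s=3))   # '100' -- still alive
print(s.ttl("session", now_s=3))   # 2
print(s.get("session", now_s=6))   # None -- TTL elapsed
```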
http://segmentfault.com/a/1190000003877523
https://miafish.wordpress.com/2015/03/05/redis-and-its-c-implementation/
Redis Cluster data sharding
hash slot: there are 16384 hash slots in Redis Cluster.
E.g. every node in a Redis Cluster is responsible for a subset of the hash slots.

Redis Cluster master-slave model

Internal work
Nodes A, B, C; node B holds the key foo.

Client => A : Cluster Hints
A => Client: a map of hash slots with nodes
Client => B: get foo
B => Client: bar

Fault tolerance
A guesses B is failing, as the latest PING request timed out. A will not take any action without corroboration.

C sends a PONG to A with a gossip section containing information about B; C also thinks B is failing.

At this point A marks B as failed and notifies all the other nodes in the cluster, which mark the node as failing too.

If B ever comes back, the first time it PINGs any node of the cluster it will be told to shut down ASAP, since a node flapping in and out is bad for clients.

Note: the only way to rejoin a Redis cluster after a massive crash is by hand, with redis-trib.

Redis Partitioning
Redis Cluster (internal system)
Twemproxy (mid layer)
Clients supporting consistent hashing (client side)
How Redis expires keys
Redis keys are expired in two ways: a passive way, and an active way.

A key is passively expired simply when some client tries to access it, and the key is found to be timed out.

Of course this is not enough, as there are expired keys that will never be accessed again. These keys should be expired anyway, so periodically Redis tests a few keys at random among the keys that have an expire set. All the keys that are already expired are deleted from the keyspace.

Specifically this is what Redis does 10 times per second:

  1. Test 100 random keys from the set of keys with an associated expire.
  2. Delete all the keys found expired.
  3. If more than 25 keys were expired, start again from step 1.
This is a trivial probabilistic algorithm: the assumption is that our sample is representative of the whole keyspace, and we keep expiring until the fraction of keys that are likely to be expired is under 25%.

This means that at any given moment, the number of already-expired keys still using memory is at most equal to the maximum number of write operations per second divided by 4.
List
RPUSH Pushes the value onto the right end of the list
LRANGE Fetches a range of values from the list
LINDEX Fetches an item at a given position in the list
LPOP Pops the value from the left end of the list and returns it
The command RPOP removes and returns the last element of a List.

LPUSH books "Clean Code"
RPUSH books "Code Complete"
LLEN books
LINDEX books 1
LRANGE books 0 -1

Set
SADD Adds the item to the set
SMEMBERS Returns the entire set of items
SISMEMBER Checks if an item is in the set
SREM Removes the item from the set, if it exists

Hash
HSET Stores the value at the key in the hash
HGET Fetches the value at the given hash key
HGETALL Fetches the entire hash
HDEL Removes a key from the hash, if it exists

Sorted set
http://www.tutorialspoint.com/redis/redis_sorted_sets.htm
Every member of a sorted set is associated with a score, which is used to keep the set ordered, from the smallest to the greatest score.
In a Redis sorted set, looking up a member's score is O(1), while adding, removing, and rank operations take O(log N).
ZADD tutorials 1 redis
ZRANGE tutorials 0 10 WITHSCORES

ZADD key score1 member1 [score2 member2]
ZCARD key
Get the number of members in a sorted set
ZCOUNT key min max
Count the members in a sorted set with scores within the given values

ZRANGE key start stop [WITHSCORES]
Return a range of members in a sorted set, by index
ZSCORE key member

ZADD Adds member with the given score to the ZSET
ZRANGE Fetches the items in the ZSET from their positions in sorted order
ZRANGEBYSCORE Fetches items in the ZSET based on a range of scores
ZREM Removes the item from the ZSET, if it exists
http://redis.io/topics/data-types-intro
EX seconds -- Set the specified expire time, in seconds.
PX milliseconds -- Set the specified expire time, in milliseconds.
NX -- Only set the key if it does not already exist.
XX -- Only set the key if it already exists.

SET mykey "Hello"
GET mykey
set mykey newval nx
(nil)
set mykey newval xx
OK

atomic increment:
set counter 100
> incr counter
incrby counter 50

the GETSET command sets a key to a new value, returning the old value as the result.

mset a 10 b 20 c 30
mget a b c
exists mykey
del mykey

Redis expires: keys with limited time to live
Basically you can set a timeout for a key, which is a limited time to live. When the time to live elapses, the key is automatically destroyed.
Information about expires are replicated and persisted on disk, the time virtually passes when your Redis server remains stopped (this means that Redis saves the date at which a key will expire).

expire key 5
set key 100 ex 10
The TTL command is called in order to check the remaining time to live for the key.
ttl key

to set and check expires in milliseconds, check the PEXPIRE and the PTTL commands.

Common use cases for lists
Remember the latest updates posted by users into a social network.
To describe a common use case step by step, imagine your home page shows the latest photos published in a photo sharing social network and you want to speed up access.
Every time a user posts a new photo, we add its ID into a list with LPUSH.
When users visit the home page, we use LRANGE 0 9 in order to get the latest 10 posted items.

Capped lists
LPUSH mylist <some element>
LTRIM mylist 0 999
The above LTRIM command tells Redis to take just list elements from index 0 to 999, everything else will be discarded.
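The LPUSH + LTRIM pattern can be mimicked in Python (a sketch; a plain list stands in for the Redis list, with index 0 as the left end):

```python
MAX_ITEMS = 3  # stand-in for the 0..999 window in the example above

def lpush_capped(lst, element, cap=MAX_ITEMS):
    """LPUSH followed by LTRIM 0 cap-1: newest items live at the left,
    anything past the cap falls off the right end."""
    lst.insert(0, element)   # LPUSH
    del lst[cap:]            # LTRIM 0 cap-1

timeline = []
for photo_id in range(1, 6):          # users post photos 1..5
    lpush_capped(timeline, photo_id)
print(timeline)  # [5, 4, 3] -- only the 3 most recent survive
```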

Blocking operations on lists
BRPOP and BLPOP are versions of RPOP and LPOP able to block if the list is empty: they'll return to the caller only when a new element is added to the list, or when a user-specified timeout is reached.
brpop tasks 5

Redis Hashes

hmset user:1000 username antirez birthyear 1977 verified 1
hget user:1000 username
hget user:1000 birthyear
hgetall user:1000

hincrby user:1000 birthyear 10
The command HMSET sets multiple fields of the hash, while HGET retrieves a single field. HMGET is similar to HGET but returns an array of values:

Redis Sets
Redis Sets are unordered collections of strings. 
sadd myset 1 2 3
smembers myset
sismember myset 3

http://www.tutorialspoint.com/redis/redis_transactions.htm
Redis transactions allow the execution of a group of commands in a single step.
MULTI
Mark the start of a transaction block
EXEC
Execute all commands issued after MULTI

The WATCH command is used to provide a check-and-set (CAS) behavior to Redis transactions. The WATCH command is used to make sure the transaction is executed only when other clients modify none of the watched keys. For example, let's say we want to remove the top player from a real-time leaderboard. We have to make sure the top player does not change while we are calculating the leader. The following set of commands will explain how the WATCH command works:

WATCH leaderboard
element = ZREVRANGEBYSCORE leaderboard +inf -inf WITHSCORES
MULTI
ZREM leaderboard element
EXEC
If the leaderboard gets updated before we remove the item from the sorted set, the EXEC command is never executed.
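The check-and-set behavior can be simulated in Python (a toy model: a per-key version counter stands in for WATCH's modification tracking):

```python
class WatchedStore:
    """Toy optimistic-concurrency model of WATCH/MULTI/EXEC: EXEC succeeds
    only if the watched key's version has not changed since WATCH."""
    def __init__(self):
        self.data, self.version = {}, {}

    def set(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1

    def watch(self, key):
        return self.version.get(key, 0)       # remember version at WATCH time

    def exec_if_unchanged(self, key, watched_version, mutation):
        if self.version.get(key, 0) != watched_version:
            return None                       # aborted, like a nil EXEC reply
        mutation(self.data)
        self.version[key] = self.version.get(key, 0) + 1
        return "OK"

s = WatchedStore()
s.set("leaderboard", ["alice", "bob"])
v = s.watch("leaderboard")
s.set("leaderboard", ["carol", "alice", "bob"])   # another client sneaks in
print(s.exec_if_unchanged("leaderboard", v,
                          lambda d: d["leaderboard"].pop(0)))  # None: aborted
```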

http://www.programering.com/a/MDOykzNwATQ.html
SORT mylist
SORT mylist DESC
To sort elements lexicographically, use the ALPHA modifier:
SORT mylist ALPHA

SORT mylist LIMIT 0 10
SORT mylist LIMIT 0 5 ALPHA DESC

Sorting by an external key
The following sorts the keys in uid according to the value of user_level_{uid}:
sort uid by user_level_*
user_level_* is a pattern: Redis takes each value in uid, substitutes it for the *, and looks up the resulting key.
For example, if the uid list contains 1, 2, 3, and 4, the program uses the values of user_level_1, user_level_2, user_level_3, and user_level_4 as the sort weights for the uids.


The GET option
With the GET option, SORT can fetch the value of a corresponding key for each element of the sorted result.
For example, the following sorts uid, then fetches the value of user_name_{uid} for each element:
sort uid get user_name_*

Fetching multiple external keys:
sort uid get user_level_* get user_name_*

# '#' fetches the sorted element's own value alongside:
sort uid get # get user_level_* get user_name_*

Fetching external keys without sorting
Passing a nonexistent key to the BY option makes SORT skip the sorting step and simply return the results:
sort uid by not-exists-key
sort uid by not-exists-key get # get user_level_* get user_name_*
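A Python model of BY and GET pattern substitution (a sketch; db is a plain dict standing in for the keyspace):

```python
def redis_sort(db, key, by=None, get=None):
    """Toy model of SORT key [BY pattern] [GET pattern ...]:
    '*' in a pattern is replaced by each element; GET '#' yields the
    element itself; a BY pattern that matches no keys skips sorting."""
    elements = list(db[key])
    if by is None:
        elements.sort()
    else:
        weights = {e: db.get(by.replace("*", str(e))) for e in elements}
        if any(w is not None for w in weights.values()):
            elements.sort(key=lambda e: weights[e])
        # else: BY hit a nonexistent key pattern -> no sorting at all
    if not get:
        return elements
    out = []
    for e in elements:
        for pattern in get:
            out.append(e if pattern == "#" else db.get(pattern.replace("*", str(e))))
    return out

db = {
    "uid": [1, 2, 3],
    "user_level_1": 30, "user_level_2": 10, "user_level_3": 20,
    "user_name_1": "admin", "user_name_2": "jack", "user_name_3": "peter",
}
print(redis_sort(db, "uid", by="user_level_*"))            # [2, 3, 1]
print(redis_sort(db, "uid", by="user_level_*",
                 get=["#", "user_name_*"]))  # [2, 'jack', 3, 'peter', 1, 'admin']
print(redis_sort(db, "uid", by="not-exists-key"))          # [1, 2, 3] -- unsorted
```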

KEYS * will list all keys stored in redis.
KEYS *o*
KEYS t??

Scan
http://redis.io/commands/scan
SCAN is a cursor based iterator. This means that at every call of the command, the server returns an updated cursor that the user needs to use as the cursor argument in the next call.
An iteration starts when the cursor is set to 0, and terminates when the cursor returned by the server is 0.

SCAN return value is an array of two values: the first value is the new cursor to use in the next call, the second value is an array of elements.


scan 0 MATCH h*
The MATCH filter is applied after elements are retrieved from the collection, just before returning data to the client. This means that if the pattern matches very few elements inside the collection, SCAN will likely return no elements in most iterations.

Here a COUNT of 1000 is used in order to force the command to do more scanning for that iteration:
scan 176 MATCH *11* COUNT 1000

hmset hash name Jack age 33
hscan hash 0
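From the client's side, the cursor protocol looks like this (sketched in Python over a fake paged scan; fake_scan is a stand-in, not redis-py, and a real SCAN cursor walks hash-table buckets rather than a list index):

```python
import fnmatch

def fake_scan(keys, cursor, match="*", count=2):
    """Stand-in for SCAN: returns (next_cursor, batch). Here the cursor is
    simply an index into the key list; 0 signals the end of the iteration."""
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0                      # 0 terminates the iteration
    # MATCH is applied after retrieval, so a batch can come back empty
    return next_cursor, [k for k in batch if fnmatch.fnmatch(k, match)]

keys = ["hello", "hi", "foo", "hat", "bar"]
cursor, found = 0, []
while True:                                   # the canonical SCAN loop
    cursor, batch = fake_scan(keys, cursor, match="h*")
    found.extend(batch)
    if cursor == 0:                           # server says: iteration done
        break
print(sorted(found))  # ['hat', 'hello', 'hi']
```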
https://redislabs.com/blog/5-key-takeaways-for-developing-with-redis
http://redis.io/topics/memory-optimization
1. keep track of your keys

Using a proper naming methodology for your keys can make housekeeping of your database much easier.
keep all the identifiers for a given data item in a Redis set. This ensures that a cleanup procedure invoked after deletion from the primary data store only needs to iterate through that set's contents in order to remove all the relevant copies and related tidbits (including the set itself upon completion).

2. keep track of the length of your key names
3. use the right data structures
Instead of storing your data in thousands (or millions) of independent string values, consider grouping related data with the hash data structure. Hashes are very efficient and can decrease your memory usage (plus they offer the added value of abstracting some of the details and making your code more readable).
--bitmaps, or bitsets

4. use scan, never use keys
SSCAN, HSCAN and ZSCAN allow you to iterate the contents of sets, hashes and sorted sets (respectively).
5. use server-side lua scripts

Redis Cluster master-slave model

When the cluster is created (or at a later time), we add a slave node to every master, so that the final cluster is composed of masters A, B, C and slaves A1, B1, C1. The system can then continue operating if node B fails: B1 is promoted to master and holds a copy of B's data.

Redis's author: a deep dive into Redis persistence
Data corruption
Data corruption means data that cannot be recovered. Everything above was about making sure data actually reaches the disk, but getting it onto disk does not guarantee it cannot be corrupted. For instance, a single write request may involve two separate write operations; if something goes wrong at the wrong moment, one write may complete safely while the other never happens. If the database's data files are not organized carefully, this can leave the data completely unrecoverable.
There are generally three strategies for organizing data to guard against unrecoverable file corruption:
  • The crudest approach makes no attempt to guarantee recoverability through the data layout itself; instead it configures synchronized backups and restores from a backup when the data file is corrupted. This is effectively what MongoDB does with Replica Sets configured but the operation log disabled.
  • A second approach adds an operation log on top of that: record every operation as it happens, so data can be recovered by replaying the log. Because the log is written append-only, the log itself can never end up unrecoverable. This resembles MongoDB with the operation log enabled.
  • The safest approach is for the database never to modify old data, completing every write as an append; the data is then itself a log, and it can never become unrecoverable. CouchDB is an excellent example of this approach.
1. Redis's first persistence strategy: RDB snapshots
Redis can persist a snapshot of the current data into a data file. But how does a database under continuous writes generate a snapshot? Redis relies on the copy-on-write mechanism of fork: when a snapshot is due, the current process forks a child, and the child iterates over all the data and writes it out as an RDB file.
We can configure when RDB snapshots are taken via Redis's save rules; for example, take a snapshot if there have been 100 writes within 10 minutes, or 1000 writes within an hour, and multiple rules can be active at once. The rules live in the Redis configuration file, and can also be set at runtime with CONFIG SET, without restarting Redis.
Redis's RDB file cannot end up corrupted, because its writes happen in a separate process: when generating a new RDB file, the forked child first writes the data to a temporary file, then renames the temporary file over the RDB file with an atomic rename system call, so that a valid RDB file exists at every moment no matter when a failure occurs.
The RDB file is also one link in the internal implementation of Redis's master-slave replication.
RDB has an obvious shortcoming, though: once the database fails, the data saved in the RDB file is not up to date; everything written after the last snapshot is lost.
2. Redis's second persistence strategy: the AOF log
AOF stands for Append Only File, and as the name suggests it is an append-only log file. Unlike most databases' logs, the AOF file is readable plain text, and its contents are ordinary Redis commands.
AOF rewrite
You might wonder: if every write command produces a log entry, won't the AOF file grow huge? Yes, the AOF file keeps growing, so Redis provides a feature called AOF rewrite. It regenerates a fresh AOF file in which each entry appears only once, unlike the old file, which may have recorded many operations on the same value. The process is similar to RDB: Redis forks a process that walks the data directly and writes a new temporary AOF file. While the new file is being written, all write commands are still logged to the old AOF file and also recorded in an in-memory buffer. When the rewrite completes, the buffered log entries are appended to the temporary file in one go, and an atomic rename replaces the old AOF file with the new one.
http://blog.huangz.me/diary/2015/comparison-of-redis-and-memcached.html
Comparison: Memcached vs. Redis
Supported data structures
  • Memcached: strings (binary-safe, can store raw bytes)
  • Redis: strings (binary-safe, can store raw bytes), hashes, lists, sets, sorted sets, bitmaps, geospatial indexes (GEO), HyperLogLog
Single-node extra features
  • Memcached: automatic expiry, pipelining
  • Redis: automatic expiry, pipelining, transactions, Lua scripting, pub/sub, keyspace notifications, AOF persistence, RDB persistence
Multi-node extra features
  • Memcached: client-side sharded clustering (no Sentinel or replication); sharding via twemproxy
  • Redis: server-side sharded cluster with Sentinel and replication built in; replication (works standalone, no cluster needed); Sentinel (works standalone, no cluster needed); a choice of third-party proxies such as twemproxy, codis, and redis-cerberus
Memory allocation: Memcached uses slab allocation; Redis uses jemalloc
Network model: Memcached handles multiple clients with multiple threads, synchronized by locks; Redis is single-threaded, handling multiple clients via I/O multiplexing
http://antirez.com/news/94
http://dbarobin.com/2015/11/09/scale-out-of-redis-with-twemproxy/

http://blog.csdn.net/chengyabingfeiqi/article/details/49794085
Redis supports four persistence modes: Snapshotting (the default), Append-only file (AOF), virtual memory, and diskstore. They are introduced in turn below.
(1) Snapshotting
Snapshotting is the default persistence mode. The in-memory data is written as a snapshot into a binary file, dump.rdb by default. Automatic snapshotting can be configured; e.g. Redis can be told to take a snapshot automatically if more than m keys are modified within n seconds.
  1. Redis calls fork, so there are now a child process and a parent process.
  2. The parent keeps serving client requests while the child writes the memory contents to a temporary file. Thanks to the OS's copy-on-write mechanism, parent and child share the same physical pages; when the parent handles a write, the OS creates a copy of the page the parent is about to modify rather than writing the shared page. The child's address space therefore holds a snapshot of the whole database as of the moment of the fork.
  3. When the child finishes writing the snapshot to the temporary file, it replaces the original snapshot file with the temporary one and exits. (Note that fork also duplicates the process's memory, so memory usage can double.)

A client can also trigger a snapshot with the SAVE or BGSAVE command. SAVE takes the snapshot in the main thread, and since Redis serves all clients from a single main thread, it blocks every client request, so it is not recommended. Also note that each snapshot writes the complete in-memory dataset to disk; it is not an incremental sync of only dirty data. If the dataset is large and writes are frequent, this inevitably causes heavy disk I/O and can seriously hurt performance.
Also, since snapshots are taken at intervals, if Redis goes down unexpectedly, all modifications made after the last snapshot are lost. If the application cannot afford to lose any modification, use AOF persistence, described next:
(2) Append-only file
AOF offers better durability than snapshotting: in AOF mode, Redis appends every write command it receives to a file (appendonly.aof by default) via the write function. When Redis restarts, it rebuilds the entire dataset in memory by re-executing the commands saved in that file. Since the OS caches write's modifications in the kernel, they may not reach the disk immediately, so AOF persistence can still lose some modifications. We can, however, tell Redis in the configuration file when to force writes to disk with fsync. There are three options (the default is one fsync per second):
appendonly yes           # enable AOF persistence
# appendfsync always     # fsync on every write command: slowest, but fully durable; not recommended
appendfsync everysec     # fsync once per second: a good compromise between performance and durability; recommended
# appendfsync no         # rely entirely on the OS: best performance, no durability guarantee
AOF also brings a problem of its own: the persistence file keeps growing. For example, calling INCR test 100 times stores all 100 commands in the file, although 99 of them are redundant: replaying a single SET test 100 would restore the same state. To compact the AOF file, Redis provides the BGREWRITEAOF command. On receiving it, Redis saves the in-memory data as commands to a temporary file, much like snapshotting, and finally replaces the old file. The concrete steps are:

       1. Redis calls fork, so there are now parent and child processes.
       2. Based on the in-memory database snapshot, the child writes the commands needed to rebuild the database state into a temporary file.
       3. The parent continues serving client requests; besides appending write commands to the old AOF file, it also caches them in memory, which guarantees nothing is lost even if the child's rewrite fails.
       4. When the child has finished writing the snapshot contents as commands to the temporary file, it signals the parent, and the parent appends the cached write commands to the temporary file too.
       5. The parent then replaces the old AOF file with the temporary file and renames it; write commands received from then on are appended to the new AOF file.

       Note that rewriting the AOF file does not read the old AOF file at all; instead the whole in-memory database is rewritten as commands into a new AOF file, which is quite similar to snapshotting.
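The five steps above can be modeled in Python (a toy single-process simulation: lists stand in for the old AOF, the rewrite buffer, and the temporary file, and the "child" is just a function call):

```python
def aof_rewrite(data, old_aof, writes_during_rewrite):
    """Toy AOF rewrite: dump the current state as SET commands (the child's
    job), log concurrent writes to both the old AOF and a buffer (the
    parent's job), then append the buffer and 'rename' the temp file."""
    temp = [f"SET {k} {v}" for k, v in data.items()]   # child: state -> commands
    buffer = []
    for cmd_key, cmd_val in writes_during_rewrite:     # parent, meanwhile:
        data[cmd_key] = cmd_val
        old_aof.append(f"SET {cmd_key} {cmd_val}")     # still goes to old AOF
        buffer.append(f"SET {cmd_key} {cmd_val}")      # and to the buffer
    temp.extend(buffer)                                # flush buffer to temp
    return temp                                        # atomic rename: temp becomes the new AOF

data = {"test": 100}
old_aof = ["INCR test"] * 100                          # 100 redundant commands
new_aof = aof_rewrite(data, old_aof, writes_during_rewrite=[("k", 1)])
print(new_aof)  # ['SET test 100', 'SET k 1'] -- compacted
```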
http://carlosfu.iteye.com/blog/2254154
http://www.zenlife.tk/redis.html

Double hash tables

Provided so that rehashing does not disrupt normal service.
Incremental rehash:
    start timing
    while (another 100 entries rehashed)
        check the elapsed time; break if over budget
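A sketch of incremental rehash in Python (illustrative only; a real Redis dict rehashes bucket by bucket across its two tables, ht[0] and ht[1], and the time-budget check is omitted here):

```python
class IncrementalDict:
    """Two tables: reads consult both; each rehash step migrates a batch of
    entries from the old table to the new one until the old is empty."""
    STEP = 100   # entries migrated per rehash step, as in the note above

    def __init__(self, old):
        self.old, self.new = dict(old), {}

    def rehash_step(self):
        for _ in range(min(self.STEP, len(self.old))):
            k, v = self.old.popitem()     # migrate one entry
            self.new[k] = v
        return len(self.old) == 0         # True once rehashing is complete

    def get(self, key):
        return self.new.get(key, self.old.get(key))  # check both tables

d = IncrementalDict({f"k{i}": i for i in range(250)})
steps = 0
while not d.rehash_step():                # migrates 100 + 100 + 50 entries
    steps += 1
print(steps + 1, d.get("k7"))  # 3 7
```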

Event model

Asynchronous, single-threaded, event-driven.
The advantage is saving the overhead of thread switching; the drawback is that multi-core hardware cannot be fully exploited.
Redis therefore suits workloads with high concurrency where each individual request is very short.
In the aeMain function:
    while (the server is running)
        run the hook that precedes event handling
        handle events with aeProcessEvents
aeProcessEvents calls aeApiPoll().
aeApiPoll calls the operating system's epoll, with a timeout set so that aeProcessEvents can return.

Multiple data structures supported

The value in a key-value pair is not limited to a regular string; it can also be a linked list or a hash table.
Sets and sorted sets are also supported, the latter implemented with a skiplist.

Persistence

Snapshot and AOF modes.
The AOF rewrite works as follows:
  1. Redis forks a child process, which walks the data and writes a new temporary file.
  2. The parent continues serving client requests while the child keeps writing the temporary file.
  3. The parent records newly arriving writes in an AOF buffer.
  4. When the child finishes and exits, the parent receives the exit notification and appends the buffered AOF entries to the temporary file.
  5. The temporary file is renamed to appendonly.aof, replacing the original file, and the whole process is complete.
The AOF file is created with the flags open(filename, O_WRONLY|O_APPEND|O_CREAT, 0644).
Redis objects are written back out by deriving commands from each object: an object deserializes into the commands that recreate it.
Master-slave replication
Redis architecture
Redis is a client/server system.
The Redis server is a single process that talks to clients over TCP.
http://huoding.com/2015/09/14/463
In Redis, SETNX is short for "SET if Not eXists": it sets a key only when the key does not yet exist, which makes it usable as a lock. But many people do not realize SETNX has traps!
For example: a database-backed query endpoint with heavy traffic gets a cache, with the cache refreshed after it expires. Under high concurrency, without a lock, the instant the cache expires a flood of concurrent requests punches through the cache straight to the database, causing an avalanche. With a lock, only one request refreshes the cache while the others either wait or serve the stale value, as appropriate.
Here is a demonstration using PHPRedis, currently the most popular extension in the PHP community:
<?php

$ok = $redis->setNX($key, $value);

if ($ok) {
    $cache->update();
    $redis->del($key);
}

?>
When the cache expires, acquire the lock with SETNX; on success, update the cache, then delete the lock. The logic looks very simple, but there is a problem: if the request exits unexpectedly for some reason, the lock gets created but never deleted, so it lives forever and the cache can never be refreshed again. So we need to give the lock an expiration time, just in case:
<?php

$redis->multi();
$redis->setNX($key, $value);
$redis->expire($key, $ttl);
$redis->exec();

?>
Because SETNX cannot set an expiration time itself, we lean on EXPIRE, and wrap the two in MULTI/EXEC to keep the request atomic, so that SETNX cannot succeed while EXPIRE fails. Unfortunately there is still a problem: when multiple requests arrive, only one request's SETNX can succeed, but every request's EXPIRE succeeds. That means even a request that fails to acquire the lock still refreshes the expiration time; with dense enough traffic, the expiration keeps getting pushed back and the lock stays valid forever. So we need to run EXPIRE conditionally while preserving atomicity, which leads to the following Lua code:
local key   = KEYS[1]
local value = KEYS[2]
local ttl   = KEYS[3]

local ok = redis.call('setnx', key, value)
 
if ok == 1 then
  redis.call('expire', key, ttl)
end
 
return ok
Who knew that such a simple-looking feature would need a Lua script; that is rather a hassle. In fact, Redis has already thought of this: since 2.6.12, SET subsumes the functionality of SETEX, and SET itself can set an expiration time, so everything we needed above can be done with SET alone.
<?php

$ok = $redis->set($key, $value, array('nx', 'ex' => $ttl));

if ($ok) {
    $cache->update();
    $redis->del($key);
}

?>
Is the code above perfect? Not quite! Suppose updating the cache takes a long time, even longer than the lock's lifetime, so the lock expires mid-update and another request acquires it. When the first request finishes updating the cache, deleting the lock without checking would mistakenly delete a lock created by another request. So we introduce a random value when creating the lock:
<?php

$ok = $redis->set($key, $random, array('nx', 'ex' => $ttl));

if ($ok) {
    $cache->update();

    if ($redis->get($key) == $random) {
        $redis->del($key);
    }
}

?>
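The same random-token pattern, sketched in Python (a toy simulation with a dict standing in for Redis, showing why the token check prevents deleting someone else's lock; note that in real Redis the GET-then-DEL pair should itself be wrapped in a Lua script to stay atomic):

```python
import uuid

store = {}   # stand-in for Redis: key -> token

def acquire(key):
    """Like SET key token NX: returns the token on success, None if locked."""
    token = uuid.uuid4().hex
    if key in store:
        return None
    store[key] = token
    return token

def release(key, token):
    """Delete the lock only if we still own it (token matches)."""
    if store.get(key) == token:
        del store[key]
        return True
    return False   # lock expired and was re-acquired by someone else

t1 = acquire("lock")          # first client takes the lock
store.pop("lock")             # ...the lock expires mid-update
t2 = acquire("lock")          # second client takes it
print(release("lock", t1))    # False: token mismatch, t2's lock survives
print(release("lock", t2))    # True
```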
This basically implements a single-instance lock. For a distributed lock, see Distributed locks with Redis; we will not go deeper here. In short: the best way to avoid the SETNX trap is to never use it!
