Wednesday, March 28, 2018

Redis Misc 2



https://redislabs.com/redis-enterprise-documentation/concepts-architecture/high-availability/clustering/

https://www.codeproject.com/Articles/1135870/Hash-Tagging-Redis-Keys-in-a-Clustered-Environment
At the same time, we also need to understand that clustering imposes some limitations on the way we use Redis keys (these limitations are quite logical though):
  1. Transactions cannot be performed on keys that belong to different hash slots.
  2. Multi-key operations cannot be performed on keys that belong to different hash slots.
For example, suppose there are two keys, "key1" and "key2". key1 maps to hash slot 5500 and is therefore stored on Node A; key2 maps to hash slot 5501 and is therefore stored on Node B.
So we cannot perform a transaction on those keys, nor can we perform multi-key operations on key1 and key2. A multi-key operation like "mget key1 key2" will return an error.
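For illustration, this is what the failure looks like from redis-cli connected to a cluster node (the address is a placeholder; the CROSSSLOT error is what Redis Cluster returns):

127.0.0.1:7000> MGET key1 key2
(error) CROSSSLOT Keys in request don't hash to the same slot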

The simple answer is to ensure that the keys on which we perform multi-key operations or transactions belong to the same hash slot. And we ensure this by "hash tagging" Redis keys.
Hash Tags are a way to ensure that multiple keys are allocated in the same hash slot. There is an exception in the computation of hash slots which is used in implementing Hash Tags.
In order to implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a "{...}" pattern, only the substring between { and } is hashed in order to obtain the hash slot. However, since it is possible that there are multiple occurrences of { or }, the algorithm is well specified by the following rules:
  • IF the key contains a { character.
  • AND IF there is a } character to the right of {
  • AND IF there are one or more characters between the first occurrence of { and the first occurrence of }
Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed. Let's have a look at the following examples:
  1. The two keys {user1000}.following and {user1000}.followers will hash to the same hash slot since only the substring user1000 will be hashed in order to compute the hash slot.
  2. For the key foo{}{bar}, the whole key will be hashed as usual, since the first occurrence of { is followed by } on the right without characters in the middle.
  3. For the key foo{{bar}}zap the substring {bar will be hashed, because it is the substring between the first occurrence of { and the first occurrence of } on its right.
  4. For the key foo{bar}{zap} the substring bar will be hashed, since the algorithm stops at the first valid or invalid (without bytes inside) match of { and }.
  5. What follows from the algorithm is that if the key starts with {}, it is guaranteed to be hashed as a whole. This is useful when using binary data as key names.
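You can check which slot a key maps to with the CLUSTER KEYSLOT command; with hash tags, all three keys below land in the same slot (the slot number shown is illustrative, not a computed CRC16 value):

127.0.0.1:7000> CLUSTER KEYSLOT {user1000}.following
(integer) 1649
127.0.0.1:7000> CLUSTER KEYSLOT {user1000}.followers
(integer) 1649
127.0.0.1:7000> CLUSTER KEYSLOT user1000
(integer) 1649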
http://rndblog.github.io/nosql/architecture/2015/09/13/notes-on-nosql-redis.html
Query routing means that you can send your query to a random instance, and the instance will make sure to forward your query to the right node.
Redis Cluster implements a hybrid form of query routing, with the help of the client (the request is not directly forwarded from one Redis instance to another; instead, the client gets redirected to the right node).
Redis Cluster divides the keyspace into 16384 hash slots (CRC16 of the key mod 16384) and assigns a range of slots to every node, so every node in a Redis Cluster is responsible for a subset of the hash slots. Moving a hash slot between nodes is automatic.
Redis Cluster supports multiple key operations as long as all the keys involved into a single command execution (or whole transaction, or Lua script execution) all belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a concept called hash tags.
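A quick sketch of that redirection, reusing the key2/slot 5501 example from above (node addresses are placeholders): a node that does not own the slot replies with MOVED, and a cluster-aware client retries against the node it names:

127.0.0.1:7000> GET key2
(error) MOVED 5501 127.0.0.1:7001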
Master and slave roles are pre-configured; it is also possible to configure a replica to serve a particular master.
As with Sentinel, a slave may become a master. In this case part of the changes will be lost due to eventual consistency.
In a Redis Cluster a replica may be assigned to a master automatically, and may also be automatically migrated.
Split-brain is prevented by the majority principle.
Until a failure this gives strong (linearizable, due to a single master and a single thread) consistency on the master and eventual consistency on replicas. In case of failover: eventual consistency with possible data loss.
There is no rack-aware replication inside Redis, but Redis Labs advertises rack-awareness (as well as HA) in their product, Redis Labs Enterprise Cluster, which is based on Redis.
Redis provides asynchronous persistence to files:
  • Snapshot (RDB)
  • AOF (Append-only file) operation log
The RDB file is a compact snapshot of the dataset which is created automatically; it is good for backups, disaster recovery and quick restarts.
Snapshot saving may be configured by a time interval and/or by a count of changes.
At the same time, creating a snapshot is a heavy operation (a full dump) which should not be invoked too frequently, so in case of a node failure the RDB file can be stale.
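Snapshotting is configured in redis.conf with save <seconds> <changes> lines; these are the long-standing defaults:

save 900 1
save 300 10
save 60 10000

Read as: dump if at least 1 key changed within 900 seconds, 10 keys within 300 seconds, or 10000 keys within 60 seconds.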
AOF is a command log of every write operation received by the server, to be replayed on start-up, reconstructing the original dataset.
Commands are logged using the same format as the Redis protocol itself, in an append-only fashion.
Redis is able to rewrite the log in the background when it gets too big. Nevertheless, replaying the AOF may take significantly more time than loading an RDB file.
Also, in case of a failure the AOF file may be incomplete: data after the last fsync may be damaged and is ignored.
AOF is updated on every write operation, but file sync is triggered depending on the setting:
  • fsync every time a new command is appended to the AOF. Very very slow, very safe.
  • fsync every second. Fast enough (in 2.4 likely to be as fast as snapshotting), and means that you can lose 1 second of data if there is a disaster.
  • Never fsync, just put your data in the hands of the Operating System. The faster and less safe method.
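These three policies correspond to the appendfsync setting in redis.conf (valid values: always, everysec, no):

appendonly yes
appendfsync everysec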
So, as in almost all NoSQL solutions, persistence is asynchronous (tunable, but tuning for safety dramatically affects performance). But in other NoSQL solutions this is compensated by synchronous replication.
This means that Redis can't guarantee durability under either cluster scheme (Sentinel or Redis Cluster).
Redis does not support encryption. In order to implement setups where trusted parties can access a Redis instance over the internet or other untrusted networks, an additional layer of protection should be implemented, such as an SSL proxy or spiped.
Redis is designed to be accessed by trusted clients inside trusted environments. This means that usually it is not a good idea to expose the Redis instance directly to the internet.
At the same time, Redis:
  • can bind to a specific interface
  • has a tiny layer of authentication (a password, pre-defined in the redis.conf configuration file)
  • can be configured to disable or rename particular commands in redis.conf
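All three map to plain redis.conf directives; a minimal sketch (the password and renamed command are placeholders):

bind 127.0.0.1
requirepass some-long-random-password
rename-command CONFIG ""
rename-command FLUSHALL FLUSHALL_ADMIN_ONLY

Renaming a command to the empty string disables it entirely.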
https://www.inovex.de/blog/redis-cluster-partitioning/
  1. Client writes to master B.
  2. Master B replies OK to the client.
  3. Master B propagates the writes to its slave B1.
There is no acknowledgement from B1 before master B gives its OK to the client. In the worst case the following happens:
  • B accepts a write from the client and gives its OK.
  • B crashes before the write is replicated to the slave B1.
In this case the write is lost forever. This is very similar to what happens with most databases that flush data to disk every second.
https://redis.io/topics/partitioning

Redis Cluster is a mix between query routing and client side partitioning.

Twemproxy supports automatic partitioning among multiple Redis instances, with optional node ejection if a node is not available (this will change the keys-instances map, so you should use this feature only if you are using Redis as a cache).
It is not a single point of failure since you can start multiple proxies and instruct your clients to connect to the first that accepts the connection.
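Twemproxy (nutcracker) is driven by a small YAML config; a hedged sketch showing the node-ejection option mentioned above (pool name, addresses and timeout are placeholders):

alpha:
  listen: 127.0.0.1:22121
  redis: true
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  server_retry_timeout: 30000
  servers:
   - 127.0.0.1:6379:1
   - 127.0.0.1:6380:1

auto_eject_hosts: true is the optional ejection described above: when a node fails, the keys-instances map changes, which is why it only suits cache workloads.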

Presharding

We learned that a problem with partitioning is that, unless we are using Redis as a cache, adding and removing nodes can be tricky, and it is much simpler to use a fixed keys-instances map.
However, data storage needs may vary over time. Today I can live with 10 Redis nodes (instances), but tomorrow I may need 50 nodes.
Since Redis has an extremely small footprint and is lightweight (a spare instance uses about 1 MB of memory), a simple approach to this problem is to start with a lot of instances from the start. Even if you start with just one server, you can decide to live in a distributed world from day one, and run multiple Redis instances on your single server, using partitioning.
And you can pick this number of instances to be quite big from the start. For example, 32 or 64 instances could do the trick for most users, and will provide enough room for growth.
In this way, as your data storage needs increase and you need more Redis servers, all you have to do is simply move instances from one server to another. Once you add the first additional server, you will need to move half of the Redis instances from the first server to the second, and so forth.
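A minimal presharding sketch on a single box (ports and data directories are arbitrary choices): start 32 independent redis-server processes, one per future shard:

for port in $(seq 7000 7031); do
  mkdir -p /var/redis/$port
  redis-server --port $port --dir /var/redis/$port --daemonize yes
done

Scaling out later means moving some of these instances (their data files and ports) to a new machine; the keys-instances map never changes.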

ZADD key score1 member1 [score2 member2 ...]
ZREVRANGE key start stop [WITHSCORES]
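For example, with an illustrative leaderboard key:

ZADD leaderboard 100 alice 200 bob 150 carol
ZREVRANGE leaderboard 0 -1 WITHSCORES

This returns bob, carol, alice in that order (highest score first) together with their scores.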

Executing batches of commands using redis-cli
./redis-cli < temp.redisCmds
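Here temp.redisCmds is just a plain text file with one command per line; hypothetical contents:

SET user:1:name alice
SET user:2:name bob
ZADD leaderboard 100 alice 200 bob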
