Monday, August 31, 2015

DynamoDB Internal



A good example of a hash and range type primary key for the Person table would be choosing birth year as the hash key and SSN as the range key.
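As a minimal sketch of creating such a table (assuming the boto3 Python SDK; the table and attribute names are illustrative):

import boto3

client = boto3.client('dynamodb')

# Person table: BirthYear as the hash key, SSN as the range key.
client.create_table(
    TableName='Person',
    AttributeDefinitions=[
        {'AttributeName': 'BirthYear', 'AttributeType': 'N'},
        {'AttributeName': 'SSN', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'BirthYear', 'KeyType': 'HASH'},
        {'AttributeName': 'SSN', 'KeyType': 'RANGE'},
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)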

If we don't create secondary indexes, the only way to retrieve items by a non-primary-key attribute is to scan the complete table.

You need to create local secondary indexes at the time of table creation itself; you cannot add them to already existing tables, nor does DynamoDB allow editing or deleting them. (Global secondary indexes, by contrast, can be added to and deleted from existing tables.)

Local secondary index
Local secondary indexes give you more range query options beyond your table's range key attribute. To define it briefly: a local secondary index is an index that has the same hash key as the table but a different range key.

A local secondary index must have both hash and range keys.
The hash key of the local secondary index is the same as that of the table.
The local secondary index allows you to query data from a single partition only, which is specified by the hash key. Since DynamoDB creates partitions per unique hash value, the local secondary index lets us query non-key attributes for a specific hash key.
The local secondary index supports both eventual and strong consistency, so while querying data you can choose whichever is suitable. A table-creation sketch with a local secondary index follows below.
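A minimal sketch (boto3 assumed; the names are illustrative, borrowed from the Order History example quoted later in this post):

import boto3

client = boto3.client('dynamodb')

client.create_table(
    TableName='OrderHistory',
    AttributeDefinitions=[
        {'AttributeName': 'CustomerId', 'AttributeType': 'S'},
        {'AttributeName': 'OrderDate', 'AttributeType': 'S'},
        {'AttributeName': 'DestinationCity', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'CustomerId', 'KeyType': 'HASH'},
        {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'},
    ],
    LocalSecondaryIndexes=[{
        'IndexName': 'DestinationCityIndex',
        'KeySchema': [
            {'AttributeName': 'CustomerId', 'KeyType': 'HASH'},        # same hash key as the table
            {'AttributeName': 'DestinationCity', 'KeyType': 'RANGE'},  # alternate range key
        ],
        'Projection': {'ProjectionType': 'ALL'},
    }],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)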

Global secondary index
The name global suggests that a query searches across all table partitions, as opposed to the single partition searched by a local secondary index. Here, we create a new hash key and an optional range key, different from the table's hash and range keys, to get the index working.

The global secondary index needs a hash key and an optional range key.
The hash and range keys of a global secondary index are different from the table's hash and range keys.
The global secondary index allows you to query data across the whole table. It does not restrict its search to a single data partition; hence the name global.
The global secondary index supports only eventually consistent reads.
The global secondary index maintains its own separate read and write capacity units; it does not consume the table's capacity units.
Unlike local secondary indexes, global secondary indexes do not have any size limit. A creation sketch follows below.
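A corresponding sketch for a global secondary index (again boto3; names illustrative). Note that the index carries its own provisioned throughput:

import boto3

client = boto3.client('dynamodb')

client.create_table(
    TableName='OrderHistory',
    AttributeDefinitions=[
        {'AttributeName': 'CustomerId', 'AttributeType': 'S'},
        {'AttributeName': 'OrderDate', 'AttributeType': 'S'},
        {'AttributeName': 'ZipCode', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'CustomerId', 'KeyType': 'HASH'},
        {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'},
    ],
    GlobalSecondaryIndexes=[{
        'IndexName': 'ZipCodeIndex',
        'KeySchema': [{'AttributeName': 'ZipCode', 'KeyType': 'HASH'}],  # hash key differs from the table's
        'Projection': {'ProjectionType': 'KEYS_ONLY'},
        # The GSI has its own throughput, separate from the table's.
        'ProvisionedThroughput': {'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    }],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)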


Data types
Scalar data types (string, number, and binary)
Multivalued data types (string set, number set, and binary set)


Items in DynamoDB are simply collections of attributes. Attributes can be strings, numbers, binaries, or sets of scalar attributes. Each attribute consists of a name and a value. An item must have a primary key; a primary key can be a hash key or a combination of hash and range keys. In addition to the primary key, items can have any number of attributes, with the constraint that the total item size cannot exceed 64 KB. An example item follows below.
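To make the attribute model concrete, here is a hedged put_item sketch (boto3; values illustrative) mixing scalar and multivalued types:

import boto3

client = boto3.client('dynamodb')

client.put_item(
    TableName='Person',
    Item={
        'BirthYear': {'N': '1985'},           # number (scalar)
        'SSN': {'S': '123-45-6789'},          # string (scalar)
        'Name': {'S': 'John Doe'},            # string (scalar)
        'PhoneNumbers': {'SS': ['555-1111', '555-2222']},  # string set
        'LuckyNumbers': {'NS': ['7', '13']},  # number set
    },
)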

Parameter | Local secondary index | Global secondary index
Hash and range keys | Needs both hash and range keys; the index hash key is the same as the table hash key. | Needs a hash key and an optional range key; the index hash and range keys are different from the table keys.
Query scope | Limited to a single partition's data only. | Queries over the complete table data, which means you can also query on hash keys that are not the table hash key.
Consistency | Provides the option to select either eventual or strong consistency. | Supports only eventual consistency.
Size | The total size of all items for a single hash key must be less than 10 GB. | No size limit.
Provisioned throughput | Uses the same provisioned throughput as specified for the table. | Maintains its own provisioned throughput; read and write capacity units must be specified at index creation.
Strong versus eventual consistency
DynamoDB lets you choose the consistency per read: a strongly consistent read reflects all writes acknowledged before the read, while an eventually consistent read may return slightly stale data but consumes half the read capacity.

Atomic counters
AWS supports atomic counters, which can be used to increment or decrement a value as needed. This special feature applies increment and decrement requests in the order they are received. To implement an atomic counter, use the ADD action of the UpdateItem API. A good use case for atomic counters is a website visitor count; a sketch follows below.
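A minimal sketch of such a counter using the ADD action of UpdateItem (boto3; the table and attribute names are illustrative):

import boto3

client = boto3.client('dynamodb')

# Increment the visitor counter by 1; ADD is applied atomically,
# in the order the requests are received.
client.update_item(
    TableName='SiteStats',
    Key={'PageId': {'S': 'home'}},
    UpdateExpression='ADD VisitCount :inc',
    ExpressionAttributeValues={':inc': {'N': '1'}},
)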

Item size
One good practice to reduce item size is to shorten attribute names. For example, instead of naming an attribute yearOfPublishing, you could use the acronym yop.


Query versus scan

From a performance point of view, a query is much more efficient than a scan, as a query works on a limited set of items while a scan churns through the entire table. The scan operation first reads the entire table and then applies filters to remove unnecessary data from the result set, so as your table grows, the scan operation takes longer to return results.

The query operation's performance depends entirely on the amount of data retrieved; the number of keys matching the search criteria determines query performance. If the items matching a given hash key exceed the 1 MB response size limit, you can use pagination: the ExclusiveStartKey parameter lets you continue from the last key returned by the earlier request. You need to submit a new query request for this.

Query results can be either eventually consistent or, optionally, strongly consistent, while scan results are only eventually consistent.

Pagination
DynamoDB provides two useful parameters, LastEvaluatedKey and ExclusiveStartKey, which allow us to fetch results in pages. If a query or scan result reaches the maximum size limit of 1 MB, you can issue the next request by setting ExclusiveStartKey to the LastEvaluatedKey of the previous response. When DynamoDB reaches the end of the search results, it returns a null LastEvaluatedKey.
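A pagination loop might look like this (boto3; table, keys, and values are illustrative):

import boto3

client = boto3.client('dynamodb')

items = []
kwargs = {
    'TableName': 'OrderHistory',                      # illustrative table
    'KeyConditionExpression': 'CustomerId = :cid',
    'ExpressionAttributeValues': {':cid': {'S': 'C42'}},
}
while True:
    page = client.query(**kwargs)
    items.extend(page['Items'])
    last_key = page.get('LastEvaluatedKey')  # absent once results are exhausted
    if not last_key:
        break
    kwargs['ExclusiveStartKey'] = last_key   # continue from the previous page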

Parallel scan
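The Scan API supports the TotalSegments and Segment parameters, which let multiple workers scan disjoint portions of a table concurrently. A hedged sketch (boto3; the table name and segment count are illustrative):

import boto3
from concurrent.futures import ThreadPoolExecutor

client = boto3.client('dynamodb')
TOTAL_SEGMENTS = 4  # illustrative; tune to your worker count

def scan_segment(segment):
    # Each worker scans only its own disjoint segment of the table.
    items = []
    kwargs = {'TableName': 'OrderHistory',
              'Segment': segment, 'TotalSegments': TOTAL_SEGMENTS}
    while True:
        page = client.scan(**kwargs)
        items.extend(page['Items'])
        if 'LastEvaluatedKey' not in page:
            return items
        kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    segment_results = list(pool.map(scan_segment, range(TOTAL_SEGMENTS)))
all_items = [item for seg in segment_results for item in seg]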

DynamoDB uses a consistent hashing algorithm to achieve uniform data partitioning, object versioning for consistency, and a quorum technique to maintain consistency amongst the replicas.
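To make the quorum idea concrete (the specific values here are illustrative): with N = 3 replicas, a write quorum of W = 2 and a read quorum of R = 2 satisfy the Dynamo paper's rule R + W > N (2 + 2 > 3), which guarantees that every read quorum overlaps every write quorum in at least one replica, so a read always touches at least one replica holding the latest acknowledged write.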


DynamoDB does not have any master or superior node that controls everything. Rather, it maintains a ring structure of nodes, where every node has equal responsibilities.
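As an illustrative sketch (not DynamoDB's actual implementation), a consistent-hash ring can be modeled with a sorted structure: a key is stored on the first node at or after its hash, wrapping around the ring.

import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: a key maps to the first node clockwise."""

    def __init__(self, nodes):
        self._points = sorted((self._hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._points]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Wrap around to the first node when the key hashes past the last point.
        i = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._points)
        return self._points[i][1]

ring = HashRing(['node-a', 'node-b', 'node-c'])
print(ring.node_for('customer#42'))   # the coordinator node for this key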

Design features

Data replication
Replicas are updated asynchronously by a background process.

Conflict resolution
DynamoDB uses an "always writable" strategy, allowing writes at all times. This is a crucial strategy from Amazon's business point of view: they don't want customers to wait for a write to happen until a conflict is resolved.

When the database resolves conflicts itself, it prefers the last-write-wins strategy. Amazon also gives you the choice of implementing your own conflict resolution through features such as conditional writes; a sketch follows below.
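A sketch of application-level conflict resolution via conditional writes (boto3; the version-number scheme here is an illustrative convention, not a built-in):

import boto3
from botocore.exceptions import ClientError

client = boto3.client('dynamodb')

try:
    # Only overwrite if the stored version is still the one we read,
    # i.e. an optimistic, application-level conflict check.
    client.put_item(
        TableName='User',
        Item={'UserId': {'S': 'u1'},
              'Email': {'S': 'new@example.com'},
              'Version': {'N': '2'}},
        ConditionExpression='Version = :expected',
        ExpressionAttributeValues={':expected': {'N': '1'}},
    )
except ClientError as e:
    if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
        print('Write conflict: someone else updated the item first')
    else:
        raise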

Scalability
Symmetry
Flexibility

Load balancing

The coordinator node first stores the key on the local machine and then replicates it to the N minus 1 successor machines in the clockwise direction.

Handling failures
Temporary failures  ==> hinted handoff

Permanent failures  ==> Merkle tree

Merkle tree
DynamoDB uses Merkle trees to maintain replica synchronization. Comparing all existing replicas and updating each with the latest changes is called anti-entropy.

A Merkle tree is a data structure used to store and compare objects efficiently.
In a Merkle tree, each parent node contains the hash of its children, so if the root hashes of two trees are the same, the two trees are equal.

In the case of DynamoDB, each node builds a Merkle tree of its replica and compares it with the others. If the root hashes are the same, the replicas are in sync; if not, the replicas are out of sync, and you compare the child nodes level by level to find the actual discrepancy.

Each DynamoDB node maintains a Merkle tree for every key range it hosts. This allows DynamoDB to check whether a given key range is in sync across replicas.
If it finds any discrepancy, a child-wise traversal locates its cause; a minimal sketch follows below.
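A toy sketch of the idea (plain Python, not DynamoDB's actual implementation; replicas are modeled as aligned lists of byte blocks):

import hashlib

def merkle_root(blocks):
    """Bottom-up Merkle root over a list of byte blocks."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

replica_a = [b'item1', b'item2', b'item3', b'item4']
replica_b = [b'item1', b'item2', b'item3-changed', b'item4']

# Equal roots => replicas in sync; different roots => walk down to find the diff.
print(merkle_root(replica_a) == merkle_root(replica_b))   # False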

Cassandra and Riak use this same replica synchronization technique.

Seed nodes
DynamoDB keeps seed nodes that hold static information about the cluster; some nodes from the cluster play the role of seed nodes. Seed nodes have all the membership information, as that information is derived from an external service, and all nodes ultimately reconcile their membership information with the seed nodes.

Each DynamoDB node consists of the following components:
Request coordinator

Membership and failure detection

Local persistent store (storage engine)
DynamoDB's local persistence store is a pluggable system where you can select the storage engine depending upon the application's use. Generally, DynamoDB uses Berkeley DB or MySQL as the local persistence store. Berkeley DB is a high-performance embedded database library used for key-value storage. The engine is selected based on the application's object size: Berkeley DB is used where objects are at most about 10 KB, while MySQL is used where object sizes are expected to be larger.

To avoid creating hot partitions in DynamoDB, it is recommended to append a random number to the hash key.
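A hedged sketch of this write-sharding pattern (boto3; the table, key format, and shard count are illustrative):

import random
import boto3

client = boto3.client('dynamodb')
SHARDS = 10   # illustrative shard count

# Spread writes for one logical key across 10 physical hash keys.
date = '2015-08-31'
sharded_key = '%s.%d' % (date, random.randint(1, SHARDS))
client.put_item(
    TableName='Events',
    Item={'EventDate': {'S': sharded_key},
          'EventId': {'S': 'e-123'}},
)
# Reads must then query every suffix ('2015-08-31.1' .. '2015-08-31.10')
# and merge the results.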

Managing time series data
Choosing the order ID as the hash key and the date/time as the range key is not a good design for time series data.
Instead, it is recommended to create tables based on time ranges, which means creating a new table for each week or month rather than saving all data in a single table. This strategy helps avoid hot or cold partitions, and you can simply query the table for the particular time range. It also helps with purging: you can simply drop the tables you no longer need, or alternatively dump that data to AWS S3 as flat files; S3 is a cheap data storage service from Amazon.

Storing large attribute values
Using compression
Compressing large attribute values before storing them as binary data keeps items small and reduces the capacity units a read or write consumes.
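A minimal sketch (boto3 with Python's zlib; table and attribute names are illustrative):

import zlib
import boto3

client = boto3.client('dynamodb')

review = 'a very long product review ...' * 100
compressed = zlib.compress(review.encode('utf-8'))

client.put_item(
    TableName='ProductReviews',
    Item={'ProductId': {'S': 'p1'},
          'ReviewId': {'S': 'r1'},
          'ReviewText': {'B': compressed}},   # binary attribute
)

# Decompress after reading the item back.
item = client.get_item(TableName='ProductReviews',
                       Key={'ProductId': {'S': 'p1'}, 'ReviewId': {'S': 'r1'}})
text = zlib.decompress(item['Item']['ReviewText']['B']).decode('utf-8')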

Implementing one-to-many relationship
Table – UserCreds
Table – User

Indexing should be done only for tables that do not get heavy writes, as maintaining those indexes is quite costly.
If a required attribute is not projected in the index, then DynamoDB fetches that attribute from the table, which consumes more read capacity units.

Purging of unnecessary items from the table: This would also delete the respective items from the index as well.
Updating items to remove projected attributes from the index: This will automatically reduce the size of the item collection.
Backup or moving old data to a new table: It is always a good practice to save historical data in a different table.

AWS supports Identity and Access Management (IAM) as a service.

There is a limit of five local and five global secondary indexes per table.

There is a maximum limit of 20 user-specified projected attributes on a local secondary index.
For the US East region, a table can scale up to 40,000 read or write capacity units; for the rest of the regions, DynamoDB tables can scale up to 10,000 read/write capacity units per table.

http://stackoverflow.com/questions/9752262/dynamodb-database-design-key-value-store-nosql
https://aws.amazon.com/blogs/aws/now-available-global-secondary-indexes-for-amazon-dynamodb/
Each table has a specified attribute called a hash key. An additional range key attribute can also be specified for the table. The hash key and optional range key attribute(s) define the primary index for the table, and each item is uniquely identified by its hash key and range key (if defined). Items contain an arbitrary number of attribute name-value pairs, constrained only by the maximum item size limit. In the absence of indexes, item lookups require the hash key of the primary index to be specified.

The Local and Global Index models extend the basic indexing functionality provided by DynamoDB. Let's consider some use cases for each model:
  • Local Secondary Indexes are always queried with respect to the table’s hash key, combined with the range key specified for that index.  In effect (as commenter Stuart Marshall made clear on the preannouncement post), Local Secondary Indexes provide alternate range keys. For example, you could have an Order History table with a hash key of customer id, a primary range key of order date, and a secondary index range key on order destination city. You can use a Local Secondary Index to find all orders delivered to a particular city using a simple query for a given customer id.
  • Global Secondary Indexes can be created with a hash key different from the primary index; a single Global Secondary Index hash key can contain items with different primary index hash keys. In the Order History table example, you can create a global index on zip code, so that you can find all orders delivered to a particular zip code across all customers. Global Secondary Indexes allow you to retrieve items based on any desired attribute.
Both Global and Local Secondary Indexes allow multiple items for the same secondary key value. 
Local Secondary Indexes support strongly consistent reads, allow projected and non-projected attributes to be retrieved via queries and share provisioned throughput capacity with the associated table. Local Secondary Indexes also have the additional constraint that the total size of data for a single hash key is currently limited to 10 gigabytes.
Global Secondary Indexes are eventually consistent, allow only projected attributes to be retrieved via queries, and have their own provisioned throughput specified separately from the associated table.
As I noted earlier, each Global Secondary Index has its own provisioned throughput capacity. By combining this feature with the ability to project selected attributes into an index, you can design your table and its indexes to support your application’s unique access patterns, while also tuning your costs. If your table is “wide” (lots of attributes) and an interesting and frequently used query requires a small subset of the attributes, consider projecting those attributes into a Global Secondary Index. This will allow the frequently accessed attributes to be fetched without expending read throughput on unnecessary attributes.
