Friday, July 17, 2015

DynamoDB Miscs



https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_HowItWorks.html
A global table is a collection of one or more replica tables, all owned by a single AWS account.
A replica table (or replica, for short) is a single DynamoDB table that functions as part of a global table. Each replica stores the same set of data items. Any given global table can have only one replica table per region.

Conflicts can arise if applications update the same item in different regions at about the same time. To ensure eventual consistency, DynamoDB global tables use a “last writer wins” reconciliation between concurrent updates, where DynamoDB makes a best effort to determine the last writer. With this conflict resolution mechanism, all of the replicas will agree on the latest update, and converge toward a state in which they all have identical data.


https://stackoverflow.com/questions/47329936/dynamodb-conflict-resolution-strategy
DynamoDB, for GetItem queries, allows both eventually consistent and strongly consistent reads, configurable with a parameter on the request (as described in the docs here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html). For strongly consistent reads the value returned is the most recent value at the time the query was executed. For eventually consistent reads it is possible to read a slightly out-of-date version of an item, but there is no "conflict resolution" per se.
You may be thinking about conditional updates which allow for requests to fail if an expected condition is not met at the time the query is executed.
https://aws.amazon.com/blogs/developer/performing-conditional-writes-using-the-amazon-dynamodb-transaction-library/
The DynamoDB transaction library provides a convenient way to perform atomic reads and writes across multiple DynamoDB items and tables. The library does all of the nuanced item locking, commits, applies, and rollbacks for you, so that you don’t have to worry about building your own state machines or other schemes to make sure that writes eventually happen across multiple items. 
dynamodb.updateItem(
  new UpdateItemRequest()
    .withTableName("Games")
    .addKeyEntry("GameId", new AttributeValue("cf3df"))
    .addAttributeUpdatesEntry("Top-Left", 
      new AttributeValueUpdate(new AttributeValue("X"), AttributeAction.PUT))
    .addAttributeUpdatesEntry("Turn",
      new AttributeValueUpdate(new AttributeValue("Alice"), AttributeAction.PUT))
    .addExpectedEntry("Turn", new ExpectedAttributeValue(new AttributeValue("Bob"))) // A condition to ensure it's still Bob's turn
    .addExpectedEntry("Top-Left", new ExpectedAttributeValue(false)));               // A condition to ensure the Top-Left hasn't been played
Conditional writes with the transaction library
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.Methods.html


Map<String, List<Object>> items = mapper.batchLoad(itemsToGet);


If the partition key and sort key are of type String, and annotated with @DynamoDBAutoGeneratedKey, then they are given a random universally unique identifier (UUID) if left uninitialized. Version fields annotated with @DynamoDBVersionAttribute will be incremented by one.

By default, only attributes corresponding to mapped class properties are updated; any additional existing attributes on an item are unaffected. However, if you specify SaveBehavior.CLOBBER, you can force the item to be completely overwritten.
mapper.save(obj, new DynamoDBMapperConfig(DynamoDBMapperConfig.SaveBehavior.CLOBBER));
If you have versioning enabled, then the client-side and server-side item versions must match. However, the version does not need to match if the SaveBehavior.CLOBBER option is used.

you can optionally request strongly consistent reads to ensure that this method retrieves only the latest item values as shown in the following Java statement.
CatalogItem latestItem = mapper.load(CatalogItem.class, item.getId(), 
                new DynamoDBMapperConfig(DynamoDBMapperConfig.ConsistentReads.CONSISTENT)); 
By default, DynamoDB returns item values that are eventually consistent.

Queries a table or a secondary index. You can query a table or an index only if it has a composite primary key (partition key and sort key). This method requires you to provide a partition key value and a query filter that is applied on the sort key. A filter expression includes a condition and a value.

Strongly consistent reads consume twice as many capacity units as eventually consistent reads, so they use more of your provisioned throughput and hence cost you more.
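As a worked example (using the standard capacity-unit definitions: one read capacity unit supports one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads): serving 100 strongly consistent reads/second of 4 KB items needs 100 RCU, while the same traffic read eventually consistently needs only 50 RCU.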
https://aws.amazon.com/dynamodb/pricing/

Amazon DynamoDB Streams is a time-ordered sequence of item-level changes on an Amazon DynamoDB table. DynamoDB Streams can be enabled on a per table basis. There is no charge for enabling DynamoDB Streams. You only pay for reading data from DynamoDB Streams

https://blog.cloudability.com/eight-ways-to-lower-your-dynamodb-costs/
When using DynamoDB, you pay a flat, hourly rate based on how much capacity you have provisioned. Capacity is provisioned as a number of Write Capacity units and a number of Read Capacity units. Each Write Capacity unit supports one write per second (3,600 writes per hour), and each Read Capacity unit supports one strongly consistent read per second (3,600 per hour) or two eventually consistent reads per second (7,200 per hour); pricing at the time was quoted per block of 10 write units or 50 read units, which is where per-hour figures like 36,000 writes and 180,000/360,000 reads come from. There is a different price for Write Capacity units than there is for Read Capacity units, and these prices differ based on Region.
With this billing model in mind, many of your primary considerations when cost-optimizing your DynamoDB usage will be related to lean, strategic provisioning.

3) Don’t discount Eventually Consistent Reads (unless you have to)

For the same price, AWS allows you to do more Eventually Consistent Reads per hour than Strongly Consistent Reads. This means that you can get a lower price-per-read by selecting the Eventually Consistent Reads option, if Eventually Consistent Reads will satisfy your read demands.
DynamoDB Reserved Capacity offers savings over the normal price of DynamoDB in exchange for paying for the capacity in advance. In other words, you can choose to commit to using a certain amount of Write Capacity and Read Capacity units in exchange for a lower price for those units. Should you not use those units, however, too bad for you—you still have to pay for them.

http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html



Amazon DynamoDB is available in multiple AWS Regions around the world. Each region is completely independent and isolated from other AWS Regions. For example, if you have a table called People in the us-east-1 region and another table named People in the us-west-2 region, these are considered two entirely separate tables. For a list of all the AWS Regions in which DynamoDB is available, see AWS Regions and Endpoints in the Amazon Web Services General Reference.
Every AWS Region consists of multiple distinct locations called Availability Zones. Each Availability Zone is engineered to be isolated from failures in other Availability Zones, and to provide inexpensive, low-latency network connectivity to other Availability Zones in the same region. This design allows for rapid replication of your data among multiple Availability Zones in a region.
When your application writes data to a DynamoDB table and receives an HTTP 200 response (OK), all copies of the data are updated. However, it takes time for the data to propagate to all storage locations within the current AWS region. The data will eventually be consistent across all of these storage locations, usually within one second or less.
To support varied application requirements, DynamoDB supports both eventually consistent and strongly consistent reads.
Eventually Consistent Reads
When you read data from a DynamoDB table, the response might not reflect the results of a recently completed write operation. The response might include some stale data. However, if you repeat your read request after a short time, the response should return the latest data.
Strongly Consistent Reads
When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful. Note that a strongly consistent read might not be available in the case of a network delay or outage.
Note
DynamoDB uses eventually consistent reads, unless you specify otherwise. Read operations (such as GetItem, Query, and Scan) provide a ConsistentRead parameter: if you set this parameter to true, DynamoDB will use strongly consistent reads during the operation.
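For example, a minimal sketch with the v1 Java SDK (the table and key names here are made up for illustration):

GetItemRequest request = new GetItemRequest()
    .withTableName("People")
    .addKeyEntry("PersonId", new AttributeValue().withN("123"))
    .withConsistentRead(true);   // opt in to a strongly consistent read
GetItemResult result = client.getItem(request);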

http://stackoverflow.com/questions/20870041/amazon-dynamodb-strong-consistent-reads-are-they-latest-and-how
From the definition above, what I get is that a strong consistent read will return the latest write value.
Taking an example: Let's say Client1 issues a write command on Key K1 to update the value from V0 to V1. After a few milliseconds Client2 issues a read command for Key K1; with strong consistency V1 will always be returned, whereas with eventual consistency either V1 or V0 may be returned. Is my understanding correct?
If it is, What if the write operation returned success but the data is not updated to all replicas and we issue a strongly consistent read, how it will ensure to return the latest write value in this case?
Short answer: Writing successfully in strongly consistent mode requires that your write succeed on a majority of servers that can contain the record, therefore any future consistent reads will always see the same data, because a consistent read must read a majority of the servers that can contain the desired record. If you do not perform a strongly consistent read, the system will ask a random server for the record, and it is possible that the data will not be up-to-date.
Imagine three servers. Server 1, server 2 and server 3. To write a strongly consistent record, you pick two servers at minimum, and write the data. Let's pick 1 and 2.
Now you want to read the data consistently. Pick a majority of servers. Let's say we picked 2 and 3.
Server 2 has the new data, and this is what the system returns.
Eventually consistent reads could come from server 1, 2, or 3. This means if server 3 is chosen by random, your new write will not appear yet, until replication occurs.
If a single server fails, your data is still safe, but if two out of three servers fail your new write may be lost until the offline servers are restored.
More explanation: DynamoDB (assuming it is similar to the database described in the Dynamo paper that Amazon released) uses a ring topology, where data is spread to many servers. Strong consistency is guaranteed because you directly query all relevant servers and get the current data from them. There is no master in the ring, there are no slaves in the ring. A given record will map to a number of identical hosts in the ring, and all of those servers will contain that record. There is no slave that could lag behind, and there is no master that can fail.
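Stated as the usual quorum rule from the Dynamo paper linked above: with N replicas, a write acknowledged by W of them and a read that consults R of them must overlap whenever R + W > N. The example above is N = 3, W = 2, R = 2, so 2 + 2 > 3 and every consistent read hits at least one server that took the write.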
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
com.amazonaws.ClientConfiguration

https://www.packtpub.com/books/content/dynamodb-best-practices

    Auto-retries are required if we get any errors during the first request. We can use the Amazon client configuration to set our retry strategy. By default, the DynamoDB client retries a failed request up to three times. If that default does not suit us, we can define our own retry condition, as follows:
    public class CustomRetryCondition implements RetryCondition {
        // RetryCondition here is com.amazonaws.retry.RetryPolicy.RetryCondition
        public boolean shouldRetry(AmazonWebServiceRequest originalRequest,
                AmazonClientException exception, int retriesAttempted) {
            return retriesAttempted < 3 && exception.isRetryable();
        }
    }
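    A sketch of plugging the custom condition into the client via a RetryPolicy (the backoff strategy and retry cap shown are illustrative choices, not prescribed by the book):

    ClientConfiguration config = new ClientConfiguration();
    config.setRetryPolicy(new RetryPolicy(new CustomRetryCondition(),
            PredefinedRetryPolicies.DYNAMODB_DEFAULT_BACKOFF_STRATEGY, 3, true));
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials, config);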
    Performing atomic transactions on DynamoDB tables
    I hope we are all aware that operations in DynamoDB are eventually consistent. Given this nature, it obviously does not support transactions the way an RDBMS does. A transaction is a group of operations that need to be performed in one go, and they should be handled atomically. (If one operation fails, the complete transaction should be rolled back.)
    There might be use cases where you need to perform transactions in your application. Considering this need, AWS provides an open-source, client-side transaction library, which helps us achieve atomic transactions in DynamoDB. In this recipe, we are going to see how to perform transactions on DynamoDB.

    Performing asynchronous requests to DynamoDB
    dynamoDBAsync.deleteItemAsync(deleteItemRequest,
        new AsyncHandler<DeleteItemRequest, DeleteItemResult>() {
            public void onSuccess(DeleteItemRequest request, DeleteItemResult result) {
                System.out.println("Item deleted successfully: " + System.currentTimeMillis());
            }
            public void onError(Exception exception) {
                System.out.println("Error deleting item in async way");
            }
        });
    Asynchronous clients use AsyncHttpClient to invoke the DynamoDB APIs.
    http://stackoverflow.com/questions/13128613/amazon-dynamo-db-max-client-connections
    Since the connections to Amazon DynamoDB are HTTP(S)-based, the concept of open connections is limited by your TCP max open connections at once. I highly doubt there's a limit on Amazon's end at all, as it is load balanced close to infinity.
    Naturally, the exception is your read and write capacity limits. Note that they want you to contact them if you will exceed a certain amount of capacity units, which depends on your region.
    You've probably already read them, but the limits of DynamoDB are found here:
    https://java.awsblog.com/post/Tx1JADUVTWF7SP2/Specifying-Conditional-Constraints-with-Amazon-DynamoDB-Mapper
       DynamoDBSaveExpression saveExpression = new DynamoDBSaveExpression();
       Map<String, ExpectedAttributeValue> expected = new HashMap<String, ExpectedAttributeValue>();
       expected.put("status",
          new ExpectedAttributeValue(new AttributeValue("READY")).withExists(true));
       saveExpression.setExpected(expected);
       mapper.save(obj, saveExpression);
    https://www.jayway.com/2013/08/24/create-entity-if-not-exists-in-dynamodb-from-java/
     PutItemRequest putItemRequest = new PutItemRequest().withTableName("tableName")
                    .withItem(new ImmutableMap.Builder<String, AttributeValue>()
                            .put("id", new AttributeValue("entityId"))
                            .put("my_attribute", new AttributeValue("value"))
                            .build())
                    .withExpected(new ImmutableMap.Builder<String, ExpectedAttributeValue>()
                            // When exists is false and the id already exists, a ConditionalCheckFailedException will be thrown
                            .put("id", new ExpectedAttributeValue(false))
                            .build());
            try {
                amazonDynamoDB.putItem(putItemRequest);
            } catch (ConditionalCheckFailedException e) {
                // This indicates that the entity already exists
                ….
            }

    How to get row count from DynamoDB
    https://forums.aws.amazon.com/message.jspa?messageID=378737
    We got an update from the DynamoDB team. They said to use the DescribeTable API: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DescribeTable.html. One of the attributes returned is an ItemCount for the table, which corresponds to "row count" in classic SQL.

    Also, Amazon DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.

    http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DescribeTable.html
    https://java.awsblog.com/post/Tx2LVB1TA774M13/Snippet-Creating-Amazon-DynamoDB-Tables
                DescribeTableRequest request = new DescribeTableRequest()
                     .withTableName(tableName);
                TableDescription table = dynamo.describeTable(request).getTable();
                if ( table == null ) continue;
                String tableStatus = table.getTableStatus();
                System.out.println("  - current state: " + tableStatus);
    https://medium.com/@quodlibet_be/an-overview-of-tools-to-export-from-dynamoddb-to-csv-d2707ad992ac
    Exporting from the console :
    maximum of 100 records
    If your DynamoDB table contains nested documents, the columns in your CSV files will contain JSON objects.

    DynamodbToCSV4j: a Java tool/library to export a complete DynamoDB table, or the result of a scan operation, to a CSV file. It supports mixed documents and will flatten any nested documents for easier processing.

    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.QueryScanExample.html
    Map<String, AttributeValue> eav = new HashMap<String, AttributeValue>();
    eav.put(":val1", new AttributeValue().withS("Bicycle"));
    eav.put(":val2", new AttributeValue().withS(bicycleType));

    DynamoDBScanExpression scanExpression = new DynamoDBScanExpression()
        .withFilterExpression("ProductCategory = :val1 and BicycleType = :val2")
        .withExpressionAttributeValues(eav);

    List<Bicycle> scanResult = mapper.parallelScan(Bicycle.class, scanExpression, numberOfThreads);
    for (Bicycle bicycle : scanResult) {
        System.out.println(bicycle);
    }

    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ScanJavaDocumentAPI.html
    private static void findProductsForPriceLessThanZero() {
        Table table = dynamoDB.getTable(tableName);
        Map<String, Object> expressionAttributeValues = new HashMap<String, Object>();
        expressionAttributeValues.put(":pr", 100);

        ItemCollection<ScanOutcome> items = table.scan(
            "Price < :pr",                          // FilterExpression
            "Id, Title, ProductCategory, Price",    // ProjectionExpression
            null,                                   // ExpressionAttributeNames - not used in this example
            expressionAttributeValues);

        System.out.println("Scan of " + tableName + " for items with a price less than 100.");
        Iterator<Item> iterator = items.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next().toJSONPretty());
        }
    }

    http://www.programcreek.com/java-api-examples/index.php?api=com.amazonaws.services.dynamodbv2.datamodeling.PaginatedScanList
        DynamoDBScanExpression dynamoDBScanExpression = new DynamoDBScanExpression();
        DynamoDBMapperConfig config = new DynamoDBMapperConfig(DynamoDBMapperConfig.PaginationLoadingStrategy.EAGER_LOADING);
        PaginatedScanList<T> paginatedScanList = dynamoDBMapper.scan(getType(), dynamoDBScanExpression, config);
        paginatedScanList.loadAllResults();
        return paginatedScanList.size();

    http://stackoverflow.com/questions/31850030/dynamodbmapper-load-vs-query
    If you have a hash key only schema, they perform the same operation - retrieve the item with the hash key specified.
    If you have a hash-range schema, load retrieves a specific item identified by a single hash + range pair. Query retrieves all items that have the specified hash key and meet the range key conditions.
    Since you are using the equality operator for both the hash key and range key, the operations are exactly equivalent.
    - Note that load doesn't support fetching multiple items by ID in one call; use batchLoad for that, as sketched below.
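    A minimal batchLoad sketch (CatalogItem and the ids are hypothetical; each object passed in only needs its key attributes set):

    List<Object> itemsToGet = new ArrayList<Object>();
    itemsToGet.add(new CatalogItem(101));
    itemsToGet.add(new CatalogItem(102));
    Map<String, List<Object>> items = mapper.batchLoad(itemsToGet);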

    Dynamonito is a drop-in replacement for DynamoDB's high-level mapper. It intercepts the DynamoDB save operations, serializes the object into DynamoDB's native wire protocol format in JSON, and puts the JSON in cache. The cache is a write-through cache.
    http://stackoverflow.com/questions/36347774/how-do-i-dynamically-change-the-table-accessed-using-dynamodbs-java-mapper
    DynamoDBMapperConfig is not a static/global class. You need to pass it to the DynamoDBMapper constructor.
        AmazonDynamoDBClient client = . . .;
        mapperConfig = new DynamoDBMapperConfig.Builder().withTableNameOverride(TableNameOverride.withTableNameReplacement(tableName))
            .build();
        mapper = new DynamoDBMapper(client, mapperConfig);

    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html
    Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in DynamoDB. If you use this strategy, then your database writes are protected from being overwritten by the writes of others — and vice-versa.
    With optimistic locking, each item has an attribute that acts as a version number. If you retrieve an item from a table, the application records the version number of that item. You can update the item, but only if the version number on the server side has not changed. If there is a version mismatch, it means that someone else has modified the item before you did; the update attempt fails, because you have a stale version of the item. If this happens, you simply try again by retrieving the item and then attempting to update it. Optimistic locking prevents you from accidentally overwriting changes that were made by others; it also prevents others from accidentally overwriting your changes.

    When you save an object, the corresponding item in the DynamoDB table will have an attribute that stores the version number. The DynamoDBMapper assigns a version number when you first save the object, and it automatically increments the version number each time you update the item. Your update or delete requests will succeed only if the client-side object version matches the corresponding version number of the item in the DynamoDB table.
    @DynamoDBIgnore
    public String getSomeProp() { return someProp; }

    @DynamoDBVersionAttribute
    public Long getVersion() { return version; }
    Disabling Optimistic Locking
    mapper.save(item, new DynamoDBMapperConfig( DynamoDBMapperConfig.SaveBehavior.CLOBBER));
    When a save or delete loses the optimistic-locking race, the mapper throws ConditionalCheckFailedException; catch it and retry.
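    A sketch of the usual retry loop on version conflict (CatalogItem and its fields are hypothetical):

    for (int attempt = 0; attempt < 3; attempt++) {
        try {
            CatalogItem item = mapper.load(CatalogItem.class, itemId);
            item.setPrice(item.getPrice() + 1);
            mapper.save(item); // throws if the server-side version changed
            break;
        } catch (ConditionalCheckFailedException e) {
            // Someone else updated the item first; reload and try again.
        }
    }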
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html
    https://github.com/awslabs/dynamodb-cross-region-library

    Walkthrough: Setting Up Replication Using the Cross Region Replication Console
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.Walkthrough.html
    1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
    2. In the upper right corner of the window, choose US East (N. Virginia) from the region selector.
    3. In the Overview tab, choose Manage Stream, and then do the following:
      • For View Type, choose New and old images.
      • Choose Enable.
    http://tech.adroll.com/blog/ops/2013/10/02/dynamodb-replication.html

    http://blog.sungardas.com/CTOLabs/2015/09/planning-for-failures-with-amazon-dynamodb-2/
     Streams, which “provides a time ordered sequence of item level changes in any DynamoDB table”.
    https://www.reddit.com/r/aws/comments/3lq36p/multiregion_replicationfailover_discuss/

    https://www.trek10.com/blog/running-in-one-aws-region/
    • AWS’s DNS service Route 53 (along with other DNS services) allows failover, load balanced, and latency-based DNS record sets. These are a key building block of a redundant configuration. Route 53 can automatically monitor the health of your endpoints and automatically switch the DNS from one region to another if the first region becomes unhealthy.
    • If you can, use CloudFormation and a configuration management system (e.g., Ansible, Salt, or Dockerfiles) to replicate your stack in two regions. This is the best way to replicate your environment and application tiers.

    Data Model
    Since the table is partitioned based on the hash key attribute, do not choose attributes that have only a few (single-digit) unique values. For example, the Language attribute of our table has only three distinct values; choosing it as the hash key would concentrate the load on a few partitions and eat up a lot of throughput.
    Use the most restrictive data type. For example, if we decide to make some number attributes primary key attributes, then (even though String can also store numbers) we must use the Number data type only, because the hashing and ordering logic differs for each data type.
    Do not put too many attributes or too-lengthy attributes (using a delimiter, as discussed earlier) into the primary key attributes, because it then becomes mandatory for every item to have these attributes, and all of the attributes become part of the query operation, which is inefficient.

    Make the attribute you want ordered the range key attribute.
    Make the attribute you want grouped (or partitioned) the hash key attribute.

    DynamoDB creates an unordered hash index on the hash key and a sorted index on the range key attribute. The hash key should be selected in such a manner that it evenly distributes the data load across partitions.

    An example for the Person table would be choosing birth year as the hash key and SSN as the range key.

    Book (yearOfPublishing, bookId, ...)
    Hash key and range key:

    yearOfPublishing: lets us store books according to the year in which they were published
    bookId: unique book ID
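    A sketch of the Book entity with these keys (class and accessors are illustrative):

    @DynamoDBTable(tableName = "Book")
    public class Book {
        private int yearOfPublishing;
        private String bookId;

        @DynamoDBHashKey
        public int getYearOfPublishing() { return yearOfPublishing; }
        public void setYearOfPublishing(int year) { this.yearOfPublishing = year; }

        @DynamoDBRangeKey
        public String getBookId() { return bookId; }
        public void setBookId(String bookId) { this.bookId = bookId; }
    }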

    http://stackoverflow.com/questions/21794945/dynamodb-scan-in-sorted-order
    http://stackoverflow.com/questions/9297326/is-it-possible-to-order-results-with-query-or-scan-in-dynamodb
    If you know the HashKey, then any query will return the items sorted by Range key:
    "Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order; otherwise, the results are returned in order of ASCII character code values. By default, the sort order is ascending. To reverse the order use the ScanIndexForward parameter set to false." Query and Scan Operations - Amazon DynamoDB :http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
    Dynamic Table Name in DynamoDB with Java DynamoMapper
    @DynamoDBTable(tableName = "REPLACED_BY_VALUE_IN_PROPERTIES_FILE")
    public class MyEntity

    String tableName = .. // Get table name from a property file
    dynamoMapper.save(myEntity, new DynamoDBMapperConfig(new TableNameOverride(tableName)));
    What is Hash and Range Primary Key?
    http://stackoverflow.com/questions/27329461/what-is-hash-and-range-primary-key
    "Hash and Range Primary Key" means that a single row in DynamoDB has a unique primary key made up of both the hash and the range key. For example with a hash key of X and range key ofY, your primary key is effectively XY. You can also have multiple range keys for the same hash key but the combination must be unique, like XZ and XA. Let's use their examples for each type of table:
    Hash Primary Key – The primary key is made of one attribute, a hash attribute. For example, a ProductCatalog table can have ProductID as its primary key. DynamoDB builds an unordered hash index on this primary key attribute.
    This means that every row is keyed off of this value. Every row in DynamoDB will have a required, unique value for this attribute. Unordered hash index means what it says - the data is not ordered and you are not given any guarantees into how the data is stored. You won't be able to make queries on an unordered index such as Get me all rows that have a ProductID greater than X. You write and fetch items based on the hash key. For example, Get me the row from that table that has ProductID X. You are making a query against an unordered index so your gets against it are basically key-value lookups, are very fast, and use very little throughput.

    Hash and Range Primary Key – The primary key is made of two attributes. The first attribute is the hash attribute and the second attribute is the range attribute. For example, the forum Thread table can have ForumName and Subject as its primary key, where ForumName is the hash attribute and Subject is the range attribute. DynamoDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.
    This means that every row's primary key is the combination of the hash and range key. You can make direct gets on single rows if you have both the hash and range key, or you can make a query against the sorted range index. For example, Get me all rows from the table with hash key X that have range keys greater than Y, or other queries to that effect. They have better performance and less capacity usage compared to Scans and Queries against fields that are not indexed. From their documentation:
    Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order; otherwise, the results are returned in order of ASCII character code values. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false
    I probably missed some things as I typed this out and I only scratched the surface. There are a lot more aspects to take into consideration when working with DynamoDB tables (throughput, consistency, capacity, other indices, key distribution, etc.). You should take a look at the sample tables and data page for examples.
    Scan/Pagination
    https://java.awsblog.com/post/Tx24R2FVIIIV9IB/Understanding-Auto-Paginated-Scan-with-DynamoDBMapper

    The limit for a scan doesn't apply to how many results are returned, but to how many table items are examined. Because scan works on arbitrary item attributes, not the indexed table keys like query does, DynamoDB has to scan through every item in the table to find the ones you want, and it can't predict ahead of time how many items it will have to examine to find a match. The limit parameter is there so that you can control how much of your table's provisioned throughput to consume with the scan before returning the results collected so far, which may be empty.
    mapper.scan(MyObject.class, new DynamoDBScanExpression());
    Pagination with DynamoDBMapper Java AWS SDK
    // Using 'PaginatedScanList'
    final DynamoDBScanExpression paginatedScanListExpression = new DynamoDBScanExpression()
            .withLimit(limit);
    final PaginatedScanList<MyClass> paginatedList = mapper.scan(MyClass.class, paginatedScanListExpression);
    paginatedList.forEach(System.out::println);
    
    System.out.println();
    // using 'ScanResultPage'
    final DynamoDBScanExpression scanPageExpression = new DynamoDBScanExpression()
            .withLimit(limit);
    do {
        ScanResultPage<MyClass> scanPage = mapper.scanPage(MyClass.class, scanPageExpression);
        scanPage.getResults().forEach(System.out::println);
        System.out.println("LastEvaluatedKey=" + scanPage.getLastEvaluatedKey());
        scanPageExpression.setExclusiveStartKey(scanPage.getLastEvaluatedKey());
    
    } while (scanPageExpression.getExclusiveStartKey() != null);

    https://java.awsblog.com/post/TxPTR0HTAPBTM2/DynamoDB-Local-Test-Tool-Integration-for-Eclipse
    Once the test tool is installed, pop open the AWS Explorer view and switch it to the Local (localhost) region. 
    // The secret key doesn't need to be valid, DynamoDB Local doesn't care.
    AWSCredentials credentials = new BasicAWSCredentials(yourAccessKeyId, "bogus");
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials);
    // Make sure you use the same port as you configured DynamoDB Local to bind to.
    client.setEndpoint("http://localhost:8000");
    // Sign requests for the "local" region to read data written by the toolkit.
    client.setSignerRegionOverride("local");

    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLocal.html#Tools.DynamoDBLocal.DownloadingAndRunning
    java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
    • -sharedDb — DynamoDB Local will use a single database file, instead of using separate files for each credential and region. If you specify -sharedDb, all DynamoDB Local clients will interact with the same set of tables regardless of their region and credential configuration.

    client = new AmazonDynamoDBClient(credentials);
    client.setEndpoint("http://localhost:8000");

    aws command line:
    http://aws.amazon.com/cli/
    aws dynamodb create-table --table-name MusicCollection --attribute-definitions AttributeName=Artist,AttributeType=S AttributeName=SongTitle,AttributeType=S --key-schema AttributeName=Artist,KeyType=HASH AttributeName=SongTitle,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
    http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

    Locate the AWS Completer
    which aws_completer
     complete -C '/usr/local/bin/aws_completer' aws
    aws configure --profile local
    aws ec2 describe-instances --profile user2
    aws dynamodb put-item \
        --table-name MusicCollection \
        --item '{
            "Artist": {"S": "No One You Know"},
            "SongTitle": {"S": "Call Me Today"},
            "AlbumTitle": {"S": "Somewhat Famous"} }' \
        --return-consumed-capacity TOTAL
    
    aws dynamodb list-tables --endpoint-url http://localhost:8000
    aws dynamodb list-tables --endpoint-url http://localhost:8000 --profile local

    aws dynamodb query --table-name MusicCollection --key-conditions file://key-conditions.json
    aws dynamodb describe-table --table-name MusicCollection --profile local --endpoint-url http://localhost:8000

    aws dynamodb create-table --profile local --endpoint-url http://localhost:8000 --table-name MusicCollection --attribute-definitions AttributeName=Artist,AttributeType=S AttributeName=SongTitle,AttributeType=S --key-schema AttributeName=Artist,KeyType=HASH AttributeName=SongTitle,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
    DynamoDB Data Types
    Amazon DynamoDB supports the following data types:
    • Scalar types – Number, String, Binary, Boolean, and Null.
    • Multi-valued types – String Set, Number Set, and Binary Set.
    • Document types – List and Map.
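    A quick sketch of building these types with the v1 Java SDK's AttributeValue (attribute names and values are illustrative):

    Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
    item.put("Title", new AttributeValue().withS("Moby Dick"));        // String
    item.put("Price", new AttributeValue().withN("12.50"));            // Number
    item.put("InStock", new AttributeValue().withBOOL(true));          // Boolean
    item.put("Tags", new AttributeValue().withSS("classic", "novel")); // String Set
    item.put("Chapters", new AttributeValue().withL(
            new AttributeValue().withS("Loomings")));                  // List
    item.put("Meta", new AttributeValue().withM(Collections.singletonMap(
            "pages", new AttributeValue().withN("635"))));             // Map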
    API:
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ORM.html
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(new ProfileCredentialsProvider());
    DynamoDBMapper mapper = new DynamoDBMapper(client);
    mapper.save(item);     
    @DynamoDBIgnore
    @DynamoDBTable
    @DynamoDBAttribute
    @DynamoDBHashKey
    @DynamoDBRangeKey
    @DynamoDBAutoGeneratedKey
    The object persistence model will generate a random UUID when saving these attributes. Only String properties can be marked as auto-generated keys.
    @DynamoDBIndexHashKey
    Disabling Optimistic Locking
    To disable optimistic locking, you can change the DynamoDBMapperConfig.SaveBehavior enumeration value from UPDATE to CLOBBER.
    Optimistic Locking With Version Number
        @DynamoDBVersionAttribute
        public Long getVersion() { return version; }
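    Putting the annotations together, a sketch of a versioned entity with an auto-generated key (CatalogItem is illustrative):

    @DynamoDBTable(tableName = "ProductCatalog")
    public class CatalogItem {
        private String id;
        private Long version;

        @DynamoDBHashKey
        @DynamoDBAutoGeneratedKey
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        @DynamoDBVersionAttribute
        public Long getVersion() { return version; }
        public void setVersion(Long version) { this.version = version; }
    }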

    http://labs.journwe.com/2013/12/15/dynamodb-secondary-indexes/
    protected static DynamoDBMapper pm = new DynamoDBMapper(dynamoDBClient);
    pm.load(InspirationCategory.class, hashId, rangeId);
    By default, one DynamoDB Table can have exactly one hash id, and optionally one range id. Both hash id and range id must be specified at the time when a Table is created and cannot be changed later. If you want to make queries on more than one range id, DynamoDB offers Local Secondary Indexes. 

        @DynamoDBIndexRangeKey(localSecondaryIndexName =
         "active-index")
        public boolean isActive() {
            return active;
        }

    DynamoDBQueryExpression query = new DynamoDBQueryExpression()
        .withLimit(limit)
        .withRangeKeyCondition("active", new Condition()
            .withComparisonOperator(ComparisonOperator.EQ)
            .withAttributeValueList(new AttributeValue().withN("1")));
    InspirationTip tip = new InspirationTip();
    tip.setInspirationId(inspirationId);
    query.setHashKeyValues(tip);
    pm.query(InspirationTip.class, query);
     Local Secondary Indexes help us make queries on a different range id ("active" instead of "created"); however, the hash id ("inspirationId") must still be specified for the query.
    DynamoDB Global Secondary Indexes
        @DynamoDBIndexRangeKey(globalSecondaryIndexName =
                "email-index")
        public String getEmail() {
            return email;
        }
    QueryRequest queryRequest = new QueryRequest()
            .withTableName("journwe-useremail")
            .withIndexName("email-index");
    HashMap<String, Condition> keyConditions = new HashMap<String, Condition>();
    keyConditions.put("email", new Condition().
      withComparisonOperator(ComparisonOperator.EQ).
      withAttributeValueList(new AttributeValue().withS(email)));
    queryRequest.setKeyConditions(keyConditions);
    queryRequest.setSelect(Select.ALL_PROJECTED_ATTRIBUTES);
    AmazonDynamoDB dynamoDbClient = PersistenceHelper.getDynamoDBClient();
    QueryResult queryResult = dynamoDbClient.query(queryRequest);
    List<Map<String, AttributeValue>> items = queryResult.getItems();
    Iterator<Map<String, AttributeValue>> itemsIter = items.iterator();
    String userId = null;
    while (itemsIter.hasNext()) {
        Map<String, AttributeValue> currentItem = itemsIter.next();
        for (String attr : currentItem.keySet()) {
            if ("userId".equals(attr)) { // compare strings with equals(), not ==
                userId = currentItem.get(attr).getS();
            }
        }
    }

    Start DynamoDB Local:
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLocal.html
    java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

    JavaScript Shell for DynamoDB Local
    http://localhost:8000/shell
    Using the AWS CLI with DynamoDB Local
    The AWS CLI can interact with DynamoDB Local, in addition to DynamoDB. To do this, add the --endpoint-url parameter to each command:
    --endpoint-url http://localhost:8000
    Here is an example, using the AWS CLI to list the tables in a DynamoDB Local database:
    aws dynamodb list-tables --endpoint-url http://localhost:8000

    Data types
    http://stackoverflow.com/questions/22962006/dynamodb-most-efficient-date-type
    DynamoDB is essentially limited to three data types: String, Number, and Binary. That seems to leave two options for storing a date or timestamp:
    • String holding an ISO 8601 date/time or a Unix timestamp
    • Number of a Unix timestamp
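    A sketch of both options (attribute names are illustrative); a Number epoch compares and sorts numerically, while an ISO 8601 string also sorts correctly because its format is lexicographically ordered:

    Map<String, AttributeValue> item = new HashMap<String, AttributeValue>();
    long epochSeconds = System.currentTimeMillis() / 1000L;
    item.put("createdAt", new AttributeValue().withN(Long.toString(epochSeconds)));
    item.put("createdAtIso", new AttributeValue().withS("2015-07-17T10:15:30Z"));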
    Index
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
    you could create a global secondary index named GameTitleIndex, with a hash key of GameTitle and a range key of TopScore. Since the table's primary key attributes are always projected into an index, the UserId attribute is also present.

    Every global secondary index must have a hash key, and can have an optional range key. The index key schema can be different from the table schema; you could have a table with a hash type primary key, and create a global secondary index with a hash-and-range index key — or vice-versa. The index key attributes can consist of any attributes from the table, as long as the data types are scalar rather than multi-value sets.
    You can project other table attributes into the index if you want. When you query the index, DynamoDB can retrieve these projected attributes efficiently; however, global secondary index queries cannot fetch attributes from the parent table.
    Attribute Projections
    A projection is the set of attributes that is copied from a table into a secondary index. The hash and range keys of the table are always projected into the index; you can project other attributes to support your application's query requirements. When you query an index, Amazon DynamoDB can access any attribute in the projection as if those attributes were in a table of their own.
    When you create a secondary index, you need to specify the attributes that will be projected into the index. DynamoDB provides three different options for this:
    • KEYS_ONLY – Each item in the index consists only of the table hash and range key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
    • INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
    • ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
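    For example, a sketch of defining the GameTitleIndex mentioned above with an INCLUDE projection at table creation (the non-key attributes are illustrative):

    GlobalSecondaryIndex gameTitleIndex = new GlobalSecondaryIndex()
        .withIndexName("GameTitleIndex")
        .withKeySchema(
            new KeySchemaElement("GameTitle", KeyType.HASH),
            new KeySchemaElement("TopScore", KeyType.RANGE))
        .withProjection(new Projection()
            .withProjectionType(ProjectionType.INCLUDE)
            .withNonKeyAttributes("Wins", "Losses"))
        .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L));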
    http://stackoverflow.com/questions/30866030/number-of-attributes-in-key-schema-must-match-the-number-of-attributes-defined-i
    DynamoDB is schemaless (except the schema)
    That is to say, you do need to specify the key schema (attribute name and type) when you create the table. Well, you don't need to specify any non-key attributes. You can put an item with any attribute later (must include the keys of course).
    From the documentation page, the AttributeDefinitions is defined as:
    An array of attributes that describe the key schema for the table and indexes.
    When you create a table, the AttributeDefinitions field is used for the hash and/or range keys only. In your first case, there is a hash key only (number 1) while you provide 2 AttributeDefinitions. This is the root cause of the exception.
    TL;DR Don't include any non-key attribute definitions in AttributeDefinitions.

    Global secondary index — an index with a hash and range key that can be different from those on the table. A global secondary index is considered "global" because queries on the index can span all of the data in a table, across all partitions.
    Local secondary index — an index that has the same hash key as the table, but a different range key. A local secondary index is "local" in the sense that every partition of a local secondary index is scoped to a table partition that has the same hash key.
    However, the differences go way beyond the possibilities in terms of key definitions. Find below some important factors that will directly impact the cost and effort for maintaining the indexes:
    • Throughput :
    Local Secondary Indexes consume throughput from the table. When you query records via the local index, the operation consumes read capacity units from the table. When you perform a write operation (create, update, delete) in a table that has a local index, there will be two write operations, one for the table and another for the index. Both operations consume write capacity units from the table.
    Global Secondary Indexes have their own provisioned throughput. When you query the index, the operation consumes read capacity from the index; when you perform a write operation (create, update, delete) in a table that has a global index, there will be two write operations, one for the table and another for the index*.
    *When defining the provisioned throughput for the Global Secondary Index, make sure you pay special attention to the following requirements:
    In order for a table write to succeed, the provisioned throughput settings for the table and all of its global secondary indexes must have enough write capacity to accommodate the write; otherwise, the write to the table will be throttled.
    • Management :
    Local Secondary Indexes can only be created when you are creating the table, there is no way to add Local Secondary Index to an existing table, also once you create the index you cannot delete it.
    Global Secondary Indexes can be created when you create the table and added to an existing table, deleting an existing Global Secondary Index is also allowed.
    • Read Consistency:
    Local Secondary Indexes support eventual or strong consistency, whereas, Global Secondary Index only supports eventual consistency.
    • Projection:
    Local Secondary Indexes allow retrieving attributes that are not projected into the index (although with additional cost: performance and consumed capacity units). With a Global Secondary Index you can only retrieve the attributes projected into the index.
    http://stackoverflow.com/questions/29558948/dynamo-local-from-node-aws-all-operations-fail-cannot-do-operations-on-a-non-e
    The problem is that the JavaScript console and your app use different profiles (credential and region) and therefore DynamoDB local will create separate database files for them. By using the -sharedDb flag when starting the local DynamoDB, a single database file will be shared for all clients.
    From the doc:
    -sharedDb — DynamoDB Local will use a single database file, instead of using separate files for each credential and region. If you specify -sharedDb, all DynamoDB Local clients will interact with the same set of tables regardless of their region and credential configuration.

    DynamoDB and TableNameOverride with prefix
    http://stackoverflow.com/questions/20888550/dynamodb-and-tablenameoverride-with-prefix
        <bean id="tableNameOverride" class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
            <property name="staticMethod" value="com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapperConfig.TableNameOverride.withTableNamePrefix"/>
            <property name="arguments" value="DES-" />
        </bean>
    
        <bean id="dynamoDBMapperConfig" class="com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapperConfig">
            <constructor-arg index="0" ref="tableNameOverride" />
        </bean>
    
        <bean id="BasicAWSCredentials" class="com.amazonaws.auth.BasicAWSCredentials">
             <constructor-arg index="0" value="${amazon.accessKey}" />
             <constructor-arg index="1" value="${amazon.secretKey}" />
        </bean>
    
        <bean id="amazonDynamoDBClient" class="com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient">
            <constructor-arg index="0" ref="BasicAWSCredentials" />
            <property name="endpoint" value="http://dynamodb.us-west-2.amazonaws.com" />
        </bean>
    
        <bean id="dynamoDBMapper" class="com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper">
            <constructor-arg index="0" ref="amazonDynamoDBClient" />
            <constructor-arg index="1" ref="dynamoDBMapperConfig" />
        </bean>
    Building a general DynamoDBMarshalling for enums
    public class EnumMarshaller implements DynamoDBMarshaller<Enum> {
        @Override
        public String marshall(Enum getterReturnResult) {
            return getterReturnResult.name();
        }
    
        @Override
        public Enum unmarshall(Class<Enum> clazz, String obj) {
            return Enum.valueOf(clazz, obj);
        }
    }
    In my table class with an enum:
    @DynamoDBMarshalling(marshallerClass=EnumMarshaller.class)
    @DynamoDBAttribute(attributeName = "MyEnum")
    public MyEnum getMyEnum() {
       return myEnum;
    }

    Using Custom Marshallers to Store Complex Objects in Amazon DynamoDB
    Out of the box, DynamoDBMapper works with String, Date, and any numeric type such as int, Integer, byte, or Long.
    @DynamoDBMarshalling (marshallerClass = PhoneNumberMarshaller.class)

    Local Secondary Indexes can only be defined at table creation time, and cannot be removed or modified later. Global Secondary Indexes can be defined at table creation time, or later. Please see Secondary Indexes, Global Index Guidelines, and Local Index Guidelines.

    Test
    DynamoDB Local Maven Plugin
    <plugin>
      <groupId>com.jcabi</groupId>
      <artifactId>jcabi-dynamodb-maven-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <goal>create-tables</goal>
          </goals>
          <configuration>
            <tables>
              <table>${basedir}/src/test/dynamodb/foo.json</table>
            </tables>
          </configuration>
        </execution>
      </executions>
    </plugin>

    The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException)
    -- This usually means the key (hash or range) defined in code is not the same as the one defined on the table.
    http://www.slideshare.net/AmazonWebServices/bdt203-from-zero-to-nosql-hero-amazon-dynamodb-tutorial-aws-reinvent-2014

    Client-side error with status code 400 - retry not needed
    AccessDeniedException
    ConditionalCheckFailedException
    IncompleteSignatureException
    LimitExceededException
    MissingAuthenticationTokenException
    ResourceInUseException
    ResourceNotFoundException
    ValidationException
    Client-side error with status code 400 - retry possible
    ProvisionedThroughputExceededException
    ItemCollectionSizeLimitExceededException
    ThrottlingException
    UnrecognizedClientException

    In the case of a batch operation (BatchGetItem and BatchWriteItem), the framework will not retry the entire batch, because a single batch operation can operate on multiple tables at a time. For a BatchGetItem request, the tables and primary keys of the failed reads are returned in the UnprocessedKeys parameter of the response. For BatchWriteItem, the tables and primary keys of the failed writes are returned in UnprocessedItems. With the help of these two parameters, we can easily retry only the failed requests.
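    A sketch of the retry loop for BatchWriteItem (the table and item follow the CLI example earlier; a real client would back off exponentially between iterations):

    Map<String, AttributeValue> song = new HashMap<String, AttributeValue>();
    song.put("Artist", new AttributeValue("No One You Know"));
    song.put("SongTitle", new AttributeValue("Call Me Today"));

    Map<String, List<WriteRequest>> requestItems = new HashMap<String, List<WriteRequest>>();
    requestItems.put("MusicCollection",
        Arrays.asList(new WriteRequest(new PutRequest(song))));

    while (!requestItems.isEmpty()) {
        BatchWriteItemResult result = client.batchWriteItem(
            new BatchWriteItemRequest().withRequestItems(requestItems));
        requestItems = result.getUnprocessedItems(); // resubmit only what failed
    }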
