Demystifying AWS DynamoDB Partition Keys

AWS DynamoDB is one of the most popular schema-less NoSQL databases, and every table must be created with a Primary Key that uniquely identifies each item stored in it.

In AWS DynamoDB, a Primary Key consists of a Partition Key and, optionally, a Sort Key. We will talk about the latter after first focusing on Partition Keys: why they exist and why they matter to the whole DynamoDB construct.

A Partition Key determines the partition (a physical storage space internal to distributed databases) where the item will be stored.

Let's understand what a partition signifies (for what it's worth to us, at least). Ever wondered how DynamoDB delivers its remarkably quick response times? Well, it has to do with how the data is stored and organized behind the scenes.

DynamoDB (like other distributed NoSQL databases) splits its data into subsets known as ‘partitions’. These partitions are spread across different nodes, which is what makes the database distributed. To know which partition an item belongs to, DynamoDB uses an internal hashing function, and that hashing function takes a single input: the Partition Key.
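To make this concrete, here is a minimal Python sketch of hash-based placement. DynamoDB's real hash function is internal and undocumented, so the sketch assumes MD5 followed by a simple modulo as a stand-in; the helper `partition_for` and the partition count of 4 are hypothetical. What it does show faithfully is the key property: the same Partition Key value always maps to the same partition.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Return the index of the partition that would hold this key.

    Assumption: MD5 plus a modulo is only a stand-in for DynamoDB's
    internal hash function, whose details are not public.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer and
    # reduce it into the range [0, num_partitions).
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    NUM_PARTITIONS = 4  # hypothetical partition count
    for key in ["customer-1", "customer-2", "customer-3", "customer-1"]:
        # "customer-1" appears twice and lands on the same partition both times.
        print(key, "->", partition_for(key, NUM_PARTITIONS))
```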

This is what makes the Partition Key the Bossman 🙂.

We can now conceptualize it a little better: the Partition Key groups data by its value, and every item with the same Partition Key value resides in the same partition.

DynamoDB uses the Partition Key as the primary entry point for running its queries efficiently: a Query must always specify an exact Partition Key value. The attribute chosen as the Partition Key must also be unique across the table - if you don’t have a Sort Key, that is, and we will visit this in a minute.
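The uniqueness rule for a table without a Sort Key can be sketched with a toy in-memory model. The class `SimpleKeyTable` below is hypothetical and is not the DynamoDB API; it only mimics the documented behavior of PutItem, where writing an item whose primary key already exists replaces that item instead of adding a second one. The itinerary IDs and destinations are made-up sample values.

```python
from typing import Any, Dict, Optional

Item = Dict[str, Any]


class SimpleKeyTable:
    """Hypothetical in-memory model of a table with a simple primary key."""

    def __init__(self, partition_key_name: str) -> None:
        self.partition_key_name = partition_key_name
        self._items: Dict[Any, Item] = {}

    def put_item(self, item: Item) -> None:
        # The Partition Key alone identifies the item, so a second write
        # with the same key value overwrites the first one.
        key = item[self.partition_key_name]
        self._items[key] = dict(item)

    def get_item(self, key: Any) -> Optional[Item]:
        # A lookup needs only the Partition Key value.
        return self._items.get(key)

    def __len__(self) -> int:
        return len(self._items)


table = SimpleKeyTable("itinerary_id")
table.put_item({"itinerary_id": "IT-100", "destination": "Lisbon"})
table.put_item({"itinerary_id": "IT-100", "destination": "Porto"})
print(len(table))                               # 1
print(table.get_item("IT-100")["destination"])  # Porto
```

Because the Partition Key alone identifies the item, the second write silently replaced the first.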

Looking at the 'Itinerary' table above, itinerary_id clearly serves as the Partition Key while also being the table's Primary Key point of reference.

That being said, Partition Keys do not map one-to-one to partitions. Say you have a million itinerary_ids: that doesn't mean DynamoDB will set up a million partitions, because many Partition Key values hash into the same partition. The internal hash function decides which partition a key lands in, while the number of partitions is decided by DynamoDB based on the throughput you configure and the amount of data the table stores.
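A quick way to see many keys sharing few partitions is to hash a large batch of hypothetical itinerary_id values into a small, fixed number of partitions and count the result. As before, MD5 with a modulo is an assumed stand-in for DynamoDB's internal hash function, and the key format `itinerary-<n>`, the partition count and the helper names are illustrative only.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, num_partitions: int) -> int:
    # Stand-in hash (assumption): DynamoDB's real hash function is internal.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def key_distribution(num_keys: int, num_partitions: int) -> Counter:
    """Count how many distinct Partition Key values land in each partition."""
    return Counter(
        partition_for(f"itinerary-{i}", num_partitions) for i in range(num_keys)
    )


if __name__ == "__main__":
    # 100,000 hypothetical itinerary_ids spread over just 8 partitions.
    counts = key_distribution(100_000, 8)
    for partition, n in sorted(counts.items()):
        print(f"partition {partition}: {n} keys")
```

Each partition ends up holding thousands of distinct Partition Key values, spread roughly evenly by the hash.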

Refining the earlier thought process: although a Partition Key can uniquely identify an item, it NEED NOT BE the only attribute contributing to the Primary Key. The Primary Key can also be composite, combining the Partition Key with another attribute.

Enter Sort Key 🚪

Taking the same example of the itinerary table, let us introduce the customers who actually make itinerary bookings. This gives us a one-to-many (1-N) mapping from customer_id to itinerary_id, as below:

Here the itinerary_id alone no longer functions as the Primary Key that singularly identifies an item in the table. This brings us to composite (or compound) Primary Keys, made up of both customer_id and itinerary_id. The customer_id now serves as the Partition Key, whereas the itinerary_id serves as the Sort Key that uniquely identifies an itinerary booking for a particular customer. A Sort Key is comparable to a Clustering Key in other databases (Cassandra, for example), and it determines the order in which items sharing the same Partition Key are stored within a partition.

Intuitively, using customer_id as the Partition Key groups each customer's itinerary bookings together. This makes querying faster and more flexible: we can provide only the customer_id to retrieve all of a customer's itineraries, or additionally provide a range of itinerary_ids to retrieve a subset of them.
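The two query styles can be sketched with another toy in-memory model. `CompositeKeyTable` is hypothetical: it keeps the items of each Partition Key value in a list sorted by Sort Key and uses `bisect` to answer an inclusive range query, which loosely mirrors a DynamoDB Query with a `BETWEEN` condition on the Sort Key. The customer and itinerary IDs are sample values.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class CompositeKeyTable:
    """Hypothetical in-memory model of a table with a composite primary key.

    Items that share a Partition Key value are kept together, ordered by
    their Sort Key value.
    """

    def __init__(self) -> None:
        # Partition Key value -> list of (Sort Key value, item), kept sorted.
        self._collections: Dict[str, List[Tuple[str, dict]]] = defaultdict(list)

    def put_item(self, pk: str, sk: str, item: dict) -> None:
        collection = self._collections[pk]
        sort_keys = [k for k, _ in collection]
        i = bisect.bisect_left(sort_keys, sk)
        if i < len(sort_keys) and sort_keys[i] == sk:
            # Same (Partition Key, Sort Key) pair: replace the existing item.
            collection[i] = (sk, item)
        else:
            # New Sort Key value: insert at its sorted position.
            collection.insert(i, (sk, item))

    def query(self, pk: str, sk_from: Optional[str] = None,
              sk_to: Optional[str] = None) -> List[dict]:
        """Items for one Partition Key, optionally within an inclusive Sort Key range."""
        collection = self._collections.get(pk, [])
        sort_keys = [k for k, _ in collection]
        lo = 0 if sk_from is None else bisect.bisect_left(sort_keys, sk_from)
        hi = len(sort_keys) if sk_to is None else bisect.bisect_right(sort_keys, sk_to)
        return [item for _, item in collection[lo:hi]]


table = CompositeKeyTable()
table.put_item("CUST-1", "IT-003", {"destination": "Rome"})
table.put_item("CUST-1", "IT-001", {"destination": "Paris"})
table.put_item("CUST-1", "IT-002", {"destination": "Oslo"})
table.put_item("CUST-2", "IT-004", {"destination": "Tokyo"})

# All itineraries for one customer, returned in Sort Key order.
print([i["destination"] for i in table.query("CUST-1")])  # ['Paris', 'Oslo', 'Rome']
# Only the itineraries whose Sort Key falls in an inclusive range.
print([i["destination"] for i in table.query("CUST-1", "IT-002", "IT-003")])  # ['Oslo', 'Rome']
```

Against a real table, the boto3 equivalent would pass something like `Key("customer_id").eq("CUST-1") & Key("itinerary_id").between("IT-002", "IT-003")` (from `boto3.dynamodb.conditions`) as the `KeyConditionExpression` of `Table.query`.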

Although you don't necessarily need to look under the hood of DynamoDB partitions (and you can’t either 😥), here are a couple of valuable knowledge points:

1. The number of partitions is derived from the maximum total throughput you need as well as the amount of data stored in the table.

2. Each partition can serve at most 3,000 RCUs (Read Capacity Units) and 1,000 WCUs (Write Capacity Units). The total throughput across all partitions is governed by the throughput mode you set (Provisioned or On-Demand).
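Earlier versions of the AWS DynamoDB Developer Guide described a rule of thumb for estimating the partition count from these limits, together with a storage limit of about 10 GB per partition. The sketch below encodes that heuristic; treat it as an approximation, since actual partitioning is managed internally by DynamoDB and may differ, and the helper `estimate_partitions` is hypothetical.

```python
import math


def estimate_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
    """Rough partition-count estimate based on a historical AWS rule of thumb.

    Assumption: per-partition limits of 3,000 RCU, 1,000 WCU and about
    10 GB of storage. DynamoDB's real partitioning is managed internally.
    """
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    # Enough partitions to satisfy both throughput and storage, and at least one.
    return max(by_throughput, by_size, 1)


print(estimate_partitions(rcu=7500, wcu=1500, table_size_gb=25))  # 4
```

Here 7,500 RCUs and 1,500 WCUs need ceil(2.5 + 1.5) = 4 partitions for throughput, while 25 GB needs ceil(2.5) = 3 for storage, so the estimate is the larger of the two: 4.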

Choosing the right Partition Key is central to data modeling and to building scalable, reliable applications. AWS DynamoDB lets you use either a Simple primary key (Partition Key only) or a Composite one (a Partition Key paired with a Sort Key) to uniquely identify an item. Key attributes must also be scalar (each holds a single value), and the only supported data types are string, number, and binary.
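Here is how the Simple and Composite options and the scalar-type rule show up in the arguments of the DynamoDB `CreateTable` operation. The helper `key_schema_params` is hypothetical, but the dictionary shapes it returns (`KeySchema` with `HASH`/`RANGE` key types, `AttributeDefinitions` with `S`, `N` or `B` types) follow the real API, so the result could be passed to boto3's `create_table` as shown in the final comment.

```python
from typing import Dict, List, Optional

# Key attributes must be scalar: String ("S"), Number ("N") or Binary ("B").
SCALAR_KEY_TYPES = {"S", "N", "B"}


def key_schema_params(
    partition_key: str,
    partition_key_type: str,
    sort_key: Optional[str] = None,
    sort_key_type: Optional[str] = None,
) -> Dict[str, List[Dict[str, str]]]:
    """Build KeySchema and AttributeDefinitions for a Simple or Composite key."""
    if partition_key_type not in SCALAR_KEY_TYPES:
        raise ValueError(f"unsupported partition key type: {partition_key_type}")
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": partition_key, "AttributeType": partition_key_type}]
    if sort_key is not None:
        # A Sort Key turns the Simple primary key into a Composite one.
        if sort_key_type not in SCALAR_KEY_TYPES:
            raise ValueError(f"unsupported sort key type: {sort_key_type}")
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": sort_key_type})
    return {"KeySchema": key_schema, "AttributeDefinitions": attributes}


simple = key_schema_params("itinerary_id", "S")
composite = key_schema_params("customer_id", "S", "itinerary_id", "S")
# With boto3 installed and AWS credentials configured, this could be used as:
# boto3.client("dynamodb").create_table(
#     TableName="Itinerary", BillingMode="PAY_PER_REQUEST", **composite)
```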

Well, that wraps this up! See you in the next one 👋

Anupam Rajanish

The Internet offers wide-ranging and perhaps confusing definitions of DynamoDB Partition Keys. Here's a brief word to help untangle them 😊
