Advanced Data Modelling in AWS DynamoDB

Introduction

AWS DynamoDB (DDB) has proved to be one of the most robust, scalable and cost-effective NoSQL solutions out there. Unlike SQL table rows, data is stored as JSON objects called items. There are multiple ways you can store those items on your table(s). In other words, there are several design patterns for DDB:

Multi-table design: Only one entity is stored in one table, similar to normalised RDBMS.
Single-table design: Multiple entities are stored in a single table in a denormalized fashion.
A mix of both: Multiple entities are stored in a single table where some entities are related to others from another table, which also contains one or more entities.

Which one to choose depends on your application and requirements. In this article, we will focus on the single-table design, where we will look at how our table can evolve depending on new access patterns.

Let’s get started.

Prerequisites

Not so fast! Before diving deep into the article, make sure you understand DDB fundamentals:

Example: Sensor Management

Overview

The example is based on a table called “sensor-management”, which stores information about IoT sensors and their locations. As mentioned, multiple entities are stored in the table according to the single-table design. Below, there is an image presenting how the entities are related to each other via the ER diagram:

One building can have multiple floors; one floor can have multiple rooms; one room can have multiple sensors; one sensor can have multiple measurements. Basically, we have several one-to-many relationships between the entities.

We are interested in the following access patterns, which will help us model our table:

Get building info
Get the building’s floor info
Get the floor’s room info
Get the building’s sensor info
Get the floor’s sensor info
Get the room’s sensor info
Get the latest measurement of a particular sensor
Get all items of a particular entity

To download the finished Cloudformation or NoSQL workbench template with initial data, go here.

Schema design

First, we create our table with a composite primary key: pk (partition key) and sk (sort or range key). Generic names are used since those fields will contain different entities. It is a good practice to put a prefix before IDs in pk and sk for readability and efficiency. The table will mainly use entity type as the prefix (building, floor, sensor, etc.).

Get building info

Let’s start with a simple query. The pk field contains the building ID with the prefix: `building#b-1`, whereas the sk field contains the constant value: `info`. Some developers may prefer duplicating the pk value in the sk field instead of `info`. All in all, you may get the following:

To retrieve this item, you can use the `GetItem` operation:

Access pattern	Base table / GSI / LSI	Operation	Partition key value	Sort key value	Other conditions / filters
Get building info	Base table	GetItem	pk=`buildingId`	sk=`info`

Get the building’s floor info and Get the floor’s room info

For these access patterns, we can effectively utilise our sk. Chaining multiple entity IDs in sk helps achieve the one-to-many relationship:

pk	sk
parent#id	child#id#grandchild#id#greatgrandchild#id…

Following the same pattern, we can put our building ID in the pk field and floor and room IDs in the sk field:

You might ask why I used `@` as a delimiter for room entities. The reason is to avoid filter expressions in query operations. However, this is a 100% overkill solution in this case because practically one building does not have so many floors and rooms to affect performance negatively. Nonetheless, if you cannot afford filter expressions, you can use that workaround, especially for deeply nested entities.

This is how you can achieve the access patterns:

Access pattern	Base table / GSI / LSI	Operation	Partition key value	Sort key value	Other conditions / filters
Get the building’s floor info	Base table	Query	pk=`buildingId`	sk begins_with `floor#`
Get the floor’s room info	Base table	Query	pk=`buildingId`	sk begins_with `floor@{{floorId}}@room#`

Get the building’s sensor info, Get the floor’s sensor info and Get the room’s sensor info

First of all, we need to decide how to store sensor information in the table. We can either continue our entity chaining in sk or put sensors under their own pk, which I did:

It would be completely fine to use the first proposed approach with the entity chaining. However, if local secondary indexes were used, and items via that approach resided under the same partition (identical partition key), it would imply the following limits:

A single partition cannot handle more than 3000 read operations or more than 1000 write operations.
The cumulative item size in one partition with one or more LSIs cannot exceed 10 GB. More information can be found here.

To address the access patterns, global indexes can be created. Again, we will use generic names if other entities might want to use the indexes (`GSI1-pk`, `GSI1-sk`, etc.).

After creating all those attributes, we can create 3 GSI indexes. Note that if your table is already created, you cannot add or remove multiple indexes simultaneously.

3 indexes are created, and, for simplicity, I projected all attributes in the indexes, but you may want to think twice about doing so in production. Using indexes that are utilised by all items with all projected attributes can double your storage and write operation costs. Fortunately, those indexes affect only our sensor items for now. In other words, they are sparse indexes. Also, note that when using GSIs you cannot issue transactions because they are eventually consistent.

As for operations:

Access pattern	Base table / GSI / LSI	Operation	Partition key value	Sort key value	Other conditions / filters
Get the building’s sensor info	GSI 1	Query	GSI1-pk=`buildingId`
Get the floor’s sensor info	GSI 2	Query	GSI2-pk=`floorId`
Get the room’s sensor info	GSI 3	Query	GSI3-pk=`roomId`

Get the latest measurement of a particular sensor

Given the requirements, there are still many ways to place sensor measurements inside the table. My choice was to use the sensor ID and measurement date in pk and measurement time in sk:

Using that structure, we can easily achieve the required access pattern, but understand that it is not the best solution for other cases. For example, we wouldn’t be able to get all measurements of a specific sensor for two or more days via one query operation. Since it is still fine for our pattern, I will leave that structure.

The latest measurement can be obtained by sorting in descending order and limiting results:

Access pattern	Base table / GSI / LSI	Operation	Partition key value	Sort key value	Other conditions / filters
Get the latest measurement of a particular sensor	Base table	Query	pk=`sensorId` + current date		Limit=1ScanIndexForward=false

Get all items of a particular entity

You might want to get data for all buildings, all rooms or all sensors. To achieve that, we can add the `entityType` field to every item and create the fourth global index. In this case, the index will consist of a partition key only.

Note that this index is not sparse, so it should be cautiously implemented.

Access pattern	Base table / GSI / LSI	Operation	Partition key value	Sort key value	Other conditions / filters
Get all items of a particular entity	GSI 4	Query	entityType=`Sensor` \| `Floor` \| `Building` …

Final result

We have covered a lot of techniques and access patterns. There are other uncovered topics which may further extend your design skills:

Let’s see what we have in the end:

Access pattern	Base table / GSI / LSI	Operation	Partition key value	Sort key value	Other conditions / filters
Get building info	Base table	GetItem	pk=`buildingId`	sk=`info`
Get the building’s floor info	Base table	Query	pk=`buildingId`	sk begins_with `floor#`
Get the floor’s room info	Base table	Query	pk=`buildingId`	sk begins_with `floor@{{floorId}} @room#`
Get the building’s sensor info	GSI 1	Query	GSI1-pk=`buildingId`
Get the floor’s sensor info	GSI 2	Query	GSI2-pk=`floorId`
Get the room’s sensor info	GSI 3	Query	GSI3-pk=`roomId`
Get the latest measurement of a particular sensor	Base table	Query	pk=`sensorId` + current date		Limit=1 ScanIndexForward=false
Get all items of a particular entity	GSI 4	Query	entityType=`Sensor` \| `Floor` \| `Building` …

New requirements do not fit!

In software development, client requirements are often subject to change. You may finish your “perfect” table design only to find out after some time that it cannot be extended because of the new requirements. There are some workarounds you can implement like creating an additional table or GSI to cater to client’s needs. However, there is a chance that you must migrate your table(s) to new ones with another schema/model. This also imposes a challenge when you cannot afford any downtime on production. Fortunately, AWS offers multiple ways for transferring your data reliably. For more information, you can refer to this post.

Conclusion

DynamoDB single-table design is substantially different from conventional SQL table modelling. There are different techniques which help implement relationships between entities in one table. We have learned how we can change and evolve our design depending on the provided access patterns. Remember that there are almost always multiple solutions to one problem. Do not stop on one design only and try experimenting more to achieve a reasonable and flexible model, but do not attempt to get a design which satisfies all possible future requirements.

Other useful sources of information:

By Ruslan Yeleussinov, Software Developer at Klika Tech, Inc.