Azure Cosmos DB TTL, your data cleanup manager

In every scenario where you store data, you need to make sure your data is as clean and relevant as possible, especially when storing data in the cloud, where every byte stored and transferred costs money. Storage itself has its price tag, and irrelevant, old or obsolete data also hurts overall performance, so you might need to buy more capacity to keep the performance you need.

So for several reasons it's important to keep your data clean and tidy.

In general this is handled by maintenance jobs. A maintenance job can be a scheduled task or, in the case of SQL Server, a SQL job, but Cosmos DB has no such jobs. Other options are a time-triggered Azure Function that runs a cleanup script or, if you want a “scheduled task”, an Azure DevOps pipeline that runs a script.

For Cosmos DB, however, a little-known feature called TTL (Time To Live) has been available since 2018. It lets you give a document an expiration period, in seconds, after which the Cosmos DB engine removes it. The nice thing is that it doesn't cost you anything extra, as expired documents are deleted using left-over capacity (Request Units) on your Cosmos DB account. This does mean documents might not be deleted immediately after they expire, so TTL might not be suitable for legal purposes that require immediate deletion. Even though expired documents might not be physically deleted right away, they are no longer returned in query results.

Configure TTL on Container level

If you have played with Cosmos DB, you have probably seen the TTL settings page before.

[Screenshot: Container level TTL settings]

There are three options regarding TTL:

  • Off -> entire feature is disabled
  • On (no default) -> feature is enabled, but documents don't expire unless they have their own ttl value
  • On -> feature is enabled, and you specify a default time to live in seconds
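To make the three settings concrete, here is a small sketch that models the documented rules in plain C#, without the Cosmos DB SDK. The `TtlRules` class and its `EffectiveTtl` method are hypothetical names for illustration: `null` stands for ‘Off’, `-1` for ‘On (no default)’ and a positive number for ‘On’ with that many seconds. A document's own `ttl` value overrides the container default, while with the feature Off it is ignored.

```csharp
// Hypothetical helper modelling the documented Cosmos DB TTL rules.
// It is not part of the Cosmos DB SDK.
public static class TtlRules
{
    // containerDefault: null = Off, -1 = On (no default), n > 0 = On with n seconds.
    // itemTtl: null = document has no "ttl" field, -1 = never expire, n > 0 = n seconds.
    // Returns the effective time to live in seconds, or null when the document never expires.
    public static int? EffectiveTtl(int? containerDefault, int? itemTtl)
    {
        // With TTL Off on the container, a "ttl" field on the document is ignored.
        if (containerDefault is null) return null;

        // A "ttl" field on the document overrides the container default.
        int ttl = itemTtl ?? containerDefault.Value;

        // -1 means "never expire".
        return ttl == -1 ? (int?)null : ttl;
    }
}
```

For example, `EffectiveTtl(10, -1)` returns `null`: a document whose `ttl` is -1 never expires, even in a container with a 10-second default.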

When you select ‘On’, a textbox appears in which you enter the expiration in seconds.

[Screenshot: Container level TTL on]

The minimum is 1 second and the maximum is the int32 maximum value (2,147,483,647 seconds), which is roughly 68 years. In the screenshot it's set to 10 seconds, which means any document you create in this container becomes invisible 10 seconds after it was last modified and will eventually be removed. Although it's a nice feature, it seems like an all-or-nothing approach, which is not very suitable for document maintenance.
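The valid range is easy to enforce in code. The sketch below uses a hypothetical `TtlSeconds.FromTimeSpan` helper to convert a `TimeSpan` into a TTL value, rejecting anything outside 1 to `int.MaxValue` seconds; `MaxYears` confirms the roughly 68-year ceiling, assuming 365.25-day years.

```csharp
using System;

// Hypothetical helper (not part of any SDK) that turns a TimeSpan into a TTL value.
public static class TtlSeconds
{
    public const int Min = 1;            // 1 second
    public const int Max = int.MaxValue; // 2,147,483,647 seconds

    // Max expressed in 365.25-day years: roughly 68.
    public static readonly double MaxYears = Max / (365.25 * 24 * 60 * 60);

    public static int FromTimeSpan(TimeSpan lifetime)
    {
        // Round up, so a fractional second never shortens the requested lifetime.
        double seconds = Math.Ceiling(lifetime.TotalSeconds);
        if (seconds < Min || seconds > Max)
        {
            throw new ArgumentOutOfRangeException(
                nameof(lifetime), $"A TTL must be between {Min} and {Max} seconds.");
        }
        return (int)seconds;
    }
}
```

`TtlSeconds.FromTimeSpan(TimeSpan.FromSeconds(10))` yields 10, matching the screenshot, and `TimeSpan.FromDays(30)` yields 2592000.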

Configure TTL on Document level

The TTL feature really becomes valuable when you can set it per document. That way each document in the container can have its own expiration value, allowing cleanup of documents which were created but are no longer needed. GDPR is an example where this can be used: although it is a legal purpose, TTL might still be suitable, as you have 30 days between a data deletion request and the moment the data needs to be gone.

If you want to benefit from document level TTL, you need to do two things:

  1. Configure TTL as ‘On (no default)’

[Screenshot: Container level TTL on, without default]

  2. Add the ‘ttl’ field to your document

[Screenshot: Document level TTL]

It is that simple.

Obviously you need to define it in your entity model, so the ‘ttl’ field ends up in the document. You need it there not only to set the TTL, but also to update or disable it after the document has been created: if you set the ttl field to 1000, you can still change that value as long as the document hasn't expired, and even disable expiration by setting it to -1. Keep in mind that the countdown starts from the document's last modification time (its _ts system property), so every update restarts the clock.

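using Newtonsoft.Json;

/// <summary>
/// Specify the number of seconds this document may live in storage.
/// When expired, it will be deleted automatically.
/// By default the value is -1, which means no expiration.
/// Serialized as "ttl", the field name Cosmos DB reads (Newtonsoft.Json is the .NET SDK v3 default serializer).
/// </summary>
[JsonProperty(PropertyName = "ttl")]
public int TimeToLiveInSeconds { get; set; } = -1;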

In the example above the default value is -1, which means ‘no expiration’. So by default we get the same behavior as disabling the TTL feature entirely, while still being able to change that behavior later on.
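Whatever serializer you use, the property has to come out as `ttl` in the stored JSON. The Cosmos DB .NET SDK v3 serializes with Newtonsoft.Json by default, where `[JsonProperty("ttl")]` does the mapping; if you plug in System.Text.Json instead, the equivalent attribute is `[JsonPropertyName("ttl")]`. The sketch below assumes System.Text.Json and a hypothetical `MetadataDocument` entity, just to check what ends up in the JSON.

```csharp
using System.Text.Json.Serialization;

// Hypothetical entity used to show the "ttl" mapping with System.Text.Json.
public class MetadataDocument
{
    // Cosmos DB expects the document identifier in a lowercase "id" field.
    [JsonPropertyName("id")]
    public string Id { get; set; } = "";

    // Seconds the document may live after its last modification; -1 = no expiration.
    [JsonPropertyName("ttl")]
    public int TimeToLiveInSeconds { get; set; } = -1;
}
```

Serializing `new MetadataDocument { Id = "42" }` with `JsonSerializer.Serialize` gives JSON containing `"ttl":-1` rather than a `TimeToLiveInSeconds` field.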

Use case

This blog post began by mentioning data cleanup maintenance jobs, and that's where I see the most utility in this feature. It is a nice and cheap way to handle this task.

In our scenario we have mobile clients that need to upload data. Before they can upload, they request an upload URL. The uploaded data requires processing, so for each requested upload URL we create a document in Cosmos DB to hold the metadata. If for whatever reason the upload fails, the metadata document is abandoned. Uploads can fail, for example, because the network connection is unstable or drops, or because the app is closed. The app keeps trying to upload the data, but requests a new upload URL on every attempt. This leaves us with abandoned metadata documents for uploads that will never take place.

One option was to run a Function-based maintenance job, but we found TTL to be a much more elegant solution. When the client app requests an upload URL, we set the TTL to a certain value. When the upload is received, we reset the TTL to -1 so the document never expires from then on. As a result, abandoned documents are removed automatically, and since this is not time-critical we're fine with it not happening immediately.
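The flow can be sketched as a small in-memory model, assuming the container is set to ‘On (no default)’. Everything here is hypothetical: the `UploadMetadataDocument` type, the one-hour `AbandonedUploadTtl` (the actual value we use isn't stated here) and the `LastModified` field, which stands in for the `_ts` system property Cosmos DB maintains. In a real application each change would be written back to Cosmos DB with the SDK (for example with `ReplaceItemAsync`), and every write moves `_ts`, restarting the countdown.

```csharp
using System;

// Hypothetical in-memory model of the upload metadata document.
// LastModified plays the role of Cosmos DB's _ts system property.
public class UploadMetadataDocument
{
    // Assumed value: abandoned uploads are cleaned up after one hour.
    public const int AbandonedUploadTtl = 3600;

    public string Id { get; } = Guid.NewGuid().ToString();
    public int TimeToLiveInSeconds { get; private set; } = -1;
    public DateTimeOffset LastModified { get; private set; }

    // Client requests an upload URL: the metadata document starts expiring.
    public static UploadMetadataDocument ForUploadRequest(DateTimeOffset now) =>
        new UploadMetadataDocument
        {
            TimeToLiveInSeconds = AbandonedUploadTtl,
            LastModified = now
        };

    // Upload received: -1 means the document never expires from now on.
    public void MarkUploadReceived(DateTimeOffset now)
    {
        TimeToLiveInSeconds = -1;
        LastModified = now; // every write updates _ts
    }

    // A document expires ttl seconds after its last modification.
    public bool IsExpired(DateTimeOffset now) =>
        TimeToLiveInSeconds != -1 &&
        now >= LastModified.AddSeconds(TimeToLiveInSeconds);
}
```

An abandoned document expires an hour after its URL was requested, while a completed one never does. Because the whole lifecycle lives in the `ttl` value, no separate cleanup job needs to know which uploads were abandoned.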

Final thoughts

I really found this to be an elegant way to get rid of documents that are not supposed to be there anymore. Especially nice is that the RUs needed to delete expired documents are consumed from unused capacity, so using this feature costs you nothing extra.

If you have any comments or remarks, you can reach me on Twitter @jeanpaulsmit.