A Primer on Idempotence for AWS Serverless Architecture
Updated September 17, 2024.
In programming, the term idempotence may sound like a complex and arcane concept reserved for mathematical discussions or computer science lectures. However, its relevance stretches far beyond academia.
Idempotence, also called idempotency, is a fundamental principle that is pivotal in ensuring software systems’ predictability, reliability, and consistency.
In this article, we’ll demystify the concept of idempotency, explore what it means, why it matters, and how it shapes how we design and interact with software. Whether you’re a seasoned developer or just embarking on your coding journey, understanding idempotency is crucial to writing more robust and resilient programs.
What’s Idempotency?
Idempotency is a property of a function or operation in which applying it multiple times has the same result as applying it once.
In other words, an idempotent function, when called repeatedly, does not change the outcome beyond the first call.
For example, in mathematics, the absolute value function is idempotent because taking the absolute value of a number multiple times does not alter the result.
Whether you apply the absolute value function once or multiple times to a number, the result remains the same, as it always yields the non-negative value of the input.
>>> abs(-5) == abs(abs(-5)) == abs(abs(abs(-5)))  # ... and so on
True
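The same property applies to operations with side effects. As a small illustrative sketch (the function names are made up for this example), adding a member to a set is idempotent, while appending to a list is not:

```python
def mark_seen(seen: set, item_id: str) -> set:
    """Idempotent: adding the same ID again leaves the set unchanged."""
    seen.add(item_id)
    return seen


def record_visit(visits: list, item_id: str) -> list:
    """Not idempotent: every call appends another entry."""
    visits.append(item_id)
    return visits


seen, visits = set(), []
for _ in range(3):
    mark_seen(seen, "order-1")
    record_visit(visits, "order-1")

print(len(seen))    # 1
print(len(visits))  # 3
```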
Why Care about Idempotency in Cloud Workloads?
When developing cloud applications (this article uses AWS for its examples), it’s crucial to grasp the concept of "at-least-once" delivery or invocation. The term means that a target is guaranteed to receive an event at least once, but may receive it more than once. As a developer, you must anticipate and handle the case where the same event is processed multiple times. It’s not a matter of "if" this will occur but "when." This is where idempotency becomes paramount: an idempotent function produces the same outcome even if an event is processed multiple times, which avoids unintended side effects and makes your AWS applications more reliable and robust.
Why is at-least-once delivery even a thing?
To understand why duplicate deliveries happen, it helps to look at the operational mechanics under the hood. This explanation is based on Lambda, which Jit has written extensively about, but it is also relevant to other services such as SQS or SNS.
An asynchronous Lambda invocation involves two distinct processes from start to finish. The first places the event into a queue, and the second retrieves events from that queue. Because the queue is backed by multi-node storage with eventual consistency, two concurrent runners will occasionally pick up and process the same invocation event in parallel.
To understand how frequently this happens, I ran an experiment: a Lambda function triggered by EventBridge, with a very large number of events sent to invoke it. I tracked how often the function was triggered more than once for the same event ID. The experiment showed that duplicate executions of the same event occur on the order of one in tens of thousands of runs.
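The measurement itself can be as simple as counting how often each event ID shows up. The following is a minimal sketch of such a tally, not the code used in the experiment; the `count_duplicates` helper and the sample IDs are hypothetical:

```python
from collections import Counter
from typing import Iterable


def count_duplicates(event_ids: Iterable[str]) -> dict:
    """Return the IDs that were seen more than once, with their counts."""
    counts = Counter(event_ids)
    return {event_id: n for event_id, n in counts.items() if n > 1}


# Hypothetical IDs collected from the function's invocation logs
observed = ["evt-1", "evt-2", "evt-3", "evt-2", "evt-4"]
duplicates = count_duplicates(observed)
print(duplicates)  # {'evt-2': 2}
```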
Idempotent Functions by Design
It is possible to write naturally idempotent functions. Consider a function responsible for updating the status of an item in a database to "completed." This function is idempotent because, no matter how many times it’s invoked, the item’s status remains "completed." Note that this assumption holds only as long as there are no external factors, such as listeners or triggers, that react to changes in the database table.
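import boto3


# EventBridgeEvent is the event type used by the application (import omitted here)
def handler(event: EventBridgeEvent, __) -> None:
    executions = boto3.resource('dynamodb').Table('Executions')
    # Setting a fixed value is naturally idempotent: repeated calls leave the same state
    executions.update_item(
        Key={'id': event['detail']['id']},
        UpdateExpression='SET #status = :status',
        ExpressionAttributeNames={'#status': 'status'},
        ExpressionAttributeValues={':status': 'COMPLETED'},
    )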
As a general rule, it is recommended to design functions to be idempotent whenever possible. That way, developers do not need to handle idempotency manually in their code, which reduces complexity and future maintenance costs.
Functions Which Are Not Naturally Idempotent
Some functions are not idempotent by design. For example, a function that sends a notification message to a client is not idempotent: if it runs twice on the same event, the client receives two notification messages, which makes for a bad user experience. We want the client to receive exactly one message, and this is the kind of case where handling idempotency explicitly matters most.
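To make the problem concrete, here is a toy sketch with an in-memory outbox standing in for a real notification service. The `send_notification` function and the event shape are assumptions for illustration only:

```python
outbox = []  # stands in for messages actually delivered to the client


def send_notification(event: dict) -> None:
    """Not idempotent: every invocation delivers another message."""
    outbox.append(f"Execution {event['id']} completed")


event = {"id": "evt-42"}
send_notification(event)
send_notification(event)  # the same event is delivered a second time

print(len(outbox))  # 2
```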
Solving Idempotency Issues with Lambda Powertools
So not every function is naturally idempotent, and occasionally our Lambda will be invoked more than once with the same event. What do we do then?
Luckily, AWS offers a great way to make such functions idempotent: its developer toolkit, Powertools for AWS Lambda, includes a dedicated utility for idempotency handling. The utility provides the "idempotent" decorator, which you can configure to handle multiple executions on the same event.
It works by hashing a configurable subset of the event, chosen so that it uniquely identifies the event, and by storing the execution state for each event in a database.
When the first invocation for a given event reaches the function, a record for it is stored as an item in the storage layer. When a second invocation with the same event arrives, the decorator sees that an execution has already started or finished and does not run the function body again: if the first execution has completed, the stored result is returned, and if it is still in progress, an error is raised.
A common storage layer to use in AWS is DynamoDB, which supports strongly consistent reads. Without diving into detail, the handler from the earlier example looks like the following once the decorator is applied.
Example
Let’s take a closer look at how you can use the idempotent decorator.
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent,
)


# EventBridgeEvent and ExecutionsManager are assumed to be imported elsewhere
@idempotent(
    persistence_store=DynamoDBPersistenceLayer(table_name='IdempotencyTable'),
    config=IdempotencyConfig(
        event_key_jmespath='id',
        raise_on_no_idempotency_key=True,
    ),
)
def handler(event: EventBridgeEvent, __):
    ExecutionsManager.from_event(event).complete_execution()
As seen here, the idempotent decorator is configured with a persistence layer, in this case a DynamoDB table named IdempotencyTable. In addition, passing id as the event_key_jmespath parameter tells the decorator to use only the id attribute when creating the unique hash of the event object. Setting raise_on_no_idempotency_key to True makes the decorator raise an error if id is missing from the event, which is unexpected in this case and should not go unnoticed.
Testing the Solution
Although the idempotent decorator is configuration rather than regular application logic, it is good practice to test that it is properly configured and behaves as expected.
At Jit, we’ve discovered an effective approach to testing the idempotent decorator. We accomplish this by utilizing moto (a Python mocking library for AWS infrastructure) to simulate a scenario in which the Lambda function is invoked twice with the same event.
from datetime import datetime, timezone

from test_utils.aws import idempotency


# `event` and `COMPLETED` are assumed to be defined elsewhere in the test module
def test_handler_idempotency(executions_manager):
    idempotency.create_idempotency_table()

    # It's important to import the handler after moto context is in action
    from src.handlers.complete_execution import handler

    # Call the handler for the first time
    handler(event, None)

    # Validate an idempotency key was created
    idempotency.assert_idempotency_table_item_count(expected=1)

    # Assert status changed to completed and completed_at has updated
    execution = executions_manager.get_item(...)
    assert execution.status == COMPLETED
    assert execution.completed_at_ts >= datetime.now(timezone.utc).timestamp() - 1

    # Call the handler for the second time
    handler(event, None)
    idempotency.assert_idempotency_table_item_count(expected=1)
    # Assert nothing has changed (idempotency worked)
    assert executions_manager.get_item(...) == execution
Let’s break it down:
- Create the idempotency table: In the initial step, we create the idempotency table within the moto context. Since the idempotency table can be shared among multiple services in an AWS infrastructure, it’s practical to write a test utility that creates the table and to call it from various tests.
- Import the handler in the moto context: The second step is to import the handler after the moto context is activated. This is crucial because the moto context mocks the boto3 client, and the boto3 client is initialized in the decorator during import.
- Call the handler for the first time: Invoke the handler for the first time and validate that the idempotency key was successfully created in the idempotency table.
- Verify status and completion: The next step involves confirming that the execution’s status has changed to "completed" and the "completed_at" timestamp has been updated. This ensures that the Lambda function performed its task correctly.
- Call the handler for the second time: Finally, call the handler for the second time and ensure that the idempotency key was not created again and the attributes of the execution remain unchanged. This demonstrates that the Lambda function was idempotent and did not run again on the same event.
A small pro tip, which also helps in understanding how the decorator works: step through such a test in a debugger and verify that the handler body is never executed on the second call.
Summary
I hope this sheds more light on why idempotence is a fundamental practice that ensures greater predictability, reliability, and consistency in systems. Failed operations are the outlier rather than the norm, but at-least-once delivery has been one of the driving reasons for idempotence in cloud-based systems.
Note that the examples in this article use AWS Lambda with Python. However, the same challenges apply to other programming languages and services. In SQS, for instance, a developer can choose between standard and FIFO queues. A standard queue provides at-least-once delivery, while a FIFO queue features exactly-once processing, with the trade-off of lower throughput and higher cost compared to the standard queue. That’s an example of a case where a developer doesn’t need to implement any extra logic on their side and can simply use an existing solution.
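To give a feel for what such a built-in solution does on the sending side, here is a toy in-memory model of deduplication within a time window. It is only an illustration and not how SQS is implemented: the `ToyFifoQueue` class is hypothetical, and the 5-minute window and body hash mirror the documented FIFO deduplication interval and content-based deduplication.

```python
import hashlib
from typing import Optional

DEDUP_WINDOW_SECONDS = 300  # FIFO queues deduplicate over a 5-minute interval


class ToyFifoQueue:
    """In-memory stand-in that drops repeated sends inside the window."""

    def __init__(self) -> None:
        self.messages = []
        self._first_seen = {}  # deduplication ID -> time it was first accepted

    def send(self, body: str, now: float, dedup_id: Optional[str] = None) -> bool:
        # Without an explicit ID, fall back to a hash of the message body
        dedup_id = dedup_id or hashlib.sha256(body.encode()).hexdigest()
        seen_at = self._first_seen.get(dedup_id)
        if seen_at is not None and now - seen_at < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: not enqueued again
        self._first_seen[dedup_id] = now
        self.messages.append(body)
        return True


queue = ToyFifoQueue()
results = [
    queue.send("complete evt-1", now=0),
    queue.send("complete evt-1", now=10),   # retry shortly after: dropped
    queue.send("complete evt-1", now=400),  # outside the window: accepted
]
print(results, len(queue.messages))  # [True, False, True] 2
```

The third send shows the limit of this approach: a repeat that arrives after the window has passed is treated as a new message, so consumers with long retry horizons may still need their own idempotency handling.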
In modern operations, there are excellent tools that make it possible both to apply idempotence practices and to test their effectiveness before deploying to production. The example above shows one simple and common way to do this. By following the example and the testing process, you can be confident that your idempotent code functions as intended, providing reliability and consistency in your AWS infrastructure.