DDD Value Objects: Mastering Data Validation in Python

Although DDD (Domain-Driven Design) is not widely adopted within the Python community, there are several resources available on how to implement this approach in the language. Unfortunately, only a few of them offer a good way of defining Value Objects that really ensure data consistency.

This article will walk through these implementation techniques, illustrating their strengths and weaknesses based on a particular example. We will also discuss practical tips on how to manage single-field Value Objects in a deft way. Finally, we will address the problem of redundant validation in Pydantic and how to avoid it.

Introduction

One of the greatest advantages of using Value Objects is that they ensure their values always adhere to business rules (invariants) by validating input data in the constructor. Therefore, when we receive an instance of a Value Object, we can be certain that its value has already been validated.

A prime example reflecting the usability of Value Objects is the Price class. When working with prices, developers usually use Decimal as a data type. Despite this approach being better than using float, there are still some drawbacks. We typically do not allow negative values or those with excessive precision like 0.0000001. It can be cumbersome if we have to additionally validate Decimal instances this way in several places in our codebase. Hence, creating a Price class with validation at the constructor level can greatly simplify and secure the code.
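As a rough illustration of that idea, here is a minimal sketch of such a Price class. The frozen dataclass, the two-decimal-place limit, and the error messages are assumptions made for this example, not rules taken from a real domain.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    """Hypothetical Price value object: non-negative, at most 2 decimal places."""

    value: Decimal

    def __post_init__(self) -> None:
        if not isinstance(self.value, Decimal):
            raise TypeError("Price must be built from a Decimal")
        if not self.value.is_finite():
            raise ValueError("Price must be a finite number")
        if self.value < 0:
            raise ValueError("Price cannot be negative")
        # as_tuple().exponent is -2 for 19.99 and -7 for 0.0000001.
        # Trailing zeros count as well, so Decimal("1.000") is rejected here.
        if self.value.as_tuple().exponent < -2:
            raise ValueError("Price allows at most 2 decimal places")


print(Price(Decimal("19.99")))

# Both of these violate the assumed invariants and are rejected.
for raw in ("-1", "0.0000001"):
    try:
        Price(Decimal(raw))
    except ValueError as error:
        print(f"{raw}: {error}")
```

With the checks living in one place, any function that receives a Price can rely on these invariants without repeating them.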

Another use case for Value Objects that we will focus on in the remaining part of the article is email validation. Since an email address is a single value, it is more natural to treat it like a simple data type rather than a complex structure. What we want to achieve in the next paragraphs is to validate the email address and make it lowercase.


#0 Primitive Data Type Approach

The most straightforward way of creating a Value Object is by extending the behavior of some base type. This is quite convenient, especially for single-value objects. Then we need to manually check the input data and alter it if needed.

import re

class EmailAddress(str):
    _REGEX = r"^\S+@\S+\.\S+$"

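    def __new__(cls, value: str) -> "EmailAddress":
        # Normalize first, then validate the normalized form.
        normalized = str(value).lower()

        if not re.match(cls._REGEX, normalized):
            raise ValueError("Invalid format")

        # Build the instance from the normalized text so that the result
        # is an EmailAddress rather than a plain str.
        return super().__new__(cls, normalized)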

In the snippet above, EmailAddress is a subclass of str. This can be problematic because it inherits every str method, including those that make no sense for an email address. For example, nothing prevents us from writing code like this:

email = EmailAddress("alice@mail.com")
...
email += '\nRegards!' # Invalid state

In this scenario, it's not a big deal, but with the earlier price example the consequences can be more serious:

price = Price(1)  # Decimal-based value object
...
discount = 2
price -= discount # Invalid state
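To make the failure mode concrete, here is a small sketch of what such a Decimal-based Price could look like. The class body is an assumption for illustration only, since the article does not define it.

```python
from decimal import Decimal


class Price(Decimal):
    """Hypothetical Decimal-based value object that rejects negative amounts."""

    def __new__(cls, value) -> "Price":
        instance = super().__new__(cls, value)
        if instance < 0:
            raise ValueError("Price cannot be negative")
        return instance


price = Price(1)
discount = 2
# The inherited Decimal subtraction runs here; Price.__new__ is never called,
# so the invariant is silently bypassed.
price -= discount

print(type(price).__name__, price)  # Decimal -1
```

The result is no longer a Price at all: the inherited arithmetic returns a plain Decimal, so the negative value slips through without any validation.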

#1 Dataclass Approach

In many resources, including Cosmic Python, authors recommend using dataclasses, which is understandable as it is part of the Standard Library. Here's an example:

import re
from dataclasses import dataclass
from typing import ClassVar

@dataclass(frozen=True)
class EmailAddress:
    value: str

    _REGEX: ClassVar[str] = r"^\S+@\S+\.\S+$"

    def __post_init__(self):
        if not isinstance(self.value, str):
            raise ValueError("Value must be a string")

        if not re.match(self._REGEX, self.value):
            raise ValueError("Invalid format")

        object.__setattr__(self, "value", self.value.lower())
🤬
Notice that this dataclass is frozen, so in order to make the value attribute lowercase we need to resort to the tricky object.__setattr__().
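The sketch below repeats the dataclass in a self-contained form and exercises it, to show what freezing buys us: regular assignment is blocked, while the value set through object.__setattr__() inside __post_init__ sticks. The sample addresses are made up for the example.

```python
import re
from dataclasses import FrozenInstanceError, dataclass
from typing import ClassVar


@dataclass(frozen=True)
class EmailAddress:
    value: str

    _REGEX: ClassVar[str] = r"^\S+@\S+\.\S+$"

    def __post_init__(self) -> None:
        if not isinstance(self.value, str):
            raise ValueError("Value must be a string")
        if not re.match(self._REGEX, self.value):
            raise ValueError("Invalid format")
        # Bypass the frozen check once, during construction only.
        object.__setattr__(self, "value", self.value.lower())


email = EmailAddress("Alice@Mail.com")
print(email.value)  # alice@mail.com

try:
    email.value = "bob@mail.com"  # regular assignment is rejected
except FrozenInstanceError:
    print("cannot modify a frozen value object")

# Dataclasses compare by field values, which suits Value Objects well.
print(email == EmailAddress("alice@mail.com"))  # True
```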

Now, we finally don't have to worry about unexpected methods in our EmailAddress, but as you can tell, it is quite a lot of code for simple email validation. We also have to check the input type manually, since dataclasses do not enforce type annotations at runtime. To make matters worse, our regex-based validation is still pretty leaky. Why reinvent the wheel if there are already tools created specifically for this purpose?

#2 Pydantic BaseModel Approach

Here is where Pydantic really shines. It automatically handles type casting and validation. Using the Annotated type allows adding extra logic in a concise manner.

from typing import Annotated
from pydantic import AfterValidator, BaseModel, ConfigDict, EmailStr

class EmailAddress(BaseModel):
    value: Annotated[EmailStr, AfterValidator(lambda x: x.lower())]

    model_config = ConfigDict(frozen=True)

The class definition became really neat, but because BaseModel is a complex structure, we need to use the value keyword every time we want to set or access the underlying value.

Let's consider a common scenario where, in a FastAPI project, we have to use a Value Object as a field in the Request Body. The sample code may look like this:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str

@app.post("/sign_up")
async def sign_up(request_body: SignUpRequestBody):
    ...

Because our Value Object is a Pydantic model, all validation error messages will be propagated directly into the API response. The problem is that EmailAddress inherits from BaseModel, so it is treated as a nested object. In other words, the user cannot pass a string directly into the email field but instead has to send a body like:

{"email": {"value": "abc@gmail.com"}, "password": "Secret123!"}
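The same behavior can be reproduced without FastAPI by validating plain dictionaries. This is a minimal sketch assuming Pydantic v2. As a labeled simplification, a plain str stands in for EmailStr so that the example runs without the optional email-validator package, which means format checking is omitted here.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ConfigDict, ValidationError


class EmailAddress(BaseModel):
    # Assumption: str replaces EmailStr to keep the sketch dependency-free.
    value: Annotated[str, AfterValidator(lambda x: x.lower())]

    model_config = ConfigDict(frozen=True)


class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str


flat = {"email": "ABC@gmail.com", "password": "Secret123!"}
nested = {"email": {"value": "ABC@gmail.com"}, "password": "Secret123!"}

try:
    SignUpRequestBody.model_validate(flat)
except ValidationError:
    print("flat body rejected: email must be an object")

body = SignUpRequestBody.model_validate(nested)
print(body.email.value)  # abc@gmail.com
```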

#3 Pydantic RootModel Approach

This problem can be fixed by replacing BaseModel with the lesser-known RootModel.

from typing import Annotated
from pydantic import AfterValidator, ConfigDict, EmailStr, RootModel

class EmailAddress(RootModel):
    root: Annotated[EmailStr, AfterValidator(lambda x: x.lower())]

    model_config = ConfigDict(frozen=True)

RootModel expects only a single field named root. This way, wherever we use EmailAddress, it will act as a regular Pydantic field, so the expected input will look like:

{"email": "abc@gmail.com", "password": "Secret123!"}
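Here is a short sketch of how this plays out, again validating a dictionary directly instead of going through FastAPI. It assumes Pydantic v2, and a plain str replaces EmailStr (a simplification for this example) so that no extra package is needed.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ConfigDict, RootModel


class EmailAddress(RootModel):
    # Assumption: str replaces EmailStr to keep the sketch dependency-free.
    root: Annotated[str, AfterValidator(lambda x: x.lower())]

    model_config = ConfigDict(frozen=True)


class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str


body = SignUpRequestBody.model_validate(
    {"email": "ABC@gmail.com", "password": "Secret123!"}
)
print(body.email.root)    # abc@gmail.com
print(body.model_dump())  # the email is serialized back as a plain string

# A RootModel also accepts its value positionally.
direct = EmailAddress("ABC@gmail.com")
print(direct == body.email)  # True
```

The underlying value is reached through the root attribute, and serialization produces the flat shape again, so the Value Object stays invisible to API clients.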

Avoiding unnecessary re-validation

Layered architecture requires repacking of data as it moves between layers. Value Objects belong to the lowest layer, so they are used extensively throughout the codebase. Let's consider a sample pseudo-code with a sign-up endpoint implemented in FastAPI.

class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str


class User(BaseModel):
    email: EmailAddress
    password_hash: str


@app.post("/sign_up")
async def sign_up(request_body: SignUpRequestBody):
    ...
    email = request_body.email
    password_hash = PasswordHasher().generate(request_body.password)

    with UnitOfWork() as unit_of_work:
        user = User(email=email, password_hash=password_hash)
        unit_of_work.repository.create(user)
    ...

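EmailAddress is used in both the SignUpRequestBody and User classes. Before the sign_up function is executed, the request body needs to be validated, which means that EmailAddress validation is also performed. When we instantiate the User class, the email field goes through Pydantic's validation once again, even though the value has already been validated. How much work that involves depends on the model configuration: with revalidate_instances set to 'always', for example, the entire validation logic from EmailAddress is executed a second time. This is a simplified example, but in reality, the amount of "repacking" can be much larger. To avoid unnecessary revalidation regardless of configuration, we can wrap our value object with pydantic.InstanceOf.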

from pydantic import BaseModel, InstanceOf

class User(BaseModel):
    email: InstanceOf[EmailAddress]
    password_hash: str

This way, Pydantic only checks whether the given value is an instance of the specified class. Of course, in the outermost model, we still need to use the bare Value Object class so that raw input gets parsed and validated, but within all inner classes, we should use the InstanceOf wrapper.
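The following sketch shows the practical effect of the wrapper, assuming Pydantic v2. As a simplification for this example, a plain str replaces EmailStr, and the password hash is a made-up placeholder.

```python
from typing import Annotated

from pydantic import (
    AfterValidator,
    BaseModel,
    ConfigDict,
    InstanceOf,
    RootModel,
    ValidationError,
)


class EmailAddress(RootModel):
    # Assumption: str replaces EmailStr to keep the sketch dependency-free.
    root: Annotated[str, AfterValidator(lambda x: x.lower())]

    model_config = ConfigDict(frozen=True)


class User(BaseModel):
    email: InstanceOf[EmailAddress]
    password_hash: str


# An already-validated Value Object passes with a plain isinstance check.
email = EmailAddress("ABC@gmail.com")
user = User(email=email, password_hash="hypothetical-hash")
print(user.email.root)  # abc@gmail.com

# Raw data is no longer parsed by the inner model.
try:
    User(email="abc@gmail.com", password_hash="hypothetical-hash")
except ValidationError:
    print("raw string rejected: an EmailAddress instance is required")
```

The trade-off is that inner models stop accepting raw strings from Python code, which is usually what we want: only the outermost layer should turn untrusted input into Value Objects.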


Takeaways

In conclusion, with Pydantic, we can achieve data validation nearly as effectively as in statically typed languages. Additionally, it supports annotated metadata, which makes the code concise.

To encapsulate the insights shared, here are three key points discussed in the article:

  • Pydantic models provide automatic type validation

  • For single-field Value Objects, RootModel is preferred over BaseModel

  • Using InstanceOf prevents unnecessary re-validation of the same Value Object