
Data structure and validation in Python with Pydantic, Marshmallow and Dataclasses

January 07, 2025

Tags: Technologies, python

 

In any programming language, it is common to expect the data we receive and send to have a defined structure. Python offers several ways to achieve this.

 

Dataclasses

 

Coming from another programming language, the first options that come to mind are structs or classes that serve this purpose.

 

Python added dataclasses to make this task easier. They are created with the `@dataclass` decorator, introduced by [PEP 557 - Data Classes](https://peps.python.org/pep-0557/) and available since Python 3.7.

 

Among other things, dataclasses automatically generate an `__init__` that accepts the attributes of our class (with their declared types and default values), as well as a `__repr__` for the class.

 

Example

 

```py
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

user = User(name="Esteban", age=45)
```
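
As a minimal sketch of the generated methods, the variant below adds a default value for `age` (an assumption made only for this illustration) and relies on the `__init__`, `__repr__` and `__eq__` that the decorator generates by default.

```py
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int = 0  # default value, added only for this illustration

# The generated __init__ lets us omit attributes that have defaults.
user = User(name="Esteban")

# The generated __repr__ lists every field.
print(user)  # User(name='Esteban', age=0)

# The generated __eq__ compares instances field by field.
print(user == User(name="Esteban", age=0))  # True
```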

 


 

Pydantic

 

Up to this point we can create structures with the values we expect to use, but something is missing: validation. Dataclasses do not enforce their type annotations at runtime, so on their own they give us no way to ensure that the data has the type we expect or meets the restrictions we want.
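
To make the gap concrete, here is a small sketch that reuses the `User` dataclass from the example above. The string passed as `age` is a deliberately wrong value chosen for illustration.

```py
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

# The annotation says int, but nothing checks it at runtime,
# so this wrong value is accepted without any error.
user = User(name="Esteban", age="forty-five")
print(type(user.age).__name__)  # str
```

A static type checker can flag this call, but the program itself runs without complaint.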

 

[Pydantic](https://docs.pydantic.dev/2.10/) solves this by integrating directly with Python annotations. It also provides clear error messages when validation fails, types such as `PositiveInt` that add restrictions to conventional types, and methods such as `user.model_dump()`, which converts a model to a dictionary and can be limited to specific fields.

 

Example

 

```py
from pydantic import BaseModel, PositiveInt, ValidationError

class User(BaseModel):
    name: str
    age: PositiveInt

data = {
    "name": "Esteban",
    "age": -45,
}

try:
    user = User(**data)
except ValidationError as e:
    print(e.errors())
```

 

Result

 

```
[{'type': 'greater_than', 'loc': ('age',), 'msg': 'Input should be greater than 0', 'input': -45, 'ctx': {'gt': 0}, 'url': 'https://errors.pydantic.dev/2.10/v/greater_than'}]
```
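
The `model_dump()` method mentioned earlier can be sketched in the same way. This assumes Pydantic 2.x and the same `User` model, here built from valid data.

```py
from pydantic import BaseModel, PositiveInt

class User(BaseModel):
    name: str
    age: PositiveInt

user = User(name="Esteban", age=45)

# Convert the whole model to a plain dictionary.
print(user.model_dump())  # {'name': 'Esteban', 'age': 45}

# Restrict the output to specific fields with include (or exclude).
print(user.model_dump(include={"name"}))  # {'name': 'Esteban'}
```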

 

Marshmallow

 

When Pydantic's types and validations are not enough, [marshmallow](https://marshmallow.readthedocs.io/en/stable/) enters the scene. Marshmallow replaces type annotations with its own fields, which combine data types with validations: URL validation, custom validation functions, value restrictions, and custom error messages, among others.

 

It is also a serialization library: it can load other kinds of data, such as dictionaries or JSON, into our schema with all its validations applied, and dump the result back out.

 

Example

 

```py
from marshmallow import Schema, ValidationError, fields, validate

class UserSchema(Schema):
name = fields. Str(required=True)
age = fields. Int(required=True, validate=validate. Range(min=0))

data = {
"name": "Esteban",
"age": -45,
}

try:
user = UserSchema(). load(data)
except ValidationError as e:
print(e.messages)
```

 

Result

 

```
{'age': ['Must be greater than or equal to 0.']}

```

 

Which one to use?

 

Each of these tools serves a different purpose; which one to use depends on how complex your validation is and how much flexibility you need.

 

  • Dataclasses: Useful when validation is not important but we want a simple way to group related data within the system.
  • Pydantic: Very easy to use, and it integrates well with existing dataclasses. It is a strong fit for simple validations.
  • Marshmallow: Suited to more complex structures, greater control over restrictions, nested schemas, and a more controlled organization of data. It also includes direct JSON serialization.
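
As an illustration of the dataclass integration mentioned in the Pydantic item, the sketch below assumes Pydantic 2.x and swaps the standard decorator for the one in `pydantic.dataclasses`. The class body stays the same as in a regular dataclass.

```py
from pydantic import PositiveInt, ValidationError
from pydantic.dataclasses import dataclass

@dataclass
class User:
    name: str
    age: PositiveInt

messages = []
try:
    # Unlike a standard dataclass, this one validates its fields.
    User(name="Esteban", age=-45)
except ValidationError as e:
    messages = [error["msg"] for error in e.errors()]
    print(messages)  # ['Input should be greater than 0']
```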

 
