3.7. Data Classes

3.7.1. Data Classes vs Normal Classes

If you want to use classes to store data, use the dataclass decorator from the dataclasses module, which is available in Python 3.7+.

With dataclass, you can create a class with attributes, type hints, and a readable representation of the data in a few lines of code. To use it, add the @dataclass decorator above the class definition.

from dataclasses import dataclass


@dataclass
class DataClassDog:
    color: str
    age: int
DataClassDog(color="black", age=9)
DataClassDog(color='black', age=9)
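To see what the decorator generated, here is a quick sketch that inspects the same class with two standard-library helpers from the dataclasses module. fields lists the declared fields, and asdict converts an instance into a plain dictionary. The variable names are only for illustration.

```python
from dataclasses import dataclass, fields, asdict


@dataclass
class DataClassDog:
    color: str
    age: int


pepper = DataClassDog(color="black", age=9)

# fields() returns the declared fields in definition order
field_names = [f.name for f in fields(DataClassDog)]
print(field_names)  # ['color', 'age']

# asdict() converts the instance into a plain dictionary
as_dict = asdict(pepper)
print(as_dict)  # {'color': 'black', 'age': 9}

# The generated __repr__ produces the representation shown above
print(repr(pepper))  # DataClassDog(color='black', age=9)
```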

Without dataclass, you need to write __init__ to assign values to the attributes and __repr__ to produce a readable representation of the data. This quickly becomes cumbersome.

class Dog:
    def __init__(self, color, age):
        self.color = color
        self.age = age

    def __repr__(self):
        # !r quotes strings, matching the dataclass-style output
        return f"Dog(color={self.color!r}, age={self.age!r})"
Dog(color="black", age=9)
Dog(color='black', age=9)

3.7.2. frozen=True: Make Your Data Classes Read-Only

If you don’t want anyone to modify an instance’s attributes after it is created, use @dataclass(frozen=True).

from dataclasses import dataclass


@dataclass(frozen=True)
class DataClassDog:
    color: str
    age: int

Now assigning to the color attribute of a DataClassDog instance raises a FrozenInstanceError.

pepper = DataClassDog(color="black", age=9)
pepper.color = 'golden'
---------------------------------------------------------------------------
FrozenInstanceError                       Traceback (most recent call last)
<ipython-input-2-0d6f339835b8> in <module>
      1 pepper = DataClassDog(color="black", age=9)
----> 2 pepper.color = 'golden'

<string> in __setattr__(self, name, value)

FrozenInstanceError: cannot assign to field 'color'
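A frozen instance cannot be changed in place, so the usual pattern is to build a modified copy with dataclasses.replace. The short sketch below reuses the DataClassDog definition above. The names golden and unique are only for illustration. It also shows a side effect of frozen=True: combined with the default eq=True, the decorator generates __hash__, so equal instances collapse to one entry in a set.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataClassDog:
    color: str
    age: int


pepper = DataClassDog(color="black", age=9)

# replace() returns a new instance with the given fields changed;
# the original frozen instance is left untouched
golden = replace(pepper, color="golden")
print(golden)  # DataClassDog(color='golden', age=9)
print(pepper)  # DataClassDog(color='black', age=9)

# frozen=True plus the default eq=True generates __hash__ from the fields,
# so two equal instances count as one element of a set
unique = {pepper, DataClassDog(color="black", age=9)}
print(len(unique))  # 1
```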

3.7.3. Compare Two Data Class Instances

With a normal class, you need to implement the __eq__ method to compare two instances by their attributes. Otherwise, == only checks whether both names refer to the same object.

class Dog:
    def __init__(self, type, age):
        self.type = type
        self.age = age
    
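    def __eq__(self, other):
        # Let Python fall back to the other operand if it is not a Dog
        if not isinstance(other, Dog):
            return NotImplemented
        return self.type == other.type and self.age == other.age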

pepper = Dog(type="Dachshund", age=7)
bim = Dog(type="Dachshund", age=7)
pepper == bim
True
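Without __eq__, a class inherits object.__eq__, which treats an instance as equal only to itself. The minimal sketch below uses a hypothetical PlainDog class that has the same attributes as Dog but no __eq__ method.

```python
class PlainDog:
    # Same attributes as Dog above, but no __eq__ method
    def __init__(self, type, age):
        self.type = type
        self.age = age


pepper = PlainDog(type="Dachshund", age=7)
bim = PlainDog(type="Dachshund", age=7)

# The inherited object.__eq__ compares identity, not attribute values
same_values = pepper == bim
same_object = pepper == pepper
print(same_values)  # False
print(same_object)  # True
```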

The dataclass decorator implements __eq__ for you. Two instances are equal when they belong to the same class and all their fields are equal, so you only need to declare the attributes.

from dataclasses import dataclass

@dataclass
class DataClassDog:
    type: str
    age: int
pepper = DataClassDog(type="Dachshund", age=7)
bim = DataClassDog(type="Dachshund", age=7)
pepper == bim 
True
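The sketch below illustrates two details of the generated comparison methods, using a hypothetical DataClassCat class. First, the generated __eq__ only compares instances of the same class. Second, passing order=True to the decorator also generates <, <=, > and >=, which compare the fields as a tuple in the order they are declared.

```python
from dataclasses import dataclass


@dataclass(order=True)
class DataClassDog:
    type: str
    age: int


@dataclass
class DataClassCat:
    type: str
    age: int


pepper = DataClassDog(type="Dachshund", age=7)
bim = DataClassDog(type="Beagle", age=3)
cat = DataClassCat(type="Dachshund", age=7)

# Same field values, different class: the generated __eq__ does not match them
print(pepper == cat)  # False

# order=True compares (type, age) tuples, so "Beagle" sorts before "Dachshund"
sorted_dogs = sorted([pepper, bim])
print(sorted_dogs)
```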

3.7.4. Post-init: Add Init Method to a Data Class

With a data class, you don’t need an __init__ method to assign values to its attributes. However, sometimes you want to initialize attributes that are computed from other fields. That is when a data class’s __post_init__ comes in handy, because the generated __init__ calls it right after assigning the fields.

In the code below, I use __post_init__ to initialize the attribute info using the attributes names and ages.

from dataclasses import dataclass
from typing import List


@dataclass
class Dog:
    name: str
    age: int


@dataclass
class Dogs:
    names: List[str]
    ages: List[int]

    def __post_init__(self):
        # Called by the generated __init__ right after names and ages are set
        self.info = [Dog(name, age) for name, age in zip(self.names, self.ages)]
names = ['Bim', 'Pepper']
ages = [5, 6]
dogs = Dogs(names, ages)
dogs.info
[Dog(name='Bim', age=5), Dog(name='Pepper', age=6)]
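In the example above, info is a plain instance attribute, so it does not appear in the generated __repr__ or __eq__. If you want info treated as a field without making it an __init__ argument, one option is to declare it with dataclasses.field(init=False). The following sketch shows that variation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dog:
    name: str
    age: int


@dataclass
class Dogs:
    names: List[str]
    ages: List[int]
    # A real field (shown in repr, used by eq) that __init__ does not accept
    info: List[Dog] = field(init=False)

    def __post_init__(self):
        # The generated __init__ sets names and ages, then calls this method
        self.info = [Dog(name, age) for name, age in zip(self.names, self.ages)]


dogs = Dogs(["Bim", "Pepper"], [5, 6])
print(dogs)
```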
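Note that data classes do not enforce type hints at runtime. In the example below, the string "ten" is accepted as age without complaint, so the check has to be written by hand after the instance is created:

from dataclasses import dataclass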


@dataclass
class Dog:
    names: str
    age: int


dog = Dog(names="Bim", age="ten")
if not isinstance(dog.age, int):
    raise ValueError("Dog's age must be an integer.")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 12
     10 dog = Dog(names="Bim", age="ten")
     11 if not isinstance(dog.age, int):
---> 12     raise ValueError("Dog's age must be an integer.")

ValueError: Dog's age must be an integer.
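Instead of checking each instance after it is created, you can move the check into __post_init__ so that the generated __init__ runs it for every new instance. Below is a minimal sketch of that approach. It only checks age. Also note that isinstance(True, int) is True, so booleans would pass the check.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    names: str
    age: int

    def __post_init__(self):
        # Called by the generated __init__ after the fields are assigned,
        # so every new instance goes through this check
        if not isinstance(self.age, int):
            raise ValueError("Dog's age must be an integer.")


bim = Dog(names="Bim", age=5)  # passes the check

message = ""
try:
    Dog(names="Bim", age="ten")
except ValueError as err:
    message = str(err)
print(message)  # Dog's age must be an integer.
```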

3.7.5. Simplify Data Validation with Pydantic

As the example above shows, data classes require you to implement validation manually.

Pydantic, on the other hand, validates data automatically when a model is instantiated and raises informative error messages. This makes Pydantic particularly useful when working with data from external sources.

from pydantic import BaseModel


class Dog(BaseModel):
    names: str
    age: int


dog = Dog(names="Bim", age="ten")
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[3], line 9
      5     names: str
      6     age: int
----> 9 dog = Dog(names="Bim", age="ten")

File ~/book/venv/lib/python3.11/site-packages/pydantic/main.py:164, in BaseModel.__init__(__pydantic_self__, **data)
    162 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    163 __tracebackhide__ = True
--> 164 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)

ValidationError: 1 validation error for Dog
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='ten', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/int_parsing
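Validation in Pydantic does more than reject bad input. The sketch below assumes Pydantic v2, matching the version in the traceback above. In the default (lax) mode, a string such as "5" is converted to the integer 5. ValidationError.errors() returns the failures as a list of dictionaries that you can inspect programmatically, for example to log them or return them from an API.

```python
from pydantic import BaseModel, ValidationError


class Dog(BaseModel):
    names: str
    age: int


# In the default (lax) mode, a numeric string is coerced to int
bim = Dog(names="Bim", age="5")
print(bim.age)  # 5

# errors() returns one dict per failed field, with the field path under "loc"
errors = []
try:
    Dog(names="Bim", age="ten")
except ValidationError as err:
    errors = err.errors()
print(errors[0]["loc"])  # ('age',)
```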