3.7. Data Classes

3.7.1. Data Classes vs Normal Classes

If you want to use classes to store data, use the dataclasses module. This module is available in Python 3.7+.

With the @dataclass decorator, you can create a class with attributes, type hints, and a readable representation of the data in a few lines of code. To use it, add the @dataclass decorator above the class definition.

from dataclasses import dataclass


@dataclass
class DataClassDog:
    color: str
    age: int
DataClassDog(color="black", age=9)
DataClassDog(color='black', age=9)

Without dataclass, you need to write __init__ to assign values to the attributes and __repr__ to produce a readable representation of the data, which quickly becomes cumbersome.

class Dog:
    def __init__(self, color, age):
        self.color = color
        self.age = age

    def __repr__(self):
        return f"Dog(color={self.color!r}, age={self.age!r})"
Dog(color="black", age=9)
Dog(color='black', age=9)

3.7.2. frozen=True: Make Your Data Classes Read-Only

If you don’t want anyone to modify the attributes of an instance, use @dataclass(frozen=True).

from dataclasses import dataclass


@dataclass(frozen=True)
class DataClassDog:
    color: str
    age: int

Now assigning to the color attribute of a DataClassDog instance raises a FrozenInstanceError.

pepper = DataClassDog(color="black", age=9)
pepper.color = 'golden'
---------------------------------------------------------------------------
FrozenInstanceError                       Traceback (most recent call last)
<ipython-input-2-0d6f339835b8> in <module>
      1 pepper = DataClassDog(color="black", age=9)
----> 2 pepper.color = 'golden'

<string> in __setattr__(self, name, value)

FrozenInstanceError: cannot assign to field 'color'
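
When a frozen instance does need a changed value, one option is to build a new instance instead of mutating the old one. The sketch below is a minimal illustration, assuming the same DataClassDog class as above. It catches the error and then uses dataclasses.replace to create a modified copy. The variable name golden_pepper is made up for this example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class DataClassDog:
    color: str
    age: int


pepper = DataClassDog(color="black", age=9)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    pepper.color = "golden"
except FrozenInstanceError as error:
    print(f"Blocked: {error}")

# replace() builds a new instance; the original stays unchanged.
golden_pepper = replace(pepper, color="golden")  # hypothetical variable name
print(pepper)
print(golden_pepper)
```

Because replace returns a new object, code that still holds a reference to pepper keeps seeing the original values.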

3.7.3. Compare Between Two Data Classes

Normally, you need to implement the __eq__ method to compare two instances of a class by value.

class Dog:
    def __init__(self, type, age):
        self.type = type
        self.age = age
    
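    def __eq__(self, other):
        # Defer to Python's default handling for non-Dog operands.
        if not isinstance(other, Dog):
            return NotImplemented
        return self.type == other.type and self.age == other.age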

pepper = Dog(type="Dachshund", age=7)
bim = Dog(type="Dachshund", age=7)
pepper == bim
True

The @dataclass decorator automatically implements the __eq__ method for you. You only declare the attributes, and two instances compare equal when they belong to the same class and all of their fields are equal.

from dataclasses import dataclass

@dataclass
class DataClassDog:
    type: str
    age: int
pepper = DataClassDog(type="Dachshund", age=7)
bim = DataClassDog(type="Dachshund", age=7)
pepper == bim 
True
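
The generated __eq__ has two details that are easy to miss: it only treats instances of the same class as equal, and individual fields can be left out of the comparison with field(compare=False). The sketch below illustrates both. The nickname field and the DataClassCat class are hypothetical additions for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class DataClassDog:
    type: str
    age: int
    # Hypothetical field, excluded from the generated __eq__.
    nickname: str = field(default="", compare=False)


@dataclass
class DataClassCat:  # hypothetical class with the same compared fields
    type: str
    age: int


pepper = DataClassDog(type="Dachshund", age=7, nickname="Pep")
bim = DataClassDog(type="Dachshund", age=7, nickname="Bimbo")
cat = DataClassCat(type="Dachshund", age=7)

same_fields = pepper == bim  # nickname is ignored in the comparison
different_class = pepper == cat  # equal field values, but different classes
print(same_fields, different_class)
```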

3.7.4. Post-init: Add Init Method to a Data Class

With a data class, you don’t need to write an __init__ method to assign values to its attributes. However, sometimes you need extra initialization logic, such as deriving one attribute from others. That is when the data class’s __post_init__ method comes in handy: the generated __init__ calls it after assigning the fields.

In the code below, I use __post_init__ to initialize the attribute info using the attributes names and ages.

from dataclasses import dataclass
from typing import List


@dataclass
class Dog:
    name: str
    age: int


@dataclass
class Dogs:
    names: List[str]
    ages: List[int]

    def __post_init__(self):
        self.info = [Dog(name, age) for name, age in zip(self.names, self.ages)]
names = ['Bim', 'Pepper']
ages = [5, 6]
dogs = Dogs(names, ages)
dogs.info
[Dog(name='Bim', age=5), Dog(name='Pepper', age=6)]

Note that a data class does not enforce its type hints at runtime, so passing a string as age is accepted silently. You need to validate the value explicitly:

from dataclasses import dataclass


@dataclass
class Dog:
    name: str
    age: int


dog = Dog(name="Bim", age="ten")
if not isinstance(dog.age, int):
    raise ValueError("Dog's age must be an integer.")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 12
     10 dog = Dog(name="Bim", age="ten")
     11 if not isinstance(dog.age, int):
---> 12     raise ValueError("Dog's age must be an integer.")

ValueError: Dog's age must be an integer.
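
The check above runs outside the class, after the object has already been created. One way to make it part of object creation is to move it into __post_init__, so every instance is validated automatically. The following is a minimal sketch of that idea. The class name ValidatedDog is hypothetical and chosen only to avoid clashing with the Dog class above.

```python
from dataclasses import dataclass


@dataclass
class ValidatedDog:  # hypothetical name for this illustration
    name: str
    age: int

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the fields.
        if not isinstance(self.age, int):
            raise ValueError("Dog's age must be an integer.")


valid_dog = ValidatedDog(name="Bim", age=10)

message = None
try:
    ValidatedDog(name="Bim", age="ten")
except ValueError as error:
    message = str(error)
    print(message)
```

With this layout, an invalid ValidatedDog can never be constructed, so callers do not need to repeat the isinstance check.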

3.7.5. Python Best Practices: Using default_factory for Mutable Defaults

When defining classes in Python, using mutable default values for instance variables can lead to unexpected behavior.

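For example, if you use a list as a default value in a class’s __init__ method, every instance created without an explicit authors argument shares the same list object, because the default value is evaluated only once, when the function is defined: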

class Book:
    def __init__(self, title, authors=[]):
        self.title = title
        self.authors = authors


book1 = Book("Book 1")
book1.authors.append("Author 1")

book2 = Book("Book 2")
print(book2.authors)
['Author 1']

In this example, book1 and book2 share the same list object, which is why modifying the list in book1 affects book2.
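
In a plain class, a common workaround is to use None as the default and create the list inside __init__. The sketch below is a minimal illustration of that pattern, reusing the Book example above. It is one possible fix, not the only one.

```python
class Book:
    def __init__(self, title, authors=None):
        self.title = title
        # Create a fresh list per instance when no authors are passed.
        self.authors = [] if authors is None else authors


book1 = Book("Book 1")
book1.authors.append("Author 1")

book2 = Book("Book 2")
print(book2.authors)
```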

To avoid this issue, you can use the default_factory parameter of dataclasses.field, which calls the given factory to create a new object for each instance:

from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    authors: list = field(default_factory=list)


book1 = Book("Book 1")
book1.authors.append("Author 1")

book2 = Book("Book 2")
print(book2.authors)
[]

Now, each instance has its own separate list object, and modifying one instance’s list does not affect others.
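
Data classes also guard against the shared-list mistake directly: declaring a field with a list literal as its default makes @dataclass raise a ValueError when the class is defined. The sketch below demonstrates this. BrokenBook is a hypothetical class that exists only to trigger the error, and the exact wording of the message may differ between Python versions.

```python
from dataclasses import dataclass

try:

    @dataclass
    class BrokenBook:  # hypothetical class, used only to trigger the error
        title: str
        authors: list = []  # mutable default, rejected by @dataclass

except ValueError as error:
    rejected = True
    print(f"ValueError: {error}")
else:
    rejected = False
```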