4.12. Test

This section shows how to compare two pandas DataFrames or two pandas Series.
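The DataFrame tools covered in this section have Series counterparts. pandas.testing.assert_series_equal raises an AssertionError when two Series differ, and Series.compare keeps only the positions where their values differ. Here is a minimal sketch; the Series s1 and s2 are made-up example data:

```python
import pandas as pd
from pandas.testing import assert_series_equal

s1 = pd.Series([1, 2, 3], name="col1")
s2 = pd.Series([1, 3, 4], name="col1")

# Identical Series pass silently
assert_series_equal(s1, s1.copy())

# Different values raise an AssertionError that describes the mismatch
try:
    assert_series_equal(s1, s2)
    series_equal = True
except AssertionError:
    series_equal = False

# Series.compare keeps only the positions where the values differ,
# with "self" holding s1's values and "other" holding s2's values
diff = s1.compare(s2)
print(diff)
```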

4.12.1. assert_frame_equal: Test Whether Two DataFrames Are Similar

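If you want to test whether two DataFrames are similar and, if they are not, see where they differ, use pandas.testing.assert_frame_equal. It returns nothing when the DataFrames match and raises an AssertionError describing the first mismatch it finds otherwise.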

from pandas.testing import assert_frame_equal
import pandas as pd


df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"col1": [1, 3, 4], "col2": [4, 5, 6]})
assert_frame_equal(df1, df2)
AssertionError: DataFrame.iloc[:, 0] (column name="col1") are different

DataFrame.iloc[:, 0] (column name="col1") values are different (66.66667 %)
[index]: [0, 1, 2]
[left]:  [1, 2, 3]
[right]: [1, 3, 4]
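assert_frame_equal does not require float columns to match bit for bit: by default it uses check_exact=False together with relative and absolute tolerances (rtol and atol). The sketch below uses made-up values to show a floating-point rounding error passing the default check, failing with check_exact=True, and a larger gap passing once rtol is loosened:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame({"col1": [0.1 + 0.2, 1.0]})
df2 = pd.DataFrame({"col1": [0.3, 1.0]})

# Passes: 0.1 + 0.2 is 0.30000000000000004, within the default tolerance
assert_frame_equal(df1, df2)

# Fails: check_exact=True compares the float values exactly
try:
    assert_frame_equal(df1, df2, check_exact=True)
    exact_match = True
except AssertionError:
    exact_match = False
print(exact_match)  # False

# A 0.1% gap exceeds the default rtol (1e-5) but passes with a looser rtol
df3 = pd.DataFrame({"col1": [0.3, 1.001]})
assert_frame_equal(df2, df3, rtol=1e-2)
```

Loosening rtol is useful when values come from computations that accumulate rounding error; check_exact=True is the stricter choice when any difference should fail the test.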

4.12.2. Ignore the Order of Index and Columns When Comparing Two DataFrames

If you want to ignore the order of the index and columns when comparing two DataFrames, use assert_frame_equal(df1, df2, check_like=True). Rows and columns are then matched by label rather than by position.

from pandas.testing import assert_frame_equal
import pandas as pd


df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"col2": [4, 5, 6], "col1": [1, 2, 3]})

# Passes: columns are matched by label, so their order does not matter
assert_frame_equal(df1, df2, check_like=True)

Without check_like=True, the same comparison fails because the columns are in a different order:

assert_frame_equal(df1, df2)
AssertionError: DataFrame.columns are different

DataFrame.columns values are different (100.0 %)
[left]:  Index(['col1', 'col2'], dtype='object')
[right]: Index(['col2', 'col1'], dtype='object')
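check_like=True also ignores the order of the index, so rows in a different order still match as long as each index label keeps the same values. In this sketch (example data only), df2 is df1 with its rows reversed via iloc[::-1]; the comparison passes with check_like=True and fails without it:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]}, index=["a", "b", "c"])
# Same rows in reverse order: each index label still points to the same values
df2 = df1.iloc[::-1]

# Passes: rows are aligned by index label before comparing
assert_frame_equal(df1, df2, check_like=True)

# Fails: without check_like, the index order must match as well
try:
    assert_frame_equal(df1, df2)
    order_sensitive_match = True
except AssertionError:
    order_sensitive_match = False
print(order_sensitive_match)  # False
```

check_like only reorders rows and columns; if the two DataFrames have different index labels or column names, the comparison still fails.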

4.12.3. Compare the Difference Between Two DataFrames

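If you want to show the differences between two DataFrames side by side, use df.compare. It keeps only the rows and columns where the values differ: "self" holds the values from df1 and "other" holds the values from df2. Both DataFrames must have the same row and column labels.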

import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"col1": [1, 3, 4], "col2": [4, 5, 6]})

df1.compare(df2)
   col1      
   self other
1   2.0   3.0
2   3.0   4.0
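df.compare accepts a few options that change the layout of the result. As a sketch on the same df1 and df2: keep_shape=True keeps every row and column (with NaN where the values match), keep_equal=True shows the matching values instead of NaN, and align_axis=0 stacks "self" and "other" as rows instead of columns:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"col1": [1, 3, 4], "col2": [4, 5, 6]})

# Keep all rows and columns; cells that are equal show up as NaN
full = df1.compare(df2, keep_shape=True)

# Keep all rows and columns and show the equal values instead of NaN
full_with_values = df1.compare(df2, keep_shape=True, keep_equal=True)

# Put "self" and "other" in an inner index level instead of side by side
stacked = df1.compare(df2, align_axis=0)

print(full)
print(full_with_values)
print(stacked)
```

The default layout is usually the easiest to scan; keep_shape=True helps when you want to line the result up with the original DataFrames.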