4.12. Test
This section shows how to compare two pandas DataFrames or two pandas Series.
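For Series, pandas ships an analogous helper, `pandas.testing.assert_series_equal`, and `Series.compare` lines up differing values the same way `DataFrame.compare` does for DataFrames. The sketch below uses made-up values and the hypothetical series name `col1`; it assumes pandas 1.1 or later, where `Series.compare` is available.

```python
import pandas as pd
from pandas.testing import assert_series_equal

s1 = pd.Series([1, 2, 3], name="col1")
s2 = pd.Series([1, 3, 4], name="col1")

# Identical Series pass silently
assert_series_equal(s1, s1.copy())

# Differing Series raise an AssertionError describing the mismatch
try:
    assert_series_equal(s1, s2)
    message = ""
except AssertionError as err:
    message = str(err)
print(message)

# Series.compare keeps only the differing positions, with "self" and "other" side by side
diff = s1.compare(s2)
print(diff)
```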
4.12.1. assert_frame_equal: Test Whether Two DataFrames Are Equal
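To test whether two DataFrames are equal, and to see how they differ when they are not, use pandas.testing.assert_frame_equal. It returns nothing when the DataFrames match; otherwise it raises an AssertionError that describes the first mismatch it finds, such as which column differs and what percentage of its values are different.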
from pandas.testing import assert_frame_equal
import pandas as pd
df1 = pd.DataFrame({"coll": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"coll": [1, 3, 4], "col2": [4, 5, 6]})
assert_frame_equal(df1, df2)
AssertionError: DataFrame.iloc[:, 0] (column name="coll") are different
DataFrame.iloc[:, 0] (column name="coll") values are different (66.66667 %)
[index]: [0, 1, 2]
[left]: [1, 2, 3]
[right]: [1, 3, 4]
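How strict "equal" is can be tuned with keyword arguments of assert_frame_equal: `check_dtype` controls whether dtypes must match, and `rtol`/`atol` set the tolerance used for inexact float comparison (available since pandas 1.1). A minimal sketch with made-up data:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

ints = pd.DataFrame({"col1": [1, 2, 3]})
floats = pd.DataFrame({"col1": [1.0, 2.0, 3.0]})

# Same values, but int64 vs float64: fails with the default check_dtype=True
try:
    assert_frame_equal(ints, floats)
    dtype_strict_passed = True
except AssertionError:
    dtype_strict_passed = False

# The same pair passes once dtype checking is turned off
assert_frame_equal(ints, floats, check_dtype=False)

# 0.1 + 0.2 is not exactly 0.3, but it is within the default relative tolerance
computed = pd.DataFrame({"col1": [0.1 + 0.2]})
expected = pd.DataFrame({"col1": [0.3]})
assert_frame_equal(computed, expected)

# A 1% gap fails under the default tolerance but passes with rtol=0.05
approx = pd.DataFrame({"col1": [1.01]})
exact = pd.DataFrame({"col1": [1.0]})
assert_frame_equal(approx, exact, rtol=0.05)

print("dtype-strict comparison passed:", dtype_strict_passed)
```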
4.12.2. Ignore the Order of Index and Columns When Comparing Two DataFrames
By default, assert_frame_equal requires the index labels and the columns to appear in the same order. To ignore their order when comparing two DataFrames, use assert_frame_equal(df1, df2, check_like=True).
from pandas.testing import assert_frame_equal
import pandas as pd
df1 = pd.DataFrame({"coll": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"col2": [4, 5, 6], "coll": [1, 2, 3]})
assert_frame_equal(df1, df2, check_like=True)
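Without check_like=True, the same comparison fails because the columns appear in a different order:
df1 = pd.DataFrame({"coll": [1, 2, 3], "col2": [4, 5, 6]})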
df2 = pd.DataFrame({"col2": [4, 5, 6], "coll": [1, 2, 3]})
assert_frame_equal(df1, df2)
AssertionError: DataFrame.columns are different
DataFrame.columns values are different (100.0 %)
[left]: Index(['coll', 'col2'], dtype='object')
[right]: Index(['col2', 'coll'], dtype='object')
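check_like applies to row order as well, as long as each index label still carries the same data. It does not rescue rows whose labels were changed, for example by reset_index. A sketch with made-up data; the names `shuffled` and `relabeled` are only for illustration:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Reorder the rows; each index label keeps its own data
shuffled = df1.loc[[2, 0, 1]]
assert_frame_equal(df1, shuffled, check_like=True)  # passes

# Without check_like, the different index order is reported
try:
    assert_frame_equal(df1, shuffled)
    order_message = ""
except AssertionError as err:
    order_message = str(err)
print(order_message)

# reset_index relabels the shuffled rows 0, 1, 2, so labels now point at
# different data and check_like=True no longer makes the frames equal
relabeled = shuffled.reset_index(drop=True)
try:
    assert_frame_equal(df1, relabeled, check_like=True)
    relabeled_passed = True
except AssertionError:
    relabeled_passed = False
```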
4.12.3. Compare the Difference Between Two DataFrames
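To show and align the differences between two DataFrames, use DataFrame.compare. The result places each differing value from the calling DataFrame (self) next to the corresponding value from the other DataFrame (other).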
import pandas as pd
df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"col1": [1, 3, 4], "col2": [4, 5, 6]})
df1.compare(df2)
| | col1 (self) | col1 (other) |
|---|---|---|
| 1 | 2.0 | 3.0 |
| 2 | 3.0 | 4.0 |
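By default, compare drops rows and columns in which every value is equal (which is why row 0 and col2 are absent from the result above) and shows equal cells inside the kept rows as NaN. The keep_shape, keep_equal, and align_axis parameters change this. Note that compare only accepts DataFrames with identical labels (same shape, index, and columns). A sketch reusing the data above:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
df2 = pd.DataFrame({"col1": [1, 3, 4], "col2": [4, 5, 6]})

# keep_shape=True keeps every row and column; equal cells still show as NaN
full = df1.compare(df2, keep_shape=True)
print(full)

# keep_equal=True additionally fills equal cells with their actual values
full_values = df1.compare(df2, keep_shape=True, keep_equal=True)
print(full_values)

# align_axis=0 stacks self/other as rows instead of side-by-side columns
stacked = df1.compare(df2, align_axis=0)
print(stacked)
```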