4.6. Combine Multiple DataFrames#

4.6.1. pandas.DataFrame.combine_first: Update Null Elements Using Another DataFrame#

If you want to fill null values in one DataFrame with non-null values at the same locations from another DataFrame, use pandas.DataFrame.combine_first.

import pandas as pd

store1 = pd.DataFrame({"orange": [None, 5], "apple": [None, 1]})
store2 = pd.DataFrame({"orange": [1, 3], "apple": [4, 2]})
print(store1)
   orange  apple
0     NaN    NaN
1     5.0    1.0
print(store2)
   orange  apple
0       1      4
1       3      2

In the code below, the values at the first row of store1 are updated with the values at the first row of store2.

print(store1.combine_first(store2))
   orange  apple
0     1.0    4.0
1     5.0    1.0

4.6.2. df.merge: Merge DataFrame Based on Columns#

If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge.

If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge.

If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge.

import pandas as pd

df1 = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"], "value": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"key2": ["a", "b", "c"], "value": [6, 7, 8]})

df1 
key1 value
0 a 1
1 a 2
2 b 3
3 b 4
4 a 5
df2 
key2 value
0 a 6
1 b 7
2 c 8
df1.merge(df2, left_on="key1", right_on="key2")
key1 value_x key2 value_y
0 a 1 a 6
1 a 2 a 6
2 a 5 a 6
3 b 3 b 7
4 b 4 b 7

4.6.3. Enhancing Readability in DataFrame Merging With Custom Suffixes#

When merging two DataFrames with overlapping columns, the default behavior is to add suffixes “_x” and “_y” to the column names.

import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "val": ["A", "B", "C"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "val": ["X", "Y", "Z"]})

merged_df = pd.merge(df1, df2, on="id")

print(merged_df)
   id val_x val_y
0   2     B     X
1   3     C     Y

To improve readability, you can specify custom suffixes.

merged_df = pd.merge(df1, df2, on="id", suffixes=("_df1", "_df2"))
print(merged_df)
   id val_df1 val_df2
0   2       B       X
1   3       C       Y

4.6.4. Include All Rows When Merging Two DataFrames#

df.merge only includes rows with matching values in both DataFrames. If you want to include all rows from both DataFrames, use how='outer'.

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'v1': [1, 2]})
df2 = pd.DataFrame({'key': ['B', 'C'], 'v2': [3, 4]})

# Keep only rows with matching keys
pd.merge(df1, df2, on='key')
key v1 v2
0 B 2 3
# Keep all rows
pd.merge(df1, df2, on='key', how='outer')
key v1 v2
0 A 1.0 NaN
1 B 2.0 3.0
2 C NaN 4.0

4.6.5. DataFrame.insert: Insert a Column Into a DataFrame at a Specified Location#

If you want to insert a column into a DataFrame at a specified location, use df.insert. In the code below, I insert column c at location 0.

import pandas as pd 

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.insert(loc=0, column='c', value=[5, 6])
df 
c a b
0 5 1 3
1 6 2 4