4.6. Combine Multiple DataFrames#

4.6.1. pandas.DataFrame.combine_first: Update Null Elements Using Another DataFrame#

If you want to fill null values in one DataFrame with non-null values at the same locations from another DataFrame, use pandas.DataFrame.combine_first.

import pandas as pd

store1 = pd.DataFrame({"orange": [None, 5], "apple": [None, 1]})
store2 = pd.DataFrame({"orange": [1, 3], "apple": [4, 2]})
print(store1)
   orange  apple
0     NaN    NaN
1     5.0    1.0
print(store2)
   orange  apple
0       1      4
1       3      2

In the code below, the values at the first row of store1 are updated with the values at the first row of store2.

print(store1.combine_first(store2))
   orange  apple
0     1.0    4.0
1     5.0    1.0

4.6.2. df.merge: Merge DataFrame Based on Columns#

If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge.

If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge.

If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge.

import pandas as pd

df1 = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"], "value": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"key2": ["a", "b", "c"], "value": [6, 7, 8]})

df1 
key1 value
0 a 1
1 a 2
2 b 3
3 b 4
4 a 5
df2 
key2 value
0 a 6
1 b 7
2 c 8
df1.merge(df2, left_on="key1", right_on="key2")
key1 value_x key2 value_y
0 a 1 a 6
1 a 2 a 6
2 a 5 a 6
3 b 3 b 7
4 b 4 b 7

4.6.3. Specify Suffixes When Using df.merge()#

If you are merging 2 dataframes that have the same features using df.merge(), it might be confusing to know which dataframe a_x or a_y belongs to.

import pandas as pd 

df1 = pd.DataFrame({"left_key": [1, 2, 3], "a": [4, 5, 6]})
df2 = pd.DataFrame({"right_key": [1, 2, 3], "a": [5, 6, 7]})
df1.merge(df2, left_on="left_key", right_on="right_key")
left_key a_x right_key a_y
0 1 4 1 5
1 2 5 2 6
2 3 6 3 7

A better way is to specify suffixes of the features in each Dataframe like below. Now a_x becomes a_left and a_y becomes a_right.

df1.merge(df2, left_on="left_key", right_on="right_key", suffixes=("_left", "_right"))
left_key a_left right_key a_right
0 1 4 1 5
1 2 5 2 6
2 3 6 3 7

Try it if you want the names of your columns to be less confusing.

4.6.4. Include All Rows When Merging Two DataFrames#

df.merge only includes rows with matching values in both DataFrames. If you want to include all rows from both DataFrames, use how='outer'.

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'v1': [1, 2]})
df2 = pd.DataFrame({'key': ['B', 'C'], 'v2': [3, 4]})

# Keep only rows with matching keys
pd.merge(df1, df2, on='key')
key v1 v2
0 B 2 3
# Keep all rows
pd.merge(df1, df2, on='key', how='outer')
key v1 v2
0 A 1.0 NaN
1 B 2.0 3.0
2 C NaN 4.0

4.6.5. DataFrame.insert: Insert a Column Into a DataFrame at a Specified Location#

If you want to insert a column into a DataFrame at a specified location, use df.insert. In the code below, I insert column c at location 0.

import pandas as pd 

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.insert(loc=0, column='c', value=[5, 6])
df 
c a b
0 5 1 3
1 6 2 4