4.6. Combine Multiple DataFrames#
4.6.1. pandas.DataFrame.combine_first: Update Null Elements Using Another DataFrame#
If you want to fill null values in one DataFrame with non-null values at the same locations from another DataFrame, use pandas.DataFrame.combine_first
.
import pandas as pd
store1 = pd.DataFrame({"orange": [None, 5], "apple": [None, 1]})
store2 = pd.DataFrame({"orange": [1, 3], "apple": [4, 2]})
print(store1)
orange apple
0 NaN NaN
1 5.0 1.0
print(store2)
orange apple
0 1 4
1 3 2
In the code below, the values at the first row of store1
are updated with the values at the first row of store2
.
print(store1.combine_first(store2))
orange apple
0 1.0 4.0
1 5.0 1.0
4.6.2. df.merge: Merge DataFrame Based on Columns#
If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge
.
If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge
.
If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge
.
import pandas as pd
df1 = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"], "value": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"key2": ["a", "b", "c"], "value": [6, 7, 8]})
df1
key1 | value | |
---|---|---|
0 | a | 1 |
1 | a | 2 |
2 | b | 3 |
3 | b | 4 |
4 | a | 5 |
df2
key2 | value | |
---|---|---|
0 | a | 6 |
1 | b | 7 |
2 | c | 8 |
df1.merge(df2, left_on="key1", right_on="key2")
key1 | value_x | key2 | value_y | |
---|---|---|---|---|
0 | a | 1 | a | 6 |
1 | a | 2 | a | 6 |
2 | a | 5 | a | 6 |
3 | b | 3 | b | 7 |
4 | b | 4 | b | 7 |
4.6.3. Specify Suffixes When Using df.merge()#
If you are merging 2 dataframes that have the same features using df.merge()
, it might be confusing to know which dataframe a_x
or a_y
belongs to.
import pandas as pd
df1 = pd.DataFrame({"left_key": [1, 2, 3], "a": [4, 5, 6]})
df2 = pd.DataFrame({"right_key": [1, 2, 3], "a": [5, 6, 7]})
df1.merge(df2, left_on="left_key", right_on="right_key")
left_key | a_x | right_key | a_y | |
---|---|---|---|---|
0 | 1 | 4 | 1 | 5 |
1 | 2 | 5 | 2 | 6 |
2 | 3 | 6 | 3 | 7 |
A better way is to specify suffixes of the features in each Dataframe like below. Now a_x
becomes a_left
and a_y
becomes a_right
.
df1.merge(df2, left_on="left_key", right_on="right_key", suffixes=("_left", "_right"))
left_key | a_left | right_key | a_right | |
---|---|---|---|---|
0 | 1 | 4 | 1 | 5 |
1 | 2 | 5 | 2 | 6 |
2 | 3 | 6 | 3 | 7 |
Try it if you want the names of your columns to be less confusing.
4.6.4. Include All Rows When Merging Two DataFrames#
df.merge
only includes rows with matching values in both DataFrames. If you want to include all rows from both DataFrames, use how='outer'
.
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B'], 'v1': [1, 2]})
df2 = pd.DataFrame({'key': ['B', 'C'], 'v2': [3, 4]})
# Keep only rows with matching keys
pd.merge(df1, df2, on='key')
key | v1 | v2 | |
---|---|---|---|
0 | B | 2 | 3 |
# Keep all rows
pd.merge(df1, df2, on='key', how='outer')
key | v1 | v2 | |
---|---|---|---|
0 | A | 1.0 | NaN |
1 | B | 2.0 | 3.0 |
2 | C | NaN | 4.0 |
4.6.5. DataFrame.insert: Insert a Column Into a DataFrame at a Specified Location#
If you want to insert a column into a DataFrame at a specified location, use df.insert
. In the code below, I insert column c at location 0.
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.insert(loc=0, column='c', value=[5, 6])
df
c | a | b | |
---|---|---|---|
0 | 5 | 1 | 3 |
1 | 6 | 2 | 4 |