4.6. Combine Multiple DataFrames#
4.6.1. pandas.DataFrame.combine_first: Update Null Elements Using Another DataFrame#
If you want to fill null values in one DataFrame with non-null values at the same locations from another DataFrame, use pandas.DataFrame.combine_first
.
import pandas as pd
store1 = pd.DataFrame({"orange": [None, 5], "apple": [None, 1]})
store2 = pd.DataFrame({"orange": [1, 3], "apple": [4, 2]})
print(store1)
orange apple
0 NaN NaN
1 5.0 1.0
print(store2)
orange apple
0 1 4
1 3 2
In the code below, the values at the first row of store1
are updated with the values at the first row of store2
.
print(store1.combine_first(store2))
orange apple
0 1.0 4.0
1 5.0 1.0
4.6.2. df.merge: Merge DataFrame Based on Columns#
If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge
.
If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge
.
If you want to merge on DataFrame with another DataFrame based on the similarity between 2 columns, use df.merge
.
import pandas as pd
df1 = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"], "value": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"key2": ["a", "b", "c"], "value": [6, 7, 8]})
df1
key1 | value | |
---|---|---|
0 | a | 1 |
1 | a | 2 |
2 | b | 3 |
3 | b | 4 |
4 | a | 5 |
df2
key2 | value | |
---|---|---|
0 | a | 6 |
1 | b | 7 |
2 | c | 8 |
df1.merge(df2, left_on="key1", right_on="key2")
key1 | value_x | key2 | value_y | |
---|---|---|---|---|
0 | a | 1 | a | 6 |
1 | a | 2 | a | 6 |
2 | a | 5 | a | 6 |
3 | b | 3 | b | 7 |
4 | b | 4 | b | 7 |
4.6.3. Enhancing Readability in DataFrame Merging With Custom Suffixes#
When merging two DataFrames with overlapping columns, the default behavior is to add suffixes “_x” and “_y” to the column names.
import pandas as pd
df1 = pd.DataFrame({"id": [1, 2, 3], "val": ["A", "B", "C"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "val": ["X", "Y", "Z"]})
merged_df = pd.merge(df1, df2, on="id")
print(merged_df)
id val_x val_y
0 2 B X
1 3 C Y
To improve readability, you can specify custom suffixes.
merged_df = pd.merge(df1, df2, on="id", suffixes=("_df1", "_df2"))
print(merged_df)
id val_df1 val_df2
0 2 B X
1 3 C Y
4.6.4. Include All Rows When Merging Two DataFrames#
df.merge
only includes rows with matching values in both DataFrames. If you want to include all rows from both DataFrames, use how='outer'
.
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B'], 'v1': [1, 2]})
df2 = pd.DataFrame({'key': ['B', 'C'], 'v2': [3, 4]})
# Keep only rows with matching keys
pd.merge(df1, df2, on='key')
key | v1 | v2 | |
---|---|---|---|
0 | B | 2 | 3 |
# Keep all rows
pd.merge(df1, df2, on='key', how='outer')
key | v1 | v2 | |
---|---|---|---|
0 | A | 1.0 | NaN |
1 | B | 2.0 | 3.0 |
2 | C | NaN | 4.0 |
4.6.5. DataFrame.insert: Insert a Column Into a DataFrame at a Specified Location#
If you want to insert a column into a DataFrame at a specified location, use df.insert
. In the code below, I insert column c at location 0.
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.insert(loc=0, column='c', value=[5, 6])
df
c | a | b | |
---|---|---|---|
0 | 5 | 1 | 3 |
1 | 6 | 2 | 4 |