I have two pandas DataFrames, let's call them df1 and df2. df1 doesn't have df2's columns; I'm trying to set the values of df2 in df1 at a specific row (df2 is generated while I loop over df1, which is why I have the row index).
Basically I'm trying to do something like this:
df1 = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
df2 = pd.DataFrame({"col3": [42], "col4": [83]})
row_index = 1
df1[df2.columns][row_index] = df2
Expected result is:
col1 col2 col3 col4
0 1 3 NaN NaN
1 2 4 42 83
I tried all of the following and nothing is working:
df1 = pd.concat([df1, df2], axis=1)
df1 = pd.concat([df1.iloc[row_index], df2], axis=1)
df1[df2.columns] = df2
df1[df2.columns].iloc[row_index] = df2
Use loc to assign the new values:
row_index = [1]
df1.loc[row_index, df2.columns] = df2.values
print(df1)
col1 col2 col3 col4
0 1 3 NaN NaN
1 2 4 42.0 83.0
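If the float upcast of the new columns matters, one follow-up (just a sketch, reusing the names from the example above) is to cast them to pandas' nullable Int64 dtype after the assignment:
# row 0 holds NaN in the new columns, so they arrive as float64;
# casting to the nullable Int64 dtype restores the integer values
df1[df2.columns] = df1[df2.columns].astype('Int64')
This keeps 42 and 83 as integers and shows <NA> in row 0.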
I have two dataframes
df1 = pd.DataFrame({'col1': [1,2,3], 'col2': [4,5,6]})
df2 = pd.DataFrame({'col3': [1,5,3]})
and would like to left-merge df1 with df2. I don't have a fixed merge column in df1, though. I would like to merge on col1 if the cell value of col1 exists in df2.col3, and on col2 if the cell value of col2 exists in df2.col3. So in the example above, merge on col1 for row 0, col2 for row 1, and col1 for row 2. (This is just an example; I actually have more than two columns.)
I could do this but I'm not sure if it's ok.
df1 = df1.assign(merge_col = np.where(df1.col1.isin(df2.col3), df1.col1, df1.col2))
df1.merge(df2, left_on='merge_col', right_on='col3', how='left')
Are there any better ways to solve it?
Perform the merges in the preferred order, and use combine_first to combine the merges:
(df1.merge(df2, left_on='col1', right_on='col3', how='left')
    .combine_first(df1.merge(df2, left_on='col2', right_on='col3', how='left'))
)
For a generic method with many columns:
from functools import reduce

cols = ['col1', 'col2']

out = reduce(
    lambda a, b: a.combine_first(b),
    [df1.merge(df2, left_on=col, right_on='col3', how='left')
     for col in cols]
)
Output:
col1 col2 col3
0 1 4 1.0
1 2 5 5.0
2 3 6 3.0
Better example:
Adding another column to df2 to illustrate the merge:
df2 = pd.DataFrame({'col3': [1,5,3], 'new': ['A', 'B', 'C']})
Output:
col1 col2 col3 new
0 1 4 1.0 A
1 2 5 5.0 B
2 3 6 3.0 C
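Putting it together as a self-contained sketch (same data as above, using the generic reduce version):
import pandas as pd
from functools import reduce

df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col3': [1, 5, 3], 'new': ['A', 'B', 'C']})

cols = ['col1', 'col2']

# merge on each candidate column, then keep the first non-null match per cell
out = reduce(
    lambda a, b: a.combine_first(b),
    [df1.merge(df2, left_on=col, right_on='col3', how='left') for col in cols]
)
print(out)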
I think your solution can be modified: build a merged Series by comparing all the columns from the list, then merge on that Series.
cols = ['col1', 'col2']
s = df1[cols].where(df1[cols].isin(df2.col3)).bfill(axis=1).iloc[:, 0]
print (s)
0 1.0
1 5.0
2 3.0
Name: col1, dtype: float64
df = df1.merge(df2, left_on=s, right_on='col3', how='left')
print (df)
col1 col2 col3
0 1 4 1
1 2 5 5
2 3 6 3
Your solution with a helper column:
cols = ['col1', 'col2']
df1 = df1.assign(merge_col = df1[cols].where(df1[cols].isin(df2.col3))
                                       .bfill(axis=1).iloc[:, 0])
df = df1.merge(df2, left_on='merge_col', right_on='col3', how='left')
print (df)
col1 col2 merge_col col3
0 1 4 1.0 1
1 2 5 5.0 5
2 3 6 3.0 3
Explanation of s: compare all the columns with DataFrame.isin, create missing values where there is no match with DataFrame.where, and for merge priority back-fill the missing values along the rows and select the first column by position:
print (df1[cols].isin(df2.col3))
col1 col2
0 True False
1 False True
2 True False
print (df1[cols].where(df1[cols].isin(df2.col3)))
col1 col2
0 1.0 NaN
1 NaN 5.0
2 3.0 NaN
print (df1[cols].where(df1[cols].isin(df2.col3)).bfill(axis=1))
col1 col2
0 1.0 NaN
1 5.0 5.0
2 3.0 NaN
print (df1[cols].where(df1[cols].isin(df2.col3)).bfill(axis=1).iloc[:, 0])
0 1.0
1 5.0
2 3.0
Name: col1, dtype: float64
How to concat without column names?
>>> df = pd.DataFrame({'col1': [1], 'col2': [4]})
>>> df1 = pd.DataFrame([[5,5]])
>>> pd.concat([df, df1])
col1 col2 0 1
0 1.0 4.0 NaN NaN
0 NaN NaN 5.0 5.0
Also, if you look closely, the dtypes changed from int64 to float64.
Expected:
col1 col2
0 1 4
0 5 5
Create the same column names in both DataFrames with DataFrame.set_axis:
df1 = df1.set_axis(df.columns, axis=1)
Or assign the column names directly:
df1.columns = df.columns
# alternative - this would give 0, 1 as the column names
# df.columns = df1.columns
Finally, use concat:
out = pd.concat([df, df1], ignore_index=True)
Temporarily set_axis on df1 with the columns of df:
pd.concat([df, df1.set_axis(df.columns, axis=1)], ignore_index=True)
NB. append is deprecated (and removed in pandas 2.0), don't use it.
output (since the columns now line up, no NaN is introduced and the original int64 dtypes are preserved):
col1 col2
0 1 4
1 5 5
I have 2 DataFrames (df1 and df2) and I want to append them as follows:
df1 and df2 have some columns in common, but I want to append the columns that exist in df2 and not in df1, while keeping the columns of df1 as they are
df2 is empty (all rows are NaN)
I could just add the columns to df1, but in the future df2 could get new columns, which is why I don't want to hardcode the column names and would rather have this done automatically. I used to use append, but I get the following message:
df_new = df1.append(df2)
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead
I tried the following
df_new = pd.concat([df1, df2], axis=1)
but it concatenates all the columns of both dataframes
According to https://pandas.pydata.org/docs/reference/api/pandas.concat.html
join{‘inner’, ‘outer’}, default ‘outer’
How to handle indexes on other axis (or axes).
INNER
df = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], columns=['letter', 'number', 'animal'])
df2 = pd.DataFrame([[None,None,None,None],[None,None,None,None]], columns=['letter', 'number', 'animal', 'newcol'])
print(pd.concat([df,df2], join='inner').dropna(how='all'))
output:
letter number animal
0 c 3 cat
1 d 4 dog
OUTER
df = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], columns=['letter', 'number', 'animal'])
df2 = pd.DataFrame([[None,None,None,None],[None,None,None,None]], columns=['letter', 'number', 'animal', 'newcol'])
print(pd.concat([df,df2], join='outer').dropna(how='all'))
output:
letter number animal newcol
0 c 3 cat NaN
1 d 4 dog NaN
You could use pd.concat() with axis=0 (default) and join='outer' (default). I'm illustrating with some examples
df1 = pd.DataFrame({'col1': [3, 3, 3],
                    'col2': [4, 4, 4]})
df2 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [1, 2, 3],
                    'col3': [1, 2, 3],
                    'col4': [1, 2, 3]})
print(df1)
col1 col2
0 3 4
1 3 4
2 3 4
print(df2)
col1 col2 col3 col4
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
df3 = pd.concat([df1, df2], axis=0, join='outer')
print(df3)
col1 col2 col3 col4
0 3 4 NaN NaN
1 3 4 NaN NaN
2 3 4 NaN NaN
0 1 1 1.0 1.0
1 2 2 2.0 2.0
2 3 3 3.0 3.0
To concatenate just the columns from df2 that are not present in df1:
pd.concat([df1, df2.loc[:, [c for c in df2.columns if c not in df1.columns]]], axis=1)
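An equivalent spelling (a sketch; note that Index.difference sorts the resulting column names, unlike the list comprehension above):
extra_cols = df2.columns.difference(df1.columns)  # columns present only in df2
pd.concat([df1, df2[extra_cols]], axis=1)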
I have a pandas dataframe.
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
col1 col2
0 1 3
1 2 4
I want to add the list lst=[10, 20] element-wise to 'col1' to have the following dataframe.
col1 col2
0 11 3
1 22 4
How to do that?
If you want to edit the column in-place you could do,
df['col1'] += lst
after which df will be,
col1 col2
0 11 3
1 22 4
Similarly, other types of mathematical operations are possible, such as,
df['col1'] *= lst
df['col1'] /= lst
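The method-based equivalents do the same thing (a sketch; lst is the list from the question):
df['col1'] = df['col1'].mul(lst)  # same as df['col1'] *= lst
df['col1'] = df['col1'].div(lst)  # same as df['col1'] /= lst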
If you want to create a new dataframe after addition,
df1 = df.copy()
df1['col1'] = df['col1'].add(lst, axis=0) # df['col1'].add(lst) outputs a series, df['col1']+lst also works
Now df1 is;
col1 col2
0 11 3
1 22 4
Given the following dataframe:
import pandas as pd
import numpy as np

df = pd.DataFrame({'COL1': ['A', np.nan, 'A'],
                   'COL2': [np.nan, 'A', 'A']})
df
COL1 COL2
0 A NaN
1 NaN A
2 A A
I would like to create a column ('COL3') that uses the value from COL1 per row unless that value is null (or NaN). If the value is null (or NaN), I'd like for it to use the value from COL2.
The desired result is:
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A
Thanks in advance!
In [8]: df
Out[8]:
COL1 COL2
0 A NaN
1 NaN B
2 A B
In [9]: df["COL3"] = df["COL1"].fillna(df["COL2"])
In [10]: df
Out[10]:
COL1 COL2 COL3
0 A NaN A
1 NaN B B
2 A B A
You can use np.where to conditionally set column values.
df = df.assign(COL3=np.where(df.COL1.isnull(), df.COL2, df.COL1))
>>> df
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A
If you don't mind mutating the values in COL2, you can update them directly to get your desired result.
df = pd.DataFrame({'COL1': ['A', np.nan,'A'],
'COL2' : [np.nan,'B','B']})
>>> df
COL1 COL2
0 A NaN
1 NaN B
2 A B
df.COL2.update(df.COL1)
>>> df
COL1 COL2
0 A A
1 NaN B
2 A A
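If you want COL3 instead of overwriting COL2, the same update idea works on a copy (a sketch, not from the original answer):
col3 = df['COL2'].copy()   # start from COL2
col3.update(df['COL1'])    # overwrite with COL1 wherever COL1 is not NaN
df['COL3'] = col3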
Using .combine_first, which gives precedence to non-null values in the Series or DataFrame calling it:
import pandas as pd
import numpy as np
df = pd.DataFrame({'COL1': ['A', np.nan,'A'],
'COL2' : [np.nan,'B','B']})
df['COL3'] = df.COL1.combine_first(df.COL2)
Output:
COL1 COL2 COL3
0 A NaN A
1 NaN B B
2 A B A
If we modify your df slightly, you will see that this works, and in fact it will work for any number of columns so long as each row has at least one valid value:
In [5]:
df = pd.DataFrame({'COL1': ['B', np.nan,'B'],
'COL2' : [np.nan,'A','A']})
df
Out[5]:
COL1 COL2
0 B NaN
1 NaN A
2 B A
In [6]:
df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[6]:
0 B
1 A
2 B
dtype: object
first_valid_index will return the index value (in this case the column label) that contains the first non-NaN value:
In [7]:
df.apply(lambda x: x.first_valid_index(), axis=1)
Out[7]:
0 COL1
1 COL2
2 COL1
dtype: object
So we can use this to index into each row's Series.
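Putting that together to build COL3 (a sketch; it assumes every row has at least one non-NaN value, otherwise first_valid_index returns None):
df['COL3'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)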
You can also use mask, which replaces the values where COL1 is NaN with the corresponding values from COL2:
In [8]: df.assign(COL3=df['COL1'].mask(df['COL1'].isna(), df['COL2']))
Out[8]:
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A