I have a dataframe with two columns: a and b
df
a b
0 john 123
1 john
2 mark
3 mark 456
4 marcus 789
I want to update the values of column b based on column a.
a b
0 john 123
1 john 123
2 mark 456
3 mark 456
4 marcus 789
If john has the value 123 in b, the remaining john rows must get the same value.
Assuming your dataframe is:
df = pd.DataFrame({'a': ['john', 'john', 'mark', 'mark', 'marcus'], 'b': [123, '', '', 456, 789]})
You can group the dataframe on column a with df.groupby and then apply transform on column b, broadcasting the first non-empty value in each group.
Use:
df['b'] = (
    df.groupby('a')['b']
      .transform(lambda s: s[s.ne('')].iloc[0] if s.ne('').any() else s)
)
Result:
# print(df)
a b
0 john 123
1 john 123
2 mark 456
3 mark 456
4 marcus 789
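As an alternative sketch (assuming, as above, that the blanks are empty strings): convert them to NaN and let groupby broadcast the first non-missing value per group, since GroupBy.first skips NaN:
import numpy as np

# Empty strings become NaN, then 'first' picks the known value per group
df['b'] = df['b'].replace('', np.nan).groupby(df['a']).transform('first')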
Example:
df = pd.DataFrame({'A': [0, " ", 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})
df1 = df.replace({'A': " "}, 3)
Hope this helps. In your case it would be:
df1 = df.replace({'b': " "}, 123)
There are two DataFrames, df1 and df2, e.g.:
df1 = pd.DataFrame({'index': [1, 2, 3, 4],
                    'col1': ['12abc12', '12abcbla', 'abc', 'jh']})
df2 = pd.DataFrame({'col2': ['abc', 'efj']})
What I want looks like this (find all the rows of df1 that contain a substring from df2, and tag them):
index col1 col2
0 1 12abc12 abc
1 2 12abcbla abc
2 3 abc abc
3 4 jh
I've found a similar question, but it's not exactly what I want. Thanks for any ideas in advance.
Use Series.str.extract if need first matched value:
df1['new'] = df1['col1'].str.extract(f'({"|".join(df2["col2"])})', expand=False).fillna('')
print(df1)
index col1 new
0 1 12abc12 abc
1 2 12abcbla abc
2 3 abc abc
3 4 jh
If need all matched values use Series.str.findall and Series.str.join:
df1 = pd.DataFrame({'index': [1, 2, 3, 4],
                    'col1': ['12abc1defj2', '12abcbla', 'abc', 'jh']})
df2 = pd.DataFrame({'col2': ['abc', 'efj']})
df1['new'] = df1['col1'].str.findall("|".join(df2["col2"])).str.join(',')
print(df1)
index col1 new
0 1 12abc1defj2 abc,efj
1 2 12abcbla abc
2 3 abc abc
3 4 jh
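One caveat worth adding: both str.extract and str.findall treat the joined values as a regular expression. If df2['col2'] could contain regex metacharacters (a hypothetical case, not in the sample data), escape the terms first:
import re

pattern = "|".join(map(re.escape, df2["col2"]))  # match each term literally
df1['new'] = df1['col1'].str.findall(pattern).str.join(',')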
How to compare and merge two dataframes based on common columns that have different dictionaries?
I have the following two dataframes,
df1 = pd.DataFrame({'name': ['tom', 'keith', 'sam', 'joe'],
                    'assets': [{'laptop': 1, 'scanner': 2}, {'laptop': 1, 'printer': 3},
                               {'car': 12, 'keys': 34}, {'power-cables': 24}]})
df2 = pd.DataFrame({'place': ['ca', 'bal-vm'],
                    'default_assets': [{'laptop': 4, 'printer': 3, 'scanner': 2, 'bag': 8},
                                       {'car': 12, 'keys': 34, 'mat': 24, 'holder': 45}]})
df1:
name assets
0 tom {'laptop':1,'scanner':2}
1 keith {'laptop':1,'printer':3}
2 sam {'car':12,'keys':34}
3 joe {'power-cables':24}
df2:
place default_assets
0 ca {'laptop':4,'printer':3,'scanner':2,'bag':8}
1 bal-vm {'car':12,'keys':34,'mat':24,'holder':45}
df2 should be merged with df1 when all the keys of df1.assets are in df2.default_assets; otherwise None should be filled in.
So the resultant df should be,
df:
name place assets default_assets
0 tom ca {'laptop':1,'scanner':2} {'laptop':4,'printer':3,'scanner':2,'bag':8}
1 keith ca {'laptop':1,'printer':3} {'laptop':4,'printer':3,'scanner':2,'bag':8}
2 sam bal-vm {'car':12,'keys':34} {'car':12,'keys':34,'mat':24,'holder':45}
3 joe None {'power-cables':24} None
You could do the following:
A cross join (cross product) of every row of df1 with df2.
Then keep only the rows where all the keys of df1.assets are contained in df2.default_assets.
Add back the filtered-out rows from df1, with pandas.concat.
For example:
# cross join
merged = df1.assign(key=1).merge(df2.assign(key=1), on='key').drop('key', axis=1)
# mask: keep rows where the asset keys are a subset of the default keys
mask = [asset.keys() <= default.keys()
        for asset, default in zip(merged['assets'], merged['default_assets'])]
# add back the names that matched nothing
result = pd.concat([merged.loc[mask], df1], sort=True).drop_duplicates('name')
# print in full
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(result)
Output
assets \
0 {'laptop': 1, 'scanner': 2}
2 {'laptop': 1, 'printer': 3}
5 {'car': 12, 'keys': 34}
3 {'power-cables': 24}
default_assets name place
0 {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag... tom ca
2 {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag... keith ca
5 {'car': 12, 'keys': 34, 'mat': 24, 'holder': 45} sam bal-vm
3 NaN joe NaN
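As a side note, on pandas 1.2+ the assign(key=1) trick can be replaced by the built-in cross merge; a minimal equivalent sketch:
# same cross join, pandas >= 1.2
merged = df1.merge(df2, how='cross')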
I have the following dataframe df1:
import pandas as pd
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Lisa', 'Molly', 'Lisa', 'Molly', 'Fred'],
        'gender': ['m', 'f', 'f', 'm', 'f', 'f', 'f', 'f', 'f', 'm'],
        }
df1 = pd.DataFrame(data, index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
I want to create a table df2 with some standard and some custom summary statistics.
df2 = df1.describe()
df2.rename(index={'top':'mode'},inplace=True)
df2.rename(index={'freq':'mode freq'},inplace=True)
df2
df2:
gender name
count 10 10
unique 2 7
mode f Molly
mode freq 7 3
I want to append one row to df2 for the second mode and one for the frequency of the second mode:
Example:
gender name
count 10 10
unique 2 7
mode f Molly
mode freq 7 3
2nd mode m Lisa
2nd freq 3 2
I figured out that you can get the second mode & frequency by doing this:
for column in df1:
    my_series = df1[column].value_counts()[1:2]
    print(my_series)
But how do I append this to df2?
You can use apply with value_counts; then we need to reshape the result to match your dataframe.
df1.apply(lambda x : x.value_counts().iloc[[1]]).stack().reset_index(level=0).T
Out[172]:
name gender
level_0 Lisa m
0 2 3
The final output (change the index names with rename, as you showed above):
pd.concat([df1.describe(),df1.apply(lambda x : x.value_counts().iloc[[1]]).stack().reset_index(level=0).T])
Out[173]:
gender name
count 10 10
unique 2 7
top f Molly
freq 7 3
level_0 m Lisa
0 3 2
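To get the exact labels requested in the question, you could rename the two new index entries afterwards; a sketch reusing the expression above:
res = pd.concat([df1.describe(),
                 df1.apply(lambda x: x.value_counts().iloc[[1]]).stack().reset_index(level=0).T])
# map the default describe labels and the leftover stack labels
res = res.rename(index={'top': 'mode', 'freq': 'mode freq',
                        'level_0': '2nd mode', 0: '2nd freq'})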
With Counter
from collections import Counter
def f(s):
    return pd.Series(Counter(s).most_common(2)[1], ['mode2', 'mode2 freq'])
df1.describe().rename(dict(top='mode1', freq='mode1 freq')).append(df1.apply(f))
name gender
count 10 10
unique 7 2
mode1 Molly f
mode1 freq 3 7
mode2 Lisa m
mode2 freq 2 3
value_counts
Same thing without Counter
def f(s):
    c = s.value_counts()
    return pd.Series([c.index[1], c.iat[1]], ['mode2', 'mode2 freq'])
df1.describe().rename(dict(top='mode1', freq='mode1 freq')).append(df1.apply(f))
Numpy bits
import numpy as np

def f(s):
    codes, u = pd.factorize(s)
    c = np.bincount(codes)
    i = np.argpartition(c, -2)[-2]  # position of the second-largest count
    return pd.Series([u[i], c[i]], ['mode2', 'mode2 freq'])
df1.describe().rename(dict(top='mode1', freq='mode1 freq')).append(df1.apply(f))
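Note that DataFrame.append was removed in pandas 2.0; on newer versions the same pattern can be written with pd.concat. A sketch using the value_counts variant above:
stats = df1.describe().rename(dict(top='mode1', freq='mode1 freq'))
res = pd.concat([stats, df1.apply(f)])  # f as defined in the value_counts variant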
I would like to write a function that updates the values of df1 when the column names of df1 and df2 match each other.
For example:
df1:
Name | Graduated | Employed | Married
AAA 1 2 3
BBB 0 1 2
CCC 1 0 1
df2:
Answer_Code | Graduated | Employed | Married
0 No No No
1 Yes Intern Engaged
2 N/A PT Yes
3 N/A FT Divorced
Final Result:
df3:
Name | Graduated | Employed | Married
AAA Yes PT Divorced
BBB No Intern Yes
CCC Yes No Engaged
I would like to code something like this:
IF d1.columns = d2.columns THEN
df1.column.update(df1.column.map(df2.set_index('Answer_Code').column))
One method is to utilise pd.DataFrame.lookup:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Name': ['AAA', 'BBB', 'CCC'],
                    'Graduated': [1, 0, 1],
                    'Employed': [2, 1, 0],
                    'Married': [3, 2, 1]})
df2 = pd.DataFrame({'Answer_Code': [0, 1, 2, 3],
                    'Graduated': ['No', 'Yes', np.nan, np.nan],
                    'Employed': ['No', 'Intern', 'PT', 'FT'],
                    'Married': ['No', 'Engaged', 'Yes', 'Divorced']})
# perform lookup on df2 using row & column labels from df1
arr = df2.set_index('Answer_Code')\
         .lookup(df1.iloc[:, 1:].values.flatten(),
                 df1.columns[1:].tolist() * 3)\
         .reshape(3, -1)
# copy df1 and allocate values from arr
df3 = df1.copy()
df3.iloc[:, 1:] = arr
print(df3)
Name Graduated Employed Married
0 AAA Yes PT Divorced
1 BBB No Intern Yes
2 CCC Yes No Engaged
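Worth noting: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. A minimal sketch of an equivalent on newer versions, using NumPy fancy indexing on the same df1/df2 (an illustration, not the original answer's API):
import numpy as np

idx = df2.set_index('Answer_Code')
# row positions come from the answer codes, column positions from the labels
rows = idx.index.get_indexer(df1.iloc[:, 1:].to_numpy().ravel())
cols = np.tile(idx.columns.get_indexer(df1.columns[1:]), len(df1))
arr = idx.to_numpy()[rows, cols].reshape(len(df1), -1)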
You can use map.
Example:
df1.Graduated.map(df2.Graduated)
yields
0 Yes
1 No
2 Yes
Thus just do that for every column, as follows
for col in df1.columns:
    if col in df2.columns:
        df1[col] = df1[col].map(df2[col])
Remember to set the index to the answer code first, i.e. df2 = df2.set_index("Answer_Code"), if necessary.
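Putting it together, a minimal runnable sketch (assuming the same df1 and df2 as in the lookup answer above):
df2 = df2.set_index('Answer_Code')
for col in df1.columns:
    if col in df2.columns:
        # df2[col] is a Series indexed by answer code, so map
        # translates each numeric code into its label
        df1[col] = df1[col].map(df2[col])
print(df1)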
Hello, I have the following DataFrame:
df =
ID Value
a 45
b 3
c 10
And another dataframe with the numeric ID of each value
df1 =
ID ID_n
a 3
b 35
c 0
d 7
e 1
I would like to have a new column in df with the numeric ID, so:
df =
ID Value ID_n
a 45 3
b 3 35
c 10 0
Thanks
Use pandas merge:
import pandas as pd
df1 = pd.DataFrame({
    'ID': ['a', 'b', 'c'],
    'Value': [45, 3, 10]
})
df2 = pd.DataFrame({
    'ID': ['a', 'b', 'c', 'd', 'e'],
    'ID_n': [3, 35, 0, 7, 1],
})
df1.set_index(['ID'], drop=False, inplace=True)
df2.set_index(['ID'], drop=False, inplace=True)
print(pd.merge(df1, df2, on="ID", how='left'))
output:
ID Value ID_n
0 a 45 3
1 b 3 35
2 c 10 0
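Note that the set_index calls are not actually required here: merge aligns on the 'ID' columns directly. A minimal sketch giving the same output:
print(pd.merge(df1, df2, on='ID', how='left'))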
You could use join(),
In [14]: df1.join(df2)
Out[14]:
Value ID_n
ID
a 45 3
b 3 35
c 10 0
If you want the index to be numeric, you could reset_index():
In [17]: df1.join(df2).reset_index()
Out[17]:
ID Value ID_n
0 a 45 3
1 b 3 35
2 c 10 0
You can do this in a single operation. join works on the index, which you don't appear to have set. Just set the index to ID, join df after also setting its index to ID, and then reset your index to return your original dataframe with the new column added.
>>> df.set_index('ID').join(df1.set_index('ID')).reset_index()
ID Value ID_n
0 a 45 3
1 b 3 35
2 c 10 0
Also, because you don't do an inplace set_index on df1, its structure remains the same (i.e. you don't change its indexing).
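If you'd rather not touch the index at all, a map-based sketch (using the question's df/df1 names) adds the column directly:
# df1.set_index('ID')['ID_n'] is a Series keyed by ID, so map fills ID_n
df['ID_n'] = df['ID'].map(df1.set_index('ID')['ID_n'])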