df1:
col1 col2
0 a 5
1 b 2
2 c 1
df2:
col1
0 qa0
1 qa1
2 qa2
3 qa3
4 qa4
5 qa5
final output:
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
Basically , in df1, I have index stored for another df data. I have to fetch data from df2 and append it in df1.
I don't know how to fetch data via index number.
Use Series.map by another Series:
df1['col3'] = df1['col2'].map(df2['col1'])
Or use DataFrame.join with rename column:
df1 = df1.join(df2.rename(columns={'col1':'col3'})['col3'], on='col2')
print (df1)
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
You can use iloc to get data and then to_numpy for values
df1["col3"] = df2.iloc[df1.col2].to_numpy()
df1
col1 col2 col3
0 a 5 qa5
1 b 2 qa2
2 c 1 qa1
Related
I'm looping through list with multiple dicionaries and want them to be appended into single data frame.
#getting values of specific key from AWS' boto3 response
events_list = response_event.get('Events')
for e in events_list:
df = pd.DataFrame.from_dict(e)
print(df)
Current and expected result below:
col1 col2
0 1 3
col1 col2
0 2 4
col1 col2
0 3 5
col1 col2
0 1 3
1 2 4
2 3 5
Try with concat
out = pd.concat(pd.DataFrame.from_dict(e) for e in events_list)
I have two data frames df1 and df2 with a common column ID. Both the data frames have different number of rows and columns.
I want to compare these two dataframe ID’s. I want to create another column y in df1 and for all the common id’s present in df1 and df2 the value of y should be 0 else 1.
For example df1 is
Id col1 col2
1 Abc def
2 Geh ghk
3 Abd fg
1 Dfg abc
And df2 is
Id col3 col4
1 Dgh gjs
2 Gsj aie
The final dataframe should be
Id col1 col2 y
1 Abc def 0
2 Geh ghk 0
3 Abd fg 1
1 Dfg abc 0
Lets create df1 and df2 first:
df1=pd.DataFrame({'ID':[1,2,3,1], 'col1':['A','B','C', 'D'], 'col2':['C','D','E', 'F']})
df2=pd.DataFrame({'ID':[1,2], 'col3':['AA','BB'], 'col4':['CC','DD']})
Here, pandas lambda function comes handy:
df1['y'] = df1['ID'].map(lambda x:0 if x in df2['ID'].values else 1)
df1
ID col1 col2 y
0 1 A C 0
1 2 B D 0
2 3 C E 1
3 1 D F 0
I'm trying to find the mean of values in different rows, grouped by similarities in other columns. Example:
In [14]: pd.DataFrame({'col1':[1,2,1,2], 'col2':['A','C','A','B'], 'col3':[1, 5, 6, 9]})
Out[14]:
col1 col2 col3
0 1 A 1
1 2 C 5
2 1 A 6
3 2 B 9
What I would like is to add a column with the means of col3, for all rows where the combination of col1 and col2 match. Desired output:
Out[14]:
col1 col2 col3 mean
0 1 A 1 3.5
1 2 C 5 5
2 1 A 6 3.5
3 2 B 9 9
I have tried several things with groupby in combination with apply but couldn't get proper results.
its a transform my man
df['mean'] = df.groupby(['col1','col2']).col3.transform('mean')
My question is related to my previous Question but it's different. So I am asking the new question.
In above question see the answer of #jezrael.
df = pd.DataFrame({'col1':[1,1,1],
'col2':[4,4,6],
'col3':[7,7,9],
'col4':[3,3,5]})
print (df)
col1 col2 col3 col4
0 1 4 7 3
1 1 4 7 3
2 1 6 9 5
df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
col4 col3 result_col
col1 col2
1 4 1 2 2.0
6 1 1 1.0
Now here I want to take count for the specific value of col4 . Say I also want to take count of col4 == 3 in the same query.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'}) ... + count(col4=='3')
How to do this in same above query I have tried bellow but not getting solution.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique','col4':'x: lambda x[x == 7].count()'})
Do some preprocessing by including the col4==3 as a column ahead of time. Then use aggregate
df.assign(result_col=df.col4.eq(3).astype(int)).groupby(
['col1', 'col2']
).agg(dict(col3='size', col4='nunique', result_col='sum'))
col3 result_col col4
col1 col2
1 4 2 2 1
6 1 0 1
old answers
g = df.groupby(['col1', 'col2'])
g.agg({'col3':'size','col4': 'nunique'}).assign(
result_col=g.col4.apply(lambda x: x.eq(3).sum()))
col3 col4 result_col
col1 col2
1 4 2 1 2
6 1 1 0
slightly rearranged
g = df.groupby(['col1', 'col2'])
final_df = g.agg({'col3':'size','col4': 'nunique'})
final_df.insert(1, 'result_col', g.col4.apply(lambda x: x.eq(3).sum()))
final_df
col3 result_col col4
col1 col2
1 4 2 2 1
6 1 0 1
I think you need aggregate with list of function in dict for column col4.
If need count 3 values the simpliest is sum True values in x == 3:
df1 = df.groupby(['col1','col2'])
.agg({'col3':'size','col4': ['nunique', lambda x: (x == 3).sum()]})
df1 = df1.rename(columns={'<lambda>':'count_3'})
df1.columns = ['{}_{}'.format(x[0], x[1]) for x in df1.columns]
print (df1)
col4_nunique col4_count_3 col3_size
col1 col2
1 4 1 2 2
6 1 0 1
Say I have two matrices, an original and a reference:
import pandas as pa
print "Original Data Frame"
# Create a dataframe
oldcols = {'col1':['a','a','b','b'], 'col2':['c','d','c','d'], 'col3':[1,2,3,4]}
a = pa.DataFrame(oldcols)
print "Original Table:"
print a
print "Reference Table:"
b = pa.DataFrame({'col1':['x','x'], 'col2':['c','d'], 'col3':[10,20]})
print b
Where the tables look like this:
Original Data Frame
Original Table:
col1 col2 col3
0 a c 1
1 a d 2
2 b c 3
3 b d 4
Reference Table:
col1 col2 col3
0 x c 10
1 x d 20
Now I want to subtract from the third column (col3) of the original table (a), the value in the reference table (c) in the row where the second columns of the two tables match. So the first row of table two should have the value 10 added to the third column, because the row of table b where the column is col2 is 'c' has a value of 10 in col3. Make sense? Here's some code that does that:
col3 = []
for ix, row in a.iterrows():
col3 += [row[2] + b[b['col2'] == row[1]]['col3']]
a['col3'] = col3
print "Output Table:"
print a
Yielding the following output:
Output Table:
col1 col2 col3
0 a c [11]
1 a d [22]
2 b c [13]
3 b d [24]
My question is, is there a more elegant way to do this? Also, the results in 'col3' should not be lists. Solutions using numpy are also welcome.
I did not quite understand your description of what you are trying to do, but the output you have shown can be generated by first merging the two data frames and then some simple operations;
>>> df = a.merge(b.filter(['col2', 'col3']), how='left',
left_on='col2', right_on='col2', suffixes=('', '_'))
>>> df
col1 col2 col3 col3_
0 a c 1 10
1 b c 3 10
2 a d 2 20
3 b d 4 20
[4 rows x 4 columns]
>>> df.col3_.fillna(0, inplace=True) # in case there are no matches
>>> df.col3 += df.col3_
>>> df
col1 col2 col3 col3_
0 a c 11 10
1 b c 13 10
2 a d 22 20
3 b d 24 20
[4 rows x 4 columns]
>>> df.drop('col3_', axis=1, inplace=True)
>>> df
col1 col2 col3
0 a c 11
1 b c 13
2 a d 22
3 b d 24
[4 rows x 3 columns]
If values in col2 in b are not unique, then probably you also need something like:
>>> b.groupby('col2', as_index=False)['col3'].aggregate(sum)