This question already has answers here:
Pandas grouby and transform('count') gives placement error - works fine on smaller dataset
(1 answer)
Merging a pandas groupby result back into DataFrame
(3 answers)
Closed 4 years ago.
I was wondering if anyone knew of a better method than what I am currently doing. Here is an example data set:
ID Number
a 1
a 2
a 3
b 4
c 5
c 6
c 7
c 8
For example, if I wanted to get a count of Number values per ID in the table above, I would first group by ID and count Number, then merge the results back to the original table like so:
df2 = df.groupby('ID').agg({'Number':'count'}).reset_index()
df2 = df2.rename(columns = {'Number':'Number_Count'})
df = pd.merge(df, df2, on = ['ID'])
This results in:
ID Number Number_Count
a 1 3
a 2 3
a 3 3
b 4 1
c 5 4
c 6 4
c 7 4
c 8 4
It feels like a roundabout way of doing this; does anyone know a better alternative? I ask because, when working with large data sets, this method can chew up a lot of memory (by creating another table and then merging it back).
You can do that quite simply with this:
import pandas as pd
df = pd.DataFrame({'ID': list('aaabcccc'),
                   'Number': range(1, 9)})
df['Number_Count'] = df.groupby('ID')['Number'].transform('count')
df
# ID Number Number_Count
#0 a 1 3
#1 a 2 3
#2 a 3 3
#3 b 4 1
#4 c 5 4
#5 c 6 4
#6 c 7 4
#7 c 8 4
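A closely related variant (my addition, not part of the original answer): transform('size') counts every row in each group, whereas transform('count') skips NaN values in Number, which can matter if the column has missing data.
# Counts all rows per ID, including rows where Number is NaN
df['Number_Count'] = df.groupby('ID')['Number'].transform('size')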
Related
I currently have this dataframe:
original dataframe
However, I would like to obtain a dataframe from this (without the 't') that looks like this (note the index):
The index we want for our original dataframe
This is of course easy to do with .groupby().agg(), but I don't have a simple aggregation function such as 'max' or 'mean' that I would like to use. Hence my question is: is it possible to group a dataframe with a customized aggregation function, without using SQL? If so, please let me know!
I would love to get some help!
Simplified code example explaining my question:
df_example =
       C  D  E
A B
1 2    5  8  9
  3    7  9  3
2 4    9  5  5
  6    1  4  5
We would like to obtain:
df_example_groupedby_A_only_aggregating_with_custom_function =
   Z_custom
A
1        33
2        34
The values in Z_custom are obtained by using the custom aggregation function which uses the values in columns [C,D,E] from df_example.
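There is no answer shown here, but as a minimal sketch: groupby(level='A') followed by apply lets you run an arbitrary Python function over each group's sub-frame. The custom_agg function below is a hypothetical placeholder; substitute whatever computation actually produces Z_custom from columns C, D and E.
import pandas as pd

# Rebuild the simplified example with a MultiIndex on (A, B)
df_example = pd.DataFrame(
    {'C': [5, 7, 9, 1], 'D': [8, 9, 5, 4], 'E': [9, 3, 5, 5]},
    index=pd.MultiIndex.from_tuples([(1, 2), (1, 3), (2, 4), (2, 6)],
                                    names=['A', 'B']))

def custom_agg(group):
    # 'group' is the sub-DataFrame for one value of A; replace this body
    # with the real aggregation over columns C, D and E
    return (group['C'] * group['D'] - group['E']).sum()

result = df_example.groupby(level='A').apply(custom_agg).to_frame('Z_custom')
print(result)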
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two data frames here:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'id':[1,2,3,2,5], 'grade':[3,5,3,2,1]})
df2 = pd.DataFrame({'id':[1,2,3], 'final':[6,4,2]})
Now I want to take the final column from df2 and add it to df1 based on the id column. Here is the desired output:
output = pd.DataFrame({'id':[1,2,3,2,5],'grade':[3,5,3,2,1], 'final':[6,4,2,4,np.nan]})
What approach can I try?
One way to do it is by using map:
df1['final'] = df1['id'].map(df2.set_index('id')['final'])
#result
id grade final
0 1 3 6.0
1 2 5 4.0
2 3 3 2.0
3 2 2 4.0
4 5 1 NaN
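An equivalent approach, in the spirit of the Pandas Merging 101 link above (my sketch, not part of the original answer), is a left merge:
# Left join keeps every row of df1 and pulls in 'final' where df2 has a
# matching id; ids absent from df2 (here id=5) end up as NaN
out = df1.merge(df2, on='id', how='left')
print(out)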
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two dataframes with different sizes and I want to merge them.
It's like an "update" to a dataframe column based on another dataframe of a different size.
This is an example input:
dataframe 1
CODUSU Situação TIPO1
0 1AB P A0
1 2C3 C B1
2 3AB P C1
dataframe 2
CODUSU Situação ABC
0 1AB A 3
1 3AB A 4
My output should be like this:
dataframe 3
CODUSU Situação TIPO1
0 1AB A A0
1 2C3 C B1
2 3AB A C1
PS: I did it with a loop, but I think there should be a better and easier way to do it!
I read this content: pandas merging 101 and wrote this code:
import numpy as np

df3 = df1.merge(df2, on=['CODUSU'], how='left', indicator=False)
# Update Situação only where df1 has 'P' and df2 has 'A' for that CODUSU
df3['Situação'] = np.where((df3['Situação_x'] == 'P') & (df3['Situação_y'] == 'A'), df3['Situação_y'], df3['Situação_x'])
df3 = df3.drop(columns=['Situação_x', 'Situação_y', 'ABC'])
df3 = df3[['CODUSU', 'Situação', 'TIPO1']]
And Voilà, df3 is exactly what I needed!
Thanks, everyone!
PS: I already found my answer; is there a better place to post an answer to my own question?
df1.merge(df2,how='left', left_on='CODUSU', right_on='CODUSU')
This should do the trick.
Also, worth noting that if you want the resulting data frame not to contain the column ABC, you'd pass df2.drop(columns='ABC') instead of just df2.
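Another sketch, not from either answer: since the goal is only to update Situação by key, you can map the new values from df2 onto df1 and keep the old ones where there is no match, avoiding the _x/_y suffix columns entirely.
# Look up the updated Situação from df2 by CODUSU; rows without a match
# come back as NaN and keep their original value via fillna
new_situacao = df1['CODUSU'].map(df2.set_index('CODUSU')['Situação'])
df1['Situação'] = new_situacao.fillna(df1['Situação'])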
This question already has answers here:
how to merge two dataframes and sum the values of columns
(2 answers)
Closed 4 years ago.
Suppose I have two dataframes with partly repeated entries:
import pandas
source1=pandas.DataFrame({'key':['a','b'],'value':[1,2]})
# key value
#0 a 1
#1 b 2
source2=pandas.DataFrame({'key':['b','c'],'value':[3,0]})
# key value
#0 b 3
#1 c 0
What do I need to do with source1 and source2 in order to get a resulting frame with the following entries:
# key value
#0 a 1
#1 b 5
#2 c 0
Just add
source1.set_index('key').add(source2.set_index('key'), fill_value=0)
If key is already the index, just use
source1.add(source2, fill_value=0)
You may want to call .reset_index() at the end if you don't want key as the index.
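Putting the pieces together, a minimal runnable version of this answer (my sketch) looks like:
import pandas as pd

source1 = pd.DataFrame({'key': ['a', 'b'], 'value': [1, 2]})
source2 = pd.DataFrame({'key': ['b', 'c'], 'value': [3, 0]})

# Align on 'key', add the values (missing keys count as 0), then restore 'key' as a column
result = (source1.set_index('key')
          .add(source2.set_index('key'), fill_value=0)
          .reset_index())
print(result)
#   key  value
# 0   a    1.0
# 1   b    5.0
# 2   c    0.0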
With grouping:
>>> pd.concat([source1, source2]).groupby('key', as_index=False).sum()
key value
0 a 1
1 b 5
2 c 0
This question already has answers here:
Concatenate rows of two dataframes in pandas
(3 answers)
Closed 5 years ago.
I have two Pandas DataFrames, each with different columns. I want to basically glue them together horizontally (they each have the same number of rows so this shouldn't be an issue).
There must be a simple way of doing this but I've gone through the docs and concat isn't what I'm looking for (I don't think).
Any ideas?
Thanks!
concat is indeed what you're looking for; you just have to pass it a different value for the "axis" argument than the default. Code sample below:
import pandas as pd
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [1, 2, 3, 4, 5]
})
df2 = pd.DataFrame({
    'C': [1, 2, 3, 4, 5],
    'D': [1, 2, 3, 4, 5]
})
df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)
With the result being:
A B C D
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 4 4 4 4
4 5 5 5 5
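One caveat worth adding (my note, not part of the original answer): pd.concat with axis=1 aligns on the row index, so if the two frames carry different indexes (for example after filtering) you can end up with NaNs or extra rows; resetting the indexes first makes the concatenation purely positional.
# Align the frames positionally before gluing them side by side
df_concat = pd.concat([df1.reset_index(drop=True),
                       df2.reset_index(drop=True)], axis=1)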