How to find sum of these values in Pandas? [duplicate] - python

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 15 days ago.
I'm trying to create a script for analysing my data. Here is my problem. Let's say I have an Excel file like this:
Code  Value
A1    20
B1    30
A1    15
C1    20
B1    20
I need Pandas to sum the values per code and write the result back to an Excel file, so that I get a file like this:
A1 35
B1 50
C1 20
My code so far looks pretty much like this:
import pandas as pd
data = pd.read_excel("filename.xlsx")
# missing part here
x.to_excel(r'filename.xlsx', index = True, header=True)
I need the missing part. Thanks a lot in advance for your solutions. I'm trying to automate my data analysis and want to add this step to my script.

What you are looking for is groupby:
>>> df
Code Value
0 A1 20
1 B1 30
2 A1 15
3 C1 20
4 B1 20
>>> df.groupby('Code', as_index=False)['Value'].sum()
Code Value
0 A1 35
1 B1 50
2 C1 20
Please take the time to read 10 minutes to Pandas
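Putting the answer into the asker's script, here is a minimal runnable sketch. The inline DataFrame stands in for the read_excel call, and the output filename is a made-up name, since writing back to the input filename would overwrite the original data:

```python
import pandas as pd

# Stand-in for: data = pd.read_excel("filename.xlsx")
data = pd.DataFrame({"Code": ["A1", "B1", "A1", "C1", "B1"],
                     "Value": [20, 30, 15, 20, 20]})

# The missing part: sum Value per Code
x = data.groupby("Code", as_index=False)["Value"].sum()
print(x)

# Then write the result out. "filename_summed.xlsx" is a hypothetical
# name chosen to avoid overwriting the input file:
# x.to_excel("filename_summed.xlsx", index=False, header=True)
```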


How to transpose a column in a pandas dataframe with its values drawn from a different column? [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 6 months ago.
I have a pandas dataframe with a column "Code" (categorical) with more than 100 unique values. I have multiple rows for the same "Name" and would like to capture all of the information pertaining to a unique "Name" in one row. Therefore, I'd like to transpose the column "Code" with its values drawn from "Counter".
How do I transpose "Code" in such a way that the following table:
Name   Code  Counter
Alice  a1    4
Alice  a2    3
Bob    b1    9
Bob    c2    1
Bob    a2    4
becomes this:
Name   a1  a2  b1  c2
Alice  4   3   0   0
Bob    0   4   9   1
I can't comment yet, but the above answer (from Yuca) should work for you: assign the pivot table to a variable and it will be your dataframe. Note that pivot already returns a DataFrame, so wrapping the result in pd.DataFrame is unnecessary:
import pandas as pd
pivoted = df.pivot(index='Name', columns='Code', values='Counter').fillna(0)
Try:
df.pivot(index='Name', columns='Code', values='Counter').fillna(0)
Output:
Code a1 a2 b1 c2
Name
Alice 4.0 3.0 0.0 0.0
Bob 0.0 4.0 9.0 1.0
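As a self-contained sketch of the same pivot on the question's data. The astype(int) step is my addition: fillna leaves the columns as floats (hence the 4.0/0.0 in the output above), and casting restores integers:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Alice", "Bob", "Bob", "Bob"],
                   "Code": ["a1", "a2", "b1", "c2", "a2"],
                   "Counter": [4, 3, 9, 1, 4]})

# Pivot Code into columns, fill the gaps with 0, cast back to int
pivoted = (df.pivot(index="Name", columns="Code", values="Counter")
             .fillna(0)
             .astype(int))
print(pivoted)
```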

Pandas: Some MultiIndex values appearing as NaN when reading Excel sheets

When reading an Excel spreadsheet into a Pandas DataFrame, Pandas appears to be handling merged cells in an odd fashion. For the most part, it interprets the merged cells as desired, apart from the first merged cell for each column, which is producing NaN values where it shouldn't.
dataframes = pd.read_excel(
    "../data/data.xlsx",
    sheet_name=[0, 1, 2],  # read the first three sheets as separate DataFrames
    header=[0, 1],         # rows [1,2] in Excel
    index_col=[0, 1, 2],   # cols [A,B,C] in Excel
)
I load three sheets, but the behaviour is identical for each, so from now on I will only discuss one of them.
> dataframes[0]
Header 1  H2       H3   Value 1
Overall   Overall
A1        B1       0    10
NaN       NaN      1    11
NaN       B2       0    12
NaN       B2       1    13
A2        B1       0    11
A2        B1       1    12
A2        B2       0    13
A2        B2       1    14
As you can see, A1 loads with NaNs, yet A2 (and all beyond it, in the real data) loads fine. Both A1 and A2 are actually single merged cells, each spanning 4 rows in the Excel spreadsheet itself.
What could be causing this issue? It would normally be a simple fix via a fillna(method="ffill") but MultiIndex does not support that. I have so far not found another workaround.
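One workaround is to round-trip the index through ordinary columns, where forward-fill does work, and then rebuild the MultiIndex. A sketch on a hypothetical frame that reproduces the symptom (the level_0/level_1 names are what reset_index assigns to unnamed index levels):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the symptom: NaNs where a merged cell
# was only read once per block.
idx = pd.MultiIndex.from_arrays([
    ["A1", np.nan, np.nan, np.nan, "A2", "A2", "A2", "A2"],
    ["B1", np.nan, "B2", "B2", "B1", "B1", "B2", "B2"],
    [0, 1, 0, 1, 0, 1, 0, 1],
])
df = pd.DataFrame({"Value 1": [10, 11, 12, 13, 11, 12, 13, 14]}, index=idx)

# Move the index into columns, forward-fill, rebuild the MultiIndex
fixed = df.reset_index()
fixed[["level_0", "level_1"]] = fixed[["level_0", "level_1"]].ffill()
fixed = fixed.set_index(["level_0", "level_1", "level_2"])
```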

Merging two dataframes while considering overlaps and missing indexes [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have multiple dataframes that have an ID and a value, and I am trying to merge them such that each ID has all of its values in its row.
ID  Value
1   10
3   21
4   12
5   43
7   11
And then I have another dataframe:
ID  Value2
1   12
2   14
4   55
6   23
7   90
I want to merge these two in a way that keeps the IDs already in the first dataframe; if an ID from the second dataframe is not in the first one, it is added as a new row with Value2 filled in and Value left empty. This is what my result would look like:
ID  Value  Value2
1   10     12
3   21     -
4   12     55
5   43     -
7   11     90
2   -      14
6   -      23
Hope this makes sense. I don't really care about the order of the ID numbers; they can be sorted or not. My goal is to create a dictionary for each ID with "Value", "Value2", "Value3", ... as keys and the corresponding value numbers as the dictionary's values. Please let me know if any clarification is needed.
You can use pandas' merge method:
import pandas as pd
df1.merge(df2, how='outer', on='ID')
Specifying 'outer' uses the union of keys from both dataframes.

Merge two dataframes with different sizes [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two dataframes with different sizes and I want to merge them.
It's like an "update" to a dataframe column based on another dataframe with different size.
This is an example input:
dataframe 1
CODUSU Situação TIPO1
0 1AB P A0
1 2C3 C B1
2 3AB P C1
dataframe 2
CODUSU Situação ABC
0 1AB A 3
1 3AB A 4
My output should be like this:
dataframe 3
CODUSU Situação TIPO1
0 1AB A A0
1 2C3 C B1
2 3AB A C1
PS: I did it with a loop, but I think there should be a better and easier way to do it!
I read this content: pandas merging 101 and wrote this code:
import numpy as np

df3 = df1.merge(df2, on=['CODUSU'], how='left', indicator=False)
df3['Situação'] = np.where((df3['Situação_x'] == 'P') & (df3['Situação_y'] == 'A'), df3['Situação_y'], df3['Situação_x'])
df3 = df3.drop(columns=['Situação_x', 'Situação_y', 'ABC'])
df3 = df3[['CODUSU', 'Situação', 'TIPO1']]
And Voilà, df3 is exactly what I needed!
Thanks for everyone!
PS: I already found my answer, is there a better place to answer my own question?
df1.merge(df2,how='left', left_on='CODUSU', right_on='CODUSU')
This should do the trick.
Also, worth noting that if you want your resulting dataframe to not contain the column ABC, merge with df2.drop(columns="ABC") instead of just df2.
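An alternative to the merge-and-np.where route is DataFrame.update, which aligns both frames on the index and overwrites values in place. Note one difference from the asker's version: update overwrites Situação wherever df2 has a row, rather than only replacing 'P' with 'A'. A sketch with the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({"CODUSU": ["1AB", "2C3", "3AB"],
                    "Situação": ["P", "C", "P"],
                    "TIPO1": ["A0", "B1", "C1"]})
df2 = pd.DataFrame({"CODUSU": ["1AB", "3AB"],
                    "Situação": ["A", "A"],
                    "ABC": [3, 4]})

# Align both frames on CODUSU, then overwrite Situação where df2 has a row
df3 = df1.set_index("CODUSU")
df3.update(df2.set_index("CODUSU")[["Situação"]])
df3 = df3.reset_index()
```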

groupby with multiple columns with addition and frequency counts in pandas [duplicate]

This question already has answers here:
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 4 years ago.
I have a table that looks as follows:
name type val
A online 12
B online 24
A offline 45
B online 32
A offline 43
B offline 44
I want to group the dataframe by the multiple columns name and type, with an additional column that returns the count of records while val is summed within each group. It should look as follows:
name  type     count  val
A     online   1      12
      offline  2      88
B     online   2      56
      offline  1      44
I have tried df.groupby(['name', 'type'])['val'].sum(), which gives the sums, but I am unable to add the count of records.
Add the parameter sort=False to groupby to avoid the default sorting, aggregate with agg using tuples of new column names and aggregation functions, and finally use reset_index to convert the MultiIndex to columns:
df1 = (df.groupby(['name', 'type'], sort=False)['val']
         .agg([('count', 'count'), ('val', 'sum')])
         .reset_index())
print(df1)
name type count val
0 A online 1 12
1 B online 2 56
2 A offline 2 88
3 B offline 1 44
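On pandas 0.25 and later, the same result can also be written with keyword-style named aggregation, which avoids the tuple-list form; a self-contained sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "A", "B", "A", "B"],
    "type": ["online", "online", "offline", "online", "offline", "offline"],
    "val": [12, 24, 45, 32, 43, 44],
})

# Named aggregation: each keyword becomes an output column
out = (df.groupby(["name", "type"], sort=False)["val"]
         .agg(count="count", val="sum")
         .reset_index())
print(out)
```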
You can also try pivoting, i.e.:
df.pivot_table(index=['name','type'],aggfunc=['count','sum'],values='val')
count sum
val val
name type
A offline 2 88
online 1 12
B offline 1 44
online 2 56
