I want to group by column A, and sum over column C and return the results immediately into the dataframe. I know that I need to use groupby, and I know that I need to use sum, but I cannot figure out how to get these functions to interact seamlessly and in one line of code.
Have
A B C
0 x text 3
1 x text 7
2 y text 5
Want
A B C D
0 x text 3 10
1 x text 7 10
2 y text 5 5
Call transform on the groupby to add the aggregated column back to the original df:
In [28]:
df['D'] = df.groupby('A')['C'].transform('sum')
df
Out[28]:
A B C D
0 x text 3 10
1 x text 7 10
2 y text 5 5
transform returns a Series whose index is aligned with the original df, so you can assign it directly as a new column.
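To make the difference from a plain aggregation concrete, here is a small self-contained sketch built on the question's data. The variable names per_group and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": ["text"] * 3, "C": [3, 7, 5]})

# Plain aggregation: one value per group, indexed by the group key.
per_group = df.groupby("A")["C"].sum()

# transform: one value per original row, indexed like df itself.
per_row = df.groupby("A")["C"].transform("sum")

# Because the index matches df, the result can be assigned as a column.
df["D"] = per_row
print(per_group)
print(df)
```

The aggregated result has two entries (one per group), while the transformed result has three (one per row), which is why only the latter fits back into df without a merge.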
Related
I have a dataset of 100 rows. I want to split it into groups of 4 rows and then perform an operation on each group, i.e., first on the first four rows, then on the next four rows, and so on.
Note: Rows are independent of each other.
I don't know how to do it. Can somebody please help? I would be extremely thankful.
I will split df every 2 rows (as a simple example) and collect the pieces in a list, dfs.
Example
df = pd.DataFrame(list('ABCDE'), columns=['value'])
df
value
0 A
1 B
2 C
3 D
4 E
Code
Build a grouper that gives the same label to every 2 consecutive rows:
grouper = pd.Series(range(0, len(df))) // 2
grouper
0 0
1 0
2 1
3 1
4 2
dtype: int64
Split into a list:
g = df.groupby(grouper)
dfs = [g.get_group(x) for x in g.groups]
Result (dfs):
[ value
0 A
1 B,
value
2 C
3 D,
value
4 E]
Check
dfs[0]
output:
value
0 A
1 B
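The same idea applies to the 100-row case with chunks of 4. This is a minimal sketch, assuming placeholder data in a single column named value and a sum as the per-chunk operation. Neither comes from the question. It builds the labels with a NumPy array so the grouping is by row position and does not depend on df's index.

```python
import numpy as np
import pandas as pd

# Placeholder data standing in for the 100-row dataset.
df = pd.DataFrame({"value": range(100)})

chunk_size = 4
# Rows 0-3 get label 0, rows 4-7 get label 1, and so on.
labels = np.arange(len(df)) // chunk_size

# Either materialize the chunks as a list of DataFrames...
chunks = [chunk for _, chunk in df.groupby(labels)]

# ...or apply the operation per chunk directly (a sum, as an example).
chunk_sums = df.groupby(labels)["value"].sum()
print(len(chunks))
print(chunk_sums.head())
```

If the number of rows is not a multiple of chunk_size, the last chunk is simply shorter, as in the 5-row example above.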
I have a big DataFrame I need to split into two (A and B), with the same number of rows from a certain column value in A and in B. That column has over 700 unique values, all of them strings. I leave an example:
DataFrame
Price Type
1 X
2 Y
3 Y
4 X
5 X
6 X
7 Y
8 Y
When splitting it (randomly), I should get two values of X, and two values of Y in DataFrame A and DataFrame B, like:
A
Price Type
1 X
5 X
2 Y
3 Y
B
Price Type
4 X
6 X
7 Y
8 Y
Thanks in advance!
You can use groupby().cumcount() to enumerate the rows within Type, then %2 to divide rows into two groups:
df['groups'] = df.groupby('Type').cumcount()%2
A,B = df[df['groups']==0], df[df['groups']==1]
Output:
**A**
Price Type groups
0 1 X 0
1 2 Y 0
4 5 X 0
6 7 Y 0
**B**
Price Type groups
2 3 Y 1
3 4 X 1
5 6 X 1
7 8 Y 1
Could you group by the value of Type and assign A/B to half of the group as a new column, then copy only rows with the label A/B assigned? If you need an exact split you could base it off the size of the group
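Since the question asks for a random split, one possible variation is to shuffle the rows first and then alternate within each Type. This is only a sketch: the random_state value is arbitrary, and with an odd group size one half gets the extra row.

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [1, 2, 3, 4, 5, 6, 7, 8],
    "Type": ["X", "Y", "Y", "X", "X", "X", "Y", "Y"],
})

# Shuffle all rows; fix random_state only to make the run repeatable.
shuffled = df.sample(frac=1, random_state=0)

# Number the rows within each Type in shuffled order, then alternate 0/1.
half = shuffled.groupby("Type").cumcount() % 2

A = shuffled[half == 0].sort_index()
B = shuffled[half == 1].sort_index()
print(A)
print(B)
```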
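You can use the array_split function of the numpy library, as below. Note that it splits by row position, so the halves are balanced per Type only if the row order happens to allow it: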
import numpy as np
df_split = np.array_split(df, 2)
df1 = df_split[0]
df2 = df_split[1]
I have a table similar to this, with the blank spaces being empty strings and the numbers being floats:
1 2 3 4 5 6
A
B 8 5
C 5 7
D 2 3 5
E 0
I want to replace the value of each cell with the output of a function which takes two arguments: the index of the row and the value of the cell.
For example, the values in the first column should be replaced with the output of func(D, 2) and func(E, 0) and the empty cells should stay empty. The function output is a string.
Expected output table:
if func(D, 2) returns X and func(E, 0) returns Y, then column 1 should look like:
1 2 3 4 5 6
A
B 8 5
C 5 7
D X 3 5
E Y
How do I do this?
First of all I would fill the dataframe:
df.fillna(0, inplace=True)
Then, assuming that your function is called func and its first argument is the row index, you can apply it using the map method while iterating over rows:
for idx in df.index:
    df.loc[idx] = df.loc[idx].map(lambda x: func(idx, x))
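Since the question says the blanks are empty strings and should stay empty, here is a sketch that skips them explicitly. The func below is a hypothetical stand-in for the asker's function, and the two-column frame is a cut-down version of the example table.

```python
import pandas as pd

def func(row_label, value):
    # Hypothetical stand-in: the real function is not shown in the question.
    return f"{row_label}:{value}"

# Cut-down version of the table: empty strings mark blank cells.
df = pd.DataFrame(
    {"1": ["", "", "", 2.0, 0.0], "2": ["", 8.0, "", 3.0, ""]},
    index=list("ABCDE"),
)

# For each row, row.name is the index label; blank cells pass through unchanged.
result = df.apply(
    lambda row: row.map(lambda v: v if v == "" else func(row.name, v)),
    axis=1,
)
print(result)
```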
I have a dataframe A with 80 columns. I grouped it by three columns and summed 20 of the others.
E.g.
New_df = A.groupby(['X','Y','Z'])[['a','b','c',......]].sum().reset_index()  --------(1)
Then I want to overwrite the columns of A that also appear in New_df with the values from New_df.
You can do:
cols1=set(A.columns.tolist())
cols2=set(New_df.columns.tolist())
common_cols = list(cols1.intersection(cols2))
A[common_cols]=New_df[common_cols]
This finds the columns that the two DataFrames have in common, then replaces those in the first with the columns from the second.
For example, given an initial A:
x y
0 1 a
1 2 b
2 3 c
and New_df:
z y
0 4 d
1 5 e
2 6 f
And we wind up with final 'A', with y column taken from New_df:
x y
0 1 d
1 2 e
2 3 f
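Note that the column assignment above lines rows up by index, and a grouped result usually has fewer rows than A. If the intent is for every row of A to carry its group's sum (an assumption, the question does not say so explicitly), transform may be closer to what is wanted. A sketch with made-up data and a single grouping key:

```python
import pandas as pd

# Made-up data: X is the grouping key, a and b are the columns to sum.
A = pd.DataFrame({
    "X": ["p", "p", "q"],
    "a": [1, 2, 3],
    "b": [10, 20, 30],
})

sum_cols = ["a", "b"]
# transform keeps A's shape and index, so the result can overwrite in place.
A[sum_cols] = A.groupby("X")[sum_cols].transform("sum")
print(A)
```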
I have a dataframe extracted from an Excel file which I have manipulated to be in the following form (there are multiple rows, but this is reduced to make my question as clear as possible):
|A|B|C|A|B|C|
index 0: 1 2 3 4 5 6
As you can see there are repetitions of the column names. I would like to merge this dataframe to look like the following:
|A|B|C|
index 0: 1 2 3
index 1: 4 5 6
I have tried to use the melt function but have not had any success thus far.
import pandas as pd
df = pd.DataFrame([[1,2,3,4,5,6]], columns = ['A', 'B','C','A', 'B','C'])
df
A B C A B C
0 1 2 3 4 5 6
pd.concat(x for _, x in df.groupby(df.columns.duplicated(), axis=1))
A B C
0 1 2 3
0 4 5 6
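Grouping with axis=1 is deprecated in recent pandas versions, so here is an alternative sketch that selects the columns with a boolean mask instead. It assumes each column name appears exactly twice, as in the example, and uses ignore_index=True to get the 0, 1 index shown in the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=["A", "B", "C", "A", "B", "C"])

# True for the second occurrence of each column name, False for the first.
dup = df.columns.duplicated()

# Stack the first-occurrence block on top of the repeated block.
out = pd.concat([df.loc[:, ~dup], df.loc[:, dup]], ignore_index=True)
print(out)
```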