Python pandas groupby get list in the cell (excel) - python

So I'm very new to pandas and Excel handling in Python.
Here's what I'm trying to achieve:
data before
what I'm trying to get
I haven't found a specific way to do this with groupby.
Please help.

You can try groupby then agg
df['c'] = df['c'].astype(str)
out = df.groupby(['a', 'b'])['c'].agg(','.join).reset_index()
print(out)
a b c
0 1 2 3,4
1 1 3 5
2 2 2 3,4
3 2 3 5
4 5 6 7
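For reference, a minimal reproducible sketch; the input frame below is an assumption reconstructed from the output above. If you want actual Python lists in the cells rather than a comma-joined string, agg(list) works the same way:
import pandas as pd

# assumed input, inferred from the grouped output shown above
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 5],
                   'b': [2, 2, 3, 2, 2, 3, 6],
                   'c': [3, 4, 5, 3, 4, 5, 7]})

# keep real lists in the cells instead of a joined string
out_lists = df.groupby(['a', 'b'])['c'].agg(list).reset_index()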

Related

Sum cells with duplicate column headers in pandas during import - python

I am trying to do some basic dimensional reduction. I have a CSV file that looks something like this:
A B C A B B A C
1 1 2 2 1 3 1 1
1 2 3 0 0 1 1 2
0 2 1 3 0 1 2 2
I want to import as a pandas DF but without renaming the headers to A.1 A.2 etc. Instead I want to sum the duplicates and keep the columns names. Ideally my new DF should look like this:
A B C
4 5 3
2 3 5
5 3 3
Is it possible to do this easily or would you recommend a different way? I can also use bash, R, or anything that can do the trick with a file that is 1 million lines and 1000 columns.
Thank you!
Try splitting the column names by . and grouping by the first part:
df.groupby(df.columns.str.split('.').str[0], axis=1).sum()
Output:
A B C
0 4 5 3
1 2 3 5
2 5 3 3
Just load the dataframe normally, group by the first letter of the column name, and sum the values:
df.groupby(lambda colname: colname[0], axis=1).sum()
which gives
A B C
0 4 5 3
1 2 3 5
2 5 3 3
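A side note (mine, not from the answers above): groupby(..., axis=1) is deprecated in recent pandas releases, so an equivalent that avoids it is to transpose, group on the index, and transpose back:
import pandas as pd
from io import StringIO

# assumed CSV matching the sample above; read_csv mangles the duplicate headers to A.1, B.1, ...
csv = StringIO("A,B,C,A,B,B,A,C\n1,1,2,2,1,3,1,1\n1,2,3,0,0,1,1,2\n0,2,1,3,0,1,2,2")
df = pd.read_csv(csv)

# group the transposed frame by the first part of each column name, then transpose back
out = df.T.groupby(df.columns.str.split('.').str[0]).sum().T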

Appending rows to existing pandas dataframe

I have a pandas dataframe df1
a b
0 1 2
1 3 4
I have another dataframe in the form of a dictionary
dictionary = {'2' : [5, 6], '3' : [7, 8]}
I want to append the dictionary values as rows in dataframe df1. I am using pandas.DataFrame.from_dict() to convert the dictionary into a dataframe. The constraint is that, when I do it, I cannot provide any value for the 'columns' argument of from_dict().
So, when I try to concatenate the two dataframes, pandas adds the contents of the new dataframe as new columns. I do not want that. The final output I want is in this format:
a b
0 1 2
1 3 4
2 5 6
3 7 8
Can someone tell me how to do this in the least painful way?
Use concat with the help of pd.DataFrame.from_dict, setting the columns from df1 during the conversion:
out = pd.concat([df1,
                 pd.DataFrame.from_dict(dictionary, orient='index',
                                        columns=df1.columns)])
Output:
a b
0 1 2
1 3 4
2 5 6
3 7 8
Another possible solution, which uses numpy.vstack:
import numpy as np

pd.DataFrame(np.vstack([df1.values, np.array(list(dictionary.values()))]),
             columns=df1.columns)
Output:
a b
0 1 2
1 3 4
2 5 6
3 7 8
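One note on the from_dict approach above (my observation, not part of the original answer): the dictionary keys are strings, so the appended rows get the labels '2' and '3' rather than integer index values; if a uniform integer index matters, convert it afterwards:
out = out.rename(index=int)  # maps '2' -> 2, '3' -> 3, leaves 0 and 1 alone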

Pivot table based on the first value of the group in Pandas

Have the following DataFrame:
I'm trying to pivot it in pandas and achieve the following format:
Actually I tried the classical approach with pd.pivot_table() but it does not work out:
pd.pivot_table(df,values='col2', index=[df.index], columns = 'col1')
Would appreciate some suggestions :) Thanks!
You can use pivot and then dropna for each column:
>>> df.pivot(columns='col1', values='col2').apply(lambda x: x.dropna().tolist()).astype(int)
col1 a b c
0 1 2 9
1 4 5 0
2 6 8 7
Another option is to create a Series of lists using groupby.agg; then construct a DataFrame:
out = df.groupby('col1')['col2'].agg(list).pipe(lambda x: pd.DataFrame(zip(*x), columns=x.index.tolist()))
Output:
a b c
0 1 2 9
1 4 5 0
2 6 8 7
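The screenshots of the input frame did not come through above; a minimal sketch of an input that reproduces both outputs (the exact row order is an assumption):
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'] * 3,
                   'col2': [1, 2, 9, 4, 5, 0, 6, 8, 7]})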

How do I transpose columns into rows of a Pandas DataFrame?

My current data frame is comprised of 10 rows and thousands of columns. The setup currently looks similar to this:
A B A B
1 2 3 4
5 6 7 8
But I desire something more like below, where essentially I would transpose the columns into rows once the headers start repeating themselves.
A B
1 2
5 6
3 4
7 8
I've been trying df.reshape but perhaps can't get the syntax right. Any suggestions on how best to transpose the data like this?
I'd probably go for stacking, grouping, and then building a new DataFrame from scratch, e.g.:
pd.DataFrame({col: vals for col, vals in df.stack().groupby(level=1).agg(list).items()})
That'll also give you:
A B
0 1 2
1 3 4
2 5 6
3 7 8
Try with stack, groupby and pivot:
stacked = (df.T.stack().to_frame()
           .assign(idx=df.T.stack().groupby(level=0).cumcount())
           .reset_index())
output = (stacked.pivot(index="idx", columns="level_0", values=0)
          .rename_axis(None, axis=1).rename_axis(None, axis=0))
>>> output
A B
0 1 2
1 5 6
2 3 4
3 7 8
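Since the question mentions reshape: if the headers really do repeat in fixed A, B blocks, a NumPy-based sketch (my assumption about the layout, not taken from the answers above) that produces the block-stacked order asked for is:
import numpy as np
import pandas as pd

# assumes the wide frame df has columns repeating as A, B, A, B, ...
# take each consecutive 2-column block and stack it below the previous one
blocks = [df.values[:, i:i + 2] for i in range(0, df.shape[1], 2)]
out = pd.DataFrame(np.vstack(blocks), columns=['A', 'B'])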

Using groupby on already grouped data in Pandas

I would like to achieve the result below in Python using Pandas.
I tried groupby and sum on the id and Group columns using the below:
df.groupby(['id','Group'])['Total'].sum()
I got the first two columns, but I'm not sure how to get the third column (Overall_Total).
How can I do it?
Initial data (before grouping):
id  Group  Time
1   a      2
1   a      2
1   a      1
1   b      1
1   b      1
1   c      1
2   e      2
2   a      4
2   e      1
2   a      5
3   c      1
3   e      4
3   a      3
3   e      4
3   a      2
3   h      4
Assuming df is your initial dataframe, please try this:
df_group = df.groupby(['id', 'Group'])[['Time']].sum().rename(columns={'Time': 'Total'})
df_group['Overall_Total'] = df_group.groupby('id')['Total'].transform('sum')
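For reference, a minimal sketch of the sample table above as a frame; running the two lines above on it gives Overall_Total = 8 for id 1, 12 for id 2 and 18 for id 3 (worked out by hand from the Time column):
import pandas as pd

df = pd.DataFrame({'id':    [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
                   'Group': ['a', 'a', 'a', 'b', 'b', 'c', 'e', 'a', 'e', 'a',
                             'c', 'e', 'a', 'e', 'a', 'h'],
                   'Time':  [2, 2, 1, 1, 1, 1, 2, 4, 1, 5, 1, 4, 3, 4, 2, 4]})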
