I have a dataframe that looks something like:
Item Type Location Count
1 Dog USA 10
2 Dog UK 20
3 Cat JAPAN 30
4 Cat UK 40
5 Bird CHINA 50
6 Bird SPAIN 60
7 Bird UAE 70
I would like to add a "Total" row with the sum of the "Count" column at the end of each unique "Type" group. Moreover, I would like to fill down the "Type" column only, as below:
Item Type Location Count
1 Dog USA 10
2 Dog UK 20
Total Dog 30
3 Cat JAPAN 30
4 Cat UK 40
Total Cat 70
5 Bird CHINA 50
6 Bird SPAIN 60
7 Bird UAE 70
Total Bird 180
What I have tried, which sums all of the "Count" values:
df.loc["Count"] = df.sum()
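For reference, the sample dataframe can be built like this (a minimal reconstruction; Item is assumed to be an ordinary column on a default integer index):

import pandas as pd

df = pd.DataFrame({
    'Item': [1, 2, 3, 4, 5, 6, 7],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Bird', 'Bird', 'Bird'],
    'Location': ['USA', 'UK', 'JAPAN', 'UK', 'CHINA', 'SPAIN', 'UAE'],
    'Count': [10, 20, 30, 40, 50, 60, 70],
})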
First reset the index of the dataframe, then group it on Type and aggregate the Count column with sum and the index column with max, then assign an Item column whose value is Total. Finally, concat the frame with the original dataframe df and sort the index to maintain the order.
frame = (
    df.reset_index()
      .groupby('Type', as_index=False)
      .agg({'Count': 'sum', 'index': 'max'})   # per-Type total and last positional index
      .assign(Item='Total')
      .set_index('index')
)
pd.concat([df, frame]).sort_index(ignore_index=True)
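The Total rows come out with Location as NaN; if you prefer blanks there, a fillna on the concatenated result should do it (a small follow-up sketch, not part of the original answer):

result = pd.concat([df, frame]).sort_index(ignore_index=True)
result['Location'] = result['Location'].fillna('')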
Another approach you might want to try (might be faster than the above one):
def summarize():
    for k, g in df.groupby('Type', sort=False):
        # DataFrame.append was removed in pandas 2.0, so build the total row and concat it
        total = pd.DataFrame([{'Item': 'Total', 'Type': k,
                               'Location': '', 'Count': g['Count'].sum()}])
        yield pd.concat([g, total], ignore_index=True)

pd.concat(summarize(), ignore_index=True)
which results in:
Item Type Location Count
0 1 Dog USA 10
1 2 Dog UK 20
2 Total Dog 30
3 3 Cat JAPAN 30
4 4 Cat UK 40
5 Total Cat 70
6 5 Bird CHINA 50
7 6 Bird SPAIN 60
8 7 Bird UAE 70
9 Total Bird 180
The per-Type totals alone can be computed with
df.groupby("Type").agg({"Count": "sum"})
but that still has to be joined back onto the original rows, as in the approaches above.
I've got a df with a MultiIndex like so
import numpy as np
import pandas as pd

nums = np.arange(5)
key = ['kfc'] * 5
mi = pd.MultiIndex.from_arrays([key, nums])
df = pd.DataFrame({'rent': np.arange(10, 60, 10)})
df = df.set_index(mi)   # assign so the MultiIndex sticks
rent
kfc 0 10
1 20
2 30
3 40
4 50
How can I write to the cell below kfc? I want to add meta info, e.g. the address or the monthly rent:
rent
kfc 0 10
NYC 1 20
2 30
3 40
4 50
According to your expected output, you would need to recreate the df's MultiIndex:
df.index = pd.MultiIndex.from_tuples(zip(['kfc'] + ['NYC'] * 4, df.index.levels[1]))
print(df)
rent
kfc 0 10
NYC 1 20
2 30
3 40
4 50
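An alternative, starting from the original all-kfc index, is to edit only the first level's values and codes instead of rebuilding the whole MultiIndex. This is a sketch, not from the original answer, and assumes pandas >= 0.24 where MultiIndex.set_codes is available:

# add 'NYC' to the first level's values, then point rows 1-4 at it
idx = df.index.set_levels(['kfc', 'NYC'], level=0)
df.index = idx.set_codes([0, 1, 1, 1, 1], level=0)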
Right now, I have 3 pandas dfs that I want to combine. Below is a short version of what I'm working with. Basically, dfs 2 and 3 contain the numbers that correspond to df1. I want to create another column on the first df with the labels I want, according to which of dfs 2 and 3 each number appears in (please see below for a reference of what my desired result is).
Any help is very appreciated! Thank you!
#df 1
Animal Number
2
4
6
9
11
#df 2
Lions
2
11
#df 3
Tigers
4
6
9
This is what I would want my result to look like:
Animal # Animal Type
0 2 Lion
1 4 Tiger
2 6 Tiger
3 9 Tiger
4 11 Lion
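For reference, a minimal construction of the three frames as I read the listings above (the column names are assumptions):

import pandas as pd

df1 = pd.DataFrame({'Animal Number': [2, 4, 6, 9, 11]})
df2 = pd.DataFrame({'Lions': [2, 11]})
df3 = pd.DataFrame({'Tigers': [4, 6, 9]})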
Try:
m = pd.concat([df2, df3], axis=1).stack().reset_index().set_index(0)['level_1']
df1['Animal Type'] = df1['Animal Number'].map(m)
print(df1)
Output:
Animal Number Animal Type
0 2 Lions
1 4 Tigers
2 6 Tigers
3 9 Tigers
4 11 Lions
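An equivalent, arguably more explicit, mapping can be built by melting the two label frames first (a sketch under the same assumed column names as above):

m2 = (pd.concat([df2, df3], axis=1)
        .melt(var_name='Animal Type', value_name='Animal Number')
        .dropna()
        .astype({'Animal Number': int})
        .set_index('Animal Number')['Animal Type'])
df1['Animal Type'] = df1['Animal Number'].map(m2)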
Suppose I have two dataframes:
df1:
Person Number Type
0 Kyle 12 Male
1 Jacob 15 Male
2 Jacob 15 Male
df2:
A much larger dataset with a similar format, except there is a Count column that needs to be incremented based on df1:
Person Number Type Count
0 Kyle 12 Male 0
1 Jacob 15 Male 0
3 Sally 43 Female 0
4 Mary 15 Female 5
What I am looking to do is increase the count column based on the number of occurrences of the same person in df1
Expected output for this example:
Person Number Type Count
0 Kyle 12 Male 1
1 Jacob 15 Male 2
3 Sally 43 Female 0
4 Mary 15 Female 5
Increase the count to 1 for Kyle because there is one instance, and to 2 for Jacob because there are two instances. Don't change the values for Sally and Mary; keep them the same.
How do I do this? I have tried using .loc, but I can't figure out how to account for two instances of the same row, meaning I can only get the count to increase by one for Jacob even though there are two Jacobs in df1.
I have tried
df2.loc[df2['Person'].values == df1['Person'].values, 'Count'] += 1
However, this does not account for duplicates.
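For reference, a minimal construction of the two frames as shown above (dtypes and the non-contiguous df2 index are assumptions from the listings):

import pandas as pd

df1 = pd.DataFrame({'Person': ['Kyle', 'Jacob', 'Jacob'],
                    'Number': [12, 15, 15],
                    'Type': ['Male', 'Male', 'Male']})
df2 = pd.DataFrame({'Person': ['Kyle', 'Jacob', 'Sally', 'Mary'],
                    'Number': [12, 15, 43, 15],
                    'Type': ['Male', 'Male', 'Female', 'Female'],
                    'Count': [0, 0, 0, 5]},
                   index=[0, 1, 3, 4])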
df1 = df1.groupby(df1.columns.tolist()).size().to_frame('Count').reset_index()  # count duplicate rows in df1
df1 = df1.set_index(['Person', 'Number', 'Type'])
df2 = df2.set_index(['Person', 'Number', 'Type'])
df1.add(df2, fill_value=0).reset_index()
Or
df1 = df1.groupby(df1.columns.tolist()).size().to_frame('Count').reset_index()
df2.merge(df1, on=['Person', 'Number', 'Type'], how='left').set_index(['Person', 'Number', 'Type']).sum(axis=1).to_frame('Count').reset_index()
value_counts + Index alignment.
u = df2.set_index("Person")
u.assign(Count=df1["Person"].value_counts().add(u["Count"], fill_value=0))
Number Type Count
Person
Kyle 12 Male 1.0
Jacob 15 Male 2.0
Sally 43 Female 0.0
Mary 15 Female 5.0
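The alignment gives a float Count because of the fill value; if you want integers and df2's original column layout back, something like this should work (a small follow-up, not from the original answer):

out = u.assign(Count=df1['Person'].value_counts()
                        .add(u['Count'], fill_value=0)
                        .astype(int)).reset_index()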
Hi, I am working with a pandas DataFrame like below:
Name Quality
Carrot 50
Potato 34
Raddish 43
Ginger 50
Tomato 43
Cabbage 12
I want to associate a rank with the dataframe. I have successfully sorted the dataframe on the Quality field, like below:
Name Quality
Carrot 50
Ginger 50
Raddish 43
Tomato 43
Potato 34
Cabbage 12
Now what I want to do is add a new column called Position that holds the rank at which each row sits.
The point is that the same rank can be given to two different elements if their Quality is the same.
Sample Output Dataframe:
Name Quality Position
Carrot 50 1
Ginger 50 1
Raddish 43 2
Tomato 43 2
Potato 34 3
Cabbage 12 4
Notice how two elements with the same Quality have the same Position, while the elements below them get the next Position. Also, the dataframe has around 10 million records on average.
How can I do this with a pandas DataFrame?
I sort my dataframe like below:
df_sort = dataframe.sort_values(by=attribute, ascending=order)
df_sort.reset_index(drop=True)
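For reference, a minimal construction of the sample dataframe, already sorted as shown above (plain string/int columns assumed):

import pandas as pd

df = pd.DataFrame({'Name': ['Carrot', 'Potato', 'Raddish', 'Ginger', 'Tomato', 'Cabbage'],
                   'Quality': [50, 34, 43, 50, 43, 12]})
df = df.sort_values(by='Quality', ascending=False, ignore_index=True)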
You're going to want to use rank.
There are a few variations of rank. The one you want here is dense: unlike the default average method, ties don't produce fractional ranks, and unlike min, tied groups don't leave gaps in the numbering.
df['Position'] = df.Quality.rank(method='dense', ascending = False).astype(int)
df
Name Quality Position
0 Carrot 50 1
1 Ginger 50 1
2 Raddish 43 2
3 Tomato 43 2
4 Potato 34 3
5 Cabbage 12 4
For demonstration purposes, if you didn't use dense but rather min, your dataframe would look like this:
Name Quality Position
0 Carrot 50 1
1 Ginger 50 1
2 Raddish 43 3
3 Tomato 43 3
4 Potato 34 5
5 Cabbage 12 6
The key here is to use ascending=False.
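For completeness, that min-rank table would come from the same call with a different method (my reconstruction, not shown in the original answer):

df['Position'] = df.Quality.rank(method='min', ascending=False).astype(int)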
For a pre-sorted dataframe, you can use pandas.factorize:
df['Rank'] = pd.factorize(df['Quality'])[0] + 1
print(df)
Name Quality Rank
0 Carrot 50 1
1 Ginger 50 1
2 Raddish 43 2
3 Tomato 43 2
4 Potato 34 3
5 Cabbage 12 4
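Putting the sort and factorize together for the unsorted original (a sketch; factorize only gives dense-style positions when the frame is already ordered by Quality):

df_sort = df.sort_values(by='Quality', ascending=False, ignore_index=True)
df_sort['Rank'] = pd.factorize(df_sort['Quality'])[0] + 1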
I have a data frame. I would like to create a pivot table from this dataframe with both the rows and the columns of the pivot table equal to df['event'].
In [7]:
df
Out[7]:
event event_time num session_id
0 dog 1 2 a
1 cat 2 3 a
2 bird 3 5 a
3 tree 4 7 a
4 cat 1 3 b
5 dog 2 2 b
6 tree 1 7 c
7 dog 2 2 c
8 cat 3 3 c
Using:
pv = pd.pivot_table(df, 'num', rows='event', cols='event', aggfunc=np.sum)
I get the following error:
ValueError: Grouper for 'event' not 1-dimensional
I would like to get something like this (the agg function is arbitrary; I am concerned with the grouping):
bird cat dog tree
dog 29 13 3 43
cat 31 17 5 47
bird 37 19 7 53
tree 41 23 11 59
(numbers are just arbitrary primes.)
Any thoughts?
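One common way to get an event-by-event matrix like this is to pair up events that share a session_id and then pivot the pairs. This is a sketch of that idea, not a confirmed answer; the pairs/pv names and the _row/_col suffixes are my own, and note that rows=/cols= are keyword names from very old pandas, while current versions use index=/columns=:

import pandas as pd

# pair every event with every other event in the same session
pairs = df.merge(df, on='session_id', suffixes=('_row', '_col'))

# pivot the pairs into an event x event grid, summing one of the num columns
pv = pd.pivot_table(pairs, values='num_row', index='event_row',
                    columns='event_col', aggfunc='sum', fill_value=0)
print(pv)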