Combining pandas dataframes according to other dataframes - python

Right now, I have 3 pandas dfs that I want to combine. Below is a short version of what I'm working with. Basically, dfs 2 and 3 have the indices that correspond to df1. I want to create another column on the first df that has the labels I want according to the indices of dfs 2 and 3 (please see below for a reference of what my desired result is).
Any help is very appreciated! Thank you!
#df 1
Animal Number
2
4
6
9
11
#df 2
Lions
2
11
#df 3
Tigers
4
6
9
This is what I would want my result to look like:
Animal # Animal Type
0 2 Lion
1 4 Tiger
2 6 Tiger
3 9 Tiger
4 11 Lion

Try:
m = pd.concat([df2, df3], axis=1).stack().reset_index().set_index(0)['level_1']
df1['Animal Type'] = df1['Animal Number'].map(m)
print(df1)
Output:
Animal Number Animal Type
0 2 Lions
1 4 Tigers
2 6 Tigers
3 9 Tigers
4 11 Lions
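For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes each of df2 and df3 is a single-column frame whose column name is the label, as in the question. Instead of the concat/stack chain it builds a plain dictionary (the name lookup is made up for illustration), which gives the same mapping here:

```python
import pandas as pd

# Frames reconstructed from the question.
df1 = pd.DataFrame({"Animal Number": [2, 4, 6, 9, 11]})
df2 = pd.DataFrame({"Lions": [2, 11]})
df3 = pd.DataFrame({"Tigers": [4, 6, 9]})

# Build a number -> label lookup; the label is each frame's only column name.
lookup = {}
for frame in (df2, df3):
    label = frame.columns[0]
    for number in frame[label].tolist():
        lookup[number] = label

# Numbers missing from the lookup would become NaN.
df1["Animal Type"] = df1["Animal Number"].map(lookup)
print(df1)
```

If the singular labels from the question (Lion, Tiger) are needed, the columns of df2 and df3 could be renamed before building the lookup.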

Related

Pandas: Create column with rolling sum of previous n rows of another column for within the same id/group

Sample dataset:
id fruit
0 7 NaN
1 7 apple
2 7 NaN
3 7 mango
4 7 apple
5 7 potato
6 3 berry
7 3 olive
8 3 olive
9 3 grape
10 3 NaN
11 3 mango
12 3 potato
In fruit column value of NaN and potato is 0. All other strings value is 1. I want to generate a new column sum_last_3 where each row calculates the sum of previous 3 rows (inclusive) of fruit column. When a new id appears, it should calculate from the beginning.
Output I want:
id fruit sum_last3
0 7 NaN 0
1 7 apple 1
2 7 NaN 1
3 7 mango 2
4 7 apple 2
5 7 potato 2
6 3 berry 1
7 3 olive 2
8 3 olive 3
9 3 grape 3
10 3 NaN 2
11 3 mango 2
12 3 potato 1
My Code:
df['sum_last5'] = (df['fruit'].ne('potato') & df['fruit'].notna())
.groupby('id',sort=False, as_index=False)['fruit']
.rolling(min_periods=1, window=3).sum().astype(int).values
You can modify your code slightly, as follows:
df['sum_last3'] = ((df['fruit'].ne('potato') & df['fruit'].notna())
.groupby(df['id'],sort=False)
.rolling(min_periods=1, window=3).sum().astype(int)
.droplevel(0)
)
Or use .values, as in your code:
df['sum_last3'] = ((df['fruit'].ne('potato') & df['fruit'].notna())
.groupby(df['id'],sort=False)
.rolling(min_periods=1, window=3).sum().astype(int)
.values
)
Your code is close. You only need to pass df['id'] instead of 'id' to .groupby(). The object being grouped is now a boolean Series rather than df itself, so .groupby() cannot look up a column by the label 'id' alone and needs the Series df['id'] as the grouping key. For the same reason there is no 'fruit' column to select after .groupby().
Also remove as_index=False, since that option is only valid when grouping a DataFrame, not a (boolean) Series as here.
Result:
print(df)
id fruit sum_last3
0 7 NaN 0
1 7 apple 1
2 7 NaN 1
3 7 mango 2
4 7 apple 2
5 7 potato 2
6 3 berry 1
7 3 olive 2
8 3 olive 3
9 3 grape 3
10 3 NaN 2
11 3 mango 2
12 3 potato 1
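The approach above can be reproduced with a self-contained sketch. The data is rebuilt from the sample in the question; the intermediate name counts and the explicit cast to int before rolling are choices made for this example, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [7] * 6 + [3] * 7,
    "fruit": [np.nan, "apple", np.nan, "mango", "apple", "potato",
              "berry", "olive", "olive", "grape", np.nan, "mango", "potato"],
})

# 1 for every real fruit, 0 for NaN and 'potato'.
counts = (df["fruit"].ne("potato") & df["fruit"].notna()).astype(int)

# Rolling sum over the current row and the two before it, restarted per id.
# droplevel(0) removes the id level so the result aligns with df's index.
df["sum_last3"] = (
    counts.groupby(df["id"], sort=False)
    .rolling(window=3, min_periods=1)
    .sum()
    .astype(int)
    .droplevel(0)
)
print(df)
```

Using droplevel(0) rather than .values keeps the assignment index-aligned, which should stay correct even if the rows of one id are not contiguous.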

Redefining a pandas dataframe based on its group

I am using this DataFrame:
source fruit 2019 2020 2021
0 a apple 3 1 1
1 a banana 4 3 5
2 a orange 2 2 2
3 b apple 3 4 5
4 b banana 4 5 2
5 b orange 1 6 4
I want to refine it like this:
source fruit 2019 2020 2021
0 a total 9 6 8
1 a seeds 5 3 3
2 a banana 4 3 5
3 b total 8 15 11
4 b seeds 4 10 9
5 b banana 4 5 2
total is sum of all fruits in that year for each source.
seeds is the sum of fruits containing seeds for each year for each source.
I tried
Appending new empty rows : Insert a new row after every nth row & Insert row at any position
But wasn't getting the expected result.
What would be the best way to get the desired output?
TRY:
df1 = df.groupby('source', as_index=False).sum().assign(fruit = 'total')
seeds = ['orange','apple']
df2 = df.loc[df['fruit'].isin(seeds)].groupby('source', as_index=False).sum().assign(fruit = 'seeds')
final_df = pd.concat([df.loc[~df['fruit'].isin(seeds)], df1,df2])
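The concat above stacks the leftover fruits first, then all totals, then all seeds. A sketch of one way to get the row order shown in the question (total, seeds, then the remaining fruits within each source) follows. It assumes the year columns are strings; the names years, is_seed, total, seeded and rest are made up for this example:

```python
import pandas as pd

# Sample data from the question (year column labels assumed to be strings).
df = pd.DataFrame({
    "source": ["a", "a", "a", "b", "b", "b"],
    "fruit": ["apple", "banana", "orange", "apple", "banana", "orange"],
    "2019": [3, 4, 2, 3, 4, 1],
    "2020": [1, 3, 2, 4, 5, 6],
    "2021": [1, 5, 2, 5, 2, 4],
})
years = ["2019", "2020", "2021"]
seeds = ["orange", "apple"]  # fruits treated as having seeds, as in the answer
is_seed = df["fruit"].isin(seeds)

# Per-source sums over all fruits and over the seeded fruits only.
total = df.groupby("source", as_index=False)[years].sum().assign(fruit="total")
seeded = df[is_seed].groupby("source", as_index=False)[years].sum().assign(fruit="seeds")
rest = df[~is_seed]

# Concatenate in the wanted within-source order, then sort by source with a
# stable algorithm so that order is kept inside each source.
final_df = (
    pd.concat([total, seeded, rest])
    .sort_values("source", kind="mergesort")
    .reset_index(drop=True)[["source", "fruit", *years]]
)
print(final_df)
```

Restricting the sum to the year columns avoids depending on how the text column fruit is handled during aggregation.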

Modify DataFrame based on another DataFrame in Pandas

I have these two dataframes
df1
Product Quantity Price Description
0 bread 3 12 desc1
1 cookie 5 10 desc2
2 milk 7 15 desc3
3 sugar 4 7 desc4
4 chocolate 5 9 desc5
df2
Attribute Configuration
0 Product C
1 Quantity C
2 Price D
3 Description D
What I'm trying to do: if the letter D is in the Configuration column of df2, the column of df1 named by the matching Attribute should be deleted.
In other words, df2 acts as a configuration for building another dataframe from df1.
The condition could be...
if df2.Configuration == 'D'
df1.drop when df1.header = df2.Attribute
That is roughly the idea, but I'm not sure it's right. What can I do?
The result should look like this...
df3
Product Quantity
0 bread 3
1 cookie 5
2 milk 7
3 sugar 4
4 chocolate 5
Using
df1.drop(df2.loc[df2.Configuration=='D','Attribute'].tolist(), axis=1)
Product Quantity
0 bread 3
1 cookie 5
2 milk 7
3 sugar 4
4 chocolate 5
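A self-contained version of the same idea, using the frames from the question, might look like the sketch below. The name to_drop is made up for this example, and columns= is used as an equivalent spelling of axis=1:

```python
import pandas as pd

# Frames from the question.
df1 = pd.DataFrame({
    "Product": ["bread", "cookie", "milk", "sugar", "chocolate"],
    "Quantity": [3, 5, 7, 4, 5],
    "Price": [12, 10, 15, 7, 9],
    "Description": ["desc1", "desc2", "desc3", "desc4", "desc5"],
})
df2 = pd.DataFrame({
    "Attribute": ["Product", "Quantity", "Price", "Description"],
    "Configuration": ["C", "C", "D", "D"],
})

# Column names flagged with 'D' in the configuration frame.
to_drop = df2.loc[df2["Configuration"] == "D", "Attribute"].tolist()

# drop returns a new frame; df1 itself is left unchanged.
df3 = df1.drop(columns=to_drop)
print(df3)
```

If df2 could list attributes that are not columns of df1, passing errors="ignore" to drop would skip them instead of raising.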

Grouping values in a dataframe

I have a dataframe like this:
Number Names
0 1 Josh
1 2 Jon
2 3 Adam
3 4 Barsa
4 5 Fekse
5 6 Bravo
6 7 Barsa
7 8 Talyo
8 9 Jon
9 10 Zidane
How can I group these numbers based on names?
for Number, Names in zip(dsa['Number'], dsa['Names']):
    print(Number, Names)
The above code gives me following output
1 Josh
2 Jon
3 Adam
4 Barsa
5 Fekse
6 Bravo
7 Barsa
8 Talyo
9 Jon
10 Zidane
How can I get output like the one below?
1 Josh
2,9 Jon
3 Adam
4,7 Barsa
5 Fekse
6 Bravo
8 Talyo
10 Zidane
I want to group the numbers based on names
Something like this?
df.groupby("Names")["Number"].unique()
This will return you a series and then you can transform as you wish.
Use pandas' groupby function with agg which aggregates columns. Assuming your dataframe is called df:
grouped_df = df.groupby(['Names']).agg({'Number' : ['unique']})
This is grouping by Names and within those groups reporting the unique values of Number.
Let's say the DataFrame is:
A = pd.DataFrame({'n':[1,2,3,4,5], 'name':['a','b','a','c','c']})
n name
0 1 a
1 2 b
2 3 a
3 4 c
4 5 c
You can use groupby to group by name, and then apply 'list' to the n of those names:
A.groupby('name')['n'].apply(list)
name
a [1, 3]
b [2]
c [4, 5]
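To get exactly the comma-separated output asked for in the question, the grouped numbers can be joined into strings. This is a sketch that reuses the frame name dsa from the question; sort=False is used so the names keep their order of first appearance:

```python
import pandas as pd

# Data from the question.
dsa = pd.DataFrame({
    "Number": range(1, 11),
    "Names": ["Josh", "Jon", "Adam", "Barsa", "Fekse",
              "Bravo", "Barsa", "Talyo", "Jon", "Zidane"],
})

# Join the numbers of each name into one comma-separated string.
grouped = (
    dsa.groupby("Names", sort=False)["Number"]
    .agg(lambda s: ",".join(map(str, s)))
)

for name, numbers in grouped.items():
    print(numbers, name)
```

With the default sort=True the names would come out alphabetically instead.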

Pandas - how to sort a DataFrame via custom sorting of a column?

Below is the output for my DataFrame. I would like to sort the DataFrame by the column animals and subsequently by day. How can I sort animals in the following order: dogs, pigs, cats? Thanks.
index animals day number
0 dogs 1 3
1 cats 2 1
2 dogs 3 4
3 pigs 4 0
4 pigs 5 6
5 cats 6 1
You can pass the columns to sort by as a list (df.sort has been removed from pandas; use sort_values):
In [30]: df.sort_values(['animals', 'day'])
Out[30]:
animals day number
1 cats 2 1
5 cats 6 1
0 dogs 1 3
2 dogs 3 4
3 pigs 4 0
4 pigs 5 6
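The order of the columns in the list determines which column the dataframe is sorted by first and which ones break ties. Note that this sorts animals alphabetically (cats, dogs, pigs), not in the custom order dogs, pigs, cats from the question.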
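For the custom order the question asks for (dogs, pigs, cats), one common option is an ordered categorical column. The sketch below assumes the data from the question; the names order and sorted_df are made up for this example:

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "animals": ["dogs", "cats", "dogs", "pigs", "pigs", "cats"],
    "day": [1, 2, 3, 4, 5, 6],
    "number": [3, 1, 4, 0, 6, 1],
})

# The wanted order of the animals column.
order = ["dogs", "pigs", "cats"]

# An ordered categorical sorts by category position, not alphabetically.
df["animals"] = pd.Categorical(df["animals"], categories=order, ordered=True)

# Sort by animals in the custom order, then by day within each animal.
sorted_df = df.sort_values(["animals", "day"])
print(sorted_df)
```

Any value not listed in order would become NaN in the categorical column, so the list should cover every animal present.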
