Generate combinations by systematically selecting rows from groups (using pandas) - python

I have a pandas dataframe df which looks as follows (a toy version; the real df contains many more columns and groups):
group  sub  fruit
a      1    apple
a      2    banana
a      3    orange
b      1    pear
b      2    strawberry
b      3    cherry
c      1    kiwi
c      2    tomato
c      3    lemon
All groups have the same number of rows. I am trying to generate a new dataframe that contains all the combinations of group and sub, constrained so that each combo contains every group exactly once and every sub exactly once.
Desired output:
combo  group  sub  fruit
1      a      1    apple
1      b      2    strawberry
1      c      3    lemon
2      a      1    apple
2      c      2    tomato
2      b      3    cherry
3      b      1    pear
3      a      2    banana
3      c      3    lemon
4      c      1    kiwi
4      a      2    banana
4      b      3    cherry
5      c      1    kiwi
5      b      2    strawberry
5      a      3    orange
...
So the below would be a wrong combination, since it has two values of the same sub:
6      c      2    tomato
6      b      2    strawberry
6      a      3    orange
A previous post of mine randomly selected subs, but I realized that was too unconstrained: Generate combinations by randomly selecting a row from multiple groups (using pandas)

A solution could be:
import pandas as pd
from itertools import permutations

df = pd.DataFrame({"group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
                   "sub": [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   "fruit": ["apple", "banana", "orange", "pear", "strawberry",
                             "cherry", "kiwi", "tomato", "lemon"]})

# One combo per permutation of the groups: position i of each permutation
# is paired with sub i, so every combo uses each group and each sub once.
df2 = pd.DataFrame({"combo": [n for n in range(1, 7) for _ in range(3)],
                    "group": [g for p in permutations(["a", "b", "c"]) for g in p],
                    "sub": [1, 2, 3] * 6})

pd.merge(df2, df, how="left", on=["group", "sub"])
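The hard-coded lists above only work for three groups; as a sketch, the same idea generalizes by reading the group and sub labels from df itself (assuming each group has exactly one row per sub):

```python
import pandas as pd
from itertools import permutations

df = pd.DataFrame({"group": list("aaabbbccc"),
                   "sub": [1, 2, 3] * 3,
                   "fruit": ["apple", "banana", "orange", "pear", "strawberry",
                             "cherry", "kiwi", "tomato", "lemon"]})

groups = sorted(df["group"].unique())  # ['a', 'b', 'c']
subs = sorted(df["sub"].unique())      # [1, 2, 3]

# One combo per permutation: position i of the permutation gets sub i,
# so each combo uses every group and every sub exactly once.
df2 = pd.DataFrame(
    [{"combo": n, "group": g, "sub": s}
     for n, perm in enumerate(permutations(groups), start=1)
     for g, s in zip(perm, subs)]
)
result = df2.merge(df, how="left", on=["group", "sub"])
```

With k groups this produces k! combos, so it scales poorly for large k, but it stays correct for any group count.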

Related

merging 2 data frames that don't have a shared column

I am having some trouble merging 2 data frames that don't have any column in common.
I have 2 data frames and I want to combine them (they both have the same number of rows).
For example, I have these two data frames:
A:
store   red candy  apple
first   5          3
second  1          2
third   4          2
B:
yellow candy  banana  green candy
10            5       4
5             3       3
1             1       0
and I want to merge them so I will have one data frame that looks like this:
store   red candy  apple  yellow candy  banana  green candy
first   5          3      10            5       4
second  1          2      5             3       3
third   4          2      1             1       0
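No answer body survives in this excerpt; since the frames share no key, the usual fix is to align rows by position, e.g. pd.concat along axis=1 (a sketch using the candy data above; resetting the index first guards against mismatched row labels):

```python
import pandas as pd

A = pd.DataFrame({"store": ["first", "second", "third"],
                  "red candy": [5, 1, 4],
                  "apple": [3, 2, 2]})
B = pd.DataFrame({"yellow candy": [10, 5, 1],
                  "banana": [5, 3, 1],
                  "green candy": [4, 3, 0]})

# Glue row i of A to row i of B; concat aligns on the index, so reset it
# to plain 0..n-1 positions on both sides before concatenating.
merged = pd.concat([A.reset_index(drop=True), B.reset_index(drop=True)], axis=1)
```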

Handling values with multiple items in dataframe

Suppose my dataframe is:
Name Value
0 A apple
1 A banana
2 A orange
3 B grape
4 B apple
5 C apple
6 D apple
7 D orange
8 E banana
I want to show the items for each name (removing duplicates).
Output I want:
Name Values
0 A apple, banana, orange
1 B grape, apple
2 C apple
3 D apple, orange
4 E banana
Thank you for reading.
Changed sample data with duplicates:
print (df)
Name Value
0 A apple
1 A apple
2 A banana
3 A banana
4 A orange
5 B grape
6 B apple
7 C apple
8 D apple
9 D orange
10 E banana
If duplicates across both columns need to be removed first, use DataFrame.drop_duplicates and then aggregate with join:
df1 = (df.drop_duplicates(['Name','Value'])
         .groupby('Name')['Value']
         .agg(','.join)
         .reset_index())
print (df1)
Name Value
0 A apple,banana,orange
1 B grape,apple
2 C apple
3 D apple,orange
4 E banana
If duplicates are not removed, the output is:
df2 = (df.groupby('Name')['Value']
         .agg(','.join)
         .reset_index())
print (df2)
Name Value
0 A apple,apple,banana,banana,orange
1 B grape,apple
2 C apple
3 D apple,orange
4 E banana
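A third option (a sketch, not from the original answer): deduplicate inside the join itself with dict.fromkeys, which keeps first-seen order without a separate drop_duplicates pass:

```python
import pandas as pd

df = pd.DataFrame({"Name": list("AAAAABBCDDE"),
                   "Value": ["apple", "apple", "banana", "banana", "orange",
                             "grape", "apple", "apple", "apple", "orange", "banana"]})

# dict.fromkeys keeps one copy of each value, in first-seen order,
# so the join deduplicates per group in a single pass.
df1 = (df.groupby("Name")["Value"]
         .agg(lambda s: ",".join(dict.fromkeys(s)))
         .reset_index())
```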

update pandas groupby group with column value

I have a test df like this:
df = pd.DataFrame({'A': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange', 'Orange', 'Pears', 'Pears'],
                   'B': [1, 2, 9, 6, 4, 3, 2, 1]})
A B
0 Apple 1
1 Apple 2
2 Apple 9
3 Orange 6
4 Orange 4
5 Orange 3
6 Pears 2
7 Pears 1
Now I need to add a new column with the respective % differences in col 'B'. How is this possible? I cannot get this to work.
I have looked at
update column value of pandas groupby().last()
Not sure that it is pertinent to my problem.
And this which looks promising
Pandas Groupby and Sum Only One Column
I need to find the maximum percentage change in col 'B' per group of col 'A' and insert it into the col maxpercchng for all rows in the group.
So I have come up with this code:
grouppercchng = ((df.groupby['A'].max() - df.groupby['A'].min())/df.groupby['A'].iloc[0])*100
and try to add it to the group col 'maxpercchng' like so
group['maxpercchng'] = grouppercchng
Or like so
df_kpi_hot.groupby(['A'], as_index=False)['maxpercchng'] = grouppercchng
Does anyone know how to add to all rows in group the maxpercchng col?
I believe you need transform, which returns a Series the same size as the original DataFrame, filled with the aggregated values:
g = df.groupby('A')['B']
df['maxpercchng'] = (g.transform('max') - g.transform('min')) / g.transform('first') * 100
print (df)
A B maxpercchng
0 Apple 1 800.0
1 Apple 2 800.0
2 Apple 9 800.0
3 Orange 6 50.0
4 Orange 4 50.0
5 Orange 3 50.0
6 Pears 2 50.0
7 Pears 1 50.0
Or:
g = df.groupby('A')['B']
df1 = ((g.max() - g.min()) / g.first() * 100).reset_index()
print (df1)
A B
0 Apple 800.0
1 Orange 50.0
2 Pears 50.0
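As a sketch, the same result can also come from a single transform with a lambda, trading the three vectorized passes for one pass of Python-level group logic (slower on large frames, but compact):

```python
import pandas as pd

df = pd.DataFrame({'A': ['Apple', 'Apple', 'Apple', 'Orange', 'Orange',
                         'Orange', 'Pears', 'Pears'],
                   'B': [1, 2, 9, 6, 4, 3, 2, 1]})

# For each group: (max - min) / first * 100, broadcast back to every row.
df['maxpercchng'] = (df.groupby('A')['B']
                       .transform(lambda s: (s.max() - s.min()) / s.iloc[0] * 100))
```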

how to map multiple records to one unique id

I have 2 data sets with a common unique ID (duplicates in the 2nd data frame).
I want to map all records with respect to each ID.
df1
id
1
2
3
4
5
df2
id col1
1 mango
2 melon
1 straw
3 banana
3 papaya
I want the output to look like:
df1
id  col1
1   mango
    straw
2   melon
3   banana
    papaya
4   not available
5   not available
Thanks in advance
You're looking to do an outer df.merge:
df1 = df1.merge(df2, how='outer').set_index('id').fillna('not available')
>>> df1
col1
id
1 mango
1 straw
2 melon
3 banana
3 papaya
4 not available
5 not available
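Made self-contained so it can be run as-is (data taken from the question):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"id": [1, 2, 1, 3, 3],
                    "col1": ["mango", "melon", "straw", "banana", "papaya"]})

# Outer merge keeps ids with no match in df2; their col1 comes back NaN,
# which fillna then replaces with the placeholder label.
out = (df1.merge(df2, how="outer")
          .set_index("id")
          .fillna("not available"))
```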

Using Pandas Data Frame how to apply count to multi level grouped columns?

I have a data frame with multiple columns and I want to use count after group by such that it is applied to the combination of 2 or more columns. For example, let's say I have two columns:
user_id product_name
1 Apple
1 Banana
1 Apple
2 Carrot
2 Tomato
2 Carrot
2 Tomato
3 Milk
3 Cucumber
...
What I want to achieve is something like this:
user_id  product_name  Product_Count_per_User
1        Apple         2
1        Banana        1
2        Carrot        2
2        Tomato        2
3        Milk          1
3        Cucumber      1
I cannot get it. I tried this:
dcf6 = df3.groupby(['user_id','product_name'])['user_id', 'product_name'].count()
but it does not seem to get what I want: it displays 4 columns instead of 3. How do I do it? Thanks.
You are counting two columns at the same time; you can just use groupby.size:
(df.groupby(['user_id', 'product_name']).size()
   .rename('Product_Count_per_User').reset_index())
Or count only one column:
df.groupby(['user_id', 'product_name'])['user_id'].count()
Use GroupBy.size:
dcf6 = (df3.groupby(['user_id', 'product_name'])
           .size()
           .reset_index(name='Product_Count_per_User'))
print (dcf6)
   user_id product_name  Product_Count_per_User
0        1        Apple                       2
1        1       Banana                       1
2        2       Carrot                       2
3        2       Tomato                       2
4        3     Cucumber                       1
5        3         Milk                       1
What is the difference between size and count in pandas?
Based on your own code, just do this (named aggregation replaces the dict-renaming form of agg, which is deprecated in modern pandas):
dcf6 = (df.groupby(['user_id', 'product_name'])['user_id']
          .agg(Product_Count_per_User='count')
          .reset_index(level=1))
        product_name  Product_Count_per_User
user_id
1              Apple                       2
1             Banana                       1
2             Carrot                       2
2             Tomato                       2
3           Cucumber                       1
3               Milk                       1
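On pandas 1.1+, DataFrame.value_counts offers a one-call alternative (a sketch; sort=False skips the default ordering by count):

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 1, 2, 2, 2, 2, 3, 3],
                   "product_name": ["Apple", "Banana", "Apple", "Carrot",
                                    "Tomato", "Carrot", "Tomato", "Milk", "Cucumber"]})

# Count occurrences of each (user_id, product_name) pair in one call.
out = (df.value_counts(["user_id", "product_name"], sort=False)
         .rename("Product_Count_per_User")
         .reset_index())
```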
