I have these two dataframes
df1
Product Quantity Price Description
0 bread 3 12 desc1
1 cookie 5 10 desc2
2 milk 7 15 desc3
3 sugar 4 7 desc4
4 chocolate 5 9 desc5
df2
Attribute Configuration
0 Product C
1 Quantity C
2 Price D
3 Description D
What I'm trying to do: if an attribute has the letter D in the Configuration column of df2, the entire column with that name should be deleted from df1.
In other words, df2 acts as a configuration that tells me how to build a new dataframe from df1.
The condition could be something like...
if df2.Configuration == 'D'
df1.drop when df1.header == df2.Attribute
That's the rough idea, but I'm not sure it's right. What can I do?
The result should look like this...
df3
Product Quantity
0 bread 3
1 cookie 5
2 milk 7
3 sugar 4
4 chocolate 5
Use df2 to select the column names to drop, then pass them to drop:
df1.drop(columns=df2.loc[df2.Configuration == 'D', 'Attribute'].tolist())
Product Quantity
0 bread 3
1 cookie 5
2 milk 7
3 sugar 4
4 chocolate 5
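For reference, here is a self-contained sketch of the same idea that rebuilds the two frames from the question. The names to_drop and df3 are just illustrative, and errors="ignore" is an optional extra I am assuming you might want if df2 can list attributes that are not columns of df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Product": ["bread", "cookie", "milk", "sugar", "chocolate"],
    "Quantity": [3, 5, 7, 4, 5],
    "Price": [12, 10, 15, 7, 9],
    "Description": ["desc1", "desc2", "desc3", "desc4", "desc5"],
})
df2 = pd.DataFrame({
    "Attribute": ["Product", "Quantity", "Price", "Description"],
    "Configuration": ["C", "C", "D", "D"],
})

# Attributes flagged 'D' in df2 are the column names to remove from df1.
to_drop = df2.loc[df2["Configuration"] == "D", "Attribute"].tolist()

# errors="ignore" skips attributes that do not exist as columns in df1.
df3 = df1.drop(columns=to_drop, errors="ignore")
print(df3)
```

drop returns a new dataframe, so df1 itself is left unchanged.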
Right now, I have 3 pandas dfs that I want to combine. Below is a short version of what I'm working with. Basically, dfs 2 and 3 have the indices that correspond to df1. I want to create another column on the first df that has the labels I want according to the indices of dfs 2 and 3 (please see below for a reference of what my desired result is).
Any help is very appreciated! Thank you!
#df 1
Animal Number
2
4
6
9
11
#df 2
Lions
2
11
#df 3
Tigers
4
6
9
This is what I would want my result to look like:
Animal # Animal Type
0 2 Lion
1 4 Tiger
2 6 Tiger
3 9 Tiger
4 11 Lion
Try:
m = pd.concat([df2, df3], axis=1).stack().reset_index().set_index(0)['level_1']
df1['Animal Type'] = df1['Animal Number'].map(m)
print(df1)
Output:
Animal Number Animal Type
0 2 Lions
1 4 Tigers
2 6 Tigers
3 9 Tigers
4 11 Lions
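If the concat/stack chain is hard to follow, a plain dictionary lookup may be easier to read. This is only a sketch that rebuilds the three frames from the question; the name lookup is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Animal Number": [2, 4, 6, 9, 11]})
df2 = pd.DataFrame({"Lions": [2, 11]})
df3 = pd.DataFrame({"Tigers": [4, 6, 9]})

# Build {number: label} from every column of the lookup frames.
lookup = {}
for frame in (df2, df3):
    for label in frame.columns:
        for number in frame[label].dropna():
            lookup[number] = label

# Numbers that appear in neither lookup frame would map to NaN.
df1["Animal Type"] = df1["Animal Number"].map(lookup)
print(df1)
```

The labels come out as the column names ("Lions", "Tigers"), so strip the trailing "s" afterwards if you need the singular form.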
The .csv looks like this:
Date+time: shirt shirt (dress) shorts shorts (dress) accessories
2019-01-01 5:00 5 3 2 2 3
2019-01-01 5:05 1 1 4 1 5
2019-01-01 5:10 1 2 1 2 9
...
2019-12-31 11:55 5 2 1 1 7
I want to know if there is a way to combine the columns that share a common first name? For instance, the program should look for columns that share a similar first name such as shirt and shirt (dress), these should be merged together and considered one entity, same with shorts.
How would I go about finding the highest purchasing hour for each day of the year and then for those highest purchasing hours find the percentages of the total for each product?
You could trim off the second part of the columns, so that the ones with the same first name are changed to have the same whole name:
df.columns = df.columns.str.split(' ').str[0]
Output:
>>> df
Date+time: shirt shirt shorts shorts accessories
0 2019-01-01 5:00 5 3 2 2 3
1 2019-01-01 5:05 1 1 4 1 5
2 2019-01-01 5:10 1 2 1 2 9
3 2019-12-31 11:55 5 2 1 1 7
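Then, sum the columns with the same names together. The grouping has to run along the columns, not the rows, so transpose, group on the index, sum, and transpose back:
new_df = df.T.groupby(level=0).sum().T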
Output:
>>> new_df
Date+time: accessories shirt shorts
0 2019-01-01+5:00 3 8 4
1 2019-01-01+5:05 5 2 5
2 2019-01-01+5:10 9 3 3
3 2019-12-31+11:55 7 7 2
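The second part of the question (the highest purchasing hour per day and each product's share of it) could be approached roughly like this. It is a sketch under a few assumptions: the timestamps have been parsed into a DatetimeIndex, only the four sample rows from the question are used, and names such as merged, hourly and shares are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "shirt": [5, 1, 1, 5],
        "shirt (dress)": [3, 1, 2, 2],
        "shorts": [2, 4, 1, 1],
        "shorts (dress)": [2, 1, 2, 1],
        "accessories": [3, 5, 9, 7],
    },
    index=pd.to_datetime(
        ["2019-01-01 05:00", "2019-01-01 05:05",
         "2019-01-01 05:10", "2019-12-31 11:55"]
    ),
)

# Merge columns that share a first word: rename, then sum duplicate names.
renamed = df.copy()
renamed.columns = renamed.columns.str.split(" ").str[0]
merged = renamed.T.groupby(level=0).sum().T

# Total purchases per product for every (date, hour) pair.
hourly = merged.groupby([merged.index.date, merged.index.hour]).sum()
hourly.index.names = ["date", "hour"]

# For each date, find the hour with the largest overall total.
hourly_total = hourly.sum(axis=1)
best = hourly_total.groupby(level="date").idxmax()
top = hourly.loc[best.tolist()]

# Each product's share of that hour's total, in percent.
shares = top.div(top.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

With the full year of data there would be many hours per day, and idxmax would keep only the first one that reaches the maximum if several hours tie.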
I am using this dataframe
source fruit 2019 2020 2021
0 a apple 3 1 1
1 a banana 4 3 5
2 a orange 2 2 2
3 b apple 3 4 5
4 b banana 4 5 2
5 b orange 1 6 4
I want to refine it like this
source fruit 2019 2020 2021
0 a total 9 6 8
1 a seeds 5 3 3
2 a banana 4 3 5
3 b total 8 15 11
4 b seeds 4 10 9
5 b banana 4 5 2
total is sum of all fruits in that year for each source.
seeds is the sum of fruits containing seeds for each year for each source.
I tried appending new empty rows, following "Insert a new row after every nth row" and "Insert row at any position", but wasn't getting the expected result.
What would be the best way to get the desired output?
TRY:
df1 = df.groupby('source', as_index=False).sum().assign(fruit = 'total')
seeds = ['orange','apple']
df2 = df.loc[df['fruit'].isin(seeds)].groupby('source', as_index=False).sum().assign(fruit = 'seeds')
final_df = pd.concat([df.loc[~df['fruit'].isin(seeds)], df1,df2])
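To also get the rows in the order shown in the question (total, seeds, then the remaining fruit for each source), something along these lines might work. It is a sketch that rebuilds the sample frame; the seeds list is the same assumption as above, and the year columns are assumed to be strings.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["a", "a", "a", "b", "b", "b"],
    "fruit": ["apple", "banana", "orange"] * 2,
    "2019": [3, 4, 2, 3, 4, 1],
    "2020": [1, 3, 2, 4, 5, 6],
    "2021": [1, 5, 2, 5, 2, 4],
})
years = ["2019", "2020", "2021"]
seeds = ["orange", "apple"]  # assumption: the fruits that count as seeded

has_seeds = df["fruit"].isin(seeds)

# One summary row per source for all fruits and for seeded fruits only.
total = df.groupby("source", as_index=False)[years].sum().assign(fruit="total")
seeded = (
    df[has_seeds].groupby("source", as_index=False)[years].sum().assign(fruit="seeds")
)

# Stack summaries above the remaining rows and restore the column order.
final_df = pd.concat([total, seeded, df[~has_seeds]], ignore_index=True)[df.columns]

# A stable sort on source keeps total, seeds, rest in that order per source.
final_df = final_df.sort_values("source", kind="stable").reset_index(drop=True)
print(final_df)
```

Selecting the year columns before sum() keeps the string column fruit out of the aggregation.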
Given a dataframe:
qid cid title
0 1 a croc
1 2 b dog
2 3 a fish
3 4 b cat
4 5 a bird
I want to get a new dataframe that is the cartesian product of each row with each other row which has the same cid value as it (that is, to get all the pairs of rows with the same cid):
cid1 cid2 qid1 title1 qid2 title2
0 a a 1 croc 3 fish
1 a a 1 croc 5 bird
2 a a 3 fish 5 bird
3 b b 2 dog 4 cat
Suppose my dataset is about 500M, can anybody solve this problem in a comparatively efficient way?
One way to do it is to use a self merge and then filter out the unwanted records (self-pairs and mirrored duplicates).
df.merge(df, on='cid', suffixes=('1','2')).query('qid1 < qid2')
Output:
qid1 cid title1 qid2 title2
1 1 a croc 3 fish
2 1 a croc 5 bird
5 3 a fish 5 bird
10 2 b dog 4 cat
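The self merge builds every ordered pair, including self-pairs, before the filter runs. On a large frame you may prefer to generate only the wanted pairs group by group. Below is a sketch of that alternative using itertools.combinations. iter_pairs is a made-up helper, and I have not measured whether it is faster on data of your size.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "qid": [1, 2, 3, 4, 5],
    "cid": ["a", "b", "a", "b", "a"],
    "title": ["croc", "dog", "fish", "cat", "bird"],
})


def iter_pairs(frame):
    """Yield one output row per unordered pair of rows sharing a cid."""
    for cid, group in frame.groupby("cid"):
        rows = group[["qid", "title"]].itertuples(index=False, name=None)
        # combinations() emits each pair once, in original row order.
        for (qid1, title1), (qid2, title2) in combinations(rows, 2):
            yield cid, cid, qid1, title1, qid2, title2


pairs = pd.DataFrame(
    list(iter_pairs(df)),
    columns=["cid1", "cid2", "qid1", "title1", "qid2", "title2"],
)
print(pairs)
```

Because iter_pairs is a generator, the pairs could also be written out in chunks instead of being collected into a single dataframe.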
Below is the output for my DataFrame. I would like to sort the DataFrame by the column animals and subsequently by day. How can I sort animals in the following order: dogs, pigs, cats? Thanks.
index animals day number
0 dogs 1 3
1 cats 2 1
2 dogs 3 4
3 pigs 4 0
4 pigs 5 6
5 cats 6 1
You can pass the columns to sort by as a list -
In [30]: df.sort_values(['animals', 'day'])
Out[30]:
animals day number
1 cats 2 1
5 cats 6 1
0 dogs 1 3
2 dogs 3 4
3 pigs 4 0
4 pigs 5 6
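The order of the columns in the list determines which column the dataframe is sorted by first and which columns are used to break ties. Note that this sorts animals alphabetically (cats, dogs, pigs), not in the custom order asked for in the question.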
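For the custom order dogs, pigs, cats, one option is to turn the column into an ordered categorical, so that sorting follows the category order instead of the alphabet. A minimal sketch, rebuilding the frame from the question (order and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "animals": ["dogs", "cats", "dogs", "pigs", "pigs", "cats"],
    "day": [1, 2, 3, 4, 5, 6],
    "number": [3, 1, 4, 0, 6, 1],
})

# The desired sort order, taken from the question.
order = ["dogs", "pigs", "cats"]

# An ordered categorical sorts by category position, not alphabetically.
df["animals"] = pd.Categorical(df["animals"], categories=order, ordered=True)

result = df.sort_values(["animals", "day"])
print(result)
```

Any value not listed in order becomes NaN in the categorical column, so the list should cover every animal in the data.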