Match quantity of items between two dataframes, based on SKU [duplicate] - python

I want to copy the information (quantity) of one dataframe's column to the other dataframe's quantity column but do so matching the SKU column.
So for example the dataframes look like:
Dataframe 1:
SKU Quantity Title
A 3 Scissors
B 4 Cable
C 5 Goat
D 6 Cheese
Dataframe 2:
SKU Quantity Title
A 1 Blue Scissors
B 2 Red Cables
C 1 Fat Goat
D 2 Smelly Cheese
So I would like to get Dataframe 1's quantities and place them into Dataframe 2, but matching the SKUs (A, B, C, D etc) even though some other columns (such as Title) might have different information.

Set SKU as the index on both dataframes so that the assignment aligns on the index, then copy the Quantity column across. Finally, reset the index to restore SKU as a regular data column.
df1a = df1.set_index('SKU')
df2a = df2.set_index('SKU')
df2a['Quantity'] = df1a['Quantity']
df2 = df2a.reset_index()
Result:
print(df2)
SKU Quantity Title
0 A 3 Blue Scissors
1 B 4 Red Cables
2 C 5 Fat Goat
3 D 6 Smelly Cheese
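An alternative that avoids re-indexing both dataframes is to look up each SKU with Series.map. This is only a sketch: it rebuilds the sample data from the question, the name sku_to_qty is made up for illustration, and it assumes every SKU appears at most once in df1. SKUs that are missing from df1 would come back as NaN.

```python
import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({
    "SKU": ["A", "B", "C", "D"],
    "Quantity": [3, 4, 5, 6],
    "Title": ["Scissors", "Cable", "Goat", "Cheese"],
})
df2 = pd.DataFrame({
    "SKU": ["A", "B", "C", "D"],
    "Quantity": [1, 2, 1, 2],
    "Title": ["Blue Scissors", "Red Cables", "Fat Goat", "Smelly Cheese"],
})

# Lookup Series: index = SKU, values = Quantity (hypothetical helper name;
# assumes SKUs are unique in df1)
sku_to_qty = df1.set_index("SKU")["Quantity"]

# Replace df2's quantities by looking up each SKU; Title is left untouched
df2["Quantity"] = df2["SKU"].map(sku_to_qty)

print(df2)
```

Because the lookup is keyed on SKU, it does not depend on the two dataframes having their rows in the same order.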

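You could try this, provided both dataframes have the same index and list the SKUs in the same row order (the comparison is positional, and np.where needs a fallback value as its third argument):
df2['Quantity'] = np.where(df1['SKU'] == df2['SKU'], df1['Quantity'], df2['Quantity'])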

Related

Combining two pandas dataframes into one based on conditions

I have two dataframes; simplified, they look like this:
Dataframe A
ID  item
1   apple
2   peach
Dataframe B
ID  flag  price ($)
1   A     3
1   B     2
2   B     4
2   A     2
ID: unique identifier for each item
flag: unique identifier for each vendor
price: varies for each vendor
In this simplified case I want to extract the price values of dataframe B and add them to dataframe A in separate columns depending on their flag value.
The result should look similar to this
Dataframe C
ID  item   price_A  price_B
1   apple  3        2
2   peach  2        4
I tried splitting dataframe B into two dataframes by flag value and then merging them with dataframe A, but there must be an easier solution.
Thank you in advance! :)
*edit: removed the pictures
You can use pd.merge and pd.pivot_table for this:
df_C = pd.merge(df_A, df_B, on=['ID']).pivot_table(index=['ID', 'item'], columns='flag', values='price ($)')
df_C.columns = ['price_' + alpha for alpha in df_C.columns]
df_C = df_C.reset_index()
Output:
>>> df_C
ID item price_A price_B
0 1 apple 3 2
1 2 peach 2 4
(dfb
.merge(dfa, on="ID")
.pivot_table(index=['ID', 'item'], columns='flag', values='price ($)')
.add_prefix("price_")
.reset_index()
)
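For anyone who wants to run the chain end to end, here is a minimal reproduction under stated assumptions: the two dataframes are rebuilt by hand from the tables in the question, and the names dfa, dfb and dfc are just illustrative. Note that pivot_table aggregates with the mean by default, so the prices come back as floats, and duplicate (ID, flag) pairs would be averaged rather than rejected.

```python
import pandas as pd

# Sample data rebuilt from the question
dfa = pd.DataFrame({"ID": [1, 2], "item": ["apple", "peach"]})
dfb = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "flag": ["A", "B", "B", "A"],
    "price ($)": [3, 2, 4, 2],
})

dfc = (
    dfb.merge(dfa, on="ID")                    # attach the item name to every price row
    .pivot_table(index=["ID", "item"],         # one output row per item
                 columns="flag",               # one output column per vendor flag
                 values="price ($)")
    .add_prefix("price_")                      # A -> price_A, B -> price_B
    .reset_index()                             # turn ID and item back into columns
)
dfc.columns.name = None                        # drop the leftover "flag" axis label

print(dfc)
```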

Is there a pandas function to add in value of a column based on the other dataframe? [duplicate]

I would like to add a column to a pandas dataframe based on the value from another dataframe. Here are Table 1 and Table 2. I would like to update the duration in Table 1 based on the value from Table 2. For example, row 1 in Table 1 is Potato, so its duration should be updated to 30 based on the value in Table 2.
Table 1
Crops    Entry Time  Duration
Potato   2022-03-01  0
Cabbage  2022-03-02  0
Tomato   2022-03-03  0
Potato   2022-03-0   0
Table 2
Crops    Duration
Potato   30
Cabbage  20
Tomato   25
Thanks.
Just use the merge method:
df = df1.merge(df2, on='Crops', how='left')
Before doing that, I suggest dropping the Duration column from the first dataframe (df1); otherwise both Duration columns are kept and get suffixes (Duration_x and Duration_y).
The 'on' parameter defines the column to merge on (also called the key). With how='left', the result keeps every row of the first dataframe, so crops in df1 that are not present in df2 are not dropped; their Duration is simply left empty (NaN).
Google 'difference between inner, left, right and outer join'.
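Put together, the drop-then-merge steps could look like the sketch below. The data is rebuilt from the tables in the question, with the Entry Time column left out for brevity; that omission is a simplification of mine, not part of the original answer.

```python
import pandas as pd

# Table 1 (Entry Time omitted for brevity) and Table 2 from the question
df1 = pd.DataFrame({
    "Crops": ["Potato", "Cabbage", "Tomato", "Potato"],
    "Duration": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "Crops": ["Potato", "Cabbage", "Tomato"],
    "Duration": [30, 20, 25],
})

# Drop the placeholder Duration first so the merge does not create
# Duration_x / Duration_y, then pull the real values in from df2
df = df1.drop(columns="Duration").merge(df2, on="Crops", how="left")

print(df)
```

The left join keeps all four rows of df1 in their original order, so both Potato rows receive the same duration.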

How do I rename and map column to multi index header and set another column as index?

I've been trying to reshape a dataset from this table into one with a multi-index header of area code and region code, and an index of age of district. So far I've tried using transpose and creating multi-level headers, but I keep getting NA values. Any help would be appreciated!
Input:
Area Code  Region Code  Age of District  Amount of crime
C          B            0 - 2 years      2
D          A            2 - 4 years      5
Expected output:
Region Code      B                A
Area Code        C                D
Age of District  Amount of crime  Amount of crime
0 - 2 years      2                NA
2 - 4 years      NA               5
Any indication of how to do it or maybe a better way to structure would be greatly appreciated!
Let's say df is your dataframe; simply use df.transpose (or its shorthand, df.T) to get the transposed data:
df_New = df.T.reset_index()
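A plain transpose only swaps rows and columns, so it may not produce the two-level header shown in the expected output. As a hedged alternative, not part of the answer above, DataFrame.pivot can build that header directly. The sketch assumes pandas 1.1 or newer (where pivot accepts a list of columns) and that each combination of age, region and area occurs only once; the name wide is made up for illustration.

```python
import pandas as pd

# Input table rebuilt from the question
df = pd.DataFrame({
    "Area Code": ["C", "D"],
    "Region Code": ["B", "A"],
    "Age of District": ["0 - 2 years", "2 - 4 years"],
    "Amount of crime": [2, 5],
})

# Rows become "Age of District"; the header becomes a two-level
# MultiIndex of (Region Code, Area Code); cells hold "Amount of crime"
wide = df.pivot(
    index="Age of District",
    columns=["Region Code", "Area Code"],
    values="Amount of crime",
)

print(wide)
```

Combinations that do not exist in the input, such as region A for "0 - 2 years", show up as NaN, which matches the NA cells in the expected output.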

Python Pandas Delete empty cells

I'm trying to delete empty cells with pandas. I wanna delete only empty cells but I have no idea how to do that.
Example:
A B C D E F G H I J
1 apple price 10 quantity 5
2 pineapple price 12 condition good quantity 4
What I want:
A B C D E F G H I J
1 apple price 10 quantity 5
2 pineapple price 12 condition good quantity 4
I need all the values without the empty cells. I don't want to delete a whole row or column; I want to delete each empty cell and shift the values that follow it over to fill the gap.
I solved it using the answers to this question:
Removing nan from pandas dataframe and reshaping dataframe
Key point: change Invalid_val to the len of the value strings.
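For reference, one simple way to shift values over the empty cells is to drop the NaNs row by row and pad the end of each row. This is only a sketch and not the code from the linked question: the positions of the empty cells in the sample data are assumed (the layout above does not show them), and shift_left is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Sample rows with gaps; where the empty cells sit is an assumption
df = pd.DataFrame(
    [
        ["apple", np.nan, "price", 10, np.nan, "quantity", 5],
        ["pineapple", "price", 12, "condition", "good", "quantity", 4],
    ],
    columns=list("ABCDEFG"),
)

def shift_left(row):
    """Move the non-empty values of one row to the front, padding with NaN."""
    values = row.dropna().tolist()
    padding = [np.nan] * (len(row) - len(values))
    return pd.Series(values + padding, index=row.index)

# Apply the helper to every row; the column labels stay the same
result = df.apply(shift_left, axis=1)

print(result)
```

Row-wise apply is easy to read but slow on large frames; for big spreadsheets a vectorised NumPy approach like the one in the linked question is likely a better fit.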

How to Stack Data Frames on top of one another (Pandas,Python3)

Let's say I have 3 pandas DataFrames:
DF1
Words Score
The Man 2
The Girl 4
Df2
Words2 Score2
The Boy 6
The Mother 7
Df3
Words3 Score3
The Son 3
The Daughter 4
Right now, I have them concatenated together so that it becomes 6 columns in one DF. That's all well and good but I was wondering, is there a pandas function to stack them vertically into TWO columns and change the headers?
So to make something like this?
Family Members Score
The Man 2
The Girl 4
The Boy 6
The Mother 7
The Son 3
The Daughter 4
everything I'm reading here http://pandas.pydata.org/pandas-docs/stable/merging.html seems to only have "horizontal" methods of joining DF!
As long as you rename the columns so that they're the same in each dataframe, pd.concat() should work fine:
# I read in your data as df1, df2 and df3 using:
# df1 = pd.read_clipboard(sep='\s\s+')
# Example dataframe:
Out[8]:
Words Score
0 The Man 2
1 The Girl 4
all_dfs = [df1, df2, df3]
# Give all df's common column names
for df in all_dfs:
    df.columns = ['Family_Members', 'Score']
# Stack them vertically and renumber the rows from 0
pd.concat(all_dfs).reset_index(drop=True)
Out[16]:
Family_Members Score
0 The Man 2
1 The Girl 4
2 The Boy 6
3 The Mother 7
4 The Son 3
5 The Daughter 4
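The loop above renames the columns of the original dataframes in place. If you would rather leave them untouched, a variant along these lines should work; it is a sketch that rebuilds the sample data from the question, and the names common and stacked are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({"Words": ["The Man", "The Girl"], "Score": [2, 4]})
df2 = pd.DataFrame({"Words2": ["The Boy", "The Mother"], "Score2": [6, 7]})
df3 = pd.DataFrame({"Words3": ["The Son", "The Daughter"], "Score3": [3, 4]})

common = ["Family_Members", "Score"]

# set_axis returns a relabelled copy, so df1..df3 keep their own headers;
# ignore_index=True renumbers the stacked rows from 0
stacked = pd.concat(
    [d.set_axis(common, axis=1) for d in (df1, df2, df3)],
    ignore_index=True,
)

print(stacked)
```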
