I'm trying to find, for each price column, the next cheapest product available on that day. My data looks something like this:
data = [['29/10/18', 400, 300, 200],
['29/10/18', 250, 400, 100],
['29/10/18', 600, 600, 300],
['30/10/18', 300, 500, 100]]
df = pd.DataFrame(data, columns = ['date', 'price 1', 'price2', 'price3'])
My output would look something like this:
date price1 nearestPrice1 price2 nearestPrice2
29/10/18 400 250 300 400
29/10/18 250 400 400 300
29/10/18 600 400 600 400
f = lambda row, col: df.loc[df[df['date'] == row['date']][col].sub(row[col])\
.abs().nsmallest(2).idxmax(), col]
df['nearest_price1'] = df.apply(f, col = 'price 1', axis = 1)
df['nearest_price2'] = df.apply(f, col = 'price2', axis = 1)
df['nearest_price3'] = df.apply(f, col = 'price3', axis = 1)
Outputs:
date price 1 price2 price3 nearest_price1 nearest_price2 \
0 29/10/18 400 300 200 250 400
1 29/10/18 250 400 100 400 300
2 29/10/18 600 600 300 400 400
3 30/10/18 300 500 100 300 500
nearest_price3
0 100
1 200
2 200
3 100
Explanation:
The lambda function f is applied row-wise to each price column (price 1, price2, price3), and the results are collected.
It works as follows:
sub subtracts the row's price from the other prices on the same date.
nsmallest finds the two smallest absolute differences.
Finally, idxmax picks the index of the second smallest difference (the smallest is always the row itself, with an absolute difference of 0), and .loc looks up that row's price.
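A minimal sketch of that mechanism on a toy Series (values invented for illustration):

```python
import pandas as pd

s = pd.Series([400, 250, 600])       # all prices on the same date
row_price = 400                      # the current row's price

diffs = s.sub(row_price).abs()       # absolute differences: 0, 150, 200
two_smallest = diffs.nsmallest(2)    # keeps 0 (itself) and 150
nearest_idx = two_smallest.idxmax()  # index of the non-self difference
print(s.loc[nearest_idx])            # prints 250
```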
If I understand this correctly, you need to find the cheapest prices for a given day, starting with the cheapest, then the next cheapest, and so on.
This means you first need to extract all the prices for the given day. You could do this with a simple for loop: if the text in the first column is '29/10/18', add the data from the rest of the columns to a list, or build a new DataFrame from it. Either way, once you have all the prices for the date, you can use pandas' .sort_values function and specify ascending order (see the sort_values documentation).
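A rough sketch of that suggestion for one column and one date (a loop to collect the day's prices, then sort_values ascending); the date and column names follow the question's data:

```python
import pandas as pd

data = [['29/10/18', 400, 300, 200],
        ['29/10/18', 250, 400, 100],
        ['29/10/18', 600, 600, 300],
        ['30/10/18', 300, 500, 100]]
df = pd.DataFrame(data, columns=['date', 'price 1', 'price2', 'price3'])

# Simple loop: collect the 'price 1' values for the chosen day
day_prices = []
for _, row in df.iterrows():
    if row['date'] == '29/10/18':
        day_prices.append(row['price 1'])

# Sort ascending: cheapest first, then the nearest cheapest, and so on
day_prices = pd.Series(day_prices).sort_values(ascending=True)
print(day_prices.tolist())   # [250, 400, 600]
```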
Hi everyone, how do I sum multiple rows of data using pandas? The data comes from an Excel file, and I want to sum only the edwin and maria rows. Please help me out, thanks in advance.
Excel data:

   name   salary  incentive
0  john     2000        400
1  edwin    3000        600
2  maria    1000        200

Expected output:

   name   salary  incentive
0  Total    5000       1000
1  john     2000        400
2  edwin    3000        600
3  maria    1000        200
Judging by the Total line, you need the sums for 'john' and 'edwin', not edwin and maria. I used the isin function, which returns a boolean mask that is then used to select the desired rows (the ind variable). A one-row dataframe is filled with the Total sums, and pd.concat prepends it to the remaining rows. As for summing directly in Excel, I don't understand what you want there.
import pandas as pd
df = pd.DataFrame({'name':['john', 'edwin', 'maria'], 'salary':[2000, 3000, 1000], 'incentive':[400, 600, 200]})
ind = df['name'].isin(['john', 'edwin'])
df1 = pd.DataFrame({'name':['Total'], 'salary':[df.loc[ind, 'salary'].sum()], 'incentive':[df.loc[ind, 'incentive'].sum()]})
df1 = pd.concat([df1, df])
df1 = df1.reset_index(drop=True)
print(df1)
Output
name salary incentive
0 Total 5000 1000
1 john 2000 400
2 edwin 3000 600
3 maria 1000 200
I have a dataframe mortgage_data with columns name, mortgage_amount and month (in ascending order).
mortgage_amount_paid = 1000
mortgage_data:
name mortgage_amount month
mark 400 1
mark 500 2
mark 200 3
How do I deduct mortgage_amount_paid from mortgage_amount row by row, in ascending order of month, and update the dataframe, adding a paid_status column that shows whether each month's amount was fully deducted, like this:
if mortgage_amount_paid = 1000
mortgage_data:
name mortgage_amount month mortgage_amount_updated paid_status
mark 400 1 0 full
mark 500 2 0 full
mark 200 3 100 partial
Another example:
if mortgage_amount_paid = 600
mortgage_data:
name mortgage_amount month mortgage_amount_updated paid_status
mark 400 1 0 full
mark 500 2 300 partial
mark 200 3 200 zero
I tried this:
import numpy as np
mortgage_amount_paid = 1000
df['mortgage_amount_updated'] = np.where(mortgage_amount_paid - df['mortgage_amount'].cumsum() >=0 , 0, df['mortgage_amount'].cumsum() - mortgage_amount_paid)
df['paid_status'] = np.where(df['mortgage_amount_updated'],'full','partial')
IIUC, you can use masks:
import numpy as np

mortgage_amount_paid = 600
# cumulative debt minus the amount paid
m1 = df['mortgage_amount'].cumsum().sub(mortgage_amount_paid)
# is it positive?
m2 = m1>0
# is the previous month also positive?
m3 = m2.shift(fill_value=False)
df['mortgage_amount_updated'] = (m1.clip(0, mortgage_amount_paid)
.mask(m3, df['mortgage_amount'])
)
df['paid_status'] = np.select([m3, m2], ['zero', 'partial'], 'full')
output:
name mortgage_amount month mortgage_amount_updated paid_status
0 mark 400 1 0 full
1 mark 500 2 300 partial
2 mark 200 3 200 zero
The idea is that the cumulative sum before the partial row should be less than mortgage_amount_paid, and there can be at most one partial row.
import numpy as np

mortgage_amount_paid = 600
m = df['mortgage_amount'].cumsum()
df['paid_status'] = np.select(
[m <= mortgage_amount_paid,
(m > mortgage_amount_paid) & (m.shift() < mortgage_amount_paid)
],
['full', 'partial'],
default='zero'
)
df['mortgage_amount_updated'] = np.select(
[df['paid_status'].eq('full'),
df['paid_status'].eq('partial')],
[0, m-mortgage_amount_paid],
default=df['mortgage_amount']
)
print(df)
name mortgage_amount month paid_status mortgage_amount_updated
0 mark 400 1 full 0
1 mark 500 2 partial 300
2 mark 200 3 zero 200
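For comparison, the same deduction logic can be written as a plain row-by-row loop (a sketch, not vectorized; it rebuilds the question's sample frame and column names):

```python
import pandas as pd

df = pd.DataFrame({'name': ['mark'] * 3,
                   'mortgage_amount': [400, 500, 200],
                   'month': [1, 2, 3]})

mortgage_amount_paid = 600
remaining = mortgage_amount_paid
updated, status = [], []
for amount in df['mortgage_amount']:
    if remaining >= amount:            # fully covered by the payment
        updated.append(0)
        status.append('full')
        remaining -= amount
    elif remaining > 0:                # only part of this month is covered
        updated.append(amount - remaining)
        status.append('partial')
        remaining = 0
    else:                              # nothing left to apply
        updated.append(amount)
        status.append('zero')

df['mortgage_amount_updated'] = updated
df['paid_status'] = status
print(df)
```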
I want to compute the sum of each 'Group' that has at least one 'Customer' with an 'Active' Bail.
Sample Input :
Customer ID Group Bail Amount
0 23453 NAFNAF Active 200
1 23849 LINDT Active 350
2 23847 NAFNAF Inactive 100
3 84759 CARROUF Inactive 20
For example 'NAFNAF' has 2 customers, including one with an active bail.
Output expected :
NAFNAF : 300
LINDT : 350
TOTAL ACTIVE: 650
I don't want to change the original dataframe.
You can use:
(df.assign(Bail=df.Bail.eq('Active'))
.groupby('Group')[['Bail', 'Amount']].agg('sum')
.loc[lambda d: d['Bail'].ge(1), ['Amount']]
)
output:
Amount
Group
LINDT 350
NAFNAF 300
Full output with total:
df2 = (
df.assign(Bail=df.Bail.eq('Active'))
.groupby('Group')[['Bail', 'Amount']].agg('sum')
.loc[lambda d: d['Bail'].ge(1), ['Amount']]
)
df2 = pd.concat([df2, df2.sum().to_frame('TOTAL').T])
output:
Amount
LINDT 350
NAFNAF 300
TOTAL 650
Create a boolean mask of Groups with at least one active Bail:
m = df['Group'].isin(df.loc[df['Bail'].eq('Active'), 'Group'])
out = df[m]
At this point, your filtered dataframe looks like:
>>> out
Customer ID Group Bail Amount
0 23453 NAFNAF Active 200
1 23849 LINDT Active 350
2 23847 NAFNAF Inactive 100
Now you can use groupby and sum:
out = df[m].groupby('Group')['Amount'].sum()
out = pd.concat([out, pd.Series(out.sum(), index=['TOTAL ACTIVE'])])
# Output
LINDT 350
NAFNAF 300
TOTAL ACTIVE 650
dtype: int64
I have the following Pandas DataFrame:
# Create DataFrame
import pandas as pd
data = {'Menu Category': ['Appetizers', 'Appetizers', 'Appetizers', 'Mains', 'Mains',
'Mains', 'Desserts', 'Desserts', 'Desserts'],
'Menu Item': ['Fries', 'Poppers', 'Wings', 'Pasta', 'Burger', 'Pizza',
'Ice Cream', 'Cake', 'Fruit'],
'Sales Quantity': [100, 50, 40, 200, 400, 250, 100, 120, 50],
}
df = pd.DataFrame(data)
df
I would like to add two columns: 1) the % Quantity of the entire menu that each item represents (the entire menu being this dataset), and 2) the % Quantity of the item's Menu Category (e.g. what percentage of the Sales Quantity Fries represents within the Appetizers group, i.e. (100/190) * 100).
I know how to get the first column mentioned:
# Add % Quantity of Menu Column
percent_menu_qty = []
for i in df['Sales Quantity']:
    i = round(i/df['Sales Quantity'].sum() * 100, 2)
    percent_menu_qty.append(i)
df['% Quantity of Menu'] = percent_menu_qty
df
What I am not sure how to do is the second one. I have tried by setting Menu Category as the index and doing the following:
# Add % Quantity of Menu Category Column
df = df.set_index('Menu Category')
lst = []
for index, x in df['Sales Quantity'].iteritems():
    if index == 'Appetizers':
        x = x/sum(x)
        lst.append(x)
    elif index == 'Mains':
        x = x/sum(x)
        lst.append(x)
    elif index == 'Desserts':
        x = x/sum(x)
        lst.append(x)
lst
I know I need to somehow set a condition for each Menu Category that if index == 'a certain menu category value' then divide quantity by the sum of that menu category. Thus far I haven't been able to figure it out.
First of all, I'd like to compliment you on working through it row by row. I still use loops from time to time, because I consider them easier for someone else to read and understand without running the code.
But yes, for this solution I wrote a couple of one-liners; let me explain each.
df['% Quantity of Menu'] = ((df['Sales Quantity']/df['Sales Quantity'].sum())*100).round(2)
For your first problem, instead of looping row by row, this divides each column value by a scalar (the column total, df['Sales Quantity'].sum()), multiplies the ratio by 100 for a percentage, and rounds to 2 decimal places.
df['%Qty of Menu Category'] = ((df['Sales Quantity']/df.groupby(['Menu Category'])['Sales Quantity'].transform('sum'))*100).round(2)
For the second problem, we need to divide each value by the total of its corresponding category instead of the whole column. So we get the per-category totals with df.groupby(['Menu Category'])['Sales Quantity'].transform('sum'), then proceed as in the first one, swapping in that portion of the code.
Why use df.groupby(['Menu Category'])['Sales Quantity'].transform('sum') instead of df.groupby(['Menu Category'])['Sales Quantity'].sum()? Because a Series can be divided either by a scalar or by a Series of the same length, and transform returns a Series aligned row-for-row with the original.
df['Sales Quantity']
0 100
1 50
2 40
3 200
4 400
5 250
6 100
7 120
8 50
Name: Sales Quantity, dtype: int64
df.groupby(['Menu Category'])['Sales Quantity'].transform('sum')
0 190
1 190
2 190
3 850
4 850
5 850
6 270
7 270
8 270
Name: Sales Quantity, dtype: int64
df.groupby(['Menu Category'])['Sales Quantity'].sum()
Menu Category
Appetizers 190
Desserts 270
Mains 850
Name: Sales Quantity, dtype: int64
I think you're looking for groupby + transform sum to get the "Category" sums; then divide each "Sales Quantity" by their "Category" sum. This gives us the share of each menu item in their menu category.
You can also use the vectorized div method instead of loop for the first column:
df['%Qty of Menu'] = df['Sales Quantity'].div(df['Sales Quantity'].sum()).mul(100).round(2)
df['%Qty of Menu Cat'] = df.groupby('Menu Category')['Sales Quantity'].transform('sum').rdiv(df['Sales Quantity']).mul(100).round(2)
Output:
Menu Category Menu Item Sales Quantity %Qty of Menu %Qty of Menu Cat
0 Appetizers Fries 100 7.63 52.63
1 Appetizers Poppers 50 3.82 26.32
2 Appetizers Wings 40 3.05 21.05
3 Mains Pasta 200 15.27 23.53
4 Mains Burger 400 30.53 47.06
5 Mains Pizza 250 19.08 29.41
6 Desserts Ice Cream 100 7.63 37.04
7 Desserts Cake 120 9.16 44.44
8 Desserts Fruit 50 3.82 18.52
Hi, I have a dataframe that lists items I own, along with their selling price.
I also have a variable that defines my current debt. Example:
import pandas as pd
current_debt = 16000
d = {
'Person' : ['John','John','John','John','John'],
'Item': ['Car','Bike','Computer','Phone','TV'],
'Price':[10500,3300,2100,1100,800],
}
df = pd.DataFrame(data=d)
df
I would like to "pay back" the current_debt starting with the most expensive item and continuing until the debt is paid, and to list the leftover money aligned to the last item sold. I'm hoping the function can include a groupby clause on Person, as sometimes there is more than one name in the list.
My expected output for the debt in the example above would be:
If anyone could help with a function to calculate this, that would be fantastic. I wasn't sure whether I needed to convert the dataframe to a list or could keep it as a dataframe. Thanks very much!
Using a cumsum transformation and np.where to cover your logic for the final price column:
import numpy as np
df = df.sort_values(["Person", "Price"], ascending=False)
df['CumPrice'] = df.groupby("Person")['Price'].transform('cumsum')
df['Diff'] = df['CumPrice'] - current_debt
df['PriceLeft'] = np.where(
df['Diff'] <= 0,
0,
np.where(
df['Diff'] < df['Price'],
df['Diff'],
df['Price']
)
)
Result:
Person Item Price CumPrice Diff PriceLeft
0 John Car 10500 10500 -5500 0
1 John Bike 3300 13800 -2200 0
2 John Computer 2100 15900 -100 0
3 John Phone 1100 17000 1000 1000
4 John TV 800 17800 1800 800