Python Pandas exchange column value

While working with a pandas.DataFrame, I want to swap the values of whole columns. I find that DF.loc[wInd, 'Column'] and DF.loc[:, 'Column'] behave differently: the first exchanges the values, but the second gives me back the same column values. Why are they different? Thank you.
wInd = LineCircPart_df.index
for cWd in ['X', 'Y', 'Angle']:
    (LineCircPart_df.loc[wInd, f'Start{cWd}'], LineCircPart_df.loc[wInd, f'End{cWd}']) = (
        LineCircPart_df.loc[wInd, f'End{cWd}'], LineCircPart_df.loc[wInd, f'Start{cWd}'])
and I need to add .copy() on the values being assigned for it to work, like:
wInd = LineCircPart_df.index
for cWd in ['X', 'Y', 'Angle']:
    (LineCircPart_df.loc[:, f'Start{cWd}'], LineCircPart_df.loc[:, f'End{cWd}']) = (
        LineCircPart_df.loc[:, f'End{cWd}'].copy(), LineCircPart_df.loc[:, f'Start{cWd}'].copy())
Any Suggestions?
Example updated as follows:
LineCircPart_df = pd.DataFrame({'StartX': [3000, 4000, 5000], 'StartY': [30, 40, 50],
                                'StartAngle': [3, 4, 5], 'EndX': [6000, 7000, 8000],
                                'EndY': [60, 70, 80], 'EndAngle': [6, 7, 8]})
for cWd in ['X', 'Y', 'Angle']:
    (LineCircPart_df.loc[:, f'Start{cWd}'], LineCircPart_df.loc[:, f'End{cWd}']) = (
        LineCircPart_df.loc[:, f'End{cWd}'], LineCircPart_df.loc[:, f'Start{cWd}'])
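This is most likely an aliasing issue: DF.loc[:, 'Column'] can return a view of the underlying data, so in the tuple assignment the first write overwrites the values the second one still needs to read, while DF.loc[wInd, 'Column'] (indexing with an array of labels) returns a copy. A minimal sketch of a swap that sidesteps the aliasing entirely, assuming the example frame above:
import pandas as pd

LineCircPart_df = pd.DataFrame({'StartX': [3000, 4000, 5000], 'StartY': [30, 40, 50],
                                'StartAngle': [3, 4, 5], 'EndX': [6000, 7000, 8000],
                                'EndY': [60, 70, 80], 'EndAngle': [6, 7, 8]})
for cWd in ['X', 'Y', 'Angle']:
    cols = [f'Start{cWd}', f'End{cWd}']
    # .to_numpy() materialises the right-hand side, so nothing is aliased
    LineCircPart_df[cols] = LineCircPart_df[cols[::-1]].to_numpy()
print(LineCircPart_df.loc[0])  # StartX is now 6000, EndX 3000, and so on
Assigning through .to_numpy() detaches the right-hand side from the DataFrame before anything is written, so no .copy() is needed.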

Related

generate new row in pandas if column value is over certain value

I'm writing a function that produces the HOR_df output below from two dataframes, planned_routes and actual_shippers.
The function should take the actual_pallets (from the actual_shippers df) and check against planned_routes whether the actual pallets exceed max_truck_capacity. If they do, it should generate a new row, as in the visualisation below.
Visualisation of the inputs and wanted output (original image not reproduced here; the data is replicated in code further down):
Note
In the above case: S1 had planned 10 pallets, but with the new actual_pallets the load increased so that max_truck_capacity is too small to handle it. Therefore a new row is generated with the S2 ID and the 3 extra pallets that are needed.
HOR_df in this case makes sure that on 1 December 2021 the actual_pallets for shipper S2 are split across routes of 10 and 3 pallets, instead of the 10 pallets in the initial planned_routes.
Potential idea for how it could be done
I'm not sure what the most efficient way to do this is; for instance, should I build something that iteratively goes through and "fills the routes up" with the new actual_pallet data? (A sketch of this idea follows the wanted output below.)
# x = planned_routes
# y = actual_shippers
# z = cost of ad hocs and cancellations
# w = truck eligibility
def optimal_trips(x, y, z, w):
    # Step 1: take in the actual shippers' package and pallet data.
    # Step 2: feed the actual data into the planned routes and add routes based on demand.
    # Step 3: return a df with the new optimal routes.
    ...
Code for the dfs (to replicate)
Input 1:
planned_routes = pd.DataFrame({
    'date': ['2021-12-01', '2021-12-02'],
    'planned_route_id': ['R1', 'R2'],
    'S1_id': ['S1', 'S1'],
    'S2_id': ['S2', 'S2'],
    'S3_id': ['NaN', 'NaN'],
    'S4_id': ['NaN', 'NaN'],
    'S1_planned_packages': [110, 100],
    'S2_planned_packages': [120, 100],
    'S3_planned_packages': ['NaN', 'NaN'],
    'S4_planned_packages': ['NaN', 'NaN'],
    'total_planned_packages': [230, 200],
    'S1_planned_pallets': [11, 10],
    'S2_planned_pallets': [12, 10],
    'S3_planned_pallets': ['NaN', 'NaN'],
    'S4_planned_pallets': ['NaN', 'NaN'],
    'total_pallets': [23, 20],
    'truck_max_capacity': [24, 24],
    'cost_route': [120, 120]
})
Input 2:
actual_shippers = pd.DataFrame({
    'date': ['2021-12-01', '2021-12-01', '2021-12-02', '2021-12-02'],
    'shipper_id': ['S1', 'S2', 'S1', 'S2'],
    'actual_packages': [140, 130, 140, 130],
    'shipper_spp': [10, 10, 10, 10],
    'actual_pallets': [14, 13, 14, 13],
    'shipper_max_eligibility': [24, 24, 24, 24],
    'truck_max_capacity': [24, 24, 24, 24]
})
Wanted output:
HOR_df = pd.DataFrame({
    'date': ['2021-12-01', '2021-12-01', '2021-12-02', '2021-12-02'],
    'planned_route_id': ['R1', 'R3', 'R2', 'R4'],
    'S1_id': ['S1', 'S2', 'S1', 'S2'],
    'S2_id': ['S2', 'NaN', 'S2', 'NaN'],
    'S3_id': ['NaN', 'NaN', 'NaN', 'NaN'],
    'S4_id': ['NaN', 'NaN', 'NaN', 'NaN'],
    'S1_actual_packages': [140, 0, 140, 0],
    'S2_actual_packages': [100, 30, 100, 30],
    'S3_actual_packages': ['NaN', 'NaN', 'NaN', 'NaN'],
    'S4_actual_packages': ['NaN', 'NaN', 'NaN', 'NaN'],
    'total_planned_packages': [240, 30, 240, 30],  # sum of S1_actual_packages, S2_actual_packages, ...
    'S1_actual_pallets': [14, 3, 14, 3],
    'S2_actual_pallets': [10, 'NaN', 10, 'NaN'],
    'S3_actual_pallets': ['NaN', 'NaN', 'NaN', 'NaN'],
    'S4_actual_pallets': ['NaN', 'NaN', 'NaN', 'NaN'],
    'total_pallets': [24, 3, 24, 3],  # sum of S1_actual_pallets, S2_actual_pallets, ...
    'truck_max_capacity': [24, 24, 24, 24],
    'cost_route': [120, 130, 120, 130]
})
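A minimal sketch of the iterative "fill the routes up" idea mentioned above, using a simplified long-format input (hypothetical columns date, shipper_id and actual_pallets rather than the wide layout above): it greedily packs shippers into trucks and spills any overflow into a new route.
import pandas as pd

actual = pd.DataFrame({
    'date': ['2021-12-01', '2021-12-01'],
    'shipper_id': ['S1', 'S2'],
    'actual_pallets': [14, 13],
})

def fill_routes(day_df, capacity):
    """Greedily pack shippers into trucks of `capacity` pallets,
    splitting a shipper across trucks when it does not fit."""
    routes, current, load = [], [], 0
    for _, row in day_df.iterrows():
        pallets = row['actual_pallets']
        while pallets > 0:
            take = min(pallets, capacity - load)
            current.append((row['shipper_id'], take))
            load += take
            pallets -= take
            if load == capacity:  # truck is full: close this route, open a new one
                routes.append(current)
                current, load = [], 0
    if current:
        routes.append(current)
    return routes

print(fill_routes(actual, capacity=24))
# [[('S1', 14), ('S2', 10)], [('S2', 3)]]  -> matches R1 and the new R3 above
Mapping the per-truck lists back onto the wide HOR_df layout (route ids, package counts, costs) is then bookkeeping on top of this loop.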

Why is this Groupby transform not working?

For a dummy dataset, in which each id corresponds to one match:
df2 = pd.DataFrame(columns=['id', 'score', 'duration', 'user'],
                   data=[[1, 800, 60, 'abc'], [1, 900, 60, 'zxc'],
                         [2, 800, 250, 'abc'], [2, 5000, 250, 'bvc'],
                         [3, 6000, 250, 'zxc'], [3, 8000, 250, 'klp'],
                         [4, 1400, 500, 'kod'], [4, 8000, 500, 'bvc']])
If I want to keep only the records where every row of the same id has duration greater than 120 and score greater than 1500, this works fine:
cond = df2['duration'].gt(120) & df2['score'].gt(1500)
out = df2[cond.groupby(df2['id']).transform('all')]
and returns the 2 instances of the same id. However, if I want to keep the pairs of ids where the user is 'abc', it does not work. I have tried:
out = df2[(df2['user'].eq('abc')).groupby(df2['id']).transform('all')]
out = df2[(df2['user'] == 'abc').groupby(df2['id']).transform('all')]
and they both return empty DataFrames. How can I solve this? The outcome should be any match that user 'abc' played in.
From the comments, you want 'any', not 'all': transform('all') only keeps a group if every row in it matches, and since each match involves two different users no group can consist entirely of 'abc'. transform('any') keeps every group that contains 'abc' at all:
out = df2[(df2['user'] == 'abc').groupby(df2['id']).transform('any')]
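For reference, with the dummy frame above this keeps every row of the matches that 'abc' played in (ids 1 and 2); the printed output is approximately:
print(out)
#    id  score  duration user
# 0   1    800        60  abc
# 1   1    900        60  zxc
# 2   2    800       250  abc
# 3   2   5000       250  bvc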

Splitting ranges according to list of excluded ranges

I have two arrays:
ranges = [[50, 60], [100, 100], [5000, 6000]]
exclude = [[1000, 1000], [5060, 5060]]
How can I get the result as a sorted list, like this?
[[50, 60], [100, 100], [5000, 5059], [5061, 6000]]
Basically, remove the ranges of the second list from the ranges of the first list, creating new ranges where needed.
Another example:
ranges = [[2, 124235]]
exclude = [[2000, 3000], [400, 2500]]
which should give the output
[[2, 399], [3001, 124235]]
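One possible approach, sketched below: merge the exclude intervals first, then walk each range and cut it at every overlapping exclusion. This assumes inclusive integer bounds, as in the examples:
def subtract_ranges(ranges, exclude):
    # merge overlapping/adjacent exclude intervals
    merged = []
    for lo, hi in sorted(exclude):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    out = []
    for lo, hi in sorted(ranges):
        for ex_lo, ex_hi in merged:
            if ex_hi < lo or ex_lo > hi:
                continue  # no overlap with this exclusion
            if ex_lo > lo:
                out.append([lo, ex_lo - 1])  # keep the part before the exclusion
            lo = ex_hi + 1  # resume after the excluded block
            if lo > hi:
                break
        if lo <= hi:
            out.append([lo, hi])
    return out

print(subtract_ranges([[50, 60], [100, 100], [5000, 6000]],
                      [[1000, 1000], [5060, 5060]]))
# [[50, 60], [100, 100], [5000, 5059], [5061, 6000]]
print(subtract_ranges([[2, 124235]], [[2000, 3000], [400, 2500]]))
# [[2, 399], [3001, 124235]]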

Rolling sum for a window of 2 days

I am trying to compute, in Python, a rolling two-day sum (based on Trans_Date) of the Amount column, grouped by ID, for the table below.
ID  Trans_Date  Trans_Time  Amount
1   03/23/2019  06:51:03       100
1   03/24/2019  12:32:48       600
1   03/24/2019  14:15:35       250
1   06/05/2019  16:18:21        75
2   02/01/2019  18:02:52       200
2   02/02/2019  10:03:02       150
2   02/03/2019  23:47:51       800
3   01/18/2019  11:12:58      1000
3   01/23/2019  22:12:41        15
Ultimately, I am trying to achieve the result below:
ID  Trans_Date  Trans_Time  Amount  2d_Running_Total
1   03/23/2019  06:51:03       100               100
1   03/24/2019  12:32:48       600               700
1   03/24/2019  14:15:35       250               950
1   06/05/2019  16:18:21        75                75
2   02/01/2019  18:02:52       200               200
2   02/02/2019  10:03:02       150               350
2   02/03/2019  23:47:51       800               950
3   01/18/2019  11:12:58      1000              1000
3   01/23/2019  22:12:41        15                15
This thread came very close to solving it, but for records that have multiple transactions on the same day it gives the same value for each of those rows:
https://python-forum.io/Thread-Rolling-sum-for-a-window-of-2-days-Pandas
This should do it:
import pandas as pd
# create dummy data
df = pd.DataFrame(
    columns=['ID', 'Trans_Date', 'Trans_Time', 'Amount'],
    data=[
        [1, '03/23/2019', '06:51:03', 100],
        [1, '03/24/2019', '12:32:48', 600],
        [1, '03/24/2019', '14:15:35', 250],
        [1, '06/05/2019', '16:18:21', 75],
        [2, '02/01/2019', '18:02:52', 200],
        [2, '02/02/2019', '10:03:02', 150],
        [2, '02/03/2019', '23:47:51', 800],
        [3, '01/18/2019', '11:12:58', 1000],
        [3, '01/23/2019', '22:12:41', 15]
    ]
)
df_out = pd.DataFrame(
    columns=['ID', 'Trans_Date', 'Trans_Time', 'Amount', '2d_Running_Total'],
    data=[
        [1, '03/23/2019', '06:51:03', 100, 100],
        [1, '03/24/2019', '12:32:48', 600, 700],
        [1, '03/24/2019', '14:15:35', 250, 950],
        [1, '06/05/2019', '16:18:21', 75, 75],
        [2, '02/01/2019', '18:02:52', 200, 200],
        [2, '02/02/2019', '10:03:02', 150, 350],
        [2, '02/03/2019', '23:47:51', 800, 950],
        [3, '01/18/2019', '11:12:58', 1000, 1000],
        [3, '01/23/2019', '22:12:41', 15, 15]
    ]
)
# combine date and time into a datetime object and set it as the index
df['Trans_DateTime'] = pd.to_datetime(df['Trans_Date'] + ' ' + df['Trans_Time'])
df = df.set_index('Trans_DateTime')
# group by ID and apply a 2-day rolling window to the Amount column
df['2d_Running_Total'] = df.groupby('ID')['Amount'].rolling('2d').sum().values.astype(int)
df = df.reset_index(drop=True)
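As a quick sanity check, assuming the completed df_out above, the computed column should match the wanted output:
assert df['2d_Running_Total'].tolist() == df_out['2d_Running_Total'].tolist()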

Given a dictionary of lists, is there a pythonic, smart way of comparing the i-th element of each list and extracting the maximum value?

I have a dictionary like the following:
{
    'k0': [10, 35, 20],
    'k1': [2, 0, 40],
    'k2': [21, 400, 5],
}
I want to obtain a list with the maximum value at each i-th position across the lists. For instance, in this case:
max_val_list = [21, 400, 40]
Current way of doing it (which seems too messy to me):
1. Extract the lists:
k0_list = dicc_name['k0']
k1_list = dicc_name['k1']
k2_list = dicc_name['k2']
2. Find the max:
max_val_list = []
for i, item in enumerate(k0_list):
    max_val_list.append(max([item, k1_list[i], k2_list[i]]))
I am sure there must be an elegant way to do it directly from the dictionary, and I would like to learn it.
You can zip the values of the dict, and get the max of each column:
data = {
    'k0': [100, 35, 20],
    'k1': [2, 0, 40],
    'k2': [21, 400, 5],
}
[max(col) for col in zip(*data.values())]
# [100, 400, 40]
If you use NumPy:
>>> import numpy as np
>>> np.max([*data.values()], axis=0).tolist()
[100, 400, 40]
