Challenge : TSP problem and finding the right minimized order of points - python

I have stolen diamonds in a lot of different places. The places are on a coordinate system (x,y) where each place is named after a number and have an d-time for example:
Name X Y dT
1 283 248 0
2 100 118 184
3 211 269 993
4 200 200 948
5 137 152 0
6 297 263 513
7 345 256 481
8 265 212 0
9 185 222 840
10 214 180 1149
11 153 218 0
12 199 199 0
13 289 285 149
14 177 184 597
15 359 192 0
16 161 207 0
17 94 121 316
18 296 246 0
19 193 122 423
20 265 216 11
dT stand for due time and it's given for each place, which is the fixed time when we need to get the diamonds back before the thief move their hideout away.
Starting point is always 1.
I need to visit all places only once and get the diamonds back such that the total delay is minimized.
Distance is calculated with Euclidean distance rounded to its closest integer.
The arrival time for each places is calculated as in distance + previous distance. The delay for each place is arrival-due and the total delay is the sum of the delays between places.
If the police can get the diamonds before the due time of that place, then the delay is equal to 0; otherwise, the delay equals the difference between the time of arrival and due time of the place.
My mission is to find the right order in which the police can visit each place once that minimizes the delay for two larger instances.
I think I'm pretty much close to an answer myself but I would love to know how would you solve it and also to get a better understanding of the math behind it to be able to program it better.
Here are my codes that calculate everything, the only thing missing is the way to find the right order :
#------------------------------------------------------------------------
poss=[(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)] # the order
here=[]
for p in range(len(poss)):
tempos=[]
for o in range(len(index)):
point=poss[p][o]
valuez=order[point-1]
tempos.append(valuez)
here.append(tempos)
#//DUE//
due =[[item[b][3] for b in range(len(index))] for item in here]
#//DISTANCE//
x_ = [[item[b][1] for b in range(len(index))] for item in here]
y_ = [[item[b][2] for b in range(len(index))] for item in here]
z = [list(zip(x_[a],y_[a])) for a in range(len(x_))]
dis = []
for aa in range(len(poss)) :
tempor=[]
for i in range(len(index)-1):
firstpoint = z[aa][i]
secondpoint = z[aa][i+1]
distance = round(np.linalg.norm(np.array(secondpoint)-np.array(firstpoint)))
distance = int(distance)
tempor.append(distance)
dis.append(tempor)
#//ARRIVAL TIME//
#Arrival time is the sum of the pv distance.
arrival = []
for v in range(len(poss)):
forone = [0,dis[v][0],]
for r in range(len(index)-2):
sumz = dis[v][r+1] + forone[r+1]
sumz = int(sumz)
forone.append(sumz)
arrival.append(forone)
#//DELAY//
delay=[]
for d in range(len(poss)) :
tempo=[]
for q in range(len(index)):
v=arrival[d][q]-due[d][q]
if arrival[d][q] <= due[d][q]:
tempo.append(0)
else :
tempo.append(v)
delay.append(tempo)
#//ORDER//
#goal is to find the right order that minimizes the delay for two larger instances.
total = [sum(x) for x in delay]
small= min(total)
final=here[total.index(small)]
print(small)

You could solve this by implmenting the travelling salesman problem, but it needs a simple modification. In the TSP, the cost of moving to the next location is just its distance from your current location. In your algorithm, the cost is calculated:
if current_travelling_time < dT then
cost = 0
else
cost = dT - current_travelling_time
The total cost of each path is calculated by summing up their costs. The path with the minimum cost is the one you want.
Solving this can be programmed simply by calculating these costs across all permutations of locations, bar the first location.
Note that this is very computationally expensive, so you should consider dynamic programming.
For this brute-force approach, see https://codereview.stackexchange.com/questions/110221/tsp-brute-force-optimization-in-python. The cost would need to be calculated differently, as I've mentioned.

Related

Calculating Distances and Filtering and Summing Values Pandas

I currently have data which contains a location name, latitude, longitude and then a number value associated locations. The final goal for me would to get a dataframe that has the sum of the values of each location within specific distance ranges. A sample dataframe is below:
IDVALUE,Latitude,Longitude,NumberValue
ID1,44.968046,-94.420307,1
ID2,44.933208,-94.421310,10
ID3,33.755787,-116.359998,15
ID4,33.844843,-116.54911,207
ID5,44.92057,-93.44786,133
ID6,44.240309,-91.493619,52
ID7,44.968041,-94.419696,39
ID8,44.333304,-89.132027,694
ID9,33.755783,-116.360066,245
ID10,33.844847,-116.549069,188
ID11,44.920474,-93.447851,3856
ID12,44.240304,-91.493768,189
Firstly, I managed to get the distances between each of them using the haversine function. Using the code below I turned the latlongs into radians and then created a matrix where the diagonals are infinite values.
df_latlongs['LATITUDE'] = np.radians(df_latlongs['LATITUDE'])
df_latlongs['LONGITUDE'] = np.radians(df_latlongs['LONGITUDE'])
dist = DistanceMetric.get_metric('haversine')
latlong_df = pd.DataFrame(dist.pairwise(df_latlongs[['LATITUDE','LONGITUDE']].to_numpy())*6373, columns=df_latlongs.IDVALUE.unique(), index=df_latlongs.IDVALUE.unique())
np.fill_diagonal(latlong_df.values, math.inf)
This distance matrix is then in kilometres. What I'm struggling with next is to be able to filter the distances of each of the locations and get the total number of values within a range and link this to the original dataframe.
Below is the code I have used to filter the distance matrix to get all of the locations within 500 meters:
latlong_df_rows = latlong_df[latlong_df < 0.5]
latlong_df_rows = latlong_df_rows.dropna(how='all', axis=0)
latlong_df_rows = latlong_df_rows.dropna(how='all', axis=1)
My attempt was to them get a list for each location of the locations that were in this value using the code below:
within_range_df = latlong_df_rows.apply(lambda row: row[row < 0.05].index.tolist(), axis=1)
within_range_df = within_range_df.to_frame()
within_range_df = within_range_df.dropna(how='all', axis=0)
within_range_df = within_range_df.dropna(how='all', axis=1)
From here I was going to try and get the NumberValue from the original dataframe by looping through the list of values to obtain another column for the number for that location. Then sum all of them. The final dataframe would ideally look like the following:
ID VALUE,<500m,500-1000m,>100m
ID1,x1,y1,z1
ID2,x2,y2,z2
ID3,x3,y3,z3
ID4,x4,y4,z4
ID5,x5,y5,z5
ID6,x6,y6,z6
ID7,x7,y7,z7
ID8,x8,y8,z8
ID9,x9,y9,z9
ID10,x10,y10,z10
ID11,x11,y11,z11
ID12,x12,y12,z12
Where x y and z are the total number values for the nearest locations for different distances. I know this is probably really weird and overcomplicated so any tips to change the question or anything else that is needed I'll be happy to provide. Cheers
I would define a helper function, making use of BallTree, e.g.
from sklearn.neighbors import BallTree
import pandas as pd
import numpy as np
df = pd.read_csv('input.csv')
We use query_radius() to get the IDs and use list comprehension to get the values and sum them;
locations_radians = np.radians(df[["Latitude","Longitude"]].values)
tree = BallTree(locations_radians, leaf_size=12, metric='haversine')
def summed_numbervalue_for_radius( radius_in_m=100):
distance_in_meters = radius_in_m
earth_radius = 6371000
radius = distance_in_meters / earth_radius
ids_within_radius = tree.query_radius(locations_radians, r=radius, count_only=False)
values_as_array = np.array(df.NumberValue)
summed_values = [values_as_array[ix].sum() for ix in ids_within_radius]
return np.array(summed_values)
With the helper function you can do for instance;
df = df.assign( sum_100=summed_numbervalue_for_radius(100))
df = df.assign( sum_500=summed_numbervalue_for_radius(500))
df = df.assign( sum_1000=summed_numbervalue_for_radius(1000))
df = df.assign( sum_1000_to_5000=summed_numbervalue_for_radius(5000)-summed_numbervalue_for_radius(1000))
Will give you
IDVALUE Latitude Longitude NumberValue sum_100 sum_500 sum_1000 \
0 ID1 44.968046 -94.420307 1 40 40 40
1 ID2 44.933208 -94.421310 10 10 10 10
2 ID3 33.755787 -116.359998 15 260 260 260
3 ID4 33.844843 -116.549110 207 395 395 395
4 ID5 44.920570 -93.447860 133 3989 3989 3989
5 ID6 44.240309 -91.493619 52 241 241 241
6 ID7 44.968041 -94.419696 39 40 40 40
7 ID8 44.333304 -89.132027 694 694 694 694
8 ID9 33.755783 -116.360066 245 260 260 260
9 ID10 33.844847 -116.549069 188 395 395 395
10 ID11 44.920474 -93.447851 3856 3989 3989 3989
11 ID12 44.240304 -91.493768 189 241 241 241
sum_1000_to_5000
0 10
1 40
2 0
3 0
4 0
5 0
6 10
7 0
8 0
9 0
10 0
11 0

Need help dividing the ratio of elements in a Python list

I am working on a problem where I have been asked to a) output Fibonacci numbers in a sequence based on user input, as I have done below, and b) divide and print the ratio of the two most recent terms.
fixed_start = [0, 1]
def fib(fixed_start, n):
if n == 0:
return fixed_start
else:
fixed_start.append(fixed_start[-1] + fixed_start[-2])
return fib(fixed_start, n -1)
numb = int(input('How many terms: '))
fibonacci_list = fib(fixed_start, numb)
print(fibonacci_list[:-1])
I would like for my output to look something like the below:
"How many terms:" 3
1 1
the ratio is 1.0
1 2
the ratio is 2.0
2 3
the ratio is 1.5
Are you looking for ratio of the last 2 items in the list? If yes, this should work.
print(fibonacci_list[-2:])
print(float(fibonacci_list[-1]/fibonacci_list[-2]))
Or, if you are looking for ratio between every 2 numbers (except 0 & 1 right at the start), the below code should do the trick
for x,y in zip(fibonacci_list[1:],fibonacci_list[2:]):
print(x,y)
print('the ratio is ' + str(round((y/x),3)))
output is something like below for a fibonacci list of 15 terms
1 1
the ratio is 1.0
1 2
the ratio is 2.0
2 3
the ratio is 1.5
3 5
the ratio is 1.667
5 8
the ratio is 1.6
8 13
the ratio is 1.625
13 21
the ratio is 1.615
21 34
the ratio is 1.619
34 55
the ratio is 1.618
55 89
the ratio is 1.618
89 144
the ratio is 1.618
144 233
the ratio is 1.618
233 377
the ratio is 1.618
377 610
the ratio is 1.618
610 987
the ratio is 1.618
As you have already solved the part one of generating the Fibonacci series in the form of a list, you can access the last two elements (most recent) from it and take their ratio. Python allows us to access the elements of the list from backwards using the negative indexing
def fibonacci_ratio(fibonacci_list):
last_element = fibonacci_list[-1]
second_last_element = fibonacci_list[-2]
ratio = last_element//second_last_element
return ratio
The double // in python will ensure floating point division.
Hope this helps!

Converting from AMPL to Pyomo

I'm trying to convert an AMPL model to Pyomo (something I have no experience with using). I'm finding the syntax hard to adapt to, especially the constraint and objective sections. I've already linked my computer together with python, anaconda, Pyomo, and GLPK, and just need to get the actual code down. I'm a beginner coder so forgive me if my code is poorly written. Still trying to get the hang of this!
Here is the data from the AMPL code:
set PROD := 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30;
set PROD1:= 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30;
ProdCost 414 3 46 519 876 146 827 996 922 308 568 176 58 13 20 974 121 751 130 844 280 123 275 843 717 694 72 413 65 631
HoldingCost 606 308 757 851 148 867 336 44 364 960 69 428 778 485 285 938 980 932 199 175 625 513 536 965 366 950 632 88 698 744
Demand 105 70 135 67 102 25 147 69 23 84 32 41 81 133 180 22 174 80 24 156 28 125 23 137 180 151 39 138 196 69
And here is the model:
set PROD; # set of production amounts
set PROD1; # set of holding amounts
param ProdCost {PROD} >= 0; # parameter set of production costs
param Demand {PROD} >= 0; # parameter set of demand at each time
param HoldingCost {PROD} >= 0; # parameter set of holding costs
var Inventory {PROD1} >= 0; # variable that sets inventory amount at each time
var Make {p in PROD} >= 0; # variable of amount produced at each time
minimize Total_Cost: sum {p in PROD} ((ProdCost[p] * Make[p]) + (Inventory[p] * HoldingCost[p]));
# Objective: minimize total cost from all production and holding cost
subject to InventoryConstraint {p in PROD}: Inventory[p] = Inventory[p-1] + Make[p] - Demand[p];
# excess production transfers to inventory
subject to MeetDemandConstraint {p in PROD}: Make[p] >= Demand[p] - Inventory[p-1];
# Constraint: holding and production must exceed demand
subject to InitialInventoryConstraint: Inventory[0] = 0;
# Constraint: Inventory must start at 0
Here's what I have so far. Not sure if it's right or not:
from pyomo.environ import *
demand=[105,70,135,67,102,25,147,69,23,84,32,41,81,133,180,22,174,80,24,156,28,125,23,137,180,151,39,138,196,69]
holdingcost=[606,308,757,851,148,867,336,44,364,960,69,428,778,485,285,938,980,932,199,175,625,513,536,965,366,950,632,88,698,744]
productioncost=[414,3,46,519,876,146,827,996,922,308,568,176,58,13,20,974,121,751,130,844,280,123,275,843,717,694,72,413,65,631]
model=ConcreteModel()
model.I=RangeSet(1,30)
model.J=RangeSet(0,30)
model.x=Var(model.I, within=NonNegativeIntegers)
model.y=Var(model.J, within=NonNegativeIntegers)
model.obj = Objective(expr = sum(model.x[i]*productioncost[i]+model.y[i]*holdingcost[i] for i in model.I))
def InventoryConstraint(model, i):
return model.y[i-1] + model.x[i] - demand[i] <= model.y[i]
InvCont = Constraint(model, rule=InventoryConstraint)
def MeetDemandConstraint(model, i):
return model.x[i] >= demand[i] - model.y[i-1]
DemCont = Constraint(model, rule=MeetDemandConstraint)
def Initial(model):
return model.y[0] == 0
model.Init = Constraint(rule=Initial)
opt = SolverFactory('glpk')
results = opt.solve(model,load_solutions=True)
model.solutions.store_to(results)
results.write()
Thanks!
The only issues I see are in some of your constraint declarations. You need to attach the constraints to the model and the first argument passed in should be the indexing set (which I'm assuming should be model.I).
def InventoryConstraint(model, i):
return model.y[i-1] + model.x[i] - demand[i] <= model.y[i]
model.InvCont = Constraint(model.I, rule=InventoryConstraint)
def MeetDemandConstraint(model, i):
return model.x[i] >= demand[i] - model.y[i-1]
model.DemCont = Constraint(model.I, rule=MeetDemandConstraint)
The syntax that you're using to solve the model is a little out-dated but should work. Another option would be:
opt = SolverFactory('glpk')
opt.solve(model,tee=True) # The 'tee' option prints the solver output to the screen
model.display() # This will print a summary of the model solution
Another command that is useful for debugging is model.pprint(). This will display the entire model including the expressions for Constraints and Objectives.

pandas group by multiple columns and remove rows based on multiple conditions

I have a dataframe which is as follows:
imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,1,491,182,78,1,1
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,5,451,95,48,2,1
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,455,342,84,93,9,-7
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130
Its a csv dump. From this I want to group by imagename and brandname. Wherever the values in xdiff and ydiff is less than 10 then remove the second line.
For example, from the first two lines I want to delete the second line, similarly from lines 3 and 4 I want to delete line 4.
I could do this quickly in R using dplyr group by, lag and lead functions. However, I am not sure how to combine different functions in python to achieve this. This is what I have tried so far:
df[df.groupby(['imagename','brandname']).xdiff.transform() <= 10]
Not sure what function should I call within transform and how to include ydiff too.
The expected output is as follows:
imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130
You can take individual groupby frames and apply the conditions through apply function
#df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if x['xdiff'].lt(10).any() else x)
df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if (x['xdiff'].lt(10).any() and x['ydiff'].lt(10).any()) else x)
Out:
imagename locationName brandname x y w h xdiff ydiff
2 95-20180407-215120-235505-00050.jpg Shirt DHFL 3 450 94 45 2 -41
5 95-20180407-215120-235505-00050.jpg Shirt DHFL 446 349 99 90 279 30
7 95-20180407-215120-235505-00050.jpg Shirt GOIBIBO 559 212 70 106 104 -130
0 95-20180407-215120-235505-00050.jpg Shirt SAMSUNG 0 490 177 82 0 0
4 95-20180407-215120-235505-00050.jpg DUGOUT VIVO 167 319 36 38 162 -132

Selecting rows with lowest values based on combination two columns from pandas

I'm not even sure if the title makes sense.
I have a pandas dataframe with 3 columns: x, y, time. There are a few thousand rows. Example below:
x y time
0 225 0 20.295270
1 225 1 21.134015
2 225 2 21.382298
3 225 3 20.704367
4 225 4 20.152735
5 225 5 19.213522
.......
900 437 900 27.748966
901 437 901 20.898460
902 437 902 23.347935
903 437 903 22.011992
904 437 904 21.231041
905 437 905 28.769945
906 437 906 21.662975
.... and so on
What I want to do is retrieve those rows which have the smallest time associated with x and y. Basically for every element on the y, I want to find which have the smallest time value but I want to exclude those that have time 0.0. This happens when x has the same value as y.
So for example, the fastest way to get to y-0 is by starting from x-225 and so on, therefore it could be the case that x repeats itself but for a different y.
e.g.
x y time
225 0 20.295270
438 1 19.648954
27 20 4.342732
9 438 17.884423
225 907 24.560400
I tried up until now groupby but I'm only getting the same x as y.
print(df.groupby('id_y', sort=False)['time'].idxmin())
y
0 0
1 1
2 2
3 3
4 4
The one below just returns the df that I already have.
df.loc[df.groupby("id_y")["time"].idxmin()]
Just to point out one thing, I'm open to options, not just groupby, if there are other ways that is very good.
So need remove rows with time equal first by boolean indexing and then use your solution:
df = df[df['time'] != 0]
df2 = df.loc[df.groupby("y")["time"].idxmin()]
Similar alternative with filter by query:
df = df.query('time != 0')
df2 = df.loc[df.groupby("y")["time"].idxmin()]
Or use sort_values with drop_duplicates:
df2 = df[df['time'] != 0].sort_values(['y','time']).drop_duplicates('y')

Categories