Python PuLP performance issue - taking too much time to solve

I am using PuLP to create an allocator function that packs items into trucks based on weight and volume. It works fine (takes 10-15 seconds) for 10-15 items, but when I double the number of items it takes more than half an hour to solve.
import pulp
from pulp import LpProblem, LpMinimize, LpInteger, lpSum

def allocator(item_mass, item_vol, truck_mass, truck_vol, truck_cost, id_series):
    n_items = len(item_vol)
    set_items = range(n_items)
    n_trucks = len(truck_cost)
    set_trucks = range(n_trucks)
    print("working1")

    y = pulp.LpVariable.dicts('truckUsed', set_trucks,
                              lowBound=0, upBound=1, cat=LpInteger)
    x = pulp.LpVariable.dicts('itemInTruck', (set_items, set_trucks),
                              lowBound=0, upBound=1, cat=LpInteger)
    print("working2")

    # Model formulation
    prob = LpProblem("Truck allocation problem", LpMinimize)

    # Objective
    prob += lpSum([truck_cost[i] * y[i] for i in set_trucks])
    print("working3")

    # Constraints
    for j in set_items:
        # Every item must be taken in one truck
        prob += lpSum([x[j][i] for i in set_trucks]) == 1

    for i in set_trucks:
        # Respect the mass constraint of trucks
        prob += lpSum([item_mass[j] * x[j][i] for j in set_items]) <= truck_mass[i] * y[i]
        # Respect the volume constraint of trucks
        prob += lpSum([item_vol[j] * x[j][i] for j in set_items]) <= truck_vol[i] * y[i]
    print("working4")

    # Ensure y variables have to be set to make use of x variables
    for j in set_items:
        for i in set_trucks:
            prob += x[j][i] <= y[i]
    print("working5")

    s = id_series  # id_series
    prob.solve()
    print("working6")
This is the data I am running it on:
items:
Name Pid Quantity Length Width Height Volume Weight t_type
0 A 1 1 4.60 4.30 4.3 85.05 1500 Open
1 B 2 1 4.60 4.30 4.3 85.05 1500 Open
2 C 3 1 6.00 5.60 9.0 302.40 10000 Container
3 D 4 1 8.75 5.60 6.6 441.00 1000 Open
4 E 5 1 6.00 5.16 6.6 204.33 3800 Open
5 C 6 1 6.00 5.60 9.0 302.40 10000 All
6 C 7 1 6.00 5.60 9.0 302.40 10000 Container
7 D 8 1 8.75 5.60 6.6 441.00 6000 Open
8 E 9 1 6.00 5.16 6.6 204.33 3800 Open
9 C 10 1 6.00 5.60 9.0 302.40 10000 All
.... times 5
trucks (this is just the top 5 rows; I have 54 types of trucks in total):
  Category       Name  TruckID  Length(ft)  Breadth(ft)  Height(ft)   Volume  Weight  Price
0      LCV  Tempo 407        0         9.5          5.5         5.5  287.375    1500      1
1      LCV  Tempo 407        1         9.5          5.5         5.5  287.375    2000      1
2      LCV  Tempo 407        2         9.5          5.5         5.5  287.375    2500      2
3      LCV    13 Feet        3        13.0          5.5         7.0  500.500    3500      3
4      LCV    14 Feet        4        14.0          6.0         6.0  504.000    4000      3
where ItemId is this:
data["ItemId"] = data.index + 1
id_series = data["ItemId"].tolist()

PuLP can handle multiple solvers. See which ones you have available with:
pulp.pulpTestAll()
This will give a list like:
Solver pulp.solvers.PULP_CBC_CMD unavailable.
Solver pulp.solvers.CPLEX_DLL unavailable.
Solver pulp.solvers.CPLEX_CMD unavailable.
Solver pulp.solvers.CPLEX_PY unavailable.
Testing zero subtraction
Testing continuous LP solution
Testing maximize continuous LP solution
...
* Solver pulp.solvers.COIN_CMD passed.
Solver pulp.solvers.COINMP_DLL unavailable.
Testing zero subtraction
Testing continuous LP solution
Testing maximize continuous LP solution
...
* Solver pulp.solvers.GLPK_CMD passed.
Solver pulp.solvers.XPRESS unavailable.
Solver pulp.solvers.GUROBI unavailable.
Solver pulp.solvers.GUROBI_CMD unavailable.
Solver pulp.solvers.PYGLPK unavailable.
Solver pulp.solvers.YAPOSIB unavailable.
You can then solve using, e.g.:
lp_prob.solve(pulp.COIN_CMD())
Gurobi and CPLEX are commercial solvers that tend to work quite well. Perhaps you have access to one of them? Gurobi offers a good academic license.
Alternatively, you may wish to look into an approximate solution, depending on your quality constraints.
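If you stay with the bundled CBC/COIN solver, you can also cap the runtime and accept a small optimality gap instead of waiting for a proven optimum. A minimal sketch, assuming the prob model from the question; note that the keyword names vary between PuLP releases (older versions use maxSeconds/fracGap, newer ones use timeLimit/gapRel):
import pulp

# Hedged sketch: stop after 300 seconds or once within 5% of the optimum.
# Check the keyword names against your installed PuLP version.
prob.solve(pulp.COIN_CMD(maxSeconds=300, fracGap=0.05))
print(pulp.LpStatus[prob.status])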

Related

Printing results of a function using a range of numbers

I am just learning python and I'm simply trying to print the results of a function using a range of numbers, but I am getting the error "The truth value of an array with more than one element is ambiguous."
print(t1) works and shows the range I want to use in the calculations.
print(some_function(55,t1)) produces the error
What am I missing?
Please note, I am doing this to help someone with an assignment, and they can only use commands or functions they have been shown, which is not a lot: basically just what's in the current code and arrays.
Thanks for any help
from pylab import *

def some_function(ff, dd):
    if dd >= 0 and dd <= 300:
        tt = (22/-90)*ff + 24
    elif dd >= 300 and dd <= 1000:
        st = (22/-90)*ff + 24
        gg = (st - 2)/-800
        tt = gg*dd + (gg*-1000 + 2)
    else:
        tt = 2.0
    return tt

t1 = arange(0, 12000, 1000)
print(t1)
print(some_function(55, t1))
You are only making a minor error.
t1=arange(0,12000,1000)
print(t1)
[ 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000]
You have to loop through t1 and call the function for each value in the loop.
for x in t1:
    print(some_function(55, x))
10.555555555555555
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
We are missing most of the interesting part of the function because of the coarse spacing of the values in t1. Let's adjust the range a bit.
t1=arange(0,2000,100)
print(t1)
[ 0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300
1400 1500 1600 1700 1800 1900]
And the resultant function:
for x in t1:
    print(some_function(55, x))
10.555555555555555
10.555555555555555
10.555555555555555
10.555555555555555
8.416666666666668
7.347222222222222
6.277777777777779
5.208333333333334
4.138888888888889
3.0694444444444446
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
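If the restriction to already-covered commands is ever relaxed, the same branching can also be applied element-wise to the whole array instead of using a Python loop. A minimal sketch, assuming NumPy (which pylab already imports) and reproducing the branch order of some_function:
import numpy as np

def some_function_vec(ff, dd):
    # Element-wise version of the same piecewise definition.
    dd = np.asarray(dd, dtype=float)
    st = (22/-90)*ff + 24
    gg = (st - 2)/-800
    conditions = [(dd >= 0) & (dd <= 300), (dd >= 300) & (dd <= 1000)]
    choices = [np.full_like(dd, st), gg*dd + (gg*-1000 + 2)]
    return np.select(conditions, choices, default=2.0)

t1 = np.arange(0, 2000, 100)
print(some_function_vec(55, t1))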

Finding smallest value in python

I am looking for some help with creating code for the following problem in Python.
I have made an attempt at an answer, but I am not quite sure how to finish it. Here is what I have so far:
import numpy as np
import math
from numpy import cos
x=10**(-p)
funct = (1-math.cos(x))/x
So I have defined the function that I am trying to calculate; I believe I did that correctly with
funct = (1-math.cos(x))/x
I have said what x needs to be with
x=10**(-p)
But how do I add the code to find the smallest value of p which has no correct significant digit at x = 10**-p when using standard double precision?
Do I need to somehow use
print(min(funct))
Looking for some help with this execution, thanks!
Edit: new code
import numpy as np
import math

for p in range(10):
    x = 10.0**-p
    result = (1 - np.cos(x))/x
    print(p)
    print(result)
    Test = 2*np.sin(x/2)**2/x
    print(p)
    print(Test)
gives the results:
0
0.459697694132
0
0.459697694132
1
0.0499583472197
1
0.0499583472197
2
0.00499995833347
2
0.00499995833347
3
0.000499999958326
3
0.000499999958333
4
4.99999996961e-05
4
4.99999999583e-05
5
5.0000004137e-06
5
4.99999999996e-06
6
5.00044450291e-07
6
5e-07
7
4.99600361081e-08
7
5e-08
8
0.0
8
5e-09
9
0.0
9
5e-10
With the loop
for p in range(15): x = 10.0**-p; print(p, x, (1-np.cos(x))/x, 2*np.sin(x/2)**2/x)
I get the values for the expression and a theoretically equivalent expression:
p x (1-cos(x))/x 2*sin²(x/2)/x
0 1.0 0.459697694132 0.459697694132
1 0.1 0.0499583472197 0.0499583472197
2 0.01 0.00499995833347 0.00499995833347
3 0.001 0.000499999958326 0.000499999958333
4 0.0001 4.99999996961e-05 4.99999999583e-05
5 1e-05 5.0000004137e-06 4.99999999996e-06
6 1e-06 5.00044450291e-07 5e-07
7 1e-07 4.99600361081e-08 5e-08
8 1e-08 0.0 5e-09
9 1e-09 0.0 5e-10
10 1e-10 0.0 5e-11
11 1e-11 0.0 5e-12
12 1e-12 0.0 5e-13
13 1e-13 0.0 5e-14
14 1e-14 0.0 5e-15
but I have no idea how to interpret the task precisely enough to give a single valid answer. It could be p=5 or it could be p=8.
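One hedged way to make the comparison concrete is to treat the rewritten expression 2*sin²(x/2)/x as the reference value and look at the relative error of the naive formula. Under the reading "no correct significant digit means a relative error of about 1 or more", the table above points to p = 8, where the naive formula returns exactly 0.0. A minimal sketch:
import numpy as np

for p in range(15):
    x = 10.0**-p
    naive = (1 - np.cos(x))/x        # suffers catastrophic cancellation
    stable = 2*np.sin(x/2)**2/x      # algebraically identical, numerically stable
    rel_err = abs(naive - stable)/stable
    print(p, rel_err)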

Multiplying data within columns python

I've been working on this all morning and for the life of me cannot figure it out. I'm sure this is very basic, but I've become so frustrated that my mind is clouded. I'm attempting to calculate the total return of a portfolio of securities at each (monthly) date.
The formula is (1 + r1) * (1 + r2) * ... * (1 + rt) - 1
Here is what I'm working with:
Adj_Returns = Adj_Close/Adj_Close.shift(1)-1
Adj_Returns['Risk Parity Portfolio'] = (Adj_Returns.loc['2003-01-31':]*Weights.shift(1)).sum(axis = 1)
Adj_Returns
SPY IYR LQD Risk Parity Portfolio
Date
2002-12-31 NaN NaN NaN 0.000000
2003-01-31 -0.019802 -0.014723 0.000774 -0.006840
2003-02-28 -0.013479 0.019342 0.015533 0.011701
2003-03-31 -0.001885 0.010015 0.001564 0.003556
2003-04-30 0.088985 0.045647 0.020696 0.036997
For example, with 2002-12-31 being base 100 for risk parity, I want 2003-01-31 to be 99.316 (100 * (1-0.006840)), 2003-02-28 to be 100.478 (99.316 * (1+ 0.011701)) so on and so forth.
Thanks!!
You want to use pd.DataFrame.cumprod
df.add(1).cumprod().sub(1).sum(1)
Consider the dataframe of returns df
np.random.seed([3,1415])
df = pd.DataFrame(np.random.normal(.025, .03, (10, 5)), columns=list('ABCDE'))
df
A B C D E
0 -0.038892 -0.013054 -0.034115 -0.042772 0.014521
1 0.024191 0.034487 0.035463 0.046461 0.048123
2 0.006754 0.035572 0.014424 0.012524 -0.002347
3 0.020724 0.047405 -0.020125 0.043341 0.037007
4 -0.003783 0.069827 0.014605 -0.019147 0.056897
5 0.056890 0.042756 0.033886 0.001758 0.049944
6 0.069609 0.032687 -0.001997 0.036253 0.009415
7 0.026503 0.053499 -0.006013 0.053447 0.047013
8 0.062084 0.029664 -0.015238 0.029886 0.062748
9 0.048341 0.065248 -0.024081 0.019139 0.028955
We can see the cumulative return or total return is
df.add(1).cumprod().sub(1)
A B C D E
0 -0.038892 -0.013054 -0.034115 -0.042772 0.014521
1 -0.015641 0.020983 0.000139 0.001702 0.063343
2 -0.008993 0.057301 0.014565 0.014247 0.060847
3 0.011544 0.107423 -0.005853 0.058206 0.100105
4 0.007717 0.184750 0.008666 0.037944 0.162699
5 0.065046 0.235405 0.042847 0.039769 0.220768
6 0.139183 0.275786 0.040764 0.077464 0.232261
7 0.169375 0.344039 0.034505 0.135051 0.290194
8 0.241974 0.383909 0.018742 0.168973 0.371151
9 0.302013 0.474207 -0.005791 0.191346 0.410852
Plot it
df.add(1).cumprod().sub(1).plot()
Add sum of returns to new column
df.assign(Portfolio=df.add(1).cumprod().sub(1).sum(1))
A B C D E Portfolio
0 -0.038892 -0.013054 -0.034115 -0.042772 0.014521 -0.114311
1 0.024191 0.034487 0.035463 0.046461 0.048123 0.070526
2 0.006754 0.035572 0.014424 0.012524 -0.002347 0.137967
3 0.020724 0.047405 -0.020125 0.043341 0.037007 0.271425
4 -0.003783 0.069827 0.014605 -0.019147 0.056897 0.401777
5 0.056890 0.042756 0.033886 0.001758 0.049944 0.603835
6 0.069609 0.032687 -0.001997 0.036253 0.009415 0.765459
7 0.026503 0.053499 -0.006013 0.053447 0.047013 0.973165
8 0.062084 0.029664 -0.015238 0.029886 0.062748 1.184749
9 0.048341 0.065248 -0.024081 0.019139 0.028955 1.372626
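Applied to the original DataFrame, the same idea gives the base-100 index described in the question. A short sketch, assuming the Adj_Returns frame built above (the column name 'RP Index' is just for illustration):
# Cumulative growth of the risk parity column, rebased to 100 at the first date.
Adj_Returns['RP Index'] = 100 * Adj_Returns['Risk Parity Portfolio'].add(1).cumprod()
print(Adj_Returns['RP Index'].head())
# 2002-12-31 -> 100.000, 2003-01-31 -> 99.316, 2003-02-28 -> 100.478, ...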

Portfolio Selection in Python with constraints from a fixed set

I am working on a project where I am trying to select the optimal subset of players from a set of 125 players (example below).
The constraints are:
a) Number of players = 3
b) Sum of prices <= 30
The optimization function is Max(Sum of Votes)
Player Vote Price
William Smith 0.67 8.6
Robert Thompson 0.31 6.7
Joseph Robinson 0.61 6.2
Richard Johnson 0.88 4.3
Richard Hall 0.28 9.7
I looked at the scipy.optimize package, but I can't find a way to constrain the universe to this subset. Can anyone point me to a library that would do that?
Thanks
The problem is well suited to being formulated as a mathematical program and can be solved with different optimization libraries.
It is known as the exact k-item knapsack problem.
You can use the Package PuLP for example. It has interfaces to different optimization software packages, but comes bundled with a free solver.
easy_install pulp
Free solvers are often way slower than commercial ones, but I think PuLP should be able to solve reasonably large versions of your problem with its standard solver.
Your problem can be solved with PuLP as follows:
from pulp import *
# Data input
players = ["William Smith", "Robert Thompson", "Joseph Robinson", "Richard Johnson", "Richard Hall"]
vote = [0.67, 0.31, 0.61, 0.88, 0.28]
price = [8.6, 6.7, 6.2, 4.3, 9.7]
P = range(len(players))
# Declare problem instance, maximization problem
prob = LpProblem("Portfolio", LpMaximize)
# Declare decision variable x, which is 1 if a
# player is part of the portfolio and 0 else
x = LpVariable.matrix("x", list(P), 0, 1, LpInteger)
# Objective function -> Maximize votes
prob += sum(vote[p] * x[p] for p in P)
# Constraint definition
prob += sum(x[p] for p in P) == 3
prob += sum(price[p] * x[p] for p in P) <= 30
# Start solving the problem instance
prob.solve()
# Extract solution
portfolio = [players[p] for p in P if x[p].varValue]
print(portfolio)
The runtime to draw 3 players from 125 with the same random data as used by Brad Solomon is 0.5 seconds on my machine.
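If you also want to report the achieved objective and the money spent after solving, a short hedged follow-up using the prob, x, vote and price objects from the snippet above:
# Assumes prob.solve() has already been called on the model above.
print("Status:", LpStatus[prob.status])
print("Total votes:", value(prob.objective))
print("Total price:", sum(price[p] * x[p].varValue for p in P))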
Your problem is a discrete optimization task because of constraint a). You should introduce discrete variables to represent taken/not-taken players. Consider the following MiniZinc pseudocode:
array[players_num] of var bool: taken_players;
array[players_num] of float: votes;
array[players_num] of float: prices;
constraint sum (taken_players * prices) <= 30;
constraint sum (taken_players) = 3;
solve maximize sum (taken_players * votes);
As far as I know, you can't use scipy to solve such problems (e.g. this).
You can solve your problem in these ways:
You can generate the MiniZinc problem in Python and solve it by calling an external solver. It seems to be more scalable and robust.
You can use simulated annealing (a rough sketch follows after this list).
Mixed integer approach
The second option may seem simpler for you. But personally, I prefer the first one: it allows you to introduce a wide range of constraints, and the problem formulation feels more natural and clear.
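For completeness, here is a rough, hedged sketch of the simulated annealing option on the same 5-player data used in the PuLP answer; the cooling schedule and move rule are illustrative only, not a tuned implementation:
import math
import random

players = ["William Smith", "Robert Thompson", "Joseph Robinson", "Richard Johnson", "Richard Hall"]
vote = [0.67, 0.31, 0.61, 0.88, 0.28]
price = [8.6, 6.7, 6.2, 4.3, 9.7]

def total(selection, values):
    return sum(values[i] for i in selection)

random.seed(0)
# Start from a random feasible trio (3 players, total price <= 30).
current = set(random.sample(range(len(players)), 3))
while total(current, price) > 30:
    current = set(random.sample(range(len(players)), 3))
best = set(current)

T = 1.0
for step in range(20000):
    # Neighbour move: swap one selected player for one unselected player.
    out_p = random.choice(sorted(current))
    in_p = random.choice([i for i in range(len(players)) if i not in current])
    candidate = (current - {out_p}) | {in_p}
    if total(candidate, price) > 30:
        continue                          # reject infeasible moves
    delta = total(candidate, vote) - total(current, vote)
    if delta >= 0 or random.random() < math.exp(delta / T):
        current = candidate
    if total(current, vote) > total(best, vote):
        best = set(current)
    T *= 0.9995                           # geometric cooling

print([players[i] for i in sorted(best)], total(best, vote), total(best, price))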
#CaptainTrunky is correct, scipy.minimize will not work here.
Here is an admittedly crude workaround using itertools; please ignore it if one of the other methods has worked. Note that drawing 3 players from 125 creates 317,750 combinations, n!/((n - k)! * k!). Runtime on the main loop is roughly 6 minutes.
from itertools import combinations
import numpy as np
import pandas as pd
from pandas import DataFrame

df = DataFrame({'Player': np.arange(0, 125),
                'Vote': 10 * np.random.random(125),
                'Price': np.random.randint(1, 10, 125)})
df
Out[109]:
Player Price Vote
0 0 4 7.52425
1 1 6 3.62207
2 2 9 4.69236
3 3 4 5.24461
4 4 4 5.41303
.. ... ... ...
120 120 9 8.48551
121 121 8 9.95126
122 122 8 6.29137
123 123 8 1.07988
124 124 4 2.02374
players = df.Player.values
idx = pd.MultiIndex.from_tuples([i for i in combinations(players, 3)])
votes = []
prices = []
for i in combinations(players, 3):
    vote = df[df.Player.isin(i)].sum()['Vote']
    price = df[df.Player.isin(i)].sum()['Price']
    votes.append(vote); prices.append(price)
result = DataFrame({'Price' : prices, 'Vote' : votes}, index=idx)
# The index below is (first player, second player, third player)
result[result.Price <= 30].sort_values('Vote', ascending=False)
Out[128]:
Price Vote
63 87 121 25.0 29.75051
64 121 20.0 29.62626
64 87 121 19.0 29.61032
63 64 87 20.0 29.56665
65 121 24.0 29.54248
... ...
18 22 78 12.0 1.06352
23 103 20.0 1.02450
22 23 103 20.0 1.00835
18 22 103 15.0 0.98461
23 14.0 0.98372

Working with missing data

I have the following dataframe:
from pandas import *
from math import *
data=read_csv('agosto.csv')
Fecha DirViento MagViento
0 2011/07/01 00:00 N 6.6
1 2011/07/01 00:15 N 5.5
2 2011/07/01 00:30 N 6.6
3 2011/07/01 00:45 N 7.5
4 2011/07/01 01:00 --- 6.0
5 2011/07/01 01:15 --- 7.1
6 2011/07/01 01:30 S 4.7
7 2011/07/01 01:45 SE 3.1
.
.
.
The first thing I want to do is convert the wind direction values to numerical values, in order to obtain the u and v wind components. But when I perform the operations, the missing data (---) causes conflicts.
direccion = []
for i in data['DirViento']:
    if i == 'SSW':
        dir = 202.5
    if i == 'S':
        dir = 180.0
    if i == 'N':
        dir = 360.0
    if i == 'NNE':
        dir = 22.5
    if i == 'NE':
        dir = 45.0
    if i == 'ENE':
        dir = 67.5
    if i == 'E':
        dir = 90.0
    if i == 'ESE':
        dir = 112.5
    if i == 'SE':
        dir = 135.0
    if i == 'SSE':
        dir = 157.5
    if i == 'SW':
        dir = 225.0
    if i == 'WSW':
        dir = 247.5
    if i == 'W':
        dir = 270.0
    if i == 'WNW':
        dir = 292.5
    if i == 'NW':
        dir = 315.0
    if i == 'NNW':
        dir = 337.5
    direccion.append(dir)
data['DirViento'] = direccion
I get the following:
data['DirViento'].head()
0 67.5
1 67.5
2 67.5
3 67.5
4 67.5
Why is the missing data being assigned the value from the other rows? I get the components with the following code:
Vviento = []
Uviento = []
for i in range(0, len(data['MagViento'])):
    Uviento.append(data['MagViento'][i]*sin((data['DirViento'][i]+180)*(pi/180.0)))
    Vviento.append(data['MagViento'][i]*cos((data['DirViento'][i]+180)*(pi/180.0)))
data['PromeU'] = Uviento
data['PromeV'] = Vviento
Now the data is grouped to obtain statistics:
index=data.set_index(['Fecha','Hora'],inplace=True)
g = index.groupby(level=0)
but I get an error:
IndexError: index out of range for array
Am I doing something wrong? How can I perform the operations without taking the missing data into account?
I see one flaw in your code. Your conditional statement should look more like this:
if i == 'SSW':
    dir = 202.5
elif i == 'S':
    ...
else:
    dir = np.nan
Or you can reset the dir variable at the beginning of the loop. Otherwise, dir for a row with missing data will keep the value from the previous iteration.
But I think this code could be made more pythonic, for example something like this:
# test DataFrame
import pandas as pd
df = pd.DataFrame({'DirViento': ['N', 'N', 'N', 'N', '--', '--', 'S', 'SE']})
DirViento
0 N
1 N
2 N
3 N
4 --
5 --
6 S
7 SE
# create points of compass list
dir_lst = ['NNE','NE','ENE','E','ESE','SE','SSE','S','SSW','SW','WSW','W','WNW','NW','NNW','N']
# create dictionary from it
dir_dict = {x: (i + 1) * 22.5 for i, x in enumerate(dir_lst)}
# add a new column
df['DirViento2'] = df['DirViento'].apply(lambda x: dir_dict.get(x, None))
DirViento DirViento2
0 N 360
1 N 360
2 N 360
3 N 360
4 -- NaN
5 -- NaN
6 S 180
7 SE 135
Update: good suggestion from #DanAllan in the comments; the code becomes even shorter and even more pythonic:
df['DirViento2'] = df['DirViento'].replace(dir_dict)
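Once the missing directions are NaN, the u and v components from the question can be computed column-wise, with NaN simply propagating instead of raising errors. A minimal sketch, assuming a frame with the numeric DirViento2 column from above plus a MagViento column like the one in the question:
import numpy as np

# NaN directions propagate into NaN components automatically.
rad = np.deg2rad(df['DirViento2'] + 180)
df['PromeU'] = df['MagViento'] * np.sin(rad)
df['PromeV'] = df['MagViento'] * np.cos(rad)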
