Python code went wrong for my optimization problem - python

I am new to Python. I tried to write some code for my optimization research model. Could you help me please? I don't know what's wrong with the code. I am using Python 2, by the way. Thank you.
for i in range(len(lift)):
    prob+=lpSum(dec_var[i])<=1

#constraints
col_con=[1,0,0,2,2,3,1,1]
dec_var=np.array(dec_var)
col_data=[]
for j in range(len(brands)):
    col_data.append(list(zip(*dec_var)[j]))
    prob+=lpSum(col_data[j])<=col_con[j]

#problem
prob.writeLP("SO.lp")
#solve the problem
prob.solve()
print("The maximum Total lift obtained is:",value(prob.objective)) # print the output
#print the decision variable output matrix
Matrix=[[0 for X in range(len(lift[0]))] for y in range(len(lift))]
for v in prob.variables():
    Matrix[int(v.name.split("_")[2])][int(v.name.split("_")[3])]=v.varValue
matrix=np.int_(Matrix)
print("The decision variable matrix is:")
print(matrix)
The error was:
TypeError                                 Traceback (most recent call last)
     13 for j in range(len(brands)):
     14
---> 15     col_data.append(list(zip(*dec_var)[j]))
     16
     17     prob+=lpSum(col_data[j])<=col_con[j]

TypeError: 'zip' object is not subscriptable

Your code breaks in this line:
col_data.append(list(zip(*dec_var)[j]))
Let's go through it step by step:
dec_var is an array, probably with multiple dimensions. Something like this:
dec_var=np.array([[1,2,3,4],[5,6,7,8]])
dec_var
#array([[1, 2, 3, 4],
# [5, 6, 7, 8]])
The star operator (*) unpacks the array into separate 'variables'. Something more or less like this:
a = [1,2,3,4], b = [5,6,7,8]
('a' and 'b' don't really exist; I'm just trying to paint a picture.)
Next, you apply zip(), which lets you iterate over two iterables together. You would usually use it like this:
for i, j in zip([1,2],[3,4]):
    print(i, j)
#1 3
#2 4
However, on Python 3 zip() returns a lazy zip object, which itself is not subscriptable; that is the error you are getting.
To make it subscriptable, you can wrap it in list().
list(zip([1,2],[3,4]))[0]
#(1,3)
In other words, the solution to your problem is most likely to move the [j] subscript:
From:
col_data.append(list(zip(*dec_var)[j]))
To:
col_data.append(list(zip(*dec_var))[j])
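For completeness, here is a rough sketch of how the corrected line sits in the constraint loop. This assumes, as in the question, that dec_var is a 2-D array of PuLP decision variables and that brands, col_con, col_data and prob are defined as above; pulling list(zip(*dec_var)) out of the loop is optional, but it avoids rebuilding the list on every pass:
columns = list(zip(*dec_var))                 # one tuple of decision variables per column/brand
for j in range(len(brands)):
    col_data.append(list(columns[j]))         # subscript the list, not the zip object
    prob += lpSum(col_data[j]) <= col_con[j]  # per-brand column constraint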


How to filter elements of Cartesian product following specific ordering conditions

I have to generate multiple reactions with different variables. They have 3 elements; let's call them B, S and H, and they all start with B1. S can be appended to the element only if its index does not exceed the B index, so it can be B1S1, B2S2, B2S1, etc., but not B1S2. The same goes for H: B1S1H1, B2S2H1 or B4S1H1, but never B2S2H3. The final variation would be B5S5H5. I tried itertools.product, but I don't know how to get rid of the elements that don't match my condition or how to add the next element. Here is my code:
import itertools
a = list(itertools.product([1, 2, 3, 4], repeat=4))
#print (a)
met = open('random_dat.dat', 'w')
met.write('Reactions')
met.write('\n')
for i in range(1,256):
    met.write('\n')
    met.write('%s: B%sS%sH%s -> B%sS%sH%s' %(i, a[i][3], a[i][2], a[i][1], a[i][3], a[i][2], a[i][1]))
    met.write('\n')
met.close()
Simple for loops will do what you want:
bsh = []
for b in range(1,6):
    for s in range(1,b+1):
        for h in range(1,b+1):
            bsh.append(f"B{b}S{s}H{h}")
print(bsh)
Output:
['B1S1H1', 'B2S1H1', 'B2S1H2', 'B2S2H1', 'B2S2H2', 'B3S1H1', 'B3S1H2', 'B3S1H3',
'B3S2H1', 'B3S2H2', 'B3S2H3', 'B3S3H1', 'B3S3H2', 'B3S3H3', 'B4S1H1', 'B4S1H2',
'B4S1H3', 'B4S1H4', 'B4S2H1', 'B4S2H2', 'B4S2H3', 'B4S2H4', 'B4S3H1', 'B4S3H2',
'B4S3H3', 'B4S3H4', 'B4S4H1', 'B4S4H2', 'B4S4H3', 'B4S4H4', 'B5S1H1', 'B5S1H2',
'B5S1H3', 'B5S1H4', 'B5S1H5', 'B5S2H1', 'B5S2H2', 'B5S2H3', 'B5S2H4', 'B5S2H5',
'B5S3H1', 'B5S3H2', 'B5S3H3', 'B5S3H4', 'B5S3H5', 'B5S4H1', 'B5S4H2', 'B5S4H3',
'B5S4H4', 'B5S4H5', 'B5S5H1', 'B5S5H2', 'B5S5H3', 'B5S5H4', 'B5S5H5']
Thanks to @mikuszefski for pointing out improvements.
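If the end goal is still to write numbered reactions to a file, as in the question's code, something like the following sketch could follow; the file name and the '->' format are taken from the question, and the bsh list is assumed from above:
with open('random_dat.dat', 'w') as met:
    met.write('Reactions\n')
    for i, r in enumerate(bsh, start=1):      # number the reactions starting at 1
        met.write('\n%s: %s -> %s\n' % (i, r, r))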
Patrick's answer in list-comprehension style:
bsh = [f"B{b}S{s}H{h}" for b in range(1,5) for s in range(1,b+1) for h in range(1,b+1)]
Gives
['B1S1H1',
'B2S1H1',
'B2S1H2',
'B2S2H1',
'B2S2H2',
'B3S1H1',
'B3S1H2',
'B3S1H3',
'B3S2H1',
'B3S2H2',
'B3S2H3',
'B3S3H1',
'B3S3H2',
'B3S3H3',
'B4S1H1',
'B4S1H2',
'B4S1H3',
'B4S1H4',
'B4S2H1',
'B4S2H2',
'B4S2H3',
'B4S2H4',
'B4S3H1',
'B4S3H2',
'B4S3H3',
'B4S3H4',
'B4S4H1',
'B4S4H2',
'B4S4H3',
'B4S4H4']
I would implement your "use itertools.product and get rid of unnecessary elements" approach in the following way:
import itertools
a = list(itertools.product([1,2,3,4,5],repeat=3))
a = [i for i in a if (i[1]<=i[0] and i[2]<=i[1] and i[2]<=i[0])]
Note that I assumed the last element needs to be smaller than or equal to any other. Note that a is now a list of 35 tuples, each holding 3 ints, so you need to make strings out of them, for example using a so-called f-string:
a = [f"B{i[0]}S{i[1]}H{i[2]}" for i in a]
print(a)
output:
['B1S1H1', 'B2S1H1', 'B2S2H1', 'B2S2H2', 'B3S1H1', 'B3S2H1', 'B3S2H2', 'B3S3H1', 'B3S3H2', 'B3S3H3', 'B4S1H1', 'B4S2H1', 'B4S2H2', 'B4S3H1', 'B4S3H2', 'B4S3H3', 'B4S4H1', 'B4S4H2', 'B4S4H3', 'B4S4H4', 'B5S1H1', 'B5S2H1', 'B5S2H2', 'B5S3H1', 'B5S3H2', 'B5S3H3', 'B5S4H1', 'B5S4H2', 'B5S4H3', 'B5S4H4', 'B5S5H1', 'B5S5H2', 'B5S5H3', 'B5S5H4', 'B5S5H5']
However, you might also use other formatting methods instead of f-strings if you wish.
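For instance, the f-string line above could instead be written with str.format, which also works on Python versions before 3.6 (just an illustrative variant, not part of the original answer):
# Alternative to the f-string comprehension, applied to the filtered list of tuples:
a = ["B{}S{}H{}".format(b, s, h) for (b, s, h) in a]
print(a)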

Python 3: Use n*x**k to evaluate a list (general_poly)

I am trying to answer the same exam question as this OP and, although the posts are related, the help requested is different.
I'll talk about my test cases and why I think it has to be a semantic error in a moment, but first...
The exam problem:
Write a function called general_poly, that would, for example, evaluate general_poly([1, 2, 3, 4])(10) to 1234 because 1*10^3 + 2*10^2 + 3*10^1 + 4*10^0.
So in the example the function only takes one argument with general_poly([1, 2, 3, 4]) and it returns a function that you can apply to a value, in this case x = 10 with general_poly([1, 2, 3, 4])(10).
My code:
def general_poly(L):
    def in_poly(x):
        total = 0
        for i in range(0, len(L)):
            k = (len(L)-1) - i
            total += (L[i] * (x**k))
        return total
    return in_poly(x)
We are not told what the error(s) are specifically or given the lists used to test the code, only a pass/fail (error thrown), and the correct answer for each of the six tests.
However, we are given a hint because the answer to the first question is 1234, which is the example that was given.
I know I should get that one right at least, but the code is failing all six tests.
These are some of the test cases that I have run - and verified the results with a calculator - so I don't think the calculations wrong:
general_poly:
([1,2,3,4,1,2,3,4])(10) = 12341234;
([1,2,3,4])(0) = 4;
([1,2,3,4])(-7) = -262;
([5,6,7,8,9])(28) = 3210713;
([103, 42, -78, -3.26])(9) = 77783.74
I have also checked the indentation, checked that there was output, run through it line by line, checked spelling, etc. but no luck.
It may be connected to the fact that I can't call the function as general_poly(L)(x). I have to declare x separately before I call the function or I get:
TypeError: 'int' object is not callable
Return the function without calling it
return in_poly
not
return in_poly(x)
in_poly(x) is an integer, not a function.
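For completeness, a sketch of the asker's function with only that one line changed, plus the example call from the exam prompt:
def general_poly(L):
    """Return a function that evaluates the polynomial with coefficients L at a given x."""
    def in_poly(x):
        total = 0
        for i in range(len(L)):
            k = (len(L) - 1) - i       # power that goes with the i-th coefficient
            total += L[i] * (x ** k)
        return total
    return in_poly                     # return the function itself, do not call it

print(general_poly([1, 2, 3, 4])(10))  # 1234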

Python: Function doesn't receive a value within a for loop

I'm using the bisection method from the scipy.optimize package within a for loop.
The idea is to get a value of "sig" with the bisection method for each element (value) in the "eps_komp" vector. I've coded this much:
import numpy as np
import scipy.optimize as optimize
K=300
n = 0.43
E = 210000
Rm = 700
sig_a = []
RO_K = 300
RO_n = 0.43
eps_komp = [0.00012893048999999997,
0.018839115269999998,
0.01230539995,
0.022996934109999999,
-0.0037319012899999999,
0.023293921169999999,
0.0036927752099999997,
0.020621037629999998,
0.0063656587500000002,
0.020324050569999998,
-0.0025439530500000001,
0.018542128209999998,
0.01230539995,
0.019730076449999998,
0.0045837363899999999,
0.015275270549999997,
-0.0040288883499999999,
0.021215011749999999,
-0.0031379271699999997,
0.023590908229999999]
def eps_f(i):
    return eps_komp[i]

for j in range(len(eps_komp)):
    eps_komp_j = eps_f(j)
    if j <= len(eps_komp):
        def func(sig):
            return eps_komp_j - sig/E - (sig/RO_K)**(1/RO_n)
        sig_a.append(optimize.bisect(func, 0, Rm))
    else:
        break

print(sig_a)
Now if I change the value of "j" in eps_f(j) to 0:
eps_komp_j = eps_f(0)
it works, and so it does for all other values that I insert by hand, but if I keep it like it is in the for loop, the "j" value doesn't change automatically and I get an error:
f(a) and f(b) must have different signs
Does anyone have a clue what the problem is and how it could be solved?
Regards,
L
P.S. I did post another topic on this problem yesterday, but I wasn't very specific with the problem and got negative feedback. However, I do need to solve this today, so I was forced to post it again; I did manage to get a bit further with the code than I did in the earlier post, so it isn't a repost...
If you read the docs you'll find that:
Basic bisection routine to find a zero of the function f between the arguments a and b. f(a) and f(b) cannot have the same signs. Slow but sure.
In your code:
def func(sig):
    return eps_komp_j - sig/E - (sig/RO_K)**(1/RO_n)
sig_a.append(optimize.bisect(func, 0, Rm))
you're asking bisect to find a root between 0 and Rm, so it evaluates func(0) and func(700).
By replacing the optimize.bisect line with print(func(0), func(700)) I get the following output:
0.00012893048999999997 -7.177181168628421
0.018839115269999998 -7.158470983848421
0.01230539995 -7.165004699168421
0.02299693411 -7.15431316500842
-0.00373190129 -7.1810420004084206
0.02329392117 -7.154016177948421
0.0036927752099999997 -7.173617323908421
0.02062103763 -7.156689061488421
0.00636565875 -7.17094444036842
0.02032405057 -7.156986048548421
-0.00254395305 -7.17985405216842
0.018542128209999998 -7.15876797090842
0.01230539995 -7.165004699168421
0.019730076449999998 -7.157580022668421
0.00458373639 -7.172726362728421
0.015275270549999997 -7.162034828568421
-0.00402888835 -7.181338987468421
0.02121501175 -7.156095087368421
-0.0031379271699999997 -7.1804480262884205
0.02359090823 -7.153719190888421
Note the multiple pairs that have the same signs. optimize.bisect can't handle those. I don't know what you're trying to accomplish, but this is the wrong approach.
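If the intent is simply to skip the strain values for which no root exists in [0, Rm] (a guess at the goal, not part of the original answer), a sign check before calling bisect avoids the error. This sketch reuses the names from the question (E, RO_K, RO_n, Rm, eps_komp, sig_a, optimize):
for eps in eps_komp:
    def func(sig, eps=eps):               # default argument pins the current eps value
        return eps - sig/E - (sig/RO_K)**(1/RO_n)
    if func(0) * func(Rm) < 0:            # only bisect when the endpoints change sign
        sig_a.append(optimize.bisect(func, 0, Rm))
    else:
        sig_a.append(None)                # no root in [0, Rm], e.g. for negative strain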

using enumerate to iterate over a dictionary of lists to extract information

I got some help earlier today about how to obtain positional information from a dictionary using enumerate(). I will provide the code shortly. However, now that I've found this cool tool, I want to implement it in a different manner to obtain some more information from my dictionary.
I have a dictionary:
length = {'A': [(0,21), (30,41), (70,80), (95,200)], 'B': [(0,42), (70,80)], ...etc}
and a file:
A 73
B 15
etc
What I want to do now is find the difference between the max of the first tuple in my list and the min of the second tuple, for example the difference between 21 and 30. Then I want to add all these differences up until I hit the pair (range) of numbers that the number from my file falls into (if that makes sense).
Here is the code that I've been working on:
import csv
with open('Exome_agg_cons_snps_pct_RefSeq_HGMD_reinitialized.txt') as f:
    reader = csv.DictReader(f,delimiter="\t")
    for row in reader:
        snppos = row['snp_rein']
        name = row['isoform']
        snpos = int(snppos)
        if name in exons:
            y = exons[name]
            for sd, i in enumerate(exons[name]):
                while not snpos<=max(i):
                    intron = min(i+1) - max(i) #this doesn't work unfortunately. It says I can't add 1 to i
                    totalintron = 0 + intron
                if snpos<=max(i):
                    exonmin = min(i)
                    exonnumber = sd+1
                    print exonnumber,name,totalintron
                    break
I think it's the sd (indexer) that is confusing me. I don't know how to use it in this context. The commented-out portions are other avenues I've tried without success. Any help? I know this is a confusing question and my code might be a little mixed up, but that's because I can't even get an output to correct my other mistakes yet.
I want my output to look like this based on the file provided:
exon name introntotal
3 A 38
1 B 0
To try to provide some help for this question: a critical part of the problem is that I don't think enumerate does what you think it does. Enumerate just numbers the things you are iterating over. So when you go through your for loop, sd will first be 0, then it will be 1... and that's all. In your case, you want to look at adjacent list entries (it seems?), so the more idiomatic ways of looping in Python aren't nearly as clean. So you could do something like:
...
y = exons[name]
for index in range(len(y) - 1): # the - 1 is to prevent going out of bounds
first_max = max(y[index])
second_min = min(y[index+1])
... # do more stuff, I didn't completely follow what you're trying to do
I will add, for the hardcore Pythonistas: you can of course do some clever stuff to write this more idiomatically and avoid the C-style loop that I wrote, but I think that getting into zip and so on might be a bit confusing for somebody new to Python.
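For reference, a minimal sketch of what that zip-based pairwise loop could look like; the dictionary shape is borrowed from the question, and the rest of the asker's logic is omitted:
# Hypothetical exons dict shaped like the one in the question.
exons = {'A': [(0, 21), (30, 41), (70, 80), (95, 200)]}

y = exons['A']
for first, second in zip(y, y[1:]):   # adjacent pairs: (y[0], y[1]), (y[1], y[2]), ...
    gap = min(second) - max(first)    # e.g. 30 - 21 = 9 for the first pair
    print(first, second, gap)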
The issue is that you're using the output of enumerate() incorrectly.
enumerate() returns the index (position) first then the item
Ex:
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
for i, item in enumerate(x):
    print(i, item)
# prints
#(0, 10)
#(1, 11)
#(2, 12)
#(3, 13)
#(4, 14)
#(5, 15)
#(6, 16)
#(7, 17)
#(8, 18)
#(9, 19)
So in your case, you should switch i and sd:
for i, sd in enumerate(exons[name]):
    # do something
Like other commenters suggested, reading the python documentation is usually a good place to start resolving issues, especially if you're not sure how a function does what it does :)

Iterate over two big arrays at once

I have to iterate over two arrays which are 1000x1000 big. I already reduced the resolution to 100x100 to make the iteration faster, but it still takes about 15 minutes for ONE array!
So I tried to iterate over both at the same time, for which I found this:
for index, (x,y) in ndenumerate(izip(x_array,y_array)):
but then I get the error:
ValueError: too many values to unpack
Here is my full Python code. I hope you can help me make this a lot faster, because this is for my master's thesis and in the end I have to run it about 100 times...
area_length=11
d_circle=(area_length-1)/2
xdis_new=xdis.copy()
ydis_new=ydis.copy()
ie,je=xdis_new.shape
while (np.isnan(np.sum(xdis_new))) and (np.isnan(np.sum(ydis_new))):
    xdis_interpolated=xdis_new.copy()
    ydis_interpolated=ydis_new.copy()
    # itx=np.nditer(xdis_new,flags=['multi_index'])
    # for x in itx:
    #     print 'next x and y'
    for index, (x,y) in ndenumerate(izip(xdis_new,ydis_new)):
        if np.isnan(x):
            print 'index',index[0],index[1]
            print 'interpolate'
            # define indizes of interpolation area
            i1=index[0]-(area_length-1)/2
            if i1<0:
                i1=0
            i2=index[0]+((area_length+1)/2)
            if i2>ie:
                i2=ie
            j1=index[1]-(area_length-1)/2
            if j1<0:
                j1=0
            j2=index[1]+((area_length+1)/2)
            if j2>je:
                j2=je
            # -->
            print 'i1',i1,'','i2',i2
            print 'j1',j1,'','j2',j2
            area_values=xdis_new[i1:i2,j1:j2]
            print area_values
            b=area_values[~np.isnan(area_values)]
            if len(b)>=((area_length-1)/2)*4:
                xi,yi=meshgrid(arange(len(area_values[0,:])),arange(len(area_values[:,0])))
                weight=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weight_fac=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weighted_area=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=sqrt((xi-xi[(area_length-1)/2,(area_length-1)/2])*(xi-xi[(area_length-1)/2,(area_length-1)/2])+(yi-yi[(area_length-1)/2,(area_length-1)/2])*(yi-yi[(area_length-1)/2,(area_length-1)/2]))
                weight=1/d
                weight[where(d==0)]=0
                weight[where(d>d_circle)]=0
                weight[where(np.isnan(area_values))]=0
                weight_sum=np.sum(weight.flatten())
                weight_fac=weight/weight_sum
                weighted_area=area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m=nansum(weighted_area)
                xdis_interpolated[index]=m
                print 'm',m
            else:
                print 'insufficient elements'
        if np.isnan(y):
            print 'index',index[0],index[1]
            print 'interpolate'
            # define indizes of interpolation area
            i1=index[0]-(area_length-1)/2
            if i1<0:
                i1=0
            i2=index[0]+((area_length+1)/2)
            if i2>ie:
                i2=ie
            j1=index[1]-(area_length-1)/2
            if j1<0:
                j1=0
            j2=index[1]+((area_length+1)/2)
            if j2>je:
                j2=je
            # -->
            print 'i1',i1,'','i2',i2
            print 'j1',j1,'','j2',j2
            area_values=ydis_new[i1:i2,j1:j2]
            print area_values
            b=area_values[~np.isnan(area_values)]
            if len(b)>=((area_length-1)/2)*4:
                xi,yi=meshgrid(arange(len(area_values[0,:])),arange(len(area_values[:,0])))
                weight=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weight_fac=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weighted_area=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=sqrt((xi-xi[(area_length-1)/2,(area_length-1)/2])*(xi-xi[(area_length-1)/2,(area_length-1)/2])+(yi-yi[(area_length-1)/2,(area_length-1)/2])*(yi-yi[(area_length-1)/2,(area_length-1)/2]))
                weight=1/d
                weight[where(d==0)]=0
                weight[where(d>d_circle)]=0
                weight[where(np.isnan(area_values))]=0
                weight_sum=np.sum(weight.flatten())
                weight_fac=weight/weight_sum
                weighted_area=area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m=nansum(weighted_area)
                ydis_interpolated[index]=m
                print 'm',m
            else:
                print 'insufficient elements'
        else:
            print 'no need to interpolate'
    xdis_new=xdis_interpolated
    ydis_new=ydis_interpolated
Some advice:
Profile your code to see what the slowest part is. It may not be the iteration but the computations that need to be done each time.
Reduce function calls as much as possible. Function calls are not free in Python.
Rewrite the slowest part as a C extension and then call that C function in your Python code (see Extending and Embedding the Python Interpreter).
This page has some good advice as well.
You specifically asked for iterating over two arrays in a single loop. Here is a way to do that:
l1 = ["abc", "def", "hi"]
l2 = ["ghi", "jkl", "lst"]
for f, s in zip(l1, l2):
    print("%s : %s" % (f, s))
The above is for Python 3; on Python 2 you can use izip from itertools instead.
You may use this as your for loop:
for index, x in ndenumerate((x_array,y_array)):
But it won't help you much, because your computer can't do two things at the same time.
Profiling is definitely a good start to identify where the time is actually spent.
I usually use the cProfile module, as it requires minimal overhead and gives me more than enough information.
import cProfile
import pstats
cProfile.run('main()', "ProfileData.txt", 'tottime')
p = pstats.Stats('ProfileData.txt')
p.sort_stats('cumulative').print_stats(100)
In your example you would have to wrap your code into a main() function to be able to use this code snippet at the very end of your file.
Comment #1: You don't want to use ndenumerate on the izip iterator, as it will just hand you back the iterator itself instead of the array elements, which isn't what you want.
Comment #2:
i1=index[0]-(area_length-1)/2
if i1<0:
    i1=0
could be simplified to i1 = max(index[0]-(area_length-1)/2, 0), and you could store your (area_length+/-1)/2 in dedicated variables.
Idea #1 : try to iterate on flat versions of the arrays, i.e. with something like
for (i, (x, y)) in enumerate(izip(xdis_new.flat,ydis_new.flat)):
You could get the original indices via divmod(i, xdis_new.shape[-1]), as you should be iterating by rows first.
Idea #2: Iterate only over the NaNs, i.e. index your arrays with np.isnan(xdis_new)|np.isnan(ydis_new); that could save you some iterations.
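A minimal sketch of Idea #2, reusing the array names from the question; np.argwhere over the combined mask yields only the positions that actually need interpolation:
import numpy as np

# Visit only the cells where either array is NaN instead of scanning everything.
nan_mask = np.isnan(xdis_new) | np.isnan(ydis_new)
for index in map(tuple, np.argwhere(nan_mask)):
    x = xdis_new[index]
    y = ydis_new[index]
    # ... run the existing interpolation for this (row, column) only ...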
EDIT #1
You probably don't need to initialize d, weight_fac and weighted_area in your loop, as you compute them separately.
Your weight[where(d>0)] can be simplified to weight[d>0].
Do you need weight_fac? Can't you just compute weight and then normalize it in place? That should save you some temporary arrays.
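For illustration, the in-place normalization mentioned above could look roughly like this (assuming the weight and area_values arrays from the question's loop):
weight /= weight.sum()                 # normalize weight in place instead of building weight_fac
weighted_area = area_values * weight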
