I am trying to optimize a function using the L-BFGS-B constrained optimization routine in scipy. But the optimization routine passes values to the function which are not within the bounds. My full code looks like this:
def humpy(aParams):
    aParams = numpy.asarray(aParams)
    print aParams
    ####
    # connect to some other software for simulation
    # data[1] & data[2] are read
    ##### objective function
    val = sum(0.5*(data[1] - data[2])**2)
    print val
    return val
####
def approx_fprime():
    ####
Initial = numpy.asarray([10.0, 15.0, 50.0, 10.0])
interval = [(5.0, 60000.0), (10.0, 50000.0), (26.0, 100000.0), (8.0, 50000.0)]
opt = optimize.fmin_l_bfgs_b(humpy, Initial, fprime=approx_fprime, bounds=interval, pgtol=1.0000000000001e-05, iprint=1, maxfun=50000)
print 'optimized parameters', opt[0]
print 'Optimized function value', opt[1]
####### the end ####
Based on the initial values (Initial) and bounds (interval), optimize.fmin_l_bfgs_b() will pass values to my software for simulation, but the values passed should be within 'bounds'. That's not the case; see below the values passed at various iterations:
iter 1 = [ 10.23534209 15.1717302 50.5117245 10.28731118]
iter 2 = [ 10.23534209 15.1717302 50.01160842 10.39018429]
[ 11.17671043 15.85865102 50.05804208 11.43655591]
[ 11.17671043 15.85865102 50.05804208 11.43655591]
[ 11.28847754 15.85865102 50.05804208 11.43655591]
[ 11.17671043 16.01723753 50.05804208 11.43655591]
[ 11.17671043 15.85865102 50.5586225 11.43655591]
...............
...............
...............
[ 49.84670071 -4.4139714 62.2536381 23.3155698847]
At this iteration -4.4139714 is passed as my 2nd parameter, but it should vary within (10.0, 50000.0); where -4.4139714 comes from, I don't know. Where should I change the code so that it only passes values within the bounds?
You are trying to do bitwise exclusive or (the ^ operator) on floats, which makes no sense, so I don't think the code you posted is actually the code you are having problems with. However, I changed the ^ to **, assuming that was what you meant, and had no problems: the code worked fine for me with that change, and the parameters were restricted exactly as defined. (Python 2.5.)
Are you asking about doing something like this?
def humpy(aParams):
    aParams = numpy.asarray(aParams)
    x = aParams[0]
    y = aParams[1]
    z = aParams[2]
    u = aParams[3]
    v = aParams[4]
    assert 2 <= x <= 50000
    assert 1 <= y <= 35000
    assert 1 <= z <= 45000
    assert 2 <= u <= 50000
    assert 2 <= v <= 60000
    val = 100.0*((y-x**2.0)**2.0 + (z-y**2.0)**2.0 + (u-z**2.0)**2.0 + (v-u**2.0)**2.0) + (1-x)**2.0 + (1-y)**2.0 + (1-z)**2.0 + (1-u)**2.0
    return val
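Two things worth checking. First, the approx_fprime stub shown in the question takes no arguments and returns nothing, while fmin_l_bfgs_b expects fprime(x) to return the gradient array at x; alternatively you can drop fprime entirely and pass approx_grad=True. Second, if the external simulator simply must never see out-of-range values, you can clamp defensively inside the objective. A minimal sketch, assuming the bounds from the question (humpy_clipped is a made-up name):

import numpy

def humpy_clipped(aParams):
    # Clamp whatever the optimizer hands us back into the declared box
    # before passing it to the external simulator.
    lower = numpy.array([5.0, 10.0, 26.0, 8.0])
    upper = numpy.array([60000.0, 50000.0, 100000.0, 50000.0])
    aParams = numpy.clip(numpy.asarray(aParams), lower, upper)
    # ... run the simulation with the clamped parameters, read data[1] and
    # data[2], then compute val = sum(0.5*(data[1] - data[2])**2) ...
    val = 0.0  # placeholder for the simulated objective value
    return val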
I am writing this simple function to use power iteration to find the dominant eigenvalue. I want to have two stop conditions: one on the iteration count and one on a precision threshold. But the error calculation does not work. What am I doing wrong here, in principle?
# power iteration, vanilla
import numpy as np

A = np.random.uniform(low=-5.0, high=10.0, size=[3, 3])

def power_iteration(A, maxiter, threshold):
    b0 = np.random.rand(A.shape[1])
    it = 0
    error = 0
    while True:
        for i in range(maxiter):
            b1 = np.dot(A, b0)
            b1norm = np.linalg.norm(b1)
            error = np.linalg.norm(b1 - b0)
            b0 = b1/b1norm
        domeig = (b0 @ A @ b0)/np.dot(b0, b0)
        if error < threshold:
            break
        elif it > maxiter:
            break
        else:
            error = 0
        it = it + 1
    return b0, domeig, it, error

result = power_iteration(A, 10, 0.1)
result
The output shows a very correct eigenvalue of ~9 and the corresponding eigenvector (I checked with numpy), but the error is off. There is no way the length of the difference vector is 8, considering the result is very close to the actual one. How I want to calculate the error is as the norm of the difference between the current eigenvector and the previous one (b0). I start with error = 0 because the first iteration is guaranteed to give a big difference if b0 is chosen at random.
(array([ 0.06009408, 0.95411524, -0.2933476 ]),
9.001665234545708,
11,
8.001665234545815)
I tried to make the loop stop on two conditions; one of them gets ignored.
It seems to work much better like this. (For what it's worth, the reason the reported error was ~8 is that the original code compared the unnormalized b1 = np.dot(A, b0), which near convergence is roughly 9*b0, against the normalized b0, so the "error" settles near the dominant eigenvalue minus one rather than near zero.)
def power_it(matrix, iterations, threshold):
    domeigenvector = np.random.rand(matrix.shape[1])
    counter = np.random.rand(matrix.shape[1])
    it = 0
    error = 0
    for i in range(iterations):
        k1 = np.dot(matrix, domeigenvector)
        k1norm = np.linalg.norm(k1)
        domeigenvector = k1/k1norm
        error = np.linalg.norm(domeigenvector - counter)
        counter = domeigenvector
        domeigenvalue = (domeigenvector @ matrix @ domeigenvector)/np.dot(domeigenvector, domeigenvector)
        it = it + 1
        if error < threshold:
            break
    return domeigenvalue, domeigenvector, it
I can now use Schur deflation to calculate the rest of the eigenpairs.
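For reference, a minimal usage sketch of power_it (assuming numpy is imported as np and A is built as in the question):

np.random.seed(0)
A = np.random.uniform(low=-5.0, high=10.0, size=[3, 3])
domeigenvalue, domeigenvector, it = power_it(A, iterations=100, threshold=1e-8)
print(domeigenvalue, it)  # dominant eigenvalue and the number of iterations used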
TL;DR: if I were to explain the problem in short:
I have signals:
np.random.seed(42)
x = np.random.randn(1000)
y = np.random.randn(1000)
z = np.random.randn(1000)
and human-readable string tuple logic like:
entry_sig_ = ((x,y,'crossup',False),)
exit_sig_ = ((x,z,'crossup',False), 'or_',(x,y,'crossdown',False))
where:
'entry_sig_' means the output will become 1 when the time series unfolds from left to right and 'entry_sig_' is hit. (x, y, 'crossup', False) means: x crossed y upwards at a particular time i, and False means the signal doesn't have "memory"; otherwise the number of hits accumulates. 'exit_sig_' means the output will become 0 again when 'exit_sig_' is hit.
The output is generated through:
@njit
def run(x, entry_sig, exit_sig):
    '''
    x: np.array
    entry_sig, exit_sig: homogeneous tuples of tuple signals
    Returns: sequence of 0 and 1 satisfying entry and exit sigs
    '''
    L = x.shape[0]
    out = np.empty(L)
    out[0] = 0.0
    out[-1] = 0.0
    i = 1
    trade = True
    while i < L-1:
        out[i] = 0.0
        if reduce_sig(entry_sig, i) and i < L-1:
            out[i] = 1.0
            trade = True
            while trade and i < L-2:
                i += 1
                out[i] = 1.0
                if reduce_sig(exit_sig, i):
                    trade = False
        i += 1
    return out
reduce_sig(sig, i) is a function (see definition below) that parses the tuple and returns the resulting output for a given point in time.
Question:
As of now, an object of the SingleSig class is instantiated from scratch in the for loop for any given point in time, thus having no "memory", which totally cancels the merits of having a class; a bare function would do. Does there exist a workaround (a different class template, a different approach, etc.) so that:
the combined tuple signal can be queried for its value at a particular point in time i;
the "memory" can be reset, i.e. e.g. MultiSig(sig_tuple).memory_field can be set to 0 at the constituent-signal level?
The following code adds memory to the signals, which can be wiped using MultiSig.reset() to reset the count of all signals to 0. The memory can be queried using MultiSig.query_memory(key) to return the number of hits for that signal so far. For the memory function to work, I had to add unique keys to the signals to identify them.
from numba import njit, int64, float64, types
from numba.types import Array, string, boolean
from numba import jitclass  # in newer numba: from numba.experimental import jitclass
import numpy as np

np.random.seed(42)
x = np.random.randn(1000000)
y = np.random.randn(1000000)
z = np.random.randn(1000000)

# Example of "human-readable" signals
entry_sig_ = ((x,y,'crossup',False),)
exit_sig_ = ((x,z,'crossup',False), 'or_', (x,y,'crossdown',False))

# Turn signals into homogeneous tuples
# entry_sig_
entry_sig = (((x,y,'crossup',False),'NOP','1'),)
# exit_sig_
exit_sig = (((x,z,'crossup',False),'or_','2'), ((x,y,'crossdown',False),'NOP','3'))
@njit
def cross(x, y, i):
    '''
    x, y: np.array
    i: int - point in time
    Returns: 1 or 0 when condition is met
    '''
    if (x[i - 1] - y[i - 1])*(x[i] - y[i]) < 0:
        out = 1
    else:
        out = 0
    return out
kv_ty = (types.string, types.int64)
spec = [
    ('memory', types.DictType(*kv_ty)),
]

@njit
def single_signal(x, y, how, acc, i):
    '''
    i: int - point in time
    Returns either signal or accumulator
    '''
    if cross(x, y, i):
        if x[i] < y[i] and how == 'crossdown':
            out = 1
        elif x[i] > y[i] and how == "crossup":
            out = 1
        else:
            out = 0
    else:
        out = 0
    return out
@jitclass(spec)
class MultiSig:
    def __init__(self, entry, exit):
        '''
        initialize memory at the single-signal level
        '''
        memory_dict = {}
        for i in entry:
            memory_dict[str(i[2])] = 0
        for i in exit:
            memory_dict[str(i[2])] = 0
        self.memory = memory_dict

    def reduce_sig(self, sig, i):
        '''
        Parses a multisignal.
        sig: homogeneous tuple of tuples ("human-readable" signal definition)
        i: int - point in time
        Returns: resulting value of the multisignal
        '''
        L = len(sig)
        out = single_signal(*sig[0][0], i)
        logic = sig[0][1]
        if out:
            self.update_memory(sig[0][2])
        for cnt in range(1, L):
            s = single_signal(*sig[cnt][0], i)
            if s:
                self.update_memory(sig[cnt][2])
            out = out | s if logic == 'or_' else out & s
            logic = sig[cnt][1]
        return out

    def update_memory(self, key):
        '''
        update memory
        '''
        self.memory[str(key)] += 1

    def reset(self):
        '''
        reset memory
        '''
        dicti = {}
        for i in self.memory:
            dicti[i] = 0
        self.memory = dicti

    def query_memory(self, key):
        '''
        return number of hits on signal
        '''
        return self.memory[str(key)]
@njit
def run(x, entry_sig, exit_sig):
    '''
    x: np.array
    entry_sig, exit_sig: homogeneous tuples of tuples
    Returns: sequence of 0 and 1 satisfying entry and exit sigs
    '''
    L = x.shape[0]
    out = np.empty(L)
    out[0] = 0.0
    out[-1] = 0.0
    i = 1
    multi = MultiSig(entry_sig, exit_sig)
    while i < L-1:
        out[i] = 0.0
        if multi.reduce_sig(entry_sig, i) and i < L-1:
            out[i] = 1.0
            trade = True
            while trade and i < L-2:
                i += 1
                out[i] = 1.0
                if multi.reduce_sig(exit_sig, i):
                    trade = False
        i += 1
    return out

run(x, entry_sig, exit_sig)
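As a quick illustration of the memory API defined above (a hypothetical snippet; '2' is the key assigned to the first exit signal in exit_sig):

multi = MultiSig(entry_sig, exit_sig)
multi.reduce_sig(exit_sig, 100)   # evaluate the exit multisignal at i = 100
print(multi.query_memory('2'))    # hits recorded so far for signal key '2'
multi.reset()                     # zero all counters
print(multi.query_memory('2'))    # back to 0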
To reiterate what I said in the comments, | and & are bitwise operators, not logical operators. 1 & 2 evaluates to 0 (falsy), which is not what I believe you want here, so I made sure out and s can only be 0 or 1 in order for this to produce the expected output.
Are you aware that because of
out = out | s if logic == 'or_' else out & s
the order of the time series inside entry_sig and exit_sig matters?
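To see the bitwise pitfall concretely (a quick illustrative snippet, not from the original answer):

print(1 & 2)                # 0 -- bitwise AND of 0b01 and 0b10, reads as "False"
print(bool(1) and bool(2))  # True -- the logical AND of two truthy values
print(1 | 2)                # 3 -- bitwise OR no longer yields a clean 0/1 flag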
Let (output, logic) be tuples where output is 0 or 1 according to how crossup and crossdown would evaluate the passed information of the tuple, and logic is or_ or and_.
tuples = ((0,'or_'), (1,'or_'), (0,'and_'))

out = tuples[0][0]
logic = tuples[0][1]
for i in range(1, len(tuples)):
    s = tuples[i][0]
    out = out | s if logic == 'or_' else out & s
    out = s
    logic = tuples[i][1]
print(out)

0
Changing the order of the tuples yields the other signal:
tuples = ((0,'or_'), (0,'and_'), (1,'or_'))

out = tuples[0][0]
logic = tuples[0][1]
for i in range(1, len(tuples)):
    s = tuples[i][0]
    out = out | s if logic == 'or_' else out & s
    out = s
    logic = tuples[i][1]
print(out)

1
The performance hinges on how many times the count needs to be updated. Using n = 1,000,000 for all three time series, your code had a mean run-time of 0.6 s on my machine; my code had 0.63 s. I then changed the crossing logic a bit to save some if/else branches, so that the nested if/else is only triggered when the time series actually crossed, which can be checked with a single comparison. This further halved the difference in run-time, so the code above now sits at a 2.5% longer run-time than your original code.
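For anyone reproducing the comparison, a minimal timing harness along these lines would do (a sketch using timeit; absolute numbers will of course vary by machine):

import timeit

run(x, entry_sig, exit_sig)  # warm-up call so numba's compile time is not measured
t = timeit.timeit(lambda: run(x, entry_sig, exit_sig), number=10) / 10
print('mean run-time: %.2fs' % t)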
I have the following polynomial equation that I would like to find the local minima and maxima for.
I defined the function as follows. It uses a flatten function to flatten the nested list; I'll include it for testing purposes (found here: http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html).
flatten list
from itertools import combinations
import math

def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)
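As a quick sanity check of what flatten does:

print(flatten([1, [2, 3], (4, [5])]))  # [1, 2, 3, 4, 5]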
my polynomial
def poly(coefficients, factors):
    # quadratic terms
    constant = 1
    singles = factors
    products = [math.prod(c) for c in combinations(factors, 2)]
    squares = [f**2 for f in factors]
    sequence = flatten([constant, singles, products, squares])
    z = sum([math.prod(i) for i in zip(coefficients, sequence)])
    return z
The arguments it takes are a list of coefficients, for example:
coefs = [12.19764959, -1.8233151, 2.50952816,-1.56344375, 1.00003828, -1.72128301, -2.54254877, -1.20377309, 5.53510616, 2.94755653, 4.83759279, -0.85507208, -0.48007208, -3.70507208, -0.27007208]
And a list of factor or variable values:
factors = [0.4714, 0.4714, -0.4714, 0.4714]
Plug these in and it calculates the result of the polynomial. The reason I wrote it like this is that the number of variables (factors) changes from fit to fit, so I wanted to keep it flexible. I now want to find the combination of "factors" values within a certain range (let's say between -1 and 1) where the function reaches its maximum and minimum values. If the function were "hard-coded" I could use scipy.optimize, but I can't figure out how to make it work as is.
Another option is a brute-force grid search (which I use at the moment), but it's very slow as soon as you have more than 2 variables, especially with small step sizes. There may be no true minima/maxima where the slope == 0 within the bounds, but as long as I can get the maximum and minimum values, that is OK.
OK, I figured it out. It came down to two really silly things:
the order of the arguments in the function had to be reversed, so that the first argument (the one I wanted to optimize for) is the "factors", or X values, followed by the coefficients. That way an array of the same size can be used as x0, and the coefficients can be passed via args.
That wasn't enough, as the function would return an array if an array was the input. I just added factors = list(factors) to the function itself to put it into the correct shape.
The new function:
def poly(factors, coefficients):
    factors = list(factors)
    # quadratic terms
    constant = 1
    singles = factors
    products = [math.prod(c) for c in combinations(factors, 2)]
    squares = [f**2 for f in factors]
    sequence = flatten([constant, singles, products, squares])
    z = sum([math.prod(i) for i in zip(coefficients, sequence)])
    return z
And the optimization:

from scipy.optimize import minimize

coefs = [4.08050532, -0.47042713, -0.08200181, -0.54184481, -0.18515675,
         -0.96751856, -1.10814625, -1.7831592, 5.2763512, 2.83505438, 4.7082153,
         0.22988773, 1.06488773, -0.70011227, 1.42988773]
x0 = [0.1, 0.1, 0.1, 0.1]
minimize(poly, x0=x0, args=coefs, bounds=((-1,1),(-1,1),(-1,1),(-1,1)))
Which returns:
      fun: -1.6736636102536673
 hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
      jac: array([-2.10611305e-01,  2.19138777e+00, -8.16990766e+00, -1.11022302e-07])
  message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     nfev: 85
      nit: 12
     njev: 17
   status: 0
  success: True
        x: array([ 1., -1., 1., 0.03327357])
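Since the original goal was both the minimum and the maximum, the same call can be reused on the negated polynomial (a small sketch; neg_poly is a helper name I'm introducing):

def neg_poly(factors, coefficients):
    # maximizing poly is the same as minimizing its negation
    return -poly(factors, coefficients)

res_max = minimize(neg_poly, x0=x0, args=coefs,
                   bounds=((-1, 1), (-1, 1), (-1, 1), (-1, 1)))
print(-res_max.fun, res_max.x)  # the maximum value and where it is attained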
I started learning Python less than 2 weeks ago. I'm trying to write a function to compute a 7-day moving average for some data. Something wasn't going right, so I tried it without the function.
moving_average = np.array([])
i = 0
for i in range(len(temp)-6):
    sum_7 = np.array([])
    avg_7 = 0
    missing = 0
    total = 7
    j = 0
    for j in range(i, i+7):
        if pd.isnull(temp[j]):
            total -= 1
            missing += 1
        if missing == 7:
            moving_average = np.append(moving_average, np.nan)
            break
    if not pd.isnull(temp[j]):
        sum_7 = np.append(sum_7, temp[j])
    if j == (i+6):
        avg_7 = sum(sum_7)/total
        moving_average = np.append(moving_average, avg_7)
If I run this and look at the value of sum_7, it's just a single value in the numpy array, which made all the moving_average values wrong. But if I remove the first for loop with the variable i, manually set i = 0 (or any number in the range of the data set), and run the exact same code from the inner for loop, sum_7 comes out as a length-7 numpy array. Originally I just did sum += temp[j], but the same problem occurred: the total sum ended up being just the single value.
I've been staring at this trying to fix it for 3 hours and I'm clueless about what's wrong. Originally I wrote the function in R, so all I had to do was translate it into Python, and I don't know why sum_7 comes up as a single value when there are two for loops. I tried to manually add an index variable to act as i, to use it in range(i, i+7), but got some weird error instead; I also don't know why that is.
https://gyazo.com/d900d1d7917074f336567b971c8a5cee
https://gyazo.com/132733df8bbdaf2847944d1be02e57d2
Hey, you can use the rolling() and mean() functions from pandas.
Link to the documentation:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rolling.html

df['moving_avg'] = df['your_column'].rolling(7).mean()

This will give you some NaN values too, but that is part of a rolling mean: you don't have all 7 past data points for the first 6 values.
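If you'd rather get partial averages at the start instead of NaN, rolling() also accepts a min_periods argument (a small variation on the snippet above):

# average over however many of the last 7 points are present (at least 1),
# so the first 6 rows get partial-window means instead of NaN
df['moving_avg'] = df['your_column'].rolling(7, min_periods=1).mean()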
Seems like you misindented the important line:
moving_average = np.array([])
i = 0
for i in range(len(temp)-6):
    sum_7 = np.array([])
    avg_7 = 0
    missing = 0
    total = 7
    j = 0
    for j in range(i, i+7):
        if pd.isnull(temp[j]):
            total -= 1
            missing += 1
        if missing == 7:
            moving_average = np.append(moving_average, np.nan)
            break
    # The following condition should be indented one more level
    if not pd.isnull(temp[j]):
        sum_7 = np.append(sum_7, temp[j])
    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    if j == (i+6):
        # this ^ condition does not do what you meant;
        # you should use a flag instead
        avg_7 = sum(sum_7)/total
        moving_average = np.append(moving_average, avg_7)
Instead of a flag you can use a for-else construct, though that is arguably less readable. Here's the relevant documentation.
Shorter way to do this:
moving_average = np.array([])
for i in range(len(temp)-6):
    ngram_7 = [t for t in temp[i:i+7] if not pd.isnull(t)]
    average = (sum(ngram_7) / len(ngram_7)) if ngram_7 else np.nan
    moving_average = np.append(moving_average, average)
This could be refactored further:

def average(ngram):
    valid = [t for t in ngram if not pd.isnull(t)]
    if not valid:
        return np.nan
    return sum(valid) / len(valid)

def ngrams(seq, n):
    # len(seq) - n + 1 windows of length n, matching range(len(temp)-6) above
    for i in range(len(seq) - n + 1):
        yield seq[i:i+n]

moving_average = [average(k) for k in ngrams(temp, 7)]
The following is my script. Each equal part has self.number samples; in0 is the input sample. There is an error as follows:

pn[i] = pn[i] + d
IndexError: list index out of range

Is this a problem with the size of pn? How can I define a list with a certain size but no exact numbers in it?
for i in range(0, len(in0)/self.number):
    pn = []
    m = i*self.number
    for d in in0[m: m + self.number]:
        pn[i] += d
    if pn[i] >= self.alpha:
        out[i] = 1
    elif pn[i] <= self.beta:
        out[i] = 0
    else:
        if pn[i] >= self.noise:
            out[i] = 1
        else:
            out[i] = 0
There are a number of problems in the code as posted; however, the gist seems to be something you'd want to do with numpy arrays instead of iterating over lists.
For example, the set of if/else cases that check whether pn[i] >= some_value and then set a corresponding entry in another list with the result (true/false) can be done as a one-liner with an array operation, much faster than iterating over lists.
import numpy as np

# for example, assuming you have 9 numbers in your list
# and you want them divided into 3 sublists of 3 values each
# in0 is your original list, which for example might be:
in0 = [1.05, -0.45, -0.63, 0.07, -0.71, 0.72, -0.12, -1.56, -1.92]

# convert into an array
in2 = np.array(in0)

# reshape to 3 rows; the -1 means that numpy will figure out
# what the second dimension must be
in2 = in2.reshape((3, -1))
print(in2)
output:
[[ 1.05 -0.45 -0.63]
[ 0.07 -0.71 0.72]
[-0.12 -1.56 -1.92]]
With this 2-d array structure, element-wise summing is super easy, and so is element-wise threshold checking. Plus, 'vectorizing' these operations has big speed advantages if you are working with large data.
# add corresponding entries; we want to add the columns together,
# as each row should correspond to one of your sub-lists
pn = in2.sum(axis=0)  # you can sum row-wise or column-wise, or over all elements
print(pn)
output: [ 1. -2.72 -1.83]
# it is also trivial to check the threshold conditions;
# here I check each entry in pn against a scalar
alpha = 0.0
out1 = (pn >= alpha)
print(out1)
output: [ True False False]
# you can easily convert booleans to 1/0
x = out1.astype('int')  # or simply out1 * 1
print(x)
output: [1 0 0]
# if you have a list of element-wise thresholds
beta = np.array([0.0, 0.5, -2.0])
out2 = (pn >= beta)
print(out2)
output: [True False True]
I hope this helps. Using the correct data structures for your task can make the analysis much easier and faster. There is a wealth of documentation on numpy, which is the standard numeric library for python.
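The question's three-way alpha/beta/noise decision can be vectorized the same way, e.g. with np.where (a sketch with made-up threshold values):

alpha, beta, noise = 0.5, -2.0, 0.0  # made-up thresholds for illustration
# 1 where pn >= alpha, 0 where pn <= beta, otherwise fall back to the noise test
out = np.where(pn >= alpha, 1,
               np.where(pn <= beta, 0, (pn >= noise).astype(int)))
print(out)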
You initialize pn to an empty list just inside the for loop, never append anything to it, and then attempt to access index i. There is nothing at index i because there is nothing at any index in pn yet.
for i in range(0, len(in0) / self.number):
    pn = []
    m = i*self.number
    for d in in0[m: m + self.number]:
        pn[i] += d
If you are trying to add the value d to the pn list, you should do this instead:
pn.append(d)
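And if the goal is one running total per block rather than a list of raw samples, a plain accumulator sidesteps the indexing entirely (a sketch, assuming in0 and self.number as in the question; note the integer division):

pn = []
for i in range(len(in0) // self.number):
    m = i * self.number
    pn.append(sum(in0[m:m + self.number]))  # one summed value per block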