This is what my code looks like when simplified:
# This function returns some value depending on the index (integer)
# with which it is called.
def funct(index):
    value = some_process[index]  # placeholder for the real computation
    # Return value for this index.
    return value
where the indexes allowed are stored in a list:
# List of indexes.
x = list(range(1001))  # [0, 1, 2, ..., 1000]
I need to find the index in x that returns the minimum value of funct. I could just apply a brute-force approach: loop through the full x list, store all values in a new list, and then find the position of the minimum with np.argmin():
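import numpy as np

list_of_values = []
for indx in x:
    f_x = funct(indx)  # call with the current index, not the whole list
    list_of_values.append(f_x)
# np.argmin returns the position of the minimum, which indexes into x.
min_index = np.argmin(list_of_values)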
I've tried this and it works, but it becomes quite expensive when x has too many elements. I'm looking for a way to optimize this process.
I know that scipy.optimize has some optimization functions to find a global minimum like anneal and basin-hopping but I've failed to correctly apply them to my code.
Can these optimization tools be used when I can only call a function with an integer (the index) or do they require the function to be continuous?
The Python builtin min accepts a key function:
min_idx = min(x, key=funct)
min_val = funct(min_idx)
This is still O(n), since funct has to be evaluated once for every index, but it is about as efficient as a linear scan is going to get in pure Python.
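If funct is expensive, note that the two-line version above calls it one extra time for the winning index. A minimal sketch that collects both the index and the value in a single pass, assuming a stand-in funct (the quadratic below is purely illustrative, not the asker's function):

```python
def funct(index):
    # Hypothetical stand-in for the real, expensive computation.
    return (index - 3) ** 2

x = list(range(10))

# Pair each value with its index; tuples compare by value first,
# so min() returns the smallest value together with its index.
min_val, min_idx = min((funct(i), i) for i in x)
print(min_idx, min_val)
```

On ties this picks the smallest index, which matches what min(x, key=funct) does when x is in ascending order.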
Let's say I have inputs 'A' and 'B' for my function, which outputs 'C'. For each value of A, I would like to find what value of B results in the maximum value of C; I would then like to record values B and C. Is there a function that can perform this action? Perhaps something which depends on convergence mechanisms?
Note: in case you found this through one of the non-Python tags I applied, please note that I am using Python 3.x.
Let's define a function f that takes parameters (A, B) and returns a value C. We can then optimize over B for each value of A in Python by doing
import numpy as np
from scipy import optimize

f = lambda a, b: ...  # your_code_which_returns_C
optimal_vals = np.zeros((2, len(list_of_all_A_values)))
for i, a in enumerate(list_of_all_A_values):  # assuming some list is defined above
    # full_output=True makes fmin return (xopt, fopt, ...); fopt is the minimum of -f
    b_opt, neg_c_opt, *rest = optimize.fmin(lambda b: -f(a, b[0]), 0, full_output=True)
    optimal_vals[:, i] = [b_opt[0], -neg_c_opt]
This takes advantage of scipy's fmin function, which relies on the convergence of the downhill simplex algorithm. Because fmin minimizes, it's crucial not to forget the minus sign on f: maximizing C is the same as minimizing -f.
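As a dependency-free sanity check on the result of an optimizer, a coarse grid search can play the same role. This is a swapped-in brute-force technique, not fmin, and the objective f, the list a_values, and the grid b_grid are all hypothetical choices made for illustration:

```python
def f(a, b):
    # Hypothetical objective: for a given a, C peaks at b == a with value a.
    return a - (b - a) ** 2

a_values = [0, 1, 2]
b_grid = [k * 0.5 - 5 for k in range(21)]  # -5.0, -4.5, ..., 5.0

results = []
for a in a_values:
    # For this a, pick the grid point b that maximizes C = f(a, b).
    b_best = max(b_grid, key=lambda b: f(a, b))
    results.append((a, b_best, f(a, b_best)))
print(results)
```

The grid only finds the best point it contains, so its resolution bounds the accuracy; it is mainly useful for checking that a local optimizer did not settle on the wrong peak.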
I'm looking for a better, faster way to center a couple of lists. Right now I have the following:
import random
m = range(2000)
sm = sorted(random.sample(range(100000), 16000))
si = random.sample(range(16005), 16000)
# Centered array.
smm = []
print(sm)
print(si)
for i in m:
    if i in sm:
        smm.append(si[sm.index(i)])
    else:
        smm.append(None)
print(m)
print(smm)
In effect this creates a list (m) of values to center against, a sorted list (sm) that is matched against m, and a list of values (si) to append wherever a match is found.
This sample runs fairly quickly, but when I run a larger task with many more variables, performance slows to a standstill.
Your main loop contains this infamous line:
if i in sm:
It looks harmless, but sorted returns a list, so every membership test is an O(n) scan, which explains why it's slow with a big dataset.
Moreover, you're using the even more infamous si[sm.index(i)], which runs a second O(n) search on every hit, so the loop as a whole is O(n**2).
Since you need the index and not just membership, a set is not a direct fit, and there is something better to do.
Since sm is sorted, you can use bisect to find the index in O(log(n)), like this:
import bisect

smm = []
for i in m:
    j = bisect.bisect_left(sm, i)
    smm.append(si[j] if (j < len(sm) and sm[j] == i) else None)
A short explanation: bisect gives you the insertion point of i in sm. That doesn't mean the value is actually in the list, so we have to check, first that the returned index is within the list range, and then that the value at that index is the one we searched for. If so, append the matching value; otherwise append None.
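The bisect lookup can be wrapped in a small function and checked on a tiny input. This is a minimal sketch; the function name center and its parameter names are made up for the example:

```python
import bisect

def center(targets, sorted_keys, values):
    """For each target, return the value paired with it in sorted_keys, else None."""
    out = []
    for t in targets:
        # Insertion point of t; t is present only if the key at that point equals t.
        j = bisect.bisect_left(sorted_keys, t)
        found = j < len(sorted_keys) and sorted_keys[j] == t
        out.append(values[j] if found else None)
    return out

sm = [1, 4, 7]
si = ['a', 'b', 'c']
smm = center(range(8), sm, si)
print(smm)
```

If the keys in sm are unique, building a dict with dict(zip(sm, si)) once and calling .get(i) for each i is another option, with average O(1) lookups.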
I'd like to figure out how to code the following pseudo-code:
# Base-case
u_0(x) = x^3
for i in [0,5):
    u_(i+1)(x) = u_(i)(x)^2
So that in the end I can call u_5(x), for example.
The difficulty I'm having with accomplishing the above is finding a way to index Python functions by i so that I can iteratively define each function.
I tried using recursion with two functions in place of indexing but I get "maximum recursion depth exceeded".
Here is a minimal working example:
import math
import sympy as sym
a,b = sym.symbols('x y')
def f1(x,y):
    return sym.sin(x) + sym.cos(y)*sym.tan(x*y)
for i in range(0,5):
    def f2(x,y):
        return sym.diff(f1(x,y),x) + sym.cos(sym.diff(f1(x,y),y,y))
    def f1(x,y):
        return f2(x,y)
print(f2(a,b))
Yes, the general idea would be to "index" the results in order to avoid recalculating them. The simplest way to achieve that is to "memoize", meaning telling a function to remember the result for values it has already calculated.
If f(i+1) is based on f(i) where i is a natural number, that can be especially effective.
In Python 3, doing it for a one-variable function is surprisingly simple, with a decorator:
import functools
@functools.lru_cache(maxsize=None)
def f(x):
    .....
    return ....
To learn more about this, see "What is memoization and how can I use it in Python?". (If you are using Python 2.7, there is also a way to do it with a prepackaged decorator.)
Your specific case (if my understanding of your pseudo-code is correct) relies on a two-variable function, where i is an integer variable and x is a symbol (i.e. not supposed to be resolved here). So you would need to memoize along i.
To avoid blowing the stack when you ask directly for the image of 5 (I'm not sure why it happens, but no doubt there is more recursion than meets the eye), use a for loop to calculate your images over the range from 0 to 5, in that order: 0, 1, 2, and so on.
I hope this helps.
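A minimal sketch of memoizing along i, assuming the recurrence from the question's pseudo-code and a plain number for x instead of a sympy symbol (lru_cache needs hashable arguments, which sympy symbols also are, so the same pattern should carry over):

```python
import functools

@functools.lru_cache(maxsize=None)
def u(i, x):
    # Base case u_0(x) = x^3; each later level squares the previous one.
    if i == 0:
        return x ** 3
    return u(i - 1, x) ** 2

# Fill the cache bottom-up (0, 1, 2, ...) so each call only needs
# the already-cached previous level and never recurses deeply.
values = [u(i, 2) for i in range(6)]
print(values[5] == 2 ** 96)
```

Each level is computed once; asking for u(5, 2) again afterwards is a cache hit.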
The answer is actually pretty simple:
Pseudocode:
u_0(x) = x^3
for i in [0,5):
    u_(i+1)(x) = u_(i)(x)^2
Actual code:
import sympy as sym
u = [None]*6  # Creates a list of 6 empty slots, i.e., u[0], u[1], ..., u[5]
x = sym.symbols('x')
u[0] = lambda x: x**3
for i in range(0,5):
    # The i=i in the argument of the lambda function is necessary in Python;
    # for more about this, see this question.
    u[i+1] = lambda x, i=i: (u[i](x))**2
# Now the functions are stored in the list u. However, to call them (e.g., evaluate them,
# plot them, print them, etc.) requires that we "lambdify" them, i.e., replace sympy
# functions with numpy functions, which the following for loop accomplishes:
ulambdified = [None]*6  # must exist before we assign into it
for i in range(0,6):
    ulambdified[i] = sym.lambdify(x, u[i](x), "numpy")
for i in range(0,6):
    print(ulambdified[i](x))
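The i=i default argument in the loop above works around Python's late-binding closures: a lambda looks up a free variable when it is called, not when it is defined. A small standalone illustration (the list names are made up for the example):

```python
# Each lambda here refers to the loop variable i itself, so by the time
# any of them is called, i already holds its final value.
late = [lambda: i for i in range(3)]

# A default argument is evaluated at definition time, freezing the current i.
bound = [lambda i=i: i for i in range(3)]

late_results = [f() for f in late]
bound_results = [f() for f in bound]
print(late_results, bound_results)
```

Without i=i, every u[i+1] in the answer's loop would end up referring to the same final i.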
I have a list of numbers, along with the sample mean and SD of these numbers. Right now I am trying to find the numbers that fall outside mean +/- SD, mean +/- 2SD, and mean +/- 3SD.
For example, for the mean +/- SD part, I wrote the code like this:
ND1 = [np.mean(l)+np.std(l,ddof=1)]
ND2 = [np.mean(l)-np.std(l,ddof=1)]
m=sorted(l)
print(m)
ND68 = []
if ND2 > m and m< ND1:
    ND68.append(m<ND2 and m>ND1)
print (ND68)
Here is my question:
1. Can these numbers be picked out of the list directly? If so, which part am I doing wrong? Or is there a package I can use to solve this?
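This might help. We will use numpy to grab the values you are looking for. In my example, I create a normally distributed array and then use boolean indexing to return the elements that are outside of +/- 1, 2, or 3 standard deviations. Note that my_array.std() uses ddof=0 by default; pass ddof=1 if you want the sample standard deviation used in the question.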
import numpy as np
# create a random normally distributed integer array
my_array = np.random.normal(loc=30, scale=10, size=100).astype(int)
# find the mean and standard dev
my_mean = my_array.mean()
my_std = my_array.std()
# find numbers outside of 1, 2, and 3 standard dev
# the expression inside the square brackets produces a
# boolean array (a mask) of True and False values. Indexing
# my_array with that mask returns only the elements whose
# mask entry is True
out_std_1 = my_array[np.abs(my_array-my_mean) > my_std]
out_std_2 = my_array[np.abs(my_array-my_mean) > 2*my_std]
out_std_3 = my_array[np.abs(my_array-my_mean) > 3*my_std]
You are on the right track there. You know the mean and standard deviation of your list l, though I'm going to call it something a little less ambiguous, say, samplePopulation.
Because you want to do this for several intervals of standard deviation, I recommend crafting a small function. You can call it multiple times without too much extra work. Also, I'm going to use a list comprehension, which is just a for loop in one line.
import numpy as np
def filter_by_n_std_devs(samplePopulation, numStdDevs):
    # you mostly got this part right, no need to put them in lists though
    mean = np.mean(samplePopulation) # no brackets needed here
    std = np.std(samplePopulation, ddof=1) # or here; ddof=1 gives the sample SD, as in your code
    band = numStdDevs * std
    # this is the list comprehension
    filteredPop = [x for x in samplePopulation if x < mean - band or x > mean + band]
    return filteredPop
# now call your function with however many std devs you want
filteredPopulation = filter_by_n_std_devs(samplePopulation, 1)
print(filteredPopulation)
Here's a translation of the list comprehension (based on your use of append it looks like you may not know what these are, otherwise feel free to ignore).
# remember that you provide the variable samplePopulation
# the above list comprehension
filteredPop = [x for x in samplePopulation if x < mean - band or x > mean + band]
# is equivalent to this:
filteredPop = []
for x in samplePopulation:
    if x < mean - band or x > mean + band:
        filteredPop.append(x)
So to recap:
You don't need to make a list object out of your mean and std calculations
The function call lets you plug in your samplePopulation and any number of standard deviations you want without having to go in and manually change the value
List comprehensions are one line for loops, more or less, and you can even do the filtering you want right inside it!
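To run the same filter for 1, 2, and 3 standard deviations in one go, a dict comprehension over the multipliers is enough. This sketch swaps numpy for the standard library's statistics module (statistics.stdev is the sample standard deviation, like ddof=1); the function name and the sample data are made up for the example:

```python
import statistics

def outside_n_std_devs(data, n):
    """Return the elements of data more than n sample SDs away from the mean."""
    mean = statistics.mean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    band = n * std
    return [x for x in data if x < mean - band or x > mean + band]

sample = [1, 2, 3, 4, 100]
outliers = {n: outside_n_std_devs(sample, n) for n in (1, 2, 3)}
print(outliers)
```

With so few points a single extreme value inflates the SD enough that nothing falls outside 2 SD, which is worth keeping in mind when using these bands on small samples.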
I am trying to run through a list and delete elements that do not meet a certain threshold, but I am receiving the error 'float' object does not support item deletion when I try to delete.
Why am I getting this error? Is there any way to delete items from lists like this for floats?
Relevant Code:
def remove_abnormal_min_max(distances, avgDistance):
    #Define cut off for abnormal roots
    cutOff = 0.20 * avgDistance # 20 percent of avg distance
    for indx, distance in enumerate(distances): #for all the distances
        if(distance <= cutOff): #if the distance between min and max is less than or equal to cutOff point
            del distance[indx] #delete this distance from the list
    return distances
Your list of float values is called distances (plural), each individual float value from that sequence is called distance (singular).
You are trying to use the latter, rather than the former. del distance[indx] fails because that is the float value, not the list object.
All you need to do is add the missing s:
del distances[indx]
#           ^
However, now you are modifying the list in place, shortening it as you loop. This'll cause you to miss elements; items that were once at position i + 1 are now at i while the iterator happily continues at i + 1.
The work-around to that is to build a new list object with everything you wanted to keep instead:
distances = [d for d in distances if d > cutOff]
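The skipped-element effect is easy to reproduce on a small list. A minimal sketch with made-up numbers, comparing in-place deletion against rebuilding the list:

```python
cut_off = 2
original = [1, 1, 5, 1, 6]

# In-place deletion while iterating: the element right after each
# deleted one slides into the freed slot and is never examined.
in_place = list(original)
for indx, d in enumerate(in_place):
    if d <= cut_off:
        del in_place[indx]

# Building a new list examines every element exactly once.
rebuilt = [d for d in original if d > cut_off]
print(in_place, rebuilt)
```

The in-place version leaves a stray 1 behind because the second element moved to index 0 just as the iterator advanced to index 1.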
You mentioned in your comment that you need to reuse the index of the deleted distance. You can build a list of all the indxs you need at once using a list comprehension:
indxs = [k for k,d in enumerate(distances) if d <= cutOff]
And then you can iterate over this new list to do the other work you need:
for indx in reversed(indxs): # delete from the end so the earlier indexes stay valid
    del distances[indx]
    del otherlist[2*indx:2*indx+2] # or whatever
You may also be able to massage your other work into another list comprehension:
indxs = [k for k,d in enumerate(distances) if d > cutOff] # note reversed logic
distances = [distances[indx] for indx in indxs] # one statement so doesn't fall in the modify-as-you-iterate trap
otherlist = [v for indx in indxs for v in otherlist[2*indx:2*indx+2]] # assuming two entries per distance
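A small worked example of keeping a second list in sync while deleting by index. It assumes, purely for illustration, that otherlist holds two entries per distance; the data is made up:

```python
cut_off = 1.0
distances = [0.1, 5.0, 0.2, 6.0]
otherlist = ['a0', 'a1', 'b0', 'b1', 'c0', 'c1', 'd0', 'd1']

# Collect the indexes to drop first, then delete from the highest index
# down so that the positions of the remaining items do not shift.
drop = [k for k, d in enumerate(distances) if d <= cut_off]
for indx in reversed(drop):
    del distances[indx]
    del otherlist[2 * indx:2 * indx + 2]
print(distances, otherlist)
```

Deleting in ascending order instead would remove the wrong elements after the first deletion, since every later index would be off by one.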
As an aside, if you are using NumPy, which is a numerical and scientific computing package for Python, you can take advantage of boolean arrays and boolean (mask) indexing to select the elements you want directly:
import numpy as np
distances = np.array(distances) # convert to a numpy array so we can use boolean indexing
keep = distances > cutOff # boolean mask: True for the distances to keep
distances = distances[keep] # this won't work on a regular Python list