Suppose I have a (multivariate) function
f(x,data)
that I want to minimise, where x is the parameter vector. Ordinarily in Python I could do something like this:
scipy.optimize.minimize(f, x0, args=my_data)
However, suppose I now want to do this repeatedly, for lots of different realisations of my_data (this is for some statistical simulations so I need lots of realisations to study the behaviour of test statistics and so on).
I could do it in a big loop, but this is really slow if I have tens of thousands, up to millions, of data realisations.
So, I am wondering if there is some clever way to vectorise this rather than using a loop. I have one vague idea: say I have N data realisations and p parameters. I could make an enormous combined function with N*p parameters, which accepts a data vector of size N, and minimise this giant function all at once; since the realisations don't interact, its global minimum minimises all the individual functions simultaneously.
However, this sounds like a difficult problem for most multivariate minimisers to handle. Yet the structure of the problem is fairly simple, since each block of p parameters can be minimised entirely independently. So I wonder: is there an algorithm in scipy or somewhere that can make use of this known dependency structure between the parameters? Or, if not, is there some other smart way to achieve speedy minimisation of the same function repeatedly?
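To make the idea concrete, here is a sketch of the "giant combined function" approach with a toy least-squares objective standing in for f (the data, the objective, and the analytic gradient are all placeholders for illustration):

import numpy as np
from scipy.optimize import minimize

N, p = 100, 2                      # realisations, parameters per realisation
my_data = np.random.randn(N, 50)   # toy data: one row per realisation

def f_total(x_flat):
    # Block i of p parameters only touches data row i, so the global
    # minimum of f_total minimises every individual problem at once.
    X = x_flat.reshape(N, p)
    r = my_data - X[:, :1]
    return np.sum(r**2) + np.sum(X[:, 1]**2)

def grad_total(x_flat):
    # An analytic gradient keeps the giant problem tractable for L-BFGS.
    X = x_flat.reshape(N, p)
    g = np.empty_like(X)
    g[:, 0] = -2 * (my_data - X[:, :1]).sum(axis=1)
    g[:, 1] = 2 * X[:, 1]
    return g.ravel()

res = minimize(f_total, np.zeros(N * p), jac=grad_total, method="L-BFGS-B")
params = res.x.reshape(N, p)       # row i minimises f(., my_data[i])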
Related
I am just learning to use dask and have read many threads on this forum related to Dask and for loops. But I am still unclear on how to apply those solutions to my problem. I am working with climate data that are functions of (time, depth, location). The 'location' coordinate is a linear index such that each value corresponds to a unique (longitude, latitude). I am showing below a basic skeleton of what I am trying to do, assuming var1 and var2 are two input variables. I want to parallelize over the location parameter 'nxy', as my calculations can proceed simultaneously at different locations.
for loc in range(0, nxy):  # nxy = total number of locations
    for it in range(0, ntimes):
        out1 = expression1 involving ( var1(loc), var2(it, loc) )
        out2 = expression2 involving ( var1(loc), var2(it, loc) )
        # <a dozen more output variables>
My questions:
(i) Many examples illustrating the use of 'delayed' show something like "delayed(function)(arg)". In my case, I don't have too many (if any) functions, but lots of expressions. If 'delayed' only operates at the level of functions, should I convert each expression into a function and add a 'delayed' in front?
(ii) Should I wrap the entire for loop shown above inside a function and then call that function using 'delayed'? I tried doing something like this but might not be doing it correctly as I did not get any speed-up compared to without using dask. Here's what I did:
def test_dask(n):
    for loc in range(0, n):
        # same code as before
    return var1  # just returning one variable for now

var1 = delayed(test_dask)(nxy)
var1.compute()
Thanks for your help.
Every delayed task adds about 1ms of overhead. So if your expression is slow (maybe you're calling out to some other expensive function), then yes dask.delayed might be a good fit. If not, then you should probably look elsewhere.
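If the per-location work really is expensive, the usual trick is to batch many locations into one delayed call so that the ~1ms overhead is amortised. A rough sketch, where process_one_location and nxy are placeholders standing in for the per-location code and the location count from the question:

import dask
from dask import delayed

nxy = 10_000                       # from the question

def process_one_location(loc):     # placeholder for your per-location work
    return loc * 2.0

def process_chunk(locs):
    # One dask task per chunk of locations, not per location.
    return [process_one_location(loc) for loc in locs]

chunk = 1000
tasks = [delayed(process_chunk)(list(range(i, min(i + chunk, nxy))))
         for i in range(0, nxy, chunk)]
results = dask.compute(*tasks)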
In particular, it looks like you're just iterating through a couple arrays and operating element by element. Please be warned that Python is very slow at this. You might want to not use Dask at all, but instead try one of the following approaches:
Find some clever way to rewrite your computation with Numpy expressions
Use Numba
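For instance, the Numba route might look roughly like this (the loop body is a placeholder, since the actual expressions weren't shown):

import numpy as np
from numba import njit

@njit
def compute(var1, var2):
    # var1: shape (nxy,); var2: shape (ntimes, nxy) -- as in the question.
    ntimes, nxy = var2.shape
    out1 = np.empty((ntimes, nxy))
    for loc in range(nxy):
        for it in range(ntimes):
            # Stand-in for "expression1 involving var1(loc), var2(it, loc)".
            out1[it, loc] = var1[loc] * var2[it, loc]
    return out1

var1 = np.random.rand(50)
var2 = np.random.rand(200, 50)
out1 = compute(var1, var2)  # compiled to a fast native loop on first call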
Also, given the terms you're using, like lat/lon/depth, it may be that Xarray is a good project for you.
In python, I'd like to minimize hundreds of thousands of scalar valued functions. Is there a faster way than a for-loop over a def of the objective function and a call to scipy.optimize.minimize_scalar? Basically a vectorized version, where I could give a function foo and a 2D numpy array where each row is given as extra data to the objective function.
This is currently not possible in scipy without using a workaround like multiprocessing, threads, or converting the many one-dimensional problems into a giant multidimensional one. There is, however, a pull request currently open to address this.
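In the meantime, the multiprocessing workaround is straightforward; a sketch, with a made-up objective in place of your foo:

import numpy as np
from multiprocessing import Pool
from scipy.optimize import minimize_scalar

data = np.random.randn(10_000, 3)   # one row of extra data per problem

def solve_one(row):
    # Made-up objective; substitute your own foo(x, row).
    return minimize_scalar(lambda x: (x - row[0])**2 + row[1] * np.sin(x)).x

if __name__ == "__main__":
    with Pool() as pool:
        solutions = pool.map(solve_one, data)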
My function gets a combination of numbers, which can be small or large depending on user input, and loops through each combination to perform some operations.
I ran a line profile on my function and it takes 0.336 seconds to run. While this is fast, it is only a subset of a bigger framework in which I will need to run this function 50-20000 times, which, multiplied by 0.336, comes to 16.8 to 6720 seconds (I hope this is right). It previously took 0.996 seconds, but I've managed to cut that roughly in half by avoiding function calls.
The major contributor to the time is the two __getitem__ calls, which access the dictionary for information N times, depending on the number of combinations. My dictionary is a collection of data and looks something like this:
dic = {"array1", a,
"array2", b,
"array3", c,
"listofarray", [ [list of 5 array], [list of 5 array], [list of 5 2d Array ] ]
}
I was able to cut another ~0.01 seconds by moving the dictionary lookup outside of the loop:
x = dic['listofarray'][2]  # the list of 5 2D arrays
So when I loop to access the 5 different elements, I just use x[i].
Other than that, I am lost as to where to find further performance gains.
Note: I apologize that I haven't provided any code. I'd love to show it, but it's proprietary. I just wanted to get some thoughts on whether I am looking in the right place for speed-ups.
I am willing to learn and apply new things, so if Cython or some other data structure can speed things up, I am all ears. Thanks so much!
PS: [line-profiler traces for the two __getitem__ calls were attached here]
EDIT:
I am using itertools.product(xrange(10), repeat=len(food_choices)) and iterating over this. I convert everything into numpy arrays with np.array(i).astype(float).
The major contributor to the time is the two __getitem__ calls, which access the dictionary for information N times, depending on the number of combinations.
No it isn't. Your two posted profile traces clearly show that they're NumPy/Pandas __getitem__ functions, not dict.__getitem__. So, you're trying to optimize the wrong place.
Which explains why moving all the dict stuff out of the loop made a difference of a small fraction of a percent.
Most likely the problem is that you're looping over some NumPy object, or using some fake-vectorized function (e.g., via vectorize), rather than performing some NumPy-optimized broadcasting operation. That's what you need to fix.
For example, if you compare these:
np.vectorize(lambda x: x*2)(a)
a * 2
… the second one will go at least 10x faster on any sizable array, and it's mostly because of all the time spent doing __getitem__, which includes boxing up numbers to be usable by your Python function. (There's also some additional cost in not being able to use CPU-vectorized operations, cacheable tight loops, etc., but even if you arrange things to be complicated enough that those don't enter into it, you're still going to get much faster code.)
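You can measure the gap yourself with something like:

import numpy as np
import timeit

a = np.arange(100_000, dtype=float)
slow = np.vectorize(lambda x: x * 2)            # element-by-element through Python
print(timeit.timeit(lambda: slow(a), number=10))
print(timeit.timeit(lambda: a * 2, number=10))  # one vectorized C loop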
Meanwhile:
I am using itertools.product(xrange(10), repeat=len(food_choices)) and iterating over this. I convert everything into numpy arrays with np.array(i).astype(float).
So you're creating 10**n separate n-element arrays? That's not making sensible use of NumPy. Each array is tiny, and most likely you're spending as much time building and pulling apart the arrays as you are doing actual work. Do you have the memory to build a single giant array with an extra 10**n-long axis instead? Or, maybe, batch it up into groups of, say, 100K? Because then you could actually build and process the whole array in native NumPy-vectorized code.
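For example (with n standing in for len(food_choices)), you could build the whole thing as one array up front:

import numpy as np
from itertools import product

n = 4  # stand-in for len(food_choices)

# One (10**n, n) float array instead of 10**n tiny arrays:
combos = np.array(list(product(range(10), repeat=n)), dtype=float)

# Or build it without any Python-level looping at all:
grids = np.meshgrid(*([np.arange(10.0)] * n), indexing="ij")
combos = np.stack(grids, axis=-1).reshape(-1, n)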
However, the first thing you might want to try is to just run your code in PyPy instead of CPython. Some NumPy code doesn't work right with PyPy/NumPyPy, but there are fewer problems with each version, so you should definitely try it.
If you're lucky (and there's a pretty good chance of that), PyPy will JIT the repeated __getitem__ calls inside the loop, and make it much faster, with no change in your code.
If that helps (or if NumPyPy won't work on your code), Cython may be able to help more. But only if you do all the appropriate static type declarations, etc. And often, PyPy already helps enough that you're done.
The functions min and max are very flexible; they can take any number of parameters, or a single parameter that is an iterable. any and all are similar in taking an iterable of any size, but they do not take more than one parameter. Is there a reason for this difference in behavior?
I realize that the question might seem unanswerable, but the process of enhancing Python is pretty open; many seemingly arbitrary design decisions are part of the public record. I've seen similar questions answered in the past, and I'm hoping this one can be as well.
Inspired by this question: Is there a builtin function version of and and/or or in Python?
A lot of the features in Python are suggested based on how much users need them; however, they must also conform to the style of the language. People often need to do this:
max_val = 0
for x in seq:
    # ... do complex calculations
    max_val = max(max_val, result)
which warrants multiple parameters, and it reads well too. I haven't heard of anyone needing any(x, y, z), because any and all are most often used on sequences. For a small number of values you can just use the and/or logical operators, and for a lot of values you really should be using a list anyway, or your code gets messy. I suspect not much thought has gone into this: it hasn't been in large demand, so the Python devs don't worry about it.
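To illustrate the two idioms:

# A handful of values: plain logical operators read fine.
a, b, c = True, True, False
ok = a and b and c

# Many values: collect them in an iterable and use all()/any(),
# which accept a single iterable (unlike min()/max()'s dual form).
flags = [x % 2 == 0 for x in range(10)]
ok = all(flags)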
I have been trying to optimize this code:
import numpy as np
import scipy.integrate
import scipy.special
from scipy.interpolate import interp1d

def spectra(mE, a):
    pdf = lambda E: np.exp(-(a+1)*E/mE) * (((a+1)*E/mE)**(a+1)) / (scipy.special.gamma(a+1)*E)
    u = []
    n = np.random.uniform()
    E = np.arange(0.01, 150, 0.1)
    for i in E:
        u.append(scipy.integrate.quad(pdf, 0, i)[0])
    f = interp1d(u, E)
    return f(n)
I was trying to create a lookup table out of f, but it appears that every time I call the function it does the integration all over again. Is there a way to put in something like an if statement, which will let me create f once for values of mE and a and then just call that afterwards?
Thanks for the help.
Cheers.
It sounds like what you want to do is return a known value if the function is re-called with the same (mE, a) values, and perform the calculation if the input is new.
That is called memoization. See, for example, What is memoization and how can I use it in Python? Note that in modern versions of Python you can apply memoization with a decorator (e.g. functools.lru_cache), which keeps things a little neater.
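Since the expensive part of the code above is building the interpolant rather than evaluating it, one way to apply this is to cache the interpolant keyed on (mE, a) and keep the random draw outside the cached function. A sketch, assuming mE and a are passed as plain hashable numbers:

from functools import lru_cache
import numpy as np
import scipy.integrate
import scipy.special
from scipy.interpolate import interp1d

@lru_cache(maxsize=None)
def make_inverse_cdf(mE, a):
    # The expensive part: computed once per distinct (mE, a) pair.
    pdf = lambda E: np.exp(-(a+1)*E/mE) * (((a+1)*E/mE)**(a+1)) / (scipy.special.gamma(a+1)*E)
    E = np.arange(0.01, 150, 0.1)
    u = [scipy.integrate.quad(pdf, 0, e)[0] for e in E]
    return interp1d(u, E)

def spectra(mE, a):
    f = make_inverse_cdf(mE, a)   # cached after the first call
    return f(np.random.uniform())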
Most probably you won't be able to store values of spectra(x, y) and reasonably retrieve them by exact floating-point values of x and y. You rarely encounter exactly the same floating-point value twice in real life.
Note that I don't think you can cache f directly, because it depends on a long list of floats. Its possible input space is so large that finding a close match seems very improbable to me.
If you cache values of spectra() you could retrieve the value for a close enough pair of arguments with a reasonable probability.
The problem is searching for such close pairs. A hash table cannot work (we need imprecise matches), an ordered list and binary search cannot work either (we have 2 dimensions). I'd use a quad tree or some other form of spatial index. You can build it dynamically and efficiently search for closest known points near your given point.
If you find a cached point really close to the one you need, you can just return the cached value. If no point is close enough, you add the new one to the index, in the hope that it will be reused in the future. Maybe you could even interpolate if your point lies between two known points close by.
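A minimal sketch of that idea using scipy's cKDTree as the spatial index (rebuilding the tree on every insert, which is fine for modest cache sizes; the tolerance is a made-up knob you'd tune to your problem):

import numpy as np
from scipy.spatial import cKDTree

class NearbyCache:
    def __init__(self, tol=1e-3):
        self.tol = tol          # how close counts as "the same point"
        self.points = []
        self.values = []
        self.tree = None

    def lookup(self, mE, a):
        # Return a cached value if a known point lies within tol, else None.
        if self.tree is not None:
            dist, idx = self.tree.query([mE, a])
            if dist <= self.tol:
                return self.values[idx]
        return None

    def store(self, mE, a, value):
        self.points.append([mE, a])
        self.values.append(value)
        self.tree = cKDTree(self.points)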
The big prerequisite, of course, is that a sufficient number of points in the cache has a chance to be reused. To estimate this, run some of your calculations and store the (mE, a) pairs somewhere (e.g. in a file), then plot them. You'll instantly see if you have groups of points close to one another. You can look for tight clusters without plotting too, of course. If you have enough tight clusters (where you could reuse one point's value for another), your cache will work. If not, don't bother implementing it.