In Python, I'd like to minimize hundreds of thousands of scalar-valued functions. Is there a faster way than a for-loop over a def of the objective function and a call to scipy.optimize.minimize_scalar? Basically a vectorized version, where I could pass a function foo and a 2D numpy array in which each row is given as extra data to the objective function.
This is not currently possible in scipy without a workaround such as multiprocessing, threads, or converting the many one-dimensional problems into one giant multidimensional problem. There is, however, a pull request currently open to address this.
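As one more possible workaround (not the scipy pull request mentioned above, and not a scipy API), a bracketed scalar minimizer can be written directly in numpy so that every problem advances in lockstep. The sketch below is a plain golden-section search; it assumes each objective is unimodal on a common interval [lo, hi], and the names golden_section_batch and foo, as well as the parabola used as the objective, are hypothetical.

```python
import numpy as np

def golden_section_batch(f, data, lo, hi, n_iter=60):
    """Minimize f(x, data) row by row over [lo, hi], all rows at once.

    f takes x of shape (n,) and data of shape (n, k) and returns shape (n,).
    Assumption: every row's objective is unimodal on [lo, hi].
    """
    invphi = (np.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    n = data.shape[0]
    a = np.full(n, lo, dtype=float)
    b = np.full(n, hi, dtype=float)
    for _ in range(n_iter):
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        # Simple variant: both interior points are re-evaluated each pass.
        left_is_lower = f(c, data) < f(d, data)
        # If f(c) < f(d) the minimum lies in [a, d], otherwise in [c, b].
        b = np.where(left_is_lower, d, b)
        a = np.where(left_is_lower, a, c)
    return 0.5 * (a + b)

def foo(x, data):
    # Hypothetical objective: one shifted parabola per row of data.
    return (x - data[:, 0]) ** 2 + data[:, 1]

rng = np.random.default_rng(0)
data = rng.uniform(-5.0, 5.0, size=(100_000, 2))
x_min = golden_section_batch(foo, data, lo=-10.0, hi=10.0)
```

The loop runs over iterations rather than over problems, so the Python overhead no longer grows with the number of rows. Golden-section search converges only linearly, which is the price of keeping the sketch short.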
Related
I'm writing a Python script which takes signals from various components, such as an accelerometer, GPS position data etc., and then saves all of this data in one class (I'm calling it a signal class). This way, it doesn't matter where the acceleration data is coming from, because it will all be processed into the same format. Every time there is a new piece of data, I currently add the new value to an expanding list. I chose a list because I believe it is dynamic, so you can add data without too much computational cost, compared to numpy arrays, which are static.
However, I need to also perform mathematical operations on these datasets in near live-time. Would it be faster to:
Store the data initially as a numpy array, and expand it as data is added
Store the data as an expanding list, and every time some math needs to be performed on the data convert what is needed into a numpy array and then use numpy functions
Keep all of the data as lists, and write custom functions to perform the math.
Some other method that I don't know about?
The update times vary, depending on where the data comes from, from anywhere between 1Hz to 1000Hz.
Suppose I have a (multivariate) function
f(x,data)
that I want to minimise, where x is the parameter vector. Ordinarily in Python I could do something like this:
scipy.optimize.minimize(f, x0, args=my_data)
However, suppose I now want to do this repeatedly, for lots of different realisations of my_data (this is for some statistical simulations so I need lots of realisations to study the behaviour of test statistics and so on).
I could do it in a big loop, but this is really slow if I have tens of thousands, up to millions, of data realisations.
So, I am wondering if there is some clever way to vectorise this rather than using a loop. I have one vague idea; say I have N data realisations and p parameters. I could make an enormous combined function with N*p parameters, which accepts a data vector of size N, and finds the minimum of this giant function all at once, where the global minimum of this giant function minimises all the individual functions simultaneously.
However this sounds like a difficult problem for most multivariate minimisers to handle. Yet, the structure of the problem is fairly simple, since each block of p parameters can be minimised entirely independently. So, I wonder, is there an algorithm in scipy or somewhere that can make use of this known dependency structure between the parameters? Or, if not, is there some other smart way to achieve a speedy minimisation of the same function repeatedly?
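To make the "giant combined function" idea concrete, here is a minimal sketch under stated assumptions: the per-realisation objective is a hypothetical straight-line least-squares fit with p = 2 parameters, and the summed objective is handed to scipy.optimize.minimize with method="L-BFGS-B" together with an analytic gradient. Because the blocks are independent, the gradient of the sum is just the per-block gradients laid side by side. Nothing here tells the optimizer about the block structure; it only shows that the combined problem can be set up and solved.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, m, p = 200, 20, 2                     # realisations, points each, parameters
t = np.linspace(0.0, 1.0, m)

# Hypothetical data: noisy straight lines, one per realisation.
true_params = rng.uniform(-1.0, 1.0, size=(N, p))
data = true_params[:, :1] + true_params[:, 1:] * t
data = data + 0.1 * rng.standard_normal((N, m))

def total_objective(flat_x, data):
    """Sum of the N independent objectives, plus its gradient."""
    x = flat_x.reshape(N, p)
    resid = data - x[:, :1] - x[:, 1:] * t          # shape (N, m)
    value = np.sum(resid ** 2)
    grad = np.empty_like(x)
    grad[:, 0] = -2.0 * resid.sum(axis=1)           # d/d(intercept)
    grad[:, 1] = -2.0 * (resid * t).sum(axis=1)     # d/d(slope)
    return value, grad.ravel()

res = minimize(
    total_objective,
    np.zeros(N * p),
    args=(data,),
    jac=True,                 # the objective returns (value, gradient)
    method="L-BFGS-B",
    options={"ftol": 1e-12, "gtol": 1e-8},
)
x_hat = res.x.reshape(N, p)   # row i holds the fit for realisation i
```

One caveat with this setup: the stopping tests are applied to the summed objective, so a few poorly converged blocks can hide behind many well converged ones.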
I wrote a program using normal Python, and I now think it would be a lot better to use numpy instead of standard lists. The problem is there are a number of things where I'm confused how to use numpy, or whether I can use it at all.
In general, how do np.arrays work? Are they dynamic in size like a C++ vector, or do I have to declare their length and type beforehand like a standard C++ array? In my program I've got a lot of cases where I create a list
ex_list = [] and then cycle through something and append to it with ex_list.append(some_lst). Can I do something like this with a numpy array? What if I knew the size of ex_list, could I declare an empty one and then add to it?
If I can't, let's say I only call this list, would it be worth it to convert it to numpy afterwards, i.e. is calling a numpy list faster?
Can I do more complicated operations for each element using a numpy array (not just adding 5 to each etc), example below.
full_pallete = [(int(1+i*(255/127.5)),0,0) for i in range(0,128)]
full_pallete += [col for col in right_palette if col[1]!=0 or col[2]!=0 or col==(0,0,0)]
In other words, does it make sense to convert to a numpy array and then cycle through it using something other than for loop?
Numpy arrays can be appended to (see http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html), although in general calling the append function many times in a loop has a heavy performance cost - it is generally better to pre-allocate a large array and then fill it as necessary. This is because the arrays themselves do have fixed size under the hood, but this is hidden from you in python.
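A minimal sketch of the pre-allocation idea: keep a buffer larger than the data, fill it in place, and double its capacity only when it runs out. The class name GrowableArray and its methods are hypothetical, not part of numpy.

```python
import numpy as np

class GrowableArray:
    """1-D buffer that is filled in place and doubled when full."""

    def __init__(self, capacity=1024, dtype=float):
        self._buf = np.empty(capacity, dtype=dtype)
        self._n = 0                       # number of valid entries

    def append(self, value):
        if self._n == self._buf.shape[0]:
            # Out of room: allocate twice the capacity and copy once.
            new_buf = np.empty(2 * self._buf.shape[0], dtype=self._buf.dtype)
            new_buf[:self._n] = self._buf
            self._buf = new_buf
        self._buf[self._n] = value
        self._n += 1

    def view(self):
        # A slice of the filled part; no data is copied here.
        return self._buf[:self._n]

samples = GrowableArray(capacity=4)
for i in range(10):
    samples.append(i)
mean_so_far = samples.view().mean()
```

With doubling, the copy happens only occasionally, whereas numpy.append copies the whole array on every call.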
Yes, Numpy is well designed for many operations similar to these. In general, however, you don't want to be looping through numpy arrays (or arrays in general in python) if they are very large. By using inbuilt numpy functions, you basically make use of all sorts of compiled speed up benefits. As an example, rather than looping through and checking each element for a condition, you would use numpy.where().
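As an illustration, the two list comprehensions from the question can be written without an explicit loop. This is a sketch only: right_palette is not defined in the question, so the small array used for it below is made up for the example.

```python
import numpy as np

# Red ramp: int(1 + i*(255/127.5)) is 1 + 2*i for i in 0..127.
red_ramp = np.zeros((128, 3), dtype=int)
red_ramp[:, 0] = 1 + 2 * np.arange(128)

# Hypothetical stand-in for right_palette (one RGB triple per row).
right_palette = np.array([(0, 0, 0), (5, 0, 0), (0, 3, 0),
                          (1, 2, 3), (9, 0, 0), (0, 0, 4)])

# Same condition as the comprehension: G != 0 or B != 0 or colour is black.
keep = ((right_palette[:, 1] != 0)
        | (right_palette[:, 2] != 0)
        | (right_palette == 0).all(axis=1))

# Boolean indexing selects the matching rows in one step.
full_palette = np.concatenate([red_ramp, right_palette[keep]])

# numpy.where(keep) gives the row indices that passed the test.
kept_rows = np.where(keep)[0]
```

The comparisons run over whole columns at once, which is where the speed-up over a Python-level loop comes from.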
The real reason to use numpy is to benefit from pre-compiled mathematical functions and data processing utilities on large arrays - both those in the core numpy library as well as many other packages that use them.
I will need to create an array of integer arrays like [[0,1,2],[4,4,5,7]...[4,5]]. The sizes of the internal arrays vary. The maximum number of internal arrays is 2^26. What do you recommend as the fastest way to update this array?
When I use list=[[]] * 2**26, initialization is very fast but updating is very slow. Instead I use
list=[] , for i in range(2**26): list.append([])
Now initialization is slow, but updating is fast. For example, for 16777216 internal arrays with an average of 0.213827311993 elements each, in a 2^26-element array, it takes 1.67728900909 sec. That is good, but I will be working with much bigger data, so I need the best approach. Initialization time is not important.
Thank you.
What you ask is quite a problem. Different data structures have different properties. In general, if you need quick access, do not use lists! They have linear access time, which means the more you put in them, the longer it will take on average to access an element.
You could perhaps use numpy? That library has matrices that can be accessed quite fast, and can be reshaped on the fly. However, if you want to add or delete rows, it might be a bit slow because it generally reallocates (and thus copies) the entire data. So it is a trade-off.
If you are gonna have so many internal arrays of different sizes, perhaps you could have a dictionary that contains the internal arrays. I think if it is indexed by integers it will be much faster than a list. Then, the internal arrays could be created with numpy.
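One detail about the two initializations in the question is worth checking in a small sketch before picking a data structure. Multiplying a list that contains one inner list repeats a reference to that single inner list, while a loop or a comprehension creates a separate inner list for each slot. Whether this explains the timings reported in the question is an assumption; the aliasing itself is standard Python behaviour.

```python
n = 5

# [[]] * n stores n references to ONE inner list.
aliased = [[]] * n
aliased[0].append(1)          # shows up in every slot

# A comprehension builds n separate inner lists.
independent = [[] for _ in range(n)]
independent[0].append(1)      # only slot 0 changes

all_same_object = all(inner is aliased[0] for inner in aliased)
```

Note also that ^ is bitwise XOR in Python, so the power has to be written 2**26.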
I'm using Python + numpy + scipy to do some convolution filtering over a complex-number array.
field = np.zeros((field_size, field_size), dtype=complex)
...
field = scipy.signal.convolve(field, kernel, 'same')
So, when I want to use a complex array in numpy, all I need to do is pass the dtype=complex parameter.
For my research I need to implement two other types of complex numbers: dual (i*i=0) and double (i*i=1). It's not a big deal: I just take the Python source code for complex numbers and change the multiplication function.
The problem: how do I make a numpy array of those exotic numeric types?
It looks like you are trying to create a new dtype for e.g. dual numbers. It is possible to do this with the following code:
dual_type = np.dtype([("a", np.float64), ("b", np.float64)])
dual_array = np.zeros((10,), dtype=dual_type)
However this is just a way of storing the data type, and doesn't tell numpy anything about the special algebra which it obeys.
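To show what "storing only" means in practice, the sketch below keeps the same structured dtype and supplies the dual-number product by hand as an ordinary function working on the fields. It assumes field "a" holds the real part and field "b" the dual part, which the original does not state, and dual_mul is a hypothetical helper, not a numpy function.

```python
import numpy as np

dual_type = np.dtype([("a", np.float64), ("b", np.float64)])

def dual_mul(x, y):
    """Elementwise product of dual numbers, using eps*eps = 0.

    (a1 + b1*eps) * (a2 + b2*eps) = a1*a2 + (a1*b2 + b1*a2)*eps
    """
    out = np.empty(np.broadcast(x, y).shape, dtype=dual_type)
    out["a"] = x["a"] * y["a"]
    out["b"] = x["a"] * y["b"] + x["b"] * y["a"]
    return out

x = np.zeros(3, dtype=dual_type)
x["a"] = [1.0, 2.0, 3.0]
x["b"] = 1.0
squared = dual_mul(x, x)
```

Each field is a plain float array, so the arithmetic inside dual_mul still runs in compiled numpy code even though the dtype itself carries no algebra.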
You can partially achieve the desired effect by subclassing numpy.ndarray and overriding the relevant member functions, such as __mul__ for multiplication and so on. This should work fine for any Python code, but I am fairly sure that any C or Fortran-based routines (i.e. most of numpy and scipy) would multiply the numbers directly, rather than calling __mul__. I suspect that convolve would fall into this basket, so it would not respect the rules you define unless you wrote your own pure Python version.
Here's my solution:
from iComplex import SplitComplex as c_split
...
ctype = c_split
constructor = np.vectorize(ctype, otypes=[object])
field = constructor(np.zeros((field_size, field_size)))
That is the easy way to create a numpy object array.
As for scipy.signal.convolve, it doesn't seem to work with my complex numbers, so I had to write my own convolution, and it is extremely slow. So now I am looking for ways to speed it up.
Would it work to turn things inside-out? I mean instead of an array as the outer container holding small containers holding a couple floating point values as a complex number, turn that around so that your complex number is the outer container. You'd have two arrays, one of plain floats as the real part, and another array as the imaginary part. The basic super-fast convolver can do its job although you'd have to write code to use it four times, for all combinations of real/imaginary of the two factors.
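A minimal sketch of this inside-out layout, assuming the two parts of the field and of the kernel are stored as separate real arrays. The function name convolve_two_part and the parameter i_squared are hypothetical; i_squared is -1 for ordinary complex numbers, 0 for dual numbers and 1 for double (split-complex) numbers. The product rule used is (a + b*i)(c + d*i) = (a*c + i*i*b*d) + (a*d + b*c)*i.

```python
import numpy as np
from scipy.signal import convolve

def convolve_two_part(f_re, f_im, k_re, k_im, i_squared, mode="same"):
    """Convolve two-component numbers held as separate real arrays."""
    # First component: real*real plus (i*i) times second*second.
    re = convolve(f_re, k_re, mode=mode) \
        + i_squared * convolve(f_im, k_im, mode=mode)
    # Second component: the two cross terms.
    im = convolve(f_re, k_im, mode=mode) + convolve(f_im, k_re, mode=mode)
    return re, im

rng = np.random.default_rng(0)
f_re, f_im = rng.standard_normal((2, 8, 8))   # field, two parts
k_re, k_im = rng.standard_normal((2, 3, 3))   # kernel, two parts

# Double (split-complex) numbers: i*i = 1.
split_re, split_im = convolve_two_part(f_re, f_im, k_re, k_im, i_squared=1)

# Sanity check against the built-in complex path: i*i = -1.
cplx_re, cplx_im = convolve_two_part(f_re, f_im, k_re, k_im, i_squared=-1)
reference = convolve(f_re + 1j * f_im, k_re + 1j * k_im, mode="same")
```

For dual numbers the i_squared term vanishes, so that case only needs three real convolutions rather than four.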
In color image processing, I have often refactored my code from using arrays of RGB values to three arrays of scalar values, and found a good speed-up due to simpler convolutions and other operations working much faster on arrays of bytes or floats.
YMMV, since locality of the components of the complex (or color) can be important.