Numpy Vectorized Function Over Successive 2d Slices - python

I have a 3D numpy array. I would like to form a new 3d array by executing a function on successive 2d slices along an axis, and stacking the resulting slices together. Clearly there are many ways to do this; I'd like to do it in the most concise way possible. I'd think this would be possible with numpy.vectorize, but this seems to produce a function that iterates over every value in my array, rather than 2D slices taken by moving along the first axis.
Basically, I want code that looks something like this:
new3dmat = np.vectorize(func2dmat)(my3dmat)
And accomplishes the same thing as this:
new3dmat = np.empty_like(my3dmat)
for i in range(my3dmat.shape[0]):
    new3dmat[i] = func2dmat(my3dmat[i])
How can I accomplish this?

I am afraid something like below is the best compromise between conciseness and performance. apply_along_axis does not take multiple axes, unfortunately.
new3dmat = np.array([func2dmat(s) for s in my3dmat])
It isn't ideal in terms of extra allocations and so on, but unless .shape[0] is big relative to .size, the extra overhead should be minimal.
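A minimal, self-contained sketch of that pattern, using a hypothetical func2dmat that just zero-means each 2D slice (purely to make the example runnable):

import numpy as np

def func2dmat(slice2d):
    # illustrative stand-in: subtract the slice's mean
    return slice2d - slice2d.mean()

my3dmat = np.random.rand(4, 5, 6)

# apply func2dmat to each 2D slice along axis 0 and restack
new3dmat = np.array([func2dmat(s) for s in my3dmat])
assert new3dmat.shape == my3dmat.shape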


numpy: why does np.append() flatten my array?

I'm trying to replicate this function in numpy, but for some reason it keeps doing this, or flattening the array and generally not behaving like I'd expect, or returning an error.
The docstring is very clear. It explains at least three times that:
If axis is None, out is a flattened array.
This is the only reasonable thing to do. If the inputs are multidimensional, but you don't specify which axis to operate on, how can the code determine the "right" axis? For example, what if the input is a square, 2D array? In that case, both axes are equally valid.
There are too many ways for code that tries to be smart about appending to fail, or worse, to succeed but with the wrong results. Instead, the authors decided that flattening is a reasonable default choice, and made that choice explicit in the documentation.
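For example, a quick sketch with two 2x2 arrays:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

np.append(a, b)          # axis=None (default): flattened -> [1 2 3 4 5 6 7 8]
np.append(a, b, axis=0)  # stack rows    -> shape (4, 2)
np.append(a, b, axis=1)  # stack columns -> shape (2, 4)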
Also note that there is no way to replicate the behavior at the top of your post in NumPy. By definition, ndarrays are rectangular, but the list you have here is "ragged". You cannot have an ndarray where each row or column has a different size.

Are there dynamic arrays in numpy?

Let's say I create 2 numpy arrays, one of which is an empty array and one which is of size 1000x1000 made up of zeros:
import numpy as np
A1 = np.array([])
A2 = np.zeros([1000,1000])
When I want to change a value in A2, this seems to work fine:
A2[n,m] = 17
The above code would change the value of position [n][m] in A2 to 17.
When I try the above with A1 I get this error:
A1[n,m] = 17
IndexError: index n is out of bounds for axis 0 with size 0
I know why this happens, because there is no defined position [n,m] in A1 and that makes sense, but my question is as follows:
Is there a way to define a dynamic array that grows with new rows and columns if A[n,m] = somevalue is entered when n or m (or both) are greater than the bounds of the array A?
It doesn't have to be in numpy; any library or method that can update the array size would be awesome. If it is a method, I can imagine an if check that tests whether [n][m] is out of bounds and does something about it.
I am coming from a MATLAB background where it's easy to do this. I tried to find something about this in the numpy.array documentation but I've been unsuccessful.
EDIT:
I want to know if some way to create a dynamic list is possible at all in Python, not just in the numpy library. It appears from this question, Creating a dynamic array using numpy in python, that it doesn't work with numpy.
This can't be done in numpy, and it technically can't be done in MATLAB either. What MATLAB is doing behind the scenes is creating an entire new matrix, copying all the data to the new matrix, then deleting the old matrix. It is not dynamically resizing; that isn't actually possible because of how arrays/matrices work. This is extremely slow, especially for large arrays, which is why MATLAB nowadays warns you not to do it.
Numpy, like MATLAB, cannot resize arrays (actually, unlike MATLAB it technically can, but only if you are lucky so I would advise against trying). But in order to avoid the sort of confusion and slow code this causes in MATLAB, numpy requires that you explicitly make the new array (using np.zeros) then copy the data over.
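A quick sketch of that explicit pattern (the sizes here are arbitrary):

import numpy as np

old = np.arange(6).reshape(2, 3)

# "resize" by allocating a larger array and copying the old data into it
new = np.zeros((4, 5))
new[:old.shape[0], :old.shape[1]] = old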
Python, unlike MATLAB, actually does have a truly resizable data structure: the list. Lists still require there to be enough elements before you index into them, which avoids the silent indexing errors that are hard to catch in MATLAB, but you can resize a list with very good performance. You can make an effectively n-dimensional list by using nested lists of lists. Then, once the list is done, you can convert it to a numpy array.
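As a sketch of that workflow (shapes and values are only illustrative):

import numpy as np

rows = []
for i in range(5):
    # grow each row as an ordinary Python list; appending is cheap
    rows.append([i * j for j in range(3)])

A = np.array(rows)   # convert the finished nested list to a fixed-size ndarray
print(A.shape)       # (5, 3)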

How to apply operations with conditionals, like if, to a large numpy array efficiently in python?

Good afternoon everybody. I was putting raw data into numpy arrays and then wanted to apply operations such as base-10 logarithms, with "if" conditions, to those arrays; however, the arrays are very large, so the computation takes a long time to complete.
[image: the raw data]
x = [ 20*math.log10(i) if i>0 and 20*math.log10(i)>=-60 else (-(120+20*math.log10(abs(i))) if i<0 and 20*math.log10(abs(i))>=-60 else -60) for i in a3 ]
In the code above, I use "a3", one of the channel arrays extracted from the raw audio data, and build another array, "x", whose values run from -120 to 0 on the y axis for plotting. Furthermore, as you can see, I needed to treat the positive elements, the negative elements, and the zeros of the original array separately, with -60 being the value that 0 maps to after the operations. This produces the final plot:
[image: the final plot]
The problem with this code, as I said before, is that it takes approximately 10 seconds to finish computing, and that is for only 1 channel; I need to compute 8 channels, so I have to wait approximately 80 seconds.
I wanted to know if there is a faster way to perform this. In addition, I found a way to apply numpy.log10 to the whole numpy array, and it computes in less than two seconds:
x = 20*numpy.log10(abs(a3))
But I did not find anything about combining that operation, numpy.log10, with ifs, conditionals, or anything like that. I really need to distinguish the original negative and positive values, and also the 0s, and obviously transform the 0s to -60, making -60 the minimum limit and the reference point, as in the code I showed before.
Note: I already tried to do it with loops, like "for" and "while", but they take even more time than the current method, about 14 seconds each.
Thank you for your responses!!
In general, when posting questions, it's best practice to include a small working example. I know you included a picture of your data, but that is hard for others to use, so it would have been better to just give us a small array of data. This is important, because the solution often depends on the data. For example, all your data is (I think) between -1 and 1, so the log is always negative. If this isn't the case, then this solution might not work.
There is no need to check if i>0 and then apply abs if i is negative. This is exactly what applying abs does in the first place.
As you noticed, we can also use numpy vectorization to avoid the list comprehension. It is usually faster to do something like np.sin(X) than [ np.sin(x) for x in X].
Finally, if you do something like X>0 in numpy, it returns a boolean array saying if each element is >0.
Note that another way to have written your list comprehension would be to first take 20*math.log10(abs(i)), replace all values < -60 with -60, and then, anywhere i < 0, flip the data about -60. We can do this as a vectorized operation.
-120*(a3<0)+np.sign(a3)*np.maximum(20*np.log10(np.abs(a3)),-60)
This can probably be optimized a bit since a3<0 and np.sign(a3) are doing similar things. That said, I'm pretty sure this is faster than list comprehensions.
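As a sketch, assuming a3 is a 1-D float array of samples (and noting that exact zeros land on the -60 floor via the clamp), the same mapping can also be written with np.where:

import numpy as np

a3 = np.array([0.5, -0.5, 0.0005, -0.0005, 0.0])   # illustrative samples

with np.errstate(divide='ignore'):          # log10(0) -> -inf, silenced
    db = 20 * np.log10(np.abs(a3))          # magnitude in dB
db = np.maximum(db, -60)                    # clamp everything below -60 dB

x = np.where(a3 >= 0, db, -120 - db)        # mirror negative samples about -60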

Questions regarding numpy in Python

I wrote a program using normal Python, and I now think it would be a lot better to use numpy instead of standard lists. The problem is there are a number of things where I'm confused how to use numpy, or whether I can use it at all.
In general, how do np.arrays work? Are they dynamic in size like a C++ vector, or do I have to declare their length and type beforehand like a standard C++ array? In my program I've got a lot of cases where I create a list,
ex_list = [], and then cycle through something and append to it with ex_list.append(some_lst). Can I do something like that with a numpy array? What if I knew the size of ex_list; could I declare an empty one and then add to it?
If I can't, and let's say I only read from this list, would it be worth converting it to numpy afterwards, i.e. is accessing a numpy array faster?
Can I do more complicated operations on each element using a numpy array (not just adding 5 to each, etc.)? Example below.
full_pallete = [(int(1+i*(255/127.5)),0,0) for i in range(0,128)]
full_pallete += [col for col in right_palette if col[1]!=0 or col[2]!=0 or col==(0,0,0)]
In other words, does it make sense to convert to a numpy array and then cycle through it using something other than a for loop?
Numpy arrays can be appended to (see http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html), although in general calling the append function many times in a loop has a heavy performance cost - it is generally better to pre-allocate a large array and then fill it as necessary. This is because the arrays themselves have a fixed size under the hood, but this is hidden from you in Python.
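A small sketch of the difference (the size is arbitrary; the pre-allocated version avoids copying the array on every iteration):

import numpy as np

n = 10000

# growing with np.append: every call allocates and copies the whole array
a = np.array([])
for i in range(n):
    a = np.append(a, i)

# pre-allocating and filling in place: one allocation, no repeated copies
b = np.empty(n)
for i in range(n):
    b[i] = i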
Yes, Numpy is well designed for many operations similar to these. In general, however, you don't want to be looping through numpy arrays (or arrays in general in python) if they are very large. By using inbuilt numpy functions, you basically make use of all sorts of compiled speed up benefits. As an example, rather than looping through and checking each element for a condition, you would use numpy.where().
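For instance, a minimal sketch of replacing an element-wise if with numpy.where:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# evaluate both branches on the whole array and pick per element,
# instead of looping and testing each value with an if
y = np.where(x > 0, 5 * x, 0.0)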
The real reason to use numpy is to benefit from pre-compiled mathematical functions and data processing utilities on large arrays - both those in the core numpy library as well as many other packages that use them.

speeding up numpy.dot inside list comprehension

I have a numpy script that is currently running quite slowly. It spends the vast majority of its time performing the following operation inside a loop:
terms=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex,Ey,Ez_av)
res=[np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms]
res=array(res)
Ex[1:Nx-1]=res[1:Nx-1,0]
Ey[1:Nx-1]=res[1:Nx-1,1]
It's the list comprehension that is really slowing this code down.
In this case, Coeff_3 and Coeff_2 are length-1000 lists whose elements are 3x3 numpy matrices, and Ex, Ey, Ez, Curl_x, etc. are all length-1000 numpy arrays.
I realize it might be faster if I did things like setting up a single 3x1000 E vector, but I have to perform a significant amount of averaging of different E vectors between steps, which would make things very unwieldy.
Curiously, however, I perform this operation twice per loop (once for Ex, Ey, once for Ez), and performing the same operation for the Ez's takes almost twice as long:
terms2=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex_av,Ey_av,Ez)
res2=array([np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms2])
Anyone have any idea what's happening? Forgive me if it's anything obvious; I'm very new to python.
As pointed out in previous comments, use array operations. np.hstack(), np.vstack(), np.outer() and np.inner() are useful here. Your code could become something like this (not sure about your dimensions):
Cxyz = np.vstack((Curl_x,Curl_y,Curl_z))
C2xyz = np.dot(C2, Cxyz)
...
Check the shapes of your results to make sure you translated your problem correctly. Sometimes numexpr can also speed up such tasks significantly with little extra effort.
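For reference, a sketch of how the whole list comprehension could be replaced by one batched product with np.einsum, assuming Coeff_2 and Coeff_3 stack into (N, 3, 3) arrays (the random data here is only to make the example self-contained):

import numpy as np

N = 1000
Coeff_2 = np.random.rand(N, 3, 3)            # stand-ins for the 3x3 matrices
Coeff_3 = np.random.rand(N, 3, 3)
Curl_x, Curl_y, Curl_z = np.random.rand(3, N)
Ex, Ey, Ez_av = np.random.rand(3, N)

Cxyz = np.vstack((Curl_x, Curl_y, Curl_z))   # shape (3, N)
Exyz = np.vstack((Ex, Ey, Ez_av))            # shape (3, N)

# res[n] = Coeff_2[n] @ Cxyz[:, n] + Coeff_3[n] @ Exyz[:, n], for all n at once
res = (np.einsum('nij,jn->ni', Coeff_2, Cxyz)
       + np.einsum('nij,jn->ni', Coeff_3, Exyz))   # shape (N, 3)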
