I'm trying to replicate this function in NumPy, but for some reason it keeps flattening the array, generally not behaving like I'd expect, or returning an error.
The docstring is very clear. It explains at least three times that:
If axis is None, out is a flattened array.
This is the only reasonable thing to do. If the inputs are multidimensional, but you don't specify which axis to operate on, how can the code determine the "right" axis? For example, what if the input is a square, 2D array? In that case, both axes are equally valid.
There are too many ways for code that tries to be smart about appending to fail, or worse, to succeed but with the wrong results. Instead, the authors decided that flattening is a reasonable default choice, and made that choice explicit in the documentation.
Also note that there is no way to replicate the behavior at the top of your post in NumPy. By definition, ndarrays are rectangular, while the list you have here is "ragged". You cannot have an ndarray where each row or column has a different size.
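A minimal sketch of the flattening default described above, contrasting `axis=None` with an explicit axis:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Without an axis, np.append flattens both inputs before appending.
flat = np.append(a, [[5, 6]])
print(flat)  # [1 2 3 4 5 6]

# With axis=0, dimensionality is preserved, but the shapes must be compatible.
stacked = np.append(a, [[5, 6]], axis=0)
print(stacked.shape)  # (3, 2)
```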
I came across a problem when trying to do a resize as follows:
I have a 2D array after processing data, and now I want to resize the data, with each row ignoring the first 5 items.
What I am doing right now is:
edit: this approach works fine as long as you make sure you are working with a list, not a string. It failed on my side because I hadn't done the conversion from string to list properly,
so it ended up eliminating the first five characters of the entire string.
2dArray=[[array1],[array2]]
new_Resize_2dArray= [array[5:] for array in 2dArray]
However, it does not seem to work; it just copies all the elements over to new_Resize_2dArray.
I would like to ask for help to see what I did wrong, or whether there is a scientific calculation library I could use to achieve this.
First, because Python list indexing is zero-based, array[5:] skips the first 5 elements of each row, which is exactly what you want if you don't want to include the first 5 columns. Otherwise, I see no issue with your single line of code.
As for scientific computing libraries, numpy is a highly prevalent third-party package with a high-performance multidimensional array type, ndarray. Using ndarrays, your code could be shortened to new_Resize_2dArray = 2dArray[:, 5:]
Aside: it would help to include a bit more of your code, or a minimal example where you get the unexpected result (e.g., use a fake/stand-in 2D array to see whether it works as expected or still fails).
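A quick sketch of the ndarray slicing suggested above, using a stand-in name (note that 2dArray itself is not a valid Python identifier, since names cannot start with a digit):

```python
import numpy as np

# Stand-in 2-D array: 2 rows of 10 values each.
arr2d = np.arange(20).reshape(2, 10)

# Drop the first 5 columns of every row in one slicing operation.
resized = arr2d[:, 5:]
print(resized.shape)  # (2, 5)
```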
I'm working with some image data of shape (x, y, 3). Is there a way to use a numpy construct like nditer to iterate over the (r, g, b) tuples corresponding to pixels? Out of the box, nditer iterates over the scalar values, but many numpy functions have something like an axis= argument to change behaviors like this.
Check np.ndindex. Its doc may be enough to get you started. Better yet, read its code.
I have answered similar questions by essentially reverse engineering ndindex. If there isn't a link in the Related sidebar, I'll do a quick search.
shallow iteration with nditer
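A minimal sketch of the np.ndindex approach: iterate over only the leading two dimensions, so each step yields one (r, g, b) pixel rather than a scalar.

```python
import numpy as np

# Toy (x, y, 3) image: 2 x 4 pixels, values 0..23.
img = np.arange(24).reshape(2, 4, 3)

# np.ndindex walks the index space of the given shape; by passing only
# the first two dimensions, each (i, j) addresses one whole pixel.
pixels = []
for i, j in np.ndindex(*img.shape[:2]):
    pixels.append(img[i, j])  # a length-3 (r, g, b) slice

print(len(pixels))  # 8
```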
Short version
Given a built-in quaternion data type, how can I view a numpy array of quaternions as a numpy array of floats with an extra dimension of size 4 (without copying memory)?
Long version
Numpy has built-in support for floats and complex floats. I need to use quaternions -- which generalize complex numbers, but rather than having two components, they have four. There's already a very nice package that uses the C API to incorporate quaternions directly into numpy, which seems to do all the operations perfectly fast. There are a few more quaternion functions that I need to add to it, but I think I can mostly handle those.
However, I would also like to be able to use these quaternions in other functions that I need to write using the awesome numba package. Unfortunately, numba cannot currently deal with custom types. But I don't need the fancy quaternion functions in those numba-ed functions; I just need the numbers themselves. So I'd like to be able to just re-cast an array of quaternions as an array of floats with one extra dimension (of size 4). In particular, I'd like to just use the data that's already in the array without copying, and view it as a new array. I've found the PyArray_View function, but I don't know how to implement it.
(I'm pretty confident the data are held contiguously in memory, which I assume would be required for having a simple view of them. Specifically, elsize = 8*4 and alignment = 8 in the quaternion package.)
Turns out that was pretty easy. The magic of numpy means it's already possible. While thinking about this, I just tried the following with complex numbers:
import numpy as np
a = np.array([1+2j, 3+4j, 5+6j])
a.view(np.float64).reshape(a.shape[0], 2)
And this gave exactly what I was looking for. Somehow the same basic idea works with the quaternion type. I guess the internals rely on that elsize, divide it by sizeof(float), and use the result as the new size of the last dimension?
To answer my own question then, the same idea can be applied to the quaternion module:
import numpy as np, quaternions
a = np.array([np.quaternion(1,2,3,4), np.quaternion(5,6,7,8), np.quaternion(9,0,1,2)])
a.view(np.float64).reshape(a.shape[0], 4)
The view transformation and reshaping combined seem to take about 1 microsecond on my laptop, independent of the size of the input array (presumably because there's no memory copying, other than a few members in some basic python object).
The above is valid for simple 1-d arrays of quaternions. To apply it to general shapes, I just write a function inside the quaternion namespace:
def as_float_array(a):
    "View the quaternion array as an array of floats with one extra dimension of size 4"
    return a.view(np.float64).reshape(a.shape + (4,))
Different shapes don't seem to slow the function down significantly.
Also, it's easy to convert back from a float array to a quaternion array:
def as_quat_array(a):
    "View a float array as an array of quaternions, collapsing the last dimension"
    if a.shape[-1] == 4:
        return a.view(np.quaternion).reshape(a.shape[:-1])
    return a.view(np.quaternion).reshape(a.shape[:-1] + (a.shape[-1] // 4,))
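Since the quaternion package may not be installed everywhere, the same view/reshape round trip can be sanity-checked with complex numbers (two floats each) as a stand-in; no memory is copied at any step:

```python
import numpy as np

a = np.array([1 + 2j, 3 + 4j, 5 + 6j])

# View as floats: one extra trailing dimension of size 2 (real, imag).
f = a.view(np.float64).reshape(a.shape + (2,))

# View back as complex: collapse the trailing dimension again.
b = f.view(np.complex128).reshape(f.shape[:-1])

print(np.array_equal(a, b))      # True: lossless round trip
print(np.shares_memory(a, f))    # True: f is a view, not a copy
```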
I have a 3D numpy array. I would like to form a new 3d array by executing a function on successive 2d slices along an axis, and stacking the resulting slices together. Clearly there are many ways to do this; I'd like to do it in the most concise way possible. I'd think this would be possible with numpy.vectorize, but this seems to produce a function that iterates over every value in my array, rather than 2D slices taken by moving along the first axis.
Basically, I want code that looks something like this:
new3dmat = np.vectorize(func2dmat)(my3dmat)
And accomplishes the same thing as this:
new3dmat = np.empty_like(my3dmat)
for i in range(my3dmat.shape[0]):
    new3dmat[i] = func2dmat(my3dmat[i])
How can I accomplish this?
I am afraid something like the line below is the best compromise between conciseness and performance. apply_along_axis does not take multiple axes, unfortunately.
new3dmat = np.array([func2dmat(s) for s in my3dmat])
It isn't ideal in terms of extra allocations and so on, but unless .shape[0] is big relative to .size, the extra overhead should be minimal.
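A minimal sketch of the comprehension-and-stack pattern, using a stand-in func2dmat (a transpose, chosen purely for illustration):

```python
import numpy as np

def func2dmat(m):
    # Stand-in for the real 2-D function: just transpose the slice.
    return m.T

my3dmat = np.arange(24).reshape(2, 3, 4)

# Apply the 2-D function to each slice along axis 0, then restack into 3-D.
new3dmat = np.array([func2dmat(s) for s in my3dmat])
print(new3dmat.shape)  # (2, 4, 3)
```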
I'm using scipy to do some image processing work, and I found something quite confusing: some functions, say scipy.signal.convolve and scipy.ndimage.filters.convolve, have similar names and functionality, but belong to different modules of scipy, so I wonder why they aren't just implemented once.
They do slightly different things, mostly related with how they handle the convolution when the two arrays being convolved don't fully overlap.
scipy.ndimage.filters.convolve always returns an array of the same size as its first parameter. To handle areas near the boundaries, where the second array may not fully overlap with the first, it fills in the missing values using one of these options: reflect, constant, nearest, mirror or wrap.
scipy.signal.convolve always pads the arrays with zeros as needed, and offers three options, full, valid or same, which determine the size of the returned array depending on whether values calculated using the zero-padding are kept or discarded.
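The three size options can be illustrated in 1-D with np.convolve, which exposes the same full/same/valid modes as scipy.signal.convolve (scipy itself isn't needed for the sketch):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 1.0, 1.0])

# 'full': every point of overlap, zero-padded edges -> len(x) + len(k) - 1 values
print(np.convolve(x, k, mode='full'))   # [1. 3. 6. 9. 7. 4.]

# 'same': output trimmed to len(x), centered on the full result
print(np.convolve(x, k, mode='same'))   # [3. 6. 9. 7.]

# 'valid': only points where the kernel fully overlaps -> len(x) - len(k) + 1 values
print(np.convolve(x, k, mode='valid'))  # [6. 9.]
```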