How do I do an einsum that mimics 'keepdims'?

A Python question: I've got an np.einsum operation that I'm doing on a pair of 3d arrays:
return np.einsum('ijk, ijk -> ik', input_array, self._beta_array)
Problem I'm having is the result is 2d; the operation collapses the 'j' dimension. What I'd love to do is to have it retain the 'j' dimension similar to how 'keepdims' works in the np.sum function.
I can wrap the result in np.expand_dims, but that seems inefficient to me. I'd prefer to find some way to tweak the einsum to output what I'm after.
Is this possible?

I can wrap the result in np.expand_dims, but that seems inefficient to me
Adding a dimension in numpy is at worst O(ndim), so basically free. Crucially, the actual data is not touched; all that happens is that the .strides and .shape tuples each get an extra element.
There is no way right now to use einsum to directly get what you want.
You could try to make a pull request against numpy to support something like 'ijk, ijk -> i1k', if you really think it improves readability.
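For reference, a minimal sketch of the expand_dims route (with made-up stand-ins for the question's arrays), showing that it just reinserts the collapsed 'j' axis:

import numpy as np

input_array = np.random.rand(2, 3, 4)   # stand-in for the question's input_array
beta_array = np.random.rand(2, 3, 4)    # stand-in for self._beta_array

out = np.einsum('ijk, ijk -> ik', input_array, beta_array)   # shape (2, 4)
out_keepdims = np.expand_dims(out, axis=1)                    # shape (2, 1, 4), no data copied
assert out_keepdims.shape == (2, 1, 4)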

Related

Slicing a 2D numpy array using vectors for start-stop indices

First post here, so please go easy on me. :)
I want to vectorize the following:
import numpy as np

N = 1000                                  # example size
rowStart = np.random.randint(0, 508, N)   # array of length N (example values)
rowStop = rowStart + 4
colStart = np.random.randint(0, 508, N)   # array of length N (example values)
colStop = colStart + 4
x = np.random.rand(512, 512)              # dummy test array
output = np.zeros([N, 4, 4])
for i in range(N):
    output[i, :, :] = x[rowStart[i]:rowStop[i], colStart[i]:colStop[i]]
What I'd like to be able to do is something like:
output=x[rowStart:rowStop, colStart:colStop ]
where numpy recognizes that the slicing indices are vectors and broadcasts the slicing. I understand that this probably doesn't work because while I know that my slice output is always the same size, numpy doesn't.
I've looked at various approaches, including "fancy" or "advanced" indexing (which seems to work for indexing, not slicing), massive boolean indexing using meshgrids (not practical from a memory standpoint, as my N can get to 50k-100k), and np.take, which just seems to be another way of doing fancy/advanced indexing.
I could see how I could potentially use fancy/advanced indexing if I could get an array that looks like:
[np.arange(rowStart[0],rowStop[0]),
np.arange(rowStart[1],rowStop[1]),
...,
np.arange(rowStart[N],rowStop[N])]
and a similar one for columns, but I'm also having trouble figuring out a vectorized approach for creating that.
I'd appreciate any advice you can provide.
Thanks!
We can leverage scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided, to get sliding windows of x and hence solve our case here.
from skimage.util.shape import view_as_windows

BSZ = (4, 4)                  # block size
w = view_as_windows(x, BSZ)   # all 4x4 windows of x; for the 512x512 dummy, shape (509, 509, 4, 4)
out = w[rowStart, colStart]   # pick the N windows at the requested offsets -> shape (N, 4, 4)
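If you'd rather stay in plain NumPy, the index arrays described in the question can be built with broadcasting and then fed to advanced indexing (a sketch, assuming the same 4x4 block size):

import numpy as np

# rows[i, j, 0] = rowStart[i] + j and cols[i, 0, k] = colStart[i] + k
rows = rowStart[:, None, None] + np.arange(4)[None, :, None]   # shape (N, 4, 1)
cols = colStart[:, None, None] + np.arange(4)[None, None, :]   # shape (N, 1, 4)
output = x[rows, cols]                                          # broadcasts to shape (N, 4, 4)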

How to make Numpy treat each row/tensor as a value

Many functions like in1d and setdiff1d are designed for 1-d arrays. One workaround to apply these methods to N-dimensional arrays is to make numpy treat each row (or something higher-dimensional) as a single value.
One approach I found to do so is in the answer "Get intersecting rows across two 2D numpy arrays" by Joe Kington.
The following code is taken from this answer. The task Joe Kington faced was to detect common rows in two arrays A and B while trying to use in1d.
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

nrows, ncols = A.shape
dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
         'formats': ncols * [A.dtype]}

C = np.intersect1d(A.view(dtype), B.view(dtype))

# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)
I am hoping you can help me with any of the following three questions. First, I do not understand the mechanism behind this method. Can you try to explain it to me?
Second, are there other ways to let numpy treat a subarray as one object?
One more open question: does Joe's approach have any drawbacks? I mean, might treating rows as a value cause some problems? Sorry, this question is pretty broad.
I'll try to post what I have learned. The method Joe used relies on structured arrays, which let users define what is contained in a single cell/element.
Let's take a look at the description of the first example the documentation provides.
x = np.array([(1, 2., 'Hello'), (2, 3., "World")],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
Here we have created a one-dimensional array of length 2. Each element
of this array is a structure that contains three items, a 32-bit
integer, a 32-bit float, and a string of length 10 or less.
Without passing in dtype, however, we will get a 2 by 3 matrix.
With this method, we are able to let numpy treat a higher-dimensional array as a single element, given a properly set dtype.
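A quick check of that difference (a sketch):

import numpy as np

with_dtype = np.array([(1, 2., 'Hello'), (2, 3., "World")],
                      dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
without_dtype = np.array([(1, 2., 'Hello'), (2, 3., "World")])

print(with_dtype.shape)      # (2,)   -- two structured elements
print(without_dtype.shape)   # (2, 3) -- a plain 2 by 3 array (of strings)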
Another trick Joe showed is that we don't need to actually form a new numpy array to achieve this. We can use the view function (see ndarray.view) to change the way numpy views the data. There is a Notes section in ndarray.view that I think you should look at before using the method. I have no guarantee that there won't be side effects. The paragraph below is from that Notes section and seems to call for caution.
For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
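To make the mechanism concrete, here is a small sketch of the same view trick applied to the in1d use case mentioned at the top (values made up):

import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

ncols = A.shape[1]
dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
         'formats': ncols * [A.dtype]}

# Each row is now a single structured element, so the 1-d set routines apply.
mask = np.in1d(A.view(dtype), B.view(dtype))
print(mask)      # [ True False  True]
print(A[mask])   # the rows of A that also appear in B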
Other references:
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html

Efficiently unwrap in multiple dimensions with numpy

Let's assume I have an array of phases (from complex numbers)
A = np.angle(np.random.uniform(-1,1,[10,10,10]) + 1j*np.random.uniform(-1,1,[10,10,10]))
I would now like to unwrap this array in ALL dimensions. In the above 3D case I would do
A_unwrapped = np.unwrap(np.unwrap(np.unwrap(A,axis=0), axis=1),axis=2)
While this is still feasible in the 3D case, in case of higher dimensionality, this approach seems a little cumbersome to me. Is there a more efficient way to do this with numpy?
You could use np.apply_over_axes, which applies a function over each of the given axes of an array in turn. One caveat: apply_over_axes calls func(a, axis) with the axis passed positionally, and np.unwrap's second positional argument is discont rather than axis, so wrap it in a small lambda:
np.apply_over_axes(lambda a, axis: np.unwrap(a, axis=axis), A, np.arange(A.ndim))
I believe this should do it.
I'm not sure if there is a way to bypass performing the unwrap operation along each axis. Obviously if it acted on individual elements you could use vectorization, but that doesn't seem to be an option here. What you can do that will at least make the code cleaner is create a loop over the dimensions:
for dim in range(len(A.shape)):
    A = np.unwrap(A, axis=dim)
You could also repeatedly apply a function that takes the dimension on which to operate as a parameter:
reduce(lambda A, axis: np.unwrap(A, axis=axis), range(len(A.shape)), A)
Remember that in Python 3 reduce needs to be imported from functools.
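Putting the two variants together in a runnable sketch (using the array from the question) to check they agree:

from functools import reduce
import numpy as np

A = np.angle(np.random.uniform(-1, 1, [10, 10, 10])
             + 1j * np.random.uniform(-1, 1, [10, 10, 10]))

# Plain loop over the dimensions
A_loop = A
for dim in range(A_loop.ndim):
    A_loop = np.unwrap(A_loop, axis=dim)

# The same thing expressed as a reduce
A_reduce = reduce(lambda a, axis: np.unwrap(a, axis=axis), range(A.ndim), A)

assert np.allclose(A_loop, A_reduce)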

Why does the numpy documentation recommend preferring concatenate over hstack?

Why does the numpy documentation recommend preferring concatenate over hstack?
but you should prefer np.concatenate or np.stack.
According to this answer, hstack is a wrapper around concatenate. In that case, why not use hstack, which improves the readability of the code?
So the actual code in hstack is:
arrs = [atleast_1d(_m) for _m in tup]
# As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
if arrs[0].ndim == 1:
    return _nx.concatenate(arrs, 0)
else:
    return _nx.concatenate(arrs, 1)
It first loops through the arguments and makes sure that each is at least 1d. This takes care of the 0d and scalar elements, such as in np.hstack([0,1,np.arange(3)]).
The rest chooses between concatenating on the one and only axis or the 2nd one.
vstack is similar, except it makes things at least 2d and concatenates on the 1st axis.
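A small sketch of what that special-casing means in practice for 1-d inputs:

import numpy as np

a = np.arange(3)                      # shape (3,)
b = np.arange(3, 6)                   # shape (3,)

print(np.hstack([a, b]).shape)        # (6,)   -- 1-d inputs: concatenate on axis 0
print(np.vstack([a, b]).shape)        # (2, 3) -- made atleast_2d first, then joined on axis 0
print(np.concatenate([a, b]).shape)   # (6,)   -- what hstack reduces to here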
Judging from SO questions/answers these are still being used quite a bit, and I think in most cases they don't cause problems. It's np.append that creates most problems. That's the one I wish they'd never added.
I think the main problem with hstack and vstack is that they encourage (or at least allow) lazy thinking about dimensions and shapes. When questions arise it's because the poster doesn't understand what it means to have the same number of dimensions, or that shapes must be equal (except for one axis).

Minimizing an array and value in Python

I have a vector of floats (coming from an operation on an array) and a float value (which is actually an element of the array, but that's unimportant), and I need to find the smallest float out of them all.
I'd love to be able to find the minimum between them in one line in a 'Pythony' way.
MinVec = N[i,:] + N[:,j]
Answer = min(min(MinVec),N[i,j])
Clearly I'm performing two minimisation calls, and I'd love to be able to replace this with one call. Perhaps I could eliminate the vector MinVec as well.
As an aside, this is for a short program in Dynamic Programming.
TIA.
EDIT: My apologies, I didn't specify I was using numpy. The variable N is an array.
You can append the value, then minimize. I'm not sure what the relative time considerations of the two approaches are, though - I wouldn't necessarily assume this is faster:
Answer = min(np.append(MinVec, N[i, j]))
This is the same idea as the answer above but without using numpy. Note that list.append returns None, so the list has to be built first:
Answer = min(list(MinVec) + [N[i, j]])
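A quick sanity check (a sketch with made-up values; N is assumed to be a 2-d NumPy array, as the edit states) that the one-liner agrees with the original two-step version:

import numpy as np

N = np.random.rand(5, 5)   # made-up stand-in for the question's array
i, j = 2, 3
MinVec = N[i, :] + N[:, j]

one_liner = min(np.append(MinVec, N[i, j]))
two_step = min(min(MinVec), N[i, j])
assert one_liner == two_step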
