Multidimensional arrays transpose themselves without being asked

When I try to access multidimensional arrays in slightly different ways, I get different results which I do not understand.
When I run:
import numpy as np

ells = np.array([1, 2, 3, 4])
check = np.zeros((2, 2, 2, len(ells)))
print(check[:, :, :, ells <= 4].shape)
print(check[0, :, :, ells <= 4].shape)
I can actually fix this problem by using
ells = np.array([1, 2, 3, 4])
check = np.zeros((2, 2, 2, len(ells)))
print(check[:, :, :, ells <= 4].shape)
print(check[0, :, :, :][:, :, ells <= 4].shape)
However, I would like to understand why the first version is wrong.
In the first case I expect to get arrays of shape (2, 2, 2, 4) and (2, 2, 4), but I get (2, 2, 2, 4) and (4, 2, 2).
In the second case I get the expected answers, (2, 2, 2, 4) and (2, 2, 4).

This is an example of mixed advanced and basic indexing:
https://docs.scipy.org/doc/numpy-1.16.1/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The two advanced indices, the scalar 0 and the boolean mask, are broadcast together into a single size-4 dimension, which is placed first; the two slice dimensions in the middle are appended after it:
check[0, :, :, ells <= 4]
The reason given is that there's a potential ambiguity when advanced indices are separated by slices: numpy cannot tell where the broadcast dimension should go, so it moves it to the front. But the case for this ambiguity is weaker when one of the indices is a scalar (that's an old objection).
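Here is a minimal sketch illustrating the rule with the arrays from the question (the two-step indexing at the end is the workaround):
import numpy as np

ells = np.array([1, 2, 3, 4])
check = np.zeros((2, 2, 2, len(ells)))

# Only one advanced index here, so its dimension stays in place.
print(check[:, :, :, ells <= 4].shape)   # (2, 2, 2, 4)

# The scalar 0 and the boolean mask are both advanced indices, separated
# by slices, so their broadcast size-4 dimension is moved to the front.
print(check[0, :, :, ells <= 4].shape)   # (4, 2, 2)

# Indexing in two steps avoids the mixed case entirely.
print(check[0][:, :, ells <= 4].shape)   # (2, 2, 4)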
I'm sure someone could find a duplicate SO.

Related

What does indices != index_to_remove mean?

I'm supposed to write a helper function that returns a list with an element removed by value, keeping the order unchanged. In this case, the value to remove never occurs more than once.
This is the picture:
[image of the code]
And how do I understand the code here: new_indices = np.delete(indices, np.where(indices == index_to_remove))
I would highly appreciate examples to help me better understand the code.
indices != index_to_remove evaluates to an array of booleans, and we are using that boolean array to mask indices. np.where(indices == index_to_remove) is the complementary step: it returns the positions where the value matches, which np.delete then removes. See the numpy docs on boolean array indexing.
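For example, with a small array you can see both the boolean-mask and the delete-by-position approaches side by side:
import numpy as np

indices = np.array([0, 1, 2, 3, 4])
index_to_remove = 2

# The comparison produces a boolean array, one entry per element.
mask = indices != index_to_remove
print(mask)           # [ True  True False  True  True]

# Using the boolean array as an index keeps only the True positions.
print(indices[mask])  # [0 1 3 4]

# np.where finds the positions where the value matches, and np.delete
# removes them, giving the same result.
print(np.delete(indices, np.where(indices == index_to_remove)))  # [0 1 3 4]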

How to test for row similarity (but not equivalence) on two different numpy string arrays

OK, I have two different 2D numpy string arrays. One of the columns (the "token") needs to match on alphanumeric characters only, ignoring other characters (because the arrays might come from different encodings); another column is purely alphanumeric, so it can be tested for exact equality. Any time two rows differ, a warning should be printed indicating the values of both columns in question.
I can do this easily iterating over the rows, like:
for row1, row2 in zip(array1, array2):
    if alpha_diff(row1[0], row2[0]) or row1[1] != row2[1]:
        print(...)  # warn with the differing values
but I was thinking there must be a more pythonic way of handling this that is more efficient, like creating a numpy ufunc or something.
Any ideas?
Jim,
I don't think that you'll find a built-in function to do what you want. Your best bet is to look at the different ways of iterating an arbitrary function over cells in numpy. The following Stack Overflow question shows the time differences between a few different approaches, and it looks like you won't get a big boost for a function like alpha_diff that is not vector-based.
Most efficient way to map function over numpy array
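That said, here is a rough sketch of the vectorized bookkeeping around your function. I don't know what your alpha_diff looks like, so the one below (which compares only the alphanumeric characters) is a stand-in; note that np.vectorize just wraps the Python function for convenience, it does not make it faster:
import numpy as np

def alpha_diff(a, b):
    # Hypothetical helper: compare only the alphanumeric characters.
    keep = lambda s: ''.join(ch for ch in s if ch.isalnum())
    return keep(a) != keep(b)

array1 = np.array([['foo!', 'abc'], ['bar', 'one']])
array2 = np.array([['foo?', 'abc'], ['baz', 'one']])

# Apply alpha_diff across the token column, exact equality on the other.
token_mismatch = np.vectorize(alpha_diff)(array1[:, 0], array2[:, 0])
other_mismatch = array1[:, 1] != array2[:, 1]

# Report every row where either column disagrees.
for i in np.flatnonzero(token_mismatch | other_mismatch):
    print('row', i, ':', array1[i], 'vs', array2[i])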

Taking a mean of the elements of arrays with different sizes

Let's say I have a 5x5 array 'A'. I want to take the mean of five elements in that array. It is possible that one of these values is NaN. I thought something like this would work:
np.nanmean(np.array([A[1,1], A[2, 2:3], A[3, 1:3]]))
But it doesn't. I get
ValueError: setting an array element with a sequence.
I also tried concatenating, flattening and using a list instead of np.array, but without luck.
I'm sorry if this question is a duplicate. It seems like an easy problem, but I can't manage to figure it out and I find it hard to pick good search terms to find a solution online.
You can concatenate all the elements into an array before doing the mean:
np.nanmean(np.concatenate([[A[1,1]], A[2, 2:3], A[3, 1:3]]))
Note that I have placed A[1,1] inside an extra list. This is subtle, and the root of your troubles: Though e.g. A[2, 2:3] contains only a single number, it is still an array because it is constructed from a slice. On the other hand, A[1,1] is just a number, not living inside of an array object. Your error message is telling you that mixing this bare number with the other arrays leads to trouble.
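To see the difference yourself, compare the types of the two kinds of element access (a small sketch with a made-up 5x5 array):
import numpy as np

A = np.arange(25, dtype=float).reshape(5, 5)
A[2, 2] = np.nan

print(type(A[1, 1]))    # numpy scalar (numpy.float64), not an array
print(type(A[2, 2:3]))  # numpy.ndarray of shape (1,)

# Wrapping the scalar in a list makes every piece a 1-d sequence,
# so concatenate can join them into one flat array for nanmean.
vals = np.concatenate([[A[1, 1]], A[2, 2:3], A[3, 1:3]])
print(np.nanmean(vals))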

How to make Numpy treat each row/tensor as a value

Many functions like in1d and setdiff1d are designed for 1-d arrays. One workaround to apply these methods to N-dimensional arrays is to make numpy treat each row (or something higher-dimensional) as a single value.
One approach I found to do this is in Joe Kington's answer to Get intersecting rows across two 2D numpy arrays.
The following code is taken from that answer. The task Joe Kington faced was to detect common rows in two arrays A and B while trying to use in1d.
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

nrows, ncols = A.shape
dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
         'formats': ncols * [A.dtype]}

C = np.intersect1d(A.view(dtype), B.view(dtype))

# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)
I am hoping you can help me with any of the following three questions. First, I do not understand the mechanism behind this method. Can you explain it to me?
Second, are there other ways to let numpy treat a subarray as one object?
One more open question: does Joe's approach have any drawbacks? I mean, might treating rows as values cause problems? Sorry this question is pretty broad.
I'll try to post what I have learned. The method Joe used is called structured arrays. It allows users to define what is contained in a single cell/element.
Take a look at the description of the first example in the documentation:
x = np.array([(1, 2., 'Hello'), (2, 3., "World")],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
Here we have created a one-dimensional array of length 2. Each element of this array is a structure that contains three items, a 32-bit integer, a 32-bit float, and a string of length 10 or less.
Without passing in dtype, however, we will get a 2 by 3 matrix.
With this method, we can make numpy treat each row of a higher-dimensional array as a single element by setting the dtype properly.
Another trick Joe showed is that we don't need to form a new numpy array to achieve this. We can use the view method (see ndarray.view) to change the way numpy interprets the data. There is a Notes section in the ndarray.view documentation that I think you should read before using this method; I cannot guarantee that there would be no side effects. The paragraph below is from that Notes section and seems to call for caution.
For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
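To make the mechanism concrete, here is a small sketch using Joe's array A: the view reinterprets each row's bytes as one structured element, without copying any data.
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
nrows, ncols = A.shape

# Two fields of A's dtype laid side by side: exactly one row's worth of bytes.
dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
         'formats': ncols * [A.dtype]}

# view() reuses the same memory; each row collapses into a single element.
rows_as_values = A.view(dtype)
print(rows_as_values.shape)  # (3, 1): three rows, each now one structured value

# Going back: reinterpret as plain ints and restore the 2-column shape.
print(rows_as_values.view(A.dtype).reshape(-1, ncols))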
Other references:
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html

Python/Numpy: Divide array

I have some data represented in a 1300x1341 matrix. I would like to split this matrix into several pieces (e.g. 9) so that I can loop over and process them. The data needs to stay ordered, in the sense that x[0,1] stays below (or above, if you like) x[0,0] and beside x[1,1].
Just as if you had plotted the data as an image, you could draw 2 vertical and 2 horizontal lines over it to illustrate the 9 parts.
If I use numpy's reshape (e.g. matrix.reshape(9, 260, 745) or any other combination of 9, 260, 745), it doesn't yield the required structure, since the above-mentioned ordering is lost...
Did I misunderstand the reshape method or can it be done this way?
What other pythonic/numpy way is there to do this?
Sounds like you need to use numpy.split(), or perhaps its sibling numpy.array_split(). They are for splitting an array into equal subsections without rearranging the numbers the way reshape does.
I haven't tested this but something like:
numpy.array_split(numpy.zeros((1300,1341)), 9)
should do the trick.
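One caveat: by default array_split cuts along the first axis only, so the call above gives 9 bands of rows. To get the 3x3 grid of blocks described in the question, you can split along both axes, e.g.:
import numpy as np

data = np.zeros((1300, 1341))

# Split into 3 bands of rows, then each band into 3 blocks of columns.
blocks = [piece
          for band in np.array_split(data, 3, axis=0)
          for piece in np.array_split(band, 3, axis=1)]

print(len(blocks))      # 9
print(blocks[0].shape)  # (434, 447); array_split tolerates uneven division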
reshape, to quote its docs, "Gives a new shape to an array without changing its data."
In other words, it does not move the array's data around at all; it just affects the array's dimensions. You, on the other hand, seem to require slicing; again quoting:
It is possible to slice and stride arrays to extract arrays of the same number of dimensions, but of different sizes than the original. The slicing and striding works exactly the same way it does for lists and tuples except that they can be applied to multiple dimensions as well.
So, for example, thearray[0:260, 0:745] is the "upper leftmost" part, thearray[260:520, 0:745] the upper left-of-center part, and so forth. You could keep references to the various parts in a list (or a dict with appropriate keys) to process them separately.
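For instance, a minimal sketch (sizes assumed from the question's 1300x1341 matrix) that keeps the nine parts in a dict keyed by their grid position:
import numpy as np

thearray = np.zeros((1300, 1341))
n = 3  # blocks per axis

# Compute cut points, then store views of the nine parts; slicing does not
# copy, so each part can be processed (or modified) in place.
rows = np.linspace(0, thearray.shape[0], n + 1, dtype=int)
cols = np.linspace(0, thearray.shape[1], n + 1, dtype=int)
parts = {(i, j): thearray[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
         for i in range(n) for j in range(n)}

print(parts[(0, 0)].shape)  # (433, 447)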
