NumPy: is assignment of a scalar to a slice broadcasting?

I know in Python,
[1,2,3][0:2]=7
doesn't work because the right side must be an iterable.
However, the same thing works for NumPy ndarrays:
a=np.array([1,2,3])
a[0:2]=9
a
Is this the same mechanism as broadcasting? According to https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html, broadcasting applies only to arithmetic operations.

Yes, assignment follows the same broadcasting rules, because you can also assign an array to another array's items. This requires, however, that the assigned array's shape be broadcastable to the shape of the destination slice or array.
This is also mentioned in the Assigning values to indexed arrays section of the documentation:
As mentioned, one can select a subset of an array to assign to using a single index, slices, and index and mask arrays. The value being assigned to the indexed array must be shape consistent (the same shape or broadcastable to the shape the index produces).
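For example, a scalar on the right is broadcast to the shape of the slice, and an array on the right must likewise be broadcastable to that shape:
import numpy as np
a = np.array([1, 2, 3])
a[0:2] = 9          # scalar broadcast to shape (2,)
a                   # array([9, 9, 3])
a[0:2] = [7, 8]     # shape (2,) already matches the slice
a                   # array([7, 8, 3])
a[0:2] = [7, 8, 9]  # raises ValueError: shape (3,) cannot broadcast to (2,)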

Related

Identity index for NumPy array

Suppose I have a flat NumPy array a and want to define an index array i to index a with and thus obtain a again by a[i].
I tried
import numpy as np
a = np.array([1,2]).reshape(-1)
i = True
But this does not preserve shape: a[i] has shape (1, 2) while a has shape (2,).
I know I could reshape a[i] or use i = np.full_like(a, True, dtype=bool). I want neither: the reshape is unnecessary when i is, by some conditional definition, sometimes not plain True but a boolean array matching the shape of a. The second approach means I would need a different i for each array shape.
So... is there something built-in in NumPy to just get the array back when it is used as an index?
NumPy cannot preserve the shape of a boolean-masked result in general, because the result may be ragged. And when you pass in a single boolean scalar, you hit a special case: it prepends a dimension of length 1 (for True) or 0 (for False), which is why a[True] has shape (1, 2).
You must therefore use a fancy index. With a fancy index, the shape of the result is exactly the shape of the index. For a 1-D array the following is fine:
i = np.arange(a.size)
For more dimensions, you'll want to create a full indexing tuple, using np.indices for example. The elements of the tuple can broadcast to the final desired shape, so you can use sparse=True:
i = np.indices(a.shape, sparse=True)
If you want i to be a numpy array, you can set sparse=False, in which case i will be of shape (a.ndim, *a.shape); note you then need a[tuple(i)], since a plain array index would only apply to the first axis.
If you want to cheat, you can use slices. slice(None) is the object representing the literal index ::
i = (slice(None),) * a.ndim
Or just index the first dimension only, which returns the entire array:
i = slice(None)
Or if you're feeling really lazy, use Ellipsis directly. This is the object that stands for the literal ..., meaning :, :, etc, as many times as necessary:
i = Ellipsis
Going back to the boolean mask option, you can get the same effect with a separate all-True mask for each dimension, combined with np.ix_ so that each mask indexes its own axis (a bare tuple of boolean masks would be converted to integer indices and broadcast against each other, which selects matching elements rather than the whole array):
i = np.ix_(*(np.ones(k, dtype=bool) for k in a.shape))
You could save some memory by allocating only the largest mask and creating views:
s = np.ones(max(a.shape), dtype=bool)
i = np.ix_(*(s[:k] for k in a.shape))
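A quick sanity check of all of these options on a hypothetical 2x3 array:
import numpy as np
a = np.arange(6).reshape(2, 3)
candidates = [
    np.indices(a.shape, sparse=True),
    (slice(None),) * a.ndim,
    slice(None),
    Ellipsis,
    np.ix_(*(np.ones(k, dtype=bool) for k in a.shape)),
]
for i in candidates:
    # each index returns the full array with its shape intact
    assert a[i].shape == a.shape and (a[i] == a).all()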

Is there a faster method to use a 2d numpy array of booleans to select elements from a 2d array, but with a 2d output?

If I have an array like this
arr = np.array([['a', 'b', 'c'],
                ['d', 'e', 'f']])
and an array of booleans of the same shape, like this:
boolarr = np.array([[False, True, False],
                    [True, True, True]])
I want to select only the elements from the first array that correspond to a True in the boolean array, so the output would be:
out = [['b'],
       ['d', 'e', 'f']]
I managed to solve this with a simple for loop
out = []
for n, i in enumerate(arr):
    out.append(i[boolarr[n]])
out = np.array(out)
but the problem is that this solution is slow for large arrays, and I was wondering if there is an easier solution using NumPy's indexing. Just using the normal notation arr[boolarr] returns a single flat array ['b','d','e','f']. I also tried using a slice with arr[:,[True,False,True]], which keeps the shape but can only apply a single boolean mask along one axis.
Thanks for the comments. I misunderstood how an array works: it cannot be ragged. For those curious, this is my solution (I'm actually working with numbers):
arr[boolarr]=np.nan
And then I just changed how the rest of the function handles nan values
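Note that if boolarr marks the values to keep, as in the question, it is the complement that gets blanked out. A minimal numeric sketch (the values here are hypothetical):
import numpy as np
arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])
boolarr = np.array([[False, True, False],
                    [True, True, True]])
arr[~boolarr] = np.nan   # drop unwanted elements while keeping the 2d shape
arr
# array([[nan,  2., nan],
#        [ 4.,  5.,  6.]])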

numpy.argmax() on multi-dim data does not return an ndarray

The documentation of numpy.argmax states that it returns the index position of the maximum value found in the array, and that it does so by returning an ndarray of ints with the same shape as the input array:
numpy.argmax(a, axis=None, out=None)
Returns the indices of the maximum values along an axis.
Parameters:
a : array_like
Input array.
axis : int, optional
By default, the index is into the flattened array, otherwise along the specified axis.
out : array, optional
If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.
Returns:
index_array : ndarray of ints
Array of indices into the array. It has the same shape as a.shape with the dimension along axis removed.
Why then does the command return for a 2-dim array a single int?
a = np.arange(6).reshape(2,3)
print np.argmax(a)
5
It seems to me the reshape acts as a new view and the argmax command still uses the underlying 1-dim array. I tried copying the array into an initialized 2-dim array with the same end result.
This is Python 2.7.12 but since they have it in the documentation I believe this is the expected behavior. Am I missing anything? How can I get the ndarray returned?
From the documentation you quoted:
By default, the index is into the flattened array
It's providing the integer such that a.flat[np.argmax(a)] == np.max(a). If you want to convert this into an index tuple, you can use the np.unravel_index function linked in the "See also" section.
The part about "the same shape as a.shape with the dimension along axis removed" applies if you actually provide an axis argument. Also, note the "with the dimension along axis removed" part; even if you specify axis, it won't return an array of the same shape as the input like you were expecting.
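For example, with the array from the question:
a = np.arange(6).reshape(2, 3)
np.argmax(a)                             # 5, an index into a.flat
np.unravel_index(np.argmax(a), a.shape)  # (1, 2)
np.argmax(a, axis=0)                     # array([1, 1, 1]) -- shape (3,), axis 0 removed
np.argmax(a, axis=1)                     # array([2, 2])    -- shape (2,), axis 1 removed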

numpy array concatenation error: 0-d arrays can't be concatenated

I am trying to concatenate two numpy arrays, but I got this error. Could some one give me a bit clue about what this actually means?
import numpy as np
allValues = np.arange(-1, 1, 0.5)
tmp = np.concatenate(allValues, np.array([30], float))
Then I got
ValueError: 0-d arrays can't be concatenated
If I do
tmp = np.concatenate(allValues, np.array([50], float))
There is no error message, but the tmp variable does not reflect the concatenation either.
You need to put the arrays you want to concatenate into a sequence (usually a tuple or list) in the argument.
tmp = np.concatenate((allValues, np.array([30], float)))
tmp = np.concatenate([allValues, np.array([30], float)])
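Either form produces:
array([-1. , -0.5,  0. ,  0.5, 30. ])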
Check the documentation for np.concatenate. Note that the first argument is a sequence (e.g. list, tuple) of arrays. It does not take them as separate arguments.
As far as I know, this API is shared by all of numpy's concatenation functions: concatenate, hstack, vstack, dstack, and column_stack all take a single main argument that should be some sequence of arrays.
The reason you are getting that particular error is that arrays are sequences as well. But this means that concatenate is interpreting allValues as a sequence of arrays to concatenate. However, each element of allValues is a float rather than an array, and is therefore being interpreted as a zero-dimensional array. As the error says, these "arrays" cannot be concatenated.
The second argument is taken as the second (optional) argument of concatenate, which is the axis to concatenate on. This only works because there is a single element in the second argument, which can be cast as an integer and therefore is a valid value. If you had put an array with more elements in the second argument, you would have gotten a different error:
a = np.array([1, 2])
b = np.array([3, 4])
np.concatenate(a, b)
# TypeError: only length-1 arrays can be converted to Python scalars
Also make sure you are concatenating two NumPy arrays. I was concatenating a plain Python list with a NumPy array and it was giving me the same error:
ValueError: 0-d arrays can't be concatenated
It took me some time to figure this out, since all the answers on Stack Overflow assume you have two NumPy arrays.
A pretty silly but easily overlooked mistake, hence posting in case this helps someone.
If it helps, see the NumPy documentation on converting an existing Python sequence with np.asarray, or on creating NumPy arrays directly.
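For example, np.asarray converts the list before concatenating (py_list here is a stand-in for your own variable):
import numpy as np
py_list = [1.0, 2.0]
arr = np.array([3.0, 4.0])
np.concatenate((np.asarray(py_list), arr))
# array([1., 2., 3., 4.])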
Another way to get this error is to have two numpy objects of different... types?
I get this error when I try np.concatenate([A,B])
and ValueError: all the input arrays must have same number of dimensions when I run np.concatenate([B,A])
Just as @mithunpaul mentioned, my types are off: A is a 44279x204 array and B is a <44279x12 sparse matrix of type '<class 'numpy.float64'>' with 88558 stored elements in Compressed Sparse Row format>.
So that's why the error is happening. Don't know how to solve it yet though.
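One way out (a sketch, assuming B really is a scipy.sparse CSR matrix as printed above; the shapes here are scaled-down stand-ins) is to densify B first, or to stay sparse with scipy.sparse.hstack:
import numpy as np
from scipy import sparse
A = np.random.rand(100, 204)              # hypothetical dense array
B = sparse.random(100, 12, format='csr')  # hypothetical sparse matrix
dense = np.concatenate([A, B.toarray()], axis=1)         # shape (100, 216)
still_sparse = sparse.hstack([sparse.csr_matrix(A), B])  # keeps the result sparse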

numpy arrays dimension mismatch

I am using numpy and pandas to attempt to concatenate a number of heterogeneous values into a single array.
np.concatenate((tmp, id, freqs))
Here are the exact values:
tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"
The dimensions of tmp, 17232, and freqs are as follows:
[in] tmp.shape
[out] (4,)
[in] np.array(17232).shape
[out] ()
[in] freqs.shape
[out] (1,)
I have also tried casting them all as NumPy arrays, to no avail. Note that the variable freqs will frequently have more than one value.
However, with both the np.concatenate and np.append functions I get the following error:
*** ValueError: all the input arrays must have same number of dimensions
These all have the same number of columns (0); why can't I concatenate them with either of the above-described NumPy methods?
All I'm looking to obtain is [(tmp), 17232, (freqs)] in one single-dimensional array, which is to be appended onto the end of a pandas DataFrame.
Thanks.
Update
It appears I can concatenate the two existing arrays:
np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)
However, the integer, even when cast to an array, cannot be used in concatenate.
np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions
What does work, however, is nesting append and concatenate:
np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
       0.022831050228310501], dtype=object)
Although this is kind of messy. Does anyone have a better solution for concatenating a number of heterogeneous arrays?
The problem is that id, and later the integer np.array(17571), are not array_like objects. See the NumPy documentation for how numpy decides whether an object can be converted automatically to a numpy array or not.
The solution is to make id array_like, i.e. to make it an element of a list or tuple, so that numpy understands that id belongs to a 1D array_like structure.
It all boils down to
concatenate((tmp, (id,), freqs))
or
concatenate((tmp, [id], freqs))
To avoid this sort of problem when dealing with input variables in functions using numpy, you can use atleast_1d, as pointed out by @askewchan. See this question/answer about it.
Basically, if you are unsure whether in different scenarios your variable id will be a single str or a list of str, you are better off using
concatenate((tmp, atleast_1d(id), freqs))
because the two options above will fail if id is already a list/tuple of strings.
EDIT: It may not be obvious why np.array(17571) is not an array_like object. This happens because np.array(17571).shape==(), so it is not iterable as it has no dimensions.
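A quick demonstration with the question's variables:
import numpy as np
tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"  # shadows the built-in id(), as in the question
np.atleast_1d(id).shape               # (1,)
np.atleast_1d(np.array(17571)).shape  # (1,)
np.concatenate((tmp, np.atleast_1d(id), freqs))
# array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, u'id_23728',
#        0.022831050228310501], dtype=object)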
