What is the exact meaning of multi-dimensional array for numpy? - python

Can someone tell me why a works while b fails with ValueError: setting an array element with a sequence? The explanation I found gives the "multi-dimensional" reason, but in my case, I think a and b are the same kind of input.
import numpy as np
a=np.array([[1],2,3])
b=np.array([1,2,[3]])

Numpy is observing the first element to see what dtype the array is going to have. For a it sees a list and therefore produces an object array. It happily moves on to fill in the rest of the elements into the object array. For b, it sees a numeric value and assumes it's going to be some numeric dtype. Then it borks when it gets to a list.
You can override this by requesting the object dtype up front:
a=np.array([[1],2,3])
b=np.array([1,2,[3]], dtype=object)
print(a, b, sep='\n\n')
[list([1]) 2 3]
[1 2 list([3])]
Mind you, that may not be exactly how Numpy is identifying dtype but it's got to be pretty close.
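As a hedged footnote to the above: recent NumPy releases (1.24 and later, as far as I know) raise the same ValueError for a as well, so passing dtype=object explicitly is the safer spelling for both arrays. A minimal sketch:

```python
import numpy as np

# Ragged input: request the object dtype explicitly so NumPy stores each
# top-level item as a Python object instead of trying to infer a shape.
a = np.array([[1], 2, 3], dtype=object)
b = np.array([1, 2, [3]], dtype=object)

# Both arrays are 1-D with three object elements.
print(a.shape, a.dtype)
print(b.shape, b.dtype)
```

Either way the result is a one-dimensional array of Python objects, not a multi-dimensional numeric array.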

Related

Reinterpret data in numpy ndarray

I have a numpy array with dtype=uint8 and shape=(N,4) and I want to reinterpret the 4 bytes along the axis=1 efficiently as dtype=int32 and get a resulting shape=(N,) but nothing I've tried works. The equivalent in c would be brutally casting the pointer of the array.
The initial array is created like this from a pandas dataframe:
tmp=df[['data_1','data_2','data_3','data_4']].values.astype('uint8')
But then this works but it's not vectorized:
tmp1=np.empty((tmp.shape[0],),dtype=np.int32)
for i in range(tmp.shape[0]):
    tmp2=tmp[i].copy()
    tmp1[i]=tmp2.view('<i4')
And this, which I understand as the efficient way to do it, doesn't:
tmp1=tmp.view('<i4')
Giving the error:
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
But the size should be correct as far as I understand.
edit: added the reinterpreted explanation
Assuming you actually want the output shape to be (N*4,) (not (N,) as you wrote initially), you can just flatten it and then cast it to your desired type:
tmp1 = tmp.flatten().astype('int32', copy=False)
EDIT:
If you actually want the same underlying data to be interpreted as a different type and get a (N,) array out, the view method is in fact the way to go. This for example works for me:
import numpy as np
N = 5
a = np.arange(N*4, dtype='uint8').reshape((N,4))
a.view('int32')[:,0]
That view is then array([ 50462976, 117835012, 185207048, 252579084, 319951120], dtype=int32).
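One possible cause of the ValueError in the question (an assumption, the question does not confirm it) is that the array coming out of pandas is not C-contiguous. A minimal sketch that forces a C-contiguous copy before taking the view; the Fortran-ordered array below only stands in for such an input:

```python
import numpy as np

N = 5
a = np.arange(N * 4, dtype=np.uint8).reshape((N, 4))

# Stand-in for an input whose rows are not contiguous in memory
# (assumption: the pandas-derived array in the question behaves like this).
f = np.asfortranarray(a)

# Copy into C order so each row's 4 bytes sit next to each other,
# then reinterpret every row as one little-endian int32.
c = np.ascontiguousarray(f)
out = c.view('<i4').reshape(-1)   # shape (N,)

print(out)
```

The copy made by np.ascontiguousarray is the price of the reinterpretation; if the array is already C-contiguous, no copy is made and the view shares memory with the original.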

What does `numpy.array(1, dtype=np.int32)` do? [duplicate]

I have a numpy array something like this
a = np.array(1)
Now I want to get the 1 back out of this array. How do I retrieve it?
I have tried
a[0], a(0)
and I get errors like
IndexError: 0-d arrays can't be indexed
or
TypeError: 'numpy.ndarray' object is not callable
I even tried some weird flattening, but I am pretty sure it shouldn't be that complicated.
Both attempts give errors. All I want is that 1 as an int.
Thanks
What you create with
a = np.array(1)
is a zero-dimensional array, and these cannot be indexed. You also don't need to index it -- you can use a directly as if it were a scalar value. If you really need the value in a different type, say float, you can explicitly convert it with float(a). If you need it in the base type of the array, you can use a.item() or a[()].
Note that the zero-dimensional array is mutable. If you change the value of the single entry in the array, this will be visible via all references to the array you stored. Use a.item() if you want to store an immutable value.
If you want a one-dimensional array with a single element instead, use
a = np.array([1])
You can access the single element with a[0] now.
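A small sketch of the options described above, showing the difference between the stored Python value and the mutable zero-dimensional array (the names value, same and alias are only for illustration):

```python
import numpy as np

a = np.array(1)          # zero-dimensional array, a.ndim == 0

value = a.item()         # plain Python int, detached from the array
same = a[()]             # NumPy scalar read with an empty-tuple index

alias = a                # second reference to the same array
a[()] = 7                # in-place update of the single entry

print(value)             # the stored Python int is unaffected
print(alias.item())      # the alias sees the new value
```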

numpy array concatenation error: 0-d arrays can't be concatenated

I am trying to concatenate two numpy arrays, but I got this error. Could someone give me a clue about what this actually means?
import numpy as np
allValues = np.arange(-1, 1, 0.5)
tmp = np.concatenate(allValues, np.array([30], float))
Then I got
ValueError: 0-d arrays can't be concatenated
If I do
tmp = np.concatenate(allValues, np.array([50], float))
There is no error message but tmp variable does not reflect the concatenation either.
You need to put the arrays you want to concatenate into a sequence (usually a tuple or list) in the argument.
tmp = np.concatenate((allValues, np.array([30], float)))
tmp = np.concatenate([allValues, np.array([30], float)])
Check the documentation for np.concatenate. Note that the first argument is a sequence (e.g. list, tuple) of arrays. It does not take them as separate arguments.
As far as I know, this API is shared by all of numpy's concatenation functions: concatenate, hstack, vstack, dstack, and column_stack all take a single main argument that should be some sequence of arrays.
The reason you are getting that particular error is that arrays are sequences as well, so concatenate interprets allValues as a sequence of arrays to concatenate. However, each element of allValues is a float rather than an array, and is therefore interpreted as a zero-dimensional array. As the error says, these "arrays" cannot be concatenated.
The second argument is taken as the second (optional) argument of concatenate, which is the axis to concatenate on. This only works because there is a single element in the second argument, which can be cast as an integer and therefore is a valid value. If you had put an array with more elements in the second argument, you would have gotten a different error:
a = np.array([1, 2])
b = np.array([3, 4])
np.concatenate(a, b)
# TypeError: only length-1 arrays can be converted to Python scalars
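For completeness, a minimal sketch of the working call with the question's own values; hstack is included only to show that it takes the same kind of sequence argument:

```python
import numpy as np

allValues = np.arange(-1, 1, 0.5)        # four floats from -1.0 up to 0.5
extra = np.array([30], float)

# The arrays go in ONE sequence argument; the optional second argument is the axis.
joined = np.concatenate((allValues, extra))
stacked = np.hstack([allValues, extra])

print(joined)
```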
Also make sure you are concatenating two numpy arrays. I was concatenating one python array with a numpy array and it was giving me the same error:
ValueError: 0-d arrays can't be concatenated
It took me some time to figure this out, since all the answers on Stack Overflow assumed that you had two numpy arrays.
Pretty silly but easily overlooked mistake. Hence posting just in case this helps someone.
If it helps, you can convert an existing Python array with np.asarray, or create numpy arrays directly.
Another way to get this error is to have two numpy objects of different... types?
I get this error when I try np.concatenate([A,B])
and ValueError: all the input arrays must have same number of dimensions when I run np.concatenate([B,A])
Just as @mithunpaul mentioned, my types are off: A is an array of shape 44279x204 and B is a <44279x12 sparse matrix of type '<class 'numpy.float64'>' with 88558 stored elements in Compressed Sparse Row format>.
So that's why the error is happening. Don't know how to solve it yet though.

numpy arrays dimension mismatch

I am using numpy and pandas to attempt to concatenate a number of heterogenous values into a single array.
np.concatenate((tmp, id, freqs))
Here are the exact values:
tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"
The dimensions of tmp, 17232, and freqs are as follows:
[in] tmp.shape
[out] (4,)
[in] np.array(17232).shape
[out] ()
[in] freqs.shape
[out] (1,)
I have also tried casting them all as numpy arrays to no avail.
Although the variable freqs will frequently have more than one value.
However, with both the np.concatenate and np.append functions I get the following error:
*** ValueError: all the input arrays must have same number of dimensions
These all have the same number of columns (0), why can't I concatenate them with either of the above described numpy methods?
All I'm looking to obtain is [(tmp), 17232, (freqs)] in a single one-dimensional array, which is to be appended onto the end of a pandas dataframe.
Thanks.
Update
It appears I can concatenate the two existing arrays:
np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)
However, the integer, even when cast to an array, cannot be used in concatenate.
np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions
What does work, however is nesting append and concatenate
np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
0.022831050228310501], dtype=object)
Although this is kind of messy. Does anyone have a better solution for concatenating a number of heterogeneous arrays?
The problem is that id, and later the integer np.array(17571), are not 1D array_like objects: converted to arrays, they are zero-dimensional. See the NumPy documentation for how numpy decides whether an object can be converted automatically to a numpy array.
The solution is to make id array_like by wrapping it in a list or tuple, so that numpy treats it as a 1D array_like structure.
It all boils down to
np.concatenate((tmp, (id,), freqs))
or
np.concatenate((tmp, [id], freqs))
To avoid this sort of problem when dealing with input variables in functions using numpy, you can use np.atleast_1d, as pointed out by @askewchan.
Basically, if you are unsure whether your variable id will be a single str or a list of str in different scenarios, you are better off using
np.concatenate((tmp, np.atleast_1d(id), freqs))
because the two options above will fail if id is already a list/tuple of strings.
EDIT: It may not be obvious why np.array(17571) is not an array_like object. This happens because np.array(17571).shape==(), so it is not iterable as it has no dimensions.
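A runnable sketch of the atleast_1d approach with the values from the question. The name row_id is my own stand-in for id, chosen only to avoid shadowing the Python builtin:

```python
import numpy as np

tmp = np.array(['DNMT3A', 'p.M880V', 'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
row_id = "id_23728"      # hypothetical name standing in for the question's id

# A bare scalar becomes a zero-dimensional array ...
print(np.array(17571).shape)
# ... while atleast_1d promotes it to one dimension.
print(np.atleast_1d(17571).shape)

# With every piece at least 1-D, a single concatenate call is enough.
row = np.concatenate((tmp, np.atleast_1d(row_id), freqs))
print(row)
```

If row_id is already a list of strings, np.atleast_1d leaves it one-dimensional, so the same call handles both cases.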

strange behaviour of numpy masked array

I have trouble understanding the behaviour of numpy masked arrays.
Here is the snippet, which puzzles me for two reasons:
arr = numpy.ma.array([(1,2),(3,4)],dtype=[("toto","int"),("titi","int")])
arr[0][0] = numpy.ma.masked
First, when doing this nothing happens: no mask is applied to the element [0][0].
Second, when I change the data to [[1,2],[3,4]] (instead of [(1,2),(3,4)]), I get the following error:
TypeError: expected a readable buffer object
It seems that I completely misunderstood how to set up (and use) masked arrays.
Could you tell me what is wrong with this code?
thanks
EDIT: without specifying the dtypes, it works like expected
The purpose of a masked array is to mark some elements of the array as invalid, i.e. masked, so that operations skip them.
For example, you have an array:
a = np.array([[2, 1000], [3, 1000]])
Suppose you want every operation to ignore the elements >100. You create a masked array like:
b = np.ma.array(a, mask=(a>100))
You can perform some operations on both arrays to see the differences:
a.sum()
# 2005
b.sum()
# 5
a.prod()
# 6000000
b.prod()
# 6
As you see, the masked items are ignored...
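A short sketch extending the example above to a plain (non-structured) masked array. It does not address the structured dtype from the question; it only shows how to inspect the mask and how to mask one more element after creation:

```python
import numpy as np

a = np.array([[2, 1000], [3, 1000]])
b = np.ma.array(a, mask=(a > 100))

print(b.mask)            # True marks the ignored entries
print(b.count())         # number of unmasked entries
print(b.filled(0))       # plain ndarray with masked entries replaced by 0

total = b.sum()          # only the unmasked 2 and 3 contribute

# Mask one more element after creation; the underlying data in a is untouched.
b[0, 0] = np.ma.masked
total_after = b.sum()    # only the 3 is left unmasked
print(total, total_after)
```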
