I have an array
array = [np.array([[0.76103773], [0.12167502]]),
np.array([[ 0.72017135, 0.1633635 , 0.39956811, 0.91484082, 0.76242736, -0.39897202],
[0.38787197, -0.06179132, -0.04213892, 0.16762614, 0.05880554, 0.59370467]])]
And I want to convert it into a numpy object array that contains numpy ndarrays. So I tried np.array(array), np.array(array, dtype=object), and np.array(array, dtype=np.object).
But all of them give the same error: ValueError: could not broadcast input array from shape (2,1) into shape (2). So basically, the end result should hold the same data, just with the type being a numpy object array instead of a python list. Can anyone help?
Your list contains (2,1) and (2,6) shaped arrays.
np.array tries to create a multidimensional array from the inputs. That works fine with inputs that have matching shapes (or matching length and nesting). Failing that, it falls back on creating an object dtype array.
But in cases where the first dimensions of the input arrays match, it produces this kind of error. Evidently it has initialized a 'blank' array and is trying to copy the list's arrays into it. I haven't looked at the details, but I've seen the error message before.
In effect, giving np.array a list of diversely sized arrays forces it to use some backup method. Some inputs produce an object array; others produce this kind of error. If your list contained arrays that were all the same shape, the result would be a 3d array, not an object array.
The surest way to make an object array with a given shape is to initialize it and then copy from the list.
In [66]: alist =[np.array([[0.76103773], [0.12167502]]),
...: np.array([[ 0.72017135, 0.1633635 , 0.39956811, 0.91484082, 0.76242736, -0.39897202],
...: [0.38787197, -0.06179132, -0.04213892, 0.16762614, 0.05880554, 0.59370467]])]
In [67]: alist[0].shape
Out[67]: (2, 1)
In [68]: alist[1].shape
Out[68]: (2, 6)
In [69]: np.array(alist, object)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-69-261e1ad7e5cc> in <module>
----> 1 np.array(alist, object)
ValueError: could not broadcast input array from shape (2,1) into shape (2)
In [70]: arr = np.zeros(2, object)
In [71]: arr[:] = alist
In [72]: arr
Out[72]:
array([array([[0.76103773],
[0.12167502]]),
array([[ 0.72017135, 0.1633635 , 0.39956811, 0.91484082, 0.76242736,
-0.39897202],
[ 0.38787197, -0.06179132, -0.04213892, 0.16762614, 0.05880554,
0.59370467]])], dtype=object)
Don't expect too much from object dtype arrays. Math is hit-or-miss. Some things work - if the operation can be delegated to the elements. Others don't:
In [73]: arr - arr
Out[73]:
array([array([[0.],
[0.]]),
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])], dtype=object)
In [74]: np.log(arr)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-74-a67b4ae04e95> in <module>
----> 1 np.log(arr)
AttributeError: 'numpy.ndarray' object has no attribute 'log'
Even when the math works, it isn't faster than a list comprehension. In fact, iteration on an object array is slower than iteration on a list.
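If you do need a ufunc like np.log across such an array, a list comprehension over the elements is the usual workaround. A minimal sketch, continuing the session above:
logs = np.empty(len(arr), object)
logs[:] = [np.log(a) for a in arr]   # apply the ufunc per element array
# (the negative entries give nan with a warning, exactly as they
# would for a plain float array)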
Is this what you're trying to accomplish?
array1 = np.array([[0.76103773], [0.12167502]])
array2 = np.array([[ 0.72017135, 0.1633635 , 0.39956811, 0.91484082, 0.76242736, -0.39897202],[0.38787197, -0.06179132, -0.04213892, 0.16762614, 0.05880554, 0.59370467]])
result = np.hstack([array1,array2])
EDIT:
Maybe this?
array1 = [[0.76103773], [0.12167502]]
array2 = [[ 0.72017135, 0.1633635 , 0.39956811, 0.91484082, 0.76242736, -0.39897202],[0.38787197, -0.06179132, -0.04213892, 0.16762614, 0.05880554, 0.59370467]]
result = np.array([array1,array2])
EDIT 2:
OK, let's try one more time. I think this is it.
array1 = np.array([[0.76103773], [0.12167502]])
array2 = np.array([[ 0.72017135, 0.1633635 , 0.39956811, 0.91484082, 0.76242736, -0.39897202],[0.38787197, -0.06179132, -0.04213892, 0.16762614, 0.05880554, 0.59370467]])
#solution is either
result = np.array([array1,array2.transpose()])
#or this
result2 = np.array([array1.transpose(),array2])
I have an array as follows
samples_data = [array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)
array([ 0. , 0. , 0. , ..., -0.00020519,
-0.00019427, -0.00107348], dtype=float32)
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-8.9004419e-07, 7.3998461e-07, -6.9706215e-07], dtype=float32)
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)]
And I have a function like this
def generate_segmented_data_1(
    samples_data: np.ndarray, sampling_rate: int = 16000
) -> np.ndarray:
    new_data = []
    for data in samples_data:
        segments = segment_audio(data, sampling_rate=sampling_rate)
        new_data.append(segments)
    new_data = np.array(new_data)
    return np.concatenate(new_data)
It shows an error like this
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 11 has 2 dimension(s)
And the array at index 0 is like this
[array([ 0. , 0. , 0. , ..., -0.00022057,
0.00013752, -0.00114789], dtype=float32)
array([-4.3174211e-04, -5.4488028e-04, -1.1238289e-03, ...,
8.4724619e-05, 3.0450989e-05, -3.9514929e-05], dtype=float32)]
then the array at index 11 is like this
[[3.0856067e-05 3.0295929e-05 3.0955063e-05 ... 8.5010566e-03
1.3315652e-02 1.5698154e-02]]
What should I do so that all of the segments I produce get concatenated into one array of segments?
I'm not quite sure I understand what you are trying to do.
b = np.array([[2]])
b.shape
# (1,1)
b = np.array([2])
b.shape
# (1,)
For the segment part of the question, it is unclear what your data structure is, but the code example is broken, as you are appending to a list that hasn't been created.
How can I get the shape of the below array to be 1D instead of 2D?
b = np.array([[2]])
b_shape = b.shape
This results in (1, 1). But I want it to be (1,) without flattening it.
I suspect the confusion stems from the fact that you chose an example which can also be seen as a scalar, so I'll use a different example instead:
b = np.array([[1,2]])
now, b.shape is (1,2). Removing the first "one" dimension in any way (be it b.flatten(), b.squeeze(), or b[0]) results in the same thing:
assert (b.flatten() == b.squeeze()).all()
assert (b.flatten() == b[0]).all()
Now, for the real problem: it appears you're trying to concatenate "rows" from "segments", but the "segments" (which I believe from your sample data are lists of np.arrays?) are inconsistently formed.
Your sample data is very chaotic: segments 0-10 seem to be lists of 1D arrays; segments 11, 18 and 19 are either 2D arrays or lists of lists of floats. This, plus the error message, suggests you have an issue in the data processing of the segments.
Now, to actually concatenate both types of data:
new_data = []
for data in samples_data:
    segments = function_a(data)        # it appears this doesn't return consistent data
    segments = np.asarray(segments)    # force it to always be an array...
    if segments.ndim > 1:              # ...and append each row
        for row in segments:
            new_data.append(row)
    elif segments.ndim == 1:           # if just one row, append it directly
        new_data.append(segments)
    else:
        # function_a returned an empty list, do nothing
        pass
Given the shown data and code, this should work (but it's neither efficient, nor tested).
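If you then need a single array out of new_data, here is a sketch of one way to finish (assuming the loop above has run; the names are from that snippet):
lengths = {len(row) for row in new_data}
if len(lengths) == 1:                    # all rows the same length
    result = np.stack(new_data)          # -> ordinary 2D float array
else:                                    # ragged rows
    result = np.empty(len(new_data), object)
    result[:] = new_data                 # -> 1D object array of rows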
I want to create a NumPy array of np.ndarray from an iterable. This is because I have a function that returns an np.ndarray of some constant shape, and I need to create an array of results from this function, something like this:
OUTPUT_SHAPE = some_constant
def foo(input) -> np.ndarray:
    # processing
    # generates np.ndarray of shape OUTPUT_SHAPE
    return output
inputs = [i for i in range(100000)]
iterable = (foo(input) for input in inputs)
arr = np.fromiter(iterable, np.ndarray)
This obviously gives an error:
cannot create object arrays from iterator
I cannot first create a list and then convert it to an array, because that would first create a copy of every output array, so for a time almost double the memory would be occupied, and I have very limited memory.
Can anyone help me?
You probably shouldn't make an object array. You should probably make an ordinary 2D array of non-object dtype. As long as you know the number of results the iterator will give in advance, you can avoid most of the copying you're worried about by doing it like this:
arr = numpy.empty((num_iterator_outputs, *OUTPUT_SHAPE), dtype=whatever_appropriate_dtype)  # assuming OUTPUT_SHAPE is a shape tuple
for i, output in enumerate(iterable):
    arr[i] = output
This only needs to hold arr and a single output in memory at once, instead of arr and every output.
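For concreteness, a runnable sketch of that pattern; foo and OUTPUT_SHAPE here are stand-ins for whatever your real code defines:
import numpy as np

OUTPUT_SHAPE = (4,)                      # stand-in shape tuple
def foo(x):                              # stand-in for the real function
    return np.full(OUTPUT_SHAPE, x, dtype=np.float64)

inputs = range(1000)
arr = np.empty((len(inputs), *OUTPUT_SHAPE), dtype=np.float64)
for i, x in enumerate(inputs):
    arr[i] = foo(x)                      # writes into preallocated storage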
If you really want an object array, you can get one. The simplest way would be to go through a list, which will not perform the copying you're worried about as long as you do it right:
outputs = list(iterable)
arr = numpy.empty(len(outputs), dtype=object)
arr[:] = outputs
Note that if you just try to call numpy.array on outputs, it will try to build a 2D array, which will cause the copying you're worried about. This is true even if you specify dtype=object - it'll try to build a 2D array of object dtype, and that'll be even worse, for both usability and memory.
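A quick demonstration of that trap with equal-shaped arrays, where np.array succeeds but gives the wrong structure:
a, b = np.zeros(3), np.ones(3)
np.array([a, b], dtype=object).shape     # (2, 3) - one object per number
good = np.empty(2, dtype=object)
good[:] = [a, b]
good.shape                               # (2,) - one slot per array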
An object dtype array contains references, just like a list.
Define 3 arrays:
In [589]: a,b,c = np.arange(3), np.ones(3), np.zeros(3)
put them in a list:
In [590]: alist = [a,b,c]
and in an object dtype array:
In [591]: arr = np.empty(3,object)
In [592]: arr[:] = alist
In [593]: arr
Out[593]:
array([array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])],
dtype=object)
In [594]: alist
Out[594]: [array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])]
Modify one, and see the change in the list and array:
In [595]: b[:] = [1,2,3]
In [596]: b
Out[596]: array([1., 2., 3.])
In [597]: alist
Out[597]: [array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])]
In [598]: arr
Out[598]:
array([array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])],
dtype=object)
A numeric dtype array created from these copies all values:
In [599]: arr1 = np.stack(arr)
In [600]: arr1
Out[600]:
array([[0., 1., 2.],
[1., 2., 3.],
[0., 0., 0.]])
So even if your use of fromiter worked, it wouldn't be any different, memory-wise, from a list accumulation:
alist = []
for i in range(n):
    alist.append(constant_array)
I'm on Python 2.7 and I want to get an object from a specific coordinate in my matrix after initializing all the coordinates to 0:
import numpy as np

class test:
    "it's a test"
    def __init__(self):
        self.x=4
        self.y=5

mat=np.full(shape=(4,4),fill_value=0)
mat[2,2]=test()
print(mat[2,2].x)
print(mat[2,2].y)
But I have this error:
Traceback (most recent call last):
File "/root/Documents/matrix.py", line 11, in <module>
mat[2,2]=test()
AttributeError: test instance has no attribute '__trunc__'
And if I change the mat=np.full(...) line into:
mat=np.zeros(shape=(4,4))
I get this error:
Traceback (most recent call last):
File "/root/Documents/matrix.py", line 11, in <module>
mat[2]=test()
AttributeError: test instance has no attribute '__float__'
It works fine for an element of a simple list, so I hope this is not due to the fact that I'm using a numpy matrix...
I hope someone can help me, thanks!
Pay attention to what your statements create.
In [164]: mat=np.full(shape=(4,4),fill_value=0)
In [165]:
In [165]: mat
Out[165]:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
In [166]: mat.dtype
Out[166]: dtype('int64')
This array can only hold integers. The error means it tries to apply the __trunc__ method to your object. That would work with a number like 12.23.__trunc__(). But you haven't defined such a method.
In [167]: mat=np.zeros(shape=(4,4))
In [168]: mat
Out[168]:
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
In [169]: mat.dtype
Out[169]: dtype('float64')
Here the dtype is float. Again, you haven't defined a __float__ method.
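You can see what defining such a method would do; a sketch with a hypothetical variant of the class (it only shows that the object gets converted, not stored):
class test2:
    def __init__(self):
        self.x = 4
        self.y = 5
    def __float__(self):                 # hypothetical conversion method
        return float(self.x)

mat = np.zeros(shape=(4, 4))
mat[2, 2] = test2()                      # no error now, but...
print(mat[2, 2])                         # 4.0 - a float was stored, the object is gone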
A list holds pointers to Python objects.
In [171]: class test:
...: "it's a test"
...: def __init__(self):
...: self.x=4
...: self.y=5
...: def __repr__(self):
...: return 'test x={},y={}'.format(self.x, self.y)
...:
In [172]: alist = [test(), test()]
In [173]: alist
Out[173]: [test x=4,y=5, test x=4,y=5]
We can make an array that holds your objects:
In [174]: arr = np.array(alist)
In [175]: arr
Out[175]: array([test x=4,y=5, test x=4,y=5], dtype=object)
In [176]: arr[0].x
Out[176]: 4
But note the dtype.
Object dtype arrays are list-like, with some array properties. They can be reshaped, but most operations have to use some sort of list iteration. Math is hit-and-miss, depending on what methods you've defined.
Don't use object dtype arrays unless you really need them. Lists are easier to use.
You should make explicit the fact that the data type is object:
mat=np.full(shape=(4,4),fill_value=0, dtype=object)
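With that change, the original code runs as intended:
mat = np.full(shape=(4,4), fill_value=0, dtype=object)
mat[2,2] = test()
print(mat[2,2].x)   # 4
print(mat[2,2].y)   # 5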
I have a list which consists of several numpy arrays with different shapes.
I want to reshape this list of arrays into a numpy vector, then change each element in the vector, and then reshape it back to the original list of arrays.
For example:
input
[numpy.zeros((2,2)), numpy.ones((3,3))]
First
To vector
[0,0,0,0,1,1,1,1,1,1,1,1,1]
Second
Each time, change only one element. For example, change the second element from 0 to 2:
[0,2,0,0,1,1,1,1,1,1,1,1,1]
Last
convert it back to
[array([[0,2],[0,0]]),array([[1,1,1],[1,1,1],[1,1,1]])]
Is there any fast implementation? Thanks very much.
It seems like converting to a list and back will be inefficient. Instead, why not figure out which array to index (and where) and then just update that index? e.g.
def change_element(arr1, arr2, ix, value):
    which = ix >= arr1.size
    arr = [arr1, arr2][which]
    ix = ix - arr1.size if which else ix
    arr.ravel()[ix] = value
And here's some example usage:
>>> arr1 = np.zeros((2, 2))
>>> arr2 = np.ones((3, 3))
>>> change_element(arr1, arr2, 1, 2)
>>> change_element(arr1, arr2, 6, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 1. , 1. , 1. ],
[ 1. , 1. , 1. ]])
>>> change_element(arr1, arr2, 7, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 3.14, 1. , 1. ],
[ 1. , 1. , 1. ]])
A few notes -- this updates the arrays in place; it doesn't create new arrays. If you really need to create new arrays, I suppose you could np.copy them and return. Also, this relies on the arrays sharing memory before and after the ravel. I don't remember the exact circumstances where ravel would return a new array rather than a view into the original array...
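If you need to guard against that, np.shares_memory gives a direct test; a small sketch of the view-vs-copy behavior:
a = np.ones((3, 3))
np.shares_memory(a, a.ravel())           # True: contiguous, ravel is a view
b = a[:, ::2]                            # non-contiguous view of a
np.shares_memory(b, b.ravel())           # False: ravel had to copy
b.ravel()[0] = 99                        # the write lands in the copy...
b[0, 0]                                  # ...so b still holds 1.0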
Generalizing to more arrays is actually quite easy. We just need to walk down the list of arrays and see if ix is less than the array size. If it is, we've found our array. If it isn't, we need to subtract the array's size from ix to represent the number of elements we've traversed thus far:
def change_element(arrays, ix, value):
    for arr in arrays:
        if ix < arr.size:
            arr.ravel()[ix] = value
            return
        ix -= arr.size
And you can call this similarly to before:
change_element([arr1, arr2], 6, 3.14159)
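A matching getter follows the same walk, if you also need to read by flat index (a hypothetical companion, under the same assumptions):
def get_element(arrays, ix):
    for arr in arrays:
        if ix < arr.size:
            return arr.ravel()[ix]       # reading from a copy is harmless
        ix -= arr.size
    raise IndexError("flat index out of range")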
@mgilson probably has the best answer for you, but if you absolutely have to convert to a flat list first and then go back again (perhaps because you need to do something else with the flat list as well), then you can do this with list comprehensions:
import numpy as np

lst = [np.zeros((2,2)), np.ones((3,3))]
tlist = [e for a in lst for e in a.ravel()]
tlist[1] = 2
i = 0
lst2 = []
dims = [a.shape for a in lst]
for n, m in dims:
    lst2.append(np.array(tlist[i:i+n*m]).reshape(n,m))
    i += n*m
lst2
[array([[ 0., 2.],
[ 0., 0.]]), array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])]
Of course, you lose the information about your array sizes when you flatten, so you need to store them somewhere (here, in dims).
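If you want the flat form to be a real NumPy vector rather than a Python list, np.concatenate plus np.split gives the same round trip; a sketch on the same input:
import numpy as np

lst = [np.zeros((2,2)), np.ones((3,3))]
shapes = [a.shape for a in lst]
flat = np.concatenate([a.ravel() for a in lst])   # one 1D vector
flat[1] = 2                                       # edit one element
bounds = np.cumsum([a.size for a in lst])[:-1]    # split points, here [4]
lst2 = [chunk.reshape(shape)
        for chunk, shape in zip(np.split(flat, bounds), shapes)]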
I am constructing a sparse vector using a scipy.sparse.csr_matrix like so:
csr_matrix((values, (np.zeros(len(indices)), indices)), shape = (1, max_index))
This works fine for most of my data, but occasionally I get a ValueError: could not convert integer scalar.
This reproduces the problem:
In [145]: inds
Out[145]:
array([ 827969148, 996833913, 1968345558, 898183169, 1811744124,
2101454109, 133039182, 898183170, 919293479, 133039089])
In [146]: vals
Out[146]:
array([ 1., 1., 1., 1., 1., 2., 1., 1., 1., 1.])
In [147]: max_index
Out[147]:
2337713000
In [143]: csr_matrix((vals, (np.zeros(10), inds)), shape = (1, max_index+1))
...
996 fn = _sparsetools.csr_sum_duplicates
997 M,N = self._swap(self.shape)
--> 998 fn(M, N, self.indptr, self.indices, self.data)
999
1000 self.prune() # nnz may have changed
ValueError: could not convert integer scalar
inds is a np.int64 array and vals is a np.float64 array.
The relevant part of the scipy sum_duplicates code is here.
Note that this works:
In [235]: csr_matrix(([1,1], ([0,0], [1,2])), shape = (1, 2**34))
Out[235]:
<1x17179869184 sparse matrix of type '<type 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>
So the problem is not that one of the dimensions is > 2^31
Any thoughts why these values should be causing a problem?
Might it be that max_index > 2**31?
Try this, just to make sure (floor division keeps the indices and the shape integral):
csr_matrix((vals, (np.zeros(10), inds//2)), shape = (1, max_index//2))
The maximum column index you are actually supplying is smaller than the max_index you are giving, so a tighter shape suffices. This
sparse.csr_matrix((vals, (np.zeros(10), inds)), shape = (1, np.max(inds)+1))
works fine for me, although calling .todense() results in a memory error because of the large size of the matrix.
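One way to check whether the 2**31 boundary is in play is to compare both the shape and the data against the int32 range; a quick diagnostic with the values from the question:
np.iinfo(np.int32).max               # 2147483647, i.e. 2**31 - 1
max_index > np.iinfo(np.int32).max   # True: the shape needs 64-bit indices
inds.max() > np.iinfo(np.int32).max  # False: np.max(inds)+1 stays in range
That asymmetry may be why the smaller shape above avoids the error.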
Commenting out the sum_duplicates function will lead to other errors. But the fix from strange error when creating csr_matrix also solves your problem. You can extend the version check to newer versions of scipy.
import scipy
import scipy.sparse

if scipy.__version__ in ("0.14.0", "0.14.1", "0.15.1"):
    _get_index_dtype = scipy.sparse.sputils.get_index_dtype

    def _my_get_index_dtype(*a, **kw):
        kw.pop('check_contents', None)
        return _get_index_dtype(*a, **kw)

    scipy.sparse.compressed.get_index_dtype = _my_get_index_dtype
    scipy.sparse.csr.get_index_dtype = _my_get_index_dtype
    scipy.sparse.bsr.get_index_dtype = _my_get_index_dtype