Numpy array of arrays - python

In my project I use a library (root_numpy) which returns data as an array of arrays:
data = array([array([1, 2]), array([3, 4]), array([5, 6])], dtype=object)
I would like to turn this object to a regular 2d numpy array.
array([[1, 2],
[3, 4],
[5, 6]])
I already tried np.vstack but that was rather slow. Is there an efficient way to accomplish this task? Many thanks in advance.
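A few alternatives to np.vstack that may be worth timing on your data. This is only a sketch: it assumes every inner array has the same length, and the sample `data` is built by hand here rather than coming from root_numpy.

```python
import numpy as np

# Build a genuine 1-D object array whose elements are 1-D arrays.
# (np.array([...], dtype=object) on equal-length rows would give a
# (3, 2) object array instead, so the elements are assigned one by one.)
data = np.empty(3, dtype=object)
for i, row in enumerate([[1, 2], [3, 4], [5, 6]]):
    data[i] = np.array(row)

# Option 1: stack the inner arrays along a new first axis.
stacked = np.stack(data)

# Option 2: concatenate into one flat array, then reshape
# (relies on all rows having the same length).
reshaped = np.concatenate(data).reshape(len(data), -1)

# Option 3: go through a plain list of the inner arrays.
from_list = np.array(data.tolist())

print(stacked)
```

All three produce a regular (3, 2) integer array; which one is fastest likely depends on the number and size of the inner arrays.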

Related

How to convert a numpy array of string arrays to a nested numpy array?

I have a numpy array of the format: test = array(['[[1,2,3],[2,3,4]]', '[[2,3,4],[4,5,6]]'], dtype='<U17'). I would like to convert it to numpy array of shape (2,2,3).
Expected outcome is:
array([[[1, 2, 3],
[2, 3, 4]],
[[2, 3, 4],
[4, 5, 6]]])
Currently, I achieve this by doing - np.array(list(map(lambda x: json.loads(x),test))).
I wonder if there's a better way to do this. Any pointers please?
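One possible variation, sketched under the assumption that every element is valid JSON: join the strings into a single JSON document and parse it once instead of calling json.loads per element. The names `joined` and `result` are only illustrative.

```python
import json
import numpy as np

test = np.array(['[[1,2,3],[2,3,4]]', '[[2,3,4],[4,5,6]]'])

# Wrap the comma-joined element strings in brackets so the whole thing
# is one JSON array, then parse it in a single call.
joined = '[' + ','.join(test) + ']'
result = np.array(json.loads(joined))

print(result.shape)
```

Whether this is actually faster than the map/lambda version would need to be measured on real data.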

Slicing Numpy Array by 2 index arrays

If I have a set of indices stored in two Numpy arrays, my goal is to slice a given input array based on corresponding indices in those index arrays. For example:
index_arr1 = np.asarray([2,3,4])
index_arr2 = np.asarray([5,5,6])
input_arr = np.asarray([1,2,3,4,4,5,7,2])
The output of my code should be [[3,4,4],[4,4],[4,5]], which is basically [input_arr[2:5], input_arr[3:5], input_arr[4:6]].
Can anybody suggest a way to solve this problem using numpy functions and without any for loops, to be as efficient as possible?
Do you mean:
[input_arr[x:y] for x,y in zip(index_arr1, index_arr2)]
Output:
[array([3, 4, 4]), array([4, 4]), array([4, 5])]
Or if you really want list of lists:
[input_arr[x:y].tolist() for x,y in zip(index_arr1, index_arr2)]
Output:
[[3, 4, 4], [4, 4], [4, 5]]
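If the Python-level loop really is a bottleneck, the slices can also be gathered with one fancy-indexing call and then cut apart with np.split. This is a sketch that assumes every end index is at least its start index and that there is at least one slice; the intermediate names (`lengths`, `ends`, `starts`, `flat_idx`) are made up for illustration.

```python
import numpy as np

index_arr1 = np.asarray([2, 3, 4])
index_arr2 = np.asarray([5, 5, 6])
input_arr = np.asarray([1, 2, 3, 4, 4, 5, 7, 2])

lengths = index_arr2 - index_arr1   # length of each slice
ends = np.cumsum(lengths)           # end offset of each slice in the flat result
starts = ends - lengths             # start offset of each slice in the flat result

# For every output position, compute the matching index into input_arr.
flat_idx = np.repeat(index_arr1 - starts, lengths) + np.arange(ends[-1])

values = input_arr[flat_idx]          # all slices, back to back
pieces = np.split(values, ends[:-1])  # cut into one array per slice

print(pieces)
```

The result is still a list of arrays, because the slices have different lengths and cannot form a regular 2D array.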

Save a pandas dataframe with a column with 2d arrays as a parquet file in python

I'm trying to save a pandas dataframe to a parquet file using pd.to_parquet(df). df is a dataframe with multiple columns, and one of the columns holds a 2d array in each row. When I do this, I receive an error from pyarrow complaining that only 1-d arrays are supported. I googled and it seems there is no solution. I just want to confirm that there is in fact no solution and that I have to somehow represent my 2-d array as a 1-d array.
It's correct that pyarrow / parquet has this limitation of not storing 2D arrays.
But, parquet (and arrow) support nested lists, and you could represent a 2D array as a list of lists (or in python an array of arrays or list of arrays is also fine). So one option could be to convert your 2D arrays to such format.
Example showing that such nested lists/arrays work:
In [2]: df = pd.DataFrame(
...: {'a': [[np.array([1, 2, 3]), np.array([4, 5, 6])],
...: [np.array([3, 4, 5]), np.array([6, 7, 8])]]})
In [3]: df.to_parquet('test_nested_list.parquet')
In [4]: res = pd.read_parquet('test_nested_list.parquet')
In [5]: res['a']
Out[5]:
0 [[1, 2, 3], [4, 5, 6]]
1 [[3, 4, 5], [6, 7, 8]]
Name: a, dtype: object
In [6]: res['a'].values
Out[6]:
array([array([array([1, 2, 3]), array([4, 5, 6])], dtype=object),
array([array([3, 4, 5]), array([6, 7, 8])], dtype=object)],
dtype=object)
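Note that what comes back in each cell is a 1-D object array of 1-D arrays, not a 2D array. A minimal sketch of converting between the two forms, assuming all rows within a cell have the same length; `cell` is a hand-built stand-in for one element of res['a'], not actual parquet output.

```python
import numpy as np

# Stand-in for one cell of res['a'] as shown in Out[6]:
# a 1-D object array holding two 1-D arrays.
cell = np.empty(2, dtype=object)
cell[0] = np.array([1, 2, 3])
cell[1] = np.array([4, 5, 6])

# Nested form -> regular 2D array.
restored = np.stack(cell)

# Regular 2D array -> list of row arrays, the nested form used before writing.
nested = list(restored)

print(restored.shape)
```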

How do I convert a 2D numpy array into a 1D numpy array of 1D numpy arrays?

In other words, each element of the outer array will be a row vector from the original 2D array.
As @Jaime already said, a 2D array can be interpreted as an array of 1D arrays. Suppose:
a = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
doing a[0] will return array([1, 2, 3]).
So you don't need to do any conversion.
I think it makes little sense to use numpy arrays to do that; you would be missing out on all the advantages of numpy.
I had the same issue when appending a row with a different length to a 2D array.
The only trick I have found so far is to use a list comprehension and append the new row (see below). Not optimal, I guess, but at least it works ;-)
Hope this helps.
>>> x=np.reshape(np.arange(0,9),(3,3))
>>> x
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> row_to_append = np.arange(9,11)
>>> row_to_append
array([ 9, 10])
>>> result=[item for item in x]
>>> result.append(row_to_append)
>>> result
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10])]
np.vsplit: Split an array into multiple sub-arrays vertically (row-wise).
x=np.arange(12).reshape(3,4)
In [7]: np.vsplit(x,3)
Out[7]: [array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8, 9, 10, 11]])]
A comprehension could be used to reshape those arrays into 1d ones.
This is a list of arrays, not an array of arrays. Such a sequence of arrays can be recombined with vstack (or hstack, dstack).
np.array([np.arange(3),np.arange(4)])
makes a 2 element array of arrays (since NumPy 1.24 such ragged input must be passed with dtype=object, otherwise a ValueError is raised). But if the arrays in the list are all the same shape (or compatible), it makes a 2d array. In terms of data storage it may not matter whether it is 2d or 1d of 1d arrays.
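If a true 1D array of 1D arrays is needed even though the rows all have the same length, one way that should work is to preallocate an object array and fill it element by element. This is a sketch; `out` is just an illustrative name.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Preallocate a 1-D object array with one slot per row, then fill it.
# Assigning by integer index stores the row array itself as the element.
out = np.empty(a.shape[0], dtype=object)
for i, row in enumerate(a):
    out[i] = row  # each element is a view of one row of a

print(out.shape, out.dtype)
```

Because the elements are views, modifying `out[0]` in place also changes the first row of `a`; use `row.copy()` if that is not wanted.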

Numpy: Concatenating multidimensional and unidimensional arrays

I have a 2x2 numpy array :
x = array(([[1,2],[4,5]]))
which I must merge (or stack, if you wish) with a one-dimensional array :
y = array(([3,6]))
by adding it to the end of the rows, thus making a 2x3 numpy array that would output like so :
array([[1, 2, 3],
[4, 5, 6]])
now the proposed method for this in the numpy guides is :
hstack((x,y))
however this doesn't work, returning the following error :
ValueError: arrays must have same number of dimensions
The only workaround possible seems to be to do this :
hstack((x, array(([y])).T ))
which works, but looks and sounds rather hackish. It seems there is no other way to transpose the given array so that hstack is able to digest it. I was wondering, is there a cleaner way to do this? Wouldn't there be a way for numpy to guess what I wanted to do?
unutbu's answer works in general, but in this case there is also np.column_stack
>>> x
array([[1, 2],
[4, 5]])
>>> y
array([3, 6])
>>> np.column_stack((x,y))
array([[1, 2, 3],
[4, 5, 6]])
Also works:
In [22]: np.append(x, y[:, np.newaxis], axis=1)
Out[22]:
array([[1, 2, 3],
[4, 5, 6]])
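The hstack error comes from y being 1D with shape (2,) while x is 2D. A short sketch of a few equivalent ways to give y a second axis so that plain hstack accepts it (the names a, b and c are only for illustration):

```python
import numpy as np

x = np.array([[1, 2], [4, 5]])
y = np.array([3, 6])

# Turn y from shape (2,) into a column of shape (2, 1), then stack.
a = np.hstack((x, y[:, np.newaxis]))
b = np.hstack((x, y.reshape(-1, 1)))

# np.c_ treats 1-D inputs as columns and concatenates along the second axis.
c = np.c_[x, y]

print(a)
```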
