Numpy array with numpy arrays as objects - python

I'd like to create a numpy ndarray with entries of type ndarray itself. I was able to wrap the ndarrays in another type to get it to work, but I want to do this without wrapping. By wrapping an ndarray x into e.g. the dictionary {1:x} I can do
F = np.vectorize(lambda x: {1:np.repeat(x,3)})
F(np.arange(9).reshape(3,3))
and get a (3,3) ndarray with entries {1:[0,0,0]} ... {1:[8,8,8]} (containing ndarrays). When I change F to F = np.vectorize(lambda x: np.repeat(x,3)), numpy complains: ValueError: setting an array element with a sequence. I guess it detects that the entries are arrays themselves and no longer treats them as objects.
How can I avoid this and do the same thing without wrapping the entries from ndarray into something different?
Thanks a lot in advance for hints :)

You can (ab-)use numpy.frompyfunc:
>>> F = np.arange(9).reshape(3, 3)
>>> np.frompyfunc(F.__getitem__, 1, 1)(range(3))
array([array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])], dtype=object)
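An alternative without frompyfunc, sketched under the same setup as the question: preallocate an object-dtype array and assign each entry yourself, so numpy never tries to broadcast the inner arrays.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# Preallocate an object-dtype array and fill it element by element,
# so numpy stores each repeated array as a single object entry.
out = np.empty(x.shape, dtype=object)
for idx in np.ndindex(x.shape):
    out[idx] = np.repeat(x[idx], 3)

print(out.shape)   # (3, 3)
print(out[0, 0])   # [0 0 0]
print(out[2, 2])   # [8 8 8]
```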


2D list to np.ndarray of ndarray

I have a 2D list like this
import numpy as np
data=[[1,2],[2,3],[3,4]]
I want to turn this to np.ndarray.
I don't want to get this:
np.array(data).shape
#(3,2)
my expected is like this
result=np.array([np.array([1,2]),np.array([2,3]),np.array([3,4])])
result.dtype
result.shape
#np.ndarray
#(3,)
result[i].shape
result[i].dtype
#(2,)
#np.ndarray
What should I do? Thanks a lot.
As @hpaulj suggested in the comments:
You can create first empty array of object type and then assign list to it like
data = [[1,2],[2,3],[3,4]]
result = np.empty(shape=len(data), dtype=object)
result[:] = [np.array(i) for i in data]
result
array([array([1, 2]), array([2, 3]), array([3, 4])], dtype=object)
result.shape
(3,)
You can try converting the list into an ndarray and then flattening it
np.array(data).flatten()
You can also specify the order parameter in flatten to flatten in row-major ('C') or column-major ('F') order.
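Note that this gives a flat (6,) array of scalars, not the (3,) object array the question asked for; still, a quick demo of the order parameter:

```python
import numpy as np

data = [[1, 2], [2, 3], [3, 4]]
arr = np.array(data)        # shape (3, 2)

print(arr.flatten())        # [1 2 2 3 3 4]  row-major ('C', the default)
print(arr.flatten('F'))     # [1 2 3 2 3 4]  column-major ('F')
```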

Rationale for numpy.split returning a list and not an array

I was surprised that numpy.split yields a list and not an array. I would have thought it would be better to return an array, since numpy has put a lot of work into making arrays more useful than lists. Can anyone justify numpy returning a list instead of an array? Why would that be a better programming decision for the numpy developers to have made?
A comment pointed out that if the split is uneven, the result can't be an array, at least not one with a uniform dtype. At best it would be an object dtype.
But let's consider the case of equal-length subarrays:
In [124]: x = np.arange(10)
In [125]: np.split(x,2)
Out[125]: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
In [126]: np.array(_) # make an array from that
Out[126]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
But we can get the same array without split - just reshape:
In [127]: x.reshape(2,-1)
Out[127]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
Now look at the code for split. It just passes the task to array_split. Ignoring the details about alternative axes, it just does
sub_arys = []
for i in range(Nsections):
    # st and end from `div_points`
    sub_arys.append(sary[st:end])
return sub_arys
In other words, it just steps through the array and returns successive slices. Those are (often) views of the original.
So split is not that sophisticated a function. You could generate such a list of subarrays yourself without much numpy expertise.
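A minimal sketch of that point: hand-written slices produce the same list, and both versions hold views of the original array.

```python
import numpy as np

x = np.arange(10)

# Hand-written equivalent of np.split(x, 2): just a list of slices.
manual = [x[0:5], x[5:10]]
auto = np.split(x, 2)
assert all((m == a).all() for m, a in zip(manual, auto))

# The slices are views, so in-place changes to x show through:
x[0] = 99
print(manual[0][0])   # 99
print(auto[0][0])     # 99
```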
Another point: the documentation notes that split can be reversed with an appropriate stack. concatenate (and family) takes a list of arrays. If given an array of arrays, or a higher-dimensional array, it effectively iterates on the first dimension, e.g. concatenate(arr) => concatenate(list(arr)).
Actually, you are right, it returns a list.
import numpy as np
a=np.random.randint(1,30,(2,2))
b=np.hsplit(a,2)
type(b)
It will return type(b) as list, so there is nothing wrong with the documentation. I also first thought the documentation was wrong and that it doesn't return an array, but when I checked
type(b[0])
type(b[1])
both returned ndarray.
That means it returns a list of ndarrays.

Numpy: How to stack arrays in columns?

Let's say that I have n numpy arrays of the same length. I would now like to create a numpy matrix, such that each column of the matrix is one of the numpy arrays. How can I achieve this? Right now I'm doing this in a loop and it produces the wrong results.
Note: I have to be able to stack them next to each other one by one iteratively.
My code looks like the following; assume that get_array is a function that returns a certain array based on its argument. I don't know until after the loop how many columns I'm going to have.
matrix = np.empty((n_rows,))
for item in sorted_arrays:
    array = get_array(item)
    matrix = np.vstack((matrix, array))
Any help would be appreciated.
You could try putting all your arrays (or lists) into a matrix and then transposing it. This will work if all arrays are the same length.
mymatrix = np.asmatrix((array1, array2, array3)) #... putting arrays into matrix.
mymatrix = mymatrix.transpose()
This should output a matrix with each array as a column. Hope this helps.
Time and again, we recommend collecting the arrays in a list, and making the final array with one call. That's more efficient, and usually easier to get right.
alist = []
for item in sorted_arrays:
    alist.append(get_array(item))
or
alist = [get_array(item) for item in sorted_arrays]
There are various ways of assembling the list. Since you want columns, and assuming get_array produces equal sized 1d arrays:
arr = np.column_stack(alist)
Collecting them in rows and transposing that works too:
arr = np.array(alist).T
arr = np.vstack(alist).T
arr = np.stack(alist).T
arr = np.stack(alist, axis=1)
If the arrays are already 2d
arr = np.concatenate(alist, axis=1)
All the stack variations use concatenate, just varying in how they tweak the shape(s) of the input arrays. The key to using concatenate is to understand dimensions and shapes, and how to add dimensions as needed. With that, you should, sooner or later, become fluent in this kind of coding.
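A quick check (with illustrative arrays) that the variants above agree for equal-length 1d inputs:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

cols = np.column_stack([a, b])   # shape (3, 2): a and b become columns
print(cols)

# The transpose-based spellings give the same result:
assert (cols == np.vstack([a, b]).T).all()
assert (cols == np.stack([a, b], axis=1)).all()
assert (cols == np.array([a, b]).T).all()
```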
If they vary in shape or dimensions, things get messier.
Equally good is to put the arrays in a pre-allocated array. But you need to know the desired final shape
arr = np.zeros((m,n), dtype)
for i, item in enumerate(sorted_arrays):
    arr[:, i] = get_array(item)
n is len(sorted_arrays), and m is the length of one get_array(item) result. You also need to know the expected dtype (int, float, etc.).
If you have a, b, c, d numpy arrays of the same length, the following code will accomplish what you want:
out_matrix = np.vstack([a, b, c, d]).transpose()
An example:
In [3]: a = np.array([1, 2, 3, 4])
In [4]: b = np.array([5, 6, 7, 8])
In [5]: c = np.array([2, 3, 4, 5])
In [6]: d = np.array([6, 8, 2, 4])
In [10]: np.vstack([a, b, c, d]).transpose()
Out[10]:
array([[1, 5, 2, 6],
       [2, 6, 3, 8],
       [3, 7, 4, 2],
       [4, 8, 5, 4]])

Convert list to np.arrays efficient

I am loading a dataset in my Python code which contains two matrices. The names of those matrices are train_dataset_face and train_dataset_audio, and I am reading them as lists of np.arrays. Finally I convert them to np.arrays of np.arrays. Initially, during debugging, my matrices look like this:
[screenshot of train_dataset_face] and [screenshot of train_dataset_audio]
Then I convert them into np.arrays using the following code:
train_dataset_face = np.array(train_dataset_face)
train_dataset_audio = np.array(train_dataset_audio)
And in the end my matrices look like this:
[screenshot: train_dataset_face now shows array(...) before each row]
For some weird reason, in the case of train_dataset_face I get this array indication before each vector of my array, while in the case of train_dataset_audio I don't. Is it possible to remove it? This "array" indication causes me problems when I try to apply several algorithms to train_dataset_face. Any idea what happened here?
You can only create a single regular (non-object) array if all arrays in your list have the same shape, which is true for train_dataset_audio but not for train_dataset_face.
>>> a = [numpy.array([1,2,3,4]), numpy.array([1,2,3,4])]
>>> numpy.array(a)
array([[1, 2, 3, 4],
       [1, 2, 3, 4]])
>>> b = [numpy.array([1,2,3]), numpy.array([1,2,3,4])]
>>> numpy.array(b)
array([array([1, 2, 3]), array([1, 2, 3, 4])], dtype=object)
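One caveat, assuming a recent NumPy: since version 1.24, numpy.array raises a ValueError on such ragged input instead of silently producing an object array, so the object dtype has to be built explicitly. A sketch:

```python
import numpy as np

b = [np.array([1, 2, 3]), np.array([1, 2, 3, 4])]

# Build the object array explicitly rather than relying on np.array(b),
# which errors out on ragged input in NumPy >= 1.24.
ragged = np.empty(len(b), dtype=object)
ragged[:] = b

print(ragged.dtype)   # object
print(ragged[1])      # [1 2 3 4]
```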

How to convert a python set to a numpy array?

I am using a set operation in python to perform a symmetric difference between two numpy arrays. The result, however, is a set and I need to convert it back to a numpy array to move forward. Is there a way to do this? Here's what I tried:
a = numpy.array([1,2,3,4,5,6])
b = numpy.array([2,3,5])
c = set(a) ^ set(b)
The result is a set:
In [27]: c
Out[27]: set([1, 4, 6])
If I convert to a numpy array, it places the entire set in the first array element.
In [28]: numpy.array(c)
Out[28]: array(set([1, 4, 6]), dtype=object)
What I need, however, would be this:
array([1,4,6],dtype=int)
I could loop over the elements to convert one by one, but I will have 100,000 elements and hoped for a built-in function to save the loop. Thanks!
Do:
>>> numpy.array(list(c))
array([1, 4, 6])
And dtype is int (int64 on my side.)
Don't convert the numpy array to a set to perform exclusive-or. Use setxor1d directly.
>>> import numpy
>>> a = numpy.array([1,2,3,4,5,6])
>>> b = numpy.array([2,3,5])
>>> numpy.setxor1d(a, b)
array([1, 4, 6])
Try:
numpy.fromiter(c, int, len(c))
This is twice as fast as the solution with a list as an intermediate product.
Try this.
numpy.array(list(c))
Converting to a list before initializing the numpy array makes numpy treat the individual elements as integers, rather than storing the whole set as a single object element.
