Python: converting multi-dimensional numpy arrays into list of arrays

Suppose I have a 2D array, for simplicity a = np.array([[1,2],[3,4]]). I want to convert it to a list of arrays, so that the result would be:
b=[np.array([1,2]), np.array([3,4])]
I found that there is an np.ndarray.tolist() method, but it converts an N-D array into a nested list. I could do this in a for loop (using the append method), but that is not efficient or elegant.
In my real example I am working with 2D arrays of approximately 10000 x 50 elements, and I want a list that contains 50 one-dimensional arrays, each of shape (10000,).

How about using list:
a=np.array([[1,2],[3,4]])
b = list(a)

Why not use a list comprehension, as follows, which needs no append:
a=np.array([[1,2],[3,4]])
b = [i for i in a]
print(b)
Output
[array([1, 2]), array([3, 4])]
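One thing worth checking against the 10000 x 50 case in the question: both list(a) and the comprehension iterate over the first axis, so they would give 10000 arrays of shape (50,). A minimal sketch, assuming the goal really is 50 arrays of shape (10000,), is to iterate over the transpose instead. The array contents and the names rows and cols are made up for illustration.

```python
import numpy as np

# Stand-in for the 10000 x 50 array described in the question.
a = np.arange(10000 * 50).reshape(10000, 50)

rows = list(a)    # iterates over axis 0: 10000 arrays of shape (50,)
cols = list(a.T)  # iterates over the transpose: 50 arrays of shape (10000,)

print(len(rows), rows[0].shape)
print(len(cols), cols[0].shape)
```

Each list element here is a view into a rather than a copy, so writing into cols[0] would also change a. Call .copy() on the elements if independent arrays are needed.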

Related

Sum of elements of a 2d list in Python

I have a 2D list
a = [[1,2], [3,4] ...]
I want to do something like this:
1+3 and 2+4 and store result in another array
b = [4, 6]
That is, the 0th element of the list at index 0 (which is 1) is added to the 0th element of the list at index 1 (which is 3), 2 is added to 4, and so on.
How can I do this without looping or generators? Looping over a large list is comparatively slower than the sum and zip functions.
You can use just sum and zip, as you mention. Note that zip returns a lazy iterator in Python 3 (not a generator, strictly speaking), which is memory efficient, so I am not sure why you think otherwise.
list(map(sum, (zip(*a))))
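To make the one-liner concrete, here is a small sketch on a three-row list, next to a NumPy alternative that sums down axis 0. The NumPy variant is not part of the original answer; it is shown only as an option if converting the list to an array is acceptable.

```python
import numpy as np

a = [[1, 2], [3, 4], [5, 6]]

# zip(*a) regroups the rows into columns: (1, 3, 5) and (2, 4, 6).
b_py = list(map(sum, zip(*a)))

# NumPy alternative: sum over axis 0, i.e. one total per column.
b_np = np.sum(a, axis=0)

print(b_py)  # [9, 12]
print(b_np.tolist())  # [9, 12]
```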

Subsetting an array with another array of indexes

Suppose I have a (list | np.array) a of size (N,) and a (list | np.array) l of integers between 0 and N-1.
Is there a more efficient way to write sum([a[x] for x in l])?
Four different conditions:
a is a numpy array, l is a numpy array
a is a numpy array, l is a list
a is a list, l is a numpy array
a is a list, l is a list
a is a numpy array, l is a numpy array
a is a numpy array, l is a list
For both of the above you can do a[l].sum()
a is a list, l is a numpy array
a is a list, l is a list
For these last two, your options are either to cast a to numpy and then do as above:
np.asarray(a)[l].sum()
or if you are going to use something like your list comprehension, then at least use a generator expression instead - there is no need to build a list simply to add up the values:
sum(a[x] for x in l)
If you are looking for a single solution that you can use regardless of the type, then np.asarray(a)[l].sum() (as suggested above) will work, because if the argument to np.asarray is an array anyway, then it will simply use it as-is -- but be aware that if a is a list then this will need to create an array version of a, so use of the generator expression will be more economical on memory.
import numpy as np
a_list = [10, 11, 12]
l_list = [2, 2]
a_array = np.array(a_list)
l_array = np.array(l_list)
# try all four list/array combinations
for a in a_list, a_array:
    for l in l_list, l_array:
        print(np.asarray(a)[l].sum())
gives:
24
24
24
24

Numpy: two dimensional arrays, delete the odd indexes and keep the same array format

My array looks like this:
a = ([1,2],[2,3],[4,5],[3,8])
I did the following to delete the odd indexes:
a = [v for i, v in enumerate(a) if i % 2 == 0]
but it now gives me two separate arrays instead of one two-dimensional array:
a= [array([1, 2]), array([4, 5])]
How can I keep the same format as the beginning? thank you!
That is as simple as
a[::2]
which yields the lines with even index.
Use numpy array indexing, not comprehensions:
c = a[list(range(0,len(a),2)),:]
If you define c as the output of a list comprehension, you get a list of one-dimensional numpy arrays. Using proper indexing instead keeps the result a numpy array.
Note that instead of "deleting" the odd indices, we specify what to keep: take all lines with an even index (the list(range(0,len(a),2)) part) and, for each line, take all elements (the : part).
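Both indexing forms assume a is already a 2D numpy array. A minimal sketch using the data from the question, where the tuple of lists is first converted with np.array (that conversion is an assumption about how the asker built the array). It also shows np.stack as one way to turn the comprehension's list of rows back into a 2D array.

```python
import numpy as np

# The question's data is a tuple of lists; make it a 2D array first.
a = np.array(([1, 2], [2, 3], [4, 5], [3, 8]))

kept_slice = a[::2]                           # rows 0 and 2 via slicing
kept_fancy = a[list(range(0, len(a), 2)), :]  # same rows via an index list

# If a list of 1D rows already exists, np.stack rebuilds a 2D array.
kept_stack = np.stack([v for i, v in enumerate(a) if i % 2 == 0])

print(kept_slice.shape)  # (2, 2)
```

The slice returns a view of a, while the index-list and np.stack versions return new arrays, which matters only if the result is modified afterwards.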

Find the lengths of list within numpy ndarray

I have a numpy array arr which has shape (100,)
arr.shape #(100,)
There are 100 lists within this array. The lists are strings of characters with no spaces, around length 30
'TBTBBBBBBBBBEBEBEBEBDLKDJFDFIKKKKK'
This array has a "matrix-esque" shape. How can I either (a) tell the lengths of each of the lists in this array or (b) reformat this array so that arr.shape gives me two dimensions, i.e. arr.shape gives (100,30)?
Also, it may not be that every list is of the same length (some may be 28, not 30).
How can I check for this behavior? What will numpy.shape() output in such a case?
You can use a list comprehension to determine the length of each string in the array:
>>> arr = np.array(['aasdfads', 'asfdads', 'fdfsdfaf'])
>>> [len(i) for i in arr]
[8, 7, 8]
Or take the max:
>>> max([len(i) for i in arr])
8
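The other parts of the question (checking whether all strings share one length, and getting a two-dimensional shape) could be sketched as follows. This assumes arr is an array of plain strings; the sample strings and the names lengths, same_length and chars are made up for illustration.

```python
import numpy as np

arr = np.array(["TBTB", "BEBE", "DLKD"])  # made-up strings, all of length 4

lengths = np.char.str_len(arr)            # element-wise string lengths
same_length = np.unique(lengths).size == 1

# An array of strings is 1D whatever the string lengths are.
print(arr.shape)  # (3,)

if same_length:
    # One character per cell; only rectangular when all lengths match.
    chars = np.array([list(s) for s in arr])
    print(chars.shape)  # (3, 4)
```

If the lengths differ, the shape of arr stays (number of strings,), and the character-level conversion would need padding or a different layout.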

Numpy: 2D array access with 2D array of indices

I have two arrays, one is a matrix of index pairs,
a = array([[[0,0],[1,1]],[[2,0],[2,1]]], dtype=int)
and another which is a matrix of data to access at these indices
b = array([[1,2,3],[4,5,6],[7,8,9]])
and I want to be able to use the indices of a to get the entries of b. Just doing:
>>> b[a]
does not work, as it gives one row of b for each entry in a, i.e.
array([[[[1,2,3],
         [1,2,3]],
        [[4,5,6],
         [4,5,6]]],
       [[[7,8,9],
         [1,2,3]],
        [[7,8,9],
         [4,5,6]]]])
when I would like to use the index pair in the last axis of a to give the two indices of b:
array([[1,5],[7,8]])
Is there a clean way of doing this, or do I need to reshape b and combine the columns of a in a corresponding manner?
In my actual problem a has about 5 million entries and b is 100-by-100, so I'd like to avoid for loops.
Actually, this works:
b[a[:, :, 0],a[:, :, 1]]
This gives array([[1, 5], [7, 8]]).
For this case, this also works (it returns the flattened result array([1, 5, 7, 8])):
tmp = a.reshape(-1,2)
b[tmp[:,0], tmp[:,1]]
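A short check of the two approaches on the arrays from the question. The names picked and flat are only for illustration. The flattened variant can be reshaped back to the leading shape of a if the 2D layout is needed.

```python
import numpy as np

a = np.array([[[0, 0], [1, 1]], [[2, 0], [2, 1]]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Row indices come from a[..., 0], column indices from a[..., 1].
picked = b[a[:, :, 0], a[:, :, 1]]

# Flattened variant: one (row, col) pair per line of tmp.
tmp = a.reshape(-1, 2)
flat = b[tmp[:, 0], tmp[:, 1]]

print(picked.tolist())  # [[1, 5], [7, 8]]
print(flat.tolist())    # [1, 5, 7, 8]
```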
A more general solution for when you want to use a 2D array of indices named inds, of shape (n,m) with arbitrarily large m, to access elements of another 2D array named B, of shape (n,k):
# position of each row's first element in the flattened B (one offset per row)
offset = np.arange(0, B.size, B.shape[1])
# with no axis argument, numpy.take(B, C) indexes into the flattened B and returns an array shaped like C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
You can test this with, e.g. (after choosing values for n and m):
B = 1/(np.arange(n*m).reshape(n,-1) + 1)
inds = np.random.randint(0,B.shape[1],(B.shape[0],B.shape[1]))
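A self-contained sketch that compares both approaches against an explicit loop, using an inds whose width m differs from the width k of B. The sizes, the seed, and the variable names are arbitrary choices for illustration. Note that the offsets passed to np.take here are computed from the row length of B, since np.take without an axis argument indexes into the flattened B. np.take_along_axis is included as a further option that is not mentioned in the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 4, 6, 3                      # B is (n, k), inds is (n, m)
B = 1 / (np.arange(n * k).reshape(n, k) + 1)
inds = rng.integers(0, k, size=(n, m))

# Reference result built with plain loops.
expected = np.array([[B[i, j] for j in inds[i]] for i in range(n)])

# np.take on the flattened B: add each row's starting offset to its indices.
offset = np.arange(0, B.size, B.shape[1])
via_take = np.take(B, offset[:, np.newaxis] + inds)

# Advanced indexing: a column of row numbers broadcasts against inds.
rows = np.expand_dims(np.arange(B.shape[0]), -1)
via_index = B[rows, inds]

# Built-in helper for per-row selection along an axis.
via_along = np.take_along_axis(B, inds, axis=1)

# The advanced-indexing form also works for assignment.
B_zeroed = B.copy()
B_zeroed[rows, inds] = 0.0

print(np.allclose(via_take, expected),
      np.allclose(via_index, expected),
      np.allclose(via_along, expected))
```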
