I have a numpy array arr which is shaped (100,)
arr.shape  # (100,)
There are 100 lists within this array. The lists are strings of characters with no spaces, each around 30 characters long:
'TBTBBBBBBBBBEBEBEBEBDLKDJFDFIKKKKK'
This array has a "matrix-esque" shape. How can I either (a) find the length of each of the lists in this array, or (b) reformat this array so that arr.shape has two dimensions, i.e. arr.shape gives (100, 30)?
Also, not every list may be the same length (some may be 28, not 30).
How can I check for this behavior? What will numpy.shape() output in such a case?
You can use a list comprehension to determine the length of each string in the array:
>>> import numpy as np
>>> arr = np.array(['aasdfads', 'asfdads', 'fdfsdfaf'])
>>> [len(i) for i in arr]
[8, 7, 8]
Or take the max:
>>> max([len(i) for i in arr])
8
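As a rough sketch of both parts of the question: np.char.str_len gives the lengths without a Python loop, and a 2-D array of single characters can be built when every string has the same length. The sample strings and the variable names (lengths, all_same_length, equal, chars) are made up for illustration. Note that the shape of a string array stays (n,) whatever the string lengths are.

```python
import numpy as np

# Hypothetical sample data: three strings, one shorter than the others.
arr = np.array(['aasdfads', 'asfdads', 'fdfsdfaf'])

# Vectorized string lengths, one integer per element.
lengths = np.char.str_len(arr)

# True only if every string has the same length.
all_same_length = len(set(lengths.tolist())) == 1

# When all strings are equally long, splitting each one into characters
# yields a genuine 2-D array (here 3 rows of 4 characters).
equal = np.array(['abcd', 'efgh', 'ijkl'])
chars = np.array([list(s) for s in equal])

print(arr.shape)        # shape of the string array itself
print(lengths)          # per-string lengths
print(all_same_length)  # whether the lengths agree
print(chars.shape)      # shape of the character array
```

If the lengths differ, the character-splitting step would not give a rectangular array, so checking all_same_length first seems the safer order.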
Given a list of numpy arrays of different lengths, such as the one obtained from lst = np.array_split(arr, indices), how do I get the sum of every array in the list? (I know how to do it with a list comprehension, but I was hoping there was a pure-numpy way to do it.)
I thought that this would work:
np.apply_along_axis(lambda arr: arr.sum(), axis=0, arr=lst)
But it doesn't; instead it gives me this error, which I don't understand:
ValueError: operands could not be broadcast together with shapes (0,) (12,)
NB: It's an array of sympy objects.
There's a faster way that avoids np.split and uses the reduceat method of the np.add ufunc. Build an ascending array of the positions where each sum should start with np.append([0], np.cumsum(indices)[:-1]), where indices holds the length of each segment. The leading zero marks the start of the first segment, and the [:-1] discards the last cumulative sum when the segments cover the full range of the original array (otherwise, drop the [:-1] indexing). Then call np.add.reduceat:
import numpy as np
arr = np.arange(1, 11)
indices = np.array([2, 4, 4])
# this should split like this
# [1 2 | 3 4 5 6 | 7 8 9 10]
np.add.reduceat(arr, np.append([0], np.cumsum(indices)[:-1]))
# array([ 3, 18, 34])
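To check that this matches the split-and-sum approach, here is a small sketch that computes the sums both ways on the same data. The names lengths, starts, fast, pieces and slow are illustrative only; lengths plays the role of indices in the snippet above.

```python
import numpy as np

arr = np.arange(1, 11)
lengths = np.array([2, 4, 4])  # segment lengths (called `indices` above)

# Start position of each segment: [0, 2, 6].
starts = np.append([0], np.cumsum(lengths)[:-1])

# One call sums every segment arr[starts[i]:starts[i + 1]].
fast = np.add.reduceat(arr, starts)

# Reference result: split at the same boundaries, then sum each piece.
pieces = np.split(arr, np.cumsum(lengths)[:-1])
slow = np.array([p.sum() for p in pieces])

print(fast)
print(slow)
```

One caveat: reduceat does not return 0 for an empty segment. When two consecutive start positions are equal it returns the element at that position, so zero-length segments need separate handling.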
Suppose I have a (list | np.array) a of size (N,) and a (list | np.array) l of integers between 0 and N-1.
Is there a more efficient way to write sum([a[x] for x in l])?
Four different conditions:
a is a numpy array, l is a numpy array
a is a numpy array, l is a list
a is a list, l is a numpy array
a is a list, l is a list
a is a numpy array, l is a numpy array
a is a numpy array, l is a list
For both of the above you can do a[l].sum()
a is a list, l is a numpy array
a is a list, l is a list
For these last two, your options are either to cast a to numpy and then do as above:
np.asarray(a)[l].sum()
or if you are going to use something like your list comprehension, then at least use a generator expression instead - there is no need to build a list simply to add up the values:
sum(a[x] for x in l)
If you are looking for a single solution that you can use regardless of the type, then np.asarray(a)[l].sum() (as suggested above) will work, because if the argument to np.asarray is an array anyway, then it will simply use it as-is -- but be aware that if a is a list then this will need to create an array version of a, so use of the generator expression will be more economical on memory.
import numpy as np
a_list = [10, 11, 12]
l_list = [2, 2]
a_array = np.array(a_list)
l_array = np.array(l_list)
for a in a_list, a_array:
    for l in l_list, l_array:
        print(np.asarray(a)[l].sum())
gives:
24
24
24
24
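The memory remark above can be illustrated with a short sketch: np.asarray hands back the very same object when it is given an array, but builds a new array when it is given a list. The variable names (same_object, converted, total_np, total_gen) are only for this example.

```python
import numpy as np

a_list = [10, 11, 12]
a_array = np.array(a_list)
l_list = [2, 2]

# An existing array is passed through unchanged, so nothing is copied.
same_object = np.asarray(a_array) is a_array

# A list has to be converted, which allocates a new array.
converted = np.asarray(a_list)

# Both routes give the same total for list input.
total_np = converted[l_list].sum()
total_gen = sum(a_list[x] for x in l_list)

print(same_object, total_np, total_gen)
```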
My array looks like this:
a = ([1,2],[2,3],[4,5],[3,8])
I did the following to delete the odd indexes:
a = [v for i, v in enumerate(a) if i % 2 == 0]
but it now gives me a list of separate arrays instead of one two-dimensional array:
a= [array([1, 2]), array([4, 5])]
How can I keep the same format as the beginning? thank you!
That is as simple as
a[::2]
which yields the rows with an even index.
Use numpy array indexing, not comprehensions:
c = a[list(range(0,len(a),2)),:]
If you define c as the output of a list comprehension, you get a list of one-dimensional NumPy arrays. Proper indexing instead keeps the result a single NumPy array.
Note that instead of "deleting" the odd indices, we specify what to keep: take all rows with an even index (the list(range(0,len(a),2)) part) and, for each row, take all elements (the : part).
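Both suggestions can be compared side by side. The sketch below assumes a is a 2-D NumPy array built from the values in the question; sliced and fancy are illustrative names.

```python
import numpy as np

# Assumed 2-D array built from the values in the question.
a = np.array([[1, 2], [2, 3], [4, 5], [3, 8]])

# Slicing with a step of 2 keeps rows 0 and 2.
sliced = a[::2]

# Indexing with an explicit list of row numbers selects the same rows.
fancy = a[list(range(0, len(a), 2)), :]

print(sliced)
print(sliced.shape)                   # still two-dimensional
print(np.array_equal(sliced, fancy))  # same contents
print(np.shares_memory(sliced, a))    # slice is a view on a
print(np.shares_memory(fancy, a))     # list indexing makes a copy
```

The two results hold the same values, but the slice is a view on the original data while indexing with a list makes a copy. That difference matters if the result is modified afterwards.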
I want to count easily the number of elements in a NumPy array, but I don't know a priori their dimensions. Is there a generic function that counts the number of elements in a numpy array whatever its dimension is?
Thanks
numpy.ndarray.size returns the number of elements in the array, whatever its dimensions:
>>> x = np.zeros((3, 5, 2), dtype=np.complex128)
>>> x.size
30
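A small sketch of the same idea for a few other inputs. The size attribute equals the product of the shape, and the np.size function also accepts array-likes such as nested lists. The variable names are made up for this example.

```python
import numpy as np

x = np.zeros((3, 5, 2), dtype=np.complex128)

n_attr = x.size                        # attribute on the array
n_prod = int(np.prod(x.shape))         # product of the dimensions
n_func = np.size([[1, 2, 3], [4, 5, 6]])  # function form, takes a nested list
n_scalar = np.array(7).size            # a zero-dimensional array has one element

print(n_attr, n_prod, n_func, n_scalar)
```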
Suppose I have a 2D array, for simplicity a=np.array([[1,2],[3,4]]). I want to convert it to a list of arrays, so that the result would be:
b=[np.array([1,2]), np.array([3,4])]
I found the np.ndarray.tolist() method, but it converts an N-D array into a nested list. I could do this in a for loop (using append), but that is not efficient or elegant.
In my real example I am working with 2D arrays of approximately 10000 x 50 elements, and I want a list that contains 50 one-dimensional arrays, each of shape (10000,).
How about using list:
a=np.array([[1,2],[3,4]])
b = list(a)
Why don't you use list comprehension as follows without using any append:
a=np.array([[1,2],[3,4]])
b = [i for i in a]
print(b)
Output
[array([1, 2]), array([3, 4])]
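Both answers iterate over the first axis, so they return one array per row. For the 10000 x 50 case in the question, where 50 arrays of shape (10000,) are wanted, transposing first is one possible way to get one array per column. The sketch below uses a small 4 x 3 stand-in array, and rows and cols are illustrative names.

```python
import numpy as np

# Small stand-in for the 10000 x 50 array in the question.
a = np.arange(12).reshape(4, 3)

# Iterating over an array walks its first axis: one array per row.
rows = list(a)

# Transposing first gives one array per column instead.
cols = list(a.T)

print(len(rows), rows[0].shape)
print(len(cols), cols[0].shape)
print(cols[0])
```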