Get ndarray from pandas column when cell elements are list - python

I have a pandas data frame that looks like this:
>>> df = pd.DataFrame({'a': list(range(10))})
>>> df['a'] = df.a.apply(lambda x: x*np.array([1,2,3]))
>>> df.head()
a
0 [0, 0, 0]
1 [1, 2, 3]
2 [2, 4, 6]
3 [3, 6, 9]
4 [4, 8, 12]
I would like to get column a from the df as a 2-D ndarray, but when I try I get an object array of arrays:
>>> df.a.values
array([array([0, 0, 0]), array([1, 2, 3]), array([2, 4, 6]),
array([3, 6, 9]), array([ 4, 8, 12]), array([ 5, 10, 15]),
array([ 6, 12, 18]), array([ 7, 14, 21]), array([ 8, 16, 24]),
array([ 9, 18, 27])], dtype=object)
How can I get the returned output to be
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
# ...
])

Using pandas,
df.a.apply(pd.Series).values
Using numpy,
np.vstack(df.a.values)
You get
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
[ 5, 10, 15],
[ 6, 12, 18],
[ 7, 14, 21],
[ 8, 16, 24],
[ 9, 18, 27]])

Another option is to convert the column to a list first:
np.array(df['a'].tolist())
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9],
[ 4, 8, 12],
[ 5, 10, 15],
[ 6, 12, 18],
[ 7, 14, 21],
[ 8, 16, 24],
[ 9, 18, 27]], dtype=int64)
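To compare the suggestions above side by side, here is a minimal sketch that rebuilds the frame from the question and checks that the conversions agree. The variable names (via_vstack, via_stack, via_tolist) are only illustrative, and np.stack is an extra option not mentioned in the answers; it assumes every cell holds an array of the same length.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: each cell holds a length-3 array.
df = pd.DataFrame({'a': list(range(10))})
df['a'] = df['a'].apply(lambda x: x * np.array([1, 2, 3]))

via_vstack = np.vstack(df['a'].values)    # stack the cell arrays as rows
via_stack = np.stack(df['a'].tolist())    # same idea, stacking along a new axis 0
via_tolist = np.array(df['a'].tolist())   # let NumPy infer the 2-D shape

print(via_vstack.shape)                         # (10, 3)
print(np.array_equal(via_vstack, via_stack))    # True
print(np.array_equal(via_vstack, via_tolist))   # True
```

If the cells had different lengths, these calls would either raise or fall back to an object array, so it is worth checking the resulting shape.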

Related

Convert a 2D array into 3D array repeating existing values

I have an array of shape (360,480) containing values from 1 to 11,
Array([[ 1, 1, 1, ..., 1, 1, 1],
[ 1, 1, 1, ..., 1, 1, 1],
[ 1, 1, 1, ..., 1, 1, 1],
...,
[ 4, 4, 4, ..., 11, 11, 11],
[ 4, 4, 4, ..., 11, 11, 11],
[ 4, 4, 4, ..., 11, 11, 11]])
How could I reshape this array into an array of shape (360,480,3) in a way that
np.all(array[:,:,0]==array[:,:,1])
and
np.all(array[:,:,0]==array[:,:,2])
are both True?
The expected outcome should be
array([[[ 1, 1, 1],
[ 1, 1, 1],
[ 1, 1, 1],
...,
[ 1, 1, 1],
[ 1, 1, 1],
[ 1, 1, 1]],
[[ 4, 4, 4],
[ 4, 4, 4],
[ 4, 4, 4],
...,
[11, 11, 11],
[11, 11, 11],
[11, 11, 11]],
[[ 4, 4, 4],
[ 4, 4, 4],
[ 4, 4, 4],
...,
[11, 11, 11],
[11, 11, 11],
[11, 11, 11]]])
You could use numpy.repeat function
https://numpy.org/doc/stable/reference/generated/numpy.repeat.html
array3d = np.repeat(array2d[:, :, None], repeats=3, axis=2)
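A quick way to convince yourself this works is to try it on a small stand-in array. The sketch below assumes a (3, 4) array instead of the (360, 480) one, and also shows np.broadcast_to as a possible alternative when a read-only view without copying is enough; view3d is just an illustrative name.

```python
import numpy as np

array2d = np.arange(12).reshape(3, 4)   # small stand-in for the (360, 480) array

# Add a trailing axis of length 1, then repeat it three times (this copies the data).
array3d = np.repeat(array2d[:, :, None], repeats=3, axis=2)

# Read-only view with the same values and no copy.
view3d = np.broadcast_to(array2d[:, :, None], array2d.shape + (3,))

print(array3d.shape)                                    # (3, 4, 3)
print(np.all(array3d[:, :, 0] == array3d[:, :, 1]))     # True
print(np.all(array3d[:, :, 0] == array3d[:, :, 2]))     # True
```

The broadcast view cannot be written to, so np.repeat is probably the safer choice if the result will be modified later.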

Adding the previous n rows as columns to a NumPy array

I want to add the previous n rows as columns to a NumPy array.
For example, if n=2, the array below...
[[ 1, 2]
[ 3, 4]
[ 5, 6]
[ 7, 8]
[ 9, 10]
[11, 12]]
...should be turned into the following one:
[[ 1, 2, 0, 0, 0, 0]
[ 3, 4, 1, 2, 0, 0]
[ 5, 6, 3, 4, 1, 2]
[ 7, 8, 5, 6, 3, 4]
[ 9, 10, 7, 8, 5, 6]
[11, 12, 9, 10, 7, 8]]
Any ideas how I could do that without going over the entire array in a for loop?
Here's a vectorized approach -
def vectorized_app(a,n):
    M,N = a.shape
    idx = np.arange(a.shape[0])[:,None] - np.arange(n+1)
    out = a[idx.ravel(),:].reshape(-1,N*(n+1))
    out[N*(np.arange(1,M+1))[:,None] <= np.arange(N*(n+1))] = 0
    return out
Sample run -
In [255]: a
Out[255]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18]])
In [256]: vectorized_app(a,3)
Out[256]:
array([[ 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 4, 5, 6, 1, 2, 3, 0, 0, 0, 0, 0, 0],
[ 7, 8, 9, 4, 5, 6, 1, 2, 3, 0, 0, 0],
[10, 11, 12, 7, 8, 9, 4, 5, 6, 1, 2, 3],
[13, 14, 15, 10, 11, 12, 7, 8, 9, 4, 5, 6],
[16, 17, 18, 13, 14, 15, 10, 11, 12, 7, 8, 9]])
Runtime test -
I am timing @Psidom's loop-comprehension based method and the vectorized method listed in this post on a version of the question's sample scaled up 100x in size:
In [246]: a = np.random.randint(0,9,(600,200))
In [247]: n = 200
In [248]: %timeit np.column_stack(mypad(a, i) for i in range(n + 1))
1 loops, best of 3: 748 ms per loop
In [249]: %timeit vectorized_app(a,n)
1 loops, best of 3: 224 ms per loop
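Here is a way to pad zeros at the beginning of the array and then column-stack the shifted copies:
import numpy as np
n = 2
arr = np.arange(1, 13).reshape(6, 2)  # the sample array from the question
def mypad(myArr, n):
    # Shift the rows down by n, filling the top with zeros.
    if n == 0:
        return myArr
    else:
        return np.pad(myArr, ((n,0), (0,0)), mode = "constant")[:-n]
# Pass a list, since newer NumPy versions reject a bare generator here.
np.column_stack([mypad(arr, i) for i in range(n + 1)])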
# array([[ 1, 2, 0, 0, 0, 0],
# [ 3, 4, 1, 2, 0, 0],
# [ 5, 6, 3, 4, 1, 2],
# [ 7, 8, 5, 6, 3, 4],
# [ 9, 10, 7, 8, 5, 6],
# [11, 12, 9, 10, 7, 8]])
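The padding idea can also be written so that the array is padded only once and each lag is a plain slice of the padded copy. This is a sketch under that assumption, not the code from either answer; lagged_columns is a hypothetical helper name.

```python
import numpy as np

def lagged_columns(a, n):
    """Append the previous n rows of `a` as extra columns, zero-filled at the top."""
    rows = a.shape[0]
    # Prepend n rows of zeros so every shifted slice has exactly `rows` rows.
    padded = np.pad(a, ((n, 0), (0, 0)), mode="constant")
    # Lag k is the slice of the padded array that starts at row n - k.
    return np.hstack([padded[n - k : n - k + rows] for k in range(n + 1)])

arr = np.arange(1, 13).reshape(6, 2)   # the sample array from the question
result = lagged_columns(arr, 2)
print(result)
```

It still loops over the n + 1 lags in Python, so for large n the fully vectorized indexing approach may be faster.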

From argwhere to where?

Is there a fast way of getting the output of argwhere in the format of the output of where?
Let me show you what I'm doing with a bit of code:
In [123]: filter = np.where(scores[:,:,:,4,:] > 21000)
In [124]: filter
Out[124]:
(array([ 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 23, 23, 23, 23, 23]),
array([13, 13, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5]),
array([0, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2]),
array([44, 44, 0, 1, 2, 3, 6, 8, 12, 14, 22, 31, 58, 76, 82, 41]))
In [125]: filter2 = np.argwhere(scores[:,:,:,4,:] > 21000)
In [126]: filter2
Out[126]:
array([[ 2, 13, 0, 44],
[ 2, 13, 1, 44],
[ 4, 4, 3, 0],
[ 4, 4, 3, 1],
[ 4, 4, 3, 2],
[ 4, 4, 3, 3],
[ 4, 4, 3, 6],
[ 4, 4, 3, 8],
[ 4, 4, 3, 12],
[ 4, 4, 3, 14],
[ 4, 4, 3, 22],
[23, 4, 2, 31],
[23, 4, 2, 58],
[23, 4, 2, 76],
[23, 4, 2, 82],
[23, 5, 2, 41]])
In [150]: scores[:,:,:,4,:][filter]
Out[150]:
array([ 21344., 21344., 24672., 24672., 24672., 24672., 25232.,
25232., 25232., 25232., 24672., 21152., 21152., 21152.,
21152., 21344.], dtype=float16)
In [129]: filter2[np.argsort(scores[:,:,:,4,:][filter])]
Out[129]:
array([[23, 4, 2, 31],
[23, 4, 2, 58],
[23, 4, 2, 76],
[23, 4, 2, 82],
[ 2, 13, 0, 44],
[ 2, 13, 1, 44],
[23, 5, 2, 41],
[ 4, 4, 3, 0],
[ 4, 4, 3, 1],
[ 4, 4, 3, 2],
[ 4, 4, 3, 3],
[ 4, 4, 3, 22],
[ 4, 4, 3, 6],
[ 4, 4, 3, 8],
[ 4, 4, 3, 12],
[ 4, 4, 3, 14]])
Out[129] is my desired output, so my code works, but I'm trying to make it as fast as possible. Should I get filter2 with np.array(filter).transpose()? Is there something even better?
Edit, trying to put it more clearly: I want a list of indices ordered by the value they return when applied to an array. To do that, I need both the output of np.where and np.argwhere, and I'm wondering what the fastest way is to switch from one output to the other, or if there's another way of getting my result.
Look at the code for argwhere:
return transpose(asanyarray(a).nonzero())
while where docs say:
where(condition, [x, y])
If only condition is given, return condition.nonzero().
In effect, both use a.nonzero(). One uses it as is, the other transposes it.
In [933]: x=np.zeros((2,3),int)
In [934]: x[[0,1,0],[0,1,2]]=1
In [935]: x
Out[935]:
array([[1, 0, 1],
[0, 1, 0]])
In [936]: x.nonzero()
Out[936]: (array([0, 0, 1], dtype=int32), array([0, 2, 1], dtype=int32))
In [937]: np.where(x) # same as nonzero()
Out[937]: (array([0, 0, 1], dtype=int32), array([0, 2, 1], dtype=int32))
In [938]: np.argwhere(x)
Out[938]:
array([[0, 0],
[0, 2],
[1, 1]], dtype=int32)
In [939]: np.argwhere(x).T
Out[939]:
array([[0, 0, 1],
[0, 2, 1]], dtype=int32)
argwhere().T holds the same values as where, except that it is a 2-D array rather than a tuple of arrays.
np.transpose(filter) and np.array(filter).T look equally good. For a large array the time spent in nonzero is much larger than the time spent on these transformations.
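Putting this together, one possible way to get the value-ordered index list is to call nonzero once and derive both forms from it. The sketch below uses small random stand-in data rather than the scores array from the question; the names mask, idx_tuple, idx_rows and sorted_rows are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((4, 5, 6))        # hypothetical stand-in data
mask = scores > 0.9

idx_tuple = np.nonzero(mask)          # same result as np.where(mask)
idx_rows = np.transpose(idx_tuple)    # same result as np.argwhere(mask)

# Order the index rows by the values they select.
order = np.argsort(scores[idx_tuple])
sorted_rows = idx_rows[order]
print(sorted_rows)
```

This way the expensive search over the array happens only once, and the transpose is a cheap operation on the much smaller index arrays.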

Numpy index, get bands of width 2

I am wondering if there is a way to index/slice a numpy array such that one gets every other band of 2 elements. In other words, given:
test = np.array([[1,2,3,4,5,6,7,8],[9,10,11,12,13,14,15,16]])
I would like to get the array:
[[1, 2, 5, 6],
[9, 10, 13, 14]]
Thoughts on how this can be accomplished with slicing/indexing?
Not that difficult with a few smart reshapes :)
test.reshape((4, 4))[:, :2].reshape((2, 4))
Given:
>>> test
array([[ 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16]])
You can do:
>>> test.reshape(-1,2)[::2].reshape(-1,4)
array([[ 1, 2, 5, 6],
[ 9, 10, 13, 14]])
Which works even for different shapes of initial arrays:
>>> test2
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> test2.reshape(-1,2)[::2].reshape(-1,4)
array([[ 1, 2, 5, 6],
[ 9, 10, 13, 14]])
>>> test3
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
>>> test3.reshape(-1,2)[::2].reshape(-1,4)
array([[ 1, 2, 5, 6],
[ 9, 10, 13, 14]])
How it works:
1. Reshape into two columns by however many rows:
>>> test.reshape(-1,2)
array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10],
[11, 12],
[13, 14],
[15, 16]])
2. Take every second row by slicing with a step of 2:
>>> test.reshape(-1,2)[::2]
array([[ 1, 2],
[ 5, 6],
[ 9, 10],
[13, 14]])
3. Set the shape you want of 4 columns, however many rows:
>>> test.reshape(-1,2)[::2].reshape(-1,4)
array([[ 1, 2, 5, 6],
[ 9, 10, 13, 14]])
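If the row structure of the input should be preserved instead of being re-derived from a flat layout, a boolean column mask is another option. This is a sketch of a different technique from the reshape answers above; keep and bands are illustrative names.

```python
import numpy as np

test = np.arange(1, 17).reshape(2, 8)

# Keep columns whose position within each group of 4 is 0 or 1.
keep = np.arange(test.shape[1]) % 4 < 2
bands = test[:, keep]
print(bands)
# [[ 1  2  5  6]
#  [ 9 10 13 14]]
```

Unlike the reshape chain, this does not require the number of columns to be a multiple of 4, but boolean indexing returns a copy rather than a view.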

remove a specific column in numpy

>>> arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> arr
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
I am deleting the 3rd column like this:
>>> np.hstack(((np.delete(arr, np.s_[2:], 1)),(np.delete(arr, np.s_[:3],1))))
array([[ 1, 2, 4],
[ 5, 6, 8],
[ 9, 10, 12]])
Is there a better way?
Please consider this to be a novice question.
If you ever want to delete more than one column, just pass the indices of the columns you want deleted as a list, like this:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.delete(a, [1,3], axis=1)
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
>>> import numpy as np
>>> arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> np.delete(arr, 2, axis=1)
array([[ 1, 2, 4],
[ 5, 6, 8],
[ 9, 10, 12]])
Something like this:
In [7]: x = range(16)
In [8]: x = np.reshape(x, (4, 4))
In [9]: x
Out[9]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [10]: np.delete(x, 1, 1)
Out[10]:
array([[ 0, 2, 3],
[ 4, 6, 7],
[ 8, 10, 11],
[12, 14, 15]])
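For comparison, the same column removal can be sketched with a boolean mask, which is roughly what happens conceptually when np.delete is given an index. The names keep and without_third are illustrative; note that both approaches return a new array and leave the original untouched.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Mark every column to keep, then switch off the 3rd column (index 2).
keep = np.ones(arr.shape[1], dtype=bool)
keep[2] = False
without_third = arr[:, keep]

print(without_third)
# [[ 1  2  4]
#  [ 5  6  8]
#  [ 9 10 12]]
print(arr.shape)   # (3, 4), the original is unchanged
```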
