Selecting DataFrame values based on column of indices in list

I wrote some code that gets values from df based on a list of indices stored in another column:
import numpy as np
import pandas as pd
d = {'myvalues': [11, 13, 0, -1, 10, 14], 'neighbours': [[1,2],[0,2,3],[0,1,3],[1,2,4],[3,5],[4]]}
df = pd.DataFrame(data=d)
df['neighboring_idxs'] = df['neighbours']+pd.Series(([[x] for x in df.index.values]))
df['neighboring_myvalues'] = df.apply(lambda row: df.myvalues.values[row.neighboring_idxs], axis=1)
Result is:
myvalues neighbours neighboring_idxs neighboring_myvalues
0 11 [1, 2] [1, 2, 0] [13, 0, 11]
1 13 [0, 2, 3] [0, 2, 3, 1] [11, 0, -1, 13]
2 0 [0, 1, 3] [0, 1, 3, 2] [11, 13, -1, 0]
3 -1 [1, 2, 4] [1, 2, 4, 3] [13, 0, 10, -1]
4 10 [3, 5] [3, 5, 4] [-1, 14, 10]
5 14 [4] [4, 5] [10, 14]
However, on a large dataset, using apply is very time-consuming. Is there a smarter way to compute the same df['neighboring_myvalues'] without using apply?

I don't know if it's faster, but you can try exploding your lists of indices:
df['neighboring_myvalues'] = (
    df.explode('neighboring_idxs').reset_index()
      .assign(vals=lambda x: df.loc[x['neighboring_idxs'], 'myvalues'].tolist())
      .groupby('index')['vals'].agg(list)
)
Output:
>>> df
myvalues neighbours neighboring_idxs neighboring_myvalues
0 11 [1, 2] [1, 2, 0] [13, 0, 11]
1 13 [0, 2, 3] [0, 2, 3, 1] [11, 0, -1, 13]
2 0 [0, 1, 3] [0, 1, 3, 2] [11, 13, -1, 0]
3 -1 [1, 2, 4] [1, 2, 4, 3] [13, 0, 10, -1]
4 10 [3, 5] [3, 5, 4] [-1, 14, 10]
5 14 [4] [4, 5] [10, 14]
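Another apply-free option is to flatten all index lists into one array, look the values up with a single NumPy fancy-indexing call, and split the result back by row length. The following is only a sketch: it assumes the default RangeIndex (so index labels equal positions), and the names idx_lists, flat_idx and split_points are illustrative, not from the original post. It has not been benchmarked here, so timings should be checked on real data.

```python
import numpy as np
import pandas as pd

d = {'myvalues': [11, 13, 0, -1, 10, 14],
     'neighbours': [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 4], [3, 5], [4]]}
df = pd.DataFrame(data=d)

# Each row's neighbours plus the row itself (assumes a default RangeIndex).
idx_lists = [nb + [i] for i, nb in zip(df.index, df['neighbours'])]

# Flatten every index list into one 1-D integer array.
flat_idx = np.concatenate(idx_lists)

# One vectorised lookup instead of one lookup per row.
flat_vals = df['myvalues'].to_numpy()[flat_idx]

# Cut the flat result back into per-row chunks.
lengths = [len(idxs) for idxs in idx_lists]
split_points = np.cumsum(lengths)[:-1]
chunks = np.split(flat_vals, split_points)

df['neighboring_idxs'] = pd.Series(idx_lists, index=df.index)
df['neighboring_myvalues'] = pd.Series([c.tolist() for c in chunks], index=df.index)
print(df)
```

The per-row Python work is reduced to building and splitting lists; the value lookup itself happens once in NumPy.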

Related

How to recursively extract values from a pandas DataFrame?

I have the following pandas DataFrame:
df = pd.DataFrame([
[3, 2, 5, 2],
[8, 5, 4, 2],
[9, 0, 8, 6],
[9, 2, 7, 1],
[1, 9, 2, 3],
[8, 1, 1, 6],
[8, 8, 0, 0],
[0, 1, 3, 0],
[2, 4, 5, 3],
[4, 0, 9, 7]
])
I am trying to write a recursive function that extracts all possible paths up to 3 iterations deep and saves them in a list. I have made several attempts, but have no results to post.
Desired Output:
[
[0, 3, 9, 4],
[0, 3, 9, 0],
[0, 3, 9, 9],
[0, 3, 9, 7],
[0, 3, 2, 9],
[0, 3, 2, 0],
...
]
Represented as a tree, this is what it looks like: [tree diagram]

Since both the rows and the columns of your dataframe have numeric labels, it's faster to convert the frame to a 2-D NumPy array. Try this:
arr = df.to_numpy()
staging = [[0]]
result = []
while len(staging) > 0:
    s = staging.pop(0)
    if len(s) == 4:
        result.append(s)
    else:
        i = s[-1]
        for j in range(4):
            staging.append(s + [arr[i, j]])
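Since the question asked for a recursive function, the same traversal can also be sketched recursively. This is one possible formulation, not the answerer's code; the function name extract_paths and its parameters are made up for illustration. It assumes every cell value is a valid row position, as in the sample data.

```python
import pandas as pd

df = pd.DataFrame([
    [3, 2, 5, 2],
    [8, 5, 4, 2],
    [9, 0, 8, 6],
    [9, 2, 7, 1],
    [1, 9, 2, 3],
    [8, 1, 1, 6],
    [8, 8, 0, 0],
    [0, 1, 3, 0],
    [2, 4, 5, 3],
    [4, 0, 9, 7],
])


def extract_paths(arr, path, depth):
    """Return every path that extends `path` by `depth` more steps."""
    if depth == 0:
        return [path]
    paths = []
    # The last node in the path selects the row whose values are the next nodes.
    for value in arr[path[-1]]:
        paths.extend(extract_paths(arr, path + [int(value)], depth - 1))
    return paths


arr = df.to_numpy()
paths = extract_paths(arr, [0], 3)
print(paths[:6])
```

With 4 columns and 3 steps this yields 4**3 = 64 paths, in the same order as the queue-based loop.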

Element-wise multiplication of a series of two lists from separate Pandas Dataframe Series in Python

I have a dataframe where there are two series, and each contains a number of lists. I would like to perform element-wise multiplication of each list in 'List A' with the corresponding list in 'List B'.
df = pd.DataFrame({'ref': ['A', 'B', 'C', 'D'],
'List A': [ [0,1,2], [2,3,4], [3,4,5], [4,5,6] ],
'List B': [ [0,1,2], [2,3,4], [3,4,5], [4,5,6] ] })
df['New'] = df.apply(lambda x: (a*b for a,b in zip(x['List A'], x['List B'])) )
The aim is to get the following output:
print(df['New'])
0 [0, 1, 4]
1 [4, 9, 16]
2 [9, 16, 25]
3 [16, 25, 36]
Name: New, dtype: object
However I am getting the following error:
KeyError: ('List A', 'occurred at index ref')

Your code is almost there. Mostly, you need to pass axis=1 to apply:
df["new"] = df.apply(lambda x: list(a*b for a,b in zip(x['List A'], x['List B'])), axis=1)
print(df)
The output is:
ref List A List B new
0 A [0, 1, 2] [0, 1, 2] [0, 1, 4]
1 B [2, 3, 4] [2, 3, 4] [4, 9, 16]
2 C [3, 4, 5] [3, 4, 5] [9, 16, 25]
3 D [4, 5, 6] [4, 5, 6] [16, 25, 36]

You can use NumPy:
In [50]: df
Out[50]:
ref List A List B
0 A [0, 1, 2] [0, 1, 2]
1 B [2, 3, 4] [2, 3, 4]
2 C [3, 4, 5] [3, 4, 5]
3 D [4, 5, 6] [4, 5, 6]
In [51]: df["New"] = np.multiply(np.array(df["List A"].tolist()), np.array(df["List B"].tolist())).tolist()
In [52]: df
Out[52]:
ref List A List B New
0 A [0, 1, 2] [0, 1, 2] [0, 1, 4]
1 B [2, 3, 4] [2, 3, 4] [4, 9, 16]
2 C [3, 4, 5] [3, 4, 5] [9, 16, 25]
3 D [4, 5, 6] [4, 5, 6] [16, 25, 36]
You can also use the operator module:
In [63]: df
Out[63]:
ref List A List B
0 A [0, 1, 2] [0, 1, 2]
1 B [2, 3, 4] [2, 3, 4]
2 C [3, 4, 5] [3, 4, 5]
3 D [4, 5, 6] [4, 5, 6]
In [64]: import operator
In [65]: df["New"] = df.apply(lambda x:list(map(operator.mul, x["List A"], x["List B"])), axis=1)
In [66]: df
Out[66]:
ref List A List B New
0 A [0, 1, 2] [0, 1, 2] [0, 1, 4]
1 B [2, 3, 4] [2, 3, 4] [4, 9, 16]
2 C [3, 4, 5] [3, 4, 5] [9, 16, 25]
3 D [4, 5, 6] [4, 5, 6] [16, 25, 36]
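The apply-based versions call a Python function once per row, and the NumPy version needs every list to have the same length. A plain list comprehension over the two columns is a possible middle ground. The sketch below reuses the question's data; wrapping the result in pd.Series is just a cautious way to make sure each inner list is stored as one cell.

```python
import pandas as pd

df = pd.DataFrame({'ref': ['A', 'B', 'C', 'D'],
                   'List A': [[0, 1, 2], [2, 3, 4], [3, 4, 5], [4, 5, 6]],
                   'List B': [[0, 1, 2], [2, 3, 4], [3, 4, 5], [4, 5, 6]]})

# Pair up the two columns row by row, then pair up the elements of each row.
products = [[a * b for a, b in zip(list_a, list_b)]
            for list_a, list_b in zip(df['List A'], df['List B'])]

df['New'] = pd.Series(products, index=df.index)
print(df['New'])
```

Because zip stops at the shorter input, rows whose two lists differ in length are silently truncated rather than raising an error.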

Columns of each row in 2D Numpy array do not shuffle [duplicate]

Suppose I have a matrix A with some arbitrary values:
array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
And a matrix B which contains indices of elements in A:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
How do I select values from A pointed by B, i.e.:
A[B] = [[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]]

EDIT: np.take_along_axis is a built-in function for this use case, available since NumPy 1.15. See @hpaulj's answer below for how to use it.
You can use NumPy's advanced indexing -
A[np.arange(A.shape[0])[:,None],B]
One can also use linear indexing -
m,n = A.shape
out = np.take(A,B + n*np.arange(m)[:,None])
Sample run -
In [40]: A
Out[40]:
array([[2, 4, 5, 3],
[1, 6, 8, 9],
[8, 7, 0, 2]])
In [41]: B
Out[41]:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
In [42]: A[np.arange(A.shape[0])[:,None],B]
Out[42]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
In [43]: m,n = A.shape
In [44]: np.take(A,B + n*np.arange(m)[:,None])
Out[44]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
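To see why both forms work, it can help to look at the intermediate arrays. In the sketch below, the names rows and flat are illustrative only: rows is a column vector of row numbers that broadcasts against B, and flat is B shifted by each row's starting offset in the flattened array.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
              [0, 3, 2, 1],
              [3, 2, 1, 0]])

# Column vector of row numbers, shape (3, 1); it broadcasts against B's shape (3, 4),
# so element (i, j) of the result is A[i, B[i, j]].
rows = np.arange(A.shape[0])[:, None]
picked = A[rows, B]

# Linear indexing: row i starts at offset i * n in the flattened array.
m, n = A.shape
flat = B + n * rows
picked_flat = A.ravel()[flat]

print(flat)
print(picked)
```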

More recent versions have added a take_along_axis function that does the job:
A = np.array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
np.take_along_axis(A, B, 1)
Out[]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
There's also a put_along_axis.
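As a small illustration of the pair, here is a sketch that reads each row's maximum with take_along_axis and then overwrites it with put_along_axis. Zeroing the maximum is an arbitrary example chosen here, not something from the question.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])

# Position of the largest value in each row, kept 2-D so it lines up with A's axes.
idx = np.argmax(A, axis=1)[:, None]

# Read the values at those positions.
row_max = np.take_along_axis(A, idx, axis=1)

# Write 0 at the same positions; put_along_axis modifies its first argument in place,
# so work on a copy to keep A intact.
out = A.copy()
np.put_along_axis(out, idx, 0, axis=1)

print(row_max)
print(out)
```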

I know this is an old question, but another way of doing it using indices is:
A[np.indices(B.shape)[0], B]
output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]

The following solution uses for loops:
outlist = []
for i in range(len(B)):
    lst = []
    for j in range(len(B[i])):
        lst.append(A[i][B[i][j]])
    outlist.append(lst)
outarray = np.asarray(outlist)
print(outarray)
The above can also be written more succinctly as a list comprehension:
outlist = [ [A[i][B[i][j]] for j in range(len(B[i]))]
            for i in range(len(B)) ]
outarray = np.asarray(outlist)
print(outarray)
Output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]

Dataframe column of arrays to numpy array

I have a dataframe df with a single column that contains arrays of length 3. Now, I want to transform this column to a numpy array of the correct shape. However, applying np.reshape does not work. How can I do this?
Here is a brief example:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['col'])
for i in range(10):
    df.loc[i,'col'] = np.zeros(3)
arr = np.array(df['col'])
np.reshape(arr, (10,3)) # This does not work

Here are two approaches using np.vstack and np.concatenate -
np.vstack(df.col)
np.concatenate(df.col).reshape(df.shape[0],-1) # for performance
For best performance, we could use the underlying data with df.col.values instead.
Sample run -
In [116]: df
Out[116]:
col
0 [7, 5, 2]
1 [1, 1, 3]
2 [6, 1, 4]
3 [7, 0, 0]
4 [8, 8, 0]
5 [7, 8, 0]
6 [0, 5, 8]
7 [8, 3, 1]
8 [6, 6, 8]
9 [8, 2, 3]
In [117]: np.vstack(df.col)
Out[117]:
array([[7, 5, 2],
[1, 1, 3],
[6, 1, 4],
[7, 0, 0],
[8, 8, 0],
[7, 8, 0],
[0, 5, 8],
[8, 3, 1],
[6, 6, 8],
[8, 2, 3]])
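The reason np.reshape fails in the question is that the column converts to a 1-D object array whose 10 elements are themselves arrays, so there are only 10 items to reshape, not 30. A small sketch of that, building the column from a Series of arrays as a stand-in for the question's loop, with np.stack as one more way to combine equal-length rows:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: one column holding ten length-3 arrays.
df = pd.Series([np.zeros(3) for _ in range(10)]).to_frame('col')

arr = df['col'].to_numpy()
print(arr.dtype, arr.shape)   # object dtype, one element per row

# Stacking the row arrays produces a regular 2-D numeric array.
stacked = np.stack(df['col'].tolist())
print(stacked.dtype, stacked.shape)
```

Like np.vstack, np.stack needs every row array to have the same length.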

subtraction operation on multidimensional arrays

I have a list:
l = [[1, 2, 8], [8, 2, 7], [7, 2, 5]]
I want the first element of each row to be zero, and every other element replaced by its difference from the element to its left.
Explanation:
1 2 8
8 2 7
7 2 5
subtraction as,
0 1 6
0 -6 5
0 -5 3
I want output as :
l = [[0, 1, 6], [0, -6, 5], [0, -5, 3]]
What is the fastest way to perform this operation on a large list?
I am actually using NumPy, but I wrote it as a list here so that it is easier to understand.
My NumPy array is:
l = [[1 2 8] [8 2 7] [7 2 5]]

>>> l = np.array([[1, 2, 8], [8, 2, 7], [7, 2, 5]])
>>> l[:, 1:] -= l[:, :-1]
>>> l[:, 0] = 0
>>> l
array([[ 0, 1, 6],
[ 0, -6, 5],
[ 0, -5, 3]])

Using numpy.insert and numpy.diff:
>>> import numpy as np
>>> a = np.array([[1, 2, 8], [8, 2, 7], [7, 2, 5]])
>>> np.insert(np.diff(a), 0, 0, axis=1)
array([[ 0, 1, 6],
[ 0, -6, 5],
[ 0, -5, 3]])
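A possible variant, assuming NumPy 1.16 or later where np.diff accepts a prepend argument: prepending each row's own first element makes the first difference zero, so no separate insert step is needed.

```python
import numpy as np

a = np.array([[1, 2, 8], [8, 2, 7], [7, 2, 5]])

# a[:, :1] keeps the first column 2-D; prepending it makes each row's first difference 0.
result = np.diff(a, axis=1, prepend=a[:, :1])
print(result)
```

Unlike the in-place slice subtraction, this leaves the original array unchanged.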

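Without NumPy, you can get away with this:
from functools import reduce  # reduce is not a builtin in Python 3

l = [[1, 2, 8], [8, 2, 7], [7, 2, 5]]

def minus(rest, val):
    # turn the last stored element into a difference, then store val
    rest[-1] -= val
    rest.append(val)
    return rest

def myReduce(l):
    # walk the row from right to left, building the differences in reverse
    l2 = reduce(minus, l[-2::-1], [l[-1]])
    l2.reverse()
    l2[0] = 0
    return l2

l2 = list(map(myReduce, l))
print(l2)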
I guess it's quite straightforward and easy to understand.
