Conditional reduce - python

I would like to reduce a variable number of elements (or slices) of an array multiple times and put the results into a new array. It is something like a masked np.apply_along_axis, but staying in NumPy.
For example, to reduce by mean:
to_reduce = np.array([
[0, 1, 1, 0, 0],
[0, 0, 0, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 1, 1, 0]]).astype(bool)  # np.bool8 is deprecated and was removed in NumPy 2.0
arr = np.array([
[1.0, 2.0, 3.0],
[1.0, 2.0, 4.0],
[2.0, 2.0, 3.0],
[2.0, 2.0, 4.0],
[1.0, 0.0, 3.0]])
I want:
np.array([
[1.5, 2.0, 3.5],
[1.5, 1.0, 3.5],
[1.33333, 1.33333, 3.0],
[1.5, 2.0, 3.5]])
The slow way would be:
out = np.empty((4, 3))
for j, mask in enumerate(to_reduce):
    out[j] = np.mean(arr[mask], axis=0)

Here's one simple and efficient way, using matrix multiplication:
In [56]: to_reduce.dot(arr)/to_reduce.sum(1)[:,None]
Out[56]:
array([[1.5 , 2. , 3.5 ],
[1.5 , 1. , 3.5 ],
[1.33333333, 1.33333333, 3. ],
[1.5 , 2. , 3.5 ]])
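To see why the matrix product above works, it may help to split it into its two steps and compare it with the loop from the question. This is only a sketch: the names sums, counts and looped are introduced here for illustration, and the NaN handling for empty masks is an added assumption that the original answer does not cover.

```python
import numpy as np

to_reduce = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0]], dtype=bool)
arr = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [2.0, 2.0, 3.0],
    [2.0, 2.0, 4.0],
    [1.0, 0.0, 3.0]])

# Each boolean row acts as 0/1 weights, so mask @ arr sums the selected rows.
sums = to_reduce.astype(arr.dtype) @ arr        # shape (4, 3)
counts = to_reduce.sum(axis=1, keepdims=True)   # shape (4, 1)

# Divide only where a mask selects at least one row; empty masks yield NaN.
out = np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)

# Cross-check against the explicit loop from the question.
looped = np.stack([arr[mask].mean(axis=0) for mask in to_reduce])
assert np.allclose(out, looped)
```

One caveat: because unselected rows are multiplied by 0 rather than skipped, a NaN or inf anywhere in arr propagates into every output row, whereas the loop only sees the rows it selects.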


calculate the average of each dimension defining the group in python

I have a dataframe (df) that has three columns (user, vector, and group name), the vector column with multiple comma-separated values in each row.
df = pd.DataFrame({'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'], 'vector': [[1, 0, 2, 0], [1, 8, 0, 2],[6, 2, 0, 0], [5, 0, 2, 2], [3, 8, 0, 0],[6, 0, 0, 2]], 'group': ['A', 'B', 'C', 'B', 'A', 'A']})
I would like to calculate for each group, the sum of dimensions in all rows divided by the total number of rows for this group.
For example:
Group A: [(1+3+6)/3, (0+8+0)/3, (2+0+0)/3, (0+0+2)/3] ≈ [3.33, 2.67, 0.67, 0.67]
Group B: [(1+5)/2, (8+0)/2, (0+2)/2, (2+2)/2] = [3, 4, 1, 2]
Group C: [6, 2, 0, 0]
So, the expected result is an array:
group A: [3.33, 2.67, 0.67, 0.67]
group B: [3, 4, 1, 2]
group C: [6, 2, 0, 0]
I'm not sure if you were looking for the results stored in a single array/dataframe, or if you're just looking to get the results as separate arrays.
If the latter, something like this should work for you:
for group in df.group.unique():
    print(f'Group {group} results: ')
    tmp_df = pd.DataFrame(df[df.group==group]['vector'].tolist())
    print(tmp_df.mean().values)
Output:
Group A results:
[3.33333333 2.66666667 0.66666667 0.66666667]
Group B results:
[3. 4. 1. 2.]
Group C results:
[6. 2. 0. 0.]
It's a little clunky, but gets the job done if you're just looking to get the results.
It filters the dataframe by group, turns that group's vectors into their own tmp_df, and takes the mean of each column.
If you want you could easily take those arrays and save them for further manipulation or what have you.
Hope that helps!
Take advantage of numpy:
import numpy as np
out = (df.groupby('group')['vector']
.agg(lambda x: np.vstack(x).mean(0).round(2))
)
print(out)
Output:
group
A [3.33, 2.67, 0.67, 0.67]
B [3.0, 4.0, 1.0, 2.0]
C [6.0, 2.0, 0.0, 0.0]
Name: vector, dtype: object
As a DataFrame:
out = (df.groupby('group', as_index=False)['vector']
.agg(lambda x: np.vstack(x).mean(0).round(2))
)
Output:
group vector
0 A [3.33, 2.67, 0.67, 0.67]
1 B [3.0, 4.0, 1.0, 2.0]
2 C [6.0, 2.0, 0.0, 0.0]
As an array:
out = np.vstack(df.groupby('group')['vector']
.agg(lambda x: np.vstack(x).mean(0).round(2))
)
Output:
[[3.33 2.67 0.67 0.67]
[3. 4. 1. 2. ]
[6. 2. 0. 0. ]]
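Another possible route, sketched here under the assumption that every vector has the same length, is to expand the list column into ordinary numeric columns once and let groupby compute the means directly. The names wide and means are introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'],
    'vector': [[1, 0, 2, 0], [1, 8, 0, 2], [6, 2, 0, 0],
               [5, 0, 2, 2], [3, 8, 0, 0], [6, 0, 0, 2]],
    'group': ['A', 'B', 'C', 'B', 'A', 'A'],
})

# One numeric column per vector dimension, indexed by group label.
wide = pd.DataFrame(df['vector'].tolist(), index=df['group'])

# Mean of every dimension within each group (groups come back sorted: A, B, C).
means = wide.groupby(level=0).mean()

print(means.round(2))
result = means.to_numpy()  # 2-D array with one row per group
```

This keeps the per-group result as plain float columns instead of an object column of arrays, which tends to be easier to work with afterwards.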

Selecting at different column index for each row in tensor

I have a pytorch tensor
t = torch.tensor(
[[1.0, 1.5, 0.5, 2.0],
[5.0, 3.0, 4.5, 5.5],
[0.5, 1.0, 3.0, 2.0]]
)
t[:, [-1]] gives me last column value of each row:
tensor([[2.0000],
[5.5000],
[2.0000]])
However, I want to slice values at different columns per row. For example, in t for the 1st, 2nd and 3rd row, I want to slice at 2, -1, 0 index respectively to get the following tensor:
tensor([[0.5],
[5.5],
[0.5]])
How can I do it in torch?
t[[i for i in range(3)], [2, -1, 0]]
The list comprehension builds the list of row indices, and the second list gives the column index to pick in each row. The result is a 1-D tensor, tensor([0.5000, 5.5000, 0.5000]).
You can use the following:
t = torch.tensor(
[[1.0, 1.5, 0.5, 2.0],
[5.0, 3.0, 4.5, 5.5],
[0.5, 1.0, 3.0, 2.0]]
)
t
>tensor([[1.0000, 1.5000, 0.5000, 2.0000],
[5.0000, 3.0000, 4.5000, 5.5000],
[0.5000, 1.0000, 3.0000, 2.0000]])
rows = [0, 1, 2]
cols = [2, -1, 0]
t[rows, cols]
>tensor([0.5000, 5.5000, 0.5000])
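Both answers return a 1-D tensor, while the question asks for one value per row in a column of shape (3, 1). A minimal sketch of two ways to get that shape follows; the variable names are illustrative, and the modulo step assumes, as far as I know, that torch.gather expects non-negative indices.

```python
import torch

t = torch.tensor(
    [[1.0, 1.5, 0.5, 2.0],
     [5.0, 3.0, 4.5, 5.5],
     [0.5, 1.0, 3.0, 2.0]]
)
cols = torch.tensor([2, -1, 0])

# Option 1: integer-array indexing, then add the trailing dimension back.
rows = torch.arange(t.size(0))
picked = t[rows, cols].unsqueeze(1)            # shape (3, 1)

# Option 2: gather along dim=1. Negative indices are wrapped to their
# non-negative equivalents first (-1 becomes 3 for a width of 4).
index = (cols % t.size(1)).unsqueeze(1)        # shape (3, 1), dtype int64
gathered = t.gather(1, index)                  # shape (3, 1)

print(picked)
```

Using torch.arange(t.size(0)) instead of a hard-coded [0, 1, 2] keeps the row index in step with the tensor's actual height.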

How to pass argument of type char ** from Python to C API [duplicate]

As seen in "How do I convert a Python list into a C array by using ctypes?", this code takes a Python list and converts it to a C array:
import ctypes
arr = (ctypes.c_int * len(pyarr))(*pyarr)
What would be the way to do the same with a list of lists, or a list of lists of lists?
For example, for the following variable
list3d = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
I have tried the following with no luck:
([[ctypes.c_double * 4] *2]*3)(*list3d)
# *** TypeError: 'list' object is not callable
(ctypes.c_double * 4 *2 *3)(*list3d)
# *** TypeError: expected c_double_Array_4_Array_2 instance, got list
Thank you!
EDIT: Just to clarify, I am trying to get one object that contains the whole multidimensional array, not a list of objects. This object's reference will be an input to a C DLL that expects a 3D array.
It works with tuples if you don't mind doing a bit of conversion first:
from ctypes import *
list3d = [
[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]],
[[0.2, 1.2, 2.2, 3.2], [4.2, 5.2, 6.2, 7.2]],
[[0.4, 1.4, 2.4, 3.4], [4.4, 5.4, 6.4, 7.4]],
]
arr = (c_double * 4 * 2 * 3)(*(tuple(tuple(j) for j in i) for i in list3d))
Check that it's initialized correctly in row-major order:
>>> (c_double * 24).from_buffer(arr)[:]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
0.2, 1.2, 2.2, 3.2, 4.2, 5.2, 6.2, 7.2,
0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 6.4, 7.4]
Or you can create an empty array and initialize it using a loop. Enumerate over the rows and columns of the list and assign the data to a slice:
arr = (c_double * 4 * 2 * 3)()
for i, row in enumerate(list3d):
    for j, col in enumerate(row):
        arr[i][j][:] = col
I made the change accordingly
a = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
# Note: c_float is single precision; use ctypes.c_double if the C function expects double.
arr = (((ctypes.c_float * len(a[0][0])) * len(a[0])) * len(a))
arr_instance = arr()
for i in range(len(a)):
    for j in range(len(a[0])):
        for k in range(len(a[0][0])):
            arr_instance[i][j][k] = a[i][j][k]
The arr_instance is what you want.
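The loops above hard-code three levels of nesting. A possible generalisation to any depth is sketched below. nested_to_ctypes is a hypothetical helper written for this example, not part of ctypes, and it assumes the input is a non-empty, rectangular nested list.

```python
import ctypes

def nested_to_ctypes(data, ctype=ctypes.c_double):
    """Hypothetical helper: copy a rectangular nested list into one ctypes array."""
    # Work out the shape by walking down the first element of each level.
    shape = []
    level = data
    while isinstance(level, (list, tuple)):
        shape.append(len(level))
        level = level[0]

    # Build the array type innermost dimension first, e.g. c_double * 4 * 2 * 3.
    array_type = ctype
    for dim in reversed(shape):
        array_type = array_type * dim

    arr = array_type()

    def fill(target, source, depth):
        if depth == len(shape) - 1:
            target[:] = source  # slice assignment copies one innermost row
        else:
            for i, sub in enumerate(source):
                # target[i] is a sub-array sharing memory with arr
                fill(target[i], sub, depth + 1)

    fill(arr, data, 0)
    return arr

list3d = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]] for _ in range(3)]
arr = nested_to_ctypes(list3d)
print(type(arr).__name__, ctypes.sizeof(arr))
```

The returned object is a single contiguous block in row-major order, so it can be passed (for example via ctypes.byref) to a C function expecting a 3D array of matching element type.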

how to merge the values of a list of lists and a list into 1 resulting list of lists

I have a list of lists (a) and a list (b) which have the same "length" (in this case "4"):
a = [
[1.0, 2.0],
[1.1, 2.1],
[1.2, 2.2],
[1.3, 2.3]
]
b = [3.0, 3.1, 3.2, 3.3]
I would like to merge the values to obtain the following (c):
c = [
[1.0, 2.0, 3.0],
[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3]
]
Currently I'm doing the following to achieve it:
c = []
for index, elem in enumerate(a):
    x = [a[index], [b[index]]] # x assigned here for better readability
    c.append(sum(x, []))
My feeling is that there is a more elegant way to do this...
Note: the real lists are a lot larger; I shortened them for simplicity. They are always(!) of the same length.
In Python 3.5+, use zip() within a list comprehension together with iterable unpacking:
In [7]: [[*j, i] for i, j in zip(b, a)]
Out[7]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
In Python 2 (this form also works in Python 3):
In [8]: [j+[i] for i, j in zip(b, a)]
Out[8]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
Or use numpy.column_stack in numpy:
In [16]: import numpy as np
In [17]: np.column_stack((a, b))
Out[17]:
array([[ 1. , 2. , 3. ],
[ 1.1, 2.1, 3.1],
[ 1.2, 2.2, 3.2],
[ 1.3, 2.3, 3.3]])
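One thing to keep in mind with the zip()-based versions: zip() stops silently at the shorter input, so a length mismatch would drop rows without any error. A small sketch with an explicit check is shown below; append_column is a hypothetical name chosen for this example.

```python
def append_column(rows, values):
    """Return new rows with the matching value appended; inputs are not modified."""
    if len(rows) != len(values):
        raise ValueError(f"length mismatch: {len(rows)} rows vs {len(values)} values")
    return [[*row, value] for row, value in zip(rows, values)]

a = [
    [1.0, 2.0],
    [1.1, 2.1],
    [1.2, 2.2],
    [1.3, 2.3],
]
b = [3.0, 3.1, 3.2, 3.3]

c = append_column(a, b)
print(c)
```

On Python 3.10 and later, zip(rows, values, strict=True) performs the same length check without the explicit comparison.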

How to rewrite the value of a certain element in a nested list in python?

M = [[3.5, 1.0, 9.2, 4.0], [0, 0, 0, 0], [3.0, 1.0, 8.0, -2.0]]
c_idx = 2
for count4 in range(len(M)):
    for count5 in range(len(M[count4])):
        if M[count4].index(M[count4][count5]) == c_idx :
            M[count4] = M[count4][ :c_idx] + [0] + M[count4][c_idx+1 : ]
        count4 += 1
        count5 += 1
print(M)
I'm trying to overwrite the element at a given position (c_idx) in each sublist of M, but I get an error:
if M[count4].index(M[count4][count5]) == c_idx :
IndexError: list index out of range
The result should be like this:
[[3.5, 1.0, 0, 4.0], [0, 0, 0, 0], [3.0, 1.0, 0, -2.0]]
I don't see what I'm doing wrong. Help me out, folks!
Just remove count4 += 1 and count5 += 1:
M = [[3.5, 1.0, 9.2, 4.0], [0, 0, 0, 0], [3.0, 1.0, 8.0, -2.0]]
c_idx = 2
for count4 in range(len(M)):
    for count5 in range(len(M[count4])):
        if M[count4].index(M[count4][count5]) == c_idx :
            M[count4] = M[count4][ :c_idx] + [0] + M[count4][c_idx+1 : ]
print(M)
[[3.5, 1.0, 0, 4.0], [0, 0, 0, 0], [3.0, 1.0, 0, -2.0]]
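The for loop over range() already advances count4 and count5 for you; incrementing count4 by hand inside the loop pushes it past the end of M.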
There are much cleaner approaches mentioned in other answers, though.
def replaceElement(l, index, element):
    for row in l:
        row[index] = element
    return l
M = [[3.5, 1.0, 9.2, 4.0], [0, 0, 0, 0], [3.0, 1.0, 8.0, -2.0]]
c_idx = 2
M = replaceElement(M, c_idx, 0)
>>> M
[[3.5, 1.0, 0, 4.0], [0, 0, 0, 0], [3.0, 1.0, 0, -2.0]]
M = [[3.5, 1.0, 9.2, 4.0], [0, 0, 0, 0], [3.0, 1.0, 8.0, -2.0]]
c_idx = 2
for sublist in M:
    sublist[c_idx] = 0 # change the third element to 0 in each sublist
print(M)
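The last two answers modify M in place. If the original list needs to stay untouched, one option is to build a new nested list instead. The sketch below uses a hypothetical helper name, with_column_set, introduced only for this example.

```python
def with_column_set(matrix, col, value=0):
    """Return a copy of matrix with position col of every row set to value."""
    return [
        [value if j == col else x for j, x in enumerate(row)]
        for row in matrix
    ]

M = [[3.5, 1.0, 9.2, 4.0], [0, 0, 0, 0], [3.0, 1.0, 8.0, -2.0]]
c_idx = 2

result = with_column_set(M, c_idx)
print(result)  # new list with index 2 zeroed in each row
print(M)       # original is unchanged
```

Comparing positions through enumerate() also avoids list.index(), which returns the first matching value rather than the current position and so misbehaves when a row contains duplicates.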
