I have a dataframe (df) with three columns (user, vector, and group); the vector column holds a list of several values in each row.
import pandas as pd

df = pd.DataFrame({'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'],
                   'vector': [[1, 0, 2, 0], [1, 8, 0, 2], [6, 2, 0, 0], [5, 0, 2, 2], [3, 8, 0, 0], [6, 0, 0, 2]],
                   'group': ['A', 'B', 'C', 'B', 'A', 'A']})
I would like to calculate, for each group, the element-wise sum of the vectors over all of its rows divided by the number of rows in that group (i.e. the per-dimension mean).
For example:
For group A it is [(1+3+6)/3, (0+8+0)/3, (2+0+0)/3, (0+0+2)/3] ≈ [3.33, 2.67, 0.67, 0.67].
For group B it is [(1+5)/2, (8+0)/2, (0+2)/2, (2+2)/2] = [3, 4, 1, 2].
For group C it is [6, 2, 0, 0].
So, the expected result is an array:
group A: [3.33, 2.67, 0.67, 0.67]
group B: [3, 4, 1, 2]
group C: [6, 2, 0, 0]
I'm not sure if you were looking for the results stored in a single array/dataframe, or if you're just looking to get the results as separate arrays.
If the latter, something like this should work for you:
for group in df.group.unique():
    print(f'Group {group} results: ')
    tmp_df = pd.DataFrame(df[df.group == group]['vector'].tolist())
    print(tmp_df.mean().values)
Output:
Group A results:
[3.33333333 2.66666667 0.66666667 0.66666667]
Group B results:
[3. 4. 1. 2.]
Group C results:
[6. 2. 0. 0.]
It's a little clunky, but it gets the job done if you just need the results.
It filters the dataframe by group, turns that group's vectors into their own tmp_df, and takes the mean of each column.
If you want, you can easily save those arrays for further manipulation; a quick sketch of that follows.
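For instance, a small sketch (reusing the df from the question) that collects each group's mean vector into a dict instead of just printing it:
# collect per-group mean vectors keyed by group name
results = {}
for group in df.group.unique():
    tmp_df = pd.DataFrame(df[df.group == group]['vector'].tolist())
    results[group] = tmp_df.mean().values
print(results['A'])  # [3.33333333 2.66666667 0.66666667 0.66666667]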
Hope that helps!
Take advantage of numpy:
import numpy as np
out = (df.groupby('group')['vector']
         .agg(lambda x: np.vstack(x).mean(0).round(2)))
print(out)
Output:
group
A [3.33, 2.67, 0.67, 0.67]
B [3.0, 4.0, 1.0, 2.0]
C [6.0, 2.0, 0.0, 0.0]
Name: vector, dtype: object
as DataFrame
out = (df.groupby('group', as_index=False)['vector']
         .agg(lambda x: np.vstack(x).mean(0).round(2)))
Output:
group vector
0 A [3.33, 2.67, 0.67, 0.67]
1 B [3.0, 4.0, 1.0, 2.0]
2 C [6.0, 2.0, 0.0, 0.0]
as array
out = np.vstack(df.groupby('group')['vector']
                  .agg(lambda x: np.vstack(x).mean(0).round(2)))
Output:
[[3.33 2.67 0.67 0.67]
[3. 4. 1. 2. ]
[6. 2. 0. 0. ]]
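If you'd rather keep the group labels with one column per dimension, a small follow-up sketch (the dim_* column names are just placeholders) expands the stacked array into a labelled DataFrame:
import numpy as np
import pandas as pd

means = df.groupby('group')['vector'].agg(lambda x: np.vstack(x).mean(0).round(2))
# one column per vector dimension, indexed by group name
result = pd.DataFrame(np.vstack(means), index=means.index,
                      columns=[f'dim_{i}' for i in range(4)])
print(result)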
As seen in How do I convert a Python list into a C array by using ctypes?, this code takes a Python list and transforms it into a C array.
import ctypes
arr = (ctypes.c_int * len(pyarr))(*pyarr)
What would be the way of doing the same with a list of lists, or a list of lists of lists?
For example, for the following variable
list3d = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
I have tried the following with no luck:
([[ctypes.c_double * 4] *2]*3)(*list3d)
# *** TypeError: 'list' object is not callable
(ctypes.c_double * 4 *2 *3)(*list3d)
# *** TypeError: expected c_double_Array_4_Array_2 instance, got list
Thank you!
EDIT: Just to clarify, I am trying to get one object that contains the whole multidimensional array, not a list of objects. This object's reference will be an input to a C DLL that expects a 3D array.
It works with tuples if you don't mind doing a bit of conversion first:
from ctypes import *
list3d = [
[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]],
[[0.2, 1.2, 2.2, 3.2], [4.2, 5.2, 6.2, 7.2]],
[[0.4, 1.4, 2.4, 3.4], [4.4, 5.4, 6.4, 7.4]],
]
arr = (c_double * 4 * 2 * 3)(*(tuple(tuple(j) for j in i) for i in list3d))
Check that it's initialized correctly in row-major order:
>>> (c_double * 24).from_buffer(arr)[:]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
0.2, 1.2, 2.2, 3.2, 4.2, 5.2, 6.2, 7.2,
0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 6.4, 7.4]
Or you can create an empty array and initialize it using a loop. enumerate over the rows and columns of the list and assign the data to a slice:
arr = (c_double * 4 * 2 * 3)()
for i, row in enumerate(list3d):
    for j, col in enumerate(row):
        arr[i][j][:] = col
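Once the ctypes array exists it can be handed straight to the DLL; the library and function names below are hypothetical placeholders, since the actual C signature isn't shown in the question:
from ctypes import CDLL, c_double

lib = CDLL('./mylib.dll')                      # hypothetical DLL name
# assuming a C signature like: void process(double data[3][2][4]);
lib.process.argtypes = [c_double * 4 * 2 * 3]
lib.process.restype = None
lib.process(arr)                               # arr is the (c_double * 4 * 2 * 3) instance built above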
I made the change accordingly:
a = [[[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]], [[40.0, 1.2, 6.0, 0.3], [50.0, 4.2, 0, 0]]]
arr = ((ctypes.c_float * len(a[0][0])) * len(a[0])) * len(a)
arr_instance = arr()
for i in range(0, len(a)):
    for j in range(0, len(a[0])):
        for k in range(0, len(a[0][0])):
            arr_instance[i][j][k] = a[i][j][k]
The arr_instance is what you want.
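As a quick sanity check, the same from_buffer trick from the answer above confirms the data landed in row-major order (using c_float here to match the array type, so values like 1.2 show float32 rounding):
import ctypes

flat = (ctypes.c_float * 24).from_buffer(arr_instance)[:]   # 3 * 2 * 4 = 24 elements
print(flat[:4])   # first inner list: 40.0, ~1.2, 6.0, ~0.3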
I have a list, A, containing 8 values. I'd like to make the combinations first, then add the two points in list B to each combination. Here is my code:
import itertools

def combination(arr, r):
    return list(itertools.combinations(arr, r))
A = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.5, 0.5, 0.5], [0.0, 0.25, 0.0], [0.0, 0.7499999819999985, 0.0], [0.5, 0.25, 0.5], [0.5, 0.7499999819999985, 0.5]]
B = [[0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
n = 1  # can be changed
com = combination(A, n)
for item in com:
    item.extend(B)
    print(item)
But I got an error:
AttributeError: 'tuple' object has no attribute 'extend'
Expected results:
[[0.0, 0.0, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.0, 0.5, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.0, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.5, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.0, 0.25, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.0, 0.7499999819999985, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.25, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.7499999819999985, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
The tuple type (returned by combinations) is immutable; you can build a list instead, populated with the item and B:
com = combination(A, n)
com = [[*item, *B] for item in com]
Or return a list of lists from your combination function:
def combination(arr, r):
    return [list(c) for c in itertools.combinations(arr, r)]

# ...
for item in com:
    item.extend(B)
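A quick end-to-end check with the list-returning version and n = 1 (using the A and B lists from the question):
com = combination(A, 1)
for item in com:
    item.extend(B)
    print(item)
# first printed row:
# [[0.0, 0.0, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]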
You can also use the map function to get a list of lists.
map() returns a map object (an iterator) that yields the results of applying the given function to each item of the given iterable (in your case, a list).
instead of:
com = combination(A, n)
use:
com = map(list, combination(A, n))
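Note that map() returns an iterator, so if you need to index into the result or loop over it more than once, wrap it in list() first; for example:
com = list(map(list, combination(A, n)))   # materialise the iterator into a list of lists
for item in com:
    item.extend(B)
    print(item)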
I have two lists of lists, each sorted with respect to the first item of its inner lists (a timestamp), containing data like this: [[time0, voltage0], [time1, voltage1], ...]
l1 =[[0,0],[1,1],[2,2],[3,3]]
l2 =[[0,0],[0.5,0.5],[1,1.2],[1.5,1.5],[2,2]]
The goal is to produce a single list of lists containing the elements from both lists, sorted with respect to the first item of the inner lists, BUT
if an item's timestamp appears in both lists, the final list should contain only the item from the first list.
For the example above, the output should be:
result = [[0, 0], [0.5, 0.5], [1, 1], [1.5, 1.5], [2, 2], [3, 3]]
I've tried tagging each element with a reference to the list it came from, then going over the combined list to find duplicates and delete the ones that came from the second list, but the duplicate detection doesn't work since ["first", 0, 0] isn't a duplicate of ["second", 0, 0].
# examples of lists
from operator import itemgetter

lFirst = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
lSecond = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.2], [1.5, 1.5], [2.0, 2.0], [2.5, 2.5], [3.0, 3.0], [3.5, 3.5], [4.0, 4.0], [4.5, 4.5]]
print("first list: {}".format(lFirst))
print("second list: {}".format(lSecond))
res = sorted(lFirst + lSecond, key=itemgetter(0))
print(res)
One way is to concatenate your lists, with l2 coming first. Then create a dictionary and sort the items():
print([list(x) for x in sorted(dict(l2 + l1).items())])
#[[0, 0], [0.5, 0.5], [1, 1], [1.5, 1.5], [2, 2], [3, 3]]
This works because dictionary keys are unique. You start with a key-value pair from l2, but if the key (timestamp) also exists in l1 it gets updated.
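To see why the order matters, here is a small sketch of the intermediate dict built from the example lists (keys from l1 overwrite those from l2):
merged = dict(l2 + l1)     # l1 comes last, so its values win on duplicate timestamps
print(merged)              # {0: 0, 0.5: 0.5, 1: 1, 1.5: 1.5, 2: 2, 3: 3}
result = [list(x) for x in sorted(merged.items())]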
You could remove all duplicates from the second list before merging.
lFirst = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
lSecond = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.2], [1.5, 1.5], [2.0, 2.0], [2.5, 2.5], [3.0, 3.0], [3.5, 3.5], [4.0, 4.0], [4.5, 4.5]]
print("first list: {0}".format(lFirst))
print("second list: {0}".format(lSecond))
lFirstTimes = [x[0] for x in lFirst]
lSecondFiltered = [x for x in lSecond if x[0] not in lFirstTimes]
print("second list without duplicates: {0}".format(lSecondFiltered))
res = lFirst+lSecondFiltered
res.sort()
print(res)
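If the lists were large, a set would make the membership test O(1) instead of scanning a list for every element; a minor variant of the same idea:
lFirstTimes = {x[0] for x in lFirst}                                # set of timestamps from the first list
lSecondFiltered = [x for x in lSecond if x[0] not in lFirstTimes]
res = sorted(lFirst + lSecondFiltered)                              # lists sort by their first element first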
You can use heapq.merge (doc) to merge the lists and itertools.groupby (doc) to group the elements.
The list which is first in merge() will get priority:
l1 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
l2 = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.2], [1.5, 1.5], [2.0, 2.0], [2.5, 2.5], [3.0, 3.0], [3.5, 3.5], [4.0, 4.0], [4.5, 4.5]]
from heapq import merge
from itertools import groupby
out = [next(g) for _, g in groupby(merge(l1, l2, key=lambda k: k[0]), lambda k: k[0])]
from pprint import pprint
pprint(out)
Prints:
[[0.0, 0.0],
[0.5, 0.5],
[1.0, 1.0],
[1.5, 1.5],
[2.0, 2.0],
[2.5, 2.5],
[3.0, 3.0],
[3.5, 3.5],
[4.0, 4.0],
[4.5, 4.5],
[5.0, 5.0]]
EDIT: Works in Python 3.5+ (in Python 2.7, merge() doesn't have the key= argument).
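If you are stuck on Python 2.7, one possible workaround (a sketch using only pieces already shown in this thread) is to sort the concatenation with itemgetter and then group; because the sort is stable, l1 keeps priority on equal timestamps as long as it comes first in the concatenation:
from operator import itemgetter
from itertools import groupby

merged = sorted(l1 + l2, key=itemgetter(0))                     # stable sort keeps l1 items first for equal keys
out = [next(g) for _, g in groupby(merged, key=itemgetter(0))]  # keep the first item of each timestamp group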
I would like to reduce a variable number of elements (or slices) of an array multiple times and put the results into a new array. It's kind of like a masked np.apply_along_axis, but staying within numpy.
For example, to reduce by mean:
import numpy as np

to_reduce = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0]]).astype(bool)
arr = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [2.0, 2.0, 3.0],
    [2.0, 2.0, 4.0],
    [1.0, 0.0, 3.0]])
I want:
np.array([
    [1.5, 2.0, 3.5],
    [1.5, 1.0, 3.5],
    [1.33333, 1.33333, 3.0],
    [1.5, 2.0, 3.5]])
The slow way would be:
out = np.empty((4, 3))
for j, mask in enumerate(to_reduce):
    out[j] = np.mean(arr[mask], axis=0)
Here's one simple and efficient way with matrix-multiplication -
In [56]: to_reduce.dot(arr)/to_reduce.sum(1)[:,None]
Out[56]:
array([[1.5 , 2. , 3.5 ],
[1.5 , 1. , 3.5 ],
[1.33333333, 1.33333333, 3. ],
[1.5 , 2. , 3.5 ]])
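A quick sanity check that the matrix product matches the loop version from the question:
# loop version
out_loop = np.empty((4, 3))
for j, mask in enumerate(to_reduce):
    out_loop[j] = np.mean(arr[mask], axis=0)

# matrix-multiplication version
out_mm = to_reduce.dot(arr) / to_reduce.sum(1)[:, None]

print(np.allclose(out_loop, out_mm))   # True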