I know you're supposed to give examples when you ask questions here, but I can't really think of anything that wouldn't involve pasting a massive project worth of code, so I'll just try to describe this as well as possible.
I'm working on a project that uses keypoints generated by OpenPose. After some preprocessing to simplify everything, the data for each frame looks like this: [x0, y0, c0, x1, y1, c1, ...], where there are 18 points total, the x's and y's are their coordinates, and the c's are confidence values. I want to take a nested list holding these keypoints for a single person, frame by frame, and output a new nested list of lists made up of the weighted-average x's and y's (the weights being the confidence values for each point) along with the average confidences, grouped by second instead of by frame, in the same format as above.
I have already converted the original list into a 3-dimensional list, where each second holds its frames and each frame holds its keypoint list. I know I could write code to do all of this myself without numpy.average(), but I was hoping to avoid that because it quickly becomes confusing. Instead, I was wondering whether I could iterate over each second with that method in some reasonably simple way and just append the resulting lists to a new list, like this:
out = []
for second in lst:
    out.append(average(second, axis=1, weights=?, other params?))
Again, I'm sorry for not giving an example of some sort.
Maybe you could get some inspiration from this code:
import numpy as np
def pose_average(sequence):
    x, y, c = sequence[0::3], sequence[1::3], sequence[2::3]
    x_avg = np.average(x, weights=c)
    y_avg = np.average(y, weights=c)
    return x_avg, y_avg
sequence = [2, 4, 1, 5, 6, 3, 5, 2, 1]
pose_average(sequence)
>>> (4.4, 4.8)
For multiple sequences of grouped poses:
data = [[1, 2, 3, 2, 3, 4, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7, 8, 9], [4, 1, 2, 5, 3, 3, 4, 1, 2]]
out = [ pose_average(seq) for seq in data ]
out
>>> [(2.1666666666666665, 3.1666666666666665),
(5.0, 6.0),
(4.428571428571429, 1.8571428571428572)]
Edit
By assuming that:
data is a list of sequences
a sequence is a list of poses grouped together (for example grouped by second)
a pose is the list of joint positions and confidences: [x1, y1, c1, x2, y2, c2, ...]
the slightly modified code is now:
import numpy as np
data = [
    [[1, 2, 3, 2, 3, 4, 3, 4, 5], [9, 2, 3, 4, 5, 6, 7, 8, 9], [4, 1, 2, 5, 3, 3, 4, 1, 2], [5, 3, 4, 1, 10, 6, 5, 0, 0]],
    [[6, 9, 11, 0, 8, 6, 1, 5, 11], [3, 5, 4, 2, 0, 2, 0, 8, 8], [1, 5, 9, 5, 1, 0, 6, 6, 6]],
    [[9, 4, 7, 0, 2, 1], [9, 4, 7, 0, 2, 1], [9, 4, 7, 0, 2, 1]]
]
def pose_average(sequence):
    sequence = np.asarray(sequence)
    x, y, c = sequence[:, 0::3], sequence[:, 1::3], sequence[:, 2::3]
    x_avg = np.average(x, weights=c, axis=0)
    y_avg = np.average(y, weights=c, axis=0)
    return x_avg, y_avg
out = [ pose_average(seq) for seq in data ]
out
>>> [(array([4.83333333, 2.78947368, 5.375 ]),
array([2.16666667, 5.84210526, 5.875 ])),
(array([3.625, 0.5 , 1.88 ]), array([6.83333333, 6. , 6.2 ])),
(array([9., 0.]), array([4., 2.]))]
x_avg is now the list of x positions for each point, averaged over the sequence and weighted by c.
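If you also need the averaged confidences interleaved back into the flat [x, y, c, ...] layout described in the question, a small extension could look like the sketch below (the function name is mine, not part of the answer above):

import numpy as np

def pose_average_flat(frames):
    """frames: all poses for one second, each [x0, y0, c0, x1, y1, c1, ...]."""
    frames = np.asarray(frames, dtype=float)
    x, y, c = frames[:, 0::3], frames[:, 1::3], frames[:, 2::3]
    x_avg = np.average(x, weights=c, axis=0)   # confidence-weighted average per point
    y_avg = np.average(y, weights=c, axis=0)
    c_avg = c.mean(axis=0)                     # plain average confidence per point
    # interleave back into [x0, y0, c0, x1, y1, c1, ...]
    return np.column_stack([x_avg, y_avg, c_avg]).ravel().tolist()

# one flat averaged pose per second; assumes every point has at least one
# nonzero confidence within the second, otherwise np.average raises
out = [pose_average_flat(second) for second in data]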
I have two 1D arrays of the same length, like this:
import numpy as np
a = np.array([1, 1, 1, 2, 2, 3, 4, 5])
b = np.array([7, 7, 8, 8, 9, 8, 10, 10])
The values of a are non-decreasing, while b is random.
I want to pair them by their values, following the steps below:
Pick the first unique value ([1]) of array a and get the unique numbers ([7, 8]) of array b at the same indices.
If any of those paired numbers ([8]) appear again elsewhere in b, pick the numbers of a at those indices as well.
Then, for any newly paired number ([2]) that appears again in a, the numbers of b at those indices are also selected.
Finally, the result should be:
[1, 2, 3] is paired with [7, 8, 9]
[4, 5] is paired with [10]
It looks like there is no easy way to get a vectorised (no looping) solution, since this is really a graph-theory problem of finding connected components. If you still want a performant script that works fast on big data, you could use the igraph library, which is written in C.
TL;DR
I assume your input corresponds to edges of some graph:
>>> np.transpose([a, b])
array([[ 1, 7],
[ 1, 7],
[ 1, 8],
[ 2, 8],
[ 2, 9],
[ 3, 8],
[ 4, 10],
[ 5, 10]])
So your vertices are:
>>> np.unique(np.transpose([a, b]))
array([ 1, 2, 3, 4, 5, 7, 8, 9, 10])
And you would be quite happy (at least at the beginning) to recognise communities, like:
tags = np.transpose([a, b, communities])
>>> tags
array([[ 1, 7, 0],
[ 1, 7, 0],
[ 1, 8, 0],
[ 2, 8, 0],
[ 2, 9, 0],
[ 3, 8, 0],
[ 4, 10, 1],
[ 5, 10, 1]])
so that you have vertices (1, 2, 3, 7, 8, 9) included in community number 0 and vertices (4, 5, 10) included in community number 1.
Unfortunately, igraph doesn't support vertex labels like 1 to 10, or any gaps in the ids: they must start from 0 and be contiguous. So you need to store the initial labels and then relabel the vertices, so that the edges become:
vertices_old, inv = np.unique(np.transpose([a,b]), return_inverse=True)
edges_new = inv.reshape(-1, 2)
>>> vertices_old
array([ 1, 2, 3, 4, 5, 7, 8, 9, 10]) #new ones are: [0, 1, 2, ..., 8]
>>> edges_new
array([[0, 5],
[0, 5],
[0, 6],
[1, 6],
[1, 7],
[2, 6],
[3, 8],
[4, 8]], dtype=int64)
The next step is to find communities using igraph (pip install python-igraph). You can run the following:
import igraph as ig
graph = ig.Graph(edges = edges_new)
communities = graph.clusters().membership #type: list
communities = np.array(communities)
>>> communities
array([0, 0, 0, 1, 1, 0, 0, 0, 1]) #tags of nodes [1 2 3 4 5 7 8 9 10]
And then retrieve tags of source vertices (as well as tags of target vertices):
>>> communities = communities[edges_new[:, 0]]  # or [:, 1]
>>> communities
array([0, 0, 0, 0, 0, 0, 1, 1])
After you find the communities, the second part of the solution is a typical groupby problem. You can do it in pandas:
import pandas as pd
def get_part(source, communities):
    part_edges = np.transpose([source, communities])
    part_idx = pd.DataFrame(part_edges).groupby([1]).indices.values()   # might contain duplicated source values
    part = [np.unique(source[idx]) for idx in part_idx]
    return part
>>> get_part(a, communities), get_part(b, communities)
([array([1, 2, 3]), array([4, 5])], [array([7, 8, 9]), array([10])])
Final Code
import igraph as ig
import numpy as np
import pandas as pd
def get_part(source, communities):
    '''find set of nodes for each community'''
    part_edges = np.transpose([source, communities])
    part_idx = pd.DataFrame(part_edges).groupby([1]).indices.values()   # might contain duplicated source values
    part = [np.unique(source[idx]) for idx in part_idx]
    return part
a = np.array([1, 1, 1, 2, 2, 3, 4, 5])
b = np.array([7, 7, 8, 8, 9, 8, 10, 10])
vertices_old, inv = np.unique(np.transpose([a,b]), return_inverse=True)
edges_new = inv.reshape(-1, 2)
graph = ig.Graph(edges = edges_new)
communities = np.array(graph.clusters().membership)
communities = communities[edges_new[:,0]] #or communities[edges_new[:,1]]
>>> get_part(a, communities), get_part(b, communities)
([array([1, 2, 3]), array([4, 5])], [array([7, 8, 9]), array([10])])
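If installing igraph is a problem, the connected-components step can also be done with SciPy. The following is a sketch of an alternative (not part of the answer above); after it, get_part can be reused unchanged with the resulting communities array:

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

a = np.array([1, 1, 1, 2, 2, 3, 4, 5])
b = np.array([7, 7, 8, 8, 9, 8, 10, 10])

# relabel vertices to 0..m-1, exactly as above
vertices_old, inv = np.unique(np.transpose([a, b]), return_inverse=True)
edges_new = inv.reshape(-1, 2)

# build a sparse adjacency matrix and label the connected components
m = len(vertices_old)
adjacency = coo_matrix((np.ones(len(edges_new)), (edges_new[:, 0], edges_new[:, 1])),
                       shape=(m, m))
n_components, labels = connected_components(adjacency, directed=False)

# tag each edge with the component of its source vertex
communities = labels[edges_new[:, 0]]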
I tried doing this by iterating over both arrays simultaneously and keeping track of which element is associated with which index of the result. Let me know if this works for you.
a = [1, 1, 1, 2, 2, 3, 4, 5]
b = [7, 7, 8, 8, 9, 8, 10, 10]
tracker_a = dict()
tracker_b = dict()
result = []
index = 0
for elem_a, elem_b in zip(a, b):
    if elem_a in tracker_a:
        result[tracker_a[elem_a]][1].add(elem_b)
        tracker_b[elem_b] = tracker_a[elem_a]
    elif elem_b in tracker_b:
        result[tracker_b[elem_b]][0].add(elem_a)
        tracker_a[elem_a] = tracker_b[elem_b]
    else:
        tracker_a[elem_a] = index
        tracker_b[elem_b] = index
        result.append([{elem_a}, {elem_b}])
        index += 1
print(result)
Output:
[[{1, 2, 3}, {8, 9, 7}], [{4, 5}, {10}]]
Complexity: O(n)
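If you want the output as the sorted lists shown in the question rather than sets, a tiny post-processing step (my addition, not part of the answer above) would be:

pairs = [(sorted(group_a), sorted(group_b)) for group_a, group_b in result]
# [([1, 2, 3], [7, 8, 9]), ([4, 5], [10])]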
I have a multi-dimensional array in Python where a vector may contain an integer that has already appeared in an earlier vector. For example:
array = [[1,2,3,4],
[2,9,12,4],
[5,6,7,8],
[6,8,12,13]]
I would like to completely remove the vectors that contain any element that has appeared in a previous vector. In this case, vector [2,9,12,4] and vector [6,8,12,13] should be removed, because they contain an element (2 and 6 respectively) that appeared in a previous vector within the array. Note that [6,8,12,13] actually contains several elements that have appeared previously, so the code should handle that scenario as well.
The resulting array should end up being:
array = [[1,2,3,4],
[5,6,7,8]]
I thought I could achieve this with np.unique(array, axis=0), but I couldn't find another function that would take care of this particular kind of uniqueness.
Any thoughts are appreciated.
You can work with an array of the sorted numbers and the corresponding row indices, which looks like this:
number_info = array([[ 0, 1],
[ 0, 2],
[ 1, 2],
[ 0, 3],
[ 0, 4],
[ 1, 4],
[ 2, 5],
[ 2, 6],
[ 3, 6],
[ 2, 7],
[ 2, 8],
[ 3, 8],
[ 1, 9],
[ 1, 12],
[ 3, 12],
[ 3, 13]])
It indicates that rows remove_idx = [2, 5, 8, 11, 14] of this array need to be removed, and they point to rows rows_idx = [1, 1, 3, 3, 3] of the original array. Now, the code:
import numpy as np

array = np.asarray(array)   # the question's nested list as a 2D array
flat_idx = np.repeat(np.arange(array.shape[0]), array.shape[1])
number_info = np.transpose([flat_idx, array.ravel()])
number_info = number_info[np.argsort(number_info[:, 1])]
remove_idx = np.where((np.diff(number_info[:, 1]) == 0) &
                      (np.diff(number_info[:, 0]) > 0))[0] + 1
remove_rows = number_info[remove_idx, 0]
output = np.delete(array, remove_rows, axis=0)
Output:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Here's a quick way to do it with a list comprehension and set intersections:
>>> array = [[1,2,3,4],
... [2,9,12,4],
... [5,6,7,8],
... [6,8,12,13]]
>>> [v for i, v in enumerate(array) if not any(set(a) & set(v) for a in array[:i])]
[[1, 2, 3, 4], [5, 6, 7, 8]]
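The comprehension above rescans all earlier rows for every row, so it is quadratic in the number of rows. For larger inputs, a single running set of seen values keeps it linear; this is a sketch with the same semantics as the answers above (a row is dropped if any of its values appeared in any earlier row, kept or not):

def drop_rows_with_seen(rows):
    """Keep a row only if none of its values appeared in any earlier row."""
    seen = set()
    kept = []
    for row in rows:
        if seen.isdisjoint(row):
            kept.append(row)
        seen.update(row)   # earlier rows count even if they were dropped
    return kept

drop_rows_with_seen(array)   # [[1, 2, 3, 4], [5, 6, 7, 8]]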
Struggling to describe this issue in words, but I have a seemingly simple problem I can't find an answer for.
I want to create an array using values from one list/array and indices from another. I want the shape of the new array to be the same as the index array.
import numpy as np
a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2])
b = [[0, 1], [2, 3, 4], [6, 7, 8, 9, 10]]
result = func(a, b) #some function or operator...
print(result)
[[9, 8], [7, 6, 5], [3, 2, 1, 0, -1]]
Thank you! :)
EDIT:
Good solutions so far, but I would rather do this without a for loop as we are looking at hundreds of thousands of rows and need to keep computing time down. Thanks again :)
You can use a list comprehension:
>>> [a[x[0]:x[-1]+1] for x in b]
[array([9, 8]), array([7, 6, 5]), array([ 3, 2, 1, 0, -1])]
EDIT: Your question indicates that you want a faster option, so you might test the following script to see which is faster for your Python installation:
#!/usr/bin/env python
import timeit

setup = '''
import numpy as np

a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2])
b = [[0, 1], [2, 3, 4], [6, 7, 8, 9, 10]]

def test1():
    return [a[x[0]:x[-1]+1] for x in b]

def test2():
    return [a[idx] for idx in b]
'''

# call the functions in the timed statements so the indexing itself is measured
print(timeit.timeit(setup=setup, stmt='test1()', number=1000000))
print(timeit.timeit(setup=setup, stmt='test2()', number=1000000))
On my machine, the two approaches given you so far run about the same, but hpaulj's answer might be very slightly faster (unless Python is caching data behind the scenes), which may be of more use to you in production. Test it out locally and see if you get a similar or different answer.
Just apply each indexing sublist to a:
In [483]: a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2])
...:
...: b = [[0, 1], [2, 3, 4], [6, 7, 8, 9, 10]]
...:
...:
In [484]: [a[idx] for idx in b]
Out[484]: [array([9, 8]), array([7, 6, 5]), array([ 3, 2, 1, 0, -1])]
The sublists differ in length, so the result cannot be made into a 2d array; it has to remain a list (or, if you insist, a 1d object-dtype array).
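If the motivation for avoiding the Python-level loop is the edit about hundreds of thousands of rows, one option (my sketch, not from either answer above) is a single fancy-index over all the indices at once, followed by np.split to recover the per-sublist arrays:

import numpy as np

a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2])
b = [[0, 1], [2, 3, 4], [6, 7, 8, 9, 10]]

flat = a[np.concatenate(b)]                        # one fancy-index over all sublists
bounds = np.cumsum([len(idx) for idx in b])[:-1]   # where each sublist ends
result = np.split(flat, bounds)
# [array([9, 8]), array([7, 6, 5]), array([ 3,  2,  1,  0, -1])]

np.split still returns a Python list of arrays, so the ragged result remains a list, as noted above.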
I am trying to write a Python function to remove hot pixels in 2D image data. The function should take the mean of the neighbours around each element in the 2D array and conditionally overwrite that element if its value exceeds the mean of its neighbours by a specific amount (for example 3 sigma). This is where I am:
import numpy as np
from scipy import ndimage

def myFunction(values):
    if np.mean(values) + 3*np.std(values) < origin:
        return np.mean(values)

footprint = np.array([[1, 1, 1],
                      [1, 0, 1],
                      [1, 1, 1]])

correctedData = ndimage.generic_filter(data, myFunction, footprint=footprint)
'origin' in the above code is demonstrative. I know it isn't correct; I am just trying to show what I am trying to do. Is there a way to pass the value of the current element to generic_filter?
Thanks!
Your footprint is not passing the central value back to your function.
I find it easier to use size (equivalent to using all ones in the footprint), then deal with everything in the callback function. So in your case I'd extract the central value inside the callback function. Something like this:
import numpy as np
from scipy.ndimage import generic_filter

def despike(values):
    centre = int(values.size / 2)
    # mean and std of the neighbours, excluding the central value
    avg = np.mean([values[:centre], values[centre+1:]])
    std = np.std([values[:centre], values[centre+1:]])
    if avg + 3 * std < values[centre]:
        return avg
    else:
        return values[centre]
Let's make some fake data:
data = np.random.randint(0, 10, (5, 5))
data[2, 2] = 100
This yields (for example):
array([[ 2, 8, 4, 2, 4],
[ 9, 4, 7, 6, 5],
[ 9, 9, 100, 7, 3],
[ 0, 1, 0, 8, 0],
[ 9, 9, 7, 6, 0]])
Now you can apply the filter:
correctedData = generic_filter(data, despike, size=3)
Which removed the spike I added:
array([[2, 8, 4, 2, 4],
[9, 4, 7, 6, 5],
[9, 9, 5, 7, 3],
[0, 1, 0, 8, 0],
[9, 9, 7, 6, 0]])
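One design note: by default generic_filter returns an array with the same dtype as its input, so with integer data the replacement mean is truncated (100 became 5 here rather than 5.25). If you want to keep the fractional value, a simple option is to cast the input first:

# keep fractional replacement values instead of truncating to the input's integer dtype
correctedData = generic_filter(data.astype(float), despike, size=3)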
I need a sample, without replacement, from among all possible tuples of numbers from range(n). That is, I have the collection (0,0), (0,1), ..., (0,n-1), (1,0), (1,1), ..., (1,n-1), ..., (n-1,0), (n-1,1), ..., (n-1,n-1), and I'm trying to get a sample of k of those elements. I am hoping to avoid explicitly building this collection.
I know random.sample(range(n), k) is simple and efficient if I needed a sample from a sequence of numbers rather than tuples of numbers.
Of course, I can explicitly build the list containing all possible (n * n = n^2) tuples, and then call random.sample. But that probably is not efficient if k is much smaller than n^2.
I am not sure if things work the same in Python 2 and 3 in terms of efficiency; I use Python 3.
Depending on how many of these you're selecting, it might be simplest to just keep track of what things you've already picked (via a set) and then re-pick until you get something that you haven't picked already.
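A minimal sketch of that keep-a-set-and-re-pick idea (the function name is mine):

import random

def sample_distinct_pairs(n, k):
    """Draw k distinct pairs from range(n) x range(n) by re-picking on collisions."""
    seen = set()
    while len(seen) < k:
        seen.add((random.randrange(n), random.randrange(n)))
    return list(seen)

This works well while k is small compared to n*n; as k approaches n*n the re-picking starts to dominate, and the divmod approach below avoids that.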
The other option is to just use some simple math:
numbers_in_nxn = random.sample(range(n*n), k) # Use xrange in Python 2.x
tuples_in_nxn = [divmod(x,n) for x in numbers_in_nxn]
You say:
"Of course, I can explicitly build the list containing all possible (n * n = n^2) tuples, and then call random.sample. But that probably is not efficient if k is much smaller than n^2."
Well, how about building the tuple after you have randomly picked one? That is, if you could build all the tuples before choosing which ones to pick, you can equally well do the picking first and build only the ones you picked.
I don't understand exactly how your tuples are supposed to look, but here is an example of the principle (even though your tuples are all of the same length):
Instead of doing this:
>>> import random
>>> all_sequences = [list(range(x)) for x in range(10)]
>>> all_sequences
[[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
>>> random.sample(all_sequences, 3)
[[0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
You would do this:
>>> import random
>>> selection = random.sample(range(10), 3)
>>> [list(range(a)) for a in selection]   # one range per sampled value; output depends on the random sample
Without trying (no python at hand):
random.shuffle(range(n))[:k]
see comments. Didn't sleep enough...
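A working version of that shuffle idea (my sketch, combining it with the divmod decoding above) would be the following, though it materializes all n**2 candidates, which the question hoped to avoid:

import random

def sample_pairs_by_shuffle(n, k):
    """Shuffle all n*n encoded indices and keep the first k, decoded into pairs."""
    encoded = list(range(n * n))
    random.shuffle(encoded)          # random.shuffle works in place and returns None
    return [divmod(x, n) for x in encoded[:k]]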