I am trying to extract several values at once from an array but I can't seem to find a way to do it in a one-liner in Numpy.
Simply put, considering an array:
a = numpy.arange(10)
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
I would like to be able to extract, say, 2 values, skip the next 2, extract the 2 following values etc. This would result in:
array([0, 1, 4, 5, 8, 9])
This is an example but I am ideally looking for a way to extract x values and skip y others.
I thought this could be done with slicing, doing something like:
a[:2:2]
but it only returns 0, which is the expected behavior for that slice, not the result I want.
I know I could obtain the expected result by combining several slicing operations (similarly to Numpy Array Slicing) but I was wondering if I was not missing some numpy feature.
If you want to avoid creating copies and allocating new memory, you could use a sliding window view of two elements (numpy >= 1.20):
win = np.lib.stride_tricks.sliding_window_view(a, 2)
array([[0, 1],
[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9]])
And then take every 4th window and flatten it:
win[::4].ravel()
array([0, 1, 4, 5, 8, 9])
(Note that ravel() returns a copy here, since the strided window view is not contiguous; win[::4] itself is still a view.)
Or directly go with the more dangerous as_strided, but heed the warnings in the documentation:
np.lib.stride_tricks.as_strided(a, shape=(3, 2), strides=(32, 8))  # strides are in bytes; 32 and 8 assume 8-byte (int64) elements
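For reference, the same view can be built without hardcoding byte counts by deriving the strides from the array itself; a minimal sketch, assuming the keep-2/skip-2 pattern from the question:
import numpy as np
a = np.arange(10)
x, y = 2, 2                                # keep x values, skip y values
step = x + y
n_chunks = (a.shape[0] - x) // step + 1    # number of complete "keep" chunks
view = np.lib.stride_tricks.as_strided(
    a,
    shape=(n_chunks, x),
    strides=(step * a.strides[0], a.strides[0]),
)
print(view.ravel())                        # [0 1 4 5 8 9]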
You can use a modulo operator:
x = 2 # keep
y = 2 # skip
out = a[np.arange(a.shape[0])%(x+y)<x]
Output: array([0, 1, 4, 5, 8, 9])
Output with x = 2 ; y = 3:
array([0, 1, 5, 6])
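If this pattern comes up repeatedly, the mask can be wrapped in a small helper; a minimal sketch (the name take_skip is just for illustration):
import numpy as np

def take_skip(arr, keep, skip):
    """Keep `keep` elements, skip `skip` elements, repeating until the end."""
    mask = np.arange(arr.shape[0]) % (keep + skip) < keep
    return arr[mask]

a = np.arange(10)
print(take_skip(a, 2, 2))  # [0 1 4 5 8 9]
print(take_skip(a, 2, 3))  # [0 1 5 6]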
I have two 1D arrays of the same length, like this:
import numpy as np
a = np.array([1, 1, 1, 2, 2, 3, 4, 5])
b = np.array([7, 7, 8, 8, 9, 8, 10, 10])
The values of a are increasing while those of b are random.
I want to pair them by their values, following the steps below:
Pick the first unique value ([1]) of array a and get the unique numbers ([7, 8]) of array b at the same indices.
If any of those paired numbers ([8]) appear again in b, also pick the numbers of a at those indices.
Then, for any newly paired number ([2]) that appears again in a, the numbers of b at those indices are selected as well.
Finally, the result should be:
[1, 2, 3] is paired with [7, 8, 9]
[4, 5] is paired with [10]
It looks like there is no easy vectorised (loop-free) solution, since this is a graph-theory problem of finding connected components. If you still want a performant script that works fast on big data, you can use the igraph library, which is written in C.
TL;DR
I assume your input corresponds to edges of some graph:
>>> np.transpose([a, b])
array([[ 1, 7],
[ 1, 7],
[ 1, 8],
[ 2, 8],
[ 2, 9],
[ 3, 8],
[ 4, 10],
[ 5, 10]])
So your vertices are:
>>> np.unique(np.transpose([a, b]))
array([ 1, 2, 3, 4, 5, 7, 8, 9, 10])
And you would be quite happy (at least at the beginning) to recognise communities, like:
tags = np.transpose([a, b, communities])
>>> tags
array([[ 1, 7, 0],
[ 1, 7, 0],
[ 1, 8, 0],
[ 2, 8, 0],
[ 2, 9, 0],
[ 3, 8, 0],
[ 4, 10, 1],
[ 5, 10, 1]])
so that you have vertices (1, 2, 3, 7, 8, 9) included in community number 0 and vertices (4, 5, 10) included in community number 1.
Unfortunately, igraph expects vertex ids to start at 0 and be contiguous (no gaps), so labels like 1 to 10 can't be used directly. You need to store the original labels, then relabel the vertices so that the edges become:
vertices_old, inv = np.unique(np.transpose([a,b]), return_inverse=True)
edges_new = inv.reshape(-1, 2)
>>> vertices_old
array([ 1, 2, 3, 4, 5, 7, 8, 9, 10]) #new ones are: [0, 1, 2, ..., 8]
>>> edges_new
array([[0, 5],
[0, 5],
[0, 6],
[1, 6],
[1, 7],
[2, 6],
[3, 8],
[4, 8]], dtype=int64)
The next step is to find communities using igraph (pip install python-igraph). You can run the following:
import igraph as ig
graph = ig.Graph(edges = edges_new)
communities = graph.clusters().membership #type: list
communities = np.array(communities)
>>> communities
array([0, 0, 0, 1, 1, 0, 0, 0, 1]) #tags of nodes [1 2 3 4 5 7 8 9 10]
And then retrieve tags of source vertices (as well as tags of target vertices):
>>> communities = communities[edges_new[:, 0]] #or [:, 1]
array([0, 0, 0, 0, 0, 0, 1, 1])
After you find the communities, the second part of the solution turns out to be a typical groupby problem. You can do it in pandas:
import pandas as pd
def get_part(source, communities):
    part_edges = np.transpose([source, communities])
    part_idx = pd.DataFrame(part_edges).groupby([1]).indices.values()  # might contain duplicated source values
    part = [np.unique(source[idx]) for idx in part_idx]
    return part
>>> get_part(a, communities), get_part(b, communities)
([array([1, 2, 3]), array([4, 5])], [array([7, 8, 9]), array([10])])
Final Code
import igraph as ig
import numpy as np
import pandas as pd
def get_part(source, communities):
    '''find set of nodes for each community'''
    part_edges = np.transpose([source, communities])
    part_idx = pd.DataFrame(part_edges).groupby([1]).indices.values()  # might contain duplicated source values
    part = [np.unique(source[idx]) for idx in part_idx]
    return part
a = np.array([1, 1, 1, 2, 2, 3, 4, 5])
b = np.array([7, 7, 8, 8, 9, 8, 10, 10])
vertices_old, inv = np.unique(np.transpose([a,b]), return_inverse=True)
edges_new = inv.reshape(-1, 2)
graph = ig.Graph(edges = edges_new)
communities = np.array(graph.clusters().membership)
communities = communities[edges_new[:,0]] #or communities[edges_new[:,1]]
>>> get_part(a, communities), get_part(b, communities)
([array([1, 2, 3]), array([4, 5])], [array([7, 8, 9]), array([10])])
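For completeness, the same connected-components idea can also be expressed with scipy.sparse.csgraph instead of igraph; a sketch, assuming scipy is available:
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

a = np.array([1, 1, 1, 2, 2, 3, 4, 5])
b = np.array([7, 7, 8, 8, 9, 8, 10, 10])

# Relabel vertices to 0..n-1, exactly as in the igraph version above.
vertices_old, inv = np.unique(np.transpose([a, b]), return_inverse=True)
edges = inv.reshape(-1, 2)
n = len(vertices_old)

# Build a sparse adjacency matrix and label its connected components.
adj = coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n))
n_comp, labels = connected_components(adj, directed=False)

# Group the original a and b values by the component of their edge.
edge_labels = labels[edges[:, 0]]
groups_a = [np.unique(a[edge_labels == k]) for k in range(n_comp)]
groups_b = [np.unique(b[edge_labels == k]) for k in range(n_comp)]
print(groups_a, groups_b)
# [array([1, 2, 3]), array([4, 5])] [array([7, 8, 9]), array([10])]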
I tried doing this by iterating over both arrays simultaneously and keeping track of which element is associated with which index of the result. Let me know if this works for you.
a = [1, 1, 1, 2, 2, 3, 4, 5]
b = [7, 7, 8, 8, 9, 8, 10, 10]
tracker_a = dict()
tracker_b = dict()
result = []
index = 0
for elem_a, elem_b in zip(a, b):
    if elem_a in tracker_a:
        result[tracker_a[elem_a]][1].add(elem_b)
        tracker_b[elem_b] = tracker_a[elem_a]
    elif elem_b in tracker_b:
        result[tracker_b[elem_b]][0].add(elem_a)
        tracker_a[elem_a] = tracker_b[elem_b]
    else:
        tracker_a[elem_a] = index
        tracker_b[elem_b] = index
        result.append([{elem_a}, {elem_b}])
        index += 1
print(result)
Output:
[[{1, 2, 3}, {8, 9, 7}], [{4, 5}, {10}]]
Complexity: O(n)
I have a NumPy array, for example:
>>> import numpy as np
>>> x = np.random.randint(0, 10, size=(5, 5))
>>> x
array([[4, 7, 3, 7, 6],
[7, 9, 5, 7, 8],
[3, 1, 6, 3, 2],
[9, 2, 3, 8, 4],
[0, 9, 9, 0, 4]])
Is there a way to get a view (or copy) that contains indices 1:3 of the first row, indices 2:4 of the second row and indices 3:5 of the fourth row?
So, in the above example, I wish to get:
>>> # What to write here?
array([[7, 3],
[5, 7],
[8, 4]])
Obviously, I would like a general method that would also work efficiently for large multi-dimensional arrays (and not only for the toy example above).
Try:
>>> np.array([x[0, 1:3], x[1, 2:4], x[3, 3:5]])
array([[7, 3],
[5, 7],
[8, 4]])
You can use numpy.lib.stride_tricks.as_strided as long as the offsets between rows are uniform:
# How far to step along the rows
offset = 1
# How wide the chunk of each row is
width = 2
view = np.lib.stride_tricks.as_strided(x, shape=(x.shape[0], width), strides=(x.strides[0] + offset * x.strides[1],) + x.strides[1:])
The result is guaranteed to be a view into the original data, not a copy.
Since as_strided is ridiculously powerful, be very careful how you use it. For example, make absolutely sure that the view does not go out of bounds in the last few rows.
If you can avoid it, try not to assign anything into a view returned by as_strided. Assignment just increases the dangers of unpredictable behavior and crashing a thousandfold if you don't know exactly what you're doing.
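As a follow-up to the warning above, here is a sketch (under the same uniform-offset assumption) that caps the number of rows so the view never reads past the end of the buffer:
import numpy as np

x = np.arange(25).reshape(5, 5)
offset, width = 1, 2
# Row i starts at column i * offset, so we need i * offset + width <= x.shape[1].
n_rows = min(x.shape[0], (x.shape[1] - width) // offset + 1)
view = np.lib.stride_tricks.as_strided(
    x,
    shape=(n_rows, width),
    strides=(x.strides[0] + offset * x.strides[1], x.strides[1]),
)
print(view)  # x[0, 0:2], x[1, 1:3], x[2, 2:4], x[3, 3:5]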
I guess something like this :D
In:
import numpy as np
x = np.random.randint(0, 10, size=(5, 5))
Out:
array([[7, 3, 3, 1, 9],
[6, 1, 3, 8, 7],
[0, 2, 2, 8, 4],
[8, 8, 1, 8, 8],
[1, 2, 4, 3, 4]])
In:
list_of_indices = [[0, 1, 3], [1, 2, 4], [3, 3, 5]]  # [row, start, stop]
def func(array, row, start, stop):
    return array[row, start:stop]
for row, start, stop in list_of_indices:
    print(func(x, row, start, stop))
Out:
[3 3]
[3 8]
[3 4]
So you can modify it for your needs. Good luck!
I would extract diagonal vectors and stack them together, like this:
def diag_slice(x, start, end):
    n_rows = min(*x.shape) - end + 1
    columns = [x.diagonal(i)[:n_rows, None] for i in range(start, end)]
    return np.hstack(columns)
In [37]: diag_slice(x, 1, 3)
Out[37]:
array([[7, 3],
[5, 7],
[3, 2]])
For the general case it will be hard to beat a row by row list comprehension:
In [28]: idx = np.array([[0,1,3],[1,2,4],[4,3,5]])
In [29]: [x[i,j:k] for i,j,k in idx]
Out[29]: [array([7, 8]), array([2, 0]), array([9, 2])]
If the resulting arrays are all the same size, they can be combined into one 2d array:
In [30]: np.array(_)
Out[30]:
array([[7, 8],
[2, 0],
[9, 2]])
Another approach is to concatenate the indices beforehand. I won't get into the details, but it creates something like this (a fuller sketch of building these index arrays is at the end of this answer):
In [27]: x[[0,0,1,1,3,3],[1,2,2,3,3,4]]
Out[27]: array([7, 8, 2, 0, 3, 8])
Selecting from different rows complicates this 2nd approach. Conceptually the first is simpler. Past experience suggests the speed is about the same.
For uniform length slices, something like the as_strided trick may be faster, but it requires more understanding.
Some masking-based approaches have also been suggested. But the details are more complicated, so I'll leave those to people like @Divakar who have specialized in them.
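Building on the flat-index idea above, here is a sketch (my notation, assuming all slices have the same length) of how the row and column index arrays could be generated from (row, start, stop) triplets:
import numpy as np

x = np.arange(25).reshape(5, 5)
idx = np.array([[0, 1, 3], [1, 2, 4], [3, 3, 5]])      # [row, start, stop]
width = idx[0, 2] - idx[0, 1]                           # assume uniform slice length
rows = np.repeat(idx[:, 0], width)                      # [0 0 1 1 3 3]
cols = (idx[:, 1, None] + np.arange(width)).ravel()     # [1 2 2 3 3 4]
print(x[rows, cols].reshape(-1, width))                 # [[ 1  2] [ 7  8] [18 19]]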
Someone has already pointed out the as_strided tricks, and yes, you should really use it with caution.
Here is a broadcast / fancy index approach which is less efficient than as_strided but still works pretty well IMO
window_size, step_size = 2, 1
# index within window
index = np.arange(2)
# offset
offset = np.arange(1, 4, step_size)
# for your case it's [0, 1, 3], I'm not sure how to generalize it without further information
fancy_row = np.array([0, 1, 3]).reshape(-1, 1)
fancy_col = offset.reshape(-1, 1) + index
# array([[1, 2],
#        [2, 3],
#        [3, 4]])
x[fancy_row, fancy_col]
I know you're supposed to give examples when you ask questions here, but I can't really think of anything that wouldn't involve pasting a massive project worth of code, so I'll just try to describe this as well as possible.
I'm working on a project that uses keypoints generated by OpenPose. After some preprocessing to simplify everything, I end up with data formatted like this: [x0, y0, c0, x1, y1, c1...], where there are 18 points total, the x's and y's are their coordinates, and the c's are confidence values. I want to take a nested list that holds the keypoints for a single person in the above format for each frame, and output a new nested list of lists made up of the weighted-average x's and y's (the weights being the confidence values for each point) together with the average confidences, grouped by second instead of by frame, in the same format as above.
I have already converted the original list into a 3-dimensional list, with each second holding each of its frames, each of which holds its keypoint list. I know that I can write code myself to do all of this without using numpy.average(), but I was hoping that I wouldn't have to, because it quickly becomes confusing. Instead, I was wondering if there were a way I could iterate over each second, using said method, in a reasonably simple manner, and just append the resulting lists to a new list, like this:
out = []
for second in lst:
    out.append(average(second, axis=1, weights=?, other params?))
Again, I'm sorry for not giving an example of some sort.
Maybe you could get some inspiration from this code:
import numpy as np
def pose_average(sequence):
    x, y, c = sequence[0::3], sequence[1::3], sequence[2::3]
    x_avg = np.average(x, weights=c)
    y_avg = np.average(y, weights=c)
    return x_avg, y_avg
sequence = [2, 4, 1, 5, 6, 3, 5, 2, 1]
pose_average(sequence)
>>> (4.4, 4.8)
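For reference, with weights c = [1, 3, 1] this computes x_avg = (2*1 + 5*3 + 5*1) / (1 + 3 + 1) = 4.4 and y_avg = (4*1 + 6*3 + 2*1) / 5 = 4.8.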
For multiple sequences of grouped poses:
data = [[1, 2, 3, 2, 3, 4, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7, 8, 9], [4, 1, 2, 5, 3, 3, 4, 1, 2]]
out = [ pose_average(seq) for seq in data ]
out
>>> [(2.1666666666666665, 3.1666666666666665),
(5.0, 6.0),
(4.428571428571429, 1.8571428571428572)]
Edit
Assuming that:
data is a list of sequence
a sequence is a list of grouped poses (for example grouped by seconds)
a pose is the coordinates of the joint positions: [x1, y1, c1, x2, y2, c2, ...]
the slightly modified code is now:
import numpy as np
data = [
[[1, 2, 3, 2, 3, 4, 3, 4, 5], [9, 2, 3, 4, 5, 6, 7, 8, 9], [4, 1, 2, 5, 3, 3, 4, 1, 2], [5, 3, 4, 1, 10, 6, 5, 0, 0]],
[[6, 9, 11, 0, 8, 6, 1, 5, 11], [3, 5, 4, 2, 0, 2, 0, 8, 8], [1, 5, 9, 5, 1, 0, 6, 6, 6]],
[[9, 4, 7, 0, 2, 1], [9, 4, 7, 0, 2, 1], [9, 4, 7, 0, 2, 1]]
]
def pose_average(sequence):
    sequence = np.asarray(sequence)
    x, y, c = sequence[:, 0::3], sequence[:, 1::3], sequence[:, 2::3]
    x_avg = np.average(x, weights=c, axis=0)
    y_avg = np.average(y, weights=c, axis=0)
    return x_avg, y_avg
out = [ pose_average(seq) for seq in data ]
out
>>> [(array([4.83333333, 2.78947368, 5.375 ]),
array([2.16666667, 5.84210526, 5.875 ])),
(array([3.625, 0.5 , 1.88 ]), array([6.83333333, 6. , 6.2 ])),
(array([9., 0.]), array([4., 2.]))]
x_avg is now the array of x positions for each point, averaged over the sequence and weighted by c.
Let's consider a 2d-array A
2 3 5 7
2 3 5 7
1 7 1 4
5 8 6 0
2 3 5 7
The first, second and last lines are identical. The algorithm I'm looking for should return the number of identical rows for each distinct row (i.e. the number of duplicates of each row). If the script could easily be modified to also count identical columns, that would be great.
I use an inefficient naive algorithm to do that:
import numpy
A=numpy.array([[2, 3, 5, 7],[2, 3, 5, 7],[1, 7, 1, 4],[5, 8, 6, 0],[2, 3, 5, 7]])
counts = []
seen = []
i = 0
end = len(A)
while i < end:
    # only count rows that haven't been counted yet
    if not any(numpy.array_equal(A[i], row) for row in seen):
        seen.append(A[i])
        numberID = 1
        j = i + 1
        while j < end:
            if numpy.array_equal(A[i, :], A[j, :]):
                numberID += 1
            j += 1
        counts.append(numberID)
    i += 1
print(counts)  # [3, 1, 1]
Expected result:
array([3, 1, 1]) # number of identical rows per unique row
My algorithm essentially uses native Python loops on top of numpy, which is why it is inefficient. Thanks for your help.
In numpy >= 1.9.0, np.unique has a return_counts keyword argument you can combine with the solution here to get the counts:
b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])
>>> unq_a
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> unq_cnt
array([1, 3, 1])
In an older numpy, you can replicate what np.unique does, which would look something like:
a_view = np.array(A, copy=True)
a_view = a_view.view(np.dtype((np.void, a_view.dtype.itemsize * a_view.shape[1]))).ravel()
a_view.sort()
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
a_unq = A[a_flag]
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)
>>> a_unq
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> a_cnt
array([1, 3, 1])
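As a side note, in numpy >= 1.13 np.unique accepts an axis argument, so the void-view trick is no longer needed; a minimal sketch:
import numpy as np

A = np.array([[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]])
# axis=0 treats each row as a single element to be compared.
unq_rows, counts = np.unique(A, axis=0, return_counts=True)
print(unq_rows)  # [[1 7 1 4] [2 3 5 7] [5 8 6 0]]
print(counts)    # [1 3 1]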
You can lexsort on the row entries, which gives you the indices for traversing the rows in sorted order; the subsequent duplicate search then becomes O(n) (after an O(n log n) sort) rather than O(n^2). Note that np.lexsort treats the last key as the primary sort key, so the rows are 'alphabetized' right to left rather than left to right.
In [9]: a
Out[9]:
array([[2, 3, 5, 7],
[2, 3, 5, 7],
[1, 7, 1, 4],
[5, 8, 6, 0],
[2, 3, 5, 7]])
In [10]: lexsort(a.T)
Out[10]: array([3, 2, 0, 1, 4])
In [11]: a[lexsort(a.T)]
Out[11]:
array([[5, 8, 6, 0],
[1, 7, 1, 4],
[2, 3, 5, 7],
[2, 3, 5, 7],
[2, 3, 5, 7]])
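To turn the sorted order into counts, one possible sketch (following the same boundary-flag idea as the previous answer):
import numpy as np

a = np.array([[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]])
order = np.lexsort(a.T)
sorted_rows = a[order]
# Flag the first row of each group of identical rows.
is_new = np.concatenate(([True], np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)))
counts = np.diff(np.concatenate((np.nonzero(is_new)[0], [len(a)])))
print(sorted_rows[is_new])  # the distinct rows
print(counts)               # [1 1 3]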
You can use the Counter class from the collections module for this.
It works like this :
x = [2, 2, 1, 5, 2]
from collections import Counter
c = Counter(x)
print(c)
Output : Counter({2: 3, 1: 1, 5: 1})
The only issue you will face is that, in your case, every value of x is itself a list, which is not hashable.
If you convert every value of x into a tuple, it works:
x = [(2, 3, 5, 7),(2, 3, 5, 7),(1, 7, 1, 4),(5, 8, 6, 0),(2, 3, 5, 7)]
from collections import Counter
c = Counter(x)
print(c)
Output : Counter({(2, 3, 5, 7): 3, (5, 8, 6, 0): 1, (1, 7, 1, 4): 1})
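Applied to the numpy array directly, the conversion can be done with map; a minimal sketch:
import numpy as np
from collections import Counter

A = np.array([[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]])
# Each row becomes a hashable tuple of plain Python ints before counting.
counts = Counter(map(tuple, A.tolist()))
print(counts)  # Counter({(2, 3, 5, 7): 3, (1, 7, 1, 4): 1, (5, 8, 6, 0): 1})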