Major vote by column? - python

I have a 20x20 2D array, and for every column I want the value that occurs most often (excluding zeros), i.e. the value that receives the majority vote.
I can do that for a single column like this:
>>> np.unique(p[:,0][p[:,0] != 0], return_counts=True)
(array([ 3, 21], dtype=int16), array([1, 3]))
>>> nums, cnts = np.unique(p[:,0][p[:,0] != 0], return_counts=True)
>>> nums[cnts.argmax()]
21
For completeness, the single-column method extends to a loop-based solution for the whole 2D array:
import numpy as np
# p is the 2D input array; collect the majority value of each column
output_per_col = []
for i in range(p.shape[1]):
    col = p[:, i]
    nums, cnts = np.unique(col[col != 0], return_counts=True)
    output_per_col.append(nums[cnts.argmax()])
How do I do that for all columns without using a for loop?

We can use bincount2D_vectorized to get binned counts per column, with one bin per integer value. Then slice off the first count (which is the count of zeros), take the argmax of each row of counts, and add 1 to compensate for the slicing. That's our desired output.
Hence, the solution shown as a sample case run -
In [116]: p # input array
Out[116]:
array([[4, 3, 4, 1, 1, 0, 2, 0],
[4, 0, 0, 0, 0, 0, 4, 0],
[3, 1, 3, 4, 3, 1, 4, 3],
[4, 4, 3, 3, 1, 1, 3, 2],
[3, 0, 3, 0, 4, 4, 4, 0],
[3, 0, 0, 3, 2, 0, 1, 4],
[4, 0, 3, 1, 3, 3, 2, 0],
[3, 3, 0, 0, 2, 1, 3, 1],
[2, 4, 0, 0, 2, 3, 4, 2],
[0, 2, 4, 2, 0, 2, 2, 4]])
In [117]: bincount2D_vectorized(p.T)[:,1:].argmax(1)+1
Out[117]: array([3, 3, 3, 1, 2, 1, 4, 2])
The transpose is needed because bincount2D_vectorized computes 2D bincounts per row. So, for the alternative problem of finding the majority value per row, simply skip the transpose.
Also, feel free to explore the other options in that linked Q&A for computing 2D bincounts.
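The bincount2D_vectorized helper comes from the linked Q&A and is not reproduced here. The following is a minimal sketch of how such a helper could be built on np.bincount. It assumes p contains only non-negative integers and at least one non-zero value. The idea is to shift each row's values into its own block of bins so that a single bincount call covers every row. The function name majority_vote_per_col is hypothetical.

```python
import numpy as np

def bincount2D_vectorized(a):
    # Sketch (assumption): per-row bincount of a 2D array of non-negative ints.
    # Row r's values are shifted by r * N so every row gets its own block of N bins.
    N = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    counts = np.bincount(a_offs.ravel(), minlength=a.shape[0] * N)
    return counts.reshape(-1, N)

def majority_vote_per_col(p):
    # Hypothetical helper: drop the zero bin, take argmax per column, shift back by 1.
    # Ties go to the smallest value; an all-zero column would wrongly report 1.
    return bincount2D_vectorized(p.T)[:, 1:].argmax(1) + 1

p = np.array([[4, 3, 4, 1, 1, 0, 2, 0],
              [4, 0, 0, 0, 0, 0, 4, 0],
              [3, 1, 3, 4, 3, 1, 4, 3],
              [4, 4, 3, 3, 1, 1, 3, 2],
              [3, 0, 3, 0, 4, 4, 4, 0],
              [3, 0, 0, 3, 2, 0, 1, 4],
              [4, 0, 3, 1, 3, 3, 2, 0],
              [3, 3, 0, 0, 2, 1, 3, 1],
              [2, 4, 0, 0, 2, 3, 4, 2],
              [0, 2, 4, 2, 0, 2, 2, 4]])
print(majority_vote_per_col(p))  # [3 3 3 1 2 1 4 2]
```

On ties, this picks the smallest value, just like the np.unique loop. np.unique returns sorted values, and argmax returns the first maximum.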

Related

How to generate indices like [0,2,4,1,3,5] without an explicit loop, to reorganize the rows of a tensor in PyTorch?

Suppose I have a Tensor like
a = torch.tensor([[3, 1, 5, 0, 4, 2],
[2, 1, 3, 4, 5, 0],
[0, 4, 5, 1, 2, 3],
[3, 1, 4, 5, 0, 2],
[3, 5, 4, 2, 0, 1],
[5, 3, 0, 4, 1, 2]])
and I want to reorganize the rows of the tensor by applying the transformation a[c] where
c = torch.tensor([0,2,4,1,3,5])
to get
b = torch.tensor([[3, 1, 5, 0, 4, 2],
[0, 4, 5, 1, 2, 3],
[3, 5, 4, 2, 0, 1],
[2, 1, 3, 4, 5, 0],
[3, 1, 4, 5, 0, 2],
[5, 3, 0, 4, 1, 2]])
To do this, I want to generate the tensor c so that the transformation works irrespective of the size of tensor a and of the step size (which I have set to 2 in this example for simplicity). Can anyone tell me how to generate such a tensor for the general case without an explicit for loop in PyTorch?
You can use torch.index_select, so:
b = torch.index_select(a, 0, c)
The explanation in the official docs is pretty clear.
I also came up with another solution, which reorganizes the rows of tensor a into tensor b without generating the index tensor c at all (it requires a.size(0) to be divisible by step):
step = 2
b = a.view(-1,step,a.size(-1)).transpose(0,1).reshape(-1,a.size(-1))
Thinking a little longer, I came up with the following solution for generating the indices:
import torch
step = 2
idx = torch.arange(0, a.size(0), step)
# idx = tensor([0, 2, 4])
idx = idx.repeat(a.size(0) // idx.size(0))
# idx = tensor([0, 2, 4, 0, 2, 4])
incr = torch.arange(0, step)
# incr = tensor([0, 1])
incr = incr.repeat_interleave(a.size(0) // incr.size(0))
# incr = tensor([0, 0, 0, 1, 1, 1])
c = incr + idx
# c = tensor([0, 2, 4, 1, 3, 5])
After this, the tensor c can be used to get the tensor b by using
b = a[c.long()]
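The same index tensor can also be produced in a single expression. The following is a minimal sketch that assumes a.size(0) is divisible by step. It lays out 0..n-1 as an (n // step, step) grid, transposes it so each row holds one residue class (0, step, 2*step, ...), and flattens the result. The helper name interleave_indices is hypothetical. The approach is the index-level counterpart of the view/transpose trick above.

```python
import torch

def interleave_indices(n, step):
    # Hypothetical helper; assumes n % step == 0.
    # (n // step, step) grid -> transpose -> flatten row by row.
    return torch.arange(n).view(-1, step).t().reshape(-1)

a = torch.tensor([[3, 1, 5, 0, 4, 2],
                  [2, 1, 3, 4, 5, 0],
                  [0, 4, 5, 1, 2, 3],
                  [3, 1, 4, 5, 0, 2],
                  [3, 5, 4, 2, 0, 1],
                  [5, 3, 0, 4, 1, 2]])
c = interleave_indices(a.size(0), 2)
print(c)  # tensor([0, 2, 4, 1, 3, 5])
b = a[c]  # arange already yields int64 indices, so no .long() is needed
```

With step=3 and six rows, it gives tensor([0, 3, 1, 4, 2, 5]), which is the same result as the repeat/repeat_interleave construction.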

Compute distances between all points in array efficiently using Python

I have a list of N=3 points like this as input:
points = [[1, 1], [2, 2], [4, 4]]
I wrote this code to compute all pairwise distances between the points in my list, where dist = min(|x1 - x2|, |y1 - y2|):
N = len(points)
distances = []
for i in range(N - 1):
    for j in range(i + 1, N):
        dist = min(abs(points[i][0] - points[j][0]), abs(points[i][1] - points[j][1]))
        distances.append(dist)
print(distances)
My output will be the array distances with all the distances saved in it: [1, 3, 2]
It works fine with N=3, but I would like to compute it more efficiently and be able to set N=10^5.
I am also trying to use numpy and scipy, but I am having trouble replacing the loops and choosing the correct method.
Can anybody help me please? Thanks in advance
The numpythonic solution
To compute your distances using the full power of Numpy, and do it
substantially faster:
Convert your points to a Numpy array:
pts = np.array(points)
Then run:
dist = np.abs(pts[np.newaxis, :, :] - pts[:, np.newaxis, :]).min(axis=2)
Here the result is a square array.
But if you want to get a list of elements above the diagonal,
just like your code generates, you can run:
dist2 = dist[np.triu_indices(pts.shape[0], 1)].tolist()
I ran this code for the following 9 points:
points = [[1, 1], [2, 2], [4, 4], [3, 5], [2, 8], [4, 10], [3, 7], [2, 9], [4, 7]]
For the above data, the result saved in dist (a full array) is:
array([[0, 1, 3, 2, 1, 3, 2, 1, 3],
[1, 0, 2, 1, 0, 2, 1, 0, 2],
[3, 2, 0, 1, 2, 0, 1, 2, 0],
[2, 1, 1, 0, 1, 1, 0, 1, 1],
[1, 0, 2, 1, 0, 2, 1, 0, 1],
[3, 2, 0, 1, 2, 0, 1, 1, 0],
[2, 1, 1, 0, 1, 1, 0, 1, 0],
[1, 0, 2, 1, 0, 1, 1, 0, 2],
[3, 2, 0, 1, 1, 0, 0, 2, 0]])
and the list of elements above the diagonal is:
[1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 0, 2, 1, 0, 2, 1, 2, 0, 1, 2, 0, 1, 1, 0, 1, 1,
2, 1, 0, 1, 1, 1, 0, 1, 0, 2]
How much faster is my code
It turns out that even for a sample as small as the one I used (9
points), my code runs 2 times faster. For a sample of 18 points
(not presented here), it runs 6 times faster.
This speedup is gained even though my function computes
"2 times more than needed", i.e. it generates the full
array, whereas the part below the diagonal is just a mirror image
of the part above it (which is what your code computes).
For a larger number of points the difference should be much bigger.
Run your own test on a bigger sample of points (say 100 points) and
report how many times faster my code was.
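One caveat applies to the N=10^5 target. The broadcasting approach builds an (N, N, 2) intermediate of 2*10^10 elements, about 160 GB as int64. Even the condensed upper-triangle result has N(N-1)/2, roughly 5*10^9 entries. The following is a minimal sketch that keeps the same metric but processes rows in blocks. The peak temporary is then (chunk, N, 2) rather than (N, N, 2). The function name min_abs_diff_condensed and the chunk parameter are hypothetical.

```python
import numpy as np

def min_abs_diff_condensed(points, chunk=1024):
    # Hypothetical helper: min(|x1 - x2|, |y1 - y2|) for every pair i < j,
    # returned in the same row-major order as the original nested loops.
    pts = np.asarray(points)
    n = len(pts)
    cols = np.arange(n)
    parts = []
    for start in range(0, n, chunk):
        block = pts[start:start + chunk]                              # (b, 2)
        d = np.abs(block[:, None, :] - pts[None, :, :]).min(axis=2)   # (b, n)
        rows = np.arange(start, start + len(block))[:, None]          # (b, 1)
        # Boolean indexing walks the mask row by row, keeping only j > i.
        parts.append(d[cols[None, :] > rows])
    if not parts:
        return np.empty(0, dtype=pts.dtype)
    return np.concatenate(parts)

points = [[1, 1], [2, 2], [4, 4], [3, 5], [2, 8], [4, 10], [3, 7], [2, 9], [4, 7]]
print(min_abs_diff_condensed(points, chunk=4).tolist())
```

For the 9-point sample, this should reproduce the 36-element upper-diagonal list shown above. For N=10^5 you would likely consume each block inside the loop (aggregate it or write it to disk) rather than concatenating, since the full condensed output alone is tens of gigabytes.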

Weird / wrong output of np.argsort()

While working with numpy's argsort, I encountered some strange (?) behavior:
>>> array = [[0, 1, 2, 3, 4, 5],
[444, 4, 8, 3, 1, 10],
[2, 5, 8, 999, 1, 4]]
>>> np.argsort(array, axis=0)
array([[0, 0, 0, 0, 1, 2],
[2, 1, 1, 1, 2, 0],
[1, 2, 2, 2, 0, 1]], dtype=int64)
The first 4 values of each row are pretty clear to me; argsort is doing its job. But the last 2 values are confusing, as it seems to be sorting the values wrongly.
Shouldn't the output of argsort be:
array([[0, 0, 0, 0, 2, 1],
[2, 1, 1, 1, 0, 2],
[1, 2, 2, 2, 1, 0]], dtype=int64)
I think the issue is with what you think argsort is outputting. Let's focus on a simpler 1D example:
arr = np.array([5, 10, 4])
The result of np.argsort will be the indices from the original array to make the elements sorted:
[2, 0, 1]
Let's take a look at what the actual sorted values are to understand why:
[
4, # at index 2 in the original array
5, # at index 0 in the original array
10, # at index 1 in the original array
]
It seems like you are imagining the inverse operation, where argsort will tell you what index in the output each element will move to. You can obtain those indices by applying argsort to the result of argsort.
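Here is a short sketch of that double argsort. The second argsort inverts the permutation that the first one returns, so each element's result is the position it moves to in sorted order. kind="stable" is added so that tied values (such as the two 8s) keep their original order, which makes the ranks deterministic.

```python
import numpy as np

arr = np.array([5, 10, 4])
order = np.argsort(arr)    # indices that sort arr: [2, 0, 1]
ranks = np.argsort(order)  # where each element moves to: [1, 2, 0]

array = np.array([[0, 1, 2, 3, 4, 5],
                  [444, 4, 8, 3, 1, 10],
                  [2, 5, 8, 999, 1, 4]])
# Rank of each element within its column (axis=0).
col_ranks = np.argsort(np.argsort(array, axis=0, kind="stable"), axis=0)
print(col_ranks)
```

For the array in the question, col_ranks is [[0, 0, 0, 0, 2, 1], [2, 1, 1, 1, 0, 2], [1, 2, 2, 2, 1, 0]], which is exactly the output the question expected.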
The output is correct. With axis=0, np.argsort sorts along the first axis, i.e. it sorts the elements of each column independently. So for the array
>>> array = [[0, 1, 2, 3, 4, 5],
...          [444, 4, 8, 3, 1, 10],
...          [2, 5, 8, 999, 1, 4]]
axis=0 compares the column elements (0, 444, 2), (1, 4, 5), (2, 8, 8), (3, 3, 999), (4, 1, 1) and (5, 10, 4), which gives the array of indices:
np.argsort(array, axis=0)
array([[0, 0, 0, 0, 1, 2],
[2, 1, 1, 1, 2, 0],
[1, 2, 2, 2, 0, 1]])
So the last 2 values in your question come from the columns (4, 1, 1), which give the indices (1, 2, 0), and (5, 10, 4), which give (2, 0, 1).
See the documentation for np.argsort.

Most efficient way to get sorted indices based on two numpy arrays

How can I get the sorted indices of a numpy array (distance), considering only the positions selected by another numpy array (val)?
For example, consider the two numpy arrays val and distance below:
val = np.array([[10, 0, 0, 0, 0],
[0, 0, 10, 0, 10],
[0, 10, 10, 0, 0],
[0, 0, 0, 10, 0],
[0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
[3, 2, 1, 2, 3],
[2, 1, 0, 1, 2],
[3, 2, 1, 2, 3],
[4, 3, 2, 3, 4]])
The distances where val == 10 are 4, 1, 3, 1, 0, 2. I would like to sort these into 0, 1, 1, 2, 3, 4 and return the corresponding indices into the distance array.
Returning something like:
(array([2, 1, 2, 3, 1, 0], dtype=int64), array([2, 2, 1, 3, 4, 0], dtype=int64))
or:
(array([2, 2, 1, 3, 1, 0], dtype=int64), array([2, 1, 2, 3, 4, 0], dtype=int64))
Since the second and third elements both have distance 1, I guess their indices are interchangeable.
I tried combinations of np.where, np.argsort, np.argpartition and np.unravel_index, but can't seem to get it working.
Here's one way with masking -
In [20]: mask = val==10
In [21]: np.argwhere(mask)[distance[mask].argsort()]
Out[21]:
array([[2, 2],
[1, 2],
[2, 1],
[3, 3],
[1, 4],
[0, 0]])
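To get the (rows, cols) tuple format asked for in the question instead of an (N, 2) array, the same masking idea can be combined with np.nonzero. The following is a sketch of that variant. np.nonzero and distance[mask] both enumerate the masked positions in row-major order, so a single argsort orders both. kind="stable" is an added choice that keeps equal distances in row-major order.

```python
import numpy as np

val = np.array([[10, 0, 0, 0, 0],
                [0, 0, 10, 0, 10],
                [0, 10, 10, 0, 0],
                [0, 0, 0, 10, 0],
                [0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
                     [3, 2, 1, 2, 3],
                     [2, 1, 0, 1, 2],
                     [3, 2, 1, 2, 3],
                     [4, 3, 2, 3, 4]])

mask = val == 10
rows, cols = np.nonzero(mask)                      # masked positions, row-major
order = np.argsort(distance[mask], kind="stable")  # sort those distances
result = (rows[order], cols[order])
print(result)
```

With the stable sort, this yields rows [2, 1, 2, 3, 1, 0] and cols [2, 2, 1, 3, 4, 0], which matches the first expected format in the question.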

making an array of n columns where each successive row increases by one

In numpy, I would like to be able to input n for rows and m for columns and end with the array that looks like:
[(0,0,0,0),
(1,1,1,1),
(2,2,2,2)]
So that would be a 3x4. Each column is just a copy of the previous one and the row increases by one each time. As an example:
The input would be 4, then 6, and the output would be an array
[(0,0,0,0,0,0),
(1,1,1,1,1,1),
(2,2,2,2,2,2),
(3,3,3,3,3,3)]
4 rows and 6 columns where the row increases by one each time. Thanks for your time.
So many possibilities...
In [51]: n = 4
In [52]: m = 6
In [53]: np.tile(np.arange(n), (m, 1)).T
Out[53]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [54]: np.repeat(np.arange(n).reshape(-1,1), m, axis=1)
Out[54]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [55]: np.outer(np.arange(n), np.ones(m, dtype=int))
Out[55]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Here's one more. The neat trick here is that the values are not duplicated--only memory for the single sequence [0, 1, 2, ..., n-1] is allocated.
In [67]: from numpy.lib.stride_tricks import as_strided
In [68]: seq = np.arange(n)
In [69]: rep = as_strided(seq, shape=(n,m), strides=(seq.strides[0],0))
In [70]: rep
Out[70]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Be careful with the as_strided function. If you don't get the arguments right, you can crash Python.
To see that seq has not been copied, change seq in place, and then check rep:
In [71]: seq[1] = 99
In [72]: rep
Out[72]:
array([[ 0, 0, 0, 0, 0, 0],
[99, 99, 99, 99, 99, 99],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3]])
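If you want the same no-copy view without the risks of as_strided, np.broadcast_to is an alternative. The following sketch shows it. It returns a read-only view whose broadcast (column) axis has a stride of 0, like the as_strided call above. Because it is read-only, accidental writes raise an error instead of corrupting the shared data.

```python
import numpy as np

n, m = 4, 6
seq = np.arange(n)
# Read-only view of seq: no data is copied, and the column axis has stride 0.
rep = np.broadcast_to(seq[:, None], (n, m))
print(rep)
print(rep.flags.writeable)  # False
print(rep.strides[1])       # 0
# Take an explicit copy if you need an independent, writable array.
rep_copy = rep.copy()
```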
import numpy as np
def foo(n, m):
    return np.array([np.arange(n)] * m).T
Natively (no Python lists):
import numpy
rows, columns = 4, 6
numpy.arange(rows).reshape(-1, 1).repeat(columns, axis=1)
#>>> array([[0, 0, 0, 0, 0, 0],
#>>> [1, 1, 1, 1, 1, 1],
#>>> [2, 2, 2, 2, 2, 2],
#>>> [3, 3, 3, 3, 3, 3]])
You can easily do this using built-in Python functions. The program counts from 0 to 3, converting each number to a string and repeating that string 6 times.
print([6 * str(n) for n in range(0, 4)])
Here is the output.
ks-MacBook-Pro:~ kyle$ pbpaste | python
['000000', '111111', '222222', '333333']
One more, for fun:
np.zeros((n, m), dtype=int) + np.arange(n, dtype=int)[:, None]
As has been mentioned, there are many ways to do this.
Here's what I'd do:
import numpy as np
def makearray(m, n):
    A = np.empty((m, n), dtype=int)
    # Assign the sequence to every row of the transpose, i.e. every column of A
    A.T[:] = np.arange(m)
    return A
Here's an amusing alternative that will work if you aren't going to be changing the contents of the array.
It should save some memory.
Be careful though because this doesn't allocate a full array, it will have multiple entries pointing to the same memory address.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def makearray(m, n):
    A = np.arange(m)
    return as_strided(A, strides=(A.strides[0], 0), shape=(m, n))
In either case, as I have written them, a 3x4 array can be created by makearray(3, 4)
Using count from the built-in module itertools:
>>> from itertools import count
>>> rows = 4
>>> columns = 6
>>> cnt = count()
>>> [[next(cnt)]*columns for i in range(rows)]
[[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]]
You can simply do:
>>> nc=5
>>> nr=4
>>> [[k]*nc for k in range(nr)]
[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]
Several other possibilities using an (n,1) array:
a = np.arange(n)[:,None] (or np.arange(n).reshape(-1,1))
a*np.ones((m),dtype=int)
a[:,np.zeros((m),dtype=int)]
If you are going to combine it with an (m,) array anyway, just leave it as (n,1) and let broadcasting expand it for you.
