Numpy lexicographic ordering

I'd like to lexicographically sort the following array a (get index positions), but I'm having trouble understanding the NumPy results:
>>> a = np.asarray([[1, 1, 1, 2, 1, 2], [2, 1, 2, 3, 1, 0], [1, 2, 3, 3, 2, 2]])
>>> a
array([[1, 1, 1, 2, 1, 2],
       [2, 1, 2, 3, 1, 0],
       [1, 2, 3, 3, 2, 2]])
>>> np.lexsort(a)
array([0, 5, 1, 4, 2, 3])
For instance, I don't understand why [1, 2, 1] (a[:,0]) is sort-index 0 while [1, 1, 2] (a[:,1]) is index 5, even though it should be smaller than [1, 2, 1].

The order of significance for keys is the opposite of what you expected: np.lexsort treats the last key (here, the last row) as the most significant one.
To get the expected result, just flip the matrix upside down:
>>> np.lexsort(np.flipud(a))
array([1, 4, 0, 2, 5, 3])

np.lexsort gives you the indices of the columns in lexicographic order. However, it compares the columns starting from the end: the last element of each column has the highest priority, then the one before it, and so on. That's why in your example column 5 comes before column 1:
[2, 0, 2] < [1, 1, 2] because 2 = 2 (last elements) and 0 < 1 (middle elements).
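One way to see the key order is to compare np.lexsort with a plain Python sort on explicit tuples. This is a minimal sketch, assuming NumPy is imported as np; the names order, by_tuple and top_down are illustrative and not from the original posts.

```python
import numpy as np

a = np.asarray([[1, 1, 1, 2, 1, 2],
                [2, 1, 2, 3, 1, 0],
                [1, 2, 3, 3, 2, 2]])

# np.lexsort treats the LAST row as the primary key, then the row before it, etc.
order = np.lexsort(a)

# The same ordering with Python's stable sort: build each column's key
# from the bottom row up, i.e. (a[2, j], a[1, j], a[0, j]).
by_tuple = sorted(range(a.shape[1]), key=lambda j: tuple(a[::-1, j]))

# Reversing the rows makes the first row the primary key,
# which is the "read each column top to bottom" order.
top_down = np.lexsort(a[::-1])

print(order.tolist())     # [0, 5, 1, 4, 2, 3]
print(by_tuple)           # [0, 5, 1, 4, 2, 3]
print(top_down.tolist())  # [1, 4, 0, 2, 5, 3]
```

Here a[::-1] reverses the rows, which has the same effect as np.flipud(a).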

Related

Inserting a value in a list of lists

My data is as follows,
data = [[2, 1, 2, 2], [2, 2, 1, 5], [1, 2, 2, 2], [2, 1, 2, 5], [2, 5, 2, 1]]
I would like to transform this so that a 0 is inserted at positions 0, 1, 2, 3 and 4 of the respective inner lists, and the result looks like this:
new_Data = [[0, 2, 1, 2, 2], [2, 0, 2, 1, 5], [1, 2, 0, 2, 2], [2, 1, 2, 0, 5], [2, 5, 2, 1, 0]]
I have tried the following:
a = 0
for n in range(len(mRco1)-1):
    mRco1[n][n] = [a]
But it does not seem to work.
Can anyone suggest how I can go about this?

Use the list.insert() method:
for i in range(len(data)):
    data[i].insert(i, 0)
Result:
>>> print(data)
[[0, 2, 1, 2, 2], [2, 0, 2, 1, 5], [1, 2, 0, 2, 2], [2, 1, 2, 0, 5], [2, 5, 2, 1, 0]]

You'd like to iterate over the lists in data and, for the n-th list, insert a 0 at position n. You can use the list's insert method for that, with the following loop:
for i in range(len(data)):
    data[i].insert(i, 0)
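For contrast with the attempt in the question, the sketch below shows why item assignment does not help here: it overwrites an element instead of making room for a new one (and the last row has no index 4 to assign to at all). It also gives a variant that leaves the input untouched. The names attempt and with_zeros are made up for illustration.

```python
data = [[2, 1, 2, 2], [2, 2, 1, 5], [1, 2, 2, 2], [2, 1, 2, 5], [2, 5, 2, 1]]

# Item assignment overwrites the element at that index; the list does not grow.
attempt = [row[:] for row in data]   # copy the rows so data stays unchanged
attempt[0][0] = 0
print(attempt[0])  # [0, 1, 2, 2]  (the leading 2 is lost)

# Build new rows by slicing around position i instead of mutating in place.
with_zeros = [row[:i] + [0] + row[i:] for i, row in enumerate(data)]
print(with_zeros)
# [[0, 2, 1, 2, 2], [2, 0, 2, 1, 5], [1, 2, 0, 2, 2], [2, 1, 2, 0, 5], [2, 5, 2, 1, 0]]
```

The slicing variant is handy when the original data is still needed afterwards; the insert loop is the simpler choice when in-place modification is fine.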

How to generate indices like [0,2,4,1,3,5] without using an explicit loop for reorganizing rows of a tensor in PyTorch?

Suppose I have a Tensor like
a = torch.tensor([[3, 1, 5, 0, 4, 2],
                  [2, 1, 3, 4, 5, 0],
                  [0, 4, 5, 1, 2, 3],
                  [3, 1, 4, 5, 0, 2],
                  [3, 5, 4, 2, 0, 1],
                  [5, 3, 0, 4, 1, 2]])
and I want to reorganize the rows of the tensor by applying the transformation a[c] where
c = torch.tensor([0,2,4,1,3,5])
to get
b = torch.tensor([[3, 1, 5, 0, 4, 2],
                  [0, 4, 5, 1, 2, 3],
                  [3, 5, 4, 2, 0, 1],
                  [2, 1, 3, 4, 5, 0],
                  [3, 1, 4, 5, 0, 2],
                  [5, 3, 0, 4, 1, 2]])
To do this, I want to generate the tensor c so that I can apply this transformation irrespective of the size of tensor a and the step size (which I have taken to be 2 in this example for simplicity). Can anyone tell me how to generate such a tensor for the general case without using an explicit for loop in PyTorch?

You can use torch.index_select, so:
b = torch.index_select(a, 0, c)
The explanation in the official docs is pretty clear.

I also came up with another solution, which reorganizes the rows of tensor a into tensor b without generating the index tensor c:
step = 2
b = a.view(-1,step,a.size(-1)).transpose(0,1).reshape(-1,a.size(-1))

After thinking a little longer, I came up with the solution below for generating the indices:
step = 2
idx = torch.arange(0, a.size(0), step)
# idx = tensor([0, 2, 4])
idx = idx.repeat(a.size(0) // idx.size(0))
# idx = tensor([0, 2, 4, 0, 2, 4])
incr = torch.arange(0, step)
# incr = tensor([0, 1])
incr = incr.repeat_interleave(a.size(0) // incr.size(0))
# incr = tensor([0, 0, 0, 1, 1, 1])
c = incr + idx
# c = tensor([0, 2, 4, 1, 3, 5])
After this, the tensor c can be used to get the tensor b:
b = a[c.long()]
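The index tensor can also be produced in one expression by laying out 0..n-1 as a grid and reading it column by column. This is a minimal sketch, assuming the number of rows is divisible by the step; strided_order is a hypothetical helper name, not a PyTorch function.

```python
import torch

def strided_order(n, step):
    """Return the indices 0, step, 2*step, ..., then 1, step+1, ... as a 1-D tensor.

    Assumes n is a multiple of step; view() raises an error otherwise.
    """
    # Each row of the grid is a run of `step` consecutive indices;
    # transposing and flattening reads the grid column by column.
    return torch.arange(n).view(-1, step).t().reshape(-1)

c = strided_order(6, 2)
print(c)  # tensor([0, 2, 4, 1, 3, 5])
```

With the tensor from the question, b = a[strided_order(a.size(0), step)] would then reorder the rows.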

Append "i" element with "j" element with "k" element in python

I have 3 arrays that correspond to coordinates and data of a sparse matrix:
cx=[3, 4, 3, 0, 1, 2, 1, 2]
rx=[0, 0, 1, 2, 2, 2, 3, 4]
data=[0.73646372, 0.42238729, 0.20987735, 0.33721646, 0.66935198, 0.13533819, 0.64143482, 0.004114]
And I need this output:
[[3, 0, 0.73646372],
 [4, 0, 0.42238729],
 [3, 1, 0.20987735],
....
I have tried .append() with no luck :(

Why not use zip()?
cx = [3, 4, 3, 0, 1, 2, 1, 2]
rx = [0, 0, 1, 2, 2, 2, 3, 4]
data = [0.73646372, 0.42238729, 0.20987735, 0.33721646, 0.66935198, 0.13533819, 0.64143482, 0.004114]
for row in zip(cx, rx, data):
    print(list(row))
However, if you want a traditional solution:
cx = [3, 4, 3, 0, 1, 2, 1, 2]
rx = [0, 0, 1, 2, 2, 2, 3, 4]
data = [0.73646372, 0.42238729, 0.20987735, 0.33721646, 0.66935198, 0.13533819, 0.64143482, 0.004114]
new_list = []
for i in range(len(cx)):
    new_list.append([cx[i], rx[i], data[i]])
print(new_list)
More compact solutions:
k = [list(row) for row in zip(cx, rx, data)]
j = [[cx[i], rx[i], data[i]] for i in range(len(cx))]

You can use zip():
new = [list(i) for i in zip(cx, rx, data)]
Output:
[[3, 0, 0.73646372], [4, 0, 0.42238729], [3, 1, 0.20987735], [0, 2, 0.33721646], [1, 2, 0.66935198], [2, 2, 0.13533819], [1, 3, 0.64143482], [2, 4, 0.004114]]

Major vote by column?

I have a 20x20 2D array, and for every column I want to get the value that occurs most often (excluding zeros), i.e. the value that receives the majority vote.
I can do that for a single column like this:
>>> np.unique(p[:,0][p[:,0] != 0], return_counts=True)
(array([ 3, 21], dtype=int16), array([1, 3]))
>>> nums, cnts = np.unique(p[:,0][p[:,0] != 0], return_counts=True)
>>> nums[cnts.argmax()]
21
Just for completeness, we can extend the earlier proposed method to a loop-based solution for 2D arrays:
# p is the 2D input array
for i in range(p.shape[1]):
    nums, cnts = np.unique(p[:,i][p[:,i] != 0], return_counts=True)
    output_per_col = nums[cnts.argmax()]
How do I do that for all columns without using a for loop?

We can use bincount2D_vectorized to get binned counts per column, where each integer is its own bin. Then slice from the second count onwards (the first count is for 0), take the argmax, and add 1 to compensate for the slicing. That's our desired output.
Here is the solution as a sample run:
In [116]: p # input array
Out[116]:
array([[4, 3, 4, 1, 1, 0, 2, 0],
       [4, 0, 0, 0, 0, 0, 4, 0],
       [3, 1, 3, 4, 3, 1, 4, 3],
       [4, 4, 3, 3, 1, 1, 3, 2],
       [3, 0, 3, 0, 4, 4, 4, 0],
       [3, 0, 0, 3, 2, 0, 1, 4],
       [4, 0, 3, 1, 3, 3, 2, 0],
       [3, 3, 0, 0, 2, 1, 3, 1],
       [2, 4, 0, 0, 2, 3, 4, 2],
       [0, 2, 4, 2, 0, 2, 2, 4]])
In [117]: bincount2D_vectorized(p.T)[:,1:].argmax(1)+1
Out[117]: array([3, 3, 3, 1, 2, 1, 4, 2])
The transpose is needed because bincount2D_vectorized computes the bincounts per row. For the alternative problem of getting the majority value per row, simply skip the transpose.
Also, feel free to explore other options in that linked Q&A to get 2D-bincounts.
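The helper bincount2D_vectorized comes from the linked Q&A and is not reproduced in this answer. The sketch below is one possible way to write such a row-wise bincount, assuming a 2D array of non-negative integers; bincount_per_row is a stand-in name and may differ from the linked implementation.

```python
import numpy as np

def bincount_per_row(arr):
    """Count occurrences of each integer 0..arr.max() in every row.

    Assumes arr is a 2D array of non-negative integers.
    """
    n_bins = arr.max() + 1
    # Shift row i by i * n_bins so that every row lands in its own block of bins.
    shifted = arr + np.arange(arr.shape[0])[:, None] * n_bins
    counts = np.bincount(shifted.ravel(), minlength=arr.shape[0] * n_bins)
    return counts.reshape(-1, n_bins)

p = np.array([[4, 3, 4, 1, 1, 0, 2, 0],
              [4, 0, 0, 0, 0, 0, 4, 0],
              [3, 1, 3, 4, 3, 1, 4, 3],
              [4, 4, 3, 3, 1, 1, 3, 2],
              [3, 0, 3, 0, 4, 4, 4, 0],
              [3, 0, 0, 3, 2, 0, 1, 4],
              [4, 0, 3, 1, 3, 3, 2, 0],
              [3, 3, 0, 0, 2, 1, 3, 1],
              [2, 4, 0, 0, 2, 3, 4, 2],
              [0, 2, 4, 2, 0, 2, 2, 4]])

# Drop the zero bin, then take the most frequent remaining value per column.
votes = bincount_per_row(p.T)[:, 1:].argmax(1) + 1
print(votes)  # [3 3 3 1 2 1 4 2]
```

Two caveats of this sketch: argmax returns the first maximum, so ties go to the smaller value, and a column containing only zeros would be reported as 1.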

Weird / Wrong output of np.argsort()

I was working with numpy and argsort when I encountered a strange (?) behavior of argsort:
>>> array = [[0, 1, 2, 3, 4, 5],
...          [444, 4, 8, 3, 1, 10],
...          [2, 5, 8, 999, 1, 4]]
>>> np.argsort(array, axis=0)
array([[0, 0, 0, 0, 1, 2],
       [2, 1, 1, 1, 2, 0],
       [1, 2, 2, 2, 0, 1]], dtype=int64)
The first 4 values of each list are pretty clear to me: argsort is doing its job right. But the last 2 values are confusing, as it seems to sort those values wrong.
Shouldn't the output of argsort be:
array([[0, 0, 0, 0, 2, 1],
       [2, 1, 1, 1, 0, 2],
       [1, 2, 2, 2, 1, 0]], dtype=int64)

I think the issue is with what you think argsort is outputting. Let's focus on a simpler 1D example:
arr = np.array([5, 10, 4])
The result of np.argsort is the list of indices into the original array that would put the elements in sorted order:
[2, 0, 1]
Let's take a look at what the actual sorted values are to understand why:
[
    4,   # at index 2 in the original array
    5,   # at index 0 in the original array
    10,  # at index 1 in the original array
]
It seems like you are imagining the inverse operation, where argsort will tell you what index in the output each element will move to. You can obtain those indices by applying argsort to the result of argsort.
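To make the two readings concrete, the sketch below applies argsort twice to the array from the question. It assumes NumPy is imported as np and passes kind="stable" so that tied values keep their original order; order and ranks are illustrative names.

```python
import numpy as np

array = np.array([[0, 1, 2, 3, 4, 5],
                  [444, 4, 8, 3, 1, 10],
                  [2, 5, 8, 999, 1, 4]])

# order[k, j]: row index of the k-th smallest value in column j.
order = np.argsort(array, axis=0, kind="stable")

# ranks[i, j]: position that array[i, j] takes once column j is sorted.
ranks = np.argsort(order, axis=0)

print(order.tolist())
# [[0, 0, 0, 0, 1, 2], [2, 1, 1, 1, 2, 0], [1, 2, 2, 2, 0, 1]]
print(ranks.tolist())
# [[0, 0, 0, 0, 2, 1], [2, 1, 1, 1, 0, 2], [1, 2, 2, 2, 1, 0]]
```

The second result is the array the question expected, which supports the reading that the asker was thinking of ranks rather than sort indices.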

The output is correct. With axis=0, np.argsort sorts along the first axis, i.e. it sorts each column independently. So for the array
array = [[0, 1, 2, 3, 4, 5],
         [444, 4, 8, 3, 1, 10],
         [2, 5, 8, 999, 1, 4]]
axis=0 compares the elements within each column, (0, 444, 2), (1, 4, 5), (2, 8, 8), (3, 3, 999), (4, 1, 1), (5, 10, 4), and gives the array of indices:
>>> np.argsort(array, axis=0)
array([[0, 0, 0, 0, 1, 2],
       [2, 1, 1, 1, 2, 0],
       [1, 2, 2, 2, 0, 1]])
So the last 2 columns of the result come from the elements (4, 1, 1), which give the indices (1, 2, 0), and (5, 10, 4), which give (2, 0, 1).
See the np.argsort documentation for details.
