Unique symmetrical elements Numpy Array - python

I have a Numpy array as this:
[[0 3]
[1 4]
[2 3]
[3 0]
[4 1]
[5 6]
[6 5]
[7 6]]
This is the output of scikit-learn's NearestNeighbors algorithm. I want to remove the symmetric duplicates (for example, [4 1] duplicates [1 4]), to get something like this:
[[0 3]
[1 4]
[2 3]
[6 5]
[7 6]]
I searched a lot but did not find a solution.

One way is to sort each row and then apply np.unique along axis 0:
np.unique(np.sort(a, axis=1), axis=0)
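Note that the one-liner returns each pair in sorted order, so a row such as [7 6] comes back as [6 7]. A minimal sketch of both variants follows, assuming the input is the eight pairs from the question with a leading [0 3] row (an assumption, since the first row is cut off in the post). The names canonical, first_idx and kept are illustrative only.

```python
import numpy as np

# Assumed input: the pairs from the question, with a leading [0 3] row.
a = np.array([[0, 3], [1, 4], [2, 3], [3, 0],
              [4, 1], [5, 6], [6, 5], [7, 6]])

# Sort inside each row so that [3 0] and [0 3] become identical.
canonical = np.sort(a, axis=1)

# Variant 1: unique sorted pairs.
unique_sorted = np.unique(canonical, axis=0)
print(unique_sorted)
# [[0 3]
#  [1 4]
#  [2 3]
#  [5 6]
#  [6 7]]

# Variant 2: keep the first occurrence of each pair in its original orientation.
_, first_idx = np.unique(canonical, axis=0, return_index=True)
kept = a[np.sort(first_idx)]
print(kept)
# [[0 3]
#  [1 4]
#  [2 3]
#  [5 6]
#  [7 6]]
```

The second variant keeps whichever orientation appears first, so it yields [5 6] rather than the [6 5] shown in the question; either row represents the same symmetric pair.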

Related

How to create a list of array combinations lexicographically in numpy?

I have this array and I want to return its unique combinations. I tried meshgrid, but it also produces pairs of repeated elements and reversed pairs:
>> import numpy as np
>> array = np.array([0,1,2,3])
>> combinations = np.array(np.meshgrid(array, array)).T.reshape(-1,2)
>> print(combinations)
[[0 0]
[0 1]
[0 2]
[0 3]
[1 0]
[1 1]
[1 2]
[1 3]
[2 0]
[2 1]
[2 2]
[2 3]
[3 0]
[3 1]
[3 2]
[3 3]]
I want to exclude the pairs with repeated elements ([0,0], [1,1], [2,2], [3,3]) and the reversed pairs: when [2,3] is returned, [3,2] should be excluded from the output.
Take a look at this combination calculator; this is the output I would like, but how can I create it in NumPy?
You could use combinations from itertools:
import numpy as np
from itertools import combinations
array = np.array([0,1,2,3])
combs = np.array(list(combinations(array, 2)))
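If you would rather stay inside NumPy, one possible sketch uses np.triu_indices to pick index pairs strictly above the diagonal, which gives the same pairs in the same lexicographic order. The names i, j and combs_np are illustrative.

```python
import numpy as np
from itertools import combinations

array = np.array([0, 1, 2, 3])

# Index pairs (i, j) with i < j, i.e. strictly above the main diagonal.
i, j = np.triu_indices(len(array), k=1)
combs_np = np.column_stack((array[i], array[j]))
print(combs_np)
# [[0 1]
#  [0 2]
#  [0 3]
#  [1 2]
#  [1 3]
#  [2 3]]

# Same result as the itertools version.
combs_it = np.array(list(combinations(array, 2)))
print(np.array_equal(combs_np, combs_it))  # True
```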

Sorting/Cluster a 2D numpy array in ordered sequence based on multiple columns

I have a 2D numpy array like this:
[[4 5 2]
[5 5 1]
[5 4 5]
[5 3 4]
[5 4 4]
[4 3 2]]
I would like to sort/cluster the rows of this array by their internal ordering pattern, for example row[0]>=row[1]>=row[2], row[0]>=row[2]>row[1], and so on, so that rows with the same pattern end up together.
I tried lexdf = df[np.lexsort((df[:,2], df[:,1], df[:,0]))][::-1], but that is not what I want.
The output of lexsort:
[[5 5 1]
[5 4 5]
[5 4 4]
[5 3 4]
[4 5 2]
[4 3 2]]
The output I would like to have:
[[5 5 1]
[5 4 4]
[4 3 2]
[5 4 5]
[5 3 4]
[4 5 2]]
or cluster it into three parts:
[[5 5 1]
[5 4 4]
[4 3 2]]
[[5 4 5]
[5 3 4]]
[[4 5 2]]
And I would like to apply this to an array with more columns, so it would be better to do it without iteration. Any ideas to generate this kind of output?
I don't know how to do it in numpy, except maybe with some weird hacks of function numpy.split.
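For what it is worth, here is a rough NumPy-only sketch of such a hack. It assumes the clusters are defined by the pattern of <= comparisons between adjacent columns, which is only one reading of the question and does not reproduce the asker's desired clusters exactly (for example, [5 4 4] lands with [5 4 5]). All variable names are illustrative.

```python
import numpy as np

a = np.array([[4, 5, 2],
              [5, 5, 1],
              [5, 4, 5],
              [5, 3, 4],
              [5, 4, 4],
              [4, 3, 2]])

# Comparison pattern per row: 1 where a column is <= the next column, else 0.
pattern = (a[:, :-1] <= a[:, 1:]).astype(np.uint8)

# Label every row with the index of its pattern among the unique patterns.
_, inverse = np.unique(pattern, axis=0, return_inverse=True)
inverse = inverse.ravel()  # guard against differing inverse shapes across NumPy versions

# Stable sort keeps the original row order inside each group.
order = np.argsort(inverse, kind="stable")
counts = np.bincount(inverse)
groups = np.split(a[order], np.cumsum(counts)[:-1])

for g in groups:
    print(g.tolist())
# [[4, 3, 2]]
# [[5, 4, 5], [5, 3, 4], [5, 4, 4]]
# [[4, 5, 2], [5, 5, 1]]
```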
Here is a way to get your groups with python lists:
from itertools import groupby, pairwise
def f(sublist):
    # Comparison pattern between adjacent elements, e.g. [True, False]
    return [x <= y for x, y in pairwise(sublist)]
# NOTE: itertools.pairwise requires python>=3.10
# For python<=3.9, use one of those alternatives:
# * more_itertools.pairwise(sublist)
# * zip(sublist, sublist[1:])
a = [[4, 5, 2],
[5, 5, 1],
[5, 4, 5],
[5, 3, 4],
[5, 4, 4],
[4, 3, 2]]
b = [list(g) for _,g in groupby(sorted(a, key=f), key=f)]
print(b)
# [[[4, 3, 2]],
# [[5, 4, 5], [5, 3, 4], [5, 4, 4]],
# [[4, 5, 2], [5, 5, 1]]]
Note: The groupby+sorted combination is slightly suboptimal, because sorted takes O(n log n) time. A linear-time alternative is to group using a dictionary of lists. See for instance the function itertoolz.groupby from the toolz module.
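A minimal standard-library sketch of the dictionary-of-lists idea, without toolz, could look like this. The helper names pattern and group_by_pattern are made up for illustration, and zip(row, row[1:]) stands in for pairwise so that it also runs on Python 3.9 and older.

```python
from collections import defaultdict

def pattern(row):
    # Tuple (hashable) of comparisons between adjacent elements.
    return tuple(x <= y for x, y in zip(row, row[1:]))

def group_by_pattern(rows):
    # One pass over the rows: append each row to the list for its pattern.
    groups = defaultdict(list)
    for row in rows:
        groups[pattern(row)].append(row)
    return dict(groups)

a = [[4, 5, 2],
     [5, 5, 1],
     [5, 4, 5],
     [5, 3, 4],
     [5, 4, 4],
     [4, 3, 2]]

groups = group_by_pattern(a)
for key, rows in groups.items():
    print(key, rows)
# (True, False) [[4, 5, 2], [5, 5, 1]]
# (False, True) [[5, 4, 5], [5, 3, 4], [5, 4, 4]]
# (False, False) [[4, 3, 2]]
```

Unlike the sorted version, the groups here appear in order of first occurrence rather than in sorted key order.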

How to split a two-dimensional array by the second dimension? (Python)

I have an array of points, and I want to split these into two arrays by the second dimension:
points_right = points[points[:, 0] > p0[0]]
points_left = points[points[:, 0] < p0[0]]
How can I split these points in one loop?
I think np.split is what you're looking for, just use axis=1.
Example splitting a 2x4 matrix:
import numpy as np
pts = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
left_pts, right_pts = np.split(pts, indices_or_sections=2, axis=1)
The original matrix (pts):
[[1 2 3 4]
[5 6 7 8]]
left_pts:
[[1 2]
[5 6]]
right_pts:
[[3 4]
[7 8]]
https://numpy.org/doc/stable/reference/generated/numpy.split.html
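If the goal is instead to partition the rows by a condition on the first column, as in the question's two lines, a possible sketch is to compute the boolean mask once and reuse its negation. The sample points and p0 below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical sample data: one (x, y) point per row.
points = np.array([[0.0, 1.0],
                   [2.0, 3.0],
                   [4.0, 5.0],
                   [1.0, 7.0]])
p0 = np.array([1.5, 0.0])  # hypothetical reference point

# Evaluate the comparison once, then index with the mask and its negation.
mask = points[:, 0] > p0[0]
points_right = points[mask]
points_left = points[~mask]  # note: also contains points with x == p0[0]

print(points_right.tolist())  # [[2.0, 3.0], [4.0, 5.0]]
print(points_left.tolist())   # [[0.0, 1.0], [1.0, 7.0]]
```

The only difference from the question's code is that ~mask means x <= p0[0], so points exactly at p0[0] end up on the left instead of being dropped.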

python how to remove duplicate from column A while keeping the max value from B

I would like to remove duplicates from column A (time scale) while keeping the max value from column B (power).
I tried the following:
import numpy as np
import matplotlib.pyplot as plt
data_file = np.loadtxt('test.txt',delimiter=' ')
time = data_file[:, 0]
power = data_file[:, 1]
max_power = np.max(power, axis=0)  # overall maximum only, not one maximum per time value
Thank you for your support!
You can use pandas:
import pandas as pd
# Group rows by column 0 and keep the maximum of column 1 within each group
pd.DataFrame(data_file).groupby(0).max().reset_index().to_numpy()
Example input and output:
data_file:
[[5 8]
[1 2]
[3 9]
[5 1]
[1 9]
[9 0]
[5 8]
[0 7]
[0 4]
[8 6]]
output:
[[0 7]
[1 9]
[3 9]
[5 8]
[8 6]
[9 0]]
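If pandas is not available, one possible NumPy-only sketch combines np.unique with np.maximum.at. It uses the example input above; the names keys, inverse and maxima are illustrative.

```python
import numpy as np

data_file = np.array([[5, 8], [1, 2], [3, 9], [5, 1], [1, 9],
                      [9, 0], [5, 8], [0, 7], [0, 4], [8, 6]])

# Unique values of column 0, plus the group index of every row.
keys, inverse = np.unique(data_file[:, 0], return_inverse=True)

# Start every group at the global minimum, then take an unbuffered
# element-wise maximum so repeated group indices are all applied.
maxima = np.full(keys.shape, data_file[:, 1].min(), dtype=data_file.dtype)
np.maximum.at(maxima, inverse, data_file[:, 1])

result = np.column_stack((keys, maxima))
print(result)
# [[0 7]
#  [1 9]
#  [3 9]
#  [5 8]
#  [8 6]
#  [9 0]]
```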

Difficulty understanding why both these slicing methods for matrices are not equivalent in numpy

import numpy as np
a = np.array([[1, 2, 3],[4, 5, 6]])
print(a[0:1,1])
print(a[:,1])
Output:
[2]
[2 5]
I apologize for the relatively basic question, but I've been unable to find the answer on google. Why aren't these two equivalent?
Furthermore, the first snippet still works when I change the range to 0:2, or 0:200, or any stop value larger than the number of rows in the matrix. Why is this the case?
IMHO, it is better to think of the integer indices in a slice as positions in between the "cells".
So, if you slice 0:1, you get only what lies between positions 0 and 1, that is, the element at index 0.
A slice excludes its end index, so 0:1 is equivalent to selecting only row 0. If you want the first two rows, use 0:2.
import numpy as np
a = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
temp1 = a[0:1,:]
temp2 = a[0:2,:]
temp3 = a[0:3,:]
print(a[0:1,:])
print(a[0:2,:])
print(a[0:3,:])
print(temp1[:,1])
print(temp2[:,1])
print(temp3[:,1])
print(a[:,1])
The code above outputs:
[[1 2 3]]
[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]
[7 8 9]]
[2]
[2 5]
[2 5 8]
[2 5 8]
I hope this example answers your question.
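As for the second part of the question (why 0:200 works), a slice whose stop lies past the end of an axis is clamped to the axis length, whereas a plain integer index out of range raises an error. A small sketch with the 2x3 array from the question, using illustrative variable names:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# A stop beyond the number of rows is clamped, so this equals a[:, 1].
clamped = a[0:200, 1]
print(clamped)   # [2 5]

# A slice keeps the axis (1-D result); an integer index drops it (scalar).
one_row = a[0:1, 1]
scalar = a[0, 1]
print(one_row.shape, scalar)   # (1,) 2

# An out-of-range integer index, unlike a slice, is an error.
try:
    a[200, 1]
    out_of_range_raises = False
except IndexError:
    out_of_range_raises = True
print(out_of_range_raises)   # True
```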
