Python equivalent of Unique function in Matlab - python

I have a (654 x 2) matrix of integers in which many rows are just permutations of the same values (e.g. one row is [2,5] and another is [5,2]). I need a Python function that treats such rows as duplicates of each other and helps me delete the one that comes later when sorted.

Sort the elements within each sublist.
a = [[1,2], [3, 4], [2,1]]
#Sort each sublist and convert it to a tuple so that it is hashable and can be added to a set
li = [tuple(sorted(x)) for x in a]
print(li)
#[(1, 2), (3, 4), (1, 2)]
Then use set to eliminate duplicates (note that a set does not preserve the original row order).
#Convert tuple back to list
unique_li = [list(t) for t in set(li)]
print(unique_li)
#[[1, 2], [3, 4]]
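Because the question asks to drop the row that comes later, an order-preserving variant may be closer to what is wanted. The following is a minimal sketch, not a definitive implementation; the helper name drop_permuted_duplicates is hypothetical and does not appear in the original answer.

```python
def drop_permuted_duplicates(rows):
    """Keep the first row for each unordered combination of values (hypothetical helper)."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row))  # order-insensitive key, e.g. [2, 1] -> (1, 2)
        if key not in seen:
            seen.add(key)
            kept.append(list(row))
    return kept


a = [[1, 2], [3, 4], [2, 1]]
result = drop_permuted_duplicates(a)
print(result)  # [[1, 2], [3, 4]]
```

Unlike the set-based version, the kept rows retain their original values and order, so [5, 2] stays [5, 2] if it appears before [2, 5].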

You could use numpy to sort your array's rows.
import numpy as np
a = np.array([[1,2], [3, 4], [2,1]])
a
array([[1, 2],
[3, 4],
[2, 1]])
a.sort(axis=1)  # sorts each row in place
a
array([[1, 2],
[3, 4],
[1, 2]])
Then use array_equal to compare rows for equality:
np.array_equal(a[0], a[1])
False
np.array_equal(a[0], a[2])
True
And then remove rows using:
np.delete(a, 2, 0)
array([[1, 2],
[3, 4]])
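For a 654-row matrix, comparing rows pairwise by hand does not scale well. A possible vectorized sketch, assuming a NumPy version whose np.unique supports the axis argument (1.13 or later), sorts each row and then keeps the first occurrence of every distinct sorted row. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [2, 1]])

# Sort within each row so that [2, 1] and [1, 2] become identical.
canonical = np.sort(a, axis=1)

# Unique rows of the canonical form; first_idx holds the first occurrence of each.
_, first_idx = np.unique(canonical, axis=0, return_index=True)

# Sorting the indices restores the original row order.
keep = np.sort(first_idx)
deduped = a[keep]
print(deduped)
```

Indexing the original array with keep (rather than the sorted copy) leaves the surviving rows unmodified.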

Related

Finding the intersection

I want the intersection of x and y. Is there any way I can get the output in the format below?
I do not want to use a for loop, since x can be very large.
x=np.array([[1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]])
y=np.array([1,2,0,9,9])
I want output in format:
np.array([[1],[1,2],[2]])
The output can also be a list of lists.
Also consider the case where y is 2D as well, e.g. np.array([[1,2,0,9,9],[1,5,6,8,9]]).
You can use NumPy's intersect1d on each row of x:
arr = [np.intersect1d(z, y).tolist() for z in x]
print(arr) # [[1], [1, 2], [2]]
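The comprehension above still iterates over the rows in Python. As a hedged alternative, the membership test itself can be done in one vectorized call with np.isin (available from NumPy 1.13); only the final step, which produces rows of different lengths, remains a per-row operation. This sketch covers the 1D y case only.

```python
import numpy as np

x = np.array([[1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]])
y = np.array([1, 2, 0, 9, 9])

# True wherever an element of x also appears in y (computed for all rows at once).
mask = np.isin(x, y)

# The result is ragged, so collect the matching values row by row.
result = [np.unique(row[m]).tolist() for row, m in zip(x, mask)]
print(result)  # [[1], [1, 2], [2]]
```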

How can I sum up all values with the same index in a dictionary where each key has a nested list as its value?

I have a dictionary in which each key has a list of lists (a nested list) as its value. For example, imagine we have:
x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}
My question is how can I access each element of the dictionary and concatenate those with same index: for example first list from all keys:
[1,2] from first key +
[2,1] from second and
[1,5] from third one
How can I do this?
You can access the nested lists while iterating through the dictionary, append them to a new list, and then apply the sum function.
Code:
x={1: [[1,2],[3,5]] , 2:[[2,1],[2,6]], 3:[[1,5],[5,4]]}
ans=[]
for key in x:
    ans += x[key][0]
print(sum(ans))
Output:
12
Assuming you want a list of the first elements, you can do:
>>> x={1: [[1,2],[3,5]] , 2:[[2,1],[2,6]], 3:[[1,5],[5,4]]}
>>> y = [a[0] for a in x.values()]
>>> y
[[1, 2], [2, 1], [1, 5]]
If you want the second element, you can use a[1], etc.
The output you expect is not entirely clear (do you want to sum? concatenate?), but what seems clear is that you want to handle the values as matrices.
You can use numpy for that:
summing the values
import numpy as np
sum(map(np.array, x.values())).tolist()
output:
[[4, 8], [10, 15]] # [[1+2+1, 2+1+5], [3+2+5, 5+6+4]]
concatenating the matrices (horizontally)
import numpy as np
np.hstack(list(map(np.array, x.values()))).tolist()
output:
[[1, 2, 2, 1, 1, 5], [3, 5, 2, 6, 5, 4]]
As explained in How to iterate through two lists in parallel?, zip does exactly that: iterates over a few iterables at the same time and generates tuples of matching-index items from all iterables.
In your case, the iterables are the values of the dict. So just unpack the values to zip:
x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}
for y in zip(*x.values()):
    print(y)
Gives:
([1, 2], [2, 1], [1, 5])
([3, 5], [2, 6], [5, 4])
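If the goal is an element-wise sum of the same-index sublists, the zip idea can be applied twice. This is a small sketch under that assumption, since the question does not state the exact expected output.

```python
x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}

# zip(*x.values()) groups the sublists that share an index across all keys;
# the inner zip(*group) then lines up their elements position by position.
summed = [[sum(col) for col in zip(*group)] for group in zip(*x.values())]
print(summed)  # [[4, 8], [10, 15]]
```

This gives the same result as the NumPy sum shown earlier, without needing the sublists to be converted to arrays.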

Is there any function in python which can perform the inverse of numpy.repeat function?

For example
x = np.repeat(np.array([[1,2],[3,4]]), 2, axis=1)
gives you
x = array([[1, 1, 2, 2],
[3, 3, 4, 4]])
but is there something which can perform
x = np.*inverse_repeat*(np.array([[1, 1, 2, 2],[3, 3, 4, 4]]), axis=1)
and gives you
x = array([[1,2],[3,4]])
Regular slicing should work. For the axis you want to inverse repeat, use ::number_of_repetitions
x = np.repeat(np.array([[1,2],[3,4]]), 4, axis=0)
x[::4, :] # axis=0
Out:
array([[1, 2],
[3, 4]])
x = np.repeat(np.array([[1,2],[3,4]]), 3, axis=1)
x[:,::3] # axis=1
Out:
array([[1, 2],
[3, 4]])
x = np.repeat(np.array([[[1],[2]],[[3],[4]]]), 5, axis=2)
x[:,:,::5] # axis=2
Out:
array([[[1],
[2]],
[[3],
[4]]])
This should work, and has the exact same signature as np.repeat:
def inverse_repeat(a, repeats, axis):
    if isinstance(repeats, int):
        # first element of each block; integer division keeps the count an int
        indices = np.arange(a.shape[axis] // repeats) * repeats
    else:  # assume array_like of int
        # last element of each repeated block
        indices = np.cumsum(repeats) - 1
    return a.take(indices, axis)
Edit: added support for per-item repeats as well, analogous to np.repeat
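A round-trip check can make the behaviour concrete. The sketch below is a variant written for illustration, not the answer's exact code: it detects a scalar repeat count with np.ndim rather than isinstance(repeats, int), on the assumption that callers may also pass NumPy integer scalars, which are not instances of Python's int.

```python
import numpy as np

def inverse_repeat(a, repeats, axis):
    """Undo np.repeat(a, repeats, axis) when `repeats` is known (illustrative sketch)."""
    a = np.asarray(a)
    if np.ndim(repeats) == 0:
        # Scalar count: every block has the same length, so take every repeats-th item.
        indices = np.arange(0, a.shape[axis], int(repeats))
    else:
        # Per-item counts: the last element of each block sits at cumsum - 1.
        indices = np.cumsum(repeats) - 1
    return a.take(indices, axis=axis)


a = np.array([[1, 2], [3, 4]])
for repeats in (3, np.int64(2), [2, 3]):
    repeated = np.repeat(a, repeats, axis=1)
    restored = inverse_repeat(repeated, repeats, axis=1)
    print(np.array_equal(restored, a))  # True for each case
```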
For the case where we know the axis and the repeat - and the repeat is a scalar (same value for all elements) we can construct a slicing index like this:
In [1117]: a=np.array([[1, 1, 2, 2],[3, 3, 4, 4]])
In [1118]: axis=1; repeats=2
In [1119]: ind=[slice(None)]*a.ndim
In [1120]: ind[axis]=slice(None,None,repeats)
In [1121]: ind
Out[1121]: [slice(None, None, None), slice(None, None, 2)]
In [1122]: a[tuple(ind)]
Out[1122]:
array([[1, 2],
[3, 4]])
@Eelco's use of take makes it easier to focus on one axis, but requires a list of indices, not a slice.
But repeat does allow for differing repeat counts (here a1 is the unrepeated array [[1, 2], [3, 4]]).
In [1127]: np.repeat(a1,[2,3],axis=1)
Out[1127]:
array([[1, 1, 2, 2, 2],
[3, 3, 4, 4, 4]])
Knowing axis=1 and repeats=[2,3] we should be able to construct the right take indexing (probably with cumsum). Slicing won't work.
But if we only know the axis, and the repeats are unknown, then we probably need some sort of unique or set operation as in @redratear's answer.
In [1128]: a2=np.repeat(a1,[2,3],axis=1)
In [1129]: y=[list(set(c)) for c in a2]
In [1130]: y
Out[1130]: [[1, 2], [3, 4]]
A take solution with list repeats. This should select the last of each repeated block:
In [1132]: np.take(a2,np.cumsum([2,3])-1,axis=1)
Out[1132]:
array([[1, 2],
[3, 4]])
A deleted answer uses unique; here's my row by row use of unique
In [1136]: np.array([np.unique(row) for row in a2])
Out[1136]:
array([[1, 2],
[3, 4]])
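unique is more predictable than set for this use, since it returns the values in sorted order (a set guarantees no order), but that only matches the original order when each row was already sorted. There's another problem with unique (or set): what if the original had repeated values, e.g. [[1,2,1,3],[3,3,4,1]]?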
Here is a case where it would be difficult to deduce the repeat pattern from the result. I'd have to look at all the rows first.
In [1169]: a=np.array([[2,1,1,3],[3,3,2,1]])
In [1170]: a1=np.repeat(a,[2,1,3,4], axis=1)
In [1171]: a1
Out[1171]:
array([[2, 2, 1, 1, 1, 1, 3, 3, 3, 3],
[3, 3, 3, 2, 2, 2, 1, 1, 1, 1]])
But cumsum on a known repeat solves it nicely:
In [1172]: ind=np.cumsum([2,1,3,4])-1
In [1173]: ind
Out[1173]: array([1, 2, 5, 9], dtype=int32)
In [1174]: np.take(a1,ind,axis=1)
Out[1174]:
array([[2, 1, 1, 3],
[3, 3, 2, 1]])
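When the repeat counts are not known, looking at all rows at once can recover them in many cases. The following sketch assumes the repetition was along axis 1 and treats a new block as starting wherever any row changes from one column to the next. It cannot separate two adjacent original columns that happened to be identical, so it is a heuristic rather than a true inverse.

```python
import numpy as np

a1 = np.array([[2, 2, 1, 1, 1, 1, 3, 3, 3, 3],
               [3, 3, 3, 2, 2, 2, 1, 1, 1, 1]])

# A new block starts wherever any row differs from the previous column.
changed = np.any(a1[:, 1:] != a1[:, :-1], axis=0)
starts = np.concatenate(([0], np.flatnonzero(changed) + 1))

# Block lengths are the gaps between consecutive block starts.
repeats = np.diff(np.concatenate((starts, [a1.shape[1]])))

print(starts.tolist())   # [0, 2, 3, 6]
print(repeats.tolist())  # [2, 1, 3, 4]
print(a1[:, starts])
```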
>>> import numpy as np
>>> x = np.repeat(np.array([[1,2],[3,4]]), 2, axis=1)
>>> # Remove duplicates within each row. This will not work when the original has repeated values, e.g. x = np.repeat(np.array([[1,1],[3,3]]), 2, axis=1) = [[1,1,1,1],[3,3,3,3]]; the result would be [[1],[3]].
>>> y = [list(set(c)) for c in x]
>>> print(y)
[[1, 2], [3, 4]]
You don't need to know the axis or the repeat amount...

Finding indices of non-unique elements in Numpy array

I have found other methods, such as this, to remove duplicate elements from an array. My requirement is slightly different. If I start with:
array([[1, 2, 3],
[2, 3, 4],
[1, 2, 3],
[3, 2, 1],
[3, 4, 5]])
I would like to end up with:
array([[2, 3, 4],
[3, 2, 1],
[3, 4, 5]])
That's what I would ultimately like to end up with, but there is an extra requirement. I would also like to store either an array of indices to discard, or to keep (a la numpy.take).
I am using Numpy 1.8.1
We want to find rows which are not duplicated in your array, while preserving the order.
I use this solution to combine each row of a into a single element, so that we can find the unique rows using np.unique(..., return_index=True, return_inverse=True). Then, I modified this function to output the counts of the unique rows using the index and inverse. From there, I can select all unique rows which have counts == 1.
import numpy as np
a = np.array([[1, 2, 3],
[2, 3, 4],
[1, 2, 3],
[3, 2, 1],
[3, 4, 5]])
#use a flexible data type, np.void, to combine the columns of `a`
#size of np.void is the number of bytes for an element in `a` multiplied by number of columns
b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, inv = np.unique(b, return_index=True, return_inverse=True)
def return_counts(index, inv):
    count = np.zeros(len(index), int)  # np.int was removed in NumPy 1.24
    np.add.at(count, inv, 1)
    return count
counts = return_counts(index, inv)
#`index` holds the position in `a` of the first occurrence of each unique row,
#so keep those positions (not the loop counter) and sort them to preserve the order
#for the indices to discard, take every row index that is not in index_keep
index_keep = sorted(j for i, j in enumerate(index) if counts[i] == 1)
>>>a[index_keep]
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])
#if you don't need the indices and just want the array returned while preserving the order
a_unique = np.vstack([a[idx] for idx in sorted(index[counts == 1])])
>>>a_unique
array([[2, 3, 4],
[3, 2, 1],
[3, 4, 5]])
For NumPy >= 1.9, np.unique can return the counts directly:
b = a.view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, index, counts = np.unique(b, return_index=True, return_counts=True)
index_keep = sorted(index[counts == 1])
>>>a[index_keep]
array([[2, 3, 4],
[3, 2, 1],
[3, 4, 5]])
You can proceed as follows:
# Assuming your array is a
uniq, uniq_idx, counts = np.unique(a, axis=0, return_index=True, return_counts=True)
# to return the array you want
new_arr = uniq[counts == 1]
# The indices of non-unique rows
a_idx = np.arange(a.shape[0]) # the indices of array a
nuniq_idx = a_idx[np.isin(a_idx, uniq_idx[counts==1], invert=True)]
You get:
#new_arr
array([[2, 3, 4],
[3, 2, 1],
[3, 4, 5]])
# nuniq_idx
array([0, 2])
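A related sketch, assuming NumPy 1.13 or later, uses return_inverse to build a boolean mask over the original rows. This keeps the rows in their original order and yields both the indices to keep and the indices to discard. The variable names are illustrative, and the ravel() call is there only because the shape of the inverse array has varied between NumPy releases.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# inverse maps every row of `a` to its entry in the unique array;
# counts says how often each unique row occurs.
_, inverse, counts = np.unique(a, axis=0, return_inverse=True, return_counts=True)

# A row is kept only if its unique entry occurs exactly once.
keep_mask = counts[inverse.ravel()] == 1
keep_idx = np.flatnonzero(keep_mask)
discard_idx = np.flatnonzero(~keep_mask)

print(keep_idx.tolist())     # [1, 3, 4]
print(discard_idx.tolist())  # [0, 2]
print(a[keep_mask])
```

The keep_idx array can be passed straight to numpy.take(a, keep_idx, axis=0), which matches the "a la numpy.take" requirement in the question.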
If you want to delete every instance of a row that occurs more than once, you can iterate through the array, find the indices of rows that exist in more than one copy, and then delete them:
import numpy
# The array to check:
array = numpy.array([[1, 2, 3],
                     [2, 3, 4],
                     [1, 2, 3],
                     [3, 2, 1],
                     [3, 4, 5]])
# List that contains the indices of duplicates (which should be deleted)
deleteIndices = []
for i in range(len(array)):  # Loop through entire array
    indices = list(range(len(array)))  # All indices in array (a list, so an entry can be deleted)
    del indices[i]  # All indices in array, except the i'th element currently being checked
    for j in indices:  # Loop through every other element in array, except the i'th element, currently being checked
        if (array[i] == array[j]).all():  # Check if element being checked is equal to the j'th element
            deleteIndices.append(j)  # If i'th and j'th element are equal, j is appended to deleteIndices[]
# Sort deleteIndices in ascending order:
deleteIndices.sort()
# Delete duplicates
array = numpy.delete(array, deleteIndices, axis=0)
This outputs:
>>> array
array([[2, 3, 4],
[3, 2, 1],
[3, 4, 5]])
>>> deleteIndices
[0, 2]
That way you both delete the duplicates and get a list of indices to discard.
The numpy_indexed package (disclaimer: I am its author) can be used to solve such problems in a vectorized manner:
import numpy as np
import numpy_indexed as npi
index = npi.as_index(arr)  # arr is the 2D input array
keep = index.count == 1
discard = np.invert(keep)
print(index.unique[keep])

Removing rows in a 2D array that have the same value

I'm looking for a quick way to remove duplicate values present in a 2D array on a first come first serve basis. I know a way to remove the rows if they are identical, but not if only one of the values is present.
a = array([[0, 1],
[3, 4],
[3, 5],
[2, 5],
[1, 2]])
As 3 is present in a[1] and a[2], I would like to delete any later row that contains an already-seen value. Similarly with 2 in a[3] and a[4]. So the output would be:
a = array([[0, 1],
[3, 4],
[2, 5]])
As can be seen, there is overlap with the value 5. Any suggestions are appreciated.
A pure-Python way would be to use a set with a list comprehension:
>>> seen = set()
>>> np.array([x for x in a if seen.isdisjoint(x) and not seen.update(x)])
array([[0, 1],
[3, 4],
[2, 5]])
The one-liner simply abuses the fact that set.update returns None, so when seen.isdisjoint(x) is True we can update the seen set using not seen.update(x).
We can also write the above code as:
seen = set()
out = []
for item in a:
    # if none of items in current sub-array are present in seen set
    # then add current sub-array to our list. Plus update the seen
    # set with the items from current sub-array
    if seen.isdisjoint(item):
        out.append(item)
        seen.update(item)
...
>>> out
[array([0, 1]), array([3, 4]), array([2, 5])]
>>> np.array(out)
array([[0, 1],
[3, 4],
[2, 5]])
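The same loop can also report which rows were kept, which is handy if the indices are needed later. This is a minimal sketch; the function name first_come_rows is hypothetical and not part of the original answer.

```python
import numpy as np

def first_come_rows(a):
    """Return the rows that share no value with an earlier kept row, plus their indices."""
    seen = set()
    keep = []
    for i, row in enumerate(a.tolist()):  # tolist() yields plain Python ints
        if seen.isdisjoint(row):
            keep.append(i)
            seen.update(row)
    return a[keep], keep


a = np.array([[0, 1],
              [3, 4],
              [3, 5],
              [2, 5],
              [1, 2]])
rows, kept_idx = first_come_rows(a)
print(kept_idx)  # [0, 1, 3]
print(rows)
```

Note that [3, 5] is rejected before its 5 is ever recorded, which is why [2, 5] survives.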
