I have a list made up of arrays. All have shape (2,).
Minimum example: mylist = [np.array([1,2]),np.array([1,2]),np.array([3,4])]
I would like to get a unique list, e.g.
[np.array([1,2]),np.array([3,4])]
or perhaps even better, a dict with counts, e.g. {np.array([1,2]) : 2, np.array([3,4]) : 1}
So far I tried list(set(mylist)), but the error is TypeError: unhashable type: 'numpy.ndarray'
As the error indicates, NumPy arrays aren't hashable. You can convert them to tuples, which are hashable, and build a collections.Counter from the result:
from collections import Counter
Counter(map(tuple, mylist))
# Counter({(1, 2): 2, (3, 4): 1})
If you wanted a list of unique tuples, you could construct a set:
set(map(tuple, mylist))
# {(1, 2), (3, 4)}
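If you need the unique elements back as numpy arrays rather than tuples, you can map np.array over the result (note that a set does not preserve order):
import numpy as np
[np.array(t) for t in set(map(tuple, mylist))]
# [array([1, 2]), array([3, 4])]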
In general, the best option is to use the np.unique method with the appropriate parameters:
u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)
Then, according to the documentation:
u is the array of unique rows
idx holds the indices of X that give the unique values
counts is the number of times each unique row appears in X
If you need a dictionary, you can't use unhashable values as its keys, so you might want to store them as tuples, as in @yatu's answer, or like this:
dict(zip([tuple(n) for n in u], counts))
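Putting it together with the question's mylist, a small end-to-end sketch:
import numpy as np
u, idx, counts = np.unique(np.array(mylist), axis=0, return_index=True, return_counts=True)
dict(zip([tuple(n) for n in u], counts))
# {(1, 2): 2, (3, 4): 1}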
Pure numpy approach:
numpy.unique(mylist, axis=0)
which produces a 2d array with your unique arrays in rows:
array([[1, 2],
       [3, 4]])
This works if all your arrays have the same length (as in your example).
This solution can be useful depending on what you do earlier in your code: perhaps you would not need to get into plain Python at all, but stick to numpy instead, which should be faster.
Use the following:
import numpy as np
mylist = [np.array([1,2]),np.array([1,2]),np.array([3,4])]
np.unique(mylist, axis=0)
This gives the array of unique rows:
array([[1, 2],
       [3, 4]])
Source: https://numpy.org/devdocs/user/absolute_beginners.html#how-to-get-unique-items-and-counts
solution = [[1,0,0],[0,1,0], [1,0,0], [1,0,0]]
I have the above nested list, which contains other lists inside it. How do I get the unique lists inside solution?
output = [[1,0,0],[0,1,0]]
Note: each list is the same size.
Things I have tried:
Take each list and compare it with all the other lists to see whether it is duplicated, but that is very slow.
How can I check, before inserting a list, whether there is already a duplicate of it, so as to avoid inserting duplicates?
If you don't care about the order, you can use set:
solution = [[1,0,0],[0,1,0],[1,0,0],[1,0,0]]
output = set(map(tuple, solution))
print(output) # {(1, 0, 0), (0, 1, 0)}
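If you need the result as lists again rather than tuples, you can convert back afterwards:
output = [list(t) for t in set(map(tuple, solution))]
# e.g. [[1, 0, 0], [0, 1, 0]] (set order is arbitrary)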
Since lists are mutable objects, they aren't hashable, so you can't check for membership quickly. You can convert each list to a tuple, however, and store the tuple-ized view of each list in a set.
Tuples are heterogeneous immutable containers, unlike lists, which are mutable and idiomatically homogeneous.
from typing import List, Any
def de_dupe(lst: List[List[Any]]) -> List[List[Any]]:
    seen = set()
    output = []
    for element in lst:
        tup = tuple(element)
        if tup in seen:
            continue  # we've already added this one
        seen.add(tup)
        output.append(element)
    return output
solution = [[1,0,0],[0,1,0], [1,0,0], [1,0,0]]
assert de_dupe(solution) == [[1, 0, 0], [0, 1, 0]]
Pandas' duplicated might be of help.
import pandas as pd
df = pd.DataFrame([[1,0,0],[0,1,0],[1,0,0],[1,0,0]])
d = df[~df.duplicated()].values.tolist()
Output
[[1, 0, 0], [0, 1, 0]]
Or, since you tagged multidimensional-array, you can use a numpy approach.
import numpy as np

def unique_rows(a):
    a = np.ascontiguousarray(a)
    # view each row as a single structured element so np.unique compares whole rows
    unique_a = np.unique(a.view([('', a.dtype)] * a.shape[1]))
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))

arr = np.array([[1,0,0],[0,1,0],[1,0,0],[1,0,0]])
output = unique_rows(arr).tolist()
Based on the suggestion in this OP
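For reference, on NumPy 1.13 and newer np.unique supports this directly through the axis parameter, so the view trick above is no longer needed (note that the rows come back sorted):
np.unique(arr, axis=0).tolist()
# [[0, 1, 0], [1, 0, 0]]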
Try this solution:
x=[[1,0,0],[0,1,0], [1,0,0], [1,0,0]]
Import numpy and convert the nested list into a numpy array
import numpy as np
a1=np.array(x)
Find the unique rows:
a2 = np.unique(a1,axis=0)
Convert it back to a nested list
a2.tolist()
Hope this helps
While lists are not hashable and therefore inefficient to deduplicate, tuples are. So one way would be to transform your lists into tuples and deduplicate those.
>>> solution_tuples = [(1,0,0), (0,1,0), (1,0,0), (1,0,0)]
>>> set(solution_tuples)
{(1, 0, 0), (0, 1, 0)}
Suppose I have an array of size n that has some float values. I want to create a new array that has subarrays in it, where each subarray would have the indices of all elements in the original array that have equal values. So, for example, given an array givenArray=[50,20,50,20,40], the answer would be resultArray=[[0,2],[1,3],[4]].
The brute force way is to iterate on the original array, and in each iteration, iterate on the result array, compare the value to the first value in each subarray; if equal to it, add its index there. If not equal to the first value of any of the subarrays, create a new subarray and put its index there. Code for this in python would be something like:
resultArray = []
for i in range(0, len(givenArray)):
    flag = 0
    for j in range(0, len(resultArray)):
        if givenArray[i] == givenArray[resultArray[j][0]]:
            resultArray[j].append(i)
            flag = 1
    if flag == 0:
        resultArray.append([i])
This solution has a complexity of O(n^2). Can this be done at a better complexity? How? Ideas and python code would be highly appreciated! Thanks a lot in advance!
Aly
You could use a defaultdict and enumerate to do this in linear time:
from collections import defaultdict
result = defaultdict(list)
for i, n in enumerate(givenArray):
    result[n].append(i)
# {50: [0, 2], 20: [1, 3], 40: [4]}
result = [*result.values()]
# [[0, 2], [1, 3], [4]]
Note, however, that your example has int values, not floats. Floats are less well behaved as dictionary keys, as they may be subject to rounding or precision errors, especially if they are the results of some calculation.
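If your values really are floats, one common workaround is to round the key before grouping. A small sketch; the precision of 9 decimal places is an arbitrary choice:
result = defaultdict(list)
for i, n in enumerate(givenArray):
    result[round(n, 9)].append(i)  # values that agree to 9 decimal places share a group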
@schwobaseggl's answer with a dict is probably the best, but for completeness, here is a solution using groupby.
This solution returns the groups in increasing order of the values.
import operator
import itertools
def group_indices(array):
    sorted_with_indices = sorted(enumerate(array), key=operator.itemgetter(1))
    groups = itertools.groupby(sorted_with_indices, key=operator.itemgetter(1))
    return [[i for i, v in g] for k, g in groups]
print(group_indices([50,20,50,20,40]))
# [[1, 3], [4], [0, 2]]
Relevant documentation:
builtin enumerate;
builtin sorted;
operator.itemgetter;
itertools.groupby.
My array looks like this:
a = ([1,2],[2,3],[4,5],[3,8])
I did the following to delete odd indexes :
a = [v for i, v in enumerate(a) if i % 2 == 0]
but it now gives me a list of separate arrays instead of one two-dimensional array:
a= [array([1, 2]), array([4, 5])]
How can I keep the same format as the beginning? thank you!
That is as simple as
a[::2]
which yields the rows with even indices.
Use numpy array indexing, not comprehensions:
c = a[list(range(0,len(a),2)),:]
If you define c as the output of a list comprehension, you get a list of one-dimensional numpy arrays. Using proper indexing instead keeps the result a numpy array.
Note that instead of "deleting" the odd indices, we specify what to keep: take all rows with an even index (the list(range(0, len(a), 2)) part), and for each row take all elements (the : part).
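A complete sketch of this approach with the data from the question:
import numpy as np
a = np.array([[1,2],[2,3],[4,5],[3,8]])
c = a[list(range(0, len(a), 2)), :]
# array([[1, 2],
#        [4, 5]])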
I'm trying to do what I think should be simple:
I make a 2D list:
a = [[1,5],[2,6],[3,7]]
and I want to slice out the first column and tried:
1)
a[:,0]
...
TypeError: list indices must be integers or slices, not tuple
2)
a[:,0:1]
...
TypeError: list indices must be integers or slices, not tuple
3)
a[:][0]
[1, 5]
4)
a[0][:]
[1, 5]
5) got it but is this the way to do it?
[aa[0] for aa in a]
Using numpy it would be easy but what is the Python way?
2D slicing like a[:, 0] only works for NumPy arrays, not for lists.
However, you can transpose (rows become columns and vice versa) nested lists using zip(*a). After transposing, simply take the first row (in Python 3, zip returns an iterator, hence the list() calls below):
a = [[1,5],[2,6],[3,7]]
print(list(zip(*a)))           # [(1, 2, 3), (5, 6, 7)]
print(list(list(zip(*a))[0]))  # [1, 2, 3]
What you are trying to do in attempts 1 and 2 works with numpy arrays (or similarly with pandas dataframes), but not with basic Python lists. If you want to do it with basic Python lists, see the answer from @cricket_007 in the comments to your question.
One of the reasons to use numpy is exactly this: it makes it much easier to slice arrays with multiple dimensions.
[x[0] for x in a] is the clear and proper way.
Say that I have 4 numpy arrays
[1,2,3]
[2,3,1]
[3,2,1]
[1,3,2]
In this case, I've determined [1,2,3] is the "minimum array" for my purposes, as it is one of two arrays with the lowest value at index 0, and of those two it has the lowest value at index 1. If there were more arrays with equal values, I would need to compare the next index values, and so on.
How can I extract the array [1,2,3] in that same order from the pile?
How can I extend that to x arrays of size n?
Thanks
Using plain Python's .sort() or sorted() on a list of lists (not numpy arrays) automatically does this, e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
numpy's sort only sorts elements along an axis rather than ordering whole rows lexicographically, so the easiest way is to convert to a Python list first. Assuming a is the array of arrays you want to pick the minimum of, you can get it as
sorted(a.tolist())[0]
As someone pointed out, you could also use min(a.tolist()), which uses the same type of comparisons as sort and would be faster for large arrays (linear vs. n log n asymptotic run time).
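A quick sketch of both options on the arrays from the question:
import numpy as np
a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
print(sorted(a.tolist())[0])  # [1, 2, 3]
print(min(a.tolist()))        # [1, 2, 3], same result in linear time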
Here's an idea using numpy:
import numpy

a = numpy.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
col = 0
while a.shape[0] > 1 and col < a.shape[1]:
    # keep only the rows whose value in the current column is the smallest
    a = a[a[:, col] == a[:, col].min()]
    col += 1
print(a)
This filters column by column until only one row is left; the col < a.shape[1] guard stops the loop if duplicate rows survive to the end.
numpy's lexsort is close to what you want. It sorts on the last key first, but that's easy to get around:
>>> a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
>>> order = np.lexsort(a[:, ::-1].T)
>>> order
array([0, 3, 1, 2])
>>> a[order]
array([[1, 2, 3],
       [1, 3, 2],
       [2, 3, 1],
       [3, 2, 1]])
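Since the question only asks for the single minimum array, you can index with the first element of the sort order:
>>> a[order[0]]
array([1, 2, 3])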