Grab unique tuples in python list, irrespective of order

I have a python list:
[ (2,2),(2,3),(1,4),(2,2), etc...]
What I need is some kind of function that reduces it to its unique components... which would be, in the above list:
[ (2,2),(2,3),(1,4) ]
numpy unique does not quite do this. I can think of a way to do it--convert my tuples to numbers, [22,23,14,etc.], find the uniques, and work back from there...but I don't know if the complexity won't get out of hand. Is there a function that will do what I am trying to do with tuples?
Here is a sample of code that demonstrates the problem:
import numpy as np
x = [(2,2),(2,2),(2,3)]
y = np.unique(x)
returns: y: [2 3]
And here is the implementation of the solution that demonstrates the fix:
x = [(2,2),(2,2),(2,3)]
y = list(set(x))
returns y: [(2,2),(2,3)]

If order does not matter
If the order of the result is not critical, you can convert your list to a set (because tuples are hashable) and convert the set back to a list:
>>> l = [(2,2),(2,3),(1,4),(2,2)]
>>> list(set(l))
[(2, 3), (1, 4), (2, 2)]
If order matters
(UPDATE)
As of CPython 3.6 (as an implementation detail) and Python 3.7 (as a language guarantee), regular dictionaries remember their insertion order, so you can simply write:
>>> l = [(2,2),(2,3),(1,4),(2,2)]
>>> list(dict.fromkeys(l))
[(2, 2), (2, 3), (1, 4)]
(OLD ANSWER)
If the order is important, the canonical way to filter the duplicates is this:
>>> seen = set()
>>> result = []
>>> for item in l:
...     if item not in seen:
...         seen.add(item)
...         result.append(item)
...
>>> result
[(2, 2), (2, 3), (1, 4)]
Finally, a little slower and a bit more hackish, you can abuse an OrderedDict as an ordered set:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(l)) # on Python 3, keys() alone returns a view rather than a list
[(2, 2), (2, 3), (1, 4)]
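To compare the three order-preserving approaches above side by side, here is a minimal sketch that wraps each one in a function. The function names are made up for illustration, and the dict-based variant assumes Python 3.7 or later.

```python
from collections import OrderedDict


def dedupe_with_seen_set(items):
    """Keep the first occurrence of each item, tracking seen items in a set."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_with_dict(items):
    """Rely on dict keys being unique and insertion-ordered (Python 3.7+)."""
    return list(dict.fromkeys(items))


def dedupe_with_ordered_dict(items):
    """Same idea, using OrderedDict for older Python versions."""
    return list(OrderedDict.fromkeys(items))


pairs = [(2, 2), (2, 3), (1, 4), (2, 2)]
print(dedupe_with_seen_set(pairs))      # [(2, 2), (2, 3), (1, 4)]
print(dedupe_with_dict(pairs))          # [(2, 2), (2, 3), (1, 4)]
print(dedupe_with_ordered_dict(pairs))  # [(2, 2), (2, 3), (1, 4)]
```

All three require the items to be hashable, which tuples of integers are.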

Using a set will remove duplicates, and you create a list from it afterwards:
>>> list(set([ (2,2),(2,3),(1,4),(2,2) ]))
[(2, 3), (1, 4), (2, 2)]

You could simply do:
import numpy as np
y = np.unique(x, axis=0)
z = []
for i in y:
    z.append(tuple(i))
The reason is that numpy interprets a list of tuples as a 2D array. By setting axis=0, you ask numpy to return the unique rows instead of flattening the array. Note that np.unique returns the rows in sorted order, not in their original order.

set() will remove all duplicates, and you can then put it back to a list:
unique = list(set(mylist))
Using set(), however, will lose your ordering. If the order matters, you can use a list comprehension that checks whether the value already appears earlier in the list:
unique = [v for i,v in enumerate(mylist) if v not in mylist[:i]]
That solution is a little slow, however, because it copies a slice of the list for every element, so you can do it like this:
unique = []
for tup in mylist:
    if tup not in unique:
        unique.append(tup)
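One situation where the list-based loop above is still worth having is when the elements are not hashable, because it only needs == comparisons. The sketch below illustrates this with a list of lists; the function name unique_by_equality is an assumption made for the example, not something from the answer.

```python
def unique_by_equality(items):
    """Order-preserving de-duplication using only == (quadratic time)."""
    unique = []
    for item in items:
        if item not in unique:   # linear scan of the results so far
            unique.append(item)
    return unique


rows = [[2, 2], [2, 3], [2, 2]]

# Lists are unhashable, so the set-based approach cannot be used here.
try:
    set(rows)
    set_worked = True
except TypeError:
    set_worked = False

print(set_worked)                # False
print(unique_by_equality(rows))  # [[2, 2], [2, 3]]
```

For hashable items such as tuples, the set or dict based versions are usually the better choice on long lists, since each membership test here scans the result list.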

Related

Enumerate does not work with 2d arrays yet range(len()) does?

I heard somewhere that we should all use enumerate to iterate through arrays, but
for i in enumerate(array):
    for j in enumerate(array[i]):
        print(board[i][j])
doesn't work, yet when using range(len())
for i in range(len(array)):
    for j in range(len(array[i])):
        print(board[i][j])
it works as intended.

Use it like this:
for idxI, arrayI in enumerate(array):
    for idxJ, arrayJ in enumerate(arrayI):
        print(board[idxI][idxJ])

As I wrote, enumerate adds a counter to each element, effectively turning your list of elements into a sequence of (index, element) tuples.
Example
array = ['a', 'b','c','d']
print(list(enumerate(array)))
gives you this:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
So in your case, all you need to do is unpack the extra element when iterating:
for i, item1 in enumerate(array):
    for j,item2 in enumerate(array[i]):
        print(board[i][j])
The issue in your case is
for i in enumerate(array):
Here i is not an integer but a tuple, (0, 'a') in my example, and you can't index a list with a tuple.
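The failure described above can be reproduced in a few lines. This is a small sketch using a made-up 2x3 board; it shows the tuple that enumerate yields, the TypeError raised when that tuple is used as an index, and the unpacked form that works.

```python
board = [[1, 2, 3], [4, 5, 6]]

# Without unpacking, the loop variable is an (index, element) tuple.
first = next(enumerate(board))
print(first)  # (0, [1, 2, 3])

# Using that tuple as a list index is what raises the error.
try:
    board[first]
    index_error = None
except TypeError as exc:
    index_error = exc
print(type(index_error).__name__)  # TypeError

# Unpacking gives an integer index and the element at the same time.
cells = [(i, j, value)
         for i, row in enumerate(board)
         for j, value in enumerate(row)]
print(cells[:3])  # [(0, 0, 1), (0, 1, 2), (0, 2, 3)]
```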

When one uses for i in enumerate(array):, each i is an (index, obj) tuple, whereas a range-based loop just yields the integers in the specified range.
>>> arr = [1,2,3]
>>> enumerate(arr)
<enumerate object at 0x105413140>
>>> list(enumerate(arr))
[(0, 1), (1, 2), (2, 3)]
>>> for i in list(enumerate(arr)):
...     print(i)
...
(0, 1)
(1, 2)
(2, 3)
>>>
One has to access the first element of the tuple to get the index in order to further index.
>>> board = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> for idx1,lst in enumerate(board):
...     for idx2,lst_ele in enumerate(lst): # could use enumerate(board[idx1])
...         print(lst_ele,end=" ")
...
1 2 3 4 5 6 7 8 9
>>>
Sometimes you do not need both the index and the element, so I do not think it's always better to use enumerate. That being said, there are plenty of situations where it's easier to use enumerate, because you get the element directly without having to write element = array[idx].
See range() vs enumerate()
"Both are valid. The first solution [range-based] looks more similar to the problem description, while the second solution [enum-based] has a slight optimization where you don’t mutate the list potentially three times per iteration." - James Uejio

Convert list into a list of 2-tuples

I have a list:
list = [1, 2, 3]
I want to get this list and add a tuple so this list becomes like this
T_list = [(1,x), (2, x), (3,x)]
How do I do it?

Use a simple list comprehension to do this:
>>> your_list = [1, 2, 3]
>>> x = 100
>>> your_list_with_tuples = [(k, x) for k in your_list]
>>> your_list_with_tuples
Output
[(1, 100), (2, 100), (3, 100)]
Also, don't name your variables list, dict, etc, because they shadow the builtin types/functions.

A list comprehension would do the trick:
t_list = [(y,x) for y in original_list]
Also, the commenter is right. Don't name your lists list. It's the name of a built-in type, and rebinding it could cause problems down the line.

Here's my attempt, although I'm not an expert.
lis = [1,2,3]
tup = (4, 5, 6)
new_tup = tuple(lis)
new_lis = []
for i in range(0,3):
    data = (new_tup[i],tup[i])
    new_lis.append(data)
print(new_lis)
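The index-based pairing above can also be written with zip, which avoids the hard-coded range. The following is only a sketch: it reuses the names lis and tup from the attempt above, and the constant x = 100 is an assumed value standing in for the x in the question.

```python
import itertools

lis = [1, 2, 3]
tup = (4, 5, 6)

# Pair the elements position by position, as the loop above does.
new_lis = list(zip(lis, tup))
print(new_lis)  # [(1, 4), (2, 5), (3, 6)]

# Pair every element with the same value, as the question asks.
x = 100  # assumed value for illustration
with_constant = list(zip(lis, itertools.repeat(x)))
print(with_constant)  # [(1, 100), (2, 100), (3, 100)]
```

zip stops at the shortest input, so the endless itertools.repeat(x) is cut off once lis is exhausted.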

Does enumerate over slice perform sublist materialization?

Does this:
for i,v in enumerate(lst[start:stop]):
or this:
for i,v in enumerate(itertools.islice(lst,start,stop)):
...make a copy of the iterated sublist?

Assuming that lst is a regular Python list, and not a Numpy array, Pandas dataframe, or some custom class supporting slice indexing, then the slice [...:...] will create a new list, whereas itertools.islice does not.
As suggested in comments, you can see this for yourself by creating both enumerate objects and modifying the original list before consuming them:
>>> import itertools
>>> lst = [1, 2, 3, 4, 5]
>>> e1 = enumerate(lst[1:4])
>>> e2 = enumerate(itertools.islice(lst, 1, 4))
>>> del lst[2] # remove the third element (the value 3)
>>> list(e1)
[(0, 2), (1, 3), (2, 4)] # shows content of original list
>>> list(e2)
[(0, 2), (1, 4), (2, 5)] # the removed element is skipped
Also note that this in fact has nothing to do with enumerate, which creates a lazy iterator in both cases (on top of whatever iterable the slice produced).
You could also just create the two variants of slices and check their types:
>>> type(lst[1:4])
<class 'list'> # a new list
>>> type(itertools.islice(lst, 1, 4))
<class 'itertools.islice'> # a lazy iterator
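A related detail when enumerating a sub-range without copying: enumerate accepts a start value, so the counter can be made to match the index into the original list. The sketch below assumes an example list and the bounds start and stop, which are names chosen here for illustration.

```python
import itertools

lst = ['a', 'b', 'c', 'd', 'e']
start, stop = 1, 4

# islice reads from the original list lazily (no sublist is created);
# passing `start` to enumerate makes i line up with positions in lst.
pairs = list(enumerate(itertools.islice(lst, start, stop), start))
print(pairs)  # [(1, 'b'), (2, 'c'), (3, 'd')]

for i, v in pairs:
    assert lst[i] == v  # each counter is a valid index into lst
```

Keep in mind that islice still has to step over the first start elements one by one, so it saves memory rather than skipping work.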

Making a sequence of tuples unique by a specific element

So I have a tuple of tuples
a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
I would like to remove all the tuples from 'a' which have a common second element, except for one (any one of them).
For the example above I would like the new output a = ((1,2),(3,4))
In other words I would like to eliminate tuples which are considered duplicate elements in the second position of the tuple.
I would like to know the most efficient way to achieve this, and also like to know if I can do the same with lists instead of tuples?

You could create a dictionary from your elements, with whatever you want to be unique as the key, and then extract the values. This works for anything where the 'unique' sub-element is hashable. Integers are hashable:
def unique_by_key(elements, key=None):
    if key is None:
        # no key: the whole element must be unique
        key = lambda e: e
    # list() so that a real list is returned on Python 3 as well
    return list({key(el): el for el in elements}.values())
This function is pretty generic; it can be used to extract 'unique' elements by any trait, as long as whatever the key callable returns can be used as a key in a dictionary. Order is not guaranteed on Python versions before 3.7 (from 3.7 on, the results follow the first appearance of each key), and currently the last element per key wins.
With the above function you can use an operator.itemgetter() object or a lambda to extract your second value from each element. This then works for both a sequence of tuples and a sequence of lists:
from operator import itemgetter
unique_by_second_element = unique_by_key(a, key=itemgetter(1))
Demo:
>>> from operator import itemgetter
>>> a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
>>> unique_by_key(a, key=itemgetter(1))
[(5, 2), (8, 4)]
>>> b = [[1, 2], [7, 2], [5, 2], [3, 4], [8, 4]]
>>> unique_by_key(b, key=itemgetter(1))
[[5, 2], [8, 4]]
Note that the function always returns a list; you can always convert that back by calling tuple() on the result.
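If the first element per key should win instead, which matches the ((1, 2), (3, 4)) output asked for in the question, one possible variant is sketched below. The name unique_by_key_first is made up for this example, and the ordering of the result assumes Python 3.7 or later, where dicts keep insertion order.

```python
from operator import itemgetter


def unique_by_key_first(elements, key=None):
    """Keep the first element seen for each key, in first-seen order."""
    if key is None:
        # no key: the whole element must be unique
        key = lambda e: e
    firsts = {}
    for el in elements:
        # setdefault stores el only if this key has not been seen yet
        firsts.setdefault(key(el), el)
    return list(firsts.values())


a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
print(unique_by_key_first(a, key=itemgetter(1)))         # [(1, 2), (3, 4)]
print(tuple(unique_by_key_first(a, key=itemgetter(1))))  # ((1, 2), (3, 4))
```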

Getting single value from multiple values in dictionary

I have a dictionary like this
dic={10:(1,4),20:(2,4),30:(3,4)}
How can I get 1, 2, 3 as output using dic.values() without using a for loop?

This works:
>>> dic={10:(1,4),20:(2,4),30:(3,4)}
>>> [x[0] for x in dic.values()]
[1, 2, 3]
>>> # Or if you want that as a tuple
>>> tuple(x[0] for x in dic.values())
(1, 2, 3)
>>> # Or a string
>>> ",".join([str(x[0]) for x in dic.values()])
'1,2,3'
>>>
You should remember, though, that dictionary order is not guaranteed on Python versions before 3.7, meaning the key/value pairs will not always be in the same order that you put them in.
To get the results in the order you want, you should look at sorted.
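As a small sketch of the sorted suggestion, the example below builds the dictionary in a deliberately scrambled order and then reads the first tuple elements in ascending key order. The variable name firsts_by_key is chosen here for illustration.

```python
dic = {30: (3, 4), 10: (1, 4), 20: (2, 4)}  # keys inserted out of order

# sorted(dic) yields the keys in ascending order, independent of how
# the dictionary happens to be ordered internally.
firsts_by_key = [dic[k][0] for k in sorted(dic)]
print(firsts_by_key)  # [1, 2, 3]
```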

If you look at what dic.values() produces:
>>> dic={10:(1,4),20:(2,4),30:(3,4)}
>>> list(dic.values())
[(1, 4), (2, 4), (3, 4)]
Obviously you want the first element of each tuple.
You can use zip to get that without looping [1]. On Python 3, zip returns an iterator, so turn it into a list before indexing:
>>> list(zip(*dic.values()))[0]
(1, 2, 3)
As pointed out in comments, an even more efficient solution is to take only the first item from the iterator (on Python 2, use itertools.izip and dic.itervalues() for the same effect):
>>> next(zip(*dic.values()))
(1, 2, 3)
Then you do not have to go all the way through creating several lists just to get the first element.
The order, of course, depends on the order of the keys in dic.
[1] The 'without looping' is a silly distinction IMHO. Every solution has either an explicit or an implicit loop in it...

Answer: You can't. You'll have to loop through the dictionary:
for v in d.values():
    print(v[0])
Or using a list comprehension:
[v[0] for v in d.values()]
These filtering methods are the best you can find :)

This solution is not any better than using iterators, but it has a different approach and maybe it is more suitable for complex tasks:
from operator import itemgetter
dic={10:(1,4),20:(2,4),30:(3,4)}
print(list(map(itemgetter(0), dic.values())))
gives:
[1, 2, 3]

How about using map here? On Python 3, wrap it in list() to get the values:
list(map(lambda x: x[0], dic.values()))
