Making a sequence of tuples unique by a specific element - python

So I have a tuple of tuples
a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
I would like to remove all the tuples from 'a' which have the a common second element, except for one (any one of them).
For the example above I would like the new output a = ((1,2),(3,4))
In other words I would like to eliminate tuples which are considered duplicate elements in the second position of the tuple.
I would like to know the most efficient way to achieve this, and also like to know if I can do the same with lists instead of tuples?

You could create a dictionary from your elements, with whatever you wanted to be unique as the key, then extracting the values. This works for anything where the 'unique' sub-element is hashable. Integers are hashable:
def unique_by_key(elements, key=None):
if key is None:
# no key: the whole element must be unique
key = lambda e: e
return {key(el): el for el in elements}.values()
This function is pretty generic; it can be used to extract 'unique' elements by any trait, as long as whatever the key callable returns can be used as a key in a dictionary. Order will not be preserved, and currently the last element per key wins.
With the above function you can use a operator.itemgetter() object or a lambda to extract your second value from each element. This then works for both a sequence of tuples and a sequence of lists:
from operator import itemgetter
unique_by_second_element = unique_by_key(a, key=itemgetter(1))
Demo:
>>> from operator import itemgetter
>>> a = ((1, 2), (7, 2), (5, 2), (3, 4), (8, 4))
>>> unique_by_key(a, key=itemgetter(1))
[(5, 2), (8, 4)]
>>> b = [[1, 2], [7, 2], [5, 2], [3, 4], [8, 4]]
>>> unique_by_key(b, key=itemgetter(1))
[[5, 2], [8, 4]]
Note that the function always returns a list; you can always convert that back by calling tuple() on the result.

Related

Explanation of custom key (helper) sorting function

I encountered the following example in a tutorial I watched recently.
We want to sort these numbers:
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
prioritising the ones belonging to the following group:
group = {2, 3, 5, 7}
So the helper (sorting key) function implemented by the author was the following:
def helper(x):
if x in group:
return (0, x)
return (1, x)
and it sorts by calling
numbers.sort(key=helper)
I can't seem to get my head around this return (0,x) vs. return (1,x) which most likely is something easy to explain (but perhaps I am missing an element about how the sorting helper function works)
What that key function does is, instead of comparing
[8, 3, 1, 2, 5, 4, 7, 6]
it compares
[(0, 8), (1, 3), (0, 1), (1, 2), (1, 5), (0, 4), (1, 7), (0, 6)]
Tuples are sorted lexicographically, meaning that the first elements are compared first. If they differ, the comparison stop. If they're the same, the second elements are compared.
This has the effect of bringing all numbers in group to the front (in numerical order) followed by the rest of the numbers (also in numerical order).
Well, (0, x) is smaller than (1, x). In short, Python will first compare the first element, if they are the same, then the second, then the third...
Is it clear enough? I mean, in your example, all elements in that group will be considered as smaller than the elements which are not in that group.
When the following line executes numbers.sort(key=helper), an iterator iterates over each element of the list number.
While iterating, for each element, it makes a call to the helper method with the element.
If this element is a part of the group, it returns (0, element).
If it isn't a part of the group, it returns (1, element).
Now, while sorting, the elements to be sorted are [(0,x), (1,x), (0,x)...] not the actual elements.
It compares two tuples in the list and checks if the values are > or < or =.
While comparing two tuples, it first compares them based on the value at the 0th index in each element.
Then it compares them based on the 1st value in each element of the list and so on..
This results in the following output:
>>> numbers
[2, 3, 5, 7, 1, 4, 6, 8]
If there would have been characters at the first index in each element, they would have been sorted based on their ASCII values.

Does enumerate over slice perform sublist materialization?

Does this:
for i,v in enumerate(lst[from:to]):
or this:
for i,v in enumerate(itertools.islice(lst,from,to)):
...make a copy of iterated sublist?
Assuming that lst is a regular Python list, and not a Numpy array, Pandas dataframe, or some custom class supporting slice indexing, then the slice [...:...] will create a new list, whereas itertools.islice does not.
As suggested in comments, you can see this for yourself by creating both enumerate objects and modifying the original list before consuming them:
>>> lst = [1, 2, 3, 4, 5]
>>> e1 = enumerate(lst[1:4])
>>> e2 = enumerate(itertools.islice(lst, 1, 4))
>>> del lst[2] # remove second element
>>> list(e1)
[(0, 2), (1, 3), (2, 4)] # shows content of original list
>>> list(e2)
[(0, 2), (1, 4), (2, 5)] # second element skipped
Also note that this does in fact have nothing to do with enumerate, which will create a generator in both cases (on top of whatever iterable was created before by the slice).
You could also just create the two variants of slices and check their types:
>>> type(lst[1:4])
list # a new list
>>> type(itertools.islice(lst, 1, 4))
itertools.islice # some sort of generator

Grab unique tuples in python list, irrespective of order

I have a python list:
[ (2,2),(2,3),(1,4),(2,2), etc...]
What I need is some kind of function that reduces it to its unique components... which would be, in the above list:
[ (2,2),(2,3),(1,4) ]
numpy unique does not quite do this. I can think of a way to do it--convert my tuples to numbers, [22,23,14,etc.], find the uniques, and work back from there...but I don't know if the complexity won't get out of hand. Is there a function that will do what I am trying to do with tuples?
Here is a sample of code that demonstrates the problem:
import numpy as np
x = [(2,2),(2,2),(2,3)]
y = np.unique(x)
returns: y: [2 3]
And here is the implementation of the solution that demonstrates the fix:
x = [(2,2),(2,2),(2,3)]
y = list(set(x))
returns y: [(2,2),(2,3)]
If order does not matter
If the order of the result is not critical, you can convert your list to a set (because tuples are hashable) and convert the set back to a list:
>>> l = [(2,2),(2,3),(1,4),(2,2)]
>>> list(set(l))
[(2, 3), (1, 4), (2, 2)]
If order matters
(UPDATE)
As of CPython 3.6 (or any Python 3.7 version) regular dictionaries remember their insertion order, so you can simply issue.
>>> l = [(2,2),(2,3),(1,4),(2,2)]
>>> list(dict.fromkeys(l))
[(2, 2), (2, 3), (1, 4)]
(OLD ANSWER)
If the order is important, the canonical way to filter the duplicates is this:
>>> seen = set()
>>> result = []
>>> for item in l:
... if item not in seen:
... seen.add(item)
... result.append(item)
...
>>> result
[(2, 2), (2, 3), (1, 4)]
Finally, a little slower and a bit more hackish, you can abuse an OrderedDict as an ordered set:
>>> from collections import OrderedDict
>>> OrderedDict.fromkeys(l).keys() # or list(OrderedDict.fromkeys(l)) if using a version where keys() does not return a list
[(2, 2), (2, 3), (1, 4)]
Using a set will remove duplicates, and you create a list from it afterwards:
>>> list(set([ (2,2),(2,3),(1,4),(2,2) ]))
[(2, 3), (1, 4), (2, 2)]
you could simply do
y = np.unique(x, axis=0)
z = []
for i in y:
z.append(tuple(i))
The reason is that a list of tuples is interpreted by numpy as a 2D array. By setting axis=0, you'd be asking numpy not to flatten the array and return unique rows.
set() will remove all duplicates, and you can then put it back to a list:
unique = list(set(mylist))
Using set(), however, will kill your ordering. If the order matters, you can use a list comprehension that checks if the value already exists earlier in the list:
unique = [v for i,v in enumerate(mylist) if v not in mylist[:i]]
That solution is a little slow, however, so you can do it like this:
unique = []
for tup in mylist:
if tup not in unique:
unique.append(tup)

List comprehension with tuple assignment

I want to ask if something like this is possible in python:
a,b = [i,i+1 for i in range(5)]
I know this isn't possible because I have got an error, but I think you understand what I am trying to achieve. Let me clear it up, I can do :
a,b = 3+2,3
Edit ---> Or even better:
a,b = [0,1,2,3,4],[1,2,3,4,5]
I wan't a similar thing in my first code example. I am trying to assign variables 'a' and 'b' as list, with list comprehension, but using tuple as assignment, the point is I don't want to use this:
a = [i for in range(5)]
b = [i+1 for in range(5)]
I am aware that I can use this: t = [(i,i+1) for i in range(5)], but that's not the point.
By the way this is only a simple example => "i,i+1"
Edit ---> I would like to clarify my question. How to assign several variables (type list) in one line, using list comprehension?
When you run this:
a,b = [(i,i+1) for i in range(5)] # wrapped i, i+1 in parentheses (syntax error)
It makes a list of five two-item tuples, like this:
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
But you're trying to assign those five tuples to only two objects (a and b)
Using argument unpacking (*) in zip, you can "unzip" the output to the first and second elements of each tuple:
a,b = zip(*[(i,i+1) for i in range(5)])
Which is this:
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5)]
And can be assigned to a and b as you've written
Don't try to be clever. This is perfectly acceptable code:
>>> a = range(5)
>>> b = range(1,6)
>>> a, b
([0, 1, 2, 3, 4], [1, 2, 3, 4, 5])

How does this code snippet rotating a matrix work?

While looking for a pythonic way to rotate a matrix, I came across this answer. However there is no explanation attached to it. I copied the snippet here:
rotated = zip(*original[::-1])
How does it work?
>>> lis = [[1,2,3], [4,5,6], [7,8,9]]
[::-1] reverses the list :
>>> rev = lis[::-1]
>>> rev
[[7, 8, 9], [4, 5, 6], [1, 2, 3]]
now we use zip on all items of the rev, and append each returned tuple to rotated:
>>> rotated = []
>>> for item in zip(rev[0],rev[1],rev[2]):
... rotated.append(item)
...
>>> rotated
[(7, 4, 1), (8, 5, 2), (9, 6, 3)]
zip picks items from the same index from each of the iterable passed to it(it runs only up to the item with minimum length) and returns them as a tuple.
what is *:
* is used for unpacking all the items of rev to zip, so instead of manually typing
rev[0], rev[1], rev[2], we can simply do zip(*rev).
The above zip loop could also be written as:
>>> rev = [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
>>> min_length = min(len(x) for x in rev) # find the min length among all items
>>> rotated = []
for i in xrange(min_length):
items = tuple(x[i] for x in rev) # collect items on the same index from each
# list inside `rev`
rotated.append(items)
...
>>> rotated
[(7, 4, 1), (8, 5, 2), (9, 6, 3)]
Complementary to the explanations by Ashwini and HennyH, here's a little figure to illustrate the process.
First, the [::-1] slice operator reverses the list of list, taking the entire list (thus the first two arguments can be omitted) and using a step of -1.
Second, the zip function takes a number of lists and effectively returns a new list with rows and columns reversed. The * says that the list of lists is unpacked into several lists.
As can be seen, these two operations combined will rotate the matrix.
For my explination:
>>> m = [['a','b','c'],[1,2,3]]
Which when pretty-printed would be:
>>> pprint(m)
['a', 'b', 'c']
[1, 2, 3]
Firstly, zip(*m) will create a list of all the columns in m. As demonstrated by:
>>> zip(*m)
[('a', 1), ('b', 2), ('c', 3)]
The way this works is zip takes n sequences and get's the i-th element of each one and adds it to a tuple. So translated to our matrix m where each row is represented by a list contained within m, we essentially pass in each row to zip, which then gets the 1st element from each row puts all of them into a tuple, then gets every 2nd element from each row etc... Ultimately getting every column in m i.e:
>>> zip(['row1column1','row1column2'],['row2column1','row2column2'])
[('row1column1', 'row2column1'), ('row1column2', 'row2column2')]
Notice that each tuple contains all the elements in a specific column
Now that would look like:
>>> pprint(zip(*m))
('a', 1)
('b', 2)
('c', 3)
So effectively, each column in m is now a row. Hoever it isn't in the correct order (try imagine in your head rotating m to get the matrix above, it can't be done). This why it's necessary to 'flip' the original matrix:
>>> pprint(zip(*m[::-1]))
(1, 'a')
(2, 'b')
(3, 'c')
Which results in a matrix which is the equivalent of m rotated - 90 degrees.

Categories