Find duplicates in a list of lists with tuples - python

I am trying to find duplicate tuples that are nested within lists, which are themselves held in a list. If there is a better way to organize this data so that my problem is easier to solve, I'd be glad to know, because I built this structure along the way.
pairsList = [
[1, (11, 12), (13, 14)], #list1
[2, (21, 22), (23, 24)], #list2
[3, (31, 32), (13, 14)], #list3
[4, (43, 44), (21, 22)], #list4
]
The first element in each list uniquely identifies each list.
From this object pairsList, I want to find out which lists have identical tuples. So I want to report that list1 has the same tuple as list3 (because both have (13,14)). Likewise, list2 and list4 have the same tuple (both have (21,22)) and need to be reported. The position of tuples within the list doesn't matter (list2 and list4 both have (21,22) even though the tuple sits at a different position in each list).
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
I am aware of sets and have used them to delete duplicates within the lists in other situations, but cannot understand how to solve this problem. I can check like this if one list contains any element from the other list:
list1 = [1, (11, 12), (13, 14)]
list2 = [3, (31, 32), (13, 14)]
print not set(list1).isdisjoint(list2)
>>>True
So, the code below lets me know what lists have same tuple(s) as the first one. But what is the correct way to perform this on all the lists?
counter = 0
for pair in pairsList:
    list0 = pairsList[0]
    iterList = pairsList[counter]
    if not set(list0).isdisjoint(iterList):
        print iterList[0] #print list ID
    counter += 1

The first element in each list uniquely identifies each list.
Great, then let's convert it to a dict first:
d = {x[0]: x[1:] for x in pairsList}
# d:
{1: [(11, 12), (13, 14)],
2: [(21, 22), (23, 24)],
3: [(31, 32), (13, 14)],
4: [(43, 44), (21, 22)]}
Let's index the whole data structure:
index = {}
for k, vv in d.iteritems():
    for v in vv:
        index.setdefault(v, []).append(k)
Now index is:
{(11, 12): [1],
(13, 14): [1, 3],
(21, 22): [2, 4],
(23, 24): [2],
(31, 32): [3],
(43, 44): [4]}
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
pairs = [v for v in index.itervalues() if len(v) == 2]
returns [[1,3],[2,4]].
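For readers on Python 3, the same indexing idea can be sketched as follows. This is a minimal sketch, not the answer's original code: the function name find_shared is hypothetical, and it uses itertools.combinations instead of the len(v) == 2 filter so that a tuple shared by three or more lists still produces pairs.

```python
from collections import defaultdict
from itertools import combinations

pairs_list = [
    [1, (11, 12), (13, 14)],
    [2, (21, 22), (23, 24)],
    [3, (31, 32), (13, 14)],
    [4, (43, 44), (21, 22)],
]

def find_shared(rows):
    """Return sorted ID pairs for rows that share at least one tuple."""
    index = defaultdict(set)
    for row_id, *tuples in rows:
        for t in tuples:
            index[t].add(row_id)
    shared = set()
    for ids in index.values():
        # combinations() yields nothing for a single ID and every
        # pairing when three or more rows share the same tuple.
        shared.update(combinations(sorted(ids), 2))
    return sorted(shared)

print(find_shared(pairs_list))  # [(1, 3), (2, 4)]
```

Collecting the IDs in sets also means that a list containing the same tuple twice is not reported as a duplicate of itself.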

Related

List comprehension not giving expected result

I have a list of tuples, with each tuple containing a tuple pair, coordinates.
list1 = [((11, 11), (12, 12)), ((21, 21), (22, 22)), ((31, 31), (32, 32))]
Using list comprehension I am trying to generate a list without them being paired but still in the same order.
I was able to get the result by looping and using .append(), but I was trying to avoid this.
new_list = []
for i in list1:
    for x in i:
        new_list.append(x)
print(new_list)
this works and gives me the result I am looking for:
[(11, 11), (12, 12), (21, 21), (22, 22), (31, 31), (32, 32)]
But when I try list comprehension I get the last tuple pair repeated!
new_list = [x for x in i for i in list1]
print(new_list)
[(31, 31), (31, 31), (31, 31), (32, 32), (32, 32), (32, 32)]
I am sure it is a small thing I am doing wrong so would appreciate the help!!
try :
print([inner_tuple for outer_tuple in list1 for inner_tuple in outer_tuple])
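The nested for clauses go to the right of the expression, in the same order as the loops would be nested: for sub in list1 first, then for x in sub.
By the way, you didn't get NameError: name 'i' is not defined because you wrote that list comprehension after the for loop. By then, i exists as a global variable.
A comprehension runs its loops from the outermost to the innermost, reading left to right.
Your code is close, but with the clauses reversed, x is drawn only from the leftover i (the last pair), and the inner for i in list1 just repeats each x three times.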
Your code:
new_list = []
for i in list1:
    for x in i:
        new_list.append(x)
which is equivalent to
new_list = [x for i in list1 for x in i]
Now you can see how each variable takes its values from the for clauses.
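As a small self-contained check of the clause order, here is a sketch with more descriptive loop names (pair and point are illustrative names, not from the question). It also shows itertools.chain.from_iterable, a standard-library alternative that performs the same one-level flattening.

```python
from itertools import chain

list1 = [((11, 11), (12, 12)), ((21, 21), (22, 22)), ((31, 31), (32, 32))]

# The for clauses read left to right in the same order as nested loops:
# outer loop first, inner loop second.
flat = [point for pair in list1 for point in pair]

# chain.from_iterable flattens exactly one level of nesting.
flat_chain = list(chain.from_iterable(list1))

print(flat)
print(flat == flat_chain)
```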

Python: Lists to Dictionary

I'm writing this question despite the many answers on Stack Overflow, as the solutions did not work for my problem.
I have 2 lists, List1 and List2. When I call dict(zip(List1,List2)), the order of the elements inside the dictionary is disturbed.
print s_key
print value
sorted_dict = {k: v for k,v in zip(s_key,value)}
another_test = dict(zip(s_key,value))
print sorted_dict
print another_test
print zip(s_key,value)
Terminal :
[2, 1, 3]
[31, 12, 5]
{1: 12, 2: 31, 3: 5}
{1: 12, 2: 31, 3: 5}
[(2, 31), (1, 12), (3, 5)]
I was under the impression that the [(2, 31), (1, 12), (3, 5)] would be converted to a dict
Any help to understand where or what I'm doing wrong would help! Thanks!
a=[2, 1, 3]
b=[31, 12, 5]
from collections import OrderedDict
print(OrderedDict(zip(a,b)))
You cannot sort a dictionary. If you want to display the key/value pairs of your dictionary in sorted order, you can convert them to a list of tuples, as you already have, and sort it by whichever element you want. The code below creates a list of tuples and sorts it by the first element of each tuple:
l1,l2=[2, 1, 3],[31, 12, 5]
print ([(one,two) for (one,two) in
        sorted(zip(l1,l2),key=lambda pair: pair[0])])
prints:
[(1, 12), (2, 31), (3, 5)]
Shoutout to the question "Sorting list based on values from another list?" for the help.
Either that, or create a list of the dictionary's keys, sort that list, then loop through it and look up each key.
Or use an OrderedDict, as others have pointed out.
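The options above can be compared side by side in a short Python 3 sketch. One point worth knowing: since Python 3.7 a plain dict preserves insertion order, so on current interpreters dict(zip(...)) already keeps the order of the input lists. The variable names plain, ordered and by_key are illustrative only.

```python
from collections import OrderedDict

s_key = [2, 1, 3]
value = [31, 12, 5]

# Python 3.7+: a plain dict keeps keys in insertion order.
plain = dict(zip(s_key, value))

# OrderedDict keeps insertion order on older versions as well.
ordered = OrderedDict(zip(s_key, value))

# Sorting happens on the items, not on the dict itself.
by_key = sorted(plain.items())

print(list(plain))    # [2, 1, 3] on Python 3.7+
print(list(ordered))  # [2, 1, 3]
print(by_key)         # [(1, 12), (2, 31), (3, 5)]
```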

Sum values in tuple (values in dict)

I have a dictionary data that looks like that with sample values:
defaultdict(<type 'list'>,
{(None, 2014): [(5, 1), (10, 2)],
(u'Middle', 2014): [(6, 2), (11, 3)],
(u'SouthWest', 2015): [(7,3), (12, 4)]})
I get this from collections.defaultdict(list) because my values have to be lists.
My goal is to get a new dictionary that will contain the sum values for every tuple with respect to their position in the tuple.
By running
out = {k:(sum(tup[0] for tup in v),sum(tup[1] for tup in v)) for k,v in data.items()}
I get
{(None, 2014): (15, 3), (u'Middle', 2014): (17, 5), (u'SouthWest', 2015): (19, 7)}
However, the number of items in each tuple is not fixed when I write the code, so using sum(tup[0] for tup in v) with hard-coded indices is not an option. I do know the tuple length at run time: it is an integer that I receive along with the data dict. All tuples are always of the same length (in this example, of length 2).
How do I tell Python that I want the out dict to contain tuple of the size that matches the length I have to use?
I think you want the built-in zip function:
In [26]: {k: tuple(sum(x) for x in zip(*v)) for k, v in data.items()}
Out[26]:
{('SouthWest', 2015): (19, 7),
(None, 2014): (15, 3),
('Middle', 2014): (17, 5)}
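To see why zip(*v) removes the need for hard-coded indices, here is a small Python 3 sketch using the sample data from the question. The helper name sum_columns is hypothetical. Unpacking the list of tuples with * transposes it, so each item produced by zip is one "column" that can be summed regardless of the tuple length.

```python
from collections import defaultdict

data = defaultdict(list, {
    (None, 2014): [(5, 1), (10, 2)],
    ("Middle", 2014): [(6, 2), (11, 3)],
    ("SouthWest", 2015): [(7, 3), (12, 4)],
})

# zip(*rows) transposes: [(5, 1), (10, 2)] -> (5, 10), (1, 2)
columns = list(zip(*data[(None, 2014)]))

def sum_columns(rows):
    """Sum tuples position by position, whatever their length."""
    return tuple(sum(col) for col in zip(*rows))

out = {key: sum_columns(rows) for key, rows in data.items()}

print(columns)              # [(5, 10), (1, 2)]
print(out[(None, 2014)])    # (15, 3)
```

Note that an empty list of tuples yields an empty tuple with this approach, since there are no columns to sum.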

removing something from a list of tuples

Say I have a list:
[(12,34,1),(123,34,1),(21,23,1)]
I want to remove the 1 from each tuple in the list so it becomes
[(12,34),(123,34),(21,23)]
You want to truncate your tuples, so use a list comprehension:
[t[:-1] for t in listoftuples]
or, as a simple demonstration:
>>> listoftuples = [(12,34,1),(123,34,1),(21,23,1)]
>>> [t[:-1] for t in listoftuples]
[(12, 34), (123, 34), (21, 23)]
Tuples are immutable, so you cannot remove an item in place. However, you can create a new tuple from the old one that leaves out the elements you do not want. So, to delete an arbitrary item from each tuple in a list of tuples, you can do:
def deleteItem(lst, toDel):
    return [tuple(x for x in y if x != toDel) for y in lst]
Result:
>>> lst = [(12,34,1),(123,34,1),(21,23,1)]
>>> deleteItem(lst, 1)
[(12, 34), (123, 34), (21, 23)]
>>> a=[(12, 34, 1), (123, 34, 1), (21, 23, 1)]
>>> [filter (lambda a: a != 1, x) for x in a]
[(12, 34), (123, 34), (21, 23)]
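This removes every 1 from each tuple, irrespective of its index. (It relies on Python 2, where filter returns a tuple when given a tuple; in Python 3, wrap the call in tuple().)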
Since you can't change tuples (as they are immutable), I suggest using a list of lists:
my_list = [[12,34,1],[123,34,1],[21,23,1]]
for i in my_list:
    i.remove(1)  # removes the first occurrence of 1 from each inner list
print(my_list)
This prints: [[12, 34], [123, 34], [21, 23]].
In Python 3.2:
1. [(i, v) for i, v, c in list1]
2. list(map(lambda x: x[:2], list1))
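The answers above fall into two groups that behave differently on other inputs: some drop the last position, others drop every occurrence of a value. A minimal Python 3 sketch of the difference follows; the helper names drop_last and drop_value are hypothetical, and the second sample tuple is changed to (1, 34, 1) on purpose so the two approaches disagree.

```python
rows = [(12, 34, 1), (1, 34, 1), (21, 23, 1)]

def drop_last(tuples):
    """Remove the final element of every tuple, whatever its value."""
    return [t[:-1] for t in tuples]

def drop_value(tuples, unwanted):
    """Remove every occurrence of `unwanted`, whatever its position."""
    return [tuple(x for x in t if x != unwanted) for t in tuples]

print(drop_last(rows))      # [(12, 34), (1, 34), (21, 23)]
print(drop_value(rows, 1))  # [(12, 34), (34,), (21, 23)]
```

Slicing is the safer choice when the unwanted item is defined by its position; filtering is the right one when it is defined by its value.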

List of minimal pairs from a pair of lists

Given two lists of integers, generate the shortest list of pairs where every value in both lists is present. The first of each pair must be a value from the first list, and the second of each pair must be a value from the second list. The first of each pair must be less than the second of the pair.
A simple zip will not work if the lists are different lengths, or if the same integer exists at the same position in each list.
def gen_min_pairs(uplist, downlist):
    for pair in zip(uplist, downlist):
        yield pair
Here is what I can come up with so far:
def gen_min_pairs(uplist, downlist):
    up_gen = iter(uplist)
    down_gen = iter(downlist)
    last_up = None
    last_down = None
    while True:
        next_up = next(up_gen, last_up)
        next_down = next(down_gen, last_down)
        if (next_up == last_up and
                next_down == last_down):
            return
        while not next_up < next_down:
            next_down = next(down_gen, None)
            if next_down is None:
                return
        yield next_up, next_down
        last_up = next_up
        last_down = next_down
And here is a simple test routine:
if __name__ == '__main__':
    from pprint import pprint
    datalist = [
        {
            'up': [1,7,8],
            'down': [6,7,13]
        },
        {
            'up': [1,13,15,16],
            'down': [6,7,15]
        }
    ]
    for dates in datalist:
        min_pairs = [pair for pair in
                     gen_min_pairs(dates['up'], dates['down'])]
        pprint(min_pairs)
The program produces the expected output for the first set of dates, but fails for the second.
Expected:
[(1, 6), (7, 13), (8, 13)]
[(1, 6), (1, 7), (13, 15)]
Actual:
[(1, 6), (7, 13), (8, 13)]
[(1, 6), (13, 15)]
I think this can be done while looking at each element of each list only once, i.e. in O(len(up) + len(down)) time. I think it depends on the number of elements unique to each list.
EDIT: I should add that we can expect these lists to be sorted with the smallest integer first.
EDIT: uplist and downlist were just arbitrary names. Less confusing arbitrary ones might be A and B.
Also, here is a more robust test routine:
from random import uniform, sample
from pprint import pprint
def random_sorted_sample(maxsize=6, pop=31):
    size = int(round(uniform(1,maxsize)))
    li = sample(xrange(1,pop), size)
    return sorted(li)
if __name__ == '__main__':
    A = random_sorted_sample()
    B = random_sorted_sample()
    min_pairs = list(gen_min_pairs(A, B))
    pprint(A)
    pprint(B)
    pprint(min_pairs)
This generates random realistic inputs, calculates the output, and displays all three lists. Here is an example of what a correct implementation would produce:
[11, 13]
[1, 13, 28]
[(11, 13), (13, 28)]
[5, 15, 24, 25]
[3, 13, 21, 22]
[(5, 13), (15, 21), (15, 22)]
[3, 28]
[4, 6, 15, 16, 30]
[(3, 4), (3, 6), (3, 15), (3, 16), (28, 30)]
[2, 5, 20, 24, 26]
[8, 12, 16, 21, 23, 28]
[(2, 8), (5, 12), (5, 16), (20, 21), (20, 23), (24, 28), (26, 28)]
[3, 4, 5, 6, 7]
[1, 2]
[]
I had many ideas to solve this (see edit history ;-/) but none of them quite worked out or did it in linear time. It took me a while to see it, but I had a similar problem before so I really wanted to figure this out ;-)
Anyways, in the end the solution came when I gave up on doing it directly and started drawing graphs about the matchings. I think your first list simply defines intervals and you're looking for the items that fall into them:
def intervals(seq):
    seq = iter(seq)
    current = next(seq)
    for s in seq:
        yield current,s
        current = s
    yield s, float("inf")
def gen_min_pairs( fst, snd):
    snd = iter(snd)
    s = next(snd)
    for low, up in intervals(fst):
        while True:
            # does it fall in the current interval
            if low < s <= up:
                yield low, s
                # try with the next
                s = next(snd)
            else:
                # nothing in this interval, go to the next
                break
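To make the interval idea concrete, here is a minimal Python 3 sketch of just the interval-building step. It assumes the input is a sorted list rather than an arbitrary iterator, and the name intervals_from_list is hypothetical. Each value is paired with its successor, and the last value gets an open-ended upper bound.

```python
def intervals_from_list(values):
    """Pair each value with its successor; the last interval is open-ended."""
    uppers = list(values[1:]) + [float("inf")]
    return list(zip(values, uppers))

print(intervals_from_list([1, 7, 8]))
# [(1, 7), (7, 8), (8, inf)]
```

Building the upper bounds by slicing means an empty list simply produces no intervals, and a single value produces one interval that extends to infinity.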
zip_longest is called izip_longest in Python 2.x.
import itertools
def MinPairs(up,down):
    # no pairs are possible if either list is empty
    if not (up and down):
        return []
    up=list(itertools.takewhile(lambda x:x<down[-1],up))
    if not up:
        return []
    down=list(itertools.dropwhile(lambda x:x<up[0],down))
    if not down:
        return []
    for i in range(min(len(up),len(down))):
        if up[i]>=down[i]:
            up.insert(i,up[i-1])
    return tuple(itertools.zip_longest(up,down,fillvalue=(up,down)[len(up)>len(down)][-1]))
While not a complete answer (i.e. no code), have you tried looking at NumPy's "where" function?
