removing duplicate tuples in python

removing duplicate tuples in python - python

I have a list of 50 numbers, [0,1,2,...49] and I would like to create a list of tuples without duplicates, where i define (a,b) to be a duplicate of (b,a). Similarly, I do not want tuples of the form (a,a).
I have this:
pairs = set([])
mylist = range(0,50)
for i in mylist:
for j in mylist:
pairs.update([(i,j)])
set((a,b) if a<=b else (b,a) for a,b in pairs)
print len(pairs)
>>> 2500
I get 2500 whereas I expect to get, i believe, 1225 (n(n-1)/2).
What is wrong?

You want all combinations. Python provides a module, itertools, with all sorts of combinatorial utilities like this. Where you can, I would stick with using itertool, it almost certainly faster and more memory efficient than anything you would cook up yourself. It is also battle-tested. You should not reinvent the wheel.
>>> import itertools
>>> combs = list(itertools.combinations(range(50),2))
>>> len(combs)
1225
>>>
However, as others have noted, in the case where you have a sequence (i.e. something indexable) such as a list, and you want N choose k, where k=2 the above could simply be implemented by a nested for-loop over the indices, taking care to generate your indices intelligently:
>>> result = []
>>> for i in range(len(numbers)):
... for j in range(i + 1, len(numbers)):
... result.append((numbers[i], numbers[j]))
...
>>> len(result)
1225
However, itertool.combinations takes any iterable, and also takes a second argument, r which deals with cases where k can be something like 7, (and you don't want to write a staircase).
Your approach essentially takes the cartesian product, and then filters. This is inefficient, but if you wanted to do that, the best way is to use frozensets:
>>> combinations = set()
>>> for i in numbers:
... for j in numbers:
... if i != j:
... combinations.add(frozenset([i,j]))
...
>>> len(combinations)
1225
And one more pass to make things tuples:
>>> combinations = [tuple(fz) for fz in combinations]

Try This,
pairs = set([])
mylist = range(0,50)
for i in mylist:
for j in mylist:
if (i < j):
pairs.append([(i,j)])
print len(pairs)

problem in your code snippet is that you filter out unwanted values but you don't assign back to pairs so the length is the same... also: this formula yields the wrong result because it considers (20,20) as valid for instance.
But you should just create the proper list at once:
pairs = set()
for i in range(0,50):
for j in range(i+1,50):
pairs.add((i,j))
print (len(pairs))
result:
1225
With that method you don't even need a set since it's guaranteed that you don't have duplicates in the first place:
pairs = []
for i in range(0,50):
for j in range(i+1,50):
pairs.append((i,j))
or using list comprehension:
pairs = [(i,j) for i in range(0,50) for j in range(i+1,50)]

Related

Using multiple variables in a for loop in Python

I am trying to get a deeper understanding to how for loops for different data types in Python. The simplest way of using a for loop an iterating over an array is as
for i in range(len(array)):
do_something(array[i])
I also know that I can
for i in array:
do_something(i)
What I would like to know is what this does
for i, j in range(len(array)):
# What is i and j here?
or
for i, j in array:
# What is i and j in this case?
And what happens if I try using this same idea with dictionaries or tuples?

The simplest and best way is the second one, not the first one!
for i in array:
do_something(i)
Never do this, it's needlessly complicating the code:
for i in range(len(array)):
do_something(array[i])
If you need the index in the array for some reason (usually you don't), then do this instead:
for i, element in enumerate(array):
print("working with index", i)
do_something(element)
This is just an error, you will get TypeError: 'int' object is not iterable when trying to unpack one integer into two names:
for i, j in range(len(array)):
# What is i and j here?
This one might work, assumes the array is "two-dimensional":
for i, j in array:
# What is i and j in this case?
An example of a two-dimensional array would be a list of pairs:
>>> for i, j in [(0, 1), ('a', 'b')]:
... print('i:', i, 'j:', j)
...
i: 0 j: 1
i: a j: b
Note: ['these', 'structures'] are called lists in Python, not arrays.

Your third loop will not work as it will throw a TypeError for an int not being iterable. This is because you are trying to "unpack" the int that is the array's index into i, and j which is not possible. An example of unpacking is like so:
tup = (1,2)
a,b = tup
where you assign a to be the first value in the tuple and b to be the second. This is also useful when you may have a function return a tuple of values and you want to unpack them immediately when calling the function. Like,
train_X, train_Y, validate_X, validate_Y = make_data(data)
More common loop cases that I believe you are referring to is how to iterate over an arrays items and it's index.
for i, e in enumerate(array):
...
and
for k,v in d.items():
...
when iterating over the items in a dictionary. Furthermore, if you have two lists, l1 and l2 you can iterate over both of the contents like so
for e1, e2 in zip(l1,l2):
...
Note that this will truncate the longer list in the case of unequal lengths while iterating. Or say that you have a lists of lists where the outer lists are of length m and the inner of length n and you would rather iterate over the elements in the inner lits grouped together by index. This is effectively iterating over the transpose of the matrix, you can use zip to perform this operation as well.
for inner_joined in zip(*matrix): # will run m times
# len(inner_joined) == m
...

Python's for loop is an iterator-based loop (that's why bruno desthuilliers says that it "works for all iterables (lists, tuples, sets, dicts, iterators, generators etc)". A string is also another common type of iterable).
Let's say you have a list of tuples. Using that nomenclature you shared, one can iterate through both the keys and values simultaneously. For instance:
tuple_list = [(1, "Countries, Cities and Villages"),(2,"Animals"),(3, "Objects")]
for k, v in tuple_list:
print(k, v)
will give you the output:
1 Countries, Cities and Villages
2 Animals
3 Objects
If you use a dictionary, you'll also gonna be able to do this. The difference here is the need for .items()
dictionary = {1: "Countries, Cities and Villages", 2: "Animals", 3: "Objects"}
for k, v in dictionary.items():
print(k, v)
The difference between dictionary and dictionary.items() is the following
dictionary: {1: 'Countries, Cities and Villages', 2: 'Animals', 3: 'Objects'}
dictionary.items(): dict_items([(1, 'Countries, Cities and Villages'), (2, 'Animals'), (3, 'Objects')])
Using dictionary.items() we'll get a view object containig the key-value pairs of the dictionary, as tuples in a list. In other words, with dictionary.items() you'll also get a list of tuples. If you don't use it, you'll get
TypeError: cannot unpack non-iterable int object
If you want to get the same output using a simple list, you'll have to use something like enumerate()
list = ["Countries, Cities and Villages","Animals", "Objects"]
for k, v in enumerate(list, 1): # 1 means that I want to start from 1 instead of 0
print(k, v)
If you don't, you'll get
ValueError: too many values to unpack (expected 2)
So, this naturally raises the question... do I need always a list of tuples? No. Using enumerate() we'll get an enumerate object.

Actually, "the simplest way of using a for loop an iterating over an array" (the Python type is named "list" BTW) is the second one, ie
for item in somelist:
do_something_with(item)
which FWIW works for all iterables (lists, tuples, sets, dicts, iterators, generators etc).
The range-based C-style version is considered highly unpythonic, and will only work with lists or list-like iterables.
What I would like to know is what this does
for i, j in range(len(array)):
# What is i and j here?
Well, you could just test it by yourself... But the result is obvious: it will raise a TypeError because unpacking only works on iterables and ints are not iterable.
or
for i, j in array:
# What is i and j in this case?
Depends on what is array and what it yields when iterating over it. If it's a list of 2-tuples or an iterator yielding 2-tuples, i and j will be the elements of the current iteration item, ie:
array = [(letter, ord(letter)) for letter in "abcdef"]
for letter, letter_ord in array:
print("{} : {}".format(letter, letter_ord))
Else, it will most probably raise a TypeError too.
Note that if you want to have both the item and index, the solution is the builtin enumerate(sequence), which yields an (index, item) tuple for each item:
array = list("abcdef")
for index, letter in enumerate(array):
print("{} : {}".format(index, letter)

Understood your question, explaining it using a different example.
1. Multiple Assignments using -> Dictionary
dict1 = {1: "Bitcoin", 2: "Ethereum"}
for key, value in dict1.items():
print(f"Key {key} has value {value}")
print(dict1.items())
Output:
Key 1 has value Bitcoin
Key 2 has value Ethereum
dict_items([(1, 'Bitcoin'), (2, 'Ethereum')])
Explaining dict1.items():
dict1_items() creates values dict_items([(1, 'Bitcoin'), (2, 'Ethereum')])
It comes in pairs (key, value) for each iteration.
2. Multiple Assignments using -> enumerate() Function
coins = ["Bitcoin", "Ethereum", "Cardano"]
prices = [48000, 2585, 2]
for i, coin in enumerate(coins):
price = prices[i]
print(f"${price} for 1 {coin}")
Output:
$48000 for 1 Bitcoin
$2585 for 1 Ethereum
$2 for 1 Cardano
Explaining enumerate(coins):
enumerate(coins) creates values ((0, 'Bitcoin'), (1, 'Ethereum'), (2, 'Cardano'))
It comes in pairs (index, value) for each (one) iteration
3. Multiple Assignments using -> zip() Function
coins = ["Bitcoin", "Ethereum", "Cardano"]
prices = [48000, 2585, 2]
for coin, price in zip(coins, prices):
print(f"${price} for 1 {coin}")
Output:
$48000 for 1 Bitcoin
$2585 for 1 Ethereum
$2 for 1 Cardano
Explaining
zip(coins, prices) create values (('Bitcoin', 48000), ('Ethereum', 2585), ('Cardano', 2))
It comes in pairs (value-list1, value-list2) for each (one) iteration.

I just wanted to add that, even in Python, you can get a for in effect using a classic C style loop by just using a local variable
l = len(mylist) #I often need to use this more than once anyways
for n in range(l-1):
i = mylist[n]
print("iterator:",n, " item:",i)

Check number not a sum of 2 ints on a list

Given a list of integers, I want to check a second list and remove from the first only those which can not be made from the sum of two numbers from the second. So given a = [3,19,20] and b = [1,2,17], I'd want [3,19].
Seems like a a cinch with two nested loops - except that I've gotten stuck with break and continue commands.
Here's what I have:
def myFunction(list_a, list_b):
for i in list_a:
for a in list_b:
for b in list_b:
if a + b == i:
break
else:
continue
break
else:
continue
list_a.remove(i)
return list_a
I know what I need to do, just the syntax seems unnecessarily confusing. Can someone show me an easier way? TIA!

You can do like this,
In [13]: from itertools import combinations
In [15]: [item for item in a if item in [sum(i) for i in combinations(b,2)]]
Out[15]: [3, 19]
combinations will give all possible combinations in b and get the list of sum. And just check the value is present in a
Edit
If you don't want to use the itertools wrote a function for it. Like this,
def comb(s):
for i, v1 in enumerate(s):
for j in range(i+1, len(s)):
yield [v1, s[j]]
result = [item for item in a if item in [sum(i) for i in comb(b)]]

Comments on code:
It's very dangerous to delete elements from a list while iterating over it. Perhaps you could append items you want to keep to a new list, and return that.
Your current algorithm is O(nm^2), where n is the size of list_a, and m is the size of list_b. This is pretty inefficient, but a good start to the problem.
Thee's also a lot of unnecessary continue and break statements, which can lead to complicated code that is hard to debug.
You also put everything into one function. If you split up each task into different functions, such as dedicating one function to finding pairs, and one for checking each item in list_a against list_b. This is a way of splitting problems into smaller problems, and using them to solve the bigger problem.
Overall I think your function is doing too much, and the logic could be condensed into much simpler code by breaking down the problem.
Another approach:
Since I found this task interesting, I decided to try it myself. My outlined approach is illustrated below.
1. You can first check if a list has a pair of a given sum in O(n) time using hashing:
def check_pairs(lst, sums):
lookup = set()
for x in lst:
current = sums - x
if current in lookup:
return True
lookup.add(x)
return False
2. Then you could use this function to check if any any pair in list_b is equal to the sum of numbers iterated in list_a:
def remove_first_sum(list_a, list_b):
new_list_a = []
for x in list_a:
check = check_pairs(list_b, x)
if check:
new_list_a.append(x)
return new_list_a
Which keeps numbers in list_a that contribute to a sum of two numbers in list_b.
3. The above can also be written with a list comprehension:
def remove_first_sum(list_a, list_b):
return [x for x in list_a if check_pairs(list_b, x)]
Both of which works as follows:
>>> remove_first_sum([3,19,20], [1,2,17])
[3, 19]
>>> remove_first_sum([3,19,20,18], [1,2,17])
[3, 19, 18]
>>> remove_first_sum([1,2,5,6],[2,3,4])
[5, 6]
Note: Overall the algorithm above is O(n) time complexity, which doesn't require anything too complicated. However, this also leads to O(n) extra auxiliary space, because a set is kept to record what items have been seen.

You can do it by first creating all possible sum combinations, then filtering out elements which don't belong to that combination list
Define the input lists
>>> a = [3,19,20]
>>> b = [1,2,17]
Next we will define all possible combinations of sum of two elements
>>> y = [i+j for k,j in enumerate(b) for i in b[k+1:]]
Next we will apply a function to every element of list a and check if it is present in above calculated list. map function can be use with an if/else clause. map will yield None in case of else clause is successful. To cater for this we can filter the list to remove None values
>>> list(filter(None, map(lambda x: x if x in y else None,a)))
The above operation will output:
>>> [3,19]
You can also write a one-line by combining all these lines into one, but I don't recommend this.

you can try something like that:
a = [3,19,20]
b= [1,2,17,5]
n_m_s=[]
data=[n_m_s.append(i+j) for i in b for j in b if i+j in a]
print(set(n_m_s))
print("after remove")
final_data=[]
for j,i in enumerate(a):
if i not in n_m_s:
final_data.append(i)
print(final_data)
output:
{19, 3}
after remove
[20]

Python, finding unique words in multiple lists

I have the following code:
a= ['hello','how','are','hello','you']
b= ['hello','how','you','today']
len_b=len(b)
for word in a:
count=0
while count < len_b:
if word == b[count]:
a.remove(word)
break
else:
count=count+1
print a
The goal is that it basically outputs (contents of list a)-(contents of list b)
so the wanted result in this case would be a = ['are','hello']
but when i run my code i get a= ['how','are','you']
can anybody either point out what is wrong with my implementation, or is there another better way to solve this?

You can use a set to get all non duplicate elements
So you could do set(a) - set(b) for the difference of sets

The reason for this is because you are mutating the list a while iterating over it.
If you want to solve it correctly, you can try the below method. It uses list comprehension and dictionary to keep track of the number of words in the resulting set:
>>> a = ['hello','how','are','hello','you']
>>> b = ['hello','how','you','today']
>>>
>>> cnt_a = {}
>>> for w in a:
... cnt_a[w] = cnt_a.get(w, 0) + 1
...
>>> for w in b:
... if w in cnt_a:
... cnt_a[w] -= 1
... if cnt_a[w] == 0:
... del cnt_a[w]
...
>>> [y for k, v in cnt_a.items() for y in [k] * v]
['hello', 'are']
It works well in case where there are duplicates, even in the resulting list. However it may not preserve the order, but it can be easily modify to do this if you want.

set(a+b) is alright, too. You can use sets to get unique elements.

Python List Indexing or Appending?

What is the best way to add values to a List in terms of processing time, memory usage and just generally what is the best programming option.
list = []
for i in anotherArray:
list.append(i)
or
list = range(len(anotherArray))
for i in list:
list[i] = anotherArray[i]
Considering that anotherArray is for example an array of Tuples. (This is just a simple example)

It really depends on your use case. There is no generic answer here as it depends on what you are trying to do.
In your example, it looks like you are just trying to create a copy of the array, in which case the best way to do this would be to use copy:
from copy import copy
list = copy(anotherArray)
If you are trying to transform the array into another array you should use list comprehension.
list = [i[0] for i in anotherArray] # get the first item from tuples in anotherArray
If you are trying to use both indexes and objects, you should use enumerate:
for i, j in enumerate(list)
which is much better than your second example.
You can also use generators, lambas, maps, filters, etc. The reason all of these possibilities exist is because they are all "better" for different reasons. The writters of python are pretty big on "one right way", so trust me, if there was one generic way which was always better, that is the only way that would exist in python.
Edit: Ran some results of performance for tuple swap and here are the results:
comprehension: 2.682028295999771
enumerate: 5.359116118001111
for in append: 4.177091988000029
for in indexes: 4.612594166001145
As you can tell, comprehension is usually the best bet. Using enumerate is expensive.
Here is the code for the above test:
from timeit import timeit
some_array = [(i, 'a', True) for i in range(0,100000)]
def use_comprehension():
return [(b, a, i) for i, a, b in some_array]
def use_enumerate():
lst = []
for j, k in enumerate(some_array):
i, a, b = k
lst.append((b, a, i))
return lst
def use_for_in_with_append():
lst = []
for i in some_array:
i, a, b = i
lst.append((b, a, i))
return lst
def use_for_in_with_indexes():
lst = [None] * len(some_array)
for j in range(len(some_array)):
i, a, b = some_array[j]
lst[j] = (b, a, i)
return lst
print('comprehension:', timeit(use_comprehension, number=200))
print('enumerate:', timeit(use_enumerate, number=200))
print('for in append:', timeit(use_for_in_with_append, number=200))
print('for in indexes:', timeit(use_for_in_with_indexes, number=200))
Edit2:
It was pointed out to me the the OP just wanted to know the difference between "indexing" and "appending". Really, those are used for two different use cases as well. Indexing is for replacing objects, whereas appending is for adding. However, in a case where the list starts empty, appending will always be better because the indexing has the overhead of creating the list initially. You can see from the results above that indexing is slightly slower, mostly because you have to create the first list.

Best way is list comprehension :
my_list=[i for i in anotherArray]
But based on your problem you can use a generator expression (is more efficient than list comprehension when you just want to loop over your items and you don't need to use some list methods like indexing or len or ... )
my_list=(i for i in anotherArray)

I would actually say the best is a combination of index loops and value loops with enumeration:
for i, j in enumerate(list): # i is the index, j is the value, can't go wrong

Sort list of lists by unique reversed absolute condition

Context - developing algorithm to determine loop flows in a power flow network.
Issue:
I have a list of lists, each list represents a loop within the network determined via my algorithm. Unfortunately, the algorithm will also pick up the reversed duplicates.
i.e.
L1 = [a, b, c, -d, -a]
L2 = [a, d, c, -b, -a]
(Please note that c should not be negative, it is correct as written due to the structure of the network and defined flows)
Now these two loops are equivalent, simply following the reverse structure throughout the network.
I wish to retain L1, whilst discarding L2 from the list of lists.
Thus if I have a list of 6 loops, of which 3 are reversed duplicates I wish to retain all three.
Additionally, The loop does not have to follow the format specified above. It can be shorter, longer, and the sign structure (e.g. pos pos pos neg neg) will not occur in all instances.
I have been attempting to sort this by reversing the list and comparing the absolute values.
I am completely stumped and any assistance would be appreciated.
Based upon some of the code provided by mgibson I was able to create the following.
def Check_Dup(Loops):
Act = []
while Loops:
L = Loops.pop()
Act.append(L)
Loops = Popper(Loops, L)
return Act
def Popper(Loops, L):
for loop in Loops:
Rev = loop[::-1]
if all (abs(x) == abs(y) for x, y in zip(loop_check, Rev)):
Loops.remove(loop)
return Loops
This code should run until there are no loops left discarding the duplicates each time. I'm accepting mgibsons answers as it provided the necessary keys to create the solution

I'm not sure I get your question, but reversing a list is easy:
a = [1,2]
a_rev = a[::-1] #new list -- if you just want an iterator, reversed(a) also works.
To compare the absolute values of a and a_rev:
all( abs(x) == abs(y) for x,y in zip(a,a_rev) )
which can be simplified to:
all( abs(x) == abs(y) for x,y in zip(a,reversed(a)) )
Now, in order to make this as efficient as possible, I would first sort the arrays based on the absolute value:
your_list_of_lists.sort(key = lambda x : map(abs,x) )
Now you know that if two lists are going to be equal, they have to be adjacent in the list and you can just pull that out using enumerate:
def cmp_list(x,y):
return True if x == y else all( abs(a) == abs(b) for a,b in zip(a,b) )
duplicate_idx = [ idx for idx,val in enumerate(your_list_of_lists[1:])
if cmp_list(val,your_list_of_lists[idx]) ]
#now remove duplicates:
for idx in reversed(duplicate_idx):
_ = your_list_of_lists.pop(idx)
If your (sub) lists are either strictly increasing or strictly decreasing, this becomes MUCH simpler.
lists = list(set( tuple(sorted(x)) for x in your_list_of_lists ) )

I don't see how they can be equivalent if you have c in both directions - one of them must be -c
>>> a,b,c,d = range(1,5)
>>> L1 = [a, b, c, -d, -a]
>>> L2 = [a, d, -c, -b, -a]
>>> L1 == [-x for x in reversed(L2)]
True
now you can write a function to collapse those two loops into a single value
>>> def normalise(loop):
... return min(loop, [-x for x in reversed(L2)])
...
>>> normalise(L1)
[1, 2, 3, -4, -1]
>>> normalise(L2)
[1, 2, 3, -4, -1]
A good way to eliminate duplicates is to use a set, we just need to convert the lists to tuples
>>> L=[L1, L2]
>>> set(tuple(normalise(loop)) for loop in L)
set([(1, 2, 3, -4, -1)])

[pair[0] for pair in frozenset(sorted( (c,negReversed(c)) ) for c in cycles)]
Where:
def negReversed(list):
return tuple(-x for x in list[::-1])
and where cycles must be tuples.
This takes each cycle, computes its duplicate, and sorts them (putting them in a pair that are canonically equivalent). The set frozenset(...) uniquifies any duplicates. Then you extract the canonical element (in this case I arbitrarily chose it to be pair[0]).
Keep in mind that your algorithm might be returning cycles starting in arbitrary places. If this is the case (i.e. your algorithm might return either [1,2,-3] or [-3,1,2]), then you need to consider these as equivalent necklaces
There are many ways to canonicalize necklaces. The above way is less efficient because we don't care about canonicalizing the necklace directly: we just treat the entire equivalence class as the canonical element, by turning each cycle (a,b,c,d,e) into {(a,b,c,d,e), (e,a,b,c,d), (d,e,a,b,c), (c,d,e,a,b), (b,c,d,e,a)}. In your case since you consider negatives to be equivalent, you would turn each cycle into {(a,b,c,d,e), (e,a,b,c,d), (d,e,a,b,c), (c,d,e,a,b), (b,c,d,e,a), (-a,-b,-c,-d,-e), (-e,-a,-b,-c,-d), (-d,-e,-a,-b,-c), (-c,-d,-e,-a,-b), (-b,-c,-d,-e,-a)}. Make sure to use frozenset for performance, as set is not hashable:
eqClass.pop() for eqClass in {frozenset(eqClass(c)) for c in cycles}
where:
def eqClass(cycle):
for rotation in rotations(cycle):
yield rotation
yield (-x for x in rotation)
where rotation is something like Efficient way to shift a list in python but yields a tuple

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

removing duplicate tuples in python - python

Try This, pairs = set([]) mylist = range(0,50) for i in mylist: for j in mylist: if (i < j): pairs.append([(i,j)]) print len(pairs)

Related

Using multiple variables in a for loop in Python

Check number not a sum of 2 ints on a list

Python, finding unique words in multiple lists

Python List Indexing or Appending?

Sort list of lists by unique reversed absolute condition

Categories

Resources