List comprehension behavior in Python

I am working with Codeskulptor on a rock collision problem. I want to check collisions between rocks and my rocks are in a list. I came up with the solution to build a list of combinations and then check for collision.
I do not have itertools available.
My combination list was created like this:
def combinations(items):
    n_items = [(n,item) for n,item in enumerate(items)]
    return [(item,item2) for n,item in n_items for m,item2 in n_items[n:] if n != m]
letters = ['A','B','C','D']
print combinations(letters)
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
The result is ok.
Before that, I tried to do this as a one-liner that calls enumerate() directly:
def combinations2(items):
    return [(item,item2) for n,item in enumerate(items) for m,item2 in enumerate(items[n:]) if n != m]
letters = ['A','B','C','D']
print combinations2(letters)
But the outcome is completely different and wrong:
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'), ('B', 'D'), ('C', 'C'), ('C', 'D'), ('D', 'D')]
List comprehensions are still a little black magic to me. I cannot explain this behavior, but I would like to understand the wrong output.
I know that my two-line solution is faster, since enumerate is only done once and then reused. But the wrong output is inexplicable to me, especially as BC is missing and the doubles BB, CC and DD are there while AA is missing.
Can someone help me?

First thing to do when understanding a list comprehension is to expand it to a regular set of for loops. Read the loops from left to right and nest accordingly.
Working code:
def combinations(items):
    n_items = []
    for n,item in enumerate(items):
        n_items.append((n,item))
    result = []
    for n, item in n_items:
        for m, item2 in n_items[n:]:
            if n != m:
                result.append((item, item2))
    return result
and your attempt that doesn't work:
def combinations2(items):
    result = []
    for n, item in enumerate(items):
        for m, item2 in enumerate(items[n:]):
            if n != m:
                result.append((item, item2))
    return result
Perhaps this way it is easier to see what goes wrong between the two versions.
Your version slices just items, not the indices produced by enumerate(). The original version slices [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')] down to [(1, 'B'), (2, 'C'), (3, 'D')], etc. while your version re-numbers that slice to [(0, 'B'), (1, 'C'), (2, 'D')]. This in turn leads to your erroneous output.
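To see the re-numbering concretely, the loops of combinations2 can be instrumented to record both indices. This is only a diagnostic sketch; trace_pairs is a made-up helper name, not part of the original code.

```python
def trace_pairs(items):
    """Record (n, m, item, item2, kept) for each inner-loop step of the broken version."""
    rows = []
    for n, item in enumerate(items):
        # enumerate() restarts at 0 for every slice, so m is relative to the slice
        for m, item2 in enumerate(items[n:]):
            rows.append((n, m, item, item2, n != m))
    return rows

rows = trace_pairs(['A', 'B', 'C', 'D'])
kept = [(item, item2) for n, m, item, item2, keep in rows if keep]
dropped = [(item, item2) for n, m, item, item2, keep in rows if not keep]
print(kept)
print(dropped)
```

The test n != m drops only ('A', 'A') and ('B', 'C'), because those are the only steps where the slice-relative m happens to equal n. That is why BC is missing while BB, CC and DD survive.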
Start the inner loop at the higher index by adding a second argument to the enumerate() function, the index at which to start numbering:
def combinations2(items):
    result = []
    for n, item in enumerate(items):
        for m, item2 in enumerate(items[n:], n):
            if n != m:
                result.append((item, item2))
    return result
Back to a one-liner:
def combinations2(items):
    return [(item, item2) for n, item in enumerate(items) for m, item2 in enumerate(items[n:], n) if n != m]
This then works correctly:
>>> def combinations2(items):
...     return [(item, item2) for n, item in enumerate(items) for m, item2 in enumerate(items[n:], n) if n != m]
...
>>> letters = ['A','B','C','D']
>>> combinations2(letters)
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
Note that you can simplify it further; the only time when n == m is True is for the first iteration of each inner loop. Just slice the items list for the inner list one value further; start the outer enumerate() at 1, drop the inner enumerate() and drop the n != m test:
def combinations3(items):
    result = []
    for n, item in enumerate(items, 1):
        for item2 in items[n:]:
            result.append((item, item2))
    return result
or as a list comprehension:
def combinations3(items):
    return [(item, item2) for n, item in enumerate(items, 1) for item2 in items[n:]]
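Applied to the collision problem from the question, the pairs can be fed straight into a collision test. The sketch below is an assumption about how the rocks might look: each rock is a hypothetical (x, y, radius) tuple and collides is an illustrative helper, not a CodeSkulptor API.

```python
def combinations3(items):
    # Pair each item only with the items that come after it.
    return [(item, item2) for n, item in enumerate(items, 1) for item2 in items[n:]]

def collides(rock_a, rock_b):
    # Hypothetical rock representation: (x, y, radius) tuples.
    xa, ya, ra = rock_a
    xb, yb, rb = rock_b
    dx, dy = xa - xb, ya - yb
    # Compare squared distances so no square root is needed.
    return dx * dx + dy * dy <= (ra + rb) ** 2

rocks = [(0, 0, 5), (8, 0, 5), (100, 100, 5)]
hits = [(a, b) for a, b in combinations3(rocks) if collides(a, b)]
print(hits)
```

Because every unordered pair appears exactly once, each potential collision is tested only once.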

Just skip the clashes in the iterator.
>>> letter = ['A', 'B', 'C', 'D']
>>> list ( (x,y) for n, x in enumerate(letter) for y in letter[n+1:])
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

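If you just want every ordered pair of distinct items, you can filter the full cross product. Note that this yields both ('A', 'B') and ('B', 'A'), and that the syntax is Python 2 only:
def combinations2(items):
    return filter(lambda (i,j): i <> j, [(i,j) for i in items for j in items])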
letters = ['A','B','C','D']
print combinations2(letters)
The output I got is:
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'A'), ('B', 'C'), ('B', 'D'), ('C', 'A'), ('C', 'B'), ('C', 'D'), ('D', 'A'), ('D', 'B'), ('D', 'C')]
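The snippet above relies on Python 2 syntax: tuple parameters in a lambda and the <> operator were both removed in Python 3. A rough equivalent that runs on either version is sketched below; ordered_pairs and unordered_pairs are illustrative names. The second function compares positions instead of values, which keeps only one of ('A', 'B') and ('B', 'A').

```python
def ordered_pairs(items):
    # Every (i, j) with i != j, so both ('A', 'B') and ('B', 'A') appear.
    return [(i, j) for i in items for j in items if i != j]

def unordered_pairs(items):
    # Compare positions rather than values: each pair appears once.
    return [(a, b)
            for i, a in enumerate(items)
            for j, b in enumerate(items)
            if i < j]

letters = ['A', 'B', 'C', 'D']
print(ordered_pairs(letters))
print(unordered_pairs(letters))
```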

Related

Elegant Python code for list decomposition

I tried to write code that decomposes a list into all possible partitions.
The code I wrote was a mess. I am looking for an elegant solution, because I want to improve my coding style.
My initial version is shown below, but its memory requirements are too large and it runs too slowly.
import itertools
powerset = lambda iterable: itertools.chain.from_iterable(
    itertools.combinations(list(iterable), r)
    for r in range(1, len(list(iterable)) + 1))
flatten = lambda list2d: [item for sublist in list2d for item in sublist]
x = list("abcd")
xxx = [val for val in powerset([val1 for val1 in powerset(x)] )]
xxxx = [val for val in xxx if x == list(sorted(flatten(val)))]
xxxx is :
[(('a', 'b', 'c', 'd'),),
(('a',), ('b', 'c', 'd')),
(('b',), ('a', 'c', 'd')),
(('c',), ('a', 'b', 'd')),
(('d',), ('a', 'b', 'c')),
(('a', 'b'), ('c', 'd')),
(('a', 'c'), ('b', 'd')),
(('a', 'd'), ('b', 'c')),
(('a',), ('b',), ('c', 'd')),
(('a',), ('c',), ('b', 'd')),
(('a',), ('d',), ('b', 'c')),
(('b',), ('c',), ('a', 'd')),
(('b',), ('d',), ('a', 'c')),
(('c',), ('d',), ('a', 'b')),
(('a',), ('b',), ('c',), ('d',))]
version 2:
import itertools
powerset = lambda iterable: itertools.chain.from_iterable(
    itertools.combinations(list(iterable), r)
    for r in range(1, len(list(iterable)) + 1))
flatten = lambda list2d: [item for sublist in list2d for item in sublist]
def makelist(list_1D):
    for val in powerset(list(powerset(list_1D))) :
        if list_1D == list(sorted(flatten(val))) :
            yield val
        if val == tuple(itertools.combinations(list_1D, 1)) :
            break
for d in makelist(list("abcd")) :
    print(d)
output:
(('a', 'b', 'c', 'd'),)
(('a',), ('b', 'c', 'd'))
(('b',), ('a', 'c', 'd'))
(('c',), ('a', 'b', 'd'))
(('d',), ('a', 'b', 'c'))
(('a', 'b'), ('c', 'd'))
(('a', 'c'), ('b', 'd'))
(('a', 'd'), ('b', 'c'))
(('a',), ('b',), ('c', 'd'))
(('a',), ('c',), ('b', 'd'))
(('a',), ('d',), ('b', 'c'))
(('b',), ('c',), ('a', 'd'))
(('b',), ('d',), ('a', 'c'))
(('c',), ('d',), ('a', 'b'))
(('a',), ('b',), ('c',), ('d',))
version 3, taken from the question "Time Complexity of finding all partitions of a set":
def partition(collection):
    if len(collection) == 1:
        yield [collection]
        return
    first = collection[0]
    for smaller in partition(collection[1:]):
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[first] + subset] + smaller[n + 1:]
        yield [[first]] + smaller
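As a quick sanity check on version 3 (a sketch, assuming Python 3 and a list as input; the function body is repeated only so that the snippet runs on its own), the generator can be consumed lazily and its results counted without storing them:

```python
def partition(collection):
    if len(collection) == 1:
        yield [collection]
        return
    first = collection[0]
    for smaller in partition(collection[1:]):
        # put `first` into each existing subset in turn ...
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[first] + subset] + smaller[n + 1:]
        # ... or give it a subset of its own
        yield [[first]] + smaller

# Count the partitions one at a time instead of building a list of them.
count = sum(1 for _ in partition(list("abcd")))
print(count)
```

For four elements this counts 15 partitions, the same number of rows as in the outputs of versions 1 and 2.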
In order to avoid memory issues, we need to maximize the use of generators/iterators and never create a list of combinations.
Here is a way to do it by breaking down the problem in layers.
First, a generator to obtain partition sizes for a given number of elements. This will then be used to fill combinations of elements corresponding to each size except for the single element parts. The single element parts are done last in order to avoid duplicates. By doing them last, we always have exactly the right number of unused elements for the single element parts.
Partition generation
# Generator for all partition sizes forming N
def partSize(N):
    if N<2: yield [1]*N;return
    for s in range(1,N+1):
        yield from ([s]+rest for rest in partSize(N-s))
print(*partSize(3))
# [1, 1, 1] [1, 2] [2, 1] [3]
print(*partSize(4))
# [1, 1, 1, 1] [1, 1, 2] [1, 2, 1] [1, 3] [2, 1, 1] [2, 2] [3, 1] [4]
Partition filling
# A generator that fills partitions
# with combinations of indexes contained in A
from itertools import combinations
def fillCombo(A,parts,R=None):
    if R is None: R = [tuple()]*len(parts)
    size = max(parts)                  # fill largest partitions first
    if size < 2:                       # when only single element partitions are left
        iA = iter(A)                   # fill them with the remaining indexes
        yield [r if p!=1 else (next(iA),) for r,p in zip(R,parts)]
        return
    i,parts[i]= parts.index(size),0    # index of largest partition
    for combo in combinations(A,size): # all combinations of that size
        R[i] = combo                   # fill part and recurse
        yield from fillCombo(A.difference(combo),[*parts],[*R])
Mapping partition to indexed values
# for each partition pattern, fill with combinations
# using set of indexes in fillCombo so that repeated values
# are processed as distinct
def partCombo(A):
    for parts in partSize(len(A)):
        for iParts in fillCombo({*range(len(A))},parts): # combine indexes
            yield [tuple(A[i] for i in t) for t in iParts] # get actual values
output:
for pp in partCombo("abc"): print(pp)
[('a',), ('b',), ('c',)]
[('c',), ('a', 'b')]
[('b',), ('a', 'c')]
[('a',), ('b', 'c')]
[('a', 'b'), ('c',)]
[('a', 'c'), ('b',)]
[('b', 'c'), ('a',)]
[('a', 'b', 'c')]
This uses very little memory, but the running time still grows exponentially with the number of elements. For example:
sum(1 for _ in partCombo("abcdefghi")) # 768,500 combinations
takes 3.8 seconds on my laptop.
Adding just one more letter increases the execution time to 43 seconds for the 8,070,046 combinations.
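The sample output for "abc" lists some groupings more than once with the parts in a different order, for example [('c',), ('a', 'b')] and [('a', 'b'), ('c',)]. If only distinct groupings are wanted, one possible filter is sketched below. It assumes the elements are hashable and distinct, and unique_partitions is a hypothetical helper, not part of the code above. It keeps a set of everything seen so far, so it gives back some of the memory savings.

```python
def unique_partitions(partitions):
    """Yield each partition once, ignoring the order of its parts."""
    seen = set()
    for parts in partitions:
        # Order-insensitive key: a set of parts, each part itself a set.
        key = frozenset(frozenset(part) for part in parts)
        if key not in seen:
            seen.add(key)
            yield parts

# The eight rows printed for partCombo("abc") above.
sample = [
    [('a',), ('b',), ('c',)],
    [('c',), ('a', 'b')],
    [('b',), ('a', 'c')],
    [('a',), ('b', 'c')],
    [('a', 'b'), ('c',)],
    [('a', 'c'), ('b',)],
    [('b', 'c'), ('a',)],
    [('a', 'b', 'c')],
]
distinct = list(unique_partitions(sample))
print(len(distinct))
```

Since it accepts any iterable, the same filter could wrap the generator directly, as in unique_partitions(partCombo("abc")).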

Removing an element from a list based on a condition

I need to remove an element (in this case a tuple) from one list based on a condition (if satisfied) in another list.
I have 2 lists (list of tuples).
List1 = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
List2 = [(1, 2), (1, 3), (1, 2), (2, 3), (2, 2), (3, 2)]
List1 is basically computed from the following code.
import pandas as pd
from itertools import combinations
mapping = {'name': ['a', 'b', 'c', 'd'],'ID': [1,2,3,2]}
df = pd.DataFrame(mapping)
comb = df['name'].to_list()
List1 = list(combinations(comb,2))
# mapping the elements of the list to an 'ID' from the dataframe and creating a list based on the following code
List2 = [(df['ID'].loc[df.name == x].item(), df['ID'].loc[df.name == y].item()) for (x, y) in List1]
Now I need to apply a condition: I need to look at all tuples in List2 and see whether any tuple contains the same ID twice. For example, List2 contains (2, 2). So I want to go back to List1 and remove the corresponding tuple, the one that yielded this (2, 2) pair.
Essentially, my final revised list should be this:
RevisedList = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('c', 'd')]
('b', 'd') should be removed because it maps to (2, 2), a pair with the same ID twice.
List1 = [('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d')]
List2 = [(1,2), (1,3), (1,2), (2,3), (2,2)]
new_List1 = [elem for index,elem in enumerate(List1) if List2[index][0]!=List2[index][1]]
# Result: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c')]
It is not entirely clear, but is this what you are looking for? new_List1 only contains the elements of List1 at those indexes where the tuple in List2 holds two different numbers.
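The same filter can be written without explicit indexes by walking both lists in parallel with zip. A minimal sketch using the full six-element lists from the question; the names id_a and id_b are illustrative:

```python
List1 = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
List2 = [(1, 2), (1, 3), (1, 2), (2, 3), (2, 2), (3, 2)]

# Keep a name pair only when its two IDs differ.
RevisedList = [names for names, (id_a, id_b) in zip(List1, List2) if id_a != id_b]
print(RevisedList)
```

This relies on List2 having been built in the same order as List1, which holds for the list comprehension shown in the question.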

Find all unique pairs of keys of a dictionary

If there's a dictionary:
test_dict = { 'a':1,'b':2,'c':3,'d':4}
I want to find pairs of keys in list of tuples like:
[('a','b'),('a','c'),('a','d'),('b','c'),('b','d'),('c','d')]
I tried with the following double iteration
test_dict = { 'a':1,'b':2,'c':3,'d':4}
result = []
for first_key in test_dict:
    for second_key in test_dict:
        if first_key != second_key:
            pair = (first_key,second_key)
            result.append(pair)
But it's generating the following result
[('a', 'c'), ('a', 'b'), ('a', 'd'), ('c', 'a'), ('c', 'b'), ('c', 'd'), ('b', 'a'), ('b', 'c'), ('b', 'd'), ('d', 'a'), ('d', 'c'), ('d', 'b')]
For my test case ('a','b') and ('b','a') are similar and I just want one of them in the list. I had to run one more loop for getting the unique pairs from the result.
So is there any efficient way to do it in Python (preferably in 2.x)? I want to remove nested loops.
Update:
I have checked the possible duplicate that was flagged, but it does not solve the problem here; it produces combinations of other sizes. I only need pairs. For that question, tuples such as ('a','b','c') and ('a','b','c','d') are valid results, but for me they are not. I hope this explains the difference.
Sounds like a job for itertools.
from itertools import combinations
test_dict = {'a':1, 'b':2, 'c':3, 'd':4}
results = list(combinations(test_dict, 2))
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
I should add that although the output above happens to be sorted, this is not guaranteed. If order is important, you can instead use:
results = sorted(combinations(test_dict, 2))
Since dictionary keys are unique, this problem is equivalent to finding all combinations of size 2 of the keys. You can just use itertools for that:
>>> test_dict = { 'a':1,'b':2,'c':3,'d':4}
>>> import itertools
>>> list(itertools.combinations(test_dict, 2))
[('c', 'a'), ('c', 'd'), ('c', 'b'), ('a', 'd'), ('a', 'b'), ('d', 'b')]
Note, these will come in no particular order, since dict objects are inherently unordered. But you can sort before or after, if you want sorted order:
>>> list(itertools.combinations(sorted(test_dict), 2))
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
>>>
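The shuffled order shown above is what Python 2 produces. Since Python 3.7, dictionaries are guaranteed to iterate in insertion order, so on a recent interpreter the pairs follow the order in which the keys were added. A small sketch, assuming Python 3.7 or later:

```python
from itertools import combinations

test_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
# Iterating a dict yields its keys, in insertion order on Python 3.7+.
pairs = list(combinations(test_dict, 2))
print(pairs)
```

If the keys were inserted in a different order and sorted output matters, wrapping the dictionary in sorted() as shown above still applies.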
Note, this algorithm is relatively simple if you are working with sequences like a list:
>>> ks = list(test_dict)
>>> for i, a in enumerate(ks):
...     for b in ks[i+1:]: # this is the important bit
...         print(a, b)
...
c a
c d
c b
a d
a b
d b
Or more succinctly:
>>> [(a,b) for i, a in enumerate(ks) for b in ks[i+1:]]
[('c', 'a'), ('c', 'd'), ('c', 'b'), ('a', 'd'), ('a', 'b'), ('d', 'b')]
>>>
itertools.combinations does just what you want:
from itertools import combinations
test_dict = { 'a':1,'b':2,'c':3,'d':4}
keys = tuple(test_dict)
combs = list(combinations(keys, 2))
print(combs)
# [('a', 'd'), ('a', 'b'), ('a', 'c'), ('d', 'b'), ('d', 'c'), ('b', 'c')]
combs = list(combinations(test_dict, 2)) would do as well, since iterating over a dictionary just iterates over its keys.

Where in my code are duplicates being deleted?

An important part of my output is being able to identify the length of finalList, but somewhere in my code duplicates are being deleted and I can't figure out where.
from itertools import chain, permutations
allPos = []
first_list = ['a','b','c']
match_list = [['a','b','c'], ['a','b','c']]
for i in range(1,30):
    for phrase in permutations(first_list, i):
        for ind, letter in enumerate(chain.from_iterable(phrase)):
            if ind >= len(match_list) or letter not in match_list[ind]:
                break
        else:
            allPos.append(phrase)
finalList = []
for i in allPos:
    if len(i) == len(allPos[-1]):
        finalList.append(i)
print(finalList)
OUTPUT
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
I know that it is deleting duplicates, or perhaps my code is just missing something completely because I am missing [('a','a'), ('b','b'), ('c','c')] from my output
You can try this: change the iterable that is passed to permutations.
from itertools import chain, permutations
...
...
for i in range(1,30):
    # change iterable
    for phrase in permutations([j for ele in match_list for j in ele], i):
        ...
for i in set(allPos):
    if len(i) == len(allPos[-1]):
        finalList.append(i)
print (sorted(finalList))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')]
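Nothing in the question's code deletes duplicates: permutations never reuses an element of its input, so ('a', 'a') cannot come out of permutations(first_list, 2). A different technique from the answer above is itertools.product, which draws one element per position and allows repeats. A minimal sketch, assuming each inner list of match_list holds the letters allowed at that position:

```python
from itertools import product

match_list = [['a', 'b', 'c'], ['a', 'b', 'c']]
# One letter from each position's list; repeats such as ('a', 'a') are included.
finalList = list(product(*match_list))
print(finalList)
print(len(finalList))
```

This avoids both the set() de-duplication step and the loop over lengths up to 30.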

Unique Combinations in a list of k,v tuples in Python

I have a list of various combos of items in tuples
example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]
I wish to group and count by unique combinations
yielding the result
result = [((1,2), 3), ((1,1), 2), ((2,3,1), 2)]
It is not important that the order is maintained, or which permutation of each combination is preserved. It is, however, very important that the operation be done with a lambda function and that the output still be a list of tuples as above, because I will be working with a Spark RDD object.
My code currently counts patterns taken from a data set using
from operator import add
RDD = sc.parallelize(example)
result = RDD.map(lambda(y):(y, 1))\
    .reduceByKey(add)\
    .collect()
print result
I need another .map command that will account for the different permutations, as explained above.
How about this: maintain a set that contains the sorted form of each item you've already seen. Only add an item to the result list if you haven't seen its sorted form already.
example = [ ('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
result = []
seen = set()
for item in example:
    sorted_form = tuple(sorted(item))
    if sorted_form not in seen:
        result.append(item)
        seen.add(sorted_form)
print result
Result:
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
You can use an OrderedDict to create an ordered dictionary keyed on the sorted form of each item:
>>> from collections import OrderedDict
>>> d=OrderedDict()
>>> for i in example:
...     d.setdefault(tuple(sorted(i)),i)
...
('a', 'b')
('a', 'a', 'a')
('a', 'a')
('a', 'b')
('c', 'd')
('b', 'c', 'a')
('b', 'c', 'a')
>>> d
OrderedDict([(('a', 'b'), ('a', 'b')), (('a', 'a', 'a'), ('a', 'a', 'a')), (('a', 'a'), ('a', 'a')), (('c', 'd'), ('c', 'd')), (('a', 'b', 'c'), ('b', 'c', 'a'))])
>>> d.values()
[('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]
This is similar to the sorted-dict approach.
from itertools import groupby
ex = [(1,2,3), (3,2,1), (1,1), (2,1), (1,2), (3,2), (2,3,1)]
f = lambda x: tuple(sorted(x))  # used as the key
[tuple(k) for k, _ in groupby(sorted(ex, key=f), key=f)]
The nice thing is that you can see which tuples belong to the same combination:
In [16]: example = [ ('a','b'), ('a','a','a'), ('a','a'), ('a', 'a', 'a', 'a'), ('b','a'), ('c', 'd'), ('b','c','a'), ('a','b','c') ]
In [17]: for k, grpr in groupby(sorted(example, key=lambda x: tuple(sorted(x))), key=lambda x: tuple(sorted(x))):
   ....:     print k, list(grpr)
   ....:
('a', 'a') [('a', 'a')]
('a', 'a', 'a') [('a', 'a', 'a')]
('a', 'a', 'a', 'a') [('a', 'a', 'a', 'a')]
('a', 'b') [('a', 'b'), ('b', 'a')]
('a', 'b', 'c') [('b', 'c', 'a'), ('a', 'b', 'c')]
('c', 'd') [('c', 'd')]
Based on the comments, what you actually seem to need is map-reduce. I don't have Spark installed, but according to the docs (see transformations) it should look something like this:
data.map(lambda i: (frozenset(i), i)).reduceByKey(lambda _, i : i)
This however will return (b, a) if your dataset has (a, b), (b, a) in that order.
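The counting step that the question asks for can be prototyped without Spark. The sketch below uses collections.Counter in plain Python and a sorted tuple as the key; unlike frozenset, a sorted tuple keeps repeated values, so (1, 1) stays distinct from (1,). The name normalize is illustrative.

```python
from collections import Counter

example = [(1, 2), (2, 1), (1, 1), (1, 1), (2, 1), (2, 3, 1), (1, 2, 3)]

# Same role as the key function in the map step: one canonical form per combination.
normalize = lambda t: tuple(sorted(t))

counts = Counter(normalize(t) for t in example)
result = list(counts.items())
print(result)
```

In PySpark the same key function could presumably be used as RDD.map(lambda t: (tuple(sorted(t)), 1)).reduceByKey(add); that variant is untested here.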
I solved my own problem, although it was difficult to pin down what I was really looking for. I used:
example = [(1,2), (1,1,1), (1,1), (1,1), (2,1), (3,4), (2,3,1), (1,2,3)]
RDD = sc.parallelize(example)
result = RDD.map(lambda x: list(set(x)))\
    .filter(lambda x: len(x)>1)\
    .map(lambda(x):(tuple(x), 1))\
    .reduceByKey(add)\
    .collect()
print result
This also eliminated tuples that merely repeat a single value, such as (1,1) and (1,1,1), which was an added benefit for me.
Since you are looking for a lambda function, try the following:
lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
You can use this lambda function like so:
from collections import OrderedDict
uniquify = lambda x, y=OrderedDict(): [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or y.values()
result = uniquify(example)
Obviously, this sacrifices readability compared to the other answers. It is basically doing the same thing as Kasramvd's answer, in a single ugly line.
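One caveat with the one-liner: the default argument y=OrderedDict() is created once, when the lambda is defined, so a second call still sees the entries from the first. A plain function avoids the shared state. This is only a sketch, assuming Python 3, where values() returns a view and is therefore wrapped in list().

```python
from collections import OrderedDict

def uniquify(items):
    # A fresh OrderedDict per call, keyed on the sorted form of each tuple.
    seen = OrderedDict()
    for item in items:
        seen.setdefault(tuple(sorted(item)), item)  # keeps the first occurrence
    return list(seen.values())

example = [('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('b', 'a'),
           ('c', 'd'), ('b', 'c', 'a'), ('a', 'b', 'c')]
print(uniquify(example))
print(uniquify([('x', 'y'), ('y', 'x')]))
```

The second call returns only [('x', 'y')], with nothing carried over from the first call.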
