I'm using the product function from Python's itertools module to calculate all permutations (strictly speaking, the Cartesian product) of the items in a list of lists. As an example:
>>> import itertools
>>> elems = [[1, 2], [4, 5], [7, 8]]
>>> permutations = list(itertools.product(*elems))
>>> print(permutations)
[(1, 4, 7), (1, 4, 8), (1, 5, 7), (1, 5, 8), (2, 4, 7), (2, 4, 8), (2, 5, 7), (2, 5, 8)]
How can I check each permutation as it is calculated, rather than returning the entire set of permutations at once? The problem I am currently facing is that my script fails with a Python MemoryError because too many permutations are being generated. I only care about a single one of the permutations. If I can check each permutation as it is generated, I can store just a single value rather than every possible permutation. Is this possible, and if so, how would I go about implementing it?
You can just do it in a for loop, one at a time:
for a_perm in itertools.product(*elems):
    print(a_perm)
itertools.product() returns an iterator, so you can consume it one item at a time instead of building the whole list with list().
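To keep only a single tuple, you can check each one as it comes out of the iterator and remember just the best so far. A minimal sketch, assuming a hypothetical criterion: score() (the product of the items) and the "sums to 14" check below stand in for whatever test you actually need.

```python
import itertools

elems = [[1, 2], [4, 5], [7, 8]]

def score(combo):
    # Hypothetical criterion: the product of the items in the tuple.
    result = 1
    for x in combo:
        result *= x
    return result

best = None
best_score = None
for combo in itertools.product(*elems):  # tuples are produced one at a time
    s = score(combo)
    if best_score is None or s > best_score:
        best, best_score = combo, s  # only the current best tuple is kept

print(best, best_score)  # (2, 5, 8) 80

# If any match will do, next() stops at the first tuple that passes the check
# (hypothetical check: the items sum to 14) and returns None if none does.
first = next((c for c in itertools.product(*elems) if sum(c) == 14), None)
print(first)  # (1, 5, 8)
```

max(itertools.product(*elems), key=score) performs the same single pass in one line; neither version ever holds the full list of tuples.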
from collections import Counter
test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
freq_2ndEle=Counter(val for key,val in test_list)
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
print(res)
Input : test_list = [(6, 5), (1, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
Output : [(1, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)]
Explanation: 7 occurs 3 times as the 2nd element, so all tuples with 7 are placed first.
Please clarify how the code works, especially this part:
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
I'm confused by lambda ele:freq_2ndEle[ele[1]].
Here is an explanation - in the future, you should try following similar steps, including reading the documentation:
Counter takes an iterable or a mapping as an argument. In your case, the generator expression val for key,val in test_list is the iterable: it fetches the second element of each tuple in test_list and feeds it to Counter.
You don't need the key, val unpacking; it is confusing in this context because it suggests you are looping through a dictionary. You are actually looping through a list of tuples, so freq_2ndEle=Counter(tp[1] for tp in test_list) is much clearer: here you access the second tuple element, at index 1.
Counter gives you the number of occurrences of each second tuple element. If you print freq_2ndEle, you will see this:
Counter({7: 3, 5: 2, 8: 1}), which maps each second element to the number of times it appears in the list.
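A quick way to confirm that both spellings build the same Counter, and that looking up a value works like a dictionary (a value that never occurs simply counts as 0):

```python
from collections import Counter

test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]

# Both generator expressions feed the second element of each tuple to Counter.
freq_unpacked = Counter(val for key, val in test_list)
freq_indexed = Counter(tp[1] for tp in test_list)

print(freq_indexed)                      # Counter({7: 3, 5: 2, 8: 1})
print(freq_unpacked == freq_indexed)     # True
print(freq_indexed[7], freq_indexed[4])  # 3 0  (missing values count as 0)
```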
In the last step, you sort the original list by the frequency of the second element using sorted:
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
So you pass test_list to sorted as the list to sort, and then you specify the key by which you want to sort: in your case, the key is the number of times the tuple's second element occurs.
freq_2ndEle stores key-value pairs of the form second element: number of times it occurred in test_list. It behaves like a dictionary (Counter is a dict subclass), so you access it as you would a dictionary: freq_2ndEle[ele[1]] gets the value that corresponds to ele[1], the second element of the tuple ele. That value is exactly the number of times ele[1] occurred in test_list.
Lastly, reverse=True sorts by these keys in reverse, that is, descending order (highest to lowest), which gives [(2, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)], with tuples that share a key (such as those ending in 7 and those ending in 5) grouped together. Note that, according to the documentation, sorted is stable, meaning it preserves the input order of elements whose keys are equal (this still holds with reverse=True). This is why tuples with the same key keep their order from test_list, i.e. (2, 7) comes first and (3, 7) last in the "7" group.
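Printing the key that the lambda computes for each tuple makes the ordering easy to follow: sorted only compares these numbers, highest first, and ties keep their original order.

```python
from collections import Counter

test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
freq_2ndEle = Counter(tp[1] for tp in test_list)

# The sort key computed for each tuple: how often its second element occurs.
keys = [freq_2ndEle[ele[1]] for ele in test_list]
print(keys)  # [2, 3, 2, 3, 1, 3]

# reverse=True orders keys from high to low; equal keys keep their input order.
res = sorted(test_list, key=lambda ele: freq_2ndEle[ele[1]], reverse=True)
print(res)  # [(2, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)]
```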
freq_2ndEle is a dictionary that has the second elements of the tuples as keys and their frequencies as values. The lambda passed as the key argument of sorted returns this frequency for each tuple, so the list is sorted by that return value (the frequency).
If your question is about how lambda works, you can refer to this brief explanation which is pretty simple.
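A lambda is just an anonymous, single-expression function, so the key can equally be written with def. A small sketch, where second_element_frequency is a hypothetical name chosen for illustration:

```python
from collections import Counter

test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
freq_2ndEle = Counter(tp[1] for tp in test_list)

# Named equivalent of: lambda ele: freq_2ndEle[ele[1]]
def second_element_frequency(ele):
    return freq_2ndEle[ele[1]]

by_lambda = sorted(test_list, key=lambda ele: freq_2ndEle[ele[1]], reverse=True)
by_def = sorted(test_list, key=second_element_frequency, reverse=True)
print(by_lambda == by_def)  # True
```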
I am learning to code in Python and have written the following code to solve the two-sum problem for a list.
My approach is as follows:
1. Take an element of the list L.
2. Subtract the element's value from the target sum.
3. Check whether the result is anywhere in the list.
4. If it is, add its index to a new list and remove this value from the original list L.
My code runs if I remove the 'remove' call, but then it returns each pair twice. Please help me identify the error.
l=[]
def two_sum(l,sval):
    result=[]
    for i in range(len(l)):
        new=sval-l[i]
        if new in l:
            result=result+[(i,l.index(new))]
            l.remove(new)
    return(result)
I guess the error you receive is IndexError: list index out of range. The reason is that you should never remove items from a list while looping over its indices: range(len(l)) is computed once, before the loop starts, so after l.remove(new) shortens the list, the later indices point past its end.
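To see the failure concretely, here is the function from the question with its indentation restored (the indentation is assumed from the description), run on the numbers 0 to 9 with a target of 10. By the time i reaches 6, five values have been removed and the list has only 5 items left.

```python
def two_sum(l, sval):
    # The question's version, indentation restored.
    result = []
    for i in range(len(l)):  # range is fixed at the list's starting length
        new = sval - l[i]
        if new in l:
            result = result + [(i, l.index(new))]
            l.remove(new)  # shrinks l while the loop still counts to the old length
    return result

raised = False
try:
    two_sum(list(range(10)), 10)
except IndexError as exc:
    raised = True
    print("IndexError:", exc)  # IndexError: list index out of range
```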
I think a while loop with two indices serves this purpose better. Try this:
def two_sum(l,sval):
    i=0
    j=0
    result=[]
    while i<len(l):
        new=sval-l[i]
        if new in l[j:]:
            result=result+[(i,l.index(new))]
            # move the element just used to the front, past index j
            l[i], l[j]=l[j],l[i]
            j+=1
        i+=1
    return result, l[j:]
So if you test:
l = list(range(10))
two_sum(l, 10)
it returns:
([(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)], [0, 6, 7, 8, 9])
Where [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)] are the pairs
and [0, 6, 7, 8, 9] is what is left of the list after the first element of each pair has been swapped to the front.
Plus, I guess it is slightly more efficient than the code you provided. Notice that the check if new in l[j:]: skips the items that have already been used as the first element of a pair, so each pair is returned only once.
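If the list does not need to be modified at all, a different technique (a dictionary lookup rather than the list search used above) avoids both the removal problem and the swapping. A minimal sketch, where two_sum_pairs is a hypothetical name: it remembers the indices of values already seen, so each pair (i, j) with i < j is reported once. Unlike the answer above, it never pairs an element with itself, so (5, 5) is not reported for a single 5.

```python
def two_sum_pairs(values, target):
    """Return index pairs (i, j), i < j, with values[i] + values[j] == target."""
    seen = {}  # value -> list of indices where it has appeared so far
    pairs = []
    for j, v in enumerate(values):
        # Every earlier index whose value complements v forms a pair with j.
        for i in seen.get(target - v, []):
            pairs.append((i, j))
        seen.setdefault(v, []).append(j)
    return pairs

print(sorted(two_sum_pairs(list(range(10)), 10)))  # [(1, 9), (2, 8), (3, 7), (4, 6)]
print(two_sum_pairs([5, 5], 10))                   # [(0, 1)]
```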
I have a list with repeated elements, for example array = [2,2,2,7].
If I use the solution suggested in this answer (using itertools.combinations()), I get:
()
(7,)
(2,)
(2,)
(2,)
(7, 2)
(7, 2)
(7, 2)
(2, 2)
(2, 2)
(2, 2)
(7, 2, 2)
(7, 2, 2)
(7, 2, 2)
(2, 2, 2)
(7, 2, 2, 2)
As you can see some of the 'combinations' are repeated, e.g. (7,2,2) appears 3 times.
The output I would like is:
()
(7,)
(2,)
(7, 2)
(2, 2)
(7, 2, 2)
(2, 2, 2)
(7, 2, 2, 2)
I could check the output for repeated combinations but I don't feel like that is the best solution to this problem.
You can take the set of the combinations and then chain them together.
from itertools import chain, combinations
arr = [2, 2, 2, 7]
list(chain.from_iterable(set(combinations(arr, i)) for i in range(len(arr) + 1)))
# [(), (7,), (2,), (2, 7), (2, 2), (2, 2, 2), (2, 2, 7), (2, 2, 2, 7)]
You would need to maintain a set of tuples that are sorted in the same fashion:
import itertools as it
array = [2, 2, 2, 7]
desired = set([(), (7,), (2,), (7, 2), (2, 2), (7, 2, 2), (2, 2, 2), (7, 2, 2, 2)])
result = set()
for i in range(len(array) + 1):
    for combo in it.combinations(array, i):
        # Sort every combination the same way so that duplicates collapse in the set
        result.add(tuple(sorted(combo, reverse=True)))
print(result == desired)  # True
Without using itertools.combinations() or sets:
from collections import Counter
import itertools
def powerset(bag):
    # Pick a multiplicity from 0 up to r for every distinct value in the multiset
    for v in itertools.product(*(range(r + 1) for r in bag.values())):
        yield Counter(zip(bag.keys(), v))
array = [2, 2, 2, 7]
for s in powerset(Counter(array)):
    # Each key of `s` is a (value, multiplicity) pair; expand them back into a flat list
    s = list(itertools.chain.from_iterable(itertools.repeat(*mv) for mv in s))
    print(s)
I believe your problem could alternatively be stated as finding the power set of a multiset, at least according to this definition.
However, it's worth noting that the method shown above will be slower than the solutions in other answers, such as this one, which simply collect the results of itertools.combinations() into a set to remove duplicates. Despite seeming less efficient, that approach is still faster in practice, because iterating in Python is much slower than iterating in C (see itertoolsmodule.c for the implementation of itertools.combinations()).
In my limited testing, the method in this answer starts to outperform the set-based method at approximately 14 distinct elements in your array, each with an average multiplicity of 2; from that point on, the set-based method falls behind and runs many times slower. However, both methods take more than 30 seconds at that size, so if performance is a concern, you might want to consider implementing this part of your application in C.
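Viewed as the power set of a multiset, the number of distinct results is easy to predict: each distinct value with multiplicity m can appear 0 to m times, so there are prod(m + 1) sub-multisets, whereas itertools.combinations() treats equal values at different positions as different items and yields all 2**n index subsets. A short check for the example array (math.prod assumes Python 3.8+):

```python
from collections import Counter
from itertools import combinations
from math import prod

array = [2, 2, 2, 7]

# Distinct sub-multisets: choose a count 0..m for each distinct value of multiplicity m.
distinct = prod(m + 1 for m in Counter(array).values())

# combinations() yields one tuple per subset of positions, duplicates included.
raw = sum(1 for i in range(len(array) + 1) for _ in combinations(array, i))

print(distinct, raw)  # 8 16
```

The gap between the two numbers is exactly the duplicate work that the set-based answers throw away.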
I'm trying to find duplicates in a list. I want to preserve the values and insert them into a tuple with their number of occurrences.
For example:
list_of_n = [2, 3, 5, 5, 5, 6, 2]
occurance_of_n = list(zip(set(list_of_n), [list_of_n.count(n) for n in set(list_of_n)]))
# [(2, 2), (3, 1), (5, 3), (6, 1)]
This works fine with small lists. My question is: as list_of_n gets larger, will I have to worry about arg1 and arg2 in zip(arg1, arg2) not lining up correctly, given that both are built from the same set?
That is, is there a conceivable future where I call zip() and it accidentally pairs one value of list_of_n in arg1 with the count of a different value in arg2?
(In case it's not clear, I'm converting the list to a set in arg2 for speed, on the assumption that zip will behave better if arg1 is the same set.)
Since your sample output preserves the order of appearance, you might want to go with a collections.OrderedDict to gather the counts:
from collections import OrderedDict
list_of_n = [2, 3, 5, 5, 5, 6, 2]
d = OrderedDict()
for x in list_of_n:
    d[x] = d.get(x, 0) + 1
occurance_of_n = list(d.items())
# [(2, 2), (3, 1), (5, 3), (6, 1)]
If order does not matter, the appropriate approach is a collections.Counter:
from collections import Counter
occurance_of_n = list(Counter(list_of_n).items())
Note that both approaches require only one pass over the list. Your version could be amended to something like:
occurance_of_n = list(set((n, list_of_n.count(n)) for n in set(list_of_n)))
# [(6, 1), (3, 1), (5, 3), (2, 2)]
but the repeated calls to list.count make an entire iteration of the initial list for each (unique) element.
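On Python 3.7 and later, a plain dict already keeps keys in first-insertion order, and Counter (a dict subclass) inherits that behaviour, so either one gives the order of appearance in a single pass without OrderedDict. A minimal sketch:

```python
from collections import Counter

list_of_n = [2, 3, 5, 5, 5, 6, 2]

# One pass; keys stay in the order each value first appears (Python 3.7+).
counts = {}
for x in list_of_n:
    counts[x] = counts.get(x, 0) + 1

occurrences = list(counts.items())
print(occurrences)  # [(2, 2), (3, 1), (5, 3), (6, 1)]
print(list(Counter(list_of_n).items()) == occurrences)  # True
```

Because each value and its count are produced together, there is no second sequence that could fall out of alignment.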
I wanted to use a list comprehension to avoid writing a for loop that appends to several lists. But can it work with a function that returns multiple values? I expected this (simplified) example code to work...
def calc(i):
    a = i * 2
    b = i ** 2
    return a, b
steps = [1,2,3,4,5]
ay, be = [calc(s) for s in steps]
... but it doesn't :(
The for-loop appending to each list works:
def calc(i):
    a = i * 2
    b = i ** 2
    return a, b
steps = [1,2,3,4,5]
ay, be = [],[]
for s in steps:
    a, b = calc(s)
    ay.append(a)
    be.append(b)
Is there a better way or do I just stick with this?
Use zip with *:
>>> ay, by = zip(*(calc(x) for x in steps))
>>> ay
(2, 4, 6, 8, 10)
>>> by
(1, 4, 9, 16, 25)
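zip() hands back tuples. If you need lists, as the for-loop version in the question builds, one option is to map list over the result (a sketch reusing calc and steps from the question). One caveat: if steps is empty, zip yields nothing and the two-name unpacking raises ValueError.

```python
def calc(i):
    a = i * 2
    b = i ** 2
    return a, b

steps = [1, 2, 3, 4, 5]

# zip(*...) regroups the pairs into two tuples; map(list, ...) turns each into a list.
ay, be = map(list, zip(*(calc(s) for s in steps)))
print(ay)  # [2, 4, 6, 8, 10]
print(be)  # [1, 4, 9, 16, 25]
```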
The horrendous "space efficient" version that returns iterators:
from itertools import tee
from operator import itemgetter
ay, by = [map(itemgetter(i), results) for i, results in enumerate(tee(map(calc, steps), 2))]
But basically, just use zip, because most of the time it's not worth the ugliness.
Explanation:
zip(*(calc(x) for x in steps))
will evaluate (calc(x) for x in steps) to get an iterator over (2, 1), (4, 4), (6, 9), (8, 16), (10, 25).
When you unpack it with *, you do the equivalent of
zip((2, 1), (4, 4), (6, 9), (8, 16), (10, 25))
so all of the items are stored in memory at once. Proof:
def return_args(*args):
    return args
return_args(*(calc(x) for x in steps))
#>>> ((2, 1), (4, 4), (6, 9), (8, 16), (10, 25))
Hence all items are in memory at once.
So how does mine work?
map(calc, steps) is the same as (calc(x) for x in steps) on Python 3. This is an iterator. On Python 2, use itertools.imap in place of map.
tee(..., 2) gives you two iterators that buffer the items one of them has consumed but the other has not yet. If you iterate them in lockstep, the tee takes O(1) memory; if you do not, it can take up to O(n). So up to this point, memory use can stay at O(1).
enumerate also keeps this in constant memory.
map(itemgetter(i), results) returns an iterator that takes the ith item from each of the results. In this case each result is a pair (r=(2, 1), then r=(4, 4), and so on), so ay yields the first items and by the second. itemgetter(i) is created immediately, so each iterator keeps its own i. A generator expression such as (r[i] for r in results) would look up i only when consumed, by which time the comprehension has set it to 1, so ay and by would both yield the second items.
Hence if you iterate ay and by in lockstep, constant memory is used; in general, memory usage is proportional to the distance between the two iterators. This is useful in many cases (imagine diffing a file or the like), but as I said, most of the time it's not worth the ugliness. There's an extra constant-factor overhead, too.
You should have shown us what
[calc(s) for s in range(5)]
does give you, i.e.
[(0, 0), (2, 1), (4, 4), (6, 9), (8, 16)]
While it isn't the two lists that you want, it is still a list of tuples. Furthermore, doesn't it look just like the output of this?
zip((0, 2, 4, 6, 8), (0, 1, 4, 9, 16))
zip repackages a set of sequences. It is usually illustrated with two long lists, but it works just as well with many short ones.
The third step is to remember that fn(*[arg1,arg2, ...]) = fn(arg1,arg2, ...), that is, the * unpacks a list.
Put it all together to get hcwhsa's answer.
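The three steps can be checked directly: the list comprehension gives pairs, unpacking them into zip regroups them by position, and zipping the result again gets the pairs back. A short sketch using the calc function from the question:

```python
def calc(i):
    return i * 2, i ** 2

pairs = [calc(s) for s in range(5)]
print(pairs)  # [(0, 0), (2, 1), (4, 4), (6, 9), (8, 16)]

# * passes each pair as a separate argument; zip then groups the i-th items together.
ay, be = zip(*pairs)
print(ay)  # (0, 2, 4, 6, 8)
print(be)  # (0, 1, 4, 9, 16)

# Zipping the two results again restores the original pairs.
print(list(zip(ay, be)) == pairs)  # True
```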