How to generate something like
[(), (1,), (1,2), (1,2,3)..., (1,2,3,...n)]
and
[(), (4,), (4,5), (4,5,6)..., (4,5,6,...m)]
then take the product of them and merge into
[(), (1,), (1,4), (1,4,5), (1,4,5,6), (1,2), (1,2,4)....(1,2,3,...n,4,5,6,...m)]
?
For the first two lists I've tried the powerset recipe in https://docs.python.org/2/library/itertools.html#recipes , but there will be something I don't want, like (1,3), (2,3)
For the product I've tested with chain and product, but I just can't merge the combinations of tuples into one.
Any idea how to do this nice and clean? Thanks!
Please note that, single element tuples are denoted like this (1,).
a = [(), (1,), (1, 2), (1, 2, 3)]
b = [(), (4,), (4, 5), (4, 5, 6)]
from itertools import product
for item1, item2 in product(a, b):
print item1 + item2
Output
()
(4,)
(4, 5)
(4, 5, 6)
(1,)
(1, 4)
(1, 4, 5)
(1, 4, 5, 6)
(1, 2)
(1, 2, 4)
(1, 2, 4, 5)
(1, 2, 4, 5, 6)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3, 4, 5)
(1, 2, 3, 4, 5, 6)
If you want them in a list, you can use list comprehension like this
from itertools import product
print [sum(items, ()) for items in product(a, b)]
Or even simpler,
print [items[0] + items[1] for items in product(a, b)]
You can try like this,
>>> a=[(), (1,), (1,2), (1,2,3)]
>>> b=[(), (4,), (4,5), (4,5,6)]
>>> for ix in a:
... for iy in b:
... print ix + iy
...
()
(4,)
(4, 5)
(4, 5, 6)
(1,)
(1, 4)
(1, 4, 5)
(1, 4, 5, 6)
(1, 2)
(1, 2, 4)
(1, 2, 4, 5)
(1, 2, 4, 5, 6)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3, 4, 5)
(1, 2, 3, 4, 5, 6)
If you don't want to use any special imports:
start = 1; limit = 10
[ range(start, start + x) for x in range(limit) ]
With start = 1 the output is:
[[], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
If you want to take the product, maybe using itertools might be most elegant.
Related
The output of my script is a list and a nested list. I would like to get the combinations of the two lists by index. In this instance, I have the following two lists:
x = [0, 1, 2, 3]
y = [[0, 1, 2, 3],
[0, 1, 2, 3, 4, 5, 6, 7, 8],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 5, 6, 7, 8]]
The desired output should look something like this.
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1,
7), (1, 8), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5),
(3, 6), (3, 7), (3, 8)]
I've looked at many posts about itertools.combinations and itertools.product, but I cannot find anything about looping and combining at the same time, which I think would be the approach to the problem. I want to get all combinations x[0] and y[0], then x[1] and y[1], etc.
You can do this with a list comprehension.
x = [0, 1, 2, 3]
y = [[0, 1, 2, 3],
[0, 1, 2, 3, 4, 5, 6, 7, 8],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 5, 6, 7, 8]]
final = [(i,j) for i in x for j in y[i]]
It seems you are going to do the catidion multiplication of two array. Here is the reference check and let me know if worked for you.
Cartesian product of x and y array points into single array of 2D points
I am able to produce all combinations given a particular value (k) for a single list as follows:
lst = []
p = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
c = itertools.combinations(p, k)
for i in c:
lst.append(list(i))
print lst
Note, in this code, k requires a specific value to be inputted - it cannot be a variable.
However, I now have multiple lists in which I need all combinations for various k:
m = [1, 2, 3, 4]
t = [1, 2, 3, 4]
c = [1, 2, 3, 4, 5]
ss =[1, 2, 3]
Put simply: I require an output of all the combinations possible for all these lists. E.g. k = 1 through 4 for m and t, 1 through 5 for c, and 1 through 3 for ss.
Example of k = 2 for ss would be
m = [1, 2, 3, 4]
t = [1, 2, 3, 4]
c = [1, 2, 3, 4, 5]
ss = [1, 2]
m = [1, 2, 3, 4]
t = [1, 2, 3, 4]
c = [1, 2, 3, 4, 5]
ss = [1, 3]
m = [1, 2, 3, 4]
t = [1, 2, 3, 4]
c = [1, 2, 3, 4, 5]
ss = [2, 3]
Follow this pattern for all values combinations of k possible in all variables.
Let me know if this is unclear and I can edit question accordingly.
You can get your output via itertools.product alone or via combinations. We could cram this all into one line if we really wanted, but I think it's more comprehensible to write
from itertools import combinations, product
def all_subs(seq):
for i in range(1, len(seq)+1):
for c in combinations(seq, i):
yield c
after which we have
>>> m,t,c,ss = [1,2,3,4],[1,2,3,4],[1,2,3,4,5],[1,2,3]
>>> seqs = m,t,c,ss
>>> out = list(product(*map(all_subs, seqs)))
>>> len(out)
48825
which is the right number of results:
>>> (2**4 - 1) * (2**4 - 1) * (2**5-1) * (2**3 - 1)
48825
and hits every possibility:
>>> import pprint
>>> pprint.pprint(out[:4])
[((1,), (1,), (1,), (1,)),
((1,), (1,), (1,), (2,)),
((1,), (1,), (1,), (3,)),
((1,), (1,), (1,), (1, 2))]
>>> pprint.pprint(out[-4:])
[((1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 4, 5), (1, 2)),
((1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 4, 5), (1, 3)),
((1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3)),
((1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 4, 5), (1, 2, 3))]
How can I skip the tuples which has duplicate elements in the iteration when I use itertools.product? Or let's say, is there anyway not to look at them in the iteration? Because skipping may be time consuming if the number of lists are too much.
Example,
lis1 = [1,2]
lis2 = [2,4]
lis3 = [5,6]
[i for i in product(lis1,lis2,lis3)] should be [(1,2,5), (1,2,6), (1,4,5), (1,4,6), (2,4,5), (2,4,6)]
It will not have (2,2,5) and (2,2,6) since 2 is duplicate in here. How can I do that?
itertools generally works on unique positions within inputs, not on unique values. So when you want to remove duplicate values, you generally have to either post-process the itertools result sequence, or "roll your own". Because post-processing can be very inefficient in this case, roll your own:
def uprod(*seqs):
def inner(i):
if i == n:
yield tuple(result)
return
for elt in sets[i] - seen:
seen.add(elt)
result[i] = elt
for t in inner(i+1):
yield t
seen.remove(elt)
sets = [set(seq) for seq in seqs]
n = len(sets)
seen = set()
result = [None] * n
for t in inner(0):
yield t
Then, e.g.,
>>> print list(uprod([1, 2, 1], [2, 4, 4], [5, 6, 5]))
[(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (2, 4, 5), (2, 4, 6)]
>>> print list(uprod([1], [1, 2], [1, 2, 4], [1, 5, 6]))
[(1, 2, 4, 5), (1, 2, 4, 6)]
>>> print list(uprod([1], [1, 2, 4], [1, 5, 6], [1]))
[]
>>> print list(uprod([1, 2], [3, 4]))
[(1, 3), (1, 4), (2, 3), (2, 4)]
This can be much more efficient, since a duplicate value is never even considered (neither within an input iterable, nor across them).
lis1 = [1,2]
lis2 = [2,4]
lis3 = [5,6]
from itertools import product
print [i for i in product(lis1,lis2,lis3) if len(set(i)) == 3]
Output
[(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (2, 4, 5), (2, 4, 6)]
With itertools.combinations there will be no repeated elements in sorted order:
>>> lis = [1, 2, 4, 5, 6]
>>> list(itertools.combinations(lis, 3))
[(1, 2, 4), (1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (1, 5, 6), (2, 4, 5),
(2, 4, 6), (2, 5, 6), (4, 5, 6)]
I need to iterate over n consecutive elements in a list.
For example:
data = [1,2,3,4,5,6,7]
I need to go over:
1 2
2 3
3 4
4 5
or:
1 2 3
2 3 4
3 4 5
4 5 6
Is there zip function to do that?
I'm not sure exactly what you're looking for, but try this:
data = [1, 2, 3, 4, 5, 6, 7]
n = 3
[data[i:i+n] for i in range(len(data) - n + 1)]
# [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
Or:
f = lambda data, n: [data[i:i+n] for i in range(len(data) - n + 1)]
for x, y, z in f([1, 2, 3, 4, 5, 6, 7], 3):
print x, y, z
Assuming you are always doing this for a list or another sequence and it does not need to work with arbitrary iterables:
def group(seq, n):
return (seq[i:i+n] for i in range(len(seq)-n+1))
Examples:
>>> list(group([1,2,3,4,5,6,7], 2))
[[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]]
>>> list(group([1,2,3,4,5,6,7], 3))
[[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
If you need to do this for any arbitrary iterable (that may not support len() or slicing), you can adapt the pairwise recipe:
from itertools import tee, izip
def group(iterable, n):
"group(s, 3) -> (s0, s1, s2), (s1, s2, s3), (s2, s3, s4), ..."
itrs = tee(iterable, n)
for i in range(1, n):
for itr in itrs[i:]:
next(itr, None)
return izip(*itrs)
>>> list(group(iter([1,2,3,4,5,6,7]), 2))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
>>> list(group(iter([1,2,3,4,5,6,7]), 3))
[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]
Specific answer:
>>> zip(data,data[1:])
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
General answer:
>>> def consecutives(data,per_set):
... return zip(*[data[n:] for n in range(per_set)])
...
>>> consecutives(range(1,8),2)
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
>>> consecutives(range(1,8),3)
[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]
>>> consecutives(range(1,8),4)
[(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7)]
Depending on whether you want to iterate over the sub-lists or a flat list:
from itertools import chain
for x in chain(*[ a[i:i+n] for i in xrange(len(a)-n+1) ]):
print x
Or:
for x in [ a[i:i+n] for i in xrange(len(a)-n+1) ]:
print x
Probably not the best way, but still useful:
>>> data = [1,2,3,4,5,6,7]
>>> map(None,data[:-1],data[1:])
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
>>> map(None,data[:-2],data[1:-1],data[2:])
[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]
This is a fairly simple task to program yourself. I dont think there is a pre-canned function to do this.
def func(arr,n):
i = 0
while i+n < len(arr):
for range(i,i+n):
.... make stuff here....
i = i + 1
Let's say that I have two arrays in the form
a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
As you can see, the above arrays are sorted, when considered a and b as columns of a super array.
Now, I want to do a searchsorted on this array. For instance, if I search for (3, 7) (a = 3 and b = 7), I should get 6.
Whenever there are duplicate values in a, the search should continue with values in b.
Is there a built-in numpy method to do it? Or what could be the efficient way to do it, assuming that I have million entries in my array.
I tried with numpy.recarray, to create one recarray with a and b and tried searching in it, but I am getting the following error.
TypeError: expected a readable buffer object
Any help is much appreciated.
You're almost there. It's just that numpy.record (which is what I assume you used, given the error message you received) isn't really what you want; just create a one-item record array:
>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
(4, 4), (4, 8), (5, 1), (6, 1)],
dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> numpy.searchsorted(a_b, numpy.array((3, 7), dtype=a_b.dtype))
6
It might also be useful to know that sort and argsort sort record arrays lexically, and there is also lexsort. An example using lexsort:
>>> random_idx = numpy.random.permutation(range(12))
>>> a = numpy.array(a)[random_idx]
>>> b = numpy.array(b)[random_idx]
>>> sorted_idx = numpy.lexsort((b, a))
>>> a[sorted_idx]
array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
>>> b[sorted_idx]
array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])
Sorting record arrays:
>>> a_b = numpy.rec.fromarrays((a, b))
>>> a_b[a_b.argsort()]
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
(4, 4), (4, 8), (5, 1), (6, 1)],
dtype=[('f0', '<i8'), ('f1', '<i8')])
>>> a_b.sort()
>>> a_b
rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9),
(4, 4), (4, 8), (5, 1), (6, 1)],
dtype=[('f0', '<i8'), ('f1', '<i8')])
You could use a repeated searchsorted from left and right:
left, right = np.searchsorted(a, 3, side='left'), np.searchsorted(a, 3, side='right')
index = left + np.searchsorted(b[left:right], 7)
This works for me:
>>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
>>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
>>> Z = numpy.array(zip(a, b), dtype=[('a','int'), ('b','int')])
>>> Z.searchsorted(numpy.asarray((3,7), dtype=Z.dtype))
6
I think the trick might be to make sure the argument to searchsorted has the same dtype as the array. When I try Z.searchsorted((3, 7)) I get a segfault.
Here's an interesting way to do it (though it's not the most efficient way, as I believe it's O(n) rather than O(log(n)) as ecatmur's answer would be; it is, however, more compact):
np.searchsorted(a + 1j*b, a_val + 1j*b_val)
Example:
>>> a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6])
>>> b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])
>>> np.searchsorted(a + 1j*b, 4 + 1j*8)
9
n arrays extension :
import numpy as np
def searchsorted_multi(*args):
v = args[-1]
if len(v) != len(args[:-1]):
raise ValueError
l, r = 0, len(args[0])
ind = 0
for vi, ai in zip(v, args[:-1]):
l, r = [np.searchsorted(ai[l:r], vi, side) for side in ('left', 'right')]
ind += l
return ind
if __name__ == "__main__":
a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
c = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 2]
assert(searchsorted_multi(a, b, (3, 7)) == 6)
assert(searchsorted_multi(a, b, (3, 0)) == 5)
assert(searchsorted_multi(a, b, c, (6, 1, 2)) == 12)
Or without numpy:
>>> import bisect
>>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]
>>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]
>>> bisect.bisect_left(zip(a,b), (3,7))
6