This question already has answers here: Rolling or sliding window iterator? (29 answers)
Is there an efficient or elegant way to retrieve all the k-size sublists of a list in Python? For example:
arr = [2, 3, 5, 7, 11, 13]
I want all 3-element sublists:
result = [[2, 3, 5],
          [3, 5, 7],
          [5, 7, 11],
          [7, 11, 13]]
I know I could create this with a for loop, slicing the list with arr[i:i+3], but the lists I'm dealing with are gigantic and I'm hoping for an efficient mechanism, or at least an elegant or Pythonic mechanism.
I'm using Pandas as well, so happy to use a Pandas mechanism.
If you actually want to construct the list, I don't think you'll do better than a basic list comprehension like this:
arr = [2, 3, 5, 7, 11, 13]
k = 3
result = [arr[i:i+k] for i in range(len(arr) - k + 1)]
If you want to minimize memory use, you could use a generator:
arr = [2, 3, 5, 7, 11, 13]
def window(arr, k):
    for i in range(len(arr) - k + 1):
        yield arr[i:i+k]

for group in window(arr, 3):
    ...  # do something with group
You could also do something where you zip together k copies of the list, each offset by one. But that would take as much memory as the first solution, probably without much performance advantage.
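For reference, a minimal sketch of that zip-based approach for k = 3 (three shifted views of the same list, zipped together):
# Each tuple is one window; zip stops when the shortest view runs out.
result = list(zip(arr, arr[1:], arr[2:]))
# [(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]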
There may be something quick and efficient in numpy or pandas, but you would need to show more about what your input and output should look like.
There are some other ideas in the linked duplicate, but they are focused on general iterables (where you can only pull items out once) rather than lists (where you can access items by index, possibly repeatedly).
You can use more_itertools:
import more_itertools
list(more_itertools.windowed(arr, 3))
[(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]
Or, using itertools:
from itertools import islice

def pairwise(iterable, n):
    "s -> (s0, s1, ..., s(n-1)), (s1, s2, ..., sn), (s2, s3, ..., s(n+1)), ..."
    iters = iter(iterable)
    result = tuple(islice(iters, n))
    if len(result) == n:
        yield result
    for elem in iters:
        result = result[1:] + (elem,)
        yield result
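Called on the example list, it yields the same windows as tuples:
arr = [2, 3, 5, 7, 11, 13]
print(list(pairwise(arr, 3)))
# [(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]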
You can use strides:
import numpy as np

arr = [2, 3, 5, 7, 11, 13]

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = rolling_window(np.array(arr), 3)
print(a)
[[ 2  3  5]
 [ 3  5  7]
 [ 5  7 11]
 [ 7 11 13]]
print(a.tolist())
[[2, 3, 5],
 [3, 5, 7],
 [5, 7, 11],
 [7, 11, 13]]
If your source list is gigantic, the provider should produce values on demand; the way to do that is with a generator.
A hypothetical source generator reading from a file:
def gen_value():
    with open('big-file.txt') as f:
        for line in f:
            for x in line.split():
                yield int(x)
The grouper function recipe may be used to consume the generator:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

So you may call list(grouper(gen_value(), 3)). Note that this produces non-overlapping chunks rather than overlapping windows, and pads the final chunk with the fillvalue.
Related
In a multiset, unlike a normal set, elements may appear more than once.
For example, if X (a normal set) = {0, 2, 4, 7, 10}, then ∆X (a multiset) = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10}.
∆X denotes the multiset of all (N choose 2) pairwise distances between points in X.
How can I write this in Python?
I have created a list X, but I don't know how to collect all the differences in another list and order them.
I hope you can help me.
It is basically just one line.
import itertools
s = {0,2,4,7,10}
sorted([abs(a-b) for (a,b) in itertools.combinations(s,2)])
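For the example set this gives [2, 2, 3, 3, 4, 5, 6, 7, 8, 10], matching ∆X above.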
You can use itertools:
import itertools

s = {0, 2, 4, 7, 10}
k = itertools.combinations(s, 2)
distance = []
l = list(k)
for p in l:
    distance.append(abs(p[1] - p[0]))
print(sorted(distance))
A simple way is to convert your set to a sorted list, then use a double for loop to compute the differences:
X = {0, 2, 4, 7, 10}  # original set
sorted_X = sorted(X)
diffs = []
for i, a in enumerate(sorted_X):
    for j, b in enumerate(sorted_X):
        if j > i:
            diffs.append(b - a)
print(diffs)
#[2, 4, 7, 10, 2, 5, 8, 3, 6, 3]
And if you want the diffs sorted as well:
print(sorted(diffs))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
Another option that would work in this case is to use itertools.product:
from itertools import product
print(sorted([(y-x) for x,y in product(sorted_X, sorted_X) if y>x]))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
Let's say I have a list with six items:
app = [6, 4, 6, 22, 255, 33]
I want to pass those numbers as arguments, but only three at a time. How would I do that?
Right now I'm using a deque with a max limit, but I don't know how to swap out the values with the next set.
Solution with a loop:
app = [6, 4, 6, 22, 255, 33]
for i in range(0, len(app), 3):
    print(app[i], app[i+1], app[i+2])
Solution with zip:
app = [6, 4, 6, 22, 255, 33]
for (i, j, q) in zip(app[::3], app[1::3], app[2::3]):
    print(i, j, q)
A more general solution is grouper from the itertools recipes (the original recipe's izip_longest is Python 2; in Python 3 it is zip_longest):
from itertools import zip_longest

app = [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14]

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

for i, j, q, r in grouper(app, 4):
    print(i, j, q, r)
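Note that the final group is padded with the fillvalue; with this 13-element list the last line printed is 14 None None None.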
You can use Python's slice notation, e.g. app[:3] for the first three elements, or app[1:4] (= [4, 6, 22]) for the second through fourth elements.
If your function expects three parameters, you can pass them with the * operator:
def f(a, b, c):
    ...

f(*app[:3])
You can use Python slices. (Code written in the browser, not tested.) It creates a new list for each group, but a small one:
for i in range(len(app) - 2):  # one window per starting index
    group = app[i:i+3]
    myfun(group[0], group[1], group[2])
I have the following situation. Say I have a variable batch_size and a list called data. I want to pull batch_size elements out of data, so that when I hit the end I wrap around. In other words:
data =[1,2,3,4,5]
batch_size = 4
-> [1,2,3,4], [5,1,2,3], [4,5,1,2], ...
Is there some nice idiomatic way of returning slices like this? The start index is always batch_size * batch modulo the length of data, but is there a simple way of "wrapping around" from the beginning if batch_size * (batch+1) goes beyond the length of the list? I can of course patch together two slices in this case, but I was hoping that there's some really clean way of doing this.
The only assumption I'm making is that batch_size < len(data).
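For concreteness, a sketch of the two-slice patch described above (wrapped_batch is a hypothetical name; batch is the batch index):
def wrapped_batch(data, batch_size, batch):
    # Start index wraps around via modulo, as described above.
    start = (batch_size * batch) % len(data)
    end = start + batch_size
    if end <= len(data):
        return data[start:end]
    # Patch the tail together with the head when the slice wraps.
    return data[start:] + data[:end - len(data)]

print([wrapped_batch([1, 2, 3, 4, 5], 4, b) for b in range(3)])
# [[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 1, 2]]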
You could use itertools.cycle and the grouper recipe from itertools:
import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

data = [1, 2, 3, 4, 5]
batch_size = 4
how_many_groups = 5

groups = grouper(itertools.cycle(data), batch_size)
chunks = [next(groups) for _ in range(how_many_groups)]
The result of chunks is then:
[(1, 2, 3, 4),
(5, 1, 2, 3),
(4, 5, 1, 2),
(3, 4, 5, 1),
(2, 3, 4, 5)]
So if you actually need those as lists, you'll have to cast it as such ([list(next(groups)) for ...])
You can also use deque from the collections module and rotate the deque one step per group, like this example:
from collections import deque

def grouper(iterable, elements, rotations):
    if elements > len(iterable):
        return  # nothing to yield
    b = deque(iterable)
    for _ in range(rotations):
        yield list(b)[:elements]
        b.rotate(1)

data = [1, 2, 3, 4, 5]
elements = 4
rotations = 5

final = list(grouper(data, elements, rotations))
print(final)
Output:
[[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 1, 2], [3, 4, 5, 1], [2, 3, 4, 5]]
I was faced with the problem of executing n concurrent events that all return iterators to the results they acquired. However, there was an optional limit parameter that says, basically, to consolidate all the iterators and return up to limit results.
So, for example: I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.
Thus, unravel:
import itertools

def unravel(*iterables, with_limit=None):
    make_iter = {a: iter(i) for a, i in enumerate(iterables)}
    if not isinstance(with_limit, int):
        with_limit = -1
    resize = None
    while True:
        for iid, take_from in make_iter.items():
            if with_limit == 0:
                return  # raising StopIteration inside a generator is an error since PEP 479
            try:
                yield next(take_from)
            except StopIteration:
                resize = iid  # remember which iterator is exhausted (iid may be 0, so compare to None)
            else:
                with_limit -= 1
        if resize is not None:
            if len(make_iter) > 1:
                make_iter.pop(resize)
                resize = None
            else:
                return
Usage:
>>> a = [1,2,3,4,5]
>>> b = [6,7,8,9,10]
>>> c = [1,3,5,7]
>>> d = [2,4,6,8]
>>>
>>> print([e for e in unravel(c, d)])
[1, 2, 3, 4, 5, 6, 7, 8]
>>> print([e for e in unravel(c, d, with_limit = 3)])
[1, 2, 3]
>>> print([e for e in unravel(a, b, with_limit = 6)])
[1, 6, 2, 7, 3, 8]
>>> print([e for e in unravel(a, b, with_limit = 100)])
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
Does something like this already exist, or is this a decent implementation?
Thanks
EDIT, WORKING FIX
Inspired by @abarnert's suggestion, this is what I went with. Thanks everybody!
def unravel(*iterables, limit=None):
    yield from itertools.islice(
        filter(None,
               itertools.chain.from_iterable(
                   itertools.zip_longest(*iterables)
               )
        ), limit)
>>> a = [x for x in range(10)]
>>> b = [x for x in range(5)]
>>> c = [x for x in range(0, 20, 2)]
>>> d = [x for x in range(1, 30, 2)]
>>>
>>> print(list(unravel(a, b)))
[1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> print(list(unravel(a, b, limit = 3)))
[1, 1, 2]
>>> print(list(unravel(a, b, c, d, limit = 20)))
[1, 1, 1, 2, 3, 2, 2, 4, 5, 3, 3, 6, 7, 4, 4, 8, 9, 5, 10, 11]
What you're doing here is almost just zip.
You want a flat iterable, rather than an iterable of sub-iterables, but chain fixes that.
And you want to take only the first N values, but islice fixes that.
So, if the lengths are all equal:
>>> list(chain.from_iterable(zip(a, b)))
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
>>> list(islice(chain.from_iterable(zip(a, b)), 7))
[1, 6, 2, 7, 3, 8, 4]
But if the lengths aren't equal, that will stop as soon as the first iterable finishes, which you don't want. And the only alternative in the stdlib is zip_longest, which fills in missing values with None.
You can pretty easily write a zip_longest_skipping (which is effectively the round_robin in Peter's answer), but you can also just zip_longest and filter out the results:
>>> list(filter(None, chain.from_iterable(zip_longest(a, b, c, d))))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]
(Obviously this doesn't work as well if your values are all either strings or None, but when they're all positive integers it works fine… to handle the "or None" case, do sentinel=object(), pass that to zip_longest, then filter on x is not sentinel.)
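A minimal sketch of that sentinel variant (the name unravel_sentinel is illustrative):
from itertools import chain, zip_longest

def unravel_sentinel(*iterables):
    # A unique object as fillvalue, so legitimate falsy values (0, '', None) survive.
    sentinel = object()
    flat = chain.from_iterable(zip_longest(*iterables, fillvalue=sentinel))
    return (x for x in flat if x is not sentinel)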
From the itertools example recipes:
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
Use itertools.islice to enforce your with_limit, e.g.:
print([e for e in itertools.islice(roundrobin(c, d), 3)])
>>> list(roundrobin(a, b, c, d))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]
For what you're actually trying to do, there's probably a much better solution.
I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.
OK, so why are the results in 8 separate iterables? There's no good reason for that. Instead of giving each thread its own queue (or global list and lock, or whatever you're using) and then trying to zip them together, why not have them all share a queue in the first place?
In fact, that's the default way that almost any thread pool is designed (including multiprocessing.Pool and concurrent.futures.Executor in the stdlib). Look at the main example for concurrent.futures.ThreadPoolExecutor:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
That's almost exactly your use case—spamming a bunch of URL downloads out over 5 different threads and gathering the results as they come in—without your problem even arising.
Of course it's missing with_limit, but you can just wrap that as_completed iterable in islice to handle that, and you're done.
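A sketch of that islice wrapper (the limit of 100 matches the example above):
import itertools

for future in itertools.islice(concurrent.futures.as_completed(future_to_url), 100):
    ...  # handle only the first 100 completed futures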
This uses a generator and zip_longest (izip_longest under Python 2) to pull one item at a time from multiple iterators:
from itertools import zip_longest

def unravel(cap, *iters):
    counter = 0
    for group in zip_longest(*iters):
        for entry in (s for s in group if s is not None):
            yield entry
            counter += 1
            if counter >= cap:
                return  # a bare break would only exit the inner loop
I'm trying to write a function that creates a set of sublists, each containing 5 elements from a list passed to it. Here's my attempt at the code:
def sublists(seq):
    i = 0
    x = []
    while i < len(seq) - 1:
        j = 0
        while j < 5:
            X.append(seq[i])  # How do I change X after it reaches size 5?
    # return set of sublists
EDIT:
Sample input: [1,2,3,4,5,6,7,8,9,10]
Expected output: [[1,2,3,4,5],[6,7,8,9,10]]
Well, for starters, you'll need to (or at least should) have two lists: a temporary one and a permanent one that you return. You will also need to increment j and i or, more practically, use a for loop, but I assume you just forgot to post that part.
EDIT: removed the first code block, as its style doesn't easily match the expected results; see the other two possibilities.
Or, more sensibly:
def sublists(seq):
    x = []
    for i in range(0, len(seq), 5):
        x.append(seq[i:i+5])
    return x
Or, more sensibly again, a simple list comprehension:
def sublists(seq):
    return [seq[i:i+5] for i in range(0, len(seq), 5)]
When given the list:
l = [1,2,3,4,5,6,7,8,9,10]
They will return
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
Have you considered using itertools.combinations(...)?
For example:
>>> from itertools import combinations
>>> l = [1,2,3,4,5,6]
>>> list(combinations(l, 5))
[(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), (1, 2, 3, 5, 6), (1, 2, 4, 5, 6), (1, 3, 4, 5, 6), (2, 3, 4, 5, 6)]
By "dynamic sublists", do you mean break up the list into groups of five elements? This is similar to your approach:
def sublists(lst, n):
ret = []
i = 0
while i < len(lst):
ret.append(seq[i:i+n])
i += n
return ret
Or, using iterators:
import itertools

def sublists(seq, n):
    it = iter(seq)
    while True:
        r = list(itertools.islice(it, n))
        if not r:
            break
        yield r
This returns an iterator over lists of length up to n. (If you took out the list call, weird things would happen if you didn't consume the sub-iterators in the same order.)
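For the sample input this produces the expected output:
print(list(sublists([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5)))
# [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]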