Is there a better way to do an "unravel" function in python?

I was faced with the problem of executing n concurrent events that all return iterators over the results they acquired. However, there was an optional limit parameter that says, basically, to consolidate all the iterators and return up to limit results.
So, for example: I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.
Thus, unravel:
import itertools

def unravel(*iterables, with_limit = None):
    make_iter = {a: iter(i) for a, i in enumerate(iterables)}
    if not isinstance(with_limit, int):
        with_limit = -1
    resize = False
    while True:
        for iid, take_from in make_iter.items():
            if with_limit == 0:
                raise StopIteration
            try:
                yield next(take_from)
            except StopIteration:
                resize = iid
            else:
                with_limit -= 1
        if resize:
            resize = False
            if len(make_iter.keys()) > 1:
                make_iter.pop(resize)
            else: raise StopIteration
Usage:
>>> a = [1,2,3,4,5]
>>> b = [6,7,8,9,10]
>>> c = [1,3,5,7]
>>> d = [2,4,6,8]
>>>
>>> print([e for e in unravel(c, d)])
[1, 2, 3, 4, 5, 6, 7, 8]
>>> print([e for e in unravel(c, d, with_limit = 3)])
[1, 2, 3]
>>> print([e for e in unravel(a, b, with_limit = 6)])
[1, 6, 2, 7, 3, 8]
>>> print([e for e in unravel(a, b, with_limit = 100)])
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
Does something like this already exist, or is this a decent implementation?
Thanks
EDIT, WORKING FIX
Inspired by abarnert's suggestion, this is what I went with. Thanks everybody!
def unravel(*iterables, limit = None):
    yield from itertools.islice(
        filter(None,
               itertools.chain.from_iterable(
                   itertools.zip_longest(
                       *iterables
                   )
               )
        ), limit)
>>> a = [x for x in range(10)]
>>> b = [x for x in range(5)]
>>> c = [x for x in range(0, 20, 2)]
>>> d = [x for x in range(1, 30, 2)]
>>>
>>> print(list(unravel(a, b)))
[1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9]
>>> print(list(unravel(a, b, limit = 3)))
[1, 1, 2]
>>> print(list(unravel(a, b, c, d, limit = 20)))
[1, 1, 1, 2, 3, 2, 2, 4, 5, 3, 3, 6, 7, 4, 4, 8, 9, 5, 10, 11]

What you're doing here is almost just zip.
You want a flat iterable, rather than an iterable of sub-iterables, but chain fixes that.
And you want to take only the first N values, but islice fixes that.
So, if the lengths are all equal:
>>> list(chain.from_iterable(zip(a, b)))
[1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
>>> list(islice(chain.from_iterable(zip(a, b)), 7))
[1, 6, 2, 7, 3, 8, 4]
But if the lengths aren't equal, that will stop as soon as the first iterable finishes, which you don't want. And the only alternative in the stdlib is zip_longest, which fills in missing values with None.
You can pretty easily write a zip_longest_skipping (which is effectively the round_robin in Peter's answer), but you can also just zip_longest and filter out the results:
>>> list(filter(None, chain.from_iterable(zip_longest(a, b, c, d))))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]
(Obviously this doesn't work as well if your values can be falsy—0, empty strings, None—but when they're all positive integers it works fine. To handle the general case, do sentinel=object(), pass that to zip_longest as the fillvalue, then filter on x is not sentinel.)
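Putting that sentinel trick together, as a sketch (reusing the OP's unravel name and a limit keyword like the edited version above):

```python
import itertools

def unravel(*iterables, limit=None):
    # A unique sentinel survives values that are falsy (0, '', None, ...)
    sentinel = object()
    interleaved = itertools.chain.from_iterable(
        itertools.zip_longest(*iterables, fillvalue=sentinel)
    )
    return itertools.islice(
        (x for x in interleaved if x is not sentinel), limit
    )

print(list(unravel([0, 1, 2], [None, 3])))  # [0, None, 1, 3, 2]
```

Unlike filter(None, ...), this keeps zeros, empty strings, and genuine None results.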

From the itertools example recipes:
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
Use itertools.islice to enforce your with_limit, e.g.:
print([e for e in itertools.islice(roundrobin(c, d), 3)])
>>> list(roundrobin(a, b, c, d))
[1, 6, 1, 2, 2, 7, 3, 4, 3, 8, 5, 6, 4, 9, 7, 8, 5, 10]

For what you're actually trying to do, there's probably a much better solution.
I execute 2,000 url requests on 8 threads but just want the first 100 results, but not all 100 from the same potential thread.
OK, so why are the results in 8 separate iterables? There's no good reason for that. Instead of giving each thread its own queue (or global list and lock, or whatever you're using) and then trying to zip them together, why not have them all share a queue in the first place?
In fact, that's the default way that almost any thread pool is designed (including multiprocessing.Pool and concurrent.futures.Executor in the stdlib). Look at the main example for concurrent.futures.ThreadPoolExecutor:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
That's almost exactly your use case—spamming a bunch of URL downloads out over 5 different threads and gathering the results as they come in—without your problem even arising.
Of course it's missing with_limit, but you can just wrap that as_completed iterable in islice to handle that, and you're done.
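A sketch of that combination (fetch here is a hypothetical stand-in for load_url, since the real URL list isn't shown):

```python
import concurrent.futures
import itertools

def fetch(n):
    # hypothetical stand-in for a URL download
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(fetch, n) for n in range(20)]
    # islice stops pulling from as_completed after the first 5 results;
    # the remaining futures still run to completion on executor shutdown
    first_five = [f.result()
                  for f in itertools.islice(
                      concurrent.futures.as_completed(futures), 5)]

print(len(first_five))  # 5
```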

This uses a generator and izip_longest to pull one item at a time from multiple iterators
from itertools import izip_longest  # itertools.zip_longest in Python 3

def unravel(cap, *iters):
    counter = 0
    for slice in izip_longest(*iters):
        for entry in [s for s in slice if s is not None]:
            yield entry
            counter += 1
            if counter >= cap:
                return  # a bare break would only exit the inner loop


Python sliding windows of a list [duplicate]

This question already has answers here:
Rolling or sliding window iterator?
(29 answers)
Closed last month.
Is there an efficient or elegant way to retrieve all the k-size sublists of a list in Python? For example:
arr = [2, 3, 5, 7, 11, 13]
I want all 3-element sublists:
result = [[2, 3, 5],
[3, 5, 7],
[5, 7, 11],
[7, 11, 13]]
I know I could create this with a for loop, slicing the list with arr[i:i+3], but the lists I'm dealing with are gigantic and I'm hoping for an efficient mechanism, or at least an elegant or Pythonic mechanism.
I'm using Pandas as well, so happy to use a Pandas mechanism.
If you actually want to construct the list, I don't think you'll do better than a basic list comprehension like this:
arr = [2, 3, 5, 7, 11, 13]
k = 3
result = [arr[i:i+k] for i in range(len(arr)-k+1)]
If you want to minimize memory use, you could use a generator:
arr = [2, 3, 5, 7, 11, 13]

def window(arr, k):
    for i in range(len(arr)-k+1):
        yield arr[i:i+k]

for group in window(arr, 3):
    ...  # do something with group
You could also do something where you zip together k copies of the list, each offset by one. But that would take as much memory as the first solution, probably without much performance advantage.
There may be something quick and efficient in numpy or pandas, but you would need to show more about what your input and output should look like.
There are some other ideas here, but they are focused on general iterables (where you can only pull items out once), rather than lists (where you can access items by index, possibly repeatedly).
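For completeness, here is a sketch of a window that does work on one-pass iterables, using a bounded deque so only k items are in memory at a time (the name window_iter is ours):

```python
from collections import deque
from itertools import islice

def window_iter(iterable, k):
    it = iter(iterable)
    buf = deque(islice(it, k), maxlen=k)  # prime the first window
    if len(buf) == k:
        yield tuple(buf)
    for x in it:
        buf.append(x)  # maxlen drops the oldest item automatically
        yield tuple(buf)

print(list(window_iter([2, 3, 5, 7, 11], 3)))  # [(2, 3, 5), (3, 5, 7), (5, 7, 11)]
```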
You can use more_itertools
import more_itertools
list(more_itertools.windowed(arr,3))
[(2, 3, 5), (3, 5, 7), (5, 7, 11), (7, 11, 13)]
OR
using itertools:
from itertools import islice

def pairwise(iterable, n):
    "s -> (s0,s1,..s(n-1)), (s1,s2,.., sn), (s2, s3,..,s(n+1)), ..."
    iters = iter(iterable)
    result = tuple(islice(iters, n))
    if len(result) == n:
        yield result
    for elem in iters:
        result = result[1:] + (elem,)
        yield result
You can use strides:
import numpy as np

arr = [2, 3, 5, 7, 11, 13]

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = rolling_window(np.array(arr), 3)
print(a)
[[ 2  3  5]
 [ 3  5  7]
 [ 5  7 11]
 [ 7 11 13]]
print(a.tolist())
[[2, 3, 5], [3, 5, 7], [5, 7, 11], [7, 11, 13]]
If your source list is gigantic, the source should produce values on demand. The way to do that is with a generator.
Hypothetical source generator from a file;
def gen_value():
    with open('big-file.txt') as f:
        for line in f:
            for x in line.split():
                yield int(x)
The grouper function recipe may be used to consume the generator:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
So you may call list(grouper(gen_value(), 3))

How to stop placing elements in list

I want to create a function which adds elements to a list. I want it to stop when it reaches the end of the range()
I got this:
def get_values(i, n):
    d = []
    for x in range(n):
        d.append(next(i))
    return d

i = iter(range(10))
print(get_values(i, 5))
print(get_values(i, 4))
print(get_values(i, 2))
It gives me:
[0, 1, 2, 3, 4]
[5, 6, 7, 8]
Traceback (most recent call last):
File "/Users/user/Documents/untitled1/lol.py", line 17, in <module>
print((get_values(i,2)))
File "/Users/user/Documents/untitled1/lol.py", line 4, in get_values
d.append(next(i))
StopIteration
But I want to achieve this:
>>> i = iter(range(10))
>>> get_values(i, 3)
[0, 1, 2]
>>> get_values(i, 5)
[3, 4, 5, 6, 7]
>>> get_values(i, 4)
[8, 9]
>>> get_values(i, 4)
[]
How can I control the loop so it only takes the elements that remain in i?
Just listen for the error: catch StopIteration when it is raised and break out of the loop:
def get_values(i, n):
    d = []
    for x in range(n):
        try:
            d.append(next(i))
        except StopIteration:
            break
    return d
The only way you can check if you can continue is to listen to the StopIteration exception. Here is another solution I thought can be handy:
def get_values(i, n):
    d = []
    try:
        for _ in range(n):
            d.append(next(i))
    finally:
        return d
See https://www.geeksforgeeks.org/python-next-method/ to learn more about the arguments you can pass to next
def get_values(i, n):
    d = []
    for x in range(n):
        temp = next(i, 'end')
        if temp == 'end':
            break
        d.append(temp)
    return d

i = iter(range(10))
print(get_values(i, 5))
print(get_values(i, 4))
print(get_values(i, 2))
Output
[0, 1, 2, 3, 4]
[5, 6, 7, 8]
[9]
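Since all three answers hand-roll the loop, it may be worth noting that itertools.islice already does exactly this, stopping quietly when the iterator runs dry (a sketch):

```python
from itertools import islice

def get_values(i, n):
    # islice yields at most n items and stops when i is exhausted
    return list(islice(i, n))

i = iter(range(10))
print(get_values(i, 5))  # [0, 1, 2, 3, 4]
print(get_values(i, 4))  # [5, 6, 7, 8]
print(get_values(i, 2))  # [9]
print(get_values(i, 2))  # []
```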

Create a multiset from a Set X

In a multiset it is allowed to have multiple copies of an element.
For example, if X (normal set) = {0, 2, 4, 7, 10}, then ∆X (multiset) = {2, 2, 3, 3, 4, 5, 6, 7, 8, 10}.
∆X denotes the multiset of all (N choose 2) pairwise distances between points in X.
How can I write this in Python?
I have created a list X but I don't know how to put all the differences in another list and order them.
I hope you can help me.
It is basically just one line.
import itertools
s = {0,2,4,7,10}
sorted([abs(a-b) for (a,b) in itertools.combinations(s,2)])
you can use itertools
import itertools
s = {0,2,4,7,10}
k = itertools.combinations(s,2)
distance = []
l = list(k)
for p in l:
    distance.append(abs(p[1]-p[0]))
print(sorted(distance))
A simple way is to convert your set to a list, sort it, and then use a double for loop to compute the differences:
X = {0,2,4,7,10} # original set
sorted_X = sorted(list(X))
diffs = []
for i, a in enumerate(sorted_X):
    for j, b in enumerate(sorted_X):
        if j > i:
            diffs.append(b-a)
print(diffs)
#[2, 4, 7, 10, 2, 5, 8, 3, 6, 3]
And if you want the diffs sorted as well:
print(sorted(diffs))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
Another option that would work in this case is to use itertools.product:
from itertools import product
print(sorted([(y-x) for x,y in product(sorted_X, sorted_X) if y>x]))
#[2, 2, 3, 3, 4, 5, 6, 7, 8, 10]

built-in max heap API in Python

Default heapq is min queue implementation and wondering if there is an option for max queue? Thanks.
I tried the solution using _heapify_max for a max heap, but then how do I handle pushing/popping elements dynamically? It seems _heapify_max can only be used at initialization time.
import heapq
def heapsort(iterable):
    h = []
    for value in iterable:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]

if __name__ == "__main__":
    print heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
Edit: I tried _heapify_max, but it doesn't seem to work for dynamically pushed/popped elements. Both methods below produce the same output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
def heapsort(iterable):
    h = []
    for value in iterable:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]

def heapsort2(iterable):
    h = []
    heapq._heapify_max(h)
    for value in iterable:
        heapq.heappush(h, value)
    return [heapq.heappop(h) for i in range(len(h))]

if __name__ == "__main__":
    print heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
    print heapsort2([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
Thanks in advance,
Lin
In the past I have simply used sortedcontainers's SortedList for this, as:
> from sortedcontainers import SortedList
> a = SortedList()
> a.add(3)
> a.add(2)
> a.add(1)
> a.pop()
3
It's not a heap, but it's fast and works directly as required.
If you absolutely need it to be a heap, you could make a general negation class to hold your items.
class Neg():
    def __init__(self, x):
        self.x = x
    def __cmp__(self, other):
        return -cmp(self.x, other.x)

def maxheappush(heap, item):
    heapq.heappush(heap, Neg(item))

def maxheappop(heap):
    return heapq.heappop(heap).x
But that will be using a little more memory.
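Note that __cmp__ only exists in Python 2. Under Python 3 the same idea needs __lt__ (and __eq__); a sketch, assuming the wrapped items are mutually comparable:

```python
import functools
import heapq

@functools.total_ordering
class Neg:
    def __init__(self, x):
        self.x = x
    def __lt__(self, other):
        # reversed comparison, so heapq's min-heap behaves as a max-heap
        return other.x < self.x
    def __eq__(self, other):
        return self.x == other.x

heap = []
for v in [1, 5, 3]:
    heapq.heappush(heap, Neg(v))
print(heapq.heappop(heap).x)  # 5
```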
There is a _heappop_max function in the latest cpython source that you may find useful:
def _heappop_max(heap):
    """Maxheap version of a heappop."""
    lastelt = heap.pop()  # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt
If you change the heappush logic using heapq._siftdown_max you should get the desired output:
def _heappush_max(heap, item):
    heap.append(item)
    heapq._siftdown_max(heap, 0, len(heap)-1)

def _heappop_max(heap):
    """Maxheap version of a heappop."""
    lastelt = heap.pop()  # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def heapsort2(iterable):
    h = []
    heapq._heapify_max(h)
    for value in iterable:
        _heappush_max(h, value)
    return [_heappop_max(h) for i in range(len(h))]
Output:
In [14]: heapsort2([1,3,6,2,7,9,0,4,5,8])
Out[14]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [15]: heapsort2([7, 8, 9, 6, 4, 2, 3, 5, 1, 0])
Out[15]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [16]: heapsort2([19,13,15,17,11,10,14,20,18])
Out[16]: [20, 19, 18, 17, 15, 14, 13, 11, 10]
In [17]: heapsort2(["foo","bar","foobar","baz"])
Out[17]: ['foobar', 'foo', 'baz', 'bar']
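When the values are plain numbers, a common alternative to wrapping or reaching into heapq internals is simply to store negated values in an ordinary min-heap (a sketch):

```python
import heapq

def max_heapsort(iterable):
    h = [-x for x in iterable]   # negate on the way in
    heapq.heapify(h)
    return [-heapq.heappop(h) for _ in range(len(h))]  # negate on the way out

print(max_heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0]))  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

This only uses the public heapq API, at the cost of working for numeric keys only.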

Python - iterating beginning with the middle of the list and then checking either side

Really not sure where this fits. Say, I have a list:
>>>a = [1, 2, 3, 4, 5, 6, 7]
How can I iterate it in such a way, that it will check 4 first, then 5, then 3, then 6, and then 2(and so on for bigger lists)? I have only been able to work out the middle which is
>>> middle = [len(a)/2 if len(a) % 2 == 0 else ((len(a)+1)/2)]
I'm really not sure how to apply this, nor am I sure that my way of working out the middle is the best way. I've thought of grabbing two indexes and after each iteration, adding 1 and subtracting 1 from each respective index but have no idea how to make a for loop abide by these rules.
With regards as to why I need this; it's for analysing a valid play in a card game and will check from the middle card of a given hand up to each end until a valid card can be played.
You can just keep removing from the middle of list:
lst = range(1, 8)
while lst:
    print lst.pop(len(lst)/2)
This is not the best solution performance-wise (removing item from list is expensive), but it is simple - good enough for a simple game.
EDIT:
More performance stable solution would be a generator, that calculates element position:
def iter_from_middle(lst):
    try:
        middle = len(lst)/2
        yield lst[middle]
        for shift in range(1, middle+1):
            # order is important!
            yield lst[middle - shift]
            yield lst[middle + shift]
    except IndexError:  # occurs on lst[len(lst)] or for an empty list
        raise StopIteration
To begin with, here is a very useful general purpose utility to interleave two sequences:
def imerge(a, b):
    for i, j in itertools.izip_longest(a, b):
        yield i
        if j is not None:
            yield j
with that, you just need to imerge
a[len(a) / 2: ]
with
reversed(a[: len(a) / 2])
You could also play index games, for example:
>>> a = [1, 2, 3, 4, 5, 6, 7]
>>> [a[(len(a) + (~i, i)[i%2]) // 2] for i in range(len(a))]
[4, 5, 3, 6, 2, 7, 1]
>>> a = [1, 2, 3, 4, 5, 6, 7, 8]
>>> [a[(len(a) + (~i, i)[i%2]) // 2] for i in range(len(a))]
[4, 5, 3, 6, 2, 7, 1, 8]
Here's a generator that yields alternating indexes for any given provided length. It could probably be improved/shorter, but it works.
def backNforth(length):
    if length == 0:
        return
    else:
        middle = length//2
        yield middle
        for ind in range(1, middle + 1):
            if length > (2 * ind - 1):
                yield middle - ind
            if length > (2 * ind):
                yield middle + ind

# for testing:
if __name__ == '__main__':
    r = range(9)
    for _ in backNforth(len(r)):
        print(r[_])
Using that, you can just do this to produce a list of items in the order you want:
a = [1, 2, 3, 4, 5, 6, 7]
a_prime = [a[_] for _ in backNforth(len(a))]
In addition to the middle elements, I needed their index as well. I found Wasowski's answer very helpful, and modified it:
def iter_from_middle(lst):
    index = len(lst)//2
    for i in range(len(lst)):
        index = index + i*(-1)**i
        yield index, lst[index]
>>> my_list = [10, 11, 12, 13, 14, 15]
>>> [(index, item) for index, item in iter_from_middle(my_list)]
[(3, 13), (2, 12), (4, 14), (1, 11), (5, 15), (0, 10)]
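The same ordering can also be produced without any running-index bookkeeping, by sorting the indices by distance from the middle (a sketch; ties are broken to the right, matching the 4, 5, 3, 6, ... order the question asks for):

```python
def middle_out(lst):
    mid = len(lst) // 2
    # sort indices by (distance from middle, prefer the right side on ties)
    for i in sorted(range(len(lst)), key=lambda i: (abs(i - mid), -i)):
        yield lst[i]

print(list(middle_out([1, 2, 3, 4, 5, 6, 7])))  # [4, 5, 3, 6, 2, 7, 1]
```

This costs O(n log n) for the sort, versus O(n) for the generators above, but it is short and hard to get wrong.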
