How to chunk up an interval in Python?

Is there a built-in or existing library function that works like xrange but splits an interval into evenly spaced, non-overlapping chunks?
For instance, if we call this function xchunks, then I would like:
>>> xchunks(start=0, stop=18, step=5)
[(0, 4), (5, 9), (10, 14), (15, 17)]
Ideally, this should also work for a negative step.
>>> xchunks(start=20, stop=2, step=-5)
[(20, 16), (15, 11), (10, 6), (5, 3)]

The full solution would look like this:
[(s, (s+step-1 if s+step-1<stop-1 else stop-1)) for s in xrange(start,stop,step)]
Use range or xrange, whichever you like (xrange exists only in Python 2). Note that this expression handles a positive step only.

This (almost) does what you want:
[(s, s + step - 1) for s in range(start, stop, step)]
Result:
[(0, 4), (5, 9), (10, 14), (15, 19)]
You can improve on this and make the last pair (15, 17).
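One way to make that improvement is to clamp each chunk's end with min(). This is a minimal sketch for a positive step only, written for Python 3; the function name chunks is hypothetical.

```python
def chunks(start, stop, step):
    """Split [start, stop) into inclusive (first, last) pairs; positive step only."""
    # min() clamps the final chunk so it never reaches past stop - 1.
    return [(s, min(s + step - 1, stop - 1)) for s in range(start, stop, step)]


print(chunks(0, 18, 5))
```

The last pair is cut short because 15 + 5 - 1 = 19 exceeds stop - 1 = 17.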

This gives an iterator as with xrange and also works with a negative step. If you're using Python 3.x then replace xrange with range.
def xchunks(start, stop, step):
    # Clamp the far end of each chunk so it never crosses stop.
    return ((i, min(i + step - 1, stop - 1)) if step > 0 else
            (i, max(i + step + 1, stop + 1))
            for i in xrange(start, stop, step))
def test():
    result = list(xchunks(start=0, stop=18, step=5))
    print(result)
    assert result == [(0, 4), (5, 9), (10, 14), (15, 17)]
    result = list(xchunks(start=20, stop=2, step=-5))
    print(result)
    assert result == [(20, 16), (15, 11), (10, 6), (5, 3)]

Related

I have an interval of integers that comprises some inner intervals. Given these intervals I want to compute a list including the intervals between

Inner intervals are always inside the global one.
All intervals are integer, left-closed, right-open intervals.
Let's take this example.
The "global" interval is [0, 22[.
"Inner" intervals are [3, 6[ and [12, 15[.
For this example I expect :
[0, 3[ U [3, 6[ U [6, 12[ U [12, 15[ U [15, 22[
I've tried to define a function, but I got the indices mixed up while iterating over the intervals.
def allspans(r, spans):
    pass
allspans((0, 22), [(3,6), (12,15)]) # expected : [(0, 3), (3, 6), (6, 12), (12, 15), (15, 22)]

Yes, you have to iterate over your spans, but take care to maintain a position so that the gaps between them are filled correctly.
from typing import Generator
def allspans(r, spans) -> Generator:
    # Assumes spans are sorted and lie inside r.
    pos = r[0]  # start at the global lower bound, which need not be 0
    for lower, upper in spans:
        if pos < lower:
            yield pos, lower  # gap before this span
        yield lower, upper
        pos = upper
    if pos < r[1]:  # strict comparison, so no empty trailing interval is emitted
        yield pos, r[1]
I find it easier to use a generator.
Just use list() to convert the result to a list.
list(allspans((0, 22), [(3,6), (12,15)])) # [(0, 3), (3, 6), (6, 12), (12, 15), (15, 22)]

Using a normal loop:
def allspans(r, spans):
    intervals = []
    intervals.append((r[0], spans[0][0]))
    for i in range(len(spans)):
        current_span = spans[i]
        if i != 0:
            intervals.append((spans[i - 1][1], current_span[0]))
        intervals.append((current_span[0], current_span[1]))
    intervals.append((spans[-1][1], r[1]))
    return intervals
print(allspans((0, 22), [(3, 6), (12, 15)]))
# [(0, 3), (3, 6), (6, 12), (12, 15), (15, 22)]

Using itertools.chain and itertools.pairwise (Python 3.10+):
from itertools import chain, pairwise
def all_spans(r, spans):
    start, end = r
    it = chain((start,), chain.from_iterable(spans), (end,))
    return [t for t in pairwise(it) if t[0] != t[1]]
First, we construct an iterator it over all of the interval endpoints in order; then the sub-ranges are all the pairs of consecutive endpoints, excluding the empty sub-ranges where two consecutive endpoints are equal.
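On Python versions older than 3.10, where itertools.pairwise is not available, the same idea can be sketched with zip over a list of endpoints. The name all_spans_compat is hypothetical, and the spans are assumed to be sorted and to lie inside r.

```python
from itertools import chain


def all_spans_compat(r, spans):
    """Same endpoint-pairing idea, without itertools.pairwise."""
    start, end = r
    # Flatten every endpoint, in order, into one list.
    points = [start, *chain.from_iterable(spans), end]
    # Pair each endpoint with its successor and drop empty sub-ranges.
    return [(a, b) for a, b in zip(points, points[1:]) if a != b]


print(all_spans_compat((0, 22), [(3, 6), (12, 15)]))
```

Building a list costs some memory compared with the lazy iterator, which rarely matters for a handful of intervals.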

How to calculate the mean of "n" outputs which I got after using for loop and assign it to a variable?

I used a for loop to calculate a series of thresholds for different segments of an image.
>>crdf
[(11, 6),
(11, 7),
(11, 11),
(12, 16),
(10, 9),
(21, 26),
(15, 15),
(12, 17),
(12, 12),
(14, 10),
(20, 26)
]
>>for i in range(0,4):
>> print(threshold_otsu(img[crdf[i]],16))
-14.606459
-15.792943
-15.547393
-16.170353
How can I calculate the mean of these thresholds (the output values) and store it in a variable using Python?

There are many ways to do this.
Using numpy:
import numpy as np
thresholds = []
for i in range(0,4):
    thresholds.append(threshold_otsu(img[crdf[i]],16))
mean_threshold = np.mean(thresholds)
Not using numpy:
threshold_sum = 0
i_values = range(0,4)
for i in i_values:
    threshold_sum += threshold_otsu(img[crdf[i]],16)
mean_threshold = threshold_sum / len(i_values)
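Another option is the standard library's statistics module (Python 3.4+). Because img and threshold_otsu are not available here, this sketch feeds in the four values printed in the question as stand-ins for the loop results.

```python
from statistics import mean

# Stand-in values copied from the question's printed output; in real code
# these would be collected from threshold_otsu(...) inside the loop.
thresholds = [-14.606459, -15.792943, -15.547393, -16.170353]

mean_threshold = mean(thresholds)
print(round(mean_threshold, 6))
```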

You can modify your code to look like the following; pay attention to the for loop and to what comes after it. Each value computed in the loop is appended to a list, and the mean of that list is calculated after the loop:
crdf = [(11, 6),
(11, 7),
(11, 11),
(12, 16),
(10, 9),
(21, 26),
(15, 15),
(12, 17),
(12, 12),
(14, 10),
(20, 26)
]
arrayOfNumbers=[]
for i in range(0,4):
    arrayOfNumbers.append(threshold_otsu(img[crdf[i]],16))
mean = float(sum(arrayOfNumbers)) / max(len(arrayOfNumbers), 1)
print(mean)
I don't know what threshold_otsu() computes, but if the loop produces those 4 values and appends them to arrayOfNumbers, you will end up in a situation like this:
#the array will have for example these 4 values
arrayOfNumbers=[-14.606459, -15.792943, -15.547393, -16.170353]
mean = float(sum(arrayOfNumbers)) / max(len(arrayOfNumbers), 1)
print(mean)
#-15.529287

How to get all maximal non-overlapping sets of spans from a list of spans

I can't seem to find a way to write the algorithm in the title without needing to curate the results in some way.
To illustrate what I want:
all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
possible_sets = [
{(0, 5), (5, 8), (9, 10), (11, 15)},
{(2, 7), (9, 10), (11, 15)},
{(0, 5), (6, 10), (11, 15)}
]
not_possible = [
{(0, 5), (5, 8), (6, 10), (11, 15)}, # has overlaps
{(5, 8), (9, 10), (11, 15)} # not maximal w.r.t possible_sets[0]
]
My current implementation is more or less this:
def has_overlap(a, b):
    return a[1] > b[0] and b[1] > a[0]
def combine(spans, current, idx=0):
    for i in range(idx, len(spans)):
        overlaps = {e for e in current if has_overlap(e, spans[i])}
        if overlaps:
            yield from combine(spans, current-overlaps, i)
        else:
            current.add(spans[i])
    yield current
But it produces non-maximal sets that I'd rather not create in the first place.
>>> for s in combine(all_spans, set()):
... print(sorted(s))
[(9, 10), (11, 15)]
[(6, 10), (11, 15)]
[(5, 8), (9, 10), (11, 15)]
[(9, 10), (11, 15)]
[(6, 10), (11, 15)]
[(2, 7), (9, 10), (11, 15)]
[(0, 5), (9, 10), (11, 15)]
[(0, 5), (6, 10), (11, 15)]
[(0, 5), (5, 8), (9, 10), (11, 15)]
Is there a different approach that avoids this behavior? I found similar problems under the keywords "interval overlaps" and "activity scheduling", but none of them seemed to refer to this particular problem.

It depends on what you mean by not wanting to curate the results.
You can filter out the non-maximal results after using your generator with:
all_results = [s for s in combine(all_spans, set())]
for first_result in list(all_results):
    for second_result in list(all_results):
        if first_result.issubset(second_result) and first_result != second_result:
            all_results.remove(first_result)
            break
To not produce them in the first place, you could do a check before yielding to see whether an answer is maximal. Something like:
def combine(spans, current, idx=0):
    for i in range(idx, len(spans)):
        overlaps = {e for e in current if has_overlap(e, spans[i])}
        if overlaps:
            yield from combine(spans, current-overlaps, i)
        else:
            current.add(spans[i])
    # Check whether the current set is maximal.
    possible_additions = set(spans)
    for item_to_consider in set(possible_additions):
        if any([has_overlap(item_in_current, item_to_consider) for item_in_current in current]):
            possible_additions.remove(item_to_consider)
    if len(possible_additions) == 0:
        yield current
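As an alternative to checking after the fact, the maximal sets can also be built directly. The sketch below is a different technique from the code above: it grows each set from left to right and never picks a span if some other span would fit entirely in the gap before it. It assumes every span has start < end and, like has_overlap, treats touching spans such as (0, 5) and (5, 8) as non-overlapping. The names maximal_sets and extend are hypothetical.

```python
def maximal_sets(spans):
    """Yield every maximal set of pairwise non-overlapping spans, as sorted lists."""
    spans = sorted(set(spans))

    def extend(chain, last_end):
        # Spans that start at or after the end of the current chain.
        candidates = [s for s in spans if s[0] >= last_end]
        if not candidates:
            # Nothing can follow the chain, and no gap was left fillable.
            yield list(chain)
            return
        for s in candidates:
            # If another candidate ends before s starts, choosing s next
            # would leave a gap that span could fill: not maximal, skip it.
            if any(t[1] <= s[0] for t in candidates):
                continue
            yield from extend(chain + [s], s[1])

    yield from extend([], float("-inf"))


all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
for result in maximal_sets(all_spans):
    print(result)
```

For the example input this produces exactly the three sets listed in possible_sets, with no duplicates and no non-maximal sets to filter out.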

This is a simple (?) graph problem. Make a directed graph where each span is a node. There is an edge AB (from node A to node B) iff A[1] <= B[0] -- in prose, if span B doesn't start until span A finishes. Your graph would look like
Node => Successors
(0, 5) => (5, 8), (6, 10), (9, 10), (11, 15)
(2, 7) => (9, 10), (11, 15)
(5, 8) => (9, 10), (11, 15)
(6, 10) => (11, 15)
(9, 10) => (11, 15)
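Now, the problem reduces to finding the paths through the graph that no other node can be added to. The longest path gives a set of maximum size; the shorter sets in your possible_sets, such as {(2, 7), (9, 10), (11, 15)}, are also paths that none of the remaining spans can join.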
Given the linearity of the problem, finding one maximal solution is easier: at each step, pick the successor node with the soonest ending time. In steps:
To start, all nodes are available. The one with the soonest ending time is (0,5).
The successor to (0,5) with the earliest end is (5, 8).
The successor to (5,8) ... is (9, 10)
... and finally add (11, 15)
Note that this much doesn't require a graph; merely a structure you're willing to reference by either first or second sub-element.
The solution length is 4, as you already know.
Can you take it from here?
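The greedy steps above can be sketched as follows. This finds one set of maximum size, not every maximal set, and the function name greedy_chain is hypothetical.

```python
def greedy_chain(spans):
    """Pick spans by earliest end time, skipping any that overlap the last pick."""
    chain = []
    last_end = float("-inf")
    # Visiting spans in order of end time means the first compatible span
    # seen is always the successor with the soonest ending time.
    for start, end in sorted(spans, key=lambda s: s[1]):
        if start >= last_end:
            chain.append((start, end))
            last_end = end
    return chain


all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
print(greedy_chain(all_spans))
```

As noted in the answer, no graph is needed for this part: sorting by the second element of each tuple is enough.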

Assuming ranges are sorted by lower bound, we'd like to append the current range to the longest paths it can be appended to, or create a new path (append to an empty path). If it's called for, we could consider making the search for the longest prefixes more efficient. (The code below just updates that search in a slightly optimised linear method.)
(I'm not sure how to use the yield functionality, perhaps you could make this code more elegant.)
# Assumes spans are sorted by lower bound
# and each tuple is a valid range.
# Note: spans.pop(0) modifies the caller's list.
def f(spans):
    # Append the current span to the longest
    # paths it can be appended to.
    paths = [[spans.pop(0)]]
    for l,r in spans:
        to_extend = []
        longest = 0
        print("\nCandidate: %s" % ((l,r),))
        for path in paths:
            lp, rp = path[-1]
            print("Testing on %s" % ((lp,rp),))
            # The candidate starts inside the path's last span, so it can
            # only replace that span, not follow it.
            if lp <= l < rp:
                prefix = path[:-1]
                if len(prefix) >= longest:
                    to_extend.append(prefix + [(l,r)])
                    longest = len(prefix)
            # Otherwise, it's after so append it.
            else:
                print("Appending to path: %s" % path)
                path.append((l, r))
                longest = len(path)
        for path in to_extend:
            print("Candidate extensions: %s" % to_extend)
            if len(path) == longest + 1:
                print("Adding to total paths: %s" % path)
                paths.append(path)
    print("\nResult: %s" % paths)
    return paths
all_spans = [(0, 5), (2, 7), (5, 8), (6, 10), (9, 10), (11, 15)]
f(all_spans)
Output:
"""
Candidate: (2, 7)
Testing on (0, 5)
Candidate extensions: [[(2, 7)]]
Adding to total paths: [(2, 7)]
Candidate: (5, 8)
Testing on (0, 5)
Appending to path: [(0, 5)]
Testing on (2, 7)
Candidate: (6, 10)
Testing on (5, 8)
Testing on (2, 7)
Candidate extensions: [[(0, 5), (6, 10)]]
Adding to total paths: [(0, 5), (6, 10)]
Candidate: (9, 10)
Testing on (5, 8)
Appending to path: [(0, 5), (5, 8)]
Testing on (2, 7)
Appending to path: [(2, 7)]
Testing on (6, 10)
Candidate: (11, 15)
Testing on (9, 10)
Appending to path: [(0, 5), (5, 8), (9, 10)]
Testing on (9, 10)
Appending to path: [(2, 7), (9, 10)]
Testing on (6, 10)
Appending to path: [(0, 5), (6, 10)]
Result: [[(0, 5), (5, 8), (9, 10), (11, 15)],
[(2, 7), (9, 10), (11, 15)],
[(0, 5), (6, 10), (11, 15)]]
"""

iterating through list of tuples while accessing the previous and next elements

I'm trying to iterate over a list of tuples while still having access to the previous tuple and the next tuple. Specifically, I need to compare the y coordinates of the previous, current and next tuples.
This is my list of tuples that I'm using as input:
[(1,1),(2,2),(3,4),(4,3),(5,5),(6,3.5),(7,7),(8,8),(9,9),(10,8),(11,9.5),(11,7.5),(12,12),(13,1.5)]
I initially used this code segment to be able to have access to the previous, current and next elements:
from itertools import tee, islice, chain
def previous_and_next(some_iterable):
    prevs, current, nexts = tee(some_iterable, 3)
    prevs = chain([None], prevs)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(prevs, current, nexts)
But I can't access the elements of the tuples using this function, as it raises an error about subscripting. I'm open to new ideas or different functions, as this bit is clearly not what I need.
For more clarification this is the function that I am currently trying to implement
UF = UnionFind()
sortedlist = sorted(pointcloud, key=lambda x: x[1])
for previous, current, nxt in previous_and_next(sortedlist):
    if previous[1] > current[1] and nxt[1] > current[1]:
        UF.insert_objects(current)
    elif previous[1] < current[1] and nxt[1] < current[1]:
        c=UF.find(previous[1])
        d=UF.find(nxt[1])
        UF.union(c,d)
    else:
        c=UF.find(previous[1])
        UF.union(current,previous)
return

This has nothing to do with iteration or with the previous and next elements as such.
The real issue is that the first previous and the last next are not tuples but None. So the first thing this code does:
for previous, current, nxt in previous_and_next(sortedlist):
    if previous[1] > current[1] and nxt[1] > current[1]:
        UF.insert_objects(current)
is break, because previous is None and None[1] isn't valid (None is not subscriptable).
>>> previous=None
>>> previous[1]
Traceback (most recent call last):
File "<string>", line 301, in runcode
File "<interactive input>", line 1, in <module>
TypeError: 'NoneType' object is not subscriptable
So either replace None by a tuple made of "invalid" values (like (-1,-1), depending on the effect you need) or filter out start & end triplets:
for t in previous_and_next(sortedlist):
    if None not in t:
        previous, current, nxt = t
        if previous[1] > current[1] and nxt[1] > current[1]:
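If only the complete triplets are needed, another option is to never produce the None-padded ones at all. This is a minimal sketch using zip over three offset slices; the names triples and local_minima are hypothetical, and the point list is a shortened version of the one in the question.

```python
def triples(seq):
    """Yield (previous, current, next) for interior elements only."""
    items = list(seq)
    # zip stops at the shortest slice, so the first and last elements
    # never appear in the "current" position.
    return zip(items, items[1:], items[2:])


points = [(1, 1), (2, 2), (3, 4), (4, 3), (5, 5)]
# Points whose y coordinate is lower than both neighbours.
local_minima = [cur for prv, cur, nxt in triples(points)
                if prv[1] > cur[1] and nxt[1] > cur[1]]
print(local_minima)
```

The trade-off is that the first and last points are skipped entirely, so any handling they need has to happen outside the loop.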

Not sure I understand your question, but maybe this will help:
def previous_and_next(iterable):
    iterable = iter(iterable)
    try:
        prv, cur, nxt = None, next(iterable), next(iterable)
        while True:
            yield prv, cur, nxt
            prv, cur, nxt = cur, nxt, next(iterable)
    except StopIteration:
        # Since Python 3.7 (PEP 479), a StopIteration that escapes a
        # generator becomes a RuntimeError, so end the generator explicitly.
        return
tuples = [(1, 1), (2, 2), (3, 4), (4, 3), (5, 5), (6, 3.5), (7, 7), (8, 8), (9, 9),
          (10, 8), (11, 9.5), (11, 7.5), (12, 12), (13, 1.5)]
for p, c, n in previous_and_next(tuples):
    print(p, c, n)
Output:
None (1, 1) (2, 2)
(1, 1) (2, 2) (3, 4)
(2, 2) (3, 4) (4, 3)
(3, 4) (4, 3) (5, 5)
(4, 3) (5, 5) (6, 3.5)
(5, 5) (6, 3.5) (7, 7)
(6, 3.5) (7, 7) (8, 8)
(7, 7) (8, 8) (9, 9)
(8, 8) (9, 9) (10, 8)
(9, 9) (10, 8) (11, 9.5)
(10, 8) (11, 9.5) (11, 7.5)
(11, 9.5) (11, 7.5) (12, 12)
(11, 7.5) (12, 12) (13, 1.5)

Find maximum equidistant points on a line

I need an algorithm to find the maximum number of equidistant points on the same line.
Input: List of collinear points
For example: My points could be
[(1, 1), (1, 2), (1, 3)]
In this case I could sort the points by their distance from the origin and compute the distances sequentially. However, this approach fails in a scenario such as the one below. All the points are on the same line y=-x+6 and are equidistant from each other.
[(3, 3), (2, 4), (4, 2), (5, 1), (1, 5)]
This is because several of the points are equally far from the origin, for example (2, 4) and (4, 2), so the sorted order is not the order along the line and sequential traversal is not possible.
For example, if the sorted list came out as [(3, 3), (5, 1), (4, 2), (2, 4), (1,5)], we would end up calculating the distance between (3,3) and (5,1), which is not correct. Ideally, I would want to calculate the distance between the closest points, so the order should be (1,5), (2,4), and so on.
To overcome this problem I created an O(n*n) solution that iterates using 2 loops and finds the frequency of the minimum distance between any 2 points:
distance_list=[]
lop=[(1, 3), (2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
lop.sort(key=lambda x: x[0]*x[0] + x[1]*x[1])
for k in range(0, len(lop)):
    min_dist=float("inf")
    for l in range(0, len(lop)):
        if k!=l:
            # squared distance from point k to point l
            temp_dist = ( (lop[k][0] - lop[l][0])*(lop[k][0] - lop[l][0]) + (lop[k][1] - lop[l][1])*(lop[k][1] - lop[l][1]) )
            min_dist= min(min_dist, temp_dist)
    distance_list.append(min_dist)
print(distance_list.count(max(distance_list,key=distance_list.count)))
However, the above solution fails for the test case below:
[(1, 3), (2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
The expected answer is: 5
However, I'm getting: 9
Essentially, I don't know how to distinguish between 2 clusters of points that each contain equidistant points. In the above example that would be
[(1, 3), (2, 4), (3, 5), (4, 6)] AND [(10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]

If you want to put the points in order, you don't need to sort them by distance from anything. You can just sort them by the default lexicographic order, which is consistent with the order along the line:
lop.sort()
Now you just need to figure out how to find the largest set of equidistant points. That could be tricky, especially if you're allowed to skip points.

Because you want the distance between consecutive points, there is no need to calculate all combinations. You just need the distances of (p0,p1), (p1,p2), (p2,p3), and so on, and then group those pairs, in that order, by the value of their distance. Once you have done that, you just need the longest group among them. The itertools module comes in handy for this:
from itertools import groupby, tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b) # on Python 2, itertools.izip gives the same lazy pairing
def distance(a,b):
    # squared distance, which is enough for comparing distances for equality
    ax,ay = a
    bx,by = b
    return (ax-bx)**2 + (ay-by)**2
def longest_seq(points):
    groups = [ list(g) for k,g in groupby(pairwise(points), lambda p:distance(*p)) ]
    max_s = max(groups,key=len) # this is a list of pairs [(p0,p1), (p1,p2), (p2,p3),..., (pn-1,pn)]
    ans = [ p[0] for p in max_s ]
    ans.append( max_s[-1][-1] ) # we need to include the last point manually
    return ans
Here the groupby function groups together consecutive pairs of points that have the same distance, pairwise is a recipe that does the desired pairing, and the rest is self-explanatory.
Here is a test:
>>> test = [(1, 3), (2, 4), (3, 5), (4, 6), (10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
>>> longest_seq(test)
[(10, 12), (11, 13), (12, 14), (13, 15), (14, 16)]
>>>
