A friend of mine passed me over an interview question he recently got and I wasn't very happy with my approach to the solution. The question is as follows:
You have two lists.
Each list will contain lists of length 2, which represent a range (ie. [3,5] means a range from 3 to 5, inclusive).
You need to return the intersection of all ranges between the sets. If I give you [1,5] and [0,2], the result would be [1,2].
Within each list, the ranges will always increase and never overlap (i.e. it will be [[0, 2], [5, 10] ... ] never [[0,2], [2,5] ... ])
In general there are no "gotchas" in terms of the ordering or overlapping of the lists.
Example:
a = [[0, 2], [5, 10], [13, 23], [24, 25]]
b = [[1, 5], [8, 12], [15, 18], [20, 24]]
Expected output:
[[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]
My lazy solution involved spreading the list of ranges into a list of integers then doing a set intersection, like this:
def get_intersection(x, y):
    x_spread = [item for sublist in [list(range(l[0], l[1]+1)) for l in x] for item in sublist]
    y_spread = [item for sublist in [list(range(l[0], l[1]+1)) for l in y] for item in sublist]
    flat_intersect_list = list(set(x_spread).intersection(y_spread))
...
But I imagine there's a solution that's both readable and more efficient.
Please explain how you would mentally tackle this problem, if you don't mind. A time/space complexity analysis would also be helpful.
Thanks
[[max(first[0], second[0]), min(first[1], second[1])]
for first in a for second in b
if max(first[0], second[0]) <= min(first[1], second[1])]
A list comprehension which gives the answer:
[[1, 2], [5, 5], [8, 10], [15, 18], [20, 23], [24, 24]]
Breaking it down:
[[max(first[0], second[0]), min(first[1], second[1])]
Take the maximum of the two start points and the minimum of the two end points.
for first in a for second in b
For all combinations of first and second term:
if max(first[0], second[0]) <= min(first[1], second[1])]
Only when the ranges actually overlap, i.e. the larger start point does not exceed the smaller end point.
If you need the output compacted, then the following function does that (In O(n^2) time because deletion from a list is O(n), a step we perform O(n) times):
def reverse_compact(lst):
    for index in range(len(lst) - 2, -1, -1):
        if lst[index][1] + 1 >= lst[index + 1][0]:
            lst[index][1] = lst[index + 1][1]
            del lst[index + 1]  # remove compacted entry O(n)*
    return lst
It joins ranges which touch, given they are in-order. It does it in reverse because then we can do this operation in place and delete the compacted entries as we go. If we didn't do it in reverse, deleting other entries would muck with our index.
>>> reverse_compact(comp)
[[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]
The compacting function can be reduced further to O(n) by doing a forward in place compaction and copying back the elements, as then each inner step is O(1) (get/set instead of del), but this is less readable:
This runs in O(n) time and space complexity:
def compact(lst):
    next_index = 0  # Keeps track of the last used index in our result
    for index in range(len(lst) - 1):
        if lst[next_index][1] + 1 >= lst[index + 1][0]:
            lst[next_index][1] = lst[index + 1][1]
        else:
            next_index += 1
            lst[next_index] = lst[index + 1]
    return lst[:next_index + 1]
Using either compactor, the list comprehension is the dominating term here, with time O(n*m) and space O(n+m), as it compares every possible pair of ranges with no early outs. This does not take advantage of the ordered structure of the lists given in the prompt: because the ranges always increase and never overlap, you could do all comparisons in a single pass and reduce the time complexity to O(n + m).
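For reference, a minimal sketch of that single-pass idea (the function name is mine, not from any answer here): advance whichever range ends first, emitting each non-empty overlap along the way. It produces the raw intersections, so touching results like [20, 23] and [24, 24] would still need a compaction pass such as reverse_compact above.

```python
def intersect_sorted(a, b):
    # Two pointers, one per list; O(n + m) overall.
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:                 # the two ranges overlap
            out.append([lo, hi])
        if a[i][1] < b[j][1]:        # advance whichever range ends first
            i += 1
        else:
            j += 1
    return out
```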
Note there is more than one solution and hopefully you can solve the problem and then iteratively improve upon it.
A 100% correct answer which satisfies all possible inputs is not the goal of an interview question. It is to see how a person thinks and handles challenges, and whether they can reason about a solution.
In fact, if you give me a 100% correct, textbook answer, it's probably because you've seen the question before and you already know the solution... and therefore that question isn't helpful to me as an interviewer. 'Check, can regurgitate solutions found on StackOverflow.' The idea is to watch you solve a problem, not regurgitate a solution.
Too many candidates miss the forest for the trees: acknowledging shortcomings and suggesting solutions is the right way to go about answering an interview question. You don't have to have a solution; you have to show how you would approach the problem.
Your solution is fine if you can explain it and detail potential issues with using it.
I got my current job by failing to answer an interview question: After spending the majority of my time trying, I explained why my approach didn't work and the second approach I would try given more time, along with potential pitfalls I saw in that approach (and why I opted for my first strategy initially).
OP, I believe this solution works, and it runs in O(m+n) time where m and n are the lengths of the lists. (To be sure, make ranges a linked list so that changing its length runs in constant time.)
def intersections(a, b):
    ranges = []
    i = j = 0
    while i < len(a) and j < len(b):
        a_left, a_right = a[i]
        b_left, b_right = b[j]
        if a_right < b_right:
            i += 1
        else:
            j += 1
        if a_right >= b_left and b_right >= a_left:
            end_pts = sorted([a_left, a_right, b_left, b_right])
            middle = [end_pts[1], end_pts[2]]
            ranges.append(middle)
    ri = 0
    while ri < len(ranges) - 1:
        if ranges[ri][1] + 1 >= ranges[ri + 1][0]:  # join touching ranges
            ranges[ri:ri + 2] = [[ranges[ri][0], ranges[ri + 1][1]]]
        else:
            ri += 1
    return ranges
a = [[0,2], [5,10], [13,23], [24,25]]
b = [[1,5], [8,12], [15,18], [20,24]]
print(intersections(a, b))
# [[1, 2], [5, 5], [8, 10], [15, 18], [20, 24]]
Algorithm
Given two intervals, if they overlap, then the intersection's starting point is the maximum of the starting points of the two intervals, and its stopping point is the minimum of the stopping points:
To find all the pairs of intervals that might intersect, start with the first pair and keep incrementing the interval with the lower stopping point:
At most m + n pairs of intervals are considered, where m is length of the first list, and n is the length of the second list. Calculating the intersection of a pair of intervals is done in constant time, so this algorithm's time-complexity is O(m+n).
Implementation
To keep the code simple, I'm using Python's built-in range object for the intervals. This is a slight deviation from the problem description in that ranges are half-open intervals rather than closed. That is,
(x in range(a, b)) == (a <= x < b)
Given two range objects x and y, their intersection is range(start, stop), where start = max(x.start, y.start) and stop = min(x.stop, y.stop). If the two ranges don't overlap, then start >= stop and you just get an empty range:
>>> len(range(1, 0))
0
So given two lists of ranges, xs and ys, each increasing in start value, the intersection can be computed as follows:
def intersect_ranges(xs, ys):
    # Merge any abutting ranges (implementation below):
    xs, ys = merge_ranges(xs), merge_ranges(ys)

    # Try to get the first range in each iterator:
    try:
        x, y = next(xs), next(ys)
    except StopIteration:
        return

    while True:
        # Yield the intersection of the two ranges, if it's not empty:
        intersection = range(
            max(x.start, y.start),
            min(x.stop, y.stop)
        )
        if intersection:
            yield intersection

        # Try to increment the range with the earlier stopping value:
        try:
            if x.stop <= y.stop:
                x = next(xs)
            else:
                y = next(ys)
        except StopIteration:
            return
It seems from your example that the ranges can abut. So any abutting ranges have to be merged first:
def merge_ranges(xs):
    start, stop = None, None
    for x in xs:
        if stop is None:
            start, stop = x.start, x.stop
        elif stop < x.start:
            yield range(start, stop)
            start, stop = x.start, x.stop
        else:
            stop = x.stop
    if stop is not None:  # guard against empty input
        yield range(start, stop)
Applying this to your example:
>>> a = [[0, 2], [5, 10], [13, 23], [24, 25]]
>>> b = [[1, 5], [8, 12], [15, 18], [20, 24]]
>>> list(intersect_ranges(
... (range(i, j+1) for (i, j) in a),
... (range(i, j+1) for (i, j) in b)
... ))
[range(1, 3), range(5, 6), range(8, 11), range(15, 19), range(20, 25)]
I know this question already got a correct answer. For completeness, I'd like to mention that some time ago I developed a Python library, namely portion (https://github.com/AlexandreDecan/portion), that supports this kind of operation (intersections between lists of atomic intervals).
You can have a look at the implementation, it's quite close to some of the answers that were provided here: https://github.com/AlexandreDecan/portion/blob/master/portion/interval.py#L406
To illustrate its usage, let's consider your example:
a = [[0, 2], [5, 10], [13, 23], [24, 25]]
b = [[1, 5], [8, 12], [15, 18], [20, 24]]
We need to convert these "items" to closed (atomic) intervals first:
import portion as P
a = [P.closed(x, y) for x, y in a]
b = [P.closed(x, y) for x, y in b]
print(a)
... displays [[0,2], [5,10], [13,23], [24,25]] (each [x,y] is an Interval object).
Then we can create an interval that represents the union of these atomic intervals:
a = P.Interval(*a)
b = P.Interval(*b)
print(b)
... displays [0,2] | [5,10] | [13,23] | [24,25] (a single Interval object, representing the union of all the atomic ones).
And now we can easily compute the intersection:
c = a & b
print(c)
... displays [1,2] | [5] | [8,10] | [15,18] | [20,23] | [24].
Notice that our answer differs from yours ([20,23] | [24] instead of [20,24]) since the library expects continuous domains for values. We can quite easily convert the results to discrete intervals following the approach proposed in https://github.com/AlexandreDecan/portion/issues/24#issuecomment-604456362 as follows:
def discretize(i, incr=1):
    first_step = lambda s: (P.OPEN, (s.lower - incr if s.left is P.CLOSED else s.lower), (s.upper + incr if s.right is P.CLOSED else s.upper), P.OPEN)
    second_step = lambda s: (P.CLOSED, (s.lower + incr if s.left is P.OPEN and s.lower != -P.inf else s.lower), (s.upper - incr if s.right is P.OPEN and s.upper != P.inf else s.upper), P.CLOSED)
    return i.apply(first_step).apply(second_step)
print(discretize(c))
... displays [1,2] | [5] | [8,10] | [15,18] | [20,24].
I'm no kind of python programmer, but I don't think this problem is amenable to slick Python-esque short solutions that are also efficient.
Mine treats the interval boundaries as "events" labeled 1 and 2, processing them in order. Each event toggles the respective bit in a parity word. When we toggle to or from 3, it's time to emit the beginning or end of an intersection interval.
The tricky part is that e.g. [13, 23], [24, 25] is being treated as [13, 25]; adjacent intervals must be concatenated. The nested if below takes care of this case by continuing the current interval rather than starting a new one. Also, for equal event values, interval starts must be processed before ends so that e.g. [1, 5] and [5, 10] will be emitted as [5, 5] rather than nothing. That's handled with the middle field of the event tuples.
This implementation is O(n log n) due to the sorting, where n is the total length of both inputs. By merging the two event lists pairwise, it could be O(n), but this article suggests that the lists must be huge before the library merge will beat the library sort.
def get_isect(a, b):
    events = ([(x[0], 0, 1) for x in a] + [(x[1], 1, 1) for x in a]
              + [(x[0], 0, 2) for x in b] + [(x[1], 1, 2) for x in b])
    events.sort()
    prevParity = 0
    isect = []
    for event in events:
        parity = prevParity ^ event[2]
        if parity == 3:
            # Maybe start a new intersection interval.
            if len(isect) == 0 or isect[-1][1] < event[0] - 1:
                isect.append([event[0], 0])
        elif prevParity == 3:
            # End the current intersection interval.
            isect[-1][1] = event[0]
        prevParity = parity
    return isect
Here is an O(n) version that's a bit more complex because it finds the next event on the fly by merging the input lists. It also requires only constant storage beyond inputs and output:
def get_isect2(a, b):
    ia = ib = prevParity = 0
    isect = []
    while True:
        aVal = a[ia // 2][ia % 2] if ia < 2 * len(a) else None
        bVal = b[ib // 2][ib % 2] if ib < 2 * len(b) else None
        # Use "is None" so a legitimate boundary value of 0 isn't
        # mistaken for an exhausted list.
        if aVal is None and bVal is None:
            break
        if bVal is None or (aVal is not None
                            and (aVal < bVal or (aVal == bVal and ia % 2 == 0))):
            parity = prevParity ^ 1
            val = aVal
            ia += 1
        else:
            parity = prevParity ^ 2
            val = bVal
            ib += 1
        if parity == 3:
            if len(isect) == 0 or isect[-1][1] < val - 1:
                isect.append([val, 0])
        elif prevParity == 3:
            isect[-1][1] = val
        prevParity = parity
    return isect
Answering as I would personally answer an interview question, and as I'd also most appreciate an answer: the interviewee's goal is probably to demonstrate a range of skills, not limited strictly to Python. So this answer is admittedly going to be more abstract than others here.
It might be helpful to ask for information about any constraints I'm operating under. Operation time and space complexity are common constraints, as is development time, all of which are mentioned in previous answers here; but other constraints might also arise. As common as any of those is maintenance and integration with existing code.
Within each list, the ranges will always increase and never overlap
When I see this, it probably means there is some pre-existing code to normalize the list of ranges, that sorts ranges and merges overlap. That's a pretty common union operation. When joining an existing team or ongoing project, one of the most important factors for success is integrating with existing patterns.
An intersection can also be performed via a union operation (De Morgan's laws): invert the sorted ranges, union them, and invert the result.
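That invert/union/invert idea can be sketched directly. This is my own sketch, not code from any existing answer: it assumes closed integer ranges and uses positive/negative infinity as sentinels for the unbounded pieces of the complement. Note that because the union step coalesces touching integer ranges, the final result comes out already compacted ([20, 24] rather than [20, 23] and [24, 24]).

```python
import math

def invert(ranges):
    # Complement of a sorted, non-overlapping list of closed integer
    # ranges over the whole number line (+/- infinity sentinels).
    out, lo = [], -math.inf
    for start, end in ranges:
        if lo < start:
            out.append([lo, start - 1])
        lo = end + 1
    if lo != math.inf:
        out.append([lo, math.inf])
    return out

def union(xs, ys):
    # Union of two sorted range lists: merge them, then coalesce
    # anything that overlaps or touches (integer semantics: 12 touches 13).
    merged = sorted(xs + ys)
    out = [list(merged[0])]
    for start, end in merged[1:]:
        if start <= out[-1][1] + 1:
            out[-1][1] = max(out[-1][1], end)
        else:
            out.append([start, end])
    return out

def intersection(xs, ys):
    # De Morgan: A ∩ B == complement(complement(A) ∪ complement(B))
    return invert(union(invert(xs), invert(ys)))
```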
To me, that answer demonstrates experience with algorithms generally and "range" problems specifically, an appreciation that the most readable and maintainable code approach is typically reusing existing code, and a desire to help a team succeed over simply puzzling on my own.
Another approach is to sort both lists together into one iterable list. Iterate the list, reference counting each start/end as increment/decrement steps. Ranges are emitted on transitions between reference counts of 1 and 2. This approach is inherently extensible to support more than two lists, if the sort operation meets our needs (and they usually do).
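The reference-counting approach could be sketched like this (my own sketch and naming, generalized to any number of lists): starts are +1 events, ends are -1 events, and a point lies in the intersection while the count equals the number of lists.

```python
def intersect_many(lists_of_ranges):
    # The middle sort key makes starts (0) sort before ends (1) at
    # equal values, so [1, 5] and [5, 10] still emit [5, 5].
    k = len(lists_of_ranges)
    events = []
    for ranges in lists_of_ranges:
        for start, end in ranges:
            events.append((start, 0, +1))
            events.append((end, 1, -1))
    events.sort()
    count, result = 0, []
    for value, _, delta in events:
        if delta == +1 and count + 1 == k:
            result.append([value, value])  # transition k-1 -> k: open
        elif delta == -1 and count == k:
            result[-1][1] = value          # transition k -> k-1: close
        count += delta
    return result
```

Like the event-based answer above, this is O(n log n) due to the sort; it emits the raw intersections without joining touching ranges.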
Unless instructed otherwise, I would offer the general approaches and discuss reasons I might use each before writing code.
So, there's no code here. But you did ask for general approaches and thinking :D
I'm trying to wrap my head around this whole thing and I can't seem to figure it out. Basically, I have a list of ints whose values add up to 15. I want to split the list into 2 parts while making the two parts as close as possible to each other in total sum. Sorry if I'm not explaining this well.
Example:
list = [4,1,8,6]
I want to achieve something like this:
list = [[8, 1], [6, 4]]
adding the first list up equals 9, and the other equals 10. That's perfect for what I want as they are as close as possible.
What I have now:
my_list = [4,1,8,6]
total_list_sum = 15

def divide_chunks(l, n):
    # looping till length l
    for i in range(0, len(l), n):
        yield l[i:i + n]

n = 2
x = list(divide_chunks(my_list, n))
print(x)
But, that just splits it up into 2 parts.
Any help would be appreciated!
You could use a recursive algorithm and "brute force" partitioning of the list. Starting with a target difference of zero and progressively increasing your tolerance to the difference between the two lists:
def sumSplit(left, right=[], difference=0):
    sumLeft, sumRight = sum(left), sum(right)
    # stop recursion if left is smaller than right
    if sumLeft < sumRight or len(left) < len(right):
        return
    # return a solution if sums match the tolerance target
    if sumLeft - sumRight == difference:
        return left, right, difference
    # recurse, brutally attempting to move each item to the right
    for i, value in enumerate(left):
        solution = sumSplit(left[:i] + left[i+1:], right + [value], difference)
        if solution:
            return solution
    if right or difference > 0:
        return
    # allow for imperfect split (i.e. larger difference) ...
    for targetDiff in range(1, sumLeft - min(left) + 1):
        solution = sumSplit(left, right, targetDiff)
        if solution:
            return solution
# sumSplit returns the two lists and the difference between their sums
print(sumSplit([4,1,8,6])) # ([1, 8], [4, 6], 1)
print(sumSplit([5,3,2,2,2,1])) # ([2, 2, 2, 1], [5, 3], 1)
print(sumSplit([1,2,3,4,6])) # ([1, 3, 4], [2, 6], 0)
Use itertools.combinations (details here). First let's define some functions:
from itertools import combinations

def difference(sublist1, sublist2):
    return abs(sum(sublist1) - sum(sublist2))

def complement(sublist, my_list):
    complement = my_list[:]
    for x in sublist:
        complement.remove(x)
    return complement
The function difference calculates the "distance" between lists, i.e, how similar the sums of the two lists are. complement returns the elements of my_list that are not in sublist.
Finally, what you are looking for:
def divide(my_list):
    lower_difference = sum(my_list) + 1
    for i in range(1, len(my_list) // 2 + 1):
        for partition in combinations(my_list, i):
            partition = list(partition)
            remainder = complement(partition, my_list)
            diff = difference(partition, remainder)
            if diff < lower_difference:
                lower_difference = diff
                solution = [partition, remainder]
    return solution
test1 = [4,1,8,6]
print(divide(test1)) #[[4, 6], [1, 8]]
test2 = [5,3,2,2,2,1]
print(divide(test2)) #[[5, 3], [2, 2, 2, 1]]
Basically, it tries with every possible division of sublists and returns the one with the minimum "distance".
If you want to make it a little bit faster, you could return the first combination whose difference is 0.
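That early-out could look something like this (a sketch; divide_first_perfect is my own name, and it returns None when no perfect split exists, falling back on nothing rather than the best split):

```python
from itertools import combinations

def divide_first_perfect(my_list):
    # Stop at the first partition whose two halves sum equally.
    total = sum(my_list)
    for i in range(1, len(my_list) // 2 + 1):
        for partition in combinations(my_list, i):
            if 2 * sum(partition) == total:
                remainder = list(my_list)
                for x in partition:
                    remainder.remove(x)
                return [list(partition), remainder]
    return None  # no zero-difference split exists

print(divide_first_perfect([1, 2, 3, 4, 6]))  # [[2, 6], [1, 3, 4]]
```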
I think what you're looking for is a hill climbing algorithm. I'm not sure this will cover all cases but at least works for your example. I'll update this if I think of a counter example or something.
Let's call your list of numbers vals.
vals.sort(reverse=True)
a, b = [], []
for v in vals:
    if sum(a) < sum(b):
        a.append(v)
    else:
        b.append(v)
I have a list, say l = [1,5,8,-3,6,8,-3,2,-4,6,8]. I'm trying to split it into sublists of positive integers, i.e. the above list would give me [[1,5,8],[6,8],[2],[6,8]]. I've tried the following:
l = [1,5,8,-3,6,8,-3,2,-4,6,8]
index = 0

def sublist(somelist):
    a = []
    for i in somelist:
        if i > 0:
            a.append(i)
        else:
            global index
            index += somelist.index(i)
            break
    return a

print sublist(l)
With this I can get the 1st sublist ([1,5,8]) and the index of the 1st negative integer, at 3. Now if I run my function again and pass it l[index+1:], I can get the next sublist, and I assume that index will be updated to show 6. However, I can't for the life of me figure out how to run the function in a loop, or what condition to use, so that I can keep calling it with l[index+1:], where index is the updated, most recently encountered position of a negative integer. Any help will be greatly appreciated.
You need to keep track of two levels of list here - the large list that holds the sublists, and the sublists themselves. Start a large list, start a sublist, and keep appending to the current sublist while i is non-negative (which includes positive numbers and 0, by the way). When i is negative, append the current sublist to the large list and start a new sublist. Also note that you should handle cases where the first element is negative or the last element isn't negative.
l = [1,5,8,-3,6,8,-3,2,-4,6,8]

def sublist(somelist):
    result = []
    a = []
    for i in somelist:
        if i > 0:
            a.append(i)
        else:
            if a:  # make sure a has something in it
                result.append(a)
                a = []
    if a:  # if a is still accumulating elements
        result.append(a)
    return result
The result:
>>> sublist(l)
[[1, 5, 8], [6, 8], [2], [6, 8]]
Since somelist never changes, rerunning index will always get index of the first instance of an element, not the one you just reached. I'd suggest looking at enumerate to get the index and element as you loop, so no calls to index are necessary.
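To illustrate the enumerate suggestion (a sketch with my own names, not part of the answer above): track the resume position yourself instead of calling list.index at all.

```python
def sublist_with_pos(somelist, start=0):
    # Collect positives from `start` until the next negative;
    # return the chunk and the position to resume from.
    a = []
    for pos, i in enumerate(somelist[start:], start):
        if i < 0:
            return a, pos + 1  # resume just past the negative
        a.append(i)
    return a, len(somelist)

l = [1, 5, 8, -3, 6, 8, -3, 2, -4, 6, 8]
chunks, pos = [], 0
while pos < len(l):
    chunk, pos = sublist_with_pos(l, pos)
    if chunk:
        chunks.append(chunk)
print(chunks)  # [[1, 5, 8], [6, 8], [2], [6, 8]]
```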
That said, you could use the included batteries to solve this as a one-liner, using itertools.groupby:
from itertools import groupby

def sublist(somelist):
    return [list(g) for k, g in groupby(somelist, key=(0).__le__) if k]
Still worth working through your code to understand it, but the above is going to be fast and fairly simple.
This code makes use of concepts found at this URL:
Python list comprehension- "pop" result from original list?
Applying an interesting concept found there to your problem, the following are some alternatives to what others have posted for this question so far. Both use list comprehensions and are commented to explain the purpose of the second option versus the first. I did this experiment as part of my own learning curve, but I hope it may help you and others on this thread as well:
What's nice about these is that if your input list is very very large, you won't have to double your memory expenditure to get the job done. You build one up as you shrink the other down.
This code was tested on Python 2.7 and Python 3.6:
o1 = [1,5,8,-3,6,9,-4,2,-5,6,7,-7, 999, -43, -1, 888]
# modified version of poster's list
o1b = [1,5,8,-3,6,8,-3,2,-4,6,8] # poster's list
o2 = [x for x in (o1.pop() for i in range(len(o1)))
      if (lambda x: True if x < 0 else o1.insert(0, x))(x)]

o2b = [x for x in (o1b.pop() for i in range(len(o1b)))
       if (lambda x: True if x < 0 else o1b.insert(0, x))(x)]
print(o1)
print(o2)
print("")
print(o1b)
print(o2b)
It produces result sets like this (on iPython Jupyter Notebooks):
[1, 5, 8, 6, 9, 2, 6, 7, 999, 888]
[-1, -43, -7, -5, -4, -3]
[1, 5, 8, 6, 8, 2, 6, 8]
[-4, -3, -3]
Here is another version that also uses list comprehensions as the work horse, but functionalizes the code in way that is more read-able (I think) and easier to test with different numeric lists. Some will probably prefer the original code since it is shorter:
p1 = [1,5,8,-3,6,9,-4,2,-5,6,7,-7, 999, -43, -1, 888]
# modified version of poster's list
p1b = [1,5,8,-3,6,8,-3,2,-4,6,8] # poster's list
def lst_mut_byNeg_mod(x, pLst):  # list mutation by neg nums module
    # this function only makes sense in the context of usage in
    # split_pos_negs_in_list()
    if x < 0:
        return True
    else:
        pLst.insert(0, x)
        return False

def split_pos_negs_in_list(pLst):
    pLngth = len(pLst)  # reduces nesting of ((()))
    return [x for x in (pLst.pop() for i in range(pLngth))
            if lst_mut_byNeg_mod(x, pLst)]
p2 = split_pos_negs_in_list(p1)
print(p1)
print(p2)
print("")
p2b = split_pos_negs_in_list(p1b)
print(p1b)
print(p2b)
Final Thoughts:
Link provided earlier had a number of ideas in the comment thread:
It recommends a Google search for the "python bloom filter library" - this sounds promising from a performance standpoint but I have not yet looked into it
There is a post on that thread with 554 up-voted, and yet it has at least 4 comments explaining what might be faulty with it. When exploring options, it may be advisable to scan the comment trail and not just review what gets the most votes. There are many options proposed for situations like this.
Just for fun you can use re too for a one-liner.
import re

l = [1,5,8,-3,6,8,-3,2,-4,6,8]
print([list(map(int, x.split(","))) for x in
       re.findall(r"(?<=[,\[])\s*\d+(?:,\s*\d+)*(?=,\s*-\d+|\])", str(l))])
Output: [[1, 5, 8], [6, 8], [2], [6, 8]]
I have two lists of objects. Each list is already sorted by a property of the object that is of the datetime type. I would like to combine the two lists into one sorted list. Is the best way just to do a sort or is there a smarter way to do this in Python?
is there a smarter way to do this in Python
This hasn't been mentioned, so I'll go ahead - there is a merge stdlib function in the heapq module of python 2.6+. If all you're looking to do is getting things done, this might be a better idea. Of course, if you want to implement your own, the merge of merge-sort is the way to go.
>>> list1 = [1, 5, 8, 10, 50]
>>> list2 = [3, 4, 29, 41, 45, 49]
>>> from heapq import merge
>>> list(merge(list1, list2))
[1, 3, 4, 5, 8, 10, 29, 41, 45, 49, 50]
Here's the documentation.
People seem to be overcomplicating this. Just combine the two lists, then sort them:
>>> l1 = [1, 3, 4, 7]
>>> l2 = [0, 2, 5, 6, 8, 9]
>>> l1.extend(l2)
>>> sorted(l1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
..or shorter (and without modifying l1):
>>> sorted(l1 + l2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
..easy! Plus, it's using only two built-in functions, so assuming the lists are of a reasonable size, it should be quicker than implementing the sorting/merging in a loop. More importantly, the above is much less code, and very readable.
If your lists are large (over a few hundred thousand, I would guess), it may be quicker to use an alternative/custom sorting method, but there are likely other optimisations to be made first (e.g not storing millions of datetime objects)
Using the timeit.Timer().repeat() (which repeats the functions 1000000 times), I loosely benchmarked it against ghoseb's solution, and sorted(l1+l2) is substantially quicker:
merge_sorted_lists took..
[9.7439379692077637, 9.8844599723815918, 9.552299976348877]
sorted(l1+l2) took..
[2.860386848449707, 2.7589840888977051, 2.7682540416717529]
Long story short, unless len(l1 + l2) ~ 1000000 use:
L = l1 + l2
L.sort()
Description of the figure and source code can be found here.
The figure was generated by the following command:
$ python make-figures.py --nsublists 2 --maxn=0x100000 -s merge_funcs.merge_26 -s merge_funcs.sort_builtin
This is simply merging. Treat each list as if it were a stack, and continuously pop the smaller of the two stack heads, adding the item to the result list, until one of the stacks is empty. Then add all remaining items to the resulting list.
res = []
while l1 and l2:
    if l1[0] < l2[0]:
        res.append(l1.pop(0))
    else:
        res.append(l2.pop(0))
res += l1
res += l2
There is a slight flaw in ghoseb's solution, making it O(n**2), rather than O(n).
The problem is that this is performing:
item = l1.pop(0)
With linked lists or deques this would be an O(1) operation, so wouldn't affect complexity, but since python lists are implemented as vectors, this copies the rest of the elements of l1 one space left, an O(n) operation. Since this is done each pass through the list, it turns an O(n) algorithm into an O(n**2) one. This can be corrected by using a method that doesn't alter the source lists, but just keeps track of the current position.
I've tried out benchmarking a corrected algorithm vs a simple sorted(l1+l2) as suggested by dbr
def merge(l1, l2):
    if not l1: return list(l2)
    if not l2: return list(l1)
    # l2 will contain last element.
    if l1[-1] > l2[-1]:
        l1, l2 = l2, l1
    it = iter(l2)
    y = next(it)
    result = []
    for x in l1:
        while y < x:
            result.append(y)
            y = next(it)
        result.append(x)
    result.append(y)
    result.extend(it)
    return result
I've tested these with lists generated with
l1 = sorted([random.random() for i in range(NITEMS)])
l2 = sorted([random.random() for i in range(NITEMS)])
For various sizes of list, I get the following timings (repeating 100 times):
# items: 1000 10000 100000 1000000
merge : 0.079 0.798 9.763 109.044
sort : 0.020 0.217 5.948 106.882
So in fact, it looks like dbr is right, just using sorted() is preferable unless you're expecting very large lists, though it does have worse algorithmic complexity. The break even point being at around a million items in each source list (2 million total).
One advantage of the merge approach though is that it is trivial to rewrite as a generator, which will use substantially less memory (no need for an intermediate list).
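A generator version might look like this (my own sketch of the rewrite mentioned above; the sentinel object avoids assuming the lists can't contain None, and heapq.merge provides essentially the same thing in the stdlib):

```python
def merge_gen(l1, l2):
    # Lazily merge two sorted iterables, O(1) extra space.
    it1, it2 = iter(l1), iter(l2)
    sentinel = object()
    x, y = next(it1, sentinel), next(it2, sentinel)
    while x is not sentinel and y is not sentinel:
        if x <= y:
            yield x
            x = next(it1, sentinel)
        else:
            yield y
            y = next(it2, sentinel)
    # one side is exhausted; drain the other
    if x is not sentinel:
        yield x
        yield from it1
    if y is not sentinel:
        yield y
        yield from it2
```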
[Edit]
I've retried this with a situation closer to the question - using a list of objects containing a field "date" which is a datetime object.
The above algorithm was changed to compare against .date instead, and the sort method was changed to:
return sorted(l1 + l2, key=operator.attrgetter('date'))
This does change things a bit. The comparison being more expensive means that the number we perform becomes more important, relative to the constant-time speed of the implementation. This means merge makes up lost ground, surpassing the sort() method at 100,000 items instead. Comparing based on an even more complex object (large strings or lists for instance) would likely shift this balance even more.
# items: 1000 10000 100000 1000000[1]
merge : 0.161 2.034 23.370 253.68
sort : 0.111 1.523 25.223 313.20
[1]: Note: I actually only did 10 repeats for 1,000,000 items and scaled up accordingly as it was pretty slow.
This is simple merging of two sorted lists. Take a look at the sample code below which merges two sorted lists of integers.
#!/usr/bin/env python
## merge.py -- Merge two sorted lists -*- Python -*-
## Time-stamp: "2009-01-21 14:02:57 ghoseb"

l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]

def merge_sorted_lists(l1, l2):
    """Merge sort two sorted lists

    Arguments:
    - `l1`: First sorted list
    - `l2`: Second sorted list
    """
    sorted_list = []
    # Copy both the args to make sure the original lists are not
    # modified
    l1 = l1[:]
    l2 = l2[:]
    while (l1 and l2):
        if (l1[0] <= l2[0]):  # Compare both heads
            item = l1.pop(0)  # Pop from the head
            sorted_list.append(item)
        else:
            item = l2.pop(0)
            sorted_list.append(item)
    # Add the remaining of the lists
    sorted_list.extend(l1 if l1 else l2)
    return sorted_list

if __name__ == '__main__':
    print(merge_sorted_lists(l1, l2))
This should work fine with datetime objects. Hope this helps.
from datetime import datetime
from itertools import chain
from operator import attrgetter

class DT:
    def __init__(self, dt):
        self.dt = dt

list1 = [DT(datetime(2008, 12, 5, 2)),
         DT(datetime(2009, 1, 1, 13)),
         DT(datetime(2009, 1, 3, 5))]

list2 = [DT(datetime(2008, 12, 31, 23)),
         DT(datetime(2009, 1, 2, 12)),
         DT(datetime(2009, 1, 4, 15))]

list3 = sorted(chain(list1, list2), key=attrgetter('dt'))

for item in list3:
    print(item.dt)
The output:
2008-12-05 02:00:00
2008-12-31 23:00:00
2009-01-01 13:00:00
2009-01-02 12:00:00
2009-01-03 05:00:00
2009-01-04 15:00:00
I bet this is faster than any of the fancy pure-Python merge algorithms, even for large data. Python 2.6's heapq.merge is a whole other story.
def merge_sort(a, b):
    pa = 0
    pb = 0
    result = []
    while pa < len(a) and pb < len(b):
        if a[pa] <= b[pb]:
            result.append(a[pa])
            pa += 1
        else:
            result.append(b[pb])
            pb += 1
    remained = a[pa:] + b[pb:]
    result.extend(remained)
    return result
Python's sort implementation "timsort" is specifically optimized for lists that contain ordered sections. Plus, it's written in C.
http://bugs.python.org/file4451/timsort.txt
http://en.wikipedia.org/wiki/Timsort
As people have mentioned, it may call the comparison function more times by some constant factor (but maybe call it more times in a shorter period in many cases!).
I would never rely on this, however. – Daniel Nadasi
I believe the Python developers are committed to keeping timsort, or at least keeping a sort that's O(n) in this case.
Generalized sorting (i.e. leaving apart radix sorts from limited value domains)
cannot be done in less than O(n log n) on a serial machine. – Barry Kelly
Right, sorting in the general case can't be faster than that. But since O() is an upper bound, timsort being O(n log n) on arbitrary input doesn't contradict its being O(n) given sorted(L1) + sorted(L2).
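One way to see this in practice is to count comparisons (a rough illustration using cmp_to_key; exact counts depend on the timsort implementation, but the gap between the two inputs is the point):

```python
import random
from functools import cmp_to_key

def comparisons_used(data):
    # Sort a copy of `data`, counting how many comparisons sorted() makes.
    count = [0]
    def cmp(x, y):
        count[0] += 1
        return (x > y) - (x < y)
    sorted(data, key=cmp_to_key(cmp))
    return count[0]

l1 = list(range(0, 2000, 2))   # 1000 sorted evens
l2 = list(range(1, 2000, 2))   # 1000 sorted odds
two_runs = l1 + l2             # sorted(L1) + sorted(L2): two natural runs

shuffled = two_runs[:]
random.seed(42)
random.shuffle(shuffled)

# timsort detects the two runs and merges them, so the concatenated
# input needs far fewer comparisons than the shuffled one (~n vs ~n log n).
print(comparisons_used(two_runs) < comparisons_used(shuffled))  # True
```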
An implementation of the merging step in Merge Sort that iterates through both lists:
def merge_lists(L1, L2):
    """
    L1, L2: sorted lists of numbers, one of them could be empty.
    returns a merged and sorted list of L1 and L2.
    """
    # When one of them is an empty list, returns the other list
    if not L1:
        return L2
    elif not L2:
        return L1
    result = []
    i = 0
    j = 0
    for k in range(len(L1) + len(L2)):
        if L1[i] <= L2[j]:
            result.append(L1[i])
            if i < len(L1) - 1:
                i += 1
            else:
                result += L2[j:]  # When the last element in L1 is reached,
                break             # append the rest of L2 to result.
        else:
            result.append(L2[j])
            if j < len(L2) - 1:
                j += 1
            else:
                result += L1[i:]  # When the last element in L2 is reached,
                break             # append the rest of L1 to result.
    return result
L1 = [1, 3, 5]
L2 = [2, 4, 6, 8]
merge_lists(L1, L2) # Should return [1, 2, 3, 4, 5, 6, 8]
merge_lists([], L1) # Should return [1, 3, 5]
I'm still learning about algorithms, please let me know if the code could be improved in any aspect, your feedback is appreciated, thanks!
Use the 'merge' step of merge sort, it runs in O(n) time.
From wikipedia (pseudo-code):
function merge(left, right)
    var list result
    while length(left) > 0 and length(right) > 0
        if first(left) ≤ first(right)
            append first(left) to result
            left = rest(left)
        else
            append first(right) to result
            right = rest(right)
    end while
    while length(left) > 0
        append left to result
    while length(right) > 0
        append right to result
    return result
A recursive implementation is below. Average performance is O(n), though note that each recursive call copies a slice of A, which adds overhead on heavily interleaved inputs.
def merge_sorted_lists(A, B, sorted_list=None):
    if sorted_list is None:
        sorted_list = []
    if not B:  # guard: without this, B[0] below would raise on an empty B
        return sorted_list + A
    slice_index = 0
    for element in A:
        if element <= B[0]:
            sorted_list.append(element)
            slice_index += 1
        else:
            # B's head is smaller: swap the roles and recurse on the rest of A
            return merge_sorted_lists(B, A[slice_index:], sorted_list)
    return sorted_list + B
or a generator with improved space complexity:
def merge_sorted_lists_as_generator(A, B):
    if not B:  # guard: B[0] below would raise on an empty B
        for element in A:
            yield element
        return
    slice_index = 0
    for element in A:
        if element <= B[0]:
            slice_index += 1
            yield element
        else:
            for sorted_element in merge_sorted_lists_as_generator(B, A[slice_index:]):
                yield sorted_element
            return
    for element in B:
        yield element
This is my solution in linear time without editing l1 and l2:
def merge(l1, l2):
    m, m2 = len(l1), len(l2)
    newList = []
    l, r = 0, 0
    while l < m and r < m2:
        if l1[l] < l2[r]:
            newList.append(l1[l])
            l += 1
        else:
            newList.append(l2[r])
            r += 1
    # at most one of these two tail slices is non-empty
    return newList + l1[l:] + l2[r:]
I'd go with the following answer.
from math import floor

def merge_sort(l):
    if len(l) < 2:
        return l
    left = merge_sort(l[:floor(len(l) / 2)])
    right = merge_sort(l[floor(len(l) / 2):])
    return merge(left, right)

def merge(a, b):
    i, j = 0, 0
    a_len, b_len = len(a), len(b)
    output_length = a_len + b_len
    out = list()
    for _ in range(output_length):
        if i < a_len and j < b_len and a[i] < b[j]:
            out.append(a[i])
            i = i + 1
        elif j < b_len:
            out.append(b[j])
            j = j + 1
    while i < a_len:
        out.append(a[i])
        i += 1
    while j < b_len:
        out.append(b[j])
        j += 1
    return out

if __name__ == '__main__':
    print(merge_sort([7, 8, 9, 4, 5, 6]))
Well, the naive approach (combine the two lists into one large list and sort it) has O(N log N) complexity. On the other hand, if you implement the merge manually (I do not know of any ready-made code in the Python libs for this, but I'm no expert), the complexity will be O(N), which is clearly faster.
The idea is described very well in the post by Barry Kelly.
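For what it's worth, there is ready-made code in the standard library for exactly this: heapq.merge performs a lazy linear merge of any number of sorted iterables.

```python
import heapq

a = [1, 3, 5, 7]
b = [2, 4, 6, 8, 9]
# heapq.merge returns an iterator; it never materializes the whole
# result unless you ask it to, e.g. by wrapping it in list()
merged = list(heapq.merge(a, b))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```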
If you want to do it in a manner more consistent with learning what goes on in the iteration, try this
def merge_arrays(a, b):
    l = []
    while len(a) > 0 and len(b) > 0:
        # note: pop(0) is O(n) on a list, so this merge is O(n^2) overall
        if a[0] < b[0]:
            l.append(a.pop(0))
        else:
            l.append(b.pop(0))
    l.extend(a + b)
    print(l)
import random

n = int(input("Enter size of table 1"))  # size of list 1
m = int(input("Enter size of table 2"))  # size of list 2
tb1 = [random.randrange(1, 101, 1) for _ in range(n)]  # filling the lists with random
tb2 = [random.randrange(1, 101, 1) for _ in range(m)]  # numbers between 1 and 100
tb1.sort()  # sort list 1
tb2.sort()  # sort list 2
fus = []  # create an empty list
print(tb1)  # print list 1
print('------------------------------------')
print(tb2)  # print list 2
print('------------------------------------')
i = 0
j = 0  # variables to walk the lists
while i < n and j < m:
    if tb1[i] < tb2[j]:
        fus.append(tb1[i])
        i += 1
    else:
        fus.append(tb2[j])
        j += 1
if i < n:
    fus += tb1[i:n]
if j < m:
    fus += tb2[j:m]
print(fus)
# this code merges two sorted lists into one sorted list (fus) without
# sorting (fus) itself
This uses the merge step of merge sort, but with generators. Time complexity is O(n).
def merge(lst1, lst2):
    len1 = len(lst1)
    len2 = len(lst2)
    i, j = 0, 0
    while i < len1 and j < len2:
        if lst1[i] < lst2[j]:
            yield lst1[i]
            i += 1
        else:
            yield lst2[j]
            j += 1
    # one of the lists is exhausted; yield the rest of the other
    if i == len1:
        while j < len2:
            yield lst2[j]
            j += 1
    elif j == len2:
        while i < len1:
            yield lst1[i]
            i += 1
l1=[1,3,5,7]
l2=[2,4,6,8,9]
mergelst=(val for val in merge(l1,l2))
print(*mergelst)
This code has O(n) time complexity and can merge lists of any data type, given a key function func that maps each element to a comparable value. It produces a new merged list and does not modify either of the lists passed as arguments.
def merge_sorted_lists(listA, listB, func):
    merged = list()
    iA = 0
    iB = 0
    while True:
        hasA = iA < len(listA)
        hasB = iB < len(listB)
        if not hasA and not hasB:
            break
        valA = None if not hasA else listA[iA]
        valB = None if not hasB else listB[iB]
        a = None if not hasA else func(valA)
        b = None if not hasB else func(valB)
        if (not hasB or a < b) and hasA:
            merged.append(valA)
            iA += 1
        elif hasB:
            merged.append(valB)
            iB += 1
    return merged
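For comparison, the standard library's heapq.merge (Python 3.5+) accepts a key function that plays the same role as func here. The (name, age) tuples below are made-up sample records, pre-sorted by age within each list:

```python
import heapq

# hypothetical records, each list already sorted by the age field
list_a = [("ann", 25), ("bob", 40)]
list_b = [("cy", 30), ("dee", 50)]
# key extracts the comparable value from each record
merged = list(heapq.merge(list_a, list_b, key=lambda rec: rec[1]))
print(merged)  # [('ann', 25), ('cy', 30), ('bob', 40), ('dee', 50)]
```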
def compareDate(obj1, obj2):
    if obj1.getDate() < obj2.getDate():
        return -1
    elif obj1.getDate() > obj2.getDate():
        return 1
    else:
        return 0

merged = list1 + list2
merged.sort(compareDate)  # Python 2 style; Python 3 needs functools.cmp_to_key
This will sort the list in place. Define your own function for comparing two objects, and pass that function into the built-in sort function.
Do NOT use bubble sort, it has horrible performance.
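Note that passing a comparison function directly to sort only works in Python 2; Python 3 removed the cmp argument, so the function must be wrapped with functools.cmp_to_key (or replaced with a key function). A sketch, using a made-up Event class standing in for objects with a getDate method:

```python
from functools import cmp_to_key

class Event:  # hypothetical stand-in for the objects in the answer above
    def __init__(self, date):
        self.date = date

    def getDate(self):
        return self.date

def compare_date(obj1, obj2):
    # three-way comparison, equivalent to the compareDate function above
    return (obj1.getDate() > obj2.getDate()) - (obj1.getDate() < obj2.getDate())

events = [Event(3), Event(1), Event(2)]
events.sort(key=cmp_to_key(compare_date))  # Python 3 equivalent of sort(cmp)
# simpler still: events.sort(key=lambda e: e.getDate())
print([e.getDate() for e in events])  # [1, 2, 3]
```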
In O(m+n) complexity:
def merge_sorted_list(nums1: list, nums2: list) -> list:
    m = len(nums1)
    n = len(nums2)
    nums1 = nums1.copy()
    nums2 = nums2.copy()
    nums1.extend([0 for i in range(n)])  # make room for nums2's elements
    # fill nums1 from the back with the larger of the two remaining tails
    while m > 0 and n > 0:
        if nums1[m - 1] >= nums2[n - 1]:
            nums1[m + n - 1] = nums1[m - 1]
            m -= 1
        else:
            nums1[m + n - 1] = nums2[n - 1]
            n -= 1
    if n > 0:
        nums1[:n] = nums2[:n]
    return nums1
l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]
print(merge_sorted_list(l1, l2))
output
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Hope this helps. Pretty simple and straightforward:
l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]
l3 = l1 + l2
l3.sort()
print (l3)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Regarding Python 2.7.12 (disclaimer: I understand Python 2 is being phased out in favor of Python 3, but the course I'm taking started us here, perhaps to help us understand older code bases):
I have a list of integers that I'd like to swap pairwise with their neighboring values. So far this works great for lists with an even number of integers; however, when the list length is odd, it's not so easy to simply swap each value, as the number of integers is uneven.
Giving the following code example, how can I swap all values other than the final value in the list?
arr = [1, 2, 3, 4, 5]
def swapListPairs(arr):
    for idx, val in enumerate(arr):
        if len(arr) % 2 == 0:
            arr[idx], arr[val] = arr[val], arr[idx]  # traditional swap using evaluation order
        else:
            arr[0], arr[1] = arr[1], arr[0]  # this line is not the solution, but I know I need some condition here to swap all list values other than len(arr)-1; I'm just not sure how
    return arr
print swapListPairs(arr)
Bonus points to the ultimate Pythonic master: how can this code be modified to also swap strings? Right now I can only use this function with integers, and I'm very curious how I can make it work for both int and str objects.
Thank you so greatly for any insight or suggestions to point me in the right direction! Everyone's help at times here has been invaluable and I thank you for reading and for your help!
Here's a shorter, probably faster way based on slice assignment:
def swap_adjacent_elements(l):
    end = len(l) - len(l) % 2
    l[:end:2], l[1:end:2] = l[1:end:2], l[:end:2]
The slice assignment selects the elements of l at all even indices (l[:end:2]) or all odd indices (l[1:end:2]) up to and excluding index end, then uses the same kind of swapping technique you're already using to swap the slices.
end = len(l) - len(l) % 2 selects the index at which to stop. We set end to the closest even number less than or equal to len(l) by subtracting len(l) % 2, the remainder when len(l) is divided by 2.
Alternatively, we could have done end = len(l) & ~1, using bitwise operations. That would construct an integer to use as a mask (~1), with a 0 in the 1 bit and 1s everywhere else, then apply the mask (with &) to set the 1 bit of len(l) to 0 to produce end.
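A quick sanity check that the two formulas for end agree:

```python
# both expressions round a length down to the nearest even number
for n in range(8):
    assert (n - n % 2) == (n & ~1)
print([n & ~1 for n in range(8)])  # [0, 0, 2, 2, 4, 4, 6, 6]
```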
This is easier to do without enumerate. Note that it never, ever makes decisions based on the contents of arr; that is what makes it work on anything, not just a pre-sorted list of integers starting from 1.
for i in range(len(arr)//2):
    a = 2*i
    b = a+1
    if b < len(arr):
        arr[a], arr[b] = arr[b], arr[a]
Exercise for you: is the if actually necessary? Why or why not?
You could iterate through the length of the list with a step of two and try to swap values (catching index errors).
def swap_list_pairs(arr):
    for index in range(0, len(arr), 2):
        try:
            arr[index], arr[index+1] = arr[index+1], arr[index]
        except IndexError:
            pass
    return arr
This will work for all data types.
As Copperfield suggested, you could get rid of the try-except-clause:
def swap_list_pairs(arr):
    for index in range(1, len(arr), 2):
        arr[index-1], arr[index] = arr[index], arr[index-1]
    return arr
Similar to #user2357112 but I prefer it this way:
arr[1::2], arr[:-1:2] = arr[:-1:2], arr[1::2]
Demo:
>>> arr = [1, 2, 3, 4, 5]
>>> arr[1::2], arr[:-1:2] = arr[:-1:2], arr[1::2]
>>> arr
[2, 1, 4, 3, 5]
>>> arr = [1, 2, 3, 4, 5, 6]
>>> arr[1::2], arr[:-1:2] = arr[:-1:2], arr[1::2]
>>> arr
[2, 1, 4, 3, 6, 5]
It looks like : and :: confuse you a little; let me explain. Sequence types in Python, such as list, tuple, and str, among others, provide what is known as a slice. It comes in two flavors:
Slicing: a[i:j] selects all items with index k such that i <= k < j. When used as an expression, a slice is a sequence of the same type. This implies that the index set is renumbered so that it starts at 0.
Extended slicing with a third “step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.
In both cases i, j and/or k can be omitted, and suitable default values are used instead.
Some examples
>>> arr = [10, 20, 30, 40, 50, 60, 70]
>>> arr[:]
[10, 20, 30, 40, 50, 60, 70]
>>> arr[:3]
[10, 20, 30]
>>> arr[1:3]
[20, 30]
>>> arr[1::2]
[20, 40, 60]
>>> arr[::2]
[10, 30, 50, 70]
>>>
How this works can also be illustrated with the following function:
def the_slice(lista, ini=None, end=None, step=None):
    result = []
    if ini is None:
        ini = 0
    if end is None:
        end = len(lista)
    if step is None:
        step = 1
    for index in range(ini, end, step):
        result.append(lista[index])
    return result
>>> the_slice(arr,step=2) # arr[::2]
[10, 30, 50, 70]
>>> the_slice(arr,ini=1,step=2) # arr[1::2]
[20, 40, 60]
>>>
I'm looking at getting values in a list with an increment.
l = [0,1,2,3,4,5,6,7]
and I want something like:
[0,4,6,7]
At the moment I am using l[0::2] but I would like sampling to be sparse at the beginning and increase towards the end of the list.
The reason I want this is because the list represents the points along a line from the center of a circle to a point on its circumference. At the moment I iterate every 10 points along the lines and draw a circle with a small radius on each. Therefore, my circles close to the center tend to overlap and I have gaps as I get close to the circle edge. I hope this provides a bit of context.
Thank you !
This can be more complicated than it sounds... You need a list of indices starting at zero and ending at the final element position in your list, presumably with no duplication (i.e. you don't want the same points twice). A generic way to do this is to define the number of points you want first, then use a generator (scaled_series) that produces the required number of indices based on a function, plus a second generator (unique_ints) to ensure we get integer indices with no duplication.
def scaled_series(length, end, func):
    """Generate a scaled series based on y = func(i), for an increasing
    function func, starting at 0, of the specified length, and ending at end.
    """
    scale = float(end) / (func(float(length)) - func(1.0))
    intercept = -scale * func(1.0)
    print('scale', scale, 'intercept', intercept)
    for i in range(1, length + 1):
        yield scale * func(float(i)) + intercept

def unique_ints(iter):
    last_n = None
    for n in iter:
        if last_n is None or round(n) != round(last_n):
            yield int(round(n))
        last_n = n

L = [0, 1, 2, 3, 4, 5, 6, 7]
print([L[i] for i in unique_ints(scaled_series(4, 7, lambda x: 1 - 1 / (2 * x)))])
In this case, the function is 1 - 1/2x, which gives the series you want [0, 4, 6, 7]. You can play with the length (4) and the function to get the kind of spacing between the circles you are looking for.
I am not sure what exact algorithm you want to use, but if it is non-constant, as your example appears to be, then you should consider creating a generator function to yield values:
https://wiki.python.org/moin/Generators
Depending on what your desire here is, you may want to consider a built in interpolator like scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html#scipy.interpolate.interp1d
Basically, given your question, you can't do it with the basic slice operator. Without more information this is the best answer I can give you :-)
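A minimal sketch of such a generator, assuming a step that simply halves toward the end of the list (the starting step of 4 is an assumption chosen to reproduce the example output):

```python
def shrinking_step_indices(length, first_step):
    """Yield indices from 0 up to length, with spacing that shrinks
    toward the end of the list (sparse near the start, dense near the end)."""
    i, step = 0, first_step
    while i < length:
        yield i
        i += step
        step = max(1, step // 2)  # halve the step, but never go below 1

l = [0, 1, 2, 3, 4, 5, 6, 7]
sampled = [l[i] for i in shrinking_step_indices(len(l), 4)]
print(sampled)  # [0, 4, 6, 7]
```

Any other shrinking-step rule (e.g. subtracting 1 each time) can be dropped in by changing the single step-update line.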
Use the slice function to create a range of indices. You can then extend your sliced list with other slices.
k = [0,1,2,3,4,5,6,7]
r = slice(0,len(k)//2,4)
t = slice(r.stop,None,1)
j = k[r]
j.extend(k[t])
print(j) #outputs: [0,4,5,6,7]
What I would do is just use list comprehension to retrieve the values. It is not possible to do it just by indexing. This is what I came up with:
l = [0, 1, 2, 3, 4, 5, 6, 7]
m = [l[0]] + [l[1 + sum(range(3, s - 1, -1))] for s in range(3, 0, -1)]
and here is a breakdown of the code into loops:
# Start the list with the first value of l (the loop does not include it)
m = [l[0]]
# Descend from 3 to 1 ([3, 2, 1])
for s in range(3, 0, -1):
    # append 1 + sum of [3], [3, 2] and [3, 2, 1]
    m.append(l[1 + sum(range(3, s - 1, -1))])
Both will give you the same answer:
>>> m
[0, 4, 6, 7]
I made this graphic that I hope will help you understand the process:
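The sum-of-ranges expression above is really a prefix sum over the step sizes 4, 2, 1; itertools.accumulate expresses the same idea directly:

```python
from itertools import accumulate

l = [0, 1, 2, 3, 4, 5, 6, 7]
steps = [4, 2, 1]                        # decreasing gaps between samples
indices = [0] + list(accumulate(steps))  # [0, 4, 6, 7]
print([l[i] for i in indices])           # [0, 4, 6, 7]
```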