Merge two lists with specific element placement, python [duplicate]

I have two lists of objects. Each list is already sorted by a property of the object that is of the datetime type. I would like to combine the two lists into one sorted list. Is the best way just to do a sort or is there a smarter way to do this in Python?

is there a smarter way to do this in Python
This hasn't been mentioned, so I'll go ahead: there is a merge function in the standard library's heapq module as of Python 2.6. If all you're looking to do is get things done, this might be the better idea. Of course, if you want to implement your own, the merge step of merge sort is the way to go.
>>> list1 = [1, 5, 8, 10, 50]
>>> list2 = [3, 4, 29, 41, 45, 49]
>>> from heapq import merge
>>> list(merge(list1, list2))
[1, 3, 4, 5, 8, 10, 29, 41, 45, 49, 50]
Here's the documentation.

People seem to be overcomplicating this. Just combine the two lists, then sort them:
>>> l1 = [1, 3, 4, 7]
>>> l2 = [0, 2, 5, 6, 8, 9]
>>> l1.extend(l2)
>>> sorted(l1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
...or shorter (and without modifying l1):
>>> sorted(l1 + l2)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
...easy! Plus, it uses only two built-in functions, so assuming the lists are of a reasonable size, it should be quicker than implementing the sorting/merging in a loop. More importantly, the above is much less code, and very readable.
If your lists are large (over a few hundred thousand items, I would guess), it may be quicker to use an alternative/custom sorting method, but there are likely other optimisations to be made first (e.g. not storing millions of datetime objects).
Using timeit.Timer().repeat() (which repeats the functions 1,000,000 times), I loosely benchmarked it against ghoseb's solution, and sorted(l1 + l2) is substantially quicker:
merge_sorted_lists took..
[9.7439379692077637, 9.8844599723815918, 9.552299976348877]
sorted(l1+l2) took..
[2.860386848449707, 2.7589840888977051, 2.7682540416717529]

Long story short, unless len(l1 + l2) is around 1,000,000, use:
L = l1 + l2
L.sort()
Description of the figure and source code can be found here.
The figure was generated by the following command:
$ python make-figures.py --nsublists 2 --maxn=0x100000 -s merge_funcs.merge_26 -s merge_funcs.sort_builtin

This is simply merging. Treat each list as if it were a stack, and continuously pop the smaller of the two stack heads, adding the item to the result list, until one of the stacks is empty. Then add all remaining items to the resulting list.
res = []
while l1 and l2:
    if l1[0] < l2[0]:
        res.append(l1.pop(0))
    else:
        res.append(l2.pop(0))
res += l1
res += l2

There is a slight flaw in ghoseb's solution, making it O(n**2), rather than O(n).
The problem is that this is performing:
item = l1.pop(0)
With linked lists or deques this would be an O(1) operation, so it wouldn't affect complexity, but since Python lists are implemented as vectors, popping the head copies all the remaining elements of l1 one space to the left, an O(n) operation. Since this is done on each pass through the list, it turns an O(n) algorithm into an O(n**2) one. This can be corrected by using a method that doesn't alter the source lists, but just keeps track of the current position.
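As a side illustration (a minimal sketch, not part of the benchmarks in this answer), the same head-popping merge written against collections.deque keeps every pop O(1):
from collections import deque

def merge_deques(l1, l2):
    # Copy into deques so popping from the left is O(1) and the
    # source lists are left untouched.
    d1, d2 = deque(l1), deque(l2)
    res = []
    while d1 and d2:
        res.append(d1.popleft() if d1[0] < d2[0] else d2.popleft())
    res.extend(d1)
    res.extend(d2)
    return res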
I've benchmarked a corrected algorithm against a simple sorted(l1 + l2) as suggested by dbr:
def merge(l1, l2):
    if not l1: return list(l2)
    if not l2: return list(l1)
    # l2 will contain last element.
    if l1[-1] > l2[-1]:
        l1, l2 = l2, l1
    it = iter(l2)
    y = it.next()
    result = []
    for x in l1:
        while y < x:
            result.append(y)
            y = it.next()
        result.append(x)
    result.append(y)
    result.extend(it)
    return result
I've tested these with lists generated with
l1 = sorted([random.random() for i in range(NITEMS)])
l2 = sorted([random.random() for i in range(NITEMS)])
For various sizes of list, I get the following timings (repeating 100 times):
# items: 1000 10000 100000 1000000
merge : 0.079 0.798 9.763 109.044
sort : 0.020 0.217 5.948 106.882
So in fact, it looks like dbr is right: just using sorted() is preferable unless you're expecting very large lists, even though it has worse algorithmic complexity. The break-even point is at around a million items in each source list (2 million total).
One advantage of the merge approach though is that it is trivial to rewrite as a generator, which will use substantially less memory (no need for an intermediate list).
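As a rough sketch of what that generator version might look like (illustrative only; it assumes the inputs don't contain None, which is used here as an end marker):
def imerge(l1, l2):
    # Lazily yield items from two sorted iterables in order.
    it1, it2 = iter(l1), iter(l2)
    x, y = next(it1, None), next(it2, None)
    while x is not None and y is not None:
        if x <= y:
            yield x
            x = next(it1, None)
        else:
            yield y
            y = next(it2, None)
    # One side is exhausted; drain whatever is left of the other.
    if x is not None:
        yield x
        for x in it1:
            yield x
    if y is not None:
        yield y
        for y in it2:
            yield y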
[Edit]
I've retried this with a situation closer to the question: using a list of objects containing a field "date", which is a datetime object.
The above algorithm was changed to compare against .date instead, and the sort method was changed to:
return sorted(l1 + l2, key=operator.attrgetter('date'))
This does change things a bit. The comparison being more expensive means that the number of comparisons we perform becomes more important relative to the constant-time speed of the implementation. This means merge makes up lost ground, surpassing the sort() method at 100,000 items instead. Comparing based on an even more complex object (large strings or lists, for instance) would likely shift this balance even more.
# items: 1000 10000 100000 1000000[1]
merge : 0.161 2.034 23.370 253.68
sort : 0.111 1.523 25.223 313.20
[1]: Note: I actually only did 10 repeats for 1,000,000 items and scaled up accordingly as it was pretty slow.

This is simple merging of two sorted lists. Take a look at the sample code below which merges two sorted lists of integers.
#!/usr/bin/env python
## merge.py -- Merge two sorted lists -*- Python -*-
## Time-stamp: "2009-01-21 14:02:57 ghoseb"

l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]

def merge_sorted_lists(l1, l2):
    """Merge sort two sorted lists

    Arguments:
    - `l1`: First sorted list
    - `l2`: Second sorted list
    """
    sorted_list = []

    # Copy both the args to make sure the original lists are not
    # modified
    l1 = l1[:]
    l2 = l2[:]

    while (l1 and l2):
        if (l1[0] <= l2[0]):  # Compare both heads
            item = l1.pop(0)  # Pop from the head
            sorted_list.append(item)
        else:
            item = l2.pop(0)
            sorted_list.append(item)

    # Add the remaining of the lists
    sorted_list.extend(l1 if l1 else l2)

    return sorted_list

if __name__ == '__main__':
    print merge_sorted_lists(l1, l2)
This should work fine with datetime objects. Hope this helps.

from datetime import datetime
from itertools import chain
from operator import attrgetter

class DT:
    def __init__(self, dt):
        self.dt = dt

list1 = [DT(datetime(2008, 12, 5, 2)),
         DT(datetime(2009, 1, 1, 13)),
         DT(datetime(2009, 1, 3, 5))]

list2 = [DT(datetime(2008, 12, 31, 23)),
         DT(datetime(2009, 1, 2, 12)),
         DT(datetime(2009, 1, 4, 15))]

list3 = sorted(chain(list1, list2), key=attrgetter('dt'))
for item in list3:
    print item.dt
The output:
2008-12-05 02:00:00
2008-12-31 23:00:00
2009-01-01 13:00:00
2009-01-02 12:00:00
2009-01-03 05:00:00
2009-01-04 15:00:00
I bet this is faster than any of the fancy pure-Python merge algorithms, even for large data. Python 2.6's heapq.merge is a whole other story.
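As an aside, on Python 3.5+ heapq.merge also accepts a key argument, so the same attribute-based merge can be done lazily without re-sorting (a sketch, reusing the list1/list2 defined above):
from heapq import merge
from operator import attrgetter

# Both inputs must already be sorted by 'dt'; merge() yields lazily,
# so wrap it in list() if an actual list is needed.
list3 = list(merge(list1, list2, key=attrgetter('dt')))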

def merge_sort(a, b):
    pa = 0
    pb = 0
    result = []
    while pa < len(a) and pb < len(b):
        if a[pa] <= b[pb]:
            result.append(a[pa])
            pa += 1
        else:
            result.append(b[pb])
            pb += 1
    remained = a[pa:] + b[pb:]
    result.extend(remained)
    return result

Python's sort implementation "timsort" is specifically optimized for lists that contain ordered sections. Plus, it's written in C.
http://bugs.python.org/file4451/timsort.txt
http://en.wikipedia.org/wiki/Timsort
As people have mentioned, it may call the comparison function more times by some constant factor than a hand-written merge (but in many cases it will make those calls in less total time!).
I would never rely on this, however. – Daniel Nadasi
I believe the Python developers are committed to keeping timsort, or at least keeping a sort that's O(n) in this case.
Generalized sorting (i.e. setting aside radix sorts on limited value domains) cannot be done in less than O(n log n) on a serial machine. – Barry Kelly
Right, sorting in the general case can't be faster than that. But since O() is an upper bound, timsort being O(n log n) on arbitrary input doesn't contradict its being O(n) given sorted(L1) + sorted(L2).
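One way to see this empirically (a small sketch, not from the original discussion) is to count comparisons: wrap the values in a class whose __lt__ increments a counter, then sort the concatenation of two sorted lists. The count comes out on the order of the number of items, nowhere near n log n.
class Counted(object):
    calls = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value

l1 = [Counted(x) for x in range(0, 1000, 2)]   # already sorted
l2 = [Counted(x) for x in range(1, 1000, 2)]   # already sorted
sorted(l1 + l2)
print(Counted.calls)   # roughly 2 * 1000 here, far below the ~10,000 of an n log n sort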

An implementation of the merging step in Merge Sort that iterates through both lists:
def merge_lists(L1, L2):
    """
    L1, L2: sorted lists of numbers, one of them could be empty.
    returns a merged and sorted list of L1 and L2.
    """
    # When one of them is an empty list, returns the other list
    if not L1:
        return L2
    elif not L2:
        return L1

    result = []
    i = 0
    j = 0

    for k in range(len(L1) + len(L2)):
        if L1[i] <= L2[j]:
            result.append(L1[i])
            if i < len(L1) - 1:
                i += 1
            else:
                result += L2[j:]  # When the last element in L1 is reached,
                break             # append the rest of L2 to result.
        else:
            result.append(L2[j])
            if j < len(L2) - 1:
                j += 1
            else:
                result += L1[i:]  # When the last element in L2 is reached,
                break             # append the rest of L1 to result.
    return result
L1 = [1, 3, 5]
L2 = [2, 4, 6, 8]
merge_lists(L1, L2) # Should return [1, 2, 3, 4, 5, 6, 8]
merge_lists([], L1) # Should return [1, 3, 5]
I'm still learning about algorithms, please let me know if the code could be improved in any aspect, your feedback is appreciated, thanks!

Use the 'merge' step of merge sort; it runs in O(n) time.
From Wikipedia (pseudo-code):
function merge(left, right)
    var list result
    while length(left) > 0 and length(right) > 0
        if first(left) ≤ first(right)
            append first(left) to result
            left = rest(left)
        else
            append first(right) to result
            right = rest(right)
    end while
    while length(left) > 0
        append left to result
    while length(right) > 0
        append right to result
    return result
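A direct Python translation of that pseudo-code (a sketch; it uses indices instead of the rest() calls so that it stays O(n)):
def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One input is exhausted; append whatever remains of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result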

A recursive implementation is below. Average performance is O(n).
def merge_sorted_lists(A, B, sorted_list=None):
    if sorted_list is None:
        sorted_list = []
    slice_index = 0
    for element in A:
        if element <= B[0]:
            sorted_list.append(element)
            slice_index += 1
        else:
            return merge_sorted_lists(B, A[slice_index:], sorted_list)
    return sorted_list + B
or generator with improved space complexity:
def merge_sorted_lists_as_generator(A, B):
    slice_index = 0
    for element in A:
        if element <= B[0]:
            slice_index += 1
            yield element
        else:
            for sorted_element in merge_sorted_lists_as_generator(B, A[slice_index:]):
                yield sorted_element
            return
    for element in B:
        yield element

This is my solution, in linear time and without modifying l1 and l2:
def merge(l1, l2):
    m, m2 = len(l1), len(l2)
    newList = []
    l, r = 0, 0
    while l < m and r < m2:
        if l1[l] < l2[r]:
            newList.append(l1[l])
            l += 1
        else:
            newList.append(l2[r])
            r += 1
    return newList + l1[l:] + l2[r:]

I'd go with the following answer.
from math import floor

def merge_sort(l):
    if len(l) < 2:
        return l
    left = merge_sort(l[:floor(len(l)/2)])
    right = merge_sort(l[floor(len(l)/2):])
    return merge(left, right)

def merge(a, b):
    i, j = 0, 0
    a_len, b_len = len(a), len(b)
    output_length = a_len + b_len
    out = list()
    for _ in range(output_length):
        if i < a_len and j < b_len and a[i] < b[j]:
            out.append(a[i])
            i = i + 1
        elif j < b_len:
            out.append(b[j])
            j = j + 1
    while (i < a_len):
        out.append(a[i])
        i += 1
    while (j < b_len):
        out.append(b[j])
        j += 1
    return out

if __name__ == '__main__':
    print(merge_sort([7, 8, 9, 4, 5, 6]))

Well, the naive approach (combine the 2 lists into one large list and sort it) will be O(N*log(N)) complexity. On the other hand, if you implement the merge manually (I do not know about any ready code in Python libs for this, but I'm no expert) the complexity will be O(N), which is clearly faster.
The idea is described very well in the post by Barry Kelly.
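(For the record, the standard library does ship a ready-made O(N) merge: heapq.merge, mentioned in the accepted answer. A minimal sketch, where list1 and list2 stand for the question's two already-sorted lists:)
from heapq import merge

combined = list(merge(list1, list2))  # O(N), assumes both inputs are already sorted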

If you want to do it in a manner more consistent with learning what goes on in the iteration, try this:
def merge_arrays(a, b):
    l = []
    while len(a) > 0 and len(b) > 0:
        if a[0] < b[0]:
            l.append(a.pop(0))
        else:
            l.append(b.pop(0))
    l.extend(a + b)
    print(l)

import random

n = int(input("Enter size of table 1"))  # size of list 1
m = int(input("Enter size of table 2"))  # size of list 2
tb1 = [random.randrange(1, 101, 1) for _ in range(n)]  # filling the lists with random
tb2 = [random.randrange(1, 101, 1) for _ in range(m)]  # numbers between 1 and 100
tb1.sort()   # sort list 1
tb2.sort()   # sort list 2
fus = []     # create an empty list
print(tb1)   # print list 1
print('------------------------------------')
print(tb2)   # print list 2
print('------------------------------------')
i = 0; j = 0  # variables to cross the lists
while (i < n and j < m):
    if (tb1[i] < tb2[j]):
        fus.append(tb1[i])
        i += 1
    else:
        fus.append(tb2[j])
        j += 1
if (i < n):
    fus += tb1[i:n]
if (j < m):
    fus += tb2[j:m]
print(fus)
# this code merges two sorted lists into one sorted list (fus) without
# sorting fus again

I have used the merge step of merge sort, but with generators. Time complexity is O(n).
def merge(lst1, lst2):
    len1 = len(lst1)
    len2 = len(lst2)
    i, j = 0, 0
    while (i < len1 and j < len2):
        if (lst1[i] < lst2[j]):
            yield lst1[i]
            i += 1
        else:
            yield lst2[j]
            j += 1
    if (i == len1):
        while (j < len2):
            yield lst2[j]
            j += 1
    elif (j == len2):
        while (i < len1):
            yield lst1[i]
            i += 1

l1 = [1, 3, 5, 7]
l2 = [2, 4, 6, 8, 9]
mergelst = (val for val in merge(l1, l2))
print(*mergelst)

This code has O(n) time complexity and can merge lists of any data type, given a key function as the parameter func. It produces a new merged list and does not modify either of the lists passed as arguments.
def merge_sorted_lists(listA, listB, func):
    merged = list()
    iA = 0
    iB = 0
    while True:
        hasA = iA < len(listA)
        hasB = iB < len(listB)
        if not hasA and not hasB:
            break
        valA = None if not hasA else listA[iA]
        valB = None if not hasB else listB[iB]
        a = None if not hasA else func(valA)
        b = None if not hasB else func(valB)
        if (not hasB or a < b) and hasA:
            merged.append(valA)
            iA += 1
        elif hasB:
            merged.append(valB)
            iB += 1
    return merged
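For example, to merge the question's two object lists by their datetime attribute (a hedged sketch: list1, list2, and the attribute name date are placeholders for whatever the objects actually expose):
# 'date' is a hypothetical attribute name; substitute your own.
merged = merge_sorted_lists(list1, list2, lambda obj: obj.date)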

def compareDate(obj1, obj2):
    if obj1.getDate() < obj2.getDate():
        return -1
    elif obj1.getDate() > obj2.getDate():
        return 1
    else:
        return 0

list = list1 + list2
list.sort(compareDate)
This will sort the list in place. Define your own function for comparing two objects, and pass that function into the built-in sort function.
Do NOT use bubble sort; it has horrible performance.
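Note that passing a comparison function straight to sort() only works on Python 2; Python 3 dropped the cmp argument. There you would wrap the comparator with functools.cmp_to_key, or better, supply a key function. A sketch:
import functools

combined = list1 + list2
combined.sort(key=functools.cmp_to_key(compareDate))
# or simply, assuming getDate() returns something orderable:
# combined.sort(key=lambda obj: obj.getDate())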

In O(m+n) complexity:
def merge_sorted_list(nums1: list, nums2: list) -> list:
    m = len(nums1)
    n = len(nums2)
    nums1 = nums1.copy()
    nums2 = nums2.copy()
    nums1.extend([0 for i in range(n)])
    while m > 0 and n > 0:
        if nums1[m-1] >= nums2[n-1]:
            nums1[m+n-1] = nums1[m-1]
            m -= 1
        else:
            nums1[m+n-1] = nums2[n-1]
            n -= 1
    if n > 0:
        nums1[:n] = nums2[:n]
    return nums1

l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]
print(merge_sorted_list(l1, l2))
output
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Hope this helps. Pretty simple and straightforward:
l1 = [1, 3, 4, 7]
l2 = [0, 2, 5, 6, 8, 9]
l3 = l1 + l2
l3.sort()
print (l3)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Related

Optimizing or tweaking the following implementation for Merge Sort 3 way

I have recently been playing with sorting algorithms, and upon touching the Merge Sort algorithm, I wanted to try to implement the algorithm's merge helper function using 3 sorted lists, as opposed to 2. My current implementation works, but I was wondering if there is perhaps some way to tweak it or implement it differently to make it run faster.
Here is the code:
def merge_three(l1, l2, l3):
    """This function returns a sorted list made out of the three
    given lists.
    >>> merge_three([9, 29], [1, 7, 15], [8, 17, 21])
    [1, 7, 8, 9, 15, 17, 21, 29]
    """
    index1, index2, index3 = 0, 0, 0
    to_loop = len(l1) + len(l2) + len(l3)
    sorted_list = []
    i = 0
    while i < to_loop:
        advance = 0
        value = float("inf")
        if index1 < len(l1) and l1[index1] <= value:
            advance = 1
            value = l1[index1]
        if index2 < len(l2) and l2[index2] <= value:
            advance = 2
            value = l2[index2]
        if index3 < len(l3) and l3[index3] <= value:
            advance = 3
            value = l3[index3]
        sorted_list.append(value)
        if advance == 1:
            index1 += 1
        elif advance == 2:
            index2 += 1
        else:
            index3 += 1
        i += 1
    return sorted_list
Thank you :)
Thinking about a more general merge function leads to a simpler design. Suppose you wanted to write a function that takes a list of sorted lists and merges all the lists. The idea is simple: find the list with the smallest element, pop it off, move it to the result list, remove a list from the list of lists when it is empty, and iterate until the list of lists itself is empty.
One approach is:
def merge(lists):
    result = []
    while len(lists):
        (index, value) = min(enumerate(i[0] for i in lists), key=lambda x: x[1])
        result.append(lists[index].pop(0))
        if len(lists[index]) == 0:
            lists.pop(index)
    return result
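For comparison, the standard library's heapq.merge already handles this k-way case, keeping a small heap of the current heads (a sketch):
from heapq import merge

def merge_k(lists):
    # Each input list must already be sorted; overall cost is
    # O(total items * log k) for k lists.
    return list(merge(*lists))

print(merge_k([[9, 29], [1, 7, 15], [8, 17, 21]]))
# [1, 7, 8, 9, 15, 17, 21, 29]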

merge two sorted list of items in python

My question has to do with figuring out different ways to merge 2 sorted lists.
I am trying to find an easy way to merge 2 sorted lists.
def merge(arr1, arr2):
    return sorted(arr1 + arr2)

# Example: merge([1, 4, 7], [2, 3, 6, 9]) => [1, 2, 3, 4, 6, 7, 9]
I'm not sure if I'm overcomplicating it. This uses a built-in function, meaning it's harder to mess up the implementation details.
I also found that I can use the merge() function from CPython's heapq.
I'm wondering if there's any thought on using other methods, like the following:
Gist for Merge in python
It's up to you which implementation you use. The consideration is whether you need clean code or you need performance.
For clean code you can use:
sorted(l1 + l2)
merge from heapq
sorted(l1 + l2) is O(n log n) in the general case, while heapq's merge (and the implementation at https://gist.github.com/Jeffchiucp/9dc2a5108429c4222fe4b2a25e35c778) is an O(n) merge of already-sorted inputs.
Use merge from heapq:
>>> l1 = [1, 4, 7]
>>> l2 = [2, 3, 6, 9]
>>> from heapq import merge
>>> list(merge(l1,l2))
[1, 2, 3, 4, 6, 7, 9]
You don't want to concatenate two sorted lists and sort them again; it's not needed. There are O(n) algorithms for merging sorted lists, and your algorithm would be O(n log n) because of sorting again. Merging sorted lists using a priority queue (taken from here):
from Queue import PriorityQueue  # Python 2; on Python 3: from queue import PriorityQueue

class Solution(object):
    def mergeKLists(self, lists):
        # Note: this merges k sorted linked lists (each element is a
        # ListNode with .val and .next), not plain Python lists.
        dummy = ListNode(None)
        curr = dummy
        q = PriorityQueue()
        for node in lists:
            if node: q.put((node.val, node))
        while q.qsize() > 0:
            curr.next = q.get()[1]
            curr = curr.next
            if curr.next: q.put((curr.next.val, curr.next))
        return dummy.next
Here is my solution:
def merge_sorted_list(list1, list2):
    i = j = 0
    while (i < len(list1) and j < len(list2)):
        if (list1[i] > list2[j]):
            list1.insert(i, list2[j])
            i += 1
            j += 1
        else:
            i += 1
    if (j < len(list2)):
        list1.extend(list2[j:])
    return list1

print(merge_sorted_list([1, 4, 7], [2, 3, 6, 9]))
def cleanMerge(self, list1, list2):
    # Merge the shorter list into the longer one in place.
    if len(list1) != len(list2):
        less = list1 if len(list1) < len(list2) else list2
        chosenList = list2 if len(list1) < len(list2) else list1
    else:
        chosenList = list1
        less = list2
    l1, l2, r1, r2 = 0, 0, len(chosenList), len(less)
    while l2 < r2 and l1 < r1:
        if chosenList[l1] > less[l2]:
            chosenList[l1+1:r1+1] = chosenList[l1:r1]
            chosenList[l1] = less[l2]
            l2 += 1
            l1 += 1
            r1 += 1
        else:
            l1 += 1
            continue
    if l2 < r2:
        for item in less[l2:r2]:
            chosenList.append(item)
    return chosenList

How can we riffle shuffle the elements of a list in Python?

I want to shuffle the elements of a list without importing any module.
The type of shuffle is a riffle shuffle: you divide the list into two halves and then interleave them.
If there is an odd number of elements, then the second half should contain the extra element.
eg:
list = [1,2,3,4,5,6,7]
Then the final list should look like
[1,4,2,5,3,6,7]
Just for fun, a recursive solution:
def interleave(lst1, lst2):
    if not lst1:
        return lst2
    elif not lst2:
        return lst1
    return lst1[0:1] + interleave(lst2, lst1[1:])
Use it as follows in Python 2.x (In Python 3.x, use // instead of /):
lst = [1,2,3,4,5,6,7]
interleave(lst[:len(lst)/2], lst[len(lst)/2:])
=> [1, 4, 2, 5, 3, 6, 7]
The above will work fine with lists of any length, it doesn't matter if the length is even or odd.
listA = [1,2,3,4,5,6,7,8,9]
listLen = len(listA)/2
listB = listA[:listLen]
listC = listA[listLen:]
listD = []
num = 0
while num < listLen:
    if len(listB) >= num:
        listD.append(listB[num])
        listD.append(listC[num])
    num += 1
if len(listA) % 2 != 0:
    listD.append(listC[num])
print listD
After looking at another answer, I am also adding a recursive version, which is a revised version of that answer but easier to call: you only have to call the function with a single argument (the list you want shuffled) and it does everything else:
def interleave(lst):
    def interleaveHelper(lst1, lst2):
        if not lst1:
            return lst2
        elif not lst2:
            return lst1
        return lst1[0:1] + interleaveHelper(lst2, lst1[1:])
    return interleaveHelper(lst[:len(lst)/2], lst[len(lst)/2:])
When you go to call it, you can say interleave(list)
eg: list = [1,2,3,4,5,6,7]
then the final list should look like [1,4,2,5,3,6,7]
Here's a function that should do this reliably:
def riffle(deck):
    '''
    Shuffle a list like a deck of cards, i.e. given a list, split it in two
    (the second half gets the extra item if the length is odd) and then
    interleave: the second deck's first item goes after the first deck's
    first item, and so on. Thus:
    riffle([1,2,3,4,5,6,7])
    returns [1, 4, 2, 5, 3, 6, 7]
    '''
    cut = len(deck) // 2  # floor division
    deck, second_deck = deck[:cut], deck[cut:]
    for index, item in enumerate(second_deck):
        insert_index = index*2 + 1
        deck.insert(insert_index, item)
    return deck
and to unit-test it...
import unittest

class RiffleTestCase(unittest.TestCase):
    def test_riffle(self):
        self.assertEqual(riffle(['a','b','c','d','e']), ['a','c','b','d','e'])
        self.assertEqual(riffle([1,2,3,4,5,6,7]), [1,4,2,5,3,6,7])

unittest.main()
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
You could do this fairly easily by utilizing the next feature of an iterator in Python.
The first thing you'll want to do is split the elements into two parts.
Next, turn those two parts into iterators using Python's iter function. You could skip this step, but I find calling next(iterable) to be a lot cleaner than manually indexing a list.
Finally, you'll loop through the first half of your list, and for each element you add from that half, add the corresponding element from the latter half (calling next gives the next item in the sequence).
For example:
elements = [1,2,3,4,5,6,7]
half_point = len(elements)/2

a = iter(elements[0:half_point])
b = iter(elements[half_point:])

result = []
for i in range(half_point):
    result.append(next(a))
    result.append(next(b))

if len(elements) % 2 != 0:
    result.append(next(b))

print result
>>> [1, 4, 2, 5, 3, 6, 7]
The last bit at the bottom checks to see if the list is odd. If it is, it appends the final element onto the end of the list.
If you get creative, you could probably condense this down a good bit by zipping and then unpacking, but I'll leave that for when you explore itertools ;)
You can split the input list into two parts, then use zip and some list manipulation to interleave the items.
n = 9
l = range(1, n+1)
a = l[:n/2]
b = l[n/2:]
c = zip(a, b)
d = list()
for p in c:
    d.extend(list(p))
if n % 2 == 1:
    d.append(b[n/2])
print(d)
def riffle(deck):
    new_deck = []
    deck_1 = deck[:len(deck)//2]
    deck_2 = deck[len(deck)//2::]
    for i in range(len(deck)//2):
        new_deck.append(deck_1[i])
        new_deck.append(deck_2[i])
    if len(deck) % 2 == 1:
        new_deck.append(deck[-1])
    return new_deck

deck = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(riffle(deck))
You can use the expression i % 2, which evaluates to 0 for even numbers and 1 for odd numbers, to alternately access the beginning and the middle of the deck.
from typing import List

def riffle(deck: List[int]) -> List[int]:
    result = []
    mid = len(deck) // 2
    for i in range(2 * mid):
        result.append(deck[i // 2 + (i % 2) * mid])
    if len(deck) % 2:            # odd length: the extra element stays last
        result.append(deck[-1])
    return result
Or you can utilize an if expression for the odd and even numbers respectively:
def riffle(deck: List[int]) -> List[int]:
    result = []
    mid = len(deck) // 2
    for i in range(2 * mid):
        if not i % 2:
            result.append(deck[i // 2])
        else:
            result.append(deck[i // 2 + mid])
    if len(deck) % 2:            # odd length: the extra element stays last
        result.append(deck[-1])
    return result
>>> ll = list(range(1,8))
>>> mid = len(ll)/2 # for Python3, use '//' operator
>>> it1 = iter(ll[:mid])
>>> it2 = iter(ll[mid:])
>>> riff = sum(zip(it1,it2), ()) + tuple(it2)
>>> riff
(1, 4, 2, 5, 3, 6, 7)
If this is homework, be prepared to explain how sum and zip work here, what the second parameter to sum is for, why tuple(it2) is added at the end, and how this solution has an inherent inefficiency.
If deck is a list, write a function that does list comprehension to perform the shuffling:
def riffle_shuffle(deck):
    # Interleave the two halves; for an odd-length deck, tack the leftover
    # element of the second half onto the end.
    mid = len(deck) // 2
    return [deck[i//2 + (i % 2) * mid] for i in range(2 * mid)] + deck[2*mid:]

How to modify python collections by filtering in-place?

I was wondering if there is a way in Python to modify collections without creating new ones. E.g.:
lst = [1, 2, 3, 4, 5, 6]
new_lst = [i for i in lst if i > 3]
Works just fine, but a new collection is created. Is there a reason that Python collections lack a filter() method (or similar) that would modify the collection object in place?
If you want to do this in place, just use
lst[:] = [i for i in lst if i > 3]
This won't be faster or save any memory, but it changes the object in place, if this is the semantics you need.
The other answers are correct; if you want all the names pointing to the old list to point to the new list you can use slice assignment.
However, that's not truly in-place creation; the new list is first created elsewhere. The link in Sven's answer is good.
The reason there isn't one that truly operates in-place is that while making a new list like that is O(n), each truly in-place item removal would be O(k) by itself, where k is the length of the list from the removal point on. The only way to avoid that with Python lists is to use some temporary storage, which is what you're doing by using slice assignment.
An example of an in-place O(n) filter on a collections.deque, in case you don't need to store your data in a list:
from collections import deque

def dequefilter(deck, condition):
    for _ in xrange(len(deck)):
        item = deck.popleft()
        if condition(item):
            deck.append(item)

deck = deque((1, 2, 3, 4, 5))
dequefilter(deck, lambda x: x > 2)  # or operator.gt(2)
print deck
# deque([3, 4, 5])
Correcting @larsmans' original solution, you could either do
i = 0
while i < len(lst):
    if lst[i] <= 3:
        del lst[i]
    else:
        i += 1
or
i = len(lst)
while i > 0:
    if lst[i-1] <= 3:
        del lst[i-1]
    i -= 1
The reason is the "index shift" which happens with the del. If I del at a certain index, that index needs to be re-examined because it now holds a different value.
Maybe I'm slightly late, but since no other "O(n) time/O(1) memory" solutions have been posted, and some people even claimed that it is impossible, I think I should post this.
# Retains the elements of xs for which p returned true
def retain(xs, p):
    w = 0
    for x in xs:
        if p(x):
            xs[w] = x
            w += 1
    del xs[w:]
The lst[:] solution by @Sven Marnach is one option. You can also perform this operation in place, using constant extra memory, with
>>> i = 0
>>> while i < len(lst):
...     if lst[i] <= 3:
...         del lst[i]
...     else:
...         i += 1
...
>>> lst
[4, 5, 6]
... but this solution is not very readable and takes quadratic time due to all the element shifting involved.
Because it's not needed.
lst[:] = [i for i in lst if i > 3]
I think this is an in-place transformation:
lst = [1,2,3,4,5,6,7,8,9,10,11]
to_exclude = [8,4,11,9]
print 'lst == %s\nto_exclude == %s' % (lst, to_exclude)

for i in xrange(len(lst)-1, -1, -1):
    if lst[i] in to_exclude:
        lst.pop(i)

print '\nlst ==', lst
result
lst == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
to_exclude == [8, 4, 11, 9]
lst == [1, 2, 3, 5, 6, 7, 10]
