I'm working through Google's Python class exercises. One of the exercises is this:
Given two lists sorted in increasing order, create and return a merged list of all the elements in sorted order. You may modify the passed in lists. Ideally, the solution should work in "linear" time, making a single pass of both lists.
The solution I came up with was:
def linear_merge(list1, list2):
    list1.extend(list2)
    return sorted(list1)
It passed the test function, but the solution given is this:
def linear_merge(list1, list2):
    result = []
    # Look at the two lists so long as both are non-empty.
    # Take whichever element [0] is smaller.
    while len(list1) and len(list2):
        if list1[0] < list2[0]:
            result.append(list1.pop(0))
        else:
            result.append(list2.pop(0))
    # Now tack on what's left
    result.extend(list1)
    result.extend(list2)
    return result
Included as part of the solution was this:
Note: the solution above is kind of cute, but unfortunately list.pop(0) is
not constant time with the standard python list implementation, so the
above is not strictly linear time. An alternate approach uses pop(-1) to
remove the endmost elements from each list, building a solution list which
is backwards. Then use reversed() to put the result back in the correct
order. That solution works in linear time, but is more ugly.
Why are these two solutions so different? Am I missing something, or are they being unnecessarily complicated?
They're encouraging you to think about the actual method (algorithm) of merging two sorted lists. Suppose you had two stacks of paper with names on them, each in alphabetical order, and you wanted to make one sorted stack from them. You wouldn't just lump them together and then sort that from scratch; that would be too much work. You'd make use of the fact that each pile is already sorted, so you can just take the one that comes first off of one pile or the other, and put them into a new stack.
As you noted, your solution works perfectly. So why the complexity? Well, for a start
Ideally, the solution should work in "linear" time, making a single
pass of both lists.
Well, you're not explicitly passing through any lists, but you are calling sorted(). So how many times will sorted() pass over the lists?
Well, I don't actually know. Normally a sorting algorithm runs in something like O(n*log(n)) time, but look at this quote from the Python docs:
The Timsort algorithm used in Python does multiple sorts efficiently
because it can take advantage of any ordering already present in a
dataset.
Maybe someone who knows Timsort better can figure it out.
But what they're doing in the solution is using the fact that they know they have two sorted lists. So rather than starting from "scratch" with sorted, they're picking off elements one by one.
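For reference, here is one way that "picking off elements one by one" idea could be written with index pointers instead of pop(0). This is just an illustrative sketch, not the course's official solution; it keeps the single pass but avoids the O(n) cost of popping from the front of a list:
import bisect  # not needed here; standard library only

def linear_merge_indexed(list1, list2):
    result = []
    i = j = 0
    # Walk both lists once, always taking the smaller front element.
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1
    # One list is exhausted; tack on whatever remains of the other.
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result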
I like the @Abhijit approach the most. Here is a slightly more Pythonic/readable version of his code snippet:
def linear_merge(list1, list2):
    result = []
    while list1 and list2:
        result.append((list1 if list1[-1] > list2[-1] else list2).pop(-1))
    return (result + list1 + list2)[-1::-1]
With the help of built-in Python features, we:
don't need to explicitly check if the lists are empty with the len function;
can extend with empty lists and the result will remain unchanged, so there is no need for explicit checking;
can combine multiple statements (where readability allows), which sometimes makes the code more compact.
def linear_merge(list1, list2):
    result = []
    while list1 and list2:
        result.append((list1 if list1[-1] > list2[-1] else list2).pop(-1))
    if len(list1):
        result += list1[-1::-1]
    if len(list2):
        result += list2[-1::-1]
    return result[-1::-1]
The solutions by @Abhijit and @intel do not work in all cases, because they have not reversed the leftover parts of the original lists. If we have list1 = [1, 2, 3, 5, 9, 11, 13, 17] and list2 = [6, 7, 12, 15], then their solution would give [5, 3, 2, 1, 6, 7, 9, 11, 12, 13, 15, 17] where we would want [1, 2, 3, 5, 6, 7, 9, 11, 12, 13, 15, 17].
Your solution is O(n log n), which means that if your lists were 10 times as long, the program would take somewhat more than 10 times as much time. Their solution would take only about 10 times as long.
Pop off the end of the lists until one is empty. I think this is linear, and the reverses are linear too. Ugly, but a solution.
def linear_merge(list1, list2):
    # NOT return sorted(list1 + list2), as this is not linear
    list3 = []
    rem = []
    empty = False
    while not empty:
        # Get last items from each list, if they exist
        if len(list1) > 0:
            a = list1[-1]
        else:
            rem = list2[:]
            empty = True
        if len(list2) > 0:
            b = list2[-1]
        else:
            rem = list1[:]
            empty = True
        # Pop the one that's largest onto the new list
        if not empty:
            if a > b:
                list3.append(a)
                list1.pop()
            else:
                list3.append(b)
                list2.pop()
    # add the (reversed) remainder to the list
    rem.reverse()
    list3 += rem
    # reverse the entire list
    list3.reverse()
    return list3
A slightly refined but still ugly solution (in Python 3.5):
def linear_merge(list1: list, list2: list):
    result = []
    while len(list1) and len(list2):
        result.append((list1 if list1[-1] > list2[-1] else list2).pop(-1))
    result += list1 if len(list1) else list2
    return result[-1::-1]
def linear_merge(list1, list2):
    a = list1 + list2
    a.sort()
    return a
My goal is to write a program that removes duplicated elements from a list, e.g. [2,3,4,5,4,6,5] → [2,3,4,5,6], without using the set function (only if and for).
I got stuck at the end of my code. I have tried changing everything in the if statement, but it got me nowhere; the same error keeps repeating:
n = int(input('enter the number of elements in your list '))
mylist = []
for i in range(n):
    for j in range(n):
        ele = input(' ')
        if mylist[i] != mylist[j]:  # here is the error exactly; I don't understand what the out-of-range problem has to do with this if statement
            mylist.append(ele)
print(mylist)
However, I changed nearly everything and I still got the following error:
    if mylist[i]!=mylist[j]:
IndexError: list index out of range
Why does this error keep coming back? PS: I can't use the set function because I am required to use only if and for.
You can do this all with a list comprehension that checks if the value has previously appeared in the list slice that you have already looped over.
mylist = [1,2,3,3,4,5,6,4,1,2,4,3]
outlist = [x for index, x in enumerate(mylist) if x not in mylist[:index]]
print(outlist)
Output
[1, 2, 3, 4, 5, 6]
In case you're not familiar with list comprehensions yet, the above comprehension is functionally equivalent to:
outlist = []
for index, x in enumerate(mylist):
    if x not in mylist[:index]:
        outlist.append(x)
print(outlist)
A set would be more efficient, but
>>> def dedupe(L):
...     seen = []
...     return [seen.append(e) or e for e in L if e not in seen]
...
>>> dedupe([2,3,4,5,4,6,5])
[2, 3, 4, 5, 6]
This is not just a toy solution. Because sets can only contain hashable types, you might really have reason to resort to this kind of thing. This approach entails a linear search of the seen items, which is fine for small cases, but is slow compared to hashing.
It still may be possible to do better than this in some cases. Even if you can't hash them, if you can at least sort the seen elements, then you can speed up the search to logarithmic time by using a binary search. (See bisect).
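To illustrate, here is a minimal sketch of that bisect-based variant; dedupe_sortable is just a made-up name for this example, and it assumes the elements can be ordered even if they can't be hashed:
from bisect import bisect_left, insort

def dedupe_sortable(items):
    seen = []   # kept sorted at all times
    out = []
    for e in items:
        i = bisect_left(seen, e)            # O(log n) search in the sorted "seen" list
        if i == len(seen) or seen[i] != e:  # e has not been seen yet
            insort(seen, e)                 # the insertion itself still shifts elements
            out.append(e)
    return out

print(dedupe_sortable([2, 3, 4, 5, 4, 6, 5]))  # [2, 3, 4, 5, 6]
print(dedupe_sortable([[2], [3], [2], [1]]))   # works for unhashable but orderable elements too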
This is just a small part of my homework assignment. What I'm trying to do is iterate through list1, iterate backwards through list2, and determine whether list2 is a reversed version of list1. Both lists are of equal length.
Example: list1 = [1,2,3,4] and list2 = [4,3,2,1]; list2 is a reversed version of list1. You could also have list1 = [1,2,1] and list2 = [1,2,1]; then they would be the same list and also reversed lists.
I'm not asking for exact code; I'm just not sure how I would code this. Would I run two loops? Any tips are appreciated. Just looking for a basic structure/algorithm.
Edit: we are not allowed to use any auxiliary lists, etc.
You can just iterate backwards over the second list while keeping a counter from the start of the first list. If a pair of items doesn't match, you can stop right away; otherwise keep going.
Here's what it can look like:
def is_reversed(l1, l2):
    first = 0
    for i in range(len(l2)-1, -1, -1):
        if l2[i] != l1[first]:
            return False
        first += 1
    return True
Which outputs:
>>> is_reversed([1,2,3,4], [4,3,2,1])
True
>>> is_reversed([1,2,3,4], [4,3,2,2])
False
Although it would be easier to just use built-in functions to do this, as shown in the comments.
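For reference, the built-in route might look something like this (a sketch; note the slicing form builds a reversed copy of l2, so the reversed()/zip form is closer to the "no auxiliary lists" requirement):
# Quick built-in way (builds a reversed copy of l2):
def is_reversed_builtin(l1, l2):
    return l1 == l2[::-1]

# Without building any auxiliary list: reversed() returns an iterator.
def is_reversed_no_copy(l1, l2):
    return len(l1) == len(l2) and all(a == b for a, b in zip(l1, reversed(l2)))

print(is_reversed_builtin([1, 2, 3, 4], [4, 3, 2, 1]))  # True
print(is_reversed_no_copy([1, 2, 3, 4], [4, 3, 2, 2]))  # False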
The idea here is that whenever you have an element list1[i], you want to compare it to the element list2[-i-1]. As required, the following code creates no auxiliary list in the process.
list1 = [1, 2, 3]
list2 = [3, 2, 1]

are_reversed = True
for i in range(len(list1)):
    if list1[i] != list2[-i - 1]:
        are_reversed = False
        break
I want to point out that the built-in range does not create a new list in Python 3, but a range object instead. If you really want to stay away from those as well, you can modify the code to use a while loop, as sketched below.
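A minimal sketch of that while-loop variant (same comparison logic as above, no range object):
i = 0
are_reversed = True
while i < len(list1):
    if list1[i] != list2[-i - 1]:
        are_reversed = False
        break
    i += 1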
You can also make this more compact by taking advantage of the built-in function all. The following line creates a generator, so this solution does not create an auxiliary list either.
are_reversed = all(list1[i] == list2[-i - 1] for i in range(len(list2))) # True
If you want to compare the N-th value of each list, you can do a for loop:
if len(list1) <= len(list2):
    for x in range(0, len(list1)):
        if list1[x] == list2[x]:
            # Do something
else:
    for x in range(0, len(list2)):
        if list1[x] == list2[x]:
            # Do something
If you want to check each value of one list against every value of another list, you can nest for loops:
for i in list1:
    for j in list2:
        if i == j:
            # Do something
EDIT: Changed code to Python
Given a list of integers, I want to check a second list and remove from the first only those which can not be made from the sum of two numbers from the second. So given a = [3,19,20] and b = [1,2,17], I'd want [3,19].
Seems like a cinch with two nested loops, except that I've gotten stuck with break and continue statements.
Here's what I have:
def myFunction(list_a, list_b):
    for i in list_a:
        for a in list_b:
            for b in list_b:
                if a + b == i:
                    break
            else:
                continue
            break
        else:
            continue
        list_a.remove(i)
    return list_a
I know what I need to do, just the syntax seems unnecessarily confusing. Can someone show me an easier way? TIA!
You can do it like this:
In [13]: from itertools import combinations
In [15]: [item for item in a if item in [sum(i) for i in combinations(b,2)]]
Out[15]: [3, 19]
combinations gives all possible two-element combinations of b, from which we build the list of sums, and then we just check whether each value of a is present in that list.
Edit
If you don't want to use itertools, you can write a function for it yourself, like this:
def comb(s):
    for i, v1 in enumerate(s):
        for j in range(i+1, len(s)):
            yield [v1, s[j]]
result = [item for item in a if item in [sum(i) for i in comb(b)]]
Comments on code:
It's very dangerous to delete elements from a list while iterating over it. Perhaps you could append items you want to keep to a new list, and return that.
Your current algorithm is O(nm^2), where n is the size of list_a, and m is the size of list_b. This is pretty inefficient, but a good start to the problem.
There are also a lot of unnecessary continue and break statements, which can lead to complicated code that is hard to debug.
You also put everything into one function. Try splitting each task into different functions, such as dedicating one function to finding pairs and another to checking each item in list_a against list_b. This is a way of splitting a problem into smaller problems and using them to solve the bigger one.
Overall I think your function is doing too much, and the logic could be condensed into much simpler code by breaking down the problem.
Another approach:
Since I found this task interesting, I decided to try it myself. My outlined approach is illustrated below.
1. You can first check if a list has a pair of a given sum in O(n) time using hashing:
def check_pairs(lst, sums):
    lookup = set()
    for x in lst:
        current = sums - x
        if current in lookup:
            return True
        lookup.add(x)
    return False
2. Then you could use this function to check whether any pair in list_b sums to each number iterated from list_a:
def remove_first_sum(list_a, list_b):
    new_list_a = []
    for x in list_a:
        check = check_pairs(list_b, x)
        if check:
            new_list_a.append(x)
    return new_list_a
Which keeps the numbers in list_a that can be formed as a sum of two numbers in list_b.
3. The above can also be written with a list comprehension:
def remove_first_sum(list_a, list_b):
    return [x for x in list_a if check_pairs(list_b, x)]
Both of which work as follows:
>>> remove_first_sum([3,19,20], [1,2,17])
[3, 19]
>>> remove_first_sum([3,19,20,18], [1,2,17])
[3, 19, 18]
>>> remove_first_sum([1,2,5,6],[2,3,4])
[5, 6]
Note: the check for each element of list_a runs in time linear in the size of list_b, so the whole thing is roughly O(n*m) without anything too complicated. It also uses O(m) extra auxiliary space, because a set is kept to record which items have been seen.
You can do it by first creating all possible sum combinations, then filtering out elements which don't belong to that combination list
Define the input lists
>>> a = [3,19,20]
>>> b = [1,2,17]
Next we will compute all possible sums of two elements:
>>> y = [i+j for k,j in enumerate(b) for i in b[k+1:]]
Next we apply a function to every element of list a and check whether it is present in the list calculated above. The map function can be used with an if/else clause; map will yield None wherever the else branch is taken. To cater for this, we can filter the list to remove the None values.
>>> list(filter(None, map(lambda x: x if x in y else None,a)))
The above operation will output:
[3, 19]
You can also write a one-liner by combining all these lines into one, but I don't recommend this.
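For illustration only (and, as said, not recommended), the combined one-liner might look like this:
# Rebuilds the sum list for every element of a, which is another reason not to do this.
result = list(filter(None, map(lambda x: x if x in [i + j for k, j in enumerate(b) for i in b[k + 1:]] else None, a)))
print(result)  # [3, 19]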
You can try something like this:
a = [3,19,20]
b = [1,2,17,5]

n_m_s = []
data = [n_m_s.append(i+j) for i in b for j in b if i+j in a]
print(set(n_m_s))

print("after remove")
final_data = []
for j, i in enumerate(a):
    if i not in n_m_s:
        final_data.append(i)
print(final_data)
output:
{19, 3}
after remove
[20]
I want to transfer every element from one list to another with ascending order. This is my code:
l = [10,1,2,3,4,5,6,7,8,9]
p = []
for x in l:
    p.append(min(l))
    l.remove(min(l))
print p
print l
But it returns this result:
[1, 2, 3, 4, 5]
[10, 6, 7, 8, 9]
I don't know why it stops halfway; please help me with it... Thanks!
Just do this:
p = sorted(l)
#l = [] if you /really/ want it to be empty after the operation
The reason you're getting wonky behavior is that you're changing the size of the sequence l as you iterate over it, leading you to skip elements.
If you wanted to fix your method, you would do:
for x in l[:]:
l[:] creates a copy of l, which you can safely iterate over while you do things to the original l.
try this:
p = []
while len(l) > 0:
    p.append(min(l))
    l.remove(min(l))
Using while instead of for means you are no longer iterating over the list while you modify it, so no elements get skipped.
If you want to retain the original unsorted array, use a copy of l.
Check out this answer for more information. https://stackoverflow.com/a/1352908/1418255
Gee, I hope your lists are short. Otherwise, all that min()'ing will yield a slow piece of code.
If your lists are long, you might try a heap (e.g. heapq, in the standard library), a tree (e.g. https://pypi.python.org/pypi/red-black-tree-mod) or a treap (e.g. https://pypi.python.org/pypi/treap/).
For what you're doing, I'm guessing a heapq would be nice, unless there's a part of your story you've left out, like needing to be able to access arbitrary values and not just the min repeatedly.
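If the heapq route fits, a minimal sketch might look like this (note that heapify rearranges l in place, much as the original code empties it):
import heapq

l = [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
p = []
heapq.heapify(l)                  # O(n): rearrange l into a min-heap in place
while l:
    p.append(heapq.heappop(l))    # each pop of the current minimum is O(log n)
print(p)                          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]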
I'm trying to write a piece of code that can automatically factor an expression. For example,
if I have two lists [1,2,3,4] and [2,3,5], the code should be able to find the common elements in the two lists, [2,3], and combine the rest of the elements together in a new list, being [1,4,5].
From this post: How to find list intersection?
I see that the common elements can be found by
set([1,2,3,4]) & set([2,3,5]).
Is there an easy way to retrieve non-common elements from each list, in my example being [1,4] and [5]?
I can go ahead and do a for loop:
lists = [[1,2,3,4], [2,3,5]]
nonCommon = []
common = [2,3]
for eachList in lists:
    for elem in eachList:
        if elem not in common:
            nonCommon.append(elem)
But this seems redundant and inefficient. Does Python provide any handy function that can do that? Thanks in advance!!
Use the symmetric difference operator for sets (aka the XOR operator):
>>> set([1,2,3]) ^ set([3,4,5])
set([1, 2, 4, 5])
Old question, but looks like python has a built-in function to provide exactly what you're looking for: .difference().
EXAMPLE
list_one = [1,2,3,4]
list_two = [2,3,5]
one_not_two = set(list_one).difference(list_two)
# set([1, 4])
two_not_one = set(list_two).difference(list_one)
# set([5])
This could also be written as:
one_not_two = set(list_one) - set(list_two)
Timing
I ran some timing tests on both and it appears that .difference() has a slight edge, to the tune of 10 - 15% but each method took about an eighth of a second to filter 1M items (random integers between 500 and 100,000), so unless you're very time sensitive, it's probably immaterial.
Other Notes
It appears the OP is looking for a solution that provides two separate lists (or sets): one where the first contains items not in the second, and vice versa. Most of the previous answers return a single list or set that includes all of the items.
There is also the question as to whether items that may be duplicated in the first list should be counted multiple times, or just once.
If the OP wants to maintain duplicates, a list comprehension could be used, for example:
one_not_two = [ x for x in list_one if x not in list_two ]
two_not_one = [ x for x in list_two if x not in list_one ]
...which is roughly the same solution as posed in the original question, only a little cleaner. This method would maintain duplicates from the original list but is considerably (like multiple orders of magnitude) slower for larger data sets.
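One possible middle ground, offered as a sketch rather than something from the answers above: build a set from each list once for fast membership tests, while still keeping duplicates from the list being filtered:
set_one = set(list_one)
set_two = set(list_two)

one_not_two = [x for x in list_one if x not in set_two]  # keeps duplicates from list_one
two_not_one = [x for x in list_two if x not in set_one]  # keeps duplicates from list_two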
You can use the intersection concept to deal with this kind of problem.
b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
set(b1).intersection(b2)
Out[22]: {4, 5}
The best thing about using this code is that it works pretty fast for large data too. I have b1 with 607,139 elements and b2 with 296,029 elements; when I use this logic I get my results in 2.9 seconds.
You can use the __xor__ method:
set([1,2,3,4]).__xor__(set([2,3,5]))
or
a = set([1,2,3,4])
b = set([2,3,5])
a.__xor__(b)
You can use the symmetric_difference method:
x = {1,2,3}
y = {2,3,4}
z = x.symmetric_difference(y)
Output will be: z = {1, 4}
This should get the common and remaining elements
lis1=[1,2,3,4,5,6,2,3,1]
lis2=[4,5,8,7,10,6,9,8]
common = list(dict.fromkeys([l1 for l1 in lis1 if l1 in lis2]))
remaining = list(filter(lambda i: i not in common, lis1+lis2))
common = [4, 5, 6]
remaining = [1, 2, 3, 2, 3, 1, 8, 7, 10, 9, 8]
All the good solutions, from basic DSA style to using built-in functions:
# Time: O(2n)
def solution1(arr1, arr2):
    map = {}  # maps each value to [seen in arr1, seen in arr2]
    maxLength = max(len(arr1), len(arr2))
    for i in range(maxLength):
        if i < len(arr1):  # bounds check, in case the lists differ in length
            if not map.get(arr1[i]):
                map[arr1[i]] = [True, False]
            else:
                map[arr1[i]][0] = True
        if i < len(arr2):
            if not map.get(arr2[i]):
                map[arr2[i]] = [False, True]
            else:
                map[arr2[i]][1] = True  # mark as seen in arr2 as well
    res = []
    for key, value in map.items():
        if value[0] == False or value[1] == False:
            res.append(key)
    return res
def solution2(arr1, arr2):
    return set(arr1) ^ set(arr2)

def solution3(arr1, arr2):
    return (set(arr1).difference(arr2), set(arr2).difference(arr1))

def solution4(arr1, arr2):
    return set(arr1).__xor__(set(arr2))
print(solution1([1,2,3], [2,4,6]))
print(solution2([1,2,3], [2,4,6]))
print(solution3([1,2,3], [2,4,6]))
print(solution4([1,2,3], [2,4,6]))