How does a Python custom comparator work? - python

I have the following Python dict (shown as a list of its items):
[(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
Now I want to sort them by the sum of each key's list of values; for the first key, the sum is 3+4+5=12.
I have written the following code that does the job:
def myComparator(a,b):
    print "Values(a,b): ",(a,b)
    sum_a=sum(a[1])
    sum_b=sum(b[1])
    print sum_a,sum_b
    print "Comparision Returns:",cmp(sum_a,sum_b)
    return cmp(sum_a,sum_b)
items.sort(myComparator)
print items
This is the output I get after running the above:
Values(a,b): ((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))
2 12
Comparision Returns: -1
Values(a,b): ((4, [-1]), (3, [1, 0, 0, 0, 1]))
-1 2
Comparision Returns: -1
Values(a,b): ((10, [1, 2, 3]), (4, [-1]))
6 -1
Comparision Returns: 1
Values(a,b): ((10, [1, 2, 3]), (3, [1, 0, 0, 0, 1]))
6 2
Comparision Returns: 1
Values(a,b): ((10, [1, 2, 3]), (2, [3, 4, 5]))
6 12
Comparision Returns: -1
[(4, [-1]), (3, [1, 0, 0, 0, 1]), (10, [1, 2, 3]), (2, [3, 4, 5])]
Now I am unable to understand as to how the comparator is working, which two values are being passed and how many such comparisons would happen? Is it creating a sorted list of keys internally where it keeps track of each comparison made? Also the behavior seems to be very random. I am confused, any help would be appreciated.

Neither the number of comparisons nor which pairs are compared is documented, and both can change freely between implementations. The only guarantee is that, if the comparison function makes sense, the method will sort the list.
CPython uses the Timsort algorithm to sort lists, so what you see is the order in which that algorithm performs the comparisons (if I'm not mistaken, for very short lists Timsort just uses insertion sort).
Python is not keeping track of "keys". It just calls your comparison function every time a comparison is made, so your function can be called many more than len(items) times.
If you want to use keys you should use the key argument. In fact you could do:
items.sort(key=lambda x: sum(x[1]))
This will create the keys and then sort using the usual comparison operator on the keys. This is guaranteed to call the function passed by key only len(items) times.
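For readers on Python 3, where list.sort no longer accepts a comparison function, the difference between the two approaches can be sketched roughly as follows. This is only an illustration, not part of the original answer: the names my_comparator, sum_key, cmp_calls and key_calls are made up here, and functools.cmp_to_key is used to adapt the comparator. How often the comparator runs depends on the implementation, so the sketch prints that count rather than assuming it.

```python
from functools import cmp_to_key

items = [(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]

cmp_calls = []  # every (a, b) pair handed to the comparator
key_calls = []  # every element handed to the key function


def my_comparator(a, b):
    """Old-style comparator: negative, zero or positive, based on the list sums."""
    cmp_calls.append((a, b))
    sum_a, sum_b = sum(a[1]), sum(b[1])
    return (sum_a > sum_b) - (sum_a < sum_b)  # Python 3 stand-in for cmp()


def sum_key(pair):
    """Key function: called once per element."""
    key_calls.append(pair)
    return sum(pair[1])


by_cmp = sorted(items, key=cmp_to_key(my_comparator))
by_key = sorted(items, key=sum_key)

print(by_cmp == by_key)   # both approaches give the same order
print(len(key_calls))     # equals len(items)
print(len(cmp_calls))     # implementation-dependent
```

The key version does the summing exactly once per element, while the comparator version recomputes both sums on every comparison.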
Given that your list is:
[a,b,c,d]
The sequence of comparisons you are seeing is:
b < a # -1 true --> [b, a, c, d]
c < b # -1 true --> [c, b, a, d]
d < c # 1 false
d < b # 1 false
d < a # -1 true --> [c, b, d, a]

how the comparator is working
This is well documented:
Compare the two objects x and y and return an integer according to the outcome. The return value is negative if x < y, zero if x == y and strictly positive if x > y.
Instead of calling the cmp function you could have written:
sum_a=sum(a[1])
sum_b=sum(b[1])
if sum_a < sum_b:
    return -1
elif sum_a == sum_b:
    return 0
else:
    return 1
which two values are being passed
From your print statements you can see the two values that are passed. Let's look at the first iteration:
((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))
What you are printing here is a tuple (a, b), so the actual values passed into your comparison functions are
a = (3, [1, 0, 0, 0, 1])
b = (2, [3, 4, 5])
By means of your function, you then compare the sum of the two lists in each tuple, which you denote sum_a and sum_b in your code.
and how many such comparisons would happen?
I guess what you are really asking: How does the sort work, by just calling a single function?
The short answer is: it uses the Timsort algorithm, and it calls the comparison function O(n * log n) times (note that the actual number of calls is c * n * log n, where c > 0).
To understand what is happening, picture yourself sorting a list of values, say v = [4,2,6,3]. If you go about this systematically, you might do this:
1. start at the first value, at index i = 0
2. compare v[i] with v[i+1]
3. if v[i+1] < v[i], swap them
4. increase i and repeat from step 2 while i <= len(v) - 2
5. start again at step 1 until a full pass makes no further swaps
So you get, i =
0: 2 < 4 => [2, 4, 6, 3] (swap)
1: 6 < 4 => [2, 4, 6, 3] (no swap)
2: 3 < 6 => [2, 4, 3, 6] (swap)
Start again:
0: 4 < 2 => [2, 4, 3, 6] (no swap)
1: 3 < 4 => [2, 3, 4, 6] (swap)
2: 6 < 4 => [2, 3, 4, 6] (no swap)
Start again - there will be no further swaps, so stop. Your list is sorted. In this example we have run through the list 3 times, and there were 3 * 3 = 9 comparisons.
Obviously this is not very efficient -- the sort() method only calls your comparator function 5 times. The reason is that it employs a more efficient sort algorithm than the simple one explained above.
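The simple procedure walked through above can be sketched in Python like this. The name bubble_sort is just an illustrative label (it is not what sort() uses internally), and the function counts its comparisons so the number can be checked against the walkthrough:

```python
def bubble_sort(values):
    """Repeatedly pass over the list, swapping adjacent out-of-order pairs.

    Returns the sorted copy and the number of comparisons performed.
    """
    v = list(values)          # work on a copy
    comparisons = 0
    swapped = True
    while swapped:            # keep passing until a full pass makes no swap
        swapped = False
        for i in range(len(v) - 1):
            comparisons += 1
            if v[i + 1] < v[i]:
                v[i], v[i + 1] = v[i + 1], v[i]
                swapped = True
    return v, comparisons


print(bubble_sort([4, 2, 6, 3]))  # ([2, 3, 4, 6], 9)
```

The 9 comparisons match the three passes of three comparisons each described above.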
Also the behavior seems to be very random.
Note that the sequence of values passed to your comparator function is not, in general, defined. However, the sort function does all the necessary comparisons between any two values of the iterable it receives.
Is it creating a sorted list of keys internally where it keeps track of each comparison made?
No, it is not keeping a list of keys internally. Rather the sorting algorithm essentially iterates over the list you give it. In fact it builds subsets of lists to avoid doing too many comparisons - there is a nice visualization of how the sorting algorithm works at Visualising Sorting Algorithms: Python's timsort by Aldo Cortesi

Basically, the sorting algorithm is the same for a simple list such as [2, 4, 6, 3, 1] and for the complex list you provided.
The only differences are the complexity of the elements in the list and the scheme used to compare any two elements (e.g. the myComparator you provided).
There is a good description for Python Sorting: https://wiki.python.org/moin/HowTo/Sorting

First, the cmp() function:
cmp(...)
cmp(x, y) -> integer
Return negative if x<y, zero if x==y, positive if x>y.
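You are using this line: items.sort(myComparator). This passes the function itself to sort (it is not the same as items.sort(-1), items.sort(0) or items.sort(1)); sort then calls it for each pair it needs to compare and uses the returned negative, zero or positive value to order them.
Since you want to sort based on the sum of each tuple's list, you could do this: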
mylist = [(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
sorted(mylist, key=lambda pair: sum(pair[1]))
I think this does exactly what you wanted: it sorts mylist based on the sum() of each tuple's list.

Related

Error when trying to implement MERGE algorithm merging to sorted lists of integers in python?

I'm new to both algorithms AND programming.
As an intro to the MERGE algorithms the chapter introduces first the MERGE algorithm by itself. It merges and sorts an array consisting of 2 sorted sub-arrays.
I did the pseudocode on paper according to the book:
Source: "Introduction to Algorithms, Third Edition" by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein
Since I am implementing it in python3 I had to change some lines given that indexing in python starts at 0 unlike in the pseudocode example of the book.
Keep in mind that the input is one array that contains 2 SORTED sub-arrays which are then merged and sorted, and returned. I kept the prints in my code, so you can see my checks...
#!/anaconda3/bin/python3
import math
import argparse
# For now only MERGE slides ch 2 -- Im defining p q and r WITHIN the function
# But for MERGE_SORT p,q and r are defined as parameters!
def merge(ar):
    '''
    Takes as input an array. This array consists of 2 subarrays that ARE ALLREADY sorted
    (small to large). When splitting the array into half, the left
    part will be longer by one if not divisible by 2. These subarrays will be
    called left and right. Each of the subarrays must already be sorted. Merge() then
    merges these sorted arrays into one big sorted array. The sorted array is returned.
    '''
    print(ar)
    p=0 # for now defining always as 0
    if len(ar)%2==0:
        q=len(ar)//2-1 # because indexing starts from ZERO in py
    else:
        q=len(ar)//2 # left sub array will be 1 item longer
    r=len(ar)-1 # again -1 because indexing starts from ZERO in py
    print('p', p, 'q', q, 'r', r)
    # lets see if n1 and n2 check out
    n_1 = q-p+1 # lenght of left subarray
    n_2 = r-q # lenght of right subarray
    print('n1 is: ', n_1)
    print('n2 is: ', n_2)
    left = [0]*(n_1+1) # initiating zero list of lenght n1
    right=[0]*(n_2+1)
    print(left, len(left))
    print(right, len(right))
    # filling left and right
    for i in range(n_1):# because last value will always be infinity
        left[i] = ar[p+i]
    for j in range(n_2):
        right[j] = ar[q+j+1]
        #print(ar[q+j+1])
        #print(right[j])
    # inserting infinity at last index for each subarray
    left[n_1]=math.inf
    right[n_2]=math.inf
    print(left)
    print(right)
    # merging: initiating indexes at 0
    i=0
    j=0
    print('p', p)
    print('r', r)
    for k in range(p,r):
        if left[i] <= right[j]:
            ar[k]=left[i]
            # increase i
            i += 1
        else:
            ar[k]=right[j]
            #increase j
            j += 1
    print(ar)
#############################################################################################################################
# Adding parser
#############################################################################################################################
parser = argparse.ArgumentParser(description='MERGE algorithm from ch 2')
parser.add_argument('-a', '--array', type=str, metavar='', required=True, help='One List of integers composed of 2 sorted halves. Sorting must start from smallest to largest for each of the halves.')
args = parser.parse_args()
args_list_st=args.array.split(',') # list of strings
args_list_int=[]
for i in args_list_st:
    args_list_int.append(int(i))
if __name__ == "__main__":
    merge(args_list_int)
The problem:
When I try to sort the array as shown in the book, the merged array that is returned contains two 6s and the 7 is lost.
$ ./2.merge.py -a=2,4,5,7,1,2,3,6
[2, 4, 5, 7, 1, 2, 3, 6]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 6, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 6, 6]
However, this does not happen when the last number in the array is higher than 6.
$ ./2.merge.py -a=2,4,5,7,1,2,3,8
[2, 4, 5, 7, 1, 2, 3, 8]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 8, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 7, 8]
I showed it to a colleague in my class without success. I've also walked through it manually with numbers on paper snippets, but without success. I hope someone can find my silly mistake because I'm completely stuck.
Thanks
As r is the index of the last value in ar, you need to add one to it to make a range that also includes that final index:
for k in range(p, r + 1):
#                 ^^^^^
Note that your code could be greatly reduced if you would use list slicing.
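As a rough sketch of what a slicing-based version might look like (this is one reading of the suggestion, written for Python 3; here p, q and r are passed in as parameters, which the question's code does not do yet):

```python
import math


def merge(ar, p, q, r):
    """Merge the sorted runs ar[p..q] and ar[q+1..r] in place (indices inclusive)."""
    # Slices copy the two runs; math.inf acts as the sentinel from the book.
    left = ar[p:q + 1] + [math.inf]
    right = ar[q + 1:r + 1] + [math.inf]
    i = j = 0
    for k in range(p, r + 1):  # r + 1 so that index r is written too
        if left[i] <= right[j]:
            ar[k] = left[i]
            i += 1
        else:
            ar[k] = right[j]
            j += 1
    return ar


print(merge([2, 4, 5, 7, 1, 2, 3, 6], 0, 3, 7))  # [1, 2, 2, 3, 4, 5, 6, 7]
```

The slices replace the two fill loops and the manual length bookkeeping with n_1 and n_2.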
Brother, you made a very small mistake in this line:
for k in range(p,r):
Here your loop runs from p to r-1, so the last index, r, is never visited.
So you have to use
for k in range(p,r+1):
In the second test case, a=[2,4,5,7,1,2,3,8], you get the correct output even with the wrong code because you overwrite the values of ar in place. Your code sorts the array up to index r-1, and the number at index r stays whatever it was before merge ran, i.e. 8, which happens to be the correct last value.
Try using this test case: [2, 4, 5, 8, 1, 2, 3, 7]
Your output will be [1, 2, 2, 3, 4, 5, 7, 7]
Hope this helped

How can I find the indices of any two of the numbers, whose sum is equal to the target sum in Python?

I'm doing this test on testdome.com for practice, and my code is failing some test cases. Can anyone help me by pointing out the logic error in my code?
This is the question for my code:
"Write a function that, when passed a list and a target sum, returns, efficiently with respect to time used, two distinct zero-based indices of any two of the numbers, whose sum is equal to the target sum.
If there are no two numbers, the function should return None.
For example, find_two_sum([3, 1, 5, 7, 5, 9], 10) should return a single tuple containing any of the following pairs of indices:
0 and 3 (or 3 and 0) because addition of 3 and 7 is 10.
1 and 5 (or 5 and 1) because addition of 1 and 9 is 10.
2 and 4 (or 4 and 2) because addition of 5 and 5 is 10.
def find_two_sum(numbers, target_sum):
    sss=list(dict.fromkeys(numbers))
    if (sss == None or len(sss) < 2): return None
    for item in sss:
        tesn=target_sum-item
        if tesn in sss:
            if numbers.index(item)==numbers.index(tesn):
                continue
            else:
                return numbers.index(item),numbers.index(tesn)
    return None
print(find_two_sum([3, 1, 5, 7, 5, 9], 10))
They have four test cases and my code can only pass first two test cases.
Example case:Wrong answer ( to return [0,2] because 3 of index 0 + 7 of index 3 is 10)
Distinct numbers with and without solutions: Wrong answer
Duplicate numbers with and without solutions: Wrong answer
Performance test with a large list of numbers: Wrong answer
My take on the problem:
def find_two_sum(lst, n):
    indices = {}
    for idx, num in enumerate(lst):
        indices.setdefault(num, []).append(idx)
    for k, v in indices.items():
        i = v.pop()
        if n - k in indices and indices[n-k]:
            return i, indices[n-k].pop()
print( find_two_sum([3, 1, 5, 7, 5, 9], 6) )
print( find_two_sum([3, 1, 5, 7, 5, 9], 10) )
print( find_two_sum([1, 2, 1, 8], 10) )
print( find_two_sum([5, 5], 10) )
print( find_two_sum([11], 10) )
Prints:
(1, 4)
(0, 3)
(1, 3)
(1, 0)
None
I believe you have to add a check for the two indexes to be distinct.
For example here:
print(find_two_sum([3, 1, 5, 7, 5, 9], 6))
The function will give an answer of (0, 0), which wouldn't be correct, even though these are the indexes of 3, which gives a sum of 6 with itself.
Here, I've added the check for distinct indexes:
def find_two_sum(numbers, target_sum):
    sss = list(dict.fromkeys(numbers))
    if (sss == None or len(sss) < 2): return None
    tup=()
    for item in sss:
        item_index = numbers.index(item)
        tesn = target_sum - item
        if tesn in sss:
            tesn_index = numbers.index(tesn)
            if item_index!=tesn_index:
                return (item_index, tesn_index)
    return None
One flaw in the logic is that sss does not contain the duplicates that may exist in the original list, so you have lost information. You are assuming there are no duplicates in the original list: list.index(n) returns the index of the first item equal to n, so you can end up with a result containing the same index twice:
>>> a = [3, 1, 5, 7, 5, 9]
>>> item = 5
>>> tesn = 5
>>> a.index(item),a.index(tesn)
(2, 2)
>>>
Your algorithm has a flaw, e.g. find_two_sum([5, 2], 10) gives (0, 0).
This is because the check tesn in sss evaluates to true when item is 5, even though there is only a single 5 in the input list.
This answer seems to be 50% correct.
def find_two_sum(numbers, target_sum):
    for n in numbers:
        for i in numbers[numbers.index(n)+1:]:
            if n+i==target_sum:
                return(numbers.index(n),numbers.index(i))
                break
    return None
print(find_two_sum([3, 1, 5, 7, 5, 9], 10))
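To tie the points above together, here is a minimal sketch of a single-pass approach that keeps the indices instead of calling list.index, so duplicates are handled and no index is returned twice. It is only an illustration written for Python 3; the names seen and complement are made up here, and I have not run it against the TestDome test cases.

```python
def find_two_sum(numbers, target_sum):
    """Return two distinct indices whose values add up to target_sum, or None."""
    seen = {}  # value -> index of an earlier occurrence of that value
    for index, number in enumerate(numbers):
        complement = target_sum - number
        if complement in seen:
            # seen only holds earlier positions, so the two indices differ
            return seen[complement], index
        seen[number] = index
    return None


print(find_two_sum([3, 1, 5, 7, 5, 9], 10))  # (0, 3)
print(find_two_sum([5, 2], 10))              # None
print(find_two_sum([5, 5], 10))              # (0, 1)
```

Each element is looked at once and dictionary lookups are constant time on average, so the whole search is roughly linear in the length of the list.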

How to fix the multiple value assignment to the variables in the python code below?

Given a list of unique numbers in python, I need to swap the positions of the maximum and minimum numbers in the list.
Apart from the traditional way of doing by getting the positions of the numbers by for loop, I tried to do that by the in-built python functions and used it directly in the multiple variables assignment method which is shown below.
a = [i for i in range(6, 1, -1)]
print("The original array is =", a) # prints [6, 5, 4, 3, 2]
index_max = a.index(max(a))
index_min = a.index(min(a))
# a[ a.index(max(a)) ], a[ a.index(min(a)) ] = min(a), max(a) # prints [6, 5, 4, 3, 2]
a[index_max], a[index_min] = min(a), max(a) # prints [2, 5, 4, 3, 6]
print("The swapped array is =", a)
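The commented-out assignment (line no. 7 in my file) doesn't work: it gives the output [6, 5, 4, 3, 2] instead of [2, 5, 4, 3, 6].
The assignment that uses index_max and index_min (line no. 8), on the other hand, works perfectly!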
According to the Python documentation:
WARNING: Although the definition of assignment implies
that overlaps between the left-hand side and the right-
hand side are `safe' (e.g., "a, b = b, a" swaps two
variables), overlaps within the collection of assigned-to
variables are not safe! For instance, the following program
prints "[0, 2]":
x = [0, 1]
i = 0
i, x[i] = 1, 2
print x
So the problem is that, in line 7, Python first does
a [a.index(max(a))] = min(a)
Now, a = [2, 5, 4, 3, 2]. After that, Python does
a [a.index(min(a))] = max(a)
But min(a) = 2, and a.index(2) returns 0. So, in the end, a = [6, 5, 4, 3, 2]. That's why storing the indices of min and max in variables before swapping does work.
Reference:
https://docs.python.org/2.0/ref/assignment.html
Edit: reference to Python 3 as suggested by @chepner:
Although the definition of assignment implies that overlaps between
the left-hand side and the right-hand side are ‘simultaneous’ (for
example a, b = b, a swaps two variables), overlaps within the
collection of assigned-to variables occur left-to-right, sometimes
resulting in confusion. For instance, the following program prints [0,
2]:
x = [0, 1]
i = 0
i, x[i] = 1, 2 # i is updated, then x[i] is updated
print(x)
Reference:
https://docs.python.org/3/reference/simple_stmts.html#assignment-statements
The important thing here is the order of the operations. When you do:
a[ a.index(max(a)) ], a[ a.index(min(a)) ] = min(a), max(a)
Python first evaluates the right-hand side, giving (2, 6), and then handles the targets in this order:
max(a) # >> 6
a.index(max(a)) # >> 0
a[...] = min(a) # >> a[0] = 2
Then it does the same with the second target:
min(a) # >> 2
a.index(min(a)) # >> 0
a[...] = max(a) # >> a[0] = 6
The bad behaviour is natural, since you changed the index during the operation...
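The effect can be reproduced in a few lines. This is a small sketch for Python 3, with made-up variable names (one_liner, fixed, i_max, i_min), showing the failing one-line form next to the version that looks up both indices first:

```python
original = [6, 5, 4, 3, 2]

# One-line form: the second index is looked up only AFTER the first
# assignment has already changed the list.
one_liner = list(original)
one_liner[one_liner.index(max(one_liner))], one_liner[one_liner.index(min(one_liner))] = (
    min(one_liner),
    max(one_liner),
)
print(one_liner)  # [6, 5, 4, 3, 2] -> effectively unchanged

# Safer form: find both positions before touching the list, then swap.
fixed = list(original)
i_max, i_min = fixed.index(max(fixed)), fixed.index(min(fixed))
fixed[i_max], fixed[i_min] = fixed[i_min], fixed[i_max]
print(fixed)  # [2, 5, 4, 3, 6]
```

In the second form the subscripts are plain integers, so nothing on the left-hand side depends on the list's current contents.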

Creating a general list of ascending and descending numbers using list comprehension

I am trying to generate a list containing a mixture of ascending and descending numbers.
e.g., say you have n=5. I want to generate a list/array based on n such that you have:
[0,1,2,3,4,3,2,1,0]
using list comprehension.
I tried doing this:
print [[i+j] for i in range(n)for j in range(n,-1,-1)]
but I can't seem to get it right.
I know you specified you wanted a list comp, but is it really necessary?
list(range(5)) + list(reversed(range(4)))
(python 3 syntax)
Or, in python2:
range(5) + range(4)[::-1]
or
range(5) + range(3,-1,-1)
I think the first one is more readable, but ymmv.
In [27]: n = 5
In [28]: [n-1-abs(i-n+1) for i in range(n*2-1)]
Out[28]: [0, 1, 2, 3, 4, 3, 2, 1, 0]
Update
This one might be clearer, but note that as written it peaks at n rather than n-1:
In [36]: [n-abs(i) for i in range(-n,n+1)]
Out[36]: [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
One-liner:
[i if i < n else 2*(n-1)-i for i in range(2*(n-1) + 1)]
More efficient:
_top = 2*(n-1)
[i if i < n else _top-i for i in range(_top + 1)]
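For comparison, the same idea can be wrapped in a small function. This is just a sketch for Python 3; up_down is a made-up name, and it uses the distance from the middle so that the peak is n-1, as in the question:

```python
def up_down(n):
    """Return [0, 1, ..., n-1, ..., 1, 0] using a single list comprehension."""
    # i runs from -(n-1) to n-1; the value is the peak minus the distance from 0.
    return [n - 1 - abs(i) for i in range(-(n - 1), n)]


print(up_down(5))  # [0, 1, 2, 3, 4, 3, 2, 1, 0]
```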

Dictionary works for len(string) multiple of 3. Function deletes remainders but now doesn't translate with dictionary. Python 2.7.1

I made a function with a dictionary. The purpose of the function is to separate the input string into sets of 3. If the length of the input string is not a multiple of 3, I want to delete the remainder (1 or 2 characters).
My function was working perfectly until I added the part for deleting the remainders:
def func(fx):
    d={'AAA':1,'BBB':2,'CCC':3}
    length=len(fx)
    if length % 3 == 0:
        return fx
    if length % 3 == 1:
        return fx[:-1]
    if length % 3 == 2:
        return fx[:-2]
    Fx=fx.upper()
    Fx3=[Fx[i:i+3] for i in range(0,len(Fx),3)]
    translate=[d[x] for x in Fx3]
    return translate
x='aaabbbcc'
output = func(x)
print output
>>>
aaabbb
The function recognizes that the input sequence is not a multiple of 3, so it deletes the 2 extra characters, which is what I want. However, it no longer splits the new string into 3-letter words to be translated with my dictionary. If you delete the if statements, the function works, but only for strings whose length is a multiple of 3.
What am I doing wrong ???
You are returning fx when you probably should be reassigning it
def func(fx):
    d={'AAA':1,'BBB':2,'CCC':3}
    length=len(fx)
    if length % 3 == 0:
        pass
    elif length % 3 == 1:
        fx = fx[:-1]
    elif length % 3 == 2:
        fx = fx[:-2]
    Fx=fx.upper()
    Fx3=[Fx[i:i+3] for i in range(0,len(Fx),3)]
    translate=[d[x] for x in Fx3]
    return translate
Here is an alternate function for you to figure out when you know some more Python
def func(fx):
    d = {'AAA':1,'BBB':2,'CCC':3}
    return [d["".join(x).upper()] for x in zip(*[iter(fx)]*3)]
Does this do what you want?
def func(fx):
    d = {'AAA': 1, 'BBB': 2, 'CCC': 3}
    # Slice up to a computed length: fx[:-0] would be '' when len(fx) is a multiple of 3
    fx = fx[:len(fx) - len(fx) % 3].upper()
    groups = [fx[i:i+3] for i in range(0, len(fx), 3)]
    translate = [d[group] for group in groups]
    return translate
x='aaabbbcc'
print func(x)
When trimming the end of the string, you were returning the result when you wanted to just store it in a variable or assign it back to fx.
Rather than the if .. elifs you can just use the result of the length modulo 3 directly.
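A compact way to use the remainder directly might look like the sketch below. It is written for Python 3 rather than 2.7, and the names translate, table and usable are made up for illustration. Note that text[:-0] is an empty string, so the sketch slices up to a computed length instead of using a negative index.

```python
def translate(text, table=None):
    """Drop the 1 or 2 leftover characters, then map each 3-letter group."""
    if table is None:
        table = {'AAA': 1, 'BBB': 2, 'CCC': 3}
    usable = len(text) - len(text) % 3   # largest multiple of 3 that fits
    text = text[:usable].upper()
    return [table[text[i:i + 3]] for i in range(0, usable, 3)]


print(translate('aaabbbcc'))   # [1, 2]
print(translate('aaabbbccc'))  # [1, 2, 3]
```

A group that is not in the table still raises KeyError here, just as in the versions above.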
There is no need for a function; it can be done in a one-liner that is less complex than gnibbler's.
Acom's solution is nearly the same as mine.
d={'AAA':1,'BBB':2,'CCC':3}
for fx in ('bbbcccaaabbbcccbbbcccaaabbbcc',
'bbbcccaaabbbaaa','bbbcccaaabbbaa','bbbcccaaabbba',
'bbbcccaaabbb','bbbcccaaabb','bbbcccaaab',
'bbbcccaaa','bbbcccaa','bbbccca',
'bbbccc','bbbcc','bbbc',
'bbb','bb','b',''):
    print fx
    print tuple( d[fx[i:i+3].upper()] for i in xrange(0, len(fx)-len(fx)%3, 3) )
produces
bbbcccaaabbbcccbbbcccaaabbbcc
(2, 3, 1, 2, 3, 2, 3, 1, 2)
bbbcccaaabbbaaa
(2, 3, 1, 2, 1)
bbbcccaaabbbaa
(2, 3, 1, 2)
bbbcccaaabbba
(2, 3, 1, 2)
bbbcccaaabbb
(2, 3, 1, 2)
bbbcccaaabb
(2, 3, 1)
bbbcccaaab
(2, 3, 1)
bbbcccaaa
(2, 3, 1)
bbbcccaa
(2, 3)
bbbccca
(2, 3)
bbbccc
(2, 3)
bbbcc
(2,)
bbbc
(2,)
bbb
(2,)
bb
()
b
()
()
.
I assume the strings you have to handle contain only the 3-character groups 'aaa', 'bbb', 'ccc' at positions 0, 3, 6, 9, etc.
If there is a different 3-character group at one of these positions, the preceding programs will crash with a KeyError.
In that case, note that you could use the dictionary's get method, which returns a default value when the passed argument isn't a key of the dictionary.
In the following code, I set the default returned value to 0:
d={'AAA':1,'BBB':2,'CCC':3}
for fx in ('bbbcccaaa###bbbccc"""bbbcc',
'bbb aaabbbaaa','bbbccc^^^bbbaa','bbbc;;;aabbba',
'bbbc^caaabbb',']]bccca..bb','bbb%%%aaab',
'bbbcccaaa','bbb!ccaa','b#bccca',
'bbbccc','bbbcc','bbbc',
'b&b','bb','b',''):
    print fx
    print [d.get(fx[i:i+3].upper(), 0) for i in xrange(0, len(fx)-len(fx)%3, 3)]
produces
bbbcccaaa###bbbccc"""bbbcc
[2, 3, 1, 0, 2, 3, 0, 2]
bbb aaabbbaaa
[2, 0, 1, 2, 1]
bbbccc^^^bbbaa
[2, 3, 0, 2]
bbbc;;;aabbba
[2, 0, 0, 2]
bbbc^caaabbb
[2, 0, 1, 2]
]]bccca..bb
[0, 3, 0]
bbb%%%aaab
[2, 0, 1]
bbbcccaaa
[2, 3, 1]
bbb!ccaa
[2, 0]
b#bccca
[0, 3]
bbbccc
[2, 3]
bbbcc
[2]
bbbc
[2]
b&b
[0]
bb
[]
b
[]
[]
By the way, I preferred to create a tuple instead of a list because for the kind of invariable objects that are in the result, I think it is better not to create a list
