How to sort with incomplete ordering? - python

I have a list of elements to sort and a comparison function cmp(x,y) which decides if x should appear before y or after y. The catch is that some elements do not have a defined order. The cmp function returns "don't care".
Example: Input: [A,B,C,D], and C > D, B > D. Output: many correct answers, e.g. [D,C,B,A] or [A,D,B,C]. All I need is one output from all the possible outputs.
I was not able to use Python's sort for this. My solution is an old-fashioned bubble-sort-like approach: start with an empty list and insert one element at a time at the right place, so that the list stays sorted at all times.
Is it possible to use the built-in sort/sorted function for this purpose? What would be the key?

It's not possible to use the built-in sort for this. Instead, you need to implement a Topological Sort.
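A topological sort along these lines can be sketched as follows. This is a minimal illustration using Kahn's algorithm, not a definitive implementation. It assumes cmp returns a negative number, a positive number, or 0 for "don't care"; the names topo_sort, example_cmp and KNOWN are made up for the example.

```python
from collections import deque
from itertools import combinations

def topo_sort(items, cmp):
    """Order items so that a precedes b whenever cmp(a, b) < 0.

    cmp returns a negative number, a positive number, or 0 for "don't care".
    """
    successors = {i: [] for i in range(len(items))}
    indegree = [0] * len(items)
    # Ask cmp about every pair once and record the defined orderings as edges.
    for i, j in combinations(range(len(items)), 2):
        result = cmp(items[i], items[j])
        if result < 0:
            successors[i].append(j)
            indegree[j] += 1
        elif result > 0:
            successors[j].append(i)
            indegree[i] += 1
    # Kahn's algorithm: repeatedly emit an item with no unplaced predecessor.
    ready = deque(i for i, degree in enumerate(indegree) if degree == 0)
    ordered = []
    while ready:
        i = ready.popleft()
        ordered.append(items[i])
        for j in successors[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
    if len(ordered) != len(items):
        raise ValueError("the comparisons contain a cycle")
    return ordered

# Hypothetical comparison for the question's example: D < C and D < B.
KNOWN = {("D", "C"), ("D", "B")}

def example_cmp(a, b):
    if (a, b) in KNOWN:
        return -1
    if (b, a) in KNOWN:
        return 1
    return 0  # don't care

result = topo_sort(["A", "B", "C", "D"], example_cmp)
```

Here result is ['A', 'D', 'B', 'C'], one of the valid answers listed in the question. The sketch calls cmp on every pair, so it is quadratic in the number of elements. On Python 3.9 and later, the standard library's graphlib.TopologicalSorter can do the ordering step if you already have the pairs.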

The built-in sort method requires that cmp impose a consistent total ordering. It doesn't work if the comparisons are inconsistent. If cmp reports A < B one time, it must always report that, and it must report B > A when the arguments are reversed.
You can make your cmp implementation work if you introduce an arbitrary tiebreaker. If two elements don't have a defined order, make one up. You could return cmp(id(a), id(b)), for instance, which compares the objects by their arbitrary ID numbers. Be aware that the made-up order must not contradict the defined one: if the tiebreaker says A < B and B < C while the real rules say C < A, the comparisons are inconsistent again.

Related

How to get into details of str() and int() conversion [duplicate]

Tackling a few puzzle problems on a quiet Saturday night (wooohoo... not) and am struggling with sort(). The results aren't quite what I expect. The program iterates through every combination from 100 - 999 and checks if the product is a palindrome. If it is, append to the list. I need the list sorted :D Here's my program:
list = []  # list of numbers
for x in xrange(100, 1000):  # loops for first value of combination
    for y in xrange(x, 1000):  # and 2nd value
        mult = x * y
        reversed = str(mult)[::-1]  # reverses the number
        if reversed == str(mult):
            list.append(reversed)
list.sort()
print list[:10]
which nets:
['101101', '10201', '102201', '102201', '105501', '105501', '106601', '108801',
'108801', '110011']
Clearly index 0 is larger than index 1. Any idea what's going on? I have a feeling it's got something to do with trailing/leading zeroes, but I had a quick look and I can't see the problem.
Bonus points if you know where the puzzle comes from :P
You are sorting strings, not numbers. '101101' < '10201' because '1' < '2'. Change list.append(reversed) to list.append(int(reversed)) and it will work (or use a different sorting function).
Sort is doing its job. If you intended to store integers in the list, take Lukáš's advice. You can also tell sort how to sort, for example by comparing the items as ints:
list.sort(key=int)
The key parameter takes a function that computes, for each item, the value that stands in for that item in all comparisons. An integer will compare numerically, as you expect.
(By the way, list is a really bad variable name, as you override the builtin list() type!)
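To make the difference concrete, here is a small sketch that sorts three of the strings from the question both ways. The variable names are made up for illustration.

```python
palindromes = ["101101", "10201", "102201"]

# Default string comparison goes character by character, left to right.
as_text = sorted(palindromes)

# key=int compares the numeric values but leaves the items as strings.
as_numbers = sorted(palindromes, key=int)

print(as_text)     # ['101101', '10201', '102201']
print(as_numbers)  # ['10201', '101101', '102201']
```

Note that key=int only changes how items are compared. The list still holds strings afterwards, so storing integers in the first place is usually the cleaner fix.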
Your list contains strings so it is sorting them alphabetically - try converting the list to integers and then do the sort.
You're sorting strings, not numbers. Strings compare left-to-right.
No need to convert to int. mult already is an int and as you have checked it is a palindrome it will look the same as reversed, so just:
list.append(mult)
You have your numbers stored as strings, so python is sorting them accordingly. So: '101x' comes before '102x' (the same way that 'abcd' will come before 'az').
No, it is sorting properly, just that it is sorting lexicographically and you want numeric sorting... so remove the "str()"
The comparison operator is treating your input as strings instead of integers. In string comparison, 2 as the third character is lexically greater than 1.
reversed = str(mult)[::-1]

How to calculate Euclidian of dictionary with tuple as key

I have created a matrix by using a dictionary with a tuple as the key (e.g. {(user, place) : 1 } )
I need to calculate the Euclidian for each place in the matrix.
I've created a method to do this, but it is extremely inefficient because it iterates through the entire matrix for each place.
def calculateEuclidian(self, place):
    count = 0
    for key, value in self.matrix.items():
        if key[1] == place and value == 1:
            count += 1
    euclidian = math.sqrt(count)
    return euclidian
Is there a way to do this more efficiently?
I need the result to be in a dictionary with the place as a key, and the euclidian as the value.
You can count the matching entries with a generator expression passed to sum(), which is more compact than an explicit for loop and usually a little faster, then take the square root and return the result as a dictionary keyed by place:
def calculateEuclidian(self, place):
    count = sum(1 for (_, p), val in self.matrix.items() if p == place and val == 1)
    return {place: math.sqrt(count)}
With your current data structure, I doubt there is any way you can avoid iterating through the entire dictionary.
If you cannot use another way (or an auxiliary way) of representing your data, iterating through every element of the dict is as efficient as you can get (asymptotically), since there is no way to ask a dict with tuple keys to give you all elements with keys matching (_, place) (where _ denotes "any value"). There are other, and more succinct, ways of writing the iteration code, but you cannot escape the asymptotic efficiency limitation.
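Although one pass over the dictionary cannot be avoided, it only has to happen once for all places rather than once per place. A minimal sketch, assuming the matrix is a plain dict keyed by (user, place) tuples; the function name euclidian_by_place and the sample data are made up for illustration:

```python
import math
from collections import Counter

def euclidian_by_place(matrix):
    """Return {place: sqrt(number of users whose entry for that place is 1)}."""
    # One pass over the whole matrix, counting the 1-entries per place.
    counts = Counter(place for (_user, place), value in matrix.items() if value == 1)
    return {place: math.sqrt(count) for place, count in counts.items()}

# Hypothetical sample data.
matrix = {
    ("ann", "cafe"): 1,
    ("bob", "cafe"): 1,
    ("cy", "cafe"): 1,
    ("dee", "cafe"): 1,
    ("ann", "park"): 1,
    ("bob", "park"): 0,
}
norms = euclidian_by_place(matrix)  # {'cafe': 2.0, 'park': 1.0}
```

Places that have no entry equal to 1 do not appear in the result. If you need them with a value of 0.0, you would have to add them separately.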
If this is your most common operation, and you can in fact use another way of representing your data, you can use a dict[Place, list[User]] instead. That way, you can, in O(1) time, get the list of all users at a certain place, and all you would need to do is count the items in the list using the len(...) function which is also O(1). Obviously, you'll still need to take the sqrt in the end.
There may be ways to make it more Pythonic, but I do not think you can change the overall complexity since you are making a query based off both key and value. I think you have to search the whole matrix for your instances.
Your current dictionary isn't suited to this kind of search, so you may want to build a second dictionary from it, with the place as the key and a list of (user, value) tuples as the value.
Get the tuple list under the place key (that will be fast), then count the entries where the value is 1 (linear, but on a small set of data).
Keep the original dictionary for the euclidian distance computation. Hopefully you don't change the data too often in the program, because you would need to keep both dicts in sync.
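The auxiliary dictionary described above might look like this. It is only a sketch under the same assumption about the matrix layout; build_place_index and euclidian are hypothetical helper names.

```python
import math
from collections import defaultdict

def build_place_index(matrix):
    """Group the (user, place) -> value matrix into place -> [(user, value), ...]."""
    index = defaultdict(list)
    for (user, place), value in matrix.items():
        index[place].append((user, value))
    return index

def euclidian(index, place):
    """Square root of the number of 1-entries recorded for one place."""
    entries = index.get(place, [])  # .get() avoids creating an empty entry
    return math.sqrt(sum(1 for _user, value in entries if value == 1))

# Hypothetical sample data.
matrix = {
    ("ann", "cafe"): 1,
    ("bob", "cafe"): 1,
    ("cy", "cafe"): 1,
    ("dee", "cafe"): 1,
    ("bob", "park"): 0,
}
index = build_place_index(matrix)
cafe_norm = euclidian(index, "cafe")  # 2.0
```

Building the index costs one pass over the matrix. After that, each lookup only touches the entries for the requested place.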

Python Sorting 2D List with custom Key

So I have a 2D list and want to sort it using a second file of keys. Does anyone know how I would go about doing that?
Here's an example input file:
first_nm,last_nm,gender,cwid,cred_hrs,qual_pts,gpa
John,Roe,M,44444444,40,150,3.75
Jane,Roe,F,66666666,100,260,2.6
John,Doe,M,22222222,50,140,2.8
Jane,Doe,F,88888888,80,280,3.5
Penny,Lowe,F,55555555,40,140,3.5
Lenny,Lowe,M,11111111,100,280,2.8
Denny,Lowe,M,99999999,80,260,3.25
Benny,Lowe,M,77777777,120,90,0.75
Jenny,Lowe,F,33333333,50,90,1.8
Zoe,Coe,F,0,50,130,2.6
Here are the keys to sort it by (there could be more or fewer, depending on how you want to sort it):
gender,ascend,string
gpa,descend,float
last_nm,ascend,string
And here would be the output for that input and keys:
first_nm,last_nm,gender,cwid,cred_hrs,qual_pts,gpa
Jane,Doe,F,88888888,80,280,3.5
Penny,Lowe,F,55555555,40,140,3.5
Zoe,Coe,F,00000000,50,130,2.6
Jane,Roe,F,66666666,100,260,2.6
Jenny,Lowe,F,33333333,50,90,1.8
John,Roe,M,44444444,40,150,3.75
Denny,Lowe,M,99999999,80,260,3.25
John,Doe,M,22222222,50,140,2.8
Lenny,Lowe,M,11111111,100,280,2.8
Benny,Lowe,M,77777777,120,90,0.75
I was thinking of just using the built in sort() but was not sure if I would be able to use it if I am sorting 3 different times. I think I would have to sort backwards? (last_nm, then gpa, then gender)
You can return a tuple from your key function to create complex sorts. And as a quick trick, multiply numeric values by -1 for a reverse sort. If the rows were read from a file, the fields are strings, so convert the GPA to a float first. Your example would look something like this:
lists.sort(key=lambda x: (x[2], -float(x[6]), x[1]))
The list sort() method takes a boolean parameter reverse, but it applies to the whole key; you can't say that you want some parts of the key to use ascending sort and others to use descending. Sadly, there isn't a simple way to extend g.d.d.c's trick of multiplying by -1 to non-numeric data.
So if you need to handle arbitrary combinations of ascending and descending then yes, you will have to sort multiple times, working backwards over your list of keys, like you mention in your question. The built-in Python sorting algorithm, timsort, is a stable sort, which means each time you sort your 2D list with a different key the previous sort results won't get scrambled.
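The multi-pass approach can be sketched like this, using the data and key file from the question. The names sort_rows and CASTS are made up, and the sketch assumes the key file only uses the types "string" and "float" and the directions "ascend" and "descend", as in the example.

```python
import csv
import io

DATA = """\
first_nm,last_nm,gender,cwid,cred_hrs,qual_pts,gpa
John,Roe,M,44444444,40,150,3.75
Jane,Roe,F,66666666,100,260,2.6
John,Doe,M,22222222,50,140,2.8
Jane,Doe,F,88888888,80,280,3.5
Penny,Lowe,F,55555555,40,140,3.5
Lenny,Lowe,M,11111111,100,280,2.8
Denny,Lowe,M,99999999,80,260,3.25
Benny,Lowe,M,77777777,120,90,0.75
Jenny,Lowe,F,33333333,50,90,1.8
Zoe,Coe,F,0,50,130,2.6
"""

KEYS = """\
gender,ascend,string
gpa,descend,float
last_nm,ascend,string
"""

# Assumed mapping from the type names in the key file to converters.
CASTS = {"string": str, "float": float}

def sort_rows(header, rows, key_specs):
    """Sort rows in place; key_specs lists the most significant key first."""
    # Apply the least significant key first; each later stable sort keeps
    # the earlier order among rows that tie on its own key.
    for column, direction, kind in reversed(key_specs):
        index = header.index(column)
        cast = CASTS[kind]
        rows.sort(key=lambda row: cast(row[index]), reverse=(direction == "descend"))
    return rows

header, *rows = csv.reader(io.StringIO(DATA))
key_specs = list(csv.reader(io.StringIO(KEYS)))
sort_rows(header, rows, key_specs)
first_names = [row[0] for row in rows]
```

This reproduces the row order shown in the question's expected output. Passing reverse=True keeps the sort stable, so rows with equal keys retain their order from the previous pass.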

Python sorting with "key" function insufficiencies

On the one hand, it is easy to see that, given a key function, one can implement a sort that does the same thing using a compare function. The reduction (for numeric keys) is as follows:
def compare(x, y):
    return key(x) - key(y)
On the other hand, how do we know for sure that we are not losing potential sortings by restricting every kind of sort to a mapping of elements through key? For instance, suppose I want to sort a list of length-2 tuples (x, y), and I insist on the following compare method:
def compare(tup1, tup2):
    if tup1[1] < tup2[0]:
        return -1
    if tup1[0] % 2 == 0:
        return 1
    if tup1[0] - tup2[1] < 4:
        return 0
    else:
        return 1
Now tell me how to translate this compare into a corresponding "key" function such that my sorting algorithm proceeds the same way. This is not a contrived example, as these kinds of customised sorting show up in symmetry-breaking algorithms during a search and are very important.
Use functools.cmp_to_key. It wraps your compare function in a key object, so the sort performs the same comparisons your compare function would. The source for this function can be found in Python's Sorting HOW TO document.
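A short sketch of how cmp_to_key is used. The compare function here is a deliberately simple, consistent one made up for the example, not the one from the question.

```python
from functools import cmp_to_key

def compare(a, b):
    """Order pairs by their sum; on equal sums, larger first element comes first."""
    if sum(a) != sum(b):
        return sum(a) - sum(b)
    return b[0] - a[0]

pairs = [(1, 4), (3, 0), (2, 3), (0, 3)]
ordered = sorted(pairs, key=cmp_to_key(compare))
print(ordered)  # [(3, 0), (0, 3), (2, 3), (1, 4)]
```

cmp_to_key does not repair an inconsistent compare function. If the comparisons contradict each other, the call still runs, but the resulting order is not meaningful.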
