Custom sort function in Python 3 [duplicate]

In Python 2.x, I could pass a custom function to the sorted and .sort functions:
>>> x=['kar','htar','har','ar']
>>>
>>> sorted(x)
['ar', 'har', 'htar', 'kar']
>>>
>>> sorted(x,cmp=customsort)
['kar', 'htar', 'har', 'ar']
Because, in my language, consonants come in this order:
"k", "kh", ..., "ht", ..., "h", ..., "a"
But in Python 3.x, it looks like I cannot pass the cmp keyword:
>>> sorted(x,cmp=customsort)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'cmp' is an invalid keyword argument for this function
Are there any alternatives, or should I write my own sorted function too?
Note: I simplified by using "k", "kh", etc. The actual characters are Unicode and even more complicated; sometimes vowels come before and after consonants. I've already written a custom comparison function, so that part is OK. The only problem is that I cannot pass my custom comparison function to sorted or .sort.

Use the key keyword and functools.cmp_to_key to transform your comparison function:
sorted(x, key=functools.cmp_to_key(customsort))
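For completeness, a runnable sketch of that call. The customsort below is only a stand-in (it simply reverses the default lexicographic order, which happens to give the desired output for this sample); the real comparator would encode the language's alphabet:
import functools

def customsort(a, b):
    # hypothetical stand-in comparator: negative means a sorts before b
    return (a < b) - (a > b)   # reverse of the default string order

x = ['kar', 'htar', 'har', 'ar']
print(sorted(x, key=functools.cmp_to_key(customsort)))
# ['kar', 'htar', 'har', 'ar']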

Use the key argument (and follow the recipe on how to convert your old cmp function to a key function).
functools has a function cmp_to_key mentioned at docs.python.org/3.6/library/functools.html#functools.cmp_to_key

A complete Python 3 cmp_to_key lambda example:
from functools import cmp_to_key
nums = [28, 50, 17, 12, 121]
nums.sort(key=cmp_to_key(lambda x, y: 1 if str(x)+str(y) < str(y)+str(x) else -1))
compare to common object sorting:
class NumStr:
    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        return self.v + other.v < other.v + self.v

A = [NumStr("12"), NumStr("121")]
A.sort()
print(A[0].v, A[1].v)

A = [obj.v for obj in A]
print(A)

Instead of a customsort(), you need a function that translates each word into something that Python already knows how to sort. For example, you could translate each word into a list of numbers where each number represents where each letter occurs in your alphabet. Something like this:
my_alphabet = ['a', 'b', 'c']

def custom_key(word):
    numbers = []
    for letter in word:
        numbers.append(my_alphabet.index(letter))
    return numbers

x = ['cbaba', 'ababa', 'bbaa']
x.sort(key=custom_key)
Since your language includes multi-character letters, your custom_key function will obviously need to be more complicated. That should give you the general idea though.
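For example, here is one sketch of how the multi-character case could be handled, using a hypothetical alphabet and greedy longest-match tokenization (the alphabet below is made up for illustration):
my_alphabet = ['k', 'kh', 'ht', 'h', 'a', 'r']   # hypothetical ordering

def custom_key(word):
    numbers = []
    i = 0
    while i < len(word):
        # try the longest letters first so 'kh' wins over 'k'
        for letter in sorted(my_alphabet, key=len, reverse=True):
            if word.startswith(letter, i):
                numbers.append(my_alphabet.index(letter))
                i += len(letter)
                break
        else:
            raise ValueError('unknown letter at position %d in %r' % (i, word))
    return numbers

x = ['khar', 'kar', 'har', 'ar']
x.sort(key=custom_key)
print(x)   # ['kar', 'khar', 'har', 'ar'] with the hypothetical alphabet above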

I don't know if this will help, but you may check out the locale module. It looks like you can set the locale to your language and use locale.strcoll to compare strings using your language's sorting rules.
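A rough sketch of that idea; whether it matches the desired order depends entirely on which collation locales are installed on the system, and 'my_MM.UTF-8' below is only a placeholder name:
import functools
import locale

locale.setlocale(locale.LC_COLLATE, 'my_MM.UTF-8')   # hypothetical locale name
x = ['kar', 'htar', 'har', 'ar']
print(sorted(x, key=locale.strxfrm))                        # key-based
print(sorted(x, key=functools.cmp_to_key(locale.strcoll)))  # comparison-based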

Use the key argument instead. It takes a function that takes the value being processed and returns a single value giving the key to use to sort by.
sorted(x, key=somekeyfunc)
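For instance, a trivial key function (unrelated to the custom alphabet) that sorts case-insensitively:
x = ['b', 'A', 'c']
print(sorted(x, key=str.lower))   # ['A', 'b', 'c']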

Related

What does a single (not double) asterisk * mean when unpacking a dictionary in Python?

Can anyone explain the difference between unpacking a dictionary with a single asterisk and with a double asterisk? You can mention their difference when used in function parameters, but only if it is relevant here, which I don't think it is.
However, there may be some relevance, because they share the same asterisk syntax.
def foo(a, b):
    return a + b

tmp = {1: 2, 3: 4}
foo(*tmp)   # you get 4
foo(**tmp)  # TypeError: keywords must be strings. Why does it bother to check the type of the keyword?
Besides, why are dictionary keys not allowed to be non-strings when passed as function arguments in THIS situation? Are there any exceptions? Why was Python designed this way; is it because the compiler can't deduce the types here or something?
When dictionaries are iterated like lists, the iteration goes over the keys; for example
for key in tmp:
    print(key)
is the same as
for key in tmp.keys():
    print(key)
In this case, unpacking with *tmp is equivalent to *tmp.keys(), ignoring the values. If you want to use the values you can use *tmp.values().
The double asterisk is used when you define a function with keyword parameters, such as
def foo(a, b):
or
def foo(**kwargs):
Here you can store the parameters in a dictionary and pass it as **tmp. In the first case the keys must be strings matching the parameter names in the function signature. In the second case you can work with kwargs as a dictionary inside the function.
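A minimal sketch of both call forms, using string keys so that the ** call is legal:
def foo(a, b):
    return a + b

tmp = {'a': 1, 'b': 2}
print(foo(*tmp))    # foo('a', 'b') -> 'ab'  (only the keys are unpacked)
print(foo(**tmp))   # foo(a=1, b=2) -> 3     (keys become keyword names)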
def foo(a, b):
    return a + b

tmp = {1: 2, 3: 4}
foo(*tmp)   # you get 4
foo(**tmp)
In this case:
foo(*tmp) means foo(1, 3)
foo(**tmp) means foo(1=2, 3=4), which will raise an error since 1 can't be an argument name. Argument names must be strings and (thanks @Alexander Reynolds for pointing this out) must start with an underscore or an alphabetical character; in other words, an argument name must be a valid Python identifier. This means you can't even do something like this:
def foo(1=2, 3=4):
    <your code>
or
def foo('1'=2, '3'=4):
    <your code>
See python_basic_syntax for more details.
It is an example of Extended Iterable Unpacking.
>>> def add(a=0, b=0):
...     return a + b
...
>>> d = {'a': 2, 'b': 3}
>>> add(**d)  # corresponding to add(a=2, b=3)
5
For a single *:
>>> def add(a=0, b=0):
...     return a + b
...
>>> d = {'a': 2, 'b': 3}
>>> add(*d)  # corresponding to add(a='a', b='b')
'ab'
Learn more here.
I think the ** double asterisk in function parameters and dictionary unpacking can be understood intuitively in this way:
# suppose you have this function
def foo(a, **b):
    print(a)
    for x in b:
        print(x, "...", b[x])

# suppose you call this function in the following form
foo(whatever, m=1, n=2)

# the m=1 syntax actually means assign parameter by name, like foo(a=whatever, m=1, n=2)
# so you can also do foo(whatever, **{"m": 1, "n": 2})
# the reason for this syntax is that what actually happens is
#   **b is m=1, n=2   (something like a pattern-matching mechanism)
# so b is {"m": 1, "n": 2}; note "m" and "n" are now in string form
# the function is actually this:
def foo(a, **b):  # b = {"m": 1, "n": 2}
    print(a)
    for x in b:  # for x in b.keys(), thanks to @vlizana's answer
        print(x, "...", b[x])
All the syntax makes sense now. It is the same for the single asterisk; it is only worth noting that if you use a single asterisk to unpack a dictionary, you are actually unpacking it the way you would a list, and only the keys of the dictionary are unpacked.
[https://docs.python.org/3/reference/expressions.html#calls]
A consequence of this is that although the *expression syntax may appear after explicit keyword arguments, it is processed before the keyword arguments (and any **expression arguments – see below). So:
def f(a, b):
    print(a, b)

f(b=1, *(2,))
f(a=1, *(2,))
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: f() got multiple values for keyword argument 'a'
f(1, *(2,))

Converting list to dict python

Given a list:
source = [{'A':123},{'B':234},{'C':345}]
I need a dict from this list in the format:
newDict = {'A':123,'B':234,'C':345}
What is the syntactically cleanest way to accomplish this?
Use a dict-comprehension:
>>> l = [{'A':123},{'B':234},{'C':345}]
>>> d = {k: v for dct in l for k, v in dct.items()}
>>> d
{'A': 123, 'B': 234, 'C': 345}
However, it's probably a matter of opinion whether that's the "syntactically cleanest way", but I like it.
Here's an additional approach, provided here to give you a flavor for how Python implements the functional programming technique called reduction, via the reduce() function. In Python 3, reduce() is in the functools package. In Python 2, reduce() is a built-in function. I use Python 3 in the example below:
from functools import reduce  # don't import if you are using Python 2

def updater(dict_orig, dict_add):
    dict_orig.update(dict_add)
    return dict_orig

new_dict = reduce(updater, l, dict())
The first argument to reduce() is the function to operate on the iterable, the second is the iterable itself (your list l), and the third is the optional initializer object to put at the beginning of the list to reduce.
Each step of the reduction requires an object to be operated on: namely, the result of the previous step. But dict.update() does not return anything, so we need the updater() function above, which performs the update and then returns the dict being updated, thus providing the required object for the next step. Were it not for dict.update() not having a return value, this would all be a one-liner.
Because dict.update() operates directly on the original dict, we need that optional empty dict() initializer object to start out the reduction - without it, the first dict in your original l list would be modified.
For all these reasons, I like @MSeifert's dict-comprehension approach much better, but I posted this anyway just to illustrate Python reduction for you.
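As an aside, on Python 3.5+ you can get that one-liner after all by building a new dict at each step with dict unpacking instead of mutating one in place (a sketch that trades a little efficiency for brevity):
from functools import reduce

l = [{'A': 123}, {'B': 234}, {'C': 345}]
new_dict = reduce(lambda acc, d: {**acc, **d}, l, {})
print(new_dict)   # {'A': 123, 'B': 234, 'C': 345}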
If you use it often, you might want to define a merge function, which you can then pass to reduce:
from functools import reduce  # needed for Python 3

source = [{'A':123},{'B':234},{'C':345}]

def merge(a, b):
    d = a.copy()
    d.update(b)
    return d

print(reduce(merge, source))
#=> {'A': 123, 'C': 345, 'B': 234}

TypeError: 'cmp' is an invalid keyword argument for this function

I'm using Python 3, but the script is not compatible with this version and I hit some errors. Now I have a problem with the cmp parameter. Here is the code:
def my_cmp(x, y):
    counter = lambda x, items: reduce(lambda a, b: a + b, [list(x).count(xx) for xx in items])
    tmp = cmp(counter(x, [2,3,4,5]), counter(y, [2,3,4,5]))
    return tmp if tmp != 0 else cmp(len(x), len(y))

for i, t in enumerate([tmp[0] for tmp in sorted(zip(tracks, self.mapping[idx][track_selection[-1]].iloc[0]), cmp=my_cmp, key=lambda x: x[1])]):
    img[i, :len(t)] = t
I would really appreciate any help how to deal with this error in Python3.
From the Python documentation:
In Python 2.7, the functools.cmp_to_key() function was added to the
functools module.
The function is available in Python 3 too.
Just wrap your cmp function with cmp_to_key
from functools import cmp_to_key
...
...key=cmp_to_key(my_cmp)...
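Put together, a self-contained sketch of that idea on dummy data (a sketch only: the cmp() builtin and the bare reduce are gone in Python 3, so a small _cmp helper and sum() stand in for them here):
from functools import cmp_to_key

def _cmp(a, b):
    # stand-in for the cmp() builtin that was removed in Python 3
    return (a > b) - (a < b)

def my_cmp(x, y):
    # same logic as the question, with sum() replacing the bare reduce()
    counter = lambda s, items: sum(list(s).count(i) for i in items)
    tmp = _cmp(counter(x, [2, 3, 4, 5]), counter(y, [2, 3, 4, 5]))
    return tmp if tmp != 0 else _cmp(len(x), len(y))

data = [[2, 3], [1, 1, 1], [4, 5, 6]]
print(sorted(data, key=cmp_to_key(my_cmp)))
# [[1, 1, 1], [2, 3], [4, 5, 6]]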
You should try to rewrite your cmp function to a key function instead. In this case it looks like you can simply return the counter() function output for just one element:
def my_key(elem):
    counter = lambda x, items: sum(list(x).count(xx) for xx in items)
    return counter(elem, [2, 3, 4, 5]), len(elem)
I took the liberty of replacing the reduce(...) code with the sum() function, a far more compact and readable method to sum a series of integers.
The above, too, will sort first by the output of counter() and then by the length of each element in case of a tie.
The counter function is hugely inefficient however; I'd use a Counter() class here instead:
from collections import Counter

def my_key(elem):
    counter = lambda x, items: sum(Counter(i for i in x if i in items).values())
    return counter(elem, {2, 3, 4, 5}), len(elem)
This function will work in both Python 2 and 3:
sorted(zip(tracks, self.mapping[idx][track_selection[-1]].iloc[0]),
       key=lambda x: my_key(x[1]))
If you cannot, you can use the cmp_to_key() utility function to adapt your cmp argument, but take into account this is not an ideal solution (it affects performance).
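Since the original call also passes key=lambda x: x[1] to pull out the value to compare, the two can be combined by applying the cmp_to_key wrapper inside your own key function. A rough sketch, assuming my_cmp has already been made Python 3 compatible (as in the snippet further up):
from functools import cmp_to_key

wrap = cmp_to_key(my_cmp)   # build the wrapper once, outside the lambda
pairs = [('a', [2, 3]), ('b', [1, 1, 1]), ('c', [4, 5, 6])]
print(sorted(pairs, key=lambda p: wrap(p[1])))
# [('b', [1, 1, 1]), ('a', [2, 3]), ('c', [4, 5, 6])]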

How to use Python generator expressions to create a one-liner that runs a function multiple times and gets a list output

I am wondering if there is a simple Pythonic way (maybe using generators) to run a function over each item in a list and get back a list of the return values.
Example:
def square_it(x):
    return x * x

x_set = [0, 1, 2, 3, 4]
squared_set = square_it(x for x in x_set)
I notice that when I do a line by line debug on this, the object that gets passed into the function is a generator.
Because of this, I get an error:
TypeError: unsupported operand type(s) for *: 'generator' and 'generator'
I understand that this generator expression created a generator to be passed into the function, but I am wondering if there is a cool way to accomplish running the function multiple times only by specifying an iterable as the argument? (without modifying the function to expect an iterable).
It seems to me that this ability would be really useful to cut down on lines of code, because you would not need to create a loop to run the function and a variable to save the output in a list.
Thanks!
You want a list comprehension:
squared_set = [square_it(x) for x in x_set]
There's a builtin function, map(), for this common problem.
>>> map(square_it, x_set)
[0, 1, 4, 9, 16]  # On Python 2; on Python 3, map() returns an iterator, so use list(map(square_it, x_set)) to get a list.
Alternatively, one can use a generator expression, which is memory-efficient but lazy (meaning the values will not be computed now, only when needed):
>>> (square_it(x) for x in x_set)
<generator object <genexpr> at ...>
Similarly, one can also use a list comprehension, which computes all the values upon creation, returning a list.
Additionally, here's a comparison of generator expressions and list comprehensions.
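Side by side, all three forms produce the same values; a minimal sketch:
def square_it(x):
    return x * x

x_set = [0, 1, 2, 3, 4]

squares_list = [square_it(x) for x in x_set]   # list, computed immediately
squares_gen  = (square_it(x) for x in x_set)   # generator, computed lazily
squares_map  = map(square_it, x_set)           # iterator on Python 3

print(squares_list)        # [0, 1, 4, 9, 16]
print(list(squares_gen))   # [0, 1, 4, 9, 16]
print(list(squares_map))   # [0, 1, 4, 9, 16]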
You want to call the square_it function inside the generator, not on the generator.
squared_set = (square_it(x) for x in x_set)
As the other answers have suggested, I think it is best (most "pythonic") to call your function explicitly on each element, using a list or generator comprehension.
To actually answer the question though, you can wrap your function that operates over scalars with a function that sniffs the input and has different behavior depending on what it sees. For example:
>>> import types
>>> def scaler_over_generator(f):
...     def wrapper(x):
...         if isinstance(x, types.GeneratorType):
...             return [f(i) for i in x]
...         return f(x)
...     return wrapper
>>> def square_it(x):
...     return x * x
>>> square_it_maybe_over = scaler_over_generator(square_it)
>>> square_it_maybe_over(10)
100
>>> square_it_maybe_over(x for x in range(5))
[0, 1, 4, 9, 16]
I wouldn't use this idiom in my code, but it is possible to do.
You could also code it up with a decorator, like so:
>>> @scaler_over_generator
... def square_it(x):
...     return x * x
>>> square_it(x for x in range(5))
[0, 1, 4, 9, 16]
If you didn't want/need a handle to the original function.
Note that there is a difference between a list comprehension, which returns a list:
squared_set = [square_it(x) for x in x_set]
and a generator expression, which returns a generator that you can iterate over:
squared_set = (square_it(x) for x in x_set)
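The practical difference shows up when you consume the result (reusing square_it and x_set from the question):
squared_list = [square_it(x) for x in x_set]   # values exist right away
squared_gen = (square_it(x) for x in x_set)    # nothing computed yet

print(squared_list[2])     # indexable: 4
print(next(squared_gen))   # generators are consumed item by item: 0
print(list(squared_gen))   # remaining items: [1, 4, 9, 16]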

Python, how to make a function which takes a function as an argument along with two arrays?

For learning purposes, I'm trying to make a function using Python that takes in another function and two arrays as parameters and calls the function parameter on each index of each array parameter. So this should call add on a1[0] & a2[0], a1[1] & a2[1], etc. But all I'm getting back is a generator object. What's wrong?
def add(a, b):
    yield a + b

def generator(add, a1, a2):
    for i in range(len(a1)):
        yield add(a1[i], a2[i])

g = generator(add, a1, a2)
print g.next()
I've also tried replacing what I have for yield above with
yield map(add,a1[i],a2[i])
But that works even less. I get this:
TypeError: argument 2 to map() must support iteration
Your definition of add() is at least strange (I'm leaning towards calling it "wrong"). You should return the result, not yield it:
def add(a, b):
    return a + b
Now, your generator() will work, though
map(add, a1, a2)
is an easier and faster way to do (almost) the same thing. (If you want an iterator rather than a list, use itertools.imap() instead of map().)
You get a generator because your add is a generator. It should be just return a + b.
I'm trying to make a function using Python that takes in another function and two arrays as parameters and calls the function parameter on each index of each array parameter.
def my_function(func, array_1, array_2):
    for e_1, e_2 in zip(array_1, array_2):
        yield func(e_1, e_2)
Example:
def add(a, b):
    return a + b

for result in my_function(add, [1, 2, 3], [9, 8, 7]):
    print(result)
will print:
10
10
10
Now, a couple of notes:
The add function can be found in the operator module.
You see that I used zip; take a look at its doc.
Though what you may actually want is izip(), the generator version of zip(), which doesn't return a list but an iterator over the pairs.
my_function is almost like map(); the only difference is that my_function is a generator while map() gives you a list. Once again the stdlib gives you the generator version of map in the itertools module: imap().
Example: my_function is just like imap:
from operator import add
from itertools import imap

for result in imap(add, [1, 2, 3], [9, 8, 7]):
    print(result)
# 10
# 10
# 10
I obviously assume that the add function was just a quick example; otherwise, check the built-in sum.
As others have said, you are defining add incorrectly and it should return instead of yield. Also, you could import it:
from operator import add
The reason why this doesn't work:
yield map(add, a1[i], a2[i])
Is because map works on lists/iterables and not single values. If add were defined correctly this could work:
yield map(add, [a1[i]], [a2[i]])
But you shouldn't actually do that because it's more complicated than it needs to be for no good reason (as Sven Marnach's answer shows, your generator function is just an attempt to implement map so it really shouldn't use map even if it is a learning exercise). Finally, if the point is to make a function that takes a function as a parameter, I wouldn't call the parameter "add"; otherwise, what's the point of making it at all?
def generator(f, a1, a2):
    for x, y in zip(a1, a2):
        yield f(x, y)
Speaking of which, take a look at zip.
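For example, with add fixed to use return, calling it looks like this:
def add(a, b):
    return a + b

for value in generator(add, [1, 2, 3], [9, 8, 7]):
    print(value)
# 10
# 10
# 10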
