How do I implement a Schwartzian Transform in Python?

In Perl I sometimes use the Schwartzian Transform to efficiently sort complex arrays:
@sorted = map  { $_->[0] }              # sort by word length
          sort { $a->[1] <=> $b->[1] }  # use numeric comparison
          map  { [$_, length($_)] }     # calculate the length of the string
          @unsorted;
How can I implement this transform in Python?

You don't need to. Python has this feature built in, and in fact Python 3 removed C-style custom comparisons because this is so much better in the vast majority of cases.
To sort by word length:
unsorted.sort(key=lambda item: len(item))
Or, because len is already a unary function:
unsorted.sort(key=len)
This also works with the built-in sorted function.
If you want to sort on multiple criteria, you can take advantage of the fact that tuples sort lexicographically:
# sort by word length, then alphabetically in case of a tie
unsorted.sort(key=lambda item: (len(item), item))

While there should normally be no reason not to use the key argument for the sorted function or list.sort method, you can of course do without it, by creating a list of pairs (called tmp below) where the first item is the sort key and the second item is the original item.
Due to lexicographical sorting, sorting this list will sort by the key first. Then you can take the items in the desired order from the sorted tmp list of pairs.
example = ["hello", "spam", "foo"]
tmp = []
for item in example:
    sort_key = len(item)
    tmp.append((sort_key, item))
# tmp: [(5, "hello"), (4, "spam"), (3, "foo")]
tmp.sort()
# tmp: [(3, "foo"), (4, "spam"), (5, "hello")]
result = []
for _, item in tmp:
    result.append(item)
# result: ["foo", "spam", "hello"]
Note that usually this would be written with list comprehensions instead of calling .append in a loop, but the purpose of this answer is to illustrate the underlying algorithm in a way most likely to be understood by beginners.
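For comparison, here is a minimal sketch of the same decorate, sort, undecorate steps written with list comprehensions. The name `decorated` is only illustrative. Note that, unlike `sorted(example, key=len)`, ties in length fall through to comparing the items themselves.

```python
example = ["hello", "spam", "foo"]

# Decorate: pair each item with its sort key (here, its length).
decorated = [(len(item), item) for item in example]

# Sort: tuples compare element by element, so the key decides first.
decorated.sort()

# Undecorate: drop the key and keep the original items.
result = [item for _, item in decorated]

print(result)  # ['foo', 'spam', 'hello']
```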

Related

What's the best way to sort a list while also keeping its original index?

l = ['0.40165794', '0.43157488', '0.65739065', '0.5142521']
I want to transform the list into the following form:
l = [('0.65739065', 2), ('0.5142521', 3), ('0.43157488', 1), ('0.40165794', 0)]
I can first create a dict to store the value-index pairs, then sort l in descending order, then for each sorted element look up the dict to get the index, and finally compose the list of tuples.
This seems quite complicated. Is there a better way to do that?
One way to do this is with enumerate() to generate the indexes and then use key to specify how to sort:
s = sorted(enumerate(l), key=lambda x: x[1])
print(s)
This sorts in ascending order. Descending order, and swapping each (index, value) pair if desired, are left as exercises for the reader.
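For readers who want to check their result, one possible sketch follows. It assumes the strings should be ordered by their numeric value, so it converts them with `float` inside the key function; that conversion is an addition not present in the answer above.

```python
l = ['0.40165794', '0.43157488', '0.65739065', '0.5142521']

# Sort (index, value) pairs by numeric value, largest first.
pairs = sorted(enumerate(l), key=lambda pair: float(pair[1]), reverse=True)

# Swap each pair to (value, index) to match the form asked for.
result = [(value, index) for index, value in pairs]

print(result)
# [('0.65739065', 2), ('0.5142521', 3), ('0.43157488', 1), ('0.40165794', 0)]
```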

Assistance with Python 'sort(key=None)'

I'm having a hard time understanding why my function is not returning the reversed version of my list. I've spent a long time on this and have hit a wall: it only returns my list in ascending order.
letters = 'abcdefghijk'
numbers = '123456'
dict1 = {}
def reverseOrder(listing):
    lst2 = []
    lst2.append(listing)
    lst2.sort(reverse=True)
    return lst2
for l, n in zip(letters, numbers):
    dict1.update({l:n})
lst1 = list(dict1) + list(dict1.values())
lst1.sort(key=reverseOrder)
print(lst1)
The key function passed to list.sort has a very specific purpose:
key specifies a function of one argument that is used to extract a comparison key from each list element (for example, key=str.lower). The key corresponding to each item in the list is calculated once and then used for the entire sorting process. The default value of None means that list items are sorted directly without calculating a separate key value.
So the function is supposed to take in a single list element, and then return a key that determines its sorting compared to the other elements.
For example, if you wanted to sort a list by the length of its items, you could do it like this:
def lengthOfItem(item):
    return len(item)
lst.sort(key=lengthOfItem)
Because the function takes only a single item, it is unsuitable for orderings that require comparing two elements against each other. Such pairwise comparison sorts are considerably slower (in Python 3 they require wrapping the comparison function with functools.cmp_to_key), so you should avoid them where a key function will do.
In your case, it seems like you want to reverse your list. In that case you can just use list.reverse().
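To see why the original code sorts in ascending order: the key function wraps each element in a one-item list, and sorting a one-item list changes nothing, so the keys compare exactly like the elements themselves. A minimal sketch, assuming the goal is simply a descending sort of `lst1`:

```python
letters = 'abcdefghijk'
numbers = '123456'

# zip stops at the shorter input, so only six pairs are produced.
dict1 = dict(zip(letters, numbers))
lst1 = list(dict1) + list(dict1.values())

# reverse=True sorts in descending order; no key function is needed
# because the strings can be compared directly.
lst1.sort(reverse=True)

print(lst1)
# ['f', 'e', 'd', 'c', 'b', 'a', '6', '5', '4', '3', '2', '1']
```

Note that `list.reverse()` only flips the current order, while `sort(reverse=True)` orders the elements from largest to smallest.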
You are using the sort function incorrectly.
Here is the definition of the sort function (from builtins.py):
def sort(self, key=None, reverse=False): # real signature unknown; restored from __doc__
    """ L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE* """
    pass
The key argument is useful when there is more than one sensible way to order the items, e.g. when the items are tuples or dictionaries.
Example:
lst = [(1, 2), (2, 1)]
lst.sort(key=lambda x: x[0]) # lst = [(1, 2), (2, 1)]
lst.sort(key=lambda x: x[1]) # lst = [(2, 1), (1, 2)]
Not quite sure what you want with this part though:
for l, n in zip(letters, numbers):
    dict1.update({l:n})
lst1 = list(dict1) + list(dict1.values())
It seems like you want a list of all the numbers and letters, but you are building it in an odd way.

How can I compare two lists in python and return matches by email

I want to match the emails in both lists and put the results into a new list:
a = [('abc#gmail.com',5),('xyz#gmail.com',6),('pqr#gmail.com',8)]
b = [('ABC','abc#gmail.com'),('XYZ','xyz#gmail.com'),('PQR','pqr#gmail.com')]
would return [('ABC',5),('XYZ',6),('PQR',8)], for instance.
Looking up each item in an unordered list is O(n), so a list is not an ideal data structure for this.
It would be better to convert the list you use for lookups into a dictionary:
d_a = dict(a)
after which the lookup is both efficient and elegant:
>>> [(key, d_a[value]) for key, value in b if value in d_a]
[('ABC', 5), ('XYZ', 6), ('PQR', 8)]
You should also consider the negative case, where a lookup key has no match in the lookup list.
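As a rough sketch of that negative case, the example below swaps the last entry of b for a hypothetical `('NEW', 'new#gmail.com')` that has no counterpart in a. The names `matched` and `with_missing` are illustrative only.

```python
a = [('abc#gmail.com', 5), ('xyz#gmail.com', 6), ('pqr#gmail.com', 8)]
# The last entry is hypothetical: its email does not appear in a.
b = [('ABC', 'abc#gmail.com'), ('XYZ', 'xyz#gmail.com'), ('NEW', 'new#gmail.com')]

d_a = dict(a)

# Option 1: keep only the names whose email appears in a.
matched = [(name, d_a[email]) for name, email in b if email in d_a]

# Option 2: keep every name and use None where no match exists.
with_missing = [(name, d_a.get(email)) for name, email in b]

print(matched)       # [('ABC', 5), ('XYZ', 6)]
print(with_missing)  # [('ABC', 5), ('XYZ', 6), ('NEW', None)]
```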
Sorting both lists and using a list comprehension:
a = [('abc#gmail.com',5),('xyz#gmail.com',6),('pqr#gmail.com',8)]
b = [('ABC','abc#gmail.com'),('XYZ','xyz#gmail.com'),('PQR','pqr#gmail.com')]
result = [(y[0], x[1]) for x, y in zip(sorted(a, key=lambda s: s[0]), sorted(b, key=lambda s: s[1])) if x[0] == y[1]]
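List a is sorted on the first element (s[0]) of each tuple.
List b is sorted on the second element (s[1]) of each tuple.
Because zip pairs the sorted lists by position, this only finds every match when both lists contain the same set of emails.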

Python build one dictionary from a list of keys, and a list of lists of values

So I have a list of keys:
keys = ['id','name', 'date', 'size', 'actions']
and I also have a list of lists of values:
values = [
    ['1','John','23-04-2015','0','action1'],
    ['2','Jane','23-04-2015','1','action2']
]
How can I build a dictionary with those keys matched to the values?
The output should be:
{
    'id':['1','2'],
    'name':['John','Jane'],
    'date':['23-04-2015','23-04-2015'],
    'size':['0','1'],
    'actions':['action1','action2']
}
EDIT:
I tried to use zip() and dict(), but that would only work if the list of values had 1 list, i.e. values = [['1','John','23-04-2015','0','action1']]
for list in values:
    dic = dict(zip(keys,list))
I also thought about initialising a dic with the keys, then building the list of values on my own, but I felt that there had to be an easier way to do it.
dic = dict.fromkeys(keys)
for list in values:
    ids = list[0]
    names = list[1]
    dates = list[2]
    sizes = list[3]
    actions = list[4]
and then finally
dic['id'] = ids
dic['name'] = names
dic['date'] = dates
dic['size'] = sizes
dic['action'] = actions
This seemed really silly and I was wondering what a better way of doing it would be.
>>> keys = ['id','name', 'date', 'size', 'actions']
>>> values = [['1','John','23-04-2015','0','action1'], ['2','Jane','23-04-2015','1','action2']]
>>> c = {x:list(y) for x,y in zip(keys, zip(*values))}
>>> c
{'id': ['1', '2'], 'size': ['0', '1'], 'actions': ['action1', 'action2'], 'date': ['23-04-2015', '23-04-2015'], 'name': ['John', 'Jane']}
>>> print(*(': '.join([item, ', '.join(c.get(item))]) for item in sorted(c, key=lambda x: keys.index(x))), sep='\n')
id: 1, 2
name: John, Jane
date: 23-04-2015, 23-04-2015
size: 0, 1
actions: action1, action2
This uses several tools:
c is created with a dictionary comprehension. Comprehensions are a different way of expressing an iterable like a dictionary or a list. Instead of initializing an empty iterable and then using a loop to add elements to it, a comprehension moves these syntactical structures around.
result = [2*num for num in range(10) if num%2]
is equivalent to
result = []
for num in range(10):
    if num%2: # shorthand for "if num%2 results in non-zero", or "if num is not divisible by 2"
        result.append(2*num)
and we get [2, 6, 10, 14, 18].
zip() returns an iterator of tuples, where each element of each tuple is the corresponding element of one of the arguments you passed to zip().
>>> list(zip(['a','b'], ['c','d']))
[('a', 'c'), ('b', 'd')]
zip() takes multiple arguments - if you pass it one large list containing smaller sublists, the result is different:
>>> list(zip([['a','b'], ['c','d']]))
[(['a', 'b'],), (['c', 'd'],)]
and generally not what we want. However, our values list is just such a list: a large list containing sublists. We want to zip() those sublists. This is a great time to use the * operator.
The * operator represents an "unpacked" iterable.
>>> print(*[1,2,3])
1 2 3
>>> print(1, 2, 3)
1 2 3
It is also used in function definitions:
>>> def func(*args):
...     return args
...
>>> func('a', 'b', [])
('a', 'b', [])
So, to create the dictionary, we zip() the lists of values together, then zip() that with the keys. Then we iterate through each of those tuples and create a dictionary out of them, with each tuple's first item being the key and the second item being the value (cast as a list instead of a tuple).
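The steps just described can be sketched one at a time, with the intermediate results named. The names `columns` and `pairs` are illustrative and do not appear in the one-line version above.

```python
keys = ['id', 'name', 'date', 'size', 'actions']
values = [
    ['1', 'John', '23-04-2015', '0', 'action1'],
    ['2', 'Jane', '23-04-2015', '1', 'action2'],
]

# Step 1: unpack the rows into zip() to group them column by column.
columns = list(zip(*values))
# columns[0] is ('1', '2'), columns[1] is ('John', 'Jane'), and so on.

# Step 2: pair each key with its column.
pairs = list(zip(keys, columns))
# pairs[0] is ('id', ('1', '2')).

# Step 3: build the dictionary, converting each column tuple to a list.
c = {key: list(column) for key, column in pairs}

print(c['name'])  # ['John', 'Jane']
```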
To print this, we could make a large looping structure, or we can make generators (quicker to assemble and process than full data structures like a list) and iterate through them, making heavy use of * to unpack things. Remember, in Python 3, print can accept multiple arguments, as seen above.
We will first sort the dictionary, using each element's position in keys as the key. If we use something like key=len, that sends each element to the len() function and uses the returned length as the key. We use lambda to define an inline, unnamed function, that takes an argument x and returns x's index in the list of keys. Note that the dictionary isn't actually sorted; we're just setting it up so we can iterate through it according to a sort order.
Then we can go through this sorted dictionary and assemble its elements into printable strings. At the top level, we join() a key with its value separated by ': '. Each value has its elements join()ed with ', '. Note that if the elements weren't strings, we would have to turn them into strings for join() to work.
>>> list(map(str, [1,2,3]))
['1', '2', '3']
>>> print(*map(str, [1,2,3]))
1 2 3
The generator that yields each of these join()ed lines is then unpacked with the * operator, and each element is sent as an argument to print(), specifying a separator of '\n' (new line) instead of the default ' ' (space).
It's perfectly fine to use loops instead of comprehensions and *, and then rearrange them into such structures after your logic is functional, if you want. It's not particularly necessary most of the time. Comprehensions sometimes execute slightly faster than equivalent loops, and with practice you may come to prefer the syntax of comprehensions. Do learn the * operator, though - it's an enormously versatile tool for defining functions. Also look into ** (often referred to with "double star" or "kwargs"), which unpacks dictionaries into keyword arguments and can also be used to define functions.
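As a small illustration of `**`, the sketch below uses a hypothetical function `describe` and a made-up `options` dictionary to show both directions: collecting extra keyword arguments in a definition and unpacking a dictionary at the call site.

```python
def describe(name, size=0, **extra):
    # extra collects any keyword arguments not named in the signature.
    return name, size, extra

options = {'name': 'John', 'size': 1, 'date': '23-04-2015'}

# ** unpacks the dictionary into keyword arguments at the call site.
result = describe(**options)

print(result)  # ('John', 1, {'date': '23-04-2015'})
```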

How does the enumerate function work?

I am supposed to do the following:
Define a function my_enumerate(items) that behaves in a similar way to the built-in enumerate function. It should return a list of pairs (i, item) where item is the ith item, with 0 origin, of the list items (see the examples below). Check the test cases for how the function should work. Your function must not call Python's in-built enumerate function.
Examples:
Input:
ans = my_enumerate([10, 20, 30])
print(ans)
Output:
[(0, 10), (1, 20), (2, 30)]
What does enumerate do? Try expressing it in English, and it may help you understand how to write the necessary code. Even if it doesn't, the practice of turning English-language descriptions into code will be useful.
One way of describing enumerate is to say it iterates over each item in the list, and for each item in the input list it produces a pair of the item's index in the input list and the item.
So we know we need to iterate over the list:
for item in input_list:
    pass
And we need to keep track of the index of the current item:
index = 0
for item in input_list:
    index += 1
Hmm, there's a better way of doing that:
for index in range(len(input_list)):
    pass
Now to produce the pairs:
for index in range(len(input_list)):
    pair = index, input_list[index]
You also need somewhere to store these pairs:
def my_enumerate(input_list):
    output_list = []
    for index in range(len(input_list)):
        pair = index, input_list[index]
        output_list.append(pair)
    return output_list
Are there other ways to write code that produces the same output? Yes. Is this the best way to write this function? Not by a long shot. What this exercise should help you with is turning your thoughts into code. As you gain more experience doing that, you can combine multiple steps at a time and start using more complicated programming concepts.
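As one example of combining those steps, the loop can be condensed into a list comprehension. This is only a sketch of one alternative; it assumes `items` supports `len()` and indexing, as a list does.

```python
def my_enumerate(items):
    # Pair each index with the item at that index.
    return [(index, items[index]) for index in range(len(items))]

print(my_enumerate([10, 20, 30]))  # [(0, 10), (1, 20), (2, 30)]
```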
Use itertools.count and zip:
from itertools import count
def my_enumerate(values):
    return list(zip(count(), values))
