Pythonic way of sorting a list

For example, I have a list:
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
and I need to sort it based on the number, so that the result is:
['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
Is it possible to do this somehow with split()?

Use a key function:
lists.sort(key=lambda s: int(s.rsplit(None, 1)[-1]))
The key callable is passed each element of lists in turn, and the elements are sorted according to its return values. In this case we:
split once on whitespace, starting from the right
take the last element of the split
turn that into an integer
The argument to key can be any callable; a lambda is just more compact. You can try it out in the interactive interpreter:
>>> key_function = lambda s: int(s.rsplit(None, 1)[-1])
>>> key_function('ben 10')
10
>>> key_function('Guido van Rossum 42')
42
In effect, when sorting, the values are augmented with the return value of that function, and what is sorted is:
[(20, 0, 'jack 20'), (10, 1, 'ben 10'), (50, 2, 'alisdar 50'), (35, 3, 'ollie 35')]
instead (with the second value, the index of the element, added to keep the original order in case of equal sort keys).
Result:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> lists.sort(key=lambda s: int(s.rsplit(None, 1)[-1]))
>>> lists
['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
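The augmentation described above is often called decorate-sort-undecorate. A minimal sketch of doing it by hand, only to illustrate what key= saves you from writing (the names decorated and result are illustrative and not part of the answer):

```python
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']

# Decorate: pair each string with its numeric key and its original index.
decorated = [(int(s.rsplit(None, 1)[-1]), i, s) for i, s in enumerate(lists)]

# Sort: tuples compare element by element, so the number decides first
# and the index breaks ties, which preserves the original order.
decorated.sort()

# Undecorate: keep only the original strings.
result = [s for _, _, s in decorated]
print(result)
```

In practice lists.sort(key=...) does the same job without building the intermediate tuples yourself.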

Use a key function that does what you want:
lists.sort(key=lambda e: int(e.split()[1]))
If some of your items don't follow that format, you'll have to write something a little more elaborate.
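As a rough sketch of what "a little more elaborate" could look like, the key function below tolerates entries without a trailing number. The name sort_key, the sample entry 'nobody', and the choice to place malformed entries last are all assumptions made for illustration.

```python
def sort_key(entry):
    """Sort by trailing integer; entries without one go last, alphabetically."""
    parts = entry.rsplit(None, 1)
    try:
        return (0, int(parts[-1]), entry)
    except (ValueError, IndexError):
        # No usable number (or an empty string): group after numbered entries.
        return (1, 0, entry)

lists = ['jack 20', 'ben 10', 'nobody', 'alisdar 50', 'ollie 35']
lists.sort(key=sort_key)
print(lists)
```

Returning a tuple whose first element is a group marker keeps the comparison between integers and strings from ever happening, which matters in Python 3.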

It would be better if you had a more appropriate data type than a string to represent, say, a person's name and age. One way would be a dictionary:
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
d = dict(item.split(' ') for item in lists)
This constructs a dictionary from a stream of two-element lists.
Then you can sort like this:
print(sorted((v, k) for k, v in d.items()))
and get this:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> d = dict(item.split(' ') for item in lists)
>>> print(sorted((v, k) for k, v in d.items()))
[('10', 'ben'), ('20', 'jack'), ('35', 'ollie'), ('50', 'alisdar')]
Or you could convert age to integer:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> person_iter = (item.split(' ') for item in lists)
>>> d = {k: int(v) for k, v in person_iter}
>>> print(sorted((v, k) for k, v in d.items()))
[(10, 'ben'), (20, 'jack'), (35, 'ollie'), (50, 'alisdar')]
person_iter is a generator that produces pairs of name-age. You feed that to the dictionary comprehension and convert the second argument to an integer.
The basic idea, though, is that you will have an easier time if you use more precise data types for your purposes.
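One possible shape for that idea, sketched with a namedtuple. The Person type and its field names are hypothetical and not part of the answer above.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type for illustration.
Person = namedtuple('Person', ['name', 'age'])

lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']

# Parse once, up front; rsplit keeps names that contain spaces intact.
people = [Person(name, int(age))
          for name, age in (s.rsplit(None, 1) for s in lists)]

# Sorting is now a matter of naming the field to sort by.
people.sort(key=attrgetter('age'))
names = [p.name for p in people]
print(names)
```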

Related

How to sort a dictionary by value

I am trying to sort a dictionary by value, which is a timestamp in the format H:MM:SS (eg "0:41:42") but the code below doesn't work as expected:
album_len = {
    'The Piper At The Gates Of Dawn': '0:41:50',
    'A Saucerful of Secrets': '0:39:23',
    'More': '0:44:53', 'Division Bell': '1:05:52',
    'The Wall': '1:17:46',
    'Dark side of the moon': '0:45:18',
    'Wish you were here': '0:44:17',
    'Animals': '0:41:42'
}
album_len = OrderedDict(sorted(album_len.items()))
This is the output I get:
OrderedDict([
('A Saucerful of Secrets', '0:39:23'),
('Animals', '0:41:42'),
('Dark side of the moon', '0:45:18'),
('Division Bell', '1:05:52'),
('More', '0:44:53'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('The Wall', '1:17:46'),
('Wish you were here', '0:44:17')])
It's not supposed to be like that. The first element I expected to see is ('The Wall', '1:17:46'), the longest one.
How do I get the elements sorted the way I intended?

Try converting each value to a datetime and using that as the key:
from collections import OrderedDict
from datetime import datetime
def convert_to_datetime(val):
    return datetime.strptime(val, "%H:%M:%S")
album_len = {'The Piper At The Gates Of Dawn': '0:41:50',
             'A Saucerful of Secrets': '0:39:23', 'More': '0:44:53',
             'Division Bell': '1:05:52', 'The Wall': '1:17:46',
             'Dark side of the moon': '0:45:18',
             'Wish you were here': '0:44:17', 'Animals': '0:41:42'}
album_len = OrderedDict(
    sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]))
)
print(album_len)
Output:
OrderedDict([('A Saucerful of Secrets', '0:39:23'), ('Animals', '0:41:42'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('Wish you were here', '0:44:17'), ('More', '0:44:53'),
('Dark side of the moon', '0:45:18'), ('Division Bell', '1:05:52'),
('The Wall', '1:17:46')])
Or in descending order with reverse set to True:
album_len = OrderedDict(
    sorted(
        album_len.items(),
        key=lambda i: convert_to_datetime(i[1]),
        reverse=True
    )
)
Output:
OrderedDict([('The Wall', '1:17:46'), ('Division Bell', '1:05:52'),
('Dark side of the moon', '0:45:18'), ('More', '0:44:53'),
('Wish you were here', '0:44:17'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('Animals', '0:41:42'), ('A Saucerful of Secrets', '0:39:23')])
Edit: If only insertion order needs to be maintained and OrderedDict-specific methods like move_to_end are not going to be used, then a regular Python dict also works here for Python 3.7+.
Ascending:
album_len = dict(
    sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]))
)
Descending:
album_len = dict(
    sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]),
           reverse=True)
)

This is a duplicate of the question: How do I sort a dictionary by value?
>>> dict(sorted(album_len.items(), key=lambda item: item[1]))
{'A Saucerful of Secrets': '0:39:23',
'Animals': '0:41:42',
'The Piper At The Gates Of Dawn': '0:41:50',
'Wish you were here': '0:44:17',
'More': '0:44:53',
'Dark side of the moon': '0:45:18',
'Division Bell': '1:05:52',
'The Wall': '1:17:46'}
Note: this time format already sorts correctly as a plain string, so you don't need to convert to datetime.
See the comment below from @DarrylG, who is right: the remark about lexicographic order only holds as long as the duration does not exceed 9:59:59, unless the hours are padded with a leading zero.
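To illustrate the caveat: plain string comparison breaks once the hour count reaches two digits, whereas comparing the fields as integers does not. The helper name duration_key and the sample keys 'short' and 'long' are made up for this sketch.

```python
def duration_key(value):
    """Turn 'H:MM:SS' into a tuple of ints, e.g. '1:05:52' -> (1, 5, 52)."""
    return tuple(int(part) for part in value.split(':'))

lengths = {'short': '9:59:59', 'long': '10:00:00'}

# String comparison looks at '1' versus '9' first, so '10:00:00' sorts first.
by_string = sorted(lengths, key=lambda k: lengths[k])

# Integer tuples compare hours, then minutes, then seconds numerically.
by_number = sorted(lengths, key=lambda k: duration_key(lengths[k]))

print(by_string)
print(by_number)
```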

Adding values from one list to another when they share value

I'm trying to add values from List2 if the type is the same in List1. All the data is strings within lists. This isn't the exact data I'm using, just a representation. This is my first programme so please excuse any misunderstandings.
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'], ['Type A =', 'Value 9', 'Value 10', 'Value 11'], ['Type A =', 'Value 12', 'Value 13']]
Desired result:
new_list =[['Type A =', 'Value 1', 'Value 2', 'Value 3', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 13'], ['Type B =', 'Value 4', 'Value 5']]
Current attempt:
newlist = []
for values in List1:
    for valuestoadd in List2:
        if values[0] == valuestoadd[0]:
            newlist = [List1 + [valuestoadd[1:]]]
        else:
            print("Types don't match")
return newlist
This would work if there weren't two Type A's in List2, but with two of them my code creates two instances of List1. If I could add the values at a specific index of the list, that would be great, but I can work around that.

It's probably easier to use a dictionary for this:
def merge(d1, d2):
    return {k: v + d2[k] if k in d2 else v for k, v in d1.items()}
d1 = {'A': [1, 2, 3], 'B': [4, 5, 6]}
d2 = {'A': [7, 8, 9], 'C': [0]}
print(merge(d1, d2))
If you must use a list, it's fairly easy to temporarily convert to a dictionary and back to a list:
from collections import defaultdict
def list_to_dict(xss):
    d = defaultdict(list)
    for xs in xss:
        d[xs[0]].extend(xs[1:])
    return d
def dict_to_list(d):
    return [[k, *v] for k, v in d.items()]
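A possible way to wire these helpers up to the lists from the question. Keeping only the types that already occur in List1 is an assumption based on the desired result shown there.

```python
from collections import defaultdict

def list_to_dict(xss):
    d = defaultdict(list)
    for xs in xss:
        d[xs[0]].extend(xs[1:])
    return d

def dict_to_list(d):
    return [[k, *v] for k, v in d.items()]

List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'],
         ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

d1 = list_to_dict(List1)
d2 = list_to_dict(List2)

# Same idea as merge() above: extend d1's values where d2 has the same key.
merged = {k: v + d2[k] if k in d2 else v for k, v in d1.items()}
new_list = dict_to_list(merged)
print(new_list)
```

Because dictionaries preserve insertion order in Python 3.7+, the types come back in the order they appeared in List1.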

Rather than using List1 + [valuestoadd[1:]], you should be using newlist[0].append(valuestoadd[1:]) so that it doesn't ever create a new list and only appends to the old one. The [0] is necessary so that it appends to the first sublist rather than the whole list.
newlist = List1 #you're doing this already - might as well initialize the new list with this code
for values in List1:
    for valuestoadd in List2:
        if values[0] == valuestoadd[0]:
            newlist[0].append(valuestoadd[1:]) #adds the values on to the end of the first list
        else:
            print("Types don't match")
Output:
[['Type A =', 'Value 1', 'Value 2', 'Value 3', ['Value 9', 'Value 10', 'Value 11'], ['Value 12', 'Value 13']], ['Type B =', 'Value 4', 'Value 5']]
This does, sadly, insert the values as a nested list. If you want them as individual values, you need to iterate through the list you're adding and append each value to newlist[0].
This could be achieved with another for loop, like so:
if values[0] == valuestoadd[0]:
    for subvalues in valuestoadd[1:]: #splits the list into subvalues
        newlist[0].append(subvalues) #appends those subvalues
Output:
[['Type A =', 'Value 1', 'Value 2', 'Value 3', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 13'], ['Type B =', 'Value 4', 'Value 5']]
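Note that newlist[0] is hard-coded in the code above, so every match ends up in the first sublist, and newlist = List1 only creates a second name for the same list. A hedged variant that extends whichever sublist matched and leaves List1 untouched might look like this (list.extend stands in for the inner loop; the name target is illustrative):

```python
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'],
         ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

# Copy each sublist so that List1 itself is not modified.
newlist = [sub[:] for sub in List1]

for target in newlist:
    for valuestoadd in List2:
        if target[0] == valuestoadd[0]:
            # extend() adds the items one by one instead of nesting a list.
            target.extend(valuestoadd[1:])

print(newlist)
```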

I agree with the other answers that it would be better to use a dictionary right away. But if, for some reason, you want to stick to the data structure you have, you could transform it into a dictionary and back:
type_dict = {}
for tlist in List1 + List2:
    curr_type = tlist[0]
    type_dict[curr_type] = tlist[1:] if curr_type not in type_dict else type_dict[curr_type] + tlist[1:]
new_list = [[k] + type_dict[k] for k in type_dict]
In the creation of new_list, you can take the keys from a subset of type_dict only if you do not want to include all of them.

python re.compile() and re.findall()

I am trying to print only the month. When I use:
regex = r'([a-z]+) \d+'
re.findall(regex, 'june 15')
it prints: june
But when I try to do the same for a list like this:
regex = re.compile(r'([a-z]+) \d+')
l = ['june 15', 'march 10', 'july 4']
filter(regex.findall, l)
it prints the same list, as if it did not take into account that I don't want the number.

Use map instead of filter like this example:
import re
a = ['june 15', 'march 10', 'july 4']
regex = re.compile(r'([a-z]+) \d+')
# Or with a list comprehension
# output = [regex.findall(k) for k in a]
output = list(map(lambda x: regex.findall(x), a))
print(output)
Output:
[['june'], ['march'], ['july']]
Bonus:
In order to flatten the list of lists you can do:
output = [elm for k in a for elm in regex.findall(k)]
# Or:
# output = list(elm for k in map(lambda x: regex.findall(x), a) for elm in k)
print(output)
Output:
['june', 'march', 'july']
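For context on why the original attempt returned the whole list: filter() keeps each element for which the function returns a truthy value, and regex.findall returns a non-empty list for every string in the question, so nothing is dropped or transformed. A small sketch of the difference (the extra entry 'no match' is added here only to show filter() dropping something):

```python
import re

regex = re.compile(r'([a-z]+) \d+')
l = ['june 15', 'march 10', 'july 4', 'no match']

# filter() only selects elements; it never changes them.
kept = list(filter(regex.findall, l))

# map() replaces each element with the function's return value.
mapped = list(map(regex.findall, l))

print(kept)
print(mapped)
```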

complicated list and dictionary lookup in python

I have a list of tuples and a dictionary of lists as follows.
# List of tuples
lot = [('Item 1', 43), ('Item 4', 82), ('Item 12', 33), ('Item 10', 21)]
# dict of lists
dol = {
'item_category_one': ['Item 3', 'Item 4'],
'item_category_two': ['Item 1'],
'item_category_thr': ['Item 2', 'Item 21'],
}
Now I want to do a look-up: if any item in any list within dol appears in any of the tuples in lot, I want to add another value to that tuple.
Currently I am doing this as follows (which looks incredibly inefficient and ugly). What is the most efficient and neat way of achieving this?
PS: I also want to preserve the order of lot while doing this.
merged = [x[0] for x in lot]
for x in dol:
    for item in dol[x]:
        if item in merged:
            for x in lot:
                if x[0] == item:
                    lot[lot.index(x)] += (True, )

First, build a set of all your values inside of the dol structure:
from itertools import chain
dol_values = set(chain.from_iterable(dol.values()))
Now membership testing is efficient, and you can use a list comprehension:
[tup + (True,) if tup[0] in dol_values else tup for tup in lot]
Demo:
>>> from itertools import chain
>>> dol_values = set(chain.from_iterable(dol.values()))
>>> dol_values
{'Item 3', 'Item 2', 'Item 1', 'Item 21', 'Item 4'}
>>> [tup + (True,) if tup[0] in dol_values else tup for tup in lot]
[('Item 1', 43, True), ('Item 4', 82, True), ('Item 12', 33), ('Item 10', 21)]

How to sort alpha numeric set in python

I have a set
set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
After sorting I want it to look like
4 sheets,
12 sheets,
48 sheets,
booklet
Any idea please

Jeff Atwood talks about natural sort and gives an example of one way to do it in Python. Here is my variation on it:
import re
def sorted_nicely(l):
    """Sort the given iterable in the way that humans expect."""
    convert = lambda text: int(text) if text.isdigit() else text
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)
Use like this:
s = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
for x in sorted_nicely(s):
    print(x)
Output:
4 sheets
12 sheets
48 sheets
booklet
One advantage of this method is that it doesn't just work when the strings are separated by spaces. It will also work for other separators such as the period in version numbers (for example 1.9.1 comes before 1.10.0).
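The version-number claim can be checked with a short sketch. The function is repeated so the example runs on its own; the sample versions list is invented for illustration.

```python
import re

def sorted_nicely(l):
    """Sort the given iterable in the way that humans expect."""
    convert = lambda text: int(text) if text.isdigit() else text
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)

versions = ['1.10.0', '1.9.1', '1.2.0']

natural = sorted_nicely(versions)  # numeric runs compared as integers
plain = sorted(versions)           # character-by-character comparison

print(natural)
print(plain)
```

Because re.split with a capturing group always yields a string first and then alternates between digit runs and text, integers are only ever compared with integers at the same position.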

Short and sweet:
sorted(data, key=lambda item: (int(item.partition(' ')[0])
if item[0].isdigit() else float('inf'), item))
This version works in Python 2 and Python 3, because it does not assume you can compare strings and integers (which won't work in Python 3) and it doesn't use the cmp parameter to sorted (which doesn't exist in Python 3).
It will also sort on the string part if the quantities are equal.
If you want printed output exactly as described in your example, then:
data = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
if item[0].isdigit() else float('inf'), item))
print(',\n'.join(r))

You should check out the third party library natsort. Its algorithm is general so it will work for most input.
>>> import natsort
>>> your_list = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
>>> print(',\n'.join(natsort.natsorted(your_list)))
4 sheets,
12 sheets,
48 sheets,
booklet

A simple way is to split up the strings to numeric parts and non-numeric parts and use the python tuple sort order to sort the strings.
import re
tokenize = re.compile(r'(\d+)|(\D+)').findall
def natural_sortkey(string):
    return tuple(int(num) if num else alpha for num, alpha in tokenize(string))
sorted(my_set, key=natural_sortkey)
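One caveat, stated as an observation rather than part of the answer: in Python 3 a key that puts an int and a str at the same tuple position raises TypeError when the two are compared, which happens here for '4 sheets' versus 'booklet'. A hedged variant that tags each token with a type rank avoids that comparison (the (0, ...) and (1, ...) tagging is an assumption of this sketch):

```python
import re

tokenize = re.compile(r'(\d+)|(\D+)').findall

def natural_sortkey(string):
    # Digit runs become (0, int) and everything else becomes (1, str),
    # so an int is never compared directly with a str.
    return tuple((0, int(num)) if num else (1, alpha)
                 for num, alpha in tokenize(string))

my_set = {'booklet', '4 sheets', '48 sheets', '12 sheets'}
result = sorted(my_set, key=natural_sortkey)
print(result)
```

With this tagging, numeric tokens sort before text tokens at the same position, which matches the ordering asked for in the question.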

It was suggested that I repost this answer over here since it works nicely for this case also
from itertools import groupby
def keyfunc(s):
    return [int(''.join(g)) if k else ''.join(g) for k, g in groupby(s, str.isdigit)]
sorted(my_list, key=keyfunc)
Demo:
>>> my_set = {'booklet', '4 sheets', '48 sheets', '12 sheets'}
>>> sorted(my_set, key=keyfunc)
['4 sheets', '12 sheets', '48 sheets', 'booklet']
For Python3 it's necessary to modify it slightly (this version works ok in Python2 too)
def keyfunc(s):
    return [int(''.join(g)) if k else ''.join(g) for k, g in groupby('\0'+s, str.isdigit)]

Generic answer to sort any numbers in any position in an array of strings. Works with Python 2 & 3.
import re
def alphaNumOrder(string):
    """ Pads every number to 5 digits so that plain string sorting gives numeric order.
    Ex: alphaNumOrder("a6b12.125") ==> "a00006b00012.00125"
    """
    return ''.join([format(int(x), '05d') if x.isdigit()
                    else x for x in re.split(r'(\d+)', string)])
Sample:
s = ['a10b20','a10b1','a3','b1b1','a06b03','a6b2','a6b2c10','a6b2c5']
s.sort(key=alphaNumOrder)
s ===> ['a3', 'a6b2', 'a6b2c5', 'a6b2c10', 'a06b03', 'a10b1', 'a10b20', 'b1b1']
Part of the answer is from there

>>> a = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
>>> def ke(s):
...     i, sp, _ = s.partition(' ')
...     if i.isnumeric():
...         return int(i)
...     return float('inf')
...
>>> sorted(a, key=ke)
['4 sheets', '12 sheets', '48 sheets', 'booklet']

Based on SilentGhost's answer:
In [4]: a = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
In [5]: def f(x):
   ...:     num = x.split(None, 1)[0]
   ...:     if num.isdigit():
   ...:         return int(num)
   ...:     return x
   ...:
In [6]: sorted(a, key=f)
Out[6]: ['4 sheets', '12 sheets', '48 sheets', 'booklet']

Sets are inherently unordered. You'll need to create a list with the same content and sort that.

For people stuck with a pre-2.4 version of Python, without the wonderful sorted() function, a quick way to sort sets is:
l = list(yourSet)
l.sort()
This does not answer the specific question above (12 sheets will come before 4 sheets), but it might be useful to people coming from Google.

b = set(['booklet', '10-b40', 'z94 boots', '4 sheets', '48 sheets',
'12 sheets', '1 thing', '4a sheets', '4b sheets', '2temptations'])
numList = sorted([x for x in b if x.split(' ')[0].isdigit()],
key=lambda x: int(x.split(' ')[0]))
alphaList = sorted([x for x in b if not x.split(' ')[0].isdigit()])
sortedList = numList + alphaList
print(sortedList)
Out: ['1 thing',
'4 sheets',
'12 sheets',
'48 sheets',
'10-b40',
'2temptations',
'4a sheets',
'4b sheets',
'booklet',
'z94 boots']
