How to sort an alphanumeric set in Python

I have a set
set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
After sorting I want it to look like
4 sheets,
12 sheets,
48 sheets,
booklet
Any ideas?

Jeff Atwood talks about natural sort and gives an example of one way to do it in Python. Here is my variation on it:
import re
def sorted_nicely(l):
    """ Sort the given iterable in the way that humans expect."""
    convert = lambda text: int(text) if text.isdigit() else text
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)
Use like this:
s = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
for x in sorted_nicely(s):
    print(x)
Output:
4 sheets
12 sheets
48 sheets
booklet
One advantage of this method is that it doesn't just work when the strings are separated by spaces. It will also work for other separators such as the period in version numbers (for example 1.9.1 comes before 1.10.0).
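A quick check of the version-number claim, as a self-contained sketch. natural_key is a name chosen here; it builds the same list as alphanum_key above. Because re.split with a capturing group always puts the non-digit chunks at even positions (starting with '' when the string begins with a digit), an int is only ever compared with another int, so this also runs on Python 3.

```python
import re

def natural_key(text):
    # re.split keeps the digit runs because of the capturing group;
    # digit runs become ints, everything else stays a string.
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r'([0-9]+)', text)]

versions = ['1.10.0', '1.9.1', '1.2.10', '1.2.9']
print(sorted(versions, key=natural_key))
# ['1.2.9', '1.2.10', '1.9.1', '1.10.0']
print(sorted(versions))
# ['1.10.0', '1.2.10', '1.2.9', '1.9.1']  (plain string order)
```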

Short and sweet:
sorted(data, key=lambda item: (int(item.partition(' ')[0])
                               if item[0].isdigit() else float('inf'), item))
This version:
- works in Python 2 and Python 3, because:
  - it never compares strings with integers (which won't work in Python 3)
  - it doesn't use the cmp parameter to sorted (which doesn't exist in Python 3)
- will sort on the string part if the quantities are equal
If you want printed output exactly as described in your example, then:
data = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
r = sorted(data, key=lambda item: (int(item.partition(' ')[0])
                                   if item[0].isdigit() else float('inf'), item))
print(',\n'.join(r))
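To see the tie-breaking behaviour from the list above, here is the same key as a named function applied to a set with a repeated quantity. quantity_key and the extra items '4 boxes' and 'album' are made up for illustration.

```python
def quantity_key(item):
    # Leading integer if the string starts with a digit, else +infinity so
    # non-numeric items go last; the full string breaks ties.
    qty = int(item.partition(' ')[0]) if item[0].isdigit() else float('inf')
    return (qty, item)

data = {'booklet', '4 sheets', '4 boxes', '12 sheets', 'album'}
print(sorted(data, key=quantity_key))
# ['4 boxes', '4 sheets', '12 sheets', 'album', 'booklet']
```

One limitation: an item such as '4a sheets' starts with a digit, but int('4a') raises ValueError, so this key assumes the first word is either a whole number or does not start with a digit.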

You should check out the third-party library natsort (pip install natsort). Its algorithm is general, so it will work for most inputs.
>>> import natsort
>>> your_list = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
>>> print(',\n'.join(natsort.natsorted(your_list)))
4 sheets,
12 sheets,
48 sheets,
booklet

A simple way is to split the strings into numeric and non-numeric parts and use Python's tuple sort order to sort the strings.
import re
tokenize = re.compile(r'(\d+)|(\D+)').findall
def natural_sortkey(string):
    # Tag numbers with 0 and text with 1 so an int is never compared with a str
    # (Python 3 raises TypeError on that); numbers still sort before text.
    return tuple((0, int(num)) if num else (1, alpha) for num, alpha in tokenize(string))
sorted(my_set, key=natural_sortkey)
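To see what the num/alpha unpacking in natural_sortkey works with: re.findall on a pattern with two groups returns one (digits, non_digits) tuple per match, and the group that did not take part in the match comes back as an empty string.

```python
import re

tokenize = re.compile(r'(\d+)|(\D+)').findall

# Exactly one of the two fields in each tuple is non-empty.
print(tokenize('48 sheets'))
# [('48', ''), ('', ' sheets')]
print(tokenize('a6b12'))
# [('', 'a'), ('6', ''), ('', 'b'), ('12', '')]
```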

It was suggested that I repost this answer here, since it works nicely for this case too:
from itertools import groupby
def keyfunc(s):
    return [int(''.join(g)) if k else ''.join(g) for k, g in groupby(s, str.isdigit)]
sorted(my_list, key=keyfunc)
Demo (Python 2):
>>> my_set = {'booklet', '4 sheets', '48 sheets', '12 sheets'}
>>> sorted(my_set, key=keyfunc)
['4 sheets', '12 sheets', '48 sheets', 'booklet']
For Python 3 it's necessary to modify it slightly (this version works in Python 2 too):
def keyfunc(s):
    return [int(''.join(g)) if k else ''.join(g) for k, g in groupby('\0'+s, str.isdigit)]
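Why the '\0' prefix helps: since '\0' is not a digit, every key now starts with a string, so position 0 always holds a str, position 1 an int, and so on. Without it, 'booklet' would give ['booklet'] and '4 sheets' would give [4, ' sheets'], and Python 3 cannot compare 'booklet' with 4. A short look at the keys:

```python
from itertools import groupby

def keyfunc(s):
    # groupby splits the string into runs of digits / non-digits;
    # the '\0' prefix guarantees the first run is always a string.
    return [int(''.join(g)) if k else ''.join(g)
            for k, g in groupby('\0' + s, str.isdigit)]

print(keyfunc('48 sheets'))  # ['\x00', 48, ' sheets']
print(keyfunc('booklet'))    # ['\x00booklet']
print(sorted({'booklet', '4 sheets', '48 sheets', '12 sheets'}, key=keyfunc))
# ['4 sheets', '12 sheets', '48 sheets', 'booklet']
```

'\0' is a prefix of '\0booklet', so the keys of numbered items compare smaller and those items come first.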

A generic answer that sorts numbers at any position in a list of strings. Works with Python 2 and 3.
import re
def alphaNumOrder(string):
    """ Returns the string with every number zero-padded to 5 digits, so that it sorts in numeric order.
    Ex: alphaNumOrder("a6b12.125") ==> "a00006b00012.00125"
    """
    return ''.join([format(int(x), '05d') if x.isdigit()
                    else x for x in re.split(r'(\d+)', string)])
Sample:
s = ['a10b20','a10b1','a3','b1b1','a06b03','a6b2','a6b2c10','a6b2c5']
s.sort(key=alphaNumOrder)
s ===> ['a3', 'a6b2', 'a6b2c5', 'a6b2c10', 'a06b03', 'a10b1', 'a10b20', 'b1b1']
Part of the answer is from there
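The padding width puts an upper bound on this trick: a number with more than 5 digits is not truncated, just left longer, so it can sort before a shorter but smaller number. A sketch with the width made a parameter (alpha_num_order and width are names chosen here, not part of the answer):

```python
import re

def alpha_num_order(string, width=5):
    # Zero-pad every run of digits to `width` characters so that plain
    # string comparison follows numeric order (width=5 matches the answer).
    return ''.join(format(int(x), '0{}d'.format(width)) if x.isdigit() else x
                   for x in re.split(r'(\d+)', string))

items = ['file123456', 'file99999']
print(sorted(items, key=alpha_num_order))
# ['file123456', 'file99999']  (123456 has 6 digits, so padding to 5 is not enough)
print(sorted(items, key=lambda s: alpha_num_order(s, width=10)))
# ['file99999', 'file123456']
```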

>>> a = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
>>> def ke(s):
...     i, sp, _ = s.partition(' ')
...     if i.isnumeric():
...         return int(i)
...     return float('inf')
...
>>> sorted(a, key=ke)
['4 sheets', '12 sheets', '48 sheets', 'booklet']

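Based on SilentGhost's answer:
In [4]: a = set(['booklet', '4 sheets', '48 sheets', '12 sheets'])
In [5]: def f(x):
   ...:     num = x.split(None, 1)[0]
   ...:     if num.isdigit():
   ...:         return int(num)
   ...:     return x
   ...:
In [6]: sorted(a, key=f)
Out[6]: ['4 sheets', '12 sheets', '48 sheets', 'booklet']
Note: this relies on Python 2, which lets an int be compared with a str. In Python 3, sorted raises TypeError here, because f returns an int for '4 sheets' but a str for 'booklet'.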
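A Python 3-friendly sketch of the same key: returning tagged tuples keeps ints and strs apart. The (0, ...)/(1, ...) tagging is an addition here, not part of the answer above.

```python
def f(x):
    num = x.split(None, 1)[0]
    # (0, int) for items with a leading number, (1, str) for the rest:
    # the first element separates the two groups, so an int is never
    # compared with a str.
    if num.isdigit():
        return (0, int(num))
    return (1, x)

a = {'booklet', '4 sheets', '48 sheets', '12 sheets'}
print(sorted(a, key=f))  # ['4 sheets', '12 sheets', '48 sheets', 'booklet']
```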

Sets are inherently unordered. You'll need to create a list with the same content and sort that.

For people stuck with a pre-2.4 version of Python, without the wonderful sorted() function, a quick way to sort sets is:
l = list(yourSet)
l.sort()
This does not answer the specific question above (12 sheets will come before 4 sheets), but it might be useful to people coming from Google.

b = set(['booklet', '10-b40', 'z94 boots', '4 sheets', '48 sheets',
'12 sheets', '1 thing', '4a sheets', '4b sheets', '2temptations'])
numList = sorted([x for x in b if x.split(' ')[0].isdigit()],
key=lambda x: int(x.split(' ')[0]))
alphaList = sorted([x for x in b if not x.split(' ')[0].isdigit()])
sortedList = numList + alphaList
print(sortedList)
Out: ['1 thing',
'4 sheets',
'12 sheets',
'48 sheets',
'10-b40',
'2temptations',
'4a sheets',
'4b sheets',
'booklet',
'z94 boots']

Related

Find "n" in a list full of strings

I'm looking to go through a list and find any element with a number.
This is what I have so far:
lst = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44']
num = '8'
for s in lst:
    if num in s:
        print(s)
Output:
Fury, 8
What I'm looking to do is to have num match any digit from 0 to 9. I thought about using '[^0-9]', but that didn't work.
Ultimately I'm looking to print out this:
Fury, 8
Lopez, 44
Just a heads up, I'm pretty new to coding, so some concepts might go over my head.
You can use the str.isdigit method together with the built-in any function. isdigit returns True if the string is non-empty and all of its characters are digits, False otherwise.
>>> lst = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44']
>>>
>>> for s in lst:
...     if any(char.isdigit() for char in s):
...         print(s)
...
Fury, 8
Lopez, 44
Using the re library:
import re
lst = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44']
list(filter(lambda x:re.match(".*[0-9]+$",x), lst))
OUTPUT:
['Fury, 8', 'Lopez, 44']
The pattern matches any string ending with one or more digits.
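If the goal is any element containing a digit anywhere (as the question asks), re.search with r'\d' is a closer fit than anchoring at the end. The extra item 'R2 Unit, X' is made up to show the difference:

```python
import re

lst = ['Alvarez, S', 'Crawford, B', 'Fury, 8', 'Mayweather, F', 'Lopez, 44', 'R2 Unit, X']

# re.search scans the whole string, so r'\d' finds a digit in any position.
print([s for s in lst if re.search(r'\d', s)])
# ['Fury, 8', 'Lopez, 44', 'R2 Unit, X']

# The anchored pattern only accepts strings that end in digits.
print([s for s in lst if re.match(r'.*[0-9]+$', s)])
# ['Fury, 8', 'Lopez, 44']
```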

Is there a function for searching numbers inside a string list without looping 2 times?

Imagine some card games inside a list, like the one below:
list1 = ['1 of Spades', '1 of Diamonds', '2 of Hearts']
I was trying something like this, but it didn't work:
popped = [s for s in list1 if s[:1] in list1.count(s[:1]) > 1]
How can I get all the same-value cards and remove them without looping twice? In this example, both '1 of Spades' and '1 of Diamonds' should be popped.
Assuming you're looking to avoid nested loops and want an O(n) solution, you could use a Counter to count the cards of each value and then filter the list based on those counts:
list1 = ['1 of Spades', '1 of Diamonds', '2 of Hearts']
from collections import Counter
counts = Counter( c.split(" ",1)[0] for c in list1 )
list1 = [ c for c in list1 if counts[c.split(" ",1)[0]] == 1 ]
output:
print(list1)
# ['2 of Hearts']
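The question also asks for the removed cards themselves. A sketch that reuses the same counts and splits the cards into popped and kept lists in a single pass (the names popped and kept are chosen here):

```python
from collections import Counter

list1 = ['1 of Spades', '1 of Diamonds', '2 of Hearts']

# Count how many cards share each value (the text before the first space)
counts = Counter(card.split(' ', 1)[0] for card in list1)

popped, kept = [], []
for card in list1:
    # Cards whose value occurs more than once are "popped"; the rest are kept
    target = popped if counts[card.split(' ', 1)[0]] > 1 else kept
    target.append(card)

print(popped)  # ['1 of Spades', '1 of Diamonds']
print(kept)    # ['2 of Hearts']
```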

Edabit task doesn't show correct result

I'm doing a simple task which requires sorting a list by the result of each expression, and I'm running this code:
def sort_by_answer(lst):
    ans = []
    dict = {}
    for i in lst:
        if 'x' in i:
            i = i.replace('x', '*')
        dict.update({i: eval(i)})
    dict = {k: v for k, v in sorted(dict.items(), key=lambda item: item[1])}
    res = list(dict.keys())
    for i in res:
        if '*' in i:
            i = i.replace('*', 'x')
            ans.append(i)
        else:
            ans.append(i)
    return ans
It works for me, but the site I'm doing this test on (here's a link to the task: https://edabit.com/challenge/9zf6scCreprSaQAPq) tells me that my list is not correctly sorted, even though it is. Can someone help me improve this code so that it works in every case?
P.S.
if 'x' in i:
    i = i.replace('x', '*')
This is there so I can use the eval function, since the site's input uses 'x' instead of '*' for multiplication.
You can try this. But using eval is dangerous on untrusted strings.
In [63]: a=['1 + 1', '1 + 7', '1 + 5', '1 + 4']
In [69]: def evaluate(_str):
    ...:     return eval(_str.replace('x','*'))
output
In [70]: sorted(a,key=evaluate)
Out[70]: ['1 + 1', '1 + 4', '1 + 5', '1 + 7']
In [71]: sorted(['4 - 4', '2 - 2', '5 - 5', '10 - 10'],key=evaluate)
Out[71]: ['4 - 4', '2 - 2', '5 - 5', '10 - 10']
In [72]: sorted(['2 + 2', '2 - 2', '2 x 1'],key=evaluate)
Out[72]: ['2 - 2', '2 x 1', '2 + 2']
I don't think the problem is in your logic: they are probably running a Python version older than 3.6, where a dict does not keep insertion order, so the sorted order gets lost. A list of tuples would be safer. Note also that a dict drops duplicate expressions, so an input with repeated entries would come back shorter.
def sort_by_answer(lst):
    string = ','.join(lst).replace('x', '*')
    l = string.split(',')
    d = [(k.replace('*', 'x'), eval(k)) for k in l]
    ans = [expr for expr, value in sorted(d, key=lambda x: x[1])]
    return ans
EDIT:
Ch3steR's answer is more Pythonic:
def sort_by_answer(lst):
    return sorted(lst, key=lambda x: eval(x.replace('x', '*')))
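Given the warning above about eval on untrusted strings, here is a sketch that evaluates the expressions with the operator module instead. It assumes, hypothetically, that every expression has the form "<int> <op> <int>" separated by whitespace, as in the examples here; anything else raises an error instead of being executed. OPS and evaluate are names chosen for this sketch.

```python
import operator

# Assumption: only these operators appear; 'x' means multiplication.
OPS = {'+': operator.add, '-': operator.sub,
       'x': operator.mul, '*': operator.mul, '/': operator.truediv}

def evaluate(expr):
    left, op, right = expr.split()  # ValueError if not exactly three parts
    return OPS[op](int(left), int(right))  # KeyError for an unknown operator

def sort_by_answer(lst):
    # sorted() is stable, so expressions with equal values keep their input order
    return sorted(lst, key=evaluate)

print(sort_by_answer(['2 + 2', '2 - 2', '2 x 1']))  # ['2 - 2', '2 x 1', '2 + 2']
```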

python re.compile() and re.findall()

I'm trying to print only the month. When I use:
regex = r'([a-z]+) \d+'
re.findall(regex, 'june 15')
it returns ['june'].
But when I try to do the same for a list like this:
regex = re.compile(r'([a-z]+) \d+')
l = ['june 15', 'march 10', 'july 4']
filter(regex.findall, l)
I get the same list back, as if it ignored the fact that I only want the month and not the number.
Use map instead of filter. filter only uses the function's return value as a true/false test and keeps the original elements, while map returns the function's results. For example:
import re
a = ['june 15', 'march 10', 'july 4']
regex = re.compile(r'([a-z]+) \d+')
# Or with a list comprehension
# output = [regex.findall(k) for k in a]
output = list(map(lambda x: regex.findall(x), a))
print(output)
Output:
[['june'], ['march'], ['july']]
Bonus:
In order to flatten the list of lists you can do:
output = [elm for k in a for elm in regex.findall(k)]
# Or:
# output = list(elm for k in map(lambda x: regex.findall(x), a) for elm in k)
print(output)
Output:
['june', 'march', 'july']
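To see why filter appeared to do nothing, compare it with map on the same input. The extra item '2024' is made up for illustration: it contains no month, so findall returns an empty (falsy) list for it.

```python
import re

regex = re.compile(r'([a-z]+) \d+')
months = ['june 15', 'march 10', 'july 4', '2024']

# filter() only checks whether regex.findall(item) is truthy and keeps the
# original item; a non-empty list is truthy, so matches come back unchanged.
print(list(filter(regex.findall, months)))
# ['june 15', 'march 10', 'july 4']

# map() returns the function's result for each item instead.
print(list(map(regex.findall, months)))
# [['june'], ['march'], ['july'], []]
```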

Pythonic way of sorting a list

For example, I have a list:
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
and I need to sort it based on the number, so that it becomes:
['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
Is it possible to do this somehow with split()?
Use a key function:
lists.sort(key=lambda s: int(s.rsplit(None, 1)[-1]))
The key callable is passed each element in lists, and the elements are sorted according to its return values. In this case we:
- split once on whitespace, starting on the right
- take the last element of the split
- turn that into an integer
The argument to key can be any callable, but a lambda is just more compact. You can try it out in the command prompt:
>>> key_function = lambda s: int(s.rsplit(None, 1)[-1])
>>> key_function('ben 10')
10
>>> key_function('Guido van Rossum 42')
42
In effect, when sorting, each value is paired with the return value of that function, and what gets sorted is:
[(20, 0, 'jack 20'), (10, 1, 'ben 10'), (50, 2, 'alisdar 50'), (35, 3, 'ollie 35')]
instead (with the second value, the element's index, added to keep the original order of elements with equal sort keys).
Result:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> lists.sort(key=lambda s: int(s.rsplit(None, 1)[-1]))
>>> lists
['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
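The decorate-sort-undecorate idea described above can be spelled out by hand. This sketch builds exactly the tuples listed above and is only meant to illustrate the idea; key= does the equivalent work for you.

```python
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']

# Decorate: pair each element with its key and its original index
decorated = [(int(s.rsplit(None, 1)[-1]), i, s) for i, s in enumerate(lists)]
# Sort: tuples compare by key first, then by index (keeps equal keys in order)
decorated.sort()
# Undecorate: keep only the original strings
result = [s for _, _, s in decorated]
print(result)  # ['ben 10', 'jack 20', 'ollie 35', 'alisdar 50']
```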
Use a key function that does what you want:
lists.sort(key=lambda e: int(e.split()[1]))
If some of your items don't follow that format, you'll have to write something a little more elaborate.
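A sketch of such a more elaborate key, assuming (hypothetically) that the number, when present, is the last whitespace-separated field and that items without one should go last. trailing_number is a made-up helper name. Unlike split()[1], it also copes with names that contain spaces.

```python
def trailing_number(s):
    # Use the last whitespace-separated field if it is an integer;
    # items without one are tagged 1 so they sort after all numbered items.
    parts = s.rsplit(None, 1)
    if len(parts) == 2 and parts[1].isdigit():
        return (0, int(parts[1]))
    return (1, s)

lists = ['jack 20', 'Guido van Rossum 42', 'ben 10', 'nobody']
lists.sort(key=trailing_number)
print(lists)  # ['ben 10', 'jack 20', 'Guido van Rossum 42', 'nobody']
```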
It would be better if you had a more appropriate data type than a string to represent, say, a person's name and age. One way would be a dictionary:
lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
d = dict(item.split(' ') for item in lists)
This constructs a dictionary from a stream of two-element lists.
Then you can sort like this:
print(sorted((v, k) for k, v in d.items()))
and get this:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> d = dict(item.split(' ') for item in lists)
>>> print(sorted((v, k) for k, v in d.items()))
[('10', 'ben'), ('20', 'jack'), ('35', 'ollie'), ('50', 'alisdar')]
Or you could convert age to integer:
>>> lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
>>> person_iter = (item.split(' ') for item in lists)
>>> d = {k: int(v) for k, v in person_iter}
>>> print(sorted((v, k) for k, v in d.items()))
[(10, 'ben'), (20, 'jack'), (35, 'ollie'), (50, 'alisdar')]
person_iter is a generator that produces name-age pairs. You feed it to the dictionary comprehension, which converts the second value to an integer.
The basic idea, though, is that you will have an easier time if you use more precise data types for your purposes.
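To illustrate the point about more precise data types, here is a sketch using typing.NamedTuple (Python 3.6+). The Person class and its field names are hypothetical.

```python
from typing import NamedTuple

class Person(NamedTuple):
    # Hypothetical record type: one typed field per piece of information
    name: str
    age: int

lists = ['jack 20', 'ben 10', 'alisdar 50', 'ollie 35']
# Parse each "name age" string once, converting the age to int up front
people = [Person(name, int(age)) for name, age in (s.split(' ') for s in lists)]

# Sort by a named attribute instead of re-parsing strings in the key
people.sort(key=lambda p: p.age)
print([p.name for p in people])  # ['ben', 'jack', 'ollie', 'alisdar']
```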
