Sorting Lists with numerals - python

Suppose I have a list of strings, some of which start with numerals:
a = ['0.01um', 'Control', '0.1um', '0.05um']
If I sort the list it looks like this
a.sort()
print(a)
['0.01um', '0.05um', '0.1um', 'Control']
How can I make it sort so that the strings starting with letters come before the strings starting with numbers, while the numbers are still ordered from smallest to biggest? For instance:
['Control', '0.01um', '0.05um', '0.1um']

Well, getting your strings starting with letters to collate before those starting with numerics sounds like you'd need to separate them, sort each, then append one (sub)list to the other.
To get "natural" sorting of strings containing numerics I'd look at natsort.
So the code might look something like:
#!python
# UNTESTED
import string
from natsort import natsorted
a = ['0.01um', 'Control', '0.1um', '0.05um']
astrs = [x for x in a if not x[0] in string.digits]
anums = [x for x in a if x[0] in string.digits]
results = natsorted(astrs) + natsorted(anums)

You can define a key function that “decorates” strings starting with letters with a value lower than strings starting with numbers. For example:
from itertools import takewhile
def get_numeric_prefix(str_):
    # Collect the leading run of digits and dots, then convert it to a float.
    prefix = ''.join(takewhile(lambda char: char.isnumeric() or char == '.', str_))
    return float(prefix)
def letters_before_numbers(str_):
    if not str_:
        return (3, str_)
    elif str_[0].isalpha():
        return (0, str_)
    elif str_[0].isnumeric():
        return (1, get_numeric_prefix(str_))
    else:
        return (2, str_)
a.sort(key=letters_before_numbers)
The initial not str_ case is necessary, because the conditions in the later cases access item 0, which would raise an IndexError if the string were empty. I've elected not to handle strings like '40.0.3um' (where there's a prefix that's all numbers and dots, but it's not a properly-formatted float)—those will cause the float() call to raise a ValueError. If you need those to be handled, you should probably grab a copy of natsort and use Jim's answer.
Tuples are compared lexicographically, so sort() looks at the number first—if it's lower, the original string will get sorted earlier in the list. This means that you can play with the numbers that are the first elements of the return values to get different kinds of strings to sort earlier or later than others. In this case, strings starting with letters will get sorted first, followed by strings starting with numbers (sorted by the numbers), then strings starting with neither, and finally empty strings.
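As a compact illustration of the same tuple-key idea, here is a minimal sketch that uses a regular expression to find the numeric prefix. The helper name category_key and the extra 'Blank' entry are assumptions made for the example; they are not part of the answer above.

```python
import re

def category_key(s):
    # Hypothetical helper: the first element is the category
    # (0 = no numeric prefix, 1 = numeric prefix), the second is the
    # number used to order numeric strings, the third breaks ties.
    m = re.match(r'\d+(?:\.\d+)?', s)
    if m:
        return (1, float(m.group(0)), s)
    return (0, 0.0, s)

a = ['0.01um', 'Control', '0.1um', '0.05um', 'Blank']
result = sorted(a, key=category_key)
print(result)  # ['Blank', 'Control', '0.01um', '0.05um', '0.1um']
```

Because the first tuple element differs between the two categories, a float is never compared with a string, which keeps the key safe on Python 3.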

Here is a trick involving python's tuple sorting semantics; strings which do not begin with a number will be sorted lexically before all numbers which will be sorted numerically:
In [1]: import re
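In [2]: def parse_number(s):
   ...:     # Strings without a numeric prefix get -inf so they sort first
   ...:     # (returning None would raise a TypeError when compared on Python 3).
   ...:     m = re.match(r'\d+(?:\.\d+)?', s)
   ...:     return float(m.group(0)) if m else float('-inf')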
In [3]: a = ['0.01um', 'Control', '0.1um', '0.05um']
In [4]: b = sorted((parse_number(i), i) for i in a)
In [5]: [i for _, i in b]
Out[5]: ['Control', '0.01um', '0.05um', '0.1um']

In code
a = ['0.01um', 'Control', '0.1um', '0.05um']
b = [x for x in a if x and x[0].isdigit()]
c = [x for x in a if x not in b]
d = sorted([(x, float(''.join([y for y in x if y.isdigit() or y == '.']))) for x in b], key=lambda x: x[1])
print(sorted(c) + [k for k, v in d])

Related

Sorting a list of strings based on numeric order of numeric part

I have a list of strings that may contain digits. I would like to sort this list alphabetically, but every time the String contains a number, I want it to be sorted by value.
For example, if the list is
['a1a','b1a','a10a','a5b','a2a'],
the sorted list should be
['a1a','a2a','a5b','a10a','b1a']
In general I want to treat each number (a sequence of digits) in the string as a special character, which is smaller than any letter and can be compared numerically to other numbers.
Is there any python function which does this compactly?
You could use the re module to split each string into a tuple of characters and grouping the digits into one single element. Something like r'(\d+)|(.)'. The good news with this regex is that it will return separately the numeric and non numeric groups.
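To make the behaviour of that pattern concrete, this small sketch shows what re.findall returns for one of the sample strings. The variable names pattern and tokens are only illustrative.

```python
import re

pattern = re.compile(r'(\d+)|(.)')

# Each match is a (digits, other_char) pair; exactly one slot is non-empty.
tokens = pattern.findall('a10a')
print(tokens)  # [('', 'a'), ('10', ''), ('', 'a')]
```

The run of digits '10' arrives as a single element, so it can be converted with int() and compared numerically.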
As a simple key, we could use:
import re
def key(x):
    # the tuple comparison will ensure that numbers come before letters
    return [(j, int(i)) if i != '' else (j, i)
            for i, j in re.findall(r'(\d+)|(.)', x)]
Demo:
lst = ['a1a', 'a2a', 'a5b', 'a10a', 'b1a', 'abc']
print(sorted(lst, key=key))
gives:
['a1a', 'a2a', 'a5b', 'a10a', 'abc', 'b1a']
If you want more efficient processing, you can compile the regex only once in a closure:
def build_key():
    rx = re.compile(r'(\d+)|(.)')
    def key(x):
        return [(j, int(i)) if i != '' else (j, i)
                for i, j in rx.findall(x)]
    return key
and use it that way:
sorted(lst, key=build_key())
giving of course the same output.

Python Sort string by frequency - cannot sort with sorted() function

I have an issue with sorting a simple string by frequency (I get a string as an input, and I need to give a sorted string back as an output in descending order).
Let me give you an example (the original word contains 4 e's, 2 s's, 1 t, 1 r and 1 d; so these get sorted):
In [1]: frequency_sort("treeseeds")
Out [1]: "eeeesstrd"
Most solutions on Stack Overflow state that I should use the sorted() function to get my results, however, it only seems to work with certain cases.
I wrote two functions that are supposed to work, but neither of them does the trick with my specific inputs (see below).
First function:
def frequency_sort(s):
    s_sorted = sorted(s, key=s.count, reverse=True)
    s_sorted = ''.join(c for c in s_sorted)
    return s_sorted
Second function:
import collections
def frequency_sort_with_counter(s):
    counter = collections.Counter(s)
    s_sorted = sorted(s, key=counter.get, reverse=True)
    s_sorted = ''.join(c for c in s_sorted)
    return s_sorted
With both functions my outputs look like this:
The first output is okay:
In [1]: frequency_sort("loveleee")
Out [1]: "eeeellov"
The second output is not so much
In [2]: frequency_sort("loveleel")
Out [2]: "leleelov"
The third output is totally messy:
In [3]: frequency_sort("oloveleelo")
Out [3]: "oloeleelov"
What could have gone wrong? Is it connected to the 'o' and 'l' characters somehow? Or am I just missing something?
In a string where multiple characters have the same frequency, the algorithms you proposed have no way of distinguishing between characters that appear the same number of times. You could address this by sorting using a tuple of the frequency and the character itself; e.g.
In [7]: def frequency_sort(s):
   ...:     s_sorted = sorted(s, key=lambda c: (s.count(c), c), reverse=True)
   ...:     s_sorted = ''.join(c for c in s_sorted)
   ...:     return s_sorted
   ...:
In [8]: frequency_sort("loveleel")
Out[8]: 'llleeevo'
In your third case, three letters have the same count, which is why they end up interleaved. You can sort alphabetically first and then sort by frequency to group the letters, as follows:
s_sorted = ''.join(sorted(sorted(s), key=lambda x: s.count(x), reverse=True))
output (for s = "oloveleelo"):
eeelllooov
or you can reverse it:
s_sorted = ''.join(sorted(sorted(s, reverse=True), key=lambda x: s.count(x), reverse=True))
output:
ooollleeev
The problem is that sort and sorted are stable sorts. So if two values are "equal" (in this case key(item1) == key(item2)) they will appear in the same order as they were before the sort.
For example in your last case you have:
>>> from collections import Counter
>>> Counter("oloveleelo")
Counter({'e': 3, 'l': 3, 'o': 3, 'v': 1})
So 'e', 'l' and 'o' have the same key, so they will appear just like they did originally: "oloeleelo" and then just comes the only character that has a different count: 'v'.
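The effect of stability can be reproduced directly. The sketch below first repeats the behaviour from the question, then adds the character itself as a tie-breaker; the variable names are illustrative.

```python
from collections import Counter

s = "oloveleelo"
counts = Counter(s)

# Stable sort: e, l and o all have count 3, so they keep their input order.
stable = ''.join(sorted(s, key=counts.get, reverse=True))
print(stable)   # oloeleelov

# Adding the character as a tie-breaker groups equal letters together.
grouped = ''.join(sorted(s, key=lambda c: (-counts[c], c)))
print(grouped)  # eeelllooov
```

Negating the count gives descending frequency while leaving the characters in ascending alphabetical order within each frequency.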
If you don't care about the order of elements with equal counts (just that they are grouped by character) you don't even need sorted, just flatten the result of Counter.most_common:
>>> ''.join([item for item, cnt in Counter("oloveleelo").most_common() for _ in range(cnt)])
'llleeeooov'
>>> ''.join([item for item, cnt in Counter("loveleel").most_common() for _ in range(cnt)])
'eeelllov'
>>> ''.join([item for item, cnt in Counter("loveleee").most_common() for _ in range(cnt)])
'eeeellov'

Generate a sequence of number and alternating string in python

Aim
I would like to generate a sequence as list in python, such as:
['s1a', 's1b', 's2a', 's2b', ..., 's10a', 's10b']
Properties:
items contain a single prefix
numbers are sorted numerical
suffix is alternating per number
Approach
To get this, I applied the following code, using xrange and a list comprehension:
# prefix
p = 's'
# suffix
s = ['a', 'b']
# numbers
n = [ i + 1 for i in list(xrange(10))]
# result
[ p + str(i) + j for i, j in zip(sorted(n * len(s)), s * len(n)) ]
Question
Is there a more simple syntax to obtain the results, e.g. using itertools?
Similar to this question?
A doubled-for list comprehension can accomplish this:
['s'+str(x)+y for x in range(1,11) for y in 'ab']
itertools.product might be your friend:
all_combos = ["".join(map(str, x)) for x in itertools.product(p, n, s)]
returns:
['s1a', 's1b', 's2a', 's2b', 's3a', 's3b', 's4a', 's4b', 's5a', 's5b', 's6a', 's6b', 's7a', 's7b', 's8a', 's8b', 's9a', 's9b', 's10a', 's10b']
EDIT: as a one-liner:
all_combos = ["".join(map(str,x)) for x in itertools.product(['s'], range(1, 11), ['a', 'b'])]
EDIT 2: as pointed out in James' answer, we can change our listed string element in the product call to just strings, and itertools will still be able to iterate over them, selecting characters from each:
all_combos = ["".join(map(str,x)) for x in itertools.product('s', range(1, 11), 'ab')]
How about:
def func(prefix, suffixes, size):
    k = len(suffixes)
    # Floor division keeps the number an integer on Python 3 as well.
    return [prefix + str(n // k + 1) + suffixes[n % k] for n in range(size * k)]
# usage example:
print(func('s', ['a', 'b'], 10))
This way you can alternate as many suffixes as you want.
And of course, each one of the suffixes can be as long as you want.
You can use a double list comprehension, where you iterate over both the number and the suffix. You don't need to import any module.
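Below is a lambda function that takes 3 parameters: a prefix, a number of iterations, and a list of suffixes. The number is the outer loop, so the suffixes alternate for each number, and counting starts at 1 as in the question:
foo = lambda prefix, n, suffix: [prefix + str(i) + s for i in range(1, n + 1) for s in suffix]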
You can use it like this
foo('p',10,'abc')
Or like that, if your suffixes have more than one letter
foo('p',10,('a','bc','de'))
For maximum versatility I would do this as a generator. That way you can either create a list, or just produce the sequence items as they are needed.
Here's code that runs on Python 2 or Python 3.
def psrange(prefix, suffix, high):
    return ('%s%d%s' % (prefix, i, s) for i in range(1, 1 + high) for s in suffix)
res = list(psrange('s', ('a', 'b'), 10))
print(res)
for s in psrange('x', 'abc', 3):
    print(s)
output
['s1a', 's1b', 's2a', 's2b', 's3a', 's3b', 's4a', 's4b', 's5a', 's5b', 's6a', 's6b', 's7a', 's7b', 's8a', 's8b', 's9a', 's9b', 's10a', 's10b']
x1a
x1b
x1c
x2a
x2b
x2c
x3a
x3b
x3c
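To illustrate the "as they are needed" point, here is a small sketch that takes only a few items from a very long sequence. It redefines psrange so that it runs on its own (Python 3 assumed, where range is lazy); the use of itertools.islice and the upper bound of 10**6 are choices made for the example.

```python
from itertools import islice

def psrange(prefix, suffix, high):
    # Same generator as above: yields prefix + number + suffix lazily.
    return ('%s%d%s' % (prefix, i, s)
            for i in range(1, 1 + high) for s in suffix)

gen = psrange('s', 'ab', 10**6)     # nothing is computed yet
first_three = list(islice(gen, 3))  # consume only three items
print(first_three)                  # ['s1a', 's1b', 's2a']
fourth = next(gen)                  # the generator resumes where it stopped
print(fourth)                       # s2b
```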

Sorting a list numerically that has string and integer values

I am looking for code that can sort a list, say x, which contains integers and strings. The code should sort x so that each integer stays paired with its string and the pairs are ordered by the integer. So far I have tried this code, but it does not work.
x =["a" 2,"c" 10, "b" 5]
x.sort()
print (x)
I want the result to be
["a" 2 "b" 5 "C" 10]
so the list is sorted numerically in ascending order and the string is also printed.
Use a list of tuples and then sort them by whichever element you want, for example:
x = [('b',5),('a',2),('c',10)]
x.sort() # This will sort them based on item[0] of each tuple
x.sort(key=lambda s: s[1]) # This will sort them based on item[1] of each tuple
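Another approach is to use a dictionary instead of a list of tuples, for example:
x = {'b': 5, 'a': 2, 'c': 10}  # note: a dictionary is not sorted by key
If you print x, the order is not guaranteed to be sorted (it is arbitrary on older Python versions and insertion order on Python 3.7+); for example, you may get:
{'a': 2, 'c': 10, 'b': 5}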
if you want to sort them based on the value of each element, then:
x = sorted(x.items(), key=lambda s:s[1])
This creates a new list of tuples, since sorted() returns a new sorted list, so the result will be:
[('a', 2), ('b', 5), ('c', 10)]
If I deduced correctly, you also want the resulting list to have an integer wherever the original list has an integer (and the same for characters).
I don't think there is an out-of-the-box way to do that. One possible approach is to separate your list into two others: one with integer, one with chars. Then, after sorting each list, you can merge them respecting the desired positions of integers and chars.
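A minimal sketch of that split, sort and merge approach, assuming the elements are separate list items and that every item is either an int or a str (the variable names are illustrative):

```python
x = ["a", 2, "c", 10, "b", 5]

# Sort each kind separately and wrap the results in iterators.
numbers = iter(sorted(v for v in x if isinstance(v, int)))
letters = iter(sorted(v for v in x if isinstance(v, str)))

# Walk the original list and refill each slot from the matching iterator,
# so integers stay at integer positions and strings at string positions.
merged = [next(numbers) if isinstance(v, int) else next(letters) for v in x]
print(merged)  # ['a', 2, 'b', 5, 'c', 10]
```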
Use a nested iterable to pair the letters with the numbers, then sort the items by their second elements:
pairs = [('a', 2), ('c', 10), ('b', 5)]
pairs.sort(key=lambda x: x[1])
I assumed the elements are separate list items. The following code might help; you can keep or remove the print statement in the try block, as you wish.
x = ["a", 2, "c", 10, "b", 5]
numbers = []
letters = []
for element in x:
    try:
        numbers.append(int(element))
    except ValueError:
        letters.append(str(element))
# Sort in descending order so that pop() returns the smallest remaining item.
numbers.sort(reverse=True)
letters.sort(reverse=True)
for index, item in enumerate(x):
    try:
        print(int(item), end=" ")
        x[index] = numbers.pop()
    except ValueError:
        x[index] = letters.pop()
print("\n" + str(x))

Ordering a string by its substring numerical value in python

I have a list of strings that need to be sorted in numerical order, using two integer substrings as the key.
Obviously the sort() function orders my strings alphabetically, so I get 1, 10, 2... which is not what I'm looking for.
Searching around, I found that a key parameter can be passed to sort(), and that sort(key=int) should do the trick, but since my key is a substring and not the whole string, this would lead to a conversion error.
Supposing my strings are something like:
test1txtfgf10
test1txtfgg2
test2txffdt3
test2txtsdsd1
I want my list to be ordered in numeric order on the basis of the first integer and then on the second, so I would have:
test1txtfgg2
test1txtfgf10
test2txtsdsd1
test2txffdt3
I think I could extract the integer values, sort only them keeping track of what string they belong to and then ordering the strings, but I was wondering if there's a way to do this thing in a more efficient and elegant way.
Thanks in advance
Try the following
In [26]: import re
In [27]: f = lambda s: [int(n) for n in re.findall(r'\d+', s)]
In [28]: sorted(strings, key=f)
Out[28]: ['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']
This uses regex (the re module) to find all integers in each string, then compares the resulting lists. For example, f('test1txtfgg2') returns [1, 2], which is then compared against other lists.
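To make the list comparison visible, this sketch prints the key computed for each sample string before sorting. The name numeric_key and the strings list are assumptions that mirror the question's data.

```python
import re

def numeric_key(s):
    # All runs of digits in s, converted to integers, in order of appearance.
    return [int(n) for n in re.findall(r'\d+', s)]

strings = ['test1txtfgf10', 'test1txtfgg2', 'test2txffdt3', 'test2txtsdsd1']
keys = [numeric_key(s) for s in strings]
print(keys)     # [[1, 10], [1, 2], [2, 3], [2, 1]]

# Lists compare element by element, so [1, 2] < [1, 10] < [2, 1] < [2, 3].
ordered = sorted(strings, key=numeric_key)
print(ordered)  # ['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']
```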
Extract the numeric parts and sort using them
import re
d = """test1txtfgf10
test1txtfgg2
test2txffdt3
test2txtsdsd1"""
lines = d.split("\n")
re_numeric = re.compile(r"^[^\d]+(\d+)[^\d]+(\d+)$")
def key(line):
    """Returns a tuple (n1, n2) of the numeric parts of line."""
    m = re_numeric.match(line)
    if m:
        # group(n) returns the n-th captured substring
        return (int(m.group(1)), int(m.group(2)))
    else:
        return None
lines.sort(key=key)
Now lines are
['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']
import re
k = [
"test1txtfgf10",
"test1txtfgg2",
"test2txffdt3",
"test2txtsdsd1"
]
# Pair each string with the list of integers found between the letters.
tmp = [([int(e) for e in re.split("[a-z]+", el) if e], el) for el in k]
tmp.sort(key=lambda pair: pair[0])
tmp = [res for cm, res in tmp]
