Sort brackets after alphanumeric characters? - python

Working in Python 3:
a = ['(', 'z', 'a', '1', '{']
a.sort()
a
['(', '1', 'a', 'z', '{']
How can I sort the list so that alphanumeric characters come before punctuation characters:
a = ['(', 'z', 'a', '1', '{']
a.custom_sort()
a
['1', 'a', 'z', '(', '{']
(Actually I don't care about the order of the last two characters.)
This seems surprisingly difficult!
I understand that Python sorts asciibetically, and I'm looking for a human-readable sort. I found natsort but it only seems to deal with numbers.

You can pass sort a key function that returns a tuple: the first element tests whether the character is non-alphanumeric (so alphanumeric characters sort first), and the second uses the character's lexicographical order as a secondary sorting key:
a.sort(key=lambda c: (not c.isalnum(), c))
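To see why the tuple key works, it can help to print the key computed for each element. A minimal sketch (the variable names are only illustrative): tuples compare element by element, and False sorts before True, so alphanumeric characters land first.

```python
a = ['(', 'z', 'a', '1', '{']

# Each key is (is_not_alphanumeric, character).
# False < True, so alphanumerics come first; ties are broken by the character.
keys = [(not c.isalnum(), c) for c in a]
print(keys)
# [(True, '('), (False, 'z'), (False, 'a'), (False, '1'), (True, '{')]

result = sorted(a, key=lambda c: (not c.isalnum(), c))
print(result)
# ['1', 'a', 'z', '(', '{']
```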

You can pass sorted a key function that checks whether the value is in string.punctuation:
import string
punctuation = set(string.punctuation)
a = sorted(a, key=lambda x: (x in punctuation, x))
print(a)
#['1', 'a', 'z', '(', '{']
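The isalnum check and the string.punctuation check are not quite interchangeable. As a hedged illustration (the sample list is made up), a space is neither alphanumeric nor in string.punctuation, so the two keys place it differently:

```python
import string

punctuation = set(string.punctuation)
chars = ['(', ' ', 'b', '2']

# Key 1: anything that is not alphanumeric goes last (the space included).
by_isalnum = sorted(chars, key=lambda c: (not c.isalnum(), c))
# Key 2: only characters listed in string.punctuation go last.
by_punct = sorted(chars, key=lambda c: (c in punctuation, c))

print(by_isalnum)  # ['2', 'b', ' ', '(']
print(by_punct)    # [' ', '2', 'b', '(']
```

Which behaviour is preferable depends on how whitespace and other non-punctuation symbols should be treated.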

This approach explicitly checks whether the character is in the right sets:
import string
import sys
a = ['(', 'z', 'a', '1', '{']
def key(ch):
    # Letters and digits sort by code point; everything else sorts last.
    if ch in string.ascii_letters or ch in string.digits:
        return ord(ch)
    return sys.maxsize
a.sort(key=key)
print(a)
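One consequence of returning the same sys.maxsize key for every non-alphanumeric character is that those characters tie, and because Python's sort is stable they keep their original relative order. A small sketch with a reordered input (chosen here only for illustration):

```python
import string
import sys

def key(ch):
    # Letters and digits sort by code point; everything else shares one key.
    if ch in string.ascii_letters or ch in string.digits:
        return ord(ch)
    return sys.maxsize

a = ['{', 'z', '(', 'a', '1']
a.sort(key=key)
print(a)
# ['1', 'a', 'z', '{', '(']  ('{' stays ahead of '(' as in the input)
```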

Related

Python - Splitting a string by special characters and numbers

I have a string that I want to split at every instance of an integer, unless an integer is directly followed by another integer. I then want to split that same string at "(" and ")".
import re
myStr = "H12(O1H2)2O2C1"
list1 = re.split(r'(\d+)', myStr)
print(list1)
list1 = re.split(r'(\W)', myStr)
print(list1)
I want the result to be ['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1'].
After:
re.split(r'(\d+)', myStr)
I get:
['H', '12', '(O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1']
I now want to split up the open parenthesis and the "O" to make individual elements.
Trying to split up a list after it's already been split up the way I tried doesn't work.
Also, "myStr" eventually will be a user input, so I don't think that indexing through a known string (like myStr is in this example) would solve my issue.
Open to suggestions.
You have to use a character set to get what you want. Change (\d+) to something like ([\d]+|[\(\)]):
import re
myStr = "H12(O1H2)2O2C1"
list1 = re.split(r'([\d]+|[\(\)])', myStr)
# print(list1)
noempty_list = list(filter(None, list1))  # drop the empty strings that re.split leaves behind
print(noempty_list)
Output:
['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1']
The pattern also has to match the ( and ) characters; without that, the result still contains '(O'. Since re.split returns a list that includes empty strings, just filter them out.
([\d]+|[A-Z]) will work too, but re.split will return more empty strings in the list.
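An alternative to splitting and filtering, not taken from the answer above, is to describe the tokens themselves and collect them with re.findall. This sketch assumes a token is a run of digits, a single ASCII letter, or a single parenthesis; anything else in the input is silently skipped. The name tokenize is hypothetical.

```python
import re

def tokenize(formula):
    # One token = a run of digits, a single letter, or a single parenthesis.
    return re.findall(r'\d+|[A-Za-z]|[()]', formula)

tokens = tokenize("H12(O1H2)2O2C1")
print(tokens)
# ['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1']
```

Because findall only returns matches, there are no empty strings to remove afterwards.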

How does the "key=lambda w:sorted(w)" work?

words = "4of Fo1r pe6ople g3ood th5e the2"
words = sorted(words.split(), key=lambda w:sorted(w))
output:
['Fo1r', 'the2', 'g3ood', '4of', 'th5e', 'pe6ople']
I don't get how this function sorts the words based on the number in the word
Here are your words:
>>> words = "4of Fo1r pe6ople g3ood th5e the2"
>>> words = words.split()
>>> words
['4of', 'Fo1r', 'pe6ople', 'g3ood', 'th5e', 'the2']
You can sort the letters of each word:
>>> [sorted(w) for w in words]
[['4', 'f', 'o'], ['1', 'F', 'o', 'r'], ['6', 'e', 'e', 'l', 'o', 'p', 'p'], ['3', 'd', 'g', 'o', 'o'], ['5', 'e', 'h', 't'], ['2', 'e', 'h', 't']]
If you zip words and the previous list, you see each word along with the key:
>>> list(zip(words, [sorted(w) for w in words]))
[('4of', ['4', 'f', 'o']), ('Fo1r', ['1', 'F', 'o', 'r']), ('pe6ople', ['6', 'e', 'e', 'l', 'o', 'p', 'p']), ('g3ood', ['3', 'd', 'g', 'o', 'o']), ('th5e', ['5', 'e', 'h', 't']), ('the2', ['2', 'e', 'h', 't'])]
That's why Fo1r (key: ['1', 'F', 'o', 'r']) is before the2 (key: ['2', 'e', 'h', 't']) and so on...
sorted's key argument is a function called on each element of the input to produce the value to sort on. So when the key itself is also sorted (note: key=lambda w:sorted(w) is just a slow way to spell key=sorted), it means it sorts 'Fo1r' to produce a key value of ['1', 'F', 'o', 'r']. Since sorting characters effectively sorts them by ordinal value, and ASCII digits precede ASCII letters on ordinal value, it means that in this particular case, with one unique digit in each input, and the rest of each string being letters, it effectively sorts by the digit.
If the same digit appeared in more than one input, it would fall back to comparing the next-lowest ordinal value after the digit; e.g. 'abc1' would sort before 'xyz1' because the fallback comparison would be comparing 'a' to 'x'. Similarly, if a space character appeared in some inputs, but not others, those inputs would sort before all the others (because the space character is ordinal 32, while '0' is ordinal 48).
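Both edge cases are easy to check. A quick sketch with made-up inputs:

```python
# Same digit in both strings: the comparison falls through to the next
# character of each sorted key, 'a' versus 'x'.
same_digit = sorted(['xyz1', 'abc1'], key=sorted)
print(same_digit)
# ['abc1', 'xyz1']

# A space (ordinal 32) sorts before any digit, so the string containing it
# comes first even though its digit is larger.
with_space = sorted(['a1', 'b 9'], key=sorted)
print(with_space)
# ['b 9', 'a1']
```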
The inner sorted orders the symbols (letters and digits) of each word by their numeric code, which is more or less alphabetical, with digits preceding letters. As you may have noted, Python strings are often handled the same way as other iterables, such as lists, sets or dictionaries. For instance, sorted("of4") results in the list ["4", "f", "o"] because digits sort before letters. sorted maps "Fo1r" to a list starting with "1". This list precedes the one starting with "4", and the rest of the lists.
Apply sorted to each of the words, and it will become clearer.
More technically, the inner sorted transforms a word into an ordered list of 1-character strings. The outer sorted orders the words by comparing their corresponding ordered lists. Python lists are compared element by element.
Instead of the built-in sorted function, you can write your own function that orders the words by the digits they contain.
Here is such a function:
>>> def mysort(string):
...     lst = []
...     final = ''
...     # collect every digit in the input, then sort the digits
...     for char in string:
...         if char.isdigit():
...             lst.append(char)
...     lst.sort()
...     # for each digit in order, append the word(s) containing it
...     for i in lst:
...         for word in string.split():
...             if i in word:
...                 final = final + ' ' + word
...     return final
output:
>>> words= "4of Fo1r pe6ople g3ood th5e the2"
>>> mysort(words)
' Fo1r the2 g3ood 4of th5e pe6ople'
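If the intent is to order words by the number embedded in them, a key function can say so directly instead of relying on character ordinals. A minimal sketch, assuming every word contains at least one digit (embedded_number is a hypothetical helper name):

```python
import re

def embedded_number(word):
    # Assumes the word contains at least one run of digits.
    return int(re.search(r'\d+', word).group())

words = "4of Fo1r pe6ople g3ood th5e the2"
ordered = sorted(words.split(), key=embedded_number)
print(ordered)
# ['Fo1r', 'the2', 'g3ood', '4of', 'th5e', 'pe6ople']
```

Converting to int also keeps multi-digit numbers in numeric order, which a character-by-character comparison would not.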

How to check if there is any char (a - z) in a string [duplicate]

This question already has answers here:
How can I check if a string contains ANY letters from the alphabet?
(7 answers)
Closed 5 years ago.
I am trying to check whether a certain string contains any character from a to z.
I saw that I can use in, but it does not seem to be the most convenient way to go over the whole string like this:
if 'a' in string: ...
if 'b' in string: ...
if 'c' in string: ...
Can you help me find a function or algorithm that does this? Will it work on numbers as well?
Try using a regex:
import re
if re.search(r"[a-z]", s):
    ...
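On the follow-up about numbers, the character class can simply be widened. A sketch, assuming only lowercase ASCII letters and ASCII digits matter (the helper names are illustrative):

```python
import re

def has_lowercase_letter(s):
    # True if s contains at least one character in a-z.
    return re.search(r"[a-z]", s) is not None

def has_letter_or_digit(s):
    # Same idea, with the digits 0-9 added to the character class.
    return re.search(r"[a-z0-9]", s) is not None

print(has_lowercase_letter("1234-!"))  # False
print(has_letter_or_digit("1234-!"))   # True
print(has_lowercase_letter("12a4"))    # True
```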
Convert your input strings to lists, then post-process like this:
list(set([x for x in a if x in b]))
Script:
STRING = ";alkd779-n;l--xswdlfkj"
TEST = "abcde"
string = list(STRING)
test = list(TEST)
# unique characters of STRING that also appear in TEST
matches = list(set([x for x in string if x in test]))
contains_match = len(matches) > 0
print('string : %s' % string)
print('test : %s' % test)
print('matches : %s' % matches)
print('contains match : %s' % contains_match)
>>>
string : [';', 'a', 'l', 'k', 'd', '7', '7', '9', '-', 'n', ';', 'l', '-', '-', 'x', 's', 'w', 'd', 'l', 'f', 'k', 'j']
test : ['a', 'b', 'c', 'd', 'e']
matches : ['a', 'd']
contains match : True
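If only a yes/no answer is needed, set operations avoid building the intermediate lists. A sketch using the same sample data:

```python
STRING = ";alkd779-n;l--xswdlfkj"
TEST = "abcde"

# Characters common to both strings (sorted for a stable display order).
matches = sorted(set(STRING) & set(TEST))
# isdisjoint returns True when the two have no element in common.
contains_match = not set(STRING).isdisjoint(TEST)

print(matches)         # ['a', 'd']
print(contains_match)  # True
```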

Split a string in Python having parenthesis (multiple splitters)

I have a string, for example:
"ab(abcds)kadf(sd)k(afsd)(lbne)"
I want to split it to a list such that the list is stored like this:
a
b
abcds
k
a
d
f
sd
k
afsd
lbne
I need to get the elements outside the parenthesis in separate rows and the ones inside it in separate ones.
I am not able to think of any solution to this problem.
You can use iter to make an iterator and use itertools.takewhile to extract the strings between the parens:
from itertools import takewhile
s = "ab(abcds)kadf(sd)k(afsd)(lbne)"
it = iter(s)
print([ch if ch != "(" else "".join(takewhile(lambda x: x != ")", it)) for ch in it])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
If ch is not equal to (, we just take the char; otherwise we use takewhile, which keeps taking chars from the same iterator until we hit a ).
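The key detail is that takewhile and the list comprehension share one iterator, so whatever takewhile consumes (including the closing parenthesis) is skipped by the outer loop. A step-by-step sketch on a shorter, made-up string:

```python
from itertools import takewhile

it = iter("a(bc)d")

first = next(it)    # 'a'
opening = next(it)  # '('
# takewhile pulls 'b', 'c', then ')' from the same iterator; the ')' fails
# the test, ends the run, and is discarded.
inside = "".join(takewhile(lambda x: x != ")", it))
rest = next(it)     # 'd', because iteration resumes after the ')'

print(first, opening, inside, rest)
# a ( bc d
```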
Or, using re.findall, capture the text inside each pair of parentheses with \((.+?)\) and every other character with (\w):
import re
print([''.join(tup) for tup in re.findall(r'\((.+?)\)|(\w)', s)])
['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
You just need to use the magic of re.split and some logic.
import re
string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
temp = []
x = re.split(r'[(]', string)
# x = ['ab', 'abcds)kadf', 'sd)k', 'afsd)', 'lbne)']
for i in x:
    if ')' not in i:
        temp.extend(list(i))
    else:
        t = re.split(r'[)]', i)
        temp.append(t[0])
        temp.extend(list(t[1]))
print(temp)
# temp = ['a', 'b', 'abcds', 'k', 'a', 'd', 'f', 'sd', 'k', 'afsd', 'lbne']
Have a look at the difference between append and extend: append adds its argument as a single element, while extend adds each item of an iterable.
I hope this helps.
You have two options. The really easy one is to just iterate over the string. For example:
my_string = "ab(abcds)kadf(sd)k(afsd)(lbne)"
my_list = []
in_parens = False
buffer = ''
for char in my_string:
    if char == '(':
        in_parens = True
    elif char == ')':
        # closing paren: store the collected text and reset the buffer
        in_parens = False
        my_list.append(buffer)
        buffer = ''
    elif in_parens:
        buffer += char
    else:
        my_list.append(char)
The other option is regex.
I would suggest regex. It is worth practicing.
Try Python's re module. If you are new to re, it may take a bit of time, but you can do all kinds of string manipulation once you get it.
import re
search_string = 'ab(abcds)kadf(sd)k(afsd)(lbne)'
re_pattern = re.compile(r'(\w)|\((\w*)\)')  # Match a single character or the characters inside parentheses
print([x if x else y for x, y in re_pattern.findall(search_string)])

Python 3: convert a list into a dictionary

I am looking for a simple method to convert a list into a dictionary. I have a simple list:
leet =['a','4','b','l3','c','(','d','[)','e','3','g','6','l','1','o','0','s','5','t','7','w','\/\/']
which I want to easily convert to a dictionary. I have tried using defaultdict but I don't quite understand what it is doing (I found this code in a previous answer):
>>> from collections import defaultdict
>>> dic = defaultdict(list)
>>> for item in leet:
...     key = "/".join(item[:-1])
...     dic[key].append(item[-1])
>>> dic
defaultdict(<class 'list'>, {'\\:/:\\': [], '': ['a', '4', 'b', 'c', '(', 'd', 'e', '3', 'g', '6', 'l', '1', 'o', '0', 's', '5', 't', '7', 'w'], 'l': ['3'], '[': [')'], '\\///\\': ['/']})
Ultimately, I want to read in the data from a txt file (line by line) into a list and convert it to a dictionary for the rest of the simple program.
I'm looking for a straight-forward way to achieve this.
Thanks
I'm not sure you're going down the right path with a defaultdict. Instead, convert the list to a dict by grouping the items into pairs, then use dict.get to handle characters that have no matching key:
leet = ['a','4','b','l3','c','(','d','[)','e','3','g','6','l','1','o','0','s','5','t','7','w',r'\/\/']
lookup = dict(zip(*[iter(leet)] * 2))  # pair up consecutive items: key, value, key, value, ...
text = 'how are you?'
blah = ''.join(lookup.get(ch, ch) for ch in text)  # unknown characters map to themselves
# h0\/\/ 4r3 y0u?
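The zip(*[iter(leet)] * 2) idiom is compact but opaque. [iter(leet)] * 2 is a list holding the same iterator twice, so zip pulls two consecutive items for every pair. A sketch on a shortened list, with a slicing equivalent that may be easier to read:

```python
items = ['a', '4', 'b', 'l3', 'c', '(']

it = iter(items)
pairs = list(zip(it, it))  # same iterator twice, so items come out in consecutive pairs
print(pairs)
# [('a', '4'), ('b', 'l3'), ('c', '(')]

# Equivalent with slicing: even positions are keys, odd positions are values.
lookup = dict(zip(items[::2], items[1::2]))
print(lookup)
# {'a': '4', 'b': 'l3', 'c': '('}
```

The slicing form copies the list twice, which is unlikely to matter for a small lookup table like this one.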
components_dict = {y['id']: y for y in components}
where each component object looks as follows:
{"id": 1234, "name": "xxx"}
