using 'or' condition in re.split - python

I have a list of strings, each of which needs to be split when a 'y' or 'm' is found:
mylist = ['3m10y','10y20y','18m2y']
into the following items:
splitlist = [['3m','10y'],['10y','20y'],['18m','2y']]
I was thinking of using re.split(), but I cannot figure out how to use an 'or' condition to tell the function to split when it finds either an 'm' or a 'y'.
Any help appreciated!
Thanks

Try findall instead of split:
>>> re.findall(r'\d+[ym]', '3m10y')
['3m', '10y']
[ym] is a character class that matches a single m or y.
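Applied to the whole list from the question, a list comprehension built on the findall pattern above gives the requested nested list. This is a small sketch; it assumes every item consists only of digit runs each followed by y or m (findall silently skips anything else).

```python
import re

mylist = ['3m10y', '10y20y', '18m2y']
# \d+ matches one or more digits; [ym] then matches a single 'y' or 'm'.
splitlist = [re.findall(r'\d+[ym]', s) for s in mylist]
print(splitlist)  # [['3m', '10y'], ['10y', '20y'], ['18m', '2y']]
```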

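Alternatively, with re.split: inside a pattern, | means "or", and the capturing group keeps the m/y delimiters in the result:
>>> items = re.split(r'(m|y)', '10m2y4m55y55y53m')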
>>> items
['10', 'm', '2', 'y', '4', 'm', '55', 'y', '55', 'y', '53', 'm', '']
>>> [''.join(p) for p in zip(items[::2], items[1::2])]
['10m', '2y', '4m', '55y', '55y', '53m']
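If you want re.split itself to produce the pieces directly, one option (not from the answers above) is to split at the empty position right after either letter, using two lookbehind assertions joined by the | "or". This is a sketch: it relies on re.split accepting patterns that can match the empty string, which needs Python 3.7 or later, and it filters out the trailing empty string produced by the split point at the end of each item.

```python
import re

mylist = ['3m10y', '10y20y', '18m2y']
# (?<=m)|(?<=y): split right after an 'm' OR right after a 'y'.
# Lookbehinds consume no characters, so each letter stays attached to its number.
pattern = r'(?<=m)|(?<=y)'
splitlist = [[part for part in re.split(pattern, s) if part] for s in mylist]
print(splitlist)  # [['3m', '10y'], ['10y', '20y'], ['18m', '2y']]
```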


Python - Splitting a string by special characters and numbers

I have a string that I want to split at every instance of an integer, unless an integer is directly followed by another integer. I then want to split that same string at "(" and ")".
import re
myStr = "H12(O1H2)2O2C1"
list1 = re.split(r'(\d+)', myStr)
print(list1)
list1 = re.split(r'(\W)', myStr)
print(list1)
I want the result to be ['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1'].
After:
re.split(r'(\d+)', myStr)
I get:
['H', '12', '(O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1', '']
I now want to separate the open parenthesis and the "O" into individual elements.
Splitting the list again after it has already been split, the way I tried, doesn't work.
Also, myStr will eventually be user input, so I don't think indexing into a known string (as myStr is in this example) would solve my issue.
Open to suggestions.
Use a character set to get what you want: change (\d+) to something like ([\d]+|[\(\)]).
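import re
myStr = "H12(O1H2)2O2C1"
# Capture runs of digits OR a single parenthesis, so both are kept as separate items.
list1 = re.split(r'([\d]+|[\(\)])', myStr)
# re.split leaves empty strings between adjacent matches; filter(None, ...) drops them.
noempty_list = list(filter(None, list1))
print(noempty_list)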
Output:
['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1']
You also have to match the ( and ) characters; without that you get '(O'. And since re.split returns a list containing empty strings, just remove them.
Using ([\d]+|[A-Z]) works too, but then re.split returns even more empty strings in the list.
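A findall-based alternative (not from the original answer) avoids the empty strings altogether. This sketch assumes every non-digit character should be its own token, so a two-letter element symbol such as Cl would come out as 'C' and 'l', and any spaces in user input would also become tokens.

```python
import re

myStr = "H12(O1H2)2O2C1"
# \d+ grabs a whole run of digits; \D grabs any single non-digit character.
tokens = re.findall(r'\d+|\D', myStr)
print(tokens)  # ['H', '12', '(', 'O', '1', 'H', '2', ')', '2', 'O', '2', 'C', '1']
```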

How to write complex sort in python?

Is there a concise way to sort a list by first sorting the numbers in ascending order and then sorting the characters in descending order?
How would you sort the following:
['2', '4', '1', '6', '7', '4', '2', 'K', 'A', 'Z', 'B', 'W']
To:
['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
One way (there might be better ones) is to separate digits and letters beforehand, sort each group appropriately, and glue them back together at the end:
lst = ['2', '4', '1', '6', '7', '4', '2', 'K', 'A', 'Z', 'B', 'W']
numbers = sorted([number for number in lst if number.isdigit()])
letters = sorted([letter for letter in lst if not letter.isdigit()], reverse=True)
combined = numbers + letters
print(combined)
Another way makes use of ord(...) and the ability to sort by tuples. Here we use zero for numbers and one for letters:
def sorter(item):
    if item.isdigit():
        return 0, int(item)
    else:
        return 1, -ord(item)
print(sorted(lst, key=sorter))
Both will yield
['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
As for timing:
from timeit import timeit
my_list = ['2', '4', '1', '6', '7', '4', '2', 'K', 'A', 'Z', 'B', 'W']
def different_lists():
    numbers = sorted([number for number in my_list if number.isdigit()])
    letters = sorted([letter for letter in my_list if not letter.isdigit()], reverse=True)
    return numbers + letters
def key_function():
    def sorter(item):
        if item.isdigit():
            return 0, int(item)
        else:
            return 1, -ord(item)
    return sorted(my_list, key=sorter)
print(timeit(different_lists, number=10**6))
print(timeit(key_function, number=10**6))
This yields (running it a million times on my MacBook):
2.9208732349999997
4.54283629
So the approach with list comprehensions is faster here.
To elaborate on the custom-comparison approach: in Python 3, the built-in sort compares keys produced by a key function rather than calling a custom comparison function.
How to think about the problem: to group values and then sort each group by a different quality, treat "which group is this value in?" as a quality of its own. Now we are sorting by multiple qualities, which we do with a key that returns a tuple holding one value per quality.
Since we want to sort the letters in descending order, and we can't "negate" them (in the arithmetic sense), it will be easiest to apply reverse=True to the entire sort, so we keep that in mind.
We encode: True for digits and False for non-digits (since numerically, these are equivalent to 1 and 0 respectively, and we are sorting in descending order overall). Then for the second value, we'll use the symbol directly for non-digits; for digits, we need the negation of the numeric value, to re-reverse the sort.
This gives us:
def custom_key(value):
    numeric = value.isdigit()
    return (numeric, -int(value) if numeric else value)
And now we can do:
my_list.sort(key=custom_key, reverse=True)
which works for me (and also handles multi-digit numbers):
>>> my_list
['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
You will have to implement your own key function and pass it as the key argument of sorted(). What you are seeking is not a trivial comparison: you "assign" custom values to the items, so you have to tell Python how you value each one of them.
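If you would rather express the ordering as a comparison function, Python 3's sorted() no longer accepts one directly, but functools.cmp_to_key turns it into a key. A sketch of the same ordering (digits ascending, then letters descending); the name compare is just illustrative:

```python
from functools import cmp_to_key

def compare(a, b):
    """Return a negative number if a sorts before b, positive if after, 0 if equal."""
    a_digit, b_digit = a.isdigit(), b.isdigit()
    if a_digit and not b_digit:
        return -1  # digits come before letters
    if b_digit and not a_digit:
        return 1
    if a_digit:
        return int(a) - int(b)  # both digits: ascending numeric order
    return (a < b) - (a > b)    # both letters: descending order

lst = ['2', '4', '1', '6', '7', '4', '2', 'K', 'A', 'Z', 'B', 'W']
result = sorted(lst, key=cmp_to_key(compare))
print(result)  # ['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
```

A key function is usually preferred, since it is called once per element instead of once per comparison.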

How can I split a string into tokens?

If I have a string
'x+13.5*10x-4e1'
how can I split it into the following list of tokens?
['x', '+', '13', '.', '5', '*', '10', 'x', '-', '4', 'e', '1']
Currently I'm using the shlex module:
import shlex
def tokenize(expr):
    lexer = shlex.shlex(expr)
    tokenList = []
    for token in lexer:
        tokenList.append(token)  # tokens are already strings
    return tokenList
print(tokenize('x+13.5*10x-4e1'))
But this returns:
['x', '+', '13', '.', '5', '*', '10x', '-', '4e1']
So I'm trying to split the letters from the numbers. I'm considering taking the strings that contain both letters and numbers and somehow splitting them, but I'm not sure how to do this, or how to add the pieces back into the list with the other tokens afterwards. It's important that the tokens stay in order, and I can't have nested lists.
In an ideal world, e and E would not be recognised as letters in the same way, so
'-4e1'
would become
['-', '4e1']
but
'-4x1'
would become
['-', '4', 'x', '1']
Can anybody help?
Use the regular expression module's split() function to split at
'\d+' -- digits (number characters) and
'\W+' -- non-word characters:
CODE:
import re
print([i for i in re.split(r'(\d+|\W+)', 'x+13.5*10x-4e1') if i])
OUTPUT:
['x', '+', '13', '.', '5', '*', '10', 'x', '-', '4', 'e', '1']
If you don't want to split on the dot (so that floating-point numbers in the expression stay whole), use this instead:
[\d.]+ -- digit or dot characters (although this also accepts malformed input such as 13.5.5)
CODE:
print([i for i in re.split(r'([\d.]+|\W+)', 'x+13.5*10x-4e1') if i])
OUTPUT:
['x', '+', '13.5', '*', '10', 'x', '-', '4', 'e', '1']
Another alternative not suggested here is to use the nltk.tokenize module.
The problem isn't entirely simple. A robust (but, unfortunately, longer) solution is to use Python Lex-Yacc (PLY) to build a full tokenizer. Lex/Yacc is common practice for this (not only in Python), so there are likely ready-made grammars for a simple arithmetic tokenizer (like this one) that you only have to adapt to your specific needs.
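For the "ideal world" behaviour described in the question, one possibility (a sketch, not from the answers above) is re.findall with a number pattern that allows an optional decimal part and an optional exponent. The grammar is an assumption spelled out in the comments: a number followed by e or E and digits is read as scientific notation, every other letter is a single-letter variable, and any other non-space character is an operator. The trade-off is that something like 2e3 can no longer mean "2 times the variable e, then 3".

```python
import re

# Assumed grammar (hypothetical, adjust to your needs):
#   number   = digits, optional ".digits", optional exponent such as e1, E-3, e+2
#   variable = a single ASCII letter
#   operator = any other non-whitespace character
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|[A-Za-z]|\S')

def tokenize(expr):
    """Return the tokens of expr in order, as a flat list of strings."""
    return TOKEN_RE.findall(expr)

print(tokenize('x+13.5*10x-4e1'))  # ['x', '+', '13.5', '*', '10', 'x', '-', '4e1']
print(tokenize('-4x1'))            # ['-', '4', 'x', '1']
```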

Python 3: convert a list into a dictionary

I am looking for a simple method to convert a list into a dictionary. I have a simple list:
leet =['a','4','b','l3','c','(','d','[)','e','3','g','6','l','1','o','0','s','5','t','7','w','\/\/']
which I want to convert to a dictionary easily. I have tried using defaultdict, but I don't quite understand what it is doing (I found this code in a previous answer):
>>> from collections import defaultdict
>>> dic = defaultdict(list)
>>> for item in leet:
...     key = "/".join(item[:-1])
...     dic[key].append(item[-1])
...
>>> dic
defaultdict(<class 'list'>, {'\\:/:\\': [], '': ['a', '4', 'b', 'c', '(', 'd', 'e', '3', 'g', '6', 'l', '1', 'o', '0', 's', '5', 't', '7', 'w'], 'l': ['3'], '[': [')'], '\\///\\': ['/']})
Ultimately, I want to read the data from a txt file (line by line) into a list and convert it to a dictionary for the rest of this simple program.
I'm looking for a straightforward way to achieve this.
Thanks
I'm not sure a defaultdict is the right path. Convert the list to a dict by grouping it into pairs, then use dict.get to handle characters that have no matching key:
leet =['a','4','b','l3','c','(','d','[)','e','3','g','6','l','1','o','0','s','5','t','7','w','\/\/']
lookup = dict(zip(*[iter(leet)] * 2))
text = 'how are you?'
blah = ''.join(lookup.get(ch, ch) for ch in text)
# h0\/\/ 4r3 y0u?
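For the asker's eventual goal of reading the data from a text file line by line, here is a minimal sketch. It assumes (hypothetically) that the file holds one entry per line, alternating key and value in the same order as leet; io.StringIO stands in for the real file so the example runs on its own.

```python
import io

# Stand-in for a real file: one entry per line, alternating key and value.
fake_file = io.StringIO("a\n4\nb\nl3\ne\n3\n")

# Strip the newline (and surrounding whitespace) from each line, skipping blank lines.
items = [line.strip() for line in fake_file if line.strip()]

# Pair items[0] with items[1], items[2] with items[3], and so on.
lookup = dict(zip(items[::2], items[1::2]))
print(lookup)  # {'a': '4', 'b': 'l3', 'e': '3'}
```

With a real file, iterate over the object returned by open() inside a with statement instead of the StringIO; the rest stays the same.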
components_dict = {y['id']: y for y in components}
where each component object looks like this:
{"id": 1234, "name": "xxx"}

Problem sorting list of strings - Python

I have a list of strings:
cards = ['2S', '8D', '8C', '4C', 'TS', '9S', '9D', '9C', 'AC', '3D']
and the order in which I want to display the cards:
CARD_ORDER = ['2', '3', '4', '5', '6', '7', '8', '9', 'T', 'J', 'Q', 'K', 'A']
This is how I'm trying to order the list:
sorted(cards, lambda x,y: CARD_ORDER.index(x[0]) >= CARD_ORDER.index(y[0]) )
Unfortunately this does not seem to work; more precisely, the list stays exactly the same, whereas sorted(cards) on its own works fine.
Any ideas?
It's:
sorted(cards, key=lambda x: CARD_ORDER.index(x[0]))
The key parameter takes a function of a single element that returns the value to sort that element by. You're probably trying to use the cmp parameter, which has been discouraged for a long time and no longer exists in Python 3.
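Because CARD_ORDER.index() scans the list on every key call, a small variation (not from the original answers) precomputes a rank-to-position dict once; the name RANK is illustrative. Python's sort is stable, so cards of equal rank keep their original relative order.

```python
cards = ['2S', '8D', '8C', '4C', 'TS', '9S', '9D', '9C', 'AC', '3D']
CARD_ORDER = ['2', '3', '4', '5', '6', '7', '8', '9', 'T', 'J', 'Q', 'K', 'A']

# Map each rank character to its position once, so each key call is a dict lookup.
RANK = {rank: i for i, rank in enumerate(CARD_ORDER)}

# card[0] is the rank character ('T' for ten), card[1] the suit.
sorted_cards = sorted(cards, key=lambda card: RANK[card[0]])
print(sorted_cards)  # ['2S', '3D', '4C', '8D', '8C', '9S', '9D', '9C', 'TS', 'AC']
```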
