"x.rsplit()" with multiple delimiters in Python - python

I have this code:
x.rsplit("+", 1)[-1]
The string is split once, from the right, if it contains "+".
Example:
12+345+32 --> 32
But I want it to split if "+", "-" or "/" are in the way (Not just with "+" but also with "-" or "/")
Example:
12+345 - 32 --> 32
or
12+345 / 32 --> 32
How can I split on multiple delimiters?

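You can use a regex to split the string on the operators and return the rightmost result. Put "-" first in the character class so it is taken literally: in [+-/] it is read as a range from "+" to "/", which also matches "," and ".".
import re
re.split(r"[-+/]", x)[-1].strip()  # strip() removes the spaces around the operand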

Instead of splitting into a list you can extract the last "chunk" with precise regex matching:
import re
s = '12+345 - 32'
res = re.search(r'(?<=[-+/])[^-+/]+$', s).group().strip()  # "-" first so it is literal, not a range
print(res)
32
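As a minimal sketch of the splitting approach above, the helper below (last_operand is a hypothetical name, not from the original answers) wraps the split and shows why the position of "-" in the character class matters. It assumes the operands themselves never contain "+", "-" or "/".

```python
import re

def last_operand(expr):
    """Return the text after the last '+', '-' or '/' in expr."""
    # '-' comes first in the class, so it is a literal and not a range.
    return re.split(r"[-+/]", expr)[-1].strip()

print(last_operand("12+345+32"))    # 32
print(last_operand("12+345 - 32"))  # 32
print(last_operand("12+345 / 32"))  # 32
# With the range form [+-/], the '.' below would also act as a delimiter.
print(last_operand("1,5+2.5"))      # 2.5
```

If the string contains no operator at all, re.split returns the whole string as its only element, so the helper returns the input unchanged (minus surrounding whitespace).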

Related

extracting chars from string using regex and pythonic way

I have a string like this: "32H74312"
I want to extract some parts and put them in different variables.
first_part = 32 # always 2 digits
second_part = H # always 1 chars
third_part = 743 # always 3 digit
fourth_part = 12 # always 2 digit
Is there some way to this in pythonic way?
There's no reason to use a regex for such a simple task.
The pythonic way could be something like:
string = "32H74312"
part1 = string[:2]
part2 = string[2:3]
part3 = string[3:6]
part4 = string[6:]
If the string is always the same length, you can also slice with negative indices:
string = "32H74312"
first_part = string[:2] # always 2 digits
second_part = string[2:-5] # always 1 char
third_part = string[3:-2] # always 3 digits
fourth_part = string[-2:] # always 2 digits
Since you have a fixed amount of characters to capture you can do:
(\d\d)(\w)(\d{3})(\d\d)
You can then utilize re.match.
pattern = r"(\d\d)(\w)(\d{3})(\d\d)"
string = "32H74312"
first_part, second_part, third_part, fourth_part = re.match(pattern, string).groups()
print(first_part, second_part, third_part, fourth_part)
Which outputs:
32 H 743 12
Unless you want an easy way to enforce that each part consists of digits or word characters, this isn't really something you need a regex for.
This is quite 'pythonic' also :
string = "32H74312"
parts = {0: 2, 2: 3, 3: 6, 6: 8}  # start index -> end index of each part
string_parts = [ string[ p : parts[p] ] for p in parts ]
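A variation on the same idea, sketched here under the assumption that the layout is easier to describe as a list of field widths than as start/end pairs. The name split_fixed is hypothetical and used only for illustration.

```python
from itertools import accumulate

def split_fixed(text, widths):
    """Cut text into consecutive pieces with the given widths."""
    ends = list(accumulate(widths))   # running totals: end index of each piece
    starts = [0] + ends[:-1]          # each piece starts where the previous ended
    return [text[a:b] for a, b in zip(starts, ends)]

print(split_fixed("32H74312", [2, 1, 3, 2]))  # ['32', 'H', '743', '12']
```

Like plain slicing, this does no validation: a string shorter than the total width simply yields shorter or empty trailing pieces.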
Expanding on Pedro's excellent answer, string slicing syntax is the best way to go.
However, having variables like first_part, second_part, . . . nth_part is typically considered an anti-pattern; you are probably looking for a tuple instead:
s = "32H74312"  # avoid naming a variable str, which shadows the built-in type
parts = (s[:2], s[2], s[3:6], s[6:])
print(parts)
print(parts[0], parts[1], parts[2], parts[3])
You can use this method:
import re
line = '32H74312'
d2p = r'(\d\d)' # two digits pattern
ocp = r'(\w)' # one char pattern
d3p = r'(\d{3})' # three digits pattern
lst = re.match(d2p + ocp + d3p + d2p, line).groups()
for item in lst:
    print(item)
The parentheses are needed to create capture groups. To make testing your regexps more comfortable, you can use an online tester such as regex101.

How can i solve this regular expression, Python?

I would like to construct a reg expression pattern for the following string, and use Python to extract:
str = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"
What I want to do is extract the standalone number values and add them up, which should give 278. My preliminary Python code is:
import re
x = re.findall('([0-9]+)', str)
The problem with the above code is that numbers within a char substring like 'ar3' would show up. Any idea how to solve this?
Why not try something simpler like this?:
str = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"
print(sum(int(s) for s in str.split() if s.isdigit()))
# 278
s = re.findall(r"\s\d+\s", str) # \s matches whitespace before and after the number
print(sum(map(int, s))) # print the sum of all matches
\d+ matches one or more digits. For the sample string this gives the expected output:
278
How about this?
x = re.findall(r'\s([0-9]+)\s', str)
The solutions posted so far only work (if at all) for numbers that are preceded and followed by whitespace. They will fail if a number occurs at the very start or end of the string, or if a number appears at the end of a sentence, for example. This can be avoided using word boundary anchors:
a = "100 bottles of beer on the wall (ignore the 1000s!), now 99, now only 98"
s = re.findall(r"\b\d+\b", a) # \b matches at the start/end of an alphanumeric sequence
print(sum(map(int, s)))
Result: 297
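If the goal is strictly "numbers delimited by whitespace or the string boundaries" rather than word boundaries, one possible alternative is a pair of lookarounds. This is a sketch, not one of the original answers; the function name sum_standalone_numbers is hypothetical. Unlike \s\d+\s it does not consume the surrounding whitespace, so adjacent numbers and numbers at the very start or end are all found. Unlike \b, it would not count "99," because the comma is a non-space character.

```python
import re

def sum_standalone_numbers(text):
    """Sum the digit runs that are surrounded by whitespace or string edges."""
    # (?<!\S): not preceded by a non-space character
    # (?!\S):  not followed by a non-space character
    return sum(int(n) for n in re.findall(r"(?<!\S)\d+(?!\S)", text))

text = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"
print(sum_standalone_numbers(text))        # 278
print(sum_standalone_numbers("10 20 30"))  # 60
```

For comparison, re.findall(r"\s\d+\s", "10 20 30") finds only " 20 ", because 10 has no leading whitespace and 30 has no trailing whitespace.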
To avoid a partial match, anchor the pattern at both ends and test each whitespace-separated token against it:
'^[0-9]+$'

The elegant way to replace specific characters in Python

I have strings that are unpredictable in terms of character content, but I know that every string contains exactly one character '*'.
How can I replace the two characters after the '*' with a string that is not hard-coded? The replacement is a calculated checksum converted to a string:
checksum_str = str(hex(csum).lstrip('0x'))
You want something like:
star_pos = my_string.find('*')
my_string = my_string[:star_pos] + '*' + checksum_str + my_string[star_pos + 3:]
You can do it with a regular expression:
import re
my_string = re.sub(r'(?<=\*)..', checksum_str, my_string, count=1)
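A minimal sketch of the slicing approach as a function. Two things here are assumptions, not part of the original posts: the name insert_checksum, and the choice to format the checksum as exactly two uppercase hex digits. The formatting matters because str.lstrip('0x') strips any leading "0" and "x" characters rather than the "0x" prefix, so a checksum of 0 would become an empty string.

```python
def insert_checksum(text, csum):
    """Replace the two characters after the single '*' with csum in hex."""
    # Assumption: the checksum should always occupy two hex digits.
    checksum_str = format(csum, "02X")
    star_pos = text.index("*")  # raises ValueError if there is no '*'
    return text[:star_pos + 1] + checksum_str + text[star_pos + 3:]

print(insert_checksum("$GPGLL,data*00", 0x0A))  # $GPGLL,data*0A
print(repr(hex(0).lstrip("0x")))                # '' (the pitfall described above)
```

If lowercase digits or a variable width are wanted instead, format(csum, "x") gives the plain hex digits without the "0x" prefix.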

Find number in string between spaces and parse it

I have a file with lines in following format:
d 55 r:100:10000
I would like to find that 55 and parse it to an int. How can I do this? It should be robust to a variable number of spaces: there may be more or fewer spaces, but the number will always be between d and r.
That's easy:
number = int(line.split()[1])
If you actually need to check whether it's between d and r, then use
import re
number = int(re.search(r"d\s+(\d+)\s+r", line).group(1))
The str.split() method by default splits on arbitrary-width whitespace:
>>> 'd 55 r:100:10000'.split()
['d', '55', 'r:100:10000']
Picking out just the middle number then becomes a simple select:
>>> int('d 55 r:100:10000'.split()[1])
55
Quoting the documentation:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
>>> line = 'd 55 r:100:10000'
>>> int(line.split()[1]) # By default splits by any whitespace
55
Or, to be more efficient, you can limit it to two splits (the item at index 1 still holds the integer):
>>> int(line.split(None, 2)[1]) # None means split by any whitespace
55
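For lines read from a file, it can help to tolerate lines that do not fit the format. The sketch below combines the regex check with a None result for non-matching lines. The names LINE_RE and parse_number are hypothetical, and anchoring the pattern at the start of the line is an assumption about the file format.

```python
import re

# Assumption: every valid line starts with 'd', then the number, then 'r'.
LINE_RE = re.compile(r"d\s+(\d+)\s+r")

def parse_number(line):
    """Return the integer between 'd' and 'r', or None if the line does not fit."""
    m = LINE_RE.match(line)  # match() anchors at the start of the line
    return int(m.group(1)) if m else None

print(parse_number("d 55 r:100:10000"))        # 55
print(parse_number("d     55   r:100:10000"))  # 55
print(parse_number("x 55 r:100:10000"))        # None
```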

Need help designing a regex or pyparsing approach to modify all words enclosed within pipes

For example:
blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|
Must become
blahblahx0Ax4Dx5Ex43adfsdasdx92 sgagrewasx12x5E
I'm trying something along the lines of: re.sub(r'\|(\w+ ?)*\|', r'x\1', a) But I'm having trouble getting it to work on more than the first match.
UPDATE: It looks like regex is not a good choice for this. Would a pyparsing solution be doable?
If not, I can write a simple iterative solution, but I would prefer something more extensible.
UPDATE2: I used a pure python approach in the end, it works fine and can deal with escape characters too.
def strtohex(self, string):
    hexmode = False
    hexstring = ''
    i = 0
    while i < len(string):
        if string[i] == '\\':
            i += 1
            # No escape characters inside hex pipes
            hexstring += string[i]
        elif string[i] == '|':
            hexmode = not hexmode
        elif string[i] == ' ':
            hexstring += '' if hexmode else ' '
        else:
            if hexmode:
                hexstring += chr(int(string[i:i+2], 16))
                i += 1
            else:
                hexstring += string[i]
        i += 1
    return hexstring
Here is what this might look like in pyparsing:
from pyparsing import Word,hexnums,Suppress,OneOrMore
twoDigitHex = Word(hexnums,exact=2)
VERT = Suppress('|')
pattern = VERT + OneOrMore(twoDigitHex) + VERT
# attach parse action to prefix each 2-digit hex with 'x' and join all together
pattern.setParseAction(lambda t: ''.join('x'+tt for tt in t))
# take sample code, and use transformString to apply conversion
sample = "blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|"
print(pattern.transformString(sample))
prints
blahblahx0Ax4Dx5Ex43adfsdasdx92 sgagrewasx12x5E
I'm sure you could do it using only a regex, but why bother? It's simple to use your programming language:
Break your string at the vertical bars. Check and substitute if appropriate. Recombine.
import re
line = 'blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|'
parts = line.split('|')
for i, s in enumerate(parts):
    # the whole piece must be two-digit hex numbers separated by single spaces
    if re.match(r'^([\dA-F]{2} )*[\dA-F]{2}$', s):
        parts[i] = re.sub('^| ', 'x', s)
result = "".join(parts)
The check is whether the entire substring consists of two-digit hex numbers separated by spaces. I assume all hex letters are capitalized, as in your example.
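I proceeded in two steps:
first prefix every hex value with x,
then remove the blanks and the | characters.
Note that this also removes spaces outside the pipes, such as the one before sgagrewas. It gives: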
>>> s = 'blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|'
>>> re.sub(r'[| ]', r'', re.sub(r' ?([0-9A-F]{2})', r'x\1', s))
'blahblahx0Ax4Dx5Ex43adfsdasdx92sgagrewasx12x5E'
I don't think python is capable of balanced regex expressions. To my knowledge, .NET is the only flavor with such support (and it looks quite ugly and is nightmarish to maintain).
You may be better off splitting the string on the pipe symbol, then rejoining the string, applying the desired formatting (via regex, if so desired) on the odd numbered string array items.
EDIT: On second thought, I believe this would be possible using a lookbehind with a variable-length expression, but unfortunately python does not have support for those. (For example, something along the lines of (?<=^(?:[^|]*\|[^|]*\|)*[^|]*)\|(\w+ ?)*\|)
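For what it's worth, a single re.sub call can handle every pipe-delimited group if the replacement is a function rather than a template string. This is a sketch under the assumption that the pipes always come in pairs and contain only hex digits and spaces; pipes_to_hex is a hypothetical name.

```python
import re

def pipes_to_hex(text):
    """Rewrite each |AA BB ...| group as xAAxBB..., leaving other text alone."""
    def convert(match):
        # match.group(1) is the text between one pair of pipes
        return "".join("x" + tok for tok in match.group(1).split())
    return re.sub(r"\|([0-9A-Fa-f ]*)\|", convert, text)

sample = "blahblah|0A 4D 5E 43|adfsdasd|92| sgagrewas|12 5E|"
print(pipes_to_hex(sample))
# blahblahx0Ax4Dx5Ex43adfsdasdx92 sgagrewasx12x5E
```

Because each match consumes both its opening and closing pipe, the text between two groups (such as adfsdasd) is never mistaken for the inside of a group.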
