Replace string which has dynamic character in python

I am trying to replace part of a string with a regular expression, without success.
The strings are "LIVE_CUS2_PHLR182", "LIVE_CUS2ee_PHLR182" and "PHLR182 - testing recovery". I need to get PHLR182 as the output for all of them, but in the second string the "ee" is not constant: it can be any two letters or digits. Below is the code I have tried.
For the first and last strings I simply used the replace function, as below.
s = "LIVE_CUS2_PHLR182"
s.replace("LIVE_CUS2_", ""), s.replace(" - testing recovery","")
>>> PHLR182
But for the second string I tried the following.
1. s= "LIVE_CUS2ee_PHLR182"
s.replace(r'LIVE_CUS2(\w+)*_','')
2. batRegex = re.compile(r'LIVE_CUS2(\w+)*_PHLR182')
mo2 = batRegex.search('LIVE_CUS2dd_PHLR182')
mo2.group()
3. re.sub(r'LIVE_CUS2(?is)/s+_PHLR182', '', r)
In none of these cases could I get "PHLR182" as the output. Please help me.

I think this is what you need:
import re
texts = """LIVE_CUS2_PHLR182
LIVE_CUS2ee_PHLR182
PHLR182 - testing recovery""".split('\n')
pat = re.compile(r'(LIVE_CUS2\w{,2}_| - testing recovery)')
# 1st alt pattern | 2nd alt pattern
# Look for 'LIVE_CUS2' followed by up to two word characters and '_'
# ... or look for ' - testing recovery'
results = [pat.sub('', text) for text in texts]
# replace the matched pattern with empty string
print(f'Original: {texts}')
print(f'Results: {results}')
Result:
Original: ['LIVE_CUS2_PHLR182', 'LIVE_CUS2ee_PHLR182', 'PHLR182 - testing recovery']
Results: ['PHLR182', 'PHLR182', 'PHLR182']
Python Demo: https://repl.it/repls/ViolentThirdAutomaticvectorization
Regex Demo: https://regex101.com/r/JiEVqn/2
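A slightly stricter variant can be sketched as follows, assuming (as the question states) that the variable part is at most two letters or digits, and additionally assuming that the prefix only appears at the start of the string and the suffix only at the end. The helper name strip_wrappers is hypothetical. Note that \w also matches an underscore, which is why the character class below spells out letters and digits explicitly.

```python
import re

# Assumption: the dynamic part is 0-2 ASCII letters/digits, the prefix is
# anchored at the start of the string and the suffix at the end.
WRAPPER = re.compile(r'^LIVE_CUS2[A-Za-z0-9]{0,2}_| - testing recovery$')

def strip_wrappers(text):
    """Remove the known prefix or suffix and return what is left."""
    return WRAPPER.sub('', text)

samples = ['LIVE_CUS2_PHLR182', 'LIVE_CUS2ee_PHLR182', 'PHLR182 - testing recovery']
cleaned = [strip_wrappers(t) for t in samples]
print(cleaned)
```

With this pattern a string such as 'LIVE_CUS2abc_PHLR182' is left unchanged, because three characters after CUS2 exceed the allowed two.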

Related

How to start at a specific letter and end when it hits a digit?

I have some sample strings:
s = 'neg(able-23, never-21) s2-1/3'
i = 'amod(Market-8, magical-5) s1'
I can tell whether the string ends with 's1' or 's3' using:
word = re.search(r's\d$', s)
But if I want to know whether the string contains 's2-1/3', this won't work.
Is there a regular expression that works for both cases, a bare 's#' and an 's#' followed by more characters?
Thanks!

You can allow the characters "-" and "/" to be captured as well, in addition to just digits. It's hard to tell the exact pattern you're going for here, but something like this would capture "s2-1/3" from your example:
import re
s = "neg(able-23, never-21) s2-1/3"
word = re.search(r"s\d[-/\d]*$", s)
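As a quick check, the pattern can be run against both sample strings from the question. This is only a sketch, and the variable names are illustrative.

```python
import re

# 's', one digit, then any run of '-', '/' or digits up to the end of the string
pattern = re.compile(r"s\d[-/\d]*$")

samples = [
    "neg(able-23, never-21) s2-1/3",
    "amod(Market-8, magical-5) s1",
]

found = []
for text in samples:
    m = pattern.search(text)
    # search() returns None when nothing matches, so guard before group()
    found.append(m.group() if m else None)

print(found)
```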

I'm guessing that maybe you would want to extract that with some expression, such as:
(s\d+)-?(.*)$
or:
(s\d+)-?([0-9]+)?\/?([0-9]+)?$
Test
import re
expression = r"(s\d+)-?(.*)$"
string = """
neg(able-23, never-21) s211-12/31
neg(able-23, never-21) s2-1/3
amod(Market-8, magical-5) s1
"""
print(re.findall(expression, string, re.M))
Output
[('s211', '12/31'), ('s2', '1/3'), ('s1', '')]

Get all occurrences of a regex in a string in Python

From the string TreeModel/Node/Node[1]/Node[4]/Node[1] I am trying to extract the following:
TreeModel/Node
TreeModel/Node/Node[1]
TreeModel/Node/Node[1]/Node[4]
TreeModel/Node/Node[1]/Node[4]/Node[1]
using a regular expression in Python. Here is the code I tried:
import re
string = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'
pattern = r'.+?Node\[[1-9]\]'
print(re.findall(pattern=pattern, string=string))
#result : ['TreeModel/Node/Node[1]', '/Node[4]', '/Node[1]']
#expected result : ['TreeModel/Node', 'TreeModel/Node/Node[1]', 'TreeModel/Node/Node[1]/Node[4]', 'TreeModel/Node/Node[1]/Node[4]/Node[1]']

You can use split here:
>>> s = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'
>>> split_s = s.split('/')
>>> ['/'.join(split_s[:i]) for i in range(2, len(split_s)+1)]
['TreeModel/Node',
'TreeModel/Node/Node[1]',
'TreeModel/Node/Node[1]/Node[4]',
'TreeModel/Node/Node[1]/Node[4]/Node[1]']
You can also use regex:
import re
for i in range(2, s.count('/')+2):
    s_ = '[^/]+/*'
    regex = re.search(r'('+s_*i+')', s).group(0)
    print(regex)
TreeModel/Node/
TreeModel/Node/Node[1]/
TreeModel/Node/Node[1]/Node[4]/
TreeModel/Node/Node[1]/Node[4]/Node[1]
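If the trailing slashes are unwanted, one alternative sketch is to use the regex only to locate the slashes and then slice the string at those positions. It assumes, as in the question, that the shortest wanted prefix ends at the second slash. The names cut_points and prefixes are illustrative.

```python
import re

s = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'

# Index of every '/', skipping the first one so the shortest prefix is
# 'TreeModel/Node'; the end of the string closes the last prefix.
cut_points = [m.start() for m in re.finditer(r'/', s)][1:] + [len(s)]
prefixes = [s[:end] for end in cut_points]

for p in prefixes:
    print(p)
```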

I'm not good at Python at all, but for the regex part, given the specific structure of your string, the regex below matches each segment:
/?(?:{[^{}]*})?[^/]+
Here the braces and the preceding / are optional. It matches a slash (if any), then braces with their content (if any), then the rest up to the next slash.
Python code:
import re
matches = re.findall(r'/?(?:{[^{}]*})?[^/]+', string)
output = ''
for i in range(len(matches)):
    output += matches[i]
    print(output)

Splitting a string using re module of python

I have a string
s = 'count_EVENT_GENRE in [1,2,3,4,5]'
#I have to capture only the field 'count_EVENT_GENRE'
field = re.split(r'[(==)(>=)(<=)(in)(like)]', s)[0].strip()
#o/p is 'cou'
# for s = 'sum_EVENT_GENRE in [1,2,3,4,5]' o/p = 'sum_EVENT_GENRE'
which is fine
My doubt is that for any character in (in)(like), it splits the string s at that character and gives me the first slice (after "cou" it finds a matching character, i.e. n). This happens for any string that contains any character from (in)(like).
Example: 'percentage_AMOUNT' gives 'p'
because it finds the matching character 'e' after p.
So I would like some advice on how to treat (in)(like) as words rather than as individual characters when splitting.
Please suggest a syntax.

Answering your question, the [(==)(>=)(<=)(in)(like)] is a character class matching single characters you defined inside the class. To match sequences of characters, you need to remove [ and ] and use alternation:
r'==?|>=?|<=?|\b(?:in|like)\b'
or better:
r'[=><]=?|\b(?:in|like)\b'
Your code would look like:
import re
ss = ['count_EVENT_GENRE in [1,2,3,4,5]','coint_EVENT_GENRE = "ROMANCE"']
for s in ss:
    field = re.split(r'[=><]=?|\b(?:in|like)\b', s)[0].strip()
    print(field)
However, there might be other (easier, or safer - depending on the actual specifications) ways to get what you want (splitting with space and getting the first item, use re.match with r'\w+' or r'[a-z]+(?:_[A-Z]+)+', etc.)
If your value is at the start of the string and starts with lowercase ASCII letters, and then can have any amount of sequences of _ followed with uppercase ASCII letters, use:
re.match(r'[a-z]+(?:_[A-Z]+)*', s)
Full demo code:
import re
ss = ['count_EVENT_GENRE in [1,2,3,4,5]','coint_EVENT_GENRE = "ROMANCE"']
for s in ss:
    fieldObj = re.match(r'[a-z]+(?:_[A-Z]+)*', s)
    if fieldObj:
        print(fieldObj.group())
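To make the difference between the character class and the alternation concrete, here is a small side-by-side sketch. The helper first_field and the second sample string are made up for illustration; the two patterns are the ones discussed in this answer.

```python
import re

# Matches ONE character out of the set ( = ) > < i n l k e
char_class = r'[(==)(>=)(<=)(in)(like)]'
# Matches whole operators, or the whole words 'in' / 'like'
alternation = r'[=><]=?|\b(?:in|like)\b'

def first_field(pattern, text):
    """Return the text before the first match of pattern (hypothetical helper)."""
    return re.split(pattern, text)[0].strip()

samples = ['count_EVENT_GENRE in [1,2,3,4,5]', 'percentage_AMOUNT >= 10']
for text in samples:
    print(first_field(char_class, text), '|', first_field(alternation, text))
```

The character class cuts each field at its first n or e, while the alternation only splits at the operator or keyword.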

If you want only the first word of your string, then this should do the job:
import re
s = 'count_EVENT_GENRE in [1,2,3,4,5]'
field = re.split(r'\W', s)[0]
# count_EVENT_GENRE

Is there anything wrong with using split?
>>> s = 'count_EVENT_GENRE in [1,2,3,4,5]'
>>> s.split(' ')[0]
'count_EVENT_GENRE'
>>> s = 'coint_EVENT_GENRE = "ROMANCE"'
>>> s.split(' ')[0]
'coint_EVENT_GENRE'
>>>

Regex match everything between special tag

I have a string that I need to parse to get whatever is inside the \$ tags.
For example, given the string
The following math equation: \$f(x) = x^2\$ is the same as \$g(x) = x^(4/2) \$
I want to extract whatever is between the \$ tags, so that the result contains both equations:
'f(x) = x^2'
'g(x) = x^(4/2) '
I tried something like re.compile(r'\\\$(.)*\\$') but it didn't work.

You almost got it, just missing a backslash and a question mark (so it stops as soon as it finds the second \$ and doesn't match the longest string possible): r'\\\$(.*?)\\\$'
>>> import re
>>> pattern = r'\\\$(.*?)\\\$'
>>> data = r"The following math equation: \$f(x) = x^2\$ is the same as \$g(x) = x^(4/2) \$"
>>> re.findall(pattern, data)
['f(x) = x^2', 'g(x) = x^(4/2) ']
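The effect of the question mark can be seen by running the greedy and the non-greedy pattern on the same input. This is a small sketch; the names greedy and lazy are only illustrative.

```python
import re

# Raw string so that \$ stays a literal backslash followed by a dollar sign.
data = r"The following math equation: \$f(x) = x^2\$ is the same as \$g(x) = x^(4/2) \$"

greedy = re.findall(r'\\\$(.*)\\\$', data)   # .* runs on to the LAST \$
lazy = re.findall(r'\\\$(.*?)\\\$', data)    # .*? stops at the NEXT \$

print(len(greedy), greedy)
print(len(lazy), lazy)
```

The greedy version returns a single match that swallows the text between the two equations, while the non-greedy version returns the two equations separately.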

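This regex can fit (written as a regex literal with delimiters and a g flag; note that .{0,} is greedy, so on a line with several tags it matches from the first \$ to the last one):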
/\\\$.{0,}\\\$/g
/ - begin
\\\$ - escaped: \$
. - any character between
{0,} - at least 0 chars (any number of chars, actually)
\\\$ - escaped: \$
/ - end
g - global search

This works:
import re
regex = r'\\\$(.*)\\\$'
r = re.compile(regex)
print(r.match(r"\$f(x) = x^2\$").group(1))
print(r.match(r"\$g(x) = x^(4/2) \$").group(1))

Breaking up substrings in Python based on characters

I am trying to write code that will take a string and remove specific data from it. I know that the data will look like the line below, and I only need the data within the " " marks, not the marks themselves.
inputString = 'type="NN" span="123..145" confidence="1.0" '
Is there a way to take the substring that lies between two given characters, using them as the start and stop points?

You can extract all the text between pairs of " characters using regular expressions:
import re
inputString='type="NN" span="123..145" confidence="1.0" '
pat=re.compile('"([^"]*)"')
strings = []  # collected values
while True:
    mat=pat.search(inputString)
    if mat is None:
        break
    strings.append(mat.group(1))
    # continue searching after the end of this match
    inputString=inputString[mat.end():]
print(strings)
or, easier:
import re
inputString='type="NN" span="123..145" confidence="1.0" '
strings=re.findall('"([^"]*)"', inputString)
print(strings)
Output for both versions:
['NN', '123..145', '1.0']

fields = inputString.split('"')
print(fields[1], fields[3], fields[5])

You could split the string at each space to get a list of 'key="value"' substrings and then use regular expressions to parse the substrings.
Using your input string:
>>> input_string = 'type="NN" span="123..145" confidence="1.0" '
>>> input_string_split = input_string.split()
>>> print(input_string_split)
['type="NN"', 'span="123..145"', 'confidence="1.0"']
Then use regular expressions:
>>> import re
>>> pattern = r'"([^"]+)"'
>>> for substring in input_string_split:
...     match_obj = re.search(pattern, substring)
...     print(match_obj.group(1))
...
NN
123..145
1.0
The regular expression '"([^"]+)"' matches anything within quotation marks (provided there is at least one character). The round brackets indicate the bit of the regular expression that you are interested in.
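If the keys are wanted along with the values, the same idea extends to two capture groups. The sketch below assumes that keys consist only of word characters and that values never contain an embedded double quote; the names pairs and attributes are illustrative.

```python
import re

input_string = 'type="NN" span="123..145" confidence="1.0" '

# Two groups: the key before '=' and the quoted value after it.
pairs = re.findall(r'(\w+)="([^"]*)"', input_string)
attributes = dict(pairs)

print(attributes)
print(attributes['span'])
```

Because findall returns a list of (key, value) tuples when the pattern has two groups, dict() can consume the result directly.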
