How to replace symbols using re.sub in Python

I have a string s, where:
s = 'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<ABCRMGrade=[0]>>>BasicData:id=ABCvalue='
I want to replace ABC with DEF whenever the pattern
<<<ABC\w+=\[0]>>>
occurs, so that the match becomes
<<<DEF\w+=\[0]>>>
In this text \w+ stands for RMGrade, but that part varies.
The desired output is:
s = 'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<DEFRMGrade=[0]>>>BasicData:id=ABCvalue='
I have tried:
s = re.sub('<<<ABC\w+=\[0]>>>','<<<DEF\w+=\[0]>>>',s)
but I get this output:
'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<DEF\\w+=\\[0]>>>BasicData:id=ABCvalue='

I'm a bit confused about what exactly you want to achieve, but if you want to replace ABC in every match of the pattern <<<ABC\w+=\[0]>>>, you can use backreferences to capturing groups.
For example, modify the pattern so that you can reference the groups: (<<<)ABC(\w+=\[0]>>>). Now group 1 refers to the part before ABC and group 2 to the part after it. The replacement string is then \1DEF\2, where \1 is group 1 and \2 is group 2.
import re
s = 'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<ABCRMGrade=[0]>>>BasicData:id=ABCvalue='
res = re.sub(r'(<<<)ABC(\w+=\[0]>>>)', r'\1DEF\2', s)
print(res)
The output:
id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<DEFRMGrade=[0]>>>BasicData:id=ABCvalue=
You can also pass a function as the replacement. See the re.sub documentation for details.
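As a rough sketch of the function-based variant, assuming the same input string: the callable receives the match object and returns the replacement text. The name swap_prefix is made up for this example.

```python
import re

s = 'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<ABCRMGrade=[0]>>>BasicData:id=ABCvalue='

def swap_prefix(match):
    # group(1) is the variable \w+ part captured right after ABC (e.g. "RMGrade")
    return '<<<DEF' + match.group(1) + '=[0]>>>'

# Only the <<<ABC...=[0]>>> occurrences are rewritten; other ABCs stay untouched
res = re.sub(r'<<<ABC(\w+)=\[0]>>>', swap_prefix, s)
print(res)
```

A function is mostly useful when the replacement depends on what was matched; for a fixed swap like this one, the backreference form is shorter.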

Related

How to extract function name python regex

Hello, I am trying to extract a function name using a regex in Python, but I am new to Python and nothing seems to work. For example, given the string "def myFunction(s): ...." I want to return just myFunction.
import re
def extractName(s):
    string = []
    regexp = re.compile(r"\s*(def)\s+\([^\)]*\)\s*{?\s*")
    for m in regexp.finditer(s):
        string += [m.group()]
    return string

Assumption: You want the name myFunction from "...def myFunction(s):..."
Something is missing in your regex and in the way it is structured.
\s*(def)\s+\([^\)]*\)\s*{?\s*
Let's look at it step by step:
\s*: matches zero or more whitespace characters.
(def): matches the word def.
\s+: matches one or more whitespace characters.
\([^\)]*\): matches an opening parenthesis, any characters other than ), and a closing parenthesis, such as (s).
\s*: matches zero or more whitespace characters.
What comes after that doesn't matter much if you only want the name of the function. The problem is that nothing in the pattern matches the name itself: after def and the whitespace it immediately expects (.
You can try this regex if you want to do it with a regex:
\s*(def)\s([a-zA-Z]*)\([a-zA-Z]*\)
With this regex you get def myFunction(s) in group 0, def in group 1 and myFunction in group 2. So you can use the following code to get your result:
import re
def extractName(s):
    string = ""
    regexp = re.compile(r"(def)\s([a-zA-Z]*)\([a-zA-Z]*\)")
    for m in regexp.finditer(s):
        # group(2) holds the function name
        string += m.group(2)
    return string
You can check your regex live by going on this site.
Hope it helps!
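The pattern above only accepts letters in the name and a single all-letter parameter list. Here is a slightly more permissive sketch, assuming you only need the identifier that follows def. The names DEF_NAME and extract_names are made up for this example.

```python
import re

# An identifier starts with a letter or underscore, followed by word characters
DEF_NAME = re.compile(r'\bdef\s+([A-Za-z_]\w*)\s*\(')

def extract_names(source):
    # With exactly one capturing group, findall returns just the captured names
    return DEF_NAME.findall(source)

print(extract_names("def myFunction(s): pass\ndef _helper2(a, b): pass"))
```

This returns a list with one entry per definition instead of concatenating the names into a single string.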

Python split before a certain character

I have following string:
BUCKET1:/dir1/dir2/BUCKET1:/dir3/dir4/BUCKET2:/dir5/dir6
I am trying to split it in a way I would get back the following dict / other data structure:
BUCKET1 -> /dir1/dir2/, BUCKET1 -> /dir3/dir4/, BUCKET2 -> /dir5/dir6/
I can somehow split it if I only have one BUCKET, not multiple, like this:
res.split(res.split(':', 1)[0].replace('.', '').upper()) -> it's not perfect
Input: ADRIAN:/dir1/dir11/DANIEL:/dir2/ADI_BUCKET:/dir3/CULEA:/dir4/ADRIAN:/dir5/ADRIAN:/dir6/
Output: [(ADRIAN, /dir1/dir11), (DANIEL, /dir2/), (CULEA, /dir3/), (ADRIAN, /dir5/), (ADRIAN, /dir6/)]
As per Wiktor Stribiżew comments, the following regex does the job:
r"(BUCKET1|BUCKET2):(.*?)(?=(?:BUCKET1|BUCKET2)|$)"

I'd recommend learning regex, as the others have suggested. However, if you're looking for an alternative, here's a way to do it without regex. It also produces the output you're looking for.
string = input("Enter:") #Put your own input here.
# Replacing "BUCKET" with ":" lets a single split() separate bucket numbers and paths
tempList = string.replace("BUCKET",':').split(":")
outputList = []
for i in range(1,len(tempList)-1,2):
    someTuple = ("BUCKET"+tempList[i],tempList[i+1])
    outputList.append(someTuple)
print(outputList) #Put your own output here.
This will produce:
[('BUCKET1', '/dir1/dir2/'), ('BUCKET1', '/dir3/dir4/'), ('BUCKET2', '/dir5/dir6')]
This code is hopefully easier to understand and manipulate if you're unfamiliar with Regex, although I'd still personally recommend Regex to solve this if you're familiar with how to use it.

Use the re.findall() function:
import re
s = "ADRIAN:/dir1/dir11/DANIEL:/dir2/ADI_BUCKET:/dir3/CULEA:/dir4/ADRIAN:/dir5/ADRIAN:/dir6/"
result = re.findall(r'(\w+):([^:]+\/)', s)
print(result)
The output:
[('ADRIAN', '/dir1/dir11/'), ('DANIEL', '/dir2/'), ('ADI_BUCKET', '/dir3/'), ('CULEA', '/dir4/'), ('ADRIAN', '/dir5/'), ('ADRIAN', '/dir6/')]

Use regex instead?
import re
test = 'BUCKET1:/dir1/dir2/BUCKET1:/dir3/dir4/BUCKET2:/dir5/dir6'
output = re.findall(r'(?P<bucket>[A-Z0-9]+):(?P<path>[/a-z0-9]+)', test)
print(output)
Which gives
[('BUCKET1', '/dir1/dir2/'), ('BUCKET1', '/dir3/dir4/'), ('BUCKET2', '/dir5/dir6')]

It appears you have a list of predefined "buckets" that you want to use as boundaries for the records inside the string.
That means, the easiest way to match these key-value pairs is by matching one of the buckets, then a colon and then any chars not starting a sequence of chars equal to those bucket names.
You may use
r"(BUCKET1|BUCKET2):(.*?)(?=(?:BUCKET1|BUCKET2)|$)"
Compile with re.S / re.DOTALL if your values span multiple lines.
Details:
(BUCKET1|BUCKET2) - capture group one that matches and stores in .group(1) any of the bucket names
: - a colon
(.*?) - any 0+ chars, as few as possible (as *? is a lazy quantifier), up to the first occurrence of (but not including)...
(?=(?:BUCKET1|BUCKET2)|$) - any of the bucket names or end of string.
Build it dynamically while escaping bucket names (just to play it safe in case those names contain * or + or other special chars):
import re
buckets = ['BUCKET1','BUCKET2']
rx = r"({0}):(.*?)(?=(?:{0})|$)".format("|".join([re.escape(bucket) for bucket in buckets]))
print(rx)
# => (BUCKET1|BUCKET2):(.*?)(?=(?:BUCKET1|BUCKET2)|$)
s = "BUCKET1:/dir1/dir2/BUCKET1:/dir3/dir4/BUCKET2:/dir5/dir6"
print(re.findall(rx, s))
# => [('BUCKET1', '/dir1/dir2/'), ('BUCKET1', '/dir3/dir4/'), ('BUCKET2', '/dir5/dir6')]
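Since the question asks for a dict and the same bucket can occur more than once, one possible follow-up is to group the pairs per bucket. This is only a sketch built on the same pattern; paths_by_bucket is a name chosen for this example.

```python
import re
from collections import defaultdict

buckets = ['BUCKET1', 'BUCKET2']
alternation = "|".join(re.escape(b) for b in buckets)
# Same pattern as above, compiled with re.S so values may span lines
rx = re.compile(r"({0}):(.*?)(?=(?:{0})|$)".format(alternation), re.S)

s = "BUCKET1:/dir1/dir2/BUCKET1:/dir3/dir4/BUCKET2:/dir5/dir6"

# A plain dict would keep only the last path per bucket, so collect lists
paths_by_bucket = defaultdict(list)
for name, path in rx.findall(s):
    paths_by_bucket[name].append(path)

print(dict(paths_by_bucket))
```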

Is there a better way to swap string without a placeholder

I have a string:
>>> s = 'Y/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<X/PROPN/pobj_>,/PUNCT/punct'
And the aim is to swap the positions of Y/ and X/, i.e. something like:
>>> s.replace('X/', '##').replace('Y/', 'X/').replace('##', 'Y/')
'X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct'
Assume there is no conflict when doing the replacement, i.e. X/ and Y/ are unique and each occurs only once in the original string.
Is there a way to do the replacement without the placeholder? Currently I'm swapping their positions using the ## placeholder.

In Python, an easy way with a regex is to pass a lambda as the re.sub replacement, where you can inspect the text captured by the capturing groups and choose the appropriate replacement.
So (X|Y)/ should work (I assume X and Y are potentially multicharacter strings; otherwise use ([XY])):
import re
s = 'Y/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<X/PROPN/pobj_>,/PUNCT/punct'
print(s)
print(re.sub(r"(X|Y)/", lambda m: "Y/" if m.group(1) == 'X' else 'X/' , s))
Output:
Y/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<X/PROPN/pobj_>,/PUNCT/punct
X/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<Y/PROPN/pobj_>,/PUNCT/punct
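If more than two tokens ever need to be exchanged, the same idea might be written with a lookup table instead of a conditional. A minimal sketch, assuming the tokens are the dict keys; swap and pattern are names made up here.

```python
import re

s = 'Y/NOUN/dobj>_hold/VERB/ROOT_<membership/NOUN/dobj_<with/ADP/prep_<X/PROPN/pobj_>,/PUNCT/punct'

# Each key is replaced by its value in a single pass, so no placeholder is needed
swap = {'X': 'Y', 'Y': 'X'}
pattern = re.compile(r'(%s)/' % '|'.join(map(re.escape, swap)))

result = pattern.sub(lambda m: swap[m.group(1)] + '/', s)
print(result)
```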

Python replace regex

I have a string in which there are some attributes that may be empty:
[attribute1=value1, attribute2=, attribute3=value3, attribute4=]
With Python I need to substitute the empty values with the value 'None'. I know I can use string.replace('=,','=None,').replace('=]','=None]') for the string, but I'm wondering if there is a way to do it using a regex, maybe with the ?P<name> option.

You can use
import re
s = '[attribute1=value1, attribute2=, attribute3=value3, attribute4=]'
re.sub(r'=(,|])', r'=None\1', s)
\1 refers to the text matched by the group in parentheses.
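Since the question mentions ?P<name>, the same substitution could be written with a named group, which is referenced as \g<name> in the replacement. A small sketch; the group name delim is arbitrary.

```python
import re

s = '[attribute1=value1, attribute2=, attribute3=value3, attribute4=]'

# (?P<delim>...) names the group; \g<delim> puts the captured , or ] back
result = re.sub(r'=(?P<delim>[,\]])', r'=None\g<delim>', s)
print(result)
```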

With python's re module, you can do something like this:
# import it first
import re
# your code
# the replacement must be a raw string, otherwise '\1' is read as the control character \x01
re.sub(r'=([,\]])', r'=None\1', your_string)

You can use
s = '[attribute1=value1, attribute2=, attribute3=value3, attribute4=]'
re.sub(r'=(?!\w)', r'=None', s)
This works because the negative lookahead (?!\w) checks if the = character is not followed by a 'word' character. The definition of "word character", in regular expressions, is usually something like "a to z, 0 to 9, plus underscore" (case insensitive).
From your example data it seems all attribute values match this. It will not work if the values may start with something like a comma (unlikely), may be quoted, or may start with anything else that is not a word character. If so, you need a more foolproof setup, such as parsing from the start: skipping the attribute name by locating the first = character.
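A rough sketch of that parsing approach, assuming the text is wrapped in [ ], the pairs are separated by ", " and no value contains that separator. The function name fill_empty is made up for this example.

```python
s = '[attribute1=value1, attribute2=, attribute3=value3, attribute4=]'

def fill_empty(text, default='None'):
    # Drop the surrounding brackets, then handle one name=value pair at a time
    pairs = text.strip('[]').split(', ')
    filled = []
    for pair in pairs:
        # partition splits on the first = only, so later = signs stay in the value
        name, _, value = pair.partition('=')
        filled.append(name + '=' + (value or default))
    return '[' + ', '.join(filled) + ']'

print(fill_empty(s))
```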

Be specific and use a character class:
import re
string = "[attribute1=value1, attribute2=, attribute3=value3, attribute4=]"
rx = r'\w+=(?=[,\]])'
string = re.sub(rx, r'\g<0>None', string)
print(string)
# [attribute1=value1, attribute2=None, attribute3=value3, attribute4=None]

Change pa$$word to pa\$\$word in Python

I have a string pa$$word. I want to change this string to pa\$\$word. The change must apply only to runs of 2 or more such characters, not to pa$word. The replacement must happen n times, where n is the number of "$" symbols. For example, pa$$$$word becomes pa\$\$\$\$word and pa$$$word becomes pa\$\$\$word.
How can I do it?

import re
def replacer(matchobj):
    mat = matchobj.group()
    # interleave one backslash in front of each matched $
    return "".join(item for items in zip("\\" * len(mat), mat) for item in items)
print(re.sub(r"((\$)\2+)", replacer, "pa$$$$word"))
# pa\$\$\$\$word
print(re.sub(r"((\$)\2+)", replacer, "pa$$$word"))
# pa\$\$\$word
print(re.sub(r"((\$)\2+)", replacer, "pa$$word"))
# pa\$\$word
print(re.sub(r"((\$)\2+)", replacer, "pa$word"))
# pa$word
((\$)\2+) - We create two capturing groups here. The first one is the entire match as it is, which can be referred to later as \1. The second capturing group is nested; it captures a literal $ (written \$ in the pattern) and can be referred to as \2. So we first match $ once and then require at least one more consecutive $ with \2+.
When we find such a string, re.sub calls the replacer function with the match object. In the replacer function, we get the entire matched string with matchobj.group() and then simply interleave that matched string with \.

I believe the regex you're after is:
[$]{2,}
which will match 2 or more of the character $
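To turn that pattern into the requested replacement, one option is a function that emits one escaped $ per matched character. This is a sketch; escape_dollar_runs is a name made up for this example.

```python
import re

def escape_dollar_runs(text):
    # Each run of 2 or more $ is replaced by the same number of \$ pairs;
    # a string returned from a replacement function is inserted literally
    return re.sub(r'[$]{2,}', lambda m: '\\$' * len(m.group()), text)

for sample in ('pa$word', 'pa$$word', 'pa$$$word'):
    print(escape_dollar_runs(sample))
```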

This should help:
import re
result = re.sub(r"\$", r"\\$", yourString)
or you can try
result = yourString.replace("$", "\\$")
Note that both variants escape every $, including a single one.
