Replace parentheses with "_" leaving all the contents as they are - python

I have a Verilog file in which some inputs and outputs are named like 133GAT(123). For example:
nand2 g679(.a(n752), .b(n750), .O(1355GAT(558) ));
Here, I only have to replace 1355GAT(558) with 1355GAT_558, and leave .a(n752) alone. There are multiple such instances.
I tried with python3.
re.sub(r'GAT*\((\w+)\)',r'_\1',"nand2 g679(.a(n752), .b(n750), .O(1355GAT(558) ) ")
It is giving output as
'nand2 g679(.a(n752), .b(n750), .O(1355_558 ) '
My expectation is to get the output as
'nand2 g679(.a(n752), .b(n750), .O(1355GAT_558 ) '

Why your code is not giving you expected results
Your regex GAT*\((\w+)\) matches GA, GAT, GATT, etc., because the * applies only to the T. It does match GAT(558) in your string, but re.sub replaces the entire match, and since GAT is never captured and put back in the replacement, it is dropped from the output.
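As a rough illustration of the point above, the sketch below prints what the original pattern consumes and then keeps GAT by writing it back in the replacement. The variable names (line, original, fixed) are only for this example.

```python
import re

line = "nand2 g679(.a(n752), .b(n750), .O(1355GAT(558) ));"

# 'T*' means "zero or more T", so the pattern is G, A, then any number of Ts.
original = re.search(r'GAT*\((\w+)\)', line)
print(original.group(0))  # the whole matched text, which includes 'GAT'
print(original.group(1))  # only the captured part inside the parentheses

# re.sub replaces all of group(0), so 'GAT' has to be written back explicitly.
fixed = re.sub(r'GAT\((\w+)\)', r'GAT_\1', line)
print(fixed)
```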
Regex 1
This works and gives you the option to check for digits before GAT.
# regex
(\d+GAT)\((\d+)\)
# replacement
\1_\2
Code 1
import re
s = "nand2 g679(.a(n752), .b(n750), .O(1355GAT(558) ));"
r = r'(\d+GAT)\((\d+)\)'
x = re.sub(r,r'\1_\2',s)
print(x)
Regex 2
This works too, but uses one capture group rather than two.
# regex
(?<=\dGAT)\((\d+)\)
# replacement
_\1
Code 2
import re
s = "nand2 g679(.a(n752), .b(n750), .O(1355GAT(558) ));"
r = r'(?<=\dGAT)\((\d+)\)'
x = re.sub(r,r'_\1',s)
print(x)
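Since the question mentions multiple instances, here is a minimal sketch of running the same substitution over several lines at once. The second netlist line is invented for illustration and is not from the original file.

```python
import re

# Hypothetical netlist fragment; only the first line comes from the question.
netlist = """nand2 g679(.a(n752), .b(n750), .O(1355GAT(558) ));
nand2 g680(.a(1GAT(0) ), .b(n751), .O(n753));"""

# Compile once, then substitute every occurrence in the whole text.
pattern = re.compile(r'(\d+GAT)\((\d+)\)')
converted = pattern.sub(r'\1_\2', netlist)
print(converted)
```

re.sub replaces every non-overlapping match by default, so no loop over the lines is needed.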


How to start at a specific letter and end when it hits a digit?

I have some sample strings:
s = 'neg(able-23, never-21) s2-1/3'
i = 'amod(Market-8, magical-5) s1'
I've got the problem where I can figure out if the string has 's1' or 's3' using:
word = re.search(r's\d$', s)
But if I want to know whether the string contains 's2-1/3', it won't work.
Is there a regex that works for both cases, 's#' and 's#+'?
Thanks!
You can allow the characters "-" and "/" to be captured as well, in addition to just digits. It's hard to tell the exact pattern you're going for here, but something like this would capture "s2-1/3" from your example:
import re
s = "neg(able-23, never-21) s2-1/3"
word = re.search(r"s\d[-/\d]*$", s)
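A quick check, as a sketch, that this pattern handles both sample strings from the question. The names pattern, samples, and found are only for this example.

```python
import re

pattern = re.compile(r"s\d[-/\d]*$")
samples = [
    "neg(able-23, never-21) s2-1/3",
    "amod(Market-8, magical-5) s1",
]

# Collect the matched suffix for each sample, or None when nothing matches.
found = []
for text in samples:
    m = pattern.search(text)
    found.append(m.group(0) if m else None)
print(found)
```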
I'm guessing that maybe you would want to extract that with some expression, such as:
(s\d+)-?(.*)$
or:
(s\d+)-?([0-9]+)?\/?([0-9]+)?$
Test
import re
expression = r"(s\d+)-?(.*)$"
string = """
neg(able-23, never-21) s211-12/31
neg(able-23, never-21) s2-1/3
amod(Market-8, magical-5) s1
"""
print(re.findall(expression, string, re.M))
Output
[('s211', '12/31'), ('s2', '1/3'), ('s1', '')]

Python Regex Help

I have a sentence and want to run a regex on it to match a word.
Test Inputs :
This is about CHG6784532
Starting CHG4560986.
Code Snippet:
regVal = re.compile(r"(CHG\w+)")
for i in text:
    if regVal.search(i):
        print(i)
Desired Output:
CHG4560986 ( NOT CHG4560986.)
The output for the first input is right: it prints "CHG6784532". But the second prints "CHG4560986." with the trailing period. I tried adding ^ and $ to the regex, but it still doesn't help. Is there something I am missing here?
Thanks!
Make sure text is a string variable (if it is a list use " ".join(text) instead of text in the code below) and then you may use
import re
text="This is about CHG6784532\nStarting CHG4560986."
regVal = re.compile(r"CHG\w+")
res = regVal.findall(text)
print(res)
# => ['CHG6784532', 'CHG4560986']
Details
regVal = re.compile(r"CHG\w+") - the regVal variable is declared that holds the CHG\w+ pattern: it matches CHG and then 1+ word chars
res = regVal.findall(text) finds all the matching substrings in text variable and saves them in res variable
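If the input has to stay a list of lines, a likely cause of the trailing period is that the loop prints the whole line i rather than the matched text. A minimal sketch of that variant, assuming text is a list of strings (the names lines, reg_val, and tickets are illustrative):

```python
import re

lines = ["This is about CHG6784532", "Starting CHG4560986."]
reg_val = re.compile(r"CHG\w+")

tickets = []
for line in lines:
    m = reg_val.search(line)
    if m:
        # group(0) is only the matched text, not the whole line
        tickets.append(m.group(0))
print(tickets)
```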

how to replace symbols using regex.sub in python

I have a string s, where:
s = 'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<ABCRMGrade=[0]>>>BasicData:id=ABCvalue='
I want to replace ABC with DEF whenever
<<<ABC\w+=\[0]>>>
occurs, so the output should be
<<<DEF\w+=\[0]>>>
In the text, \w+ stands for RMGrade, but this part changes randomly.
The desired output is:
S = id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<DEFRMGrade=[0]>>>BasicData:id=ABCvalue=
I have tried:
s = re.sub('<<<ABC\w+=\[0]>>>','<<<DEF\w+=\[0]>>>',s)
and I get this output:
'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<DEF\\w+=\\[0]>>>BasicData:id=ABCvalue='
I'm a bit confused what you exactly want to achieve. But if you want to replace ABC in every match of pattern <<<ABC\w+=\[0]>>>, then you can use backreferences to groups.
For example, modify pattern so that you can reference the groups (<<<)ABC(\w+=\[0]>>>). Now group#1 refers to the part before ABC and group#2 refers to part after ABC. So the replacement string looks like this - \1DEF\2 - where \1 is group#1 and \2 is group#2.
import re
s = 'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<ABCRMGrade=[0]>>>BasicData:id=ABCvalue='
res = re.sub(r'(<<<)ABC(\w+=\[0]>>>)', r'\1DEF\2', s)
print(res)
The output: id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<DEFRMGrade=[0]>>>BasicData:id=ABCvalue=
You can also pass a function as the replacement; see the re.sub documentation for details.
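A minimal sketch of the function-based variant, using the same pattern as above. The helper name swap_prefix is made up for this example; re.sub calls it once per match and uses its return value as the replacement text.

```python
import re

s = 'id=,value=<<<,RMOrigin=[0]>>>BasicData:id=ABCvalue=<<<ABCRMGrade=[0]>>>BasicData:id=ABCvalue='

def swap_prefix(match):
    # Rebuild the match: keep group 1 and group 2, put DEF where ABC was.
    return match.group(1) + 'DEF' + match.group(2)

res = re.sub(r'(<<<)ABC(\w+=\[0]>>>)', swap_prefix, s)
print(res)
```

A function is mainly useful when the replacement depends on what was matched; for a fixed swap like this one, the \1DEF\2 string is shorter.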

Using regex to extract information from string

I am trying to write a regex in Python to extract some information from a string.
Given:
"Only in Api_git/Api/folder A: new.txt"
I would like to print:
Folder Path: Api_git/Api/folder A
Filename: new.txt
After having a look at some examples on the re manual page, I'm still a bit stuck.
This is what I've tried so far
m = re.match(r"(Only in ?P<folder_path>\w+:?P<filename>\w+)","Only in Api_git/Api/folder A: new.txt")
print m.group('folder_path')
print m.group('filename')
Can anybody point me in the right direction??
Get the matched group from index 1 and 2 using capturing groups.
^Only in ([^:]*): (.*)$
sample code:
import re
p = re.compile(r'^Only in ([^:]*): (.*)$')
test_str = "Only in Api_git/Api/folder A: new.txt"
print(re.findall(p, test_str))
If you want to print in the below format then try with substitution.
Folder Path: Api_git/Api/folder A
Filename: new.txt
sample code:
import re
p = re.compile(r'^Only in ([^:]*): (.*)$')
test_str = "Only in Api_git/Api/folder A: new.txt"
subst = r"Folder Path: \1\nFilename: \2"  # Python uses \1 and \2, not $1 and $2
result = re.sub(p, subst, test_str)
Your pattern: (Only in ?P<folder_path>\w+:?P<filename>\w+) has a few flaws in it.
The ?P construct is only valid as the first bit inside a parenthesized expression,
so we need this.
(Only in (?P<folder_path>\w+):(?P<filename>\w+))
The \w character class matches only letters, digits, and underscores. It won't match / or ., for example. We need to use a different character class that more closely aligns with requirements. In fact, we can just use ., the class of nearly all characters:
(Only in (?P<folder_path>.+):(?P<filename>.+))
The colon has a space after it in your example text. We need to match it:
(Only in (?P<folder_path>.+): (?P<filename>.+))
The outermost parentheses are not needed. They aren't wrong, just not needed:
Only in (?P<folder_path>.+): (?P<filename>.+)
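As a sanity check, the final pattern can be tried on the sample string from the question. This is only a sketch in Python 3 syntax; groupdict() is used to show both named groups at once.

```python
import re

m = re.match(r'Only in (?P<folder_path>.+): (?P<filename>.+)',
             "Only in Api_git/Api/folder A: new.txt")
if m:
    parts = m.groupdict()  # maps each group name to its matched text
    print("Folder Path:", parts['folder_path'])
    print("Filename:", parts['filename'])
```

Because .+ is greedy, a line containing more than one ": " would be split at the last one.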
It is often convenient to provide the regular expression separate from the call to the regular expression engine. This is easily accomplished by creating a new variable, for example:
regex = r'Only in (?P<folder_path>.+): (?P<filename>.+)'
... # several lines later
m = re.match(regex, "Only in Api_git/Api/folder A: new.txt")
The above is purely for the convenience of the programmer: it neither saves nor squanders time or memory space. There is, however, a technique that can save some of the time involved in regular expressions: compiling.
Consider this code segment:
regex = r'Only in (?P<folder_path>.+): (?P<filename>.+)'
for line in input_file:
    m = re.match(regex, line)
    ...
For each iteration of the loop, the re module must turn the pattern string into a compiled pattern (or look it up in its internal cache) before applying it to the line variable. The re module allows us to separate the interpretation from the application; we can interpret once but apply several times:
regex = re.compile(r'Only in (?P<folder_path>.+): (?P<filename>.+)')
for line in input_file:
    m = regex.match(line)
    ...
Now, your original program should look like this:
regex = re.compile(r'Only in (?P<folder_path>.+): (?P<filename>.+)')
m = regex.match("Only in Api_git/Api/folder A: new.txt")
print(m.group('folder_path'))
print(m.group('filename'))
However, I'm a fan of using comments to explain regular expressions. My version, including some general cleanup, looks like this:
import re
regex = re.compile(r'''(?x)      # Verbose
    Only\ in\                # Literal match
    (?P<folder_path>.+)      # match longest sequence of anything, and put in 'folder_path'
    :\                       # Literal match
    (?P<filename>.+)         # match longest sequence of anything and put in 'filename'
''')
with open('diff.out') as input_file:
    for line in input_file:
        m = regex.match(line)
        if m:
            print(m.group('folder_path'))
            print(m.group('filename'))
It really depends on how constrained the input is; if this is the only kind of input, this will do the trick.
^Only in (?P<folder_path>[a-zA-Z_/ ]*): (?P<filename>[a-z]*\.txt)$

python regex for repeating string

I am wanting to verify and then parse this string (in quotes):
string = "start: c12354, c3456, 34526; other stuff that I don't care about"
//Note that some codes begin with 'c'
I would like to verify that the string starts with 'start:' and ends with ';'
Afterward, I would like to have a regex parse out the strings. I tried the following python re code:
regx = r"start: (c?[0-9]+,?)+;"
reg = re.compile(regx)
matched = reg.search(string)
print ' matched.groups()', matched.groups()
I have tried different variations but I can either get the first or the last code but not a list of all three.
Or should I abandon using a regex?
EDIT: updated to reflect part of the problem space I neglected and fixed string difference.
Thanks for all the suggestions - in such a short time.
In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).
Your easiest solution is to first extract the part between start: and ; and then use a regular expression to return all matches, not just a single match, using re.findall('c?[0-9]+', text).
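The two steps could look roughly like this. It is a sketch: the names header and codes are illustrative, and the first pattern assumes the list ends at the first semicolon.

```python
import re

text = "start: c12354, c3456, 34526; other stuff that I don't care about"

# Step 1: verify the prefix and the terminating ';', capturing what lies between.
header = re.match(r'start:([^;]*);', text)

# Step 2: pull every code out of the captured part.
codes = re.findall(r'c?[0-9]+', header.group(1)) if header else []
print(codes)
```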
You could use the standard string tools, which are pretty much always more readable.
s = "start: c12354, c3456, 34526;"
s.startswith("start:") # True if s starts with this string
s.endswith(";") # True if s ends with this string
s[len("start:"):-1].strip().split(', ') # list of tokens separated by ", ": ['c12354', 'c3456', '34526']
This can be done (pretty elegantly) with a tool like Pyparsing:
from pyparsing import Group, Literal, OneOrMore, Optional, ParseException, Word
import string
code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:
with open('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
            codes = [c[1] for c in result[1:-1]]
            # Do something with teh codez...
        except ParseException:
            # Oh noes: string doesn't match!
            continue
Cleaner than a regular expression, returns a list of codes (no need to string.split), and ignores any extra characters in the line, just like your example.
import re
sstr = re.compile(r'start:([^;]*);')
slst = re.compile(r'(?:c?)(\d+)')
mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = sstr.match(mystr)
if match:
    res = slst.findall(match.group(1))
    print(res)
results in
['12354', '3456', '34526']
