How to get longest match of a pattern of overlapping options? [duplicate] - python

This question already has answers here:
How to extract longest of overlapping groups?
(4 answers)
Closed 8 months ago.
I am dealing with Python regular expressions where I am trying to get the longest match of a pattern that includes overlapping options.
Consider this example:
import re
task = "s290_fpga_simv_test_verilog"
pattern_str = "(s290|s290_fpga|s289|s289_fpga|s274|s274_fpga)"
result = re.match(pattern_str, task)
print(result.group(1))
It gives me the output s290 where I am expecting the longer s290_fpga. What is necessary to get the longest possible match?

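Reorder the alternatives so they become less specific as you go to the right. Your code runs fine, but Python's regex alternation is ordered, not longest-match: re.match() tries the alternatives from left to right, s290 succeeds first, and the engine stops there. If you want the result s290_fpga, swap the order to: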
"(s290_fpga|s290 etc...)"
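A minimal sketch, assuming the alternatives are plain literal strings like the ones in the question: instead of reordering by hand, sort the options by length (longest first) when building the pattern. Because Python's re picks the first alternative that matches rather than the longest, putting longer options first yields the longest match. The variable names are illustrative.

```python
import re

# Options from the question; several share a prefix (s290 / s290_fpga).
options = ["s290", "s290_fpga", "s289", "s289_fpga", "s274", "s274_fpga"]

# Python's re tries alternatives left to right and keeps the first that
# matches, so list longer options first. re.escape() protects any regex
# metacharacters that might appear in the options.
longest_first = sorted(options, key=len, reverse=True)
pattern = re.compile("(" + "|".join(map(re.escape, longest_first)) + ")")

task = "s290_fpga_simv_test_verilog"
result = pattern.match(task)
print(result.group(1))  # s290_fpga
```

For this particular set of options, an optional group such as s(290|289|274)(_fpga)? would also work: the ? quantifier is greedy, so it tries to include _fpga before giving up on it (the full match is then in group(0)).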

Related

Split python string in a specific way [duplicate]

This question already has answers here:
Split a string by a delimiter in python
(5 answers)
Match text between two strings with regular expression
(3 answers)
Closed 5 months ago.
I have a string like a = 'This is an example string that has a code !3377! this is the code I want to extract'.
How can I extract 3377 from this string, i.e., the part surrounded by !?
There are multiple ways of doing what you are looking for, but a regular expression is one of the most flexible.
For example, in the case you gave:
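import re
from typing import Optional
def extract_code_from(sentence: str) -> Optional[str]:
    # Capture the digits between two '!' characters; group(1) excludes the '!'.
    m = re.search(r'!(\d+)!', sentence)
    return m.group(1) if m else None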
Keep in mind that this is a very quick and loose solution. I don't know what other cases you might encounter in your sentences, so it is up to you to adapt the regex to cover all of them.
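For comparison, here is a sketch of both the regex approach and the plain str.split() approach from the linked duplicate. It assumes the string contains exactly one !...! pair; the split version in particular does not check that the enclosed text is numeric.

```python
import re

a = 'This is an example string that has a code !3377! this is the code I want to extract'

# Regex: group(1) holds only the digits between the two '!' marks.
m = re.search(r'!(\d+)!', a)
code_re = m.group(1) if m else None

# Split: with exactly one !...! pair, the text between the marks is element 1.
parts = a.split('!')
code_split = parts[1] if len(parts) >= 3 else None

print(code_re, code_split)  # 3377 3377
```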
I encourage you to follow this tutorial. You can also use this website to build and test your regexes.
Good luck.

python regex filter out exact string [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 3 years ago.
I'm trying to write a regex that filters out matches if they contain "plex" in them.
plex-release -> should not match
my-release -> should match
potato -> should match
Been playing with pythex and came up with this one that works partially:
(?![plex])(\w+)[-_](release|version)$
However this also messes with any other values containing the letter "p".
I'm trying to come up with a regex that excludes only values containing the exact string "plex", in that order, rather than any single letter from it.
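Yes. The issue with (?![plex]) is that [plex] is a character class: it stands for any single one of the letters p, l, e or x, not the word "plex". To reject only the literal substring, check the lookahead before every character and anchor the whole line, so each character is consumed only if "plex" does not start at that position: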
^((?!plex).)*$
Source : Regular expression to match a line that doesn't contain a word
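A quick check in Python of the pattern above, plus an equivalent variant that uses a single lookahead at the start of the line (the variant is an addition here, not from the source). This sketch only covers the "does not contain plex" requirement; it does not enforce the -release/-version suffix from the asker's first attempt, which is consistent with the question's expected results, where potato should match.

```python
import re

# Pattern from the answer: each character is consumed only if "plex"
# does not start at that position, and ^...$ applies this to the whole line.
no_plex = re.compile(r'^((?!plex).)*$')

# Equivalent variant: one lookahead at the start scans the rest of the line.
no_plex_alt = re.compile(r'^(?!.*plex).*$')

for value in ["plex-release", "my-release", "potato", "my-plex-release"]:
    print(value, bool(no_plex.match(value)), bool(no_plex_alt.match(value)))
# plex-release False False
# my-release True True
# potato True True
# my-plex-release False False
```

If a regex is not strictly required, the plain Python check 'plex' not in value gives the same result.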

Regex but just in substring [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 3 years ago.
I can't find a regex that looks for a pattern, but only within a specific range of the string.
I want to find two $ characters, but only if they are at positions 5 and 7 of the string; it doesn't matter which character is between them.
Example
xxxx$x$xxxxx would match
xx$x$xxxxxxx would not
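One way is to anchor the pattern at the start of the string and skip exactly four characters before the first $:
import re
should = "xxxx$x$xxxxx would match"
shouldnt = "xx$x$xxxxxxx would not"
# Four arbitrary characters, then '$', any one character, '$', then at least one more character.
pattern = r'^.{4}\$.\$.+'
print(bool(re.match(pattern, should)))
print(bool(re.match(pattern, shouldnt)))
gives:
True
False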
https://regex101.com/r/RLHrZb/1
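An alternative sketch, not from the original answer: a compiled pattern's match() method accepts a start position, so you can anchor \$.\$ directly at index 4 (the 5th character) instead of skipping four characters with .{4}. The helper name has_dollars_at_5_to_7 is hypothetical. Note that ^ would not match at that start position, so it is left out, and unlike the answer's pattern this one does not require any characters after the second $.

```python
import re

# Hypothetical helper: True if the string has '$', any character, '$'
# at 1-based positions 5-7, i.e. 0-based indices 4-6.
DOLLAR_PAIR = re.compile(r'\$.\$')

def has_dollars_at_5_to_7(s: str) -> bool:
    # Pattern.match(string, pos) anchors the match at index pos.
    return DOLLAR_PAIR.match(s, 4) is not None

print(has_dollars_at_5_to_7("xxxx$x$xxxxx"))  # True
print(has_dollars_at_5_to_7("xx$x$xxxxxxx"))  # False
```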

RegEx for finding words between dots [duplicate]

This question already has answers here:
How to find overlapping matches with a regexp?
(4 answers)
Closed 4 years ago.
I am new to RegEx and I want to use a regular expression to find words between dots.
For example, the text is something like:
abc.efg.hij.klm.opq.
I tried the RegEx below:
\.(\w+)\.
It only shows me 2 matches:
.efg.
.klm.
Why am I getting this result?
Here is the link to the RegEx: https://regex101.com/r/pqMN8t/1/
It only shows two matches because regex matches cannot overlap: the engine will not match what it has already matched, and each new search starts where the previous match ended. After matching .efg., it won't match the dot before hij, because that dot has already been matched (the dot after efg).
One way to fix this is to not match the dots and use lookaheads and lookbehinds instead:
(?<=\.)\w+(?=\.)
This way, the dots are only asserted, not consumed, so each dot can serve both the word before it and the word after it.
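The difference can be reproduced in Python with re.findall(), using the example text from the question:

```python
import re

text = "abc.efg.hij.klm.opq."

# Consuming both dots: the dot after 'efg' is used up, so '.hij.' cannot match.
consuming = re.findall(r'\.(\w+)\.', text)

# Lookbehind/lookahead only assert the dots, leaving them free for the next word.
lookaround = re.findall(r'(?<=\.)\w+(?=\.)', text)

print(consuming)   # ['efg', 'klm']
print(lookaround)  # ['efg', 'hij', 'klm', 'opq']
```

Neither pattern returns abc, because no dot precedes it.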

re.findall return separate non-overlapping results [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 4 years ago.
I am new to Python and I am struggling a bit with regular expressions. If I have an input like this:
text = '<tag>xyz</tag>\n<tag>abc</tag>'
Is it possible to get an output list with elements like:
matches = ['<tag>xyz</tag>', '<tag>abc</tag>']
Right now I am using the following regex
matches = re.findall(r"<tag>[\w\W]*</tag>", text)
But instead of a list with two elements, I get a single element containing the whole input string:
matches = ['<tag>xyz</tag>\n<tag>abc</tag>']
Could someone please guide me?
Thank you.
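You just need to make the quantifier non-greedy (lazy). The greedy * consumes as much as it can, so the match runs all the way to the last </tag>; *? stops at the first </tag> that lets the match succeed.
Change this regex: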
<tag>[\w\W]*</tag>
to
<tag>[\w\W]*?</tag>
import re
text = '<tag>xyz</tag>\n<tag>abc</tag>'
matches = re.findall(r"<tag>[\w\W]*?</tag>", text)
print(matches)
Prints:
['<tag>xyz</tag>', '<tag>abc</tag>']
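A side-by-side sketch of the greedy and lazy versions, plus an equivalent lazy pattern that uses . with the re.DOTALL flag (so that . also matches the newline) and a capture group to return only the text inside the tags. It assumes simple, non-nested tags as in the question.

```python
import re

text = '<tag>xyz</tag>\n<tag>abc</tag>'

# Greedy: [\w\W]* runs to the LAST '</tag>', producing one big match.
greedy = re.findall(r"<tag>[\w\W]*</tag>", text)

# Lazy: [\w\W]*? stops at the FIRST '</tag>' after each '<tag>'.
lazy = re.findall(r"<tag>[\w\W]*?</tag>", text)

# Same lazy idea with '.' and re.DOTALL ('.' then also matches '\n');
# the capture group makes findall return only the text between the tags.
contents = re.findall(r"<tag>(.*?)</tag>", text, re.DOTALL)

print(greedy)    # ['<tag>xyz</tag>\n<tag>abc</tag>']
print(lazy)      # ['<tag>xyz</tag>', '<tag>abc</tag>']
print(contents)  # ['xyz', 'abc']
```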
