Using variable in regular expression in Python [duplicate] - python

This question already has answers here:
How to use a variable inside a regular expression?
(12 answers)
Closed 2 years ago.
I've looked at several posts and other forums to find an answer related to my question, but nothing has come up specific to what I need. As a heads up, I'm new to programming and don't have the basic foundation that most would.
I know bash and a little Python, and I'm decent with regular expressions (RE).
I'm trying to create a Python script that uses REs to parse through data and give me the output I need.
My output will consist of 4 values, all originating from one line. The line being read in is thrown together with no defined delimiter (hence the reason for my program).
In order to find one of the 4 values, I have to say: look for 123-, give me everything after that, but stop at df5. The 123- is not constant, but is defined by a regular expression that works; the same goes for df5. I assigned both REs to variables. How can I use those variables to find what I want between the two? Please let me know if this makes sense.

import re
start = '123-'
stop = 'df5'
regex = re.compile('{0}(.*?){1}'.format(re.escape(start), re.escape(stop)))
Note that the re.escape() calls aren't necessary for these example strings, but it is important if your delimiters can ever include characters with a special meaning in regex (., *, +, ? etc.).
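To make the role of re.escape() concrete, here is a minimal sketch that wraps the same pattern in a helper. The function name text_between and the sample lines are made up for illustration, and it assumes the delimiters are literal strings rather than regex fragments.

```python
import re

def text_between(line, start, stop):
    """Return the text between literal start and stop markers, or None."""
    # re.escape makes every character in the markers match literally.
    pattern = re.compile("{0}(.*?){1}".format(re.escape(start), re.escape(stop)))
    match = pattern.search(line)
    return match.group(1) if match else None

# Plain delimiters: escaping changes nothing here.
print(text_between("xx123-payloaddf5yy", "123-", "df5"))

# Delimiters containing +, ( and ): without escaping, these would be
# read as regex syntax instead of literal characters.
print(text_between("total=a+b(42);", "a+b(", ")"))
```

If the delimiters are themselves regular expressions, as in the question, they should not be escaped, since escaping would turn them into literal text.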

How about a pattern "%s(.*?)%s" % (oneTwoThree, dF5)? Then you can do a re.search on that pattern and use the groups function on the result.
Something along the lines of
pattern = "%s(.*?)%s" % (oneTwoThree, dF5)
matches = re.search(pattern, text)
if matches:
    print(matches.groups())
re.findall, if used instead of re.search, can save you the trouble of grouping the matches.
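As a rough sketch of the re.findall variant, assuming the two variables hold regex fragments rather than literal text: the names start_re and stop_re and the sample line are invented for illustration. Each fragment is wrapped in a non-capturing group so that an alternation inside it cannot leak into the rest of the pattern.

```python
import re

# Hypothetical delimiter patterns: three digits plus a dash,
# and two lowercase letters plus a digit.
start_re = r"\d{3}-"
stop_re = r"[a-z]{2}\d"

# Only the middle group captures, so findall returns plain strings.
pattern = re.compile("(?:%s)(.*?)(?:%s)" % (start_re, stop_re))

text = "123-hello df5 456-world gh7"
values = pattern.findall(text)
print(values)
```

With a single capturing group, findall returns one string per match instead of match objects, so no call to groups() is needed.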

Related

How does this regex remove punctuation pattern work? [duplicate]

This question already has answers here:
Carets in Regular Expressions
(2 answers)
Closed 11 months ago.
I'm currently learning a bit of regex in python in a course I'm doing online and I'm struggling to understand a particular expression - I've been searching the python re docs and not sure why I'm returning the non-punctuation elements rather than the punctuation.
The code is:
import re
test_phrase = "This is a sentence, with! unnecessary: punctuation."
punc_remove = re.findall(r'[^,!:]+',test_phrase)
punc_remove
OUTPUT: ['This is a sentence',' with',' unnecessary',' punctuation.']
I think I understand what each character does, i.e. [] is a character set, and ^ means starts with. So anything starting with ,!: will be returned? (Or at least that's how I'm probably mistakenly interpreting it.) And the + will return one or more of the pattern. But why is the output not returning something like:
OUTPUT: [', with','! unnecessary',': punctuation.']
Any explanation really appreciated!
Inside a character class, a leading ^ does not mean 'start with': it means 'not'. So the regex matches sequences of one or more non-,!: characters.
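A short sketch of the difference, reusing the test phrase from the question. The variable names kept, punctuation, cleaned and anchored are just for illustration, and the re.sub line is one possible way to actually remove the punctuation, not something the course necessarily intended.

```python
import re

test_phrase = "This is a sentence, with! unnecessary: punctuation."

# [^,!:]+ : runs of characters that are NOT a comma, ! or :
kept = re.findall(r"[^,!:]+", test_phrase)

# [,!:] : the punctuation characters themselves
punctuation = re.findall(r"[,!:]", test_phrase)

# Replace each listed punctuation character with nothing.
cleaned = re.sub(r"[,!:]", "", test_phrase)

# Outside a character class, ^ anchors the match to the start of the string.
anchored = re.findall(r"^This", test_phrase)

print(kept)
print(punctuation)
print(cleaned)
print(anchored)
```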

How can I find multiple of the same format in Python? [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 6 years ago.
For a little idea of what the project is, I'm trying to make a markup language that compiles to HTML/CSS. I plan on formatting links like this: #(link mask)[(link url)], and I want to find all occurrences of this and get both the link mask and the link url.
I tried using this code for it:
re.search("#(.*)\[(.*)\]", string)
But it started at the beginning of the first instance, and ended at the end of the last instance of a link. Any ideas how I can have it find all of them, in a list or something?
The default behavior of a regular expression is "greedy matching". This means each .* will match as many characters as it can.
You want them to instead match the minimal possible number of characters. To do that, change each .* into a .*?. The final question mark will make the pattern match the minimal number of characters. Because you anchor your pattern to a ] character, it will still match/consume the whole link correctly.
* is greedy: it matches as many characters as it can, e.g. up to the last right parenthesis in your document. (After all, . means "any character" and ) is "any character" as much as any other character.)
You need the non-greedy version of *, which is *?. (Probably actually you should use +?, as I don't think zero-length matches would be very useful).
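Putting both answers together, a minimal sketch with re.findall and the non-greedy +? quantifier might look like this. The sample text and the names link_pattern, source, links and greedy are made up, and it assumes links are written as #mask[url], which is what the regex in the question matches.

```python
import re

# Non-greedy groups: each one stops at the first [ or ] it can.
link_pattern = re.compile(r"#(.+?)\[(.+?)\]")

source = "Visit #Python[https://www.python.org] or #Docs[https://docs.python.org/3/]."

# findall returns one (mask, url) tuple per link.
links = link_pattern.findall(source)
print(links)

# For comparison, the greedy pattern from the question swallows both links.
greedy = re.search(r"#(.*)\[(.*)\]", source)
print(greedy.group(1))
```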

Regex not working to get string between 2 strings. Python 2.7 [duplicate]

This question already has answers here:
How do I match any character across multiple lines in a regular expression?
(26 answers)
Closed 3 years ago.
From this URL view-source:https://www.amazon.com/dp/073532753X?smid=A3P5ROKL5A1OLE
I want to get string between var iframeContent = and obj.onloadCallback = onloadCallback;
I have this regex iframeContent(.*?)obj.onloadCallback = onloadCallback;
But it does not work. I am not good at regex so please pardon my lack of knowledge.
I even tried iframeContent(.*?)obj.onloadCallback but it does not work.
It looks like you just want that giant encoded string. I believe yours is failing for two reasons. You're not running in DOTALL mode, which means your . won't match across multiple lines, and your regex is failing because of catastrophic backtracking, which can happen when you have a very long variable length match that matches the same characters as the ones following it.
This should get what you want
m = re.search(r'var iframeContent = \"([^"]+)\"', html_source)
if m:
    print(m.group(1))
The regex is just looking for any characters except double quotes [^"] in between two double quotes. Because the variable length match and the match immediately after it don't match any of the same characters, you don't run into the catastrophic backtracking issue.
I suspect that the input string spans multiple lines. Try adding re.S (re.DOTALL) to the search call (i.e. re.findall('someString', text_Holder, re.S)). Note that re.M only changes how ^ and $ behave; it does not let . match newlines.
You could try this regex too
(?<=iframeContent =)(.*)(?=obj.onloadCallback = onloadCallback)
You can test it at this site.
It is very important that you use DOTALL mode (also called single-line mode), in which . also matches newline characters.
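To illustrate the DOTALL point, here is a small sketch on a made-up stand-in for the page source; html_source below is hypothetical and much shorter than the real page. It compares the same pattern with no flag, with re.M and with re.DOTALL.

```python
import re

# Hypothetical stand-in for the page source: the value spans two lines.
html_source = (
    'var iframeContent = "line one\n'
    'line two";\n'
    "obj.onloadCallback = onloadCallback;"
)

pattern = r"var iframeContent = (.*?)obj\.onloadCallback"

no_flag = re.search(pattern, html_source)               # . stops at newlines
multiline = re.search(pattern, html_source, re.M)       # re.M only affects ^ and $
dotall = re.search(pattern, html_source, re.DOTALL)     # . matches newlines too

print(no_flag)
print(multiline)
print(repr(dotall.group(1)))
```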

Python regex search function weird behavior [duplicate]

This question already has answers here:
ip address validation in python using regex [duplicate]
(5 answers)
Closed 6 years ago.
I'm writing code to retrieve IP addresses from a text file, but I have an issue with the regex part:
import re
print(re.search(r"[0-255].[0-255].[0-255].[0-255]","5.39.0.0"))
This returns None, but it should return <_sre.SRE_Match object at 0x0000000001D7C510> (because "5.39.0.0" matches the expression). If I replace 39 with 0 it works.
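Your regular expression won't work for many reasons (see the comments).
An unescaped dot matches any character; to match a literal dot you want \. instead. Also, [0-255] is a character class that matches a single character (0, 1, 2 or 5), not a number from 0 to 255.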
Try this regular expression:
(?:(?:[01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])
As a reference check this site, there are similar examples:
Regular Expression Examples
P.S.: There are several testing sites on the web for regular expressions, you should use them, for example: Regex101
Edit: The alternatives in the last group must be listed in reverse order; otherwise a number of the form 2** is matched by its first two characters through the first alternative, e.g. 255.255.255.250 will be matched as 255.255.255.25 (the last digit is lost). Also, using non-capturing groups in regular expressions is recommended in cases where individual groups (used for alternatives or counting) have no meaning or are not needed.
OK, I forgot some important stuff; here is the solution:
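As a rough check of the idea, the sketch below builds a similar pattern from one octet sub-pattern and tests whole strings with re.fullmatch, which assumes Python 3.4 or later. The names OCTET, IPV4 and is_ipv4 are invented for illustration. The longest alternative is listed first in every octet, which is the ordering the edit above describes.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros allowed).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"

# Three octets followed by a literal dot, then a final octet.
IPV4 = re.compile(r"(?:%s\.){3}%s" % (OCTET, OCTET))

def is_ipv4(text):
    """Return True if the whole string looks like a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None

print(is_ipv4("5.39.0.0"))
print(is_ipv4("256.1.1.1"))

# Why the original attempt failed: [0-255] is a set of single characters.
print(re.search(r"[0-255]", "3"))
```

To find addresses inside a larger text rather than validate a whole string, the same compiled pattern could be used with findall or finditer, possibly with word boundaries around it.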
[0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}

I need help figuring out some Python Regex

I have tried to wrap my head around the regex below, but I still have a big hole in my reasoning. What is ?::, and could someone explain it properly for me?
rule_syntax = re.compile('(\\\\*)'\
'(?:(?::([a-zA-Z_][a-zA-Z_0-9]*)?()(?:#(.*?)#)?)'\
'|(?:<([a-zA-Z_][a-zA-Z_0-9]*)?(?::([a-zA-Z_]*)'\
'(?::((?:\\\\.|[^\\\\>]+)+)?)?)?>))')
There are two tools that you may wish to look into to help with your understanding
Regexper creates a visual representation of regex, here's yours:
Regexpal is a tool that allows you to input a regex and various strings and see what matches, here's yours with some example matches
(?:expr) is just like normal parentheses (expr), except that for purposes of retrieving groups later (backreferences, re.sub, or MatchObject.group), parenthesized groups beginning with ?: are excluded. This can be useful if you need to capture a complex expression in parentheses to apply another operator like * to it, but don't want to get it mixed in with groups that you'll actually need to retrieve later.
?:: is simply ?: followed by a literal :.
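A tiny sketch of the difference, using made-up sample strings; the last pattern is a simplified piece of the rule_syntax expression, not the full thing.

```python
import re

# Capturing group: (ab) shows up in the results.
capturing = re.match(r"(ab)+(c)", "ababc")
print(capturing.groups())

# Non-capturing group: (?:ab) still groups for +, but is not reported.
non_capturing = re.match(r"(?:ab)+(c)", "ababc")
print(non_capturing.groups())

# (?:: ...) is a non-capturing group whose first character is a literal colon.
name = re.search(r"(?::([a-zA-Z_][a-zA-Z_0-9]*))", "/hello/:name")
print(name.group(1))
```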
