Matching an empty paragraph at the end of HTML text [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 2 years ago.
I have written the following pattern to match an empty paragraph at the end of HTML:
https://regex101.com/r/6TNgUV/1
But when I try the following Python code, the result is None
html_desc = '</span><p></p><p></p>'
res = re.match('(<p>){1}(\s)*(<br>|<br\/>){0,9}(\s)*(<\/p>){1}(\s)*$', html_desc)
# returns None
I am not able to understand the issue.

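re.match only tries the pattern at the very beginning of the string. Your pattern describes the empty paragraph at the end, but the string starts with a different tag (</span>), so the match fails and re.match returns None. Use re.search() instead of re.match(); it scans the whole string for a position where the pattern matches.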
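As a minimal sketch of the difference, using the pattern and string from the question (the variable names pattern, at_start, and anywhere are illustrative and not from the original post):

```python
import re

html_desc = '</span><p></p><p></p>'
# Raw string so the backslashes reach the regex engine unchanged.
pattern = r'(<p>){1}(\s)*(<br>|<br\/>){0,9}(\s)*(<\/p>){1}(\s)*$'

# re.match only tries the pattern at index 0, where the string has '</span>'.
at_start = re.match(pattern, html_desc)

# re.search tries every starting position, so it can reach the trailing paragraph.
anywhere = re.search(pattern, html_desc)

print(at_start)
print(anywhere.group(0) if anywhere else None)
```

Because the pattern ends with $, re.search skips the first <p></p> and reports the last one.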

Related

Python regex find everything between substring and first space [duplicate]

This question already has answers here:
Python non-greedy regexes
(7 answers)
Closed 2 years ago.
I have a string string = 'radios label="Does the command/question above meet the Rules to the left?" name="tq_utt_test" validates="required" gold="true" aggregation="agg"' and I want to extract the value of the name attribute. In this case I want to extract "tq_utt_test".
I tried re.findall('name=(.*)\s', string), which I thought would extract everything after name= and before the first space. Instead it returned "tq_utt_test" validates="required" gold="true", so it seems to return everything between name= and the last space rather than the first.
Is there a way to tweak this regex so that it returns everything after name= and before the first space?

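The .* is greedy, so it runs to the last space it can reach. Use a negated character class instead, so the capture cannot cross a space: re.findall('name=([^ ]*)\s', string)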
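A short sketch comparing the variants on the string from the question. The lazy (.*?) form and the quote-delimited form are additions for comparison, and the variable names are illustrative:

```python
import re

string = ('radios label="Does the command/question above meet the Rules to the left?" '
          'name="tq_utt_test" validates="required" gold="true" aggregation="agg"')

# Greedy: .* runs to the end and backtracks only to the last whitespace.
greedy = re.findall(r'name=(.*)\s', string)

# Negated class: the capture cannot contain a space, so it stops at the first one.
negated = re.findall(r'name=([^ ]*)\s', string)

# Lazy quantifier: .*? expands only as far as needed to reach the next whitespace.
lazy = re.findall(r'name=(.*?)\s', string)

# Anchoring on the quotes returns the value without the surrounding quotes.
unquoted = re.findall(r'name="([^"]*)"', string)

print(greedy, negated, lazy, unquoted, sep='\n')
```

Note that the first three variants keep the double quotes as part of the captured text; only the last one strips them.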

Regex working in text editor(sublime) but not in python [duplicate]

This question already has answers here:
Case insensitive regular expression without re.compile?
(10 answers)
Closed 2 years ago.
I want to extract the line using regex.
The line that I want to extract from document is:
":method":"POST",":path":"/api/browser/projects/8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy
The regex I am using is:
":method"[:"a-z,/\d-]{20,1000}/copy
The code for the same in python is:
re.findall('":method"[:"a-z,/\d-]{20,1000}/copy', str(s), re.MULTILINE)
It is working perfectly fine in sublime text but not in python. It is returning an empty list in python. How to resolve this?

You need the case-insensitive flag (the i modifier; in Python, re.IGNORECASE or re.I). Without it, [a-z] cannot match the uppercase POST.
Alternatively, add the uppercase range to the character class:
":method"[:"a-zA-Z,/\d-]{20,1000}/copy
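Both fixes can be checked with a small sketch. Here s is assumed to hold just the line from the question, and the variable names are illustrative:

```python
import re

s = ('":method":"POST",":path":"/api/browser/projects/'
     '8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy')
pattern = r'":method"[:"a-z,/\d-]{20,1000}/copy'

# Case-sensitive: the class stops at the uppercase 'P', so fewer than 20 characters match.
# (re.MULTILINE would not help: it only changes how ^ and $ behave.)
case_sensitive = re.findall(pattern, s)

# Option 1: let a-z also match A-Z via the flag.
case_insensitive = re.findall(pattern, s, re.IGNORECASE)

# Option 2: spell out the uppercase range in the class.
explicit_range = re.findall(r'":method"[:"a-zA-Z,/\d-]{20,1000}/copy', s)

print(case_sensitive)
print(case_insensitive == explicit_range)
```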

re.findall return separate non-overlapping results [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 4 years ago.
I am new to Python and I am struggling a bit with regular expressions. If I have an input like this:
text = '<tag>xyz</tag>\n<tag>abc</tag>'
Is it possible to get an output list with elements like:
matches = ['<tag>xyz</tag>', '<tag>abc</tag>']
Right now I am using the following regex
matches = re.findall(r"<tag>[\w\W]*</tag>", text)
But instead of a list with two elements I am getting only one element with the whole input string like:
matches = ['<tag>xyz</tag>\n<tag>abc</tag>']
Could someone please guide me?
Thank you.

You just need to make the quantifier non-greedy (lazy), so that it stops at the first </tag> instead of the last.
Change this regex,
<tag>[\w\W]*</tag>
to
<tag>[\w\W]*?</tag>
import re
text = '<tag>xyz</tag>\n<tag>abc</tag>'
matches = re.findall(r"<tag>[\w\W]*?</tag>", text)
print(matches)
Prints,
['<tag>xyz</tag>', '<tag>abc</tag>']

re.match in python to match pattern with string [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 7 years ago.
I am trying to match string with mypattern, but I do not get the correct result. Can you please point out where I am wrong?
import re
mypattern = '_U_[R|S]_data.csv'
string = 'X003_U_R_data.csv'
re.match(mypattern, string)

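re.match only matches at the beginning of the string, and your string starts with X003, so it returns None. Use re.search, which scans the whole string. I like to compile the regex first and then do whatever kind of matching or searching I need:
mypattern = re.compile(r'_U_[RS]_data\.csv')
Then
mypattern.search(string)
Inside a character class, | is a literal character, so [R|S] would also accept |; [RS] is enough. The dot is escaped so that it matches only a literal period.
Here's a great website for regex creation: https://regex101.com/#python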
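A minimal sketch of the compiled-pattern approach on the string from the question. The names from_start, anywhere, and pipe_matches are illustrative, and the last check uses a made-up input containing a | to show what the original character class accepts:

```python
import re

string = 'X003_U_R_data.csv'
mypattern = re.compile(r'_U_[RS]_data\.csv')

# match() anchors at index 0, where the string has 'X003', so nothing is found.
from_start = mypattern.match(string)

# search() scans forward and finds the pattern further into the string.
anywhere = mypattern.search(string)

# Hypothetical input: [R|S] treats '|' as a literal member of the class.
pipe_matches = re.search(r'_U_[R|S]_data\.csv', 'X003_U_|_data.csv')

print(from_start)
print(anywhere.group(0) if anywhere else None)
print(pipe_matches is not None)
```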

Python: Replacing strings [duplicate]

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 9 years ago.
I'm iterating through pages and I'd like to modify lines containing
<span class="font16"></span>
How can I correct the code below?
text = re.sub(r'<span class="font(.*)"></span><span', r'<span class="font\1"> </span><span', text)

The pattern .* is greedy: it matches as much of the line as it can and then backtracks only as far as the last "></span><span on that line, so the captured group can look like this:
16"></span>....
which isn't what you want. Use a pattern that stops at the first " (a literal " is not allowed inside an attribute value that is quoted with "):
r'<span class="font([^"]+)"></span><span'
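To illustrate, here is a small sketch on a made-up line with two empty spans. The sample text and the variable names are assumptions for demonstration only; re.subn is used so the number of replacements is visible:

```python
import re

# Hypothetical sample line with two empty font spans.
text = ('<span class="font16"></span><span class="font12">a</span>'
        '<span class="font9"></span><span>b</span>')
replacement = r'<span class="font\1"> </span><span'

# Greedy capture: one match that swallows everything up to the last empty span.
greedy_result, greedy_count = re.subn(
    r'<span class="font(.*)"></span><span', replacement, text)

# Negated class: the capture stops at the closing quote, so each empty span matches.
fixed_result, fixed_count = re.subn(
    r'<span class="font([^"]+)"></span><span', replacement, text)

print(greedy_count, greedy_result)
print(fixed_count, fixed_result)
```

One caveat: the pattern consumes the <span that follows each empty span, so two empty spans placed directly next to each other would not both be replaced in a single pass.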

We Keep Coding
