I'm trying to convert a Perl regex to its Python equivalent.
Line in perl:
($Cur) = $Line =~ m/\s*\<stat\>(.+)\<\/stat\>\s*$/i;
What I've attempted, but doesn't seem to work:
m = re.search('<stat>(.*?)</stat>/i', line)
cur = m.group(0)
Almost. In Perl, /i means case-insensitive; in Python you pass that as a flag instead of putting it in the pattern:
m = re.search(r'<stat>(.*?)</stat>',line,re.IGNORECASE)
Also use the r prefix on the pattern string so that backslashes in the regex don't have to be doubled. (Angle brackets and the forward slash need no escaping in a Python regex.)
That said, my guess is that a better solution is to use an HTML/XML parser such as BeautifulSoup or a similar package.
Something like the following ...
The r prefix is Python's raw string notation, which keeps backslashes in a regex pattern from being interpreted as string escapes. re.search() takes the regular expression first, followed by the string to search. re.I (short for re.IGNORECASE) enables case-insensitive matching.
See the re documentation explaining this in more detail.
To find your match, you could use the group() method of MatchObject like the following:
cur = re.search(r'<stat>([^<]*)</stat>', line).group(1)
search() finds only the first occurrence; use findall() to get all occurrences.
matches = re.findall(r'<stat>([^<]*)</stat>', line)
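Putting the pieces together, here is a minimal sketch of a closer translation of the Perl line. It keeps the greedy (.+) and the trailing \s*$ from the original, reads the capture with group(1) rather than group(0) (which is the whole match), and returns None when nothing matches. The names STAT_RE and extract_stat are made up for this illustration.

```python
import re

# Pattern compiled once; re.IGNORECASE plays the role of Perl's /i.
STAT_RE = re.compile(r'<stat>(.+)</stat>\s*$', re.IGNORECASE)

def extract_stat(line):
    """Return the text between <stat> and </stat>, or None if absent."""
    m = STAT_RE.search(line)
    # group(1) is the first capture group; group(0) would be the whole match.
    return m.group(1) if m else None

print(extract_stat('  <STAT>42</Stat>  \n'))  # 42
print(extract_stat('no tags here'))           # None
```

Checking for None first avoids the AttributeError that re.search(...).group(1) raises on a line without the tag.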
I want to convert some "markdown" tags into HTML tags.
For example:
#Title1#
##title2##
Welcome to **My Home Page**
will be turned into
<h1>Title1</h1>
<h2>title2</h2>
Welcome to <b>My Home Page</b>
I just don't know how to do that. For Title1, I tried this:
#!/usr/bin/env python3
import re
text = '''
#Title1#
##title2##
'''
p = re.compile('^#\w*#\n$')
print(p.sub('<h1>\w*</h1>',text))
but nothing happens; the text is printed unchanged:
#Title1#
##title2##
How can this kind of BBCode/markdown markup be converted into HTML tags?
Check this regex: demo
Here you can see how I substituted the #...# into <h1>...</h1>.
I believe you can get this to work with double # and so on to cover other markdown features, but you should still listen to the comments from @Thomas and @nhahtdh and use a markdown parser. Using regexes in such cases is unreliable, slow and unsafe.
As for inline text, to turn **...** into <b>...</b> you can try this regex with substitution: demo. I hope you can tweak this for other features like underlining and so on.
Your regular expression does not work because, in the default mode, ^ and $ match at the beginning and the end of the whole string, respectively, not at each line.
'^'
(Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline (my emph.)
'$'
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
(7.2.1. Regular Expression Syntax)
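The quoted behaviour is easy to check directly. This is a small sketch using the documentation's own foo.$ example; the variable names are only for illustration.

```python
import re

text = 'foo1\nfoo2\n'

# Default mode: $ only matches at the very end (or before a final newline).
default_matches = re.findall(r'foo.$', text)

# MULTILINE mode: $ also matches before every newline.
multiline_matches = re.findall(r'foo.$', text, re.MULTILINE)

print(default_matches)    # ['foo2']
print(multiline_matches)  # ['foo1', 'foo2']
```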
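Add the flag re.MULTILINE in your compile line and drop the \n before $. In MULTILINE mode $ already matches just before each newline; with the \n kept in, the pattern only matches when a blank line or the end of the string follows:
p = re.compile(r'^#(\w*)#$', re.MULTILINE)
and it should work – at least for single words, such as your example. A better check would be
p = re.compile(r'^#([^#\n]*)#$', re.MULTILINE)
– any sequence on a single line that does not contain a #.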
In both expressions, you need to add parentheses around the part you want to copy so you can use that text in your replacement code. See the official documentation on Grouping for that.
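As a rough sketch of how the grouping and the replacement fit together, the three conversions from the question could be written as below. The group is referenced as \1 in the replacement string. The pattern names h1, h2 and bold are chosen for this example, and this only handles the exact cases shown; a real markdown parser covers far more.

```python
import re

text = """#Title1#
##title2##
Welcome to **My Home Page**
"""

# Each pattern captures the inner text in group 1.
h2 = re.compile(r'^##([^#\n]+)##$', re.MULTILINE)
h1 = re.compile(r'^#([^#\n]+)#$', re.MULTILINE)
bold = re.compile(r'\*\*(.+?)\*\*')

# \1 in the replacement inserts whatever group 1 matched.
html = h2.sub(r'<h2>\1</h2>', text)
html = h1.sub(r'<h1>\1</h1>', html)
html = bold.sub(r'<b>\1</b>', html)
print(html)
```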
I have a list of strings in Python, and I need to know which items in the list are like "A1_8301", where "_" stands for any single character.
Is there a quick way to do that?
If I were using SQL, I would just type something like "where x like 'A1_8301'".
Thank you!
In Python you'd use a regular expression:
import re
pattern = re.compile(r'^A1.8301$')
matches = [x for x in yourlist if pattern.match(x)]
This produces a list of elements that match your requirements.
The ^ and $ anchors are needed to prevent substring matches; BA1k8301-42 should not match, for example. The re.match() call will only match at the start of the tested string, but using ^ makes this a little more explicit and mirrors the $ for the end-of-string anchor nicely.
The _ in a SQL like is translated to ., meaning match one character.
Regular expressions are probably the way to go. IIRC, % should map to .* and _ should map to . (a single dot).
matcher = re.compile('^A1.8301$')
list_of_string = [s for s in stringlist if matcher.match(s)]
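If the LIKE patterns are not known in advance, the mapping described above can be applied mechanically. The following is a sketch under the assumption that only % and _ are special and that there is no ESCAPE clause; like_to_regex is a hypothetical helper name, not a library function. Everything else is passed through re.escape so that characters such as . or + in the data are matched literally.

```python
import re

def like_to_regex(like_pattern):
    """Translate a simple SQL LIKE pattern into a compiled regex."""
    parts = []
    for ch in like_pattern:
        if ch == '%':
            parts.append('.*')   # % matches any run of characters
        elif ch == '_':
            parts.append('.')    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile(''.join(parts), re.DOTALL)

items = ['A1x8301', 'A1_8301', 'BA1k8301-42', 'A18301']
pattern = like_to_regex('A1_8301')
# fullmatch() anchors at both ends, so no ^ or $ is needed.
matches = [s for s in items if pattern.fullmatch(s)]
print(matches)  # ['A1x8301', 'A1_8301']
```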
I'm trying to match either # or the string at, like for name#email and nameatemail. I imagine it's something like
regex = '#|at'
or
regex = '#|(at)'
but I just can't find the right syntax.
I suggest you use Kodos to test your regular expressions (it also provides you with Python code for your regex). And this for regular expression info.
For your issue, both regexes work correctly:
match = re.search("#|at", subject)
if match:
    result = match.group()
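The only practical difference between the two spellings is the capture group. A small sketch (the sample strings are taken from the question; the variable names are illustrative) shows that both find the same text, while group(1) is filled only when the "at" branch matched:

```python
import re

plain = re.compile(r'#|at')
grouped = re.compile(r'#|(at)')

results = []
for subject in ('name#email', 'nameatemail'):
    m_plain = plain.search(subject)
    m_grouped = grouped.search(subject)
    # group() is the whole match; group(1) is the parenthesised "at" branch.
    results.append((m_plain.group(), m_grouped.group(), m_grouped.group(1)))

print(results)  # [('#', '#', None), ('at', 'at', 'at')]
```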
I have a Perl regular expression (shown here, though understanding the whole thing hopefully isn't necessary to answer this question) that contains the \G metacharacter. I'd like to translate it into Python, but Python doesn't appear to support \G. What can I do?
Try these:
import re
re.sub()
re.findall()
re.finditer()
for example:
# Finds all words of length 3 or 4
s = "the quick brown fox jumped over the lazy dogs."
print(re.findall(r'\b\w{3,4}\b', s))
# prints ['the', 'fox', 'over', 'the', 'lazy', 'dogs']
Python's re module does not have the /g modifier, and so it does not have the \G regex token either. A pity, really.
You can use re.match to match anchored patterns. re.match will only match at the beginning (position 0) of the text, or where you specify.
def match_sequence(pattern, text, pos=0):
    pat = re.compile(pattern)
    match = pat.match(text, pos)
    while match:
        yield match
        if match.end() == pos:
            break  # infinite loop otherwise
        pos = match.end()
        match = pat.match(text, pos)
This matches the pattern only at the given position, and after that only at positions that immediately follow the previous match (zero characters in between).
>>> for match in match_sequence(r'[^\W\d]+|\d+', "he11o world!"):
...     print(match.group())
...
he
11
o
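Perl code that uses \G often also relies on pos() to learn where matching stopped. A variation on the same idea, sketched here with a hypothetical helper name contiguous_tokens, returns that position alongside the tokens:

```python
import re

def contiguous_tokens(pattern, text, pos=0):
    """Collect back-to-back matches starting at pos.

    Returns (tokens, stop), where stop is the index at which
    contiguous matching ended.
    """
    pat = re.compile(pattern)
    tokens = []
    while pos < len(text):
        m = pat.match(text, pos)      # anchored at pos, like \G
        if m is None or m.end() == pos:
            break                     # no match, or an empty match
        tokens.append(m.group())
        pos = m.end()
    return tokens, pos

tokens, stopped_at = contiguous_tokens(r'[^\W\d]+|\d+', "he11o world!")
print(tokens, stopped_at)  # ['he', '11', 'o'] 5
```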
I know I'm a little late, but here's an alternative to the \G approach:
import re
def replace(match):
    if match.group(0)[0] == '/':
        return match.group(0)
    else:
        return '<' + match.group(0) + '>'
source = '''http://a.com http://b.com
//http://etc.'''
pattern = re.compile(r'(?m)^//.*$|http://\S+')
result = re.sub(pattern, replace, source)
print(result)
output (via Ideone):
<http://a.com> <http://b.com>
//http://etc.
The idea is to use a regex that matches both kinds of string: a URL or a commented line. Then you use a callback (delegate, closure, embedded code, etc.) to find out which one you matched and return the appropriate replacement string.
As a matter of fact, this is my preferred approach even in flavors that do support \G. Even in Java, where I have to write a bunch of boilerplate code to implement the callback.
(I'm not a Python guy, so forgive me if the code is terribly un-pythonic.)
Don't try to put everything into one expression, as it becomes very hard to read, translate (as you see for yourself) and maintain.
import re
# Skip lines that start with //, then wrap each URL in the remaining lines in angle brackets.
lines = [re.sub(r'http://[^\s]+', r'<\g<0>>', line) for line in text_block.splitlines() if not line.startswith('//')]
print('\n'.join(lines))
Python is usually not at its best when you translate literally from Perl; it has its own programming patterns.
I'm trying to substitute something in a string in python and am having some trouble. Here's what I'd like to do.
For a given comment in my posting:
"here are some great sites that i will do cool things with! https://stackoverflow.com/it's a pig & http://google.com"
I'd like to use python to make the strings like this:
"here are some great sites that i will do cool things with! http%3A//stackoverflow.com & http%3A//google.com
Here's what I have so far...
import re
import urllib.parse  # Python 3 home of quote(); in Python 2 it was urllib.quote
def getExpandedURL(url):
    encoded_url = urllib.parse.quote(url)
    return "" + encoded_url + ""
text = '<text from above>'
url_pattern = re.compile('(http.+?[^ ]+)', re.I | re.S | re.M)
url_iterator = url_pattern.finditer(text)
for matched_url in url_iterator:
    getExpandedURL(matched_url.groups(1)[0])
But this is where I'm stuck. I've previously seen things on here like this: Regular Expressions but for Writing in the Match, but surely there's got to be a better way than iterating through each match and doing a position replace on them. The difficulty here is that it's not a straight replace; I need to do something specific with each match before replacing it.
I think you want url_pattern.sub(getExpandedURL, text).
re.sub(pattern, repl, string, count=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a callable, it's passed the match object and must return a replacement string to be used.
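A minimal sketch of that suggestion follows. Note that the callable receives the match object rather than the matched string, so it has to call group() itself. The function name encode_url, the simplified URL pattern and the sample text are assumptions made for this example, and it uses Python 3's urllib.parse.quote.

```python
import re
from urllib.parse import quote

def encode_url(match):
    # sub() passes a match object; group(0) is the matched URL text.
    return quote(match.group(0))

# Simplified pattern: a scheme followed by any run of non-space characters.
url_pattern = re.compile(r'https?://\S+', re.IGNORECASE)

text = "great sites: http://stackoverflow.com & http://google.com"
result = url_pattern.sub(encode_url, text)
print(result)  # great sites: http%3A//stackoverflow.com & http%3A//google.com
```

Each match is replaced with whatever the callback returns, so there is no need to track match positions by hand.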