Python regex findall works but match does not [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 8 years ago.
I'm testing this in IPython. The variable t is being set from text in a dictionary and returns:
u'http://www.amazon.com/dp/B003T0G9GM/ref=wl_it_dp_v_nS_ttl/177-5611794-0982247?_encoding=UTF8&colid=SBGZJRGMR8TA&coliid=I205LCXDIRSLL3'
using this code:
r = r'amazon\.com/dp/(\w{10})'
m = re.findall(r,t)
matches correctly and m returns [u'B003T0G9GM']
Using this code,
p = re.compile(r)
m = p.match(t)
m returns None
This appears correct to me after reading this documentation.
https://docs.python.org/2/howto/regex.html#grouping
I also tested here to verify the regex before trying this in IPython
http://regex101.com/r/gG8eQ2/1
What am I missing?

You should be using search, not match. This is what you should have:
p = re.compile(r)
m = p.search(t)
if m: print(m.group(1)) # gives: B003T0G9GM
Match only checks for a match at the beginning of the string. Search scans the whole string.
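A minimal sketch reproducing the question with the URL above: match() fails because the string starts with "http://www.", while search() and findall() both scan forward and succeed. The last call is an illustration added here, not part of the original answer: if you really want match(), the pattern itself has to consume the text before amazon.com, for example with a lazy .*? prefix.

```python
import re

t = 'http://www.amazon.com/dp/B003T0G9GM/ref=wl_it_dp_v_nS_ttl/177-5611794-0982247?_encoding=UTF8&colid=SBGZJRGMR8TA&coliid=I205LCXDIRSLL3'
p = re.compile(r'amazon\.com/dp/(\w{10})')

# match() is anchored at position 0, and t does not start with "amazon.com"
print(p.match(t))          # None
# search() returns the first match found anywhere in the string
print(p.search(t).group(1))  # B003T0G9GM
# findall() also scans the whole string, which is why it "worked"
print(p.findall(t))        # ['B003T0G9GM']

# Illustration only: match() succeeds if the pattern allows a leading prefix
p2 = re.compile(r'.*?amazon\.com/dp/(\w{10})')
print(p2.match(t).group(1))  # B003T0G9GM
```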

Related

about Regex in Python [duplicate]

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 1 year ago.
I made this code:
import re
match = re.search(r'[DER]\d+[Y]', 'DER1234Y' )
print(match.group())
and it prints this :
R1234Y
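I want the code to print only the numbers and nothing else. How do I do that?
This is basically a regex substitution: re.sub('[^0-9]+', '', 'DER1234Y') removes everything except the digits and returns '1234'.
[^0-9]+ matches one or more characters that are not digits (0-9), and each such run is replaced with an empty string.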
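Another option, in the spirit of the linked duplicate, is a capturing group: wrap \d+ in parentheses and group(1) returns only that part of the match. A small sketch reusing the question's pattern (with [Y] simplified to Y, which is equivalent). One difference worth noting: re.sub strips non-digits from the entire string, whereas the group only extracts digits from the matched region.

```python
import re

text = 'DER1234Y'
# Same pattern as the question, with the digits wrapped in a capture group
match = re.search(r'[DER](\d+)Y', text)
if match:
    print(match.group(1))  # 1234   (only the captured digits)
    print(match.group())   # R1234Y (the whole match, as before)
```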

How do I get part of a string with a regex in Python [duplicate]

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 2 years ago.
I am new to regexes in Python.
I have a string that contains a substring I would like to extract.
I have the following pattern:
r = re.compile("(flag{.+[^}]})")
and the string is
Something has gone horribly wrong\n\nflag{Hi!}
I would like to get hold of just flag{Hi!}
I have tried it with:
a = re.search(r,string)
a = re.split(r,string)
Neither approach works; if I print a, I get None.
How can I get hold of the desired flag?
Thanks in advance
import re
text = "Something has gone horribly wrong\n\nflag{Hi!}"  # avoid shadowing the built-in str
r = re.compile("(flag{.+[^}]})")
a = r.search(text)
print(a.group())
This works and prints flag{Hi!}.
Firstly, as mentioned in the comments, your output is not None. You do get the match you were looking for: a Match object with span=(35, 44) that matches flag{Hi!}. You can use group() to get the matched text as a string:
>>> a = re.search(r, string)
>>> print(a.group())
flag{Hi!}
You can also shorten your regex a little. There is no need for .+, because [^}] already matches any single character that isn't a closing curly bracket (}), so repeating it with + covers the whole content:
"(flag{[^}]+})"
You can replace +, which matches one or more, with *, which matches zero or more, if you also want to match things like flag{} with nothing inside the curly brackets.
We can also search the string directly with re.search, without compiling the pattern first:
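A quick comparison of the three patterns discussed, as a sketch with made-up sample strings. The braces are escaped here for clarity; Python also treats a { that cannot start a repetition such as {2} as a literal. Note that the question's pattern needs at least two characters inside the braces (one for .+ and one for [^}]), and that [^}] also matches newlines, which . does not by default.

```python
import re

original = re.compile(r"flag\{.+[^}]\}")      # question's pattern: 2+ chars inside
one_or_more = re.compile(r"flag\{[^}]+\}")    # 1+ chars inside
zero_or_more = re.compile(r"flag\{[^}]*\}")   # 0+ chars inside

for sample in ["flag{Hi!}", "flag{A}", "flag{}"]:
    found = [p.search(sample) for p in (original, one_or_more, zero_or_more)]
    # Print the matched text for each pattern, or None when it fails
    print(sample, [m.group() if m else None for m in found])
```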
import re
line = 'Something has gone horribly wrong\n\nflag{Hi!}'
r = re.search("(flag{[^}]*})", line)
print(r.group())
Output:
flag{Hi!}

Extracting Decimal Numbers Using Regular Expression (Python 3.8) [duplicate]

This question already has answers here:
Python regex does not match line start
(3 answers)
Closed 2 years ago.
I was trying to extract some numbers from mail data. Here is my code:
import re
f = open('mbox-short.txt','r')
x = f.read()
z = re.findall('^X-DSPAM-Confidence: (0\.[0-9])+',x)
print(z)
But when I print the output, it comes out as an empty list.
Here is the link to the txt file:
http://www.py4inf.com/code/mbox-short.txt
You need to add the re.MULTILINE flag so that ^ matches at the beginning of every line, not just at the start of the whole string.
Also, put the + quantifier inside the parentheses. As written, (0\.[0-9])+ repeats the whole group, and each repetition matches 0. followed by a single digit, so the group captures only the first digit after the decimal point (for 0.8475 you get 0.8). Even if the group did repeat several times, findall would return only its last repetition.
z = re.findall(r'^X-DSPAM-Confidence: (0\.[0-9]+)', x, re.MULTILINE)
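A self-contained sketch of both fixes. The sample string is hypothetical and stands in for mbox-short.txt, assuming its header lines have the form X-DSPAM-Confidence: followed by a decimal; the values below are made up for illustration.

```python
import re

# Hypothetical stand-in for the contents of mbox-short.txt
sample = (
    "From someone@example.com\n"
    "X-DSPAM-Confidence: 0.8475\n"
    "X-DSPAM-Probability: 0.0000\n"
    "X-DSPAM-Confidence: 0.6178\n"
)

# Without re.MULTILINE, ^ only matches at the very start of the string
no_flag = re.findall(r'^X-DSPAM-Confidence: (0\.[0-9]+)', sample)
# With the flag but + outside the group, only one digit is captured
wrong_group = re.findall(r'^X-DSPAM-Confidence: (0\.[0-9])+', sample, re.MULTILINE)
# Both fixes applied
fixed = re.findall(r'^X-DSPAM-Confidence: (0\.[0-9]+)', sample, re.MULTILINE)

print(no_flag)      # []
print(wrong_group)  # ['0.8', '0.6']
print(fixed)        # ['0.8475', '0.6178']
print([float(v) for v in fixed])  # convert the captured strings to numbers
```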

extracting python .finditer() span=( ) results [duplicate]

This question already has answers here:
Python Regex - How to Get Positions and Values of Matches
(4 answers)
Closed 4 years ago.
After using .finditer() on a string, I want to extract the start and end indices shown as span=(xx,xx) in each Match object and use them in a print(search_text[xx:xx]) statement.
How would I extract the locations of the search results?
matches = search_pattern.finditer(search_text)
print(search_text[xx:xx]) # need to find a way to get the slice indexes
You can use the Match object's span() method, which returns a (start, end) tuple:
matches = search_pattern.finditer(search_text)
print([m.span() for m in matches])
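To go one step further and slice the text, here is a sketch with a hypothetical search_text and search_pattern (the question does not show its own): unpack each span and slice, and the slice equals m.group(). Keep in mind that finditer() returns an iterator, so once a loop or list comprehension has consumed the matches they cannot be iterated a second time.

```python
import re

# Hypothetical inputs; the question does not show its own pattern or text
search_text = "cat bat rat"
search_pattern = re.compile(r"[a-z]at")

for m in search_pattern.finditer(search_text):
    start, end = m.span()          # equivalent to (m.start(), m.end())
    print(search_text[start:end])  # same text as m.group()
```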
I think your question might have already been answered. Look here: Python Regex - How to Get Positions and Values of Matches
Peter Hoffmann gave this answer (which I linked above):
import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())  # Python 3 print(); prints 0 a, 2 b, 4 c, 6 d
Please let me know if this does not help.

Replace sequence of chars in string with its length [duplicate]

This question already has answers here:
Python replace string pattern with output of function
(4 answers)
Closed 5 years ago.
Say I have the following string:
mystr = "6374696f6e20????28??????2c??2c????29"
And I want to replace every run of ? characters with its length divided by 2. So for the example above, I want to get the following result:
mystr = "6374696f6e2022832c12c229"
Meaning:
???? replaced with 2
?????? replaced with 3
?? replaced with 1
???? replaced with 2
I tried the following, but I'm not sure it's a good approach, and in any case it doesn't work:
regex = re.compile('(\?+)')
matches = regex.findall(mystr)
if matches:
    for match in matches:
        match_length = len(match)/2
        if (match_length > 0):
            mystr = regex.sub(match_length, mystr)
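You can pass a callback function as the replacement argument of Python's re.sub: it is called once per match with the Match object and must return the replacement string. FYI, lambda expressions are shorthand for creating anonymous functions.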
import re
mystr = "6374696f6e20????28??????2c??2c????29"
regex = re.compile(r"\?+")
print(re.sub(regex, lambda m: str(int(len(m.group())/2)), mystr))
There seems to be uncertainty about what should happen with an odd-length run such as ???. The code above turns it into 1, because int() truncates 1.5 down to 1. Without the int() conversion, the result would be 1.5 (and even-length runs would become strings such as 2.0). If you want ??? to become 1? instead, replacing only complete pairs, you can use the pattern (?:\?{2})+.
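A sketch comparing both behaviours on an odd-length run. It swaps the lambda for a named function and uses floor division (//) instead of int(len(...)/2); the two are equivalent for non-negative lengths. The sample "ab???cd" is made up for illustration.

```python
import re

mystr = "6374696f6e20????28??????2c??2c????29"

def half_length(m):
    # m is a re.Match; replace the whole run with half its length as a string
    return str(len(m.group()) // 2)

print(re.sub(r"\?+", half_length, mystr))  # 6374696f6e2022832c12c229

# Odd-length run: \?+ consumes all three and floors 3 // 2 to 1,
# while (?:\?{2})+ consumes only complete pairs and leaves the last ? alone
print(re.sub(r"\?+", half_length, "ab???cd"))         # ab1cd
print(re.sub(r"(?:\?{2})+", half_length, "ab???cd"))  # ab1?cd
```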
