REGex in python does not extract right [duplicate]


This question already has answers here:
Python - re.findall returns unwanted result
(4 answers)
Closed 6 years ago.
test = """1d48bac (TAIL, ticket: TAG-AB123-6, origin/master) Took example of 123
6f2c5f9 (ticket: TAG-CD456) Took example of 456
9aa5436 (ticket: TAG-EF567-3) Took example of 6789"""
I want to write a regex in Python that extracts just the tag, i.e. the output should be
[TAG-AB123-6, TAG-CD456, TAG-EF567-3]
I tried this regex:
print(re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(-\d{0,2})?", test))
but it gives me
['-6', '', '-3']
What am I doing wrong?

Your optional capturing group needs to be made non-capturing:
>>> print(re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(?:-\d{0,2})?", test))
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
When the pattern contains capturing groups, findall returns the text captured by those groups instead of the whole match. If there are no capturing groups, it returns the whole matches.
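To make the difference concrete, here is a small sketch that runs three variants of the pattern on the question's input. The variable names (one_group, no_group, two_groups) and the two-group variant are illustrative additions, not part of the original answer.

```python
import re

test = """1d48bac (TAIL, ticket: TAG-AB123-6, origin/master) Took example of 123
6f2c5f9 (ticket: TAG-CD456) Took example of 456
9aa5436 (ticket: TAG-EF567-3) Took example of 6789"""

# One capturing group: findall returns only what that group captured.
one_group = re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(-\d{0,2})?", test)

# No capturing groups: findall returns the whole matches.
no_group = re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(?:-\d{0,2})?", test)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"(TAG-[A-Z]{0,9}\d{0,5})(-\d{0,2})?", test)

print(one_group)   # ['-6', '', '-3']
print(no_group)    # ['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
print(two_groups)  # [('TAG-AB123', '-6'), ('TAG-CD456', ''), ('TAG-EF567', '-3')]
```

A group that does not take part in a match shows up as an empty string, which is where the '' in the question's output comes from.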

In addition, note that you can take advantage of this behaviour (re.findall returns a list of captures, if there are any, instead of the whole matches). It lets you describe the context around the target substring and easily extract just the part you want:
>>> re.findall(r'ticket: ([^,)]*)', test)
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']

Related

How do I get part of a string with a regex in Python [duplicate]

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 2 years ago.
I am new to regular expressions in Python.
I have a string containing a substring that I would like to extract.
I have the following pattern:
r = re.compile("(flag{.+[^}]})")
and the string is
Something has gone horribly wrong\n\nflag{Hi!}
I would like to get hold of just flag{Hi!}
I have tried it with:
a = re.search(r,string)
a = re.split(r,string)
Neither approach works: if I print a, I get None.
How can I get hold of the desired flag?
Thanks in advance

import re
# Named text rather than str, so the built-in str is not shadowed.
text = "Something has gone horribly wrong\n\nflag{Hi!}"
r = re.compile("(flag{.+[^}]})")
a = re.search(r, text)
print(a.group())
This worked.

Firstly, as mentioned in the comments, your output is not None. You do get a match, and it is the match you were looking for. What you actually get is a Match object that spans positions 35 to 44 and matches flag{Hi!}. You can use group() to get the match as a string:
>>> a = re.search(r, string)
>>> print(a.group())
flag{Hi!}
You can also shorten your regex a little. There really isn't a need for .+, because it becomes redundant once you repeat [^}], which matches any character that isn't a closing curly bracket (}):
"(flag{[^}]+})"
You can replace the +, which matches one or more characters, with *, which matches zero or more, if you want to match things like flag{} where there are no characters inside the curly brackets.
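The points above can be checked with a short sketch. The braces are escaped here for readability, which is a stylistic choice rather than something the original pattern requires, and the variable names are illustrative.

```python
import re

text = "Something has gone horribly wrong\n\nflag{Hi!}"

# [^}]+ requires at least one character between the braces.
match = re.search(r"flag\{[^}]+\}", text)
if match is not None:      # search returns None when nothing matches
    print(match.group())   # flag{Hi!}
    print(match.span())    # (35, 44)

# With +, an empty flag is not matched; with *, it is.
empty_plus = re.search(r"flag\{[^}]+\}", "flag{}")
empty_star = re.search(r"flag\{[^}]*\}", "flag{}")
print(empty_plus)          # None
print(empty_star.group())  # flag{}
```

Checking for None before calling group() avoids an AttributeError when the text does not contain a flag.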

We can search the string directly for the pattern:
import re
line = 'Something has gone horribly wrong\n\nflag{Hi!}'
r = re.search("(flag{[^}]*})", line)
print(r.group())
Output:
flag{Hi!}

why python regex is not finding numbers? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I'm trying to find numbers in a string.
import re
text = "42 ttt 1,234 uuu 6,789,001"
finder = re.compile(r'\d{1,3}(,\d{3})*')
print(re.findall(finder, text))
It returns this:
['', ',234', ',001']
What's wrong with my regex?
How can I get ['42', '1,234', '6,789,001']?
Note: I'm getting correct result at https://regexr.com

Parentheses (...) mark the groups that the regex should capture.
In your case, the only group is (,\d{3}), so findall returns just what that group captured last (a comma and the three digits after it), or an empty string when the group did not take part in the match. Instead, you can capture the whole number by putting a group around everything and making the parentheses you need for * non-capturing with an initial ?:, like so:
r'(\d{1,3}(?:,\d{3})*)'
With finder recompiled using this pattern, you get the correct result:
>>> print(re.findall(finder, text))
['42', '1,234', '6,789,001']
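If you would rather leave the pattern unchanged, another option (not taken from the answer above) is to use finditer and ask each match for group(0), which is always the whole match regardless of capturing groups. A minimal sketch, with illustrative variable names:

```python
import re

text = "42 ttt 1,234 uuu 6,789,001"
# The original pattern from the question, with its group left capturing.
finder = re.compile(r'\d{1,3}(,\d{3})*')

# group(0) is the whole match, whatever groups the pattern contains.
whole = [m.group(0) for m in finder.finditer(text)]

# group(1) holds only the last repetition of the group, or None if unused.
last_group = [m.group(1) for m in finder.finditer(text)]

print(whole)       # ['42', '1,234', '6,789,001']
print(last_group)  # [None, ',234', ',001']
```

Note that finditer reports an unused group as None, whereas findall reports it as an empty string.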

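You can also change your finder to a pattern without groups, like this. Note that it only matches numbers with at least two digits and at most two commas: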
finder = re.compile(r'\d+\,?\d+,?\d*')

extracting python .finditer() span=( ) results [duplicate]

This question already has answers here:
Python Regex - How to Get Positions and Values of Matches
(4 answers)
Closed 4 years ago.
After using .finditer() on a string, I want to extract the indexes shown in each Match object's span=(xx, xx) and use them in a print(search_text[xx:xx]) statement.
How would I extract the locations of the search results?
matches = search_pattern.finditer(search_text)
print(search_text[xx:xx]) # need to find a way to get the slice indexes

You can use the span method:
matches = search_pattern.finditer(search_text)
print([m.span() for m in matches])
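To go from the spans to the slices the question asks for, unpack each span into start and end indexes. The sketch below reuses the question's names search_pattern and search_text, but the sample text and the pattern are hypothetical stand-ins.

```python
import re

search_text = "a1b2c3d4"                 # hypothetical sample input
search_pattern = re.compile(r"[a-z]\d")  # hypothetical pattern

slices = []
for m in search_pattern.finditer(search_text):
    start, end = m.span()                 # same values as m.start(), m.end()
    slices.append(search_text[start:end])
    print(start, end, search_text[start:end])

print(slices)  # ['a1', 'b2', 'c3', 'd4']
```

Each slice is the same string as m.group(), so slicing is only needed when you want the indexes themselves. Also keep in mind that finditer returns an iterator, which can be consumed only once.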

I think your question might have already been answered. Look here: Python Regex - How to Get Positions and Values of Matches
Peter Hoffmann gave this answer (which I linked above):
import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())
Please let me know if this does not help.

Python: re module to replace digits of telephone with asterisk [duplicate]

This question already has answers here:
re.sub replace with matched content
(4 answers)
Closed 8 years ago.
I want to replace the digits in the middle of a telephone number using a regex, but my attempt failed. Here is my code:
temp = re.sub(r'1([0-9]{1}[0-9])[0-9]{4}([0-9]{4})', r'$1****$2', tel_phone)
print(temp)
The output is always:
$1****$2
But I want it to look like this: 131****1234. How can I accomplish this? Thanks.

I think you're trying to replace the four digits in the middle (the four digits before the last four) with ****:
>>> s = "13111111234"
>>> temp = re.sub(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$', r'\1****\2', s)
>>> print(temp)
131****1234
You might have seen $1 in replacement strings in other languages. In Python, however, use \1 instead of $1. For correctness, you also need to include the leading 1 in the first capturing group so that the output includes it; otherwise, the leading 1 is lost.
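As a sketch of how this could be wrapped for reuse: the names PHONE and mask_phone below are hypothetical, and \g<1> is used as the unambiguous spelling of \1 in Python replacement strings.

```python
import re

# Same pattern as above, compiled once for reuse.
PHONE = re.compile(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$')

def mask_phone(number):
    """Replace the four middle digits of an 11-digit number with asterisks."""
    # \g<1> and \g<2> refer to the first and second capturing groups.
    return PHONE.sub(r'\g<1>****\g<2>', number)

print(mask_phone("13111111234"))  # 131****1234
print(mask_phone("12345"))        # 12345 (no match, returned unchanged)
```

The \g<1> form matters when the group reference is followed by a digit: \g<1>0 means group 1 followed by a literal 0, whereas \10 would be read as group 10.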

Python's .group() returning only the first match [duplicate]

This question already has an answer here:
re.search() only matches the first occurrence
(1 answer)
Closed 3 years ago.
I ran the following code and got only the first ')' as a match. Could someone help me understand why the full '))' is not being returned, given that regexes are greedy?
r = re.compile(r'\)')
var = r.search('- hi- ))there')
print(var.group())

search returns only the first match.
To find all matches, use findall:
r = re.compile(r'\)')
var = r.findall('- hi- )) there')
print(var)
If you want to find both parentheses in one match, use:
r = re.compile(r'\)+')
The + matches one or more repetitions of the preceding element.

Your regex isn't greedy. In fact, it's set up to match only a single character. If you want it to match repeats as well, add a +:
>>> r = re.compile(r'\)+')
>>> var = r.search('- hi- ))there')
>>> print(var.group())
))
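The two answers can be compared side by side in a small sketch, using the question's input string. The variable names are illustrative.

```python
import re

text = '- hi- ))there'

# Without +, each match is exactly one ')'.
first_single = re.search(r'\)', text).group()  # first match only
all_single = re.findall(r'\)', text)           # every single-character match

# With +, a run of consecutive ')' becomes one match.
first_run = re.search(r'\)+', text).group()
all_runs = re.findall(r'\)+', text)

print(first_single)  # )
print(all_single)    # [')', ')']
print(first_run)     # ))
print(all_runs)      # ['))']
```

Greediness only applies to quantifiers such as + and *. A pattern without a quantifier has nothing to extend, so it matches one character at a time.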
