This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I'm trying to find numbers in a string.
import re
text = "42 ttt 1,234 uuu 6,789,001"
finder = re.compile(r'\d{1,3}(,\d{3})*')
print(re.findall(finder, text))
It returns this:
['', ',234', ',001']
What's wrong with my regex?
How can I get ['42', '1,234', '6,789,001']?
Note: I get the correct result at https://regexr.com
You indicate with parentheses (...) what the groups are that should be captured by the regex.
In your case, the only group is (,\d{3}), so findall returns just the last comma-plus-three-digits chunk of each number, or an empty string when the number has no comma. Instead, you can capture the whole number by putting a group around everything, and make the parentheses you need for * non-capturing with an initial ?:, like so:
r'(\d{1,3}(?:,\d{3})*)'
This gives the correct result:
>>> print(re.findall(finder, text))
['42', '1,234', '6,789,001']
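The difference between the two patterns can be checked side by side. This is a minimal sketch; the variable names capturing, non_capturing, and whole are illustrative and not from the original post. It also shows re.finditer, which yields Match objects and so gives access to the whole match regardless of groups.

```python
import re

text = "42 ttt 1,234 uuu 6,789,001"

# With a capturing group, findall returns only what the group captured
# (the last repetition, or '' when the group did not take part).
capturing = re.compile(r'\d{1,3}(,\d{3})*')
print(capturing.findall(text))      # ['', ',234', ',001']

# With a non-capturing group, findall returns the whole match.
non_capturing = re.compile(r'\d{1,3}(?:,\d{3})*')
print(non_capturing.findall(text))  # ['42', '1,234', '6,789,001']

# finditer yields Match objects, so group(0) is the whole match either way.
whole = [m.group(0) for m in capturing.finditer(text)]
print(whole)                        # ['42', '1,234', '6,789,001']
```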
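Alternatively, change your finder to a pattern with no capturing groups, so that findall returns the whole match. Note that this pattern needs at least two digits and handles at most two commas:
finder = re.compile(r'\d+,?\d+,?\d*')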
This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 1 year ago.
I made this code:
import re
match = re.search(r'[DER]\d+[Y]', 'DER1234Y' )
print(match.group())
and it prints this:
R1234Y
I want the code to print only the numbers and nothing else. How can I do that?
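You can do this with re.sub by removing everything that is not a digit: re.sub('[^0-9]+', '', 'DER1234Y')
[^0-9]+ matches one or more characters that are not digits (0-9), so replacing each such run with an empty string leaves only the numbers.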
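Another option, sketched here as an assumption about what the asker wants, is to keep the original pattern and wrap the digits in a capturing group, then read that group instead of the whole match. The names digits and stripped are illustrative.

```python
import re

# Capture only the digits between a leading D, E, or R and a trailing Y.
match = re.search(r'[DER](\d+)Y', 'DER1234Y')
digits = match.group(1) if match else None
print(digits)    # 1234

# The re.sub approach from the answer above, for comparison.
stripped = re.sub(r'[^0-9]+', '', 'DER1234Y')
print(stripped)  # 1234
```

The capture-group version still checks the surrounding letters, while the re.sub version keeps every digit in the string wherever it appears.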
This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 2 years ago.
I am new to regexes in Python.
I have a string containing a substring that I would like to extract.
I have the following pattern:
r = re.compile("(flag{.+[^}]})")
and the string is
Something has gone horribly wrong\n\nflag{Hi!}
I would like to get hold of just flag{Hi!}
I have tried it with:
a = re.search(r,string)
a = re.split(r,string)
Neither approach works; if I print a, I get None.
How can I get hold of the desired flag?
Thanks in advance.
import re
text = "Something has gone horribly wrong\n\nflag{Hi!}"  # avoid shadowing the built-in str
r = re.compile("(flag{.+[^}]})")
a = re.search(r, text)
print(a.group())  # group() returns the matched text
This worked for me.
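To see what the two calls from the question actually return for this input, a small check can help. This is a sketch under the assumption that the string contains two real newline characters, as in the snippet above; the names text and parts are illustrative.

```python
import re

text = "Something has gone horribly wrong\n\nflag{Hi!}"
r = re.compile(r"(flag{.+[^}]})")

# re.search returns a Match object when the pattern is found, otherwise None.
a = re.search(r, text)
print(a is None)  # False
print(a.span())   # (35, 44)
print(a.group())  # flag{Hi!}

# re.split always returns a list; because the pattern has a capturing group,
# the matched text is kept as an item of that list.
parts = re.split(r, text)
print(parts)
```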
Firstly, as mentioned in the comments, re.search does not return None here. You do get a match, and it is the one you were looking for: a Match object that spans positions 35 to 44 and matches flag{Hi!}. You can use group() to get the match as a string:
>>> a = re.search(r, string)
>>> print(a.group())
flag{Hi!}
You can also shorten your regex a little. There is no need for .+ because a negated character class does the job on its own: [^}] matches any single character that isn't a closing curly bracket (}), and [^}]+ matches one or more of them:
"(flag{[^}]+})"
You can replace the +, which matches one or more, with *, which matches zero or more, if you also want to match things like flag{} where there are no characters inside the curly brackets.
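The three variants of the pattern behave differently on short flags. The sketch below compares them; the sample strings and the names patterns, samples, and found are made up for illustration. One detail worth noticing is that the original pattern, .+ followed by [^}], needs at least two characters between the braces.

```python
import re

patterns = {
    "original": re.compile(r"(flag{.+[^}]})"),
    "one_or_more": re.compile(r"(flag{[^}]+})"),
    "zero_or_more": re.compile(r"(flag{[^}]*})"),
}
samples = ["flag{Hi!}", "flag{A}", "flag{}"]

# For each pattern, record whether it finds a match in each sample.
found = {
    name: [bool(p.search(s)) for s in samples]
    for name, p in patterns.items()
}
for name, hits in found.items():
    print(name, hits)
# original [True, False, False]
# one_or_more [True, True, False]
# zero_or_more [True, True, True]
```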
We can also pass the pattern string straight to re.search, without compiling it first.
import re
line = 'Something has gone horribly wrong\n\nflag{Hi!}'
r = re.search("(flag{[^}]*})", line)
print(r.group())
Output:
flag{Hi!}
This question already has answers here:
Remove duplicate chars using regex?
(3 answers)
Closed 6 years ago.
I found a code snippet on the web that uses a regex in Python to remove consecutive duplicate characters, keeping one of each:
import re
re.sub(r'(?s)(.)(?=.*\1)','','aabbcc') #'abc'
But it has a defect: if the string is 'aabbccaabb', it also drops the first 'aa' and 'bb' and returns 'cab'.
re.sub(r'(?s)(.)(?=.*\1)','','aabbccaabb') #'cab'
Is there a way to solve it by regex?
Without regex, check if previous character is the same as current, using a list comprehension with a condition and join the results:
s='aabbccaabb'
print("".join([c for i,c in enumerate(s) if i==0 or s[i-1]!=c]))
Just remove the .* in the positive look ahead.
import re
print(re.sub(r'(?s)(.)(?=\1)', '', 'aabbcc'))
print(re.sub(r'(?s)(.)(?=\1)', '', 'aabbccaabb'))
Output:
abc
abcab
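For comparison, here is a sketch that runs the lookahead fix next to two alternatives that are not in the answers above: a backreference pattern that collapses each run into one character, and itertools.groupby from the standard library. The variable names are illustrative.

```python
import re
from itertools import groupby

s = 'aabbccaabb'

# Lookahead version from the answer: drop a character when the next one is identical.
via_lookahead = re.sub(r'(?s)(.)(?=\1)', '', s)

# Backreference version: replace each run of a repeated character with one copy.
via_backref = re.sub(r'(?s)(.)\1+', r'\1', s)

# Non-regex version: groupby groups consecutive equal characters.
via_groupby = ''.join(key for key, _ in groupby(s))

print(via_lookahead, via_backref, via_groupby)  # abcab abcab abcab
```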
This question already has answers here:
Python - re.findall returns unwanted result
(4 answers)
Closed 6 years ago.
test = """1d48bac (TAIL, ticket: TAG-AB123-6, origin/master) Took example of 123
6f2c5f9 (ticket: TAG-CD456) Took example of 456
9aa5436 (ticket: TAG-EF567-3) Took example of 6789"""
I want to write a regex in Python that will extract just the tag, i.e. the output should be
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
I tried this regex:
print(re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(-\d{0,2})?", test))
but this gives me
['-6', '', '-3']
What am I doing wrong?
Your optional capturing group needs to be made a non-capturing one:
>>> print(re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(?:-\d{0,2})?", test))
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
When the pattern contains capturing groups, findall returns what the groups captured rather than the whole match. If there are no capturing groups, it returns the whole matches.
In addition, note that you can take advantage of this behaviour (the fact that re.findall returns a list of captures, if there are any, instead of the whole match). This lets you describe all the context around the target substring and easily extract just the part you want:
>>> re.findall(r'ticket: ([^,)]*)', test)
['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']
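The three shapes of result that re.findall can return are easy to compare on the same input. This is a minimal sketch; the two-group pattern and the names no_groups, one_group, and two_groups are added for illustration and are not from the original posts.

```python
import re

test = """1d48bac (TAIL, ticket: TAG-AB123-6, origin/master) Took example of 123
6f2c5f9 (ticket: TAG-CD456) Took example of 456
9aa5436 (ticket: TAG-EF567-3) Took example of 6789"""

# No capturing group: a list of whole matches.
no_groups = re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(?:-\d{0,2})?", test)
print(no_groups)   # ['TAG-AB123-6', 'TAG-CD456', 'TAG-EF567-3']

# One capturing group: a list of that group's text ('' when it did not take part).
one_group = re.findall(r"TAG-[A-Z]{0,9}\d{0,5}(-\d{0,2})?", test)
print(one_group)   # ['-6', '', '-3']

# Two capturing groups: a list of tuples, one item per group.
two_groups = re.findall(r"(TAG-[A-Z]{0,9}\d{0,5})(-\d{0,2})?", test)
print(two_groups)  # [('TAG-AB123', '-6'), ('TAG-CD456', ''), ('TAG-EF567', '-3')]
```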
This question already has answers here:
re.sub replace with matched content
(4 answers)
Closed 8 years ago.
I want to mask the digits in the middle of a telephone number with a regex, but my attempt failed. Here is my code:
temp = re.sub(r'1([0-9]{1}[0-9])[0-9]{4}([0-9]{4})', r'$1****$2', tel_phone)
print(temp)
In the output, it always shows:
$1****$2
But I want it to show something like this: 131****1234. How can I accomplish that? Thanks.
I think you're trying to replace the four digits in the middle (the four digits before the last four) with ****:
>>> s = "13111111234"
>>> temp = re.sub(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$', r'\1****\2', s)
>>> print(temp)
131****1234
You might have seen $1 in replacement strings in other languages. In Python, however, use \1 instead of $1. For correctness, you also need to include the leading 1 in the first capturing group so that the output includes it; otherwise, the leading 1 will be lost.
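The three replacement spellings can be compared directly. In this sketch the phone number is the sample value from the answer above, and the names pattern, dollar_style, backslash_style, and g_style are illustrative. The \g<1> form is an equivalent way to write \1 in Python that stays unambiguous when the reference is followed by a digit.

```python
import re

tel_phone = "13111111234"
pattern = r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$'

# '$1' has no special meaning in Python replacement strings, so it is copied literally.
dollar_style = re.sub(pattern, r'$1****$2', tel_phone)
print(dollar_style)     # $1****$2

# '\1' and '\2' refer to the first and second capturing groups.
backslash_style = re.sub(pattern, r'\1****\2', tel_phone)
print(backslash_style)  # 131****1234

# '\g<1>' and '\g<2>' are an equivalent, more explicit spelling.
g_style = re.sub(pattern, r'\g<1>****\g<2>', tel_phone)
print(g_style)          # 131****1234
```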