Extracting Decimal Numbers Using Regular Expression(python 3.8) [duplicate] - python

This question already has answers here:
Python regex does not match line start
(3 answers)
Closed 2 years ago.
I was trying to extract some numbers from mail-data, here is my code:
import re
f = open('mbox-short.txt','r')
x = f.read()
z = re.findall('^X-DSPAM-Confidence: (0\.[0-9])+',x)
print(z)
But when i try to print the output it comes out to be NULL.
Here is the link to the txt file:
http://www.py4inf.com/code/mbox-short.txt

You need to add the re.MULTILINE flag in order for ^ to match at beginning of line anywhere in a string with multiple lines.
Also, you want to include the + quantifier inside the parentheses; otherwise, the match group will only match the last occurrence of several (if there can't be multiple occurrences, that doesn't matter much, of course) and you only match the first digit after the decimal point.
z = re.findall('^X-DSPAM-Confidence: (0\.[0-9]+)', x, re.MULTILINE)

Related

How do I get part of a string with a regex in Python [duplicate]

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 2 years ago.
I am new to regex's with python
I have a string which has got a sub-string which I would like to extract from
I have the following pattern:
r = re.compile("(flag{.+[^}]})")
and the string is
Something has gone horribly wrong\n\nflag{Hi!}
I would like to get hold of just flag{Hi!}
I have tried it with:
a = re.search(r,string)
a = re.split(r,string)
None of the approaches work, if I print a I get None
How can I get hold of the desired flag.
Thanks in advance
import re
str="Something has gone horribly wrong\n\nflag{Hi!}"
r = re.compile("(flag{.+[^}]})")
a = re.search(r,str)
print(a.group())
This worked.
Firstly, as mentioned in the comments, your output is not None. You do get a match, the match you were looking for. You actually get a Match object that spans from position 35 -> 44 and matches flag{Hi!}. You can use group() to get the match represented as a string:
>>> a = re.search(r, string)
>>> print(a.group())
"flag{Hi!}"
You can also shorten your regex a little bit. There really isn't a need to use .+ because it becomes redundant when you add [^}], which matches all characters that aren't a closing curly bracket (}):
"(flag{[^}]+})"
You can replace the +, which matches one or more with * which matches zero or more if you want to match things like flag{} where there are no characters inside the curly brackets.
We can directly search the string for matching string.
import re
line = 'Something has gone horribly wrong\n\nflag{Hi!}'
r = re.search("(flag{[^}]*})", line)
print(r.group())
Output:-
flag{Hi!}

why python regex is not finding numbers? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I'm trying to find numbers in a string.
import re
text = "42 ttt 1,234 uuu 6,789,001"
finder = re.compile(r'\d{1,3}(,\d{3})*')
print(re.findall(finder, text))
It returns this:
['', ',234', ',745']
What's wrong with regex?
How can I get ['42', '1,234', '6,789,745']?
Note: I'm getting correct result at https://regexr.com
You indicate with parentheses (...) what the groups are that should be captured by the regex.
In your case, you only capture the part after (and including) the first comma. Instead, you can capture the whole number by putting a group around everything, and make the parentheses you need for * non-capturing through an initial ?:, like so:
r'(\d{1,3}(?:,\d{3})*)'
This gives the correct result:
>>> print(re.findall(finder, text))
['42', '1,234', '6,789,001']
you just need to change your finder like this.
finder = re.compile(r'\d+\,?\d+,?\d*')

Replace sequence of chars in string with its length [duplicate]

This question already has answers here:
Python replace string pattern with output of function
(4 answers)
Closed 5 years ago.
Say I have the following string:
mystr = "6374696f6e20????28??????2c??2c????29"
And I want to replace every sequence of "??" with its length\2. So for the example above, I want to get the following result:
mystr = "6374696f6e2022832c12c229"
Meaning:
???? replaced with 2
?????? replaced with 3
?? replaced with 1
???? replaced with 2
I tried the following but I'm not sure it's the good approach, and anyway -- it doesn't work:
regex = re.compile('(\?+)')
matches = regex.findall(mystr)
if matches:
for match in matches:
match_length = len(match)/2
if (match_length > 0):
mystr= regex .sub(match_length , mystr)
You can use a callback function in Python's re.sub. FYI lambda expressions are shorthand to create anonymous functions.
See code in use here
import re
mystr = "6374696f6e20????28??????2c??2c????29"
regex = re.compile(r"\?+")
print(re.sub(regex, lambda m: str(int(len(m.group())/2)), mystr))
There seems to be uncertainty about what should happen in the case of ???. The above code will result in 1 since it converts to int. Without int conversion the result would be 1.0. If you want to ??? to become 1? you can use the pattern (?:\?{2})+ instead.

Python: re module to replace digits of telephone with asterisk [duplicate]

This question already has answers here:
re.sub replace with matched content
(4 answers)
Closed 8 years ago.
I want to replace the digits in the middle of telephone with regex but failed. Here is my code:
temp= re.sub(r'1([0-9]{1}[0-9])[0-9]{4}([0-9]{4})', repl=r'$1****$2', tel_phone)
print temp
In the output, it always shows:
$1****$2
But I want to show like this: 131****1234. How to accomplish it ? Thanks
I think you're trying to replace four digits present in the middle (four digits present before the last four digits) with ****
>>> s = "13111111234"
>>> temp= re.sub(r'^(1[0-9]{2})[0-9]{4}([0-9]{4})$', r'\1****\2', s)
>>> print temp
131****1234
You might have seen $1 in replacement string in other languages. However, in Python, use \1 instead of $1. For correctness, you also need to include the starting 1 in the first capturing group, so that the output also include the starting 1; otherwise, the starting 1 will be lost.

Python parser for Calculator [duplicate]

This question already has answers here:
Evaluating a mathematical expression in a string
(14 answers)
Closed 9 years ago.
I am trying to write a parser which takes expressions as a input from file.
expressions can be A=B=10 or B=(C-A)-4 etc.
What i have tried so far is . I am reading a file IP.txt
import re
opert = '+-/*()_='
fileName = "input.txt"
f = open(fileName,'r')
variableDict = {}
lines = f.readlines()
for i in lines:
for x in re.finditer(r'[A-Z_]\w*', i):
print x.group() # prints list containing all the alphabets.
for z in re.finditer(r'[0-9]\d*', i):
print z.group() # prints list containing all the numbers.
for c in i:
if c in opert:
print c # prints all the operators.
# '_' has special meaning. '_' can only be used before numbers only like _1 or _12 etc
#And i have parsed this also using
print re.findall(r'[_][0-9]\d+',i) # prints the _digits combination.
Now the problem is i have struck at how should i proceed with expression evaluation.
First some rule which i must mention about above inputs are.
No line should be greater then 50 characters.
Left most operator will always be '=' assignment operator.
'=' always Preceded by variables[A-Z],operators are {'+','-','/','*','_'}, digits {0-9}.
How should i first extract the first variable then push it into python list then '=' operator,then either '(','A-Z' push it into stack and so on
Could someone help me with this problem. I am overwhelmed with problem..
If any one is not able to understand the description please goto this link
So, you asked about the stack problem, which of course you need for evaluation. I would do something like this:
import re #1
stack = [] #2 FIX: NOT NECESSARY (since fourth line returns a list anyway)
inputstr = "A=B=C+26-(23*_2 )-D" #3
stack = re.findall(r'(?:[A-Z])|(?:[0-9]+)|(?:[/*+_=\(\)-])', inputstr) #4
while len(stack): #5
print stack.pop() #6
First three lines are some init stuff only. After that, I would make a stack with regex in the fourth line. (?:[A-Z]) matches variable, (?:[0-9]+) matches number (which may have more than one digit) and (?:[/*+_=\(\)-]) matches all the operators. Braces are escaped, and - is on the end, so you don't have to escape it.
Fifth and sixth line prints the stack.
I used (?: ...) because I don't want to match either group. Hard to explain - just try to run it without ?: and you will see the effect.

Categories