This question already has answers here:
Evaluating a mathematical expression in a string
(14 answers)
Closed 9 years ago.
I am trying to write a parser which takes expressions as input from a file.
Expressions can be A=B=10 or B=(C-A)-4 etc.
Here is what I have tried so far. I am reading a file, IP.txt:
import re

opert = '+-/*()_='
fileName = "input.txt"
variableDict = {}
with open(fileName, 'r') as f:
    lines = f.readlines()
for i in lines:
    for x in re.finditer(r'[A-Z_]\w*', i):
        print(x.group())  # prints each variable name
    for z in re.finditer(r'[0-9]\d*', i):
        print(z.group())  # prints each number
    for c in i:
        if c in opert:
            print(c)  # prints each operator
    # '_' has special meaning: it can only be used before numbers, like _1 or _12 etc.
    # And I have parsed this too, using:
    print(re.findall(r'_\d+', i))  # prints the _digits combinations
Now the problem is that I am stuck on how to proceed with evaluating the expressions.
First, some rules I must mention about the inputs above:
No line should be longer than 50 characters.
The leftmost operator will always be '=', the assignment operator.
'=' is always preceded by variables [A-Z]; the operators are {'+','-','/','*','_'} and the digits are {0-9}.
How should I first extract the first variable and push it into a Python list, then the '=' operator, then either '(' or 'A-Z', push it onto a stack, and so on?
Could someone help me with this problem? I am overwhelmed by it.
So, you asked about the stack problem, which of course you need for evaluation. I would do something like this:
import re  #1
stack = []  #2 FIX: NOT NECESSARY (since the fourth line returns a list anyway)
inputstr = "A=B=C+26-(23*_2 )-D"  #3
stack = re.findall(r'(?:[A-Z])|(?:[0-9]+)|(?:[/*+_=\(\)-])', inputstr)  #4
while stack:  #5
    print(stack.pop())  #6
The first three lines are just initialization. The fourth line builds the stack with a regex: (?:[A-Z]) matches a variable, (?:[0-9]+) matches a number (which may have more than one digit), and (?:[/*+_=\(\)-]) matches all the operators. The parentheses are escaped, and - is placed at the end of the character class, so you don't have to escape it.
The fifth and sixth lines print the stack. Since pop() removes the last element, the tokens come out in reverse order.
I used (?: ...) non-capturing groups because I don't want findall to return the groups. With capturing groups, re.findall returns a tuple of all the groups for each match instead of the matched text. Just try running it without ?: and you will see the effect.
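To make the difference concrete, here is a small sketch that runs the same tokenizing regex with and without ?: on the example string. Only re.findall is used; the variable names tokens and groups are illustrative.

```python
import re

inputstr = "A=B=C+26-(23*_2 )-D"

# Non-capturing groups: findall returns the matched text of each token.
tokens = re.findall(r'(?:[A-Z])|(?:[0-9]+)|(?:[/*+_=\(\)-])', inputstr)

# Capturing groups: findall returns one tuple per match, with '' for the
# groups that did not take part in that particular match.
groups = re.findall(r'([A-Z])|([0-9]+)|([/*+_=\(\)-])', inputstr)

print(tokens[:4])  # ['A', '=', 'B', '=']
print(groups[:2])  # [('A', '', ''), ('', '', '=')]
```

The space before ")" is not matched by any alternative, so findall simply skips it.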
This question already has answers here:
Python regex does not match line start
(3 answers)
Closed 2 years ago.
I was trying to extract some numbers from mail data. Here is my code:
import re
f = open('mbox-short.txt','r')
x = f.read()
z = re.findall('^X-DSPAM-Confidence: (0\.[0-9])+',x)
print(z)
But when I try to print the output, it comes out empty ([]).
Here is the link to the txt file:
http://www.py4inf.com/code/mbox-short.txt
You need to add the re.MULTILINE flag in order for ^ to match at the beginning of any line in a multi-line string.
Also, you want to put the + quantifier inside the parentheses. Otherwise, the group captures only the last of several repetitions (if there can't be multiple occurrences, that doesn't matter much, of course), and each repetition matches only the first digit after the decimal point.
z = re.findall('^X-DSPAM-Confidence: (0\.[0-9]+)', x, re.MULTILINE)
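A self-contained sketch of both fixes. It does not read mbox-short.txt; the string x below is made-up sample text that only imitates the relevant header lines.

```python
import re

# Made-up sample text standing in for mbox-short.txt (not the real file).
x = (
    "From: alice@example.com\n"
    "X-DSPAM-Confidence: 0.8475\n"
    "Subject: hello\n"
    "X-DSPAM-Confidence: 0.6178\n"
)

# Without re.MULTILINE, ^ only matches at the very start of the string,
# which here begins with "From:", so nothing is found.
no_flag = re.findall(r'^X-DSPAM-Confidence: (0\.[0-9]+)', x)

# + outside the group: the captured text is "0." plus a single digit.
plus_outside = re.findall(r'^X-DSPAM-Confidence: (0\.[0-9])+', x, re.MULTILINE)

# The fixed pattern: + inside the group, with re.MULTILINE.
z = re.findall(r'^X-DSPAM-Confidence: (0\.[0-9]+)', x, re.MULTILINE)

print(no_flag)       # []
print(plus_outside)  # ['0.8', '0.6']
print(z)             # ['0.8475', '0.6178']
```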
This question already has answers here:
Python replace string pattern with output of function
(4 answers)
Closed 5 years ago.
Say I have the following string:
mystr = "6374696f6e20????28??????2c??2c????29"
And I want to replace every run of "?" characters with its length divided by 2. So for the example above, I want to get the following result:
mystr = "6374696f6e2022832c12c229"
Meaning:
???? replaced with 2
?????? replaced with 3
?? replaced with 1
???? replaced with 2
I tried the following, but I'm not sure it's a good approach, and anyway, it doesn't work:
regex = re.compile(r'(\?+)')
matches = regex.findall(mystr)
if matches:
    for match in matches:
        match_length = len(match)/2
        if (match_length > 0):
            mystr = regex.sub(match_length, mystr)
You can pass a callback function as the replacement argument of Python's re.sub. FYI, lambda expressions are shorthand for creating anonymous functions.
import re
mystr = "6374696f6e20????28??????2c??2c????29"
regex = re.compile(r"\?+")
print(re.sub(regex, lambda m: str(int(len(m.group())/2)), mystr))
There seems to be some uncertainty about what should happen in the case of ???. The code above produces 1, since it converts to int (truncating 1.5); without the int conversion, the result would be 1.5. If you want ??? to become 1? instead (keeping the leftover ?), you can use the pattern (?:\?{2})+.
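A sketch comparing the two patterns on an odd-length run. It uses a named callback (the hypothetical half_length) instead of the lambda, and integer division // in place of int(.../2); for these non-negative lengths the two give the same result.

```python
import re

mystr = "6374696f6e20????28??????2c??2c????29"

def half_length(m):
    # m.group() is the run of question marks that was matched
    return str(len(m.group()) // 2)  # // is integer division, so 3 -> 1

result = re.sub(r"\?+", half_length, mystr)
print(result)  # 6374696f6e2022832c12c229

# Odd runs: \?+ consumes all three marks, (?:\?{2})+ only whole pairs.
print(re.sub(r"\?+", half_length, "ab???cd"))         # ab1cd
print(re.sub(r"(?:\?{2})+", half_length, "ab???cd"))  # ab1?cd
```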
There are probably several ways to solve this problem, so I'm open to any ideas.
I have a file, and within that file is the string "D133330593". Note: I do know the exact position in the file where this string occurs, but I don't know if that helps.
Following this string there are 6 digits, and I need to replace these 6 digits with 6 other digits.
This is what I have so far:
def editfile():
    f = open(filein,'r')
    filedata = f.read()
    f.close()
    #This is the line that needs help
    newdata = filedata.replace( -TOREPLACE- ,-REPLACER-)
    #Basically what I need is something that lets me say "D133330593******"
    #->"D133330593123456" Note: The following 6 digits don't need to be
    #anything specific, just different from the original 6
    f = open(filein,'w')
    f.write(newdata)
    f.close()
Use the re module to define your pattern, and then use the sub() function to substitute occurrences of that pattern with your own string.
import re
...
pat = re.compile(r"D133330593\d{6}")
newdata = re.sub(pat, "D133330593abcdef", filedata)  # sub() returns a new string
The above defines a pattern as your string ("D133330593") followed by six decimal digits. The next line then replaces ALL occurrences of this pattern with your replacement string ("D133330593abcdef" in this case), if that is what you want.
If you want a different replacement for each occurrence of the pattern, you could call sub() repeatedly with the count keyword argument set to 1, which limits the number of replacements made per call. This only works if the new text no longer matches the pattern; otherwise the same occurrence is replaced again.
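A sketch of both cases, assuming the replacement is six new digits. The names PREFIX, replace_all and replace_each are hypothetical. For per-occurrence values it passes a function to sub() instead of using count=1, because a six-digit replacement would still match the pattern. In editfile you would then write something like newdata = replace_all(filedata, "123456").

```python
import re

PREFIX = "D133330593"
# PREFIX followed by exactly six digits (re.escape keeps it safe for any prefix)
pat = re.compile(re.escape(PREFIX) + r"\d{6}")

def replace_all(text, new_digits):
    """Replace the six digits after every occurrence of PREFIX with new_digits."""
    return pat.sub(PREFIX + new_digits, text)

def replace_each(text, replacements):
    """Give each occurrence its own six digits, in order of appearance.

    The function passed to sub() is called once per match, so this works even
    though every new value still matches the pattern. next() raises
    StopIteration if there are more matches than replacements.
    """
    it = iter(replacements)
    return pat.sub(lambda m: PREFIX + next(it), text)

sample = "xxD133330593000000yyD133330593111111zz"  # made-up example text
print(replace_all(sample, "123456"))
# xxD133330593123456yyD133330593123456zz
print(replace_each(sample, ["123456", "654321"]))
# xxD133330593123456yyD133330593654321zz
```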
Check out this library for more info - https://docs.python.org/3.6/library/re.html
Let's simplify your problem to you having a string:
s = "zshisjD133330593090909fdjgsl"
and you want to replace the 6 characters after "D133330593" with "123456" to produce:
"zshisjD133330593123456fdjgsl"
To achieve this, we first need to find the index of "D133330593". This is done with str.index:
i = s.index("D133330593")
Then, to replace the next 6 characters, we first calculate the length of the string we searched for:
l = len("D133330593")
then do the replace:
s[:i+l] + "123456" + s[i+l+6:]
which gives us the desired result of:
'zshisjD133330593123456fdjgsl'
I am sure you can now integrate this into your code to work with a file; this is the heart of your problem.
Note that storing the intermediate values in variables, as above, is the better choice, since it avoids recomputing them. Nevertheless, if your file isn't too long (i.e. efficiency isn't a big concern), you can do the whole process outlined above in one line:
s[:s.index("D133330593")+len("D133330593")] + "123456" + s[s.index("D133330593")+len("D133330593")+6:]
which gives the same result.
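One way to plug the str.index approach into a file-editing function, as a sketch: replace_after_marker is a hypothetical name, and it assumes the new text has the same length as the characters it overwrites (6 here). The usage part writes to a throwaway temporary file.

```python
import os
import tempfile

def replace_after_marker(path, marker, new_digits):
    """Overwrite the len(new_digits) characters that follow marker in the file."""
    with open(path) as f:
        s = f.read()
    i = s.index(marker)      # raises ValueError if marker is not in the file
    start = i + len(marker)  # first character after the marker
    s = s[:start] + new_digits + s[start + len(new_digits):]
    with open(path, "w") as f:
        f.write(s)

# Usage on a throwaway file
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("zshisjD133330593090909fdjgsl")
replace_after_marker(path, "D133330593", "123456")
with open(path) as f:
    result = f.read()
print(result)  # zshisjD133330593123456fdjgsl
os.remove(path)
```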
This question already has answers here:
Python Regex to find a string in double quotes within a string
(6 answers)
Closed 6 years ago.
I'm trying to write a function that takes a string in which a keyword occurs multiple times, and prints the text between double quotation marks that follows each occurrence of the keyword. Essentially...
Input= 'alkfjjiekeyword "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee keyword"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness'
Output= someonehelpmepls
itonlygivesmeoneinsteadofmultiple
If it's possible to have each output on its own line, that would be better.
Here's what I have so far:
def getEm(s):
    h = s.find('keyword')
    if h == -1:
        return -1
    else:
        begin = s.find('"',h)
        end = s.find('"', begin+1)
        result = s[begin +1:end]
        print (result)
Please don't suggest import. I don't know how to use it or what it is; I am a beginner.
Let's take some sample input:
>>> Input= 'alkfjjiekeyword "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee keyword"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness'
I believe that one " was missing from the sample input, so I added it.
As I understand it, you want to get the strings in double-quotes that follow the word keyword. If that is the case, then:
def get_quoted_after_keyword(text):
    results = []
    split_by_keyword = text.split('keyword')
    # you said no results before the keyword
    for s in split_by_keyword[1:]:
        split_by_quote = s.split('"')
        if len(split_by_quote) > 1:
            # assuming you want exactly one quoted result per keyword
            results.append(split_by_quote[1])
    return results
>>> print('\n'.join(get_quoted_after_keyword(Input)))
someonehelpmepls
itonlygivesmeoneinsteadofmultiple
How it works
Let's look at the first piece:
>>> Input.split('keyword')
['alkfjjie',
' "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee ',
'"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness']
By splitting Input on keyword, we get, in this case, three strings. The second through the last strings are all strings that follow the word keyword. To get them without the first string, we use slicing:
>>> Input.split('keyword')[1:]
[' "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee ',
'"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness']
Now, our next task is to get the part of these strings that is in double quotes. To do that, we split each of these strings on ". The second piece, the one at index 1, will be the string in double quotes. As a simpler example, let's take these strings:
>>> [s.split('"')[1] for s in ('"one"otherstuff', ' "two"morestuff')]
['one', 'two']
Next, we put these two steps together:
>>> [s.split('"')[1] for s in Input.split('keyword')[1:]]
['someonehelpmepls', 'itonlygivesmeoneinsteadofmultiple']
We now have the strings that we want. The last step is to print them out nicely, one per line:
>>> print('\n'.join(s.split('"')[1] for s in Input.split('keyword')[1:]))
someonehelpmepls
itonlygivesmeoneinsteadofmultiple
Limitation: this approach assumes that keyword never appears inside the double-quoted strings.
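For comparison, the asker's own find()-based function can also be extended to return every match, with no imports, by looping with a moving start position. A minimal sketch: get_em and its keyword parameter are illustrative names. Unlike the split() version, it looks for the next opening quote even if it lies past the following keyword.

```python
def get_em(s, keyword="keyword"):
    """Return every double-quoted string that follows an occurrence of keyword."""
    results = []
    start = 0
    while True:
        h = s.find(keyword, start)
        if h == -1:          # no more keywords
            break
        begin = s.find('"', h + len(keyword))
        if begin == -1:      # keyword with no opening quote after it
            break
        end = s.find('"', begin + 1)
        if end == -1:        # opening quote with no closing quote
            break
        results.append(s[begin + 1:end])
        start = end + 1      # keep searching after this quoted string
    return results

Input = 'alkfjjiekeyword "someonehelpmepls"fjioee... omgsos someonerandom help helpppmeeeeeee keyword"itonlygivesmeoneinsteadofmultiple"... sadnesssadness!sadness'
print('\n'.join(get_em(Input)))
```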
This question already has answers here:
How can I check if something is true in any (at least one) iteration of a Python for loop? [duplicate]
(4 answers)
Closed 7 months ago.
I need to check whether any of the strings in a list match a regex. If any do, I want to continue. The way I've always done it in the past is with a list comprehension, something like:
r = re.compile('.*search.*')
if [line for line in output if r.match(line)]:
    do_stuff()
Which I now realize is pretty inefficient. If the very first item in the list matches, we can skip all the rest of the comparisons and move on. I could improve this with:
r = re.compile('.*search.*')
for line in output:
    if r.match(line):
        do_stuff()
        break
But I'm wondering if there's a more pythonic way to do this.
You can use the builtin any():
r = re.compile('.*search.*')
if any(r.match(line) for line in output):
    do_stuff()
Passing in the lazy generator to any() will allow it to exit on the first match without having to check any farther into the iterable.
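A small sketch that makes the short-circuiting visible. The helper tracked_match and the list checked are illustrative additions that record which lines any() actually examines; output is made-up sample data.

```python
import re

r = re.compile('.*search.*')
output = ['searching', 'foo', 'bar']  # sample data
checked = []

def tracked_match(line):
    checked.append(line)  # record every line that any() asks about
    return r.match(line)

found = any(tracked_match(line) for line in output)
print(found, checked)  # True ['searching']
```

Because the first line already matches, 'foo' and 'bar' are never tested.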
Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), we can also capture a witness of an any expression when a match is found and use it directly:
import re

pattern = re.compile('.*search.*')
items = ['hello', 'searched', 'world', 'still', 'searching']
if any((match := pattern.match(x)) for x in items):
    print(match.group(0))
# 'searched'
For each item, this:
Applies the regex search (pattern.match(x))
Assigns the result to a match variable (either None or a re.Match object)
Applies the truth value of match as part of the any expression (None -> False, Match -> True)
If match is None, then the any search loop continues
If match is a Match object (the pattern matched), we exit the any expression, which evaluates to True, and the match variable can be used within the condition's body
Since I am not allowed to comment yet, I want to provide a small correction to MrAlexBailey's answer and also answer nat5142's question. The correct form would be:
r = re.compile('.*search.*')
if any(r.match(line) for line in output):
    do_stuff()
If you want to find the matched strings, you would do:
lines_to_log = [line for line in output if r.match(line)]
In addition, if you want to find all lines that match any compiled regular expression in a list of compiled regular expressions r=[r1,r2,...,rn], you can use:
lines_to_log = [line for line in output if any(reg_ex.match(line) for reg_ex in r)]
In reply to a question asked by @nat5142 on the answer given by @MrAlexBailey:
"Any way to access the matched string using this method? I'd like to print it for logging purposes", assuming "this" refers to:
if any(r.match(line) for line in output):
    do_stuff()
You can loop over the matching lines:
# r = re.compile('.*search.*')
for match in [line for line in output if r.match(line)]:
    do_stuff(match)  # <- using the matched line here
Another approach is to apply a logging function to each match with the map function:
# r = re.compile('.*search.*')
# log = lambda x: print(x)
list(map(log, [line for line in output if r.match(line)]))  # map() is lazy in Python 3; list() forces it to run
Although this does not involve the any function and might not even be close to what you want...
I thought this answer was not very relevant, so here's my second attempt...
I suppose you could do this:
def log_match(match):
    if match:
        print(match)
    return match

if any(log_match(r.match(line)) for line in output):
    do_stuff()