How to execute an external regular expression script via Python? - python

Or should I change my question to 'how to implement a regular expression tester like these': regular expression editor python
Maybe this is more of a programming question. My application will read in an article and expose a variable, say TEXT, to represent this article. It should allow users to write Python re-compatible scripts to manipulate TEXT (the article), in particular to replace parts of it.
For example, users could write the following commands, and my app should read them in and execute them:
p = re.compile( 'red')
p.sub( 'color', TEXT)
I will release my app with something like py2exe, so I think the approach from "How can I make one python file run another?" won't work.
I know how to write regular expressions, but "using a Python script to run a Python script" really confuses me.

Do you want the user to supply the replacement word? Then you need to use input():
word = input('Specify color\n')
You can use plain str.replace(), which returns a new string:
text = 'red'
new_text = text.replace('red', word)
If you want to use a regular expression:
import re
new_text = re.sub('red', word, text)
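For the original question (running user-written re commands against TEXT), one possible approach is the built-in exec with an explicit namespace. The sketch below is only an illustration under assumptions: run_user_script is a hypothetical helper name, and the user script is assumed to assign its result back to TEXT. Since exec runs inside the already running interpreter, it should not need a separate Python installation, but it will execute arbitrary code, so it is only suitable for scripts from trusted users.

```python
import re


def run_user_script(script: str, text: str) -> str:
    """Run a user-supplied script with `re` and `TEXT` available.

    Hypothetical helper: assumes the script stores its result in TEXT.
    """
    namespace = {"re": re, "TEXT": text}
    exec(script, namespace)  # executes arbitrary code: trusted input only
    return namespace["TEXT"]


# Example user script, written the way the question describes.
user_script = """
p = re.compile('red')
TEXT = p.sub('color', TEXT)
"""

print(run_user_script(user_script, "a red car and a red door"))
```

If the user forgets to assign back to TEXT, the helper simply returns the unchanged article, which may or may not be the behaviour you want.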

Related

RegEx works in regexr but not in python re

I have this regex: If you don't want these messages, please [a-zA-Z0-9öäüÖÄÜ<>\n\-=#;&?_ "/:.#]+settings<\/a>. It works on regexr but not when I use the re library in Python:
data = "<my text (comes from a file)>"
search = "If you don't want these messages, please [a-zA-Z0-9öäüÖÄÜ<>\n\-=#;&?_ \"/:.#]+settings<\/a>" # this search string comes from a database, so it's not hardcoded into my script
print(re.search(search, data))
Is there something I don't see?
Thank you!
The pattern you are using on regexr contains \- but your example shows \\-, which may give an incorrect regex. (Also add the r prefix in front of the string, as jupiterby said.)
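As a rough illustration of the raw-string advice, the sketch below writes the pattern as a raw string so that the backslashes reach the regex engine unchanged. The sample data is made up for the example; the real text comes from a file. Note that the r prefix only matters for literals in source code, so a pattern loaded from a database already contains exactly the characters stored there.

```python
import re

# Hypothetical sample text; the real data comes from a file.
data = """If you don't want these messages, please <a href="https://example.com/prefs">change your settings</a>."""

# Raw string literal: \n, \- and \/ are passed to the regex engine as written.
search = r"""If you don't want these messages, please [a-zA-Z0-9öäüÖÄÜ<>\n\-=#;&?_ "/:.#]+settings<\/a>"""

match = re.search(search, data)
print(match.group(0) if match else "no match")
```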

how to extract javascript variables by using python bs4

<script type="text/javascript">var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";var csrfMagicName = "__csrf_magic";</script>
The above script tag is from a webpage.
script = soup.find_all('script')[5]
With the above line of code I was able to extract the script tag I want, but I need to extract the values of the variables in a Python script. I am using BeautifulSoup in my Python script to extract the data.
You could use
(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"
See a demo on regex101.com.
Note: there are a couple of general drawbacks to using regular expressions on code. For example, with the above pattern, something like let x = -10; would not be matched even though it is perfectly valid JavaScript. Single quotes are not supported either; it all depends on your actual input.
That being said, you could go for:
(?:var|let)\s+
(?P<key>\w+)\s*=\s*
(['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))
See another demo on regex101.com.
This still leaves you helpless against escaped quotes like let x = "some \" string"; or against variable declarations in comments. In general, favour a parser solution.
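Both patterns above can be tried from Python roughly as follows. This is a sketch, not a complete solution: script_text stands in for the text of the tag found with BeautifulSoup (for example script.string), and extract is a hypothetical helper name. The second pattern is compiled with re.VERBOSE so it can keep its multi-line layout.

```python
import re

# Stand-in for the text content of the <script> tag.
script_text = (
    'var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";'
    'var csrfMagicName = "__csrf_magic";'
)

# First pattern: double-quoted values only.
simple = re.compile(r'(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"')
variables = dict(simple.findall(script_text))
print(variables)

# Second pattern: optional quote, with a conditional on group 2.
extended = re.compile(r"""
    (?:var|let)\s+
    (?P<key>\w+)\s*=\s*
    (['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))
    """, re.VERBOSE)


def extract(js):
    """Hypothetical helper: map variable names to their raw values."""
    result = {}
    for m in extended.finditer(js):
        quoted = m.group("value1")
        result[m.group("key")] = quoted if quoted is not None else m.group("value2")
    return result


print(extract("let x = -10; var name = 'abc';"))
```

The same limits apply as described above: escaped quotes and declarations inside comments are not handled.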

variable regex python

MyDir = os.getcwd().split(os.sep)[-1]
command = re.search(r"(MyDir)", body).group(1)
etc
Hi guys,
I am trying to have a Python script (on Windows) search my Outlook email body for certain words using regex. That works fine, coupled with the rest of the script (not shown).
But the minute I want it to search for a variable, i.e. MyDir, it does nothing when I send myself an email with the word "documents" in the body (documents being the directory the script is located in on this occasion, though the variable should be populated with whatever top-level directory the script is run from).
I have read that re.escape is a method to consider, and I have copied lots of different variations and examples and adapted them to my scenario, but none have worked. I have also built the regex as a string, still no joy.
Is there anything in my MyDir "variable" that is throwing the regex search off?
I am stumped. It's my first Python script, so I am sure I am doing something wrong. Or maybe I can't use os.getcwd().split(os.sep)[-1] inside a regex, and it is looking at the literal string rather than the variable!
Thanks for any help. I have read through similar regex+variable posts on here, but they haven't worked for me.
:)
Try:
command = re.search("(" + re.escape(MyDir) + ")", body).group(1)
You are searching for the literal string MyDir, not the value of the variable MyDir. You could also use str.format:
command = re.search(r"({})".format(MyDir), body).group(1)
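A small self-contained sketch of the re.escape variant is shown below. The helper name find_dir_name and the sample strings are made up for illustration; in the real script the first argument would be os.getcwd().split(os.sep)[-1] and the second the email body. The helper also guards against re.search returning None, since calling .group(1) on None raises an AttributeError.

```python
import re


def find_dir_name(dir_name, body):
    """Hypothetical helper: search body for the literal text in dir_name."""
    # re.escape neutralises characters such as '.', '(' or '+' in the name.
    pattern = "(" + re.escape(dir_name) + ")"
    match = re.search(pattern, body)
    return match.group(1) if match else None


print(find_dir_name("documents", "please open documents today"))
print(find_dir_name("my.dir (v2)", "see my.dir (v2) for details"))
print(find_dir_name("documents", "nothing relevant here"))
```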

Python Script using Fileinput Module and with Regex substitution containing \n (multiple lines)

All,
I am relatively new to Python but have used other scripting languages with REGEX extensively. I need a script that will open a file, look for a REGEX pattern, replace the pattern and close the file. The script below works great; however, I don't know whether the "for line in fileinput.input" loop can accommodate a regex pattern that spans more than a single line (i.e. the regex includes a line break). In my case, it covers 2 lines. My test file read_it.txt looks like this:
read_it.txt (contains just 3 lines)
ABA
CDC
EFE
The script is designed to open the file, recognize the pattern ABA\nCDC that is seen over 2 lines, then replace it with the word TEST.
If the pattern replace is successful, then the file should read as follows and contain now only 2 lines:
TEST
EFE
Knowing the answer to this will help greatly in using Python scripts to parse text files and modify them on the fly. I believe, but am not sure, that there may be a better Python construct that still allows for REGEX searches. So the question is:
1) Do I need to change something in the existing script that would change the behavior of the "for line" command to match a multi-line REGEX pattern?
2) Or do I need a different Python script that is better suited to a multi-line search?
Some things that may help, but that I currently don't know how to write, are:
1) fileinput "readline" option.
2) adding (?m) in the expression for multiline
Please help!
Brent
SCRIPT
import sys
import fileinput
import re
for line in fileinput.input('C:\\Python34\\read_it.txt', inplace=1):
    line = re.sub(r'A(B)A$\nCDC', r'TEST', line.rstrip())
    print(line)
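Regarding "2) adding (?m) in the expression for multiline":
You can do this by passing flags=re.M or flags=re.MULTILINE to re.sub. Pass it as a keyword argument, because the fourth positional argument of re.sub is count, not flags.
Example:
re.sub(r'A(B)A$\nCDC', r'TEST', line.rstrip(), flags=re.M)
or
re.sub(r'A(B)A$\nCDC', r'TEST', line.rstrip(), flags=re.MULTILINE)
Note that the flag alone does not let the loop match across lines, because fileinput hands re.sub one line at a time.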
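One way to give the pattern both lines at once is to read the whole file into a single string, substitute, and write the result back. The sketch below assumes the file is small enough to fit in memory; replace_in_file is a hypothetical helper name, and a temporary file stands in for read_it.txt.

```python
import os
import re
import tempfile


def replace_in_file(path, pattern, repl):
    """Hypothetical helper: apply one regex substitution to a whole file."""
    with open(path, encoding="utf-8") as f:
        content = f.read()  # the entire file as one string
    # MULTILINE lets $ match at the end of each line, not only at the end.
    new_content = re.sub(pattern, repl, content, flags=re.MULTILINE)
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_content)
    return new_content


# Demo on a temporary copy of the 3-line test file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "read_it.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("ABA\nCDC\nEFE\n")
    result = replace_in_file(path, r"A(B)A$\nCDC", "TEST")
    print(result)
```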

How to get the next token (int, float or string) from a file in Python?

Is there some way to just get the next token from a file in Python, as for example the Scanner class does in Java?
File file = new File("something");
Scanner myinput = new Scanner(file);
double a = myinput.nextDouble();
String s = myinput.next();
I'd like to ignore whitespaces, tabs, newlines and just get the next int/float/word from the file. I know I could read the lines and build something like Scanner myself, but I'd like to know if there isn't already something that I could use.
I've searched around but could only find line-oriented methods.
Thank you!
Check out the shlex module in the standard library: http://docs.python.org/library/shlex.html
import shlex
import io # io.StringIO is used in place of a file (Python 3)
list(shlex.shlex(io.StringIO('Some tokens. 123, 45.67 "A string with whitespace"')))
It does not handle floats the way you seem to want. Maybe you can extend or modify it.
I don't think there is anything that sophisticated around.
But you can take a look at the following options:
Use re.split(pattern, string) and get what you want by providing regexes
There is a Scanner class somewhere in the re module (but I don't think it was developed further)
You could also consider using tokenize + StringIO
Or, as you mentioned yourself: build one yourself, donate it to the community and get famous ;)
Probably you can take a look at PLY
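For the "build one yourself" option, a whitespace-based token reader can be quite small. The sketch below is only one possible shape: tokens is a hypothetical generator name, io.StringIO stands in for an open file, and the caller decides how to convert each token, much like calling nextDouble() or next() on a Java Scanner.

```python
import io


def tokens(file_obj):
    """Yield whitespace-separated tokens from a file, one at a time."""
    for line in file_obj:
        # str.split() with no arguments skips spaces, tabs and newlines.
        yield from line.split()


# io.StringIO stands in for an open text file.
source = io.StringIO("3.14   hello\n\t42\n")
stream = tokens(source)

a = float(next(stream))  # like myinput.nextDouble()
s = next(stream)         # like myinput.next()
n = int(next(stream))
print(a, s, n)
```

Calling next(stream) after the last token raises StopIteration, so a default such as next(stream, None) may be useful when the amount of input is unknown.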
If your file is an *.ini-like text file, you could use the ConfigParser module.
There are a few examples out there:
http://docs.python.org/library/configparser.html
pyparsing will do that for other purposes, I think.
I haven't used pyparsing before, so I have no clue right now.
http://pyparsing.wikispaces.com/
