Matching with regex line by line - python

I am working on a fun little language that uses regex to match lines in a file. Here is what I have so far:
import re
code=open("code.txt", "r").read()
outputf=r'output (.*)'
inputf=r'(.*) = input (.*)'
intf=r'int (.*) = (\d)'
floatf=r'float (.*) = (\d\.\d)'
outputq=re.match(outputf, code)
if outputq:
    print "Executing OUTPUT query"
    exec "print %s" %outputq.group(1)
inputq=re.match(inputf, code)
if inputq:
    print "Executing INPUT query"
    exec "%s=raw_input(%s)"%(inputq.group(1), inputq.group(2))
intq=re.match(intf, code)
if intq:
    exec "%s = %s"%(intq.group(1), intq.group(2))
    exec "print %s"%(intq.group(1))
else:
    print "Invalid syntax"
The code works when matching, say:
int x = 1
But it only matches the first line and then ignores the rest of the code that I want to match. How can I match every line in the file against my regex definitions?

.read() returns the whole file as one string. Use .split("\n") (or .splitlines()) on the result of .read(), or use .readlines() instead.
Then iterate over the lines and test each one for your commands.
At the moment you treat the whole file as one single line, but you want to check it line by line.
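To make the difference concrete, here is a small sketch of what each way of splitting produces, and of matching each line on its own. It assumes a hypothetical two-line code.txt, with io.StringIO standing in for the open file:

```python
import io
import re

# What .read() would return for a hypothetical two-line code.txt
text = "int x = 1\noutput x\n"

# split("\n") keeps an empty string after the final newline
print(text.split("\n"))               # ['int x = 1', 'output x', '']
# readlines() keeps each line's "\n" (io.StringIO stands in for the file)
print(io.StringIO(text).readlines())  # ['int x = 1\n', 'output x\n']
# splitlines() drops both the line endings and the empty tail
print(text.splitlines())              # ['int x = 1', 'output x']

# Each line can now be tested on its own
for line in text.splitlines():
    m = re.match(r'int (.*) = (\d)', line)
    print(line, "->", m.groups() if m else None)
```

splitlines() is often the most convenient choice here, since neither a trailing "\n" nor an empty last element reaches the regex.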
EDIT:
To do that, move the matching logic into a function, read the lines with readlines(), and finally iterate over the lines, calling the function on each one. Like this:
import re
outputf=r'output (.*)'
inputf=r'(.*) = input (.*)'
intf=r'int (.*) = (\d)'
floatf=r'float (.*) = (\d\.\d)'
def check_line(line):
    outputq=re.match(outputf, line)
    if outputq:
        print ("Executing OUTPUT query")
        exec ("print (%s)" % outputq.group(1))
    inputq=re.match(inputf, line)
    if inputq:
        print ("Executing INPUT query")
        exec ("%s=raw_input(%s)"%(inputq.group(1), inputq.group(2)))
    intq=re.match(intf, line)
    if intq:
        exec ("%s = %s"%(intq.group(1), intq.group(2)))
        exec ("print (%s)"%(intq.group(1)))
    else:
        print ("Invalid syntax")
code=open("code.txt", "r").readlines()
for line in code:
    check_line(line)
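This code will still raise an error, but that is unrelated to this issue. Think about whether you are assigning the variables correctly: for example, a variable created by exec inside check_line only lives in that call's local namespace, so it is gone by the time the next line is processed.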
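One way to make variables survive from one line to the next is to keep them in a dictionary instead of creating them with exec. The following is a hypothetical sketch, not the answerer's code: the run() helper, the dict-based storage, the \d+ and \d+\.\d+ number patterns and the error messages are all assumptions, and input handling is left out to keep it short.

```python
import re

# Hypothetical patterns; \w+ and \d+ are assumptions (the question used .* and \d)
INT_RE = re.compile(r'int (\w+) = (\d+)$')
FLOAT_RE = re.compile(r'float (\w+) = (\d+\.\d+)$')
OUTPUT_RE = re.compile(r'output (\w+)$')


def run(lines):
    """Interpret each line; return the printed values as a list of strings."""
    variables = {}  # shared across lines, unlike exec() inside a function
    output = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = INT_RE.match(line)
        if m:
            variables[m.group(1)] = int(m.group(2))
            continue
        m = FLOAT_RE.match(line)
        if m:
            variables[m.group(1)] = float(m.group(2))
            continue
        m = OUTPUT_RE.match(line)
        if m:
            name = m.group(1)
            if name in variables:
                output.append(str(variables[name]))
            else:
                output.append("Undefined variable: %s" % name)
            continue
        output.append("Invalid syntax: %s" % line)
    return output


program = """int x = 1
float y = 2.5
output x
output y
print z
"""
for out in run(program.splitlines()):
    print(out)
```

Returning a list instead of printing inside run() keeps the interpreter easy to test; the loop at the end prints 1, 2.5 and then the invalid-syntax message for the last line.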

You're using re.match(), which only tries to match at the beginning of the string. Here the string is the whole file, so only a command on the first line can ever match. If you iterate over each line in the file, .match() will work. Alternatively, you might want to look at re.search(), re.findall(), re.finditer() and similar functions (with the re.MULTILINE flag if you anchor the pattern with ^).
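A short sketch of the difference, using a hypothetical program whose first line is not an int statement (intf is the pattern from the question):

```python
import re

code = "output x\nint x = 1\nint y = 2\n"  # hypothetical program text
intf = r'int (.*) = (\d)'

# re.match only tries position 0, i.e. the start of the first line
print(re.match(intf, code))  # None, because line 1 is "output x"

# re.search scans forward but stops at the first match
print(re.search(intf, code).groups())  # ('x', '1')

# re.finditer with ^ and re.MULTILINE tries the start of every line
for m in re.finditer(r'^' + intf, code, re.MULTILINE):
    print(m.groups())  # ('x', '1'), then ('y', '2')
```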
It looks like your code needs to iterate over the lines in the file: How to iterate over the file in python

Related

regex code for getting only the first line from the text file using python

Can anyone provide regex code for printing only the first line of the data in a text file? I am using Spyder.
I have tried many solutions, but they print all my data on every line. The last one helped, but it matched two lines. I just want the first line of my text file, up to the first line break.
import re
def getname(s):
    nameregex=re.findall(r'^.*?[\.!\?](?:\s|$)',line)
    if len(nameregex)!=0:
        print(nameregex)
s = open('yesno.txt')
for line in s:
    getname(s)
In the output I am getting the first two lines.
Basically, I am trying to print only the company name, which is usually on the first line.
Read the file into a variable using read() and use re.search to get the match:
import re
def getname(s):
    nameregex=re.search(r'^.*?[.!?](?!\S)', s)  # Run search with regex
    if nameregex:                               # If there is a match
        print(nameregex.group())                # Get Group 0 - whole match - value
with open('yesno.txt', 'r') as f:               # Open file handle to read it
    contents = f.read()                         # Get all file contents
getname(contents)                               # Run the getname method with the contents
See the Python demo.
The regex is slightly modified so that the trailing whitespace is not part of the match. Details:
^ - start of the string
.*? - any 0 or more chars other than line break chars, as few as possible
[.!?] - a ., ! or ? character
(?!\S) - a negative lookahead: the next character must be whitespace, or this must be the end of the string.
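A quick check of the pattern on hypothetical file contents. The first_sentence() and first_line() helpers are illustrative names, not from the answer. first_line() shows that if you want the whole first line rather than the first sentence, r'.*' at the start of the string is enough, because . does not match a line break.

```python
import re

def first_sentence(s):
    # The answer's pattern: from the start of the string up to the first
    # ., ! or ? that is followed by whitespace or the end of the string
    m = re.search(r'^.*?[.!?](?!\S)', s)
    return m.group() if m else None

def first_line(s):
    # '.' never matches '\n', so '.*' at position 0 stops at the first line break
    return re.match(r'.*', s).group()

text = "Acme Widgets Ltd.\nWe sell widgets.\n"  # hypothetical file contents
print(first_sentence(text))    # Acme Widgets Ltd.
print(first_line(text))        # Acme Widgets Ltd.

no_dot = "Acme Widgets Ltd\nWe sell widgets.\n"
print(first_sentence(no_dot))  # None: the first line has no ., ! or ?
print(first_line(no_dot))      # Acme Widgets Ltd
```

Without re.MULTILINE, ^ only matches at the very start of the string, which is why first_sentence() never reaches the second line.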

Python Regex with Paramiko

I am using the Python Paramiko module to SFTP into one of my servers. I called listdir() to get all of the files in the folder. From those files, I'd like to use a regex to find the ones matching a pattern and then print out the entire filename.
listdir() returns a list of XML files with this format:
LOG_MMDDYYYY_HHMM.XML
LOG_07202018_2018 --> this is for the date 07/20/2018 at the time 20:18
I'd like to use regex to find all the XML files for a particular date and store them in a list or a variable. I can then pass this variable to Paramiko to get the files.
for log in file_list:
    regex_pattern = 'POSLog_' + date + '*'
    if (re.search(regex_pattern, log) != None):
        matchObject = re.findall(regex_pattern, log)
        print(matchObject)
The code above just prints:
['Log_07202018']
I want it to store the entire string, Log_07202018_20:18.XML, in a variable.
How would I go about doing this?
Thank you
If you are looking for a fixed string, don't use regex.
search_str = 'POSLog_' + date
for line in file_list:
    if search_str in line:
        print(line)
Alternatively, a list comprehension can build the list of matching lines in one go:
log_lines = [line for line in file_list if search_str in line]
for line in log_lines:
    print(line)
If you must use regex, there are a few things to change:
Any variable part that you put into the regex pattern must either be guaranteed to be a regex itself, or it must be escaped.
"The rest of the line" is not *, it's .*.
The start-of-line anchor ^ should be used to speed up the search - this way the regex fails faster when there is no match on a given line.
To support the ^ on multiple lines instead of only at the start of the entire string, the MULTILINE flag is needed.
There are several ways of getting all matches. One could do "for each line, if there is a match, print line", same as above. Here I'm using .finditer() and a search over the whole input block (i.e. not split into lines).
log_pattern = '^POSLog_' + re.escape(date) + '.*'
for match in re.finditer(log_pattern, whole_file, re.MULTILINE):
    print(match.group())  # the matched line; match.string would be the entire whole_file
You only print the matched part. Just do print(log) instead and it will print the whole filename.
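A sketch of both regex approaches, assuming a hypothetical listdir() result and date (the POSLog_ prefix follows the answer above; adjust it to your real filenames):

```python
import re

# Hypothetical listdir() result and date; real names come from the SFTP server
file_list = ["POSLog_07202018_2018.XML", "POSLog_07202018_0915.XML",
             "POSLog_07212018_1200.XML"]
date = "07202018"

# Filter name by name: match() anchors at the start of each filename
pattern = re.compile('POSLog_' + re.escape(date) + '.*')
matching = [name for name in file_list if pattern.match(name)]
print(matching)  # ['POSLog_07202018_2018.XML', 'POSLog_07202018_0915.XML']

# Searching one block of text instead: group() is the matched line,
# while match.string is the entire input that was passed to finditer()
whole_file = "\n".join(file_list)
for match in re.finditer('^POSLog_' + re.escape(date) + '.*', whole_file, re.MULTILINE):
    print(match.group())
```

The matching list can be handed to whatever code fetches the files, one name at a time.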

Why is regex matching on the output of file.read() different when I assign it to a variable?

I am trying to regex-match against a YAML file, and I have found a situation I don't understand with how read() works in Python.
I'm opening a YAML file (under 4 KB) like this:
with open(filename, 'r') as fptr:
I have three things to compare with respect to behavior:
fptr.read()
new_file = fptr.read()
the stdout from fptr.read(), copied into a variable "filedummy"
If I apply type() to all three, I get 'str'.
If I print each one to stdout, they're identical.
Now if I apply the regex:
print [match for match in re.findall(YAML_PATTERN, filedummy, flags=re.MULTILINE)], "<=== matches"
print [match for match in re.findall(YAML_PATTERN, new_file, flags=re.MULTILINE)], "<=== matches"
print [match for match in re.findall(YAML_PATTERN, fptr.read(), flags=re.MULTILINE)], "<=== NOTHING"
Note: this behavior still occurs if I do these actions in isolation, so it doesn't seem to be because I'm exhausting the fptr.read() output.
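One thing worth ruling out explicitly: a second read() on the same file object returns an empty string, because the file position is already at the end, and seek(0) rewinds it. A self-contained sketch, where the YAML content and YAML_PATTERN are hypothetical stand-ins (the question does not show its pattern):

```python
import os
import re
import tempfile

# Hypothetical YAML content and pattern; the question's YAML_PATTERN is not shown
YAML_PATTERN = r'^(\w+):'
yaml_text = "name: demo\nversion: 1\n"

# Write a small temporary file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".yml", delete=False) as tmp:
    tmp.write(yaml_text)
    filename = tmp.name

with open(filename, 'r') as fptr:
    new_file = fptr.read()  # first read(): the whole file
    again = fptr.read()     # second read(): position is at EOF, returns ''
    print(re.findall(YAML_PATTERN, new_file, flags=re.MULTILINE))  # ['name', 'version']
    print(re.findall(YAML_PATTERN, again, flags=re.MULTILINE))     # []
    fptr.seek(0)            # rewind to the start of the file
    print(re.findall(YAML_PATTERN, fptr.read(), flags=re.MULTILINE))  # ['name', 'version']

os.remove(filename)
```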

Finding multiple lines with a regex?

I am trying to complete the "Regex search" project from the book Automate the Boring Stuff with Python. I tried searching for an answer, but I couldn't find a related thread for Python.
The task is: "Write a program that opens all .txt files in a folder and searches for any line that matches a user-supplied regular expression. The results should be printed to the screen."
With the compiled regex below, I manage to find the first match:
regex = re.compile(r".*(%s).*" % search_str)
And I can print it out with:
print(regex.search(content).group())
But if I try to use:
print(regex.findall(content))
The output is only the word(s) I typed in, not the whole line they are on. Why doesn't findall return the whole line, even though that is what my regex matches?
My code is as follows.
# Regex search - Find user given text from a .txt file
# and prints the line it is on
import re
# user input
print("\nThis program searches for lines with your string in them\n")
search_str = input("Please write the string you are searching for: \n")
print("")
# file input
file = open("/users/viliheikkila/documents/kooditreeni/input_file.txt")
content = file.read()
file.close()
# create regex
regex = re.compile(r".*(%s).*" % search_str)
# print out the lines with match
if regex.search(content) is None:
    print("No matches were found.")
else:
    print(regex.findall(content))
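In Python regexes, parentheses define a capturing group (see here for a breakdown and explanation).
When the pattern contains a capturing group, findall returns only what the group captured, not the whole match. If you want the entire line, iterate over the result of finditer and take match.group(0) from each match, or make the group non-capturing with (?:...) so that findall returns whole matches.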
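A small illustration with hypothetical file contents and search string. Note that re.escape() is an addition not in the question's code; it keeps characters such as . or ( in the user's input from being treated as regex syntax.

```python
import re

content = "apple pie\nbanana split\ncherry apple tart\n"  # hypothetical file contents
search_str = "apple"                                      # hypothetical user input

regex = re.compile(r".*(%s).*" % re.escape(search_str))
# With one capturing group, findall returns only the group
print(regex.findall(content))  # ['apple', 'apple']

# finditer gives match objects; group(0) is the whole matched line
print([m.group(0) for m in regex.finditer(content)])  # ['apple pie', 'cherry apple tart']

# A non-capturing group makes findall return the whole matches
whole = re.compile(r".*(?:%s).*" % re.escape(search_str))
print(whole.findall(content))  # ['apple pie', 'cherry apple tart']
```

Because . does not match a newline, each match stays within a single line even though content is the whole file.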

Passing REGEX string into re.search

I am stuck when trying to substitute a variable into a re.search.
I use the following code to gather a stored regex from a file and save it to the variable "regex". In this example, the stored regex is used to find IP addresses with port numbers in a log message.
for line in workingconf:
    regexsearch = re.search(r'regex>>>(.+)', line)
    if regexsearch:
        regex = regexsearch.group(1)
        print regex
# I use re.search to go through "data" to find a match.
data = '[LOADBALANCER] /Common/10.10.10.10:10'
alertforsrch = re.search(r'%s' % regex, data)
if alertforsrch:
    print "MATCH"
    print alertforsrch.group(1)
else:
    print "no match"
When this program runs I get the following.
$ ./messageformater.py
/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})
no match
When I change re.search to the following, it works. However, the regex is obtained from the file and may not be the same every time, which is why I am trying to use a variable.
for line in workingconf:
    regexsearch = re.search(r'regex>>>(.+)', line)
    if regexsearch:
        regex = regexsearch.group(1)
        print regex
alertforsrch = re.search(r'/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})', data)
if alertforsrch:
    print "MATCH"
    print alertforsrch.group(1)
else:
    print "no match"
####### Results ########
$./messageformater.py
/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})
MATCH
10.10.10.10:10
Works fine for me...
Why even bother with the string formatter though? re.search(regex, data) should work fine.
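You may have a stray whitespace character at the end of the regex read in from the file, such as a \r from Windows (CRLF) line endings: . matches \r (though not \n), so (.+) captures it. Try re.search(regex.strip(), data).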
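A sketch reproducing that failure. It assumes the line read from the config file still ends with "\r\n" (for instance, a file with Windows line endings read without newline translation), which is an assumption about the asker's setup:

```python
import re

# Hypothetical config line that still carries a Windows "\r\n" line ending
line = r"regex>>>/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})" + "\r\n"
data = '[LOADBALANCER] /Common/10.10.10.10:10'

# '.' matches '\r' but not '\n', so the captured pattern ends with '\r'
regex = re.search(r'regex>>>(.+)', line).group(1)
print(repr(regex[-1]))         # '\r'
print(re.search(regex, data))  # None: the pattern also requires a '\r'

# strip() removes the stray whitespace and the same pattern now matches
m = re.search(regex.strip(), data)
print(m.group(1))              # 10.10.10.10:10
```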
