I am trying to complete the "Regex search" project from the book Automate the Boring Stuff with Python. I searched for an answer but could not find a related thread for Python.
The task is: "Write a program that opens all .txt files in a folder and searches for any line that matches a user-supplied regular expression. The results should be printed to the screen."
With the compiled pattern below I manage to find the first match:
regex = re.compile(r".*(%s).*" % search_str)
And I can print it out with
print(regex.search(content).group())
But if I try to use
print(regex.findall(content))
The output is only the inputted word/words, not the whole line they are on. Why won't findall match the whole line, even though that is how I compiled the regex?
My code is as follows.
# Regex search - Find user given text from a .txt file
# and prints the line it is on
import re
# user input
print("\nThis program searches for lines with your string in them\n")
search_str = input("Please write the string you are searching for: \n")
print("")
# file input
file = open("/users/viliheikkila/documents/kooditreeni/input_file.txt")
content = file.read()
file.close()
# create regex
regex = re.compile(r".*(%s).*" % search_str)
# print out the lines with match
if regex.search(content) is None:
    print("No matches were found.")
else:
    print(regex.findall(content))
In Python regex, parentheses define a capturing group. (See here for breakdown and explanation).
When the pattern contains a capturing group, findall returns only what the group captured, not the whole match. If you want the entire line, iterate over the result of finditer and call group() on each match.
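A minimal sketch of the finditer approach, using an in-memory sample text rather than the file from the question. The `content` and `search_str` values are made up for illustration, and re.escape is added on the assumption that the user types a literal string; drop it if the input is meant to be a regex.

```python
import re

# Hypothetical sample data standing in for the file contents and the user input.
content = "first line\nsecond line with spam in it\nthird line\nmore spam here\n"
search_str = "spam"

# Same pattern shape as in the question; re.escape treats the input as literal text.
regex = re.compile(r".*(%s).*" % re.escape(search_str))

# findall returns only the captured group when the pattern has exactly one group.
captured_only = regex.findall(content)

# finditer yields match objects; group(0) is the entire match, i.e. the whole line.
matching_lines = [match.group(0) for match in regex.finditer(content)]

for line in matching_lines:
    print(line)
```

Removing the parentheses, or writing the group as non-capturing with `(?:...)`, should also make findall return the whole lines.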
Can anyone please provide the regex code for printing only the first line of the data in a text file? I am using Spyder.
I have tried many solutions, but they print all my data on every line. The last one helped, but it picked two lines. I just want the first line of my text file, up to the first line break.
import re
def getname(s):
    nameregex=re.findall(r'^.*?[\.!\?](?:\s|$)',line)
    if len(nameregex)!=0:
        print(nameregex)
s = open('yesno.txt')
for line in s:
    getname(s)
In the output I am getting the first two lines.
Basically, I am trying to print only the company name, which is usually on the first line.
Read the file into a variable using read() and use re.search to get the match:
import re
def getname(s):
    nameregex=re.search(r'^.*?[.!?](?!\S)', s) # Run search with regex
    if nameregex: # If there is a match
        print(nameregex.group()) # Get Group 0 - whole match - value
s = open('yesno.txt', 'r') # Open file handle to read it
contents = s.read() # Get all file contents
getname(contents) # Run the getname method with the contents
See the Python demo.
The regex is a bit modified to avoid the whitespace at the end. See details:
^ - start of the string
.*? - any 0 or more chars other than line break chars, as few as possible
[.!?] - ., ! or ? char
(?!\S) - there must be a whitespace or end of string here.
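To see how these pieces behave together, here is a small sketch that runs the same pattern on in-memory strings instead of yesno.txt. The sample texts are invented for illustration.

```python
import re

# Pattern from the answer above.
first_sentence = re.compile(r'^.*?[.!?](?!\S)')

# Hypothetical file contents; the company name is on the first line.
sample = "Acme Widgets Ltd.\nFounded in 1999. Based in Oslo.\n"

match = first_sentence.search(sample)
result = match.group() if match else None
print(result)

# Without re.MULTILINE, ^ only matches at the very start of the string,
# and . does not cross a line break, so a first line without ., ! or ?
# produces no match at all.
no_punctuation = first_sentence.search("Acme Widgets Ltd\nFounded in 1999.\n")
print(no_punctuation)
```

Note that the pattern stops at the first ., ! or ? that is followed by whitespace, so a name such as "Acme Inc. Holdings" would be cut after "Inc.".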
I am using the Python Paramiko module to SFTP into one of my servers. I did a list_dir() to get all of the files in the folder. From that listing I'd like to use a regex to find the names that match a pattern and then print out the entire string.
list_dir returns a list of XML files with this format:
LOG_MMDDYYYY_HHMM.XML
LOG_07202018_2018 --> this is for the date 07/20/2018 at the time 20:18
I'd like to use regex to find all the XML files for a particular date and store them in a list or a variable. I can then pass this variable to Paramiko to get the files.
for log in file_list:
    regex_pattern = 'POSLog_' + date + '*'
    if (re.search(regex_pattern, log) != None):
        matchObject = re.findall(regex_pattern, log)
        print(matchObject)
The code above just prints:
['Log_07202018']
I want it to store the entire string Log_07202018_20:18.XML in a variable.
How would I go about doing this?
Thank you
If you are looking for a fixed string, don't use regex.
search_str = 'POSLog_' + date
for line in file_list:
    if search_str in line:
        print(line)
Alternatively, a list comprehension can build the list of matching lines in one go:
log_lines = [line for line in file_list if search_str in line]
for line in log_lines:
    print(line)
If you must use regex, there are a few things to change:
Any variable part that you put into the regex pattern must either be guaranteed to be a regex itself, or it must be escaped.
"The rest of the line" is not *, it's .*.
The start-of-line anchor ^ should be used to speed up the search - this way the regex fails faster when there is no match on a given line.
To support the ^ on multiple lines instead of only at the start of the entire string, the MULTILINE flag is needed.
There are several ways of getting all matches. One could do "for each line, if there is a match, print line", same as above. Here I'm using .finditer() and a search over the whole input block (i.e. not split into lines).
log_pattern = '^POSLog_' + re.escape(date) + '.*'
for match in re.finditer(log_pattern, whole_file, re.MULTILINE):
    print(match.group())  # the matched line; match.string would be the entire input
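Since the directory listing is a list of names rather than one block of text, the per-item variant ("for each line, if there is a match, print line") can be sketched as follows. The file names and the `date` value are made up for illustration and follow the naming used in the question's code.

```python
import re

# Hypothetical listing and date.
file_list = [
    "POSLog_07202018_2018.XML",
    "POSLog_07202018_2145.XML",
    "POSLog_07212018_0830.XML",
]
date = "07202018"

# re.escape guards against regex metacharacters in the variable part.
# re.match only matches at the start of each name, so neither ^ nor
# re.MULTILINE is needed when testing one name at a time.
log_pattern = re.compile("POSLog_" + re.escape(date))

# Keep the whole file name, not just the matched prefix.
matching_logs = [name for name in file_list if log_pattern.match(name)]
print(matching_logs)
```

The resulting list holds complete file names, so each entry can be handed to the download step directly.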
Because you only print the matched part, just do print(log) instead and it'll print the whole filename.
So I know the setpoints <start point> and <end point> in the text file and I need to use these to find certain information between them which will be used and printed. I currently have .readlines() within a different function which is used within the new function to find the information.
You can try something like this:
flag = False
info = []  # your desired information will be appended to this list as strings
with open(your_file, 'r') as file:
    for line in file:
        if '<end point>' in line:  # reached the end point: stop collecting
            flag = False
        if flag:  # this line is between the start point and the end point
            info.append(line.strip())  # strip() drops the trailing newline
        if '<start point>' in line:  # reached the start point: collect from the next line on
            flag = True
>>> info
['Number=12', 'Word=Hello']
This seems like a job for regular expressions. If you have not yet encountered regular expressions, they are an extremely powerful tool that can basically be used to search for a specific pattern in a text string.
For example, the regular expression (or regex for short) Number=\d+ would find any place in the text document that has Number= followed by one or more digits. The regex Word=\w+ would match any string starting with Word= and followed by one or more word characters (letters, digits or underscores).
In Python you can use regular expressions through the re module. For a great introduction to using regular expressions in Python check out this chapter from the book Automate the Boring Stuff with Python. To test out regular expressions this site is great.
In this particular instance you would do something like:
import re
your_file = "test.txt"
with open(your_file,'r') as file:
    file_contents = file.read()
number_regex = re.compile(r'Number=\d+')
number_matches = re.findall(number_regex, file_contents)
print(number_matches)
>>> ['Number=12']
This would return a list with all matches to the number regex. You could then do the same thing for the word match.
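As a rough sketch of both searches together, limited to the text between the two set points, something like the following could work. The sample `file_contents` string is invented, and the marker names are taken literally from the question.

```python
import re

# Hypothetical file contents containing the two set points from the question.
file_contents = "header\n<start point>\nNumber=12\nWord=Hello\n<end point>\nNumber=99\n"

# Restrict the search to the text between the markers.
# re.DOTALL lets . match newlines, and .*? stops at the first end marker.
block = re.search(r'<start point>(.*?)<end point>', file_contents, re.DOTALL)
between = block.group(1) if block else ""

number_matches = re.findall(r'Number=\d+', between)
word_matches = re.findall(r'Word=\w+', between)
print(number_matches, word_matches)
```

Here the Number=99 line is ignored because it comes after the end point.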
I am stuck when trying to substitute a variable into a re.search.
I use the following code to gather a stored regex from a file and save it to the variable "regex." In this example the stored regex is used to find ip addresses with port numbers from a log message.
for line in workingconf:
    regexsearch = re.search(r'regex>>>(.+)', line)
    if regexsearch:
        regex = regexsearch.group(1)
        print regex
#I use re.search to go through "data" to find a match.
data = '[LOADBALANCER] /Common/10.10.10.10:10'
alertforsrch = re.search(r'%s' % regex, data)
if alertforsrch:
    print "MATCH"
    print alertforsrch.group(1)
else:
    print "no match"
When this program runs I get the following.
$ ./messageformater.py
/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})
no match
When I change re.search to the following, it works. The regex will be obtained from the file and may not be the same every time. That is why I am trying to use a variable.
for line in workingconf:
    regexsearch = re.search(r'regex>>>(.+)', line)
    if regexsearch:
        regex = regexsearch.group(1)
        print regex
alertforsrch = re.search(r'/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})', data)
if alertforsrch:
    print "MATCH"
    print alertforsrch.group(1)
else:
    print "no match"
####### Results ########
$./messageformater.py
/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})
MATCH
10.10.10.10:10
Works fine for me...
Why even bother with the string formatter though? re.search(regex, data) should work fine.
You may have a newline character at the end of the regex read in from the file - try re.search(regex.strip(), data)
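One way a stray character can end up in the stored regex, sketched in Python 3 and assuming the config file has Windows (CRLF) line endings: . does not match "\n" but does match "\r", so the captured pattern ends with an invisible carriage return. The `config_line` value below is a made-up stand-in for a line read from the file.

```python
import re

# Hypothetical config line with a CRLF line ending left in place.
config_line = r'regex>>>/Common/([\d]{1,}\.[\d]{1,}\.[\d]{1,}\.[\d]{1,}:[\d]{1,})' + '\r\n'
data = '[LOADBALANCER] /Common/10.10.10.10:10'

# (.+) stops before '\n' but happily captures the '\r'.
regex = re.search(r'regex>>>(.+)', config_line).group(1)
print(repr(regex))  # repr makes the trailing '\r' visible

raw_result = re.search(regex, data)            # the '\r' would have to match too
clean_result = re.search(regex.strip(), data)  # surrounding whitespace removed first

print(raw_result)
print(clean_result.group(1))
```

Printing the pattern with repr, as above, is a quick way to check whether this is what is happening in the real file.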
Currently, I am using a regular expression to search for a pattern of numbers in a log file. I also want to add another search capability: a general user-submitted ASCII string search that prints out the line number. This is what I have and am trying to work around (help is appreciated):
logfile = open("13.00.log", "r")
searchString = raw_input("Enter search string: ")
for line in logfile:
    search_string = searchString.findall(line)
    for word in search_string:
        print word #ideally would like to create and write to a text file
First of all, strings don't have a findall method -- I don't know where you got that. Second, why use a string method or regex at all? For a simple string search of the kind you're describing, in is sufficient, as in if search_string in line:. To get line numbers, a quick solution is the enumerate built-in function: for line_number, line in enumerate(logfile):.
Your code seems fairly fragmented. Pseudocode would look something like
get_search_string
for line_no, line in enumerate(logfile):
    if search_string in line:
        do output with line_no
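A runnable Python 3 sketch of that outline, using a list of strings in place of the log file. The `find_lines` helper and the sample log lines are hypothetical names introduced for illustration.

```python
def find_lines(lines, search_string):
    """Return (line_number, line) pairs for lines containing search_string."""
    hits = []
    # enumerate yields (index, item); start=1 gives 1-based line numbers.
    for line_no, line in enumerate(lines, start=1):
        if search_string in line:
            hits.append((line_no, line.rstrip("\n")))
    return hits


# Hypothetical log lines; an open file object can be passed in the same way.
sample_log = ["boot ok\n", "error: disk full\n", "shutdown\n", "error: timeout\n"]
results = find_lines(sample_log, "error")
for line_no, line in results:
    print(line_no, line)
```

Returning the hits as a list keeps the search separate from the output step, so the same results could be written to a text file instead of printed.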