Python Searching for String and printing the file it is in - python

I am working on a small program for work, and I have looked everywhere for help on this!
What I'm trying to do is to allow the user to put in string to search. The program will search multiple .txt files in a defined directory for the string and then either print the result, or open the .txt file using the default text editor.
Can someone please point me in the right direction for this search feature?
Thanks in advance!
Edit:
This is what I have so far. I can't use grep, as this program will be running on Windows as well as OS X. I have yet to test on Windows, but on OS X my result is "access denied".
import os
import subprocess

text = str(raw_input("Enter the text you want to search for: "))
thedir = './f'
for file in os.listdir(thedir):
    document = os.path.join(thedir, file)
    for line in open(document):
        if text in line:
            subprocess.call(document, shell=True)

There are much better tools to do this (grep was mentioned, and it is probably the best way to go).
Now, if you want a Python solution (which would run very slowly), you can start from here:
import os

def find(word):
    def _find(path):
        with open(path, "rb") as fp:
            for n, line in enumerate(fp):
                if word in line:
                    yield n + 1, line
    return _find

def search(word, start):
    finder = find(word)
    for root, dirs, files in os.walk(start):
        for f in files:
            path = os.path.join(root, f)
            for line_number, line in finder(path):
                yield path, line_number, line.strip()

if __name__ == "__main__":
    import sys
    if len(sys.argv) != 3:
        print("usage: word directory")
        sys.exit(1)
    word = sys.argv[1]
    start = sys.argv[2]
    for path, line_number, line in search(word, start):
        print("{0} matches in line {1}: '{2}'".format(path, line_number, line))
Please take this with a grain of salt: it will not use regular expressions, or be smart at all. For example, if you try to search for "hola" it will match "nicholas", but not "Hola" (in the latter case, you could lowercase each line with line.lower() before comparing).
Again, it is just a beginning to show you a possible way to start. However, please PLEASE use grep.
Cheers.
Sample run (I called this script "pygrep.py"; $ is the command prompt):
$ python pygrep.py finder .
./pygrep.py matches in line 12: 'finder = find(word)'
./pygrep.py matches in line 16: 'for line_number, line in finder(path):'
./pygrep.py~ matches in line 11: 'finder = find(word)'
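As hinted above, a case-insensitive variant only takes lowercasing both the needle and each line. This is a minimal sketch (the function name find_case_insensitive is made up; it opens files in text mode instead of "rb" to keep str vs. bytes comparisons simple):

```python
def find_case_insensitive(word):
    needle = word.lower()
    def _find(path):
        # errors="ignore" skips undecodable bytes instead of raising
        with open(path, errors="ignore") as fp:
            for n, line in enumerate(fp, start=1):
                if needle in line.lower():
                    yield n, line
    return _find
```

Drop this in for find() above and searching for "hola" will also match "Hola".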

Below are some hints toward your answer :)
You can use os.walk to traverse all the files in the specified directory structure, search the string in the file, use subprocess module to open the file in the required editor...

import os
import subprocess

text = str(raw_input("Enter the text you want to search for: "))
thedir = 'C:\\your\\path\\here\\'
for file in os.listdir(thedir):
    filepath = thedir + file
    for line in open(filepath):
        if text in line:
            subprocess.call(filepath, shell=True)
            break
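One likely cause of the "access denied" result on OS X: subprocess.call(path, shell=True) tries to execute the text file itself rather than open it in an editor. A cross-platform sketch instead hands the path to each OS's "open with default app" command (opener_command and open_in_default_app are hypothetical helper names, not standard functions):

```python
import platform
import subprocess

def opener_command(path):
    # Hypothetical helper: build the command that asks the OS to open
    # *path* with its default application.
    system = platform.system()
    if system == "Darwin":                       # macOS
        return ["open", path]
    if system == "Windows":
        return ["cmd", "/c", "start", "", path]  # os.startfile(path) also works here
    return ["xdg-open", path]                    # most Linux desktops

def open_in_default_app(path):
    subprocess.call(opener_command(path))
```

Then the loop body becomes open_in_default_app(filepath) instead of subprocess.call(filepath, shell=True).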

Related

How to find files with given extension in particular level of sub directory with specified pattern using python?

In bash it can be easily done using grep like this:
grep "$pattern" $directory/*/*/*/level4/*.txt* > out/$pattern.txt
where $pattern is the pattern, $directory is the base directory and we are looking for .txt files only at the 4th level subdirectories with the name level4. And possibly redirecting the output to file with the pattern name itself. This works perfectly in bash. Is there an easy equivalent in Python?
I tried iterating over all the subdirectories in $directory using for subdir, dirs, files in os.walk, endswith & find but that would look at all files instead of the 4th level with specified name.
Like this:
import glob
import re

pattern = 'directory_name/*/*/*/level4/*.txt*'
regex = re.compile(r'.*test.*')
for filename in glob.glob(pattern):
    with open(filename) as file_desc:
        for line_num, line in enumerate(file_desc):
            if not regex.match(line):
                continue
            output = '{}:{} {}'.format(filename, line_num, line)
            print(output, end='')
This should be equivalent to your bash command (but I didn't test it). I wanted to give an answer that shouldn't require tweaking and uses pathlib (which is amazing).
from pathlib import Path
import re

directory = Path('mydir')
out_directory = Path('out')
pattern = 'my (neat|cool|sick) pattern'
out_file = out_directory / f'{pattern}.txt'
with out_file.open('w') as out:
    # this type of pattern is a `glob`
    for found_file in directory.glob('*/*/*/level4/*.txt*'):
        with found_file.open('r') as f:
            for line in f:
                if re.search(pattern, line):
                    print(line, file=out)

Using terminal command to search through files for a specific string within a python script

I have a parent directory, and I'd like to go through that directory and grab each file with a specific string for editing in python. I have been using grep -r 'string' filepath in terminal, but I want to be able to do everything using python. I'm hoping to get all the files into an array and go through each of them to edit them.
Is there a way to do this by only running a python script?
changing current folder to parent
import os
os.chdir("..")
changing folder
import os
os.chdir(dir_of_your_choice)
finding files with a rule in the current folder
import glob
import os

current_dir = os.getcwd()
for f in glob.glob('*string*'):
    do_things(f)
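Note that the glob above matches file names containing the string, not file contents. To mimic grep -r you also have to read each file; a minimal sketch (files_containing is a made-up helper name, and unreadable entries such as directories are just skipped):

```python
import glob

def files_containing(text, pattern="*"):
    # Return the names of files matching *pattern* (in the current
    # directory) whose contents contain *text*.
    matches = []
    for name in glob.glob(pattern):
        try:
            with open(name, errors="ignore") as f:
                if text in f.read():
                    matches.append(name)
        except OSError:
            # directories and unreadable files are skipped
            continue
    return matches
```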
import os

# sourceFolder is the folder you're going to be looking inside.
# Backslashes are a special character in Python, so they're escaped as double backslashes.
sourceFolder = "C:\\FolderBeingSearched\\"
myFiles = []

# Find all files in the directory
for file in os.listdir(sourceFolder):
    myFiles.append(file)

# open them for editing
for file in myFiles:
    try:
        open(sourceFolder + file, 'r')
    except:
        continue
    # run whatever code you need to do on each open file here
    print("opened %s" % file)
EDIT: If you want to separate all files that contain a string (this just prints the list at the end currently):
import os

# sourceFolder is the folder you're going to be looking inside.
# Backslashes are a special character in Python, so they're escaped as double backslashes.
sourceFolder = "C:\\FolderBeingSearched\\"
myFiles = []
filesContainString = []
stringsImLookingFor = ['first', 'second']

# Find all files in the directory
for file in os.listdir(sourceFolder):
    myFiles.append(file)

# open them for editing
for file in myFiles:
    looking = sourceFolder + file
    try:
        open(looking, 'r')
    except:
        continue
    print("opened %s" % file)
    found = 0
    with open(looking, encoding="latin1") as in_file:
        for line in in_file:
            for x in stringsImLookingFor:
                if line.find(x) != -1:
                    # do whatever you need to do to the file or add it to a list like I am
                    filesContainString.append(file)
                    found = 1
                    break
            if found:
                break
print(filesContainString)

Looping through (and opening) specific types of files in folder?

I want to loop through files with a certain extension in a folder, in this case .txt, open the file, and print matches for a regex pattern. When I run my program however, it only prints results for one file out of the two in the folder:
Anthony is too cool for school. I Reported the criminal. I am Cool.
1: A, I, R, I, C
My second file contains the text:
Oh My initials are AK
And finally my code:
import re, os

Regex = re.compile(r'[A-Z]')
filepath = input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0
for file in files:
    if '.txt' not in file:
        del files[files.index(file)]
        continue
    count += 1
    fileobj = open(os.path.join(filepath, file), 'r')
    filetext = fileobj.read()
    Matches = Regex.findall(filetext)
    print(str(count) + ': ' + ', '.join(Matches), end=' ')
    fileobj.close()
Is there a way to loop through (and open) a list of files? Is it because I assign every File Object returned by open(os.path.join(filepath, file), 'r') to the same name fileobj?
You can do it as simply as this (it's just a loop through the files):
import re, os

Regex = re.compile(r'[A-Z]')
filepath = input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0
for file in files:
    if '.txt' in file:
        count += 1
        fileobj = open(os.path.join(filepath, file), 'r')
        filetext = fileobj.read()
        Matches = Regex.findall(filetext)
        print(str(count) + ': ' + ', '.join(Matches), end=' ')
        fileobj.close()
The del is causing the problem. The for loop has no idea whether you deleted an element, so it always advances. There might be a hidden file in the directory that is the first element of files. After it gets deleted, the for loop skips one of the files and then reads the second one. To verify, you can print out files and file at the beginning of each iteration. In short, removing the del line should solve the problem.
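A minimal demonstration of the skip described above (the hidden-file name is illustrative):

```python
# Removing an element while iterating shifts the remaining items left,
# so the loop's internal index skips the item that moved into the gap.
files = [".DS_Store", "a.txt", "b.txt"]   # hidden file first, as in the question
seen = []
for f in files:
    seen.append(f)
    if not f.endswith(".txt"):
        files.remove(f)    # mutates the list mid-iteration
# "a.txt" is never visited: it slid into index 0 after the removal,
# while the loop had already moved on to index 1.
```

Iterating over a copy (for f in files[:]) or building a new filtered list avoids the problem.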
If this is a standalone script, bash might be more clean:
count=0
for file in "$1"/*.txt; do
    echo -n "${count}: $(grep -o '[A-Z]' "$file" | tr "\n" ",") "
    ((count++))
done
The glob module will help you much more here, since you want to read files with a specific extension. You can directly get the list of files with the extension .txt, i.e. you save one 'if' construct. See the glob module documentation for more info. The code will be shorter and more readable.
import glob

for file_name in glob.glob(r'C:/Users/dinesh_pundkar/Desktop/*.txt'):
    with open(file_name, 'r') as f:
        text = f.read()
        """
        After this you can add code for Regex matching,
        which will match pattern in file text.
        """

Trying to access a file located on a flashdrive

I have made a simple test script in Python that reads from a text file, and then performs an action if the text file contains the line "on".
My code works fine if I run the script on my hard drive with the text file in the same folder. Example: C:\Python27\my_file.txt and C:\Python27\my_scipt.py.
However, if I try this code while my text file is located on my flash drive and my script is still on my hard drive, it won't work even though I have the correct path specified. Example: G:\flashdrive_folder\flashdrive_file.txt and C:\Python27\my_scipt.py.
Here is the code I have written out.
def locatedrive():
    file = open("G:\flashdrive_folder\flashdrive_file.txt", "r")
    flashdrive_file = file.read()
    file.close()
    if flashdrive_file == "on":
        print "working"
    else:
        print "fail"

while True:
    print "trying"
    try:
        locatedrive()
        break
    except:
        pass
    break
The backslash character does double duty. Windows uses it as a path separator, and Python uses it to introduce escape sequences.
You need to escape the backslash (using a backslash!), or use one of the other techniques below:
file = open("G:\\flashdrive_folder\\flashdrive_file.txt", "r")
or
file = open(r"G:\flashdrive_folder\flashdrive_file.txt", "r")
or
file = open("G:/flashdrive_folder/flashdrive_file.txt", "r")
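Two other ways to sidestep the escaping problem entirely: let os.path.join insert the separators for you, or use pathlib if you're on Python 3 (a sketch; the drive path is the one from the question):

```python
import os
from pathlib import Path

# os.path.join inserts the platform's separator, so no escaping is needed
joined = os.path.join("G:\\", "flashdrive_folder", "flashdrive_file.txt")

# pathlib (Python 3.4+) understands forward slashes on every platform
p = Path("G:/flashdrive_folder/flashdrive_file.txt")
```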
cd /media/usb0
import os
path = "/media/usb0"
#!/usr/bin/python
import os
path = "/usr/tmp"
# Check current working directory.
retval = os.getcwd()
print "Current working directory %s" % retval
# Now change the directory
os.chdir( path )
# Check current working directory.
retval = os.getcwd()
print "Directory changed successfully %s" % retval
Use:
import os
os.chdir(path_to_flashdrive)

How to modify all text files in folder with Python Script

Brand new to Python, first time poster, be easy on me please!
I would like to insert a line of text into all files of a specific extension (in the example, .mod) within the current folder. It can point to a specific folder if that is easier.
Below is something that I copied and modified, it is doing exactly what I need for one specific file, the part about replacing sit with SIT is completely unnecessary, but if I remove it the program doesn't work. I have no idea why that is, but I can live with that.
import sys, fileinput

for i, line in enumerate(fileinput.input('filename.mod', inplace=1)):
    sys.stdout.write(line.replace('sit', 'SIT'))
    if i == 30:
        sys.stdout.write('TextToInsertIntoLine32' '\n')  # adds new line and text to line 32
My question is, how do I run this for all files in a directory? I have tried replacing the filename with sys.argv[1] and calling the script from the command line with '*.mod' which did not work for me. Any help would be appreciated.
You can do it like this:
import os
folder = '...' # your directory
files = [f for f in os.listdir(folder) if f.endswith('.mod')]
This gives you a list of files with the extension '.mod'; then you can run your function on all of them.
You could use glob.glob to list all the files in the current working directory whose filename ends with .mod:
import fileinput
import glob
import sys

for line in fileinput.input(glob.glob('*.mod'), inplace=True):
    sys.stdout.write(line.replace('sit', 'SIT'))
    if fileinput.filelineno() == 32:
        # adds new line and text to line 32
        sys.stdout.write('TextToInsertIntoLine32' '\n')
You can use the os.walk function:
import os

for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".mod"):
            filepath = os.path.join(root, file)
            with open(filepath, 'r', encoding='utf-8') as f:
                text = f.read()
            print('%s read' % filepath)
            with open(filepath, 'w', encoding='utf-8') as f:
                f.write(text.replace('sit', 'SIT'))
            print('%s updated' % filepath)
Easy: you don't have to specify a filename. fileinput will take the filenames from sys.argv by default. You don't even have to use enumerate, as fileinput numbers the lines in each file:
import sys, fileinput

for line in fileinput.input(inplace=True):
    sys.stdout.write(line.replace('sit', 'SIT'))
    if fileinput.filelineno() == 30:
        sys.stdout.write('TextToInsertIntoLine32' '\n')
