Find one file out of many containing a desired string in Python - python

I have a string like 'apples'. I want to find this string, and I know that it exists in one out of hundreds of files. e.g.
file1
file2
file3
file4
file5
file6
...
file200
All of these files are in the same directory. What is the best way to find which file contains this string using Python, knowing that exactly one file contains it?
I have come up with this:
for file in os.listdir(directory):
    f = open(os.path.join(directory, file))
    for line in f:
        if 'apple' in line:
            print "FOUND"
    f.close()
and this:
grep = subprocess.Popen(['grep', '-m1', 'apple', directory + '/file*'],
                        stdout=subprocess.PIPE)
found = grep.communicate()[0]
print found
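One caveat about the second attempt: `Popen` without `shell=True` does not expand the `file*` glob, so grep is handed the literal pattern and matches nothing. A sketch of one fix (`directory` is a placeholder), expanding the pattern first with the glob module and using `grep -l` to print the matching file names:

```python
import glob
import subprocess

directory = '.'  # placeholder; point this at the folder of files

# the shell normally expands 'file*'; without a shell, expand it ourselves
paths = glob.glob(directory + '/file*')
if paths:
    grep = subprocess.Popen(['grep', '-l', 'apple'] + paths,
                            stdout=subprocess.PIPE)
    found = grep.communicate()[0]  # names of files containing the string
    print(found)
```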

Given that the files are all in the same directory, we just get a current directory listing.
import os

for fname in os.listdir('.'):            # change directory as needed
    if os.path.isfile(fname):            # make sure it's a file, not a directory entry
        with open(fname) as f:           # open file
            for line in f:               # process line by line
                if 'apples' in line:     # search for string
                    print 'found string in file %s' % fname
                    break
This automatically gets the current directory listing and checks that any given entry is a file (not a directory).
It then opens the file and reads it line by line (to avoid memory problems, it doesn't read it all in at once), looking for the target string in each line.
When it finds the target string, it prints the name of the file.
Also, since the files are opened using with, they are automatically closed when we are done (or when an exception occurs).

For simplicity, this assumes your files are in the current directory:
import os

def whichFile(query):
    for root, dirs, files in os.walk('.'):
        for file in files:
            # join root so files in subdirectories open correctly
            with open(os.path.join(root, file)) as f:
                if query in f.read():
                    return file

for x in os.listdir(path):
    with open(os.path.join(path, x)) as f:
        if 'Apple' in f.read():
            # your work
            break

A lazy-evaluation, itertools-based approach (Python 2):
import os
from itertools import repeat, izip, chain

gen = (file for file in os.listdir("."))
gen = (file for file in gen if os.path.isfile(file) and os.access(file, os.R_OK))
gen = (izip(repeat(file), open(file)) for file in gen)
gen = chain.from_iterable(gen)
gen = (file for file, line in gen if "apple" in line)
gen = set(gen)
for file in gen:
    print file
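Under Python 3, where izip is gone and the built-in zip is already lazy, roughly the same pipeline might be wrapped in a function like this (the function name and the errors="ignore" guard against undecodable bytes are my additions):

```python
import os
from itertools import chain, repeat

def files_containing(word, directory="."):
    # lazy pipeline: each step is a generator, nothing is read eagerly
    gen = (os.path.join(directory, f) for f in os.listdir(directory))
    gen = (f for f in gen if os.path.isfile(f) and os.access(f, os.R_OK))
    gen = (zip(repeat(f), open(f, errors="ignore")) for f in gen)  # zip is lazy in Python 3
    gen = chain.from_iterable(gen)
    return set(f for f, line in gen if word in line)
```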

Open your terminal and write this:
Case-insensitive search:
grep -i 'apple' /path/to/files/*
Recursive search (through all subfolders):
grep -r 'apple' /path/to/files


opening and reading all the files in a directory in python - python beginner

I'd like to read the contents of every file in a folder/directory and then print them at the end. (I eventually want to pick out bits and pieces from the individual files and put them in a separate document.)
So far I have this code:
import os

path = 'results/'
fileList = os.listdir(path)
for i in fileList:
    file = open(os.path.join('results/' + i), 'r')
allLines = file.readlines()
print(allLines)
At the end I don't get any errors, but it only prints the contents of the last file in my folder as a series of strings, and I want to make sure it's reading every file so I can then access the data I want from each file. I've looked online and I can't find where I'm going wrong. Is there any way of making sure the loop is iterating over all my files and reading all of them?
I also get the same result when I use
file = open(os.path.join('results/', i), 'r')
in the 5th line.
Please help I'm so lost
Thanks!!
Separate the different functions of the thing you want to do.
Use generators wherever possible, especially if there are a lot of files or large files.
Imports:
from pathlib import Path
import sys
Deciding which files to process:
source_dir = Path('results/')
files = source_dir.iterdir()
[Optional] Filter files. For example, if you only need files with extension .ext:
files = source_dir.glob('*.ext')
Process files:
def process_files(files):
    for file in files:
        with file.open('r') as file_handle:
            for line in file_handle:
                # do your thing
                yield line
Save the lines you want to keep:
def save_lines(lines, output_file=sys.stdout):
    for line in lines:
        output_file.write(line)
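Wiring the pieces together, usage might look like this (the directory and .ext extension are the placeholders from the answer, and writing to sys.stdout by default is kept from it):

```python
from pathlib import Path
import sys

def process_files(files):
    # yield each line of each file, one at a time
    for file in files:
        with file.open('r') as file_handle:
            for line in file_handle:
                yield line

def save_lines(lines, output_file=sys.stdout):
    for line in lines:
        output_file.write(line)

source_dir = Path('results/')  # placeholder directory from the answer
if source_dir.is_dir():
    save_lines(process_files(source_dir.glob('*.ext')))
```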
You forgot indentation at this line: allLines = file.readlines().
Maybe you can try this:
import os

allLines = []
path = 'results/'
fileList = os.listdir(path)
for file in fileList:
    file = open(os.path.join(path, file), 'r')
    allLines.append(file.read())
print(allLines)
You forgot to indent this line: allLines.append(file.read()).
Because it was outside the loop, it only appended the file variable to the list after the for loop was finished, so it only appended the last value that the file variable held after the loop. Also, you should not use readlines() in this way; just use read() instead:
import os

allLines = []
path = 'results/'
fileList = os.listdir(path)
for file in fileList:
    file = open(os.path.join(path, file), 'r')
    allLines.append(file.read())
print(allLines)
This also creates a file containing all the files you wanted to print. rootdir is your folder, like 'C:\\Users\\you\\folder\\':
import os

f = open('final_file.txt', 'a')
for root, dirs, files in os.walk(rootdir):
    for filename in files:
        data = open(os.path.join(root, filename)).read()
        f.write(data + "\n")
f.close()
This is a similar case, with more features: Copying selected lines from files in different directories to another file

how can i use fileinput to edit multiple files?

I am using os.walk in python 2.7 to open multiple files, then, add all lines of interest of those files to a list. Later I'd want to edit those lines with fileinput and close it. How can I achieve this? Using the code below is how I'm opening the files:
import os
import fnmatch
import fileinput

lines = []
def openFiles():
    for root, dirs, files in os.walk('/home/test1/'):
        for lists in fnmatch.filter(files, "*.txt"):
            filepath = os.path.join(root, lists)
            print filepath
            with open(filepath, "r") as sources:  # opens 8 files and reads their lines
                #edit = fileinput.input(filepath, inplace=1)
                for line in sources:
                    if line.startswith('xe'):
                        lines.append(line)
Then later, for each line that starts with xe, I'd like to add a # in front of it and then close that file. I'd like to do that in a different function.
Here's the way I do it, building on your code:
import os
import fnmatch
import fileinput

def openFiles(dir):
    filePaths = []
    for root, dirs, files in os.walk(dir):
        for textFile in fnmatch.filter(files, "*.txt"):
            filepath = os.path.join(root, textFile)
            filePaths.append(filepath)
    return filePaths

def prefixLines(filepaths, chartoPrefix, prefixWith):
    res = ''
    for filepath in filepaths:
        # Read file
        with open(filepath, 'r') as f:
            for line in f:
                if line.startswith(chartoPrefix):
                    res += prefixWith + line
                else:
                    res += line
        # Write to file
        with open(filepath, 'w') as f:
            f.write(res)
        res = ''  # Reset res

prefixLines(openFiles(r'/home/test1/'), 'xe', '#')
prefixLines suffers from some shortcomings:
Because we read all the lines of each file and store them in res, we may run out of memory for large files.
If the programmer somehow forgot to indent res = '' in the right block, or if it was omitted entirely and the code ran on files the user actually needed, you'd end up writing the contents of the previously read file into the next file, and the last file would hold the contents of all the files read before it. That's why you should run this code in a testing environment, or use it cautiously.
This code only serves to demonstrate how you could achieve the desired effect: prefixing file lines that start with one string with another string. A slight improvement is therefore recommended. For example, instead of reading the entire contents of a file and storing them in res, you could simply record the numbers of the lines that need to be prefixed, eliminating the need to load all the data into memory; enumerate can help here (it returns an iterable in 2.7). By doing away with res we not only save memory but also eliminate the shortcoming in the second bullet.
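The improvement hinted at above, processing line by line instead of accumulating everything in res, might be sketched like this in Python 3 (the helper name and the temp-file strategy are my own choices, not from the answer):

```python
import os
import tempfile

def prefix_lines_streaming(filepaths, char_to_prefix, prefix_with):
    # rewrite each file through a temporary file, so only one line is
    # in memory at a time and no state can leak between files
    for filepath in filepaths:
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(filepath) or '.')
        with os.fdopen(fd, 'w') as tmp, open(filepath) as src:
            for line in src:
                tmp.write(prefix_with + line if line.startswith(char_to_prefix) else line)
        os.replace(tmp_path, filepath)  # swap the rewritten copy into place
```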
I ended up doing it this way. I'm using classes in my main code, so it's split into two functions instead of one. In my main code I use a list to hold all the file paths and use fileinput to open each filepath from the list, like this: for line in fileinput.FileInput(pathlist, inplace=1): do something. I thank @direprobs for her answer, as she shed some light on how I'm supposed to do this.
import fnmatch
import fileinput
import os
import sys

def openFiles():
    for dirpath, dirs, files in os.walk('/home/test1/'):
        for filename in fnmatch.filter(files, "*.txt"):
            filepaths = os.path.join(dirpath, filename)
            for line in fileinput.FileInput(filepaths, inplace=1):
                if line.startswith("xe"):
                    add = "# {}".format(line)
                    line = line.replace(line, add)
                sys.stdout.write(line)
            fileinput.close()

openFiles()

Looping through (and opening) specific types of files in folder?

I want to loop through files with a certain extension in a folder, in this case .txt, open each file, and print matches for a regex pattern. When I run my program, however, it only prints results for one file out of the two in the folder. My first file contains the text:
Anthony is too cool for school. I Reported the criminal. I am Cool.
and the program prints:
1: A, I, R, I, C
My second file contains the text:
Oh My initials are AK
And finally my code:
import re, os

Regex = re.compile(r'[A-Z]')
filepath = input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0
for file in files:
    if '.txt' not in file:
        del files[files.index(file)]
        continue
    count += 1
    fileobj = open(os.path.join(filepath, file), 'r')
    filetext = fileobj.read()
    Matches = Regex.findall(filetext)
    print(str(count) + ': ' + ', '.join(Matches), end=' ')
    fileobj.close()
Is there a way to loop through (and open) a list of files? Is it because I assign every File Object returned by open(os.path.join(filepath, file), 'r') to the same name fileobj?
You can do it as simply as this (it's just a loop through the files):
import re, os

Regex = re.compile(r'[A-Z]')
filepath = input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0
for file in files:
    if '.txt' in file:
        count += 1
        fileobj = open(os.path.join(filepath, file), 'r')
        filetext = fileobj.read()
        Matches = Regex.findall(filetext)
        print(str(count) + ': ' + ', '.join(Matches), end=' ')
        fileobj.close()
The del is causing the problem. The for loop has no idea whether you delete an element or not, so it always advances. There might be a hidden file in the directory that is the first element in files; after it gets deleted, the for loop skips one of the files and then reads only the second one. To verify, you can print files and file at the beginning of each iteration. In short, removing the del line should solve the problem.
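The fix can go a step further by filtering up front instead of deleting from the list being iterated; a minimal sketch, with a hypothetical helper name:

```python
import os

def txt_files(folder):
    # build the filtered list first; never mutate a list you are looping over
    return [f for f in sorted(os.listdir(folder)) if f.endswith('.txt')]

# usage sketch:
# for count, name in enumerate(txt_files('some/folder'), start=1):
#     print(count, name)
```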
If this is a standalone script, bash might be more clean:
count=0
for file in "$1"/*.txt; do
    echo -n "${count}: $(grep -o '[A-Z]' "$file" | tr "\n" ",") "
    ((count++))
done
The glob module will help you much more here, since you want to read files with a specific extension.
You directly get the list of files with extension "txt", i.e. you save one 'if' construct.
More info in the glob module documentation.
The code will be shorter and more readable.
import glob

for file_name in glob.glob(r'C:/Users/dinesh_pundkar/Desktop/*.txt'):
    with open(file_name, 'r') as f:
        text = f.read()
        """
        After this you can add code for Regex matching,
        which will match the pattern in the file text.
        """

Renaming files in folder from a text file

I want to know if it's possible to rename files in a folder based on a text file.
Let me explain:
I have a text file in which each line contains a name and a path (and a checksum).
I would like to rename EVERY photo file at its given path.
Extract from text file:
...
15554615_05_hd.jpg /photos/FRYW-1555-16752.jpg de9da252fa1e36dc0f96a6213c0c73a3
15554615_06_hd.jpg /photos/FRYW-1555-16753.jpg 04de10fa29b2e6210d4f8159b8c3c2a8
...
My /photos folder:
Example:
Rename the file FRYW-1555-16752.jpg to 15554615_05_hd.jpg
My script (just a beginning):
for line in open("myfile.txt"):
    print line.rstrip('\n')  # .rstrip('\n') removes the line breaks
Something like this ought to work. Replace the txt string with reading from a file, and for the file names use something like os.walk:
import os
import shutil

txt = """
15554615_05_hd.jpg /photos/FRYW-1555-16752.jpg de9da252fa1e36dc0f96a6213c0c73a3
15554615_06_hd.jpg /photos/FRYW-1555-16753.jpg 04de10fa29b2e6210d4f8159b8c3c2a8
"""

new_names = []
old_names = []
hashes = []
for line in txt.splitlines():
    if not line:
        continue
    new_name, old_name, hsh = line.split()
    new_names.append(new_name)
    old_names.append(old_name)
    hashes.append(hsh)

dump_folder = os.path.expanduser('~/Desktop/dump')  # or some other folder ...
if not os.path.exists(dump_folder):
    os.makedirs(dump_folder)

for old_name, new_name in zip(old_names, new_names):
    if os.path.exists(old_name):
        dst = os.path.join(dump_folder, new_name)  # copy under the new name
        shutil.copyfile(old_name, dst)
import os

with open('file.txt') as f:
    for line in f:
        newname, file, checksum = line.split()
        if os.path.exists(file):
            try:
                os.rename(file, os.sep.join([os.path.dirname(file), newname]))
            except OSError:
                print "Got a problem with file {}. Failed to rename it to {}.".format(file, newname)
The problem can be solved by:
Looping through all files using os.listdir(). listdir gets you every file name; for the current directory, use os.listdir(".").
Then using os.rename() to rename each file: os.rename(old_name, new_name)
Sample code, assuming you're dealing with *.jpg:
import os

added = "NEW"
for image in os.listdir("."):
    new_image = image[:len(image) - 4] + added + image[len(image) - 4:]
    os.rename(image, new_image)
Yes, it can be done.
You can divide your problem into sub-problems:
Open the txt file.
Use each line from the txt file to identify the image you want to rename and the new name you want to give it.
Open the image, copy its content, and write it to a new file with the new name; save the new file.
Delete the old file.
I am sure there will be a faster/better/more efficient way of doing this, but it all comes down to dividing and conquering your problem and its sub-problems.
It can be done in Python using a loop, files opened in read/write modes, and the "os" module to access the file system.

multiple search and replace in python

I need to search a parent folder for all files named config.xml
and, in those files, replace one string with another (from this-is to where-as).
import os

parent_folder_path = 'somepath/parent_folder'
for eachFile in os.listdir(parent_folder_path):
    if eachFile.endswith('.xml'):
        newfilePath = parent_folder_path + '/' + eachFile
        file = open(newfilePath, 'r')
        xml = file.read()
        file.close()
        xml = xml.replace('thing to replace', 'with content')
        file = open(newfilePath, 'w')
        file.write(str(xml))
        file.close()
Hope this is what you are looking for.
You want to take a look at os.walk() for recursively traveling through a folder and subfolders.
Then, you can read each line (for line in myfile: ...), do a replacement (line = line.replace(old, new)), save the line to a temporary file (tmp.write(line)), and finally copy the temp file over the original.
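That description might be sketched as follows; the function name and arguments are illustrative, not from the answer:

```python
import os
import tempfile

def replace_in_tree(parent, target_name, old, new):
    # walk the tree; rewrite each matching file line by line via a temp file
    for root, dirs, files in os.walk(parent):
        for name in files:
            if name != target_name:
                continue
            path = os.path.join(root, name)
            fd, tmp_path = tempfile.mkstemp(dir=root)
            with os.fdopen(fd, 'w') as tmp, open(path) as src:
                for line in src:
                    tmp.write(line.replace(old, new))
            os.replace(tmp_path, path)  # copy the temp file over the original
```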
