Unable to find string in text file - python

I am trying to simply find out whether a string exists in a text file, but I am having issues. I assume the problem is on one particular line, but I'm stumped.
def extract(mPath, frequency):
    if not os.path.exists('history.db'):
        f = open("history.db", "w+")
        f.close()
    for cFile in fileList:
        with open('history.db', "a+") as f:
            if cFile in f.read():
                print("File found - skip")
            else:
                #with ZipFile(cFile, 'r') as zip_ref:
                    #zip_ref.extractall(mPath)
                print("File Not Found")
                f.writelines(cFile + "\n")
                print(cFile)
Output:
File Not Found
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\Test1.zip
File Not Found
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\test2.zip
Text within the history.db file:
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\Test1.zip
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\test2.zip
What am I missing? Thanks in advance
Note: cFile is the file path shown in the output and fileList is the list of both the paths from the output.

You're using the wrong mode for what you want to do. open(file, 'a') opens a file for appending, which means the file position starts at the end of the file. Adding the + modifier lets you read from the file as well, but you're reading from the end of the file, so read() returns an empty string because there's nothing beyond the end.
You can use r+ to read from the start of the file while keeping the option of writing to it. But keep in mind that any time you write, you'll be writing at the current position in the file.
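Putting that explanation into code, here is a minimal sketch, assuming the history file holds one path per line. It opens the file once with 'a+' (which also creates it when missing), calls seek(0) before reading, and compares whole lines instead of doing a substring search, so a path is not treated as seen merely because it appears inside another entry. The helper name record_if_new is hypothetical, and the ZipFile extraction from the question is left out.

```python
def record_if_new(history_path: str, entry: str) -> bool:
    """Hypothetical helper: append `entry` to the history file if it is not there yet.

    Returns True when the entry was new (and has now been recorded),
    False when it was already present. Assumes one entry per line.
    """
    # 'a+' creates the file if needed; the position starts at the end, so rewind
    with open(history_path, "a+") as f:
        f.seek(0)
        seen = set(f.read().splitlines())  # compare whole lines, not substrings
        if entry in seen:
            return False
        # In append mode the write goes to the end of the file
        f.write(entry + "\n")
        return True
```

In the question's loop this could be used as if record_if_new('history.db', cFile): followed by the extraction code, which also makes the separate os.path.exists check unnecessary.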

I haven't tested the code but this should put you on the right track!
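import os

def extract(mPath, frequency):
    if not os.path.exists('history.db'):
        f = open("history.db", "w+")
        f.close()
    # Text mode ("r"): in binary mode ("rb") each line would be bytes,
    # which never compares equal to the str paths in fileList
    with open('history.db', "r") as f:
        data = f.readlines()
    for line in data:
        if line.rstrip() in fileList:  # assuming fileList is a list of strings
            # do everything else here
            pass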

Related

read data from multiple files but would like to write that data into a new text file but file shows up blank

The code reads from multiple text files. So far I have it displaying the results in the terminal, but I would like the information written to a text file. However, the text file shows up blank and I don't know why. I'm new to Python, so I still haven't figured out all the commands.
directory = 'C:\Assignments\\CPLfiles\*'
test = False
start_text = '^GMWE'
for filename in glob.glob(directory):
    with open(filename) as f:
        with open('file.txt', 'w') as f1:
            for line in f:
                #for x in line:
                if test is False:
                    if re.search(start_text, line.strip()) is not None:
                        x = line.strip()
                        f1.write(x+ '\n')
                        print(x)
                        break
            test = False
I think you should change the order of opening files to the following.
The problem is that for every file you open for reading, you also reopen the output file for writing, which wipes its contents.
Also, because of the break after the write statement, you will write at most one line per file.
If the last file you opened has no line matching the regular expression, the final file will be empty.
Hope it makes sense
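import glob
import re

directory = r'C:\Assignments\CPLfiles\*'  # raw string: same path, no invalid escapes
test = False
start_text = '^GMWE'
# Open the output file once, before the loop, so it is truncated only once
with open('file.txt', 'w') as f1:
    for filename in glob.glob(directory):
        with open(filename) as f:
            for line in f:
                if test is False:
                    if re.search(start_text, line.strip()) is not None:
                        x = line.strip()
                        f1.write(x + '\n')
                        print(x)
                        break  # stop after the first match in this file
            test = False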
I think the main problem here is that you reopen file.txt for each file in your glob. Each time you open it in write mode, the file is erased. If no line matches in the last file, you will end up with an empty file. So your loop should be inside the with statement that opens this file.
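To see why reopening in 'w' mode empties the output, here is a small sketch using a temporary directory (the file name out.txt and the sample chunks are made up). Each open(..., 'w') truncates the file, so only what was written after the last open survives; the empty last chunk stands in for a last input file with no matching line.

```python
import os
import tempfile

def write_per_open(path, chunks):
    # Reopens in 'w' mode for every chunk, mirroring the question's loop
    for chunk in chunks:
        with open(path, "w") as out:
            out.write(chunk)

def write_once(path, chunks):
    # Opens once, outside the loop, as in the answers
    with open(path, "w") as out:
        for chunk in chunks:
            out.write(chunk)

def read_text(path):
    with open(path) as f:
        return f.read()

tmp_dir = tempfile.mkdtemp()
target = os.path.join(tmp_dir, "out.txt")
chunks = ["GMWE 1\n", "GMWE 2\n", ""]  # last "file" had no match

write_per_open(target, chunks)
per_open_result = read_text(target)  # only the last (empty) chunk survives

write_once(target, chunks)
once_result = read_text(target)      # every chunk is kept
```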

open or create file in python and append to it

How do you do this series of actions in Python?
1) Create a file if it does not exist and insert a string
2) If the file exists, search if it contains a string
3) If the string does not exist, append it to the end of the file
I'm currently doing it this way but I'm missing a step
EDIT
With this code, every time I call the function it seems that the file does not exist, and it overwrites the older file
def func():
    if not os.path.exists(path):
        #always take this branch
        with open(path, "w") as myfile:
            myfile.write(string)
            myfile.flush()
            myfile.close()
    else:
        with open(path) as f:
            if string in f.read():
                print("string found")
            else:
                with open(path, "a") as f1:
                    f1.write(string)
                    f1.flush()
                    f1.close()
            f.close()
Try this:
with open(path, 'a+') as file:
    file.seek(0)
    content = file.read()
    if string not in content:
        file.write(string)
seek(0) moves the file position to the start so that read() returns the whole file; in append mode, the following write still goes to the end of the file.
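A small sketch to check that behaviour with a throwaway file in a temporary directory. It shows that 'a+' creates the file, that reading starts at the end until you seek(0), and that a write after seek(0) still lands at the end. That last point relies on the operating system's append flag; the Python documentation for open() hedges that it holds "on some Unix systems", so treat it as platform behaviour rather than a language guarantee.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example")

# 'a+' creates the file if it does not exist
with open(path, "a+") as f:
    f.write("abc")

with open(path, "a+") as f:
    before_seek = f.read()  # position starts at the end, so nothing is read
    f.seek(0)
    after_seek = f.read()   # now the whole file is read
    f.seek(0)
    f.write("X")            # append mode: written at the end, not at offset 0

with open(path) as f:
    final = f.read()
```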
Edit:
Also, you don't need to check whether the path exists, because opening it with 'a+' creates the file if it is missing.
Example:
>>> f = open('example', 'a+')
>>> f.write('a')
1
>>> f.seek(0)
0
>>> f.read()
'a'
The file example didn't exist, but calling open() with 'a+' created it.
You don't need to reopen the file if you haven't closed it since opening it. Use "a" when opening the file in order to append to it, i.e. else: with open(path, "a") as f: f.write(string). Try that.

How to make a program that replaces newlines in python file with a string [duplicate]

This question already has answers here:
Why doesn't calling a string method (such as .replace or .strip) modify (mutate) the string?
(3 answers)
Closed 3 years ago.
I am trying to display my Python file in HTML, so I would like to replace every newline in the file with <br>, but the program I've written is not working.
I've looked on here and tried changing the code around a bit; I have gotten different results, but not the ones I need.
with open(path, "r+") as file:
    contents = file.read()
    contents.replace("\n", "<br>")
    print(contents)
    file.close()
I want the file to show <br> every time there is a newline, but instead the code doesn't change anything in the file.
Here is an example program that works:
path = "example"
contents = ""
with open(path, "r") as file:
    contents = file.read()
new_contents = contents.replace("\n", "<br>")
with open(path, "w") as file:
    file.write(new_contents)
Your program doesn't work because the replace method does not modify the original string; it returns a new string.
Also, you need to write the new string to the file; python won't do it automatically.
Hope this helps :)
P.S. a with statement automatically closes the file stream.
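The two fixes above (keep the string returned by replace, then write it back) can be sketched as a small function. The name newlines_to_br and the sample file contents are made up for illustration.

```python
import os
import tempfile

def newlines_to_br(path):
    """Hypothetical helper: replace each newline in the file with '<br>'."""
    with open(path) as src:
        contents = src.read()
    # str.replace does not modify `contents`; it returns a new string
    new_contents = contents.replace("\n", "<br>")
    with open(path, "w") as dst:  # reopen in 'w' mode to overwrite the file
        dst.write(new_contents)
    return new_contents

text = "print('a')\nprint('b')\n"
unchanged = text
text.replace("\n", "<br>")  # result discarded, so `text` is still the same

path = os.path.join(tempfile.mkdtemp(), "example.py")
with open(path, "w") as f:
    f.write(text)
result = newlines_to_br(path)
```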
Your code reads from the file, saves the contents to a variable and replaces the newlines. But the result is not saved anywhere. And to write the result into a file you must open the file for writing.
with open(path, "r") as file:
    contents = file.read()
contents = contents.replace("\n", "<br>")
with open(path, "w") as file:
    file.write(contents)
There are some issues in this code snippet.
contents.replace("\n", "<br>") returns a new object in which \n has been replaced with <br>, so you can use html_contents = contents.replace("\n", "<br>") and print(html_contents).
When you use with, the file is closed automatically when you leave the indented block, so the explicit file.close() is unnecessary.
Try this:
import re
with open(path, "r") as f:
    contents = f.read()
contents = re.sub("\n", "<br>", contents)
print(contents)
Borrowed from this post:
import tempfile

def modify_file(filename):
    # Create temporary file read/write
    t = tempfile.NamedTemporaryFile(mode="r+")
    # Open input file read-only
    i = open(filename, 'r')
    # Copy input file to temporary file, modifying as we go
    for line in i:
        t.write(line.rstrip() + "\n")
    i.close()  # Close input file
    t.seek(0)  # Rewind temporary file to beginning
    o = open(filename, "w")  # Reopen input file writable
    # Overwrite original file with temporary file contents
    for line in t:
        o.write(line)
    o.close()  # Close output file so its buffer is flushed to disk
    t.close()  # Close temporary file, will cause it to be deleted

Replace newlines with a space in all files in a directory - Python

I have about 4000 txt files in a directory. I'd like to replace newlines with spaces in each file using a for loop. The script does replace them, but when I save the file, it either doesn't get saved or it gets saved with the newlines again. Here is my script:
import glob
path = "path_to_files/*.txt"
for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.write(data)
As I said I'm able to replace the newlines with a space, but at the end, it doesn't get saved. I also don't get any errors.
To elaborate on my comment ("It's almost always a bad idea to open a file in 'r+' mode (because of the way the current position is handled). Open the file for reading, read the data, replace the newlines, open the same file for writing, and write the data"):
for file in glob.glob(path):
    with open(file) as f:
        data = f.read().replace('\n', ' ')
    with open(file, "w") as f:
        f.write(data)
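After read(), the file position is at the end of the file, so write() appends the modified text after the original contents. You need to reset the file position to 0 with seek(), and then remove any leftover data with truncate() once you finish writing the replacement string.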
import glob
path = "path_to_files/*.txt"
for file in glob.glob(path):
    with open(file, "r+") as f:
        data = f.read().replace('\n', ' ')
        f.seek(0)
        f.write(data)
        f.truncate()
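The seek/write/truncate pattern can be wrapped in a small function and checked on a temporary file. The name rewrite_in_place and the sample text are hypothetical. The test deliberately deletes the newlines so the new text is shorter than the old one, which is exactly when truncate() matters; with '\n' to ' ' the length is unchanged on Linux, but on Windows each newline is stored as two bytes (\r\n), so the rewritten file is shorter there too.

```python
import os
import tempfile

def rewrite_in_place(path, transform):
    """Hypothetical helper: apply `transform` (str -> str) using one 'r+' handle."""
    with open(path, "r+") as f:
        data = transform(f.read())
        f.seek(0)     # go back to the start before writing
        f.write(data)
        f.truncate()  # drop leftover old text if `data` is shorter

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

# Deleting the newlines shrinks the text; without truncate() the tail
# of the old contents would remain at the end of the file.
rewrite_in_place(path, lambda s: s.replace("\n", ""))
with open(path) as f:
    result = f.read()
```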

Slow Python file I/O; Ruby runs better than this; Got the wrong language?

Please advise: I'm going to use this as a learning point. I'm a beginner.
I'm splitting a 25mb file into several smaller file.
A kind guru here gave me a Ruby script. It works beautifully fast. So, in order to learn, I mimicked it with a Python script. This runs like a three-legged cat (slow). I wonder if anyone can tell me why?
My python script
##split a file into smaller files
###########################################
def splitlines (file) :
    fileNo=0001
    outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
    fh = open(file, "r") ## open the file for reading
    mylines = fh.readlines() ### read in lines
    for line in mylines: ## for each line
        if re.search("Copyright ", line): # if the line is equal to the regex
            outFile.close() ## close the file
            fileNo +=1 #and add one to the filename, starting to read lines in again
        else: # otherwise
            outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
            outFile.write(line) ## then append it to the open outFile
    fh.close()
The guru's Ruby 1.9 script
g=0001
f=File.open(g.to_s + ".txt","w")
open("corpus1.txt").each do |line|
  if line[/\d+ of \d+ DOCUMENTS/]
    f.close
    f=File.open(g.to_s + ".txt","w")
    g+=1
  end
  f.print line
end
There are many reasons why your script is slow, the main one being that you reopen the output file for almost every line you write. Since the old file is implicitly closed when you open a new one (due to Python garbage collection), the write buffer is flushed for every single line you write, which is quite expensive.
A cleaned up and corrected version of your script would be
def file_generator():
    file_no = 1
    while True:
        f = open(r"C:\Users\dunner7\Desktop\Textomics\Media"
                 r"\LexisNexus\ele\newdocs\%s.txt" % file_no, 'a')
        yield f
        f.close()
        file_no += 1

def splitlines(filename):
    files = file_generator()
    out_file = next(files)
    with open(filename) as in_file:
        for line in in_file:
            if "Copyright " in line:
                out_file = next(files)
            out_file.write(line)
    out_file.close()
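For comparison with the Ruby script, here is a sketch of a direct Python 3 translation of it. Like the Ruby version it streams the input, keeps one output file open per document, and uses the same \d+ of \d+ DOCUMENTS separator; the function name split_documents, the out_dir argument and the %d.txt naming are assumptions. One deliberate difference: the counter is incremented before the next file is opened, whereas the Ruby script reopens 1.txt on the first match.

```python
import os
import re

# Same separator pattern as the Ruby script
DOC_HEADER = re.compile(r"\d+ of \d+ DOCUMENTS")

def split_documents(src_path, out_dir):
    """Start a new numbered output file whenever a line matches DOC_HEADER.

    Returns the number of the last file written.
    """
    file_no = 1
    out = open(os.path.join(out_dir, "%d.txt" % file_no), "w")
    with open(src_path) as src:
        for line in src:  # streams the input line by line
            # Ruby's line[/re/] also searches anywhere in the line
            if DOC_HEADER.search(line):
                out.close()  # one open/close per document, not per line
                file_no += 1
                out = open(os.path.join(out_dir, "%d.txt" % file_no), "w")
            out.write(line)
    out.close()
    return file_no
```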
I guess the reason your script is so slow is that you open a new file descriptor for each line. If you look at your guru's Ruby script, it closes and opens the output file only when the separator matches.
In contrast, your Python script opens a new file descriptor for every line you read (and, by the way, does not explicitly close them). Opening a file requires a system call into the kernel, so this is relatively slow.
Another change I would suggest is to change
fh = open(file, "r") ## open the file for reading
mylines = fh.readlines() ### read in lines
for line in mylines: ## for each line
to
fh = open(file, "r")
for line in fh:
With this change, you do not read the whole file into memory, but only block after block. Although it should not matter with a 25MiB file, it will hurt you with big files and is good practice (and less code ;)).
The Python code might be slow due to regex and not IO. Try
import re

def splitlines(file):
    fileNo = 1  # 0001 is a Python 2 octal literal and a SyntaxError in Python 3
    outFile = open("newdocs/%s.txt" % fileNo, 'a')  ## open file to append
    reg = re.compile("Copyright ")  # compile the pattern once, outside the loop
    with open(file, "r") as in_file:
        for line in in_file:
            if reg.search(line):  # a compiled pattern takes only the string to search
                outFile.close()  ## close the file
                fileNo += 1  # advance the number before opening the next file
                outFile = open("newdocs/%s.txt" % fileNo, 'a')  ## open file to append
            outFile.write(line)  ## then append it to the open outFile
    outFile.close()
Several notes
Always use / instead of \ for path names
If regex is used repeatedly, compile it
Do you need re.search? or re.match?
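To answer the last note concretely, here is a short sketch comparing re.match, re.search, the plain in operator, and a compiled pattern on a made-up line. Note that a compiled pattern's search() takes the string as its first argument (followed by optional pos and endpos), so it is called as pattern.search(line).

```python
import re

line = "(c) Copyright 2011 Example Corp\n"  # made-up sample line

# re.match only tries to match at the start of the string
match_result = re.match("Copyright ", line)
# re.search scans the whole string
search_result = re.search("Copyright ", line)
# For a fixed substring, the `in` operator needs no regex at all
contains = "Copyright " in line

# Compile once when the same pattern is used for many lines
pattern = re.compile("Copyright ")
compiled_result = pattern.search(line)
```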
UPDATE:
#Ed. S: point taken
#Winston Ewert: code updated to be closer to the original Ruby code
rosser,
Don't use names of built-ins or of common methods as identifiers in your code (file is a built-in in Python 2, and splitlines is a str method)
The following code preserves the behavior of your own code: the line containing 'Copyright ', which signals the end of a chunk, is not written to any output file
The function writelines() is used to get faster execution than repeated calls to out_file.write(line)
The if li: block writes out the remaining lines in case the file doesn't end with a line containing 'Copyright '
def splitfile(filename, wordstop, destrep, file_no=1, li=None):
    # A mutable default (li=[]) would be shared between calls, so create a fresh list here
    if li is None:
        li = []
    with open(filename) as in_file:
        for line in in_file:
            if wordstop in line:
                with open(destrep + str(file_no) + '.txt', 'w') as f:
                    f.writelines(li)
                file_no += 1
                li = []
            else:
                li.append(line)
    if li:
        with open(destrep + str(file_no) + '.txt', 'w') as f:
            f.writelines(li)
