I would like a function that takes a path to a file, checks whether the file ends in a \n, and adds a \n if it doesn't.
I know that I could do this by opening the file twice, once in read mode and then again in append mode, but I feel like I must be missing something... I feel like 'w+' mode, for example, must be able to do it.
Here's a way of doing this opening the file twice (I want something simpler where you only open it once).
def ensureFileEndsWith(path, end):
    with open(path) as f:
        f.seek(-1, 2)
        alreadyGood = f.read(1) == end
    if not alreadyGood:
        with open(path, 'a') as f:
            f.write(end)
I want to do the same thing, but only opening the file once. I tried this:
def ensureFileEndsWith(path, end):
    with open(path, 'w+') as f:
        f.seek(-1, 2)
        if not f.read(1) == end:
            f.write(end)
But it raised this exception:
IOError: [Errno 22] Invalid argument
The error comes from my call to seek on the file opened in 'w+' mode.
First of all, you want open(path, 'r+'); 'w+' truncates the file. You were getting that error because the file was empty after being truncated, and f.seek(-1, 2) cannot seek to a position before the start of the file. This should do it for you:
def ensureFileEndsWith(path, end):
    with open(path, 'r+') as f:
        try:
            f.seek(-len(end), 2)
        except IOError:
            # The file is shorter than end (possibly empty)
            f.seek(0, 2)
        # Passing in a number to f.read() is unnecessary
        if f.read() != end:
            f.write(end)
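On Python 3, text-mode files reject non-zero seeks relative to the end of the file, so the same idea needs binary mode there. A minimal sketch under that assumption; the name ensure_file_ends_with and the bytes-typed end argument are illustrative and not part of the original answer:

```python
import os


def ensure_file_ends_with(path, end=b"\n"):
    """Append `end` (bytes) to the file at `path` unless it already ends with it."""
    # 'rb+' opens for reading and writing without truncating; binary mode
    # allows seeking relative to the end of the file on Python 3.
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)      # seek returns the new position, i.e. the size
        f.seek(max(size - len(end), 0))    # clamp so short or empty files work
        if f.read() != end:
            f.seek(0, os.SEEK_END)         # reposition before switching to writing
            f.write(end)
```

Calling it twice on the same file should leave a single trailing newline, since the second call finds the ending already in place.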
I have this function in my code that is supposed to read a file's last line and, if the file does not exist, create it. My issue is that when it creates the file and then tries to read the last line, it raises an error.
with open(HIGH_SCORES_FILE_PATH, "w+") as file:
    last_line = file.readlines()[-1]
if last_line == '\n':
    with open(HIGH_SCORES_FILE_PATH, 'a') as file:
        file.write('Jogo:')
        file.write('\n')
        file.write(str(0))
        file.write('\n')
I have tried multiple ways of reading the last line, but every one I've tried ends in an error.
Opening a file in "w+" mode erases any content in the file. readlines() then returns an empty list, and indexing it with [-1] raises an IndexError. You can test for a file's existence with os.path.exists or os.path.isfile, or you could use an exception handler to deal with that case.
Start with last_line set to a sentinel value. If the open fails, or if no lines are read, last_line will not be updated and you can base file creation on that.
last_line = None
try:
    with open(HIGH_SCORES_FILE_PATH) as file:
        for last_line in file:
            pass
except OSError:
    pass
if last_line is None:
    with open(HIGH_SCORES_FILE_PATH, "w") as file:
        file.write('Jogo:\n0\n')
    last_line = '0\n'
Let's say that I have a txt file whose contents I have to convert entirely to lowercase. I tried this:
def lowercase_txt(file):
    file = file.casefold()
    with open(file, encoding = "utf8") as f:
        f.read()
Here I get "'str' object has no attribute 'read'"
then I tried
def lowercase_txt(file):
    with open(poem_filename, encoding="utf8") as f:
        f = f.casefold()
        f.read()
and here I get "'_io.TextIOWrapper' object has no attribute 'casefold'".
What can I do?
EDIT: I re-ran this exact code and now there are no errors (I don't know why), but the file doesn't change at all; all the letters stay the way they are.
This will rewrite the file. Warning: if there is some type of error in the middle of processing (power failure, you spill coffee on your computer, etc.) you could lose your file. So, you might want to first make a backup of your file:
def lowercase_txt(file_name):
    """
    file_name is the full path to the file to be opened
    """
    with open(file_name, 'r', encoding = "utf8") as f:
        contents = f.read() # read contents of file
    contents = contents.lower() # convert to lower case
    with open(file_name, 'w', encoding = "utf8") as f: # open for output
        f.write(contents)
For example:
lowercase_txt('/mydirectory/test_file.txt')
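One way to reduce the risk mentioned in the warning, instead of a manual backup, is to write the new contents to a temporary file and swap it in only once it is complete. A minimal sketch of that idea; the name lowercase_txt_atomic is hypothetical, and the replacement file gets the temporary file's permissions rather than the original's:

```python
import os
import tempfile


def lowercase_txt_atomic(file_name):
    """Lowercase a UTF-8 text file, replacing it only after the new copy is fully written."""
    with open(file_name, "r", encoding="utf8") as f:
        contents = f.read().lower()

    # Create the temporary file in the same directory so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(file_name))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf8") as tmp:
            tmp.write(contents)
        os.replace(tmp_path, file_name)  # swap the finished file in place of the old one
    except BaseException:
        os.remove(tmp_path)  # discard the partial temporary file
        raise
```

If anything fails before os.replace runs, the original file is left untouched.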
Update
The following version opens the file once for both reading and writing. After the file is read, the file position is reset to the start of the file before the contents are rewritten. This might be a safer option.
def lowercase_txt(file_name):
    """
    file_name is the full path to the file to be opened
    """
    with open(file_name, 'r+', encoding = "utf8") as f:
        contents = f.read() # read contents of file
        contents = contents.lower() # convert to lower case
        f.seek(0, 0) # position back to start of file
        f.write(contents)
        f.truncate() # in case new encoded content is shorter than older
How do you do this series of actions in Python?
1) Create a file if it does not exist and insert a string
2) If the file exists, check whether it contains a string
3) If the string is not there, append it to the end of the file
I'm currently doing it this way but I'm missing a step
EDIT
With this code, every time I call the function it behaves as if the file does not exist and overwrites the old file
def func():
    if not os.path.exists(path):
        #always take this branch
        with open(path, "w") as myfile:
            myfile.write(string)
            myfile.flush()
            myfile.close()
    else:
        with open(path) as f:
            if string in f.read():
                print("string found")
            else:
                with open(path, "a") as f1:
                    f1.write(string)
                    f1.flush()
                    f1.close()
            f.close()
Try this:
with open(path, 'a+') as file:
    file.seek(0)
    content = file.read()
    if string not in content:
        file.write(string)
seek(0) moves the file position to the start so the contents can be read; in append mode, write always adds to the end of the file, whatever the current position is.
Edit:
Also, you don't need to check the path.
Example:
>>> f = open('example', 'a+')
>>> f.write('a')
1
>>> f.seek(0)
0
>>> f.read()
'a'
The file example didn't exist, but open() created it when called in 'a+' mode.
You don't need to reopen the file if you have not yet closed it after initially opening it. Use "a" when opening the file in order to append to it. So: "else: with open(path, "a") as f: f.write(string)". Try that.
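Wrapped up as a function, the 'a+' approach might look like the sketch below. The name append_if_missing and the boolean return value are assumptions added for illustration:

```python
def append_if_missing(path, text):
    """Append `text` to the file at `path` unless it is already there; return True if written."""
    # 'a+' creates the file if needed, never truncates it, and sends every write to the end.
    with open(path, "a+", encoding="utf8") as f:
        f.seek(0)  # in append mode the position starts at the end, so rewind before reading
        if text in f.read():
            return False
        f.write(text)
        return True
```

The first call on a missing file creates it and writes the string; a second call with the same string leaves the file unchanged.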
I am trying to simply find out whether a string exists in a text file, but I am having issues. I am assuming it's something on the wrong line, but I am baffled.
def extract(mPath, frequency):
    if not os.path.exists('history.db'):
        f = open("history.db", "w+")
        f.close()
    for cFile in fileList:
        with open('history.db', "a+") as f:
            if cFile in f.read():
                print("File found - skip")
            else:
                #with ZipFile(cFile, 'r') as zip_ref:
                    #zip_ref.extractall(mPath)
                print("File Not Found")
                f.writelines(cFile + "\n")
                print(cFile)
Output:
File Not Found
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\Test1.zip
File Not Found
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\test2.zip
Text within the history.db file:
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\Test1.zip
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\test2.zip
What am I missing? Thanks in advance
Note: cFile is the file path shown in the output and fileList is the list of both the paths from the output.
You're using the wrong flags for what you want to do. open(file, 'a') opens a file for append-writing, meaning that it seeks to the end of the file. Adding the + modifier means that you can also read from the file, but you're doing so from the end of the file; so read() returns nothing, because there's nothing beyond the end of the file.
You can use r+ to read from the start of the file while having the option of writing to it. But keep in mind that anytime you write you'll be writing to the reader's current position in the file.
I haven't tested the code but this should put you on the right track!
def extract(mPath, frequency):
    if not os.path.exists('history.db'):
        f = open("history.db", "w+")
        f.close()
    with open('history.db', "r") as f:  # text mode, so lines compare equal to str paths
        data = f.readlines()
    for line in data:
        if line.rstrip() in fileList: #assuming fileList is a list of strings
            #do everything else here
            pass
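For the original goal (skip paths already recorded, record the new ones), a single 'a+' handle that is rewound before reading may be enough. This is only a sketch: record_new_files and its history_path parameter are hypothetical names, and the extraction step itself is left out:

```python
def record_new_files(file_list, history_path="history.db"):
    """Return the paths in `file_list` not yet listed in the history file, and record them."""
    new_files = []
    # 'a+' creates the history file if it is missing, so no separate existence check is needed.
    with open(history_path, "a+", encoding="utf8") as history:
        history.seek(0)  # append mode starts at the end; rewind before reading
        seen = set(history.read().splitlines())
        for path in file_list:
            if path not in seen:
                history.write(path + "\n")  # append-mode writes land at the end of the file
                seen.add(path)
                new_files.append(path)
    return new_files
```

Comparing whole lines against a set also avoids the substring matches that `cFile in f.read()` would allow.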
Please advise - I'm going to use this as a learning point. I'm a beginner.
I'm splitting a 25 MB file into several smaller files.
A kindly guru here gave me a Ruby script. It works beautifully fast. So, in order to learn, I mimicked it with a Python script. This runs like a three-legged cat (slow). I wonder if anyone can tell me why?
My Python script
##split a file into smaller files
###########################################
def splitlines (file) :
    fileNo=0001
    outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
    fh = open(file, "r") ## open the file for reading
    mylines = fh.readlines() ### read in lines
    for line in mylines: ## for each line
        if re.search("Copyright ", line): # if the line is equal to the regex
            outFile.close() ## close the file
            fileNo +=1 #and add one to the filename, starting to read lines in again
        else: # otherwise
            outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
            outFile.write(line) ## then append it to the open outFile
    fh.close()
The guru's Ruby 1.9 script
g=0001
f=File.open(g.to_s + ".txt","w")
open("corpus1.txt").each do |line|
  if line[/\d+ of \d+ DOCUMENTS/]
    f.close
    f=File.open(g.to_s + ".txt","w")
    g+=1
  end
  f.print line
end
There are many reasons why your script is slow, the main one being that you reopen the output file for almost every line you write. Since the old file object gets implicitly closed when a new one is opened (due to Python garbage collection), the write buffer is flushed for every single line you write, which is quite expensive.
A cleaned up and corrected version of your script would be
def file_generator():
    file_no = 1
    while True:
        f = open(r"C:\Users\dunner7\Desktop\Textomics\Media"
                 r"\LexisNexus\ele\newdocs\%s.txt" % file_no, 'a')
        yield f
        f.close()
        file_no += 1

def splitlines(filename):
    files = file_generator()
    out_file = next(files)
    with open(filename) as in_file:
        for line in in_file:
            if "Copyright " in line:
                out_file = next(files)
            out_file.write(line)
    out_file.close()
I guess the reason your script is so slow is that you open a new file descriptor for each line. If you look at your guru's ruby script, it closes and opens the output file only if your separator matches.
In contrast to that, your python script opens a new file descriptor for every line you read (and btw, does not close them). Opening a file requires talking to the kernel, so this is relatively slow.
Another change I would suggest is to change
fh = open(file, "r") ## open the file for reading
mylines = fh.readlines() ### read in lines
for line in mylines: ## for each line
to
fh = open(file, "r")
for line in fh:
With this change, you do not read the whole file into memory, but only block after block. Although it should not matter with a 25MiB file, it will hurt you with big files and is good practice (and less code ;)).
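Putting both suggestions together (stream the input line by line, and open a new output file only when a separator line appears) could look like the sketch below. The name split_on_marker, the out_dir parameter, and the choice to let the separator line start the new file (as the Ruby script does) are assumptions for illustration:

```python
import os


def split_on_marker(in_path, out_dir, marker="Copyright "):
    """Split `in_path` into numbered files in `out_dir`; each marker line starts a new file."""
    file_no = 1
    out_file = open(os.path.join(out_dir, "%d.txt" % file_no), "w")
    try:
        with open(in_path) as in_file:
            for line in in_file:      # streams the input instead of loading it all at once
                if marker in line:    # reopen the output only when a separator is seen
                    out_file.close()
                    file_no += 1
                    out_file = open(os.path.join(out_dir, "%d.txt" % file_no), "w")
                out_file.write(line)
    finally:
        out_file.close()
    return file_no  # number of output files created
```

Note that if the very first line contains the marker, the first output file is left empty.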
The Python code might be slow due to regex and not IO. Try
import re

def splitlines (file) :
    fileNo=1
    outFile=open("newdocs/%s.txt" % fileNo, 'a') ## open file to append
    reg = re.compile("Copyright ")
    for line in open(file, "r"):
        if reg.search(line): # if the line matches the regex
            outFile.close() ## close the file
            fileNo +=1 # move on to the next file number before reopening
            outFile=open("newdocs/%s.txt" % fileNo, 'a') ## open file to append
        outFile.write(line) ## then append it to the open outFile
    outFile.close()
Several notes
Always use / instead of \ for path name
If regex is used repeatedly, compile it
Do you need re.search? or re.match?
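The last two notes can be illustrated with a short sketch. The sample line is made up; the point is only how re.search, re.match and a plain substring test differ:

```python
import re

# Compile once, outside any loop, when the same pattern is applied to many lines.
pattern = re.compile("Copyright ")

line = "Text (c) Copyright 2011 Example Corp\n"  # made-up sample line

found_anywhere = pattern.search(line) is not None  # search scans the whole string
found_at_start = pattern.match(line) is not None   # match only tests the beginning
plain_substring = "Copyright " in line             # no regex needed for a fixed string
```

For a fixed string like this one, the `in` test gives the same answer as search without involving the regex engine at all.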
UPDATE:
#Ed. S: point taken
#Winston Ewert: code updated to be closer to the original Ruby code
rosser,
Don't use the names of built-in objects or methods as identifiers in your code (file, splitlines).
The following code preserves the behaviour of your own code: an out_file is closed without the line containing 'Copyright ', which acts as the signal to close it.
writelines() is used in the hope of faster execution than repeated calls to out_file.write(line).
The if li: block writes out the remaining lines in case the last line of the input file doesn't contain 'Copyright '.
def splitfile(filename, wordstop, destrep, file_no = 1, li = None):
    if li is None:  # avoid a mutable default list shared between calls
        li = []
    with open(filename) as in_file:
        for line in in_file:
            if wordstop in line:
                with open(destrep+str(file_no)+'.txt','w') as f:
                    f.writelines(li)
                file_no += 1
                li = []
            else:
                li.append(line)
        if li:
            with open(destrep+str(file_no)+'.txt','w') as f:
                f.writelines(li)