How to check if a text file content in Python is empty? - python

I want to check if a text file content has data inside before running some functions. so I use the following Python code
if os.path.exists("userInformation.txt") and os.path.getsize("userInformation.txt") > 0:
with open("userInformation.txt") as info:
contents = info.readlines()
else:
new_file = open("userInformation.txt", "w")
new_file.close()
if file doesn't exist or nothing inside it works fine. However, if a file contains newline, which will be 2 bytes. Then will cause IndexError: list index out of range, after I run some function
I know i can use try, except to catch that index error. Is there any other way to it?
Thanks.

You should use fewer conditions and more idempotent logic:
from pathlib import Path
p = Path('userInformation.txt')
p.touch()
with p.open() as f:
contents = [l for l in f.readlines() if l.strip()]
Path.touch ensures the file exists, because that is your goal either way. Testing each line with str.strip ensures empty lines, including lines with only a linebreak, are omitted. This uses minimal calls to the filesystem and not a single if..else branch to achieve your goal.

you have 2 easy ways to check the file size:
os.path.getsize("sample.txt")
os.stat('sample.txt').st_size
warning - both will return an error if the file doesn't exists
so-
import os
def is_non_zero_file(file_path):
return os.path.isfile(file_path) and os.path.getsize(file_path) > 0
if is_non_zero_file(file_path):
with open(file_path, "r") as f:
data = f.read()
else:
#create file
with open(file_path, "w") as f:
f.write("data")

Related

Undesired deletion of temporaly files

I am try to create some temporal files and make some operations on them inside a loop. Then I will access the information on all of the temporal files. And do some operations with that information. For simplicity I brought the following code that reproduces my issue:
import tempfile
tmp_files = []
for i in range(40):
tmp = tempfile.NamedTemporaryFile(suffix=".txt")
with open(tmp.name, "w") as f:
f.write(str(i))
tmp_files.append(tmp.name)
string = ""
for tmp_file in tmp_files:
with open(tmp_file, "r") as f:
data = f.read()
string += data
print(string)
ERROR:
with open(tmp_file, "r") as f: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpynh0kbnw.txt'
When I look on /tmp directory (with some time.sleep(2) on the loop) I see that the file is deleted and only one is preserved. And for that the error.
Of course I could handle to keep all the files with the flag tempfile.NamedTemporaryFile(suffix=".txt", delete=False). But that is not the idea. I would like to hold the temporal files just for the running time of the script. I also could delete the files with os.remove. But my question is more why this happen. Because I expected that the files hold to the end of the running. Because I don't close the file on the execution (or do I?).
A lot of thanks in advance.
tdelaney does already answer your actual question.
I just would like to offer you an alternative to NamedTemporaryFile. Why not creating a temporary folder which is removed (with all files in it) at the end of the script?
Instead of using a NamedTemporaryFile, you could use tempfile.TemporaryDirectory. The directory will be deleted when closed.
The example below uses the with statement which closes the file handle automatically when the block ends (see John Gordon's comment).
import os
import tempfile
with tempfile.TemporaryDirectory() as temp_folder:
tmp_files = []
for i in range(40):
tmp_file = os.path.join(temp_folder, f"{i}.txt")
with open(tmp_file, "w") as f:
f.write(str(i))
tmp_files.append(tmp_file)
string = ""
for tmp_file in tmp_files:
with open(tmp_file, "r") as f:
data = f.read()
string += data
print(string)
By default, a NamedTemporaryFile deletes its file when closed. its a bit subtle, but tmp = tempfile.NamedTemporaryFile(suffix=".txt") in the loop causes the previous file to be deleted when tmp is reassigned. One option is to use the delete=False parameter. Or, just keep the file open and seek to the beginning after the write.
NamedTemporaryFile is already a file object - you can write to it directly without reopening. Just make sure the mode is "write plus" and in text, not binary mode. Put the code an a try/finally block to make sure the files are really deleted at the end.
import tempfile
tmp_files = []
try:
for i in range(40):
tmp = tempfile.NamedTemporaryFile(suffix=".txt", mode="w+")
tmp.write(str(i))
tmp.seek(0)
tmp_files.append(tmp)
string = ""
for tmp_file in tmp_files:
data = tmp_file.read()
string += data
finally:
for tmp_file in tmp_files:
tmp_file.close()
print(string)

read data from multiple files but would like to write that data into a new text file but file shows up blank

the code reads from multiple text files so far i have it to display on the terminal but i would like to have the info written into a text file but the text file shows up blank and dont know why new to python so still haven't figured out all the commands.
directory = 'C:\Assignments\\CPLfiles\*'
test = False
start_text = '^GMWE'
for filename in glob.glob(directory):
with open(filename) as f:
with open('file.txt', 'w') as f1:
for line in f:
#for x in line:
if test is False:
if re.search(start_text, line.strip()) is not None:
x = line.strip()
f1.write(x+ '\n')
print(x)
break
test = False
I think you should change the order of opening files to the following.
The problem is that for each file you open to read, you're also re-opening the file to write, whipping it's contents.
Also, due to the break you will write at maximum one line per file due to the break after the write statement.
If the last file that you opened does not have any match with the regular expression, then nothing will exist in the final file.
Hope it makes sense
directory = 'C:\Assignments\\CPLfiles\*'
test = False
start_text = '^GMWE'
with open('file.txt', 'w') as f1:
for filename in glob.glob(directory):
with open(filename) as f:
for line in f:
#for x in line:
if test is False:
if re.search(start_text, line.strip()) is not None:
x = line.strip()
f1.write(x+ '\n')
print(x)
break
test = False
I think that the main problem here is that you reopen file.txt for each file in you globbing. Each time opening it in write mode erases the file. If no line match in the last file you will end up with an empty file as a result. So your loop should be inside your with that opens this file.

How to edit a file in python without a temporary sacrificial file? [duplicate]

I'm using Python, and would like to insert a string into a text file without deleting or copying the file. How can I do that?
Unfortunately there is no way to insert into the middle of a file without re-writing it. As previous posters have indicated, you can append to a file or overwrite part of it using seek but if you want to add stuff at the beginning or the middle, you'll have to rewrite it.
This is an operating system thing, not a Python thing. It is the same in all languages.
What I usually do is read from the file, make the modifications and write it out to a new file called myfile.txt.tmp or something like that. This is better than reading the whole file into memory because the file may be too large for that. Once the temporary file is completed, I rename it the same as the original file.
This is a good, safe way to do it because if the file write crashes or aborts for any reason, you still have your untouched original file.
Depends on what you want to do. To append you can open it with "a":
with open("foo.txt", "a") as f:
f.write("new line\n")
If you want to preprend something you have to read from the file first:
with open("foo.txt", "r+") as f:
old = f.read() # read everything in the file
f.seek(0) # rewind
f.write("new line\n" + old) # write the new line before
The fileinput module of the Python standard library will rewrite a file inplace if you use the inplace=1 parameter:
import sys
import fileinput
# replace all occurrences of 'sit' with 'SIT' and insert a line after the 5th
for i, line in enumerate(fileinput.input('lorem_ipsum.txt', inplace=1)):
sys.stdout.write(line.replace('sit', 'SIT')) # replace 'sit' and write
if i == 4: sys.stdout.write('\n') # write a blank line after the 5th line
Rewriting a file in place is often done by saving the old copy with a modified name. Unix folks add a ~ to mark the old one. Windows folks do all kinds of things -- add .bak or .old -- or rename the file entirely or put the ~ on the front of the name.
import shutil
shutil.move(afile, afile + "~")
destination= open(aFile, "w")
source= open(aFile + "~", "r")
for line in source:
destination.write(line)
if <some condition>:
destination.write(<some additional line> + "\n")
source.close()
destination.close()
Instead of shutil, you can use the following.
import os
os.rename(aFile, aFile + "~")
Python's mmap module will allow you to insert into a file. The following sample shows how it can be done in Unix (Windows mmap may be different). Note that this does not handle all error conditions and you might corrupt or lose the original file. Also, this won't handle unicode strings.
import os
from mmap import mmap
def insert(filename, str, pos):
if len(str) < 1:
# nothing to insert
return
f = open(filename, 'r+')
m = mmap(f.fileno(), os.path.getsize(filename))
origSize = m.size()
# or this could be an error
if pos > origSize:
pos = origSize
elif pos < 0:
pos = 0
m.resize(origSize + len(str))
m[pos+len(str):] = m[pos:origSize]
m[pos:pos+len(str)] = str
m.close()
f.close()
It is also possible to do this without mmap with files opened in 'r+' mode, but it is less convenient and less efficient as you'd have to read and temporarily store the contents of the file from the insertion position to EOF - which might be huge.
As mentioned by Adam you have to take your system limitations into consideration before you can decide on approach whether you have enough memory to read it all into memory replace parts of it and re-write it.
If you're dealing with a small file or have no memory issues this might help:
Option 1)
Read entire file into memory, do a regex substitution on the entire or part of the line and replace it with that line plus the extra line. You will need to make sure that the 'middle line' is unique in the file or if you have timestamps on each line this should be pretty reliable.
# open file with r+b (allow write and binary mode)
f = open("file.log", 'r+b')
# read entire content of file into memory
f_content = f.read()
# basically match middle line and replace it with itself and the extra line
f_content = re.sub(r'(middle line)', r'\1\nnew line', f_content)
# return pointer to top of file so we can re-write the content with replaced string
f.seek(0)
# clear file content
f.truncate()
# re-write the content with the updated content
f.write(f_content)
# close file
f.close()
Option 2)
Figure out middle line, and replace it with that line plus the extra line.
# open file with r+b (allow write and binary mode)
f = open("file.log" , 'r+b')
# get array of lines
f_content = f.readlines()
# get middle line
middle_line = len(f_content)/2
# overwrite middle line
f_content[middle_line] += "\nnew line"
# return pointer to top of file so we can re-write the content with replaced string
f.seek(0)
# clear file content
f.truncate()
# re-write the content with the updated content
f.write(''.join(f_content))
# close file
f.close()
Wrote a small class for doing this cleanly.
import tempfile
class FileModifierError(Exception):
pass
class FileModifier(object):
def __init__(self, fname):
self.__write_dict = {}
self.__filename = fname
self.__tempfile = tempfile.TemporaryFile()
with open(fname, 'rb') as fp:
for line in fp:
self.__tempfile.write(line)
self.__tempfile.seek(0)
def write(self, s, line_number = 'END'):
if line_number != 'END' and not isinstance(line_number, (int, float)):
raise FileModifierError("Line number %s is not a valid number" % line_number)
try:
self.__write_dict[line_number].append(s)
except KeyError:
self.__write_dict[line_number] = [s]
def writeline(self, s, line_number = 'END'):
self.write('%s\n' % s, line_number)
def writelines(self, s, line_number = 'END'):
for ln in s:
self.writeline(s, line_number)
def __popline(self, index, fp):
try:
ilines = self.__write_dict.pop(index)
for line in ilines:
fp.write(line)
except KeyError:
pass
def close(self):
self.__exit__(None, None, None)
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
with open(self.__filename,'w') as fp:
for index, line in enumerate(self.__tempfile.readlines()):
self.__popline(index, fp)
fp.write(line)
for index in sorted(self.__write_dict):
for line in self.__write_dict[index]:
fp.write(line)
self.__tempfile.close()
Then you can use it this way:
with FileModifier(filename) as fp:
fp.writeline("String 1", 0)
fp.writeline("String 2", 20)
fp.writeline("String 3") # To write at the end of the file
If you know some unix you could try the following:
Notes: $ means the command prompt
Say you have a file my_data.txt with content as such:
$ cat my_data.txt
This is a data file
with all of my data in it.
Then using the os module you can use the usual sed commands
import os
# Identifiers used are:
my_data_file = "my_data.txt"
command = "sed -i 's/all/none/' my_data.txt"
# Execute the command
os.system(command)
If you aren't aware of sed, check it out, it is extremely useful.

Returning file as list of lines

I want to add all lines of content in the file to a list, but the returned list is empty.
def phone_num():
item = []
with open("test.txt", "r") as f:
for i in f.readlines():
item.append(i.strip('\n'))
return item
phone_num()
You can make this method quite concise by taking advantage of list comprehensions:
def phone_num():
with open('test.txt', 'r') as f:
return [line.strip('\n') for line in f]
Also consider using 'rU' as your open mode to enable Universal Newline Support.
Not sure why your result didn't work but it is likely because the file doesn't actually have the contents you want. Try running open("test.txt", "r").read() and ensuring you get output. Since this is a local file you may not be reading from the directory you think you are. Try this:
import os
os.getcwd()
Ensure that this matches the directory where you think your file should is.
This works for me:
def phone_num():
item = []
with open("test.txt", "r") as f:
for i in f:
item.append(i.strip('\n'))
return item
thelist = phone_num()
print (thelist)
If my test.txt file has abc\n efg\n hij\n it stuffs those all into a list

Slow python file I:O; Ruby runs better than this; Got the wrong language?

Please advise - I'm going to use this asa learning point. I'm a beginner.
I'm splitting a 25mb file into several smaller file.
A Kindly guru here gave me a Ruby sript. It works beautifully fast. So, in order to learn I mimicked it with a python script. This runs like a three-legged cat (slow). I wonder if anyone can tell me why?
My python script
##split a file into smaller files
###########################################
def splitlines (file) :
fileNo=0001
outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
fh = open(file, "r") ## open the file for reading
mylines = fh.readlines() ### read in lines
for line in mylines: ## for each line
if re.search("Copyright ", line): # if the line is equal to the regex
outFile.close() ## close the file
fileNo +=1 #and add one to the filename, starting to read lines in again
else: # otherwise
outFile=open("C:\\Users\\dunner7\\Desktop\\Textomics\\Media\\LexisNexus\\ele\\newdocs\%s.txt" % fileNo, 'a') ## open file to append
outFile.write(line) ## then append it to the open outFile
fh.close()
The guru's Ruby 1.9 script
g=0001
f=File.open(g.to_s + ".txt","w")
open("corpus1.txt").each do |line|
if line[/\d+ of \d+ DOCUMENTS/]
f.close
f=File.open(g.to_s + ".txt","w")
g+=1
end
f.print line
end
There are many reasons why your script is slow -- the main reason being that you reopen the outputfile for almost every line you write. Since the old file gets implicitly closed on opening a new one (due to Python garbage collection), the write buffer is flushed for every single line you write, which is quite expensive.
A cleaned up and corrected version of your script would be
def file_generator():
file_no = 1
while True:
f = open(r"C:\Users\dunner7\Desktop\Textomics\Media"
r"\LexisNexus\ele\newdocs\%s.txt" % file_no, 'a')
yield f
f.close()
file_no += 1
def splitlines(filename):
files = file_generator()
out_file = next(files)
with open(filename) as in_file:
for line in in_file:
if "Copyright " in line:
out_file = next(files)
out_file.write(line)
out_file.close()
I guess the reason your script is so slow is that you open a new file descriptor for each line. If you look at your guru's ruby script, it closes and opens the output file only if your separator matches.
In contrast to that, your python script opens a new file descriptor for every line you read (and btw, does not close them). Opening a file requires talking to the kernel, so this is relatively slow.
Another change I would suggest is to change
fh = open(file, "r") ## open the file for reading
mylines = fh.readlines() ### read in lines
for line in mylines: ## for each line
to
fh = open(file, "r")
for line in fh:
With this change, you do not read the whole file into memory, but only block after block. Although it should not matter with a 25MiB file, it will hurt you with big files and is good practice (and less code ;)).
The Python code might be slow due to regex and not IO. Try
def splitlines (file) :
fileNo=0001
outFile=open("newdocs/%s.txt" % fileNo, 'a') ## open file to append
reg = re.compile("Copyright ")
for line in open(file, "r"):
if reg.search("Copyright ", line): # if the line is equal to the regex
outFile.close() ## close the file
outFile=open("newdocs%s.txt" % fileNo, 'a') ## open file to append
fileNo +=1 #and add one to the filename, starting to read lines in again
outFile.write(line) ## then append it to the open outFile
Several notes
Always use / instead of \ for path name
If regex is used repeatedly, compile it
Do you need re.search? or re.match?
UPDATE:
#Ed. S: point taken
#Winston Ewert: code updated to be closer to the original Ruby code
rosser,
Don't use names of built-in objects as identifiers in a code (file, splitlines)
The following code respects the effect of your own code: an out_file is closed without the line containing 'Copyright ' that constitutes the signal of closing
The use of the function writelines() is intended to obtain a faster execution than with a repetition of out_file.write(line)
The if li: block is there to trigger the closing of out_file in case the last line of the read file doesn't contains 'Copyright '
def splitfile(filename, wordstop, destrep, file_no = 1, li = []):
with open(filename) as in_file:
for line in in_file:
if wordstop in line:
with open(destrep+str(file_no)+'.txt','w') as f:
f.writelines(li)
file_no += 1
li = []
else:
li.append(line)
if li:
with open(destrep+str(file_no)+'.txt','w') as f:
f.writelines(li)

Categories