I am routinely missing the last few KB of a file that I am trying to copy with shutil.copyfile.
I did some research and do see someone asking about something similar here:
python shutil copy function missing last few lines
But I am using copyfile, which DOES seem to use a with statement...
with open(src, 'rb') as fsrc:
    with open(dst, 'wb') as fdst:
        copyfileobj(fsrc, fdst)
So I am perplexed that more users aren't having this issue, if indeed it is some sort of buffering issue - I would think it'd be more well known.
I am calling copyfile very simply, don't think I could possibly be doing something wrong, essentially doing it the standard way I think:
copyfile(target_file_name,dest_file_name)
Yet I am missing the last 4 KB or so of the file each time.
I have also not touched the copyfileobj function that copyfile calls in shutil, which is...
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
So I am at a loss, but I suppose I am about to learn something about flushing, buffering, or the with statement, or ... Help! thanks
to Anand:
Anand, I avoided mentioning that stuff bc it's my sense that it's not the problem, but since you asked... executive summary is that I am grabbing a file from an FTP, checking if the file is different from the last time I saved a copy, if so, downloading the file and saving a copy. It's circuitous spaghetti code and was written when I was a truly pure utilitarian novice of a coder I guess. It looks like:
for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
        try:
            ftp.retrbinary('RETR %s' % filename, fhandle.write)
            the_files.append(target_file_name)
            mtime = modification_date(target_file_name)
            mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16] + str(mtime)[17:19] + str(mtime)[20:28]#2014-12-11 15:08:00.338415.
            sorted_xml_files = [file for file in glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))]
            sorted_xml_files.sort(key=os.path.getmtime)
            last_file = sorted_xml_files[-1]
            file_is_the_same = filecmp.cmp(target_file_name, last_file)
            if not file_is_the_same:
                print 'File changed!'
                copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml')
            else:
                print 'File '+ last_file +' hasn\'t changed, doin nothin'
                continue
The issue is most probably in this line:
ftp.retrbinary('RETR %s' % filename, fhandle.write)
This uses fhandle.write() to write the data from the FTP server to the file named target_file_name. By the time you call shutil.copyfile, the buffer for fhandle has not been completely flushed, so the copy is missing whatever data is still sitting in that buffer.
To make sure this does not happen, you can move the copyfile logic out of the with block for fhandle.
Alternatively, you can call fhandle.flush() to flush the buffer before copying the file.
I believe it is better to close the file (that is, to move the logic out of the with block). Example:
for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
    the_files.append(target_file_name)
    mtime = modification_date(target_file_name)
    mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16] + str(mtime)[17:19] + str(mtime)[20:28]#2014-12-11 15:08:00.338415.
    sorted_xml_files = [file for file in glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))]
    sorted_xml_files.sort(key=os.path.getmtime)
    last_file = sorted_xml_files[-1]
    file_is_the_same = filecmp.cmp(target_file_name, last_file)
    if not file_is_the_same:
        print 'File changed!'
        copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml')
    else:
        print 'File '+ last_file +' hasn\'t changed, doin nothin'
        continue
You are trying to copy a file that has not been closed yet, which is why its buffers have not been flushed. Move the copyfile call out of the with block so that fhandle is closed first.
Do:
with open(target_file_name ,'wb') as fhandle:
    ftp.retrbinary('RETR %s' % filename, fhandle.write)
# and here the rest of your code
# so fhandle is closed, and file is stored completely on the disk
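The effect described in the answers above can be reproduced without an FTP server. This is a minimal sketch, assuming Python 3 and a payload small enough to fit inside the default write buffer. The file names and the `demo_copy_sizes` helper are made up for illustration.

```python
import os
import shutil
import tempfile


def demo_copy_sizes(payload=b"x" * 1000):
    """Copy a file before and after it is closed; return both copy sizes."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "download.bin")
    early = os.path.join(workdir, "copied_while_open.bin")
    late = os.path.join(workdir, "copied_after_close.bin")
    try:
        with open(src, "wb") as fhandle:
            fhandle.write(payload)       # data may still sit in the buffer
            shutil.copyfile(src, early)  # copies only what has reached the disk
        shutil.copyfile(src, late)       # file is closed, buffer was flushed
        return os.path.getsize(early), os.path.getsize(late)
    finally:
        shutil.rmtree(workdir)


early_size, late_size = demo_copy_sizes()
print("copied while open:", early_size, "bytes")
print("copied after close:", late_size, "bytes")
```

With a larger download the buffer is flushed every time it fills up, so only the tail that has not been flushed yet goes missing, which would match the "last 4 KB or so" in the question.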
It looks like there is a better way to write the nested with statements:
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
    copyfileobj(fsrc, fdst)
I'd try something more like this. I'm far from an expert; hopefully someone more knowledgeable can lend some insight. My best thought is that the inner with closes before the outer one.
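For what it is worth, the nested and the single-statement spellings should enter and exit their contexts in the same order, so switching between them is a readability change and would not alter when the data reaches the disk. A small sketch to check this, using a hypothetical `tracked` helper that is not part of shutil:

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    """Record when a context is entered and exited."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


# Nested form, as used inside shutil.copyfile
with tracked("src"):
    with tracked("dst"):
        pass
nested = list(events)

events.clear()

# Single-statement form
with tracked("src"), tracked("dst"):
    pass
combined = list(events)

print(nested)
print(nested == combined)
```

In both forms the last context opened is the first one closed.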
Related
Getting "0" output, when I am trying to use os.path.getsize()
Not sure what's wrong, using PyCharm, I see that the file was created and the "comments" were added to the file. But PyCharm shows the output "0" :(
Here is the code:
import os
def create_python_script(filename):
    comments = "# Start of a new Python program"
    with open(filename, "w") as file:
        file.write(comments)
        filesize = os.path.getsize(filename)
    return(filesize)
print(create_python_script("program.py"))
Please point out the error I am not seeing.
You're getting a size of 0 because of the way the write function works.
When you call write, the content goes into an internal buffer first. The buffer exists for performance reasons (it limits the number of I/O calls).
So when you call getsize at this point, there is no guarantee that the content has actually been written to the file on disk.
with open(filename, "w") as file:
    file.write(comments)
    filesize = os.path.getsize(filename)
To make sure the content has been written to the file before you call getsize, you can call the flush method.
flush empties the internal buffer and writes its content out to the file.
with open(filename, "w") as file:
    file.write(comments)
    file.flush()
    filesize = os.path.getsize(filename)
Or, a better way would be to first close the file and then call the getsize method.
with open(filename, "w") as file:
    file.write(comments)
filesize = os.path.getsize(filename)
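Putting the last variant back into the function from the question gives something like the sketch below. It writes into a temporary directory so that nothing is left behind; that part is an addition for the example and is not in the original code.

```python
import os
import tempfile


def create_python_script(filename):
    comments = "# Start of a new Python program"
    with open(filename, "w") as file:
        file.write(comments)
    # The with block has ended, so the file is closed and its buffer flushed.
    return os.path.getsize(filename)


# Use a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as workdir:
    size = create_python_script(os.path.join(workdir, "program.py"))
    print(size)
```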
I want to make some changes to a file. To do this, I create a temporary file, write the content with all the changes I want into it, and at the end try to replace the original file with the temporary one.
The temp file is created and looks the way I expect, but the replace operation does not work.
This is my code which fails:
with tempfile.NamedTemporaryFile(mode='w', prefix=basename, dir=dirname, delete=False) as temp, open(file_path, 'r') as f:
    for line in f:
        temp.write(line + " test")
    os.replace(temp.name, file_path)
but this gives me an error:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process
Is my usage of the 'replace' function wrong?
Your call to os.replace(temp.name, file_path) has to be outside the with block.
with tempfile.NamedTemporaryFile(mode='w', prefix=basename, dir=dirname, delete=False) as temp, open(file_path, 'r') as f:
    for line in f:
        temp.write(line + " test")
os.replace(temp.name, file_path)
When you are calling replace() inside 'with' the file is still open as you are still inside the scope of 'with'.
As soon as you're out of 'with' the file has now been closed and you can now replace with os.replace().
Try it.
with tempfile.NamedTemporaryFile(mode='w', prefix=basename, dir=dirname, delete=False) as temp, open(file_path, 'r') as f:
    for line in f:
        temp.write(line + " test")
os.replace(temp.name, file_path)
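A self-contained version of the same pattern might look like the sketch below. The `append_to_each_line` name is hypothetical, and the sketch assumes the suffix is meant to go before each line's newline, which differs slightly from the `line + " test"` in the question.

```python
import os
import tempfile


def append_to_each_line(file_path, suffix=" test"):
    """Rewrite file_path so that every line ends with suffix."""
    dirname, basename = os.path.split(os.path.abspath(file_path))
    with tempfile.NamedTemporaryFile(
        mode="w", prefix=basename, dir=dirname, delete=False
    ) as temp, open(file_path, "r") as f:
        for line in f:
            temp.write(line.rstrip("\n") + suffix + "\n")
    # Both files are closed here, so nothing holds them open any more.
    os.replace(temp.name, file_path)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "notes.txt")
    with open(path, "w") as fh:
        fh.write("one\ntwo\n")
    append_to_each_line(path)
    with open(path) as fh:
        print(fh.read())
```

Creating the temporary file in the same directory as the target, as the question already does, keeps the replace on a single filesystem.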
I need to read a local file and copy it to a remote location with FTP. I copy the same file, file.txt, to the remote location hundreds of times under different names such as f1.txt, f2.txt, ... f1000.txt. Is it necessary to open, read, and close my local file.txt for every single FTP copy, or is there a way to store it in a variable, reuse that every time, and avoid the open and close calls? file.txt is a small file of 6 KB. Below is the code I am using:
for i in range(1,101):
    fname = 'file'+ str(i) +'.txt'
    fp = open('file.txt', 'rb')
    ftp.storbinary('STOR ' + fname, fp)
    fp.close()
I tried reading into a string variable and replace fp but ftp.storbinary requires second argument to have method read(), please suggest if there is better way to avoid file open close or let me know if it has no performance improvement at all. I am using python 2.7.10 on Windows 7.
Simply open it before the loop, and close it after:
fp = open('file.txt', 'rb')
for i in range(1,101):
    fname = 'file'+ str(i) +'.txt'
    fp.seek(0)
    ftp.storbinary('STOR ' + fname, fp)
fp.close()
Update: make sure you add fp.seek(0) before the call to ftp.storbinary; otherwise the read call will exhaust the file in the first iteration, as noted by @eryksun.
Update 2: depending on the size of the file, it will probably be faster to use BytesIO. This way the file content is kept in memory but is still a file-like object (i.e. it has a read method).
from io import BytesIO
with open('file.txt', 'rb') as f:
    output = BytesIO()
    output.write(f.read())
for i in range(1, 101):
    fname = 'file' + str(i) + '.txt'
    output.seek(0)
    ftp.storbinary('STOR ' + fname, output)  # pass the in-memory buffer, not fp
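Why the seek(0) matters can be seen without a server. In the sketch below, `fake_storbinary` is a made-up stand-in that only mimics the read loop a real upload performs; it is not part of ftplib.

```python
from io import BytesIO


def fake_storbinary(fp, blocksize=8192):
    """Stand-in for an upload: read fp to the end, return the bytes consumed."""
    sent = 0
    while True:
        buf = fp.read(blocksize)
        if not buf:
            break
        sent += len(buf)
    return sent


payload = BytesIO(b"example file content")

first = fake_storbinary(payload)   # reads everything
second = fake_storbinary(payload)  # position is at the end, nothing left
payload.seek(0)                    # rewind to the start
third = fake_storbinary(payload)   # reads everything again

print(first, second, third)
```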
Is there a way to create a text file without opening it in "w" or "a" mode? For instance, if I want to open a file in "r" mode but the file does not exist, then when I catch the IOError I want a new file to be created.
e.g.:
while flag == True:
    try:
        # opening src in a+ mode will allow me to read and append to file
        with open("Class {0} data.txt".format(classNo),"r") as src:
            # list containing all data from file, one line is one item in list
            data = src.readlines()
            for ind,line in enumerate(data):
                if surname.lower() and firstName.lower() in line.lower():
                    # overwrite the relevant item in data with the updated score
                    data[ind] = "{0} {1}\n".format(line.rstrip(),score)
                    rewrite = True
                else:
                    with open("Class {0} data.txt".format(classNo),"a") as src:
                        src.write("{0},{1} : {2}{3} ".format(surname, firstName, score,"\n"))
            if rewrite == True:
                # reopen src in write mode and overwrite all the records with the items in data
                with open("Class {} data.txt".format(classNo),"w") as src:
                    src.writelines(data)
        flag = False
    except IOError:
        print("New data file created")
        # Here I want a new file to be created and assigned to the variable src so when the
        # while loop iterates for the second time the file should successfully open
At the beginning just check if the file exists and create it if it doesn't:
import os
filename = "Class {0} data.txt".format(classNo)
if not os.path.isfile(filename):
    open(filename, 'w').close()
From this point on you can assume the file exists, this will greatly simplify your code.
No operating system will allow you to create a file without actually writing to it. You can encapsulate this in a library so that the creation is not visible, but it is impossible to avoid writing to the file system if you really want to modify the file system.
Here is a quick and dirty open replacement which does what you propose.
def open_for_reading_create_if_missing(filename):
    try:
        handle = open(filename, 'r')
    except IOError:
        with open(filename, 'w') as f:
            pass
        handle = open(filename, 'r')
    return handle
Better would be to create the file if it doesn't exist, e.g. something like:
import os
def ensure_file_exists(file_name):
    """Make sure that a file with the given name exists."""
    (the_dir, fname) = os.path.split(file_name)
    if the_dir and not os.path.exists(the_dir):
        os.makedirs(the_dir)  # This may raise an exception if the directory cannot be made.
    if not os.path.exists(file_name):
        open(file_name, 'w').close()
You could even have a safe_open function that did something similar prior to opening for read and returning the file handle.
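One possible shape for such a safe_open function is sketched below. It is an illustration, not the answerer's code. It opens the file in "a" mode to create it, which is an assumption made here because append mode never truncates a file that already exists.

```python
import os
import tempfile


def safe_open(file_name):
    """Open file_name for reading, creating it (and its directory) if missing."""
    the_dir = os.path.dirname(file_name)
    if the_dir and not os.path.isdir(the_dir):
        os.makedirs(the_dir)
    # Mode "a" creates the file when absent and leaves existing data alone.
    open(file_name, "a").close()
    return open(file_name, "r")


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "scores", "Class 1 data.txt")
    with safe_open(path) as src:
        data = src.readlines()
    print(data)
```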
The sample code provided in the question is not very clear, especially because it uses several variables that are not defined anywhere. Based on it, though, here is my suggestion: you can create a function that works like touch followed by open, but is platform agnostic.
def touch_open( filename):
    try:
        connect = open( filename, "r")
    except IOError:
        connect = open( filename, "a")
        connect.close()
        connect = open( filename, "r")
    return connect
This function will open the file for you if it exists. If the file doesn't exist, it will create a blank file with that name and then open it. An additional benefit compared with import os; os.system('touch test.txt') is that it does not spawn a child process in the shell, which makes it faster.
Since it doesn't use the with open(filename) as src syntax, you should either remember to close the connection at the end with connection = touch_open( filename); connection.close(), or preferably iterate over it in a for loop. Example:
file2open = "test.txt"
for i, row in enumerate( touch_open( file2open)):
    print i, row, # print the line number and content
This option should be preferred to data = src.readlines() followed by enumerate( data), found in your code, because it avoids looping twice through the file.
I am trying to upload file from windows server to a unix server (basically trying to do FTP). I have used the code below
#!/usr/bin/python
import ftplib
import os
filename = "MyFile.py"
ftp = ftplib.FTP("xx.xx.xx.xx")
ftp.login("UID", "PSW")
ftp.cwd("/Unix/Folder/where/I/want/to/put/file")
os.chdir(r"\\windows\folder\which\has\file")
ftp.storbinary('RETR %s' % filename, open(filename, 'w').write)
I am getting the following error:
Traceback (most recent call last):
  File "Windows\folder\which\has\file\MyFile.py", line 11, in <module>
    ftp.storbinary('RETR %s' % filename, open(filename, 'w').write)
  File "windows\folder\Python\lib\ftplib.py", line 466, in storbinary
    buf = fp.read(blocksize)
AttributeError: 'builtin_function_or_method' object has no attribute 'read'
Also, all the contents of MyFile.py got deleted.
Can anyone advise what is going wrong? I have read that ftp.storbinary is used for uploading files using FTP.
If you are trying to store a non-binary file (like a text file), open it in read mode instead of write mode and pass the file object itself:
ftp.storlines("STOR " + filename, open(filename, 'rb'))
For a binary file (anything that cannot be opened in a text editor), open your file in read-binary mode:
ftp.storbinary("STOR " + filename, open(filename, 'rb'))
Also, if you plan on using the ftp lib, you should probably go through a tutorial. I'd recommend this article from effbot.
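The two mistakes in the question are easy to mix up: the retr* methods take a callback that receives data, whereas the stor* methods take an open file object that they read from, and opening the local file with 'w' truncates it before anything is sent. A sketch of an upload helper follows. `upload_file` and `RecordingFTP` are hypothetical names, and `RecordingFTP` only stands in for ftplib.FTP so that the example runs without a server.

```python
import os
import tempfile


def upload_file(ftp, local_path):
    """Upload local_path through an ftplib.FTP-like object, keeping its base name."""
    remote_name = os.path.basename(local_path)
    # Open for reading in binary mode; "w" would truncate the local file.
    with open(local_path, "rb") as fh:
        # storbinary expects a file object with read(), not a write callback.
        ftp.storbinary("STOR " + remote_name, fh)
    return remote_name


class RecordingFTP:
    """Offline stand-in that records what would have been uploaded."""

    def __init__(self):
        self.uploads = {}

    def storbinary(self, cmd, fp, blocksize=8192):
        self.uploads[cmd] = fp.read()


with tempfile.TemporaryDirectory() as workdir:
    local = os.path.join(workdir, "MyFile.py")
    with open(local, "wb") as fh:
        fh.write(b"print('hello')\n")
    ftp = RecordingFTP()
    upload_file(ftp, local)
    print(ftp.uploads)
```

With a real connection, the `ftp` argument would be the logged-in ftplib.FTP instance from the question.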
Combined both suggestions. Final answer being
#!/usr/bin/python
import ftplib
import os
filename = "MyFile.py"
ftp = ftplib.FTP("xx.xx.xx.xx")
ftp.login("UID", "PSW")
ftp.cwd("/Unix/Folder/where/I/want/to/put/file")
os.chdir(r"\\windows\folder\which\has\file")
myfile = open(filename, 'rb')  # binary mode; Python 3's storlines requires it
ftp.storlines('STOR ' + filename, myfile)
myfile.close()
Try keeping the file object in a variable, so you can close it at the end of the operation. Open it for reading and pass the object itself rather than its write method:
myfile = open(filename, 'rb')
ftp.storbinary('STOR %s' % filename, myfile)
and at the end of the transfer
myfile.close()
This might not solve the problem, but it may help.
ftplib supports the use of context managers, so you can make it even simpler:
with ftplib.FTP('ftp_address', 'user', 'pwd') as ftp, open(file_path, 'rb') as file:
    ftp.storbinary(f'STOR {file_path.name}', file)  # file_path is assumed to be a pathlib.Path
    ...
This way both the file and the FTP connection are closed even if something goes wrong, without having to write try/except/finally blocks. And, well, it's pythonic.
PS: since it uses f-strings, this works on Python >= 3.6 only, but it can easily be modified to use the older .format() syntax.