Windows - file opened by another process, still can rename it in Python

On Windows, just before doing some actions on my file, I need to know whether it's in use by another process. After some serious research across all the other questions with a similar problem, I wasn't able to find a working solution.
os.rename('my_file.csv', 'my_file.csv') still works even if I have the file opened with, let's say, Notepad.
psutil ... it took too much time, and it doesn't work (it can't find my file path in nt.path):
import psutil

for proc in psutil.process_iter():
    try:
        flist = proc.open_files()
        if flist:
            for nt in flist:
                if my_file_path == nt.path:
                    print("it's here")
    except psutil.NoSuchProcess as err:
        print(err)
Is there any other solution for this?
UPDATE 1
I have to do two actions on this file: 1. check that the filename matches a pattern; 2. copy it over SFTP.
UPDATE 2 + solution
Thanks to @Eryk Sun, I found out that Notepad "reads the contents into memory and then closes the handle". With the file opened in Word instead, os.rename and psutil are working like a (py)charm.
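For illustration, here is a minimal sketch of the rename-based check (the helper name is my own; it only flags programs that actually keep a handle open, which Notepad does not):

import os

def is_locked(path):
    # Renaming a file to itself fails on Windows when another process
    # holds an open handle without FILE_SHARE_DELETE.
    try:
        os.rename(path, path)
        return False
    except OSError:
        return True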

If the program you use opens the file by importing it (as Excel would do, for example), it transforms your data into a readable form for itself without keeping a hold on the actual file afterwards. If you save the file from there, it either saves it in the program's own format or exports (and transforms) the file back.
What do you want to do with the file? Maybe you can simply copy it?
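As a rough sketch of that idea (file names are placeholders), copying generally succeeds even while another application has the file open, since most editors open files with shared read access:

import shutil

# Copies data and metadata; works while most editors hold the file open.
shutil.copy2('my_file.csv', 'my_file_copy.csv')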

Related

How to check that a file has been saved to hard disk during TCP sending?

I send many files over TCP from a PC (Windows) to a server (Linux).
When I process files on the server I sometimes get an error because a file is corrupted or has zero size: it is still being saved to the hard disk.
I process files in Python, grabbing them like this:
import os
from glob import glob

file_list = sorted(glob('*.bin'))
for file in file_list:
    file_size = os.path.getsize(file)
    if file_size > min_file_size:
        do_process(file)
How do I do this properly, i.e. make sure that a file is complete?
I can't choose the right min_file_size, since the files have different sizes.
Maybe I should copy them to another folder and then process them?
** I'm using SCP to copy the files. So on the server side, how can I be sure (some Linux hints?) that a file is complete before moving it to the directory that will be processed? Sometimes by typing ls I see files which are not fully sent yet... so how can I rename them?
You can use the fuser command to check whether the file is currently being accessed by any process, as follows:
import subprocess
...
file_list = sorted(glob('*.bin'))
for file in file_list:
    result = subprocess.run(['fuser', '--silent', file])
    if result.returncode != 0:
        do_process(file)
The fuser command terminates with a non-zero return code if the file is not being accessed.
This has nothing to do with TCP. You are basically asking how to synchronize two processes so that if one writes the file, the other will only use it once it has been completely written and closed.
One common way is to let the first process (the writer) use a temporary file name which is not expected by the second process (the reader), and to rename the file to the expected name after it has been closed. Other ways involve file locking. One can also have communication between the two processes (like a pipe or socketpair) which explicitly informs the reader once the writer has finished and which file was written.
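A minimal sketch of the rename approach on the writer side (names and contents are placeholders; the rename is atomic as long as both names are on the same filesystem):

import os

payload = b'example data'       # placeholder for the real file contents
tmp_name = 'data.bin.part'      # a name the reader's glob('*.bin') ignores
final_name = 'data.bin'

# Write everything under the temporary name, then rename once the
# file is flushed and closed, so the reader only ever sees complete files.
with open(tmp_name, 'wb') as f:
    f.write(payload)
os.rename(tmp_name, final_name)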

Strange race condition: FileNotFoundError with mkdtemp

I am getting a rather bizarre race condition on Mac OS X with Python (I've only tested Python 3.3). I am making several temporary directories, writing things to them, and then clearing them. Something along the lines of:
while running:
    (do something)
    tempdir = mkdtemp('name')
    try:
        (write some stuff to tempdir)
    finally:
        shutil.rmtree(tempdir)
However, in some of the later iterations of the (write some stuff to tempdir) step, I get errors like:
with open(os.path.join("/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmp0fh2ztname", "file"), 'w', encoding='utf-8') as fn:
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpups5dpname/file'
(I've inlined the temp dir path for clarity)
Notice how the path being opened is not the same as the path that it can't find. In each case, the path in the error message is the temporary directory from the previous iteration of the loop.
The error is reproducible most of the time in the same place (after about the fourth iteration), but not every time.
EDIT: I just realized this is probably relevant. The (write some stuff to tempdir) part actually happens in a subprocess. That is how I can be sure about the tempdir path: I have to pass it on to the subprocess (I actually lied about the "clarity" bit; I am actually writing out a Python file with that exact with open line). This is how I know for sure that the tempdir path is indeed different from the one being used.
I figured it out. It turns out it has nothing to do with mkdtemp (a sigh of relief that Mac OS X and Python are doing the right things there).
The problem is that I was writing out the code to a file, including the with open(os.path.join("/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmp0fh2ztname", "file"), 'w', encoding='utf-8') as fn: bit, and running it in a subprocess. The issue was that I was using the same file each time, and the .pyc files were not being invalidated correctly.
The error message was confusing because when Python generates a traceback, it reads the .py file (where the actual code is), but what is actually run is the .pyc file.
If I understand http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html correctly, the timestamps in .pyc files only have one-second granularity (this explains why it was reproducible in the same place each time: the same fourth iteration of the loop ran in under a second).
The solution was to explicitly delete the .pyc files when writing out the file (in other circumstances you could also write out to a temp file itself, but in my case I needed the file to be importable under the same name).
Something along the lines of
import os
import sys

if sys.version_info >= (3,):
    os.unlink(os.path.join(path_to_file, '__pycache__',
                           'file.cpython-%s%s.pyc' % sys.version_info[:2]))
    os.unlink(os.path.join(path_to_file, '__pycache__',
                           'file.cpython-%s%s.pyo' % sys.version_info[:2]))
else:
    os.unlink(os.path.join(path_to_file, 'file.pyc'))
    os.unlink(os.path.join(path_to_file, 'file.pyo'))

Python: Detect directory that cannot be deleted in Windows 7

I am trying to write a detector that checks whether a certain directory can be deleted with shutil.rmtree. I have the partial code below, which now partially works.
It gives a warning when any .exe file under the target folder is still running, but it is not yet able to flag a warning when a particular file under the folder is opened by an editor (another cause that makes a directory undeletable). Any guidance will be appreciated. Thanks in advance.
Note: I've used the open method to check for locked files.
import os
import sys

def list_locked_files(dir):
    isLocked = False
    for name in os.listdir(dir):
        uni_name = unicode(name)
        fullname = dir + u'/' + uni_name
        if os.path.isdir(fullname):
            list_locked_files(fullname)
        else:
            try:
                # opening for update fails if another process has locked the file
                f = open(fullname, 'r+')
                f.close()
            except IOError:
                print fullname + u' is locked!'
                isLocked = True
    if isLocked is True:
        print u'Please close the files/dir above !'
        sys.exit(0)
It is not necessarily possible to determine whether a file deletion will succeed or fail on Windows. The file could be opened in a fully permissive share mode, which means another attempt to open it will succeed no matter what kind of access you request.
The only way to tell whether a file can be deleted is to actually try it.
Even if there were an accurate way to tell beforehand, the information would be out of date the instant you got it. For example, after you call list_locked_files, a program could open another file in that directory, which would cause rmtree() to fail.
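A minimal sketch of that try-it-and-see approach (the helper name is my own):

import shutil

def try_delete(path):
    # EAFP: attempt the deletion and report the failure, instead of
    # trying to predict beforehand whether it would succeed.
    try:
        shutil.rmtree(path)
        return True
    except OSError as err:
        print('Cannot delete %s: %s' % (path, err))
        return False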

Reading updated files on the fly in Python

I'm writing two Python scripts that both parse files. One parses a standard Unix logfile and the other a binary file. I'm trying to figure out the best way to monitor these so I can read data as soon as they're updated. Most of the solutions I've found so far are Linux-specific, but I need this to work on FreeBSD.
Obviously one approach would be to just run my script every X seconds, but that seems gross and inefficient. If I want my Python app running continuously in the background, monitoring a file and acting on it once it's changed/updated, what's my best bet?
Have you tried KQueue events?
http://docs.python.org/library/select.html#kqueue-objects
kqueue is the FreeBSD / OS X version of inotify (a file change notification service). I haven't used this, but I think it's what you want.
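For illustration, a minimal kqueue sketch, assuming you watch a single logfile for writes ('logfile.log' is a placeholder path):

import os
import select

fd = os.open('logfile.log', os.O_RDONLY)
kq = select.kqueue()
watch = select.kevent(fd,
                      filter=select.KQ_FILTER_VNODE,
                      flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                      fflags=select.KQ_NOTE_WRITE | select.KQ_NOTE_EXTEND)
while True:
    # block until the file is written to or grows, then react
    for event in kq.control([watch], 1, None):
        print('logfile.log changed; re-read the new data here')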
I once made a sort of daemon process for a parser built in Python. I needed to watch a series of files and process them in Python, and it had to be a truly multi-OS solution (Windows & Linux in this case). I wrote a program that watches a list of files by checking their modification times. The program sleeps for a while and then checks the modification times of the files being watched. If a modification time is newer than the one previously registered, then the file has changed and stuff has to be done with it.
Something like this:
import os
import time

path = os.path.dirname(__file__)
print "Looking for files in", path, "..."

# get interesting files
files = [{"file": f} for f in os.listdir(path)
         if os.path.isfile(f) and os.path.splitext(f)[1].lower() == ".src"]
for f in files:
    f["output"] = os.path.splitext(f["file"])[0] + ".out"
    f["modtime"] = os.path.getmtime(f["file"]) - 10
    print " watching:", f["file"]

while True:
    # sleep for a while
    time.sleep(0.5)
    # check if anything changed
    for f in files:
        # is the file's mod time newer than the one registered?
        if os.path.getmtime(f["file"]) > f["modtime"]:
            # store the new time and...
            f["modtime"] = os.path.getmtime(f["file"])
            print f["file"], "has changed..."
            # do your stuff here
It does not look like top notch code, but it works quite well.
There are other SO questions about this, but I don't know if they'll provide a direct answer to your question:
How to implement a pythonic equivalent of tail -F?
How do I watch a file for changes?
How can I "watch" a file for modification / change?
Hope this helps!

Delete file from zipfile with the ZipFile Module

The only way I came up with for deleting a file from a zipfile was to create a temporary zipfile without the file to be deleted and then rename it to the original filename.
In Python 2.4 the ZipInfo class had an attribute file_offset, so it was possible to create a second zip file and copy the data over without decompressing/recompressing.
This file_offset is missing in Python 2.6, so is there any option other than creating another zipfile by decompressing every file and then recompressing it?
Is there maybe a direct way of deleting a file inside the zipfile? I searched and didn't find anything.
The following snippet worked for me (deletes all *.exe files from a Zip archive):
import zipfile

zin = zipfile.ZipFile('archive.zip', 'r')
zout = zipfile.ZipFile('archive_new.zip', 'w')
for item in zin.infolist():
    buffer = zin.read(item.filename)
    if item.filename[-4:] != '.exe':
        zout.writestr(item, buffer)
zout.close()
zin.close()
If you read everything into memory, you can eliminate the need for a second file. However, this snippet recompresses everything.
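A rough sketch of that in-memory variant (it still recompresses, and the file names are placeholders):

import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile('archive.zip', 'r') as zin, \
        zipfile.ZipFile(buf, 'w') as zout:
    for item in zin.infolist():
        if not item.filename.endswith('.exe'):
            zout.writestr(item, zin.read(item.filename))
# Only after the whole archive has been rebuilt in memory,
# overwrite the original file.
with open('archive.zip', 'wb') as f:
    f.write(buf.getvalue())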
On closer inspection, ZipInfo.header_offset is the offset from the file start. The name is misleading, but the main Zip header (the central directory) is actually stored at the end of the file. My hex editor confirms this.
So the problem you'll run into is the following: you need to delete the directory entry in the main header as well, or it will point to a file that doesn't exist anymore. Leaving the main header intact might work if you also keep the local header of the file you're deleting, but I'm not sure about that. How did you do it with the old module?
Without modifying the main header I get an error "missing X bytes in zipfile" when I open the archive. This might help you find out how to modify the main header.
Not very elegant but this is how I did it:
import subprocess
import zipfile
z = zipfile.ZipFile(zip_filename)
files_to_del = [f for f in z.namelist() if f.endswith('.exe')]
cmd = ['zip', '-d', zip_filename] + files_to_del
subprocess.check_call(cmd)
# reload the modified archive
z = zipfile.ZipFile(zip_filename)
The routine delete_from_zip_file from ruamel.std.zipfile¹ allows you to delete a file based on its full path within the ZIP, or based on (re) patterns. E.g. you can delete all of the .exe files from test.zip using
from ruamel.std.zipfile import delete_from_zip_file
delete_from_zip_file('test.zip', pattern='.*.exe')
(please note the dot before the *).
This works similarly to mdm's solution (including the need for recompression), but recreates the ZIP file in memory (using the class InMemZipFile()), overwriting the old file after it has been fully read.
¹ Disclaimer: I am the author of that package.
Based on Elias Zamaria's comment on the question:
Having read through Python issue #51067, I want to give an update on it.
As of today a solution already exists, though it has not been merged into Python due to a missing Contributor Agreement from the author.
Nevertheless, you can take the code from https://github.com/python/cpython/blob/659eb048cc9cac73c46349eb29845bc5cd630f09/Lib/zipfile.py and create a separate file from it. After that, just reference it from your project instead of the built-in Python library: import myproject.zipfile as zipfile.
Usage:
with zipfile.ZipFile("archive.zip", "a") as z:
    z.remove("firstfile.txt")
I believe it will be included in future Python versions. For me it works like a charm for the given use case.
