Use a timeout to prevent deadlock when opening a file in Python?

I need to open a file that is NFS-mounted on my server. Sometimes the NFS mount fails in a manner that causes all file operations to deadlock. To prevent this, I need a way to let the open function in Python time out after a set period, e.g. something like open('/nfsdrive/foo', timeout=5). Of course, the built-in open has no timeout or similar keyword argument.
Does anyone here know of a way to effectively stop trying to open a (local) file if the opening takes too long?
Note: I've already tried the urllib2 module, but its timeout option only works for web requests, not local file operations.

You can try using stopit
from stopit import SignalTimeout as Timeout

with Timeout(5.0) as timeout_ctx:
    with open('/nfsdrive/foo', 'r') as f:
        # do something with f
        pass
There may be some issues with SignalTimeout in multithreaded environments (like Django). ThreadingTimeout, on the other hand, may cause resource problems in some virtual hosting environments if you run too many "time-limited" functions.
P.S. My example also limits the processing time of the opened file. To limit only the file opening, you need a different approach with manual file opening/closing and manual exception handling; a sketch follows below.
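For completeness, here is a minimal sketch of that "limit only the open" idea using the standard-library signal module instead of stopit (Unix only, main thread only; the path is the one from the question):

import signal

class OpenTimeoutError(Exception):
    pass

def _raise_timeout(signum, frame):
    raise OpenTimeoutError("open() took too long")

signal.signal(signal.SIGALRM, _raise_timeout)
signal.alarm(5)                      # deliver SIGALRM after 5 seconds
try:
    f = open('/nfsdrive/foo', 'r')   # this is the only call that is time-limited
finally:
    signal.alarm(0)                  # cancel the alarm whether or not open() returned

try:
    data = f.read()                  # later reads/writes are not time-limited
finally:
    f.close()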

Related

How to detect files in a directory if the files have finished copying/adding? [duplicate]

Files are being pushed to my server via FTP. I process them with PHP code in a Drupal module. The OS is Ubuntu and the FTP server is vsftpd.
At regular intervals I will check for new files, process them with SimpleXML and move them to a "Done" folder. How do I avoid processing a partially uploaded file?
vsftpd has lock_upload_files defaulted to yes. I thought of attempting to move the files first, expecting the move to fail on a file that is still uploading. That doesn't seem to happen, at least on the command line: if I start uploading a large file and move it, it just keeps growing in the new location. I guess the directory entry is not locked.
Should I try fopen with mode 'a' or 'r+' just to see if it succeeds before attempting to load into SimpleXML or is there a better way to do this? I guess I could just detect SimpleXML load failing but... that seems messy.
I don't have control of the sender. They won't do an upload and rename.
Thanks
The lock_upload_files configuration option of vsftpd makes it lock files with the fcntl() function. This places advisory locks on uploaded files that are still in progress. Other programs are not required to honour advisory locks, and mv, for example, does not: advisory locks are just an advisory for programs that choose to check for them.
You need another command-line tool, such as lockrun, which respects advisory locks.
Note: lockrun must be compiled with the WAIT_AND_LOCK(fd) macro using lockf() rather than flock() in order to work with locks set by fcntl() under Linux. So when lockrun is compiled to use lockf(), it will cooperate with the locks set by vsftpd.
With these pieces (lockrun, mv, lock_upload_files) you can build a shell script or similar that moves files one by one, checking beforehand whether the file is locked and holding an advisory lock on it while it is moved. If the file is locked by vsftpd, lockrun can skip the call to mv so that in-progress uploads are left alone.
If locking doesn't work, I don't know of a solution as clean/simple as you'd like. You could make an educated guess by not processing files whose last modified time (which you can get with filemtime()) is within the past x minutes.
If you want a higher degree of confidence than that, you could check and store each file's size (using filesize()) in a simple database, and every x minutes check new size against its old size. If the size hasn't changed in x minutes, you can assume nothing more is being sent.
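That answer describes PHP's filemtime()/filesize(); as a rough sketch of the same size-stability idea in Python, the document's main language (the directory, poll interval, and the dictionary standing in for the "simple database" are all made up):

import os
import time

WATCH_DIR = "/var/ftp/incoming"   # hypothetical upload directory
POLL_SECONDS = 120                # the "x minutes" from the answer above

last_sizes = {}                   # stands in for the "simple database"

def stable_files():
    """Yield files whose size has not changed since the previous poll."""
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        size = os.path.getsize(path)
        if last_sizes.get(path) == size:
            yield path
        last_sizes[path] = size

while True:
    # A real script would also move or record files it has already handled.
    for path in stable_files():
        print("probably finished uploading:", path)
    time.sleep(POLL_SECONDS)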
The Linux lsof command lists open files on your system. I suggest executing it with shell_exec() from PHP and parsing the output to see which files are still being used by your FTP server.
Picking up on the previous answer, you could copy the file and then, at a fixed interval, compare the sizes of the copy and the original.
If the sizes match, the upload is done: delete the copy and work with the file.
If the sizes do not match, copy the file again and repeat.
Here's another idea: create a super (but hopefully not root) FTP user that can access some or all of the upload directories. Instead of your PHP code reading uploaded files right off the disk, make it connect to the local FTP server and download the files. This way vsftpd handles the locking for you (assuming you leave lock_upload_files enabled). You'll only be able to download a file once vsftpd releases the exclusive/write lock (once writing is complete).
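A rough sketch of that download-through-FTP idea, written in Python rather than PHP (the account name, password, and paths are hypothetical); per the answer above, RETR is expected to fail while vsftpd still holds its upload lock:

import os
from ftplib import FTP, error_perm

ftp = FTP("localhost")
ftp.login("reader", "secret")        # hypothetical read-only account
ftp.cwd("/uploads")

for name in ftp.nlst():
    try:
        with open(os.path.join("/tmp/staging", name), "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
    except error_perm:
        continue                     # presumably still locked by a running upload
    print("downloaded", name)

ftp.quit()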
You mentioned trying flock in your comment (and how it fails). It does indeed seem painful to try to match whatever locking vsftpd is doing, but dio_fcntl might be worth a shot.
I guess you've solved your problem years ago, but still: if you use some pattern to find the files you need, you can ask the party uploading the files to use a different name and rename each file once its upload has completed.
You should also check the HiddenStores directive in ProFTPD; more info here:
http://www.proftpd.org/docs/directives/linked/config_ref_HiddenStores.html

Intentionally cause a read/write timeout?

I'm trying to test some file I/O and I was wondering if there's a way to emulate the following situation:
I have a block-storage device that is constantly being read from and written to, but I want to show users the proper error when they try to read or write a file stored on that device and the block-storage service/device becomes unavailable or detached mid-write, in which case the read or write call would "time out" or "hang".
I'm trying to write a test case that reads a file, and I want to emulate that situation as closely as possible. I don't want to use signal or just some timeout; I want to be able to make some kind of file that will hang a Python file.read() or file.write() call.
Is this possible? I'm testing on a Linux machine and mounting a block-storage device to a folder, pretty simple.
It seems to me that fsdisk is the right tool you're looking for. It can bind your storage and inject errors.

Python: Two script working with same file , one updating it another deleting the data when processed

Firstly, I am new to Python.
Now my question goes like this: I have a callback script running on a remote machine which sends some data and runs a script on my local machine, which processes that data and writes it to a file. Now another local script of mine needs to process the file's data one entry at a time and delete each entry from the file when it is done. The problem is that the file may be updated continuously. How do I synchronize the work so that it doesn't mess up my file?
Also, please suggest whether the same work can be done in some better way.
I would suggest looking into named pipes or sockets, which seem better suited to your purpose than a file, if it's really just between those two applications and you have control over the source code of both.
For example, on Unix, you could create a named pipe (see os.mkfifo):
import os
os.mkfifo("/some/unique/path")
And then access it like a file:
dest = open("/some/unique/path", "w") # on the sending side
src = open("/some/unique/path", "r") # on the reading side
The data will be queued between your processes. It's first-in-first-out, really, but it behaves (mostly) like a file.
If you cannot go for named pipes like this, I'd suggest using IP sockets over localhost from the socket module, preferably DGRAM sockets, as you then don't need to do any connection handling. You seem to know how to do networking already.
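A minimal sketch of the DGRAM idea (the port number is arbitrary); in practice the two socket halves would live in your two separate scripts:

import socket

ADDR = ("127.0.0.1", 50007)          # arbitrary localhost port for this sketch

# Receiving side: bind once, then read one datagram per message.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(ADDR)

# Sending side: no connection handling, just fire a datagram at the address.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"one record of data", ADDR)

data, _ = recv_sock.recvfrom(4096)   # each recvfrom() returns one whole datagram
print(data)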
I would suggest using a database whose transactions allow for concurrent processing.
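That answer doesn't name a database, but as one possible sketch, the standard-library sqlite3 module already gives you transactional inserts and deletes that the two scripts can share (the file name and table are made up):

import sqlite3

conn = sqlite3.connect("/tmp/workqueue.db", timeout=10)   # wait up to 10 s on locks
conn.execute("CREATE TABLE IF NOT EXISTS queue (id INTEGER PRIMARY KEY, payload TEXT)")

# Writer script: each insert is committed as its own transaction.
with conn:
    conn.execute("INSERT INTO queue (payload) VALUES (?)", ("some data",))

# Reader script: fetch one row, process it, and delete it in one transaction.
with conn:
    row = conn.execute("SELECT id, payload FROM queue ORDER BY id LIMIT 1").fetchone()
    if row:
        print("processing", row[1])
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))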

Using Python, how do I close a file in use by another user over a network?

I have a program that creates a bunch of movie files. It runs as a cron job, and every time it runs the movies from the previous iteration are moved to a 'previous' folder so that there is always a previous version to look at.
These movie files are accessed across a network by various users and that's where I'm running into a problem.
When the script runs and tries to move the files it throws a resource busy error because the files are open by various users. Is there a way in Python to force close these files before I attempt to move them?
Further clarification:
JMax is correct when he mentions that it is a server-level problem. I can access our Windows server through Administrative Tools > Computer Management > Shared Folders > Open Files and manually close the files there, but I am wondering whether there is a Python equivalent which will achieve the same result.
Something like this:
import shutil

try:
    shutil.move(src, dst)
except OSError:
    # Close the src file on all machines that are currently accessing it, then try again.
    pass
This question has nothing to do with Python, and everything to do with the particular operating system and file system you're using. Could you please provide these details?
At least on Windows you can use Sysinternals Handle to force a particular handle to a file to be closed. Especially as this file is opened by another user over a network, this operation is extremely destabilising and will probably render the network connection subsequently useless. You're looking for the "-c" command-line argument, where the documentation reads:
Closes the specified handle (interpreted as a hexadecimal number). You
must specify the process by its PID.
WARNING: Closing handles can cause application or system instability.
And if you're force-closing a file mounted over Samba in Linux, speaking from experience this is an excruciating experience in futility. However, others have tried with mixed success; see Force a Samba process to close a file.
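Going back to Handle, here is a rough sketch of driving it from Python; note this is hedged heavily: the output format, the "-p" and "-y" flags, and the file path are assumptions here, so check the Handle documentation before relying on them:

import re
import subprocess

TARGET = r"\\server\movies\previous\clip.mov"   # hypothetical file to free up

# List open handles that mention the target file (handle.exe is assumed to be on PATH).
out = subprocess.run(["handle.exe", TARGET], capture_output=True, text=True).stdout

# Lines are assumed to look roughly like:
#   someapp.exe  pid: 4242  type: File  3C: \\server\movies\previous\clip.mov
for pid, handle in re.findall(r"pid:\s*(\d+)\s+type:\s*File\s+([0-9A-Fa-f]+):", out):
    # -c closes the given (hexadecimal) handle, -p names the owning PID, -y skips the prompt.
    subprocess.run(["handle.exe", "-c", handle, "-p", pid, "-y"])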
As far as I know, you have to end the processes that are accessing the file, at least on Windows.
The .close() method doesn't work on your file object?
See Dive Into Python for more information on file objects.
[EDIT] I've re-read your question. Your problem is that users open the same file over the network and you want them to close it? But can you access their OS?
[EDIT2] The problem is more at the server level: you need to disconnect the users that are accessing the file. See this example for Windows servers.

How do I watch a folder for changes and when changes are done using Python?

I need to watch a folder for incoming files. I did that with the following help:
How do I watch a file for changes?
The problem is that the files being moved are pretty big (10 GB) and I want to be notified when all files are done moving.
I tried comparing the size of the folder every 20 seconds, but the file shows its correct size even though Windows shows that it is still moving.
I am using Python on Windows.
I found a solution using open and waiting for an IO exception: if the file is still being moved, I get errno 13 (permission denied).
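A minimal sketch of that check (the path is made up); whether the open fails while the copy is still running depends on the sharing mode of the copying process, but on Windows a plain open typically raises errno 13 in that situation, as described above:

import errno

def is_done_moving(path):
    """Return True if the file can be opened, i.e. the move has presumably finished."""
    try:
        with open(path, "rb"):
            return True
    except OSError as e:
        if e.errno == errno.EACCES:   # errno 13: still held by the copying process
            return False
        raise

print(is_done_moving(r"C:\incoming\big_video.bin"))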
You should take a look at this link:
http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html
There you can see a comparison of the method you describe (simple polling) with two other Windows-specific techniques which, in my opinion, offer a much better solution to your problem!
Otherwise, if you are using Linux, there's inotify and its Python wrapper:
Pyinotify is a pure Python module used for monitoring filesystem events on Linux platforms through inotify.
Here: http://trac.dbzteam.org/pyinotify
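A minimal pyinotify sketch (the watched path is made up); IN_CLOSE_WRITE and IN_MOVED_TO fire once a file has been fully written or moved into the directory, which is exactly the "done moving" notification asked for:

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        print("finished writing:", event.pathname)

    def process_IN_MOVED_TO(self, event):
        print("moved into the folder:", event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch("/data/incoming", pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO)
pyinotify.Notifier(wm, Handler()).loop()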
If you have control over the process that imports the files, I would put a lock file in place when starting to copy files in, and remove it when you are done. By a lock file I mean a temporary empty file which is just there to indicate that you are copying a file. Then your Python script can check for the existence of the lock files.
You may be able to use os.stat() to monitor the mtime of the file. However, be aware that under various network conditions the copy may stall momentarily, so the mtime is not updated for a few seconds; you need to make allowance for this.
Another option is to try opening the file with exclusive read/write access, which should fail under Windows if the file is still open in the other process.
The most reliable method would be to write your own program to move the files.
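A rough sketch of that mtime check (the path and threshold are made up); the generous quiet period is the "allowance" for momentary stalls mentioned above:

import os
import time

QUIET_SECONDS = 60   # how long the mtime must stay untouched before we trust it

def looks_finished(path):
    """Treat the file as fully copied once its mtime is comfortably in the past."""
    age = time.time() - os.stat(path).st_mtime
    return age >= QUIET_SECONDS

print(looks_finished(r"C:\incoming\big_video.bin"))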
Try checking for a change in the last-modified time instead of the file size during your poll.
