My application is keeping watch on a set of folders where users can upload files. When a file upload is finished I have to process the file, but I don't know how to detect that a file has not finished uploading yet.
Is there any way to detect that a file has not yet been released by the FTP server?
There's no generic solution to this problem.
Some FTP servers lock the file being uploaded, preventing you from accessing it while the upload is still in progress. For example, the IIS FTP server does that. Most other FTP servers do not. See my answer at Prevent file from being accessed as it's being uploaded.
There are some common workarounds to the problem (originally posted in SFTP file lock mechanism, but relevant for the FTP too):
You can have the client upload a "done" file once the upload finishes. Make your automated system wait for the "done" file to appear.
You can have a dedicated "upload" folder and have the client (atomically) move the uploaded file to a "done" folder. Make your automated system look to the "done" folder only.
Have a file naming convention for files being uploaded (".filepart") and have the client (atomically) rename the file to its final name after the upload. Make your automated system ignore the ".filepart" files. (A client-side sketch of this convention follows this list.)
See (my) article Locking files while uploading / Upload to temporary file name for an example of implementing this approach.
Also, some FTP servers have this functionality built-in. For example ProFTPD with its HiddenStores directive.
A gross hack is to periodically check the file attributes (size and time) and consider the upload finished if the attributes have not changed for some time interval.
You can also make use of the fact that some file formats have a clear end-of-file marker (like XML or ZIP), so you can tell whether the file is still incomplete.
Some FTP servers allow you to configure a hook to be called, when an upload is finished. You can make use of that. For example ProFTPD has a mod_exec module (see the ExecOnCommand directive).
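For illustration, here is a minimal client-side sketch of the ".filepart" convention using ftplib; the temporary suffix and names are just examples, not a fixed API:

from ftplib import FTP

def upload_atomically(ftp: FTP, local_path: str, remote_name: str) -> None:
    temp_name = remote_name + ".filepart"
    with open(local_path, "rb") as f:
        ftp.storbinary(f"STOR {temp_name}", f)   # upload under the temporary name
    ftp.rename(temp_name, remote_name)           # the watcher ignores *.filepart files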
I use ftputil to implement this workaround (a rough sketch follows the steps below):
connect to ftp server
list all files of the directory
call stat() on each file
wait N seconds
For each file: call stat() again. If the result differs, skip this file, since it was modified during the wait and is probably still being uploaded.
If the stat() result is unchanged, download the file.
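A minimal sketch of these steps with ftputil; the host, credentials, directory and wait time are placeholders, and note that ftputil normally caches stat results, so the cache is disabled here to force fresh values:

import time
import ftputil

WAIT_SECONDS = 30  # assumption: how long the attributes must stay unchanged

with ftputil.FTPHost("ftp.example.com", "user", "password") as host:
    host.stat_cache.disable()  # always ask the server instead of reusing cached stat results
    remote_dir = "/incoming"
    names = host.listdir(remote_dir)
    first = {name: host.stat(host.path.join(remote_dir, name)) for name in names}

    time.sleep(WAIT_SECONDS)

    for name in names:
        path = host.path.join(remote_dir, name)
        second = host.stat(path)
        if (second.st_size, second.st_mtime) != (first[name].st_size, first[name].st_mtime):
            continue  # still changing, probably still being uploaded; skip this run
        host.download(path, name)  # unchanged for WAIT_SECONDS, treat the upload as finished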
This whole ftp-fetching is old and obsolete technology. I hope that the customer will use a modern http API the next time :-)
If you are reading files of particular extensions, use WinSCP for the transfer. It creates a temporary file with the extension .filepart and renames it to the actual file name once the file has been fully transferred.
I hope it helps someone.
This is a classic problem with FTP transfers. The only mostly reliable method I've found is to send a file, then send a second short "marker" file just to tell the recipient the transfer of the first is complete. You can use a file naming convention and just check for existence of the second file.
You might get fancy and make the content of the second file a checksum of the first file. Then you could verify the first file. (You don't have the problem with the second file because you just wait until file size = checksum size).
And of course this only works if you can get the sender to send a second file.
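As a hedged illustration of the sender side, assuming ftplib and an MD5 checksum as the marker content (the names here are placeholders):

import hashlib
import io
from ftplib import FTP

def upload_with_marker(ftp: FTP, local_path: str, remote_name: str) -> None:
    with open(local_path, "rb") as f:
        data = f.read()
    ftp.storbinary(f"STOR {remote_name}", io.BytesIO(data))
    checksum = hashlib.md5(data).hexdigest().encode("ascii")
    # the receiver only processes remote_name once remote_name + ".md5" exists,
    # and can recompute the checksum to verify the first file
    ftp.storbinary(f"STOR {remote_name}.md5", io.BytesIO(checksum))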
Related
I am working on a Python tool to synchronize files between my local machines and a remote server. When I upload a file to the server the modification time property of that file on the server is set to the time of the upload process and not to the mtime of the source file, which I want to preserve. I am using FTP.storbinary() from the Python ftplib to perform the upload. My question: Is there a simple way to preserve the mtime when uploading or to set it after the upload? Thanks.
Short answer: no. The Python ftplib module offers no option to transport the time of the file. Furthermore, the FTP protocol as defined by RFC 959 has no provision to directly get nor set the mtime of a file. It may be possible on some servers through SITE commands, but this is server dependent.
If your server supports it, you should be able to pass a SITE command with the sendcmd method of a connection object. For example, if the server accepts a special SITE SETDATE filename iso-8601-date-string, you could use:
resp = ftp.sendcmd(f'SITE SETDATE {file_name} {date_string}')
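Putting it together, here is a hedged sketch of an upload followed by that (hypothetical, server-specific) SITE SETDATE command; the function name and the command itself are assumptions, not a standard API:

import os
from datetime import datetime, timezone
from ftplib import FTP, error_perm

def upload_preserving_mtime(ftp: FTP, local_path: str, remote_name: str) -> None:
    with open(local_path, "rb") as f:
        ftp.storbinary(f"STOR {remote_name}", f)
    mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    try:
        ftp.sendcmd(f"SITE SETDATE {remote_name} {mtime.isoformat()}")  # hypothetical command
    except error_perm:
        pass  # the server does not understand it; the mtime stays as the upload time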
My Flask application will allow the upload of large files (up to 100 MB) to my server. I was wondering how Flask manages the chunked file if the client decides to stop the upload halfway. I read the documentation about file uploads but wasn't able to find that mentioned.
Does Flask automatically delete the file? How can it know that the user won't retry it? Or do I have to manually delete the aborted files in the temporary folder?
Werkzeug (the library that Flask uses for many tasks including this one) uses a tempfile.TemporaryFile object to receive the WSGI file stream when uploading. The object automatically manages the open file.
The file is immediately deleted on disk; there is no entry in the directory table anymore, but the process retains a file handle.
When the TemporaryFile object is cleared (no references remain, usually because the request ended), the file object is closed and the operating system clears the disk space used.
As such, the file data is deleted when a request is aborted.
Flask does not handle the case where a user uploads the file again; there is no standard way to handle that anyway. You'd have to come up with your own solution there.
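For illustration, a minimal Flask upload endpoint (the route, field name, target path and size limit are assumptions). If the client aborts mid-upload, the save() call is never reached, so the spooled temporary data is simply discarded with the request:

from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["MAX_CONTENT_LENGTH"] = 100 * 1024 * 1024  # refuse requests larger than 100 MB

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["file"]  # a FileStorage backed by Werkzeug's temporary file
    f.save("/tmp/uploads/" + secure_filename(f.filename))  # only now is a permanent file created
    return "ok"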
I was wondering if it was possible to take the newest files uploaded to an ftp server and send them to another ftp server. BUT, every file can only be sent once. If you can do this in python that would be nice, I know intermediate python. EXAMPLE:
2:14 PM: file.txt is uploaded to the server. The program takes the file and sends it to another server.
2:15 PM: example.txt is uploaded to the server. The program takes just that file and sends it to another server.
I have searched online for this but can't find anything. Please help!
As you said that you already know Python, I will give you some conceptual hints. Basically, you are looking for a one-way synchronisation. The main problem with this task is making your program detect new files. The simplest way to do this is to create a database (note that by database I mean a way of storing data, not necessarily a specialized database), for example a text file. In this database, each file is recorded. Periodically, compare the database with the current directory listing (a basic ls or something similar will do). If new files appear (meaning files that are not in the database), transfer them.
This is the basic idea. You can improve it by using multithreading, adding checks for whether a file has been modified, and so on.
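A rough sketch of that idea with ftplib, assuming a text file as the "database" and a simple polling loop; hosts, credentials, paths and the interval are all placeholders:

import io
import time
from ftplib import FTP

SEEN_DB = "seen_files.txt"  # simple text-file "database" of already-forwarded names

def load_seen():
    try:
        with open(SEEN_DB) as f:
            return set(f.read().split())
    except FileNotFoundError:
        return set()

def remember(name):
    with open(SEEN_DB, "a") as f:
        f.write(name + "\n")

while True:
    with FTP("source.example.com") as src, FTP("dest.example.com") as dst:
        src.login("user", "password")
        dst.login("user", "password")
        seen = load_seen()
        for name in src.nlst():  # list the source directory
            if name in seen:
                continue  # every file is forwarded only once
            buf = io.BytesIO()
            src.retrbinary(f"RETR {name}", buf.write)  # fetch the new file
            buf.seek(0)
            dst.storbinary(f"STOR {name}", buf)        # forward it to the other server
            remember(name)
    time.sleep(60)  # poll once a minute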
EDIT: This is a programming way. As it has been suggested in comments, there are also some software solutions that will do this for you.
I'm writing a Python-based service that scans a specified drive for files changes and backs them up to a storage service. My concern is handling files which are open and being actively written to (primarily database files).
I will be running this cross-platform so Windows/Linux/OSX.
I do not want to have to tinker with volume shadow copy services. I am perfectly happy with throwing a notice to the user/log that a file had to be skipped or even retrying a copy operation x number of times in the event of an intermittent write lock on a small document or similar type of file.
Successfully copying out a file in an inconsistent state and not failing would certainly be a Bad Thing(TM).
The users of this service will be able to specify the path(s) they want backed-up so I have to be able to determine at runtime what to skip.
I am thinking I could just identify any file which has a read/write handle and try to obtain exclusive access to it during the archival process, but I think this might be too intrusive(?) if the user was actively using the system.
Ideas?
You could watch for the file being closed and archive it then. The pyinotify library allows you to watch given files or directories for a number of events, including IN_CLOSE_WRITE, which lets you detect files that have been closed after being written to.
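A minimal pyinotify sketch (Linux-only; the watched path is a placeholder):

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        print("closed after writing:", event.pathname)  # archive/back up the file here

wm = pyinotify.WatchManager()
wm.add_watch("/path/to/watch", pyinotify.IN_CLOSE_WRITE, rec=True)  # rec=True watches subdirectories
pyinotify.Notifier(wm, Handler()).loop()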
How can I download files from a website using wildcards in Python? I have a site that I need to download files from periodically. The problem is that the filenames change each time, though a portion of the filename stays the same. How can I use a wildcard to specify the unknown portion of the filename in a URL?
If the filename changes, there must still be a link to the file somewhere (otherwise nobody would ever guess the filename). A typical approach is to get the HTML page that contains a link to the file, search through that looking for the link target, and then send a second request to get the actual file you're after.
Web servers do not generally implement such a "wildcard" facility as you describe, so you must use other techniques.
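For example, a hedged sketch that scrapes the page for a link matching a known pattern and then fetches it; the URL and the pattern are made up:

import re
import urllib.request

page_url = "https://example.com/downloads/"
html = urllib.request.urlopen(page_url).read().decode("utf-8", errors="replace")

# e.g. links like report-20240131.zip where only the date part changes
match = re.search(r'href="(report-\d+\.zip)"', html)
if match:
    file_name = match.group(1)
    urllib.request.urlretrieve(page_url + file_name, file_name)  # second request for the actual file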
You could try logging into the ftp server using ftplib.
From the python docs:
from ftplib import FTP
ftp = FTP('ftp.cwi.nl') # connect to host, default port
ftp.login() # user anonymous, passwd anonymous@
The ftp object has a dir method that lists the contents of a directory.
You could use this listing to find the name of the file you want.
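For example, a small sketch that lists a directory and filters the names with a shell-style wildcard; the host, path and pattern are placeholders:

import fnmatch
from ftplib import FTP

with FTP("ftp.example.com") as ftp:
    ftp.login()  # anonymous login
    names = ftp.nlst("/pub/reports")  # like dir(), but returns the names only
    for name in fnmatch.filter(names, "*report-*.zip"):
        local_name = name.rsplit("/", 1)[-1]
        with open(local_name, "wb") as f:
            ftp.retrbinary(f"RETR {name}", f.write)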