I am working on a Python tool to synchronize files between my local machines and a remote server. When I upload a file to the server the modification time property of that file on the server is set to the time of the upload process and not to the mtime of the source file, which I want to preserve. I am using FTP.storbinary() from the Python ftplib to perform the upload. My question: Is there a simple way to preserve the mtime when uploading or to set it after the upload? Thanks.
Short answer: no. The Python ftplib module offers no option to transfer the modification time of the file. Furthermore, the FTP protocol as defined by RFC 959 has no provision to directly get or set the mtime of a file. It may be possible on some servers through SITE commands, but this is server dependent.
If your server supports it, you should be able to pass a SITE command with the sendcmd method of a connection object. For example, if the server accepts a special SITE SETDATE filename iso-8601-date-string command, you could use:
resp = ftp.sendcmd(f'SITE SETDATE {file_name} {date_string}')
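Besides SITE commands, many servers implement the non-standard MFMT command, which sets a remote file's modification time. The following is a minimal sketch, assuming the server supports MFMT; the helper names format_mfmt_timestamp and upload_preserving_mtime are hypothetical and not part of ftplib.

```python
import os
import time
from ftplib import error_perm


def format_mfmt_timestamp(mtime):
    """Format a POSIX timestamp as YYYYMMDDHHMMSS in UTC, as MFMT expects."""
    return time.strftime("%Y%m%d%H%M%S", time.gmtime(mtime))


def upload_preserving_mtime(ftp, local_path, remote_name):
    """Upload a file, then try to copy its mtime to the server.

    `ftp` is an already logged-in ftplib.FTP object.
    Returns True if the server accepted MFMT, False if it rejected it.
    Assumption: the server implements the MFMT extension.
    """
    with open(local_path, "rb") as f:
        ftp.storbinary(f"STOR {remote_name}", f)
    stamp = format_mfmt_timestamp(os.path.getmtime(local_path))
    try:
        ftp.sendcmd(f"MFMT {stamp} {remote_name}")
    except error_perm:
        # The server answered with a 5xx error: MFMT is not available.
        return False
    return True
```

If the function returns False, the server-specific SITE route described above is the remaining option.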
I have a use case in which I want to read only the top 5 rows of a large CSV file stored on one of my SFTP servers, and I don't want to download the complete file just to read those rows. I am using pysftp in Python to interact with my SFTP server. Is there any way to download only a chunk of the file, instead of the complete file, with pysftp?
If there are any other libraries in Python or any technique I can use, please guide me. Thanks
First, do not use pysftp. It's a dead, unmaintained project. Use Paramiko instead. See pysftp vs. Paramiko.
If you want to read data from a specific point in the file, you can open a file-like object representing the remote file using the Paramiko SFTPClient.open method (or the equivalent pysftp Connection.open) and then use it as if you were accessing data from any local file:
Use .seek to set read pointer to the desired offset.
Use .read to read data.
with sftp.open("/remote/path/file", "r", bufsize=32768) as f:
    f.seek(offset)
    data = f.read(count)
For the purpose of bufsize, see:
Writing to a file on SFTP server opened using Paramiko/pysftp "open" method is slow
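For the "top 5 rows" case specifically, reading line by line from the remote file object avoids having to guess a byte count. A minimal sketch, assuming the CSV is UTF-8 encoded and has no line breaks inside quoted fields; read_first_lines is a hypothetical helper that works with any object offering readline(), including the file returned by SFTPClient.open:

```python
def read_first_lines(remote_file, count):
    """Return up to `count` lines from a file-like object, without line endings."""
    lines = []
    for _ in range(count):
        line = remote_file.readline()
        if not line:
            # End of file reached before `count` lines were read.
            break
        if isinstance(line, bytes):
            # Assumption: the file is UTF-8 encoded.
            line = line.decode("utf-8")
        lines.append(line.rstrip("\r\n"))
    return lines


# Usage with an already connected paramiko.SFTPClient named `sftp`
# (the path is a placeholder):
#
# with sftp.open("/remote/path/file.csv", "r", bufsize=32768) as f:
#     first_rows = read_first_lines(f, 5)
```

Only the buffered portion of the file that is actually read gets transferred, not the whole file.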
My application watches a set of folders where users can upload files. When a file upload is finished I have to process the file, but I don't know how to detect that a file has not finished uploading.
Any way to detect if a file is not released yet by the FTP server?
There's no generic solution to this problem.
Some FTP servers lock the file being uploaded, preventing you from accessing it, while the file is still being uploaded. For example IIS FTP server does that. Most other FTP servers do not. See my answer at Prevent file from being accessed as it's being uploaded.
There are some common workarounds to the problem (originally posted in SFTP file lock mechanism, but relevant for the FTP too):
You can have the client upload a "done" file once the upload finishes. Make your automated system wait for the "done" file to appear.
You can have a dedicated "upload" folder and have the client (atomically) move the uploaded file to a "done" folder. Make your automated system look to the "done" folder only.
Have a file naming convention for files being uploaded (".filepart") and have the client (atomically) rename the file after upload to its final name. Make your automated system ignore the ".filepart" files.
See (my) article Locking files while uploading / Upload to temporary file name for an example of implementing this approach.
Also, some FTP servers have this functionality built-in. For example ProFTPD with its HiddenStores directive.
A gross hack is to periodically check for file attributes (size and time) and consider the upload finished, if the attributes have not changed for some time interval.
You can also make use of the fact that some file formats have a clear end-of-file marker (like XML or ZIP), so you can tell that a file is incomplete while the marker is still missing.
Some FTP servers allow you to configure a hook to be called, when an upload is finished. You can make use of that. For example ProFTPD has a mod_exec module (see the ExecOnCommand directive).
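On the uploading side, the temporary-name workaround from the list above could be sketched with ftplib as follows. This is only an illustration under the assumption that the server performs the rename atomically; the function names and the default ".filepart" suffix argument are hypothetical choices.

```python
def upload_with_temp_name(ftp, local_path, remote_name, suffix=".filepart"):
    """Upload under a temporary name, then rename to the final name.

    `ftp` is an already logged-in ftplib.FTP object.
    """
    temp_name = remote_name + suffix
    with open(local_path, "rb") as f:
        ftp.storbinary(f"STOR {temp_name}", f)
    # The final name only appears once the transfer has finished.
    ftp.rename(temp_name, remote_name)


def is_complete(name, suffix=".filepart"):
    """Receiver-side filter: skip files that still carry the temporary suffix."""
    return not name.endswith(suffix)
```

The receiving system then processes only the names for which is_complete returns True.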
I use ftputil to implement this work-around:
connect to ftp server
list all files of the directory
call stat() on each file
wait N seconds
For each file: call stat() again. If result is different, then skip this file, since it was modified during the last seconds.
If stat() result is not different, then download the file.
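The steps above can be sketched roughly like this. It assumes an ftputil FTPHost-like object that provides listdir() and stat(); stable_files is a hypothetical helper name, and the comparison uses only size and modification time.

```python
import time


def stable_files(host, directory, wait_seconds, sleep=time.sleep):
    """Return names whose size and mtime did not change during the wait."""

    def snapshot():
        result = {}
        for name in host.listdir(directory):
            st = host.stat(f"{directory}/{name}")
            result[name] = (st.st_size, st.st_mtime)
        return result

    before = snapshot()
    sleep(wait_seconds)
    after = snapshot()
    # Keep only files that exist in both snapshots with identical attributes.
    return sorted(name for name, attrs in before.items() if after.get(name) == attrs)


# Usage sketch (server, credentials, and paths are placeholders):
#
# import ftputil
# with ftputil.FTPHost("ftp.example.com", "user", "password") as host:
#     for name in stable_files(host, "/incoming", 10):
#         host.download(f"/incoming/{name}", name)
```

As noted for the attribute-polling workaround in general, this is a heuristic: a stalled upload looks the same as a finished one.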
This whole ftp-fetching is old and obsolete technology. I hope that the customer will use a modern http API the next time :-)
If you are reading files with particular extensions, use WinSCP for the file transfer. It creates a temporary file with the extension .filepart and renames it to the actual file name once the file has been fully transferred.
I hope, it will help someone.
This is a classic problem with FTP transfers. The only mostly reliable method I've found is to send a file, then send a second short "marker" file just to tell the recipient the transfer of the first is complete. You can use a file naming convention and just check for existence of the second file.
You might get fancy and make the content of the second file a checksum of the first file. Then you could verify the first file. (You don't have the problem with the second file because you just wait until file size = checksum size).
And of course this only works if you can get the sender to send a second file.
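On the receiving side, the marker-file check could look like the sketch below. It assumes the receiver can read the upload folder as local files, that the sender writes a SHA-256 hex digest into a marker named after the data file plus ".sha256", and nothing else; both the naming convention and the function names are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_complete(data_path, marker_suffix=".sha256"):
    """Return True once the marker exists, is fully written, and matches the data."""
    marker_path = data_path + marker_suffix
    if not os.path.exists(marker_path):
        return False
    with open(marker_path, "r") as f:
        expected = f.read().strip()
    if len(expected) != 64:
        # A SHA-256 hex digest is 64 characters; the marker is still being written.
        return False
    return sha256_of(data_path) == expected
```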
I'm developing a service which has to copy multiple files from a central node to remote servers.
The problem is that each time the service is executed, there are new servers and new files to dispatch to these servers. I mean, in each execution, I have the information of which files have to be copied to each server and in which directory.
Obviously, this information changes very dynamically, so I would like to be able to automate this task. I tried to find a solution with Ansible, FTP and SCP driven from Python.
I think it is very difficult to automate every SCP task in each execution with Ansible.
SCP is OK, but I need to build each SCP command in Python to launch it.
FTP is too much for this problem because there are not many files to dispatch to a single server.
Is there any better solution than the ones I have thought of?
If you send the same file (or files) to different destinations (that can be organized as sets), you could benefit from solutions such as dsh or parallel-scp.
Whether this makes sense depends on your use case.
Parallel-SSH Documentation
from __future__ import print_function
from pssh.pssh_client import ParallelSSHClient

hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)
output = client.run_command('uname')
for host, host_output in output.items():
    for line in host_output.stdout:
        print(line)
How about rsync ?
Example: rsync -rave
Where source or destination could be:
user@IP:/path/to/dest
It transfers incrementally, and you can run it from cron or trigger it from a small script whenever anything changes.
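Since the set of servers and files changes on every run, the rsync commands can be generated from whatever mapping the service already has. A minimal sketch, assuming a plan in the form of a dict from host name to (local path, remote directory) pairs and key-based SSH login; the plan layout and function names are hypothetical:

```python
import subprocess


def build_rsync_commands(plan, user):
    """Turn {host: [(local_path, remote_dir), ...]} into rsync argument lists."""
    commands = []
    for host, transfers in plan.items():
        for local_path, remote_dir in transfers:
            # -a preserves attributes (including mtimes), -v prints what is sent.
            commands.append(["rsync", "-av", local_path, f"{user}@{host}:{remote_dir}"])
    return commands


def dispatch(plan, user):
    """Run every generated command; raises CalledProcessError if one fails."""
    for command in build_rsync_commands(plan, user):
        subprocess.run(command, check=True)
```

Passing argument lists instead of a single shell string avoids quoting problems with unusual file names.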
Suppose I have one server called server 1. On this server there's a directory called dir1. dir1 has 3 files in them called neh_iu.dat_1, neh_hj.dat_2, jen_ak.dat_1.
I need to get ONLY the 'neh' files from server 1 to another server called server 2. server 2 is where I will be performing certain modifications on these files.
How do I get ONLY the 'neh' files in Python? I'm new to python. I'm aware of a module called paramiko which allows for file transfers but assuming that there are millions of 'neh' files in dir1, and that I don't know the full names of all of them, how can I get an automated process for it in Python?
If you really need to use Python instead of bash (assuming you're on Unix):
>>> import subprocess
>>> subprocess.call("tar cvzf /path/to/ftp-or-static-http/foo.tgz /path/to/dir/neh*", shell=True)  # shell=True is needed so the shell expands neh*
This will create a tar file with all the neh* files. It is easy to transfer between servers (it's only one file instead of millions).
Use FTP, SFTP, HTTP or any transfer protocol supported by your server and perform a request from the other server (curl or ftp).
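The archive can also be built without a shell, using the standard glob and tarfile modules. This is a sketch of the same idea, not a drop-in replacement; archive_matching is a hypothetical helper, and with millions of matches the list of names is held in memory.

```python
import glob
import os
import tarfile


def archive_matching(directory, pattern, archive_path):
    """Pack all files in `directory` matching `pattern` into a .tgz archive.

    Returns the sorted list of archived file names.
    """
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in matches:
            # Store only the file name, not the full source path.
            tar.add(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in matches]


# Usage sketch (paths are placeholders):
# archive_matching("/path/to/dir", "neh*", "/path/to/ftp-or-static-http/foo.tgz")
```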
How can I download files from a website using wildcards in Python? I have a site that I need to download a file from periodically. The problem is that the filename changes each time. A portion of the filename stays the same though. How can I use a wildcard to specify the unknown portion of the filename in a URL?
If the filename changes, there must still be a link to the file somewhere (otherwise nobody would ever guess the filename). A typical approach is to get the HTML page that contains a link to the file, search through that looking for the link target, and then send a second request to get the actual file you're after.
Web servers do not generally implement such a "wildcard" facility as you describe, so you must use other techniques.
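The two-step approach (fetch the page, then search it for the link) can be sketched with the standard library alone. The pattern is matched against the last path segment of each link; the function and class names are hypothetical, and the sketch assumes the link appears as a plain href in the page's HTML rather than being generated by JavaScript.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(html, base_url, pattern):
    """Return absolute URLs of links whose file name matches the wildcard pattern."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        urljoin(base_url, href)
        for href in collector.links
        if fnmatchcase(href.rsplit("/", 1)[-1], pattern)
    ]


# Usage sketch (the URL and pattern are placeholders):
# from urllib.request import urlopen
# page_url = "https://example.com/downloads/"
# html = urlopen(page_url).read().decode("utf-8")
# for url in find_links(html, page_url, "report_*.csv"):
#     data = urlopen(url).read()
```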
You could try logging into the ftp server using ftplib.
From the python docs:
from ftplib import FTP
ftp = FTP('ftp.cwi.nl') # connect to host, default port
ftp.login() # user anonymous, passwd anonymous@
The ftp object has a dir method that lists the contents of a directory.
You could use this listing to find the name of the file you want.
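A minimal sketch of that lookup is shown below. It uses the nlst method instead of dir, because nlst returns the names as a list rather than printing a listing; find_remote_files is a hypothetical helper, and the wildcard matching is done on the client with fnmatch.

```python
from fnmatch import fnmatchcase


def find_remote_files(ftp, pattern, directory=""):
    """Return the names in a remote directory that match a wildcard pattern.

    `ftp` is an already logged-in ftplib.FTP object.
    """
    names = ftp.nlst(directory) if directory else ftp.nlst()
    # Some servers return names with the directory prefix, so match on the last segment.
    return [name for name in names if fnmatchcase(name.rsplit("/", 1)[-1], pattern)]


# Usage sketch (the pattern is a placeholder):
# for name in find_remote_files(ftp, "report_*.csv"):
#     with open(name.rsplit("/", 1)[-1], "wb") as f:
#         ftp.retrbinary(f"RETR {name}", f.write)
```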