Archive all files from one SFTP folder to another in Python

I was able to successfully upload a file from S3 to an SFTP location using the syntax below, as given by @Martin Prikryl in Transfer file from AWS S3 to SFTP using Boto 3:
with sftp.open('/sftp/path/filename', 'wb') as f:
    s3.download_fileobj('mybucket', 'mykey', f)
I have a requirement to archive the previous file from the current folder into the ARCHIVE folder before uploading the current dated file from S3 to SFTP.
I am trying to achieve this with a wildcard, because sometimes, when running on a Monday, there is no file for Sunday and the previous file is Friday's. So I want to match whatever previous file is there, irrespective of its date.
Example
I have a folder structure as below; filename_20200623.csv needs to be moved to the ARCHIVE folder, and the new file filename_20200625.csv will be uploaded.
MKT
  ABC
    ARCHIVE
    filename_20200623.csv

Expected

MKT
  ABC
    ARCHIVE
      filename_20200623.csv
    filename_20200625.csv

Use Connection.listdir_attr to retrieve a list of all files in the directory, filter it to those you are interested in, and then move them one by one using Connection.rename:
import stat

remote_path = "/remote/path"
archive_path = "/archive/path"

for f in sftp.listdir_attr(remote_path):
    if (not stat.S_ISDIR(f.st_mode)) and f.filename.startswith('prefix'):
        remote_file_path = remote_path + "/" + f.filename
        archive_file_path = archive_path + "/" + f.filename
        print("Archiving %s to %s" % (remote_file_path, archive_file_path))
        sftp.rename(remote_file_path, archive_file_path)
For future readers who use Paramiko, the code will be identical, except of course that sftp will refer to a Paramiko SFTPClient instance instead of a pysftp Connection, as Paramiko's SFTPClient.listdir_attr and SFTPClient.rename methods behave identically to those of pysftp.
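A minimal sketch of the same archiving loop with Paramiko, assuming placeholder host details and the same 'prefix' filter as above:

import stat
import paramiko

# placeholder connection details for illustration only
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accepting unknown host keys for brevity only
ssh.connect('example.com', username='user', password='password')
sftp = ssh.open_sftp()   # paramiko.SFTPClient

remote_path = "/remote/path"
archive_path = "/archive/path"

for f in sftp.listdir_attr(remote_path):
    if (not stat.S_ISDIR(f.st_mode)) and f.filename.startswith('prefix'):
        sftp.rename(remote_path + "/" + f.filename,
                    archive_path + "/" + f.filename)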

Related

Retrieve file from FTP, unzip file, save extract file to Amazon S3 bucket

I am attempting to retrieve a file from FTP and save it to an S3 bucket within a Lambda function.
I can confirm the first part of the code works, as I can see the list of files printed to CloudWatch logs.
import ftplib
from ftplib import FTP
import zipfile
import boto3
s3 = boto3.client('s3')
S3_OUTPUT_BUCKETNAME = 'my-s3bucket'
ftp = FTP('ftp.godaddy.com')
ftp.login(user='auctions', passwd='')
ftp.retrlines('LIST')
The next part was resulting in the following error:
module initialization error: [Errno 30] Read-only file system: 'tdnam_all_listings.csv.zip'
However, I managed to overcome this by adding '/tmp' to the file location, as per the following code:
fileName = 'all_expiring_auctions.json.zip'
with open('/tmp/fileName', 'wb') as file:
    ftp.retrbinary('RETR ' + fileName, file.write)
Next, I am attempting to unzip the file from the temporary location:
with zipfile.ZipFile('/tmp/fileName', 'r') as zip_ref:
    zip_ref.extractall('')
Finally, I am attempting to save the file to a particular 'folder' in the S3 bucket, as follows:
data = open('/tmp/all_expiring_auctions.json')
s3.Bucket('brnddmn-s3').upload_fileobj('data','my-s3bucket/folder/')
The code produces no errors that I can see in the log, however the unzipped file is not reaching the destination despite my efforts.
Any help greatly appreciated.
Firstly, you have to use the /tmp directory for working with files in Lambda. Note, though, that ZipFile.extractall('') will create the extract in your current working directory, assuming the zip content is a simple plain text file with no relative path. To create the extract in the /tmp directory, use:
zip_ref.extractall('/tmp')
I'm not sure why there are no errors logged; data = open(...) should throw an error if the file is not found. If required, you can explicitly print whether the file exists:
import os
print(os.path.exists('/tmp/all_expiring_auctions.json'))  # True/False
Finally, once you have ensured the file exists, the argument to Bucket() should be the bucket name. It is not clear whether your bucket name is 'brnddmn-s3' or 'my-s3bucket'. Also, the first argument to upload_fileobj() should be a file object, i.e. data rather than the string 'data', and the second argument should be the object key (the filename in S3), not a folder name.
Putting it together, the last line should look like this:
S3_OUTPUT_BUCKETNAME = 'my-s3bucket' # Replace with your S3 bucket name
s3.Bucket(S3_OUTPUT_BUCKETNAME).upload_fileobj(data,'folder/all_expiring_auctions.json')
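Note that Bucket() belongs to the boto3 resource API (boto3.resource('s3')), while the question creates a client with boto3.client('s3'). A minimal sketch of the equivalent upload using the client directly, assuming the JSON file has already been extracted to /tmp, and opening it in binary mode since upload_fileobj expects a binary file-like object:

import boto3

s3 = boto3.client('s3')
S3_OUTPUT_BUCKETNAME = 'my-s3bucket'  # replace with your real bucket name

# upload_fileobj(file object, bucket name, object key)
with open('/tmp/all_expiring_auctions.json', 'rb') as data:
    s3.upload_fileobj(data, S3_OUTPUT_BUCKETNAME, 'folder/all_expiring_auctions.json')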

python unable to extract zip file uploaded to aws s3 bucket

medias = ['https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/0002.jpg',
          'https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/2.png',
          'https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/02.png'
          ]
for i in medias:
    file_name = i.split("/")[-1]
    urllib.urlretrieve(i, "media/" + file_name)

# writing files to a zipfile
local_os_path = f'media/{title}.zip'
with ZipFile(local_os_path, 'w') as zip:
    # writing each file one by one
    for file in medias:
        file_name = file.split("/")[-1]
        zip.write("media/" + file_name)
        os.remove("media/" + file_name)

    s3 = session.resource('s3')
    storage_path = f'asset/nfts/zip/{title}.zip'
    s3.meta.client.upload_file(Filename=local_os_path, Bucket=AWS_STORAGE_BUCKET_NAME, Key=storage_path)
    # os.remove(local_os_path)

    DesignerProduct.objects.filter(id=instance.id).update(
        zip_file_path=S3_BUCKET_URL + storage_path,
    )
I am using this code to create a zip file and save it to an S3 bucket.
First I download the files to the local system, then zip them all and save the zip file to the S3 bucket.
On my local system I am able to extract the zip file, but when I download it from the S3 bucket I am not able to extract it.
https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/nfts/zip/ant.zip
This is the S3 path where the zip file is uploaded.
What can be the reason? Please take a look.
Move the upload after the with block.
You are uploading your zipfile before the archive is closed.
See ZipFile.close():
Close the archive file. You must call close() before exiting your program or essential records will not be written.
close is automatically called by the with statement.
You open your local file after the program exits - which means after the zipfile is closed - so your local version is not corrupted.
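A minimal sketch of the corrected ordering, reusing the names from the question (session, title, medias and AWS_STORAGE_BUCKET_NAME are the question's own placeholders):

# write the archive; it is closed and fully written when the with block ends
local_os_path = f'media/{title}.zip'
with ZipFile(local_os_path, 'w') as zip:
    for file in medias:
        file_name = file.split("/")[-1]
        zip.write("media/" + file_name)
        os.remove("media/" + file_name)

# upload only after the with block, so the complete zip is on disk
s3 = session.resource('s3')
storage_path = f'asset/nfts/zip/{title}.zip'
s3.meta.client.upload_file(Filename=local_os_path, Bucket=AWS_STORAGE_BUCKET_NAME, Key=storage_path)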

Avoid Overriding Existing File [duplicate]

I am using pysftp library's get_r function (https://pysftp.readthedocs.io/en/release_0.2.9/pysftp.html#pysftp.Connection.get_r) to get a local copy of a directory structure from sftp server.
Is that the correct approach for a situation when the contents of the remote directory have changed and I would like to get only the files that changed since the last time the script was run?
The script should be able to sync the remote directory recursively and mirror the state of the remote directory, e.g. with a parameter controlling whether outdated local files (those no longer present on the remote server) should be removed, while any changes to existing files and any new files are fetched.
My current approach is here.
Example usage:
from sftp_sync import sync_dir
sync_dir('/remote/path/', '/local/path/')
Use pysftp.Connection.listdir_attr to get a file listing with attributes (including the file timestamps).
Then iterate the list and compare it against the local files:
import os
import pysftp
import stat

remote_path = "/remote/path"
local_path = "/local/path"

with pysftp.Connection('example.com', username='user', password='pass') as sftp:
    sftp.cwd(remote_path)
    for f in sftp.listdir_attr():
        if not stat.S_ISDIR(f.st_mode):
            print("Checking %s..." % f.filename)
            local_file_path = os.path.join(local_path, f.filename)
            if ((not os.path.isfile(local_file_path)) or
                    (f.st_mtime > os.path.getmtime(local_file_path))):
                print("Downloading %s..." % f.filename)
                sftp.get(f.filename, local_file_path)
Though these days you should not use pysftp, as it is dead. Use Paramiko directly instead; see pysftp vs. Paramiko. The above code will work with Paramiko too, with its SFTPClient.listdir_attr.
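A minimal sketch of the corresponding Paramiko connection setup, assuming placeholder host details; note that Paramiko's SFTPClient uses chdir where pysftp uses cwd:

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accepting unknown host keys for brevity only
ssh.connect('example.com', username='user', password='pass')
sftp = ssh.open_sftp()        # paramiko.SFTPClient

sftp.chdir(remote_path)       # Paramiko's equivalent of pysftp's cwd
for f in sftp.listdir_attr():
    ...                       # same loop body as in the pysftp code above

sftp.close()
ssh.close()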

It is possible to search in files on ftp in Python?

right now this is all I have:
import ftputil

a_host = ftputil.FTPHost("ftp_host", "username", "pass")  # login to ftp
for (dirname, subdirs, files) in a_host.walk("/"):  # directory
    for f in files:
        fullpath = a_host.path.join(dirname, f)
        if fullpath.endswith('html'):
            # stuck here
So I can log in to my FTP and do a .walk through my files.
The thing I am not able to manage: when the .walk finds an html file, I also want to search it for a string.
For example:
On my FTP there are an index.html and a something.txt file.
I want to find the index.html file with .walk, and then search index.html for 'my string'.
Thanks
FTP is a protocol for file transfer only. It does not by itself have the ability to execute remote commands, which would be needed to search files on the remote server (there is a SITE command, but it usually cannot be used for such a purpose, because it is either not implemented or restricted to only a few commands).
This means your only option with FTP is to download the file and search it locally, i.e. transfer the file to the local system, open it there, and look for the string.
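A minimal sketch of that approach, reusing the ftputil code from the question (host, credentials and the search string are placeholders); FTPHost.open reads the remote file so its contents can be searched locally:

import ftputil

search_string = 'my string'   # placeholder

a_host = ftputil.FTPHost("ftp_host", "username", "pass")
for (dirname, subdirs, files) in a_host.walk("/"):
    for f in files:
        fullpath = a_host.path.join(dirname, f)
        if fullpath.endswith('html'):
            # download the file's contents and search them locally
            with a_host.open(fullpath, 'r') as remote_file:
                if search_string in remote_file.read():
                    print("Found in %s" % fullpath)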

Downloading from an ftp using Python

I have a piece of code in Python to download files from an FTP. The code downloads the very first file in the list of available days, but fails to download the second. What could be the problem?
import os, ftplib

destdir = 'D:\precipitation\dl'
ftp = ftplib.FTP('ftp.itc.nl')
ftp.login('anonymous', '')
ftp.cwd('pub/mpe/msg')
available_days = ['summsgmpe_20100101.zip', 'summsgmpe_20100102.zip', 'summsgmpe_20100103.zip', 'summsgmpe_20100104.zip', 'summsgmpe_20100105.zip', 'summsgmpe_20100106.zip', 'summsgmpe_20100107.zip', 'summsgmpe_20100108.zip']
hdfs = list()
for day in available_days:
    file = available_days[available_days.index(day)]
    print 'file=', file
    local_file = os.path.join(destdir, file)
    ftp.retrbinary('RETR %s' % file, open(local_file, 'wb').write)
    hdfs.append(os.path.abspath(local_file))
    ftp.cwd('..')
ftp.quit()
Remove your call to ftp.cwd('..').
That moves up a directory on each iteration of the loop, instead of staying in the correct folder where the files are.
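A minimal sketch of the corrected loop from the question, with the ftp.cwd('..') call removed and the list iterated directly:

for day in available_days:
    local_file = os.path.join(destdir, day)
    ftp.retrbinary('RETR %s' % day, open(local_file, 'wb').write)
    hdfs.append(os.path.abspath(local_file))
ftp.quit()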
