medias = ['https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/0002.jpg',
'https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/2.png',
'https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/02.png'
]
for i in medias:
    file_name = i.split("/")[-1]
    urllib.request.urlretrieve(i, "media/" + file_name)

# writing files to a zipfile
local_os_path = f'media/{title}.zip'
with ZipFile(local_os_path, 'w') as zip:
    # writing each file one by one
    for file in medias:
        file_name = file.split("/")[-1]
        zip.write("media/" + file_name)
        os.remove("media/" + file_name)
    s3 = session.resource('s3')
    storage_path = f'asset/nfts/zip/{title}.zip'
    s3.meta.client.upload_file(Filename=local_os_path, Bucket=AWS_STORAGE_BUCKET_NAME, Key=storage_path)
    # os.remove(local_os_path)
    DesignerProduct.objects.filter(id=instance.id).update(
        zip_file_path=S3_BUCKET_URL + storage_path,
    )
I am using this code to create a zip file and save it to an S3 bucket.
First I download the files to the local system, then zip them all and save the zip file to the S3 bucket.
On my local system I am able to extract the zip file, but when I download it from the S3 bucket I am not able to extract it.
https://baby-staging-bucket.s3.us-east-2.amazonaws.com/asset/nfts/zip/ant.zip
This is the S3 path where the zip file was uploaded.
What can be the reason? Please take a look.
Move the upload after the with block.
You are uploading your zipfile before the archive is closed.
See ZipFile.close():
Close the archive file. You must call close() before exiting your program or essential records will not be written.
close() is called automatically by the with statement.
You only open your local file after the program has exited, which means after the zipfile has been closed, so your local version is not corrupted.
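Roughly, the fixed version of the relevant part looks like this (a sketch based on the code in the question; only the position of the upload changes):
local_os_path = f'media/{title}.zip'
with ZipFile(local_os_path, 'w') as zip:
    # writing each file one by one
    for file in medias:
        file_name = file.split("/")[-1]
        zip.write("media/" + file_name)
        os.remove("media/" + file_name)

# the archive is closed at this point, so it is safe to upload
s3 = session.resource('s3')
storage_path = f'asset/nfts/zip/{title}.zip'
s3.meta.client.upload_file(Filename=local_os_path, Bucket=AWS_STORAGE_BUCKET_NAME, Key=storage_path)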
Related
Attempting to retrieve a file from FTP and save it to an S3 bucket within lambda function.
I can confirm the first part of the code works, as I can see the list of files printed to the CloudWatch logs.
import ftplib
from ftplib import FTP
import zipfile
import boto3
s3 = boto3.client('s3')
S3_OUTPUT_BUCKETNAME = 'my-s3bucket'
ftp = FTP('ftp.godaddy.com')
ftp.login(user='auctions', passwd='')
ftp.retrlines('LIST')
The next part was resulting in the following error:
module initialization error: [Errno 30] Read-only file system: 'tdnam_all_listings.csv.zip'
However, I managed to overcome this by adding '/tmp' to the file location, as per the following code:
fileName = 'all_expiring_auctions.json.zip'
with open('/tmp/fileName', 'wb') as file:
    ftp.retrbinary('RETR ' + fileName, file.write)
Next, I am attempting to unzip the file from the temporary location:
with zipfile.ZipFile('/tmp/fileName', 'r') as zip_ref:
    zip_ref.extractall('')
Finally, I am attempting to save the file to a particular 'folder' in the S3 bucket, as follows:
data = open('/tmp/all_expiring_auctions.json')
s3.Bucket('brnddmn-s3').upload_fileobj('data','my-s3bucket/folder/')
The code produces no errors that I can see in the log; however, the unzipped file is not reaching the destination despite my efforts.
Any help greatly appreciated.
Firstly, you have to use the /tmp directory for working with files in Lambda. ZipFile's extractall('') will, however, create the extracted file in your current working directory, assuming the zip content is a simple plain text file with no relative path. To create the extracted file in the /tmp directory, use
zip_ref.extractall('/tmp')
I'm not sure why there are no errors logged. data = open(...) should throw an error if the file is not found. If required, you can explicitly print whether the file exists:
import os
print(os.path.exists('/tmp/all_expiring_auctions.json'))  # True/False
Finally, once you have ensured the file exists, the argument to Bucket() should be the bucket name. I am not sure if your bucket name is 'brnddmn-s3' or 'my-s3bucket'. Also, the first argument to upload_fileobj() should be a file object, i.e. data instead of the string 'data'. The second argument should be the object key (the file name in S3), not a folder name.
Putting it together, the last line should look like this:
S3_OUTPUT_BUCKETNAME = 'my-s3bucket' # Replace with your S3 bucket name
s3.Bucket(S3_OUTPUT_BUCKETNAME).upload_fileobj(data,'folder/all_expiring_auctions.json')
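Putting the whole flow together, a minimal sketch might look like this. It reuses the names from the question, but writes the download to '/tmp/' + fileName rather than the literal path '/tmp/fileName', opens the extracted file in binary mode (which upload_fileobj expects), and uses a boto3 resource, since .Bucket() is available on a resource rather than on a client:
import zipfile
from ftplib import FTP

import boto3

S3_OUTPUT_BUCKETNAME = 'my-s3bucket'  # replace with your bucket name
fileName = 'all_expiring_auctions.json.zip'

ftp = FTP('ftp.godaddy.com')
ftp.login(user='auctions', passwd='')

# download the zip into Lambda's writable /tmp directory
with open('/tmp/' + fileName, 'wb') as file:
    ftp.retrbinary('RETR ' + fileName, file.write)

# extract into /tmp as well
with zipfile.ZipFile('/tmp/' + fileName, 'r') as zip_ref:
    zip_ref.extractall('/tmp')

# upload the extracted file; the key is the file name in S3, not a folder
s3 = boto3.resource('s3')
with open('/tmp/all_expiring_auctions.json', 'rb') as data:
    s3.Bucket(S3_OUTPUT_BUCKETNAME).upload_fileobj(data, 'folder/all_expiring_auctions.json')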
I was able to successfully upload the file from S3 to an SFTP location using the below syntax, as given by @Martin Prikryl in Transfer file from AWS S3 to SFTP using Boto 3.
with sftp.open('/sftp/path/filename', 'wb') as f:
    s3.download_fileobj('mybucket', 'mykey', f)
I have a requirement to archive the previous file from the current folder into the ARCHIVE folder before uploading the current dated file from S3 to SFTP.
I am trying to achieve this using a wildcard, because sometimes, when running on Monday, the Sunday file will not be there and the most recent file is Friday's. So I want to pick up whichever previous file exists, irrespective of its date.
Example
I have a folder structure as below; filename_20200623.csv needs to be moved to the ARCHIVE folder and the new file filename_20200625.csv will be uploaded.
MKT
    ABC
        ARCHIVE
        filename_20200623.csv
Expected
MKT
    ABC
        ARCHIVE
            filename_20200623.csv
        filename_20200625.csv
Use Connection.listdir_attr to retrieve a list of all files in the directory, filter it to those you are interested in, and then move them one by one using Connection.rename:
import stat

remote_path = "/remote/path"
archive_path = "/archive/path"

for f in sftp.listdir_attr(remote_path):
    if (not stat.S_ISDIR(f.st_mode)) and f.filename.startswith('prefix'):
        remote_file_path = remote_path + "/" + f.filename
        archive_file_path = archive_path + "/" + f.filename
        print("Archiving %s to %s" % (remote_file_path, archive_file_path))
        sftp.rename(remote_file_path, archive_file_path)
For future readers who use Paramiko, the code will be identical, except of course that sftp will refer to the Paramiko SFTPClient class instead of the pysftp Connection class, as Paramiko's SFTPClient.listdir_attr and SFTPClient.rename methods behave identically to those of pysftp.
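For the Paramiko case, a minimal end-to-end sketch might look like this (the host, port, credentials and the 'filename_' prefix are placeholders, not taken from the original answer):
import stat

import paramiko

remote_path = "/remote/path"
archive_path = "/archive/path"

transport = paramiko.Transport(('sftp.example.com', 22))  # placeholder host and port
transport.connect(username='user', password='secret')     # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)

for f in sftp.listdir_attr(remote_path):
    if (not stat.S_ISDIR(f.st_mode)) and f.filename.startswith('filename_'):
        remote_file_path = remote_path + "/" + f.filename
        archive_file_path = archive_path + "/" + f.filename
        print("Archiving %s to %s" % (remote_file_path, archive_file_path))
        sftp.rename(remote_file_path, archive_file_path)

sftp.close()
transport.close()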
I have 10000s of 10 MB files in my local directory and I'm trying to upload them to a bucket in Amazon S3 using boto3 with a sequential upload approach. The only problem I'm facing is that it takes a lot of time to upload a large number of files to S3. I want to know whether there are efficient ways (using multithreading or multiprocessing) to upload files to Amazon S3.
root_path = "/home/shivraj/folder/"
path = root_path + 'folder_raw/'  # use your path
dest_path = root_path + 'folder_parsed/'
backup_path = root_path + 'folder_backup/'

def parse_ivn_files():
    src_files_list = glob.glob(path + "*.txt.zip")  # .log files in the path files
    try:
        if src_files_list:
            for file_ in src_files_list:
                df = pd.read_csv(file_, compression="zip", sep="|", header=None)
                file = file_.replace(path, '')
                file_name = file.replace(".txt.zip", '')
                df.columns = ["Date","Time","System_Event","Event_Type","Event_sub_type","Latitude","Longitude","Field_1","Field_2","Field_3","Field_4","Event_Number","Event_Description"]
                new_df = df['Event_Description'].str.split(',', expand=True)
                large_df = pd.concat([df, new_df], axis=1)
                large_df.to_csv(dest_path + file_name + ".csv", index=False)
                s3.meta.client.upload_file(dest_path + file_name + ".csv", 's3-bucket-name-here', 'ivn_parsed/' + file_name + ".csv")
                s3.meta.client.upload_file(path + file_name + ".txt.zip", 's3-bucket-name-here', 'ivn_raw_backup/' + file_name + "_bk.txt.zip")
                os.rename(path + file_name + ".txt.zip", backup_path + file_name + "_bk.txt.zip")
        else:
            print("No files in the source folder")
    except:
        raise FileNotFoundError
I’d go for s4cmd - it’s a nice tool that can upload your files in parallel and has solved some other problems too:
https://github.com/bloomreach/s4cmd
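If you would rather stay with boto3, a thread pool is a common way to parallelize many small uploads, since low-level boto3 clients are documented as thread-safe. A minimal sketch, with the directory, bucket name, key prefix and worker count as placeholders:
import glob
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

s3_client = boto3.client('s3')  # low-level clients are thread-safe

def upload_one(local_path):
    key = 'ivn_parsed/' + os.path.basename(local_path)
    s3_client.upload_file(local_path, 's3-bucket-name-here', key)
    return key

files = glob.glob('/home/shivraj/folder/folder_parsed/*.csv')
with ThreadPoolExecutor(max_workers=16) as pool:
    for key in pool.map(upload_one, files):
        print('uploaded', key)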
I'm trying to list the files from the S3 bucket "card-prtnr-npi". The files that I want to read are in the "ambs_ambivolatile" folder, which is in the "card-prtnr-npi" bucket; the actual path is "card-prtnr-npi/users/rtltest/ambs_ambivolatile". The "ambs_ambivolatile" folder has only one file in it, but boto3 is reading an additional key which is not present.
'users/rtltest/ambs_ambivolatile/' is not present in the folder and only 'part-m-00026.bz2' is present. Please see the code below.
s3_src_bucket = 'card-prtnr-npi'
s3_src_prefix = 'users/rtltest/ambs_ambivolatile/'
print("getting response from source")
source_bucket = src_session.resource('s3').Bucket(s3_src_bucket)
files = source_bucket.objects.filter(Prefix=s3_src_prefix)
source_keys = []
for file in files:
    source_keys.append(file.key)
print(source_keys)
The above print statement prints the following list:
['users/rtltest/ambs_ambivolatile/', 'users/rtltest/ambs_ambivolatile/part-m-00026.bz2']
How do I stop reading this extra file, 'users/rtltest/ambs_ambivolatile/'?
It's not a file; it's the prefix that you set. S3 has no concept of folders. Every file path is a single string, and you filtered for the start of it.
Try this to exclude the prefix
source_keys = [file.key for file in files if file.key!=s3_src_prefix]
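If other zero-byte 'folder' placeholder keys can appear under the prefix, a slightly more general variant (an extension of the answer above, not part of it) is to drop every key that ends with a slash:
source_keys = [file.key for file in files if not file.key.endswith('/')]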
I am trying to set up an app where users can download their files stored in an S3 bucket. I am able to set up my bucket and get the correct file, but it won't download, giving me this error: No such file or directory: 'media/user_1/imageName.jpg'. Any idea why? This seems like a relatively easy problem, but I can't quite seem to get it. I can delete an image properly, so it is able to identify the correct image.
Here's my views.py
def download(request, project_id=None):
    conn = S3Connection('AWS_BUCKET_KEY', 'AWS_SECRET_KEY')
    b = Bucket(conn, 'BUCKET_NAME')
    k = Key(b)
    instance = get_object_or_404(Project, id=project_id)
    k.key = 'media/' + str(instance.image)
    k.get_contents_to_filename(str(k.key))
    return redirect("/dashboard/")
The problem is that you are downloading to a local directory that doesn't exist (media/user_1). You need to either:
Create the directory on the local machine first (see the sketch after the code below)
Just use the filename rather than a full path
Use the full path, but replace slashes (/) with another character, which ensures a unique filename without having to create directories
The last option could be achieved via:
k.get_contents_to_filename(str(k.key).replace('/', '_'))
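For the first option, a minimal sketch (assuming Python 3 and the same Key object as in the question):
import os

local_path = str(k.key)  # e.g. 'media/user_1/imageName.jpg'
os.makedirs(os.path.dirname(local_path), exist_ok=True)  # create media/user_1 locally first
k.get_contents_to_filename(local_path)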
See also: Boto3 to download all files from a S3 Bucket
Downloading files using boto3 is very simple; configure your AWS credentials at the system level before using this code.
import boto3

client = boto3.client('s3')

# if your bucket name is mybucket and the file path is test/abc.txt
# then Bucket='mybucket' and Prefix='test'
resp = client.list_objects_v2(Bucket="<your bucket name>", Prefix="<prefix of the s3 folder>")

for obj in resp['Contents']:
    key = obj['Key']
    # to read the s3 file contents as a string
    response = client.get_object(Bucket="<your bucket name>", Key=key)
    print(response['Body'].read().decode('utf-8'))
    # to download the file locally
    client.download_file('<your bucket name>', key, key.replace('test/', ''))
The replace() maps the S3 key to a plain local file name; if you don't strip the prefix, it will try to save the file as 'test/abc.txt', which requires the local 'test' directory to exist.
import os
import boto3
import json

s3 = boto3.resource('s3', aws_access_key_id="AKIAxxxxxxxxxxxxJWB",
                    aws_secret_access_key="LV0+vsaxxxxxxxxxxxxxxxxxxxxxry0/LjxZkN")
my_bucket = s3.Bucket('s3testing')

# download files into the current directory
for s3_object in my_bucket.objects.all():
    # need to split s3_object.key into path and file name, else it will give a "file not found" error
    path, filename = os.path.split(s3_object.key)
    my_bucket.download_file(s3_object.key, filename)
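One caveat with the loop above: if the bucket contains zero-byte 'folder' placeholder keys (keys ending with '/'), filename will be empty and download_file will fail. A small tweak, assuming such keys may exist in your bucket, is to skip them:
for s3_object in my_bucket.objects.all():
    path, filename = os.path.split(s3_object.key)
    if not filename:  # zero-byte "folder" placeholder key, nothing to download
        continue
    my_bucket.download_file(s3_object.key, filename)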