I am developing a Python app which consists of an uploading module. The core function pulls .png images from a queue and, using a Boto3 client, uploads them to a certain bucket.
The problem is that sometimes, not always, the images are only partially uploaded; e.g. when I download a defective image, it seems to be cropped.
When I upload the images manually (using an FTP/SSH client), they are uploaded perfectly.
The following is my core function; note that I'm using upload_fileobj() with a callback for progress bar mechanics.
def upload_file_aws(self):
    s3 = boto3.client('s3', aws_access_key_id=self.aws_access_key,
                      aws_secret_access_key=self.aws_secret_key)
    if not self.uploader.queue.empty():
        file = self.uploader.queue.get()
        with open(file, 'rb') as f:
            aws_format = '%s' % AppObject.file_path_dic.get(file)
            s3.upload_fileobj(f, self.bucket_name, aws_format, Callback=ProgressBarInit(file))
Has anyone encountered this problem before?
Amazon's documentation states that boto3 does not allow partial uploads.
There is a high chance this is happening for larger images, more than 5 MB in size.
You should be using multipart upload for large objects.
Here is a basic code example of a multipart upload.
import boto3
import boto3.s3.transfer

def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client('s3')
    try:
        print("Uploading file:", filename)
        tc = boto3.s3.transfer.TransferConfig()
        t = boto3.s3.transfer.S3Transfer(client=s3_client,
                                         config=tc)
        t.upload_file(filename, 'my-bucket-name', 'name-in-s3.dat')
    except Exception as e:
        print("Error uploading: %s" % e)
Here is my code for uploading the image to AWS S3:
#app.post("/post_ads")
async def create_upload_files(files: list[UploadFile] = File(description="Multiple files as UploadFile")):
main_image_list = []
for file in files:
s3 = boto3.resource(
's3',
aws_access_key_id = aws_access_key_id,
aws_secret_access_key = aws_secret_access_key
)
bucket = s3.Bucket(aws_bucket_name)
bucket.upload_fileobj(file.file,file.filename,ExtraArgs={"ACL":"public-read"})
Is there any way to compress the image size and upload the image to a specific folder using boto3? I have this function for compressing the image, but I don't know how to integrate it into boto3.
for file in files:
    im = Image.open(file.file)
    im = im.convert("RGB")
    im_io = BytesIO()
    im = im.save(im_io, 'JPEG', quality=50)
    s3 = boto3.resource(
        's3',
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key
    )
    bucket = s3.Bucket(aws_bucket_name)
    bucket.upload_fileobj(file.file, file.filename, ExtraArgs={"ACL": "public-read"})
Update #1
After following Chris's recommendation, my problem has been resolved:
Here is Chris's solution:
im_io.seek(0)
bucket.upload_fileobj(im_io,file.filename,ExtraArgs={"ACL":"public-read"})
You seem to be saving the image bytes to a BytesIO stream, which is never used, as you upload the original file object to the s3 bucket instead, as shown in this line of your code:
bucket.upload_fileobj(file.file, file.filename, ExtraArgs={"ACL":"public-read"})
Hence, you need to pass the BytesIO object to upload_fileobj() function, and make sure to call .seek(0) before that, in order to rewind the cursor (or "file pointer") to the start of the buffer. The reason for calling .seek(0) is that im.save() method uses the cursor to iterate through the buffer, and when it reaches the end, it does not reset the cursor to the beginning. Hence, any future read operations would start at the end of the buffer. The same applies to reading from the original file, as described in this answer—you would need to call file.file.seek(0), if the file contents were read already and you needed to read from the file again.
An example of how to load the image into a BytesIO stream and use it to upload the file/image can be seen below. Please remember to properly close the UploadFile, Image and BytesIO objects, in order to release their memory (see the related answer as well).
from fastapi import HTTPException
from PIL import Image
import io
# ...
try:
    im = Image.open(file.file)
    if im.mode in ("RGBA", "P"):
        im = im.convert("RGB")
    buf = io.BytesIO()
    im.save(buf, 'JPEG', quality=50)
    buf.seek(0)
    bucket.upload_fileobj(buf, 'out.jpg', ExtraArgs={"ACL": "public-read"})
except Exception:
    raise HTTPException(status_code=500, detail='Something went wrong')
finally:
    file.file.close()
    buf.close()
    im.close()
As for the URL, using ExtraArgs={"ACL":"public-read"} should work as expected and make your resource (file) publicly accessible. Hence, please make sure you are accessing the correct URL.
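As a hedged illustration of both points, the snippet below uploads the compressed buffer under a key prefix (which acts as a "folder" in S3) and then builds the object's public URL; the aws_region variable and the "ads/" prefix are assumed placeholders, not values from the question.
# Sketch only: `aws_region` and the "ads/" prefix are assumed placeholders.
key = f"ads/{file.filename}"  # a key prefix behaves like a folder in S3
bucket.upload_fileobj(buf, key, ExtraArgs={"ACL": "public-read"})
public_url = f"https://{aws_bucket_name}.s3.{aws_region}.amazonaws.com/{key}"
print(public_url)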
aws s3 sync s3://your-pics .
for file in $(find . -name "*.jpg"); do gzip "$file"; echo "$file"; done
aws s3 sync . s3://your-pics --content-encoding gzip --dryrun
This will download all the files in the S3 bucket to the machine (or EC2 instance), compress the image files, and upload them back to the S3 bucket.
This should help you.
I am using the "azure-storage-blob" package within fastAPI to upload a blob image to my Azure storage blob container. Aftera lot of trial and error I decided to just copy over a static file from my directory to the azure table storage. but everytime I upload the file it gets added as empty. If I write the file locally everything goes fine.
I am using the official documentation as decribed here:
https://pypi.org/project/azure-storage-blob/
I have the following code:
#app.post("/files/")
async def upload(incoming_file: UploadFile = File(...)):
fs = await incoming_file.read()
file_size = len(fs)
print(file_size)
if math.ceil(file_size / 1024) > 64:
raise HTTPException(400, detail="File must be smaller than 64kb.")
if incoming_file.content_type not in ["image/png", "image/jpeg"]:
raise HTTPException(400, detail="File type must either be JPEG or PNG.")
try:
blob = BlobClient.from_connection_string(conn_str=az_connection_string, container_name="app-store-logos",
blob_name="dockerLogo.png")
with open("./dockerLogo.png", "rb") as data:
blob.upload_blob(data)
except Exception as err :
return {"message": "There was an error uploading the file {0}".format(err)}
finally:
await incoming_file.close()
return {"message": f"Successfuly uploaded {incoming_file.filename}"}
When I upload the file to the blob storage, the entry gets saved but is empty.
If I change any filenames or storage names I do get an error, so the files exist and are in the right place, though it seems like the Azure storage SDK doesn't copy over the contents of the file.
If anyone has any pointers, I would be grateful.
I have an issue that I hope someone can help me with.
I have a process that saves some images into an S3 bucket.
Then I have a Lambda process, in Python, that is supposed to create a PDF file displaying these images.
I'm using the xhtml2pdf library for that, which I've uploaded to my Lambda environment as a layer.
My first approach was to download the image from the S3 bucket and save it into the Lambda '/tmp' directory, but I was getting this error from xhtml2pdf:
Traceback (most recent call last):
File "/opt/python/xhtml2pdf/xhtml2pdf_reportlab.py", line 359, in __init__
raise RuntimeError('Imaging Library not available, unable to import bitmaps only jpegs')
RuntimeError: Imaging Library not available, unable to import bitmaps only jpegs fileName=
<_io.BytesIO object at 0x7f1eaabe49a0>
Then I thought that if I transformed it into a base64 string this issue would be solved, but I got the same error.
Can anybody please give me some guidance about the best way to do this?
Thank you
This is a small piece of my lambda code:
from xhtml2pdf import pisa
def getFileFromS3(fileKey, fileName):
    try:
        localFileName = f'/tmp/{fileName}'
        bot_utils.log(f'fileKey : {fileKey}')
        bot_utils.log(f'fileName : {fileName}')
        bot_utils.log(f'localFileName : {localFileName}')
        s3 = boto3.client('s3')
        bucketName = 'fileholder'
        s3.download_file(bucketName, fileKey, localFileName)
        return 'data:image/jpeg;base64,' + getImgBase64(localFileName)
    except botocore.exceptions.ClientError as e:
        raise

htmlText = '<table>'
for i in range(0, len(shoppingLines), 2):
    product = shoppingLines[i]
    text = product['text']
    folderName = product['folder']
    tmpFile = getFileFromS3(f"pannings/{folderName}/{product['photo_id']}.jpg", f"{product['photo_id']}.jpg")
    htmlText += f"""<tr><td align="center"><img src="{tmpFile}" width="40" height="55"></td><td>{text}</td></tr>"""
htmlText += '</table>'

result_file = open('/tmp/file.pdf', "w+b")
pisa_status = pisa.CreatePDF(htmlText, dest=result_file)
result_file.close()
For future Google searches:
It seems the issue is with the PIL/Pillow library.
I've found a version of this library in this GitHub repo (https://github.com/keithrozario/Klayers).
When I use this version, it works...
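For completeness, a minimal, hedged sketch of attaching such a layer to a Lambda function with boto3 is shown below; the function name and layer ARNs are placeholders you would replace with your own function and the ARN published in the Klayers repo for your region and runtime. Note that update_function_configuration replaces the whole layer list, so any existing layer (such as the xhtml2pdf one) must be listed again.
import boto3

# Placeholders: replace the function name and layer ARNs with your own values.
lambda_client = boto3.client('lambda')
lambda_client.update_function_configuration(
    FunctionName='my-pdf-generator',
    Layers=[
        'arn:aws:lambda:REGION:ACCOUNT_ID:layer:Klayers-python-Pillow:VERSION',
        'arn:aws:lambda:REGION:ACCOUNT_ID:layer:my-xhtml2pdf-layer:VERSION',
    ],
)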
I need to archive multiple files that exist on S3 and then upload the archive back to S3.
I am trying to use Lambda and Python. As some of the files are more than 500 MB, downloading into '/tmp' is not an option. Is there any way to stream the files one by one and put them in an archive?
Do not write to disk; stream to and from S3.
Stream the zip file from the source bucket and, using Python, read and write its contents on the fly back to another S3 bucket.
This method does not use up disk space and therefore is not limited by size.
The basic steps are:
Read the zip file from S3 using the Boto3 S3 resource Object into a BytesIO buffer object
Open the object using the zipfile module
Iterate over each file in the zip file using the namelist method
Write the file back to another bucket in S3 using the resource meta.client.upload_fileobj method
The Code
Python 3.6 using Boto3
import zipfile
from io import BytesIO

import boto3

s3_resource = boto3.resource('s3')
# zip_key is the key of the source zip object; bucket is the destination bucket name.
zip_obj = s3_resource.Object(bucket_name="bucket_name_here", key=zip_key)
buffer = BytesIO(zip_obj.get()["Body"].read())

z = zipfile.ZipFile(buffer)
for filename in z.namelist():
    file_info = z.getinfo(filename)
    s3_resource.meta.client.upload_fileobj(
        z.open(filename),
        Bucket=bucket,
        Key=f'{filename}'
    )
Note: the AWS Lambda execution time limit has a maximum of 15 minutes, so can you process your huge files in this amount of time? You can only know by testing.
AWS Lambda code: create a zip from the files matching an extension under bucket/filePath.
import zipfile
from io import BytesIO

import boto3

s3 = boto3.resource('s3')

def createZipFileStream(bucketName, bucketFilePath, jobKey, fileExt, createUrl=False):
    response = {}
    bucket = s3.Bucket(bucketName)
    filesCollection = bucket.objects.filter(Prefix=bucketFilePath).all()
    archive = BytesIO()

    with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zip_archive:
        for file in filesCollection:
            if file.key.endswith('.' + fileExt):
                with zip_archive.open(file.key, 'w') as file1:
                    file1.write(file.get()['Body'].read())

    archive.seek(0)
    s3.Object(bucketName, bucketFilePath + '/' + jobKey + '.zip').upload_fileobj(archive)
    archive.close()

    response['fileUrl'] = None
    if createUrl is True:
        s3Client = boto3.client('s3')
        response['fileUrl'] = s3Client.generate_presigned_url('get_object',
                                                              Params={'Bucket': bucketName,
                                                                      'Key': bucketFilePath + '/' + jobKey + '.zip'},
                                                              ExpiresIn=3600)

    return response
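A hypothetical invocation might look like this; the bucket, prefix, job key and extension below are placeholder values.
result = createZipFileStream(
    bucketName='my-bucket',
    bucketFilePath='exports/2023-01-01',
    jobKey='job-42',
    fileExt='csv',
    createUrl=True,
)
print(result['fileUrl'])  # presigned URL, valid for one hour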
The /tmp/ directory is limited to 512MB for AWS Lambda functions.
If you search StackOverflow, you'll see some code from people who have created Zip files on-the-fly without saving files to disk. It becomes pretty complicated.
An alternative would be to attach an EFS filesystem to the Lambda function, as sketched below. It takes a bit of effort to set up, but the cost would be practically zero if you delete the files after use, and you'll have plenty of disk space, so your code will be more reliable and easier to maintain.
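Here is a rough, hedged sketch of the EFS route; the mount path, event fields and bucket names are assumptions, and the function must already be configured with the EFS access point and VPC. The handler simply treats the mount as ordinary disk:
import os
import zipfile

import boto3

s3 = boto3.client('s3')
EFS_DIR = '/mnt/archive'  # assumed EFS mount path configured on the function

def handler(event, context):
    # Download the source objects onto the EFS mount (no 512 MB /tmp limit).
    local_paths = []
    for key in event['keys']:
        local_path = os.path.join(EFS_DIR, os.path.basename(key))
        s3.download_file(event['source_bucket'], key, local_path)
        local_paths.append(local_path)

    # Zip them on the EFS volume, then upload the archive back to S3.
    zip_path = os.path.join(EFS_DIR, 'archive.zip')
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in local_paths:
            zf.write(path, arcname=os.path.basename(path))
    s3.upload_file(zip_path, event['dest_bucket'], 'archive.zip')

    # Clean up so the EFS storage cost stays near zero.
    for path in local_paths + [zip_path]:
        os.remove(path)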
# For me, the code below worked in a Glue job to take a single .txt file from AWS S3, zip it, and upload it back to AWS S3.
import boto3
import zipfile
from io import BytesIO
import logging

logger = logging.getLogger()
s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')

# ZipFileStream function call
self._createZipFileStream(
    bucketName="My_AWS_S3_bucket_name",
    bucketFilePath="My_txt_object_prefix",
    bucketfileobject="My_txt_Object_prefix + txt_file_name",
    zipKey="My_zip_file_prefix")

# ZipFileStream function definition
def _createZipFileStream(self, bucketName: str, bucketFilePath: str, bucketfileobject: str, zipKey: str) -> None:
    try:
        obj = s3_resource.Object(bucket_name=bucketName, key=bucketfileobject)
        archive = BytesIO()

        with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zip_archive:
            with zip_archive.open(zipKey, 'w') as file1:
                file1.write(obj.get()['Body'].read())

        archive.seek(0)
        s3_client.upload_fileobj(archive, bucketName, bucketFilePath + '/' + zipKey + '.zip')
        archive.close()

        # If you would like to delete the .txt file from AWS S3 after zipping, the code below will work.
        self._delete_object(
            bucket=bucketName, key=bucketfileobject)
    except Exception as e:
        logger.error(f"Failed to zip the txt file for {bucketName}/{bucketfileobject}: {str(e)}")

# AWS S3 delete function definition
def _delete_object(bucket: str, key: str) -> None:
    try:
        logger.info(f"Deleting: {bucket}/{key}")
        s3_client.delete_object(
            Bucket=bucket,
            Key=key
        )
    except Exception as e:
        logger.error(f"Failed to delete {bucket}/{key}: {str(e)}")
I need to transfer files from Google Cloud Storage to Azure Blob Storage.
Google gives a code snippet to download files to a bytes variable like so:
# Get Payload Data
req = client.objects().get_media(
    bucket=bucket_name,
    object=object_name,
    generation=generation)  # optional

# The BytesIO object may be replaced with any io.Base instance.
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)
done = False
while not done:
    status, done = downloader.next_chunk()
    if status:
        print('Download %d%%.' % int(status.progress() * 100))
print('Download Complete!')
print(fh.getvalue())
I was able to modify this to store to a file by changing the fh object type like so:
fh = open(object_name, 'wb')
Then I can upload to Azure blob storage using blob_service.put_block_blob_from_path.
I want to avoid writing to a local file on the machine doing the transfer.
I gather Google's snippet loads the data into the io.BytesIO() object a chunk at a time. I reckon I should probably use this to write to blob storage a chunk at a time.
I experimented with reading the whole thing into memory and then uploading using put_block_blob_from_bytes, but I got a memory error (the file is probably too big, ~600 MB).
Any suggestions?
According to the source code of blobservice.py for Azure Storage and BlobReader for Google Cloud Storage, you can try to use the Azure function blob_service.put_block_blob_from_file to write the stream from GCS, since the BlobReader class has a read function that can act as a stream; please see below.
So, referring to the code from https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_BlobReader, you can try the following.
from google.appengine.ext import blobstore
from azure.storage.blob import BlobService
blob_key = ...
blob_reader = blobstore.BlobReader(blob_key)
blob_service = BlobService(account_name, account_key)
container_name = ...
blob_name = ...
blob_service.put_block_blob_from_file(container_name, blob_name, blob_reader)
After looking through the SDK source code, something like this could work:
from azure.storage.blob import _chunking
from azure.storage.blob import BlobService
# See _BlobChunkUploader
class PartialChunkUploader(_chunking._BlockBlobChunkUploader):
    def __init__(self, blob_service, container_name, blob_name, progress_callback=None):
        super(PartialChunkUploader, self).__init__(blob_service, container_name, blob_name, -1, -1, None, False, 5, 1.0, progress_callback, None)

    def process_chunk(self, chunk_offset, chunk_data):
        '''chunk_offset is the integer offset. chunk_data is an array of bytes.'''
        return self._upload_chunk_with_retries(chunk_offset, chunk_data)

blob_service = BlobService(account_name='myaccount', account_key='mykey')

uploader = PartialChunkUploader(blob_service, "container", "foo")
# while (...):
#     uploader.process_chunk(...)
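To connect this to the GCS download loop from the question, a rough, hedged sketch is shown below. It assumes the `req` media request from the question's snippet and the PartialChunkUploader defined above, tracks the byte offset of each chunk manually, and leaves out the final step of committing the uploaded blocks (e.g. with put_block_list), which the SDK's own upload helpers normally handle; container and blob names are placeholders.
import io
from googleapiclient.http import MediaIoBaseDownload

# `req`, `blob_service` and PartialChunkUploader are assumed to be set up as above.
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024 * 1024)
uploader = PartialChunkUploader(blob_service, "container", "foo")

offset = 0
done = False
while not done:
    status, done = downloader.next_chunk()
    data = fh.getvalue()                  # bytes of the chunk just downloaded
    uploader.process_chunk(offset, data)  # push this chunk to Azure at that offset
    offset += len(data)
    fh.seek(0)
    fh.truncate()                         # reset the buffer so memory does not grow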