Amazon boto3 download file from S3 to tempfile - python

Python can create a temporary file as a context manager. Rather than creating a directory and a path myself and cleaning them up when done, I would prefer to use tempfile.
Is there any support in the boto3 client for downloading from S3 to a tempfile?

Try the download_fileobj method, which accepts a file-like object opened in binary mode. For example:
import boto3
import tempfile
s3 = boto3.client('s3')
with tempfile.TemporaryFile(mode='w+b') as f:
    s3.download_fileobj('mybucket', 'mykey', f)
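One detail the snippet leaves implicit: after download_fileobj returns, the file position sits at the end of the written data, so the file has to be rewound before it is read. A minimal sketch, assuming a hypothetical helper name `download_to_tempfile` (not part of boto3) and a client object that exposes boto3's `download_fileobj(Bucket, Key, Fileobj)`:

```python
import tempfile


def download_to_tempfile(s3_client, bucket, key):
    """Download s3://bucket/key into an anonymous temp file and rewind it.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The caller owns the returned file; closing it deletes it.
    """
    f = tempfile.TemporaryFile(mode="w+b")
    try:
        s3_client.download_fileobj(bucket, key, f)
        f.seek(0)  # the download left the position at the end of the file
    except Exception:
        f.close()  # do not leak the temp file if the download fails
        raise
    return f


# Possible usage (needs boto3 and credentials, so left as a comment):
# with download_to_tempfile(boto3.client("s3"), "mybucket", "mykey") as f:
#     data = f.read()
```

Returning the open file rather than its contents keeps large objects on disk instead of in memory.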

Related

Create a zip file on S3 from CSV files on S3 using Lambda

Around 60 CSV files are generated daily in my S3 bucket. The average size of each file is around 500 MB. I want to zip all these files in a Lambda function on the fly (without downloading a file inside the Lambda execution) and upload the zipped files to another S3 bucket. I came across these solutions 1 and 2, but I am still having issues with the implementation. Right now, I am trying to stream the CSV file data into a zip file (created in the Lambda tmp directory) and then upload it to S3. But I get this error message while writing into the zip file:
[Errno 36] File name too long
This is my test Lambda function, where I am trying with just one file, but in the actual case I need to zip 50-60 CSV files individually:
import boto3
import zipfile
def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    iterator = s3.Object('bucket-name', 'file-name').get()['Body'].iter_lines()
    my_zip = zipfile.ZipFile('/tmp/test.zip', 'w')
    for line in iterator:
        my_zip.write(line)
    s3_resource.meta.client.upload_fileobj(file-name, "another-bucket-name", "object-name")
Also, is there a way to stream data from my CSV file, zip it, and upload it to another S3 bucket without holding the full zip file in Lambda memory?
After a lot of research and trials, I was able to make it work. I used the smart_open library and managed to zip a 550 MB file with just 150 MB of memory usage in my Lambda. To use an external library, I had to use Lambda Layers. Here is my code:
from smart_open import open, register_compressor
import lzma, os
def lambda_handler(event, context):
    # smart_open picks the compression codec from the key's extension (for example .gz)
    with open('s3://bucket-name-where-large-file/file-key-name') as fin:
        with open('s3://bucket-name-to-put-zip-file/zip-file-key-name', 'w') as fout:
            for line in fin:
                fout.write(line)
Please note that smart_open supports .gz and .bz2 compression. If you want to compress files in other formats, you can create your own compressor using the library's register_compressor method.
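The `[Errno 36] File name too long` error in the question comes from `ZipFile.write`, which expects the path of a file on disk, so each CSV line was being treated as a file name. A minimal standard-library sketch of streaming lines into a single archive member instead, using `ZipFile.open(name, 'w')`. The function name `zip_lines` and the member name are illustrative assumptions, and the in-memory buffer stands in for the S3 body and the upload target:

```python
import io
import zipfile


def zip_lines(lines, out_fileobj, member_name="data.csv"):
    """Write an iterable of byte lines (without newlines) into one zip member."""
    with zipfile.ZipFile(out_fileobj, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # ZipFile.open(..., "w") returns a writable stream for the member,
        # so data is compressed piece by piece instead of being read from a path.
        # force_zip64 allows the member to grow past 2 GiB.
        with zf.open(member_name, "w", force_zip64=True) as member:
            for line in lines:
                member.write(line + b"\n")


# Example with in-memory data. iter_lines() on an S3 body yields lines
# without their trailing newline, which is why one is added back above.
buf = io.BytesIO()
zip_lines([b"a,b", b"1,2"], buf)
```

This still materialises the whole archive in `out_fileobj` (for instance a file under /tmp), so it addresses the error rather than the memory question; the smart_open approach above streams the output to S3 as well.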

Extract 7z files on the fly in S3 with boto3

I have a really large 7z file in an S3 bucket, say s3://tempbucket1/Test_For7zip.7z, that runs to several tens of GB. I do not want to download it, unzip it, and re-upload it to S3. I want to use boto3 to unzip it on the fly and save it into S3.
I tried to solve this using the lzma package, based on a previous SO answer that dealt with on-the-fly unzipping of *.zip files using the fileobj option present in gzip.GzipFile.
from io import BytesIO
import gzip
import lzma
import boto3
# setup constants
bucket = 'tempbucket1'
gzipped_key = 'Test_For7zip.7z'
uncompressed_key = 'Test_Unzip7zip'
# initialize s3 client, this is dependent upon your aws config being done
s3 = boto3.client('s3', use_ssl=False)
s3.upload_fileobj(                      # upload a new obj to s3
    Fileobj=lzma.LZMAFile(
        BytesIO(s3.get_object(Bucket=bucket,
                              Key=gzipped_key)['Body'].read()),
        'rb'),                          # read binary
    Bucket=bucket,                      # target bucket, writing to
    Key=uncompressed_key)               # target key, writing to
However, this throws the following error:
LZMAError: Input format not supported by decoder
Is there a Python package that can decode 7z files based on BytesIO, or is there a better way of achieving this?
I never tried this, but Googling gave me this as a possible solution. Please reach out through this post if this solves your problem.
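The LZMAError is expected: a .7z file is an archive container with its own header, not a bare .xz or .lzma stream, so `lzma.LZMAFile` cannot parse it even when the data inside is LZMA-compressed. A small sketch that identifies the format from the leading magic bytes before a decoder is chosen. `sniff_format` is a hypothetical helper, and the 7z case would need a third-party archive library (py7zr is one candidate, untested here):

```python
import gzip
import lzma

# Leading signature bytes of some common compressed formats.
MAGIC = {
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"\xfd7zXZ\x00": "xz",
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
}


def sniff_format(header):
    """Return the format name matching the first bytes, or None if unknown."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return None


print(sniff_format(lzma.compress(b"x")))                   # xz
print(sniff_format(gzip.compress(b"x")))                   # gzip
print(sniff_format(b"7z\xbc\xaf\x27\x1c" + b"\x00" * 26))  # 7z
```

With S3, the first few bytes can be fetched without downloading the object by passing a `Range` value such as `'bytes=0-5'` to `get_object`. Note that the 7z layout generally requires seeking within the archive, so a purely streaming decode of a multi-GB file may not be possible without temporary storage.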

How to get .stl file from Amazon S3 by using boto3?

I have a Django web application, and I deployed it to an Elastic Beanstalk environment. I also have the numpy-stl package. I'm trying to get a .stl file from an Amazon S3 bucket and use it with one of the stl package's functions, but I'm getting an error: 'bytes' object has no attribute 'get_mass_properties'.
My code is:
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=key)
body = obj['Body'].read()
volume, cog, inertia = body.get_mass_properties()
How can I get the .stl file and use it?
Assuming that you are talking about this STL file format, once you have read it into Python from S3, you need a Python library to open it.
A quick search returns numpy-stl:
Simple library to make working with STL files (and 3D objects in general) fast and easy.
Thus you can install that library and attempt to use it on the file you are downloading.
If you run your code on Lambda (not stated in your question), you would have to bundle the library with your deployment package or build a custom Lambda layer for it.
I fixed it as shown below.
import stl
import boto3
import tempfile
s3 = boto3.resource('s3', region_name=region)
bucket = s3.Bucket(bucket)
obj = bucket.Object(uploadedVolume)
tmp = tempfile.NamedTemporaryFile()
with open(tmp.name, 'wb') as f:
    obj.download_fileobj(f)
stlMesh = stl.mesh.Mesh.from_file(tmp.name)
volume, cog, inertia = stlMesh.get_mass_properties()
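The fix works because Mesh.from_file takes a path on disk, whereas get_object returns raw bytes. A slightly tighter sketch of the same pattern keeps the temp file inside a `with` block and flushes it before handing the path to a loader. `load_via_tempfile`, `s3_object`, and `loader` are illustrative names; `s3_object` is assumed to be a boto3 `Object` resource and `loader` something like `stl.mesh.Mesh.from_file`:

```python
import tempfile


def load_via_tempfile(s3_object, loader, suffix=".stl"):
    """Download an S3 object to a named temp file and pass its path to loader."""
    with tempfile.NamedTemporaryFile(suffix=suffix) as tmp:
        s3_object.download_fileobj(tmp)
        tmp.flush()  # make sure all bytes are on disk before reopening by name
        # Reopening by name while the file is still open works on Unix-like
        # systems; on Windows it may fail.
        return loader(tmp.name)


# Possible usage (needs boto3 and numpy-stl, so left as a comment):
# mesh = load_via_tempfile(bucket.Object(uploadedVolume), stl.mesh.Mesh.from_file)
# volume, cog, inertia = mesh.get_mass_properties()
```

The loader must finish reading inside the `with` block, because the file is deleted as soon as the block exits.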

How to invoke gsutil or use path of GCS objects to move data from GCS to s3 bucket using cloud function

I am trying to move files from GCS to an S3 bucket using Google Cloud Functions (the equivalent of AWS Lambda). To achieve this, I've tried three different methods. Method 1 raises an error; the other two raise no error, but the files don't actually get copied over.
Can someone please help?
The two other methods are commented out with #, and I have tried each one separately.
s3_client.upload_file is not working because it expects the path of the source file, and when I provide 'gs://< google_bucket_name>/30327570.pdf', it says
'No such file or directory exists'
The gsutil command executes without error, but no new file gets created in the S3 bucket.
import os
from google.cloud import storage
import boto3
import subprocess
s3_client=boto3.client('s3',aws_access_key_id='XYZ',aws_secret_access_key='ABC')
client = storage.Client()
def hello_gcs(data, context):
    bucket = client.get_bucket(data['bucket'])
    blob = bucket.blob(data['name'])
    #subprocess.call(['gsutil -m rsync -r gs://<google_bucket_name>/30327570.pdf s3://<aws_bucket_name>'], shell=True)
    subprocess.call(['gsutil cp gs://<google_bucket_name>/30327570.pdf s3://<aws_bucket_name>'], shell=True)
    #s3_client.upload_file('gs://<google_bucket_name>/30327570.pdf','<aws_bucket_name>','30327570.pdf')
If gsutil rsync won't work, you can try rclone, or invert the process and migrate the data from S3 to GCS.
Although this is written in JavaScript, here is a Google Cloud Function to sync files from a GCS bucket to an S3 bucket:
https://github.com/pendo-io/gcs-s3-sync
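For a Python-only route: upload_file fails in the question because it expects a local filesystem path, and a gs:// URL is not one. The gsutil call, for its part, appears to succeed because subprocess.call only returns the exit code and never raises, so a missing or failing gsutil (it may not be installed in the Cloud Functions runtime) goes unnoticed. One option is to pull the blob into a temp file and hand that file object to `upload_fileobj`. A sketch under assumptions: `copy_blob_to_s3` is a hypothetical helper, `blob` is a google-cloud-storage `Blob`, and `s3_client` is a boto3 S3 client:

```python
import tempfile


def copy_blob_to_s3(blob, s3_client, s3_bucket, s3_key=None):
    """Copy one GCS blob to S3 through a temporary file."""
    key = s3_key or blob.name  # default to the same object name as in GCS
    with tempfile.TemporaryFile() as tmp:
        blob.download_to_file(tmp)  # google-cloud-storage writes the blob bytes
        tmp.seek(0)                 # rewind so boto3 reads from the start
        s3_client.upload_fileobj(tmp, s3_bucket, key)
    return key


# Possible usage inside hello_gcs (left as a comment, needs both SDKs):
# copy_blob_to_s3(blob, s3_client, '<aws_bucket_name>')
```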

python to write directly to S3 bucket

I have a Python function that prints some output. I am going to deploy it to AWS Lambda and would like to print this output directly to a file in an S3 bucket.
So, something to redirect stdout to a file in an S3 bucket.
This is how I call the function:
recurse_for_values(top_vault_prefix, top_level_keys)
Almost the same as Thomas L., but with io so that you don't actually write any file locally (everything is in memory).
import io
import tinys3
f=io.StringIO(u"some initial text data")
conn = tinys3.Connection('yourAccessKey', 'yourSecretKey', tls=True)
conn.upload('yourS3Key', f, 'yourBucketName')
f.close()
OR
you could use something like S3FS (https://github.com/s3fs-fuse/s3fs-fuse) to mount your S3 bucket as a disk and then simply redirect the output as you would do for a local disk, but I heavily discourage this option. S3 is definitely not a filesystem and shouldn't be used as such. For example it generates loads of requests to S3 (and thus costs) and may not be that reliable...
You can write your logs to a local file and then synchronise it with AWS S3 at the end of your script.
import tinys3
# Write the logs to a local file first
with open("logfile", "w") as log_file_object:
    log_file_object.write("Some logs...")
    log_file_object.write("Some other logs...")
conn = tinys3.Connection('S3_ACCESS_KEY','S3_SECRET_KEY',tls=True)
# Reopen the finished log file in binary mode and upload it
with open("logfile", "rb") as f:
    conn.upload('log_file_DDMMYYYY',f,'my_bucket')
You can also use boto3 to upload your file, but tinys3 is easier to use for small tasks.
Hope it can help.
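Neither answer shows the stdout redirection itself. A minimal sketch using the standard library's contextlib.redirect_stdout to capture what a function prints and boto3's put_object to store it. `run_and_upload_stdout` is a hypothetical name, and `s3_client` is assumed to be a boto3 S3 client:

```python
import contextlib
import io


def run_and_upload_stdout(func, s3_client, bucket, key, *args, **kwargs):
    """Run func, capture everything it prints, and store it as an S3 object."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    # put_object accepts bytes for Body, so encode the captured text.
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8")
    )
    return result


# Possible usage (needs boto3 and credentials, so left as a comment):
# run_and_upload_stdout(recurse_for_values, boto3.client("s3"),
#                       "my_bucket", "output.txt",
#                       top_vault_prefix, top_level_keys)
```

All captured output is held in memory until the upload, which suits modest amounts of text; for large output, writing to a file under /tmp and uploading that file would be the safer choice.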
