Python to write directly to S3 bucket

I have a Python function that prints some output. I am going to put it into AWS Lambda and would like to write that output directly to a file in an S3 bucket.
So I need something that redirects stdout to a file in the S3 bucket.
This is how I call the function:
recurse_for_values(top_vault_prefix, top_level_keys)

Almost the same as Thomas L.'s answer, but using io so that you don't actually write any file locally (everything stays in memory).
import io
import tinys3
f=io.StringIO(u"some initial text data")
conn = tinys3.Connection('yourAccessKey', 'yourSecretKey', tls=True)
conn.upload('yourS3Key', f, 'yourBucketName')
f.close()
OR
you could use something like S3FS (https://github.com/s3fs-fuse/s3fs-fuse) to mount your S3 bucket as a disk and then simply redirect the output as you would do for a local disk, but I heavily discourage this option. S3 is definitely not a filesystem and shouldn't be used as such. For example it generates loads of requests to S3 (and thus costs) and may not be that reliable...
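If you would rather not depend on tinys3, the same fully in-memory approach should also work with boto3's put_object. A minimal sketch, with the bucket and key names as placeholders:
import io

import boto3

# Build the output entirely in memory, then push it to S3 in a single call.
buffer = io.StringIO(u"some initial text data")

s3 = boto3.client('s3')
s3.put_object(Bucket='yourBucketName', Key='yourS3Key',
              Body=buffer.getvalue().encode('utf-8'))
buffer.close()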

You can write your logs to a local file and then synchronise it with AWS S3 at the end of your script.
import tinys3
log_file_object = open('logfile', 'w')
log_file_object.write("Some logs...")
log_file_object.write("Some other logs...")
log_file_object.close()

conn = tinys3.Connection('S3_ACCESS_KEY', 'S3_SECRET_KEY', tls=True)
with open('logfile', 'rb') as f:
    conn.upload('log_file_DDMMYYYY', f, 'my_bucket')
You can also use boto3 to upload your file, but tinys3 is easier for tiny use cases.
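For comparison, a minimal boto3 sketch of the same upload (the credentials, file name and bucket name are placeholders):
import boto3

s3 = boto3.client('s3', aws_access_key_id='S3_ACCESS_KEY',
                  aws_secret_access_key='S3_SECRET_KEY')

# upload_file takes a local path, so the closed log file can be sent directly.
s3.upload_file('logfile', 'my_bucket', 'log_file_DDMMYYYY')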
Hope it can help.

Related

Extract 7z files on the fly in S3 with boto3

I have a really large 7z file in an S3 bucket, say s3://tempbucket1/Test_For7zip.7z, that runs into several tens of GB. I do not want to download it, unzip it, and re-upload it back to S3. I want to use Boto3 to unzip it on the fly and save it into S3.
I tried to solve this using the lzma package, based on a previous SO answer that dealt with on-the-fly unzipping of *.zip files using the fileobj option of gzip.GzipFile.
from io import BytesIO
import gzip
import lzma
import boto3

# setup constants
bucket = 'tempbucket1'
gzipped_key = 'Test_For7zip.7z'
uncompressed_key = 'Test_Unzip7zip'

# initialize s3 client, this is dependent upon your aws config being done
s3 = boto3.client('s3', use_ssl=False)

s3.upload_fileobj(  # upload a new obj to s3
    Fileobj=lzma.LZMAFile(
        BytesIO(s3.get_object(Bucket=bucket,
                              Key=gzipped_key)['Body'].read()),
        'rb'),  # read binary
    Bucket=bucket,  # target bucket, writing to
    Key=uncompressed_key)  # target key, writing to
However, this throws the following error:
LZMAError: Input format not supported by decoder
Is there a Python package that can decode 7z files from a BytesIO object, or is there a better way of achieving this?
I have never tried this, but Googling gave me this as a possible solution. Please report back through this post if it solves your problem.
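If the suggested solution is the py7zr package, a rough sketch might look like the following. It assumes py7zr can open a file-like object and that the archive fits in memory, which will not be true for an archive of several tens of GB; the keys are placeholders.
from io import BytesIO

import boto3
import py7zr

bucket = 'tempbucket1'
archive_key = 'Test_For7zip.7z'
target_prefix = 'Test_Unzip7zip'

s3 = boto3.client('s3')

# Pull the whole archive into memory (only viable for small archives).
body = s3.get_object(Bucket=bucket, Key=archive_key)['Body'].read()

with py7zr.SevenZipFile(BytesIO(body), mode='r') as archive:
    # readall() returns a dict mapping member names to BytesIO objects.
    for name, data in archive.readall().items():
        s3.upload_fileobj(Fileobj=data, Bucket=bucket, Key=f'{target_prefix}/{name}')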

How to get .stl file from Amazon S3 by using boto3?

I have a Django web application and I deployed it to an Elastic Beanstalk environment. I also have the numpy-stl package. I'm trying to get a .stl file from an Amazon S3 bucket and use it with one of the stl package's functions, but I'm getting the error 'bytes' object has no attribute 'get_mass_properties'.
My code is:
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=key)
body = obj['Body'].read()
volume, cog, inertia = body.get_mass_properties()
How can I get the .stl file and use it?
Assuming that you are talking about this stl file format, once you read it into Python from S3, you need some Python library to open it.
Quick search returns numpy-stl:
Simple library to make working with STL files (and 3D objects in general) fast and easy.
Thus you can install that library and attempt to use it on the file you are downloading.
In case you run your code on Lambda (not stated in your question?), you would have to bundle the library with your deployment package or build a custom Lambda layer for it.
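For instance, here is a sketch of feeding the downloaded bytes to numpy-stl in memory. It assumes Mesh.from_file accepts an already-open file handle via its fh argument; bucket_name and key are the variables from your code.
import io

import boto3
from stl import mesh

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=key)
body = obj['Body'].read()  # raw bytes of the .stl file

# Parse the bytes instead of calling mesh methods on the bytes object itself.
stl_mesh = mesh.Mesh.from_file(key, fh=io.BytesIO(body))
volume, cog, inertia = stl_mesh.get_mass_properties()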
I have fixed it as below.
import stl
import boto3
import tempfile

s3 = boto3.resource('s3', region_name=region)
bucket = s3.Bucket(bucket)
obj = bucket.Object(uploadedVolume)

tmp = tempfile.NamedTemporaryFile()
with open(tmp.name, 'wb') as f:
    obj.download_fileobj(f)

stlMesh = stl.mesh.Mesh.from_file(tmp.name)
volume, cog, inertia = stlMesh.get_mass_properties()

Amazon boto3 download file from S3 to tempfile

Python has the ability to create a tempfile as a context manager. Rather than having to create a directory and path myself and then clean it up when done, it would be better to use a tempfile.
Is there any support in the boto3 client to download, from s3, to a tempfile?
Try the download_fileobj method which accepts a file-like object and requires binary mode, for example:
import boto3
import tempfile
s3 = boto3.client('s3')
with tempfile.TemporaryFile(mode='w+b') as f:
    s3.download_fileobj('mybucket', 'mykey', f)
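If you also want to read the contents back inside the same with block, note that the cursor is at the end of the file after the download, so rewind it first, for example:
import boto3
import tempfile

s3 = boto3.client('s3')

with tempfile.TemporaryFile(mode='w+b') as f:
    s3.download_fileobj('mybucket', 'mykey', f)
    f.seek(0)        # the download leaves the cursor at EOF; rewind before reading
    data = f.read()  # bytes of the S3 object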

Explicitly, how can I get s3fs to open a gz file in AWS? Glue/S3

Using AWS Glue, Amazon S3, and s3fs, I've come up with the following (among other attempts). I've even looked at the examples at https://s3fs.readthedocs.io/en/latest/, but they're not getting me there.
key = 'https://s3.console.aws.amazon.com/s3/buckets/datalake-sbox-landing-55555-us-weast/cloudwatch_logs/prod-coarsair/dt%267D2019-09-11/144221-3837-462d-a2e6-ba300/coarsair-coarsair-144221-3837-462d-a2e6-ba300/000000.gz'
s3 = s3fs.S3FileSystem()
with s3.open(key, 'r') as f:
    args_gz_file = f
Then it should be as easy as typing "args_gz_file" now, right? Nope. Where am I going wrong?!
The samples show:
with fs.open('my-bucket/my-file.txt', 'rb') as f:
This suggests that the first parameter is BUCKET-NAME/KEY.
However, your code sample shows a URL (https://s3.console.aws.amazon.com/s3/buckets/) instead of a bucket name.
Perhaps try:
key = 'datalake-sbox-landing-55555-us-weast/cloudwatch_logs/prod-coarsair/dt%267D2019-09-11/144221-3837-462d-a2e6-ba300/coarsair-coarsair-144221-3837-462d-a2e6-ba300/000000.gz'
Side-note: It is recommended to use the official AWS SDK or the AWS CLI to access Amazon S3, rather than using s3fs. Amazon S3 is an object storage service, not a filesystem.
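That said, if you stay with s3fs, once the key is in bucket/key form the compressed object can be opened in binary mode and wrapped with gzip. A sketch, with a placeholder key:
import gzip

import s3fs

key = 'my-bucket/path/to/000000.gz'  # bucket/key form, as in the corrected key above

fs = s3fs.S3FileSystem()
with fs.open(key, 'rb') as raw:  # binary mode, not 'r'
    with gzip.GzipFile(fileobj=raw) as gz:
        for line in gz:
            print(line.decode('utf-8'))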

How to invoke gsutil or use path of GCS objects to move data from GCS to s3 bucket using cloud function

I am trying to move files from GCS to an S3 bucket using Google Cloud Functions (the equivalent of AWS Lambda). To achieve this, I've tried 3 different methods. In method 1 I get an error, and while I do not get errors with the other 2 options, the files don't actually get copied over.
Can someone please help?
The two other methods are commented out with # and I have tried each one separately.
s3_client.upload_file is not working because it expects a path to a local source file, and when I provide 'gs://<google_bucket_name>/30327570.pdf' it says
'No such file or directory exists'
The gsutil command executes without error, but no new file gets created in the S3 bucket.
import os
from google.cloud import storage
import boto3
import subprocess

s3_client = boto3.client('s3', aws_access_key_id='XYZ', aws_secret_access_key='ABC')
client = storage.Client()

def hello_gcs(data, context):
    bucket = client.get_bucket(data['bucket'])
    blob = bucket.blob(data['name'])
    #subprocess.call(['gsutil -m rsync -r gs://<google_bucket_name>/30327570.pdf s3://<aws_bucket_name>'], shell=True)
    subprocess.call(['gsutil cp gs://<google_bucket_name>/30327570.pdf s3://<aws_bucket_name>'], shell=True)
    #s3_client.upload_file('gs://<google_bucket_name>/30327570.pdf','<aws_bucket_name>','30327570.pdf')
If gsutil rsync won't work, you can try rclone, or invert the process and migrate the data from S3 to GCS instead.
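Alternatively, you can stay entirely in Python inside the Cloud Function: download the object into memory with the google-cloud-storage client and push it to S3 with boto3's upload_fileobj, which avoids upload_file's requirement for a local path. A minimal sketch, reusing the placeholder credentials and bucket names from the question:
import io

import boto3
from google.cloud import storage

s3_client = boto3.client('s3', aws_access_key_id='XYZ', aws_secret_access_key='ABC')
gcs_client = storage.Client()

def hello_gcs(data, context):
    # Pull the GCS object into memory (use download_as_string on older client versions)...
    bucket = gcs_client.get_bucket(data['bucket'])
    blob = bucket.blob(data['name'])
    payload = io.BytesIO(blob.download_as_bytes())

    # ...and stream it to S3; upload_fileobj takes a file-like object,
    # so no local file path is required.
    s3_client.upload_fileobj(payload, '<aws_bucket_name>', data['name'])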
Although this is written in JavaScript, here is a Google Cloud Function to sync files from a GCS bucket to an S3 bucket:
https://github.com/pendo-io/gcs-s3-sync
