How to upload an HDF5 file directly to an S3 bucket in Python

I want to upload an HDF5 file created with h5py to an S3 bucket without saving it locally, using boto3.
This solution uses pickle.dumps and pickle.loads, and the other solutions I have found store the file locally, which I would like to avoid.

You can use io.BytesIO() together with put_object, as illustrated here. Hope this helps. Even in this case, you'd have to 'store' the data locally (though 'in memory'). You could also create a tempfile.TemporaryFile and then upload your file with put_object. I don't think you can stream to an S3 bucket in the sense that the local data would be discarded as it is uploaded to the bucket.
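A minimal sketch of the in-memory approach, assuming h5py 2.9+ (which accepts file-like objects); the bucket and key names are placeholders:
import io
import boto3
import h5py
import numpy as np

# Build the HDF5 file entirely in memory.
buffer = io.BytesIO()
with h5py.File(buffer, 'w') as f:
    f.create_dataset('data', data=np.arange(10))

# Upload the in-memory bytes; 'my-bucket' and 'data.h5' are placeholders.
buffer.seek(0)
s3 = boto3.client('s3')
s3.put_object(Bucket='my-bucket', Key='data.h5', Body=buffer.getvalue())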

Related

How to upload an image to MongoDB using an S3 bucket and Boto3 in Python

I'm working on a Python application where the desired functionality is that the webcam is used to take in a live video feed and based on whether a condition is true, an image is clicked and uploaded to a database.
The database I am using is MongoDB. As far as I can understand, uploading images straight-up to a database is not the correct method. So, what I wanted to do is the following:
an image is clicked from the webcam
the image is uploaded to an S3 bucket (from the same Python script, so using boto3 perhaps)
a URL of the uploaded image is retrieved (this seems to be the tricky part)
and then this URL along with some other details is uploaded to the database. (this is the easy part)
My ideal workflow would be a way to take that image and upload it to an S3 bucket, retrieve the URL and then upload this URL to the database all in one .py script.
My question is: how do I upload an image to an S3 bucket and then retrieve its public URL all through boto3 in a Python script?
I also welcome any suggestions for a better approach/strategy for storing images in MongoDB. I saw on some pages that GridFS could be a good method, but that it is not recommended when image uploads happen frequently (and that using AWS is preferable).
The URL of an S3 object can be constructed if you know the S3 bucket, region, and key name:
https://{bucket}.s3.{region}.amazonaws.com/{key}
Using boto3 will be the easiest way to upload a file if you're using Python anyway.
See another answer of mine on different ways how to upload files here: https://stackoverflow.com/a/67108609/13245310
You don't need to 'retrieve' the public URL. You get to specify the bucket and key of the S3 object when you upload it, so you already have the information you need to know what the public URL will be once uploaded; it's not like S3 assigns a new unique name to your object after upload.
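For example, here is a minimal sketch using boto3's upload_file; the bucket, key, region, and local filename are placeholders, and the URL is only publicly readable if your bucket policy allows it:
import boto3

bucket = 'my-image-bucket'
key = 'captures/frame_001.jpg'
region = 'us-east-1'

# Upload the local image file to S3.
s3 = boto3.client('s3', region_name=region)
s3.upload_file('frame_001.jpg', bucket, key)

# The public URL follows directly from the bucket, region, and key.
url = f'https://{bucket}.s3.{region}.amazonaws.com/{key}'
# Store `url` (plus any other metadata) in MongoDB.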

How can I copy only changed files from one S3 Bucket to another one with python using boto3?

I want to copy files from one S3 bucket to another S3 bucket every x minutes. But of course I only want to update the files if they have changed. How can I achieve that with Python using boto3?
I would recommend using Amazon S3 replication, which can automatically copy objects from one bucket to another.
You can select which objects to copy by specifying a path or a tag.
It's all automatic.
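If you prefer to set replication up from Python rather than the console, here is a rough sketch using boto3's put_bucket_replication; the bucket names, prefix, and IAM role ARN are placeholders, and versioning must already be enabled on both buckets:
import boto3

s3 = boto3.client('s3')
s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
        'Rules': [
            {
                'ID': 'replicate-data-prefix',
                'Priority': 1,
                'Status': 'Enabled',
                # Only objects under this prefix are replicated.
                'Filter': {'Prefix': 'data/'},
                'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'},
                'DeleteMarkerReplication': {'Status': 'Disabled'},
            }
        ],
    },
)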

Write a list directly to gcs file

Currently I have a simple Python script which writes the elements of a list to a file. How can I do the same, but write the file to Google Cloud Storage?
Current code:
with open('/home/nitin/temp.txt', 'w') as f:
    for item in ["Nitin", "Agarwal"]:
        f.write(item[0] + '\n')
I tend to find blob's "upload_from_string" method preferable when dumping some in-memory data to GCS, instead of dumping it to a local file and uploading that file to GCS.
Would this work for you?
from google.cloud import storage
from google.cloud.storage import Blob

data = ["Nitin", "Agarwal"]
client = storage.Client(project="my-project")
bucket = client.get_bucket("my-bucket")
blob = Blob("data", bucket)  # "data" is the object name in the bucket
blob.upload_from_string("\n".join(data))
https://googleapis.dev/python/storage/latest/blobs.html#google.cloud.storage.blob.Blob.upload_from_string
Google Cloud Storage (GCS) is also known as blob storage, as opposed to file storage. Don't think of GCS as holding files; think of it as holding blobs of data. This means that you can't use file system APIs. Instead, build the blob of data that you want to write locally and then write that complete blob into GCS as a unit. When you write data into GCS, it is immutable: once written, you can't change it (you can delete it and rewrite a new copy). This means that you can't append to a blob.
Here is a good example:
Using Cloud Storage with Python

Uploading a file to GCS in GAE (Python)

I'm uploading files to my buckets in GCS through GAE
upload_url = blobstore.create_upload_url('/upload', gs_bucket_name='my_bucket')
as described in the documentation and this question
Everything works fine, but when I try to read the contents I find that the filename has been changed to a key value such as:
L2FwcGhvc3RpbmdfcHJvZC9ibG9icy9BRW5CMlVwOW93MmJzVWRyZ2RQSHJpMlNhMkZNUkloYm9xcnZnZlFzNEZCYnpWaGNENGkROOFk5b2pHSHBMcDIwcGVrVFZtYzdROHRDRWFpdy50YTNpMFdpNmNCQU9NU0xt
Is there any way to get the uploaded name of the file?
Thanks in advance
see: https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
"In your upload handler, you need to process the returned FileInfo metadata and explicitly store the GCS filename needed to retrieve the blob later."
More on FileInfo: https://cloud.google.com/appengine/docs/python/blobstore/fileinfoclass
and I think this question is similar to How to get FileInfo/gcs file_name for files uploaded via BlobstoreUploadHandler?
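As a rough sketch (assuming the webapp2 blobstore_handlers API from the GAE Python 2.7 runtime; the class name is illustrative), the upload handler can read the original filename from FileInfo and persist it:
from google.appengine.ext.webapp import blobstore_handlers

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # For uploads that went to GCS, get_file_infos() returns FileInfo
        # objects; filename is the client-side name and gs_object_name is
        # the object's path inside the bucket.
        file_info = self.get_file_infos()[0]
        original_name = file_info.filename
        gcs_name = file_info.gs_object_name
        # Store the mapping (e.g. in the datastore) so you can recover the
        # original filename later instead of the opaque key.
        self.response.write('%s stored as %s' % (original_name, gcs_name))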

How to upload a zip file to s3 using boto?

I zip a folder that has multiple subdirectories. When I upload it to S3 using boto, I read it like this:
zipdata = open(os.path.join(os.curdir, zip_file), 'rb').read()
Then all files from all subdirectories are copied to the root directory; that is, no subdirectory exists in S3.
How can I upload a zip file of a folder to S3?
After running the command you show above, zipdata will contain the bytes of the zip file. If you then write that data to S3, you will get a single object (key) in S3 that contains that data. Is that what you want?
It sounds like you want the zip file to be expanded and all of the individual files and directories inside it to be stored in S3 as individual objects. If that is the case, you need to expand the zip file locally and then walk through the hierarchy and store each individual file in S3. You could use the s3put command line tool in boto to do this for you.
There is no way to get S3 itself to unpack the contents of a zip file for you automatically.
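A minimal sketch of the expand-and-walk approach, using boto3 here rather than the older boto from the question; the archive name, extraction directory, and bucket are placeholders:
import os
import zipfile
import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'

# Extract the archive locally, then upload each file under its relative
# path so the directory structure is preserved as S3 key prefixes.
with zipfile.ZipFile('folder.zip') as zf:
    zf.extractall('unzipped')

for root, _, files in os.walk('unzipped'):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.relpath(local_path, 'unzipped').replace(os.sep, '/')
        s3.upload_file(local_path, bucket, key)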
