How to specify S3 subfolder path in Lambda function - python

I am trying to upload a file to a subfolder in S3 from a Lambda function.
Any suggestions for achieving this? Currently I am only able to upload to the top level of the bucket:
s3_resource.Bucket("bucketname").upload_file("/tmp/file.csv", "file.csv")
However, my goal is to upload the file to bucketname/subfolder1/file.csv.
Thanks in advance

Amazon has this to say about S3 paths:
In Amazon S3, buckets and objects are the primary resources, and objects are stored in buckets. Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. It does this by using a shared name prefix for objects (that is, objects have names that begin with a common string). Object names are also referred to as key names.
In other words, you just need to specify the full key you want to use for the upload. Any directory concept only affects how objects are enumerated and displayed; there is no directory you need to create first, unlike a traditional filesystem:
s3_resource.Bucket("bucketname").upload_file("/tmp/file.csv", "subfolder1/file.csv")
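For completeness, in a Lambda function this could look something like the sketch below (reusing the bucket name and /tmp path from your question; the subfolder1/ prefix is created implicitly by the key name):

import boto3

s3_resource = boto3.resource("s3")

def lambda_handler(event, context):
    # "subfolder1/" is only part of the object key; nothing has to be created first
    s3_resource.Bucket("bucketname").upload_file("/tmp/file.csv", "subfolder1/file.csv")
    return {"statusCode": 200}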

Related

S3 resource not listing key for some folders

I am trying to delete files (not folders) from multiple folders within an S3 bucket.
my code:
for archive in src.objects.filter(Prefix=f"Nightly/{folder}"):
    s3.Object(BUCKET, archive.key).delete()
when I do this, it deletes only the files from some directories (works fine),
but for the other 2 folders it deletes the folder itself.
if you see the picture, I am listing the files in each folder.
folders account, user, Archive print an extra archive entry (highlighted),
but folders opportunity and opphistory do not print a key for the folder. I would like to know why this key is not printed for these 2 folders, thanks.
There are no folders and files in S3. Everything is an object. So Nightly/user/ is an object, just like Nightly/opportunity/opportunity1.txt is an object.
The "folders" are only visual representation made by AWS console:
The console uses the key name prefixes (Development/, Finance/, and Private/) and delimiter ('/') to present a folder structure.
So your "folders account, user, Archive printing an extra archive (highlighted)" are just objects called Nightly/user/, Nightly/account/ and Nightly/Archive/. Such objects are created when you click "New folder" in the AWS console (you can use also AWS SDK or CLI to create them). Your other "files" don't have such folders, because these "files" weren't created like this. Instead they where uploaded to S3 under their full name, e.g. Nightly/opportunity/opportunity1.txt.

How can I copy only changed files from one S3 Bucket to another one with python using boto3?

I want to copy files from one S3 bucket to another S3 bucket every x minutes, but I only want to copy the files that have changed. How can I achieve that with Python using boto3?
I would recommend using Amazon S3 replication, which can automatically copy objects from one bucket to another.
You can select which objects to copy by specifying a path or a tag.
It's all automatic.
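If you prefer to configure replication from Python rather than the console, boto3 exposes put_bucket_replication. A rough sketch (the bucket names and IAM role ARN below are placeholders, and versioning must already be enabled on both buckets):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",  # placeholder
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-reports",
                "Priority": 1,
                "Status": "Enabled",
                # the Filter can also select objects by Tag instead of Prefix
                "Filter": {"Prefix": "reports/"},
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},  # placeholder
                "DeleteMarkerReplication": {"Status": "Disabled"},
            }
        ],
    },
)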

How can I create/access a file in an other project's GCS bucket in Python?

I have 2 projects (A and B), and with A I want to write files to B's storage.
I have granted write access to A on the bucket of B that I want to write to.
I have checked https://cloud.google.com/storage/docs/json_api/v1/buckets/list
As mentioned there, I was able to get the list by passing the project number to the client:
req = client.buckets().list(project=PROJECT_NUMBER)
res = req.execute()
...
So far so good.
When I check the API to list a given bucket though, I am stuck.
https://cloud.google.com/storage/docs/json_api/v1/buckets/get
https://cloud.google.com/storage/docs/json_api/v1/objects/insert
These APIs do not expect project number, only bucket name.
How can I make sure that I save my file to the right bucket under project B?
What do I miss?
Thanks for your help in advance!
Bucket names are globally unique, so you don't need to specify a project number when dealing with a specific bucket. If bucket "bucketOfProjectB" belongs to project B, then you simply need to call buckets().get(bucket="bucketOfProjectB").execute(), and it will work even if the user calling it belongs to project A (as long as the caller has the right permissions).
Specifying a project is necessary when listing buckets or creating a new bucket, since buckets all belong to one specific project, but those are the only places you'll need to do it.
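As a sketch, using the same JSON API client style as in your question (the bucket and file names are placeholders, and the caller's credentials must have write access to project B's bucket):

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

client = build("storage", "v1")  # uses application default credentials

# no project needed: the bucket name alone identifies project B's bucket
bucket = client.buckets().get(bucket="bucketOfProjectB").execute()

# upload an object into that bucket
media = MediaFileUpload("local_file.txt", mimetype="text/plain")
req = client.objects().insert(
    bucket="bucketOfProjectB",
    name="path/in/bucket/file.txt",
    media_body=media,
)
res = req.execute()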

How to set a custom Key for files uploaded to appengine using blobstore

Following the appengine Uploading a blob instructions, I have been able to upload/download images.
But I want to find a method for preventing duplicates, therefore I would like to know if it is possible to have a custom key for the blobstore objects, or use the MD5 as the key, so that at least I could overwrite existing files.
Is there any hook or some extra parameter that I could use within the blobstore.create_upload_url that could help to specify a custom Key for the uploaded object?
Google is moving away from the blobstore. You can also use the Cloudstorage Client Library.
Some of the benefits:
since 1.9.0, free quota in the default GCS bucket
use folders and filenames and you can overwrite (replace) existing files.
create a serving_url for images and other files, which will be served by Google
and more ..
I have created this gist to show how to use GCS in Google App Engine.
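As a minimal sketch (bucket and file names are placeholders), writing with the cloudstorage library to an object name you choose simply replaces any existing object with that name:

import cloudstorage as gcs

def write_file(bucket_name, filename, data, content_type="image/png"):
    # /bucket/object -- writing to an existing name overwrites it,
    # which is how you avoid duplicates
    path = "/{}/{}".format(bucket_name, filename)
    with gcs.open(path, "w", content_type=content_type) as f:
        f.write(data)
    return path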
Blob keys are guaranteed to be unique. You don't need to do anything for that.
EDIT:
If you want to rewrite a blob, you need to store the key of the blob that you want to update somewhere in your model. If you want, you can also store a hash or any other identifier (i.e. the file name) in your model. Then you can compare the hash of a new file, for example, with the hashes of previously stored files, and decide if you want to delete a duplicate record.
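As a sketch of that idea (the model and helper below are hypothetical; md5_hash is a property the blobstore exposes on BlobInfo):

from google.appengine.ext import blobstore, ndb

class StoredFile(ndb.Model):
    # hypothetical model: keeps the blob key together with its content hash
    blob_key = ndb.BlobKeyProperty()
    md5 = ndb.StringProperty()

def save_if_new(blob_info):
    # reuse the existing blob if one with the same hash was already stored
    existing = StoredFile.query(StoredFile.md5 == blob_info.md5_hash).get()
    if existing:
        blobstore.delete(blob_info.key())  # drop the duplicate upload
        return existing
    record = StoredFile(blob_key=blob_info.key(), md5=blob_info.md5_hash)
    record.put()
    return record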

Storing multiple files with the same name in Google Cloud Storage?

So I am trying to port a Python webapp written with Flask to Google App Engine. The app hosts user-uploaded files up to 200mb in size, and for non-image files the original name of the file needs to be retained. To prevent filename conflicts, e.g. two people uploading stuff.zip, each containing completely different and unrelated contents, the app creates a UUID folder on the filesystem, stores the file within that, and serves it to users. Google Cloud Storage, which I was planning on using to store the user files by making a bucket, has "no notion of folders" according to their documentation. What is the best way to go about getting this same functionality with their system?
The current method, just for demonstration:
# generates a new folder with a shortened UUID name to save files
# other than images to avoid filename conflicts
else:
    # if there is a better way of doing this i'm not clever enough
    # to figure it out
    new_folder_name = shortuuid.uuid()[:9]
    os.mkdir(
        os.path.join(app.config['FILE_FOLDER'], new_folder_name))
    file.save(
        os.path.join(os.path.join(app.config['FILE_FOLDER'], new_folder_name), filename))
    new_folder_path = os.path.join(
        app.config['FILE_FOLDER'], new_folder_name)
    return url_for('uploaded_file', new_folder_name=new_folder_name)
From the Google Cloud Storage Client Library Overview documentation:
GCS and "subdirectories"
Google Cloud Storage documentation refers to "subdirectories" and the GCS client library allows you to supply subdirectory delimiters when you create an object. However, GCS does not actually store the objects into any real subdirectory. Instead, the subdirectories are simply part of the object filename. For example, if I have a bucket my_bucket and store the file somewhere/over/the/rainbow.mp3, the file rainbow.mp3 is not really stored in the subdirectory somewhere/over/the/. It is actually a file named somewhere/over/the/rainbow.mp3. Understanding this is important for using listbucket filtering.
While Cloud Storage does not support subdirectories per se, it allows you to use subdirectory delimiters inside filenames. This basically means that the path to your file will still look exactly as if it was inside a subdirectory, even though it is not. This apparently should concern you only when you're iterating over the entire contents of the bucket.
From the Request URIs documentation:
URIs for Standard Requests
For most operations you can use either of the following URLs to access objects:
storage.googleapis.com/<bucket>/<object>
<bucket>.storage.googleapis.com/<object>
This means that the public URL for their example would be http://storage.googleapis.com/my_bucket/somewhere/over/the/rainbow.mp3. Their service would interpret this as bucket=my_bucket and object=somewhere/over/the/rainbow.mp3 (i.e. no notion of subdirectories, just an object name with embedded slashes in it); the browser however will just see the path /my_bucket/somewhere/over/the/rainbow.mp3 and will interpret it as if the filename is rainbow.mp3.
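Putting that together for your use case, a sketch of saving a Flask upload under a UUID "folder" with the cloudstorage library (the bucket name is a placeholder; shortuuid as in your current code):

import cloudstorage as gcs
import shortuuid

BUCKET = "my_bucket"  # placeholder bucket name

def save_upload(file_storage, filename):
    # the UUID "folder" is really just a prefix embedded in the object name
    object_name = "/{}/{}/{}".format(BUCKET, shortuuid.uuid()[:9], filename)
    with gcs.open(object_name, "w", content_type=file_storage.mimetype) as f:
        f.write(file_storage.read())
    # if the object is publicly readable it is then served at
    # http://storage.googleapis.com/my_bucket/<uuid>/<filename>
    return object_name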
