I am trying to get multiple objects from an S3 bucket using Python, with the AWS CLI installed and configured. I can currently get a single file using this code:
import boto3
url = boto3.client('s3').generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': 'test-bucket', 'Key': '00001.png'},
    ExpiresIn=3600)
print(url)
However, I need to generate the same thing for 100 other image files. How can I do this?
Run the code 100 times -- seriously!
You should separate out the client generation, such as:
s3_client = boto3.client('s3')
url = s3_client.generate_presigned_url(...)
It's a very quick command and doesn't require a call to AWS, so you can repeat or loop through the last line many times.
Each object will require a separate pre-signed URL because permission is being generated for just one object at a time.
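For example, a minimal sketch of such a loop, assuming the keys follow the 00001.png, 00002.png, ... naming shown above (adjust the key list to your actual filenames):
import boto3

s3_client = boto3.client('s3')

# Assumed key naming: 00001.png .. 00100.png
keys = [f'{i:05d}.png' for i in range(1, 101)]

urls = {}
for key in keys:
    # generate_presigned_url signs locally; no network call to AWS is made
    urls[key] = s3_client.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': 'test-bucket', 'Key': key},
        ExpiresIn=3600)

for key, url in urls.items():
    print(key, url)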
I am asking for help with listing the objects in my CloudCube bucket. I am developing a Django application hosted on Heroku, using the CloudCube add-on for persistent storage. CloudCube runs on an AWS S3 bucket, and CloudCube provides a private key/namespace to access my files. I use the boto3 library to access the bucket, and everything works fine when I upload or download a file; however, I am struggling to list the objects in that bucket under the CloudCube prefix key. On any such request I receive an AccessDenied exception.
To access the bucket I use the following implementation:
s3_client = boto3.client('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                         aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
                         endpoint_url=settings.AWS_S3_ENDPOINT_URL, region_name='eu-west-1')

s3_result = s3_client.list_objects(Bucket=settings.AWS_STORAGE_BUCKET_NAME, Prefix=settings.CLOUD_CUBE_KEY)

if 'Contents' not in s3_result:
    return []

file_list = []
for key in s3_result['Contents']:
    if f"{username}/{mode.value}" in key['Key']:
        file_list.append(key['Key'])
As the bucket name, I am using the prefix in the URI that points to the CloudCube bucket on AWS, per their documentation: https://BUCKETNAME.s3.amazonaws.com/CUBENAME. CUBENAME is then used as the prefix key.
Does anyone have a clue what I am missing?
Thank you in advance!
According to CloudCube's documentation, you need a trailing slash on the prefix to list the directory.
So you should update your code like this to make it work:
s3_result = s3_client.list_objects(Bucket=settings.AWS_STORAGE_BUCKET_NAME, Prefix=f'{settings.CLOUD_CUBE_KEY}/')
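If it helps, here is a fuller sketch of the listing with the trailing slash applied, keeping the same settings names as in the question; a paginator is used because list_objects returns at most 1,000 keys per call:
s3_client = boto3.client(
    's3',
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
    endpoint_url=settings.AWS_S3_ENDPOINT_URL,
    region_name='eu-west-1')

# Paginate in case the cube holds more than 1,000 objects
paginator = s3_client.get_paginator('list_objects')
file_list = []
for page in paginator.paginate(Bucket=settings.AWS_STORAGE_BUCKET_NAME,
                               Prefix=f'{settings.CLOUD_CUBE_KEY}/'):
    for obj in page.get('Contents', []):
        file_list.append(obj['Key'])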
My S3 bucket has thousands of files. They are all in the same "folder", so the S3 prefix is the same. I want to use Python boto3 to get a list of filenames that contain a certain word. I don't want the boto3 call to send back all the filenames and have the client filter out the names. I have seen examples using "yield" and ".filter", but those receive all the files and make the client do a lot of work.
To help give a better understanding, if I use the AWS CLI:
aws --profile test s3api list-objects-v2 --bucket mybucket --prefix tmp/ --output json --query "Contents[?contains(Key, 'foobar')]"
BUT I need to send a request using boto3 and have AWS send back only the filenames with "foobar" in them.
BUT I need to send a request using boto3 and have AWS send back only the filenames with "foobar" in them.
You can't do this with regular boto3 S3 API calls, because that is not how the API works: S3 list operations only filter server-side by prefix. So if you don't want to retrieve all the names and then filter them yourself, there is no way to achieve what you want with a single boto3 request.
The only alternative would be Amazon S3 Inventory. You could enable an inventory report, fetch the resulting CSV file, and filter that, but you would still be doing the filtering yourself.
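That said, boto3 can at least do the client-side filtering for you, mirroring the CLI's --query expression. A minimal sketch, with the bucket, prefix, and profile names taken from the CLI example above:
import boto3

session = boto3.Session(profile_name='test')
s3_client = session.client('s3')

# The paginator fetches every key under the prefix; the JMESPath search()
# then filters them locally, just like the CLI's --query does.
paginator = s3_client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='mybucket', Prefix='tmp/')
matching_keys = list(pages.search("Contents[?contains(Key, 'foobar')].Key"))
print(matching_keys)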
I can create a bucket using these parameters, but none of them is a custom header. It is also said that boto3 will not support this, because S3 does not currently allow setting arbitrary headers on buckets or objects.
But in my case I am using Cloudian as storage, which supports x-gmt-policyid; this policy determines how data in the bucket will be distributed and protected through either replication or erasure coding.
Any idea how to inject a custom header into boto3 bucket creation?
s3_resource.create_bucket(Bucket='foo-1')
My last two options:
1) Fork botocore and add this functionality, but I saw they use loaders.py that reads everything from JSON files, and it seems a bit complicated for a beginner.
2) Or use a pure Python implementation with the requests module to create the S3 bucket.
Thanks for suggestions.
My current solution is to call the S3-compatible Cloudian API directly. Signing the request is very complicated, so I use the requests-aws4auth library to help; I tried other libs but failed.
Example of creating a bucket with a Cloudian x-gmt-policyid value:
import requests
from requests_aws4auth import AWS4Auth

endpoint = "http://awesome-bucket.my-s3.net"
auth = AWS4Auth(
    "00ac60d1a669fakekey",
    "S2/x9sRvb1Jys9n+fakekey",
    "eu-west-1",
    "s3",
)

headers = {
    "x-gmt-policyid": "9f934425b7f5de611c32fakeid",
    "x-amz-acl": "public-read",
}

response = requests.put(endpoint, auth=auth, headers=headers)
print(response.text)
I have some data in S3 and I want to create a Lambda function that predicts the output with my deployed AWS SageMaker endpoint and then puts the results back into S3. Is it necessary in this case to create an API Gateway as described in this link? And what do I have to put in the Lambda function? I expect to specify where to find the data, how to invoke the endpoint, and where to put the output.
import boto3
import io
import json
import csv
import os

client = boto3.client('s3')      # low-level functional API
resource = boto3.resource('s3')  # high-level object-oriented API
my_bucket = resource.Bucket('demo-scikit-byo-iris')  # substitute your S3 bucket name

obj = client.get_object(Bucket='demo-scikit-byo-iris', Key='foo.csv')
lines = obj['Body'].read().decode('utf-8').splitlines()
reader = csv.reader(lines)

# rebuild the CSV text so it can be sent as the request body
file = io.StringIO('\n'.join(lines))

runtime = boto3.client('runtime.sagemaker')
response = runtime.invoke_endpoint(
    EndpointName='nilm2',
    Body=file.getvalue(),
    ContentType='*/*',
    Accept='Accept')
output = response['Body'].read().decode('utf-8')
My data is a CSV file with 2 columns of floats and no headers. The problem is that lines is a list of strings (each row is an element of the list: ['11.55,65.23', '55.68,69.56', ...]). The invoke works well, but the response is also a single string: output = '65.23\n,65.23\n,22.56\n,...'
So how do I save this output to S3 as a CSV file?
Thanks
If your Lambda function is scheduled, then you won't need an API Gateway. But if the predict action will be triggered by a user or by an application, for example, you will need one.
When you call invoke_endpoint, you are actually calling a SageMaker endpoint, which is not the same thing as an API Gateway endpoint.
A common architecture with SageMaker is:
An API Gateway which receives a request, calls an authorizer, and then invokes your Lambda;
A Lambda which does some parsing of your input data, calls your SageMaker prediction endpoint, and then handles the result and returns it to your application.
From the situation you describe, I can't say whether your task is academic or a production one.
So, how can you save the data as a CSV file from your Lambda?
I believe you can just parse the output and upload the file to S3: do the parsing manually or with a library, and use boto3 to upload the result. The output format of your model depends on your implementation in the SageMaker image, so if you need the response data in another format you may need a custom image. I normally use a custom image, in which I can define how to handle my data in requests/responses.
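For the upload step specifically, a minimal sketch; the output string format is taken from the question, while the bucket and key here are just placeholder examples:
import boto3

s3_client = boto3.client('s3')

output = '65.23\n,65.23\n,22.56\n'  # response['Body'].read().decode('utf-8') from the question

# Split the '\n,'-separated string into one prediction per row
values = [v.strip(',').strip() for v in output.splitlines() if v.strip(',').strip()]
csv_body = '\n'.join(values) + '\n'

# Upload the result as a CSV object; bucket and key are placeholders
s3_client.put_object(
    Bucket='demo-scikit-byo-iris',
    Key='predictions/foo-predictions.csv',
    Body=csv_body.encode('utf-8'))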
For a production task, I certainly recommend you check out Batch Transform jobs in SageMaker. You provide an input file (an S3 path) and a destination (another S3 path), and SageMaker runs the batch predictions and persists a file with the results. You also won't need to keep your model deployed to an endpoint: when the job runs, SageMaker creates an instance for your model, downloads the data to predict, runs the predictions, uploads the output, and shuts the instance down. You only need a trained model.
Here is some info about Batch Transform jobs:
https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html
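For reference, a minimal sketch of starting such a job with boto3; the job name, model name, S3 paths, and instance type below are placeholders, not values from the question:
import boto3

sagemaker_client = boto3.client('sagemaker')

# All names and paths here are hypothetical placeholders
sagemaker_client.create_transform_job(
    TransformJobName='nilm2-batch-001',
    ModelName='nilm2-model',
    TransformInput={
        'DataSource': {'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': 's3://demo-scikit-byo-iris/foo.csv'}},
        'ContentType': 'text/csv',
        'SplitType': 'Line'},
    TransformOutput={'S3OutputPath': 's3://demo-scikit-byo-iris/predictions/'},
    TransformResources={'InstanceType': 'ml.m5.large', 'InstanceCount': 1})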
I hope this helps; let me know if you need more info.
Regards.
I have been tasked with converting some bash scripts used by my team that perform various CloudFormation tasks into Python using the boto3 library. I am currently stuck on one item: I cannot seem to determine how to do a wildcard-type search where a CloudFormation stack name contains a string.
My bash version using the AWS CLI is as follows:
aws cloudformation --region us-east-1 describe-stacks --query "Stacks[?contains(StackName,'myString')].StackName" --output json > stacks.out
This works on the CLI, outputting the results to a JSON file, but I cannot find any examples online of doing a similar "contains" search using boto3 with Python. Is it possible?
Thanks!
Yes, it is possible. Note that describe_stacks(StackName='myString') only does an exact-name lookup; to reproduce the CLI's contains filter, paginate over all stacks and apply the same JMESPath expression that the --query option uses:
import boto3

# create a boto3 client first
cloudformation = boto3.client('cloudformation', region_name='us-east-1')

# paginate over all stacks and filter client-side with the same JMESPath
# expression the CLI --query uses (the CLI also filters client-side)
paginator = cloudformation.get_paginator('describe_stacks')
stack_names = list(
    paginator.paginate().search("Stacks[?contains(StackName, 'myString')].StackName"))
print(stack_names)

# as an aside, you'd need a different client to communicate
# with a different service
# ec2 = boto3.client('ec2', region_name='us-east-1')
# regions = ec2.describe_regions()
Here, stack_names is a Python list containing the names of all stacks whose StackName contains 'myString'.