How to list content from a public Amazon S3 bucket - Python

I have configured my bucket to be public, which means everyone can view the bucket as:
http://bucket.s3-website-us-east-1.amazonaws.com/
Now I need to be able to get the list of objects and download them if required.
I found the answers to this question very helpful in getting set up with Python:
Quick way to list all files in Amazon S3 bucket?
This works fine if I input the access key and secret access key.
The problem, though, is that we might have people accessing the bucket who we don't want to have any keys at all. So if the keys are not provided, I get a 400 Bad Request error.
At first I thought this might be impossible, but an extensive search led me to this R package:
Cloudyr R package
Using this I am able to pull the objects without needing any keys:
get_bucket(bucket = 'bucket')
in R, but the functionality for listing/downloading the files is limited. Any ideas how I can do this in boto?

The default S3 policy is to deny everything, so you need to set a permission policy on the bucket:
choose your bucket and click Properties
add a permission with grantee Everyone allowed to List

I think what you need is a Bucket Policy which will allow anonymous users to read the stored objects.
Granting Read-Only Permission to an Anonymous User should help.
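Once the bucket policy allows public listing and reads, boto3 can talk to the bucket anonymously by disabling request signing. A minimal sketch of that approach (the bucket name and key below are placeholders, and it assumes the policy grants s3:ListBucket and s3:GetObject to everyone):

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned client: no access keys needed, relies entirely on the bucket policy.
s3 = boto3.client('s3', region_name='us-east-1', config=Config(signature_version=UNSIGNED))

# List every object in the (placeholder) bucket.
for page in s3.get_paginator('list_objects_v2').paginate(Bucket='bucket'):
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['Size'])

# Download one of them if required.
s3.download_file('bucket', 'some/key.jpg', 'key.jpg')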

Related

Google Cloud Storage - How to retrieve a file's owner via Python?

I'm trying to access a blob uploaded to a Google Cloud Storage bucket via the official Python client (google-cloud-storage).
I'm trying to retrieve the owner of the file (the one who uploaded it), and I'm not finding anything useful on the internet.
I've tried using the client with something like:
storage.Client(project='project').get_bucket('bucket').get_blob('blob')
But the blob properties like "owner" are empty!
So I tried using a Cloud Function and accessing the event and context.
The Google documentation (https://cloud.google.com/storage/docs/json_api/v1/objects#resource) describes the structure of an event, and it seems to include the owner property. But when I print it or try to access it, I get an error because it is not set.
Can someone help me? I just need the user's email... thanks in advance!
EDIT:
It doesn't seem to be a permission error, because I get the correct results when testing the API from the Google site: https://cloud.google.com/storage/docs/json_api/v1/objects/get?apix_params=%7B%22bucket%22%3A%22w3-dp-prod-technical-accounts%22%2C%22object%22%3A%22datagovernance.pw%22%2C%22projection%22%3A%22full%22%7D
By default, owner and ACL are not fetched by get_blob. You will have to explicitly fetch this info:
blob = storage.Client(project='project').get_bucket('bucket').get_blob('blob')
blob.reload(projection='full')
Note that if you use uniform bucket-level ACLs, owner doesn't have any meaning and will be unset even with the above change.
EDIT: this is actually not the most efficient option because it makes an extra unnecessary call to GCS. The most efficient option is:
blob = Blob('blob', client.bucket('bucket'))
blob.reload(projection='full', client=client)
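Putting that together, a minimal sketch (project, bucket, and object names are placeholders; it assumes fine-grained ACLs are in use, since with uniform bucket-level access blob.owner stays unset):

from google.cloud import storage
from google.cloud.storage import Blob

client = storage.Client(project='project')

# Build the blob reference locally; no API call happens yet.
blob = Blob('blob', client.bucket('bucket'))

# One GET with projection=full so ACL/owner metadata comes back too.
blob.reload(projection='full', client=client)

owner = blob.owner  # dict like {'entity': 'user-somebody@example.com', ...} or None
print(owner.get('entity') if owner else 'owner not available')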

Google Drive API: How to find all files I shared with others

I am using Drive API v3 to look for files I have shared with others (anyone), to list them and potentially stop sharing them.
I know that in the Drive search box you can type 'to:' and it will retrieve these files, but I could not find a way to do the same thing with the API.
My current try:
query="'me' in owners and trashed=false and not mimeType = 'application/vnd.google-apps.folder' and visibility != 'limited'"
Thanks in advance
As tanaike said, a workaround that solves the problem is to loop through the files using the files.list() function, including id, owners, permissions in the fields. This returns a list of objects, and from there we can check whether the permission type is anyone.
From there, we can also check for attributes like shared:true & ownedByMe:true.
This is just a workaround, and surely not the best solution, since with Drive search we can do all this by typing to:, which lists all our shared files. I hope we get an API for this.
Thanks again tanaike
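A minimal sketch of that workaround, assuming `service` is an already-authorized Drive v3 service built with googleapiclient.discovery.build('drive', 'v3', credentials=creds) (credential setup omitted):

query = "'me' in owners and trashed=false and visibility != 'limited'"
page_token = None
while True:
    resp = service.files().list(
        q=query,
        fields="nextPageToken, files(id, name, shared, ownedByMe, permissions)",
        pageToken=page_token,
    ).execute()
    for f in resp.get('files', []):
        # A permission of type 'anyone' means the file is shared publicly or by link.
        if any(p.get('type') == 'anyone' for p in f.get('permissions', [])):
            print(f['id'], f['name'], 'shared:', f.get('shared'), 'ownedByMe:', f.get('ownedByMe'))
    page_token = resp.get('nextPageToken')
    if page_token is None:
        break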

Unable to get object metadata from S3. Check object key, region and/or access permissions in aws Rekognition

import boto3

if __name__ == "__main__":
    bucket = 'MyBucketName'
    sourceFile = 'pic1.jpg'
    targetFile = 'pic2.jpg'

    client = boto3.client('rekognition', 'us-east-1')
    response = client.compare_faces(
        SimilarityThreshold=70,
        SourceImage={'S3Object': {'Bucket': bucket, 'Name': sourceFile}},
        TargetImage={'S3Object': {'Bucket': bucket, 'Name': targetFile}})

    for faceMatch in response['FaceMatches']:
        position = faceMatch['Face']['BoundingBox']
        confidence = str(faceMatch['Face']['Confidence'])
        print('The face at ' +
              str(position['Left']) + ' ' +
              str(position['Top']) +
              ' matches with ' + confidence + '% confidence')
I am trying to compare two images present in my bucket, but no matter which region I select I always get the following error:
botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the CompareFaces operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
My bucket's region is us-east-1 and I have configured the same in my code.
What am I doing wrong?
I had the same problem. What I did to fix it was to rearrange my bucket and folders. Make sure that your image is directly in your bucket and not in a folder inside your bucket. Also double check that the names of the images are correct and that everything is on point.
Check whether S3 and Rekognition are in the same region. I know it's not nice or documented (I guess), but these guys are talking about it here and here.
Ensure the bucket region is the same as the calling region. If you are using the AWS CLI, make sure to include a profile with the appropriate region.
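A quick way to check both the region and the object key from Python (the bucket and key names are placeholders; this is a diagnostic sketch, not part of the original answer):

import boto3

bucket = 'MyBucketName'   # placeholder
key = 'pic1.jpg'          # placeholder

s3 = boto3.client('s3')

# Where does the bucket actually live? An empty LocationConstraint means us-east-1.
location = s3.get_bucket_location(Bucket=bucket)['LocationConstraint'] or 'us-east-1'
print('bucket region:', location)

# Does the object exist under exactly this key? Raises a ClientError (404) if not.
s3.head_object(Bucket=bucket, Key=key)

# Rekognition should then be created in the same region as the bucket.
rekognition = boto3.client('rekognition', region_name=location)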
I faced a similar problem:
botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the CompareFaces operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
It may be due to a wrong AWS region or key, or to permissions not being granted properly. In my case the wrong region was set as an environment variable.
It happened to me using the AWS Rekognition SDK for Android. The problem was that the region of the S3 bucket was not the same as in my request, so I had to set the correct region in the request (the same as the S3 bucket):
rekognitionClient.setRegion(Region.getRegion(Regions.US_WEST_1));//replace with your S3 region
It seems to me that you don't have enough permissions with that access_key and secret_key. If the credentials belong to an IAM user, make sure the IAM user has permission to perform Rekognition compare_faces operations and S3 read operations. Also check that your S3 source and target object keys are correct.
It is also better to create roles with the required permissions and assume that role to request temporary security credentials, instead of using permanent access keys.
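As a sketch of that last suggestion (the role ARN and region are placeholders, and it assumes such a role exists with the needed Rekognition and S3 read permissions):

import boto3

# Assume an IAM role to obtain temporary security credentials.
sts = boto3.client('sts')
creds = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/RekognitionS3Role',  # placeholder ARN
    RoleSessionName='compare-faces-session',
)['Credentials']

# Use the temporary credentials instead of permanent access keys.
rekognition = boto3.client(
    'rekognition',
    region_name='us-east-1',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)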
For me the problem was that the file name in the S3 bucket contained spaces. Make sure the key doesn't contain spaces when you store the object.
Ran into a similar issue, and figured out it was due to having a space in one of the folder names.
Please ensure the AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are configured before running your script.
Also ran into this issue. I noticed my IAM role had only the bucket name as the resource; I had to add a slash and a wildcard to the end, changing it to "Resource": "arn:aws:s3:::bucketname/*".
For me changing the file permissions in the S3 bucket worked.
I had the same error: I checked and found out that the image was in a subfolder of the bucket. Make sure that the image is at the root of the bucket.
In my case I had the path to my object prefixed with a slash (/). Removing it did the trick.
Although this is a very old question, I also had the same issue. In my case, I was using Lambda and my Lambda role didn't have access to S3, so if you are doing this through Lambda, you need to grant it S3 access in addition to Rekognition.
Same error message, but using Textract functions. There was no problem with permissions, but my files in S3 contained special characters; once I renamed the files there was no problem.

How can I create/access a file in another project's GCS bucket in Python?

I have 2 projects (A and B), and with A I want to write files to B's storage.
I have granted write access to A on the bucket of B that I want to write.
I have checked https://cloud.google.com/storage/docs/json_api/v1/buckets/list
As mentioned there, I was able to get the list by passing the project number to the client:
req = client.buckets().list(project=PROJECT_NUMBER)
res = req.execute()
...
So far so good.
When I check the APIs for getting a given bucket and inserting an object, though, I am stuck.
https://cloud.google.com/storage/docs/json_api/v1/buckets/get
https://cloud.google.com/storage/docs/json_api/v1/objects/insert
These APIs do not expect project number, only bucket name.
How can I make sure that I save my file to the right bucket under project B?
What do I miss?
Thanks for your help in advance!
Bucket names are globally unique, so you don't need to specify a project number when dealing with a specific bucket. If bucket "bucketOfProjectB" belongs to project B, then you simply need to call buckets().get(bucket="bucketOfProjectB").execute(), and it will work even if the user calling it belongs to project A (as long as the caller has the right permissions).
Specifying a project is necessary when listing buckets or creating a new bucket, since every bucket belongs to one specific project, but that's the only place you'll need to do it.
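As a sketch of that, using the higher-level google-cloud-storage client rather than the raw JSON API client from the question (project and bucket names are placeholders; it assumes project A's credentials have write access on project B's bucket):

from google.cloud import storage

# The client's project matters for billing/quota and for listing/creating buckets;
# object operations are addressed purely by the globally unique bucket name.
client = storage.Client(project='project-a')

bucket = client.bucket('bucketOfProjectB')        # bucket owned by project B
blob = bucket.blob('path/to/file.txt')
blob.upload_from_string('written from project A') # works if A's credentials may write to B's bucket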

Got an error when trying to specify a Canned ACL

I am trying to use the boto API to upload photos to Amazon S3. I can successfully upload photos if I don't specify a canned ACL.
But if I specify the ACL as follows, I get the following error.
mp = self._bucket.initiate_multipart_upload(name)
pool = Pool(processes=self.NUM_PARALLEL_PROCESSES)
pool.apply_async(mp.upload_part_from_file(fp=buffer, part_num=counter, headers=headers, policy='public-read'))
The error is as follows:
<Error><Code>InvalidArgument</Code><Message>The specified header is not valid in this context</Message><ArgumentValue>public-read</ArgumentValue><ArgumentName>x-amz-acl</ArgumentName><RequestId>xxx</RequestId><HostId>xxx</HostId></Error>
I tried for a long time but still cannot get any hints. Anyone knows why?
Thanks!
The upload_part_from_file method should not have a policy parameter; this is a bug in boto. To assign a policy to a multipart file, you specify the canned policy as the policy parameter on the initiate_multipart_upload call, then upload the parts and complete the upload. Don't try to pass the policy when uploading the individual parts. We should create an issue on GitHub for boto to remove the policy parameter; it's confusing and doesn't work.
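A minimal sketch of that fix, keeping the legacy boto (boto2) API from the question (bucket, name, and buffer are placeholders for an existing Bucket object, key name, and file-like object):

# Pass the canned ACL when initiating the multipart upload, not per part.
mp = bucket.initiate_multipart_upload(name, policy='public-read')

buffer.seek(0)
mp.upload_part_from_file(fp=buffer, part_num=1)  # no policy argument here

mp.complete_upload()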
