I can run the command aws --profile minio s3 ls s3://aa/bb/ in a terminal and successfully get the contents of that particular bucket on MinIO, but when I run the code below in Python, it returns an empty string.
import os
stream = os.popen('aws --profile minio s3 ls s3://aa/bb/')
stream.read()
And when I change the second line so that I query the contents of a local folder instead, like stream = os.popen('ls /Users/cc/'), the contents of that local folder are printed successfully as well.
When I execute the first command using os.system('aws --profile minio s3 ls s3://aa/bb/'), it returns 256 (a non-zero wait status, so the command itself is failing).
So how can I access the contents of a MinIO bucket programmatically in Python?
With the caveat that I haven't used MinIO, here's how I'd use boto3 (the AWS SDK for Python) in a Python script to do what your CLI command does:
import boto3

session = boto3.session.Session(profile_name='minio')
client = session.client('s3')
response = client.list_objects_v2(
    Bucket='aa',
    Prefix='bb',
)
for item in response['Contents']:
    print(item['Key'])
boto3 on GitHub
boto3 docs
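If the minio profile alone doesn't tell boto3 where your MinIO server lives, you can also pass the endpoint explicitly. A minimal sketch, assuming a MinIO server reachable at http://localhost:9000 (replace with your actual endpoint):

import boto3

# endpoint_url is an assumption here -- point it at your MinIO server.
session = boto3.session.Session(profile_name='minio')
client = session.client('s3', endpoint_url='http://localhost:9000')

response = client.list_objects_v2(Bucket='aa', Prefix='bb/')
for item in response.get('Contents', []):
    print(item['Key'])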
I'm trying to view the S3 bucket list through a Python script using boto3. The credential file and config file are available in the C:\Users\user1\.aws location, and the secret access key and access key for the user "vscode" are there. But I'm unable to run the script; it raises the exception
"botocore.exceptions.NoCredentialsError: Unable to locate credentials".
Code sample follows:
import boto3

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
Do I need to specify the user mentioned above ("vscode")?
I copied the credential and config files to the folder the Python script runs from, but the same exception occurs.
When I got this error, I replaced resource with client and also added the secrets during initialization:
client = boto3.client('s3', region_name=settings.AWS_REGION,
                      aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                      aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
You can try with boto3.client('s3') instead of boto3.resource('s3')
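If the keys in your credentials file live under a named profile rather than [default] (for example a [vscode] section), another option is to point boto3 at that profile. A short sketch, assuming such a profile exists:

import boto3

# Assumes C:\Users\user1\.aws\credentials contains a [vscode] section.
session = boto3.Session(profile_name='vscode')
s3 = session.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)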
I'm new to Python, and for my project I'm using boto3 to access AWS S3 from the PyCharm IDE.
I completed the package installation for boto3 and pyboto3, then created a Python file and, using boto3, successfully created a bucket and transferred files to S3 from my local machine.
Later I created another Python file in the same working directory and followed the same steps, but this time I'm not able to connect to AWS and no API calls go through.
So I'm wondering whether boto3 can only be used from a single Python file and not from another Python file in the same directory.
I tried creating both an S3 client and an S3 resource, but no luck.
Please advise: does boto3 have any such limitations?
Below is the Python code:
import os
import boto3

bucket_name = '*****'

def s3_client():
    s3 = boto3.client('s3')
    """:type:pyboto3:s3"""
    return s3

def s3_resource():
    s3 = boto3.resource('s3')
    return s3

def create_bucket(bucket_name):
    val = s3_client().create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
            'LocationConstraint': 'ap-south-1'
        })
    return val

def upload_file():
    s3 = s3_resource().meta.client.upload_file('d:/s3_load2.csv', bucket_name, 'snowflake.csv')
    return s3

def upload_small_file():
    s3 = s3_client().upload_file('d:/s3_load2.csv', bucket_name, 'snowflake.csv')
    return s3

# calling
upload_small_file()
Perhaps the AWS credentials weren't set in the environment where you run the 2nd script. Or maybe the credentials you were using while running the 1st script already expired. Try getting your AWS credentials and set them when you instantiate a boto3 client or resource as documented:
import boto3

client = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    aws_session_token=SESSION_TOKEN  # This is only required for temporary credentials
)
You can also try setting them as environment variables:
export AWS_ACCESS_KEY_ID="some key"
export AWS_SECRET_ACCESS_KEY="some key"
export AWS_SESSION_TOKEN="some token" # This is only required for temporary credentials
Or as a configuration file. See the docs for the complete list.
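One quick way to confirm which credentials (if any) boto3 actually resolves in the second script is to ask STS who you are before touching S3. A small diagnostic sketch:

import boto3

# Raises NoCredentialsError if boto3 can't find any credentials in this
# environment; otherwise prints the account and ARN your calls will run as.
sts = boto3.client('sts')
print(sts.get_caller_identity())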
I have been given access to a subfolder of an S3 bucket, and want to access all files inside using Python and boto3. I am new to S3 and have read the docs to death, but haven't been able to figure out how to successfully access just one subfolder. I understand that S3 does not use a Unix-like directory structure, but I don't have access to the root of the bucket.
How can I configure boto3 to just connect to this subfolder?
I have successfully used this AWS CLI command to download the entire subfolder to my machine:
aws s3 cp --recursive s3://s3-bucket-name/SUB_FOLDER/ /Local/Path/Where/Files/Download/To --profile my-profile
This code:
AWS_BUCKET='s3-bucket-name'
s3 = boto3.client("s3", region_name='us-east-1', aws_access_key_id=AWS_KEY_ID, aws_secret_access_key=AWS_SECRET)
response = s3.list_objects(Bucket=AWS_BUCKET)
Returns this error:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied
I have also tried specifying the Prefix option in the call to list_objects, but this produces the same error.
You want to run aws configure and save your credentials and region; then using boto3 is simple and easy.
Use boto3.resource and get the client like this:
s3_resource = boto3.resource('s3')
s3_client = s3_resource.meta.client
s3_client.list_objects(Bucket=AWS_BUCKET)
You should be good to go.
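If the IAM policy you were given only allows listing under SUB_FOLDER/, the list call generally has to be scoped to that prefix, otherwise it is denied. A sketch along those lines, reusing the AWS_KEY_ID / AWS_SECRET placeholders from the question (SUB_FOLDER/ stands in for your actual prefix):

import boto3

s3_client = boto3.client('s3', region_name='us-east-1',
                         aws_access_key_id=AWS_KEY_ID,
                         aws_secret_access_key=AWS_SECRET)

# Scope the listing to the sub-folder the policy grants access to.
response = s3_client.list_objects_v2(Bucket='s3-bucket-name', Prefix='SUB_FOLDER/')
for obj in response.get('Contents', []):
    print(obj['Key'])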
I am trying to download a specific S3 file off a server using Python boto and am getting "403 Forbidden" and "Access Denied" error messages. It says the error is occurring at line 24 (the get_contents call). I have tried it with and without the "aws s3 cp" at the start of the source file path, and received the same error message both times. My code is below; any advice would be helpful.
# Code to append csv:
import csv
import boto
from boto.s3.key import Key

keyId = "key"
sKeyId = "secretkey"
srcFileName = "aws s3 cp s3://...."
destFileName = "C:\\Users...."
bucketName = "bucket00001"

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName, validate=False)

# Get the Key object of the given key, in the bucket
k = Key(bucket, srcFileName)

# Get the contents of the key into a file
k.get_contents_to_filename(destFileName)
AWS is very vague with the errors that it outputs. This is intentional, but it definitely doesn't help with debugging. You are receiving an Access Denied error because the source file name you are using is not the correct path for the file.
aws s3 cp
This is the CLI command to copy one or more files from a source to a destination (either of which can be an S3 bucket). It should not be a part of the source file name.
s3://...
This prefix precedes your bucket name and denotes that the path refers to an S3 object; however, it is not necessary in your source file name when using boto3.
To download an s3 file using boto3, perform the following:
import boto3
BUCKET_NAME = 'my-bucket' # does not include s3://
KEY = 'image.jpg' # the file you want to download
s3 = boto3.resource('s3')
s3.Bucket(BUCKET_NAME).download_file(KEY, 'image.jpg')
Documentation for this command can be found here:
https://boto3.readthedocs.io/en/latest/guide/s3-example-download-file.html
In general, boto3 (and the other AWS SDKs) are simply wrappers around AWS API requests. You can also use the AWS CLI, as I mentioned earlier, to download a file from S3. That command would be:
aws s3 cp s3://my-bucket/my-file.jpg C:\location\my-file.jpg
srcFileName="aws s3 cp s3://...."
This has to be a key, such as somefolder/somekey or somekey, as a string.
You are providing a CLI command and path to it instead.
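Applied to the original boto code (reusing keyId, sKeyId, bucketName and destFileName from the question), that means passing only the object's key; somefolder/somekey below is a hypothetical placeholder for the real key in your bucket:

import boto
from boto.s3.key import Key

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName, validate=False)

# The key is the object's path inside the bucket, not an "aws s3 cp" command.
k = Key(bucket, 'somefolder/somekey')  # hypothetical key name
k.get_contents_to_filename(destFileName)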
I am trying to replicate the AWS CLI ls command to recursively list files in an AWS S3 bucket. For example, I would use the following command to recursively list all of the files in the "location2" bucket.
aws s3 ls s3://location2 --recursive
What is the AWS SDK for Python (i.e. boto3) equivalent of aws s3 ls s3://location2 --recursive?
You'd need to use paginators:
import boto3

client = boto3.client("s3")
bucket = "my-bucket"

paginator = client.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket=bucket)
for page in page_iterator:
    for obj in page['Contents']:
        print(f"s3://{bucket}/{obj['Key']}")
There is no need for the --recursive option when using the AWS SDK, as the list_objects method already lists keys at any depth. Note, however, that it returns at most 1,000 keys per call, so use the paginator above for larger buckets.
import boto3
client = boto3.client('s3')
client.list_objects(Bucket='MyBucket')
Using the higher-level API with resources is the way to go:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('location2')
bucket_files = [x.key for x in bucket.objects.all()]
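If you only want the equivalent of ls for a sub-path, the same objects collection can be filtered by prefix; a small sketch, with some/prefix/ as a hypothetical placeholder:

prefixed_files = [x.key for x in bucket.objects.filter(Prefix='some/prefix/')]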
You can also use the minio-py client library; it's open source and compatible with AWS S3.
A list_objects.py example is below; you can refer to the docs for additional information.
from minio import Minio

client = Minio('s3.amazonaws.com',
               access_key='YOUR-ACCESSKEYID',
               secret_key='YOUR-SECRETACCESSKEY')

# List all object paths in bucket that begin with my-prefixname.
objects = client.list_objects('my-bucketname', prefix='my-prefixname',
                              recursive=True)
for obj in objects:
    print(obj.bucket_name, obj.object_name.encode('utf-8'), obj.last_modified,
          obj.etag, obj.size, obj.content_type)
Hope it helps.
Disclaimer: I work for Minio
aws s3 ls s3://logs/access/20230104/14/ --recursive
To list every file's complete path, along with error handling:
import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket="logs", Prefix="access/20230104/14/")
for page in pages:
    try:
        for obj in page['Contents']:
            print(obj['Key'])
    except KeyError:
        print("No files exist")
        exit(1)