I am successfully downloading an image file to my local computer from my S3 bucket using the following:
import os
import boto3
import botocore

files = ['images/dog_picture.png']
bucket = 'animals'
s3 = boto3.resource('s3')

for file in files:
    s3.Bucket(bucket).download_file(file, os.path.basename(file))
However, when I try to specify the directory to which the image should be saved on my local machine, as is done in the docs:
s3.Bucket(bucket).download_file(file, os.path.basename(file), '/home/user/storage/new_image.png')
I get:
ValueError: Invalid extra_args key '/home/user/storage/new_image.png', must be one of: VersionId, SSECustomerAlgorithm, SSECustomerKey, SSECustomerKeyMD5, RequestPayer
I must be doing something wrong but I'm following the example in the docs. Can someone help me specify a local directory?
Looking at the docs, you're passing an extra argument. The documented example is:
import boto3
s3 = boto3.resource('s3')
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
In that example, hello.txt is the name of the object in the bucket and /tmp/hello.txt is the destination path on your device, so the correct call would be:
s3.Bucket(bucket).download_file(file, '/home/user/storage/new_image.png')
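If you want to keep the original loop but save into a directory, here is a minimal sketch (the local directory /home/user/storage is purely illustrative; adjust it to your setup):
import os
import boto3

files = ['images/dog_picture.png']
bucket = 'animals'
target_dir = '/home/user/storage'  # hypothetical local directory

s3 = boto3.resource('s3')
for file in files:
    # download_file takes (Key, Filename): the object key in the bucket and the full local path
    local_path = os.path.join(target_dir, os.path.basename(file))
    s3.Bucket(bucket).download_file(file, local_path)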
I am trying to list the files under a folder using boto3 and Python. I have referred to many SO answers but was unable to do so. Below is my code:
s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='maxValue',
                                    Prefix='madl-temp/')
My S3 path is "s3://madl-temp/maxValue/", where I want to find whether there are any parquet files under the maxValue bucket, based on which I have to do something like the following:
if len(maxValue) > 0:
    maxValue = True
else:
    maxValue = False
I am running it via Glue jobs and I am getting the below error:
botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
Your bucket name is madl-temp and your prefix is maxValue/, but in your boto3 call you have them the other way around. It should be:
s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='madl-temp',
                                    Prefix='maxValue/')
To get the number of files you have to do:
len(object_listing['Contents']) - 1
where the -1 accounts for the prefix maxValue/ itself.
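To get back to the original goal of checking whether any parquet files exist under that prefix, a minimal sketch along these lines should work; the has_parquet name is just illustrative:
import boto3

s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='madl-temp', Prefix='maxValue/')

# 'Contents' is absent from the response when nothing matches the prefix
objects = object_listing.get('Contents', [])
has_parquet = any(obj['Key'].endswith('.parquet') for obj in objects)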
Can't seem to figure out how to translate what I can do with the CLI to boto3 in Python.
I can run this fine:
aws s3 ls s3://bucket-name-format/folder1/folder2/
aws s3 cp s3://bucket-name-format/folder1/folder2/myfile.csv.gz
Trying to do this with boto3:
import boto3
s3 = boto3.client('s3', region_name='us-east-1', aws_access_key_id=KEY_ID, aws_secret_access_key=ACCESS_KEY)
bucket_name = "bucket-name-format"
bucket_dir = "/folder1/folder2/"
bucket = '{0}{1}'.format(bucket_name,bucket_dir)
filename = 'myfile.csv.gz'
s3.download_file(Filename=final_name,Bucket=bucket,Key=filename)
I get this error :
invalid bucket name "bucket-name-format/folder1/folder2/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
I know the error is because the bucket name "bucket-name-format/folder1/folder2/" is indeed invalid.
Question: how do I add the path? All the examples I've seen just list the base bucket name.
Take the following command:
aws s3 cp s3://bucket-name-format/folder1/folder2/myfile.csv.gz
That S3 URI can be broken down into
Bucket Name: bucket-name-format
Object Prefix: folder1/folder2/
Object Suffix: myfile.csv.gz
Really, the prefix and suffix are a bit artificial; the object name is actually folder1/folder2/myfile.csv.gz.
This means to download the same object with the boto3 API, you want to call it with something like:
bucket_name = "bucket-name-format"
bucket_dir = "folder1/folder2/"
filename = 'myfile.csv.gz'
s3.download_file(Filename=final_name,Bucket=bucket_name,Key=bucket_dir + filename)
Note that the argument to download_file for the Bucket is just the bucket name, and the Key does not start with a forward slash.
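Putting it together, here is a minimal sketch (downloading to a local file with the same name is just an illustrative choice):
import boto3

s3 = boto3.client('s3', region_name='us-east-1')

bucket_name = "bucket-name-format"
bucket_dir = "folder1/folder2/"   # no leading slash
filename = 'myfile.csv.gz'

# Key is the full object name inside the bucket; Filename is the local path to write to
s3.download_file(Bucket=bucket_name,
                 Key=bucket_dir + filename,
                 Filename=filename)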
I would like to write a test that mocks the download of a file from S3 and replaces it locally with an actual file that exists on my machine. I took inspiration from this post. The idea is the following:
from moto import mock_s3
import boto3

def dl(src_f, dest_f):
    s3 = boto3.resource('s3')
    s3.Bucket('fake_bucket').download_file(src_f, dest_f)

@mock_s3
def _create_and_mock_bucket():
    # Create fake bucket and mock it
    bucket = "fake_bucket"
    # We need to create the bucket since this is all in Moto's 'virtual' AWS account
    file_path = "some_real_file.txt"
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key=file_path, Body="")
    dl(file_path, 'some_other_real_file.txt')

_create_and_mock_bucket()
Now some_other_real_file.txt exists, but it is not a copy of some_real_file.txt. Any idea on how to do that?
If 'some_real_file.txt' already exists on your system, you should use upload_file instead:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.upload_file
For your example:
file_path = "some_real_file.txt"
s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket=bucket)
s3_resource = boto3.resource('s3')
s3_resource.meta.client.upload_file(file_path, bucket, file_path)
Your code currently creates an empty file in S3 (since Body=""), and that is exactly what is being downloaded to 'some_other_real_file.txt'.
Notice that if you change the Body parameter to contain some text, that exact content will be downloaded to 'some_other_real_file.txt'.
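For completeness, here is a minimal sketch of the full test under these assumptions (some_real_file.txt exists locally; the final assert is just one way to verify the round trip):
from moto import mock_s3
import boto3

def dl(src_f, dest_f):
    s3 = boto3.resource('s3')
    s3.Bucket('fake_bucket').download_file(src_f, dest_f)

@mock_s3
def test_download_returns_real_file_contents():
    bucket = "fake_bucket"
    file_path = "some_real_file.txt"

    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket)
    # Upload the real local file instead of an empty Body
    s3.upload_file(file_path, bucket, file_path)

    dl(file_path, 'some_other_real_file.txt')

    # The downloaded copy should now match the original file
    with open(file_path) as f1, open('some_other_real_file.txt') as f2:
        assert f1.read() == f2.read()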
Requirement: to download the latest file, i.e., the current file, from S3.
Sample files in S3:
bucketname/2020/09/reporting_2020_09_20200902000335.zip
bucketname/2020/09/reporting_2020_09_20200901000027.zip
When I pass s3_src_key as /2020/09/reporting_2020_09_20200902, the below doesn't work:
Code:
with tempfile.NamedTemporaryFile('r') as f_source, tempfile.NamedTemporaryFile('w') as f_target:
    s3_client.download_file(self.s3_src_bucket, self.s3_src_key, f_source.name)
The below one works fine:
import os
import boto3

bucket = 'bucketname'
key = '/2020/09/reporting_2020_09_20200902'
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket(bucket)
objects = my_bucket.objects.filter(Prefix=key)
for obj in objects:
    path, filename = os.path.split(obj.key)
    my_bucket.download_file(obj.key, filename)
I need help with how to use a wildcard in Airflow.
You can list objects that match a given pattern, but then you'll need to write code that decides which one of them is the latest.
Here's the Python SDK function you'll need
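As a sketch of that idea (assuming list_objects_v2 is the call being referred to, and reusing the bucket and prefix from the question), you could list the matching objects and pick the newest one by LastModified:
import os
import boto3

s3_client = boto3.client('s3')

# List everything under the prefix, then pick the newest object
response = s3_client.list_objects_v2(Bucket='bucketname',
                                     Prefix='/2020/09/reporting_2020_09_')
objects = response.get('Contents', [])
if objects:
    latest = max(objects, key=lambda obj: obj['LastModified'])
    s3_client.download_file('bucketname', latest['Key'], os.path.basename(latest['Key']))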
I tried boto3, but no luck:
import boto3
from botocore.exceptions import NoCredentialsError

ACCESS_KEY = 'access_key'
SECRET_KEY = 'secret_key'

def upload_to_aws(local_file, bucket, s3_file):
    s3 = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)
    try:
        s3.upload_file(local_file, bucket, s3_file)
        print("Upload Successful")
        return True
    except FileNotFoundError:
        print("The file was not found")
        return False
    except NoCredentialsError:
        print("Credentials not available")
        return False

uploaded = upload_to_aws('local_file', 'information-arch', 's3_file_name')
print("Done with the upload!")
Hussein,
According to the boto3 documentation, you should call upload_file like this:
upload_file(Filename, Bucket, Key, ExtraArgs=None, Callback=None, Config=None)
Example:
import boto3
s3 = boto3.resource('s3')
s3.meta.client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
Parameters
Filename (str) -- The path to the file to upload.
Bucket (str) -- The name of the bucket to upload to.
Key (str) -- The name of the key to upload to.
So in your upload_to_aws function, pass the parameters in that order.
Thanks
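For example, a call with concrete (purely illustrative) paths, keeping the Filename, Bucket, Key order, might look like this:
# Filename: local path, Bucket: bucket name, Key: object name in S3
uploaded = upload_to_aws('/tmp/report.csv',     # hypothetical local file
                         'information-arch',    # bucket from the question
                         'reports/report.csv')  # hypothetical S3 key
print("Upload succeeded:", uploaded)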
You can write your dataframe directly to S3 like this: say you have a dataframe called df; use to_csv with your S3 path, and it will save the CSV file directly on S3. This works with pandas versions >= 0.24.
df.to_csv(s3_path, index=False)
From pandas docs:
pandas now uses s3fs for handling S3 connections. This shouldn’t break any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto in prior versions of pandas.
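A minimal sketch of what that looks like, where the bucket and key in s3_path are purely illustrative and s3fs needs to be installed:
import pandas as pd

df = pd.DataFrame({'animal': ['dog', 'cat'], 'count': [3, 5]})

# s3fs handles the S3 connection behind the scenes (pip install s3fs)
s3_path = 's3://my-bucket/exports/animals.csv'  # hypothetical bucket/key
df.to_csv(s3_path, index=False)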