Upload a file using boto - python

import boto
conn = boto.connect_s3('', '')
mybucket = conn.get_bucket('data_report_321')
I can download the file from a bucket using the following code.
for b in mybucket:
    print b.name
    b.get_contents_to_filename('0000_part_00', headers=None, cb=None, num_cb=10, torrent=False, version_id=None, res_download_handler=None, response_headers=None)
But I am not able to upload a file. I get the following error:
AttributeError: 'str' object has no attribute 'tell'
Neither send_file nor set_contents_from_file works as expected.
for b in mybucket:
    b.send_file('mytest.txt', headers=None, cb=None, num_cb=10, query_args=None, chunked_transfer=False, size=None)
    b.set_contents_from_file('mytest.txt', headers=None, replace=True, cb=None, num_cb=10, policy=None, md5=None, reduced_redundancy=False, query_args=None, encrypt_key=False, size=None, rewind=False)
How do I upload a file from the current directory of the local server to an S3 bucket using boto?
Update:
I needed to create the key (the destination filename) of the uploaded file first, before calling the set_contents_from_filename function:
k = boto.s3.key.Key(mybucket)
k.key = 'uploaded_file.txt'
k.set_contents_from_filename("mytest.txt")

Both send_file and set_contents_from_file take a file object as their first argument. If you want to pass a filename (a string), use set_contents_from_filename instead.
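For example (a minimal sketch using the boto 2 Key API; the file and key names are placeholders taken from the question):

from boto.s3.key import Key

k = Key(mybucket)
k.key = 'uploaded_file.txt'

# pass an open file object ...
with open('mytest.txt', 'rb') as fp:
    k.set_contents_from_file(fp)

# ... or just pass the filename and let boto open it
k.set_contents_from_filename('mytest.txt')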

You could use this utility script: https://github.com/TimelyToga/upload_s3
python upload_s3.py <filename> [key_to_upload_as]
That will upload any file to an S3 bucket that you specify.
Note, though, that when you are uploading, set_contents_from_file takes a Python file object, not a filename.

Related

How to handle a dynamic file naming convention while copying an AWS S3 file using boto3 and Python

I am new to the AWS world and started to explore it recently.
After running an Athena query, I am trying to copy the generated query result file to another S3 location.
The problem I am getting here is:
file_name I am trying to build dynamically, using the query ID that Athena generated and appending the .csv file extension.
This generates the exception:
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the CopyObject operation: The specified key does not exist.
If I hardcode the file name, e.g. file_name = '30795514-8b0b-4b17-8764-495b25d74100.csv', my code works fine and the copy gets done.
Please help me with how I can build the source and destination file names dynamically.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    region_name=AWS_REGION,
)
athena_client = boto3.client(
    "athena",
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    region_name=AWS_REGION,
)

def main():
    query = "select * from test_table"
    response = athena_client.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": RESULT_OUTPUT_LOCATION}
    )
    queryId = response['QueryExecutionId']
    src_bucket = 'smg-datalake-prod-athena-query-results'
    dst_bucket = 'smg-datalake-prod-athena-query-results'
    file_name = str(queryId + ".csv")
    copy_object(src_bucket, dst_bucket, file_name)

def copy_object(src_bucket, dst_bucket, file_name):
    src_key = f'python-athena/{file_name}'
    dst_key = 'python-athena/cosmo/rss/v2/newsletter/kloka_latest.csv'
    # copy object to destination bucket
    s3.copy_object(Bucket=dst_bucket, CopySource={'Bucket': src_bucket, 'Key': src_key}, Key=dst_key)
After executing the Athena query, I just added a short sleep before trying to move the file to the other location, and then it started to work.
Because the code runs so fast, it was trying to copy the file before it was actually present in the query results bucket.
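A fixed sleep is fragile; a more robust alternative (a sketch, not part of the original answer, using boto3's get_query_execution) is to poll the query state and only copy once the query has finished:

import time

def wait_for_query(athena_client, query_execution_id, poll_seconds=2):
    # poll Athena until the query leaves the QUEUED/RUNNING states
    while True:
        result = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        state = result['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return state
        time.sleep(poll_seconds)

# only copy the result file once the query has actually succeeded
if wait_for_query(athena_client, queryId) == 'SUCCEEDED':
    copy_object(src_bucket, dst_bucket, file_name)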

Upload object to Oracle Storage using put_object in Python

I'm trying to upload an object to Oracle Storage with the oci library in Python. When I try it from the command line:
oci os object put -ns grddddaaaZZ -bn dev.bucket --name processed/2020-11 --file /path/to/my/file/image.tif
I actually get a response like:
Upload ID: 4f...78f0fdc5
Split file into 2 parts for upload.
Uploading object [------------------------------------] 0%
...
but when I try using the framework:
try:
    namespace = 'grddddaaaZZ'
    bucket = 'dev.bucket'
    object_path = 'processed/2020-11/image.tif'
    with open('/path/to/my/file/image.tif', "rb") as image:
        publish_payload = image.read()
    response = object_storage.put_object(namespace, bucket, object_path, publish_payload)
except (InvalidConfig, BaseConnectTimeout, ConfigFileNotFound, ServiceError) as error:
    logging.error(">>>>>>>> Something went wrong when try to list bucket {} objects. Error {}".
                  format(bucket, error))
the upload does not complete:
...
response = object_storage.put_object(namespace, bucket, object_path, publish_payload)
File ".../.venv/lib/python3.8/site-packages/oci/object_storage/object_storage_client.py", line 4113, in put_object
return self.base_client.call_api(
File ".../.venv/lib/python3.8/site-packages/oci/base_client.py", line 272, in call_api
response = self.request(request)
File ".../.venv/lib/python3.8/site-packages/oci/base_client.py", line 378, in request
raise exceptions.RequestException(e)
oci.exceptions.RequestException: ('Connection aborted.', timeout('The write operation timed out'))
I thought it could be the size of the file (around 208 MB), but the put_object documentation says the limit is 5 GB, so I do not think that is the issue. My last resort would be to use os.system(), but that is not what I really want.
Any clue as to what could be missing in this second option?
You could try uploading some other data first, to see if it's the payload:
namespace = 'grddddaaaZZ'
bucket_name = 'dev.bucket'
object_name = 'processed/2020-11/test.txt'
test_data = b"Hello, World!"

obj = object_storage.put_object(
    namespace,
    bucket_name,
    object_name,
    test_data)
Or you could try it without reading the file contents, just passing the open file object:
namespace = 'grddddaaaZZ'
bucket = 'dev.bucket'
object_path = 'processed/2020-11/image.tif'
with open('/path/to/my/file/image.tif', 'rb') as f:
    obj = object_storage.put_object(namespace, bucket, object_path, f)
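For a file this size, another option (not mentioned above; a sketch based on the SDK's UploadManager, which does the same multipart splitting the CLI reports with "Split file into 2 parts") would be:

from oci.object_storage import UploadManager

# UploadManager splits large files into parts and uploads them in parallel,
# instead of pushing the whole ~208 MB object through a single write
upload_manager = UploadManager(object_storage, allow_parallel_uploads=True)
response = upload_manager.upload_file(
    namespace,
    bucket,
    object_path,
    '/path/to/my/file/image.tif',
    part_size=50 * 1024 * 1024)  # 50 MiB parts; this part size is an assumed value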
with open('tomcat_access_log_20221118-231901.log.zip', 'rb') as filePtr:
    upload_resp = object_storage_client.put_object(nameSpace, bucket_name='my-Test-Bucket', object_name=file_to_upload, put_object_body=filePtr)
Note: file_to_upload = 'empty_folder_for_testing/tomcat-admin-server/tomcat_access_log_20221118-231901.log.zip'
The above code gets stuck for a very long time until it finally times out, yet the file actually gets uploaded properly. Any idea why the call hangs until the timeout?
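One thing that may be worth checking here (an assumption on my part, not something confirmed by the original posts) is the client's read timeout, which can be raised when the ObjectStorageClient is constructed:

import oci

config = oci.config.from_file()  # assumes the standard ~/.oci/config setup
# timeout is (connect, read) in seconds; the default read timeout is fairly short
object_storage_client = oci.object_storage.ObjectStorageClient(config, timeout=(10, 600))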

Writing a file to S3 using Lambda in Python with AWS

In AWS, I'm trying to save a file to S3 in Python using a Lambda function. While this works on my local computer, I am unable to get it to work in Lambda. I've been working on this problem for most of the day and would appreciate help. Thank you.
def pdfToTable(PDFfilename, apiKey, fileExt, bucket, key):

    # parsing a PDF using an API
    fileData = (PDFfilename, open(PDFfilename, "rb"))
    files = {"f": fileData}
    postUrl = "https://pdftables.com/api?key={0}&format={1}".format(apiKey, fileExt)
    response = requests.post(postUrl, files=files)
    response.raise_for_status()

    # this code is probably the problem!
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('transportation.manifests.parsed')
    with open('/tmp/output2.csv', 'rb') as data:
        data.write(response.content)
        key = 'csv/' + key
        bucket.upload_fileobj(data, key)

    # FYI, on my own computer, this saves the file
    with open('output.csv', "wb") as f:
        f.write(response.content)
In S3, there is a bucket transportation.manifests.parsed containing the folder csv where the file should be saved.
The type of response.content is bytes.
From AWS, the error from the current set-up above is [Errno 2] No such file or directory: '/tmp/output2.csv': FileNotFoundError. In fact, my goal is to save the file to the csv folder under a unique name, so tmp/output2.csv might not be the best approach. Any guidance?
In addition, I've tried using wb and w instead of rb, also to no avail. The error with wb is Input <_io.BufferedWriter name='/tmp/output2.csv'> of type: <class '_io.BufferedWriter'> is not supported. The documentation suggests that 'rb' is the recommended usage, but I do not understand why that would be the case.
Also, I've tried s3_client.put_object(Key=key, Body=response.content, Bucket=bucket) but receive An error occurred (404) when calling the HeadObject operation: Not Found.
Assuming Python 3.6: the way I usually do this is to wrap the bytes content in a BytesIO wrapper to create a file-like object. Then, per the boto3 docs, you can use the transfer manager for a managed transfer:
from io import BytesIO
import boto3
s3 = boto3.client('s3')
fileobj = BytesIO(response.content)
s3.upload_fileobj(fileobj, 'mybucket', 'mykey')
If that doesn't work I'd double check all IAM permissions are correct.
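Since the stated goal is a unique name under the csv/ prefix, the key can be generated before the upload (a sketch; the uuid-based name is only one way to do it, and the bucket name is the one from the question):

import uuid
from io import BytesIO
import boto3

s3 = boto3.client('s3')
# build a unique key under the csv/ prefix, then stream the bytes straight to S3
unique_key = 'csv/{}.csv'.format(uuid.uuid4())
s3.upload_fileobj(BytesIO(response.content), 'transportation.manifests.parsed', unique_key)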
You have a writable stream that you're asking boto3 to use as a readable stream, which won't work.
Write the file, and then simply use bucket.upload_file() afterwards, like so:
s3 = boto3.resource('s3')
bucket = s3.Bucket('transportation.manifests.parsed')

# response.content is bytes, so open the file in binary mode
with open('/tmp/output2.csv', 'wb') as data:
    data.write(response.content)

key = 'csv/' + key
bucket.upload_file('/tmp/output2.csv', key)

Error 500 while Uploading CSV file to S3 bucket using boto3 and python flask

I have kind of looked at all possible options.
I am using boto3 and Python 3.6 to upload a file to an S3 bucket. The funny thing is that while JSON and even .py files get uploaded fine, it throws Error 500 while uploading a CSV. On successful upload I return a JSON to check all the values.
import boto3
from botocore.client import Config

@app.route("/upload", methods=['POST', 'GET'])
def upload():
    if request.method == 'POST':
        file = request.files['file']
        filename = secure_filename(file.filename)
        s3 = boto3.resource('s3', aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'), aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'), config=Config(signature_version='s3v4'))
        s3.Bucket(os.environ.get('S3_BUCKET')).put_object(Key=filename, Body=open(filename, 'rb'), ContentEncoding='text/csv')
        return jsonify({'successful upload': filename, 'S3_BUCKET': os.environ.get('S3_BUCKET'), 'ke': os.environ.get('AWS_ACCESS_KEY_ID'), 'sec': os.environ.get('AWS_SECRET_ACCESS_KEY'), 'filepath': "https://s3.us-east-2.amazonaws.com/" + os.environ.get('S3_BUCKET') + "/" + filename})
Please help!!
You are getting a FileNotFoundError for file xyz.csv because the file does not exist.
This could be because the code in upload() does not actually save the uploaded file, it merely obtains a safe name for it and immediately tries to open it - which fails.
That it works for other files is probably due to the fact that those files already exist, perhaps left over from testing, so there is no problem.
Try saving the file to the file system using save() after obtaining the safe filename:
upload_file = request.files['file']
filename = secure_filename(upload_file.filename)
upload_file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
and then uploading it (assuming that you've configured an UPLOAD_FOLDER):
with open(os.path.join(app.config['UPLOAD_FOLDER'], filename), 'rb') as f:
    s3.Bucket(os.environ.get('S3_BUCKET')).put_object(Key=filename, Body=f, ContentEncoding='text/csv')

return jsonify({...})
There is no need to actually save the file to the file system; it can be streamed directly to your S3 bucket using the stream attribute of the upload_file object:
upload_file = request.files['file']
filename = secure_filename(upload_file.filename)
s3 = boto3.resource('s3', aws_access_key_id='key', aws_secret_access_key='secret')
s3.Bucket('bucket').put_object(Key=filename, Body=upload_file.stream, ContentType=upload_file.content_type)
To make this more generic you should use the content_type attribute of the uploaded file as shown above.
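If the uploads can get large, a managed (multipart) transfer also works with the stream; this is a sketch using boto3's upload_fileobj and is not part of the original answer (upload_file, filename, and the S3_BUCKET environment variable are the ones from the code above):

import os
import boto3

s3_client = boto3.client('s3')
# upload_fileobj runs a managed transfer (multipart for big files) straight from the stream
s3_client.upload_fileobj(
    upload_file.stream,
    os.environ.get('S3_BUCKET'),
    filename,
    ExtraArgs={'ContentType': upload_file.content_type})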

Flask - Handling Form File & Upload to AWS S3 without Saving to File

I am using a Flask app to receive a multipart/form-data request with an uploaded file (a video, in this example).
I don't want to save the file in the local directory because this app will be running on a server, and saving it will slow things down.
I am trying to use the file object created by the Flask request.files[''] method, but it doesn't seem to be working.
Here is that portion of the code:
@bp.route('/video_upload', methods=['POST'])
def VideoUploadHandler():
    form = request.form
    video_file = request.files['video_data']
    if video_file:
        s3 = boto3.client('s3')
        s3.upload_file(video_file.read(), S3_BUCKET, 'video.mp4')
        return json.dumps('DynamoDB failure')
This returns an error:
TypeError: must be encoded string without NULL bytes, not str
on the line:
s3.upload_file(video_file.read(), S3_BUCKET, 'video.mp4')
I did get this to work by first saving the file and then accessing that saved file, so it's not an issue with catching the request file. This works:
video_file.save(form['video_id']+".mp4")
s3.upload_file(form['video_id']+".mp4", S3_BUCKET, form['video_id']+".mp4")
What would be the best method to handle this file data in memory and pass it to the s3.upload_file() method? I am using the boto3 methods here, and I am only finding examples with the filename used in the first parameter, so I'm not sure how to process this correctly using the file in memory. Thanks!
First you need to be able to access the raw data sent to Flask. This is not as easy as it seems, since you're reading a form. To read the raw stream you can use flask.request.stream, which behaves similarly to StringIO. The trick here is that you cannot call request.form or request.files, because accessing those attributes will load the whole stream into memory or into a file.
You'll need some extra work to extract the right part of the stream (which unfortunately I cannot help you with because it depends on how your form is made, but I'll let you experiment with this).
Finally you can use the set_contents_from_file function from boto, since upload_file does not seem to deal with file-like objects (StringIO and such).
Example code:
import boto
from boto.s3.connection import Location
from boto.s3.key import Key

@bp.route('/video_upload', methods=['POST'])
def VideoUploadHandler():
    # form = request.form  <- Don't do that
    # video_file = request.files['video_data']  <- Don't do that either
    video_file_and_metadata = request.stream  # a file-like object which does not only contain your video file

    # This is what you need to implement
    video_title, video_stream = extract_title_stream(video_file_and_metadata)

    # Then, upload to the bucket with boto's Key API
    conn = boto.connect_s3()
    bucket = conn.create_bucket(bucket_name, location=Location.DEFAULT)
    k = Key(bucket)
    k.key = video_title
    k.set_contents_from_file(video_stream)
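That said, since the question's code already uses boto3, a simpler route (a sketch, assuming it is acceptable to let Flask parse the multipart form; the route and handler names below are hypothetical, while bp and S3_BUCKET come from the question's code) is to hand the uploaded file's stream to upload_fileobj, which does accept file-like objects:

import json
import boto3
from flask import request

@bp.route('/video_upload_v2', methods=['POST'])
def VideoUploadHandlerV2():
    video_file = request.files['video_data']
    s3 = boto3.client('s3')
    # upload_fileobj streams the file-like object to S3 without saving it locally
    s3.upload_fileobj(video_file.stream, S3_BUCKET, 'video.mp4')
    return json.dumps('upload complete')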
