I am downloading a file from AWS S3 using boto3, and after processing it I am trying to delete that file from the server. Deleting the file is confusing me (I am fairly new to AWS and boto); here is what I am doing:
def test(self, obj):
    current_bucket = obj.bucket
    current_key = obj.key
    client = boto3.client('s3', aws_access_key_id=settings.AWS_ACCESS_ID, aws_secret_access_key=settings.AWS_SECRET_KEY)
    client.download_file(current_bucket, current_key, "temp.file")
    # do the file processing
    # delete the temp.file
Is there any specific method in boto3 to delete the temporarily created local file?
If you just want to process a file and then discard it, try reading it directly from S3 with get_object(), so that no local file is created in the first place.
The Python code will look something like this:
from io import TextIOWrapper

# wrap the StreamingBody returned by get_object() so it can be read line by line
response = s3.get_object(Bucket='my_bucket', Key='s3_object')
lines = TextIOWrapper(response['Body'], encoding='utf-8')
for line in lines:
    print(line)
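If you do need a local copy on disk (for example because the processing step only accepts a file path), here is a minimal sketch of the download-process-delete pattern using tempfile and os.remove; process_file() is a placeholder for your own processing:

import os
import tempfile

def test(self, obj):
    client = boto3.client('s3', aws_access_key_id=settings.AWS_ACCESS_ID, aws_secret_access_key=settings.AWS_SECRET_KEY)
    # create a named temporary file and close it so download_file can write to its path
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()
    try:
        client.download_file(obj.bucket, obj.key, tmp.name)
        process_file(tmp.name)  # placeholder for the actual processing
    finally:
        # there is no dedicated boto3 call for this: the local copy is just a normal file, so os.remove() deletes it
        os.remove(tmp.name)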
I have a Jupyter Notebook and I want to access a dataset that is in an S3 bucket (it is publicly accessible).
response = s3.list_objects_v2(Bucket='sagemaker-eu-central-1-261218592922')
for content in response['Contents']:
    obj_dict = s3.get_object(Bucket='sagemaker-eu-central-1-261218592922', Key=content['Key'])
    print(obj_dict)
I am using a boto3 client (s3). I can go through the contents of the bucket with the code above, but how does one access the contents of each file?
Without knowing what the contents of the files are, or what you want to do with them afterwards, I can only offer a generic solution -
response = s3.list_objects_v2(Bucket='sagemaker-eu-central-1-261218592922')
for content in response['Contents']:
    obj_dict = s3.get_object(Bucket='sagemaker-eu-central-1-261218592922', Key=content['Key'])
    contents = obj_dict['Body'].read().decode('utf-8')
More details on this can be found at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object
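For example, if the objects happen to be CSV files, the decoded content can be loaded straight into a DataFrame in the notebook; a sketch assuming pandas is installed and using a hypothetical key name:

import io
import pandas as pd

obj_dict = s3.get_object(Bucket='sagemaker-eu-central-1-261218592922', Key='some/file.csv')  # hypothetical key
df = pd.read_csv(io.BytesIO(obj_dict['Body'].read()))
print(df.head())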
I need to be able to store a file, then access it from a Celery task. Is there a way, when I return an s3_file_path, to download the file and store it in a temporary file location? I saw key.get_contents_to_filename('some_name'), but that doesn't really serve my purpose: I want to return the s3_file_path and then perform the actions commented in the Celery task pseudo-code below, in another function. I am currently doing a hacky version of this by creating an expiring URL with generate_url(), but it's not really what I want to do.
conn = boto.connect_s3()
# TODO: add test to check for validate=False
bucket = conn.get_bucket(settings.S3_BACKUP_BUCKET, validate=False)
key = Key(bucket)
s3_file_path = os.path.join(
    settings.ENVIRONMENT, location, destination_filename)
key.key = s3_file_path
key.set_contents_from_filename(source_filename)
# celery task code
# bucket.download(s3_file_path, tempfile_name)
# file_obj = open(tempfile_name, 'r')
# import_file(file_obj)
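A sketch of what the Celery task itself could look like, assuming a hypothetical task function name and using tempfile so the downloaded copy is always cleaned up; import_file() is your own function from the pseudo-code above:

import os
import tempfile
import boto

def import_from_s3(s3_file_path):  # hypothetical task name
    conn = boto.connect_s3()
    bucket = conn.get_bucket(settings.S3_BACKUP_BUCKET, validate=False)
    key = bucket.get_key(s3_file_path)
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()
    try:
        # download the object to the temporary path, then hand it to the import
        key.get_contents_to_filename(tmp.name)
        with open(tmp.name, 'r') as file_obj:
            import_file(file_obj)
    finally:
        os.remove(tmp.name)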
I am using a Flask app to receive a multipart/form-data request with an uploaded file (a video, in this example).
I don't want to save the file in the local directory because this app will be running on a server, and saving it will slow things down.
I am trying to use the file object returned by Flask's request.files['video_data'], but it doesn't seem to be working.
Here is that portion of the code:
@bp.route('/video_upload', methods=['POST'])
def VideoUploadHandler():
    form = request.form
    video_file = request.files['video_data']
    if video_file:
        s3 = boto3.client('s3')
        s3.upload_file(video_file.read(), S3_BUCKET, 'video.mp4')
        return json.dumps('DynamoDB failure')
This returns an error:
TypeError: must be encoded string without NULL bytes, not str
on the line:
s3.upload_file(video_file.read(), S3_BUCKET, 'video.mp4')
I did get this to work by first saving the file and then accessing that saved file, so it's not an issue with retrieving the file from the request. This works:
video_file.save(form['video_id']+".mp4")
s3.upload_file(form['video_id']+".mp4", S3_BUCKET, form['video_id']+".mp4")
What would be the best method to handle this file data in memory and pass it to the s3.upload_file() method? I am using the boto3 methods here, and I am only finding examples with the filename used in the first parameter, so I'm not sure how to process this correctly using the file in memory. Thanks!
First you need to be able to access the raw data sent to Flask. This is not as easy as it seems, since you're reading a form. To read the raw stream you can use flask.request.stream, which behaves similarly to StringIO. The trick here is that you cannot call request.form or request.files first, because accessing those attributes will load the whole stream into memory or into a file.
You'll need some extra work to extract the right part of the stream (which unfortunately I cannot help you with because it depends on how your form is made, but I'll let you experiment with this).
Finally you can use the set_contents_from_file function from boto, since upload_file does not seem to deal with file-like objects (StringIO and such).
Example code:
import boto
import boto.s3.connection
from boto.s3.key import Key

@bp.route('/video_upload', methods=['POST'])
def VideoUploadHandler():
    # form = request.form  <- Don't do that
    # video_file = request.files['video_data']  <- Don't do that either
    # This is a file-like object which does not only contain your video file
    video_file_and_metadata = request.stream
    # This is what you need to implement
    video_title, video_stream = extract_title_stream(video_file_and_metadata)
    # Then, upload to the bucket with boto
    conn = boto.connect_s3()
    bucket = conn.create_bucket(bucket_name, location=boto.s3.connection.Location.DEFAULT)
    k = Key(bucket)
    k.key = video_title
    k.set_contents_from_file(video_stream)
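As a side note, if you would rather stay on boto3 as in the question, and you are fine with letting Flask parse the form via request.files, upload_fileobj() does accept a file-like object directly, so the FileStorage object can be passed through without saving it to disk first; a minimal sketch with placeholder bucket and key names:

import json
import boto3
from flask import request

@bp.route('/video_upload', methods=['POST'])
def VideoUploadHandler():
    video_file = request.files['video_data']
    if video_file:
        s3 = boto3.client('s3')
        # upload_fileobj streams from the file-like object, no temporary file needed
        s3.upload_fileobj(video_file, S3_BUCKET, 'video.mp4')
    return json.dumps('ok')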
I have an endpoint where I want to collect the response data and dump it into a file on S3 like this - https://stackoverflow.com/a/18731115/4824482
This is how I was trying to do it -
file_obj = open('/some/path/log.csv', 'w+')
file_obj.write(request.POST['data'])
and then passing file_obj to the S3 related code as in the above link.
The problem is that I don't have permissions to create a file on the server. Is there any way I can create a file object just in memory and then pass it to the S3 code?
That's probably a duplicate of How to upload a file to S3 without creating a temporary local file; you will find the best suggestions by checking out the answers to that question.
In short, the answer is the code below:
from boto.s3.key import Key
k = Key(bucket)
k.key = 'yourkey'
k.set_contents_from_string(request.POST['data'])
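If you are on boto3 rather than boto, the equivalent would be put_object(), which takes the data directly as the Body argument; a sketch with placeholder bucket and key names:

import boto3

s3 = boto3.client('s3')
# Body takes bytes (or a file-like object), so encode the POSTed string
s3.put_object(Bucket='my-bucket', Key='yourkey', Body=request.POST['data'].encode('utf-8'))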
Try tempfile https://docs.python.org/2/library/tempfile.html
f = tempfile.TemporaryFile()
f.write(request.POST['data'])
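Note that after writing you need to rewind the temporary file before handing it to boto, otherwise nothing will be uploaded; a sketch combining this with the Key approach from the previous answer:

import tempfile
from boto.s3.key import Key

f = tempfile.TemporaryFile()
f.write(request.POST['data'])
f.seek(0)  # rewind so boto reads from the beginning of the file

k = Key(bucket)
k.key = 'yourkey'
k.set_contents_from_file(f)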
I have been uploading text files to S3 and I came across this interesting error: the files aren't always uploaded, just the file name. In other words, sometimes the entire file is uploaded, and sometimes I end up with a 0-byte file on S3. I have been using this tutorial:
http://stackabuse.com/example-upload-a-file-to-aws-s3/
Here is the code I have been using (minus keys and such):
#NOTE Section 8: Uploading to Amazon
AWS_ACCESS_KEY = ''
AWS_ACCESS_SECRET_KEY = ''
filea = open(date + '.txt', 'r+')
key = filea.name
bucket = ''

import os
import boto
from boto.s3.key import Key

##Beginning of function
def upload_to_s3(aws_access_key_id, aws_secret_access_key, filea, bucket, key, callback=None, md5=None, reduced_redundancy=False, content_type=None):
    """
    Uploads the given file to the AWS S3
    bucket and key specified.

    callback is a function of the form:

    def callback(complete, total)

    The callback should accept two integer parameters,
    the first representing the number of bytes that
    have been successfully transmitted to S3 and the
    second representing the size of the to be transmitted
    object.

    Returns boolean indicating success/failure of upload.
    """
    # try:
    #     size = os.fstat(file.fileno()).st_size
    # except:
    #     # Not all file objects implement fileno(),
    #     # so we fall back on this
    #     file.seek(0, os.SEEK_END)
    #     size = file.tell()
    conn = boto.connect_s3(aws_access_key_id, aws_secret_access_key)
    bucket = conn.get_bucket(bucket, validate=False)
    k = Key(bucket)
    k.key = key
    print k.key
    #if content_type:
    #    k.set_metadata('Content-Type', content_type)
    sent = k.set_contents_from_file(filea, cb=callback, md5=md5, reduced_redundancy=reduced_redundancy, rewind=True)
    print sent

    # Rewind for later use
    filea.seek(0)
    #print size
##End of function

upload_to_s3(AWS_ACCESS_KEY, AWS_ACCESS_SECRET_KEY, filea, bucket, key)
os.remove(date + '.txt')
Now some info about what I feed into this: earlier sections of the code write out a text file; there are multiple lines and paragraphs, but it is still all one text file, created in a+ mode. The file is named using (date + '.txt') and is not closed with .close() in earlier sections of the code, unless the Python interpreter does something behind the scenes that I am not aware of (.close() gave me a few issues, so I just left the file open, since the last line of my code here erases it anyway).
I have tried looping the uploading process, but it seems like the file is just not read properly. What am I doing wrong?
Boto does not rewind the file to 0 before it starts to upload. If the file pointer you pass to k.set_contents_from_file is not at the beginning of the file, then any data from the beginning of the file to its current position (as reported by fp.tell()) will not be sent. This is by design and I would not consider this a bug in boto.
If you want to be sure the entire file is uploaded to S3, make sure the file pointer is at the beginning of the file before passing it to boto. From the code you show above, you are doing a rewind after the upload but not before.
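Applied to the code in the question, a minimal sketch of that fix is to seek back to the start of the file right before the upload call:

# rewind to the beginning of the file before handing it to boto
filea.seek(0)
sent = k.set_contents_from_file(filea, cb=callback, md5=md5, reduced_redundancy=reduced_redundancy, rewind=True)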