I'm learning about AWS IoT using a Raspberry Pi and Python. The problem I came across is the following:
In the main function, a picture is taken on event detection and saved on the Pi. After the picture is saved, a function store_to_bucket is called with two parameters: the path where the picture is stored and a date string. Everything works fine the first time. The second function call gives me the following error:
ClientError: An error occurred (AuthorizationHeaderMalformed) when calling the PutObject operation: The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential.
Code
ACCESS_KEY_ID = open("/mykey/path/key.txt", "r")
ACCESS_SECRET_KEY = open("/mykey/path/skey.txt", "r")
BUCKET_NAME = open("/mykey/path/bucket.txt", "r").read()
data = open(path, 'rb')
ext = '.jpg'
s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID.read(),
    aws_secret_access_key=ACCESS_SECRET_KEY.read(),
    config=Config(signature_version='s3v4')
)

def store_to_bucket(path, date):
    s3.Bucket(BUCKET_NAME).put_object(Key=date+ext, Body=data)
    print("Done")
Not sure what the problem is with credentials? Has anyone experienced similar issue or knows how to fix it?
I found a solution to my problem by fixing some code. Updated version below:
import boto3
from botocore.client import Config
ACCESS_KEY_ID = open("/home/pi/Desktop/pythonForAWS/certs/key.txt",
"r").read()
ACCESS_SECRET_KEY = open("/home/pi/Desktop/pythonForAWS/certs/skey.txt",
"r").read()
BUCKET_NAME = open("/home/pi/Desktop/pythonForAWS/certs/bucket.txt",
"r").read()
def store_to_bucket(path, date):
data = open(path, 'rb')
ext = '.jpg'
s3 = boto3.resource(
's3',
aws_access_key_id=ACCESS_KEY_ID,
aws_secret_access_key=ACCESS_SECRET_KEY,
config=Config(signature_version='s3v4')
)
s3.Bucket(BUCKET_NAME).put_object(Key=date+ext, Body=data)
print ("Done")
If, as the rewrite suggests, you were originally creating the s3 object inside the function, the reason it stopped working is that you were calling read() multiple times on the same file object.
The first call returns the whole content of the file but moves the "position" of the file object to the end, so subsequent calls return nothing - there's nothing left to read.
Reading the value only once works, as would seek()ing back to the beginning of the file before each re-read. I'd advise reading the value only once, to avoid unnecessary I/O and the related overhead.
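For illustration, here's a minimal sketch of reading each credential file only once at startup (read_value is a hypothetical helper; paths and names follow the question):
import boto3
from botocore.client import Config

def read_value(path):
    # Hypothetical helper: read the file once and strip the trailing newline.
    with open(path, "r") as f:
        return f.read().strip()

ACCESS_KEY_ID = read_value("/mykey/path/key.txt")
ACCESS_SECRET_KEY = read_value("/mykey/path/skey.txt")
BUCKET_NAME = read_value("/mykey/path/bucket.txt")

# Create the resource once and reuse it across calls.
s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)

def store_to_bucket(path, date):
    # Open the image fresh on every call so the whole file is uploaded.
    with open(path, 'rb') as data:
        s3.Bucket(BUCKET_NAME).put_object(Key=date + '.jpg', Body=data)
    print("Done")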
Related
I am new to the AWS world and started exploring it recently.
After running an Athena query, I am trying to copy the generated query result file to another S3 location.
The problem I am getting here is:
I'm trying to build file_name dynamically, using the query ID that Athena generated and appending the .csv file extension.
This generates the exception:
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the CopyObject operation: The specified key does not exist.
If I hardcode the file name, e.g. file_name = '30795514-8b0b-4b17-8764-495b25d74100.csv' inside single quotes '', my code works fine and the copy is done.
Please help me understand how I can build the source and destination file names dynamically.
import boto3

s3 = session.client('s3')

athena_client = boto3.client(
    "athena",
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    region_name=AWS_REGION,
)

def main():
    query = "select * from test_table"
    response = athena_client.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": RESULT_OUTPUT_LOCATION}
    )
    queryId = response['QueryExecutionId']

    src_bucket = 'smg-datalake-prod-athena-query-results'
    dst_bucket = 'smg-datalake-prod-athena-query-results'
    file_name = str(queryId + ".csv")
    copy_object(src_bucket, dst_bucket, file_name)

def copy_object(src_bucket, dst_bucket, file_name):
    src_key = f'python-athena/{file_name}'
    dst_key = 'python-athena/cosmo/rss/v2/newsletter/kloka_latest.csv'
    # copy object to destination bucket
    s3.copy_object(Bucket=dst_bucket, CopySource={'Bucket': src_bucket, 'Key': src_key}, Key=dst_key)
After executing the Athena query I just added some sleep; then, when I tried to move the file to another location, it started to work.
The query was returning so fast that my code was trying to copy the result file before it was present in the query results bucket.
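A sleep works, but a more robust option (just a sketch, reusing the athena_client, queryId, and copy_object names from the code above) is to poll the query status and only copy once it has succeeded:
import time

def wait_for_query(athena_client, query_id, poll_seconds=1):
    # Poll Athena until the query reaches a terminal state.
    while True:
        response = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = response['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return state
        time.sleep(poll_seconds)

# Only copy the result file once the query has actually finished.
if wait_for_query(athena_client, queryId) == 'SUCCEEDED':
    copy_object(src_bucket, dst_bucket, file_name)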
I have two different profiles on AWS. The S3 bucket and SNS are in profile A and my Lambda function is in profile B. When a new file is added to the S3 bucket, SNS triggers the Lambda function.
The Lambda function is then supposed to access the new file and process it using pandas. Here is what I'm doing now:
import io
import json
import logging

import boto3
import pandas as pd

sts_connection = boto3.client('sts')
acct_b = sts_connection.assume_role(
    RoleArn="arn:aws:iam::**************:role/AllowS3AccessFromAccountB",
    RoleSessionName="cross_acct_lambda"
)

ACCESS_KEY = acct_b['Credentials']['AccessKeyId']
SECRET_KEY = acct_b['Credentials']['SecretAccessKey']
SESSION_TOKEN = acct_b['Credentials']['SessionToken']

s3 = boto3.client('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, aws_session_token=SESSION_TOKEN)

path = get_file_path(event)
obj = s3.get_object(Bucket='my-bucket-name', Key=path)
csv_string = io.BytesIO(obj['Body'].read())

# Read the csv file and turn it into a DataFrame
df = pd.read_csv(csv_string, delimiter=';', engine='c', encoding='unicode_escape')

def get_file_path(event_body):
    """Get the manifest path from the SNS event and check that it is a manifest."""
    try:
        # Get message for first SNS record
        sns_message = json.loads(event_body["Records"][0]["Sns"]["Message"])
        path = sns_message["Records"][0]["s3"]["object"]["key"]
    except TypeError as ex:
        logging.error("Unable to parse event: " + str(event_body))
        raise ex
    return path
Everything works fine until the s3.get_object() part, where I get the following error:
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
Maybe I'm reading the file key in the wrong way?
Edit:
Here is what path looks like when I debugged it.
svv/sensor%3D11219V22151/year%3D2020/month%3D03/day%3D02/test.csv
And the S3 file structure is like this:
sensor-data/sensor=*******/year=2020/month=03/day=02
It seems like I need to undo the escaping of the equals signs, but there should be a more generic solution.
Here's a snippet I have in some Lambda code that is directly triggered by Amazon S3 (not via Amazon SNS):
import urllib
key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
You could try similar parsing to see if it corrects the Key.
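Applied to the SNS-wrapped event in the question, a sketch of get_file_path with the decoding added (names follow the question's code) could look like:
import json
import urllib.parse

def get_file_path(event_body):
    """Get the S3 object key from the SNS-wrapped event and decode URL escapes."""
    sns_message = json.loads(event_body["Records"][0]["Sns"]["Message"])
    raw_key = sns_message["Records"][0]["s3"]["object"]["key"]
    # S3 URL-encodes keys in event notifications, so %3D becomes '='.
    return urllib.parse.unquote_plus(raw_key)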
In AWS, I'm trying to save a file to S3 in Python using a Lambda function. While this works on my local computer, I am unable to get it to work in Lambda. I've been working on this problem for most of the day and would appreciate help. Thank you.
def pdfToTable(PDFfilename, apiKey, fileExt, bucket, key):
    # parsing a PDF using an API
    fileData = (PDFfilename, open(PDFfilename, "rb"))
    files = {"f": fileData}
    postUrl = "https://pdftables.com/api?key={0}&format={1}".format(apiKey, fileExt)
    response = requests.post(postUrl, files=files)
    response.raise_for_status()

    # this code is probably the problem!
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('transportation.manifests.parsed')
    with open('/tmp/output2.csv', 'rb') as data:
        data.write(response.content)
        key = 'csv/' + key
        bucket.upload_fileobj(data, key)

    # FYI, on my own computer, this saves the file
    with open('output.csv', "wb") as f:
        f.write(response.content)
In S3, there is a bucket transportation.manifests.parsed containing the folder csv where the file should be saved.
The type of response.content is bytes.
From AWS, the error from the current set-up above is [Errno 2] No such file or directory: '/tmp/output2.csv': FileNotFoundError. In fact, my goal is to save the file to the csv folder under a unique name, so tmp/output2.csv might not be the best approach. Any guidance?
In addition, I've tried using wb and w instead of rb, also to no avail. The error with wb is Input <_io.BufferedWriter name='/tmp/output2.csv'> of type: <class '_io.BufferedWriter'> is not supported. The documentation suggests that using 'rb' is the recommended usage, but I do not understand why that would be the case.
Also, I've tried s3_client.put_object(Key=key, Body=response.content, Bucket=bucket) but receive An error occurred (404) when calling the HeadObject operation: Not Found.
Assuming Python 3.6: the way I usually do this is to wrap the bytes content in a BytesIO wrapper to create a file-like object. And, per the boto3 docs, you can use the transfer manager for a managed transfer:
from io import BytesIO
import boto3
s3 = boto3.client('s3')
fileobj = BytesIO(response.content)
s3.upload_fileobj(fileobj, 'mybucket', 'mykey')
If that doesn't work I'd double check all IAM permissions are correct.
You have a writable stream that you're asking boto3 to use as a readable stream, which won't work.
Write the file, and then simply use bucket.upload_file() afterwards, like so:
s3 = boto3.resource('s3')
bucket = s3.Bucket('transportation.manifests.parsed')

# response.content is bytes, so open the temp file in binary write mode
with open('/tmp/output2.csv', 'wb') as data:
    data.write(response.content)

key = 'csv/' + key
bucket.upload_file('/tmp/output2.csv', key)
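If you also need the unique name mentioned in the question, one sketch (assuming a uuid-based name is acceptable; these names are illustrative):
import os
import uuid

# /tmp is the only writable path in a Lambda environment.
unique_name = '{}.csv'.format(uuid.uuid4())
tmp_path = os.path.join('/tmp', unique_name)

with open(tmp_path, 'wb') as data:
    data.write(response.content)

bucket.upload_file(tmp_path, 'csv/' + unique_name)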
I am using a Flask app to receive a multipart/form-data request with an uploaded file (a video, in this example).
I don't want to save the file in the local directory because this app will be running on a server, and saving it will slow things down.
I am trying to use the file object created by the Flask request.files[''] method, but it doesn't seem to be working.
Here is that portion of the code:
@bp.route('/video_upload', methods=['POST'])
def VideoUploadHandler():
    form = request.form
    video_file = request.files['video_data']
    if video_file:
        s3 = boto3.client('s3')
        s3.upload_file(video_file.read(), S3_BUCKET, 'video.mp4')
    return json.dumps('DynamoDB failure')
This returns an error:
TypeError: must be encoded string without NULL bytes, not str
on the line:
s3.upload_file(video_file.read(), S3_BUCKET, 'video.mp4')
I did get this to work by first saving the file and then accessing that saved file, so it's not an issue with catching the request file. This works:
video_file.save(form['video_id']+".mp4")
s3.upload_file(form['video_id']+".mp4", S3_BUCKET, form['video_id']+".mp4")
What would be the best method to handle this file data in memory and pass it to the s3.upload_file() method? I am using the boto3 methods here, and I am only finding examples with the filename used in the first parameter, so I'm not sure how to process this correctly using the file in memory. Thanks!
First you need to be able to access the raw data sent to Flask. This is not as easy as it seems, since you're reading a form. To be able to read the raw stream you can use flask.request.stream, which behaves similarly to StringIO. The trick here is that you cannot call request.form or request.files, because accessing those attributes will load the whole stream into memory or into a file.
You'll need some extra work to extract the right part of the stream (which unfortunately I cannot help you with because it depends on how your form is made, but I'll let you experiment with this).
Finally you can use the set_contents_from_file function from boto, since upload_file does not seem to deal with file-like objects (StringIO and such).
Example code:
import boto
import boto.s3.connection
from boto.s3.key import Key

@bp.route('/video_upload', methods=['POST'])
def VideoUploadHandler():
    # form = request.form <- Don't do that
    # video_file = request.files['video_data'] <- Don't do that either
    video_file_and_metadata = request.stream  # This is a file-like object which does not only contain your video file

    # This is what you need to implement
    video_title, video_stream = extract_title_stream(video_file_and_metadata)

    # Then, upload to the bucket with legacy boto, which provides set_contents_from_file
    conn = boto.connect_s3()
    bucket = conn.create_bucket(bucket_name, location=boto.s3.connection.Location.DEFAULT)
    k = Key(bucket)
    k.key = video_title
    k.set_contents_from_file(video_stream)
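As a side note, not part of the original answer: if you are on boto3 rather than legacy boto, the client's upload_fileobj does accept a file-like object, so a sketch of the same upload step would be:
import boto3

s3 = boto3.client('s3')
# video_stream is the file-like object extracted above; S3_BUCKET and
# video_title follow the names used in the question and answer.
s3.upload_fileobj(video_stream, S3_BUCKET, video_title)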
I have been uploading text files to S3 and I came across this interesting error: the files aren't always uploaded, just the file name. So sometimes the entire file uploads and sometimes I end up with a 0-byte file on S3. I have been using this tutorial:
http://stackabuse.com/example-upload-a-file-to-aws-s3/
Here is the code I have been using (minus keys and such):
#NOTE Section 8: Uploading to Amazon
import os

import boto
from boto.s3.key import Key

AWS_ACCESS_KEY = ''
AWS_ACCESS_SECRET_KEY = ''
filea = open(date + '.txt', 'r+')
key = filea.name
bucket = ''

##Beginning of function
def upload_to_s3(aws_access_key_id, aws_secret_access_key, filea, bucket, key, callback=None, md5=None, reduced_redundancy=False, content_type=None):
    """
    Uploads the given file to the AWS S3
    bucket and key specified.

    callback is a function of the form:

    def callback(complete, total)

    The callback should accept two integer parameters,
    the first representing the number of bytes that
    have been successfully transmitted to S3 and the
    second representing the size of the to be transmitted
    object.

    Returns boolean indicating success/failure of upload.
    """
    # try:
    #     size = os.fstat(file.fileno()).st_size
    # except:
    #     # Not all file objects implement fileno(),
    #     # so we fall back on this
    #     file.seek(0, os.SEEK_END)
    #     size = file.tell()

    conn = boto.connect_s3(aws_access_key_id, aws_secret_access_key)
    bucket = conn.get_bucket(bucket, validate=False)
    k = Key(bucket)
    k.key = key
    print(k.key)

    #if content_type:
    #    k.set_metadata('Content-Type', content_type)

    sent = k.set_contents_from_file(filea, cb=callback, md5=md5, reduced_redundancy=reduced_redundancy, rewind=True)
    print(sent)

    # Rewind for later use
    filea.seek(0)
    #print size
##End of function

upload_to_s3(AWS_ACCESS_KEY, AWS_ACCESS_SECRET_KEY, filea, bucket, key)
os.remove(date + '.txt')
Now some info about what I feed into this: earlier sections of the code write out a text file with multiple paragraphs, but it is still all one text file, created with a+ permissions. The file is named using (date + '.txt') and is not closed with .close() in earlier sections of the code, unless the Python interpreter does something behind the scenes that I am not aware of (.close() gave me a few issues, so I just left the file open, since the last line of my code here erases it).
I have tried looping the uploading process, but it seems like the file is just not read properly. What am I doing wrong?
Boto does not rewind the file to position 0 before it starts to upload. If the file pointer you pass to k.set_contents_from_file is not at the beginning of the file, then any data from the beginning of the file to its current position (as reported by fp.tell()) will not be sent. This is by design and I would not consider it a bug in boto.
If you want to be sure the entire file is uploaded to S3, make sure the file pointer is at the beginning of the file before passing it to boto. From the code you show above, you are doing a rewind after the upload but not before.
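A minimal sketch of that fix, using the names from the question's code:
# Rewind to the start of the file before handing it to boto...
filea.seek(0)
sent = k.set_contents_from_file(filea, cb=callback, md5=md5,
                                reduced_redundancy=reduced_redundancy)
# ...and rewind again afterwards if the file object will be reused.
filea.seek(0)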