S3 InvalidDigest when calling the PutObject operation [duplicate] - python

I am trying to upload an XML file to S3 using boto3. As recommended by Amazon, I would like to send a base64-encoded MD5-128 bit digest (Content-MD5) of the data.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.put
My code:

with open(file, 'rb') as tempfile:
    body = tempfile.read()
    tempfile.close()

hash_object = hashlib.md5(body)
base64_md5 = base64.encodebytes(hash_object.digest())

response = s3.Object(self.bucket, self.key + file).put(
    Body=body.decode(self.encoding),
    ACL='private',
    Metadata=metadata,
    ContentType=self.content_type,
    ContentEncoding=self.encoding,
    ContentMD5=str(base64_md5)
)
When I try this, str(base64_md5) creates a string like 'b'ZpL06Osuws3qFQJ8ktdBOw==\n''.
In this case, I get this Error Message:
An error occurred (InvalidDigest) when calling the PutObject operation: The Content-MD5 you specified was invalid.
For test purposes I copied only the value without the 'b' in front: 'ZpL06Osuws3qFQJ8ktdBOw==\n'
Then I get this error message:
botocore.exceptions.HTTPClientError: An HTTP Client raised and unhandled exception: Invalid header value b'hvUe19qHj7rMbwOWVPEv6Q==\n'
Can anyone tell me how to correctly upload a file to S3?
Thanks,
Oliver

Starting with @Isaac Fife's example, stripping it down to identify what's required vs. not, and including imports and such to make it a fully reproducible example:
(the only change you need to make is to use your own bucket name)
import base64
import hashlib

import boto3

contents = "hello world!"
md = hashlib.md5(contents.encode('utf-8')).digest()
contents_md5 = base64.b64encode(md).decode('utf-8')

boto3.client('s3').put_object(
    Bucket="mybucket",
    Key="test",
    Body=contents,
    ContentMD5=contents_md5
)
Learnings: first, the MD5 you are trying to generate will NOT look like what an upload returns (the ETag). The ETag is an md.hexdigest() version, i.e. hex, which is base16; ContentMD5 actually needs the base64 encoding of the raw digest, and base16 is not base64.
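To make the difference concrete, here is a minimal sketch comparing the two encodings of the same digest:

import base64
import hashlib

digest = hashlib.md5(b"hello world!")
print(digest.hexdigest())                          # base16: what an ETag looks like
print(base64.b64encode(digest.digest()).decode())  # base64: what ContentMD5 expects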

(Python 3.7)
Took me hours to figure this out because the only error you get is "The Content-MD5 you specified was invalid." Super useful for debugging... Anyway, here is the code I used to actually get the file to upload correctly before refactoring.
import boto3
from botocore.client import Config

# json_converter and md5 are the author's own helper modules;
# get_content_md5 is shown below
json_results = json_converter.convert_to_json(result)
json_results_utf8 = json_results.encode('utf-8')

content_md5 = md5.get_content_md5(json_results_utf8)
content_md5_string = content_md5.decode('utf-8')

metadata = {
    "md5chksum": content_md5_string
}

s3 = boto3.resource('s3', config=Config(signature_version='s3v4'))
obj = s3.Object(bucket, 'filename.json')
obj.put(
    Body=json_results_utf8,
    ContentMD5=content_md5_string,
    ServerSideEncryption='aws:kms',
    Metadata=metadata,
    SSEKMSKeyId=key_id)
and the hashing helper:

import base64
import hashlib

def get_content_md5(data):
    digest = hashlib.md5(data).digest()
    return base64.b64encode(digest)
The hard part for me was figuring out what encoding you need at each step in the process, since I was not very familiar with how strings are stored in Python at the time.
get_content_md5 takes a UTF-8 bytes-like object only, and returns the same. But to pass the MD5 hash to AWS, it needs to be a string: you have to decode it before you give it to ContentMD5.
Pro-tip: Body, on the other hand, needs to be given bytes or a seekable object. If you pass a seekable object, make sure you seek(0) to the beginning of the file before you hand it to AWS, or the MD5 will not match. For that reason, using bytes is less error-prone, in my opinion.
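A minimal sketch of that pitfall (the bucket and key names are just placeholders):

import boto3

s3 = boto3.client('s3')
with open('filename.json', 'rb') as f:
    f.read()   # anything that moves the file position, e.g. computing the MD5...
    f.seek(0)  # ...must be undone, or S3 will hash an empty/partial body
    s3.put_object(Bucket='mybucket', Key='filename.json', Body=f)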

Related

Verifying a pdf signature with Endesive raises an error when accessing SignerInfo native

I am trying to compare a signature with a certificate for a pdf file in python.
I found this very nice package called endesive.
I followed the example for verifying a pdf signature and I have something like this:
pdf_file_path = "/workspaces/test.pdf"
data = open(pdf_file_path, 'rb').read()
certificates = (
    open("/workspaces/certificates/pki.pem", 'rt').read(),
    open("/workspaces/certificates/pki-chain.pem", 'rt').read()
)
(hashok, signatureok, certok) = pdf.verify(data, certificates)
print('signature ok?', signatureok)
print('hash ok?', hashok)
print('cert ok?', certok)
This should be pretty straightforward: I read the PDF, I open the certificates, and then pdf.verify should confirm that everything is in order.
pdf.verify at one point calls this:

signed_data = cms.ContentInfo.load(bcontents)['content'].native

which makes asn1crypto raise (File "/home/vscode/.local/lib/python3.9/site-packages/asn1crypto/core.py", line 4060, in native: raise e) repeatedly, until it gets to:

ValueError: Unknown element - context class, constructed method, tag 0
    while parsing asn1crypto.core.Sequence
    while parsing asn1crypto.cms.SetOfAny
    while parsing asn1crypto.cms.CMSAttribute
    while parsing asn1crypto.cms.CMSAttributes
    while parsing asn1crypto.cms.SignerInfo
What could go wrong here?
Instead of addressing the signer info like this:

signature = signed_data['signer_infos'][0].native['signature']

it should have been addressed like this:

signature = signed_data['signer_infos'][0]['signature'].native
This has been addressed here.

ValueError("Expected: ASCII-armored PGP data") when using pgp_key.from_blob(key_string)

I am getting ValueError("Expected: ASCII-armored PGP data") from pgp_key.from_blob(key_string) when trying to parse the key.
pgp_key = pgpy.PGPKey()
key = pgp_key.from_blob(key_string)
I tried the parse method as well but got the same error.
I fixed this error by:
1.) With your key as a file, run base64 /path/to/file_name > new_encoded_file_name
2.) Put the encoded key in your desired place (AWS Secrets Manager in my case)
3.) Within your program, add the following line BEFORE getting your pgp key:

key_string = base64.b64decode(key_string)

Now key = pgp_key.from_blob(key_string) will no longer throw an error, as the decoded blob is an ASCII-armored bytearray.
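A minimal end-to-end sketch of that flow (key_string here stands for the base64-encoded blob fetched from Secrets Manager; note that pgpy's PGPKey.from_blob classmethod returns a (key, extras) tuple):

import base64
import pgpy

decoded = base64.b64decode(key_string)   # back to the ASCII-armored bytes
key, _ = pgpy.PGPKey.from_blob(decoded)  # from_blob returns (key, extras)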

Key given by lambda S3 event cannot be used when containing non-ASCII characters

I have a Python lambda script that shrinks images as they are uploaded to S3. When the uploaded filename contains non-ASCII characters (Hebrew in my case), I cannot get the object (Forbidden as if the file doesn't exist).
Here's (some of) my code:
s3_client = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        s3_client.download_file(bucket, key, "/tmp/somefile")
This raises An error occurred (403) when calling the HeadObject operation: Forbidden: ClientError. I also see in the log that the key contains characters like %D7%92.
Following some sources on the web (http://blog.rackspace.com/the-devnull-s3-bucket-hacking-with-aws-lambda-and-python/), I also tried to unquote the key, with no luck:
key = urllib.unquote_plus(record['s3']['object']['key'])
Same error, although this time the log states that I'm trying to retrieve a key with characters like this: פ×קס×.
Note that this script is verified to work on English keys, and the tests were done on keys with no spaces.
This worked for me:

import urllib.parse

encodedStr = 'My+name+is+Tarak'
urllib.parse.unquote_plus(encodedStr)
# "My name is Tarak"
I had a similar problem. I solved it by adding an encode before doing the unquote:
key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode("utf8"))
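Putting it together for Python 3, a minimal sketch of the handler from the question with the decode step added:

import urllib.parse

import boto3

s3_client = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # S3 event notifications URL-encode the object key (spaces become '+',
        # non-ASCII becomes %XX escapes), so decode before calling the API
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        s3_client.download_file(bucket, key, "/tmp/somefile")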

how to format the JSON for boto aws sdk?

I'm using boto and CloudFormation to orchestrate a few resources.
To create templates for CloudFormation, I'm reading a JSON file from my local disk and producing a JSON string to pass as the template_body parameter:
try:
    fileObj = open(filename, 'r')
    json_data = json.loads(fileObj.read())
    return json_data
except IOError as e:
    print e
    exit()
And my CloudFormation connection and stack creation go like this:
# create a connection object for the CloudFormation service
cfnConnectObj = cfn.connection.CloudFormationConnection(
    aws_access_key_id=aKey,
    aws_secret_access_key=sKey,
    is_secure=True,
    debug=2,
    path='/',
    validate_certs=True,
    region=region[3])

stackID = cfnConnectObj.create_stack(
    'demodrupal',
    template_body=templateJson,
    template_url=None,
    parameters=[],
    notification_arns=[],
    disable_rollback=False,
    timeout_in_minutes=None,
    capabilities=['CAPABILITY_IAM'],
    tags=None)
I'm getting this boto error: [ERROR]:{"Error":{"Code":"ValidationError","Message":"Template format error: JSON not well-formed. (line 1, column 3)","Type":"Sender"}
Why this error? I have used json.loads, but it still says the JSON is not well-formed. Is there anything I'm missing?
Please enlighten me.
(I'm new to Python and boto.)
json.loads takes JSON text and converts it into a Python object, so what you are passing as template_body is a Python dict rather than a JSON string, and its string form is not well-formed JSON. If you already have a JSON file, you can just pass that file's contents directly to the service. Alternatively, you can load the JSON into Python, make any adjustments there, and then use json.dumps to get your well-formed JSON back.
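A minimal sketch of the fixed read function (filename is assumed to point at the template file):

import json

def read_template(filename):
    with open(filename, 'r') as f:
        template_str = f.read()          # template_body wants this raw JSON string
    template = json.loads(template_str)  # parse only to validate or adjust it...
    return json.dumps(template)          # ...then serialize back to a JSON string

templateJson = read_template('template.json')  # pass this to create_stack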

python gnupg sign, and verify

I am trying to see if I can get the python-gnupg module to sign and verify a file from a Python script. I have the following code, which does not report any errors when run.
However, the code prints "Unverified" at the end, even though I thought I had signed the file (example.txt).
I must be missing something in the documentation, but after reading it this is what I came up with for signing and verifying. Any help please?
import gnupg
gpg = gnupg.GPG(gnupghome="/home/myname")
stream = open("example.txt", "rb")
signed_data = gpg.sign_file(stream)
verified = gpg.verify_file(stream)
print "Verified" if verified else "Unverified"
There are a few issues with your code:
1.) gpg = gnupg.GPG(gnupghome="/home/myname") needs to be gpg = gnupg.GPG(gnupghome="/home/myname/.gnupg")
2.) You are attempting to verify the stream using verify_file(stream); however, the stream is still a handle to the original, unsigned file. You would first need to either write the signed data to a new file and call verify_file() against a handle to that file, or verify the result of sign_file.
Below is a working example of your demo, using the result of sign_file. But before we get to that: the way to troubleshoot what is happening in your script is to review the stderr output on the objects the gnupg methods return. For example, you can inspect the result of the signing by printing out signed_data.stderr; likewise for the return value of the verify_file method.
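For instance, a minimal sketch of that debugging step (reusing gpg and stream from your script):

signed_data = gpg.sign_file(stream)
print(signed_data.stderr)  # gpg's own diagnostics, e.g. a missing secret key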
On to the code -
import gnupg
gpg = gnupg.GPG(gnupghome="/home/myname/.gnupg")
stream = open("example.txt", "rb")
signed_data = gpg.sign_file(stream)
verified = gpg.verify(signed_data.data)
print "Verified" if verified else "Unverified"
I hope this helps!
