I'm using boto and CloudFormation to orchestrate a few resources.
To create the CloudFormation template, I'm reading a JSON file from my local disk and building a JSON string to pass as the template_body parameter:
try:
    fileObj = open(filename, 'r')
    json_data = json.loads(fileObj.read())
    return json_data
except IOError as e:
    print e
    exit()
And my CloudFormation connection and stack creation go like this:
cfnConnectObj = cfn.connection.CloudFormationConnection(
    aws_access_key_id=aKey,
    aws_secret_access_key=sKey,
    is_secure=True,
    debug=2,
    path='/',
    validate_certs=True,
    region=region[3])  # created connection object for the CloudFormation service

stackID = cfnConnectObj.create_stack(
    'demodrupal',
    template_body=templateJson,
    template_url=None,
    parameters=[],
    notification_arns=[],
    disable_rollback=False,
    timeout_in_minutes=None,
    capabilities=['CAPABILITY_IAM'],
    tags=None)
I'm getting this boto error:
[ERROR]:{"Error":{"Code":"ValidationError","Message":"Template format error: JSON not well-formed. (line 1, column 3)","Type":"Sender"}
Why am I getting this error? I have used json.loads, but it still says the JSON is not well formed. Is there anything I'm missing? Please enlighten me.
(I'm new to Python and boto.)
json.loads takes JSON text and converts it into a Python object. If you already have a JSON file, you can pass its contents directly to the service as template_body. Alternatively, you can load the JSON into Python, make any adjustments there, and then use json.dumps to get well-formed JSON text back.
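A minimal sketch of both approaches, reusing cfnConnectObj from your code; the filename here is a placeholder. (If the templateJson in your call is the dict returned by json.loads, str() of a dict uses single quotes and is not valid JSON, which would explain the ValidationError.)
import json

# Option 1: pass the raw file contents straight through as the template body.
with open('mytemplate.json', 'r') as fileObj:  # placeholder filename
    templateJson = fileObj.read()

# Option 2: parse into Python, tweak if needed, then serialize back to JSON text.
with open('mytemplate.json', 'r') as fileObj:
    template_dict = json.load(fileObj)
templateJson = json.dumps(template_dict)

# Then pass the *string* (not the parsed dict) to create_stack:
stackID = cfnConnectObj.create_stack('demodrupal',
                                     template_body=templateJson,
                                     capabilities=['CAPABILITY_IAM'])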
I am using the code below to append data to an Azure blob using Python.
from azure.storage.blob import AppendBlobService
append_blob_service = AppendBlobService(account_name='myaccount', account_key='mykey')
# The same containers can hold all types of blobs
append_blob_service.create_container('mycontainer')
# Append blobs must be created before they are appended to
append_blob_service.create_blob('mycontainer', 'myappendblob')
append_blob_service.append_blob_from_text('mycontainer', 'myappendblob', u'Hello, world!')
append_blob = append_blob_service.get_blob_to_text('mycontainer', 'myappendblob')
The above code works fine, but when I try to insert new data, the old data gets overwritten.
Is there any way I can append data to 'myappendblob'?
Considering you are calling the same code to append the data, the issue is with the following line of code:
append_blob_service.create_blob('mycontainer', 'myappendblob')
If you read the documentation for the create_blob method, you will notice the following:
Creates a blob or overrides an existing blob. Use if_none_match=* to
prevent overriding an existing blob.
So essentially you are overwriting the blob every time you call your code.
You should call this method with if_none_match="*" as the documentation suggests. If the blob already exists, your code will throw an exception, which you will need to handle.
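A minimal sketch of that approach, reusing the names from the question; the exception type assumes the legacy azure-storage SDK used here, where a conflicting create raises AzureConflictHttpError:
from azure.storage.blob import AppendBlobService
from azure.common import AzureConflictHttpError

append_blob_service = AppendBlobService(account_name='myaccount', account_key='mykey')
append_blob_service.create_container('mycontainer')

try:
    # Only create the blob if it does not already exist.
    append_blob_service.create_blob('mycontainer', 'myappendblob', if_none_match='*')
except AzureConflictHttpError:
    # The blob is already there; keep its contents and just append below.
    pass

append_blob_service.append_blob_from_text('mycontainer', 'myappendblob', u'more data\n')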
Try this code, which is taken from the document linked below and was given by @Harsh Jain:
from azure.storage.blob import AppendBlobService

def append_data_to_blob(data):
    service = AppendBlobService(account_name="<Storage acc name>",
                                account_key="<Storage acc key>")
    try:
        # Append to the blob if it already exists.
        service.append_blob_from_text(container_name="<name of container>", blob_name="<name of file>", text=data)
    except:
        # Otherwise create the blob first, then append.
        service.create_blob(container_name="<name of container>", blob_name="<name of file>")
        service.append_blob_from_text(container_name="<name of container>", blob_name="<name of file>", text=data)
    print('Data got appended')

append_data_to_blob('Hi blob')
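Note that the bare except above will also swallow unrelated errors (bad credentials, missing container, and so on); catching the SDK's specific conflict exception, as in the earlier sketch, is the safer design.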
Reference:
https://www.educative.io/answers/how-to-append-data-in-blob-storage-in-azure-using-python
I have tried to upload an XML file to S3 using boto3. As recommended by Amazon, I would like to send a Base64-encoded 128-bit MD5 digest (Content-MD5) of the data.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.put
My Code:
with open(file, 'rb') as tempfile:
    body = tempfile.read()
    tempfile.close()

hash_object = hashlib.md5(body)
base64_md5 = base64.encodebytes(hash_object.digest())

response = s3.Object(self.bucket, self.key + file).put(
    Body=body.decode(self.encoding),
    ACL='private',
    Metadata=metadata,
    ContentType=self.content_type,
    ContentEncoding=self.encoding,
    ContentMD5=str(base64_md5)
)
When I try this, str(base64_md5) creates a string like 'b'ZpL06Osuws3qFQJ8ktdBOw==\n''.
In this case, I get this error message:
An error occurred (InvalidDigest) when calling the PutObject operation: The Content-MD5 you specified was invalid.
For test purposes I copied only the value without the 'b' in front: 'ZpL06Osuws3qFQJ8ktdBOw==\n'
Then I get this error message:
botocore.exceptions.HTTPClientError: An HTTP Client raised and unhandled exception: Invalid header value b'hvUe19qHj7rMbwOWVPEv6Q==\n'
Can anyone help me upload a file to S3 correctly?
Thanks,
Oliver
Starting with @Isaac Fife's example, stripping it down to identify what's required vs. not, and adding imports and such to make it a fully reproducible example (the only change you need to make is to use your own bucket name):
import base64
import hashlib
import boto3

contents = "hello world!"

md = hashlib.md5(contents.encode('utf-8')).digest()
contents_md5 = base64.b64encode(md).decode('utf-8')

boto3.client('s3').put_object(
    Bucket="mybucket",
    Key="test",
    Body=contents,
    ContentMD5=contents_md5
)
Learnings (this was with Python 3.7): first, the MD5 you need to send will NOT look like what an 'upload' returns. We actually need a base64 version, while what comes back is the md.hexdigest() version, and hex is base16, which is not base64.
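For example, a quick way to see the two formats side by side (the comments only describe each format, not specific values):
import base64
import hashlib

md = hashlib.md5(b"hello world!")
print(md.hexdigest())                                 # hex (base16) digest -- what an ETag looks like
print(base64.b64encode(md.digest()).decode('utf-8'))  # base64 digest -- what ContentMD5 expects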
It took me hours to figure this out because the only error you get is "The Content-MD5 you specified was invalid." Super useful for debugging... Anyway, here is the code I used to actually get the file to upload correctly, before refactoring:
json_results = json_converter.convert_to_json(result)
json_results_utf8 = json_results.encode('utf-8')
content_md5 = md5.get_content_md5(json_results_utf8)
content_md5_string = content_md5.decode('utf-8')
metadata = {
    "md5chksum": content_md5_string
}
s3 = boto3.resource('s3', config=Config(signature_version='s3v4'))
obj = s3.Object(bucket, 'filename.json')
obj.put(
    Body=json_results_utf8,
    ContentMD5=content_md5_string,
    ServerSideEncryption='aws:kms',
    Metadata=metadata,
    SSEKMSKeyId=key_id)
And the hashing:
def get_content_md5(data):
    digest = hashlib.md5(data).digest()
    return base64.b64encode(digest)
The hard part for me was figuring out what encoding is needed at each step in the process, since I wasn't very familiar with how strings are stored in Python at the time.
get_content_md5 takes a UTF-8 bytes-like object only, and returns the same. But to pass the MD5 hash to AWS, it needs to be a string; you have to decode it before you give it to ContentMD5.
Pro-tip: Body, on the other hand, needs to be given bytes or a seekable object. Make sure that if you pass a seekable object you seek(0) to the beginning of the file before you pass it to AWS, or the MD5 will not match. For that reason, using bytes is less error-prone, imo.
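A small sketch of the seekable-object caveat; the file path and bucket name are placeholders:
import base64
import hashlib
import boto3

with open('report.json', 'rb') as fh:  # placeholder file
    digest = hashlib.md5(fh.read()).digest()
    content_md5 = base64.b64encode(digest).decode('utf-8')
    fh.seek(0)  # rewind, or the uploaded body won't match the MD5 you computed
    boto3.client('s3').put_object(
        Bucket='mybucket',  # placeholder bucket
        Key='report.json',
        Body=fh,
        ContentMD5=content_md5,
    )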
I have files hosted in an AWS S3 bucket, and I need all of the S3 object URLs in a CSV file.
Please suggest an approach.
You can get all S3 object URLs by using the AWS SDK for S3. First, you need to read all the items in the bucket. You can use Python code similar to this Java code (you can port the logic):
ListObjectsRequest listObjects = ListObjectsRequest
        .builder()
        .bucket(bucketName)
        .build();

ListObjectsResponse res = s3.listObjects(listObjects);
List<S3Object> objects = res.contents();

for (ListIterator iterVals = objects.listIterator(); iterVals.hasNext(); ) {
    S3Object myValue = (S3Object) iterVals.next();
    System.out.print("\n The name of the key is " + myValue.key());
}
Then iterate through the list and get the key as shown above. For each object, you can get the URL using code similar to this (again Java, which you can port):
GetUrlRequest request = GetUrlRequest.builder()
        .bucket(bucketName)
        .key(keyName)
        .build();

URL url = s3.utilities().getUrl(request);
System.out.println("The URL for " + keyName + " is " + url.toString());
Put each URL value into a collection and then write the collection out to a CSV. That is how you achieve your use case.
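If you would rather do the whole thing directly in Python with boto3, here is a rough sketch of the same logic; the bucket name, URL style, and output filename are placeholders:
import csv
import boto3

s3 = boto3.client('s3')
bucket_name = 'mybucket'  # placeholder bucket

# Collect a URL for every object in the bucket.
urls = []
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name):
    for obj in page.get('Contents', []):
        # Virtual-hosted-style URL; adjust if your bucket needs a regional endpoint.
        urls.append("https://{}.s3.amazonaws.com/{}".format(bucket_name, obj['Key']))

# Write the collection out to a CSV file.
with open('object_urls.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['url'])
    for url in urls:
        writer.writerow([url])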
I want to be able to split some large JSON files in blob storage (~1 GB each) into individual files (one file per record).
I have tried using get_blob_to_stream from the Azure Python SDK, but I am getting the following error:
AzureHttpError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
To test, I've just been printing the text downloaded from the blob, and haven't yet tried writing it back out to individual JSON files:
with BytesIO() as document:
    block_blob_service = BlockBlobService(account_name=STORAGE_ACCOUNT_NAME, account_key=STORAGE_ACCOUNT_KEY)
    block_blob_service.get_blob_to_stream(container_name=CONTAINER_NAME, blob_name=BLOB_ID, stream=document)
    print(document.getvalue())
Interestingly, when I limit the size of the blob information that I'm downloading, the error message doesn't appear, and I can get some information out:
with BytesIO() as document:
    block_blob_service = BlockBlobService(account_name=STORAGE_ACCOUNT_NAME, account_key=STORAGE_ACCOUNT_KEY)
    block_blob_service.get_blob_to_stream(container_name=CONTAINER_NAME, blob_name=BLOB_ID, stream=document, start_range=0, end_range=100000)
    print(document.getvalue())
Does anyone know what is going on here, or have any better approaches to splitting a large JSON out?
Thanks!
You usually get the error message "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature" when the Authorization header is not formed correctly. When this happens, the response looks like the following:
<?xml version="1.0" encoding="utf-8"?>
<Error>
<Code>AuthenticationFailed</Code>
<Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:096c6d73-f01e-0054-6816-e8eaed000000
Time:2019-03-31T23:08:43.6593937Z</Message>
<AuthenticationErrorDetail>Authentication scheme Bearer is not supported in this version.</AuthenticationErrorDetail>
</Error>
and the solution is to add the header below:
x-ms-version: 2017-11-09
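For illustration, a hedged sketch of adding that header on a raw REST call; the account, container, blob, and token are placeholders (the header matters when you authenticate with an AAD Bearer token):
import requests

headers = {
    "Authorization": "Bearer <aad-access-token>",  # placeholder token
    "x-ms-version": "2017-11-09",                  # Bearer auth needs this service version or newer
}
url = "https://myaccount.blob.core.windows.net/mycontainer/myblob"  # placeholder URL
resp = requests.get(url, headers=headers)
resp.raise_for_status()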
But since you say it works when you limit the size, you will need to read the blob using a chunked approach. Here is something you can try.
import io
import datetime
from azure.storage.blob import BlockBlobService

acc_name = 'myaccount'
acc_key = 'my key'
container = 'storeai'
blob = "orderingai2.csv"

block_blob_service = BlockBlobService(account_name=acc_name, account_key=acc_key)
props = block_blob_service.get_blob_properties(container, blob)
blob_size = int(props.properties.content_length)
index = 0
chunk_size = 104858  # ~0.1 MB; don't make this too big or you will get a memory error
output = io.BytesIO()

def worker(data):
    print(data)

while index < blob_size:
    now_chunk = datetime.datetime.now()
    block_blob_service.get_blob_to_stream(container, blob, stream=output,
                                          start_range=index, end_range=index + chunk_size - 1,
                                          max_connections=50)
    if output is None:
        continue
    output.seek(index)
    data = output.read()
    length = len(data)
    index += length
    if length > 0:
        worker(data)
        if length < chunk_size:
            break
    else:
        break
Hope it helps.
I'd like to copy data from an S3 directory to the Amazon Elasticsearch Service. I've tried following the guide, but unfortunately the part I'm looking for is missing: I don't know what the Lambda function itself should look like (all the guide says about this is "Place your application source code in the eslambda folder."). I'd like ES to auto-index the files.
Currently I'm trying
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = urllib.unquote_plus(record['s3']['object']['key'])
    index_name = event.get('index_name', key.split('/')[0])
    object = s3_client.Object(bucket, key)
    data = object.get()['Body'].read()
    helpers.bulk(es, data, chunk_size=100)
But I get a massive error stating:
elasticsearch.exceptions.RequestError: TransportError(400, u'action_request_validation_exception', u'Validation Failed: 1: index is missing;2: type is missing;3: index is missing;4: type is missing;5: index is missing;6: type is missing;7: ...
Could anyone explain to me how I can set things up so that my data gets moved from S3 to ES, where it gets auto-mapped and auto-indexed? Apparently it's possible, as mentioned in the references here and here.
While mappings can be assigned automatically in Elasticsearch, indexes are not automatically generated. You have to specify the index name and type in the POST request; if that index does not exist, Elasticsearch will create it automatically.
Based on your error, it looks like you're not passing an index and type through.
For example, here's a simple POST request to add a record to the index MyIndex and type MyType, which would first create the index and type if they did not already exist:
curl -XPOST 'example.com:9200/MyIndex/MyType/' \
-d '{"name":"john", "tags" : ["red", "blue"]}'
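Applied to the Lambda snippet from the question, that roughly means giving helpers.bulk a default index and type. A hedged sketch, reusing data, index_name, and es from your code and assuming the S3 object contains one JSON document per line ('my-type' is a placeholder type name):
import json

# data comes from object.get()['Body'].read() as in your snippet;
# adjust the parsing if your files are not newline-delimited JSON.
docs = (json.loads(line) for line in data.splitlines() if line.strip())
helpers.bulk(es, docs, index=index_name, doc_type='my-type', chunk_size=100)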
I wrote a script to download a CSV file from S3 and then transfer the data to ES.
Made an S3 client using boto3 and downloaded the file from S3.
Made an ES client to connect to Elasticsearch.
Opened the CSV file and used the helpers module from elasticsearch to insert the CSV file contents into Elasticsearch.
main.py
import boto3
from elasticsearch import helpers, Elasticsearch
import csv
import os
from config import *
#S3
Downloaded_Filename=os.path.basename(Prefix)
s3 = boto3.client('s3', aws_access_key_id=awsaccesskey,aws_secret_access_key=awssecretkey,region_name=awsregion)
s3.download_file(Bucket,Prefix,Downloaded_Filename)
#ES
ES_index = Downloaded_Filename.split(".")[0]
ES_client = Elasticsearch([ES_host],http_auth=(ES_user, ES_password),port=ES_port)
#S3 to ES
with open(Downloaded_Filename) as f:
    reader = csv.DictReader(f)
    helpers.bulk(ES_client, reader, index=ES_index, doc_type='my-type')
config.py
awsaccesskey = ""
awssecretkey = ""
awsregion = "us-east-1"
Bucket=""
Prefix=''
ES_host = "localhost"
ES_port = "9200"
ES_user = "elastic"
ES_password = "changeme"