SageMaker: delete Models and Endpoint Configurations with the Python API

I've tried deleting/recreating endpoints with the same name, and wasted a lot of time before I realized that changes do not get applied unless you also delete the corresponding Model and Endpoint configuration so that new ones can be created with that name.
Is there a way with the SageMaker Python API to delete all three instead of just the endpoint?

I believe you are looking for something like this:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.delete_endpoint_config
Example:
import boto3

deployment_name = 'my_deployment_name'
client = boto3.client('sagemaker')

# Look up the model referenced by the endpoint configuration, then delete
# the model, the endpoint, and the endpoint configuration.
response = client.describe_endpoint_config(EndpointConfigName=deployment_name)
model_name = response['ProductionVariants'][0]['ModelName']
client.delete_model(ModelName=model_name)
client.delete_endpoint(EndpointName=deployment_name)
client.delete_endpoint_config(EndpointConfigName=deployment_name)
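If the endpoint configuration defines more than one production variant, each variant references its own model. A minimal sketch of a cleanup helper that handles that case, built on the same boto3 calls (the function name and default region are illustrative, not part of any SDK):

import boto3

def delete_sagemaker_deployment(deployment_name, region_name='us-east-1'):
    # Deletes the endpoint, its endpoint configuration, and every model it references.
    client = boto3.client('sagemaker', region_name=region_name)
    config = client.describe_endpoint_config(EndpointConfigName=deployment_name)

    # Delete the endpoint first so nothing is still serving the models.
    client.delete_endpoint(EndpointName=deployment_name)
    client.delete_endpoint_config(EndpointConfigName=deployment_name)

    # An endpoint configuration can reference several models (one per variant).
    for variant in config['ProductionVariants']:
        client.delete_model(ModelName=variant['ModelName'])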

It looks like AWS is currently in the process of supporting model deletion via API with this pull request.
For the time being, Amazon's only recommendation is to delete everything via the console.
If this is critical to your system, you can probably manage everything via CloudFormation and create/delete stacks containing your SageMaker models and endpoints.

Related

How should StackDriver entries be created?

I am testing out Stackdriver, and I'm curious how to set additional attributes other than the message itself. For example, I'd like to see which application or server is sending the message. Perhaps something like this:
message: "Hello"
tags: ["Application-1", "Server-XYZ"]
Is there a way to do this?
Additionally, is it suggested to send a plain text message, or a JSON struct?
You can create user-defined log-based metric labels, see https://cloud.google.com/logging/docs/logs-based-metrics/labels
You can send custom attributes by using "Structured Logging".
https://cloud.google.com/logging/docs/structured-logging
I'm not sure which product you are running your application on (Google App Engine Standard/Flexible, Google Cloud Functions, Google Compute Engine, Google Kubernetes Engine), but in general it's recommended to use JSON-formatted structured logs.
If you need to configure the logging agent yourself (as is the case on GCE), you can set it up accordingly:
https://cloud.google.com/logging/docs/agent/installation
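For the structured-logging route, here is a minimal sketch using the google-cloud-logging client library; the logger name and payload fields are illustrative, and it assumes Application Default Credentials are available:

from google.cloud import logging

client = logging.Client()              # uses Application Default Credentials
logger = client.logger('my-app-log')   # illustrative log name

# log_struct sends a JSON (structured) payload instead of plain text, so the
# extra attributes appear as fields on the log entry in Stackdriver.
logger.log_struct({
    'message': 'Hello',
    'application': 'Application-1',
    'server': 'Server-XYZ',
}, severity='INFO')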

Do I need .ebextensions to use AWS resources like DynamoDB or SNS?

I was building a Python web app with AWS Elastic Beanstalk, and I was wondering if it's necessary to create a .ebextensions/xyz.config file to use resources like DynamoDB, SNS, etc.
Here is some sample code using boto3; I was able to connect from my web app and put data into the table without defining any configuration files:
db = boto3.resource('dynamodb', region_name='us-east-1')
table = db.Table('StudentInfo')
I appreciate your inputs.
You do not need .ebextensions to create a DynamoDB table that works with Beanstalk. However, you can, as described here. That example uses the CloudFormation template syntax to specify a DynamoDB resource. If it's not in an .ebextensions file, you'd create the DynamoDB table through an AWS SDK or the DynamoDB console and make the endpoint available to your Django application.
You can specify an SNS topic for Beanstalk to publish events to, or, as in the DynamoDB example above, create one as a CFN resource. The difference between the two approaches is that in the former the Beanstalk environment owns the SNS topic, while in the latter it is the underlying CloudFormation stack that does. If you want to use the SNS topic for things other than publishing environment health events, you would use the latter approach. For example, to integrate the SNS topic with DynamoDB, you must use the latter approach (i.e., specify it as a resource in an .ebextensions file rather than as an option setting).
You would need to switch to using IAM roles. Read more here.
I am assuming that you didn't change the default role that gets assigned to the Elastic Beanstalk (EB) instance during creation. The default instance profile role allows EB to utilize other AWS services it needs to create the various components.
Until you understand more about IAM, creating roles, and assigning permissions, you can attach AWS managed policies to this role to test your application (just search for Dynamo and SNS).
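To illustrate the IAM-role approach, here is a minimal sketch that relies entirely on the instance profile attached to the Beanstalk instances and needs no credentials or .ebextensions files in the application; the table name, item attributes, topic ARN, and region are placeholders:

import boto3

# No keys are passed: boto3 falls back to the EC2 instance profile role,
# which must have the needed DynamoDB and SNS permissions attached.
db = boto3.resource('dynamodb', region_name='us-east-1')
table = db.Table('StudentInfo')
table.put_item(Item={'student_id': '123', 'name': 'Jane Doe'})  # placeholder key/values

sns = boto3.client('sns', region_name='us-east-1')
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:my-topic',  # placeholder ARN
    Message='StudentInfo updated',
)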

Is there any way to get a callback after every insert/delete on Google Cloud Datastore?

I would like to sync my Cloud Datastore contents with an index in ElasticSearch. I would like for the ES index to always be up to date with the contents of Datastore.
I noticed that an equivalent mechanism is available in the App Engine Python Standard Environment by implementing a _post_put_hook method on a Datastore Model. However, this doesn't seem to be possible using the google-cloud-datastore library available for use in the flexible environment.
Is there any way to receive a callback after every insert? Or will I have to put up a "proxy" API in front of the datastore API which will update my ES index after every insert/delete?
The _post_put_hook() of ndb.Model only works if you have written the entity to Datastore through NDB, and yes, unfortunately the NDB library is only available in the App Engine Python Standard Environment. I don't know of such a feature in Cloud Datastore. If I remember correctly, Firebase Realtime Database and Firestore have triggers for writes, but I guess you are not eager to migrate the database either.
In Datastore you would either need a "proxy" API with the above method, as you suggested, or you would need to modify your Datastore client(s) to do this upon any successful write op. The latter may come with a higher risk of failures and stale data in Elasticsearch, especially if the client is outside your control.
I believe that a custom API makes sense if consistent and up-to-date search records are important for your use cases. Datastore and Python / NDB (maybe with Cloud Endpoints) would be a good approach.
I have a similar solution running on GAE Python Standard (although with the built-in Search API instead of Elasticsearch). If you choose this route you should be aware of two potential caveats:
_post_put_hook() is always called, even if the put operation failed. I have added a code sample below. You can find more details in the docs: model hooks, hook methods, check_success().
Exporting the data to Elasticsearch or the Search API will prolong your response time. This might be no issue for background tasks: just call the export feature inside _post_put_hook(). But if a user made the request, this could be a problem. For these cases, you can defer the export operation to a separate task, either by using the deferred.defer() method or by creating a push task. More or less, they are the same. Below, I use defer().
Add a class method for every kind for which you want to export search records. Whenever something goes wrong, or you move apps/datastores, add new search indexes, etc., you can call this method, which will then query all entities of that kind from Datastore batch by batch and export the search records (a sketch of such a method follows the deferred-export example below).
Example with deferred export:
from google.appengine.ext import deferred, ndb


class CustomModel(ndb.Model):

    def _post_put_hook(self, future):
        try:
            # check_success() returns None if the put succeeded and raises
            # the corresponding exception if it failed.
            if future.check_success() is None:
                deferred.defer(export_to_search, self.key)
        except Exception:
            pass  # or log the error to Cloud Console with logging.error('blah')


def export_to_search(key=None):
    try:
        if key is not None:
            entity = key.get()
            if entity is not None:
                call_export_api(entity)  # your export to Elasticsearch / Search API
    except Exception:
        pass  # or log the error to Cloud Console with logging.error('blah')
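And a minimal sketch of the batch re-export class method mentioned above; it assumes the imports, CustomModel, and export_to_search() from the example, and the batch size is illustrative:

class CustomModel(ndb.Model):
    # ... attributes as in the example above ...

    @classmethod
    def export_all_search_records(cls, batch_size=100):
        # Re-export search records for every entity of this kind, batch by batch.
        cursor = None
        more = True
        while more:
            keys, cursor, more = cls.query().fetch_page(
                batch_size, start_cursor=cursor, keys_only=True)
            for key in keys:
                deferred.defer(export_to_search, key)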

How to use access credentials in code instead of environment variables with PynamoDB

I have a Python app that uses several services from AWS. I have one access key and secret for each service. For most of the services I use boto and don't need AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY in the environment. For DynamoDB I use PynamoDB, and I have no idea how to set the credentials without these variables.
I want to standardize the credentials in a settings file to avoid errors like clashing credentials.
Is this possible? If so, how is it done?
From the PynamoDB documentation:
PynamoDB uses botocore to interact with the DynamoDB API. Thus, any
method of configuration supported by botocore works with PynamoDB. For
local development the use of environment variables such as
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY is probably preferable.
You can of course use IAM users, as recommended by AWS. In addition
EC2 roles will work as well and would be recommended when running on
EC2.
Note that if all the services you are interacting with are within the same AWS account, then the preferred way to supply credentials would be to create a single IAM user with all the necessary permissions attached, or an IAM role if the code is running on EC2 or Lambda.
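Since PynamoDB goes through botocore, one way to keep everything in a single settings file without relying on pre-set shell variables is to export the credentials from that file before any PynamoDB model is imported. A sketch under that assumption; the module name, variable names, and region are illustrative:

# settings.py (illustrative): central place for all service credentials
import os

DYNAMO_ACCESS_KEY_ID = 'AKIA...'      # load from your secret store in practice
DYNAMO_SECRET_ACCESS_KEY = '...'

# botocore (and therefore PynamoDB) reads these environment variables,
# so setting them here keeps the credentials standardized in one file.
os.environ.setdefault('AWS_ACCESS_KEY_ID', DYNAMO_ACCESS_KEY_ID)
os.environ.setdefault('AWS_SECRET_ACCESS_KEY', DYNAMO_SECRET_ACCESS_KEY)
os.environ.setdefault('AWS_DEFAULT_REGION', 'us-east-1')

Import this settings module before importing your PynamoDB models.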
I was searching for this online and came across this question. Though it is old, I am sharing my solution so that it might be helpful to someone.
When defining the DynamoDB model, all we need to do is add one additional line of code that contains the IAM role name. Below is a sample model.
If you change the model like the one below, you don't need a ~/.aws/credentials file on the container.
Note: Make sure you attach a DynamoDB read or write policy to the IAM role; I have attached the AmazonDynamoDBFullAccess policy to my instance's IAM role.
from pynamodb.models import Model
from pynamodb.attributes import (
    UnicodeAttribute, NumberAttribute, UnicodeSetAttribute, UTCDateTimeAttribute
)
import urllib2


class TestClass(Model):
    email = UnicodeAttribute(hash_key=True)
    UUID = UnicodeAttribute(range_key=True)

    class Meta:
        region = 'eu-west-2'
        # Read the name of the IAM role attached to this EC2 instance from the
        # instance metadata service.
        # Refer: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
        instanceProfileName = urllib2.urlopen(
            'http://169.254.169.254/latest/meta-data/iam/security-credentials/').read()
        table_name = 'dynamodb-tablename'
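For completeness, a brief usage sketch with this model, run on an instance whose IAM role has DynamoDB access; the key values are placeholders and the table is assumed to already exist:

# No AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY needed on the instance.
item = TestClass('user@example.com', 'some-uuid-1234')  # hash key, range key (placeholders)
item.save()

fetched = TestClass.get('user@example.com', 'some-uuid-1234')
print(fetched.UUID)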

Google AppEngine - How To Perform a Partial Datastore Download

I have a running GAE app that has been collecting data for a while. I am now at the point where I need to run some basic reports on this data and would like to download a subset of the live data to my dev server. Downloading all entities of a kind will simply be too big a data set for the dev server.
Does anyone know of a way to download a subset of entities of a particular kind? Ideally it would be based on entity attributes like date or client ID, but any method would work. I've even tried a regular full download, then arbitrarily killing the process when I thought I had enough data, but it seems the data is locked up in the .sql3 files generated by the bulkloader.
It looks like the default utilities for downloading from / uploading to the GAE datastore (appcfg.py and bulkloader.py) don't support filtering.
It seems reasonable to do one of two things:
write a utility (select + export + save to a local file) and execute it locally, accessing the remote GAE datastore through the remote API shell (see the sketch below)
write an admin web function for select + export + zip: add a new URL to a handler, upload it to GAE, and call it over HTTP
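A minimal sketch of the first option, assumed to run inside remote_api_shell.py (which provides the authenticated connection to the live Datastore); the kind, the 'created' property, the cutoff date, the batch size, and the output file are illustrative:

# Run inside: remote_api_shell.py -s your-app-id.appspot.com
# Fetches a date-filtered subset of one kind and saves it locally as JSON lines.
import datetime
import json

from google.appengine.ext import ndb

class ClientEvent(ndb.Expando):  # illustrative kind; Expando avoids redeclaring the schema
    pass

cutoff = datetime.datetime(2018, 1, 1)
query = ClientEvent.query(ndb.GenericProperty('created') >= cutoff)

with open('client_event_subset.jsonl', 'w') as out:
    cursor, more = None, True
    while more:
        entities, cursor, more = query.fetch_page(500, start_cursor=cursor)
        for entity in entities:
            out.write(json.dumps(entity.to_dict(), default=str) + '\n')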
