Is there an example implementation of the stepfunctions.steps.Parallel class from the AWS Step Functions Data Science SDK for Python?
Parallel execution requires branches, but I can't seem to find the methods or documentation describing them.
Generating a sequential list of steps works fine, but I can't find how to define the Parallel step. Does anyone know?
Are there any other libraries that can do this? As far as I've looked, boto3 doesn't have this functionality, and the CDK isn't suitable because this will run as a service.
I'd like to be able to generate something like the following using just code:
{
"Comment": "Parallel Example.",
"StartAt": "LookupCustomerInfo",
"States": {
"LookupCustomerInfo": {
"Type": "Parallel",
"End": true,
"Branches": [
{
"StartAt": "LookupAddress",
"States": {
"LookupAddress": {
"Type": "Task",
"Resource":
"arn:aws:lambda:us-east-1:123456789012:function:AddressFinder",
"End": true
}
}
},
{
"StartAt": "LookupPhone",
"States": {
"LookupPhone": {
"Type": "Task",
"Resource":
"arn:aws:lambda:us-east-1:123456789012:function:PhoneFinder",
"End": true
}
}
}
]
}
}
}
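For reference, here is a minimal sketch of how this definition might be built with the Data Science SDK's Parallel state and its add_branch method (the IAM role ARN is a placeholder, and class/method names should be double-checked against the installed stepfunctions version):
from stepfunctions.steps import Chain, Parallel, Task
from stepfunctions.workflow import Workflow

# Each branch is an ordinary state (or chain of states)
lookup_address = Task(
    "LookupAddress",
    resource="arn:aws:lambda:us-east-1:123456789012:function:AddressFinder",
)
lookup_phone = Task(
    "LookupPhone",
    resource="arn:aws:lambda:us-east-1:123456789012:function:PhoneFinder",
)

# The Parallel state collects its branches via add_branch()
lookup_customer_info = Parallel("LookupCustomerInfo")
lookup_customer_info.add_branch(lookup_address)
lookup_customer_info.add_branch(lookup_phone)

workflow = Workflow(
    name="ParallelExample",
    definition=Chain([lookup_customer_info]),
    role="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",  # placeholder
)
# Render the Amazon States Language JSON shown above
print(workflow.definition.to_json(pretty=True))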
Attached is an example AVRO schema:
{
"type": "record",
"name": "DummySampleAvroValue",
"namespace": "de.company.dummydomain",
"fields": [
{
"name": "ID",
"type": "int"
},
{
"name": "NAME",
"type": [
"null",
"string"
]
},
{
"name": "STATE",
"type": "int"
},
{
"name": "TIMESTAMP",
"type": [
"null",
"string"
]
}
]
}
According to the "JSON Encoding" section of the official Avro specification (see https://avro.apache.org/docs/current/spec.html#json_encoding), a JSON message that validates against the above schema should look like the following, because of the union types used:
{
"ID":1,
"NAME":{
"string":"Kafka"
},
"STATE":-1,
"TIMESTAMP":{
"string":"2022-04-28T10:57:03.048413"
}
}
When producing this message via the Confluent REST Proxy (Avro), everything works fine: the data is accepted, validated, and present in Kafka.
When using the SerializingProducer from the confluent_kafka Python package, however, the example message is not accepted and only "regular" JSON works, e.g.:
{
"ID":1,
"NAME":"Kafka",
"STATE":-1,
"TIMESTAMP":"2022-04-28T10:57:03.048413"
}
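For context, here is a minimal sketch of the SerializingProducer setup described above (broker and schema-registry URLs, topic name, and schema file name are placeholders):
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# The Avro schema shown above, loaded from a local file (placeholder name)
schema_str = open('dummy_sample_avro_value.avsc').read()

schema_registry_client = SchemaRegistryClient({'url': 'http://localhost:8081'})
avro_serializer = AvroSerializer(schema_registry_client, schema_str)

producer = SerializingProducer({
    'bootstrap.servers': 'localhost:9092',
    'value.serializer': avro_serializer,
})

# The serializer takes a plain Python dict; union fields are passed directly,
# not wrapped in {"string": ...} as in the Avro JSON encoding
producer.produce('dummy-topic', value={
    'ID': 1,
    'NAME': 'Kafka',
    'STATE': -1,
    'TIMESTAMP': '2022-04-28T10:57:03.048413',
})
producer.flush()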
Is this intended behaviour or am I doing something wrong? Can I tell the SerializingProducer to accept this encoding?
I need to keep both ways of producing messages available, but the sending system can/wants to provide only one of the above payloads. Is there a way to support both with the same payload?
Thanks in advance.
Best regards
I have a Python Azure Function that triggers on messages to a topic, which works fine on its own. However, if I also try to write a message to a different Service Bus queue, it doesn't work (as in, the Azure Function won't even trigger when new messages are published to the topic). It feels like the trigger conditions aren't met when I include the msg_out: func.Out[str] binding. Any help would be much appreciated!
__init__.py
import logging

import azure.functions as func


def main(msg: func.ServiceBusMessage, msg_out: func.Out[str]):
    # Log the Service Bus Message as plaintext
    # logging.info("Python ServiceBus topic trigger processed message.")
    logging.info("Changes are coming through!")
    msg_out.set("Send an email")
function.json
{
"scriptFile": "__init__.py",
"entryPoint": "main",
"bindings": [
{
"name": "msg",
"type": "serviceBusTrigger",
"direction": "in",
"topicName": "publish-email",
"subscriptionName": "validation-sub",
"connection": "Test_SERVICEBUS"
},
{
"type": "serviceBus",
"direction": "out",
"connection": "Test_SERVICEBUS",
"name": "msg_out",
"queueName": "email-test"
}
]
}
host.json
{
"version": "2.0",
"logging": {
"applicationInsights": {
"samplingSettings": {
"isEnabled": true,
"excludedTypes": "Request"
}
}
},
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[2.*, 3.0.0)"
},
"extensions": {
"serviceBus": {
"prefetchCount": 100,
"messageHandlerOptions": {
"autoComplete": true,
"maxConcurrentCalls": 32,
"maxAutoRenewDuration": "00:05:00"
},
"sessionHandlerOptions": {
"autoComplete": false,
"messageWaitTimeout": "00:00:30",
"maxAutoRenewDuration": "00:55:00",
"maxConcurrentSessions": 16
}
}
}
}
I can reproduce your problem; it seems to be caused by the following error:
Property sessionHandlerOptions is not allowed.
After deleting the sessionHandlerOptions block from host.json, the function triggers normally.
I'm a newbie to AWS Step Functions and AWS Batch. I'm trying to integrate an AWS Batch job with Step Functions. The AWS Batch job executes a simple Python script that outputs a string value (a high-level, simplified requirement). I need the Python script's output to be available to the next state of the step function. How should I accomplish this? The AWS Batch job output does not contain the result of the Python script; instead it contains all the container-related information along with the input values.
Example: the AWS Batch job executes a Python script that outputs "Hello World". I need "Hello World" to be available to the next state of the step function, which executes a lambda associated with it.
I was able to do it; below is my state machine. I took the sample project for running a batch job, Manage a Batch Job (AWS Batch, Amazon SNS), and modified it with two lambdas for passing input/output.
{
"Comment": "An example of the Amazon States Language for notification on an AWS Batch job completion",
"StartAt": "Submit Batch Job",
"TimeoutSeconds": 3600,
"States": {
"Submit Batch Job": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobName": "BatchJobNotification",
"JobQueue": "arn:aws:batch:us-east-1:1234567890:job-queue/BatchJobQueue-737ed10e7ca3bfd",
"JobDefinition": "arn:aws:batch:us-east-1:1234567890:job-definition/BatchJobDefinition-89c42b1f452ac67:1"
},
"Next": "Notify Success",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Notify Failure"
}
]
},
"Notify Success": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:1234567890:function:readcloudwatchlogs",
"Parameters": {
"LogStreamName.$": "$.Container.LogStreamName"
},
"ResultPath": "$.lambdaOutput",
"Next": "ConsumeLogs"
},
"ConsumeLogs": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:1234567890:function:consumelogs",
"Parameters": {
"randomstring.$": "$.lambdaOutput.logs"
},
"End": true
},
"Notify Failure": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"Message": "Batch job submitted through Step Functions failed",
"TopicArn": "arn:aws:sns:us-east-1:1234567890:StepFunctionsSample-BatchJobManagement17968f39-e227-47ab-9a75-08a7dcc10c4c-SNSTopic-1GR29R8TUHQY8"
},
"End": true
}
}
}
The key to reading the logs was the Submit Batch Job output, which contains LogStreamName. I passed that to my lambda named function:readcloudwatchlogs, which reads the logs and then passes them on to the next function, named function:consumelogs. You can see in the attached screenshot the consumelogs function printing the logs.
{
"Attempts": [
{
"Container": {
"ContainerInstanceArn": "arn:aws:ecs:us-east-1:1234567890:container-instance/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/6d11fdbfc9eb4f40b0d6b85c396bb243",
"ExitCode": 0,
"LogStreamName": "BatchJobDefinition-89c42b1f452ac67/default/2ad955bf59a8418893f53182f0d87b4b",
"NetworkInterfaces": [],
"TaskArn": "arn:aws:ecs:us-east-1:1234567890:task/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/2ad955bf59a8418893f53182f0d87b4b"
},
"StartedAt": 1611329367577,
"StatusReason": "Essential container in task exited",
"StoppedAt": 1611329367748
}
],
"Container": {
"Command": [
"echo",
"Hello world"
],
"ContainerInstanceArn": "arn:aws:ecs:us-east-1:1234567890:container-instance/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/6d11fdbfc9eb4f40b0d6b85c396bb243",
"Environment": [
{
"Name": "MANAGED_BY_AWS",
"Value": "STARTED_BY_STEP_FUNCTIONS"
}
],
"ExitCode": 0,
"Image": "137112412989.dkr.ecr.us-east-1.amazonaws.com/amazonlinux:latest",
"LogStreamName": "BatchJobDefinition-89c42b1f452ac67/default/2ad955bf59a8418893f53182f0d87b4b",
"TaskArn": "arn:aws:ecs:us-east-1:1234567890:task/BatchComputeEnvironment-4a1593ce223b3cf_Batch_7557555f-5606-31a9-86b9-83321eb3e413/2ad955bf59a8418893f53182f0d87b4b",
..
},
..
"Tags": {
"resourceArn": "arn:aws:batch:us-east-1:1234567890:job/d36ba07a-54f9-4acf-a4b8-3e5413ea5ffc"
}
}
Read Logs Lambda code:
import boto3

client = boto3.client('logs')


def lambda_handler(event, context):
    print(event)
    response = client.get_log_events(
        logGroupName='/aws/batch/job',
        logStreamName=event.get('LogStreamName')
    )
    log = {'logs': response['events'][0]['message']}
    return log
Consume Logs Lambda Code
import json

print('Loading function')


def lambda_handler(event, context):
    print(event)
You could pass your Step Functions execution ID ($$.Execution.Id) to the batch process, and the batch process could then write its response to DynamoDB, keyed by the execution ID (or another field). You would then need a subsequent step that reads directly from DynamoDB to capture the process response; a sketch of the write side is shown below.
I have been on the hunt for a way to do this without the subsequent step, but thus far no dice.
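For illustration, a minimal sketch of the write side under these assumptions (the table name, attribute names, and the EXECUTION_ID environment variable are hypothetical):
import os

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('batch-job-results')  # hypothetical table name


def write_result(result: str):
    # The execution ID would be passed into the batch job, e.g. as a job
    # parameter or container environment variable ($$.Execution.Id upstream)
    table.put_item(Item={
        'execution_id': os.environ['EXECUTION_ID'],  # assumed partition key
        'result': result,
    })


write_result('Hello World')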
While you can't do waitForTaskToken with submitJob, you can still use the callback pattern by passing the task token in the Parameters and referencing it in the command override with Ref::TaskToken:
...
"Submit Batch Job": {
  "Type": "Task",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "Parameters": {
      "TaskToken.$": "$$.Task.Token"
    },
    "ContainerOverrides": {
      "Command": ["python3", "my_script.py", "Ref::TaskToken"]
    }
  }
}
...
Then when your script is done doing its processing, you just call StepFunctions.SendTaskSuccess or StepFunctions.SendTaskFailure:
import sys
import json
import boto3

client = boto3.client('stepfunctions')

def main():
    args = sys.argv[1:]
    # The task output must be a JSON string, hence json.dumps
    client.send_task_success(taskToken=args[0], output=json.dumps('Hello World'))
This will tell StepFunctions your job is complete and the output should be 'Hello World'. This pattern can also be useful if your Batch job completes the work required to resume the state machine, but needs to do some cleanup work afterward. You can send_task_success with the results and the state machine can resume while the Batch job does the cleanup work.
Thanks #samtoddler for your answer.
We used it for a while.
However, recently my friend #liorzimmerman found a better solution.
It uses stepfunctions send-task-success.
When calling the job from the state machine, you need to pass the task token:
"States": {
"XXX_state": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "arn:aws:batch:us-east-1:XXX:job-definition/task_XXX:4",
"JobQueue": "arn:aws:batch:us-east-1:XXX:job-queue/XXX-queue",
"JobName": "XXX",
"Parameters": {
"TASK_TOKEN.$": "$$.Task.Token",
}
},
"ResultPath": "$.payload",
"End": true
}
Next, inside the Docker container run by the job, the results are sent with:
aws stepfunctions send-task-success --task-token $TASK_TOKEN --task-output $OUTPUT_JSON
I'm looking for a way to set service/status/loadBalancer/ingress IP after creating a k8s service of type=LoadBalancer (as described in the 'Type LoadBalancer' section of https://kubernetes.io/docs/concepts/services-networking/service/).
My problem is similar to the issue described in the following question (Is it possible to update a kubernetes service 'External IP' while watching for the service?), but I couldn't find the answer there.
Thanks in advance
There are two ways to do this: with a JSON patch or with a merge patch. Here's how you do the latter:
[centos@ost-controller ~]$ cat patch.json
{
"status": {
"loadBalancer": {
"ingress": [
{"ip": "8.3.2.1"}
]
}
}
}
As you can see, for merge patches you build a dictionary containing the whole object tree (starting at status) that needs to be merged in. If you wanted to replace or remove something instead, you'd use the JSON patch strategy.
Once we have this file, we send the request, and if all goes well we'll receive a response consisting of the object with the merge already applied:
[centos@ost-controller ~]$ curl --request PATCH --data "$(cat patch.json)" -H "Content-Type:application/merge-patch+json" http://localhost:8080/api/v1/namespaces/default/services/kubernetes/status
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "kubernetes",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/services/kubernetes/status",
"uid": "b8ece320-76c1-11e7-b468-fa163ea3fb09",
"resourceVersion": "2142242",
"creationTimestamp": "2017-08-01T14:00:06Z",
"labels": {
"component": "apiserver",
"provider": "kubernetes"
}
},
"spec": {
"ports": [
{
"name": "https",
"protocol": "TCP",
"port": 443,
"targetPort": 6443
}
],
"clusterIP": "10.0.0.129",
"type": "ClusterIP",
"sessionAffinity": "ClientIP"
},
"status": {
"loadBalancer": {
"ingress": [
{
"ip": "8.3.2.1"
}
]
}
}
}
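If you'd rather apply the same status patch from Python instead of curl, here is a minimal sketch using the official kubernetes client (assumes cluster access is configured and that patch_namespaced_service_status is available in your client version; the service name, namespace, and IP mirror the example above):
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

# Same patch body as patch.json above
patch = {
    "status": {
        "loadBalancer": {
            "ingress": [
                {"ip": "8.3.2.1"}
            ]
        }
    }
}

# Patches the status subresource of the Service
v1.patch_namespaced_service_status(
    name="kubernetes",
    namespace="default",
    body=patch,
)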
The index has the capability of taking custom scripting in Python, but I can't find an example of custom scripting written in Python anywhere. Does anybody have an example of a working script? One with something as simple as an if-statement would be amazing.
A simple custom scoring query using Python (assuming you have the Python language plugin installed):
{
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"function_score": {
"query": {
"match_all": {}
},
"script_score": {
"lang": "python",
"script": [
"if _score:",
" _score"
]
},
"boost_mode": "replace"
}
},
"track_scores": true
}
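For completeness, a minimal sketch of sending the same query from Python with the elasticsearch-py client (the host and index name are placeholders):
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])  # placeholder host

# The same function_score/script_score query shown above
query = {
    "sort": [{"_score": {"order": "desc"}}],
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "script_score": {
                "lang": "python",
                "script": [
                    "if _score:",
                    "    _score",
                ],
            },
            "boost_mode": "replace",
        }
    },
    "track_scores": True,
}

response = es.search(index='my-index', body=query)  # placeholder index name
print(response['hits']['hits'])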
Quoted from the elasticsearch mailing list:
Luca pointed out that ES calls Python with an 'eval':
PyObject ret = interp.eval((PyCode) compiledScript);
Just make sure your code passes through the eval.