I have a project to implement where I need to communicate with a Step Function that another department created. I'm new to Step Functions so please bear with me if I'm missing anything.
We have a UI where a user can request their data or have their data deleted. This request is sent to an API Gateway and then to a Step Function, which creates multiple workers/subscriptions. My task is to create an Azure Function (using Python) that processes the tasks, connects to the relevant places where we hold data, and either deletes the data or returns it to an S3 bucket. I thus have the script below:
import datetime
import logging
import boto3
import os
import json
# creds is assumed to be loaded elsewhere (e.g. from the Function App's settings)
workerName = creds['workerName']
region_name = creds['region_name']
activityArn = creds['activityArn']
aws_access_key_id = creds['aws_access_key_id']
aws_secret_access_key = creds['aws_secret_access_key']
bucket = creds['bucket']

sfn_client = boto3.client(
    service_name='stepfunctions',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name
)

# Long-poll for a single activity task
activity = sfn_client.get_activity_task(
    activityArn=activityArn,
    workerName=workerName
)

task_token, task = activity['taskToken'], json.loads(activity['input'])
# TODO Process Task
I notice that every time I call get_activity_task I get a single new task rather than a list, and I've read in the documentation that I need to use the send_task_failure(), send_task_heartbeat(), and send_task_success() methods, which is fine. Since it returns one activity per call, I was planning to loop until there are no more activities, but when I get to the end (or when there are no activities to run) the script just hangs until it times out.
Is there a way to get a count of unstarted activities only so I can use that to loop through or is there a better approach to this?
OK, so after reading through the documentation I found that I had to set a read_timeout greater than the default. I think the default is 60s, so I added a 65s timeout as per below:
import datetime
import logging
import boto3
from botocore.client import Config
import os
import json
connect_timeout = creds['connect_timeout'] + 5
read_timeout = creds['read_timeout'] + 5
workerName = creds['workerName']
region_name = creds['region_name']
activityArn = creds['activityArn']
aws_access_key_id = creds['aws_access_key_id']
aws_secret_access_key = creds['aws_secret_access_key']
bucket = creds['bucket']
cfg = creds['cfg']
config = Config(
    connect_timeout=connect_timeout,
    read_timeout=read_timeout,
    retries={'max_attempts': 0}
)

sfn_client = boto3.client(
    service_name='stepfunctions',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name,
    config=config
)

while True:
    activity_task = sfn_client.get_activity_task(
        activityArn=activityArn,
        workerName=workerName
    )
    if 'input' not in activity_task or 'taskToken' not in activity_task:
        print("No more activity tasks")
        break
    taskToken, task = activity_task['taskToken'], json.loads(activity_task['input'])
On the final pass the call returns a response with the same overall structure as before, but without the input and taskToken keys.
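Since the question mentions send_task_failure(), send_task_heartbeat(), and send_task_success() but the snippet stops at the TODO, here is a rough, hedged sketch of how the processing and report-back calls could slot into that loop; process_task() and the output payload are placeholders, not part of the original code:

# Hypothetical sketch: process each task and report the result back to Step Functions.
while True:
    activity_task = sfn_client.get_activity_task(
        activityArn=activityArn,
        workerName=workerName
    )
    if 'input' not in activity_task or 'taskToken' not in activity_task:
        print("No more activity tasks")
        break

    task_token = activity_task['taskToken']
    task = json.loads(activity_task['input'])
    try:
        # Optionally call sfn_client.send_task_heartbeat(taskToken=task_token)
        # periodically while a long-running delete/export is in progress.
        result = process_task(task)  # placeholder for the delete/export logic
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps(result)
        )
    except Exception as exc:
        sfn_client.send_task_failure(
            taskToken=task_token,
            error='TaskProcessingError',
            cause=str(exc)
        )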
I am writing a new cloud function and am using the new Google Cloud Logging library as announced at https://cloud.google.com/blog/products/devops-sre/google-cloud-logging-python-client-library-v3-0-0-release.
I am also using functions-framework to debug my code locally before pushing it to GCP. Setup and Invoke Cloud Functions using Python has been particularly useful here.
The problem I have is that when using these two things together I cannot see logging output in my IDE, I can only see print statements. Here's a sample of my code:
from flask import Request
from google.cloud import bigquery
from datetime import datetime
import google.cloud.logging
import logging
log_client = google.cloud.logging.Client()
log_client.setup_logging()
def main(request) -> str:
    #
    # do stuff to setup a bigquery job
    #
    bq_client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(labels={"key": "value"})
    nowstr = datetime.now().strftime("%Y%m%d%H%M%S%f")
    job_id = f"qwerty-{nowstr}"
    query_job = bq_client.query(
        query=export_script, job_config=job_config, job_id=job_id
    )
    print("Started job: {}".format(query_job.job_id))
    query_job.result()  # Waits for job to complete.
    logging.info(f"job_id={query_job.job_id}")
    logging.info(f"total_bytes_billed={query_job.total_bytes_billed}")
    return f"{query_job.job_id} {query_job.state} {query_job.error_result}"
However, when I run the function locally using functions-framework, the only output I see in my terminal is
Started job: qwerty-20220306181905424093
As you can see, the call to print(...) has written to my terminal but the call to logging.info(...) has not. Is there a way to redirect logging output to my terminal when running locally using functions-framework but not affect logging when the function is running as an actual cloud function in GCP?
Thanks to the advice from @cryptofool I figured out that I needed to change the default logging level to get output to appear in the terminal.
from flask import Request
from google.cloud import bigquery
from datetime import datetime
import google.cloud.logging
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def main(request) -> str:
    #
    # do stuff to setup a bigquery job
    #
    bq_client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(labels={"key": "value"})
    nowstr = datetime.now().strftime("%Y%m%d%H%M%S%f")
    job_id = f"qwerty-{nowstr}"
    query_job = bq_client.query(
        query=export_script, job_config=job_config, job_id=job_id
    )
    print("Started job: {}".format(query_job.job_id))
    query_job.result()  # Waits for job to complete.
    logging.info(f"job_id={query_job.job_id}")
    logging.info(f"total_bytes_billed={query_job.total_bytes_billed}")
    return f"{query_job.job_id} {query_job.state} {query_job.error_result}"
Started job: qwerty-20220306211233889260
INFO:root:job_id=qwerty-20220306211233889260
INFO:root:total_bytes_billed=31457280
However, I still can't see any output in the terminal when using google.cloud.logging:
from flask import Request
from google.cloud import bigquery
from datetime import datetime
import google.cloud.logging
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
log_client = google.cloud.logging.Client()
log_client.setup_logging()
def main(request) -> str:
    #
    # do stuff to setup a bigquery job
    #
    bq_client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(labels={"key": "value"})
    nowstr = datetime.now().strftime("%Y%m%d%H%M%S%f")
    job_id = f"qwerty-{nowstr}"
    query_job = bq_client.query(
        query=export_script, job_config=job_config, job_id=job_id
    )
    print("Started job: {}".format(query_job.job_id))
    query_job.result()  # Waits for job to complete.
    logging.info(f"job_id={query_job.job_id}")
    logging.info(f"total_bytes_billed={query_job.total_bytes_billed}")
    return f"{query_job.job_id} {query_job.state} {query_job.error_result}"
Started job: qwerty-20220306211718088936
I think I'll start another thread about this.
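In the meantime, a rough workaround sketch that should keep terminal output when running locally: attach a plain StreamHandler for local runs and only call setup_logging() when actually deployed. The K_SERVICE check below is an assumption; use whatever reliably distinguishes your deployed environment from your local machine.

import logging
import os

import google.cloud.logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Assumption: K_SERVICE is only present in the deployed environment.
if os.getenv("K_SERVICE"):
    # Deployed: send records to Cloud Logging.
    log_client = google.cloud.logging.Client()
    log_client.setup_logging()
else:
    # Local functions-framework run: write records to the terminal instead.
    logger.addHandler(logging.StreamHandler())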
We are going to collect some metrics from Azure Monitor.
I installed the libraries and wrote a script to do that.
The script is as follows.
#! /usr/bin/env python
import datetime
from azure.mgmt.monitor import MonitorManagementClient
from azure.identity import ClientSecretCredential
subscription_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
resource_group_name = 'xxxx-xxxxx'
vm_name = 'xxxxxxxxxx'
resource_id = (
    "subscriptions/{}/"
    "resourceGroups/{}/"
    "providers/Microsoft.Compute/virtualMachines/{}"
).format(subscription_id, resource_group_name, vm_name)

TENANT_ID = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
CLIENT = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
KEY = 'xxxxxxxxx'

# Use the imported ClientSecretCredential (azure.identity) to authenticate
credentials = ClientSecretCredential(
    tenant_id=TENANT_ID,
    client_id=CLIENT,
    client_secret=KEY
)

client = MonitorManagementClient(
    credentials,
    subscription_id
)

today = datetime.datetime.now()
nexttime = today - datetime.timedelta(minutes=1)

metrics_data = client.metrics.list(
    resource_id,
    timespan="{}/{}".format(nexttime, today),
    interval='PT1M',
    metricnames='Percentage CPU',
    aggregation='average'
)

for item in metrics_data.value:
    for timeserie in item.timeseries:
        for data in timeserie.data:
            print("{}".format(data.average))
When I run this script, the result sometimes shows 'None', even though I can see the correct value in Azure Monitor.
When I run a similar script that gets a metric from an Azure Load Balancer, it returns the right value.
This URL (https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-supported#microsoftcomputevirtualmachines) says the aggregation type of Percentage CPU is Average, so the script seems to be correct.
I don't know why this script for getting Percentage CPU doesn't return the same value as Azure Monitor.
Does anyone know the solution?
Azure Monitor records metrics in UTC, so you should use today = datetime.datetime.utcnow() to define your timespan.
Based on your code, you are querying the latest 1 minute of VM CPU Percentage. There is some latency before Azure Monitor records metrics, so you may not get a value for the most recent minute. Instead, try querying a slightly older window, e.g. the 5th-to-last minute of CPU Percentage.
Just try the code snippet below:
today = datetime.datetime.utcnow()
nexttime = today - datetime.timedelta(minutes=5)
query_timespan = "{}/{}".format(nexttime, today - datetime.timedelta(minutes=4))
print(query_timespan)

metrics_data = client.metrics.list(
    resource_id,
    timespan=query_timespan,
    interval='PT1M',
    metricnames='Percentage CPU',
    aggregation='average'
)

for item in metrics_data.value:
    for timeserie in item.timeseries:
        for data in timeserie.data:
            print("{}".format(data.average))
Result on my side:
Portal display (my timezone is UTC+8):
I made an API for my AI model, but I would like to avoid any downtime when I update the model. I'm looking for a way to load the new model in the background and, once it's loaded, swap out the old model for the new one. I tried passing values between subprocesses but it doesn't work well. Do you have any idea how I can do that?
You can place the serialized model in raw storage, like an S3 bucket if you're on AWS. In S3's case, you can use bucket versioning, which might prove helpful. Then set up some sort of trigger. You can definitely get creative here, and I've thought about this a lot. In practice, the best options I've tried are:
1. Set up an endpoint that, when called, opens the new model from wherever you store it. Then set up a webhook on the storage/S3 bucket that sends a quick automated call to that endpoint to auto-load the new item (a rough sketch of such an endpoint follows this list).
2. Same as #1, but you call the endpoint manually instead. In both cases you'll really want some security on that endpoint, or anyone who finds your site can absolutely abuse your stack.
3. Set a timer at startup that calls a given function nightly, running internally within the application itself. The function is invoked and then goes and reloads the model.
There could be other ideas I'm not smart enough (yet!) to use; just trying to start some dialogue.
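As a rough sketch of option #1, assuming FastAPI: the endpoint path, the header token, MODEL_PATH, and load_model() below are all hypothetical, not taken from the question or answer.

# Hypothetical sketch of a protected reload endpoint (option #1 above).
import os
import pickle

from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
model = None  # currently served model

MODEL_PATH = os.getenv("MODEL_PATH", "model.pkl")      # placeholder location
RELOAD_TOKEN = os.getenv("RELOAD_TOKEN", "change-me")  # placeholder shared secret


def load_model(path: str):
    # Placeholder loader; swap in whatever deserialization your model needs.
    with open(path, "rb") as f:
        return pickle.load(f)


@app.post("/reload-model")
def reload_model(x_reload_token: str = Header(default="")):
    # Minimal shared-secret check so the endpoint can't be abused.
    if x_reload_token != RELOAD_TOKEN:
        raise HTTPException(status_code=403, detail="forbidden")
    global model
    new_model = load_model(MODEL_PATH)  # load fully before swapping
    model = new_model                   # rebind the name, so requests never see a half-loaded model
    return {"status": "reloaded"}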
Found a way to do it with async and multiprocessing
import asyncio
import random
from uvicorn import Server, Config
from fastapi import FastAPI
import time
from multiprocessing import Process, Manager
app = FastAPI()
value = {"latest": 1, "b": 2}
@app.get("/")
async def root():
    global value
    return {"message": value}


def background_loading(d):
    time.sleep(2)
    d["test"] = 3


async def update():
    while True:
        global value
        manager = Manager()
        d = manager.dict()
        p1 = Process(target=background_loading, args=(d,))
        p1.daemon = True
        p1.start()
        # Wait for the background load to finish, then swap the value in
        while p1.is_alive():
            await asyncio.sleep(5)
        print(f'Update to value to {d}')
        value = d


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    config = Config(app=app, loop=loop)
    server = Server(config)
    loop.create_task(update())
    loop.run_until_complete(server.serve())
# Imports implied by the calls below; client_id, secret, tenant, subscription_id,
# rg_name, df_name, pipeline_nm and statuschecktime are defined elsewhere.
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.datafactory import DataFactoryManagementClient
from datetime import datetime, timedelta
import time

credentials = ServicePrincipalCredentials(client_id=client_id, secret=secret, tenant=tenant)
adf_client = DataFactoryManagementClient(credentials, subscription_id)
run_response = adf_client.pipelines.create_run(rg_name, df_name, pipeline_nm, {})

# Monitor the pipeline run
pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run_response.run_id)
while pipeline_run.status == 'InProgress' or pipeline_run.status == 'Queued':
    # print("[INFO]:Pipeline run status: {}".format(pipeline_run.status))
    time.sleep(statuschecktime)
    pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run_response.run_id)

print("[INFO]:Pipeline run status: {}".format(pipeline_run.status))
print('')

activity_runs_paged = list(adf_client.activity_runs.list_by_pipeline_run(
    rg_name, df_name, pipeline_run.run_id,
    datetime.now() - timedelta(1), datetime.now() + timedelta(1)))
An activity run is different from a pipeline run; if you want to fetch the pipeline run details, follow the steps below.
1. Register an application with Azure AD and create a service principal.
2. Get the tenant and app ID values for signing in, create a new application secret, and save it.
3. Navigate to the data factory in the portal -> Access control (IAM) -> Add role assignment -> add your application to a role, e.g. Contributor; details follow this.
4. Install the packages:
pip install azure-mgmt-resource
pip install azure-mgmt-datafactory
5. Then use the code below to query pipeline runs in the factory based on input filter conditions.
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import *
from datetime import datetime, timedelta
subscription_id = "<subscription-id>"
rg_name = "<resource-group-name>"
df_name = "<datafactory-name>"
tenant_id = "<tenant-id>"
client_id = "<application-id (i.e client id)>"
client_secret = "<client-secret>"
credentials = ServicePrincipalCredentials(client_id=client_id, secret=client_secret, tenant=tenant_id)
adf_client = DataFactoryManagementClient(credentials, subscription_id)
filter_params = RunFilterParameters(last_updated_after=datetime.now() - timedelta(1), last_updated_before=datetime.now() + timedelta(1))
pipeline_runs = adf_client.pipeline_runs.query_by_factory(resource_group_name=rg_name, factory_name=df_name, filter_parameters = filter_params)
for pipeline_run in pipeline_runs.value:
    print(pipeline_run)
You can also get the specific pipeline run with the Run ID.
specific_pipeline_run = adf_client.pipeline_runs.get(resource_group_name=rg_name,factory_name=df_name,run_id= "xxxxxxxx")
print(specific_pipeline_run)
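And if what you actually need are the activity runs for a given pipeline run (as in the original snippet), a hedged sketch continuing from the code above might look like this; query_by_pipeline_run and its parameters are assumptions based on the azure-mgmt-datafactory SDK, so verify them against the version you have installed, and the run ID is a placeholder:

# Sketch: query activity runs for one pipeline run via the same adf_client.
activity_filter = RunFilterParameters(
    last_updated_after=datetime.now() - timedelta(1),
    last_updated_before=datetime.now() + timedelta(1)
)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    resource_group_name=rg_name,
    factory_name=df_name,
    run_id="xxxxxxxx",  # the pipeline run ID you want to inspect
    filter_parameters=activity_filter
)
for activity_run in activity_runs.value:
    print(activity_run.activity_name, activity_run.status)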
I am working on my first Step Functions Activity Worker (EC2). Predictably, after 5 minutes of long polling with no activity from the Step Functions state machine, the client connection times out with the error:
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://states.us-east-1.amazonaws.com/"
Would it be better to catch the error and retry the long poll (every 5 minutes when no activity is present), or try to terminate the call early and retry before the error? I've thought about using a different type of loop, but I want to maximize the value of long polling and not repeatedly request against the Step Functions API (although if that's the best way I'll do it).
Thank you,
Andrew
import boto3
import time
import json
region = 'us-east-1'
activity_arn = 'arn:aws:states:us-east-1:754112345676:activity:Process_Imagery'
while True:
    client = boto3.client('stepfunctions', region_name=region)
    response = client.get_activity_task(activityArn=activity_arn,
                                        workerName='imagery_processor')
    activity_token = response['taskToken']
    input_params = json.loads(response['input'])
    print("================")
    print(input_params)
    client.send_task_success(taskToken=activity_token, output='true')
I believe I answered my own question here. The AWS documentation states:
"The maximum time the service holds on to the request before responding is 60 seconds. If no task is available within 60 seconds, the poll returns a taskToken with a null string."
However, instead of a null string being returned, I believe the JSON response from Step Functions has no 'taskToken' key at all. This while loop works:
import boto3
import time
import json
from botocore.config import Config as BotoCoreConfig
region = 'us-east-1'
boto_config = BotoCoreConfig(read_timeout=70, region_name=region)
sf_client = boto3.client('stepfunctions', config=boto_config)
activity_arn = 'arn:aws:states:us-east-1:754185699999:activity:Process_Imagery'
while True:
    response = sf_client.get_activity_task(activityArn=activity_arn,
                                           workerName='imagery_processor')
    if 'taskToken' not in response:
        print('No Task Token')
        # time.sleep(2)
    else:
        print(response['taskToken'])
        print("===================")
        activity_token = response['taskToken']
        sf_client.send_task_success(taskToken=activity_token, output='true')
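For the other half of the original question (catching the timeout instead of extending it), a hedged alternative sketch is to keep the default client config and simply treat botocore's ReadTimeoutError as "no work yet" and poll again; the structure below is an assumption, not taken from the answer above:

# Alternative sketch: keep the default read timeout and retry on timeout.
import json
import boto3
from botocore.exceptions import ReadTimeoutError

region = 'us-east-1'
activity_arn = 'arn:aws:states:us-east-1:754185699999:activity:Process_Imagery'
sf_client = boto3.client('stepfunctions', region_name=region)

while True:
    try:
        response = sf_client.get_activity_task(activityArn=activity_arn,
                                               workerName='imagery_processor')
    except ReadTimeoutError:
        # The long poll expired with no task available; just poll again.
        continue
    # Handle the empty-response case as well, since the service may answer
    # with no taskToken just before the client would have timed out.
    if response.get('taskToken'):
        sf_client.send_task_success(taskToken=response['taskToken'], output='true')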