How to pass multiple parameters to Azure Durable Activity Function - python

My orchestrator receives a payload that contains instructions which need to be passed, along with other sets of data, to activity functions.
How do I pass multiple parameters to an activity function? Or do I have to mash all my data together?
def orchestrator_function(context: df.DurableOrchestrationContext):
    # User defined configuration
    instructions: str = context.get_input()

    task_batch = yield context.call_activity("get_tasks", None)

    # Need to pass in instructions too
    parallel_tasks = [context.call_activity("perform_task", task) for task in task_batch]

    results = yield context.task_all(parallel_tasks)
    return results
The perform_task activity needs both the items from task_batch and the user-input instructions.
Do I need to do something in my function.json?
Workaround
Not ideal, but I can pass multiple parameters as a single tuple:
something = yield context.call_activity("activity", ("param_1", "param_2"))
I then just need to reference the correct index of the parameter in the activity.
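For illustration, a minimal sketch of the receiving side under this workaround; note that the tuple is JSON-serialized on the way in, so it arrives in the activity as a list (the parameter name payload is an assumption, bound via the activity's function.json):
def main(payload: list) -> str:
    # The ("param_1", "param_2") tuple arrives as a JSON-deserialized list.
    param_1 = payload[0]
    param_2 = payload[1]
    return f"{param_1} / {param_2}"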

Seems there's no textbook way to do it. I have opted to give my single parameter a generic name like parameter or payload.
Then, when passing in the value in the orchestrator, I do it like so:
payload = {"value_1": some_var, "value_2": another_var}
something = yield context.call_activity("activity", payload)
Then, within the activity function, I unpack it again.
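A minimal sketch of that unpacking, reusing the keys from the snippet above (the parameter name payload is again an assumption, bound through the activity's function.json):
def main(payload: dict) -> str:
    # Unpack the single dict parameter back into separate values.
    value_1 = payload["value_1"]
    value_2 = payload["value_2"]
    return f"processed {value_1} and {value_2}"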
Edit: some buried documentation seems to show the same approach: https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-error-handling?tabs=python

Just to add to @Ari's great answer, here is code to pass data from the client function (an HTTP request in this case) all the way to the activity function:
Client -> Orchestrator -> Activity
Client
import logging

import azure.durable_functions as df
import azure.functions as func

async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    client = df.DurableOrchestrationClient(starter)
    req_data = req.get_json()
    img_url = req_data['img_url']
    payload = {"img_url": img_url}
    instance_id = await client.start_new(req.route_params["functionName"], None, payload)
    logging.info(f"Started orchestration with ID = '{instance_id}'.")
    return client.create_check_status_response(req, instance_id)
Orchestrator
def orchestrator_function(context: df.DurableOrchestrationContext):
    input_context = context.get_input()
    img_url = input_context.get('img_url')
    some_response = yield context.call_activity('MyActivity', img_url)
    return [some_response]
Activity
def main(imgUrl: str) -> str:
    print(f'.... Image URL = {imgUrl}')
    return imgUrl

You can use @dataclass and @dataclass_json class decorators for your input and output types, like this:
from dataclasses import dataclass

from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class Command:
    param1: str
    param2: int

@dataclass_json
@dataclass
class Result:
    val1: str
    val2: int
and then you can use those in Azure Functions, e.g. in Activity ones:
def main(input: DownloadFileRequest) -> DownloadFileResponse:
    # function code
    result: DownloadFileResponse = DownloadFileResponse("some", 123)
    return result
This provides you with a clean API and descriptive code. Much better approach than using dictionaries, at least for me.
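For instance, one way this might look in an orchestrator is to convert explicitly with the to_dict/from_dict helpers that dataclass_json generates, so the payload crosses the wire as plain JSON (the activity name SendCommand is made up for illustration):
def orchestrator_function(context: df.DurableOrchestrationContext):
    command = Command(param1="value", param2=42)
    # Serialize explicitly so only plain JSON-compatible data is passed.
    raw = yield context.call_activity("SendCommand", command.to_dict())
    result = Result.from_dict(raw)
    return result.val1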

I would also suggest the dataclass-wizard as a viable option, and one which should be a somewhat more lightweight alternative to dataclasses-json; it is lighter in the sense that it does not use external libraries like marshmallow for generating schemas. It also performs a little better, FWIW.
I didn't really understand the schema of the data as outlined in the question, unfortunately, so I decided to roll my own for the purposes of a quick and dirty demo. Check it out:
from dataclasses import dataclass

from dataclass_wizard import JSONWizard

@dataclass
class Batman(JSONWizard):
    suit_color: str
    side: 'Sidekick'

@dataclass
class Sidekick:
    name: str
    mood: str

# usage
bats = Batman.from_dict({'suitColor': 'BLACK',
                         'side': {'Name': 'Robin', 'mood': 'Mildly depressed'}})

print(repr(bats))
# prints:
#   Batman(suit_color='BLACK', side=Sidekick(name='Robin', mood='Mildly depressed'))

print(bats.to_dict())
# prints:
#   {'suitColor': 'BLACK', 'side': {'name': 'Robin', 'mood': 'Mildly depressed'}}


Conditional type hint

I'm using a pattern where all my adapters return a Result object instead of the result itself. Let me explain:
from typing import Any, Generic, Optional, TypeVar

from pydantic import BaseModel

Dto = TypeVar("Dto", bound=BaseModel)

class Result(BaseModel, Generic[Dto]):
    error: Optional[Exception]
    data: Optional[Dto]

    @property
    def is_success(self) -> bool:
        return bool(self.data) and not self.error

    class Config:
        arbitrary_types_allowed = True

def adapter_example(input: Any) -> Result[int]:
    try:
        # some complex stuff here
        return Result[int](data=10)
    except SomethingBad as e:
        return Result[int](error=e)
The point is, checking that error is None does not ensure that data != None. Is there a way to force that at least one of them (conditionally) is mandatory (i.e., not Optional)?
Like:
Result[str](data='a') # VALID
Result[str](error=Exception()) # VALID
Result[str](data='', error=Exception()) # VALID
Result[str]() # INVALID
if result.data:
    # Here any linter is 100% sure that result.error is None
else:
    # Here any linter is 100% sure that result.error != None
PS: I'm only using pydantic.BaseModel here because it's easier in this implementation. Any suggestion on how to type this class conditionally without pydantic is fine with me.
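One possible runtime-only sketch, assuming pydantic v2's model_validator: it rejects the empty case, but a linter still won't narrow error from data; static narrowing would likely need a tagged union of separate success/failure types instead.
from typing import Generic, Optional, TypeVar

from pydantic import BaseModel, model_validator

Dto = TypeVar("Dto", bound=BaseModel)

class Result(BaseModel, Generic[Dto]):
    error: Optional[Exception] = None
    data: Optional[Dto] = None

    model_config = {"arbitrary_types_allowed": True}

    @model_validator(mode="after")
    def check_at_least_one(self):
        # Reject Result() with neither data nor error set.
        if self.data is None and self.error is None:
            raise ValueError("either data or error must be provided")
        return self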

Look up items in list of dataclasses by value

I'm using python to filter data in elasticsearch based on request params provided. I've got a working example, but know it can be improved and am trying to think of a better way. The current code is like this:
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Filter:
    param: str
    custom_es_field: Optional[str] = None
    is_bool_query: bool = False
    is_date_query: bool = False
    is_range_query: bool = False

    def es_field(self) -> str:
        if self.custom_es_field:
            field = self.custom_es_field
        elif "." in self.param:
            field = self.param.replace(".", "__")
        else:
            field = self.param
        return field

filters = [
    Filter(param="publication_year", is_range_query=True),
    Filter(param="publication_date", is_date_query=True),
    Filter(param="venue.issn"),
    ...
]

def filter_records(filter_params, s):
    for filter in filters:
        # range query
        if filter.param in filter_params and filter.is_range_query:
            param = filter_params[filter.param]
            if "<" in param:
                param = param[1:]
            validate_range_param(filter, param)
            kwargs = {filter.es_field(): {"lte": int(param)}}
            s = s.filter("range", **kwargs)
        elif filter.param in filter_params and filter.is_bool_query:
            ....
The thing I think is slow is that I loop through all of the filters in order to find the one that came in as a request variable. I'm tempted to convert this to a dictionary so I can do filters["publication_year"], but I like having the extra methods available via the dataclass. Would love to hear any thoughts.
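One possible sketch of that dictionary idea which keeps the dataclass methods available: key the existing Filter instances by param (names reuse the snippet above; the bool/date branches are elided as in the question):
# Build the lookup once; the values are still Filter instances,
# so es_field() and the flag attributes remain available.
filters_by_param = {f.param: f for f in filters}

def filter_records(filter_params, s):
    for param_name, value in filter_params.items():
        filter = filters_by_param.get(param_name)
        if filter is None:
            continue  # ignore unknown request params
        if filter.is_range_query:
            if "<" in value:
                value = value[1:]
            validate_range_param(filter, value)
            kwargs = {filter.es_field(): {"lte": int(value)}}
            s = s.filter("range", **kwargs)
    return s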

How to call the right function (as a string) based on an argument?

I have a class which is intended to create an IBM Cloud Object Storage object. There are 2 functions I can use for initialization: resource() and client(). In the __init__ method there is an object_type parameter which will be used to decide which function to call.
class ObjectStorage:
    def __init__(self, object_type: str, endpoint: str, api_key: str, instance_crn: str, auth_endpoint: str):
        valid_object_types = ("resource", "client")
        if object_type not in valid_object_types:
            raise ValueError("Object initialization error: Status must be one of %r." % valid_object_types)
        method_type = getattr(ibm_boto3, object_type)()
        self._conn = method_type(
            "s3",
            ibm_api_key_id=api_key,
            ibm_service_instance_id=instance_crn,
            ibm_auth_endpoint=auth_endpoint,
            config=Config(signature_version="oauth"),
            endpoint_url=endpoint,
        )

    @property
    def connect(self):
        return self._conn
If I run this, I receive the following error:
TypeError: client() missing 1 required positional argument: 'service_name'
If I use this in a simple function and call it by using ibm_boto3.client() or ibm_boto3.resource(), it works like a charm.
def get_cos_client_connection():
    COS_ENDPOINT = "xxxxx"
    COS_API_KEY_ID = "yyyyy"
    COS_INSTANCE_CRN = "zzzzz"
    COS_AUTH_ENDPOINT = "----"
    cos = ibm_boto3.client("s3",
        ibm_api_key_id=COS_API_KEY_ID,
        ibm_service_instance_id=COS_INSTANCE_CRN,
        ibm_auth_endpoint=COS_AUTH_ENDPOINT,
        config=Config(signature_version="oauth"),
        endpoint_url=COS_ENDPOINT
    )
    return cos

cos = get_cos_client_connection()
It looks like it calls the client function on this line, but I am not sure why:
method_type = getattr(ibm_boto3, object_type)()
I tried using:
method_type = getattr(ibm_boto3, lambda: object_type)()
but it was a silly move.
The client function looks like this btw:
def client(*args, **kwargs):
    """
    Create a low-level service client by name using the default session.

    See :py:meth:`ibm_boto3.session.Session.client`.
    """
    return _get_default_session().client(*args, **kwargs)
which refers to:
def client(self, service_name, region_name=None, api_version=None,
           use_ssl=True, verify=None, endpoint_url=None,
           aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None,
           ibm_api_key_id=None, ibm_service_instance_id=None, ibm_auth_endpoint=None,
           auth_function=None, token_manager=None,
           config=None):
    return self._session.create_client(
        service_name, region_name=region_name, api_version=api_version,
        use_ssl=use_ssl, verify=verify, endpoint_url=endpoint_url,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        aws_session_token=aws_session_token,
        ibm_api_key_id=ibm_api_key_id, ibm_service_instance_id=ibm_service_instance_id,
        ibm_auth_endpoint=ibm_auth_endpoint, auth_function=auth_function,
        token_manager=token_manager, config=config)
Same goes for resource()
If you look at the stack trace, it will probably point to this line:
method_type = getattr(ibm_boto3, object_type)()
And not the one after, where you actually call it. The reason is simple: the trailing parentheses () mean you're calling the function you just retrieved via getattr.
So simply do this:
method_type = getattr(ibm_boto3, object_type)
Which means that method_type is actually the method from the ibm_boto3 object you're interested in.
You can confirm that by debugging (import pdb; pdb.set_trace(), then inspect it) or by just adding a print statement:
print(method_type)
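A tiny standalone illustration of the difference, using the standard library's math module:
import math

fn = getattr(math, "sqrt")  # retrieves the function itself, without calling it
print(fn)                   # <built-in function sqrt>
print(fn(9))                # 3.0

# getattr(math, "sqrt")()  # would raise TypeError: sqrt() takes exactly one argument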

Setting Initial Values In a Dataclass Used in a Mock Before Every Test

My goal is to write unit tests for a REST API that interfaces with Flash Memory on a device. To do that, I need a way to mock the class that interfaces with Flash Memory.
I attempted to do that by using a Python Dataclass in the Mock, but I've discovered I do not have any way to set initial values prior to each test. As a result, each test case is getting values that are set by the previous test case. I need to fix that.
To test the API, I'm using the following code:
@dataclass
class FlashMemoryMock:
    mac_address: str = 'ff:ff:ff:ff:ff:ff'

@pytest.fixture
def client(mocker):
    mocker.patch('manufacturing_api.bsp.flash_memory.FlashMemory', new=FlashMemoryMock)
    app = connexion.App(__name__, specification_dir='../openapi_server/openapi/')
    app.app.json_encoder = JSONEncoder
    app.add_api('openapi.yaml', pythonic_params=True)
    app.app.config['TESTING'] = True
    with app.app.test_client() as client:
        yield client

def test_get_mac_address(client):
    """Test case for get_mac_address

    Get the MAC Address
    """
    headers = {
        'Accept': 'application/json',
    }
    response = client.open(
        '/mac_address',
        method='GET',
        headers=headers)
    assert response.status_code == 200
    assert response.is_json
    assert response.json.get('status') == 'success'
    assert response.json.get('mac_address') == 'ff:ff:ff:ff:ff:ff'
This test case will pass because the FlashMemoryMock dataclass initializes mac_address to ff:ff:ff:ff:ff:ff. Unfortunately, it would fail if run after a test_put_mac_address test case that changes the mac_address value.
The flash memory controller code looks like this:
flash_memory = FlashMemoryWrapper()

def get_mac_address():  # noqa: E501
    return flash_memory.get_mac_address()
The FlashMemoryWrapper class validates inputs (i.e. is the user trying to set a valid Mac Address) and includes the following code:
class FlashMemoryWrapper:
    def __init__(self):
        # Initialize the Flash controller
        self.flash_memory = FlashMemory()
It's this FlashMemory class that I am trying to replace with a Mock. When I debug the test cases, I have verified FlashMemoryWrapper.flash_memory is referencing FlashMemoryMock. Unfortunately I no longer have any way to set initial values in the FlashMemoryMock Dataclass.
Is there a way to set initial values? Or should I set up the Mock a different way?
I think what you are looking for can be achieved with a bit of meta-programming in tandem with parametrization of fixtures.
def initializer(arg):
    @dataclass
    class FlashMemoryMock:
        mac_address: str = arg
    return FlashMemoryMock

@pytest.fixture
def client(mocker, request):
    mocker.patch('manufacturing_api.bsp.flash_memory.FlashMemory', new=initializer(request.param))
    app = connexion.App(__name__, specification_dir='../openapi_server/openapi/')
    app.app.json_encoder = JSONEncoder
    app.add_api('openapi.yaml', pythonic_params=True)
    app.app.config['TESTING'] = True
    with app.app.test_client() as client:
        yield client

@pytest.mark.parametrize('client', ['ff:ff:ff:ff:ff:ff'], indirect=['client'])
def test_get_mac_address(client):
    """Test case for get_mac_address

    Get the MAC Address
    """
    headers = {
        'Accept': 'application/json',
    }
    response = client.open(
        '/mac_address',
        method='GET',
        headers=headers)
    assert response.status_code == 200
    assert response.is_json
    assert response.json.get('status') == 'success'
    assert response.json.get('mac_address') == 'ff:ff:ff:ff:ff:ff'

# some other test with a different value for mac address
@pytest.mark.parametrize('client', ['ab:cc:dd'], indirect=['client'])
def test_put_mac_address(client):
    # some code here

Type annotating for ndb.tasklets

GvR's App Engine ndb library, as well as monocle and (to my understanding) modern JavaScript, use generators to make async code look like blocking code.
Things are decorated with @ndb.tasklet. They yield when they want to give execution back to the run loop, and when they have their result ready they raise StopIteration(value) (or the alias ndb.Return):
@ndb.tasklet
def get_google_async():
    context = ndb.get_context()
    result = yield context.urlfetch("http://www.google.com/")
    if result.status_code == 200:
        raise ndb.Return(result.content)
    raise RuntimeError
To use such a function, you get an ndb.Future object back and call get_result() on it to wait for the result and retrieve it. E.g.:
def get_google():
    future = get_google_async()
    # do something else in real code here
    return future.get_result()
This all works very nicely, but how do I add type annotations? The correct types are:
get_google_async() -> ndb.Future (via yield)
ndb.tasklet(get_google_async) -> ndb.Future
ndb.tasklet(get_google_async).get_result() -> str
So far, I have only come up with casting the result of the async function:
def get_google():
    # type: () -> str
    future = get_google_async()
    # do something else in real code here
    return cast('str', future.get_result())
Unfortunately, this is not only about urlfetch but about hundreds of methods, mainly of ndb.Model.
get_google_async itself is a generator function, so type hints can be () -> Generator[ndb.Future, None, None], I think.
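For concreteness, that hint applied to the tasklet from the question, in Python 2 comment form (whether a type checker sees through the @ndb.tasklet wrapper is a separate question):
from typing import Generator

@ndb.tasklet
def get_google_async():
    # type: () -> Generator[ndb.Future, None, None]
    context = ndb.get_context()
    result = yield context.urlfetch("http://www.google.com/")
    if result.status_code == 200:
        raise ndb.Return(result.content)
    raise RuntimeError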
As for get_google, if you don't want to cast, a runtime type check may work, like:
def get_google():
    # type: () -> Optional[str]
    future = get_google_async()
    # do something else in real code here
    res = future.get_result()
    if isinstance(res, str):
        return res
    # somehow convert res to str, or
    return None
