I'm trying to correctly render data inside a SimpleHttpOperator in Airflow, using configuration that I send via dag_run:
result = SimpleHttpOperator(
task_id="schema_detector",
http_conn_id='schema_detector',
endpoint='api/schema/infer',
method='PUT',
data=json.dumps({
'url': '{{ dag_run.conf["url"] }}',
'fileType': '{{ dag_run.conf["fileType"] }}',
}),
response_check=lambda response: response.ok,
response_filter=lambda response: response.json())
The issue is that the rendered data ends up like this:
{"url": "{{ dag_run.conf[\"url\"] }}", "fileType": "{{ dag_run.conf[\"fileType\"] }}"}
I'm not sure what I'm doing wrong here.
Edit
Full code below
default_args = {
'owner': 'airflow',
'start_date': days_ago(0),
}
def print_result(**kwargs):
ti = kwargs['ti']
pulled_value_1 = ti.xcom_pull(task_ids='schema_detector')
pprint.pprint(pulled_value_1)
with DAG(
dag_id='airflow_http_operator',
default_args=default_args,
catchup=False,
schedule_interval="#once",
tags=['http']
) as dag:
result = SimpleHttpOperator(
task_id="schema_detector",
http_conn_id='schema_detector',
endpoint='api/schema/infer',
method='PUT',
headers={"Content-Type": "application/json"},
data=json.dumps({
'url': '{{ dag_run.conf["url"] }}',
'fileType': '{{ dag_run.conf["fileType"] }}',
}),
response_check=lambda response: response.ok,
response_filter=lambda response: response.json())
pull = PythonOperator(
task_id='print_result',
python_callable=print_result,
)
result >> pull
I struggled a lot with the same error, so I created my own operator (called ExtendedHttpOperator), which is a combination of PythonOperator and SimpleHttpOperator. This worked for me :)
This operator receives a function in which we can collect the data passed to the DAG run (via dag_run.conf) and parse it (if required) before sending it to the API.
from plugins.operators.extended_http_operator import ExtendedHttpOperator
testing_extend = ExtendedHttpOperator(
task_id="process_user_ids",
http_conn_id="user_api",
endpoint="/kafka",
headers={"Content-Type": "application/json"},
data_fn=passing_data,
op_kwargs={"api": "kafka"},
method="POST",
log_response=True,
response_check=lambda response: validate_response(response),
)
def passing_data(**context):
api = context["api"]
dag_run_conf = context["dag_run"].conf
return json.dumps(dag_run_conf[api])
def validate_response(res):
if res.status_code == 200:
return True
else:
return False
Here is how you can add ExtendedHttpOperator to your airflow:
Put extended_http_operator.py file inside your_airflow_project/plugins/operators folder
# extended_http_operator.py file
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.utils.decorators import apply_defaults
from airflow.exceptions import AirflowException
from airflow.hooks.http_hook import HttpHook
from typing import Optional, Dict
"""
Extends SimpleHttpOperator with a callable function to formulate the request data. The data function
can access the context to retrieve data such as the task instance. This allows us to write cleaner
code rather than one long template line to formulate the JSON data.
"""
class ExtendedHttpOperator(SimpleHttpOperator):
@apply_defaults
def __init__(
self,
data_fn,
log_response: bool = False,
op_kwargs: Optional[Dict] = None,
*args,
**kwargs
):
super(ExtendedHttpOperator, self).__init__(*args, **kwargs)
if not callable(data_fn):
raise AirflowException("`data_fn` param must be callable")
self.data_fn = data_fn
self.context = None
self.op_kwargs = op_kwargs or {}
self.log_response = log_response
def execute(self, context):
context.update(self.op_kwargs)
self.context = context
http = HttpHook(self.method, http_conn_id=self.http_conn_id)
data_result = self.execute_callable(context)
self.log.info("Calling HTTP method")
self.log.info("Post Data: {}".format(data_result))
response = http.run(
self.endpoint, data_result, self.headers, self.extra_options
)
if self.log_response:
self.log.info(response.text)
if self.response_check:
if not self.response_check(response):
raise AirflowException("Invalid parameters")
def execute_callable(self, context):
return self.data_fn(**context)
Don't forget to create empty __init__.py files inside the plugins and plugins/operators folders.
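Based on the paths mentioned above, the expected layout would be roughly:
your_airflow_project/
└── plugins/
    ├── __init__.py
    └── operators/
        ├── __init__.py
        └── extended_http_operator.py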
I couldn't find a direct solution.
The only way I could pass the information I send over --conf to the operator was to add a PythonOperator that collects it and then use XCom in my SimpleHttpOperator.
Code
def generate_data(**kwargs):
confs = kwargs['dag_run'].conf
logging.info(confs)
return {'url': confs["url"], 'fileType': confs["fileType"]}
with DAG(
dag_id='airflow_http_operator',
default_args=default_args,
catchup=False,
schedule_interval="#once",
tags=['http']
) as dag:
generate_dict = PythonOperator(
task_id='generate_dict',
python_callable=generate_data,
provide_context=True
)
result = SimpleHttpOperator(
task_id="schema_detector",
http_conn_id='schema_detector',
endpoint='api/schema/infer',
method='PUT',
headers={"Content-Type": "application/json"},
data="{{ task_instance.xcom_pull(task_ids='generate_dict') |tojson}}",
log_response=True,
response_check=lambda response: response.ok,
response_filter=lambda response: response.json())
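Depending on your Airflow version you may be able to skip the helper task entirely, since data is a templated field on SimpleHttpOperator. A minimal sketch (untested; assumes the tojson Jinja filter is available, as in the snippet above):
result = SimpleHttpOperator(
    task_id="schema_detector",
    http_conn_id='schema_detector',
    endpoint='api/schema/infer',
    method='PUT',
    headers={"Content-Type": "application/json"},
    # Build the whole JSON body inside the template, so no Python-side
    # json.dumps runs at DAG parse time.
    data="{{ {'url': dag_run.conf['url'], 'fileType': dag_run.conf['fileType']} | tojson }}",
    response_check=lambda response: response.ok,
    response_filter=lambda response: response.json())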
Related
I am trying to create a DAG in which one of the tasks runs an Athena query using boto3. It worked for one query; however, I am facing issues when I try to run multiple Athena queries.
The problem can be broken down as follows:
If one goes through this blog, it can be seen that Athena uses start_query_execution to trigger a query and get_query_execution to get the status, queryExecutionId, and other data about the query (docs for Athena).
Following the above pattern, I have the following code:
import json
import time
import asyncio
import boto3
import logging
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
def execute_query(client, query, database, output_location):
response = client.start_query_execution(
QueryString=query,
QueryExecutionContext={
'Database': database
},
ResultConfiguration={
'OutputLocation': output_location
}
)
return response['QueryExecutionId']
async def get_ids(client_athena, query, database, output_location):
query_responses = []
for i in range(5):
query_responses.append(execute_query(client_athena, query, database, output_location))
res = await asyncio.gather(*query_responses, return_exceptions=True)
return res
def run_athena_query(query, database, output_location, region_name, **context):
BOTO_SESSION = boto3.Session(
aws_access_key_id = 'YOUR_KEY',
aws_secret_access_key = 'YOUR_ACCESS_KEY')
client_athena = BOTO_SESSION.client('athena', region_name=region_name)
loop = asyncio.get_event_loop()
query_execution_ids = loop.run_until_complete(get_ids(client_athena, query, database, output_location))
loop.close()
repetitions = 900
error_messages = []
s3_uris = []
while repetitions > 0 and len(query_execution_ids) > 0:
repetitions = repetitions - 1
query_response_list = client_athena.batch_get_query_execution(
QueryExecutionIds=query_execution_ids)['QueryExecutions']
for query_response in query_response_list:
if 'QueryExecution' in query_response and \
'Status' in query_response['QueryExecution'] and \
'State' in query_response['QueryExecution']['Status']:
state = query_response['QueryExecution']['Status']['State']
if state in ['FAILED', 'CANCELLED']:
error_reason = query_response['QueryExecution']['Status']['StateChangeReason']
error_message = 'Final state of Athena job is {}, query_execution_id is {}. Error: {}'.format(
state, query_response['QueryExecutionId'], error_reason
)
error_messages.append(error_message)
query_execution_ids.remove(query_response['QueryExecutionId'])
elif state == 'SUCCEEDED':
result_location = query_response['QueryExecution']['ResultConfiguration']['OutputLocation']
s3_uris.append(result_location)
query_execution_ids.remove(query_response['QueryExecutionId'])
time.sleep(2)
logging.exception(error_messages)
return s3_uris
DEFAULT_ARGS = {
'owner': 'ubuntu',
'depends_on_past': True,
'start_date': datetime(2021, 6, 8),
'retries': 0,
'concurrency': 2
}
with DAG('resync_job_dag', default_args=DEFAULT_ARGS, schedule_interval=None) as dag:
ATHENA_QUERY = PythonOperator(
task_id='athena_query',
python_callable=run_athena_query,
provide_context=True,
op_kwargs={
'query': 'SELECT request_timestamp FROM "sampledb"."elb_logs" limit 10;', # query provided in the Athena tutorial
'database':'sampledb',
'output_location':'YOUR_BUCKET',
'region_name':'YOUR_REGION'
}
)
ATHENA_QUERY
On running the above code, I am getting the following error:
[2021-06-16 20:34:52,981] {taskinstance.py:1455} ERROR - An asyncio.Future, a coroutine or an awaitable is required
Traceback (most recent call last):
File "/home/ubuntu/venv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/ubuntu/venv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/ubuntu/venv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
result = task_copy.execute(context=context)
File "/home/ubuntu/venv/lib/python3.6/site-packages/airflow/operators/python.py", line 117, in execute
return_value = self.execute_callable()
File "/home/ubuntu/venv/lib/python3.6/site-packages/airflow/operators/python.py", line 128, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/home/ubuntu/iac-airflow/dags/helper/tasks.py", line 93, in run_athena_query
query_execution_ids = loop.run_until_complete(get_ids(client_athena, query, database, output_location))
File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/home/ubuntu/iac-airflow/dags/helper/tasks.py", line 79, in get_ids
res = await asyncio.gather(*query_responses, return_exceptions=True)
File "/usr/lib/python3.6/asyncio/tasks.py", line 602, in gather
fut = ensure_future(arg, loop=loop)
File "/usr/lib/python3.6/asyncio/tasks.py", line 526, in ensure_future
raise TypeError('An asyncio.Future, a coroutine or an awaitable is '
TypeError: An asyncio.Future, a coroutine or an awaitable is required
I am unable to figure out where I am going wrong. I would appreciate a hint about the issue.
I think what you are doing here isn't really needed.
Your issues are:
Executing multiple queries in parallel.
Being able to recover queryExecutionId per query.
Both issues are solved simply by using AWSAthenaOperator. The operator already handles everything you mentioned for you.
Example:
from airflow.models import DAG
from airflow.utils.dates import days_ago
from airflow.operators.dummy import DummyOperator
from airflow.providers.amazon.aws.operators.athena import AWSAthenaOperator
with DAG(
dag_id="athena",
schedule_interval='@daily',
start_date=days_ago(1),
catchup=False,
) as dag:
start_op = DummyOperator(task_id="start_task")
query_list = ["SELECT 1;", "SELECT 2;", "SELECT 3;"]
for i, sql in enumerate(query_list):
run_query = AWSAthenaOperator(
task_id=f'run_query_{i}',
query=sql,
output_location='s3://my-bucket/my-path/',
database='my_database'
)
start_op >> run_query
Athena tasks will be created dynamically simply by adding more queries to query_list.
Note that the QueryExecutionId is pushed to XCom, so you can access it in a downstream task if needed; see the sketch below.
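For example, a downstream task could collect them roughly like this (a sketch only; it assumes it lives inside the same with DAG block and is set downstream of every run_query_{i} task):
from airflow.operators.python import PythonOperator

def collect_execution_ids(**context):
    ti = context["ti"]
    # Each AWSAthenaOperator pushes its QueryExecutionId as its return value.
    execution_ids = [
        ti.xcom_pull(task_ids=f"run_query_{i}") for i in range(len(query_list))
    ]
    print(execution_ids)

collect_ids = PythonOperator(
    task_id="collect_execution_ids",
    python_callable=collect_execution_ids,
)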
The following worked for me as well; I had just overcomplicated a simple problem with asyncio.
Since I ultimately needed the S3 URI for each query, I wrote the script from scratch. With the current implementation of AWSAthenaOperator, one can get the queryExecutionId and then do the remaining processing (i.e. create another task) to get the S3 URI of the CSV result file. This adds some overhead in terms of the delay between the two tasks (getting the queryExecutionId and retrieving the S3 URI), along with extra resource usage.
Therefore I did the complete operation in a single operator, as follows:
Code:
import json
import time
import asyncio
import boto3
import logging
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
def execute_query(client, query, database, output_location):
response = client.start_query_execution(
QueryString=query,
QueryExecutionContext={
'Database': database
},
ResultConfiguration={
'OutputLocation': output_location
}
)
return response
def run_athena_query(query, database, output_location, region_name, **context):
BOTO_SESSION = boto3.Session(
aws_access_key_id = 'YOUR_KEY',
aws_secret_access_key = 'YOUR_ACCESS_KEY')
client_athena = BOTO_SESSION.client('athena', region_name=region_name)
query_execution_ids = []
if message_list:
for parameter in message_list:
query_response = execute_query(client_athena, query, database, output_location)
query_execution_ids.append(query_response['QueryExecutionId'])
else:
raise Exception(
'Error in upstream value received from kafka consumer. Got message list as - {}, with type {}'
.format(message_list, type(message_list))
)
repetitions = 900
error_messages = []
s3_uris = []
while repetitions > 0 and len(query_execution_ids) > 0:
repetitions = repetitions - 1
query_response_list = client_athena.batch_get_query_execution(
QueryExecutionIds=query_execution_ids)['QueryExecutions']
for query_response in query_response_list:
if 'QueryExecution' in query_response and \
'Status' in query_response['QueryExecution'] and \
'State' in query_response['QueryExecution']['Status']:
state = query_response['QueryExecution']['Status']['State']
if state in ['FAILED', 'CANCELLED']:
error_reason = query_response['QueryExecution']['Status']['StateChangeReason']
error_message = 'Final state of Athena job is {}, query_execution_id is {}. Error: {}'.format(
state, query_response['QueryExecutionId'], error_reason
)
error_messages.append(error_message)
query_execution_ids.remove(query_response['QueryExecutionId'])
elif state == 'SUCCEEDED':
result_location = query_response['QueryExecution']['ResultConfiguration']['OutputLocation']
s3_uris.append(result_location)
query_execution_ids.remove(query_response['QueryExecutionId'])
time.sleep(2)
logging.exception(error_messages)
return s3_uris
DEFAULT_ARGS = {
'owner': 'ubuntu',
'depends_on_past': True,
'start_date': datetime(2021, 6, 8),
'retries': 0,
'concurrency': 2
}
with DAG('resync_job_dag', default_args=DEFAULT_ARGS, schedule_interval=None) as dag:
ATHENA_QUERY = PythonOperator(
task_id='athena_query',
python_callable=run_athena_query,
provide_context=True,
op_kwargs={
'query': 'SELECT request_timestamp FROM "sampledb"."elb_logs" limit 10;', # query provided in the Athena tutorial
'database':'sampledb',
'output_location':'YOUR_BUCKET',
'region_name':'YOUR_REGION'
}
)
ATHENA_QUERY
However, the approach shared by @Elad is cleaner and more apt if one wants to get the queryExecutionIds of all the queries.
I have created a Lambda that receives a JWT in the event headers and subsequently decodes the JWT payload to give me the subject.
The Lambda snippet looks like this:
def handler(event, context):
print("hello world!")
# print(event)
message = dict()
jwt_token = event['headers']['x-jwt-token']
# more info on PyJWT
# https://pyjwt.readthedocs.io/en/latest/index.html
try:
decoded_jwt = jwt.decode(jwt_token,
options={"verify_signature": False})
except jwt.ExpiredSignatureError:
print("Signature expired. Get new one!!!")
message['Body'] = {
'Status': 'Lambda failure',
'Reason': 'JWT Signature expired. Get new one!!!'
}
except jwt.InvalidTokenError:
print("Invalid Token")
message['Body'] = {
'Status': 'Lambda failure',
'Reason': 'JWT Invalid Token'
}
else:
# all are good to go
if event['httpMethod'] == 'GET':
resource_owner_name = "".join(decoded_jwt["subject"])
Now I have created the below fixtures for my unit tests:
sample_events.json
{
"resource": "/path",
"path": "/path'",
"httpMethod": "GET",
"headers": {
"x-jwt-token": "welcome.here.1234"
}
}
and in my test_main.py
def load_json_from_file(json_path):
with open(json_path) as f:
return json.load(f)
@pytest.fixture
def events():
return load_json_from_file('unit_tests/fixtures/sample_events.json')
def test_main(events, context):
# jwt = Mock()
# jwt.decode.return_value =
response = handler(events, context)
Now I wonder how to bind the JWT and mock it in my Python handler? What is the solution? Is there another approach I could follow?
I also tried to patch jwt.decode, still no luck... can anyone shed some light on patching the JWT decode? That might help.
You can use patch to patch a given target:
def test_main(events, context):
with patch("handler.jwt.decode") as decode_mock:
decode_mock.return_value = {"subject": "test-subject"}  # illustrative payload; use whatever your handler expects
response = handler(events, context)
Just make sure you pass the correct path to the patch target.
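Building on that, a hedged sketch for exercising the error branch as well, using the same events/context fixtures as above (it assumes your handler module is importable as handler and that PyJWT is installed):
import jwt
from unittest.mock import patch

from handler import handler

def test_expired_token(events, context):
    with patch("handler.jwt.decode") as decode_mock:
        # Simulate an expired token so the except branch runs.
        decode_mock.side_effect = jwt.ExpiredSignatureError
        handler(events, context)
        decode_mock.assert_called_once_with(
            "welcome.here.1234", options={"verify_signature": False}
        )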
I am writing a Fast API server that accepts requests, checks if users are authorized and then redirects them to another URL if successful.
I need to carry over URL parameters, e.g. http://localhost:80/data/?param1=val1&param2=val2 should redirect to
http://some.other.api/?param1=val1&param2=val2, thus keeping previously allotted parameters.
The parameters are not controlled by me and could change at any moment.
How can I achieve this?
Code:
from fastapi import FastAPI
from starlette.responses import RedirectResponse
app = FastAPI()
#app.get("/data/")
async def api_data():
params = '' # I need this value
url = f'http://some.other.api/{params}'
response = RedirectResponse(url=url)
return response
In the docs they talk about using the Request directly, which then led me to this:
from fastapi import FastAPI, Request
from starlette.responses import RedirectResponse
app = FastAPI()
#app.get("/data/")
async def api_data(request: Request):
params = request.query_params
url = f'http://some.other.api/?{params}'
response = RedirectResponse(url=url)
return response
If the query parameters are known when starting the API but you still wish to have them dynamically set:
from fastapi import FastAPI, Depends
from pydantic import create_model
app = FastAPI()
# Put your query arguments in this dict
query_params = {"name": (str, "me")}
query_model = create_model("Query", **query_params) # This is subclass of pydantic BaseModel
# Create a route
#app.get("/items")
async def get_items(params: query_model = Depends()):
params_as_dict = params.dict()
...
This has the benefit that you see the parameters in the automatic documentation.
But you are still able to define them dynamically (when starting the API).
Note: if your model has dicts, lists or other BaseModels as field types, the request body pops up. GET should not have body content so you might want to avoid those types.
See more about dynamic model creation from Pydantic documentation.
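For illustration, here is the same pattern with a few more hypothetical typed parameters; each value is the (type, default) tuple form that create_model accepts:
from typing import Optional
from fastapi import Depends, FastAPI
from pydantic import create_model

app = FastAPI()

# Hypothetical query arguments: name -> (type, default)
query_params = {
    "name": (str, "me"),
    "limit": (int, 10),
    "verbose": (bool, False),
    "tag": (Optional[str], None),
}
query_model = create_model("Query", **query_params)

@app.get("/items")
async def get_items(params: query_model = Depends()):
    # e.g. GET /items?name=you&limit=5 -> {"name": "you", "limit": 5, ...}
    return params.dict()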
As mentioned in the FastAPI docs (https://fastapi.tiangolo.com/tutorial/query-params-str-validations/):
#app.get("/")
def read_root(param1: Optional[str] = None, param2: Optional[str] = None):
url = f'http://some.other.api/{param1}/{param2}'
return {'url': str(url)}
The output is the constructed URL, returned as JSON.
I use a combination of Depends, BaseModel and the Request object itself.
Here's an example for an HTTP request like localhost:5000/api?requiredParam1=value1&optionalParam1=value2&dynamicParam1=value3&dynamicParam2=value4
# imports
from typing import Union
from pydantic import BaseModel
from fastapi import Depends, Request
# the base model
class QueryParams(BaseModel):
required: str
optional: Union[None, str] = None
dynamic: dict
# dependency
async def query_params(
request: Request, requiredParam1: str, optionalParam1: Union[None, str] = None
):
# process the request here
dynamicParams = {}
for k in request.query_params.keys():
if 'dynamicParam' not in k:
continue
dynamicParams[k] = request.query_params[k]
# also maybe do some other things on the arguments
# ...
return {
'required': requiredParam1,
'optional': optionalParam1,
'dynamic': dynamicParams
}
# the endpoint
#app.get("api/")
async def hello(params: QueryParams = Depends(query_params)):
# Maybe do domething with params here,
# Use it as you would any BaseModel object
# ...
return params
Refer the Starlette documentation on how to use the request object: https://www.starlette.io/requests/
Note that you can put query_params in a different module, and need not add any more code to explicitly pass the Request object. FastAPI already does that when you make a call to the endpoint :)
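A quick way to sanity-check the dependency is FastAPI's TestClient; a sketch, assuming the app and endpoint from the snippet above are importable and the route path is /api:
from fastapi.testclient import TestClient

client = TestClient(app)

resp = client.get(
    "/api",
    params={
        "requiredParam1": "value1",
        "optionalParam1": "value2",
        "dynamicParam1": "value3",
        "dynamicParam2": "value4",
    },
)
# Expected shape:
# {"required": "value1", "optional": "value2",
#  "dynamic": {"dynamicParam1": "value3", "dynamicParam2": "value4"}}
print(resp.json())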
This is code I derived from @Hajar Razip, using a more pydantic-like approach:
from pydantic import (
BaseModel,
)
from typing import (
Dict,
List,
Optional,
)
from fastapi import (
Depends,
FastAPI,
Query,
Request,
)
class QueryParameters(BaseModel):
"""Model for query parameter."""
fixId: Optional[str]
fixStr: Optional[str]
fixList: Optional[List[str]]
fixBool: Optional[bool]
dynFields: Dict
_aliases: Dict[str,str] = {"id": "fixId"}
@classmethod
def parser(
cls,
request: Request,
fixId: Optional[str] = Query(None, alias="id"),
fixStr: Optional[str] = Query(None),
fixList: Optional[List[str]] = Query(None),
fixBool: bool = Query(True),
) -> Dict:
"""Parse query string parameters."""
dynFields = {}
reserved_keys = cls.__fields__
query_keys = request.query_params
for key in query_keys:
key = cls._aliases.get(key, key)
if key in reserved_keys:
continue
dynFields[key] = request.query_params[key]
return {
"fixId": fixId,
"fixStr": fixStr,
"fixList": fixList,
"fixBool": fixBool,
"dynFields": dynFields
}
app = FastAPI()
#app.get("/msg")
def get_msg(
parameters: QueryParameters = Depends(
QueryParameters.parser,
),
) -> None:
return parameters
The generated documentation then shows these parameters.
Here is the result of calling GET /msg:
> curl -s -X 'GET' 'http://127.0.0.1:8000/msg?id=Victor&fixStr=hi&fixList=eggs&fixList=milk&fixList=oranges&fixBool=true' -H 'accept: application/json' | python3 -m json.tool
{
"fixId": "Victor",
"fixStr": "hi",
"fixList": [
"eggs",
"milk",
"oranges"
],
"fixBool": true,
"dynFields": {}
}
Here is the GET /msg call using dynamic fields:
> curl -s -X 'GET' 'http://127.0.0.1:8000/msg?id=Victor&fixStr=hi&fixList=eggs&fixList=milk&fixList=oranges&fixBool=true&key1=value1&key2=value2' -H 'accept: application/json' | python3 -m json.tool
{
"fixId": "Victor",
"fixStr": "hi",
"fixList": [
"eggs",
"milk",
"oranges"
],
"fixBool": true,
"dynFields": {
"key1": "value1",
"key2": "value2"
}
}
Hi, I am experiencing weird behavior from SimpleHttpOperator.
I have extended this operator like this:
class EPOHttpOperator(SimpleHttpOperator):
"""
Operator for retrieving data from the EPO API; performs a token validity check
and gets a new token if the old one is close to expiring.
"""
@apply_defaults
def __init__(self, entity_code, *args, **kwargs):
super().__init__(*args, **kwargs)
self.entity_code = entity_code
self.endpoint = self.endpoint + self.entity_code
def execute(self, context):
try:
token_data = json.loads(Variable.get(key="access_token_data", deserialize_json=False))
if (datetime.now() - datetime.strptime(token_data["created_at"],
'%Y-%m-%d %H:%M:%S.%f')).seconds >= 19 * 60:
Variable.set(value=json.dumps(get_EPO_access_token(), default=str), key="access_token_data")
self.headers = {
"Authorization": f"Bearer {token_data['token']}",
"Accept": "application/json"
}
super(EPOHttpOperator, self).execute(context)
except HTTPError as http_err:
logging.error(f'HTTP error occurred during getting EPO data: {http_err}')
raise http_err
except Exception as e:
logging.error(e)
raise e
And I have written a simple unit test:
def test_get_EPO_data(requests_mock):
requests_mock.get('http://ops.epo.org/rest-services/published-data/publication/epodoc/EP1522668',
text='{"text": "test"}')
requests_mock.post('https://ops.epo.org/3.2/auth/accesstoken',
text='{"access_token":"test", "status": "we just testing"}')
dag = DAG(dag_id='test_data', start_date=datetime.now())
task = EPOHttpOperator(
xcom_push=True,
do_xcom_push=True,
http_conn_id='http_EPO',
endpoint='published-data/publication/epodoc/',
entity_code='EP1522668',
method='GET',
task_id='get_data_task',
dag=dag,
)
ti = TaskInstance(task=task, execution_date=datetime.now(), )
task.execute(ti.get_template_context())
assert ti.xcom_pull(task_ids='get_data_task') == {"text": "test"}
The test doesn't pass, though: the value from HttpHook is never pushed as an XCom. I have checked that the code responsible for the push logic gets called:
....
if self.response_check:
if not self.response_check(response):
raise AirflowException("Response check returned False.")
if self.xcom_push_flag:
return response.text
What did I do wrong? Is this a bug?
So I actually managed to make it work by storing the result of super(EPOHttpOperator, self).execute(context) in a Variable.
def execute(self, context):
try:
.
.
.
self.headers = {
"Authorization": f"Bearer {token_data['token']}",
"Accept": "application/json"
}
# Before:
# super(EPOHttpOperator, self).execute(context)
# After:
Variable.set(value=super(EPOHttpOperator, self).execute(context), key='foo')
The documentation is kind of misleading on this one; or am I doing something wrong after all?
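For what it's worth, as far as I can tell the return-value XCom push happens in the TaskInstance layer rather than inside execute(), so calling task.execute() directly in a test bypasses it. A leaner sketch (not verified on every Airflow version) is to return the parent's result and let the operator's normal do_xcom_push handle it:
def execute(self, context):
    # ... token handling as above ...
    self.headers = {
        "Authorization": f"Bearer {token_data['token']}",
        "Accept": "application/json",
    }
    # Returning the parent's result lets the operator's own XCom push
    # (do_xcom_push=True) store it as return_value when the task runs normally.
    return super(EPOHttpOperator, self).execute(context)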
I am developing an Amazon Lex chatbot in AWS Lambda in Python, which makes an API POST call and gets a response as a JSON string as below:
'{"_id":"598045d12e1f98980a00001e","unique_id":"ed7e4e17c7db499caee576a7761512","cerebro":{"_id":"59451b239db9fa8b0a000004","acc_id":"533a9f0d2eda783019000002","name":"cerebro","access_id":"g01n0XTwoYfEWSIP","access_token":"3Yxw8ZiUlfSPsbEVLI6Z93vZyKyBFFIV"},"bot":{"_id":"59452f42dbd13ad867000001","name":"helloword"},"rundata":{"arguments":"","target":"local"},"state":"created","queue_id":null,"s_ts":null,"e_ts":null,"response":{},"responses":[],"summary":null,"resolve_incident":false,"err":null}'
But I am interested only in the _id value, so I convert the JSON into a dictionary as below and get the id value:
res = requests.post(botrun_api, json=botrun_payload, headers=headers)
data = json.loads(res.content)
new_id = data.get('_id', None)
return new_id
If I test the code in the Lambda console, I get the expected output:
Output in the AWS Lambda console
But I get the output below in my chatbot:
I was unable to process your message. DependencyFailedException: Invalid Lambda Response: Received invalid response from Lambda: Can not construct instance of IntentResponse: no String-argument constructor/factory method to deserialize from String value ('59832ba22e1f98980a00009b') at [Source: "59832ba22e1f98980a00009b"; line: 1, column: 1]
My source code is below:
import json
import math
import dateutil.parser
import datetime
import time
import os
import logging
import requests
import uuid
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
""" --- Helpers to build responses which match the structure of the necessary dialog actions --- """
def get_slots(intent_request):
return intent_request['currentIntent']['slots']
def elicit_slot(session_attributes, intent_name, slots, slot_to_elicit, message):
return {
'sessionAttributes': session_attributes,
'dialogAction': {
'type': 'ElicitSlot',
'intentName': intent_name,
'slots': slots,
'slotToElicit': slot_to_elicit,
'message': message
}
}
def close(session_attributes, fulfillment_state, message):
response = {
'sessionAttributes': session_attributes,
'dialogAction': {
'type': 'Close',
'fulfillmentState': fulfillment_state,
'message': message
}
}
return response
def delegate(session_attributes, slots):
return {
'sessionAttributes': session_attributes,
'dialogAction': {
'type': 'Delegate',
'slots': slots
}
}
""" --- Helper Functions --- """
def parse_int(n):
try:
return int(n)
except ValueError:
return float('nan')
def build_validation_result(is_valid, violated_slot, message_content):
if message_content is None:
return {
"isValid": is_valid,
"violatedSlot": violated_slot,
}
return {
'isValid': is_valid,
'violatedSlot': violated_slot,
'message': {'contentType': 'PlainText', 'content': message_content}
}
def APIbot(intent_request):
"""
Performs dialog management and fulfillment for cluster configuration input arguments.
Beyond fulfillment, the implementation of this intent demonstrates the use of the elicitSlot dialog action
in slot validation and re-prompting.
"""
value1 = get_slots(intent_request)["myval1"]
value2 = get_slots(intent_request)["myval2"]
intense_type = get_slots(intent_request)["Instance"]
source = intent_request['invocationSource']
api_endpoint = 'url'
api_creds = {
'apiuser': 'user',
'apikey': 'key'
}
#trigger a bot run
botrun_api = api_endpoint + '/botruns'
botrun_payload = {
"botname":"helloword",
"arguments":"",
"target":"local",
"unique_id": uuid.uuid4().hex[:30] #unique run id - 30 chars max
}
headers = {
'Content-Type': 'application/json',
'Authorization': 'Key apiuser=%(apiuser)s apikey=%(apikey)s' % api_creds
}
res = requests.post(botrun_api, json=botrun_payload, headers=headers)
data = json.loads(res.content)
new_id = data.get('_id', None)
return new_id
# Instantiate a cluster setup, and rely on the goodbye message of the bot to define the message to the end user.
# In a real bot, this would likely involve a call to a backend service.
return close(intent_request['sessionAttributes'],
'Fulfilled',
{'contentType': 'PlainText',
'content': 'Thanks, your values are {} and {} '.format(value1, value2)})
""" --- Intents --- """
def dispatch(intent_request):
"""
Called when the user specifies an intent for this bot.
"""
logger.debug('dispatch userId={}, intentName={}'.format(intent_request['userId'], intent_request['currentIntent']['name']))
intent_name = intent_request['currentIntent']['name']
# Dispatch to your bot's intent handlers
if intent_name == 'my_Values':
return APIbot(intent_request)
raise Exception('Intent with name ' + intent_name + ' not supported')
""" --- Main handler --- """
def lambda_handler(event, context):
"""
Route the incoming request based on intent.
The JSON body of the request is provided in the event slot.
"""
# By default, treat the user request as coming from the America/New_York time zone.
os.environ['TZ'] = 'America/New_York'
time.tzset()
logger.debug('event.bot.name={}'.format(event['bot']['name']))
return dispatch(event)
Please help me resolve this. Thanks in advance :)
Lambda is showing your function succeeding when only the new_id is returned because it does not care about the format of the response.
When connected to AWS Lex, the response must be in the AWS defined response format.
In your example above, you can pass new_id through in the close method to output the response via Lex:
return close(intent_request['sessionAttributes'],
'Fulfilled',
{'contentType': 'PlainText',
'content': str(new_id)})
You'll also need to remove the return new_id statement.
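Putting the pieces together, the tail of APIbot would look roughly like this (a sketch based on the snippets above; botrun_api, botrun_payload, headers and close are the ones already defined there):
def APIbot(intent_request):
    # ... slot handling, botrun_api, botrun_payload and headers as above ...
    res = requests.post(botrun_api, json=botrun_payload, headers=headers)
    data = json.loads(res.content)
    new_id = data.get('_id', None)

    # No bare `return new_id` here; Lex needs the structured response instead.
    return close(intent_request['sessionAttributes'],
                 'Fulfilled',
                 {'contentType': 'PlainText',
                  'content': str(new_id)})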