I wrote a custom Airflow operator that takes an input and pushes its output to XCom.
What I want to do is call the operator with some defined input, parse its output in the Python callable of a BranchPythonOperator, and then pass the parsed output to another task that calls the same operator:
CustomOperator_Task1 = CustomOperator(
    data={
        'type': 'custom',
        'date': '2017-11-12'
    },
    task_id='CustomOperator_Task1',
    dag=dag)
data = {}
def checkOutput(**kwargs):
    result = kwargs['ti'].xcom_pull(task_ids='CustomOperator_Task1')
    if result.success:
        data = result.data
        return "CustomOperator_Task2"
    return "Failure"
BranchOperator_Task = BranchPythonOperator(
    task_id='BranchOperator_Task',
    dag=dag,
    python_callable=checkOutput,
    provide_context=True,
    trigger_rule="all_done")
CustomOperator_Task2 = CustomOperator(
    data=data,
    task_id='CustomOperator_Task2',
    dag=dag)
CustomOperator_Task1 >> BranchOperator_Task >> CustomOperator_Task2
In CustomOperator_Task2 I want to pass in the parsed data from BranchOperator_Task, but right now it is always the empty dict {}.
What is the best way to do that?
I see your issue now. Setting the data variable the way you do won't work because of how Airflow works: an entirely different process runs the next task, so it has no knowledge of what data was set to.
Instead, BranchOperator_Task has to push the parsed output into another XCom so CustomOperator_Task2 can explicitly fetch it.
def checkOutput(**kwargs):
    ti = kwargs['ti']
    result = ti.xcom_pull(task_ids='CustomOperator_Task1')
    if result.success:
        # Push the parsed output so the downstream task can pull it
        ti.xcom_push(key='data', value=result.data)
        return "CustomOperator_Task2"
    return "Failure"
BranchOperator_Task = BranchPythonOperator(
    ...)
CustomOperator_Task2 = CustomOperator(
    data_xcom_task_id=BranchOperator_Task.task_id,
    data_xcom_key='data',
    task_id='CustomOperator_Task2',
    dag=dag)
Then your operator might look something like this.
class CustomOperator(BaseOperator):
    @apply_defaults
    def __init__(self, data_xcom_task_id, data_xcom_key, *args, **kwargs):
        # Initialize BaseOperator so task_id, dag, etc. are registered
        super(CustomOperator, self).__init__(*args, **kwargs)
        self.data_xcom_task_id = data_xcom_task_id
        self.data_xcom_key = data_xcom_key
    def execute(self, context):
        data = context['ti'].xcom_pull(task_ids=self.data_xcom_task_id, key=self.data_xcom_key)
        ...
Parameters may not be required if you just want to hardcode them. It depends on your use case.
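To see the hand-off in isolation, here is a minimal sketch that runs without Airflow. FakeTaskInstance is a hypothetical in-memory stand-in for the real task instance (it only mimics the xcom_push/xcom_pull calls used above), and the result shape (a dict with 'success' and 'data' keys) is an assumption made for illustration, not the shape of your operator's actual output.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's task instance; keeps XComs in a dict."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self._store = store  # shared between "tasks", playing the role of the XCom table

    def xcom_push(self, key, value):
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        return self._store.get((task_ids, key))


def check_output(ti):
    # Assumed result shape: a dict with 'success' and 'data' keys.
    result = ti.xcom_pull(task_ids="CustomOperator_Task1")
    if result and result["success"]:
        ti.xcom_push(key="data", value=result["data"])
        return "CustomOperator_Task2"
    return "Failure"


# Pretend CustomOperator_Task1 already returned a value.
store = {
    ("CustomOperator_Task1", "return_value"): {"success": True, "data": {"type": "custom"}},
}

branch = check_output(FakeTaskInstance("BranchOperator_Task", store))

# The downstream task reads what the branch task pushed, by task id and key.
downstream = FakeTaskInstance("CustomOperator_Task2", store)
data = downstream.xcom_pull(task_ids="BranchOperator_Task", key="data")
print(branch, data)  # CustomOperator_Task2 {'type': 'custom'}
```

The point of the sketch is that nothing is shared through Python variables: the only channel between the two tasks is the store, addressed by task id and key.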
As your comment suggests, the return value of your custom operator is None, so you should expect xcom_pull to come back empty.
Push the value explicitly with xcom_push, as the default behavior of Airflow could change over time.
I've been looking around and can't find a solution for my issue. I have a DAG that mainly checks that the backups are correct: one task connects to a MySQL DB and a second one connects to Postgres. Once I have those counts, I want to send them to another task that checks whether or not they match:
def mysql_count_validator(**kwargs):
    db_hook = MySqlHook(mysql_conn_id='MySQL_DB')
    # Query to grab desired results:
    df_mysql = db_hook.get_pandas_df('''
        SELECT COUNT(*)
        FROM `schema`.`table`;
    ''')
    # Save query results in a variable:
    return df_mysql
def postgres_count_validator(**kwargs):
    db_hook = PostgresHook(postgres_conn_id='Postgres_DB')
    # Query to grab desired results:
    df_postgres = db_hook.get_pandas_df('''
        SELECT COUNT(*)
        FROM `schema`.`table`;
    ''')
    # Save query results in a variable:
    return df_postgres
def validator(**kwargs):
    if df_mysql == df_postgres:
        print('Matched')
    else:
        print('Not Matched!')
mysql_count_validator = PythonOperator(
    task_id = 'mysql_count_validator',
    python_callable = mysql_count_validator
)
postgres_count_validator = PythonOperator(
    task_id = 'postgres_count_validator',
    python_callable = postgres_count_validator
)
validator = PythonOperator(
    task_id = 'validator',
    python_callable = validator,
    op_kwarg = {df_mysql, df_postgres}
)
[mysql_count_validator, postgres_count_validator] >> validator
I tried passing it through XCom, since it's only one row per task and the data is not that big, but still no luck. Is the way I'm saving the query results causing the issue, or am I missing something else?
Thanks in advance!
Ok, so after some trial and error I was able to pass the variable into the 3rd task.
My issue was not calling the pull in the third function:
def validator(**kwargs):
    df_mysql = kwargs['task_instance'].xcom_pull(task_ids='mysql_count_validator')
    df_postgress = kwargs['task_instance'].xcom_pull(task_ids='postgres_count_validator')
    if df_mysql == df_postgress:
        print('Matched')
    else:
        print(f'Not Matched!\nMySQL: {df_mysql}\nPostgres: {df_postgress}')
validator = PythonOperator(
    task_id = 'validator',
    python_callable = validator,
    provide_context = True
)
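A possible refinement, sketched under assumptions: comparing two whole DataFrames with == produces another DataFrame rather than a single boolean, so it may be simpler to have the two upstream callables return the plain count (for example int(df.iloc[0, 0])) and compare integers. The sketch also raises on a mismatch so that the task is marked as failed instead of only printing a message. fake_ti is a hypothetical stand-in for the task instance so the snippet runs without Airflow.

```python
from types import SimpleNamespace


def validator(**kwargs):
    ti = kwargs["task_instance"]
    mysql_count = ti.xcom_pull(task_ids="mysql_count_validator")
    postgres_count = ti.xcom_pull(task_ids="postgres_count_validator")
    if mysql_count != postgres_count:
        # Raising makes the task fail instead of only logging the mismatch.
        raise ValueError(
            f"Not matched! MySQL: {mysql_count}, Postgres: {postgres_count}"
        )
    print("Matched")
    return mysql_count


def fake_ti(counts):
    """Hypothetical stand-in: answers xcom_pull from a dict keyed by task id."""
    return SimpleNamespace(xcom_pull=lambda task_ids: counts[task_ids])


validator(task_instance=fake_ti({
    "mysql_count_validator": 42,
    "postgres_count_validator": 42,
}))
```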
I would like to use a list (converted into a generator) to serve as a mock for my API calls (using unittest.mock). My function is:
def monitor_order(order_id):
    order_info = client.get_order_status(order_id)
    order_status = order_info['status']
    while order_status != 'filled':
        print('order_status: ', order_status)
        time.sleep(5)
        order_info = client.get_order_status(order_id)
        order_status = order_info['status']
    return order_info
My test function is:
@patch('my_package.client.get_order_status')
def test_monitor_order(mocked_get_order_status):
    order_states = [
        dict(status='open'),
        dict(status='open'),
        dict(status='filled'),
    ]
    # Make into a generator
    status_changes = (status for status in order_states)
    mocked_get_order_status.return_value = next(status_changes)
    # Execute function to test
    monitor_order("dummy_order")
However, I can see that the status is always 'open' when executing the test:
order_status: open
order_status: open
order_status: open
I think I understand why it's wrong, but how could I implement it correctly?
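To achieve what you want, assign the list to side_effect instead of return_value. When side_effect is an iterable, each call to the mock returns the next item, so you can rewrite your test as follows: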
@patch('my_package.client.get_order_status')
def test_monitor_order(mocked_get_order_status):
    order_states = [
        dict(status='open'),
        dict(status='open'),
        dict(status='filled'),
    ]
    mocked_get_order_status.side_effect = order_states
    # Execute function to test
    monitor_order("dummy_order")
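The behavior of side_effect with an iterable can be checked on its own with a bare Mock. This is only a small illustration; the name get_order_status is reused from the question and no real client is involved.

```python
from unittest.mock import Mock

get_order_status = Mock(side_effect=[
    {"status": "open"},
    {"status": "open"},
    {"status": "filled"},
])

# Each call consumes the next item of the iterable.
statuses = [get_order_status("dummy_order")["status"] for _ in range(3)]
print(statuses)  # ['open', 'open', 'filled']

# Once the iterable is exhausted, a further call raises StopIteration.
try:
    get_order_status("dummy_order")
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```

In the real test it is probably also worth patching time.sleep (for instance with patch('time.sleep')), so that the loop does not pause for five seconds on every poll.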
I have created an ObjectType using Python Graphene; however, I don't know what the resolver in the Query object should return in order to supply the data.
My code is below:
class RunLog(ObjectType):
    status = String()
    result = String()
    log = List(String)
    def resolve_status(self, resolve, run_id):
        return r.hget("run:%i" % run_id, "status").decode('utf-8')
    def resolve_result(self, resolve, run_id):
        return r.hget("run:%i" % run_id, "result").decode('utf-8')
    def resolve_log(self, resolve, run_id):
        log_data = r.lrange("run:%i:log" % run_id, 0, -1)
        log_data = [entry.decode('utf-8') for entry in log_data]
        return log_data
class Query(ObjectType):
    log_by_run_id = Field(RunLog, run_id=Int(required=True))
    def resolve_log_by_run_id(root, info, run_id):
        return ???
The RunLog object should read from a redis database and return the data at the relevant run_id.
I want to be able to execute the following query to get the data associated with that run:
{
  logByRunId(runId: 1) {
    status
    result
    log
  }
}
What should the return be from 'resolve_log_by_run_id'? The Graphene documentation is not helpful.
Try returning a RunLog object, i.e.
def resolve_log_by_run_id(root, info, run_id):
    # fetch values from Redis here using run_id (status, result, log)
    run_log = RunLog(status=status, result=result, log=log)
    return run_log
Also, since the arguments passed to your method are named, you should consider renaming your method something less restrictive like resolve_log or resolve_run_log. If you need to add another filter to your resolver, you won't need to add another method.
def images_custom_list(args, producer_data):
    tenant, token, url = producer_data
    url = url.replace(".images", ".servers")
    url = url + '/' + 'detail'
    output = do_request(url, token)
    output = output[0].json()["images"]
    custom_images_list = [custom_images for custom_images in output
                          if custom_images["metadata"].get('user_id', None)]
    temp_image_list = []
    for image in custom_images_list:
        image_temp = {"status": image["status"],
                      "links": image["links"][0]["href"],
                      "id": image["id"], "name": image["name"]}
        temp_image_list.append(image_temp)
    print(json.dumps(temp_image_list, indent=2))
def image_list_detail(args, producer_data):
    tenant, token, url = producer_data
    url = url.replace(".images", ".servers")
    uuid = args['uuid']
    url = url + "/" + uuid
    output = do_request(url, token)
    print(output[0])
I am trying to make the code more efficient and cleaner by using Python's function decorators. Since these 2 functions share the same first 2 lines, how could I write a decorator containing these 2 lines and have both functions decorated with it?
Here's a way to solve it:
from functools import wraps
def fix_url(function):
    @wraps(function)
    def wrapper(*args, **kwarg):
        # url must be passed as a keyword argument for this lookup to work
        kwarg['url'] = kwarg['url'].replace(".images", ".servers")
        return function(*args, **kwarg)
    return wrapper
@fix_url
def images_custom_list(args, tenant=None, token=None, url=None):
    url = url + '/' + 'detail'
    output = do_request(url, token)
    output = output[0].json()["images"]
    custom_images_list = [custom_images for custom_images in output
                          if custom_images["metadata"].get('user_id', None)]
    temp_image_list = []
    for image in custom_images_list:
        image_temp = {"status": image["status"],
                      "links": image["links"][0]["href"],
                      "id": image["id"], "name": image["name"]}
        temp_image_list.append(image_temp)
    print(json.dumps(temp_image_list, indent=2))
@fix_url
def image_list_detail(args, tenant=None, token=None, url=None):
    uuid = args['uuid']
    url = url + "/" + uuid
    output = do_request(url, token)
    print(output[0])
Unfortunately, as you may notice, you have to get rid of producer_data and pass its contents as separate arguments: that part of the code cannot be factored out, because you would have to split the tuple again in each of the functions anyway. I chose keyword arguments (with None as the default value), but positional arguments would work as well; your call.
By the way, note that this does not make the code more efficient. It does make it a bit more readable (you can see that the URL is changed the same way for both functions, and a fix to the URL rewriting applies everywhere), but it adds 2 more function calls each time you call the function, so it is in no way more "efficient".
N.B.: This is essentially based on @joel-cornett's example (otherwise I wouldn't have used @wraps, just a plain old nested-function decorator); I only specialized it. (I don't think he deserves a -1.)
Please at least +1 his answer or accept it.
But I think a simpler way to do it would be:
def fix_url(producer_data):
    return (producer_data[0], producer_data[1], producer_data[2].replace(".images", ".servers"))
def images_custom_list(args, producer_data):
    tenant, token, url = fix_url(producer_data)
    # stuff ...
def image_list_detail(args, producer_data):
    tenant, token, url = fix_url(producer_data)
    # stuff ...
which uses a simpler syntax (no decorator) and does only one more function call.
Like this:
from functools import wraps
def my_timesaving_decorator(function):
    @wraps(function)
    def wrapper(*args, **kwargs):
        execute_code_common_to_multiple_function()
        # Now, call the "unique" code.
        # Make sure that if you modified the function args,
        # you pass the modified args here, not the original ones.
        return function(*args, **kwargs)
    return wrapper
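Applied to the URL rewriting from the question, the pattern could look like the sketch below. It is one possible variant, not the only one: the wrapper keeps the original (args, producer_data) signature and passes a modified tuple on. image_detail_url is a hypothetical function that only builds the URL instead of calling do_request, and the host name is made up.

```python
from functools import wraps


def fix_url(function):
    """Rewrite the URL inside producer_data before calling the wrapped function."""
    @wraps(function)
    def wrapper(args, producer_data):
        tenant, token, url = producer_data
        fixed = (tenant, token, url.replace(".images", ".servers"))
        # Pass the modified tuple on, not the original one.
        return function(args, fixed)
    return wrapper


@fix_url
def image_detail_url(args, producer_data):
    # Hypothetical function: builds the detail URL instead of performing the request.
    tenant, token, url = producer_data
    return url + "/" + args["uuid"]


producer_data = ("tenant", "token", "https://api.images.example.com/v2")
print(image_detail_url({"uuid": "abc"}, producer_data))
# https://api.servers.example.com/v2/abc
print(image_detail_url.__name__)  # image_detail_url (preserved by wraps)
```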
I already use this pattern, but now I'm trying to pass multiple parameters to the function. Should I just add another parameter, or is there another piece of syntax I'm missing?
def newChannel(hname, cname):
    pass
action = {'newChannel': (newChannel, hname),
          'newNetwork': (newNetwork, cname),
          'loginError': (loginError, nName)}
handler, param = action.get(eventType)
handler(param)
How can I pass multiple params, though? Like this?
action = { 'newChannel': (newChannel, hname, cname) }
Is that correct?
EDIT:
Would this fly?
action = {'newChannelQueueCreated': (newChannelQueueCreated, (channelName, timestamp)),
          'login': (login, networkName),
          'pushSceneToChannel': (pushSceneToChannel, channelName),
          'channelRemovedFromNetwork': (channelRemovedFromNetwork, (channelName, timestamp))}
print ok
handler, getter = action.get(eventType(ok))
handler(*getter(ok))
Why not use a tuple?
action = {'newChannel': (newChannel, (hname, cname)),
'newNetwork': (newNetwork, (cname,)),
'loginError': (loginError, (nName,))}
handler, params = action.get(eventType)
handler(*params)
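A runnable sketch of the same idea, with made-up handlers and values. The dispatch helper is an addition for illustration: action.get returns None for an unknown event, and unpacking None would fail with a confusing TypeError, so the helper raises a clearer error instead.

```python
def new_channel(hname, cname):
    return f"channel {cname} on {hname}"


def new_network(cname):
    return f"network {cname}"


def dispatch(event_type, actions):
    entry = actions.get(event_type)
    if entry is None:
        raise KeyError(f"no handler for event {event_type!r}")
    handler, params = entry
    # The tuple is unpacked into positional arguments, whatever its length.
    return handler(*params)


actions = {
    "newChannel": (new_channel, ("host1", "chan1")),
    "newNetwork": (new_network, ("chan1",)),  # one-element tuple needs the trailing comma
}

print(dispatch("newChannel", actions))  # channel chan1 on host1
print(dispatch("newNetwork", actions))  # network chan1
```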