Usually things go the other way around - you use DataDog to monitor Airflow - but in my case I need to pull DataDog metrics through the DataDog API from Airflow so I can send them to a Snowflake table.
The idea is to use this table to build an alerting system in ThoughtSpot Cloud for when Kafka lag happens, since ThoughtSpot Cloud doesn't support calling APIs, at least not from the cloud version.
I've been googling these options endlessly but haven't found a simpler or more optimal solution. Any advice is highly appreciated.
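To make it concrete, this is roughly the direction I'm considering: a scheduled Airflow task that queries the DataDog metrics API and inserts the points into Snowflake. It is only a sketch; the datadog and snowflake-connector-python packages, the environment variables, the metric query and the KAFKA_LAG_METRICS table are all placeholders of mine, not a settled design.

    # Rough sketch: pull Kafka lag from the DataDog metrics API and land it in Snowflake.
    # Assumes the datadog and snowflake-connector-python packages, credentials in
    # environment variables, and an existing KAFKA_LAG_METRICS table (all placeholders).
    import os
    import time

    import snowflake.connector
    from datadog import api, initialize


    def datadog_lag_to_snowflake():
        initialize(api_key=os.environ["DD_API_KEY"], app_key=os.environ["DD_APP_KEY"])

        # Query the last hour of consumer lag; the metric/query string is hypothetical.
        now = int(time.time())
        result = api.Metric.query(
            start=now - 3600,
            end=now,
            query="avg:kafka.consumer_lag{*} by {consumer_group}",
        )

        # Each series has a 'scope' and a 'pointlist' of [timestamp_ms, value] pairs.
        rows = [
            (series["scope"], int(point[0] / 1000), point[1])
            for series in result.get("series", [])
            for point in series.get("pointlist", [])
            if point[1] is not None
        ]

        conn = snowflake.connector.connect(
            user=os.environ["SF_USER"],
            password=os.environ["SF_PASSWORD"],
            account=os.environ["SF_ACCOUNT"],
            database="MONITORING",  # placeholder
            schema="PUBLIC",
        )
        try:
            conn.cursor().executemany(
                "INSERT INTO KAFKA_LAG_METRICS (SCOPE, TS, LAG) "
                "VALUES (%s, TO_TIMESTAMP(%s), %s)",
                rows,
            )
        finally:
            conn.close()

The function would then be wired into a PythonOperator on whatever schedule the alerting needs, and ThoughtSpot would only ever read the Snowflake table.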
I know this is not a direct code-related question but more of a best-practices one. I have several Azure HTTP functions running, but they time out due to long calculations. I have added Durable orchestrations, but even they time out.
Since certain processes are long and time-consuming (e.g. training an AI model), I have switched to an Azure VM. What I would like to add is the possibility to start a Python task on my Azure VM from an HTTP request,
basically doing the exact same thing as the Azure HTTP functions. So, running an API on my VM in Python. What would be the best way to do this? Any great documentation or recommendations much appreciated.
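For context, what I'm picturing is roughly the sketch below: a small FastAPI app on the VM that accepts the HTTP request, returns immediately, and runs the long task in the background. FastAPI/uvicorn and all of the names here are just my placeholders, not a settled choice.

    # Minimal sketch of an HTTP-triggered long-running task on the VM.
    # Assumes FastAPI + uvicorn are installed; any Python web framework would do.
    # The request returns immediately, so there is no HTTP timeout as with the Function App.
    import uuid

    from fastapi import BackgroundTasks, FastAPI

    app = FastAPI()
    jobs = {}  # in-memory job status store; a real store would be needed in production


    def train_model(job_id: str, params: dict):
        # placeholder for the long-running calculation / model training
        ...
        jobs[job_id] = "done"


    @app.post("/train")
    def start_training(params: dict, background_tasks: BackgroundTasks):
        job_id = str(uuid.uuid4())
        jobs[job_id] = "running"
        background_tasks.add_task(train_model, job_id, params)
        return {"job_id": job_id}


    @app.get("/status/{job_id}")
    def status(job_id: str):
        return {"job_id": job_id, "status": jobs.get(job_id, "unknown")}

This would be started with something like uvicorn main:app --host 0.0.0.0 --port 8000, with the port opened in the VM's network security group.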
I have a Docker-based Python Function App running, which is connected to an Application Insights resource. I get all the usual metrics, but Live Metrics fails, telling me "Not available: your app is offline or using an older SDK".
I am using the azure-functions/python:4-python3.9-appservice image as a base. If I remember correctly, I was able to view Live Metrics when I simply deployed a Function App via ZIP deploy, but since switching to Docker this option has disappeared. Online I'm not able to find the right information to fix this, or to determine whether it is even possible.
AFAIK, Live Metrics Stream for Python is currently not supported.
The Microsoft documentation says that the currently supported languages are .NET, Java and Node.js.
To achieve something similar, you can refer to the alternative solution given by #AJG: create a LogHandler and write the messages into a Cosmos DB container. It will stream into the console.
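A rough sketch of that workaround could look like the following. It assumes the azure-cosmos SDK and uses placeholder database/container names and a container partitioned on /id; none of that comes from the linked answer.

    # Sketch of a logging handler that writes each log record into a Cosmos DB container.
    # Assumes the azure-cosmos package, placeholder database/container names, and a
    # container partitioned on /id.
    import logging
    import uuid
    from datetime import datetime, timezone

    from azure.cosmos import CosmosClient


    class CosmosDBHandler(logging.Handler):
        def __init__(self, url, key, database="telemetry", container="logs"):
            super().__init__()
            client = CosmosClient(url, credential=key)
            self.container = client.get_database_client(database).get_container_client(container)

        def emit(self, record):
            # Each log record becomes one document in the container.
            self.container.create_item({
                "id": str(uuid.uuid4()),
                "level": record.levelname,
                "message": self.format(record),
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })


    # In the function code:
    # logging.getLogger().addHandler(CosmosDBHandler(COSMOS_URL, COSMOS_KEY))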
I am new at using Apache Airflow.
Some operators of my dag have a failed status. I am trying to understand the origin of the error.
Here are the details of the problem:
My dag is pretty big, and certain parts of it are composed of sub-dags.
What I notice in the Composer UI is that the SubDAGs that failed all did so in a task with task_id download_file, which uses XCom with a GoogleCloudStorageDownloadOperator.
>> GoogleCloudStorageDownloadOperator(
       task_id='download_file',
       bucket="sftp_sef",
       object="{{task_instance.xcom_pull(task_ids='find_file') | first }}",
       filename="/home/airflow/gcs/data/zips/{{{{ds_nodash}}}}_{0}.zip".format(table)
   )
The logs in the failing SubDAG do not show anything useful.
LOG :
[2020-04-07 15:19:25,618] {models.py:1359} INFO - Dependencies all met for
[2020-04-07 15:19:25,660] {models.py:1359} INFO - Dependencies all met for
[2020-04-07 15:19:25,660] {models.py:1577} INFO -
-------------------------------------------------------------------------------
Starting attempt 10 of 1
[2020-04-07 15:19:25,685] {models.py:1599} INFO - Executing on 2020-04-06T11:44:31+00:00
[2020-04-07 15:19:25,685] {base_task_runner.py:118} INFO - Running: ['bash', '-c', 'airflow run datamart_integration.consentement_email download_file 2020-04-06T11:44:31+00:00 --job_id 156313 --pool integration --raw -sd DAGS_FOLDER/datamart/datamart_integration.py --cfg_path /tmp/tmpacazgnve']
I am not sure if there is somewhere I am not checking... Here are my questions:
How do I debug errors in my Composer DAGs in general?
Is it a good idea to create a local Airflow environment to run and debug my DAGs locally?
How do I verify if there are errors in XCom?
Regarding your three questions:
First, when using Cloud Composer you have several ways of debugging errors in your code. According to the documentation, you should:
Check the Airflow logs.
These logs are related to single DAG tasks. You can view them in the logs folder of the environment's Cloud Storage bucket and in the Airflow web interface.
When you create a Cloud Composer environment, a Cloud Storage bucket is also created and associated with it. Cloud Composer stores the logs for single DAG tasks in the logs folder inside this bucket, and each workflow folder has a folder for its DAGs and sub-DAGs. You can check its structure here.
As for the Airflow web interface, it is refreshed every 60 seconds. You can read more about it here.
Review Google Cloud's operations suite.
You can use Cloud Monitoring and Cloud Logging with Cloud Composer. Whereas Cloud Monitoring provides visibility into the performance and overall health of cloud-powered applications, Cloud Logging shows the logs that the scheduler and worker containers produce. Therefore, you can use both or just the one you find more useful based on your need.
In the Cloud Console, check for errors on the pages for the Google Cloud components running your environment.
In the Airflow web interface, check in the DAG's Graph View for failed task instances.
Thus, these are the steps recommended when troubleshooting your DAG.
Second, regarding testing and debugging, it is recommended that you keep separate production and test environments to avoid DAG interference.
Furthermore, it is possible to test your DAG locally; there is a tutorial about this topic in the documentation, here. Testing locally allows you to identify syntax and task errors. However, I must point out that it won't be possible to check/evaluate dependencies and communication with the database.
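As an example, a minimal local check that loads the DAG folder into a DagBag and reports import errors might look like this (only a sketch, assuming a local Airflow install matching your Composer environment's version and a ./dags folder):

    # Load the local DAG folder and surface syntax/import errors before deploying
    # to Composer. Assumes Airflow is installed locally and DAGs live in ./dags.
    from airflow.models import DagBag

    dag_bag = DagBag(dag_folder="dags/", include_examples=False)

    if dag_bag.import_errors:
        for dag_file, error in dag_bag.import_errors.items():
            print(f"{dag_file}: {error}")
    else:
        print(f"Loaded {len(dag_bag.dags)} DAGs with no import errors")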
Third, in general, in order to verify errors in XCom you should check the following (a quick check is sketched right after this list):
If there is any error code/number;
Whether your syntax is correct, compared against a sample code from the documentation;
Whether the packages you use are deprecated.
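As an illustration of the first two checks, a throwaway debug task along these lines (a PythonOperator; all names are placeholders, not your actual DAG) can log what find_file really pushed before download_file renders its templates:

    # Throwaway debug task: log what the upstream 'find_file' task pushed to XCom,
    # so an empty or malformed value is caught before GoogleCloudStorageDownloadOperator
    # renders its templated fields. Names and the dag object are placeholders.
    from airflow.operators.python_operator import PythonOperator


    def check_xcom(**context):
        value = context["ti"].xcom_pull(task_ids="find_file")
        if not value:
            raise ValueError("find_file pushed no XCom value - nothing to download")
        print(f"find_file returned: {value!r}")
        return value


    check_download_input = PythonOperator(
        task_id="check_download_input",
        python_callable=check_xcom,
        provide_context=True,  # needed on Airflow 1.10.x
        dag=dag,  # the existing (sub)DAG object
    )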
I would like to point out that, according to this documentation, the import path for GoogleCloudStorageDownloadOperator was updated and the operator is now GCSToLocalOperator.
In addition, I also encourage you to have a look at this code and documentation to check XCom syntax and errors.
Feel free to share the error code with me if you need further help.
Coming from AWS, I am completely new to Azure in general and to Cloud Services specifically.
I want to write a Python application that leverages GPUs on Azure in a PaaS architecture (Platform as a Service). The application will hopefully be deployed somewhere central, and then a number of GPU-enabled nodes will spin up and run the application until it is done, before closing down again.
I want to know, what is the shortest way to accomplish this in Azure?
Is my assumption correct that I will need to use what is called Cloud Services with a worker role, or will I have to create my own infrastructure based on single VMs running in IaaS?
It sounds like you have created an application which needs to do some general-purpose computing on the GPU via CUDA or OpenCL. If so, you need to install a GPGPU driver on Azure to support your Python application, so the Azure NC and NV series VMs are suitable for this scenario, just like on AWS, as shown in the figure from here.
Hope it helps. If you have any concerns, please feel free to let me know.
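As a quick sanity check after the driver is installed, something like the sketch below (just an illustration, assuming an NVIDIA GPU and the nvidia-smi tool that ships with the driver) confirms the GPU is visible before you run the CUDA/OpenCL workload:

    # Quick check that the NVIDIA driver is installed and the GPU is visible on the
    # NC/NV-series VM before starting the CUDA/OpenCL workload.
    import subprocess

    try:
        output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv"]
        )
        print(output.decode())
    except (OSError, subprocess.CalledProcessError):
        print("No NVIDIA driver/GPU detected - install the GPU driver first.")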