Usually things go the other way: you use DataDog to monitor Airflow. In my case, though, I need to access DataDog metrics through the DataDog API from Airflow so I can send them to a Snowflake table.
The idea is to use this table to build an alerting system in ThoughtSpot Cloud for when Kafka lag happens, since ThoughtSpot Cloud doesn't support calling an API, at least not from the cloud version.
I've been googling these options endlessly but haven't found a simpler or more optimal solution. Any advice is highly appreciated.
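For context, here is a rough sketch of the pipeline I have in mind, to run inside an Airflow PythonOperator: query the Datadog metrics API with requests and insert the points into Snowflake with the Snowflake connector. The metric query, table name, and credentials below are placeholders, not a working setup.

# Rough sketch (placeholder names and credentials): pull a Datadog metric
# series and land it in a Snowflake table; this could run in a PythonOperator.
import time

import requests
import snowflake.connector

DD_API_KEY = "..."                       # placeholder
DD_APP_KEY = "..."                       # placeholder
QUERY = "avg:kafka.consumer_lag{*}"      # hypothetical metric query

now = int(time.time())
resp = requests.get(
    "https://api.datadoghq.com/api/v1/query",
    headers={"DD-API-KEY": DD_API_KEY, "DD-APPLICATION-KEY": DD_APP_KEY},
    params={"from": now - 3600, "to": now, "query": QUERY},
)
resp.raise_for_status()
series = resp.json().get("series", [])

conn = snowflake.connector.connect(user="...", password="...", account="...")
with conn.cursor() as cur:
    for s in series:
        cur.executemany(
            "INSERT INTO kafka_lag (ts, value) VALUES (%s, %s)",  # hypothetical table
            [(int(ts / 1000), val) for ts, val in s["pointlist"]],
        )
conn.close()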
I have several AWS accounts, which I manage daily. In some of them I have EKS clusters. Usually, if I want to see the available namespaces in a cluster, I log into the cluster and then run kubectl get ns from a Windows terminal.
How can I do the same from Python?
(I'm using boto3 to query AWS, and I'm trying to do everything within the boto3 module.)
I've already connected to the cluster, but describe_cluster doesn't give me the info I'm looking for.
You cannot get cluster resources with boto3. With additional libraries, though, you can get Kubernetes cluster resources from Python; the kubernetes and eks_token libraries can help you.
An example usage:
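(This is a rough sketch, not tested against a live cluster: it assumes the boto3, kubernetes, and eks-token packages are installed and that your AWS credentials can call describe_cluster; the cluster name is a placeholder.)

# Sketch: list namespaces in an EKS cluster from Python.
import base64
import tempfile

import boto3
import eks_token
from kubernetes import client

CLUSTER_NAME = "my-cluster"  # placeholder cluster name

# Get the API endpoint and CA certificate from AWS
eks = boto3.client("eks")
cluster = eks.describe_cluster(name=CLUSTER_NAME)["cluster"]

# Write the CA cert to a temp file so the kubernetes client can verify TLS
ca_file = tempfile.NamedTemporaryFile(delete=False, suffix=".pem")
ca_file.write(base64.b64decode(cluster["certificateAuthority"]["data"]))
ca_file.close()

# Get a bearer token, the same way `aws eks get-token` does
token = eks_token.get_token(cluster_name=CLUSTER_NAME)["status"]["token"]

# Point the kubernetes client at the EKS endpoint
conf = client.Configuration()
conf.host = cluster["endpoint"]
conf.ssl_ca_cert = ca_file.name
conf.api_key = {"authorization": "Bearer " + token}

# Equivalent of `kubectl get ns`
v1 = client.CoreV1Api(client.ApiClient(conf))
for ns in v1.list_namespace().items:
    print(ns.metadata.name)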
I know this is not a direct code-related question but more of a best-practices one. I have several Azure HTTP functions running, but they time out due to long calculations. I have added Durable orchestrations, but even those time out.
As certain processes are long and time-consuming (e.g. training an AI model), I have switched to an Azure VM. What I would like to add is the ability to start a Python task from an HTTP request on my Azure VM,
basically doing the exact same thing as the Azure HTTP functions. What would be the best way to do this? In short, I'd be running an API on my VM in Python; any good documentation or recommendations would be much appreciated.
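For reference, one minimal shape this could take (not an endorsed design, just an illustration with placeholder names): a small Flask API on the VM that kicks off the long-running work in a background thread so the HTTP request returns immediately.

# Sketch: Flask API on the VM that starts a long-running task in the background.
import threading
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # job_id -> status; kept in memory for illustration only


def train_model(job_id):
    # Placeholder for the long-running work (e.g. model training).
    jobs[job_id] = "running"
    # ... heavy computation here ...
    jobs[job_id] = "done"


@app.route("/start", methods=["POST"])
def start():
    job_id = str(uuid.uuid4())
    jobs[job_id] = "queued"
    threading.Thread(target=train_model, args=(job_id,), daemon=True).start()
    return jsonify({"job_id": job_id}), 202


@app.route("/status/<job_id>")
def status(job_id):
    return jsonify({"status": jobs.get(job_id, "unknown")})


if __name__ == "__main__":
    # For real use you would run this behind gunicorn/nginx rather than the dev server.
    app.run(host="0.0.0.0", port=8080)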
I have a Python script that pulls some data from an Azure Data Lake cluster, performs some simple compute, then stores it into a SQL Server DB on Azure. The whole shebang runs in about 20 seconds. It needs sqlalchemy, pandas, and some Azure data libraries. I need to run this script daily. We also have a Service Fabric cluster available to use.
What are my best options? I thought of containerizing it with Docker and making it into an HTTP-triggered API, but then how do I trigger it once per day? I'm not good with Azure or microservices design, so this is where I need the help.
You can use WebJobs in App Service. There are two types of Azure WebJobs to choose from: Continuous and Triggered. From what you describe, you need the Triggered type.
You could refer to the documentation here for more details. In addition, here is how to run tasks in WebJobs.
Also, you can use a timer-based Azure Function in Python, which became generally available in recent months.
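As a rough illustration of the timer-trigger route (not from the original answer; the folder layout and schedule are assumptions), a Python Azure Function that fires once a day could look like this:

# function.json (alongside __init__.py in the function folder):
# {
#   "bindings": [
#     {
#       "name": "mytimer",
#       "type": "timerTrigger",
#       "direction": "in",
#       "schedule": "0 0 6 * * *"
#     }
#   ]
# }

# __init__.py
import logging

import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    # Place the existing pull-compute-store logic here, e.g. read from the
    # data lake with the Azure SDK, transform with pandas, write with sqlalchemy.
    logging.info("Daily job triggered; past due: %s", mytimer.past_due)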
I am running a Python pod on Kubernetes, and only one main pod keeps restarting.
Its memory usage increases continuously, so Kubernetes restarts it, since liveness and readiness probes are implemented there.
I'm using Flask with Python 3.5 and Socket.IO.
Is there any way I can profile the Kubernetes pod without making code changes, for example by installing an agent or by any other means? Please let me know.
The pod is getting terminated with exit code 137.
Thanks in advance
You are using GKE, right?
You should use Stackdriver Monitoring to profile, capture, and analyze metrics, and Stackdriver Logging to understand what's happening.
Stackdriver Kubernetes Engine Monitoring is the default option starting with GKE version 1.14. It's really intuitive, but some knowledge and understanding of the platform is required. You should be able to create a graph based on memory utilization.
Have a look at the documentation:
Stackdriver support for GKE
Stackdriver monitoring
If you want an open source solution, you can do this with Robusta. Disclaimer: I wrote this.
Essentially it injects tracemalloc into your pod on demand and sends you the results in Slack. No restart needed.
I'm new to Airflow and Python. I'm trying to connect Airflow with Google Sheets, and although I have no problem connecting from plain Python, I do not know how I could do it from Airflow.
I have searched for information everywhere, but I only find Python information for gspread or for BigQuery, not for Google Sheets from Airflow.
I would appreciate any advice or link.
As far as I know, there is no Google Sheets hook or operator in Airflow at the moment. If security is not a concern, you could publish the sheet to the web and pull it into Airflow using the SimpleHttpOperator.
If security is a concern, I recommend going the PythonOperator route and using the df2gspread library. Airflow version >= 1.9 can help with obtaining credentials for df2gspread.
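A minimal sketch of the PythonOperator route, assuming df2gspread is already configured with valid Google credentials; the spreadsheet key, worksheet name, and DAG settings below are placeholders rather than part of the original answer.

# Sketch: read a Google Sheet into a DataFrame inside a PythonOperator (Airflow 1.x imports).
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from df2gspread import gspread2df as g2d

SPREADSHEET_KEY = "your-spreadsheet-key"  # placeholder
WORKSHEET = "Sheet1"                      # placeholder


def pull_sheet(**context):
    # Downloads the worksheet into a pandas DataFrame; df2gspread uses its
    # default credential location unless you pass `credentials=` explicitly.
    df = g2d.download(SPREADSHEET_KEY, WORKSHEET, col_names=True)
    print(df.head())


dag = DAG(
    dag_id="gsheet_example",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

pull_task = PythonOperator(
    task_id="pull_gsheet",
    python_callable=pull_sheet,
    provide_context=True,
    dag=dag,
)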