I'm new to Airflow and Python. I'm trying to connect Airflow with Google Sheets; although I have no problem connecting from plain Python, I don't know how to do it from Airflow.
I have searched for information everywhere, but I only find plain-Python examples with gspread or with BigQuery, nothing about reading Google Sheets from Airflow.
I would appreciate any advice or link.
As far as I know there is no Google Sheets hook or operator in Airflow at the moment. If security is not a concern, you could publish the sheet to the web and pull it into Airflow using the SimpleHttpOperator.
If security is a concern, I recommend going the PythonOperator route and using the df2gspread library. Airflow >= 1.9 can help with obtaining credentials for df2gspread.
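A rough sketch of the PythonOperator route, here using gspread with a service-account key (the same pattern applies to df2gspread). The DAG id, sheet name and key path are placeholders, and gspread/oauth2client must be installed on the Airflow workers:

from datetime import datetime

import gspread
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from oauth2client.service_account import ServiceAccountCredentials

def read_sheet():
    # Authorize with a service-account key instead of email/password
    scope = ["https://spreadsheets.google.com/feeds",
             "https://www.googleapis.com/auth/drive"]
    creds = ServiceAccountCredentials.from_json_keyfile_name(
        "/path/to/service_account.json", scope)   # placeholder path
    gc = gspread.authorize(creds)
    rows = gc.open("my-sheet").sheet1.get_all_values()  # placeholder sheet name
    # do something with rows here (load to a DB, push to XCom, etc.)
    return len(rows)

with DAG("read_gsheet", start_date=datetime(2019, 1, 1),
         schedule_interval="@daily") as dag:
    read_task = PythonOperator(task_id="read_sheet",
                               python_callable=read_sheet)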
Usually things go the other way: you use DataDog to monitor Airflow. In my case, though, I need to pull DataDog metrics from the DataDog API in Airflow so I can send them to a Snowflake table.
The idea is to use this table to build an alerting system in ThoughtSpot Cloud for when Kafka lag happens, since ThoughtSpot Cloud doesn't support calling APIs, at least not in the cloud version.
I've been googling these options endlessly but haven't found anything simpler or more optimal. Any advice is highly appreciated.
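One fairly simple option is a single PythonOperator that queries the DataDog API with the datadog package and writes the points to Snowflake through Airflow's SnowflakeHook. This is only a sketch under assumptions: the metric query, connection id, table name and import paths (Airflow 2 provider packages here) are placeholders you'd adjust:

import time
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook
from datadog import api, initialize

def datadog_to_snowflake():
    # API/app keys could come from an Airflow Variable or Connection instead
    initialize(api_key="...", app_key="...")
    now = int(time.time())
    result = api.Metric.query(start=now - 3600, end=now,
                              query="avg:kafka.consumer_lag{*}")  # hypothetical metric query

    hook = SnowflakeHook(snowflake_conn_id="snowflake_default")   # placeholder conn id
    for series in result.get("series", []):
        for ts, value in series.get("pointlist", []):
            hook.run("INSERT INTO kafka_lag (ts, value) VALUES (%s, %s)",
                     parameters=(ts, value))                       # placeholder table

with DAG("datadog_to_snowflake", start_date=datetime(2021, 1, 1),
         schedule_interval="@hourly", catchup=False) as dag:
    PythonOperator(task_id="pull_metrics", python_callable=datadog_to_snowflake)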
I have several accounts in AWS, which I check daily. In some of them I have EKS clusters. Usually, if I want to see the available namespaces in a cluster, I log into the cluster and then run kubectl get ns from the Windows terminal.
How can I run this command with Python?
(I'm using boto3 to write my queries against AWS, and I'm trying to do everything within the boto3 module.)
I've already connected to the cluster, but describe_cluster doesn't give me the info I'm looking for.
You cannot get cluster resources with boto3. With additional libraries you can get Kubernetes cluster resources from Python; the kubernetes and eks-token libraries can help you.
An example usage is sketched below.
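This is only a rough sketch of that approach: boto3's describe_cluster gives you the API endpoint, eks-token gives you a bearer token, and the kubernetes client lists the namespaces. The cluster name and region are placeholders, and for real use you'd want to verify the cluster CA instead of disabling SSL verification:

import boto3
from eks_token import get_token
from kubernetes import client

cluster_name = "my-cluster"          # placeholder
region = "eu-west-1"                 # placeholder

# Get the cluster's API server endpoint from AWS
eks = boto3.client("eks", region_name=region)
endpoint = eks.describe_cluster(name=cluster_name)["cluster"]["endpoint"]

# Get a bearer token for the cluster
token = get_token(cluster_name=cluster_name)["status"]["token"]

conf = client.Configuration()
conf.host = endpoint
conf.verify_ssl = False              # better: write the cluster CA to a file and set ssl_ca_cert
conf.api_key = {"authorization": "Bearer " + token}

# Equivalent of `kubectl get ns`
v1 = client.CoreV1Api(client.ApiClient(conf))
for ns in v1.list_namespace().items:
    print(ns.metadata.name)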
I just learnt about GCP Composer and am trying to move the DAGs from my local Airflow instance to the cloud, and I have a couple of questions about the transition.
In the local instance I used the HiveOperator to read data from Hive, create tables, and write them back into Hive. If I had to do this in GCP, how would that work? Would I have to upload my data to a Google Cloud Storage bucket, and does the HiveOperator work in GCP?
I have a DAG that uses a sensor to check if another DAG is complete. Is that possible on Composer?
Yes, Cloud Composer is just managed Apache Airflow so you can do that.
Make sure that you use the same version of Airflow that you used locally. Cloud Composer supports Airflow 1.9.0 and 1.10.0 currently.
Composer has a connection store: see the Admin --> Connections menu and check which connection types are available.
Sensors are available.
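For the cross-DAG check, the ExternalTaskSensor is the usual tool. A minimal sketch, assuming both DAGs run on the same schedule and the ids below are replaced with yours (the import path shown is the Airflow 1.10 one):

from datetime import datetime

from airflow import DAG
from airflow.sensors.external_task_sensor import ExternalTaskSensor

with DAG("downstream_dag", start_date=datetime(2019, 1, 1),
         schedule_interval="@daily") as dag:
    wait_for_upstream = ExternalTaskSensor(
        task_id="wait_for_upstream",
        external_dag_id="upstream_dag",      # placeholder: DAG you are waiting on
        external_task_id="final_task",       # placeholder: last task of that DAG
        timeout=60 * 60)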
I'm working on a project where I need to create and manage clusters, pods, services and deployments on Google Container Engine. I have googled a lot to find an API for that. Google's Container Engine REST API is available, but is there a Python client for that API? That's exactly what I need.
Help me, please!
Thanks in advance!
On this page you can find information about using Python with the Container Engine API, including installation of the client library:
Google Container Engine API: The Google Container Engine API is used for building and managing container based applications, powered by the open source Kubernetes technology.
This page contains information about getting started with the Google Container Engine API using the Google API Client Library for Python. In addition, you may be interested in the following documentation.
More generally, there is this page about Google APIs and Python libraries, and a getting-started example using Python on GCE on GitHub.
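As a rough illustration of the Python client, listing the clusters in a zone with google-api-python-client might look like this; it assumes Application Default Credentials are set up, and the project and zone are placeholders:

from googleapiclient import discovery

project_id = "my-project"       # placeholder
zone = "us-central1-a"          # placeholder

# Build a client for the Container Engine API and list clusters in one zone
service = discovery.build("container", "v1")
resp = service.projects().zones().clusters().list(
    projectId=project_id, zone=zone).execute()

for cluster in resp.get("clusters", []):
    print(cluster["name"], cluster["status"])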
I'm struggling to finish building out a Python script intended to pull a few lines of text from three different columns in a Google spreadsheet.
When I run the script, I get the following error message:
File "pr_email_robot.py", line 2, in <module>
import gspread
ImportError: No module named gspread
Pats-MacBook-Pro:pr-email-robot-master patahern$
The area of code that must be off is:
import smtplib
import gspread
from gmail_variables import *
gc = gspread.login(GMAIL_USERNAME, GMAIL_PASSWORD)
wks = gc.open("PR-Command-Line-Emails").sheet1
recipients = wks.get_all_values()
I'm guessing that I have the wrong terminology to pull the Google Spreadsheet, but I can't find anything online about what to put in place of "gspread"
Thanks in advance for your help!
Have you tried using Google Fusion Tables (still in beta)? You can query the Google Fusion Tables REST API using SQL syntax and urllib.request.urlopen() to issue GET and POST requests. If your heart is set on using Google Sheets, the Google Sheets API looks like it works in much the same way as the Google Fusion Tables REST API; you can still use the built-in urllib library to issue GET and POST requests to the Sheets API.
It may also be helpful to note that Google has posted a Getting Started in Python Guide for the Google Sheets API. The example shown there, has no mention of any "gspread" imports.
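For illustration, reading a range from a publicly shared sheet with the Sheets API v4 and plain urllib could look roughly like this; the spreadsheet id, range and API key are placeholders:

import json
import urllib.request

SPREADSHEET_ID = "your-spreadsheet-id"   # placeholder
API_KEY = "your-api-key"                 # placeholder
url = ("https://sheets.googleapis.com/v4/spreadsheets/"
       "%s/values/Sheet1!A1:C10?key=%s" % (SPREADSHEET_ID, API_KEY))

# GET request to the Sheets API values endpoint
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read().decode())

for row in data.get("values", []):
    print(row)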
You need to make sure you have the gspread module installed somewhere your Python installation will find it.
If you have pip:
pip install gspread
will make sure that it's installed.
ALSO
Gspread no longer supports using email/login for authentication, and relies on OAuth2 for authentication.
Check out how to set that up here.
Then check out the gspread docs for how to access info from your sheet!
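A minimal sketch of what the OAuth2 version of the original snippet might look like with a service-account key (the key file name is a placeholder, and the service account's email must be given access to the sheet):

import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Authorize with a service-account key instead of gspread.login()
scope = ["https://spreadsheets.google.com/feeds",
         "https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name(
    "service_account.json", scope)       # placeholder key file
gc = gspread.authorize(creds)

wks = gc.open("PR-Command-Line-Emails").sheet1
recipients = wks.get_all_values()
print(recipients)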