How to auth into BigQuery on Google Compute Engine? - python

What's the easiest way to authenticate into Google BigQuery when on a Google Compute Engine instance?

First of all, make sure that your instance has the scope to access BigQuery; you can only decide this at instance creation time.
In a bash script, get an OAuth token by calling:
ACCESSTOKEN=`curl -s "http://metadata/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google" | jq -r ".access_token"`
echo "retrieved access token $ACCESSTOKEN"
Now, let's say you want a list of the datasets in a project:
CURL_URL="https://www.googleapis.com/bigquery/v2/projects/YOURPROJECTID/datasets"
CURL_OPTIONS="-s --header 'Content-Type: application/json' --header 'Authorization: OAuth $ACCESSTOKEN' --header 'x-goog-project-id:YOURPROJECTID' --header 'x-goog-api-version:1'"
CURL_COMMAND="curl --request GET $CURL_URL $CURL_OPTIONS"
CURL_RESPONSE=`eval $CURL_COMMAND`
The JSON response can be found in the variable CURL_RESPONSE.
PS: I realize now that this question is tagged as Python, but the same principles apply.

In Python:
AppAssertionCredentials is a Python class that allows a Compute Engine instance to identify itself to Google and other OAuth 2.0 servers without requiring a flow.
https://developers.google.com/api-client-library/python/
The project id can be read from the metadata server, so it doesn't need to be set as a variable.
https://cloud.google.com/compute/docs/metadata
The following code gets a token using AppAssertionCredentials, the project id from the metadata server, and instantiates a BigqueryClient with this data:
import urllib2

import bigquery_client
from oauth2client import gce

def GetMetadata(path):
  # urllib2.urlopen() has no headers argument; build a Request so we can
  # attach the mandatory Metadata-Flavor header.
  request = urllib2.Request(
      'http://metadata/computeMetadata/v1/%s' % path,
      headers={'Metadata-Flavor': 'Google'})
  return urllib2.urlopen(request).read()

credentials = gce.AppAssertionCredentials(
    scope='https://www.googleapis.com/auth/bigquery')

client = bigquery_client.BigqueryClient(
    credentials=credentials,
    api='https://www.googleapis.com',
    api_version='v2',
    project_id=GetMetadata('project/project-id'))
For this to work, you need to give the GCE instance access to the BigQuery API when creating it:
gcloud compute instances create <your_instance_name> --scopes storage-ro,bigquery
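For completeness, the REST flow from the bash answer can also be reproduced with only the Python standard library (Python 3 here, unlike the Python 2 snippet above). This is a sketch; YOURPROJECTID is a placeholder, and it assumes the code runs on a GCE VM whose service account has the BigQuery scope:

```python
import json
import urllib.request

def metadata_request(path):
    """Build a metadata-server request with the mandatory header."""
    return urllib.request.Request(
        'http://metadata.google.internal/computeMetadata/v1/%s' % path,
        headers={'Metadata-Flavor': 'Google'})

def datasets_request(project_id, access_token):
    """Build an authenticated datasets.list request for a project."""
    return urllib.request.Request(
        'https://www.googleapis.com/bigquery/v2/projects/%s/datasets'
        % project_id,
        headers={'Authorization': 'Bearer %s' % access_token})

def list_datasets(project_id):
    """Fetch a token from the metadata server and list the datasets."""
    token_reply = json.load(urllib.request.urlopen(
        metadata_request('instance/service-accounts/default/token')))
    reply = urllib.request.urlopen(
        datasets_request(project_id, token_reply['access_token']))
    return json.load(reply)

# On the VM you would run, e.g.:
#   print(list_datasets('YOURPROJECTID'))
```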

Related

Notebook to execute a Databricks job

Is there an API or other way to programmatically run a Databricks job? Ideally, we would like to call a Databricks job from a notebook. The following just gives the currently running job id, but that's not very useful:
dbutils.notebook.entry_point.getDbutils().notebook().getContext().currentRunId().toString()
To run a Databricks job, you can use the Jobs API. I have a Databricks job called for_repro, which I ran in the two ways shown below from a Databricks notebook.
Using requests library:
You can create an access token by navigating to Settings -> User Settings. Under the Access tokens tab, click Generate new token.
Use the generated token along with the following code.
import requests
import json
my_json = {"job_id": <your_job-id>}
auth = {"Authorization": "Bearer <your_access-token>"}
response = requests.post('https://<databricks-instance>/api/2.0/jobs/run-now', json = my_json, headers=auth).json()
print(response)
The <databricks-instance> value from the above code can be extracted from your workspace URL.
Using the %sh magic command:
You can also use the %sh magic command in a Python notebook cell to run a Databricks job.
%sh
curl --request POST --header "Authorization: Bearer <access_token>" \
https://<databricks-instance>/api/2.0/jobs/run-now \
--data '{"job_id": <your job id>}'
Refer to this Microsoft documentation to learn about all the other operations available through the Jobs API.
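If you also need to block until the triggered run completes, the run_id returned by run-now can be polled with the Jobs API runs/get endpoint. A sketch along the lines of the requests example above; the instance host, token, and job id are placeholders:

```python
import time
import requests

def api_url(instance, endpoint):
    """Build a Jobs API 2.0 URL for the given workspace instance."""
    return 'https://%s/api/2.0/%s' % (instance, endpoint)

def is_finished(state):
    """A run is done once its life_cycle_state is terminal."""
    return state.get('life_cycle_state') in (
        'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR')

def run_and_wait(instance, token, job_id, poll_seconds=10):
    """Trigger the job, then poll runs/get until the run finishes."""
    auth = {'Authorization': 'Bearer %s' % token}
    run = requests.post(api_url(instance, 'jobs/run-now'),
                        json={'job_id': job_id}, headers=auth).json()
    while True:
        status = requests.get(api_url(instance, 'jobs/runs/get'),
                              params={'run_id': run['run_id']},
                              headers=auth).json()
        if is_finished(status['state']):
            return status['state'].get('result_state')
        time.sleep(poll_seconds)

# Example (placeholders):
#   print(run_and_wait('<databricks-instance>', '<your_access-token>', <your_job-id>))
```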

Problems in my Python environment when using a GCP API key

I want to use the Google Translation API, but I have some problems.
My environment is Ubuntu 18 (Linux) with Python, using the Atom editor.
I used gcloud to set my configuration and got an auth login and access token:
export GOOGLE_APPLICATION_CREDENTIALS=//api_key.json
gcloud init
gcloud auth application-default login
gcloud auth application-default print-access-token
So I could use curl and get some test data:
curl -X POST -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) -H "Content-Type: application/json; charset=utf-8" --data
"{
'q': 'Hello world',
'q': 'My name is Jeff',
'target': 'de'
}" "https://translation.googleapis.com/language/translate/v2"
{
  "data": {
    "translations": [
      {
        "translatedText": "Hallo Welt",
        "detectedSourceLanguage": "en"
      },
      {
        "translatedText": "Mein Name ist Jeff",
        "detectedSourceLanguage": "en"
      }
    ]
  }
}
When I run the test code in Atom, my project number is wrong: it is from a past project of mine.
Even when I run the test code in Python from bash, the same thing happens.
I don't know what is wrong; I just guess there is some problem in my Python environment.
The error raised:
raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 POST
https://translation.googleapis.com/language/translate/v2: Cloud Translation
API has not been used in project [wrong number] before or it is disabled.
Enable it by visiting
https://console.developers.google.com/apis/api/translate.googleapis.com
/overview?project=[wrong number] then retry. If you enabled this API
recently, wait a few minutes for the action to propagate to our systems and
retry.
This error message is usually thrown when the application is not authenticated correctly, due to reasons such as missing files, invalid credential paths, or incorrect environment variable assignments, among other causes. Since the client libraries need to pull the credentials data from the environment variable or the client object, you need to ensure you are pointing to the correct authentication files. Keep in mind this issue might not occur when using the curl command, because there you were passing the access token directly.
Based on this, I recommend confirming that you are using the JSON credentials file of your current project, and following the Obtaining and providing service account credentials manually guide to explicitly specify your service account file directly in your code; this way, you can set it permanently and verify that you are passing the service credentials correctly. Additionally, you can take a look at the Using Client Libraries guide, which contains the step-by-step process required to use the Translation API with Python.
Passing the path to the service account key in code example:
def explicit():
    from google.cloud import storage

    # Explicitly use service account credentials by specifying the private
    # key file.
    storage_client = storage.Client.from_service_account_json(
        'service_account.json')

    # Make an authenticated API request.
    buckets = list(storage_client.list_buckets())
    print(buckets)
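The same from_service_account_json pattern should work for the Translation client itself. A sketch, assuming the google-cloud-translate package and a key file belonging to the correct project; the library import is kept inside the function so the small helper stays importable without it:

```python
def pick_translations(api_result):
    """Extract just the translated strings from a translate() response."""
    return [item['translatedText'] for item in api_result]

def translate_explicit(key_file, texts, target='de'):
    """Translate texts using explicitly loaded service account credentials."""
    from google.cloud import translate_v2 as translate
    # Load the service account of the *correct* project explicitly,
    # instead of relying on GOOGLE_APPLICATION_CREDENTIALS.
    client = translate.Client.from_service_account_json(key_file)
    return pick_translations(client.translate(texts, target_language=target))

# Example (needs a valid key file):
#   print(translate_explicit('service_account.json', ['Hello world']))
```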

How to obtain a GCR access token with Python / list Docker images

The access token I'm getting with gcloud auth print-access-token is obviously a different access token than the one I can get with some basic Python code:
export GOOGLE_APPLICATION_CREDENTIALS=/the-credentials.json
from oauth2client.client import GoogleCredentials
credentials = GoogleCredentials.get_application_default()
credentials.get_access_token()
What I am trying to do is get a token that would work with:
curl -u _token:<mytoken> https://eu.gcr.io/v2/my-project/my-docker-image/tags/list
I'd prefer not to install the gcloud utility as a dependency for my app, hence my attempts to obtain the access token programmatically via OAuth Google credentials.
I know this is a very old question, but I just faced the exact same problem of requiring an ACCESS_TOKEN in Python and not being able to generate it, and I managed to make it work.
What you need to do is use the attribute credentials.token, except it won't exist when you first create the credentials object and will return None. In order to generate a token, the credentials must first be used by a google.cloud library, which in my case was done by using the googleapiclient.discovery.build method:
import json

import googleapiclient.discovery

sqladmin = googleapiclient.discovery.build('sqladmin', 'v1beta4', credentials=credentials)
response = sqladmin.instances().get(project=PROJECT_ID, instance=INSTANCE_ID).execute()
print(json.dumps(response))
After which the ACCESS_TOKEN could be properly generated using
access_token = credentials.token
I've also tested it using google.cloud.storage as a way to exercise the credentials, and it also worked, by just trying to access a bucket in GCS through the appropriate Python library:
from google.oauth2 import service_account
from google.cloud import storage

PROJECT_ID = 'your_project_id_here'
SCOPES = ['https://www.googleapis.com/auth/sqlservice.admin']
SERVICE_ACCOUNT_FILE = '/path/to/service.json'

credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)

try:
    list(storage.Client(project=PROJECT_ID,
                        credentials=credentials).bucket('random_bucket').list_blobs())
except Exception:
    print("Failed because no bucket named 'random_bucket' exists in your "
          "project... but that doesn't matter: the library tried to use the "
          "credentials and in doing so generated an access_token, which is "
          "what we're interested in right now")

access_token = credentials.token
print(access_token)
So I think there are a few questions:
gcloud auth print-access-token vs GoogleCredentials.get_application_default()
gcloud doesn't set application default credentials by default anymore when performing gcloud auth login, so the access_token you're getting from gcloud auth print-access-token is going to be the one corresponding to the user you used to log in.
As long as you follow the instructions to create ADCs for a service account, that account has the necessary permissions, and the environment from which you are executing the script has access to the env var and the adc.json file, you should be fine.
How to make curl work
The Docker Registry API specifies that a token exchange should happen, swapping your Basic auth (i.e. Authorization: Basic base64(_token:<gcloud_access_token>)) for a short-lived Bearer token. This process can be a bit involved, but is documented here under "How to authenticate" and "Requesting a Token". Replace auth.docker.io/token with eu.gcr.io/v2/token and service=registry.docker.io with service=eu.gcr.io, etc. Use curl -u oauth2accesstoken:<mytoken> here.
See also: How to list images and tags from the gcr.io Docker Registry using the HTTP API?
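A sketch of that token exchange in Python with the requests library; the registry host and repository path are examples, and access_token would hold the value from the oauth2client snippet in the question:

```python
import base64
import requests

def basic_auth_header(access_token):
    """Basic auth with user 'oauth2accesstoken', as GCR expects."""
    raw = ('oauth2accesstoken:%s' % access_token).encode('utf-8')
    return {'Authorization': 'Basic %s' % base64.b64encode(raw).decode('ascii')}

def token_url(registry, repository):
    """URL that asks the registry for a pull-scoped Bearer token."""
    return ('https://%s/v2/token?service=%s&scope=repository:%s:pull'
            % (registry, registry, repository))

def list_tags(registry, repository, access_token):
    """Swap the OAuth token for a Bearer token, then list the tags."""
    bearer = requests.get(token_url(registry, repository),
                          headers=basic_auth_header(access_token)
                          ).json()['token']
    return requests.get(
        'https://%s/v2/%s/tags/list' % (registry, repository),
        headers={'Authorization': 'Bearer %s' % bearer}).json()

# Example (placeholder token):
#   print(list_tags('eu.gcr.io', 'my-project/my-docker-image', access_token))
```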
Avoid the question entirely
We have a python lib that might be relevant to your needs:
https://github.com/google/containerregistry

Running Job On Airflow Based On Webrequest

I wanted to know if airflow tasks can be executed upon getting a request over HTTP. I am not interested in the scheduling part of Airflow. I just want to use it as a substitute for Celery.
So an example operation would be something like this.
User submits a form requesting for some report.
Backend receives the request and sends the user a notification that the request has been received.
The backend then schedules a job using Airflow to run immediately.
Airflow then executes a series of tasks associated with a DAG. For example, pull data from redshift first, pull data from MySQL, make some operations on the two result sets, combine them and then upload the results to Amazon S3, send an email.
From whatever I read online, you can run airflow jobs by executing airflow ... on the command line. I was wondering if there is a python api which can execute the same thing.
Thanks.
The Airflow REST API Plugin would help you out here. Once you have followed the instructions for installing the plugin you would just need to hit the following url: http://{HOST}:{PORT}/admin/rest_api/api/v1.0/trigger_dag?dag_id={dag_id}&run_id={run_id}&conf={url_encoded_json_parameters}, replacing dag_id with the id of your dag, either omitting run_id or specify a unique id, and passing a url encoded json for conf (with any of the parameters you need in the triggered dag).
Here is an example JavaScript function that uses jQuery to call the Airflow api:
function triggerDag(dagId, dagParameters){
    var urlEncodedParameters = encodeURIComponent(dagParameters);
    var dagRunUrl = "http://airflow:8080/admin/rest_api/api/v1.0/trigger_dag?dag_id=" +
        dagId + "&conf=" + urlEncodedParameters;
    $.ajax({
        url: dagRunUrl,
        dataType: "json",
        success: function(msg) {
            console.log('Successfully started the dag');
        },
        error: function(e){
            console.log('Failed to start the dag');
        }
    });
}
A new option in airflow is the experimental, but built-in, API endpoint in the more recent builds of 1.7 and 1.8. This allows you to run a REST service on your airflow server to listen to a port and accept cli jobs.
I only have limited experience myself, but I have run test dags with success. Per the docs:
/api/experimental/dags/<DAG_ID>/dag_runs creates a dag_run for a given dag id (POST).
That will schedule an immediate run of whatever dag you want to run. It does still use the scheduler, though, waiting for a heartbeat to see that dag is running and pass tasks to the worker. This is exactly the same behavior as the CLI, though, so I still believe it fits your use-case.
Documentation on how to configure it is available here: https://airflow.apache.org/api.html
There are some simple example clients in the GitHub repo, too, under airflow/api/clients.
You should look at Airflow HTTP Sensor for your needs. You can use this to trigger a dag.
Airflow's experimental REST API interface can be used for this purpose.
Following request will trigger a DAG:
curl -X POST \
  http://<HOST>:8080/api/experimental/dags/process_data/dag_runs \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{"conf":"{\"START_DATE\":\"2018-06-01 03:00:00\", \"STOP_DATE\":\"2018-06-01 23:00:00\"}"}'
The following request retrieves a list of DAG runs for a specific DAG id:
curl -i -H "Accept: application/json" -H "Content-Type: application/json" -X GET http://<HOST>:8080/api/experimental/dags/process_data/dag_runs
For the GET API to work, set the rbac flag to True in airflow.cfg.
The full list of available endpoints is documented here & there.
UPDATE: stable Airflow REST API released:
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
Almost everything stays the same, except the API URL changes.
Also, "conf" is now required to be an object, so I added additional wrapping:
def trigger_dag_v2(self, dag_id, run_id=None, conf=None, execution_date=None):
    endpoint = '/api/v1/dags/{}/dagRuns'.format(dag_id)
    url = urljoin(self._api_base_url, endpoint)
    data = self._request(url, method='POST',
                         json={
                             "run_id": run_id,
                             "conf": {'conf': json.dumps(conf)},
                             "execution_date": execution_date,
                         })
    return data['message']
OLD ANSWER:
Airflow has REST API (currently experimental) - available here:
https://airflow.apache.org/api.html#endpoints
If you do not want to install plugins as suggested in other answers - here is code how you can do it directly with the API:
def trigger_dag(self, dag_id, run_id=None, conf=None, execution_date=None):
    endpoint = '/api/experimental/dags/{}/dag_runs'.format(dag_id)
    url = urljoin(self._api_base_url, endpoint)
    data = self._request(url, method='POST',
                         json={
                             "run_id": run_id,
                             "conf": conf,
                             "execution_date": execution_date,
                         })
    return data['message']
More examples working with airflow API in python are available here:
https://github.com/apache/airflow/blob/master/airflow/api/client/json_client.py
I found this post while trying to do the same; after further investigation, I switched to Argo Events. It is basically the same, but based on event-driven flows, so it is much more suitable for this use case.
Link:
https://argoproj.github.io/argo
Airflow now supports the stable REST API. Using it, you can trigger a DAG like this:
curl --location --request POST 'localhost:8080/api/v1/dags/unpublished/dagRuns' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data-raw '{
"dag_run_id": "dag_run_1",
"conf": {
"key": "value"
}
}'
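The same call from Python with the requests library; the host, basic-auth credentials, and dag id are the placeholders from the curl example:

```python
import requests

def dag_runs_url(host, dag_id):
    """Stable-API endpoint for creating a DAG run."""
    return 'http://%s/api/v1/dags/%s/dagRuns' % (host, dag_id)

def trigger_dag_run(host, dag_id, dag_run_id, conf, auth):
    """POST a new run, mirroring the curl call above."""
    return requests.post(dag_runs_url(host, dag_id),
                         json={'dag_run_id': dag_run_id, 'conf': conf},
                         auth=auth)

# Example (placeholders):
#   r = trigger_dag_run('localhost:8080', 'unpublished', 'dag_run_1',
#                       {'key': 'value'}, ('admin', 'admin'))
#   print(r.json())
```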

GitLab CE API: Check if used token has admin rights

I am working on a Python module for the GitLab API. Is there any possibility to check if the user with the private token in use has admin rights on the GitLab server?
One way would be to get something from the API, e.g. a single user, and check whether it has the elements only an admin can see, like two_factor_enabled. But is there a better, easier way?
According to the API help, the is_admin key is now included in all single-user API queries.
I just tested it with API v4 on gitlab.com using the query:
curl --header "PRIVATE-TOKEN: Token" https://gitlab.com/api/v4/users/###
and the JSON answer included "is_admin":false for the specified user with id ###.
You can run a query that works only for admins, for example:
curl --header "PRIVATE-TOKEN: Token" https://my.gitlab.host/api/v4/users?admins=true
Because admins=true works only for users with admin rights, getting any reply means that the token has admin rights.
(from: GitLab Docs: Users API - For Administrators)
If you GET /users and pass the sudo parameter, you will get back JSON that includes an is_admin attribute with a boolean value. You could use that. Here is the documentation.
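A small sketch of the admins=true approach from the answer above, in Python with requests; the host and token are placeholders, and the `get` parameter exists only so the helper can be exercised without a live server:

```python
import requests

def has_admin_rights(gitlab_host, private_token, get=requests.get):
    """True if the admin-only users?admins=true filter returns results."""
    response = get('https://%s/api/v4/users' % gitlab_host,
                   params={'admins': 'true'},
                   headers={'PRIVATE-TOKEN': private_token})
    return response.ok and bool(response.json())

# Example (placeholders):
#   print(has_admin_rights('my.gitlab.host', '<private_token>'))
```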
