Python BigQuery API: how to get data asynchronously?

I am getting started with the BigQuery API in Python, following the documentation.
This is my code, adapted from an example:
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)

try:
    query_request = bigquery_service.jobs()
    query_data = {
        'query': (
            'SELECT * FROM [mytable] LIMIT 10;'
        )
    }

    query_response = query_request.query(
        projectId=project_id,
        body=query_data).execute()

    for row in query_response['rows']:
        print('\t'.join(field['v'] for field in row['f']))

except HttpError as err:
    print('Error: {}'.format(err))
The problem I'm having is that I keep getting the response:
{u'kind': u'bigquery#queryResponse',
u'jobComplete': False,
u'jobReference': {u'projectId': 'myproject', u'jobId': u'xxxx'}}
So it has no rows field. Looking at the docs, I guess I need to take the jobId field and use it to check when the job is complete, and then get the data.
The trouble is that the docs are a bit scattered and confusing, and I don't know how to do this.
I think I need to use this method to check the status of the job, but how do I adapt it for Python? And how often should I check / how long should I wait?
Could anyone give me an example?

There is code to do what you want here.
If you want more background on what it is doing, check out Google BigQuery Analytics chapter 7 (the relevant snippet is available here.)
TL;DR:
Your initial jobs.query() call is returning before the query completes; to wait for the job to be done you'll need to poll on jobs.getQueryResults(). You can then page through the results of that call.
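For illustration, here is a minimal polling sketch of that approach, reusing the bigquery_service and project_id from the question (the helper name and the poll interval are just my own choices, not from the linked code):

import time

def wait_for_job(bigquery_service, project_id, job_id, poll_interval=1):
    """Poll jobs.getQueryResults() until the job reports completion."""
    jobs = bigquery_service.jobs()
    while True:
        # timeoutMs makes the call hang on the server for up to 10s,
        # so the loop usually doesn't spin many times.
        result = jobs.getQueryResults(
            projectId=project_id,
            jobId=job_id,
            timeoutMs=10000).execute()
        if result.get('jobComplete'):
            return result
        time.sleep(poll_interval)

result = wait_for_job(bigquery_service, project_id,
                      query_response['jobReference']['jobId'])
for row in result.get('rows', []):
    print('\t'.join(field['v'] for field in row['f']))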


How do I paginate my query in Athena using Lambda and Boto3?

I am querying my data in Athena from a Lambda function using Boto3.
The result is in JSON format.
When I run my Lambda function I get the whole record set.
How can I paginate this data?
I only want a small amount of data per page and
to send that small dataset to the UI to display.
Here is my Python code:
import boto3

def lambda_handler(event, context):
    athena = boto3.client('athena')
    s3 = boto3.client('s3')
    query = event['query']

    # Start the query execution and keep its id
    query_id = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': DATABASE},
        ResultConfiguration={'OutputLocation': output}
    )['QueryExecutionId']
I use Postman to pass my query and get the data.
I am aware of the SQL LIMIT and OFFSET clauses,
but I want to know if there is a better way than passing LIMIT and OFFSET parameters into my function.
Please help me in this case.
Thanks.
A quick Google search found this answer in the Athena docs, which seems promising. Example from the docs:
athena = boto3.client('athena')
paginator = athena.get_paginator('get_query_results')

response_iterator = paginator.paginate(
    QueryExecutionId='string',
    PaginationConfig={
        'MaxItems': 123,
        'PageSize': 123,
        'StartingToken': 'string'
    })
I hope this helps!
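If the goal is to hand the UI one small slice per request, another option (my own sketch, not from the docs answer) is to call get_query_results directly with MaxResults and round-trip the NextToken through the client; the helper name and page size below are purely illustrative:

import boto3

athena = boto3.client('athena')

def get_page(query_execution_id, page_size=50, next_token=None):
    """Return one page of Athena results plus the token for the next page."""
    kwargs = {
        'QueryExecutionId': query_execution_id,
        'MaxResults': page_size,
    }
    if next_token:
        kwargs['NextToken'] = next_token
    response = athena.get_query_results(**kwargs)
    # Note: the first page usually starts with a header row of column names.
    rows = response['ResultSet']['Rows']
    return rows, response.get('NextToken')

The UI can then send back the returned token to fetch the next page, with no LIMIT/OFFSET in the SQL itself.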

Tweepy API: unable to get queries to return user_fields

I've got a Python Flask app whose job is to work with the Twitter v2 API. I ended up using the Tweepy API in my app because I was having difficulty hand-coding the 3-legged auth flow. Anyway, since I got that working, I'm now running into difficulties executing some basic queries, like get_me() and get_user().
This is my code:
client = tweepy.Client(
    consumer_key=private.API_KEY,
    consumer_secret=private.API_KEY_SECRET,
    access_token=access_token,
    access_token_secret=access_token_secret)

user = client.get_me(expansions='author_id',
                     user_fields=['username', 'created_at', 'location'])
print(user)
return('success')
And this is invariably the error:
tweepy.errors.BadRequest: 400 Bad Request
The expansions query parameter value [author_id] is not one of [pinned_tweet_id]
Per the Twitter docs for this endpoint, this should certainly work... I fail to understand why the 'pinned_tweet_id' expansion is the particular issue.
I'm left wondering if I'm missing something basic here, or if Tweepy is just a POS and I should consider rolling my own queries like I originally intended.
Tweet Author ID
You may have misread the Twitter docs: for this endpoint, the expansions parameter only accepts pinned_tweet_id, while the tweet fields parameter is where the author_id value you're looking for lives.
The code would look like:
client = tweepy.Client(
    consumer_key=private.API_KEY,
    consumer_secret=private.API_KEY_SECRET,
    access_token=access_token,
    access_token_secret=access_token_secret)

user = client.get_me(tweet_fields=['author_id'],
                     user_fields=['username', 'created_at', 'location'])
print(user)
return('success')
User ID
If you're looking for the user id, try omitting tweet_fields and adding id to the user_fields, as also shown in the Twitter docs.
The code would look like:
client = tweepy.Client(
    consumer_key=private.API_KEY,
    consumer_secret=private.API_KEY_SECRET,
    access_token=access_token,
    access_token_secret=access_token_secret)

user = client.get_me(user_fields=['id', 'username', 'created_at', 'location'])
print(user)
return('success')
You can obtain the user id with user.data.id.
The solution is to drop the 'expansions' kwarg and leave 'user_fields' as-is. I was further confused by the fact that printing the returned user object does not show the requested user_fields as part of the data attribute. You have to explicitly access them through the data attribute, as below.
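For example, a minimal sketch reusing the client from the question:

user = client.get_me(user_fields=['username', 'created_at', 'location'])

# The extra fields don't appear when printing the response,
# but they are available on the data attribute.
print(user.data.id)
print(user.data.username)
print(user.data.created_at)
print(user.data.location)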

API response does not return OrderArray.Order.SellerUserID

I have a question about the eBay Trading API.
I'm trying to get information about my purchases so I can follow up on late/failed deliveries.
I have managed to get almost all of the information I need; however, I just can't work out how to get the ebay-api to return the seller user id.
from ebaysdk.trading import Connection as Trading

api = Trading(
    config_file=None,
    appid=load_settings['appid'],
    certid=load_settings['certid'],
    devid=load_settings['devid'],
    token=load_settings['token'],
    timeout=None
)

response = api.execute('GetOrders', {
    'CreateTimeFrom': create_time_from,
    'CreateTimeTo': create_time_to,
    'OrderRole': 'Buyer',
    'DetailLevel': 'ReturnAll',
    'Pagination': {
        'EntriesPerPage': 100,
        'PageNumber': page
    }
})

data = response.dict()
print(data)
I read in the docs that to get OrderArray.Order.SellerUserID you have to change the DetailLevel.
However, even if I set 'DetailLevel': 'ReturnAll', I do not get SellerUserID in my response.
Is there something I'm overlooking?
https://developer.ebay.com/devzone/xml/docs/reference/ebay/getorders.html#DetailLevel
Using the eBay GetOrders API directly, without any SDK,
it correctly returns the SellerUserID even without setting DetailLevel to ReturnAll.
It looks like the information was there, just not in the place the docs said.
I found it at
response.dict()["OrderArray"]["Order"]['MonetaryDetails']['Payments']['Payment']['Payee']
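For walking several orders, here is a small sketch along those lines (an assumption on my part: ebaysdk typically yields a list under "Order" when there are multiple orders and a plain dict when there is only one):

orders = data.get('OrderArray', {}).get('Order', [])
# Normalize: single order comes back as a dict, multiple as a list
if isinstance(orders, dict):
    orders = [orders]

for order in orders:
    payee = order['MonetaryDetails']['Payments']['Payment']['Payee']
    print(order['OrderID'], payee)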

CloudKit Server-to-Server auth: Keep getting 401 Authentication failed

I have recently been exploring CloudKit and related frameworks. I got the communication with my app working, as well as with my website using CloudKitJS. Where I am struggling is the server-to-server communication (which I would need for exporting data from the public database to CSV).
I have tried the Python package requests-cloudkit, which others were suggesting. I created a server-to-server token, and copied only the key between the START and END lines when creating the eckey.pem file. I then got to this code:
from requests_cloudkit import CloudKitAuth
from restmapper import restmapper
import json

KEY_ID = '[my key ID from CK Dashboard]'
SECRET_FILE_KEY = 'eckey.pem'

AUTH = CloudKitAuth(KEY_ID, SECRET_FILE_KEY)
PARAMS = {
    'query': {
        'recordType': '[my record type]'
    },
}

CloudKit = restmapper.RestMapper("https://api.apple-cloudkit.com/database/1/[my container]/development/")
cloudkit = CloudKit(auth=AUTH)
response = cloudkit.POST.public.records.query(json.dumps(PARAMS))
I am then getting the 401 Authentication failed response. I have been stuck on this for days, so I would be grateful for any help or advice. 😊
Creating the server-to-server key is an important first step, but in order to make HTTP requests after that, you have to sign each request.
Look for the Authenticate Web Service Requests section near the bottom of this documentation page.
It's a little bit convoluted, but you have to carefully construct signed headers to include with each request you make. I'm not familiar with how to do it in Python, but here's how I do it in NodeJS which may help:
//Get the timestamp in a very specific format
let date = moment().utc().format('YYYY-MM-DD[T]HH:mm:ss[Z]')
//Construct the subpath
let endpoint = '/records/lookup'
let path = '/database/1/iCloud.*****/development/public'
let subpath = path+endpoint
//Get the key file
let privateKeyFile = fs.readFileSync('../../'+SECRET_FILE_KEY, 'utf8')
//Make a string out of your JSON query
let query = {
    recordType: '[my record type]'
}
let requestBody = JSON.stringify(query)
//Hash the query
let bodyHash = crypto.createHash('sha256').update(requestBody, 'utf8').digest('base64')
//Assemble the components you just generated in a special format
//[Current date]:[Request body]:[Web service URL subpath]
let message = date+':'+bodyHash+':'+subpath
//Sign it
let signature = crypto.createSign('RSA-SHA256').update(message).sign(privateKeyFile, 'base64')
//Assemble your headers and include them in your HTTP request
let headers = {
    'X-Apple-CloudKit-Request-KeyID': KEY_ID,
    'X-Apple-CloudKit-Request-ISO8601Date': date,
    'X-Apple-CloudKit-Request-SignatureV1': signature
}
This is a bit hairy at first, but I just put all this stuff in a function that I reuse whenever I need to make a request.
Apple's documentation has pretty much been abandoned and it's hard to find good help with CloudKit Web Services these days.
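For a Python angle on the same idea, here is a rough, assumption-laden port of the NodeJS above using the cryptography package (this is not code from requests-cloudkit; the helper name is mine, the key is the EC private key in eckey.pem, and subpath/body follow the same [date]:[body hash]:[subpath] format):

import base64
import hashlib
from datetime import datetime, timezone

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec

def cloudkit_headers(key_id, pem_path, subpath, request_body):
    """Build the X-Apple-CloudKit-Request-* headers for one request."""
    # Timestamp in the exact ISO 8601 format CloudKit expects
    date = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

    # base64(SHA-256(request body))
    body_hash = base64.b64encode(
        hashlib.sha256(request_body.encode('utf-8')).digest()).decode()

    # [Current date]:[Request body hash]:[Web service URL subpath]
    message = f'{date}:{body_hash}:{subpath}'

    with open(pem_path, 'rb') as f:
        private_key = serialization.load_pem_private_key(f.read(), password=None)

    # ECDSA over SHA-256, base64-encoded
    signature = base64.b64encode(
        private_key.sign(message.encode('utf-8'),
                         ec.ECDSA(hashes.SHA256()))).decode()

    return {
        'X-Apple-CloudKit-Request-KeyID': key_id,
        'X-Apple-CloudKit-Request-ISO8601Date': date,
        'X-Apple-CloudKit-Request-SignatureV1': signature,
    }

One could then POST the JSON body to 'https://api.apple-cloudkit.com' + subpath with these headers attached, for example via the requests library.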

Bigquery API python script returns no data in batch mode

I wrote a basic script to grab my GA import (yesterday's data) from BigQuery in the morning (6 AM UTC).
I ran it literally hundreds of times myself, and I'm calling it the exact same way as I do in crontab.
Yet, one time out of two, the call to the BigQuery API returns no data.
Have you experienced such an issue?
#EDIT: sorry, I sent a slightly truncated message.
The code I use is more or less the official one available here:
https://cloud.google.com/bigquery/docs/data
service = get_service('bigquery', 'v2', scope, key_file_location, service_account_email)
# query_request = service.jobs()
query_response = sync_query(service, project_id, sql_query)

results = []
page_token = None
while True:
    page = service.jobs().getQueryResults(
        pageToken=page_token,
        **query_response['jobReference']).execute(num_retries=2)
    results.extend(page.get('rows', []))
    page_token = page.get('pageToken')
    if not page_token:
        break
And here is the sync_query():
def sync_query(bigquery, project_id, query, timeout=60000, num_retries=5):
    query_data = {
        'query': query,
        'timeoutMs': timeout,
    }
    return bigquery.jobs().query(
        projectId=project_id,
        body=query_data).execute(num_retries=num_retries)
I will try increasing the parameter even more.
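One thing worth guarding against (a guess based on the jobs.query behaviour discussed at the top of this page, not a confirmed diagnosis): if the synchronous query times out before the job finishes, getQueryResults can return a page with jobComplete set to false and no rows, so it may help to check that flag before collecting rows. A minimal sketch of the paging loop with that check added:

results = []
page_token = None
while True:
    page = service.jobs().getQueryResults(
        pageToken=page_token,
        timeoutMs=60000,
        **query_response['jobReference']).execute(num_retries=2)
    # If the job is still running, poll again instead of reading empty rows
    if not page.get('jobComplete', False):
        continue
    results.extend(page.get('rows', []))
    page_token = page.get('pageToken')
    if not page_token:
        break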
