I'm trying to update a view in BigQuery via Python. I've been able to create the view using the following approach:
from google.cloud import bigquery
def createView(client):
    viewDataset = 'dataset'
    viewName = 'name'
    view_ref = client.dataset(viewDataset).table(viewName)
    view = bigquery.Table(view_ref)
    view_sql = """
    select * from '{}.{}' where thing = 2
    """.format(viewDataset, viewName)
    view.view_query = view_sql
    client.create_table(view)
(Code for explanation purposes)
This worked fine and created the view. I then wanted to run a function that updates the view definition. I reused the same code and it failed with an error saying the view already exists, which makes sense. I then followed this example:
https://cloud.google.com/bigquery/docs/managing-views
using the code to update a view's SQL query. Basically, I swapped the line
client.create_table(view)
for
client.update_table(view)
I get an error saying I have not added the fields attribute... Since it's a view, I thought I wouldn't have to do this.
Can anyone tell me the correct way to use Python to update an existing BigQuery view?
Cheers
Look! You are using:
"select * from '{}.{}' where thing = 2"
Notice this:
from '{}.{}'
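But in standard SQL a table should be referenced by its fully qualified name (project.dataset.table), quoted with backticks rather than single quotes:
from `{}.{}.{}`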
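A minimal sketch of building the view query with a fully qualified, backtick-quoted table name. In BigQuery standard SQL, single quotes produce a string literal, so a quoted path does not refer to a table. The names `my-project`, `dataset`, and `source_table` are placeholders, and `thing = 2` is taken from the question.

```python
def build_view_sql(project, dataset, table):
    # Backticks quote an identifier in BigQuery standard SQL; single quotes
    # would make the table path a string literal instead.
    return "SELECT * FROM `{}.{}.{}` WHERE thing = 2".format(project, dataset, table)


print(build_view_sql("my-project", "dataset", "source_table"))
# SELECT * FROM `my-project.dataset.source_table` WHERE thing = 2
```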
This piece of code works for me:
from google.cloud import bigquery
if __name__ == "__main__":
    client = bigquery.Client()
    dataset_view_id = 'dataset_name'
    table_view_id = 'view_name'
    view = bigquery.Table(client.dataset(dataset_view_id).table(table_view_id))
    ##############
    ###what was in that table? request table info
    ##############
    get_view = client.get_table(view)  # API Request
    # Display OLD view properties
    print('View at {}'.format(get_view.full_table_id))
    print('View Query:\n{}'.format(get_view.view_query))
    ##############
    #update the table:
    ##############
    sql_template = (
        'SELECT * FROM `{}.{}.{}` where disease="POLIO" ')
    source_project_id = "project_from_the_query"
    source_dataset_id = "dataset_from_the_query"
    source_table_id = "table_from_the_query"
    view.view_query = sql_template.format(source_project_id, source_dataset_id, source_table_id)
    view = client.update_table(view, ['view_query'])  # API request
    ##############
    #Now print the view query to be sure it's been updated:
    ##############
    get_view = client.get_table(view)  # API Request
    # Display view properties
    print('\n\n NEW View at {}'.format(get_view.full_table_id))
    print('View Query:\n{}'.format(get_view.view_query))
Related
I'm trying to download data from several HubSpot tables with the Python helper library:
from hubspot import HubSpot
api_client = HubSpot()
api_client.access_token = 'TOKEN'
contacts = api_client.crm.contacts.get_all()
tickets = api_client.crm.tickets.get_all()
deals = api_client.crm.deals.get_all()
and so on...
Instead of calling every table separately, I was thinking about looping over a list like this:
def getting_tables(table, api_client):
    return api_client.crm.table.get_all()
api_client = HubSpot()
api_client.access_token = 'TOKEN'
tables = ['contacts', 'tickets', 'deals', 'owners']
for table in tables:
    table = getting_tables(table, api_client)
but when I call api_client.crm.table.get_all(), it doesn't treat "table" as a placeholder for the value I pass when I call the function.
How could I do that? Is that possible?
I know I could just call them all separately, but this is for learning purposes mostly.
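Python resolves `api_client.crm.table` as an attribute literally named `table`; to look up an attribute whose name is held in a variable, the built-in `getattr` is the usual tool. A minimal sketch, assuming (as the question's own calls suggest) that each object API hangs off `api_client.crm` and has a `get_all()` method. The `SimpleNamespace` objects and `fake_api` helper are hypothetical stand-ins so the snippet runs without HubSpot credentials, and they only model three of the question's tables.

```python
from types import SimpleNamespace


def getting_tables(table, api_client):
    # getattr fetches the attribute whose *name* is the string in `table`,
    # so "contacts" resolves to api_client.crm.contacts, and so on.
    return getattr(api_client.crm, table).get_all()


def fake_api(name):
    # Hypothetical stand-in for one HubSpot object API with a get_all() method.
    return SimpleNamespace(get_all=lambda: [f"{name}-record"])


# Hypothetical stand-in for HubSpot(); the real client needs an access token.
api_client = SimpleNamespace(crm=SimpleNamespace(
    contacts=fake_api("contacts"),
    tickets=fake_api("tickets"),
    deals=fake_api("deals"),
))

tables = ["contacts", "tickets", "deals"]
# Keep each result under its table name instead of overwriting the loop variable.
results = {table: getting_tables(table, api_client) for table in tables}
print(results["tickets"])  # ['tickets-record']
```

Collecting the results in a dict keyed by table name keeps every response; reassigning the loop variable `table`, as in the question, discards each result on the next iteration.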
I have a Django app which creates collections in MongoDB automatically. But when I tried to integrate the delete functionality, the collections I try to drop are not deleted, even though the automatically created collections can be edited successfully. This method is called from another file with all parameters.
Interestingly, when I tried to delete them manually in the Python shell, it worked. I want to delete the collections that are no longer required.
import pymongo
from .databaseconnection import retrndb  # credentials from another file; all admin rights are given
mydb = retrndb()
class Delete():
    def DeleteData(postid, name):
        PostID = postid
        tbl = name + 'Database'
        liketbl = PostID + 'Likes'
        likecol = mydb[liketbl]
        pcol = mydb[tbl]
        col = mydb['fpost']
        post = {"post_id": PostID}
        ppost = {"PostID": PostID}
        result1 = mydb.commentcol.drop()  # this doesn't work
        result2 = mydb.likecol.drop()  # this doesn't work
        print(result1, '\n', result2)  # returns None for both
        try:
            col.delete_one(post)  # this works
            pcol.delete_one(ppost)  # this works
            return False
        except Exception as e:
            return e
Any solutions? I have been trying to solve this for a week.
Should I change the database engine, since Django doesn't support NoSQL natively? I have, however, written complete custom scripts that do CRUD using pymongo.
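A likely culprit, sketched with a hypothetical stand-in class rather than a real MongoDB connection: pymongo's Database resolves `mydb.likecol` through attribute access, so the collection name is the literal identifier `likecol`, not the value of the local variable `likecol` (and `mydb.commentcol` likewise names a collection called "commentcol"). Subscripting, as in `mydb[liketbl]`, uses the variable's value. `FakeDatabase` below only mimics that lookup rule, and "42" is a made-up post ID.

```python
class FakeDatabase:
    # Hypothetical stand-in that mimics how pymongo's Database maps names
    # to collections; it returns a label instead of a real Collection.
    def __getattr__(self, name):
        # Called for mydb.<name>: `name` is the identifier written in the code.
        return f"collection:{name}"

    def __getitem__(self, name):
        # Called for mydb[<expr>]: `name` is the *value* of the expression.
        return f"collection:{name}"


mydb = FakeDatabase()
liketbl = "42" + "Likes"  # hypothetical post ID "42"

print(mydb.likecol)   # collection:likecol  (the variable `likecol` is ignored)
print(mydb[liketbl])  # collection:42Likes  (uses the variable's value)
```

With the real driver, the drop would be written as `mydb[liketbl].drop()` or `mydb.drop_collection(liketbl)`. Note also that `Collection.drop()` returns None even when it succeeds, so printing its result does not show whether anything was dropped.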
I'm learning Django and I want to feed my Django DB from the https://pokeapi.co API so I can make an up-to-date HTML drop-down list with every Pokémon name.
fetchnames.py
import requests as r
def nameslist():
    payload = {'limit': 809}
    listpokemons = []
    response = r.get('https://pokeapi.co/api/v2/pokemon', params=payload)
    pokemons = response.json()
    for line in pokemons['results']:
        listpokemons.append(line['name'])
    return listpokemons
### Function that requests data from the API and returns a list of Pokémon names (['bulbasaur', 'ivysaur', ...])
core_app/management/commands/queryapi.py
from django.core.management.base import BaseCommand
from core_app.models import TablePokemonNames
from core_app.fetchnames import nameslist
class FetchApi(BaseCommand):
    help = "Update DB with https://pokeapi.co/"
    def add_model_value(self):
        table = TablePokemonNames()
        table.names = nameslist()
        table.save()
core_app/models.py
from django.db import models
class TablePokemonNames(models.Model):
    id = models.AutoField(primary_key=True)
    names = models.CharField(max_length=100)
I'm pretty sure I'm missing a lot, since I'm still learning Django and I'm still confused about how to use Django management commands. I tried to make a Django command with the nameslist() function and nothing happened in the DB. Is there something wrong with using a list to feed a DB?
I am writing my Python API using Flask. This API accepts only one parameter, called questionID. I would like it to accept a second parameter called lastDate. I looked around for how to add this parameter but couldn't find a good way to do it. My current code looks as follows:
from flask import Flask, request
from flask_restful import Resource, Api, reqparse
from sqlalchemy import create_engine
from json import dumps
from flask_jsonpify import jsonify
import psycopg2
from pandas import read_sql
connenction_string = "DB Credentials'";
app = Flask(__name__)
api = Api(app)
class GetUserAnswers(Resource):
    def get(self, questionID):
        conn = psycopg2.connect(connenction_string);
        cursor = conn.cursor();
        userAnswers = read_sql('''
        select * from <tablename> where questionid = ''' + "'" + questionID + "' order by timesansweredincorrectly desc limit 15" +'''
        ''', con=conn)
        conn.commit();
        conn.close();
        result = {}
        for index, row in userAnswers.iterrows():
            result[index] = dict(row)
        return jsonify(result)
api.add_resource(GetUserAnswers, '/GetUserAnswers/<questionID>')
if __name__ == '__main__':
    app.run(port='5002')
Question 1: I'm guessing I can accept the second parameter in the get definition. If this is not true, how should I accept the second parameter?
Question 2: How do I modify the api.add_resource() call to accept the second parameter?
Question 3: I currently use http://localhost:5002/GetUserAnswers/<some question ID> to call this API from the browser. How would this call change with a second parameter?
I have never developed an API before, so any help would be much appreciated.
If you want to add multiple parameters within the URL path, for example:
http://localhost:5002/GetUserAnswers/<question_id>/answers/<answer_id>
Then you need to add multiple parameters to your get method (and use the same placeholders in the route you pass to api.add_resource):
def get(self, question_id, answer_id):
    ...  # your code here
But if you instead want to add multiple query parameters to the URL, for example:
http://localhost:5002/GetUserAnswers/<question_id>?lastDate=2020-01-01&totalCount=10
Then you can use request arguments:
def get(self, question_id):
    lastDate = request.args.get('lastDate')
    totalCount = request.args.get('totalCount')
    # your code here
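To see how such a URL splits into a path parameter and query parameters, here is a stdlib-only sketch using `urllib.parse`; it only mimics what Flask does for you and is not Flask code. The question ID 8888 is a made-up value. The key point is that query values arrive as strings, so a numeric value like totalCount needs converting; in Flask, `request.args.get('totalCount', type=int)` does that conversion.

```python
from urllib.parse import urlsplit, parse_qs

# Example URL from the answer, with a made-up question ID.
url = "http://localhost:5002/GetUserAnswers/8888?lastDate=2020-01-01&totalCount=10"

parts = urlsplit(url)
question_id = parts.path.rsplit("/", 1)[-1]  # path parameter -> get(self, question_id)
args = parse_qs(parts.query)                 # query string -> roughly what request.args holds

last_date = args["lastDate"][0]              # '2020-01-01' (still a string)
total_count = int(args["totalCount"][0])     # values arrive as str, so convert explicitly

print(question_id, last_date, total_count)   # 8888 2020-01-01 10
```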
Consider several adjustments to your code:
For a simple implementation like yours, use Flask's route decorators and avoid having to define and register a Resource class;
Use parameterized SQL and avoid the potentially dangerous (SQL injection) and messy string concatenation;
Avoid the heavy data analytics library pandas and its inefficient row-by-row iterrows loop. Instead, handle everything with the cursor object, specifically psycopg2's DictCursor.
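A self-contained sketch of the parameterization point, using the standard library's `sqlite3` in place of psycopg2 so it runs without a PostgreSQL server (sqlite3 uses `?` placeholders where psycopg2 uses `%s`). The `answers` table and its rows are made up for illustration. The rows are turned into dicts via `cursor.description`, which is roughly what DictCursor gives you.

```python
import sqlite3

# Hypothetical in-memory table standing in for <tablename>.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (questionid TEXT, lastdate TEXT, timesansweredincorrectly INTEGER)")
conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", [
    ("8888", "2020-01-08", 3),
    ("8888", "2020-01-08", 7),
    ("9999", "2020-01-08", 1),
])


def get_user_answers(conn, question_id, last_date):
    # The driver binds the values; they are never spliced into the SQL text.
    cur = conn.execute(
        "SELECT * FROM answers WHERE questionid = ? AND lastdate = ? "
        "ORDER BY timesansweredincorrectly DESC LIMIT 15",
        (question_id, last_date),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


rows = get_user_answers(conn, "8888", "2020-01-08")
print([r["timesansweredincorrectly"] for r in rows])  # [7, 3]

# An injection attempt is treated as plain data, so it simply matches nothing.
print(get_user_answers(conn, "8888' OR '1'='1", "2020-01-08"))  # []
```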
Refactored Python code (adjust assumption of how to use lastDate):
# ... same imports as before, minus the heavy pandas ...
import psycopg2.extras  # DictCursor lives in this submodule
app = Flask(__name__)
@app.route('/GetUserAnswers', methods=['GET'])
def GetUserAnswers():
    questionID = request.args.get('questionID', None)
    lastDate = request.args.get('lastDate', None)
    conn = psycopg2.connect(connenction_string)
    cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    userAnswers = '''SELECT * FROM <tablename>
                     WHERE questionid = %s
                       AND lastdate = %s
                     ORDER BY timesansweredincorrectly DESC
                     LIMIT 15
                  '''
    # EXECUTE SQL WITH PARAMS
    cur.execute(userAnswers, (questionID, lastDate))
    # SAVE TO LIST OF DICTIONARIES
    result = [dict(row) for row in cur.fetchall()]
    cur.close()
    conn.close()
    return jsonify(result)
if __name__ == '__main__':
    app.run(port='5002')
Browser Call
http://localhost:5002/GetUserAnswers?questionID=8888&lastDate=2020-01-08
I have some code that automatically generates a bunch of different SQL queries that I would like to insert into BigQuery to generate views. One of the issues I have is that these views need to be generated dynamically every night because of the changing nature of the data. So what I would like to do is use the Google BigQuery API for Python to make a view. I understand how to do it using the 'bq' command-line tool, but I'd like to have this built directly into the code instead of using a shell to run bq. I have played with the code provided at
https://cloud.google.com/bigquery/bigquery-api-quickstart
I don't understand how to use this bit of code to create a view instead of just returning the results of a SELECT statement. I can see the documentation about doing table inserts here
https://cloud.google.com/bigquery/docs/reference/v2/tables/insert
but that refers to using the REST API to generate new tables as opposed to the example provided above.
Is it just not possible? Should I just give in and use bq?
Thanks
*** Some additional questions in response to Felipe's comments.
The table resource document indicates that there are a number of required fields, some of which make sense even if I don't fully understand what they're asking for, others do not. For example, externalDataConfiguration.schema. Does this refer to the schema for the database that I'm connecting to (I assume it does), or the schema for storing the data?
What about externalDataConfiguration.sourceFormat? Since I'm trying to make a view of a pre-existing database, I'm not sure I understand how the source format is relevant. Is it the source format of the database I'm making a view from? How would I identify that?
And externalDataConfiguration.sourceUris[]: I'm not importing new data into the database, so I don't understand how this (or the previous element) is required.
What about schema?
tableReference.datasetId, tableReference.projectId, and tableReference.tableId are self explanatory.
Type would be view, and view.query would be the actual sql query used to make the view. So I get why those are required for making a view, but I don't understand the other parts.
Can you help me understand these details?
Thanks,
Brad
Using https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
Submit something like the following, assuming you add the authorization:
{
  "view": {
    "query": "select column1, count(1) from `project.dataset.someTable` group by 1",
    "useLegacySql": false
  },
  "tableReference": {
    "tableId": "viewName",
    "projectId": "projectName",
    "datasetId": "datasetName"
  }
}
Alternatively, you can do it in Python, assuming you have a service account key set up and the environment variable GOOGLE_APPLICATION_CREDENTIALS=/path/to/my/key set. The one caveat is that, as far as I can tell, this can only create views using legacy SQL, and by extension they can only be queried using legacy SQL, whereas the direct API method allows either legacy or standard SQL.
from google.cloud import bigquery
def create_view(dataset_name, view_name, project, viewSQL):
    bigquery_client = bigquery.Client(project=project)
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(view_name)
    table.view_query = viewSQL
    try:
        table.create()
        return True
    except Exception as err:
        print(err)
        return False
Note: this changed a little bit with 0.28.0 of the library - see the following for further details:
Google BigQuery: creating a view via Python google-cloud-bigquery version 0.27.0 vs. 0.28.0
My example function:
# create a view via python
from google.cloud import bigquery
from google.cloud.bigquery import Table
def create_view(dataset_name, view_name, sqlQuery, project=None):
    try:
        bigquery_client = bigquery.Client(project=project)
        dataset_ref = bigquery_client.dataset(dataset_name)
        table_ref = dataset_ref.table(view_name)
        table = Table(table_ref)
        table.view_query = sqlQuery
        table.view_use_legacy_sql = False
        bigquery_client.create_table(table)
        return True
    except Exception as e:
        errorStr = 'ERROR (create_view): ' + str(e)
        print(errorStr)
        raise
Everything the web UI or the bq tool does goes through the BigQuery API, so don't give up yet :).
Creating a view is akin to creating a table, just be sure to have a table resource that contains a view property when you call tables.insert().
https://cloud.google.com/bigquery/querying-data#views
https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
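A minimal sketch of such a table resource, built as a plain Python dict with no client library; it mirrors the JSON request body shown earlier in this thread, and the project, dataset, view, and table names are placeholders. As far as I understand the Table resource reference, the fields Brad asked about (schema, externalDataConfiguration.*) are not needed for a view: the view's schema comes from its query, and externalDataConfiguration describes external data sources. The dict would be sent as the body of a tables.insert request by whatever authorized HTTP client you use.

```python
import json


def view_table_resource(project_id, dataset_id, view_id, query, use_legacy_sql=False):
    """Build a minimal Table resource describing a view (for tables.insert)."""
    return {
        # Identifies where the view will live.
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": view_id,
        },
        # The view property is what makes this a view rather than a plain table.
        "view": {"query": query, "useLegacySql": use_legacy_sql},
    }


body = view_table_resource(
    "projectName", "datasetName", "viewName",
    "SELECT column1, COUNT(1) FROM `projectName.datasetName.someTable` GROUP BY 1",
)
print(json.dumps(body, indent=2))
```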
bigquery.version -> '1.10.0'
def create_view(client, dataset_name, view_name, view_query):
    try:
        dataset_ref = client.dataset(dataset_name)
        view = dataset_ref.table(view_name)
        # view.table_type = 'VIEW'
        view.view_query = view_query
        view.view_query_legacy_sql = False
        client.create_table(view)
        pass
    except Exception as e:
        errorStr = 'ERROR (create_view): ' + str(e)
        print(errorStr)
        raise
This creates a table, not a view!
This is the right code to create a view:
def create_view(client, dataset_name, view_name, view_query):
    try:
        dataset_ref = client.dataset(dataset_name)
        view_ref = dataset_ref.table(view_name)
        table = bigquery.Table(view_ref)
        table.view_query = view_query
        table.view_use_legacy_sql = False
        client.create_table(table)
    except Exception as e:
        errorStr = 'ERROR (create_view): ' + str(e)
        print(errorStr)
        raise
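The line
table = bigquery.Table(view_ref)
is necessary: dataset_ref.table() only returns a TableReference, so the view properties have to be set on a bigquery.Table built from that reference.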