How to get dataset information via the Python API

I'm trying to get dataset information via the Python client libraries. In the BigQuery UI I can see that the created date, data location, etc. are set, but when I try to get them via the API, it just returns None. The documentation says "(None until set from the server)", but since I can see the values in the UI, I assumed (presumably wrongly) that they were set.
Here's my code, what am I doing wrong?
from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset('myDatasetName')
dataset_info = bigquery.dataset.Dataset(dataset_ref)
print(dataset_info.created)  # prints None

You were almost there.
dataset_ref = client.dataset('myDatasetName')
dataset_info = client.get_dataset(dataset_ref)
print(dataset_info.created)
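Once the dataset has been fetched with get_dataset, the other server-populated fields come back as well. As a minimal sketch (the dataset name is a placeholder; the properties shown are from the google-cloud-bigquery client library):
from google.cloud import bigquery

client = bigquery.Client()
dataset_info = client.get_dataset(client.dataset('myDatasetName'))

# These fields are populated by the server once the dataset is fetched.
print(dataset_info.created)      # creation timestamp
print(dataset_info.location)     # data location, e.g. 'US'
print(dataset_info.description)  # None if no description was set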


Better way to import data from REST API to SQL DB using Python?

I've written some Python code to extract data from a REST API and load it into an Azure SQL database. But this process is taking almost half an hour for 20,000 rows. Is there a more efficient way of doing this? I'm thinking maybe extract the data as a JSON file, put it in Blob storage, and then use Azure Data Factory to load the data into SQL, but I have no idea how to code it that way.
def manualJournalLineItems(tenantid):
    endpoint = "api.xro/2.0/manualjournals/?page=1"
    result = getAPI(endpoint, token, tenantid)
    page = 1
    while result['ManualJournals']:
        endpoint = "api.xro/2.0/manualjournals/?page=" + str(page)
        result = getAPI(endpoint, token, tenantid)
        for inv in result['ManualJournals']:
            for li in inv['JournalLines']:
                cursor.execute("INSERT INTO [server].dbo.[Xero_ManualJournalLines](ManualJournalID,AccountID,Description,LineAmount,TaxAmount,AccountCode,Region) VALUES(?,?,?,?,?,?,?)",
                               inv['ManualJournalID'], li['AccountID'], li.get('Description',''), li.get('LineAmount',0), li.get('TaxAmount',0), li.get('AccountCode',0), tenantid)
        conn.commit()
        page = int(page) + 1
If Python is not a mandatory requirement, yes, you can use Data Factory.
You will need to create a pipeline with the following components:
'Copy Data' Activity
Source Dataset (REST API)
Sink Dataset (Azure SQL)
Also, may I know where your REST API is hosted? Is it within Azure, through App Service? If not, you will also need to set up a Self-Hosted Integration Runtime.
You can refer to the steps here, which copy data from Blob storage to Azure SQL.
You can also follow the steps below to create the REST API dataset as a Source.
Create a new pipeline.
Type 'copy' in the 'Activity' search box. Drag the 'Copy Data' activity to the pipeline
Click on 'Source' tab, and click on 'New' to create a new Source Dataset.
Type 'REST' in the 'data source' search box.
In the 'REST' dataset window, click on 'Connection' tab. Click on 'New' to create a linked service to point to the REST API.
Here fill up the credentials to the REST API.
Continue by setting up the Sink dataset to point to the Azure SQL database, then test your pipeline to make sure it works. Hope it helps!
Found the answer: append() the values to a list and insert the list into SQL with executemany().
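For reference, a minimal sketch of that approach, assuming the same getAPI helper, token, cursor and conn objects as in the question (fast_executemany is a pyodbc-specific option; drop it if you use a different driver):
def manualJournalLineItems(tenantid):
    rows = []
    page = 1
    result = getAPI("api.xro/2.0/manualjournals/?page=" + str(page), token, tenantid)
    while result['ManualJournals']:
        for inv in result['ManualJournals']:
            for li in inv['JournalLines']:
                rows.append((inv['ManualJournalID'], li['AccountID'],
                             li.get('Description', ''), li.get('LineAmount', 0),
                             li.get('TaxAmount', 0), li.get('AccountCode', 0), tenantid))
        page += 1
        result = getAPI("api.xro/2.0/manualjournals/?page=" + str(page), token, tenantid)
    # One round trip per batch instead of one INSERT per row.
    cursor.fast_executemany = True  # pyodbc-specific speed-up (assumption); remove otherwise
    cursor.executemany(
        "INSERT INTO [server].dbo.[Xero_ManualJournalLines]"
        "(ManualJournalID,AccountID,Description,LineAmount,TaxAmount,AccountCode,Region) "
        "VALUES(?,?,?,?,?,?,?)", rows)
    conn.commit()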

How to retrieve all documents in a collection from firebase in Python

I am unable to retrieve the documents that are available in my collection inside the Firestore database. Here is my code.
Every time I run this, the console doesn't print anything. I am following the documentation available at https://firebase.google.com/docs/firestore/query-data/get-data, but it doesn't seem to work.
database_2 = firestore.client()
all_users_ref_2 = database_2.collection(u'user').stream()
for users in all_users_ref_2:
    print(u'{} => {}'.format(users.id, users.to_dict()))
Do you have multiple projects? If so, double check that you open a client to the correct project. One quick way to confirm is to pass the project ID to the client:
db = firestore.Client('my-project-id')
It could be an authentication issue; you could download a service account key and point to it at the top of your script.
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/key.json"
or, as mentioned above,
database_2 = firestore.Client("<project ID>")
Make sure Client has a capital C.
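Putting the two suggestions together, a minimal sketch using the google-cloud-firestore client (the key path and project ID are placeholders):
import os
from google.cloud import firestore

# Point the client at a service-account key and an explicit project (both placeholders).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/key.json"
db = firestore.Client(project="my-project-id")

for doc in db.collection(u'user').stream():
    print(u'{} => {}'.format(doc.id, doc.to_dict()))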

Bigquery Python API create partitioned table by specific field

I need to create a table in BigQuery partitioned by a specific field. I have noticed that this only seems to be available via the REST API. Is there a way to do this via the Python API?
Any help?
My guess is that the docs just haven't been updated yet (not that rolling an HTTP request and calling the API would be hard anyway), because if you look at the code of the BigQuery Python client library, it does indeed appear to support specifying the field when creating a partitioned table.
Expanding on Graham Polley's answer: You can set this by setting the time_partitioning property.
Something like this:
import google.cloud.bigquery as bq

bq_client = bq.Client()
dataset = bq_client.dataset('dataset_name')
table = dataset.table('table_name')
table = bq.Table(table, schema=[
    bq.SchemaField('timestamp', 'TIMESTAMP', 'REQUIRED'),
    bq.SchemaField('col_name', 'STRING', 'REQUIRED')])
table.time_partitioning = bq.TimePartitioning(field='timestamp')
bq_client.create_table(table)
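To double-check that the setting took effect, one option is to fetch the table back and inspect its time_partitioning (a sketch reusing the bq_client and names from the snippet above):
# Fetch the newly created table from the server and inspect its partitioning.
created = bq_client.get_table(dataset.table('table_name'))
print(created.time_partitioning)  # should show the 'timestamp' field and DAY type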

Firebase database data to R

I have a database in Google Firebase that has streaming sensor data. I have a Shiny app that needs to read this data and map the sensors and their values.
I am trying to pull the data from Firebase into R, but I couldn't find any package that does this. The app is currently running on locally downloaded data.
I found the fireData package, but have no idea how it works.
I do know that you can pull data from Firebase with Python, but I don't know enough Python to do so; I would be willing to call it from R with rPython if necessary.
I have:
- The Firebase project link
- The username
- The password
Has anyone tried Firebase and R / Shiny in the past?
I hope my question is clear enough.
The basics to get started with the R package fireData are as follows. First, make sure that you have set up a Firebase account on GCP (Google Cloud Platform). Once there, set up a new project and give it a name.
Now that you have a project select the option on the overview page that says "Add Firebase to your web app". It will give you all the credential information you need.
One way of dealing with this kind of information in R is to add it to an .Renviron file so that you do not need to share it with your code (for example if it goes to github). There is a good description about how to manage .Renviron files in the Efficient R Programming Book.
API_KEY=AIzaSyBxxxxxxxxxxxxxxxxxxxLwX1sCBsFA
AUTH_DOMAIN=stackoverflow-1c4d6.firebaseapp.com
DATABASE_URL=https://stackoverflow-1c4d6.firebaseio.com
PROJECT_ID=stackoverflow-1c4d6
This will be available to your R session after you restart R (if you have made any changes).
So now you can try it out. But first, change the rules of your Firebase database to allow anyone to read and write (for these examples to work).
Now you can run the following examples
library(fireData)
api_key <- Sys.getenv("API_KEY")
db_url <- Sys.getenv("DATABASE_URL")
project_id <- Sys.getenv("PROJECT_ID")
project_domain <- Sys.getenv("AUTH_DOMAIN")
upload(x = mtcars, projectURL = db_url, directory = "new")
The upload function will return the name of the document it saved, which you can then use to download it.
> upload(x = mtcars, projectURL = db_url, directory = "main")
[1] "main/-L3ObwzQltt8IKjBVgpm"
The data frame (or vector of values) you uploaded will be available in your Firebase Database console immediately under that name, so you can verify that everything went as expected.
Now, for instance, if the name that was returned read main/-L3ObwzQltt8IKjBVgpm then you can download it as follows.
download(projectURL = db_url, fileName = "main/-L3ObwzQltt8IKjBVgpm")
You can require authentication once you have created users. For example, you can create users like so (the users appear in your Firebase console).
createUser(projectAPI = api_key, email = "test@email.com", password = "test123")
You can then get their user information and token.
registered_user <- auth(api_key, email = "test@email.com", password = "test123")
And then use the tokenID that is returned to access the files.
download(projectURL = db_url, fileName = "main/-L3ObwzQltt8IKjBVgpm",
         secretKey = api_key,
         token = registered_user$idToken)

Extract metadata about table using BigQuery Client API

I have a table in a BigQuery dataset and I'm trying to find out when the table was last modified via the BigQuery client API.
I have tried (in Python)
from gcloud import bigquery
client = bigquery.Client(project="my_project")
dataset = client.dataset("my_dataset")
tables = dataset.list_tables()
table = tables[0][5] # Extract the table that I want
I can check that I've got the right table by running print(table.name), however I don't know how to get the table metadata. In particular, I want to know how to find out when the table was last modified.
Although I've written the above in Python (I'm more familiar with it than with other programming languages), I don't mind if the answer is in Python or JavaScript (I think I'm going to have to implement it in the latter).
Under the hood, tables = dataset.list_tables() is making an API request to Tables.list. The result of this request does not contain all of the table metadata, such as last modified.
The Tables.get API request is needed for this type of table information. To make this request you need to call reload() on the table. For example:
bigquery_service = bigquery.Client()
dataset = bigquery_service.dataset("<your-dataset>")
tables = dataset.list_tables()
for table in tables:
    table.reload()
    print(table.modified)
In my test dataset, this prints:
2016-12-30 08:57:15.679000+00:00
2016-12-18 23:57:24.570000+00:00
2016-12-19 05:18:28.371000+00:00
See the client library source on GitHub and the Python docs for more details.
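As a side note, in newer versions of the google-cloud-bigquery library, listing and fetching moved onto the client itself, so the reload() call is replaced by get_table(); roughly (project and dataset names are placeholders):
from google.cloud import bigquery

client = bigquery.Client(project="my_project")
for item in client.list_tables("my_dataset"):    # lightweight TableListItem objects
    table = client.get_table(item.reference)     # full metadata, i.e. a Tables.get call
    print(table.table_id, table.modified)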
