Automatically Importing Data From a Website in Google Cloud - python

I am trying to find a way to automatically update a BigQuery table using this link: https://www6.sos.state.oh.us/ords/f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:1
The file behind this link is updated with new data every week, and I want to replace the BigQuery table with the new data each time. I have read that you can export spreadsheets to BigQuery, but that is not a streamlined approach.
How would I go about deploying a script that imports the data and feeds it to BigQuery?

I assume you already have a working script that parses the content of the URL and places the contents in BigQuery. Based on that I would recommend the following workflow:
Upload the script as a Google Cloud Function. If your script isn't written in a supported language (e.g. Python, Node.js, Go), you can use Google Cloud Run instead. Set the Cloud Function to be triggered by a Pub/Sub message. In this scenario, the content of your Pub/Sub message doesn't matter.
Set up a Google Cloud Scheduler job to (a) run at 12am every Saturday (or whatever time you wish) and (b) send a dummy message to the Pub/Sub topic that your Cloud Function is subscribed to.
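As a rough illustration of that workflow, here is a minimal sketch of a Pub/Sub-triggered entry point, assuming a 1st gen Python Cloud Function with the (event, context) background-function signature. The names refresh_table and run_import are hypothetical; run_import stands in for your existing script. A Cloud Scheduler cron expression of "0 0 * * 6" would correspond to 12am every Saturday.

```python
import base64


def run_import():
    """Placeholder for the existing download-and-load script (hypothetical)."""
    return "import finished"


def decode_message(event):
    """Return the Pub/Sub payload as text, or an empty string if there is none."""
    data = event.get("data")
    return base64.b64decode(data).decode("utf-8") if data else ""


def refresh_table(event, context):
    """Entry point: runs whenever a message arrives on the subscribed topic."""
    # The payload is only logged; the dummy message exists purely as a trigger.
    payload = decode_message(event)
    print(f"Triggered by Pub/Sub, payload={payload!r}")
    return run_import()
```

Because the message content is ignored, the Scheduler job can publish any non-empty string to the topic.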

You can try sending an HTTP request to the page from a programming language like Python with the Requests library, saving the data into a pandas DataFrame or a CSV file, and then pushing that data into a BigQuery table with the BigQuery client libraries.
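A minimal sketch of that idea follows, with some swaps worth naming: it uses the standard library's urllib rather than Requests, and it loads the downloaded bytes directly instead of going through pandas. It assumes the link returns a plain CSV file with a header row, which is not confirmed here, and the table ID is a hypothetical placeholder. The load step needs the google-cloud-bigquery package and default credentials.

```python
import csv
import io
import urllib.request

SOURCE_URL = (
    "https://www6.sos.state.oh.us/ords/"
    "f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:1"
)
TABLE_ID = "my-project.my_dataset.voters"  # hypothetical destination table


def download(url=SOURCE_URL, timeout=300):
    """Fetch the raw file contents over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def preview_header(raw_bytes, encoding="utf-8"):
    """Return the first CSV row, useful as a sanity check before loading."""
    text = raw_bytes.decode(encoding, errors="replace")
    return next(csv.reader(io.StringIO(text)), [])


def load_csv_to_bigquery(raw_bytes, table_id=TABLE_ID):
    """Replace the table contents with the CSV data and return the row count."""
    # Third-party dependency, imported here so the helpers above work without it.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes a single header row
        autodetect=True,  # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    job = client.load_table_from_file(
        io.BytesIO(raw_bytes), table_id, job_config=job_config
    )
    job.result()  # wait for the load job to finish
    return client.get_table(table_id).num_rows
```

WRITE_TRUNCATE is what makes each weekly run replace the table rather than append to it.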

Related

Google BigQuery + Python

I need to do an exploratory analysis using python over two tables that are in a Google BigQuery database.
The only thing I was provided is a JSON file containing some credentials.
How can I access the data using this JSON file?
This is the first time I have tried to do something like this, so I have no idea how to do it.
I tried reading different tutorials and documentation, but nothing worked.

How do I extract data from Data Layer via Google Tag Manager/Analytics API

A data layer has been deployed to my website. However, I'm struggling to find a way to extract the data held in Google Tag Manager so that it can be used in Power BI, etc.
Preferably, I would like to use Python to create this ETL pipeline. Can you give me a direction to look into?
Google Tag Manager in general seems to be used for configuring the data layer, variables, accounts, etc., with no clear way to extract the data.
Google Tag Manager does not hold the data. It's a tag manager, which means it contains JavaScript snippets known as tags that send the data to Google Analytics. The data layer is the JS object that exists on your web platform and contains metadata that you push from the backend/frontend.
The data is actually collected in Google Analytics.
So, if you want to pull the data from Google Analytics, your options depend on which version you use:
If you are a GA360 premium customer: you can use the BigQuery data dump to export that data from BQ to your internal data warehouse. You can unnest the structure of the data (GA3) and get the pipeline started. You can also use a direct BQ connector in Power BI.
If you use free GA (Universal Analytics): your only option is to use the GA API to pull the data into the warehouse.
If you use GA4: GA4 has a BigQuery connection for the free version as well, so you can use BQ to pull the data or connect BQ as a connector in Power BI.
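For the GA4 route, a small sketch of pulling data from the BigQuery export in Python might look like the following. It assumes the standard GA4 export layout of daily events_YYYYMMDD tables in a dataset named analytics_<property id>; the project and dataset names are placeholders, and running the query needs the google-cloud-bigquery package.

```python
from datetime import datetime


def ga4_event_counts_sql(project, dataset, start, end):
    """Build a query counting events per event_name between two YYYYMMDD dates."""
    for value in (start, end):
        datetime.strptime(value, "%Y%m%d")  # raises ValueError on a bad date
    return (
        "SELECT event_name, COUNT(*) AS events\n"
        f"FROM `{project}.{dataset}.events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'\n"
        "GROUP BY event_name\n"
        "ORDER BY events DESC"
    )


def run_query(sql):
    """Execute the query and return the rows as a list of dicts."""
    # Third-party dependency, imported lazily so the SQL builder works alone.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]
```

The resulting rows can then be written to whatever store Power BI reads from, or Power BI can query BigQuery directly as described above.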
Great Question.
Please refer to the documentation on
https://developers.google.com/tag-platform/tag-manager/web/datalayer

Is there a way to serverlessly update a big aggregated table daily?

I am trying to build a big aggregated table with Google's tools, but I am a bit lost on how to do it.
Here is what I would like to create: I have a big table in BigQuery. It's updated daily with about 1.2M events for every user of the application. I would like to have an auto-updating aggregate table (updated once every day) built upon that, with all user data broken down by userID. But how do I continuously update the data inside of it?
I read a bit about Firebase and BigQuery, but since they are very new to me I can't figure out if this is possible to do serverlessly.
I know how to do it with a Jenkins process (in Python) that queries the big events table for the last day, gets all userIDs, joins with the existing aggregate values for those userIDs, takes whatever has changed, and deletes it from the aggregate in order to insert the updated data.
The issue is I want to do this entirely within Google's infrastructure. Is Firebase able to do that? Is BigQuery able? How? What tools? Could this be solved using the serverless functions available?
I am more familiar with Redshift.
You can use a fairly new BigQuery feature: scheduled queries.
I use it to create rollup tables. If you need something more custom, you can use Cloud Scheduler to call any Google product that can be triggered by an HTTP request, such as Cloud Functions, Cloud Run, or App Engine.
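A rough sketch of creating such a scheduled rollup from Python is shown below, assuming the google-cloud-bigquery-datatransfer package. The column names user_id and event_timestamp, the table names, and the display name are all hypothetical. It rebuilds the whole aggregate on each run (WRITE_TRUNCATE) rather than merging only the last day's changes, which is the simpler variant.

```python
def rollup_sql(events_table):
    """Aggregate the events table per user (column names are assumptions)."""
    return (
        "SELECT user_id,\n"
        "       COUNT(*) AS event_count,\n"
        "       MAX(event_timestamp) AS last_event_at\n"
        f"FROM `{events_table}`\n"
        "GROUP BY user_id"
    )


def scheduled_query_params(events_table, destination_table):
    """Parameters for a scheduled query that overwrites the rollup table."""
    return {
        "query": rollup_sql(events_table),
        "destination_table_name_template": destination_table,
        "write_disposition": "WRITE_TRUNCATE",
    }


def create_daily_rollup(project_id, dataset_id, events_table, destination_table):
    """Register the scheduled query with the BigQuery Data Transfer Service."""
    # Third-party dependency, imported lazily so the helpers above work alone.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=dataset_id,
        display_name="daily user rollup",
        data_source_id="scheduled_query",
        params=scheduled_query_params(events_table, destination_table),
        schedule="every 24 hours",
    )
    return client.create_transfer_config(
        parent=client.common_project_path(project_id),
        transfer_config=config,
    )
```

The same query can also be pasted into the BigQuery console and scheduled from there, with no code at all.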

Unable to get campaigns data to push to database in google adwords api

I am facing a couple of issues that I cannot figure out in spite of the humongous documentation:
1. Which report type should be used to get campaign-level totals? I am trying to get the data with these headers:
-campaign_id|campaign_name|Clicks|Impressions|Cost|Conversions.
2. I have tried to use "CAMPAIGN_PERFORMANCE_REPORT", but I get information broken up at a keyword level, whereas I am trying to pull the data at a campaign level.
3. I also need to push the data to a database. The samples in the API documentation either print the results to my screen or create a file on my machine. Is there a way to get the data as JSON so I can push it to the database?
4.I have 7 accounts on my MCC account as of now, the number will increase in the coming days. I don't want to manually hard code the client customer ids into my code as there will be new accounts which will be created. is there a way where I can get the list of client customer ids which are on my MCC ac
I am trying to get this data using Python and AdWords API v201710.
To retrieve campaign performance data you need to run a CAMPAIGN_PERFORMANCE_REPORT. Follow this link to view all available columns for the Campaign Performance report.
The Campaign Performance report does not include stats aggregated at a keyword level. Are you using AWQL to pull your report?
Can you paste your code here? I find it odd that you are getting keyword-level data.
Run this Python example code to get campaign data (you should definitely not be getting keyword-level data with this example code).
Firstly, the Google AdWords API only returns report data in the following file formats: CSVFOREXCEL, CSV, TSV, XML, GZIPPED_CSV, GZIPPED_XML. Unfortunately JSON is not supported for your use case. I would recommend GZIPPED_CSV with the following properties set to true:
skipReportHeader
skipColumnHeader
skipReportSummary
This skips all headers, report titles and totals in the report, making it very simple to upsert the data into a table.
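As an illustration, a headerless GZIPPED_CSV report could be parsed and upserted roughly like this. The sketch assumes the report columns arrive in the order requested in the question, that numeric values come without thousands separators, and that Cost is reported in micros. SQLite stands in for the target database, and the table name campaign_stats is hypothetical.

```python
import csv
import gzip
import io
import sqlite3


def parse_report(gzipped_bytes):
    """Turn a headerless gzipped CSV report into typed row tuples."""
    text = gzip.decompress(gzipped_bytes).decode("utf-8")
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # ignore blank lines
        campaign_id, name, clicks, impressions, cost, conversions = record
        rows.append(
            (int(campaign_id), name, int(clicks), int(impressions),
             int(cost), float(conversions))
        )
    return rows


def upsert_rows(conn, rows):
    """Insert rows, replacing any existing row with the same campaign_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS campaign_stats ("
        "campaign_id INTEGER PRIMARY KEY, campaign_name TEXT, clicks INTEGER, "
        "impressions INTEGER, cost_micros INTEGER, conversions REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO campaign_stats VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
```

Keying the table on campaign_id means re-running the same report overwrites rows instead of duplicating them.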
It is not possible to enter an MCC ID and expect the API to fetch a report for all client accounts. Each API report request contains the client ID, so you are required to create an array of all client IDs and then iterate through each ID. If you are using the client library (recommended) then you can simply set the client ID within the session, i.e. session.setClientCustomerId("xxx");
To automate this, use the ManagedCustomerService to retrieve all client IDs and then iterate through them, so you do not need to hard-code each client ID. Google has created a handy Python file which returns the account hierarchy including child account IDs (click here).
Lastly, based on your question I assume you are attempting to run an ETL process. Google has an open-source AdWords extractor which I highly recommend.
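A minimal sketch of that iteration, modelled loosely on the paging pattern used in the googleads Python examples, is shown below. The function name is hypothetical, and it assumes the service's get() call returns a page that exposes entries and totalNumEntries by key, which should be checked against the library version you use.

```python
PAGE_SIZE = 500


def list_client_customer_ids(managed_customer_service, page_size=PAGE_SIZE):
    """Collect IDs of all non-manager accounts under the MCC, page by page."""
    offset = 0
    customer_ids = []
    while True:
        selector = {
            "fields": ["CustomerId", "Name", "CanManageClients"],
            "paging": {
                "startIndex": str(offset),
                "numberResults": str(page_size),
            },
        }
        page = managed_customer_service.get(selector)
        entries = page["entries"] if "entries" in page and page["entries"] else []
        for account in entries:
            # Skip manager (MCC) accounts; reports are run on client accounts.
            if not account["canManageClients"]:
                customer_ids.append(account["customerId"])
        offset += page_size
        if offset >= int(page["totalNumEntries"]):
            break
    return customer_ids


# Assumed usage with the googleads library (not verified against your setup):
#   client = adwords.AdWordsClient.LoadFromStorage()
#   service = client.GetService("ManagedCustomerService", version="v201710")
#   for customer_id in list_client_customer_ids(service):
#       client.SetClientCustomerId(customer_id)
#       ...download the campaign report for this account...
```

New accounts added to the MCC are picked up automatically on the next run, since the list is fetched each time.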

error in retrieving tables in unicode data using Azure/Python

I'm using Azure and the python SDK.
I'm using Azure's table service API for DB interaction.
I've created a table which contains data in Unicode (Hebrew, for example). Creating tables and setting the data in Unicode seems to work fine. I'm able to view the data in the database using Azure Storage Explorer and the data is correct.
The problem is when retrieving the data. Whenever I retrieve a specific row, data retrieval works fine for Unicode data:
table_service.get_entity("some_table", "partition_key", "row_key")
However, when trying to get a number of records using a filter, an encode exception is thrown for any row that has non-ascii chars in it:
tasks = table_service.query_entities('some_table', "PartitionKey eq 'partition_key'")
Is this a bug in the Azure Python SDK? Is there a way to set the encoding beforehand so that it won't crash? (Azure doesn't give access to sys.setdefaultencoding, and using DEFAULT_CHARSET in settings.py doesn't work either.)
I'm using https://www.windowsazure.com/en-us/develop/python/how-to-guides/table-service/ as reference to the table service API
Any idea would be greatly appreciated.
This looks like a bug in the Python library to me. I whipped up a quick fix and submitted a pull request on GitHub: https://github.com/WindowsAzure/azure-sdk-for-python/pull/59.
As a workaround for now, feel free to clone my repo (remembering to check out the dev branch) and install it via pip install <path-to-repo>/src.
Caveat: I haven't tested my fix very thoroughly, so you may want to wait for the Microsoft folks to take a look at it.
