I created a Python function for an API call so I no longer have to do that in Power BI. It creates 5 XML files that are then combined into a single CSV file. I would like the function to run on Google Cloud (correct me if this is not a good idea).
I don't think it's possible to create XML files in the function (maybe it's possible to write to a bucket), but ideally I would like to skip the XML file creation and go straight to creating the CSV.
Please find the code for generating the XML files and combining into CSV below:
offices = ['NL001', 'NL002', 'NL003', 'NL004', 'NL005']
# For each office: log in, switch to that office, and create a separate XML file
for office in offices:
    xmlfilename = office+'.xml'
    session.service.SelectCompany(office, _soapheaders={'Header': auth_header})
    proces_url = cluster + r'/webservices/processxml.asmx?wsdl'
    proces = Client(proces_url)
    response = proces.service.ProcessXmlString(query.XML_String, _soapheaders={'Header': auth_header})
    f = open(xmlfilename, 'w')
    f.write(response)
    f.close()
# to csv
if os.path.exists('CombinedFinance.csv'):
    os.remove('CombinedFinance.csv')
else:
    print("The file does not exist")
xmlfiles = ['NL001.xml','NL002.xml','NL003.xml','NL004.xml','NL005.xml']
for xmlfile in xmlfiles:
    with open(xmlfile, encoding='windows-1252') as xml_toparse:
        tree = ET.parse(xml_toparse)
        root = tree.getroot()
        columns = [element.attrib['label'] for element in root[0]]
        columns.append('?')
        data = [[field.text for field in row] for row in root[1::]]
        df = pd.DataFrame(data, columns=columns)
        df = df.drop('?', axis=1)
        df.to_csv('CombinedFinance.csv', mode='a', header=not os.path.exists('CombinedFinance.csv'))
Any ideas?
N.B. If I can improve my code, please let me know; I'm just learning all of this.
EDIT: In response to some comments, code now looks like this. When deploying to cloud I get the following error:
ERROR: (gcloud.functions.deploy) OperationError: code=13, message=Function deployment failed due to a health check failure. This usually indicates that your code was built successfully but failed during a test execution. Examine the logs to determine the cause. Try deploying again in a few minutes if it appears to be transient.
My requirements.txt looks like this:
zeep==3.4.0
pandas
Any ideas?
import pandas as pd
import xml.etree.ElementTree as ET
from zeep import Client
import query
import authentication
import os
sessionlogin = r'https://login.twinfield.com/webservices/session.asmx?wsdl'
login = Client(sessionlogin)
auth = login.service.Logon(authentication.username, authentication.password, authentication.organisation)
auth_header = auth['header']['Header']
cluster = auth['body']['cluster']
#Use cluster to create a session:
url_session = cluster + r'/webservices/session.asmx?wsdl'
session = Client(url_session)
#Select a company for the session:
offices = ['NL001', 'NL002', 'NL003', 'NL004', 'NL005']
# For each office: log in, switch to that office, and append its rows to the CSV
for office in offices:
    session.service.SelectCompany(office, _soapheaders={'Header': auth_header})
    proces_url = cluster + r'/webservices/processxml.asmx?wsdl'
    proces = Client(proces_url)
    response = proces.service.ProcessXmlString(query.XML_String, _soapheaders={'Header': auth_header})
    treetje = ET.ElementTree(ET.fromstring(response))
    root = treetje.getroot()
    columns = [element.attrib['label'] for element in root[0]]
    columns.append('?')
    data = [[field.text for field in row] for row in root[1::]]
    df = pd.DataFrame(data, columns=columns)
    df = df.drop('?', axis=1)
    df.to_csv('/tmp/CombinedFinance.csv', mode='a', header=not os.path.exists('/tmp/CombinedFinance.csv'))
A few things to consider about turning a regular Python script (what you have here) into a Cloud Function:
Cloud Functions respond to events -- either an HTTP request or some other background trigger. You should think about the question "what is going to trigger my function?"
HTTP functions take in a request that corresponds to the incoming request, and must return some sort of HTTP response
The only available part of the filesystem that you can write to is /tmp. You'll have to write all files there during the execution of your function
The filesystem is ephemeral. You can't expect files to stick around between invocations. Any file you create must either be stored elsewhere (like in a GCS bucket) or returned in the HTTP response (if it's an HTTP function)
A Cloud Function has a very specific signature that you'll need to wrap your existing business logic in:
def my_http_function(request):
    # business logic here
    ...
    return "This is the response", 200

def my_background_function(event, context):
    # business logic here
    ...
    # No return necessary
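As a rough illustration of the points above, the sketch below wraps the XML-to-CSV step in an HTTP function that builds the CSV in memory and returns it in the response, so nothing has to be written to disk. It uses only the standard library instead of pandas. `fetch_office_responses`, `responses_to_csv`, and `SAMPLE_RESPONSE` are hypothetical names, and the sample XML only imitates the layout implied by the question's code (first child holds the column labels, the remaining children are rows); the real Twinfield response may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample response. The element names are made up; only the
# layout (first child = column definitions, other children = rows) follows
# the code in the question.
SAMPLE_RESPONSE = (
    '<browse>'
    '<th><td label="Office"/><td label="Amount"/></th>'
    '<tr><td>NL001</td><td>10.00</td><key>extra</key></tr>'
    '</browse>'
)


def fetch_office_responses():
    """Placeholder for the SOAP calls; returns one XML string per office."""
    return [SAMPLE_RESPONSE, SAMPLE_RESPONSE]


def responses_to_csv(xml_strings):
    """Combine several XML responses into one CSV string, entirely in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    header_written = False
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string)
        columns = [element.attrib["label"] for element in root[0]]
        if not header_written:
            writer.writerow(columns)
            header_written = True
        for row in root[1:]:
            # Keep only as many fields as there are labelled columns; this
            # mirrors dropping the extra '?' column in the question.
            writer.writerow([field.text for field in row][: len(columns)])
    return buffer.getvalue()


def my_http_function(request):
    # The request argument is unused here; it is part of the HTTP signature.
    csv_text = responses_to_csv(fetch_office_responses())
    return csv_text, 200, {"Content-Type": "text/csv"}
```

If the CSV has to outlive the request, the same string could instead be uploaded to a bucket rather than returned.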
I'm trying to build my first cloud function. It's a function that should get data from an API, transform it to a DataFrame and push it to BigQuery. I've set the cloud function up with an HTTP trigger using validate_http as the entry point. The problem is that it reports the function as working, but it doesn't actually write anything. It's a similar problem to the one discussed here: Passing data from http api to bigquery using google cloud function python
import pandas as pd
import json
import requests
from pandas.io import gbq
import pandas_gbq
import gcsfs

#function 1: Responding and validating any HTTP request
def validate_http(request):
    request_json = request.get_json()
    if request.args:
        get_api_data()
        return f'Data pull complete'
    elif request_json:
        get_api_data()
        return f'Data pull complete'
    else:
        get_api_data()
        return f'Data pull complete'

#function 2: Get data and transform
def get_api_data():
    import pandas as pd
    import requests
    import json
    #Setting up variables with tokens
    base_url = "https://"
    token= "&token="
    token2= "&token="
    fields = "&fields=date,id,shippingAddress,items"
    date_filter = "&filter=date in '2022-01-22'"
    data_limit = "&limit=99999999"
    #Performing API call on request with variables
    def main_requests(base_url,token,fields,date_filter,data_limit):
        req = requests.get(base_url + token + fields +date_filter + data_limit)
        return req.json()
    #Making API Call and storing in data
    data = main_requests(base_url,token,fields,date_filter,data_limit)
    #transforming the data
    df = pd.json_normalize(data['orders']).explode('items').reset_index(drop=True)
    items = df['items'].agg(pd.Series)[['id','itemNumber','colorNumber', 'amount', 'size','quantity', 'quantityReturned']]
    df = df.drop(columns=[ 'items', 'shippingAddress.id', 'shippingAddress.housenumber', 'shippingAddress.housenumberExtension', 'shippingAddress.address2','shippingAddress.name','shippingAddress.companyName','shippingAddress.street', 'shippingAddress.postalcode', 'shippingAddress.city', 'shippingAddress.county', 'shippingAddress.countryId', 'shippingAddress.email', 'shippingAddress.phone'])
    df = df.rename(columns=
        {'date' : 'Date',
        'shippingAddress.countryIso' : 'Country',
        'id' : 'order_id'})
    df = pd.concat([df, items], axis=1, join='inner')
    #Push data function
    bq_load('Return_data_api', df)

#function 3: Convert to bigquery table
def bq_load(key, value):
    project_name = '375215'
    dataset_name = 'Returns'
    table_name = key
    value.to_gbq(destination_table='{}.{}'.format(dataset_name, table_name), project_id=project_name, if_exists='replace')
The problem is that the script doesn't write to BigQuery and doesn't return any error. I know that the get_api_data() function works, since I tested it locally and it does seem to be able to write to BigQuery. Using Cloud Functions, I can't seem to trigger this function and make it write data to BigQuery.
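One way to make a silent failure like this visible is to catch exceptions in the entry point, print the traceback so it shows up in the function's logs, and return a non-200 status. The sketch below is only a debugging aid under that assumption, not a diagnosis of the actual problem. `get_api_data` is replaced by a stand-in that always raises, and the `pull` parameter is a hypothetical addition used here to swap in other callables.

```python
import traceback


def get_api_data():
    """Stand-in for the real data pull; raises to simulate a failure."""
    raise RuntimeError("simulated failure inside get_api_data")


def validate_http(request, pull=get_api_data):
    try:
        pull()
    except Exception:
        # Print the traceback so it appears in the function's logs, and
        # return a 500 so the caller can see that the pull failed.
        details = traceback.format_exc()
        print(details)
        return f"Data pull failed:\n{details}", 500
    return "Data pull complete", 200
```

Returning the traceback in the response body is convenient while debugging but would normally be removed before exposing the endpoint.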
There are a couple of things wrong with the code; fixing them should set you right.
you have list data, so store as a csv file (in preference to json).
this would mean updating (and probably renaming) the JsonArrayStore class and its methods to work with CSV.
Once you have completed the above and written well formed csv, you can proceed to this:
reading the csv in the del_btn method would then look like this:
import csv

class ToDoGUI(tk.Tk):
    ...
    # methods
    ...
    def del_btn(self):
        a = JsonArrayStore('test1.csv')
        # read to list
        with open('test1.csv') as csvfile:
            reader = csv.reader(csvfile)
            data = list(reader)
            print(data)
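For the first step (writing well-formed CSV), a minimal round trip with the standard csv module might look like the sketch below. The `todo_items` list is made-up sample data, and the file is written to the system temp directory only to keep the example self-contained.

```python
import csv
import os
import tempfile

# Hypothetical list data standing in for the to-do entries.
todo_items = [["buy milk", "open"], ["write report", "done"]]

path = os.path.join(tempfile.gettempdir(), "test1.csv")

# Write one CSV row per list entry.
with open(path, "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(todo_items)

# Read the rows back into a list of lists.
with open(path, newline="") as csvfile:
    data = list(csv.reader(csvfile))

print(data)
```

Note that csv.reader returns every field as a string, so numeric values would need converting after reading.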
Good work, you have a lot to do, if you get stuck further please post again.
I am trying to pull Twitter streaming data in a cloud function and export the stream data into BigQuery.
Currently, I have this code. The entry point is set to stream_twitter.
main.txt:
import os
import tweepy
import pandas as pd
import datalab.bigquery as bq
from google.cloud import bigquery
#access key
api_key = os.environ['API_KEY']
secret_key = os.environ['SECRET_KEY']
bearer_token = os.environ['BEARER_TOKEN']
def stream_twitter(event, context):
    #authentication
    auth = tweepy.Client(bearer_token = bearer_token)
    api = tweepy.API(auth)

    #create Stream Listener
    class Listener(tweepy.StreamingClient):
        #save list to dataframe
        tweets = []
        def on_tweet(self, tweet):
            if tweet.referenced_tweets is None: #Original tweet not reply or retweet
                self.tweets.append(tweet)
        def on_error(self, status_code):
            if status_code == 420:
                #returning False in on_data disconnects the stream
                return False

    stream_tweet = Listener(bearer_token)
    #filtered Stream using rules
    rule = tweepy.StreamRule("(covid OR covid19 OR coronavirus OR pandemic OR #covid19 OR #covid) lang:en")
    stream_tweet.add_rules(rule, dry_run = True)
    stream_tweet.filter(tweet_fields=["referenced_tweets"])
    #insert into dataframe
    columns = ["UserID", "Tweets"]
    data = []
    for tweet in stream_tweet.tweets:
        data.append([tweet.id, tweet.text, ])
    stream_df = pd.DataFrame(data, columns=columns)
    ## Insert time col - TimeStamp to give the time that data is pulled from API
    stream_df.insert(0, 'TimeStamp', pd.to_datetime('now').replace(microsecond=0))
    ## Converting UTC Time to SGT(UTC+8hours)
    stream_df.insert(1,'SGT_TimeStamp', '')
    stream_df['SGT_TimeStamp'] = stream_df['TimeStamp'] + pd.Timedelta(hours=8)
    ## Define BQ dataset & table names
    bigquery_dataset_name = 'streaming_dataset'
    bigquery_table_name = 'streaming-table'
    ## Define BigQuery dataset & table
    dataset = bq.Dataset(bigquery_dataset_name)
    table = bq.Table(bigquery_dataset_name + '.' + bigquery_table_name)
    if not table.exists():
        # Create or overwrite the existing table if it exists
        table_schema = bq.Schema.from_dataframe(stream_df)
        table.create(schema = table_schema, overwrite = False)
    # Write the DataFrame to a BigQuery table
    table.insert_data(stream_df)
requirement.txt:
tweepy
pandas
google-cloud-bigquery
However, I keep getting this error:
"Deployment failure: Function deployment failed due to a health check failure. This usually indicates that your code was built successfully but failed during a test execution. Examine the logs to determine the cause. Try deploying again in a few minutes if it appears to be transient."
I can't seem to figure out how to solve this error. Is there something wrong with my code, or is there something I should have done? I tested the streaming code in PyCharm and was able to pull the data.
Would appreciate any help I can get. Thank you.
The logs for the function are below. (I am unfamiliar with logs, hence I am including a screenshot.) Essentially, those were the two info and error entries I've been getting.
I managed to replicate your error message. To fix it, all I did was add datalab==1.2.0 to requirements.txt. Since you are importing the datalab library, you need to list its package as a dependency; 1.2.0 is the latest version of datalab.
Here's the reference that I used: Migrating from the datalab Python package.
See the requirements.txt file to view the versions of the libraries used for these code snippets.
Here's the screenshot of the logs:
I'm facing the following error while trying to persist log files to Azure Blob storage from Azure Batch execution - "FileUploadMiscError - A miscellaneous error was encountered while uploading one of the output files". This error doesn't give a lot of information as to what might be going wrong. I tried checking the Microsoft Documentation for this error code, but it doesn't mention this particular error code.
Below is the relevant code for adding the task to Azure Batch that I have ported from C# to Python for persisting the log files.
Note: The container that I have configured gets created when the task is added, but there's no blob inside.
import datetime
import logging
import os
import azure.storage.blob.models as blob_model
import yaml
from azure.batch import models
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.common.cloudstorageaccount import CloudStorageAccount
from dotenv import load_dotenv
LOG = logging.getLogger(__name__)
def add_tasks(batch_client, job_id, task_id, io_details, blob_details):
    task_commands = "This is a placeholder. Actual code has an actual task. This gets completed successfully."
    LOG.info("Configuring the blob storage details")
    base_blob_service = BaseBlobService(
        account_name=blob_details['account_name'],
        account_key=blob_details['account_key'])
    LOG.info("Base blob service created")
    base_blob_service.create_container(
        container_name=blob_details['container_name'], fail_on_exist=False)
    LOG.info("Container present")
    container_sas = base_blob_service.generate_container_shared_access_signature(
        container_name=blob_details['container_name'],
        permission=blob_model.ContainerPermissions(write=True),
        expiry=datetime.datetime.now() + datetime.timedelta(days=1))
    LOG.info(f"Container SAS created: {container_sas}")
    container_url = base_blob_service.make_container_url(
        container_name=blob_details['container_name'], sas_token=container_sas)
    LOG.info(f"Container URL created: {container_url}")
    # fpath = task_id + '/output.txt'
    fpath = task_id
    LOG.info("Creating output file object:")
    out_files_list = list()
    out_files = models.OutputFile(
        file_pattern=r"../stderr.txt",
        destination=models.OutputFileDestination(
            container=models.OutputFileBlobContainerDestination(
                container_url=container_url, path=fpath)),
        upload_options=models.OutputFileUploadOptions(
            upload_condition=models.OutputFileUploadCondition.task_completion))
    out_files_list.append(out_files)
    LOG.info(f"Output files: {out_files_list}")
    LOG.info(f"Creating the task now: {task_id}")
    task = models.TaskAddParameter(
        id=task_id, command_line=task_commands, output_files=out_files_list)
    batch_client.task.add(job_id=job_id, task=task)
    LOG.info(f"Added task: {task_id}")
There is a bug in Batch's OutputFile handling which causes it to fail to upload to containers if the full container URL includes any query-string parameters other than the ones included in the SAS token. Unfortunately, the azure-storage-blob Python module includes an extra query string parameter when generating the URL via make_container_url.
This issue was just raised to us, and a fix will be released in the coming weeks, but an easy workaround is instead of using make_container_url to craft the URL, craft it yourself like so: container_url = 'https://{}/{}?{}'.format(blob_service.primary_endpoint, blob_details['container_name'], container_sas).
The resulting URL should look something like this: https://<account>.blob.core.windows.net/<container>?se=2019-01-12T01%3A34%3A05Z&sp=w&sv=2018-03-28&sr=c&sig=<sig> - specifically it shouldn't have restype=container in it (which is what the azure-storage-blob package is including)
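The workaround can be sketched as a small helper that joins the pieces by hand. `build_container_url` is a hypothetical name, and the endpoint, container name, and SAS token below are placeholders shaped like the example URL above, not real values.

```python
def build_container_url(endpoint, container_name, sas_token):
    """Join endpoint, container and SAS token without adding query parameters."""
    return "https://{}/{}?{}".format(endpoint, container_name, sas_token)


container_url = build_container_url(
    "myaccount.blob.core.windows.net",  # hypothetical account endpoint
    "logs",  # hypothetical container name
    "se=2019-01-12T01%3A34%3A05Z&sp=w&sv=2018-03-28&sr=c&sig=PLACEHOLDER",
)

# The hand-built URL carries only the SAS parameters.
assert "restype=container" not in container_url
print(container_url)
```

In the question's code, the result would be passed as container_url to OutputFileBlobContainerDestination in place of the value returned by make_container_url.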
This is a duplicate of this question:
How to convert suds object to xml
But that question has not been answered: "totxt" is not an attribute on the Client class.
Unfortunately I lack the reputation to add comments, so I am asking again:
Is there a way to convert a suds object to its xml?
I ask this because I already have a system that consumes wsdl files and sends data to a webservice. But now the customers want to alternatively store the XML as files (to import them later manually). So all I need are 2 methods for writing data: One writes to a webservice (implemented and tested), the other (not implemented yet) writes to files.
If only I could make something like this:
xml_as_string = My_suds_object.to_xml()
The following code is just an example and does not run. And it's not elegant. Doesn't matter. I hope you get the idea what I want to achieve:
I have the function "write_customer_obj_webservice" that works. Now I want to write the function "write_customer_obj_xml_file".
import suds
def get_customer_obj():
    wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
    service_url = r'http://someiphere/Customer'
    c = suds.client.Client(wsdl_url, location=service_url)
    customer = c.factory.create("ns0:CustomerType")
    return customer

def write_customer_obj_webservice(customer):
    wsdl_url = r'file:C:/somepathhere/Customer.wsdl'
    service_url = r'http://someiphere/Customer'
    c = suds.client.Client(wsdl_url, location=service_url)
    response = c.service.save(someparameters, None, None, customer)
    return response

def write_customer_obj_xml_file(customer):
    output_filename = r'C\temp\testxml'
    # The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
    xml = customer.to_xml()
    fo = open(output_filename, 'a')
    try:
        fo.write(xml)
    except:
        raise
    else:
        response = 'All ok'
    finally:
        fo.close()
    return response

# Get the customer object always from the wsdl.
customer = get_customer_obj()
# Since customer is an object, setting its attributes is very easy. There are very complex objects in this system.
customer.name = "Doe J."
customer.age = 42
# Write the new customer to a webservice or store it in a file for later processing
if later_processing:
    response = write_customer_obj_xml_file(customer)
else:
    response = write_customer_obj_webservice(customer)
I found a way that works for me. The trick is to create the Client with the option "nosend=True".
In the documentation it says:
nosend - Create the soap envelope but don't send. When specified, method invocation returns a RequestContext instead of sending it.
The RequestContext object has the attribute envelope. This is the XML as string.
Some pseudo code to illustrate:
c = suds.client.Client(url, nosend=True)
customer = c.factory.create("ns0:CustomerType")
customer.name = "Doe J."
customer.age = 42
response = c.service.save(someparameters, None, None, customer)
print(response.envelope) # This prints the XML string that would have been sent.
You have some issues in write_customer_obj_xml_file function:
Fix bad path:
output_filename = r'C:\temp\test.xml'
The following line is the problem. "to_xml" does not exist and I can't find a way to do it.
What's the type of customer? type(customer)?
xml = customer.to_xml() # to be continued...
Why mode='a'? ('a' => append, 'w' => create + write)
Use a with statement (file context manager).
with open(output_filename, 'w') as fo:
fo.write(xml)
You don't need to return a response string: use an exception handler instead. The exception to catch can be EnvironmentError.
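Put together, the file-writing part could look roughly like the sketch below. It assumes the XML is already available as a string (how to obtain it is the open question), and `write_xml_file` is a hypothetical name. The file goes to the system temp directory only so the example runs anywhere.

```python
import os
import tempfile


def write_xml_file(xml, output_filename):
    """Write an XML string to a file; I/O errors propagate to the caller."""
    with open(output_filename, "w") as fo:
        fo.write(xml)


output_filename = os.path.join(tempfile.gettempdir(), "test.xml")
try:
    write_xml_file("<customer><name>Doe J.</name></customer>", output_filename)
except EnvironmentError as error:
    # Raised for problems such as a missing directory or denied permission.
    print("Could not write the file:", error)
else:
    print("All ok")
```

The with statement closes the file even when the write fails, so no finally block is needed.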
Analyse
The following call:
customer = c.factory.create("ns0:CustomerType")
constructs a CustomerType on the fly and returns a CustomerType instance, customer.
I think you can introspect your customer object, try the following:
vars(customer) # display the object attributes
help(customer) # display an extensive help about your instance
Another way is to try the WSDL URLs by hand and look at the XML results.
You may obtain the full description of your CustomerType object.
And then?
Then, with the attributes list, you can create your own XML. Use an XML template and fill it with the object attributes.
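A minimal sketch of that idea, using ElementTree rather than a text template: it turns each attribute of a plain Python object into a child element. The `Customer` class is a hypothetical stand-in; a real suds object may expose its attributes differently, and the element names, namespaces, and nesting would have to follow the WSDL.

```python
import xml.etree.ElementTree as ET


class Customer(object):
    """Hypothetical stand-in for the object returned by factory.create."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


def object_to_xml(obj, root_tag):
    """Build an XML string with one child element per object attribute."""
    root = ET.Element(root_tag)
    # Sort the attributes so the element order is predictable.
    for attribute, value in sorted(vars(obj).items()):
        child = ET.SubElement(root, attribute)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


xml = object_to_xml(Customer("Doe J.", 42), "Customer")
print(xml)
```

This handles only flat objects; nested types would need a recursive version.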
You may also find a "magic" function (like to_xml) that does the job for you, but I'm not sure the XML format will match your needs.
client = Client(url)
client.factory.create('somename')
# The last XML request by client
client.last_sent()
# The last XML response from Web Service
client.last_received()
Has anyone used the function importRows() from the Fusion Tables API?
According to the API reference below,
https://developers.google.com/fusiontables/docs/v1/reference/table/importRows
I have to supply CSV data in the request body.
But what exactly should I put in the request body?
My code:
http = getAuthorizedHttp()
DISCOVERYURL = 'https://www.googleapis.com/discovery/v1/apis/{api}/{apiVersion}/rest'
ftable = build('fusiontables', 'v1', discoveryServiceUrl=DISCOVERYURL, http=http)
body = create_ft(CSVFILE,"title here") # the function to load csv file and create the table with columns from csv file.
result = ftable.table().insert(body=body).execute()
print(result["tableId"]) # good, I have got the id of the newly created table
# I have no idea how to go on here..
f = ftable.table().importRows(tableId=result["tableId"])
f.body = ?????????????
f.execute()
I finally fixed my problem, my code can be found in the following link.
https://github.com/childnotfound/parser/blob/master/uploader.py
I fixed the problem like this:
media = http.MediaFileUpload('example.csv', mimetype='application/octet-stream', resumable=True)
request = service.table().importRows(media_body=media, tableId='1cowubQ0vj_H9q3owo1vLM_gMyavvbuoNmRQaYiZV').execute()