An unknown error has occurred in Cloud Function: GCP Python

[Screenshots: log viewer output and the Unknown Error dialog]
I am running into an Unknown Error while executing a Cloud Function in GCP (Python).
Steps:
Running Cloud Function to retrieve the data from BigQuery DataStore and save the CSV file in GCP Storage.
The function executes successfully and the files are stored in Storage. The logs show "Finished with status code 200" (see the log viewer screenshot), which is the success code.
However, at the end we get an Unknown Error with a tracking number, as shown in the attached screenshot.
Has anyone seen this before, and do you have suggestions for a resolution?
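For context, a rough sketch of the kind of function described in the steps above (a minimal sketch only; the project, dataset, table, and bucket names are placeholders, not our actual ones):
import csv
import io
from google.cloud import bigquery, storage

def export_to_csv(request):
    # Query BigQuery (placeholder project/dataset/table names).
    bq = bigquery.Client()
    rows = bq.query("SELECT * FROM `my-project.my_dataset.my_table`").result()
    # Write the rows out as CSV in memory.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([field.name for field in rows.schema])
    for row in rows:
        writer.writerow(list(row.values()))
    # Upload the CSV to a placeholder bucket.
    storage.Client().bucket("my-export-bucket").blob("export.csv").upload_from_string(
        buf.getvalue(), content_type="text/csv"
    )
    return "ok", 200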

Based on my follow-up with Google Support, it seems this is related to the Cloud Console itself.
The error we are experiencing comes from the Cloud Function's Tester UI timing out. The Tester UI is currently capped at 1 minute, even when the Cloud Function itself is configured with a different timeout (anywhere between 1 minute and the 9-minute maximum). So if we use the console's testing option (Test the function), the test times out after 1 minute, even though the function itself executes successfully (status code 200 in the logs).
As per Google Support, the Cloud Functions product team is working on delivering a more descriptive message for the 1-minute Tester UI timeout instead of this error. They are also not sure whether the product team will align the Tester UI timeout with the function's own timeout; there is no ETA yet.
So we will be running our Cloud Function differently and not using the console's Tester UI.
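For reference, this is roughly how we now invoke the HTTP-triggered function directly instead of through the console's Tester UI. The URL is a placeholder, and fetch_id_token assumes service-account credentials are available (for example via GOOGLE_APPLICATION_CREDENTIALS):
import google.auth.transport.requests
import google.oauth2.id_token
import requests

FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/my-function"  # placeholder

# Mint an identity token for the function's URL (requires service-account credentials).
auth_request = google.auth.transport.requests.Request()
token = google.oauth2.id_token.fetch_id_token(auth_request, FUNCTION_URL)

# Call the function with a client-side timeout matching the function's own
# timeout (up to 9 minutes), instead of the console's ~1-minute Tester UI limit.
response = requests.post(
    FUNCTION_URL,
    headers={"Authorization": f"Bearer {token}"},
    json={},
    timeout=540,
)
print(response.status_code, response.text)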

Related

Query from a BigQuery database via a Google Cloud Function (Python)

I have a BigQuery database connected to a Google Sheet, to which I have read-only access.
I want to get data from a table; this query works perfectly fine in the BigQuery editor, but I want to create a Google Cloud Function so that I have an API and can run the query directly from a URL.
I have created a Service Account using these commands:
gcloud iam service-accounts create connect-to-bigquery
gcloud projects add-iam-policy-binding test-24s --member="serviceAccount:connect-to-bigquery@test-24s.iam.gserviceaccount.com" --role="roles/owner"
and I have created a Google Cloud Function as follows:
[Screenshots: Cloud Function creation and service account settings]
Here is my code for the main.py file:
from google.cloud import bigquery

def hello_world(request):
    client = bigquery.Client()
    query = "SELECT order_id, MAX(status) AS last_status FROM `test-24s.Dataset.Order_Status` GROUP BY order_id"
    query_job = client.query(query)
    print("The query data:")
    for row in query_job:
        print("name={}, count ={}".format(row[0], row["last_status"]))
    return f'The query run successfully'
And for the requirements.txt file:
# Function dependencies, for example:
# package>=version
google-cloud-bigquery
The function deploys successfully; however, when I try to test it I get this error:
Error: function terminated. Recommended action: inspect logs for termination reason. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging Details:
500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
And when reading the log file I found this error:
403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
Please help me solve this; I have already tried all the solutions I could find online, without any success.
Based on the error "Permission denied while getting Drive credentials", I would say that your service account's IAM permissions do not carry over to Drive: while that service account probably has the relevant access to BigQuery, it does not have access to the underlying spreadsheet maintained on Drive.
I would try either
extending the scope of the service account's credentials (if possible, but that may not be very straightforward; see the sketch after this answer). Here is an article by Zaar Hai with some details, Google Auth — Dispelling the Magic, and a comment from Guillaume: "Yes, my experience isn't the same";
or (preferably, from my point of view)
making a copy (maybe with regular updates) of the original spreadsheet-based table as a native BigQuery table, and using the latter in your cloud function. A side effect of this approach is a significant performance improvement (and cost savings).
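For the first option, a minimal sketch of what requesting the extra Drive scope can look like (this assumes the service account has also been shared on the spreadsheet itself; as noted above, some runtimes may not honour extra scopes, so treat this as an illustration):
import google.auth
from google.cloud import bigquery

# Request the Drive scope in addition to the usual cloud-platform scope, so the
# BigQuery client can read the external (Sheets-backed) table.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/cloud-platform",
    ]
)
client = bigquery.Client(credentials=credentials, project=project)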

Processing data from Salesforce to GCP BigQuery with a Python script throws a timeout error

When cloud-function-1 is triggered, Salesforce data is stored in GCP_Bucket-1; then cloud-function-2 is triggered and the data should be stored in a BigQuery table. The Python script itself works fine, but it processes only a few records and then throws a timeout error. Can anyone please suggest a solution?
As stated in the Cloud Functions documentation: https://cloud.google.com/functions/docs/concepts/exec#timeout
By default, a function times out after 1 minute, but you can extend this period up to 9 minutes.
So try increasing the timeout; the link above has the details on how to do it.
If 9 minutes are not enough, you can use Cloud Run, which has a timeout of 1 hour. But keep in mind that Cloud Run needs a little more configuration than Cloud Functions.

Long running python process with Google Cloud Functions

That download might take longer than the timeout limit on Google Cloud Functions, which is 9 minutes.
The Python function is invoked by an HTTP request.
Is there a way around this problem? I don't need to run an HTTP RESTful service, as this is called once a day from an external source (it can't be scheduled).
The whole premise is to download the big chunk of data directly to the cloud.
Thanks for any suggestions.
9 minutes is a hard limit for Cloud Functions that can't be exceeded. If you can't split up your work into smaller units, one for each function invocation, consider using a different product. Cloud Run limits to 15 minutes, and Compute Engine has no limit that would apply to you.
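To illustrate the "different product" route, here is a rough sketch of what the daily download could look like once it runs on Compute Engine or Cloud Run (the source URL and bucket name are placeholders); the HTTP response is streamed straight into the bucket so the 50 GB never has to fit in memory:
import requests
from google.cloud import storage

def download_to_bucket(source_url, bucket_name, blob_name):
    # Stream the HTTP response directly into a GCS object.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    with requests.get(source_url, stream=True) as response:
        response.raise_for_status()
        blob.upload_from_file(response.raw, content_type=response.headers.get("Content-Type"))

# Placeholder URL and bucket, run once a day by whatever triggers the job.
download_to_bucket("https://example.com/daily-export.bin", "my-data-bucket", "daily-export.bin")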
Google Cloud Scheduler may work well for that.
Here is a nice Google blog post that shows an example of how to set up a Python script.
P.S. You would probably want to connect it to App Engine for the actual execution.

AWS Batch analog in GCP?

I was using AWS and am new to GCP. One feature I used heavily was AWS Batch, which automatically creates a VM when the job is submitted and deletes the VM when the job is done. Is there a GCP counterpart? Based on my research, the closest is GCP Dataflow. The GCP Dataflow documentation led me to Apache Beam. But when I walk through the examples here (link), it feels totally different from AWS Batch.
Any suggestions on submitting jobs for batch processing in GCP? My requirement is to simply retrieve data from Google Cloud Storage, analyze the data using a Python script, and then put the result back to Google Cloud Storage. The process can take overnight and I don't want the VM to be idle when the job is finished but I'm sleeping.
You can do this using AI Platform Jobs which is now able to run arbitrary docker images:
gcloud ai-platform jobs submit training $JOB_NAME \
--scale-tier BASIC \
--region $REGION \
--master-image-uri gcr.io/$PROJECT_ID/some-image
You can define the master instance type and even additional worker instances if desired. They should consider creating a sibling product without the AI buzzword so people can find this functionality more easily.
I recommend checking out dsub. It's an open-source tool initially developed by the Google Genomics teams for doing batch processing on Google Cloud.
UPDATE: I have now used this service and I think it's awesome.
As of July 13, 2022, GCP now has its own fully managed batch processing service (GCP Batch), which seems very akin to AWS Batch.
See the GCP Blog post announcing it at: https://cloud.google.com/blog/products/compute/new-batch-service-processes-batch-jobs-on-google-cloud (with links to docs as well)
Officially, according to the "Map AWS services to Google Cloud Platform products" page, there is no direct equivalent, but you can put a few things together that might get you close.
I wasn't sure whether you have the option to run your Python code in Docker. If so, the Kubernetes controls might do the trick. From the GCP docs:
Note: Beginning with Kubernetes version 1.7, you can specify a minimum size of zero for your node pool. This allows your node pool to scale down completely if the instances within aren't required to run your workloads. However, while a node pool can scale to a zero size, the overall cluster size does not scale down to zero nodes (as at least one node is always required to run system Pods).
So, if you are running other managed instances anyway, you can scale the node pool up and down from zero, but at least one Kubernetes node stays active to run the system Pods.
I'm guessing you are already using something like "Creating API Requests and Handling Responses" to get an ID with which you can verify that the process has started, the instance has been created, and the payload is processing. You can use the same mechanism to report that the job has completed as well. That takes care of the instance creation and the launch of the Python script.
You could use Cloud Pub/Sub to keep track of that state: can you modify your Python script to publish a message when the task completes? When you create the task and launch the instance, you can then react to the completion message and kick off an instance tear-down process.
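A minimal sketch of that completion notification with the Pub/Sub client (the project and topic names are placeholders):
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "job-status")  # placeholder names

# Publish a small completion message when the Python job finishes; a subscriber
# can react to it and kick off the instance tear-down.
future = publisher.publish(topic_path, b"job finished", job_id="1234", status="done")
future.result()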
Another thing you can do to drop costs is to use Preemptible VM Instances so that the instances run at 1/2 cost and will run a maximum of 1 day anyway.
Hope that helps.
The product in GCP that best suits your use case is Cloud Tasks. We are using it for a similar use case where we retrieve files from another HTTP server and, after some processing, store them in Google Cloud Storage.
The GCP documentation describes in full detail the steps to create tasks and use them.
You schedule your tasks programmatically in Cloud Tasks and you have to create task handlers (worker services) in App Engine; see the sketch after this list. There are some limitations for worker services running in App Engine:
In the standard environment:
Automatic scaling: task processing must finish within 10 minutes.
Manual and basic scaling: requests can run for up to 24 hours.
In the flexible environment: all types have a 60-minute timeout.
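A minimal sketch of creating such a task with the google-cloud-tasks Python client (project, location, queue, handler path, and payload are placeholders):
from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
# Placeholder project, location, and queue names.
parent = client.queue_path("my-project", "us-central1", "my-queue")

# Target an App Engine worker service handler at /process with a small payload.
task = {
    "app_engine_http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "relative_uri": "/process",
        "body": b"gs://my-bucket/input-file",
    }
}
response = client.create_task(request={"parent": parent, "task": task})
print("Created task:", response.name)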
I think a cron job can help you in this regard; you can implement it with the help of App Engine, Pub/Sub, and Compute Engine. From "Reliable Task Scheduling on Google Compute Engine": in distributed systems, such as a network of Google Compute Engine instances, it is challenging to reliably schedule tasks because any individual instance may become unavailable due to autoscaling or network partitioning.
Google App Engine provides a Cron service. Using this service for scheduling and Google Cloud Pub/Sub for distributed messaging, you can build an application to reliably schedule tasks across a fleet of Compute Engine instances.
For a detailed look, see: https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine

Google Stackdriver does not show trace

Previously, when an error occurred in my application, I could find a trace through the code to where it happened (file, line number) in the Google Cloud Console.
Right now I only receive a request ID and a timestamp, with no indication of a trace or line number, in the Logging window of the Google Cloud Console. Selecting a log event only shows a JSON structure for the request, but nothing about the code or any helpful information about what went wrong with the application.
What option should be selected in the google cloud console to show a stack trace for Python App Engine applications?
Google has in the meantime updated the Cloud Console and debugger, which now do contain full stack traces for Python.
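If you also want traces reported explicitly rather than relying on the console, one possible approach (an illustration only, assuming the google-cloud-error-reporting package is installed; do_work is a hypothetical stand-in for your handler logic) is:
from google.cloud import error_reporting

client = error_reporting.Client()

def handler(request):
    try:
        return do_work(request)  # hypothetical application code
    except Exception:
        # Sends the full stack trace (file and line numbers) to Error Reporting.
        client.report_exception()
        raise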
