How to schedule longer Python scripts in GCP without Cloud Functions

I have a script that downloads larger amounts of data from an API. The script takes around two hours to run. I would like to run the script on GCP and schedule it to run once a week on Sundays, so that we have the newest data in our SQL database (also on GCP) by the next day.
I am aware of cron jobs, but would not like to run an entire server just for this single script. I have taken a look at Cloud Functions and Cloud Scheduler, but because the script takes so long to execute I cannot run it on Cloud Functions, as the maximum execution time is 9 minutes (from here). Is there any other way I could schedule the Python script to run?
Thank you in advance!

To run a script for more than an hour, you need to use Compute Engine (a Cloud Run request can last at most 1 hour).
However, you can still use Cloud Scheduler. Here is how:
Create a Cloud Scheduler job with the frequency that you want.
On this scheduler job, call the Compute Engine start API.
In the advanced section, select a service account (create one or reuse one) that has the right to start a VM instance.
Select OAuth token as the authentication mode (not OIDC).
Create a Compute Engine instance (the one that the Cloud Scheduler job will start).
Add a startup script that triggers your long job.
At the end of the script, add a line to shut down the VM (with gcloud, for example); see the sketch below.
Note: the startup script runs as the ROOT user. Take care with the default home directory and the permissions of the files it creates.
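As a rough illustration of the last two steps: a minimal sketch, assuming the gcloud CLI is installed on the VM and the instance's service account is allowed to stop instances. All names (script name, instance name, zone) are placeholders, and the actual download/load logic is elided.

# weekly_sync.py -- launched by the VM startup script; stops its own VM when done.
import subprocess

def run_long_job():
    # ... download the data from the API and load it into the SQL database ...
    pass

if __name__ == "__main__":
    run_long_job()
    # Stop this instance so it only runs (and bills) for the duration of the job.
    subprocess.run(
        ["gcloud", "compute", "instances", "stop", "my-weekly-job-vm",
         "--zone", "europe-west1-b", "--quiet"],
        check=True,
    )

With this in place, Cloud Scheduler only has to start the VM once a week; the job itself takes care of shutting the VM down.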

Related

How to set up a webhook in Google Cloud that makes the code run automatically

I have a connection between Google Sheets and a Python script: I read cells from a column, take those cells' data, and then run the code to do its job.
So basically, what I want to do is set up a webhook (or anything similar) that keeps the code running and catches new data in the Google Sheet.
I did some searching and found 3 services that can do this, but I don't know which one is suitable:
1- Cloud Run
2- Cloud Build API
3- Cloud Deploy
and how to set up the webhook. Much appreciated.
Cloud Scheduler is a fully managed cron job scheduler. It allows you to schedule virtually any job. You can automate everything, including retries in case of failure to reduce manual toil and intervention. Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.
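One concrete way to wire this up: deploy a small HTTP service to Cloud Run and have Cloud Scheduler call it on a schedule. A minimal sketch, assuming Flask and gspread are installed and that the sheet ID and service-account credentials are supplied by you; all names here (route, file names, environment variables) are placeholders, not a fixed API.

# main.py -- minimal Cloud Run webhook sketch; placeholder names throughout.
import os
import gspread
from flask import Flask

app = Flask(__name__)

@app.route("/run", methods=["POST"])
def run_job():
    # Authenticate with a service account that can read the sheet.
    gc = gspread.service_account(filename="service_account.json")
    worksheet = gc.open_by_key(os.environ["SHEET_ID"]).sheet1
    values = worksheet.col_values(1)  # the column your script reads
    # ... run your existing processing code on `values` here ...
    return f"processed {len(values)} rows", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

Cloud Scheduler would then be configured to POST to the service's /run URL at whatever frequency you need.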

Deploying Python script daily on Azure

I have a Python script that pulls some data from an Azure Data Lake cluster, performs some simple compute, then stores it into a SQL Server DB on Azure. The whole shebang runs in about 20 seconds. It needs sqlalchemy, pandas, and some Azure data libraries. I need to run this script daily. We also have a Service Fabric cluster available to use.
What are my best options? I thought of containerizing it with Docker and making it into an HTTP-triggered API, but then how do I trigger it once per day? I'm not good with Azure or microservices design, so this is where I need help.
You can use WebJobs in App Service. There are two types of Azure WebJobs to choose from: Continuous and Triggered. From what you describe, you need the Triggered type.
You can refer to the documentation here for more details. In addition, here is how to run tasks in WebJobs.
Also, you can use a timer-based Azure Function in Python, which became generally available in recent months.
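For the timer-based Azure Functions route, the handler itself is small; the schedule lives in the function's function.json binding (an NCRONTAB expression such as "0 0 6 * * *" for daily at 06:00 UTC). A minimal sketch; the data-lake/SQL work is just a placeholder comment.

# __init__.py -- timer-triggered Azure Function sketch (placeholder logic).
import logging
import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    if mytimer.past_due:
        logging.warning("Timer is running late")
    # ... pull from Azure Data Lake, compute with pandas, write to SQL Server ...
    logging.info("Daily job finished")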

What's the best way to run a python script daily?

I have a python script that connects to Redshift, executes a series of SQL commands, and generates a new derived table.
But for the life of me, I can't figure out a way to have it automatically run every day.
I've tried AWS Data Pipeline but my shell script won't run the first copy statement.
I can't get Lambda or Glue to work because my company's IAM policies are restrictive.
Airflow seems like overkill to just run a single python script daily.
Any suggestions for services to look into?
Cron job?
00 12 * * * /home/scottie/bin/my_python_script.py
Run my_python_script.py at the top of the hour (0th minute), at noon, every day.
Use a cron job on an EC2 instance, or set up a scheduled event to invoke your AWS Python Lambda function: http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
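If the scheduled-event-plus-Lambda route is open to you, the handler is just a plain Python function. A rough sketch, with placeholder connection details, assuming psycopg2 is packaged with the deployment and keeping Lambda's execution time limit in mind.

# lambda_function.py -- invoked by a CloudWatch Events / EventBridge schedule rule.
import psycopg2  # must be bundled with the deployment package or a layer

def lambda_handler(event, context):
    # Connection details below are placeholders.
    conn = psycopg2.connect(
        host="my-cluster.example.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="...")
    with conn, conn.cursor() as cur:
        # ... execute the series of SQL commands that build the derived table ...
        cur.execute("SELECT 1")
    conn.close()
    return {"status": "ok"}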
I use a scheduled task on Windows. Either enter it using the GUI or the at command.
If you are using AWS Glue or have some other reason to install a development endpoint, you can use Apache Zeppelin to run code in any language (if you have the JAR files) on a schedule based on a cron expression. Here's the notebook I use to run Redshift nightly maintenance:
Redshift Maintenance in a Zeppelin notebook

Azure Web-App Maximum Execution Time Issue

I am developing a Python Django app for my project. Due to the nature of one of my apps, I need to run a certain script for a long period of time (maybe several hours).
Obviously everything is fine in my local environment. However, when I publish the app to Azure, it crashes after a period of time due to the maximum execution time (it does not give an error related to max execution; instead it throws an internal server error).
At this point I have 2 questions:
Is it possible to increase the maximum execution time for a Python web app in Azure? If so, how can I do that?
Should I be using some other Azure service rather than a web app for such an operation?
Thank you.
You can try leveraging WebJobs to run your scripts or programs as background tasks on demand, continuously, or on a schedule.
At the same time, web apps are unloaded by default if they are idle for some period of time; this lets the system conserve resources. In Basic or Standard mode, you can enable Always On to keep the app loaded all the time. If you need to run a continuous or long-running job or task, you should enable Always On.
You can modify this setting in the management portal; refer to Configure web apps in Azure App Service for details.

Start and stop GCE instances using bash or python script

I have a GCE instance set up and already in use, with some services set up and running. I need to be able to stop it and start it with bash or Python scripts in a cron job, as I want it to be running only at specific times and days. Is this possible? It would also be nice if I could make a snapshot and restore from it.
You can use the command line (gcloud tool) or the Google Compute API to start or stop the instances. You can implement either method in your script, as sketched below.
Moreover, you can take a look at preemptible instances, which were recently announced. These instances run for a limited time and are very suitable for jobs like batch processing.
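A minimal Python sketch of the API route, using google-api-python-client (the project, zone, instance, and disk names are placeholders, and credentials are taken from the environment, e.g. application default credentials):

# gce_control.py -- start/stop/snapshot sketch; names are placeholders.
from googleapiclient import discovery

PROJECT, ZONE, INSTANCE = "my-project", "us-central1-a", "my-instance"

compute = discovery.build("compute", "v1")

def start_instance():
    return compute.instances().start(
        project=PROJECT, zone=ZONE, instance=INSTANCE).execute()

def stop_instance():
    return compute.instances().stop(
        project=PROJECT, zone=ZONE, instance=INSTANCE).execute()

def snapshot_disk(disk_name, snapshot_name):
    # Snapshot a persistent disk attached to the instance.
    return compute.disks().createSnapshot(
        project=PROJECT, zone=ZONE, disk=disk_name,
        body={"name": snapshot_name}).execute()

Calling start_instance() and stop_instance() from two small scripts and wiring them into cron entries gives the start/stop-at-specific-times behaviour described above.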
