I have a couple of Python apps in CloudFoundry. Now I would like to schedule their execution. For example a specific app has to be executed on the second day of each month.
I couldn't find anything on the internet. Is that even possible?
Cloud Foundry will deploy your application inside a container. You could use libraries to execute your code on a specific schedule but either way you're paying to have that instance run the whole time.
What you're trying to do is a perfect candidate for "serverless computing" (also known as "event-driven" or "function as a service" computing).
These platforms execute functions in response to a trigger, e.g. a REST API call, a certain timestamp, a new database insert, etc.
You could execute your Python Cloud Foundry apps using the OpenWhisk serverless compute platform.
IBM offers a hosted version of this running on its cloud platform, Bluemix.
I don't know what your code looks like so I'll use this sample hello world function:
import sys

def main(dict):
    if 'message' in dict:
        name = dict['message']
    else:
        name = 'stranger'
    greeting = 'Hello ' + name + '!'
    print(greeting)
    return {'greeting': greeting}
You can upload your actions (functions) to OpenWhisk using either the online editor or the CLI.
Once you've uploaded your actions you can automate them on a specific schedule by using the Alarm Package. To do this in the online editor click "automate this process" and pick the alarm package.
To do this via the CLI we need to first create a trigger:
$ wsk trigger create regular_hello_world --feed /whisk.system/alarms/alarm -p cron '0 0 9 * * *'
ok: created trigger feed regular_hello_world
This will trigger every day at 9am. We then need to link this trigger to our action by creating a rule:
$ wsk rule create regular_hello_rule regular_hello_world hello_world
ok: created rule regular_hello_rule
For more info, see the docs on creating Python actions.
The Cloud Foundry platform itself does not have a scheduler (at least not at this time) and the containers where your application runs do not have cron installed (and that is unlikely to ever change).
If you want to schedule code to periodically run, you have a few options.
You can deploy an application that includes a scheduler. The scheduler can run your code directly in that container or it can trigger the code to run elsewhere (ex: it sends an HTTP request to another application and that request triggers the code to run). If you trigger the code to run elsewhere, you can make the scheduler app run pretty lean (maybe with 64m of memory or less) to reduce costs.
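For example, a minimal sketch of such a lean scheduler app, assuming the APScheduler library (the target URL is a hypothetical endpoint on the app that actually runs the code):

import requests
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

@sched.scheduled_job('cron', day=2, hour=9)  # 09:00 on the 2nd day of each month
def trigger_monthly_job():
    # trigger the code running elsewhere with a plain HTTP request
    requests.post('https://my-other-app.example.com/run-monthly-job')

sched.start()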
You can look for a third party scheduler service. The availability of and cost of services like this will vary depending on your CF provider, but there are service offerings to handle scheduling. These typically function like the previous example where an HTTP request is sent to your app at a specific time and that triggers your scheduled code. Many service providers offer free tiers, which give you a small number of triggers per month at no cost.
If you have a server outside of CF with cron installed, you can use cron there to schedule the tasks and trigger the code to run on CF. You can do this like the previous examples by sending HTTP requests to your app; however, this option also gives you the possibility of using Cloud Foundry's task feature.
CloudFoundry has the concept of a task, which is a one-time execution of some code. With it, you can execute the cf run-task command to trigger the task to run. Ex: cf run-task <app-name> "python my-task.py". More on that in the docs, here. The nice part about using tasks is that your provider will only bill you while the task is running.
To see if your provider has tasks available, run cf feature-flags and look to see if task_creation is set to enabled.
Hope that helps!
Related
I am trying to write a Python program that will alert me when a VM is down. I know PowerShell might be better, but I would prefer Python.
Why do you think it would be better with PowerShell :). Python rules ;)
If you want a more reactive approach, you should look at EventGrid + LogicApp + WebApp/Function first. It's like IFTTT for Azure: EventGrid will trigger an event, and LogicApp will be able to consume this event and send it to a WebApp or Function (which you can write in Python).
Example:
https://learn.microsoft.com/en-us/azure/event-grid/monitor-virtual-machine-changes-event-grid-logic-app
If you want a more "I pull every minute" experience, just use the azure-mgmt-compute package:
https://pypi.org/project/azure-mgmt-compute/
Basic sample:
https://github.com/Azure-Samples/virtual-machines-python-manage
You will need the instance view of the VM to get the power state, via instance_view.
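For instance, a minimal sketch, assuming the current azure-identity and azure-mgmt-compute packages (the subscription, resource group and VM name are placeholders):

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

compute = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
view = compute.virtual_machines.instance_view("<resource-group>", "<vm-name>")

# view.statuses contains codes like 'PowerState/running' or 'PowerState/deallocated'
power_state = next(s.code for s in view.statuses if s.code.startswith("PowerState/"))
if power_state != "PowerState/running":
    print("VM is down: " + power_state)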
Hope this helps!
(I work at MS in the Azure SDK for Python team)
EDIT:
It seems EventGrid does not support triggering from VM power state yet; you could still use LogicApp with a polling schedule for solution 1: https://learn.microsoft.com/en-us/azure/connectors/connectors-native-recurrence
I was using AWS and am new to GCP. One feature I used heavily was AWS Batch, which automatically creates a VM when the job is submitted and deletes the VM when the job is done. Is there a GCP counterpart? Based on my research, the closest is GCP Dataflow. The GCP Dataflow documentation led me to Apache Beam. But when I walk through the examples here (link), it feels totally different from AWS Batch.
Any suggestions on submitting jobs for batch processing in GCP? My requirement is to simply retrieve data from Google Cloud Storage, analyze the data using a Python script, and then put the result back into Google Cloud Storage. The process can run overnight, and I don't want the VM to sit idle after the job finishes while I'm sleeping.
You can do this using AI Platform Jobs which is now able to run arbitrary docker images:
gcloud ai-platform jobs submit training $JOB_NAME \
--scale-tier BASIC \
--region $REGION \
--master-image-uri gcr.io/$PROJECT_ID/some-image
You can define the master instance type and even add worker instances if desired. They should consider creating a sibling product without the AI buzzword so people can find this functionality more easily.
I recommend checking out dsub. It's an open-source tool initially developed by the Google Genomics teams for doing batch processing on Google Cloud.
UPDATE: I have now used this service and I think it's awesome.
As of July 13, 2022, GCP now has its own new fully managed batch processing service (GCP Batch), which seems very akin to AWS Batch.
See the GCP Blog post announcing it at: https://cloud.google.com/blog/products/compute/new-batch-service-processes-batch-jobs-on-google-cloud (with links to docs as well)
Officially, according to the "Map AWS services to Google Cloud Platform products" page, there is no direct equivalent, but you can put a few things together that might get you close.
I wasn't sure whether you are running, or have the option to run, your Python code in Docker. If so, the Kubernetes controls might do the trick. From the GCP docs:
Note: Beginning with Kubernetes version 1.7, you can specify a minimum size of zero for your node pool. This allows your node pool to scale down completely if the instances within aren't required to run your workloads. However, while a node pool can scale to a zero size, the overall cluster size does not scale down to zero nodes (as at least one node is always required to run system Pods).
So, if you are running other managed instances anyway, you can scale the node pool up and down to and from zero, but at least one Kubernetes node stays active to run the system Pods.
I'm guessing you are already using something like "Creating API Requests and Handling Responses" to get an ID with which you can verify that the process has started, the instance was created, and the payload is processing. You can use that same mechanism to report that the process has completed as well. That takes care of the instance creation and the launch of the Python script.
You could use Cloud Pub/Sub to help keep track of that state: can you modify your Python to notify when the task completes? Then, in addition to creating the task and launching the instance, you can report when the Python job is complete and kick off an instance teardown process.
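As a rough sketch, assuming the google-cloud-pubsub client library (project, topic and attribute names are illustrative), the completion notification could look like:

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "job-status")

def report_done(job_id):
    # publish a small message that a subscriber can use to kick off instance teardown
    future = publisher.publish(topic_path, b"done", job_id=job_id)
    future.result()  # block until the publish is acknowledged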
Another thing you can do to drop costs is to use Preemptible VM Instances so that the instances run at 1/2 cost and will run a maximum of 1 day anyway.
Hope that helps.
The product that best suits your use case in GCP is Cloud Tasks. We are using it for a similar use case where we retrieve files from another HTTP server and, after some processing, store them in Google Cloud Storage.
This GCP documentation describes in full detail the steps for creating tasks and using them.
You schedule your tasks programmatically in Cloud Tasks and you have to create task handlers (worker services) in App Engine (a minimal sketch of creating a task follows after these limits). There are some limitations for worker services running in App Engine:
In the standard environment:
Automatic scaling: task processing must finish in 10 minutes.
Manual and basic scaling: requests can run for up to 24 hours.
In the flexible environment: all types have a 60-minute timeout.
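A rough sketch of creating a task, assuming the google-cloud-tasks client library (project, location, queue name, handler route and body are placeholders):

from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "us-central1", "my-queue")

task = {
    "app_engine_http_request": {              # handled by an App Engine worker service
        "http_method": tasks_v2.HttpMethod.POST,
        "relative_uri": "/process-file",      # hypothetical handler route
        "body": b"gs://my-bucket/input.csv",  # e.g. the object to analyze
    }
}
response = client.create_task(request={"parent": parent, "task": task})
print("Created task: " + response.name)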
I think a cron job can help you in this regard, and you can implement it with the help of App Engine, Pub/Sub and Compute Engine. From "Reliable Task Scheduling on Google Compute Engine": in distributed systems, such as a network of Google Compute Engine instances, it is challenging to reliably schedule tasks because any individual instance may become unavailable due to autoscaling or network partitioning.
Google App Engine provides a Cron service. Using this service for scheduling and Google Cloud Pub/Sub for distributed messaging, you can build an application to reliably schedule tasks across a fleet of Compute Engine instances.
For a detailed look you can check it here: https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine
I'm trying to define an architecture where multiple Python scripts need to be run in parallel and on demand. Imagine the following setup:
script requestors (web API) -> Service Bus queue -> script execution -> result posted back to script requestor
To this end, the script requestor places a script request message on the queue, together with an API endpoint where the result should be posted back to. The script request message also contains the input for the script to be run.
The Service Bus queue decouples producers and consumers. A generic set of "workers" simply look for new messages on the queue, take the input message and call a Python script with said input. Then they post back the result to the API endpoint. But what strategies could I use to "run the Python scripts"?
One possible strategy could be to use WebJobs. WebJobs can execute Python scripts and run on a schedule. Say you run a WebJob every 5 minutes: the Python script can poll the queue, do some processing, and post the results back to your API.
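A rough sketch of that polling loop, assuming the current azure-servicebus package (the connection string, queue name, result endpoint and run_script helper are placeholders):

import requests
from azure.servicebus import ServiceBusClient

def run_script(payload):
    # hypothetical stand-in for invoking the real Python script with the given input
    return {"input": payload, "status": "done"}

client = ServiceBusClient.from_connection_string("<connection-string>")
with client.get_queue_receiver(queue_name="script-requests") as receiver:
    for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
        payload = str(msg)              # the script input carried in the request message
        result = run_script(payload)
        requests.post("https://requestor.example.com/results", json=result)
        receiver.complete_message(msg)  # remove the message from the queue once handled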
In my experience, there are two strategies you could use.
Develop a Python script for Azure HDInsight. Azure HDInsight is a platform based on Hadoop that provides parallel compute; you can refer to the doc https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-streaming-python/ to learn more about it.
Develop a Python script based on a parallel compute framework like dispy or jug, running on Azure VMs.
Hope it helps. Best Regards.
I wrote a Python script that will pull data from a 3rd party API and push it into a SQL table I set up in AWS RDS. I want to automate this script so that it runs every night (the script will only take about a minute to run). I need to find a good place and way to set it up so that it runs each night.
I could set up an EC2 instance, and a cron job on that instance, and run it from there, but it seems expensive to keep an EC2 instance alive all day for only 1 minute of run-time per night. Would AWS data pipeline work for this purpose? Are there other better alternatives?
(I've seen similar topics discussed when googling around but haven't seen recent answers.)
Thanks
Based on your case, I think you can try to use ShellCommandActivity in Data Pipeline. It will launch an EC2 instance and execute the command you give to Data Pipeline on your schedule. After finishing the task, the pipeline will terminate the EC2 instance.
Here is doc:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-shellcommandactivity.html
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-ec2resource.html
Alternatively, you could use a 3rd-party service like Crono. Crono is a simple REST API to manage time-based jobs programmatically.
I want to schedule an email to be sent to a user upon a specific action.
However, if the user takes another action I want to cancel that email and have it not send.
How would I do that in django or python?
Beanstalkd
If you can install beanstalkd and run a Python script from the command line, I would use that to schedule emails. With the beanstalkc client you can easily accomplish this. On Ubuntu you might first need to install:
sudo apt-get install python-yaml python-setuptools
consumer.py:
import beanstalkc

def main():
    beanstalk = beanstalkc.Connection(host='localhost', port=11300)
    while True:
        job = beanstalk.reserve()
        print(job.body)
        job.delete()

if __name__ == '__main__':
    main()
This will print the job 5 seconds after it gets inserted by producer.py. Of course the delay should be set to whenever you want to schedule your emails, but for demonstration purposes it will do. You don't want to wait half an hour for the message when testing ;).
producer.py:
import beanstalkc

def main():
    beanstalk = beanstalkc.Connection(host='localhost', port=11300)
    jid = beanstalk.put('foo', delay=5)

if __name__ == '__main__':
    main()
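To also cancel the email before it goes out (the second part of your question), you can delete the delayed job using the jid returned by put(). A rough sketch, assuming you stored that jid somewhere (e.g. on the user's record):

import beanstalkc

def cancel(jid):
    beanstalk = beanstalkc.Connection(host='localhost', port=11300)
    job = beanstalk.peek(jid)  # look the delayed job up without reserving it
    if job is not None:
        job.delete()           # beanstalkd allows deleting delayed jobs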
GAE Task Queue
You could also use the Google App Engine Task Queue to accomplish this. You can specify an eta for your task, and Google App Engine has a generous free quota. In the task queue webhook, make an asynchronous request to fetch a URL on your server which does the sending of emails.
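A rough sketch using the App Engine (Python) taskqueue API (the task name, URL and parameters are illustrative); giving the task a name lets you delete it later if the user takes the other action:

from google.appengine.api import taskqueue

# enqueue a named task that will hit /tasks/send_email in 30 minutes
taskqueue.add(name='email-for-user-42',
              url='/tasks/send_email',
              params={'user_id': '42'},
              countdown=30 * 60)

# later, if the email should not go out, delete the task by name before it runs
taskqueue.Queue('default').delete_tasks(taskqueue.Task(name='email-for-user-42'))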
I would set up a cron job which could handle everything you want to do...
If you don't have access to cron, you could easily do this:
Write a model that stores the email, the time to send, and a BooleanField indicating if the email has been sent.
Write a view which selects all emails that haven't been sent yet but should have been by now, and sends them (a rough sketch of these two pieces follows after this list).
Use something like OpenACS Uptime, Pingdom or any other service capable of sending HTTP GET requests periodically to call that view, and trigger the email sending. (Both are free, the former should request once every 15 minutes, and the latter can be configured to request up to every minute, and will do so from several locations.)
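A rough sketch of the model and view from the first two steps (names and the from-address are illustrative; to "cancel" an email, just delete its row before it is sent):

from django.db import models
from django.core.mail import send_mail
from django.http import HttpResponse
from django.utils import timezone

class ScheduledEmail(models.Model):
    recipient = models.EmailField()
    subject = models.CharField(max_length=200)
    body = models.TextField()
    send_at = models.DateTimeField()
    sent = models.BooleanField(default=False)

def send_due_emails(request):
    # called periodically by the external pinger; sends anything due and not yet sent
    due = list(ScheduledEmail.objects.filter(sent=False, send_at__lte=timezone.now()))
    for email in due:
        send_mail(email.subject, email.body, 'noreply@example.com', [email.recipient])
        email.sent = True
        email.save()
    return HttpResponse('sent %d emails' % len(due))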
Sure, it's inelegant, but it's a method that works on basically any web host. I used to do something like this when I was writing PHP apps to run on a host that killed all processes after something like 15 seconds.
Are you using Celery? If so, see http://ask.github.com/celery/userguide/executing.html#eta-and-countdown
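For example, a rough sketch assuming a send_email task already defined in your Celery app (the module path, arguments and countdown are illustrative):

from myapp.tasks import send_email   # hypothetical Celery task
from celery.result import AsyncResult

user_id = 42  # illustrative

# schedule the email 30 minutes from now and remember the task id (e.g. store it on the user)
result = send_email.apply_async(args=[user_id], countdown=30 * 60)
task_id = result.id

# later, if the user takes the other action, cancel the pending task by id
AsyncResult(task_id).revoke()  # prevents the task from running if it has not started yet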
You said that you want to do it through Python or Django, but it seems as though something else will need to be involved. Considering you are on a shared host, there is a chance installing other packages could also be a problem.
Another possible solution could be something like this:
Use a JavaScript framework which can set up timed events, start/cancel them, etc. I have done timed events using a framework called ExtJS. Although ExtJS is rather large, I'm sure you could do a similar thing with other frameworks such as jQuery or even raw JavaScript.
Set up a task on a user action that will execute in 5 minutes. The action could be an Ajax call to a Python script which sends the email... If the user does something where the task needs to be stopped, just cancel the event.
It kind of seems complicated and convoluted, but it really isn't. If this seems like a path you would like to try out, let me know and I'll edit with some code