I am looking for a way to create pseudo-cronjobs, as I cannot use real cron jobs on my UNIX host.
Since Python scripts can run for an unlimited period, I thought Python would be a great solution.
Google App Engine lets you run Python scripts, and it's free, so I figured I should use App Engine.
App Engine allows 160,000 external URL accesses per month (right?), so you could make 160000/31/24/60 ≈ 3.6 accesses per minute.
So my script would be:
import time
import urllib

start = time.time()  # time.clock() measures CPU time, so track wall-clock time instead
while time.time() - start < 86400:  # run for 24 hours
    # fetch the pseudo-cronjob file, then wait 60 seconds
    content = urllib.urlopen('http://www.example.org/cronjob_file.php').read()
    time.sleep(60)
Unfortunately, I have no way to test the script, so my questions are:
1) Do you think this would work?
2) Is it allowed (Google TOS) to use the service for such an activity?
3) Is my calculation for the URL accesses per minute right?
Thanks in advance!
Maybe I'm misunderstanding you, but the cron config files will let you do this (without Python).
You can add something like this to your cron.yaml file:
cron:
- description: job that runs every minute
  url: /cronjobs/job1
  schedule: every minute
See Google's documentation for more info on scheduling.
Google has some limits on how long a task can run.
URLFetch calls made in the SDK now have a 5-second timeout.
They allow you to schedule up to 20 cron tasks in any given day.
Duplicate, see cron jobs on google appengine
Cron jobs are now officially supported on GAE:
http://code.google.com/appengine/docs/python/config/cron.html
You may want to clarify which way around you want to do it.
Do you want to use App Engine to RUN the job? I.e., the job runs on Google's servers?
or
Do you want to use your OWN code on your server, and trigger it using Google App Engine?
If it's the former: google does cron now. Use that :)
If it's the latter: you could use google's cron to trigger your own, even if it's indirectly (ie, google-cron calls google-app-engine which calls your-app).
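A minimal sketch of that indirection on the App Engine side (assuming webapp2, a hypothetical /cronjobs/job1 handler path wired up in cron.yaml, and a placeholder external URL):

from google.appengine.api import urlfetch
import webapp2

class TriggerCron(webapp2.RequestHandler):
    def get(self):
        # App Engine's cron service requests this URL on schedule;
        # forward the trigger to the job on your own server.
        urlfetch.fetch('http://www.example.org/cronjob_file.php', deadline=10)

app = webapp2.WSGIApplication([('/cronjobs/job1', TriggerCron)])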
If you can, spin up a thread to do the job, so your page returns immediately. Don't forget: if you call http://whatever/mypage.php, and your browser dies (or in this case, Google kills your process for running too long), the PHP script usually still runs to the end - the output just goes nowhere.
Failing that, try to spin up a thread (not sure if you can do that in PHP though - I'm a C# guy new to PHP).
And if all else fails: get a better webhost! I pay $6/month or so for dreamhost.com, and I can run cron jobs on their servers - it's included. They do PHP, Rails et al. You could even ping me for a discount code :) (view profile for website etc)
Do what Nic Wise said, or outsource the cronjob using a service like www.guardiano.pm, so www.yoursite.com/myjob.php gets called for you, and every time that URL is called, whatever you want will be executed.
P.S. It's free.
P.P.S. It's my pet project and is in beta.
I am working on a web scraping project using Python and an API.
I want the Python script to run every day, for 5 days, for 12 hours at a time, as a job.
I don't want to keep my system alive to run it in CMD or in Jupyter, so I was looking for a solution where a cloud service could automate the process.
One way to do this would be to write a web scraper in Python, and run it on an AWS Lambda, which is essentially a serverless function with no underlying ops to manage. Depending on your use case, you could either perform some action based on the contents of that page data, or you could write the result out to S3 as a file.
To have your function execute in a recurring fashion, you can then set your AWS Lambda event trigger to be a CloudWatch event (in this case, some recurring timer at whatever frequencies/times you'd like, such as once each hour for a 12 hour window during Mon-Fri).
This is typically going to be an easier approach when compared to spinning up a virtual server (EC2 instance), and managing a persistent process that could error during waits/operation for any number of reasons.
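As a rough illustration, a scraping Lambda can be quite small. In this sketch the target URL and bucket name are placeholders, not part of the original setup:

import urllib.request

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Fetch the page (placeholder URL) and archive the raw HTML to S3.
    html = urllib.request.urlopen('https://example.org/offers').read()
    s3.put_object(Bucket='my-scrape-bucket', Key='latest.html', Body=html)
    return {'statusCode': 200}

For the 12-hour weekday window, an EventBridge schedule expression such as cron(0 8-19 ? * MON-FRI *) would fire the function at the top of each hour from 08:00 through 19:00, Monday to Friday.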
I have an application written in Web2Py that contains some modules. I need to call some functions out of a module on a periodic basis, say once daily. I have been trying to get a scheduler working for that purpose but am not sure how to get it working properly. I have referred to this and this to get started.
I have a scheduler.py file in the models directory, which contains code like this:
from gluon.scheduler import Scheduler
from Module1 import Module1

def daily_task():
    module1 = Module1()
    module1.action1(arg1, arg2, arg3)

daily_task_scheduler = Scheduler(db, tasks=dict(my_daily_task=daily_task))
In default.py I have the following code for the scheduler:
def daily_periodic_task():
    daily_task_scheduler.queue_task('daily_running_task', repeats=0, period=60)
(For testing I am running it every 60 seconds; for daily runs I plan to use period=86400.)
In my Module1 class, I have this kind of code:
def action1(self, arg1, arg2, arg3):
    for row in db().select(db.table1.ALL):
        row.processed = 'processed'
        row.update_record()
One of the issues I am facing is that I don't clearly understand how to make this scheduler automatically handle the execution of action1 on a daily basis.
When I launch my application using syntax similar to: python web2py.py -K my_app it shows this in the console:
web2py Web Framework
Created by Massimo Di Pierro, Copyright 2007-2015
Version 2.11.2-stable+timestamp.2015.05.30.16.33.24
Database drivers available: sqlite3, imaplib, pyodbc, pymysql, pg8000
starting single-scheduler for "my_app"...
However, when I see the browser at:
http://127.0.0.1:8000/my_app/default/daily_periodic_task
I just see "None" as text displayed on the screen and I don't see any changes produced by the scheduled task in my database table.
While when I see the browser at:
http://127.0.0.1:8000/my_app/default/index
I get an error stating This web page is not available, basically indicating my application never got started.
When I start my application normally using python web2py.py my application loads fine but I don't see any changes produced by the scheduled task in my database table.
I am unable to figure out what I am doing wrong here and how to properly use the scheduler with Web2Py. Basically, I need to know how I can start my application normally along with the scheduled tasks properly running in the background.
Any help in this regard would be highly appreciated.
Running python web2py.py starts the built-in web server, enabling web2py to respond to HTTP requests (i.e., serving web pages to a browser). This has nothing to do with the scheduler and will not result in any scheduled tasks being run.
To run scheduled tasks, you must start one or more background workers via:
python web2py.py -K myapp
The above does not start the built-in web server and therefore does not enable you to visit web pages. It simply starts a worker process that will be available to execute scheduled tasks.
Also, note that the above does not actually result in any tasks being scheduled. To schedule a task, you must insert a record in the db.scheduler_task table, which you can do via any of the usual methods of inserting records (including using appadmin) or programmatically via the scheduler.queue_task method (which is what you use in your daily_periodic_task action).
Note, you can simultaneously start the built-in web server and a scheduler worker process via:
python web2py.py -a yourpassword -K myapp -X
So, to schedule a daily task and have it actually executed, you need to (a) start a scheduler worker and (b) schedule the task. You can schedule the task by visiting your daily_periodic_task action, but note that you only need to visit that action once, as once the task has been scheduled, it remains in effect indefinitely (given that you have set repeats=0).
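For example, a sketch based on the code in the question (note that the name passed to queue_task must match a key in the tasks dict registered with the Scheduler):

daily_task_scheduler.queue_task(
    'my_daily_task',  # must match the key in tasks=dict(my_daily_task=daily_task)
    repeats=0,        # 0 means repeat indefinitely
    period=86400,     # run once every 24 hours
)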
If the task does not appear to be working, it is possible there is something wrong with the task itself that is resulting in an error.
I'm trying to use Google App Engine (Python) to make a simple web app. I want to maintain one number x in the datastore that models a random walk. I need a script running 24 hours a day that, every second, randomly chooses to either increment or decrement x (saving the change to the datastore). Users should be able to go to a url to see the current value of x.
I've thought of two ways to accomplish the constant script issue:
1) I can have an admin-access page that runs a continuous loop in javascript which, each second, makes an AJAX request to the server to update x. If I leave this page open on my computer 24 hours a day, this should work. The problem with this approach is that if my computer crashes then the script dies with it.
2) I can use a CRON job. But the interval between jobs cannot be smaller than 1 minute, so this doesn't really work.
It seems like there should be a simple way to just run a script constantly (that exists only server side) with Google App Engine.
I appreciate any advice. Thanks for your time!
Start a backend instance using Modules (either programmatically or by hitting a special URL accessible to admins only). Run the script for as long as the instance lives.
Note that an instance can die, just like your computer can crash. For this reason, you are probably better off with a Google Compute Engine instance (choose the smallest) than with an App Engine instance. Note that the Compute Engine instance will be many times cheaper.
Compute Engine instances can also fail, though it is much less likely. There are ways to create a fail-over implementation (when one instance is creating your random numbers while the other instance - which can run on some other platform - waits for the first one to fail), but this will obviously cost more.
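For reference, the random-walk loop itself is only a few lines; a minimal sketch (the persistence step is a placeholder for whatever storage the web app reads from):

import random
import time

x = 0  # in a real deployment, load the current value from storage

while True:
    x += random.choice((-1, 1))  # step up or down with equal probability
    # persist x here so the web app can serve the current value
    time.sleep(1)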
To work a bit on my Python, I decided to code a simple script for my private use that monitors sites with offers and emails me whenever a new offer I am interested in pops up. I guess I can handle the coding part (extracting the newest offer from the HTML and such), but I've never run a script online that needs to be fired every N minutes or so. What kind of hosting/server do I need to make my script run independently of my computer, checking every, say, 5 minutes and sending me an email when there's an update?
If you have shell access, you can use crontab to schedule a recurring job.
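For example, a crontab entry like the following (the interpreter path and script name are placeholders) runs a script every 5 minutes:

*/5 * * * * /usr/bin/python /home/me/check_offers.py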
Otherwise you can use a service like SetCronJob or EasyCron or similar to invoke a script regularly.
Some hosts also provide similar functionality in their administration interfaces...
This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backends.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (the HTTP timeout plus the 30s shutdown grace period, I assume). I figured this was a browser issue, so I repeated the same thing with a cron job. No dice. I then switched to using a push queue and adding a targeted task, since on paper it looks like it would wait for 10 minutes. Same thing.
All three time out after about a minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper Handler for the backend to do the work, but I don't exactly know how to write the Handler/webapp2 Route. Do I handle _ah/start/ or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long-running process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being the following in the script itself:
models = MyModel.all()
for model in models:
    # magic happens for each entity
    pass
I was basically taking for granted that the query would automatically batch my Query.all() over many entities, but it was dying at around the 1,000th entity. I originally wrote that it was raw computation only because I completely ignored the fact that the reads can fail.
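One way around that limit, for what it's worth, would have been to page through the results with query cursors (a sketch against the old db API; the batch size of 500 is arbitrary):

query = MyModel.all()
batch = query.fetch(500)
while batch:
    for model in batch:
        pass  # magic happens here, 500 entities at a time
    query = MyModel.all().with_cursor(query.cursor())
    batch = query.fetch(500)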
The actual solution to the problem ended up being "use the map-reduce library", since we were trying to look at each model for analysis anyway.