I have a python script that connects to Redshift, executes a series of SQL commands, and generates a new derived table.
But for the life of me, I can't figure out a way to have it automatically run every day.
I've tried AWS Data Pipeline but my shell script won't run the first copy statement.
I can't get Lambda or Glue to work because my company's IAM policies are restrictive.
Airflow seems like overkill to just run a single python script daily.
Any suggestions for services to look into?
Cron job?
00 12 * * * /home/scottie/bin/my_python_script.py
This runs my_python_script.py at the 0th minute of the 12th hour, i.e. at noon every day. For this form to work, the script must be executable and start with a shebang line pointing at your Python interpreter.
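If you'd rather not rely on the shebang, a common variant (the interpreter path and log location here are assumptions, adjust them for your machine) calls the interpreter explicitly and captures output for debugging:

00 12 * * * /usr/bin/python3 /home/scottie/bin/my_python_script.py >> /home/scottie/cron.log 2>&1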
Use a cron job on an EC2 instance, or set up a scheduled event to invoke your AWS Python Lambda function: http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
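For completeness, here is a rough sketch of wiring up that scheduled event with boto3 rather than the console (the rule name, region, and function ARN are placeholders, not values from the question):

import boto3

# create (or update) a rule that fires at noon UTC every day
events = boto3.client("events")
events.put_rule(
    Name="daily-redshift-etl",                # placeholder rule name
    ScheduleExpression="cron(0 12 * * ? *)",  # CloudWatch Events cron syntax
)

# point the rule at the Lambda function (ARN is a placeholder)
events.put_targets(
    Rule="daily-redshift-etl",
    Targets=[{"Id": "etl-lambda",
              "Arn": "arn:aws:lambda:us-east-1:123456789012:function:my-etl"}],
)

Note that the Lambda function also needs a resource-based permission allowing events.amazonaws.com to invoke it (lambda add-permission).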
I use a scheduled task on Windows. Either enter it using the GUI or the at command.
If you are using AWS Glue or have some other reason to install a development endpoint, you can use Apache Zeppelin to run code in any language (as long as you have the jar files) on a schedule defined by a cron expression. Here's the notebook I use to run Redshift nightly maintenance:
Redshift Maintenance in a Zeppelin notebook
I have a script that downloads a large amount of data from an API. The script takes around two hours to run. I would like to run the script on GCP and schedule it to run once a week on Sundays, so that we have the newest data in our SQL database (also on GCP) by the next day.
I am aware of cron jobs, but I would not like to run an entire server just for this single script. I have taken a look at Cloud Functions and Cloud Scheduler, but because the script takes so long to execute I cannot run it on Cloud Functions, as the maximum execution time is 9 minutes (from here). Is there any other way I could schedule the Python script to run?
Thank you in advance!
To run a script for more than an hour, you need to use Compute Engine (a Cloud Run request can live for at most 1h).
However, you can still use Cloud Scheduler to drive it. Here's how:
Create a Cloud Scheduler job with the frequency that you want.
Have the job call the Compute Engine start API on your instance.
In the advanced section, select a service account (create one or reuse one) that has the right to start a VM instance.
Select OAuth token as the authentication mode (not OIDC).
Create the Compute Engine instance (the one that Cloud Scheduler will start).
Add a startup script that triggers your long job.
At the end of the script, add a line to shut the VM down (with gcloud, for example; a Python alternative is sketched below).
Note: the startup script runs as the root user. Watch out for the default home directory and the permissions of the files it creates.
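If you'd rather do that last step from Python instead of shelling out to gcloud, here is a rough sketch, assuming the requests and google-api-python-client packages are installed on the VM and its service account is allowed to stop instances:

import requests
import googleapiclient.discovery

METADATA = "http://metadata.google.internal/computeMetadata/v1"
HEADERS = {"Metadata-Flavor": "Google"}

def stop_self():
    # discover this instance's identity from the metadata server
    project = requests.get(METADATA + "/project/project-id", headers=HEADERS).text
    zone = requests.get(METADATA + "/instance/zone", headers=HEADERS).text.split("/")[-1]
    name = requests.get(METADATA + "/instance/name", headers=HEADERS).text
    # ask the Compute Engine API to stop this VM
    compute = googleapiclient.discovery.build("compute", "v1")
    compute.instances().stop(project=project, zone=zone, instance=name).execute()

stop_self()  # call as the very last step of the long job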
I'm fairly new to both Python and AWS, so I'm trying to get some advice on how to best approach this problem.
I have a Python script that I run locally against a production AWS environment; it reports certain errors. I have a read-only account that lets me run this script.
I'd like to be able to automate this so it runs the script maybe hourly and sends an email with the output.
After some research, I thought maybe a Lambda function would work. However, I need to run the script from an AWS environment separate from the one I'm targeting: I can't (and don't want to) add or change anything in the production environment, but I do have access to a separate environment.
Is Lambda even the best way? If not, what is the most efficient way to achieve this?
To run the job hourly, you can create a CloudWatch Events Rule with a schedule (cron expression) and add the Lambda function as the target.
This Lambda function can then execute the Python script in question.
If the Python script invokes AWS API actions on resources in your production account, you will need to allow cross-account access. You can find more details on that here: Cross account role for an AWS Lambda function
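As a rough sketch of that cross-account pattern (the role ARN and session name below are placeholders, not values from the question):

import boto3

def lambda_handler(event, context):
    # assume a read-only role that the production account trusts
    creds = boto3.client("sts").assume_role(
        RoleArn="arn:aws:iam::111122223333:role/ProdReadOnly",  # placeholder ARN
        RoleSessionName="hourly-error-check",
    )["Credentials"]

    # build a session scoped to the production account
    prod = boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    # run the existing checks with prod.client(...) / prod.resource(...),
    # then e-mail the output, e.g. via SES with boto3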
Say I have a file "main.py" and I just want it to run at 10 minute intervals, but not on my computer. The only external libraries the file uses are mysql.connector and requests, both installed from pip.
Things I've tried:
PythonAnywhere - free tier is too limiting (need to connect to external DB)
AWS Lambda - only supports up to Python 2.7; I converted my code but still had issues
Google Cloud Platform + Heroku - I can only find tutorials covering deploying applications; I think these could do what I'm looking for, but I can't figure out how
Thanks!
I'd start by taking a look at this question/answer that I asked previously on unix.stackexchange - I went with an AWS Red Hat installation and it was free to use.
Once you've decided on your VM, you can SSH onto the server with any SSH client and upload your Python script. A personal preference is this application.
If you need to update the Python version on the server, you can do this by installing the required Python RPMs. A quick Google search should turn up the yum (or whichever package manager you're using) repository for the required RPMs.
Once you've installed the version of Python that you need, I'd suggest looking into crontab, which can be used to schedule jobs. You can set a cron job to run every 10 minutes and call your script; an example entry is below.
See this site for more information on how to use the crontab.
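For instance, an entry like this runs the script every 10 minutes (the interpreter and script paths are assumptions, adjust them for your server):

*/10 * * * * /usr/bin/python3 /home/ec2-user/main.py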
This sounds like a perfect use case for AWS Lambda, which supports Python. You can invoke your Lambda on a schedule using Scheduled Events.
I see that you tried Lambda and it didn't work out for you, which is too bad, as that seems like the easiest route. You could also launch an EC2 instance and use userdata to set up a cron job when the instance starts.
Another option would be an Elastic Beanstalk worker with a cron.yml that defines your schedule. Elastic Beanstalk supports Python 3.4.
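For reference, a worker-tier cron.yml looks roughly like this (the task name and URL are placeholders); Elastic Beanstalk POSTs to the given URL on each tick and your worker code handles the request:

version: 1
cron:
 - name: "run-main"
   url: "/run-main"
   schedule: "*/10 * * * *"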
Update: AWS now supports Python 3.6 on Lambda. Just select Python 3.6 from the runtime environments when configuring your function.
How can I create a schedule on OpenShift hosting to run a Python script that parses RSS feeds and sends the filtered information to my email? Is this feature available? Please help; I'm working with the free tier of this hosting. I have a script that works fine, but I don't know how to run it every 10 minutes to catch freelance jobs. Alternatively, does anyone know of free Python hosting that can schedule scripts?
You are looking for the add-on cartridge called cron. However, by default the cron cartridge only supports jobs that run every minute or every hour. You would have to write a job that runs every minute, checks whether it has landed on a 10-minute interval, and then executes your script; a sketch follows below.
Make sense?
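A minimal sketch of that minutely wrapper (the interpreter and script path are placeholders):

import datetime
import subprocess

# invoked by the cron cartridge every minute; only do real work on 10-minute marks
if datetime.datetime.now().minute % 10 == 0:
    subprocess.call(["python", "/path/to/main.py"])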
rhc cartridge add cron -a yourAppName
Then you will have a cron directory inside your application directory, under .openshift, for placing the cron job.
You could do something like this, but set up for 10 minutes instead of 5: https://github.com/openshift-quickstart/openshift-cacti-quickstart/blob/master/.openshift/cron/minutely/cactipoll
I have a Python web app that essentially allows 2 computers to talk with one another. If a session ends abruptly, the record is still stored in pymongo. I want to be able to run a cron job to clean up old records, but I am not clear on how to do that; I can't figure out how to use bash to talk to pymongo...
What else could I do? Call Python from the cron job?
You could write a Python script using pymongo (or any other MongoDB client library) that does the necessary cleanup, and configure cron to run it regularly.
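Here is a minimal sketch of such a cleanup script, assuming a sessions collection with a last_seen datetime field (the collection, field, database name, and connection string are all assumptions):

import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["myapp"]  # placeholder database name

# delete session records not touched in the last 24 hours
cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=1)
result = db.sessions.delete_many({"last_seen": {"$lt": cutoff}})
print("Removed %d stale session records" % result.deleted_count)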
Here is an article on OpenShift on how to get cron up and running:
https://www.redhat.com/openshift/community/blogs/getting-started-with-cron-jobs-on-openshift