I wrote a Python script which scrapes a website and sends emails if a certain condition is met. It repeats itself every day in a loop.
I converted the Python file to an EXE, and it runs as an application on my computer. But I don't think this is the best solution for my needs, since my computer isn't always on and connected to the internet.
Is there a specific website I can host my Python code on which will allow it to always run?
More generally, I am trying to get the bigger picture of how this works. What do you actually have to do to have a Python script running on the cloud? Do you just upload it? What steps do you have to undertake?
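For reference, a minimal sketch of the kind of script I mean (the URL, condition, and email step here are just placeholders):

import time
import urllib.request

def check_and_notify():
    # Scrape the page and react if the condition of interest is met.
    html = urllib.request.urlopen("https://example.com").read().decode("utf-8")
    if "some condition" in html:               # placeholder condition
        print("would send the email here")     # e.g. send the email with smtplib

while True:
    check_and_notify()
    time.sleep(24 * 60 * 60)                   # repeat once a day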
Thanks in advance!
Well, I think one of the best options is pythonanywhere.com. There you can upload your Python script (script.py), run it, and you're done.
I did this with my Telegram bot.
You can deploy your application using AWS Elastic Beanstalk. It will provide you with the whole Python environment along with a server configuration that you can change according to your needs. It's a PaaS offering from the AWS cloud.
The best and cheapest solution I have found so far is to use AWS EventBridge with AWS Lambda.
AWS Lambda allows you to upload and execute any script you want in most popular programming languages, without needing to pay for a server by the month.
And you can use AWS EventBridge to trigger an execution of a Lambda function.
You only get charged for what you use in AWS Lambda, and it is extremely cheap. Below is the pricing for Lambda in the AWS N. Virginia region. For most scripts, the minimum memory is more than enough. So running a script that takes 5 seconds to finish once every hour for a month works out to about 720 runs × 5 s = 3,600,000 ms, which at the 128 MB rate costs 3,600,000 × $0.0000000021 ≈ $0.00756 a month (less than a cent!).
Memory (MB)    Price per 1 ms
128            $0.0000000021
512            $0.0000000083
1024           $0.0000000167
1536           $0.0000000250
2048           $0.0000000333
3072           $0.0000000500
4096           $0.0000000667
5120           $0.0000000833
6144           $0.0000001000
7168           $0.0000001167
8192           $0.0000001333
9216           $0.0000001500
10240          $0.0000001667
Then you can use AWS EventBridge to schedule an AWS Lambda function to run every minute, hour, etc.
Here are some articles to help you run any script every minute, hour, etc.:
How to Create Lambda Functions in Python
How to Schedule Running an AWS Lambda Function
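For a rough idea of what the function itself would look like, a handler along these lines (the URL, the condition text, and the send_email helper are placeholders, not from the original question) would do the job:

import urllib.request

def send_email(body):
    # Placeholder: swap in smtplib or Amazon SES (the boto3 "ses" client) here.
    print("would send an email:", body[:100])

def lambda_handler(event, context):
    # EventBridge invokes this on whatever schedule you define,
    # e.g. rate(1 day) or a cron(...) expression.
    html = urllib.request.urlopen("https://example.com").read().decode("utf-8")
    if "the text you are watching for" in html:
        send_email(html)
    return {"status": "done"}

You upload this as the function code and point an EventBridge rule with a daily schedule expression at it.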
I'm trying to run a Python script in Google Cloud which will download 50 GB of data once a day to a storage bucket. That download might take longer than the Cloud Functions timeout limit, which is set to 9 minutes.
The function is invoked by an HTTP request.
Is there a way around this problem? I don't need to run an HTTP RESTful service, as this is called once a day from an external source (it can't be scheduled).
The whole premise is to download the big chunk of data directly to the cloud.
Thanks for any suggestions.
9 minutes is a hard limit for Cloud Functions that can't be exceeded. If you can't split up your work into smaller units, one for each function invocation, consider using a different product. Cloud Run is limited to 15 minutes, and Compute Engine has no limit that would apply to you.
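On Cloud Run or a Compute Engine instance, a download like that can be streamed straight into the bucket. Here is a rough sketch (the URL, bucket, and object names are placeholders) using the requests and google-cloud-storage packages:

import requests
from google.cloud import storage

def download_to_bucket(source_url, bucket_name, blob_name):
    # Stream the remote file straight into a Cloud Storage object so the
    # 50 GB never has to fit in memory or on local disk.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    with requests.get(source_url, stream=True) as resp:
        resp.raise_for_status()
        blob.upload_from_file(resp.raw)

if __name__ == "__main__":
    download_to_bucket("https://example.com/big-file", "my-bucket", "daily/big-file")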
Google Cloud Scheduler may work well for that.
Here is a nice Google blog post that shows an example of how to set up a Python script.
P.S. You would probably want to connect it to App Engine for the actual execution.
I am writing a program in Python that needs an uptime of 30 days straight. It connects as an MQTT client and listens for messages on a number of topics.
I am using an EC2 instance running the Amazon Linux AMI, and I wonder how I could set this up to run constantly for that duration.
I was looking at cron jobs and rebooting every X days, but preferably the system should have no downtime if possible.
However, I am unsure how to set this up and make sure the script restarts if the server or program were ever to fail.
The client will connect to an OpenVPN VPC through Amazon, and then run the script and keep it running. Would this be possible to set up?
The version I am running is:
Amazon Linux AMI 2018.03.0.20180811 x86_64 HVM GP2
NAME="Amazon Linux AMI"
VERSION="2018.03"
ID_LIKE="rhel fedora"
VERSION_ID="2018.03"
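For reference, the listener itself is roughly this shape (a sketch against the paho-mqtt 1.x API; the broker address and topics are placeholders). loop_forever() reconnects automatically if the connection drops, but it won't restart the process if it crashes or the instance reboots, which is the part I'm unsure about:

import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, rc):
    # Re-subscribe on every (re)connect so the topics survive broker restarts.
    client.subscribe([("sensors/+/temperature", 0), ("sensors/+/status", 0)])

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload)              # replace with the real handling logic

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.loop_forever()                          # blocks; handles reconnects automatically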
You can accomplish this by using Auto Scaling to automatically maintain the required number of EC2 instances. If an instance becomes unresponsive or fails health checks, auto scaling will launch a new one. See: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-maintain-instance-levels.html
You'll want to make an AMI of your system to use for launching new instances, or maybe put your configuration into a user data script.
If your use case is simply to receive messages over MQTT, I would recommend that you take a look at the AWS IoT Core service as a solution rather than running an EC2 instance. This will solve your downtime issues because it's a managed service with a high degree of resiliency built in.
You can choose to route the messages to a variety of targets, including storing them in S3 for batch processing or using AWS Lambda to process them as they arrive, without having to run EC2 instances. With Lambda, you get 1 million invocations per month for free, so if your volume is less than this, your compute costs will be zero too.
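As a rough illustration (the field names are made up; the actual event shape depends on your IoT rule's SQL statement), the Lambda side can be as small as:

def lambda_handler(event, context):
    # An AWS IoT rule action invokes this with the selected MQTT payload
    # fields as the event; the field names below are purely illustrative.
    device = event.get("device_id")
    reading = event.get("temperature")
    print(f"received {reading} from {device}")  # replace with real processing
    return {"ok": True}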
I have been working on a website for over a year now, using Django and Python 3 primarily. A few of my buddies and I built a front end where a user enters some parameters and submits; this goes to GAE to run the job and return the results.
In my local dev environment, everything works well. I have two separate dev environments. One builds the entire service up in a Docker container. This produces the desired results in roughly 11 seconds. The other environment runs the source files locally on my computer and connects to the Postgres database hosted in Google Cloud. The Python application runs locally. It takes roughly 2 minutes to run locally; there is a lot of latency between the cloud and the POSTs/GETs from my local machine.
Once I run gcloud app deploy and attempt to run the job in production, it never finishes. I have some print statements built into the code, so I know it gets to the part where the submitted parameters reach the Python code. I monitor via this command on my local computer: gcloud app logs read.
I suspect that since my local computer is a beast (i7-7770 processor with 64 GB of RAM), it runs the whole thing no problem. But in GAE, I don't think it's providing the proper machines to do the job efficiently (not enough compute, not enough RAM). That's my guess.
So, I need help in how to troubleshoot this. I tried changing my app.yaml file so that resources would scale to 16 GB of memory, but it would never deploy; I received an error 13.
One other note, after it spins around trying to run the job for 60 minutes, the website crashes and displays this message:
502 Server Error
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
OK, so just in case anybody in the future is having a similar problem... The constant crashing of my Google App Engine workers was because of using Pandas DataFrames in the production environment. I don't know exactly what Pandas was doing, but I kept getting memory errors that would crash the site, and it didn't appear to be occurring in a single line of code. That is, it randomly happened somewhere in a Pandas DataFrame operation.
I am still using a Pandas DataFrame, simply to read in a CSV file. I then use
import pandas as pd

df = pd.read_csv("data.csv")                       # placeholder path for the CSV mentioned above
data_I_care_about = dict(zip(df.col1, df.col2))    # two columns -> a plain dict
# or
other_data = df.col3.values.tolist()               # one column -> a plain Python list
and then go to town with processing. As a note, on my local machine (my development environment, basically) it took 6 seconds to run from start to finish. That's a long time for a web request, but I was in a hurry, which is why I used Pandas to begin with.
After refactoring, the same job completed in roughly 200 ms using Python lists and dicts (again, in my dev environment). The website is up and running very smoothly now. It takes a maximum of 7 seconds after pressing "Submit" for the back end to return the data sets and render them on the web page. Thanks for the help, peeps!
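For anyone curious, the pandas-free version is essentially the standard csv module building the same structures directly (the column names match the snippet above; the file name is a placeholder):

import csv

data_I_care_about = {}
other_data = []
with open("data.csv", newline="") as f:        # placeholder path
    for row in csv.DictReader(f):
        data_I_care_about[row["col1"]] = row["col2"]
        other_data.append(row["col3"])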
I'm trying to define an architecture where multiple Python scripts need to be run in parallel and on demand. Imagine the following setup:
script requestors (web API) -> Service Bus queue -> script execution -> result posted back to script requestor
To this end, the script requestor places a script request message on the queue, together with an API endpoint to which the result should be posted back. The script request message also contains the input for the script to be run.
The Service Bus queue decouples producers and consumers. A generic set of "workers" simply look for new messages on the queue, take the input message and call a Python script with said input. Then they post back the result to the API endpoint. But what strategies could I use to "run the Python scripts"?
One possible strategy could be to use Azure WebJobs. WebJobs can execute Python scripts and run on a schedule. Say you run a WebJob every 5 minutes: the Python script can poll the queue, do some processing, and post the results back to your API.
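A sketch of what that polling worker could look like with the azure-servicebus package (the connection string, queue name, and message fields are assumptions on my part, not details from the question):

import json
import requests
from azure.servicebus import ServiceBusClient

CONN_STR = "<service-bus-connection-string>"   # placeholder

def run_script(script_input):
    # Placeholder for the actual script being requested.
    return {"result": script_input}

def poll_once():
    with ServiceBusClient.from_connection_string(CONN_STR) as client:
        receiver = client.get_queue_receiver(queue_name="script-requests")
        with receiver:
            for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
                payload = json.loads(str(msg))           # body carries input + callback URL
                result = run_script(payload["input"])
                requests.post(payload["callback_url"], json=result)
                receiver.complete_message(msg)

if __name__ == "__main__":
    poll_once()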
In my experience, there are two strategies you could use.
Developing a Python script for Azure HDInsight. Azure HDInsight is a platform based on Hadoop with the power of parallel compute; you can refer to the doc https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-streaming-python/ to learn more about it.
Developing a Python script based on a parallel compute framework like dispy or jug running on Azure VMs.
Hope it helps. Best regards.
I wrote a Python script that will pull data from a 3rd-party API and push it into a SQL table I set up in AWS RDS. I want to automate this script so that it runs every night (the script only takes about a minute to run). I need to find a good place and way to set up this script so that it runs each night.
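For context, the job itself is nothing exotic; it is shaped roughly like this (the API URL, credentials, table, and columns are placeholders, and pymysql is just one possible driver):

import requests
import pymysql

def nightly_sync():
    rows = requests.get("https://api.example.com/data").json()
    conn = pymysql.connect(host="mydb.xxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
                           user="admin", password="...", database="mydb")
    try:
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO readings (id, value) VALUES (%s, %s)",
                [(r["id"], r["value"]) for r in rows],
            )
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    nightly_sync()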
I could set up an EC2 instance with a cron job on it and run it from there, but it seems expensive to keep an EC2 instance alive all day for only one minute of run-time per night. Would AWS Data Pipeline work for this purpose? Are there other, better alternatives?
(I've seen similar topics discussed when googling around but haven't seen recent answers.)
Thanks
Based on your case, I think you can try to use ShellCommandActivity in Data Pipeline. It will launch an EC2 instance and execute the command you give to Data Pipeline on your schedule. After finishing the task, the pipeline will terminate the EC2 instance.
Here are the docs:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-shellcommandactivity.html
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-ec2resource.html
Alternatively, you could use a 3rd-party service like Crono. Crono is a simple REST API to manage time-based jobs programmatically.