So far when dealing with web scraping projects, I've used GAppsScript, meaning that I can easily trigger the script to be run once a day.
Is there an equivalent service when dealing with python scripts? I have a RaspberryPi, so I guess I can keep it on 24/7 and use cronjobs to trigger the script daily. But that seems rather wasteful, since I'm talking about a few small scripts that take only a few seconds to run.
Is there any service that allows me to trigger a python script once a day? (without a need to keep a local machine on 24/7) The simpler the solution the better, wouldn't want to overengineer such a basic use case if a ready-made system already exists.
The only service I've found so far to do this with is WayScript and here's a python example running in the cloud. The free tier that should be enough for most simple/hobby-tier usecases.
Related
I have a folder with different folders in it.
In one of the folder I have a python script.
The python script reads an excel file (which is in the same folder), scrapes information from the internet, updates the excel file and creates another excel file in the main directory.
My question is:
As I can't run my computer non stop, I imagine it's possible (easy? and free) to upload all my folders on a website which will allow me to run my python (3.8) script. Do you have any suggestions ? Which website could be appropriate ? Pythonanywhere.com ?
Plus, I'd like to run this script every morning at 6am.
Thank you for your answers ! :)
Yes, you could use PythonAnywhere -- free accounts allow you to create one scheduled task, which can run once a day. If you have an account, you can set it up on the "Tasks" page.
Some public cloud providers, such as GCP, AWS, and Azure, offer free tier VMs. Simply run the code on those and set up a cron job. Though the network usage probably still costs you a few cents a month, this is a very cheap way to go. You could also consider setting up a FaaS solution against very low cost.
As #Klaus said, this is not a programming question. If you are on linux you can use crontab to schedule your process.
crontab
And if you want to run it on the cloud you can use free services like Heroku
I wrote a python script to send Data from a local DB via REST to Kafka.
My goal: I would like this script to run indefinitely, by either restarting in set intervals (i.e. every 5min) or whenever the DB gets new entries. I assume the set Intervals thing would be good enough, easier and safer.
Someone suggested to me to either run it via a cronjob and use a monitoring tool or do it using jenkins (which he considered better).
My Setting: I am not a DevOps engineer, and would like to know about the possibilities and risks setting this Script up. It would be no trouble to recreate the Script in Java if this improves the situation.
My Question: I did try to learn what jenkins is about and i think i understood the CI and CD part. But i don't see how this could help me with my goal. Can someone elaborate on this with some experience on this topic?
If you would suggest a cronjob, what are common methods or tools to monitor such a case? I think the main risks are, failing to send the data due to connection issues on the local machine to REST or the local DB or not beieng started properly at the specified time.
Jobs can be scheduled at regular intervals in Jenkins just like with cron, in fact it uses the same syntax. What's nice about scheduling the job via Jenkins, is that it's very easy to have it send an email if the job exits with a non-zero return code. I've moved all of my cron jobs into Jenkins and it's working well. So by running it via Jenkins you're covering the execution side and the monitoring side at the same time.
My intention is to run python code 24/7 over several months to collect data through API calls and alert me if certain conditions are met.
How can I do that without keeping the code running on my laptop 24/7? Is there a way of doing it in "the cloud"?
Preferably free but would consider paying. Simplicity also a plus.
There is a plenty of free possibilities. You can use heroku servers. They have a strong documentation for python and is free (but of course with limit on megabytes and time usage). Heroku is also good because they have a lot of extensions (databases, dashboards, loggers etc.). In order to deploy your app on heroku you need to have some experience of working with git (but not that much, don't worry).
Another possibility could be Google Cloud Platform, which should be also easy to use.
Let me explain what I'm trying to achieve. In the past while working on Java platform, I used to write Java codes(say, to push or pull data from MySQL database etc.) then create a war file which essentially bundles all the class files, supporting files etc and put it under a servlet container like Tomcat and this becomes a web service and can be invoked from any platform.
In my current scenario, I've majority of work being done in Java, however the Natural Language Processing(NLP)/Machine Learning(ML) part is being done in Python using the NLTK, Scipy, Numpy etc libraries. I'm trying to use the services of this Python engine in existing Java code. Integrating the Python code to Java through something like Jython is not that straight-forward(as Jython does not support calling any python module which has C based extensions, as far as I know), So I thought the next option would be to make it a web service, similar to what I had done with Java web services in the past. Now comes the actual crux of the question, how do I run the ML engine as a web service and call the same from any platform, in my current scenario this happens to be Java. I tried looking in the web, for various options to achieve this and found things like CherryPy, Werkzeug etc but not able to find the right approach or any sample code or anything that shows how to invoke a NLTK-Python script and serve the result through web, and eventually replicating the functionality Java web service provides. In the Python-NLTK code, the ML engine does a data-training on a large corpus(this takes 3-4 minutes) and we don't want the Python code to go through this step every time a method is invoked. If I make it a web service, the data-training will happen only once, when the service starts and then the service is ready to be invoked and use the already trained engine.
Now coming back to the problem, I'm pretty new to this web service things in Python and would appreciate any pointers on how to achieve this .Also, any pointers on achieving the goal of calling NLTK based python scripts from Java, without using web services approach and which can deployed on production servers to give good performance would also be helpful and appreciable. Thanks in advance.
Just for a note, I'm currently running all my code on a Linux machine with Python 2.6, JDK 1.6 installed on it.
One method is to build an XML-RPC server, but you may wish to fork a new process for each connection to prevent the server from seizing up. I have written a detailed tutorial on how to go about this: https://speakerdeck.com/timclicks/case-studies-of-python-in-parallel?slide=68.
NLTK based system tends to be slow at response per request, but good throughput can be achieved given enough RAM.
I'm a complete novice in this area, so please excuse my ignorance.
I have three questions:
What's the best (fastest, easiest, headache-free) way of hosting a python program online?
I'm currently looking at Google App Engine and Web Frameworks for Python, but all the options are a bit overwhelming.
Which gui/viz libraries will transfer to a web app environment without problems?
I'm willing to sacrifice some performance for the sake of simplicity.
(Google App Engine can't do C libraries, so this is causing a dilemma.)
Where can I learn more about running a program locally vs. having a program continuously run on a server and taking requests from multiple users?
Currently I have a working Python program that only uses standard Python libraries. It currently uses around 2.7gb of ram, but as I increase my dataset, I'm predicting it will use closer to 6gb. I can run it on my personal machine, and everything is just peachy. I'd like to continue developing on the front end on my home machine and implement the web app later.
Here is a relevant, previous post of mine.
Depending on your knowledge with server administration, you should consider a dedicated server. I was doing running some custom Python modules with Numpy, Scipy, Pandas, etc. on some data on a shared server with Godaddy. One program I wrote took 120 seconds to complete. Recently we switched to a dedicated server and it now takes 2 seconds. The shared environment used CGI to run Python and I installed mod_python on the dedicated server.
Using a dedicated server allows COMPLETE control (including root access) to the server which allows the compilation and/or installation of anything. It is a bit pricy but if you're making money with your stuff it might be worth it.
Another option would be to use something like http://www.dyndns.com/ where you can host a domain on your own machine.
So with that said, perhaps some answers:
It depends on your requirements. ~4gb of RAM might require a dedicated server. What you are asking is not necessarily an easy task so don't be afraid to get your hands dirty.
Not sure what you mean here.
A server is just a computer that responds to requests. On the dedicated server (I keep mentioning) you are operating in a Unix (or Windows) environment just like you would locally. You use SOFTWARE (e.g. Apache web server) to serve client requests. My vote is mod_python.
It's a greater headache than a dedicated server, but it should be much closer to your needs to go with an Amazon EC2 instance.
http://aws.amazon.com/ec2/#instance
Their extra large instance should be more than large enough for what you need to do, and you only turn the instance on when you need it so you don't have the massive bill that you get with a dedicated server that's the same size.
There are some nice javascript based visualization toolkits out there, so you can model your application to return raw (json) data and render that on the client.
I can mention d3.js http://mbostock.github.com/d3/ and the JavaScript InfoVis Toolkit http://thejit.org/