I currently have a handful of small Python scripts on my laptop that are set to run every 1-15 minutes, depending on the script in question. They perform various tasks for me like checking for new data on a certain API, manipulating it, and then posting it to another service, etc.
I have a NAS/personal server (unRAID) and was thinking about moving the scripts there via Docker, but since I'm relatively new to Docker I wasn't sure about the best approach.
Would it be correct to take something like the Phusion baseimage, which includes cron, package my scripts and crontab as dependencies in the image, and write the Dockerfile to initialize all of this? Or would it be a more canonical approach to modify the scripts so that they schedule themselves with recursive timers, and run each script individually in its own official Python image?
No, dude, just install Python in the Docker container/image, move your scripts over, and run them as normal.
You may have to expose a port or add a firewall exception, but inside the container you effectively get a native Linux environment.
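For what it's worth, if you end up preferring the "one script per official Python image" option from the question, a minimal sketch of the in-process timer approach could look like this (the interval and task body are placeholders):

import time

INTERVAL_MINUTES = 5  # placeholder: pick the interval each script needs

def run_task():
    # check the API for new data, transform it, post it to the other service, etc.
    pass

if __name__ == "__main__":
    while True:
        run_task()
        time.sleep(INTERVAL_MINUTES * 60)

The container's CMD then just runs this file, and a restart policy (e.g. --restart unless-stopped) can handle crashes instead of cron.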
Related
I have a python script that runs locally via a scheduled task each day. Most of the time, this is fine -- except when I'm on vacation and the computer it runs on needs to be manually restarted. Or when my internet/power is down.
I am interested in putting it on some kind of rented server time. I'm a total newbie at this (having never had a production-type process like this), and I was unable to find any tutorials that address this type of use case. How would I install my Python environment and any config, data files, or programs that the script needs (e.g., it does some web scraping and uses headless Chrome with a defined user profile)?
Given the nature of the program, is this possible, or would I need to get a dedicated server whose environment can be better set up for my specific needs? The process runs for about 20 seconds a day.
Setting up a whole dedicated server for 20 seconds' worth of work is really suboptimal. I see a few options:
Get a cloud-based VM that gets spun up and down only to run your process. That's relatively easy to automate on Azure, GCP, and AWS.
Dockerize the application along with its whole environment and run it as an image in the cloud, e.g. on a service like Elastic Beanstalk (AWS) or App Service (Azure). This is more complex, but should be cheaper as it consumes fewer resources.
Get a dedicated VM (a droplet, in DigitalOcean terms) on a service like DigitalOcean, Heroku, or pythonanywhere.com. Depending on the specifics of your script, it may be quite easy and cheap to set up. This is the easiest and most flexible solution for a newbie, I think, but it really depends on your script; you might hit some limitations.
In terms of setting up your environment - there are multiple options, with the most often used being:
pyenv (my preferred option)
anaconda (quite easy to use)
virtualenv / venv
To efficiently recreate your environment, you'll need to come up with a list of dependencies (libraries your script uses).
A summary of the steps:
run pip freeze > requirements.txt locally
manually edit the requirements.txt file by removing all packages that are not used by your script
create a new virtual environment via pyenv, anaconda or venv and activate it wherever you want to run the script
copy your script & requirements.txt to the new location
run pip install -r requirements.txt to install the libraries
ensure the script works as expected in its new location
set up the cron job
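For the last step, a sample crontab entry might look like this (the paths and the 06:00 daily schedule are placeholders; it runs the script with the virtual environment's interpreter):

0 6 * * * /home/you/venvs/myscript/bin/python /home/you/myscript/script.py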
If the script only runs for 20 seconds and you are not worried about scalability, running it directly on a NAS or a Raspberry Pi could be a solution for a private environment, if you have the hardware on hand.
If you don’t have the necessary hardware available, you may want to have a look at PythonAnywhere which offers a free version.
https://help.pythonanywhere.com/pages/ScheduledTasks/
https://www.pythonanywhere.com/
However, in any professional environment I would opt for a tool like Apache Airflow. Your process ("it does some web scraping and uses headless Chrome with a defined user profile") describes an ETL workflow.
https://airflow.apache.org/
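As a rough illustration only, a minimal Airflow DAG for a daily job could look like the sketch below (assuming Airflow 2.x; the dag_id and task body are placeholders, not your actual scraping code):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def scrape_and_load():
    # the web-scraping / headless-Chrome work would go here
    pass

with DAG(
    dag_id="daily_scrape",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="scrape_and_load", python_callable=scrape_and_load)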
A pretty large Python-based project I'm working on has to deal with a situation some of you might know:
you have a local checkout which your server cannot be run from (for historical reasons), you alter a couple of files, e.g. by editing them or via git operations, and then you want to locally 'patch' a running server residing at a different location in the file system.
[Local Checkout, e.g. /home/me/project] = deploy => [Running Environment, e.g. /opt/project]
The 'deployment' process might have to run arbitrary build scripts, copy modified files, maybe restart a running service and so on.
Note that I'm not talking about CI or web-deployment - it's more like you change something on your source files and want to know if it runs (locally).
Currently we do this with a home-grown hierarchy of scripts and want to improve on that approach, e.g. with a make-based one.
Personally I dislike make for Python projects for a couple of reasons, but in principle what I'm looking for could be done with make: it detects modifications, knows about dependencies, and can do arbitrary things to satisfy them.
I'm now wondering if there isn't something like make for Python projects, with the same basic features as make but with 'Python awareness' (Python bindings, nice handling of command-line args, etc.).
Does this kind of 'deploy my site for development' process have a name I should know? I'm not asking which program I should use but how I should inform myself (examples are very welcome, though).
I am trying to use docker to run numerical experiments (eventually on a node like AWS, but let's leave that for now). The code is in python, with some underlying c libraries. The code changes frequently, so the docker image needs to be recreated frequently. Also parameter files change for every experiment I run. I want to use docker to reduce clutter on the machine I run my experiment on.
I don't want to have a docker image per experiment sitting on my hard disk, so I wanted to know if there a way to create, execute, and then delete a docker image in sequence from a python script.
You could use Python's subprocess module to call the necessary docker commands on Linux/Windows, depending on where your Docker is running. E.g. for Linux:
import subprocess

# remove the old image (if it exists), then rebuild it from the Dockerfile directory
subprocess.call(["docker", "rmi", "<your-image-name>"])
subprocess.call(["docker", "build", "--tag", "<your-image-name>", "<dir-of-Dockerfile>"])
If your machine is Windows, it may need different arguments, which you can find by Googling.
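To cover the full "create, execute, then delete" sequence from the question, a sketch using subprocess.run might look like this (the image tag and build directory are placeholders):

import subprocess

IMAGE = "experiment-image"             # placeholder image tag
BUILD_DIR = "/path/to/dockerfile-dir"  # placeholder build context

# build the image, run the experiment once, then remove the image again
subprocess.run(["docker", "build", "--tag", IMAGE, BUILD_DIR], check=True)
subprocess.run(["docker", "run", "--rm", IMAGE], check=True)
subprocess.run(["docker", "rmi", IMAGE], check=True)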
I know there are a ton of articles, blogs, and SO questions about running Python applications inside a Docker container. But I am looking for information on doing the opposite. I have a distributed application with a lot of independent components, and I have put each of these components inside its own Docker container that I can run manually via
docker run -d <MY_IMAGE_ID> mycommand.py
But I am trying to find a way (a pythonic, object-oriented way) of running these containerized applications from a python script on my "master" host machine.
I can obviously wrap the command line calls into a subprocess.Popen(), but I'm looking to see if something a bit more managed exists.
I have seen docker-py but I'm not sure if this is what I need or not; I can't find a way to use it to simply run a container, capture output, etc.
If anyone has any experience with this, I would appreciate any pointers.
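For reference, with docker-py (the docker package on PyPI) running a container and capturing its output can be as short as the following sketch; the image name and command are placeholders:

import docker

client = docker.from_env()
# runs the container, waits for it to finish, and returns its stdout as bytes
output = client.containers.run("my-image", "python mycommand.py", remove=True)
print(output.decode())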
I have a Django site that needs to be rebuilt every night. I would like to check out the code from the Git repo and then do things like setting up the virtual environment, downloading the packages, etc. This would require no manual intervention, as it would be run from cron.
I'm really confused as to what to use for this. Should I write a Python script or a Shell script? Are there any tools that assist in this?
Thanks.
So what I'm looking for is CI and from what I've seen I'll probably end up using Jenkins or Buildbot for it. I've found the docs to be rather cryptic for someone who's never attempted anything like this before.
Do all CI tools like Buildbot/Jenkins simply run tests and send you reports, or do they actually set up a working Django environment that you can access through your browser?
You'll need to create some sort of build script that does everything but the Git checkout. I've never used any Python build tools, but perhaps something like SCons would work: http://www.scons.org/.
Once you've created a script you can use Jenkins to schedule a nightly build and report success/failure: http://jenkins-ci.org/. Jenkins will know how to checkout your code and then you can have it run your script.
There are literally hundreds of different tools to do this. You can write Python scripts to be run from cron, you can write shell scripts, or you can use one of the hundreds of different build tools.
Most python/django shops would likely recommend Fabric. This really is a matter of you running through and making sure you understand everything that needs to be done and how to script it. Do you need to run a test suite before you deploy to ensure it doesn't really break everything? Do you need to run South database migrations? You really need to think about what needs to be done and then you just write a fabric script to do those things.
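A minimal fabfile sketch might look like this (assuming Fabric 1.x; the commands, test run, and migration step are placeholders for whatever your project actually needs):

# fabfile.py
from fabric.api import local

def rebuild():
    local("git pull")
    local("pip install -r requirements.txt")
    local("python manage.py test")
    local("python manage.py migrate")  # or South's migrate in older projects

You would then run it with fab rebuild, e.g. from cron or from Jenkins.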
None of this even touches the fact that overall what you're asking for is continuous integration which itself has a whole slew of tools to help manage that.
What you are asking for is Continuous Integration.
There are many CI tools out there, but in the end it boils down to your personal preferences (like always, hopefully) and which one just works for you.
The Django project itself uses buildbot.
If you ask me, I would recommend continuous.io, which works out of the box with Django applications.
You can manually set how many times you would like to build your Django project, which is great.
You can, of course, write a shell script that rebuilds your Django project via cron, but you deserve better than that.