Make virtual environment setup faster - python

I have some automated tests on Jenkins where part of the build steps is to setup a virtual environment (Virtualenv Builder option).
This step performs a pip install for about 6 libraries maintained externally.
The problem I have is that this process takes time.
Is there a way of loading an image of the Python virtual environment?
That way I could create another Jenkins job which generates this image, and that job would only be executed when there are significant changes to the libraries. My original automated test jobs would then simply fetch the artifact from this external job at the start of the tests and load the virtual environment from it.
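One way to approximate such an image without archiving the virtualenv itself (virtualenvs are generally not relocatable across paths or machines) is to archive a wheel cache as the build artifact instead. A rough sketch, assuming a requirements.txt listing the six libraries and a hypothetical wheels/ directory:

# In the "builder" job: pre-build wheels for every dependency and archive wheels/ as the artifact
python -m venv build-env
build-env/bin/pip wheel -r requirements.txt -w wheels/

# In each test job: create a fresh virtualenv and install offline from the cached wheels
python -m venv test-env
test-env/bin/pip install --no-index --find-links=wheels/ -r requirements.txt

Installing from pre-built wheels with --no-index skips both the downloads and any compilation, which is usually where most of the time goes.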

Related

Why should a python google cloud function not contain a Pipfile?

According to the documentation here:
Dependency specification using the Pipfile/Pipfile.lock standard is currently not supported. Your project should not include these files.
I use Pipfile for managing my dependencies and create a requirements.txt file through
pipenv lock --requirements
So far everything works and my gcloud function is up and running. So why should a Python Google Cloud Function not contain a Pipfile?
If it shouldn't, what is the preferred way to manage an isolated environment?
When you deploy your function, it is deployed into its own environment. You don't have to manage several environments, because each Cloud Function deployment is dedicated to one and only one piece of code.
That's why a virtual environment is pointless in a single-purpose environment. You could use Cloud Run, which lets you customize your build and runtime environment, but here again it's unnecessary: you won't run concurrent environments in the same container, so it doesn't make sense.
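For completeness, a minimal sketch of the workflow the asker already uses, with a hypothetical function name and runtime; Cloud Functions reads dependencies from a requirements.txt in the source directory:

# Generate requirements.txt from the Pipfile, then deploy only the function source
pipenv lock --requirements > requirements.txt
gcloud functions deploy my_function --runtime python39 --trigger-http --source .

The Pipfile itself can simply be left out of the deployed source, for example via a .gcloudignore entry.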

Automate daily python process on remote server for improved reliability

I have a python script that runs locally via a scheduled task each day. Most of the time, this is fine -- except when I'm on vacation and the computer it runs on needs to be manually restarted. Or when my internet/power is down.
I am interested in putting it on some kind of rented server time. I'm a total newbie at this (having never had a production-type process like this), and I was unable to find any tutorials that seemed to address this type of use case. How would I install my Python environment and any config, data files, or programs that the script needs (e.g., it does some web scraping and uses headless Chrome with a defined user profile)?
Given the nature of the program, is it possible to do this, or would I need to get a dedicated server whose environment can be better set up for my specific needs? The process runs for about 20 seconds a day.
Setting up a whole dedicated server for 20s worth of work is really a suboptimal thing to do. I see a few options:
Get a cloud-based VM that gets spun up and down only to run your process. That's relatively easy to automate on Azure, GCP and AWS (see the sketch after this list).
Dockerize the application along with its whole environment and run it as an image in the cloud - e.g. on a service like Beanstalk (AWS) or App Service (Azure). This is more complex, but should be cheaper as it consumes fewer resources.
Get a dedicated VM (droplet?) on a service like DigitalOcean, Heroku or pythonanywhere.com - depending on the specifics of your script, it may be quite easy and cheap to set up. This is the easiest and most flexible solution for a newbie, I think, but it really depends on your script - you might hit some limitations.
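As an illustration of the first option, a rough sketch with the AWS CLI; the instance ID and host name are placeholders, and the same pattern exists on Azure and GCP:

# Start the stopped instance, wait until it is running, run the job over SSH, then stop it again
aws ec2 start-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-running --instance-ids i-0123456789abcdef0
ssh ec2-user@my-instance 'python3 /home/ec2-user/my_script.py'
aws ec2 stop-instances --instance-ids i-0123456789abcdef0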
In terms of setting up your environment - there are multiple options, with the most often used being:
pyenv (my preferred option)
anaconda (quite easy to use)
virtualenv / venv
To efficiently recreate your environment, you'll need to come up with a list of dependencies (libraries your script uses).
A summary of the steps:
run pip freeze > requirements.txt locally
manually edit the requirements.txt file by removing all packages that are not used by your script
create a new virtual environment via pyenv, anaconda or venv and activate it wherever you want to run the script
copy your script & requirements.txt to the new location
run pip install -r requirements.txt to install the libraries
ensure the script works as expected in its new location
set up the cron job (a sketch of these steps follows below)
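Put together as shell commands, using the built-in venv module; the environment name and script path below are just placeholders:

# On your local machine: capture the dependencies, then prune the file down to what the script actually imports
pip freeze > requirements.txt

# On the server: recreate the environment and install the libraries into it
python3 -m venv ~/envs/scraper
~/envs/scraper/bin/pip install -r requirements.txt

# Run the script once manually to confirm it works, then schedule it with cron
crontab -e
# add a line like this to run every day at 06:00, using the venv's interpreter directly:
# 0 6 * * * /home/me/envs/scraper/bin/python /home/me/my_script.py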
If the script only runs for 20 seconds and you are not worried about scalability, running it directly on a NAS or a Raspberry Pi could be a solution for a private environment, if you have the hardware on hand.
If you don’t have the necessary hardware available, you may want to have a look at PythonAnywhere which offers a free version.
https://help.pythonanywhere.com/pages/ScheduledTasks/
https://www.pythonanywhere.com/
However, in any professional environment I would opt for a tool like Apache Airflow. Your process of “it does some web scraping and uses headless chrome w/a defined user profile” describes an ETL workflow.
https://airflow.apache.org/

Should I activate my Python virtual environment before running my app in upstart?

I am working through the process of installing and configuring the Superset application. (A Flask app that allows real-time slicing and analysis of business data.)
When it comes to the Python virtual environment, I have read a number of articles and how-to guides and understand the concept of how it allows you to install packages into the virtual environment to keep things neatly contained for my application.
Now that I am preparing this application for (internal) production use, do I need to be activating the virtual environment before launching gunicorn in my upstart script? Or is the virtual environment more just for development and installing/updating packages for my application? (In which case I can just launch gunicorn without the extra step of activating the virtualenv.)
You should activate a virtualenv on the production server the same way as you do on the development machine. It allows you to run multiple Python applications on the same machine in a controlled environment. No need to worry that an update of packages in one virtualenv will cause an issue in the other one.
If I may suggest something: I really enjoy using virtualenvwrapper to simplify the use of virtualenvs even more. It allows you to define hooks, e.g. preactivate, postactivate, predeactivate and postdeactivate, using the scripts in $VIRTUAL_ENV/bin/. It's a good place for setting up environment variables that your Python application can utilize.
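A minimal sketch of such a hook; virtualenvwrapper sources this file every time the environment is activated, and the variables below are only examples:

# $VIRTUAL_ENV/bin/postactivate
export SUPERSET_CONFIG_PATH=/etc/superset/superset_config.py
export FLASK_ENV=production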
And a good and simple tool for process control is supervisord.
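Whichever launcher you use (upstart here, or supervisord), "activating" can be as simple as pointing it at the virtualenv's own executables; a sketch with placeholder paths and a placeholder WSGI entry point:

# Equivalent to activating first: the venv's path is baked into its entry points
/opt/venvs/superset/bin/gunicorn -w 4 -b 127.0.0.1:8088 myapp:app

# Or, explicitly, in the script that upstart/supervisord runs:
. /opt/venvs/superset/bin/activate
exec gunicorn -w 4 -b 127.0.0.1:8088 myapp:app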

python conda deployment on server

Let's say I have two projects that I develop on my personal machine. I use conda to manage my Python dependencies, and I created environments to manage these projects. When I'm done with the dev work, I want to export them to a remote machine that will run these two projects regularly, at the same time. How should I manage this deployment?
After some research, I came up with this:
clone your environments as described in conda's docs.
export your environment file and ship it to the server along with your project.
import the environment into the server's conda.
create a bash script like this:
#!/bin/bash
source activate my_environment
python ~/my_project/src/code.py
set up cron as usual, calling the bash script above
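The export/import steps as commands, assuming the environment is called my_environment as in the script above, and a placeholder path for that bash script:

# On the dev machine: export the environment definition
conda env export -n my_environment > environment.yml

# On the server: recreate the environment from the file
conda env create -f environment.yml

# crontab entry to run the bash script every day at 02:00
# 0 2 * * * /bin/bash /home/me/my_project/run.sh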

python tox, creating rpm virtualenv, as part of ci pipeline, unsure about where in workflow

I'm investigating how Python applications can also use a CI pipeline, but I'm not sure how to create the standard work-flow.
Jenkins is used to do the initial repository clone, and then initiates tox. Basically this is where Maven and/or MSBuild would get dependency packages and build... which tox does via pip, so all good here.
But now for the confusing part: the last part of the pipeline is creating and uploading packages. Devs would likely upload created packages to a local pip repository, BUT then also possibly create a deployment package. In this case it would need to be an RPM containing a virtualenv of the application. I have made one manually using rpmvenv, but regardless of how it's made, how would such a step be added to a tox config? In the case of rpmvenv, it creates its own virtualenv - a self-contained command, so to speak.
I like going with the Unix philosophy for this problem: have a tool that does one thing incredibly well, then compose other tools together. Tox is purpose-built to run your tests in a bunch of different Python environments, so using it to then build a deb / rpm / etc. for you feels like a bit of a misuse of that tool. It's probably easier to use tox just to run all your tests, then, depending on the results, have another step in your pipeline deal with building a package for what was just tested.
Jenkins 2.x which is fairly recent at the time of this writing seems to be much better about building pipelines. BuildBot is going through a decent amount of development and already makes it fairly easy to build a good pipeline for this as well.
What we've done at my work is:
Buildbot in AWS which receives push notifications from Github on PR's
That kicks off a docker container that pulls in the current code and runs Tox (py.test, flake8, as well as protractor and jasmine tests)
If the tox step comes back clean, kick off a different docker container to build a deb package
Push that deb package up to S3 and let Salt deal with telling those machines to update
That deb package is also just available as a build artifact, similar to what Jenkins 1.x would do. Once we're ready to go to staging, we just take that package and promote it to the staging debian repo manually. Ditto for rolling it to prod.
Tools I've found useful for all this:
Buildbot, because it's in Python and thus easier for us to work on, but Jenkins would work just as well. Regardless, this is the controller for the entire pipeline
Docker because each build should be completely isolated from every other build
Tox the glorious test runner to handle all those details
fpm builds the package. RPM, DEB, tar.gz, whatever. Very configurable and easy to script.
Aptly makes it easy to manage debian repositories and in particular push them up to S3.
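A rough sketch of that post-tox packaging step, with placeholder names, versions and repo details:

# Only build the package if the test stage passed
tox && fpm -s dir -t deb -n myapp -v 1.0.0 --prefix /opt/myapp ./dist

# Add the resulting package to an aptly-managed repo and republish it
# (assumes the repo has already been published once, e.g. to an S3 endpoint)
aptly repo add myrepo myapp_1.0.0_amd64.deb
aptly publish update stable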
