Adding Python Libraries to Airflow-Puckel on Docker - python

I am new to Docker and Airflow and am having trouble figuring out the correct place to add the httplib2 Python library to the container. I am using the Airflow-Puckel image. Do I need to add it to the Dockerfile, the docker-compose YAML file, or both? And once it is added, do I just need to rebuild the container with up for it to be picked up?

From my own experience while learning Airflow and Docker, I strongly recommend using the official docker-compose file maintained by Airflow. If you are taking your first steps with Docker and Airflow, the guides and docs are comprehensive and come in very handy. There is also the fact that the official images are more likely to be kept up to date with the latest Airflow version.
For example, once you are done with the initialization, you can take a look at this article, where it is explained how to add packages to each of the services run by Compose and how to set the stack up to be production-ready. You could check this answer for an example too.
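If you do want to stay on the Puckel image for now, a minimal sketch of the usual approach is to extend the image with a small Dockerfile and point docker-compose at it (the tag below and the need to switch users are assumptions; adjust them to the image version you actually run):

# Dockerfile, next to your docker-compose.yml
FROM puckel/docker-airflow:1.10.9    # use whatever tag your compose file currently pulls
USER root                            # the image may drop to a non-root airflow user
RUN pip install --no-cache-dir httplib2
USER airflow

# docker-compose.yml: for each Airflow service, replace
#   image: puckel/docker-airflow:...
# with
#   build: .

Then docker-compose up -d --build rebuilds the image and restarts the services with the library baked in.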
Good luck!

Related

How to run docker-compose using the Docker Python SDK

I would like to run docker-compose via the Python Docker SDK.
However, I couldn't find any reference on how to achieve this in the Python SDK documentation. I could also use subprocess, but I had some other difficulties with that approach; see here: docker compose subprocess
I am working on the same issue and was looking for answers, but have found nothing so far. The best shot I can give it is to replicate the docker-compose logic yourself: for example, if your YAML file defines a network and some services, create them separately using the Python Docker SDK and connect the containers to the network.
It gets cumbersome, but eventually you can get things working that way from Python.
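For illustration, here is a rough Python sketch of that approach using the Docker SDK (docker-py); the images, names, ports and environment values are placeholders, and you would still need to translate volumes, depends_on, etc. from your YAML by hand:

import docker

client = docker.from_env()

# roughly the "networks:" section of the compose file
network = client.networks.create("app_net", driver="bridge")

# one containers.run() call per compose service
db = client.containers.run(
    "postgres:15",                          # placeholder service image
    name="db",
    network="app_net",
    environment={"POSTGRES_PASSWORD": "example"},
    detach=True,
)
web = client.containers.run(
    "my-web-image:latest",                  # placeholder application image
    name="web",
    network="app_net",
    ports={"8000/tcp": 8000},
    detach=True,
)

# rough equivalent of "docker-compose down"
for container in (web, db):
    container.stop()
    container.remove()
network.remove()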
I created a package to make this easy: python-on-whales
Install with
pip install python-on-whales
Then you can do
from python_on_whales import docker
docker.compose.up()
docker.compose.stop()
# and all the other commands.
You can find the documentation for the package here: https://gabrieldemarmiesse.github.io/python-on-whales/

How to make images of Docker containers for Python, R and MongoDB work together

The stack I selected for my project is Python, R and MongoDB. I'd like to adopt Docker for this project, but when I did my research on the internet I mostly found examples for MySQL with PHP or WordPress. So I'm curious to know where I can find tutorials or examples of using containers with Python, R and MongoDB, or any ideas on how to put them together. What will the Dockerfile look like? In particular, in my project R (used for data processing and data visualisation) will be called from Python (used as the data collector) as a sub-module, for data cleaning as well.
Any help will be appreciated.
Option 1:
Split them into multiple Docker images and run them all using docker-compose from a YAML file that sets them all up, which is easier.
There's probably already an image for each of those services that you can use, adding your own code to them using Docker volumes. Just look for them on Docker Hub.
Examples of use for the existing Python image are already in its description. It even shows how to create your own Docker image using a Dockerfile, which you need for each image.
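As a rough sketch of what Option 1 could look like (all image tags, paths and script names below are placeholders):

# docker-compose.yml
version: "3"
services:
  mongo:
    image: mongo:6
    volumes:
      - mongo-data:/data/db
  r:
    image: r-base                  # official R image on Docker Hub
    volumes:
      - ./r:/workspace
    command: Rscript /workspace/analysis.R
  collector:
    image: python:3.11
    volumes:
      - ./app:/app
    working_dir: /app
    command: python collector.py
    depends_on:
      - mongo
volumes:
  mongo-data:

Each service can later be switched to a build: section with its own Dockerfile once you need extra libraries installed.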
Option 2:
You can build just one image from a more general-purpose base image (say Debian or Ubuntu), install all the interpreters, libraries and other requirements inside it, and then create an ENTRYPOINT that calls a script which starts each service and stays in the foreground so the container does not exit.
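A very condensed sketch of Option 2 (the base image, package list and entrypoint script are placeholders, and MongoDB itself would have to come from its own apt repository):

# Dockerfile
FROM debian:bookworm
RUN apt-get update && apt-get install -y python3 python3-pip r-base \
    && rm -rf /var/lib/apt/lists/*
# ... add the MongoDB apt repository and install the server here ...
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]     # entrypoint.sh starts mongod plus your Python/R jobs and then waits

Keeping one process per container (Option 1) is generally easier to manage, which is why the Compose route is usually preferred.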

Workflow for Python with Docker + IDE for non-web applications

I am currently trying to insert Docker in my Python development workflow of non-web applications.
What are the current best practices in Python development using Docker and an IDE?
I need the possibility to isolate my environments with Docker and debug my code.
On the web I found many articles about the use of Docker to deploy your code:
Production deployments: how to build Docker images ready to spin with your application already packaged inside
Development environments that mirror production: extension of the above, where you can use a container to fully QA the current status of a project before deploying to production while developing
I found a lot less about an actual development workflow, apart from some tips on how to use containers with shared volumes mapped to directories on the host while developing web applications. This approach does not apply to non-web applications, and it has issues in cases where a simple reload (with a LiveReload-like mechanism) is not enough and you need to restart your container(s).
The closest writing I could find is this "Eight Docker Development Patterns" blog post, but it does not consider an IDE (like PyCharm, which I am using now).
Maybe this question is the result of the 3-4 hours (and counting) spent configuring PyCharm to use a remote Python interpreter running in a Docker container. I expected a much better integration between the two.
Actually, I believe that using the Docker interpreter in PyCharm is the way to go. Which version of PyCharm do you have? If you have the 2016 version, it should be set up within seconds. You just have to make sure your Docker machine is running, and you must have already built the image you would like to use with your project. PyCharm will find the Docker machine automatically in the "add remote interpreter" dialog. Then select your image and you're all set up.
You can run your code as usual then, almost without any delay.
Here's what worked for me: https://www.jetbrains.com/help/pycharm/2016.1/configuring-remote-interpreters-via-docker.html
And make sure to update PyCharm, that solved some issues I had.

Reusable Django apps + Ansible provisioning

I'm a long-time Django developer and have just started using Ansible, after using Vagrant for the last 18 months. Historically I've created a single VM for development of all my projects, and symlinked the reusable Django apps (Python packages) I create into the site-packages directory.
I've got a working dev box for my latest Django project, but I can't really make changes to my own reusable apps without having to copy those changes back to a Git repo. Here's my ideal scenario:
I checkout all the packages I need to develop as Git submodules within the site I'm working on
I have some way (symlinking or a better method) to tell Ansible to set up the box and install my packages from these Git submodules
I run vagrant up or vagrant provision
It reads requirements.txt and installs the remaining packages (things like South, Pillow, etc), but it skips my set of tools because it knows they're already installed
I hope that makes sense. Basically, imagine I'm developing Django. How do I tell Vagrant (via Ansible, I assume) to find my local copy of Django, rather than the one from PyPI?
Currently the only way I can think of doing this is creating individual symlinks for each of those packages I'm developing, but I'm sure there's a more sensible model.
Thanks!
You should probably think of it slightly differently. You create a Vagrantfile which specifies Ansible as a provisioner. In that Vagrantfile you also specify which playbook to use for your vagrant provision step.
If your playbooks are written in an idempotent way, running them multiple times will skip steps that already match the desired state.
You should also think about what the desired end state of the VM should look like and write playbooks to accomplish that. Unless I'm misunderstanding something, all your playbook actions should be happening inside the VM, not directly on your local machine.
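As a loose sketch of what the relevant playbook tasks could look like (the paths, the virtualenv location and the package names are all assumptions; this relies on Vagrant's default synced folder mounting your project at /vagrant inside the VM):

- name: Install my in-development apps from the Git submodules in editable mode
  pip:
    name: /vagrant/src/{{ item }}
    editable: true
    virtualenv: /home/vagrant/venv
  loop:
    - my-reusable-app-one
    - my-reusable-app-two

- name: Install the remaining requirements
  pip:
    requirements: /vagrant/requirements.txt
    virtualenv: /home/vagrant/venv

Because the editable installs point at the synced folder, edits you make on the host are picked up inside the VM without copying anything back to a Git repo, and pip should treat those packages as already satisfied when it processes requirements.txt (as long as the versions match).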

Installing my Django app on EC2

I'm in the process of launching a Django app on EC2, but have hit a wall trying to install my code on my AMI instance. This is my situation: I have a Bitnami AMI up and running that has Django, Apache, PostgreSQL and nearly all my dependencies preinstalled, and I have my fully functional Django app running on my local machine, which I have been testing so far with the Django dev server. After quite a bit of googling, the most common methods of installing an app on an EC2 instance seem to be either using ssh/sftp/scp to drop a tarball in the instance, or creating a repository and importing the code from there. If anyone can tell me the method they prefer and guide me through the process, or provide a link to a good tutorial, it would be hugely appreciated!
tar -pczf yourfile.tar.gz MyProject
scp -i /home/user/.cert/yourcert.pem yourfile.tar.gz user@serveripaddress:/home/user
tar -xzvf /home/user/yourfile.tar.gz
I usually simply scp -r my whole site directory into /home/bitnami of my AMI. I'm using Apache/NGINX/Django with mod_wsgi, so the directory (for example /home/bitnami/djangosites/) gets referenced via the mod_wsgi path in my Apache config file.
In other words, why not just copy the whole directory recursively (scp -r) instead of making a tarball, etc.?
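For example (the key path, user and address are placeholders):

scp -r -i ~/.ssh/yourkey.pem ./mysite user@serveripaddress:/home/bitnami/djangosites/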
Directly copying the folder where your project resides may work. However, you mention that you are using a BitNami image, so it is likely that you are using the BitNami Django Stack Amazon image. BitNami also provides a native version of the BitNami Django Stack, so I would suggest that you first try to deploy your application on top of the native installer and see exactly which steps you need to follow. For instance, you may need to install Python dependencies, or, if you plan to use Apache in production instead of the Django development server, you will need to configure Apache to serve your project. I'm a BitNami developer, and I mention this because making deployment easier on different platforms (including EC2) is one of BitNami's goals, and since you are already using it you can take advantage of this.
