Is there a standard way to use Python on the Apache NiFi Docker image?

My team uses NiFi scripts for some data processing tasks. We plan to deploy the NiFi cluster on Kubernetes, so we are using the apache/nifi Docker image for this purpose. We are just starting with NiFi, so we don't have much experience with it. We run some Python scripts through NiFi, and the team decided that Python is therefore required in the NiFi environment. So I made some modifications to the Dockerfile in the apache/nifi source code and built a custom image that has Python installed. Currently, everything is working.
Now I'm wondering whether this is the right approach. Are there other ways to work with Python on the NiFi Docker image? One difficulty I'll run into soon is upgrades: whenever I want to upgrade NiFi, I'll have to go through the whole process again of fetching the latest NiFi source code, editing it to add Python, and then building and pushing the image to my repository/registry.
Is there a standard way to do this?
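For reference, one common pattern that avoids rebuilding NiFi from source is to extend the published image directly. A minimal Dockerfile sketch, assuming a Debian-based apache/nifi image with apt available (verify the package names against the image tag you actually use):

```dockerfile
FROM apache/nifi:latest

# Package installation needs root; the base image runs as the nifi user.
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Drop back to the unprivileged user the base image expects.
USER nifi
```

With this approach an upgrade only means changing the FROM tag and rebuilding; no source checkout or source edits are needed.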

Related

Setting up JupyterLab for Python scripts on a cloud provider as a beginner

I have Python scripts for automated currency trading, and I want to deploy them by running them in JupyterLab on a cloud instance. I have no experience with cloud computing or Linux, so I have been trying for weeks to get into this cloud computing mania, but I have found it very difficult.
My goal is to set up a full-fledged Python infrastructure on a cloud instance from whichever provider so that I can run my trading bot on the cloud.
I want to set up a cloud instance, on whichever provider, that has the latest Python installation plus the typically needed scientific packages (such as NumPy, pandas, and others), in combination with a password-protected and Secure Sockets Layer (SSL)-encrypted JupyterLab server installation.
So far I have gotten nowhere. I am currently looking at the DigitalOcean website for setting up JupyterLab, but there are so many confusing terms.
What are Ubuntu and Debian? Are they sub-variants of the Linux operating system? Why do I have only two options here? I use neither of these operating systems; I use Windows on my laptop, which is also where I developed my Python script. Do I need a Windows server or something?
How can I do this? I have tried a lot of tutorials, but I just got more confused.
Your question raises several more questions about what you are trying to accomplish. Are you just trying to run your script on cloud services? Or do you want to schedule a server to spin up and execute your code? Are you running a bot that trades for you? These are just some initial questions after reading your post.
Regarding your specific question about Ubuntu and Debian: they are indeed Linux distributions, and they are popular options for servers. You can set up a Windows server on AWS or another cloud provider, but because Linux distributions are much more popular, you will find far more documentation, articles, and Stack Overflow posts about Linux-based servers.
If you just want to run a script on a cloud on demand, you would probably have a lot of success following Wayne's comment around PythonAnywhere or Google Colab.
If you want your own cloud server, I would suggest starting small and slow with a small or free-tier EC2 instance by following a tutorial such as this one: https://dataschool.com/data-modeling-101/running-jupyter-notebook-on-an-ec2-server/. Alternatively, you could splurge for an AWS AMI, which will have much more compute power and come preconfigured.
I had a similar problem, and the most suitable solution for me was using a Docker container for Jupyter notebooks. Instructions on how to install Docker can be found at https://docs.docker.com/engine/install/ubuntu/. There is a ready-to-use Docker image for the Jupyter notebook Python stack: docker pull jupyter/datascience-notebook. The Docker Compose files and some additional instructions can be found at https://github.com/stefanproell/jupyter-notebook-docker-compose/blob/master/README.md.
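As a sketch, a minimal docker-compose.yml for the image mentioned above might look like this (the port mapping and host volume path are assumptions; the linked repository has a fuller setup):

```yaml
services:
  datascience-notebook:
    image: jupyter/datascience-notebook
    ports:
      - "8888:8888"               # JupyterLab listens on 8888 inside the container
    volumes:
      - ./work:/home/jovyan/work  # persist notebooks on the host
```

Running docker compose up then prints a tokenized URL you can open in the browser.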

What is the difference between using base images and using apt?

I have read this question, and I understand that you need some kind of foundation to build your Docker image. However, I still don't see the purpose of Docker images like python.
Why do I need this:
FROM python:latest
when I can just do that:
FROM ubuntu
RUN apt-get update && apt-get install -y python3
Say I want to run a container where a python server is hosted using apache. What would be the difference between
Using the apache base image and installing python manually
Using the python base image and installing apache manually
Using the ubuntu base image and installing both manually
The difference is slim in the given example, because in the end you get the same thing with slightly different commands.
Things change when you need either the latest or a specific version of the software. The required version may not be available in the standard Ubuntu repositories, or may arrive there with a delay.
What you get from using python or apache2 as a base is the ability to pick the version you need with just one line of code as soon as it's published.
More significantly, there may be no need to combine python and apache. Docker containers are usually made to host a single process and it is more common to have a python backend in one container and a web-server as a proxy in another.
In this case you don't need to install Apache at all; you just mount its config into the container at runtime. Eliminating the web server lets you focus on the application and its dependencies alone, so in the end you have less code and an easier time maintaining it.
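A sketch of that split in Compose form, with the web server's config mounted at runtime as described (image tags and file paths are assumptions):

```yaml
services:
  backend:
    build: .          # your Python app image, e.g. FROM python:3.12-slim
    expose:
      - "8000"        # reachable by the proxy on the compose network
  proxy:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      # Config is mounted read-only, not baked into an image.
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
```

Each container hosts a single process, and only the proxy is published to the host.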

Can I deploy an app that uses both Python and Node.js on Amazon Elastic Beanstalk?

See TL;DR before the question if you wish, but any quick responses are appreciated.
I have a web-app that uses Node.js for the backend and Python for running a particular script.
The app basically takes an MS Excel file, does some computations on it (here is where Python comes into play) and returns the result MS Excel file.
I've used Python successfully alongside Node.js via the child_process module in Node.js.
I've also deployed the app on Heroku using Heroku buildpacks. (I added both the Python and Node.js buildpacks to make it work.)
Now, Heroku's slug size limit is a problem for me, as my Python program (which is basically an ML model) uses more than 300 MB of resources (around 13.8k individual files). Before someone points it out: I cannot use S3 in this case, as my Python script needs to read and use the files (the parameters of the ML model) from a folder.
So, I thought of moving to AWS Elastic Beanstalk, but I haven't been able to find any feature analogous to Heroku buildpacks. Is there any way I can make it work?
TL;DR
I want to deploy an app that uses Node.js along with Python with some (heavy) dependencies.
Can I do that? If yes then how? Are there any features like Heroku buildpacks?
Any detailed pointers are welcome.
Thank you =)
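For what it's worth, Elastic Beanstalk has no direct buildpack equivalent, but its Docker platform lets a single image carry both runtimes. A hedged Dockerfile sketch (the base image, file names, and port are assumptions about your project layout):

```dockerfile
FROM node:18-slim

# Add Python alongside Node so child_process can invoke the script.
# Note: on newer Debian bases, pip may require a virtualenv
# (or --break-system-packages) due to PEP 668.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
COPY . .

EXPOSE 8080
CMD ["node", "server.js"]
```

Since the image is built in your own pipeline, there is no slug-size-style limit on the bundled ML model files.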

How to run Docker Compose using the Docker Python SDK

I would like to run docker-compose via the Python Docker SDK.
However, I couldn't find any reference on how to achieve this in the SDK documentation. I could also use subprocess, but I ran into other difficulties with that; see here: docker compose subprocess.
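As a hedged sketch of the subprocess route mentioned above (the flag handling is an assumption; extend it to whichever compose subcommands you actually need):

```python
import subprocess


def compose_cmd(action, compose_file="docker-compose.yml", detach=False):
    """Build a `docker compose` invocation as an argument list."""
    cmd = ["docker", "compose", "-f", compose_file, action]
    if detach and action == "up":
        cmd.append("-d")
    return cmd


def run_compose(action, **kwargs):
    """Run the command; requires Docker with the compose plugin installed."""
    return subprocess.run(compose_cmd(action, **kwargs), check=True)
```

Keeping the argument-list construction separate from the subprocess call makes it easy to unit-test without a Docker daemon.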
I am working on the same issue and was looking for answers, but have found nothing so far. The best suggestion I can give is to simplify the docker-compose logic yourself: for example, if you have a YAML file with a network and services, create them separately using the Python Docker SDK and connect the containers to the network.
It gets cumbersome, but eventually you can get things working that way from Python.
I created a package to make this easy: python-on-whales
Install with
pip install python-on-whales
Then you can do
from python_on_whales import docker
docker.compose.up()
docker.compose.stop()
# and all the other commands.
You can find the documentation for the package at https://gabrieldemarmiesse.github.io/python-on-whales/

Build Process Tool for Python

I am a Java Developer writing (Stand-Alone) Applications for Apache Spark. To create Artifacts I use Gradle along with the ShadowJar Plugin.
A few teammates want to use Python. Currently, they use JetBrains PyCharm to write these Python scripts and execute them remotely on the Spark cluster environment. However, this process does not scale well (what do we do when more than one file is involved?), and I am looking for a solution in the Python ecosystem. A problem is that neither I nor any of my team members is a Python expert (in fact, the other teammates are not developers but have to write code; management decisions...), so we have no idea what best practice for Python development looks like.
I tried PyGradle, but it did not feel smoothly integrable, especially with Apache Spark. I stumbled over names like pip, Pex, Setuptools, and virtualenv. What are these tools? How do they interact with each other?
To prevent the X-Y problem: I want a codebase that can be built, (unit-)tested, and packaged with one command (like gradle build). The resulting artifact should be deployable and executable on a Spark cluster.
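The one-command workflow described above can be sketched with nothing but the standard library driving the usual tools (the choice of pip, pytest, and build here is an assumption; swap in whatever your stack standardizes on):

```python
import subprocess
import sys

# One pipeline, gradle-build style: install deps, run unit tests, build a wheel.
STEPS = [
    [sys.executable, "-m", "pip", "install", "-e", ".[dev]"],
    [sys.executable, "-m", "pytest"],
    [sys.executable, "-m", "build"],
]


def build():
    """Run every step in order, aborting on the first failure."""
    for step in STEPS:
        subprocess.run(step, check=True)
```

Dedicated tools such as tox, Nox, or PyBuilder wrap the same idea with environment isolation on top.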
I am also new to this world and want to set up a process at an AI startup. I think http://pybuilder.github.io/ is at least a good starting point for automation, as I am trying to introduce it among us.
