How can I develop locally when using the Iguazio platform?

I want to be able to test my jobs and code on my local machine before executing on a remote cluster. Ideally this will not require a lot of setup on my end. Is this possible?

Yes, this is possible. A common development pattern with the Iguazio platform is to utilize a local version of MLRun and Nuclio on a laptop/workstation and move/execute jobs on the cluster at a later point.
There are two main options for installing MLRun and Nuclio on a local environment:
docker-compose - Simpler and easier to get up and running, however jobs are restricted to running within the environment they were executed in (i.e. Jupyter or IDE). This means you cannot specify resources like CPU/MEM/GPU for a particular job. This approach is great for getting started quickly. Instructions can be found here.
Kubernetes - More complex to get up and running, but allows for running jobs in their own containers with specified CPU/MEM/GPU resources. This approach is better for emulating the capabilities of the Iguazio platform in a local environment. Instructions can be found here.
Once you have installed MLRun and Nuclio using one of the above options and have created a job/function you can test it locally as well as deploy to the Iguazio cluster directly from your local development environment:
To run your job locally, use the local=True flag when running your MLRun function, as in the Quick-Start guide.
To run your job remotely, set up the required environment files to allow connectivity to the Iguazio cluster as described in this guide, and run your job with local=False (see the sketch below).
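A minimal sketch of what this looks like in code, assuming a handler function in a local file (the file, function, and image names here are placeholders, and the exact MLRun API may vary slightly between versions):

import mlrun

# Wrap a local Python file as an MLRun job (hypothetical names)
fn = mlrun.code_to_function(
    name="my-job",
    filename="my_job.py",    # contains a handler(context, ...) function
    kind="job",
    image="mlrun/mlrun",
)

# Test the job locally on your laptop/workstation
fn.run(handler="handler", local=True)

# Execute the same job on the cluster, assuming the environment files
# for cluster connectivity (see the guide above) are already configured
fn.run(handler="handler", local=False)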

Why should a Python Google Cloud Function not contain a Pipfile?

According to the documentation here
Dependency specification using the Pipfile/Pipfile.lock standard is currently not supported. Your project should not include these files.
I use Pipfile for managing my dependencies and create a requirements.txt file through
pipenv lock --requirements
So far everything works and my gcloud function is up and running. So why should a Python Google Cloud Function not contain a Pipfile?
If it shouldn't contain one, what is the suggested way to manage an isolated environment?
When you deploy your function, it runs in its own environment. You don't have to manage several environments, because each Cloud Function deployment is dedicated to one and only one piece of code.
That's why a virtual environment is pointless in a single-purpose environment. You could use Cloud Run instead, because it lets you customize your build and runtime environment. But here again it's unnecessary: you won't have concurrent environments in the same container, so it doesn't make sense.
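As a minimal, hypothetical illustration of the resulting layout: keep the Pipfile for local development only, export it to requirements.txt, and deploy just the function code plus requirements.txt:

# main.py -- deployed together with requirements.txt
# (requirements.txt generated locally with: pipenv lock --requirements > requirements.txt;
#  Pipfile/Pipfile.lock stay in the repo for local development and are excluded from
#  the deployment, e.g. via .gcloudignore)

import pandas as pd  # example third-party dependency listed in requirements.txt


def handler(request):
    """HTTP Cloud Function entry point (receives the standard Flask request object)."""
    df = pd.DataFrame({"value": [1, 2, 3]})
    return f"sum = {int(df['value'].sum())}"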

Automate daily python process on remote server for improved reliability

I have a python script that runs locally via a scheduled task each day. Most of the time, this is fine -- except when I'm on vacation and the computer it runs on needs to be manually restarted. Or when my internet/power is down.
I am interested in putting it on some kind of rented server time. I'm a total newbie at this (having never had a production-type process like this). I was unable to find any tutorials that seemed to address this type of use case. How would I install my Python environment and any config, data files, or programs that the script needs (e.g., it does some web scraping and uses headless Chrome with a defined user profile)?
Given the nature of the program, is it possible to do this, or would I need to get a dedicated server whose environment can be better set up for my specific needs? The process runs for about 20 seconds a day.
Setting up a whole dedicated server for 20 seconds' worth of work is really suboptimal. I see a few options:
Get a cloud-based VM that gets spun up and down only to run your process. That's relatively easy to automate on Azure, GCP and AWS.
Dockerize the application along with its whole environment and run it as an image in the cloud, e.g. on a service like Elastic Beanstalk (AWS) or App Service (Azure). This is more complex, but should be cheaper as it consumes fewer resources.
Get a dedicated VM (a droplet, in DigitalOcean terms) on a service like DigitalOcean, Heroku or pythonanywhere.com. Depending on the specifics of your script, it may be quite easy and cheap to set up. I think this is the easiest and most flexible solution for a newbie, but it really depends on your script: you might hit some limitations.
In terms of setting up your environment - there are multiple options, with the most often used being:
pyenv (my preferred option)
anaconda (quite easy to use)
virtualenv / venv
To efficiently recreate your environment, you'll need to come up with a list of dependencies (libraries your script uses).
A summary of the steps:
run pip freeze > requirements.txt locally
manually edit the requirements.txt file, removing all packages that are not used by your script
create a new virtual environment via pyenv, anaconda or venv and activate it wherever you want to run the script
copy your script & requirements.txt to the new location
run pip install -r requirements.txt to install the libraries
ensure the script works as expected in its new location
set up the cron job (a minimal wrapper sketch follows below)
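For the last step, a rough sketch of a cron-friendly wrapper, assuming your existing code exposes a main() function (the module name, file names and crontab paths below are hypothetical):

# run_daily.py -- thin wrapper that the cron job invokes
# Example crontab entry (hypothetical paths, runs daily at 06:00):
#   0 6 * * * /home/user/venv/bin/python /home/user/job/run_daily.py >> /home/user/job/cron.log 2>&1

import logging
import sys

from my_script import main  # your existing script's entry point (hypothetical module name)

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

if __name__ == "__main__":
    try:
        logging.info("daily job started")
        main()
        logging.info("daily job finished")
    except Exception:
        logging.exception("daily job failed")
        sys.exit(1)
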
If the script only runs for 20 seconds and you are not worried about scalability, running it directly on a NAS or a Raspberry Pi could be a solution for a private environment, if you have the hardware on hand.
If you don't have the necessary hardware available, you may want to have a look at PythonAnywhere, which offers a free tier.
https://help.pythonanywhere.com/pages/ScheduledTasks/
https://www.pythonanywhere.com/
However, in any professional environment I would opt for a tool like Apache Airflow. Your process of "it does some web scraping and uses headless Chrome with a defined user profile" describes an ETL workflow (a minimal DAG sketch follows after the links below).
https://airflow.apache.org/
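For reference, a minimal sketch of what such a daily job could look like as an Airflow DAG (the DAG id, task id and function body are placeholders; the import paths are for Airflow 2.x):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def scrape_and_store():
    """Placeholder for the existing scraping logic."""
    ...


with DAG(
    dag_id="daily_scrape",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="scrape_and_store",
        python_callable=scrape_and_store,
    )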

How to use the Python Script task in a VSTS release pipeline

I am new to the CI and CD world. I am using VSTS pipelines to automate my build and release process.
This question is about the release pipeline. I deploy my build drop to an AWS VM. I created a deployment group and ran the script on the VM to register a deployment agent on the AWS VM.
This works well and I am able to deploy successfully.
I would like to run few automation scripts in python after successful deployment.
I tried using the Python Script task. One of the settings is Python Interpreter. The help information says:
"Absolute path to the Python interpreter to use. If not specified, the task will use the interpreter in PATH.
Run the Use Python Version task to add a version of Python to PATH."
So, I tried to use the Python Version task and specified the version of Python I usually run my scripts with. The prerequisites for the task mention:
"A Microsoft-hosted agent with side-by-side versions of Python installed, or a self-hosted agent with Agent.ToolsDirectory configured (see Q&A)."
reference to Python Version task documentation
I am not sure how and where to set Agent.ToolsDirectory, or how to use a Microsoft-hosted agent on a release pipeline deploying to an AWS VM. I could not find any step-by-step examples for this. Can anyone help me with clear steps on how to run Python scripts in my scenario?
The easiest way of doing this is adding something like the following to your YAML definition:
- script: python xxx
This will run python and pass arguments to it; you can use python2 or python3 (whichever default version is installed on the hosted agent). Another, more reliable, way of achieving this is using a container inside the hosted agent. This way you can explicitly specify the Python version and guarantee you are getting what you specified. Example:
resources:
  containers:
  - container: my_container # can be anything
    image: python:3.6-jessie # just an example

jobs:
- job: job_name
  container: my_container # has to be the container name from resources
  pool:
    vmImage: 'Ubuntu-16.04'
  steps:
  - checkout: self
    fetchDepth: 1
    clean: true
  - script: python xxx
This will start the python:3.6-jessie container, mount your code inside it, and run the python command in the root of the repo. Further reading:
https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azdevops&tabs=schema&viewFallbackFrom=vsts#job
https://learn.microsoft.com/en-us/azure/devops/pipelines/process/container-phases?view=azdevops&tabs=yaml&viewFallbackFrom=vsts
In case you are using your own agent, just install Python on it and make sure it's in the PATH, so that it works when you just type python in the console (you'd have to use the script task in this case). If you want to use the Python task, follow these articles:
https://github.com/Microsoft/azure-pipelines-tool-lib/blob/master/docs/overview.md#tool-cache
https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/tool/use-python-version?view=azdevops

Apache Airflow Continuous Integration Workflow and Dependency Management

I'm thinking of starting to use Apache Airflow for a project and am wondering how people manage continuous integration and dependencies with Airflow. More specifically:
Say I have the following setup:
3 Airflow servers: dev, staging and production.
I have two Python DAGs whose source code I want to keep in separate repos.
The DAGs themselves are simple; they basically just use a PythonOperator to call main(*args, **kwargs). However, the actual code that's run by main is very large and stretches across several files/modules.
Each Python code base has different dependencies.
For example:
Dag1 uses Python 2.7, pandas==0.18.1 and requests==2.13.0
Dag2 uses Python 3.6, pandas==0.20.0 and Numba==0.27, as well as some Cythonized code that needs to be compiled
How do I manage Airflow running these two DAGs with completely different dependencies?
Also, how do I manage the continuous integration of the code for both these DAGs into each Airflow environment (dev, staging, prod)? Do I just get Jenkins or something to SSH to the Airflow server and do something like git pull origin BRANCH?
Hopefully this question isn't too vague and people see the problems I'm having.
We use Docker to run code with different dependencies, together with the DockerOperator in the Airflow DAG, which can run Docker containers, including on remote machines (with the Docker daemon already running). We actually have only one Airflow server to run jobs, but several more machines with a Docker daemon running, which the Airflow executors call (see the sketch below).
For continuous integration we use GitLab CI with the GitLab Container Registry for each repository. This should be easily doable with Jenkins as well.
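A rough sketch of this pattern, assuming each DAG's dependencies are baked into its own image (the image name, registry, command and Docker host below are placeholders; the import path requires the apache-airflow-providers-docker package and may differ in older Airflow versions):

from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator


with DAG(
    dag_id="dag1_in_docker",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    DockerOperator(
        task_id="run_main",
        image="registry.example.com/dag1:latest",    # image built with Dag1's own dependencies
        command="python main.py",
        docker_url="tcp://remote-docker-host:2375",  # a remote machine running the Docker daemon
    )

Each repository builds and pushes its own image in CI, so the Airflow server itself only needs the DockerOperator, not the DAGs' libraries.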

Workflow for Python with Docker + IDE for non-web applications

I am currently trying to insert Docker into my Python development workflow for non-web applications.
What are the current best practices in Python development using Docker and an IDE?
I need the possibility to isolate my environments with Docker and debug my code.
On the web I found many articles about the use of Docker to deploy your code:
Production deployments: how to build Docker images ready to spin with your application already packaged inside
Development environments that mirror production: extension of the above, where you can use a container to fully QA the current status of a project before deploying to production while developing
I found a lot less about an actual development workflow, apart from some tips on how to use containers with shared volumes mapped to the directories on the host while developing web applications. This approach does not apply to non-web applications and it has some issues where a simple reload (with a LiveReload-like mechanism) is not enough so you need to restart your container(s).
The closest writing I could find is this "Eight Docker Development Patterns" blog post, but it does not consider an IDE (like PyCharm I am using now).
Maybe this question is the result of the 3-4 hours (and counting) spent configuring PyCharm to use a remote Python interpreter running in a Docker container. I expected a much better integration between the two.
Actually, I believe that using the Docker interpreter in PyCharm is the way to go. Which version of PyCharm do you have? If you have the 2016 version, it should be set up within seconds. You just have to make sure your Docker machine is running, and the image you would like to use with your project must already be built. PyCharm will find the Docker machine in the "add remote interpreter" dialog automatically. Then select your image and you're all set up.
You can run your code as usual then, almost without any delay.
Here's what worked for me: https://www.jetbrains.com/help/pycharm/2016.1/configuring-remote-interpreters-via-docker.html
And make sure to update PyCharm, that solved some issues I had.
