According to the documentation here:
Dependency specification using the Pipfile/Pipfile.lock standard is currently not supported. Your project should not include these files.
I use Pipfile for managing my dependencies and create a requirements.txt file through
pipenv lock --requirements
So far everything works and my gcloud function is up and running. So why should a Python Google Cloud Function not contain a Pipfile?
If it shouldn't, what is the suggested way to manage an isolated environment?
When you deploy your function, it runs in its own environment. You don't have to manage several environments, because each Cloud Function deployment is dedicated to one and only one piece of code.
That's why a virtual environment is pointless in a single-purpose environment. You could use Cloud Run if you want to customize your build and runtime environment, but here again a virtualenv is unnecessary: you will never have competing environments inside the same container, so it does not make sense.
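For completeness, a rough sketch of the workflow described in the question, keeping the Pipfile for local development only. The function name, entry point, and runtime are placeholders, and on newer pipenv releases the export command is pipenv requirements rather than pipenv lock --requirements:

    # export pinned dependencies for the deployment
    pipenv lock --requirements > requirements.txt
    # keep the Pipfile out of the uploaded source, as the docs require
    echo "Pipfile" >> .gcloudignore
    echo "Pipfile.lock" >> .gcloudignore
    # deploy the function itself
    gcloud functions deploy my_function --runtime python39 --trigger-http --entry-point main --source .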
I want to be able to test my jobs and code on my local machine before executing on a remote cluster. Ideally this will not require a lot of setup on my end. Is this possible?
Yes, this is possible. A common development pattern with the Iguazio platform is to utilize a local version of MLRun and Nuclio on a laptop/workstation and move/execute jobs on the cluster at a later point.
There are two main options for installing MLRun and Nuclio on a local environment:
docker-compose - Simpler to get up and running, but restricted to running jobs within the environment it was launched from (i.e. Jupyter or your IDE). This means you cannot specify resources like CPU/MEM/GPU for a particular job. This approach is great for getting started quickly. Instructions can be found here.
Kubernetes - More complex to get up and running, but allows jobs to run in their own containers with specified CPU/MEM/GPU resources. This approach is better for emulating the capabilities of the Iguazio platform in a local environment. Instructions can be found here.
Once you have installed MLRun and Nuclio using one of the above options and have created a job/function, you can test it locally as well as deploy it to the Iguazio cluster directly from your local development environment:
To run your job locally, use the local=True flag when specifying your MLRun function, as in the Quick-Start guide.
To run your job remotely, specify the required environment files to allow connectivity to the Iguazio cluster as described in this guide, and run your job with local=False, as in the sketch below.
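This sketch assumes a handler.py file containing a my_handler function; the function name, image, and parameters are illustrative:

    import mlrun

    # wrap local handler code as an MLRun job function
    fn = mlrun.code_to_function(
        name="my-job",
        filename="handler.py",
        kind="job",
        image="mlrun/mlrun",
    )

    # run inside the local Python environment first
    fn.run(handler="my_handler", params={"x": 1}, local=True)

    # once the cluster environment/credentials are loaded (see the linked guide),
    # the same call with local=False submits the job to the cluster
    fn.run(handler="my_handler", params={"x": 1}, local=False)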
Airflow is used to schedule Python and Jupyter jobs.
There are environment settings, directories, and installed Python and Linux packages that the Python code relies on.
Should Airflow be installed in a separate Docker container or in the same one?
If it is in a separate container, how can the environment, directories, and installed packages be shared with Airflow?
Ideally yes; that way you can scale/restart Airflow and Jupyter independently of each other, without having to take down everything.
For environment variables and packages, you will need to set these in both containers. To avoid duplication, you might want to look at e.g. a .env file so that you don't have to define the same variables twice. See e.g. https://docs.docker.com/compose/environment-variables/#using-the---env-file--option
Files can be shared on a shared volume. How to set this up depends on your container management system, e.g. Docker Compose or Kubernetes.
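A minimal docker-compose sketch of both ideas, leaving out the extra services (metadata database, scheduler, etc.) a real Airflow deployment needs; image tags, paths, and volume names are illustrative:

    # docker-compose.yml
    services:
      airflow:
        image: apache/airflow:2.7.3
        env_file: .env               # shared environment variables
        volumes:
          - shared-data:/opt/shared  # shared files and directories
      jupyter:
        image: jupyter/base-notebook
        env_file: .env
        volumes:
          - shared-data:/opt/shared

    volumes:
      shared-data: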
My Python App Engine Flex application needs to connect to an external Oracle database. Currently I'm using the cx_Oracle Python package which requires me to install the Oracle Instant Client.
I have successfully run this locally (on macOS) by following the Instant Client installation steps. The steps required me to do the following:
Make a directory called /opt/oracle
Create a symlink from /opt/oracle/instantclient_12_2/libclntsh.dylib.12.1 to ~/lib/
However, I am confused about how to do the same thing in App Engine Flex (instructions). Specifically, here's what I'm confused about:
The instructions say I should run sudo yum install libaio to install the libaio package. How do I do this on GAE Flex? Or is this package already available?
I think I can add the Instant Client files to GAE (a whopping ~100MB!), then set the LD_LIBRARY_PATH environment variable in app.yaml to export LD_LIBRARY_PATH=/opt/oracle/instantclient_12_2:$LD_LIBRARY_PATH. Will this work?
Is this even feasible without using custom Docker containers on App Engine Flex?
Overall I'm not sure if I'm on the right track. Would love to hear from someone who has managed this before :)
If any of your dependencies are not available in the base GAE flex images provided by Google and cannot be installed via pip (because it's not a Python package, it's not available on PyPI, or for whatever other reason), then you can't use the requirements.txt file to get it installed in your GAE flex app.
The proper way to satisfy such dependencies would be to build your own custom runtime. From About Custom Runtimes:
Custom runtimes allow you to define new runtime environments, which might include additional components like language interpreters or application servers.
Yes, that means providing a custom Dockerfile. In your particular case you'd be installing the Instant Client and libaio inside this Dockerfile. See also Building Custom Runtimes.
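A rough Dockerfile sketch along those lines, loosely based on Google's Python custom runtime example; the base image, Instant Client version, and file layout are assumptions you'd need to adapt:

    # Dockerfile for a custom Python runtime on App Engine flexible
    FROM gcr.io/google-appengine/python

    # libaio is required by the Oracle Instant Client (Debian package name differs from yum's libaio)
    RUN apt-get update && apt-get install -y libaio1 && rm -rf /var/lib/apt/lists/*

    # Instant Client files baked into the image instead of being uploaded with the app code
    COPY instantclient_12_2/ /opt/oracle/instantclient_12_2/
    ENV LD_LIBRARY_PATH=/opt/oracle/instantclient_12_2

    # standard GAE flex Python setup: virtualenv, dependencies, app code
    RUN virtualenv /env -p python3.6
    ENV VIRTUAL_ENV=/env
    ENV PATH=/env/bin:$PATH
    ADD requirements.txt /app/requirements.txt
    RUN pip install -r /app/requirements.txt
    ADD . /app

    CMD gunicorn -b :$PORT main:app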
Answering your first question, I think that the instructions on the Oracle website just show that you have to install said library for your application to work.
In the case of App Engine flex, the way to ensure that libraries are present in the deployment is with the requirements.txt file. There is a documentation page which explains how to do so.
On the other hand, I will assume that the "Instant Client files" are not libraries, but data your app needs at runtime. You should use Google Cloud Storage to serve them, or any other storage alternative within Google Cloud.
I believe that, if this is all you need for your app to work, pushing your own custom container should not be necessary.
I'm a long-time Django developer and have just started using Ansible, after using Vagrant for the last 18 months. Historically I've created a single VM for development of all my projects, and symlinked the reusable Django apps (Python packages) I create, to the site-packages directory.
I've got a working dev box for my latest Django project, but I can't really make changes to my own reusable apps without having to copy those changes back to a Git repo. Here's my ideal scenario:
I check out all the packages I need to develop as Git submodules within the site I'm working on
I have some way (symlinking or a better method) to tell Ansible to set up the box and install my packages from these Git submodules
I run vagrant up or vagrant provision
It reads requirements.txt and installs the remaining packages (things like South, Pillow, etc), but it skips my set of tools because it knows they're already installed
I hope that makes sense. Basically, imagine I'm developing Django. How do I tell Vagrant (via Ansible I assume) to find my local copy of Django, rather than the one from PyPi?
Currently the only way I can think of doing this is creating individual symlinks for each of those packages I'm developing, but I'm sure there's a more sensible model.
Thanks!
You should probably think of it slightly differently. You create a Vagrantfile which specifies Ansible as the provisioner. In that Vagrantfile you also specify which playbook to use for the vagrant provision step.
If your playbooks are written in an idempotent way, running them multiple times will skip steps that already match the desired state.
You should also think about what the desired end state of the VM should look like and write playbooks that accomplish that. Unless I'm misunderstanding something, all your playbook actions should happen inside the VM, not directly on your local machine.
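One way to express the "use my local checkouts instead of PyPI" part idempotently, as a rough sketch: install the pinned third-party packages from requirements.txt, and install your own apps from the synced submodule directories in editable mode so edits on the host are picked up without reinstalling. The paths, virtualenv location, and app names below are assumptions:

    # tasks in your provisioning playbook (paths and names are illustrative)
    - name: Install pinned third-party dependencies
      pip:
        requirements: /vagrant/requirements.txt
        virtualenv: /home/vagrant/venv

    - name: Install my own reusable apps from the checked-out submodules (editable)
      pip:
        name: /vagrant/src/{{ item }}
        editable: yes
        virtualenv: /home/vagrant/venv
      loop:
        - my-reusable-app-one
        - my-reusable-app-two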
If we just want to host a single Django application on a VPS or some cloud instance, is it still beneficial to use virtualenv?
Or will it be overkill, and better to use the global Python setup instead, since only one Django application, say Project X, will be hosted on that server?
Does virtualenv provide any major benefits for a single-application setup in a production environment that I might not be aware of? e.g. Django upgrades, cron scripts, etc.
I'd recommend always using virtualenv, because it makes your environment more reproducible -- you can version your dependencies alongside your application, you're not tied to the versions of the python packages in your system repository, and if you need to replicate your environment elsewhere, you can do that even if you're not running exactly the same OS underneath.
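As a rough sketch of the minimal setup, even for a single app (paths are illustrative):

    # create an isolated environment next to the project and install pinned dependencies
    virtualenv /srv/projectx/venv
    /srv/projectx/venv/bin/pip install -r /srv/projectx/requirements.txt
    # cron jobs and management commands then call the venv's interpreter explicitly
    /srv/projectx/venv/bin/python /srv/projectx/manage.py migrate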