Tensorflow on IBM Power9 ppc64le - Can libtensorflow.so be deleted? - python

I tried to build a docker container with python and the tensorflow-gpu package for a ppc64le machine. I installed miniconda3 in the docker container and used the IBM repository to install all the necessary packages. To my surprise the resulting docker container was twice as big (7GB) as its amd64 counterpart (3.8GB).
I think the reason is that the packages from the IBM repository are bloating the installation. I did some research and found two files, libtensorflow.so and libtensorflow_cc.so, in the tensorflow_core directory. Both of these files are about 900MB in size, and they are not installed in the amd64 container.
It seems these two files are the API libraries for programming with C and C++. So my question is: if I am planning on only using Python in this container, can I just delete these two files, or do they serve another purpose in the ppc64le installation of TensorFlow?

Yes. Those are added because there were many requests for them, and it's a pain to cobble together the libraries and headers yourself for an already-built TF .whl.
They can be removed if you'd rather have the disk space.
What is the content of your "amd64 container"? Just a pip install tensorflow?
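If you do remove them, here is a minimal sketch of how that could look during the image build (the /opt/miniconda3 prefix is an assumption; adjust it to wherever your conda environment actually lives). Note that the deletion only shrinks the image if it happens in the same layer that created the files:
# locate the C/C++ API libraries inside the environment (path is an assumption)
find /opt/miniconda3 -name 'libtensorflow.so*' -o -name 'libtensorflow_cc.so*'
# delete them; chain this onto the same RUN instruction that installs tensorflow-gpu
find /opt/miniconda3 \( -name 'libtensorflow.so*' -o -name 'libtensorflow_cc.so*' \) -delete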

Related

Dynamically linking a pip native package to another pip package

I am building a Python module using C++ with pybind11. This project will depend on two separate packages that provide bindings for native C++ libraries:
opencv (provided by the pip package opencv-python)
imgui (provided by the pip package pyimgui).
These packages, once installed, consist mainly of dynamic libraries, plus some glue code in Python.
My question concerns the installation/build process on the final user's computer, i.e. when pip install . is launched (in build isolation mode): how do I find the path to the third-party dynamic libraries when linking?
In the end, they will be located somewhere in the user's virtual environment; but I need to link against them during the pip build (in isolation mode), and I also need to ensure that they will be found when the module is used.
Note: as far as opencv is concerned, I could link to a local installation of OpenCV (i.e. require the user to apt install opencv beforehand, and use find_package(OpenCV)), but I suspect this is not the way to go. Concerning imgui, this solution would not apply.
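One rough way to see where pip actually puts those bundled libraries at run time (not a fix for the build-isolation part, just a sketch assuming the packages are installed in the active environment; cv2 and imgui are the import names of opencv-python and pyimgui):
python -c "import cv2, os; print(os.path.dirname(cv2.__file__))"
python -c "import imgui, os; print(os.path.dirname(imgui.__file__))"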

What is virtualenv & how does it help package dependencies for use in an AWS Lambda Function?

I have a project that used virtualenv to package a python 2.7 project with 3 dependencies I found in a requirements.txt file: boto3, botocore and pypdftk.
Boto3 is an Amazon Web Services SDK for Python, and so is botocore (I believe). Pypdftk is some external library used for transforming PDFs in Python.
Now I am supposed to get this project compressed to a zip and uploaded to AWS Lambda, a service for serverless computing.
Additionally, AWS Lambda only supports the standard Python 3.9 library & runtime. So because my project has these external libraries and dependencies, it seems the previous developer used a virtualenv to:
package a deprecated version of python 2.7
package the dependencies listed in the requirements.txt file
AWS Lambda has a feature called Layers, where you can upload zipped binaries to extend the standard core python3 library.
In summary:
I am failing to understand how to upload my compressed python3.9 project.
Do I upload these dependencies separately in the AWS Lambda Layer?
OR does compressing my project inside a virtualenv take care of the dependencies?
Much like a Docker container, where the virtualenv ships with the interpreter & dependencies?
Upgrade your Python code/dependencies to 3.9, following the "how-to" here: https://docs.python.org/3/howto/pyporting.html
While it's possible to deploy 2.7 code using a Docker image (which, at least for now, is still provided by AWS), that's not a long-term solution, and you'll almost certainly put in more work to make it happen.
For your other questions:
package a deprecated version of python 2.7
Virtual environments won't let you do this. There are tools such as pyenv that do, but they won't work for a Lambda because the version of Python that's used to run your Lambda is part of the Lambda configuration, and cannot be replaced.
package the dependencies listed in the requirements.txt file
Yes, this is what a virtual environment can be used for. When you activate the virtual environment and run pip install, it will install the packages into the lib directory in the virtual environment.
To produce a Lambda deployment bundle you must ZIP your source code together with the installed packages. Making this a little more challenging, the packages are actually installed in lib/python3.9/site-packages/, and Lambda doesn't want to see that path in the archive; change into that directory while building the ZIP so the packages end up at its root.
Here is the official doc for producing Python bundles: https://docs.aws.amazon.com/lambda/latest/dg/python-package.html
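A rough command sequence for that, assuming a Python 3.9 project whose handler lives in lambda_function.py and whose dependencies are in requirements.txt (both names are placeholders):
python3.9 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
deactivate
# zip from inside site-packages so the packages land at the root of the archive
cd venv/lib/python3.9/site-packages
zip -r ../../../../function.zip .
cd -
# add your own source on top
zip -g function.zip lambda_function.py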
Do I upload these dependencies separately in the AWS Lambda Layer?
You can, but that's only useful if you're going to share the dependencies between Lambdas. Otherwise it increases the complexity of your deployment with little benefit.
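If you do use a Layer, note that for the Python runtime the dependencies have to sit under a python/ directory at the root of the layer ZIP; a minimal sketch (requirements.txt again a placeholder):
pip install -r requirements.txt --target python/
zip -r layer.zip python/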

Package python software with pylucene dependency

I'm working on a Python project that needs pylucene (a Python wrapper for Lucene, a Java library for search-engine programming).
I've created a Dockerfile that automatically downloads and compiles pylucene, and then also installs the other needed pip dependencies. I built this Dockerfile, obtaining a docker image with all the dependencies (both pylucene and the others installed using pip).
Setting this image as the remote Python interpreter in PyCharm, I can run my code, but now I need to release my software in a way that allows it to be executed without PyCharm or any other IDE that supports remote interpreters.
I thought about creating another Dockerfile that starts from the dependency image and then copies my source into it, obtaining an image where the code can be executed.
I don't like this solution much because the objective of my project is processing large offline datasets, so this way the user of the image always has to specify bindings between the container and the host filesystem.
Are there any better options? Maybe creating an archive that contains my source, pylucene and pip dependencies?
Windows 10 64 bit, python 3.8.2, pylucene latest version (8.3.0)
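For what it's worth, a sketch of the two-Dockerfile approach described above, with all image and path names as placeholders; the bind mount at run time is exactly the part the question would like to avoid:
# Dockerfile.app starts FROM the dependency image and COPYs the project source into it
docker build -t pylucene-app -f Dockerfile.app .
# run against an offline dataset by bind-mounting it into the container
docker run --rm -v /path/to/datasets:/data pylucene-app python main.py --input /data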

How to configure Python in a GPU cluster?

I have a GPU cluster with one storage node and several computing nodes, each with 8 GPUs. I am configuring the cluster.
One of the tasks is to configure Python. We need several versions of Python and some Python packages, and for some packages we may require several versions, such as different versions of TensorFlow.
So the question is how to configure Python and the packages so that it's convenient to use whichever version of a package I need.
I have installed both python2.7 and python3.6 on each computing node and on the storage node, but I don't think that is a good approach if one has a huge number of computing nodes to configure. One possible solution is to install Python in a shared directory of the cluster instead of the default /usr/local path.
Does anyone have a better way to do this?
What I use now is OpenPBS (Torque), and I am new to HPC.
Thanks a lot.
You can install the Environment Modules software in a shared directory accessible from every node. Then it will be easy to load a specific version of Python or TensorFlow:
module load lang/Python/3.6.0
module load lib/Tensorflow/1.1.0
Then, if you require several versions of some package, you can have a look at Python virtualenv, which permits installing several versions of the same package. To share them across all the nodes, consider creating your virtualenvs on a shared mount point.
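For example, assuming the shared filesystem is mounted at /shared and the versions below are just placeholders matching the module names above:
module load lang/Python/3.6.0
python3 -m venv /shared/envs/tf-1.1
source /shared/envs/tf-1.1/bin/activate
pip install tensorflow-gpu==1.1.0
A PBS job script can then source the same activate script on any compute node, since the environment lives on the shared mount.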
You could install each piece of software on the storage node under a certain directory and mount that directory on the compute nodes. Then you don't have to install each software several times.
A common solution to this problem is Environment Modules. You install your software as a module. This means that the software is installed in a certain directory (e.g. /opt/modules/python/3.6/) together with a modulefile. When you do module load python/3.6, the modulefile sets environment variables such that Python 3.6 is on PATH, PYTHONPATH, etc.
This results in a nice separation of your software stack and also enables you to install newer versions of tensorflow without messing up the environment.

Offline Installation of python & pip

I need to install Python on a server to run scripts, but the server has no access to the internet.
The server has access to a local network that has access to the internet*. I would like to use pip to manage the packages through a local network directory as specified here.
How can I install pip, Python and their dependencies on a Windows machine, offline, so that I can use pip, as specified in the link above, to manage the packages I require?
*For Clarity: I have no ability to mirror, hack or otherwise to get information to pass through the local network directly from the internet.
The official Python installer for Windows has no other dependencies. It runs completely offline.
For other packages that may have dependencies (that are difficult to install on Windows), Christoph Gohlke maintains a list of Windows installers for common Python packages. These are msi installers (or whl files) that are self-contained.
They are designed to work with the official Python installer for Windows - as they use its registry entries to identify the install location.
You can download these and move them to your Windows machine.
Beyond those two - if you have further requirements you can use tools like basket to download packages and then provide the location as a source for offline pip installs; or create your own pip repository.
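As a concrete sketch of that approach using pip's own download command (run the first step on a machine that does have internet access, then copy the directory to the offline server; if the two machines differ in OS or Python version, pip download's --platform/--python-version/--only-binary options may be needed):
# on the connected machine
pip download -r requirements.txt -d ./wheelhouse
# on the offline server, after copying ./wheelhouse across
pip install --no-index --find-links=./wheelhouse -r requirements.txt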
If you do decide to create a local pip repository, it is better to create a pip proxy (see pypicache, for example); this way you are only requesting the packages that are required, rather than trying to mirror the entire cheeseshop.
