de-Bazel-ing TensorFlow Serving - python

While I admire, and am somewhat baffled by, the documentation's commitment to mediating everything related to TensorFlow Serving through Bazel, my understanding of it is tenuous at best. I'd like to minimize my interaction with it.
I'm implementing my own TF Serving server by adapting code from the Inception + TF Serving tutorial. I find the BUILD files intimidating enough as it is, and rather than slogging through a lengthy debugging process, I decided to simply edit the BUILD file to refer only to the .cc file, instead of also building the Python stuff, which (as I understand it?) isn't strictly necessary.
However, my functional installation of TF Serving can't be imported into Python. With normal TensorFlow you build a .whl file and install it that way; is there something similar you can do with TF Serving? That way I could keep the construction and exporting of models in the realm of the friendly Python interactive shell, rather than editing code, crossing all available fingers, building in Bazel, and then /bazel-bin/path/running/whatever.
Simply adding the directory to my PYTHONPATH has so far been unsuccessful.
Thanks!

You are close; you need to update the environment the way they do in this script:
.../serving/bazel-bin/tensorflow_serving/example/mnist_export
I printed out the environment update and applied it manually:
export PYTHONPATH=...
Then I was able to import tensorflow_serving.
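If you prefer to stay inside Python rather than export PYTHONPATH in the shell, appending the same directory to sys.path before the import should work as well (a sketch; the path below is a placeholder for whatever directory the wrapper script adds on your machine):

import sys

# Placeholder: use whatever directory the mnist_export wrapper script adds to
# PYTHONPATH on your machine (printed out as described above).
sys.path.append("/path/to/serving/bazel-bin/...")

import tensorflow_serving  # should now resolve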

Related

How to register custom environment with OpenAI's gym package to use make_vec_env() in SB3 (for multiprocessing)?

Goal: In Stable Baselines 3, I want to be able to run multiple workers on my environment in parallel (multiprocessing) to train my model.
Method: As shown in this Google Colab, I believe I just need to run the below line of code:
vec_env = make_vec_env(env_id, n_envs=num_cpu)
However, I have a custom environment, which doesn't have an env_id. So when I run it as make_vec_env(MyEnvironment(), n_envs=3), I get an error saying that my environment isn't callable. There seems to be a general lack of documentation around this, but from what I gather from this thread, I need to register my custom environment with Gym so that I can call on it with the make_vec_env() function.
My first question: Is there any other way to run multiple workers on a custom environment? If not...
My second question: How do I register my custom environment with Gym?
Again, documentation seems somewhat lacking. I have found these one, two, three posts which outline the steps. However, I don't understand: can I just place this folder anywhere I want? How does Gym know where to find it? Why do I need two __init__.py files?
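For concreteness, this is roughly what I understand the registration to involve, pieced together from those posts (the package name and paths here are made up, and I haven't gotten this working yet):

# my_gym_envs/__init__.py  (hypothetical package containing the custom env)
from gym.envs.registration import register

register(
    id="MyEnvironment-v0",                         # the env_id to use later
    entry_point="my_gym_envs.envs:MyEnvironment",  # "module.path:ClassName"
)

# training script
from stable_baselines3.common.env_util import make_vec_env
import my_gym_envs  # importing the package runs the register() call above

vec_env = make_vec_env("MyEnvironment-v0", n_envs=3)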
Any guidance whatsoever would be hugely appreciated.

Is there a way to import a .pyd extension file as a simple include in a python file?

Hi there, wise people of Stack. I'm having trouble importing .pyd files as Python objects.
The story:
I have an internal repo on GitLab that contains Python files as well as C++ files. The repo uses pybind so the two languages can talk to one another. The whole project is built with CI/CD, and the artifacts I have access to are .pyd extension files.
The task I was given is to access some .pyd files (in different folders) from a single Python script and get at the classes defined inside them in order to mock them using Python.
The problem:
I was told that a simple import would let me access the .pyd as an object through Python, just like you would with a library. However, I ran into errors along the way. I have gone through this post and this one, but it seems that neither of them works for me.
What was tried:
The first thing I did was set up a separate folder with a single .pyd file from the project (let's call it SomeClass.pyd). I then created a Python file, test.py, in the same directory as the .pyd file.
The whole architecture looks like the following:
|--folder
   |--SomeClass.pyd
   |--test.py
Then, in the test.py file, I tried running:
import SomeClass.pyd
import SomeClass
import SomeClass.pyd as sc
from SomeClass.pyd import *
from SomeClass import *
which all yielded the same error:
ImportError: dynamic module does not define module export function
Now, I know that .pyd files are similar to DLLs, but I was told multiple times that a simple import would let me access the object information without needing anything in particular.
I recall reading about setting PYTHONPATH before launching the whole process. However, I need the file to access the .pyd without having to add any environment variable, as I will likely not always have the rights to modify PYTHONPATH.
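For what it's worth, the runtime alternative I had in mind (a sketch of the idea; it still ends in the same ImportError for me) is to extend sys.path from the script itself rather than touching PYTHONPATH:

# test.py -- make the folder containing SomeClass.pyd importable for this
# process only, without modifying the PYTHONPATH environment variable
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

import SomeClass  # note: no .pyd suffix in the import statement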
The project is quite big, so I'm trying to keep it bare minimum, but if you need more info, I'll try to give some more.
Thank you for your feedback!
Alright, after some time and a lot of research, I found the odd answer to the problem that occurred. I really hope it will help anyone encountering the same issue.
The problem was caused by the fact that PyCharm sometimes has issues with dynamic imports.
First problem: dynamic import
This was solved simply by going to PyCharm --> File --> Invalidate Caches, then ticking "Clear file system cache and Local History" as well as "Clear VCS Log caches and indexes". You should then be prompted to restart.
I'll also note that even after fixing the issue, sometimes, for no apparent reason, I still have to invalidate the caches again.
Second problem: venv
Once restarted, you might be able to import your .pyd file manually, but you probably won't get autocompletion. What solved this for me was manually building the code responsible for the .pyd in order to generate a wheel. In my case, I used poetry:
poetry build
Once the wheel was created, I did a manual pip install of the wheel produced by the poetry build to install it directly into the venv:
pip install dist/the_name_of_your_wheel_file.whl
These were the steps that fixed my problem. I hope this will help anyone encountering the same issue!

Getting started with django-restframework-gis on Windows 10

I have some experience with Python/Django including its REST framework, a reasonable understanding of geographic information (and what GIS is about) and of databases, and I would like to develop a "geo-aware" REST service on my Windows machine based on Django. The application will be limited to the REST service; visual exploration and other tools will be developed independently. Side remark: once my application is running, it will be ported to a Linux machine, and PostGIS will then be used instead of SpatiaLite.
After several hours of web-searching, I still haven't come up with a good "Quickstart" guide. There are many tutorials and docs about various aspects related to my task, but they either refer to Linux, or their installation instructions are outdated. Here is what I have done so far:
1) I use miniconda and PyCharm
2) I created a new virtual environment like so:
conda create -n locations pip numpy requests
activate locations
conda install -c conda-forge django djangorestframework djangorestframework-gis
3) I set-up the new Django project and my application and performed a database migration:
python [path-to..]\django-admin.py startproject locations
cd locations
python [path-to..]\django-admin.py startapp myapp
cd ..
python manage.py migrate
4) I added "rest_framework" and "myapp.apps.MyAppConfig" to INSTALLED_APPS in settings.py
5) I stopped reading the general django-restframework tutorial and began searching for django-restframework-gis specific information. What I understood from this is that I need to upgrade my SQLite database to a SpatiaLite database. Windows binaries for SpatiaLite are available at gaia-sins -- but which of these do I really need? I downloaded the mod_spatialite-4.3.0a-win-x86.7z file and unpacked it, and I added SPATIALITE_LIBRARY_PATH = '[path-to..]\mod_spatialite-4.3.0a-win-x86\mod_spatialite.dll' to my settings.py.
What comes next?
Specific questions:
1) Do I really need to upgrade my SQLite database if I am not planning to store geospatial information but merely build a REST service to deliver information in GeoJSON which is coming from other sources (for example weather model output in netcdf data format)? Or would it suffice to describe my Django model in this case and simply ignore any database-related issues?
2) What is the minimum code to get the basic "wiring" done? This could be an extremely simple service which accepts a lat/lon coordinate as a parameter in the URL and returns this location in GeoJSON format. Such code should highlight the differences between using the "normal" django-restframework and the GIS version. Once I have this, I will probably find my way through the existing documentation (for example the Miguel Grinberg tutorial or the GitHub description).
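To make question 2) more concrete, this is roughly the shape of thing I am after (an untested sketch with made-up names, using only GEOS geometries and a plain DRF view, without a database model):

# views.py -- hypothetical minimal endpoint, e.g. /location/?lat=52.5&lon=13.4
from django.contrib.gis.geos import Point
from rest_framework.views import APIView
from rest_framework.response import Response

class LocationView(APIView):
    def get(self, request):
        lat = float(request.query_params.get("lat", 0.0))
        lon = float(request.query_params.get("lon", 0.0))
        point = Point(lon, lat)  # GEOS uses (x, y), i.e. (lon, lat)
        # return the point as a GeoJSON geometry
        return Response({"type": "Point", "coordinates": list(point.coords)})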
OK, after another day of searching and experimenting, I acknowledge that this has been the wrong question to ask - therefore I answer myself and close this issue.
Apparently, I have been too naive about setting up a "geo-aware" service, thinking that I could get away with a special datatype or two for coordinates, and a special kind of serializer for GeoJSON - and, if really necessary, with a geo-enabled database.
It turns out that what I want to build is, in the end, a GeoDjango application (even if I will use only a tiny fraction of what GeoDjango can do), and so the GeoDjango docs are the place to start from, in particular their installation guide.
The story isn't over yet, as I am still having trouble loading the required libraries from Django, but the direction is clearer now.
More specifically, the issue I ran into wasn't primarily a SpatiaLite issue. I was able to install SpatiaLite and upgrade an existing SQLite database by running SELECT load_extension('mod_spatialite'); SELECT InitSpatialMetaData(); (see also this post). Django (python manage.py check) complained about not finding the GDAL library, and once it found it, it was the wrong version. The GeoDjango docs report that this is indeed the most common problem when installing GeoDjango. It would be helpful if the error messages from ctypes were a bit more verbose to make it easier to search for solutions. It took several hops across various web sites and an extra print statement in the ctypes __init__.py file before I found out that one needs to match the version of GDAL, the version of Python, and the compiler (someone called this "DLL hell").
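For reference, the relevant part of my settings.py currently looks roughly like this (the paths are placeholders for wherever the DLLs live on your machine, and the GDAL DLL name depends on the installed version):

# settings.py (excerpt)
INSTALLED_APPS = [
    # ...
    "django.contrib.gis",
    "rest_framework",
    "rest_framework_gis",
    "myapp.apps.MyAppConfig",
]

DATABASES = {
    "default": {
        "ENGINE": "django.contrib.gis.db.backends.spatialite",
        "NAME": "db.sqlite3",
    }
}

# Windows: point Django at the native libraries explicitly
SPATIALITE_LIBRARY_PATH = r"C:\path\to\mod_spatialite.dll"
GDAL_LIBRARY_PATH = r"C:\path\to\gdal201.dll"  # file name depends on the GDAL version
GEOS_LIBRARY_PATH = r"C:\path\to\geos_c.dll"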
Another part of the confusion arises from the manifold dependencies among the various required "geo packages". For example, SpatiaLite already comes with a GDAL library, so why the need to install GDAL separately? Indeed, the GeoDjango docs recommend working with OSGEO4W, because this program suite bundles everything together. Yet this is not trivial if one starts from a system where OSGEO4W and Python/Django have been installed independently, which is the situation I start from. I installed OSGEO4W primarily to work with QGIS, and I installed Python and Django for other tasks. The realisation that the two must be linked for a GeoDjango application only came afterwards. I might need to start from scratch, but it would be good to know if people have been successful in a Windows 10, x64 environment with Python >= 3.4 recently.

What is the best way to save sklearn model?

I am working on a Python desktop app that does some predictions. Right now I train my sklearn model using a Python script and save the model's parameters as a dictionary in a YAML file. I then bundle this YAML file into my Python app, and when the app runs, the model is recreated from the parameters in the dictionary. I realized that people who have a different version of sklearn get an error. I tried saving my model in a pickle file instead, but in that case it produced warnings when the app was running on a machine with a different version of sklearn.
There is no guarantee that a given sklearn model will be compatible between versions of sklearn. Indeed, the implementation or the internal API may change between versions. See more information here.
If you stick to a single version, the best way is indeed to pickle, and not to save the parameters in a YAML file. It's even better to use joblib for this. See more information here.
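For illustration, a minimal sketch of the joblib approach (the file has to be read back with the same sklearn version it was written with):

# persist and reload a fitted model with joblib
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10).fit(X, y)

joblib.dump(model, "model.joblib")      # write the fitted estimator to disk
restored = joblib.load("model.joblib")  # reload it later in the app
print(restored.predict(X[:5]))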
I realized that people who have a different version of sklearn get an error.
In this case, create isolated Python environments using virtualenvs.
Alternatively, you can just generate Python code from a trained model. This way you eliminate any possibility of object incompatibility. Here is a tool that can help with that: https://github.com/BayesWitnesses/m2cgen
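For example, a sketch of what that looks like with m2cgen (assuming its export_to_python helper; check the project's README for the exact API):

# generate dependency-free Python source code from a trained model
import m2cgen as m2c
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

code = m2c.export_to_python(model)  # plain Python code as a string
print(code[:300])                   # e.g. write this out to a .py file in your app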

Tensorflow contrib.layers compatibility on Windows

I'm currently trying to learn more about the layers API of TensorFlow; for this I'm working through the cloud-ml samples (census: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census).
When I launch the script on my Windows computer (Windows 10, running locally in non-distributed CPU mode), I get the following error:
File "\*\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\feature_column.py", line 1652, in insert_transformed_feature name="bucketize")
File "\*\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\ops\bucketization_op.py", line 48, in bucketize
return _bucketization_op.bucketize(input_tensor, boundaries, name=name)
AttributeError: 'NoneType' object has no attribute 'bucketize'
In the TensorFlow code (I used version 1.0.0, and upgrading to 1.0.1 gave the same error), I saw in the file tensorflow\contrib\layers\python\ops\bucketization_op.py that the op is loaded from native code:
_bucketization_op = loader.load_op_library(
    resource_loader.get_path_to_datafile("_bucketization_op.so"))
At this point I actually have two questions:
Am I wrong to think that this is only valid on Linux, or might the .dll have been renamed .so to keep the Python code consistent? If there is such a renaming, can someone tell me where I could find this file? A quick search in the folder gave no results for *.dll or *.so (I assume all the native code is wrapped by SWIG inside _pywrap_tensorflow.pyd).
Does anyone have a clue why this kind of error could happen?
TL;DR: These ops should now work in the current nightly build of TensorFlow. I've sent out a pull request to add support in the upcoming 1.1 release.
The explanation is a bit tortuous, but I'll attempt to lay out the key points.
In general, the tf.contrib libraries have limited support on Windows, often because they depend on platform-specific code that does not work (or has not historically worked) on Windows. Until very recently the tf.load_op_library() API did not work on Windows, but a recent pull request added support for it. Nightly builds for TensorFlow on Windows now include .dll files for some extension libraries, and the loader library includes code that converts the .so extension to .dll on Windows.
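For context, the general pattern that tf.load_op_library() supports looks like this (the library path is a placeholder; on Windows nightlies it would point at a .dll):

import tensorflow as tf

# Load a compiled op library; the returned module exposes one generated Python
# function per op defined in the library (the path below is a placeholder).
my_ops = tf.load_op_library("/path/to/_my_custom_op.so")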
As a historical workaround for this problem, we statically linked every tf.contrib kernel into _pywrap_tensorflow.pyd, and made loader.load_op_library() fail silently if the extension was not present on Windows. However, there are two ways to get the generated Python wrapper functions for each op:
The more common way, which (e.g.) tf.contrib.tensor_forest uses, is to generate the Python source at build time and include the generated code in the PIP package. This works fine on Windows.
The less common way, which bucketization_op.py uses, is to generate the Python source at run time and return a generated Python module from loader.load_op_library(). Since we made this fail silently and return None on Windows, calling _bucketization_op.bucketize() doesn't work.
Finally, due to operational concerns, we determined that it would be useful to be able to switch between static and dynamic linking of the tf.contrib kernels on all platforms, and the easiest way to do that would be to generate the wrapper code statically. A recent change (which alas just missed the branch for the 1.1 release) made the generation of wrapper code consistent across all of the tf.contrib libraries.
I hope this makes sense. As a result of all of these changes, if you upgrade to a nightly build of TensorFlow the problem should be fixed, and hopefully we can merge the change into the 1.1 release as well!
