Installing from GitHub (with extras) via a requirements.txt file - python

I am trying to add the Haystack library as a dependency of a Python project. The installation instructions that Haystack provides are as follows:
git clone https://github.com/deepset-ai/haystack.git
cd haystack
pip install -e .[all]
I am trying to translate this into a single line that I can include in a requirements.txt. My current best guess is
farm-haystack[all] # git+https://github.com/deepset-ai/haystack.git
However this emits a bunch of warnings that various versions of farm-haystack don't provide the desired extras, such as
WARNING: farm-haystack 0.1.0.post2 does not provide the extra 'ray'
before failing with the error message
ERROR: Requested dill from https://files.pythonhosted.org/packages/3e/ad/31932a4e2804897e6fd2f946d53df51dd9b4aa55e152b5404395d00354d1/dill-0.3.1.tar.gz#sha256=d3ddddf2806a7bc9858b20c02dc174396795545e9d62f243b34481fd26eb3e2c (from farm-haystack[all]# git+https://github.com/deepset-ai/haystack.git->-r /dss_data/tmp/pip-requirements-install/req3361828774079305889.txt (line 1)) has different version in metadata: '0.3.1.dev0'
What is the proper way to go about doing this?

Since you mention it would be a dependency of another project, the format for listing Haystack in your requirements.txt should be the following (I pin the version here but it's not mandatory):
farm-haystack[all]==1.5.0
If you want to pin a specific git commit instead, the line in your requirements.txt should just be:
git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[all]
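As a side note, the #egg=name[extras] fragment is the legacy form; newer versions of pip also accept a PEP 508 direct reference, which makes pinning a commit or tag explicit. A sketch, with a placeholder where you would put the revision you want:

```
farm-haystack[all] @ git+https://github.com/deepset-ai/haystack.git@<commit-or-tag>
```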

Related

Difference between installation of pip git+https and python setup.py

I am aware of this popular topic; however, I am running into a different outcome when installing a Python app using pip with git+https versus python setup.py.
I am building a Docker image that contains several other Python apps, and I am trying to install this custom webhook into it.
Using git+https
RUN /venv/bin/pip install git+https://github.com/alerta/alerta-contrib.git#subdirectory=webhooks/sentry
This seems to install the webhook the right way, as the relevant endpoint is later discoverable.
What is more, when I exec into the running container and search for the relevant files, I see the following:
./venv/lib/python3.7/site-packages/sentry_sdk
./venv/lib/python3.7/site-packages/__pycache__/alerta_sentry.cpython-37.pyc
./venv/lib/python3.7/site-packages/sentry_sdk-0.15.1.dist-info
./venv/lib/python3.7/site-packages/alerta_sentry.py
./venv/lib/python3.7/site-packages/alerta_sentry-5.0.0-py3.7.egg-info
In my second approach I just copy this directory locally and in my Dockerfile I do
COPY sentry /app/sentry
RUN /venv/bin/python /app/sentry/setup.py install
This does not install the webhook appropriately and what is more, in the respective container I see a different file layout
./venv/lib/python3.7/site-packages/sentry_sdk
./venv/lib/python3.7/site-packages/sentry_sdk-0.15.1.dist-info
./venv/lib/python3.7/site-packages/alerta_sentry-5.0.0-py3.7.egg
./alerta_sentry.egg-info
./dist/alerta_sentry-5.0.0-py3.7.egg
(the sentry_sdk-related files are presumably irrelevant)
Why does the second approach fail to install the webhook appropriately?
Should these two options yield the same result?
What finally worked is the following
RUN /venv/bin/pip install /app/sentry/
I don't know the subtle differences between these two installation modes.
I did notice, however, that /venv/bin/python /app/sentry/setup.py install did not produce an alerta_sentry.py but only the .egg file, i.e. ./venv/lib/python3.7/site-packages/alerta_sentry-5.0.0-py3.7.egg
On the other hand, /venv/bin/pip install /app/sentry/ unpacked the .egg, creating ./venv/lib/python3.7/site-packages/alerta_sentry.py
I also don't know why the second installation option (i.e. the one creating the .egg file) was not working at runtime.

How to specify a specific github repo version in requirements.txt?

I want to be able to install a specific version of a github repo. I followed the instructions given here and my file requirements.txt looks as follows:
git://github.com/twoolie/NBT#f9e892e
I also tried the following versions:
git+git://github.com/twoolie/NBT#f9e892e
git+git://github.com/twoolie/NBT.git#f9e892e
git://github.com/twoolie/NBT.git#f9e892e
but in every case when I try to install the actual package, which requires the repository NBT from commit hash f9e892e, I get the error message
error in PyBlock setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Invalid requirement, parse error at "'://githu'"
So how to do it correctly?
I solved the problem by adding the following argument to the setup method in setup.py:
install_requires=['NBT#git+git://github.com/twoolie/NBT#f9e892'],
and using an empty requirements.txt file. With these settings, the install of the specific version of the package finally worked.
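As an update for newer tooling: GitHub has since shut down the git:// protocol, and modern pip/setuptools accept a PEP 508 direct reference instead, which also works inside install_requires. A sketch, reusing the commit hash from the question and git+https in place of git://:

```python
# setup.py fragment (sketch): PEP 508 direct reference to a specific commit.
install_requires=[
    'NBT @ git+https://github.com/twoolie/NBT@f9e892e',
],
```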

Full installation of tensorflow (all modules)?

I have this repository: https://github.com/layog/Accurate-Binary-Convolution-Network . As requirements.txt says, it requires tensorflow==1.4.1. So I am using miniconda (on Ubuntu 18.04) and, for the love of God, I can't get it to run (it errors out at the line below):
from tensorflow.examples.tutorial.* import input_data
This gives me an ImportError saying it can't find tensorflow.examples. I have diagnosed the problem as a few modules being missing after I installed tensorflow (I have tried all of the ways below):
pip install tensorflow==1.4.1
conda install -c conda-forge tensorflow==1.4.1
#And various wheel packages available on the internet for 1.4.1
pip install tensorflow-1.4.0rc1-cp36-cp36m-manylinux1_x86_64.whl
Question is, if I want all the modules which are present in the git repo source as my installed copy, do I have to COMPLETELY build tensorflow from source? If yes, can you mention the flag I should use? Are there any wheel packages available that have all modules present in them?
A link would save me tonnes of effort!
NOTE: Even if I manually import the examples directory, it says tensorflow.contrib is missing, and if I local import that too, another ImportError pops up. There has to be an easier way I am sure of it
Just for reference for others stuck in the same situation:
Use the latest tensorflow build and bazel 0.27.1 for installing it. Even though the requirements state that we need an older version, use the newer one instead. It is not worth the hassle and will get the job done.
Also, to answer the question above: building only specific directories is possible. Each module has a BUILD file which is fed to bazel.
See the name attributes in that file for the targets specific to that folder. For reference, the command I used to generate the wheel package for examples.tutorials.mnist:
bazel build --config=opt --config=cuda --incompatible_load_argument_is_label=false //tensorflow/examples/tutorials/mnist:all_files
Here all_files is the name found in the examples/tutorials/mnist/BUILD file.

How to compare requirement file and actually installed Python modules?

Given requirements.txt and a virtualenv environment, what is the best way to check from a script whether requirements are met and possibly provide details in case of mismatch?
Pip changes its internal API with major releases, so I have seen advice not to use its parse_requirements method.
There is pkg_resources.require(dependencies), but then how do I parse the requirements file with all its fanciness, like GitHub links, etc.?
This should be something pretty simple, but I can't find any pointers.
UPDATE: programmatic solution is needed.
You can save your virtualenv's current installed packages with pip freeze to a file, say current.txt
pip freeze > current.txt
Then you can compare this to requirements.txt with difflib using a script like this:
import difflib
with open('requirements.txt') as req, open('current.txt') as current:
    diff = difflib.ndiff(req.readlines(), current.readlines())
delta = ''.join(x for x in diff if x.startswith('-'))
print(delta)
This should display only the packages that are in requirements.txt but not in current.txt.
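One caveat: ndiff is sensitive to line order, so a sorted pip freeze output compared against a hand-ordered requirements.txt can report spurious differences. A set-based sketch (assuming plain name==version lines, no extras or VCS links) avoids that and reports both directions of the mismatch:

```python
def read_reqs(path):
    """Return the set of non-blank, non-comment lines from a requirements-style file."""
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.lstrip().startswith('#')}

def compare(req_path='requirements.txt', current_path='current.txt'):
    """Return (missing, extra): pins listed but not installed, and installed but not listed."""
    required = read_reqs(req_path)
    installed = read_reqs(current_path)
    return required - installed, installed - required
```

After pip freeze > current.txt, calling missing, extra = compare() gives both sides of the discrepancy as sets, regardless of line order.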
Got tired of the discrepancies between requirements.txt and the actually installed packages (e.g. when deploying to Heroku, I'd often get ModuleNotFoundError for forgetting to add a module to requirements.)
This helps:
Use compare-requirements (GitHub)
(you'll need to pip install pipdeptree to use it.)
It's then as simple as...
cmpreqs --pipdeptree
...to show you (in "Input 2") which modules are installed, but missing from requirements.txt.
You can then examine the list and see which ones should in fact be added to requirements.txt.

Install transitive bitbucket dependencies via pip

The situation I'm trying to resolve is installing a package from a private repository on Bitbucket which has its own dependency on another private repository on Bitbucket.
I use this to kick off the install:
pip install -e git+https://bitbucket.org/myuser/project-one.git/master#egg=django_one
which then attempts to download its dependencies from a setup.py that looks like:
install_requires = ['project-two',],
dependency_links = ['git+https://bitbucket.org/myuser/project-two.git/master#egg=project_two'],
This fails, the pip log looks like:
Downloading/unpacking project-two (from project-one)
Getting page https://pypi.python.org/simple/project-two/
Could not fetch URL https://pypi.python.org/simple/project-two/: HTTP Error 404: Not Found (project-two does not have any releases)
Will skip URL https://pypi.python.org/simple/project-two/ when looking for download links for project-two (from project-one)
Getting page https://pypi.python.org/simple/
URLs to search for versions for project-two (from project-one):
* https://pypi.python.org/simple/project-two/
* git+https://bitbucket.org/myuser/project-two.git/master#egg=project-two
Getting page https://pypi.python.org/simple/project-two/
Cannot look at git URL git+https://bitbucket.org/myuser/project-two.git/master#egg=project-two
Could not fetch URL https://pypi.python.org/simple/project-two/: HTTP Error 404: Not Found (project-two does not have any releases)
Will skip URL https://pypi.python.org/simple/project-two/ when looking for download links for project-two (from project-one)
Skipping link git+https://bitbucket.org/myuser/project-two.git/master#egg=project-two; wrong project name (not project-two)
Could not find any downloads that satisfy the requirement project-two (from project-one)
The curious thing about this setup is, if I take a clone of project-one and run
python setup.py install
from there, project-two is fetched from Bitbucket and installed into my virtualenv. My understanding was that pip was using setuptools under the hood, so my assumption was that the success of that test validated my approach.
Any suggestions appreciated.
FOLLOW UP:
So the accepted answer is quite right, but my problem had the additional complexity of being a private repo (https + HTTP basic auth). Using the syntax
dependency_links=["http://user:password@bitbucket.org/myuser/..."]
still caused a 401. Running up a shell and using pip.download.py to run urlopen demonstrates the underlying problem (i.e. pip needs additional setup in urllib2 to get this working).
The problem is mentioned here but I couldn't get that working.
pip created the idea of a VCS installation, so you can use git+https://path/to/repo.git, but setuptools does not understand that.
When you create a setup.py file you are using only setuptools (no pip involved), and setuptools does not understand that kind of URL.
You can use dependency_links with tarballs or zip files, but not with git repositories.
Replace your dependency_links with:
dependency_links=["https://bitbucket.org/myuser/project-two/get/master.zip#egg=project-two"]
And check if it works.
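As a note for anyone reading this with current tooling: dependency_links support has since been removed from pip entirely, and the modern replacement is a PEP 508 direct reference placed directly in install_requires. A sketch, using the question's hypothetical repo path:

```python
# setup.py fragment (sketch): modern replacement for dependency_links.
install_requires=[
    'project-two @ git+https://bitbucket.org/myuser/project-two.git@master',
],
```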
There is a similar question at https://stackoverflow.com/a/14928126/565999
References:
http://peak.telecommunity.com/DevCenter/setuptools#dependencies-that-aren-t-in-pypi
