Problems installing a package from PyPI: root files not installed

After installing the BitTorrent-bencode package, either via easy_install BitTorrent-bencode or pip install BitTorrent-bencode, or by downloading the tarball and installing that via easy_install $tarball, I discover that /usr/local/lib/python2.6/dist-packages/BitTorrent_bencode-5.0.8-py2.6.egg/ contains EGG-INFO/ and test/ directories. Although both of these subdirectories contain files, there are no files in the BitTorr* directory itself. The tarball does contain bencode.py, which is meant to be the actual source for this package, but it's not installed by either of those utils.
I'm pretty new to all of this so I'm not sure if this is a problem with the package or with what I'm doing. The package was packaged a while ago (2007), so perhaps it's using some deprecated configuration aspect that I need to supply a command-line flag for.
I'm more interested in learning what's wrong with either the package or my procedures than in getting this particular package installed; there is another package called hunnyb that seems to do a decent enough job of decoding bencoded data. Mostly I'd like to know how to deal with such problems in other packages. I'd also like to let the package maintainer know if the package needs updating.
edit
@Andrey Popp explains that the problem is likely with the setup.py file. I guess the only way I can really get an answer to my question is by actually R-ing TFM. However since I likely won't have time to do that thoroughly for a while yet, I've posted the setup.py file here.
A quick browse through the setuptools manual reveals that the function find_packages(), which this module's setup.py makes use of, searches for files named __init__.py within the package. The source code file in question is named bencode.py, so perhaps this is the problem: it should be named __init__.py?
edit 2
Having now learned Python packaging, I gather that the problem is that this module is using setuptools.find_packages, and has its source at the root of its directory structure, but hasn't passed anything in package_dir. It would seem to be fairly trivial to fix. However, the author is not reachable by his PyPI contact info. The module's PyPI page lists a "Package Index Owner" as well. I'm not sure what that's supposed to mean, but I did manage to get in touch with that person, who I think is maybe not in a position to maintain the module. In any case, it's still in the same state as when I posted this question back in June.
Given that the module seems to be more or less abandoned, and that there's a suitable replacement for it in hunnyb, I've accepted that @andreypopp's answer is about as good an answer as I'm going to get.

It seems this package's setup.py is broken: it does not define the right package for distribution. I think you need to check the setup.py in the source release, and if that is the case, report a bug to the author of this package.
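For reference, a minimal sketch of what a fixed setup.py might look like, assuming (as diagnosed in edit 2 above) that the source really is a single bencode.py at the root of the distribution; the name and version are taken from the package, everything else is illustrative:

from setuptools import setup

# Declare bencode.py as a top-level module. find_packages() cannot see it,
# because it only discovers directories that contain an __init__.py.
setup(
    name='BitTorrent-bencode',
    version='5.0.8',
    py_modules=['bencode'],  # installs bencode.py itself
)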


python import yaml - but which one?

Ok, so you clone a repo, there's an import
import yaml
ok, so you do pip install yaml and you get:
ERROR: No matching distribution found for yaml
Ok, so you look for a package with yaml in it, and there's like a gazillion of them... usually adding py in front does the job, but...
How on earth should I know which one was used?!
And it's not just yaml, oh no... there's:
import cv2 # python-opencv
import PIL # Pillow
and the list goes on and on...
How can I know which import uses which package? Shouldn't there be a PEP for this? Or a naming convention, e.g. import is always the same as the package name?
There's a similar topic here, if you're not frustrated enough :)
[When I clone a repo,] How can I know which import uses which package?
In short: it is the cloned code's responsibility to explain this, and it is an expected courtesy that the cloned code includes an installer that will take care of it.
If this is just some random person's bundle of .py files on GitHub with no installation instructions, look for notes in the associated documentation; failing that, make an issue on the tracker. (Or just give up. Maybe look for a better-engineered project that does the same thing.)
However, most "serious", contemporary Python projects are meant to be installed by using some form of packaging system. These have evolved over the years, and best practices have changed many times; but generally speaking, a properly "packaged" and "distributed" project will have either a setup.py or (newer; better in many ways, but not universally adopted yet) pyproject.toml file at the top level.
A pyproject.toml file is a config file in TOML format that simply describes a bunch of project metadata. This requires a build backend conforming to PEP 517. For a while, this required third-party tools, such as Poetry; but the standard setuptools can handle this since version 40.8.0. (As of this writing, the current release is 65.7.0.)
A setup.py script is executable code that pip will invoke after downloading a package from PyPI (or another package index). Generally, this script will use either setuptools or distutils (the predecessor to setuptools; it has finally been officially deprecated in 3.10, and will be removed in 3.12) to install the project, by calling a function named setup and passing it a big dict with some project metadata.
Security warning: this file is still executable code. It is arbitrary code, and it doesn't have to follow the standard conventions. Also, the package that is actually downloaded from PyPI doesn't necessarily match the project's source shown on GitHub (or another Git hosting website), if such is even available. (This problem also affects package managers in other languages and ecosystems, notably npm for JavaScript.)
With the setup.py based approach, package dependencies are specified using a keyword argument to the setup function. The specification has changed many times; currently, projects still using a setup.py should use the install_requires keyword argument.
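For instance, a minimal setup.py declaring its dependencies might look like the following sketch (the project name, version, and pinned versions are illustrative, not from any real project):

from setuptools import setup, find_packages

setup(
    name='example-project',   # illustrative name
    version='0.1.0',
    packages=find_packages(),
    install_requires=[
        'PyYAML>=5.0',   # provides 'import yaml'
        'Pillow',        # provides 'import PIL'
    ],
)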
With the pyproject.toml based approach, using setuptools' backend, dependencies are listed as a TOML array stored under project.dependencies. This will vary for other backends; for example, Poetry expects this information under tool.poetry.dependencies.
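As a sketch, the equivalent under the setuptools backend in pyproject.toml might look like this (note that the [project] table needs setuptools 61.0 or newer; names and versions are again illustrative):

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "example-project"
version = "0.1.0"
dependencies = [
    "PyYAML>=5.0",   # provides 'import yaml'
]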
In any event, pip freeze will output a list of what's installed in the current environment. It's a somewhat common practice for developers to test the code in a virtual environment where the dependencies are installed, dump this output to a requirements.txt file, and include that as documentation.
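That workflow, in shell terms (the requirements.txt name is just the convention):

pip freeze > requirements.txt    # dump the tested environment
pip install -r requirements.txt  # recreate it somewhere else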
[When I want to use a third-party library in my own code,] How can I know which import uses which package?
It's worth considering the question the other way around, too: given that we have installed OpenCV for Python using pip install opencv-python, and want to use it in our own code, how do we know to import cv2 specifically?
The answer: there is no convention, and certainly no requirement for the installed package name to match the PyPI name, nor the GitHub etc. repository name. Read the documentation. Everyone who intends for their code to be used as a library will be more than willing to show how, on at least a basic level.
Look for a requirements.txt file. Big projects usually have one. You can install the packages listed in that file; else, just google.
Keep in mind that it might not be a pip package.
Probably what is happening is that the main script is trying to import a secondary script (yaml.py, in this case) with functions or utils for the main script to use.
Check if the repo contains a file named yaml.py. If that's the case, make sure to run the main script while yaml.py is in the same directory.
Also, check for a requirements.txt file.
You can install all the requirements inside the file running in shell this line:
pip install -r path/to/your/requirements.txt
Hope that this helps.
Any package on PyPI or cloned from some online repository is free to set itself up with whatever base directory name it chooses. That base directory xyz determines the import xyz line. Additionally, a package name on PyPI doesn't have to match the repository name where its source code revisions are kept (assuming there is any).
This has the disadvantage that there is no one-to-one relation between package name, repo and/or import line. But the advantage is that you can, e.g., install Pillow, which is backwards compatible with PIL, and still use import PIL instead of changing all your sources to use import Pillow as PIL.
If the repo you clone has a requirements.txt, look there; you can also look in the setup.py for extras_require. But there is no guarantee that these are available, or that they contain the names of the packages to install (e.g. I use a generic setup.py that reads its info from a data structure in the __init__.py file when creating/installing a package).
yaml seems to be a reserved name on PyPI (at least it was when I tried to upload a package with that name a few years ago). So that might be the reason the package is named PyYAML, although the Py prefix is not very informative, as the Python code would not function in another programming language anyway. PyPI's search is not very helpful, as its relevance ordering is not particularly relevant (at least not for yaml).
PyPI has no entry in its metadata for the import line, but you could extract it from a .whl package file, as the import name is the top-level directory that doesn't match *.dist-info. This is normally not possible from a .tar.gz package file. I don't know of any site that does this kind of automatic scraping.
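As a rough sketch of that idea in Python (the wheel filename here is hypothetical): a wheel is just a zip archive, so the standard library can list its top-level entries:

import zipfile

# Everything at the top level of a wheel that isn't the *.dist-info
# metadata directory is (usually) what you import.
with zipfile.ZipFile('example_pkg-1.0-py3-none-any.whl') as whl:
    top_level = {name.split('/')[0] for name in whl.namelist()}
    print({n for n in top_level if not n.endswith('.dist-info')})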
You can click through the packages on PyPI after searching for the import term, and hope you find something that matches the import in the documentation, but that is no guarantee you'll get the right one.
You might be best off searching for import yaml here on Stack Overflow, and hoping that the question or the answer mentions the package name.
Thank you very much for your help and ideas. Big thanks to Karl Knechtel for his exhaustive answer.
tl;dr: I think using some sort of "package" / "distribution" standard would make everyone's lives easier.
However, my question was half-theoretical, to point out something I'd call an incoherence in Python. You are of course right: there should be setuptools metadata or a requirements.txt, or at least some documentation. But if there isn't any, we're prone to error or additional browsing.
GospelBG pointed out something important. There could be a script yaml.py in the main folder and we need to check and/or guess.
Most importantly, naming imports differently than packages is just plainly misleading. There should be a naming convention or a PEP for this. Again, you can of course eventually get the proper package etc., but it's not explicit and obvious, and it should be! Because in programming, we like it that way, don't we?
I'm no seasoned dev in Python and I'm learning C++, but in C++, for example, you import a header file with a particular name, and static or dynamic libraries by their filename. I know this is a very "step-by-step, on foot" method, but at least you use the exact filenames.
On the upper level you have CMake, which would be the equivalent of setuptools, where using find_package or find_library you can import a package / library. To be honest, I'm not sure if all packages have the exact equivalent name, but at least the ones I used did match.
Thanks again for your help and answers! I'm open for discussion and comments :)

How to solve ModuleNotFoundError: No module named 'string' error in Python 3?

I was trying to add the Strings library into the Robot Framework interpreter folder in PyCharm, where it first showed me the error
Command errored out with exit status: 1
So, I googled this issue first, and this link suggested that I delete the 'strings.py' file from the libraries. I did so, and now nothing is working.
Anything I do now shows the error "ModuleNotFoundError: No module named 'string'".
I could not even install string using
pip install strings
command.
Whatever I try to do with pip now shows this error.
Can anyone please suggest a solution for this?
I am using Python version 3.10.
Several notes:
The library in question is 9 years old, is not maintained, is known by the author to be buggy, was written as a joke, and does not contain anything useful. There is no good reason why you should be trying to install it for your project. If you think you need it for something, then you have some other misconception that needs to be cleared up.
The link you found did not tell you to delete strings.py from "the libraries". It said something about deleting string.py - notice, no s at the end - from a local project folder. The reason is that the name conflicts with the standard library module name. The page author's own source file, named string.py, sought to import the standard library module string (as it clearly says import string in the screenshot), but it cannot - because it finds itself first. This is a common problem for new Python users.
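A minimal demonstration of the trap, assuming a file with exactly this name sits in the current working directory and is run directly:

# string.py -- this file shadows the standard library module of the same name
import string  # finds *this* file first, not the stdlib module

# Raises AttributeError: the partially initialized local 'string' module
# has no ascii_lowercase.
print(string.ascii_lowercase)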
DO NOT EVER MANUALLY EDIT THE CONTENTS OF YOUR INSTALLATION DIRECTORY ON THE ADVICE OF SOME RANDOM WEB PAGE.
Ideally, don't ever do it at all. That content is not intended to be touched. Installers exist for a reason. If for some incredibly specific reason you feel the need to do this, make sure you have backups of everything and that you are 100% sure you can restore everything to its initial state if anything goes wrong.
The person writing that web page was incorrect. The installation error had nothing to do with the string.py file.
The actual cause of the problem is that the package is broken and cannot be installed properly on anyone else's machine. Again, this is no big loss as there is no use for the package anyway.
The reason it is broken is that the setup script for the package tries to import the code that's being installed, in order to get version and author information. This seems to work locally, but fails for everyone else.
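A sketch of that anti-pattern (the names are illustrative; this is not the actual setup.py from the strings package):

from setuptools import setup

import strings  # anti-pattern: imports the very code being installed;
                # fails on any machine where this import can't succeed yet

setup(
    name='strings',
    version=strings.__version__,  # the metadata it was trying to read
    author=strings.__author__,
)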
To reiterate: the person writing that article wrote nonsense. (I'm not surprised; the page formatting is awful and the grammar isn't particularly great either.) Looking further, it appears that the entire website is authored by one person, who is clearly just trying to self-promote (with a YouTube channel as well) while lacking the necessary expertise. Browsing around the rest of the site a bit, I see articles that are pedantic and not very insightful, and occasionally inaccurate - but all very SEO-optimized.
I recommend ignoring that website entirely.
To reiterate: the string module comes with your Python. You cannot reinstall it with pip - not with the strings package you found, nor any other package. Your options are:
Find the correct string.py contents (possibly from a backup, assuming you thought to make a backup before deleting something from an installation directory) and restore them. The official repository for the reference implementation of Python is the CPython repository on GitHub; you might be able to find the file in there somewhere. I don't recommend trying: there is a lot to go through, and it is possible to damage things further.
Reinstall Python completely.
The error that you are getting is because you deleted the string.py file. There is no string module on PyPI; this is why pip install string doesn't work. Restore the deleted file; deleting it was never the fix, since it was not the cause of your original problem.
If you try to install the Strings library, it will fail because you are using Python 3.10, and the Strings library that you want to import and install is quite old and not supported on this Python version. That is why you get the ModuleNotFoundError: No module named 'strings' that you see in the link you attached.
From the library's setup.py, I see that it recommends Python 3.3, so I would recommend using that version of Python if you want to use this specific library.
Please note that the link you provided probably covers the case where you have created your own string.py file.
I solved this by deleting every __pycache__ folder in the project directory.

Python: Multiple packages in one repository or one package per repository?

I have a big Python 3.7+ project and I am currently in the process of splitting it into multiple packages that can be installed separately. My initial thought was to have a single Git repository with multiple packages, each with its own setup.py. However, while doing some research on Google, I found people suggesting one repository per package (e.g., Python - setuptools - working on two dependent packages (in a single repo?)). Yet nobody provides a good explanation as to why they prefer such a structure.
So, my questions are the following:
What are the implications of having multiple packages (each with its own setup.py) on the same GitHub repo?
Am I going to face issues with such a setup?
Are the common Python tools (documentation generators, PyPI packaging, etc.) compatible with such a setup?
Is there a good reason to prefer one setup over the other?
Please keep in mind that this is not an opinion-based question. I want to know if there are any technical issues or problems with any of the two approaches.
Also, I am aware (and please correct me if I am wrong) that pip now allows installing dependencies from GitHub repos, even if the setup.py is not at the root of the repository.
One aspect is covered here
https://pip.readthedocs.io/en/stable/reference/pip_install/#vcs-support
In particular, if setup.py is not in the root directory you have to specify the subdirectory where to find setup.py in the pip install command.
So if your repository layout is:
pkg_dir/
    setup.py        # setup.py for package pkg
    some_module.py
other_dir/
    some_file
some_other_file
You'll need to use pip install -e "vcs+protocol://repo_url/#egg=pkg&subdirectory=pkg_dir" (note the quotes: & is special to the shell).
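For example, for a Git repository reached over HTTPS (the URL is hypothetical):

pip install -e "git+https://github.com/someuser/somerepo.git#egg=pkg&subdirectory=pkg_dir"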
"Best" approach? That's a matter of opinion, which is not the domain of SO. But here are a couple of justifications for creating separate packages:
The package is functionally independent of the other packages in your project.
That is, it doesn't import from them and performs a function that could be useful to other developers. Extra points if the function this package performs is similar to packages already in PyPI.
Extra points if the package has a stable API and clear documentation. Penalty points if the package is a thin grab bag of unrelated functions that you factored out of multiple packages for ease of maintenance, but the functions don't have a unifying principle.
The package is optional with respect to your main project, so there'd be cases where users could reasonably choose to skip installing it.
Perhaps one package is a "client" and the other is the "server". Or perhaps the package provides OS-specific capabilities.
Note that a package like this is not functionally independent of the main project and so does not qualify under the previous bullet point, but this would still be a good reason to separate it.
I agree with @boriska's point that the "single package" project structure is a maintenance convenience well worth striving for. But not (and this is just my opinion, I'm going to get downvoted for expressing it) at the expense of cluttering up the public package index with a large number of small packages that are never installed separately.
I am researching the same issue myself. The PyPA documentation recommends the layout described in the 'native' subdirectory of: https://github.com/pypa/sample-namespace-packages
I find the single-package structure described below very useful; see the discussion around testing the 'installed' version.
https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
I think this can be extended to multiple packages. Will post as I learn more.
The major problem I've faced when splitting two interdependent packages into two repos came from CI and testing, specifically branch protections.
Say you have package A and package B and you make some (breaking) changes in both. The automated tests for package A fail because they use the main branch of B (which is no longer compatible with the new version of A), so you can't merge A. And the same problem the other way around.
tldr:
After breaking changes, automated tests on merge will fail because they use the main branch of the other repo, making it impossible to merge.

How to find out what a python package does

EDIT I was being stupid. Just type help('package_name.pyb_name'), which worked.
I would like to find out what is actually in a python package I have locally downloaded and installed with pip.
Typing help(package_name) just lists NAME, FILE (where the __init__.py is) and PACKAGE CONTENTS, which is just one .pyd file.
I can't open the .pyd file to check what's inside (tbh, not all that familiar with .pyds). These two, together with a 159-byte __init__.pyc, are the only files in the package.
I need to use this (not widely available) package for some university work.
Thanks.
You can't know what a Python package does unless it's stated in its docs (on PyPI or in the repository) or you read the code. A Python package can be anything that has a setup.py and either a single module or multiple files under a folder with an __init__.py file in it.
The fact that the __init__.py is empty doesn't mean anything other than that its existence marks the directory as a Python package.
For any specific package you want to know about, look up its documentation or read the code to get a sense of its purpose.

Load text file in python module after installation using pip/other installer

My goal is to make a program I've written easily accessible to potential employers/etc. in order to... showcase my skills.. or whatever. I am not a computer scientist, and I've never written a python module meant for installation before, so I'm new to this aspect.
I've written a machine learning algorithm, and fit parameters to data that I have locally. I would like to distribute the algorithm with "default" parameters, so that the downloader can use it "out of the box" for classification without having a training set. I've written methods which save the parameters to/load the parameters from text files, which I've confirmed work on my platform. I could simply ask users to download the files I've mentioned separately and use the loadParameters method I've created to manually load the parameters, but I would like to make the installation process as easy as possible for people who may be evaluating me.
What I'm not sure about is how to package the text files in such a way that they can automatically be loaded in the __init__ method of the object I have.
I have put the algorithm and files on github here, and written a setup.py script so that it can be downloaded from github using pip like this:
pip install --upgrade https://github.com/NathanWycoff/SySE/tarball/master
However, this doesn't seem to install the text files containing the data I need, only the __init__.py python file containing my code.
So I guess the question boils down to: How do I force pip to download additional files aside from just the module in __init__.py? Or, is there a better way to load default parameters?
Yes, there is a better way to distribute data files with a Python package.
First of all, read something about proper Python package structure. For instance, it's not recommended to put code into __init__ files; they just mark that a directory is a Python package, plus you can do some import statements there. So it's better if you put your SySE class into (for instance) a file syse.py in that directory, and in __init__.py you can do from .syse import SySE.
Now to the data files. By default, setuptools will distribute only *.py and several other special files (README, LICENCE and so on). However, you can tell setuptools that you want to distribute some other files with the package. Use setup's kwarg package_data; more about that here. Also don't forget to include all your data files in MANIFEST.in; more on that here.
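A minimal sketch of both pieces for the project described here (the syse package name follows from the restructuring above; the *.txt pattern is an assumption about where the parameter files live). In setup.py:

from setuptools import setup, find_packages

setup(
    name='SySE',
    version='0.1.0',
    packages=find_packages(),
    package_data={'syse': ['*.txt']},  # ship the parameter text files
)

and in MANIFEST.in:

include syse/*.txt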
If you do all the above correctly, then you can use the pkg_resources package to discover your data files at runtime. pkg_resources handles all possible situations: your package can be distributed in several ways; it can be installed from a pip server, from a wheel, as an egg, ... more on that here.
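A sketch of reading one of those files at runtime with pkg_resources (the package and file names are assumptions, matching the sketch above):

import pkg_resources

# Works no matter how the package was installed (sdist, wheel, egg, ...).
raw = pkg_resources.resource_string('syse', 'default_params.txt')
params_text = raw.decode('utf-8')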
Lastly, if your package is public, I can only recommend uploading it to PyPI (in case it is not public, you can run your own pip server). Register there and upload your package. You could then do just pip install syse to install it from anywhere. It's quite likely the best way to distribute your package.
It's quite a lot of work and reading, but I'm pretty sure you will benefit from it.
Hope this helps.
