Python: import and change canonical names in the current module

In a Python package directory of my own creation, I have an __init__.py file that says:
from _foo import *
In the same directory there is a _foomodule.so which is loaded by the above. The shared library is implemented in C++ (using Boost Python). This lets me say:
import foo
print foo.MyCppClass
This works, but with a quirk: the class is known to Python by the full package path, which makes it print this:
foo._foo.MyCppClass
So while MyCppClass exists as an alias in foo, foo.MyCppClass is not its canonical name. In addition to being a bit ugly, this also makes help() a bit lame: help(foo) will say that foo contains a module _foo, and only if you say help(foo._foo) do you get the documentation for MyCppClass.
Is there something I can do differently in __init__.py or otherwise to make it so Python sees foo.MyCppClass as the canonical name?
I'm using Python 2.7; it would be great if the solution worked on 2.6 as well.

I had the same problem. You can change the module name in your Boost.Python definition:
BOOST_PYTHON_MODULE(_foo)
{
    scope().attr("__name__") = "foo";
    ...
}
The help issue is a separate problem. I think you need to add each item to __all__ to get it exported to help.
When I do both of these, the name of foo.MyCppClass is just that -- foo.MyCppClass -- and help(foo) gives documentation for MyCppClass.

You can solve the help() problem by adding the line
__all__ = ['MyCppClass']
to your __init__.py file.
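As a rough illustration of why renaming the module changes the printed name: a class reports its canonical name through its __module__ attribute, which it normally takes from the module it was defined in. The pure-Python sketch below is only an analogy for the Boost.Python fix above, not Boost.Python code. canonical_name is a hypothetical helper, and the class is a stand-in for the real C++ class.

```python
def canonical_name(cls):
    """Return the dotted name Python reports for a class."""
    return cls.__module__ + "." + cls.__name__


# Stand-in for a class defined inside the extension module foo._foo.
MyCppClass = type("MyCppClass", (object,), {"__module__": "foo._foo"})
before = canonical_name(MyCppClass)

# Assumption: overriding __module__ mimics the effect of the module
# calling itself "foo" at the time the class is registered.
MyCppClass.__module__ = "foo"
after = canonical_name(MyCppClass)

print(before)  # foo._foo.MyCppClass
print(after)   # foo.MyCppClass
```

The class object itself is unchanged; only the name it advertises differs, which is also what repr() and help() pick up.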

Related

How are functions in csv module defined? [duplicate]

I am trying to understand the lower-level implementation of Python 3. There is one module named _posixsubprocess used by the subprocess module. I tried to find the location of this module on my system and found that it's a stub file.
Could someone guide me? I have no idea what stub files are or how they are implemented at the lower level.
_posixsubprocess
The file you are referencing is a Python module written in C. It's not a "stub" file. The real implementation can be found in the stdlib at Modules/_posixsubprocess.c. To see how a C/C++ extension is written, have a look at Building C and C++ Extensions. This should help you understand the code in _posixsubprocess.c.
In order to add type hints to that module (which is an "extension module", as it is written in C), the type hints are placed in a "stub" file with the extension .pyi.
That file can be found in typeshed, which is a collection of stub files. typeshed also contains stubs for third-party modules, which is a historical remnant; that is no longer needed since PEP 561 was adopted.
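If you want to check for yourself how a given module is implemented on your system, a minimal sketch along these lines may help. module_kind is a hypothetical helper, and the result for _posixsubprocess depends on the platform and on how your Python was built (it may be a separate extension file, compiled into the interpreter, or absent on Windows).

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify how a top-level module is implemented."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    # Modules compiled into the interpreter report a fixed origin string.
    if spec.origin in ("built-in", "frozen"):
        return spec.origin
    if isinstance(spec.loader, importlib.machinery.ExtensionFileLoader):
        return "extension"
    if isinstance(spec.loader, importlib.machinery.SourceFileLoader):
        return "source"
    return "other"


for name in ("sys", "json", "_posixsubprocess"):
    print(name, module_kind(name))
```

Note that a .pyi stub never shows up here: stubs are read by type checkers and IDEs, not by the import system.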
Concerning stub/pyi files
Stub files contain type-hinting information of normal Python modules. The full official documentation can be found in the section about stub-files in PEP-484.
For example, if you have a Python module mymodule.py like this:
def myfunction(name):
    return "Hello " + name
Then you can add type-hints via a stub-file mymodule.pyi. Note that here the ellipsis (...) is part of the syntax, so the code-block below really shows the complete file contents:
def myfunction(name: str) -> str: ...
They look very similar to C header files in that they contain only the function signatures, but their use is purely optional.
You can also add type hints directly in the .py module like the following:
def myfunction(name: str) -> str:
    return "Hello " + name
But there are some cases where you want to keep them separate in stubs:
You want to keep your code Python 2 compatible and don't like the # type: ... comment syntax
You use function annotations for something else but still want to use type-hints
You are adding type-hints into an existing code-base and want to keep code-churn in existing files minimal
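One practical difference between the two approaches can be sketched as follows: inline hints are stored on the function object and are visible at runtime, while hints that live only in a .pyi stub are seen by type checkers but leave the runtime function unannotated. The function untyped below is a hypothetical stand-in for a function whose hints live in a stub.

```python
import typing


def myfunction(name: str) -> str:
    return "Hello " + name


# Stand-in for a function whose hints are kept in a separate stub file:
# at runtime it carries no annotations at all.
def untyped(name):
    return "Hello " + name


inline_hints = typing.get_type_hints(myfunction)
stub_only_hints = typing.get_type_hints(untyped)

print(inline_hints)
print(stub_only_hints)
```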

Why does __all__ work differently in packages than in modules?

The Python documentation for the import statement (link) contains the following:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module.
The Python documentation for modules (link) contains what is seemingly a contradictory statement:
if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.
It then gives an example where an __init__.py file imports nothing, and simply defines __all__ to be some of the names of modules in that package.
I have tested both ways of using __all__, and both seem to work; indeed one can mix and match within the same __all__ value.
For example, consider the directory structure
foopkg/
    __init__.py
    foo.py
Where __init__.py contains
# Note no imports
def bar():
    print("BAR")

__all__ = ["bar", "foo"]
NOTE: I know one shouldn't define functions in an __init__.py file. I'm just doing it to illustrate that the same __all__ can export both names that do exist in the current namespace, and those which do not.
The following code runs, seemingly auto-importing the foo module:
>>> from foopkg import *
>>> dir()
[..., 'bar', 'foo']
Why does the __all__ attribute have this strange double-behaviour?
The docs seem really unclear on how it is supposed to be used, only mentioning one of its two sides in each place I linked. I understand the overall purpose is to explicitly set the names imported by a wildcard import, but am confused by the additional, seemingly auto-importing behaviour. Is this just a magic shortcut that avoids having to write the import out as well?
The documentation is a bit hard to parse because it does not mention that packages generally also have the behavior of modules, including their __all__ attribute. The behavior of packages is necessarily a superset of the behavior of modules, because packages, unlike modules, can have sub-packages and sub-modules. Behaviors not related to that feature are identical between the two as far as the end-user is concerned.
The Python docs can be minimalistic at times. They did not bother to mention that
A package's __init__.py performs all the module-like duties for the package, including support for star-imports of direct attributes via __all__, just like a module does.
Modules support all the features of a package's __init__.py, except that they can't have a sub-package or sub-module.
It goes without saying that to make a name refer to a sub-module, it has to be imported, hence the apparent, but not real, double standard.
Update: how does from M import * actually work?
The __all__ in the __init__.py of the folder foopkg works the same way as __all__ in a foopkg.py module.
You can see why it auto-imports foo here: https://stackoverflow.com/a/54799108/12565014
The most important thing is to look at the CPython implementation: https://github.com/python/cpython/blob/fee552669f21ca294f57fe0df826945edc779090/Python/ceval.c#L5152
It basically loops through __all__ and tries to import each element of __all__.
That's why it auto-imports foo and also achieves whitelisting.
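The behaviour described in the question can be reproduced with a small self-contained sketch. It builds the foopkg layout in a temporary directory (the helper name star_import_demo and the contents of foo.py are made up for illustration) and runs the star-import into a fresh namespace.

```python
import importlib
import os
import sys
import tempfile


def star_import_demo():
    """Build foopkg on disk and return the names a star-import binds."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "foopkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        # No imports here: "bar" exists in the namespace, "foo" does not.
        f.write('def bar():\n    return "BAR"\n\n__all__ = ["bar", "foo"]\n')
    with open(os.path.join(pkg_dir, "foo.py"), "w") as f:
        f.write("VALUE = 42\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        namespace = {}
        exec("from foopkg import *", namespace)
    finally:
        sys.path.remove(root)
    return namespace


ns = star_import_demo()
print(sorted(name for name in ns if not name.startswith("__")))
# ['bar', 'foo']
print(ns["foo"].VALUE)  # 42
```

Here bar is found as an ordinary attribute of the package, while foo is missing from the namespace and is therefore imported as the submodule foopkg.foo before being bound.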

Insert Python Packages from Separate Directory into a different Namespace

Suppose in my Python path, I had the namespace foo. I have modules in a separate directory (not in the python path) called bar: x.py, y.py, z.py. So the layout might look something like this:
|--/python/path/site-packages/foo/
|----__init__.py
|--...
|--/some/other/directory/bar/
|----__init__.py
|----x.py
|----y.py
|----z.py
So, given that foo is already in my path, I can easily do import foo. However, is there any sort of black magic I can add to that foo/__init__.py so that in my Python shell, I can start doing something like from foo import x or from foo.x import my_function? Ideally looking for a solution that works on both Python 2.7 and Python 3.6, but that isn't strict.
EDIT: I wanted to add that bar/ could also have sub-folders or sub-packages, in the ideal scenario.
Forgot that I had asked this question here, but, in case anyone else ends up here, this is what I ended up doing.
# /python/path/site-packages/foo/__init__.py
__path__.append("/some/other/directory/bar/")
The __path__ of a package tells Python which directories to search for that package's submodules.
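A minimal sketch of this approach, using temporary directories in place of the real locations (the directory names and the body of my_function are assumptions for illustration):

```python
import importlib
import os
import sys
import tempfile


def build_layout():
    """Create foo/ on one path and bar/ elsewhere; return both locations."""
    site_root = tempfile.mkdtemp()   # stands in for site-packages
    other_root = tempfile.mkdtemp()  # stands in for /some/other/directory
    foo_dir = os.path.join(site_root, "foo")
    bar_dir = os.path.join(other_root, "bar")
    os.mkdir(foo_dir)
    os.mkdir(bar_dir)
    with open(os.path.join(foo_dir, "__init__.py"), "w") as f:
        # The one-line trick from the answer, with the real bar path filled in.
        f.write("__path__.append(%r)\n" % bar_dir)
    with open(os.path.join(bar_dir, "x.py"), "w") as f:
        f.write('def my_function():\n    return "hello from x"\n')
    return site_root, bar_dir


site_root, bar_dir = build_layout()
sys.path.insert(0, site_root)  # only foo's parent is on the path, not bar
importlib.invalidate_caches()

from foo.x import my_function

print(my_function())  # hello from x
```

Note that x is imported under the name foo.x, so the bar directory itself never appears in any module name.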

Using Python3 C API to add to builtins

I'm looking to use the Python3 C API to add a builtin function. I'm doing this merely as an exercise to help me familiarize myself with the Python C API. The answer to this question does a pretty good job of explaining why one might not want to do this. Regardless, I want to add a function foo to the Python builtins module.
Here's what I've done so far (foo.c):
#include <Python.h>
#include <stdio.h>

static PyObject*
foo(PyObject *self, PyObject *args){
    printf("foo called\n");
    Py_RETURN_NONE;  /* returns Py_None with its reference count incremented */
}

char builtin_name[] = "builtins";
char foo_name[] = "foo";
char foo_doc[] = "foo function";

/* PyModule_AddFunctions expects an array terminated by a NULL sentinel entry */
static PyMethodDef foo_methods[] = {
    {foo_name, foo, METH_NOARGS, foo_doc},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
PyInit_foo(void){
    PyObject *builtin_module = PyImport_ImportModule(builtin_name);
    PyModule_AddFunctions(builtin_module, foo_methods);
    return builtin_module;
}
I'm placing this in the Modules/ directory in the Python source directory.
Just because you put it in the Modules/ folder and use the Python-C-API doesn't mean it will be compiled and executed automagically. After you compiled your foo.c to a Python extension module (you did, right?) your code is (roughly) equivalent to:
foo.py
def foo():
    """foo function"""
    print("foo called")

import builtins
builtins.foo = foo
What isn't that straightforward in the Python implementation is the fact that when you import foo it won't return your foo module but builtins. But I would say that's not a good idea at all, especially since the builtin function you want to add has the same name as the module you created, so it's likely that by import foo you actually overwrite the manually added builtins.foo again...
Aside from that: just putting it in the Modules/ folder doesn't mean it's actually imported when you start Python. You either need to use import foo yourself or modify your Python startup to import it.
Okay, all that aside you should ask yourself the following questions:
Do you want to compile your own Python? If yes, then you can simply edit bltinmodule.c in the Python/ folder and then recompile Python completely.
Do you want to compile something, but not the complete Python? If yes, then just create your own extension module (essentially like you did already), but don't put it in the Modules/ folder of Python; instead create a real package (complete with setup.py and so on) and don't return the builtins module from the module-init function. Just create an empty module and return it after you have added foo to the builtins module. And use a different module name, maybe _foo, so it doesn't collide with the newly added builtins.foo function.
Is the Python-C-API and an extension module the right way in this case? If you thought the Python-C-API would make it easier to add to the builtins, then that's wrong. The Python-C-API just allows faster access and a bit more access to the Python functionality. There are only a few things that you can do with the C-API that you cannot do with normal Python modules if all you want is to do Python stuff (and not interface to a C library). I would say that for your use case creating an extension module is total overkill, so maybe just use a normal Python module instead.
My suggestion would be to use the foo.py I mentioned above and let Python import it on startup. For that, put the foo.py file (I really suggest you change the name to something like _foo.py) in the directory where the additional packages are installed (site-packages on Windows) and use PYTHONSTARTUP (honored only in interactive mode) or another approach to customizing the startup to import that module when Python starts.
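To see what adding a name to builtins actually buys you, here is a small sketch. The module named other is a hypothetical stand-in for any unrelated module, and foo returns its message instead of printing it so the result is easy to inspect.

```python
import builtins
import types


def foo():
    """foo function"""
    return "foo called"


# Register the function in the builtins module, as _foo.py would do.
builtins.foo = foo

# Any other module can now call foo() without importing anything,
# because name lookup falls back to builtins after the module globals.
other = types.ModuleType("other")
exec("result = foo()", other.__dict__)

print(other.result)              # foo called
print("foo" in other.__dict__)   # False
```

The second line of output shows that foo was never copied into the other module; it is resolved through builtins on each lookup.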

Importing Class in Python Subpackage imports more than requested

Overview
I'm running some scientific simulations and I want to process the resulting data in Python. The simulation produces a custom data type that is not used outside of the chain of programs that the authors of the simulation produced, so unfortunately I need what they provide me.
They want me to install two files:
A module called sdds.py that defines a class that provides all user functions and two demos
A compiled module called sddsdatamodule.so that only provides helper functions to sdds.py.
(I find it strange that they're offering me two modules that are so inextricably connected, it doesn't seem like good coding practice to me, but using their code is probably better than rewriting things from scratch.) I'd prefer not to install them directly into my path, side by side. They come from the same company, they're designed to do one specific task together: access and manipulate SDDS-type files.
So I thought I would put them in a package. I could install that on my path, it would be self-contained, and I could easily find and uninstall or upgrade the modules from one location. Then I could hide their un-Pythonic solution in a more-Pythonic package without significantly rewriting things. Seems elegant.
Details
The package I actually use is found here:
http://www.aps.anl.gov/Accelerator_Systems_Division/Accelerator_Operations_Physics/software.shtml#PythonBinaries
Unfortunately, they only support Windows and Mac OS X right now. Compiling the source code is quite onerous, and apparently they have no significant requests for Linux/Unix. I have a Mac, so thankfully this isn't a problem for me.
So my directory tree looks like this:
SDDSPython/              My toplevel package
    __init__.py          Designed to only import the SDDS class
    sdds.py              Defines SDDS class and two demo methods
    sddsdatamodule.so    Defines sddsdata module used by SDDS class.
My __init__.py file literally only contains this:
from sdds import SDDS
The sdds.py file contains the class definition and the two demo definitions. The only other code in the sdds.py file is:
import sddsdata, sys, time

class SDDS:
    (lots of code here)

def demo(output):
    (lots of code here)

def demo2(output):
    (lots of code here)
I can then import SDDSPython and check, using dir:
>>> import SDDSPython
>>> dir(SDDSPython)
['SDDS', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'sdds', 'sddsdata']
So I can now access the SDDS class via SDDSPython.SDDS
Question
How on earth did SDDSPython.sdds and SDDSPython.sddsdata get loaded into the SDDSPython namespace??
>>> SDDSPython.sdds
<module 'SDDSPython.sdds' from 'SDDSPython/sdds.pyc'>
>>> SDDSPython.sddsdata
<module 'SDDSPython.sddsdata' from 'SDDSPython/sddsdatamodule.so'>
I thought by creating an __init__.py file I was specifically excluding the sdds and sddsdata modules from being loaded into the SDDSPython namespace. What is going on? I can only assume this is happening due to something in the sddsdatamodule.so file? But how can a module affect its parent's namespace like that? I'm rather lost, and I don't know where to start. I've looked at the C code, but I don't see anything suspicious. To be fair- I probably don't know what something suspicious would look like, I'm probably not familiar enough with programming C extensions for Python.
Curious question--I did some investigation for you using a similar test case.
XML/
    __init__.py    - from indent import XMLIndentGenerator
    indent.py      - contains class XMLIndentGenerator, and Xml
    Sink.py
It appears that when you import a class from a module, even though you are importing just a portion of it, the entire module becomes accessible in the way you described, that is:
>>> import XML
>>> XML.indent
<module 'XML.indent' from 'XML\indent.py'>
>>> XML.indent.Xml  # did not include this in the from
<class 'XML.indent.Xml'>
>>> XML.Sink
Traceback (most recent call last):
AttributeError: yadayada no attribute 'Sink'
This is expected, since I did not import Sink in __init__.py.....BUT!
I added a line to indent.py:
import Sink

class XMLIndentGenerator(XMLGenerator):
    (code)
Now, since this module imports another module contained within the XML package, if I do:
>>> import XML
>>> XML.Sink
<module 'XML.Sink' from 'XML\Sink.pyc'>
So, it appears that because your imported sdds module also imports sddsdata, you are able to access it. That answers the "How" portion of your question, but "why" this is the case, I'm sure there's an answer somewhere in the docs :)
I hope this helps - I was literally doing this as I was typing the answer! A learning experience for me as well.
This happens because Python imports don't work the way you might think. They work like this:
The import machinery looks for a file that should be the module requested by the import.
A types.ModuleType instance is created, several attributes on it are set to describe the corresponding file (__file__, __name__ and so on), and that object is inserted into sys.modules under the fully qualified module name it would have.
If this is a submodule import (i.e., sdds.py, which is a submodule of SDDSPython), the newly created module is attached as an attribute to the existing module object of the parent package.
The file is "executed" with that module as its global scope; all names defined by that file appear as attributes of the module.
In the case of a from import, an attribute of the module may be returned to the importing script.
So that means if I import a module (say, foo.py) that has, as its source only:
import bar
then there is a global in foo, called bar, and I can access it as foo.bar.
There is no capacity in Python for "only execute the part of this Python script I want to use right now." The whole thing runs.
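The effect can be reproduced with a small self-contained sketch. It assumes Python 3, so it uses explicit relative imports where the original Python 2 code used implicit ones; demo_pkg is a hypothetical package name, and sddsdata.py is a pure-Python stand-in for the compiled sddsdatamodule.so.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package name
os.mkdir(pkg_dir)

files = {
    # The package only asks for the SDDS class...
    "__init__.py": "from .sdds import SDDS\n",
    # ...but sdds.py itself imports its helper module.
    "sdds.py": "from . import sddsdata\n\nclass SDDS(object):\n    pass\n",
    "sddsdata.py": "HELPER = True\n",
}
for filename, source in files.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg

public = sorted(n for n in dir(demo_pkg) if not n.startswith("__"))
print(public)  # ['SDDS', 'sdds', 'sddsdata']
print("demo_pkg.sdds" in sys.modules, "demo_pkg.sddsdata" in sys.modules)
```

Both submodules end up as attributes of the package even though __init__.py only names the class, because each submodule import binds the loaded module on its parent package.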
