Variable expansion in bitbake recipes - python

Folks,
I've been studying the Yocto build process and I noticed the use of the following construct:
PN = "${@bb.parse.vars_from_file(d.getVar('FILE', False),d)[0] or 'defaultpkgname'}"
I know that ${} means variable expansion, and grep showed that the function "vars_from_file" is located in bitbake/lib/bb/parse/__init__.py.
I would like to understand how this variable expansion works, so I explored bitbake files and found out that:
oe-init-build-env calls oe-buildenv-internal, and the last one sets PYTHONPATH to bitbake/lib.
I'm inferring that BitBake uses PYTHONPATH to find the function vars_from_file.
What I have not understood is:
the meaning of the symbol "@" in the variable expansion;
why, if BitBake uses PYTHONPATH to find the function, the recipe does not use the full path ${@bb.parse.__init__.vars_from_file(d.getVar('FILE', False),d)[0] or 'defaultpkgname'} instead of "${@bb.parse.vars_from_file(d.getVar('FILE', False),d)[0] or 'defaultpkgname'}";
whether this type of variable expansion is specific to BitBake - I searched the GNU documentation and found no use of "@" at the beginning of such constructs.
Can someone help me understand them?

The @ symbol is used by BitBake for inline Python variable expansion: the expression after it is evaluated as Python and its result is expanded, usually into a variable assignment. In your case:
PN = "${@bb.parse.vars_from_file(d.getVar('FILE', False),d)[0] or 'defaultpkgname'}"
assigns to PN (the package name) the result of the function vars_from_file from BitBake's parse module, located at bitbake/lib/bb/parse/.
Like any other Python package, importing bb.parse runs its __init__.py, so the functions defined there are reachable as bb.parse.vars_from_file without spelling out __init__.
What's special about these expressions is that they have access to BitBake's datastore, called "d".
AFAIK this IS specific to BitBake.
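To illustrate, here is how the same ${@...} mechanism is typically used in recipe/conf files (a sketch; MY_SUFFIX is a made-up variable, while bb.utils.contains is a real BitBake helper):
# Inline Python expansion: everything after "@" is evaluated as Python,
# with the datastore "d" available in scope.
MY_SUFFIX = "${@d.getVar('PN').upper()}"
EXTRA_OECONF += "${@bb.utils.contains('DISTRO_FEATURES', 'x11', '--with-x', '--without-x', d)}"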

Related

Python documentation input variable type hints

TLDR
I'm noticing a significant difference in the information presented by the official python docs compared to what I'm seeing in the PyCharm hover-over / quickdocs. I'm hoping someone can point me to where I can find the source of this quickdoc information such that I can use it outside of PyCharm as a general reference.
For example, in the Python docs for os.makedirs I see:
os.makedirs(name, mode=0o777, exist_ok=False)
Recursive directory creation function. Like mkdir(), but makes all intermediate-level directories needed to contain the leaf directory.
The mode parameter is passed to mkdir() for creating the leaf directory; see the mkdir() description for how it is interpreted. To set the file permission bits of any newly created parent directories you can set the umask before invoking makedirs(). The file permission bits of existing parent directories are not changed.
If exist_ok is False (the default), a FileExistsError is raised if the target directory already exists.
Note
makedirs() will become confused if the path elements to create include pardir (eg. “..” on UNIX systems).
This function handles UNC paths correctly.
Raises an auditing event os.mkdir with arguments path, mode, dir_fd.
New in version 3.2: The exist_ok parameter.
Changed in version 3.4.1: Before Python 3.4.1, if exist_ok was True and the directory existed, makedirs() would still raise an error if mode did not match the mode of the existing directory. Since this behavior was impossible to implement safely, it was removed in Python 3.4.1. See bpo-21082.
Changed in version 3.6: Accepts a path-like object.
Changed in version 3.7: The mode argument no longer affects the file permission bits of newly created intermediate-level directories.
But in the quickdocs I see:
os
def makedirs(name: str | bytes | PathLike[str] | PathLike[bytes],
             mode: int = ...,
             exist_ok: bool = ...) -> None
makedirs(name [, mode=0o777][, exist_ok=False]) Super-mkdir; create a leaf directory and all intermediate ones. Works like mkdir, except that any intermediate path segment (not just the rightmost) will be created if it does not exist. If the target directory already exists, raise an OSError if exist_ok is False. Otherwise no exception is raised. This is recursive.
Where is this quickdoc type hinting information coming from and where can I find a complete reference with all these type hints such that I can reference it outside of PyCharm?
Background
Coming mainly from a strongly typed language like Java, I struggle to make constructive use of the Python documentation with regard to function input parameter types. I am hoping someone can elucidate a standard process for resolving ambiguity, compared to my current trial+[lots of]errors approach.
For example, the os.makedirs function's first parameter is name.
os.makedirs(name, mode=0o777, exist_ok=False)
It is not apparent to me what sorts of things I can pass as name here. Could it be:
A str? If so, how should I create it? Via a string literal, a double-quoted string? Does it accept / separators or \ separators, or is that system dependent?
A pathlib.Path?
Anything pathlike?
[Note the above are rhetorical questions and not the focus of this post]. These are all informed guesses, but if I were completely new to python and was trying to use this documentation to make some directories, I see two options:
Read the source code via some IDE or other indexing
Guess until I get it right
The first is fine for easier to understand functions like makedirs but for more complicated functions this would require gaining expertise in a library that I don't necessarily want to reuse and just want to try out. I simply don't have enough time to become an expert in everything I encounter. This seems quite inefficient.
The second also seems to be quite inefficient, with the added demerit of not knowing how to write robust code to check for inappropriate inputs.
Now I don't want to bash the python docs, as they are LEAPS and BOUNDS better than a fair few other languages I've used, but is this dilemma just a case of unfinished/poor documentation, or is there a standard way of knowing/understanding what input parameters like name in this case should be that I haven't outlined above?
To be fair, this may not be the optimal example, as if you look towards the end of the doc for makedirs you can see it does state:
Changed in version 3.6: Accepts a path-like object.
but this is not specifically referring to name. Yes, in this example it may seem rather obvious it is referring to name, but with the advent of type-hinting, why are the docs not type hinted like the quickdocs from PyCharm? Is this something planned for the future, or is it too large a can of worms to try to hint all possibilities in a flexible language like python?
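For what it's worth, here is a quick sketch of what that "path-like object" note means in practice (the directory names are made up):
import os
import pathlib

os.makedirs("build/output/logs", exist_ok=True)          # a plain str; "/" works on Windows too
os.makedirs(pathlib.Path("build") / "output" / "cache",  # any os.PathLike object since 3.6
            exist_ok=True)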
Just as a comparison, take a look at Java's java.io.File.mkdirs, where the various File constructors tell you exactly what your options are for specifying the path of the file:
File(File parent, String child)
// Creates a new File instance from a parent abstract pathname and a child pathname string.
File(String pathname)
// Creates a new File instance by converting the given pathname string into an abstract pathname.
File(String parent, String child)
// Creates a new File instance from a parent pathname string and a child pathname string.
File(URI uri)
// Creates a new File instance by converting the given file: URI into an abstract pathname.
Just reading this I already know exactly how to make a File object and create directories without running/testing anything. With the quickdoc in PyCharm I can do the same, so where is this type hint information in the official docs?

How to share python classes up a directory tree?

I have an example file structure provided below.
/.git
/README.md
/project
    /Operation A
        generateinsights.py
        insights.py
    /Operation B
        generatetargets.py
        targets.py
generateinsights.py is run; it references insights.py to get the definition of an insight object. Next, generatetargets.py is run; it references targets.py to get the definition of a target object. The issue I have is that generatetargets.py also needs to understand what an insight object is. How can I set up my imports so that insights.py and targets.py can be referenced by anything in the project directory? It seems like I should use __init__.py for this, but I can't get it to work properly.
First, you have to rename Operation A and Operation B so that their names contain only letters, digits and underscores, for example Operation_A - this is needed to be able to use them in an import statement without raising a SyntaxError.
Then, put an __init__.py file into the project, Operation_A and Operation_B folders. You can leave it empty, but you can also for example define additional attributes for your module.
Finally, you need to make Python find your modules - for this, either:
set your PYTHONPATH environment variable so that it includes the folder containing project or
put the package folder somewhere into Python's default import directories, for example in /usr/lib/python3/site-packages (requires root permissions)
After that you can import both targets.py and insights.py from any place like this:
from project.Operation_A import insights
from project.Operation_B import targets
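Putting it all together, here is a sketch of how generatetargets.py could then see the insight definitions (the class names Insight and Target are made up):
# project/Operation_B/generatetargets.py (sketch)
# assumes __init__.py files exist in project/, Operation_A/ and Operation_B/
from project.Operation_A.insights import Insight   # hypothetical class defined in insights.py
from project.Operation_B.targets import Target     # hypothetical class defined in targets.py

def generate_targets(insights):
    # build a Target for every Insight we are given
    return [Target(i) for i in insights if isinstance(i, Insight)]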

Loading vs linking in Cython modules

While exploring Cython compile steps, I found that I need to link C libraries like the math library explicitly in setup.py. However, no such step was needed for numpy. Why is that? Is numpy being imported through the usual Python import mechanism? If that is the case, do we never need to explicitly link an extension module in Cython?
I tried to rummage through the official documentation, but unfortunately there was no explanation as to when explicit linking is required and when it is dealt with automatically.
A call to a cdef function corresponds more or less to a jump to an address in memory - the one from which the next instruction should be read/executed. The question is how this address is provided. There are several cases we need to consider:
A. inline functions
The code of those functions is either inlined or the definition of the function is in the same translation unit, thus the address is known to the linker at link time (or even to the compiler at compile time) - no need for additional libraries.
Header-only libraries are an example.
Consequences: only the include path(s) need to be provided in setup.py.
B. static linking
The definition/functionality we need is in another translation unit/library - the target address of the jump is calculated at link time and cannot be changed afterwards.
Examples are additional c/cpp files or static libraries which are added to the extension definition.
Consequences: the static library should be added to setup.py, i.e. the library path and library name, along with the include paths.
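As a sketch, adding an extra C file to the extension definition might look like this in setup.py (the file and module names are made up):
# setup.py (case B): an additional translation unit resolved at link time
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "fastmod",
    sources=["fastmod.pyx", "helper.c"],  # helper.c is compiled and linked in statically
    include_dirs=["include"],             # where helper.h lives
)
setup(ext_modules=cythonize([ext], language_level=3))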
C. dynamic linking
The necessary functionality is provided in a shared object/dll. The address to jump to is calculated at run time by the loader and can be changed at program start by exchanging the loaded shared objects.
Examples are libstdc++ (usually added automatically by g++) or libm, which is not automatically added to the linker command by gcc.
Consequences: the dynamic library should be added to setup.py, i.e. the library path and library name, possibly an rpath, plus the include paths. The shared object/dll must also be available at run time. More information (than one probably would like to know) about Cython/Python using dynamic libraries can be found in this SO post.
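For example, linking libm explicitly could look like this (a sketch, assuming foo.pyx uses functions from libc.math):
# setup.py (case C): libm is a shared library, so we only tell the linker its name
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "foo",
    sources=["foo.pyx"],
    libraries=["m"],   # becomes "-lm" on the linker command line
)
setup(ext_modules=cythonize([ext], language_level=3))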
D. Calling via a pointer
The linker is needed only when we call a function via its name. If we call it via a function pointer, we don't need the linker/loader, because the address of the function is already known - it is the value stored in the function pointer.
Example: Cython-generated modules use this machinery to give access to the cdef functions exported through a pxd file. Cython creates a data structure of function pointers (stored as the variable __pyx_capi__ in the module itself), which is filled by the loader once the so/dll is loaded via dlopen (or Windows' equivalent). The lookup in this dictionary happens only once, when the module is imported, and the function addresses are cached, so calls at run time have almost no overhead.
We can inspect it, for example via
# foo.pyx:
cdef void doit():
    print("doit")

# foo.pxd:
cdef void doit()
>>> cythonize -3 -i foo.pyx
>>> python -c "import foo; print(foo.__pyx_capi__)"
{'doit': <capsule object "void (void)" at 0x7f7b10bb16c0>}
Now, calling a cdef function from another module is just jumping to the corresponding address.
Consequences: we need to cimport the needed functionality.
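For instance, building on the foo.pyx/foo.pxd pair above, another Cython module could use doit like this (a sketch):
# bar.pyx: the cimport makes bar look up doit's address in foo.__pyx_capi__
# when bar is imported - no linker involvement for foo's symbols.
from foo cimport doit

def call_doit():
    doit()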
Numpy is a little bit more complicated as it uses a sophisticated combination of A and D in order to postpone the resolution of symbols until the run time, thus not needing shared-object/dlls at link time (but at run time!).
Some functionality in numpy's pxd file can be used directly because it is inlined (or even just a define), for example PyArray_NDIM - basically everything from ndarraytypes.h. This is the reason one can use Cython's ndarrays without much ado.
Other functionality (basically everything from ndarrayobject.h) cannot be accessed without calling np.import_array() in an initialization step, for example PyArray_FromAny. Why?
The answer lies in the header __multiarray_api.h, which is included in ndarrayobject.h but cannot be found in the git repository, because it is generated during installation. There, the definition of PyArray_CheckFromAny (and similarly PyArray_FromAny) can be looked up:
...
static void **PyArray_API = NULL;  // usually...
...
#define PyArray_CheckFromAny \
        (*(PyObject * (*)(PyObject *, PyArray_Descr *, int, int, int, PyObject *)) \
        PyArray_API[108])
...
PyArray_CheckFromAny isn't the name of a function but a define for a function pointer stored in PyArray_API, which is not initialized (i.e. is NULL) when the module is first loaded! By the way, there is also a (private) function called PyArray_CheckFromAny, which is what the function pointer actually points to - and because the public version is a define, there is no name collision at link time...
The last piece of the puzzle: the function _import_array (more or less the workhorse behind np.import_array) is an inline function (case A), so only the include path is needed to use it.
_import_array uses a similar approach to Cython's __pyx_capi__ to get the function pointers: The field is called _ARRAY_API and can be inspected via:
>>> import numpy.core._multiarray_umath as macore
>>> macore._ARRAY_API
<capsule object NULL at 0x7f17d85f3810>
More info about how PyArray_API can be initialized can be found in this SO-answer of mine.
However, when using functionality from numpy/math.pxd, one has to statically link numpy's math library (see for example this SO question).

How to call a procedure inside of another procedure

I'm working on creating a large .py file that can be imported and used to solve mathematical formulas. I'd like to store each formula in a procedure called input1_input2_input3(): for example, the formula distance = speed*time is called dis_spe_tim().
The code so far is:
def dis_spe_tim():
    def distance(speed, time):
        result = speed*time
        unit = input("What unit are you measuring the distance in?")
        print(result, unit)
    def speed():
        print("speed")
and I would ideally like the user to use this like so:
import equations #name of the .py file
from equations import *
dis_spe_tim.distance(1,2)
Unfortunately, this is my first time ever doing something like this so I have absolutely no idea how to go about calling the procedure inside of the procedure and providing its arguments.
Short answer: you can't. Nested functions are local to the function they're defined in and only exist during the outer function's execution (def is an executable statement that, at runtime, creates a function object and binds it to its name in the enclosing namespace).
The canonical Python solution is to use modules as namespaces (well, Python modules ARE, mainly, namespaces), i.e. have a distinct module for each "formula" and define the functions at the module's top level:
# dis_spe_tim.py
def distance(speed, time):
    ...  # code here, e.g. return speed * time

def speed():
    ...  # code here
Then put all those modules in an equations package (mostly: a folder containing modules and an __init__.py file). Then you can do:
from equations import dis_spe_tim
dis_spe_tim.distance(1,2)
You can check the doc for more on modules and packages here: https://docs.python.org/3/tutorial/modules.html#packages
Also note that
1/ "star imports" (also named "wildcard imports"), ie from somemodule import *, are highly discouraged as they tend to make the code harder to read and maintain and can cause unexpected (and sometimes subtles enough to be hard to spot) breakages.
2/ you shouldn't mix "domain" code (code that do effective computations) with UI code (code that communicates with the user), so any call to input(), print() etc should be outside the "domain" code. This is key to make your domain code usable with different UIs (command-line, text-based (curse etc), GUI, web, whatever), but also, quite simply, to make sure your domain code is easily testable in isolation (unit testing...).
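For instance, a minimal sketch of that separation, reusing the distance formula from the question (file names follow the layout above):
# equations/dis_spe_tim.py - domain code only, no input()/print()
def distance(speed, time):
    return speed * time

# cli.py - the user-facing layer lives outside the domain code
from equations import dis_spe_tim

if __name__ == "__main__":
    speed = float(input("Speed? "))
    time = float(input("Time? "))
    unit = input("What unit are you measuring the distance in? ")
    print(dis_spe_tim.distance(speed, time), unit)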

Python - Should I alias imports with underscores?

This is a conceptual question rather than an actual problem; I wanted to ask the great big Internet crowd for feedback.
We all know imported modules end up in the namespace of the importing module:
# Module a:
import b
__all__ = ['f']
f = lambda: None
That allows you to do this:
import a
a.b # <- Valid attribute
Sometimes that's great, but most imports are side effects of the feature your module provides. In the example above I don't mean to expose b as a valid interface for callers of a.
To counteract that we could do:
import b as _b
This marks the import as private. But I can't find that practice described anywhere, nor does PEP 8 talk about using aliasing to mark imports as private. So I take it it's not common practice. But from a certain angle I'd say it's definitely semantically clearer, because it cleans up the exposed bits of your module, leaving only the interfaces you actually mean to expose. Working in an IDE with autocomplete, it makes the suggestion list much slimmer.
My question boils down to this: have you seen this pattern in use? Does it have a name? What arguments would go against using it?
I have not had success using the __all__ functionality to hide the b import. I'm using PyCharm and do not see the autocomplete list change.
E.g. from some module I can do:
import a
And the autocomplete box shows both b and f.
While Martijn Pieters says that no one actually uses underscore-hiding module imports, that's not exactly true. Traces of this technique can easily be seen in Python's standard library itself (see a related question). Let's check it:
$ git clone --depth 1 git@github.com:python/cpython.git
$ cd cpython/Lib
$ find -iname '*.py' | xargs grep 'as \+_' | wc -l
183
$ find -iname '*.py' | xargs grep '^import' | wc -l
4578
So, about 4% of all imports are underscore-prefixed - not a majority, but still far from "no one". There are also some examples in the numpy and matplotlib packages.
For me, this import-underscoring is the only right way to import a module without exposing it publicly. Unfortunately, it totally ruins code appearance, so many developers avoid using it. But it has some advantages over the __all__ approach:
A library user can decide whether a name is private or not just by looking at it, without consulting the documentation. Looking at __all__ alone is not enough to tell private from public, as some public names may not be listed there.
No need to maintain a refactoring-unfriendly list of code entity names.
In conclusion, both _name and __all__ are just plain evil, but the thing that actually needs fixing is Python's module system, designed under the impression of the "simple is better than complex" mantra. Compare it, for example, with the way modules behave in Haskell.
UPD:
It looks like PEP 8 has already answered this question in its "Public and Internal Interfaces" section:
Even with __all__ set appropriately, internal interfaces (packages, modules, classes, functions, attributes or other names) should still be prefixed with a single leading underscore.
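To make the pattern concrete, here is a small sketch combining both conventions (the module and its contents are made up):
# units.py (sketch)
import os as _os                    # private alias: clearly not part of this module's API
import collections as _collections

__all__ = ['count_extensions']      # explicit public API on top of the naming convention

def count_extensions(path):
    # count files in "path" by extension; _os and _collections stay "internal"
    return _collections.Counter(name.rsplit('.', 1)[-1] for name in _os.listdir(path))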
No one uses that pattern, and it is not named.
That's because the proper method to use is to explicitly mark your exported names with the __all__ variable. IDEs will honour this variable, as do tools like help().
Quoting the import statement documentation:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module. The names given in __all__ are all considered public and are required to exist. If __all__ is not defined, the set of public names includes all names found in the module’s namespace which do not begin with an underscore character ('_'). __all__ should contain the entire public API. It is intended to avoid accidentally exporting items that are not part of the API (such as library modules which were imported and used within the module).
(Emphasis mine).
Also see Can someone explain __all__ in Python?
