Modular Compiler in Python

Modular Compiler in Python - python

I am writing a compiler in Python, using the PLY (Python Lex-Yacc) library to 'compile' the compiler. The compiler has to go through a lot of rules (the
number of just the core rules is eventually going to be a little less than a hundred, and they can be extended). So to keep the different types of rules separate, I made many Python modules in a single modules directory.
To include all the rules, I don't have to include the modules in this directory, but I have to include the rules (implemented as Python functions) into the current namespace. Once they simply exist there, the compiler's input will be properly tokenized, parsed, etc.
Here's what I've read about and tried:
using __import__, getattr, and sys.modules (very raw and in general not preferred)
the importlib library (how do I get everything inside the module?)
a lot of fiddling with __init__.py and just trying to from modules import * which will import everything in the modules as well
But none of these seem entirely satisfactory to me. I can't do precisely what I want to do with any of them. So my question is: how can I import some of the attributes of a Python module in a subdirectory into the running namespace of a top-level module?
Thanks for your attention!

You want to use an existing plugin library like stevedore. It will give you the tools to enumerate files that can be imported, and tools to import those modules.

Related

How to read source file for the math library? [duplicate]

I wanted to try and look up the source of some of the modules in the Python standard library, but wasn't able to find them. I tried looking in the modules directory after downloading the python tarball, but it has mainly .c files. I also tried looking at the directory where the python that already comes with the OS (mac osx) has it's modules, and there it seems to have mainly .pyc and .pyo files. Would really appreciate it if someone can help me out.
(I tried what was suggested in the question How do I find the location of Python module sources? with no luck)

In cpython, many modules are implemented in C, and not in Python. You can find those in Modules/, whereas the pure Python ones reside in Lib/.
In some cases (for example the json module), the Python source code provides the module on its own and only uses the C module if it's available (to improve performance). For the remaining modules, you can have a look at PyPy's implementations.

The canonical repository for CPython is this Mercurial repository. There is also a git mirror on GitHub.

That would depend on what you define as Standard Library.
The Python Documentations says:
...this library reference manual describes the standard library that is
distributed with Python. It also describes some of the optional
components that are commonly included in Python distributions.
Python’s standard library is very extensive, offering a wide range of
facilities as indicated by the long table of contents listed below.
The library contains built-in modules (written in C) that provide
access to system functionality such as file I/O that would otherwise
be inaccessible to Python programmers, as well as modules written in
Python that provide standardized solutions for many problems that
occur in everyday programming. Some of these modules are explicitly
designed to encourage and enhance the portability of Python programs
by abstracting away platform-specifics into platform-neutral APIs.
If you take an extensive criteria, the Python Documentation explicitly answers what you're asking for, and I quote:
Exploring CPython’s Internals.
CPython Source Code Layout.
This guide gives an overview of CPython’s code structure. It serves as a summary of file locations for modules and builtins.
For Python modules, the typical layout is:
Lib/<module>.py
Modules/_<module>.c
Lib/test/test_<module>.py
Doc/library/<module>.rst
For extension-only modules, the typical layout is:
Modules/<module>module.c
Lib/test/test_<module>.py
Doc/library/<module>.rst
For builtin types, the typical layout is:
Objects/<builtin>object.c
Lib/test/test_<builtin>.py
Doc/library/stdtypes.rst
For builtin functions, the typical layout is:
Python/bltinmodule.c
Lib/test/test_builtin.py
Doc/library/functions.rst
Some exceptions:
builtin type int is at Objects/longobject.c
builtin type str is at Objects/unicodeobject.c
builtin module sys is at Python/sysmodule.c
builtin module marshal is at Python/marshal.c
Windows-only module winreg is at PC/winreg.c

You can get the source code of pure python modules that are part of the
standard library from the location where Python is installed.
For example at : C:\Python27\Lib (on windows) if you have
used Windows Installer for Python Installation.

Look it up under the Lib sub-directory of the Python installation directory.

The source code for many standard library packages is linked at the top of the package's documentation page in the library documentation, for example, the docs for the random module.
The original commit message for adding these links states
Provide links to Python source where the code is short, readable and
informative adjunct to the docs.

Can a Package containing several sub-packages be also called as a Library in Python?

I am a little bit confused in the difference between a package and a library. When I install packages from pypi.org, these packages contain several sub-packages, that contain modules. When I googled the difference between a package and I library, I found this.
And that being the case, can a package contain several sub-packages be also called as a library? If no then what is a library? And what is the difference between a library and a package containing sub-packages?

Library
Most often will refer to the general library or another collection created with a similar format and use. The General Library is the sum of 'standard', popular and widely used Modules, witch can be thought of as single file tools, for now or short cuts making things possible or faster. The general library is an option most people enable when installing Python. Because it has this name "Python General Library" it is used often with similar structure, and ideas. Witch is simply to have a bunch of Modules, maybe even packages grouped together, usually in a list. The list is usually to download them. Generally it is just related files, with similar interests. That is the easiest way to describe it.
Module
A Module refers to a file. The file has script 'in it' and the name of the file is the name of the module, Python files end with .py. All the file contains is code that ran together makes something happen, by using functions, strings ect. Main modules you probably see most often are popular because they are special modules that can get info from other files/modules. It is confusing because the name of the file and module are equal and just drop the .py. Really it's just code you can use as a shortcut written by somebody to make something easier or possible.
Package
This is a termis used to generally sometimes, although context makes a difference. The most common use from my experience is multiple modules (or files) that are grouped together. Why they are grouped together can be for a few reasons, that is when context matters. These are ways I have noticed the term package(s) used. They are a group of Downloaded, created and/or stored modules. Which can all be true, or only 1, but really it is just a file that references other files, that need to be in the correct structure or format, and that entire sum is the package itself, installed or may have been included in the python general library. A package can contain modules(.py files) because they depend on each other and sometimes may not work correctly, or at all. There is always a common goal of every part (module/file) of a package, and the total sum of all of the parts is the package itself.
Most often in Python Packages are Modules, because the package name is the name of the module that is used to connect all the pieces. So you can input a package because it is a module, also allows it to call upon other modules, that are not packages because they only perform a certain function, or task don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
Most confusion come from a simple file file name or prefix to a file, used as the module name then again the package name.
Remember Modules and Packages can be installed. Library is usually a generic term for listing, or formatting a group of modules and packages. Much like Pythons general library. A hierarchy would not work, APIs do not belong really, and if you did they could be anywhere and every ware involving Script, Module, and Packages, the worl library being such a general word, easily applied to many things, also makes API able to sit above or below that. Some Modules can be based off of other code, and that is the only time I think it would relate to a pure Python related discussion.

python 3 import from subdir

My project has to be extensible, i have a lot of scripts with the same interface that lookup things online. Before i was using __import__ but that does not let me put my 'plugins' on a dedicated directory:
root/
main.py
plugins/
[...]
So my question is: Is there a way to individually import modules from that subdirectory? I'm guessing importlib, but i'm so lost in how the python module loading process works... What i want to do is something like this:
for pluginname in plugins:
plugin = somekindofimport("plugins/{name}".format(name=pluginname))
plugin.unififedinterface()
Also, as a side question, the way am i trying to achieve extensibility is a good way?
I'm on python3.3

Stop thinking in terms of pathnames and start thinking in terms of packages. Read Packages in the tutorial, and if you want more detail see The import system.
But the basic idea is this:
Create a file name plugins/__init__.py. It can be empty; that's enough to turn plugins into a package. Which means you can import modules from that package with:
import plugins.plugin
So, how do you do this dynamically? That's what importlib is for. (You can also use __import__ here, but it's less flexible, and less readable in non-trivial cases, so unless you need pre-3.3 compatibility, don't.)
plugin = importlib.import_module('plugins.{name}'.format(name=pluginname))
It would probably be cleaner to import plugins to get the package, and then use relative imports from within that package, as shown in the examples in the import_module docs.
This also means Python takes care of the .pyc creation and caching, etc.
And it means that you can later expand plugins to be a "namespace package", which can be split across multiple directories like /usr/share/myapp/plugins for stock plugins, /etc/myapp/plugins for site plugins and ~/myapp/plugins for user-specific plugins.
If you really, really want to import from a directory that isn't a package, you can create a module loader and use it, but that's a whole lot of work for no actual benefit. (It's actually not that hard in 3.3 (SourceLoader and friends will do most of the work for you), but you will find almost no examples out there to guide you; instead, you'll find examples of the 2.6-3.2 way, or the 2.0-2.5 way, both of which are hard.) Plus, it means that if someone creates a plugin named, say, gzip, you can end up blocking the stdlib gzip module with the plugin. (That's especially fun if the gzip plugin tries to use the gzip stdlib module, as it likely will…) If the plugin ends up being named plugins.gzip, there's no problem.
Also, as a side question, the way am i trying to achieve extensibility is a good way?
As long as you only want to support 3.3+, yes, I think this is a great solution.
Before 3.3, using a package for plugins was a lot more problematic. People have come up with a variety of different plugin systems—in one case going so far as to dynamically create module objects and execfile into them. If you need to deal with that, I would suggest looking at existing Python apps with plugins (e.g., MusicBrainz Picard) to get different ideas.

Finding out importable modules in Python

iPython has this excellent feature of being able to auto-complete modules and this works extremely well in there.
For the longest time, I've had a shell function that allows me to change directories to where a given module is located. It does this by trying to import the name of the module from the first argument and looking into its __file__.
What I would like to do is to find out the importable modules that are available so I can write an auto-complete function for this little helper.
I know that for a give module, I can do something like dir(module_name) and that would give me what I need for that module but I am not sure what to do to find out what I can import, just like iPython when you do something like:
import [TAB]
Or
import St[TAB]
Which would autocomplete all the importable modules that start with St.
I know how to write the auto-completion functionality, I am interesting in just finding the way of knowing what can I import.
EDIT:
Digging through the IPython code I managed to find the exact piece that does the work of getting all of the current modules for the given environment:
from IPython.core.completerlib import module_completion
for module in module_completion('import '):
print module
The reason I'm printing the results is because I am actually using this for ZSH completion so I will need to deal with the output.

As a starting point, you could iterate through the paths in sys.path looking for python modules, as that is exactly what import does. Other than that, Python won’t have an index or something for all modules, as the imports are resolved on run-time, when it’s requested. You could easily provide an index for the built-in modules though and maybe have a cache or something for previously imported modules.

python multiple imports for a common module

I am working on a project wherein I need to use a third party module in different project files(.py files). The situation is like this.
I have a file "abc.py" which imports third party module "common.py". There are couple of other files which also import "common.py". All these files are also imported in main project file "main.py".
It seems redundant to import same module in your project multiple times in different files since "main.py" is also importing all the project files.
I am also not sure how the size of the project gets affected by multiple import statements.
Can someone pls help me in making things bit simpler.

Importing only ever loads a module once. Any imports after that simply add it to the current namespace.
Just import things in the files you need them to be available and let Python do the heavy-lifting of figuring out loading the modules.

Yes, you are right, this behavior really exists in Python. Namely, if user code tries to import the same module in different ways, for example - import a and import A.a (where a.py file is located into A package and the first import is done from within the A package while the other import comes as from outside).
This can easily happen in real life, especially for multi-level packaged Python projects.
I have experienced a side-effect of such behavior, namely command isinstance does not work when an object is checked against a class that is defined in module that was imported in such way.
The solution I can think about is to redefine the __builtin__. __ import__ function to perform its work more intelligently.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.