I have a Python module that is built around a native extension written in C. This extension includes code generated using the GNU Bison and (not GNU) Flex tools. That means the build process for my C extension involves calling these tools and then including their outputs (C source files) in the extension sources.
To get this to work when calling python setup.py install, I extended the setuptools.command.build_ext class to call both Flex and Bison and then add the generated source to the Extension source before calling the super class run method.
This means my setup.py looks like:
import os
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext

c_extension = Extension('_mymod',
                        include_dirs=['inc'],
                        sources=[
                            os.path.join('src', 'lib.c'),
                            os.path.join('src', 'etc.c')
                        ])
class MyBuild(build_ext):
    def run(self):
        parser_dir = os.path.join(self.build_temp, 'parser')
        # add the parser directory to include_dirs
        self.include_dirs.append(parser_dir)
        # add the generated source files to the extension sources
        self.extensions[0].sources.extend([
            os.path.join(parser_dir, 'lex.yy.c'),
            os.path.join(parser_dir, 'parse.tab.c')
        ])
        # honor the --dry-run flag
        if not self.dry_run:
            self.mkpath(parser_dir)
            os.system('flex -o ' + os.path.join(parser_dir, 'lex.yy.c')
                      + ' ' + os.path.join('src', 'lex.l'))
            os.system('bison -d -o ' + os.path.join(parser_dir, 'parse.tab.c')
                      + ' ' + os.path.join('src', 'parse.y'))
        # call the super class method
        return build_ext.run(self)
setup(name='MyMod',
      version='0.1',
      description='A module that uses external code generation tools',
      author='Sean Kauffman',
      packages=['MyMod'],
      ext_modules=[c_extension],
      cmdclass={'build_ext': MyBuild},
      python_requires='>=3',
      zip_safe=False)
Now, however, I am trying to package this module for distribution and I have a problem. Either users who want to install my package need Bison and Flex installed, or I need to run these tools when I build the source distribution. I see two possible solutions:
I validate that flex and bison are in the system execution PATH. This keeps the custom builder as-is. I have found no documentation suggesting that I can check for the existence of system executables like bison and flex. The closest is the libraries field of the Extension, but it seems I would need some real hackery to check the entire PATH for executables (though a stock-library check is sketched after this question). I haven't tried this yet because my first choice would be option 2.
I move code generation to when the source distribution is created. This means the source distribution will contain the output files from bison and flex, so people installing the package don't need these tools. This seems like the cleaner option. I have tried extending the sdist command instead of build_ext as above, but it isn't clear how I can add the generated files to the MANIFEST so they are included. Furthermore, I want building with python setup.py install to keep working, but I don't think that command runs sdist before building.
It's fine for any solution to only work on Linux and OS X.
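For what it's worth, the PATH check from option 1 needs only the standard library. A minimal sketch (the error message is illustrative; shutil.which also works on Python 3.3+):

from distutils.spawn import find_executable

# fail early if the code generation tools are not on the PATH
for tool in ('flex', 'bison'):
    if find_executable(tool) is None:
        raise RuntimeError(tool + ' is required to build this package')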
The usual solution for distributing code requiring (f)lex and bison/yacc is to bundle the generated scanner and parser, but be prepared to generate them if they are not present. The second part makes development a little easier and also gives people the option of using their own flex/bison version if they feel they have a good reason to do so. I suppose this advice would also apply to Python modules.
(IANAL but my understanding is that there is a licence exception for the code generated by bison, making it possible to distribute even in non-GPL projects. Flex is not GPL to start with, and afaik there are no distribution restrictions.)
To conditionally build the scanner and parser in a source distribution, you could use the code you have already provided, after verifying that the generated files don't exist. (Ideally, you would check that the generated files don't exist or are newer than the respective source file. That depends on the file dates not being altered on their voyage through an archive. That will work fine on Linux and OS X but it might not be completely portable.)
The assumption is that the package is built before executing the sdist command. sdist should normally exclude object files built in the source tree, so it shouldn't be necessary to manually clean the source. However, if you wanted to ensure that the generated files were present when you execute sdist, you could override it in your setup.py the same way you override build_ext, invoking bison and flex prior to calling the base sdist command.
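A minimal sketch of that combined approach (the generate_parser helper, the in-tree parser directory, and the use of distutils.dep_util.newer for the freshness check are illustrative choices, not the only way to do it):

import os
from distutils.dep_util import newer
from setuptools.command.sdist import sdist

def generate_parser(out_dir, dry_run=False):
    # run flex/bison only when an output is missing or older than its input
    lex_c = os.path.join(out_dir, 'lex.yy.c')
    tab_c = os.path.join(out_dir, 'parse.tab.c')
    if not dry_run:
        if newer(os.path.join('src', 'lex.l'), lex_c):
            os.system('flex -o ' + lex_c + ' ' + os.path.join('src', 'lex.l'))
        if newer(os.path.join('src', 'parse.y'), tab_c):
            os.system('bison -d -o ' + tab_c + ' ' + os.path.join('src', 'parse.y'))
    return [lex_c, tab_c]

class MySdist(sdist):
    def run(self):
        # generate into the source tree so the outputs land in the tarball;
        # a matching "include parser/*" line in MANIFEST.in is assumed
        self.mkpath('parser')
        generate_parser('parser', self.dry_run)
        sdist.run(self)

If MyBuild points its include_dirs and the extension sources at the same in-tree parser directory and calls the same helper, a source distribution that ships the generated files installs without flex or bison, because newer() treats the shipped outputs as up to date (provided the archive preserves timestamps, as noted above).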
I am building a Python extension in C++ using pybind11 and scikit-build, based on the example provided at https://github.com/pybind/scikit_build_example/blob/master/setup.py.
My CMakeLists.txt boils down to this:
pybind11_add_module(_mylib MODULE ${SOURCE_FILES})
install(TARGETS _mylib DESTINATION .)
setup.py:
setup(
    name="mylib",
    version="0.0",
    packages=['mylib'],
    cmake_install_dir="mylib",
)
And on the python side I have in mylib/__init__.py:
from _mylib import *
This all works great. I can install the package with pip, and importing mylib successfully imports the library by proxy. This proxy is necessary because the package does more than just wrap the C++ library.
Except there is one problem. The name of the built library includes the toolchain. On my system it looks like _mylib.cpython-38-x86_64-linux-gnu.so, where I expected _mylib.so. __init__.py cannot find the library unless I either rename it manually on the Python side or change the .so name.
How can I resolve this issue?
Conclusion: as Alex said, this part of the name is necessary. See https://www.python.org/dev/peps/pep-3149/. Python will automatically figure out that it can use _mylib.cpython-38-x86_64-linux-gnu.so when you import _mylib.
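For reference, the set of extension suffixes the running interpreter will accept can be inspected directly (the printed list is from one Linux box and will differ elsewhere):

import importlib.machinery
print(importlib.machinery.EXTENSION_SUFFIXES)
# e.g. ['.cpython-38-x86_64-linux-gnu.so', '.abi3.so', '.so']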
Context:
This is kind of a followup to another question of mine.
I would like to provide localized versions of a package. Following the Python documentation, I have extracted a .pot file with pygettext, prepared a translation in a .po file, and compiled it into a .mo file.
Everything is fine up to there, and my package displays the translated messages.
But my final goal would be to make it available on PyPI. So I have done some research and found:
setuptools documentation: not even one single word about localization...
The Format of GNU MO Files
It explains that the format depends on the endianness of the platform where the file was generated. My understanding is that only the po files are portable...
What is the correct way to include localisation in python packages?
The answer is fully relevant and speaks of the setuptools/babel integration but:
that integration only covers building .mo files and says nothing about distributing them
the author describes how they use it, with no reference to portability across systems
Babel: compile translation files when calling setup.py install
An interesting way, even if it requires the babel module on the target platform. Not that heavy, but still much heavier than my own package... In fact, the distributions contain only .po files, which are compiled with babel at install time.
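For reference, the Babel route boils down to its compile_catalog setup command (the domain and directory here are illustrative):

python setup.py compile_catalog --domain mymod --directory mymod/locale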
Question:
Is there a way to build platform specific wheels containing compiled mo files?
If not I will have to require babel on target and try to find my way through mo compilation at install time.
EDIT 7/12/2018:
After some work, I could build a specific package based on what is below in this answer. It can be used from other projects to automatically compile .po files at build time through the magic of setuptools entry_points. It is now available on GitHub (https://github.com/s-ball/mo_installer) and distributed on PyPI (https://pypi.org/project/mo_installer).
The research I did before asking the question gave me enough hints to reach a possible solution.
I can now say that it is possible to include a platform-specific .mo file in a wheel - unfortunately, in my current solution the wheel gives no indication that it is platform specific. But the same solution allows building a source distribution that builds the .mo file on the target platform.
Now for the details:
the tools needed to compile a mo file on the target:
Most solutions found on Google or SO rely either on Babel or on the GNU gettext msgfmt program. But the CPython Tools include a pure Python module, msgfmt.py, that is enough here. Unfortunately, this tool is often not installed by default on many Linux/Unix-like systems. My solution just includes a copy of that module (a mere 7k file) from the 3.7.1 version. It looks like very stable code (few changes in recent years) and it should work for any Python >= 3.3.
the setuptools integration
The magic of setuptools is that the same build subcommand is used internally to build a binary wheel, to install with pip from a source package, or to install directly with python setup.py install from a copy (git clone) of the full source tree. So I provide a build subclass in setup.py that generates the .mo files at their full paths before calling the superclass method. I also use a MANIFEST.in file to list the files that should be copied into a source distribution, and a package_data setup argument to list what should go into a binary package or installation folder.
run time usage
Provided the .mo hierarchy is installed under a known package, os.path.dirname(__file__) called from a module of that package gives its parent folder.
Code (assuming the msgfmt.py file is copied under a tools_i18n folder and that po files are under a src folder):
in setup.py
import os
import re
import sys
# ... (NAME and the other setup metadata are defined here)

sys.path.append(os.path.join(os.path.dirname(__file__), "tools_i18n"))
import msgfmt
from distutils.command.build import build as _build

class Builder(_build):
    def run(self):
        # po files in the src folder are named domain_lang.po
        po = re.compile(r"(.*)_(.*)\.po")
        for file in os.listdir("src"):
            m = po.match(file)
            if m:
                # create the LANG/LC_MESSAGES subdir of "locale"
                path = os.path.join(self.build_lib, NAME, "locale",
                                    m.group(2), "LC_MESSAGES")
                os.makedirs(path, exist_ok=True)
                # use msgfmt.py to compile the po file
                msgfmt.make(os.path.join("src", file),
                            os.path.join(path, m.group(1) + ".mo"))
        _build.run(self)
setup(
    name=NAME,
    ...
    package_data={"": [..., "locale/*/*/*.mo"]},  # ensure .mo files are copied
    cmdclass={"build": Builder},
)
In MANIFEST.in:
...
include src/*
include tools_i18n/*
To use the translations at run time:
import gettext
import locale
import os

locpath = os.path.dirname(__file__)
lang = locale.getdefaultlocale()[0]  # platform default language, or whatever...
tr = gettext.translation("argparse", os.path.join(locpath, "locale"),
                         [lang], fallback=True)
A full project using this method is available at https://github.com/s-ball/i18nparse
Last but not least, after a more in-depth reading of the GNU gettext doc, I can say that gettext can process .mo files whatever their endianness:
MO files of any endianness can be used on any platform. When a MO file has an endianness other than the platform’s one, the 32-bit numbers from the MO file are swapped at runtime. The performance impact is negligible.
What is the best way to pack up an IronPython application for deployment? After scouring the web the best thing I've come up with (and what I'm currently doing) is using clr.CompileModules() to glue together my entire project's .py files into one .dll, and then having a single run.py do this to run the dll:
import clr
clr.AddReference('compiledapp.dll')
import app
This is still suboptimal, though, because it means I have to
distribute 3 files (the .dll, the .xaml, and the run.py launcher)
install IronPython on the host machine
Plus, this feels so... hacky, after the wonderful integration IronPython already has with Visual Studio 2010. I'm completely mystified as to why there is no integrated build system for IPy apps, seeing as it all boils down to IL anyway.
Ideally, I want to be able to have a single .exe with the .xaml merged inside somehow (I read that C# apps compile XAML to BAML and merge them in the executable), and without requiring IronPython to be installed to run. Is this at least halfway possible? (I suppose it's ok if the exe needs some extra .DLLs with it or something. The important part is that it's in .exe form.)
Some edits to clarify: I have tried pyc.py, but it seems to not acknowledge the fact that my project is not just app.py. The size of the exe it produces suggests that it is just 'compiling' app.py without including any of the other files into the exe. So, how do I tell it to compile every file in my project?
To help visualize this, here is a screenshot of my project's solution explorer window.
Edit II: It seems that unfortunately the only way is to use pyc.py and pass every single file to it as a parameter. There are two questions I have for this approach:
How do I possibly process a command line that big? There's a maximum of 256 characters in a command.
How does pyc.py know to preserve the package/folder structure? As shown in my project screenshot above, how will my compiled program know to access modules that are in subfolders, such as accessing DT\Device? Is the hierarchy somehow 'preserved' in the dll?
Edit III: Since passing 70 filenames to pyc.py through the command line will be unwieldy, and in the interest of solving the problem of building IPy projects more elegantly, I've decided to augment pyc.py.
I've added code that reads in a .pyproj file through the /pyproj: parameter, parses the XML, and grabs the list of py files used in the project from there. This has been working pretty well; however, the executable produced seems to be unable to access the python subpackages (subfolders) that are part of my project. My version of pyc.py with my .pyproj reading support patch can be found here: http://pastebin.com/FgXbZY29
When this new pyc.py is run on my project, this is the output:
c:\Projects\GenScheme\GenScheme>"c:\Program Files (x86)\IronPython 2.7\ipy.exe"
pyc.py /pyproj:GenScheme.pyproj /out:App /main:app.py /target:exe
Input Files:
c:\Projects\GenScheme\GenScheme\__init__.py
c:\Projects\GenScheme\GenScheme\Agent.py
c:\Projects\GenScheme\GenScheme\AIDisplay.py
c:\Projects\GenScheme\GenScheme\app.py
c:\Projects\GenScheme\GenScheme\BaseDevice.py
c:\Projects\GenScheme\GenScheme\BaseManager.py
c:\Projects\GenScheme\GenScheme\BaseSubSystem.py
c:\Projects\GenScheme\GenScheme\ControlSchemes.py
c:\Projects\GenScheme\GenScheme\Cu64\__init__.py
c:\Projects\GenScheme\GenScheme\Cu64\agent.py
c:\Projects\GenScheme\GenScheme\Cu64\aidisplays.py
c:\Projects\GenScheme\GenScheme\Cu64\devmapper.py
c:\Projects\GenScheme\GenScheme\Cu64\timedprocess.py
c:\Projects\GenScheme\GenScheme\Cu64\ui.py
c:\Projects\GenScheme\GenScheme\decorators.py
c:\Projects\GenScheme\GenScheme\DeviceMapper.py
c:\Projects\GenScheme\GenScheme\DT\__init__.py
c:\Projects\GenScheme\GenScheme\DT\Device.py
c:\Projects\GenScheme\GenScheme\DT\Manager.py
c:\Projects\GenScheme\GenScheme\DT\SubSystem.py
c:\Projects\GenScheme\GenScheme\excepts.py
c:\Projects\GenScheme\GenScheme\FindName.py
c:\Projects\GenScheme\GenScheme\GenScheme.py
c:\Projects\GenScheme\GenScheme\PMX\__init__.py
c:\Projects\GenScheme\GenScheme\PMX\Device.py
c:\Projects\GenScheme\GenScheme\PMX\Manager.py
c:\Projects\GenScheme\GenScheme\PMX\SubSystem.py
c:\Projects\GenScheme\GenScheme\pyevent.py
c:\Projects\GenScheme\GenScheme\Scheme.py
c:\Projects\GenScheme\GenScheme\Simulated\__init__.py
c:\Projects\GenScheme\GenScheme\Simulated\Device.py
c:\Projects\GenScheme\GenScheme\Simulated\SubSystem.py
c:\Projects\GenScheme\GenScheme\speech.py
c:\Projects\GenScheme\GenScheme\stdoutWriter.py
c:\Projects\GenScheme\GenScheme\Step.py
c:\Projects\GenScheme\GenScheme\TimedProcess.py
c:\Projects\GenScheme\GenScheme\UI.py
c:\Projects\GenScheme\GenScheme\VirtualSubSystem.py
c:\Projects\GenScheme\GenScheme\Waddle.py
Output:
App
Target:
ConsoleApplication
Platform:
ILOnly
Machine:
I386
Compiling...
Saved to App
So it correctly read in the list of files in the .pyproj... Great! But running the exe gives me this:
Unhandled Exception: IronPython.Runtime.Exceptions.ImportException:
No module named Cu64.ui
So even though Cu64\ui.py is obviously included in the compilation, the exe, when run, can't find it. This is what I was afraid of in point #2 of the previous edit. How do I preserve the package hierarchy of my project? Perhaps compiling each package separately may be needed?
I'll extend the bounty for this question. Ultimately my hope is that we can get a working pyc.py that reads in pyproj files and produces working exes in one step. Then maybe it could even be submitted to IronPython's codeplex to be included in the next release... ;]
Use pyc.py to produce app.exe and don't forget to include app.dll and the IronPython libraries.
As for XAML - I've created a project just for the .xaml files, which I compile in VS and then use from IronPython. For example:
<ResourceDictionary.MergedDictionaries>
    <ResourceDictionary Source="/CompiledStyle;component/Style.xaml" />
</ResourceDictionary.MergedDictionaries>
It "boils down to IL", but it isn't compatible with the IL that C# code produces, so it can't be directly compiled to a standalone .exe file.
You'll need to use pyc.py to compile your code to a stub EXE with the DLL that CompileModules creates.
Then distribute those files with IronPython.dll, IronPython.Modules.dll, Microsoft.Dynamic.dll, Microsoft.Scripting.Debugging.dll, Microsoft.Scripting.dll, and of course the XAML file.
To compile other files, just add them as arguments:
ipy.exe pyc.py /main:app.py /target:winexe another.py another2.py additional.py
I posted a Python script which can take an IronPython file, figure out its dependencies and compile the lot into a standalone binary at Ironpython 2.6 .py -> .exe. Hope you find it useful. It ought to work for WPF too as it bundles WPF support.
To create a set of assemblies for your IronPython application so that you can distribute it you can either use pyc.py or SharpDevelop.
To compile using pyc.py:
ipy.exe pyc.py /main:Program.py Form.py File1.py File2.py ... /target:winexe
Given the number of files in your project, you could try using SharpDevelop instead of maintaining a long command line for pyc.py. You will need to create a new IronPython project in SharpDevelop and import your files into the project. You will probably need to import the files one at a time, since SharpDevelop lacks a way to import multiple files unless they are in a subfolder.
You can then use SharpDevelop to compile your application into an executable and a dll. All the other required files, such as IronPython.dll, Microsoft.Scripting.dll, will be in the bin/debug or bin/release folder. SharpDevelop uses clr.CompileModules and a custom MSBuild task behind the scenes to generate the binaries.
Any IronPython packages defined in your project should be usable from your application after compilation.
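For reference, the underlying call has roughly this shape - a sketch with an abridged file list (subpackage modules are passed with their relative paths):

import clr
clr.CompileModules('GenScheme.dll',
                   'app.py',
                   'Cu64/__init__.py',
                   'Cu64/ui.py')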
Packaging up the XAML can be done by embedding the xaml as a resource. Then using code similar to the following:
import clr
clr.AddReference('PresentationFramework')
clr.AddReference('System')
from System.IO import FileMode, FileStream, Path
from System.Reflection import Assembly
from System.Windows import Application
from System.Windows.Markup import XamlReader
executingAssemblyFileName = Assembly.GetEntryAssembly().Location
directory = Path.GetDirectoryName(executingAssemblyFileName)
xamlFileName = Path.Combine(directory, "Window1.xaml")
stream = FileStream(xamlFileName, FileMode.Open)
window = XamlReader.Load(stream)
app = Application()
app.Run(window)
SharpDevelop 3.2 does not embed resource files correctly so you will need to use SharpDevelop 4.
If you are using IronPython 2.7 you can use the new clr.LoadComponent method that takes an object and either a XAML filename or stream and wires up that object to the XAML.
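Following the signature just described, a call would look something like this (the window instance and file name are illustrative):

clr.LoadComponent(window, 'Window1.xaml')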
Whilst the C# compiler can compile your XAML into a BAML resource, doing the same with IronPython has a few problems. If you do not link the XAML to a class via the x:Class attribute, then it is possible to compile the XAML into a BAML resource and have that embedded into your assembly. However, you will not get any autogenerated code, so you will need to create that code yourself. Another problem is that this will not work out of the box with SharpDevelop: you will need to edit the SharpDevelop.Build.Python.targets file and change the language it specifies from Python to C#. Trying to use the x:Class attribute will not work, since the BAML reader cannot access any associated IronPython class. This is because the generated IL in a compiled IronPython application is very different to that in a C# or VB.NET assembly.
I installed Visual Studio 2015 with PTVS (ironpython 2.7). I created a very simple WPF project and wasn't able to compile an exe. I always got the exception "ImportError: No module named wpf".
import clr
clr.AddReferenceToFileAndPath("c:\\path\\to\\IronPython.Wpf.dll")
clr.AddReferenceToFileAndPath('c:\\path\\to\\PresentationCore.dll')
clr.AddReferenceToFileAndPath('c:\\path\\to\\PresentationFramework.dll')
clr.AddReferenceToFileAndPath('c:\\path\\to\\WindowsBase.dll')
from System.Windows import Application, Window
import wpf

class MyWindow(Window):
    def __init__(self):
        wpf.LoadComponent(self, 'RegExTester.xaml')

    def OnSearch(self, sender, e):
        self.tbOut.Text = "hello world"

if __name__ == '__main__':
    Application().Run(MyWindow())
The fault I got was because the clr references must come before import wpf. Steps to compile it:
install pip for CPython 2.7 (not IronPython!)
install ipy2asm (it ships with the ironpycompiler package):
python -m pip install ironpycompiler
compile the application like:
ipy2asm compile -t winexe -e -s program.py
I'm looking for a way to include an optional feature in a Python (extension) module at installation time.
In practical terms:
I have a Python library that has two implementations of the same function, one internal (slow) and one that depends on an external library (fast, in C).
I want this fast implementation to be optional, activated at compile/install time using a flag like:
python setup.py install # (it doesn't include the fast library)
python setup.py --enable-fast install
I have to use distutils; however, all solutions are welcome!
The docs for distutils include a section on extending the standard functionality. The relevant suggestion seems to be to subclass the relevant classes from the distutils.command.* modules (such as build_py or install) and tell setup to use your new versions (through the cmdclass argument, which is a dictionary mapping commands to classes which are to be used to execute them). See the source of any of the command classes (e.g. the install command) to get a good idea of what one has to do to add a new option.
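A minimal sketch of that suggestion, with the flag implemented as an option of the install command (so it is spelled python setup.py install --enable-fast; the extension name and source file are placeholders):

from distutils.core import setup, Extension
from distutils.command.install import install as _install

# optional fast C implementation (placeholder name and source)
fast_ext = Extension('mymod._fast', sources=['src/fast.c'])

class install(_install):
    # add --enable-fast to the options understood by "setup.py install"
    user_options = _install.user_options + [
        ('enable-fast', None, 'build the optional fast C implementation'),
    ]
    boolean_options = _install.boolean_options + ['enable-fast']

    def initialize_options(self):
        _install.initialize_options(self)
        self.enable_fast = 0

    def run(self):
        if self.enable_fast:
            # register the extension before the build sub-command runs
            self.distribution.ext_modules.append(fast_ext)
        _install.run(self)

setup(name='mymod',
      version='0.1',
      packages=['mymod'],
      ext_modules=[],  # the fast extension is appended on demand
      cmdclass={'install': install})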
An example of exactly what you want is the sqlalchemy's cextensions, which are there specifically for the same purpose - faster C implementation. In order to see how SA implemented it you need to look at 2 files:
1) setup.py. As you can see from the extract below, they handle the cases with setuptools and distutils:
try:
    from setuptools import setup, Extension, Feature
except ImportError:
    from distutils.core import setup, Extension
    Feature = None
Later there is a check if Feature: and the extension is configured appropriately for each case using the variable extra, which is later passed to the setup() function.
2) base.py: here look at how BaseRowProxy is defined:
try:
    from sqlalchemy.cresultproxy import BaseRowProxy
except ImportError:
    class BaseRowProxy(object):
        #....
So basically once C extensions are installed (using --with-cextensions flag during setup), the C implementation will be used. Otherwise, pure Python implementation of the class/function is used.
With my Java projects at present, I have full version control by declaring them as Maven projects. However, I now have a Python project that I'm about to tag 0.2.0 which has no version control. Therefore, should I come across this code at a later date, I won't know what version it is.
How do I add version control to a Python project, in the same way Maven does it for Java?
First, maven is a build tool and has nothing to do with version control. You don't need a build tool with Python -- there's nothing to "build".
Some folks like to create .egg files for distribution. It's as close to a "build" as you get with Python. This is a simple setup.py file.
You can use SVN keyword replacement in your source like this. Remember to enable keyword replacement for the modules that will have this.
__version__ = "$Revision$"
That will assure that the version or revision strings are forced into your source by SVN.
You should also include version keywords in your setup.py file.
Create a distutils setup.py file. This is the Python equivalent of the Maven pom.xml; it looks something like this:
from distutils.core import setup

setup(name='foo',
      version='1.0',
      py_modules=['foo'],
      )
If you want dependency management like maven, take a look at setuptools.
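A minimal sketch of the setuptools variant (the requests requirement is just an example dependency):

from setuptools import setup

setup(name='foo',
      version='1.0',
      py_modules=['foo'],
      install_requires=['requests>=2.0'])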
Ants's answer is correct, but I would like to add that your modules can define a __version__ variable, in accordance with PEP 8, which can be populated manually or via Subversion or CVS, e.g. if you have a module thingy, with a file thingy/__init__.py:
__version__ = '0.2.0'
You can then import this version in setup.py:
from distutils.core import setup
import thingy

setup(name='thingy',
      version=thingy.__version__,
      py_modules=['thingy'],
      )