I've been fascinated by the __future__ module - in particular, its ability to change the way statements are parsed in Python.
What's most interesting is how something like
from __future__ import print_function
enables you to use print as a function (rather than simply binding the name print_function, as you would expect any normal import to do).
I have read "What is __future__ in Python used for and how/when to use it, and how it works" thoroughly, and in particular came across this line:
A future statement is a directive to the compiler that a particular
module should be compiled using syntax or semantics that will be
available in a specified future release of Python.
I would love to know the intricacies of what exactly makes this possible. In particular, how something like
from __future__ import division
can enable true division on Python 2, while
from __future__ import barry_as_FLUFL
can enable the <> syntax on Python 3 (what I find funniest is that you have to import a feature from "__future__" for backward compatibility).
Anyway, to summarise, I would like to know how the directive is understood and executed by the compiler when __future__ or its artefacts are imported.
from __future__ import print_function tells the parser not to treat print as a keyword (leaving it as a plain name instead). That way the compiler treats print(...) as a function call rather than a print statement.
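For example, once the directive is in effect, print behaves like any other name bound to a callable; this snippet runs identically on Python 2 (with the future import) and Python 3:

from __future__ import print_function

# print is now an ordinary name bound to a builtin function, so it
# accepts keyword arguments and can be re-bound like any object:
print('a', 'b', sep='-', end='!\n')
emit = print
emit('print is just a function here')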
To track this, the compiler struct has a c_future field that holds a PyFutureFeatures object that tracks which future directives have been enabled. Various parts of the parser and compiler check the flags and alter behaviour.
This is mostly handled in the future.c source file, whose future_parse() function checks for ImportFrom AST nodes with the module attribute set to __future__, and sets flags based on what it finds.
For example, for the barry_as_FLUFL 'feature', the parser refuses != as syntax but accepts <> instead:
if (type == NOTEQUAL) {
    /* strcmp() returns nonzero when the strings differ, so this
       branch fires when the token is "<>" but the future flag is
       not active: plain syntax error. */
    if (!(ps->p_flags & CO_FUTURE_BARRY_AS_BDFL) &&
        strcmp(str, "!=")) {
        PyObject_FREE(str);
        err_ret->error = E_SYNTAX;
        break;
    }
    /* ...and this branch fires when the token is "!=" while the
       flag IS active: reject it with the Barry-specific message. */
    else if ((ps->p_flags & CO_FUTURE_BARRY_AS_BDFL) &&
             strcmp(str, "<>")) {
        PyObject_FREE(str);
        err_ret->text = "with Barry as BDFL, use '<>' "
                        "instead of '!='";
        err_ret->error = E_SYNTAX;
        break;
    }
}
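You can watch both branches fire from a Python 3 interactive session (the exact error formatting may vary between versions):

>>> from __future__ import barry_as_FLUFL
>>> 1 <> 2
True
>>> 1 != 2
SyntaxError: with Barry as BDFL, use '<>' instead of '!='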
You can find the other examples by grepping for the FUTURE_* flags listed in compile.h.
Note that there is a __future__ Python module, but it is not directly involved in the parsing and compilation of code; it is merely there to give Python code easy access to metadata about directives (including bitfield values to pass to the flags argument of the compile() function), nothing more.
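As a small demonstration of using that metadata: each feature object records the release it first appeared in and the flag bit to pass to compile() (getOptionalRelease() and compiler_flag are part of the documented __future__ API):

import __future__

# The release where the feature became available, and its flag bit:
print(__future__.division.getOptionalRelease())
print(__future__.division.compiler_flag)

# Compile source with true division enabled, without putting a
# future statement in the source itself (flags passed positionally
# for Python 2 compatibility):
code = compile('result = 1 / 2', '<string>', 'exec',
               __future__.division.compiler_flag)
ns = {}
exec(code, ns)
print(ns['result'])  # 0.5, even on Python 2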
I was recently tasked with maintaining a bunch of code that uses from module import * fairly heavily.
This codebase has gotten big enough that import conflicts/naming ambiguity/"where the heck did this function come from, there are like eight imported modules that have one with the same name?!"ism have become more and more common.
Moving forward, I've been using explicit members (i.e. import module ... module.object.function()) to make the maintenance work I do more readable.
But I was wondering: is there an IDE or utility which robustly parses Python code and refactors * import statements into module import statements, and then prepends the full module path onto all references to members of that module?
We're not using metaprogramming/reflection/inspect/monkeypatching heavily, so if the aforementioned IDE/utility behaves poorly with such things, that is OK.
Not a perfect solution, but what I usually do is this:
Open PyDev
Remove all * imports
Use the Optimize Imports command (Ctrl+Shift+O) to re-add all the imports
Roughly solves the problem :)
If you want to build a solution yourself, try http://docs.python.org/library/modulefinder.html
Here are the other related tools mentioned:
working with the AST directly, which is very low-level for your use case;
working with modulefinder, which may have a lot of the boilerplate code you are looking for;
rope, a refactoring library (#Lucas Graf);
Bicycle Repair Man, a refactoring library;
the logilab-astng library used in pylint.
More about pylint
pylint is a very good tool built on top of ast that is already able to tell you where in your code there are from somemodule import * statements, as well as telling you which imports are not necessary.
example:
# next is what's on line 32
from re import *
this will complain:
W: 32,0: Wildcard import re
W: 32,0: Unused import finditer from wildcard import
W: 32,0: Unused import LOCALE from wildcard import
... # this is a long list ...
Towards a solution?
Note that in the above output pylint gives you the line numbers. It might take some effort, but a refactoring tool could look at those particular warnings, get the line number, import the module, and look at its __all__ list, or use a sandboxed execfile() to discover the module's global names (would modulefinder help with that? maybe...). With the list of names from __all__ and the names pylint complains about, you can build two set()s, take the difference, and replace the line featuring the wildcard import with specific imports.
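As a rough sketch of that set-difference idea (explicit_import_line is a hypothetical helper, and it simply imports the module with importlib rather than sandboxing it):

import importlib

def explicit_import_line(module_name, used_names):
    # Names a wildcard import would bind: __all__ if the module
    # defines it, otherwise every public (non-underscore) attribute.
    module = importlib.import_module(module_name)
    exported = set(getattr(module, '__all__', None) or
                   (n for n in dir(module) if not n.startswith('_')))
    # Keep only the exported names the script actually uses.
    needed = sorted(exported & set(used_names))
    return 'from %s import %s' % (module_name, ', '.join(needed))

print(explicit_import_line('re', {'match', 'finditer', 'os', 'sys'}))
# from re import finditer, match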
I wrote some refactoring tools to do just that. Star Namer will go through all of your wildcard * imports for a script and replace them with the actual functions to be imported.
Usage: ./star_namer.py module_filename script_filename
Once you've converted all of your star imports to actual names you can use from_to_import.py to fix them. This is how it works:
Running your script through pylint and counting up all of the currently undefined words.
Removing all of the "from modname import *" lines from the script.
Running the script through pylint again and comparing the difference in undefined words.
Going through the JSON output of pylint (in reverse order), it determines the exact position of each replacement and inserts the "modname." prefix in the correct place.
I thought this approach would be a little more robust: offloading the syntax processing to an advanced utility designed for it, instead of trying to grep through all the text myself with regular expressions.
Usage: from_to_import.py script_name modname
It will show you what changes are to be made before making them. Press y to save. The main issues I've found so far are alignment problems caused by inserting the "modname." text, which misaligns comments, and that it doesn't deal well with aliased function names (from ... import quickrun as qrun).
Full documentation here: https://github.com/SurpriseDog/Star-Wrangler
I am writing a module to let me write code in Python 3 but still run it in Python 2. It looks surprisingly easy actually... anything else I should add? From my (limited) flailing on the interactive interpreter, the future imports do not affect Python 3 and are viewed as redundant.
# _2or3.py
'''
Common usage:

from __future__ import print_function, nested_scopes, division, absolute_import, unicode_literals
from _2or3 import *
'''
import sys

if sys.version[0] == '2':
    range = xrange
    input = raw_input
Obviously there are some things you cannot do that you would normally be able to do in 3 (like dictionary comprehensions), and there are a few gotchas between the languages (like byte strings; it looks like you should NEVER use bytes).
Any comments would be appreciated.
Check out six, which already does this, and loads more. It also has helpers for handling binary and Unicode data in both versions. Not all the techniques you need can be done this way, though, especially if you need to support Python 2.5 or earlier. I tried to cover most of them in the book, but I'm sure I've missed some.
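For a flavour of what six provides (six is a third-party package, pip install six; these helpers are part of its documented API):

import six
from six.moves import range, input  # xrange/raw_input on Py2, builtins on Py3

text = six.text_type('works')  # unicode on Py2, str on Py3
for i in range(3):             # lazy range on both versions
    print(i)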
I'm working on pypreprocessor which is a preprocessor that takes c-style directives and I've been able to make it work like a traditional preprocessor (it's self-consuming and executes postprocessed code on-the-fly) except that it breaks library imports.
The problem is: the preprocessor runs through the file, processes it, writes the output to a temporary file, and execs the temporary file. Libraries that are imported need to be handled a little differently, because they aren't executed directly; rather, they are loaded and made accessible to the caller module.
What I need to be able to do is: Interrupt the import (since the preprocessor is being run in the middle of the import), load the postprocessed code as a tempModule, and replace the original import with the tempModule to trick the calling script with the import into believing that the tempModule is the original module.
I have searched everywhere and so far have no solution.
This Stack Overflow question is the closest I've seen so far to providing an answer:
Override namespace in Python
Here's what I have.
import os
import sys

# Remove the bytecode file created by the first import
os.remove(moduleName + '.pyc')

# Remove the first import
del sys.modules[moduleName]

# Import the postprocessed module
tmpModule = __import__(tmpModuleName)

# Set the first module's reference to point to the postprocessed module
sys.modules[moduleName] = tmpModule
moduleName is the name of the original module, and tmpModuleName is the name of the postprocessed code file.
The strange part is that this solution still runs completely normally, as if the first module had loaded normally; unless you remove the last line, in which case you get a module-not-found error.
Hopefully someone on Stack Overflow knows a lot more about imports than I do, because this one has me stumped.
Note: I will only award a solution, or, if this is not possible in Python, the best, most detailed explanation of why it is not possible.
Update: For anybody who is interested, here is the working code.
import imp
import sys

if imp.lock_held():
    del sys.modules[moduleName]
    sys.modules[tmpModuleName] = __import__(tmpModuleName)
    sys.modules[moduleName] = __import__(tmpModuleName)
The 'imp.lock_held' part detects whether the module is being loaded as a library. The following lines do the rest.
Does this answer your question? The second import does the trick.
Mod_1.py
def test_function():
    print "Test Function -- Mod 1"
Mod_2.py
def test_function():
    print "Test Function -- Mod 2"
Test.py
#!/usr/bin/python
import sys

import Mod_1
Mod_1.test_function()

del sys.modules['Mod_1']
sys.modules['Mod_1'] = __import__('Mod_2')

import Mod_1
Mod_1.test_function()
To define a different import behavior or to totally subvert the import process you will need to write import hooks. See PEP 302.
For example,
import sys

class MyImporter(object):

    def find_module(self, module_name, package_path):
        # Return a loader
        return self

    def load_module(self, module_name):
        # Return a module
        return self

sys.meta_path.append(MyImporter())

import now_you_can_import_any_name
print now_you_can_import_any_name
It outputs:
<__main__.MyImporter object at 0x009F85F0>
So basically it returns a new module (which can be any object), in this case itself. You may use it to alter the import behavior by returning processed_xxx on import of xxx.
IMO: Python doesn't need a preprocessor. Whatever you are accomplishing can be accomplished in Python itself due to its very dynamic nature. For example, taking the case of the debug example, what is wrong with having at the top of the file
debug = 1
and later
if debug:
    print "wow"
?
In Python 2 there is the imputil module, which seems to provide the functionality you are looking for, but it was removed in Python 3. It's not very well documented, but it contains an example section that shows how you can replace the standard import functions.
For Python 3 there is the importlib module (introduced in Python 3.1) that contains functions and classes to modify the import functionality in all kinds of ways. It should be suitable to hook your preprocessor into the import system.
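A rough sketch of what that could look like on Python 3 (the finder class, the module name, and the preprocess() function are hypothetical stand-ins, not part of pypreprocessor):

import importlib.abc
import importlib.util
import sys

def preprocess(source):
    # Stand-in for the real preprocessor pass over the source text.
    return source.replace('#define DEBUG', 'DEBUG = True')

class PreprocessingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if fullname != 'my_preprocessed_module':  # hypothetical name
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        with open('my_preprocessed_module.py.in') as f:  # hypothetical file
            source = f.read()
        code = compile(preprocess(source), module.__name__, 'exec')
        exec(code, module.__dict__)

sys.meta_path.insert(0, PreprocessingFinder())
# import my_preprocessed_module  # would now run the preprocessed code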
How do you gracefully handle failed future feature imports? If a user is running Python 2.5 and the first statement in my module is:
from __future__ import print_function
Compiling this module for Python 2.5 will fail with a:
File "__init__.py", line 1
from __future__ import print_function
SyntaxError: future feature print_function is not defined
I'd like to inform the user that they need to rerun the program with Python >= 2.6 and maybe provide some instructions on how to do so. However, to quote PEP 236:
The only lines that can appear before a future_statement are:
The module docstring (if any).
Comments.
Blank lines.
Other future_statements.
So I can't do something like:
import __future__

if hasattr(__future__, 'print_function'):
    from __future__ import print_function
else:
    raise ImportError('Python >= 2.6 is required')
Because it yields:
File "__init__.py", line 4
from __future__ import print_function
SyntaxError: from __future__ imports must occur at the beginning of the file
This snippet from the PEP seems to give hope of doing it inline:
Q: I want to wrap future_statements in try/except blocks, so I can use different code depending on which version of Python I'm running. Why can't I?

A: Sorry! try/except is a runtime feature; future_statements are primarily compile-time gimmicks, and your try/except happens long after the compiler is done. That is, by the time you do try/except, the semantics in effect for the module are already a done deal. Since the try/except wouldn't accomplish what it looks like it should accomplish, it's simply not allowed. We also want to keep these special statements very easy to find and to recognize.

Note that you can import __future__ directly, and use the information in it, along with sys.version_info, to figure out where the release you're running under stands in relation to a given feature's status.
Ideas?
"I'd like to inform the user that they need to rerun the program with Python >= 2.6 and maybe provide some instructions on how to do so."
Isn't that what a README file is for?
Here's your alternative. A "wrapper": a little blob of Python that checks the environment before running your target app.
File: appwrapper.py
import sys

major, minor, micro, releaselevel, serial = sys.version_info
if (major, minor) <= (2, 5):
    # provide advice on getting version 2.6 or higher.
    sys.exit(2)

import app
app.main()
What "direct import" means. You can examine the contents of __future__. You're still bound by the fact the a from __future__ import print_function is information to the compiler, but you can poke around before importing the module that does the real work.
import __future__
import sys

if hasattr(__future__, 'print_function'):
    # Could also check sys.version_info >= __future__.print_function.optional
    import app
    app.main()
else:
    print "instructions for upgrading"
A rather hacky but simple method I've used before is to exploit the fact that byte literals were introduced in Python 2.6 and use something like this near the start of the file:
b'This module needs Python 2.6 or later. Please do xxx.'
This is harmless in Python 2.6 or later, but a SyntaxError in any earlier versions. Anyone trying to compile your file will still get an error, but they also get whatever message you want to give.
You might think that, since this line has to come after your from __future__ import print_function, it would be the import that generates the SyntaxError and you wouldn't get to see the useful error message; but strangely enough, the later error takes precedence. I suspect that because the error from the import isn't really a syntax error in itself, it isn't raised on the first compilation pass, so real syntax errors get raised first (but I'm guessing).
This might not meet your criteria for being 'graceful', and it is very Python 2.6 specific, but it is quick and easy to do.
Just put a comment on the same line as the "from __future__ import ...", like this:
from __future__ import print_function, division # We require Python 2.6 or later
Since Python displays the line containing the error, if you try to run the module with Python 2.5 you'll get a nice, descriptive error:
from __future__ import print_function, division # We require Python 2.6 or later
SyntaxError: future feature print_function is not defined