Python API Compatibility Checker

In my current work environment, we produce a large number of Python packages for internal use (10s if not 100s). Each package has some dependencies, usually on a mixture of internal and external packages, and some of these dependencies are shared.
As we approach dependency hell, updating dependencies becomes a time-consuming process. While we care about the functional changes a new version might introduce, of equal (if not greater) importance are the API changes that break the code.
Although running unit/integration tests against newer versions of a dependency helps us catch some issues, our coverage is not close enough to 100% to make this a robust strategy. Release notes and a changelog help identify major changes at a high level, but these rarely exist for internally developed tools, or they don't go into enough detail to understand the implications the new version has on the (public) API.
I am looking at other ways to automate this process.
I would like to be able to automatically compare two versions of a Python package and report the API differences between them. In particular, this would include backwards-incompatible changes such as removing functions/methods/classes/modules, adding positional arguments to a function/method/class, and changing the number of items a function/method returns. Based on the report this generates, I as a developer should have a better understanding of the code-level implications of the version change, and hence of the time required to integrate it.
Elsewhere, we use the C++ abi-compliance-checker and are looking at the Java api-compliance-checker to help with this process. Is there a similar tool available for Python? I have found plenty of lint/analysis/refactor tools but nothing that provides this level of functionality. I understand that Python's dynamic typing will make a comprehensive report impossible.
If such a tool does not exist, are there any libraries that could help with implementing a solution? For example, my current approach would be to use an ast.NodeVisitor to traverse the package and build a tree where each node represents a module/class/method/function, and then compare this tree to that of another version of the same package.
Edit: since posting the question I have found pysdiff, which covers some of my requirements, but I am still interested in seeing alternatives.
Edit: I have also found Upstream-Tracker, which is a good example of the sort of information I'd like to end up with.

What about using the AST module to parse the files?
import ast

with open("test.py") as f:   # file() is deprecated in favour of open()
    python_src = f.read()

node = ast.parse(python_src)  # Note: doesn't compile the src
print ast.dump(node)
There's the walk helper in the ast module (described at http://docs.python.org/2/library/ast.html)
The astdump package might work (available on PyPI).
There's also this out-of-date pretty printer:
http://code.activestate.com/recipes/533146-ast-pretty-printer/
The documentation tool Sphinx also extracts the information you are looking for. Perhaps give that a look.
So walk the AST and build a tree with the information you want in it. Once you have a tree, you can pickle it and diff it later, or convert the tree to a text representation in a text file you can diff with difflib or some external diff program.
Note that the ast module only has parse(); actual compilation is done by the built-in compile(). The only thing is, I'm not entirely sure how much information is available to you after parsing alone (as you don't want to compile()).
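To sketch what that tree-building could look like, here is a minimal, untested outline of the ast.NodeVisitor approach from the question (APICollector and collect_api are names I made up, not an existing library):

import ast

class APICollector(ast.NodeVisitor):
    """Collect a flat, sortable listing of public classes and functions."""

    def __init__(self):
        self.api = []     # entries like "MyClass.method(self, arg)"
        self.scope = []   # current class nesting

    def visit_ClassDef(self, node):
        if not node.name.startswith('_'):
            self.api.append('.'.join(self.scope + [node.name]))
            self.scope.append(node.name)
            self.generic_visit(node)   # visit the methods inside the class
            self.scope.pop()

    def visit_FunctionDef(self, node):
        if not node.name.startswith('_'):
            args = [a.id for a in node.args.args]   # on Python 3 use a.arg
            self.api.append('%s(%s)' % ('.'.join(self.scope + [node.name]),
                                        ', '.join(args)))

def collect_api(path):
    with open(path) as f:
        tree = ast.parse(f.read())
    collector = APICollector()
    collector.visit(tree)
    return sorted(collector.api)

Running collect_api over each module of both versions and diffing the sorted listings (difflib.unified_diff works well) gives a first-cut report of removed or re-signed names.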

Perhaps you can start by using the inspect module
import inspect
import types

def genFunctions(module):
    moduleDict = module.__dict__
    for name in dir(module):
        if name.startswith('_'):
            continue
        element = moduleDict[name]
        if isinstance(element, types.FunctionType):
            argSpec = inspect.getargspec(element)
            argList = argSpec.args
            print "{}.{}({})".format(module.__name__, name, ", ".join(argList))
That will give you a list of "public" (not starting with underscore) functions with their argument lists. You can add more stuff to print the kwargs, classes, etc.
Once you run that on all the packages/modules you care about, in both old and new versions, you'll have two lists like this:
myPackage.myModule.myFunction1(foo, bar)
myPackage.myModule.myFunction2(baz)
Then you can either just sort and diff them, or write some smarter tooling in Python to actually compare all the names, e.g. to permit additional optional arguments but reject new mandatory arguments.
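A minimal sketch of that smarter comparison, assuming each snapshot maps a dotted name to an (args, num_defaults) pair as gathered from inspect.getargspec above (the snapshot format is my own invention):

def check_compatibility(old_api, new_api):
    """Report backwards-incompatible changes between two API snapshots."""
    problems = []
    for name in sorted(old_api):
        if name not in new_api:
            problems.append('removed: %s' % name)
            continue
        old_args, old_defaults = old_api[name]
        new_args, new_defaults = new_api[name]
        # arguments without defaults are mandatory
        old_required = old_args[:len(old_args) - old_defaults]
        new_required = new_args[:len(new_args) - new_defaults]
        # new optional arguments are fine; new or reordered mandatory
        # arguments break existing callers
        if new_required != old_required:
            problems.append('signature changed: %s(%s) -> (%s)' % (
                name, ', '.join(old_args), ', '.join(new_args)))
    return problems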

Check out zope.interface (you can get it from PyPI). You can then incorporate tests that your modules support their declared interfaces into your unit tests. It may take a while to retrofit, however, and it's not a silver bullet.
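A minimal sketch of what that looks like (the interface and class names here are invented for illustration):

from zope.interface import Interface, implementer
from zope.interface.verify import verifyObject

class ISchema(Interface):
    # interface methods are declared without 'self'
    def get_entity():
        """Return the entity this schema describes."""

@implementer(ISchema)
class XMLSchema(object):
    def get_entity(self):
        return 'entity'

# typically called from a unit test; raises if the object
# does not actually provide the interface it claims to
verifyObject(ISchema, XMLSchema())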

Related

Tracking changes in python source files?

I'm learning Python and have run into a situation where I need to change the behaviour of a function. I was originally a Java programmer, and in the Java world a change to a function makes Eclipse show errors in every source file affected, so I know which files need to be modified. But how would one do such a thing in Python, considering there are no static types? I'm using TextMate2 for Python coding.
Currently I'm doing it the brute-force way: opening every Python script file, checking where I use that function, and then modifying the call. But I'm sure this is not the way to deal with large projects!
Edit: as an example, I define a class called Graph in a Python script file. Graph has two instance variables. I created many objects of this class (each with a different name!) in many script files, and then decided that I want to change the names of those instance variables! Now I'm going through each file and re-reading my code in order to change the names :(. PLEASE help!
Example: file A has objects x, y, z of class C. File B has objects xx, yy, zz of class C. Class C has two instance variables whose names should be changed, Foo to Poo and Foo1 to Poo1. Also consider many files like A and B. What would you do to solve this? Are you seriously going to open each file, search for x, y, z, xx, yy, zz, and then change the names individually?!
Sounds like you can only code inside an IDE!
Two steps to free yourself from your IDE and become a better programmer.
Write unit tests for your code.
Learn how to use grep
Unit tests will exercise your code and provide reassurance that it is always doing what you wanted it to do. They make refactoring MUCH easier.
grep, what a wonderful tool: grep -R 'my_function_name' src will find every reference to your function in files under the directory src.
Also, see this rather wonderful blog post: Unix as an IDE.
Whoa, slow down. The coding process you described is not scalable.
How exactly did you change the behavior of the function? Give specifics, please.
UPDATE: This all sounds like you're trying to implement a class and its methods by cobbling together a motley patchwork of functions and local variables - like I wrongly did when I first learned OO coding in Python. The code smell is that when the type/class of some class internal changes, it should generally not affect the class methods. If you're refactoring all your code every 10 mins, you're doing something seriously wrong. Step back and think about clean decomposition into objects, methods and data members.
(Please give more specifics if you want a more useful answer.)
If you were only changing input types, there might be no need to change the calling code.
(Unless the new fn does something very different to the old one, in which case what was the argument against calling it a different name?)
If you changed the return type, and you can't find a common ancestor type or container (tuple, sequence etc.) to put the return values in, then yes you need to change its caller code. However...
...however if the function should really be a method of a class, declare that class and the method already. The previous paragraph was a code smell that your function really should have been a method, specifically a polymorphic method.
Read about code smells, anti-patterns, and When do you know you're dealing with an anti-pattern?. There, for example, you will find a recommendation for the video "Recovery from Addiction - A taste of the Python programming language's concision and elegance from someone who once suffered an addiction to the Java programming language." - Sean Kelly
Also, it sounds like you want to use Test-Driven Development and add some unit tests.
If you give us the specifics we can critique it better.
You won't get this functionality in a text editor. I use Sublime Text 3, and I love it, but it doesn't have this functionality. It does, however, jump to files and functions via its 'Goto Anything' (Ctrl+P) functionality, and its Multiple Selections / Multi Edit is great for small refactoring tasks.
However, when it comes to IDEs, JetBrains PyCharm has some of the amazing refactoring tools that you might be looking for.
The also-free Python Tools for Visual Studio (see the free install options here, which can use the free VS Shell) has some excellent refactoring capabilities and a superb REPL to boot.
I use all three. I spend most of my time in sublime text, I like pycharm for refactoring, and I find PT4VS excellent for very involved prototyping.
Despite python being a dynamically typed language, IDEs can still introspect to a reasonable degree. But, of course, it won't approach the level of Java or C# IDEs. Incidentally, if you are coming over from Java, you may have come across JetBrains IntelliJ, which PyCharm will feel almost identical to.
One's programming style is certainly different between a statically typed language like C# and a dynamic language like Python. I find myself doing things in smaller, testable modules. The iteration speed is faster. And in a dynamic language one relies less on IDE tools and more on unit tests that cover the key functionality. If you don't have these, you will break things when you refactor.
One answer only specific to your edit:
if your old code was working and does not need to be modified, you could just keep the old names as aliases of the new ones, so that your old code is not broken. Example:
import time

class MyClass(object):
    def __init__(self):
        self.t = time.time()

    # creating new names
    def new_foo(self, arg):
        return 'new_foo', arg

    def new_bar(self, arg):
        return 'new_bar', arg

    # now creating method aliases for the old names
    foo = new_foo
    bar = new_bar
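With the aliases in place, existing call sites keep working unchanged:

obj = MyClass()
print obj.foo(42)      # ('new_foo', 42) -- old name, new implementation
print obj.new_foo(42)  # ('new_foo', 42)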
If your code needs rework, rewrite your common code, execute everything, and correct any failures. You could also search for any import/instantiation of your class.
One of the tradeoffs between statically and dynamically typed languages is that the latter require less scaffolding in the form of type declarations, but also provide less help with refactoring tools and compile-time error detection. Some Python IDEs do offer a certain level of type inference and help with refactoring, but even the best of them will not be able to match the tools developed for statically typed languages.
Dynamic language programmers typically ensure correctness while refactoring in one or more of the following ways:
Use grep to look for function invocation sites, and fix them. (You would have to do that in languages like Java as well if you wanted to handle reflection.)
Start the application and see what goes wrong.
Write unit tests, if you don't already have them, use a coverage tool to make sure that they cover your whole program, and run the test suite after each change to check that everything still works.
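As a tiny illustration of that last point, here is a test that pins down the rename from the question's edit (the module name and constructor are hypothetical):

import unittest
from mymodule import Graph  # hypothetical module containing the class

class TestGraphRename(unittest.TestCase):
    def test_instance_variables_renamed(self):
        g = Graph()
        # fails in one obvious place if the Foo -> Poo rename is
        # incomplete, instead of breaking callers at random
        self.assertTrue(hasattr(g, 'Poo'))
        self.assertFalse(hasattr(g, 'Foo'))

if __name__ == '__main__':
    unittest.main()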

Vim: automatically add a Sphinx comment under function and class definitions

I want to automatically add a Sphinx-style comment under function and class headers.
When I press Enter after a function or class header, the comment should be inserted like this:
def func(a): #<Enter>
    """
    Args:
        a (type): The name to use.
    Returns:
        type. The return
    """
Is it possible to configure this in .vimrc (.vimrc.local)? Do you know a command for this? Or maybe a plugin?
Though you can do this with the built-in (insert-mode) mappings, you'll soon want to do more advanced insertions.
Snippets are like the built-in :abbreviate on steroids, usually with parameter insertions, mirroring, and multiple stops inside them. One of the first, very famous (and still widely used) Vim plugins is snipMate (inspired by the TextMate editor); unfortunately, it's not maintained any more, though there is a fork. A modern alternative (which requires Python) is UltiSnips. There are more; see this list on the Vim Tips Wiki.
There are two things to evaluate: First, the features of the snippet engine itself, and second, the quality and breadth of snippets provided by the author or others.

How to document and test interfaces required of formal parameters in Python 2?

To ask my very specific question I find I need quite a long introduction to motivate and explain it -- I promise there's a proper question at the end!
While reading part of a large Python codebase, sometimes one comes across code where the interface required of an argument is not obvious from "nearby" code in the same module or package. As an example:
def make_factory(schema):
    entity = schema.get_entity()
    ...
There might be many "schemas" and "factories" that the code deals with, and "def get_entity()" might be quite common too (or perhaps the function doesn't call any methods on schema, but just passes it to another function). So a quick grep isn't always helpful to find out more about what "schema" is (and the same goes for the return type). Though "duck typing" is a nice feature of Python, sometimes the uncertainty in a reader's mind about the interface of arguments passed in as the "schema" gets in the way of quickly understanding the code (and the same goes for uncertainty about typical concrete classes that implement the interface). Looking at the automated tests can help, but explicit documentation can be better because it's quicker to read. Any such documentation is best when it can itself be tested so that it doesn't get out of date.
Doctests are one possible approach to solving this problem, but that's not what this question is about.
Python 3 has a "parameter annotations" feature (part of the function annotations feature, defined in PEP 3107). The uses to which that feature might be put aren't defined by the language, but it can be used for this purpose. That might look like this:
def make_factory(schema: "xml_schema"):
    ...
Here, "xml_schema" identifies a Python interface that the argument passed to this function should support. Elsewhere there would be code that defines that interface in terms of attributes, methods & their argument signatures, etc. and code that allows introspection to verify whether particular objects provide an interface (perhaps implemented using something like zope.interface / zope.schema). Note that this doesn't necessarily mean that the interface gets checked every time an argument is passed, nor that static analysis is done. Rather, the motivation of defining the interface is to provide ways to write automated tests that verify that this documentation isn't out of date (they might be fairly generic tests so that you don't have to write a new test for each function that uses the parameters, or you might turn on run-time interface checking but only when you run your unit tests). You can go further and annotate the interface of the return value, which I won't illustrate.
So, the question:
I want to do exactly that, but using Python 2 instead of Python 3. Python 2 doesn't have the function annotations feature. What's the "closest thing" in Python 2? Clearly there is more than one way to do it, but I suspect there is one (relatively) obvious way to do it.
For extra points: name a library that implements the one obvious way.
Take a look at plac, which uses annotations to define a command-line interface for a script. On Python 2.x it uses the plac.annotations() decorator.
The closest thing is, I believe, an annotation library called PyAnno.
From the project webpage:
"The Pyanno annotations have two functions:
Provide a structured way to document Python code
Perform limited run-time checking"
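If you'd rather not pull in a library, the closest hand-rolled Python 2 equivalent is a small decorator that stashes the annotations on the function for later introspection; a minimal sketch:

def annotations(**ann):
    """Poor man's PEP 3107 for Python 2: attach parameter
    annotations to the function as a plain attribute."""
    def decorate(func):
        func.__annotations__ = ann
        return func
    return decorate

@annotations(schema='xml_schema')
def make_factory(schema):
    entity = schema.get_entity()
    return entity

print make_factory.__annotations__   # {'schema': 'xml_schema'}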

When should a Python script be split into multiple files/modules?

In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?
Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development, "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.
As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)
In Java ... every class requires its own file.
On the flip side, sometimes a Java file will also include enums, subclasses, or interfaces within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)
Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file, i.e. a .py file, in Python is equivalent to a package in Java, in that it can contain many related classes and functions.
For good examples, have a look at the Python built-in modules; just download the source and check them out. The rule of thumb I follow is: when you have very tightly coupled classes or functions, keep them in a single file; otherwise, break them up.

Is monkeypatching stdlib methods a good practice in Python? [closed]

Over time I have found the need to override several Python stdlib methods in order to overcome limitations or to add some missing functionality.
In all cases I added a wrapper function and replaced the original method in the module with my wrapper (the wrapper calls the original method).
Why did I do this? Just to be sure that all calls to the method use my new version, even those made from other third-party modules.
I know that monkeypatching can be a bad thing, but my question is whether it is acceptable if you use it with care. Meaning that:
you still call the original methods, ensuring that you are not missing anything when the original module is updated
you are not changing the original "meaning" of the methods
Examples:
add coloring support to the Python logging module.
make open() able to recognize Unicode BOM marks when using text mode.
add logging support to os.system() or subprocess.Popen(), letting you output to the console and/or redirect to another file.
implement methods that are missing on your platform, like os.chown() or os.lchown() on Windows.
Doing things like these appears to me to be decent overriding, but I would like to see how others view it, and especially what should be considered an acceptable monkeypatch and what should not.
None of these things seem to require monkeypatching. All of them seem to have better, more robust and reliable solutions.
Adding a logging handler is easy. No monkeypatch.
Fixing open is done this way.
from io import open
That was easy. No patch.
Logging to os.system()? I'd think that a simple "wrapper" function would be far better than a complex patch. Further, I'd use subprocess.Popen, since that's the recommended replacement.
Adding missing methods to mask OS differences (like os.chown()) seems like a better use for try/except. But that's just me. I like explicit rather than implicit.
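For illustration, a minimal sketch of that try/except approach for os.chown (the no-op fallback is an assumption; a version that raises may suit some callers better):

import os

try:
    chown = os.chown
except AttributeError:
    # os.chown does not exist on Windows; substitute a no-op
    # (or a stub that raises, if silently ignoring would hide bugs)
    def chown(path, uid, gid):
        pass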
On balance, I still can't see a good reason for monkeypatching.
I'd hate to be locked in to legacy code (like os.system) because I was too dependent on my monkeypatches.
The concept of "subclass" applies to modules as well as classes. You can easily write your own modules which (a) import and (b) extend existing modules. You then use your new modules because they provided extra features. You don't need to monkeypatch.
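For example, a minimal sketch of such an extension module (myos is a hypothetical name):

# myos.py -- extends the standard os module instead of patching it
from os import *   # re-export everything os already provides
import os as _os

def system(command):
    """os.system() plus logging, without touching the stdlib."""
    print 'running: %s' % command
    return _os.system(command)

Application code then imports myos instead of os; anything that still imports plain os is unaffected.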
even if these are called from other third-party modules
Dreadful idea. You can easily break another module by altering built-in features. If you have read the other module and are sure the monkeypatches won't break anything, then what you've found is this:
The "other" module should have had room for customization. It should have had a place for a "dependency injection" or Strategy design pattern. Good thinking.
Once you've found this, the "other" module can be fixed to allow this customization. It may be as simple as a documentation change explaining how to modify an object. It may be an additional parameter for construction to insert your customization.
You can then provide the revised module to the authors to see if they'll support your small update to their module. Many classes can use extra help supporting a "dependency injection" or Strategy design for extensions.
If you have not read the other module and are not sure your monkeypatches work... well... we still have hope that the monkeypatches don't break anything.
Monkeypatching can be "the least of evils", sometimes -- mostly, when you need to test code which uses a subsystem that is not well designed for testability (doesn't support dependency injection &c). In those cases you will be monkeypatching (very temporarily, fortunately) in your test harness, and almost invariably monkeypatching with mocks or fakes for the purpose of isolating tests (i.e., making them unit tests, rather than integration tests).
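A minimal sketch of that test-time monkeypatching with the mock library (mymodule, fetch and urlopen here are hypothetical stand-ins for the subsystem being isolated):

import unittest
from mock import patch   # the 'mock' package on Python 2; unittest.mock on 3

import mymodule           # hypothetical module under test

class TestFetch(unittest.TestCase):
    @patch('mymodule.urlopen')
    def test_fetch_reads_payload(self, fake_urlopen):
        # the patch lasts only for the duration of this test
        fake_urlopen.return_value.read.return_value = 'payload'
        self.assertEqual(mymodule.fetch('http://example.com'), 'payload')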
This "bad but could be worse" use case does not appear to apply to your examples -- they can all be better architected by editing the application level code to call your appropriate wrapper functions (say myos.chown rather than the bare os.chown, for example) and putting your wrapper functions in your own intermediate modules (such as myown) that stand between the application level code and the standard library (or third-party extensions that you are thus wrapping -- there's nothing special about the standard library in this respect).
One problematic situation might arise when the "application level code" isn't really under your control -- it's a third party subsystem that you'd rather not modify. Nevertheless, I have found that in such situations modifying the third party subsystem to call wrappers (rather than the standard library functions directly) is way more productive in the long run -- then of course you submit the change to the maintainers of the third party subsystem in question, they roll your change into their subsystem's next release, and life gets better for everybody (you included, since once your changes are accepted they'll get routinely maintained and tested by others!-).
(As a side note, such wrappers may also be worth submitting as diffs to the standard library, but that is a different case since the standard library evolves very very slowly and cautiously, and in particular on the Python 2 line will never evolve any longer, since 2.7 is the last of that line and it's feature-frozen).
Of course, all of this presupposes an open-source culture. If for some mysterious reasons you're using a closed-source third party subsystem, therefore one which you cannot possibly maintain, then you are in another situation where monkey patching may be the lesser evil (but that's just because the evil of losing strategic control of your development by trusting in code you can't possibly maintain is such a bigger evil in itself;-). I've never found myself in this situation with a third-party package that was both closed-source and itself written in Python (if the latter condition doesn't hold your monkeypatches would do you no good;-).
Note that here the working definition of "closed-source" is really very strict: for example, even Microsoft 12+ years ago distributed sources of libraries such as MFC with Visual C++ (as their product was then called) -- closed-source because you couldn't redistribute their sources, but still, you DID have sources at hand, so when you met some terrible limitation or bug you COULD fix it (and submit the change to them for a future release, as well as publishing your change as a diff as long as it included absolutely none of their copyrighted code -- not trivial, but feasible).
Monkeypatching well beyond the strict confines within which such an approach is "the least of evil" is a frequent mistake of users of dynamic languages -- be careful not to fall into that trap yourself!
