I'm working on a project to do some static analysis of Python code. We're hoping to encode certain conventions that go beyond questions of style or detecting code duplication. I'm not sure this question is specific enough, but I'm going to post it anyway.
A few of the ideas that I have involve being able to build a certain understanding of how the various parts of source code work so we can impose these checks. For example, in part of our application that's exposing a REST API, I'd like to validate something like the fact that if a route is defined as a GET, then arguments to the API are passed as URL arguments rather than in the request body.
I'm able to get something like that to work by pulling all the routes, which are pretty nicely structured, and there are guarantees of consistency given the route has to be created as a route object. But once I know that, say, a given route is a GET, figuring out how the handler function uses arguments requires some degree of interpretation of the function source code.
Naïvely, something like inspect.getsourcelines will allow me to get the source code, but on further examination that's not the best solution because I immediately have to build interpreter-like features, such as figuring out whether a line is a comment, and then do something like use regular expressions to hunt down places where state is moved from the request context to a local variable.
Looking at tools like PyLint, they seem mostly focused on high-level "universals" of static analysis, and (at least on superficial inspection) don't have obvious ways of extracting this sort of understanding at a lower level.
Is there a more systematic way to get this representation of the source code, either with something in the standard library or with another tool? Or is the only way to do this writing a mini-interpreter that serves my purposes?
Related
I would like, given a python module, to monkey patch all functions, classes and attributes it defines. Simply put, I would like to log every interaction a script I do not directly control has with a module I do not directly control. I'm looking for an elegant solution that will not require prior knowledge of either the module or the code using it.
I found several high-level tools that help wrapping, decorating, patching etc... and i've went over the code of some of them, but I cannot find an elegant solution to create a proxy of any given module and automatically proxy it, as seamlessly as possible, except for appending logic to every interaction (record input arguments and return value, for example).
in case someone else is looking for a more complete proxy implementation
Although there are several python proxy solutions similar to those OP is looking for, I could not find a solution that will also proxy classes and arbitrary class objects, as well as automatically proxy functions return values and arguments. Which is what I needed.
I've got some code written for that purpose as part of a full proxying/logging python execution and I may make it into a separate library in the future. If anyone's interested you can find the core of the code in a pull request. Drop me a line if you'd like this as a standalone library.
My code will automatically return wrapper/proxy objects for any proxied object's attributes, functions and classes. The purpose is to log and replay some of the code, so i've got the equivalent "replay" code and some logic to store all proxied objects to a json file.
Let say that I have open source project from which I would like to borrow some functionality. Can I get some sort of report generated during execution and/or interaction of this project?
Report should contain e.g.:
which functions has been called,
in which order,
which classes has been instantiated etc.?
Would be nice to have some graphic output for that... you know, if else tree and highlighted the executed branch etc.
I am mostly interested in python and C (perl would be fine too) but if there is any universal tool that cover multiple languages (or one tool per language) for that, it would be very nice.
PS: I am familiar with debuggers but I do not want to step every singe line of code and check if this is the correct instruction. I'm assuming that if functions/methods/classes etc. are properly named then one can get some hints about where to find desired piece of code. But only naming is not enough because you do not know (from brief overview of code) if hopefully looking function foo() does not require some data that was generated by obscure function bar() etc. For that reason I am looking for something that can visualize relations between running code.
PS: Do not know if this is question for SO or programmers.stackexchange. Feel free to move if you wish. PS: I've noticed that tags that I've used are not recommended but execution flow tracking is the best phrase to describe this process
Check out Ned Batchelder's coverage and perhaps the graphviz/dot library called pycallgraph. May not be exactly what you need and also (python-only) but in the ballpark.
Pycallgraph is actually likelier to be of interest because it shows the execution path, not just what codelines got executed. It only renders to PDF normally, but it wasn't too difficult to get it to do SVG instead (dot/graphviz supports svg and other formats, pycallgraph was hardcoding pdf rendering).
Neither will do exactly what you want but they are a start.
I am implementing a workflow management system, where the workflow developer overloads a little process function and inherits from a Workflow class. The class offers a method named add_component in order to add a component to the workflow (a component is the execution of a software or can be more complex).
My Workflow class in order to display status needs to know what components have been added to the workflow. To do so I tried 2 things:
execute the process function 2 times, the first time allow to gather all components required, the second one is for the real execution. The problem is, if the workflow developer do something else than adding components (add element in a databases, create a file) this will be done twice!
parse the python code of the function to extract only the add_component lines, this works but if some components are in a if / else statement and the component should not be executed, the component apears in the monitoring!
I'm wondering if there is other solution (I thought about making my workflow being an XML or something to parse easier but this is less flexible).
You cannot know what a program does without "executing" it (could be in some context where you mock things you don't want to be modified but it look like shooting at a moving target).
If you do a handmade parsing there will always be some issues you miss.
You should break the code in two functions :
a first one where the code can only add_component(s) without any side
effects, but with the possibility to run real code to check the
environment etc. to know which components to add.
a second one that
can have side effects and rely on the added components.
Using an XML (or any static format) is similar except :
you are certain there are no side effects (don't need to rely on the programmer respecting the documentation)
much less flexibility but be sure you need it.
I've noticed I have the same piece of code sitting at the top of several of my controllers. They tend to look like this:
def app_description(app):
""" Dictionary describing an app. """
return {'name': app.app,
'id': app.id,
'is_new': app.is_new(),
'created_on': app.created_on.strftime("%m/%d/%Y"),
'configured': app.configured }
I'll call this from a couple different actions in the controller, but generally not outside that controller. It accesses properties. It calls methods. It formats opaque objects (like dates).
My question is: is this controller code, or model code?
The case for controller:
It defines my API.
It's currently only used in that module.
There doesn't seem to be any logic here.
The case for model:
It seems like a description of the data, which the model should be responsible for.
It feels like I might want to use this in other controllers. Haven't gotten there yet, but these functions are still pretty new, so they might.
Attaching a function to the object it clearly belongs to seems better than leaving it as a module-level function.
It could be more succinctly defined on the model. Something like having the top-level model object define .description(), and the subclasses just define a black/whitelist of properties, plus override the method itself to call functions. I'm pretty sure that would be fewer lines of code (as it would save me the repetition of things like 'name': app.name), which seems like a good thing.
Not sure which framework you are using, but I would suggest creating this helper functionality in its own class and put it in a shared folder like lib/
Alternatively you could have an application helper module that just has a bunch of these helpful application-wide functions.
Either way, I'd keep it away from both the model and the controller.
The answer I finally decided on:
In the short term, having these methods is fine in the controllers. If they define the output, then, OK, they can stay there. They're only used in the model.
Theres a couple things to watch out for, which indicate they've grown up, and need to go elsewhere:
In one case, I needed access to a canonical serialization of the object. At that point, it moved into the model, as a model method.
In another case, I found that I was formatting all timestamps the same. I have a standard #ajaxify decorator that does things like sets Content-Type headers, does JSON encoding, etc. In this case, I moved the datetime standard formatting into there -- when the JSON encoder hits a datetime (formerly unserializable), it always treats it the same (seconds since the epoch, for me).
In yet a third case, I realized that I was re-using this function in a couple controllers. For that, I pulled it out into a common class (like another answer suggested) and used that to define my "Web API". I'd used this pattern before -- it makes sense for grouping similarly-used data (like timeseries data, or top-N lists).
I suspect there's more, but basically, I don't think there all as similar as I thought they were initially. I'm currently happy thinking about them as a convention for simple objects in our (small-ish, new-ish) codebase, with the understanding that after a few iterations, a better solution may present itself. In the meantime, they stay in the controller and define my AJAXy-JSON-only interface.
To ask my very specific question I find I need quite a long introduction to motivate and explain it -- I promise there's a proper question at the end!
While reading part of a large Python codebase, sometimes one comes across code where the interface required of an argument is not obvious from "nearby" code in the same module or package. As an example:
def make_factory(schema):
entity = schema.get_entity()
...
There might be many "schemas" and "factories" that the code deals with, and "def get_entity()" might be quite common too (or perhaps the function doesn't call any methods on schema, but just passes it to another function). So a quick grep isn't always helpful to find out more about what "schema" is (and the same goes for the return type). Though "duck typing" is a nice feature of Python, sometimes the uncertainty in a reader's mind about the interface of arguments passed in as the "schema" gets in the way of quickly understanding the code (and the same goes for uncertainty about typical concrete classes that implement the interface). Looking at the automated tests can help, but explicit documentation can be better because it's quicker to read. Any such documentation is best when it can itself be tested so that it doesn't get out of date.
Doctests are one possible approach to solving this problem, but that's not what this question is about.
Python 3 has a "parameter annotations" feature (part of the function annotations feature, defined in PEP 3107). The uses to which that feature might be put aren't defined by the language, but it can be used for this purpose. That might look like this:
def make_factory(schema: "xml_schema"):
...
Here, "xml_schema" identifies a Python interface that the argument passed to this function should support. Elsewhere there would be code that defines that interface in terms of attributes, methods & their argument signatures, etc. and code that allows introspection to verify whether particular objects provide an interface (perhaps implemented using something like zope.interface / zope.schema). Note that this doesn't necessarily mean that the interface gets checked every time an argument is passed, nor that static analysis is done. Rather, the motivation of defining the interface is to provide ways to write automated tests that verify that this documentation isn't out of date (they might be fairly generic tests so that you don't have to write a new test for each function that uses the parameters, or you might turn on run-time interface checking but only when you run your unit tests). You can go further and annotate the interface of the return value, which I won't illustrate.
So, the question:
I want to do exactly that, but using Python 2 instead of Python 3. Python 2 doesn't have the function annotations feature. What's the "closest thing" in Python 2? Clearly there is more than one way to do it, but I suspect there is one (relatively) obvious way to do it.
For extra points: name a library that implements the one obvious way.
Take a look at plac that uses annotations to define a command-line interface for a script. On Python 2.x it uses plac.annotations() decorator.
The closest thing is, I believe, an annotation library called PyAnno.
From the project webpage:
"The Pyanno annotations have two functions:
Provide a structured way to document Python code
Perform limited run-time checking "