Python source analyzer that works with zope components

Python source analyzer that works with zope components - python

There's a code base (Py 2.6 stuff) that extensively uses zope components and there's a need for tool that can analyze the sources. The tool is supposed to analyze the source; like look for usage of some restricted classes/objects/interfaces, etc. Basically, scan each statement in the source while being aware of the statement's context (which class/method/function it is in, which module, etc) and analyze it for specific patterns.
The approach of reading the source into text buffers and matching patterns doesn't seem to be a reliable approach at all.
Another approach that came up was of using inspect, but inspect seems to be broken and doesn't seem to be able to handle our code base (multiple tries, all of them crashed inspect). I might have to give up on that as there seems to be now way out with inspect.
Now, other options that I could think of are using pylint or using AST (and doing a lot of postprocessing on it). Am not sure to what extent pylint is extensible; and when it comes to analyzing source statments if pylint could be aware of context (i.e, which class defn/function/method has this statement etc) Using AST seems too much of an overkill for such a trivial purpose.
What would you suggest as a suitable approach here? Anybody else had to deal with such an issue before?
Please suggest.

Related

Best python library to use for a BNF based autocompleter

I want to create a SQL autocompleter for use with rlwrap: https://github.com/hanslub42/rlwrap
This could then be used with sqlite3 & osqueri for example (I know they already have some autocompletion facility, but it's not good enough, especially under rlwrap).
In fact, more generally I would like to know the best approach for building autocompleters based on BNF grammar descriptions; I may want to produce autocompleters for other rlwrapped REPLs at some point in the future.
I have no experience with parsers, but I have read some stuff online about the different types of parsers and how they work, and this Pyleri tutorial: https://tomassetti.me/pyleri-tutorial/
Pyleri looks fairly straightforward, and has the expecting property which makes it easy to create a auto-completer, but AFAIK it would involve translating the sqlite BNF (and any other BNF's that I might want to use in the future) into python code, which is a drag.
ANTLR has lots of predefined grammar files for many different languages, and the ability to output python code, but I'm not sure how easy it is to produce an autocompleter, and I don't want to read through all the documentation only to find out I've wasted my time.
So can anyone advise me? What's the best approach?

How to get code-completion for COM programming in PyCharm?

When using app = win32com.client.Dispatch('Some.Application'), is there any feasible way get code-completion in PyCharm? It is rather tedious having to retype (or copy-paste) everything from an API documentation, so would creating skeletons be. Is there no other way to let PyCharm know about the Interface provided via COM, especially if I can provide a .tlb file? Or is there at least some way automatically generate such a skeleton (or a wrapping module?) from the TypeLib?

Since there is no way for PyCharm to know the runtime type of app, you shouldn't expect to get code completion on app directly; at least not until they decide to add built-in support for generating code from type libraries.
However, you can exploit the fact that win32com implicitly generates code based on the type library as described in the first part of this answer, together with PyCharm's support for type hinting, to get code completion on COM methods.
Make sure that the Python types have been generated; their location is determined by the GUID of the COM object. For example, the types for Microsoft Word 2016 on my machine are available in
C:\Users\username\appdata\local\temp\gen_py\3.6\00020905-0000-0000-c000-000000000046x0x8x7\.
Add this folder to the path of your PyCharm Python interpreter; see e.g. this answer.
Import the modules for which you want code completion.
In the screenshots below, we use this approach with Word's Find:
Now, besides feeling dirty, this approach relies on the relevant types having been generated and the code completion is limited to the methods published by the object, so I imagine its usefulness in practice might be somewhat limited; in particular, anybody working on the code will have to generate the code, or the annotations will cause NameErrors. Personally, I would probably prefer using Jupyter for the exploratory part of the implementation process, and with minimal tweaks outlined in the answer mentioned above, Jupyter can be extended to have full code completion with win32com.

Is it possible to use a with grako generated parser without grako?

see the title. For a small tool I am writing I wanted to introduce a simple boolean filter language and decided to do that "properly" and use a parser-generator. After playing around with grako a bit I found I like it and got the filter-language done fairly quick (which is also nice :))
The problem is now, if I want to use the tool on other computers or give it to other people I first have to somehow make grako available there, which is a bit bothersome, because everything else is standard python3 stuff.
I guess it is possible by co-packaging the necessary grako-classes, but that seems a bit messy (licensing would be mentioned in any way). Maybe I have overlooked some built-in method.

The short answer is No.
Grako-generated parsers do require the grako library.
For example:
with self._group():
with self._choice():
with self._option():
self._token('nameguard')
with self._option():
self._token('ignorecase')
with self._option():
self._token('left_recursion')
self._error('expecting one of: ignorecase left_recursion nameguard')
All the self._xyz() come from either grako.contexts.ParseContext or grako.parsing.Parser. The backtracking, caching, and the book-keeping required are all hidden behind context managers and decorators.
Having generated parsers depend on grako was a design choice aimed at making the parsers smaller and easier to understand, which was one of the primary objectives of the project (as there are many otherwise-great parser generators that produce obfuscated code).
The other option was to copy the code that the generated parsers could depend on onto each parser, but that seemed a bit unpythonic.

What builtin functions shouldn't be run by untrusted users?

I'm creating a corewars type application that runs on django and allows a user to upload some python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?

There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, the Zope RestrictedPython looked the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.

Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't - it has known vulnerabilities), but because it might be a good starting point for getting your webserver compromised your own restricted-execution framework.
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.

You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.

Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario:
the worst case scenario is that someone gets into the database
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.

If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.

You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.

How can I go about securely executing a subset of python?

I need to store source code for a basic function in a database and allow it to be modified through an admin interface. This code will take several numbers and strings as parameters, and return a number or None. I know that eval is evil, so I need to implement a safe way to execute a very basic subset of python, or something syntactically similar at least, from within a python based web-app.
The obvious answer is to implement a DSL (Domain Specific Language), however, I have no experience with that, nor do I have any idea where to begin, and a lot of the resources available seem to go a little over my head. I'm hoping that maybe there is something already out there which will allow me to essentially generate a secure python-callable function from a string in a database. the language really only needs to support assignment, basic math, if/else, and case insensitive string comparisons. any other features are a bonus, but I think most things can be done with just that, no need for complex data structures, classes, functions, etc.
If no such thing currently exists, I'm willing to look into the possibility of creating one, but as I said, I have no idea how to go about that, and any advice in that regard would be appreciated as well.

Restricted Python environments are hard to make really safe.
Maybe something like lua is a better fit for you

PySandbox might help. I haven't tested it, just found it linked elsewhere.

You could use Pyparsing to implement your DSL, provided the expressions involved won't be too complex (you don't give full details on that but you imply the requirements are pretty simple). See the examples page including specifically fourFn.py or simpleCalc.py.

You could implement a subset of Python by using the ast module to parse Python code into an abstract syntax tree then walk the tree checking that it only uses the subset of Python that you allow. This will only work in Python 2.x since Python 3 has removed the ast module.
However even using this method it will be hard to create something that is 100% secure, since even the most innocuous code could allow the user to write something that could blow up your application, e.g. by allocating more memory than you have available or putting the program into an infinite loop using all the CPU.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.