A tool to validate the structure of a Python Package?


I started writing Python code not too long ago and everything just works, but I have been having problems writing a package. I was wondering if there is such a thing as a "package validation tool". I know I could just start up a REPL and start importing the module, but is there a better way? Is there a tool that could tell me "you have these possible errors"?
Or maybe there is something in the middle: is there a way to test a Python package's structure?
As always, thanks in advance!

If you call a module using:
    python -m module
Python will load and execute the module, so you should catch crude syntax errors. Also, if the module has a block like:
    if __name__ == "__main__":
        do_something()
it will be called. For some small self-contained modules I often use this block to run tests.
Given the very dynamic nature of Python, it is very hard to check for correctness if the module author is not using TDD (test-driven development). There is no silver bullet here. There are tools that will check for "code smells" and compliance with standards (dynamic languages tend to generate a profusion of linters):
pylint
PyChecker
PyFlakes
PEP8
A good IDE like PyCharm can help, if you like IDEs.
These tools can help, but are still far from the assurance of static languages where the compiler can catch many errors at compile time. For example, Go seems to be designed to have a very pedantic compiler. Haskell programs are said to be like mathematical proofs.
If you are coming from a language with strong compile-time checks, just relax. Python is kind of a "throw it against the wall and see if it sticks" language. Some of the Python "macho" principles:
duck typing
EAFP
We are all consenting adults
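To make the first two principles concrete, here is a minimal sketch contrasting an up-front type check with the EAFP ("easier to ask forgiveness than permission") style. The function and class names are made up for illustration; nothing here comes from a particular library beyond the standard `io` module.

```python
import io


def read_all_lbyl(source):
    # "Look before you leap": insist on a real file object up front.
    if not isinstance(source, io.IOBase):
        raise TypeError("expected a file object")
    return source.read()


def read_all_eafp(source):
    # EAFP: just try it. Anything with a read() method works (duck typing).
    try:
        return source.read()
    except AttributeError:
        raise TypeError("expected something with a read() method") from None


class FakeFile:
    """Not a real file, but it quacks like one."""

    def read(self):
        return "quack"
```

The EAFP version accepts FakeFile even though it is not a file, which is exactly why a static check of "is this the right type?" has so little to hold on to in Python.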

There is no tool to test the package structure per se, and I'm unsure of what would be tested. Almost any structure is a valid structure...
But if you are distributing your module, there are some tools that help you test your package data, and they may be useful:
Pyroma will check the package's metadata.
check-manifest will check the MANIFEST.in file.
I have both of them installed and also use zest.releaser, which has some basic sanity checks as well. But none of these will check that the code is OK, so they won't look for the __init__.py files, for example.
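Since none of those tools look for __init__.py files, a check like that is easy to sketch by hand. The function below is a hypothetical helper, not part of any of the tools mentioned. Note that since Python 3.3 a directory without __init__.py can still be imported as a namespace package, so treat the result as a list of things to review rather than a list of errors.

```python
import os


def find_dirs_missing_init(package_root):
    """Return subdirectories that hold .py files but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        # Skip bytecode caches and hidden directories such as .git.
        dirnames[:] = [
            d for d in dirnames if d != "__pycache__" and not d.startswith(".")
        ]
        has_py = any(name.endswith(".py") for name in filenames)
        if has_py and "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, package_root))
    return sorted(missing)
```

Calling find_dirs_missing_init("mypackage") on a source tree returns paths relative to that directory; an empty list means every directory containing Python files also has an __init__.py.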

Related

How can I find only Python 3 code on GitHub?

Is there a way to know which code on GitHub is written in Python 3? So far, I haven't found any mentioned.
Thanks

Very simply:
Yes, you can find Python3 code.
No, you cannot do so effectively.
Code on GitHub is not identified by language and version -- obviously, since you would have found that in your investigations before coding. Yes, you can generally determine the language of a code file, but only with detailed examination -- you would almost need to pass the file to a Python compiler and reject any with syntax errors. This is not an effective process.
You can reduce the search somewhat by gleaning *.py files and then looking for frequent, 3-specific features, such as all print calls using parentheses (a coding style in Python 2, mandated in Python 3). This merely reduces the problem; it does not give you a good request mechanism.
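The "pass the file to a Python compiler and reject any with syntax errors" idea can be sketched with the standard ast module. This is only an illustration under one assumption: you run it under a Python 3 interpreter. The function name is made up.

```python
import ast


def parses_as_current_python(source):
    """Return True if `source` is syntactically valid for the running interpreter."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers sources containing null bytes on some versions.
        return False
    return True
```

A file using the Python 2 print statement is rejected, but a file that happens to be valid in both versions is accepted, so a pass only means "not obviously Python 2", which matches the limitation described above.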

How to protect my Python code before distribution?

I have written a Python program which takes an input data file, performs some processing on the data, and writes another data file as output.
I need to distribute my code now, but the users should not see the source code; they should only be able to give the input and get the output.
I have never done this before.
I would appreciate any advice on how to achieve this in the easiest way.
Thanks a lot in advance

Python is an interpreted language by design, and it compiles code to bytecode (which doesn't help when you're trying to conceal it, as bytecode is easier to reverse than machine code). There's no really secure way to hide your source code so that it is not recoverable, as is true for any programming language, really.
If you wanted a language that can't be so easily reversed, you should have gone for a more native language that compiles directly to the underlying architecture's machine code, which is significantly harder to turn back into the original language, let alone read, due to compiler optimizations, the overhead of CISC instruction sets, et cetera.
However, some libraries convert your source code into an executable format (by packing the Python interpreter and the bytecode alongside it), such as:
cx_Freeze - for freezing any code >=Python 2.7 for any platform, allegedly.
PyInstaller - for freezing general purpose code, it does state additionally that it works with third-party libraries.
py2exe - for freezing code into a Windows-only executable format.
Or you might consider an alternative, code obfuscation, which still lets the user access the source code but makes it nearly impossible to read.
However, an issue with this is that it becomes harder to add code, as bad obfuscation techniques can make the code static. Also, the code can carry overhead from redundant code meant to trick the reader into thinking it is doing something which it is not.
Also, in general it goes against the standard practice of open source, which is what Python loves to do and support.
So to conclude, if you don't want to read everything above: the first thing you did wrong was to choose Python for this, a language that supports open source and is open source itself. To mitigate the issue you should either reconsider the language or follow the references above to modules which might help with basic source code concealment.

Firstly, as Python is an interpreted language, I think you cannot completely protect your Python code: .pyc files can be decompiled to get back .py files (using uncompyle6, for example).
So the only thing you can do is make it very hard to read.
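To see how little compiled bytecode hides, here is a small sketch that compiles a snippet and lists the identifiers and string literals that survive in the code objects. It uses only the built-in compile() and the standard code-object attributes; the function name and the sample source are invented for the example.

```python
import types


def recoverable_strings(source):
    """Compile `source` and collect every name and string constant kept in the bytecode."""
    found = set()
    stack = [compile(source, "<demo>", "exec")]
    while stack:
        code = stack.pop()
        found.update(code.co_names)     # globals, attributes, imported names
        found.update(code.co_varnames)  # arguments and local variables
        for const in code.co_consts:
            if isinstance(const, types.CodeType):
                stack.append(const)     # nested function or class body
            elif isinstance(const, str):
                found.add(const)
    return found
```

For a source such as a check(password) function comparing against a hard-coded string, the result contains the function name, the argument name, and the literal itself, which is the raw material a decompiler works from.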
I recommend having a look at code obfuscation, which consists of making your code unreadable by changing variable and function names, removing comments and docstrings, removing unnecessary whitespace, etc. Pyminifier does that kind of thing.
You can also write your own obfuscation script.
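As a starting point for such a script, the sketch below removes docstrings and comments by round-tripping the source through the standard ast module. It assumes Python 3.9 or newer (for ast.unparse), it does not rename anything, and the function name is made up; it is nowhere near what a dedicated obfuscator does.

```python
import ast


def strip_docstrings_and_comments(source):
    """Return `source` without docstrings; comments vanish because the AST drops them."""
    tree = ast.parse(source)
    owners = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(tree):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            # Keep the body non-empty so the result is still valid Python.
            node.body = node.body[1:] or [ast.Pass()]
    return ast.unparse(tree)
```

The output still behaves like the input, but formatting, comments, and docstrings are gone. Renaming variables safely is a much harder problem and is left out on purpose.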
Then you can also turn your program into a single executable (using pyinstaller for example). I am pretty sure there is a way to get .py files back from the executable, but it just makes it harder. Also beware of cross-platform compatibility when making an executable.

Going through the above responses, my understanding is that some of the strategies mentioned may not work if your client wants to execute your protected script along with other unprotected scripts.
One other option is to encrypt your script and then use an interpreter that can decrypt and execute it. It too has some limitations.
ipepycrypter is a suite that helps protect Python scripts. This is accomplished by hiding the script implementation through encryption. The encrypted script is executed by a modified Python interpreter. ipepycrypter consists of the encryption tool ipepycrypt and the Python interpreter ipepython.
More information is available at https://ipencrypter.com/user-guides/ipepycrypter/

One other option, of course, is to expose the functionality over the web, so that the user can interact through the browser without ever having access to the actual code.

There are several tools which compile Python code into either (a) compiled modules usable with CPython, or (b) a self-contained executable.
https://cython.org/ is the best known, and probably the oldest, and it only takes a very small amount of effort to prepare a traditional Python package so that it can be compiled with Cython.
http://numba.pydata.org/ and https://pythran.readthedocs.io/ can also be used in this way, to produce compiled Python modules such that the source doesn't need to be distributed, and it will be very difficult to decompile the distributable back into usable source code.
https://mypyc.readthedocs.io is a newer player, an offshoot of the mypy toolkit.
Nuitka is the most advanced at creating a self-contained executable. https://github.com/Nuitka/Nuitka/issues/392#issuecomment-833396517 shows that it is very hard to de-compile code once it has passed through Nuitka.
https://github.com/indygreg/PyOxidizer is another tool worth considering, as it creates a self-contained executable of all the needed packages. By default, only basic IP protection is provided, in that the packages inside it are not trivial to inspect, although someone with a bit of knowledge of the tool can easily see the packages enclosed within the binary. It is possible, however, to add custom module loaders, so that the "modules" in the binary can be stored in unintelligible formats.
Finally, there are many Python to C/Go/Rust/etc. transpilers. These will very likely not be usable except for small subsets of the language (e.g. will 3/0 throw the appropriate exception in the target language?), will likely support only a very limited subset of the standard library, and are unlikely to support any imports of packages beyond the standard library. One example is https://github.com/py2many/py2many , but a search for "Python transpiler" will give you many to consider.

Why has the syntax changed from flask.ext.* to flask_*?

It looks like there was a deprecation. How was that decided? Is there a difference between Python 3 and Python 2?

The old flask.ext was deprecated in issue #1135, which was created back in 2014. The actual deprecation notice was turned on in 2016. The reasoning behind the deprecation is:
Some introductory information for new contributors:
Flask used to have flaskext as a namespace for extensions, so they were importable as flaskext.foo. This didn't work well, so the new form flask_foo was introduced. flask.ext.foo is a compatibility layer that will try to import both variants. See http://flask.pocoo.org/docs/0.10/extensions/
flask.ext.foo is hard to maintain, and since now all extensions have switched to the new package naming scheme, it is no longer worth it. We want to deprecate it for 1.0, so we need some sort of tool which can help users to rewrite all their old imports in their apps.
One could write a Python script similar to this beast. This will get the job done, but as its docstring says, it's a terrible hack.
lib2to3 proved useful for writing larger migration tools, but it's nontrivial to use it.
https://github.com/mitsuhiko/python-modernize/ is one based on it, and it seems to me that's the easiest project one could rip off from.
I wasn't able to find complete tutorials that are useful for this. Most seem to be focused on porting to Python 3, which would imply running the default 2to3 fixers on the user's codebase (which we definitely don't want).
One will have to read the source code of 2to3 and lib2to3 to understand, I think. This is doable by entering libraryname hg.python.org into Google, where the libraryname is either 2to3 or lib2to3.
The current state for doing source code manipulation in Python sucks, and I'd like to see a library which wraps lib2to3 and provides a more concise API.
The old .ext was a compatibility layer to support the old flaskext module while waiting for flask_ to standardize.
This separates the flask. namespace from each module's namespace, as the module now lives completely in its own module (flask_module) instead of being loaded into a general namespace for all extensions in Flask. It's also clearer that the module is not bundled as a part of Flask.
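To illustrate the kind of import rewriting the issue is asking for, here is a deliberately naive, line-based regex sketch. It is not the lib2to3-based tool discussed in the issue, the function name is invented, and it only handles the two simplest import forms; anything else is left untouched.

```python
import re

# from flask.ext.foo import Bar   ->   from flask_foo import Bar
_FROM_SUBMODULE = re.compile(r"^(\s*)from\s+flask\.ext\.(\w+)(\s+import\s+.+)$")
# from flask.ext import foo       ->   import flask_foo as foo
_FROM_EXT = re.compile(r"^(\s*)from\s+flask\.ext\s+import\s+(\w+)\s*$")


def rewrite_flask_ext_imports(source):
    """Rewrite simple flask.ext imports to the flask_* naming scheme."""
    rewritten = []
    for line in source.splitlines():
        line = _FROM_SUBMODULE.sub(r"\1from flask_\2\3", line)
        line = _FROM_EXT.sub(r"\1import flask_\2 as \2", line)
        rewritten.append(line)
    return "\n".join(rewritten)
```

Forms such as "import flask.ext.foo", multi-name imports, and imports split across lines are not covered, which is a good part of why the issue reaches for a syntax-aware tool instead of regular expressions.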

deploying python applications

Is it possible to deploy Python applications such that you don't release the source code and you don't have to be sure the customer has Python installed?
I'm thinking maybe there is some installation process that can run a Python app from just the .pyc files and a shared library containing the interpreter, or something like that?
Basically I'm keen to get the development benefits of a language like Python (high productivity etc.) but can't quite see how you could deploy it professionally to a customer where you don't know how their machine is set up and you definitely can't deliver the source.
How do professional software houses developing in Python do it (or maybe the answer is that they don't)?

You protect your source code legally, not technologically. Distributing py files really isn't a big deal. The only technological solution here is not to ship your program (which is really becoming more popular these days, as software is provided over the internet rather than fully installed locally more often.)
If you don't want the user to have to have Python installed but want to run Python programs, you'll have to bundle Python. Your resistance to doing so seems quite odd to me. Java programs have to either bundle or anticipate the JVM's presence. C programs have to either bundle or anticipate libc's presence (usually the latter), etc. There's nothing hacky about using what you need.
Professional Python desktop software bundles Python, either through something like py2exe/cx_Freeze/some in-house thing that does the same thing or through embedding Python (in which case Python comes along as a library rather than an executable). The former approach is usually a lot more powerful and robust.

Yes, it is possible to make installation packages. Look for py2exe, cx_Freeze and others.
No, it is not possible to keep the source code completely safe. There are always ways to decompile.
Original source code can trivially be obtained from .pyc files if someone wants to do it. Code obfuscation would make it more difficult to do something with the code.

I am surprised no one mentioned this before now, but Cython seems like a viable solution to this problem. It will take your Python code and transpile it into CPython compatible C code. You also get a small speed boost (~25% last I checked) since it will be compiled to native machine code instead of just Python byte code. You still need to be sure the user has Python installed (either by making it a pre-requisite pushed off onto the user to deal with, or bundling it as part of the installer process). Also, you do need to have at least one small part of your application in pure Python: the hook into the main function.
So you would need something basic like this:
    import cython_compiled_module

    if __name__ == '__main__':
        cython_compiled_module.main()
But this effectively leaks no implementation details. I think using Cython should meet the criteria in the question, but it also introduces the added complexity of compiling in C, which loses some of Python's easy cross-platform nature. Whether that is worth it or not is up to you.
As others stated, even the resulting compiled C code could be decompiled with a little effort, but it is likely much closer to the type of obfuscation you were initially hoping for.

Well, it depends what you want to do. If by "not releasing the source code" you mean "the customer should not be able to access the source code in any way", well, you're fighting a losing battle. Even programs written in C can be reverse engineered, after all. If you're afraid someone will steal from you, make them sign a contract and sue them if there's trouble.
But if you mean "the customer should not care about python files, and not be able to casually access them", you can use a solution like cx_Freeze to turn your Python application into an executable.

Build a web application in python. Then the world can use it via a browser with zero install.

Checking Python code correctness

In C++ I have a compiler that tells me if something is wrong with my code after refactoring. How can I make sure that Python code is at least correct after changes? There may be some stupid error, like a wrong function name, that is pretty easy to find at compile time.
Thanks

Looks like PyChecker or pylint are what you're looking for

Use an editor or IDE that supports code highlighting. E.g., Notepad++ has a word-highlighting feature that I find very useful.
Use unit tests.
Stupid errors will be weeded out first, so I wouldn't worry too much about this type of error. It's the "smart" errors you should be afraid of.

Use tools such as pylint or PyChecker.
Write unit tests.

Unit test. http://docs.python.org/library/unittest.html
If your tests are written at a reasonable level of granularity, it can be as fast to unit test as it is to run lint or a compiler.
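As a rough illustration of how small such a test can be, here is a sketch using the standard unittest module. The slugify function is a hypothetical stand-in for your own code, and run_tests is only there to show that the suite can be driven from a script.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lowercase and join words with hyphens."""
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  a   b "), "a-b")


def run_tests():
    """Run the suite programmatically and report whether everything passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

If a refactoring renames slugify but misses a call site, the tests fail immediately with a NameError, which is the same class of mistake a C++ compiler would have reported.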

Static analysis (as from the IDE, or from tools like pylint and PyChecker) is a very quick and effective way to check for simple errors and enforce a common style.
Unit tests are a great way to ensure the code honors its contract.
Code reviews and pair programming are among the best ways to find errors of all sorts, and to spread knowledge in a team.
All of the options require some time to set up and to execute. However, the gains are tremendous, and far higher than the investment.

Eclipse has a good python plugin for doing the syntax highlighting and debugging.

Pylint does almost what you are looking for.
You can also force the compilation of your Python files. That will show some basic syntax errors (it doesn't have all the capabilities of a C++ compiler).
I've read this article and decided to make an automated build system with PyDev and Ant. It compiles the Python files and runs the unit tests. The next step is to integrate pylint into that process.
I hope it helps

As with other languages, you should use assertions liberally throughout your code. Use assertions when you must rely on the predicate to be true for the program to run, not as exception/error handling. An assertion should be used to check for irrecoverable errors and force the program to crash. More on assertions (and python error checking in general)
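A minimal sketch of that style follows. The function and its rules are invented for the example; the point is that the assertions document what must be true rather than handle bad user input. Keep in mind that Python strips assert statements when run with the -O flag, so they must never carry logic the program depends on.

```python
def apply_discount(price_cents, percent):
    """Return the discounted price in whole cents."""
    # Preconditions: a violation means the caller has a bug.
    assert isinstance(price_cents, int) and price_cents >= 0, "price must be a non-negative int"
    assert 0 <= percent <= 100, "percent must be between 0 and 100"

    discounted = price_cents * (100 - percent) // 100

    # Postcondition: a discount can never raise the price or make it negative.
    assert 0 <= discounted <= price_cents
    return discounted
```

apply_discount(1000, 25) returns 750, while apply_discount(1000, 150) stops the program with an AssertionError instead of silently producing a negative price.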

You may need this:
python -m py_compile script.py
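The same check can be run from Python, for instance inside a build script. This is a sketch using the standard py_compile module; the helper name is made up, and the temporary directory is only there so the generated .pyc file does not land next to the source.

```python
import os
import py_compile
import tempfile


def syntax_error_in(path):
    """Return None if `path` byte-compiles, otherwise the compiler's error message."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            # cfile keeps the generated .pyc out of the source tree.
            py_compile.compile(
                path, cfile=os.path.join(scratch, "out.pyc"), doraise=True
            )
        except py_compile.PyCompileError as exc:
            return exc.msg
    return None
```

Byte-compiling only catches syntax errors. A call to a misspelled function name still compiles cleanly, so this complements a linter and unit tests rather than replacing them.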

You might also want to check out PEP8 as a style guide for Python Code.
