I've previously used http://code.google.com/p/wkhtmltopdf/ with http://pypi.python.org/pypi/wkhtmltopdf/0.2 to create screenshots of websites from the command line. However, I was wondering whether a pure python package exists, that can do the same. Currently I always need to download the correct binary of http://code.google.com/p/wkhtmltopdf/ if I switch computers. A pure python package would relieve me from this. Any ideas?
That would require a browser engine written in pure python. And this means you need a CSS processor und, more important, a complete Javascript engine written in Python. While this is undoubtedly possible, I'm pretty sure nobody has done it.
Related
I have written a python code which takes an input data file, performs some processing on the data and writes another data file as output.
I should distribute my code now but the users should not see the source code but be able to just giving the input and getting the output!
I have never done this before.
I would appreciate any advice on how to achieve this in the easiest way.
Thanks a lot in advance
As Python is an interpreted language by design; and as it compiles code to a bytecode (- which doesn't help the fact you're trying to conceal it, as bytecodes are easier to reverse -) there's no real secure way to hide your source code whereby it is not recoverable, as is true for any programming language, really.
Initially, if you'd wanted to work with a language that can't be so easily reversed- you should've gone for a more native language which compiles directly to the underlying architecture's machine code which is significantly harder to reproduce in the original language let alone read due to neat compiler optimizations, the overhead given by CISC et cetera.
However, some libraries that do convert your source code into an executable format (by packing the Python interpreter and the bytecode alongside it) can be used such as:
cx_Freeze - for freezing any code >=Python 2.7 for any platform, allegedly.
PyInstaller - for freezing general purpose code, it does state additionally that it works with third-party libraries.
py2exe -for freezing code into Windows-only executable format.
Or you might consider a substitute for this, which is code obfuscation which still allows the user to read the source code however make it near-to-impossible to read.
However, an issue brought up with this is that, it'd be harder for code addition as bad code obfuscation techniques could make the code static. Also, on the latter case, the code could have overhead brought by redundant code meant to fool or trick the user into thinking the code is doing something which it is not.
Also in general it negates the standard practice of open-source which is what Python loves to do and support.
So to really conclude, if you don't want to read everything above; the first thing you did wrong was choose Python for this, a language that supports open source and is open source as well. Thus to mitigate the issue you should either reconsider the language, or follow the references above to links to modules which might help aide basic source code concealment.
Firstly, as Python is an interpreted language, I think you cannot completely protect your Python code, .pyc files can be uncompiled to get back .py files (using uncompyle6 for example).
So the only thing you can do is make it very hard to read.
I recommend to have a look at code obfuscation, which consists in making your code unreadable by changing variables/function names, removing comments and docstrings, removing useless spaces, etc. Pyminifier does that kind of things.
You can also write your own obfuscation script.
Then you can also turn your program into a single executable (using pyinstaller for example). I am pretty sure there is a way to get .py files back from the executable, but it just makes it harder. Also beware of cross-platform compatibility when making an executable.
Going through above responses, my understanding is that some of the strategies mentioned may not work if your client wants to execute your protected script along with other unprotected scripts.
One other option is to encrypt your script and then use an interpreter that can decrypt and execute it. It too has some limitations.
ipepycrypter is a suite that helps protect python scripts. This is accomplished by hiding script implementation through encryption. The encrypted script is executed by modifed python interpreter. ipepycrypter consists of encryption tool ipepycrypt and python interpreter ipepython.
More information is available at https://ipencrypter.com/user-guides/ipepycrypter/
One other option, of course, is to expose the functionality over the web, so that the user can interact through the browser without ever having access to the actual code.
There are several tools which compile Python code into either (a) compiled modules usable with CPython, or (b) a self-contained executable.
https://cython.org/ is the best known, and probably? oldest, and it only takes a very small amount of effort to prepare a traditional Python package so that it can be compiled with Cython.
http://numba.pydata.org/ and https://pythran.readthedocs.io/ can also be used in this way, to produce Python compiled modules such that the source doesnt need to be distributed, and it will be very difficult to decompile the distributable back into usable source code.
https://mypyc.readthedocs.io is newer player, an offshoot of the mypy toolkit.
Nuitka is the most advanced at creating a self-contained executable. https://github.com/Nuitka/Nuitka/issues/392#issuecomment-833396517 shows that it is very hard to de-compile code once it has passed through Nuitka.
https://github.com/indygreg/PyOxidizer is another tool worth considering, as it creates a self-contained executable of all the needed packages. By default, only basic IP protection is provided, in that the packages inside it are not trivial to inspect. However for someone with a bit of knowledge of the tool, it is trivial to see the packages enclosed within the binary. However it is possible to add custom module loaders, so that the "modules" in the binary can be stored in unintelligible formats.
Finally, there are many Python to C/go/rust/etc transpilers, however these will very likely not be usable except for small subsets of the language (e.g. will 3/0 throw the appropriate exception in the target language?), and likely will only support a very limited subset of the standard library, and are unlikely to support any imports of packages beyond the standard library. One example is https://github.com/py2many/py2many , but a search for "Python transpiler" will give you many to consider.
During developing one of my applications, I've come to a point where I'd like to give the users a more powerful filter. Therefore, I'd like to provide a simple scripting interface to the users. The scripting language would be Python.
For obvious reasons, I'd like to tighten the scope of the language to match my particular purposes (I don't want the users to touch the server's HDD files etc.). I also don't want to write a Python interpreter myself (which would be reinventing the wheel and the "new" wheel would be rectangular in the end). However, I haven't found any suitable library or module for this purpose.
Groovy's approach with its Compilation Customizers and Compiler Configuration would be exactly what I want, does something similar exist for Python?
What you're looking for is called a "sandbox" or "restricted execution." This wiki page discusses some of the details.
In a nutshell, there have been several efforts by Python geeks and gurus to build a sandbox on top of Python but they all failed eventually.
The main reason is that Python offers so many paths to do something that the sandbox would either have to forbid common use cases (rendering a lot of the library and Python code useless) or it would have to have holes in the sandbox which would make the concept useless.
So while it looks like a good and simple idea, so far, there is no solution. AFAIK, there are no hooks in Python to tweak the byte code compiler to achieve something like Groovy Sandbox.
Related:
How can I sandbox Python in pure Python?
Is there a "safe" subset of Python for use as an embedded scripting language?
Is it possible to deploy python applications such that you don't release the source code and you don't have to be sure the customer has python installed?
I'm thinking maybe there is some installation process that can run a python app from just the .pyc files and a shared library containing the interpreter or something like that?
Basically I'm keen to get the development benefits of a language like Python - high productivity etc. but can't quite see how you could deploy it professionally to a customer where you don't know how there machine is set up and you definitely can't deliver the source.
How do professional software houses developing in python do it (or maybe the answer is that they don't) ?
You protect your source code legally, not technologically. Distributing py files really isn't a big deal. The only technological solution here is not to ship your program (which is really becoming more popular these days, as software is provided over the internet rather than fully installed locally more often.)
If you don't want the user to have to have Python installed but want to run Python programs, you'll have to bundle Python. Your resistance to doing so seems quite odd to me. Java programs have to either bundle or anticipate the JVM's presence. C programs have to either bundle or anticipate libc's presence (usually the latter), etc. There's nothing hacky about using what you need.
Professional Python desktop software bundles Python, either through something like py2exe/cx_Freeze/some in-house thing that does the same thing or through embedding Python (in which case Python comes along as a library rather than an executable). The former approach is usually a lot more powerful and robust.
Yes, it is possible to make installation packages. Look for py2exe, cx_freeze and others.
No, it is not possible to keep the source code completely safe. There are always ways to decompile.
Original source code can trivially be obtained from .pyc files if someone wants to do it. Code obfuscation would make it more difficult to do something with the code.
I am surprised no one mentioned this before now, but Cython seems like a viable solution to this problem. It will take your Python code and transpile it into CPython compatible C code. You also get a small speed boost (~25% last I checked) since it will be compiled to native machine code instead of just Python byte code. You still need to be sure the user has Python installed (either by making it a pre-requisite pushed off onto the user to deal with, or bundling it as part of the installer process). Also, you do need to have at least one small part of your application in pure Python: the hook into the main function.
So you would need something basic like this:
import cython_compiled_module
if __name__ == '__main__':
cython_compiled_module.main()
But this effectively leaks no implementation details. I think using Cython should meet the criteria in the question, but it also introduces the added complexity of compiling in C, which loses some of Python's easy cross-platform nature. Whether that is worth it or not is up to you.
As others stated, even the resulting compiled C code could be decompiled with a little effort, but it is likely much more close to the type of obfuscation you were initially hoping for.
Well, it depends what you want to do. If by "not releasing the source code" you mean "the customer should not be able to access the source code in any way", well, you're fighting a losing battle. Even programs written in C can be reverse engineered, after all. If you're afraid someone will steal from you, make them sign a contract and sue them if there's trouble.
But if you mean "the customer should not care about python files, and not be able to casually access them", you can use a solution like cx_Freeze to turn your Python application into an executable.
Build a web application in python. Then the world can use it via a browser with zero install.
Recently I have become a fan of storing various settings used for my testing scripts in the OSX defaults system as it allows me to keep various scripts in git and push them to github without worrying about leaving passwords/settings/etc hardcoded into the script.
When writing a shell script using simple bash commands, it is easy enough to use backticks to call the defaults binary to read the preferences and if there is an error reading the preference, the script stops execution and you can see the error and fix it. When I try to do a similar thing in Python or Ruby it tends to be a little more annoying since you have to do additional work to check the return code of defaults to see if there is an error.
I have been attempting to search via google off and on for a library to use the OSX defaults system which ends up being somewhat difficult when "defaults" is part of your query string.
I thought of trying to read the plist files directly but it seems like the plist libraries I have found (such as the built in python one) are only able to read the XML ones (not the binary ones) which is a problem if I ever set anything with the defaults program since it will convert it back to a binary plist.
Recently while trying another search for a Python library I changed the search terms to something using something like NSUserDefaults (I have now forgotten the exact term) I found a Python library called userdefaults but it was developed for an older version of OSX (10.2) with an older version of Python (2.3) and I have not had much luck in getting it to compile on OSX 10.6 and Python 2.6
Ideally I would like to find a library that would make it easy to read from (and as a bonus write to) the OSX defaults system in a way similar to the following python psudo code.
from some.library.defaults import defaults
settings = defaults('com.example.app')
print settings['setting_key']
Since I am also starting to use Ruby more, I would also like to find a Ruby library with similar functionality.
It may be that I have to eventually just 'give up' and write my own simple library around the defaults binary but I thought it wouldn't hurt to try to query others to see if there was an existing solution.
You´ll want to use PyObjC: have a look at this article at mactech.com (in specific: scroll down to "Accessing plists Via Python"). And this article from oreilly on PyObjC.
Run this, for example:
from Foundation import *
standardUserDefaults = NSUserDefaults.standardUserDefaults()
persistentDomains = standardUserDefaults.persistentDomainNames()
persistentDomains.objectAtIndex_(14)
aDomain = standardUserDefaults.persistentDomainForName_(persistentDomains[14])
aDomain.keys()
I'm currently looking at python because I really like the text parsing capabilities and the nltk library, but traditionally I am a .Net/C# programmer. I don't think IronPython is an integration point for me because I am using NLTK and presumably would need a port of that library to the CLR. I've looked a little at Python for .NET and was wondering if this was a good place to start. Is there a way to marshal a python class into C#? Also, is this solution still being used? Better yet, has anyone done this? One thing I am considering is just using a persistence medium as a go-between (parse in Python, store in MongoDB, and run site in .NET).
NLTK is pure-python and thus can be made to run on IronPython easily. A search turned up this ticket - all one has to do is install a couple of extra Python libraries that don't come by default with IronPython.
This is probably the easiest way for you to integrate. Otherwise, you'll have to either run Python as a subprocess, which sounds complex, or run Python as a server that answers your requests. This is probably the most scalable, though complex, approach. If you go this way, consider Twisted to simplify the server code.
But do try IronPython first...
I don't know why you have a problem with IronPython. you can still use any and all nltk calls there.
To answer your question about porting a Python class into C#: try compiling your python code into an EXE. This creates a DLL with all your python classes in it. This is something that has been around for a while and it has worked like a charm for me in the past
Just an Idea
How about running Python behind as a server, and connect it from .NET with socket?
Since NLTK loading take time and better load it in advance anyway.