For some routine work I have found that combining different scripting languages can be the fastest way to get things done. For example, I have a main bash script which calls some awk, Python, and bash scripts, or even some compiled Fortran executables.
I can put all the files into a folder that is in $PATH, but that makes modification a bit slower. If I need a new copy with some modifications, I have to add another path to $PATH as well.
Is there a way to merge these files into a single executable?
For example: tar all the files together and declare somehow that the main script is main.sh? This way I could simply vi the file, modify, run, modify, run ..., and I could also move the file between folders and machines easily. Dependencies could be handled properly too (executing the tar could set PATH itself).
I hope this dream does exist! Thanks for the comments!
Janos
From a software engineering point of view, this approach does not sound so great because your programs would be badly structured by design (but maybe that is not a problem in your case). Also, you will lose syntax highlighting support in your editor when combining multiple languages in one file. I would rather suggest building a package of all your programs, distributing them together, and after deployment having the main script call the other programs.
That being said, you could use here documents in Bash, see bash(1) (search for "Here Documents"), to include other text scripts in your main script, then have the main script write them to a temporary directory and execute them. This will not help you with your compiled code though, unless you have a compiler on your target machine.
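A minimal sketch of that here-document approach; the embedded awk and Python snippets and the file names are just placeholders:

#!/bin/bash
# Unpack the embedded helper scripts into a throwaway directory, then call them.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/helper.awk" <<'EOF'
{ sum += $1 } END { print sum }
EOF
cat > "$workdir/helper.py" <<'EOF'
import sys
print(sum(float(line) for line in sys.stdin))
EOF
# Use the unpacked helpers exactly as if they were separate files:
seq 1 10 | awk -f "$workdir/helper.awk"
seq 1 10 | python "$workdir/helper.py"

The quoted 'EOF' delimiters keep Bash from expanding anything inside the embedded scripts, so the whole bundle stays a single editable text file.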
This Linux Journal article might be interesting for you too, as it shows how to include a binary payload (e.g. a tgz file) in a Bash script. You could use this to add a compressed archive containing your programs to a never-changing main script.
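A rough sketch of that payload technique, where everything below a marker line is a tgz appended to the wrapper (all names here are placeholders; you would build the combined file with something like cat wrapper.sh payload.tgz > myapp.sh):

#!/bin/bash
# Self-extracting wrapper: unpack the embedded tgz and run the real main script.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
payload_line=$(awk '/^__ARCHIVE__$/ { print NR + 1; exit }' "$0")
tail -n +"$payload_line" "$0" | tar -xz -C "$workdir"
export PATH="$workdir:$PATH"   # lets the unpacked helpers find each other
"$workdir/main.sh" "$@"
exit 0
__ARCHIVE__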
Thanks for the answers!
Of course this is not a good approach for providing published code! Sorry if that was confusing. But it is a good approach if you are developing, e.g., a scientific idea, and you want to obtain a proof-of-concept result fast and do similar tasks several times while quickly swapping out parts of the algorithm. Note that existing code is sometimes available for parts of the task, and it only needs to be modified a bit (a few lines). I am a big believer in re-implementing everything, but first it is good to know whether it is worth doing!
As a compromise: can I directly call a script that is wrapped in a tar or zip archive without being compressed?
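Something like the following is what I have in mind (bundle.tar, bundle.zip and main.sh being placeholders), streaming the main script out of the archive and piping it to the interpreter:

tar -xOf bundle.tar main.sh | bash -s -- arg1 arg2
unzip -p bundle.zip main.sh | bash -s -- arg1 arg2

Though I realize main.sh would then run without its helpers unpacked, unless it extracts them itself.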
Thanks again,
J.
I have a question about adding the project path in Python, to make imports easier.
Situation
When I write code in Python, I usually add the necessary paths to sys.path using
import sys
sys.path.append("/path/to/dir/") # almost every `.py` file needs this
Sometimes, when my project gets bigger with many levels of directories, this approach seems bulky and error-prone (especially when I re-organize my files).
Recently, I started using a bash script (located at the project root directory) that adds the sys.path.append line, with the project root as its argument, to every .py file in the project. With this approach, I hardly have to care about importing modules manually.
Question
My question is: is that a good practice? I find it convenient compared to my old method, but since the bash script is a separate file, I need two commands to run any script in my project (one for the bash script and one for the .py file). I can include the command that calls the .py file in the bash script, but that is far less flexible than calling it directly from the terminal.
I would really like to hear some advice! Thanks in advance; any suggestion will be gratefully appreciated!
It is generally not good practice to manipulate sys.path within a Python library or program. You should instead add the relevant paths to PYTHONPATH in the calling environment of your Python program:
PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH" python ...
or
export PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH"
python ...
This allows you to easily manipulate the paths that your program or library will search for dependencies without modifying your code.
It is also very easy to manage this in your personal development environment by modifying your .bashrc, or in your production environments in your init script (or other wrapper script), and it gives you one place to update each time you add or modify your project paths.
Given that you mention that you have almost one directory per .py file, you should also consider how your code might be organized into packages to further simplify your setup.
It's not a particularly good practice, though you could get away with it. Better to look into virtualenv (or pipenv) for a smoother workflow.
I have written a Python program which takes an input data file, performs some processing on the data, and writes another data file as output.
I now need to distribute my code, but the users should not see the source code; they should just be able to give the input and get the output!
I have never done this before.
I would appreciate any advice on how to achieve this in the easiest way.
Thanks a lot in advance
As Python is an interpreted language by design, and it compiles code to bytecode (which does not help you conceal anything, since bytecode is easy to reverse), there is no really secure way to hide your source code so that it cannot be recovered; that is true of any programming language, really.
If you had wanted to work with a language that cannot be reversed so easily, you should have gone for a native language that compiles directly to the underlying architecture's machine code, which is significantly harder to reproduce in the original language, let alone read, due to compiler optimizations, the overhead of CISC instruction sets, et cetera.
However, there are libraries that convert your source code into an executable format (by packing the Python interpreter and the bytecode alongside it), such as:
cx_Freeze - for freezing any code >=Python 2.7 for any platform, allegedly.
PyInstaller - for freezing general-purpose code; it additionally states that it works with third-party libraries.
py2exe - for freezing code into a Windows-only executable format.
Or you might consider a substitute for this: code obfuscation, which still ships the source code but makes it near-impossible to understand.
However, an issue with this is that it makes extending the code harder, as bad obfuscation techniques can effectively freeze the code in place. The obfuscated code may also carry overhead from redundant code whose only purpose is to trick the reader into thinking it does something it does not.
In general it also goes against the open-source practice that the Python community loves and supports.
So, to conclude (if you don't want to read everything above): the first thing you did wrong was to choose Python for this, a language that supports open source and is open source itself. To mitigate the issue, you should either reconsider the language or follow the references above to modules which might help with basic source-code concealment.
Firstly, as Python is an interpreted language, I think you cannot completely protect your Python code: .pyc files can be decompiled to get back .py files (using uncompyle6, for example).
So the only thing you can do is make it very hard to read.
I recommend having a look at code obfuscation, which consists in making your code unreadable by changing variable/function names, removing comments and docstrings, removing useless spaces, etc. Pyminifier does that kind of thing.
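A rough sketch of running Pyminifier from the command line (assuming it is installed via pip; myscript.py is a placeholder, and the exact flag set may differ in your version, so check pyminifier --help); the transformed source is written to stdout:

pip install pyminifier
pyminifier --obfuscate myscript.py > myscript_obf.py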
You can also write your own obfuscation script.
Then you can also turn your program into a single executable (using pyinstaller for example). I am pretty sure there is a way to get .py files back from the executable, but it just makes it harder. Also beware of cross-platform compatibility when making an executable.
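For the single-executable route, a minimal PyInstaller sketch (myscript.py is a placeholder); --onefile bundles the interpreter, your bytecode and the dependencies into one binary under dist/:

pip install pyinstaller
pyinstaller --onefile myscript.py
# result: dist/myscript (dist\myscript.exe on Windows)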
Going through the above responses, my understanding is that some of the strategies mentioned may not work if your client wants to execute your protected script alongside other unprotected scripts.
One other option is to encrypt your script and then use an interpreter that can decrypt and execute it. It too has some limitations.
ipepycrypter is a suite that helps protect Python scripts. This is accomplished by hiding the script implementation through encryption. The encrypted script is executed by a modified Python interpreter. ipepycrypter consists of the encryption tool ipepycrypt and the Python interpreter ipepython.
More information is available at https://ipencrypter.com/user-guides/ipepycrypter/
One other option, of course, is to expose the functionality over the web, so that the user can interact through the browser without ever having access to the actual code.
There are several tools which compile Python code into either (a) compiled modules usable with CPython, or (b) a self-contained executable.
https://cython.org/ is the best known, and probably the oldest, and it takes only a very small amount of effort to prepare a traditional Python package so that it can be compiled with Cython.
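As a rough sketch of that small effort (the package and module names here are placeholders, not a recipe from the Cython docs), a conventional setup.py can compile the pure-Python sources into extension modules:

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="mypackage",
    ext_modules=cythonize("mypackage/*.py",
                          compiler_directives={"language_level": "3"}),
)

Building with python setup.py build_ext --inplace (or building a wheel) then produces .so/.pyd files that can be shipped instead of the .py sources.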
http://numba.pydata.org/ and https://pythran.readthedocs.io/ can also be used in this way to produce compiled Python modules, such that the source doesn't need to be distributed, and it will be very difficult to decompile the distributable back into usable source code.
https://mypyc.readthedocs.io is a newer player, an offshoot of the mypy toolkit.
Nuitka is the most advanced at creating a self-contained executable. https://github.com/Nuitka/Nuitka/issues/392#issuecomment-833396517 shows that it is very hard to de-compile code once it has passed through Nuitka.
https://github.com/indygreg/PyOxidizer is another tool worth considering, as it creates a self-contained executable of all the needed packages. By default, only basic IP protection is provided, in that the packages inside it are not trivial to inspect; for someone with a bit of knowledge of the tool, however, it is easy to see the packages enclosed within the binary. It is possible to add custom module loaders, though, so that the "modules" in the binary can be stored in unintelligible formats.
Finally, there are many Python to C/go/rust/etc transpilers, however these will very likely not be usable except for small subsets of the language (e.g. will 3/0 throw the appropriate exception in the target language?), and likely will only support a very limited subset of the standard library, and are unlikely to support any imports of packages beyond the standard library. One example is https://github.com/py2many/py2many , but a search for "Python transpiler" will give you many to consider.
I have been developing a fairly extensive library of Python modules that automate the more time-consuming parts of "3D character development" for games/film/TV.
Up until a few months ago, all of my code has been run within Maya's dedicated Python interpreter; however, my GUIs are built in PySide/PyQt and so run just fine on Mac/Windows/Linux or in a few other graphics programs such as Nuke, XSI, and Max.
What I would really like to figure out is a "simple" way to distribute my code to various different people ---> using various different operating systems ---> potentially using various applications (Nuke, XSI, Max) which, in turn, have their own dedicated Python interpreters.
The obvious option would be pip and easy_install. These tools are clearly the "right" way to go, but it's not really clear how a user would install/run them under the dedicated Python installs that ship with Maya/Nuke/etc. Though it does seem possible (as explained here), it's still going to be a pretty big barrier for a less technical user.
Any help or points in the right direction would be immensely appreciated..
I would not say that pip/easy_install are the 'right' way for this problem. They are pretty good (not quite 'great') tools for motivated, technically inclined users -- but even in that context they have issues (such as unintended upgrades or deletions). Most importantly, they are opt-in methods: nobody can make you pip unless you want to. This means users can accidentally or deliberately get themselves into very different positions from each other, which makes support and maintenance a nightmare.
I've had very good luck in Maya distributing a zipped file containing a complete environment: all the modules, etc. userSetup.py adds that zip to the path, and Python's native zipimport functionality handles the rest. This ensures there is only one file to maintain and distribute. It also fixes the common problem of leftover .pyc files creating havoc after .py files get moved or renamed. Since this is all standard Python, I'd assume this will work for any app-specific Python that uses a 2.6+ version, though I've never tried it in Nuke or Max.
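A minimal sketch of the userSetup.py side of this, assuming the zip sits in the user's Maya scripts directory (the file name and location are placeholders):

# userSetup.py
import os
import sys

tools_zip = os.path.join(os.path.expanduser("~"), "maya", "scripts", "studio_tools.zip")
if tools_zip not in sys.path:
    sys.path.insert(0, tools_zip)   # zipimport makes 'import mymodule' look inside the zip

After this runs, regular import statements transparently pull modules out of the single distributed zip.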
The main wrinkle will be modules with .pyd or other binary components; typically these don't work inside zip files. I include a bootstrap routine which unpacks those to a (disposable) location on the user's disk and adds that to the path.
There's a detailed discussion of the method here and some background here.
I'm working on an Inno Setup installer for a Python application for Windows 7, and I have these requirements:
The app shouldn't write anything to the installation directory
It should be able to use .pyc files
The app shouldn't require a specific Python version, so I can't just add a set of .pyc files to the installer
Is there a recommended way of handling this? Like give the user a way to (re)generate the .pyc files? Or is the shorter startup time benefit from the .pyc files usually not worth worrying about?
.pyc files aren't guaranteed to be compatible across different Python versions. If you don't know that all your customers are running the same Python version, you really don't want to distribute .pyc files directly. So, you have to choose between distributing .pyc files and supporting multiple Python versions.
You could create a build process that compiles all your files using py_compile and zips them up into a version-specific package. You can do this with setuptools; however, it will be awkward because you'll have to run py_compile under every version you need to support.
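A rough sketch of such a build step under Python 3 (compileall is the stdlib wrapper around py_compile; the package name and output layout are assumptions, and under 2.x you would drop legacy=True since .pyc files land next to the sources anyway):

# build_bundle.py
import compileall
import sys
import zipfile
from pathlib import Path

src = Path("myapp")                        # package source directory (placeholder)
tag = "py{}{}".format(*sys.version_info[:2])
out = "myapp-{}.zip".format(tag)           # e.g. myapp-py39.zip

# legacy=True writes foo.pyc next to foo.py instead of into __pycache__
compileall.compile_dir(str(src), quiet=1, legacy=True)

with zipfile.ZipFile(out, "w") as zf:
    for pyc in src.rglob("*.pyc"):
        zf.write(str(pyc), str(pyc.relative_to(src.parent)))
print("wrote", out)

You would repeat this once per interpreter version you intend to support.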
If you are basically distributing a closed application and don't want people to have trivial access to your source code, then py2exe is probably a simpler alternative. If your code is supposed to be integrated into the user's Python install, then it's probably simpler to just create a zip of your .py files and add a one-line .py stub that imports the zipped package(s) from the zip file (via Python's built-in zipimport support).
If it makes you feel better: .pyc files don't provide much extra security, and they don't really boost performance much either. :)
If you haven't read PEP 3147, that will probably answer your questions.
I don't mean the solution described in that PEP and implemented as of Python 3.2. That's great if your "multiple Python versions" just means "3.2, 3.3, and probably future 3.x". Or even if it means "2.6+ and 3.1+, but I only really care about 3.2 and 3.3, so if I don't get the pyc speedups for other ones that's OK".
But when I asked your supported versions, you said, "2.7", which means you can't rely on PEP 3147 to solve your problems.
Fortunately, the PEP is full of discussion of earlier attempts to solve the problem, and the pitfalls of each, and there should be more than enough there to figure out what the options are and how to implement them.
The one problem is that the PEP is very Linux-centric, mainly because it's primarily Linux distros that tried to solve the problem in the past. (Apple also did so, but their solution was (a) pretty much working, and (b) tightly coupled with the whole Mac-specific "framework" thing, so it was mostly ignored...)
So, it largely leaves open the question of "Where should I put the .pyc files on Windows?"
The best choice is probably an app-specific directory under the user's local application data directory. See Known Folders if you can require Vista or later, CSIDL if you can't. Either way, you're looking for the FOLDERID_LocalAppData or CSIDL_LOCAL_APPDATA, which is:
The file system directory that serves as a data repository for local (nonroaming) applications. A typical path is C:\Documents and Settings\username\Local Settings\Application Data.
The point is that it's a place for applications to store data that's separate for each user (and inside that user's profile directory), and also separate for each machine the user's roaming profile might end up on. That means you can safely put stuff there and know that the user has permission to write there without UAC getting involved, and also know (as well as you ever can) that no other user or machine will interfere with what's there.
Within that directory, you create a directory for your program, and put whatever you want there, and as long as you picked a unique name (e.g., My Unique App Name or My Company Name\My App Name or a UUID), you're safe from accidental collision with other programs. (There used to be specific guidelines on this in MSDN, but I can no longer find them.)
So, how do you get to that directory?
The easiest way is to just use the env variable %LOCALAPPDATA%. If you need to deal with older Windows, you can use %USERPROFILE% and tack \Local Settings\Application Data onto the end, which is guaranteed to either be the same, or end up in the same place via junctions.
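In Python that lookup might look like this (a sketch; the app and subdirectory names are placeholders):

import os

appdata = os.environ.get("LOCALAPPDATA")
if not appdata:  # pre-Vista fallback described above
    appdata = os.path.join(os.environ["USERPROFILE"],
                           "Local Settings", "Application Data")

cache_dir = os.path.join(appdata, "My Unique App Name", "pyc-cache")
os.makedirs(cache_dir, exist_ok=True)   # user-writable, so no UAC prompt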
You can also use pywin32 or ctypes to access the native Windows APIs (since there are at least 3 different APIs for this and at least two ways to access those APIs, I don't want to give all possible ways to write this… but a quick google or SO search for "pywin32 SHGetFolderPath" or "ctypes SHGetKnownFolderPath" or whatever should give you what you need).
Or, there are multiple third-party modules to handle this. The first one both Google and PyPI turned up was winshell.
Re-reading the original question, there's a much simpler answer that probably fits your requirements.
I don't know much about Inno, but most installers give you a way to run an arbitrary command as a post-copy step.
So, you can just use python -m compileall to create the .pyc files for you at install time—while you've still got elevated privileges, so there's no problem with UAC.
In fact, if you look at pywin32, and various other Python packages that come as installer packages, they do exactly this. This is an idiomatic thing to do for installing libraries into the user's Python installation, so I don't see why it wouldn't be considered reasonable for installing an executable that uses the user's Python installation.
Of course if the user later decides to uninstall Python 2.6 and install 2.7, your .pyc files will be hosed… but from your description, it sounds like your entire program will be hosed anyway, and the recommended solution for the user would probably be to uninstall and reinstall anyway, right?
I decided to rewrite all our Bash scripts in Python (there are not so many of them) as my first Python project. The reason is that although I am quite fluent in Bash, I feel it's a somewhat archaic language, and since our system is in the first stages of its development, I think switching to Python now is the right thing to do.
Are there scripts that should always be written in Bash? For example, we have an init.d daemon script - is it OK to use Python for it?
We run CentOS.
Thanks.
It is OK in the sense that you can do it. But the scripts in /etc/init.d usually need to load config data and some helper functions (for example, to print the nice green OK on the console), which will be hard to emulate in Python.
So try to convert those which make sense (i.e. those which contain complex logic). If you need job control (starting/stopping processes), then bash is better suited than Python.
Generally, scripts in /etc/init.d are written in the "native shell" of the OS (e.g. bash, sh, posix-sh, etc). This is especially true of scripts that will be run at the lower init levels (e.g. not every directory will be mounted in single user mode, including wherever python or the site-libraries might be installed).
Most OS's provide some "helper functions" that make writing scripts in some native shell easier. These scripts define certain return codes and messages that are required/desired when writing service scripts. On RedHat based systems, see:
/etc/init.d/functions
Beyond that, the service scripts in /etc/init.d can be written in any language (including compiled languages). The general calling syntax will need to be supported. Typically there are three arguments that should be supported: start, stop, and status. Some additional arguments might be appropriate, depending on the purpose of the scripts.
% /etc/init.d/foo (start|stop|status)
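A minimal RedHat-style sketch of such a script; foo and its paths are placeholders, and the sourced functions file provides daemon, killproc, status and the green OK output:

#!/bin/sh
# /etc/init.d/foo
. /etc/init.d/functions

case "$1" in
    start)
        echo -n "Starting foo: "
        daemon /usr/local/bin/foo
        echo
        ;;
    stop)
        echo -n "Stopping foo: "
        killproc foo
        echo
        ;;
    status)
        status foo
        ;;
    *)
        echo "Usage: $0 {start|stop|status}"
        exit 1
        ;;
esac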
Every task has languages that are better suited for it and languages that are less so. Replacing the backtick `...` command substitution of sh is pretty ponderous in Python, as are myriad quoting details, just to name a couple. There are likely better projects to cut your teeth on.
And there is all that was said above about Python being relatively heavyweight and not necessarily available when needed.
Certain scripts that I write simply involve looping over a glob in some directories and then executing a piped series of commands on the results. This kind of thing is much more tedious in Python.
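For instance, the whole pattern is a few lines of shell (the glob and the pipeline here are placeholders):

for f in /var/log/myapp/*.log; do
    grep ERROR "$f" | sort | uniq -c | sort -rn > "${f%.log}.summary"
done

Reproducing the same thing in Python means subprocess plumbing or re-implementing each pipeline stage by hand.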