Related
When a script is invoked explicitly with python, the argv is mucked with so that argv[0] is the path to the script being run. This is the case if invoked as python foo/bar.py or even as python -m foo.bar.
I need a way to recover the original argv (ie. the one received by python). Unfortunately, it's not as easy as prepending sys.executable to sys.argv because python foo/bar.py is different than python -m foo.bar (the implicit PYTHONPATH differs, which can be crucial depending on your module structure).
More specifically in the cases of python foo/bar.py some other args and python -m foo.bar some other args, I'm looking to recover ['python', 'foo/bar.py', 'some', 'other', 'args'] and ['python', '-m', 'foo.bar', 'some', 'other', 'args'], respectively.
I am aware of prior questions about this:
how to get the ORIGINAL command line in python? with spaces, tabs, etc
Full command line as it was typed
But these seem to have a misunderstanding of how shells work and the answers reflect this. I am not interested in undoing the work of the shell (eg. evaluated shell vars and functions are fine), I just want to get at the original argv given to python.
The only solution I've found is to use /proc/<PID>/cmdline:
import os
with open("/proc/{}/cmdline".format(os.getpid()), 'rb') as f:
original_argv = f.read().split('\0')[:-1]
This does work, but it is Linux-only (no OSX, and Windows support seems to require installing the wmi package). Fortunately for my current use case this restriction is fine. But, it would be nice to have a cleaner, cross platform approach.
The fact that that /proc/<PID>/cmdline approach works gives me hope that python isn't execing before it runs the script (at least not the syscall exec, but maybe the exec builtin). I remember reading somewhere that all of this argument handling (ex. -m) is done in pure python, not C (this is confirmed by the fact that python -m this.does.not.exist will produce an exception that looks like it came from the runtime). So, I'd venture a guess that somewhere in pure python the original argv is available (perhaps this requires some spelunking through the runtime initialization?).
tl;dr Is there a cross platform (builtin, preferably) way to get at the original argv passed to python (before it remove the python executable and transforms -m blah into blah.py)?
edit From spelunking, I discovered Py_GetArgcArgv, which can be accessed via ctypes (found it here, links to several SO posts that mention this approach):
import ctypes
_argv = ctypes.POINTER(ctypes.c_wchar_p)()
_argc = ctypes.c_int()
ctypes.pythonapi.Py_GetArgcArgv(ctypes.byref(_argc),
ctypes.byref(_argv))
argv = _argv[:_argc.value]
print(argv)
Now this is OS-portable, but not python implementation portable (only works on cpython and ctypes is yucky if you don't need it). Also, peculiarly, I don't get the right output on Ubunutu 16.04 (python -m foo.bar gives me ['python', '-m', '-m']), but I may just be making a silly mistake (I get the same behavior on OSX). It would be great to have a fully portable solution (that doesn't dig into ctypes).
Python 3.10 adds sys.orig_argv, which the docs describe as the arguments originally passed to the Python executable. If this isn't exactly what you're looking for, it may be helpful in this or similar cases.
There were a bunch of possibilities considered, including changing sys.argv, but this was, I think, wisely chosen as the most effective and non-disruptive option.
This seems XY problem and you are getting into the weeds in order to accommodate some existing complicated test setup (I've found the question behind the question in your comment). Further efforts would be better spent writing a sane test setup.
Use a better test runner, not unittest.
Create any initial state within the test setup, not in the external environment before entering the Python runtime.
Use a plugin for the randomization and seed stuff, personally I use this one but there are others.
For example if you decide to go with pytest runner, all the test setup can be configured within a [tool.pytest.ini_options] section of the pyproject.toml file and/or with a fixture defined in conftest.py. Overriding the default test configuration can be done with environment variables and/or command line arguments, and neither of these approaches will get mucked around by the shell or during Python interpreter startup.
The manner in which to execute the test suite can and should be as simple as executing a single command:
pytest
And then your perceived problem of needing to recover the original sys.argv will go away.
Your stated problem is:
User called my app with environment variables and arguments.
I want to display a "run like this" diagnostic that will exactly reproduce the results of the current run.
There are at least two solutions:
Abandon the "reproduction" aspect, since the original bash calling command is lost to the portable python app, and instead go for "same effect".
Use a wrapper to capture the original calling command, as suggested by Jean-François Fabre.
With (1) you would be willing to accept ['-m', 'foo'] becoming ['foo.py'], or even turning it into ['/some/dir/foo.py'] in case PYTHONPATH could cause trouble. Displaying ['a', 'b c'] as "a" "b c", or more concisely as a "b c", is straightforward. If environment variables like SEED are an important part of the command line interface then you'll need to iterate over envp and output them, as well. For true reproducibility, you might choose to convert input args to canonical form, compare with observed input args, and exec using the canonical form if they're not identical, so there's no way to execute the bulk of your code using "odd" syntax.
With (2) you would bury the app in some inconveniently named file, advertise the wrapper program far and wide, and enjoy the benefits of seeing args before they're munged.
I have a bunch of code that uses the old deprecated popen from the platform package. Since this is deprecated, I will be moving this to the subprocess package.
What is the equivalent statement to popen("some_command")? Is there a reason that popen was deprecated?
platform.popen has not been deprecated as best as I can tell. However, this is a low-level function that you should not make use of for flexibility and portability reasons.
Lots of other process-launching things were deprecated and some removed in Python 3. Many, many attempts at doing this well were made in the history of Python, and subprocess.Popen and its convenience functions are by far the best. After its existence the others became cruft and most of the retained ones are just there to support legacy code.
If you're going to port your code to use the subprocess module, don't look for an exact equivalent to what you have been doing, or you will miss out on the ways in which it is better. Read and understand the subprocess documentation and understand the ideas it is using to solve the problem of process-launching better than the older alternatives.
How is subprocess.Popen better than the older alternatives?
It is secure. Instead of something('shell command here'), we do Popen(['shell', 'command', 'here']). This doesn't launch an unnecessary shell process, which makes it less errorprone and dangerous.
Consider if I asked the user for their name to be input. I might write something('foo %s" % name) in the old thing. It should work--if the user gives you the name "Mike", then it becomes a command like foo Mike. But what if the user's name is "Mike Graham"? Then I want foo 'Mike Graham'. So now I always put in the apostrophes, but now what if the user's name is "Mike O'Reilley"? Worse yet, what if his name is "Mike; rm -rf /"? The solution here isn't to try to escape these yourself (which is hard to do right, let alone to do cross-platform), but to pass the arguments directly without bothering with the shell--Popen(['foo', name])`.
It is flexible. You can control the input and output fully.
It is nonblocking. Popen can run a process concurrently with yours.
Ok i have these commands used in batch and i wanted to know the commands in python that would have a similar affect, just to be clear i dont want to just use os.system("command here") for each of them. For example in batch if you wanted a list of commands you would type help but in python you would type help() and then modules... I am not trying to use batch in a python script, i just wanna know the similarities in both languages. Like in english you say " Hello" but in french you say "Bonjour" not mix the two languages. (heres the list of commands/functions id like to know:
change the current directory
clear the screen in the console
change the prompt to something other than >>>
how to make a loop function
redirections/pipes
start an exteral program (like notepad or paint) from within a script
how to call or import another python script
how to get help with a specific module without having to type help()
#8: (in batch it would be command /?)
EDITED COMPLETELY
Thanks in Adnvance!
You can't just mechanically translate batch script to Python and hope that it works. It's a different language, with different idioms and ways of doing things, not to mention the different purpose.
I've listed some functions related to what you want below, but there's no substitute for just going and learning Python!
os.chdir
os.system("cls") is probably the simplest solution
Change sys.ps1 and sys.ps2.
Nope, there are no gotos in Python. Use for and while loops instead.
Doesn't make sense, use Python's IO instead.
subprocess.Popen
Doesn't make sense, use import or subprocess.Popen instead.
help
Most of the things you've mentioned (start, cls etc.) are not "batch commands", they're executable programs which perform certain tasks. The DOS "shell" simply executes these when it encounters them in a file. In this sense, "python" is the equivalent of a single executable (like cls).
Now that that's clear, cd (and most other OS specific tasks) are accomplished using the os module. There's no single Python statement to clear the screen - that would be wasteful. Changing the prompt of the python interpreter can be done by assigning to sys.ps1. Loops are done using while or for. Redirection doesn't happen. YOu can however use the subprocess module to run subcommands and send their outputs to files or other streams. Starting commands is done using the subprocess.Popen function. For getting help, you can either do help("command") or if you're using ipython, just say command? and hit enter.
You should really go through the tutorial rather than trying to map batch commands to Python.
The Python docs are excellent, and are the place to start. For doing shell-script like things, you'll want to check out:
http://docs.python.org/library/os.html?module-os
http://docs.python.org/library/os.path.html#module-os.path
http://docs.python.org/library/shutil.html#module-shutil
http://docs.python.org/library/subprocess.html#module-subprocess
Python is not a system shell, Python is a multi-paradigm programming language.
If you want to compare .bat with anything, compare it with sh or bash. (You can have those on various platforms too - for example, sh for windows is in the MinGW package).
I am pretty much facing the same problem as you, daniel11. As a solution, I am learning BATCH commands and their meaning. After I understand those, I am going to write a program in Python that does the same or accomplishes the same task.
Thanks to Adam V. and katrielatex for their insight and suggestions.
I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r')
or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']): raise IOError('file write forbidden')
addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use the Python's file access, so they are banned from disk write, too. External program calls have been explicitly disabled, too.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt of removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible for the code ran by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within Cpython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure e.g. for public web access:
perhaps hypothetical compiled modules that use direct OS calls cannot be audited by Cpython - whitelisting the safe pure pythonic modules is recommended.
Definitely there is still the possibility of crashing or overloading the Cpython interpreter.
Maybe there remain even some loopholes to write the files on the harddrive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of Python ecosystem reduces to rather a narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
raise IOError('file write forbidden')
addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
try:
raise Exception()
except:
del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run a code in a completely isolated environment:
exec somePythonCode in {'__builtins__': {}}, {}
But in such environment you can do almost nothing :) (you can not even import a module; but still a malicious user can run an infinite recursion or cause running out of memory.) Probably you would want to add some modules that will be the interface to you game engine.
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature was around since like year 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example you if you allowed the end user to create items, you would have a look up that called the server with the code to execute and the set of parameters.
Here's and abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
Hmm. This is a thought experiment, I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names e.t.c. (also has|get|setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the python code for this, and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this is parallel to how forum software and such try to whitelist 'safe' Javascript or HTML e.t.c. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.
Spinning off from another thread, when is it appropriate to use os.system() to issue commands like rm -rf, cd, make, xterm, ls ?
Considering there are analog versions of the above commands (except make and xterm), I'm assuming it's safer to use these built-in python commands instead of using os.system()
Any thoughts? I'd love to hear them.
Rule of thumb: if there's a built-in Python function to achieve this functionality use this function. Why? It makes your code portable across different systems, more secure and probably faster as there will be no need to spawn an additional process.
One of the problems with system() is that it implies knowledge of the shell's syntax and language for parsing and executing your command line. This creates potential for a bug where you didn't validate input properly, and the shell might interpet something like variable substitution or determining where an argument begins or ends in a way you don't expect. Also, another OS's shell might have divergent syntax from your own, including very subtle divergence that you won't notice right away. For reasons like these I prefer to use execve() instead of system() -- you can pass argv tokens directly and not have to worry about something in the middle (mis-)parsing your input.
Another problem with system() (this also applies to using execve()) is that when you code that, you are saying, "look for this program, and pass it these args". This makes a couple of assumptions which may lead to bugs. First is that the program exists and can be found in $PATH. Maybe on some system it won't. Second, maybe on some system, or even a future version of your own OS, it will support a different set of options. In this sense, I would avoid doing this unless you are absolutely certain the system you will run on will have the program. (Like maybe you put the callee program on the system to begin with, or the way you invoke it is mandated by something like POSIX.)
Lastly... There's also a performance hit associated with looking for the right program, creating a new process, loading the program, etc. If you are doing something simple like a mv, it's much more efficient to use the system call directly.
These are just a few of the reasons to avoid system(). Surely there are more.
Darin's answer is a good start.
Beyond that, it's a matter of how portable you plan to be. If your program is only ever going to run on a reasonably "standard" and "modern" Linux then there's no reason for you to re-invent the wheel; if you tried to re-write make or xterm they'd be sending the men in the white coats for you. If it works and you don't have platform concerns, knock yourself out and simply use Python as glue!
If compatibility across unknown systems was a big deal you could try looking for libraries to do what you need done in a platform independent way. Or you need to look into a way to call on-board utilities with different names, paths and mechanisms depending on which kind of system you're on.
The only time that os.system might be appropriate is for a quick-and-dirty solution for a non-production script or some kind of testing. Otherwise, it is best to use built-in functions.
Your question seems to have two parts. You mention calling commands like "xterm", "rm -rf", and "cd".
Side Note: you cannot call 'cd' in a sub-shell. I bet that was a trick question ...
As far as other command-level things you might want to do, like "rm -rf SOMETHING", there is already a python equivalent. This answers the first part of your question. But I suspect you are really asking about the second part.
The second part of your question can be rephrased as "should I use system() or something like the subprocess module?".
I have a simple answer for you: just say NO to using "system()", except for prototyping.
It's fine for verifying that something works, or for that "quick and dirty" script, but there are just too many problems with os.system():
It forks a shell for you -- fine if you need one
It expands wild cards for you -- fine unless you don't have any
It handles redirect -- fine if you want that
It dumps output to stderr/stdout and reads from stdin by default
It tries to understand quoting, but it doesn't do very well (try 'Cmd" > "Ofile')
Related to #5, it doesn't always grok argument boundaries (i.e. arguments with spaces in them might get screwed up)
Just say no to "system()"!
I would suggest that you only use use os.system for things that there are not already equivalents for within the os module. Why make your life harder?
The os.system call is starting to be 'frowned upon' in python. The 'new' replacement would be subprocess.call or subprocess.Popen in the subprocess module. Check the docs for subprocess
The other nice thing about subprocess is you can read the stdout and stderr into variables, and process that without having to redirect to other file(s).
Like others have said above, there are modules for most things. Unless you're trying to glue together many other commands, I'd stick with the things included in the library. If you're copying files, use shutil, working with archives you've got modules like tarfile/zipfile and so on.
Good luck.