Recovering original argv - python

When a script is invoked explicitly with python, the argv is mucked with so that argv[0] is the path to the script being run. This is the case if invoked as python foo/bar.py or even as python -m foo.bar.
I need a way to recover the original argv (i.e. the one received by python). Unfortunately, it's not as easy as prepending sys.executable to sys.argv, because python foo/bar.py is different from python -m foo.bar (the implicit PYTHONPATH differs, which can be crucial depending on your module structure).
More specifically in the cases of python foo/bar.py some other args and python -m foo.bar some other args, I'm looking to recover ['python', 'foo/bar.py', 'some', 'other', 'args'] and ['python', '-m', 'foo.bar', 'some', 'other', 'args'], respectively.
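To make the difference concrete, here is a tiny illustration (the absolute path in the comments is made up) of how both argv and the import path diverge between the two invocation styles:
# foo/bar.py
import sys

print(sys.argv)     # python foo/bar.py a b  ->  ['foo/bar.py', 'a', 'b']
                    # python -m foo.bar a b  ->  ['/abs/path/foo/bar.py', 'a', 'b']
print(sys.path[0])  # the script's directory in the first case, the current directory in the second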
I am aware of prior questions about this:
how to get the ORIGINAL command line in python? with spaces, tabs, etc
Full command line as it was typed
But these seem to have a misunderstanding of how shells work, and the answers reflect this. I am not interested in undoing the work of the shell (e.g. evaluated shell vars and functions are fine); I just want to get at the original argv given to python.
The only solution I've found is to use /proc/<PID>/cmdline:
import os

with open("/proc/{}/cmdline".format(os.getpid()), 'rb') as f:
    # cmdline is a NUL-separated (and NUL-terminated) list of arguments
    original_argv = [arg.decode() for arg in f.read().split(b'\0')[:-1]]
This does work, but it is Linux-only (no OSX, and Windows support seems to require installing the wmi package). Fortunately, this restriction is fine for my current use case. But it would be nice to have a cleaner, cross-platform approach.
The fact that the /proc/<PID>/cmdline approach works gives me hope that python isn't execing before it runs the script (at least not the exec syscall, but maybe the exec builtin). I remember reading somewhere that all of this argument handling (e.g. -m) is done in pure python, not C (this is confirmed by the fact that python -m this.does.not.exist will produce an exception that looks like it came from the runtime). So, I'd venture a guess that somewhere in pure python the original argv is available (perhaps this requires some spelunking through the runtime initialization?).
tl;dr Is there a cross-platform (builtin, preferably) way to get at the original argv passed to python (before it removes the python executable and transforms -m blah into blah.py)?
edit From spelunking, I discovered Py_GetArgcArgv, which can be accessed via ctypes (found it here, links to several SO posts that mention this approach):
import ctypes
_argv = ctypes.POINTER(ctypes.c_wchar_p)()
_argc = ctypes.c_int()
ctypes.pythonapi.Py_GetArgcArgv(ctypes.byref(_argc),
                                ctypes.byref(_argv))
argv = _argv[:_argc.value]
print(argv)
Now this is OS-portable, but not Python-implementation-portable (it only works on CPython, and ctypes is yucky if you don't need it). Also, peculiarly, I don't get the right output on Ubuntu 16.04 (python -m foo.bar gives me ['python', '-m', '-m']), but I may just be making a silly mistake (I get the same behavior on OSX). It would be great to have a fully portable solution (that doesn't dig into ctypes).

Python 3.10 adds sys.orig_argv, which the docs describe as the arguments originally passed to the Python executable. If this isn't exactly what you're looking for, it may be helpful in this or similar cases.
There were a bunch of possibilities considered, including changing sys.argv, but this was, I think, wisely chosen as the most effective and non-disruptive option.
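For example, on 3.10+ (the exact strings depend on how the interpreter was invoked; the paths here are illustrative):
import sys

# invoked as: python -m foo.bar some other args
print(sys.orig_argv)  # ['python', '-m', 'foo.bar', 'some', 'other', 'args']
print(sys.argv)       # ['/abs/path/foo/bar.py', 'some', 'other', 'args']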

This seems like an XY problem: you are getting into the weeds in order to accommodate some existing complicated test setup (I found the question behind the question in your comment). Further efforts would be better spent writing a sane test setup.
Use a better test runner, not unittest.
Create any initial state within the test setup, not in the external environment before entering the Python runtime.
Use a plugin for the randomization and seed stuff, personally I use this one but there are others.
For example if you decide to go with pytest runner, all the test setup can be configured within a [tool.pytest.ini_options] section of the pyproject.toml file and/or with a fixture defined in conftest.py. Overriding the default test configuration can be done with environment variables and/or command line arguments, and neither of these approaches will get mucked around by the shell or during Python interpreter startup.
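For instance, a minimal conftest.py along these lines (the SEED variable and --seed option are illustrative conventions, not pytest built-ins) keeps all the seeding inside the test setup:
# conftest.py
import os
import random

import pytest

def pytest_addoption(parser):
    # let `pytest --seed 1234` or `SEED=1234 pytest` override the default
    parser.addoption("--seed", type=int,
                     default=int(os.environ.get("SEED", "0")))

@pytest.fixture(autouse=True)
def seeded_rng(request):
    # every test starts from a reproducible RNG state
    random.seed(request.config.getoption("--seed"))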
The manner in which to execute the test suite can and should be as simple as executing a single command:
pytest
And then your perceived problem of needing to recover the original sys.argv will go away.

Your stated problem is:
User called my app with environment variables and arguments.
I want to display a "run like this" diagnostic that will exactly reproduce the results of the current run.
There are at least two solutions:
Abandon the "reproduction" aspect, since the original bash calling command is lost to the portable python app, and instead go for "same effect".
Use a wrapper to capture the original calling command, as suggested by Jean-François Fabre.
With (1) you would be willing to accept ['-m', 'foo'] becoming ['foo.py'], or even turning it into ['/some/dir/foo.py'] in case PYTHONPATH could cause trouble. Displaying ['a', 'b c'] as "a" "b c", or more concisely as a "b c", is straightforward. If environment variables like SEED are an important part of the command line interface then you'll need to iterate over envp and output them, as well. For true reproducibility, you might choose to convert input args to canonical form, compare with observed input args, and exec using the canonical form if they're not identical, so there's no way to execute the bulk of your code using "odd" syntax.
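As a minimal sketch of (1), assuming SEED is the only environment variable that matters:
import os
import shlex
import sys

# canonical "same effect" form: absolute interpreter, absolute script path
argv = [sys.executable, os.path.abspath(sys.argv[0])] + sys.argv[1:]
env = ["SEED=" + shlex.quote(os.environ["SEED"])] if "SEED" in os.environ else []
print("run like this:", " ".join(env + [shlex.quote(a) for a in argv]))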
With (2) you would bury the app in some inconveniently named file, advertise the wrapper program far and wide, and enjoy the benefits of seeing args before they're munged.

Related

Better python unittest integration?

I'm using GNU Emacs 24.5.1 to work on Python code. I often want to run just a single unit test. I can do this, for example, by running:
test=spi.test_views.IndexViewTest.generate_select2_data_with_embedded_spaces make test
with M-x compile. My life would be simpler if I could give some command like "run the test where point is", and have Emacs figure out the full name of the test for me. Is this possible?
Update: with the following buffer, I'd like some command which runs M-x compile with:
test=spi.test_views.IndexViewTest.test_unknown_button make test
where spi is the name of the directory test_views.py is in. Well, technically, I need to construct the Python path to my test function, but in practice it'll be <directory.file.class.function>.
This seems like the kind of thing somebody would have already invented, but I don't see anything in the python mode docs.
I believe you are using the "default" python mode, while the so-called elpy mode (which I strongly recommend trying for Python development within Emacs) seems to provide what you are looking for:
C-c C-t (elpy-test)
Start a test run. This uses the currently configured test runner to discover
and run tests. If point is inside a test case, the test runner will run exactly
that test case. Otherwise, or if a prefix argument is given, it will run all tests.
Extra details
The elpy-test function internally relies on the function (elpy-test-at-point), which appears to be very close to the feature you mentioned in the question.
[Screenshot: code/help excerpt for elpy-test-at-point]

getting bash completion options programmatically

I want a function that programmatically returns completion options from either bash or zsh. There are lots of examples of related questions on stackoverflow but no proper, generic answers anywhere. I do NOT want to know how to write a specific completer function for bash.
I've already tried implementing this by reading the Debian /etc/completion shell code, by echoing control codes for Tab into "bash -i", and even by using automated subprocess interaction with python-pexpect. Every time I thought I was successful, I found some small problem that invalidated the whole solution. I'd accept a solution in any language, but ideally it would be Python. Obviously the exact input/output would vary depending on systems, but take a look at the example I/O below:
function("git lo") returns ["log","lol","lola"]
function("apt-get inst") returns ["apt-get install"]
function("apt-get") returns []
function("apt-get ") returns ["apt-get autoclean","apt-get autoremove", ...]
function ("./setup") returns ["./setup.py"]
If you are thinking of a solution written in shell, it would ideally be something I can execute without "source"ing. For instance bash "compgen" command looks interesting (try "compgen -F _git"), but note that "bash -c 'compgen -F _git'" does not work because the completion helper "_git" is not in scope.
This gist is my best solution so far. It meets all the requirements, works well for multiple versions of bash on multiple OSes, but it requires a subprocess call and it's so complicated it's absurd. The comments include full documentation of all the outrageous slings and arrows. I'm still hoping for something more reasonable to come along, but unless it does... this is it!

Syntax for subprocess.call (Win7 x64)

I am trying to call an .exe file that's not in my local Python directory using subprocess.call(). The command (as I type it into cmd.exe) is exactly as follows:
"C:\Program Files\R\R-2.15.2\bin\Rscript.exe" --vanilla C:\python\buyback_parse_guide.r
The script runs, does what I need to do, and I have confirmed the output is correct.
Here's my python code, which I thought would do the exact same thing:
import subprocess

## Set Rcmd
Rcmd = r'"C:\Program Files\R\R-2.15.2\bin\Rscript.exe"'
## Set Rargs
Rargs = r'--vanilla C:\python\buyback_parse_guide.r'
retval = subprocess.call([Rcmd, Rargs], shell=True)
When I call retval in my Python console, it returns 1 and the .R script doesn't run, but I get no errors. I'm pretty sure this is a really simple syntax error... help? Much thanks!
To quote the docs:
If shell is True, it is recommended to pass args as a string rather than as a sequence.
Splitting it up (either manually, or via shlex) just so subprocess can recombine them so the shell can split them again is silly.
I'm not sure why you think you need shell=True here. (If you don't have a good reason, you generally don't want it…) But even without shell=True:
On Windows, if args is a sequence, it will be converted to a string in a manner described in Converting an argument sequence to a string on Windows. This is because the underlying CreateProcess() operates on strings.
So, just give the shell the command line:
Rcmd = r'"C:\Program Files\R\R-2.15.2\bin\Rscript.exe" --vanilla C:\python\buyback_parse_guide.r'
retval = subprocess.call(Rcmd, shell=True)
According to the docs, Rscript:
… is an alternative front end for use in #! scripts and other scripting applications.
… is convenient for writing #! scripts… (The standard Windows command line has no concept of #! scripts, but Cygwin shells do.)
… is only supported on systems with the execv system call.
So, it is not the way to run R scripts from another program under Windows.
This answer says:
Rscript.exe is your friend for batch scripts… For everything else, there's R.exe
So, unless you have some good reason to be using Rscript outside of a batch script, you should switch to R.exe.
You may wonder why it works under cmd.exe, but not from Python. I don't know the answer to that, and I don't think it's worth digging through code or experimenting to find out, but I can make some guesses.
One possibility is that when you're running from the command line, that's a cmd.exe that controls a terminal, while when you're running from subprocess.call(shell=True) or os.system, that's a headless cmd.exe. Running a .bat/.cmd batch file gets you a non-headless cmd, but running cmd directly from another app does not. R has historically had all kinds of complexities dealing with the Windows terminal, which is why they used to have separate Rterm.exe and Rcmd.exe tools. Nowadays, those are both merged into R.exe, and it should work just fine either way. But if you try doing things the docs say not to do, that may not be tested, it's perfectly reasonable that it may not work.
At any rate, it doesn't really matter why it works in some situations even though it's not documented to. That certainly doesn't mean it should work in other situations it's not documented to work in, or that you should try to force it to do so. Just do the right thing and run R.exe instead of Rscript.exe.
Unless you have some information that contradicts everything I've found in the documentation and everywhere else I can find, I'm placing my money on Rscript.exe itself being the problem.
You'll have to read the documentation on the invocation differences between Rscript.exe and R.exe, but they're not identical. According to the intro docs:
If you just want to run a file foo.R of R commands, the recommended way is to use R CMD BATCH foo.R
According to your comment above:
When I type "C:\R\R-2.15.2\bin\i386\R.exe" CMD BATCH C:\python\buyback_parse_guide.r into cmd.exe, the .R script runs successfully. What's the proper syntax for passing this into python?
That depends on the platform. On Windows, a list of arguments gets turned into a string, so you're better off just using a string so you don't have to debug the joining; on Unix, a string gets split into a list of arguments, so you're better off using a list so you don't have to debug the splitting.
Since there are no spaces in the path, I'd take the quotes out.
So:
rcmd = r'C:\R\R-2.15.2\bin\i386\R.exe CMD BATCH C:\python\buyback_parse_guide.r'
retval = subprocess.call(rcmd)
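For completeness, the equivalent list form (which subprocess joins back into one string for CreateProcess on Windows, as quoted from the docs above) would be:
import subprocess

retval = subprocess.call(
    [r"C:\R\R-2.15.2\bin\i386\R.exe", "CMD", "BATCH",
     r"C:\python\buyback_parse_guide.r"])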

Python - When Is It Ok to Use os.system() to issue common Linux commands

Spinning off from another thread, when is it appropriate to use os.system() to issue commands like rm -rf, cd, make, xterm, ls ?
Considering there are analogous versions of the above commands (except make and xterm), I'm assuming it's safer to use these built-in Python functions instead of using os.system().
Any thoughts? I'd love to hear them.
Rule of thumb: if there's a built-in Python function to achieve this functionality use this function. Why? It makes your code portable across different systems, more secure and probably faster as there will be no need to spawn an additional process.
One of the problems with system() is that it implies knowledge of the shell's syntax and language for parsing and executing your command line. This creates potential for a bug where you didn't validate input properly, and the shell might interpret something like variable substitution or determining where an argument begins or ends in a way you don't expect. Also, another OS's shell might have divergent syntax from your own, including very subtle divergence that you won't notice right away. For reasons like these, I prefer to use execve() instead of system() -- you can pass argv tokens directly and not have to worry about something in the middle (mis-)parsing your input.
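In Python terms, the analogue of execve()'s direct argv passing is handing subprocess a list, so nothing in the middle re-parses your arguments (file name here is just for illustration):
import subprocess

# each token is passed through verbatim; no shell, no quoting surprises
subprocess.call(["ls", "-l", "a file with spaces"])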
Another problem with system() (this also applies to using execve()) is that when you code that, you are saying, "look for this program, and pass it these args". This makes a couple of assumptions which may lead to bugs. First is that the program exists and can be found in $PATH. Maybe on some system it won't. Second, maybe on some system, or even a future version of your own OS, it will support a different set of options. In this sense, I would avoid doing this unless you are absolutely certain the system you will run on will have the program. (Like maybe you put the callee program on the system to begin with, or the way you invoke it is mandated by something like POSIX.)
Lastly... There's also a performance hit associated with looking for the right program, creating a new process, loading the program, etc. If you are doing something simple like a mv, it's much more efficient to use the system call directly.
These are just a few of the reasons to avoid system(). Surely there are more.
Darin's answer is a good start.
Beyond that, it's a matter of how portable you plan to be. If your program is only ever going to run on a reasonably "standard" and "modern" Linux then there's no reason for you to re-invent the wheel; if you tried to re-write make or xterm they'd be sending the men in the white coats for you. If it works and you don't have platform concerns, knock yourself out and simply use Python as glue!
If compatibility across unknown systems was a big deal you could try looking for libraries to do what you need done in a platform independent way. Or you need to look into a way to call on-board utilities with different names, paths and mechanisms depending on which kind of system you're on.
The only time that os.system might be appropriate is for a quick-and-dirty solution for a non-production script or some kind of testing. Otherwise, it is best to use built-in functions.
Your question seems to have two parts. You mention calling commands like "xterm", "rm -rf", and "cd".
Side note: calling 'cd' in a sub-shell won't change the parent process's working directory; use os.chdir() for that. I bet that was a trick question ...
As far as other command-level things you might want to do, like "rm -rf SOMETHING", there is already a Python equivalent (shutil.rmtree in this case). This answers the first part of your question. But I suspect you are really asking about the second part.
The second part of your question can be rephrased as "should I use system() or something like the subprocess module?".
I have a simple answer for you: just say NO to using "system()", except for prototyping.
It's fine for verifying that something works, or for that "quick and dirty" script, but there are just too many problems with os.system():
1. It forks a shell for you -- fine if you need one
2. It expands wild cards for you -- fine unless you don't have any
3. It handles redirects -- fine if you want that
4. It dumps output to stderr/stdout and reads from stdin by default
5. It tries to understand quoting, but it doesn't do very well (try 'Cmd" > "Ofile')
6. Related to #5, it doesn't always grok argument boundaries (i.e. arguments with spaces in them might get screwed up)
Just say no to "system()"!
I would suggest that you only use os.system for things that there are not already equivalents for within the os module. Why make your life harder?
The os.system call is starting to be 'frowned upon' in python. The 'new' replacement would be subprocess.call or subprocess.Popen in the subprocess module. Check the docs for subprocess
The other nice thing about subprocess is you can read the stdout and stderr into variables, and process that without having to redirect to other file(s).
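For example:
import subprocess

proc = subprocess.Popen(["ls", "-l"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()  # stdout/stderr captured, no redirection to files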
Like others have said above, there are modules for most things. Unless you're trying to glue together many other commands, I'd stick with the things included in the library. If you're copying files, use shutil, working with archives you've got modules like tarfile/zipfile and so on.
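For instance (the file names are just for illustration):
import shutil
import tarfile

shutil.copy2("report.txt", "backup/report.txt")  # cp -p report.txt backup/
shutil.rmtree("build")                           # rm -rf build
with tarfile.open("src.tar.gz", "w:gz") as tar:  # tar czf src.tar.gz src/
    tar.add("src")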
Good luck.

Vim Python omni-completion failing to work on system modules

I'm noticing that even for system modules, code completion doesn't work too well.
For example, if I have a simple file that does:
import re

pattern, line = r"\d+", "abc 123"  # example inputs, just to make this runnable
p = re.compile(pattern)
m = p.search(line)
If I type p., I don't get completion for methods I'd expect to see (I don't see search() for example, but I do see others, such as func_closure(), func_code()).
If I type m., I don't get any completion whatsoever (I'd expect .groups(), in this case).
This doesn't seem to affect all modules... Has anyone seen this behaviour, and does anyone know how to correct it?
I'm running Vim 7.2 on WinXP, with the latest pythoncomplete.vim from vim.org (0.9), running python 2.6.2.
Completion for this kind of thing is tricky, because it would need to execute the actual code to work.
For example, p.search() could return None or a MatchObject, depending on the data that is passed to it.
This is why omni-completion does not work here, and probably never will. It works for things that can be statically determined, for example a module's contents.
I never got the builtin omnicomplete to work for any languages. I had the most success with pysmell (which seems to have been updated slightly more recently on github than in the official repo). I still didn't find it to be reliable enough to use consistently but I can't remember exactly why.
I've resorted to building an extensive set of snipMate snippets for my primary libraries and using the default tab completion to supplement.
