I wrote a "compiler" PypTeX that converts an input file a.tex containing Hello #{3+4} to an output file a.pyptex containing Hello 7. I evaluate arbitrary Python fragments like #{3+4} using something like eval(compile('3+4','a.tex',mode='eval'),myglobals), where myglobals is some (initially empty) dict. This creates a thin illusion of an embedded interpreter for running code in a.tex; however, the call stack when running '3+4' looks pretty weird, because it backs up all the way into the PypTeX interpreter, instead of topping out at the user code '3+4' in a.tex.
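For concreteness, a minimal sketch of the evaluation mechanism just described (the fragment and namespace are placeholders):

myglobals = {}  # namespace shared across all fragments in the document
fragment = '3+4'
result = eval(compile(fragment, 'a.tex', mode='eval'), myglobals)
print(result)  # 7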
Is there a way of doing something like eval but chopping off the top of the call stack?
Motivation: debugging
Imagine an exception is raised by the Python fragment deep inside numpy, and pdb is launched. The user types up until they reach the scope of their user code and then they type list. The way I've done it, this displays the a.tex file, which is the right context to be showing to the user and is the reason why I've done it this way. However, if the user types up again, the user ends up in the bowels of the PypTeX compiler.
An analogy would be if the g++ compiler had an error deep in a template, displayed a template "call stack" in its error message, but that template call stack backed all the way out into the bowels of the actual g++ call stack and exposed internal g++ details that would only serve to confuse the user.
Embedding Python in Python
Maybe the problem is that the illusion of the "embedded interpreter" created by eval is slightly too thin. eval allows you to specify globals, but it inherits whatever call stack the caller has, so if one could somehow supply eval with a truncated call stack, that would resolve my problem. Alternatively, if pdb could be told "you shall go no further up" past a certain stack frame, that would help too. For example, if I could chop off a part of the stack in the traceback object and then pass it to pdb.post_mortem().
Or if one could do from sys import Interpreter; foo = Interpreter(); foo.eval(...), meaning that foo is a clean embedded interpreter with a distinct call stack, global variables, etc..., that would also be good.
Is there a way of doing this?
A rejected alternative
One way that is not good is to extract all Python fragments from a.tex by regular expression, dump them into a temporary file a.py and then run them by invoking a fresh new Python interpreter at the command line. This causes pdb to eventually top out into a.py. I've tried this and it's a very bad user experience. a.py should be an implementation detail; it is automatically generated and will look very unfamiliar to the user. It is hard for the user to figure out what bits of a.py came from what bits of a.tex. For large documents, I found this to be much too hard to use. See also pythontex.
I think I found a sufficient solution:
import pdb, traceback

def exec_and_catch(cmd, globals):
    try:
        exec(cmd, globals)  # execute a user program
    except Exception as e:
        tb = e.__traceback__.tb_next  # if the user program raises an exception, remove the
        f = e.with_traceback(tb)      # top stack frame and return the exception.
        return f
    return None  # otherwise, return None signifying success.

foo = exec_and_catch("import module_that_does_not_exist", {})
if foo is not None:
    traceback.print_exception(value=foo, tb=foo.__traceback__, etype=type(foo))
    pdb.post_mortem(foo.__traceback__)
Related
I'm facing some problems trying to load a full python script from my pastebin/github pages.
I followed this link, trying to convert the raw into a temp file and use it like a module: How to load a python script from a raw link (such as Pastebin)?
And this is my test (Using a really simple python script as raw, my main program is not so simple unfortunately): https://trinket.io/python/0e95ba50c8
When I run the script (which now creates a temp file in the current directory of the .py file) I get this error:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\BOT\\Images\\tempxm4xpwpz.py'
I also tried the exec() function... no better results, unfortunately.
With this code:
import requests as rq
import urllib.request

def main():
    code = "https://pastebin.com/raw/MJmYEKqh"
    response = urllib.request.urlopen(code)
    data = response.read()
    exec(data)
I get this error:
  File "<string>", line 10, in <module>
  File "<string>", line 5, in hola
NameError: name 'printest' is not defined
Since my program is more complex compared to this simple test, I don't know how to proceed...
Basically what I want to achieve is to write the full script of my program on GitHub and connect it to a .exe, so that if I update the raw, my program is updated as well. That way I avoid generating and sharing (only with my friends) a new .exe every time...
Do you think is possible? If so.. what am I doing wrong?
PS: I'm also open to other possibilities to let my friends update the program without downloading the .exe every time, as long as they don't have to install anything (that's why I'm using .exe).
Disclaimer: it is really not a good idea to run unverified (let alone untrusted) code. That being said, if you really want to do it...
Probably the easiest and "least-dirty" way would be to run a whole new process. This can be done directly in Python. Something like this should work (inspiration from the answer you linked in your question):
import urllib.request
import tempfile
import subprocess

code = "https://pastebin.com/raw/MJmYEKqh"
response = urllib.request.urlopen(code)
data = response.read()

with tempfile.NamedTemporaryFile(suffix='.py') as source_code_file:
    source_code_file.write(data)
    source_code_file.flush()
    subprocess.run(['python3', source_code_file.name])
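One caveat: on Windows, a NamedTemporaryFile generally cannot be reopened by another process while it is still open, which can produce exactly the PermissionError shown in the question. A sketch of a more portable variant using tempfile.mkstemp (cleanup handled by hand):

import os
import sys
import tempfile
import subprocess
import urllib.request

data = urllib.request.urlopen("https://pastebin.com/raw/MJmYEKqh").read()

fd, path = tempfile.mkstemp(suffix='.py')
try:
    with os.fdopen(fd, 'wb') as f:  # write the downloaded source
        f.write(data)
    subprocess.run([sys.executable, path])  # run with the same interpreter
finally:
    os.remove(path)  # remove the temporary file afterwards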
You can also make your code with exec run correctly:
What may work:
exec(data, {}) -- All you need to do is supply {} as the second argument (that is, use exec(data, {})). The exec function may receive two additional optional arguments -- globals and locals. If you supply just one, it will use the same dictionary for locals. The code within the exec then behaves as if it were in a sort-of "clean" environment at the top level, which is what you are aiming for.
exec(data, globals()) -- The second option is to supply the globals from your current scope. This will also work, though you probably have no need to give the executed code access to your globals, given that the code will set everything up inside anyway.
What does not work:
exec(data, {}, {}) -- In this case the executed code will have two different dictionaries (albeit both empty) for locals and globals. As such it will behave as if it were inside a function (I'm not completely sure about this part, but from my tests it seems so). Meaning that it will add the printest and hola functions to the local scope instead of the global scope. Regardless, I expected it to work -- I expected the hola function would just look up printest in the local scope instead of the global one. However, the hola function gets compiled in such a way that it expects printest to be in the global scope, not the local one, where it isn't (the likely reason: names referenced inside a function body compile to global lookups, and functions created by the exec get the supplied globals dictionary as their __globals__, so the separate locals dictionary is never consulted at call time). So this will result in the NameError (see the demonstration after this list).
exec(data, globals(), locals()) -- This will provide access to the state from the caller function. Nevertheless, it will crash for the very same reason as in the previous case
exec(data) -- This is just a shorthand for exec(data, globals(), locals()).
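To make the pitfall concrete, here is a minimal, self-contained demonstration (the snippet mirrors the printest/hola structure from the question):

snippet = """
def printest():
    print('test')

def hola():
    printest()  # compiled as a global lookup, not a local one

hola()
"""

exec(snippet, {})  # works: the single dict serves as both globals and locals

try:
    exec(snippet, {}, {})  # the defs land in the locals dict,
except NameError as e:     # but hola() searches its globals
    print('NameError:', e)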
I'm trying to make a debugger function, which is called when an error is raised, and let me access a console so I can check what happened in my program.
Here is the basic function:
import sys

def DEBUGGER(error):
    print(error)
    print("[DEBUGGER] Your program has failed, here is the debugger. Enter EXIT to end program.")
    while True:
        line = input(">>> ")
        if line.lower() == 'exit':
            sys.exit(0)
        else:
            try:
                exec(line)
            except Exception as e:
                print(str(e))
The problem is that I can't enter something like print(var) because it's referenced in another function.
The globals() function doesn't help me, since I want to be able to access any variable in my program, and I can't make them all global. I know I can resolve it by putting all my functions in classes, but I can't for many reasons.
Is there a way to get local variables of the running functions ? (When I call DEBUGGER(), the mother function is still running)
If no, can I export the local variables of the current function and pass it as an argument to DEBUGGER() ?
Thanks for your answers.
You are basically re-implementing the Python debugger pdb. If you want to go this route, you probably want to study the source code. pdb itself is a user-interface around the lower-level bdb (basic debugger) module, and the source code for that is also available.
To answer your direct question: when you catch an exception you have access to a traceback object (either via exception.__traceback__ or via sys.exc_info()), and tracebacks have access to both the local and global namespace of each frame in the stack, via the tb_frame attribute. That attribute is set to a frame object, which has f_locals and f_globals attributes.
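For instance, a small sketch of walking a traceback to reach each frame's namespaces:

import sys

def show_frames():
    # walk the traceback of the exception being handled, printing each
    # frame's function name, line number, and local variable names
    _, _, tb = sys.exc_info()
    while tb is not None:
        frame = tb.tb_frame
        print(frame.f_code.co_name, tb.tb_lineno, list(frame.f_locals))
        tb = tb.tb_next

try:
    1 / 0
except ZeroDivisionError:
    show_frames()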
The bdb.Bdb.get_stack() method could be an interesting example on how to treat a traceback, and the internal pdb.Pdb._select_frame() method then is used to pick a frame from the stack to use the locals and globals from.
If you don't want to re-implement the full debugger, you can use the pdb.pm() or pdb.post_mortem() functions. These take the last traceback raised and let you inspect the stack frame in an interactive environment:
import pdb

try:
    exec(line)
except Exception as e:
    pdb.post_mortem(e.__traceback__)
The correct way to "write" your "DEBUGGER" function is:
import pdb
DEBUGGER = pdb.set_trace
Now you can call DEBUGGER() wherever you want; you will be in an interactive environment with access not only to the local vars but also to the whole call stack, plus the ability to execute the remaining code step by step (including stepping into other functions, etc.), change the control flow to continue executing from another line, and so on.
Oh and yes: you can of course just write import pdb; pdb.set_trace() instead ;-)
I am writing a Python interpreter and want to redirect the function's return values to stdout, like the Python Interpreter in Interactive Mode. Within this mode, when the user calls a function, its return value is printed on the screen. The same occurs with expressions.
E.g.
>>> foo()
'Foo return value'
>>> 2+4
6
>>> print('Hello!')
Hello!
Changing the sys.stdout only affects the print function. How do I redirect the other expressions to stdout?
Thank you
First, the interactive mode does not print the return value from any function called. Instead, it prints the result of whatever expression the user typed in. If that's not a function call, it still gets printed. If it has 3 function calls in it, it still prints one result, not 3 lines. And so on.
So, trying to redirect function return values to stdout is the wrong thing to do.
What the interactive interpreter does is something sort of like this:
line = raw_input(sys.ps1)
_ = eval(line)
if _ is not None:
    print repr(_)
(You may notice that you can change sys.ps1 from the interactive prompt to change what the prompt looks like, access _ to get the last value, etc.)
However, that's not what it really does. And that's not how you should go about this yourself either. If you try, you'll have to deal with complexities like keeping your own globals separate from the user's, handling statements as well as expressions, handling multi-line statements and expressions (doing raw_input(sys.ps2) is easy, but how do you know when to do that?), interacting properly with readline and rlcomplete, etc.
There's a section of the documentation called Custom Python Interpreters which explains the easy way to do this:
The modules described in this chapter allow writing interfaces similar to Python’s interactive interpreter. If you want a Python interpreter that supports some special feature in addition to the Python language, you should look at the code module.
And code:
… provides facilities to implement read-eval-print loops in Python. Two classes and convenience functions are included which can be used to build applications which provide an interactive interpreter prompt.
The idea is that you let Python do all the hard stuff, up to whatever level you want to take over, and then you just write the part on top of that.
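As a rough sketch of that approach, a few lines built on code.InteractiveConsole give you a working read-eval-print loop (the banner text is just an example):

import code

user_globals = {}  # keeps the user's namespace separate from your own
console = code.InteractiveConsole(locals=user_globals)
console.interact(banner='Embedded interpreter (Ctrl-D to exit)')

This handles multi-line statements and echoes expression results, which is exactly the hard part enumerated above.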
You may want to look at the source for IDLE, ipython, bpython, etc. for ideas.
Instead of using exec() to run the user input, try eval():
import sys

retval = eval(user_input)
sys.stdout.write(repr(retval) + "\n")
I want to make a function that can determine the source code of how it was called. I'm aware of how to do this generally with the inspect module. For example, this question, works well and provides my desired output in the lines variable as shown below:
import inspect

def hello(x):
    frame, filename, line_number, function_name, lines, index = \
        inspect.getouterframes(inspect.currentframe())[1]
    print(frame, filename, line_number, function_name, lines, index)
The problem is that this solution doesn't work in an interactive command line session. For example, from a command line, the result looks like:
>>> y = hello(7)
(<frame object at 0x01ECA9E8>, '<stdin>', 1, '<module>', None, None)
The problem is that the source file is '<stdin>', so the lines variable is None. How can I access the calling line to find the result containing the string y = hello(7) during an interactive session?
It may not be possible, as #abarnert says. There are at least partial workarounds, however.
Getting source code lines isn't the biggest problem; modules like readline keep track of them. Interactive Python and iPython expose their lines slightly differently (sigh), but that too can be equalized. (My show package, for example, does this; its linecacher module puts a veneer on to equalize source access for normal Python and the different interactive flavors.)
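As a rough illustration of the readline half of this (plain interactive Python only; last_typed_line is a hypothetical helper, and iPython stores its history differently):

import readline

def last_typed_line():
    # readline records the lines typed in this interactive session;
    # history indices are 1-based
    n = readline.get_current_history_length()
    return readline.get_history_item(n) if n > 0 else None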
The bigger problem is that, even once you have the source code, inspect doesn't provide legitimate line numbers (e.g. inspect.currentframe().f_back.f_lineno works great in normal code, but gives values like 1 or the point of the call in <stdin> when called interactively.)
But I'm not quite ready to call it impossible. Based on tracebacks generated by interactive Python and iPython, it appears that there may be sufficient information available to reconstruct "where did this call come from?" How much effort that would take, and how robust the answers would be...those are open questions.
Let's say I have two python scripts A.py and B.py. I'm looking for a way to run B from within A in such a way that:
B believes it is __main__ (so that code in an if __name__=="__main__" block in B will run)
B is not actually __main__ (so that it does not, e.g., overwrite the "__main__" entry in sys.modules)
Exceptions raised within B propagate to A (i.e., could be caught with an except clause in A).
Those exceptions, if not caught, generate a correct traceback referencing line numbers within B.
I've tried various techniques, but none seem to satisfy all my requirements.
using tools from the subprocess module means exceptions in B do not propagate to A.
execfile("B.py", {}) runs B, but it doesn't think it's main.
execfile("B.py", {'__name__': '__main__'}) makes B.py think it's main, but it also seems to screw up the exception traceback printing, so that the tracebacks refer to lines within A (i.e., the real __main__).
using imp.load_source with __main__ as the name almost works, except that it actually modifies sys.modules, thus stomping on the existing value of __main__
Is there any way to get what I want?
(The reason I'm doing this is because I'm doing some cleanup on an existing library. This library has no real test suite, just a set of "example" scripts that produce certain output. I'm trying to leverage these as tests to ensure that my cleanup doesn't affect the library's ability to execute these examples, so I want to run each example script from within my test suite. I'd like to be able to see exceptions from these scripts within the test script so the test script can report the type of failure, instead of just reporting a generic SubprocessError whenever an example script raises some exception.)
Your use case makes sense, but I still think you'd be better off refactoring the tests such that they can be run externally.
Do your test scripts have something like this?
def test():
    pass

if __name__ == '__main__':
    test()
If not, perhaps you should convert your tests to calling a function such as test. Then, from your main test script, you can just:
import test1
test1.test()
import test2
test2.test()
Provide a common interface to running tests, that the tests themselves use. Having a big block of code in a __main__ check is Not A Good Thing.
Sorry that I didn't answer the question you asked, but I feel this is the correct solution without deviating too far from the original test code.
Answering my own question because the result is kind of interesting and might be useful to others:
It turns out I was wrong: execfile("B.py", {'__name__': '__main__'}) is the way to go after all. It does correctly produce the tracebacks. What I was seeing with incorrect line numbers weren't exceptions but warnings. These warnings were produced using warnings.warn("blah", stacklevel=2). The stacklevel=2 argument is supposed to allow for things like deprecation warnings to be raised where the deprecated thing is used, rather than at the warning call (see the documentation).
However, it seems that the execfile-d file doesn't count as a "stack level" for this purpose, and is invisible for stacklevel purposes. So if code at the top level of an execfile-d module causes a warning with stacklevel 2, the warning is not raised at the right line number in the execfile-d source file; instead it is raised at the corresponding line number of the file which is running the execfile.
This is unfortunate, but I can live with it, since these are only warnings, so they won't impact the actual performance of the tests. (I didn't notice at first that it was only the warnings that were affected by the line-number mismatches, because there were lots of warnings and exceptions intermixed in the test output.)
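For completeness, a sketch of the same approach for Python 3, where execfile no longer exists (run_as_main is a hypothetical helper name):

def run_as_main(path):
    # compile with the real filename so tracebacks point into the script
    with open(path) as f:
        code = compile(f.read(), path, 'exec')
    exec(code, {'__name__': '__main__', '__file__': path})

try:
    run_as_main('B.py')
except Exception as e:
    print('B.py raised:', type(e).__name__, e)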