How to access the variables of C++ program from gdb with Python

How to access the variables of C++ program from gdb with Python - python

How can one access the values like these using the Python from gdb?:
some_array_smartptr->operator[](0)->item // no errors are checked for sake of clarity
In gdb this line works fine, but I cannot figure out how to use it in Python as I am implementing the automated testing.
Please note, that both vector and smartptr are not standard, but are manually written. Semantics is the same though.

The gdb.parse_and_eval() should do exactly what you want for live-process debugging.
Documentation.
I'd like to iterate through some arrays an so on.
The parse_and_eval returns a gdb.Value. Once you have that value, you can use any of the Values methods for further access.
Example.

Related

Interpret Python bytecode in C# (with fine control)

For a project idea of mine, I have the following need, which is quite precise:
I would like to be able to execute Python code (pre-compiled before hand if necessary) on a per-bytecode-instruction basis. I also need to access what's inside the Python VM (frame stack, data stacks, etc.). Ideally, I would also like to remove a lot of Python built-in features and reimplement a few of them my own way (such as file writing).
All of this must be coded in C# (I'm using Unity).
I'm okay with loosing a few of Python's actual features, especially concerning complicated stuff with imports, etc. However, I would like most of it to stay intact.
I looked a little bit into IronPython's code but it remains very obscure to me and it seems quite enormous too. I began translating Byterun (a Python bytecode interpreter written in Python) but I face a lot of difficulties as Byterun leverages a lot of Python's features to... interpret Python.
Today, I don't ask for a pre-made solution (except if you have one in mind?), but rather for some advice, places to look at, etc. Do you have any ideas about the things I should research first?

I've tried to do my own implementation of the Python VM in the distant past and learned a lot but never came even close to a fully working implementation. I used the C implementation as a starting point, specifically everything in https://github.com/python/cpython/tree/main/Objects and
https://github.com/python/cpython/blob/main/Python/ceval.c (look for switch(opcode))
Here are some pointers:
Come to grips with the Python object model. Implement an abstract PyObject class with the necessary methods for instancing, attribute access, indexing and slicing, calling, comparisons, aritmetic operations and representation. Provide concrete implemetations for None, booleans, ints, floats, strings, tuples, lists and dictionaries.
Implement the core of your VM: a Frame object that loops over the opcodes and dispatches, using a giant switch statment (following the C implementation here), to the corresponding methods of the PyObject. The frame should maintains a stack of PyObjects for the operants of the opcodes. Depending on the opcode, arguments are popped from and pushed on this stack. A dict can be used to store and retrieve local variables. Use the Frame object to create a PyObject for function objects.
Get familiar with the idea of a namespace and the way Python builds on the concept of namespaces. Implement a module, a class and an instance object, using the dict to map (attribute)names to objects.
Finally, add as many builtin functions as you think you need to get a usefull implementation.
I think it is easy to underestimate the amount of work you're getting yourself into, but ... have fun!

What is the best or proper way to allow debugging of generated code?

For various reasons, in one project I generate executable code by means of generating AST from various source files the compiling that to bytecode (though the question could also work for cases where the bytecode is generated directly I guess).
From some experimentation, it looks like the debugger more or less just uses the lineno information embedded in the AST alongside the filename passed to compile in order to provide a representation for the debugger's purposes, however this assumes the code being executed comes from a single on-disk file.
That is not necessarily the case for my project, the executable code can be pieced together from multiple sources, and some or all of these sources may have been fetched over the network, or been retrieved from non-disk storage (e.g. database).
And so my Y questions, which may be the wrong ones (hence the background):
is it possible to provide a memory buffer of some sort, or is it necessary to generate a singular on-disk representation of the "virtual source"?
how well would the debugger deal with jumping around between the different bits and pieces if the virtual source can't or should not be linearised[0]
and just in case, is the assumption of Python only supporting a single contiguous source file correct or can it actually be fed multiple sources somehow?
[0] for instance a web-style literate program would be debugged in its original form, jumping between the code sections, not in the so-called "tangled" form

Some of this can be handled by the trepan3k debugger. For other things various hooks are in place.
First of all it can debug based on bytecode alone. But of course stepping instructions won't be possible if the line number table doesn't exist. And for that reason if for no other, I would add a "line number" for each logical stopping point, such as at the beginning of statements. The numbers don't have to be line numbers, they could just count from 1 or be indexes into some other table. This is more or less how go's Pos type position works.
The debugger will let you set a breakpoint on a function, but that function has to exist and when you start any python program most of the functions you define don't exist. So the typically way to do this is to modify the source to call the debugger at some point. In trepan3k the lingo for this is:
from trepan.api import debug; debug()
Do that in a place where the other functions you want to break on and that have been defined.
And the functions can be specified as methods on existing variables, e.g. self.my_function()
One of the advanced features of this debugger is that will decompile the bytecode to produce source code. There is a command called deparse which will show you the context around where you are currently stopped.
Deparsing bytecode though is a bit difficult so depending on which kind of bytecode you get the results may vary.
As for the virtual source problem, well that situation is somewhat tolerated in the debugger, since that kind of thing has to go on when there is no source. And to facilitate this and remote debugging (where the file locations locally and remotely can be different), we allow for filename remapping.
Another library pyficache is used to for this remapping; it has the ability I believe remap contiguous lines of one file into lines in another file. And I think you could use this over and over again. However so far there hasn't been need for this. And that code is pretty old. So someone would have to beef up trepan3k here.
Lastly, related to trepan3k is a trepan-xpy which is a CPython bytecode debugger which can step bytecode instructions even when the line number table is empty.

Use user provided python code during runtime

I'm developing a system that operates on (arbitrary) data from databases. The data may need some preprocessing before the system can work with it. To allow the user the specify possibly complex rules I though of giving the user the possibility to input Python code which is used to do this task. The system is pure Python.
My plan is to introduce the tables and columns as variables and let the user to anything Python can do (including access to the standard libs). Now to my problem:
How do I take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output. I think the easiest way would be to use the user-entered data a the body of a method and take the return value of that function a my new data.
Is this possible? If yes, how? It's unimportant that the user may enter malicious code since the worst thing that could happen is, that he screws up his own system, which is thankfully not my problem ;)

Python provides an exec() statement which should do what you want. You will want to pass in the variables that you want available as the second and/or third arguments to the function (globals and locals respectively) as those control the environment that the exec is run in.
For example:
env = {'somevar': 'somevalue'}
exec(code, env)
Alternatively, execfile() can be used in a similar way, if the code that you want executed is stored in its own file.
If you only have a single expression that you want to execute, you can also use eval.

Is this possible?
If it doesn't involve time travel, anti-gravity or perpetual motion the answer to this question is always "YES". You don't need to ask that.
The right way to proceed is as follows.
You build a framework with some handy libraries and packages.
You build a few sample applications that implement this requirement: "The data may need some preprocessing before the system can work with it."
You write documentation about how that application imports and uses modules from your framework.
You turn the framework, the sample applications and the documentation over to users to let them build these applications.
Don't waste time on "take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output".
The user should write applications like this.
from your_framework import the_file_loop
def their_function( one_line_as_dict ):
one_line_as_dict['field']= some stuff
the_file_loop( their_function )
That can actually be the entire program.
You'll have to write the_file_loop, which will look something like this.
def the_file_loop( some_function ):
with open('input') as source:
with open('output') as target:
for some_line in source:
the_data = make_a_dictionary( some_line )
some_function( the_data )
target.write( make_a_line( the_data ) )
By creating a framework, and allowing users to write their own programs, you'll be a lot happier with the results. Less magic.

2 choices:
You take his input and put it in a file, then you execute it.
You use exec()

If you just want to set some local values and then provide a python shell, check out the code module.
You can start an instance of a shell that is similar to the python shell, as well as initialize it with whatever local variables you want. This would assume that whatever functionality you want to use the resulting values is built into the classes you are passing in as locals.
Example:
shell = code.InteractiveConsole({'foo': myVar1, 'bar': myVar2})

What you actually want is exec, since eval is limited to taking an expression and returning a value. With exec, you can have code blocks (statements) and work on arbitrarily complex data, passed in as the globals and locals of the code.
The result is then returned by the code via some convention (like binding it to result).

well, you're describing compile()
But... I think I'd still implement this using regular python source files. Add a special location to the path, say '~/.myapp/plugins', and just __import__ everything there. Probably you'll want to provide some convenient base classes that expose the interface you're trying to offer, so that your users can inherit from them.

What's the best way to record the type of every variable assignment in a Python program?

Python is so dynamic that it's not always clear what's going on in a large program, and looking at a tiny bit of source code does not always help. To make matters worse, editors tend to have poor support for navigating to the definitions of tokens or import statements in a Python file.
One way to compensate might be to write a special profiler that, instead of timing the program, would record the runtime types and paths of objects of the program and expose this data to the editor.
This might be implemented with sys.settrace() which sets a callback for each line of code and is how pdb is implemented, or by using the ast module and an import hook to instrument the code, or is there a better strategy? How would you write something like this without making it impossibly slow, and without runnning afoul of extreme dynamism e.g side affects on property access?

I don't think you can help making it slow, but it should be possible to detect the address of each variable when you encounter a STORE_FAST STORE_NAME STORE_* opcode.
Whether or not this has been done before, I do not know.
If you need debugging, look at PDB, this will allow you to step through your code and access any variables.
import pdb
def test():
print 1
pdb.set_trace() # you will enter an interpreter here
print 2

What if you monkey-patched object's class or another prototypical object?
This might not be the easiest if you're not using new-style classes.

You might want to check out PyChecker's code - it does (i think) what you are looking to do.

Pythoscope does something very similar to what you describe and it uses a combination of static information in a form of AST and dynamic information through sys.settrace.
BTW, if you have problems refactoring your project, give Pythoscope a try.

Running unexported .dll functions with python

This may seem like a weird question, but I would like to know how I can run a function in a .dll from a memory 'signature'. I don't understand much about how it actually works, but I needed it badly. Its a way of running unexported functions from within a .dll, if you know the memory signature and adress of it.
For example, I have these:
respawn_f "_ZN9CCSPlayer12RoundRespawnEv"
respawn_sig "568BF18B06FF90B80400008B86E80D00"
respawn_mask "xxxxx?xxx??xxxx?"
And using some pretty nifty C++ code you can use this to run functions from within a .dll.
Here is a well explained article on it:
http://wiki.alliedmods.net/Signature_Scanning
So, is it possible using Ctypes or any other way to do this inside python?

If you can already run them using C++ then you can try using SWIG to generate python wrappers for the C++ code you've written making it callable from python.
http://www.swig.org/
Some caveats that I've found using SWIG:
Swig looks up types based on a string value. For example
an integer type in Python (int) will look to make sure
that the cpp type is "int" otherwise swig will complain
about type mismatches. There is no automatic conversion.
Swig copies source code verbatim therefore even objects in the same namespace
will need to be fully qualified so that the cxx file will compile properly.
Hope that helps.

You said you were trying to call a function that was not exported; as far as I know, that's not possible from Python. However, your problem seems to be merely that the name is mangled.
You can invoke an arbitrary export using ctypes. Since the name is mangled, and isn't a valid Python identifier, you can use getattr().
Another approach if you have the right information is to find the export by ordinal, which you'd have to do if there was no name exported at all. One way to get the ordinal would be using dumpbin.exe, included in many Windows compiled languages. It's actually a front-end to the linker, so if you have the MS LinK.exe, you can also use that with appropriate commandline switches.
To get the function reference (which is a "function-pointer" object bound to the address of it), you can use something like:
import ctypes
func = getattr(ctypes.windll.msvcrt, "##myfunc")
retval = func(None)
Naturally, you'd replace the 'msvcrt' with the dll you specifically want to call.
What I don't show here is how to unmangle the name to derive the calling signature, and thus the arguments necessary. Doing that would require a demangler, and those are very specific to the brand AND VERSION of C++ compiler used to create the DLL.
There is a certain amount of error checking if the function is stdcall, so you can sometimes fiddle with things till you get them right. But if the function is cdecl, then there's no way to automatically check. Likewise you have to remember to include the extra this parameter if appropriate.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to access the variables of C++ program from gdb with Python - python

The gdb.parse_and_eval() should do exactly what you want for live-process debugging. Documentation. I'd like to iterate through some arrays an so on. The parse_and_eval returns a gdb.Value. Once you have that value, you can use any of the Values methods for further access. Example.

Related

Interpret Python bytecode in C# (with fine control)

What is the best or proper way to allow debugging of generated code?

Use user provided python code during runtime

What's the best way to record the type of every variable assignment in a Python program?

Running unexported .dll functions with python

Categories

Resources