Parsing C in python with libclang but generated the wrong AST

I want to use the libclang Python bindings to generate the AST of some C code. The source code is shown below.
#include <stdlib.h>
#include "adlist.h"
#include "zmalloc.h"
list *listCreate(void)
{
    struct list *list;
    if ((list = zmalloc(sizeof(*list))) == NULL)
        return NULL;
    list->head = list->tail = NULL;
    list->len = 0;
    list->dup = NULL;
    list->free = NULL;
    list->match = NULL;
    return list;
}
And here is the script I wrote:
#!/usr/bin/python
# vim: set fileencoding=utf-8
import clang.cindex
import asciitree
import sys
def node_children(node):
    return (c for c in node.get_children() if c.location.file.name == sys.argv[1])
def print_node(node):
    text = node.spelling or node.displayname
    kind = str(node.kind)[str(node.kind).index('.')+1:]
    return '{} {}'.format(kind, text)
if len(sys.argv) != 2:
    print("Usage: dump_ast.py [header file name]")
    sys.exit()
clang.cindex.Config.set_library_file('/usr/lib/llvm-3.6/lib/libclang-3.6.so')
index = clang.cindex.Index.create()
translation_unit = index.parse(sys.argv[1], ['-x', 'c++', '-std=c++11', '-D__CODE_GENERATOR__'])
print(asciitree.draw_tree(translation_unit.cursor, node_children, print_node))
But the final output of this test looks like this:
TRANSLATION_UNIT adlist.c
  +--FUNCTION_DECL listCreate
     +--COMPOUND_STMT
        +--DECL_STMT
           +--STRUCT_DECL list
           +--VAR_DECL list
              +--TYPE_REF struct list
Obviously, the result is wrong: much of the code is left unparsed. I have tried traversing the translation unit myself, but the result is just what the tree shows; many nodes are missing. Why is that? And is there any way to solve the problem? Thank you!
I guess the reason is that libclang is unable to parse malloc(), because neither has stdlib been included in this code nor has a user-defined definition of malloc been provided.

The parse did not complete successfully, probably because some include paths are missing.
You can confirm what the exact problem is by printing the diagnostic messages:
translation_unit = index.parse(sys.argv[1], args)  # args: the list of compiler flags
for diag in translation_unit.diagnostics:
    print(diag)
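A slightly fuller sketch of the same check follows. It labels each diagnostic with its severity and shows one way to build the argument list with include paths. The helper names (build_args, describe, has_errors, dump_diagnostics), the include directory "./src" and the choice of "-x c" are assumptions for illustration, not part of the original answer. The severity numbers are the integer levels clang.cindex.Diagnostic uses (Ignored = 0 through Fatal = 4).

```python
# Integer severity levels as defined on clang.cindex.Diagnostic.
SEVERITY_NAMES = {0: "ignored", 1: "note", 2: "warning", 3: "error", 4: "fatal"}


def build_args(include_dirs, language="c"):
    """Build a libclang argument list; include_dirs is a hypothetical list of -I paths."""
    args = ["-x", language]
    for directory in include_dirs:
        args.append("-I" + directory)
    return args


def describe(diag):
    """Format one diagnostic; works on any object with severity, location and spelling."""
    loc = diag.location
    filename = loc.file.name if loc.file is not None else "<unknown>"
    level = SEVERITY_NAMES.get(diag.severity, str(diag.severity))
    return "{}:{}:{}: {}: {}".format(filename, loc.line, loc.column, level, diag.spelling)


def has_errors(diagnostics):
    """True if any diagnostic is an error or a fatal error (severity >= 3)."""
    return any(d.severity >= 3 for d in diagnostics)


def dump_diagnostics(path, include_dirs):
    """Parse `path` and print every diagnostic; requires the libclang bindings."""
    import clang.cindex  # imported lazily so the helpers above work without libclang

    index = clang.cindex.Index.create()
    tu = index.parse(path, build_args(include_dirs))
    for diag in tu.diagnostics:
        print(describe(diag))
    if has_errors(tu.diagnostics):
        print("The AST is probably incomplete; fix the errors above first.")
    return tu
```

A call such as dump_diagnostics("adlist.c", ["./src"]) would then list any headers libclang could not find. Passing "-x c" instead of "-x c++" is a deliberate choice here because the input in the question is C.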

Related

Python ctypes throws TypeError: one character bytes, bytearray or integer expected for functions with pointer references

I am new to both Python and the ctypes module. I am trying to call a C++ function by loading a shared library. Here is the prototype of the function I want to call.
void foo_func(const char *binary, size_t binsz, size_t memsz, void *params, size_t paramssz, void *settings);
And here is the code I have written to call this function.
import ctypes
import pathlib
class virt_buff(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]
if __name__ == "__main__":
    libname = pathlib.Path().absolute() / "build/libfoo.so"
    c_lib = ctypes.CDLL(libname)
    func_param = virt_buff(7, 0)
    with open("build/fib.bin", mode='rb') as file: # b is important -> binary
        binary = file.read()
    c_lib.foo_func(ctypes.c_char(binary), file.tell(), 0x9000 + (file.tell() & ~0xFFF), func_param, 4, NULL)
But when I run this code, it gives me the following output.
Traceback (most recent call last):
  File "ct_test.py", line 32, in <module>
    c_lib.foo_func(ctypes.c_char(binary), file.tell(), 0x9000 + (file.tell() & ~0xFFF), virt_param, 4, NULL)
TypeError: one character bytes, bytearray or integer expected
I tried a lot of things and nothing seems to work. Can anyone help me find out what the actual problem is?

The first error message is due to trying to pass a single character (c_char) and initializing it with a multiple character data blob. You want c_char_p for C char* instead. But there are other issues after that. Declaring .argtypes and .restype for your function correctly will help ctypes catch errors.
Make the following additional commented fixes:
import ctypes as ct
import pathlib
class virt_buff(ct.Structure):
    _fields_ = [("x", ct.c_int), ("y", ct.c_int)]
if __name__ == "__main__":
    libname = pathlib.Path().absolute() / "build/libfoo.so"
    # Declare argument types and result type that match the C call for ctypes to error check
    # void foo_func(const char *binary, size_t binsz, size_t memsz, void *params, size_t paramssz, void *settings);
    c_lib = ct.CDLL(str(libname)) # on Windows, CDLL(libname) failed to accept WindowsPath.
    c_lib.foo_func.argtypes = ct.c_char_p, ct.c_size_t, ct.c_size_t, ct.c_void_p, ct.c_size_t, ct.c_void_p
    c_lib.foo_func.restype = None
    func_param = virt_buff(7, 0)
    with open("build/fib.bin", mode='rb') as file:
        binary = file.read()
        # Indent under with, so file.tell() is in scope,
        # Pass address of structure using ct.byref() and size with ct.sizeof()
        # Use Python None instead of C NULL.
        c_lib.foo_func(binary, file.tell(), 0x9000 + (file.tell() & ~0xFFF), ct.byref(func_param), ct.sizeof(func_param), None)

Your parameter is almost certainly supposed to be a POINTER to char, and not just a single char:
c_lib.foo_func(ctypes.c_char_p(binary), ...
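The difference between the two types can be checked without the asker's library. The sketch below uses only ctypes itself; virt_buff is copied from the question, and the short byte string is a stand-in for the contents of fib.bin (an assumption for illustration).

```python
import ctypes as ct


class virt_buff(ct.Structure):
    _fields_ = [("x", ct.c_int), ("y", ct.c_int)]


blob = b"\x01\x02\x03\x04"  # stand-in for the bytes read from the binary file

# c_char holds exactly one byte, so a longer blob reproduces the TypeError.
try:
    ct.c_char(blob)
    c_char_error = None
except TypeError as exc:
    c_char_error = str(exc)

# c_char_p is a pointer to a NUL-terminated byte string.
as_pointer = ct.c_char_p(b"abc")

# A buffer created from the bytes keeps every byte, including embedded NULs.
blob_buffer = ct.create_string_buffer(blob, len(blob))

# A structure goes to a void* parameter by reference, with its size from sizeof().
param = virt_buff(7, 0)
param_ref = ct.byref(param)
param_size = ct.sizeof(param)

print(c_char_error)
print(as_pointer.value, blob_buffer.raw, param_size)
```

Hard-coding 4 as the parameter size, as the question does, only matches a structure with a single int; ct.sizeof() avoids that kind of mismatch.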

Return array in Python from shared library built in Go

I have the following Go file:
package main
import "C"
//export stringList
func stringList(a, b string) []string {
    var parserString []string
    parserString = append(parserString, a)
    parserString = append(parserString, b)
    return parserString
}
func main() {}
which I then build using go build -o stringlist.so -buildmode=c-shared stringlist.go
Then, I try to call it in Python using
from ctypes import *
lib = cdll.LoadLibrary("./stringlist.so")
lib.stringList.argtypes = [c_wchar_p, c_wchar_p]
lib.stringList("hello", "world")
but receive the error
panic: runtime error: cgo result has Go pointer
goroutine 17 [running, locked to thread]:
main._cgoexpwrap_0de9d34d4a40_stringList.func1(0xc00005ce90)
    _cgo_gotypes.go:46 +0x5c
main._cgoexpwrap_0de9d34d4a40_stringList(0x7f9fdfa8eec0, 0x7f9fddaffab0, 0x7f9fdd8f46cf, 0x7ffc501ce460, 0xc00000e040, 0x2, 0x2)
    _cgo_gotypes.go:48 +0x11b
Aborted (core dumped)
What is the problem? How can I fix it? Is stringList not returning a proper type?

As mentioned in the comments, you need to convert to C types before returning any value, which means that your function should return **C.char. The cgo documentation describes the functions that convert between *C.char and Go strings, such as C.GoString and C.CString.
Cgo Wiki
CGO Documentation
Here's a simple, albeit probably leaky, function. In cgo you are responsible for memory management; this particular function allocates the result memory itself, but if you know before calling the cgo function how much space you need, it's possible to preallocate it.
All it does is convert the provided strings to upper case.
package main
/*
#include <stdlib.h>
*/
import "C"
import (
    "fmt"
    "strings"
    "unsafe"
)
//export stringList
func stringList(a, b *C.char) **C.char {
    // convert to Go strings
    goStringA := C.GoString(a)
    goStringB := C.GoString(b)
    //... do something with the strings ...
    fmt.Println("CGO: ", goStringA)
    fmt.Println("CGO: ", goStringB)
    goStringA = strings.ToUpper(goStringA)
    goStringB = strings.ToUpper(goStringB)
    // Convert back to C strings.
    // These strings _WILL NOT_ be garbage collected,
    // so the caller may want to free them.
    // https://github.com/golang/go/wiki/cgo#go-strings-and-c-strings
    ra := C.CString(goStringA)
    rb := C.CString(goStringB)
    // Allocate memory for our result pointer
    resultMem := C.malloc(C.size_t(2) * C.size_t(unsafe.Sizeof(uintptr(0))))
    // Assign to the results var
    // https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
    result := (*[1<<30 - 1]*C.char)(resultMem)
    (*result)[0] = ra
    (*result)[1] = rb
    return (**C.char)(resultMem)
}
func main() {}
Because I'm on a macOS machine, I have to build it like so:
go build -o stringlist.dylib -buildmode=c-shared main.go
Then I can run it with the following Python code:
#!/usr/bin/env python3
from ctypes import *
lib = cdll.LoadLibrary('./stringlist.dylib')
lib.stringList.argtypes = [c_char_p, c_char_p]
lib.stringList.restype = POINTER(c_char_p)
result = lib.stringList("hello".encode(), "world".encode())
for word in result:
    if word:
        print(word.decode('utf-8'))
NOTE: The Python code produces a segfault; I'm guessing that's because I'm not using the ctypes module correctly, as I'm not that familiar with it. However, the following C code does not segfault.
#include <stdio.h>
#include <stdlib.h>
#include "stringlist.h"
#include <dlfcn.h>
int main(int n, char **args) {
    char lib_path[1000];
    sprintf(lib_path, "%s/stringlist.dylib", args[1]);
    void *handle = dlopen(lib_path, RTLD_LAZY);
    char** (*stringList)(char*, char*) = dlsym(handle, "stringList");
    const char *a = "hello";
    const char *b = "world";
    char **result = stringList(a, b);
    for (size_t i = 0; i < 2; i++) {
        printf("%s\n", result[i]);
    }
    return 0;
}
Build it like this:
clang main.c -o crun && ./crun
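A likely cause of the segfault noted above (an assumption, since the answer leaves it undiagnosed): a ctypes pointer has no length, so "for word in result" keeps indexing past the two entries that were allocated. The sketch below reproduces the memory layout in pure Python, with a ctypes array standing in for the char** that stringList would return, and reads exactly count entries by index. The helper name read_strings is made up for illustration.

```python
import ctypes as ct


def read_strings(ptr, count):
    """Read `count` C strings from a char** and decode them as UTF-8."""
    # Index explicitly: iterating over the pointer itself would never stop.
    return [ptr[i].decode("utf-8") for i in range(count)]


# Stand-in for the char** that lib.stringList(...) would return.
backing = (ct.c_char_p * 2)(b"HELLO", b"WORLD")
result = ct.cast(backing, ct.POINTER(ct.c_char_p))

words = read_strings(result, 2)
print(words)
```

With the real library the same helper would be called as read_strings(lib.stringList(b"hello", b"world"), 2). The strings and the result array come from C.CString and C.malloc, so they would still have to be freed on the C side.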

Python convert C header file to dict

I have a C header file which contains a series of classes, and I'm trying to write a function which will take those classes and convert them to a Python dict. A sample of the file is at the bottom.
Format would be something like
class CFGFunctions {
    class ABC {
        class AA {
            file = "abc/aa/functions"
            class myFuncName{ recompile = 1; };
        };
        class BB
        {
            file = "abc/bb/functions"
            class funcName{
                recompile=1;
            }
        }
    };
};
I'm hoping to turn it into something like
{CFGFunctions:{ABC:{AA:"myFuncName"}, BB:...}}
# Or
{CFGFunctions:{ABC:{AA:{myFuncName:"string or list or something"}, BB:...}}}
In the end, I'm aiming to get the filepath string (which is actually a path to a folder... but anyway), and the class names in the same class as the file/folder path.
I've had a look on SO, Google and so on, but most things I've found have been about splitting lines into dicts, rather than n-deep 'blocks'.
I know I'll have to loop through the file, however, I'm not sure the most efficient way to convert it to the dict.
I'm thinking I'd need to grab the outside class and its relevant brackets, then do the same for the text remaining inside.
If none of that makes sense, it's cause I haven't quite made sense of the process myself haha
If any more info is needed, I'm happy to provide.
The following code is a quick mockup of what I'm sorta thinking...
It is most likely BROKEN and probably does NOT WORK, but it's sort of the process that I'm thinking of.
def get_data():
    fh = open('CFGFunctions.h', 'r')
    data = {} # will contain final data model
    # would probably refactor some of this into a function to allow better looping
    start = "" # starting class name
    brackets = 0 # number of brackets
    text= "" # temp storage for lines inside block while looping
    for line in fh:
        # find the class (start
        mt = re.match(r'Class ([\w_]+) {', line)
        if mt:
            if start == "":
                start = mt.group(1)
            else:
                # once we have the first class, find all other open brackets
                mt = re.match(r'{', line)
                if mt:
                    # and inc our counter
                    brackets += 1
                mt2 = re.match(r'}', line)
                if mt2:
                    # find the close, and decrement
                    brackets -= 1
                # if we are back to the initial block, break out of the loop
                if brackets == 0:
                    break
                text += line
    data[start] = {'tempText': text}
====
Sample file
class CfgFunctions {
    class ABC {
        class Control {
            file = "abc\abc_sys_1\Modules\functions";
            class assignTracker {
                description = "";
                recompile = 1;
            };
            class modulePlaceMarker {
                description = "";
                recompile = 1;
            };
        };
        class Devices
        {
            file = "abc\abc_sys_1\devices\functions";
            class registerDevice { recompile = 1; };
            class getDeviceSettings { recompile = 1; };
            class openDevice { recompile = 1; };
        };
    };
};
EDIT:
If possible, if I have to use a package, I'd like to have it in the programs directory, not the general python libs directory.

As you detected, parsing is necessary to do the conversion. Have a look at the package PyParsing, which is a fairly easy-to-use library to implement parsing in your Python program.
Edit: This is a very schematic version of what it would take to recognize a very minimal grammar, somewhat like the example at the top of the question. It is not a finished parser, but it might put you in the right direction:
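from pyparsing import (Forward, Group, Keyword, Literal, Optional,
                       ParseException, QuotedString, Word, ZeroOrMore,
                       alphanums, alphas, nums)
test_code = """
class CFGFunctions {
    class ABC {
        class AA {
            file = "abc/aa/functions"
            class myFuncName{ recompile = 1; };
        };
        class BB
        {
            file = "abc/bb/functions"
            class funcName{
                recompile=1;
            }
        }
    };
};
"""
class_tkn = Keyword('class')
lbrace_tkn = Literal('{')
rbrace_tkn = Literal('}')
semicolon_tkn = Literal(';')
assign_tkn = Literal('=')
identifier = Word(alphas + '_', alphanums + '_')
value = QuotedString('"') | Word(nums)
# 'file = "..."' or 'recompile = 1;' (the trailing semicolon is optional in the sample)
assignment = Group(identifier + assign_tkn + value + Optional(semicolon_tkn))
# class_block refers to itself, so it has to be declared as a Forward first
class_block = Forward()
class_block <<= Group(class_tkn + identifier + lbrace_tkn +
                      ZeroOrMore(class_block | assignment) +
                      rbrace_tkn + Optional(semicolon_tkn))
def test_parser(test):
    try:
        results = class_block.parseString(test)
        print(test, ' -> ', results)
    except ParseException as s:
        print("Syntax error:", s)
def main():
    test_parser(test_code)
    return 0
if __name__ == '__main__':
    main()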
Also, this code is only the parser; it does not generate any output. As you can see in the PyParsing docs, you can later add the actions you want. But the first step would be to recognize what you want to translate.
And a last note: Do not underestimate the complexities of parsing code... Even with a library like PyParsing, which takes care of much of the work, there are many ways to get mired in infinite loops and other amenities of parsing. Implement things step-by-step!
EDIT: A few sources for information on PyParsing are:
http://werc.engr.uaf.edu/~ken/doc/python-pyparsing/HowToUsePyparsing.html
http://pyparsing.wikispaces.com/
(Particularly interesting is http://pyparsing.wikispaces.com/Publications, with a long list of articles - several of them introductory - on PyParsing)
http://pypi.python.org/pypi/pyparsing_helper is a GUI for debugging parsers
There is also a 'tag' Pyparsing here on stackoverflow, Where Paul McGuire (the PyParsing author) seems to be a frequent guest.
NOTE:
From PaulMcG in the comments below: Pyparsing is no longer hosted on wikispaces.com. Go to github.com/pyparsing/pyparsing
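If adding a dependency is not an option, the same idea can be sketched with the standard library alone. This is a hand-written recursive-descent parser, swapped in for PyParsing rather than taken from the answer. The function names (tokenize, convert, parse_block, parse_config) are made up for illustration, and the sketch assumes the file contains only "class Name { ... }" blocks and "key = value" assignments with optional semicolons, as in the samples in the question. Truncated input may raise IndexError instead of a friendly message.

```python
import re

# One token per match: quoted string, identifier/keyword, integer, or punctuation.
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|-?\d+|[{}=;]')


def tokenize(text):
    """Split the input into tokens; whitespace and anything unrecognised is skipped."""
    return TOKEN_RE.findall(text)


def convert(token):
    """Turn a value token into a Python str or int (bare words stay strings)."""
    if token.startswith('"'):
        return token[1:-1]
    try:
        return int(token)
    except ValueError:
        return token


def parse_block(tokens, pos):
    """Parse members up to the closing '}' or the end of input.

    Returns (dict of members, index of the token that ended the block).
    """
    members = {}
    while pos < len(tokens) and tokens[pos] != "}":
        token = tokens[pos]
        if token == ";":
            pos += 1  # trailing or stray semicolon
        elif token == "class":
            name = tokens[pos + 1]
            if tokens[pos + 2] != "{":
                raise ValueError("expected '{' after class " + name)
            body, pos = parse_block(tokens, pos + 3)
            if pos >= len(tokens):
                raise ValueError("missing '}' for class " + name)
            members[name] = body
            pos += 1  # step over the closing '}'
        else:
            if tokens[pos + 1] != "=":
                raise ValueError("expected '=' after " + token)
            members[token] = convert(tokens[pos + 2])
            pos += 3
    return members, pos


def parse_config(text):
    """Parse a whole file into nested dicts."""
    tokens = tokenize(text)
    members, pos = parse_block(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected '}'")
    return members


sample = '''
class CfgFunctions {
    class ABC {
        class Devices
        {
            file = "abc\\abc_sys_1\\devices\\functions";
            class registerDevice { recompile = 1; };
        };
    };
};
'''
print(parse_config(sample))
```

In the resulting dict, the folder path is the "file" entry of each inner class, and the function names are the remaining keys at the same level.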

Embedding python serial controller in C++

I've been searching around for several hours trying to get this code working and I just can't quite seem to get it.
I'm working on a function in C++ where I can call one of a number of Python scripts, which take variable numbers of arguments. The Python works, but I keep getting segfaults in my C++.
double run_python(motor_command command){
    //A routine that will run a python function that is in the same directory.
    Py_Initialize();
    PySys_SetPath(".");
    string pyName; //Declaration of the string and int
    int speed;
    if (command.action == READ){
        pyName = "read_encoders"; //Name of one python module
    }else{
        pyName = "drive_motor"; //Name of the other python module
        speed = command.speed; //struct
    }
    int board_address = command.board_address;
    int motor = command.motor_num;
    //PyObject* moduleName = PyString_FromString(pyName.c_str());
    // Py_INCREF(myModule);
    //PyObject* myFunction = PyObject_GetAttrString(myModule, "run"); //Both of these python functions have subroutine 'run'
    PyObject* args;
    if(command.action == READ){
        args = PyTuple_Pack(2,PyInt_FromLong(board_address),PyInt_FromLong(motor)); //Appropriate args for the read_encoders
    }else{
        args = PyTuple_Pack(3,PyInt_FromLong(board_address),PyInt_FromLong(motor), PyInt_FromLong(speed)); //Appropriate args for the drive_motor
    }
    Py_INCREF(args);
    cout << "I got here" << endl;
    PyObject* myModule = PyImport_Import((char*)pyName.c_str());//Python interface
    cout << "args = " << args << " modlue = " << myModule << endl;
    //Py_INCREF(myModule);
    PyObject* myResult = PyObject_CallObject(myModule, args); //Run it and store the result in myResult
    Py_INCREF(myResult);
    double result = PyFloat_AsDouble(myResult);
    Py_DECREF(myResult);
    return result;
}
So far, what I can figure out is that somehow myModule is not getting imported correctly and is NULL. As a result, when I attempt the _CallObject, it throws a segfault and I'm up a creek. When I uncomment the Py_INCREF for myModule, it throws a segfault there, so I guess that I'm not importing my Python code correctly.
Oh, useful information: OS: Angstrom Linux, on a MinnowBoard (x86 architecture).
General structure of the python program:
import sys
import serial
board_num = sys.argv[1]
motor = sys.argv[2]
speed = sys.argv[3]
def run(board_num, motor, speed):
    # Command arguments: Board number (0x80, 0x81...), motor number (0 or 1) and speed(2's complement signed integer)
    ser = serial.Serial('/dev/ttyPCH1', 38400)
    motor_min = 0
    motor_max = 1 # These are the two acceptable values for motor enumerated values.
    e_code = -1 # Error code
    try:
        board_num = int(board_num, 0)
    except:
        print "Invalid address format: Must be a number"
        exit(e_code)
    try:
        motor = int(motor, 0)
    except:
        print "Motor must be either motor 0 or 1. Or possibly one or two..."
        exit(e_code)
    try:
        speed = int(speed, 0)
    except:
        print "Motor speed must be an integer."
        exit(e_code)
    #print board_num
Thank you in advance! If you have any alternative ways to get this working in the same way, I'm open for suggestions!

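Try this code to append . to your sys.path:
PyObject *sys_path;
PyObject *path;
sys_path = PySys_GetObject("path"); /* borrowed reference */
path = PyString_FromString(".");
if (PyList_Append(sys_path, path) < 0)
    PyErr_Print(); /* report why the append failed */
Py_DECREF(path); /* the list now holds its own reference */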
source: http://www.gossamer-threads.com/lists/python/dev/675857
OLD:
First try to execute your Python script alone, with python on the command line.
It is harder to debug Python errors from a C/C++ program. Did you install pySerial?

gdb-python : Parsing structure's each field and print them with proper value, if exists

I am writing a Python script to automate debugging a core dump from gdb. I am trying to print a data structure that includes kernel data structures and lists (e.g. struct list_head). For example, the structure is something like this:
struct my_struct {
    struct my_hardware_context ahw;
    struct net_device *netdev;
    struct pci_dev *pdev;
    struct list_head mac_list;
    ....
    ....
};
I am using the following API to print this structure:
gdb.execute('p (*(struct my_struct *)dev_base->priv)')
So I am able to print the content of 'struct my_struct' and struct my_hardware_context ahw, but not the content of the pointers and the list (e.g. struct net_device *netdev, struct pci_dev *pdev, struct list_head mac_list) automatically; only the address is printed. How can I print the content of *netdev, *pdev and mac_list using a gdb-python script?
EDITED : to make my question more clear
I am writing a Python script to automate debugging a core dump from gdb. I am trying to print a data structure that includes kernel data structures and lists (e.g. struct list_head). For example, the structure is something like this:
struct my_struct {
    struct my_hardware_context ahw;
    struct net_device *netdev;
    struct pci_dev *pdev;
    struct list_head mac_list;
    ....
    ....
};
I am using the following API to print this structure (it can be assumed that I have the right core dump and have added the proper symbols):
main_struct = gdb.execute('p (*(struct my_struct *)dev_base->priv)')
print main_struct
Now it will print the values of all members of struct my_struct, but only one level deep: it prints the whole content of struct my_hardware_context ahw because it is an embedded instance, but it does not print the content of struct net_device *netdev, struct pci_dev *pdev, struct list_head mac_list, etc. So now I need to do it manually, like below:
netdev = gdb.parse_and_eval('*(*(struct my_struct *)dev_base->next->priv).netdev')
print netdev
pdev = gdb.parse_and_eval('*(*(struct my_struct *)dev_base->next->priv).pdev')
print pdev
So I want to automate these steps. Is there any gdb-python API or other way to iterate over struct my_struct and also print the values of the pointers, arrays and lists automatically?
Thanks.

struct net_device, struct pci_dev from Linux are meant to be used by kernel and not userspace code. They're not even exported in the sanitized kernel headers you get with make headers_install for use with libc.
GDB can't print struct net_device, struct pci_dev because it doesn't have debug info describing the definition of those structures. Your userspace struct my_struct is declared to have opaque pointers to those structures. I don't think you should be doing that in the first place.
Edit After Core Dump Clarification
The trick is loading debug info from both the kernel and your driver module into GDB:
Grab a kernel with debuginfo (CONFIG_DEBUG_INFO). e.g. for Centos, get the matching kernel-debuginfo package from http://debuginfo.centos.org/6/x86_64/.
Get the .text, .data and .bss load addresses of your driver module by inspecting /sys/module/MY-DRIVER/sections/{.text,.data,.bss} from a system running your driver under normal operation.
Assuming the kernel with debug info is located at /usr/lib/debug/lib/modules/3.9.4-200.fc18.x86_64/vmlinux, run:
$ gdb /usr/lib/debug/lib/modules/3.9.4-200.fc18.x86_64/vmlinux vmcore
(gdb) add-symbol-file MY-DRIVER.ko TEXT-ADDR -s .data DATA-ADDR -s .bss BSS-ADDR
while replacing TEXT-ADDR, DATA-ADDR and BSS-ADDR with the addresses from the files under /sys/module/MY-DRIVER/sections/. (I think just lying and using an address of 0 would probably work in this case.)
Verify that ptype struct net_device, ptype struct pci_dev and ptype struct my_struct work. Then, after obtaining the address of a struct my_struct * the way you did before, you should be able to print its contents.
Traversing a Struct While Following Pointers
print-struct-follow-pointers.py
import gdb
def is_container(v):
    c = v.type.code
    return (c == gdb.TYPE_CODE_STRUCT or c == gdb.TYPE_CODE_UNION)
def is_pointer(v):
    return (v.type.code == gdb.TYPE_CODE_PTR)
def print_struct_follow_pointers(s, level_limit = 3, level = 0):
    indent = ' ' * level
    if not is_container(s):
        gdb.write('%s\n' % (s,))
        return
    if level >= level_limit:
        gdb.write('%s { ... },\n' % (s.type,))
        return
    gdb.write('%s {\n' % (s.type,))
    for k in s.type.keys():
        v = s[k]
        if is_pointer(v):
            gdb.write('%s %s: %s' % (indent, k, v))
            try:
                v1 = v.dereference()
                v1.fetch_lazy()
            except gdb.error:
                gdb.write(',\n')
                continue
            else:
                gdb.write(' -> ')
            print_struct_follow_pointers(v1, level_limit, level + 1)
        elif is_container(v):
            gdb.write('%s %s: ' % (indent, k))
            print_struct_follow_pointers(v, level_limit, level + 1)
        else:
            gdb.write('%s %s: %s,\n' % (indent, k, v))
    gdb.write('%s},\n' % (indent,))
class PrintStructFollowPointers(gdb.Command):
    '''
    print-struct-follow-pointers [/LEVEL_LIMIT] STRUCT-VALUE
    '''
    def __init__(self):
        super(PrintStructFollowPointers, self).__init__(
            'print-struct-follow-pointers',
            gdb.COMMAND_DATA, gdb.COMPLETE_SYMBOL, False)
    def invoke(self, arg, from_tty):
        s = arg.find('/')
        if s == -1:
            (expr, limit) = (arg, 3)
        else:
            if arg[:s].strip():
                (expr, limit) = (arg, 3)
            else:
                i = s + 1
                for (i, c) in enumerate(arg[s+1:], s + 1):
                    if not c.isdigit():
                        break
                end = i
                digits = arg[s+1:end]
                try:
                    limit = int(digits)
                except ValueError:
                    raise gdb.GdbError(PrintStructFollowPointers.__doc__)
                (expr, limit) = (arg[end:], limit)
        try:
            v = gdb.parse_and_eval(expr)
        except gdb.error as e:
            # 'as e' and str(e) work on both Python 2.6+ and Python 3 builds of gdb
            raise gdb.GdbError(str(e))
        print_struct_follow_pointers(v, limit)
PrintStructFollowPointers()
Sample Session
(gdb) source print-struct-follow-pointers.py
(gdb) print-struct-follow-pointers *p
You can limit the levels of embedded structures printed:
(gdb) print-struct-follow-pointers/4 *p
