Unit test C-generating python code - python

I have a project which involves some (fairly simple) C-code generation as part of a build system. In essence, I have some version information associated with a project (embedded C) which I want to expose in my binary, so that I can easily determine what firmware version was programmed to a particular device for debugging purposes.
I'm writing some simplistic python tools to do this, and I want to make sure they're thoroughly tested. In general, this has been fairly straightforward, but I'm unsure what the best strategy is for the code-generation portion. Essentially, I want to make sure that the generated files both:
Are syntactically correct
Contain the necessary information
The second, I can (I believe) achieve to a reasonable degree with regex matching. The first, however, is something of a bigger task. I could probably use something like pycparser and examine the resulting AST to accomplish both goals, but that seems like an unnecessarily heavyweight solution.
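For the second point, a rough sketch of the kind of check I have in mind (the file name, macro name, and expected version below are only placeholders for whatever my generator actually emits):
# content check sketch -- "version.c", VERSION_STRING and the expected
# value stand in for the real generator output
import re
import unittest

class TestGeneratedContent(unittest.TestCase):
    def test_version_macro_present(self):
        with open("version.c") as f:
            source = f.read()
        # expect a line like: #define VERSION_STRING "1.2.3"
        match = re.search(r'#define\s+VERSION_STRING\s+"([0-9.]+)"', source)
        self.assertIsNotNone(match)
        self.assertEqual(match.group(1), "1.2.3")

if __name__ == "__main__":
    unittest.main()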
Edit: A dataflow diagram of my build hierarchy

Thanks for the diagram! Since you are not testing for coverage, if it were me, I would just compile the generated C code and see if it worked :). You didn't mention your toolchain, but in a Unix-like environment, gcc <whatever build flags> -c generated-file.c || echo 'Oops!' should be sufficient.
Now, it may be that the generated code isn't a freestanding compilation unit. No problem there: write a shim. Example shim.c:
#include <stdio.h>
#include "generated-file.c"
int main(void) {
    printf("%s\n", GENERATED_VERSION); // or whatever is in generated-file.c
    return 0;
}
Then gcc -o shim shim.c && diff <(./shim) "name of a file holding the expected output" || echo 'Oops!' should give you a basic test. (The <() is bash process substitution.) The file holding the expected results may already be in your git repo, or you might be able to use your Python routine to write it to disk somewhere.
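If you would rather drive that from your Python test suite than from a shell one-liner, here is a rough sketch (the shim name, the generated file, and the expectations file are assumptions about your layout); it runs under pytest or can be called directly:
# compile the shim (which #includes the generated file), run it, and
# compare its output against a checked-in expectations file
import subprocess

def test_generated_version_output():
    subprocess.check_call(["gcc", "-o", "shim", "shim.c"])
    output = subprocess.check_output(["./shim"])
    with open("expected_output.txt", "rb") as f:
        assert output == f.read()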
Edit 2 This approach can work even if your actual toolchain isn't amenable to automation. To test syntactic validity of your code, you can use gcc even if you are using a different compiler for your target processor. For example, compiling with gcc -ansi will disable a number of GNU extensions, which means code that compiles with gcc -ansi is more likely to compile on another compiler than is code that compiles with full-on, GNU-extended gcc. See the gcc page on "C Dialect Options" for all the different flavors you can use (ditto C++).
Edit Incidentally, this is the same approach GNU autoconf uses: write a small test program to disk (autoconf calls it conftest.c), compile it, and see if the compilation succeeded. The test program is (preferably) the bare minimum necessary to test if everything is OK. Depending on how complicated your Python is, you might want to test several different aspects of your generated code with respective, different shims.
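A sketch of that idea in Python: write a minimal conftest-style shim to a temporary directory and simply check that it compiles, optionally with the stricter dialect flags mentioned above (the shim body and the flags are only examples):
# autoconf-style probe: does a minimal program that pulls in the
# generated file compile cleanly?
import os
import subprocess
import tempfile

SHIM = '#include "generated-file.c"\nint main(void) { return 0; }\n'

def compiles(flags=("-ansi", "-pedantic")):
    with tempfile.TemporaryDirectory() as tmp:
        conftest = os.path.join(tmp, "conftest.c")
        with open(conftest, "w") as f:
            f.write(SHIM)
        # -I. so the #include finds the generated file in the working dir
        cmd = ["gcc", *flags, "-I.", "-c", conftest,
               "-o", os.path.join(tmp, "conftest.o")]
        return subprocess.call(cmd) == 0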

Related

gdb corefile generated by C++ segfault and pybinder with no symbols

context: I have a program which runs on a server and segfaults several times a month. The program is a Python program which uses a library implemented in C++ and exposed by pybinder.
I am able to capture the corefile on the server, and I have the source code (both the C++ and Python parts). How can I get the segfault stack trace?
Several things I have tried:
build the source code (C++ part) with the -g3 option. From my understanding, it should produce the same binary and addresses as the one running on the server. The only difference should be the symbol table (and possibly a few other sections in the ELF).
I tried to run gdb -ex r bazel-bin/username/coredump/capture_corefile /tmp/test_coredump/corefile.python.3861066.
bazel-bin/username/coredump/capture_corefile is the Python script with the C++ part rebuilt with its symbol table.
/tmp/test_coredump/corefile.python.3861066 is the corefile I have collected.
But it shows
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f58ca51332b in ?? ()
Starting program:
No executable file specified.
I tried to directly get the line of code by llvm-symbolizer.
With the Python script as the object, it fails directly:
desktop$ llvm-symbolizer --obj=bazel-bin/username/coredump/capture_corefile 0x00007f58ca51332b
LLVMSymbolizer: error reading file: The file was not recognized as a valid object file
??
??:0:0
With the shared object, it also fails:
desktop$ llvm-symbolizer --obj=bazel-bin/username/coredump/coredump_pybind.so 0x00007f58ca51332b
_fini
??:0:0
I confirmed that the symbol table is not stripped:
file bazel-bin/username/coredump/coredump_pybind.so
bazel-bin/experimental/hjiang/coredump/coredump_pybind.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[md5/uuid]=baf3b4d9a8f7955b5db6b977843e2eb0, not stripped
Does anyone know how to get the stack trace with everything I have?
build the source code (C++ part) with the -g3 option. From my understanding, it should produce the same binary and addresses as the one running on the server.
This is by no means guaranteed, and actually pretty hard to achieve with GCC.
You didn't mention the compiler you use, but if it is Clang (implied by your later use of llvm-symbolizer), then note that Clang currently doesn't produce the same code with and without -g.
In addition, to make this work, you need to keep all the flags originally used (including all optimization flags) -- it's no good to replace -O2 with -g3 -- the binary will be vastly different.
You can check whether your rebuilt library is any good by running nm original.so and nm replacement.so, and comparing the addresses of any symbols which appear in both outputs. The replacement.so is usable IFF all common symbols' addresses match.
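If you want to automate that comparison, here is a small sketch (it just parses nm output and diffs the addresses of the symbols both files define):
# usage: python3 compare_syms.py original.so replacement.so
import subprocess
import sys

def symbols(path):
    out = subprocess.check_output(["nm", "--defined-only", path], text=True)
    table = {}
    for line in out.splitlines():
        parts = line.split()
        if len(parts) == 3:            # "<address> <type> <name>"
            addr, _kind, name = parts
            table[name] = addr
    return table

orig = symbols(sys.argv[1])            # e.g. original.so
repl = symbols(sys.argv[2])            # e.g. replacement.so
common = orig.keys() & repl.keys()
mismatched = [name for name in common if orig[name] != repl[name]]
if mismatched:
    print("address mismatch for %d common symbols" % len(mismatched))
else:
    print("replacement looks usable")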
The best practice here is to build the .so with optimization and debug info (e.g. gcc ... -g3 -O2 ...), keep that binary for future debugging, but send a stripped binary to the server. That way you are guaranteed to have the exact binary you need if/when the stripped binary crashes.
gdb -ex r bazel.../corefile
The above command asks gdb to run a core file, which makes no sense.
Whatever you tried to achieve here, that isn't the right way to do it.
Also, GDB (in general) can't help if you give it only the core -- for most tasks you also need the binary which produced that core.
Your first step should be to get a crash stack trace, as described e.g. here. Once you have something that looks like a reasonable stack trace, you could try swapping in the full-debug version of the .so.
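For reference, that usually boils down to something like the following. The first argument must be the binary that actually ran on the server; for a Python program that is typically the Python interpreter itself, not the wrapper script, and the path here is only a placeholder:
gdb /path/to/the/interpreter/used/on/the/server /tmp/test_coredump/corefile.python.3861066
(gdb) bt
(gdb) thread apply all bt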

Bundle all dependencies in Cython compile [duplicate]

I am trying to make one unix executable file from my python source files.
I have two files, p1.py and p2.py.
p1.py :-
from p2 import test_func
print (test_func())
p2.py :-
def test_func():
return ('Test')
Now, as we can see, p1.py depends on p2.py. I want to make an executable file by combining the two files. I am using Cython.
I changed the file names to p1.pyx and p2.pyx respectively.
Now, I can make the file executable by using Cython:
cython p1.pyx --embed
It will generate a C source file called p1.c. Next we can use gcc to make it executable:
gcc -Os -I /usr/include/python3.5m -o test p1.c -lpython3.5m -lpthread -lm -lutil -ldl
But how do I combine the two files into one executable?
People are tempted to do this because it's fairly easy to do for the simplest case (one module, no dependencies). @ead's answer is good, but honestly pretty fiddly, and it handles only the next simplest case (two modules that you have complete control of, no dependencies).
In general a Python program will depend on a range of external modules. Python comes with a large standard library which most programs use to an extent. There's a wide range of third party libraries for maths, GUIs, web frameworks. Even tracing those dependencies through the libraries and working out what you need to build is complicated, and tools such as PyInstaller attempt it but aren't 100% reliable.
When you're compiling all these Python modules you're likely to come across a few Cython incompatibilities/bugs. It's generally pretty good, but struggles with features like introspection, so it's unlikely a large project will compile cleanly and entirely.
On top of that many of those modules are compiled modules written either in C, or using tools such as SWIG, F2Py, Cython, boost-python, etc.. These compiled modules may have their own unique idiosyncrasies that make them difficult to link together into one large blob.
In summary, it may be possible, but for non-trivial programs it is not a good idea however appealing it seems. Tools like PyInstaller and Py2Exe that use a much simpler approach (bundle everything into a giant zip file) are much more suitable for this task (and even then they struggle to be really robust).
Note this answer is posted with the intention of making this question a canonical duplicate for this problem. While an answer showing how it might be done is useful, "don't do this" is probably the best solution for the vast majority of people.
There are some hoops you have to jump through to make it work.
First, you must be aware that the resulting executable is a very slim layer which just delegates the whole work to (i.e. calls functions from) pythonX.Ym.so. You can see this dependency when calling
ldd test
...
libpythonX.Ym.so.1.0 => not found
...
So, to run the program you either need to have LD_LIBRARY_PATH pointing to the location of libpythonX.Ym.so, or you need to build the executable with the --rpath option; otherwise, at start-up of test, the dynamic loader will throw an error similar to
/test: error while loading shared libraries: libpythonX.Ym.so.1.0: cannot open shared object file: No such file or directory
The generic build command would look like the following:
gcc -fPIC <other flags> -o test p1.c -I<path_python_include> -L<path_python_lib> -Wl,-rpath=<path_python_lib> -lpython3.6m <other_needed_libs>
It is also possible to build against a static version of the Python library, thus eliminating the runtime dependency on libpythonX.Ym; see for example this SO-post.
The resulting executable test behaves exactly like a Python interpreter. This means that test will now fail because it will not find the module p2.
One simple solution would be to cythonize the p2 module in place (cythonize p2.pyx -i): you would get the desired behavior; however, you would have to distribute the resulting shared object p2.so along with test.
It is easy to bundle both extensions into one executable - just pass both cythonized C files to gcc:
# creates p1.c:
cython --embed p1.pyx
# creates p2.c:
cython p2.pyx
gcc ... -o test p1.c p2.c ...
But now a new (or old) problem arises: the resulting test executable once again cannot find the module p2, because there is no p2.py and no p2.so on the Python path.
There are two similar SO questions about this problem, here and here. In your case the proposed solutions are overkill; here it is enough to initialize the p2 module before it gets imported in the p1.pyx file to make it work:
# making init-function from other modules accessible:
cdef extern object PyInit_p2();
#init/load p2-module manually
PyInit_p2() #Cython handles error, i.e. if NULL returned
# actually using already cached imported module
# no search in python path needed
from p2 import test_func
print(test_func())
Calling the init-function of a module prior to importing it (the module will not actually be imported a second time, only looked up in the cache) also works if there are cyclic dependencies between modules, for example if module p2 imports module p3, which in turn imports p2.
Warning: Since Cython 0.29, Cython uses multi-phase initialization by default for Python >= 3.5, so calling PyInit_p2 is not enough (see e.g. this SO-post). To switch off this multi-phase initialization, -DCYTHON_PEP489_MULTI_PHASE_INIT=0 should be passed to gcc (or the equivalent to other compilers).
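For example, the combined build command from above would then become something like:
gcc -DCYTHON_PEP489_MULTI_PHASE_INIT=0 ... -o test p1.c p2.c ...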
Note: However, even after all of the above, the embedded interpreter will need its standard libraries (see for example this SO-post) - there is much more work to do to make it truly standalone! So maybe one should heed @DavidW's advice:
"don't do this" is probably the best solution for the vast majority of people.
A word of warning: if we declare PyInit_p2() as
from cpython cimport PyObject
cdef extern PyObject *PyInit_p2();
PyInit_p2(); # TODO: error handling if NULL is returned
Cython will no longer handle the errors, and it becomes our responsibility. Instead of
PyObject *__pyx_t_1 = NULL;
__pyx_t_1 = PyInit_p2(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 4, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
produced for the object version, the generated code becomes just:
(void)(PyInit_p2());
i.e. no error checking!
On the other hand using
cdef extern from *:
    """
    PyObject *PyInit_p2(void);
    """
    object PyInit_p2()
will not work with g++ - one has to add extern "C" to the declaration.
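i.e. something along the lines of the following sketch (untested here, just following the remark above, and only meaningful when everything is compiled with g++):
cdef extern from *:
    """
    extern "C" PyObject *PyInit_p2(void);
    """
    object PyInit_p2()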

What is a reliable way to include libraries for virtualenv python packages in nix?

I'll start by saying virtualenv is basically a requirement here since Nix is not yet being used by the rest of the development team. This excellent guide on Python in Nix doesn't quite drill down to this particular issue.
In some cases I can update LD_LIBRARY_PATH, but it gets to be rather tedious and potentially error prone due to the dynamic nature of Python (a particular branch could trigger the use of a library not previously included in LD_LIBRARY_PATH):
shellHook = ''
export LD_LIBRARY_PATH=${mysql57}/lib:${gcc6}/lib:$LD_LIBRARY_PATH
'';
Worse, the ${gcc6}/lib doesn't work for me here, since the library I need (libatomic.so) is under the *-gcc-6.4.0-lib/lib directory, not the *-gcc-6.4.0/lib directory, and I'm not sure how to reference the former.
$ echo $LD_LIBRARY_PATH
/nix/store/x3x3si0pc3w0vam9jj308b0qhcv7zlg2-mysql-5.7.19/lib:/nix/store/mc8p626zjk9zlgji1i8f85nax4c62nrj-gcc-wrapper-6.4.0/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
Some output from find for libatomic:
/nix/store/rww78vdn2rkayrnqsjl8ib5iq2vfm3sw-gcc-6.4.0/lib/libatomic.a
/nix/store/klqzvvcy1xyjjflmf7agryayc4s72jg2-gcc-6.4.0-lib/lib/libatomic.so.1
/nix/store/klqzvvcy1xyjjflmf7agryayc4s72jg2-gcc-6.4.0-lib/lib/libatomic.so
/nix/store/klqzvvcy1xyjjflmf7agryayc4s72jg2-gcc-6.4.0-lib/lib/libatomic.la
/nix/store/klqzvvcy1xyjjflmf7agryayc4s72jg2-gcc-6.4.0-lib/lib/libatomic.so.1.2.0
I don't use the nixpkgs Python infrastructure much, so I'm not sure if there is a way to eliminate setting the LD_LIBRARY_PATH. Recommendations for setting it:
You can use lib.makeLibraryPath to make the process much less tedious. If you know that all the libraries that might be necessary are (mostly) in buildDepends anyway, you can use lib.makeLibraryPath (buildDepends ++ [ anything else ]).
The issue with GCC libraries has to do with the fact that Nix needs wrapped versions of GCC, so pkgs.gcc6 isn't the "raw" GCC derivation. You can use gcc6.cc.lib, or, if you are using makeLibraryPath, just gcc6.cc will be enough (as makeLibraryPath will automatically figure out that the lib output is the correct one to look under).
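Putting the two suggestions together, the shellHook from the question might end up looking something like this (a sketch only; buildDepends stands in for whatever list of inputs the expression already defines, and lib is assumed to be in scope):
shellHook = ''
  export LD_LIBRARY_PATH=${lib.makeLibraryPath (buildDepends ++ [ mysql57 gcc6.cc ])}:$LD_LIBRARY_PATH
'';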

Compile a Python application to C

I have made an application in Python. It contains several plugins, organized into different subdirectories. I need to compile it entirely to C code to improve the security of the source code. I have dealt with Cython, but cannot find how to compile the entire directory with all the plugin dependencies. I need a way to compile each of the dependencies to C so that the application runs from the compiled C.
http://docs.cython.org/src/quickstart/build.html
How to compile and link multiple python modules (or packages) using cython?
Python does not compile to native code. Scripts can be "frozen" with a few different tools, which makes them into standalone executables, but it's not actually compiling to C, it's packaging the script (or just its Python byte code representation) with a copy of the interpreter and all of its dependencies; the binary still has all the Python code (or the trivial byte code transform thereof) in it.
Cython lets you compile a syntactic variant of Python into Python C extensions, but they still run on the Python interpreter, and they still expose enough information to reverse the transformation.
Get the proper legal protections in place and freeze your Python executable if you like (freezing is enough to make the source code "non-obvious", even though anyone willing to put in trivial effort could get it back), but Python does not compile to plain C directly (if it did, I'd expect the CPython reference interpreter to do that more often just to gain performance in the built-in modules, yet they only write C accelerators by hand).

Is it possible to compile c code using python?

I want to build a Python program that takes the path to a .c file as input and then compiles it.
The program will output OK to the screen if the compilation is successful, and BAD otherwise.
I've been trying to google it, but could not find anything. I've also tried running a command from within Python with the compiler program as an argument, but it didn't work.
To clarify: I've already got a very specific compiler on my machine which I want to run. I don't want Python to act as a compiler. Just take the code, run my compiler over it, and see what the answer is.
It should work on a Linux server with Python 2.4.
Thanks
You can compile C code using only the standard library, and it will work on every platform and with every Python version (assuming you actually have a C compiler available). Check out the distutils.ccompiler module which Python uses to compile C extension modules. A simple example:
// main.c
#include <stdio.h>
int main(void) {
    printf("Hello world!\n");
    return 0;
}
Compilation script:
# build.py
from distutils.ccompiler import new_compiler
if __name__ == '__main__':
    compiler = new_compiler()
    compiler.compile(['main.c'])
    compiler.link_executable(['main.o'], 'main')
Everything else (include paths, library paths, custom flags or link args or macros) can be passed via various configuration options. Check out the above link to the module documentation for more info.
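For instance, a hypothetical build that needs an extra include directory and links against a library ("/opt/mylib" and "mylib" are made-up names) might look like:
from distutils.ccompiler import new_compiler

compiler = new_compiler()
# compile() returns the list of generated object files
objects = compiler.compile(['main.c'], include_dirs=['/opt/mylib/include'])
compiler.link_executable(objects, 'main',
                         library_dirs=['/opt/mylib/lib'],
                         libraries=['mylib'])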
Sure, why not? Of course, you'd need GCC installed (or llvm) so you have something to compile with. You can just use os.system, or any of the other ways for calling an external program.
Of course, you're probably better off looking at something like SCons, which already exists to solve this problem.
Plus, to answer the question actually asked, there's nothing that would prevent you from writing a compiler/assembler/linker in Python; they're just programs like anything else. Performance probably wouldn't be very good, though.
The following steps should do the trick:
Get PLY. Python Lex and Yacc. http://www.dabeaz.com/ply/
Find a Yacc/Lex configuration for C. http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
Tweak PLY to use the C language rules you found.
Run. You are "compiling" C code -- checking the syntax.
If I understood you correctly, you just want to run a compiler with some arguments from Python?
In this case, you can just use os.system: http://docs.python.org/library/os.html#os.system
Or, a better way is the subprocess module: http://docs.python.org/library/subprocess.html#module-subprocess
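For your specific case, a minimal sketch (the compiler name "cc" is just a placeholder for whatever your compiler is called; subprocess is available from Python 2.4 on):
# run the local compiler on the given .c file and report OK/BAD
import subprocess
import sys

def compiles(path, compiler="cc"):   # "cc" is a placeholder for your compiler
    ret = subprocess.call([compiler, "-c", path])
    return ret == 0

if __name__ == "__main__":
    if compiles(sys.argv[1]):
        print("OK")
    else:
        print("BAD")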
