I am a master's student working on replicating the results of the paper : https://www.microsoft.com/en-us/research/publication/not-all-bytes-are-equal-neural-byte-sieve-for-fuzzing/
I want to create an augmented fuzzer which rejects the modifications to seeds which it finds not useful. Any help in achieving this will be very much helpful.
I have created a simple python function for the augmented fuzzer. To test the implementation, I took the trivial "deadbeef" program and wrote the python function such that whenever the seed is modified to "deadbeef", the function sends a "not useful" return to the 'common_fuzz_stuff()' function of the AFL-fuzz code. It should mean that the fuzzer should not be able to find the crash. But it still is able to find the crash and I'm not able to determine where I have gone wrong.
Here is the python function for AFL:
def check_useful(seed):
my_string = str.encode('deadbeef')
file = open(seed, 'rb')
value = file.read()
if (value == my_string):
print('[*] Crash Found!')
return True
else:
return False
And here is the afl-fuzz.c code snippet:
/* Write a modified test case, run program, process results. Handle
error conditions, returning 1 if it's time to bail out. This is
a helper function for fuzz_one(). */
EXP_ST u8 common_fuzz_stuff(char** argv, u8* out_buf, u32 len) {
if (PyCallable_Check(pFuncCheckModel)){
pArgs = PyTuple_New(1);
PyTuple_SetItem(pArgs, 0, PyUnicode_FromString(queue_cur->fname));
pFuncReturn = PyObject_CallObject(pFuncCheckModel, pArgs);
if (PyObject_IsTrue(pFuncReturn)){
skip_requested = 1;
return 1;
}
} else
{
PyErr_Print();
}
How is my program still able to find the crash even if the return value is 1 from the common_fuzz_stuff() function for the seed "deadbeef"?
In case your decision whether this input is useful or not depends only on the input itself (not the mutation), as far as I understand, you could use the experimental/post_library stuff. The documentation is included in the example post_library and contains a note, that this is probably not what you want -- not you for your specific need, this is approximate cite from that documentation. :)
On the other hand, this single-function-API description contains the following:
2) If you want to skip this test case altogether and have AFL generate a
new one, return NULL. Use this sparingly - it's faster than running
the target program with patently useless inputs, but still wastes CPU
time.
To answer my own question:
I had to send out_file to the Python function instead of queue_cur->fname.
PyTuple_SetItem(pArgs, 0, PyUnicode_FromString(out_file));
Also skip_requested = 1; in the above code is redundant.
Now the fuzzer will run and will not find the crash
Related
I have looked at the other questions similar to this but they don't work for me well.
My question is I have this code here:
function pyInput(){
const buffers = [];
proc.stdout.on('data', (chunk) => buffers.push(chunk));
proc.stdout.on('end', () => {
const result = JSON.parse(Buffer.concat(buffers));
console.log('Python process exited, result:', result);
});
proc.stdin.write(JSON.stringify([['a','b',1],['b','c',-6],['c','a',4],['b','d',5],['d','a', -10]]));
proc.stdin.end();
}
The python function I'm trying to pass this to:
def createGraph(listOfAttr):
for i in range(len(listOfAttr)):
G.add_edge(listOfAttr[i][0], listOfAttr[i][1], weight = listOfAttr[i][2])
#createGraph([['a','b',1],['b','c',-6],['c','a',4],['b','d',5],['d','a', -10]])
my_list = json.load(sys.stdin)
json.dump(my_list,sys.stdout)
The code is basically for finding negative cycles in a graph, and I want to load that data in from node js. However my python program never finishes executing, it just gets stuck and I dont know why. For now I won't pass the list from Node into the py function, but I am trying to at least print it out to see if its being passed to python.
The json.load() from sys.stdin is probably the problem. Since sys.stdin is a pipe, it never actually closes until you tell it to, so it just hogs the stream, waiting for new data. You can probably hog the stream for just some time using input() until you receive some sort of input string telling you you hit the end, then moving along. Make sure to update your node.js script aswell, to feed each piece of data as line, terminated by a \n, and to send the end signal you specified once it's done.
I have a python wrapper holding some c++ code. In it is a function that I setup as a process from my python code. Its a while statement that I need to setup a condition for when it should shut down.
For this situation , the while statement is simple.
while(TERMINATE == 0)
I have data that is being sent back from within the while loop. I'm using pipe() to create 'in' and 'out' objects. I send the 'out' object to the function when I create the process.
fxn = self.FG.do_videosequence
(self.inPipe, self.outPipe) = Pipe()
self.stream = Process(target=fxn, args=(self.outPipe,))
self.stream.start()
As I mentioned, while inside the wrapper I am able to send data back to the python script with
PyObject *send = Py_BuildValue("s", "send_bytes");
PyObject_CallMethodObjArgs(pipe, send, temp, NULL);
This works just fine. However, I'm having issues with sending a message to the C++ code, in the wrapper, that tells the loop to stop.
What I figured I would do is just check poll(), as that is what I do on the python script side. I want to keep it simple. When the system sees that there is an incoming signal from the python script it would set TERMINATE = 1. so i wrote this.
PyObject *poll = Py_BuildValue("p", "poll");
As I'm expecting a true or false from the python function poll(). I figured "p" would be ideal as it would convert true to 1 and false to 0.
in the loop I have
if(PyObject_CallMethodObjArgs(pipe, poll, NULL, NULL))
TERMINATE = 1;
I wanted to use poll() as its non-blocking, like recv() is. This way I could just go about my other work and check poll() once a cycle.
however, when I send a signal from the python script it never trips.
self.inPipe.send("Hello");
I'm not sure where the disconnect is. When I print the poll() request, I get 0 the entire time. I'm either not calling it correctly, and its just defaulting to 0. or I'm not actually generating a signal to trip the poll() call. Thus its always 0.
Does anyone have any insight as what i am doing wrong?
*****UPDATE******
I found some other information.
PyObject *poll = Py_BuildValue("p", "poll");
should be
PyObject *poll = Py_BuildValue("s", "poll");
as I'm passing a string as a reference to the function im calling it should be referenced as a string. It has nothing to do with the return type.
From there the return of
PyObject_CallMethodObjArgs(pipe, poll, NULL, NULL)
is a pyobject so it needs to be checked against a pyobject. such as making a call to
PyObject_IsTrue
to determine if its true or false. I'll make changes to my code and if I have solution I'll update the post with an answer.
So I've been able to find the solution. In the end I was making two mistakes.
The first mistake was when I created the pyobject reference to the python function I was calling. I mistook the information and inserted a "p" thinking before reading the context. So
PyObject *poll = Py_BuildValue("p", "poll");
should be
PyObject *poll = Py_BuildValue("s", "poll");
The second mistake was how I was handling the return value of
PyObject_CallMethodObjArgs(pipe, poll, NULL, NULL)
while its true that its calling a python object, it is not returning a simple true false value, but rather a python object. So I specificly needed to handle the python object, by calling
PyObject_IsTrue(Pyobject o)
with the return of the poll() request as the argument. I now have the ability to send/recieve from both the python script and the C api contained in the wrapper.
I have just started learning python, so it may seem a foolish question but i really want to know what could be the real possibility of not using the while true for the python interpreter which Execute compiled code with the help of ceval.c instead of for (;;) here in the same code.
I know the interpreter must go in infinite loop until something is returned hence the infinite for loop was written like this
for (;;) {
#ifdef WITH_TSC
if (inst1 == 0) {
But going by the python own Principle Readability counts won't while true would have been a better option ?
Or this will have any performance difference ?
for (;;) {
is the classical way to make a C forever loop, stemming from the 1970's. I believe it's even in the original Kernighan and Ritchie book. It's idiomatic and a habit, there's no performance reason.
But strange enough most C programmers from that time would have written
if (!inst1) {
rather than
if (inst1 == 0) {
which makes this code a bit inconsistent stylewise...
I have some C code that calls a Python function. This Python function accepts an address and uses WINFUNCTYPE to eventually convert it to a function that Python can call. The C function send as a parameter to the Python function will eventually call another Python function. It is at this last step which causes a crash. So in short I go from C -> Python -> C -> Python. The last C -> Python causes a crash. I've been trying to understand the problem, but I have been unable to.
Can someone point out my problem?
C code compiled with Visual Studio 2010 and run with the args "c:\...\crash.py" and "func1":
#include <stdlib.h>
#include <stdio.h>
#include <Python.h>
PyObject* py_lib_mod_dict; //borrowed
void __stdcall cfunc1()
{
PyObject* py_func;
PyObject* py_ret;
int size;
PyGILState_STATE gil_state;
gil_state = PyGILState_Ensure();
printf("Hello from cfunc1!\n");
size = PyDict_Size(py_lib_mod_dict);
printf("The dictionary has %d items!\n", size);
printf("Calling with GetItemString\n");
py_func = PyDict_GetItemString(py_lib_mod_dict, "func2"); //fails here when cfunc1 is called via callback... will not even go to the next line!
printf("Done with GetItemString\n");
py_ret = PyObject_CallFunction(py_func, 0);
if (py_ret)
{
printf("PyObject_CallFunction from cfunc1 was successful!\n");
Py_DECREF(py_ret);
}
else
printf("PyObject_CallFunction from cfunc1 failed!\n");
printf("Goodbye from cfunc1!\n");
PyGILState_Release(gil_state);
}
int wmain(int argc, wchar_t** argv)
{
PyObject* py_imp_str;
PyObject* py_imp_handle;
PyObject* py_imp_dict; //borrowed
PyObject* py_imp_load_source; //borrowed
PyObject* py_dir; //stolen
PyObject* py_lib_name; //stolen
PyObject* py_args_tuple;
PyObject* py_lib_mod;
PyObject* py_func;
PyObject* py_ret;
Py_Initialize();
//import our python script
py_dir = PyUnicode_FromWideChar(argv[1], wcslen(argv[1]));
py_imp_str = PyString_FromString("imp");
py_imp_handle = PyImport_Import(py_imp_str);
py_imp_dict = PyModule_GetDict(py_imp_handle); //borrowed
py_imp_load_source = PyDict_GetItemString(py_imp_dict, "load_source"); //borrowed
py_lib_name = PyUnicode_FromWideChar(argv[2], wcslen(argv[2]));
py_args_tuple = PyTuple_New(2);
PyTuple_SetItem(py_args_tuple, 0, py_lib_name); //stolen
PyTuple_SetItem(py_args_tuple, 1, py_dir); //stolen
py_lib_mod = PyObject_CallObject(py_imp_load_source, py_args_tuple);
py_lib_mod_dict = PyModule_GetDict(py_lib_mod); //borrowed
printf("Calling cfunc1 from main!\n");
cfunc1();
py_func = PyDict_GetItem(py_lib_mod_dict, py_lib_name);
py_ret = PyObject_CallFunction(py_func, "(I)", &cfunc1);
if (py_ret)
{
printf("PyObject_CallFunction from wmain was successful!\n");
Py_DECREF(py_ret);
}
else
printf("PyObject_CallFunction from wmain failed!\n");
Py_DECREF(py_imp_str);
Py_DECREF(py_imp_handle);
Py_DECREF(py_args_tuple);
Py_DECREF(py_lib_mod);
Py_Finalize();
fflush(stderr);
fflush(stdout);
return 0;
}
Python code:
from ctypes import *
def func1(cb):
print "Hello from func1!"
cb_proto = WINFUNCTYPE(None)
print "C callback: " + hex(cb)
call_me = cb_proto(cb)
print "Calling callback from func1."
call_me()
print "Goodbye from func1!"
def func2():
print "Hello and goodbye from func2!"
Output:
Calling cfunc1 from main!
Hello from cfunc1!
The dictionary has 88 items!
Calling with GetItemString
Done with GetItemString
Hello and goodbye from func2!
PyObject_CallFunction from cfunc1 was successful!
Goodbye from cfunc1!
Hello from func1!
C callback: 0x1051000
Calling callback from func1.
Hello from cfunc1!
The dictionary has 88 items!
Calling with GetItemString
PyObject_CallFunction from wmain failed!
I added a PyErr_Print() to the end and this was the result:
Traceback (most recent call last):
File "C:\Programming\crash.py", line 9, in func1
call_me()
WindowsError: exception: access violation writing 0x0000000C
EDIT: Fixed a bug that abarnert pointed out. Output is unaffected.
EDIT: Added in the code that resolved the bug (acquiring the GIL lock in cfunc1). Thanks again abarnert.
The problem is this code:
py_func = PyDict_GetItemString(py_lib_mod_dict, "func2"); //fails here when cfunc1 is called via callback... will not even go to the next line!
printf("Done with GetItemString\n");
py_ret = PyObject_CallFunction(py_func, 0);
Py_DECREF(py_func);
As the docs say, PyDict_GetItemString returns a borrowed reference. So, the first time you call here, you borrow the reference, and decref it, causing it to be destroyed. The next time you call, you get back garbage, and try to call it.
So, to fix it, just remove the Py_DECREF(py_func) (or add Py_INCREF(py_func) after the pyfunc = line).
Actually, you will usually get back a special "dead" object, so you can test this pretty easily: put a PyObject_Print(py_func, stdout) after the py_func = line and after the Py_DECREF line, and you'll probably see something like <function func2 at 0x10b9f1230> the first time, <refcnt 0 at 0x10b9f1230> the second and third times (and you won't see the fourth, because it'll crash before you get there).
I don't have a Windows box handy, but changing wmain, wchar_t, PyUnicode_FromWideChar, WINFUNCTYPE, etc. to main, char, PyString_FromString, CFUNCTYPE, etc., I was able to build and run your code, and I get a crash in the same place… and the fix works.
Also… shouldn't you be holding the GIL inside cfunc1? I don't often write code like this, so maybe I'm wrong. And I don't get a crash with the code as-is. Obviously, spawning a thread to run cfunc1 does crash, and PyGILState_Ensure/Release solves that crash… but that doesn't prove you need anything in the single-threaded case. So maybe this isn't relevant… but if you get another crash after fixing the first one (in the threaded case, mine looked like Fatal Python error: PyEval_SaveThread: NULL tstate), look into this.
By the way, if you're new to Python extending and embedding: A huge number of unexplained crashes are, like this one, caused by manual refcounting errors. That's the reason things like boost::python, etc. exist. It's not that it's impossible to get it right with the plain C API, just that it's so easy to get it wrong, and you will have to get used to debugging problems like this.
abarnert's answer provided the correct functions to call, however the explanation bothered me so I came home early and poked around some more.
Before I go into the explanation, I want to mention that when I say GIL, I strictly mean the mutex, semaphore, or whatever that the Global Interpreter Lock uses to do the thread synchronization. This does not include any other housekeeping that Python does before/after it acquires and releases the GIL.
Single threaded programs do not initialize the GIL because you never call PyEval_InitThreads(). Thus there is no GIL. Even if there was locking going on, it shouldn't matter because it's single threaded. However, functions that acquire and release the GIL also do some funny stuff like mess with the thread state in addition to acquiring/releasing the GIL. Documentation on WINFUNCTYPE objects explicitly states that it releases the GIL before making the jump to C. So when the C callback was called in Python, I suspect something like PyEval_SaveThread() is called (maybe in error because it's only suppose to be called in threaded operations at least from my understanding). This would release the GIL (if it existed) and set the thread state to become NULL, however there's no GIL in single threaded Python programs, thus all it really does is just set the thread state to NULL. This causes the majority of Python functions in the C callback to fail hard.
Really the only benefit of calling PyGILState_Ensure/Release is to tell Python to set the thread state to something valid before running off and doing things. There's not a GIL to acquire (not initialized because I never called PyEval_InitThreads()).
To test my theory: In the main function I use PyThreadState_Swap(NULL) to grab a copy of the thread state object. I restore it during the callback and everything works fine. If I keep the thread state at null, I get pretty much the same access violation even without doing a Python -> C callback. Inside cfunc1, I restore the thread state and there's no more problems cfunc1 itself during the Python -> C callback.
There is an issue when cfunc1 returns into Python code, but that's probably because I messed with the thread state and the WINFUNCTYPE object is expecting something totally different. If you keep the thread the state without setting it back to null when returning, Python just sits there and does nothing. If you restore it back to null, it crashes. However, it does successfully executes cfunc1 so I'm not sure I care too much.
I may eventually go poke around in the Python source code to be 100% sure, but I'm sure enough to be satisfied.
I realize "fast" is a bit subjective so I'll explain with some context. I'm working on a Python module called psutil for reading process information in a cross-platform way. One of the functions is a pid_exists(pid) function for determining if a PID is in the current process list.
Right now I'm doing this the obvious way, using EnumProcesses() to pull the process list, then interating through the list and looking for the PID. However, some simple benchmarking shows this is dramatically slower than the pid_exists function on UNIX-based platforms (Linux, OS X, FreeBSD) where we're using kill(pid, 0) with a 0 signal to determine if a PID exists. Additional testing shows it's EnumProcesses that's taking up almost all the time.
Anyone know a faster way than using EnumProcesses to determine if a PID exists? I tried OpenProcess() and checking for an error opening the nonexistent process, but this turned out to be over 4x slower than iterating through the EnumProcesses list, so that's out as well. Any other (better) suggestions?
NOTE: This is a Python library intended to avoid third-party lib dependencies like pywin32 extensions. I need a solution that is faster than our current code, and that doesn't depend on pywin32 or other modules not present in a standard Python distribution.
EDIT: To clarify - we're well aware that there are race conditions inherent in reading process iformation. We raise exceptions if the process goes away during the course of data collection or we run into other problems. The pid_exists() function isn't intended to replace proper error handling.
UPDATE: Apparently my earlier benchmarks were flawed - I wrote some simple test apps in C and EnumProcesses consistently comes out slower and OpenProcess (in conjunction with GetProcessExitCode in case the PID is valid but the process has stopped) is actually much faster not slower.
OpenProcess could tell you w/o enumerating all. I have no idea how fast.
EDIT: note that you also need GetExitCodeProcess to verify the state of the process even if you get a handle from OpenProcess.
Turns out that my benchmarks evidently were flawed somehow, as later testing reveals OpenProcess and GetExitCodeProcess are much faster than using EnumProcesses after all. I'm not sure what happened but I did some new tests and verified this is the faster solution:
int pid_is_running(DWORD pid)
{
HANDLE hProcess;
DWORD exitCode;
//Special case for PID 0 System Idle Process
if (pid == 0) {
return 1;
}
//skip testing bogus PIDs
if (pid < 0) {
return 0;
}
hProcess = handle_from_pid(pid);
if (NULL == hProcess) {
//invalid parameter means PID isn't in the system
if (GetLastError() == ERROR_INVALID_PARAMETER) {
return 0;
}
//some other error with OpenProcess
return -1;
}
if (GetExitCodeProcess(hProcess, &exitCode)) {
CloseHandle(hProcess);
return (exitCode == STILL_ACTIVE);
}
//error in GetExitCodeProcess()
CloseHandle(hProcess);
return -1;
}
Note that you do need to use GetExitCodeProcess() because OpenProcess() will succeed on processes that have died recently so you can't assume a valid process handle means the process is running.
Also note that OpenProcess() succeeds for PIDs that are within 3 of any valid PID (See Why does OpenProcess succeed even when I add three to the process ID?)
There is an inherent race condition in the use of pid_exists function: by the time the calling program gets to use the answer, the process may have already disappeared, or a new process with the queried id may have been created. I would dare say that any application that uses this function is flawed by design and that optimizing this function is therefore not worth the effort.
I'd code Jay's last function this way.
int pid_is_running(DWORD pid){
HANDLE hProcess;
DWORD exitCode;
//Special case for PID 0 System Idle Process
if (pid == 0) {
return 1;
}
//skip testing bogus PIDs
if (pid < 0) {
return 0;
}
hProcess = handle_from_pid(pid);
if (NULL == hProcess) {
//invalid parameter means PID isn't in the system
if (GetLastError() == ERROR_INVALID_PARAMETER) {
return 0;
}
//some other error with OpenProcess
return -1;
}
DWORD dwRetval = WaitForSingleObject(hProcess, 0);
CloseHandle(hProcess); // otherwise you'll be losing handles
switch(dwRetval) {
case WAIT_OBJECT_0;
return 0;
case WAIT_TIMEOUT;
return 1;
default:
return -1;
}
}
The main difference is closing the process handle (important when the client of this function is running for a long time) and the process termination detection strategy. WaitForSingleObject gives you the opportunity to wait for a while (changing the 0 to a function parameter value) until the process ends.