KeyboardHookProc in DLL doesn't do anything when called from python - python

I've been trying to write a DLL in C.
InstallHook() sets up the KeyboardProc hook. Calling the InstallHook() and UninstallHook() functions from Python always returns 0, which I guess means my callback function KeyboardProc is never being called.
The following is my C code for the DLL:
#include "stdafx.h"
#include <windows.h>
#include <string.h>
#include <ctype.h>
#include <stdio.h>
#include "ourdll.h"
//#pragma comment(linker, "/SECTION:.SHARED,RWS")
//#pragma data_seg(".SHARED")
HHOOK hKeyboardHook = 0;
int keypresses = 0;
HMODULE hInstance = 0;
//#pragma data_seg()
BOOL WINAPI DllMain (HANDLE hModule, DWORD dwFunction, LPVOID lpNot)
{
    hInstance = hModule; //Edit
    return TRUE;
}

LRESULT CALLBACK KeyboardProc(int hookCode, WPARAM vKeyCode, LPARAM flags)
{
    if(hookCode < 0)
    {
        return CallNextHookEx(hKeyboardHook, hookCode, vKeyCode, flags);
    }
    keypresses++;
    return CallNextHookEx(hKeyboardHook, hookCode, vKeyCode, flags);
}

__declspec(dllexport) void InstallHook(void)
{
    hKeyboardHook = SetWindowsHookEx(WH_KEYBOARD, KeyboardProc, hInstance, 0);
}

__declspec(dllexport) int UninstallHook(void)
{
    UnhookWindowsHookEx(hKeyboardHook);
    hKeyboardHook = 0;
    return keypresses;
}
The Python code to use this is as follows:
>>> from ctypes import *
>>> dll = CDLL('C:\...\OurDLL.dll')
>>> dll.InstallHook()
[Type something at this point]
>>> result = dll.UninstallHook()
>>> result
0
EDIT: I should probably mention that I've also tried out a LowLevelKeyboardHook. I understand that the LowLevel hook is global and will catch all keystrokes, but that just caused my dll.InstallHook() Python code to freeze for a second or two before returning zero.
I am no expert in C, so any help would be greatly appreciated. Thanks.

hKeyboardHook = SetWindowsHookEx(WH_KEYBOARD, KeyboardProc, NULL, 0);
SetWindowsHookEx requires an hModule - save the hModule from DllMain and pass it here. (You can pass NULL only if the thread id is your own thread.)
One exception to this is the _LL hook types; these don't need an hModule param since these hooks don't inject into the target process - that's why your code using WH_KEYBOARD_LL is 'succeeding'.
As for why it might be blocking when you use WH_KEYBOARD_LL - the docs for LowLevelKeyboardProc mention that the thread that installs the hook (i.e. calls SetWindowsHookEx) must have a message loop, which you might not have in your python code.
Debugging tips: it looks like SetWindowsHookEx is returning NULL (with GetLastError() returning a suitable error code). While developing code, using some combination of assert/printf/OutputDebugString as appropriate to check these return values is a good way to ensure that your assumptions are correct and to give you some clues as to where things are going wrong.
BTW, one other thing to watch for with WH_KEYBOARD vs WH_KEYBOARD_LL: the WH_KEYBOARD hook gets loaded into the target process - but only if it's the same bitness - so a 32-bit hook only sees keys pressed by other 32-bit processes. OTOH, WH_KEYBOARD_LL is called back in your own process, so you get to see all keys - and you also don't need to deal with the shared segment (though as far as I know it's also less efficient than a WH_KEYBOARD hook).

Related

How to force/test malloc failure in shared library when called via Python ctypes

I have a Python program that calls a shared library (libpq in this case) that itself calls malloc under the hood.
I want to be able to test (i.e. in unit tests) what happens when those calls to malloc fail (e.g. when there isn't enough memory).
How can I force that?
Note: I don't think setting a resource limit on the process using ulimit -d would work. It would need to be precise and robust enough to, say, make a single malloc call inside libpq, for example one inside PQconnectdbParams, fail, while all others work fine, across different versions of Python, and even different resource usage in the same version of Python.
It's possible, but it's tricky. In summary:
You can override malloc in a shared library, say test_malloc_override.so, and then (on Linux at least) use the LD_PRELOAD environment variable to load it.
But... Python calls malloc all over the place, and you need those calls to succeed. To make only the "right" calls to malloc fail, you can use the glibc functions "backtrace" and "backtrace_symbols" to inspect the stack and see whether this is a call that should fail.
The shared library exposes a small API to control which calls to malloc will fail (so it doesn't need to be hard-coded in the library).
To allow some calls to malloc to succeed, you need a pointer to the original malloc function. However, to find this you need to call dlsym, which can itself call malloc. So you need to build a simple allocator into the new malloc so that these recursive calls to malloc succeed. Thanks to https://stackoverflow.com/a/10008252/1319998 for this tip.
In more detail:
The shared library code
// In test_override_malloc.c
// Some of this code is inspired by https://stackoverflow.com/a/10008252/1319998
#define _GNU_SOURCE
#include <dlfcn.h>
#include <execinfo.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
// Fails malloc at the fail_in-th call when search_string is in the backtrace
// -1 means never fail
static int fail_in = -1;
static char search_string[1024];

// dlsym, used below to find the original malloc, can itself call malloc;
// those bootstrap allocations are served out of this static buffer
static char initialising_buffer[10240];
static int initialising_buffer_pos = 0;
// The pointers to original memory management functions to call
// when we don't want to fail
static void *(*original_malloc)(size_t) = NULL;
static void (*original_free)(void *ptr) = NULL;
void set_fail_in(int _fail_in, char *_search_string) {
    fail_in = _fail_in;
    strncpy(search_string, _search_string, sizeof(search_string));
}

void *malloc(size_t size) {
    void *memory = NULL;
    int trace_size = 100;
    void *stack[trace_size];
    static int initialising = 0;
    static int level = 0;

    // Save original
    if (!original_malloc) {
        if (initialising) {
            if (size + initialising_buffer_pos >= sizeof(initialising_buffer)) {
                exit(1);
            }
            void *ptr = initialising_buffer + initialising_buffer_pos;
            initialising_buffer_pos += size;
            return ptr;
        }
        initialising = 1;
        original_malloc = dlsym(RTLD_NEXT, "malloc");
        original_free = dlsym(RTLD_NEXT, "free");
        initialising = 0;
    }

    // If we're in a nested malloc call (the backtrace functions below can call malloc)
    // then call the original malloc
    if (level) {
        return original_malloc(size);
    }
    ++level;

    if (fail_in == -1) {
        memory = original_malloc(size);
    } else {
        // Find if we're in the stack; backtrace returns the number of
        // frames actually captured, so only inspect that many symbols
        int num_frames = backtrace(stack, trace_size);
        char **symbols = backtrace_symbols(stack, num_frames);
        int found = 0;
        for (int i = 0; i < num_frames; ++i) {
            if (strstr(symbols[i], search_string) != NULL) {
                found = 1;
                break;
            }
        }
        free(symbols);
        if (!found) {
            memory = original_malloc(size);
        } else {
            if (fail_in > 0) {
                memory = original_malloc(size);
            }
            --fail_in;
        }
    }
    --level;

    return memory;
}

void free(void *ptr) {
    if (ptr < (void*) initialising_buffer || ptr > (void*)(initialising_buffer + sizeof(initialising_buffer))) {
        original_free(ptr);
    }
}
Compiled with
gcc -shared -fPIC test_override_malloc.c -o test_override_malloc.so -ldl
Example Python code
This could go inside the unit tests
# Inside my_test.py
from ctypes import cdll
cdll.LoadLibrary('./test_override_malloc.so').set_fail_in(0, b'libpq.so')
# ... then call a function in the shared library libpq.so
# The `0` above means the very next call it makes to malloc will fail
Run with
LD_PRELOAD=$PWD/test_override_malloc.so python3 my_test.py
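If you'd rather set LD_PRELOAD from a test harness than on the command line, note that the dynamic linker reads it at process startup, so it must go into a child process's environment rather than the current one. A minimal sketch (the .so name matches this answer's example; the child here just echoes the variable back to show it was inherited):

```python
import os
import subprocess
import sys

# LD_PRELOAD is consumed by the dynamic linker before any Python code runs,
# so set it in the environment of a fresh child process
env = dict(os.environ, LD_PRELOAD=os.path.abspath("test_override_malloc.so"))

# The child sees the variable from startup; a real test would run my_test.py here
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LD_PRELOAD'])"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())
```

If the .so is missing, glibc's loader prints a warning to stderr and continues, which makes this easy to experiment with before the library is built.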
(Admittedly, this might all not be worth it: since Python itself calls malloc so frequently, in most real situations it seems unlikely that all of Python's own allocations would succeed while just the one call in the library fails.)

Freeze/Fail when using functional with OpenMP [Pybind11/OpenMP]

I have a problem with the functional feature of Pybind11 when I use it with a for-loop with OpenMP. I've done some research, and my problem sounds pretty similar to the one in this Pull Request from 2 years ago; although that PR is closed and the issue seems to be fixed, I still have the issue. A code example I created will hopefully explain my problem better:
b.h
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <omp.h>
namespace py = pybind11;
class B {
public:
    B(int n, const int& initial_value);
    void map(const std::function<int(int)> &f);
private:
    int n;
    int* elements;
};
b.cpp
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include "b.h"
namespace py = pybind11;
B::B(int n, const int& v)
    : n(n) {
    elements = new int[n];
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        elements[i] = v;
    }
}

void B::map(const std::function<int(int)> &f) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        elements[i] = f(elements[i]);
    }
}
PYBIND11_MODULE(m, handle) {
    handle.doc() = "Example Module";
    py::class_<B>(handle, "B")
        .def(py::init<int, int>())
        .def("map", &B::map)
        ;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.4...3.18)
project(example)
find_package(OpenMP)
add_subdirectory(pybind11)
pybind11_add_module(m b.cpp)
if(OpenMP_CXX_FOUND)
    target_link_libraries(m PUBLIC OpenMP::OpenMP_CXX)
else()
    message(FATAL_ERROR "Your compiler does not support OpenMP")
endif()
test.py
from build.m import *

def test(i):
    return i * 20

b = B(2, 2)
b.map(test)
I basically have an array where I want to apply a Python function to every element using a for-loop. I know that it is an issue with functional and OpenMP specifically because in other parts of my project I am using OpenMP successfully and functional is also working if I am not using OpenMP.
Edit: It freezes at the map function and has to be terminated. I am using Ubuntu 21.10, Python 3.9, GCC 11.2.0, OpenMP 4.5, and the newest version of the pybind11 repo.
You're likely experiencing a deadlock between OpenMP's scheduler and Python's GIL (Global Interpreter Lock).
I suggest attaching gdb to your process and looking at where the threads are to verify that's really the problem.
IMHO mixing Python functions and OpenMP like that is asking for trouble. If you want multi-threading of Python functions you can use multiprocessing.pool.ThreadPool. But unless your functions release the GIL most of the time you won't benefit from multi-threading.

dll export function to ctypes

Background
I have some functions which are written in C++, which require high real-time performance. I want to quickly export these functions as dynamic link library to be exposed to Python so that I could do some high level programming.
In these functions, in order to simplify usage, I use PyList_New from <Python.h> to collect some intermediate data. But I met some errors.
Code Example
I found the core problem is that I CAN'T even export a python object. After compiling the source to a dll and using ctypes to load it, the result shows
OSError: exception: access violation reading 0x0000000000000008
C++ code:
#include <Python.h>
#ifdef _MSC_VER
#define DLL_EXPORT __declspec( dllexport )
#else
#define DLL_EXPORT
#endif
#ifdef __cplusplus
extern "C"{
#endif
DLL_EXPORT PyObject *test3() {
    PyObject* ptr = PyList_New(10);
    return ptr;
}
#ifdef __cplusplus
}
#endif
Python test code:
if __name__ == "__main__":
    import ctypes
    lib = ctypes.cdll.LoadLibrary(LIB_DLL)
    test3 = lib.test3
    test3.argtypes = None
    test3.restype = ctypes.py_object
    print(test3())
Environment Config
Clion with Microsoft Visual Studio 2019 Community, and the arch is amd64.
I know that the right way is to use the recommended method of wrapping the C++ source into a module with the Python/C API, but it seems that I would have to code a lot. Can anyone help?
ctypes is normally for calling "regular" C functions, not Python C API functions, but it can be done. You must use PyDLL to load a library that uses Python, as PyDLL won't release the GIL (global interpreter lock), which must be held when calling Python C API functions. Your code as shown is invalid, however, because it doesn't populate the list it creates (using OP code as test.c):
>>> from ctypes import *
>>> lib = PyDLL('./test')
>>> lib.test3.restype=py_object
>>> lib.test3()
[<NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>]
Instead, write a C or C++ function normally:
test.cpp
#ifdef _MSC_VER
#define DLL_EXPORT __declspec( dllexport )
#else
#define DLL_EXPORT
#endif
#ifdef __cplusplus
extern "C"{
#endif
DLL_EXPORT int* create(int n) {
    auto p = new int[n];
    for(int i = 0; i < n; ++i)
        p[i] = i;
    return p;
}

DLL_EXPORT void destroy(int* p) {
    delete [] p;
}
#ifdef __cplusplus
}
#endif
test.py
from ctypes import *
lib = CDLL('./test')
lib.create.argtypes = c_int,
lib.create.restype = POINTER(c_int)
lib.destroy.argtypes = POINTER(c_int),
lib.destroy.restype = None
p = lib.create(5)
print(p) # pointer to int
print(p[:5]) # convert to list...pointer doesn't have length so slice.
lib.destroy(p) # free memory
Output:
<__main__.LP_c_long object at 0x000001E094CD9DC0>
[0, 1, 2, 3, 4]
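The pointer-slicing trick in test.py can be tried without compiling the DLL at all, by casting a ctypes array to a pointer (a standalone sketch; the array stands in for the memory create would return):

```python
from ctypes import POINTER, c_int, cast

# Build an int array in pure ctypes to stand in for the DLL's new int[n]
arr = (c_int * 5)(*range(5))
p = cast(arr, POINTER(c_int))

# A POINTER(c_int) has no length, so index it with an explicit slice
print(p[:5])  # [0, 1, 2, 3, 4]
```

The slice copies the values into a plain Python list, which is why the printed output matches the DLL example above.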
I solved it by myself. Just change the build configuration to Release and all the problems are solved.

Python Initialization fails with dynamically loaded DLL [duplicate]

I have a C++ program and it has sort of plugin structure: when program starts up, it's looking for dll in the plugin folder with certain exported function signatures, such as:
void InitPlugin(FuncTable* funcTable);
Then the program will call the function in the dll to initialize and pass function pointers to the dll. From that time on, the dll can talk to the program.
I know Cython let you call C function in Python, but I'm not sure can I write a Cython code and compile it to a dll so my C++ program can initialize with it. An example code would be great.
Using a cython-module in a dll is not unlike using a cython-module in an embedded python interpreter.
The first step would be to mark cdef-function which should be used from external C-code with public, for example:
#cyfun.pyx:

#doesn't need python interpreter
cdef public int double_me(int me):
    return 2*me;

#needs initialized python interpreter
cdef public void print_me(int me):
    print("I'm", me);
cyfun.c and cyfun.h can be generated with
cython -3 cyfun.pyx
These files will be used for building of the dll.
The dll will need one function to initialize the python interpreter, which should be called once before double_me and print_me can be used, and another to finalize it. (Ok, double_me would also work without the interpreter, but this is an implementation detail.) Note: the initialization/clean-up could also be put in DllMain - see such a version further below.
The header-file for the dll would look like following:
//cyfun_dll.h
#ifdef BUILDING_DLL
#define DLL_PUBLIC __declspec(dllexport)
#else
#define DLL_PUBLIC __declspec(dllimport)
#endif
//return 0 if everything ok
DLL_PUBLIC int cyfun_init();
DLL_PUBLIC void cyfun_finalize();
DLL_PUBLIC int cyfun_double_me(int me);
DLL_PUBLIC void cyfun_print_me(int me);
So there are the necessary init/finalize functions, and the symbols are exported via DLL_PUBLIC (which needs to be done, see this SO-post) so they can be used outside of the dll.
The implementation follows in cyfun_dll.c-file:
//cyfun_dll.c
#define BUILDING_DLL
#include "cyfun_dll.h"
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "cyfun.h"
DLL_PUBLIC int cyfun_init(){
    int status=PyImport_AppendInittab("cyfun", PyInit_cyfun);
    if(status==-1){
        return -1;//error
    }
    Py_Initialize();
    PyObject *module = PyImport_ImportModule("cyfun");
    if(module==NULL){
        Py_Finalize();
        return -1;//error
    }
    return 0;
}

DLL_PUBLIC void cyfun_finalize(){
    Py_Finalize();
}

DLL_PUBLIC int cyfun_double_me(int me){
    return double_me(me);
}

DLL_PUBLIC void cyfun_print_me(int me){
    print_me(me);
}
Noteworthy details:
we define BUILDING_DLL so DLL_PUBLIC becomes __declspec(dllexport).
we use cyfun.h generated by cython from cyfun.pyx.
cyfun_init initializes the python interpreter and imports the built-in module cyfun. The somewhat complicated code is because, since Cython-0.29, PEP-489 is the default. More information can be found in this SO-post. If the Python interpreter isn't initialized, or if the module cyfun is not imported, the chances are high that calling the functionality from cyfun.h will end in a segmentation fault.
cyfun_double_me just wraps double_me so it becomes visible outside of the dll.
Now we can build the dll!
:: set up tool chain
call "<path_to_vcvarsall>\vcvarsall.bat" x64
:: build cyfun.c generated by cython
cl /Tccyfun.c /Focyfun.obj /c <other_coptions> -I<path_to_python_include>
:: build dll-wrapper
cl /Tccyfun_dll.c /Focyfun_dll.obj /c <other_coptions> -I<path_to_python_include>
:: link both obj-files into a dll
link cyfun.obj cyfun_dll.obj /OUT:cyfun.dll /IMPLIB:cyfun.lib /DLL <other_loptions> -L<path_to_python_dll>
The dll is now built, but the following details are noteworthy:
<other_coptions> and <other_loptions> can vary from installation to installation. An easy way to see them is to run cythonize some_file.pyx and inspect the log.
we don't need to pass python-dll, because it will be linked automatically, but we need to set the right library-path.
we have the dependency on the python-dll, so later on it must be somewhere where it can be found.
Where you go from here depends on your task; here we test our dll with a simple main:
//test.c
#include "cyfun_dll.h"
int main(){
    if(0!=cyfun_init()){
        return -1;
    }
    cyfun_print_me(cyfun_double_me(2));
    cyfun_finalize();
    return 0;
}
which can be build via
...
:: build main-program
cl /Tctest.c /Focytest.obj /c <other_coptions> -I<path_to_python_include>
:: link the exe
link test.obj cyfun.lib /OUT:test_prog.exe <other_loptions> -L<path_to_python_dll>
And now calling test_prog.exe leads to the expected output "I'm 4".
Depending on your installation, following things must be considered:
test_prog.exe depends on pythonX.Y.dll which should be somewhere in the path so it can be found (the easiest way is to copy it next to the exe)
The embedded python interpreter needs an installation, see this and/or this SO-post.
IIRC, it is not a great idea to initialize, then finalize, and then initialize the Python interpreter again (that might work for some scenarios, but not all, see for example this) - the interpreter should be initialized only once and stay alive until the program ends.
Thus, it may make sense to put initialization/clean-up code into DllMain (and make cyfun_init() and cyfun_finalize() private), e.g.
BOOL WINAPI DllMain(
    HINSTANCE hinstDLL,  // handle to DLL module
    DWORD fdwReason,     // reason for calling function
    LPVOID lpReserved )  // reserved
{
    // Perform actions based on the reason for calling.
    switch( fdwReason )
    {
        case DLL_PROCESS_ATTACH:
            return cyfun_init()==0;
        case DLL_PROCESS_DETACH:
            cyfun_finalize();
            break;
        case DLL_THREAD_ATTACH:
            // Do thread-specific initialization.
            break;
        case DLL_THREAD_DETACH:
            // Do thread-specific cleanup.
            break;
    }
    return TRUE;
}
If your C/C++-program already has an initialized Python-interpreter it would make sense to offer a function which only imports the module cyfun and doesn't initialize the python-interpreter. In this case I would define CYTHON_PEP489_MULTI_PHASE_INIT=0, because PyImport_AppendInittab must be called before Py_Initialize, which might be already too late when the dll is loaded.
I'd imagine it'd be difficult to call it directly, with Cython depending so much on the Python runtime.
Your best bet is to embed a Python interpreter directly inside your app, e.g. as described in this answer, and call your Cython code from the interpreter. That's what I would do.

Python C API free() errors after using Py_SetPath() and Py_GetPath()

I'm trying to figure out why I can't simply get and set the python path through its C API. I am using Python3.6, on Ubuntu 17.10 with gcc version 7.2.0. Compiling with:
gcc pytest.c `python3-config --libs` `python3-config --includes`
#include <Python.h>

int main()
{
    Py_Initialize(); // removes error if put after Py_SetPath
    printf("setting path\n"); // prints
    Py_SetPath(L"/usr/lib/python3.6"); // Error in `./a.out': free(): invalid size: 0x00007fd5a8365030 ***
    printf("success\n"); // doesn't print
    return 0;
}
Setting the path works fine, unless I also try to get the path prior to doing so. If I get the path at all, even just to print without modifying the returned value or anything, I get a "double free or corruption" error.
Very confused. Am I doing something wrong or is this a bug? Anyone know a workaround if so?
Edit: Also errors after calling Py_Initialize();. Updated code. Now errors even if I don't call Py_GetPath() first.
From alk it seems related to this bug: https://bugs.python.org/issue31532
Here is the workaround I am using. Since you can't call Py_GetPath() before Py_Initialize(), and also seemingly you can't call Py_SetPath() after Py_Initialize(), you can add to or get the path like this after calling Py_Initialize():
#include <Python.h>

int main()
{
    Py_Initialize();

    // get handle to python sys.path object
    PyObject *sys = PyImport_ImportModule("sys");
    PyObject *path = PyObject_GetAttrString(sys, "path");

    // make a list of paths to add to sys.path
    PyObject *newPaths = PyUnicode_Split(PyUnicode_FromWideChar(L"a:b:c", -1), PyUnicode_FromWideChar(L":", 1), -1);

    // iterate through list and add all paths
    for(int i=0; i<PyList_Size(newPaths); i++) {
        PyList_Append(path, PyList_GetItem(newPaths, i));
    }

    // print out sys.path after appends
    PyObject *newlist = PyUnicode_Join(PyUnicode_FromWideChar(L":", -1), path);
    printf("newlist = %ls\n", PyUnicode_AsWideCharString(newlist, NULL));
    return 0;
}
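For comparison, the same sys.path manipulation the C workaround performs, written directly in Python (the "a:b:c" entries are the same placeholder paths as in the C code):

```python
import sys

# make a list of paths to add to sys.path
new_paths = "a:b:c".split(":")

# append each one, mirroring the PyList_Append loop in the C version
for p in new_paths:
    sys.path.append(p)

# print out sys.path after appends, joined the same way as the C code
print(":".join(sys.path))
```

This is what the C code's PyImport_ImportModule("sys") / PyObject_GetAttrString(sys, "path") dance boils down to once the interpreter is running.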
[the below answer refers to this version of the question.]
From the docs:
void Py_Initialize()
Initialize the Python interpreter. In an application embedding Python, this should be called before using any other Python/C API functions; with the exception of Py_SetProgramName(), Py_SetPythonHome() and Py_SetPath().
But the code you show does call Py_GetPath() before it calls Py_Initialize();, which, per the above paragraph, it implicitly should not.
