C++ Implementation of the GDAL Algorithm GDALFillNodata() - python

Looking for some help with C++ code that calls the GDALFillNodata() algorithm. I already have a working version using Python and GDAL, but it's somewhat slow filling elevation DEMs (1.5 GB), so I'm curious whether a faster C++ version is possible. I've written the code as a command line application and posted it here; file paths are hard coded for now. It builds (Code::Blocks 16.1/MinGW) and runs, but then crashes.
I'm not a C++ programmer, though I wish I were, and I'm trying to understand the language better. I'm moderately decent at Python. I'm likely missing something basic here that's second nature to C++ programmers.
There is likely some code that was commented out during testing, so if something doesn't make sense, that's why.
Here's the code:
#include <iostream>
#include "gdal.h"
#include "gdal_priv.h"
#include "cpl_conv.h"
#include "gdal_alg.h"
int main()
{
    GDALAllRegister();
    //CPLPushErrorHandler(CPLQuietErrorHandler);

    // Read/Write Files
    const char *input = "D:/myIn.tif";
    GDALDataset *pSrcDataset;
    //GDALRasterBandH hMaskBand;
    GDALRasterBand *poBand;
    //CPLErr maskBand;
    int maskFlags;
    int noData;
    double maxSearch = 10.0;
    int maxInt = 1;
    int nBlockXSize, nBlockYSize;
    double adfGeoTransform[6];
    //CPLErr eErr;

    pSrcDataset = (GDALDataset*) GDALOpen(input, GA_Update);
    CPLAssert( pSrcDataset != NULL );

    poBand = pSrcDataset->GetRasterBand( 1 );
    poBand->GetBlockSize( &nBlockXSize, &nBlockYSize );
    printf( "Block=%dx%d Type=%s, ColorInterp=%s\n",
            nBlockXSize, nBlockYSize,
            GDALGetDataTypeName(poBand->GetRasterDataType()),
            GDALGetColorInterpretationName(
                poBand->GetColorInterpretation()) );

    noData = pSrcDataset->GetRasterBand(1)->GetNoDataValue();
    printf( "No Data Value = %i\n", noData );
    printf( "Driver: %s/%s\n",
            pSrcDataset->GetDriver()->GetDescription(),
            pSrcDataset->GetDriver()->GetMetadataItem( GDAL_DMD_LONGNAME ) );
    printf( "Size is %dx%dx%d\n",
            pSrcDataset->GetRasterXSize(), pSrcDataset->GetRasterYSize(),
            pSrcDataset->GetRasterCount() );
    if( pSrcDataset->GetProjectionRef() != NULL )
        printf( "Projection is `%s'\n", pSrcDataset->GetProjectionRef() );
    if( pSrcDataset->GetGeoTransform( adfGeoTransform ) == CE_None )
        printf( "Origin = (%.6f,%.6f)\n", adfGeoTransform[0],
                adfGeoTransform[3] );
    printf( "Pixel Size = (%.6f,%.6f)\n", adfGeoTransform[1],
            adfGeoTransform[5] );

    //maskBand = pSrcDataset->GetRasterBand(1)->GetMaskBand();
    //hMaskBand = GDALGetMaskBand( maskBand );
    //hMaskBand = pSrcDataset->GetRasterBand(1)->GetNoDataValue();
    maskFlags = pSrcDataset->GetRasterBand(1)->GetMaskFlags();
    printf ( "Mask Flags = %i\n", maskFlags );

    printf ( "Processing image\n" );
    GDALFillNodata(pSrcDataset, pSrcDataset->GetRasterBand(1)->GetMaskBand(),
                   maxSearch, 0, maxInt, NULL, NULL, NULL);
    //CPLAssert( eErr == CE_None);

    GDALClose(pSrcDataset);
    return 0;
}
Here are some errors I'm getting when running this code (see image links). The process returns 255, which I think might be something unique to Code::Blocks?
Program Crashes
Process returns 255
Here is the Python implementation. Pretty straightforward. Is None the same as NULL? I ask because one of the errors I got referenced using NULL as the hMaskBand (rasterfill.cpp).
#Run the gdal fill
ET = gdal.Open(infile, GA_Update)
ETband = ET.GetRasterBand(1)
result = gdal.FillNodata(targetBand = ETband, maskBand = None,
maxSearchDist = 500, smoothingIterations = 1)
print result # return 0
ET = None
Please let me know if you need more information. Given what little I know about C++, it's probably my build environment. :)
Thanks,
Heath

As a follow-up to the comments above, the problem is that the first argument given to the GDALFillNodata() API is a GDALDataset* instead of a GDALRasterBand*.
But I've struggled with GDAL enough myself that I can't resist giving a more explanatory answer.
The public interface of the GDAL library is very C oriented, even though it is indeed written using C++ features (e.g. classes).
So you will find that many of the available APIs take arguments that do not refer to the C++ types defined in the library. Instead, they use typedefs like GDALRasterBandH, which are really aliases for void*.
This has the nasty effect that whatever pointer type you feed to such a parameter, the compiler won't complain, so you get errors at runtime instead when you make this (very common) kind of mistake.
Why do they do this? I think it was a way to exploit an opaque-handle (PIMPL-style) design and avoid recompiling several translation units every time one of these classes was modified.
The real problem with this approach is that you lose all the benefits of static type checking, i.e. the compiler telling you when there is a type mismatch. As a side note, this is not a criticism of the GDAL library (thank goodness we have it; it is a huge and very useful project), it's just the way it is.
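For illustration, here is a minimal sketch of the corrected call, reusing poBand, maxSearch and maxInt from the question's code and checking the return value:
CPLErr eErr = GDALFillNodata(
    poBand,                 // target band (GDALRasterBandH), not the dataset
    poBand->GetMaskBand(),  // mask band; pixels where the mask is zero get filled
    maxSearch,              // maximum search distance in pixels
    0,                      // deprecated parameter, must be 0
    maxInt,                 // smoothing iterations
    NULL,                   // additional options
    NULL, NULL );           // progress callback and its user data
if( eErr != CE_None )
    printf( "GDALFillNodata() failed\n" );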
By the way, I'm currently working on a C++ open-source project, Rasterix, which is meant to be a user-friendly GUI for GDAL tools on raster datasets. Don't take that as a "do it like me" hint, as the app still needs proper testing, but there are plenty of GDAL API use cases there that may help if you need examples of how to use them.

Related

call c++ function from python

I've run into a problem: this DLL can be called successfully from Java or PHP, but when I call it from Python I get an access violation error.
I want to know whether mapping a C++ char* to ctypes c_char_p is the mistake, and how I should map a C++ char* with Python ctypes.
C++ DLL declarations:
#define Health_API extern "C" __declspec(dllexport)
Health_API unsigned long Initialization(int m_type,char * m_ip,int m_port)
Health_API int GetRecordCount(unsigned long m_handle)
python code
from ctypes import *
lib = cdll.LoadLibrary(r"./Health.dll")
inita = lib.Initialization
inita.argtype = (c_int, c_char_p, c_int)
inita.restype = c_ulong
getcount = lib.GetRecordCount
getcount.argtype = c_ulong
getcount.retype = c_int
# here call
handle = inita(5, '127.0.0.1'.encode(), 4000)
print(handle)
result = getcount(handle)
error info:
2675930080
Traceback (most recent call last):
File "C:/Users/JayTam/PycharmProjects/DecoratorDemo/demo.py", line 14, in <module>
print(getcount(handle))
OSError: exception: access violation reading 0x000000009F7F73E0
After modification
I find that if I don't define restype, it returns a negative number, which is worth noting; but when I change restype to c_uint64, it still can't connect.
Interestingly, my schoolmate can call the same DLL successfully on her computer (Win7 x64), using simple code as follows:
from ctypes import *
lib = cdll.LoadLibrary("./Health.dll")
handle = lib.Initialization(5, '127.0.0.1'.encode(), 4000)
lib.GetRecordList(handle, '')
I am on Win10 x64 and my Python environment is Anaconda. I changed to the same environment as hers and used the same code, but it still returns the same error.
Python vs. Java
Although I know it should be caused by the environment, I really want to know what exactly causes it.
Python
On each run the handle is an irregular number (my classmate's PC gets regular numbers),
such as: 288584672, 199521248, 1777824736, -607161376 (I don't set restype).
This is Python calling the DLL; it can't connect to port 4000.
Java
Java gets regular numbers, with no negative values,
such as: 9462886128, 9454193200, 9458325520, 9451683632.
This is Java calling the DLL.
So I think this line of code must be what causes the error, but on other PCs there is no problem, which is puzzling:
handle = lib.Initialization(5, '127.0.0.1'.encode(), 4000)
c++ dll:
Health_API unsigned long Initialization(int m_type,char* m_ip,int m_port)
{
CHandleNode* node;
switch(m_type)
{
case BASEINFO:
{
CBaseInfo* baseInfo = new CBaseInfo;
baseInfo->InitModule(m_ip,m_port);
node = m_info.AddMCUNode((unsigned long)baseInfo);
node->SetType(m_type);
return (unsigned long)baseInfo;
}
break;
case BASEINFO_ALLERGENS:
{
CBaseInfo_Allergens* baseInfo_Allergens = new CBaseInfo_Allergens;
baseInfo_Allergens->InitModule(m_ip,m_port);
node = m_info.AddMCUNode((unsigned long)baseInfo_Allergens);
node->SetType(m_type);
return (unsigned long)baseInfo_Allergens;
}
break;
// ... (many more cases omitted)
}
return -1;
}
Health_API int GetRecordList(unsigned long m_handle,char* m_index)
{
CHandleNode* node = m_info.GetMCUNode(m_handle);
if( node == NULL)
{
return -1;
}
switch(node->GetType())
{
case BASEINFO:
{
CBaseInfo* baseInfo = (CBaseInfo*)m_handle;
return baseInfo->GetRecordList(m_index);
}
break;
case BASEINFO_ALLERGENS:
{
CBaseInfo_Allergens* baseInfo_Allergens = (CBaseInfo_Allergens *)m_handle;
return baseInfo_Allergens->GetRecordList(m_index);
}
break;
// ... (many more cases omitted)
}
return -1;
}
It would help to have the C++ source code, but I can take some guesses.
I don't think this is related, but:
getcount.retype is missing an s, it should be restype.
Also note the attribute is argtypes (plural, with an s); I always define argtypes as a list instead of a tuple, and if there is only one argument I still use a list with one element.
If the char* is a null-terminated string, then you do indeed need c_char_p. In this case it is highly recommended to add const to the m_ip argument.
If the char* is a pointer to a raw buffer (filled or not by the calling function), which is not the case here, you need to use POINTER(c_char) and not c_char_p.
Based on the error message, your handle is a pointer, so you should use the right type for it in your C++ code: void*. And in your Python you should use c_void_p.
Is it 64-bit code? What toolchain/compiler was used to build the DLL? What is the size of unsigned long in that toolchain?
Edit: here is the reason for your problem: Is Python's ctypes.c_long 64 bit on 64 bit systems?
handle is a pointer in your DLL, and the DLL was certainly built using a compiler where a long is 64 bits, but for 64-bit Python on Windows a long is 32 bits, which cannot store your pointer. You really should use a void*.
First you could try using c_void_p in your Python for the handle without modifying your source code; it may work depending on the compiler used. But since the size of the long type is not fixed by any standard and may not be able to store a pointer, you really should use void* or uintptr_t in your C++ code.
Edit 2: if you are using Visual Studio, and not GCC, the size of a long is only 32 bits (Size of long vs int on x64 platform).
So you must fix the code of the DLL.
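To make that concrete, here is a rough sketch of what the C++ side could look like with an opaque void* handle (only one case shown for brevity; GetRecordCount() on CBaseInfo is assumed to exist inside the DLL, and Health_API is the export macro from the question). The Python side would then declare the handle as c_void_p, as described above:
// Sketch only: return and accept the handle as void* so it round-trips
// through ctypes (c_void_p) regardless of how wide "long" is on the platform.
Health_API void* Initialization(int m_type, const char* m_ip, int m_port)
{
    CBaseInfo* baseInfo = new CBaseInfo;   // one case only, for brevity
    baseInfo->InitModule(m_ip, m_port);
    return baseInfo;                       // a pointer always fits in void*
}
Health_API int GetRecordCount(void* m_handle)
{
    CBaseInfo* baseInfo = static_cast<CBaseInfo*>(m_handle);
    return baseInfo ? baseInfo->GetRecordCount() : -1;
}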

python ctypes - wrap void pointer

I am wrapping a C library in Python via ctypes. At the moment I am stuck on one line, or more precisely on one parameter. Here is the C code:
void* gVimbaHandleFake = (void*)1;
err = VmbFeatureBoolGet(gVimbaHandleFake, "GeVTLIsPresent", &isGigE );
The problem is this strange void pointer. In general I know what a void pointer is, but this one seems to be "special". If I change the 1 in (void*)1, the program no longer works (it is about finding network cameras). It does not crash, but it does not find the cameras anymore.
I tried many different things; my last attempts in Python were:
gVimbaHandle = cast(1, c_void_p)
err = self.dll.VmbFeatureBoolGet(byref(gVimbaHandle), "GeVTLIsPresent", byref(isGigE))
also tried the "normal" way:
gVimbaHandle = c_void_p(1)
My program isn't crashing, but it tells me that the handle is invalid...
When I look at the pointer with gVimbaHandle.value I get 1L as output. Could this be the problem, the L for the long datatype?
Does anybody know how to fix this, or can someone explain the "special" (void*)1 pointer in C to me?
Thank you very much!
So the solution/answer is:
gVimbaHandle = c_void_p(1)
err = self.dll.VmbFeatureBoolGet(gVimbaHandle, "GeVTLIsPresent", byref(isGigE))
Thanks @eryksun.

How can I get the new name of a renamed file given its file descriptor/object? [duplicate]

Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
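As a rough sketch of that approach (Linux only, minimal error handling; fd_to_path is just an illustrative helper name):
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
// Resolve /proc/self/fd/NNN and verify the result still names the open file.
int fd_to_path(int fd, char *buf, size_t bufsize) {
    char link[64];
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    ssize_t len = readlink(link, buf, bufsize - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';                      // readlink does not null-terminate
    struct stat st_fd, st_path;
    if (fstat(fd, &st_fd) != 0 || stat(buf, &st_path) != 0)
        return -1;
    // Same device and inode means the name is still accurate.
    return (st_fd.st_dev == st_path.st_dev && st_fd.st_ino == st_path.st_ino) ? 0 : -1;
}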
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH    Get the path of the file descriptor fildes. The argument must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
On Windows, you can retrieve the file name with GetFileInformationByHandleEx, passing FileNameInfo.
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.
You can use fstat() to get the file's inode from struct stat. Then, using readdir(), you can compare that inode with those of the entries (struct dirent) in a directory (assuming you know the directory; otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
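A minimal sketch of that fstat()/readdir() idea, limited to scanning a single known directory (the nasty part is extending it to a whole-filesystem walk); find_name_in_dir is just an illustrative name:
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
// Look for an entry in dirpath whose inode matches the one behind fd.
int find_name_in_dir(int fd, const char *dirpath, char *out, size_t outsize) {
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    DIR *dir = opendir(dirpath);
    if (!dir)
        return -1;
    int found = -1;
    for (struct dirent *entry = readdir(dir); entry; entry = readdir(dir)) {
        if (entry->d_ino == st.st_ino) {   // same inode on this filesystem
            snprintf(out, outsize, "%s/%s", dirpath, entry->d_name);
            found = 0;
            break;
        }
    }
    closedir(dir);
    return found;
}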
There is no official API to do this on OpenBSD, though with some very convoluted workarounds it is still possible with the following code; note that you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead if that location wasn't already searched through by FTS. It'll be slower for filesystems that have more files.
The deeper the search goes in the directory tree of your entire filesystem without finding the file, the more likely you are to have a race condition, though still very unlikely due to how performant this is. I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C and most of the code logic will be the same; if I have time I'll try to rewrite it in C soon. Like macOS, this solution gets a hard link at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the break in the while loop and return a vector if you don't care about being cross-platform and want to get all the hard links. DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened by any process, by plugging in a process id, and not be limited to just the calling one, while also potentially getting all hard links rather than a random one, see this answer. It should be a lot more performant than traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straightforward and to the point. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process; however, it should work for most Unix-likes out of the box, with all the same concerns as the former solution regarding hard links and race conditions, although it performs slightly faster due to fewer branches and loops:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
string path;
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link; struct stat info = { 0 };
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
if (child->fts_statp->st_dev == info.st_dev) {
if (child->fts_statp->st_ino == info.st_ino) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
}
finish:
fts_close(file_system);
}
return path;
}
An even quicker solution, also limited to the calling process but somewhat more performant: wrap all your calls to fopen() and open() (and the Windows-only equivalents like _wopen_s() and all the nonsense needed to support UTF-8, which won't work on UWP) with helper functions that store, in whatever C equivalent of std::unordered_map you have, the file descriptor paired with the absolute-path version of whatever was passed to the wrapper. The absolute path can be obtained with realpath() on Unix-likes, or GetFullPathNameW() (*W for UTF-8 support) on Windows. realpath() will resolve symbolic links (which aren't nearly as commonly used on Windows), and realpath() / GetFullPathNameW() will convert a relative path of the file you opened into an absolute one. With the file descriptor and absolute path stored in a C equivalent of std::unordered_map (which you'll likely have to write yourself using malloc()'d and eventually free()'d int and C-string arrays), this will again be faster than any solution that dynamically searches your filesystem, but it has a different and unappealing limitation: it will not notice files that are moved around on your filesystem. At least you can check whether the file was deleted with your own existence test, but it also won't notice whether the file was replaced since you opened it and stored the path, so the results can potentially be outdated. Let me know if you would like to see a code example of this, though due to files changing location I do not recommend this solution.
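Since the paragraph above is easier to follow with code, here is a small sketch of the idea in C++ (using std::unordered_map directly rather than a hand-rolled C map); tracked_open and fd2path_cached are made-up names, and the caveats about moved or replaced files still apply:
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <string>
#include <unordered_map>
// Map of file descriptor -> absolute path captured at open() time.
static std::unordered_map<int, std::string> g_fd_paths;
int tracked_open(const char *path, int flags, mode_t mode = 0) {
    int fd = open(path, flags, mode);
    if (fd >= 0) {
        char abs_path[PATH_MAX];
        if (realpath(path, abs_path))      // resolve symlinks and relative paths
            g_fd_paths[fd] = abs_path;
    }
    return fd;
}
std::string fd2path_cached(int fd) {
    auto it = g_fd_paths.find(fd);
    return it != g_fd_paths.end() ? it->second : std::string();
}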
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.

Python - C embedded Segmentation fault

I am facing a problem similar to Py_initialize / Py_Finalize not working twice with numpy. The basic code in C is:
Py_Initialize();
import_array();
//Call a python function which imports numpy as a module
//Py_Finalize()
The program runs in a loop, and it gives a segfault if the Python code has numpy among its imported modules. If I remove numpy, it works fine.
As a temporary workaround I tried not calling Py_Finalize(), but that causes huge memory leaks [observed as the memory usage in top keeps increasing]. And I tried, but did not understand, the suggestion in the link I posted. Can someone please suggest the best way to finalize the call while having imports such as numpy?
Thanks
santhosh.
I recently faced a very similar issue and developed a workaround that works for my purposes, so I thought I would write it here in the hope it might help others.
The problem
I work with a postprocessing pipeline for which I can write my own functor to work on data passing through the pipeline, and I wanted to be able to use Python scripts for some of the operations.
The problem is that the only thing I can control is the functor itself, which gets instantiated and destroyed at times beyond my control. I furthermore have the problem that even if I do not call Py_Finalize the pipeline sometimes crashes once I pass another dataset through the pipeline.
The solution in a Nutshell
For those who don't want to read the whole story and get straight to the point, here's the gist of my solution:
The main idea behind my workaround is not to link against the Python library, but instead load it dynamically using dlopen and then get all the addresses of the required Python functions using dlsym. Once that's done, one can call Py_Initialize() followed by whatever you want to do with Python functions followed by a call to Py_Finalize() once you're done. Then, one can simply unload the Python library. The next time you need to use Python functions, simply repeat the steps above and Bob's your uncle.
However, if you are importing NumPy at any point between Py_Initialize and Py_Finalize, you will also need to look for all the currently loaded libraries in your program and manually unload those using dlclose.
Detailed workaround
Loading instead of linking Python
The main idea as I mentioned above is not to link against the Python library. Instead, what we will do is load the Python library dynamically using dlopen():
#include <dlfcn.h>
...
void* pHandle = dlopen("/path/to/library/libpython2.7.so", RTLD_NOW | RTLD_GLOBAL);
The code above loads the Python shared library and returns a handle to it (the return type is an obscure pointer type, thus the void*). The second argument (RTLD_NOW | RTLD_GLOBAL) is there to make sure that the symbols are properly imported into the current application's scope.
Once we have a pointer to the handle of the loaded library, we can search that library for the functions it exports using the dlsym function:
#include <dlfcn.h>
...
// Typedef named 'void_func_t' for a pointer to a function that takes
// no arguments and returns nothing
typedef void (*void_func_t)(void);
void_func_t MyPy_Initialize = (void_func_t) dlsym(pHandle, "Py_Initialize");
The dlsym function takes two parameters: a pointer to the handle of the library that we obtained previously and the name of the function we are looking for (in this case, Py_Initialize). Once we have the address of the function we want, we can create a function pointer and initialize it to that address. To actually call the Py_Initialize function, one would then simply write:
MyPy_Initialize();
For all the other functions provided by the Python C-API, one can just add calls to dlsym and initialize function pointers to its return value and then use those function pointers instead of the Python functions. One simply has to know the parameter and return value of the Python function in order to create the correct type of function pointer.
Once we are finished with the Python functions and call Py_Finalize using a procedure similar to the one for Py_Initialize one can unload the Python dynamic library in the following way:
dlclose(pHandle);
pHandle = NULL;
Manually unloading NumPy libraries
Unfortunately, this does not solve the segmentation fault problems that occur when importing NumPy. The problem comes from the fact that NumPy also loads some libraries using dlopen (or something equivalent), and those do not get unloaded when you call Py_Finalize. Indeed, if you list all the loaded libraries within your program, you will notice that after closing the Python environment with Py_Finalize, followed by a call to dlclose, some NumPy libraries will remain loaded in memory.
The second part of the solution requires listing all the Python libraries that remain in memory after the call to dlclose(pHandle). Then, for each of those libraries, grab a handle and call dlclose on it. After that, they should get unloaded automatically by the operating system.
Fortunately, there are functions under both Windows and Linux (sorry MacOS, couldn't find anything that would work in your case...):
- Linux: dl_iterate_phdr
- Windows: EnumProcessModules in conjunction with OpenProcess and GetModuleFileNameEx
Linux
This is rather straightforward once you read the documentation for dl_iterate_phdr:
#include <link.h>
#include <string>
#include <vector>
// global variables are evil!!! but this is just for demonstration purposes...
std::vector<std::string> loaded_libraries;
// callback function that gets called for every loaded libraries that
// dl_iterate_phdr finds
int dl_list_callback(struct dl_phdr_info *info, size_t, void *)
{
loaded_libraries.push_back(info->dlpi_name);
return 0;
}
int main()
{
...
loaded_libraries.clear();
dl_iterate_phdr(dl_list_callback, NULL);
// loaded_libraries now contains a list of all dynamic libraries loaded
// in your program
....
}
Basically, the function dl_iterate_phdr cycles through all the loaded libraries (in the reverse order they were loaded) until either the callback returns something other than 0 or it reaches the end of the list. To save the list, the callback simply adds each element to a global std::vector (one should obviously avoid global variables and use a class for example).
Windows
Under Windows, things get a little more complicated, but still manageable:
#include <windows.h>
#include <psapi.h>
std::vector<std::string> list_loaded_libraries()
{
std::vector<std::string> m_asDllList;
HANDLE hProcess(OpenProcess(PROCESS_QUERY_INFORMATION
| PROCESS_VM_READ,
FALSE, GetCurrentProcessId()));
if (hProcess) {
HMODULE hMods[1024];
DWORD cbNeeded;
if (EnumProcessModules(hProcess, hMods, sizeof(hMods), &cbNeeded)) {
const DWORD SIZE(cbNeeded / sizeof(HMODULE));
for (DWORD i(0); i < SIZE; ++i) {
TCHAR szModName[MAX_PATH];
// Get the full path to the module file.
if (GetModuleFileNameEx(hProcess,
hMods[i],
szModName,
sizeof(szModName) / sizeof(TCHAR))) {
#ifdef UNICODE
std::wstring wStr(szModName);
std::string tModuleName(wStr.begin(), wStr.end());
#else
std::string tModuleName(szModName);
#endif /* UNICODE */
if (tModuleName.substr(tModuleName.size()-3) == "dll") {
m_asDllList.push_back(tModuleName);
}
}
}
}
CloseHandle(hProcess);
}
return m_asDllList;
}
The code in this case is slightly longer than for the Linux case, but the main idea is the same: list all the loaded libraries and save them into a std::vector. Don't forget to also link your program to the Psapi.lib!
Manual unloading
Now that we can list all the loaded libraries, all you need to do is find among those the ones that come from loading NumPy, grab a handle to them and then call dlclose on that handle. The code below will work on both Windows and Linux, provided that you use the dlfcn-win32 library.
#ifdef WIN32
# include <windows.h>
# include <psapi.h>
# include "dlfcn_win32.h"
#else
# include <dlfcn.h>
# include <link.h> // for dl_iterate_phdr
#endif /* WIN32 */
#include <string>
#include <vector>
// Function that list all loaded libraries (not implemented here)
std::vector<std::string> list_loaded_libraries();
int main()
{
// do some preprocessing stuff...
// store the list of loaded libraries now
// any libraries that get added to the list from now on must be Python
// libraries
std::vector<std::string> loaded_libraries(list_loaded_libraries());
std::size_t start_idx(loaded_libraries.size());
void* pHandle = dlopen("/path/to/library/libpython2.7.so", RTLD_NOW | RTLD_GLOBAL);
// Not implemented here: get the addresses of the Python function you need
MyPy_Initialize(); // Needs to be defined somewhere above!
MyPyRun_SimpleString("import numpy"); // Needs to be defined somewhere above!
// ...
MyPyFinalize(); // Needs to be defined somewhere above!
// Now list the loaded libraries again and start manually unloading them
// starting from the end
loaded_libraries = list_loaded_libraries();
// NB: this below assumes that start_idx != 0, which should always hold true
for(std::size_t i(loaded_libraries.size()-1) ; i >= start_idx ; --i) {
void* pHandle = dlopen(loaded_libraries[i].c_str(),
#ifdef WIN32
RTLD_NOW // no support for RTLD_NOLOAD
#else
RTLD_NOW|RTLD_NOLOAD
#endif /* WIN32 */
);
if (pHandle) {
const unsigned int Nmax(50); // Avoid getting stuck in an infinite loop
for (unsigned int j(0) ; j < Nmax && !dlclose(pHandle) ; ++j);
}
}
}
Final words
The examples shown here capture the basic ideas behind my solution, but can certainly be improved to avoid global variables and facilitate ease of use (for example, I wrote a singleton class that handles the automatic initialization of all the function pointers after loading the Python library).
I hope this can be useful to someone in the future.
References
dl_iterate_phdr: https://linux.die.net/man/3/dl_iterate_phdr
PsAPI library: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684894(v=vs.85).aspx
OpenProcess: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684320(v=vs.85).aspx
EnumProcess: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682629(v=vs.85).aspx
GetModuleFileNameEx: https://msdn.microsoft.com/en-us/library/windows/desktop/ms683198(v=vs.85).aspx
dlfcn-win32 library: https://github.com/dlfcn-win32/dlfcn-win32
I'm not quite sure what it is about the solution posted in Py_initialize / Py_Finalize not working twice with numpy that you don't understand. The solution posted is quite simple: call Py_Initialize and Py_Finalize only once for each run of your program. Do not call them every time you go through the loop.
I assume that your program, when it starts, runs some initialization commands (which are only run once). Call Py_Initialize there. Never call it again. Also, I assume that when your program terminates, it has some code to tear down things, dump log files, etc. Call Py_Finalize there. Py_Initialize and Py_Finalize are not intended to help you manage memory in the Python interpreter. Do not use them for that, as they cause your program to crash. Instead, use Python's own functions to get rid of objects you don't want to keep.
If you really MUST create a new environment every time you run your code, you can use Py_NewInterpreter to create a sub-interpreter and Py_EndInterpreter to destroy that sub-interpreter later. They're documented near the bottom of the Python C API page. This works similarly to having a new interpreter, except that modules are not re-initialized each time a sub-interpreter starts.
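A minimal sketch of that sub-interpreter pattern (using the Python 2.7-era C API that the rest of this thread assumes); note that some extension modules, NumPy included, have historically been troublesome in sub-interpreters, so test this against your actual imports:
#include <Python.h>
int main() {
    Py_Initialize();                               // once, at program start
    PyThreadState *main_state = PyThreadState_Get();
    for (int i = 0; i < 3; ++i) {
        PyThreadState *sub = Py_NewInterpreter();  // fresh interpreter state
        if (!sub)
            break;
        PyRun_SimpleString("print('hello from a sub-interpreter')");
        Py_EndInterpreter(sub);                    // tear down the sub-interpreter
        PyThreadState_Swap(main_state);            // restore the main thread state
    }
    Py_Finalize();                                 // once, at program end
    return 0;
}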

How important is it to check return values when using the Python C API?

It seems that every time I call a function that returns a PyObject*, I have to add four lines of error checking. Example:
py_fullname = PyObject_CallMethod(os, "path.join", "ss", folder, filename);
if (!py_fullname) {
Py_DECREF(pygame);
Py_DECREF(os);
return NULL;
}
image = PyObject_CallMethodObjArgs(pygame, "image.load", py_fullname, NULL);
Py_DECREF(py_fullname);
if (!image) {
Py_DECREF(pygame);
Py_DECREF(os);
return NULL;
}
image = PyObject_CallMethodObjArgs(image, "convert", NULL);
if (!image) {
Py_DECREF(pygame);
Py_DECREF(os);
return NULL;
}
Am I missing something? Is there a better way to do this? This has the added problem that I may forget all the stuff I'm supposed to Py_DECREF().
That's why goto is alive (though not entirely well;-) in C coding (as opposed to C++ and other languages that support exceptions): it's the only decent way to NOT have such repeated blocks of termination all over the mainline of your code -- a conditional forward jump to errorexit at each return-value check, with a label errorexit: that does the decrefs (and file closes, and whatever else you need to do at termination) and the return NULL.
Here are two ways I might write that code, influenced by my experience writing in two heavily macro-ised pseudo-assembler languages, one of which wasn't C. I've moved the deref of fullname, not because it's wrong in your code, but because I want to demonstrate how you handle a longer-lived resource in both schemes. So imagine that "fullname" is going to be needed again later in the routine:
Arrow code
result = NULL;
py_fullname = PyObject_CallMethod(os, "path.join", "ss", folder, filename);
if (py_fullname) {
image = PyObject_CallMethodObjArgs(pygame, "image.load", py_fullname, NULL);
if (image) {
image = PyObject_CallMethodObjArgs(image, "convert", NULL);
result = // something to do with image, presumably.
}
Py_DECREF(py_fullname);
}
Py_DECREF(pygame);
Py_DECREF(os);
return result;
The way this game is played, is that whenever you call a function which returns a resource, you check the return value immediately (or perhaps after freeing some resource which is no longer required, as in your example code), and the block corresponding to a successful call must either release the resource, or assign it to a return value, or actually return it, before the block is exited. This will usually be either in the second line of the block, after it has been used in the first line, or else in the last line of the block.
It's called "arrow code" because if you make 5 or 6 such calls in a function, you end up with 5 or 6 levels of indentation, and your function looks like a "turn right" sign. When this happens you either refactor, or you go against your every Pythonic instinct, use tabs for indentation, and reduce the tab stops ;-)
Goto
result = NULL;
py_fullname = PyObject_CallMethod(os, "path.join", "ss", folder, filename);
if (!py_fullname) goto cleanup_pygame;
image = PyObject_CallMethodObjArgs(pygame, "image.load", py_fullname, NULL);
if (!image) goto cleanup_fullname;
image = PyObject_CallMethodObjArgs(image, "convert", NULL);
result = // something to do with image, presumably.
cleanup_fullname:
Py_DECREF(py_fullname);
cleanup_pygame:
Py_DECREF(pygame);
Py_DECREF(os);
return result;
This goto code is structurally identical to the arrow code, just less indented and easier to mess up and jump to the wrong label. In some circumstances you will clean up different resources on success from what you clean up on failure (for instance if you're constructing and returning something, then on failure you need to clean up whatever you've done so far, but on success you only clean up what you're not returning). Those are the circumstances where the goto code is a clear win over the arrow code, because you can have separate cleanup paths for the two cases, but they still look the same, appear together at the end of the routine, and perhaps even share code. So you might end up with something like this:
result = NULL;
helper = allocate_something;
if (!helper) goto return_result;
result = allocate_something_else;
if (!result) goto error_return; // OK, result is already NULL, but it makes the point
result->contents = allocate_another_thing;
if (!result->contents) goto error_cleanup_result;
result->othercontents = allocate_last_thing;
if (!result->othercontents) goto error_cleanup_contents;
free_helper:
free(helper);
return_result:
return result;
error_cleanup_contents:
free(result->contents);
error_cleanup_result:
free(result);
error_return:
result = NULL;
goto free_helper;
Yes, it's horrible, and Python or C++ programmers will be physically sick at the sight of it. If I never have to write code like this again, I won't be all that disappointed. But as long as you have a systematic scheme for how resources are cleaned up, you should always know exactly what error label to jump to when something goes wrong, and that error label should "know" to clean up all resources that have been allocated so far. Doing it in reverse order allows fall-through to share the code. And once you're used to it, it's reasonably easy to do two things: first follow the path from any given error label, to the exit, and confirm that everything which should be freed is freed. Second, see the difference between two error cases, and confirm that this is the correct difference between the error-handling needed, because the difference is precisely to free the thing that was allocated in between the jumps to those labels.
That said, a semi-decent optimising compiler will common up the code for the error cases in your example. It's just easier to make a mistake when you have code copy-and-pasted about the place like that, especially when you modify it later.
This is the C API. If you code in C only, you may have to live with it, but if your application is coded in C++, you might want to take a look at a C++/Python wrapper.
This is one of the reasons why C++ introduces exception handling and RAII. Assuming you can use C++, you can create a function that invokes the C function, tests the result, and throws an exception if an error occurred. That way, you can call the wrapper without any checks... if an error occurs, it will throw the exception. However, there is no need to reinvent the wheel, check out the Boost.Python library.
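As a small illustration of that idea (a hand-rolled sketch, not Boost.Python itself): a checked() helper that throws on NULL, plus a tiny RAII owner so the Py_DECREF calls cannot be forgotten. PyRef and checked are made-up names:
#include <Python.h>
#include <stdexcept>
// Owns one PyObject reference and releases it when the scope unwinds.
struct PyRef {
    explicit PyRef(PyObject *obj) : p(obj) {}
    ~PyRef() { Py_XDECREF(p); }
    PyRef(const PyRef &) = delete;
    PyRef &operator=(const PyRef &) = delete;
    PyObject *get() const { return p; }
private:
    PyObject *p;
};
// Turns the "four lines of error checking" into a single throw.
PyObject *checked(PyObject *obj) {
    if (!obj)
        throw std::runtime_error("Python C API call failed");
    return obj;
}
// Usage: every intermediate reference is released automatically, even if a later call throws.
void example() {
    PyRef os_module(checked(PyImport_ImportModule("os")));
    PyRef os_path(checked(PyObject_GetAttrString(os_module.get(), "path")));
    // ... wrap each further PyObject*-returning call in checked(...) the same way ...
}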
While I don't see it often, I consider this a good solution for C programmers ("the goto with a tie"):
result = NULL;
// make sure all variables are initialized
do
{
py_fullname = PyObject_CallMethod(os, "path.join", "ss", folder, filename);
if (!py_fullname)
{
// some additional error handling here
// write a trace message with __FILE__ and __LINE__
break;
}
image = PyObject_CallMethodObjArgs(pygame, "image.load", py_fullname, NULL);
if (!image)
{
// some additional error handling here
break;
}
image = PyObject_CallMethodObjArgs(image, "convert", NULL);
result = // something to do with image, presumably.
} while (true);
if (py_fullname)
Py_DECREF(py_fullname);
if (pygame)
Py_DECREF(pygame);
if (os)
Py_DECREF(os);
return result;
There are several advantages:
Only one exit in the function - you can set a breakpoint at the end and be sure it works
Cleanup code is not repeated
No cascaded ifs
Error checking is done as early as possible and points exactly to the problem (trace message)
Initializing all variables is a good idea anyhow so why not use it in the cleanup section?
I recommend to write some macros in order to unify the code:
#define CATCH_BEGIN do {
#define CATCH_END } while (1!=1);
#define CLEANUP_VOID(function,var) {if (var != NULL) { function(var); var = NULL;}}
This allows to cleanup at the end like this:
CLEANUP_VOID(Py_DECREF, py_fullname)
CLEANUP_VOID(Py_DECREF, pygame)
CLEANUP_VOID(Py_DECREF, os)
