I recently ran into a SWIG limitation related to the size of a C++ std::string.
I have some C++ code that returns a pair. I noticed that when the string in the pair is smaller than 2*1024*1024*1024-64 bytes (just under 2 GB), the pair is returned properly and the string is mapped to a native Python string. However, if the string is 2 GB or larger, it is no longer mapped to a native Python string. The code below, wrapped for Python through SWIG, reproduces the problem.
Environment: SWIG 3.0.8, Ubuntu 16.04.3 LTS, g++ 5.4.0, Python 2.7.12
/////////// bridge.h
#include <vector>
#include <utility>
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
pair<int, string> large_string(long sz);
long size_pstring(pair<int,string>& p);
void print_pstring(pair<int,string>& p);
string save_pstring(pair<int,string>& p);
//////////bridge.cc
#include "bridge.h"
pair<int, string> large_string(long sz){
pair<int, string> pis;
pis.first=20;
pis.second=string(sz,'A');
return pis;
}
long size_pstring(pair<int,string>& p){
return p.second.size();
}
void print_pstring(pair<int,string>& p){
cout<<"PSTRING: first="<<p.first<<" second.SZ="<<p.second.size()<<"\n";
cout<<"First 100 chars: \n"<<p.second.substr(0,100)<<"\n";
}
string save_pstring(pair<int,string>& p){
string fname="aloe.txt";
std::ofstream ofile(fname.c_str());
ofile<<p.second;
ofile.close();
return fname;
}
////////// bridge.i
%module graphdb
%include stl.i
%include "std_vector.i"
%{
#include "bridge.h"
%}
%include "bridge.h"
namespace std {
%template(p_string) pair<int,string>;
};
//////// makefile
all:
swig -c++ -python bridge.i
g++ -std=c++11 -fpic -c bridge.cc bridge_wrap.cxx -I/usr/include/python2.7/
g++ -shared *.o -o _graphdb.so
Below I include a Python session showing that this is probably just a matter of how the string is mapped, and that most probably an int rather than a long is used to represent the string size in the SWIG bridge code.
>>> s=graphdb.large_string(12)
>>> print s
(20, 'AAAAAAAAAAAA')
>>> s=graphdb.large_string(2*1024*1024*1024)
>>> print s
(20, <Swig Object of type 'char *' at 0x7fd4205a6090>)
>>> l=graphdb.size_pstring(s)
>>> print l
2147483648
>>> fname = graphdb.save_pstring(s)
Saving the string to a file works correctly, and I can then load the file into a Python string without problems.
So my question: does anybody know which SWIG config option I should change so that large strings are properly mapped to native Python strings?
--Thx
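To make the suspected limit concrete, here is a minimal sketch of the arithmetic. It assumes the size check is against a 32-bit signed int (maximum 2**31 - 1), which is only the guess made above and is not confirmed from the SWIG sources here; the name fits_in_int is hypothetical.

```python
# Assumption: the wrapper compares the string size against a 32-bit signed int.
INT_MAX = 2**31 - 1


def fits_in_int(size):
    """Return True if a size in bytes can be represented in a 32-bit signed int."""
    return size <= INT_MAX


works = 2 * 1024**3 - 64   # size reported above as mapping to a native string
fails = 2 * 1024**3        # size from the session that comes back as a 'char *' object

print(works, fits_in_int(works))
print(fails, fits_in_int(fails))
```

If the guess is right, the two sizes from the session fall on opposite sides of that limit, which matches the observed behaviour.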
I have this little exploitable file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// gcc -z execstack -z norelro -fno-stack-protector -o format0 format0.c
int target;
void vuln(char *string)
{
printf(string);
if (target){
printf("Tyes yes eys");
}
}
int main(int argc, char **argv)
{
vuln(argv[1]);
return 0;
}
It's very simple, I compile like this:
gcc file.c -o file -no-pie
and then I run it like this to get it to leak some values:
./file %x
38b3fda8
Which works perfectly.
But I want to automate this a bit, using python. So I try the following:
$ ./form &(python -c "print('%x'*3)")
[1] 30633
%x%x%x
[1]+ Done ./form
This looks super weird. Firstly, the format string bug is not triggered. Then it prints its own name and some other random stuff.
I also tried doing this in gdb, with the same result.
How do I pass input from Python, as every other tutorial online does?
I think you meant:
./form $(python -c "print('%x'*3)")
What ./form &(python -c "print('%x'*3)")
does is:
./form &
(python -c "print('%x'*3)")
i.e. form is run in the background (process 30633 in your example).
Python is run in the foreground in a subshell (and prints %x%x%x to your terminal).
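If you would rather drive the program from Python than rely on shell substitution, a minimal sketch with subprocess looks like this. The child command here is only a stand-in that echoes its first argument (an assumption made so the example runs anywhere); for the real case you would pass ["./form", payload] instead.

```python
import subprocess
import sys

# The same payload the shell command builds with print('%x'*3).
payload = "%x" * 3

# Stand-in for ./form: a child process that prints its first argument.
# This is a hypothetical placeholder, not the vulnerable binary.
echo_argv1 = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Each list element becomes exactly one argv entry; no shell is involved.
result = subprocess.run(
    echo_argv1 + [payload], capture_output=True, text=True, check=True
)
received = result.stdout.strip()
print(received)
```

Because no shell parses the command, characters such as & or $ in the payload are passed through unchanged.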
I'm trying to call a C++ function from my Python code. If I pass a Boolean or an int it works perfectly, but if I pass a string, only the first character is printed.
I am compiling with:
g++ -c -fPIC foo.cpp -Wextra -Wall -o foo.o
g++ -shared -Wl,-soname,libfoo.so -o libfoo.so foo.o
python3 fooWrapper.py
Here is the C++ and Python code:
Python:
from ctypes import cdll
lib = cdll.LoadLibrary("./libfoo.so")
lib.Foo_bar("hello")
c++:
#include <iostream>
#include <string>
#include <unistd.h>
void bar(char* string){
printf("%s", string);
}
extern "C" {
void Foo_bar(char* aString){
bar(aString);
}
}
I'm aware of the Boost library, but I couldn't manage to download it, and this approach works well except for strings.
Thank you for your help
The problem is that in Python 3, strings are passed as pointers to wchar_t wide characters. On a little-endian system, your string may be laid out in memory as
"h\0\0\0e\0\0\0l\0\0\0l\0\0\0o\0\0\0\0\0\0\0"
which, when printed with %s, stops at the first null terminator.
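You can reproduce that byte layout in pure Python. This is a small sketch assuming a 4-byte little-endian wchar_t (typical on Linux); UTF-32-LE is used as a stand-in for that layout.

```python
text = "hello"

# Assumption: wchar_t is 4 bytes, little-endian, so UTF-32-LE mimics its layout.
wide = text.encode("utf-32-le")

# printf("%s") reads bytes until the first zero byte.
seen_by_printf = wide.split(b"\x00", 1)[0]

# A UTF-8 byte string has no embedded zero bytes for ASCII text.
narrow = text.encode("utf-8")

print(wide)
print(seen_by_printf)
print(narrow)
```

The wide encoding has three zero bytes after every ASCII character, which is why only the first character reaches the terminal.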
For UTF-8-encoded byte strings (char *) you need a bytes object. For example:
lib.Foo_bar("hello".encode())
or use bytes literals:
lib.Foo_bar(b"hello")
Even better, specify the correct argument types:
from ctypes import cdll, c_char_p
foo_bar = cdll.LoadLibrary("./libfoo.so").Foo_bar
foo_bar.argtypes = [c_char_p]
foo_bar(b"hello\n")
foo_bar("hello\n")
When run, this outputs the following:
hello
Traceback (most recent call last):
File "foo.py", line 5, in <module>
foo_bar("hello\n")
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type
i.e. the latter call, which passes a str instead of bytes, raises an exception.
You may also process Python 3 strings in C++ directly using the wchar_t type. In that case, you need to do any necessary conversions in C++, like this:
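The same type check can be tried without building libfoo.so at all. The sketch below calls from_param on the ctypes types directly, which is the conversion step ctypes applies to an argument once argtypes is set; accepted_by is a hypothetical helper name.

```python
from ctypes import c_char_p, c_wchar_p


def accepted_by(ctype, value):
    """Return True if ctypes accepts value for a parameter declared as ctype."""
    try:
        ctype.from_param(value)
        return True
    except TypeError:
        return False


bytes_as_char_p = accepted_by(c_char_p, b"hello")   # bytes for char *
str_as_char_p = accepted_by(c_char_p, "hello")      # str for char *
str_as_wchar_p = accepted_by(c_wchar_p, "hello")    # str for wchar_t *

print(bytes_as_char_p, str_as_char_p, str_as_wchar_p)
```

Declaring argtypes therefore turns a silent misinterpretation of the argument into an immediate error on the Python side.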
#include <iostream>
#include <locale>
#include <codecvt>
void bar(wchar_t const* aString)
{
// Kudos: https://stackoverflow.com/a/18374698
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
std::cout << convert.to_bytes(aString) << std::endl;
}
extern "C" {
void Foo_bar(wchar_t const* aString)
{
bar(aString);
}
}
You will lose Python2 compatibility, however.
I'm encountering some problems wrapping a C++ function for Python with SWIG. My function is declared in my header file:
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <cmath>
#include <math.h>
#include <complex>
#include <tgmath.h>
#include <iostream>
#include <Eigen/Core>
// Classes used in the wrapped function DLM
#include "Param.h"
#include "Wingsegment.h"
#include "Modes.h"
#include "IPS.h"
#include "Dlattice.h"
#include "Strmesh.h"
#include "Aeromesh.h"
void DLM( std::string DLM_inputfile,
          std::vector<double> &Qhh_std_real, std::vector<double> &Qhh_std_imag,
          std::vector<double> &omegavector_std, int &nModes, int &nOmega );
std::string return_format_modes() ;
Just for clarification I'll insert some part of the DLM.cpp code:
#include "DLM.h"
std::string format_modes; // Global variable
void DLM( std::string DLM_inputfile,
          std::vector<double> &Qhh_std_real, std::vector<double> &Qhh_std_imag,
          std::vector<double> &omegavector_std, int &nModes, int &nOmega )
{
const std::string config_file_name = DLM_inputfile;
Param PARAM;
PARAM.read_DLM_input( config_file_name);
PARAM.read_General();
PARAM.read_Aerogeneral(); //<-- here apparently the issue
...
};
Given this C++ function, here is the SWIG interface file pyDLM_Cpp.i:
%module pyDLM_Cpp
%{
/* Every thing in this file is being copied in
wrapper file. We include the C header file necessary
to compile the interface */
#include "./dlm/DLM.h"
#include "./dlm/Param.h"
%}
%include "std_vector.i";
%include "std_string.i";
namespace std {
%template(DoubleVector) vector<double>;
};
%include "./dlm/DLM.h";
%include "./dlm/Param.h";
The Makefile I use (which seems to work fine) is:
EIGEN_PATH = path/to/eigen
INCLPATH = -I$(EIGEN_PATH)
all:
swig -c++ -python -Wall pyDLM_Cpp.i
g++ -O2 -c -std=gnu++11 -fPIC ./dlm/DLM.cpp $(INCLPATH)
g++ -O2 -c -std=gnu++11 -fPIC pyDLM_Cpp_wrap.cxx -I/usr/include/python2.7 $(INCLPATH)
g++ -O2 -shared -fPIC DLM.o pyDLM_Cpp_wrap.o -o _pyDLM_Cpp.so
Now, when I try to import my function into python the error I get is:
ImportError: ./_pyDLM_Cpp.so: undefined symbol: _ZN5Param16read_AerogeneralEv
The function Param::read_Aerogeneral() is declared in Param.h as a member of the class Param, an instance of which is created in the first lines of the DLM function in DLM.cpp. It is not used directly from Python, which only calls the function DLM declared in DLM.h, so I don't understand why this particular issue occurs. I have also seen many similar problems online, but none of the proposed solutions worked.
Can anyone help me overcome this issue?
Thanks in advance
PS: the code uses the Eigen library internally, as can be seen in the files above.
The dynamic linker complains that the symbol _ZN5Param16read_AerogeneralEv is not defined.
You are convinced that it is defined in object file DLM.o.
Please check whether it is actually defined in that object file with
nm DLM.o | grep _ZN5Param16read_AerogeneralEv
If you see an entry starting with T, then it is defined in this file. If you only see an entry starting with U, or no entry at all, then it is not defined in this file.
If it is defined, try reordering the object files on the linker command line (let DLM.o be the last object).
It is more likely that the symbol is actually not defined there. You need to investigate why this is the case and fix it.
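To see which function the linker is complaining about, it helps to decode the mangled name. The following is a rough sketch of a decoder that handles only the simple form seen here (a nested name followed by an empty parameter list); demangle_simple is a hypothetical helper, not a general demangler. For real use, c++filt does this properly.

```python
import re


def demangle_simple(symbol):
    """Decode symbols shaped like _ZN<len><name>...Ev only.

    Hypothetical helper: it does not cover templates, arguments or qualifiers.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    pos = 3
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError("unsupported symbol form")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
    # After the closing 'E', a lone 'v' encodes an empty parameter list.
    if symbol[pos + 1:] != "v":
        raise ValueError("only empty parameter lists are supported")
    return "::".join(parts) + "()"


decoded = demangle_simple("_ZN5Param16read_AerogeneralEv")
print(decoded)
```

The decoded name shows that the missing definition belongs to the Param class, so the object file to look for is the one compiled from that class's source file.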
T. Herzke's answer was actually helpful. I found out that the symbol in question, _ZN5Param16read_AerogeneralEv, is only defined in Param.o, which is built only if Param.cpp is added to the compile step in the makefile:
g++ -O2 -c -std=gnu++11 -fPIC ./dlm/DLM.cpp ./dlm/Param.cpp $(INCLPATH) # need to add the Param.cpp
Then add Param.o to the link step that builds _pyDLM_Cpp.so:
g++ -O2 -shared -fPIC DLM.o Param.o pyDLM_Cpp_wrap.o -o _pyDLM_Cpp.so # need to add the Param.o
Building the interface this way gives no error when it is imported into my Python routine.
I am developing a C++ library in which SWIG is used to generate its Python wrapper. Some of my C++ files include <inttypes.h> to use PRId64 and other macros in sprintf.
I was able to compile my library with Python 2.6 and GCC 4.4.7 on Scientific Linux 6 (RHEL6 clone), but Python 2.7 and GCC 4.8.2 on Scientific Linux 7 (RHEL7 clone) generated many errors like the ones below.
/home/oxon/libTARGET/inc/target/T2EvalBoard.h:562:145: warning: too many arguments for format [-Wformat-extra-args]
In file included from /home/oxon/libTARGET_build/src/targetPYTHON_wrap.cxx:3117:0:
/home/oxon/libTARGET/inc/target/BaseCameraModule.h: In member function ‘virtual void TARGET::BaseCameraModule::ReceiveEvent(uint32_t&, uint8_t**)’:
/home/oxon/libTARGET/inc/target/BaseCameraModule.h:211:66: error: expected ‘)’ before ‘PRIu32’
sprintf(str, "Cannot read event data. Requested length is %" PRIu32 " bytes, but only %" PRId64 " bytes were read.", length, fBytesReturned);
I know that I have to add the following lines at the top of the header files in order to use PRId64 and the other macros.
#define __STDC_FORMAT_MACROS
#include <inttypes.h>
But targetPYTHON_wrap.cxx, which is a source file generated by SWIG, includes <Python.h> in the beginning of the file, and so the above lines are ignored. Indeed, the following code cannot be compiled, because <Python.h> includes <inttypes.h> in it.
#include <Python.h>
#define __STDC_FORMAT_MACROS
#include <inttypes.h>
#include <stdio.h>
int main()
{
printf("Output: %" PRIu32 "\n", (uint32_t)100);
return 0;
}
How do I use PRId64 and other macros with <Python.h> and SWIG?
In SWIG, the following adds lines to the very top of the SWIG wrapper, so it will be defined before Python.h:
%begin %{
#define __STDC_FORMAT_MACROS
#include <inttypes.h>
%}
I added -D__STDC_FORMAT_MACROS to CXX_FLAGS, but I am looking for a better solution if one exists.
I am building an R extension that embeds a Python interpreter.
Everything works well now, except that Python cannot find the encoding I need. It keeps throwing LookupError when I do something involving 'big5'. However, if I build a standalone C++ application, the Python interpreter does find the encoding and stops throwing errors.
test.cpp, a normal standalone example:
#include <Python.h>
int main(int argc, char* argv[]) {
Py_SetProgramName("test"); /* optional but recommended */
Py_Initialize();
PyRun_SimpleString(
"import codecs\n"
"f = codecs.open('big5_encoded_file', encoding='big5', mode='r')"
);
Py_Finalize();
return 0;
}
testr.cpp for R extension:
#include <R.h>
#include <Rdefines.h>
#include <Python.h>
extern "C" SEXP testpy();
SEXP testpy() {
Py_SetProgramName("test"); /* optional but recommended */
Py_Initialize();
PyRun_SimpleString(
"import codecs\n"
"f = codecs.open('big5_encoded_file', encoding='big5', mode='r')"
);
Py_Finalize();
return R_NilValue;
}
A Makefile on ubuntu 12.10:
all: test testr.so
test: test.cpp
g++ test.cpp -o test -I/usr/include/python2.7 -lpython2.7
testr.so: testr.cpp
R CMD SHLIB testr.cpp
The ./test runs normally, but Rscript -e "dyn.load('testr.so');.Call('testpy')" produces a "LookupError: unknown encoding: big5"
Thanks
-- edit --
To build the testr.so, please set:
export PKG_CXXFLAGS=-I/usr/include/python2.7
export PKG_LIBS=-lpython2.7
I noticed that it is a linking issue.
I tried to import encodings.big5 in the embedded Python, but an undefined reference error occurred. The solution in http://bugs.python.org/issue4434 works for me:
before Py_Initialize(), call dlopen("libpython2.7.so", RTLD_LAZY | RTLD_GLOBAL);
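A quick way to check from inside the embedded interpreter whether a codec can actually be loaded is a small lookup test. This is only a diagnostic sketch that could be passed to PyRun_SimpleString; codec_available is a hypothetical helper name.

```python
import codecs


def codec_available(name):
    """Return True if the interpreter can load the codec called name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# 'utf-8' is built in; 'big5' lives in a separately loaded extension module.
for name in ("utf-8", "big5"):
    print(name, codec_available(name))
```

If 'utf-8' is reported as available but 'big5' is not, the interpreter itself works and the failure is limited to codecs that come from extension modules, which is consistent with the linking explanation above.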