LookupError: unknown encoding 'big5' under embedded Python

I am building an R extension that embeds a Python interpreter.
Everything works now, except that Python cannot find the encoding I need. It keeps throwing a LookupError whenever I do something involving 'big5'. However, if I build a standalone C++ application, the Python interpreter does find the encoding and no error is thrown.
test.cpp, the standalone example:
#include <Python.h>
int main(int argc, char* argv[]) {
    Py_SetProgramName("test"); /* optional but recommended */
    Py_Initialize();
    PyRun_SimpleString(
        "import codecs\n"
        "f = codecs.open('big5_encoded_file', encoding='big5', mode='r')"
    );
    Py_Finalize();
    return 0;
}
testr.cpp, the R extension:
#include <R.h>
#include <Rdefines.h>
#include <Python.h>
extern "C" SEXP testpy();
SEXP testpy() {
    Py_SetProgramName("test"); /* optional but recommended */
    Py_Initialize();
    PyRun_SimpleString(
        "import codecs\n"
        "f = codecs.open('big5_encoded_file', encoding='big5', mode='r')"
    );
    Py_Finalize();
    return R_NilValue;
}
A Makefile on Ubuntu 12.10 (recipe lines must start with a tab):
all: test testr.so
test: test.cpp
	g++ test.cpp -o test -I/usr/include/python2.7 -lpython2.7
testr.so: testr.cpp
	R CMD SHLIB testr.cpp
./test runs normally, but Rscript -e "dyn.load('testr.so');.Call('testpy')" fails with "LookupError: unknown encoding: big5".
Thanks
-- edit --
To build testr.so, set:
export PKG_CXXFLAGS=-I/usr/include/python2.7
export PKG_LIBS=-lpython2.7

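I noticed that it is a linking issue.
I tried to import encodings.big5 in the embedded Python and got an undefined reference error. R's dyn.load() loads testr.so, and with it libpython, with local symbol visibility by default, so C extension modules loaded later (such as the one behind encodings.big5) cannot resolve the interpreter's symbols. The solution in http://bugs.python.org/issue4434 works for me:
before Py_Initialize(), call dlopen("libpython2.7.so", RTLD_LAZY | RTLD_GLOBAL);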
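To confirm this kind of failure from inside the embedded interpreter, it helps to compare codecs.lookup() with a direct import of the codec module. The search function in the encodings package swallows ImportError and reports only "unknown encoding", so importing encodings.big5 directly surfaces the real cause (here, the undefined-reference error from the C extension module). This is a diagnostic sketch; the helper names are hypothetical and not part of the original fix. The code runs under both Python 2.7 and 3, so it can be passed to PyRun_SimpleString as-is.

```python
import codecs
import importlib


def codec_lookup_error(name):
    """Return None if codecs.lookup() finds the codec, else the error text."""
    try:
        codecs.lookup(name)
    except LookupError as exc:
        return "LookupError: %s" % exc
    return None


def codec_import_error(module_name):
    """Import encodings.<module_name> directly so the real ImportError is visible.

    The encodings search function hides ImportError behind a generic
    "unknown encoding" LookupError; importing the module ourselves does not.
    """
    try:
        importlib.import_module("encodings." + module_name)
    except ImportError as exc:
        return "ImportError: %s" % exc
    return None


if __name__ == "__main__":
    # In a healthy interpreter both lines print None. In the failing embedded
    # interpreter, the first reports the LookupError and the second the
    # underlying ImportError.
    print(codec_lookup_error("big5"))
    print(codec_import_error("big5"))
```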

Related

Writing input to a C executable manually triggers the exploit, but Python input does not

I have this little exploitable file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// gcc -z execstack -z norelro -fno-stack-protector -o format0 format0.c
int target;
void vuln(char *string)
{
    printf(string);
    if (target){
        printf("Tyes yes eys");
    }
}
int main(int argc, char **argv)
{
    vuln(argv[1]);
    return 0;
}
It's very simple. I compile it like this:
gcc file.c -o file -no-pie
and then run it like this to get it to leak some values:
./file %x
38b3fda8
This works perfectly.
But I want to automate this a bit using Python, so I tried the following:
$ ./form &(python -c "print('%x'*3)")
[1] 30633
%x%x%x
[1]+ Done ./form
This looks very strange. First, the format string bug is not triggered. Then it prints its own name and some other random stuff.
I also tried this in gdb, with the same result.
How do I pass input with Python, the way every other tutorial online does?
I think you meant:
./form $(python -c "print('%x'*3)")
What ./form &(python -c "print('%x'*3)")
does is:
./form &
(python -c "print('%x'*3)")
That is, form is run in the background (process 30633 in your example) with no arguments at all.
Python runs in the foreground in a subshell and prints %x%x%x to your terminal.
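If the goal is to drive the binary from a Python script rather than from the shell, one option is to skip the shell entirely and hand the payload to the program as argv[1] with subprocess (Python 3.5+). This is a minimal sketch, not part of the answer above; run_with_arg and ECHO_ARG are hypothetical names, and ./form is the binary from the question.

```python
import subprocess
import sys


def run_with_arg(cmd, payload):
    """Run cmd (a list of strings) with payload appended as one extra argument.

    The command is passed as a list and shell=False (the default), so no shell
    parses the payload: characters such as %, $, & and spaces reach the
    program unchanged, as a single argv entry.
    """
    result = subprocess.run(cmd + [payload], stdout=subprocess.PIPE)
    return result.stdout


# Hypothetical stand-in for ./form that just echoes argv[1], useful for
# checking exactly what the target binary would receive.
ECHO_ARG = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]
```

With the real binary, run_with_arg(["./form"], "%x" * 3) plays the role of ./form $(python -c "print('%x'*3)").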

Calling a C++ function from Python

I'm trying to call a C++ function from my Python code. If I pass a Boolean or an int it works perfectly, but if I pass a string, it only prints the first character.
I am compiling with:
g++ -c -fPIC foo.cpp -Wextra -Wall -o foo.o
g++ -shared -Wl,-soname,libfoo.so -o libfoo.so foo.o
python3 fooWrapper.py
Here is the C++ and Python code:
Python:
from ctypes import cdll
lib = cdll.LoadLibrary("./libfoo.so")
lib.Foo_bar("hello")
c++:
#include <cstdio>
#include <iostream>
#include <string>
#include <unistd.h>
void bar(char* string){
    printf("%s", string);
}
extern "C" {
    void Foo_bar(char* aString){
        bar(aString);
    }
}
I'm aware of the Boost library, but I couldn't manage to download it, and this approach works well except for strings.
Thank you for your help
The problem is that, in Python 3, ctypes passes str objects as pointers to wchar_t wide characters. On a little-endian system with a 4-byte wchar_t (such as Linux), your string is laid out in memory as
"h\0\0\0e\0\0\0l\0\0\0l\0\0\0o\0\0\0\0\0\0\0"
and printf("%s") stops at the first null byte, right after the "h".
For UTF-8-encoded byte strings (char *) you need a bytes object. For example:
lib.Foo_bar("hello".encode())
or use bytes literals:
lib.Foo_bar(b"hello")
It's even better to specify the correct argument types:
from ctypes import cdll, c_char_p
foo_bar = cdll.LoadLibrary("./libfoo.so").Foo_bar
foo_bar.argtypes = [c_char_p]
foo_bar(b"hello\n")
foo_bar("hello\n")
When run, this outputs the following:
hello
Traceback (most recent call last):
File "foo.py", line 5, in <module>
foo_bar("hello\n")
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type
That is, the second call, which passes a str instead of bytes, raises an exception.
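The effect of argtypes can be checked without building libfoo.so by calling the C library's strlen() through ctypes. This sketch assumes a POSIX system where ctypes.util.find_library("c") locates libc. With no argtypes set, a str is passed as a wchar_t pointer, so on a little-endian machine strlen() typically stops after the first character, which is exactly the symptom in the question. With argtypes set to c_char_p, bytes work and str is rejected.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system; find_library("c") returns e.g. "libc.so.6".
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Unconfigured function object: ctypes guesses how to convert each argument
# and turns a Python 3 str into a wchar_t * buffer.
raw_strlen = libc["strlen"]

# Separately configured function object: strlen(const char *) -> size_t.
strlen = libc["strlen"]
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def try_strlen(value):
    """Call the configured strlen, returning None if ctypes rejects the argument."""
    try:
        return strlen(value)
    except ctypes.ArgumentError:
        return None


if __name__ == "__main__":
    print(raw_strlen("hello"))   # str -> wchar_t *: usually 1 on little-endian
    print(raw_strlen(b"hello"))  # bytes -> char *: 5
    print(try_strlen(b"hello"))  # 5
    print(try_strlen("hello"))   # None: c_char_p rejects str
```

Indexing the library with libc["strlen"] returns a fresh function object each time, which is why the two objects can carry different argtypes.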
You can also process Python 3 strings directly in C++ using the wchar_t type. In that case, you need to do any necessary conversions in C++, like this:
#include <iostream>
#include <locale>
#include <codecvt>
void bar(wchar_t const* aString)
{
    // Kudos: https://stackoverflow.com/a/18374698
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
    std::cout << convert.to_bytes(aString) << std::endl;
}
extern "C" {
    void Foo_bar(wchar_t const* aString)
    {
        bar(aString);
    }
}
You will lose Python 2 compatibility, however.
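On the Python side, the matching declaration for a wchar_t const* parameter is ctypes.c_wchar_p, which accepts str and rejects bytes. As in the previous sketch, the C library's wcslen() stands in for Foo_bar so the example runs without libfoo.so; this assumes a POSIX system where libc exports wcslen().

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system where the C library exports wcslen().
libc = ctypes.CDLL(ctypes.util.find_library("c"))

wcslen = libc["wcslen"]
wcslen.argtypes = [ctypes.c_wchar_p]   # wchar_t const *
wcslen.restype = ctypes.c_size_t


def wide_length(text):
    """Length, in wchar_t units, of a str passed through a wchar_t * parameter."""
    return wcslen(text)
```

For the library in the question this would become foo_bar.argtypes = [ctypes.c_wchar_p] followed by foo_bar("hello\n"). On Windows, where wchar_t is 16 bits, characters outside the Basic Multilingual Plane count as two units.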

C++ string to Python limitation using SWIG

I have recently run into a SWIG limitation related to the size of a C++ std::string.
I have some C++ code returning a pair<int, string>. I noticed that when the string in the pair is smaller than 2*1024*1024*1024-64 bytes (about 2 GB), the pair is returned properly and the string is mapped to a native Python string. However, if the string is larger than 2 GB, it is no longer mapped to a native Python string. You can reproduce the error by wrapping the code below with SWIG for Python.
Environment: SWIG 3.0.8, Ubuntu 16.04.3 LTS, g++ 5.4.0, Python 2.7.12
/////////// bridge.h
#include <vector>
#include <utility>
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
pair<int, string> large_string(long sz);
long size_pstring(pair<int,string>& p);
void print_pstring(pair<int,string>& p);
string save_pstring(pair<int,string>& p);
//////////bridge.cc
#include "bridge.h"
pair<int, string> large_string(long sz){
    pair<int, string> pis;
    pis.first=20;
    pis.second=string(sz,'A');
    return pis;
}
long size_pstring(pair<int,string>& p){
    return p.second.size();
}
void print_pstring(pair<int,string>& p){
    cout<<"PSTRING: first="<<p.first<<" second.SZ="<<p.second.size()<<"\n";
    cout<<"First 100 chars: \n"<<p.second.substr(0,100)<<"\n";
}
string save_pstring(pair<int,string>& p){
    string fname="aloe.txt";
    std::ofstream ofile(fname.c_str());
    ofile<<p.second;
    ofile.close();
    return fname;
}
////////// bridge.i
%module graphdb
%include stl.i
%include "std_vector.i"
%{
#include "bridge.h"
%}
%include "bridge.h"
namespace std {
    %template(p_string) pair<int,string>;
};
//////// makefile
all:
	swig -c++ -python bridge.i
	g++ -std=c++11 -fpic -c bridge.cc bridge_wrap.cxx -I/usr/include/python2.7/
	g++ -shared *.o -o _graphdb.so
Below is a Python session showing that this is probably just a matter of how the string is mapped, and that the SWIG bridge code most likely uses an int rather than a long to represent the string size.
>>> s=graphdb.large_string(12)
>>> print s
(20, 'AAAAAAAAAAAA')
>>> s=graphdb.large_string(2*1024*1024*1024)
>>> print s
(20, <Swig Object of type 'char *' at 0x7fd4205a6090>)
>>> l=graphdb.size_pstring(s)
>>> print l
2147483648
>>> fname = graphdb.save_pstring(s)
Saving the string to a file works correctly, and I can then load the file into a Python string without problems.
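The int-sized-length hypothesis can be checked with a little arithmetic: a 32-bit signed int tops out at 2**31 - 1 = 2147483647, so the 2147483648-byte string reported by size_pstring is exactly one byte too large, while the working size 2*1024*1024*1024-64 fits. The sketch below only illustrates that boundary (ctypes integer types do no overflow checking, which shows the wraparound); it is not SWIG's actual wrapper code.

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value of a 32-bit signed C int


def fits_in_int32(n):
    """True if a length n can be represented by a 32-bit signed int."""
    return 0 <= n <= INT_MAX


GB2 = 2 * 1024 * 1024 * 1024

if __name__ == "__main__":
    print(fits_in_int32(GB2 - 64))    # True: the size that still maps to str
    print(fits_in_int32(GB2))         # False: the size that comes back as char *
    print(ctypes.c_int32(GB2).value)  # -2147483648: the length wraps negative
```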
So my question: does anybody know which SWIG configuration option I should change so that large strings are properly mapped to native Python strings?
--Thx

Passing input to a program with Python

If I have source code like this:
#include <stdio.h> //test
int main(void)
{
    int tmp;
    scanf("%d", &tmp);
    printf("%d\n", tmp);
}
I know I can pass input with Python like this: (python -c 'print "1234"';cat) | ./test
But I have a problem with another case: for example, when the program reads an integer with scanf and then a string with read.
#include <stdio.h> //test
#include <unistd.h>
int main(void)
{
    int tmp1=0;
    char tmp2[100]={0};
    scanf("%d", &tmp1);
    read(0, tmp2, 100);
    printf("%d %s\n", tmp1, tmp2);
}
I tried this:
(python -c 'print "134\n"+"Hello World\n"';cat) | ./test
I expected the result to be 134 Hello World, but it was just 134, so I can't input the string this way.
I can't find another way to solve this problem. Is there any method that works?
I'm using x64 Ubuntu 16.04 LTS, and the compile options were -o test -m32 test.c.
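One likely explanation (the thread gives no answer, so treat this as an assumption): when stdin is a pipe, stdio buffers it fully, and the first read() issued on behalf of scanf can pull both "134" and "Hello World" into scanf's buffer, leaving nothing in the pipe for the raw read(0, tmp2, 100), which then waits on cat. A common workaround is to deliver the two pieces in separate writes with a pause between them, e.g. (python -c 'print "134"'; sleep 1; python -c 'print "Hello World"'; cat) | ./test. The sketch below does the same from Python; feed_in_chunks and TWO_READS are hypothetical names.

```python
import subprocess
import sys
import time


def feed_in_chunks(cmd, chunks, delay=1.0):
    """Start cmd and write each chunk to its stdin separately.

    Sleeping between writes gives the child time to consume one chunk
    before the next arrives, so a buffered reader (such as scanf) cannot
    swallow data meant for a later raw read().
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    for i, chunk in enumerate(chunks):
        proc.stdin.write(chunk)
        proc.stdin.flush()
        if i < len(chunks) - 1:
            time.sleep(delay)
    out, _ = proc.communicate()  # closes stdin, collects stdout, waits
    return out


# Hypothetical stand-in for ./test: two raw reads on fd 0, joined with "|".
TWO_READS = [sys.executable, "-c",
             "import os; a = os.read(0, 100); b = os.read(0, 100); "
             "os.write(1, a + b'|' + b)"]
```

Against the real program this would be feed_in_chunks(["./test"], [b"134\n", b"Hello World\n"]). The approach relies on timing, so the delay may need to be longer on a slow or heavily loaded machine.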

Creating a DLL from a wrapped cpp file with SWIG

I am in the process of learning how to use SWIG on Windows.
The following is my C++ code:
/* File : example.cxx */
#include "example.h"
#define M_PI 3.14159265358979323846
/* Move the shape to a new location */
void Shape::move(double dx, double dy) {
    x += dx;
    y += dy;
}
int Shape::nshapes = 0;
double Circle::area(void) {
    return M_PI*radius*radius;
}
double Circle::perimeter(void) {
    return 2*M_PI*radius;
}
double Square::area(void) {
    return width*width;
}
double Square::perimeter(void) {
    return 4*width;
}
This is my header file:
/* File : example.h */
class Shape {
public:
    Shape() {
        nshapes++;
    }
    virtual ~Shape() {
        nshapes--;
    };
    double x, y;
    void move(double dx, double dy);
    virtual double area(void) = 0;
    virtual double perimeter(void) = 0;
    static int nshapes;
};
class Circle : public Shape {
private:
    double radius;
public:
    Circle(double r) : radius(r) { };
    virtual double area(void);
    virtual double perimeter(void);
};
class Square : public Shape {
private:
    double width;
public:
    Square(double w) : width(w) { };
    virtual double area(void);
    virtual double perimeter(void);
};
This is my interface file:
/* File : example.i */
%module example
%{
#include "example.h"
%}
%include "example.h"
I have managed to wrap my C++ code using SWIG in Cygwin with the following command:
$ swig -c++ -python -o example_wrap.cpp example.i
My question is: how do I create a DLL from the generated code (example_wrap.cpp)? Any ideas?
I tried creating a DLL with Visual Studio 2010 (Visual C++), but I get this build error:
LINK : fatal error LNK1104: cannot open file 'python27_d.lib'
I'm fairly new to using SWIG so any help would be greatly appreciated.
Thanks!
Add the MS_NO_COREDLL definition under Configuration Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions,
or add a #define MS_NO_COREDLL line before including Python.h:
#define MS_NO_COREDLL
#include <Python.h>
If you look in the libs directory of your Python installation, I suspect you will find a python27.lib and not a python27_d.lib. I believe that the _d.lib is the debug version of the Python library and your Python installation didn't include it. Elsewhere I've seen it suggested that the simplest way around this is to download the Python sources and build the release and debug versions yourself, but I've never tried this. Alternatively, change your build to use the release version of the Python .lib. You should then be able to debug your own code, but not the Python code.
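To check which import libraries your installation actually ships, you can list the libs directory from Python itself. This sketch assumes the standard Windows installer layout (<prefix>\libs\python27.lib); on other platforms, or when the directory is missing, it simply returns an empty list. The helper names are hypothetical.

```python
import os
import sys


def python_import_libs():
    """Return the .lib import libraries found in <Python prefix>/libs."""
    # base_prefix points at the real installation even inside a venv
    # (Python 3.3+); fall back to sys.prefix on older interpreters.
    prefix = getattr(sys, "base_prefix", sys.prefix)
    libs_dir = os.path.join(prefix, "libs")
    if not os.path.isdir(libs_dir):
        return []
    return sorted(n for n in os.listdir(libs_dir) if n.lower().endswith(".lib"))


def has_debug_import_lib():
    """True if a debug import library such as python27_d.lib is present."""
    return any(n.lower().endswith("_d.lib") for n in python_import_libs())
```

If has_debug_import_lib() returns False, a Debug build that pulls in python27_d.lib through pyconfig.h will hit the LNK1104 error above.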
The problem seems to be that, for unknown reasons, the file pyconfig.h FORCES the use of a specifically named .lib file. OUCH! Frankly, this looks like a bug to me - let the programmer specify what .lib file to use! Don't force it!
In the code below, you could simply disable the entire block with #if 0 ... #endif, or rename "python27_d" to "python".
Anyway, here is the offending code from pyconfig.h:
/* For an MSVC DLL, we can nominate the .lib files used by extensions
*/
#ifdef MS_COREDLL
# ifndef Py_BUILD_CORE /* not building the core - must be an ext */
# if defined(_MSC_VER) /* So MSVC users need not specify the .lib file in their Makefile (other compilers are generally taken care of by distutils.) */
# ifdef _DEBUG
# pragma comment(lib,"python27_d.lib")
# else
# pragma comment(lib,"python27.lib")
# endif /* _DEBUG */
# endif /* _MSC_VER */
# endif /* Py_BUILD_CORE */
#endif /* MS_COREDLL */
SWIG (at least as of v3.0) generates the Python.h inclusion in the wrapper as follows:
#if defined(_DEBUG) && defined(SWIG_PYTHON_INTERPRETER_NO_DEBUG)
/* Use debug wrappers with the Python release dll */
# undef _DEBUG
# include <Python.h>
# define _DEBUG
#else
# include <Python.h>
#endif
So when compiling a debug version of the wrapper on a Windows platform, we simply need to define the SWIG_PYTHON_INTERPRETER_NO_DEBUG flag to avoid the pyconfig.h file issue mentioned in Ken's answer.
Building the project in Release mode removes the python27_d.lib dependency too; at least it did for my own project.
I found that adding the Python symbols to the project solves it.
I also copied the python27.lib to a file named python27_d.lib
You can try adding "python27_d.lib" (without quotes) to ignored libs:
Configuration Properties -> Linker -> Input -> Ignore Specific Library
I resolved the missing python27_d.lib error by doing the following:
Copy python27.lib to python27_d.lib
In pyconfig.h, comment out #define Py_DEBUG
