Memory Leak in Ctypes method - python

I have a project mostly written in Python. It runs on my Raspberry Pi (Model B). Using the Pi Camera, I record to a stream. Every second I pause the recording, take the last frame from the stream, and compare it with an older frame. The comparison is done in C code (mainly because it is faster than Python).
The C code is called from Python using Ctypes. See the code below.
# Load picturecomparer.so and set argument and return types
cmethod = ctypes.CDLL(Paths.CMODULE_LOCATION)
cmethod.compare_pictures.restype = ctypes.c_double
cmethod.compare_pictures.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
The two images to compare are stored on disk. Python passes the paths of both images as arguments to the C code, which returns a double: the difference between the two images as a percentage.
# Call the C method to compare the images
difflevel = cmethod.compare_pictures(path1, path2)
The C code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#ifndef STB_IMAGE_IMPLEMENTATION
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#ifndef STBI_ASSERT
#define STBI_ASSERT(x)
#endif
#endif
#define COLOR_R 0
#define COLOR_G 1
#define COLOR_B 2
#define OFFSET 10
double compare_pictures(const char* path1, const char* path2);
double compare_pictures(const char* path1, const char* path2)
{
double totalDiff = 0.0, value;
unsigned int x, y;
int width1, height1, comps1;
unsigned char * image1 = stbi_load(path1, &width1, &height1, &comps1, 0);
int width2, height2, comps2;
unsigned char * image2 = stbi_load(path2, &width2, &height2, &comps2, 0);
// Perform some checks to be sure images are valid
if (image1 == NULL | image2 == NULL) { return 0; }
if (width1 != width2 | height1 != height2) { return 0; }
for (y = 0; y < height1; y++)
{
for (x = 0; x < width1; x++)
{
// Calculate difference in RED
value = (int)image1[(x + y*width1) * comps1 + COLOR_R] - (int)image2[(x + y*width2) * comps2 + COLOR_R];
if (value < OFFSET && value > (OFFSET * -1)) { value = 0; }
totalDiff += fabs(value) / 255.0;
// Calculate difference in GREEN
value = (int)image1[(x + y*width1) * comps1 + COLOR_G] - (int)image2[(x + y*width2) * comps2 + COLOR_G];
if (value < OFFSET && value >(OFFSET * -1)) { value = 0; }
totalDiff += fabs(value) / 255.0;
// Calculate difference in BLUE
value = (int)image1[(x + y*width1) * comps1 + COLOR_B] - (int)image2[(x + y*width2) * comps2 + COLOR_B];
if (value < OFFSET && value >(OFFSET * -1)) { value = 0; }
totalDiff += fabs(value) / 255.0;
}
}
totalDiff = 100.0 * totalDiff / (double)(width1 * height1 * 3);
return totalDiff;
}
The C code is executed every ~2 seconds. I just noticed that there is a memory leak. After around 10 to 15 minutes my Raspberry Pi has about 10 MB of RAM left. A few minutes later it crashes and no longer responds.
I have done some checks to find out what causes this in my project. My entire project uses around 30-40 MB of RAM if I disable the C code. This project is the only thing my Raspberry Pi has to run.
Model B: 512 MB of RAM, shared between CPU and GPU.
GPU: 128 MB (/boot/config.txt).
My Linux distro uses: ~60 MB.
So I have ~300 MB for my project.
I hope someone can point out where it goes wrong, or whether I have to call the GC myself, etc.
Thanks in advance.
P.S. I know this way of comparing images is not the best, but it works for me for now.

Since the images are returned as pointers to buffers, stbi_load must be allocating space for them. You never release that space before returning, so the memory leak is not surprising.
Check the documentation for a dedicated free function, or try adding free(image1); free(image2); before the final return.
Having checked stb_image.h, I can say categorically that you should release both buffers before returning. The documented call is stbi_image_free(image1); stbi_image_free(image2);, which uses the STBI_FREE macro internally (plain free by default). Note that the two early return 0; paths also leak whichever image was loaded successfully, so free the buffers there as well.
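The same ownership rule applies to any buffer handed out by C code, including one obtained through ctypes. As a minimal sketch, assuming a POSIX system where ctypes.CDLL(None) exposes the C library already loaded into the Python process, the following pairs every malloc with a free in a try/finally block. The helper name checksum_with_c_buffer is made up for illustration and is not part of the project above.

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to the C library of this process.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p      # without this, 64-bit pointers get truncated
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def checksum_with_c_buffer(data: bytes) -> int:
    """Copy data into a malloc'd buffer, read it back, and always free the buffer."""
    buf = libc.malloc(len(data) or 1)
    if not buf:
        raise MemoryError("malloc failed")
    try:
        ctypes.memmove(buf, data, len(data))
        return sum(ctypes.string_at(buf, len(data)))
    finally:
        libc.free(buf)  # the step the leaking C function never performs


print(checksum_with_c_buffer(b"\x01\x02\x03"))  # 6
```

The try/finally plays the role that freeing on every return path plays in the C function: the buffer is released on the normal path and on the error path alike.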

How to copy a 2D array (matrix) from Python with a C function (doing some compute-heavy calculation) that returns a 2D array (matrix) to Python?

I want to copy a 2D numpy array (matrix) in a C function and get it back in Python (and later do some calculations on it in C, to take advantage of C's speed). Therefore I need the C function matrix_copy to return a 2D array (or, I guess, a pointer to it). I tried the following code, but I get the output below, where the second dimension of the array is lost.
matrix_in.shape:
(300, 200)
matrix_out.shape:
(300,)
How could I change the code (I guess by adding some pointer magic to matrix_copy.c) so that matrix_out is an exact copy of matrix_in?
Here is the main.py script:
from ctypes import c_void_p, c_double, c_int, cdll
from numpy.ctypeslib import ndpointer
import numpy as np
import pdb
n = 300
m = 200
matrix_in = np.random.randn(n, m)
lib = cdll.LoadLibrary("matrix_copy.so")
matrix_copy = lib.matrix_copy
matrix_copy.restype = ndpointer(dtype=c_double,
shape=(n,))
matrix_out = matrix_copy(c_void_p(matrix_in.ctypes.data),
c_int(n),
c_int(m))
print("matrix_in.shape:")
print(matrix_in.shape)
print("matrix_out.shape:")
print(matrix_out.shape)
Here is the matrix_copy.c script:
#include <stdlib.h>
#include <stdio.h>
double * matrix_copy(const double * matrix_in, int n, int m){
double * matrix_out = (double *)malloc(sizeof(double) * (n*m));
int index = 0;
for(int i=0; i< n; i++){
for(int j=0; j<m; j++){
matrix_out[i+j] = matrix_in[i+j];
//matrix_out[i][j] = matrix_in[i][j];
// some heavy computations not yet implemented
}
}
return matrix_out;
}
which I compile with the command
cc -fPIC -shared -o matrix_copy.so matrix_copy.c
And as a side note, why does the notation matrix_out[i][j] = matrix_in[i][j]; throw an error on compilation?
matrix_copy.c:10:26: error: subscripted value is not an array, pointer, or vector
matrix_out[i][j] = matrix_in[i][j];
~~~~~~~~~~~~~^~
matrix_copy.c:10:44: error: subscripted value is not an array, pointer, or vector
matrix_out[i][j] = matrix_in[i][j];
The second dimension is 'lost' because you explicitly omit it in the named shape argument of ndpointer. Change:
matrix_copy.restype = ndpointer(dtype=c_double, shape=(n,))
to
matrix_copy.restype = ndpointer(dtype=c_double, shape=(n,m), flags='C')
Where flags='C' additionally notes that the returned data is stored contiguously in row major order.
With regards to matrix_out[i][j] = matrix_in[i][j]; throwing an error, consider that matrix_in is of type const double *. matrix_in[i] would yield a value of type const double - how do you further index this value (i.e., with [j])?
If you want to emulate accessing a 2-dimensional array via a single pointer, you must calculate offsets manually. matrix_out[i+j] is not sufficient, as you must account for the span of each sub array:
matrix_out[i * m + j] = matrix_in[i * m + j];
Note that in C, size_t is the generally preferred type to use when dealing with memory sizes or array lengths.
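The offset arithmetic can be checked without compiling anything. The sketch below uses plain Python lists as a stand-in for the C buffers (the helper flat_index is a name chosen for this example) and shows that i * m + j gives every element its own slot, while i + j makes several elements collide.

```python
def flat_index(i, j, m):
    """Offset of element (i, j) in a row-major block whose rows hold m values."""
    return i * m + j


n, m = 3, 4
matrix = [[10 * i + j for j in range(m)] for i in range(n)]

# Flatten row by row, the layout a C-contiguous n x m array uses in memory.
flat = [value for row in matrix for value in row]

# Every (i, j) pair maps to its own slot with i * m + j ...
assert all(flat[flat_index(i, j, m)] == matrix[i][j]
           for i in range(n) for j in range(m))

# ... whereas i + j sends several pairs to the same slot.
distinct_correct = {flat_index(i, j, m) for i in range(n) for j in range(m)}
distinct_wrong = {i + j for i in range(n) for j in range(m)}
print(len(distinct_correct), len(distinct_wrong))  # 12 6
```

With i + j only the first n + m - 1 slots of the output are ever written, which is why the original copy loop cannot reproduce the matrix.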
matrix_copy.c, simplified:
#include <stdlib.h>
double *matrix_copy(const double *matrix_in, size_t n, size_t m)
{
double *matrix_out = malloc(sizeof *matrix_out * n * m);
for (size_t i = 0; i < n; i++)
for (size_t j = 0; j < m; j++)
matrix_out[i * m + j] = matrix_in[i * m + j];
return matrix_out;
}
matrix.py, with more explicit typing:
from ctypes import c_void_p, c_double, c_size_t, cdll, POINTER
from numpy.ctypeslib import ndpointer
import numpy as np
c_double_p = POINTER(c_double)
n = 300
m = 200
matrix_in = np.random.randn(n, m).astype(c_double)
lib = cdll.LoadLibrary("matrix_copy.so")
matrix_copy = lib.matrix_copy
matrix_copy.argtypes = c_double_p, c_size_t, c_size_t
matrix_copy.restype = ndpointer(
dtype=c_double,
shape=(n,m),
flags='C')
matrix_out = matrix_copy(
matrix_in.ctypes.data_as(c_double_p),
c_size_t(n),
c_size_t(m))
print("matrix_in.shape:", matrix_in.shape)
print("matrix_out.shape:", matrix_out.shape)
print("in == out", matrix_in == matrix_out)
The incoming data is probably a single block of memory. You need to create the substructure yourself.
In my C++ code I have to do the following on data (block) coming in via swig:
void divide2DDoubleArray(double * &block, double ** &subblockdividers, int noofsubblocks, int subblocksize){
/* The starting address of a block of doubles is used to generate
* pointers to subblocks.
*
* block: memory containing the original block of data
* subblockdividers: array of subblock addresses
* noofsubblocks: specify the number of subblocks produced
* subblocksize: specify the size of the subblocks produced
*
* Design by contract: application should make sure the memory
* in block is allocated and initialized properly.
*/
// Build 2D matrix for cols
subblockdividers=new double *[noofsubblocks];
subblockdividers[0]= block;
for (int i=1; i<noofsubblocks; ++i) {
subblockdividers[i] = &subblockdividers[i-1][subblocksize];
}
}
Now the pointers stored in subblockdividers can be used the way you would like, for example as subblockdividers[i][j].
Don't forget to release subblockdividers with delete[] when you're done. (Note: adjustments are needed to compile this as C code, since it uses references and new.)
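For readers following along from the Python side, the same idea can be sketched with ctypes alone: one contiguous block of doubles plus an array of row pointers into it. The sizes and variable names here are illustrative and not taken from the code above.

```python
import ctypes

rows, cols = 3, 4

# One contiguous block of doubles, filled with 0.0 .. 11.0.
block = (ctypes.c_double * (rows * cols))(*range(rows * cols))

RowPtr = ctypes.POINTER(ctypes.c_double)
row_bytes = cols * ctypes.sizeof(ctypes.c_double)

# One pointer per row, each aimed at the start of that row inside the block.
dividers = (RowPtr * rows)()
for i in range(rows):
    dividers[i] = ctypes.cast(ctypes.addressof(block) + i * row_bytes, RowPtr)

print(dividers[1][2])  # 6.0

# The pointers alias the block instead of copying it.
dividers[2][0] = -1.0
print(block[8])  # -1.0
```

Because the row pointers only alias the block, the block must stay alive for as long as the pointers are used.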

Clipping a binary number to required length C/C++

I have written a short function to convert an input decimal number to a binary output. However, at a much higher level of the code, the end user should toggle an option as to whether or not they desire a 5B or 10B value. For the sake of some other low level maths, I have to clip the data here.
So I need some help figuring out how to clip the output to a desired length and stuff the required number of leading zeros.
The incomplete C code:
long dec2bin(int x_dec,int res)
{
long x_bin = 0;
int x_bin_len;
int x_rem, i = 1;
while (x_dec != 0)
{
x_rem = x_dec % 2;
x_dec /= 2;
x_bin += x_rem * i;
i *= 10;
}
return x_bin;
}
I had completed a working proof of concept using python. The end application however, requires I write this in C.
The working python script:
def dec2bin(x_dec,x_res):
    x_bin = bin(x_dec)[2:] #Convert to Binary (Remove 0b Prefix)
    x_len = len(x_bin)
    if x_len < x_res: #If Smaller than desired resolution
        x_bin = '0' * (x_res-x_len) + x_bin #Stuff with leading 0s
    if x_len > x_res: #If larger than desired resolution
        x_bin = x_bin[x_len-x_res:x_len] #Display desired no. LSBs
    return x_bin
I'm sure this has been done before. Indeed, my Python script shows it should be relatively straightforward, but I'm not as experienced with C.
Any help is greatly appreciated.
Mark.
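As @yano suggested, I think you have to return an ASCII string to the caller rather than a long. Below is the short function I wrote for my own purposes; it works for any base up to 65. It always returns exactly ndigits characters, zero-padded on the left and clipped to the least significant digits. It assumes i is non-negative and ndigits is below 999, and it returns a pointer to a static buffer, so copy the result before calling it again.
#include <string.h> /* memset */
char *itoa ( int i, int base, int ndigits ) {
    static char a[999], digits[99] = /* up to base 65 */
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$*";
    int n=ndigits;
    memset(a,'0',ndigits); a[ndigits]='\000'; /* pre-fill with leading zeros */
    while ( --n >= 0) { /* write digits from least significant to most */
        a[n] = digits[i%base];
        if ( (i/=base) < 1 ) break; }
    return ( a );
} /* --- end-of-function itoa() --- */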
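For base 2 specifically, the clipping step amounts to a bit mask, which carries over to C directly (x & ((1 << res) - 1)). A minimal Python sketch of that idea, reusing the dec2bin name and parameters from the question and assuming a non-negative input:

```python
def dec2bin(x_dec, x_res):
    """Return the x_res least significant bits of x_dec, zero-padded on the left."""
    mask = (1 << x_res) - 1          # x_res ones, e.g. 0b11111 for x_res = 5
    clipped = x_dec & mask           # drops every bit above the resolution
    return format(clipped, "0{}b".format(x_res))


print(dec2bin(5, 5))      # 00101
print(dec2bin(37, 5))     # 00101
print(dec2bin(37, 10))    # 0000100101
```

Masking first means the padding and the clipping no longer need separate branches: the format width handles short values and the mask handles long ones.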

Same random numbers in C++ as computed by Python3 numpy.random.rand

I would like to duplicate in C++ the testing for some code that has already been implemented in Python3 which relies on numpy.random.rand and randn values and a specific seed (e.g., seed = 1).
I understand that Python's random implementation is based on a Mersenne twister. The C++ standard library also supplies this in std::mersenne_twister_engine.
The C++ version returns an unsigned int, whereas Python rand is a floating point value.
Is there a way to obtain the same values in C++ as are generated in Python, and be sure that they are the same? And the same for an array generated by randn ?
You can do it this way for integer values:
import numpy as np
np.random.seed(12345)
print(np.random.randint(256**4, dtype='<u4', size=1)[0])
#include <iostream>
#include <random>
int main()
{
std::mt19937 e2(12345);
std::cout << e2() << std::endl;
}
The result of both snippets is 3992670690
By looking at source code of rand you can implement it in your C++ code this way:
import numpy as np
np.random.seed(12345)
print(np.random.rand())
#include <iostream>
#include <iomanip>
#include <random>
int main()
{
std::mt19937 e2(12345);
int a = e2() >> 5;
int b = e2() >> 6;
double value = (a * 67108864.0 + b) / 9007199254740992.0;
std::cout << std::fixed << std::setprecision(16) << value << std::endl;
}
Both random values are 0.9296160928171479
It would be convenient to use std::generate_canonical, but it uses another method to convert the output of Mersenne twister to double. The reason they differ is likely that generate_canonical is more optimized than the random generator used in NumPy, as it avoids costly floating point operations, especially multiplication and division, as seen in source code. However it seems to be implementation dependent, while NumPy produces the same result on all platforms.
double value = std::generate_canonical<double, std::numeric_limits<double>::digits>(e2);
This doesn't work and produces result 0.8901547132827379, which differs from the output of Python code.
For completeness and to avoid re-inventing the wheel, here is an implementation of both numpy.random.rand and numpy.random.randn in C++.
The header file:
#ifndef RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#define RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#include <cstdint>
#include <random>
#include <string>
#include "RandomNumGenerator.h"
//Uniform distribution - numpy.rand
class RandomNumGen_NumpyCompatible {
public:
RandomNumGen_NumpyCompatible();
RandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);
std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
void seed(std::uint_fast32_t seed);
void discard(unsigned long long); // NOTE!! Advances and discards twice as many values as passed in to keep tracking with Numpy order
uint_fast32_t operator()(); //Simply returns the next Mersenne value from the engine
double getDouble(); //Calculates the next uniformly random double as numpy.rand does
std::string getGeneratorType() const { return "RandomNumGen_NumpyCompatible"; }
private:
std::mt19937 m_mersenneEngine;
};
///////////////////
//Gaussian distribution - numpy.randn
class GaussianRandomNumGen_NumpyCompatible {
public:
GaussianRandomNumGen_NumpyCompatible();
GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);
std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
void seed(std::uint_fast32_t seed);
void discard(unsigned long long); // NOTE!! Advances and discards twice as many values as passed in to keep tracking with Numpy order
uint_fast32_t operator()(); //Simply returns the next Mersenne value from the engine
double getDouble(); //Calculates the next normally (Gaussian) distributed random double as numpy.randn does
std::string getGeneratorType() const { return "GaussianRandomNumGen_NumpyCompatible"; }
private:
bool m_haveNextVal;
double m_nextVal;
std::mt19937 m_mersenneEngine;
};
#endif
And the implementation:
#include <cmath>
#include "RandomNumGen_NumpyCompatible.h"
RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible()
{
}
RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_mersenneEngine(seed)
{
}
void RandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
m_mersenneEngine.seed(newSeed);
}
void RandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
//Advances and discards TWICE as many values to keep with Numpy order
m_mersenneEngine.discard(2*z);
}
std::uint_fast32_t RandomNumGen_NumpyCompatible::operator()()
{
return m_mersenneEngine();
}
double RandomNumGen_NumpyCompatible::getDouble()
{
int a = m_mersenneEngine() >> 5;
int b = m_mersenneEngine() >> 6;
return (a * 67108864.0 + b) / 9007199254740992.0;
}
///////////////////
GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible()
: m_haveNextVal(false)
{
}
GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_haveNextVal(false), m_mersenneEngine(seed)
{
}
void GaussianRandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
m_mersenneEngine.seed(newSeed);
}
void GaussianRandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
//Burn some CPU cycles here
for (unsigned i = 0; i < z; ++i)
getDouble();
}
std::uint_fast32_t GaussianRandomNumGen_NumpyCompatible::operator()()
{
return m_mersenneEngine();
}
double GaussianRandomNumGen_NumpyCompatible::getDouble()
{
if (m_haveNextVal) {
m_haveNextVal = false;
return m_nextVal;
}
double f, x1, x2, r2;
do {
int a1 = m_mersenneEngine() >> 5;
int b1 = m_mersenneEngine() >> 6;
int a2 = m_mersenneEngine() >> 5;
int b2 = m_mersenneEngine() >> 6;
x1 = 2.0 * ((a1 * 67108864.0 + b1) / 9007199254740992.0) - 1.0;
x2 = 2.0 * ((a2 * 67108864.0 + b2) / 9007199254740992.0) - 1.0;
r2 = x1 * x1 + x2 * x2;
} while (r2 >= 1.0 || r2 == 0.0);
/* Polar form of the Box-Muller transform (Marsaglia polar method) */
f = sqrt(-2.0 * log(r2) / r2);
m_haveNextVal = true;
m_nextVal = f * x1;
return f * x2;
}
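The control flow of getDouble() above (rejection loop, scale factor, cached spare value) can be sketched compactly in Python. This version draws its uniforms from Python's own random module, so it illustrates the method only and will not reproduce NumPy's numbers; the class name PolarGaussian is invented for the example.

```python
import math
import random


class PolarGaussian:
    """Polar method with the spare value cached, mirroring getDouble() above."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._have_next = False
        self._next = 0.0

    def get_double(self):
        if self._have_next:
            self._have_next = False
            return self._next
        while True:
            # Draw a point in the square [-1, 1) x [-1, 1) ...
            x1 = 2.0 * self._rng.random() - 1.0
            x2 = 2.0 * self._rng.random() - 1.0
            r2 = x1 * x1 + x2 * x2
            # ... and keep it only if it lies inside the unit circle (not at 0).
            if 0.0 < r2 < 1.0:
                break
        f = math.sqrt(-2.0 * math.log(r2) / r2)
        self._have_next = True
        self._next = f * x1       # saved for the next call
        return f * x2             # returned first, as in the C++ code


gen = PolarGaussian(seed=1)
samples = [gen.get_double() for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

Each accepted point yields two normal values, which is why the Gaussian generator's discard() has to call getDouble() rather than skip a fixed number of engine outputs.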
After doing a bit of testing, it does seem that the values are within a tolerance (see @fdermishin's comment below) when the C++ unsigned int is divided by the maximum value of an unsigned int, like this:
#include <limits>
...
std::mt19937 generator1(seed); // mt19937 is a standard mersenne_twister_engine
unsigned val1 = generator1();
std::cout << "Gen 1 random value: " << val1 << std::endl;
std::cout << "Normalized Gen 1: " << static_cast<double>(val1) / std::numeric_limits<std::uint32_t>::max() << std::endl;
However, Python's version seems to skip every other value.
Given the following two programs:
#!/usr/bin/env python3
import sys
import numpy as np
def main():
    np.random.seed(1)
    for i in range(0, 10):
        print(np.random.rand())
###########
# Call main and exit success
if __name__ == "__main__":
    main()
    sys.exit()
and
#include <cstdlib>
#include <iostream>
#include <random>
#include <limits>
int main()
{
unsigned seed = 1;
std::mt19937 generator1(seed); // mt19937 is a standard mersenne_twister_engine
for (unsigned i = 0; i < 10; ++i) {
unsigned val1 = generator1();
std::cout << "Normalized, #" << i << ": " << (static_cast<double>(val1) / std::numeric_limits<std::uint32_t>::max()) << std::endl;
}
return EXIT_SUCCESS;
}
the Python program prints:
0.417022004702574
0.7203244934421581
0.00011437481734488664
0.30233257263183977
0.14675589081711304
0.0923385947687978
0.1862602113776709
0.34556072704304774
0.39676747423066994
0.538816734003357
whereas the C++ program prints:
Normalized, #0: 0.417022
Normalized, #1: 0.997185
Normalized, #2: 0.720324
Normalized, #3: 0.932557
Normalized, #4: 0.000114381
Normalized, #5: 0.128124
Normalized, #6: 0.302333
Normalized, #7: 0.999041
Normalized, #8: 0.146756
Normalized, #9: 0.236089
I could easily skip every other value in the C++ version, which should give me numbers that match the Python version (within a tolerance). But why would Python's implementation seem to skip every other value, or where do these extra values in the C++ version come from?
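The "skipped" values are not dropped: each double is built from two consecutive 32-bit outputs, as in the a/b recipe shown earlier, so a C++ loop that prints one value per engine output lines up with Python only at every other position. CPython's standard random module uses the same 53-bit construction, which makes the recipe checkable without NumPy. Its seeding differs from NumPy's, so the numbers it produces will not match np.random.rand; the sketch only verifies the two-outputs-per-double arithmetic.

```python
import random

random.seed(1)
a = random.getrandbits(32) >> 5   # keep the top 27 bits of the first output
b = random.getrandbits(32) >> 6   # keep the top 26 bits of the second output
manual = (a * 67108864.0 + b) / 9007199254740992.0   # (a * 2**26 + b) / 2**53

random.seed(1)
builtin = random.random()         # consumes the same two 32-bit outputs

print(manual == builtin)  # True
```

Dividing a single 32-bit output by the maximum value, as in the C++ test program, only approximates the leading digits because it ignores the 26 bits contributed by the second output.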

Passing an array using Ctypes

So my python program is
from ctypes import *
import ctypes
number = [0,1,2]
testlib = cdll.LoadLibrary("./a.out")
testlib.init.argtypes = [ctypes.c_int]
testlib.init.restype = ctypes.c_double
#create an array of size 3
testlib.init(3)
#Loop to fill the array
#use AccessArray to perform an action on the array
And the C part is
#include <stdio.h>
double init(int size){
double points[size];
return points[0];
}
double fillarray(double value, double location){
// i need to access
}
double AccessArray(double value, double location){
// i need to access the array that is filled in the previous function
}
So what I need to do is pass an array from the Python part to the C function, then somehow hand that array to another C function where I will access it in order to process it.
I'm stuck, though, because I can't figure out a way to move the array around in the C part.
Can someone show me how to do this?
You should try something like this (in your C code):
#include <stdio.h>
double points[1000]; // change 1000 to the maximum size you need
int sz = 0;
double init(int size){
    // verify size <= maximum size of the array
    for(int i=0;i<size;i++) {
        points[i] = 1; // change 1 to the initial value you need
    }
    sz = size;
    return points[0];
}
double fillarray(double value, double location){
    // first verify 0 <= location < sz
    points[(int)location] = value;
    return value; // the function is declared to return a double
}
double AccessArray(double value, double location){
    // first verify 0 <= location < sz
    return points[(int)location];
}
This is a very simple solution, but if you need to allocate an array of arbitrary size you should study the use of malloc.
Maybe something like this?
$ cat Makefile
go: a.out
./c-double
a.out: c.c
gcc -fpic -shared c.c -o a.out
$ cat c.c
#include <stdio.h>
#include <stdlib.h>
double *init(int size) {
    double *points;
    points = malloc(size * sizeof(double));
    return points;
}
void fill_array(double *points, int size) {
    int i;
    for (i=0; i < size; i++) {
        points[i] = (double) i;
    }
}
void access_array(double *points, int size) {
    // print the array that was filled by the previous function
    int i;
    for (i=0; i < size; i++) {
        printf("%d: %f\n", i, points[i]);
    }
}
$ cat c-double
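#!/usr/local/cpython-3.3/bin/python
import ctypes
testlib = ctypes.cdll.LoadLibrary("./a.out")
testlib.init.argtypes = [ctypes.c_int]
testlib.init.restype = ctypes.c_void_p
# Declare the pointer parameters; otherwise the address is passed as a C int
# and can be truncated on 64-bit platforms
testlib.fill_array.argtypes = [ctypes.c_void_p, ctypes.c_int]
testlib.fill_array.restype = None
testlib.access_array.argtypes = [ctypes.c_void_p, ctypes.c_int]
testlib.access_array.restype = None
#create an array of size 3
size = 3
double_array = testlib.init(size)
#Loop to fill the array
testlib.fill_array(double_array, size)
#use AccessArray to perform an action on the array
testlib.access_array(double_array, size)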
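If the array should start life on the Python side, as the question's number = [0,1,2] suggests, ctypes can also build it directly and pass it where C expects a double *. A minimal sketch follows. No compiled library is assumed here, so a CFUNCTYPE callback named sum_points (hypothetical) stands in for the C function that would receive the pointer; with a real library you would set argtypes on the loaded function instead.

```python
import ctypes

DoublePtr = ctypes.POINTER(ctypes.c_double)

# Prototype equivalent to: double sum_points(double *points, int size);
SumFunc = ctypes.CFUNCTYPE(ctypes.c_double, DoublePtr, ctypes.c_int)


def _sum_points(points, size):
    # Stand-in for the C side: walk the pointer like a C array.
    return sum(points[i] for i in range(size))


sum_points = SumFunc(_sum_points)

number = [0, 1, 2]
# Build a C array of doubles from the Python list; Python owns this memory.
points = (ctypes.c_double * len(number))(*number)

total = sum_points(points, len(points))
print(total)  # 3.0
```

Since Python owns the array, nothing has to be freed on the C side, but the C code must not keep the pointer after the Python object has been garbage collected.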

Using memoization but still code runs forever

I am trying to solve the SPOJ problem "Cricket Tournament". I wrote the code in Python and also in C. In Python it takes about 2 seconds for the input 0.0 0/0 300, but in C it runs forever. The C code does finish for some smaller test cases, such as 19.5 0/0 1.
Code in C
#include<stdio.h>
#include<string.h>
float ans[10][120][300]={0};
float recursion(int balls, int reqRuns, int wickets);
int readScore(void);
int main()
{
int t;
scanf("%d",&t);
while(t--)
{
memset(ans,0,sizeof(ans));
float overs;
int myruns,wickets,target;
scanf("%f",&overs);
myruns=readScore();
scanf("%d",&wickets);
//printf("%d %d\n",myruns,wickets );
scanf("%d",&target);
//printf("%d %d %d\n",myruns,wickets,target);
if(myruns>=target)
{
printf("%s\n","100.00");
continue;
}
else if(wickets>=10)
{
printf("%s\n", "0.00");
continue;
}
printf("overs = %f\n", overs);
int ov = (int) overs;
int ball = (int)(overs*10)%10;
int totballs = 6*ov+ball;
//printf("%d %d\n",ov,ball );
//printf("%d %d %d\n",totballs, target- myruns,wickets );
float finalAns = recursion(totballs,target-myruns, wickets)*100;
printf("%.2f\n",finalAns);
}
return 0;
}
int readScore()
{
char ch;
int ans2=0;
ch = getchar();
//ch = getchar();
//ans = ans*10 + ch-'0';
//printf("sadasdas %d\n",ch );
while(ch!='/')
{
ch=getchar();
//printf(" ch = %d\n", ch-'0');
if(ch!='/')
ans2 = ans2*10 + ch-'0';
}
//printf("%d\n",ans );
return ans2;
}
float recursion(int balls, int reqRuns, int wickets)
{
if (reqRuns<=0)
return 1;
if (balls==120||wickets==10)
return 0;
if(ans[wickets][balls][reqRuns]!=0)
return ans[wickets][balls][reqRuns];
ans[wickets][balls][reqRuns] = (recursion(balls+1, reqRuns,wickets)+recursion(balls+1, reqRuns-1,wickets)+
recursion(balls+1, reqRuns-2,wickets)+recursion(balls+1, reqRuns-3,wickets)+
recursion(balls+1, reqRuns-4,wickets)+recursion(balls+1, reqRuns-5,wickets)+
recursion(balls+1, reqRuns-6,wickets)+recursion(balls+1, reqRuns,wickets+1)+
2*recursion(balls, reqRuns-1,wickets))/10;
return ans[wickets][balls][reqRuns];
}
Code in Python
from __future__ import division
saved = {}
t = input()
def func(f):
    if f in saved: return saved[f]
    x,y,z,n = f
    if z >= n: return 1
    if x == 120: return 0
    if y == 10: return 0
    saved[f] = (func((x+1,y+1,z,n)) + func((x+1, y,z,n)) + func((x+1,y,z+1,n)) + func((x+1, y, z+2,n)) + func((x+1, y, z+3,n)) + func((x+1, y, z+4,n)) + func((x+1, y, z+5,n))+ func((x+1, y, z+6,n))+ func((x,y,z+1,n)) + func((x,y,z+1,n))) / 10
    return saved[f]
def converter(f):
    v = f.index('.')
    x,y = int(f[:v]), int(f[-1])
    return x*6+(y)
for i in range(t):
    x,y,z = raw_input().split()
    v = y.index('/')
    q = int(y[:v])
    x,y,z = converter(x), int(y[(v+1):]), int(z)
    print '%.2f' % (100 * func((x,y,q,z)))
Your problem is that a lot of the results of the recursion are 0, so
if(ans[wickets][balls][reqRuns]!=0)
return ans[wickets][balls][reqRuns];
fails to return the cached result in many cases, hence you're recomputing many many results, while the check f in saved in Python prevents recomputation of the same values.
I changed your C code to set the initial entries of ans to contain negative numbers (if you know the floating point representation of your platform to be IEEE754, simply changing to memset(ans, 0x80, sizeof ans); will do), and replaced the condition with
if (ans[wickets][balls][reqRuns] >= 0)
and got the result immediately:
$ time ./a.out < spoj_inp.txt
overs = 0.000000
18.03
real 0m0.023s
user 0m0.020s
sys 0m0.002s
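The effect of the "empty" marker can be reproduced on a much smaller recursion. The sketch below is a toy model, not the SPOJ solution: each ball scores 0, 1, or 2 runs, and the target is chosen so that every probability is exactly 0.0. It counts the calls made when 0.0 doubles as the "not computed" marker and when a negative sentinel is used instead.

```python
def count_calls(unset, balls=10, runs=40):
    """Count calls of a toy recursion whose cache uses `unset` as the empty marker."""
    cache = [[unset] * (runs + 1) for _ in range(balls + 1)]
    calls = 0

    def prob(b, r):
        nonlocal calls
        calls += 1
        if r <= 0:
            return 1.0          # target reached
        if b == 0:
            return 0.0          # no balls left
        if cache[b][r] != unset:
            return cache[b][r]  # cached result
        cache[b][r] = (prob(b - 1, r) + prob(b - 1, r - 1) + prob(b - 1, r - 2)) / 3
        return cache[b][r]

    prob(balls, runs)
    return calls


# 40 runs cannot be scored in 10 balls at 2 runs per ball, so every result is 0.0.
print(count_calls(unset=0.0))    # 88573: a stored 0.0 looks like "not computed yet"
print(count_calls(unset=-1.0))   # 301: each state is computed once
```

With 0.0 as the marker the cache never hits and the call count grows as 3 to the power of the depth, which is the same blow-up the C program suffers for the target of 300.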
The problem is with your use of scanf. It treats a space or newline as the terminator of an input. Most likely you are pressing Enter after each input, which leaves the \n in the buffer, where it is passed on to the next read.
If you are compiling as C++ rather than strict C, you can call
cin.ignore()
after each scanf call. I tried it on your code and was able to get successful output.
Alternatively, you can call
fflush(stdin);
although calling fflush on an input stream is undefined behavior in standard C and only works on some implementations.
This might be helpful too:
scanf at stackoverflow
I guess the recursion is to blame here. The code does work for smaller targets. Get rid of the recursion if possible.
With smaller targets:
input
2
0.0 0/1 10
0.0 2/2 20
output
100.00
99.99
