Using memoization but still code runs forever - python

I am trying to solve the SPOJ problem "Cricket Tournament". I wrote the code in Python and also in C. In Python it takes about 2 seconds for the input 0.0 0/0 300, but in C it runs forever. The C code does finish for some smaller test cases, such as 19.5 0/0 1.
Code in C
#include<stdio.h>
float ans[10][120][300]={0};
float recursion(int balls, int reqRuns, int wickets);
int readScore(void);
int main()
{
    int t;
    scanf("%d",&t);
    while(t--)
    {
        memset(ans,0,sizeof(ans));
        float overs;
        int myruns,wickets,target;
        scanf("%f",&overs);
        myruns=readScore();
        scanf("%d",&wickets);
        //printf("%d %d\n",myruns,wickets );
        scanf("%d",&target);
        //printf("%d %d %d\n",myruns,wickets,target);
        if(myruns>=target)
        {
            printf("%s\n","100.00");
            continue;
        }
        else if(wickets>=10)
        {
            printf("%s\n", "0.00");
            continue;
        }
        printf("overs = %f\n", overs);
        int ov = (int) overs;
        int ball = (int)(overs*10)%10;
        int totballs = 6*ov+ball;
        //printf("%d %d\n",ov,ball );
        //printf("%d %d %d\n",totballs, target- myruns,wickets );
        float finalAns = recursion(totballs,target-myruns, wickets)*100;
        printf("%.2f\n",finalAns);
    }
    return 0;
}
int readScore()
{
    char ch;
    int ans2=0;
    ch = getchar();
    //ch = getchar();
    //ans = ans*10 + ch-'0';
    //printf("sadasdas %d\n",ch );
    while(ch!='/')
    {
        ch=getchar();
        //printf(" ch = %d\n", ch-'0');
        if(ch!='/')
            ans2 = ans2*10 + ch-'0';
    }
    //printf("%d\n",ans );
    return ans2;
}
float recursion(int balls, int reqRuns, int wickets)
{
    if (reqRuns<=0)
        return 1;
    if (balls==120||wickets==10)
        return 0;
    if(ans[wickets][balls][reqRuns]!=0)
        return ans[wickets][balls][reqRuns];
    ans[wickets][balls][reqRuns] = (recursion(balls+1, reqRuns,wickets)+recursion(balls+1, reqRuns-1,wickets)+
        recursion(balls+1, reqRuns-2,wickets)+recursion(balls+1, reqRuns-3,wickets)+
        recursion(balls+1, reqRuns-4,wickets)+recursion(balls+1, reqRuns-5,wickets)+
        recursion(balls+1, reqRuns-6,wickets)+recursion(balls+1, reqRuns,wickets+1)+
        2*recursion(balls, reqRuns-1,wickets))/10;
    return ans[wickets][balls][reqRuns];
}
Code in Python
from __future__ import division
saved = {}
t = input()
def func(f):
    if f in saved: return saved[f]
    x,y,z,n = f
    if z >= n: return 1
    if x == 120: return 0
    if y == 10: return 0
    saved[f] = (func((x+1,y+1,z,n)) + func((x+1, y,z,n)) + func((x+1,y,z+1,n)) + func((x+1, y, z+2,n)) + func((x+1, y, z+3,n)) + func((x+1, y, z+4,n)) + func((x+1, y, z+5,n))+ func((x+1, y, z+6,n))+ func((x,y,z+1,n)) + func((x,y,z+1,n))) / 10
    return saved[f]
def converter(f):
    v = f.index('.')
    x,y = int(f[:v]), int(f[-1])
    return x*6+(y)
for i in range(t):
    x,y,z = raw_input().split()
    v = y.index('/')
    q = int(y[:v])
    x,y,z = converter(x), int(y[(v+1):]), int(z)
    print '%.2f' % (100 * func((x,y,q,z)))

Your problem is that many of the results of the recursion are exactly 0, so the check
if(ans[wickets][balls][reqRuns]!=0)
    return ans[wickets][balls][reqRuns];
cannot tell a cached 0 from an entry that was never computed. Those results are recomputed over and over, whereas the check f in saved in Python prevents recomputation of the same values.
I changed your C code to initialise the entries of ans to negative numbers (if you know your platform's floating-point representation to be IEEE 754, simply changing the call to memset(ans, 0x80, sizeof ans); will do), and replaced the condition with
if (ans[wickets][balls][reqRuns] >= 0)
and got the result immediately:
$ time ./a.out < spoj_inp.txt
overs = 0.000000
18.03
real 0m0.023s
user 0m0.020s
sys 0m0.002s
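The effect is easy to reproduce in isolation. The following is a minimal Python 3 sketch on a toy recurrence (hypothetical, not the SPOJ one) in which every state evaluates to exactly 0; body_runs, win and the default sizes are names and values made up for illustration. It counts how often the recurrence body runs when the memo table uses 0 versus a negative number as the "not computed yet" marker.

```python
def body_runs(unset, balls=16, need=20):
    """Count executions of the recurrence body for a given 'empty slot' marker."""
    # Memo table laid out like the C array: one slot per (balls, need) state.
    memo = [[unset] * (need + 1) for _ in range(balls + 1)]
    runs = 0

    def win(b, r):
        nonlocal runs
        if r <= 0:
            return 1.0          # target reached
        if b == 0:
            return 0.0          # out of balls
        if memo[b][r] != unset:  # slot already filled?
            return memo[b][r]
        runs += 1
        # Toy rule: each ball scores 0 or 1 run with equal probability.
        memo[b][r] = (win(b - 1, r) + win(b - 1, r - 1)) / 2
        return memo[b][r]

    win(balls, need)
    return runs


zero_marker = body_runs(0.0)       # a cached 0.0 looks like an empty slot
negative_marker = body_runs(-1.0)  # a cached 0.0 is distinguishable
print(zero_marker, negative_marker)
```

With 0.0 as the marker the cache check never fires, because every stored value is itself 0.0, so the whole call tree is walked. With -1.0 each state is computed once.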

The problem is with your use of scanf. It treats a space or newline as the terminator of an input. Most likely you are pressing Enter after each input; scanf leaves the \n in the buffer, and it is passed on to the next read.
If you are not restricted to strict C (that is, you can compile as C++), you can call
cin.ignore()
after each scanf call. I tried it on your code and was able to get successful output.
Alternatively, you can call
fflush(stdin);
although fflush on an input stream is undefined behavior in standard C and only works on some platforms.
This might be helpful too:
scanf at stackoverflow

I guess the recursion is to blame here. The code does work for smaller targets. Get rid of the recursion if possible.
With smaller targets:
input
2
0.0 0/1 10
0.0 2/2 20
output
100.00
99.99

Related

Clipping a binary number to required length C/C++

I have written a short function to convert an input decimal number to a binary output. However, at a much higher level of the code, the end user should toggle an option as to whether or not they desire a 5B or 10B value. For the sake of some other low level maths, I have to clip the data here.
So I need some help figuring out how to clip the output to a desired length and stuff the required number of leading zeros.
The incomplete C code:
long dec2bin(int x_dec,int res)
{
    long x_bin = 0;
    int x_bin_len;
    int x_rem, i = 1;
    while (x_dec != 0)
    {
        x_rem = x_dec % 2;
        x_dec /= 2;
        x_bin += x_rem * i;
        i *= 10;
    }
    return x_bin;
}
I had completed a working proof of concept using python. The end application however, requires I write this in C.
The working python script:
def dec2bin(x_dec,x_res):
    x_bin = bin(x_dec)[2:] #Convert to Binary (Remove 0B Prefix)
    x_len = len(x_bin)
    if x_len < x_res: #If Smaller than desired resolution
        x_bin = '0' * (x_res-x_len) + x_bin #Stuff with leading 0s
    if x_len > x_res: #If larger than desired resolution
        x_bin = x_bin[x_len-x_res:x_len] #Display desired no. LSBs
    return x_bin
I'm sure this has been done before. Indeed, my Python script proves it should be relatively straightforward, but I'm not as experienced with C.
Any help is greatly appreciated.
Mark.
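As @yano suggested, I think you have to return an ASCII string to the caller, rather than a long. Below is the short function I wrote for my own purposes, for any base up to 65. It keeps only the ndigits least significant digits and pads with leading zeros.
#include <string.h> /* for memset */
char *itoa ( int i, int base, int ndigits ) {
    static char a[999], digits[99] = /* up to base 65 */
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$*";
    int n=ndigits;
    memset(a,'0',ndigits); a[ndigits]='\000'; /* pre-fill with leading zeros */
    while ( --n >= 0) { /* fill from the right, at most ndigits digits */
        a[n] = digits[i%base];
        if ( (i/=base) < 1 ) break; }
    return ( a ); /* points to a static buffer, overwritten by the next call */
} /* --- end-of-function itoa() --- */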
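For comparison, the clip-and-pad step can also be written with a bit mask rather than string slicing. This is a minimal Python 3 sketch, assuming a non-negative input and a width of at least 1; clip_to_bits is a hypothetical name, not something from the question. The mask keeps the x_res least significant bits and format supplies the leading zeros, and the same mask expression carries over to C unchanged.

```python
def clip_to_bits(x_dec, x_res):
    """Return the x_res least significant bits of x_dec as a zero-padded string.

    Assumes x_dec >= 0 and x_res >= 1 (illustrative sketch only).
    """
    mask = (1 << x_res) - 1   # e.g. x_res=5 -> 0b11111
    clipped = x_dec & mask    # drop everything above the low x_res bits
    return format(clipped, "0{}b".format(x_res))  # pad with leading zeros


print(clip_to_bits(5, 5))     # value narrower than 5 bits: padded
print(clip_to_bits(37, 5))    # value wider than 5 bits: clipped to the LSBs
print(clip_to_bits(0, 10))
```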

Is Python really slow or is it because I haven't optimized the Python code?

The Leetcode question is: Given a string s, find the length of the longest substring without repeating characters.
https://leetcode.com/problems/longest-substring-without-repeating-characters/
I coded in both C++ and Python to see whether there is a huge performance gap, and found the result to be:
Here are the c++ and python implementations of the same logic:
class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int max_count=0;
        int k=1;
        int i=0;
        int j=0;
        bool visited[256];
        memset(visited,false,256);
        int n=s.size();
        while(k<=n && i<n && j<n){
            /*for(int l=i;l<=j;l++) cout << s[l];
            cout << endl;*/
            if(visited[int(s[j])]){
                memset(visited,false,256);
                k=1;
                i++;
                j=i+k-1;
            }else{
                if (max_count<k) max_count=k;
                visited[int(s[j])]=true;
                k++;
                j++;
            }
        }
        return max_count;
    }
};
and
class Solution:
    def lengthOfLongestSubstring(self, a: str) -> int:
        #apply sliding window for k=0,1,2,..,n until repetition is found for a substring
        k=1 #window length
        i=0 #starting index of substring
        j=0 #ending index of substring
        init_visited=[False]*256
        visited=init_visited[:]
        max_count=0
        n=len(a)
        while k<=n and j<n and i<n:
            #print(k,i,j)
            #print(a[i:j+1])
            if visited[ord(a[j])]:
                visited = init_visited[:]
                i+=1
                k=1
                j=i+k-1
            else:
                visited[ord(a[j])]=True
                max_count=max(max_count,k)
                k+=1
                j+=1
        return max_count
What could I have improved in the Python code to make it faster?
Python will never be as fast as C++, but you can optimize this function using a set:
def longestsub(S):
    distinct = set()
    result = 0
    start = 0
    for i,c in enumerate(S):
        if c in distinct:
            result = max(result,i-start)
            distinct.difference_update(S[start:i])
            start = i
        distinct.add(c)
    result = max(result,len(S)-start)
    return result
output:
testData = ["abcabcbb","bbbbb","pwwkew",""]
for s in testData:
    print(f"'{s}' : ",longestsub(s))
'abcabcbb' : 3
'bbbbb' : 1
'pwwkew' : 3
'' : 0
performance:
longestsub("abcabcbb"*10000) # 80,000 characters --> 3 in 30 ms
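One caveat: longestsub restarts the window at the repeated character itself, so on an input such as "dvdf" it returns 2, although "vdf" has length 3. A common alternative, sketched below in Python 3, keeps a dictionary from each character to the index where it was last seen and moves the window start just past that index. The name longest_unique_substring is made up for illustration, and no timing claim is made for it.

```python
def longest_unique_substring(s):
    """Length of the longest substring of s without repeating characters."""
    last_seen = {}  # character -> index of its most recent occurrence
    start = 0       # left edge of the current duplicate-free window
    best = 0
    for i, ch in enumerate(s):
        prev = last_seen.get(ch, -1)
        if prev >= start:
            # ch already occurs inside the window: shrink past that occurrence.
            start = prev + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best


for text in ["abcabcbb", "bbbbb", "pwwkew", "", "dvdf"]:
    print(repr(text), longest_unique_substring(text))
```

Each character is visited once and the window start only moves forward, so the running time is linear in the length of the string.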

Weave Inline C++ Code in Python 2.7

I'm trying to rewrite this function:
def smoothen_fast(heightProfile, travelTime):
    smoothingInterval = 30 * travelTime
    heightProfile.extend([heightProfile[-1]]*smoothingInterval)
    # Get the mean of first `smoothingInterval` items
    first_mean = sum(heightProfile[:smoothingInterval]) / smoothingInterval
    newHeightProfile = [first_mean]
    for i in xrange(len(heightProfile)-smoothingInterval-1):
        prev = heightProfile[i] # the item to be subtracted from the sum
        new = heightProfile[i+smoothingInterval] # item to be added
        # Calculate the sum of previous items by multiplying
        # last mean with smoothingInterval
        prev_sum = newHeightProfile[-1] * smoothingInterval
        new_sum = prev_sum - prev + new
        mean = new_sum / smoothingInterval
        newHeightProfile.append(mean)
    return newHeightProfile
as embedded C++ Code:
import scipy.weave as weave
heightProfile = [0.14,0.148,1.423,4.5]
heightProfileSize = len(heightProfile)
travelTime = 3
code = r"""
#include <string.h>
int smoothingInterval = 30 * travelTime;
double *heightProfileR = new double[heightProfileSize+smoothingInterval];
for (int i = 0; i < heightProfileSize; i++)
{
    heightProfileR[i] = heightProfile[i];
}
for (int i = 0; i < smoothingInterval; i++)
{
    heightProfileR[heightProfileSize+i] = -1;
}
double mean = 0;
for (int i = 0; i < smoothingInterval; i++)
{
    mean += heightProfileR[i];
}
mean = mean/smoothingInterval;
double *heightProfileNew = new double[heightProfileSize-smoothingInterval];
for (int i = 0; i < heightProfileSize-smoothingInterval-1; i++)
{
    double prev = heightProfileR[i];
    double newp = heightProfile[i+smoothingInterval];
    double prev_sum = heightProfileNew[i] * smoothingInterval;
    double new_sum = prev_sum - prev + newp;
    double meanp = new_sum / smoothingInterval;
    heightProfileNew[i+1] = meanp;
}
return_val = Py::new_reference_to(Py::Double(heightProfileNew));
"""
d = weave.inline(code,['heightProfile','heightProfileSize','travelTime'])
As a return type I need heightProfileNew, and I need to access it like a list in Python later.
I looked at these examples:
http://docs.scipy.org/doc/scipy/reference/tutorial/weave.html
The compiler keeps telling me that it doesn't know Py::, but in the examples there are no Py includes?
I know the question is old, but I think it is still interesting.
Assuming you're using weave to improve computation speed and that you know the length of your output beforehand, I suggest that you create the result before calling inline. That way you can create the result variable in Python (very easy). I would also suggest using an np.ndarray as the result because it makes sure you use the right data type. You can iterate over ndarrays in Python the same way you iterate over lists.
import numpy as np
heightProfileArray = np.array(heightProfile)
# heightProfileArray = np.array(heightProfile, dtype = np.float32) if you want to make sure you have the right data type. Another choice would be np.float64
resultArray = np.zeros_like(heightProfileArray) # same array size and data type but filled with zeros
[..]
weave.inline(code,['heightProfile','heightProfileSize','travelTime','resultArray'])
for element in resultArray:
    print element
In your C++ code you can then just assign values to elements of that array:
[..]
resultArray[i+1] = 5.5;
[..]
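The same "caller allocates, kernel fills" pattern can be sketched without weave. The following is a simplified pure-Python 3 illustration, not a port of smoothen_fast: it does not pad the tail, and running_mean_into, values and window are hypothetical names. The point is only that the output buffer is created up front with its known length and the worker writes into it instead of returning a new object.

```python
def running_mean_into(values, window, out):
    """Write the mean of every length-`window` slice of `values` into `out`.

    `out` must already have len(values) - window + 1 slots.
    """
    total = sum(values[:window])
    out[0] = total / window
    for i in range(1, len(values) - window + 1):
        # Slide the window: add the entering item, drop the leaving one.
        total += values[i + window - 1] - values[i - 1]
        out[i] = total / window


values = [1.0, 2.0, 3.0, 4.0, 5.0]
window = 2
result = [0.0] * (len(values) - window + 1)  # preallocated by the caller
running_mean_into(values, window, result)
print(result)
```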

Memory Leak in Ctypes method

I have a project mostly written in Python. This project runs on my Raspberry Pi (Model B). With the use of the Pi Camera I record to a stream. Every second I pause the recording to take the last frame from the stream and compare it with an older frame. The comparing is done in C code (mainly because it is faster than Python).
The C code is called from Python using ctypes. See the code below.
# Load picturecomparer.so and set argument and return types
cmethod = ctypes.CDLL(Paths.CMODULE_LOCATION)
cmethod.compare_pictures.restype = ctypes.c_double
cmethod.compare_pictures.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
The 2 images that must be compared are stored on the disk. Python gives the paths of both images as arguments to the C code. The C code will return a value (double) which is the difference in percentage of both images.
# Call the C method to compare the images
difflevel = cmethod.compare_pictures(path1, path2)
The C code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#ifndef STB_IMAGE_IMPLEMENTATION
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#ifndef STBI_ASSERT
#define STBI_ASSERT(x)
#endif
#endif
#define COLOR_R 0
#define COLOR_G 1
#define COLOR_B 2
#define OFFSET 10
double compare_pictures(const char* path1, const char* path2);
double compare_pictures(const char* path1, const char* path2)
{
    double totalDiff = 0.0, value;
    unsigned int x, y;
    int width1, height1, comps1;
    unsigned char * image1 = stbi_load(path1, &width1, &height1, &comps1, 0);
    int width2, height2, comps2;
    unsigned char * image2 = stbi_load(path2, &width2, &height2, &comps2, 0);
    // Perform some checks to be sure images are valid
    if (image1 == NULL | image2 == NULL) { return 0; }
    if (width1 != width2 | height1 != height2) { return 0; }
    for (y = 0; y < height1; y++)
    {
        for (x = 0; x < width1; x++)
        {
            // Calculate difference in RED
            value = (int)image1[(x + y*width1) * comps1 + COLOR_R] - (int)image2[(x + y*width2) * comps2 + COLOR_R];
            if (value < OFFSET && value > (OFFSET * -1)) { value = 0; }
            totalDiff += fabs(value) / 255.0;
            // Calculate difference in GREEN
            value = (int)image1[(x + y*width1) * comps1 + COLOR_G] - (int)image2[(x + y*width2) * comps2 + COLOR_G];
            if (value < OFFSET && value >(OFFSET * -1)) { value = 0; }
            totalDiff += fabs(value) / 255.0;
            // Calculate difference in BLUE
            value = (int)image1[(x + y*width1) * comps1 + COLOR_B] - (int)image2[(x + y*width2) * comps2 + COLOR_B];
            if (value < OFFSET && value >(OFFSET * -1)) { value = 0; }
            totalDiff += fabs(value) / 255.0;
        }
    }
    totalDiff = 100.0 * totalDiff / (double)(width1 * height1 * 3);
    return totalDiff;
}
The C code is executed every ~2 seconds. I just noticed that there is a memory leak. After around 10 to 15 minutes my Raspberry Pi has about 10 MB of RAM left to use. A few minutes later it crashes and doesn't respond anymore.
I have done some checks to find out what causes this in my project. My entire project uses around 30-40 MB of RAM if I disable the C code. This project is all my Raspberry Pi has to execute.
Model B: 512MB ram which shares between CPU and GPU.
GPU: 128MB (/boot/config.txt).
My Linux distro uses: ~60MB.
So I have ~300MB for my project.
Hope someone could point me where it goes wrong, or if I have to call GC myself, etc..
Thanks in advance.
p.s. I know the image comparing is not the best way, but it works for me now.
Since the images are returned as pointers to buffers, stbi_load must be allocating space for them, and you are not releasing this space before returning, so the memory leak is not surprising.
Check the documentation to see if there is a specific stbi free function, or try adding free(image1); free(image2); before the final return.
Having checked, I can categorically say that you should be calling STBI_FREE(image1); STBI_FREE(image2); before returning (the public wrapper for this in stb_image.h is stbi_image_free). The two early return 0; paths need the same cleanup, otherwise a failed check still leaks whichever image did load.

Ctree Specializer is using for loop index for computation, not the actual array value

I'm implementing a simple Xor Reducer, but it is unable to return the appropriate value.
Python Code (Input):
class LazySpecializedFunctionSubclass(LazySpecializedFunction):
    subconfig_type = namedtuple('subconfig',['dtype','ndim','shape','size','flags'])
    def __init__(self, py_ast = None):
        py_ast = py_ast or get_ast(self.kernel)
        super(LazySlimmy, self).__init__(py_ast)
    # [... other code ...]
    def points(self, inpt):
        iter = np.nditer(input, flags=['c_index'])
        while not iter.finished:
            yield iter.index
            iter.iternext()
class XorReduction(LazySpecializedFunctionSubclass):
    def kernel(self, inpt):
        '''
        Calculates the cumulative XOR of elements in inpt, equivalent to
        Reduce with XOR
        '''
        result = 0
        for point in self.points(inpt): # self.points is defined in LazySpecializedFunctionSubclass
            result = point ^ result # notice how 'point' here is the actual element in self.points(inpt), not the index
        return result
C Code (Output):
// <file: module.c>
void kernel(long* inpt, long* output) {
    long result = 0;
    for (int point = 0; point < 2; point ++) {
        result = point ^ result; // Notice how it's using the index, point, not inpt[point].
    };
    * output = result;
};
Any ideas how to fix this?
The problem is that you are using point in two different ways: in the XorReduction kernel method you intend to iterate over the values in the array, but in the generated C code you are iterating over the indices of the array. The XOR reduction in your C code is thus done on the indices.
The generated C function should look more like
// <file: module.c>
void kernel(long* inpt, long* output) {
    long result = 0;
    for (int point = 0; point < 2; point ++) {
        result = inpt[point] ^ result; // you did not reference your input in the question
    };
    * output = result;
};
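The difference between the two reductions is easy to see on a two-element input. This is a plain Python 3 sketch that does not use ctree; xor_of_values, xor_of_indices and data are illustrative names only.

```python
from functools import reduce
from operator import xor


def xor_of_values(values):
    """XOR-reduce the elements themselves (what the kernel is meant to do)."""
    return reduce(xor, values, 0)


def xor_of_indices(values):
    """XOR-reduce the positions 0..n-1 (what the generated C code does)."""
    return reduce(xor, range(len(values)), 0)


data = [5, 9]
print(xor_of_values(data))   # 5 ^ 9
print(xor_of_indices(data))  # 0 ^ 1
```

The two results agree only by coincidence, so a test input whose values differ from their positions makes the index/value mix-up visible.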
