C++ and Python version of the same algorithm giving different result


The following code is an algorithm to determine the number of integer triangles, with their largest side less than or equal to MAX, that have an integer median. The Python version works but is too slow for larger inputs, while the C++ version is a lot faster but doesn't give the right result.
When MAX is 10, C++ and Python both return 3.
When MAX is 100, Python returns 835 and C++ returns 836.
When MAX is 200, Python returns 4088 and C++ returns 4102.
When MAX is 500, Python returns 32251 and C++ returns 32296.
When MAX is 1000, Python returns 149869 and C++ returns 150002.
Here's the C++ version:
#include <cstdio>
#include <math.h>
const int MAX = 1000;
int main()
{
    long long int x = 0;
    for (int b = MAX; b > 4; b--)
    {
        printf("%d\n", b);  // b is an int, so print it with %d (not %lld)
        for (int a = b; a > 4; a -= 2)
        {
            for (int c = floor(b/2); c < floor(MAX/2); c += 1)
            {
                if (a + b > 2*c)
                {
                    int d = 2*(pow(a,2) + pow(b,2) - 2*pow(c,2));
                    if (sqrt(d)/2 == floor(sqrt(d)/2))
                        x += 1;
                }
            }
        }
    }
    printf("Done: ");
    printf("%lld\n", x);
}
Here's the original Python version:
import math

def sumofSquares(n):
    f = 0
    for b in range(n, 4, -1):
        print(b)
        for a in range(b, 4, -2):
            for C in range(math.ceil(b/2), n//2 + 1):
                if a + b > 2*C:
                    D = 2*(a**2 + b**2 - 2*C**2)
                    if (math.sqrt(D)/2).is_integer():
                        f += 1
    return f

a = int(input())
print(sumofSquares(a))
print('Done')
I'm not too familiar with C++ so I have no idea what could be happening that's causing this (maybe an overflow error?).
Of course, any optimizations for the algorithm are more than welcome!

The issue is that the ranges of your c (C in Python) loops do not match. To make the Python loop equivalent to your existing C++ range, you can change it to:
for C in range(int(math.floor(b/2)), int(math.floor(n/2))):
...
To make the C++ loop equivalent to your existing Python range, you can change it to:
for (int c = ceil(b/2.0); c < MAX/2 + 1; c++) {
...
}
Depending on which loop is originally correct, this will make the results match.
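To make the mismatch concrete, the sketch below reproduces both loop bounds in Python for a single value of b. The helper names cpp_c_range and python_c_range are hypothetical, introduced only for this comparison; they assume the C++ floor(b/2) and floor(MAX/2) behave like integer division, which they do here because both operands are ints.

```python
import math

def cpp_c_range(b, max_side):
    # Mirrors the C++ loop: c starts at floor(b/2) (integer division)
    # and stops *before* floor(MAX/2).
    return list(range(b // 2, max_side // 2))

def python_c_range(b, n):
    # Mirrors the original Python loop: C starts at ceil(b/2)
    # and runs *up to and including* n//2.
    return list(range(math.ceil(b / 2), n // 2 + 1))

# For odd b the start differs; the end always differs by one.
print(cpp_c_range(7, 10))     # [3, 4]
print(python_c_range(7, 10))  # [4, 5]
```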

It seems some of the trouble could come from here, where a floating-point result is tested for exact equality:
(sqrt(d)/2==floor(sqrt(d)/2))
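For the sizes in this question, sqrt of a perfect square is very likely exact, so this may not be the actual source of the discrepancy, but an integer-only test removes the doubt entirely. A minimal sketch, assuming Python 3.8+ for math.isqrt; the helper name half_sqrt_is_integer is hypothetical. In C++ the same idea would need an integer square root routine instead of sqrt.

```python
import math

def half_sqrt_is_integer(d):
    # Exact replacement for (math.sqrt(d)/2).is_integer(), using only
    # integer arithmetic: true when d is a perfect square with an even root.
    if d < 0:
        return False
    r = math.isqrt(d)  # floor of the exact square root
    return r * r == d and r % 2 == 0

print(half_sqrt_is_integer(16))  # True  (sqrt = 4, 4/2 = 2)
print(half_sqrt_is_integer(9))   # False (sqrt = 3, 3/2 = 1.5)
print(half_sqrt_is_integer(10))  # False (not a perfect square)
```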

Related

Optimise a Python/ C++ algorithm

I was participating in a competitive programming contest and faced a question where, out of four test cases, my answer was correct in three but exceeded the time limit in the fourth.
I tried to get better results by converting my code from Python to C++ (I know the time complexity remains the same, but it was worth a shot :))
Here is the question:
A string is said to be using strong language if it contains at least K consecutive characters '*'.
You are given a string S with length N. Determine whether it uses strong language or not.
Input:
The first line of the input contains a single integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains two space-separated integers N and K.
The second line contains a single string S with length N.
Output:
Print a single line containing the string "YES" if the string contains strong language or "NO" if it does not.
My Python approach:
for _ in range(int(input())):
    k = int(input().split()[1])
    s = input()
    s2 = "".join(["*"]*k)
    if len(s.split(s2)) > 1:
        print("YES")
    else:
        print("NO")
My C++ code (converted it myself):
#include <iostream>
#include <string>
using namespace std;
int main() {
    int t;
    std::cin >> t;
    for (int i = 0; i < t; i++) {
        int n, k;
        std::cin >> n >> k;
        string str;
        cin >> str;
        string str2(k, '*');
        size_t found = str.find(str2);
        if (found != string::npos) {
            std::cout << "YES" << std::endl;
        } else {
            std::cout << "NO" << std::endl;
        }
    }
    return 0;
}
How can I reduce the time complexity?
Other approaches I have considered: using find() instead of split, or using a for loop.
Edit:
Sample Input:
2
5 1
abd
5 2
*i**j
Output:
NO
YES

The bounds you posted suggest that linear time is fine in Python. You can simply keep a running count of how many asterisks you have seen in a row.
T = int(input())
for _ in range(T):
    n, k = map(int, input().split())
    s = input()
    count, ans = 0, False
    for c in s:
        if c == "*":
            count += 1
        else:
            count = 0
        ans = ans or count >= k
    if ans:
        print("YES")
    else:
        print("NO")
I can also tell you why you are getting TLE. Consider the case where n = 1e6, k = 5e5, and s is a string whose first k-1 characters are asterisks. Your find call may check every starting position against the str2 you created, comparing up to k characters each time. In the worst case that is O(n·k) work, which is quadratic here because k is about n/2, so you get a TLE.
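Here is the same linear scan wrapped in a function, with an early exit once k asterisks in a row have been seen, so it can be checked against the sample input. The name uses_strong_language is hypothetical, and the sketch assumes k ≥ 1.

```python
def uses_strong_language(s, k):
    # Single pass over s, tracking the length of the current run of '*'.
    run = 0
    for ch in s:
        if ch == "*":
            run += 1
            if run >= k:
                return True  # early exit: k consecutive asterisks found
        else:
            run = 0
    return False

print("YES" if uses_strong_language("abd", 1) else "NO")    # NO
print("YES" if uses_strong_language("*i**j", 2) else "NO")  # YES
```

Because the run counter only moves forward, each character is examined once, so the cost is O(N) no matter how large K is.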

How python manipulates Power in math

I was wondering how Python calculates base to the power of exp so fast. If Python is written in C, then it should use pow() to calculate powers, yet the output of C's pow for a number like 2^10000 is as below:
#include <stdio.h>
#include <math.h>
int main()
{
    int base_num = 2;
    int exp_num = 10000;
    printf("%lf\n", pow(base_num, exp_num));
    return 0;
}
The result is: inf
But when I execute this code in Python:
>>> print( 2**10000 )
it prints the large number below:
19950631168807583848837421626835850838234968318861924548520089498529438830221946631919961684036194597899331129423209124271556491349413781117593785932096323957855730046793794526765246551266059895520550086918193311542508608460618104685509074866089624888090489894838009253941633257850621568309473902556912388065225096643874441046759871626985453222868538161694315775629640762836880760732228535091641476183956381458969463899410840960536267821064621427333394036525565649530603142680234969400335934316651459297773279665775606172582031407994198179607378245683762280037302885487251900834464581454650557929601414833921615734588139257095379769119277800826957735674444123062018757836325502728323789270710373802866393031428133241401624195671690574061419654342324638801248856147305207431992259611796250130992860241708340807605932320161268492288496255841312844061536738951487114256315111089745514203313820202931640957596464756010405845841566072044962867016515061920631004186422275908670900574606417856951911456055068251250406007519842261898059237118054444788072906395242548339221982707404473162376760846613033778706039803413197133493654622700563169937455508241780972810983291314403571877524768509857276937926433221599399876886660808368837838027643282775172273657572744784112294389733810861607423253291974813120197604178281965697475898164531258434135959862784130128185406283476649088690521047580882615823961985770122407044330583075869039319604603404973156583208672105913300903752823415539745394397715257455290510212310947321610753474825740775273986348298498340756937955646638621874569499279016572103701364433135817214311791398222983845847334440270964182851005072927748364550578634501100852987812389473928699540834346158807043959118985815145779177143619698728131459483783202081474982171858011389071228250905826817436220577475921417653715687725614904582904992461028630081535583308130101987675856234343538955409175623400844887526162643568648833519463720377293240094456246923254350400678027273837755376406726898636241037
491410966718557050759098100246789880178271925953381282421954028302759408448955014676668389697996886241636313376393903373455801407636741877711055384225739499110186468219696581651485130494222369947714763069155468217682876200362777257723781365331611196811280792669481887201298643660768551639860534602297871557517947385246369446923087894265948217008051120322365496288169035739121368338393591756418733850510970271613915439590991598154654417336311656936031122249937969999226781732358023111862644575299135758175008199839236284615249881088960232244362173771618086357015468484058622329792853875623486556440536962622018963571028812361567512543338303270029097668650568557157505516727518899194129711337690149916181315171544007728650573189557450920330185304847113818315407324053319038462084036421763703911550639789000742853672196280903477974533320468368795868580237952218629120080742819551317948157624448298518461509704888027274721574688131594750409732115080498190455803416826949787141316063210686391511681774304792596709376
And since the number of decimal digits of 2^10000 is floor(10000 × log10(2)) + 1 = floor(3010.3) + 1 = 3011, the result is absolutely correct: it has 3011 digits.
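The digit-count argument can be checked directly in Python, since the number of decimal digits of a positive integer x is floor(log10(x)) + 1:

```python
import math

n = 2 ** 10000
# Digits of x > 0 = floor(log10(x)) + 1, and log10(2**10000) = 10000 * log10(2).
digits_by_formula = math.floor(10000 * math.log10(2)) + 1
print(digits_by_formula)  # 3011
print(len(str(n)))        # 3011
```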
So I was wondering how it can raise 2 to the power 10000 in less than a second, while C itself cannot calculate 2^10000 and returns inf.
What formula does it use to calculate so fast? I have also tested 2^1000000, and it finished in less than 2 seconds.
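Part of the answer is that Python's int is arbitrary-precision, so 2**10000 is computed exactly with integer arithmetic and never goes through C's floating-point pow(), which overflows a double to inf. The other part is the algorithm: exponentiation by squaring needs only about log2(exp) big-integer multiplications instead of exp - 1. The sketch below shows that technique; the function name power is hypothetical, and CPython's own implementation (long_pow in Objects/longobject.c) is, as far as I know, based on the same idea with extra refinements rather than identical to this code.

```python
def power(base, exp):
    # Right-to-left binary exponentiation (square-and-multiply).
    if exp < 0:
        raise ValueError("exp must be non-negative")
    result = 1
    while exp > 0:
        if exp & 1:       # lowest remaining bit of exp is set: multiply it in
            result *= base
        base *= base      # square for the next bit
        exp >>= 1
    return result

print(power(2, 10))                   # 1024
print(power(2, 10000) == 2 ** 10000)  # True
```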

Ctypes: allocate double** , pass it to C, then use it in Python

EDIT 3
I have some C++ code (declared extern "C") which I access from Python.
I want to allocate a double** in Python, pass it to the C/C++ code to copy the contents of a class's internal data, and then use it in Python similarly to how I would use a list of lists.
Unfortunately I cannot find a way to tell Python the size of the innermost array, so it reads invalid memory when iterating over it and the program segfaults.
I cannot change the structure of the internal data in C++, and I'd like Python to do the bounds checking for me (as if I were using a c_double_Array_N_Array_M instead of an array of pointers).
test.cpp (compile with g++ -Wall -fPIC --shared -o test.so test.cpp):
#include <stdlib.h>
#include <string.h>
// struct (not class) so that the members are public and accessible below
struct Dummy
{
    double** ptr;
    int e;
    int i;
};
extern "C" {
void * get_dummy(int N, int M) {
    Dummy * d = new Dummy();
    d->ptr = new double*[N];
    d->e = N;
    d->i = M;
    for (int i = 0; i < N; ++i)
    {
        d->ptr[i] = new double[M];
        for (int j = 0; j < M; ++j)
        {
            d->ptr[i][j] = i*N + j;
        }
    }
    return d;
}
void copy(void * inst, double ** dest) {
    Dummy * d = static_cast<Dummy*>(inst);
    for (int i = 0; i < d->e; ++i)
    {
        memcpy(dest[i], d->ptr[i], sizeof(double) * d->i);
    }
}
void cleanup(void * inst) {
    if (inst != NULL) {
        Dummy * d = static_cast<Dummy*>(inst);
        for (int i = 0; i < d->e; ++i)
        {
            delete[] d->ptr[i];
        }
        delete[] d->ptr;
        delete d;
    }
}
}
Python (this segfaults; put it in the same directory as test.so):
import os
from contextlib import contextmanager
import ctypes as ct
DOUBLE_P = ct.POINTER(ct.c_double)
library_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'test.so')
lib = ct.cdll.LoadLibrary(library_path)
lib.get_dummy.restype = ct.c_void_p
# Declare argument types so 64-bit pointers are not truncated to a C int
lib.get_dummy.argtypes = [ct.c_int, ct.c_int]
lib.copy.argtypes = [ct.c_void_p, ct.POINTER(DOUBLE_P)]
lib.cleanup.argtypes = [ct.c_void_p]
N = 15
M = 10
@contextmanager
def work_with_dummy(N, M):
    dummy = None
    try:
        dummy = lib.get_dummy(N, M)
        yield dummy
    finally:
        lib.cleanup(dummy)
with work_with_dummy(N, M) as dummy:
    internal = (ct.c_double * M)
    # dest is allocated in Python; it outlives the with block and is freed by Python
    dest = (DOUBLE_P * N)()
    for i in range(N):
        dest[i] = internal()
    lib.copy(dummy, dest)
# dummy is not available anymore here. All the C resources have been cleaned up
for i in dest:
    for n in i:
        print(n)  # it segfaults reading past the end of the array
What can I change in my Python code so that I can treat the array as a list?
(I only need to read from it.)

3 ways to pass an int** array from Python to C and back
So that Python knows the size of the array when iterating
The data
These solutions work for either a 2D array or an array of pointers to arrays, with slight modifications, and without libraries like numpy.
I will use int instead of double as the element type, and we will copy source, which is defined as:
N = 10;
M = 15;
int ** source = (int **) malloc(sizeof(int*) * N);
for (int i = 0; i < N; ++i)
{
    source[i] = (int *) malloc(sizeof(int) * M);
    for (int j = 0; j < M; ++j)
    {
        source[i][j] = i*N + j;
    }
}
1) Assigning the array pointers
Python allocation
dest = ((ctypes.c_int * M) * N)()
int_P = ctypes.POINTER(ctypes.c_int)
temp = (int_P * N)()
for i in range(N):
    temp[i] = dest[i]
lib.copy(temp)
del temp
# temp gets collected by the GC, but the data was stored in the memory allocated by dest
# You can now access dest as if it were a list of lists
for row in dest:
    for item in row:
        print(item)
C copy function
void copy(int** dest)
{
    for (int i = 0; i < N; ++i)
    {
        memcpy(dest[i], source[i], sizeof(int) * M);
    }
}
Explanation
We first allocate a 2D array. A 2D array[N][M] is allocated as a 1D array[N*M], with 2d_array[n][m] == 1d_array[n*M + m].
Since the C code expects an int**, but our 2D array is allocated as an int*, we create a temporary array to provide the expected structure.
We allocate temp, an array of N int pointers, and then we assign to each entry the address of the matching row of the memory we allocated previously: temp[n] = 2d_array[n] = &1d_array[n*M] (the second equality shows what is happening with the real memory we allocated).
If you change the copying code so that it copies more than M elements, say M+1, you will see that it will not segfault, but it will overwrite the memory of the next row, because the rows are contiguous. (If you change the copying code, remember to increase the size of dest allocated in Python by 1; otherwise it will segfault when you write past the last item of the last row.)
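The layout described above can be checked without compiling anything. This sketch fakes source with ctypes arrays (source_rows is a hypothetical stand-in) and uses ctypes.memmove in place of the C memcpy, then confirms that dest[n][m] and the flat view's [n*M + m] are the same element:

```python
import ctypes

N, M = 10, 15

# Hypothetical stand-in for the C `source`: N separately allocated rows.
source_rows = [(ctypes.c_int * M)(*[i * N + j for j in range(M)]) for i in range(N)]

# Method 1: contiguous 2D array plus a temporary array of row pointers.
dest = ((ctypes.c_int * M) * N)()
int_P = ctypes.POINTER(ctypes.c_int)
temp = (int_P * N)()
for i in range(N):
    temp[i] = dest[i]  # temp[i] now points at row i inside dest

# Emulate the C copy(): memcpy(dest[i], source[i], sizeof(int) * M)
for i in range(N):
    ctypes.memmove(temp[i], source_rows[i], ctypes.sizeof(ctypes.c_int) * M)

# View the same memory as a flat 1D array of N*M ints.
flat = (ctypes.c_int * (N * M)).from_buffer(dest)
assert all(dest[n][m] == flat[n * M + m] for n in range(N) for m in range(M))
print(dest[2][3], flat[2 * M + 3])  # 23 23
```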
2) Slicing the pointers
Python allocation
int_P = ctypes.POINTER(ctypes.c_int)
inner_array = (ctypes.c_int * M)
dest = (int_P * N)()
for i in range(N):
    dest[i] = inner_array()
lib.copy(dest)
for row in dest:
    # Python knows the length of dest, so everything works fine here
    for item in row:
        # Python doesn't know that row is an array, so it will keep reading memory without ever stopping (in practice, a segfault will stop it)
        print(item)
dest = [internal[:M] for internal in dest]
for row in dest:
    for item in row:
        # No more segfaults: now Python knows each row is M items long
        print(item)
C copy function
Same as for solution 1
Explanation
This time we are allocating an actual array of pointers to arrays, just like source was allocated.
Since the outermost array (dest) is an array of pointers, Python doesn't know the length of the array pointed to (it doesn't even know that it is an array; it could just as well be a pointer to a single int).
If you iterate over that pointer, Python will not bounds-check, and it will keep reading memory until it segfaults.
So, we slice the pointer, taking the first M elements (which are all the elements in the array). Now Python knows that it should only iterate over the first M elements, and it won't segfault anymore.
Slicing a ctypes pointer returns a new Python list, so this method copies the pointed-to contents (see the sources).
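A small self-contained demonstration of the behaviour described above, using ctypes.cast to turn a sized array into a bare pointer (the variable names are illustrative). Slicing the pointer with an explicit stop is what restores a bounded sequence, and the result is a new list rather than a view:

```python
import ctypes

M = 5
int_P = ctypes.POINTER(ctypes.c_int)
inner_array = ctypes.c_int * M

row = inner_array(*range(M))  # a real M-element array
p = ctypes.cast(row, int_P)   # a bare pointer: the length is no longer known

# Iterating p directly would never stop; slicing gives a bounded list.
values = p[:M]
print(values)        # [0, 1, 2, 3, 4]
print(type(values))  # <class 'list'>

# The slice is a copy: later writes through the pointer don't change it.
p[0] = 99
print(values[0], row[0])  # 0 99
```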
2.1) Slicing the pointers, continued
Eryksun jumped in with a comment and proposed a solution that avoids copying all the elements into new lists.
Python allocation
int_P = ctypes.POINTER(ctypes.c_int)
inner_array = (ctypes.c_int * M)
inner_array_P = ctypes.POINTER(inner_array)
dest = (int_P * N)()
for i in range(N):
    dest[i] = inner_array()
lib.copy(dest)
# Reinterpret each int* as a pointer to an M-element array and dereference it (no copy)
dest_arrays = [inner_array_P.from_buffer(x)[0] for x in dest]
for row in dest_arrays:
    for item in row:
        print(item)
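To confirm that eryksun's variant really avoids copying, this sketch (no C library needed, and the sizes are arbitrary) writes through one of the resulting arrays and reads the value back through the original int* stored in dest:

```python
import ctypes

N, M = 3, 4
int_P = ctypes.POINTER(ctypes.c_int)
inner_array = ctypes.c_int * M
inner_array_P = ctypes.POINTER(inner_array)

dest = (int_P * N)()
for i in range(N):
    dest[i] = inner_array()  # dest keeps these arrays alive

# Reinterpret each stored int* as a pointer-to-array, then dereference it.
dest_arrays = [inner_array_P.from_buffer(x)[0] for x in dest]

dest_arrays[1][2] = 42
print(dest[1][2])           # 42: the same memory, seen through the raw pointer
print(len(dest_arrays[1]))  # 4: iteration over the row is bounded
```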
C copying code
Same as for solution 1
3) Contiguous memory
This method is an option only if you can change the copying code on the C side. source will not need to be changed.
Python allocation
dest = ((ctypes.c_int * M) * N)()
lib.copy(dest)
for row in dest:
    for item in row:
        print(item)
C copy function
void copy(int * dest) {
    for (int i = 0; i < N; ++i)
    {
        memcpy(&dest[i * M], source[i], sizeof(int) * M);
    }
}
Explanation
This time, as in case 1), we are allocating a contiguous 2D array. But since we can change the C code, we don't need to create a separate array and copy the pointers, because we will give C the type it expects.
In the copy function, we compute the address of the first item of every row, copy M elements into that row, and then move on to the next row.
The copy pattern is exactly as in case 1), but this time, instead of writing the interface in Python so that the C code receives the data the way it expects, we changed the C code to expect the data in that precise format.
If you keep this C code, you'll be able to use NumPy arrays as well, as long as they are C-contiguous (row-major), which is NumPy's default layout, and their dtype matches C int (numpy.intc).
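The contiguous copy can also be emulated in pure Python to see the flat offsets at work. In this sketch source_rows is a hypothetical stand-in for the C source, and ctypes.memmove at address base + i*M*sizeof(int) plays the role of memcpy(&dest[i * M], ...):

```python
import ctypes

N, M = 10, 15
INT_SIZE = ctypes.sizeof(ctypes.c_int)

# Hypothetical stand-in for the C `source` rows.
source_rows = [(ctypes.c_int * M)(*[i * N + j for j in range(M)]) for i in range(N)]

dest = ((ctypes.c_int * M) * N)()
base = ctypes.addressof(dest)

# Emulate: memcpy(&dest[i * M], source[i], sizeof(int) * M)
for i in range(N):
    ctypes.memmove(base + i * M * INT_SIZE, source_rows[i], INT_SIZE * M)

print(dest[3][4])  # 34: row i starts at flat offset i * M
```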
All of this answer is possible thanks to the great (and concise) comments of @eryksun below the original question.

python scipy/weave c. Using python variables in c code

I'm trying to run some C code in Python using inline from scipy.weave.
Let's say we have two double arrays and one double value. I want to add each element of the first array to the corresponding element of the second array, plus the value.
The C code:
double* first;
double* second;
double val;
int length;
int i;
for (i = 0; i < length; i++) {
    second[i] = second[i] + first[i] + val;
}
Then I want to use the "second" array in my Python code again.
Given the following Python code:
import numpy
from scipy import weave
first = numpy.zeros(10)  # first double array
second = numpy.ones(10)  # second python array
val = 1.0
code = """
the c code
"""
second = weave.inline(code, [first, second, val, 10])
Now I am not sure whether this is the correct way of passing the arrays in and getting them out, or how to use and access them within the C code.

What is the difference between this C++ code and this Python code?

Answer
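Thanks to @TheDark for spotting the overflow. The new C++ solution is pretty freakin' funny, too. It looks extremely redundant:
if(2*i > n && 2*i > i)
replaced the old line of code if(2*i > n). (The second check only looks redundant: with unsigned wraparound, 2*i > i becomes false once 2*i overflows to 0.)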
Background
I'm doing this problem on HackerRank, though the problem may not be entirely related to this question. If you cannot see the webpage, or have to make an account and don't want to, the problem is listed in plain text below.
Question
My C++ code is timing out, but my Python code is not. I first suspected this was due to overflow, but I used sizeof to be sure that unsigned long long can reach 2^64 - 1, the upper limit of the problem.
I practically translated my C++ code directly into Python to see if it was my algorithms causing the timeouts, but to my surprise my Python code passed every test case.
C++ code:
#include <iostream>
bool pot(unsigned long long n)
{
    if (n % 2 == 0) return pot(n/2);
    return (n == 1); // returns true if n is a power of two
}
unsigned long long gpt(unsigned long long n)
{
    unsigned long long i = 1;
    while (2*i < n) {
        i *= 2;
    }
    return i; // returns the greatest power of two less than n
}
int main()
{
    unsigned int t;
    std::cin >> t;
    std::cout << sizeof(unsigned long long) << std::endl;
    for (unsigned int i = 0; i < t; i++)
    {
        unsigned long long n;
        unsigned long long count = 1;
        std::cin >> n;
        while (n > 1) {
            if (pot(n)) n /= 2;
            else n -= gpt(n);
            count++;
        }
        if (count % 2 == 0) std::cout << "Louise" << std::endl;
        else std::cout << "Richard" << std::endl;
    }
}
Python 2.7 code:
def pot(n):
    while n % 2 == 0:
        n /= 2
    return n == 1

def gpt(n):
    i = 1
    while 2*i < n:
        i *= 2
    return i

t = int(raw_input())
for i in range(t):
    n = int(raw_input())
    count = 1
    while n != 1:
        if pot(n):
            n /= 2
        else:
            n -= gpt(n)
        count += 1
    if count % 2 == 0:
        print "Louise"
    else:
        print "Richard"
To me, both versions look identical. I still think I'm somehow being fooled and am actually getting overflow, causing timeouts, in my C++ code.
Problem
Louise and Richard play a game. They have a counter set to N. Louise gets the first turn, and the turns alternate thereafter. In the game, they perform the following operations.
If N is not a power of 2, they reduce the counter by the largest power of 2 less than N.
If N is a power of 2, they reduce the counter by half of N.
The resultant value is the new N which is again used for subsequent operations.
The game ends when the counter reduces to 1, i.e., N == 1, and the last person to make a valid move wins.
Given N, your task is to find the winner of the game.
Input Format
The first line contains an integer T, the number of testcases.
T lines follow. Each line contains N, the initial number set in the counter.
Constraints
1 ≤ T ≤ 10
1 ≤ N ≤ 2^64 - 1
Output Format
For each test case, print the winner's name in a new line. So if Louise wins the game, print "Louise". Otherwise, print "Richard". (Quotes are for clarity)
Sample Input
1
6
Sample Output
Richard
Explanation
As 6 is not a power of 2, Louise reduces it by the largest power of 2 less than 6, i.e., 4, and hence the counter reduces to 2.
As 2 is a power of 2, Richard reduces the counter by half of 2, i.e., 1. Hence the counter reduces to 1.
As we reach the terminating condition with N == 1, Richard wins the game.

When n is greater than 2^63, your gpt function will eventually have i as 2^63 and then multiply 2^63 by 2, giving an overflow and a value of 0. This will then end up with an infinite loop, multiplying 0 by 2 each time.
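Python integers never overflow, so to reproduce the bug the sketch below masks every product to 64 bits, imitating unsigned long long. gpt_u64 is a hypothetical port of gpt(); guard=True adds the 2*i > i check, and max_steps is an arbitrary cap so the unguarded version returns None instead of hanging.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned long long wraparound

def gpt_u64(n, guard=False, max_steps=200):
    # Port of gpt() with every product reduced mod 2**64.
    i = 1
    for _ in range(max_steps):
        doubled = (2 * i) & MASK64
        keep_going = doubled < n and (not guard or doubled > i)
        if not keep_going:
            return i
        i = doubled
    return None  # gave up: in C++ this loop would never terminate

print(gpt_u64(6))                               # 4
print(gpt_u64(2**63 + 1))                       # None (i wrapped to 0)
print(gpt_u64(2**63 + 1, guard=True) == 2**63)  # True
```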

Try this bit-twiddling hack, which is probably slightly faster:
unsigned long largest_power_of_two_not_greater_than(unsigned long x) {
    for (unsigned long y; (y = x & (x - 1)); x = y) {}
    return x;
}
x&(x-1) is x without its least significant 1-bit. So y will be zero (terminating the loop) exactly when x has been reduced to a power of two, which will be the largest power of two not greater than the original x. The loop body runs once for every 1-bit in x apart from the highest one, which is on average about half as many iterations as your approach. Also, this one has no issues with overflow. (It does return 0 if the original x was 0. That may or may not be what you want.)
Note that if the original x was a power of two, that value is simply returned immediately. So the function doubles as a test of whether x is a power of two (or 0).
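A Python port of the same loop, useful for checking the claims above (the function name is taken from the C version; Python's unbounded ints mean overflow is not a concern here either):

```python
def largest_power_of_two_not_greater_than(x):
    # Repeatedly clear the lowest set bit until only the highest remains.
    # Returns 0 for x == 0, like the C version.
    y = x & (x - 1)
    while y:
        x = y
        y = x & (x - 1)
    return x

print(largest_power_of_two_not_greater_than(6))  # 4
print(largest_power_of_two_not_greater_than(8))  # 8 (already a power of two)
print(largest_power_of_two_not_greater_than(0))  # 0
```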
While that is fun and all, in real-life code you'd probably be better off finding your compiler's equivalent to this gcc built-in (unless your compiler is gcc, in which case here it is):
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in X, starting at the most
significant bit position. If X is 0, the result is undefined.
(Also available as __builtin_clzl for unsigned long arguments and __builtin_clzll for unsigned long long.)
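In Python, the closest analogue of __builtin_clzll is int.bit_length(): for a 64-bit x > 0, x.bit_length() equals 64 - __builtin_clzll(x). A minimal sketch under that assumption (top_power_of_two is a hypothetical name):

```python
def top_power_of_two(x):
    # C equivalent for unsigned long long x > 0: 1ULL << (63 - __builtin_clzll(x))
    if x <= 0:
        raise ValueError("x must be positive")  # clz is undefined for 0
    return 1 << (x.bit_length() - 1)

print(top_power_of_two(6))              # 4
print(top_power_of_two(2**64 - 1) == 2**63)  # True
```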
