I have this program in Python:
# ...
print 2 ** (int(input())-1) % 1000000007
The problem is that this program takes a long time on big numbers. I rewrote my code in C++, but sometimes I get a wrong answer. For example, for the number 12345678 the Python code gives 749037894, which is correct, but the C++ code gives -291172004.
This is the C++ code:
#include <iostream>
#include <cmath>
using namespace std;
const int MOD = 1e9 + 7;
int main() {
    // ...
    long long x;
    cin >> x;
    long long a = pow(2, (x-1));
    cout << a % MOD;
}
As already mentioned, your problem is that for a large exponent you have integer overflow.
To overcome this, remember that modular multiplication has the property that:
(A * B) mod C = ((A mod C) * (B mod C)) mod C
You can then implement a 'raise e to the power p, modulo m' function using the fast exponentiation (square-and-multiply) scheme.
Assuming no negative powers:
long long powmod(long long e, long long p, long long m){
    if (p == 0){
        return 1;
    }
    long long a = 1;
    while (p > 1){
        if (p % 2 == 0){
            e = (e * e) % m;
            p /= 2;
        } else{
            a = (a * e) % m;
            e = (e * e) % m;
            p = (p - 1) / 2;
        }
    }
    return (a * e) % m;
}
Note that the remainder is taken after every multiplication, so no overflow can occur as long as a single multiplication doesn't overflow (and that's true for m = 1000000007 with long long).
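For the original problem, a minimal C-style main using powmod (this sketch assumes the function above; it also compiles as C++) could look like:
#include <stdio.h>

/* powmod() as defined above */

int main(void)
{
    long long x;
    if (scanf("%lld", &x) != 1)
        return 1;
    /* 2^(x-1) mod (10^9 + 7), the same thing the Python one-liner computes */
    printf("%lld\n", powmod(2, x - 1, 1000000007LL));
    return 0;
}
For the input 12345678 from the question this should print 749037894, matching the Python result.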
You seem to be dealing with positive numbers that are overflowing the number of bits you've allocated for their storage. Also keep in mind that there is a difference between Python and C/C++ in the way the modulo of a negative value is computed. To get a similar computation, you will need to add the modulus to the value so it's positive before you take the modulo, which is the way it works in Python:
cout << (a+MOD) % MOD;
You may have to add MOD n times till the temporary value is positive before taking its modulo.
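A common shortcut, instead of adding MOD repeatedly, is to reduce first and then shift the result into the positive range:
a = ((a % MOD) + MOD) % MOD;   // now 0 <= a < MOD regardless of the sign of a
After the first % MOD the value already lies in (-MOD, MOD), so adding MOD once is always enough.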
As has been mentioned in many of the other answers, your problem lies in integer overflow.
You can do as deniss suggested and implement your own modmul() and modpow() functions.
If, however, this is part of a project that will need to do plenty of calculations with very large numbers, I would suggest using a "big number library" like GNU GMP or mbedTLS Bignum library.
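For what it's worth, GMP already ships a modular exponentiation routine, so the whole computation from the question becomes a few calls. A minimal sketch (build with -lgmp):
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    unsigned long x = 12345678;          /* test value from the question */
    mpz_t base, exp, mod, result;

    mpz_init_set_ui(base, 2);
    mpz_init_set_ui(exp, x - 1);
    mpz_init_set_ui(mod, 1000000007);
    mpz_init(result);

    mpz_powm(result, base, exp, mod);    /* result = base^exp mod mod */
    gmp_printf("%Zd\n", result);         /* expected 749037894, per the question */

    mpz_clears(base, exp, mod, result, NULL);
    return 0;
}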
In C++ the various fundamental types have fixed sizes. For example, a long long is typically 64 bits wide, but the width varies with the system and other factors. As suggested above, you can check <climits> for your particular environment's limits.
Raising 2 to the power 12345677 would involve shifting the binary number 10 left by 12345676 bits which wouldn't fit in a 64 bit long long (and I suspect is unlikely to fit in most long long implementations).
Another factor to consider is that pow returns a double (or long double, depending on the overload used). You don't say which compiler you are using, but most likely you got a warning about possible truncation or data loss when the result of calling pow is assigned to the long long variable a.
Finally, even if pow is returning a long double, the result of raising 2 to the power 12345677 is far too large to be stored in a long double, so pow is probably returning positive infinity, which then gets converted to some bit pattern that fits in a long long. You can certainly check that by introducing an intermediate long double variable to receive the value of pow, which you can then examine in a debugger.
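A couple of quick checks along these lines, as a small C program (on a typical 64-bit platform with IEEE-754 doubles you should see a 64-bit long long and an infinite result from pow; link with -lm if needed):
#include <stdio.h>
#include <limits.h>   /* LLONG_MAX, CHAR_BIT -- the C counterpart of <climits> */
#include <math.h>     /* pow(), isinf() */

int main(void)
{
    printf("long long is %zu bits, max %lld\n",
           sizeof(long long) * CHAR_BIT, LLONG_MAX);

    double p = pow(2.0, 12345677.0);   /* far beyond the double range (~1.8e308) */
    printf("pow(2, 12345677) = %g, isinf = %d\n", p, isinf(p));
    return 0;
}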
I am making an implementation of a MobileNetV2 in C and comparing it, layer by layer, to a Keras model of the same network to make sure I'm doing things right. I managed to get to a very close approximation of the inference result of the network, but there is an error in the 5th decimal place or so. Looking for the reason for the imprecision I came across something strange.
I am working exclusively with float objects in C, and all of the arrays in Python, including all of the weight arrays and other parameters, are float32.
When exporting my processed image from Python to a .csv file, I passed seven decimal places to the export function: np.savetxt(outfile, twoD_data_slice, fmt='%-1.7e'), which still results in a float but with certain limitations; namely, that last decimal place does not have full precision. However, one of the numbers I got was "0.98431373". When converting this in C it instead gave me "0.98431377".
I asked a question here about this result and was told that using only seven decimal places was a mistake, but this still doesn't explain why Python can handle a number like "0.98431373" as a float32 when in C it gets changed to "0.98431377".
My guess is that Python is using a different 32-bit float than the one I'm using in C, as evidenced by how their float32 can handle a number like "0.98431373" and the float in C cannot. And I think this is what is causing the imprecision of my implementation when compared to the final result in Python. Because if Python can handle numbers like these, then the precision it has while doing calculations for the neural network is higher than in C, or at least different, so the answer should be different as well.
Is the floating point standard different in Python compared to C? And if so, is there a way I can tell Python to use the same format as the one in C?
Update
I changed the way I import files using atof, like so:
void import_image(float data0[224][224][3]) {
    // open file
    FILE *fptr;
    fptr = fopen("image.csv", "r");
    if (fptr == NULL) {
        perror("fopen()");
        exit(EXIT_FAILURE);
    }
    char c = fgetc(fptr); // generic char
    char s[15]; // maximum number of characters is "-x.xxxxxxxe-xx" = 14
    for (int y = 0; y < 224; ++y) { // lines
        for (int x = 0; x < 224; ++x) { // columns
            for (int d = 0; d < 3; ++d) { // depth
                // write string
                int i;
                for (i = 0; c != '\n' && c != ' '; ++i) {
                    assert( 0 <= i && i <= 14 );
                    s[i] = c;
                    c = fgetc(fptr);
                }
                s[i] = '\0';
                float f = atof(s); // convert to float
                data0[y][x][d] = f; // save on array
                c = fgetc(fptr);
            }
        }
    }
    fclose(fptr);
}
I also exported the images from Python using seven decimal places and the result seems more accurate. If the float standard is the same in both, even the partially-precise last digit should be the same. And indeed, there doesn't seem to be any error in the image I import when compared to the one I exported from Python.
There is still, however, an error in the last digits of the final answer in the system. Python displays this answer with eight significant places. I mimic this with %.8g.
My answer:
'tiger cat', 0.42557633
'tabby, tabby cat', 0.35453162
'Egyptian cat', 0.070309319
'lynx, catamount', 0.0073038512
'remote control, remote', 0.0032443549
Python's Answer:
('n02123159', 'tiger_cat', 0.42557606)
('n02123045', 'tabby', 0.35453174)
('n02124075', 'Egyptian_cat', 0.070309244)
('n02127052', 'lynx', 0.007303906)
('n04074963', 'remote_control', 0.0032443653)
The error seems to start appearing after the first convolutional layer, which is where I start making mathematical operations with these values. There could be an error in my implementation, but assuming there isn't, could this be caused by a difference in the way Python operates with floats compared to C? Is this imprecision expected, or is it likely an error in the code?
A 32-bit floating point number can encode about 2^32 different values.
0.98431373 is not one of them.
Finite floating point values are of the form: some_integer * power-of-two.
The closest choice to 0.98431373 is 0.98431372_64251708984375, which is 16514044 * 2^-24.
Printing 0.98431372_64251708984375 to 8 fractional decimal places gives 0.98431373. That may appear to be the 32-bit float value, but its exact value differs by a small amount.
in C that gets changed to "0.98431377"
0.98431377 is not an expected output of a 32-bit float, as the next larger float is 0.98431378_6029815673828125. Most likely OP's conversion code to C results in a 64-bit double with some unposted conversion artifacts.
"the way I import the data to C is I take the mantissa, convert it to float, then take the exponent, convert it to long, and then multiply the mantissa by 10^exponent" is too vague. Better to post code than only a description of code.
Is the floating point standard different in Python compared to C?
They could differ, but likely are the same.
And if so, is there a way I can tell Python to use the same format as the one in C?
Not really; more likely the other way around. I am certain C allows more variations on FP than Python.
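A small C program makes the point concrete (the printed digits assume IEEE-754 single and double precision; other C libraries may format them slightly differently):
#include <stdio.h>

int main(void)
{
    float f = 0.98431373f;   /* stored as the nearest float, 16514044 * 2^-24       */
    printf("%.8f\n", f);     /* 0.98431373 -- looks exact at 8 decimal places        */
    printf("%.16f\n", f);    /* 0.9843137264251709 -- closer to what is really stored */
    return 0;
}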
I was solving a problem on Codeforces: Here is the Question
I wrote Python code to solve it:
n=int(input())
print(0 if ((n*(n+1))/2)%2==0 else 1)
But it failed for the test case 1999999997. See Submission - [TestCase 6].
Why did it fail, despite Python being able to handle large numbers effectively? [See this Thread]
Also, similar logic worked flawlessly when I coded it in C++ [See Submission Here]:
#include<bits/stdc++.h>
using namespace std;
int main(){
    int n;
    cin>>n;
    long long int sum=1ll*(n*(n+1))/2;
    if(sum%2==0) cout<<0;
    else cout<<1;
    return 0;
}
Ran a test based on the insight from #juanpa.arrivillaga and this has been a great rabbit hole:
n = 1999999997
temp = n * (n+1)
# type(temp) is int, temp is 3999999990000000006. We can clearly see that after
# dividing by 2 we should get an odd number, and therefore output 1
divided = temp / 2
# type(divided) is float. Printing divided for me gives 1.999999995e+18
# divided % 2 is 0
divided_int = temp // 2
# type(divided_int) is int. Printing divided_int for me gives 1999999995000000003
# The // operator forces integer division and will always return an integer: 7 // 2 is 3, not 3.5
As per the other answer you have linked, the int type in Python can handle very large numbers.
Float can also handle large numbers, but there are precision issues in how floats are represented. The crux of it is that not every large integer can be captured accurately as a float: in many scenarios the difference between 1.999999995e+18 and 1.999999995000000003e+18 is so minute it won't matter, but this is a scenario where it does, as you care about the final digit of the number.
You can learn more about this by watching this video
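The same rabbit hole can be reproduced in C, since Python's / on these integers behaves much like routing the value through a C double. A rough sketch, assuming 64-bit IEEE-754 doubles:
#include <stdio.h>

int main(void)
{
    long long temp = 3999999990000000006LL;   /* n * (n + 1) for n = 1999999997 */

    double via_float = (double)temp / 2.0;    /* analogue of Python's /  */
    long long via_int = temp / 2;             /* analogue of Python's // */

    printf("%.0f\n", via_float);   /* 1999999995000000000 -- the trailing ...003 is gone */
    printf("%lld\n", via_int);     /* 1999999995000000003 */

    /* parity check: the float path says "even", the integer path says "odd" */
    printf("%lld %lld\n", (long long)via_float % 2, via_int % 2);
    return 0;
}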
As mentioned by #juanpa.arrivillaga and #DarrylG in the comments, I should have used the floor division operator // for integer division; the anomaly was caused by the float division performed by the / operator.
So, the correct code should be:-
n=int(input())
print(0 if (n*(n+1)//2)%2==0 else 1)
As a Python intermediate learner, I made an 8 ball in Python.
Now that I am starting to learn C, is there a way to simulate the way random.choice selects a string from a list of strings, but in C?
The closest thing to a "list of strings" in C is an array of string pointers; and the only standard library function that produces random numbers is rand(), defined in <stdlib.h>.
A simple example:
#include <stdio.h>
#include <stdlib.h>
#include <time.h> // needed for the usual srand() initialization
int main(void)
{
    const char *string_table[] = { // array of pointers to constant strings
        "alpha",
        "beta",
        "gamma",
        "delta",
        "epsilon"
    };
    int table_size = 5; // This must match the number of entries above

    srand(time(NULL)); // randomize the start value

    for (int i = 1; i <= 10; ++i)
    {
        const char *rand_string = string_table[rand() % table_size];
        printf("%2d. %s\n", i, rand_string);
    }
    return 0;
}
That will generate and print ten random choices from an array of five strings.
The string_table variable is an array of const char * pointers. You should always use a pointer to const char to refer to a literal character string like "alpha"; it keeps you from using that pointer in a context where the string contents might be changed.
The random numbers are what are called "pseudorandom"; statistically uncorrelated, but completely determined by a starting "seed" value. Using the statement srand(time(NULL)) takes the current time/date value (seconds since some starting date) and uses that as a seed that won't be repeated in any computer's lifetime. But you will get exactly the same "random" numbers if you manage to run the program twice in the same second. This is easy to do in a shell script, for example. A higher-resolution timestamp would be nice, but there isn't anything useful in the C standard library.
The rand() function returns a non-negative int value from 0 to some implementation-dependent maximum value. The symbolic constant RAND_MAX has that value. The expression rand() % N will return the remainder from dividing that value by N, which is a number from 0 to N-1.
Aconcagua has pointed out that this isn't ideal: if N doesn't evenly divide the number of values rand() can produce (RAND_MAX + 1), there will be a bias toward smaller numbers. It's okay for now, but plan to learn other methods later if you do serious simulation or statistical work; and if you get to that point, you probably won't be using the built-in rand() function anyway.
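For reference, one common bias-free approach is rejection sampling: discard the top sliver of rand()'s range so that every index becomes equally likely. A minimal sketch (assuming 1 <= n <= RAND_MAX):
#include <stdlib.h>

/* Return an unbiased random index in [0, n), assuming 1 <= n <= RAND_MAX. */
static int rand_index(int n)
{
    /* Largest multiple of n that is <= RAND_MAX; anything at or above it is rejected. */
    int limit = RAND_MAX - (RAND_MAX % n);
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}
With that helper, string_table[rand_index(table_size)] picks uniformly even when table_size doesn't divide the range of rand() evenly.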
You can write a function if you know the size of your array and use rand() % size to get a random index from your array. Then return the value of arr[randidx]
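A minimal sketch of such a function (the name and signature here are just one possible choice; call srand() once at startup, as in the answer above, before using it):
#include <stdlib.h>

/* Pick a random element from an array of string pointers. */
const char *random_choice(const char *arr[], int size)
{
    int randidx = rand() % size;   /* index in [0, size - 1] */
    return arr[randidx];
}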
I am currently using the book 'Programming in D' to learn D. I tried to solve the problem of summing up the squares of the numbers from 1 to 10000000. I first took a functional approach with map and reduce, but as the numbers get bigger I have to convert them to BigInt to get the correct output.
long num = 10000001;
BigInt result;
result = iota(1,num).map!(a => to!BigInt(a * a)).reduce!((a,b) => (a + b));
writeln("The sum is : ", result);
The above takes 7s to finish when compiled with dmd -O. I profiled the program and most of the time is spent in the BigInt calls. Although the square of each number fits into a long, the values have to be converted to BigInt so that reduce sums them and returns the correct total. The Python program takes only 3 seconds to finish; with num = 100000000 the D program takes 1 minute and 13 seconds. Is there a way to optimize the calls to BigInt? The products themselves fit in a long, but they have to be converted to BigInt objects so that the reduce operation gives the right result. I tried pushing the squares into a BigInt array, but that's also slower. I also tried converting all the numbers to BigInt up front:
auto bigs_map_nums = iota(1,num).map!(a => to!BigInt(a)).array;
auto bigs_map = sum(bigs_map_nums.map!(a => (a * a)).array);
But it's also slower. I read the answers at How to optimize this short factorial function in scala? (Creating 50000 BigInts). Is it a problem with the implementation of multiplication for big integers in D too? Is there a way to optimize the function calls to BigInt?
Python code:
timeit.timeit('print sum(map(lambda num : num * num, range(1,10000000)))',number=1)
333333283333335000000
3.58552622795105
The code was executed on a dual-core 64 bit linux laptop with 2 GB RAM.
python : 2.7.4
dmd : DMD64 D Compiler v2.066.1
Without range coolness: foreach(x; 0 .. num) result += x * x;
With range cool(?)ness:
import std.functional: reverseArgs;
result = iota(1, num)
.map!(a => a * a)
.reverseArgs!(reduce!((a, b) => a + b))(BigInt(0) /* seed */);
The key is to avoid BigInting every element, of course.
The range version is a little slower than the non-range one. Both are significantly faster than the python version.
Edit: Oh! Oh! It can be made much more pleasant with std.algorithm.sum:
result = iota(1, num)
.map!(a => a * a)
.sum(BigInt(0));
The python code is not equivalent to the D code, in fact it does a lot less.
Python uses an int, then promotes it to long() when the result is bigger than what can be stored in an int() type. Internally, CPython (at least) uses a long number representation to store integers bigger than the native int can hold, which is at least 32 bits. Up until that overflow, normal CPU instructions can be used for the multiplication, and they are quite a bit faster than bigint multiplication.
D's BigInt implementation treats the numbers as BigInt from the start and uses the expensive multiplication operation from 1 until the end. Much more work to be done there.
It's interesting how complicated the multiplication can be when we talk about BigInts.
The D implementation is
https://github.com/D-Programming-Language/phobos/blob/v2.066.1/std/internal/math/biguintcore.d#L1246
Python starts by doing
static PyObject *
int_mul(PyObject *v, PyObject *w)
{
    long a, b;
    long longprod;            /* a*b in native long arithmetic */
    double doubled_longprod;  /* (double)longprod */
    double doubleprod;        /* (double)a * (double)b */

    CONVERT_TO_LONG(v, a);
    CONVERT_TO_LONG(w, b);
    /* casts in the next line avoid undefined behaviour on overflow */
    longprod = (long)((unsigned long)a * b);
    ... // check if we have overflowed
    {
        const double diff = doubled_longprod - doubleprod;
        const double absdiff = diff >= 0.0 ? diff : -diff;
        const double absprod = doubleprod >= 0.0 ? doubleprod : -doubleprod;
        /* absdiff/absprod <= 1/32 iff
           32 * absdiff <= absprod -- 5 good bits is "close enough" */
        if (32.0 * absdiff <= absprod)
            return PyInt_FromLong(longprod);
        else
            return PyLong_Type.tp_as_number->nb_multiply(v, w);
    }
}
and if the number is bigger than what a long can hold, it does a Karatsuba multiplication. The implementation is in:
http://svn.python.org/projects/python/trunk/Objects/longobject.c (k_mul function)
The equivalent code would wait to use BigInts until there are no native data types that can hold the number in question.
DMD's backend does not emit highly optimized code. For fast programs, compile with GDC or LDC.
On my computer, I get these timings:
Python: 3.01
dmd -O -inline -release: 3.92
ldmd2 -O -inline -release: 2.14
I have a checksum function in Python:
def checksum(data):
    a = b = 0
    l = len(data)
    for i in range(l):
        a += ord(data[i])
        b += (l - i) * ord(data[i])
    return (b << 16) | a, a, b
that I am trying to port to a C module for speed. Here's the C function:
static PyObject *
checksum(PyObject *self, PyObject *args)
{
    int i, length;
    unsigned long long a = 0, b = 0;
    unsigned long long checksum = 0;
    char *data;

    if (!PyArg_ParseTuple(args, "s#", &data, &length)) {
        return NULL;
    }
    for (i = 0; i < length; i++) {
        a += (int)data[i];
        b += (length - i) * (int)data[i];
    }
    checksum = (b << 16) | a;
    return Py_BuildValue("(Kii)", checksum, (int)a, (int)b);
}
I use it by opening a file and feeding it 4096-byte blocks of data. They both return the same values for small strings, but when I feed them binary data straight from a file, the C version returns wildly different values. Any help would be appreciated.
I would guess that you have some kind of overflow in your local variables. Probably b gets too large. Just dump the values for debugging purposes and you should see whether that's the problem. You mention that you are porting the method for performance reasons; have you checked psyco? It might be fast enough and much easier. There are other tools that compile parts of Python code to C on the fly, but I don't have their names in my head.
I'd suggest that the original checksum function is "incorrect": the value returned for checksum is of unlimited size (for any given size in MB, you could construct an input whose checksum is at least of that size). If my calculations are correct, the value can fit in 64 bits for inputs of less than 260 MB, and b can fit in an integer for anything less than 4096 bytes. Now, I might be off with the numbers, but it means that for larger inputs the two functions are guaranteed to work differently.
To translate the first function to C, you'd need to keep a and b in Python integers, and to perform the last calculation as a Python expression. This can be improved, though:
You could use C long long variables to store an intermediate sum and add it to the Python integers after a certain number of iterations (a rough sketch follows at the end of this answer). If the number of iterations is n, the maximum value for a is n * 255, and for b it is len(data) * n * 255. Try to keep those under 2**63-1 when storing them in C long long variables.
You can use long long instead of unsigned long long, and raise a RuntimeError every time it gets negative in debug mode.
Another solution would be to limit the Python equivalent to 64 bits by using a & 0xffffffffffffffff and b & 0xffffffffffffffff.
The best solution would be to use another kind of checksum, like binascii.crc32.
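To make the first suggestion concrete, here is a rough, untested sketch of the "flush into Python integers" idea. The helper name, the flush interval, and the two-value return are my own choices; it keeps the question's casts and argument parsing, error-path cleanup is trimmed for brevity, and the final (b << 16) | a is left to Python so nothing can overflow:
#include <Python.h>

/* *acc += v, where *acc is an unbounded Python integer. Returns 0 on success. */
static int
add_to_pyint(PyObject **acc, long long v)
{
    PyObject *delta = PyLong_FromLongLong(v);
    PyObject *sum;
    if (delta == NULL)
        return -1;
    sum = PyNumber_Add(*acc, delta);
    Py_DECREF(delta);
    if (sum == NULL)
        return -1;
    Py_DECREF(*acc);
    *acc = sum;
    return 0;
}

static PyObject *
checksum_parts(PyObject *self, PyObject *args)
{
    const char *data;
    int length, i;
    long long a = 0, b = 0;
    PyObject *py_a, *py_b;

    if (!PyArg_ParseTuple(args, "s#", &data, &length))
        return NULL;

    py_a = PyLong_FromLong(0);
    py_b = PyLong_FromLong(0);
    if (py_a == NULL || py_b == NULL) {
        Py_XDECREF(py_a);
        Py_XDECREF(py_b);
        return NULL;
    }

    for (i = 0; i < length; i++) {
        a += (int)data[i];
        b += (long long)(length - i) * (int)data[i];
        /* Flush the partial sums into the Python integers before they can overflow. */
        if ((i & 0xFFFFF) == 0xFFFFF) {
            if (add_to_pyint(&py_a, a) < 0 || add_to_pyint(&py_b, b) < 0)
                return NULL;
            a = b = 0;
        }
    }
    if (add_to_pyint(&py_a, a) < 0 || add_to_pyint(&py_b, b) < 0)
        return NULL;

    return Py_BuildValue("(NN)", py_b, py_a);
}
On the Python side you would then do b, a = checksum_parts(data) followed by value = (b << 16) | a, which never overflows because Python integers are unbounded.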