Discrepancy between floats in C and Python [closed] - python

I am making an implementation of a MobileNetV2 in C and comparing it, layer by layer, to a Keras model of the same network to make sure I'm doing things right. I managed to get to a very close approximation of the inference result of the network, but there is an error in the 5th decimal place or so. Looking for the reason for the imprecision I came across something strange.
I am working exclusively with float objects in C, and all of the arrays in Python, including all of the weight arrays and other parameters, are float32.
When exporting my processed image from Python to a .csv file, I gave the export function seven decimal places: np.savetxt(outfile, twoD_data_slice, fmt='%-1.7e'). The result is still a float, but with certain limitations; namely, that last decimal place does not have full precision. However, one of the numbers I got was "0.98431373", and when I converted it in C I instead got "0.98431377".
I asked a question here about this result and was told that using only seven decimal places was a mistake, but that still doesn't explain why Python can handle a number like "0.98431373" as a float32 when in C it gets changed to "0.98431377".
My guess is that Python is using a different 32-bit float than the one I'm using in C, since its float32 can apparently handle a number like "0.98431373" and the float in C cannot. I think this is what is causing the imprecision of my implementation compared to the final result in Python: if Python can handle numbers like these, then the precision it has while doing the neural network's calculations is higher than in C, or at least different, so the answer should be different as well.
Is the floating point standard different in Python compared to C? And if so, is there a way I can tell Python to use the same format as the one in C?
Update
I changed the way I import files using atof, like so:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

void import_image(float data0[224][224][3]) {
    // open file
    FILE *fptr = fopen("image.csv", "r");
    if (fptr == NULL) {
        perror("fopen()");
        exit(EXIT_FAILURE);
    }

    int c = fgetc(fptr);  // int, not char, so EOF can be detected
    char s[16];           // longest token "-x.xxxxxxxe-xx" is 14 chars + '\0'

    for (int y = 0; y < 224; ++y) {          // lines
        for (int x = 0; x < 224; ++x) {      // columns
            for (int d = 0; d < 3; ++d) {    // depth
                // copy one token into s
                int i;
                for (i = 0; c != '\n' && c != ' ' && c != EOF; ++i) {
                    assert(0 <= i && i < (int) sizeof s - 1);
                    s[i] = (char) c;
                    c = fgetc(fptr);
                }
                s[i] = '\0';
                float f = (float) atof(s);   // convert the token to float
                data0[y][x][d] = f;          // save in the array
                c = fgetc(fptr);             // step past the separator
            }
        }
    }
    fclose(fptr);
}
I also exported the images from Python using seven decimal places, and the result seems more accurate. If the float standard is the same on both sides, even that last, only partially precise digit should match. And indeed, there doesn't seem to be any error in the image I import when compared to the one I exported from Python.
There is still, however, an error in the last digits of the network's final answer. Python displays this answer with eight significant figures, and I mimic that with %.8g.
My answer:
'tiger cat', 0.42557633
'tabby, tabby cat', 0.35453162
'Egyptian cat', 0.070309319
'lynx, catamount', 0.0073038512
'remote control, remote', 0.0032443549
Python's Answer:
('n02123159', 'tiger_cat', 0.42557606)
('n02123045', 'tabby', 0.35453174)
('n02124075', 'Egyptian_cat', 0.070309244)
('n02127052', 'lynx', 0.007303906)
('n04074963', 'remote_control', 0.0032443653)
The error seems to start appearing after the first convolutional layer, which is where I start doing arithmetic with these values. There could be an error in my implementation, but assuming there isn't, could this be caused by a difference in the way Python operates on floats compared to C? Is this imprecision expected, or is it more likely an error in my code?

A 32-bit floating point number can encode about 2^32 different values.
0.98431373 is not one of them.
Finite floating point values are of the form: some_integer * power-of-two.
The closest choice to 0.98431373 is 0.98431372_64251708984375, which is 16514044 * 2^-24.
Printing 0.98431372_64251708984375 to 8 fractional decimal places gives 0.98431373. That may appear to be the 32-bit float value, but its exact value differs by a small amount.
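A short C check makes this visible, assuming IEEE-754 binary32 and a conforming printf (link with -lm for nextafterf):

#include <math.h>
#include <stdio.h>

int main(void) {
    float f = 0.98431373f;                   /* rounds to the nearest binary32 value */
    printf("%.8f\n", f);                     /* 8 fractional places: looks exact     */
    printf("%.25f\n", f);                    /* the exact stored value               */
    printf("%.25f\n", nextafterf(f, 1.0f));  /* the next larger representable float  */
    return 0;
}

This should print 0.98431373, then 0.9843137264251708984375000, then 0.9843137860298156738281250.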
in C that gets changed to "0.98431377"
0.98431377 is not an expected output of a 32-bit float, as the next larger float is 0.98431378_6029815673828125. Certainly OP's conversion code to C goes through a 64-bit double and picks up conversion artifacts that cannot be pinned down, since the code was not posted.
"the way I import the data to C is I take the mantissa, convert it to float, then take then exponent, convert it to long, and then multiply the mantissa by 10^exponent" is too vague. Best to post code than only a description of code.
Is the floating point standard different in Python compared to C?
They could differ, but likely are the same.
And if so, is there a way I can tell Python to use the same format as the one in C?
Not really. More likely the other way around. I am certain C allows more variations on floating point than Python does.
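Note also that even when both sides use the same IEEE-754 binary32 format, the order in which a layer's sums and products are evaluated changes how each intermediate result is rounded, so small last-digit differences between two correct implementations are expected. A tiny C sketch of the effect, with made-up values:

#include <stdio.h>

int main(void) {
    float a = 1e8f, b = -1e8f, c = 3.14f;

    float left  = (a + b) + c;  /* a + b is exactly 0, so the 3.14 survives */
    float right = a + (b + c);  /* b + c rounds to -1e8, the 3.14 is lost   */

    printf("(a + b) + c = %g\n", left);
    printf("a + (b + c) = %g\n", right);
    return 0;
}

A different summation order, vectorization, or a compiler contracting a multiply and an add into a fused multiply-add all shift the rounding in this way, which is usually enough to account for differences in the fifth decimal place or so.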

Related

Problem in handling large number in Python

I was solving a problem on Codeforces: here is the question.
I wrote Python code to solve it:
n=int(input())
print(0 if ((n*(n+1))/2)%2==0 else 1)
But it failed for the test case 1999999997. See submission [TestCase-6].
Why did it fail, even though Python can handle large numbers effectively? [See this thread]
Also, similar logic worked flawlessly when I coded it in C++ [see submission here]:
#include <bits/stdc++.h>
using namespace std;
int main() {
    int n;
    cin >> n;
    long long int sum = 1ll * (n * (n + 1)) / 2;
    if (sum % 2 == 0) cout << 0;
    else cout << 1;
    return 0;
}
Ran a test based on the insight from @juanpa.arrivillaga, and this has been a great rabbit hole:
n = 1999999997
temp = n * (n + 1)
# type(temp) is int; temp is 3999999990000000006. We can clearly see that after dividing by 2 we should get an odd number, and therefore output 1
divided = temp / 2
# type(divided) is float. Printing divided for me gives 1.999999995e+18
# divided % 2 is 0
divided_int = temp // 2
# type(divided_int) is int. Printing divided_int for me gives 1999999995000000003
# // forces integer division and always returns an integer: 7 // 2 is 3, not 3.5
As per the other answer you have linked, the int type in python can handle very large numbers.
Float can also handle large numbers, but floating point has limited precision: not every large integer can be represented exactly as a float. In many scenarios the difference between 1.999999995e+18 and 1.999999995000000003e+18 is so minute that it won't matter, but this is a scenario where it does, because you care about the very last digit of the number.
You can learn more about this by watching this video
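For comparison, the same experiment in C (a small sketch using a 64-bit integer and a double) shows exactly which digits the float-style division loses:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    /* n*(n+1) for n = 1999999997; too big for a double's 53-bit mantissa */
    int64_t temp = (int64_t)1999999997 * 1999999998;  /* 3999999990000000006 */

    double  divided     = (double)temp / 2.0;  /* low digits already lost    */
    int64_t divided_int = temp / 2;            /* exact: 1999999995000000003 */

    printf("floating-point division: %.0f\n", divided);
    printf("integer division       : %" PRId64 "\n", divided_int);
    return 0;
}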
As mentioned by @juanpa.arrivillaga and @DarrylG in the comments, I should have used the floor division operator // for integer division; the anomaly was caused by the float division done by the / operator.
So, the correct code should be:
n=int(input())
print(0 if (n*(n+1)//2)%2==0 else 1)

A matrix (vector) in armadillo gains new decimal places after loading it from a file

I want to port the source code from Python to C++ (#include <armadillo>).
I have a vector (matrix) saved to "my_vec.txt" with dimension 1x200:
-0.082833
0.151422
-0.088526
...
...
0.115863
0.131043
0.041844
I want to calculate the dot product of two my_vec's in Python (this is just an example for testing).
result = my_vec.dot(my_vec)
print (str.format("{0:.10f}", result))
gives me 6.1402435303 as a result
When I try to do the same operation in C++ with Armadillo:
float result;
result = dot(my_vec, my_vec);
std::cout << std::setprecision(10) << result;
I get 6.140244007.
So I looked at my float vector my_vec in Armadillo after loading the values from the text file. This is what it looks like:
-8.283299952745e-002
1.514219939709e-001
-8.852600306273e-002
...
...
1.158630028367e-001
1.310430020094e-001
4.184399917722e-002
So many decimal places were added that do not exist in my_vec.txt. Of course, this difference has an influence on the further computation. How can I prevent that?
It looks like you used less precision in your C++ code. C++ float usually corresponds to NumPy float32; if you want precision equivalent to a NumPy float64, that's generally C++ double.
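For illustration, a rough C sketch of why the accumulator's precision matters for a 200-element dot product; the values are made-up stand-ins, not the ones from my_vec.txt:

#include <stdio.h>

int main(void) {
    double acc_double = 0.0;
    float  acc_float  = 0.0f;

    for (int i = 0; i < 200; i++) {
        float x = 0.1f + (float)i * 1e-3f;    /* stand-in for the 200 entries */
        acc_double += (double)x * (double)x;  /* each step rounded to 53 bits */
        acc_float  += x * x;                  /* each step rounded to 24 bits */
    }

    printf("double accumulator: %.10f\n", acc_double);
    printf("float  accumulator: %.10f\n", acc_float);
    return 0;
}

The two totals typically agree to only about six or seven significant digits, which is the same kind of gap as 6.1402435303 versus 6.140244007 above.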

Understanding Two's complement to float (Texas Instruments Sensor Tag)

I found some sample code to extract temperature from the Texas Instruments Sensor Tag on github:
https://github.com/msaunby/ble-sensor-pi/blob/master/sensortag/sensor_calcs.py
I don't understand what the following code does:
tosigned = lambda n: float(n-0x10000) if n>0x7fff else float(n)
How I read the above piece of code:
if n > 0x7fff: n = float(n - 0x10000)
else: n = float(n)
Basically, what is happening is that the two's complement value (n) is converted to float. Why should this only happen when the value of n is greater than 0x7fff? If the value is 0x7fff or smaller, then we just convert n to float. Why? I don't understand this.
The sample code from Texas Instruments can be found here:
http://processors.wiki.ti.com/index.php/SensorTag_User_Guide#SensorTag_Android_Development
Why is the return value divided by 128.0 in this function in the TI sample code?
private double extractAmbientTemperature(BluetoothGattCharacteristic c) {
    int offset = 2;
    return shortUnsignedAtOffset(c, offset) / 128.0;
}
I did ask this to the developer, but didn't get a reply.
On disk and in memory, integers are stored with a certain bit width. Modern Python's ints allow us to ignore most of that detail, because they can magically expand to whatever size is necessary, but sometimes, when we get values from disk or from other systems, we have to think about how they are actually stored.
The positive values of a 16-bit signed integer will be stored in the range 0x0001-0x7fff, and its negative values from 0x8000-0xffff. If this value was read in some way that didn't already check the sign bit (perhaps as an unsigned integer, or part of a longer integer, or assembled from two bytes) then we need to recover the sign.
How? Well, if the value is over 0x7fff we know that it should be negative, and negative values are stored as two's complement. So we simply subtract 0x10000 from it and we get the negative value.
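If it helps, the same conversion written as a small C function (just a sketch mirroring the Python lambda):

#include <stdint.h>
#include <stdio.h>

/* mirror of: tosigned = lambda n: float(n-0x10000) if n>0x7fff else float(n) */
static float tosigned(uint16_t n) {
    return (n > 0x7fff) ? (float)((int32_t)n - 0x10000) : (float)n;
}

int main(void) {
    printf("%.1f\n", tosigned(0x3fff)); /* 16383.0: sign bit clear, value unchanged */
    printf("%.1f\n", tosigned(0xffff)); /* -1.0: sign bit set, 0x10000 subtracted   */
    return 0;
}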
So you're converting between a signed 16-bit value and a float. In Python, negative numbers are simply displayed with a minus sign, so you can ignore the way they are actually represented in memory. But in the raw hex value, the sign is encoded as part of the value itself. So, to convert correctly, the shift by 0x10000 is put in.
You can play with this yourself using the Python interpreter:
tosigned = lambda n: float(n-0x10000) if n>0x7fff else float(n)
print(tosigned(0x3fff))
versus:
unsigned = lambda n: float(n)
Check this out to learn more:
http://www.swarthmore.edu/NatSci/echeeve1/Ref/BinaryMath/NumSys.html

Converting Algorithm from Python to C: Suggestions for Using bin() in C?

So essentially, I have a homework problem to write in C, and instead of taking the easy route, I thought I would implement a little algorithm for some coding practice and to impress my professor. The assignment is meant to help us pick up C (or review it; the former for me), and it asks us to return all of the integers that divide a given integer (such that there is no remainder).
What I did in Python was to create an is_prime() method, a pool_of_primes() method, and a combinations() method. So far I have everything done in C up to the combinations() method. The problem I am running into now is some syntax errors (i.e. not being able to alter a string by declaration) and, mainly, the binary string I was using to track what should be included in my list of combinations. Without being able to alter my string by declaration, the approach from my Python code kind of breaks in C...
Here is the python code:
def combinations(aList):
    '''
    The idea is to provide a list of ints and combinations will provide
    all of the combinations of that list using binary.
    To track the combinations, we use a string representation of binary
    and count down from there. Each spot in the binary represents an
    on/off (included/excluded) indicator for the numbers.
    '''
    length = len(aList)  # Have this figured out
    s = ""
    canidates = 0
    nList = []
    if (length >= 21):
        print("\nToo many possible canidates for integers that divide our number.\n")
        return False
    for i in range(0, length):
        s += "1"
        canidates += pow(2, i)
    # We now have a string for on/off switch of the elements in our
    # new list. Canidates is the size of the new list.
    nList.append(1)
    while (canidates != 0):
        x = 1
        for i in range(0, length):
            if (int(s[i]) == 1):
                x = x * aList[i]
        nList.append(x)
        canidates -= 1
        s = ''
        temp = bin(canidates)
        for i in range(2, len(temp)):
            s = s + temp[i]
        if (len(s) != length):
            # This part is needed in cases of [1...000-1 = 0...111]
            while (len(s) != length):
                s = '0' + s
    return nList
Sorry if the code is too lengthy or not optimized to your liking, but it works, and it works well :)
Again, I currently have everything that aList would hold stored as a singly-linked list in C (which I am able to print and use). I also have a little macro in C to convert a binary string to an integer:
/* B(1010) stringifies the token and parses it as binary digits, giving 10 */
#define B(x) S_to_binary_(#x)
static inline unsigned long long S_to_binary_(const char *s)
{
    unsigned long long i = 0;
    while (*s) {
        i <<= 1;
        i += *s++ - '0';
    }
    return i;
}
This may be Coder's Block setting in, but I am not seeing how I can change the binary in the same way that I did in Python... Any help would be greatly appreciated! Also, as a note, what is typically the best way to return a finalized code in C?
EDIT:
Accidentally took credit for the macro above.
UPDATE
I just finished the code, and I uploaded it onto GitHub. I would like to thank @nneonneo for providing the step that I needed to finish it, with exemplary code. If anyone has any further suggestions about the code, I would be happy to see their ideas on [GitHub]!
Why use a string at all? Keep it simple: use an integer, and use bitwise math to work with the number. Then you don't have to do any conversions back and forth. It will also be loads faster.
You can use a uint32_t to store the "bits", which is enough to hold 32 bits (since you max out at 21, this should work great).
For example, you can loop over the bits that are set by using a loop like this:
uint32_t my_number = ...;
for (int i = 0; i < 32; i++) {
    if (my_number & (1u << i)) {
        /* bit i is set */
    }
}
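For example, here is a small self-contained sketch of that loop computing the product of every subset of a made-up candidate list, which is essentially what the Python combinations() above does with its binary string:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    int list[] = {2, 3, 5};  /* stand-in for the pool of primes */
    int length = 3;

    /* every non-zero mask selects one subset of the list */
    for (uint32_t mask = 1; mask < (1u << length); mask++) {
        long long product = 1;
        for (int i = 0; i < length; i++) {
            if (mask & (1u << i))  /* bit i on -> include list[i] */
                product *= list[i];
        }
        printf("mask %" PRIu32 " -> product %lld\n", mask, product);
    }
    return 0;
}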

How does python represent such large integers?

In C, C++, and Java, an integer has a fixed range. One thing I realized in Python is that I can compute really large integers, such as pow(2, 100). The equivalent code in C, pow(2, 100), would clearly overflow, since on a 32-bit architecture the unsigned integer type ranges from 0 to 2^32-1. How is it possible for Python to calculate these large numbers?
Basically, big numbers in Python are stored in arrays of 'digits'. That's in quotes because each 'digit' could actually be quite a big number on its own.
You can check the implementation details in longintrepr.h and longobject.c:
There are two different sets of parameters: one set for 30-bit digits,
stored in an unsigned 32-bit integer type, and one set for 15-bit
digits with each digit stored in an unsigned short. The value of
PYLONG_BITS_IN_DIGIT, defined either at configure time or in pyport.h,
is used to decide which digit size to use.
/* Long integer representation.
The absolute value of a number is equal to
SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
Negative numbers are represented with ob_size < 0;
zero is represented by ob_size == 0.
In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
digit) is never zero. Also, in all cases, for all valid i,
0 <= ob_digit[i] <= MASK.
The allocation function takes care of allocating extra memory
so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.
*/
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
How is it possible for Python to calculate these large numbers?
How is it possible for you to calculate these large numbers if you only have the 10 digits 0-9? Well, you use more than one digit!
Bignum arithmetic works the same way, except the individual "digits" are not 0-9 but 0-4294967295 (base 2^32) or 0-18446744073709551615 (base 2^64).
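To make that concrete, here is a minimal sketch of the idea in C, using base-2^16 'digits' so the carry is easy to follow (CPython's real implementation in longobject.c uses 15- or 30-bit digits and also tracks sign and size):

#include <inttypes.h>
#include <stdio.h>

#define NDIGITS 4

/* add two little-endian arrays of base-2^16 digits, propagating the carry */
static void big_add(const uint16_t a[NDIGITS], const uint16_t b[NDIGITS],
                    uint16_t out[NDIGITS + 1]) {
    uint32_t carry = 0;
    for (int i = 0; i < NDIGITS; i++) {
        uint32_t sum = (uint32_t)a[i] + b[i] + carry;
        out[i] = (uint16_t)(sum & 0xffff);  /* low 16 bits become this digit */
        carry  = sum >> 16;                 /* the rest carries to the next  */
    }
    out[NDIGITS] = (uint16_t)carry;
}

int main(void) {
    /* 0xffffffffffffffff + 1, split into 16-bit digits, least significant first */
    uint16_t a[NDIGITS] = {0xffff, 0xffff, 0xffff, 0xffff};
    uint16_t b[NDIGITS] = {1, 0, 0, 0};
    uint16_t r[NDIGITS + 1];

    big_add(a, b, r);
    for (int i = NDIGITS; i >= 0; i--)
        printf("%04" PRIx16, r[i]);  /* prints 00010000000000000000, i.e. 2^64 */
    printf("\n");
    return 0;
}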
