float32 representation to float64 NumPy Python


Situation
I need to read data from a file using fits from astropy.io, which uses numpy internally.
Some of the values I get when reading are very small negative float32 numbers, although the data should not contain any negative values (because of the data characteristics).
Questions
Can it be that those numbers were very small float64 values that became negative when they were read and cast to float32? If yes, how small do they have to be?
Is there a way to rewind the process, i.e., to get the original positive very small float64 value?

Can it be that those numbers were very small float64 values that became negative when they were read and cast to float32? If yes, how small do they have to be?
No - if the magnitude of the original float64 value was too small to be represented as a float32, then it would simply be equal to zero after casting:
import numpy as np
tiny = np.finfo(np.float64).tiny # smallest positive normal float64 value
print(tiny)
# 2.22507385851e-308
print(tiny == 0)
# False
print(np.float32(tiny))
# 0.0
print(np.float32(tiny) == 0)
# True
Casting from one signed representation to another always preserves the sign bit.
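Both claims can be checked directly. This is only a minimal sketch: the array name values and the choice of test inputs are illustrative, not taken from the question.

```python
import numpy as np

tiny = np.finfo(np.float64).tiny  # smallest positive normal float64

# Illustrative input: one tiny positive and one tiny negative float64.
values = np.array([tiny, -tiny], dtype=np.float64)
as_f32 = values.astype(np.float32)

is_zero = as_f32 == 0            # both magnitudes underflow to zero
sign_bits = np.signbit(as_f32)   # the sign bit survives: +0.0 and -0.0
print(is_zero, sign_bits)
```

A positive input never comes out with its sign bit set, so a negative float32 in the file must already have been negative before the cast.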
Is there a way to rewind the process, i.e., to get the original positive very small float64 value?
No - casting down from 64 to 32 bit means you are effectively throwing away half of the information in the original representation, and once it's gone there's no magic way to recover it.
A much more plausible explanation for the negative values is that they result from rounding errors on calculations performed on the data before it was stored.

Related

Converting Bytes to Fixed point

I'm reading some bytes from a binary file and I'm trying to convert them to decimals. They are big endian, so I try to unpack them as unpack('>f',bytes), but I'm getting the wrong result.
According to the specification I should be looking for
Fixed point numbers with 8 bits before the binary point and 24 bits after the binary point. Three guard bits are reserved in the points to eliminate most concerns over arithmetic overflow. Hence, the range for each component is 0xF0000000 to 0x0FFFFFFF representing a range of -16 to 16.
As an example I'm using 0x00d4f9c1, which should give me 0.831936, but I'm getting 1.95587[...]e-38.

The f designator is for floating point, which is entirely different from fixed point. You just need to convert that to an integer and divide by 2**24.
>>> x = 0x00d4f9c1
>>> x/(1<<24)
0.8319359421730042
>>>
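Starting from the raw bytes rather than an integer literal, the same conversion might look like the sketch below. The helper name fixed_8_24 is made up for illustration, and the signed format '>i' is an assumption based on the quoted range of -16 to 16.

```python
import struct

def fixed_8_24(raw):
    """Decode 4 big-endian bytes as signed fixed point with 24 fractional bits."""
    # '>i' reads a big-endian signed 32-bit integer, so negative values work too.
    (as_int,) = struct.unpack(">i", raw)
    return as_int / (1 << 24)

value = fixed_8_24(bytes.fromhex("00d4f9c1"))   # the example from the question
lowest = fixed_8_24(bytes.fromhex("f0000000"))  # lower end of the quoted range
print(value, lowest)
```

In Python 3 the / operator always performs true division, so no explicit float conversion is needed.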

Why does converting from np.float16 to np.float32 modify the value?

When converting a number from half to single floating representation I see a change in the numeric value.
Here I have 65500 stored as a half precision float, but upgrading to single precision changes the underlying value to 65504, which is many floating point increments away from the target.
In this specific case, why does this happen?
(Pdb) np.asarray(65500,dtype=np.float16).astype(np.float32)
array(65504., dtype=float32)
As a side note, I also observe
(Pdb) int(np.finfo(np.float16).max)
65504

The error is not "many floating point increments away" [corrected to match OP's improved wording]. Read the standard IEEE 754-2008. It specifies 10 stored bits for the mantissa, or 1024 distinct values per power-of-two interval. Your value lies between 2^15 and 2^16, so you have an increment of 2^5, or 32.
The format also gives 1 bit for the sign and 5 for the characteristic (exponent).
65500 is not a multiple of 32, so it is stored as the nearest representable value, + 2^5 * 2047 = 65504, which is also the largest finite float16. This translates directly to 65504 when you convert to float32, because that conversion is exact. You lost the precision when you stored 65500 with 10 bits of mantissa, not when you widened it. When you convert in either direction, the result is always constrained by the less-precise type.
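The rounding can be observed without a debugger. The following is a small sketch; the variable names are illustrative, and the spacing is derived from np.finfo instead of being hard-coded.

```python
import numpy as np

half = np.float16(65500)    # rounded to the nearest float16 on construction
single = np.float32(half)   # widening is exact, no further change

# Spacing between adjacent float16 values in [2**15, 2**16): eps * 2**15.
step = float(np.finfo(np.float16).eps) * 2**15

print(float(half), float(single), step)
# 65504.0 65504.0 32.0
```

So 65504 is the float16 nearest to 65500, at most half an increment away.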

Python why is 10e26 != 10**26 ? (Floating point inaccuracy?)

I was trying to process some rather large numbers in python and came across an overflow error. I decided to investigate a little bit more and came across an inequality I cannot explain.
When I evaluate 10^26 I get:
>>> 10**26
100000000000000000000000000
Which is perfectly logical. However when I evaluate 10e26 and convert it to an int I get:
>>> int(10e26)
1000000000000000013287555072
Why is this?
Do I not understand the e notation properly? (From what I know 10e26 is 10*10^26 as seen in this answer: 10e notation used with variables?)
10^26 is way past the max integer size so I was also wondering if there was any mechanism in python which could allow to work with numbers in scientific format (not considering all those zeros) in order to be able to compute operations with numbers past the max size.

The short answer is that 10e26 and 10**26 do not represent identical values.
10**26, with both operands being int values, evaluates to an int. As int represents integers with arbitrary precision, its value is exactly 10^26 as intended.
10e26, on the other hand, is a float literal, and as such the resulting value is subject to the limited precision of the float type on your machine. The result of int(10e26) is the integer value of the float closest to the real number 10^27.

10e26 represents ten times ten to the power of 26, which is 10^27.
10**26 represents ten to the power of 26, 10^26.
Obviously, these are different, so 10e26 == 10**26 is false.
However, if we correct the mistake so we compare 1e26 and 10**26 by evaluating 1e26 == 10**26, we get false for a different reason:
1e26 is evaluated in a limited-precision floating-point format, producing 100000000000000004764729344 in most implementations. (Python is not strict about the floating-point format.) 100000000000000004764729344 is the closest one can get to 10^26 using 53 significant bits.
10**26 is evaluated with integer arithmetic, producing 100000000000000000000000000.
Comparing them reports they are different.
(I am uncertain of Python semantics, but I presume it converts the floating-point value to an extended-precision integer for the comparison. If we instead convert the integer to floating-point, with float(10**26) == 1e26, the conversion of 100000000000000000000000000 to float produces the same value, 100000000000000004764729344, and the comparison returns true.)
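The comparisons discussed above can be reproduced in a few lines. This sketch assumes the usual IEEE 754 double-precision float found in most Python builds; the variable names are illustrative.

```python
exact = 10**26          # int arithmetic, arbitrary precision
approx = int(1e26)      # nearest double to 10^26, converted to int

print(approx)                    # 100000000000000004764729344
print(1e26 == exact)             # False: the float is not exactly 10^26
print(float(exact) == 1e26)      # True: both round to the same double
print(int(10e26) == int(1e27))   # True: 10e26 means 10 * 10^26
print(approx - exact)            # 4764729344
```

For exact work with very large values, plain int arithmetic such as 10**26 is the safer route, since it never rounds.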

Why does numpy integer subtraction produce a float64?

In numpy, why does subtraction of integers sometimes produce floating point numbers?
>>> x = np.int64(2) - np.uint64(1)
>>> x
1.0
>>> x.dtype
dtype('float64')
This seems to only occur when using multiple different integer types (e.g. signed and unsigned), and when no larger integer type is available.

This is a conscious design decision by the numpy authors. When deciding on the resulting type, only the types of the operands are considered, not their actual values. And for the operation you perform, there is a risk of having a result outside the valid range, e.g. if you subtract a very large uint64 number, the result would not fit in an int64. The safe selection is thus to convert to float64, which certainly will fit the result (possibly with reduced precision, though).
Compare with an example of x = np.int32(2) - np.uint32(1). This can always be safely represented as an int64, therefore that type is chosen. The same would be true for x = np.int64(2) - np.uint32(1). This will also yield an int64.
The alternative would be to follow, for example, the C rules, which would cast everything to uint64. But that could, of course, lead to very strange results when the result wraps around.
If you want to know ahead of time what type you will end up with, look into np.result_type(), np.can_cast(), and np.promote_types(). Reading about this in the docs might also help you understand the issue a bit better.
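A short sketch of how those helpers answer the question up front. The last lines show one possible workaround, an explicit cast to int64, which is only safe if you know the unsigned value fits in an int64.

```python
import numpy as np

mixed_64 = np.result_type(np.int64, np.uint64)     # no wider integer exists
mixed_32 = np.result_type(np.int32, np.uint32)     # int64 can hold both
mixed_sz = np.promote_types(np.int64, np.uint32)   # int64 can hold both
safe = np.can_cast(np.uint64, np.int64)            # not a safe cast

print(mixed_64, mixed_32, mixed_sz, safe)
# float64 int64 int64 False

# Workaround (assumes the uint64 value fits in int64): cast explicitly first.
x = np.int64(2) - np.int64(np.uint64(1))
print(x, x.dtype)
```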

I'm no expert on numpy, however, I suspect that since float64 is the smallest data type whose range covers both int64 and uint64, the subtraction converts both operands to float64 so that the operation always succeeds.
For example, with int8 and uint8: -128 - 255 = -383 cannot fit in an int8, whose range is only -128 to 127. Similarly, we can't use a uint8, since we obviously need the sign in this case. With 64-bit operands there is no wider integer type to fall back on, hence we settle on a float/double, as its range covers both directions.

How to convert a generic float value into a corresponding integer?

I need to use a module that does some math on integers, however my input is in floats.
What I want to achieve is to convert a generic float value into a corresponding integer value and lose as little data as possible.
For example:
val : 1.28827339907e-08
result : 128827339906934
Which is achieved after multiplying by 1e22.
Unfortunately the range of values can change, so I cannot always multiply them by the same constant. Any ideas?
ADDED
To put it in other words, I have a matrix of values < 1, let's say from 1.323224e-8 to 3.457782e-6.
I want to convert them all into integers and lose as little data as possible.

The answers that suggest multiplying by a power of ten cause unnecessary rounding.
Multiplication by a power of the base used in the floating-point representation has no error in IEEE 754 arithmetic (the most common floating-point implementation) as long as there is no overflow or underflow.
Thus, for binary floating-point, you may be able to achieve your goal by multiplying the floating-point number by a power of two and rounding the result to the nearest integer. The multiplication will have no error. The rounding to integer may have an error up to .5, obviously.
You might select a power of two that is as large as possible without causing any of your numbers to exceed the bounds of the integer type you are using.
The most common conversion of floating-point to integer truncates, so that 3.75 becomes 3. I am not sure about Python semantics. To round instead of truncating, you might use a function such as round before converting to integer.
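In Python, the power-of-two approach could be sketched as follows. The function name scale_to_ints and the 62-bit budget are assumptions made for illustration; math.ldexp multiplies by a power of two, and the built-in round (unlike int, which truncates toward zero) rounds to the nearest integer.

```python
import math

def scale_to_ints(values, bits=62):
    """Scale floats by a common power of two so the largest fits in `bits` bits."""
    largest = max(abs(v) for v in values)
    # frexp returns (m, e) with largest == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(largest)
    shift = bits - e
    # ldexp(v, shift) computes v * 2**shift without rounding error
    # (barring overflow or underflow); round() then yields a Python int.
    return [round(math.ldexp(v, shift)) for v in values], shift

data = [1.323224e-8, 3.457782e-6]   # the range mentioned in the question
ints, shift = scale_to_ints(data)
restored = [math.ldexp(float(i), -shift) for i in ints]
print(ints, shift, restored == data)
```

For this range the scaled values are already whole numbers, so the round trip recovers the inputs exactly. Inputs spanning many orders of magnitude would lose low-order bits of the smallest values.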

If you want to preserve the values for operations on matrices I would choose some value to multiply them all by.
For Example:
1.23423
2.32423
4.2324534
Multiply them all by 10000000 and you get
12342300
23242300
42324534
You can perform your multiplications, additions etc. with your matrices. Once you have performed all your calculations you can convert them back to floats by dividing them all by the appropriate value depending on the operation you performed.
Mathematically it makes sense because
(Scalar multiplication)
M1` = M1 * 10000000
M2` = M2 * 10000000
Result = M1`.M2`
Result = (M1 x 10000000).(M2 x 10000000)
Result = (10000000 x 10000000) x (M1.M2)
So in the case of multiplication you would divide your result by 10000000 x 10000000.
If it's addition / subtraction then you simply divide by 10000000.
You can either choose the value to multiply by through your knowledge of what decimals you expect to find or by scanning the floats and generating the value yourself at runtime.
Hope that helps.
EDIT: If you are worried about going over the maximum capacity of integers, then you will be happy to know that Python 2 automatically (and silently) converts integers to longs when it notices overflow is going to occur; in Python 3 every int already has arbitrary precision. You can see for yourself in a Python 2 console:
>>> i = 3423
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'long'>
If you are still worried about overflow, you can always choose a lower constant, at the cost of slightly less accuracy (since you will be losing some digits towards the end of the fractional part).
Also, the method proposed by Eric Postpischil seems to make sense - but I have not tried it out myself. I gave you a solution from a more mathematical perspective which also seems to be more "pythonic"

Perhaps consider counting the number of places after the decimal for each value to determine the value (x) of your exponent (1ex). Roughly something like what's addressed here. Cheers!

Here's one solution:
def to_int(val):
    # Keep only the digits of repr(val): drop the decimal point and the exponent.
    return int(repr(val).replace('.', '').split('e')[0])
Usage:
>>> to_int(1.28827339907e-08)
128827339907
