Copying internal formats float64 <-> uint64 - Python

I'm using Numpy and Python. I need to copy data, WITHOUT numeric conversion between np.uint64 and np.float64, e.g. 1.5 <-> 0x3ff8000000000000.
I'm aware of float.hex, but its output format is a long way from a uint64:
In [30]: a=1.5
In [31]: float.hex(a)
Out[31]: '0x1.8000000000000p+0'
I'm also aware of various string input routines for the other way.
Can anybody suggest a more direct method? After all, it's just a simple copy and type change, but Python/NumPy seem really rigid about converting the data along the way.

Use an intermediate array and the frombuffer method to "cast" one array type into the other:
>>> v = 1.5
>>> fa = np.array([v], dtype='float64')
>>> ua = np.frombuffer(fa, dtype='uint64')
>>> ua[0]
4609434218613702656 # 0x3ff8000000000000
Since frombuffer creates a view into the original buffer, this is efficient even for reinterpreting data in large arrays.
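If you need to go back the other way, the same call works in reverse; a quick sketch reusing the ua array from above:
>>> np.frombuffer(ua, dtype='float64')[0]
1.5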

So what you need is to see the 8 bytes that represent the float64 in memory as an integer number. (Representing that int64 as a hexadecimal string is a separate matter - it is just a representation.)
The Struct and Union functionality bundled with the stdlib's ctypes may be all you need - no numpy required. It has a Union type that works much like a C union and allows you to do this:
>>> import ctypes
>>> class Conv(ctypes.Union):
...     _fields_ = [("float", ctypes.c_double), ("int", ctypes.c_uint64)]
...
>>> c = Conv()
>>> c.float = 1.5
>>> hex(c.int)
'0x3ff8000000000000'
The built-in "hex" function is a way to get the hexadecimal representation of the number.
You can use the struct module as well: pack the number as a double and unpack the resulting bytes as an unsigned integer. I think it is both less readable and less efficient than the ctypes Union:
>>> import struct
>>> hex(struct.unpack("<Q", struct.pack("<d", 1.5))[0])
'0x3ff8000000000000'
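The reverse direction is the same round trip with the format codes swapped (a quick sketch):
>>> struct.unpack("<d", struct.pack("<Q", 0x3ff8000000000000))[0]
1.5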
Since you are using numpy, however, you can simply change the array's dtype "on the fly" and manipulate the whole array as integers with zero copying:
>>> import numpy
>>> x = numpy.array((1.5,), dtype=numpy.double)
>>> x[0]
1.5
>>> x.dtype = numpy.dtype("uint64")
>>> x[0]
4609434218613702656
>>> hex(x[0])
'0x3ff8000000000000'
This is by far the most efficient way of doing it, whatever your purpose in getting at the raw bytes of the float64 numbers.
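As a closely related alternative (my addition, not part of the original answer), ndarray.view gives you the same zero-copy reinterpretation without mutating the dtype of x in place:
>>> y = numpy.array((1.5,), dtype=numpy.double)
>>> hex(y.view(numpy.uint64)[0])
'0x3ff8000000000000'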

Related

Best possible bit array for Numba

I need to create a bit array in Python. So far, I've discovered that one can generate very memory-efficient arrays using the bitarray module.
However, my final intention is to use the @vectorize decorator from Numba. Numba supports only a limited subset of Python and NumPy features, and bitarray is not one of them.
My question is: what's the most memory-efficient way of creating bit arrays using the structures that Numba does support?
I would go with numpy arrays, but I've done a quick memory test and it doesn't look good:
>>> import numpy as np
>>> import random
>>> from bitarray import bitarray
>>> from sys import getsizeof
>>> N = 10000
>>> a = bitarray(N)
>>> print(type(a), getsizeof(a))
<class 'bitarray.bitarray'> 96
>>> b = np.random.randint(0, 1, N)
>>> print(type(b), b.nbytes)
<class 'numpy.ndarray'> 40000
>>> c = [random.randint(0, 1) for i in range(N)]
>>> print(type(c), getsizeof(c))
<class 'list'> 87624
(to say nothing of the plain list)
EDIT: As a side question, does anyone have any idea why getsizeof returns such an unrealistically low number for bitarray? I have just noticed.
You can simply specify the data type:
N = 1000
b = np.random.randint(0, 2, N)              # default integer dtype
print(type(b), getsizeof(b))
<class 'numpy.ndarray'> 4096
c = np.random.randint(0, 2, N, dtype=bool)  # one byte per element
print(type(c), getsizeof(c))
<class 'numpy.ndarray'> 1096
As for your side question: numpy builds much more into the array object than bitarray does, so it is less efficient in terms of the total memory of the object itself.
EDIT:
The memory reported for a Python object covers the object header plus references to its attributes, methods, and metadata; a numpy array, for instance, carries things like its shape tuple and dtype information. A plain Python list is heavier still: it is a dynamic array of references, and every element is a separately allocated Python int object with its own overhead (see the data structures section of the official docs).
Taking all of that into consideration, the best practice is to use an appropriate data structure that works well in your pipeline and to specify types whenever possible. Since you use Numba, numpy seems the best fit. Memory is not always the issue.
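If you want to get below one byte per element, one pattern worth sketching (my own suggestion, under the assumption that a shift-and-mask lookup is acceptable in your kernels) is to keep the data as a uint8 array packed with np.packbits; Numba handles uint8 arrays fine:
import numpy as np

N = 10000
bits = np.random.randint(0, 2, N, dtype=np.uint8)  # one byte per bit
packed = np.packbits(bits)                         # 8 bits per byte
print(packed.nbytes)                               # 1250 bytes for 10000 bits

# reading bit i back out of the packed array (packbits uses big-endian bit order)
i = 1234
bit = (packed[i // 8] >> (7 - i % 8)) & 1
assert bit == bits[i]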

Signed decimal from a signed hex short in python

I was wondering if there's any easy and fast way to obtain the decimal value of a signed hex number in Python.
What I do to check the value of a hex code is just open Python in a terminal and type in the hex; it returns the decimal like this:
>>> 0x0024
36
>>> 0xcafe
51966
So I was wondering, is there any way to do that for a signed short in hex? For example, 0xfffc should return -4.
import ctypes

def hex2dec(v):
    return ctypes.c_int16(v).value

print(hex2dec(0xfffc))   # -4
print(hex2dec(0x0024))   # 36
print(hex2dec(0xcafe))   # -13570
You can use the int classmethod from_bytes and provide it with the bytes:
>>> int.from_bytes(b'\xfc\xff', signed=True, byteorder='little')
-4
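Going back the other way is int.to_bytes with the same arguments (sketching the -4 example again):
>>> (-4).to_bytes(2, byteorder='little', signed=True)
b'\xfc\xff'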
If you can use NumPy numerical types, you could use an np.int16 conversion:
>>> import numpy as np
>>> np.int16(0x0024)
36
>>> np.int16(0xcafe)
-13570
>>> np.int16(0xfffc)
-4
>>> np.uint16(0xfffc)
65532
>>> np.uint16(0xcafe)
51966
The Python language specifies a unified representation for integers, the numbers.Integral class, a subtype of numbers.Rational. In Python 3 there are no specific constraints on the range of this type. In Python 2 only a minimal range is specified: numbers.Integral should cover at least -2147483648 through 2147483647, which corresponds to a 32-bit representation.
NumPy's implementation is closer to the machine architecture. To implement numeric operations efficiently it provides fixed-width datatypes such as int8, int16, int32, and int64.
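If you would rather avoid both ctypes and NumPy, the two's-complement adjustment is easy to spell out in plain Python (a small sketch; the helper name is my own):
>>> def from_int16(v):
...     return v - 0x10000 if v & 0x8000 else v
...
>>> from_int16(0xfffc)
-4
>>> from_int16(0xcafe)
-13570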

What is numpy method int0?

I've seen np.int0 used for converting bounding box floating point values to int in OpenCV problems.
What exactly is np.int0?
I've seen np.uint8, np.int32, etc. I can't seem to find np.int0 in any online documentation. What kind of int does this cast arguments to?
int0 is an alias for intp; this, in turn, is
Integer used for indexing (same as C ssize_t; normally either int32 or int64)
-- Numpy docs: basic types
On a typical 64-bit build it is simply an alias for int64; try this from either Python 2 or Python 3:
>>> import numpy
>>> numpy.int0 is numpy.int64
True
Here is some more information:
# get complete list of datatype names
>>> np.sctypeDict.keys()
# look up the type registered under the 'int0' alias (intp; int64 on this 64-bit build)
>>> np.sctypeDict['int0']
<class 'numpy.int64'>
# verify
>>> arr = np.int0([1, 2, 3])
>>> arr.nbytes
24
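Since the alias is platform-dependent, you can check what int0/intp resolves to on your own machine (on NumPy versions that still ship the int0 alias):
>>> np.dtype(np.intp).itemsize   # 8 on a 64-bit build, 4 on a 32-bit one
8
>>> np.dtype(np.int0) == np.dtype(np.intp)
True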

Binary operations on NumPy scalars automatically up-cast to float64

I want to do binary operations (like add and multiply) between np.float32 and built-in Python int and float and get an np.float32 as the return type. However, the result automatically gets up-cast to np.float64.
Example code:
>>> a = np.float32(5)
>>> a.dtype
dtype('float32')
>>> b = a + 2
>>> b.dtype
dtype('float64')
If I do this with a np.float128, b also becomes a np.float128. This is good, as it thereby preserves precision. However, no up-casting to np.float64 is necessary to preserve precision in my example, but it still occurs. Had I added 2.0 (a Python float (64 bit)) to a instead of 2, the casting would make sense. But even here, I do not want it.
So my question is: How can I alter the casting done when applying a binary operator to a np.float32 and a builtin Python int/float? Alternatively, making single precision the standard in all calculations rather than double, would also count as a solution, as I do not ever need double precision. Other people have asked for this, and it seems that no solution has been found.
I know about numpy arrays and their dtypes. There I get the wanted behavior, as an array always preserves its dtype. It is when I do an operation on a single element of an array that I get the unwanted behavior.
I have a vague idea of a solution, involving subclassing np.ndarray (or np.float32) and changing the value of __array_priority__. So far I have not been able to get it working.
Why do I care? I am trying to write an n-body code using Numba, which is why I cannot simply do operations on the array as a whole. Changing all np.float64 to np.float32 gives a speed-up of about a factor of 2, which is important. The float64-casting behavior ruins this speed-up completely, as all operations on my np.float32 array are done in 64-bit precision and then down-cast to 32-bit precision.
I'm not sure about the NumPy behavior, or how exactly you're trying to use Numba, but being explicit about the Numba types might help. For example, if you do something like this:
from numba import jit
import numpy as np

@jit
def foo(a):
    return a[0] + 2

a = np.array([3.3], dtype='f4')
foo(a)
The float32 value in a[0] is promoted to a float64 before the add operation (if you don't mind diving into the LLVM IR, you can see this for yourself by running the code through the numba command with the --dump-llvm or --dump-optimized flag: numba --dump-optimized numba_test.py). However, by specifying the function signature, including the float32 return type:
@jit('f4(f4[:])')
def foo(a):
    return a[0] + 2
The value in a[0] is not promoted to float64, although the result is cast to a float64 so it can be converted to a Python float object when the function returns to Python land.
If you can allocate an array beforehand to hold the result, you can do something like this:
@jit
def foo():
    a = np.arange(1000000, dtype='f4')
    result = np.zeros(1000000, dtype='f4')
    for i in range(a.size):
        result[i] = a[i] + 2
Even though you're doing the looping yourself, the performance of the compiled code should be comparable to a NumPy ufunc, and no casts to float64 should occur (again, this can be verified by looking at the LLVM IR that Numba generates).
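As an aside about plain NumPy scalars outside Numba (my addition, not something the answer above covers): wrapping the Python constant in np.float32 yourself keeps both operands in the target dtype, so no promotion to float64 happens:
>>> import numpy as np
>>> a = np.float32(5)
>>> (a + np.float32(2)).dtype
dtype('float32')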

Type casting in Python 2.7

How do I cast a float into long in Python 2.7?
I'm doing the same in Python 2.3 like this:
from array import *
data = array('L',[12.34])
print data
which prints out:
array('L',[12L])
How do I do the same in Python 2.7?
Maybe like this?
>>> long(12.34)
12L
These days it is much more common to see numpy arrays than arrays from the array module; however, your array can be constructed as:
>>> from array import *
>>> array('L',map(long,[12.34]))
array('L', [12L])
with numpy, it could be done as:
>>> import numpy as np
>>> np.array([12.34],dtype=long)
However, this doesn't actually create Python longs; it creates an array of np.int64 integers (8-byte ints, not arbitrary precision like Python's long).
lst = [1.1, 2.2]
data = map(long, lst)   # Python 2: [1L, 2L]
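If this code ever moves to Python 3, where long no longer exists, plain int plays the same role (a trivial sketch):
>>> lst = [1.1, 2.2]
>>> [int(x) for x in lst]
[1, 2]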
