This is mostly a question out of curiosity. I noticed that the numpy test suite contains tests for 128-bit integers, and the numerictypes module refers to int128, float256 (octuple precision?), and other types that don't seem to map to numpy dtypes on my machine.
My machine is 64-bit, yet I can use quadruple 128-bit floats (but not really). I suppose that if it's possible to emulate quadruple floats in software, one can theoretically also emulate octuple floats and 128-bit ints. On the other hand, until just now I had never heard of either 128-bit ints or octuple-precision floating point before. Why is there a reference to 128-bit ints and 256-bit floats in numpy's numerictypes module if there are no corresponding dtypes, and how can I use those?
This is an interesting question, and the reasons probably relate to Python, to computing in general, and/or to hardware. Without trying to give a full answer, here is roughly where I would look...
First, note that the types are defined by the language and can differ from your hardware architecture. For example, you could even have doubles on an 8-bit processor; any arithmetic would then involve multiple CPU instructions, making the computation much slower. Still, if your application requires it, it might be worth it or even required (better late than wrong, especially if, say, you are running a simulation of bridge stability...). So where is 128-bit precision required? Here's the Wikipedia article on it...
Another interesting detail is that saying a computer is, say, 64-bit does not fully describe the hardware. There are several pieces that can each have (and at times have had) different widths: the computational registers in the CPU, the memory addressing scheme / memory registers, and the various buses, most importantly the bus from CPU to memory.
- The ALU (arithmetic and logic unit) has the registers that do the calculations. Your machine is 64-bit (I'm not sure whether that also means it could do two 32-bit calculations at the same time). This is clearly the most relevant quantity for this discussion. A long time ago you could go out and buy a co-processor to speed up calculations at higher precision...
- The registers that hold memory addresses limit the memory the computer can see (directly). That is why computers with 32-bit memory registers could only see 2^32 bytes (approximately 4 GB). Notice that for 16 bits this becomes 64 KiB, which is very low. The OS can find ways around this limit, but not for a single program, so no program on a 32-bit computer can normally have more than 4 GB of memory.
- Notice that these limits are about bytes, not bits, because when addressing and loading from memory we load bytes. In fact, the way this is done, loading one byte (8 bits) or 8 bytes (64 bits, the bus width for your computer) takes the same time: I ask for an address, and then get all the bits at once over the bus.
In a given architecture, these quantities need not all be the same number of bits.
NumPy is amazingly powerful and can handle numbers much bigger than the internal CPU representation (e.g. 64 bit).
For a dynamic type, the number is stored as an array of digits, and the memory block can be extended as needed; that is why you can have an integer with 500 digits. This dynamic type is called a bignum. In older Python versions it was the type long; in newer Python (3.0+) there is only this type, now called int, which supports an almost arbitrary number of digits (i.e. a bignum).
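For illustration, here is a minimal sketch of the difference between Python's bignum int and a fixed-width dtype; dtype=object is passed explicitly so NumPy stores the Python objects themselves rather than converting to a fixed-size integer:
import numpy as np

x = 10**500 + 1                        # a 501-digit integer, handled by plain Python int
print(x.bit_length())                  # ~1662 bits, far beyond any fixed-width dtype

a = np.array([x, x + 1], dtype=object) # object dtype: elements stay Python bignums
print(a.dtype, a[1] - a[0])            # object 1 -- exact arithmetic on Python ints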
If you specify a data type (int32, for example), then you specify the bit length and bit format, i.e. which bits in memory stand for what. Example:
import numpy as np

dt = np.dtype(np.int32)       # 32-bit integer
dt = np.dtype(np.complex128)  # 128-bit complex floating-point number
Look in: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
Related
What precision does numpy.float128 map to internally? Is it __float128 or long double? Or something else entirely?
A potential follow on question if anybody knows: is it safe in C to cast a __float128 to a (16 byte) long double, with just a loss in precision? (this is for interfacing with a C lib that operates on long doubles).
Edit: In response to the comment, the platform is 'Linux-3.0.0-14-generic-x86_64-with-Ubuntu-11.10-oneiric'. Now, if numpy.float128 has varying precision dependent on the platform, that is also useful knowledge for me!
Just to be clear, it is the precision I am interested in, not the size of an element.
numpy.longdouble refers to whatever type your C compiler calls long double. Currently, this is the only extended precision floating point type that numpy supports.
On x86-32 and x86-64, this is an 80-bit floating point type. On more exotic systems it may be something else (IIRC on Sparc it's an actual 128-bit IEEE float, and on PPC it's double-double). (It also may depend on what OS and compiler you're using -- e.g. MSVC on Windows doesn't support any kind of extended precision at all.)
Numpy will also export some name like numpy.float96 or numpy.float128. Which of these names is exported depends on your platform/compiler, but whatever you get always refers to the same underlying type as longdouble. Also, these names are highly misleading. They do not indicate a 96- or 128-bit IEEE floating point format. Instead, they indicate the number of bits of alignment used by the underlying long double type. So e.g. on x86-32, long double is 80 bits, but gets padded up to 96 bits to maintain 32-bit alignment, and numpy calls this float96. On x86-64, long double is again the identical 80 bit type, but now it gets padded up to 128 bits to maintain 64-bit alignment, and numpy calls this float128. There's no extra precision, just extra padding.
Recommendation: ignore the float96/float128 names, just use numpy.longdouble. Or better yet stick to doubles unless you have a truly compelling reason. They'll be faster, more portable, etc.
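If you want to see what longdouble actually means on your machine, np.finfo will tell you (a quick sketch; the exact numbers, and whether the platform alias is float96 or float128, depend on your OS, compiler, and NumPy version):
import numpy as np

info = np.finfo(np.longdouble)
print(np.dtype(np.longdouble).itemsize)  # storage in bytes, padding included (16 on x86-64)
print(info.nmant)                        # mantissa bits: 63 for the 80-bit x87 format
print(info.eps)                          # ~1.08e-19 for 80-bit, vs ~2.22e-16 for float64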
I'd recommend using longdouble instead of float128, since the latter is quite a mess at the moment. Python will cast it to float64 during initialization.
Inside numpy, it can be a double or a long double. It's defined in npy_common.h and depends on your platform. I don't know if you can include it out-of-the-box in your own source code.
If you don't need performance in this part of your algorithm, a safer way could be to export it to a string and use strtold afterwards.
TLDR from the numpy docs:
np.longdouble is padded to the system default; np.float96 and np.float128 are provided for users who want specific padding. In spite of the names, np.float96 and np.float128 provide only as much precision as np.longdouble, that is, 80 bits on most x86 machines and 64 bits in standard Windows builds.
I have a huge number of 128-bit unsigned integers that need to be sorted for analysis (around a trillion of them!).
The research I have done on 128-bit integers has led me down a bit of a blind alley: numpy doesn't seem to fully support them, and the built-in sorting functions are memory intensive (they use lists).
What I'd like to do is load, for example, a billion 128-bit unsigned integers into memory (16GB if just binary data) and sort them. The machine in question has 48GB of RAM, so it should be OK to use 32GB for the operation. If it has to be done in smaller chunks that's OK, but doing as large a chunk as possible would be better. Is there a sorting algorithm that Python has which can take such data without requiring a huge overhead?
I can sort 128-bit integers using the .sort method for lists, and it works, but it can't scale to the level that I need. I do have a C++ version that was custom written to do this and works incredibly quickly, but I would like to replicate it in Python to accelerate development time (and I didn't write the C++ and I'm not used to that language).
Apologies if there's more information required to describe the problem, please ask anything.
NumPy doesn't support 128-bit integers, but if you use a structured dtype composed of high and low unsigned 64-bit chunks, those will sort in the same order as the 128-bit integers would:
arr.sort(order=['high', 'low'])
As for how you're going to get an array with that dtype, that depends on how you're loading your data in the first place. I imagine it might involve calling ndarray.view to reinterpret the bytes of another array. For example, if you have an array of dtype uint8 whose bytes should be interpreted as little-endian 128-bit unsigned integers, on a little-endian machine:
arr_structured = arr_uint8.view([('low', 'uint64'), ('high', 'uint64')])
So that might be reasonable for a billion ints, but you say you've got about a trillion of these. That's a lot more than an in-memory sort on a 48GB RAM computer can handle. You haven't asked for something to handle the whole trillion-element dataset at once, so I hope you already have a good solution in mind for merging sorted chunks, or for pre-partitioning the dataset.
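Putting the pieces together, here is a self-contained sketch of the trick on synthetic data (the field names 'low'/'high', the little-endian layout, and the random input are assumptions for illustration):
import numpy as np

rng = np.random.default_rng(0)
# one million fake 128-bit values, as 16 raw bytes each (illustrative only)
raw = rng.integers(0, 256, size=(1_000_000, 16), dtype=np.uint8)

arr = raw.view([('low', '<u8'), ('high', '<u8')]).ravel()
arr.sort(order=['high', 'low'])   # sorts in 128-bit numeric order, in place

# spot check: rebuild Python ints for a few elements and verify the ordering
vals = (arr['high'][:5].astype(object) << 64) | arr['low'][:5].astype(object)
assert all(vals[i] <= vals[i + 1] for i in range(4))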
I was probably expecting too much from Python, but I'm not disappointed. A few minutes of coding allowed me to create something (using built-in lists) that can sort a hundred million uint128 items on an 8GB laptop in a couple of minutes.
Given a large number of items to be sorted (1 trillion), it's clear that putting them into smaller bins/files upon creation makes more sense than trying to sort huge numbers in memory. The potential issues created by appending data to thousands of files in 1MB chunks (fragmentation on spinning disks) are less of a worry, because sorting each of these fragmented files produces a sequential file that will be read many times (the fragmented file is written once and read once).
The benefits of development speed of Python seem to outweigh the performance hit versus C/C++, especially since the sorting happens only once.
Large numpy array (over 4GB) with npy file and memmap function
I am using the numpy package for array calculations, and I read https://docs.scipy.org/doc/numpy/neps/npy-format.html
In "Format Specification: Version 2.0" it says that, for .npy files, the "version 2.0 format extends the header size to 4 GiB".
My questions are:
What is the header size? Does that mean I can only save a numpy.array of at most 4GB into an npy file? How large can a single array be?
I also read https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.memmap.html
where it stated that "Memory-mapped files cannot be larger than 2GB on 32-bit systems"
Does that mean numpy.memmap's limitation is based on the memory of the system? Is there any way to avoid this limitation?
Further, I read that we can choose the dtype of the array, where the best resolution is "complex128". Is there any way to "use" and "save" elements with more accuracy on a 64-bit computer (more accurate than complex128 or float64)?
The previous header size field was 16 bits wide, allowing headers smaller than 64 KiB. Because the header describes the structure of the data and doesn't contain the data itself, this is not a huge concern for most people. Quoting the notes, "This can be exceeded by structured arrays with a large number of columns." So, to answer the first question: the header size was capped at 64 KiB, but the data comes after the header, so this is not a limit on the array size. The format does not specify a data size limit.
Memory map capacity depends on the operating system as well as the machine architecture. Nowadays we've largely moved to flat, but typically virtual, address spaces, so the program itself, stack, heap, and mapped files all compete for the same space, in total 4 GiB for 32-bit pointers. Operating systems frequently partition this into quite large chunks, so some systems might only allow 2 GiB total for user space, others 3 GiB; and often you can map more memory than you can otherwise allocate. The memmap limitation is more closely tied to the operating system in use than to physical memory.
Non-flat address spaces, such as using distinct segments on OS/2, could allow larger usage. The cost is that a pointer is no longer a single word. PAE, for instance, supplies a way for the operating system to use more memory but still leaves processes with their own 32 bit limits. Typically it's easier nowadays to use a 64 bit system, allowing memory spaces up to 16 exabytes. Because data sizes have grown a lot, we also handle it in larger pieces, such as 4MiB or 16MiB allocations rather than the classic 4KiB pages or 512B sectors. Physical memory typically has more practical limits.
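As a minimal sketch of the practical upshot (the filename and sizes here are just examples): on a 64-bit OS a memmap can exceed 4 GiB and even physical RAM, while on a 32-bit system the address-space limits described above apply.
import numpy as np

n = 600_000_000                      # ~4.8 GB of float64, more than a 32-bit process could map
mm = np.memmap('big.dat', dtype=np.float64, mode='w+', shape=(n,))
mm[:10] = np.arange(10)              # only the touched pages are materialized in RAM
mm.flush()                           # write dirty pages back to the (sparse, on most filesystems) file
del mm                               # drop the mapping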
Yes, there are element types with more precision than 64-bit floating point; in particular, 64-bit integers. This effectively trades the entire exponent for a larger mantissa. complex128 is two 64-bit floats; it doesn't have higher precision, just a second component. There are types that can grow arbitrarily precise, such as Python's long integers (long in Python 2, int in Python 3) and fractions, but numpy generally doesn't delve into those because they come with matching storage and computation costs. A basic property of numpy arrays is that they can be addressed using index calculations, which requires a consistent element size.
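For the last question, here is a rough sketch of the options mentioned above, trading speed and fixed element size for precision (longdouble only helps if your platform's long double is wider than double):
from fractions import Fraction
import numpy as np

x = np.longdouble(1) / np.longdouble(3)                       # extra mantissa bits on most x86 Linux builds
a = np.array([Fraction(1, 3), Fraction(2, 7)], dtype=object)  # exact but slow Python objects
print(x, a.sum())                                             # a.sum() is exactly Fraction(13, 21)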
I need to implement a Dynamic Programming algorithm to solve the Traveling Salesman problem in time that beats Brute Force Search for computing distances between points. For this I need to index subproblems by size and the value of each subproblem will be a float (the length of the tour). However holding the array in memory will take about 6GB RAM if I use python floats (which actually have double precision) and so to try and halve that amount (I only have 4GB RAM) I will need to use single precision floats. However I do not know how I can get single precision floats in Python (I am using Python 3). Could someone please tell me where I can find them (I was not able to find much on this on the internet). Thanks.
EDIT: I notice that numpy also has a float16 type which will allow for even more memory savings. The distances between points are around 10000, there are 25 unique points, and my answer needs to be to the nearest integer. Will float16 provide enough accuracy or do I need to use float32?
As a first step, you should use a NumPy array to store your data instead of a Python list.
As you correctly observe, a Python float uses double precision internally, and the double-precision value underlying a Python float can be represented in 8 bytes. But on a 64-bit machine, with the CPython reference implementation of Python, a Python float object takes a full 24 bytes of memory: 8 bytes for the underlying double-precision value, 8 bytes for a pointer to the object type, and 8 bytes for a reference count (used for garbage collection). There's no equivalent of Java's "primitive" types or .NET's "value" types in Python - everything is boxed. That makes the language semantics simpler, but means that objects tend to be fatter.
Now if we're creating a Python list of float objects, there's the added overhead of the list itself: one 8-byte object pointer per Python float (still assuming a 64-bit machine here). So in general, a list of n Python float objects is going to cost you over 32n bytes of memory. On a 32-bit machine, things are a little better, but not much: our float objects are going to take 16 bytes each, and with the list pointers we'll be using 20n bytes of memory for a list of floats of length n. (Caveat: this analysis doesn't quite work in the case that your list refers to the same Python float object from multiple list indices, but that's not a particularly common case.)
In contrast, a NumPy array of n double-precision floats (using NumPy's float64 dtype) stores its data in "packed" format in a single data block of 8n bytes, so allowing for the array metadata the total memory requirement will be a little over 8n bytes.
Conclusion: just by switching from a Python list to a NumPy array you'll reduce your memory needs by about a factor of 4. If that's still not enough, then it might make sense to consider reducing precision from double to single precision (NumPy's float32 dtype), if that's consistent with your accuracy needs. NumPy's float16 datatype takes only 2 bytes per float, but records only about 3 decimal digits of precision; I suspect that it's going to be close to useless for the application you describe.
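As a rough illustration of those numbers (a sketch; exact sizes are CPython and NumPy implementation details and can vary by version and platform):
import sys
import numpy as np

n = 1_000_000
lst = [0.0 + i for i in range(n)]                   # one boxed float object per element
print(sys.getsizeof(lst) + n * sys.getsizeof(0.0))  # list pointers + float objects: roughly 32 MB
print(np.zeros(n, dtype=np.float64).nbytes)         # 8 MB of packed data
print(np.zeros(n, dtype=np.float32).nbytes)         # 4 MB of packed data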
You could try the c_float type from the ctypes standard library. Alternatively, if you are able to install additional packages, you might try the numpy package, which includes the float32 type.
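A tiny sketch of both suggestions; reading back a c_float's .value shows the rounding to single precision:
from ctypes import c_float
import numpy as np

print(c_float(0.1).value)   # 0.10000000149011612 -- 0.1 rounded to 32-bit precision
print(np.float32(0.1))      # prints 0.1 (shortest repr), but stores the same 32-bit rounding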
Observe the following (in Python version 2.7.3):
>>> import sys
>>> sys.getsizeof(1)
24
>>> sys.getsizeof(10000)
24
Why does the larger integer take the same amount of memory (on the heap, as far as I know) as the smaller integer?
Compare this with the following:
>>> sys.getsizeof("1")
38
>>> sys.getsizeof("100000")
42
As sys.getsizeof("1") vs sys.getsizeof("100000") shows, the larger string takes up more memory. So why would it be different for integers?
Because integers are not stored character by character.
In Python, an integer starts out as large as the platform's native word (plus the extra space needed for the interpreter's internal bookkeeping) because arithmetic is way faster that way - if you use native types, the processor can do the arithmetic on them directly.
When you get outside the native word boundaries (the range -2**31 to 2**31-1 on 32-bit machines), Python automatically switches to arbitrary-precision arithmetic, where operations are "emulated" in software (building, of course, on the usual primitives provided by the hardware).
This still won't show you a "string-like" increase because the used space increases in larger chunks (again, typically for efficiency reasons).
Also, looking for a 1:1 relation between decimal digits and integer size is misleading, since integers are stored in binary, and a decimal digit "takes" ~3.32 binary digits; so the size "jumps" won't fall on "decimal digit" boundaries, but rather near powers of two.
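A quick sketch of those jumps on CPython 3 (64-bit build), where ints are stored in 30-bit "digits" and getsizeof grows in 4-byte steps:
import sys

for n in (1, 2**30 - 1, 2**30, 2**60 - 1, 2**60):
    print(n.bit_length(), sys.getsizeof(n))
# typical output: 1 28, 30 28, 31 32, 60 32, 61 36 -- a new 30-bit digit each step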
Strings and integers are indeed comparable in that they can be considered as an arbitrarily long sequence of "digits" or "characters". And when you use large enough integers (like 2^70, 5e100, ...) you will see memory usage go up.¹ When you add one "character"/"digit" to either, memory usage goes up. For strings, a "character" is roughly what you expect (it gets more complicated with Unicode, but whatever).
For integers, however, one "digit" is quite large: several dozen bits (rather than a single decimal digit, let alone a single bit). This is partly because memory is divided into bits and groups of bits (rather than decimal digits), partly because you can't allocate individual bits (at most individual bytes, and even that is wasteful), partly because it's more efficient to use as many bits as the CPU can work on "natively" (usually 4 to 8 bytes), and partly to simplify the code.
There's the additional complication that Python 2 has two integer types, int and long. In the context of the above explanation, ints are a weird exception in that they allow you to use a larger single digit (e.g. 64 bits instead of 30), but only as long as the number fits into that single digit. The general principle still applies.
¹ Though these integers will have the class long rather than int.