I am writing a little arbitrary-precision arithmetic library in C (I know such libraries exist, such as GMP, but I find it more fun to write one myself, just as an exercise), and I would like to know whether arrays are the best way to represent very long integers, or whether there is a better solution (maybe linked lists)? And secondly, how does Python manage to have big integers? (Does it use arrays or another technique?)
Thanks in advance for any response
Try reading the documentation for libgmp; it already implements bigints. From what I can see, integers are implemented as a dynamically allocated array of limbs (machine-word digits) which is realloc'd when the number needs to grow. See http://gmplib.org/manual/Integer-Internals.html#Integer-Internals.
Python long integers are implemented as a simple structure with just an object header and an array of digits (stored as 16- or 32-bit ints depending on the platform, with 15 or 30 bits of each actually used).
Code is at http://hg.python.org/cpython/file/f8942b8e6774/Include/longintrepr.h
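To see why a digit array works, here is a toy sketch in Python of schoolbook addition over little-endian digit lists, the same technique CPython's longs and GMP's mpz rely on (the 30-bit digit size is assumed here for illustration, not taken from either codebase):

BASE = 1 << 30  # CPython digits hold 15 or 30 bits; 30 assumed here

def add_digits(a, b):
    # schoolbook addition of two little-endian digit lists
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % BASE)
        carry = s // BASE
    if carry:
        out.append(carry)
    return out

print(add_digits([BASE - 1], [1]))  # [0, 1], i.e. 2**30 spills into a second digit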
Is there any Python module that allows one to freely reinterpret raw bytes as different data types and perform various arithmetic operations on them? For example, take a look at this C snippet:
#include <stdio.h>
#include <stdint.h>
int main(void) {
    char test[9] = "test_data";
    int32_t *x = (int32_t *)test; /* alias the first 4 bytes as an int */
    (*x)++;
    printf("%d %.9s\n", *x, test); /* outputs 1953719669 uest_data on a LE system */
}
Here, both test and x point to the same chunk of memory. I can use x to perform arithmetic and test for string-based operations and individual byte access.
I would like to do the same in Python. I know that I can use struct.pack and struct.unpack whenever I feel the need to convert between a list of bytes and an integer, but maybe there's a module that makes such fiddling much easier.
I'd like to have a datatype that supports arithmetic and, at the same time, lets me mess with the individual bytes. This doesn't sound like a use case that Python was designed for, but it's nevertheless quite a common problem.
I took a look into pycrypto codebase, since cryptography is one of the most common use cases for such functionality, but pycrypto implements most of its algorithms in plain C and the much smaller Python part uses two handcrafted methods (long_to_bytes and bytes_to_long) for conversion between the types.
You may be able to use ctypes.Union, which is analogous to unions in C.
I don't have much experience in C, but as I understand it, unions allow you to write to a memory location as one type and read it back as another, which seems to fit your use case.
https://docs.python.org/3/library/ctypes.html#structures-and-unions
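Here is a minimal sketch of that idea; the union type and field names below are made up for illustration:

import ctypes

# A union overlaying the same 4 bytes as an int32 and as raw chars.
class IntBytes(ctypes.Union):
    _fields_ = [("as_int", ctypes.c_int32),
                ("as_bytes", ctypes.c_char * 4)]

v = IntBytes()
v.as_bytes = b"test"
v.as_int += 1                  # increments the underlying bytes in place
print(v.as_int, v.as_bytes)    # on a little-endian system: 1953719669 b'uest'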
Otherwise, if your needs are simpler, you could just convert bytes / bytearray objects from / to integers using the built-in int.to_bytes and int.from_bytes methods.
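For comparison, the same round trip with the built-in methods (little-endian byte order assumed):

n = int.from_bytes(b"test", "little")  # 1953719668
n += 1
print(n.to_bytes(4, "little"))         # b'uest'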
Every time I perform an arithmetic operation in Python, new number objects are created. Wouldn't it be more efficient for the interpreter to perform arithmetic on primitive data types rather than take on the overhead of creating objects (even for numbers) for every operation?
Thank you in advance
Yes, it would.
Just like contrived benchmarks prove more or less whatever you want them to prove.
What happens when you add 3 + 23431827340987123049712093874192376491287364912873641298374190234691283764? Well, if you're using primitive types you get all kinds of overflows. If you're using Python objects, then it does some long maths, which, yeah, is slower than native math, but these are big numbers, so you kind of have to deal with that.
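You can check that in the interpreter; the sum below is just the question's numbers added together:

>>> 3 + 23431827340987123049712093874192376491287364912873641298374190234691283764
23431827340987123049712093874192376491287364912873641298374190234691283767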
You're right: when you use objects, there is a small overhead, which reduces efficiency.
But numbers are immutable, and memory allocation is optimized for small numbers: see the article Python integer objects implementation.
Using objects also allows the developer to inherit from the number classes (int, float, complex) and add new, specialized methods.
In Python, a type hierarchy for numbers is also defined: Number :> Complex :> Real :> Rational :> Integral. That way, you can have features similar to Scheme's numeric tower.
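You can see that hierarchy at work with the abstract base classes of the numbers module:

import numbers

print(isinstance(5, numbers.Integral))    # True: int is registered as Integral
print(isinstance(5, numbers.Complex))     # True: Integral sits below Complex
print(isinstance(5.0, numbers.Real))      # True
print(isinstance(5.0, numbers.Integral))  # False: float is Real but not Integral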
If you need optimization, you can use Cython, which compiles Python-like code to C.
On the other hand, there is also the famous NumPy for scientific computing with Python. Its core is likewise implemented in C.
If you really want to play with primitive types, there is the array module in the standard library: efficient arrays of numeric values.
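For example, an array stores its elements as raw C values rather than as full Python objects:

from array import array

a = array("i", [1, 2, 3])  # "i": each element is a C signed int, stored compactly
a[0] += 41
print(a)                   # array('i', [42, 2, 3])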
In conclusion: we agree that Python numbers are not primitive types, but using objects offers many more possibilities without introducing complexity for the developer, and without too great a loss of performance.
In Python, what is the best data structure for n bits (where n is about 10000) such that the usual binary operations (&, |, ^) with other such structures are fast?
"Fast" is always relative :)
The BitVector package seems to do what you need, though I have no experience with its performance.
There is also a BitString implementation. Perhaps you should do some measurements to find out which one performs better for your specific needs?
If you don't need a specific class and can do without things such as slicing or bit counting, you might be OK simply using Python's long values, which are arbitrary-length integers. This might be the most performant implementation.
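For instance, plain ints support the bitwise operators directly, whatever the bit count:

a = (1 << 10000) - 1          # all 10000 bits set
b = 1 << 9999                 # only the highest bit set
print((a & b) == b)           # True
print((a ^ b).bit_length())   # 9999: the top bit was cleared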
This question seems to be similar, although the author needs fewer bits and requires a standard-library solution.
In addition to those mentioned by MartinStettner, there's also the bitarray module, which I've used on multiple occasions with great results.
PS: My 100th answer, wohooo!
I want to send lots of numbers via zeromq but converting them to str is inefficient. What is the best way to send numbers via zmq?
A few options:
use Python's struct.pack / struct.unpack functions, e.g. struct.pack("!L", 1234567) (see the sketch after this list)
use another serializer like msgpack
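A minimal round trip with struct, in network byte order (the 4-byte payload is what you would hand to your zmq socket's send):

import struct

payload = struct.pack("!L", 1234567)   # 4 bytes in big-endian ("network") order
print(payload)                         # b'\x00\x12\xd6\x87'
value, = struct.unpack("!L", payload)
print(value)                           # 1234567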
You state that converting numbers to str is inefficient. And yet, unless you have a truly exotic network, that is exactly what must occur no matter what solution is chosen, because all networks in wide use today are byte-based.
Of course, some ways of converting numbers to byte-strings are faster than others. Performing the conversion in C code will likely be faster than in Python code, but consider also whether it is acceptable to exclude "long" (bignum) integers. If excluding them is not acceptable, the str function may be as good as it gets.
The struct and cPickle modules may perform better than str if excluding long integers is acceptable.
What is the best way to store between 450,000 and a million Boolean values in a dictionary-like collection indexed by a long number? I need to use the least amount of memory possible. True and int objects both take up more than 22 bytes per entry. Is a lower memory cost per Boolean possible?
Check this question. Bitarray seems to be the preferred choice.
The two main modules for this are bitarray and bitstring (I wrote the latter). Each will do what you need, but some plus and minus points for each:
bitarray
Written as a C extension so very quick.
Python 2 only.
bitstring
Pure Python.
Python 2.6+ and Python 3.x
Richer array of methods for reading and interpreting data.
So it depends on what you need to do with your data. If it's just storage and retrieval then both will be fine, but for performance critical stuff it's better to use bitarray if you can. Take a look at the docs (bitstring, bitarray) to see which you prefer.
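As a minimal sketch of the bit-per-key idea with bitarray (this assumes your keys fall in a dense 0..N range; sparse keys would need an extra mapping):

from bitarray import bitarray

flags = bitarray(1000000)   # one bit per key
flags.setall(False)
flags[450000] = True
print(flags[450000], flags[449999])   # True False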
Have you thought about using a hybrid list/bitstring?
Use your list to store one dimension of your bits. Each list item would hold a bitstring of fixed length. You would use your list to focus your search to the bitstring of interest, then use the bitstring to find/modify your bit of interest.
The list should allow the most efficient recall of the bitstrings, the bitstrings should let you pack your data as tightly as possible, and the hybrid should give a compromise between speed (slightly slower access to the bitstring inside the list) and storage (bit-packed data plus list overhead).
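A rough sketch of the idea, with plain ints standing in for the fixed-length bitstrings (the chunk size below is an arbitrary choice):

CHUNK = 4096                          # bits per chunk
chunks = [0] * (1000000 // CHUNK + 1)

def set_bit(i):
    chunks[i // CHUNK] |= 1 << (i % CHUNK)

def get_bit(i):
    return bool(chunks[i // CHUNK] & (1 << (i % CHUNK)))

set_bit(450000)
print(get_bit(450000), get_bit(450001))   # True False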