Is there any Python module that allows one to freely reinterpret raw bytes as different data types and perform various arithmetic operations on them? For example, take a look at this C snippet:
char test[9] = "test_data";
int32_t *x = (int32_t *)test;
(*x)++;
printf("%d %.9s\n", *x, test);
//outputs 1953719669 uest_data on a little-endian system.
Here, both test and x point to the same chunk of memory. I can use x to perform arithmetic and test for string-based operations and individual byte access.
I would like to do the same in Python - I know that I can use struct.pack and struct.unpack whenever I feel the need to convert between a sequence of bytes and an integer, but maybe there's a module that makes such fiddling easier.
I'd like a datatype that supports arithmetic and, at the same time, lets me mess with the individual bytes. This doesn't sound like a use case Python was designed for, but it's nevertheless quite a common problem.
I took a look at the PyCrypto codebase, since cryptography is one of the most common use cases for such functionality, but PyCrypto implements most of its algorithms in plain C, and the much smaller Python part uses two handcrafted functions (long_to_bytes and bytes_to_long) to convert between the types.
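To illustrate, here is roughly what I do today with struct alone (a sketch; the explicit little-endian format makes the result deterministic):

```python
import struct

data = bytearray(b"test_data")
# reinterpret the first four bytes as a little-endian int32
(x,) = struct.unpack_from("<i", data)
# write the incremented value back into the same buffer
struct.pack_into("<i", data, 0, x + 1)
print(data)  # bytearray(b'uest_data')
```

It works, but every operation needs an explicit unpack/pack round trip.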
You may be able to use ctypes.Union, which is analogous to Unions in C.
I don't have much experience in C, but as I understand it, unions allow you to write to a memory location as one type and read it back as another, and vice versa, which seems to fit your use case.
https://docs.python.org/3/library/ctypes.html#structures-and-unions
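A minimal sketch of that approach (the class and field names are my own), overlaying a four-byte buffer with a 32-bit integer much like the C snippet in the question:

```python
import ctypes

class IntBytes(ctypes.Union):
    # the same four bytes viewed both as a char array and as an int32
    _fields_ = [("as_bytes", ctypes.c_char * 4),
                ("as_int", ctypes.c_int32)]

u = IntBytes()
u.as_bytes = b"test"
u.as_int += 1
print(u.as_bytes)  # b'uest' on a little-endian machine
```

As in C, which byte changes depends on the machine's endianness.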
Otherwise, if your needs are simpler, you could just convert bytes / bytearray objects from / to integers using the built-in int.to_bytes and int.from_bytes methods.
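For instance, the round trip from the question could look like this (the explicit byte order keeps it portable):

```python
# bytes -> int, do arithmetic, int -> bytes
n = int.from_bytes(b"test", "little")
n += 1
print(n.to_bytes(4, "little"))  # b'uest'
```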
Related
I've used a bit of metaprogramming (with metal and pfr), plus converting a tuple to a string,
to map a C++ POD struct into its equivalent Python struct representation (padding is not accounted for yet, but that is a separate step).
So my questions is, how can I do this better? I'm ok moving forward with this but it seems there must be some way I can simplify this code. Any suggestions?
Code here:
https://github.com/Kubiyak/pybuffer_container/blob/master/meta_example.cpp
Actually, this is a good guide:
"How can you iterate over the elements of an std::tuple?"
I will take the solution based on std::apply.
That cleaned up my code significantly. How can I delete this question? I cannot find the delete button for it.
Can you point out a scenario in which Python's bytearray is useful? Is it simply a non-Unicode string that supports list methods, which can be easily achieved anyway with str objects?
I understand some think of it as "the mutable string". However, when would I need such a thing? And unlike strings or lists, I am strictly limited to ASCII, so in any case I'd prefer the others - is that not true?
Bach,
bytearray is a mutable block of memory. To that end, you can push arbitrary bytes into it and manage your memory allocations carefully. While many Python programmers never worry about memory use, there are those of us that do. I manage buffer allocations in high load network operations. There are numerous applications of a mutable array -- too many to count.
It isn't correct to think of the bytearray as a mutable string. It isn't. It is an array of bytes. Strings have a specific structure. bytearrays do not. You can interpret the bytes however you wish. A string is one way. Be careful though, there are many corner cases when using Unicode strings. ASCII strings are comparatively trivial. Modern code runs across borders. Hence, Python's rather complete Unicode-based string classes.
A bytearray exposes its memory through the buffer protocol. This very rich protocol is the foundation of Python's interoperation with C and enables NumPy and other direct-memory-access technologies. In particular, it allows memory to be easily structured into multi-dimensional arrays in either C or Fortran order.
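For example, a memoryview over a bytearray lets you reinterpret the same memory without copying (a small sketch; as with the C snippet, the visible result depends on native endianness):

```python
buf = bytearray(b"test")
mv = memoryview(buf)
ints = mv.cast("i")  # view the same 4 bytes as one native int32, no copy
ints[0] += 1         # writes through to the underlying bytearray
print(buf)           # bytearray(b'uest') on a little-endian machine
```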
You may never have to use a bytearray.
Anon, Andrew
deque seems to need much more space than bytearray.
>>> sys.getsizeof(collections.deque(range(256)))
1336
>>> sys.getsizeof(bytearray(range(256)))
293
I guess this is because of the layout: deque stores pointers to objects in linked blocks, while bytearray stores raw bytes.
If you need samples using bytearray, I suggest searching online code with Nullege.
bytearray also has one more advantage: you do not need to import anything to use it. This means that people will use it - whether it makes sense or not.
Further reading about bytearray: the bytes type in Python 2.7, and PEP 358.
I am writing a little arbitrary-precision library in C (I know such libraries exist, such as GMP, but I find it more fun to write one myself, just as an exercise), and I would like to know whether arrays are the best way to represent very long integers, or whether there is a better solution (maybe linked lists)? And secondly, how does Python implement big integers? (Does it use arrays or another technique?)
Thanks in advance for any response.
Try reading the documentation for libgmp, which already implements bigints. From what I see, integers are implemented as a dynamically allocated array that is realloc'd when the number needs to grow. See http://gmplib.org/manual/Integer-Internals.html#Integer-Internals.
Python long integers are implemented as a simple structure with just an object header and an array of digits (stored as either 16- or 32-bit ints, depending on the platform).
Code is at http://hg.python.org/cpython/file/f8942b8e6774/Include/longintrepr.h
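You can inspect the digit size your own build uses via sys.int_info:

```python
import sys

# bits_per_digit is the number of value bits per digit (15 or 30);
# sizeof_digit is the C storage size of one digit in bytes (2 or 4)
print(sys.int_info)
```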
I want to send lots of numbers via zeromq but converting them to str is inefficient. What is the best way to send numbers via zmq?
A few options:
use python's struct.pack / struct.unpack methods e.g. struct.pack("!L", 1234567)
use another serializer like msgpack
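A round-trip sketch of the struct approach (network byte order; the zeromq send/recv calls are omitted):

```python
import struct

payload = struct.pack("!L", 1234567)  # 4-byte big-endian unsigned long
assert len(payload) == 4              # fixed-size wire format
(value,) = struct.unpack("!L", payload)
print(value)  # 1234567
```

A fixed-size binary format also lets the receiver read exact frame lengths instead of parsing delimiters.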
You state that converting numbers to str is inefficient. And yet, unless you have a truly exotic network, that is exactly what must occur no matter what solution is chosen, because all networks in wide use today are byte-based.
Of course, some ways of converting numbers to byte-strings are faster than others. Performing the conversion in C code will likely be faster than in Python code, but consider also whether it is acceptable to exclude "long" (bignum) integers. If excluding them is not acceptable, the str function may be as good as it gets.
The struct and cPickle modules may perform better than str if excluding long integers is acceptable.
What is the best way to store between 450,000 and a million Boolean values in a dictionary-like collection indexed by a long number? I need to use the least amount of memory possible. bool and int both take up more than 22 bytes per entry. Is a lower per-Boolean memory footprint possible?
Check this question. Bitarray seems to be the preferred choice.
The two main modules for this are bitarray and bitstring (I wrote the latter). Each will do what you need, but some plus and minus points for each:
bitarray
Written as a C extension so very quick.
Python 2 only.
bitstring
Pure Python.
Python 2.6+ and Python 3.x
Richer array of methods for reading and interpreting data.
So it depends on what you need to do with your data. If it's just storage and retrieval then both will be fine, but for performance critical stuff it's better to use bitarray if you can. Take a look at the docs (bitstring, bitarray) to see which you prefer.
Have you thought about using a hybrid list/bitstring?
Use your list to store one dimension of your bits. Each list item would hold a bitstring of fixed length. You would use your list to focus your search to the bitstring of interest, then use the bitstring to find/modify your bit of interest.
The list should allow the most efficient recall of the bitstrings, the bitstrings should let you pack your data as tightly as possible, and the hybrid should give a compromise between speed (slightly slower access to the bitstring inside the list) and storage (bit-packed data plus list overhead).
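A stdlib-only sketch of the bit-packing idea (the class name and layout are my own; bitarray/bitstring do the same thing faster):

```python
class BitSet:
    """One Boolean per bit, backed by a bytearray."""

    def __init__(self, size):
        self.data = bytearray((size + 7) // 8)

    def __getitem__(self, i):
        return bool(self.data[i >> 3] & (1 << (i & 7)))

    def __setitem__(self, i, value):
        if value:
            self.data[i >> 3] |= 1 << (i & 7)
        else:
            self.data[i >> 3] &= ~(1 << (i & 7)) & 0xFF

flags = BitSet(1_000_000)
flags[123456] = True
print(flags[123456], flags[123457])  # True False
print(len(flags.data))               # 125000 bytes for a million flags
```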