How to encode numpy arrays in a string of minimal length? - python

I have around ten variables (numbers / arrays / symmetric matrices) which I want to pass through a URL. Because I will use a REST API, there is a limit on the size of the URL, so I need to encode them in a string of minimal length and encrypt it. Any ideas? I've always supposed that's how Google and other websites transmit information when the address is downright unintelligible.
My original idea was to encode all numbers in scientific notation and use separators (2.4e14__3.1e12_2.5e10_ for example to pass a number 2.4e14 and an array [3.1e12_2.5e10]) and encode this string. Possibly use another base (digits + letters) to shorten it further, but I'm not sure how much string space that would save.
Maybe there's an existing library or technique? I didn't find one.

Pickle and base64 will do the job nicely. Your floating point numbers remain as binary, not converted through ascii.
>>> import numpy as np
>>> a = np.array([0,1,2])
>>> import pickle
>>> import base64
>>> b64 = base64.urlsafe_b64encode(pickle.dumps(a))  # URL-safe alphabet: '-' and '_' instead of '+' and '/'
At the other end
>>> n = pickle.loads(base64.urlsafe_b64decode(b64))
>>> n
array([0, 1, 2])
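However, this won't be the shortest representation possible, because the pickle carries everything needed to fully reconstruct the object (dtype, shape and so on) along with the data. If it is short enough, it is the most easily extended and modified option. Only unpickle strings from a source you trust: pickle.loads can execute arbitrary code, which matters when the string arrives in a URL.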
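If the pickle turns out too long for the URL, one possible sketch is to send only the raw array bytes at reduced precision, compress them when that helps, and use URL-safe base64 without padding. The helper names encode_array and decode_array, the float32 choice, the one-character flag prefix, and the assumption that the receiver already knows the dtype and shape are all illustrative and not part of the answer above. For a symmetric matrix only the upper triangle is sent. This only encodes; it does not encrypt.

```python
import base64
import zlib

import numpy as np


def encode_array(arr, dtype=np.float32):
    """Pack an array into a short URL-safe string (hypothetical helper)."""
    raw = np.ascontiguousarray(arr, dtype=dtype).tobytes()
    packed = zlib.compress(raw, 9)
    # Compression can make tiny inputs longer, so keep whichever is shorter
    # and record the choice in a one-character prefix ("z" or "r").
    if len(packed) < len(raw):
        flag, body = "z", packed
    else:
        flag, body = "r", raw
    # Strip the "=" padding; it is restored when decoding.
    return flag + base64.urlsafe_b64encode(body).rstrip(b"=").decode("ascii")


def decode_array(text, shape, dtype=np.float32):
    """Inverse of encode_array; shape and dtype must be agreed on beforehand."""
    flag, body = text[0], text[1:]
    body += "=" * (-len(body) % 4)
    data = base64.urlsafe_b64decode(body)
    if flag == "z":
        data = zlib.decompress(data)
    return np.frombuffer(data, dtype=dtype).reshape(shape)


# A symmetric matrix is fully described by its upper triangle.
m = np.array([[1.0, 2.0], [2.0, 3.0]])
n = m.shape[0]
s = encode_array(m[np.triu_indices(n)])
back = decode_array(s, (n * (n + 1) // 2,))

# Rebuild the full matrix on the receiving side.
full = np.zeros((n, n), dtype=np.float32)
full[np.triu_indices(n)] = back
full = full + np.triu(full, 1).T
```

Dropping to float32 (or float16) halves or quarters the payload at the cost of precision, so whether it is acceptable depends on the data.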

You can convert a numpy array to a Python list, then convert the list to a JSON string.
>>> import numpy as np
>>> import json
>>> a = np.array([0,1,2])
>>> b = a.tolist()
>>> c = json.dumps(b)
Similarly, you can convert the JSON string back to numpy: JSON string -> list -> numpy
>>> d = np.array(json.loads(c))

Related

Convert 16 bit hex value to FP16 in Python?

I'm trying to write a basic FP16-based calculator in Python to help me debug some hardware. I can't seem to find how to convert 16-bit hex values into floating point values I can use in my code to do the math. I see lots of online references to numpy, but I think the float16 constructor expects a string like float16("1.2345"). I guess what I'm looking for is something like float16("0xabcd").
Thanks!
The numpy.float16 is indeed a signed floating point format with a 5-bit exponent and 10-bit mantissa.
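To get the result of your example, read the two bytes as a big-endian half float. The plain np.float16 dtype uses native byte order, so on a little-endian machine it would read b'\xab\xcd' as 0xcdab (about -22.67) instead:
import numpy as np
np.frombuffer(b'\xab\xcd', dtype='>f2', count=1)
Result:
array([-0.06094], dtype='>f2')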
Or, to show how you can encode and decode the other example 1.2345:
import numpy as np
a = np.array([1.2345], np.float16)
b = a.tobytes()
print(b)
c = np.frombuffer(b, dtype=np.float16, count=1)
print(c)
Result:
b'\xf0<'
[1.234]
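If you literally need to turn the string you provided into an FP16 value, convert it to two big-endian bytes and read them with the big-endian dtype '>f2' (plain np.float16 uses native byte order, which on a little-endian machine would decode the byte-swapped value 0xcdab, about -22.67):
import numpy as np
s = "0xabcd"
b = int(s, base=16).to_bytes(2, 'big')
print(b)
c = np.frombuffer(b, dtype='>f2', count=1)
print(c)
Output:
b'\xab\xcd'
[-0.06094]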
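For a debugging calculator that handles one value at a time, the same conversion can be sketched with only the standard library. The struct format character "e" is IEEE 754 binary16 (available since Python 3.6); the function names hex_to_fp16 and fp16_to_hex are made up for this illustration, and the hex string is assumed to be the 16-bit pattern written most significant byte first.

```python
import struct


def hex_to_fp16(text):
    """Interpret a 16-bit hex string such as "0xabcd" as a half-precision float."""
    bits = int(text, 16)
    # ">H" packs the integer as two big-endian bytes; ">e" reads them as binary16.
    return struct.unpack(">e", struct.pack(">H", bits))[0]


def fp16_to_hex(value):
    """Round a Python float to half precision and return its bit pattern as hex."""
    bits = struct.unpack(">H", struct.pack(">e", value))[0]
    return "0x{:04x}".format(bits)


print(hex_to_fp16("0x3c00"))  # 1.0
print(fp16_to_hex(1.0))       # 0x3c00
```

The result of hex_to_fp16 is an ordinary Python float, so arithmetic on it happens in double precision; passing the result back through fp16_to_hex rounds it to FP16 again.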

Copying internal formats float64 uint64

I'm using Numpy and Python. I need to copy data, WITHOUT numeric conversion between np.uint64 and np.float64, e.g. 1.5 <-> 0x3ff8000000000000.
I'm aware of float.hex, but the output format is a long way from uint64:
In [30]: a=1.5
In [31]: float.hex(a)
Out[31]: '0x1.8000000000000p+0'
I'm also aware of various string input routines for the other way.
Can anybody suggest more direct methods? After all, it's just a simple copy and type change, but Python/numpy seem really rigid about converting the data on the way.
Use an intermediate array and the frombuffer method to "cast" one array type into the other:
>>> v = 1.5
>>> fa = np.array([v], dtype='float64')
>>> ua = np.frombuffer(fa, dtype='uint64')
>>> ua[0]
4609434218613702656 # 0x3ff8000000000000
Since frombuffer creates a view into the original buffer, this is efficient even for reinterpreting data in large arrays.
So, what you need is to see the 8 bytes that represent the float64 in memory as an integer. (Representing this 64-bit integer as a hexadecimal string is another matter; that is just its representation.)
The Structure and Union functionality that comes bundled with the stdlib's ctypes module may suit you, with no need for numpy. It has a Union type that works much like a C union and allows you to do this:
>>> import ctypes
>>> class Conv(ctypes.Union):
...     _fields_ = [("float", ctypes.c_double), ("int", ctypes.c_uint64)]
...
>>> c = Conv()
>>> c.float = 1.5
>>> print(hex(c.int))
0x3ff8000000000000
The built-in "hex" function is a way to get the hexadecimal representation of the number.
You can use the struct module as well: pack the number to a string as a double, and unpack it as int. I think it is both less readable and less efficient than using ctypes Union:
>>> import struct
>>> hex(struct.unpack("<Q", struct.pack("<d", 1.5))[0])
'0x3ff8000000000000'
Since you are using numpy, however, you can simply reinterpret the array as another type on the fly and manipulate the whole array as integers with zero copying:
>>> import numpy
>>> x = numpy.array((1.5,), dtype=numpy.double)
>>> x[0]
1.5
>>> x = x.view(numpy.uint64)  # same bytes, no copy; assigning to x.dtype also works but is discouraged in recent NumPy
>>> x[0]
4609434218613702656
>>> hex(x[0])
'0x3ff8000000000000'
This is by far the most efficient way of doing it, whatever your purpose in getting the raw bytes of the float64 numbers.
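As a small sketch of the round trip in both directions, the same reinterpretation can be applied to a whole array with ndarray.view. The variable names are arbitrary, and the hex strings are built only for display.

```python
import numpy as np

floats = np.array([1.5, -2.0, 0.0], dtype=np.float64)

# float64 -> uint64: reinterpret the same 8 bytes per element, no conversion.
bits = floats.view(np.uint64)
hex_strings = [format(int(b), "#018x") for b in bits]
print(hex_strings)
# ['0x3ff8000000000000', '0xc000000000000000', '0x0000000000000000']

# uint64 -> float64: the reverse direction works the same way.
restored = np.array([0x3FF8000000000000], dtype=np.uint64).view(np.float64)
print(restored[0])  # 1.5

# Both arrays are views of one buffer, so nothing was copied.
print(np.shares_memory(floats, bits))  # True
```

Because the two arrays share memory, writing through one of them changes what the other reports.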

How to make a fixed-size byte variable in Python

Let's say I have a string (Unicode, if it matters) variable which is less than 100 bytes. I want to create another variable exactly 100 bytes in size which includes this string and is padded with zeros or whatever. How would I do it in Python 3?
For assembling packets to go over the network, or for assembling byte-perfect binary files, I suggest using the struct module.
struct — Interpret bytes as packed binary data
Just for the string, you might not need struct, but as soon as you start also packing binary values, struct will make your life much easier.
Depending on your needs, you might be better off with an off-the-shelf network serialization library, such as Protocol Buffers; or you might even just use JSON for the wire format.
Protocol Buffer Basics: Python
PyMOTW - JavaScript Object Notation Serializer
Something like this should work:
st = "具有"
by = bytes(st, "utf-8")
by += b"0" * (100 - len(by))  # pads with the ASCII character "0" (0x30); use b"\x00" for null bytes
print(by)
# b'\xe5\x85\xb7\xe6\x9c\x890000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
Obligatory addendum since your original post seems to conflate strings with the length of their encoded byte representation: Python unicode explanation
To pad with null bytes you can do it the way they do it in the stdlib base64 module.
some_data = b'foosdsfkl\x05'
null_padded = some_data + bytes(100 - len(some_data))
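The pieces above can be combined into a small sketch that also rejects input that is too long. The helper names to_fixed and from_fixed are hypothetical; bytes.ljust and the struct format "100s" are standard and both pad with null bytes here.

```python
import struct


def to_fixed(text, size=100, encoding="utf-8"):
    """Encode text and pad it with null bytes to exactly `size` bytes."""
    data = text.encode(encoding)
    if len(data) > size:
        raise ValueError("encoded text is %d bytes, limit is %d" % (len(data), size))
    return data.ljust(size, b"\x00")


def from_fixed(block, encoding="utf-8"):
    """Strip the trailing null padding and decode back to text."""
    return block.rstrip(b"\x00").decode(encoding)


block = to_fixed("具有")
print(len(block))         # 100
print(from_fixed(block))  # 具有

# struct pads an "s" field with null bytes in the same way.
same = struct.pack("100s", "具有".encode("utf-8"))
print(same == block)      # True
```

Note that struct.pack silently truncates a value longer than the field, which is why the explicit length check lives in to_fixed.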
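Here's a roundabout way of doing it, if what you need is the in-memory size of the str object as reported by sys.getsizeof (which includes interpreter overhead and varies between Python versions) rather than the length of its encoded bytes: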
>>> import sys
>>> a = "a"
>>> sys.getsizeof(a)
22
>>> a = "aa"
>>> sys.getsizeof(a)
23
>>> a = "aaa"
>>> sys.getsizeof(a)
24
So following this, on this build an ASCII string that occupies 100 bytes in memory needs to be 79 characters long:
>>> a = "".join(["a" for i in range(79)])
>>> len(a)
79
>>> sys.getsizeof(a)
100
The approach above is a fairly simple way of "calibrating" strings to figure out their sizes. You could automate it with a script that pads a string out to the target memory size, which also accounts for other encodings.
import sys

def padder(strng):
    TARGETSIZE = 100
    padChar = "0"
    curSize = sys.getsizeof(strng)
    if curSize <= TARGETSIZE:
        # Prepend one pad character per missing byte (assumes one byte per character).
        for i in range(TARGETSIZE - curSize):
            strng = padChar + strng
        return strng
    else:
        return strng  # Not sure if you need to handle strings that start longer than your target, but you can do that here

Convert int to single byte in a string?

I'm implementing PKCS#7 padding right now in Python and need to pad chunks of my file in order to amount to a number divisible by sixteen. I've been recommended to use the following method to append these bytes:
input_chunk += '\x00'*(-len(input_chunk)%16)
What I need to do is the following:
input_chunk_remainder = len(input_chunk) % 16
input_chunk += input_chunk_remainder * input_chunk_remainder
Obviously, the second line above is wrong; I need to convert the first input_chunk_remainder to a single byte string. How can I do this in Python?
In Python 3, you can create bytes of a given numeric value with the bytes() type; you can pass in a list of integers (between 0 and 255):
>>> bytes([5])
b'\x05'
>>> bytes([5] * 5)
b'\x05\x05\x05\x05\x05'
An alternative method is to use an array.array() with the right number of integers:
>>> import array
>>> array.array('B', 5*[5]).tobytes()
b'\x05\x05\x05\x05\x05'
or use the struct.pack() function to pack your integers into bytes:
>>> import struct
>>> struct.pack('{}B'.format(5), *(5 * [5]))
b'\x05\x05\x05\x05\x05'
There may be more ways.. :-)
In Python 2 (ancient now), you can do the same by using the chr() function:
>>> chr(5)
'\x05'
>>> chr(5) * 5
'\x05\x05\x05\x05\x05'
In Python3, the bytes built-in accepts a sequence of integers. So for just one integer:
>>> bytes([5])
b'\x05'
Of course, that's bytes, not a string. But in the Python 3 world, the OP would probably use bytes for the app he described anyway.
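Putting bytes([n]) to work for the padding itself could look like the sketch below. It follows the usual PKCS#7 rule that the pad length is block_size minus the remainder (so a full block is added when the input is already aligned) and that each padding byte holds that length. The function names pkcs7_pad and pkcs7_unpad are illustrative, not taken from any library.

```python
def pkcs7_pad(data, block_size=16):
    """Append PKCS#7 padding so len(result) is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded, block_size=16):
    """Validate and remove PKCS#7 padding."""
    if not padded or len(padded) % block_size:
        raise ValueError("input is not a whole number of blocks")
    pad_len = padded[-1]  # indexing bytes gives an int in Python 3
    if not 1 <= pad_len <= block_size:
        raise ValueError("invalid padding length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]


chunk = b"hello"
padded = pkcs7_pad(chunk)
print(padded)               # b'hello\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b'
print(pkcs7_unpad(padded))  # b'hello'
```

For real encryption work, a maintained cryptography library's padding routines are usually a safer choice than hand-rolled ones.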

Is there a way to pad to an even number of digits?

I'm trying to create a hex representation of some data that needs to be transmitted (specifically, in ASN.1 notation). At some points, I need to convert data to its hex representation. Since the data is transmitted as a byte sequence, the hex representation has to be padded with a 0 if the length is odd.
Example:
>>> hex2(3)
'03'
>>> hex2(45)
'2d'
>>> hex2(678)
'02a6'
The goal is to find a simple, elegant implementation for hex2.
Currently I'm using hex, stripping out the first two characters, then padding the string with a 0 if its length is odd. However, I'd like to find a better solution for future reference. I've looked in str.format without finding anything that pads to a multiple.
def hex2(n):
    x = '%x' % (n,)
    return ('0' * (len(x) % 2)) + x
To be totally honest, I am not sure what the issue is. A straightforward implementation of what you describe goes like this:
def hex2(v):
    s = hex(v)[2:]
    return s if len(s) % 2 == 0 else '0' + s
I would not necessarily call this "elegant" but I would certainly call it "simple."
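Python's binascii.b2a_hex is guaranteed to return an even-length result.
The trick then is to convert the integer into a bytestring. Python 3.2 and higher has that built into int:
from binascii import b2a_hex
def hex2(integer):
    # Use at least one byte so that 0 becomes '00' rather than an empty string.
    nbytes = max(1, (integer.bit_length() + 7) // 8)
    # b2a_hex returns bytes in Python 3, so decode to get a str.
    return b2a_hex(integer.to_bytes(nbytes, 'big')).decode('ascii')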
You might want to look at the struct module, which is designed for byte-oriented I/O.
>>> import struct
>>> struct.pack('>i', 678)
b'\x00\x00\x02\xa6'
>>> # Use h instead of i for shorts
>>> struct.pack('>h', 1043)
b'\x04\x13'
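Since str.format has no "pad to a multiple" option, one compact sketch is to format first and then zero-fill to the next even width with str.zfill. It assumes non-negative integers, which is what the examples in the question use.

```python
def hex2(n):
    """Hex digits of a non-negative int, left-padded with 0 to an even length."""
    if n < 0:
        raise ValueError("hex2 expects a non-negative integer")
    digits = format(n, "x")
    # len(digits) % 2 is 1 for odd lengths, so this rounds the width up to even.
    return digits.zfill(len(digits) + len(digits) % 2)


print(hex2(3))    # 03
print(hex2(45))   # 2d
print(hex2(678))  # 02a6

# An even-length result can be turned straight into bytes for transmission.
print(bytes.fromhex(hex2(678)))  # b'\x02\xa6'
```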
