I'm trying to use the pack function in the struct module to encode data into formats required by a network protocol. I've run into a problem in that I don't see any way to encode arrays of anything other than 8-bit characters.
For example, to encode "TEST", I can use format specifier "4s". But how do I encode an array or list of 32-bit integers or other non-string types?
Here is a concrete example. Suppose I have a function doEncode which takes an array of 32-bit values. The protocol requires a 32-bit length field, followed by the array itself. Here is what I have been able to come up with so far.
from array import array
from struct import pack

def doEncode(arr):
    bin = pack('>i' + len(arr)*'I', len(arr), ???)

arr = array('I', [1, 2, 3])
doEncode(arr)
The best I have been able to come up with is generating the pack format string dynamically from the length of the array. Is there some way of specifying that I have an array, so I don't need to do this, the way there is for a string (e.g. pack('>i' + str(len(arr)) + 's', ...))?
Even with the above approach, I'm not sure how I would actually pass the elements of the array in a similarly dynamic way, i.e. I can't just write arr[0], arr[1], ... because I don't know ahead of time what the length will be.
I suppose I could just pack each individual integer in the array in a loop, and then join all the results together, but this seems like a hack. Is there some better way to do this? The array and struct modules each seem to do their own thing, but in this case what I'm trying to do is a combination of both, which neither wants to do.
data = pack('>i', len(arr)) + arr.tobytes()  # arr.tostring() on old Pythons; removed in 3.9
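Putting it together, a minimal sketch of the dynamic-format approach from the question (the encode name and the sample values are just for illustration): the length goes into the format string, and the elements are passed with * so you don't need to spell out arr[0], arr[1], ...:

```python
from array import array
from struct import pack

def encode(arr):
    # build the format string dynamically and splat the
    # array elements with * so the count is not fixed
    n = len(arr)
    return pack('>i%dI' % n, n, *arr)

arr = array('I', [1, 2, 3])
encoded = encode(arr)
# 4-byte big-endian length, then three 4-byte big-endian values
assert encoded == (b'\x00\x00\x00\x03'
                   b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03')
```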
I have a numpy.ndarray named values containing numpy.unicode_ strings and I have a C function foo that consumes an array of C-strings. There is a CFFI wrapper interface for foo.
So I have tried to do something like this
p = ffi.from_buffer("char**", values)
and also
p = ffi.from_buffer("char*[]", values)
This doesn't give any errors in CFFI. But once I run the code it crashes in the C implementation of foo and indeed when I look at the pointers they look bad:
(gdb) p d
$1 = (char **) 0x1f978a50
(gdb) p d[0]
$2 = 0x7300000061 <error: Cannot access memory at address 0x7300000061>
I am on a 64 bit architecture.
It won't work like you are trying to do, because the numpy array contains pointers to Python objects (all of type str), I believe. In any case, it is something other than a raw array of char * pointers to the UTF-8-encoded versions of the strings.
I think there is no automatic way to do the conversion. You need to do the loop over the items manually, and manually convert all the strings to char[] arrays, and make sure they are all kept alive long enough. This should do it:
items = [ffi.new("char[]", x.encode('utf-8')) for x in values]
p = ffi.new("char *[]", items)
# keep 'items' alive as long as you need 'p'
or, if all you need is to call a C function that expects a char ** argument, you can rely on the automatic Python-list-to-C-array conversion, as long as every item of the Python list is a char *:
items = [ffi.new("char[]", x.encode('utf-8')) for x in values]
lib.my_c_function(items)
The problem is that numpy does not represent an array of C strings as char*[]. It is more like one big char[] in which all the strings are stored at strides of .itemsize, which for an array of strings is the size of the longest occurring string; shorter strings are padded with zero bytes. Also, the optional first argument cdecl of ffi.from_buffer does not perform any rigorous type checking on the underlying buffer/memory view: it is the programmer's responsibility to know the correct type of the buffer.
The cdecl argument will provide type safety when for instance used in conjunction with calls to other CFFI wrapped functions.
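The padded single-buffer layout described above can be checked directly from numpy, without CFFI at all; a small sketch with illustrative sample values:

```python
import numpy as np

values = np.array(['ab', 'c'])
# itemsize = longest string (2 chars) times 4 bytes per UCS-4 char
assert values.dtype.kind == 'U'
assert values.itemsize == 8
# the raw buffer is one contiguous block; 'c' is zero-padded to itemsize
raw = values.tobytes()
assert len(raw) == 2 * values.itemsize
assert raw[8:16] == b'c\x00\x00\x00\x00\x00\x00\x00'
```

This is why reinterpreting the buffer as char** produces garbage pointers: the bytes 0x7300000061 in the gdb output are UCS-4 character data ('a', 's') being misread as an address.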
The way I solved this is by allocating a separate array of char pointers in CFFI:
t = ffi.new('char*[]', array_size)
Next, massage the numpy array a bit to guarantee that each string is null-terminated, then implement some logic in Python (or in C, wrapped with CFFI, if performance is required) to point each member of the char*[] array at its corresponding string in the numpy array.
I am very new to Python and I would like to write the following (something like fprintf in MATLAB). I do not know why this is not working.
Here is the code
import numpy as np
coord=np.linspace(0,10,5)
keyy=("LE")
key=np.repeat(keyy,5)
out_arr=np.array_str(key)
zip=np.array([coord,out_arr])
zzip=zip.T
print(zzip)
savefile=np.savetxt("nam.dat",zzip,fmt="%f %s")
The problem is with the following line:
out_arr=np.array_str(key)
This is converting the array ['LE' 'LE' 'LE' 'LE' 'LE'] to the string "['LE' 'LE' 'LE' 'LE' 'LE']". Note the quotes. This is no longer an array, it is a single string, and numpy interprets it as a length-1 array. You first need to drop that line:
key=np.repeat(keyy,5)
zip=np.array([coord,key])
The next problem you will run into is that this will convert the coord numbers into strings, so every element ends up a string. This is because numpy arrays have a single, fixed type (there are exceptions, but they are more complicated), and the only way to hold both columns in one plain array is to make everything a string.
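You can see this coercion with a two-element example (sample values are illustrative):

```python
import numpy as np

# mixing a float and a string in a plain array coerces everything
# to the common type, which is a fixed-width unicode string
mixed = np.array([1.5, 'LE'])
assert mixed.dtype.kind == 'U'   # unicode string dtype
assert mixed[0] == '1.5'         # the float has become a string
```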
The simple way around this is to use an "object" array (roughly the counterpart of a MATLAB cell array), which stores arbitrary Python objects rather than fixed-size data:
zip=np.array([coord,key], dtype='object')
However, the better solution if you can is to use pandas. Pandas is sort of like MATLAB tables, but much more powerful. It is designed for this sort of data, and has very nice functions for writing text files like you want to do here in a cleaner, more explicit way.
Also, zip is a built-in function, and it is better not to name variables the same names as built-in functions. It is allowed, but zip is an important function and you don't want to block access to it.
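Putting the fixes together (dropping the np.array_str line, using an object array, and renaming the zip variable; the file name is the one from the question):

```python
import numpy as np

coord = np.linspace(0, 10, 5)
key = np.repeat("LE", 5)

# object dtype keeps the floats as floats and the strings as strings
pairs = np.array([coord, key], dtype=object).T

# one "%f %s" row per line, e.g. "0.000000 LE"
np.savetxt("nam.dat", pairs, fmt="%f %s")
```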
I can't understand when I should use the pack and unpack functions in Python's struct library.
I also can't understand how to use them.
After reading about it, what I understood is that they are used to convert data into binary. However when I run some examples like:
>>> struct.pack("i",34)
'"\x00\x00\x00'
I can't make any sense of it.
I want to understand its purpose, how these conversions take place, what '\x' and the other symbols represent, and how unpacking works.
I can't understand when I should use the pack and unpack functions in Python's struct library.
Then you probably don't have cause to use them.
Other people deal with network and file formats that have low-level binary packing, where struct can be very useful.
However when I run some examples like:
>>> struct.pack("i",34)
'"\x00\x00\x00'
I can't make any sense of it.
The \x notation represents individual bytes of your bytes object in hexadecimal. \x00 means the byte's value is 0, \x02 means the byte's value is 2, \x10 means the byte's value is 16, and so on. " is byte 34 (hex 0x22), so we see a " instead of \x22 in this view of the string, but '\x22\x00\x00\x00' and '"\x00\x00\x00' are the same string.
http://www.swarthmore.edu/NatSci/echeeve1/Ref/BinaryMath/NumSys.html might help you with some background if that is the level you need to understand numbers at.
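To make the repr less mysterious, a small sketch (shown with Python 3 bytes literals; the question's output is the Python 2 repr of the same bytes, and '<' forces little-endian so the result is the same on any machine):

```python
import struct

# 34 is 0x22, the ASCII code for '"', so the first byte is shown
# as '"' rather than '\x22' in the repr
data = struct.pack('<i', 34)        # little-endian 32-bit int
assert data == b'"\x00\x00\x00'
assert data == b'\x22\x00\x00\x00'  # same bytes, written differently

# unpack reverses the conversion; it always returns a tuple
assert struct.unpack('<i', data) == (34,)
```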
I'm making a script to get Valve's server information (players online, map, etc)
the packet I get when I request for information is this:
'\xff\xff\xff\xffI\x11Stargate Central CAP SBEP\x00sb_wuwgalaxy_fix\x00garrysmod\x00Spacebuild\x00\xa0\x0f\n\x0c\x00dw\x00\x0114.09.08\x00\xb1\x87i\x06\xb4g\x17.\x15#\x01gm:spacebuild3\x00\xa0\x0f\x00\x00\x00\x00\x00\x00'
This may help you to see what I'm trying to do https://developer.valvesoftware.com/wiki/Server_queries#A2S_INFO
The problem is, I don't know how to decode this properly, it's easy to get the string but I have no idea how to get other types like byte and short
for example '\xa0\x0f'
For now I'm doing multiple split but do you know if there is any better way of doing this?
Python has functions for encoding/decoding different data types into bytes. Take a look at the struct package, the functions struct.pack() and struct.unpack() are your friends there.
taken from https://docs.python.org/2/library/struct.html
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
The first argument of the unpack function defines the format of the data stored in the second argument. Now you need to translate the description given by valve to a format string. If you wanted to unpack 2 bytes and a short from a data string (that would have a length of 4 bytes, of course), you could do something like this:
(first_byte, second_byte, the_short) = unpack("!cch", data)
You'll have to take care yourself, to get the correct part of the data string (and I don't know if those numbers are signed or not, be sure to take care of that, too).
The strings you'll have to handle differently (they are null-terminated here, so start where you know a string starts and read up to the first "\0" byte).
pack() works the other way around and stores data in a byte string. Take a look at the examples in the Python docs and play around with it a bit to get a feel for it (e.g. when a tuple is returned or needed).
struct also helps you get the byte order right, which most of the time is network byte order and may differ from your system's. That is of course only necessary for multi-byte integers (like short), so a format string of "!h" should unpack a short correctly.
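One caveat for this particular protocol: Valve's wire format is little-endian, so '<' rather than '!' is the right byte-order prefix here. A sketch of parsing the start of the A2S_INFO reply from the question (field order per the linked wiki page; the helper name and offsets are illustrative):

```python
import struct

packet = (b'\xff\xff\xff\xffI\x11Stargate Central CAP SBEP\x00'
          b'sb_wuwgalaxy_fix\x00garrysmod\x00Spacebuild\x00'
          b'\xa0\x0f\n\x0c\x00dw\x00\x0114.09.08\x00')

def read_cstring(data, offset):
    # strings in the reply are null-terminated
    end = data.index(b'\x00', offset)
    return data[offset:end], end + 1

# 4-byte header (-1), the reply type byte 'I', the protocol version byte
header, reply_type, protocol = struct.unpack_from('<icB', packet, 0)
assert header == -1 and reply_type == b'I' and protocol == 0x11

name, pos = read_cstring(packet, 6)
game_map, pos = read_cstring(packet, pos)
folder, pos = read_cstring(packet, pos)
game, pos = read_cstring(packet, pos)
assert name == b'Stargate Central CAP SBEP'
assert game_map == b'sb_wuwgalaxy_fix'

# next a little-endian short (the app ID), then two player-count bytes
app_id, players, max_players = struct.unpack_from('<hBB', packet, pos)
assert app_id == 4000        # b'\xa0\x0f' as a little-endian short
assert (players, max_players) == (10, 12)
```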
I want to put numerics and strings into the same numpy array. However, I very rarely (difficult to replicate, but sometimes) run into an error where the numeric-to-string conversion results in a value that cannot be translated back into a decimal (i.e., I get "9.8267567e" instead of "9.8267567e-5" in the array). This is causing problems after writing files. Here is an example of what I am doing (though on a much smaller scale):
import numpy as np
x = np.array(.94749128494582)
y = np.array(x, dtype='|S100')
My understanding is that this should allow 100 string characters, but sometimes I am seeing a cut-off after ~10. Is there another type that I should be assigning, or a way to limit the number of characters in my array (x)?
First of all, x = np.array(.94749128494582) may not be doing what you think because the argument passed into np.array should be some kind of sequence or something with the array interface. Perhaps you meant x = np.array([.94749128494582])?
Now, as for preserving the strings properly, you could solve this by using
y = np.array(x, dtype=object)
However, as Joe has mentioned in his comment, it's not very numpythonic and you may as well be using plain old python lists.
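A minimal sketch of the object-dtype approach with mixed contents (sample values are illustrative):

```python
import numpy as np

# object dtype keeps each element as-is: floats stay floats,
# strings stay strings, so nothing is converted or truncated
mixed = np.array([9.8267567e-5, 'label'], dtype=object)
assert mixed[0] == 9.8267567e-5           # still a float
assert isinstance(mixed[1], str)          # still a string
assert str(mixed[0]) == '9.8267567e-05'   # full precision when printed
```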
I would recommend examining carefully why you seem to need to hold strings and numbers in the same array; it smells like you might have inappropriate data structures set up and could benefit from redesigning/refactoring. numpy arrays are built for fast numerical operations; they are not really suited to string manipulation or to serving as a kind of storage/database.