How to convert the string between numpy.array and bytes - python

I want to convert a string to bytes first, and then convert the bytes to a NumPy array:
utf8 string -> bytes -> numpy.array
And then:
numpy.array -> bytes -> utf8 string
Here is the test:
import numpy as np
string = "any_string_in_utf8: {}".format(123456)
test = np.frombuffer(bytes(string, 'utf-8'))
print(test)
Here is the output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_9055/3077694159.py in <cell line: 5>()
3 string = "any_string_in_utf8: {}".format(123456)
4
----> 5 test = np.frombuffer(bytes(string, 'utf-8'))
6 print(test)
ValueError: buffer size must be a multiple of element size
How to convert the string between numpy.array and bytes?

Solution:
The main problem in your code is that you did not specify a dtype. np.frombuffer defaults to dtype=float (float64, 8 bytes per element), and the encoded string is 26 bytes long, which is not a multiple of 8. That is why it raises ValueError: buffer size must be a multiple of element size.
If you read the buffer as unsigned 8-bit integers (dtype=np.uint8) instead, every byte becomes one array element, so a buffer of any length works. See the code snippet below:
# Import all the Important Modules
import numpy as np
# Initialize String
utf_8_string = "any_string_in_utf8: {}".format(123456)
# utf8 string -> bytes -> numpy.array
np_array = np.frombuffer(bytes(utf_8_string, 'utf-8'), dtype=np.uint8)
# Print 'np_array'
print("Numpy Array after Bytes Conversion : -")
print(np_array)
# numpy.array -> bytes -> utf8 string
result_str = np_array.tobytes().decode("utf-8")
# Print Result for the Verification of 'Data Loss'
print("\nOriginal String After Conversion : - \n" + result_str)
# Output of the Above Code: -
Numpy Array after Bytes Conversion : -
[ 97 110 121 95 115 116 114 105 110 103 95 105 110 95 117 116 102 56
58 32 49 50 51 52 53 54]
Original String After Conversion : -
any_string_in_utf8: 123456
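The round trip above can be wrapped in two small helpers. This is a minimal sketch: the names str_to_array and array_to_str are hypothetical and not part of NumPy. It also shows where the 8-byte element size in the original error comes from, and that the array returned by np.frombuffer is read-only because it shares memory with the immutable bytes object.

```python
import numpy as np

def str_to_array(text, encoding="utf-8"):
    """Encode text and view the resulting bytes as a read-only uint8 array."""
    return np.frombuffer(text.encode(encoding), dtype=np.uint8)

def array_to_str(arr, encoding="utf-8"):
    """Turn a uint8 array back into text."""
    return arr.tobytes().decode(encoding)

original = "any_string_in_utf8: {}".format(123456)
encoded = original.encode("utf-8")

# 26 bytes cannot be split into 8-byte float64 elements, hence the ValueError
# when no dtype is given.
leftover = len(encoded) % np.dtype(float).itemsize

arr = str_to_array(original)
restored = array_to_str(arr)

# The array is a view on the bytes object, so it cannot be modified in place;
# take a copy first if it needs to be changed.
writable = arr.copy()
writable[0] = ord("A")

print(leftover)
print(restored == original)
print(array_to_str(writable))
```

If the array has to be modified before converting back, copying it as shown is probably the simplest option.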

Related

convert string to hexadecimal in R

I am trying to convert a string such as
str <- 'KgHDRZ3N'
to hexadecimal. The function as.hexmode() returns an error
Error in as.hexmode("KgHDRZ3N") :
'x' cannot be coerced to class "hexmode"
are there any other functions I can use to achieve this?
I am trying to replicate what .encode() in python does.
You can use charToRaw
charToRaw(str)
#> [1] 4b 67 48 44 52 5a 33 4e
Or if you want a character vector
as.hexmode(as.numeric(charToRaw(str)))
#> [1] "4b" "67" "48" "44" "52" "5a" "33" "4e"
or if you want the "0x" prefix:
paste0("0x", as.hexmode(as.numeric(charToRaw(str))))
#> [1] "0x4b" "0x67" "0x48" "0x44" "0x52" "0x5a" "0x33" "0x4e"
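For comparison, here is a short sketch of the Python side that the question wants to replicate. The variable names are only illustrative. str.encode() yields the raw bytes, and each byte can then be formatted as two hex digits, which mirrors the three R outputs above.

```python
text = "KgHDRZ3N"

# .encode() gives the raw bytes, like charToRaw() in R.
data = text.encode("utf-8")

# One continuous hex string.
hex_string = data.hex()

# One two-digit hex string per byte, with and without the "0x" prefix.
hex_list = [format(b, "02x") for b in data]
prefixed = ["0x" + format(b, "02x") for b in data]

print(hex_string)
print(hex_list)
print(prefixed)
```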

Read binary file: Matlab differs from R and python

I am trying to recreate, in either R or Python, a Matlab script that reads a binary file. Here is a link to the test.bin data: https://github.com/AndrewHFarkas/BinaryReadTest
Matlab
FilePath = 'test.bin'
fid=fopen(FilePath,'r','b');
VersionStr=fread(fid,7,'char')
Version=fread(fid,1,'int16')
SizeFormat='float32'
DataFormat='float32'
EegMegStatus=fread(fid,1,SizeFormat)
NChanExtra=fread(fid,1,SizeFormat)
TrigPoint=fread(fid,1,'float32')
DataTypeVal=fread(fid,1,'float32')
TmpSize=fread(fid,2,SizeFormat)
AvgMat=fread(fid,1,DataFormat)
Matlab output:
VersionStr =
86
101
114
115
105
111
110
Version =
8
SizeFormat =
'float32'
DataFormat =
'float32'
EegMegStatus =
1
NChanExtra =
0
TrigPoint =
1
DataTypeVal =
0
TmpSize =
65
1076
AvgMat =
-12.9650
This is my closest attempt with Python (I found some of this code in a different Stack Overflow answer):
import numpy as np
import array
def fread(fid, nelements, dtype):
    if dtype is np.str:
        dt = np.uint8  # WARNING: assuming 8-bit ASCII for np.str!
    else:
        dt = dtype
    data_array = np.fromfile(fid, dt, nelements)
    data_array.shape = (nelements, 1)
    return data_array
fid = open('test.bin', 'rb');
print(fread(fid, 7, np.str)) # so far so good!
[[ 86]
[101]
[114]
[115]
[105]
[111]
[110]]
#both of these options return 2048
print(fread(fid, 1, np.int16))
np.fromfile(fid, np.int16, 1)
And no matter what else I've tried I can't get any of the same numbers past that point. I have tried using little and big endian settings, but maybe not correctly.
If it helps, here is my closest attempt in R:
newdata = file("test.bin", "rb")
version_char = readBin(newdata, "character", n=1)
version_char
[1] "Version" # this makes sense because the first 7 bytes spell Version
version_num = readBin(newdata, "int", size = 1 , n = 1, endian = "little")
version_num
[1] 8 #correct number
And nothing after that matches. This is where I get really confused, because I was only able to get 8 with a (byte) size = 1 for version_num, but an int16 should be two bytes as far as I understand. I have tried the code below to read in a float, as suggested in another post:
readBin(newdata, "double", size = 4 , n = 1, endian = "little")
Thank you all for your time
struct is in general what you would use to unpack binary data:
>>> import struct
>>> keys = "VStr vNum EegMeg nChan trigPoint dType tmpSize[0] tmpSize[1] avgMat".split()
>>> dict(zip(keys,struct.unpack_from(">7sh7f",open("test.bin","rb").read())))
{'nChan': 0.0, 'EegMeg': 1.0, 'dType': 0.0, 'trigPoint': 1.0, 'VStr': 'Version', 'avgMat': -12.964995384216309, 'vNum': 8, 'tmpSize[0]': 65.0, 'tmpSize[1]': 1076.0}
The format string ">7sh7f" says: big-endian (>), a string of 7 characters, followed by a short int (int16), followed by 7 float32 values. The Matlab call fopen(FilePath,'r','b') also opens the file as big-endian, which is why a native little-endian read of the int16 returns 2048 (0x0800) instead of 8 (0x0008).
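The same header layout can also be described with a NumPy structured dtype that uses explicit big-endian fields. The following is a sketch under assumptions: the field names are made up for illustration, and because test.bin is not available here, the bytes are a synthetic stand-in built with struct.pack from the values in the Matlab output.

```python
import struct
import numpy as np

# Synthetic stand-in for the first 37 bytes of test.bin (assumption: values
# copied from the Matlab output, packed big-endian).
raw = struct.pack(">7sh7f", b"Version", 8, 1.0, 0.0, 1.0, 0.0, 65.0, 1076.0, -12.965)

# ">" marks each numeric field as big-endian; structured dtypes are packed
# (no padding) by default, so the layout matches the file byte for byte.
header_dtype = np.dtype([
    ("version_str", "S7"),
    ("version", ">i2"),
    ("eeg_meg_status", ">f4"),
    ("n_chan_extra", ">f4"),
    ("trig_point", ">f4"),
    ("data_type_val", ">f4"),
    ("tmp_size", ">f4", (2,)),
    ("avg_mat", ">f4"),
])

header = np.frombuffer(raw, dtype=header_dtype, count=1)[0]

# Reading the same two bytes as little-endian reproduces the 2048 from the
# question.
wrong_version = np.frombuffer(raw, dtype="<i2", count=1, offset=7)[0]

print(header["version_str"], header["version"], header["tmp_size"])
print(wrong_version)
```

With the real file, np.fromfile(fid, dtype=header_dtype, count=1) should read the same record, assuming the file really follows this layout.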

Reading 4 bytes with struct.unpack

I have a file whose bytes #11-15 hold an integer that is 4 bytes long. Using struct.unpack, I want to read it as a 4-byte integer. Right now, with PACK_FORMAT set to 8s2s4B2s16B96s40B40B, I read 4 separate bytes:
PACK_FORMAT = '8s2s4B2s16B96s40B40B'
fd = open('./myfile', 'r')
hdrBytes = fd.read(208)
print(repr(hdrBytes))
foo = struct.unpack(PACK_FORMAT, hdrBytes)
(Pdb) foo[0]
'MAGICSTR'
(Pdb) foo[1]
'01'
(Pdb) foo[2:6]
(48, 50, 48, 48)
(Pdb) print repr(hdrBytes)
'MAGICSTR010200a0000000001e100010........`
Now I can convert these 4 bytes to an int as:
(Pdb) int(''.join([chr(x) for x in foo[2:6]]), 16)
512
When I modify PACK_FORMAT to use i instead of 4B to read the 4 bytes, I always get an error:
foo = struct.unpack(PACK_FORMAT, hdrBytes)
error: unpack requires a string argument of length 210
It looks like you're running afoul of the alignment requirement: integers must be on a 4-byte boundary on your machine.
You can turn off alignment by starting your format string with an equals sign:
PACK_FORMAT = '=8s2si2s16B96s40B40B'
It has to do with alignment; see the struct module documentation. Comparing the computed sizes shows the two padding bytes:
import struct
PACK_FORMAT1 = '8s 2s 4B 2s 16B 96s 40B 40B'
print(struct.Struct(PACK_FORMAT1).size) # -> 208
PACK_FORMAT2 = '8s 2s i 2s 16B 96s 40B 40B'
print(struct.Struct(PACK_FORMAT2).size) # -> 210
PACK_FORMAT3 = '=8s 2s i 2s 16B 96s 40B 40B'
print(struct.Struct(PACK_FORMAT3).size) # -> 208
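One caveat, going by the question's own output: foo[2:6] is (48, 50, 48, 48), which is the ASCII text "0200", and the asker gets 512 by parsing that text in base 16. The i format would instead interpret those four bytes as a binary integer. A sketch of reading the field as a 4-byte string and parsing it afterwards follows; the header bytes are synthetic, since only the first few bytes appear in the post.

```python
import struct

# Synthetic 208-byte header (assumption: only the first 16 bytes follow the
# repr shown in the question, the rest is zero padding).
hdr = b"MAGICSTR" + b"01" + b"0200" + b"a0" + bytes(16 + 96 + 40 + 40)

# "=" disables native alignment; "4s" keeps the field as 4 raw bytes.
PACK_FORMAT = "=8s2s4s2s16B96s40B40B"
fields = struct.unpack(PACK_FORMAT, hdr)

magic, version, hex_field = fields[0], fields[1], fields[2]

# The field holds hexadecimal digits as text, so parse it in base 16.
value = int(hex_field.decode("ascii"), 16)

# Interpreting the same bytes as a binary int gives an unrelated number.
as_binary_int = struct.unpack("<i", hex_field)[0]

print(struct.calcsize(PACK_FORMAT))
print(magic, version, hex_field, value)
print(as_binary_int)
```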

from hex string to int and back

I'm using Scapy to forge packets in Python, but I need to manually modify a sequence of bits (that scapy doesn't support) inside a specific packet, so I do the following:
Given a packet p, I convert it to a hex string, then to base 10 and finally to a binary number. I modify the bits I'm interested in, then I convert it back to a packet. I have trouble converting it back to the same format of hex string...
# I create a packet with Scapy
In [3]: p = IP(dst="www.google.com") / TCP(sport=10000, dport=10001) / "asdasdasd"
In [6]: p
Out[6]: <IP frag=0 proto=tcp dst=Net('www.google.com') |<TCP sport=webmin dport=10001 |<Raw load='asdasdasd' |>>>
# I convert it to a hex string
In [7]: p_str = str(p)
In [8]: p_str
Out[8]: "E\x00\x001\x00\x01\x00\x00#\x06Q\x1c\x86;\x81\x99\xad\xc2t\x13'\x10'\x11\x00\x00\x00\x00\x00\x00\x00\x00P\x02 \x00\x19a\x00\x00asdasdasd"
# I convert it to an integer
In [9]: p_int = int(p_str.encode('hex'), 16)
In [10]: p_int
Out[10]: 2718738542629841457617712654487115358609175161220115024628433766520503527612013312415911474170471993202533513363026788L
# Finally, I convert it to a binary number
In [11]: p_bin = bin(p_int)
In [11]: p_bin
Out[11]: '0b1000101000000000000000000110001000000000000000100000000000000000100000000000110010100010001110010000110001110111000000110011001101011011100001001110100000100110010011100010000001001110001000100000000000000000000000000000000000000000000000000000000000000000101000000000010001000000000000000011001011000010000000000000000011000010111001101100100011000010111001101100100011000010111001101100100'
# ... (I modify some bits in p_bin, for instance the last three)...
In [12]: p_bin_modified = p_bin[:-3] + '000'
# I convert it back to a packet!
# First to int
In [13]: p_int_modified = int(p_bin_modified, 2)
In [14]: p_int_modified
Out[14]: 2718738542629841457617712654487115358609175161220115024628433766520503527612013312415911474170471993202533513363026784L
# Then to a hex string
In [38]: hex(p_int_modified)
Out[38]: '0x45000031000100004006511c863b8199adc274132710271100000000000000005002200019610000617364617364617360L'
Oops! It doesn't really look like the format of the original hex string. Any ideas on how to do it?
EDIT:
ok, I found decode('hex'), which works on a hex number, but it breaks the reflexivity of the whole conversion...
In [73]: hex(int(bin(int(str(p).encode('hex'), 16)), 2)).decode('hex')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-73-f5b9d74d557f> in <module>()
----> 1 hex(int(bin(int(str(p).encode('hex'), 16)), 2)).decode('hex')
/usr/lib/python2.7/encodings/hex_codec.pyc in hex_decode(input, errors)
40 """
41 assert errors == 'strict'
---> 42 output = binascii.a2b_hex(input)
43 return (output, len(input))
44
TypeError: Odd-length string
EDIT2: I get the same error if I remove the conversion to a binary number...
In [13]: hex(int(str(p).encode('hex'), 16)).decode('hex')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/ricky/<ipython-input-13-47ae9c87a5d2> in <module>()
----> 1 hex(int(str(p).encode('hex'), 16)).decode('hex')
/usr/lib/python2.7/encodings/hex_codec.pyc in hex_decode(input, errors)
40 """
41 assert errors == 'strict'
---> 42 output = binascii.a2b_hex(input)
43 return (output, len(input))
44
TypeError: Odd-length string
Ok, I solved it.
I have to strip the trailing L in the long int and the leading 0x in the hex representation.
In [76]: binascii.unhexlify(hex(int(binascii.hexlify(str(p)), 16)).lstrip('0x').rstrip('L')) == str(p)
Out[76]: True
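Note that the hex-string round trip above drops leading zero digits, so it would fail with "Odd-length string" again for data whose first byte is below 0x10. In Python 3, a sketch that avoids string manipulation altogether could use int.from_bytes and int.to_bytes, which keep the original length. The function name clear_low_bits and the sample bytes are hypothetical; with Scapy under Python 3 the input would presumably come from bytes(p).

```python
def clear_low_bits(packet_bytes, n_bits):
    """Return a copy of packet_bytes with its n_bits lowest bits set to 0."""
    value = int.from_bytes(packet_bytes, "big")
    value &= ~((1 << n_bits) - 1)
    # Passing the original length keeps any leading zero bytes intact.
    return value.to_bytes(len(packet_bytes), "big")

# Sample data that starts with a zero byte, which the hex-string approach
# would lose.
raw = b"\x00\x01E\x00\x001asdasdasd"
modified = clear_low_bits(raw, 3)

print(modified)
print(len(raw) == len(modified))
```

Clearing the last three bits turns the final "d" (0x64) into "`" (0x60), matching the 60 at the end of the modified hex value in the question.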

Why is deviceID getting set to an empty string by the time I print it?

This one is driving me crazy:
I defined these two functions to convert between bytes and int resp. bytes and bits:
def bytes2int(bytes) : return int(bytes.encode('hex'), 16)
def bytes2bits(bytes) : # returns the bits as a 8-zerofilled string
    return ''.join('{0:08b}'.format(bytes2int(i)) for i in bytes)
This works as expected:
>>> bytes2bits('\x06')
'00000110'
Now I'm reading in a binary file (with a well defined data structure) and printing out some values, which works too. Here is an example:
The piece of code which reads the bytes from the file:
dataItems = f.read(dataSize)
for i in range(10) : # now only the first 10 items
    dataItemBits = bytes2bits(dataItems[i*6:(i+1)*6]) # each item is 6 bytes long
    dataType = dataItemBits[:3]
    deviceID = dataItemBits[3:8]
    # and here printing out the strings...
    # ...
    print(" => data type: %8s" % (dataType))
    print(" => device ID: %8s" % (deviceID))
    # ...
with this output:
-----------------------
Item #9: 011000010000000000111110100101011111111111111111
97 bits: 01100001
0 bits: 00000000
62 bits: 00111110
149 bits: 10010101
255 bits: 11111111
255 bits: 11111111
=> data type: 011 // first 3 bits
=> device ID: 00001 // next 5 bits
My problem is, that I'm unable to convert the 'bit-strings' to decimal numbers; if I try to print this
print int(deviceID, 2)
it gives me a ValueError:
ValueError: invalid literal for int() with base 2: ''
although deviceID is definitely a string (I'm using it before as string and printing it out) '00001', so it's not ''.
I also checked deviceID and dataType with __doc__ and they are strings.
This one works well in the console:
>>> int('000010', 2)
2
>>> int('000110', 2)
6
What is going on here?
UPDATE:
This is really weird: when I wrap it in a try/except block, it prints out the correct value.
try :
    print int(pmtNumber, 2) # prints the correct value
except :
    print "ERROR!" # no exception, so this is never printed
Any ideas?
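The post does not include an answer. One possible cause, which is only a guess and cannot be confirmed from the post, is that a later loop iteration slices past the end of dataItems: the slice is then empty, the bit string is '', and int('', 2) raises exactly this ValueError even though earlier items printed fine. A Python 3 sketch that guards against short items follows; the names bytes_to_bits, parse_items and ITEM_SIZE are hypothetical.

```python
ITEM_SIZE = 6  # each item is 6 bytes long, as stated in the question

def bytes_to_bits(data):
    """Return the bits of data as a string, 8 zero-filled digits per byte."""
    return "".join(format(byte, "08b") for byte in data)

def parse_items(data, count):
    """Return (data_type, device_id) tuples for up to count complete items."""
    results = []
    for i in range(count):
        item = data[i * ITEM_SIZE:(i + 1) * ITEM_SIZE]
        if len(item) < ITEM_SIZE:
            # Slicing past the end yields fewer bytes (possibly none), which
            # would produce an empty bit string and a ValueError in int().
            break
        bits = bytes_to_bits(item)
        results.append((int(bits[:3], 2), int(bits[3:8], 2)))
    return results

# The six byte values printed for item #9 in the question.
sample = bytes([97, 0, 62, 149, 255, 255])
parsed = parse_items(sample, 10)

print(bytes_to_bits(sample))
print(parsed)
```

Only one complete item is available in this sample, so the loop stops after the first iteration instead of failing on an empty slice.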
