Python decode unexpected behavior on hex string


I have a hex string that I want to convert to a numpy array of ints.
I don't want to use for loops because looping through numpy arrays is not advised.
So I do the following:
vector = np.fromstring( s.decode('hex'), dtype=np.uint8 )
If, for example, s = 'a312', s.decode('hex') returns '\xa3\x12', which is correct.
But if s = 'a320', s.decode('hex') returns '\xa3 ', which seems a little weird at first sight because I expect '\xa3\x20'.
Can you help me?

The point is that Python displays each byte of a binary string as its ASCII equivalent whenever that byte is a printable character.
The equivalent of '\x20' is a space, as one can see in the ASCII table:
Hex Dec ASCII
20 32 (space)
If you write '\x20' in the terminal, it will print a space:
>>> '\x20'
' '
>>> 'a320'.decode('hex') == '\xa3\x20'
True
Note that this is only an aspect of representation: behind the curtains, the binary string contains the 0010 0000 binary value.
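The code in the question is Python 2 only, since str.decode('hex') is not available in Python 3. As a minimal sketch, assuming Python 3, bytes.fromhex performs the same conversion, and the 0x20 byte is still present even though the repr shows it as a space:

```python
s = 'a320'
raw = bytes.fromhex(s)   # the 0x20 byte is displayed as a space in the repr
values = list(raw)       # each byte as a plain int

print(raw)      # b'\xa3 '
print(values)   # [163, 32]
```

With NumPy installed, np.frombuffer(raw, dtype=np.uint8) should give the same values as an array without an explicit loop.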

Related

encoding unicode using UTF-8

In Python, if I type
euro = u'\u20AC'
euroUTF8 = euro.encode('utf-8')
print(euroUTF8, type(euroUTF8), len(euroUTF8))
the output is
('\xe2\x82\xac', <type 'str'>, 3)
I have two questions:
1. It looks like euroUTF8 is encoded over 3 bytes, but how do I get its binary representation to see how many bits it contains?
2. What does 'x' in '\xe2\x82\xac' mean? I don't think 'x' is a hex digit. And why are there three '\'?

In Python 2, print is a statement, not a function. You are printing a tuple here. Print the individual elements by removing the (..):
>>> euro = u'\u20AC'
>>> euroUTF8 = euro.encode('utf-8')
>>> print euroUTF8, type(euroUTF8), len(euroUTF8)
€ <type 'str'> 3
Now you get the 3 individual objects written as strings to stdout; my terminal just happens to be configured to interpret anything written to it as UTF-8, so the bytes correctly result in the € Euro symbol being displayed.
The \x<hh> sequences are Python string literal escape sequences (see the reference documentation); they are the default output for the repr() applied to a string with non-ASCII, non-printable bytes in them. You'll see the same thing when echoing the value in an interactive interpreter:
>>> euroUTF8
'\xe2\x82\xac'
>>> euroUTF8[0]
'\xe2'
>>> euroUTF8[1]
'\x82'
>>> euroUTF8[2]
'\xac'
They provide you with ASCII-safe debugging output. The contents of all Python standard library containers use this format; including lists, tuples and dictionaries.
If you want to format to see the bits that make up these values, convert each byte to an integer by using the ord() function, then format the integer as binary:
>>> ' '.join([format(ord(b), '08b') for b in euroUTF8])
'11100010 10000010 10101100'
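The session above is Python 2. As a hedged sketch of the same idea in Python 3, where iterating over a bytes object already yields integers, the ord() call is not needed:

```python
euro_utf8 = '\u20ac'.encode('utf-8')   # a bytes object of length 3

# Each item of a bytes object is an int in Python 3, so format it directly.
bits = ' '.join(format(b, '08b') for b in euro_utf8)
print(bits)   # 11100010 10000010 10101100
```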

Different encodings represent each character with a different number of bits. UTF-8 is a variable-width encoding built from 8-bit units: a character takes between one and four bytes, so the 3 bytes of the euro sign are 24 bits, and you don't need the binary representation to count them. (If you still want to see the bits, refer to Martijn's answer.)
\x means that the next two characters are the hexadecimal value of a single byte. So x is not itself a hex digit that you should convert or read; it marks the value that follows, which is the part you are interested in. Each \ starts one escape sequence, so there are three of them because there are three bytes; neither the \ nor the x is part of the value.
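To make the byte counts concrete, here is a small Python 3 sketch. The sample characters are my own choice for illustration and are not taken from the question:

```python
# One sample character for each possible UTF-8 length (1 to 4 bytes).
samples = ['a', '\u00e9', '\u20ac', '\U0001f600']

widths = [len(ch.encode('utf-8')) for ch in samples]
print(widths)   # [1, 2, 3, 4]

# Multiply by 8 to get the number of bits each character occupies.
bit_counts = [w * 8 for w in widths]
print(bit_counts)   # [8, 16, 24, 32]
```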

python string to hex with escaped hex values

I have a string like "Some characters \x00\x80\x34 and then some other characters". How can I convert the regular characters to their hex equivalent, while converting \x00 to the actual 00 hex value?
binascii.hexlify() considers '\', 'x', '0', '0' as actual characters.
Later edit:
The string itself is produced by another function. When I print it, it actually prints "\x00".

As I understand it, you are trying to convert only the characters that are not already hex escapes to hex. It would help if you gave a sample input string that you are trying to convert.
You can also convert to hex using just the built-in encode and decode methods (Python 2), which should take care of what you are trying to do. I ran the following three lines in a terminal on my machine and got the output you are expecting. Hope it helps:
aStr = "Some characters \x00\x80\x34 and then some other characters"
aStr.encode("hex")
aStr.encode("hex").decode("hex")

It's unclear what you're asking, since binascii.hexlify should work:
>>> import binascii
>>> s = "\x00\x80\x34"
>>> binascii.hexlify(s)
'008034'
>>> s = "foobar \x00\x80\x34 foobar"
>>> binascii.hexlify(s)
'666f6f6261722000803420666f6f626172'
foobar = 666f6f626172, space = 20
↳ https://docs.python.org/3/library/binascii.html
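The "Later edit" in the question suggests the string may hold a literal backslash followed by x00 rather than a NUL byte, which would explain why hexlify sees four separate characters. A possible sketch for that case, assuming Python 3 and assuming every character fits in Latin-1 (both are my assumptions, the question does not say):

```python
import binascii

# Literal text: a backslash, 'x', '0', '0', ... as another function might produce it.
raw = r"ab \x00\x80\x34"

# unicode_escape interprets the \xhh sequences; latin-1 maps code points 0-255
# one-to-one onto bytes in both directions.
data = raw.encode('latin-1').decode('unicode_escape').encode('latin-1')

hex_text = binascii.hexlify(data).decode('ascii')
print(hex_text)   # 616220008034
```

If the string already contains real NUL and 0x80 bytes, this extra step is unnecessary and hexlify alone is enough.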

Python: Converting HEX string to bytes

I'm trying to make byte frame which I will send via UDP. I have class Frame which has attributes sync, frameSize, data, checksum etc. I'm using hex strings for value representation. Like this:
testFrame = Frame("AA01","0034","44853600","D43F")
Now I need to concatenate these hex values and convert them to a byte array, like this:
def convertToBits(self):
    stringMessage = self.sync + self.frameSize + self.data + self.chk
    return b16decode(stringMessage)
But when I print the converted value I don't get the same values, or I don't know how to read Python notation correctly:
This is sync: AA01
This is frame size: 0034
This is data:44853600
This is checksum: D43F
b'\xaa\x01\x004D\x856\x00\xd4?'
So, first word is converted ok (AA01 -> \xaa\x01) but (0034 -> \x004D) it's not the same. I tried to use bytearray.fromhex because I can use spaces between bytes but I got same result. Can you help me to send same hex words via UDP?

Python displays any byte that can represent a printable ASCII character as that character. 4 is the same as \x34, but Python opted to print the ASCII character in the representation.
So \x004 is really the same as \x00\x34, D\x856\x00 is the same as \x44\x85\x36\x00, and \xd4? is the same as \xd4\x3f, because:
>>> b'\x34'
b'4'
>>> b'\x44'
b'D'
>>> b'\x36'
b'6'
>>> b'\x3f'
b'?'
This is just the representation of the bytes value; the value is entirely correct and you don't need to do anything else.
If it helps, you can visualise the bytes values as hex again using binascii.hexlify():
>>> import binascii
>>> binascii.hexlify(b'\xaa\x01\x004D\x856\x00\xd4?')
b'aa01003444853600d43f'
and you'll see that 4, D, 6 and ? are once again represented by the correct hexadecimal characters.
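As a quick check, assuming Python 3, the round trip can be done with bytes.fromhex and bytes.hex. The field values are the ones from the question; the list name fields is just for illustration:

```python
# Field values taken from the question's example frame.
fields = ["AA01", "0034", "44853600", "D43F"]

frame = bytes.fromhex("".join(fields))
print(frame)         # b'\xaa\x01\x004D\x856\x00\xd4?'
print(frame.hex())   # aa01003444853600d43f
print(list(frame))   # [170, 1, 0, 52, 68, 133, 54, 0, 212, 63]
```

The integer list shows that all ten bytes are present, so the value can be sent over UDP as it is.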

How can I format an integer to a two digit hex?

Does anyone know how to get a chr to hex conversion where the output is always two digits?
for example, if my conversion yields 0x1, I need to convert that to 0x01, since I am concatenating a long hex string.
The code that I am using is:
hexStr += hex(ord(byteStr[i]))[2:]

You can use string formatting for this purpose:
>>> "0x{:02x}".format(13)
'0x0d'
>>> "0x{:02x}".format(131)
'0x83'
Edit: Your code suggests that you are trying to convert a string to a hexstring representation. There is a much easier way to do this (Python2.x):
>>> "abcd".encode("hex")
'61626364'
An alternative (that also works in Python 3.x) is the function binascii.hexlify().
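In Python 3 the "hex" codec shortcut is not available on str, so here is a hedged sketch of two equivalent ways to get two digits per byte there. The sample bytes are made up for illustration:

```python
data = b'\x01\x0a\xff'

# Format every byte with a width of 2 and zero padding, then join.
per_byte = ''.join(f'{b:02x}' for b in data)
print(per_byte)     # 010aff

# bytes.hex() produces the same text in one call.
print(data.hex())   # 010aff
```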

You can use the format function:
>>> format(10, '02x')
'0a'
You won't need to remove the 0x part with that (like you did with the [2:])

If you're using python 3.6 or higher you can also use fstrings:
v = 10
s = f"0x{v:02x}"
print(s)
output:
0x0a
The syntax for the braces part is identical to str.format(), except you use the variable's name. See https://www.python.org/dev/peps/pep-0498/ for more.

htmlColor = "#%02X%02X%02X" % (red, green, blue)

The standard module binascii may also be the answer, namely when you need to convert a longer string:
>>> import binascii
>>> binascii.hexlify('abc\n')
'6162630a'

Use format instead of using the hex function:
>>> mychar = ord('a')
>>> hexstring = '%.2X' % mychar
You can also change the number "2" to the number of digits you want, and the "X" to "x" to choose between upper and lowercase representation of the hex alphanumeric digits.
By many, this is considered the old %-style formatting in Python, but I like it because the format string syntax is the same used by other languages, like C and Java.

The simplest way (I think) is:
your_str = '0x%02X' % 10
print(your_str)
will print:
0x0A
The number after the % is converted to hex inside the string. I think it's clear this way, and for people who come from a C background (like me) it feels more like home.

How to convert an int to a hex string?

I want to take an integer (that will be <= 255), to a hex string representation
e.g.: I want to pass in 65 and get out '\x41', or 255 and get '\xff'.
I've tried doing this with the struct.pack('c',65), but that chokes on anything above 9 since it wants to take in a single character string.

You are looking for the chr function.
You seem to be mixing decimal representations of integers and hex representations of integers, so it's not entirely clear what you need. Based on the description you gave, I think one of these snippets shows what you want.
>>> chr(0x65) == '\x65'
True
>>> hex(65)
'0x41'
>>> chr(65) == '\x41'
True
Note that this is quite different from a string containing an integer as hex. If that is what you want, use the hex builtin.

This will convert an integer to a 2 digit hex string with the 0x prefix:
strHex = "0x%0.2X" % integerVariable

What about hex()?
hex(255) # 0xff
If you really want to have \ in front you can do:
print '\\' + hex(255)[1:]

Let me add this one, because sometimes you just want the single digit representation
( x can be lower, 'x', or uppercase, 'X', the choice determines if the output letters are upper or lower.):
'{:x}'.format(15)
> f
And now with the new f'' format strings you can do:
f'{15:x}'
> f
To add 0 padding you can use 0>n:
f'{2034:0>4X}'
> 07F2
NOTE: the initial 'f' in f'{15:x}' is to signify a format string
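One caveat worth sketching: 0>n is plain right-alignment with a fill character, while the 0n form is sign-aware zero padding. They agree for non-negative numbers and differ for negative ones. The variable names below are only for illustration:

```python
n = 2034
print(f'{n:0>4X}')   # 07F2
print(f'{n:04X}')    # 07F2

# With a negative value the fill character lands before the sign
# when using 0>, but after it when using the 0 flag.
neg_fill = f'{-15:0>4x}'
neg_zero = f'{-15:04x}'
print(neg_fill)   # 00-f
print(neg_zero)   # -00f
```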

Try:
"0x%x" % 255 # => 0xff
or
"0x%X" % 255 # => 0xFF
Python Documentation says: "keep this under Your pillow: http://docs.python.org/library/index.html"

For Python >= 3.6, use f-string formatting:
>>> x = 114514
>>> f'{x:0x}'
'1bf52'
>>> f'{x:#x}'
'0x1bf52'

If you want to pack a struct with a value <= 255 (one byte unsigned, uint8_t) and end up with a string of one character, you're probably looking for the format B instead of c. The c format takes a single character rather than a number (not too useful by itself), while B converts an integer.
struct.pack('B', 65)
(And yes, 65 is \x41, not \x65.)
The struct module will also conveniently handle endianness for communication or other uses.
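A short sketch of both points, assuming Python 3, where struct.pack returns bytes. The 16-bit value 0x0034 is only an example:

```python
import struct

one_byte = struct.pack('B', 65)   # unsigned char: one byte with value 65
same = bytes([65])                # equivalent without struct
print(one_byte)                   # b'A'

# '>' selects big-endian and '<' little-endian; 'H' is an unsigned 16-bit int.
big_endian = struct.pack('>H', 0x0034)
little_endian = struct.pack('<H', 0x0034)
print(big_endian.hex(), little_endian.hex())   # 0034 3400
```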

With format(), as per format-examples, we can do:
>>> # format also supports binary numbers
>>> "int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}".format(42)
'int: 42; hex: 2a; oct: 52; bin: 101010'
>>> # with 0x, 0o, or 0b as prefix:
>>> "int: {0:d}; hex: {0:#x}; oct: {0:#o}; bin: {0:#b}".format(42)
'int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010'

Note that for large values, hex() still works (some other answers don't):
x = hex(349593196107334030177678842158399357)
print(x)
Python 2: 0x4354467b746f6f5f736d616c6c3f7dL
Python 3: 0x4354467b746f6f5f736d616c6c3f7d
For a decrypted RSA message, one could do the following:
import binascii
hexadecimals = hex(349593196107334030177678842158399357)
print(binascii.unhexlify(hexadecimals[2:-1])) # python 2
print(binascii.unhexlify(hexadecimals[2:])) # python 3
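Slicing the text produced by hex() depends on the Python version (the trailing L), and unhexlify rejects an odd number of hex digits. A hedged Python 3 alternative is int.to_bytes, which skips the text form entirely. The value below is the hex literal shown in the output above:

```python
n = 0x4354467b746f6f5f736d616c6c3f7d

# Smallest number of bytes that can hold n.
length = (n.bit_length() + 7) // 8
message = n.to_bytes(length, byteorder='big')
print(message)   # b'CTF{too_small?}'

# int.from_bytes reverses the conversion.
restored = int.from_bytes(message, byteorder='big')
```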

(int_variable).to_bytes(bytes_length, byteorder='big'|'little').hex()
For example:
>>> (434).to_bytes(4, byteorder='big').hex()
'000001b2'
>>> (434).to_bytes(4, byteorder='little').hex()
'b2010000'

This worked best for me
"0x%02X" % 5 # => 0x05
"0x%02X" % 17 # => 0x11
Change the 2 if you want a bigger width (2 means two printed hex characters), so 3 will give you the following:
"0x%03X" % 5 # => 0x005
"0x%03X" % 17 # => 0x011

You can also convert a number in any base to hex with this one-liner:
hex(int(n,x)).replace("0x","")
Here n is a string holding your number and x is the base of that number. It is first converted to an integer and then to hex; since hex() prefixes the result with 0x, replace removes it.
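A small usage sketch of the same idea. The helper name to_hex is hypothetical, and it uses format() so that there is no 0x prefix to strip:

```python
def to_hex(n, base):
    """Convert the digit string n, written in the given base, to lowercase hex."""
    return format(int(n, base), 'x')

print(to_hex('1010', 2))   # a
print(to_hex('777', 8))    # 1ff
print(to_hex('255', 10))   # ff

# The one-liner from the answer gives the same text.
same = hex(int('777', 8)).replace("0x", "")
```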

I wanted a random integer converted into a six-digit hex string with a # at the beginning. To get this I used
"#%06x" % random.randint(0, 0xFFFFFF)

As an alternative representation you could use
[in] '%s' % hex(15)
[out]'0xf'
