Is there a way to pad to an even number of digits? - python

I'm trying to create a hex representation of some data that needs to be transmitted (specifically, in ASN.1 notation). At some points, I need to convert data to its hex representation. Since the data is transmitted as a byte sequence, the hex representation has to be padded with a 0 if the length is odd.
Example:
>>> hex2(3)
'03'
>>> hex2(45)
'2d'
>>> hex2(678)
'02a6'
The goal is to find a simple, elegant implementation for hex2.
Currently I'm using hex, stripping out the first two characters, then padding the string with a 0 if its length is odd. However, I'd like to find a better solution for future reference. I've looked in str.format without finding anything that pads to a multiple.

def hex2(n):
    x = '%x' % (n,)
    return ('0' * (len(x) % 2)) + x

To be totally honest, I am not sure what the issue is. A straightforward implementation of what you describe goes like this:
def hex2(v):
    s = hex(v)[2:]
    return s if len(s) % 2 == 0 else '0' + s
I would not necessarily call this "elegant" but I would certainly call it "simple."

Python's binascii module's b2a_hex is guaranteed to return an even-length string.
The trick, then, is to convert the integer into a bytestring. Python 3.2 and higher has that built into int:
from binascii import b2a_hex
def hex2(integer):
    # Returns a bytes object such as b'02a6'; call .decode('ascii') if a str is needed.
    return b2a_hex(integer.to_bytes((integer.bit_length() + 7) // 8, 'big'))
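On Python 3.5 and later, bytes.hex() gives the same even-length guarantee and returns a str directly. A minimal sketch, assuming non-negative input and that zero should render as '00' (the max(1, ...) guard and the ValueError are my assumptions, not part of the answer above):

```python
def hex2(n: int) -> str:
    """Return the hex digits of a non-negative int, padded to whole bytes."""
    if n < 0:
        raise ValueError("hex2 expects a non-negative integer")
    # bit_length() is 0 for n == 0, so force at least one byte (assumption).
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big").hex()


print(hex2(3), hex2(45), hex2(678))  # 03 2d 02a6
```

Since the width is derived from bit_length(), the result grows in whole bytes and never needs a separate odd-length check.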

You might want to look at the struct module, which is designed for byte-oriented I/O.
>>> import struct
>>> struct.pack('>i', 678)
'\x00\x00\x02\xa6'
# Use h instead of i for shorts
>>> struct.pack('>h', 1043)
'\x04\x13'
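On Python 3, struct.pack returns a bytes object, so the same calls would display as bytes literals such as b'\x00\x00\x02\xa6'. A small sketch of how the packed bytes relate to the padded hex string asked about, assuming Python 3.5+ for bytes.hex(). Note that struct formats are fixed-width, so the result is padded to the full field size rather than to the nearest even length:

```python
import struct

# '>i' is a big-endian signed 32-bit int, '>h' a big-endian signed 16-bit int.
as_int = struct.pack(">i", 678)
as_short = struct.pack(">h", 1043)

print(as_int.hex())    # 000002a6
print(as_short.hex())  # 0413
```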

Related

How to make a fixed-size byte variable in Python

Let's say I have a string (Unicode, if it matters) variable which is less than 100 bytes. I want to create another variable exactly 100 bytes in size which includes this string and is padded with zeros or whatever. How would I do it in Python 3?
For assembling packets to go over the network, or for assembling byte-perfect binary files, I suggest using the struct module.
struct — Interpret bytes as packed binary data
Just for the string, you might not need struct, but as soon as you start also packing binary values, struct will make your life much easier.
Depending on your needs, you might be better off with an off-the-shelf network serialization library, such as Protocol Buffers; or you might even just use JSON for the wire format.
Protocol Buffer Basics: Python
PyMOTW - JavaScript Object Notation Serializer
Something like this should work:
st = "具有"
by = bytes(st, "utf-8")
by += b"0" * (100 - len(by))
print(by)
# b'\xe5\x85\xb7\xe6\x9c\x890000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
Obligatory addendum since your original post seems to conflate strings with the length of their encoded byte representation: Python unicode explanation
To pad with null bytes you can do it the way they do it in the stdlib base64 module.
some_data = b'foosdsfkl\x05'
null_padded = some_data + bytes(100 - len(some_data))
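The same null padding can be sketched with bytes.ljust, which also makes it easy to reject input that is already too long. The function name pad_bytes and the choice to raise ValueError are illustrative assumptions:

```python
def pad_bytes(data: bytes, size: int = 100, fill: bytes = b"\x00") -> bytes:
    """Right-pad data with a single fill byte up to exactly size bytes."""
    if len(data) > size:
        raise ValueError(f"data is {len(data)} bytes, longer than {size}")
    return data.ljust(size, fill)


padded = pad_bytes("具有".encode("utf-8"))
print(len(padded))  # 100
```

The length check works on the encoded bytes, so multi-byte characters are counted by their encoded size rather than as one unit each.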
Here's a roundabout way of doing it:
>>> import sys
>>> a = "a"
>>> sys.getsizeof(a)
22
>>> a = "aa"
>>> sys.getsizeof(a)
23
>>> a = "aaa"
>>> sys.getsizeof(a)
24
Following this pattern, an ASCII string that takes up 100 bytes in memory needs to be 79 characters long:
>>> a = "".join(["a" for i in range(79)])
>>> len(a)
79
>>> sys.getsizeof(a)
100
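The approach above is a fairly simple way of "calibrating" strings to figure out their sizes. Note that sys.getsizeof reports the in-memory size of the string object, including interpreter overhead that varies between Python versions and builds, not the length of its encoded bytes. You could automate a script to pad a string out to the appropriate memory size to account for other encodings.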
def padder(strng):
    TARGETSIZE = 100
    padChar = "0"
    curSize = sys.getsizeof(strng)
    if curSize <= TARGETSIZE:
        for i in range(TARGETSIZE - curSize):
            strng = padChar + strng
        return strng
    else:
        return strng  # Not sure if you need to handle strings that start longer than your target, but you can do that here
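For comparison, the size that matters for a byte-exact field is the length of the encoded bytes, which can be measured directly. A minimal sketch, assuming UTF-8 as the encoding and CPython as the interpreter; exact sys.getsizeof values are left out because they depend on the interpreter:

```python
import sys

text = "具有"
encoded = text.encode("utf-8")

print(len(text))     # 2 characters
print(len(encoded))  # 6 bytes once encoded as UTF-8
# getsizeof includes per-object overhead, so it is larger than the payload.
print(sys.getsizeof(text) > len(encoded))  # True on CPython
```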

Convert int to single byte in a string?

I'm implementing PKCS#7 padding right now in Python and need to pad chunks of my file in order to amount to a number divisible by sixteen. I've been recommended to use the following method to append these bytes:
input_chunk += '\x00'*(-len(input_chunk)%16)
What I need to do is the following:
input_chunk_remainder = len(input_chunk) % 16
input_chunk += input_chunk_remainder * input_chunk_remainder
Obviously, the second line above is wrong; I need to convert the first input_chunk_remainder to a single byte string. How can I do this in Python?
In Python 3, you can create bytes of a given numeric value with the bytes() type; you can pass in a list of integers (between 0 and 255):
>>> bytes([5])
b'\x05'
>>> bytes([5] * 5)
b'\x05\x05\x05\x05\x05'
An alternative method is to use an array.array() with the right number of integers:
>>> import array
>>> array.array('B', 5*[5]).tobytes()
b'\x05\x05\x05\x05\x05'
or use the struct.pack() function to pack your integers into bytes:
>>> import struct
>>> struct.pack('{}B'.format(5), *(5 * [5]))
b'\x05\x05\x05\x05\x05'
There may be more ways.. :-)
In Python 2 (ancient now), you can do the same by using the chr() function:
>>> chr(5)
'\x05'
>>> chr(5) * 5
'\x05\x05\x05\x05\x05'
In Python 3, the bytes built-in accepts a sequence of integers. So for just one integer:
>>> bytes([5])
b'\x05'
Of course, that's bytes, not a string. But in the Python 3 world, the OP would probably use bytes for the app he described anyway.
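Putting this together for the question's use case: PKCS#7 appends N bytes that each have the value N, where N is the number of bytes needed to reach the next block boundary (a full block of padding when the input is already aligned). A minimal sketch, assuming a 16-byte block; the helper names pkcs7_pad and pkcs7_unpad are illustrative:

```python
BLOCK_SIZE = 16  # assumed block size, as in the question


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so len(result) is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, checking that it is well formed."""
    if not padded or len(padded) % block_size:
        raise ValueError("input is not a whole number of blocks")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS#7 padding")
    return padded[:-pad_len]


print(pkcs7_pad(b"hello"))  # b'hello' followed by eleven 0x0b bytes
```

Note that the pad length is block_size minus the remainder, not the remainder itself, which is why the padding can always be removed unambiguously.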

Xor Hex/ASCII/Conversion Issue

So I have a problem where I want to xor various hex strings, convert them to regular english strings, then re-convert them to hex strings. I'm not really familiar with working with hex or xor in any meaningful way, however. Do I need to convert the hex to binary or unicode before I perform a bitwise xor operation? If so, how do I retrieve the hex values once that is done? I've been looking into using str.encode('hex') and str.decode('hex'), but I keep getting errors saying that I am using non-hexadecimal characters. In short, I'm totally lost.
Python has an XOR operator for integers: ^. Here's how you could use it:
>>> hex(int("123abc", 16) ^ int("def456", 16))
'0xccceea'
EDIT: testing with long hex strings as per your comment:
>>> def hexor(hex1, hex2):
...     """XOR two hex strings."""
...     xor = hex(int(hex1, 16) ^ int(hex2, 16))
...     return xor[2:].rstrip("L")  # get rid of "0x" and maybe "L"
...
>>> import random
>>> a = "".join(random.choice("0123456789abcdef") for i in range(200))
>>> b = "".join(random.choice("0123456789abcdef") for i in range(200))
>>> a
'8db12de2f49f092620f6d79d6601618daab5ec6747266c2eea29c3493278daf82919aae6a72
64d4cf3dffd70cb1b6fde72ba2a04ac354fcb871eb60e088c2167e73006e0275287de6fc6133
56e44d7b0ff8378a0830d9d87151cbf3331382b096f02fd72'
>>> b
'40afe17fa8fbc56153c78f504e50a241df0a35fd204f8190c0591eda9c63502b41611aa9ac2
27fcd1a9faea642d89a3a212885711d024d2c973115eea11ceb6a57a6fa1f478998b94aa7d3e
993c04d24a0e1ac7c10fd834de61caefb97bcb65605f06eae'
>>> hexor(a, b)
'cd1ecc9d5c64cc47733158cd2851c3cc75bfd99a6769edbe2a70dd93ae1b8ad36878b04f0b0
43281e94053d689c3f5e45392af75b13702e7102fa3e0a990ca0db096fcff60db1f672561c0d
cfd849a945f62d4dc93f01ecaf30011c8a6849d5f6af293dc'
@user1427661: you are seeing the same output as one of the inputs (say input1) because
len(input1) > len(input2)
What you can do is check the lengths of the two strings and truncate the longer one to match the shorter one (the rest is useless anyway), with something like this:
if len(input1) > len(input2):
    input1 = input1[:len(input2)]
Add a matching else branch for the opposite case.
Let me give you a simpler answer (in my opinion, of course!). You may use the built-in operator module and call its xor function directly.
http://docs.python.org/2/library/operator.html
Hope this helps.
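Because XOR on ints drops leading zeros, the result can come out shorter than the inputs. A sketch that works byte by byte instead and keeps the original length, assuming Python 3.5+ and two hex strings of equal, even length (the name xor_hex and the length check are my assumptions):

```python
def xor_hex(hex1: str, hex2: str) -> str:
    """XOR two equal-length hex strings and return the result as hex."""
    a, b = bytes.fromhex(hex1), bytes.fromhex(hex2)
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b)).hex()


print(xor_hex("123abc", "def456"))  # ccceea
print(xor_hex("00ff", "00f0"))      # 000f
```

bytes.fromhex also raises ValueError on non-hexadecimal characters, which surfaces bad input early instead of producing a misleading result.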

Split integer into two concatenated hex strings- Python

I need to transmit a value that is larger than 65535 via two different hex strings so that when the strings are received, they can be concatenated to form the integer again. For example if the value was 70000 then the two strings would be 0x0001 and 0x1170.
I thought it would be as simple as converting the integer to hex then shifting it right by 4 to get the top string and removing all but the last 4 characters for the bottom.
I think I might be struggling with some syntax (fairly new to Python) and probably some of the logic too. Can anyone think of an easy way to do this?
Thanks
Use divmod builtin function:
>>> [hex(x) for x in divmod(70000, 65536)]
['0x1', '0x1170']
Your algorithm can be implemented easily, as in Lev Levitsky's answer:
hex(big)[2:-4], hex(big)[-4:]
However, it will fail for numbers under 65536.
You could fix that, but you're probably better off splitting the number, then converting the two halves into hex, instead of splitting the hex string.
ecatmur's answer is probably the simplest way to do this:
[hex(x) for x in divmod(70000, 65536)]
Or you could translate your "shift right/truncate" algorithm on the numbers like this:
hex(x >> 16), hex(x & 0xFFFF)
If you need these to be strings like '0x0006' rather than '0x6', instead of calling hex on the parts, you can do this:
['%#06x' % (x,) for x in divmod(x, 65536)]
Or, using the more modern string formatting style:
['0x{:04x}'.format(x) for x in divmod(x, 65536)]
On the receiving side, you again probably want to undo this by converting to ints first and then shifting and combining the numbers, instead of concatenating the strings. The inverse of ecatmur's answer is:
int(bighalf, 16) * 65536 + int(smallhalf, 16)
The (equivalent) inverse of the shift/mask implementation is:
(int(bighalf, 16) << 16) | int(smallhalf, 16)
And in that case, you don't need the extra 0s on the left.
It's also worth pointing out that none of these algorithms will work if the number can be negative, or greater than 4294967295, but only because the problem is impossible in those cases.
You mean like this?
In [1]: big = 12345678
In [2]: first, second = hex(big)[2:][:-4], hex(big)[2:][-4:]
In [3]: first, second
Out[3]: ('bc', '614e')
In [4]: int(first+second, 16)
Out[4]: 12345678
Being wary of big/little endians, what you could do to keep it simple is:
val = 70000
to_send = '{:08X}'.format(val) # '00011170'
decoded = int('00011170', 16) # 70000
EDIT: to be very clear then...
hex1, hex2 = to_send[:4], to_send[4:] # send these two and on receipt
my_number = int(hex1 + hex2, 16)
For numbers of 65536 or more (that is, numbers whose hex representation has at least 5 digits), you can use slicing:
>>> num=70000
>>> var1=hex(num)[:-4]
>>> var2='0x'+hex(num)[-4:]
>>> integ=int(var1+var2[2:],16)
>>> print(integ)
70000
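The numeric approach from the answers above can be sketched as a round trip. The names split_u32 and join_u32 are illustrative, and the range check reflects the limit of what two 16-bit halves can hold:

```python
def split_u32(value: int) -> tuple:
    """Split a value in 0..0xFFFFFFFF into two zero-padded 16-bit hex strings."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in two 16-bit halves")
    high, low = divmod(value, 0x10000)
    return "0x{:04x}".format(high), "0x{:04x}".format(low)


def join_u32(high_hex: str, low_hex: str) -> int:
    """Rebuild the integer from the two hex strings."""
    return (int(high_hex, 16) << 16) | int(low_hex, 16)


print(split_u32(70000))              # ('0x0001', '0x1170')
print(join_u32("0x0001", "0x1170"))  # 70000
```

Splitting the number rather than the string means values below 65536 need no special case: the high half is simply '0x0000'.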

Getting Raw Binary Representation of a file in Python

I'd like to get the exact sequence of bits from a file into a string using Python 3. There are several questions on this topic which come close, but don't quite answer it. So far, I have this:
>>> data = open('file.bin', 'rb').read()
>>> data
'\xa1\xa7\xda4\x86G\xa0!e\xab7M\xce\xd4\xf9\x0e\x99\xce\xe94Y3\x1d\xb7\xa3d\xf9\x92\xd9\xa8\xca\x05\x0f$\xb3\xcd*\xbfT\xbb\x8d\x801\xfanX\x1e\xb4^\xa7l\xe3=\xaf\x89\x86\xaf\x0e8\xeeL\xcd|*5\xf16\xe4\xf6a\xf5\xc4\xf5\xb0\xfc;\xf3\xb5\xb3/\x9a5\xee+\xc5^\xf5\xfe\xaf]\xf7.X\x81\xf3\x14\xe9\x9fK\xf6d\xefK\x8e\xff\x00\x9a>\xe7\xea\xc8\x1b\xc1\x8c\xff\x00D>\xb8\xff\x00\x9c9...'
>>> bin(data[:][0])
'0b11111111'
OK, I can get a base-2 number, but I don't understand why data[:][x], and I still have the leading 0b. It would also seem that I have to loop through the whole string and do some casting and parsing to get the correct output. Is there a simpler way to just get the sequence of 01's without looping, parsing, and concatenating strings?
Thanks in advance!
I would first precompute the string representation for all values 0..255
bytetable = [("00000000"+bin(x)[2:])[-8:] for x in range(256)]
or, if you prefer bits in LSB to MSB order
bytetable = [("00000000"+bin(x)[2:])[-1:-9:-1] for x in range(256)]
then the whole file in binary can be obtained with
binrep = "".join(bytetable[x] for x in open("file", "rb").read())
If you are OK using an external module, this uses bitstring:
>>> import bitstring
>>> bitstring.BitArray(filename='file.bin').bin
'110000101010000111000010101001111100...'
and that's it. It just makes the binary string representation of the whole file.
It is not quite clear what the sequence of bits is meant to be. I think it would be most natural to start at byte 0 with bit 0, but it actually depends on what you want.
So here is some code to access the sequence of bits starting with bit 0 in byte 0:
def bits_from_char(c):
    # In Python 3, iterating over bytes yields ints; on Python 2 use i = ord(c).
    i = c
    for dummy in range(8):
        yield i & 1
        i >>= 1
def bits_from_data(data):
    for c in data:
        for bit in bits_from_char(c):
            yield bit
for bit in bits_from_data(data):
    pass  # process bit
(Another note: you would not need data[:][0] in your code. Simply data[0] would do the trick, but without copying the whole string first.)
To convert raw binary data such as b'\xa1\xa7\xda4\x86' into a bitstring that represents the data as a number in binary system (base-2) in Python 3:
>>> data = open('file.bin', 'rb').read()
>>> bin(int.from_bytes(data, 'big'))[2:]
'1010000110100111110110100011010010000110...'
See Convert binary to ASCII and vice versa.
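Note that going through int drops leading zero bits, so a file that starts with a zero byte yields a shorter string. If every byte should contribute exactly eight digits, a sketch using format on each byte (the name bits_of is illustrative):

```python
def bits_of(data: bytes) -> str:
    """Return data as a string of '0'/'1', eight digits per byte, MSB first."""
    return "".join(format(byte, "08b") for byte in data)


sample = b"\x00\xa1"
print(bits_of(sample))                         # 0000000010100001
print(bin(int.from_bytes(sample, "big"))[2:])  # 10100001
```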
