Deque changes bytes to ints when it is extended - python

from collections import deque
recvBuffer = deque()
x1 = b'\xFF'
recvBuffer.append(x1)
recvBuffer.extend(x1)
x2 = recvBuffer.pop()
x3 = recvBuffer.pop()
print(type(x1))
print(type(x2))
print(type(x3))
The above code prints the following on Python 3.2.3:
<class 'bytes'>
<class 'int'>
<class 'bytes'>
Why did the byte change to an int when extend()-ed to a deque?

bytes are documented to be a sequence of integers:
"bytes" object, which is an immutable sequence of integers in the range 0 <= x < 256
When you extend, you iterate over the sequence. When you iterate over a bytes object, you get integers. Note that deque has nothing to do with this. You will see the same behavior using extend on a normal list, or just using for byte in x1.
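To confirm that the conversion comes from iterating over bytes and not from deque, you can extend a plain list the same way. If you want the deque to hold length-1 bytes objects, one option (a sketch, not the only way) is to extend it with one-byte slices instead of relying on iteration:

```python
from collections import deque

x1 = b'\xff\x01'

# Iterating over bytes yields ints, so extend() stores ints in a plain list too
as_list = []
as_list.extend(x1)

# Slicing (rather than indexing or iterating) keeps each element as a length-1 bytes object
recv_buffer = deque()
recv_buffer.extend(x1[i:i + 1] for i in range(len(x1)))
```

If you only need the raw data back at the end, you can also keep the ints in the buffer and call bytes(recv_buffer), since bytes() accepts any iterable of ints in the range 0 to 255.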

Related

map() without conversion from byte to integer

I have a bytestring and I want to process each byte in it. One way to do that is to use map(); however, because of this absurd problem (Why do I get an int when I index bytes?), accessing a bytestring by index converts the element to an integer (and there is no way to prevent this conversion), so map passes each byte as an integer instead of as bytes. For example, consider the following code:
def test_function(input):
    print(type(input))
before = b'\x00\x10\x00\x00\x07\x80\x00\x03'
print("After with map")
after_with_map = list(map(test_function, before[:]))
print("After without map")
for i in range(len(before)):
    test_function(before[i:i+1])
After with map will print
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
After without map will print
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
Is there any way to force map() to pass bytes as bytes and not as integer?
What is the goal here? The problem, if it can be called that, is that there is no byte type. That is, there is no type in Python to represent a single byte, only a collection of bytes, probably because even the smallest Python values take up multiple bytes in memory.
This is a difference between bytestrings and regular strings; when you map a string, you get strings of length 1, while when you map a bytestring, you get ints instead. You could probably make an argument that Python should do the same thing for strings (mapping to Unicode codepoints) for consistency, but regardless, mapping a bytes doesn't get you byteses. But since you know that, it should be easy to work around?
The least ugly solution I can come up with is an eager one that unpacks using the struct module, whose C code is one of the few things in Python that natively converts to and from length 1 bytes objects:
import struct
before = b'...'
# Returns tuple of len 1 bytes
before_as_len1_bytes = struct.unpack(f'{len(before)}c', before)
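Feeding the struct.unpack result to map() then hands each callback a length-1 bytes object. A minimal sketch using the question's sample data; describe is a hypothetical stand-in for test_function that returns the type name instead of printing it:

```python
import struct

def describe(item):
    # Hypothetical stand-in for the question's test_function: returns the type name
    return type(item).__name__

before = b'\x00\x10\x00\x00\x07\x80\x00\x03'
# The format '8c' unpacks eight 'char' fields, each as a length-1 bytes object
chunks = struct.unpack(f'{len(before)}c', before)
types_seen = list(map(describe, chunks))
```

Every entry of types_seen is 'bytes', and b''.join(chunks) reassembles the original bytestring.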
A few more solutions with varying levels of ugliness that I came up with first before settling on struct.unpack as the cleanest:
Decode the bytes using latin-1 (the 1-1 bytes to str encoding that directly maps each byte value to the equivalent Unicode ordinal), then map that and encode each length 1 str back to a length 1 bytes:
from operator import methodcaller # At top of file
before = b'...'
# Makes iterator of length 1 bytes objects
before_as_len1_bytes = map(methodcaller('encode', 'latin-1'), before.decode('latin-1'))
Use re.findall to quickly convert from bytes to list of length 1 bytes:
import re # At top of file
before = b'...'
# Makes list of length 1 bytes objects (re.DOTALL is needed, or b'\n' bytes are silently skipped)
before_as_len1_bytes = re.findall(rb'.', before, flags=re.DOTALL)
Use a couple map invocations to construct slice objects then use them to index as you do manually in your loop:
# Iterator of length 1 bytes objects
before_as_len1_bytes = map(before.__getitem__, map(slice, range(len(before)), range(1, len(before) + 1)))
Use struct.iter_unpack in one of a few ways which can then be used to reconstruct bytes objects:
import struct # At top of file for both approaches
from operator import itemgetter # At top of file for second approach
# Iterator of length 1 bytes objects
before_as_len1_bytes = map(bytes, struct.iter_unpack('B', before))
# Or a similar solution that makes the bytes directly inside tuples that must be unpacked
before_as_len1_bytes = map(itemgetter(0), struct.iter_unpack('c', before))
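A quick way to check that these approaches agree is to run them all on a short sample and compare against plain slicing. The sample below is made up and deliberately includes b'\n', because a bare rb'.' pattern does not match a newline unless re.DOTALL is passed:

```python
import re
import struct
from operator import itemgetter, methodcaller

before = b'\x00\n\x10\xff'  # made-up sample; includes b'\n' to exercise re.DOTALL

# Reference result: one-byte slices, as in the question's manual loop
expected = [before[i:i + 1] for i in range(len(before))]

results = {
    'struct.unpack': list(struct.unpack(f'{len(before)}c', before)),
    'latin-1': list(map(methodcaller('encode', 'latin-1'), before.decode('latin-1'))),
    're.findall': re.findall(rb'.', before, flags=re.DOTALL),
    'slices': list(map(before.__getitem__,
                       map(slice, range(len(before)), range(1, len(before) + 1)))),
    'iter_unpack B': list(map(bytes, struct.iter_unpack('B', before))),
    'iter_unpack c': list(map(itemgetter(0), struct.iter_unpack('c', before))),
}
```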
In practice, you probably don't need to do this, and should not, but those are some of the available options.
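I don't think there's any way to keep map from seeing integers instead of bytes. But you can easily convert back to bytes when you're done, as long as the mapped function returns ints in the range 0 to 255 (the test_function above returns None, so it would have to return an int for this to work):
after_with_map = bytes(map(test_function, before[:]))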
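A small sketch of that round trip, using a hypothetical invert_byte function that takes an int and returns an int:

```python
def invert_byte(value):
    # Hypothetical per-byte transform: receives an int, returns an int in range(256)
    return value ^ 0xFF

before = b'\x00\x10\x00\x00\x07\x80\x00\x03'
# map() yields ints; bytes() packs them back into a single bytes object
after_with_map = bytes(map(invert_byte, before))
```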

Byte array to Int in Python 2.x using Standard Libraries

I am comfortable with Python 3.x and with bytearray-to-integer conversion using int.from_bytes(), and came up with the conversion snippet below. Is there a way to achieve the same functionality in Python 2, for both positive and negative integers?
val = bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00')
a = int.from_bytes(val, byteorder='big', signed=True)
# print(type(a), type(val), val, a)
# <class 'int'> <class 'bytearray'> bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00') -2083330000000000000000
I need to use only the Python 2.7 standard library to convert a byte array to an int. For example:
bytearray(b'\x00') --> expected result: 0
bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00') --> expected result: -300000000000000000000
bytearray(b'\x10CV\x1a\x88)0\x00\x00') --> expected result: 300000000000000000000
There is no built-in function in Python 2.7 to do the equivalent of int.from_bytes in 3.2+; that's why the method was added in the first place.
If you don't care about handling any cases other than big-endian signed ints, and care about readability more than performance (so you can extend it or maintain it yourself), the simplest solution is probably an explicit loop over the bytes.
For unsigned, this would be easy:
n = 0
for by in b:
    n = n * 256 + by
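Wrapped in a function (the name uint_from_bytes is only for illustration), the unsigned loop can be checked against Python 3's int.from_bytes with the default signed=False. Each step shifts the accumulated value left by one byte and adds the next byte:

```python
def uint_from_bytes(b):
    # Big-endian unsigned: multiply by 256 (shift one byte left), then add the next byte
    n = 0
    for by in b:
        n = n * 256 + by
    return n
```

An empty bytearray simply gives 0 here, which is why only the signed version needs a special case.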
But to handle negative numbers, you need to do three things:
Take off the sign bit from the highest byte. Since we only care about big-endian, this is the 0x80 bit on b[0].
Because that step reads b[0], an empty bytearray becomes a special case, so handle it separately (it converts to 0).
At the end, if the sign bit was set, 2's-complement the result.
So:
def int_from_bytes(b):
    '''Convert big-endian signed integer bytearray to int
    int_from_bytes(b) == int.from_bytes(b, 'big', signed=True)'''
    if not b: # special-case 0 to avoid b[0] raising
        return 0
    n = b[0] & 0x7f # skip sign bit
    for by in b[1:]:
        n = n * 256 + by
    if b[0] & 0x80: # if sign bit is set, 2's complement
        bits = 8*len(b)
        offset = 2**(bits-1)
        return n - offset
    else:
        return n
(This works on any iterable of ints. In Python 3, that includes both bytes and bytearray; in Python 2, it includes bytearray but not str.)
Testing your inputs in Python 3:
>>> for b in (bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00'),
...           bytearray(b'\x00'),
...           bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00'),
...           bytearray(b'\x10CV\x1a\x88)0\x00\x00')):
...     print(int.from_bytes(b, 'big', signed=True), int_from_bytes(b))
-2083330000000000000000 -2083330000000000000000
0 0
-300000000000000000000 -300000000000000000000
300000000000000000000 300000000000000000000
And in Python 2:
>>> for b in (bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00'),
...           bytearray(b'\x00'),
...           bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00'),
...           bytearray(b'\x10CV\x1a\x88)0\x00\x00')):
...     print int_from_bytes(b)
-2083330000000000000000
0
-300000000000000000000
300000000000000000000
If this is a bottleneck, there are almost surely faster ways to do this, maybe via gmpy2, for example. In fact, even converting the bytes to a hex string (binascii.hexlify) and parsing that with int(s, 16) might be faster, even though it's more than twice the work, because it moves the main loops from Python into C. Or you could combine the results of calling struct.unpack_from on 8 bytes at a time instead of handling each byte one by one. But this version should be easy to understand and maintain, and doesn't require anything outside the stdlib.
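A sketch of the hex-string route, assuming the input is a bytearray (Python 2) or bytes/bytearray (Python 3). The name int_from_bytes_hex is hypothetical, and whether it actually beats the loop is something to measure rather than assume. It parses the unsigned value and then subtracts 2**bits when the sign bit is set:

```python
import binascii

def int_from_bytes_hex(b):
    """Big-endian signed bytearray -> int via a hex round trip (sketch)."""
    if not b:  # int('', 16) would raise ValueError, so treat empty input as 0
        return 0
    n = int(binascii.hexlify(b), 16)  # unsigned big-endian value, parsed in C
    if b[0] & 0x80:  # sign bit set: two's complement means subtracting 2**bits
        n -= 1 << (8 * len(b))
    return n
```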

Why does bytes(5) return b'\x00\x00\x00\x00\x00' instead of b'\x05'?

I'm converting int to bytes using this command in python:
a = 5
b = bytes(a)
but when I print b I get this value:
b'\x00\x00\x00\x00\x00'
what is wrong with this piece of code?
The bytes() function documentation points to the bytearray() documentation, which states:
The optional source parameter can be used to initialize the array in a few different ways:
[....]
If it is an integer, the array will have that size and will be initialized with null bytes.
You asked for a bytes() object of size 5, initialised to null bytes.
You probably want to turn a into a string first:
bytes(str(a), 'utf-8')
Demo:
>>> a = 5
>>> bytes(str(a), 'utf-8')
b'5'
If you wanted to have the byte value 5 (so the ENQ ASCII control code or whatever else you might want it to mean), you'll need to put a in a list:
bytes([a])
(no need to provide an encoding then):
>>> bytes([a])
b'\x05'
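For a single small integer there are a couple of other standard ways to get the same one-byte result; this sketch shows int.to_bytes (Python 3.2+, with an explicit length and byte order) and struct.pack with the unsigned-char format 'B', alongside the list form:

```python
import struct

a = 5
via_list = bytes([a])                # bytes from a list of ints
via_to_bytes = a.to_bytes(1, 'big')  # length 1, big-endian; OverflowError if a > 255
via_struct = struct.pack('B', a)     # 'B' = unsigned char; struct.error outside 0..255
```

All three produce b'\x05'; to_bytes is handy when you later need more than one byte for larger values.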
You are creating a bytes object of length 5.
To get the binary representation of the number 5 as a string, you can use bin():
>>> bin(5)
'0b101'

Convert int to single byte in a string?

I'm implementing PKCS#7 padding in Python and need to pad chunks of my file so that their length is divisible by sixteen. I've been recommended to use the following method to append these bytes:
input_chunk += '\x00'*(-len(input_chunk)%16)
What I need to do is the following:
input_chunk_remainder = len(input_chunk) % 16
input_chunk += input_chunk_remainder * input_chunk_remainder
Obviously, the second line above is wrong; I need to convert the first input_chunk_remainder to a single byte string. How can I do this in Python?
In Python 3, you can create bytes of a given numeric value with the bytes() type; you can pass in a list of integers (between 0 and 255):
>>> bytes([5])
b'\x05'
>>> bytes([5] * 5)
b'\x05\x05\x05\x05\x05'
An alternative method is to use an array.array() with the right number of integers:
>>> import array
>>> array.array('B', 5*[5]).tobytes()
b'\x05\x05\x05\x05\x05'
or use the struct.pack() function to pack your integers into bytes:
>>> import struct
>>> struct.pack('{}B'.format(5), *(5 * [5]))
b'\x05\x05\x05\x05\x05'
There may be more ways.. :-)
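Applied to the padding problem, a minimal Python 3 sketch (pkcs7_pad is a hypothetical helper name). PKCS#7 appends n bytes each of value n, where n = block_size - len(data) % block_size, so an already aligned chunk gets a full extra block; using len(data) % 16 as the count, as in the question, gives the wrong padding length:

```python
def pkcs7_pad(data, block_size=16):
    # Padding length is always between 1 and block_size, never 0
    n = block_size - len(data) % block_size
    # bytes([n]) is the single byte with value n; repeat it n times
    return data + bytes([n]) * n
```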
In Python 2 (ancient now), you can do the same by using the chr() function:
>>> chr(5)
'\x05'
>>> chr(5) * 5
'\x05\x05\x05\x05\x05'
In Python 3, the bytes built-in accepts a sequence of integers. So for just one integer:
>>> bytes([5])
b'\x05'
Of course, that's bytes, not a string. But in the Python 3 world, the OP would probably use bytes for the application described anyway.

Changing string to byte type in Python 2.7

In Python 3.2, I can change the type of an object easily. For example:
x=0
print(type (x))
x=bytes(0)
print(type (x))
It gives me this:
<class 'int'>
<class 'bytes'>
But in Python 2.7, it seems that I can't do it the same way. If I run the same code, it gives me this:
<type 'int'>
<type 'str'>
What can i do to change the type into a bytes type?
You are not changing types, you are assigning a different value to a variable.
You are also hitting on one of the fundamental differences between Python 2.x and 3.x: grossly simplified, the 2.x type unicode has replaced the str type, which itself has been renamed to bytes. It happens to work in your code because more recent versions of Python 2 have added bytes as an alias for str, to ease writing code that works under both versions.
In other words, your code is working as expected.
What can i do to change the type into a bytes type?
You can't: there is no separate bytes type in Python 2.7 (the name bytes is only an alias for str).
From the Python 2.7 documentation (5.6 Sequence Types):
"There are seven sequence types: strings, Unicode strings, lists, tuples, bytearrays, buffers, and xrange objects."
From the Python 3.2 documentation (5.6 Sequence Types):
"There are six sequence types: strings, byte sequences (bytes objects), byte arrays (bytearray objects), lists, tuples, and range objects."
In Python 2.x, bytes is just an alias for str, so everything works as expected. Moreover, you are not changing the type of any objects here – you are merely rebinding the name x to a different object.
Maybe not exactly what you need, but when I needed to get the decimal value of the byte d8 (it was a byte giving an offset in a file), I did this (Python 2):
a = (data[-1:]) # the variable 'data' holds 60 bytes from a PE file, I needed the last byte
#so now a == '\xd8' , a string
b = str(a.encode('hex')) # which makes b == 'd8' , again a string
c = '0x' + b # c == '0xd8' , again a string
int_value = int(c,16) # giving me my desired offset in decimal: 216
#I hope this can help someone stuck in my situation
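The hex round trip is not necessary: ord() on a one-byte slice gives the value directly and works in both Python 2 and 3, and in Python 3 plain indexing (data[-1]) already returns the int. A small sketch with made-up data standing in for the 60 bytes from the PE file:

```python
import binascii

data = b'\x00' * 59 + b'\xd8'  # made-up stand-in for 60 bytes read from a PE file

# A one-byte slice is a str in Python 2 and a bytes object in Python 3; ord() accepts both
offset = ord(data[-1:])

# The original hex round trip, written so it also runs on Python 3
offset_via_hex = int(binascii.hexlify(data[-1:]), 16)
```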
Here is an example of turning a regular string of hex digits into a binary string and back (Python 2):
sb = "a0" # just string with 2 characters representing a byte
ib = int(sb, 16) # integer value (160 decimal)
xsb = chr(ib) # a binary string (equals '\xa0')
Now backwards:
back_sb = xsb.encode('hex')
back_sb == sb # returns True
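In Python 3, chr() returns a text character and str has no 'hex' codec, so the same round trip is usually written with bytes.fromhex() and bytes.hex() (the latter needs Python 3.5+). A minimal sketch:

```python
sb = "a0"                # hex digits for one byte
ib = int(sb, 16)         # integer value (160 decimal)
xsb = bytes.fromhex(sb)  # b'\xa0', the Python 3 counterpart of chr(ib) in Python 2
back_sb = xsb.hex()      # back to the hex string 'a0'
```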
