map() without conversion from byte to integer - python

I have a bytestring and I want to process each byte in it. One way to do that is with map(); however, due to the behaviour described in Why do I get an int when I index bytes?, indexing a bytestring yields an integer (and there is no way to prevent this conversion), so map() will pass each byte as an integer instead of a bytes object. For example, consider the following code:
def test_function(input):
    print(type(input))

before = b'\x00\x10\x00\x00\x07\x80\x00\x03'

print("After with map")
after_with_map = list(map(test_function, before[:]))

print("After without map")
for i in range(len(before)):
    test_function(before[i:i+1])
"After with map" will print:
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
"After without map" will print:
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
<class 'bytes'>
Is there any way to force map() to pass bytes as bytes and not as integer?

What is the goal here? The problem, if it can be called that, is that there is no byte type: there is no type in Python to represent a single byte, only a collection of bytes, probably because even the smallest Python values are multiple bytes in size.
This is a difference between bytestrings and regular strings: when you map a string, you get strings of length 1, while when you map a bytestring, you get ints instead. You could probably argue that Python should do the same thing for strings (mapping to Unicode codepoints) for consistency, but regardless, mapping over a bytes object doesn't give you bytes objects. Since you know that, though, it should be easy to work around.
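The difference is easy to see directly:

```python
# Iterating (or mapping over) a str yields length-1 strings,
# while iterating a bytes object yields ints.
s = "ab"
b = b"ab"
print([type(c).__name__ for c in s])  # ['str', 'str']
print([type(c).__name__ for c in b])  # ['int', 'int']
```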

The least ugly solution I can come up with is an eager one that unpacks using the struct module, whose C code is one of the few things in Python that natively converts to/from length-1 bytes objects:
import struct
before = b'...'
# Returns tuple of len 1 bytes
before_as_len1_bytes = struct.unpack(f'{len(before)}c', before)
A few more solutions with varying levels of ugliness that I came up with first before settling on struct.unpack as the cleanest:
Decode the bytes using latin-1 (the 1-1 bytes to str encoding that directly maps each byte value to the equivalent Unicode ordinal), then map that and encode each length 1 str back to a length 1 bytes:
from operator import methodcaller # At top of file
before = b'...'
# Makes iterator of length 1 bytes objects
before_as_len1_bytes = map(methodcaller('encode', 'latin-1'), before.decode('latin-1'))
Use re.findall to quickly convert from bytes to list of length 1 bytes:
import re # At top of file
before = b'...'
# Makes list of length 1 bytes objects
before_as_len1_bytes = re.findall(rb'.', before)
Use a couple map invocations to construct slice objects then use them to index as you do manually in your loop:
# Iterator of length 1 bytes objects
before_as_len1_bytes = map(before.__getitem__, map(slice, range(len(before)), range(1, len(before) + 1)))
Use struct.iter_unpack in one of a few ways which can then be used to reconstruct bytes objects:
import struct # At top of file for both approaches
from operator import itemgetter # At top of file for second approach
# Iterator of length 1 bytes objects
before_as_len1_bytes = map(bytes, struct.iter_unpack('B', before))
# Or a similar solution that makes the bytes directly inside tuples that must be unpacked
before_as_len1_bytes = map(itemgetter(0), struct.iter_unpack('c', before))
In practice, you probably don't need to do this, and should not, but those are some of the available options.

I don't think there's any way to keep the map from seeing integers instead of bytes. But you can easily convert back to bytes when you're done.
after_with_map = bytes(map(test_function, before[:]))
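For example, a transformation can work on the integer values directly and rebuild a bytes object at the end (a sketch):

```python
before = b'\x00\x10\x00\x00\x07\x80\x00\x03'
# bytes() accepts any iterable of ints in range(256), so the ints
# that map() produces can be fed straight back into a new bytes object.
after = bytes(map(lambda value: value | 0x01, before))
print(after)  # b'\x01\x11\x01\x01\x07\x81\x01\x03'
```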

Related

Index binary type of a number

b'1'[0]
The binary sequence of 1, when indexed, is 49, and I can't figure out why. Also, b'2'[0] is 50. What is the underlying binary sequence for the numbers?
What you have there is not a "binary sequence", it is a bytes literal.
>>> type(b'1')
<class 'bytes'>
A bytes object is an immutable sequence of single bytes, so all numbers in this sequence have to be in range(0, 256). You can construct it from a list of numbers as well:
>>> bytes([50, 33])
b'2!'
So what is this b'' notation all about?
Well, sequences of bytes are often related to text. Not always, but often enough that Python supports a lot of string methods on bytes objects, like capitalize, index and split, as well as this convenient literal syntax where you can enter text, and have it be equivalent to series of bytes corresponding to that text encoded in ASCII. It's still an immutable sequence of numbers in range(0, 256) under the hood, though, which is why indexing a bytes object gives a number.
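That ASCII correspondence is exactly why b'1'[0] is 49:

```python
# Each byte in a bytes literal is the ASCII code of the character typed.
print(b'1'[0])          # 49, the ASCII code of '1'
print(ord('1'))         # 49 as well
print(bytes([49, 50]))  # b'12' - constructing from the numbers gives the text back
```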

How to transform a byte value "b'27 '" into int 27

How do I get the number stored in a bytes variable?
I have to transfer data from an Arduino to a Raspberry Pi over serial with Python. I managed to isolate the variable, but its type is bytes; how do I get this into an int variable?
The variable is
b'27'
but i want to get
27
I tried
print(int.from_bytes(b'\x27', "big", signed=True))
But I don't get the correct number, 27.
You can use decode to get it to a regular str and then use int:
x = b'27'
y = int(x.decode()) # decode is a method on the bytes class that returns a string
type(y)
# <class 'int'>
Alternatively:
y = int(b'27')
type(y)
# <class 'int'>
Per @chepner's comment, you'll want to watch out for cases where an unusual encoding can break the latter approach; a non-UTF-8 encoding could break both.
To supplement the helpful and practical answer given by C.Nivs, I would like to add that if you had wanted to use int.from_bytes() to retrieve the value 27, you would have needed to do:
int.from_bytes(b'\x1B', "big", signed=True)
because '\x27' is actually the hex value for the value 39. There are loads of conversion tables online that can be helpful for cross-referencing decimal against hex values. These two forms are only 1:1 for values less than 10.
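To see both interpretations side by side:

```python
# b'27' is two ASCII digit characters; b'\x1b' is the single byte with value 27.
print(int(b'27'))                      # 27 - parses the ASCII digits
print(int.from_bytes(b'\x1b', 'big'))  # 27 - reads the raw byte value
print(int.from_bytes(b'\x27', 'big'))  # 39 - because 0x27 == 39, not 27
```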

Deque changes bytes to ints when it is extended

from collections import deque
recvBuffer = deque()
x1 = b'\xFF'
recvBuffer.append(x1)
recvBuffer.extend(x1)
x2 = recvBuffer.pop()
x3 = recvBuffer.pop()
print(type(x1))
print(type(x2))
print(type(x3))
The above code prints the following on Python 3.2.3
<class 'bytes'>
<class 'int'>
<class 'bytes'>
Why did the byte change to an int when extend()-ed to a deque?
bytes are documented to be a sequence of integers:
"bytes" object, which is an immutable sequence of integers in the range 0 <= x < 256
When you extend, you iterate over the sequence. When you iterate over a bytes object, you get integers. Note that deque has nothing to do with this. You will see the same behavior using extend on a normal list, or just using for byte in x1.
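If you want the deque to hold length-1 bytes objects rather than ints, extend it with slices instead of iterating the bytes object directly (a sketch):

```python
from collections import deque

recvBuffer = deque()
x1 = b'\xff\x01'
# Slicing a bytes object yields bytes, so generate one-byte slices.
recvBuffer.extend(x1[i:i + 1] for i in range(len(x1)))
print([type(item).__name__ for item in recvBuffer])  # ['bytes', 'bytes']
```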

Changing string to byte type in Python 2.7

In Python 3.2, I can change the type of an object easily. For example:
x=0
print(type (x))
x=bytes(0)
print(type (x))
it will give me this:
<class 'int'>
<class 'bytes'>
But in Python 2.7, it seems that I can't do it the same way. If I run the same code, it gives me this:
<type 'int'>
<type 'str'>
What can I do to change the type into a bytes type?
You are not changing types, you are assigning a different value to a variable.
You are also hitting on one of the fundamental differences between Python 2.x and 3.x; grossly simplified, in 3.x the 2.x unicode type has replaced the str type, which itself has been renamed to bytes. It happens to work in your code because more recent versions of Python 2 added bytes as an alias for str, to ease writing code that works under both versions.
In other words, your code is working as expected.
What can I do to change the type into a bytes type?
You can't: Python 2.7 has no distinct bytes type (the name bytes is just an alias for str).
From the Python 2.7 documentation (5.6 Sequence Types):
"There are seven sequence types: strings, Unicode strings, lists, tuples, bytearrays, buffers, and xrange objects."
From the Python 3.2 documentation (5.6 Sequence Types):
"There are six sequence types: strings, byte sequences (bytes objects), byte arrays (bytearray objects), lists, tuples, and range objects."
In Python 2.x, bytes is just an alias for str, so everything works as expected. Moreover, you are not changing the type of any objects here – you are merely rebinding the name x to a different object.
Maybe not exactly what you need, but when I needed to get the decimal value of the byte 0xd8 (it was a byte giving an offset in a file), I did:
a = (data[-1:]) # the variable 'data' holds 60 bytes from a PE file, I needed the last byte
#so now a == '\xd8' , a string
b = str(a.encode('hex')) # which makes b == 'd8' , again a string
c = '0x' + b # c == '0xd8' , again a string
int_value = int(c,16) # giving me my desired offset in decimal: 216
#I hope this can help someone stuck in my situation
Just an example to emphasize the procedure of turning a regular string into a binary string and back:
sb = "a0" # just string with 2 characters representing a byte
ib = int(sb, 16) # integer value (160 decimal)
xsb = chr(ib) # a binary string (equals '\xa0')
Now backwards
back_sb = xsb.encode('hex')
back_sb == sb # returns True
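The 'hex' str codec used above is Python 2 only; in Python 3 the equivalent roundtrip uses bytes.fromhex() and bytes.hex() (a sketch):

```python
sb = "a0"                  # string of two hex characters
xb = bytes.fromhex(sb)     # b'\xa0', a bytes object
ib = xb[0]                 # 160, the integer value of the byte
back_sb = xb.hex()         # 'a0' again
print(ib, back_sb)         # 160 a0
```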

How does Python manage int and long?

Does anybody know how Python manages the int and long types internally?
Does it choose the right type dynamically?
What is the limit for an int?
I am using Python 2.6. Is it different from previous versions?
How should I understand the code below?
>>> print type(65535)
<type 'int'>
>>> print type(65536*65536)
<type 'long'>
Update:
>>> print type(0x7fffffff)
<type 'int'>
>>> print type(0x80000000)
<type 'long'>
int and long were "unified" a few versions back. Before that it was possible to overflow an int through math ops.
3.x has further advanced this by eliminating long altogether and only having int.
Python 2: sys.maxint contains the maximum value a Python int can hold.
On a 64-bit Python 2.7, the size is 24 bytes. Check with sys.getsizeof().
Python 3: sys.maxsize is the largest value a Py_ssize_t can hold, which bounds how big containers (and, in practice, an int's internal digit array) can get.
That works out to gigabytes of digits on 32-bit builds and exabytes on 64-bit builds.
Such a large int would have a value on the order of 2 to the power of (8 * sys.maxsize).
This PEP should help.
Bottom line is that you really shouldn't have to worry about it in Python versions > 2.4.
Python 2 will automatically choose the type based on the size of the value; the cut-off is sys.maxint (2**31 - 1 on 32-bit builds, 2**63 - 1 on 64-bit builds), not 65535. Anything above sys.maxint becomes a long.
For example:
>>> print type(65535)
<type 'int'>
>>> print type(65536*65536)
<type 'long'>
In Python 3 the long datatype has been removed and all integer values are handled by the int class, which has arbitrary precision. Internally, CPython stores an int as a variable-length array of machine-word "digits", so the representation (though not the range) depends on your CPU architecture.
For reference, the min/max values of the common fixed-width integer types are:
Int8: [-128,127]
Int16: [-32768,32767]
Int32: [-2147483648,2147483647]
Int64: [-9223372036854775808,9223372036854775807]
Int128: [-170141183460469231731687303715884105728,170141183460469231731687303715884105727]
UInt8: [0,255]
UInt16: [0,65535]
UInt32: [0,4294967295]
UInt64: [0,18446744073709551615]
UInt128: [0,340282366920938463463374607431768211455]
Python never overflows these fixed-width limits: where Python 2 would convert an int into a long once the value passed sys.maxint, Python 3 simply allocates more memory for extra digits, and the type stays int throughout.
There are different ways to check the size of an int and its memory allocation.
Note: In Python 3, using the built-in type() method will always return <class 'int'> no matter what size Int you are using.
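You can watch the allocation grow with sys.getsizeof() even though the type never changes:

```python
import sys

# Python 3 ints are arbitrary precision: the object simply grows,
# but type() reports int throughout.
for n in (0, 2**30, 2**60, 2**120):
    print(type(n).__name__, n.bit_length(), sys.getsizeof(n))
```

The exact byte counts depend on the platform, but the size increases with the bit length while the type stays int.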
On my machine:
>>> print type(1<<30)
<type 'int'>
>>> print type(1<<31)
<type 'long'>
>>> print type(0x7FFFFFFF)
<type 'int'>
>>> print type(0x7FFFFFFF+1)
<type 'long'>
Python 2 uses ints (signed integers backed by a C long under the hood, so 32 or 64 bits depending on platform) for values that fit, but automatically switches to longs (arbitrarily large number of bits, i.e. bignums) for anything larger. I'm guessing this speeds things up for smaller values while avoiding any overflows with a seamless transition to bignums.
Interesting. On my 64-bit (i7 Ubuntu) box:
>>> print type(0x7FFFFFFF)
<type 'int'>
>>> print type(0x7FFFFFFF+1)
<type 'int'>
Guess it steps up to 64 bit ints on a larger machine.
Python 2.7.9 auto promotes numbers.
For a case where one is unsure to use int() or long().
>>> a = int("123")
>>> type(a)
<type 'int'>
>>> a = int("111111111111111111111111111111111111111111111111111")
>>> type(a)
<type 'long'>
It manages them because int and long are sibling class definitions. They have appropriate methods for +, -, *, /, etc., that will produce results of the appropriate class.
For example
>>> a=1<<30
>>> type(a)
<type 'int'>
>>> b=a*2
>>> type(b)
<type 'long'>
In this case, the class int has a __mul__ method (the one that implements *) which creates a long result when required.
From Python 3.x, the unified integer library is even smarter than in older versions. On my (i7 Ubuntu) box I got the following:
>>> type(math.factorial(30))
<class 'int'>
For implementation details, refer to the Include/longintrepr.h, Objects/longobject.c and Modules/mathmodule.c files. The last file is a dynamic module (compiled to an .so file). The code is well commented and easy to follow.
Just to continue all the answers that were given here, especially @James Lanes':
the size of an integer type can be expressed by these formulas:
total range = 2 ^ (bit width)
lower limit = -(2 ^ (bit width)) / 2
upper limit = ((2 ^ (bit width)) / 2) - 1
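Plugging a 32-bit width into those formulas checks out:

```python
bit_width = 32
total_range = 2 ** bit_width
lower_limit = -(2 ** bit_width) // 2     # -2147483648
upper_limit = (2 ** bit_width) // 2 - 1  # 2147483647
print(upper_limit - lower_limit + 1 == total_range)  # True
```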
