How to map characters to integer range [-128, 127] in Python? - python

I would like to convert a list of characters (represented on a single byte ie. the range [0, 255]) to be represented with integers in the range [-128,127]. I've read that Python's modulo operator (%) always return a number having the same sign as the denominator.
What is the right way to do this conversion in Python?
EDIT
Characters that map to [128,255] with ord should be remapped to [-128,-1], with 128 mapped to -128 and 255 mapped to -1. (For the inverse of the conversion I use chr(my_int%256), but my_int can be a negative number.)

I've found out that I could do this conversion with "unpacking from byte" with the struct module:
# gotcha|pitfall: my original idea, but this generates a list of 1-tuples:
# x = [struct.unpack("b",a) for a in charlist]
fmt = "%ib"%len(charlist) # eg. "5b", if charlist's length is 5
x = struct.unpack(fmt,charlist) # tuple of ints

Not sure if I understood the question... You want to do something like that?
[i - 255 if i > 127 else i for i in [ord(l) for l in "azertyuiopqsdféhjklm3{"]]

def to_ints(input):
return [o if o <= 128 else 255 - o for o in [ord(char) in input]]
def to_str(input):
return "".join([chr(i%256) for i in input])
out = to_ints("This is a test")
print to_str(out)

I wonder what do you mean by "a list of characters", are they numbers? If so, I think x % 256 - 128 or x % -256 + 128 should work.

This is an old question, but for future readers the straightforward solution for single integers is (x + 128)%256 - 128. Replace x with ord(c) if dealing with ASCII character data.
The result of the % operator there is congruent (mod 256) to x + 128; and subtracting 128 from that makes the final result congruent to x and shifts the range to [128,127].
This will work in a generator expression, and save a step in Python 3 where you'd need to convert a string to a bytes object.

Related

How to split bytes into bits [duplicate]

I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b:
>>> c.bin[2:]
'11111111'
The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This will convert the hexadecimal string you have to an integer and that integer to a string in which each byte is set to 0/1 depending on the bit-value of the integer.
As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:
>>> bin(int('ff', base=16))[2:]
'11111111'
... or, if you are using Python 3.9 or newer:
>>> bin(int('ff', base=16)).removepreffix('0b')
'11111111'
Note: using lstrip("0b") here will lead to 0 integer being converted to an empty string. This is almost always not what you want to do.
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you want bit 7 and 8 only, use e.g.
val = (byte >> 6) & 3
(this is: shift the byte 6 bits to the right - dropping them. Then keep only the last two bits 3 is the number with the first two bits set...)
These can easily be translated into simple CPU operations that are super fast.
using python format string syntax
>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111
The second line is where the magic happens. All byte objects have a .hex() function, which returns a hex string. Using this hex string, we convert it to an integer, telling the int() function that it's a base 16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It is using the Format Specification Mini-Language format_spec. Specifically it's using the width and the type parts of the format_spec syntax. The 8 sets width to 8, which is how we get the nice 0000 padding, and the b sets the type to binary.
I prefer this method over the bin() method because using a format string gives a lot more flexibility.
I think simplest would be use numpy here. For example you can read a file as bytes and then expand it to bits easily like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
Here how to do it using format()
print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)
It is important the 08b . That means it will be a maximum of 8 leading zeros be appended to complete a byte. If you don't specify this then the format will just have a variable bit length for each converted byte.
To binary:
bin(byte)[2:].zfill(8)
Use ord when reading reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or
Using str.format():
'{:08b}'.format(ord(f.read(1)))
The other answers here provide the bits in big-endian order ('\x01' becomes '00000001')
In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc -
here's a snippet for that:
def bits_little_endian_from_bytes(s):
return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))
One line function to convert bytes (not string) to bit list. There is no endnians issue when source is from a byte reader/writer to another byte reader/writer, only if source and target are bit reader and bit writers.
def byte2bin(b):
return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]
I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16).
Now, given an integer - a representation already well-encoded in the hardware, I was very surprised to find out that the string variants of the above solutions using things like bin turn out to be faster than numpy based solutions for a single number, and I thought I'd quickly write up the results.
I wrote three variants of the function. First using numpy:
import math
import numpy as np
def bit_positions_numpy(val):
"""
Given an integer value, return the positions of the on bits.
"""
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
bit_arr = np.unpackbits(arr, bitorder='big')
bit_positions = np.where(bit_arr[::-1])[0].tolist()
return bit_positions
Then using string logic:
def bit_positions_str(val):
is_negative = val < 0
if is_negative:
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
neg_position = (length * 8) - 1
# special logic for negatives to get twos compliment repr
max_val = 1 << neg_position
val_ = max_val + val
else:
val_ = val
binary_string = '{:b}'.format(val_)[::-1]
bit_positions = [pos for pos, char in enumerate(binary_string)
if char == '1']
if is_negative:
bit_positions.append(neg_position)
return bit_positions
And finally, I added a third method where I precomputed a lookuptable of the positions for a single byte and expanded that given larger itemsizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
positions = [pos for pos, mask in pos_masks if (mask & i)]
BYTE_TO_POSITIONS.append(positions)
def bit_positions_lut(val):
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
bit_positions = []
for offset, b in enumerate(bytestr[::-1]):
pos = BYTE_TO_POSITIONS[b]
if offset == 0:
bit_positions.extend(pos)
else:
pos_offset = (8 * offset)
bit_positions.extend([p + pos_offset for p in pos])
return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
# for val in [-0, -1, -3, -4, -9999]:
test_values = [
# -1, -2, -3, -4, -8, -32, -290, -9999,
# 0, 1, 2, 3, 4, 8, 32, 290, 9999,
4324, 1028, 1024, 3000, -100000,
999999999999,
-999999999999,
2 ** 32,
2 ** 64,
2 ** 128,
2 ** 128,
]
for val in test_values:
r1 = bit_positions_str(val)
r2 = bit_positions_numpy(val)
r3 = bit_positions_lut(val)
print(f'val={val}')
print(f'r1={r1}')
print(f'r2={r2}')
print(f'r3={r3}')
print('---')
assert r1 == r2
import xdev
xdev.profile_now(bit_positions_numpy)(val)
xdev.profile_now(bit_positions_str)(val)
xdev.profile_now(bit_positions_lut)(val)
import timerit
ti = timerit.Timerit(10000, bestof=10, verbose=2)
for timer in ti.reset('str'):
for val in test_values:
bit_positions_str(val)
for timer in ti.reset('numpy'):
for val in test_values:
bit_positions_numpy(val)
for timer in ti.reset('lut'):
for val in test_values:
bit_positions_lut(val)
for timer in ti.reset('raw_bin'):
for val in test_values:
bin(val)
for timer in ti.reset('raw_bytes'):
for val in test_values:
val.to_bytes(val.bit_length(), 'big', signed=True)
And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs

16 bit hex into 14 bit signed int python?

I get a 16 bit Hex number (so 4 digits) from a sensor and want to convert it into a signed integer so I can actually use it.
There are plenty of codes on the internet that get the job done, but with this sensor it is a bit more arkward.
In fact, the number has only 14 bit, the first two (from the left) are irrelevant.
I tried to do it (in Python 3) but failed pretty hard.
Any suggestions how to "cut" the first two digits of the number and then make the rest a signed integer?
The Datasheet says, that E002 should be -8190 ane 1FFE should be +8190.
Thanks a lot!
Let's define a conversion function:
>>> def f(x):
... r = int(x, 16)
... return r if r < 2**15 else r - 2**16
...
Now, let's test the function against the values that the datahsheet provided:
>>> f('1FFE')
8190
>>> f('E002')
-8190
The usual convention for signed numbers is that a number is negative if the high bit is set and positive if it isn't. Following this convention, '0000' is zero and 'FFFF' is -1. The issue is that int assumes that a number is positive and we have to correct for that:
For any number equal to or less than 0x7FFF, then high bit is unset and the number is positive. Thus we return r=int(x,16) if r<2**15.
For any number r-int(x,16) that is equal to or greater than 0x8000, we return r - 2**16.
While your sensor may only produce 14-bin data, the manufacturer is following the standard convention for 16-bit integers.
Alternative
Instead of converting x to r and testing the value of r, we can directly test whether the high bit in x is set:
>>> def g(x):
... return int(x, 16) if x[0] in '01234567' else int(x, 16) - 2**16
...
>>> g('1FFE')
8190
>>> g('E002')
-8190
Ignoring the upper bits
Let's suppose that the manufacturer is not following standard conventions and that the upper 2-bits are unreliable. In this case, we can use modulo, %, to remove them and, after adjusting the other constants as appropriate for 14-bit integers, we have:
>>> def h(x):
... r = int(x, 16) % 2**14
... return r if r < 2**13 else r - 2**14
...
>>> h('1FFE')
8190
>>> h('E002')
-8190
There is a general algorithm for sign-extending a two's-complement integer value val whose number of bits is nbits (so that the top-most of those bits is the sign bit).
That algorithm is:
treat the value as a non-negative number, and if needed, mask off additional bits
invert the sign bit, still treating the result as a non-negative number
subtract the numeric value of the sign bit considered as a non-negative number, producing as a result, a signed number.
Expressing this algorithm in Python produces:
from __future__ import print_function
def sext(val, nbits):
assert nbits > 0
signbit = 1 << (nbits - 1)
mask = (1 << nbits) - 1
return ((val & mask) ^ signbit) - signbit
if __name__ == '__main__':
print('sext(0xe002, 14) =', sext(0xe002, 14))
print('sext(0x1ffe, 14) =', sext(0x1ffe, 14))
which when run shows the desired results:
sext(0xe002, 14) = -8190
sext(0x1ffe, 14) = 8190

How to make python look hex numbers as two's complemented?

I have an array of hex positive and negetive numbers. I want to transform them to decimal value:
>>> int("f107",16)
61703
>>>
how can I make python to look f107 as a two's complemented number? In the other word I want -3833 instead of 61703. How can I achieve it?
It's a very simple function:
def twos_complement(n, w):
if n & (1 << (w - 1)): n = n - (1 << w)
return n
Example:
>>> twos_complement(61703, 16)
-3833
Unlike Joran's answer this supports arbitrary bit-width.
struct.unpack(">h","f107".decode("hex"))
0xf107 = encode_to_bytes => "\xf1\x07"
since its two bytes we simply unpack it as > big-endian h signed-short

read single bit operation python 2.6

I am trying to read a single bit in a binary string but can't seem to get it to work properly. I read in a value then convert to a 32b string. From there I need to read a specific bit in the string but its not always the same. getBin function returns 32bit string with leading 0's. The code I have always returns a 1, even if the bit is a 0. Code example:
slot=195035377
getBin = lambda x, n: x >= 0 and str(bin(x))[2:].zfill(n) or "-" + str(bin(x))[3:].zfill(n)
bits = getBin(slot,32)
bit = (bits and (1 * (2 ** y)) != 0)
print("bit: %i\n"%(bit))
in this example bits = 00001011101000000000000011110011
and if I am looking for bit3 which i s a 0, bit will be equal to 1. Any ideas?
To test for specific bits in a integer value, use the & bitwise operand; no need to convert this to a binary string.
if slot & (1 << 3):
print 'bit 3 is set'
else:
print 'bit 3 is not set'
The above code shifts a test bit to the left twice. Alternatively, shift slot to the right 3 times:
if (slot >> 2) & 1:
To make this generic for any bit position, subtract 1:
if slot & (1 << (bitpos - 1)):
print 'bit {} is set'.format(bitpos)
or
if (slot >> (bitpos - 1)) & 1:
Your binary formatting code is overly verbose. Just use the format() function to create a binary string representation:
format(slot, '032b')
formats your binary value to a 0-padded 32-character binary string.
n = 223
bitpos = 3
bit3 = (n >> (bitpos-1))&1
is how you should be doing it ... don't use strings!
You can just use slicing to get the correct digit.
bits = getBin(slot, 32)
bit = bits[bit_location-1:bit_location] #Assumes zero based values
print("bit: %i\n"%(bit))

How to get the signed integer value of a long in python?

If lv stores a long value, and the machine is 32 bits, the following code:
iv = int(lv & 0xffffffff)
results an iv of type long, instead of the machine's int.
How can I get the (signed) int value in this case?
import ctypes
number = lv & 0xFFFFFFFF
signed_number = ctypes.c_long(number).value
You're working in a high-level scripting language; by nature, the native data types of the system you're running on aren't visible. You can't cast to a native signed int with code like this.
If you know that you want the value converted to a 32-bit signed integer--regardless of the platform--you can just do the conversion with the simple math:
iv = 0xDEADBEEF
if(iv & 0x80000000):
iv = -0x100000000 + iv
Essentially, the problem is to sign extend from 32 bits to... an infinite number of bits, because Python has arbitrarily large integers. Normally, sign extension is done automatically by CPU instructions when casting, so it's interesting that this is harder in Python than it would be in, say, C.
By playing around, I found something similar to BreizhGatch's function, but that doesn't require a conditional statement. n & 0x80000000 extracts the 32-bit sign bit; then, the - keeps the same 32-bit representation but sign-extends it; finally, the extended sign bits are set on n.
def toSigned32(n):
n = n & 0xffffffff
return n | (-(n & 0x80000000))
Bit Twiddling Hacks suggests another solution that perhaps works more generally. n ^ 0x80000000 flips the 32-bit sign bit; then - 0x80000000 will sign-extend the opposite bit. Another way to think about it is that initially, negative numbers are above positive numbers (separated by 0x80000000); the ^ swaps their positions; then the - shifts negative numbers to below 0.
def toSigned32(n):
n = n & 0xffffffff
return (n ^ 0x80000000) - 0x80000000
Can I suggest this:
def getSignedNumber(number, bitLength):
mask = (2 ** bitLength) - 1
if number & (1 << (bitLength - 1)):
return number | ~mask
else:
return number & mask
print iv, '->', getSignedNumber(iv, 32)
You may use struct library to convert values like that. It's ugly, but works:
from struct import pack, unpack
signed = unpack('l', pack('L', lv & 0xffffffff))[0]
A quick and dirty solution (x is never greater than 32-bit in my case).
if x > 0x7fffffff:
x = x - 4294967296
If you know how many bits are in the original value, e.g. byte or multibyte values from an I2C sensor, then you can do the standard Two's Complement conversion:
def TwosComp8(n):
return n - 0x100 if n & 0x80 else n
def TwosComp16(n):
return n - 0x10000 if n & 0x8000 else n
def TwosComp32(n):
return n - 0x100000000 if n & 0x80000000 else n
In case the hexadecimal representation of the number is of 4 bytes, this would solve the problem.
def B2T_32(x):
num=int(x,16)
if(num & 0x80000000): # If it has the negative sign bit. (MSB=1)
num -= 0x80000000*2
return num
print(B2T_32(input("enter a input as a hex value\n")))
Simplest solution with any bit-length of number
Why is the syntax of a signed integer so difficult for the human mind to understand. Because this is the idea of machines. :-)
Let's explain.
If we have a bi-directional 7-bit counter with the initial state
000 0000
and we get a pulse for the back count input. Then the next number to count will be
111 1111
And the people said:
Hey, the counter we need to know that this is a negative reload. You
should add a sign letting you know about this.
And the counter added:
1111 1111
And people asked,
How are we going to calculate that this is -1.
The counter replied: Find a number one greater than the reading and subtract it and you get the result.
1111 1111
-10000 0000
____________
(dec) -1
def sigIntFromHex(a): # a = 0x0xffe1
if a & (1 << (a.bit_length()-1)): # check if highest bit is 1 thru & with 0x1000
return a - (1 << (a.bit_length())) # 0xffe1 - 0x10000
else:
return a
###and more elegant:###
def sigIntFromHex(a):
return a - (1 << (a.bit_length())) if a & (1 << (a.bit_length()-1)) else a
b = 0xFFE1
print(sigIntFromHex(b))
I hope I helped

Categories