Alternative ways for binary conversion in python


I often need to convert status codes to their bit representation to determine which errors/statuses are active on analyzers that use plain-text or binary communication protocols.
I use Python to poll the data and parse it. Sometimes I get confused because there are so many ways to solve a problem. Today I had to convert a string where each character is a hexadecimal digit to its binary representation. That is, each hexadecimal character must be converted into 4 bits, with the MSB on the left. Note: I need a char-by-char conversion, with leading zeros.
I managed to build the following function, which does the trick in a quasi one-liner fashion.
import math

def convertStatus(s, base=16):
    n = int(math.log2(base))  # bits per digit (4 for hex)
    b = "".join(["{{:0>{}b}}".format(n).format(int(x, base)) for x in s])
    return b
For example, this converts the following input:
0123456789abcdef
into:
0000000100100011010001010110011110001001101010111100110111101111
Which was my goal.
Now I am wondering what other elegant solutions I could have used to reach my goal. I would also like to better understand the advantages and drawbacks of each solution. The function signature can be changed, but usually the input and output are both strings. Let's get imaginative...

This is simple in two steps.
Converting a string to an int is almost trivial: use int(aString, base=...).
The first parameter can be a string!
And base can be anything from 2 to 36 (or 0, to infer it from a 0x/0o/0b prefix).
Converting a number to a string is easy with format() and the format-specification mini-language.
So converting hex strings to binary can be done as:
def h2b(x):
    val = int(x, base=16)
    # Pad to 4 bits per hex digit; otherwise format() drops leading zeros.
    return format(val, '0{}b'.format(4 * len(x)))
Here the two steps are explicit. It may be better to do it in one line, or even inline.
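To compare approaches side by side, here is a minimal sketch of three variants. The function names per_char, whole_number and lookup are hypothetical. The first is the question's char-by-char idea using a plain format spec. The second parses the whole string once and pads it. The third uses a precomputed table.

```python
import math

def per_char(s: str, base: int = 16) -> str:
    """Convert digit by digit, padding each digit to a fixed bit width."""
    width = int(math.log2(base))  # bits per digit; assumes base is a power of two
    return "".join(format(int(c, base), f"0{width}b") for c in s)

def whole_number(s: str) -> str:
    """Parse the whole hex string once, then pad to 4 bits per hex digit."""
    return format(int(s, 16), f"0{4 * len(s)}b")

# Precomputed hex-digit -> 4-bit string table (upper and lower case).
_NIBBLES = {c: format(int(c, 16), "04b") for c in "0123456789abcdefABCDEF"}

def lookup(s: str) -> str:
    """Translate each character through the table; raises KeyError on bad input."""
    return "".join(_NIBBLES[c] for c in s)

status = "0123456789abcdef"
print(per_char(status))
print(whole_number(status) == lookup(status) == per_char(status))  # True
```

Each variant has a trade-off:
- per_char keeps the question's base parameter and validates each character on its own.
- whole_number is the shortest, but it accepts anything int() accepts. A leading 0x, for example, throws off the 4-bits-per-character padding.
- lookup skips int() parsing entirely, but it only handles hex digits.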

Related

Uniquely encode any ASCII string into a string that uses a subset of ASCII

For this question, please assume python, but it doesn't necessarily matter.
Imagine you have an arbitrary ASCII string, for example:
jrioj4oi3m_=.,ei9#
Sparing the extensive details, I need to pass this string as a "label" on to another program, but that program doesn't support "labels" containing "special characters" or even numbers. So I'm trying to encode an ASCII string into a string that uses an arbitrary subset of ASCII.
One very naive solution would be to convert the original string into binary, then convert 0s into "a" and 1s into "b". This works to solve my problem, but I would like to learn a better solution here, to become a better programmer.
First of all, what exactly is this problem called?
This is not exactly a hashing problem, because IIRC hashing generally involves encoding into a string that is shorter than the original, and involves collisions.
I need no collisions, and I don't really care how long the encoded string is, as long as it's shorter than the naive case. (Ideally it would be the shortest length possible given the subset)
In fact, it would be ideal to specify exactly what the allowed character set is, then use a generalized encoding algorithm to do the encoding.
Decoding would be nice to know also.

A simple solution would be to first convert to a hex encoding:
jrioj4oi3m_=.,ei9# => 6a72696f6a346f69336d5f3d2e2c65693923
and then translate any numbers into non-hex letters:
6a72696f6a346f69336d5f3d2e2c65693923 => waxswzwfwatuwfwzttwdvftdsescwvwztzst
So the output string would always be exactly twice the length of the input string and would only ever contain lowercase letters: a-f from the hex digits, plus q-z from the translated digits.
This can be easily achieved in python like this:
>>> enc = str.maketrans('0123456789', 'qrstuvwxyz')
>>> dec = str.maketrans('qrstuvwxyz', '0123456789')
>>> s = 'jrioj4oi3m_=.,ei9#'
>>> x = s.encode('ascii').hex().translate(enc)
>>> x
'waxswzwfwatuwfwzttwdvftdsescwvwztzst'
>>> bytes.fromhex(x.translate(dec)).decode('ascii')
'jrioj4oi3m_=.,ei9#'

Interestingly, this actually turns out to be a really simple and common math problem: base conversion. As a programmer, you probably know, at least in theory, how to convert between base 2, 10, and 16 representations of a value. There are 95 printable ASCII characters, so any printable-ASCII string can be considered a base-95 representation of a (probably very large) value. If your label only accepts 64 characters (uppercase, lowercase, digits, and 2 others, for instance), then you simply need to convert your base-95 representation into a base-64 representation of the same value.
Decoding is simply converting your base-64 representation back to the base-95 representation. One caveat: whichever character plays the role of the digit 0 behaves like a leading zero. If it appears at the start of the string, it is lost unless you also record the original length.
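Here is a minimal sketch of that base-conversion idea for an arbitrary allowed alphabet, under two assumptions of mine:
- It treats the input as raw bytes (base 256) rather than base 95.
- It prepends a 0x01 sentinel byte so that leading NUL characters survive the round trip instead of vanishing like leading zeros.

The names encode, decode and LETTERS are hypothetical.

```python
import string

def encode(text: str, alphabet: str) -> str:
    """Encode ASCII text as a string over `alphabet` by base conversion."""
    base = len(alphabet)  # needs at least 2 distinct characters
    # Read the bytes as one big base-256 number; the 0x01 sentinel in front
    # keeps leading NUL bytes from disappearing like leading zeros would.
    n = int.from_bytes(b"\x01" + text.encode("ascii"), "big")
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))

def decode(label: str, alphabet: str) -> str:
    """Invert encode(): rebuild the number, then turn it back into bytes."""
    base = len(alphabet)
    position = {ch: i for i, ch in enumerate(alphabet)}
    n = 0
    for ch in label:
        n = n * base + position[ch]
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:].decode("ascii")  # drop the sentinel byte

LETTERS = string.ascii_letters  # assumed allowed set: a-z and A-Z only
label = encode("jrioj4oi3m_=.,ei9#", LETTERS)
print(label, decode(label, LETTERS))
```

With 52 letters, the 18-character example becomes a 26-character label instead of the 36 characters of the hex approach. Any alphabet can be swapped in, as long as encoder and decoder use the same one.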

Need help understanding binary conversion in Python

Or I guess binary in general. I'm obviously quite new to coding, so I'll appreciate any help here.
I just started learning about converting numbers into binary, specifically two's complement. The course presented the following code for converting:
num = 19
if num < 0:
    isNeg = True
    num = abs(num)
else:
    isNeg = False
result = ''
if num == 0:
    result = '0'
while num > 0:
    result = str(num % 2) + result
    num = num // 2
if isNeg:
    result = '-' + result
This raised a couple of questions with me and after doing some research (mostly here on Stack Overflow), I found myself more confused than I was before. Hoping somebody can break things down a bit more for me. Here are some of those questions:
I thought it was outright wrong that the code suggested just appending a - to the front of a binary number to show its negative counterpart. It looks like bin() does the same thing, but don't you have to flip the bits and add a 1 or something? Is there a reason for this other than making it easy to comprehend/read?
Was reading here and one of the answers in particular said that Python doesn't really work in two's complement, but something else that mimics it. The disconnect here for me is that Python shows me one thing but is storing the numbers a different way. Again, is this just for ease of use? Is bin() using two's complement or Python's method?
Follow-up to that one, how does the 'sign-magnitude' format mentioned in the above answer differ from two's complement?
The Professor doesn't talk at all about 8-bit, 16-bit, 64-bit, etc., which I saw a lot of while reading up on this. Where does this distinction come from, and does Python use one? Or are those designations specific to the program that I might be coding?
A lot of these posts only reference how Python stores integers. Is that suggesting that it stores floats a different way, or are they just speaking broadly?
As I wrote this all up, I sort of realized that maybe I'm diving into the deep end before learning how to swim, but I'm curious like that and like to have a deeper understanding of stuff before moving on.

I thought it was outright wrong that the code suggested just appending a - to the front of a binary number to show its negative counterpart. It looks like bin() does the same thing, but don't you have to flip the bits and add a 1 or something? Is there a reason for this other than making it easy to comprehend/read?
You have to somehow designate the number being negative. You can add another symbol (-), add a sign bit at the very beginning, use ones'-complement, use two's-complement, or some other completely made-up scheme that works. Both the ones'- and two's-complement representation of a number require a fixed number of bits, which doesn't exist for Python integers:
>>> 2**1000
1071508607186267320948425049060001810561404811705533607443750
3883703510511249361224931983788156958581275946729175531468251
8714528569231404359845775746985748039345677748242309854210746
0506237114187795418215304647498358194126739876755916554394607
7062914571196477686542167660429831652624386837205668069376
The natural solution is to just prepend a minus sign. You can instead write your own version of bin() that requires you to specify the number of bits and returns the two's-complement representation of the number.
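A minimal sketch of such a fixed-width variant of bin(); the name twos_complement is hypothetical. It masks the number to the requested width, which turns a negative n into 2**bits + n. That value is exactly its two's-complement bit pattern.

```python
def twos_complement(n: int, bits: int) -> str:
    """Return n as a fixed-width two's-complement bit string."""
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError(f"{n} does not fit in {bits} bits")
    # Masking keeps the low `bits` bits; for negative n this yields
    # 2**bits + n, which is exactly its two's-complement pattern.
    return format(n & ((1 << bits) - 1), f"0{bits}b")

print(twos_complement(19, 8))   # 00010011
print(twos_complement(-19, 8))  # 11101101
```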
Was reading here and one of the answers in particular said that Python doesn't really work in two's complement, but something else that mimics it. The disconnect here for me is that Python shows me one thing but is storing the numbers a different way. Again, is this just for ease of use? Is bin() using two's complement or Python's method?
Python is a high-level language, so you don't really know (or care) how your particular Python runtime internally stores integers. Whether you use CPython, Jython, PyPy, IronPython, or something else, the language specification only defines how they should behave, not how they should be represented in memory. bin() just takes a number and writes it out in binary digits, the same way you'd convert 123 into base 2, with a minus sign in front if the number is negative.
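A quick way to see both behaviours side by side. bin() writes a sign plus the magnitude. The bitwise operators, as the Python language reference describes, act as if integers were two's complement with an unlimited number of sign bits. The value -19 is just an arbitrary example:

```python
n = -19

as_text = bin(n)      # sign + magnitude digits, no fixed width
low_byte = n & 0xFF   # bitwise ops see infinitely many leading 1 bits

print(as_text)                  # -0b10011
print(format(low_byte, "08b"))  # 11101101, the 8-bit two's-complement pattern
print(~19)                      # -20, since ~x == -x - 1
```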
Follow-up to that one, how does the 'sign-magnitude' format mentioned in the above answer differ from two's complement?
Sign-magnitude usually encodes a number n as 0bXYYYYYY..., where X is the sign bit and YY... are the binary digits of the non-negative magnitude. Two's-complement arithmetic is simpler: addition and subtraction work the same way regardless of the operands' signs. Sign-magnitude encoding requires special handling for operations on numbers of opposite signs, and it has two zeros (+0 and -0).
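A minimal sketch of sign-magnitude encoding, for comparison with two's complement of the same value; the helper name sign_magnitude is hypothetical. In 8 bits, -19 and 19 differ only in the first bit, whereas the two's-complement pattern of -19 looks completely different.

```python
def sign_magnitude(n: int, bits: int) -> str:
    """Return n as a fixed-width sign-magnitude bit string."""
    magnitude = abs(n)
    if magnitude >= 1 << (bits - 1):
        raise ValueError(f"{n} does not fit in {bits} bits")
    sign = "1" if n < 0 else "0"
    # One sign bit, then the magnitude in the remaining bits - 1 bits.
    return sign + format(magnitude, f"0{bits - 1}b")

print(sign_magnitude(19, 8))      # 00010011
print(sign_magnitude(-19, 8))     # 10010011
print(format(-19 & 0xFF, "08b"))  # 11101101 (two's complement, for comparison)
```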
The Professor doesn't talk at all about 8-bit, 16-bit, 64-bit, etc., which I saw a lot of while reading up on this. Where does this distinction come from, and does Python use one? Or are those designations specific to the program that I might be coding?
No, Python doesn't define a maximum size for its integers because it's not that low-level. 2**1000000 computes fine, as will 2**10000000 if you have enough memory. n-bit numbers arise when your hardware makes it more beneficial to make your numbers a certain size. For example, processors have instructions that quickly work with 32-bit numbers but not with 87-bit numbers.
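To make the contrast concrete, here is a sketch that simulates what a fixed 32-bit signed integer would hold, compared with Python's unbounded int. The helper wrap_int32 is hypothetical. The struct module is used because it packs into real fixed-size C types and refuses values that don't fit.

```python
import struct

def wrap_int32(n: int) -> int:
    """Reduce n to the value a signed 32-bit integer would hold."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the top bit as the sign bit (two's complement).
    return n - (1 << 32) if n & 0x80000000 else n

big = 2**31              # one past the largest signed 32-bit value
print(big)               # 2147483648: a Python int just grows
print(wrap_int32(big))   # -2147483648: what a 32-bit int would wrap to

try:
    struct.pack("<i", big)  # 'i' is a fixed-size C int
except struct.error as exc:
    print("struct.error:", exc)
```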
A lot of these posts I've only reference how Python stores integers. Is that suggesting that it stores floats a different way, or are they just speaking broadly?
It depends on what your Python runtime uses. Usually floating point numbers are like C doubles, but that's not required.
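On the common case where floats are IEEE 754 doubles, you can inspect the stored bits with the struct module; the helper float_bits is hypothetical. The output also shows that floats use a separate sign bit rather than two's complement: 2.0 and -2.0 differ only in the top bit.

```python
import struct
import sys

def float_bits(x: float) -> str:
    """Return the 64-bit pattern of x as an IEEE 754 double, in hex."""
    # '>d' packs a standard-size (8-byte) double in big-endian order.
    return struct.pack(">d", x).hex()

print(sys.float_info.mant_dig)  # significand bits of this runtime's float
print(float_bits(2.0))          # 4000000000000000
print(float_bits(-2.0))         # c000000000000000: only the sign bit differs
```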

don't you have to flip the bits and add a 1 or something?
Yes, in two's-complement notation you invert all bits and add one to get the negative counterpart.
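Here is "invert all bits and add one" as a sketch on a fixed-width bit string; the helper name negate_bits is hypothetical. Any carry out of the top bit is discarded, as fixed-width hardware would do.

```python
def negate_bits(bits: str) -> str:
    """Two's-complement negation of a fixed-width bit string."""
    width = len(bits)
    flipped = "".join("1" if b == "0" else "0" for b in bits)  # invert every bit
    # Add one, then drop any carry out of the top bit with a modulo.
    return format((int(flipped, 2) + 1) % (1 << width), f"0{width}b")

print(negate_bits("00010011"))  # 11101101, i.e. -19 in 8 bits
print(negate_bits("11101101"))  # 00010011, negating twice gives 19 back
```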
Is bin() using two's complement or Python's method?
Two's complement is a practical way to represent negative numbers in electronics, where each bit can only be 0 or 1. All modern microprocessors use two's complement internally for negative integers. For more info, see a textbook on computer architecture.
how does the 'sign-magnitude' format mentioned in the above answer differ from two's complement?
You should look at what this code does and why it is there:
while num > 0:
    result = str(num % 2) + result
    num = num // 2
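The loop works in two moves. It peels off the lowest bit with num % 2, then shifts the rest right with num // 2, building the string from the right. A tracing variant (trace_binary is a hypothetical name) makes each step visible:

```python
def trace_binary(num: int) -> str:
    """Same repeated-division loop as above, printing each step."""
    result = ""
    while num > 0:
        remainder = num % 2               # lowest bit of num
        result = str(remainder) + result  # prepend: bits come out LSB first
        print(f"{num:>3} // 2 = {num // 2:>3}  remainder {remainder}  -> {result}")
        num //= 2                         # drop the lowest bit
    return result or "0"

trace_binary(19)
```

For 19 it returns '10011', the same digits bin(19) shows after its 0b prefix. The loop only ever sees the magnitude, and the sign is handled separately by the isNeg flag. That is why the course code produces '-10011' for -19 rather than a two's-complement pattern.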

Getting error in Python: ValueError: invalid literal for int() with base 10: '470.21'

I want to add and subtract data like $12,587.30 and get the answer back in the same format. How can I do this?
Here is my code example:
print(int(col_ammount2.lstrip('$'))-int(col_ammount.lstrip('$')))
I removed the $ sign and converted it to int, but it gives me the base 10 error.

You mentioned you want to do arithmetic (addition/subtraction) on the numbers, so you probably want them as floats instead. The difference between an integer (int) and a float is that integers have no fractional part, which is why int('470.21') fails.
Additionally, as @officialaimm mentioned, you need to remove the commas too, for example
float('$3,333.33'.replace('$', '').replace(',', ''))
will give you
3333.33
So putting it into your code
print(float(col_ammount2.lstrip('$').replace(',', ''))
      - float(col_ammount.lstrip('$').replace(',', '')))
An additional note for when you parse a floating point number (same applies to integers too), you may want to watch out for empty values, i.e.
float('')
is bad: it raises a ValueError. If col_amount and col_amount2 may be empty at some point, one thing you can do is default them to 0:
float(col_amount.lstrip(...).replace(...) or 0)
You may also want to read this to learn about workarounds for problems you may face with floating-point arithmetic: https://docs.python.org/3/tutorial/floatingpoint.html
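If exact cents matter, the standard decimal module avoids binary floating-point rounding. The format-specification mini-language can then put the thousands separators back, giving the answer "in the same format" as the question asks. Here is a minimal sketch. parse_money and format_money are hypothetical helper names, and the sketch assumes amounts look like '$12,587.30' with no leading minus sign.

```python
from decimal import Decimal

def parse_money(text: str) -> Decimal:
    """Parse a string like '$12,587.30' into an exact Decimal."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned or "0")  # treat an empty field as zero

def format_money(value: Decimal) -> str:
    """Format a Decimal back into the '$12,587.30' style."""
    sign = "-" if value < 0 else ""
    return f"{sign}${abs(value):,.2f}"

col_ammount2 = "$1,587.30"
col_ammount = "$2,567.67"
print(format_money(parse_money(col_ammount2) - parse_money(col_ammount)))  # -$980.37
```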

There are two things you are missing here:
- Python's int(...) cannot parse numbers with commas, so you will need to remove the commas as well by using .replace(',', '').
- int() cannot parse strings containing a decimal point, such as '470.21'. You have to use float(...) first, then convert the result to int with int(), math.ceil or math.floor, depending on how you want to round.
Maybe something like this will solve your problem:
col_ammount2='$1,587.30'
col_ammount = '$2,567.67'
print(int(float(col_ammount2.lstrip('$').replace(',','')))-int(float(col_ammount.lstrip('$').replace(',',''))))
If you are doing these sorts of things quite often in your code, making a function as such might be handy:
def integerify_currency(x):
    return int(float(x.lstrip('$').replace(',', '')))

Converting "0x08h, 0x8ah" to [int,int] in Python

I've got a string like x='0x08h, 0x0ah' in Python and want to convert it to [8, 10] (as unsigned ints). I could split and index it like [int(a[-3:-1], 16) for a in x.split(', ')], but is there a better way to convert it to a list of ints?
Would it matter if I had y='080a'?
Edit (for bonus points): which (sane) string-based hexadecimal notations does Python support, and which does it not?

You really have to know what the pattern you're trying to parse is, before you write a parser.
But it looks like your pattern is: optional 0x, then hex digits, then optional h. At least that's the most reasonable thing I can come up with that handles both '0x08h' and '080a'. So:
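def parse_hex(s):
    # lstrip('0x') would strip *any* leading '0'/'x' chars ('0x00h' -> ''); drop only the exact markers (Python 3.9+).
    return int(s.removeprefix('0x').removesuffix('h'), 16)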
Then:
numbers = [parse_hex(s) for s in x.split(', ')]
Of course you don't actually need to remove the 0x prefix, because Python accepts that as part of a hex string, so you could write it as:
def parse_hex(s):
    return int(s.rstrip('h'), 16)
However, I think the intention is clearer if you're more explicit.
From your edit:
edit what (sane) string-based hexadecimal notations have python support, and which not?
See the documentation for int:
Base-2, -8, and -16 literals can be optionally prefixed with 0b/0B, 0o/0O, or 0x/0X, as with integer literals in code.
That's it. (The rest of the paragraph adds one option: if the string is guaranteed to have a 0x/0X prefix, you can pass base=0 and let int() infer the base from the prefix. But that doesn't help you here, so that one sentence is really all you need.) The docs on Numeric Types and Numeric literals detail exactly what "as with integer literals in code" means. The only surprising things there are that negative numbers aren't literals, complex numbers aren't literals (but pure imaginary numbers are), and non-ASCII digits can be used but the documentation doesn't explain how.
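Here is a sketch covering both input shapes from the question; parse_hex_list and parse_hex_bytes are hypothetical names.
- int(..., 16) tolerates the 0x prefix but not the h suffix, so only the suffix is stripped.
- bytes.fromhex handles an unseparated run of two-digit bytes such as y='080a'. Passing that string to int() would instead read it as the single number 0x080a.

```python
from typing import List

def parse_hex_list(text: str) -> List[int]:
    """Parse '0x08h, 0x0ah'-style text: optional 0x prefix, optional h suffix."""
    # int(..., 16) accepts an optional 0x/0X prefix but not an 'h' suffix.
    return [int(part.strip().rstrip("hH"), 16) for part in text.split(",")]

def parse_hex_bytes(text: str) -> List[int]:
    """Parse an unseparated run of two-digit hex bytes such as '080a'."""
    return list(bytes.fromhex(text))

print(parse_hex_list("0x08h, 0x0ah"))  # [8, 10]
print(parse_hex_bytes("080a"))         # [8, 10]
print(int("0x0a", 0))                  # 10: base 0 infers the base from the prefix
```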

You can also use map. On Python 3, wrap it in list(), because map returns a lazy iterator:
list(map(lambda s: int(s.lower().replace('0x', '').replace('h', ''), 16), x.split(', ')))

Randomly flipping bits in a python binary string

I'm creating some fuzz tests in Python, and it would be invaluable to be able to take a binary string, randomly flip some bits, and ensure that exceptions are correctly raised, or results are correctly displayed, for slight alterations of given valid binaries. Does anyone know how I might go about this in Python? I realize this is pretty trivial in lower-level languages, but for work reasons I've been told to do it in Python, and I'm not sure how to start, or how to get the binary representation of something in Python. Any ideas on how to execute these fuzz tests in Python?

Strings are immutable, so to make changes, the first thing to do is probably to convert it into a list. At the same time, you can convert the digits into ints for greater ease in manipulation.
hexstring = "1234567890deadbeef"
values = [int(digit, 16) for digit in hexstring]
Then you can flip an individual bit in any of the hex digits.
digitindex = 2
bitindex = 3
values[digitindex] ^= 1 << bitindex
If needed, you can then convert back to hex.
result = "".join("0123456789abcdef"[val] for val in values)

One thing you could try is to convert the string into a bytearray, then perform bit manipulations on each byte. You can access each byte by index and treat it as an integer.
For example:
>>> a = "hello world"
>>> b = bytearray(a.encode('ascii'))  # Python 3 needs an explicit encoding
>>> b[0] = b[0] ^ 5  # bitwise XOR
>>> print(b.decode('ascii'))  # convert it back to a string
mello world
You may also find this article on the Python wiki about bit manipulation useful. It covers bit manipulation in Python in far greater detail, along with loads of useful tips and tricks.
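For the fuzzing use case itself, here is a minimal sketch of a mutation helper; flip_random_bits is a hypothetical name. It flips a given number of distinct, randomly chosen bits in a bytes object, using the same XOR trick. An optional seeded random.Random makes failing cases reproducible.

```python
import random
from typing import Optional

def flip_random_bits(data: bytes, count: int = 1,
                     rng: Optional[random.Random] = None) -> bytes:
    """Return a copy of data with `count` distinct, randomly chosen bits flipped."""
    if rng is None:
        rng = random.Random()
    buf = bytearray(data)
    # Pick distinct bit positions so no flip undoes another one.
    for pos in rng.sample(range(len(buf) * 8), count):
        byte_index, bit_index = divmod(pos, 8)
        buf[byte_index] ^= 1 << bit_index
    return bytes(buf)

original = b"hello world"
mutated = flip_random_bits(original, count=3, rng=random.Random(42))
print(original, mutated)
```

In a fuzz test you would feed mutated to the parser under test. You would then check that it either raises the expected exception or returns a sensible result.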
