Randomly flipping bits in a python binary string

Randomly flipping bits in a python binary string - python

I'm creating some fuzz tests in python and it would be invaluable for me to be able to, given a binary string, randomly flip some bits and ensure that exceptions are correctly raised, or results are correctly displayed for slight alterations on given valid binaries. Does anyone know how I might go about this in Python? I realize this is pretty trivial in lower level languages but for work reasons I've been told to do this in Python, but I'm not sure how to start this, or get the binary representation for something in python. Any ideas on how to execute these fuzz tests in Python?

Strings are immutable, so to make changes, the first thing to do is probably to convert it into a list. At the same time, you can convert the digits into ints for greater ease in manipulation.
hexstring = "1234567890deadbeef"
values = [int(digit, 16) for digit in hexstring]
Then you can flip an individual bit in any of the hex digits.
digitindex = 2
bitindex = 3
values[digitindex] ^= 1 << bitindex
If needed, you can then convert back to hex.
result = "".join("0123456789abcdef"[val] for val in values)

One thing you could try is to convert the string into a bytearray, then performing bit manipulations on each character. You can access each character by index and treat it as an integer.
For example:
>>> a = "hello world"
>>> b = bytearray(a)
>>> b[0] = b[0] ^ 5 # bitwise XOR
>>> print b # or do str(b) to convert it back to a string
mello world
You may also find this article on the Python wiki about bit manipulation to be useful. It goes over bit manipulation in Python to far greater detail, along with loads of useful tips and tricks.

Related

Need help understanding binary conversion in Python

Or I guess binary in general. I'm obviously quite new to coding, so I'll appreciate any help here.
I just started learning about converting numbers into binary, specifically two's complement. The course presented the following code for converting:
num = 19
if num < 0:
isNeg = True
num = abs(num)
else:
isNeg = False
result = ''
if num == 0:
result = '0'
while num > 0:
result = str(num % 2) + result
num = num // 2
if isNeg:
result = '-' + result
This raised a couple of questions with me and after doing some research (mostly here on Stack Overflow), I found myself more confused than I was before. Hoping somebody can break things down a bit more for me. Here are some of those questions:
I thought it was outright wrong that the code suggested just appending a - to the front of a binary number to show its negative counterpart. It looks like bin() does the same thing, but don't you have to flip the bits and add a 1 or something? Is there a reason for this other than making it easy to comprehend/read?
Was reading here and one of the answers in particular said that Python doesn't really work in two's complement, but something else that mimics it. The disconnect here for me is that Python shows me one thing but is storing the numbers a different way. Again, is this just for ease of use? Is bin() using two's complement or Python's method?
Follow-up to that one, how does the 'sign-magnitude' format mentioned in the above answer differ from two's complement?
The Professor doesn't talk at all about 8-bit, 16-bit, 64-bit, etc., which I saw a lot of while reading up on this. Where does this distinction come from, and does Python use one? Or are those designations specific to the program that I might be coding?
A lot of these posts I've only reference how Python stores integers. Is that suggesting that it stores floats a different way, or are they just speaking broadly?
As I wrote this all up, I sort of realized that maybe I'm diving into the deep end before learning how to swim, but I'm curious like that and like to have a deeper understanding of stuff before moving on.

I thought it was outright wrong that the code suggested just appending a - to the front of a binary number to show its negative counterpart. It looks like bin() does the same thing, but don't you have to flip the bits and add a 1 or something? Is there a reason for this other than making it easy to comprehend/read?
You have to somehow designate the number being negative. You can add another symbol (-), add a sign bit at the very beginning, use ones'-complement, use two's-complement, or some other completely made-up scheme that works. Both the ones'- and two's-complement representation of a number require a fixed number of bits, which doesn't exist for Python integers:
>>> 2**1000
1071508607186267320948425049060001810561404811705533607443750
3883703510511249361224931983788156958581275946729175531468251
8714528569231404359845775746985748039345677748242309854210746
0506237114187795418215304647498358194126739876755916554394607
7062914571196477686542167660429831652624386837205668069376
The natural solution is to just prepend a minus sign. You can similarly write your own version of bin() that requires you to specify the number of bits and return the two's-complement representation of the number.
Was reading here and one of the answers in particular said that Python doesn't really work in two's complement, but something else that mimics it. The disconnect here for me is that Python shows me one thing but is storing the numbers a different way. Again, is this just for ease of use? Is bin() using two's complement or Python's method?
Python is a high-level language, so you don't really know (or care) how your particular Python runtime interally stores integers. Whether you use CPython, Jython, PyPy, IronPython, or something else, the language specification only defines how they should behave, not how they should be represented in memory. bin() just takes a number and prints it out using binary digits, the same way you'd convert 123 into base-2.
Follow-up to that one, how does the 'sign-magnitude' format mentioned in the above answer differ from two's complement?
Sign-magnitude usually encodes a number n as 0bXYYYYYY..., where X is the sign bit and YY... are the binary digits of the non-negative magnitude. Arithmetic with numbers encoded as two's-complement is more elegant due to the representation, while sign-magnitude encoding requires special handling for operations on numbers of opposite signs.
The Professor doesn't talk at all about 8-bit, 16-bit, 64-bit, etc., which I saw a lot of while reading up on this. Where does this distinction come from, and does Python use one? Or are those designations specific to the program that I might be coding?
No, Python doesn't define a maximum size for its integers because it's not that low-level. 2**1000000 computes fine, as will 2**10000000 if you have enough memory. n-bit numbers arise when your hardware makes it more beneficial to make your numbers a certain size. For example, processors have instructions that quickly work with 32-bit numbers but not with 87-bit numbers.
A lot of these posts I've only reference how Python stores integers. Is that suggesting that it stores floats a different way, or are they just speaking broadly?
It depends on what your Python runtime uses. Usually floating point numbers are like C doubles, but that's not required.

don't you have to flip the bits and add a 1 or something?
Yes, for two complement notation you invert all bits and add one to get the negative counterpart.
Is bin() using two's complement or Python's method?
Two's complement is a practical way to represent negative number in electronics that can have only 0 and 1. Internally the microprocessor uses two's complement for negative numbers and all modern microprocessors do. For more info, see your textbook on computer architecture.
how does the 'sign-magnitude' format mentioned in the above answer
differ from two's complement?
You should look what this code does and why it is there:
while num > 0:
result = str(num % 2) + result
num = num // 2

Alternative ways for binary conversion in python

I often need to convert status code to bit representation in order to determine what error/status are active on analyzers using plain-text or binary communication protocol.
I use python to poll data and to parse it. Sometime I really get confuse because I found that there is so many ways to solve a problem. Today I had to convert a string where each character is an hexadecimal digit to its binary representation. That is, each hexadecimal character must be converted into 4 bits, where the MSB start from left. Note: I need a char by char conversion, and leading zero.
I managed to build these following function which does the trick in a quasi one-liner fashion.
def convertStatus(s, base=16):
n = int(math.log2(base))
b = "".join(["{{:0>{}b}}".format(n).format(int(x, base)) for x in s])
return b
Eg., this convert the following input:
0123456789abcdef
into:
0000000100100011010001010110011110001001101010111100110111101111
Which was my goal.
Now, I am wondering what another elegant solutions could I have used to reach my goal? I also would like to better understand what are advantages and drawbacks among solutions. The function signature can be changed, but usually it is a string for input and output. Lets become imaginative...

This is simple in two steps
Converting a string to an int is almost trivial: use int(aString, base=...)
the first parameter is can be a string!
and with base, almost every option is possible
Converting a number to a string is easy with format() and the mini print language
So converting hex-strings to binary can be done as
def h2b(x):
val = int(x, base=16)
return format(val, 'b')
Here the two steps are explicitly. Possible it's better to do it in one line, or even in-line

Represent string as an integer in python

I would like to be able to represent any string as a unique integer (means every integer in the world could mean only one string, and a certain string would result constantly in the same integer).
The obvious point is, that's how the computer works, representing the string 'Hello' (for example) as a number for each character, specifically a byte (assuming ASCII encoding).
But... I would like to perform arithmetic calculations over that number (Encode it as a number using RSA).
The reason this is getting messy is because assuming I have a bit larger string 'I am an average length string' I have more characters (29 in this case), and an integer with 29 bytes could come up HUGE, maybe too much for the computer to handle (when coming up with bigger strings...?).
Basically, my question is, how could I do? I wouldn't like to use any module for RSA, it's a task I would like to implement myself.

Here's how to turn a string into a single number. As you suspected, the number will get very large, but Python can handle integers of any arbitrary size. The usual way of working with encryption is to do individual bytes all at once, but I'm assuming this is only for a learning experience. This assumes a byte string, if you have a Unicode string you can encode to UTF-8 first.
num = 0
for ch in my_string:
num = num << 8 + ord(ch)

Converting 2.5 byte comparisons to 3

I'm trying to convert a 2.5 program to 3.
Is there a way in python 3 to change a byte string, such as b'\x01\x02' to a python 2.5 style string, such as '\x01\x02', so that string and byte-by-byte comparisons work similarly to 2.5? I'm reading the string from a binary file.
I have a 2.5 program that reads bytes from a file, then compares or processes each byte or combination of bytes with specified constants. To run the program under 3, I'd like to avoid changing all my constants to bytes and byte strings ('\x01' to b'\x01'), then dealing with issues in 3 such as:
a = b'\x01'
b = b'\x02'
results in
(a+b)[0] != a
even though similar operation work in 2.5. I have to do (a+b)[0] == ord(a), while a+b == b'\x01\x02' works fine. (By the way, what do I do to (a+b)[0] so it equals a?)
Unpacking structures is also an issue.
Am I missing something simple?

Bytes is an immutable sequence of integers (in the range 0<= to <256), therefore when you're accessing (a+b)[0] you're getting back an integer, exactly the same one you'd get by accessing a[0]. so when you're comparing sequence a to an integer (a+b)[0], they're naturally different.
using the slice notation you could however get a sequence back:
>>> (a+b)[:1] == a # 1 == len(a) ;)
True
because slicing returns bytes object.
I would also advised to run 2to3 utility (it needs to be run with py2k) to convert some code automatically. It won't solve all your problems, but it'll help a lot.

How Does One Read Bytes from File in Python

Similar to this question, I am trying to read in an ID3v2 tag header and am having trouble figuring out how to get individual bytes in python.
I first read all ten bytes into a string. I then want to parse out the individual pieces of information.
I can grab the two version number chars in the string, but then I have no idea how to take those two chars and get an integer out of them.
The struct package seems to be what I want, but I can't get it to work.
Here is my code so-far (I am very new to python btw...so take it easy on me):
def __init__(self, ten_byte_string):
self.whole_string = ten_byte_string
self.file_identifier = self.whole_string[:3]
self.major_version = struct.pack('x', self.whole_string[3:4]) #this
self.minor_version = struct.pack('x', self.whole_string[4:5]) # and this
self.flags = self.whole_string[5:6]
self.len = self.whole_string[6:10]
Printing out any value except is obviously crap because they are not formatted correctly.

If you have a string, with 2 bytes that you wish to interpret as a 16 bit integer, you can do so by:
>>> s = '\0\x02'
>>> struct.unpack('>H', s)
(2,)
Note that the > is for big-endian (the largest part of the integer comes first). This is the format id3 tags use.
For other sizes of integer, you use different format codes. eg. "i" for a signed 32 bit integer. See help(struct) for details.
You can also unpack several elements at once. eg for 2 unsigned shorts, followed by a signed 32 bit value:
>>> a,b,c = struct.unpack('>HHi', some_string)
Going by your code, you are looking for (in order):
a 3 char string
2 single byte values (major and minor version)
a 1 byte flags variable
a 32 bit length quantity
The format string for this would be:
ident, major, minor, flags, len = struct.unpack('>3sBBBI', ten_byte_string)

Why write your own? (Assuming you haven't checked out these other options.) There's a couple options out there for reading in ID3 tag info from MP3s in Python. Check out my answer over at this question.

I am trying to read in an ID3v2 tag header
FWIW, there's already a module for this.

I was going to recommend the struct package but then you said you had tried it. Try this:
self.major_version = struct.unpack('H', self.whole_string[3:5])
The pack() function convers Python data types to bits, and the unpack() function converts bits to Python data types.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.