PNG Chunk type-code Bit #5 - python

I'm trying to write my own little PNG reader in Python. There is something in the documentation I don't quite understand. In chapter 3.3 (where chunks are handled) it says:
Four bits of the type code, namely bit 5 (value 32) of each byte, are used to convey chunk properties. This
choice means that a human can read off the assigned properties according to whether each letter of the type
code is uppercase (bit 5 is 0) or lowercase (bit 5 is 1). However, decoders should test the properties of an unknown
chunk by numerically testing the specified bits; testing whether a character is uppercase or lowercase
is inefficient, and even incorrect if a locale-specific case definition is used.
Ok, so it explicitly denotes one should not test whether a byte is uppercase or lowercase. Then, how do I check that bit 5?
Furthermore, the documentation states
Ancillary bit: bit 5 of first byte
0 (uppercase) = critical, 1 (lowercase) = ancillary.
I have the following function to convert an integer to a bit-stream:
def bits(x, n):
    """Convert an integer value *x* to a sequence of *n* bits as a string."""
    return ''.join(str([0, 1][x >> i & 1]) for i in range(n - 1, -1, -1))
Just for example, take the sRGB chunk. The lowercase s denotes the chunk is ancillary. But comparing the bit-streams of an uppercase S and lowercase s
01110011
01010011
we can see that bit #5 is zero in both cases.
I think I have a wrong understanding of how the bits are counted. As the only bit that changes is the third one from the left (i.e. index 2), I assume this is the bit I'm searching for? It is also the 6th bit from the right, with index 5 (counting from the right, of course). Is this what I'm searching for?

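Python does have bitwise manipulation. You are doing it the hard way, when they already gave you the bitmask (32 or 0x20). Bits are numbered from the least significant bit, starting at 0, so bit 5 is the one with value 32, which is indeed the bit that differs between S and s. With type_code holding the integer value of the first byte of the chunk type: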
is_critical = (type_code & 0x20) == 0
or, equivalently:
is_critical = (type_code & (0x1 << 5)) == 0
(with extra parentheses for clarity)
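The same mask applies to each of the four bytes of the type code. Below is a minimal sketch, assuming the chunk type is available as a 4-byte bytes object (as you would get from reading the file in binary mode); the function name chunk_properties and the dictionary keys are made up for illustration, while the meaning of each byte's bit 5 (ancillary, private, reserved, safe-to-copy) follows the PNG specification.

```python
def chunk_properties(chunk_type):
    """Decode the four property bits of a PNG chunk type such as b'sRGB'.

    Hypothetical helper: bit 5 (mask 0x20) of each byte is tested
    numerically, as the specification recommends.
    """
    if len(chunk_type) != 4:
        raise ValueError("a chunk type is exactly 4 bytes long")
    # Iterating over a bytes object in Python 3 yields integers.
    flags = [bool(byte & 0x20) for byte in chunk_type]
    return {
        "ancillary": flags[0],     # first byte: 0 = critical, 1 = ancillary
        "private": flags[1],       # second byte: 0 = public, 1 = private
        "reserved": flags[2],      # third byte: must be 0 in valid files
        "safe_to_copy": flags[3],  # fourth byte: 0 = unsafe, 1 = safe to copy
    }


print(chunk_properties(b"sRGB"))
```

For b'sRGB' only the ancillary flag is set, since s is the only lowercase letter.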

Related

How can read, manipulate, and write a range of (less than 8) bits of a byte array in Python?

I'm looking for a way to read, manipulate, and write specific ranges of (less than 8) bits in a byte array.
I want something like, let's say I want to work with the 6-8th bit of a byte array:
R0 = bytearray(8)
bits = readBits(R0, 6, 2) # bytearray, position, range
print(bits) # [0, 0] or maybe it's easier to work with a byte
# with the irrelevant bits zeroed, you tell me
# do stuff ... XOR, AND, ETC
writeBits(R0, bits, 6) # target, source, position
I'm writing an x86-64 emulator in Python and some instructions in the General Purpose Registers can reference 2 bit segments of the 64 bit register, which I've been representing as bytearray(8). The smallest unit of a byte array seems to be a byte (go figure). Is there a way to select bits? Let's say I want to read, manipulate, and update the 6th-8th bits of an 8 byte object? What does that look like?
Maybe there's a more granular data structure I should be using rather than a byte array?
One answer to a similar question working with a hex string suggests:
i = int("140900793d002327", 16)
# getting bit at position 28 (counting from 0 from right)
i >> 28 & 1
# getting bits at position 24-26
bin(i >> 24 & 0b111)
But this code isn't explained super well, for example what does 0b111 do here and how do I use this approach dynamically to isolate any desired range of bits as with my imaginary readBits and writeBits funcs, rather than the hard-coded functionality here?
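One possible way to make this dynamic is sketched below. It assumes bit 0 is the least significant bit of the first byte (little-endian, as on x86-64) and that the third argument is a width in bits; read_bits and write_bits are hypothetical names standing in for the readBits and writeBits imagined above, and write_bits takes the width as an extra argument. The literal 0b111 in the quoted snippet is just the integer 7 written in binary: a mask whose three lowest bits are set. The expression (1 << width) - 1 builds such a mask for any width.

```python
def read_bits(buf, pos, width):
    """Return `width` bits of `buf`, starting at bit `pos` (0 = lowest bit)."""
    value = int.from_bytes(buf, "little")
    mask = (1 << width) - 1          # e.g. width=3 gives 0b111
    return (value >> pos) & mask


def write_bits(buf, bits, pos, width):
    """Overwrite `width` bits of `buf` at bit `pos` with the low bits of `bits`."""
    mask = ((1 << width) - 1) << pos
    value = int.from_bytes(buf, "little")
    # Clear the target field, then OR in the new bits (truncated to the field).
    value = (value & ~mask) | ((bits << pos) & mask)
    buf[:] = value.to_bytes(len(buf), "little")


r0 = bytearray(8)
write_bits(r0, 0b11, 6, 2)       # set bits 6 and 7
print(read_bits(r0, 6, 2))       # 3
print(r0[0] == 0xC0)             # True
```

Converting the whole buffer to an int keeps the arithmetic simple and also handles fields that straddle a byte boundary; for a register emulator it may be simpler still to store each register as a plain int and mask it to 64 bits.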

Implementation of SHA256 in python3, final hash is too short

I'm trying to write an implementation of SHA-256 in python 3. My version is supposed to take in a hexadecimal encoding and output the corresponding hash value. I've used https://en.wikipedia.org/wiki/SHA-2#Pseudocode as guide.
My function works well for most inputs but sometimes it gives an output that is only 63 characters long (instead of 64). My function uses 32-bit binary strings.
I think I have found the problem, in the last step of the algorithm the binary addition
h4 := h4 + e (or another h-vector and corresponding letter)
yields a binary number that is too small. The last thing I do is to use hex() and I should get a string of 8 characters. In this example I only get 7.
out4 = hex(int(h4,2))[2:]
One problematic input is e5e5e5
It gives
"10110101111110101011010101101100" for h4 and "01010001000011100101001001111111" for e
so the addition gives "00000111000010010000011111101011"
and out4 = 70907eb.
What should I do in these cases?
I should get a string of 8 characters
Why do you think so? hex doesn't let you specify the length of the output to begin with, so, for example, if the correct output is all zeros, hex will return 0x0, the shortest representation possible.
I'm guessing the correct output should begin with a zero, but hex is cutting it off. Use format strings to specify the length of the output:
In [1]: f'{0:08x}'
Out[1]: '00000000' # lowercase hexadecimal (x) digits that must fit into at least 8 characters, prefixed with zero (08) as needed
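Applied to the values from the question, this could look like the sketch below. The variable names follow the question; reducing the sum modulo 2**32 with a mask is an assumption about how the addition is meant to wrap, in line with the SHA-256 pseudocode.

```python
h4 = "10110101111110101011010101101100"
e = "01010001000011100101001001111111"

# Add as integers and keep only the low 32 bits (addition modulo 2**32).
total = (int(h4, 2) + int(e, 2)) & 0xFFFFFFFF

# Zero-pad to 8 hex digits so a leading zero is not dropped.
out4 = format(total, "08x")
print(out4)  # 070907eb
```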

trouble with grabbing tag bits

I'm implementing a direct-mapped cache in Python. Each line in the cache contains 4 bytes. For some reason I'm having trouble pulling out the first (in this case) 27 bits, and also the last 5 bits, by using bit shifting.
I'm not sure what exactly I'm doing wrong in terms of bitshifting, but everything I've done is not giving me the desired bits I want. I'm doing a sort of "hard-coded" solution for now but converting the integer stored in cache to a bit string, and using python's string indexing to just get the first 27 bits, though I do want to know how to do it via bit shifting.
def getTag(d_bytes):
    b = bin(d_bytes)
    b = b[2:]
    return (b[0:27])
Is the hard-coded solution I'm referring to.
If the value stored in cache is
0b11010101010101010000100010001
I would like to have a tag of:
110101010101010100001000 (the first 27 bits, as tag = line size - index - offset)
An index of:
100 (the next 3 bits following the tag)
and an offset of:
01 (the last two bits)
You can extract the bits by masking and shifting.
To get the lowest n bits, the mask to use is 000011..(n times)..11. This mask can simply be generated with (1<<n)-1. This is equal to the number 2^n-1, whose binary representation is exactly the mask that we want.
Now if you want to extract a bitfield that is at any position in your word, you first have to shift it right to the proper position, then use masking.
So for your problem, you can use
# extract n bits of x starting at position m
def getfield(x,n,m):
    r=x>>m # shift it right to have lsb of bitfield at position 0
    return r&((1<<n)-1) # then mask to extract n bits

lsb27=getfield(tag,27,0) # get bits x[26:0]
msb5=getfield(tag,5,27) # get bits x[31:27]
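For the layout described in the question (tag in the high bits, then a 3-bit index, then a 2-bit offset), the same shift-and-mask idea could be applied as in the sketch below. The function name split_address and its default widths are assumptions taken from the question's example, not part of the answer above.

```python
def split_address(addr, index_bits=3, offset_bits=2):
    """Split an address into (tag, index, offset) by shifting and masking."""
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # next field up
    tag = addr >> (offset_bits + index_bits)                 # everything above
    return tag, index, offset


tag, index, offset = split_address(0b11010101010101010000100010001)
print(bin(tag), bin(index), bin(offset))
```

Since Python integers have no fixed width, the tag is simply whatever remains after shifting out the index and offset; its leading zeros are not stored.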

Reverse bit in python

Reverse bits of a given 32 bits unsigned integer.
For example, given input 43261596 (represented in binary as
00000010100101000001111010011100), return 964176192 (represented in
binary as 00111001011110000010100101000000).
This does not work
def reverseBits(self, n):
    return int(bin(n)[:1:-1], 2)
Your problem is in assuming Python's bin produces a 32 bit aligned output. It doesn't; it outputs the smallest number of bits possible. Python 3's int type has an unbounded number of bits, and even in Python 2, int will auto-promote to long if it overflows the bounds of int (which is not related to the limits of C's int).
If you want it to act like a specific width, the easiest way is to use formatting tools with more control (which will also simplify your slice operation).
For example, by formatting to a fixed 32 characters wide, padding with zeroes, you get your desired result:
>>> int('{:032b}'.format(43261596)[::-1], 2)
964176192
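Wrapped in a function with the width as a parameter, this might look as follows; the name reverse_bits and the width argument are only illustrative.

```python
def reverse_bits(n, width=32):
    """Reverse the bits of n, treating it as an unsigned `width`-bit integer."""
    # Pad with zeros to exactly `width` binary digits, reverse, parse again.
    return int(format(n, "0{}b".format(width))[::-1], 2)


print(reverse_bits(43261596))  # 964176192
```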
The answer is in the output of bin():
>>> bin(12345)
'0b11000000111001'
As you can see, it only outputs 14 ones and zeros. This is because it removes any leading zeros. Why does it do this? Well, Python doesn't use a fixed size for integers like many other languages do; an int takes as many bits as its value needs.
So instead of 00000000000000001111111111111111 becoming 11111111111111110000000000000000, it becomes 1111111111111111 instead.

The tilde operator in Python

What's the usage of the tilde operator in Python?
One thing I can think about is do something in both sides of a string or list, such as check if a string is palindromic or not:
def is_palindromic(s):
    return all(s[i] == s[~i] for i in range(len(s) // 2))
Any other good usage?
It is a unary operator (taking a single argument) that is borrowed from C, where all data types are just different ways of interpreting bytes. It is the "invert" or "complement" operation, in which all the bits of the input data are flipped.
In Python, for integers, the bits of the twos-complement representation of the integer are flipped (as in b <- b XOR 1 for each individual bit), and the result interpreted again as a twos-complement integer. So for integers, ~x is equivalent to (-x) - 1.
The reified form of the ~ operator is provided as operator.invert. To support this operator in your own class, give it an __invert__(self) method.
>>> import operator
>>> class Foo:
...     def __invert__(self):
...         print('invert')
...
>>> x = Foo()
>>> operator.invert(x)
invert
>>> ~x
invert
Any class in which it is meaningful to have a "complement" or "inverse" of an instance that is also an instance of the same class is a possible candidate for the invert operator. However, operator overloading can lead to confusion if misused, so be sure that it really makes sense to do so before supplying an __invert__ method to your class. (Note that byte-strings [ex: '\xff'] do not support this operator, even though it is meaningful to invert all the bits of a byte-string.)
~ is the bitwise complement operator in python which essentially calculates -x - 1
So a table would look like
i ~i
-----
0 -1
1 -2
2 -3
3 -4
4 -5
5 -6
So for i = 0 it would compare s[0] with s[len(s) - 1], for i = 1, s[1] with s[len(s) - 2].
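A small sketch of that pairing, with a made-up helper name, shows which elements s[i] and s[~i] pick out:

```python
def mirrored_pairs(s):
    """Pair each element in the first half of s with its mirror from the end."""
    # ~i == -i - 1, so s[~i] is the i-th element counted from the end.
    return [(s[i], s[~i]) for i in range(len(s) // 2)]


print(mirrored_pairs("level"))   # [('l', 'l'), ('e', 'e')]
print(mirrored_pairs("abcd"))    # [('a', 'd'), ('b', 'c')]
```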
As for your other question, this can be useful for a range of bitwise hacks.
One should note that in the case of array indexing, array[~i] amounts to reversed_array[i]. It can be seen as indexing starting from the end of the array:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
    ^                 ^
    i                ~i
Besides being a bitwise complement operator, ~ can also negate a boolean value, though not the built-in bool type (for which ~True is -2); for that you need numpy.bool_, as this shows:
import numpy as np
assert ~np.True_ == np.False_
Inverting logical values can be useful sometimes; e.g., below, the ~ operator is used to cleanse a dataset and return a column without NaN.
import numpy as np
import pandas as pd
matrix = pd.DataFrame([1, 2, 3, 4, np.nan], columns=['Number'], dtype='float64')
# Remove NaN in column 'Number'
matrix['Number'][~matrix['Number'].isnull()]
The only time I've ever used this in practice is with numpy/pandas. For example, with the .isin() dataframe method.
In the docs they show this basic example
>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True
But what if instead you wanted all the rows not in [0, 2]?
>>> ~df.isin([0, 2])
        num_legs  num_wings
falcon     False      False
dog         True      False
I was solving this leetcode problem and I came across this beautiful solution by a user named Zitao Wang.
The problem goes like this: for each element in the given array, find the product of all the remaining numbers without making use of division and in O(n) time.
The standard solution is:
Pass 1: For all elements compute product of all the elements to the left of it
Pass 2: For all elements compute product of all the elements to the right of it
and then multiplying them for the final answer
His solution uses only one for loop, computing the left product and the right product on the fly using ~.
def productExceptSelf(self, nums):
    res = [1]*len(nums)
    lprod = 1
    rprod = 1
    for i in range(len(nums)):
        res[i] *= lprod
        lprod *= nums[i]
        res[~i] *= rprod
        rprod *= nums[~i]
    return res
Explaining why -x -1 is correct in general (for integers)
Sometimes, people are surprised by the mathematical behaviour of the ~ operator. They might reason, for example, that rather than evaluating to -19, the result of ~18 should be 13 (since bin(18) gives '0b10010', inverting the bits would give '0b01101' which represents 13 - right?). Or perhaps they might expect 237 (treating the input as signed 8-bit quantity), or some other positive value corresponding to larger integer sizes (such as the machine word size).
Note, here, that the signed interpretation of the bits 11101101 (which, treated as unsigned, give 237) is... -19. The same will happen for larger numbers of bits. In fact, as long as we use at least 6 bits, and treating the result as signed, we get the same answer: -19.
The mathematical rule - negate, and then subtract one - holds for all inputs, as long as we use enough bits, and treat the result as signed.
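This claim is easy to check numerically. The sketch below inverts 18 within several fixed widths and reinterprets each result as a signed two's-complement value; to_signed is a hypothetical helper written for this illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


for bits in (6, 8, 16, 32, 64):
    inverted = ~18 & ((1 << bits) - 1)   # unsigned result within this width
    print(bits, inverted, to_signed(inverted, bits))
```

Every row ends in -19, while the unsigned value in the middle column grows with the width (45 for 6 bits, 237 for 8 bits, and so on).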
And, this being Python, conceptually numbers use an arbitrary number of bits. The implementation will allocate more space automatically, according to what is necessary to represent the number. (For example, if the value would "fit" in one machine word, then only one is used; the data type abstracts the process of sign-extending the number out to infinity.) It also does not have any separate unsigned-integer type; integers simply are signed in Python. (After all, since we aren't in control of the amount of memory used anyway, what's the point in denying access to negative values?)
This breaks intuition for a lot of people coming from a C environment, in which it's arguably best practice to use only unsigned types for bit manipulation and then apply 2s-complement interpretation later (and only if appropriate; if a value is being treated as a group of "flags", then a signed interpretation is unlikely to make sense). Python's implementation of ~, however, is consistent with its other design choices.
How to force unsigned behaviour
If we wanted to get 13, 237 or anything else like that from inverting the bits of 18, we would need some external mechanism to specify how many bits to invert. (Again, 18 conceptually has arbitrarily many leading 0s in its binary representation; inverting them would result in something with leading 1s; and interpreting that in 2s complement would give a negative result.)
The simplest approach is to simply mask off those arbitrarily-many bits. To get 13 from inverting 18, we want 5 bits, so we mask with 0b11111, i.e., 31. More generally (and giving the same interface for the original behaviour):
def invert(value, bits=None):
    result = ~value
    return result if bits is None else (result & ((1 << bits) - 1))
Another way, per Andrew Jenkins' answer at the linked example question, is to XOR directly with the mask. Interestingly enough, we can use XOR to handle the default, arbitrary-precision case. We simply use an arbitrary-sized mask, i.e. an integer that conceptually has an arbitrary number of 1 bits in its binary representation - i.e., -1. Thus:
def invert(value, bits=None):
    return value ^ (-1 if bits is None else ((1 << bits) - 1))
However, using XOR like this will give strange results for a negative value - because all those arbitrarily-many set bits "before" (in more-significant positions) the XOR mask weren't cleared:
>>> invert(-19, 5) # notice the result is equal to 18 - 32
-14
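One way to avoid that surprise, sketched here under the assumption that a fixed-width unsigned result is wanted, is to mask the input before the XOR so that no bits above the requested width survive. The names invert_xor and invert_masked are made up to tell the two variants apart.

```python
def invert_xor(value, bits):
    """XOR with the mask only; bits above the mask are left untouched."""
    return value ^ ((1 << bits) - 1)


def invert_masked(value, bits):
    """Mask first, then XOR, so the result always fits in `bits` bits."""
    mask = (1 << bits) - 1
    return (value & mask) ^ mask


print(invert_xor(-19, 5))     # -14
print(invert_masked(-19, 5))  # 18
print(invert_masked(18, 5))   # 13
```

For non-negative inputs that already fit in the width, both variants agree; they differ only when the input has set bits beyond the mask.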
It's called Binary One’s Complement (~).
It returns the one’s complement of a number’s binary representation; that is, it flips the bits. In 8 bits, binary for 2 is 00000010. Its one’s complement is 11111101.
Read as a two’s-complement number, this is binary for -3. So ~2 results in -3. Similarly, ~1 results in -2.
~-3
Output : 2
Again, one’s complement of -3 is 2.
This is a minor usage of tilde...
def split_train_test_by_id(data, test_ratio, id_column):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio))
    return data.loc[~in_test_set], data.loc[in_test_set]
the code above is from "Hands On Machine Learning"
You can use the tilde (~) as an alternative to the minus sign when indexing from the end. Because ~i equals -i - 1, ~0 selects the last element, just as -1 does.
ex)
array = [1,2,3,4,5,6]
print(array[-1])
is the same thing as
print(array[~0])
