Getting file size of bmp image - python

I have a bitmap image and using this page I am attempting to read the file size.
In case the link breaks:
FileSize | 4 bytes | File size in bytes
Here is the start of the bitmap I want to read from: BM\xe6\x04\x00\x00\x00\x00\x00\x006. As I understand it, the file size is stored in the 3rd through 6th bytes, so \xe6\x04\x00\x00.
I remove all the \x00 since they are null values and don't tell me anything about the file size, so I used:
raw = b'\xe6\x04\x00\x00'
character_list = [raw[b:b+1] for b in range(0, len(raw))]
non_empty = [list_ for list_ in character_list if list_ != b'\x00']
This returned me: [b'\xe6', b'\x04']
Now I get all the values in the list using:
size = ''
for byte in non_empty:
    size += str(ord(byte))
print(size)
Here are the results of the conversion:
\xe6 > 230
\x04 > 4
This returns me 2304 (since '230' + '4' is 2304), while my bitmap image has the size of 1,254 bytes and 4,096 bytes on disk. Clearly this is not the image size. Where have I gone wrong?
As a side note, if I take another image of size 90 bytes and run the same process with Z\x00\x00\x00, it returns 90 as I expected (ord('Z') returns 90).

From poking around it looks like the byte order for the size in a bitmap is little endian (https://en.wikipedia.org/wiki/Endianness#Little-endian).
There's a built-in method on int that converts bytes to an integer: https://docs.python.org/3/library/stdtypes.html#int.from_bytes
So for example:
raw = b'\xe6\x04\x00\x00'
size = int.from_bytes(raw, byteorder='little')
print(size)  # 1254
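To see why dropping the zero bytes and gluing decimal digits together fails, it may help to spell out what little-endian means: each byte is weighted by 256 raised to its position. The sketch below does that by hand and then reads the same field with struct from the header bytes shown in the question. The three trailing \x00 bytes of the pixel-data offset are an assumption, since the question's excerpt stops at the 6.

```python
import struct

raw = b'\xe6\x04\x00\x00'

# Little-endian by hand: byte 0 is the least significant, so 0xe6 + 0x04 * 256.
manual = sum(byte << (8 * position) for position, byte in enumerate(raw))

# The 14-byte BMP file header: magic, file size, two reserved fields, pixel offset.
# The last three zero bytes are assumed; the question only shows up to the '6'.
header = b'BM\xe6\x04\x00\x00\x00\x00\x00\x006\x00\x00\x00'
magic, file_size, reserved1, reserved2, pixel_offset = struct.unpack('<2sIHHI', header)

print(manual, file_size, pixel_offset)  # 1254 1254 54
```

The 90-byte image only appeared to work because its size fits in a single byte, so there was nothing to weight.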

Related

Specific image returns strange value when parsed with struct.unpack_from

I'm using the following bit of code to find the bit depth of a given image:
def parseImage(self):
    with open(self.imageAddress, "rb") as image:
        data = bytearray(image.read())
        bitDepth = struct.unpack_from("<L", data, 0x0000001c)
        print("the image's colour depth is " + str(bitDepth[0]))
It works as it should when I input my other test images, but when I specifically input the small sample image from this page, it outputs 196640. I've viewed the file in Hex Editor Neo, and the value of the chosen byte is 32. Does anyone know why the program doesn't return this value?
The 4 bytes starting at offset 0x1c are 20 00 03 00 which indeed is 196640 in decimal in little-endian byte format. The problem is all you want is the 20 00 which, again in little-endian byte format, is 32 in decimal.
The Wikipedia article on the BMP file format (in the Windows BITMAPINFOHEADER section) says it's only a two-byte value, so the problem is that you're parsing too many bytes.
The fix is simple: specify the correct number of bytes for the unsigned integer in the struct format string ("<H" instead of "<L"). Note that I've also added some scaffolding to turn the posted code into something runnable.
import struct

class Test:
    def __init__(self, filename):
        self.imageAddress = filename

    def parseImage(self):
        with open(self.imageAddress, "rb") as image:
            data = bytearray(image.read())
            bitDepth = struct.unpack_from("<H", data, 0x1c)
            print("the image's colour depth is " + str(bitDepth[0]))

t = Test('Small Sample BMP Image File Download.bmp')
t.parseImage() # -> the image's colour depth is 32
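The difference between the two format codes can be checked without the sample file. This is a minimal sketch that places the four bytes quoted above (20 00 03 00) at offset 0x1c of an otherwise zero-filled buffer; the buffer is a stand-in for illustration, not a valid BMP.

```python
import struct

# Zero-filled stand-in for the first 0x1c bytes, followed by the bytes from the answer.
data = bytearray(0x1c) + bytes.fromhex('20000300')

as_long, = struct.unpack_from('<L', data, 0x1c)   # reads 4 bytes: 20 00 03 00
as_short, = struct.unpack_from('<H', data, 0x1c)  # reads 2 bytes: 20 00

print(as_long, as_short)  # 196640 32
```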

Get int value from each two bytes

I am trying to read bytes from an image, and get all the int (16 bit) values from that image.
After I parsed the image header, I got to the pixel values. The values that I get when the pair of bytes are like b"\xd4\x00" is incorrect. In this case it should be 54272, not 3392.
These are parts of the code:
I use a generator to get the bytes:
import itertools

def osddef_generator(in_file):
    with open(in_file, mode='rb') as f:
        dat = f.read()
        for byte in dat:
            yield byte

def take_slice(in_generator, size):
    return ''.join(str(chr(i)) for i in itertools.islice(in_generator, size))

def take_single_pixel(in_generator):
    pix = itertools.islice(in_generator, 2)
    hex_list = [hex(i) for i in pix]
    hex_str = "".join(hex_list)[2:].replace("0x", '')
    intval = int(hex_str, 16)
    print("hex_list: ", hex_list)
    print("hex_str: ", hex_str)
    print("intval: ", intval)
After I get the header correctly using the take_slice method, I get to the part with the pixel values, where I use the take_single_pixel method.
Here, I get the bad results.
This is what I get:
hex_list: ['0xd4', '0x0']
hex_str: d40
intval: 3392
But the actual sequence of bytes that should be interpreted is \xd4\x00, which equals 54272, so I expected hex_list = ['0xd4', '0x00'] and hex_str = d400.
Something happens when I have a sequence of bytes when the second one is \x00.
Got any ideas? Thanks!
There are much better ways of converting bytes to integers:
int.from_bytes() takes bytes input, and a byte order argument:
>>> int.from_bytes(b"\xd4\x00", 'big')
54272
>>> int.from_bytes(b"\xd4\x00", 'little')
212
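For reference, the 3392 in the question seems to come from hex() itself: it prints the smallest number of digits, so a zero byte becomes '0x0' rather than '0x00', and the joined string loses a digit. A small sketch of that effect, using the same byte pair:

```python
pair = b'\xd4\x00'

# hex() drops leading zeros, so the zero byte contributes one digit instead of two.
loose = ''.join(hex(b)[2:] for b in pair)          # 'd40'
# Forcing two hex digits per byte keeps the positions intact.
padded = ''.join(format(b, '02x') for b in pair)   # 'd400'

print(int(loose, 16), int(padded, 16), int.from_bytes(pair, 'big'))  # 3392 54272 54272
```

Going through hex strings is still a detour; int.from_bytes gives the same result directly.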
The struct.unpack() function lets you convert a whole series of bytes to integers following a pattern:
>>> import struct
>>> struct.unpack('!4H', b'\xd4\x00\xd4\x00\xd4\x00\xd4\x00')
(54272, 54272, 54272, 54272)
The array module lets you read binary data representing homogeneous integer data into a memory structure efficiently:
>>> import array
>>> arr = array.array('H', fileobject.read())
However, array can't be told what byte order to use. You'd have to determine the current architecture byte order and call arr.byteswap() to reverse order if the machine order doesn't match the file order.
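A minimal sketch of that byte-order check, using in-memory bytes instead of a file and assuming the data is big-endian. It also assumes the 'H' typecode is 2 bytes wide, which holds on common platforms but is only guaranteed as a minimum.

```python
import array
import sys

raw = b'\xd4\x00\xd4\x00'   # two big-endian 16-bit values (assumed file order)

arr = array.array('H', raw)
assert arr.itemsize == 2    # 'H' is at least 2 bytes; this sketch relies on exactly 2

# array always uses the machine's native order, so swap when it differs from the file's.
if sys.byteorder != 'big':
    arr.byteswap()

values = list(arr)
print(values)  # [54272, 54272]
```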
When reading image data, it is almost always preferable to use the struct module to do the parsing. You generally then use file.read() calls with specific sizes; if the header consists of 10 bytes, use:
headerinfo = struct.unpack('<expected header pattern for 10 bytes>', f.read(10))
and go from there. For examples, look at the Pillow / PIL image plugins source code; here is how the Blizzard Mipmap image format header is read:
def _read_blp_header(self):
    self._blp_compression, = struct.unpack("<i", self.fp.read(4))

    self._blp_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_depth, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_mips, = struct.unpack("<b", self.fp.read(1))

    self._size = struct.unpack("<II", self.fp.read(8))

    if self.magic == b"BLP1":
        # Only present for BLP1
        self._blp_encoding, = struct.unpack("<i", self.fp.read(4))
        self._blp_subtype, = struct.unpack("<i", self.fp.read(4))

    self._blp_offsets = struct.unpack("<16I", self.fp.read(16 * 4))
    self._blp_lengths = struct.unpack("<16I", self.fp.read(16 * 4))
Because struct.unpack() always returns tuples, you can assign individual elements in a tuple to name1, name2, ... names on the left-hand side, including single_name, = assignments to extract a single result.
The separate set of read calls above could also be compressed into fewer calls:
comp, enc, adepth, aenc, mips, *size = struct.unpack("<i4b2I", self.fp.read(16))
if self.magic == b"BLP1":
    # Only present for BLP1
    enc, subtype = struct.unpack("<2i", self.fp.read(8))
followed by specific attribute assignments.
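The combined call can be tried without a real BLP file. In this sketch an io.BytesIO object stands in for self.fp, and the field values are made up purely for illustration; only the "<i4b2I" pattern comes from the snippet above.

```python
import io
import struct

HEADER_FORMAT = "<i4b2I"
header_size = struct.calcsize(HEADER_FORMAT)   # 4 + 4*1 + 2*4 = 16 bytes

# Made-up field values, packed with the same pattern so the round trip is visible.
fp = io.BytesIO(struct.pack(HEADER_FORMAT, 1, 2, 8, 0, 1, 128, 64))

# One read, one unpack; the starred name collects the two trailing size fields.
comp, enc, adepth, aenc, mips, *size = struct.unpack(HEADER_FORMAT, fp.read(header_size))

print(comp, enc, adepth, aenc, mips, size)  # 1 2 8 0 1 [128, 64]
```

Using struct.calcsize for the read length keeps the byte count and the format string from drifting apart.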

Resize bytes in python

I read a file from disk, and if its size is less than 256 bytes I need to extend it. Something like:
data = open("test.txt", "rb").read()
if ( len(data) < 256 ):
    data.resize( 256 ) # Fill with zeroes or something
But since bytes is an immutable type, there is nothing like a resize method.
I don't need to modify the original file; I just need len(data) to be at least 256 for further processing.
If you want to extend it with spaces:
data = open("test.txt", "rb").read()
padding_character = b' ' # space, as bytes so it can be added to bytes data
data += padding_character*(256-len(data))
You can use zfill for str and bytes alike:
data = b'abc'
data = data.zfill(20) # or 256 in your case
print(data)
# b'00000000000000000abc'
Note: zfill pads on the left with the ASCII digit '0' (byte 0x30), not with null bytes. If your data length is already 256 (or whatever value you use) or more, zfill returns the data unchanged.
Docs: https://docs.python.org/3/library/stdtypes.html#bytes.zfill
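If the padding should go at the end and consist of zero bytes, as the question's comment suggests, bytes.ljust is one option. A minimal sketch with in-memory data in place of the file:

```python
MIN_LENGTH = 256

short = b'abc'
# ljust pads on the right with the given fill byte up to the requested width.
padded = short.ljust(MIN_LENGTH, b'\x00')

long_data = bytes(300)
# Data that is already long enough comes back with the same contents.
untouched = long_data.ljust(MIN_LENGTH, b'\x00')

print(len(padded), len(untouched))  # 256 300
```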

Python Pillow Image.frombytes mode '1' bad result

Where am I going wrong? I want to create a basic white picture from bytes.
from PIL import Image

if __name__ == "__main__":
    data = [chr(1)] * 8192
    data = "".join(data)
    im = Image.frombytes('1', (128,64), data, 'raw')
    im = im.convert("RGB")
    im.save("image.png", "PNG")
But I get this:
Just use Image.new instead:
im = Image.new(mode='RGB', size=(128,64), color=(255,255,255))
If you really want to make it from bytes, it would be like this:
Image.frombytes(mode='RGB', size=(128,64), data=b'\xff'*128*64*3)
edit: Image.frombytes expects bytes, not a list of integers. To convert a list of integers to the right type, use this:
>>> bytes([0,1,2]) # Python 3
b'\x00\x01\x02'
>>> bytes(bytearray([0,1,2])) # Python 2
'\x00\x01\x02'
edit 2: either mode='1' or the docs have a bug (see comment thread). Assuming you have a list of zeros and ones, 8192 elements long (one per pixel), and you want to convert this to a 128x64 monochromatic image (one bit per pixel), then you'll have to pack the bits into 1024 bytes manually:
bits = [int(not (y%13 and x%7)) for x in range(64) for y in range(128)]
# asymmetric grid
octets = [bits[i:i+8] for i in range(0, len(bits), 8)]

def bits2byte(bits8):
    result = 0
    for bit in bits8:
        result <<= 1
        result |= bit
    return result

data = bytes(bytearray([bits2byte(octet) for octet in octets]))
im = Image.frombytes(mode='1', size=(128,64), data=data)
im.show()
Result:
In mode 1 each byte represents 8 pixels (there might be zero padding at the end of each row if the width is not divisible by 8). So to get a white image, you have to pass in only the byte b'\xff':
data = b'\xff' * 1024
im = Image.frombytes('1', (128,64), data)
Even if the Pillow docs say that there's one pixel per byte in this mode, that is not true for the frombytes and tobytes methods, at least.
Any other repeating input other than \xff (all white) or \x00 (all black) will give some sort of pinstripe pattern, like the one in your question.
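The packing described above can be sketched in plain Python, without Pillow, to see which bytes each pattern produces. pack_rows is a hypothetical helper written for this illustration; it packs most significant bit first and pads each row to a whole byte, as described above.

```python
def pack_rows(rows):
    """Pack rows of 0/1 pixels into bytes, MSB first, padding each row to a whole byte."""
    out = bytearray()
    for row in rows:
        padded = list(row) + [0] * (-len(row) % 8)   # zero padding at the end of the row
        for i in range(0, len(padded), 8):
            value = 0
            for bit in padded[i:i + 8]:
                value = (value << 1) | bit
            out.append(value)
    return bytes(out)

# 128x64 all-white image: every byte is 0xff, 1024 bytes in total.
white = pack_rows([[1] * 128 for _ in range(64)])
# Seven black pixels then one white, repeated: every byte is 0x01, a pinstripe.
stripes = pack_rows([[0, 0, 0, 0, 0, 0, 0, 1] * 16 for _ in range(64)])
# A 10-pixel-wide row needs 2 bytes; the last 6 bits are padding.
narrow = pack_rows([[1] * 10])

print(len(white), stripes[:2], narrow)  # 1024 b'\x01\x01' b'\xff\xc0'
```

With Pillow installed, the white value here is the same data passed to Image.frombytes('1', (128,64), data) above.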

Is it possible to encrypt integers?

So my program is a steganography program: it inserts an image into another image, and I'm trying to encrypt the data before inserting it into the cover image. However, most encryption modules expect strings and I'm trying to pass integers.
I've tried converting to string then encrypting, but the encryption is full of special characters and letters so converting back to integer for insertion is impossible.
Anyone know if I can somehow encrypt an integer? It doesn't have to be very secure.
I'm trying to add the encryption in here:
for i in range(0,3):
    #verify we have reached the end of our hidden file
    if count >= len(Stringbits):
        #convert the bits to their rgb value and append them
        for rgbValue in pixelList:
            pixelnumbers1 = int(''.join(str(b) for b in rgbValue), 2)
            #print pixelnumbers1
            rgb_Array.append(pixelnumbers1)
        pixels[x, y] = (rgb_Array[0], rgb_Array[1], rgb_Array[2])
        print("Completed")
        return imageObject.save(output)
I've been trying to encrypt pixelnumbers1 then add it in. But pixels[x, y] requires an integer.
Below is the rest of the code in-case:
def write(mainimage, secret, output):
    #string contains the header, data and length in binary
    Stringbits = dcimage.createString(secret)
    imageObject = Image.open(mainimage).convert('RGB')
    imageWidth, imageHeight = imageObject.size
    pixels = imageObject.load()
    rgbDecimal_Array = []
    rgb_Array = []
    count = 0
    #loop through each pixel
    for x in range (imageWidth):
        for y in range (imageHeight):
            r,g,b = pixels[x,y]
            #convert each pixel into an 8 bit representation
            redPixel = list(bin(r)[2:].zfill(8))
            greenPixel = list(bin(g)[2:].zfill(8))
            bluePixel = list(bin(b)[2:].zfill(8))
            pixelList = [redPixel, greenPixel, bluePixel]
            #for each of rgb
            for i in range(0,3):
                #verify we have reached the end of our hidden file
                if count >= len(Stringbits):
                    #convert the bits to their rgb value and append them
                    for rgbValue in pixelList:
                        pixelnumbers1 = int(''.join(str(b) for b in rgbValue), 2)
                        #print pixelnumbers1
                        rgb_Array.append(pixelnumbers1)
                    pixels[x, y] = (rgb_Array[0], rgb_Array[1], rgb_Array[2])
                    print("Completed")
                    return imageObject.save(output)
                #If we haven't reached the end of the file, store a bit
                else:
                    pixelList[i][7] = Stringbits[count]
                    count+=1
            pixels[x, y] = dcimage.getPixel(pixelList)
You have a fundamental misunderstanding of how computers see any type of data.
You read the bytestream of a file, which looks like a string to you, but each character is actually a byte, a value from 0 to 255. It just happens that some of them are represented by conventional string characters. Try print(bytes(range(256))) to see them all. Most standard encryption functions take a byte array in and spit a byte array out. It just happens that you get more of the bytes that don't have a "simple" representation. But they are not any less bytes than what you initially fed in.
Your dcimage.py has the following:
#get the file data in binary
fileData = bytearray(open(secret, 'rb').read())#opens the binary file in read or write mode
for bits in fileData:
    binDataString += bin(bits)[2:].zfill(8)#convert the file data to binary
There is nothing that stops you from doing this:
fileData = open(secret, 'rb').read() # a bytes object by default
encryptedData = myEncryptionFunction(fileData) # also a bytes object
for bits in encryptedData:
    # ...
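As a runnable illustration of "bytes in, bytes out", the sketch below uses a repeating-key XOR as a stand-in for the encryption function. xor_cipher and the key are hypothetical, and XOR with a short key is not secure; a real cipher from an encryption library would slot into the same place.

```python
from itertools import cycle

def xor_cipher(data, key):
    """Toy stand-in for an encryption function: bytes in, bytes out (NOT secure)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

file_data = b'secret\x00payload'          # stands in for open(secret, 'rb').read()
encrypted = xor_cipher(file_data, b'k3y')

# The encrypted bytes convert to a bit string exactly like the unencrypted ones did.
bin_data_string = ''.join(bin(b)[2:].zfill(8) for b in encrypted)

decrypted = xor_cipher(encrypted, b'k3y')  # XOR with the same key reverses it
print(len(bin_data_string), decrypted == file_data)  # 112 True
```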
VERY IMPORTANT: You add a null byte at the end of your message so your extracting sequence knows when to stop. If you compress, or encrypt, a string (or byte array), it is likely a null byte will be part of that stream, which will break your extraction sequence. In that case you want to use a header that tells your program ahead of time how many bits to extract.
By the way, bytes are already in an integer form.
>>> some_byte = b'G'
>>> some_byte[0]
71
You're better off using bitwise operations for steganography. At the moment you take bytes and, instead of using bitwise operations between them and your pixels, you turn both into binary strings, slice and stitch them, and then turn them back into integers.
def bytes_to_bits(stream):
    for byte in stream:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 0x01

secret_bits = tuple(bytes_to_bits(encoded_data))
count = 0
# simplified for one colour plane
for y in range(image_height):
    for x in range(image_width):
        # (pixel AND 254) OR bit - the first part zeroes out the lsb
        pixels[x,y] = (pixels[x,y] & 0xfe) | secret_bits[count]
        count += 1
# -------------------------------------
# to extract the bit from a stego pixel
bit = pixel & 0x01
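Putting the bitwise approach together with the earlier advice about a length header, here is a hedged end-to-end sketch on a flat list of pixel values for one colour plane. The names embed, extract and bits_to_bytes and the 4-byte big-endian length prefix are choices made for this illustration, not part of the original program.

```python
def bytes_to_bits(stream):
    for byte in stream:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 0x01

def bits_to_bytes(bits):
    """Reassemble a sequence of bits (MSB first) into bytes."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

def embed(plane, payload):
    """Hide payload in the least significant bits, prefixed by a 4-byte length."""
    bits = list(bytes_to_bits(len(payload).to_bytes(4, 'big') + payload))
    if len(bits) > len(plane):
        raise ValueError("cover plane too small for payload")
    stego = list(plane)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xfe) | bit   # clear the lsb, then set it to the bit
    return stego

def extract(plane):
    """Read the length header first, then exactly that many payload bytes."""
    length = int.from_bytes(bits_to_bytes([p & 0x01 for p in plane[:32]]), 'big')
    return bits_to_bytes([p & 0x01 for p in plane[32:32 + 8 * length]])

cover = [(i * 37) % 256 for i in range(200)]   # made-up pixel values
secret = b'\x00hi\xff'                          # null bytes are fine with a length header
stego = embed(cover, secret)
recovered = extract(stego)
print(recovered == secret)  # True
```

Because the extractor knows the length up front, encrypted or compressed payloads containing null bytes do not end extraction early.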
Integers can be encrypted by adding each digit to a random integer stream in the range 0 to 9, subtracting 10 when the sum > 9. Modulo should be avoided because of ambiguities.
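A minimal sketch of that digit scheme, assuming both sides can regenerate the same digit stream. Here the stream comes from Python's random module with a shared seed, which is an assumption for illustration and not cryptographically secure; the function names are hypothetical.

```python
import random

def key_digits(seed, count):
    """Reproducible stream of digits 0-9 (shared seed; not cryptographically secure)."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(count)]

def encrypt_digits(digits, key):
    result = []
    for digit, k in zip(digits, key):
        total = digit + k
        if total > 9:          # wrap back into 0-9
            total -= 10
        result.append(total)
    return result

def decrypt_digits(cipher, key):
    result = []
    for digit, k in zip(cipher, key):
        diff = digit - k
        if diff < 0:           # undo the wrap
            diff += 10
        result.append(diff)
    return result

plain = [int(ch) for ch in '230']
key = key_digits(seed=1234, count=len(plain))
cipher = encrypt_digits(plain, key)
restored = decrypt_digits(cipher, key)
print(restored == plain)  # True
```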
