Resize bytes in Python

I read a file from disk, and if its size is less than 256 bytes I need to extend it. Something like:
data = open("test.txt", "rb").read()
if ( len(data) < 256 ):
    data.resize( 256 ) # Fill with zeroes or something
But since bytes is an immutable type, there is no such thing as a resize method.
I don't need to modify original file, I need len(data) to be not less than 256 for further processing.

If you want to extend it with spaces:
data = open("test.txt", "rb").read()
padding_character = b' ' # space, as bytes because the file was read in binary mode
data += padding_character*(256-len(data))

You can use zfill for str and bytes alike:
data = b'abc'
data = data.zfill(20) # or 256 in your case
print(data)
# b'00000000000000000abc'
Note: If your data length is already 256 (or whatever value you use) or more, zfill will return the original object.
Docs: https://docs.python.org/3/library/stdtypes.html#bytes.zfill
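Note that zfill pads on the left with the ASCII digit '0' (byte 0x30), not with null bytes. If the goal is to append zero bytes at the end, bytes.ljust is one option. A minimal sketch, where pad_to is a hypothetical helper name and not part of the standard library:

```python
def pad_to(data: bytes, size: int, fill: bytes = b"\x00") -> bytes:
    """Return data right-padded with `fill` so that len(result) >= size."""
    # bytes.ljust returns a new object; data that is already long enough
    # comes back with the same contents.
    return data.ljust(size, fill)

short = pad_to(b"abc", 8)
long_ = pad_to(b"x" * 300, 256)
print(short)       # b'abc\x00\x00\x00\x00\x00'
print(len(long_))  # 300
```

For the question's case this would be pad_to(data, 256).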

Related

Get int value from each two bytes

I am trying to read bytes from an image, and get all the int (16 bit) values from that image.
After I parsed the image header, I got to the pixel values. The value I get when the pair of bytes is something like b"\xd4\x00" is incorrect. In this case it should be 54272, not 3392.
These are the relevant parts of the code:
I use a generator to get the bytes:
import itertools
def osddef_generator(in_file):
    with open(in_file, mode='rb') as f:
        dat = f.read()
        for byte in dat:
            yield byte
def take_slice(in_generator, size):
    return ''.join(str(chr(i)) for i in itertools.islice(in_generator, size))
def take_single_pixel(in_generator):
    pix = itertools.islice(in_generator, 2)
    hex_list = [hex(i) for i in pix]
    hex_str = "".join(hex_list)[2:].replace("0x", '')
    intval = int(hex_str, 16)
    print("hex_list: ", hex_list)
    print("hex_str: ", hex_str)
    print("intval: ", intval)
After I get the header correctly using the take_slice method, I get to the part with the pixel values, where I use the take_single_pixel method.
Here, I get the bad results.
This is what I get:
hex_list: ['0xd4', '0x0']
hex_str: d40
intval: 3392
But the actual sequence of bytes that should be interpreted is \xd4\x00, which equals 54272, so I expected hex_list = ['0xd4', '0x00'] and hex_str = d400.
Something happens when I have a sequence of bytes when the second one is \x00.
Got any ideas? Thanks!
There are much better ways of converting bytes to integers:
int.from_bytes() takes bytes input, and a byte order argument:
>>> int.from_bytes(b"\xd4\x00", 'big')
54272
>>> int.from_bytes(b"\xd4\x00", 'little')
212
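As for why the code in the question produced 3392: hex() does not zero-pad, so the byte 0x00 becomes the single digit '0' and the joined string is one digit short. A small sketch of the difference (the variable names are only for illustration):

```python
pair = b"\xd4\x00"

# hex() drops leading zeros, so the second byte contributes only one digit.
naive = "".join(hex(b)[2:] for b in pair)
# Formatting each byte as exactly two hex digits keeps the zero.
fixed = "".join(format(b, "02x") for b in pair)

print(naive, int(naive, 16))        # d40 3392
print(fixed, int(fixed, 16))        # d400 54272
print(int.from_bytes(pair, "big"))  # 54272
```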
The struct.unpack() function lets you convert a whole series of bytes to integers following a pattern:
>>> import struct
>>> struct.unpack('!4H', b'\xd4\x00\xd4\x00\xd4\x00\xd4\x00')
(54272, 54272, 54272, 54272)
The array module lets you read binary data representing homogeneous integer data into a memory structure efficiently:
>>> import array
>>> arr = array.array('H', fileobject.read())
However, array can't be told what byte order to use. You'd have to determine the current architecture byte order and call arr.byteswap() to reverse the order if the machine order doesn't match the file order.
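A minimal sketch of that byte order check, assuming the 'H' type code is 2 bytes wide (true on common platforms, but not guaranteed by the language) and using a hypothetical helper name read_uint16:

```python
import array
import sys

def read_uint16(raw: bytes, file_byteorder: str = "big") -> list:
    """Decode raw bytes as unsigned 16-bit integers in the given byte order."""
    arr = array.array("H")
    # frombytes interprets the data in the machine's native byte order.
    arr.frombytes(raw)
    if sys.byteorder != file_byteorder:
        # Swap the bytes of every item when the file order differs.
        arr.byteswap()
    return arr.tolist()

print(read_uint16(b"\xd4\x00\x00\x01"))  # [54272, 1]
```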
When reading image data, it is almost always preferable to use the struct module to do the parsing. You generally then use file.read() calls with specific sizes; if the header consists of 10 bytes, use:
headerinfo = struct.unpack('<expected header pattern for 10 bytes>', f.read(10))
and go from there. For examples, look at the Pillow / PIL image plugins source code; here is how the Blizzard Mipmap image format header is read:
def _read_blp_header(self):
    self._blp_compression, = struct.unpack("<i", self.fp.read(4))
    self._blp_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_depth, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_mips, = struct.unpack("<b", self.fp.read(1))
    self._size = struct.unpack("<II", self.fp.read(8))
    if self.magic == b"BLP1":
        # Only present for BLP1
        self._blp_encoding, = struct.unpack("<i", self.fp.read(4))
        self._blp_subtype, = struct.unpack("<i", self.fp.read(4))
    self._blp_offsets = struct.unpack("<16I", self.fp.read(16 * 4))
    self._blp_lengths = struct.unpack("<16I", self.fp.read(16 * 4))
Because struct.unpack() always returns tuples, you can assign individual elements in a tuple to name1, name2, ... names on the left-hand side, including single_name, = assignments to extract a single result.
The separate set of read calls above could also be compressed into fewer calls:
comp, enc, adepth, aenc, mips, *size = struct.unpack("<i4b2I", self.fp.read(16))
if self.magic == b"BLP1":
    # Only present for BLP1
    enc, subtype = struct.unpack("<2i", self.fp.read(8))
followed by specific attribute assignments.
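To make the read-then-unpack pattern concrete, here is a sketch for a made-up format. The 10-byte header layout, the read_image name and the sample data are all assumptions invented for this example; they do not describe the file format from the question.

```python
import io
import struct

# Hypothetical 10-byte header: 4-byte magic, then width, height and bit depth
# as little-endian unsigned 16-bit integers.
HEADER = struct.Struct("<4sHHH")

def read_image(f):
    magic, width, height, depth = HEADER.unpack(f.read(HEADER.size))
    count = width * height
    # One little-endian unsigned 16-bit value per pixel.
    pixels = struct.unpack("<%dH" % count, f.read(2 * count))
    return magic, (width, height), depth, pixels

sample = io.BytesIO(b"DEMO" + struct.pack("<HHH", 2, 1, 16) + b"\x00\xd4\x01\x00")
print(read_image(sample))  # (b'DEMO', (2, 1), 16, (54272, 1))
```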

Getting file size of bmp image

I have a bitmap image and using this page I am attempting to read the file size.
In case the link breaks:
FileSize | 4 bytes | File size in bytes
Here is the part of the bitmap I want to read from: BM\xe6\x04\x00\x00\x00\x00\x00\x006. As I understand it, the file size is stored in the 3rd through 6th bytes, so \xe6\x04\x00\x00.
I removed all the \x00 bytes, since they are null values and don't tell me anything about the file size, using:
raw = b'\xe6\x04\x00\x00'
character_list = [raw[b:b+1] for b in range(0, len(raw))]
non_empty = [list_ for list_ in character_list if list_ != b'\x00']
This returned: [b'\xe6', b'\x04']
Now I get all the values in the list using:
size = ''
for byte in non_empty:
    size += str(ord(byte))
print(size)
Here are the results of the conversion:
\xe6 > 230
\x04 > 4
This returns me 2304 (since '230' + '4' is 2304), while my bitmap image has the size of 1,254 bytes and 4,096 bytes on disk. Clearly this is not the image size. Where have I gone wrong?
As a side note, if I take another image of size 90 bytes and run the same process with Z\x00\x00\x00, it returns 90 as I expected (ord('Z') returns 90).
From poking around, it looks like the byte order for the size in a bitmap is little-endian (https://en.wikipedia.org/wiki/Endianness#Little-endian).
There's a built-in method for int that can convert bytes to an integer: https://docs.python.org/3/library/stdtypes.html#int.from_bytes
So for example:
raw = b'\xe6\x04\x00\x00'
size = int.from_bytes(raw, byteorder='little')
print(size)
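This prints 1254, matching the real file size. The approach in the question fails because each byte is a base-256 digit, so the values have to be weighted by position rather than concatenated as decimal strings. A sketch that reads the field straight from the header bytes quoted in the question, using struct.unpack_from:

```python
import struct

header = b"BM\xe6\x04\x00\x00\x00\x00\x00\x006"

# "<2sI": a 2-byte signature followed by a little-endian unsigned 32-bit size.
signature, file_size = struct.unpack_from("<2sI", header, 0)
print(signature, file_size)  # b'BM' 1254

# The same value by hand: the low byte counts ones, the next byte counts 256s.
print(0xE6 + 0x04 * 256)     # 1254
```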

Python Pillow Image.frombytes mode '1' bad result

Where am I going wrong? I want to create a basic white picture from bytes:
from PIL import Image
if __name__ == "__main__":
    data = [chr(1)] * 8192
    data = "".join(data)
    im = Image.frombytes('1', (128,64), data, 'raw')
    im = im.convert("RGB")
    im.save("image.png", "PNG")
But I get this:
Just use Image.new instead:
im = Image.new(mode='RGB', size=(128,64), color=(255,255,255))
If you really want to make it from bytes, it would be like this:
Image.frombytes(mode='RGB', size=(128,64), data=b'\xff'*128*64*3)
edit: Image.frombytes expects bytes, not a list of integers. To convert a list of integers to the right type, use this:
>>> bytes([0,1,2]) # Python 3
b'\x00\x01\x02'
>>> bytes(bytearray([0,1,2])) # Python 2
'\x00\x01\x02'
edit 2: Either mode='1' or the docs have a bug (see comment thread). Assuming you have a list of zeros and ones, 8192 elements long (128 x 64), and you want to convert this to a 128x64 monochromatic image (one bit per pixel), then you'll have to pack the bits into bytes manually:
bits = [int(not (y%13 and x%7)) for x in range(64) for y in range(128)]
# asymmetric grid
octets = [bits[i:i+8] for i in range(0, len(bits), 8)]
def bits2byte(bits8):
    result = 0
    for bit in bits8:
        result <<= 1
        result |= bit
    return result
data = bytes(bytearray([bits2byte(octet) for octet in octets]))
im = Image.frombytes(mode='1', size=(128,64), data=data)
im.show()
Result:
In mode 1 each byte represents 8 pixels (there might be zero padding at the end of each row if the width is not divisible by 8). So to get a white image, you have to pass in only the byte b'\xff':
data = b'\xff' * 1024
im = Image.frombytes('1', (128,64), data)
Even if the Pillow docs say that there's one pixel per byte in this mode, that is not true for the frombytes and tobytes methods, at least.
Any other repeating input other than \xff (all white) or \x00 (all black) will give some sort of pinstripe pattern, like the one in your question.
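The packing rule described above can be sketched without Pillow. The helper name pack_rows is hypothetical, and the most-significant-bit-first order and per-row zero padding are assumptions taken from the answers here rather than from the Pillow documentation:

```python
def pack_rows(rows):
    """Pack rows of 0/1 pixels into bytes, MSB first, padding each row to a whole byte."""
    out = bytearray()
    for row in rows:
        # Pad the row with zero bits up to the next multiple of 8.
        padded = list(row) + [0] * (-len(row) % 8)
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
    return bytes(out)

white = pack_rows([[1] * 128 for _ in range(64)])
print(len(white), white == b"\xff" * 1024)  # 1024 True
print(pack_rows([[1, 0, 1]]))               # b'\xa0'
```

The result for the all-ones rows is the same b'\xff' * 1024 that is passed to Image.frombytes('1', (128,64), data) above.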

Is it possible to encrypt integers?

So my program is a steganography program: it inserts an image into another image, and I'm trying to encrypt the data before inserting it into the cover image. However, most encryption modules expect strings and I'm trying to pass integers.
I've tried converting to string then encrypting, but the encryption is full of special characters and letters so converting back to integer for insertion is impossible.
Anyone know if I can somehow encrypt an integer? It doesn't have to be very secure.
I'm trying to add the encryption in here:
for i in range(0,3):
    #verify we have reached the end of our hidden file
    if count >= len(Stringbits):
        #convert the bits to their rgb value and append them
        for rgbValue in pixelList:
            pixelnumbers1 = int(''.join(str(b) for b in rgbValue), 2)
            #print pixelnumbers1
            rgb_Array.append(pixelnumbers1)
        pixels[x, y] = (rgb_Array[0], rgb_Array[1], rgb_Array[2])
        print "Completed"
        return imageObject.save(output)
I've been trying to encrypt pixelnumbers1 then add it in. But pixels[x, y] requires an integer.
Below is the rest of the code, just in case:
def write(mainimage, secret, output):
    #string contains the header, data and length in binary
    Stringbits = dcimage.createString(secret)
    imageObject = Image.open(mainimage).convert('RGB')
    imageWidth, imageHeight = imageObject.size
    pixels = imageObject.load()
    rgbDecimal_Array = []
    rgb_Array = []
    count = 0
    #loop through each pixel
    for x in range (imageWidth):
        for y in range (imageHeight):
            r,g,b = pixels[x,y]
            #convert each pixel into an 8 bit representation
            redPixel = list(bin(r)[2:].zfill(8))
            greenPixel = list(bin(g)[2:].zfill(8))
            bluePixel = list(bin(b)[2:].zfill(8))
            pixelList = [redPixel, greenPixel, bluePixel]
            #for each of rgb
            for i in range(0,3):
                #verify we have reached the end of our hidden file
                if count >= len(Stringbits):
                    #convert the bits to their rgb value and append them
                    for rgbValue in pixelList:
                        pixelnumbers1 = int(''.join(str(b) for b in rgbValue), 2)
                        #print pixelnumbers1
                        rgb_Array.append(pixelnumbers1)
                    pixels[x, y] = (rgb_Array[0], rgb_Array[1], rgb_Array[2])
                    print "Completed"
                    return imageObject.save(output)
                #If we haven't reached the end of the file, store a bit
                else:
                    pixelList[i][7] = Stringbits[count]
                    count+=1
            pixels[x, y] = dcimage.getPixel(pixelList)
You have a fundamental misunderstanding of how computers see any type of data.
You read the bytestream of a file, which looks like a string to you, but each character is actually a byte, a value from 0 to 255. It just happens that some of them are represented by conventional string characters. Try print(bytes(range(256))) to see them all. Most standard encryption functions take a byte array in and spit a byte array out. It just happens that the output contains more of the bytes that don't have a "simple" representation, but they are no less bytes than what you initially fed in.
Your dcimage.py has the following:
#get the file data in binary
fileData = bytearray(open(secret, 'rb').read())#opens the file in binary read mode
for bits in fileData:
    binDataString += bin(bits)[2:].zfill(8)#convert the file data to binary
There is nothing that stops you from doing this:
fileData = open(secret, 'rb').read() # a bytes object by default
encryptedData = myEncryptionFunction(fileData) # also a bytes object
for bits in encryptedData:
    # ...
VERY IMPORTANT: You add a null byte at the end of your message so your extracting sequence knows when to stop. If you compress, or encrypt, a string (or byte array), it is likely a null byte will be part of that stream, which will break your extraction sequence. In that case you want to use a header that tells your program ahead of time how many bits to extract.
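One way to build such a header, as a sketch: prefix the payload with its length. The 4-byte big-endian field and the names frame and unframe are assumptions for illustration, not something dcimage.py already does. The header here counts bytes, so the number of bits to extract is 8 times that value.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes) -> bytes:
    """Read the length prefix, then return exactly that many payload bytes."""
    (length,) = struct.unpack_from(">I", stream, 0)
    return stream[4:4 + length]

# Null bytes inside the payload no longer matter, because nothing scans for them.
secret = b"\x00encrypted\x00bytes"
embedded = frame(secret) + b"trailing cover data"
print(unframe(embedded) == secret)  # True
```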
By the way, bytes are already in an integer form.
>>> some_byte = b'G'
>>> some_byte[0]
71
You're better off using bitwise operations for steganography. At the moment you take bytes and, instead of applying bitwise operations between them and your pixels, you turn both into binary strings, slice and stitch them, and then turn them back into integers.
def bytes_to_bits(stream):
    for byte in stream:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 0x01
secret_bits = tuple(bytes_to_bits(encoded_data))
count = 0
# simplified for one colour plane; a real loop must stop once count reaches len(secret_bits)
for x in range(image_width):
    for y in range(image_height):
        # (pixel AND 254) OR bit - the first part zeroes out the lsb
        pixels[x,y] = (pixels[x,y] & 0xfe) | secret_bits[count]
        count += 1
# -------------------------------------
# to extract the bit from a stego pixel
bit = pixel & 0x01
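A self-contained round trip of the same idea, run on a plain list of channel values instead of a Pillow image so that it works without any image file. The embed and extract names are hypothetical helpers written for this sketch:

```python
def bytes_to_bits(stream):
    for byte in stream:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 0x01

def embed(values, payload):
    """Return a copy of values with the payload bits stored in the lowest bits."""
    bits = list(bytes_to_bits(payload))
    if len(bits) > len(values):
        raise ValueError("cover is too small for the payload")
    out = list(values)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return out

def extract(values, n_bytes):
    """Rebuild n_bytes bytes from the lowest bit of each value, MSB first."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for v in values[8 * i:8 * i + 8]:
            byte = (byte << 1) | (v & 0x01)
        out.append(byte)
    return bytes(out)

cover = list(range(100, 132))  # 32 stand-in channel values
stego = embed(cover, b"Hi")
print(extract(stego, 2))       # b'Hi'
```

Each value changes by at most 1, which is why the change is hard to see in an image.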
Integers can be encrypted by adding each digit to a random integer stream in the range 0 to 9, subtracting 10 when the sum > 9. Modulo should be avoided because of ambiguities.
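A sketch of that digit-wise scheme. The function names are hypothetical, and the key is a fixed list here only to keep the example reproducible; a real key stream would have to be random and shared with the receiver, and this is in no way a vetted cipher.

```python
def encrypt_digits(number, key_digits):
    """Add each decimal digit to the matching key digit, wrapping past 9."""
    digits = [int(d) for d in str(number)]
    if len(key_digits) < len(digits):
        raise ValueError("key stream is shorter than the number")
    out = []
    for d, k in zip(digits, key_digits):
        s = d + k
        if s > 9:
            s -= 10
        out.append(s)
    return out

def decrypt_digits(cipher_digits, key_digits):
    """Subtract the key digits again, wrapping below 0."""
    out = []
    for c, k in zip(cipher_digits, key_digits):
        d = c - k
        if d < 0:
            d += 10
        out.append(d)
    return int("".join(str(d) for d in out))

key = [7, 3, 9]
cipher = encrypt_digits(255, key)
print(cipher)                       # [9, 8, 4]
print(decrypt_digits(cipher, key))  # 255
```

The ciphertext is kept as a list of digits so that a leading zero is not lost.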

Unspecified byte lengths in Python

I'm writing a client for a P2P application at the minute and the spec for the protocol says that the header for each packet should have each field with a particular byte length like so:
Version: 1 Byte
Type: 1 Byte
Length: 2 Bytes
And then the data
I've got the way of packing and unpacking the header fields (I think) like this:
packed = struct.pack('cch', b'1', b'1', 26)
This constructs a header for a packet with a data length of 26, but when it comes to unpacking the data I'm unsure how to go about getting the rest of the data afterwards. To unpack we need to know the size of all the fields, unless I'm missing something? I guess to pack the data I'd use a format indicator 'cch26s' meaning:
1 Byte char
1 Byte char
2 Byte short
26 Byte char array
But how do I unpack the data when I don't know how much data will be included in the packet first?
The way you're describing the protocol, you should unpack the first four bytes first, and extract Length (a 16-bit int). This tells you how many bytes to unpack in a second step.
version, type, length = struct.unpack("cch", packed[:4])
content, = struct.unpack("%ds" % length, packed[4:4 + length])
This is if everything checks out. unpack() requires that the packed buffer contain exactly as much data as you unpack. Also, check whether the 4 header bytes are included in the length count.
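A fuller sketch of the two-step parse. It assumes network (big-endian) byte order, an unsigned length field, and a length that counts only the data and not the 4 header bytes; none of these is stated in the question, so adjust the format string to whatever the protocol spec says. The function names are hypothetical.

```python
import struct

# version (1 byte), type (1 byte), length (2 bytes, unsigned, network order)
HEADER = struct.Struct("!ccH")

def build_packet(version: bytes, ptype: bytes, payload: bytes) -> bytes:
    return HEADER.pack(version, ptype, len(payload)) + payload

def parse_packet(packet: bytes):
    # Step 1: unpack only the fixed-size header.
    version, ptype, length = HEADER.unpack_from(packet, 0)
    # Step 2: slice out exactly `length` bytes of data.
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return version, ptype, payload

pkt = build_packet(b"1", b"1", b"abcdefghijklmnopqrstuvwxyz")
print(len(pkt))           # 30
print(parse_packet(pkt))  # (b'1', b'1', b'abcdefghijklmnopqrstuvwxyz')
```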
You can surmise the number of characters to unpack by inspecting len(data).
Here is a helper function which does this for you:
import struct
def unpack(fmt, astr):
    """
    Return struct.unpack(fmt, astr) with the optional single * in fmt replaced with
    the appropriate number, given the length of astr.
    """
    # http://stackoverflow.com/a/7867892/190597
    try:
        return struct.unpack(fmt, astr)
    except struct.error:
        flen = struct.calcsize(fmt.replace('*', ''))
        alen = len(astr)
        idx = fmt.find('*')
        before_char = fmt[idx-1]
        # integer division, so that n stays an int on Python 3
        n = (alen-flen)//struct.calcsize(before_char)+1
        fmt = ''.join((fmt[:idx-1], str(n), before_char, fmt[idx+1:]))
        return struct.unpack(fmt, astr)
You can use it like this:
unpack('cchs*', data)
