TensorFlow: How to encode and read BMP images? (Python)

I am trying to read .bmp images, do some augmentation on these, save them to a .tfrecords file and then open the .tfrecords files and use the images for image classification. I know that there is a tf.image.encode_jpeg() and a tf.image.encode_png() function, but there is no tf.image.encode_bmp() function. I know that .bmp images are uncompressed, so I've tried to simply base64-encode, np.tostring() and np.tobytes() the images, but I get the following error when trying to decode these formats:
tensorflow.python.framework.errors_impl.InvalidArgumentError: channels attribute 3 does not match bits per pixel from file <some long number>
My take is that tensorflow, in its encoding to jpeg or png, does something extra with the byte encoding of the images; saving information about array dimensionality, etc. However, I am quite clueless about this, so any help would be great!
Some code to show what it is I am trying to achieve:
with tf.gfile.FastGFile(filename, 'rb') as f:
    image_data = f.read()

bmp_data = tf.placeholder(dtype=tf.string)
decode_bmp = tf.image.decode_bmp(bmp_data, channels=3)  # decode the fed-in placeholder
augmented_bmp = <do some augmentation on decode_bmp>

sess = tf.Session()
np_img = sess.run(augmented_bmp, feed_dict={bmp_data: image_data})
byte_img = np_img.tostring()

# Write byte_img to file using tf.train.Example
writer = tf.python_io.TFRecordWriter(<output_tfrecords_filename>)
example = tf.train.Example(features=tf.train.Features(feature={
    'encoded_img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[byte_img]))}))
writer.write(example.SerializeToString())
# Read img from file
dataset = tf.data.TFRecordDataset(<img_file>)
dataset = dataset.map(parse_img_fn)
The parse_img_fn may be condensed to the following:
def parse_img_fn(serialized_example):
    features = tf.parse_single_example(serialized_example, feature_map)
    image = features['encoded_img']
    image = tf.image.decode_bmp(image, channels=3)  # This is where the decoding fails
    features['encoded_img'] = image
    return features

In your comment, surely you mean encode rather than encrypt.
The BMP file format is quite simplistic, consisting of a bunch of headers and pretty much raw pixel data. This is why BMP images are so big. I suppose this is also why TensorFlow developers did not bother to write a function to encode arrays (representing images) into this format. Few people still use it. It is recommended to use PNG instead, which performs lossless compression of the image. Or, if you can deal with lossy compression, use JPG.
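For a concrete sense of the size difference, here is a minimal sketch (assuming Pillow is available; 'image.png' is a hypothetical input file) that saves the same pixels in both formats and compares file sizes:

from PIL import Image
import os

img = Image.open('image.png').convert('RGB')  # hypothetical input file

img.save('as_bmp.bmp')  # headers plus raw pixel data, no compression
img.save('as_png.png')  # lossless DEFLATE compression

print('BMP size:', os.path.getsize('as_bmp.bmp'))
print('PNG size:', os.path.getsize('as_png.png'))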
TensorFlow doesn't do anything special for encoding images. It just returns the bytes that represent the image in that format, similar to what matplotlib does when you call savefig (except that matplotlib also writes the bytes to a file).
Suppose you produce a numpy array where the top rows are 0 and the bottom rows are 255. This is an array of numbers which, if you think of it as a picture, would represent two horizontal bands: the top one black and the bottom one white.
If you want to see this picture in another program (e.g. GIMP), you need to encode this information in a standard format, such as PNG. Encoding means adding some headers and metadata and, optionally, compressing the data.
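For instance, a minimal sketch (TF 1.x style, to match the rest of this answer) that builds such a two-band array and encodes it as a PNG that GIMP can open:

import numpy as np
import tensorflow as tf

# 100x100 single-channel image: top half black, bottom half white
arr = np.zeros((100, 100, 1), dtype=np.uint8)
arr[50:] = 255

encode_op = tf.image.encode_png(tf.constant(arr))
with tf.Session() as sess:
    png_bytes = sess.run(encode_op)  # headers + metadata + compressed pixels

with open('bands.png', 'wb') as f:
    f.write(png_bytes)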
Now that it is a bit more clear what encoding is, I recommend you work with PNG images.
with tf.gfile.FastGFile('image.png', 'rb') as f:
    # get the bytes representing the image
    # this is a 1D array (string) which includes header and stuff
    raw_png = f.read()

# decode the raw representation into an array
# so we have a 2D array representing the image (3D if colour)
image = tf.image.decode_png(raw_png)

# augment the image, e.g. (max_delta is a required argument)
augmented_img = tf.image.random_brightness(image, max_delta=0.2)

# convert the array back into a compressed representation
# by encoding it into png
# we now end up with a string again
augmented_png = tf.image.encode_png(augmented_img, compression=9)

# Write augmented_png to file using tf.train.Example
# (in TF 1.x you still need sess.run(augmented_png) here to get the actual bytes)
writer = tf.python_io.TFRecordWriter(<output_tfrecords_filename>)
example = tf.train.Example(features=tf.train.Features(feature={
    'encoded_img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[augmented_png]))}))
writer.write(example.SerializeToString())

# Read img from file
dataset = tf.data.TFRecordDataset(<img_file>)
dataset = dataset.map(parse_img_fn)
There are a few important pieces of advice:
don't use numpy.tostring. It returns a huge representation, because the raw pixel values are dumped byte for byte with no compression whatsoever (and if your array is float, every pixel costs several bytes). Try it and check the file size :)
no need to pass the data back into Python with tf.Session. You can perform all the ops on the TF side. This way you have an input graph which you can reuse as part of an input pipeline, as sketched below.
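For the second point, a minimal sketch of such a reusable input graph, assuming the 'encoded_img' feature holds PNG bytes as written above (the filename is hypothetical):

def parse_img_fn(serialized_example):
    features = tf.parse_single_example(
        serialized_example,
        {'encoded_img': tf.FixedLenFeature([], tf.string)})
    image = tf.image.decode_png(features['encoded_img'], channels=3)
    # augmentation stays inside the graph, no round trip through Python
    image = tf.image.random_brightness(image, max_delta=0.2)
    return image

dataset = tf.data.TFRecordDataset('augmented.tfrecords')
dataset = dataset.map(parse_img_fn)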

There is no encode_bmp in the main TensorFlow package, but if you import tensorflow_io (also an officially supported Google package) you can find an encode_bmp method there.
For the documentation see:
https://www.tensorflow.org/io/api_docs/python/tfio/image/encode_bmp
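A minimal sketch, assuming tensorflow-io is installed (pip install tensorflow-io):

import tensorflow as tf
import tensorflow_io as tfio

image = tf.zeros([64, 64, 3], dtype=tf.uint8)  # any uint8 HxWxC tensor
bmp_bytes = tfio.image.encode_bmp(image)       # serialized BMP as a string tensor
tf.io.write_file('out.bmp', bmp_bytes)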

Related

OpenCV whole Images to bytes without Saving to Disk

Basically, I want to add a few bytes to my new PNG image file. An example case is like the following code:
img = open("example.png", "rb") # Open images and read as binnary
hex = img.read().hex() # Read images as Bytes to Hexadecimal.
add_hex = hex+"7feab1e74a4bdb755cca" # Add some bytes to it (as hex)
to_bytes_img = bytes.fromhex(add_hex) # Convert hex to bytes
with open("example2.png", "wb") as f: # Write images
f.write(to_bytes_img)
But the problem is that I have a special case which requires me to perform the above operation using OpenCV (cv2), where cv2.imread() only reads and stores the pixels as a numpy array (an array of pixels, not the whole file).
Then I want to write that image into a new file with cv2.imwrite(), which will rebuild the image and save the PNG to disk. My question is: how do I add some bytes to the PNG image file (in buffer/memory) before the cv2.imwrite() operation?
I could probably do it with with open() as above, but that would be very inefficient: opening, writing, then opening and writing to disk again.
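One way to stay in memory is cv2.imencode(), which encodes the pixel array to PNG bytes without touching the disk. A sketch, reusing the filenames and hex payload from the question:

import cv2

img = cv2.imread('example.png', cv2.IMREAD_UNCHANGED)

ok, buf = cv2.imencode('.png', img)  # PNG bytes in memory
assert ok, 'PNG encoding failed'

extra = bytes.fromhex('7feab1e74a4bdb755cca')
with open('example2.png', 'wb') as f:
    f.write(buf.tobytes() + extra)   # a single write to disk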

How can I import an image file into Python, read it as an array, then output the array as the same image file type

I am tasked with writing a program that can take an image file as input, encrypt the image with some secondary code I have already written, and finally output the encrypted image file.
I would like to import an image, turn it into a 1D array of numbers, perform some encryption on this 1D array (which will turn it into a 2D array that I will flatten), and then output the encrypted 1D array as an image file, converting it back to whatever format it was in on input.
I am wondering how this can be done, what types of image files can be accepted, and what libraries may be required.
Thanks
EDIT:
This is some code I have used; img_arr stores the image in an array of integers, max 255. This is what I want, but now I need to convert back into the original format after I have performed some functions on img_arr.
from PIL import Image

img = Image.open('testimage.jfif')
print('img first: ', img)

img = img.tobytes()
img_arr = []
for x in img:
    img_arr.append(x)

img2 = Image.frombytes('RGB', (460, 134), img)
print('img second: ', img2)
My outputs are slightly different:
img first: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=460x134 at 0x133C2D6F970>
img second: <PIL.Image.Image image mode=RGB size=460x134 at 0x133C2D49EE0>
In programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in an ASCII string format by translating the data into a radix-64 representation.
Fortunately, you can encode and decode binary image files in Python using base64. The following link may help:
Encoding an image file with base64
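For example, a minimal sketch of round-tripping an image file through base64 ('testimage.jfif' is the file from the question):

import base64

with open('testimage.jfif', 'rb') as f:
    encoded = base64.b64encode(f.read())  # raw bytes -> ASCII-safe base64 text

with open('restored.jfif', 'wb') as f:
    f.write(base64.b64decode(encoded))    # base64 -> the original bytes, unchanged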

Why is an image (numpy array) converted to a string before being encoded into a TFRecord file?

Recently I have been working on encoding images (let's say in bitmap format) into a TFRecord file, but I'm wondering about the reason:
Why do we need to convert numpy array data into a string type before the data is written into the TFRecord file?
like
from PIL import Image
...
npimg = np.array(Image.open(img_path))
# My question:
# why do we need to convert the numpy array img to a string?
img_raw = npimg.tostring()
...
# later on, write img_raw to tf.train.Example
Here's the full code example that I found in the blog post Tfrecords Guide.
from PIL import Image
import numpy as np
import skimage.io as io
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

tfrecords_filename = 'pascal_voc_segmentation.tfrecords'
writer = tf.python_io.TFRecordWriter(tfrecords_filename)
original_images = []
filename_pairs = [
    ('/path/to/example1.jpg',
     '/path/to/example2.jpg'),
    ...,
    ('/path/to/exampleN.jpg',
     '/path/to/exampleM.jpg'),
]

for img_path, annotation_path in filename_pairs:
    # read data into numpy arrays
    img = np.array(Image.open(img_path))
    annotation = np.array(Image.open(annotation_path))
    height = img.shape[0]
    width = img.shape[1]
    original_images.append((img, annotation))

    # My question:
    # why do we need to convert the numpy array img to a string?
    img_raw = img.tostring()
    annotation_raw = annotation.tostring()

    example = tf.train.Example(features=tf.train.Features(feature={
        'height': _int64_feature(height),
        'width': _int64_feature(width),
        'image_raw': _bytes_feature(img_raw),
        'mask_raw': _bytes_feature(annotation_raw)}))
    writer.write(example.SerializeToString())

writer.close()
Any hint would be appreciated. Thanks in advance.
To read data efficiently it can be helpful to serialize your data and store it in a set of files (100-200MB each) that can each be read linearly. This is especially true if the data is being streamed over a network. This can also be useful for caching any data-preprocessing.
Edit:
This comes in handy when you are transferring the image to a server (e.g. TensorFlow Serving). There you have to send the data as a serialized string, because some media are made for streaming text. You never know -- some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you've entered a special character combination (like how FTP translates line endings).
So to get around this, people encode the binary data into characters. Base64 is one of these types of encodings.
Why 64?
Because you can generally rely on the same 64 characters being present in many character sets, and you can be reasonably confident that your data's going to end up on the other side of the wire uncorrupted.
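This is also why height and width are stored next to image_raw in the code above: the bytes feature is flat, so you need the shape metadata to restore the array. A sketch of reading the record back, assuming the writer code above:

import numpy as np
import tensorflow as tf

for serialized in tf.python_io.tf_record_iterator(tfrecords_filename):
    example = tf.train.Example()
    example.ParseFromString(serialized)
    height = example.features.feature['height'].int64_list.value[0]
    width = example.features.feature['width'].int64_list.value[0]
    img_raw = example.features.feature['image_raw'].bytes_list.value[0]
    # restore the flat byte string to its original shape
    img = np.frombuffer(img_raw, dtype=np.uint8).reshape((height, width, -1))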

Merge multiple base64 images into one

Suppose I have multiple base64 strings that are images (one string = one image). Is there a way to combine them and decode them into a single image file, i.e. merge multiple base64 strings and output a single image file?
I'm not sure how I would approach this using Pillow (or if I even need it).
Further clarification:
The source images are TIFFs that are encoded into base64
When I say "merge", I mean turning multiple images into a multi-page image like you see in a multi-page PDF
I dug through the Pillow documentation (v5.3) and found something that seems to work. Basically, there are two phases to this:
Load the base64-encoded strings as PIL Images
Append them together and save to disk
Example using Python 3.7:
from PIL import Image
import io
import base64

base64_images = ["asdfasdg...", "asdfsdafas..."]
image_files = []

for base64_string in base64_images:
    buffer = io.BytesIO(base64.b64decode(base64_string))
    image_file = Image.open(buffer)
    image_files.append(image_file)

# save() returns None, so just call it on the first image
image_files[0].save(
    'output.tiff',
    save_all=True,
    append_images=image_files[1:]
)
In the above code, I first create PIL Image objects from bytes buffers in order to do this whole thing in-memory, but you could use .save() and create a bunch of temp files instead if I/O isn't a concern.
Once I have all the PIL Image objects, I choose the first image (assuming they were in the desired order in the base64_images list) and append the rest of the images with the append_images argument. The resulting image has all the frames in one output file.
I assume this pattern is extensible to any image format that supports the save_all and append_images keyword arguments. The Pillow documentation should let you know if it is supported.
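To sanity-check the result, you can iterate over the pages of the output file with Pillow's ImageSequence, e.g.:

from PIL import Image, ImageSequence

with Image.open('output.tiff') as tif:
    for i, page in enumerate(ImageSequence.Iterator(tif)):
        print(i, page.size, page.mode)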

Extracting images from tfrecords files with protobuf without running a TensorFlow session

I'm using TensorFlow in Python and I have the data stored in TFRecords files containing tf.train.Example protocol buffers.
I'm trying to extract the fields stored in each example (in the code example below these are height, width, image), without the need to run a TensorFlow session. And by trial and error I found the following code to work OK:
import numpy as np
import tensorflow as tf

def _im_feature_to_im(example, key):
    feature_ser = example.features.feature[key].bytes_list.SerializeToString()
    feature_ser_clean = feature_ser[4:]  # remove 4 bytes from the start -- why?
    image = np.fromstring(feature_ser_clean, dtype=np.uint8).reshape((height, width))
    return image

for serialized_example in tf.python_io.tf_record_iterator(tfrec_filename):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)
    # traverse the Example format to get data
    height = example.features.feature['height'].int64_list.value[0]
    width = example.features.feature['width'].int64_list.value[0]
    image = _im_feature_to_im(example, 'image')
So:
int fields are extracted easily.
But my question is regarding the extraction of the image: why do I have to remove 4 bytes from the start of the bytes array in order to get the original image? Is there some header there?
That's the key for protocol buffer encoding.
https://developers.google.com/protocol-buffers/docs/encoding
You can print it out and follow the instructions at the above website to decode it. Most likely it's some encoding of tag = 1, type = 2, length = height * width.
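For example, a quick sketch of printing those bytes (Python 3, reusing example from the question's loop):

feature_ser = example.features.feature['image'].bytes_list.SerializeToString()
print(feature_ser[:4].hex())
# the first byte is 0x0a = (field number 1 << 3) | wire type 2 (length-delimited);
# the bytes after it are a varint holding the payload length (about height * width)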
Hope that helps!
Sherry
What you are doing in _im_feature_to_im() is encoding a message to a string by calling .SerializeToString(), and then decoding it by hand by removing the first 4 bytes (or, as you said in the comment, by removing all the bytes with the MSB set). This is just a redundant round trip.
Instead, you can get your image by accessing the value property:
image_string = example.features.feature[key].bytes_list.value[0]
Note that this is an array of one element, hence the [0] at the end.
You can then construct the array from this, like you did:
image_arr = np.frombuffer(image_string, dtype=np.uint8)
Now, many times the images are put in tfrecords with their encoded representation (e.g. PNG or JPG), because it takes significantly less space than the raw bytes. What this means is that you should then decode the image. Tensorflow has the decode_image(...) function for this, but it will return a tensor, and you want to do this without a TF session.
You can use OpenCV to decode the image representation without a TF Session:
import cv2
image = cv2.imdecode(image_arr, cv2.IMREAD_UNCHANGED)
assert image is not None, "Could not decode image"
