How would I read a raw RGBA4444 image using Pillow? - python

I'm trying to read a '.waltex' image, which is a 'walaber image'. It's basically just in raw image format. The problem is, it uses 'RGBA8888', 'RGBA4444', 'RGB565' and 'RGB5551' (all of which can be determined from the header), and I could not find a way to use these color specs in PIL.
I've tried doing this
from PIL import Image
with open('Carl.waltex', 'rb') as file:
rawdata = file.read()
image = Image.frombytes('RGBA', (1024,1024), rawdata, 'raw', 'RGBA;4B')
image.show()
I've tried all the 16-bit raw modes in the last input, and I couldn't find a single one that worked. To be clear, this file is specifically 'RGBA4444' with little endian, 8 bytes per pixel.
If you need the file, then I can link it.

Updated Answer
I have made some changes to my original code so that:
you can pass a filename to read as parameter
it parses the header and checks the magic string and derives the format (RGBA8888 or RGBA4444) and height and width automatically
it now handles RGBA8888 like your newly-shared sample image
So, it looks like this:
#!/usr/bin/env python3
import struct
import sys
import numpy as np
from PIL import Image
def loadWaltex(filename):
# Open input file in binary mode
with open(filename, 'rb') as fd:
# Read 16 byte header and extract metadata
# https://zenhax.com/viewtopic.php?t=14164
header = fd.read(16)
magic, vers, fmt, w, h, _ = struct.unpack('4sBBHH6s', header)
if magic != b'WALT':
sys.exit(f'ERROR: {filename} does not start with "WALT" magic string')
# Check if fmt=0 (RGBA8888) or fmt=3 (RGBA4444)
if fmt == 0:
fmtdesc = "RGBA8888"
# Read remainder of file (part following header)
data = np.fromfile(fd, dtype=np.uint8)
R = data[0::4].reshape((h,w))
G = data[1::4].reshape((h,w))
B = data[2::4].reshape((h,w))
A = data[3::4].reshape((h,w))
# Stack the channels to make RGBA image
RGBA = np.dstack((R,G,B,A))
else:
fmtdesc = "RGBA4444"
# Read remainder of file (part following header)
data = np.fromfile(fd, dtype=np.uint16).reshape((h,w))
# Split the RGBA444 out from the uint16
R = (data>>12) & 0xf
G = (data>>8) & 0xf
B = (data>>4) & 0xf
A = data & 0xf
# Stack the channels to make RGBA image
RGBA = np.dstack((R,G,B,A)).astype(np.uint8) << 4
# Debug info for user
print(f'Filename: {filename}, version: {vers}, format: {fmtdesc} ({fmt}), w: {w}, h: {h}')
# Make into PIL Image
im = Image.fromarray(RGBA)
return im
if __name__ == "__main__":
# Load image specified by first parameter
im = loadWaltex(sys.argv[1])
im.save('result.png')
And when you run it with:
./decodeRGBA.py objects.waltex
You get:
The debug output for your two sample images is:
Filename: Carl.waltex, version: 1, format: RGBA4444 (3), w: 1024, h: 1024
Filename: objects.waltex, version: 1, format: RGBA8888 (0), w: 256, h: 1024
Original Answer
I find using Numpy is the easiest approach for this type of thing, and it is also highly performant:
#!/usr/bin/env python3
import numpy as np
from PIL import Image
# Define the known parameters of the image and read into Numpy array
h, w, offset = 1024, 1024, 16
data = np.fromfile('Carl.waltex', dtype=np.uint16, offset=offset).reshape((h,w))
# Split the RGBA4444 out from the uint16
R = (data >> 12) & 0xf
G = (data >> 8) & 0xf
B = (data >> 4) & 0xf
A = data & 0xf
# Stack the 4 individual channels to make an RGBA image
RGBA = np.dstack((R,G,B,A)).astype(np.uint8) << 4
# Make into PIL Image
im = Image.fromarray(RGBA)
im.save('result.png')
Note: Your image has 16 bytes of padding at the start. Sometimes that amount is variable. A useful technique in that case is to read the entire file, work out how many useful samples of pixel data there are (in your case 1024*1024), and then slice the data to take the last N samples - thereby ignoring any variable padding at the start. That would look like this:
# Define the known parameters of the image and read into Numpy array
h, w = 1024, 1024
data = np.fromfile('Carl.waltex', dtype=np.uint16)[-h*w:].reshape((h,w))
If you don't like Numpy and prefer messing about with lists and structs, you can get exactly the same result like this:
#!/usr/bin/env python3
import struct
from PIL import Image
# Define the known parameters of the image
h, w, offset = 1024, 1024, 16
data = open('Carl.waltex', 'rb').read()[offset:]
# Unpack into bunch of h*w unsigned shorts
uint16s = struct.unpack("H" * h *w, data)
# Build a list of RGBA tuples
pixels = []
for RGBA4444 in uint16s:
R = (RGBA4444 >> 8) & 0xf0
G = (RGBA4444 >> 4) & 0xf0
B = RGBA4444 & 0xf0
A = ( RGBA4444 & 0xf) << 4
pixels.append((R,G,B,A))
# Push the list of RGBA tuples into an empty image
RGBA = Image.new('RGBA', (w,h))
RGBA.putdata(pixels)
RGBA.save('result.png')
Note that the Numpy approach is 60x faster than the list-based approach:
Numpy: 3.6 ms ± 73.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
listy: 213 ms ± 712 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Note: These images and the waltex file format seem to be from the games "Where's My Water?" and "Where's My Perry?". I got some hints as to the header format from ZENHAX.

Now that I have a better idea of how your Waltex files work, I attempted to write a custom PIL Plugin for them - a new experience for me. I've put it as a different answer because the approach is very different.
You use it very simply like this:
from PIL import Image
import WaltexImagePlugin
im = Image.open('objects.waltex')
im.show()
You need to save the following as WaltexImagePlugin.py in the directory beside your main Python program:
from PIL import Image, ImageFile
import struct
def _accept(prefix):
return prefix[:4] == b"WALT"
class WaltexImageFile(ImageFile.ImageFile):
format = "Waltex"
format_description = "Waltex texture image"
def _open(self):
header = self.fp.read(HEADER_LENGTH)
magic, vers, fmt, w, h, _ = struct.unpack('4sBBHH6s', header)
# size in pixels (width, height)
self._size = w, h
# mode setting
self.mode = 'RGBA'
# Decoder
if fmt == 0:
# RGBA8888
# Just use built-in raw decoder
self.tile = [("raw", (0, 0) + self.size, HEADER_LENGTH, (self.mode,
0, 1))]
elif fmt == 3:
# RGBA4444
# Use raw decoder with custom RGBA;4B unpacker
self.tile = [("raw", (0, 0) + self.size, HEADER_LENGTH, ('RGBA;4B',
0, 1))]
Image.register_open(WaltexImageFile.format, WaltexImageFile, _accept)
Image.register_extensions(
WaltexImageFile.format,
[
".waltex"
],
)
HEADER_LENGTH = 16
It works perfectly for your RGBA888 images, but cannot quite handle the byte ordering of your RGBA444 file, so you need to reverse it for those images. I used this:
...
...
im = Image.open(...)
# Split channels and recombine in correct order
a, b, c, d = im.split()
im = Image.merge((c,d,a,b))
If anyone knows how to use something in the Unpack.c file to do this correctly, please ping me. Thank you.

Related

Vectorise nested for loops

I'm wondering if it is possible to vectorise the nested for loop in my code to speed up the code. I am basically trying to split (or trim) an image into smaller bits.
import numpy as np
import os
from PIL import Image
image_path = 'sample.png'
number_of_rows = 4
input_np_arr_image = np.asarray(Image.open(image_path))
height, width = input_np_arr_image.shape[0:2]
height_trimmed = (height // number_of_rows) * number_of_rows
width_trimmed = (width // number_of_rows) * number_of_rows
trimmed_image = input_np_arr_image[:height_trimmed, :width_trimmed]
image_pieces = [np.hsplit(segment, 4)
for segment in np.vsplit(trimmed_image, 4)
]
I then try to save the result using this nested for loop
for row in range(len(image_pieces)):
for col in range(len(image_pieces[row])):
# create output image name
segment_name = f"{image_path[:len(image_path)-4]}_{row}_{col}{image_path[-4:]}"
# convert array to image
label_image = Image.fromarray(image_pieces[row][col])
# make output directory for saving
output_directory = "split_images"
os.makedirs(output_directory, exist_ok = True)
label_image.save(output_directory +"/"+ segment_name)
I'm curious if it's possible to do this without using the nested for loops. Thanks

Why converting npy files (containing video frames) to tfrecords consumes too much disk space?

I am working on a violence detection service. I am trying to develop software based on the code in this repo. My dataset consists of videos resided in two directories "Violence" and "Non-Violence".
I used this code to generate npy files out of RGB channels and optical flow features. The output of this part would be 2 folders containing npy array with 244x244x5 shape. (np.float32 dtype). so it's like I have video frames in RGB in the first 3 channels (npy[...,:3]) and optical flow features in the next two channels (npy[..., 3:]).
Now I am trying to convert them to tfrecords and use tf.data.tfrecorddataset to speed up the training process. Since my model input has to be a cube tensor, my training elements has to be 64 frames of each video. It means the data point shape has to be 64x244x244x5.
So I used this code to convert the npy files to tfrecords.
from pathlib import Path
from os.path import join
import tensorflow as tf
import numpy as np
import cv2
from tqdm import tqdm
def normalize(data):
mean = np.mean(data)
std = np.std(data)
return (data - mean) / std
def random_flip(video, prob):
s = np.random.rand()
if s < prob:
video = np.flip(m=video, axis=2)
return video
def color_jitter(video):
# range of s-component: 0-1
# range of v component: 0-255
s_jitter = np.random.uniform(-0.2, 0.2)
v_jitter = np.random.uniform(-30, 30)
for i in range(len(video)):
hsv = cv2.cvtColor(video[i], cv2.COLOR_RGB2HSV)
s = hsv[..., 1] + s_jitter
v = hsv[..., 2] + v_jitter
s[s < 0] = 0
s[s > 1] = 1
v[v < 0] = 0
v[v > 255] = 255
hsv[..., 1] = s
hsv[..., 2] = v
video[i] = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
return video
def uniform_sample(video: str, target_frames: int = 64) -> np.ndarray:
"""
gets video and outputs n_frames number of frames in video.
Args:
video:
target_frames:
Returns:
"""
len_frames = int(len(data))
interval = int(np.ceil(len_frames / target_frames))
# init empty list for sampled video and
sampled_video = []
for i in range(0, len_frames, interval):
sampled_video.append(video[i])
# calculate number of padded frames and fix it
num_pad = target_frames - len(sampled_video)
if num_pad > 0:
padding = [video[i] for i in range(-num_pad, 0)]
sampled_video += padding
return np.array(sampled_video, dtype=np.float32)
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
if __name__ == '__main__':
path = Path('transformed/')
npy_files = list(path.rglob('*.npy'))[:100]
aug = True
# one_hots = to_categorical(range(2), dtype=np.int8)
path_to_save = 'data_tfrecords'
tfrecord_path = join(path_to_save, 'all_data.tfrecord')
with tf.io.TFRecordWriter(tfrecord_path) as writer:
for file in tqdm(npy_files, desc='files converted'):
# load npy files
npy = np.load(file.as_posix(), mmap_mode='r')
data = np.float32(npy)
del npy
# Uniform sampling
data = uniform_sample(data, target_frames=64)
# Add augmentation
if aug:
data[..., :3] = color_jitter(data[..., :3])
data = random_flip(data, prob=0.5)
# Normalization
data[..., :3] = normalize(data[..., :3])
data[..., 3:] = normalize(data[..., 3:])
# Label one hot encoding
label = 1 if file.parent.stem.startswith('F') else 0
# label = one_hots[label]
feature = {'image': _bytes_feature(tf.compat.as_bytes(data.tobytes())),
'label': _int64_feature(int(label))}
example = tf.train.Example(features=tf.train.Features(feature=feature))
writer.write(example.SerializeToString())
The code works fine, but the real problem is that it consumes too much disk drive. my whole dataset consisting of 2000 videos takes 12 GB, when I converted them to npy files, it became around 80 GB, and now using tfrecords It became over 120 GB or so. How can I convert them in an efficient way to reduce the space required to store them?
The answer might be too late. But I see you are still saving the video frame in your tfrecords file.
Try removing the "image" feature from your features list. And saving per frame as their Height, Width, Channels, and so forth.
feature = {'label': _int64_feature(int(label))}
Which is why the file is taking more space.

Using a Data Converter to Display 3D Volume as Images

I would like to write a data converter tool. I need analyze the bitstream in a file to display the 2D cross-sections of a 3D volume.
The dataset I am trying to view can be found here: https://figshare.com/articles/SSOCT_test_dataset_for_OCTproZ/12356705.
It's the file titled: burned_wood_with_tape_1664x512x256_12bit.raw (832 MB)
Would extremely appreciate some direction. Willing to award a bounty if I could get some software to display the dataset as images using a data conversion.
As I'm totally new to this concept, I don't have code to show for this problem. However, here's a little something I tried using inspiration from other questions on SO:
import rawpy
import imageio
path = "Datasets/burned_wood_with_tape_1664x512x256_12bit.raw"
for item in path:
item_path = path + item
raw = rawpy.imread(item_path)
rgb = raw.postprocess()
rawpy.imshow(rgb)
Down below I implemented next visualization.
Example RAW file burned_wood_with_tape_1664x512x256_12bit.raw consists of 1664 samples per A-scan, 512 A-scans per B-scan, 16 B-scans per buffer, 16 buffers per volume, and 2 volumes in this file, each sample is encoded as 2-bytes unsigned integer in little endian order, only 12 higher bits are used, lower 4 bits contain zeros. Samples are centered approximately around 2^15, to be precise data has these stats min 0 max 47648 mean 32757 standard deviation 454.5.
I draw gray images of size 1664 x 512, there are total 16 * 16 * 2 = 512 such images (frames) in a file. I draw animated frames on screen using matplotlib library, also rendering these animation into GIF file. One example of rendered GIF at reduced quality is located after code.
To render/draw images of different resulting resolution you need to change code line with plt.rcParams['figure.figsize'], this fig size contains (widht_in_inches, height_in_inches), by default DPI (dots per inch) equals to 100, meaning that if you want to have resulting GIF of resolution 720x265 then you need to set this figure size to (7.2, 2.65). Also resulting GIF contains animation of a bit smaller resolution because axes and padding is included into resulting figure size.
My next code needs pip modules to be installed one time by command python -m pip install numpy matplotlib.
Try it online!
# Needs: python -m pip install numpy matplotlib
def oct_show(file, *, begin = 0, end = None):
import os, numpy as np, matplotlib, matplotlib.pyplot as plt, matplotlib.animation
plt.rcParams['figure.figsize'] = (7.2, 2.65) # (4.8, 1.75) (7.2, 2.65) (9.6, 3.5)
sizeX, sizeY, cnt, bits = 1664, 512, 16 * 16 * 2, 12
stepX, stepY = 16, 8
fps = 5
try:
fsize, opened_here = None, False
if type(file) is str:
fsize = os.path.getsize(file)
file, opened_here = open(file, 'rb'), True
by = (bits + 7) // 8
if end is None and fsize is not None:
end = fsize // (sizeX * sizeY * by)
imgs = []
file.seek(begin * sizeY * sizeX * by)
a = file.read((end - begin) * sizeY * sizeX * by)
a = np.frombuffer(a, dtype = np.uint16)
a = a.reshape(end - begin, sizeY, sizeX)
amin, amax, amean, stdd = np.amin(a), np.amax(a), np.mean(a), np.std(a)
print('min', amin, 'max', amax, 'mean', round(amean, 1), 'std_dev', round(stdd, 3))
a = (a.astype(np.float32) - amean) / stdd
a = np.maximum(0.1, np.minimum(a * 128 + 128.5, 255.1)).astype(np.uint8)
a = a[:, :, :, None].repeat(3, axis = -1)
fig, ax = plt.subplots()
plt.subplots_adjust(left = 0.08, right = 0.99, bottom = 0.06, top = 0.97)
for i in range(a.shape[0]):
title = ax.text(
0.5, 1.02, f'Frame {i}',
size = plt.rcParams['axes.titlesize'],
ha = 'center', transform = ax.transAxes,
)
imgs.append([ax.imshow(a[i], interpolation = 'antialiased'), title])
ani = matplotlib.animation.ArtistAnimation(plt.gcf(), imgs, interval = 1000 // fps)
print('Saving animated frames to GIF...', flush = True)
ani.save(file.name + '.gif', writer = 'imagemagick', fps = fps)
print('Showing animated frames on screen...', flush = True)
plt.show()
finally:
if opened_here:
file.close()
oct_show('burned_wood_with_tape_1664x512x256_12bit.raw')
Example output GIF:
I don't think it's a valid RAW file at all.
If you try this code:
import rawpy
import imageio
path = 'Datasets/burned_wood_with_tape_1664x512x256_12bit.raw'
raw = rawpy.imread(path)
rgb = raw.postprocess()
You will get a following error:
----> 5 raw = rawpy.imread(path)
6 rgb = raw.postprocess()
~\Anaconda3\envs\py37tf2gpu\lib\site-packages\rawpy\__init__.py in imread(pathOrFile)
18 d.open_buffer(pathOrFile)
19 else:
---> 20 d.open_file(pathOrFile)
21 return d
rawpy\_rawpy.pyx in rawpy._rawpy.RawPy.open_file()
rawpy\_rawpy.pyx in rawpy._rawpy.RawPy.handle_error()
LibRawFileUnsupportedError: b'Unsupported file format or not RAW file'

convert numpy array to uint8 using python

My code below is intended to get a batch of images and convert them to RGB. But I keep getting an error which says to convert to type uint8. I have seen other questions regarding the conversion to uint8, but none directly from an array to uint8. Any advice on how to make that happen is welcome, thank you!
from skimage import io
import numpy as np
import glob, os
from tkinter import Tk
from tkinter.filedialog import askdirectory
import cv2
# wavelength in microns
MWIR = 4.5
R = .692
G = .582
B = .140
rgb_sum = R + G + B;
NRed = R/rgb_sum;
NGreen = G/rgb_sum;
NBlue = B/rgb_sum;
path = askdirectory(title='Select PNG Folder') # shows dialog box and return the path
outpath = askdirectory(title='Select SAVE Folder')
for file in os.listdir(path):
if file.endswith(".png"):
imIn = io.imread(os.path.join(path, file))
imOut = np.zeros(imIn.shape)
for i in range(imIn.shape[0]): # Assuming Rayleigh-Jeans law
for j in range(imIn.shape[1]):
imOut[i,j,0] = imIn[i,j,0]/((NRed/MWIR)**4)
imOut[i,j,1] = imIn[i,j,0]/((NGreen/MWIR)**4)
imOut[i,j,2] = imIn[i,j,0]/((NBlue/MWIR)**4)
io.imsave(os.path.join(outpath, file) + '_RGB.png', imOut)
the code I am trying to integrate into my own (found in another thread, used to convert type to uint8) is:
info = np.iinfo(data.dtype) # Get the information of the incoming image type
data = data.astype(np.float64) / info.max # normalize the data to 0 - 1
data = 255 * data # Now scale by 255
img = data.astype(np.uint8)
cv2.imshow("Window", img)
thank you!
Normally imInt is of type uint8, after your normalisation it is of type float32 because of the casting cause by the division. you must convert back to uint8 before saving to PNG file:
io.imsave(os.path.join(outpath, file) + '_RGB.png', imOut.astype(np.uint8))
Note that the two loops are not necessary, you can use numpy vector operations instead:
MWIR = 4.5
R = .692
G = .582
B = .140
vector = [R, G, B]
vector = vector / vector.sum()
vector = vector / MWIR
vector = np.pow(vector, 4)
for file in os.listdir(path):
if file.endswith((".png"):
imgIn = ...
imgOut = imgIn * vector
io.imsave(
os.path.join(outpath, file) + '_RGB.png',
imgOut.astype(np.uint8))

Read PFM format in python

I want to read pfm format images in python. I tried with imageio.read but it is throwing an error. Can I have any suggestion, please?
img = imageio.imread('image.pfm')
The following Python 3 implementation will decode .pfm files.
Download the example memorial.pfm from Paul Devebec's page.
from pathlib import Path
import numpy as np
import struct
def read_pfm(filename):
with Path(filename).open('rb') as pfm_file:
line1, line2, line3 = (pfm_file.readline().decode('latin-1').strip() for _ in range(3))
assert line1 in ('PF', 'Pf')
channels = 3 if "PF" in line1 else 1
width, height = (int(s) for s in line2.split())
scale_endianess = float(line3)
bigendian = scale_endianess > 0
scale = abs(scale_endianess)
buffer = pfm_file.read()
samples = width * height * channels
assert len(buffer) == samples * 4
fmt = f'{"<>"[bigendian]}{samples}f'
decoded = struct.unpack(fmt, buffer)
shape = (height, width, 3) if channels == 3 else (height, width)
return np.flipud(np.reshape(decoded, shape)) * scale
import matplotlib.pyplot as plt
image = read_pfm('memorial.pfm')
plt.imshow(image)
plt.show()
I am not at all familiar with Python, but here are a few suggestions on reading a PFM (Portable Float Map) file.
Option 1
The ImageIO documentation here suggests there is a FreeImage reader you can download and use.
Option 2
I pieced together a simple reader myself below that seems to work fine on a few sample images I found around the 'net and generated with ImageMagick. It may contain inefficiencies or bad practices because I do not speak Python.
#!/usr/local/bin/python3
import sys
import re
from struct import *
# Enable/disable debug output
debug = True
with open("image.pfm","rb") as f:
# Line 1: PF=>RGB (3 channels), Pf=>Greyscale (1 channel)
type=f.readline().decode('latin-1')
if "PF" in type:
channels=3
elif "Pf" in type:
channels=1
else:
print("ERROR: Not a valid PFM file",file=sys.stderr)
sys.exit(1)
if(debug):
print("DEBUG: channels={0}".format(channels))
# Line 2: width height
line=f.readline().decode('latin-1')
width,height=re.findall('\d+',line)
width=int(width)
height=int(height)
if(debug):
print("DEBUG: width={0}, height={1}".format(width,height))
# Line 3: +ve number means big endian, negative means little endian
line=f.readline().decode('latin-1')
BigEndian=True
if "-" in line:
BigEndian=False
if(debug):
print("DEBUG: BigEndian={0}".format(BigEndian))
# Slurp all binary data
samples = width*height*channels;
buffer = f.read(samples*4)
# Unpack floats with appropriate endianness
if BigEndian:
fmt=">"
else:
fmt="<"
fmt= fmt + str(samples) + "f"
img = unpack(fmt,buffer)
Option 3
If you cannot read your PFM files in Python, you could convert them at the command line using ImageMagick to another format, such as TIFF, that can store floating point samples. ImageMagick is installed on most Linux distros and is available for macOS and Windows:
magick input.pfm output.tif

Categories