How to convert all bytes files under the folder into images - python

import numpy
from PIL import Image
import binascii
def getMatrixfrom_bin(filename, width):
    with open(filename, 'rb') as f:
        content = f.read()
    ...
    return fh
# Raw strings stop "\b" in the Windows path from being read as a backspace escape
filename = r"path\bin_filename(1)"
im = Image.fromarray(getMatrixfrom_bin(filename, 512))
# getMatrixfrom_bin() is a function that generates a matrix from the binary bytes
im.save(r"path\bin_filename(1).png")
The code above can only generate one picture at a time. I now need to convert all the binary files under the path to images. How should I do that?

If you are on a decent (i.e. Unix/Linux/macOS) platform, you can convert all your binary files to PNG images in parallel without writing any Python, if you use GNU Parallel and ImageMagick which are installed on most Linux distros and are available for macOS via homebrew.
So, the command to convert all files ending in .bin into PNG images, in parallel would be:
parallel 's=$(wc -c < {}); w=512; ((h=s/w)); convert -depth 8 -size ${w}x${h} gray:{} {.}.png' ::: *.bin
That is a bit scary if you are not accustomed to it, so I'll break it down. Basically it is running "some stuff" in parallel for all files ending in .bin, so look again and it is:
parallel 'some stuff' ::: *.bin
What is the "some stuff"? Well, note that {} is short-hand for the file we are currently processing, so it is doing this:
s=$(wc -c < {}) # s=total bytes in current file, i.e. s=filesize
w=512 # w=image width
((h=s/w)) # h=s/w, i.e. h=height in pixels of current file
convert ...
The last line, the one starting with convert, calls ImageMagick and tells it that your image depth is 8 bits and that the dimensions in pixels are ${w}x${h}. It then reads the current file as raw greyscale data (the gray: prefix) and saves it as a new image ending in .png instead of the original extension ({.} is the current filename with its extension removed). Easy!
Of course, if you knew the width was 500 pixels and the height was 400 pixels, life would be even easier:
parallel 'convert -depth 8 -size 500x400 gray:{} {.}.png' ::: *.bin
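If you would rather stay in Python, as the question asks, the same batch conversion could look roughly like the sketch below. It assumes every file is raw 8-bit greyscale with a fixed width of 512 pixels and drops any trailing bytes that do not fill a complete row. bytes_to_matrix and convert_folder are hypothetical names; bytes_to_matrix merely stands in for the question's getMatrixfrom_bin, whose body is not shown.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def bytes_to_matrix(path, width=512):
    """Read a file as raw 8-bit greyscale rows of `width` pixels (assumed layout)."""
    data = np.fromfile(path, dtype=np.uint8)
    height = len(data) // width            # same idea as h=s/w in the shell version
    return data[: height * width].reshape(height, width)


def convert_folder(folder, pattern="*.bin", width=512):
    """Save a PNG next to every matching file and return the PNG paths."""
    written = []
    for src in sorted(Path(folder).glob(pattern)):
        matrix = bytes_to_matrix(src, width)
        if matrix.size == 0:               # file shorter than one row, skip it
            continue
        dst = src.with_suffix(".png")
        Image.fromarray(matrix).save(dst)
        written.append(dst)
    return written
```

Calling convert_folder("path") would then process every .bin file in that folder; change the pattern argument if your files use a different name or no extension.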

Related

Is there a way of attaching an image to Python code in such a way that it becomes part of the source code?

I'm a beginner in python and I'm trying to send someone my small python program together with a picture that'll display when the code is run.
I tried to first convert the image to a binary file thinking that I'd be able to paste it in the source code but I'm not sure if that's even possible as I failed to successfully do it.
You can base64-encode your JPEG/PNG image, which will turn it into a regular (non-binary) string, like this:
base64 -w0 IMAGE.JPG
Then you want to get the result into a Python variable, so repeat the command but copy the output to your clipboard:
base64 -w0 IMAGE.JPG | xclip -selection clipboard # Linux
base64 -w0 IMAGE.JPG | pbcopy # macOS
Now start Python and make a variable called img and paste the clipboard into it:
img = 'PASTE'
It will look like this:
img = '/9j/4AAQSk...' # if your image was JPEG
img = 'iVBORw0KGg...' # if your image was PNG
Now do some imports:
from PIL import Image
import base64
import io
# Make PIL Image from base64 string
pilImage = Image.open(io.BytesIO(base64.b64decode(img)))
Now you can do what you like with your image:
# Print its description and size
print(pilImage)
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=200x100>
# Save it to local disk
pilImage.save('result.jpg')
You can save a picture in byte format inside a variable in your program. You can then convert the bytes back into a file-like object using the BytesIO function of the io module and plot that object using the Image module from the Pillow library.
import io
import PIL.Image
with open("filename.png", "rb") as file:
    img_binary = file.read()
img = PIL.Image.open(io.BytesIO(img_binary))
img.show()
To save the binary data inside your program without having to read from the source file, you need to encode it with something like base64, print() the encoded result, then copy the output into a new variable and remove the file-reading code.
That would look like this:
import base64
img_encoded = base64.encodebytes(img_binary)
print(img_encoded)
img_encoded = b" " # paste the output from the console into the variable
The output will be very long, especially if you are using a big image. I only used a very small PNG for testing.
This is what the program should look like at the end:
import io
import base64
import PIL.Image
# with open("filename.png", "rb") as file:
# img_binary = file.read()
# img_encoded = base64.encodebytes(img_binary)
img_encoded = b'iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABX[...]'
img = PIL.Image.open(io.BytesIO(base64.decodebytes(img_encoded)))
img.show()
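If copying from the console or clipboard is awkward, the line of source code can also be generated by a small helper. This is only a sketch: as_python_literal is a hypothetical name, and it uses base64.b64encode rather than encodebytes so that the literal contains no line breaks.

```python
import base64


def as_python_literal(data, name="img_encoded"):
    """Return one line of Python source that assigns `data` as a base64 bytes literal."""
    encoded = base64.b64encode(data).decode("ascii")   # no line breaks, unlike encodebytes
    return f"{name} = b'{encoded}'"


# Any bytes will do for a demonstration; in practice read the image file in 'rb' mode.
sample = b"\x89PNG\r\n\x1a\n not a real image"
line = as_python_literal(sample)
print(line)

# Executing the generated line and decoding it gives back the original bytes.
namespace = {}
exec(line, namespace)
restored = base64.b64decode(namespace["img_encoded"])
```

The printed line can be pasted straight into the program that will later decode it.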
You could perhaps have your Python program download the image from a site where you upload files, such as Google Drive, Mega, or Imgur. That way, you can always access and view the image easily without running the program or converting the binary back into an image as in the method you mentioned.
Otherwise, you could store the image as bytes in a variable and have your program read this variable. I'm assuming this is what you really want, since it is easier to distribute: there is only one file to download and run.
Or you could take a look at pyinstaller, which packages a Python program as an executable (.exe) file so that it can be distributed to machines without Python installed. That way you can bundle the image file with the program. There are plenty of tutorials for pyinstaller online. Note: pass the '--onefile' option when running pyinstaller, as this packages everything into a single executable that the recipient can easily open, provided it can run on their operating system. :)

Convert .IMG (Classic Disk Image) to .PNG/.JPG in Python

I have a dataset of 100,000+ .IMG files that I need to convert to .PNG / .JPG format in order to apply a CNN for a simple classification task.
I referred to this answer and the solution works for me partially. What I mean is that some images are not properly converted. The reason for that, according to my understanding is that some images have a Pixel Depth of 16 while some have 8.
for file in fileList:
    rawData = open(file, 'rb').read()
    size = re.search("(LINES = \d\d\d\d)|(LINES = \d\d\d)", str(rawData))
    pixelDepth = re.search("(SAMPLE_BITS = \d\d)|(SAMPLE_BITS = \d)", str(rawData))
    size = (str(size)[-6:-2])
    pixelDepth = (str(pixelDepth)[-4:-2])
    print(int(size))
    print(int(pixelDepth))
    imgSize = (int(size), int(size))
    img = Image.frombytes('L', imgSize, rawData)
    img.save(str(file)+'.jpg')
Data Source: NASA Messenger Mission
.IMG files and their corresponding converted .JPG Files
Files with Pixel Depth of 8 are successfully converted:
Files with Pixel Depth of 16 are NOT properly converted:
Please let me know if there's any more information that I should provide.
Hopefully, from my other answer, here, you now have a better understanding of how your files are formatted. So, the code should look something like this:
#!/usr/bin/env python3
import sys
import re
import numpy as np
from PIL import Image
import cv2
with open('EW0220137564B.IMG', 'rb') as f:
    rawData = f.read()
# File size in bytes
fs = len(rawData)
# Pull the image geometry out of the text label (bytes patterns, so no str() conversion is needed)
bitDepth = int(re.search(rb"SAMPLE_BITS\s+=\s+(\d+)", rawData).group(1))
bytespp = bitDepth // 8
height = int(re.search(rb"LINES\s+=\s+(\d+)", rawData).group(1))
width = int(re.search(rb"LINE_SAMPLES\s+=\s+(\d+)", rawData).group(1))
print(bitDepth, height, width)
# Offset from start of file to image data - assumes image at tail end of file
offset = fs - (width*height*bytespp)
# Check bitDepth
if bitDepth == 8:
    na = np.frombuffer(rawData, offset=offset, dtype=np.uint8).reshape(height,width)
elif bitDepth == 16:
    # 16-bit samples are read as big-endian
    dt = np.dtype(np.uint16)
    dt = dt.newbyteorder('>')
    na = np.frombuffer(rawData, offset=offset, dtype=dt).reshape(height,width).astype(np.uint8)
else:
    # Stop here, otherwise na would be undefined below
    sys.exit(f'ERROR: Unexpected bit depth: {bitDepth}')
# Save either with PIL
Image.fromarray(na).save('result.jpg')
# Or with OpenCV may be faster
cv2.imwrite('result.jpg', na)
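One thing to be aware of in the 16-bit branch: .astype(np.uint8) keeps only the low 8 bits of each sample, so values above 255 wrap around. If the converted 16-bit images look wrong, a linear stretch to 0..255 is one possible alternative. The helper below is only a sketch with a hypothetical name (to_uint8), and min/max stretching is an assumption about what looks best for your data.

```python
import numpy as np


def to_uint8(na16):
    """Linearly stretch an integer array to the full 0..255 range (one possible choice)."""
    lo, hi = int(na16.min()), int(na16.max())
    if hi == lo:                           # flat image: avoid dividing by zero
        return np.zeros(na16.shape, dtype=np.uint8)
    scaled = (na16.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return scaled.round().astype(np.uint8)


sample = np.array([[0, 256], [1000, 4095]], dtype=">u2")
wrapped = sample.astype(np.uint8)          # 256 -> 0, 1000 -> 232, 4095 -> 255
stretched = to_uint8(sample)               # 256 -> 16, 1000 -> 62, 4095 -> 255
print(wrapped)
print(stretched)
```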
If you have thousands to do, I would recommend GNU Parallel which you can easily install on your Mac with homebrew using:
brew install parallel
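You can then change my program above to accept a filename as a parameter in place of the hard-coded filename. The command to get them all done in parallel is then as follows (the --dry-run option only prints the commands that would be run, so remove it once they look right):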
parallel --dry-run script.py {} ::: *.IMG
For a bit more effort, you can get it done even faster by putting the code above in a function and calling the function for each file specified as a parameter. That way you can avoid starting a new Python interpreter per image and tell GNU Parallel to pass as many files as possible to each invocation of your script like this:
parallel -X --dry-run script.py ::: *.IMG
The structure of the script then looks like this:
import sys
def processOne(filename):
    # open, read, search, extract, save as per my code above
    ...
# Main - process all filenames received as parameters
for filename in sys.argv[1:]:
    processOne(filename)
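The "search, extract" part of that function could be factored out along these lines. This is a sketch that assumes the label is ASCII text of the form KEY = number near the start of the file; read_label_ints is a hypothetical helper name, and the label below is synthetic rather than taken from a real .IMG file.

```python
import re


def read_label_ints(raw, keys=("SAMPLE_BITS", "LINES", "LINE_SAMPLES")):
    """Return {key: int} for each `KEY = number` entry found in the raw bytes."""
    values = {}
    for key in keys:
        # \b stops a key from matching inside a longer keyword
        match = re.search(rb"\b" + key.encode("ascii") + rb"\s+=\s+(\d+)", raw)
        if match is None:
            raise ValueError(f"{key} not found in label")
        values[key] = int(match.group(1))
    return values


# Synthetic label followed by fake pixel data, just to exercise the function.
fake = (b"LINES            = 4\r\n"
        b"LINE_SAMPLES     = 2\r\n"
        b"SAMPLE_BITS      = 16\r\n"
        b"END\r\n") + bytes(16)
label = read_label_ints(fake)
# Image data is assumed to sit at the tail end of the file, as in the code above.
offset = len(fake) - (label["LINES"] * label["LINE_SAMPLES"] * label["SAMPLE_BITS"] // 8)
```

Raising an error for a missing key means one malformed file is reported clearly instead of failing later with an unrelated message.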

Converting raw bytes to image

I have a file that contains a 240x320 image, but it is in raw byte format. I opened it in a hex editor and got something like an array of 16 columns by 4800 rows.
I'm completely new to this, which is why I'm having trouble. I tried a Python script, but it gave an error on line 17, in data = columnvector[0][i]:
IndexError: list index out of range.
I also tried some Java code, but that gave an error as well. I wanted to try some C# code, but none of the examples I found explain how to feed my file to the code. This is the Python code:
import csv
import sys
import binascii
csv.field_size_limit(500 * 1024 * 1024)
columnvector = []
with open('T1.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile,delimiter=' ', quotechar='|')
    for row in csvreader:
        columnvector.append(row)
headers =['42','4D','36','84','03','00','00','00','00','00','36','00','00','00','28','00','00','00',
          '40','01','00','00','F0','00','00','00','01','00','18','00','00','00','00','00','00','84','03','00','C5','00',
          '00','00','C5','00','00','00','00','00','00','00','00','00','00','00']
hexArray=[]
for i in range(0,76800):
    data = columnvector[0][i]
    hexArray.extend([data,data,data])
with open('T1.txt', 'wb') as f:
    f.write(binascii.unhexlify(''.join(headers)))
    f.write(binascii.unhexlify(''.join(hexArray)))
I want to convert the file to an image using any method. I honestly don't care which method, as long as it gets the job done.
These are some of the files:
https://github.com/Mu-A/OV7670-files/tree/Help
You can make the binary data into images without writing any Python, just use ImageMagick in the Terminal. It is included in most Linux distros and is available for macOS and Windows.
If your image is 320x240 it should be:
320 * 240 bytes long if single channel (greyscale), or
320 * 240 * 3 if 3-channel RGB.
As your images are 76800, I am assuming they are greyscale.
So, in Terminal, to make that raw data into a JPEG, use:
magick -depth 8 -size 320x240 gray:T1 result.jpg
or, if you are using version 6 of ImageMagick, use:
convert -depth 8 -size 320x240 gray:T1 result.jpg
If you want a PNG with automatic contrast-stretch, use:
magick -depth 8 -size 320x240 gray:T1 -auto-level result.png
Unfortunately none of your images come out to anything sensible. Here is T1, for example:
The histograms do look somewhat sensible though:
I think you have something fundamentally wrong so I would try reverting to first principles to debug it. I would shine a torch into, or point the camera at a window and save a picture called bright.dat and then cover the lens with a black card and take another image called dark.dat. Then I would plot a histogram of the data and see if the bright one was at the rightmost end and the dark one was at the leftmost end. Make a histogram like this:
magick -depth 8 -size 320x240 Gray:bright.dat histogram:brightHist.png
and:
magick -depth 8 -size 320x240 Gray:dark.dat histogram:darkHist.png
for i in range(0,76800):
is a hardcoded value, and because columnvector[0][i] does not have that many values, you get that IndexError: list index out of range.
Consider why you need to set your range from 0-76800 or if the value can be dynamically sourced from len() of something.
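For instance, the loop bound could come from the data itself. This is a minimal sketch with a hypothetical helper name (repeat_each), operating on a list of hex strings like the one the question builds:

```python
def repeat_each(values, times=3):
    """Repeat every element `times` times, iterating only over what is actually there."""
    out = []
    for i in range(len(values)):           # bound taken from the data, not hardcoded
        out.extend([values[i]] * times)
    return out


hex_pixels = ["0A", "FF"]
expanded = repeat_each(hex_pixels)
print(expanded)
```

With this form a short input produces a short output rather than an IndexError, so it is still worth checking len(values) against the 76800 you expect.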
Another simple way to make an image from a binary file is to convert it to a NetPBM image.
As your file is 320x240 and 8-bit binary greyscale, you just need to make a header with that information in it and append your binary file:
printf "P5\n320 240\n255\n" > image.pgm
cat T1 >> image.pgm
You can now open image.pgm with feh, Photoshop, GIMP or many other image viewers.
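The same two steps can be done from Python with only the standard library. This is a sketch assuming the file really is 320x240 single-channel 8-bit data; raw_to_pgm is a hypothetical helper name.

```python
from pathlib import Path


def raw_to_pgm(raw_path, pgm_path, width=320, height=240):
    """Prefix raw 8-bit greyscale bytes with a binary PGM (P5) header."""
    pixels = Path(raw_path).read_bytes()
    if len(pixels) != width * height:
        raise ValueError(f"expected {width * height} bytes, got {len(pixels)}")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    Path(pgm_path).write_bytes(header + pixels)
```

For the file in the question this would be called as raw_to_pgm("T1", "image.pgm").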

Read .img medical image without header in python

I have a radiograph .img file without the header file. However, the researchers who have published the file have given this information about it
High resolution (2048 × 2048 matrix size, 0.175mm pixel size)
Wide density range (12-bit, 4096 gray scale)
Universal image format (no header, big-endian raw data)
I am trying to open the file using Python but am unable to do so. Could someone suggest a method to read this image file?
I found some radiograph images, like yours, by downloading the JSRT database. I have tested the following code on the first image of this database: JPCLN001.IMG.
import matplotlib.pyplot as plt
import numpy as np
# Parameters.
input_filename = "JPCLN001.IMG"
shape = (2048, 2048) # matrix size
dtype = np.dtype('>u2') # big-endian unsigned integer (16bit)
output_filename = "JPCLN001.PNG"
# Reading (np.fromfile accepts a filename directly, so no file handle is left open).
data = np.fromfile(input_filename, dtype)
image = data.reshape(shape)
# Display.
plt.imshow(image, cmap = "gray")
plt.savefig(output_filename)
plt.show()
It produces an output file JPCLN001.PNG which looks like this:
I hope this answers your question.
Happy coding!
Just in case anybody else is looking at these images and wants to convert them in batches, or outside of Python and without needing any programming knowledge... you can convert them pretty readily at the command line in Terminal with ImageMagick (which is installed on most Linux distros anyway, and available for OS X and Windows) like this:
convert -size 2048x2048 -depth 16 -endian MSB -normalize gray:JPCLN130.IMG -compress lzw result.tif
which makes them into compressed 16-bit TIF files that can be viewed in any application. They also then take up half the space on disk without loss of quality since I specified LZW compression.
Likewise, if you want 16-bit PNG files, you can use:
convert -size 2048x2048 -depth 16 -endian MSB -normalize gray:JPCLN130.IMG result.png
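Back in Python, a quick sanity check on the stated format can catch a wrong byte order early. The sketch below assumes the 12-bit values sit in the low bits of each big-endian 16-bit word, which the dataset description does not spell out; load_raw12 and to_8bit are hypothetical names.

```python
import numpy as np


def load_raw12(path, shape=(2048, 2048)):
    """Load headerless big-endian 16-bit data and check that it fits in 12 bits."""
    data = np.fromfile(path, dtype=">u2")
    if data.size != shape[0] * shape[1]:
        raise ValueError(f"expected {shape[0] * shape[1]} samples, got {data.size}")
    if data.max() > 4095:
        raise ValueError("values exceed 12 bits; check the byte order")
    return data.reshape(shape)


def to_8bit(image12):
    """Drop the 4 least significant bits so that 0..4095 maps onto 0..255."""
    return (image12 >> 4).astype(np.uint8)
```

The 8-bit version is only for quick viewing; keep the 16-bit array if you need the full density range.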

Compute hash of only the core image data (excluding metadata) for an image

I'm writing a script to calculate the MD5 sum of an image excluding the EXIF tag.
In order to do this accurately, I need to know where the EXIF tag is located in the file (beginning, middle, end) so that I can exclude it.
How can I determine where in the file the tag is located?
The images that I am scanning are in the format TIFF, JPG, PNG, BMP, DNG, CR2, NEF, and some videos MOV, AVI, and MPG.
It is much easier to use the Python Imaging Library to extract the picture data (example in iPython):
In [1]: from PIL import Image
In [2]: import hashlib
In [3]: im = Image.open('foo.jpg')
In [4]: hashlib.md5(im.tobytes()).hexdigest()
Out[4]: '171e2774b2549bbe0e18ed6dcafd04d5'
This works on any type of image that PIL can handle. The tobytes method returns a bytes object containing the pixel data.
BTW, the MD5 hash is now seen as pretty weak. Better to use SHA512:
In [6]: hashlib.sha512(im.tobytes()).hexdigest()
Out[6]: '6361f4a2722f221b277f81af508c9c1d0385d293a12958e2c56a57edf03da16f4e5b715582feef3db31200db67146a4b52ec3a8c445decfc2759975a98969c34'
On my machine, calculating the MD5 checksum for a 2500x1600 JPEG takes around 0.07 seconds. Using SHA512, it takes 0.10 seconds. Complete example:
#!/usr/bin/env python3
from PIL import Image
import hashlib
import sys
im = Image.open(sys.argv[1])
print(hashlib.sha512(im.tobytes()).hexdigest(), end="")
For movies, you can extract frames from them with e.g. ffmpeg, and then process them as shown above.
One simple way to do it is to hash the core image data. For PNG, you could do this by counting only the "critical chunks" (i.e. the ones starting with capital letters). JPEG has a similar but simpler file structure.
The visual hash in ImageMagick decompresses the image as it hashes it. In your case, you could hash the compressed image data right away, so (if implemented correctly) it should be just as quick as hashing the raw file.
This is a small Python script illustrating the idea. It may or may not work for you, but it should at least give an indication to what I mean :)
import struct
import os
import hashlib
def png(fh):
    md5 = hashlib.md5()
    # Bytes 1-3 of the 8-byte signature spell "PNG"
    assert fh.read(8)[1:4] == b"PNG"
    while True:
        try:
            length, = struct.unpack(">I", fh.read(4))
        except struct.error:
            # Fewer than 4 bytes left: end of file
            break
        if fh.read(4) == b"IDAT":
            md5.update(fh.read(length))
            fh.read(4) # CRC
        else:
            # Skip the chunk data and its CRC
            fh.seek(length+4, os.SEEK_CUR)
    print("Hash: %r" % md5.digest())
def jpeg(fh):
    md5 = hashlib.md5()
    assert fh.read(2) == b"\xff\xd8"
    while True:
        marker, length = struct.unpack(">2H", fh.read(4))
        assert marker & 0xff00 == 0xff00
        if marker == 0xFFDA: # Start of stream
            md5.update(fh.read())
            break
        else:
            # The length field counts itself, hence the -2
            fh.seek(length-2, os.SEEK_CUR)
    print("Hash: %r" % md5.digest())
if __name__ == '__main__':
    # Files must be opened in binary mode
    with open("sample.png", "rb") as fh:
        png(fh)
    with open("sample.jpg", "rb") as fh:
        jpeg(fh)
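To see the idea at work without needing sample files, the sketch below builds a tiny 1x1 greyscale PNG in memory, once with and once without a tEXt metadata chunk, and hashes only the IDAT payloads. The helper names chunk and idat_md5 are hypothetical, and a tEXt comment merely stands in for whatever metadata your files carry.

```python
import hashlib
import struct
import zlib


def chunk(kind, payload):
    """Build one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def idat_md5(png_bytes):
    """MD5 over the payloads of all IDAT chunks, ignoring every other chunk."""
    if png_bytes[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    digest = hashlib.md5()
    pos = 8
    while pos + 8 <= len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        kind = png_bytes[pos + 4:pos + 8]
        if kind == b"IDAT":
            digest.update(png_bytes[pos + 8:pos + 8 + length])
        pos += 12 + length          # 4 length + 4 type + payload + 4 CRC
    return digest.hexdigest()


signature = b"\x89PNG\r\n\x1a\n"
ihdr = chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
idat = chunk(b"IDAT", zlib.compress(b"\x00\x7f"))   # filter byte + one pixel
text = chunk(b"tEXt", b"Comment\x00hello")
iend = chunk(b"IEND", b"")
plain = signature + ihdr + idat + iend
tagged = signature + ihdr + text + idat + iend

same_pixels = idat_md5(plain) == idat_md5(tagged)
same_file = hashlib.md5(plain).hexdigest() == hashlib.md5(tagged).hexdigest()
print(same_pixels, same_file)
```

The two files differ as a whole, but the IDAT-only hash is identical. Note that this compares compressed data, so the same pixels saved with different compression settings would still hash differently.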
You can use stream which is part of the ImageMagick suite:
$ stream -map rgb -storage-type short image.tif - | sha256sum
d39463df1060efd4b5a755b09231dcbc3060e9b10c5ba5760c7dbcd441ddcd64 -
or
$ sha256sum <(stream -map rgb -storage-type short image.tif -)
d39463df1060efd4b5a755b09231dcbc3060e9b10c5ba5760c7dbcd441ddcd64 /dev/fd/63
This example is for a TIFF file which is RGB with 16 bits per sample (i.e. 48 bits per pixel). So I use map to rgb and a short storage-type (you can use char here if the RGB values are 8-bits).
This method reports the same signature hash that the verbose Imagemagick identify command reports:
$ identify -verbose image.tif | grep signature
signature: d39463df1060efd4b5a755b09231dcbc3060e9b10c5ba5760c7dbcd441ddcd64
(for ImageMagick v6.x; the hash reported by identify on version 7 is different to that obtained using stream, but the latter may be reproduced by any tool capable of extracting the raw bitmap data - such as dcraw for some image types.)
I would use a metadata stripper as a preprocessing step before hashing:
From ImageMagick package you have ...
mogrify -strip blah.jpg
and if you do
identify -list format
it apparently works with all the cited formats.
