Python Image Library can't convert to pdf - python

I have a function that converts a BMP to a PDF with Pillow. I run this script both uncompiled and compiled (.exe). The uncompiled version works correctly, but in the compiled one Pillow throws an exception ('PDF'). Specifically, it fails in .save().
The paths and the filename with extension are correct.
from os import remove
from PIL import Image
def bmp2pdf(self, file):
    ''' Convert a bmp file to PDF file, and delete old bmp file '''
    img = Image.open(file)
    output = file.replace('.bmp', '.pdf')
    try:
        img.save(output, "PDF", resolution=100.0)
        remove(file)
    except Exception as err:
        print(err)
In the compiled version the output is:
'PDF'
Thx.

Try this code. It works for me, and the body is only a couple of lines.
from PIL import Image
def bmp2pdf(self, path):
    img = Image.open(path)
    img.save('image.pdf', 'pdf')
I got a file named image.pdf with the image in it.

In the setup script that generates the .exe, I had to include PIL rather than PIL.Image, so that the whole package is loaded and the PDF feature is available.
I'm using cx_Freeze:
'Packages': [
    'PyQt5.uic',
    "...",
    'PIL',
]
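The bare 'PDF' printed by the compiled build is what a KeyError looks like when only the message is printed, which fits a format lookup failing because the PDF plugin was not bundled. A minimal sketch of logging the exception type as well; `describe_error` and the `save_handlers` dict are hypothetical stand-ins for illustration, not Pillow's actual code:

```python
def describe_error(err: Exception) -> str:
    """Return the exception type together with its message."""
    return f"{type(err).__name__}: {err}"

# Hypothetical registry standing in for a table of save plugins
# in which the PDF writer was never registered.
save_handlers = {"BMP": "bmp-writer", "PNG": "png-writer"}

try:
    handler = save_handlers["PDF"]
except Exception as err:
    bare_message = str(err)             # what print(err) shows
    full_message = describe_error(err)  # includes the exception class

print(bare_message)
print(full_message)
```

Printing the type (or using traceback.print_exc()) in the except block would have made the cause easier to spot in the frozen build.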

Related

cv2.imread file with accent (unicode)

I am trying to load the following file: 'data/chapter_1/capd_yard_signs\\Dueñas_2020.png'
But when I do so, cv2.imread returns an error: imread_('data/chapter_1/capd_yard_signs\Due├▒as_2020.png'): can't open/read file: check file path/integrity load file
When I built the file name with os.path.join, I tried encoding and decoding the file name:
f = os.path.join("data/chapter_1/capd_yard_signs", filename.encode().decode())
But that didn't solve the problem.
What am I missing?
This is how I ended up getting it to work:
import numpy as np
from PIL import Image
pil = Image.open(f).convert('RGB') # load the image with pillow and make sure it is in RGB
pilCv = np.array(pil) # convert the image to an array
img = pilCv[:,:,::-1].copy() # convert the array to be in BGR
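On the attempted fix in the question: encoding and then decoding with the same codec gives back the identical string, so it cannot change what cv2.imread receives. The garbled name in the error message is consistent with the UTF-8 bytes of "ñ" being displayed through code page 437. A small sketch; the cp437 console is an assumption, not something the question states:

```python
name = "Dueñas_2020.png"

# Encoding and immediately decoding with the same codec is a no-op.
round_tripped = name.encode().decode()

# The UTF-8 bytes of "ñ" (0xC3 0xB1) read as code page 437 become two
# box-drawing characters, matching the name shown in the error message.
garbled = name.encode("utf-8").decode("cp437")

print(round_tripped == name)
print(garbled == "Due├▒as_2020.png")
```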

How can I pass the image itself (np.array), not its path, to the zxing library to decode PDF417

Code:
import zxing
from PIL import Image
reader = zxing.BarCodeReader()
path = 'C:/Users/UI UX/Desktop/Uasa.png'
im = Image.open(path)
barcode = reader.decode(path)
print(barcode)
When I use the code above, it works fine and returns the result:
BarCode(raw='P<E....
I need to use this code instead:
import zxing
import cv2
reader = zxing.BarCodeReader()
path = 'C:/Users/UI UX/Desktop/Uasa.png'
img = cv2.imread (path)
cv2.imshow('img', img)
cv2.waitKey(0)
barcode = reader.decode(img)
print(barcode)
But this code returns an error:
TypeError: expected str, bytes or os.PathLike object, not numpy.ndarray
In another program I have the image as base64; could that help here?
Can anybody help me with this?
ZXing does not support passing an image directly as it is using an external application to process the barcode image. If you're not locked into using the ZXing library for decoding PDF417 barcodes you can take a look at the PyPI package pdf417decoder.
If you're starting with a Numpy array like in your example then you have to convert it to a PIL image first.
import cv2
from pdf417decoder import PDF417Decoder
from PIL import Image
npimg = cv2.imread(path)
cv2.imshow('img', npimg)
cv2.waitKey(0)
img = Image.fromarray(npimg)
decoder = PDF417Decoder(img)
if (decoder.decode() > 0):
    print(decoder.barcode_data_index_to_string(0))
else:
    print("Failed to decode barcode.")
You cannot. If you look at the source code, you will see that it calls a Java app with the provided path (specifically com.google.zxing.client.j2se.CommandLineRunner).
If you need to pre-process your image, you will have to save it somewhere and pass that path to the library.
I fixed this by writing the image to disk and passing its path:
import os
import cv2
import zxing
path = os.getcwd()
# print(path)
# pdf_image is the image array produced earlier in the program
writeStatus = cv2.imwrite(os.path.join(path, 'test.jpg'), pdf_image)
if writeStatus is True:
    print("image written")
else:
    print("problem") # or raise exception, handle problem, etc.
sss = (os.path.join(path, 'test.jpg'))
# print(sss)
pp = sss.replace('\\', '/')
# print(pp)
reader = zxing.BarCodeReader()
barcode = reader.decode(pp)
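For the base64 case mentioned in the question, the same save-then-pass-a-path idea can be done with a temporary file that is cleaned up afterwards. A minimal sketch using only the standard library; `temp_image_path` is a hypothetical helper name, and it assumes the base64 string holds an already encoded image file (PNG, JPEG, ...):

```python
import base64
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_image_path(image_b64: str, suffix: str = ".png"):
    """Decode base64 image data to a temporary file and yield its path."""
    data = base64.b64decode(image_b64)
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        os.remove(path)  # delete the file once the caller is done with it
```

Usage would look like `with temp_image_path(b64_string) as p: barcode = reader.decode(p)`, where reader is the zxing.BarCodeReader from the snippets above.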
The zxing package is not recommended. It is just a command line tool to invoke Java ZXing libraries.
You should use zxing-cpp, which is a Python module built with ZXing C++ code. Here is the sample code:
import cv2
import zxingcpp
img = cv2.imread('myimage.png')
results = zxingcpp.read_barcodes(img)
for result in results:
    print("Found barcode:\n Text: '{}'\n Format: {}\n Position: {}"
          .format(result.text, result.format, result.position))
if len(results) == 0:
    print("Could not find any barcode.")

How to work with HEIC image file types in Python

The High Efficiency Image File (HEIF) format is the default when airdropping an image from an iPhone to an OS X device. I want to edit and modify these .HEIC files with Python.
I could modify phone settings to save as JPG by default but that doesn't really solve the problem of being able to work with the filetype from others. I still want to be able to process HEIC files for doing file conversion, extracting metadata, etc. (Example Use Case -- Geocoding)
Pillow
Here is the result of working with Python 3.7 and Pillow when trying to read a file of this type.
$ ipython
Python 3.7.0 (default, Oct 2 2018, 09:20:07)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from PIL import Image
In [2]: img = Image.open('IMG_2292.HEIC')
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-2-fe47106ce80b> in <module>
----> 1 img = Image.open('IMG_2292.HEIC')
~/.env/py3/lib/python3.7/site-packages/PIL/Image.py in open(fp, mode)
2685 warnings.warn(message)
2686 raise IOError("cannot identify image file %r"
-> 2687 % (filename if filename else fp))
2688
2689 #
OSError: cannot identify image file 'IMG_2292.HEIC'
It looks like support in python-pillow was requested (#2806) but there are licensing / patent issues preventing it there.
ImageMagick + Wand
It appears that ImageMagick may be an option. However, after running brew install imagemagick and pip install wand, I was unsuccessful.
$ ipython
Python 3.7.0 (default, Oct 2 2018, 09:20:07)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from wand.image import Image
In [2]: with Image(filename='img.jpg') as img:
...: print(img.size)
...:
(4032, 3024)
In [3]: with Image(filename='img.HEIC') as img:
...: print(img.size)
...:
---------------------------------------------------------------------------
MissingDelegateError Traceback (most recent call last)
<ipython-input-3-9d6f58c40f95> in <module>
----> 1 with Image(filename='ces2.HEIC') as img:
2 print(img.size)
3
~/.env/py3/lib/python3.7/site-packages/wand/image.py in __init__(self, image, blob, file, filename, format, width, height, depth, background, resolution, pseudo)
4603 self.read(blob=blob, resolution=resolution)
4604 elif filename is not None:
-> 4605 self.read(filename=filename, resolution=resolution)
4606 # clear the wand format, otherwise any subsequent call to
4607 # MagickGetImageBlob will silently change the image to this
~/.env/py3/lib/python3.7/site-packages/wand/image.py in read(self, file, filename, blob, resolution)
4894 r = library.MagickReadImage(self.wand, filename)
4895 if not r:
-> 4896 self.raise_exception()
4897
4898 def save(self, file=None, filename=None):
~/.env/py3/lib/python3.7/site-packages/wand/resource.py in raise_exception(self, stacklevel)
220 warnings.warn(e, stacklevel=stacklevel + 1)
221 elif isinstance(e, Exception):
--> 222 raise e
223
224 def __enter__(self):
MissingDelegateError: no decode delegate for this image format `HEIC' # error/constitute.c/ReadImage/556
Any other alternatives available to do a conversion programmatically?
Consider using PIL in conjunction with pillow-heif:
pip3 install pillow-heif
from PIL import Image
from pillow_heif import register_heif_opener
register_heif_opener()
image = Image.open('image.heic')
That said, I'm not aware of any licensing/patent issues that would prevent HEIF support in Pillow (see this or this). libheif is widely adopted and free to use, provided you do not bundle the HEIF decoder with a device and fulfill the requirements of the LGPLv3 license.
You guys should check out this library, it's a Python 3 wrapper to the libheif library, it should serve your purpose of file conversion, extracting metadata:
https://github.com/david-poirier-csn/pyheif
https://pypi.org/project/pyheif/
Example usage:
import io
import whatimage
import pyheif
from PIL import Image
def decodeImage(bytesIo):
    fmt = whatimage.identify_image(bytesIo)
    if fmt in ['heic', 'avif']:
        i = pyheif.read_heif(bytesIo)
        # Extract metadata etc
        for metadata in i.metadata or []:
            if metadata['type']=='Exif':
                pass  # do whatever
        # Convert to other file format like jpeg
        s = io.BytesIO()
        pi = Image.frombytes(
            mode=i.mode, size=i.size, data=i.data)
        pi.save(s, format="jpeg")
...
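The format check in that example works on the file's leading bytes rather than its extension. HEIF-family files start with an ISO base media "ftyp" box whose major brand names the format, so a rough check can be sketched with the standard library alone. This is an illustrative sketch, not whatimage's implementation; `sniff_heif` and the brand lists are assumptions and not exhaustive:

```python
import struct

# Brands commonly seen in HEIF/HEIC and AVIF files (not an exhaustive list).
HEIF_BRANDS = {b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1"}
AVIF_BRANDS = {b"avif", b"avis"}

def sniff_heif(header: bytes):
    """Return 'heic', 'avif' or None based on the leading ftyp box."""
    # Bytes 0-3 hold the box size, 4-7 the box type, 8-11 the major brand.
    if len(header) < 12 or header[4:8] != b"ftyp":
        return None
    major_brand = header[8:12]
    if major_brand in HEIF_BRANDS:
        return "heic"
    if major_brand in AVIF_BRANDS:
        return "avif"
    return None

# Synthetic 24-byte ftyp box, used here instead of a real file.
sample = struct.pack(">I", 24) + b"ftyp" + b"heic" + b"\x00" * 12
print(sniff_heif(sample))
```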
I was quite successful with the Wand package.
Install Wand:
https://docs.wand-py.org/en/0.6.4/
Code for conversion:
from wand.image import Image
import os
SourceFolder="K:/HeicFolder"
TargetFolder="K:/JpgFolder"
for file in os.listdir(SourceFolder):
    SourceFile=SourceFolder + "/" + file
    TargetFile=TargetFolder + "/" + file.replace(".HEIC",".JPG")
    img=Image(filename=SourceFile)
    img.format='jpg'
    img.save(filename=TargetFile)
    img.close()
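One caveat with the loop above: it tries to open every file in the folder, and the .replace(".HEIC", ".JPG") call leaves lowercase .heic names unchanged. A hedged sketch of building the source/target pairs with pathlib first; `jpg_targets` is a hypothetical helper, and the conversion itself would still be done by Wand as above:

```python
from pathlib import Path

def jpg_targets(source_folder, target_folder):
    """Map each .heic/.HEIC file in source_folder to a .jpg path in target_folder."""
    pairs = []
    for src in sorted(Path(source_folder).iterdir()):
        # Compare the suffix case-insensitively and skip everything else.
        if src.is_file() and src.suffix.lower() == ".heic":
            pairs.append((src, Path(target_folder) / (src.stem + ".jpg")))
    return pairs
```

Each (source, target) pair can then be passed to Image(filename=str(source)) and img.save(filename=str(target)).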
Here is another solution to convert HEIC to JPG while keeping the metadata intact. It is based on mara004's solution above; however, I was not able to extract the image's timestamp that way, so I had to add some code. Put the HEIC files in dir_of_interest before applying the function:
import os
from PIL import Image, ExifTags
from pillow_heif import register_heif_opener
from datetime import datetime
import piexif
import re
register_heif_opener()
def convert_heic_to_jpeg(dir_of_interest):
    filenames = os.listdir(dir_of_interest)
    filenames_matched = [re.search(r"\.HEIC$|\.heic$", filename) for filename in filenames]
    # Extract files of interest
    HEIC_files = []
    for index, filename in enumerate(filenames_matched):
        if filename:
            HEIC_files.append(filenames[index])
    # Convert files to jpg while keeping the timestamp
    for filename in HEIC_files:
        image = Image.open(dir_of_interest + "/" + filename)
        image_exif = image.getexif()
        if image_exif:
            # Make a map with tag names and grab the datetime
            exif = { ExifTags.TAGS[k]: v for k, v in image_exif.items() if k in ExifTags.TAGS and type(v) is not bytes }
            date = datetime.strptime(exif['DateTime'], '%Y:%m:%d %H:%M:%S')
            # Load exif data via piexif
            exif_dict = piexif.load(image.info["exif"])
            # Update exif data with orientation and datetime
            exif_dict["0th"][piexif.ImageIFD.DateTime] = date.strftime("%Y:%m:%d %H:%M:%S")
            exif_dict["0th"][piexif.ImageIFD.Orientation] = 1
            exif_bytes = piexif.dump(exif_dict)
            # Save image as jpeg
            image.save(dir_of_interest + "/" + os.path.splitext(filename)[0] + ".jpg", "jpeg", exif= exif_bytes)
        else:
            print(f"Unable to get exif data for {filename}")
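The timestamp handling above relies on the EXIF DateTime layout, which separates the date parts with colons instead of dashes. A small standalone sketch of that round trip; the helper names and the sample value are made up for illustration:

```python
from datetime import datetime

# EXIF stores DateTime as "YYYY:MM:DD HH:MM:SS".
EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"

def parse_exif_datetime(value: str) -> datetime:
    """Parse an EXIF DateTime string into a datetime object."""
    return datetime.strptime(value, EXIF_DATETIME_FORMAT)

def format_exif_datetime(moment: datetime) -> str:
    """Format a datetime back into the EXIF DateTime layout."""
    return moment.strftime(EXIF_DATETIME_FORMAT)

stamp = parse_exif_datetime("2018:10:02 09:20:07")
print(stamp.isoformat())  # 2018-10-02T09:20:07
```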
Adding to the answer by danial, I just had to modify the byte array slightly to get a valid data stream for further work. The first 6 bytes are 'Exif\x00\x00'; dropping these gives you a raw format that you can pipe into any image processing tool.
import io
import pyheif
import PIL.Image
import exifread
def read_heic(path: str):
    with open(path, 'rb') as file:
        image = pyheif.read_heif(file)
        for metadata in image.metadata or []:
            if metadata['type'] == 'Exif':
                fstream = io.BytesIO(metadata['data'][6:])
                # now just convert to jpeg
                pi = PIL.Image.open(fstream)
                pi.save("file.jpg", "JPEG")
                # or do EXIF processing with exifread
                fstream.seek(0)  # rewind, since PIL has already read from the stream
                tags = exifread.process_file(fstream)
At least this worked for me.
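The slice [6:] above assumes the payload always begins with that marker. A slightly more defensive sketch checks for the prefix before dropping it; `strip_exif_header` is a hypothetical helper and the sample bytes are synthetic:

```python
EXIF_HEADER = b"Exif\x00\x00"  # the 6-byte marker described above

def strip_exif_header(payload: bytes) -> bytes:
    """Drop the leading Exif marker if present, otherwise return payload unchanged."""
    if payload.startswith(EXIF_HEADER):
        return payload[len(EXIF_HEADER):]
    return payload

# Synthetic payload: the marker followed by a big-endian TIFF header.
raw = EXIF_HEADER + b"MM\x00*" + b"\x00\x00\x00\x08"
tiff = strip_exif_header(raw)
print(tiff[:2])
```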
You can use the pillow_heif library to read HEIF images in a way compatible with PIL.
The example below will import a HEIF picture and save it in png format.
from PIL import Image
import pillow_heif
heif_file = pillow_heif.read_heif("HEIC_file.HEIC")
image = Image.frombytes(
    heif_file.mode,
    heif_file.size,
    heif_file.data,
    "raw",
)
image.save("./picture_name.png", format="png")
This will get the EXIF data from the HEIC file:
import pyheif
import exifread
import io
heif_file = pyheif.read_heif("file.heic")
for metadata in heif_file.metadata or []:
    if metadata['type'] == 'Exif':
        fstream = io.BytesIO(metadata['data'][6:])
        exifdata = exifread.process_file(fstream,details=False)
        # example to get device model from heic file
        model = str(exifdata.get("Image Model"))
        print(model)
Example of working with HDR (10/12-bit) HEIF files using OpenCV and pillow-heif:
import numpy as np
import cv2
import pillow_heif
heif_file = pillow_heif.open_heif("images/rgb12.heif", convert_hdr_to_8bit=False)
heif_file.convert_to("BGRA;16" if heif_file.has_alpha else "BGR;16")
np_array = np.asarray(heif_file)
cv2.imwrite("rgb16.png", np_array)
The input file for this example can be a 10-bit or 12-bit file.
I am facing the exact same problem as you, wanting a CLI solution. Doing some further research, it seems ImageMagick requires the libheif delegate library. The libheif library itself seems to have some dependencies as well.
I have not had success in getting any of those to work either, but will continue trying. I suggest you check whether those dependencies are available in your configuration.
Simple solution after going over multiple responses from people.
Please install the whatimage, pyheif and Pillow (PIL) libraries before running this code.
[NOTE]: I used this command to install Pillow.
python3 -m pip install Pillow
Also, installing all these libraries was a lot easier on Linux. I recommend WSL on Windows.
Code:
import whatimage
import pyheif
from PIL import Image
import os
def decodeImage(bytesIo, index):
    with open(bytesIo, 'rb') as f:
        data = f.read()
        fmt = whatimage.identify_image(data)
        if fmt in ['heic', 'avif']:
            i = pyheif.read_heif(data)
            pi = Image.frombytes(mode=i.mode, size=i.size, data=i.data)
            pi.save("new" + str(index) + ".jpg", format="jpeg")
# For my use I had my python file inside the same folder as the heic files
source = "./"
for index,file in enumerate(os.listdir(source)):
    decodeImage(file, index)
It looks like there is a solution called heic-to-jpg, but I am not sure how it would work in Colab.
The first answer works, but since it just calls save with a BytesIO object as the argument, the JPEG only ends up in memory and no file is written. If you instead create a file object with open and pass that, it saves to that file, i.e.:
import whatimage
import pyheif
from PIL import Image
def decodeImage(bytesIo):
    fmt = whatimage.identify_image(bytesIo)
    if fmt in ['heic', 'avif']:
        i = pyheif.read_heif(bytesIo)
        # Convert to other file format like jpeg
        pi = Image.frombytes(
            mode=i.mode, size=i.size, data=i.data)
        # Open in binary mode: JPEG data is bytes, so text mode ('w') would fail
        with open('my-new-image.jpg', mode='wb') as s:
            pi.save(s, format="jpeg")
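If the image has already been saved into a BytesIO, the encoded bytes are not lost; they can be written out afterwards with getvalue(). A minimal standard-library sketch in which `write_buffer` is a hypothetical helper and the buffer contents are placeholder bytes, not a real JPEG:

```python
import io
import os
import tempfile

def write_buffer(buffer: io.BytesIO, path: str) -> int:
    """Write the full contents of an in-memory buffer to path; return the byte count."""
    with open(path, "wb") as out:  # binary mode, because the buffer holds bytes
        return out.write(buffer.getvalue())

# Placeholder for what pi.save(s, format="jpeg") would have put in the buffer.
buffer = io.BytesIO()
buffer.write(b"\xff\xd8 stand-in for encoded JPEG bytes \xff\xd9")

target = os.path.join(tempfile.mkdtemp(), "my-new-image.jpg")
written = write_buffer(buffer, target)
print(written)
```

getvalue() returns the whole buffer regardless of the current stream position, so no seek(0) is needed before writing.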
I use the pillow_heif library. For example, I use this script when I have a folder of heif files I want to convert to png.
from PIL import Image
import pillow_heif
import os
from tqdm import tqdm
import argparse
def get_images(heic_folder):
    # Get all the heic images in the folder
    imgs = [os.path.join(heic_folder, f) for f in os.listdir(heic_folder) if f.endswith('.HEIC')]
    # Name of the folder where the png files will be stored
    png_folder = heic_folder + "_png"
    # If it doesn't exist, create the folder
    if not os.path.exists(png_folder):
        os.mkdir(png_folder)
    for img in tqdm(imgs):
        heif_file = pillow_heif.read_heif(img)
        image = Image.frombytes(
            heif_file.mode,
            heif_file.size,
            heif_file.data,
            "raw",
        )
        image.save(os.path.join(png_folder,os.path.basename(img).split('.')[0])+'.png', format="png")
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Convert heic images to png')
    parser.add_argument('heic_folder', type=str, help='Folder with heic images')
    args = parser.parse_args()
    get_images(args.heic_folder)

Tesseract OCR fails on TIFF files

I have a multi-page .tif file and I am trying to extract text from it using Tesseract OCR, but I am getting this error:
TypeError: Unsupported image object
Code
from PIL import Image
import pytesseract
img = Image.open('Group 1/1_CHE_MDC_1.tif')
text = pytesseract.image_to_string(img.seek(0)) # OCR on 1st Page
text = ' '.join(text.split())
print(text)
ERROR
Any idea why it's happening?
Image.seek does not have a return value so you're essentially running:
pytesseract.image_to_string(None)
Instead do:
img.seek(0)
text = pytesseract.image_to_string(img)
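Since the file has multiple pages, the same seek-then-OCR pattern can be repeated per frame. A rough sketch, assuming the image object exposes n_frames and seek() the way Pillow's multi-frame images do; `ocr_all_pages` is a hypothetical helper, and the OCR function is passed in so the sketch does not depend on pytesseract being installed:

```python
def ocr_all_pages(img, ocr):
    """Run ocr on every frame of a multi-frame image and return the texts in order."""
    texts = []
    for page in range(getattr(img, "n_frames", 1)):
        img.seek(page)          # seek() moves to the frame and returns None
        texts.append(ocr(img))  # so pass the image itself, not seek()'s result
    return texts
```

With the question's setup this would be called as ocr_all_pages(Image.open('Group 1/1_CHE_MDC_1.tif'), pytesseract.image_to_string).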
I had the same question; I tried the code below and it worked for me:
import glob
import pytesseract
import os
os.chdir("Set your Tesseract-OCR .exe file path")
b = ''
# you can use the *.jpg extension instead for JPG images
for i in glob.glob('Fullpath of your image directory/*.tif'):
    if glob.glob('*.tif'):
        b = b + (pytesseract.image_to_string(i))
print(b)
Happy learning!
