I am trying to normalise my document input data, which contains a lot of .pdf, .tif, and .tiff documents.
I want to normalise all documents by converting them to .pdf, but I am having issues with the .tif documents in my to-PDF conversion function.
The problem is in this function:
def tiff_to_pdf(tiff_path: str) -> str:
    if tiff_path.endswith(".tif"):
        tiff_path.replace(".tif", ".tiff")
    pdf_path = tiff_path.replace('.tiff', '.pdf')
    if not os.path.exists(tiff_path): raise Exception(f'{tiff_path} does not find.')
    image = Image.open(tiff_path)
    images = []
    for i, page in enumerate(ImageSequence.Iterator(image)):
        page = page.convert("RGB")
        images.append(page)
    if len(images) == 1:
        images[0].save(pdf_path)
    else:
        images[0].save(pdf_path, save_all=True, append_images=images[1:])
    return pdf_path
Also, simply trying to load the .tif file as in the snippet below causes an error:
from PIL import Image
im = Image.open('a_image.tif')
im.show()
The error:
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'a_image.tif'
I have checked the directory: the function finds every .tiff file and produces a PDF from it, but it does not work with .tif files.
Any help would be highly appreciated and intermediate steps to get from .tif to .pdf are of course fine.
You replace .tif with .tiff in the string, but you don't assign the new string to a variable. str.replace returns a new string and leaves the original unchanged.
In any case, you should change .tif directly to .pdf:
if tiff_path.endswith(".tif"):
    pdf_path = tiff_path.replace(".tif", ".pdf")
elif tiff_path.endswith(".tiff"):
    pdf_path = tiff_path.replace('.tiff', '.pdf')
else:
    print('Wrong extension:', tiff_path)
    return # exit function with `None` when extension is wrong
    # or raise error
    # raise Exception(f'{tiff_path} has wrong extension.')
# ... code which converts image to PDF ...
image = Image.open(tiff_path)
# ...
    images[0].save(pdf_path)
# ...
    images[0].save(pdf_path, save_all=True, append_images=images[1:])
Also, when you open the file you may need to use the full path
im = Image.open('/full/path/to/a_image.tif')
because the code may run in a different working directory, where a relative path will not find the file.
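The two points above (derive the .pdf name directly, and do not depend on the working directory) can be sketched with pathlib. This is only a minimal sketch, not the asker's final code: the helper name `pdf_path_for` and the `TIFF_SUFFIXES` constant are made up for illustration, and it assumes Pillow is installed for the actual conversion.

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}  # hypothetical constant for this sketch


def pdf_path_for(tiff_path):
    """Return the .pdf path matching a .tif/.tiff path (case-insensitive)."""
    path = Path(tiff_path)
    if path.suffix.lower() not in TIFF_SUFFIXES:
        raise ValueError(f"{path} is not a .tif or .tiff file")
    # with_suffix swaps only the final extension, so a ".tif" elsewhere
    # in the file or directory name is left alone.
    return path.with_suffix(".pdf")


def tiff_to_pdf(tiff_path):
    """Convert a single- or multi-page TIFF to a PDF stored next to it."""
    src = Path(tiff_path).resolve()  # absolute path, independent of the cwd
    dst = pdf_path_for(src)
    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist")

    # Pillow is imported here so the path helper above does not depend on it.
    from PIL import Image, ImageSequence

    with Image.open(src) as image:
        # convert() returns a new image per page, so the list stays valid
        # after the source file is closed.
        pages = [page.convert("RGB") for page in ImageSequence.Iterator(image)]
    pages[0].save(dst, save_all=True, append_images=pages[1:])
    return dst
```

Because the extension is taken from `Path.suffix`, the same code path handles `.tif`, `.tiff`, and upper-case variants without any string replacement.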
Related
For me, the code runs without errors, but I am unable to get the output I need. I am extracting text from a multi-page TIFF and later want to produce hOCR. Please suggest where it went wrong.
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
image = Image.open(r"C:\Users\multipage.tiff")
config = ("--oem 3 --psm 6")
txt = ''
for frame in range(image.n_frames):
    image.seek(frame)
    txt += pytesseract.image_to_string(image, config = config, lang='eng') + '\n'
print(txt)
with open(r"C:\Users\multipage_output.txt", mode = 'w') as f:
    f.write(txt)
My query is how to convert the TIFF images in a subfolder into .hocr format in a matching subfolder elsewhere. The above code works fine for converting a TIFF to a text file.
Example:
D:\subfolder\Subfolder1\tiff image to E:\subfolder\Subfolder1\Hocr image
This is my attempt:
import os
from PIL import Image
directory = r'../Icons/ico'
for filename in os.listdir(directory):
    if filename.endswith(".ico"):
        print(os.path.join(directory, filename))
        img = Image.open(os.path.join(directory,filename))
        sizes = img.info['sizes']
        for i in sizes:
            img.size = i
            print(img.size)
            size_in_string = str(img.size)
            img.save('png/' + filename.strip('.ico') + size_in_string + '.png')
    else:
        continue
I'm afraid that this code is not extracting the separate images stored in each .ico file and is instead taking the largest image and resizing it. Can someone please help me?
Going by your title, here is how to convert an ICO to PNG with Python.
from PIL import Image
filename = 'image.ico'
img = Image.open(filename)
img.save('image.png')
# Optionally pass sizes (Pillow documents this argument for ICO output; it may have no effect when writing PNG)
icon_sizes = [...]
img.save('image.png', sizes=icon_sizes)
I am pretty sure you can adapt it to your code.
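One way to adapt it so that every image stored in the .ico is written out separately is sketched below. The function names are made up for this example, and it relies on the `ico` attribute of Pillow's ICO image objects and its `getimage(size)` method, which are less prominently documented than `info['sizes']`, so treat that part as an assumption to verify against your Pillow version. The sketch also avoids `filename.strip('.ico')`, which removes any of the characters `.`, `i`, `c`, `o` from both ends of the name rather than removing the extension.

```python
import os


def png_name(ico_filename, size):
    """Build e.g. 'icon_16x16.png' from 'icon.ico' and (16, 16)."""
    # splitext removes only the extension; str.strip('.ico') would also
    # eat matching characters from the name itself.
    stem, _ext = os.path.splitext(ico_filename)
    width, height = size
    return f"{stem}_{width}x{height}.png"


def export_ico_sizes(ico_path, out_dir):
    """Save every image stored in an ICO file as its own PNG."""
    from PIL import Image  # local import: Pillow is only needed here

    os.makedirs(out_dir, exist_ok=True)
    with Image.open(ico_path) as img:
        for size in sorted(img.info["sizes"]):
            # Assumption: getimage() returns the frame stored at this
            # size, so the largest frame is not simply rescaled.
            frame = img.ico.getimage(size)
            target = png_name(os.path.basename(ico_path), size)
            frame.save(os.path.join(out_dir, target))
```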
You can give this a try:
https://www.convertapi.com/ico-to-png
The code snippet below uses the ConvertAPI Python client:
import convertapi
convertapi.api_secret = '<YOUR SECRET HERE>'
convertapi.convert('png', {
    'File': '/path/to/my_file.ico'
}, from_format = 'ico').save_files('/path/to/dir')
In addition, there is a related question on stackoverflow.com:
How to convert an .ICO to .PNG with Python?
Or you can just change the extension of the .ico file to .png, although renaming alone does not convert the image data.
I have a folder that contains 2000 TIF images and I want to convert them to JPG images.
I wrote two scripts and both work well until they have converted 370 images, and then they raise an error.
Here is my first script:
DestPath='/media/jack/Elements/ToJPG95/'
from PIL import Image
import os
def change(path, row):
    filename1=path+row
    filename=row.split('.')[0] + '.jpg'
    im = Image.open(filename1)
    img= im.convert('RGB')
    Dest=os.path.join(DestPath,filename)
    img.save(Dest, format='JPEG',quality=95)
import csv
sourcePath='/media/jack/Elements/TifImages/'
with open("TIFFnames.csv") as f:
    filtered = (line.replace('\n', '') for line in f)
    reader = csv.reader(filtered)
    for row in filtered:
        change(sourcePath , row)
And here is my second script, which I ran inside the folder that contains the images:
from PIL import Image # Python Image Library - Image Processing
import glob
DestPath='/media/jack/Elements/ToJPG95/'
print(glob.glob("*.TIF"))
for file in glob.glob("*.TIF"):
    im = Image.open(file)
    rgb_im = im.convert('RGB')
    rgb_im.save(DestPath+file.replace("TIF", "jpg"), quality=95)
# based on SO Answer: https://stackoverflow.com/a/43258974/5086335
Both convert up to 370 images and then give an error.
Here is the error I am getting:
Traceback (most recent call last):
  File "conmg.py", line 7, in <module>
    rgb_im = im.convert('RGB')
  File "/home/jack/.local/lib/python3.6/site-packages/PIL/Image.py", line 873, in convert
    self.load()
  File "/home/jack/.local/lib/python3.6/site-packages/PIL/TiffImagePlugin.py", line 1070, in load
    return self._load_libtiff()
  File "/home/jack/.local/lib/python3.6/site-packages/PIL/TiffImagePlugin.py", line 1182, in _load_libtiff
    raise OSError(err)
OSError: -2
I have also tried ImageMagick, as mentioned in the solution here,
but this is what I get when I press Enter to run the command:
jack@jack-dell:/media/jack/Elements/TifImages$ for f in *.tif; do echo "Converting $f"; convert "$f" "$(basename "$f" .tif).jpg"
>
>
>
>
As you can see, it does nothing.
I think the scripts work well, but for some reason they fail after converting 370 images.
I am running this on a 6 TB external hard drive.
Can anyone please tell me what's wrong?
As @fmw42 says, you likely have a problem with the 370th file (corrupt, or some ill-supported TIFF variant). Your bash code will convert all the files that can be read; it doesn't work because you are missing a closing done:
for f in *.tif; do echo "Converting $f"; convert "$f" "$(basename "$f" .tif).jpg" ; done
Your Python would also convert all the readable files if you use try/except to catch errors and continue with the next file:
for file in glob.glob("*.TIF"):
    try:
        im = Image.open(file)
        rgb_im = im.convert('RGB')
        rgb_im.save(DestPath+file.replace("TIF", "jpg"), quality=95)
    except Exception as exc:  # a bare except would also swallow Ctrl+C
        print('File not converted:', file, exc)
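If you also want a list of the files that failed, so the problem TIFFs can be inspected afterwards, the same try/except idea can be wrapped in a small helper. This is a sketch only: `convert_all` and `convert_tif_to_jpg` are names invented for this example, and catching `OSError` assumes the failures look like the `OSError: -2` in the traceback above.

```python
from pathlib import Path


def convert_tif_to_jpg(src, dst, quality=95):
    """Convert one TIFF to JPEG with Pillow."""
    from PIL import Image  # local import: only needed for real conversions

    with Image.open(src) as im:
        im.convert("RGB").save(dst, format="JPEG", quality=quality)


def convert_all(sources, dest_dir, convert_one=convert_tif_to_jpg):
    """Try every file and return (converted, failed) lists."""
    dest_dir = Path(dest_dir)
    converted, failed = [], []
    for src in map(Path, sources):
        # stem drops only the extension; replace("TIF", "jpg") would also
        # change a "TIF" that happens to appear inside the name.
        dst = dest_dir / (src.stem + ".jpg")
        try:
            convert_one(src, dst)
        except OSError as exc:  # unreadable or unsupported image data
            failed.append((src, exc))
        else:
            converted.append(dst)
    return converted, failed
```

For example, `ok, bad = convert_all(glob.glob("*.TIF"), DestPath)` followed by printing `bad` shows exactly which files could not be read.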
Hi, I am facing issues while trying to convert PDF files to .jpeg.
I am running Python from the Anaconda distribution on a Windows machine.
Below is the code, which works for some of the PDFs:
import os
from wand.image import Image as wi
pdf_dir = r"C:\\Users\Downloads\python computer vison\Computer-Vision-with-Python\pdf_to_convert"
os.chdir(pdf_dir)
path = r"C:/Users/Downloads/python computer vison/Computer-Vision-with-Python/jpeg_extract/"
for pdf_file in os.listdir(pdf_dir):
    print("filename is ",pdf_file)
    pdf = wi(filename=pdf_file,resolution=300)
    #print("filename is ",pdf_file)
    pdfImage = pdf.convert("jpeg")
    i = 1
    for img in pdfImage.sequence:
        page = wi(image=img)
        page.save(filename=path+pdf_file+str(i)+".jpg")
        i += 1
and below is the output
filename is tmpdocument-page0.pdf
filename is tmpdocument-page1.pdf
filename is tmpdocument-page100.pdf
filename is tmpdocument-page1000.pdf
filename is tmpdocument-page1001.pdf
filename is tmpdocument-page1002.pdf
filename is tmpdocument-page1003.pdf
filename is tmpdocument-page1004.pdf
filename is tmpdocument-page1005.pdf
filename is tmpdocument-page1006.pdf
filename is tmpdocument-page1007.pdf
filename is tmpdocument-page1008.pdf
filename is tmpdocument-page1009.pdf
filename is tmpdocument-page1012.pdf
---------------------------------------------------------------------------
CorruptImageError Traceback (most recent call last)
<ipython-input-7-84715f25da7c> in <module>()
8 #path = r"C://Users/Downloads/Work /ml_training_samples/tmp/"
9 print("filename is ",pdf_file)
---> 10 pdf = wi(filename=pdf_file,resolution=300)
11 #print("filename is ",pdf_file)
12 pdfImage = pdf.convert("jpeg")
~\Anaconda3\envs\python-cvcourse\lib\site-packages\wand\image.py in __init__(self, image, blob, file, filename, format, width, height, depth, background, resolution, pseudo)
4706 self.read(blob=blob, resolution=resolution)
4707 elif filename is not None:
-> 4708 self.read(filename=filename, resolution=resolution)
4709 # clear the wand format, otherwise any subsequent call to
4710 # MagickGetImageBlob will silently change the image to this
~\Anaconda3\envs\python-cvcourse\lib\site-packages\wand\image.py in read(self, file, filename, blob, resolution)
5000 r = library.MagickReadImage(self.wand, filename)
5001 if not r:
-> 5002 self.raise_exception()
5003
5004 def save(self, file=None, filename=None):
~\Anaconda3\envs\python-cvcourse\lib\site-packages\wand\resource.py in raise_exception(self, stacklevel)
220 warnings.warn(e, stacklevel=stacklevel + 1)
221 elif isinstance(e, Exception):
--> 222 raise e
223
224 def __enter__(self):
CorruptImageError: unable to read image data `C:/Users/AppData/Local/Temp/magick-40700dP2k-1ORw81R1' @ error/pnm.c/ReadPNMImage/1346
Background:
I have a PDF image document, which I named tmpdocument, with over 2200 pages, so I split it into individual PDF documents using Python. Now I am trying to convert them to JPEG.
Problem:
When I try to convert the PDFs to JPEG, some pages succeed and some fail with the above error. Since all these pages come from the same document, I highly doubt this is a format issue. I am also able to open and view the page in Adobe, so I'm sure it is not corrupted.
Lastly, ImageMagick takes up a lot of disk space, and with this issue on top I am truly lost. Is there any other way to achieve the above scenario? Any input would be helpful.
Thanks.
Update:
Thanks for the reply.
Yes, I am using Ghostscript 9.26. The PDF contains sensitive data, so unfortunately I can't post it online. The temp folder is 18 MB, so I think that is okay.
I have found some code online. It generates the JPEG files, but it overwrites them rather than creating new files. I have never used subprocess before, and this code gives no visibility into whether the program is running or has failed, or how to kill it. Any input here is also appreciated.
I understand it is not using ImageMagick anymore; I am okay with that as long as I can generate JPEGs.
import os, subprocess
pdf_dir = r"C:\\Users\Downloads\latest_python\python computer vison\Computer-Vision-with-Python\pdf_to_convert"
os.chdir(pdf_dir)
pdftoppm_path = r"C:\Program Files\poppler-0.68.0_x86\poppler-0.68.0\bin\pdftoppm.exe"
i = 1
for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith(".pdf"):
        subprocess.Popen('"%s" -jpeg %s out' % (pdftoppm_path, pdf_file))
        i+=1
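The files are probably being replaced because every call uses the same output prefix `out`, and `Popen` starts each conversion without waiting for it or reporting its result. A possible variant is sketched below, with a per-PDF prefix and `subprocess.run`, which blocks until pdftoppm exits and exposes its return code. The function names and the `dpi` default are assumptions made for this example, and `capture_output` needs Python 3.7 or newer.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdftoppm, pdf_path, out_dir, dpi=300):
    """Build the argument list: pdftoppm -jpeg -r DPI input.pdf prefix."""
    pdf_path = Path(pdf_path)
    # Using the PDF's own name as the prefix gives every input its own
    # output files instead of reusing "out" each time.
    prefix = Path(out_dir) / pdf_path.stem
    return [str(pdftoppm), "-jpeg", "-r", str(dpi), str(pdf_path), str(prefix)]


def convert_pdfs(pdftoppm, pdf_dir, out_dir):
    """Convert every PDF in pdf_dir and return a list of failures."""
    failed = []
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = build_pdftoppm_cmd(pdftoppm, pdf_path, out_dir)
        # run() waits for the process; passing a list avoids quoting
        # problems with spaces in Windows paths.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append((pdf_path.name, result.stderr.strip()))
        else:
            print("converted", pdf_path.name)
    return failed
```

Since each call blocks, interrupting the script with Ctrl+C stops the whole batch, and the returned list shows which PDFs pdftoppm could not handle.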
I'm trying to write a function that takes an image file in JPG format as input and returns an array every time I call it. This is what I have so far:
import scipy.misc as sm
import numpy as np
from PIL import Image
def imagefunc(image):
    try:
        i = Image.open(image)
        if i.format == 'jpg':
            return i.format == 'jpg'
    except OSError: # Checking for different possible errors in the input file
        print ('This is not a jpg image! Input has to be a jpg image!')
        return False
    except FileNotFoundError: # Another check for error in the input file
        print ('No image was found! Input file has to be in the same directory as this code is!')
        return False
    imgarray = np.array(sm.imread(image, True))
    return imgarray
The problem is that when I call it as imagefunc(kvinna) to open a JPEG picture, it outputs: NameError: name 'kvinna' is not defined. What am I missing here? Is the code wrong, or is it a file directory problem? Thanks
Reading and Writing Images
You are not passing the image name correctly, hence the NameError: kvinna without quotes is treated as a variable name, and no such variable is defined.
i = Image.open(image) # image should be "image_name.ext"
Here image should be the string "kvinna.jpeg", including the extension,
so the function call will be imagefunc("kvinna.jpeg"). Also check for either jpeg or jpg in your function definition.
Image.open(image) returns an Image object; check its format afterwards.
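Putting these points together, a reworked function might look like the sketch below. It is not the asker's final code: the name `load_jpeg_as_array` is invented, greyscale conversion via `convert("L")` is assumed as a stand-in for `sm.imread(image, True)`, and the format test relies on Pillow reporting JPEG files as `"JPEG"` rather than `'jpg'`. Note also that `FileNotFoundError` is a subclass of `OSError`, so in the original code the second `except` branch can never run.

```python
from pathlib import Path


def load_jpeg_as_array(image_path):
    """Return the JPEG at image_path as a greyscale NumPy array."""
    path = Path(image_path)
    if not path.is_file():
        # Raised before any generic OSError handling, because
        # FileNotFoundError is itself an OSError.
        raise FileNotFoundError(f"No image found at {path.resolve()}")

    # Third-party imports are kept local so the path check above works
    # even where Pillow and NumPy are not installed.
    import numpy as np
    from PIL import Image

    with Image.open(path) as img:
        # Pillow reports JPEG files as "JPEG", whatever the extension says.
        if img.format != "JPEG":
            raise ValueError(f"{path.name} is {img.format}, not JPEG")
        # Assumption: an 8-bit greyscale array is what the caller wants.
        return np.array(img.convert("L"))
```

The call then takes a string, for example `load_jpeg_as_array("kvinna.jpeg")`, and raises an exception instead of returning `False` when the input is unusable.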