Convert PDF to image using Python (pdf2image module)

I am using Python with the pdf2image module to convert a PDF to images.
My code:
import numpy as np
from pdf2image import convert_from_path
pages = convert_from_path('file.pdf')
print(pages)
I don't know why I am getting these errors.

The error is due to the path: use the absolute path, starting from C:\ on Windows.
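A minimal sketch of that fix, assuming pdf2image and its Poppler dependency are already installed: resolve the PDF path to an absolute one and fail early with a clear message if the file is not there, then save each page as an image. The helper names resolve_pdf_path and pdf_to_images, the "pages" output folder, and the file naming scheme are hypothetical.

```python
from pathlib import Path


def resolve_pdf_path(pdf_path):
    """Return the absolute path of the PDF, raising early if it does not exist."""
    path = Path(pdf_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    return path


def pdf_to_images(pdf_path, out_dir="pages", fmt="png"):
    """Convert every page of the PDF to an image file; return the saved paths."""
    # Imported here so resolve_pdf_path() can be used without pdf2image installed.
    from pdf2image import convert_from_path

    path = resolve_pdf_path(pdf_path)
    pages = convert_from_path(str(path))  # one PIL.Image per page
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for number, page in enumerate(pages, start=1):
        target = out / f"{path.stem}_page{number}.{fmt}"
        page.save(target)  # PIL infers the image format from the file extension
        saved.append(target)
    return saved
```

Usage: pdf_to_images(r"C:\Users\<user>\Documents\file.pdf"). If Poppler is not on your PATH (common on Windows), convert_from_path also accepts a poppler_path argument pointing at Poppler's bin folder.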

Related

ModuleNotFoundError: No module named 'pptx'

I am writing Python code to merge PPTs. It takes the locations of two PPT files, merges them, and saves the merged presentation to a folder given by the user. The code is:
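import sys
from pptx import Presentation
#import Aspose.Words.License
#import aspose.slides as a_slides
#import os
#import win32com.client

def merge_powerpoint_ppts(pres_loc1, pres_loc2, output_loc):
    # Presentation() accepts a path directly; a manually opened file must be in binary mode ('rb')
    pres1 = Presentation(pres_loc1)
    pres2 = Presentation(pres_loc2)
    for slide in pres2.slides:
        for lide in pres1.slides:
            # the attribute is slide.shapes (plural); shapes.title is None if the slide has no title placeholder
            t2, t1 = slide.shapes.title, lide.shapes.title
            if t2 is not None and t1 is not None and t2.text == t1.text:
                # NOTE: python-pptx has no add_Clone()/add_clone() method;
                # this call appears to come from the Aspose.Slides API (commented out above)
                pres1.slides.add_Clone(slide)
    pres1.save(output_loc)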
When I try to debug the code, I get the following:
Error screenshot: https://i.stack.imgur.com/yID2l.png
I have already installed the pptx module on my system and it is up to date, but I still get this error.
First, if you have a file or folder named Presentation or pptx, rename it: this error can be caused by naming confusion between your own files or folders and Python modules.
Second, make sure you are using the Python interpreter or environment where you installed pptx.
As a last option, uninstall pptx and run the following command:
conda install -c conda-forge python-pptx
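The first two checks can be scripted. This is a minimal sketch, not part of the answer above: it uses importlib.util.find_spec to report which interpreter is running, whether pptx can be found from it, where it would be imported from, and whether that location is a local pptx.py file or pptx folder that would shadow the installed package. The function name diagnose_module and the "local directory" heuristic are assumptions.

```python
import importlib.util
import sys
from pathlib import Path


def diagnose_module(name):
    """Report which interpreter is running and where `name` would be imported from."""
    spec = importlib.util.find_spec(name)
    location = None
    if spec is not None:
        if spec.has_location and spec.origin:
            location = Path(spec.origin).resolve()
        elif spec.submodule_search_locations:
            # namespace package, e.g. a plain folder named pptx with no __init__.py
            location = Path(list(spec.submodule_search_locations)[0]).resolve()

    # Heuristic: directories that usually come before site-packages on sys.path,
    # i.e. the script's directory (sys.path[0]; "" in an interactive session)
    # and the current working directory.
    local_dirs = {Path.cwd().resolve()}
    if sys.path and sys.path[0]:
        local_dirs.add(Path(sys.path[0]).resolve())

    # A local pptx.py, pptx/ folder, or pptx/__init__.py shadows the real package.
    shadowed = location is not None and (
        location.parent in local_dirs or location.parent.parent in local_dirs
    )
    return {
        "python": sys.executable,
        "found": spec is not None,
        "location": location,
        "shadowed_by_local_file": shadowed,
    }


if __name__ == "__main__":
    for key, value in diagnose_module("pptx").items():
        print(f"{key}: {value}")
```

Run it with the same interpreter you use for your script. If "found" is False, installing into exactly that interpreter with python -m pip install python-pptx avoids the wrong-environment problem (the package is published as python-pptx but imported as pptx).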

Reading text from image with Pytesseract gives bad path error

I'm trying to read text from an image with pytesseract. I'm using a Mac.
I have installed pytesseract with pip.
import cv2
import pytesseract
img = cv2.imread('slika1.png')
text = pytesseract.image_to_string(img)
print(text)
It gives me this error:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
When I do this:
import importlib.util
print(importlib.util.find_spec('pytesseract'))
It prints:
ModuleSpec(name='pytesseract', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f8a7837c160>, origin='/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract/__init__.py', submodule_search_locations=['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract'])
So what should I do? What am I doing wrong?
Is there any other way to read text from image?
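Try opening the module's source file (as admin) and editing the path to the Tesseract executable; use an absolute path if needed. It should be a constant in the first few lines.
Something like this (on Windows):
"C:\Program Files\Python36\Lib\site-packages\pytesseract\pytesseract.py"
Set the path: ...
pytesseract.tesseract_cmd = r"D:\OCR\tesseract.exe"
Without editing the installed file, you can also set it from your own script after import pytesseract: pytesseract.pytesseract.tesseract_cmd = r"D:\OCR\tesseract.exe"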
https://github.com/Twenkid/ComputerVision_Pyimagesearch_OpenCV_Dlib_OCR-Tesseract-DL/blob/master/OCR_Tesseract/ocr.py
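pytesseract is only a wrapper that runs the separate tesseract program, so that program must be installed too (on macOS, for example, with Homebrew: brew install tesseract). A minimal sketch that looks for the executable on PATH and then in a few common locations before pointing pytesseract at it; the helper names and the candidate list are assumptions about typical installs, so adjust them to yours.

```python
import os
import shutil

# Assumed typical install locations; edit to match your machine.
COMMON_LOCATIONS = (
    "/opt/homebrew/bin/tesseract",  # Homebrew on Apple Silicon Macs
    "/usr/local/bin/tesseract",  # Homebrew on Intel Macs
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",  # usual Windows install folder
)


def find_tesseract(candidates=COMMON_LOCATIONS):
    """Return the path of a tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the tesseract executable, or fail with a clear message."""
    import pytesseract  # imported here so find_tesseract() works without pytesseract

    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError(
            "tesseract executable not found; install it first "
            "(e.g. `brew install tesseract` on macOS)"
        )
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

After configure_pytesseract() succeeds, the original pytesseract.image_to_string(img) call should find the program.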

FileNotFoundError: [WinError 2] The system cannot find the file specified while trying to use pytesseract

I am trying to write a Python script to extract the text from an image, and I keep getting this error. The script is given below.
from PIL import Image
from pytesseract import image_to_string
print (image_to_string(Image.open('samp.png')))
print (image_to_string(Image.open('test-english.jpg'), lang='eng'))
Try the following steps; this worked for me.
1) Download Tesseract-OCR from here and install it in C:/Program Files.
2) Write the following code:
from PIL import Image
import pytesseract
#pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
For example:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
3) Now run this:
print(pytesseract.image_to_string(Image.open('D:/image_file.jpg')))
Hope that helps!

Import python package in the module

I am a beginner in Python. I am using a set of Python libraries, and I want to move part of my code into a separate .py module. Where should I write the imports for those libraries: in the module or in the main file? If I don't write them in the module, the program doesn't work.
#mainfile.py
import cv2
import faceResearch
faceResearch.mn()

#faceResearch.py
import cv2

def mn():
    image = cv2.imread("Smiling/3--1873301-Smiling woman looking at camera.jpg")
    cv2.imshow("im", image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

#So, in which file should I write "import cv2"? In the main file? In the module file? Or in both?
You should keep import cv2 in your module (faceResearch.py) rather than in mainfile.py. Any script that imports your module will then load cv2 as well, provided cv2 is installed; mainfile.py only needs its own import cv2 if it calls cv2 directly. You can check whether cv2 is installed and, if not, display an error message: link
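A minimal sketch of that check, assuming OpenCV comes from the opencv-python package on PyPI (the usual source of cv2). The helper name require is hypothetical; an equivalent try/except ImportError at the top of faceResearch.py works the same way. Python caches imported modules in sys.modules, so importing cv2 in both files would not load it twice.

```python
import importlib


def require(module_name, pip_name):
    """Import `module_name`, or raise an ImportError that says how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"{module_name!r} is not installed; try: pip install {pip_name}"
        ) from exc


# In faceResearch.py the import at the top of the module would then become:
#     cv2 = require("cv2", "opencv-python")
```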

Python error when importing image_to_string from tesseract

I recently used tesseract OCR with python and I kept getting an error when I was trying to import image_to_string from tesseract.
Code causing the problem:
# Perform OCR using tesseract-ocr library
from tesseract import image_to_string
image = Image.open('input-NEAREST.tif')
print image_to_string(image)
Error caused by above code:
Traceback (most recent call last):
  File "./captcha.py", line 52, in <module>
    from tesseract import image_to_string
ImportError: cannot import name image_to_string
I've verified that the tesseract module is installed:
digital_alchemy#roaming-gnome /home $ pydoc modules | grep 'tesseract'
Hdf5StubImagePlugin _tesseract gzip sipconfig
ORBit cairo mako tesseract
I believe that I've grabbed all the required packages but unfortunately I'm just stuck at this point. It appears that the function is not in the module.
Any help greatly appreciated.
Another possibility that seems to have worked for me is to modify pytesseract so that it uses "from PIL import Image" instead of "import Image".
Code that works in PyCharm after modifying pytesseract:
from pytesseract import image_to_string
from PIL import Image
im = Image.open(r'C:\Users\<user>\Downloads\dashboard-test.jpeg')
print(im)
print(image_to_string(im))
I installed pytesseract via the package manager built into PyCharm.
Is your syntax correct for the module you have installed? That image_to_string function looks like it comes from PyTesser, judging by the usage example on this page:
https://code.google.com/p/pytesser/
Your import looks like it is for python-tesseract which has a more complicated usage example listed:
https://code.google.com/p/python-tesseract/
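To see which of the installed wrappers actually exposes image_to_string, a quick check like this can help. It is a sketch: the helper name provides is hypothetical, and the module names in the loop are guesses at what might be installed.

```python
import importlib


def provides(module_name, func_name="image_to_string"):
    """Return True if `module_name` imports cleanly and exposes a callable `func_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, func_name, None))


if __name__ == "__main__":
    # Candidate module names are assumptions; list whatever you have installed.
    for name in ("tesseract", "pytesser", "pytesseract"):
        print(name, provides(name))
```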
On Windows, follow these steps:
pip3 install pytesseract
pip3 install pillow
You also need to install tesseract-ocr itself:
https://github.com/tesseract-ocr/tesseract/wiki
Otherwise you will get an error saying that tesseract is not in your PATH.
Python code:
from PIL import Image
from pytesseract import image_to_string
print ( image_to_string(Image.open('test.tif'),lang='eng') )
What works for me:
After installing Tesseract with tesseract-ocr-setup-3.05.02-20180621.exe,
I added the line
pytesseract.pytesseract.tesseract_cmd="C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
and used the code from above. This is all the code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd="C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
im=Image.open("C:\\Users\\<user>\\Desktop\\ro\\capt.png")
print(pytesseract.image_to_string(im,lang='eng'))
I am using windows 10 with PyCharm Community Edition 2018.2.3 x64
