I am trying to write a program that scrapes the text from a screenshot using Tesseract and Python. I have no trouble getting one piece of it, but some of the text is lighter in color and Tesseract does not pick it up. Below is an example of a picture I am using:
I am able to get the text at the top of the picture, but not the 3 options below it.
Here is the code I am using to grab the text:
result = pytesseract.image_to_string(
    screen, config="load_system_dawg=0 load_freq_dawg=0")
print("below is the total value scraped by the tesseract")
print(result)
# Split up newlines until we have our question and answers
parts = result.split("\n\n")
question = parts.pop(0).replace("\n", " ")
q_terms = question.split(" ")
q_terms = list(filter(lambda t: t not in stop, q_terms))
q_terms = set(q_terms)
parts = "\n".join(parts)
parts = parts.split("\n")
answers = list(filter(lambda p: len(p) > 0, parts))
When I have plain black text without a colored background, the answers array is populated with the 3 options below the question, but not in this case. Is there any way I can go about fixing this?
You're missing a binarization (thresholding) step.
In your case you can simply apply a binary threshold to the grayscale image.
Here is the resulting image with threshold = 177:
You can learn more about thresholding in the OpenCV Python library's documentation.
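To make the thresholding step concrete, here is a minimal sketch of what a binary threshold does to a grayscale image. It uses plain NumPy so it runs without an image file; `binarize` and the tiny `sample` array are made up for illustration. With OpenCV the equivalent call would be `cv2.threshold(gray, 177, 255, cv2.THRESH_BINARY)`, and 177 is only the value that worked for this screenshot, so expect to tune it for other images.

```python
import numpy as np


def binarize(gray, threshold=177):
    """Return a black-and-white copy of a grayscale uint8 image.

    Pixels brighter than `threshold` become 255 (white), all others 0 (black).
    """
    gray = np.asarray(gray, dtype=np.uint8)
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


# Tiny synthetic "image": light-gray text (200) on a mid-gray background (120).
sample = np.array([[120, 200, 120],
                   [200, 200, 120]], dtype=np.uint8)
binary = binarize(sample)
print(binary)
```

The binarized array (or the image returned by OpenCV) is what you would then pass to `pytesseract.image_to_string` instead of the raw screenshot.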
We receive invoices on paper. We take images of these invoices and wish to extract the information contained in the cells of the tabular region(s) and export it as CSV or similar.
The tables include multiple columns, and the cells contain numbers and words.
I have been searching for ML-based Python procedures to do this, expecting it to be a relatively straightforward task (or maybe I'm mistaken), but I have not had much luck finding one.
I can detect the horizontal and vertical lines, and combine them to locate the cells. But retrieving the information contained within the cells seems to be problematic.
Could I please get help?
I followed one procedure from this reference, yet came across an error with "bitnot":
import pytesseract
extract=[]
for i in range(len(order)):
    for j in range(len(order[i])):
        inside=''
        if(len(order[i][j])==0):
            extract.append(' ')
        else:
            for k in range(len(order[i][j])):
                side1,side2,width,height = order[i][j][k][0],order[i][j][k][1], order[i][j][k][2],order[i][j][k][3]
                final_extract = bitnot[side2:side2+h, side1:side1+width]
                final_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1))
                get_border = cv2.copyMakeBorder(final_extract,2,2,2,2, cv2.BORDER_CONSTANT,value=[255,255])
                resize = cv2.resize(get_border, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
                dil = cv2.dilate(resize, final_kernel,iterations=1)
                ero = cv2.erode(dil, final_kernel,iterations=2)
                ocr = pytesseract.image_to_string(ero)
                if(len(ocr)==0):
                    ocr = pytesseract.image_to_string(ero, config='--psm 3')
                inside = inside +" "+ ocr
            extract.append(inside)
a = np.array(extract)
dataset = pd.DataFrame(a.reshape(len(hor), total))
dataset.to_excel("output1.xlsx")
The error I get is this:
final_extract = bitnot[side2:side2+h, side1:side1+width]
NameError: name 'bitnot' is not defined
I am new to Python. I am trying to extract mixed fractions from a PDF file using Python, but I have no idea which tool I should use. My sample PDF contains only one page with simple text. I would like to extract the part name and the length of the part. A screenshot of the sample PDF page is shown in the linked image (Page 1 of Pdf- Screenshot). The PDF file can be downloaded from the following link (Sample Pdf)
EDIT 1: - UPDATED
Thank you for suggesting pdfplumber. It is a great tool, and I could extract the information with it. In some cases, though, when I extract the length I get the whole number combined with the denominator. For example, if the length is 36 1/2 (as shown in the screenshot), I get the value 362 inches.
import pdfplumber
with pdfplumber.open("Sample.pdf") as pdf:
    first_page = pdf.pages[0]
    text = first_page.extract_text()
    for row in text.split('\n'):
        if 'inches' in row:
            num = row.split()[0]
            print(num)
Output: 362
This code works for me in most cases. Just in some cases, I get 362 as my output, instead of getting 36 as a separate value. How could I resolve this issue?
pdfplumber gives output like this:
shape: square
part name: square
1
36 𝑖𝑛𝑐ℎ𝑒𝑠
2
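If the extracted text really comes out in the layout shown above, with the numerator on the line above the whole number and the denominator on the line below, one possible approach is to stitch the three lines back together. This is only a sketch under that assumption: `parse_lengths` is a hypothetical helper, not part of pdfplumber, and it will not help when the digits are already fused into a single token such as 362. The `unicodedata.normalize("NFKC", ...)` call turns the italic math letters in the extracted text into plain ASCII so that "inches" can be matched.

```python
import re
import unicodedata
from fractions import Fraction


def parse_lengths(text):
    """Rebuild mixed fractions laid out as numerator / whole number / denominator lines."""
    lines = [unicodedata.normalize("NFKC", line).strip() for line in text.split("\n")]
    lengths = []
    for i, line in enumerate(lines):
        match = re.match(r"^(\d+)\s*inches\b", line)
        if not match:
            continue
        value = Fraction(int(match.group(1)))
        above = lines[i - 1] if i > 0 else ""
        below = lines[i + 1] if i + 1 < len(lines) else ""
        # Only treat the neighbours as a fraction when both are plain integers.
        if above.isdigit() and below.isdigit() and int(below) != 0:
            value += Fraction(int(above), int(below))
        lengths.append(value)
    return lengths


sample = "shape: square\npart name: square\n1\n36 𝑖𝑛𝑐ℎ𝑒𝑠\n2"
lengths = parse_lengths(sample)
print([float(length) for length in lengths])
```

Keeping the values as `Fraction` objects avoids rounding until you decide how to display them.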
I would suggest using pdfplumber; it's a very powerful and well-documented tool for extracting text, tables and images from PDFs.
Moreover, it has a very convenient function called crop, which lets you crop the page and extract just the portion that you need.
Just as an example, the code would be something like this (note that this will work with any number of pages):
import pdfplumber

filename = 'path/to/your/PDF'
# Placeholders: fractions of the page size (0.0 to 1.0), to be replaced with your own values
crop_coords = [x0, top, x1, bottom]
text = ''
pages = []
with pdfplumber.open(filename) as pdf:
    for i, page in enumerate(pdf.pages):
        my_width = page.width
        my_height = page.height
        # Crop pages
        my_bbox = (crop_coords[0]*float(my_width), crop_coords[1]*float(my_height), crop_coords[2]*float(my_width), crop_coords[3]*float(my_height))
        page_crop = page.crop(bbox=my_bbox)
        text = text+str(page_crop.extract_text()).lower()
        pages.append(page_crop)
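Here is the explanation of the coords. Each one is a fraction of the page width or height, and all of them are measured from the left or top edge of the page:
x0 = % Distance from the left side of the page to the left vertical cut.
top = % Distance from the upper side of the page to the upper horizontal cut.
x1 = % Distance from the left side of the page to the right vertical cut.
bottom = % Distance from the upper side of the page to the lower horizontal cut.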
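As a quick check of how fractional coords map to the bounding box that gets passed to crop, here is a small sketch. `fractional_bbox` is a hypothetical helper, and the 612 x 792 page size is just an assumed example (US Letter in points), not something taken from the PDF in the question.

```python
def fractional_bbox(crop_coords, page_width, page_height):
    """Convert (x0, top, x1, bottom) fractions into absolute page coordinates."""
    x0, top, x1, bottom = crop_coords
    if not (0 <= x0 < x1 <= 1 and 0 <= top < bottom <= 1):
        raise ValueError("expected 0 <= x0 < x1 <= 1 and 0 <= top < bottom <= 1")
    return (x0 * page_width, top * page_height,
            x1 * page_width, bottom * page_height)


# Keep the middle half of the width, between 25% and 50% of the page height.
bbox = fractional_bbox((0.25, 0.25, 0.75, 0.5), 612, 792)
print(bbox)
```

The validation step is there because a box whose right edge is left of its left edge (or bottom above top) cannot describe a region of the page.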
I'm attempting to find an image in another.
im = cv.LoadImage('1.png', cv.CV_LOAD_IMAGE_UNCHANGED)
tmp = cv.LoadImage('e1.png', cv.CV_LOAD_IMAGE_UNCHANGED)
w,h = cv.GetSize(im)
W,H = cv.GetSize(tmp)
width = w-W+1
height = h-H+1
result = cv.CreateImage((width, height), 32, 1)
cv.MatchTemplate(im, tmp, result, cv.CV_TM_SQDIFF)
print result
When I run this, everything executes just fine, no errors get thrown. But I'm unsure what to do from here. The doc says that result stores "A map of comparison results". I tried printing it, but it gives me width, height, and step.
How do I use this information to find whether or not one image is in another/where it is located?
This might work for you! :)
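import cv2
import numpy as np

def FindSubImage(im1, im2):
    needle = cv2.imread(im1)
    haystack = cv2.imread(im2)
    # matchTemplate takes the larger image first and the template second
    result = cv2.matchTemplate(haystack,needle,cv2.TM_CCOEFF_NORMED)
    # row and column of the highest score = top-left corner of the best match
    y,x = np.unravel_index(result.argmax(), result.shape)
    return x,y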
CCOEFF_NORMED is just one of many comparison methods.
See http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html for the full list.
Not sure if this is the best method, but it is fast and works just fine for me! :)
MatchTemplate returns a similarity map and not a location.
You can then use this map to find a location.
If you are only looking for a single match you could do something like this to get a location:
minVal,maxVal,minLoc,maxLoc = cv.MinMaxLoc(result)
Because you are using CV_TM_SQDIFF, where a lower value means a better fit, minLoc has the location of the best match and minVal describes how well the template fits. You need to come up with a threshold for minVal to decide whether you consider this result a match or not.
If you are looking for more than one match per image, you need to use algorithms like non-maximum suppression.
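To show what the "map of comparison results" contains and how a location comes out of it, here is a small NumPy-only sketch. It is not OpenCV's implementation: `sqdiff_map` and `find_template` are hypothetical helpers that compute the sum of squared differences for 2-D grayscale arrays with plain loops, and `max_sqdiff` is an assumed threshold you would have to tune for real images.

```python
import numpy as np


def sqdiff_map(image, template):
    """Sum of squared differences for every placement of template inside image."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    img_h, img_w = image.shape
    tpl_h, tpl_w = template.shape
    result = np.empty((img_h - tpl_h + 1, img_w - tpl_w + 1))
    for y in range(result.shape[0]):
        for x in range(result.shape[1]):
            patch = image[y:y + tpl_h, x:x + tpl_w]
            result[y, x] = np.sum((patch - template) ** 2)
    return result


def find_template(image, template, max_sqdiff=0.0):
    """Return (x, y) of the best match, or None if its score is above max_sqdiff."""
    result = sqdiff_map(image, template)
    y, x = np.unravel_index(result.argmin(), result.shape)
    if result[y, x] > max_sqdiff:
        return None
    return int(x), int(y)


haystack = np.zeros((5, 6))
haystack[2:4, 3:5] = [[1, 2], [3, 4]]
needle = np.array([[1, 2], [3, 4]])
location = find_template(haystack, needle)
missing = find_template(haystack, np.array([[9, 9], [9, 9]]))
print(location, missing)
```

The result has one entry per possible top-left position of the template, which is why its size is (h-H+1) by (w-W+1). In the newer cv2 API, `cv2.minMaxLoc(result)` gives you the minimum and its (x, y) location directly.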
I am working on a school project (I only know really basic programming, and Python is the only language I know) where I need to change pixel colours to encode a message in a picture, but PIL's putpixel doesn't seem to be working. Here is my code.
P.S.: Everything I know about PIL is self-taught and English isn't my main language, so I'd be grateful if you could keep the explanation simple.
from PIL import Image
e=input('file and location? ')
img=Image.open(e)
pmap=img.load()
imy=img.height
imx=img.width
if int(input('1 for encoding, 2 for decoding '))==1:
    a=input('Your message? ')
    for i in range(len(a)):
        r , g , b=img.getpixel((i+10,imy//2))
        img.putpixel((i+10,imy//2),(ord(a[i]),g,b))
    r,g,b=img.getpixel((len(a)+10,imy//2))
    img.putpixel((len(a)+10,imy//2),(999,g,b)) #999 is the stop code in decoding
else:
    r=u=0
    m=''
    while r!=999:
        r , g , b=img.getpixel((10+u,imy//2))
        m+=chr(r)
        u+=1
    print(m[:len(a)-1])
img.save(e)
Please bear in mind that I'm not looking to make a visual difference, and I've already done some debugging. There are also no errors, but putpixel is not working for some reason.
As I said, I'm new to programming, so sorry if the code includes stupid mistakes.
After using your code and trying it out on an image, putpixel is working as expected. The change in the pixels is very hard to see and that may be why you believe that it isn't working. Believe me, it is working, you just can't see it.
However, there are two problems I see with your code:
1) 999 is not encodable
999 cannot be stored in a single pixel channel. The maximum value for a channel is 255 (the range is 0-255). You need to choose a different stop code/sequence. I recommend changing the stop code to 255.
2) When decoding, a has never been defined
You need to get the length of the message by another means. I suggest doing this with a counter:
counter = 0
while something:
    counter += 1
# do something with counter here
All in all, a working version of your code would look like:
from PIL import Image

e=input('file and location? ')
img=Image.open(e)
pmap=img.load()
imy=img.height
imx=img.width
if int(input('1 for encoding, 2 for decoding '))==1:
    a=input('Your message? ')
    for i in range(len(a)):
        r , g , b= img.getpixel((i+10,imy//2))
        img.putpixel((i+10,imy//2),(ord(a[i]),g,b))
    r,g,b=img.getpixel((len(a)+10,imy//2))
    img.putpixel((len(a)+10,imy//2),(255,g,b)) #255 is the stop code in decoding
else:
    r=u=0
    m=''
    message_length=0
    while r!=255:
        message_length+=1
        r , g , b=img.getpixel((10+u,imy//2))
        m+=chr(r)
        u+=1
    print(m[:message_length-1])
# note: a lossy format such as JPEG changes pixel values on save, so use a lossless one like PNG
img.save(e)
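If you want to check the stop-code logic without touching an image file, the same idea can be sketched on a plain list of (r, g, b) tuples standing in for one row of pixels. The helper names `encode` and `decode` and the sample `row` are made up for this illustration; the offset of 10 and the stop code 255 follow the code above. One consequence of using 255 as the stop code is that the message itself may only contain characters with a code below 255.

```python
STOP = 255  # stop code; message characters must have a code below this


def encode(pixels, message, start=10):
    """Return a copy of `pixels` with `message` stored in the red channel."""
    if any(ord(ch) >= STOP for ch in message):
        raise ValueError("every character must have a code below 255")
    if start + len(message) >= len(pixels):
        raise ValueError("row is too short for the message plus stop code")
    out = list(pixels)
    for i, ch in enumerate(message):
        _, g, b = out[start + i]
        out[start + i] = (ord(ch), g, b)
    _, g, b = out[start + len(message)]
    out[start + len(message)] = (STOP, g, b)
    return out


def decode(pixels, start=10):
    """Read red-channel values from `start` until the stop code is found."""
    chars = []
    for r, _, _ in pixels[start:]:
        if r == STOP:
            return "".join(chars)
        chars.append(chr(r))
    raise ValueError("no stop code found")


row = [(30, 60, 90)] * 40
encoded = encode(row, "hello")
print(decode(encoded))
```

With a real image you could feed the middle row of pixels through these functions and write the result back with putpixel, as in the code above.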
The difference is there, but it's just a few single pixels. If I calculate the difference between original and new image, you'll see it in the middle left, stored in test2.png. In order to enhance contrast I have "equalized" the image.
from PIL import Image, ImageChops, ImageOps
img=Image.open("image.jpg")
pmap=img.load()
img2=img.copy()
imy=img.height
imx=img.width
if int(input('1 for encoding, 2 for decoding '))==1:
    a=input('Your message? ')
    for i in range(len(a)):
        r , g , b=img.getpixel((i+10,imy//2))
        img.putpixel((i+10,imy//2),(ord(a[i]),g,b))
    r,g,b=img.getpixel((len(a)+10,imy//2))
    img.putpixel((len(a)+10,imy//2),(999,g,b)) #999 is the stop code in decoding
else:
    r=u=0
    m=''
    while r!=999:
        r , g , b=img.getpixel((10+u,imy//2))
        m+=chr(r)
        u+=1
    print(m[:len(a)-1])
img.save("test.png")
img3=ImageChops.difference(img, img2)
img3=ImageOps.equalize(img3)
img3.save("test2.png")
This is the result:
I am using some simple code to compare an image to a desktop screenshot through the getcolors() function from PIL. When I open an image, it works:
im = Image.open('sprites\Bowser\BowserOriginal.png')
current_sprite = im.getcolors()
print current_sprite
However, when I use either pyautogui.screenshot() or ImageGrab.grab() for the screenshot, getcolors() returns None. I have tried the RGB conversion shown here: Cannot use im.getcolors.
Additionally, even when I save a screenshot to a .png, it STILL returns None.
i = pyautogui.screenshot('screenshot.png')
f = Image.open('screenshot.png')
im = f.convert('RGB')
search_image = im.getcolors()
print search_image
First time posting, help is much appreciated.
Pretty old question, but for those who see this now:
Image.getcolors() takes as a parameter "maxcolors – Maximum number of colors." (from the docs here).
The maximum number of colors an image can have is equal to the number of pixels it contains.
For example, an image of 50*60px will have at most 3,000 colors.
To translate it into code, try this:
# Open the image.
img = Image.open("test.jpg")
# Set the maxcolors number to the image's pixels number.
colors = img.getcolors(img.size[0]*img.size[1])
According to the docs, getcolors returns None if the number of colors in the image is greater than maxcolors, which defaults to 256.
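Here is a small self-contained check of that behaviour. It does not need a screenshot: the 20x20 test image is generated in code, and the sizes and colors are arbitrary choices for the demonstration.

```python
from PIL import Image

# Build a 20x20 image in which every pixel has a distinct color (400 colors).
img = Image.new("RGB", (20, 20))
for y in range(20):
    for x in range(20):
        img.putpixel((x, y), (x, y, 0))

# More than 256 colors, so the default call gives None.
default_result = img.getcolors()
# Allowing as many colors as there are pixels always returns a list.
full_result = img.getcolors(img.width * img.height)

print(default_result)
print(len(full_result))
```

Each entry of the returned list is a (count, color) pair. A desktop screenshot usually contains far more than 256 distinct colors, which would explain why the default call returns None there.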