Binary image B2
Binary image Y2
These images look quite simple and clear to me, yet pytesseract does not work on them, and I really wonder why.
Here is my code:
from pytesseract import pytesseract as tesseract
import cv2 as cv
binary = cv.imread(filepath)
lang = 'eng'
config = 'tessedit_char_whitelist=RGB123'
print(tesseract.image_to_string(binary, lang=lang, config=config))
The output is just a blank string.
To Dennlinger's point, I would definitely rotate the image before sending it through PyTess. PyTess should rotate it automatically, but I would not rely on that.
Alternatively, I see that your configuration whitelists "RGB123", which, correct me if I'm wrong, means that PyTess only looks for those specific digits and letters.
I'd try omitting that whitelist (or adding "Y" to it) so that it can pick up the "Y" in the second image.
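A minimal sketch of that configuration change, assuming the pytesseract wrapper, OpenCV and a Tesseract binary are installed. The helper names build_config and read_label, and the choice of page segmentation mode 7 (treat the image as a single text line), are illustrative assumptions rather than anything from the question. As far as I understand, Tesseract variables such as tessedit_char_whitelist are passed with a -c prefix, so it is worth checking that detail against your Tesseract version.

```python
def build_config(whitelist="", psm=7):
    """Build a pytesseract config string.

    psm=7 asks Tesseract to treat the image as a single text line
    (an assumption that suits short labels such as "B2" or "Y2").
    """
    parts = ["--psm {}".format(psm)]
    if whitelist:
        # Tesseract variables are passed with the -c prefix.
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)


def read_label(filepath, whitelist=""):
    """OCR a small label image; an empty whitelist allows every character."""
    # Imported here so build_config can be used without OpenCV/pytesseract.
    import cv2 as cv
    import pytesseract

    image = cv.imread(filepath)
    if image is None:
        raise FileNotFoundError(filepath)
    text = pytesseract.image_to_string(
        image, lang="eng", config=build_config(whitelist)
    )
    return text.strip()


# No whitelist at all, or a whitelist that also contains "Y":
print(build_config())           # --psm 7
print(build_config("RGBY123"))  # --psm 7 -c tessedit_char_whitelist=RGBY123
```

Calling read_label(filepath) first without a whitelist shows what Tesseract sees unconstrained; the whitelist can be added back once the basic recognition works.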
UPDATE: I tried increasing the size argument of chess.svg.board, and that somehow cleared all the rendering issues at size = 900 1800
I tried using the svglib and reportlab to make .png files from .svg, and here is how the code looks:
import sys
import chess.svg
import chess
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPM
board = chess.Board("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR")
drawing = chess.svg.board(board, size=350)
f = open('file.svg', 'w')
f.write(drawing)
drawing = svg2rlg("file.svg")
renderPM.drawToFile(drawing, "file.png", fmt="png")
If you open file.png, many parts of the image are missing, which I guess are rendering issues. How can I fix this?
Sidenote: I am also getting a lot of 'x_order_2: colinear!' messages when running this in a Discord bot, but I am not sure yet whether this affects anything.
THIS!! I am having the same error with the same libraries... I didn't find a solution, just a workaround, which probably won't help much in your case, where the shapes generating the bands are not very sparse vertically.
I'll try playing with the file dimensions too, but so far this is what I've got. Note that my SVG consists of black shapes on a white background (hence the 255 - x in the following code).
Since the bands appear at random, and processing the same file several times in a row produces different results, I decided to take advantage of the randomness: I export the same SVG a few times into different PNGs, load them all into a list, and then keep only those pixels that are white in all the exported images, something like:
import os
from functools import reduce
import imageio
images_files = [my_convert_function(svgfile=file, index=i) for i in range(3)]
images = [255 - imageio.imread(x) for x in images_files]
# Bitwise AND keeps only the pixels that are white in every inverted image
result = reduce(lambda a, b: a & b, images)
imageio.imwrite(<your filename here>, result)
for x in images_files:
    os.remove(x)
where my_convert_function wraps the same svg2rlg and renderPM.drawToFile calls as your code and returns the name of the PNG file it wrote. The index i is there so that several copies of the same PNG are saved under different names.
It's very crude code, but I hope it helps other people with the same issue.
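The same idea can be sketched on in-memory arrays, assuming NumPy is available and every render is already loaded as a uint8 grayscale array of the same shape. The function name combine_renders is hypothetical. After inversion the shapes are 255, so a bitwise AND keeps only the pixels on which every render agrees, and a band that shows up in just one render drops out.

```python
from functools import reduce

import numpy as np


def combine_renders(renders):
    """Keep only the pixels that are dark in every render.

    Each render is a uint8 array with black shapes (0) on white (255).
    """
    inverted = [255 - r for r in renders]      # shapes become 255
    agreed = reduce(np.bitwise_and, inverted)  # set only where all agree
    return 255 - agreed                        # back to black on white


# Three toy 1x4 "renders": pixel 0 is a real shape, the other dark
# pixels are random bands that each appear in only one render.
a = np.array([[0, 0, 255, 255]], dtype=np.uint8)
b = np.array([[0, 255, 0, 255]], dtype=np.uint8)
c = np.array([[0, 255, 255, 0]], dtype=np.uint8)
print(combine_renders([a, b, c]))
```

Bitwise AND is exact only for pure black-and-white pixels; for antialiased edges, an element-wise np.minimum over the inverted arrays may be a gentler variant.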
The format parameter has to be in uppercase:
renderPM.drawToFile(drawing, "file.png", fmt="PNG")
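Separately from the fmt value, one thing that may be worth ruling out is that the question's code never closes file.svg before svg2rlg reads it, so part of the SVG may still sit in the write buffer. That is an assumption on my part, not something confirmed in this thread. A minimal sketch that writes the file inside a with block (the helper names write_svg and svg_to_png are hypothetical):

```python
def write_svg(svg_text, path):
    """Write the SVG and close the file so everything is flushed to disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(svg_text)
    return path


def svg_to_png(svg_path, png_path):
    """Convert with svglib/reportlab, using the same calls as the question."""
    # Imported here so write_svg can be tried without these packages.
    from svglib.svglib import svg2rlg
    from reportlab.graphics import renderPM

    drawing = svg2rlg(svg_path)
    renderPM.drawToFile(drawing, png_path, fmt="PNG")
    return png_path


# Usage with the names from the question (requires python-chess):
#   svg_text = chess.svg.board(board, size=350)
#   svg_to_png(write_svg(svg_text, "file.svg"), "file.png")
```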
Here is my code:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'F:\Installations\tesseract'
print(pytesseract.image_to_string('images/meme1.png', lang='eng'))
And here is the image:
And the output is as follows:
GP.
ed <a
= va
ay Roce Thee .
‘ , Pe ship
RCAC Tm alesy-3
Pein Reg a
years —
? >
ee bs
I see the word "years" in the output, so it does recognize some of the text, but why doesn't it recognize all of it?
OCR is still a very hard problem in cluttered scenes. You probably won't get better results without some preprocessing of the image. In this specific case it makes sense to threshold the image first, so that only the white regions (i.e. the text) are extracted. You can look into OpenCV for this: https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html
Additionally, in your image, there are only two lines of text in arbitrary positions, so it might make sense to play around with page segmentation modes: https://github.com/tesseract-ocr/tesseract/issues/434
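A rough sketch of both suggestions, assuming white caption text on a darker background. The threshold of 200 and page segmentation mode 11 (sparse text) are guesses to tune, not values from the question, and the thresholding is written as a plain NumPy comparison rather than cv.threshold so that the core step is easy to inspect. The function names are hypothetical.

```python
import numpy as np


def keep_white_text(gray, thresh=200):
    """Binarize a grayscale image so near-white pixels (the text) become
    black on a white page; Tesseract generally copes better with dark
    text on a light background."""
    gray = np.asarray(gray)
    return np.where(gray >= thresh, 0, 255).astype(np.uint8)


def ocr_caption(path, thresh=200, psm=11):
    """Threshold the image, then OCR it with the given segmentation mode."""
    # Imported here so keep_white_text can be tried without OCR installed.
    import cv2 as cv
    import pytesseract

    gray = cv.imread(path, cv.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    binary = keep_white_text(gray, thresh)
    return pytesseract.image_to_string(
        binary, lang="eng", config="--psm {}".format(psm)
    )
```

Saving the binary image and looking at it before running OCR is a quick way to tell whether the threshold isolates the text or also lets through bright parts of the background.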
I have been trying to teach myself more advanced methods in Python but can't seem to find anything similar to this problem to base my code off of.
First question: Is installing Pillow the only way to display an image from the terminal? I would prefer not to, as I'm going to teach what I learn to a complete beginner. My image.show() function doesn't do anything.
Second question: What is the best way to lower the brightness of all RGB pixels in an image by 20%? What I have below doesn't do anything to alter the brightness, although it does compile completely. I would prefer the simplest approach, importing as few libraries as possible.
Third question: How do I make a new picture instead of changing the original? (I.e., lower the brightness by 20% so that "image-decreasedBrightness.jpg" is created from "image.jpg".)
Here is my code. Sorry it isn't formatted correctly; every time I tried to indent, it would tab down to the tags bar.
import Image
import ImageEnhance
fileToBeOpened = raw_input("What is the file name? Include file type.")
image = Image.open(fileToBeOpened)
def decreaseBrightness(image):
    image.show()
    image = image.convert('L')
    brightness = ImageEnhance.Brightness(image)
    image = brightness.enhance(20)
    image.show()
    return image
decreaseBrightness(image)
To save the image as a file, there's an example in the documentation:
from PIL import ImageFile
fp = open("lena.pgm", "rb")
p = ImageFile.Parser()
while 1:
    s = fp.read(1024)
    if not s:
        break
    p.feed(s)
im = p.close()
im.save("copy.jpg")
The key function is im.save.
For a more in-depth solution, get a nice beverage, find a comfortable place to sit and enjoy your read:
Pillow 3.4.x Documentation.
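For the second and third questions, here is a minimal sketch assuming Pillow is installed (imported as PIL). ImageEnhance.Brightness takes a factor where 1.0 leaves the image unchanged and 0.0 gives black, so a 20% reduction would be a factor of 0.8 rather than 20. The helper names are hypothetical, and the output naming scheme simply follows the example file name in the question.

```python
import os


def output_name(path, suffix="-decreasedBrightness"):
    """image.jpg -> image-decreasedBrightness.jpg"""
    root, ext = os.path.splitext(path)
    return root + suffix + ext


def decrease_brightness(path, factor=0.8):
    """Save a darker copy next to the original and return its file name."""
    # Imported here so output_name works even without Pillow installed.
    from PIL import Image, ImageEnhance

    image = Image.open(path)
    darker = ImageEnhance.Brightness(image).enhance(factor)
    new_path = output_name(path)
    darker.save(new_path)  # the original file is left untouched
    return new_path


print(output_name("image.jpg"))  # image-decreasedBrightness.jpg
```

Because enhance returns a new image object and save writes to a different name, the original file never changes.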
I'm learning OCR using PyTesser and Tesseract. As a first milestone, I want to write a tool that recognizes a captcha consisting only of digits. I read some tutorials and wrote the following test program.
from pytesser.pytesser import *
from PIL import Image, ImageFilter, ImageEnhance
im = Image.open("test.tiff")
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
im = im.convert('1')
text = image_to_string(im)
print "text={}".format(text)
I tested my code with the image below, but the result is 2(T?770. I've tested some other similar images as well, and in about 80% of cases the results are incorrect.
I'm not familiar with image processing. I have two questions:
Is it possible to tell PyTesser to guess digits only?
I think the image is quite easy for a human to read. If it is so difficult for PyTesser to read a digits-only image, are there any alternatives that do better OCR?
Any hints are much appreciated.
I think your code is fine; it can recognize 207770. The problem is with the pytesser installation: the Tesseract bundled with pytesser is out of date. You should download a more recent version and overwrite the corresponding files. You should also edit pytesser.py and change
tesseract_exe_name = 'tesseract'
to
import os.path
tesseract_exe_name = os.path.join(os.path.dirname(__file__), 'tesseract')
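To see what that change does: a bare 'tesseract' leaves it to the operating system to find the executable (typically via PATH), while the joined path always points at the copy that sits next to pytesser.py. A small sketch, where the function name bundled_exe and the directory /opt/pytesser are made-up examples:

```python
import os.path


def bundled_exe(module_file, exe_name="tesseract"):
    """Return the path of an executable stored beside the given module file."""
    return os.path.join(os.path.dirname(module_file), exe_name)


# Inside pytesser.py, module_file would be __file__.
print(bundled_exe(os.path.join("/opt", "pytesser", "pytesser.py")))
```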
Here's the code I'm using:
from PIL import Image
import ImageFont, ImageDraw
import sys
import pdb
img = Image.new("RGBA",(300,300))
draw = ImageDraw.Draw(img)
font = ImageFont.truetype(sys.argv[1],30)
draw.text((0,100),"world",font=font,fill="red")
del draw
img.save(sys.argv[2],"PNG")
and here's the image that results:
(For some reason I can't make the image show on SO, so here is the link: http://www.freeimagehosting.net/image.php?976a0d3eaa.png)
The thing is, I don't understand why it isn't drawing the font correctly. I should be able to read the word "world" off of it, but it looks as if the picture has been cut in half. Does anyone have a clue?
EDIT: after balpha's comment, I decided to try another font. I'm only interested in TTF fonts, so I tried another one, and it worked. This is kind of strange. The original font I tried to run this with is Beautiful ES. I'm curious whether you can reproduce the same image on your computers, and whether you happen to know why that is.
PIL uses the freetype2 library, so most likely it is an issue with the font file; for example, it could have bad metrics defined (e.g., look at the OS/2-related ones by opening the font in FontForge).
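One way to check that suspicion without FontForge is to compare what the font declares with what a string actually needs. This is a sketch assuming a recent Pillow (getbbox was added in Pillow 8.0). With the default anchor, getbbox is measured from the ascender line, so a negative top, or a bottom beyond ascent + descent, would suggest that the glyphs extend past the declared metrics. The helper names are hypothetical, and the check is a heuristic, not a full font validation.

```python
def exceeds_metrics(bbox, ascent, descent):
    """True if a text box (left, top, right, bottom), measured from the
    ascender line, reaches outside the font's declared ascent + descent."""
    _, top, _, bottom = bbox
    return top < 0 or bottom > ascent + descent


def check_font(font_path, text="world", size=30):
    """Load a TrueType font and report whether `text` overflows its metrics."""
    # Imported here so exceeds_metrics can be used without Pillow.
    from PIL import ImageFont

    font = ImageFont.truetype(font_path, size)
    ascent, descent = font.getmetrics()
    return exceeds_metrics(font.getbbox(text), ascent, descent)
```

Running check_font on both the failing font and the one that worked would show whether the two really differ in this respect.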