How to use PIL (pillow) to draw text in any language? - python

I'm rendering user-input text on a background image with Python PIL (I'm using Pillow).
The code is simple:
draw = ImageDraw.Draw(im)
draw.text((x, y), text, font=font, fill=font_color)
The problem is that the user may input text in any language. How can I determine which font to use?
P.S.: I know I have to have font files first, so I searched and found Google Noto, downloaded all the fonts, and put them in /usr/local/share/fonts/. But these fonts are separated by language, so I still can't load a single font that can render all user input.

Noto isn't a single font, it's a family of fonts (its CJK members are the same designs as Adobe's Source Han fonts, released under a different name because it's easier for Google to market them that way). When you go to download them, Google explicitly tells you that there are lots of different versions for lots of different target languages, for the two simple reasons that:
if you need to typeset the entire planet's set of known scripts, there are vastly more glyphs than fit in a single font (OpenType fonts have a hard limit of 65,535 glyphs per file because glyph IDs are encoded as 16-bit USHORT fields. Fonts are also compositional: the "letter" ℃ can actually be the letter C and the symbol °, so it relies on three glyphs: two real glyphs, and one virtual composition. You run out of space real fast that way), and
even if a font could fit all the glyphs, the same script may need to be rendered rather differently depending on the language it's used for, so even having a single font for both Chinese and Japanese, or for Arabic and Urdu, simply doesn't work. While OpenType fonts can cope with that by being told which variation sets and which compositional rules to use based on specific language tags, that is the kind of control that works great in InDesign or LaTeX, and is the worst thing for fonts that are going to be used in control-less contexts (like an Android webview, for instance).
So the proper solution is to grab all the fonts, and then pick the right one based on the {script, language} pair you're generating text for. Is that more complicated than what you're trying to do? Yes. Is it necessary? Equally yes =)
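A minimal sketch of that selection step, under some loud assumptions: the script of each character is guessed from the first word of its Unicode name (via the standard `unicodedata` module), and the font file names in `FONT_BY_SCRIPT` are illustrative guesses at how a Noto download might be named, not a verified list. It only looks at script, so it cannot make the Chinese-versus-Japanese choice described above; for that you would need a language tag from the user or the application.

```python
import unicodedata

# Hypothetical mapping: first word of the Unicode character name -> font file.
# The file names are assumptions; adjust them to whatever you actually installed.
FONT_BY_SCRIPT = {
    "LATIN": "NotoSans-Regular.ttf",
    "GREEK": "NotoSans-Regular.ttf",
    "CYRILLIC": "NotoSans-Regular.ttf",
    "ARABIC": "NotoSansArabic-Regular.ttf",
    "HEBREW": "NotoSansHebrew-Regular.ttf",
    "DEVANAGARI": "NotoSansDevanagari-Regular.ttf",
    "THAI": "NotoSansThai-Regular.ttf",
    "HANGUL": "NotoSansCJKkr-Regular.otf",
    "HIRAGANA": "NotoSansCJKjp-Regular.otf",
    "KATAKANA": "NotoSansCJKjp-Regular.otf",
    "CJK": "NotoSansCJKsc-Regular.otf",  # Han defaults to Simplified Chinese here
}
DEFAULT_FONT = "NotoSans-Regular.ttf"


def font_for_char(ch):
    """Return a font file for one character, or None if it is script-neutral."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation, combining marks: follow the neighbours
    script = unicodedata.name(ch, "").split(" ", 1)[0]
    return FONT_BY_SCRIPT.get(script, DEFAULT_FONT)


def split_runs(text):
    """Split text into (font_file, substring) runs that can be drawn one by one."""
    runs = []
    for ch in text:
        font = font_for_char(ch)
        if runs and (font is None or font == runs[-1][0]):
            runs[-1][1] += ch  # extend the current run
        else:
            runs.append([font or DEFAULT_FONT, ch])
    return [(font, run) for font, run in runs]


for font_file, run in split_runs("Hello 你好 مرحبا"):
    print(font_file, repr(run))
```

Each run can then be drawn with its own `ImageFont.truetype(...)` font, advancing x by the rendered width of the previous run (recent Pillow versions offer `ImageDraw.textlength` for that).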

Related

Rasterising a TTF font

I'm on the Raspberry Pi with a screen attached.
Rather than using X, I'm writing pixel data directly to the frame buffer. I've been able to draw images and primitive shapes, blend, use double buffering, etc...
Where I'm hitting a problem is drawing text. The screen is just a byte array at this level, so I need a way to take the font, size, text, etc. and convert it into a bitmap (actually, a bool[] and width/height would be preferable as it saves additional reads/writes).
I have no idea how to approach this.
Things I've considered so far...
Using a fixed-width font and an atlas/spritemap. Should work, I can already read images, however monospaced fonts have limited visual appeal. Also means adding more fonts is arduous.
Using a variable-width font, an atlas and a mask to indicate where each character is. Would support variable-width fonts, however, scaling would be lossy and it seems like a maintenance nightmare unless I can automate the atlas/mask generation.
Has anyone managed to do anything like this before?
If a library is required, I can live with that but as this is more an exercise in understanding my Pi than it is a serious project, I'd prefer an explanation/tutorial.
Consider using the Cairo graphics library, either for all your graphics, or as a tool to generate the font atlases. Cairo has extensive support for rendering text using TTF fonts, as well as for other useful graphics operations.
At a lower level, you could also use the FreeType library to load fonts and render characters from them directly. It's more difficult to work with, though.
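Whichever library does the rendering, a monochrome glyph may arrive as a packed 1-bit-per-pixel buffer. FreeType's monochrome bitmaps, for example, store eight pixels per byte, most significant bit first, with each row padded to `pitch` bytes. Here is a minimal sketch of turning that layout into the bool grid the question asks for; the function name and the hand-written 5x5 glyph are assumptions for illustration, not real FreeType output:

```python
def unpack_mono_bitmap(buffer, width, rows, pitch):
    """Expand a packed 1-bit bitmap (MSB first, `pitch` bytes per row) into rows of bools."""
    pixels = []
    for y in range(rows):
        row_bytes = buffer[y * pitch:(y + 1) * pitch]
        pixels.append([
            bool(row_bytes[x // 8] & (0x80 >> (x % 8)))  # test bit x of this row
            for x in range(width)
        ])
    return pixels


# Hand-made 5x5 letter "T"; only the top 5 bits of each byte are used.
glyph = bytes([0b11111000, 0b00100000, 0b00100000, 0b00100000, 0b00100000])
grid = unpack_mono_bitmap(glyph, width=5, rows=5, pitch=1)
art = ["".join("#" if on else "." for on in row) for row in grid]
print("\n".join(art))
```

The resulting grid plus its width and height is all that is needed to blit the glyph into the frame buffer with whatever colour you like.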

How does Photoshop convert type format to a rasterized layer?

I have been thinking of fonts quite recently. I find the whole process of a keystroke converted to a character displayed in a particular font quite fascinating. What fascinates me more is that each character is not an image but just the right bunch of pixels switched on (or off).
In Photoshop when I make a text layer, I am assuming it's like any other text in a word processor. There's a glyph attached to a character and that is displayed. So technically it's still not an 'image', so to speak, and it can be treated as text in a word processor. However, when you rasterize the text layer, an image of the text is created with the font that was used. Can somebody tell me how Photoshop does this? I am assuming there should be a lookup table with the characters' graphics which Photoshop accesses to rasterize the layer.
I want to kind of create a program where I generate an image of the character that I am pressing (in C or Python or something like that). Is there a way to do this?
Adobe currently has publicly accessible documentation for the Photoshop file format. I've needed to extract information from PSD files (about a year ago, but actually the ancient CS2 version of Photoshop) so I can warn you that this isn't light reading, and there are some parts (at least in the CS2 documentation) that are incomplete or inaccurate. Usually, even when you have file format documentation, you need to do some reverse engineering to work with that file format.
Even so, see here for info about the TySh chunk from Photoshop 6.0 (not sure at a quick glance if it's still the current form for text - "type" to Photoshop).
Anyway, yes - text is stored as a sequence of character codes in memory and in the file. Fonts are basically collections of vector artwork, so that text can be converted to vector paths. That can be done either by dealing with the font files yourself, using an operating system call (there's definitely one for Windows, but I don't remember the name, it's bugging me now so I might figure it out later), or using a library.
Once you have the vector form, that's basically Bezier paths just like any other vector artwork, and can be rendered the same way.
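To make the "vector form to pixels" step concrete, here is a minimal sketch, not how Photoshop or any real font engine does it. It flattens one quadratic Bezier curve (the curve type used by TrueType outlines) into line segments and fills the closed shape with an even-odd scanline test sampled at pixel centres. The outline coordinates are made up, and real rasterizers such as FreeType additionally handle hinting, anti-aliasing and proper winding rules.

```python
def flatten_quadratic(p0, p1, p2, steps=8):
    """Approximate a quadratic Bezier curve (start, control, end) by a list of points."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
        points.append((x, y))
    return points


def fill_polygon(polygon, width, height):
    """Fill a closed polygon with the even-odd rule, sampling at pixel centres."""
    grid = [[False] * width for _ in range(height)]
    n = len(polygon)
    for row in range(height):
        y = row + 0.5
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
            if (y0 <= y) != (y1 <= y):  # this edge crosses the scanline
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Pixels between each pair of crossings are inside the shape.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    grid[row][col] = True
    return grid


# A made-up "D"-like outline: one curve, closed by a straight edge back to the start.
outline = flatten_quadratic((1, 1), (9, 4), (1, 7))
grid = fill_polygon(outline, width=6, height=8)
art = ["".join("#" if on else "." for on in row) for row in grid]
print("\n".join(art))
```

A full glyph is just several such contours, so the same fill loop applies once every curve of the outline has been flattened.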
Or to go directly from text to pixels, you just ask e.g. Windows to draw the text for you - perhaps to a memory DC (device context) if you don't want to draw to the screen.
FreeType is an open source library for working with fonts. It can definitely render to a bitmap. I haven't checked but it can probably convert text to vector paths too - after all it needs to do that as part of rendering to pixels anyway.
Cairo is another obvious library to look at for font handling and much more, but I've never used it directly myself.
wxWidgets is yet another obvious library to look at, and uses a memory-DC scheme similar to that for Windows, though I don't remember exact class/method names. Converting text to vectors might be outside wxWidgets' scope, though.

Tell if text of PDF is visible or not

I'm parsing some PDF files using the pdfminer library.
I need to know if the document is a scanned document, where the scanning machine places the scanned image on top and OCR-extracted text in the background.
Is there a way to identify whether text is visible, given that OCR machines do place it on the page for selection?
Generally the problem is distinguishing between two very different, but similar looking cases.
In one case there's an image of a scanned document that covers most of the page, with the OCR text behind it.
Here's the PDF as text with the image truncated: http://pastebin.com/a3nc9ZrG
In the other case there's a background image that covers most of the page with the text in front of it.
Telling them apart is proving difficult for me.
Your question is a bit confusing so I'm not really sure what is going to help you the most. However, you describe two ways to "hide" text from OCR. Both I think are detectable but one is much easier than the other.
Hidden text
Hidden text is regular or invisible text that is placed behind something else. In other words, you use the stacking order of objects to hide some of them. The only way you can detect this type of case is by figuring out where all of the text objects on the page are (calculating their bounding boxes isn't trivial but certainly possible) and then figuring out whether any of the images on the page overlaps that text and is in front of it. Some additional comments:
Theoretically it could be something other than an image hiding it, but in your OCR case I would guess it's always an image.
Though an image may be overlapping it, it may also be transparent in some way. In that case, the text that is underneath may still shine through. In your case of a general OCR engine, that is probably unlikely.
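The overlap test itself is simple once the layout information exists. A minimal sketch, assuming you have already extracted each object's bounding box and its position in the paint order; the `PageObject` type and its fields are hypothetical, and transparency and partial overlaps are ignored:

```python
from typing import NamedTuple


class PageObject(NamedTuple):
    kind: str    # "text" or "image"
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    order: int   # position in the paint order; larger means painted later


def covers(outer, inner):
    """True if rectangle `outer` fully contains rectangle `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def hidden_text(objects):
    """Return text objects fully covered by an image painted after them."""
    images = [o for o in objects if o.kind == "image"]
    return [
        t for t in objects
        if t.kind == "text"
        and any(i.order > t.order and covers(i.bbox, t.bbox) for i in images)
    ]


page = [
    PageObject("text", (50, 700, 200, 712), order=0),   # painted first, then covered
    PageObject("image", (0, 0, 612, 792), order=1),     # full-page scan
    PageObject("text", (50, 650, 200, 662), order=2),   # painted on top of the scan
]
print(hidden_text(page))
```

If most of a page's text ends up in the hidden list, the page is probably a scan with OCR text behind it.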
Invisible text
PDF supports invisible text. More precisely, PDF supports different text rendering modes; those rendering modes determine whether characters are filled, outlined, filled + outlined, or invisible (there are other possibilities as well). In the PDF file you posted, you find this fragment:
BT
3 Tr
0.00 Tc
/F3 8.5 Tf
1 0 0 1 42.48 762.96 Tm
(Chicken ) Tj
That's an invisible chicken right there! The instruction "3 Tr" sets the text rendering mode to "3", which is equal to "invisible" or "neither stroked nor filled" as the PDF specification very elegantly puts it.
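A minimal sketch of detecting this in an already-decompressed content stream: track the most recent `Tr` operand and collect the strings shown with `Tj` while the mode is 3. The tokenizer is deliberately naive and is an assumption of this example, not part of pdfminer; it ignores `TJ` arrays, nested parentheses, hex strings and `q`/`Q` state saving, all of which a real parser has to handle.

```python
import re

# A literal string in parentheses (no nesting), or any run of other non-space characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")


def invisible_strings(content):
    """Return the strings shown with Tj while text rendering mode 3 is active."""
    mode = 0        # PDF's default rendering mode is 0 (fill)
    operands = []
    hidden = []
    for token in TOKEN.findall(content):
        if token[0] in "(/+-." or token[0].isdigit():
            operands.append(token)  # string, name or number: an operand
            continue
        # Anything else is treated as an operator that consumes the operands.
        if token == "Tr" and operands:
            mode = int(float(operands[-1]))
        elif token == "Tj" and operands and mode == 3:
            hidden.append(operands[-1][1:-1])  # strip the parentheses
        operands = []
    return hidden


fragment = """
BT
3 Tr
0.00 Tc
/F3 8.5 Tf
1 0 0 1 42.48 762.96 Tm
(Chicken ) Tj
"""
print(invisible_strings(fragment))
```

Run on the fragment above, it reports the chicken; a stream that sets `0 Tr` before its `Tj` reports nothing.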
It's worthwhile mentioning that these two techniques can be used interchangeably by OCR engines. Placing invisible text on top of a scanned image is actually good practice because it means that most PDF viewers will allow you to select the text. Some PDF viewers that I looked at at some point didn't allow text selection if the text was "behind" the image.
I don't have a copy of the PDF 1.7 specification, but I suspect that the objects on a page are rendered in order, that is, the preceding objects end up covered up by succeeding objects.
Thus, you would have to iterate through the layout objects (See Performing Layout Analysis) and calculate where everything falls on the page, their dimensions, and their rendering order (and possibly their transparency).
As the pdfminer documentation mentions, PDF is evil.

Extract text from screen in python

Is there a library for extracting text from a PNG screenshot?
It is for an automation tool that would, for example, need to read the text on buttons. I've checked Tesseract, but it seems to be made for pictures, not computer screen fonts.
If you're dealing with a small amount of possible matches (i.e.: you want to recognize two or three different buttons), the simplest way is to isolate those in a previous screenshot, save them to individual files, and then use some form of template matching, which is quite easy in opencv.
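In practice you would load the screenshots with OpenCV and call `cv2.matchTemplate`, but the idea behind it fits in a few lines. Here is a pure-Python sketch of sum-of-squared-differences matching on tiny grayscale grids; the function name and sample data are made up for illustration:

```python
def find_template(image, template):
    """Return (row, col, score) of the best match; a score of 0 is a pixel-exact match."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best = None
    for r in range(image_h - tmpl_h + 1):
        for c in range(image_w - tmpl_w + 1):
            # Sum of squared differences between the template and this window.
            score = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(tmpl_h)
                for dc in range(tmpl_w)
            )
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


screenshot = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
button = [
    [9, 1],
    [1, 9],
]
print(find_template(screenshot, button))
```

A score threshold then decides whether the button is on screen at all, which is what makes this approach workable for a small, known set of buttons.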
If, however, you need to actually perform recognition of the button text, you're going to need an OCR engine. Tesseract is a good candidate, if you can get it trained for your font (it's a lengthy process). As you mention, you'll need to do this if you're dealing with a small font, which Tesseract is not originally trained to recognize. If you can't, there are a couple of other engines usable from Python, like OCRopus.

ocr'ing application text (not scanned, NOT captchas)

I'd like to interface an application by reading the text it displays.
I've had success in some applications when Windows isn't doing any font smoothing: I type in a phrase manually, render it in all Windows fonts, and find a match. From there I can map each letter image to a letter by generating all letters in the font.
This won't work if any font smoothing is being done, though, either by Windows or by the application. What's the state of the art like in OCRing computer-generated text? It seems like it should be easier than breaking CAPTCHAs or OCRing scanned text. Where can I find resources about this? So far I've only found articles on CAPTCHA breaking or OCRing scanned text.
I prefer solutions easily accessible from Python, though if there's a good one in some other lang I'll do the work to interface it.
I'm not exactly sure what you mean, but I think just reading the text with an OCR program would work well.
Tesseract is amazingly accurate for scanned documents, so a specific font would be a breeze for it to read. Here's my Python OCR solution: Python OCR Module in Linux?.
But you could generate each character as an image and find the locations on the image. It might work, but I have no idea how accurate it would be with smoothing.
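A minimal sketch of that glyph-lookup idea for the unsmoothed case: cut a line of text at fully blank pixel columns and look each piece up in a table of known glyph bitmaps. The 3x3 "font" below is invented for the example; in a real program the table would be generated by rendering every character of the target font, and the exact-match lookup would break as soon as smoothing changes the pixels.

```python
# Hypothetical glyph table: bitmap rows -> character.
GLYPHS = {
    ("#.#", "###", "#.#"): "H",
    ("###", ".#.", "###"): "I",
}


def split_columns(rows):
    """Split a line image into glyph images at fully blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in rows)
        if not blank and start is None:
            start = x  # a glyph begins here
        elif blank and start is not None:
            glyphs.append(tuple(row[start:x] for row in rows))
            start = None
    return glyphs


def read_line(rows):
    """Recognise a line by exact glyph lookup; unknown shapes become '?'."""
    return "".join(GLYPHS.get(glyph, "?") for glyph in split_columns(rows))


line = [
    "#.#.###",
    "###..#.",
    "#.#.###",
]
print(read_line(line))
```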
