Python text to image generation problems - python

I'm using PIL to load in various fonts and draw text to images. At the basic level, it all works.
However, I am running into a number of problems, such as letters being clipped (mainly cursive or stylistic fonts with lots of tails and such). textsize() does return width/height values, yet letters are still clipped. There also don't seem to be any methods in PIL to specify larger image sizes for the character generation. Another issue is the vertical spacing: PIL seems to return large height values for certain fonts, so the vertical spacing between lines is overly large.
I'm in search of a more advanced font and text handling system than PIL, given its apparent limitations.
I've been researching this a lot over the last week (Google, Python docs, Stack Overflow, etc.) and I've seen people recommending either ImageMagick or a combination of Pango and Cairo. However, as much as I've read and searched for these technologies, I am simply not finding any usable documentation that pertains to what I am trying to do. There are some Python bindings for ImageMagick, but they all seem several years out of date.
Can some of the helpful souls here on SO point me to some tutorials on how to use Pango/Cairo and/or Imagemagick?

The Cairo cookbook has a number of examples for using Cairo, and the Python routines are almost mirror images of the C routines.

I've had some fine results with PyGame, but I don't know if it will necessarily solve your problem.
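If you stay with PIL/Pillow, one likely cause of the clipping is that decorative glyphs extend to the left of or above the drawing origin, so a canvas sized from width/height alone cuts them off. Below is a minimal sketch of the arithmetic, assuming you can obtain a bounding box in (left, top, right, bottom) form with the text anchored at (0, 0). The helper name canvas_for_text and the padding default are made up for illustration.

```python
def canvas_for_text(bbox, padding=4):
    """Return ((width, height), (x, y)) for drawing text without clipping.

    bbox is (left, top, right, bottom), measured with the text anchored at
    (0, 0). For fonts with long tails, left and top may be negative.
    """
    left, top, right, bottom = bbox
    size = (right - left + 2 * padding, bottom - top + 2 * padding)
    # Shift the drawing origin so the box's top-left lands at (padding, padding).
    origin = (padding - left, padding - top)
    return size, origin


# Hypothetical measurement: ink starts 3 px left of the origin and 5 px below it.
size, origin = canvas_for_text((-3, 5, 40, 30), padding=2)
print(size, origin)
```

In recent Pillow releases the box can come from ImageDraw.Draw(img).textbbox((0, 0), text, font=font), and the text is then drawn at the returned origin. For the line-spacing complaint, taking a fixed line pitch from font.getmetrics() (ascent plus descent) instead of each line's measured height may give more even results, though that depends on the font.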

Related

Rasterising a TTF font

I'm on the Raspberry Pi with a screen attached.
Rather than using X, I'm writing pixel data directly to the frame buffer. I've been able to draw images and primitive shapes, blend, use double buffering, etc...
Where I'm hitting a problem is drawing text. The screen is just a byte array at this level, so I need a way to take the font, size, text, etc. and convert it into a bitmap (actually, a bool[] and width/height would be preferable, as it saves additional reads/writes).
I have no idea how to approach this.
Things I've considered so far...
Using a fixed-width font and an atlas/spritemap. Should work, since I can already read images; however, monospaced fonts have limited visual appeal. It also means adding more fonts is arduous.
Using an atlas plus a mask to indicate where each character is. Would support variable-width fonts; however, scaling would be lossy and it seems like a maintenance nightmare unless I can automate the atlas/mask generation.
Has anyone managed to do anything like this before?
If a library is required, I can live with that but as this is more an exercise in understanding my Pi than it is a serious project, I'd prefer an explanation/tutorial.
Consider using the Cairo graphics library, either for all your graphics, or as a tool to generate the font atlases. Cairo has extensive support for rendering text using TTF fonts, as well as for other useful graphics operations.
At a lower level, you could also use the Freetype library to load fonts and render characters from them directly. It's more difficult to work with, though.
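Whichever library produces the atlas, the atlas-plus-glyph-table idea from the question can be sketched in a few lines. This is only an illustration under stated assumptions: the atlas here is hand-written, the GLYPHS table (character to x offset and width) is a hypothetical format, and the framebuffer is taken to be a bytearray with one byte per pixel in row-major order.

```python
# Tiny hand-made atlas: every glyph is 5 rows tall, '#' marks an ink pixel.
ATLAS = [
    "#.#.###",
    "#.#..#.",
    "###..#.",
    "#.#..#.",
    "#.#.###",
]
# Hypothetical glyph table: character -> (x offset in atlas, width in pixels).
GLYPHS = {"H": (0, 3), "I": (4, 3)}


def draw_text(fb, fb_width, text, x, y, colour=255, spacing=1):
    """Blit text into fb (bytearray, one byte per pixel, row-major).

    Returns the x position where the next glyph would start.
    """
    fb_height = len(fb) // fb_width
    for ch in text:
        gx, gw = GLYPHS[ch]
        for row, line in enumerate(ATLAS):
            for col in range(gw):
                px, py = x + col, y + row
                inside = 0 <= px < fb_width and 0 <= py < fb_height
                if inside and line[gx + col] == "#":
                    fb[py * fb_width + px] = colour
        x += gw + spacing  # advance by this glyph's own width
    return x


fb = bytearray(8 * 5)  # 8x5 pixel "screen"
draw_text(fb, 8, "HI", 0, 0)
for r in range(5):
    print("".join("#" if fb[r * 8 + c] else "." for c in range(8)))
```

Because each glyph carries its own width, variable-width fonts work the same way as monospaced ones. Generating ATLAS and GLYPHS automatically from a TTF is the part a library such as Cairo or FreeType would take over.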

Dynamic font sizing/layout in Python/PIL

I have a problem where I need to programmatically lay out text and output a raster image. My initial approach is based around Python and PIL (or Pillow), however I am reasonably language agnostic (as long as it runs on Linux).
I have a list of several thousand long strings, roughly a paragraph each. The naive approach is to use Python's textwrap and PIL's font.getsize() and iterate to find the optimal size, but this seems inefficient to me: there are a lot of strings, and this is potentially running on a Raspberry Pi.
I feel that this is probably a solved problem, but I haven't been able to find a decent solution - I'm not tied to Python/PIL if another stack has a better solution (something in LaTeX? Even matplotlib or something?).
Flexibility to achieve more complex layouts would be a bonus, as well - for example, down the track I would like to treat one part of text as a special case, by increasing the font size and flowing the other text around it.
Any pointers or ideas greatly appreciated.
I would use the cairo (2D graphics) and pango ("pretty" text formatting/layout) libraries (they both have bindings for Python):
http://cairographics.org/tutorial/
http://zetcode.com/gui/pygtk/pangoII/
http://cairographics.org/pycairo_pango/
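If you keep the textwrap-based approach from the question, the search over font sizes does not have to be linear. Assuming that a larger size never fits when a smaller one fails, the largest size that fits can be found by bisection. The sketch below uses a stand-in measure function with made-up proportions (0.5 em per character, 1.2 em per line). The names fits, largest_size and fake_measure are illustrative, and a real version would take its metrics from the font.

```python
import textwrap


def fits(text, size, box_w, box_h, measure):
    """True if text, wrapped at this font size, fits inside the box."""
    char_w, line_h = measure(size)
    cols = int(box_w // char_w)
    if cols < 1:
        return False
    lines = textwrap.wrap(text, width=cols)
    return len(lines) * line_h <= box_h


def largest_size(text, box_w, box_h, measure, lo=4, hi=200):
    """Bisect for the largest integer size that fits; None if even lo fails."""
    if not fits(text, lo, box_w, box_h, measure):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fits(text, mid, box_w, box_h, measure):
            lo = mid      # mid fits, so look for something larger
        else:
            hi = mid - 1  # mid is too big
    return lo


def fake_measure(size):
    """Stand-in metrics (an assumption): (average char width, line height)."""
    return 0.5 * size, 1.2 * size


paragraph = "lorem ipsum dolor sit amet " * 10
print(largest_size(paragraph, 300, 200, fake_measure))
```

With the default bounds this needs about eight wrap passes per string rather than one per candidate size. Pango's layout objects do the wrapping themselves, so with that stack only the size search would remain.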

Python: Manipulating a 16-bit .tiff image in PIL &/or pygame: convert to 8-bit somehow?

Hello all,
I am working on a program which determines the average colony size of yeast from a photograph, and it is working fine with the .bmp images I tested it on. The program uses pygame, and might use PIL later.
However, the camera/software combo we use in my lab will only save 16-bit grayscale TIFFs, and pygame does not seem to be able to recognize 16-bit TIFFs, only 8-bit ones. I have been reading up for the last few hours on easy ways around this, but even the Python Imaging Library does not seem to be able to work with 16-bit TIFFs: when I try, I get "IOError: cannot identify image file".
import Image
img = Image.open("01 WT mm.tif")
My ultimate goal is to have this program be user-friendly and easy to install, so I'm trying to avoid adding additional modules or requiring people to install ImageMagick or something.
Does anyone know a simple workaround to this problem using freeware or pure Python? I don't know too much about images: bit-depth manipulation is out of my scope. But I am fairly sure that I don't need all 16 bits, and that probably only around 8 actually have real data anyway. In fact, I once used ImageMagick to try to convert them, and this resulted in an all-white image: I've since read that I should use the option "-auto-level" because the data does not actually encompass the 16-bit range.
I greatly appreciate your help, and apologize for my lack of knowledge.
P.S.: Does anyone have any tips on how to make my Python program easy for non-programmers to install? Is there a way, for example, to somehow bundle it with Python and pygame so it's only one install? Can this be done for both Windows and Mac? Thank you.
EDIT: I tried to open it in GIMP, and got 3 errors:
1) Incorrect count for field "DateTime" (27, expecting 20); tag trimmed
2) Sorry, can not handle images with 12-bit samples
3) Unsupported layout, no RGBA loader
What does this mean and how do I fix it?
py2exe is the way to go for packaging up your application if you are on a Windows system.
Regarding the 16-bit TIFF issue:
This example http://ubuntuforums.org/showthread.php?t=1483265 shows how to convert for display using PIL.
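Once the raw samples are in memory, the auto-level idea from the question amounts to a linear stretch of the range the data actually occupies onto 0..255. Here is a minimal pure-Python sketch, assuming the pixel values have already been decoded into a flat sequence of integers; decoding the TIFF itself is not shown, and the function name stretch_to_8bit is made up.

```python
def stretch_to_8bit(samples):
    """Linearly map the occupied range of samples onto 0..255."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return bytes(len(samples))  # flat image: nothing to stretch
    span = hi - lo
    return bytes(round((s - lo) * 255 / span) for s in samples)


# Made-up readings: 12-bit data (0..4095) stored in 16-bit words.
pixels = [100, 1100, 2100, 4095]
print(list(stretch_to_8bit(pixels)))
```

This only produces an image for display. For the colony-size measurement itself, it is probably better to keep working on the original values.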
Now for the unasked portion of the question: when doing image analysis, you want to maintain the highest dynamic range possible for as long as possible in your image manipulations; you lose less information that way. As you may or may not be aware, PIL provides many filters/transforms that allow you to enhance the contrast of an image, even out light levels, or perform edge detection. A future direction you might want to consider is displaying the original image (scaled to 8 bits, of course) alongside a scaled image that has been processed for edge detection.
Check out http://code.google.com/p/pyimp/wiki/screenshots for some more examples and sample code.
I would look at pylibtiff, which has a pure python tiff reader.
For bundling, your best bet is probably py2exe and py2app.
This is actually a two-part question:
1) 16-bit image data mangling for Python - I usually use GDAL + NumPy. This might be a bit too much for your requirements; you can use PIL + NumPy instead.
2) Release engineering for Python apps can get messy. Depending on how complex your app is, you can get away with py2deb, py2app and py2exe. Learning distutils will help too.

Designing an open source OCR engine specifically for rendered text (screenshots)

So my current personal project is to be able to automatically grab screenshots out of a game, OCR the text, and count the number of occurrences of given words.
Having spent all evening looking around at different OCR solutions, I've come to realize that the majority of OCR packages out there are designed for scanned text. If there are any packages that can read screen text reliably, they're well outside this hobbyist's budget.
I've been reading through some other questions, and the closest I found was OCR engines designed for screen-reading.
It seems to me that reading rendered text should be much easier than printed and scanned text. Lines are always straight, and any given letter will always appear with the exact same pixel representation (mostly, anyways). Also, why not use the actual font file (if you have it) as a cheat sheet to recognizing characters? We might actually reach 100% accuracy with a system like this.
Assuming you have the font file for a cheat sheet and your source image is perfectly square and has no noise, how would you go about recognizing characters from the screen?
(Problems I can foresee are UI lines and images that could confuse any crude attempt at pixel-guessing.)
If you already know of a free/open-source OCR package designed for screen-reading, please let me know. I kind of doubt that's going to show up though, as no other askers seem to have gotten a lead either.
A Python interface is preferred, but beggars can't be choosers.
EDIT:
To clarify, I'm looking for design suggestions for an OCR solution that is specifically designed to read text from screenshots. Popular tools like tesseract (mentioned in the question I linked) are hard to use at best because they are not designed for this kind of source file.
So I've been thinking about it and I feel that the best approach will be to count the number of pixels in each blob/glyph/character. This should really cut down on the number of tests I need to do to differentiate between glyphs.
Regretfully, I'll have to be very specific about fonts. The software will only be able to recognize fonts at the right dpi, for the right font face and weight, etc.
It isn't ideal, and I'd still like to see someone who knows more about this stuff design OCR for rendered text; but it will work for my limited case.
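The pixel-counting idea can be sketched as follows. This is a rough illustration rather than a tested OCR design: it assumes a clean binary text line, glyphs separated by at least one blank column, and reference glyphs rasterised from the known font at exactly the same size and line height. The tiny REFERENCE table and all function names are made up.

```python
def split_glyphs(rows):
    """Split a text line (equal-length strings, '#' = ink) at blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(r[x] != "#" for r in rows)
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            glyphs.append(tuple(r[start:x] for r in rows))
            start = None
    return glyphs


def ink(glyph):
    """Number of set pixels in a glyph."""
    return sum(row.count("#") for row in glyph)


def build_index(reference):
    """Map ink count -> list of (char, glyph) taken from the known font."""
    index = {}
    for ch, glyph in reference.items():
        index.setdefault(ink(glyph), []).append((ch, glyph))
    return index


def recognise(rows, index):
    out = []
    for glyph in split_glyphs(rows):
        # The pixel count narrows the candidates; exact comparison settles ties.
        match = [ch for ch, ref in index.get(ink(glyph), []) if ref == glyph]
        out.append(match[0] if match else "?")
    return "".join(out)


# Toy reference glyphs standing in for ones rasterised from the real font.
REFERENCE = {
    "L": ("#..", "#..", "###"),
    "T": ("###", ".#.", ".#."),
    "I": ("#", "#", "#"),
}
line = ["#...###.#", "#....#..#", "###..#..#"]
print(recognise(line, build_index(REFERENCE)))
```

In this toy font "L" and "T" have the same pixel count, which shows why the count alone only narrows the search and a bitmap comparison is still needed afterwards. Anti-aliased or kerned text would break the blank-column split and would need a different segmentation step.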
If your goal is to count occurrences of certain events in a game, OCR is really not the right way to go about it. That said, if you are determined to use OCR, then tesseract-OCR is a well-known open-source package for performing optical character recognition. I'm not really sure what you are getting at with respect to scanned vs. rendered text, but tesseract will probably do as good a job as any open-source package that is available. OCR is still a tricky art, so I wouldn't expect 100% accuracy.
This isn't exactly what you want, but you may want to look at Sikuli.

PIL vs RMagick/ruby-gd

For my next project I plan to create images with text and graphics. I'm comfortable with Ruby, but interested in learning Python. I figured this may be a good time because PIL looks like a great library to use. However, I don't know how it compares to what Ruby has to offer (e.g. RMagick and ruby-gd). From what I can gather, PIL has better documentation (does ruby-gd even have a homepage?) and more features. Just wanted to hear a few opinions to help me decide.
Thanks.
Vince
PIL is a good library, use it. ImageMagick (what RMagick wraps) is a very heavy library that should be avoided if possible. It's good for doing local processing of images, say, a batch photo editor, but way too processor-inefficient for common image manipulation tasks for the web.
EDIT: In response to the question, PIL supports drawing vector shapes. It can draw polygons, curves, lines, fills and text. I've used it in a project to produce rounded alpha corners to PNG images on the fly over the web. It essentially has most of the drawing features of GDI+ (in Windows) or GTK (in Gnome on Linux).
PIL has been around for a long time and is very stable, so it's probably a good candidate for your first Python project. The PIL documentation includes a helpful tutorial, which should get you up to speed quickly.
ImageMagick is a huge library and will do everything under the sun, but many report memory issues with the RMagick variant and I have personally found it to be overkill for my needs.
As you say, ruby-gd is a little thin on the ground when it comes to English documentation... but GD is a doddle to install on most platforms and there is a little wrapper with some helpful examples called gruby that's worth a look. (If you're after alpha transparency, make sure you install the latest GD lib.)
For overall community and blog help, PIL's the way.
