I have a problem where I need to programmatically lay out text and output a raster image. My initial approach is based on Python and PIL (or Pillow); however, I am reasonably language-agnostic (as long as it runs on Linux).
I have a list of several thousand long strings, roughly a paragraph each. The naive approach is to use Python's textwrap and PIL's font.getsize() and iterate to find the optimal size, but this seems inefficient to me - there are a lot of strings, and this is potentially running on a Raspberry Pi.
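For context, here is roughly what that naive approach looks like (a sketch only; the font path, box dimensions, and size range are placeholders, and newer Pillow versions replace getsize() with getbbox()/getlength()):

```python
import textwrap
from PIL import ImageFont

def fit_paragraph(text, box_w, box_h,
                  font_path="/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"):
    """Shrink the font size until the wrapped paragraph fits the box."""
    for size in range(48, 5, -1):
        font = ImageFont.truetype(font_path, size)
        avg_char_w = max(1, font.getsize("n")[0])          # crude average glyph width
        lines = textwrap.wrap(text, width=max(1, box_w // avg_char_w))
        line_h = font.getsize("Ay")[1]
        fits_w = all(font.getsize(line)[0] <= box_w for line in lines)
        if fits_w and line_h * len(lines) <= box_h:
            return font, lines
    return font, lines  # smallest size tried, even if it still overflows
```

Doing that measure-and-retry loop for thousands of strings is exactly the inefficiency I'm worried about.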
I feel that this is probably a solved problem, but I haven't been able to find a decent solution - I'm not tied to Python/PIL if another stack has a better solution (something in LaTeX? Even matplotlib or something?).
Flexibility to achieve more complex layouts would be a bonus as well - for example, down the track I would like to treat one part of the text as a special case, increasing its font size and flowing the rest of the text around it.
Any pointers or ideas greatly appreciated.
I would use the cairo (2D graphics) and pango ("pretty" text formatting/layout) libraries (they both have bindings for Python):
http://cairographics.org/tutorial/
http://zetcode.com/gui/pygtk/pangoII/
http://cairographics.org/pycairo_pango/
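For example, a minimal sketch with pycairo plus Pango via PyGObject (assuming both bindings are installed; the sizes and sample string are arbitrary):

```python
import cairo
import gi
gi.require_version("Pango", "1.0")
gi.require_version("PangoCairo", "1.0")
from gi.repository import Pango, PangoCairo

WIDTH, HEIGHT = 400, 220
surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, WIDTH, HEIGHT)
ctx = cairo.Context(surface)
ctx.set_source_rgb(1, 1, 1)
ctx.paint()                                              # white background

layout = PangoCairo.create_layout(ctx)
layout.set_font_description(Pango.FontDescription("Sans 14"))
layout.set_width(Pango.units_from_double(WIDTH - 20))    # wrap width, in Pango units
layout.set_wrap(Pango.WrapMode.WORD)
layout.set_text("One of the several thousand paragraph-length strings goes here...", -1)

ctx.set_source_rgb(0, 0, 0)
ctx.move_to(10, 10)
PangoCairo.show_layout(ctx, layout)
surface.write_to_png("paragraph.png")
```

Pango does the wrapping for you, and layout.get_pixel_size() gives the rendered width/height, so "does it fit?" becomes a single query per string instead of an iterate-and-measure loop.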
Related
I have been thinking about fonts quite a lot recently. I find the whole process of a keystroke being converted to a character displayed in a particular font quite fascinating. What fascinates me more is that each character is not an image but just the right bunch of pixels switched on (or off).
In Photoshop, when I make a text layer, I assume it's like any other text layer in a word processor: there's a glyph attached to a character, and that is what gets displayed. So technically it's still not an 'image', so to speak, and it can be treated like text in a word processor. However, when you rasterize the text layer, an image of the text is created with the font that was used. Can somebody tell me how Photoshop does this? I am assuming there should be a lookup table with the characters' graphics that Photoshop accesses to rasterize the layer.
I want to create a program that generates an image of the character I am pressing (in C or Python or something like that). Is there a way to do this?
Adobe currently has publicly accessible documentation for the Photoshop file format. I've needed to extract information from PSD files (about a year ago, but actually the ancient CS2 version of Photoshop) so I can warn you that this isn't light reading, and there are some parts (at least in the CS2 documentation) that are incomplete or inaccurate. Usually, even when you have file format documentation, you need to do some reverse engineering to work with that file format.
Even so, see here for info about the TySh chunk from Photoshop 6.0 (not sure at a quick glance if it's still the current form for text - "type" to Photoshop).
Anyway, yes - text is stored as a sequence of character codes in memory and in the file. Fonts are basically collections of vector artwork, so that text can be converted to vector paths. That can be done either by dealing with the font files yourself, by using an operating system call (there's definitely one for Windows, but I don't remember the name - it's bugging me now, so I might figure it out later), or by using a library.
Once you have the vector form, that's basically Bezier paths just like any other vector artwork, and can be rendered the same way.
Or to go directly from text to pixels, you just ask e.g. Windows to draw the text for you - perhaps to a memory DC (device context) if you don't want to draw to the screen.
FreeType is an open source library for working with fonts. It can definitely render to a bitmap. I haven't checked but it can probably convert text to vector paths too - after all it needs to do that as part of rendering to pixels anyway.
Cairo is another obvious library to look at for font handling and much more, but I've never used it directly myself.
wxWidgets is yet another obvious library to look at, and it uses a memory-DC scheme similar to the one for Windows, though I don't remember the exact class/method names. Converting text to vectors might be outside wxWidgets' scope, though.
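If you want to see the "character code to pixels" step in a few lines of Python, the freetype-py binding for FreeType makes it quite direct (a sketch, assuming the binding and a DejaVu font file are installed):

```python
import freetype

face = freetype.Face("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf")
face.set_char_size(48 * 64)                       # 48 pt, in 1/64th-point units
face.load_char("A", freetype.FT_LOAD_RENDER)      # render to an 8-bit grayscale bitmap

bitmap = face.glyph.bitmap
print(bitmap.width, bitmap.rows)                  # pixel dimensions of the glyph
# bitmap.buffer is a flat list of coverage values, row by row
first_row = bitmap.buffer[:bitmap.width]
print("".join("#" if v > 128 else "." for v in first_row))
```

The same face object can also give you the glyph's outline (vector form) instead of a bitmap, which is the path-based route described above.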
I'm going to be creating visual representations of numeric sums, which can have contents such as:
constants, functions, operators and arguments.
I would like to be able to represent each of those things separately, with adjustable properties such as line width, size, font size, colour, etc.
My program loops through a math problem and solves it; however, I need to draw how to solve the problem step by step, using boxes and lines (maybe animation? though static boxes and lines are okay).
I've already tried using tkinter, but it doesn't seem to have the functionality I require.
I have no experience with graphical representation in any language, so could anyone suggest something I could do this with? (I have to use a Python back-end, as I already have the code to solve the math problem in Python.)
The output would preferably be a window that opens upon running the .exe.
Here is an example of the type of visualisation that I need to draw.
(Each picture is a different example with modified attributes such as line width, etc.)
The numbers and operators are passed in via variables; however, the lines should be placed automatically.
If it doesn't need to be interactive, you could always write an SVG file, which can then be opened in any number of programs (including most web browsers). You wouldn't need any extra modules or libraries to accomplish this - just open a file and start writing text to it. You could even add animation later, I believe. A quick Google turned up svgwrite, a Python module intended to make writing SVG files easier, but SVG is a pretty simple XML format, so you don't technically need another module for it (it just ought to make things a bit easier).
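As a rough, hand-rolled sketch of what that could look like for one step of a sum (all coordinates, sizes, and the function name here are made up for illustration):

```python
def sum_step_svg(path, a, b):
    """Write an SVG showing two boxed values joined by a line."""
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="300" height="120">']
    parts.append('<rect x="20" y="20" width="80" height="40" fill="none" stroke="black" stroke-width="2"/>')
    parts.append(f'<text x="60" y="45" text-anchor="middle" font-size="16">{a}</text>')
    parts.append('<rect x="200" y="20" width="80" height="40" fill="none" stroke="black" stroke-width="2"/>')
    parts.append(f'<text x="240" y="45" text-anchor="middle" font-size="16">{b}</text>')
    parts.append('<line x1="100" y1="40" x2="200" y2="40" stroke="black" stroke-width="2"/>')
    parts.append('</svg>')
    with open(path, "w") as f:
        f.write("\n".join(parts))

sum_step_svg("step1.svg", 3, 4)
```

Line width, colour, and font size are just attributes on those elements, so making them adjustable per diagram is straightforward.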
I'm interested in using Python to make diagrams that represent the size of values by the size of squares (and optionally their colour). Basically, I'm looking for a way to make overviews of a bunch of values, like the good old program WinDirStat does with hard-drive usage (it makes a big square representing your hard drive, then smaller squares inside it representing different programs; the bigger the square, the larger the file, and the colour indicates the type of file). I'm fairly familiar with matplotlib, and I don't think it's possible to do something like this with it. Is there another Python package that would help? Any suggestions for something more low-level if there isn't? I guess I could do it manually if I could find a way to draw the boxes programmatically (I don't really care about the format, but the option to export SVG as well as PNG would be nice).
Ultimately, it would be nice to have it be interactive like WinDirStat, where hovering over a particular square gives more information on it, and clicking on it takes you inside to see the makeup of that square. I'm only familiar with wxPython for GUI stuff, and I'm not sure it could be used for something like this. For now I'd be happy with just outputting the diagrams, though.
Thanks a lot!
Alex
Edit:
Thanks guys, both your answers helped a lot.
You're looking for Treemapping algorithms. Once implemented, you can transform the output (which should be rectangles) into plotting commands to anything that can draw layered rectangles.
Edit:
More links and information:
If you don't mind reading papers: the browser-based d3 library provides 'squarified' treemaps (a JS implementation). They reference this paper by Bruls, Huizing, and van Wijk (it is also citation 3 on the Wikipedia article).
I'd search on the algorithms listed on the linked Wikipedia article. For instance, they also link to this article, which describes an algorithm for "mixed treemaps". The paper also includes some interesting portions at the end describing transformations into other-than-rectangular shapes.
Squarified certainly appears to be the most common variety around. The above links should give you enough to work towards a solution, or even to directly port the d3 implementation. However, the cost of grokking d3's model (which is something like a declarative form of jQuery) may be somewhat high. At first glance, though, the implementation appears relatively straightforward.
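As a starting point, here is a deliberately simple one-level proportional split in plain Python (not the squarified algorithm from the Bruls et al. paper, just the basic idea of turning values into layered rectangles):

```python
def slice_layout(values, x, y, w, h, vertical=True):
    """Split the rectangle (x, y, w, h) into strips proportional to values."""
    total = float(sum(values))
    rects, offset = [], 0.0
    for v in values:
        frac = v / total
        if vertical:                          # side-by-side vertical strips
            rects.append((x + offset, y, w * frac, h))
            offset += w * frac
        else:                                 # stacked horizontal strips
            rects.append((x, y + offset, w, h * frac))
            offset += h * frac
    return rects

# e.g. three files of 60, 30 and 10 MB laid out in a 200x100 canvas
print(slice_layout([60, 30, 10], 0, 0, 200, 100))
```

Recursing into each strip with the orientation flipped gives the classic slice-and-dice layout; the squarified variant only changes how the strips are chosen to keep the rectangles closer to square.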
Squaremap does this. I haven't used it (I only know it from RunSnakeRun) and its documentation is severely lacking, but it seems to work.
So my current personal project is to be able to automatically grab screenshots out of a game, OCR the text, and count the number of occurrences of given words.
Having spent all evening looking around at different OCR solutions, I've come to realize that the majority of OCR packages out there are designed for scanned text. If there are any packages that can read screen text reliably, they're well outside this hobbyist's budget.
I've been reading through some other questions, and the closest I found was OCR engines designed for screen-reading.
It seems to me that reading rendered text should be much easier than printed and scanned text. Lines are always straight, and any given letter will always appear with the exact same pixel representation (mostly, anyway). Also, why not use the actual font file (if you have it) as a cheat sheet for recognizing characters? We might actually reach 100% accuracy with a system like this.
Assuming you have the font file for a cheat sheet and your source image is perfectly square and has no noise, how would you go about recognizing characters from the screen?
(Problems I can foresee are UI lines and images that could confuse any crude attempt at pixel-guessing.)
If you already know of a free/open-source OCR package designed for screen-reading, please let me know. I kind of doubt that's going to show up though, as no other askers seem to have gotten a lead either.
A Python interface is preferred, but beggars can't be choosers.
EDIT:
To clarify, I'm looking for design suggestions for an OCR solution that is specifically designed to read text from screenshots. Popular tools like tesseract (mentioned in the question I linked) are hard to use at best because they are not designed for this kind of source file.
So I've been thinking about it and I feel that the best approach will be to count the number of pixels in each blob/glyph/character. This should really cut down on the number of tests I need to do to differentiate between glyphs.
Regretfully, I'll have to be very specific about fonts. The software will only be able to recognize fonts at the right dpi, for the right font face and weight, etc.
It isn't ideal, and I'd still like to see someone who knows more about this stuff design OCR for rendered text; but it will work for my limited case.
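A sketch of what I have in mind, using Pillow to build the "cheat sheet" from the font file (the font path, size, and character set are assumptions, and the screenshot glyph crops would need the same white-on-black binarisation):

```python
from PIL import Image, ImageDraw, ImageFont

FONT = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 16)
CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

def render_glyph(ch):
    """Render one character as white ink on black, cropped to its bounding box."""
    img = Image.new("L", (40, 40), 0)
    ImageDraw.Draw(img).text((0, 0), ch, font=FONT, fill=255)
    return img.crop(img.getbbox())

TEMPLATES = {ch: render_glyph(ch) for ch in CHARS}
INK = {ch: sum(1 for p in t.getdata() if p > 128) for ch, t in TEMPLATES.items()}

def match_glyph(glyph):
    """glyph: an 'L'-mode crop of a single character from the screenshot."""
    ink = sum(1 for p in glyph.getdata() if p > 128)
    for ch, tmpl in TEMPLATES.items():
        if tmpl.size != glyph.size or INK[ch] != ink:
            continue                          # cheap pre-filters: size and ink count
        if list(tmpl.getdata()) == list(glyph.getdata()):
            return ch                         # exact pixel match
    return None
```

The ink count does the cheap filtering I described, and the exact comparison only runs on the few remaining candidates.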
If your goal is to count occurrences of certain events in a game, OCR is really not the right way to be going about it. That said, if you are determined to use OCR, then tesseract-OCR is a well-known open source package for performing optical character recognition. I'm not really sure what you are getting at with respect to scanned vs. rendered text, but tesseract will probably do as good a job as any opensource package that is available. OCR is still a tricky art, so I wouldn't expect 100% accuracy.
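If you do go the Tesseract route, the pytesseract wrapper keeps the Python side trivial (a sketch; the screenshot filename and the words to count are placeholders):

```python
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("frame.png"))
for word in ("victory", "defeat"):            # placeholder words to count
    print(word, text.lower().count(word))
```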
This isn't exactly what you want, but you may want to look at Sikuli.
I'm using PIL to load in various fonts and draw text to images. At the basic level, it all works.
However, I am running into a number of problems, such as letters being clipped (mainly with cursive or stylistic fonts that have lots of tails and such). textsize() does return width/height values, yet letters are still clipped. There also don't seem to be methods in PIL for specifying a larger image size for character generation. Another issue is vertical spacing: PIL seems to return large height values for certain fonts, so the vertical spacing between lines ends up overly large.
I'm in search of a more advanced font and text handling system than PIL, given its apparent limitations.
I've been researching this a lot over the last week (Google, Python docs, Stack Overflow, etc.), and I've seen people recommend either ImageMagick or a combination of Pango and Cairo. However, as much as I've read and searched about these respective technologies, I simply haven't found any usable documentation that pertains to what I am trying to do. There are some Python bindings for ImageMagick, but they all seem several years out of date.
Can some of the helpful souls here on SO point me to some tutorials on how to use Pango/Cairo and/or ImageMagick?
The Cairo cookbook has a number of examples for using Cairo, and the Python routines are almost mirror images of the C routines.
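For a first taste of the Python side, cairo's "toy" text API already maps almost one-to-one onto the C calls (a sketch; for real layout and wrapping you would put Pango on top, and the offsets here are just to show how the extents avoid clipping):

```python
import cairo

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 400, 120)
ctx = cairo.Context(surface)
ctx.set_source_rgb(1, 1, 1)
ctx.paint()                                              # white background

ctx.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_BOLD)
ctx.set_font_size(36)
x_bearing, y_bearing, *_ = ctx.text_extents("Swashy text")   # true ink bounds of the run
ctx.set_source_rgb(0, 0, 0)
ctx.move_to(10 - x_bearing, 10 - y_bearing)              # shift so overhangs are not clipped
ctx.show_text("Swashy text")
surface.write_to_png("text.png")
```

Because text_extents() reports the actual ink bounding box (including tails and swashes), you can size the surface from it instead of trusting nominal font metrics.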
I've had some fine results with PyGame, but I don't know if it will necessarily solve your problem.