The Unifont contains glyphs for Tags, Variation Selectors, and other non-printable characters.
For example at the end of https://unifoundry.com/pub/unifont/unifont-14.0.04/font-builds/unifont_upper-14.0.04.ttf are these tags (as shown in FontForge):
Each one has a glyph which should be printable:
I want to draw that glyph, using the Unifont, on an image with Pillow.
from PIL import Image, ImageDraw, ImageFont
text = chr(0x2A6B2) + " " + chr(0x0E0026)
font = ImageFont.truetype("unifont_upper-14.0.04.ttf", size=64)
image1 = Image.new("RGB", (256, 64), "white")
draw1 = ImageDraw.Draw(image1)
draw1.text((0, 0), text, font=font, fill="black")
image1.save("test aa.png")
The first character (a CJK ideograph) draws correctly. But the tag character is invisible.
Is there any way to get Pillow to draw the shape that I can see in FontForge?
It seems the short answer is, unfortunately, "no you can't".
Pillow generally uses libraqm to lay out text (i.e. to do things like map the Unicode string to the glyphs in the font), specifically the raqm_layout function.
That library in turn uses a library called harfbuzz to do the text shaping.
The tag characters you want, including U+E0026, have the Unicode default ignorable property. By default harfbuzz doesn't display characters with this property, replacing them with a blank glyph. But it is possible, with the use of flags, to modify this behaviour: specifically, calling hb_buffer_set_flags with HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES seems like it will achieve what you want, displaying these characters rather than blanking them out.
The trouble is, libraqm has no way of setting this flag when it calls harfbuzz - it does let you set some of the other flags, but not this one :(
To achieve what you want I guess you'd have to use a lower level library - there are apparently Python bindings for both FreeType and harfbuzz, though I've not used either so I can't comment on how much pain that might involve.
From Section 23.9, Tag Characters in The Unicode Standard, Chapter 23, Special Areas and Format Characters:
Tag Characters: U+E0000–U+E007F
This block encodes a set of 95 special-use tag characters to enable
the spelling out of ASCII-based string tags using characters that can
be strictly separated from ordinary text content characters in
Unicode…
Display. Characters in the tag character block have no visible rendering in normal text and the language tags themselves are not
displayed.
And from the Unicode Frequently Asked Questions (emphasis mine):
Q: Which characters should be displayed as invisible, if not supported?
All default-ignorable characters should be rendered as completely invisible (and non advancing, i.e. "zero width"), if not explicitly
supported in rendering.
Q: Does that mean that a font can never display one of these characters?
No. Rendering systems may also support special modes such as “Display
Hidden”, which are intended to reveal characters that would not
otherwise display. Fonts can contain glyphs intended for visible
display of default ignorable code points that would otherwise be
rendered invisibly when not supported.
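A crude "Display Hidden" mode can be approximated before the text ever reaches Pillow, by swapping each tag character for a visible placeholder. The sketch below is only an illustration: the helper name reveal_tags and the <TAG x> placeholder format are invented here, and it relies solely on the fact that U+E0020 to U+E007E mirror the ASCII range U+0020 to U+007E. It does not draw Unifont's own tag glyphs.

```python
# Hypothetical helper: make tag characters visible by substituting a
# placeholder built from the ASCII character each tag mirrors.
TAG_BASE = 0xE0000                      # tag code point = TAG_BASE + ASCII code
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # the 95 ASCII-mirroring tag characters


def reveal_tags(text: str) -> str:
    """Return text with each tag character replaced by '<TAG x>'."""
    out = []
    for ch in text:
        cp = ord(ch)
        if TAG_FIRST <= cp <= TAG_LAST:
            # Map the tag back to the ASCII character it represents.
            out.append("<TAG " + chr(cp - TAG_BASE) + ">")
        else:
            out.append(ch)
    return "".join(out)
```

The returned string can be passed to draw.text() as usual; U+E0026 from the question would come out as <TAG &>.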
More resources (required reading, incomplete):
Default_Ignorable_Code_Point character property
Section 5.21, Ignoring Characters in Processing in Implementation Guidelines
🏴 Emoji Tag Sequence
I have two issues with how PyQt is formatting my QLabels
Issue 1:
When hyperlinks are added it displays as if there were no newlines in the string.
For the input text:
https://www.google.co.uk/
https://www.google.co.uk/
https://www.google.co.uk/
It's shown like this without newlines
Issue 2: Sometimes PyQt doesn't detect the 'a' tag at all. This happens when the string does not start with a hyperlink but is then followed by newlines with hyperlinks, e.g. this input:
test
https://www.google.co.uk/
https://www.google.co.uk/
https://www.google.co.uk/
As you can see the newlines are properly shown but PyQt has no longer detected the hyperlinks
From the text property documentation of QLabel:
The text will be interpreted either as plain text or as rich text, depending on the text format setting; see setTextFormat(). The default setting is Qt::AutoText; i.e. QLabel will try to auto-detect the format of the text set.
The AutoText flag can only make a guess using simple tag syntax checks (basic tags without arguments, such as <b>, or document type declaration headers, like <html>).
This is obviously done for performance reasons.
If you are sure that you're always setting rich text content, use the appropriate Qt.TextFormat enum:
label.setTextFormat(QtCore.Qt.RichText)
Rich text uses an HTML-like syntax, so it follows the same basic rule HTML has had since its birth, almost 30 years ago: line breaks between words in the document (text or tags) are ignored, just as multiple spaces are always collapsed into one.
So, if you want to add line breaks, you have to use the appropriate <br> (or <br/> for xhtml) tag.
Also remember that the Qt rich text engine has limited support for HTML, as described in the documentation about the Supported HTML Subset.
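As a minimal sketch of that advice, the helper below turns newline-separated plain text into rich text with explicit <br> tags and anchors. The function name lines_to_rich_text and the rule "a line starting with http:// or https:// is a link" are assumptions made for illustration, not part of Qt.

```python
import html


def lines_to_rich_text(text: str) -> str:
    """Convert newline-separated text into a rich-text string for a QLabel."""
    parts = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("http://", "https://")):
            # Escape the URL so characters like & or " cannot break the markup.
            url = html.escape(stripped, quote=True)
            parts.append(f'<a href="{url}">{url}</a>')
        else:
            parts.append(html.escape(line))
    # Rich text ignores raw newlines, so join the lines with explicit <br> tags.
    return "<br>".join(parts)
```

The result would then go to label.setText() after label.setTextFormat(QtCore.Qt.RichText); calling label.setOpenExternalLinks(True) makes the links clickable.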
I'm trying to draw a set of Arabic characters without spaces on an image using Pillow. The problem I'm currently having is that some Arabic characters, when placed next to each other, appear differently than when they are separate (e.g. س and ل become سل when put next to each other). I'm trying to somehow force my font settings to always separate all characters without injecting any other characters. What should I do?
Here is a snippet of my code:
#font is an arabic font, and font_path is pointing to that location.
font = ImageFont.truetype(
font=font_path, size=size,
layout_engine=ImageFont.LAYOUT_RAQM)
h, w = font.getsize(text, direction='rtl')
offset = font.getoffset(text)
H, W = int(1.5 * h), int(1.5 * w)
imgSize = H, W
img = Image.new(mode='1', size=imgSize, color=0)
draw = ImageDraw.Draw(img)
pos = ((H-h)/2, (W-w)/2)
draw.text(pos, text, fill=255, font=font,
direction='rtl', align='center')
What you're describing might be possible with some fonts that support Arabic, specifically, those that encode the position-sensitive forms in the Arabic Presentation Forms-B Block of Unicode. You would need to map your input text character codes into the correct positional variant. So for the example characters seen and lam as you described, U+0633 س and U+0644 ل, you want the initial form of U+0633, which is U+FEB3 ﺳ, and the final form of U+0644, which is U+FEDE ﻞ, putting those together (separated by a regular space): ﺳ ﻞ.
There is a useful chart showing the positional forms at https://en.wikipedia.org/wiki/Arabic_script_in_Unicode#Contextual_forms.
But, important to understand:
not all fonts that contain Arabic have the Presentation Forms encoded (many fonts do not)
not all Arabic codes have an equivalent in the Presentation Forms range (most of the basic ones do, but there are some extended Arabic characters for other languages that do not have Presentation Forms).
you are responsible for processing your input text (in the U+06xx range) into the correct presentation form (U+FExx range) codes based on the word/group context, which can be tricky. That job normally falls to an OpenType Layout engine, but it also performs the joining. So you're basically overriding that logic.
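The mapping step can be sketched with the standard unicodedata module, since the presentation-form characters are named after their base letters (for example "ARABIC LETTER SEEN INITIAL FORM"). The helper name positional_form is made up for this illustration, and deciding which position each letter should take is left to the caller, which is the tricky part mentioned above.

```python
import unicodedata


def positional_form(ch: str, position: str) -> str:
    """Return the presentation form of ch for the given position.

    position is one of "ISOLATED", "INITIAL", "MEDIAL", "FINAL".
    Falls back to ch itself when no such presentation form exists.
    """
    try:
        base_name = unicodedata.name(ch)  # e.g. "ARABIC LETTER SEEN"
        return unicodedata.lookup(f"{base_name} {position} FORM")
    except (ValueError, KeyError):
        # ValueError: ch has no name; KeyError: no character with that name.
        return ch


# The example from the text: initial seen, a regular space, final lam.
separated = positional_form("\u0633", "INITIAL") + " " + positional_form("\u0644", "FINAL")
```

Whether the result renders as intended still depends on the font containing glyphs for the U+FExx code points.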
Why is it that some characters show up normally, and some characters (for example, ๜ - ຀) show up as boxes? The website I'm using is http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html, and even when I try to return the characters in python, they show up as boxes.
Note: The character codes end with semicolons
Some code points are not yet assigned to a character. Code point 3676, or U+0E5C as it's commonly written, is one of those.
As a consequence you don't have to worry about these, as they will not show up in any text.
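One way to check this from Python is the general category reported by the standard unicodedata module: unassigned code points have category "Cn". The helper name is_assigned is just for illustration, and the answer depends on the Unicode version bundled with the Python build in use.

```python
import unicodedata


def is_assigned(code_point: int) -> bool:
    """True if the code point has an assigned character in Python's Unicode data."""
    # "Cn" is the general category for unassigned code points (and noncharacters).
    return unicodedata.category(chr(code_point)) != "Cn"


# The two ends of the range from the question, plus an assigned Thai letter.
results = {cp: is_assigned(cp) for cp in (3676, 3712, 0x0E01)}
```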
What would be the most efficient way to render text, which in the TextBuffer could be any case, as uppercase in a TextView?
It isn't for the entirety of the text, only specific styles within it - and the original capitalization of that section needs to be preserved in case the user changes the text style back to a non-capitalized style.
So if the relevant section of text could be tagged with a TextTag that would be ideal, but there isn't a tag to fully capitalize (there is a small_caps font variant, which for some reason doesn't seem to work in a textview) - can one create a custom TextTag property like "all_caps" and, if so, how would it be implemented?
Other thoughts would be overriding the textview draw function (sounds painful) or possibly creating a secondary TextBuffer and changing the text case on the fly?
UPDATE:
For this application, the best approach would likely be to intercept the string being passed to Pango from the TextBuffer (from TextView's do_draw, I think) and change it on the fly: other text styles in this application would need some additional characters added (it's a screenwriting application, so there is a 'Parenthetical' style which, unsurprisingly, is always contained in parentheses; these should be added as part of the style, not relying on the user to add them).
So the updated question would be: How would one subclass / monkey code / something Pango / PangoCairo / Gtk+ 3 to intercept the string being passed to Pango (along with its TextTags) so as to alter / add to it according to its TextTag styles?
In Python 2, I’m using str.format() to align a bunch of columns of text I’m printing to a terminal. Basically, it’s a table, but I’m not printing any borders or anything—it’s simply rows of text, aligned into columns.
With no color-fiddling, everything prints as expected.
If I wrap an entire row (i.e., one print statement) with ANSI color codes, everything prints as expected.
However: If I try to make each column a different color within a row, the alignment is thrown off. Technically, the alignment is preserved; it’s the fill characters (spaces) that aren’t printing as desired; in fact, the fill characters seem to be completely removed.
I’ve verified the same issue with both colorama and xtermcolor. The results were the same. Therefore, I’m certain the issue has to do with str.format() not playing well with ANSI escape sequences in the middle of a string.
But I don’t know what to do about it! :( I would really like to know if there’s any kind of workaround for this problem.
Color and alignment are powerful tools for improving readability, and readability is an important part of software usability. It would mean a lot to me if this could be accomplished without manually aligning each column of text.
Little help? ☺
This is a very late answer, left as bread crumbs for anyone who finds this page while struggling to format text with built-in ANSI color codes.
byoungb's comment about making padding decisions on the length of pre-colorized text is exactly right. But if you already have colored text, here's a work-around:
See my ansiwrap module on PyPI. Its primary purpose is providing textwrap for ANSI-colored text, but it also exports ansilen() which tells you "how long would this string be if it didn't contain ANSI control codes?" It's quite useful in making formatting, column-width, and wrapping decisions on pre-colored text. Add width - ansilen(s) spaces to the end or beginning of s to left (or respectively, right) justify s in a column of your desired width. E.g.:
from ansiwrap import ansilen

def ansi_ljust(s, width):
    # Pad on the right based on the visible (ANSI-free) length of s.
    needed = width - ansilen(s)
    if needed > 0:
        return s + ' ' * needed
    else:
        return s
Also, if you need to split, truncate, or combine colored text at some point, you will find that ANSI's stateful nature makes that a chore. You may find ansi_terminate_lines() helpful; it "patches up" a list of sub-strings so that each has independent, self-standing ANSI codes with the same effect as the original string.
The latest versions of ansicolors also contain an equivalent implementation of ansilen().
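If installing a package is not an option, the same idea can be approximated with the standard library alone. This is a rough sketch, not ansiwrap's implementation: it assumes the text contains only SGR color sequences of the form ESC [ ... m, and the names visible_len and pad_visible are invented for illustration.

```python
import re

# Assumption: only SGR (color/style) sequences such as "\x1b[31m" appear.
SGR_PATTERN = re.compile(r"\x1b\[[0-9;]*m")


def visible_len(s: str) -> int:
    """Length of s as it would appear on screen, ignoring SGR sequences."""
    return len(SGR_PATTERN.sub("", s))


def pad_visible(s: str, width: int) -> str:
    """Left-justify s in a column of the given visible width."""
    return s + " " * max(0, width - visible_len(s))


red = "\x1b[31mred\x1b[0m"
row = pad_visible(red, 6) + "|"
```

Here row contains the colored word followed by three spaces and the column separator, because only the three visible characters count towards the width.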
Python doesn't distinguish between 'normal' characters and ANSI colour codes, which are also characters that the terminal interprets.
In other words, while printing '\x1b[92m' to a terminal may change the terminal text colour, Python doesn't see that as anything but a set of 5 characters. If you use print repr(line) instead, Python will print the string literal form, using escape codes for characters that are not printable ASCII (so the ESC ASCII code, 27, is displayed as \x1b), which lets you see how many have been added.
You'll need to adjust your column alignments manually to allow for those extra characters.
Without your actual code, that's hard for us to help you with though.
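A small illustration of the effect, and of the simplest way around it (pad first, colour afterwards). The column width of 10 and the variable names are arbitrary choices for this sketch; it is written for Python 3, but str.format() behaves the same way here in Python 2.

```python
GREEN = "\x1b[92m"   # 5 characters as far as Python is concerned
RESET = "\x1b[0m"    # 4 characters

colored = GREEN + "hi" + RESET

# The string is already 11 characters long, so a width of 10 adds no padding.
broken = "{:<10}|".format(colored)

# Pad the plain text first, then wrap the padded field in colour codes.
fixed = GREEN + "{:<10}".format("hi") + RESET + "|"
```

In broken the column separator follows the text directly, while in fixed the word is followed by eight spaces before the separator.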
Also late to the party. Had this same issue dealing with color and alignment. Here is a function I wrote which adds padding to a string that has characters that are 'invisible' by default, such as escape sequences.
import re

def ljustcolor(text: str, padding: int, char=" ") -> str:
    pattern = r'(?:\x1B[#-_]|[\x80-\x9F])[0-?]*[ -/]*[#-~]'
    matches = re.findall(pattern, text)
    offset = sum(len(match) for match in matches)
    return text.ljust(padding + offset, char[0])
The pattern is intended to match ANSI escape sequences, including color codes. Summing the lengths of all matches gives the number of invisible characters, which serves as an offset added to the padding value passed to ljust.