Extract all SVG between two unicode values fontforge - python

So I have a font that I made a while back but didn't do much with at the time. I lost a few of the original SVGs when I reinstalled the OS on my computer. The font is full Unicode with lots of characters, and all the icons saved in the font are outside the normal character range. I know it is possible to extract a bunch of SVGs from the font using this command
fontforge -lang=ff -c 'Open($1); SelectWorthOutputting(); foreach Export("svg"); endloop;' Typeface.ttf
and
fontforge -lang=ff -c 'Open($1); SelectAll(); foreach Export("svg"); endloop;' Typeface.ttf
However, the first misses the icons completely, and the second is no different. All the icons are between two points in the file, starting at U+E000 and going through to U+E17D. I want to know how I can extract all the icons between these two points and, if possible, match them with a namelist.txt for naming.

How to extract several characters between two Unicode values from fontforge?
The conceptually cleanest way would be to iterate over the hexadecimal number-range you specified and call font[char_name].export() on every single one of them. However, this comes with the hassle of incrementing hexadecimal numbers (which while feasible is a bit more involved than what I'm going to propose).
The following for-loop paired with the function 'nameFromUnicode(position)' should do the trick while staying (mostly) clear of hexadecimals. The function 'nameFromUnicode(position)' takes the position of a Unicode character as an integer and returns the name of the character at that position (something like Uni2D42). This name can then be passed to font[that_char_name].export() to export it. To find the starting and ending position of your range in decimal simply interpret the character names as a hexadecimal number and convert it to decimal. In your case e000 becomes 57344 and e17d becomes 57725. The range in decimal would then be from 57344 to 57725.
The following code snippet is a bare-bones loop extracting the characters in the specified range (although it exports '.png' instead of '.svg'; adapting it should be fairly straightforward).
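import fontforge

font = fontforge.open("/path/to/your/.ttf/file")
for position in range(57344, 57726):  # U+E000 through U+E17D, inclusive
    glyph = fontforge.nameFromUnicode(position)
    if glyph in font:  # skip names that have no glyph in this font
        font[glyph].export(font[glyph].glyphname + ".png", 150)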
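On the hexadecimal concern: Python accepts hex literals directly, so 0xE000 and 0xE17D can be used as-is, with no decimal conversion. The sketch below also swaps in FontForge's selection API, selecting the range by Unicode value and iterating over the selected glyphs, so it does not depend on glyph names matching what nameFromUnicode() returns. It assumes font.selection.select(("ranges", "unicode"), start, end) and font.selection.byGlyphs behave as described in the FontForge Python documentation for your version; the helper names codepoints and export_svgs are hypothetical.

```python
# Python accepts hexadecimal literals, so no decimal conversion is needed.
PUA_START = 0xE000  # 57344
PUA_END = 0xE17D    # 57725 (inclusive)


def codepoints(start=PUA_START, end=PUA_END):
    """All code points from start to end, inclusive."""
    return range(start, end + 1)


def export_svgs(font_path, start=PUA_START, end=PUA_END):
    """Export every glyph encoded in [start, end] to <glyphname>.svg.

    Must run under FontForge's Python, where the fontforge module exists.
    """
    import fontforge  # imported here so codepoints() works without FontForge

    font = fontforge.open(font_path)
    # Select the icon block by Unicode value (assumed flags, see lead-in).
    font.selection.select(("ranges", "unicode"), start, end)
    for glyph in font.selection.byGlyphs:  # the selected glyph objects
        glyph.export(glyph.glyphname + ".svg")  # format chosen by extension
    font.close()
```

To use it, call export_svgs("Typeface.ttf") from a script run by FontForge's Python interpreter (for example via fontforge -script); check the invocation against your installation.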
I don't quite understand your second question about matching against a namelist.txt. If after 5 years you still crave an answer, you're welcome to elaborate and I'll see if I can come up with one.

Related

Python Curses, reading wide character's attribute from screen

The problem I'm trying to solve is to get a pair (ch, att) representing the character and the associated attribute currently displayed at a given position.
Now, when the displayed character is not a wide one (i.e. an ASCII character), the method .inch does the job, provided the result is masked correctly. The issue comes when the displayed character is wide. More precisely, I know how to get the given character through .instr, but this function does not return any information about the attribute.
Since, as far as I know, there is no specific function to get the attribute alone, my first attempt was to use .inch, drop the 8 least significant bits and interpret the result as the attribute. This seemed to work to some extent, but on double-checking I realized that reading Greek letters (u"\u03b1" for instance) with no attribute this way returns att = 11.0000.0000 instead of 0. Is there a better way to approach the problem?
EDIT: a minimal example for Python 3
import curses

def bin(x):
    out = ''
    while x > 0:
        out = str(x % 2) + out
        x = x // 2
    return out

def main(s):
    s.addstr(1, 1, u'\u03b1')
    s.refresh()
    chratt = s.inch(1, 1)
    att = chratt & 0xFF00
    s.addstr(2, 1, bin(att))
    s.refresh()
    s.getch()  # wait for a key press instead of busy-looping

curses.wrapper(main)
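For narrow characters, where inch() is still meaningful, the character and attribute parts can be split with the masks the curses module itself exports (curses.A_CHARTEXT and curses.A_ATTRIBUTES) rather than a hand-picked constant such as 0xFF00. This is a small sketch with a hypothetical helper name, split_chtype; it does not solve the wide-character case.

```python
import curses


def split_chtype(value):
    """Split an inch() result into (character, attribute bits).

    Only meaningful for narrow characters: for wide characters inch()
    does not return the full character, as discussed in this thread.
    """
    char = chr(value & curses.A_CHARTEXT)  # low bits: the character itself
    attrs = value & curses.A_ATTRIBUTES    # remaining bits, incl. color pair
    return char, attrs
```

In the example above this would replace the manual mask: ch, att = split_chtype(s.inch(1, 1)).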
In curses, inch and instr are only meant for ASCII characters, as you suspected. "Complex" or "wide" characters, such as non-ASCII characters from UTF-8 text, use another system, as explained on Stack Overflow by one of the ncurses creators.
However, now for the bad news: those functions aren't implemented in Python's curses module (yet). A pull request has been submitted and is very close to merging (90%), so if you really need it, why not contribute yourself?
If that isn't an option, you could record every change you make to the screen in a variable and then read the wide characters (and their attributes) back from there.
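A minimal sketch of that bookkeeping approach, assuming all drawing goes through a single wrapper. ShadowScreen and display_width are hypothetical names; the wrapper assumes single-line strings without newlines, ignores combining marks, and estimates column width with unicodedata.east_asian_width, which only approximates what a given terminal actually does.

```python
import unicodedata


def display_width(ch):
    """Columns a character occupies: 2 for East Asian wide/fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


class ShadowScreen:
    """Hypothetical wrapper that remembers what was drawn where.

    Assumes single-line strings (no newlines) and ignores combining marks.
    """

    def __init__(self, window):
        self.window = window
        self.cells = {}  # (y, x) -> (character, attribute)

    def addstr(self, y, x, text, attr=0):
        # Draw for real, then record each character at its starting column.
        self.window.addstr(y, x, text, attr)
        for ch in text:
            self.cells[(y, x)] = (ch, attr)
            x += display_width(ch)

    def cell(self, y, x):
        """Return (character, attribute) at (y, x); blank if never written."""
        return self.cells.get((y, x), (" ", 0))
```

Inside curses.wrapper(main) you would create shadow = ShadowScreen(s), draw with shadow.addstr(1, 1, u'\u03b1', curses.A_BOLD), and read the pair back with shadow.cell(1, 1). The trade-off is that anything drawn without the wrapper (or cleared by the terminal) is invisible to it.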

React Native Text custom ellipsis

I'm currently using React Native's Text component like this
<Text numberOfLines={2} ellipsizeMode="tail">Some long texts...</Text>
This renders texts like this
First line
Second line…
I'd like to use a different ellipsis, for example:
Instead of
End of a long line...
It would be
End of a long line ...More
Currently, possible solutions would be:
Count the number of characters, cut the string, then append the custom ellipsis.
Problem: in a proportional font, characters have different widths.
Use ellipsizeMode="clip" and create absolutely positioned View with a custom ellipsis.
Problem: Can't programmatically tell when the Text is clipped.
Does anyone have a solution?

Unicode characters are boxes

Why is it that some characters show up normally, and some characters (for example, &#3676 - &#3712) show up as boxes? The website I'm using is http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html, and even when I try to return the characters in python, they show up as boxes.
Note: The character codes end with semicolons
Some code points are not assigned to any character yet. Code point 3676, commonly written U+0E5C, is one of them.
Consequently, you don't have to worry about these: they should not show up in any real text.
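You can check this from Python with the standard unicodedata module: unassigned code points have the general category "Cn". This is a small sketch with a hypothetical helper name; the answer depends on the Unicode version bundled with your Python (see unicodedata.unidata_version), and it cannot detect the other common cause of boxes, an assigned character that the current font simply has no glyph for.

```python
import unicodedata


def is_unassigned(codepoint):
    """True if Python's Unicode database has no character at this code point."""
    return unicodedata.category(chr(codepoint)) == "Cn"
```

For the range from the question, [cp for cp in range(3676, 3713) if not is_unassigned(cp)] lists the code points that do have a character assigned.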

Python - Dividing a book in PDF form into individual text files that correspond with page numbers

I've converted my PDF file into a long string using PDFminer.
I'm wondering how I should go about dividing this string into smaller, individual strings/pages. Each page is divided by a certain series of characters (CRLF, FF, page number etc), and the string should be split and appended to a new text file according to these characters occurring.
I have no experience with regex, but is using the re module the best way to go about this?
My vague idea for implementation is that I have to iterate through the file using the re.search function, creating text files with each new form feed found. The only code I have is PDF > text conversion. Can anyone point me in the right direction?
Edit: I think the expression I should use is something like ^.*(?=(\d\n\n\d\n\n\f\bFavela\b)) (capture everything before 2 digits, the line breaks, and the book's title 'Favela', which appears at the top of each page).
Can I save these \d digits as variables? I want to use them as file names as I iterate through the book and scoop up the portions of text divided by each appearance of \fFavela.
I'm thinking the re.sub method would do it, looping through and replacing with an empty string as I go.
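A minimal sketch of a re-based approach, assuming (as the question describes) that each page ends with its page number on a line of its own, followed by a form feed (\f). The capture group (\d+) covers the "save the digits as variables" part: match.group(1) holds the page number, which is then used in the file name. re.finditer is used instead of re.sub because the goal is to collect the pieces, not to rewrite the string. split_pages, write_pages, and the page_<n>.txt naming are hypothetical, and the pattern will need adjusting to the book's actual layout.

```python
import re
from pathlib import Path

# Assumption: a page ends with its number on its own line, then a form feed.
PAGE_END = re.compile(r"^(\d+)[ \t\r\n]*\f", re.MULTILINE)


def split_pages(text):
    """Return a list of (page_number, page_text) pairs."""
    pages = []
    start = 0
    for match in PAGE_END.finditer(text):
        # group(1) is the captured page number; the body is everything before it.
        pages.append((match.group(1), text[start:match.start()].strip()))
        start = match.end()
    return pages  # note: text after the last form feed is not included


def write_pages(text, out_dir):
    """Write each page to <out_dir>/page_<number>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for number, body in split_pages(text):
        (out / f"page_{number}.txt").write_text(body, encoding="utf-8")
```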

Python’s `str.format()`, fill characters, and ANSI colors

In Python 2, I’m using str.format() to align a bunch of columns of text I’m printing to a terminal. Basically, it’s a table, but I’m not printing any borders or anything—it’s simply rows of text, aligned into columns.
With no color-fiddling, everything prints as expected.
If I wrap an entire row (i.e., one print statement) with ANSI color codes, everything prints as expected.
However: If I try to make each column a different color within a row, the alignment is thrown off. Technically, the alignment is preserved; it’s the fill characters (spaces) that aren’t printing as desired; in fact, the fill characters seem to be completely removed.
I’ve verified the same issue with both colorama and xtermcolor. The results were the same. Therefore, I’m certain the issue has to do with str.format() not playing well with ANSI escape sequences in the middle of a string.
But I don’t know what to do about it! :( I would really like to know if there’s any kind of workaround for this problem.
Color and alignment are powerful tools for improving readability, and readability is an important part of software usability. It would mean a lot to me if this could be accomplished without manually aligning each column of text.
Little help? ☺
This is a very late answer, left as bread crumbs for anyone who finds this page while struggling to format text with built-in ANSI color codes.
byoungb's comment about making padding decisions on the length of pre-colorized text is exactly right. But if you already have colored text, here's a work-around:
See my ansiwrap module on PyPI. Its primary purpose is providing textwrap for ANSI-colored text, but it also exports ansilen() which tells you "how long would this string be if it didn't contain ANSI control codes?" It's quite useful in making formatting, column-width, and wrapping decisions on pre-colored text. Add width - ansilen(s) spaces to the end or beginning of s to left (or respectively, right) justify s in a column of your desired width. E.g.:
from ansiwrap import ansilen

def ansi_ljust(s, width):
    # Pad by the visible length, i.e. ignoring ANSI control codes
    needed = width - ansilen(s)
    if needed > 0:
        return s + ' ' * needed
    else:
        return s
Also, if you need to split, truncate, or combine colored text at some point, you will find that ANSI's stateful nature makes that a chore. You may find ansi_terminate_lines() helpful: it patches up a list of substrings so that each one has independent, self-contained ANSI codes with the same effect as in the original string.
The latest versions of ansicolors also contain an equivalent implementation of ansilen().
Python doesn't distinguish between 'normal' characters and ANSI colour codes, which are also characters that the terminal interprets.
In other words, printing '\x1b[92m' to a terminal may change the terminal text colour, but Python sees it as nothing more than a sequence of 5 characters. If you use print repr(line) instead, Python prints the string literal form, using escape codes for non-printable characters (so the ESC character, ASCII 27, is shown as \x1b), which lets you see how many extra characters have been added.
You'll need to adjust your column alignments manually to allow for those extra characters.
Without your actual code, that's hard for us to help you with though.
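The effect is easy to reproduce: str.format() pads according to len(), and len() counts the escape characters, so a colored cell can already "fill" its column before any spaces are added. The usual fix is to pad the plain text first and add the color codes afterwards. RED, RESET, and pad_then_color are illustrative names; the escape strings are the standard ANSI red-foreground and reset sequences.

```python
RED = "\x1b[31m"   # ANSI "red foreground" (5 characters)
RESET = "\x1b[0m"  # ANSI "reset attributes" (4 characters)

colored = RED + "ok" + RESET
# len() counts the escape characters too: 5 + 2 + 4 == 11
print(len(colored))
# A width of 10 is already "used up", so no fill characters are added.
print(repr("{:<10}".format(colored)))


def pad_then_color(text, width, color):
    """Pad the plain text to the column width first, then wrap it in color."""
    return color + "{:<{}}".format(text, width) + RESET
```

Because the padding is computed before any escape codes exist, every column keeps its visible width regardless of color.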
Also late to the party. Had this same issue dealing with color and alignment. Here is a function I wrote which adds padding to a string that has characters that are 'invisible' by default, such as escape sequences.
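import re

def ljustcolor(text: str, padding: int, char=" ") -> str:
    # ANSI escape sequences: 7-bit (ESC-introduced) or 8-bit C1 forms
    pattern = r'(?:\x1B[#-_]|[\x80-\x9F])[0-?]*[ -/]*[#-~]'
    matches = re.findall(pattern, text)
    # Escape sequences count toward len() but take up no screen columns
    offset = sum(len(match) for match in matches)
    return text.ljust(padding + offset, char[0])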
The pattern matches ANSI escape sequences, including color codes. The total length of all matches serves as an offset that we add to the padding value passed to ljust, so the visible width comes out as requested.
