So basically, how is non-western input handled in SDL or OpenGL games or applications? Googling for it reveals http://sdl-im.csie.net/ but that doesn't seem to be maintained or available anymore. Just to view the page I had to use the Google cache.
To clarify, I'm not having any kind of issue in terms of the application displaying text in non-western languages to users. This is a solved problem. There are many unicode fonts available, and many different ways to process text into glyphs and then into display surfaces.
I run afoul of the opposite problem. Even if my program could safely handle text data in any arbitrary encoding, there's no way for users to actually type their name if it happens to include a character that requires more than one keystroke to produce.
You are interested in SDL_EnableUNICODE(). When you enable Unicode translation, you can use the unicode field of the SDL_keysym structure to get the Unicode character for the key the user typed.
Generally I think whenever you do text input (e.g. user focuses on a textbox) you should use the unicode field and not attempt to do translation yourself.
Here's something we did in YATC. Not really a shining example of how things should be done, but demonstrates the use of the unicode field.
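As a plain-Python illustration of the pattern (this is not the YATC code; the (sym, unicode_char) pairs below are stand-ins for SDL_keysym events, not SDL's actual API):

```python
# Sketch of consuming SDL's per-key Unicode translation to build a text
# buffer. With SDL_EnableUNICODE(1), each SDL_keysym carries a `unicode`
# field; here a (sym, unicode_char) tuple stands in for that structure.
BACKSPACE = 8  # SDLK_BACKSPACE is 8 in SDL 1.2

def feed_key(buffer: list, sym: int, unicode_char: str) -> None:
    """Append the translated character, handling backspace specially."""
    if sym == BACKSPACE:
        if buffer:
            buffer.pop()
    elif unicode_char:  # dead keys / control keys yield no character
        buffer.append(unicode_char)

buf = []
# Typing "né" on a dead-key layout: the dead key itself produces no
# character, and the composed "é" arrives as a single unicode value.
for sym, ch in [(ord("n"), "n"), (0, ""), (ord("e"), "é")]:
    feed_key(buf, sym, ch)
print("".join(buf))  # né
```

The point is that the input method or keyboard layout does the composition for you; your event loop only ever sees finished characters.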
Usually everyone just ends up using Unicode text to internationalize their apps.
As far as I remember, neither SDL nor OpenGL implements anything that would prevent you from doing international input/output, but neither of them helps with it, either.
There are utilities built on top of OpenGL that you can use to render text from .ttf fonts.
It appears there is now a Google Summer of Code project on this topic, for both X11 and Mac OS X.
I was using PyMuPDF and ran into the problem of getting information about the text in a PDF file.
I asked in the library's Discord channel about the possibility of obtaining information about intervals, but they told me that the library does not support working with them.
Perhaps there are other libraries that can do this?
I looked in other libraries but did not find it. Maybe I missed something...
disclaimer: I am the author of borb, the library used in this answer
Usually, the information you're looking for is hidden behind layers of abstraction. A PDF library might typically allow you to extract text (and it uses information about word and character spacing to do so), but it does not make this information available to the outside world.
You can use borb to get access to this (low level) information.
The key concept here is EventListener. This is an interface. Classes implementing this interface get notified whenever a rendering event has finished.
Rendering events may include:
text being rendered
images being rendered
switching to a new page
and so on
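A stripped-down sketch of this listener pattern in plain Python (the class names here are simplified stand-ins mimicking borb's design, not its actual API):

```python
# Illustrative sketch of the EventListener pattern described above:
# listeners get notified of rendering events and buffer text events
# until the page finishes.
class Event:
    pass

class TextRenderEvent(Event):
    def __init__(self, text: str, baseline_y: float):
        self.text = text
        self.baseline_y = baseline_y

class EndPageEvent(Event):
    pass

class EventListener:
    def event_occurred(self, event: Event) -> None:
        raise NotImplementedError

class SimpleTextExtraction(EventListener):
    """Collects text events, flushing them when the page ends."""
    def __init__(self):
        self._buffer = []
        self.pages = []
    def event_occurred(self, event: Event) -> None:
        if isinstance(event, TextRenderEvent):
            self._buffer.append(event)           # like _render_text
        elif isinstance(event, EndPageEvent):    # the "end of page" logic
            # sort by baseline (PDF y grows upward), then join into text
            self._buffer.sort(key=lambda e: -e.baseline_y)
            self.pages.append(" ".join(e.text for e in self._buffer))
            self._buffer = []

listener = SimpleTextExtraction()
for ev in [TextRenderEvent("World", 700),
           TextRenderEvent("Hello", 720),
           EndPageEvent()]:
    listener.event_occurred(ev)
print(listener.pages)  # ['Hello World']
```

In borb itself you would register the listener when processing the document, and the TextRenderInfo objects carry the real position data.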
There is a class that extracts text. So I would recommend you check out its code.
Looking at line 62, we see that any event that is "render a piece of text" gets redirected to its own separate method.
The method _render_text stores the TextRenderInfo objects until a page has finished rendering (at which point it will use the TextRenderInfo objects to determine the text that was on the page).
You can see the "end of page" logic in action on line 87.
Here you see that TextRenderInfo has all kinds of attributes related to position. You can use get_baseline to access the baseline of the rendered text.
I solved my problem with pdfminer.six and PyMuPDF by getting line and character positions.
Thanks, all of you.
Update: This issue is now solved. It actually appears that the issue is that the Unicode characters are somehow not being recognized by Uniscribe at all, and so no fallback font is used at all. The only feasible system-wide solution without trying to repair or change Uniscribe appears to be to change the name of Segoe UI Symbol to Arial (this can be done by converting the file to .ttx XML format and manually editing the file, then converting back to .ttf), and use this new "Arial" font as the base font which other Unicode fonts can be merged into with FontTools. I discuss what I've learned about merge / pyftmerge in more detail here.
The issue I'm trying to solve is the exact same issue/question asked here many years ago:
https://superuser.com/questions/784078/some-unicode-fonts-not-working-in-windows-7-firefox
If you read through the question/responses, you'll see that no actual conclusion was reached and no actual answer was found as to why some Windows 7 systems fail to display characters in uncommon Unicode blocks like Egyptian Hieroglyphics even though the relevant fonts are installed.
Furthermore, no general solution was found either (the OP concluded with a "solution" specifically for Firefox). The OP mentions that he has very extensively tried a large number of potential solutions to no avail.
My intuition is that this issue has nothing to do with graphics cards. I believe this is entirely a Windows issue and specifically a Uniscribe issue.
I suspect that the discrepancy between the OP testing the issue on his VM (where the unicode characters display properly) and on his local machine (where the unicode characters do not display), is possibly due to his local machine and his VM having different versions of usp10.dll. However, as the question is 6 years old and the OP does not appear to be active, there is no way to ask to confirm this hypothesis.
Upon inspecting usp10.dll with a hex editor, I found a decent number of intelligible ASCII sections.
My actual question is what "should" I do to solve this issue? And furthermore, if someone has a decent idea of how to actually execute one of the solutions that I'll propose below, please explain the nitty-gritty. (I don't consider "upgrade to Windows 10" a valid answer. Although I understand that there are many good reasons to do so, please do not post a reply if this is your only suggestion.)
From what I've gathered, I can see 3 solutions to this issue:
I can go out of my way to get a VM and a fresh installation of Windows 7 to test my hypothesis relating to usp10.dll. If my hypothesis was true, I can possibly transfer the "good" version of usp10.dll to my local machine. The downside is that I do actually have to go out of my way to do this, as I don't own a VM and I no longer have my Windows 7 installation disc.
I can experiment with manually editing usp10.dll locally and see if this produces any helpful results. For instance, there may be some parts of the code where I can append additional font names such as Symbola and Aegyptus. If I'm very lucky, this may fix the issue by allowing Windows to fallback to these fonts. The obvious downside is that I don't understand Uniscribe or Windows DLLs near enough to be "qualified" to do this. Furthermore, I would have to do this for every additional "obscure" unicode font that I install in the future.
Lastly, perhaps the easiest solution I can think of is to use the same approach that EmojiOneColor and TwitterColorEmoji use for their Windows installation procedure: merge additional fonts (in my case, Symbola and Aegyptus) into Segoe UI Symbol. I believe this should almost guarantee that these additional Unicode characters are displayed properly, since Segoe UI Symbol is used by Windows by default.
The obvious advantage of the last approach is that it's relatively easy to do and almost guaranteed to work. And if configured not to replace any existing glyphs, it would also give the same appearance as Windows using the correct fallback font. Furthermore, it can be automated with a script that you run every time you want to merge a new font into Segoe UI Symbol. I currently don't have such a script, but maybe someone here can help me with that (TwitterColorEmoji uses Python FontTools). The downside to this approach is that font files have a maximum number of glyphs (65,535 for TrueType, since glyph IDs are 16-bit), so there is a limit to how many fonts I can stack up inside Segoe UI Symbol.
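A minimal sketch of such a script using fontTools' pyftmerge tool. The file names are placeholders, and my understanding (worth verifying on your fonts) is that glyphs from earlier fonts on the command line take priority, so the base font's glyphs are kept:

```python
# Sketch of automating the merge with fontTools' pyftmerge (fontTools is
# the library the emoji fonts' installers use). By default pyftmerge
# writes its result to merged.ttf in the working directory.
import os
import shutil
import subprocess

def merge_command(base: str, extras):
    """Build the pyftmerge argv; fonts merge left to right, so the base
    font is listed first and its glyphs should win on conflict."""
    return ["pyftmerge", base, *extras]

cmd = merge_command("SegoeUISymbol.ttf", ["Symbola.ttf", "Aegyptus.ttf"])
print(cmd)

# Only actually run it if the tool and the font files are present.
if shutil.which("pyftmerge") and all(os.path.exists(f) for f in cmd[1:]):
    subprocess.run(cmd, check=True)
```

You would then rename/install the resulting file in place of Segoe UI Symbol, as described above.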
I am writing a Discord bot that greets new users with images (with their names written on them) using Pillow. And people often tend to use foreign Unicode characters in their names (or names in different languages such as Chinese, Japanese etc.) that my font doesn't support. They look as blank boxes. Examples:
你好
| |ƶลƒҡเεŁ| |
G̷̈̐e̸̾̾n̶͛̊e̴̊͗r̴̾́a̴͆̑t̸̿̌o̶̽̃r̶̈́̔Z̶̈́̑a̸̋̀l̸͋͝g̵̀̓
I've tried several fonts and learned that I can't just use a single font that supports all of this. Then I've come across Google Noto and apparently their font family supports all kinds of languages.
So here's my plan:
Check every character in a string and see if the main font supports it
If yes, just draw the character
If no, find a different font that supports it, then draw it.
There are several issues with this:
I shouldn't iterate over all of the fonts for each character and check if one of them supports it. It'll be really bad performance-wise.
I don't know how I would draw each character and keep the result looking organized. I assume there will be a noticeable size and style difference between fonts.
So far I've found the fontTools library that will let me check if a font supports a character or not, but I don't know how I should proceed from this point. I'm curious about;
How does almost every website support and handle such text?
How would I achieve my goal using Pillow in Python?
Some help would be appreciated. Thank you.
I had a similar problem. I had to analyze all the fonts I use and create a database of fonts and the characters they cover.
A simple query against this DB lets you detect all the required fonts.
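A minimal sketch of that database approach in plain Python. The coverage sets here are toy data and the font names are placeholders; in practice you would read each font's cmap once with fontTools (TTFont(path).getBestCmap().keys()) and cache the result:

```python
# Precomputed codepoint coverage per font: one dict lookup per character
# instead of scanning every font file each time.
COVERAGE = {
    "NotoSans": set(map(ord, "Hello ")),
    "NotoSansCJK": {ord("你"), ord("好")},
}

def pick_font(ch: str, fallback: str = "NotoSans") -> str:
    for name, codepoints in COVERAGE.items():
        if ord(ch) in codepoints:
            return name
    return fallback  # unsupported chars still occupy a slot in the layout

def split_runs(text: str):
    """Group consecutive characters sharing a font into (font, run) pairs."""
    runs = []
    for ch in text:
        font = pick_font(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs

print(split_runs("Hello 你好"))
# [('NotoSans', 'Hello '), ('NotoSansCJK', '你好')]
```

With Pillow you would then draw each run with that run's ImageFont.truetype face, advancing the x offset with draw.textlength(run, font=...) (available in recent Pillow versions). Sticking to one family like Noto keeps the size and style differences between runs small.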
I am making some assumptions here on technology based on what I know, but other technology recommendations are welcome.
My goal: Write an ANSI Art viewer that resembles viewing on a DOS machine as closely as possible, preferably without the overhead of running DOSBox. This will run on a Raspberry Pi.
I have gotten my console to properly cat an ANSI file with the proper characters, colors, etc. The catch with the "viewer" is that I would like to be able to use the arrow keys to scroll up and down through the document, much like the "less" command does.
From what I have been able to research, curses is a perfect candidate for this. The problem is that curses does not support ANSI escape code sequences. There is an ANSI editor written in C++ that uses curses, but it builds its own support for parsing the escape code sequences. Right now this is my last resort.
So my question is: Is there a better route to creating a scrollable console-mode application for viewing ANSI Art (Code Page 437 + ANSI escape code sequences) in python on linux?
There are really only two possibilities: Parse the ANSI sequences into something curses can accept, or use the ANSI sequences as-is.
At first, the latter may seem more attractive. Most of the limitations are either irrelevant to you, or easy to deal with:
It only works for static ANSI art, not animated files. Which is pretty reasonable, because it wouldn't make much sense to "scroll up" in an animated file. (You could of course render it on the fly to a canvas and then scroll a window up and down within that, but once you start thinking about what that rendering and windowing means… you're parsing ANSI art.) But it sounds like you only need static ANSI art.
It only works if your terminal is (close enough to) ANSI compatible… but it is (or can be made so) or your cat command wouldn't work. (You may still have a problem with the color settings, but I assume you know how to work around that too.)
It only works if your terminal is cp437, which may be more of a problem… but that's trivial to work around; just decode('cp437') and then encode as appropriate in your code; the escape sequences will pass through unchanged.
You probably want raw keyboard input. But this is as easy as tty.setraw(sys.stdin.fileno()) and just reading stdin as an unbuffered file. (Well, you may want to stash the original tcgetattr so you can restore it later, but that's not much harder.)
You'll have to parse keyboard input escape sequences yourself. This is a lot of work to do generally… but just handling the up and down arrows for ANSI-art-compatible terminals is easy.
You'll have to know how to map the ANS file to actual lines.
That last one sounds like the easy part, but it's not. For example, I grabbed a random file, GR-BANT; it's only 33 lines long, but it's got 154 newlines in it. And that's going to be pretty common. In many cases, it's just going to be "overlay lines" that start with esc-[-A, that you have to treat as part of the previous line, which is not hard to do, but there will be plenty of cases that require something more than that.
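The overlay case can be sketched as follows (this only handles the literal esc-[-A form mentioned above; real files will have variants like esc-[-2-A that need fuller parsing):

```python
# Fold "overlay lines" into their predecessor: a line beginning with
# ESC [ A (cursor up) re-draws on top of the previous line, so it should
# not count as a new logical line.
def logical_lines(text: str):
    lines = []
    for line in text.split("\n"):
        if line.startswith("\x1b[A") and lines:
            lines[-1] += line  # keep the sequence; the renderer sorts it out
        else:
            lines.append(line)
    return lines

art = "first\n\x1b[Aoverlay\nsecond"
print(len(logical_lines(art)))  # 2
```

Even this toy version shows why the newline count and the visual line count diverge so sharply.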
So, you're going to have to do at least some ANSI parsing, no matter what.
And once you start on that, I think you'll find an easier time with your "last resort" of doing a full parse and drawing manually to a curses pad. (And of course this has the side effects of making it possible to handle animated files, working on non-ANSI terminals, handling "keypad" keys more easily and on all terminals, …)
But if you want to go the second way, here is a quick hack that should get you started.
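A minimal sketch along those lines (my own illustration of the approach described above, not the original hack): decode cp437, read raw arrow keys, and scroll by re-printing a window of lines. The interactive loop only runs on a real terminal; the helpers are plain functions.

```python
import sys

ARROWS = {b"\x1b[A": -1, b"\x1b[B": +1}  # up / down scroll deltas

def load_ansi(data: bytes):
    """Decode the DOS bytes; ANSI escape sequences pass through unchanged."""
    return data.decode("cp437").split("\n")

def clamp(top: int, n_lines: int, height: int) -> int:
    """Keep the top visible line inside the document."""
    return max(0, min(top, n_lines - height))

def read_key(stream) -> bytes:
    """Read one keypress; arrow keys arrive as the three bytes ESC [ A/B."""
    ch = stream.read(1)
    if ch == b"\x1b":
        ch += stream.read(2)
    return ch

def view(path: str, height: int = 24) -> None:
    import termios, tty
    with open(path, "rb") as f:
        lines = load_ansi(f.read())
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)            # stash so we can restore later
    tty.setraw(fd)
    top = 0
    try:
        while True:
            sys.stdout.write("\x1b[2J\x1b[H")  # clear screen, home cursor
            sys.stdout.write("\r\n".join(lines[top:top + height]))
            sys.stdout.flush()
            key = read_key(sys.stdin.buffer)
            if key == b"q":
                break
            top = clamp(top + ARROWS.get(key, 0), len(lines), height)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)

if __name__ == "__main__" and sys.stdin.isatty() and len(sys.argv) > 1:
    view(sys.argv[1])
```

Note this splits on raw newlines, so a real viewer would plug in the overlay-line handling discussed earlier before windowing.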
I need to find global caret position in Linux. The problem is similar to this one for Windows. Do you guys have any idea?
More information:
I am trying to make an input method for a certain Indic language. I am using the IBus libraries in Python. I need to create something like the lookup table found in IBus, but my requirements are such that I decided it's better to build the whole thing again using tk (or something). The link in the question solves this problem for Windows, where a tooltip follows the text caret. So I need something just like that, but for X-Windows.
There is no such thing as a caret position in X11. While the older UIM framework did a fairly good job of displaying the input method UI near the cursor position, this failed often enough that it was abandoned.
You might want to take a look at the SCIM framework. Note that it is usually preferred to hint the application at the completion state rather than provide a separate editor, as this gives a more seamless integration.
I figured it out! All I had to do was create a method in my IBus engine class (a subclass of IBus.Engine) called do_set_cursor_location, which handles the signals emitted when the position of the caret changes. Here is more from the IBus manual: the "set-cursor-location" signal.
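For illustration, here is a plain-Python stand-in (no gi/IBus imports, so the gi plumbing is omitted); the (x, y, w, h) arguments are the caret rectangle in screen coordinates, as the manual describes:

```python
# Mocked-up version of the handler: IBus delivers the caret rectangle,
# and we place the lookup window just below it.
class LookupWindow:
    """Stand-in for a tk.Toplevel; records where it was moved to."""
    def __init__(self):
        self.geometry = None
    def move(self, x: int, y: int):
        self.geometry = (x, y)

class Engine:  # in real code: class Engine(IBus.Engine)
    def __init__(self, window: LookupWindow):
        self.window = window
    def do_set_cursor_location(self, x, y, w, h):
        # (x, y, w, h) is the caret rectangle on screen; drop the lookup
        # window just under the caret
        self.window.move(x, y + h)

win = LookupWindow()
Engine(win).do_set_cursor_location(100, 200, 2, 20)
print(win.geometry)  # (100, 220)
```

In the real engine the move() call would be a tk geometry update on the tooltip window.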
That means the problem is solved for now, although I certainly don't know what is happening under the hood.
Thanks guys.