Can I force character encoding inside a python script? [duplicate]


This question already has an answer here:
Task output encoding in VSCode
(1 answer)
Closed 2 years ago.
I'm having a problem with characters not rendering properly.
Background: I am taking online courses to learn Python. I use VSCode as my IDE along with various python extensions.
Problem: Some of the lessons I solve have characters that are beyond the 128 standard ASCII set.
Sample: For clarity, this is the full script I'm dealing with. The current lesson has text containing a small e with acute (é) in a painting named Vétheuil in the Fog. Unfortunately that acute-e character is rendered as a placeholder (�), so the output reads: V�theuil in the Fog.
Efforts: I have done some searching and thought I found a solution: including an encoding flag at the beginning of the python script like this:
# coding=UTF-8
No joy.
Am I tilting at a windmill / misunderstanding purpose or application?
Money Question: Is there a way to get the character to properly render when I run the script?

If you want Unicode characters to be displayed correctly, you need a terminal that supports Unicode, or you need to change the terminal's settings to use Unicode if it provides that option.
In the same way, if you don't need anything beyond extended ASCII characters like "é", you can configure the terminal for an extended-ASCII encoding instead, as long as the terminal offers one.
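The placeholder typically appears when bytes written in one encoding are decoded as another by whatever displays them. The sketch below reproduces the symptom from the question in plain Python. It assumes, purely for illustration, that the output was emitted in a single-byte encoding (Latin-1 here) and read back as UTF-8; the actual encodings in the asker's setup are not known, and show_as is a hypothetical helper name.

```python
import sys

def show_as(text, written_as, read_as):
    """Encode `text` as the program would, then decode it as the display would."""
    return text.encode(written_as).decode(read_as, errors="replace")

# \u00e9 is the acute e, escaped so this snippet itself stays pure ASCII.
title = "V\u00e9theuil in the Fog"

matched = show_as(title, "utf-8", "utf-8")       # both sides agree
mismatched = show_as(title, "latin-1", "utf-8")  # assumed mismatch, for illustration

# The encoding Python will use when printing; None if stdout was replaced
# by an object without an encoding attribute.
stdout_encoding = getattr(sys.stdout, "encoding", None)
```

When both sides agree, the text survives unchanged. In the mismatched case the lone byte for é is not valid UTF-8 and is replaced by U+FFFD, the same placeholder seen in the question. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") changes what Python emits, but the program displaying the output still has to decode it as UTF-8.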

This has been sufficiently addressed.
The VSCode panel below the editor region shows different things in different tabs. The OUTPUT tab renders the characters differently from the TERMINAL tab, which displays them as expected, so this is a question of my setup rather than an error. There is no problem to be solved beyond my misunderstanding of what was happening.
As @user2357112supportsMonica also pointed out, "An encoding comment declares what encoding your script itself is written in," not how the output is rendered.
Thank you all for helping me through this.
Just to cap this topic. A PERFECT solution to this is provided here:
https://discuss.codecademy.com/t/general-nerdiness-questions/541682/7?u=coreblaster01537

Related

How do I fix Unicode Display Issue in Windows 7 with either FontTools Font-Merging or Manually Editing a System File?

Update: This issue is now solved. It actually appears that the issue is that the Unicode characters are somehow not being recognized by Uniscribe at all, and so no fallback font is used at all. The only feasible system-wide solution without trying to repair or change Uniscribe appears to be to change the name of Segoe UI Symbol to Arial (this can be done by converting the file to .ttx XML format and manually editing the file, then converting back to .ttf), and use this new "Arial" font as the base font which other Unicode fonts can be merged into with FontTools. I discuss what I've learned about merge / pyftmerge in more detail here.
The issue I'm trying to solve is the exact same issue/question asked here many years ago:
https://superuser.com/questions/784078/some-unicode-fonts-not-working-in-windows-7-firefox
If you read through the question/responses, you'll see that no actual conclusion was reached and no actual answer was found as to why some Windows 7 systems fail to display characters in uncommon Unicode blocks like Egyptian Hieroglyphics even though the relevant fonts are installed.
Furthermore, no general solution was found either (the OP concluded with a "solution" specifically for Firefox). The OP mentions that he has very extensively tried a large number of potential solutions to no avail.
My intuition is that this issue has nothing to do with graphics cards. I believe this is entirely a Windows issue and specifically a Uniscribe issue.
I suspect that the discrepancy between the OP testing the issue on his VM (where the unicode characters display properly) and on his local machine (where the unicode characters do not display), is possibly due to his local machine and his VM having different versions of usp10.dll. However, as the question is 6 years old and the OP does not appear to be active, there is no way to ask to confirm this hypothesis.
Upon inspecting usp10.dll with a hex editor, there are actually a decent amount of intelligible ASCII sections:
My actual question is what "should" I do to solve this issue? And furthermore, if someone has a decent idea of how to actually execute one of the solutions that I'll propose below, please explain the nitty-gritty. (I don't consider "upgrade to Windows 10" a valid answer. Although I understand that there are many good reasons to do so, please do not post a reply if this is your only suggestion.)
From what I've gathered, I can see 3 solutions to this issue:
I can go out of my way to get a VM and a fresh installation of Windows 7 to test my hypothesis relating to usp10.dll. If my hypothesis is correct, I can possibly transfer the "good" version of usp10.dll to my local machine. The downside is that I really do have to go out of my way to do this, as I don't own a VM and I no longer have my Windows 7 installation disc.
I can experiment with manually editing usp10.dll locally and see if this produces any helpful results. For instance, there may be some parts of the code where I can append additional font names such as Symbola and Aegyptus. If I'm very lucky, this may fix the issue by allowing Windows to fall back to these fonts. The obvious downside is that I don't understand Uniscribe or Windows DLLs nearly well enough to be "qualified" to do this. Furthermore, I would have to do this for every additional "obscure" Unicode font that I install in the future.
Lastly, perhaps the easiest solution I can think of is to use the same approach that EmojiOneColor and TwitterColorEmoji use for their Windows installation procedure: merge additional fonts (in my case, Symbola and Aegyptus) into Segoe UI Symbol. I believe this should almost guarantee that these additional Unicode characters are displayed properly, since Segoe UI Symbol is used by Windows by default.
The obvious advantage of the last approach is that it's relatively easy to do and almost guaranteed to work. If the merge is set up not to replace any existing glyphs, it would also give the same appearance as Windows using the correct fallback font. Furthermore, it can be automated with a script that you can run every time you want to merge a new font into Segoe UI Symbol. I currently don't have such a script, but maybe someone here can help me with that (TwitterColorEmoji uses Python FontTools). The downside to this approach is that I believe font files have a maximum number of glyphs, so there might be a limit to how many fonts I can stack up inside of Segoe UI Symbol.
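A rough sketch of what such a merge script could look like with FontTools is shown below. It is not the script used by TwitterColorEmoji. The function names and any file paths passed in are hypothetical, and the merge itself is untested against the fonts named above. It relies on fontTools.merge.Merger and fontTools.ttLib.TTFont, and on the fact that glyph IDs in TrueType/OpenType are 16-bit, which caps a single font at 65,535 glyphs. As far as I understand, the merger also expects all input fonts to share the same units-per-em, so some fonts may need rescaling first.

```python
MAX_GLYPHS = 65535  # glyph IDs are 16-bit in TrueType/OpenType fonts

def fits_in_one_font(glyph_counts):
    """Upper-bound check: True if the summed glyph counts stay within MAX_GLYPHS."""
    return sum(glyph_counts) <= MAX_GLYPHS

def merge_fonts(base_path, extra_paths, out_path):
    """Merge the extra fonts into the base font and save the result.

    All paths are placeholders supplied by the caller (hypothetical).
    """
    # Imported lazily so fits_in_one_font() works without FontTools installed.
    from fontTools.merge import Merger
    from fontTools.ttLib import TTFont

    paths = [base_path, *extra_paths]
    counts = []
    for path in paths:
        font = TTFont(path)
        counts.append(len(font.getGlyphOrder()))
        font.close()

    if not fits_in_one_font(counts):
        raise ValueError("merged font would exceed %d glyphs" % MAX_GLYPHS)

    merged = Merger().merge(paths)  # the base font is listed first
    merged.save(out_path)
    return out_path
```

The glyph-count check is deliberately conservative: it sums the counts before merging, so it may reject a combination that would fit once duplicate glyphs are accounted for.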

Most pythonic way to clear console? [duplicate]

This question already has answers here:
Clear terminal in Python [duplicate]
(27 answers)
Closed 3 years ago.
So obviously you can clear the console with os.system('clear') but this seems like a very bodged solution to me.
Is there a more elegant way to clear the console?
I feel that this question is different from Clear terminal in Python because I am not asking simply how to clear the terminal, I am asking which is the most pythonic. Python - Clearing the terminal screen more elegantly is not what I am looking for either, as the marked answer there still does not feel very elegant or pythonic. Using escape characters or calling the command and checking the output seems even more like a bodge than an actual solution.

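import curses
stdscr = curses.initscr()  # start curses on the (default) full terminal window
stdscr.erase()             # blank the contents of that window
stdscr.refresh()           # draw the now-empty window to the physical screen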
Rather than assuming specific character control codes, the curses package was developed for portable full screen character control in the heyday of many different video terminals, with many different and incompatible control sequences. It also turns the use of such codes from cryptic raw escape sequences to more readily understandable named functions, such as "erase." Look at the curses package documentation to see all of the capabilities. Here, we initialize curses to the (default) full terminal window, then erase the contents of that window.
As for this being Pythonic, it's hard to see something more Pythonic than using a standard Python package: initialize it, then clear the screen with one line of code...

How to create scrollable console application that support ANSI escape code sequences

I am making some assumptions here on technology based on what I know, but other technology recommendations are welcome.
My goal: Write an ANSI Art viewer that resembles viewing on a DOS machine as closely as possible, preferably without the overhead of running dosbox. This will run on a Raspberry Pi.
I have gotten my console to properly cat an ANSI with proper characters, colors, etc. The catch with the "viewer" is that I would like to be able to use the arrow keys to scroll up and down through the document, much like, say, the "less" command does.
From what I have been able to research, curses is a perfect candidate for this. The problem is that curses does not support ANSI escape code sequences. There is an ANSI editor written in C++ that uses curses, but it builds its own support for parsing the escape code sequences. Right now this is my last resort.
So my question is: Is there a better route to creating a scrollable console-mode application for viewing ANSI Art (Code Page 437 + ANSI escape code sequences) in python on linux?

There are really only two possibilities: Parse the ANSI sequences into something curses can accept, or use the ANSI sequences as-is.
At first, the latter may seem more attractive. Most of the limitations are either irrelevant to you, or easy to deal with:
It only works for static ANSI art, not animated files. Which is pretty reasonable, because it wouldn't make much sense to "scroll up" in an animated file. (You could of course render it on the fly to a canvas and then scroll a window up and down within that, but once you start thinking about what that rendering and windowing means… you're parsing ANSI art.) But it sounds like you only need static ANSI art.
It only works if your terminal is (close enough to) ANSI compatible… but it is (or can be made so) or your cat command wouldn't work. (You may still have a problem with the color settings, but I assume you know how to work around that too.)
It only works if your terminal is cp437, which may be more of a problem… but that's trivial to work around; just decode('cp437') then encode as appropriate in your code; the escape sequences are going to pass through unchanged.
You probably want raw keyboard input. But this is as easy as tty.setraw(sys.stdin.fileno()) and just reading stdin as an unbuffered file. (Well, you may want to stash the original tcgetattr so you can restore it later, but that's not much harder.)
You'll have to parse keyboard input escape sequences yourself. This is a lot of work to do generally… but just handling the up and down arrows for ANSI-art-compatible terminals is easy.
You'll have to know how to map the ANS file to actual lines.
That last one sounds like the easy part, but it's not. For example, I grabbed a random file, GR-BANT; it's only 33 lines long, but it's got 154 newlines in it. And that's going to be pretty common. In many cases, it's just going to be "overlay lines" that start with esc-[-A, that you have to treat as part of the previous line, which is not hard to do, but there will be plenty of cases that require something more than that.
So, you're going to have to do at least some ANSI parsing, no matter what.
And once you start on that, I think you'll find an easier time with your "last resort" of doing a full parse and drawing manually to a curses pad. (And of course this has the side effects of making it possible to handle animated files, working on non-ANSI terminals, handling "keypad" keys more easily and on all terminals, …)
But if you want to go the second way, here is a quick hack that should get you started.
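Independently of that hack, the two easy pieces described above (decoding CP437, and treating overlay lines that start with esc-[-A as part of the previous line) can be sketched as follows. This is a minimal illustration, not the answer's original code: load_ans_lines and visible_window are hypothetical names, and it handles only the simple cursor-up overlay case, not the harder ones mentioned above.

```python
CURSOR_UP = "\x1b[A"  # the "esc-[-A" sequence that begins an overlay line

def load_ans_lines(raw):
    """Decode CP437 bytes and fold overlay lines into the line they redraw."""
    text = raw.decode("cp437")  # escape sequences are ASCII and pass through unchanged
    lines = []
    for chunk in text.split("\n"):
        if chunk.startswith(CURSOR_UP) and lines:
            # Keep the newline so replaying the line still moves down, then back up.
            lines[-1] += "\n" + chunk
        else:
            lines.append(chunk)
    return lines

def visible_window(lines, top, height):
    """Return the logical lines shown when scrolled to `top`, clamped to the file."""
    top = max(0, min(top, max(0, len(lines) - height)))
    return lines[top:top + height]

# Two physical lines that form one logical line, followed by a plain line.
sample = b"\x1b[31m\xdb\xdb\r\n\x1b[A\x1b[32m\xdb\r\nnext"
logical = load_ans_lines(sample)
```

Here the three newline-separated chunks collapse to two logical lines, which is the unit the up and down arrows would scroll by. Writing the result to a UTF-8 terminal is then a matter of encoding each logical line before sending it to stdout.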

How to get PyCharm to display unicode data in its console?

I have switched over to PyCharm and have had a blast using it. I code for projects that use languages other than English (i.e. Hebrew and Arabic) and need to debug encodings once in a while. For some reason, PyCharm will not display Unicode characters in its debug console.
I have set the IDE encoding to UTF-8 but it did not help.
Any ideas?

The accepted answer is no longer correct. None of the default fonts makes a difference. I just spent a while going through this same problem, and the best solution is to modify your .bash_profile (or .zshrc) and include the line:
export PYTHONIOENCODING=UTF-8
In theory, you could also add this to your Environment Variables, which you can set from within Preferences->Build,Execution,Deployment->Python Console.
This approach, however, seems to be broken in the build I am using (4.0.4).
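To check that the variable is actually picked up, one option is to start a child interpreter with PYTHONIOENCODING set and ask it which encoding its standard output uses. This is a minimal sketch; child_stdout_encoding is a hypothetical helper, and it only inspects a freshly spawned interpreter, not PyCharm's console itself.

```python
import codecs
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # codecs.lookup() normalizes spellings such as "UTF-8" and "utf8".
    return codecs.lookup(result.stdout.strip()).name

utf8_child = child_stdout_encoding("UTF-8")
```

Running the same one-liner from inside the IDE's console, without the helper, shows whether the setting reached that process as well.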

You need to change the console font to one that contains the required Unicode glyphs.

SDL or PyGame international input

So basically, how is non-western input handled in SDL or OpenGL games or applications? Googling for it reveals http://sdl-im.csie.net/ but that doesn't seem to be maintained or available anymore. Just to view the page I had to use the Google cache.
To clarify, I'm not having any kind of issue in terms of the application displaying text in non-western languages to users. This is a solved problem. There are many unicode fonts available, and many different ways to process text into glyphs and then into display surfaces.
I run afoul in the opposite direction. Even if my program could safely handle text data in any arbitrary encoding, there's no way for users to actually type their name if it happens to include a character that requires more than one keystroke to produce.

You are interested in SDL_EnableUNICODE(). When you enable Unicode translation, you can use the unicode field of the SDL_keysym structure to get the Unicode character for the key the user typed.
Generally I think whenever you do text input (e.g. user focuses on a textbox) you should use the unicode field and not attempt to do translation yourself.
Here's something we did in YATC. Not really a shining example of how things should be done, but demonstrates the use of the unicode field.

Usually everybody just ends up using Unicode for the text to internationalize their apps.
As far as I remember, neither SDL nor OpenGL implements anything that would prevent you from implementing international input/output, but neither helps with it either.
There are utilities on top of OpenGL that you can use to render with .ttf fonts.

It appears there is now a Google Summer of Code project on this topic, for both X11 and Mac OS X.
