Japanese characters won't appear when printed

I am printing Unicode characters in Python. Every symbol I have used so far works except for Japanese characters: when I print them, the console only shows the "question mark in a box" symbol. How can I fix this?
When I first encountered the problem I thought it might be Python. I searched Google, but I found almost nothing.
Then I wondered if it was Command Prompt, which I use to test my code. Again, no relevant results.
My code uses a list of Unicode characters so I don't have to look up and type each code point every time. This is what it looks like:
UD = [u"\u3053", u"\u3093", u"\u306B", u"\u3061", u"\u306F"]
UDTemp = UD[0] + UD[1] + UD[2] + UD[3] + UD[4]
print(UDTemp)
When printing, I expected "こんにちは", but instead I got the weird symbols.

The console font has to support the characters. For example, I have East Asian IMEs installed on a US Windows 10 system, and they make fonts that support Japanese available.
The easiest way to obtain the fonts you need is to add language support for the desired language in Windows 10. To add a language, search for "Language settings".
Once the language is installed, fonts supporting that language will appear in the Console properties, and IMEs will be installed so you can type in that language if you know how to use them.
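Before changing fonts, it can help to confirm that the font is the problem rather than the console encoding. The following is a minimal sketch, not a definitive diagnostic: can_encode is a hypothetical helper (it is not part of any library) that checks whether a string can be represented in a given encoding. If the console's encoding can represent the text and boxes still appear, the font is the likely culprit.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The five hiragana from the question, written as escapes.
greeting = "\u3053\u3093\u306b\u3061\u306f"

# Encoding Python uses for this console; it can be None when stdout is
# redirected, so fall back to UTF-8 (an assumption made for this sketch).
console_encoding = sys.stdout.encoding or "utf-8"

print("console encoding:", console_encoding)
print("representable:", can_encode(greeting, console_encoding))
```

If "representable" is True, Python can hand the characters to the console, and what is drawn then depends on the font selected in the Console properties.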

Related

Unexpected question mark when trying to regex replace

I run this file test.py in my Sublime venv Python build system:
import re
text = "skull ☠️..."
print(text)
print(repr(text))
x = re.sub(r' *[\u2600-\u26FF]', r'', text)
print(x)
print(repr(x))
And see the output in Sublime window as expected:
skull ☠️...
'skull ☠️...'
skull️...
'skull️...'
But when I run the same file from the command line in Windows 10, I get strange question marks in the output.
In Google Colab it also works as expected.
The result still contains an invisible character at index 5.
What's happening here? How can I remove ☠️ without leaving any question marks or zero-width characters in its place?

To identify the character that is left over, you can paste it into an online Unicode inspection tool.
The remaining character is U+FE0F : VARIATION SELECTOR-16 [VS16] {emoji variation selector},
and you can match or replace it with: \uFE0F
Combined with your current pattern: [\u2600-\u26FF\uFE0F]
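As a sketch of the fix, the snippet below applies the original pattern and the extended one to the same input. The input is written with escapes so the invisible U+FE0F is explicit; the variable names are made up for illustration.

```python
import re
import unicodedata

# U+2620 SKULL AND CROSSBONES followed by U+FE0F, as in the question.
text = "skull \u2620\ufe0f..."

# Original pattern: removes the symbol but leaves the variation selector.
leftover = re.sub(r" *[\u2600-\u26FF]", "", text)

# Extended character class: also removes U+FE0F.
cleaned = re.sub(r" *[\u2600-\u26FF\uFE0F]", "", text)

# ascii() makes invisible characters visible as escapes.
print(ascii(leftover))
print(unicodedata.name(leftover[5]))
print(ascii(cleaned))
```

Using ascii() instead of a plain print avoids depending on what the console can draw, which is the source of the question marks in the first place.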

The Windows command prompt is a text user interface, so it is worth asking whether you need to output graphic symbols like emojis on it at all. The font configured for drawing characters in a Windows console window must support the characters and symbols you want to see there.
So you need to add a font to cmd that can draw this emoji. Here is a guide on adding custom fonts to the command prompt: https://www.maketecheasier.com/add-custom-fonts-command-prompt-windows10/
The default Windows console host (conhost.exe) has limited support for displaying Unicode characters such as emoji. The newer Windows Terminal (wt.exe) has much fuller Unicode support, so try running the code there.
See also this answer: "does all windows command prompt not support emoji?"
For background on encodings and character sets, this article is a very good read: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
I hope this helps.

Advanced input in python

I want to receive some information from a user in the following way:
"My score is  of 10" is already printed.
Between "is" and "of" there is an empty space for the user's input, so the user enters the value in the middle of the line rather than at the end (as with a plain input()). While the user is typing, the text appears between "is" and "of".
Is there any way to do this?

One way to get something close to what you want, if your terminal supports ANSI escape codes:
x = input("My score is \x1b[s of 10\x1b[u")
\x1b is the escape character. Neither escape sequence is displayed on the screen; instead, each one introduces a byte sequence that the terminal interprets as an instruction. ESC[s tells the terminal to remember where the cursor is at the moment. ESC[u tells the terminal to move the cursor back to the last-remembered position.
(The rectangle is the cursor in an unfocused window.)
Using a library that abstracts away the exact terminal you are using is preferable, but this gives you an idea of how such libraries interact with your terminal: it's all just bytes written to standard output.
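The same idea can be wrapped in a small helper. This is only a sketch under the same assumption as above (a terminal that honors ESC[s and ESC[u); inline_prompt and ask are hypothetical names, not part of any library.

```python
ESC = "\x1b"
SAVE_CURSOR = ESC + "[s"     # ask the terminal to remember the cursor position
RESTORE_CURSOR = ESC + "[u"  # ask the terminal to jump back to that position


def inline_prompt(before, after):
    """Build a prompt whose input cursor ends up between before and after."""
    return before + SAVE_CURSOR + after + RESTORE_CURSOR


def ask(before, after):
    """Show the prompt and read one line of input from the user."""
    return input(inline_prompt(before, after))
```

Calling ask("My score is ", " of 10") in an ANSI-capable terminal behaves like the one-liner above. Note that the typed characters overwrite the text after the cursor, so a long answer will overwrite " of 10".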

If you are working in a console, consider using the curses library. It ships with Python on Linux; for Windows, a build can be downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/#curses
With this library you have full control over the console. The answer to this related question covers your case:
How to input a word in ncurses screen?

How to type Greek letters with Sikuli

I am using Sikuli to complete some forms and I have to type Greek letters on some of them.
I can define a string with Greek letters, for example a='Γεια σου', and even print it using the Python 3.5.2 Shell (on Windows). However, when I use the type command in SikuliX, the program crashes. The paste command does not raise an error, but it does not type the correct word either (it types other symbols).
Is there any way to type the correct Greek letters? (couldn't find anything in Google)
Added later: I noticed that typing ALT+(a number 896-919) gives Greek capital letters. I tried this with KeyDown(Key.ALT) on Sikuli but it doesn't work - it types nothing.

As stavros11 already noted, https://answers.launchpad.net/sikuli/+question/260734 holds the solution.
Here is a small sample script:
openApp("notepad.exe")
find("1470067652176.png") #set focus to notepad window
paste(unicode("Δ δ", "utf-8"))
paste(ucode("Δ δ"))

docstring (triple quotes) in iPython/Jupyter with autoclose brackets/quotes?

I'm trying to use docstrings with triple quotes in my Jupyter notebooks using Python 2.7.
I can disable the autoclose brackets/quotes feature, but I'm quite keen on it; it is a major boost to my workflow.
Does anyone know how to type triple quotes without over-quoting while keeping the autoclose feature?
If I press the " key 3x I get """""";
If I press it 3x and delete once, I get """"; and
If I press it 3x and delete twice, I get ""
Annoying, right? How can I have the best of both worlds (autoclose and docstrings)?
This is a pretty low-level question, but I haven't seen an easy fix anywhere so the answer should be useful for the community. If you downvote, can you explain why this is a poor question please?

Nothing is wrong. When you type three " characters, your cursor sits in the middle of the resulting six. Thus, anything you type next is inside the string, which has already been auto-closed.
Type this exact sequence of characters, without clicking or otherwise moving the cursor: """This is working
The result will be a correctly formatted string, because the editor has already auto-closed it. So you get both docstrings and autoclose.

PDFtotext - whitespace showing as aacute on commandline

I am extracting text using python from a textfile created from pdf using pdftotext. It is one of 2000 files and in this particular one, a line of keywords ends in EU. The remainder of the line is blank to the naked eye and so is the following line.
The program normally strips off any trailing blanks at the end of a line and ignores the subsequent blank line.
In this instance, the whitespace is kept: it shows up after "EU." when the output is written to a text file, and similarly in HTML (Simile Exhibit).
I also printed to the command line, and there I see a string of a-acute characters.
I thought the obvious way to deal with this was to search for and replace the a-acute. I've tried to do that with a compile statement, and I've played with permutations of decoding the incoming text.
Oddly though, when I print "\255" I don't get an aacute, I get an o grave.
It seems likely with this odd combination of errors that I have misunderstood something fundamental. Any tips of how to begin unravelling this?
Many thanks.

The first tip is not to print wildly to all possible output mechanisms using various unstated encodings. Find out exactly what you have got. Do this:
print repr(the_line_with_the_problem) # Python 2.x
print(ascii(the_line_with_the_problem)) # Python 3.x
and edit your question and copy/paste the result.
Second tip: When asking for help, give information about your environment:
What version of Python? What version of what operating system?
Also show locale-related info; the following example is from my computer running Python 2.7 in a Windows 7 Command Prompt window:
>>> import sys, locale
>>> sys.getdefaultencoding()
'ascii'
>>> sys.stdout.encoding
'cp850'
>>> locale.getdefaultlocale()
('en_AU', 'cp1252')
>>>
Third tip: Don't use your own jargon ... the concepts "Simile Exhibit", "printed to the command line", and "compile statement" need explanation.
What is the relevance of "\255"? Where did you get that from?
Wild guesses while waiting for some facts to emerge:
(1) The offending character is U+00A0 NO-BREAK SPACE aka NBSP which appears in your text as "\xA0" and when sent to stdout in a Western European locale on Windows using a Command Prompt window would be treated as being encoded in cp850 and thus appear as a-acute. How this could be transmogrified into o-grave is a mystery.
(2) "\255" == "\xAD" implies the offending character is U+00AD SOFT HYPHEN, but why this would be seen as o-grave is a mystery, and it's not "whitespace"; it shouldn't be shown at all, and if it is shown it should be as a hyphen/minus-sign, not a space.
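The two guesses can be checked in Python 3 with a short sketch. It assumes guess (1) is right, that is, the line contains the byte 0xA0 in a Latin-1-style encoding; raw_line is made-up sample data, not the asker's actual file content.

```python
import unicodedata

# Guess (1): the byte 0xA0 is NBSP in Latin-1 but a-acute when read as cp850.
nbsp_name = unicodedata.name("\xa0")
seen_in_cp850 = b"\xa0".decode("cp850")

# Guess (2): "\255" is an octal escape, so it is the same character as "\xad".
same_char = "\255" == "\xad"
soft_hyphen_name = unicodedata.name("\255")

# Byte strings only strip ASCII whitespace, so NBSP survives rstrip();
# Python 3 text strings treat NBSP as whitespace and strip it.
raw_line = b"EU.\xa0\xa0\n"  # hypothetical sample line
stripped_bytes = raw_line.rstrip()
stripped_text = raw_line.decode("latin-1").rstrip()

# ascii() keeps the output readable whatever the console encoding is.
print(nbsp_name, ascii(seen_in_cp850))
print(same_char, soft_hyphen_name)
print(ascii(stripped_bytes), ascii(stripped_text))
```

If the program strips lines as byte strings, this would explain why the trailing "blank" characters are kept even though ordinary spaces are removed.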
