Sphinx, gettext and html : how can I create multiline translations?

The documentation of my project (using Python 3.2) is built with Sphinx (1.1.3) as HTML files. I have to write it in both English and French, which is why I build it with Sphinx and gettext, using the usual set of .po(t) and .mo files.
For example, my .po files contain entries like this one:
msgid "original text"
msgstr "translation"
My problem is that when a translation spans several lines, the generated HTML file loses the line breaks: the lines are merged into one big paragraph. I tried different things, such as:
msgid "original text"
msgstr "translation : first line \n second line"
But of course HTML ignores the \n character; same problem with the \r character.
Then I tried this:
msgid "original text"
msgstr "translation : first line <br> second line"
But all I get is the literal text <br> shown in the page instead of the expected line break. Same thing with <br/>.
What can I do? Any help would be appreciated!

Well, thanks to ms4py, I am able to answer my own question:
Add the following line to each .rst file in which you need line breaks:
.. |br| raw:: html
Then add an empty line, then an indented line containing <br />, then another empty line.
Use |br| in each translation wherever you want a line break, with a space before and after |br|.
Thank you!
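If repeating the substitution in every .rst file becomes tedious, one option is to define it once in conf.py through Sphinx's rst_prolog setting, which is prepended to every source file. This is only a sketch: it assumes a standard conf.py and that no other prolog is already defined in your project.

```python
# conf.py (Sphinx configuration file)
# rst_prolog is prepended to every reStructuredText source file,
# so the |br| substitution becomes available everywhere,
# including in translated paragraphs.
rst_prolog = """
.. |br| raw:: html

   <br />
"""
```

With this in place, a msgstr such as "first line |br| second line" should render with a line break in the HTML output, without touching the individual .rst files.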

Related

Sphinx - Split up long paragraph docstrings for internationalization

I'm trying to internationalize the documentation of a Python library using Sphinx and Crowdin.
With Sphinx I first generate the .pot files, but there is a problem with these files.
As mentioned in the Sphinx docs:
It is the maintainer’s task to split up paragraphs which are too large as there is no sane automated way to do that.
Here is an example of what I have:
...
#: ../../../disnake/client.py:docstring of disnake.client.Client:4
msgid "A number of options can be passed to the :class:`Client`."
msgstr ""
#: ../../../disnake/abc.py:docstring of disnake.abc.GuildChannel.clone:0
#: ../../../disnake/abc.py:docstring of disnake.abc.GuildChannel.create_invite:0
#: ../../../disnake/abc.py:docstring of disnake.abc.GuildChannel.delete:0
...
whereas I need every method docstring to have a msgid and an empty msgstr for translators.
Am I supposed to write a script to do this? If so, that script should extract paragraphs to use as msgid, but I don't know where to start. I have also searched the internet but found no example.
Thanks in advance.

How can I convert String (with linebreaks) to HTML?

When I print a string (in Python) that I scraped from a website, it looks like this:
"His this
is
a sample
String"
It does not show the \n characters. This is what I see in the Python interpreter.
I want to convert it to HTML that preserves the line breaks. I looked around and did not see any library that does this out of the box.
I was thinking of BeautifulSoup, but wasn't quite sure.

If you have a string that you have read from a file, you can simply replace \n with <br>, which is a line break in HTML:
my_string.replace('\n', '<br>')

You can use Python's str.replace(...) method to replace all line breaks with the HTML equivalent <br>, and possibly wrap the string in a paragraph tag <p>...</p>. Let's say the variable holding the text is called text:
html = "<p>" + text.replace("\n", "<br>") + "</p>"
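If the scraped text may contain characters such as < or &, it is probably safer to escape them before inserting the <br> tags. A minimal sketch, where text_to_html is a made-up helper name and the single <p> wrapper is just one possible choice:

```python
import html


def text_to_html(text: str) -> str:
    """Escape HTML special characters, then turn newlines into <br> tags."""
    # Escape first, so the <br> tags added afterwards are not escaped themselves.
    escaped = html.escape(text)
    return "<p>" + escaped.replace("\n", "<br>") + "</p>"


print(text_to_html("His this\nis\na sample\nString"))
```

Doing the replacement after escaping matters: the other way round, html.escape would turn the inserted <br> tags into visible text.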

While searching for this answer I found the following, which may be better because it also escapes HTML special characters (at least in Python 3):
Python – Convert HTML Characters To Strings
import html
# Sample text containing HTML entities
text = '&Gamma;eeks for &Gamma;eeks'
# Convert HTML entities to the characters they represent
print(html.unescape(text))
# Escape HTML special characters (&, <, >, quotes) as entities
print(html.escape(text))

I believe this will work:
# The newline escape is "\n" (not "/n"); str.replace returns a new string,
# so the result has to be assigned back to the variable.
text = text.replace("\n", "<br>")

Django French Translation - how to handle single quotes in translation strings?

I am using Python 3.5.2 and Django 1.10.
I have received the French translation .po file and can run the compilemessages command without receiving any errors.
However, when I run the site, many pages refuse to load.
I suspect that this is because the French translation .po file contains many single quotes (') in the translation strings.
For example,
#: .\core\constants\address_country_style_types.py:274
msgid "Ascension Island"
msgstr "Île de l'Ascension"
I remember reading somewhere (but cannot find the reference) that single quotes must be preceded by either a forward slash or a backslash. I tried that, but when I ran the compilemessages command, I got this error:
C:\Users\me\desktop\myapp\myapp\locale\fr\LC_MESSAGES\django.po:423:18: invalid control sequence
So how do I escape the single quotes in the French translation strings?
Here is the header of my French .po file:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2017-05-04 12:55+1000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"

I am unsure what the cause of this issue is (maybe the translator somehow corrupted the file?).
However, as a workaround, instead of the standard single quotation mark ', I have used this one (taken from the symbols in MS Word):
′
I have yet to check this with the French translator, but it looks and works OK.
I hope this helps someone.
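For what it's worth, PO strings use C-style escaping, so as far as I know only the backslash, the double quote and control characters such as newline need escaping. A plain apostrophe does not, and \' is not a recognized escape, which would be consistent with the "invalid control sequence" error above. A rough sketch of that rule, where po_escape and po_entry are made-up helper names and not part of Django or gettext:

```python
def po_escape(text: str) -> str:
    """Escape text for a double-quoted PO string (hypothetical helper)."""
    return (
        text.replace("\\", "\\\\")  # backslash first, to avoid double escaping
        .replace('"', '\\"')        # double quotes delimit the PO string
        .replace("\n", "\\n")       # newlines are written as \n
        .replace("\t", "\\t")       # tabs are written as \t
    )


def po_entry(msgid: str, msgstr: str) -> str:
    """Format one msgid/msgstr pair (hypothetical helper)."""
    return f'msgid "{po_escape(msgid)}"\nmsgstr "{po_escape(msgstr)}"\n'


# The apostrophe is left untouched.
print(po_entry("Ascension Island", "Île de l'Ascension"))
```

If the pages still fail to load with an unescaped apostrophe in the .po file, the problem is more likely in how the translated string is embedded in the template or in JavaScript than in the .po file itself.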

The correct way is to escape the single quote; however, you need to know the endpoint that consumes the text. You found this out with the backslash, as in:
L\'Ascension
Trust me, no French speaker will like seeing the backquote. Back in the DOS days of the 1990s there was visually almost no difference; now, with modern fonts, it gets ugly.
Since you're producing for the web, use an HTML entity, such as &apos;
See this article:
Why shouldn't `&apos;` be used to escape single quotes?

The solution is
#: .\core\constants\address_country_style_types.py:274
msgid "Ascension Island"
msgstr "Ile de l‘Ascension"
It works, even if it is used in some JavaScript. Don't use the numeric character reference (such as &#39;): it will not work inside form fields, it will not be rendered, and you will see the ugly code. I have already tested all this.
As I said in the comments, beginning a word with an uppercase accented letter is not recommended. If you write Île and then sort the list of countries, the Î character will come after Z and the list will not follow the natural order you would expect.
This is another problem with Python's default sorting. It compares strings by the numeric code point of each character, and Î has code point 206, so it comes after Z, which is 90.
Maybe Python provides a solution to this, but I haven't found it yet. If someone has, I would be glad to know.
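On the sorting question, one possible approach is to sort with a key that strips the accents before comparing, using the standard unicodedata module. This is only a sketch: accent_insensitive_key is a made-up name, the country list is illustrative, and it assumes that ignoring accents gives an acceptable order (locale.strxfrm is another option if the right locale is installed on the machine).

```python
import unicodedata


def accent_insensitive_key(s: str) -> str:
    """Sort key that ignores accents and case (hypothetical helper)."""
    # NFD splits "Î" into "I" plus a combining circumflex.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


countries = ["Zambie", "Île de l'Ascension", "Inde", "France"]

print(sorted(countries))                              # Île... ends up last
print(sorted(countries, key=accent_insensitive_key))  # Île... sorts among the I's
```

This keeps Île in the .po file while still giving a natural order in the list.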

I'm a French speaker, and so are most of my users.
Very annoying bug.
The normal Django escaping techniques (through \' or format_html(my_translated_string)) do not work for me either.
I have used ′ instead of ' and it works OK: the compilemessages command works and the HTML node renders correctly.
It is, however, not very elegant or robust, as every future message needs to take this into account, and the character ´ is not very common.
I found another, more robust solution:
escaping through template filters.
In the HTML template:
<h5 class="modal-title">{{help_message_body|escape}}</h5>
And in JavaScript:
modal.find('.modal-message').html('<h5 class="modal-title">{{help_message_body|escapejs}}</h5>')

Line break in a word document with Win32Com

I'm using the win32com module to edit Word documents automatically with Python. But I'm facing an annoying problem that you have probably seen before: I use the module's Find and Replace function to insert paragraphs into a template, but sometimes I'd like to insert several paragraphs at once, separated by a line break. The Python string for these paragraphs looks like this: text = "First paragraph.\nSecond paragraph."
The problem is that when I use Find and Replace with this kind of string, it does not produce a line break but something like First paragraph Second paragraph, which is obviously not what I want.
Does anyone have an idea how to deal with this?
Thanks for your help!
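One thing that may be worth trying: in Word's Find and Replace syntax, the special code ^p stands for a paragraph mark (and ^^ for a literal caret), so the Python newlines can be converted before the string is handed to Word. The sketch below only prepares the string; to_word_replacement is a made-up helper, and it assumes the result is passed as the replacement text of the Find and Replace call.

```python
def to_word_replacement(text: str) -> str:
    """Convert Python newlines to Word's ^p paragraph code (hypothetical helper)."""
    # Escape literal carets first, since ^ introduces special codes in Word.
    escaped = text.replace("^", "^^")
    # Normalise Windows and old Mac line endings to "\n".
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    # Each remaining newline becomes a paragraph mark.
    return normalised.replace("\n", "^p")


text = "First paragraph.\nSecond paragraph."
print(to_word_replacement(text))
```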

Removed new line character appears in write to file but not in print to screen

Something very odd happens when I modify a string that contains newline characters.
After modifying the string variable, I print it. The output shows that the newline character has been removed.
But when I write the string variable to a file, the newline character is still there.
I spent hours figuring this out!
import os
import csv
s = "I want this \n new line removed"
s = s.replace("\n", "")
print(s)
file = open('my_file.tsv', 'w')
file.write(s)
file.close()
The above is sample code, and it runs fine. The string in my real project is text fetched dynamically from a MySQL database, which is then modified. It contains one or more \n characters.
If I paste the text obtained from the database into the code above as a hardcoded string and run it, I get the error "EOL while scanning string literal".
Can you please help me clean this text into something consumable?

Removing '\r\n' worked! Thanks a lot @abarnert for the suggestion.
The text wasn't visible to me in code form. It was raw text fetched from the database, and it just looked like a paragraph with multiple newlines, so I wasn't able to provide the real text.
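For anyone hitting the same thing, here is a small sketch that removes every common line-ending variant rather than only \n. The function name remove_line_breaks is made up for the example, and the sample string stands in for the database text.

```python
def remove_line_breaks(text: str) -> str:
    """Remove Windows (\\r\\n), old Mac (\\r) and Unix (\\n) line endings."""
    # Handle "\r\n" first so that it is treated as a single line ending.
    return text.replace("\r\n", "").replace("\r", "").replace("\n", "")


s = "I want this \r\n new line removed"
cleaned = remove_line_breaks(s)
print(repr(cleaned))  # repr() makes any leftover \r or \n visible
```

Printing with repr() is a quick way to see which control characters a string really contains before writing it to a file.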
