Sphinx - Split up long docstring paragraphs for internationalization - python

I'm trying to internationalize the documentation of a Python library using Sphinx and Crowdin.
With Sphinx I first generate the .pot files, but there's a problem with these files.
As mentioned in the Sphinx docs:
It is the maintainer’s task to split up paragraphs which are too large as there is no sane automated way to do that.
Here is an example of what I have:
...
#: ../../../disnake/client.py:docstring of disnake.client.Client:4
msgid "A number of options can be passed to the :class:`Client`."
msgstr ""
#: ../../../disnake/abc.py:docstring of disnake.abc.GuildChannel.clone:0
#: ../../../disnake/abc.py:docstring of disnake.abc.GuildChannel.create_invite:0
#: ../../../disnake/abc.py:docstring of disnake.abc.GuildChannel.delete:0
...
What I need is for every method's docstring to appear with a msgid and an empty msgstr for translators.
Now, am I supposed to write a script to do this? If so, that script should extract paragraphs to use as msgids, but I don't know where to start. I've also searched the internet but couldn't find any example.
Thanks in advance.
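As a possible starting point for such a script, here is a minimal sketch that splits a docstring into paragraphs on blank lines and formats each one as a .pot entry with an empty msgstr. The helper names (docstring_paragraphs, pot_entry) and the location string are hypothetical, and this is not how Sphinx's own gettext builder extracts messages; it only illustrates the paragraph-extraction idea.

```python
import inspect
import re


def docstring_paragraphs(obj):
    """Return the paragraphs of obj's docstring, split on blank lines."""
    doc = inspect.getdoc(obj) or ""
    paragraphs = []
    for block in re.split(r"\n\s*\n", doc):
        # Join wrapped lines so that each paragraph becomes a single msgid.
        text = " ".join(line.strip() for line in block.splitlines()).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


def pot_entry(msgid, location):
    """Format one .pot entry with an empty msgstr (hypothetical helper)."""
    # Backslashes and double quotes must be escaped inside a PO string.
    escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
    return '#: {}\nmsgid "{}"\nmsgstr ""\n'.format(location, escaped)


def example():
    """First paragraph, wrapped
    over two lines.

    Second paragraph.
    """


for paragraph in docstring_paragraphs(example):
    print(pot_entry(paragraph, "example.py:docstring of example"))
```

Shorter paragraphs in the docstrings themselves remain the simplest fix, since every paragraph ends up as one msgid either way.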

Related

How to disable fuzzy on django translations?

I don't want to use the fuzzy tag. Is that possible?
For example:
When I add a new sentence or word to translate, it generally gets marked as fuzzy automatically, and I don't like that.
#: frontend/src/components/language_consts.js:74
#, fuzzy
#| msgid "Patient Address"
msgid "Patient's address?"
msgstr "Adresse du doctor"
This is probably because of the software you use to translate your strings. fuzzy means that the translation needs reviewing. Mark the translations as reviewed and it should disappear.
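If the entries have already been reviewed and only the flags remain, they could be cleared with a small script. The following is a sketch, assuming the .po content is read as plain text; clear_fuzzy is a hypothetical helper, not part of Django or gettext, and it does not check that the translation itself is correct.

```python
def clear_fuzzy(po_text):
    """Remove fuzzy flags and '#|' previous-msgid comments from PO text."""
    cleaned = []
    for line in po_text.splitlines():
        if line.startswith("#|"):
            # Previous-msgid hint, only meaningful while the entry is fuzzy.
            continue
        if line.startswith("#,"):
            # Keep other flags (for example python-format), drop only "fuzzy".
            flags = [flag.strip() for flag in line[2:].split(",")]
            flags = [flag for flag in flags if flag and flag != "fuzzy"]
            if not flags:
                continue
            line = "#, " + ", ".join(flags)
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


sample = (
    "#: frontend/src/components/language_consts.js:74\n"
    "#, fuzzy\n"
    '#| msgid "Patient Address"\n'
    'msgid "Patient\'s address?"\n'
    'msgstr "Adresse du doctor"\n'
)
print(clear_fuzzy(sample))
```

Note that removing the flag only tells the tools the entry was reviewed; the guessed translation still has to be checked by a person.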

Use python to modify words in a LaTeX file, ignoring LaTeX markup

I want to run an automated "spell checker" over some LaTeX files (in addition to spelling, it detects certain custom words, etc.). I need to read the LaTeX file, find certain words in the document text (i.e. ignoring words if they are part of the LaTeX markup code), then wrap each word in additional LaTeX highlighting markup and write the file back out. E.g.
\title{My Document}
...
I won the title!
If I search for "title", then it should ignore "\title".
This is so that, when rendered, the modified LaTeX will display found words using the highlighting I add e.g.:
\title{My Document}
...
I won the \colorbox{red}{title}!
A library would be helpful since I may eventually require additional parsing/control features, but simple modification is all I need for now.
It seems the hard part is discerning LaTeX commands, comments, etc. from actual body text.
Thanks.
You need a Python LaTeX parser to do this. This looks like a good candidate: https://github.com/alvinwan/TexSoup, though there are several available.
Like BeautifulSoup, there are search functions which would allow you to find all text nodes, then you can use regular python split/search functions to find your misspelled words, then replace the text node with a new set of latex nodes (with the wrapping syntax around the selected words).
TexSoup's documentation is a little unclear as to how to write the document back out, but looking at their source code they appear to override the repr function, so:
with open('out.tex', 'w') as f:
    f.write(repr(soup))
Should do it for you.
EDIT:
If you look at the descendants generator:
>>> [x for x in soup.descendants if isinstance(x, str)]
['\x08egin', '(n.) A sacred fruit. Also known as:', '\x08egin', 'Here is the prevalence of each synonym.', '\x08egin', 'red lemon & uncommon ', 'Hello \textit', '.', 'Watermelon', 'red lemon', 'life', 'itemize', '& common', 'tabular', 'document']
The "children" are a mix of strs and TexNodes. You can pick out the pure strings there for your check, and just walk the tree yourself. The children attribute bizarrely only includes the TexNode elements.
As far as I understand what you need, Python may not be the best-fitting tool. I think what you need is sed or vim and a set of editing scripts. That would work faster and be easier to maintain than writing a Python script.
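For the simple case in the question (wrapping "title" but not "\title"), a regular expression may be enough before reaching for a parser. This is a minimal sketch using only the standard library; highlight_word is a hypothetical helper, and it only skips command names: it does not recognise comments, math mode, or command arguments, which is where a real parser such as TexSoup would be needed.

```python
import re


def highlight_word(tex, word, color="red"):
    """Wrap standalone occurrences of word in \\colorbox, skipping \\word commands."""
    # The lookbehind rejects a preceding backslash (a command such as \title)
    # or letter (part of a longer word or command such as \subtitle).
    pattern = re.compile(r"(?<![\\A-Za-z])" + re.escape(word) + r"(?![A-Za-z])")
    # A function is used as the replacement so backslashes are inserted literally.
    return pattern.sub(
        lambda match: "\\colorbox{%s}{%s}" % (color, match.group(0)), tex
    )


source = "\\title{My Document}\n...\nI won the title!"
print(highlight_word(source, "title"))
```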

Django French Translation - how to handle single quotes in translation strings?

I am using Python 3.5.2 and Django 1.10.
I have received the French translation .po file and can run the compilemessages command without receiving any errors.
However, when I run the site, many pages refuse to load.
I suspect that this is because the French translation .po file contains many single quotes (') in the translation strings.
For example,
#: .\core\constants\address_country_style_types.py:274
msgid "Ascension Island"
msgstr "Île de l'Ascension"
I remember reading somewhere (but cannot find that reference anywhere) that the single quotes must have either a forward slash or a backslash before them. So I tried that, but when I ran the compilemessages command, I got this error message:
C:\Users\me\desktop\myapp\myapp\locale\fr\LC_MESSAGES\django.po:423:18: invalid control sequence
So how do I escape the French single quote in translation strings?
Here is the header of my French language .po file:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2017-05-04 12:55+1000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"
I am unsure what the cause of this issue is (maybe the translator somehow corrupted the file?).
However, as a workaround, instead of the standard single quotation mark ' I have used this one (taken from the symbols in MS Word):
′
I am yet to check this with the French translator, but it looks and works OK.
I hope this helps someone.
The correct way is to "Escape" the single quote, however, you need to know the end-point consuming the text. Like you found out with the backslash, as in:
L\'Ascension
Trust me, nobody that is French will like seeing the backquote. Back in the DOS days of the 90's, visually, there was almost no difference. Now with fonts, it gets ugly.
Since you're producing for the web, use a HTML replacement, like &apos;
See this article:
Why shouldn't `&apos;` be used to escape single quotes?
The solution is
#: .\core\constants\address_country_style_types.py:274
msgid "Ascension Island"
msgstr "Ile de l‘Ascension"
It works, even if it will be used in some JavaScript. Don't use the numeric code &#39;, it will not work inside form fields: it will not be rendered and you will see the ugly number. I already tested all this.
As I said in the comments, beginning a word with an uppercase accented letter is not recommended. If you put Île and you then sort the list of countries, the Î character will come after the Z and will not be sorted in natural order, as you would expect.
This is another problem with Python's sorting capabilities. By default it compares strings by the code point of each character. Î has a code point of 206, so it comes after Z, which is 90.
Maybe Python provides a solution to this, but I haven't found one yet. If someone finds it, I would be glad to know.
I'm a French speaker, so are most of my users.
Very annoying bug.
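For the sorting problem described in this answer, one possible approach is a sort key that strips accents before comparing. The sketch below uses only the standard unicodedata module; accent_insensitive_key is a hypothetical helper name, and it approximates French collation rather than implementing it fully (locale.strxfrm with a French locale is another option where that locale is installed).

```python
import unicodedata


def accent_insensitive_key(text):
    """Sort key that ignores accents and case (hypothetical helper)."""
    # NFKD splits "Î" into "I" plus a combining circumflex.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


countries = ["Zambie", "Île de l'Ascension", "France"]
print(sorted(countries))                              # Î sorts after Z
print(sorted(countries, key=accent_insensitive_key))  # Î sorts with I
```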
The normal Django escaping techniques (through \' or format_html(my_translated_string)) do not work for me either.
I have used ′ instead of ' and it works OK: the compilemessages command runs and the HTML node renders fine.
It is, however, not very elegant or robust, as every future message needs to take this into account, and the character ´ is not commonly used.
I found another, more robust solution:
escaping through template filters.
In the HTML template:
<h5 class="modal-title">{{help_message_body|escape}}</h5>
and in javascript:
modal.find('.modal-message').html('<h5 class="modal-title">{{help_message_body|escapejs}}</h5>')

Using .po keys in bottle i18n

In the basic example for bottle i18n, the msgid is used to get the string in the corresponding language.
return bottle.template("<b>{{_('hello')}} I18N</b>?")
In the corresponding .po, the msgid is defined:
msgid "hello"
msgstr "Hello"
In other projects, the .po does not only contain msgid and msgstr, but also a key before, defined with a hash sign. This is especially useful for longer phrases to avoid clutter in source code:
#: wordpress_file_monitor.php:138 wordpress_file_monitor.php:147
msgid "Remove Alert"
msgstr "Benachrichtigung entfernen"
How can I access this # key using bottle-i18n?

Sphinx, gettext and html : how can I create multiline translations?

The documentation of my project (using Python 3.2) is created with Sphinx (1.1.3) and is made of HTML files. I have to write this documentation in English and in French; that's the reason why I build my documentation with Sphinx and gettext, using the usual bunch of .po(t) and .mo files.
For example, my .po files are made of lines like this one:
msgid "original text"
msgstr "translation"
My problem is that if a translation is made of several lines, the corresponding HTML file loses the "new line" characters: my different lines are packed into one big paragraph. I tried different things, like:
msgid "original text"
msgstr "translation : first line \n second line"
But of course the HTML doesn't care about the \n character; same problem with the \r character.
Then I tried this:
msgid "original text"
msgstr "translation : first line <br> second line"
But all I get is the literal text <br> instead of the expected line break. Same thing with <br/>.
What can I do? It would be nice of you to help me!
Well, thanks to ms4py, I'm able to answer my own question:
Add the following lines to each .rst file in which you need line breaks, with an empty line before and after the indented <br /> line:
.. |br| raw:: html

   <br />

Then use |br|, with a space before and after it, in each translation where you want a line break.
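Instead of repeating the definition in every .rst file, it might be possible to declare it once through Sphinx's rst_prolog setting, which is prepended to each source file. This is a sketch of a conf.py fragment under that assumption; it has not been checked against Sphinx 1.1.3 or against translated output specifically.

```python
# conf.py (fragment): define the |br| substitution once for all source files.
rst_prolog = """
.. |br| raw:: html

   <br />
"""
```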
Thank you !
