How to translate text in a html-text? - python

I have a text field in HTML. The inserted text is formatted with things like lists, underlining, tables and such. Using pylupdate4, I want to translate this content. Can I wrap self.tr("""all the html stuff""") around it and translate it? In the .ts file, the entries look very odd:
<message>
<location filename="Main.py" line="3526"/>
<source><byte value="xd"/>
<br><byte value="xd"/>
<p align="left", style="margin-left: 20px; margin-right:20px; margin-top:20px;"><byte value="xd"/>
test text ipsum blabla ... and so on tbc. <byte value="xd"/>
laurem ipsum blub rocknroll.<br><byte value="xd"/>
Moreover, Elvis is among us and jackson is his son PNEIS.<br><br><byte value="xd"/>;</source>
<translation type="unfinished"></translation>
</message>
What is a good approach to translate this? Inside a big string """...""" I can't call self.tr...

Related

Python XPath exclude item

I have some XML. I need to take all of the XML except for the first IMG tag:
<img width=\"1200\" height=\"673\" src=\"https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg\" class=\"webfeedsFeaturedVisual wp-post-image\" alt=\"Gilbert Arenas Dominique Wilkins Steve Kerr Magic\" style=\"display: block; margin-bottom: 5px; clear:both;max-width: 100%;\" link_thumbnail=\"\" srcset=\"https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg 1200w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-300x168.jpg 300w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-1024x574.jpg 1024w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-768x431.jpg 768w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" /><p>Even the most avid of NBA fans are unable to keep a mental record of every team all the greats have ever played in. That’s just impossible. This is especially the case when these stints are rather forgettable or possibly short-lived. Aside from Shaquille O’Neal, Dwight Howard, and Penny Hardaway, a few more stars played […]</p>\n<p>The post <a rel=\"nofollow\" href=\"https://clutchpoints.com/5-best-players-who-played-for-the-magic-that-you-forgot-about/\">5 best players who played for the Magic that you forgot about</a> appeared first on <a rel=\"nofollow\" href=\"https://clutchpoints.com\">ClutchPoints</a>.</p>\n
This does not work:
//div/*[not(img)]
Assuming this is your HTML data:
<div>
<img width="1200" height="673" src="https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="Gilbert Arenas Dominique Wilkins Steve Kerr Magic" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" srcset="https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg 1200w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-300x168.jpg 300w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-1024x574.jpg 1024w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-768x431.jpg 768w" sizes="(max-width: 1200px) 100vw, 1200px"/>
<p>Even the most avid of NBA fans are unable to keep a mental record of every team all the greats have ever played in. That’s just impossible. This is especially the case when these stints are rather forgettable or possibly short-lived. Aside from Shaquille O’Neal, Dwight Howard, and Penny Hardaway, a few more stars played […]</p>
<p>The post <a rel="nofollow" href="https://clutchpoints.com/5-best-players-who-played-for-the-magic-that-you-forgot-about/">5 best players who played for the Magic that you forgot about</a> appeared first on <a rel="nofollow" href="https://clutchpoints.com">ClutchPoints</a>.</p>
</div>
You can use the following XPath expression to get all child elements of the div except the img. It uses the self axis in combination with not():
//div/*[not(self::img)]
Output: 2 nodes (the two p elements; the two a elements stay nested inside the second p)
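For readers without an XPath 1.0 engine at hand, here is a minimal sketch of the same filter using only Python's standard library. It is not the XPath approach itself: xml.etree.ElementTree supports only a subset of XPath without the self axis, so the exclusion is done in plain Python. The shortened sample markup (including the `example.jpg` source) is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed item above (assumed sample, not the real data).
html = """<div>
<img width="1200" height="673" src="example.jpg"/>
<p>Even the most avid of NBA fans ...</p>
<p>The post <a href="https://clutchpoints.com">ClutchPoints</a>.</p>
</div>"""

div = ET.fromstring(html)

# Iterating over an element yields its direct children only,
# which mirrors the child step in //div/*[not(self::img)].
kept = [child for child in div if child.tag != "img"]

print([child.tag for child in kept])
```

Only direct children of the div are selected; any a elements remain inside their p parents. With lxml, the original expression can instead be passed to an element's `xpath()` method.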

Parse Heavy XML into Ordered Dictionary

I am currently parsing XML in Python 3.x. For files up to 300 MB, the code below works without any issues. However, when the file size increases to 500 MB or into the GB range, I run into memory issues.
from collections import OrderedDict
from lxml import etree  # xml.etree.ElementTree offers the same calls used here

tree2 = etree.parse(xmlfile2)
root2 = tree2.getroot()
df_list2 = []
for i, child in enumerate(root2):
    for subchildren in child.findall('{raml20.xsd}header'):
        for subchildren in child.findall('{raml20.xsd}managedObject'):
            xml_class_name2 = subchildren.get('class')
            xml_dist_name2 = subchildren.get('distName')
            for subchild in subchildren:
                # One ordered row per parameter of the managed object
                df_dict2 = OrderedDict()
                header2 = subchild.attrib.get('name')
                df_dict2['MOClass'] = xml_class_name2
                df_dict2['CellDN'] = xml_dist_name2
                df_dict2['Parameter'] = header2
                df_dict2['CurrentValue'] = subchild.text
                df_list2.append(df_dict2)
I came across various articles explaining the use of 'iterparse', but I can't work out how to use it while still saving the XML data in order.
Below is the format of my XML:
<raml version="2.0" xmlns="raml20.xsd">
<cmData type="plan" scope="all" name="XML_Plan_update.xml">
<header>
<log dateTime="2018-12-31T16:13:28" action="created" appInfo="PlanExporter"/>
</header>
<managedObject class="WNCEL" version="LN2.0" distName="PLMN-PLMN/MRBTS-137/WNBTS-1/WNCEL-27046" operation="update">
<p name="defaultCarrier">10787</p>
<p name="lCelwDN">MRBTS-137/MNL-1/MNLENT-1/CELLMAPPING-1/LCELW-4</p>
<p name="maxCarrierPower">460</p>
</managedObject>
<managedObject class="WNCEL" version="LN2.0" distName="PLMN-PLMN/MRBTS-6770/WNBTS-1/WNCEL-26925" operation="update">
<p name="defaultCarrier">10787</p>
<p name="lCelwDN">MRBTS-6770/MNL-1/MNLENT-1/CELLMAPPING-1/LCELW-5</p>
<p name="maxCarrierPower">460</p>
</managedObject>
<managedObject class="WNCEL" version="LN2.0" distName="PLMN-PLMN/MRBTS-806/WNBTS-1/WNCEL-22661" operation="update">
<p name="defaultCarrier">10762</p>
<p name="lCelwDN">MRBTS-806/MNL-1/MNLENT-1/CELLMAPPING-1/LCELW-9</p>
<p name="maxCarrierPower">460</p>
</managedObject>
I am currently using cElementTree or lxml to parse the XML and save the output generated by the for loop in an OrderedDict. All dict entries are appended to a list at the end.
I am looking for a way to use the iterparse method to parse the above XML into ordered dicts.
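One possible shape for this, as a minimal sketch rather than a tuned solution: iterate with the standard library's `iterparse`, handle each managedObject when its end tag is seen, and clear it afterwards so its children do not accumulate in memory. The helper name `iter_parameters` and the shortened sample document are assumptions for illustration; the namespace comes from the xmlns attribute shown above.

```python
import io
import xml.etree.ElementTree as ET
from collections import OrderedDict

NS = "{raml20.xsd}"  # namespace taken from the xmlns attribute of <raml>


def iter_parameters(source):
    """Yield one OrderedDict per child of each managedObject (hypothetical helper)."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "managedObject":
            continue
        for param in elem:
            row = OrderedDict()
            row["MOClass"] = elem.get("class")
            row["CellDN"] = elem.get("distName")
            row["Parameter"] = param.get("name")
            row["CurrentValue"] = param.text
            yield row
        # Drop the children, text and attributes of the processed element.
        elem.clear()


# Shortened sample in the format shown above (assumed data for the demo).
sample = b"""<raml version="2.0" xmlns="raml20.xsd">
<cmData type="plan" scope="all" name="XML_Plan_update.xml">
<managedObject class="WNCEL" version="LN2.0" distName="PLMN-PLMN/MRBTS-137/WNBTS-1/WNCEL-27046" operation="update">
<p name="defaultCarrier">10787</p>
<p name="maxCarrierPower">460</p>
</managedObject>
</cmData>
</raml>"""

rows = list(iter_parameters(io.BytesIO(sample)))
print(rows[0])
```

For a real file, a path can be passed in place of the BytesIO object. Collecting every row in a list still costs memory proportional to the output, so writing rows out in batches may be needed for multi-GB inputs. The cleared managedObject shells stay attached to their parent, which is usually a small cost compared with the full tree.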

Python: convert json+html string to .doc

I'm writing a Python script and I have to convert a rendered string (from JSON with HTML inside) to a .docx file.
I have searched the web a lot but I'm still confused.
I tried python-docx, but it doesn't work well because it expects docx input and doesn't accept a string like this:
<h1><span lessico='Questa' idx="0" testo="testo" show-modal="setModal()" tables="updateTables(input)">Questa</span> <span lessico='è' idx="1" testo="testo" show-modal="setModal()" tables="updateTables(input)">è</span> <span lessico='una' idx="2" testo="testo" show-modal="setModal()" tables="updateTables(input)">una</span> <span lessico='domanda' idx="3" testo="testo" show-modal="setModal()" tables="updateTables(input)">domanda</span>...</h1>
<ul>
<li>a scelta multipla</li>
<li>con risposta aperta</li>
<li>di tipo trova</li>
<li>di associazione</li>
How can I convert this into a formatted .doc or .docx? Preferably without going mad :)

Syntax highlighting in <pre> tags

Are there any libraries that will allow me to display code in <pre> tags and highlight the syntax according to the language? I'm imagining something like this:
<pre class="python">
class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'
</pre>
...where the CSS for pre.python would highlight the Python code appropriately.
Does something like this exist?
There's SyntaxHighlighter:
<pre class="brush: python">
# python code here
</pre>
There's also highlight.js which has the option of automatically detecting the syntax and highlighting it appropriately; however, you would need to use both <pre><code> tags to wrap your code.
If you're looking for a server-side example, there's GeSHi or Pygments for Python.
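As a rough sketch of the server-side route with Pygments (assuming the pygments package is installed), the snippet below renders the example class to an HTML fragment and generates matching CSS rules. The variable names are illustrative only.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

code = '''class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'
'''

formatter = HtmlFormatter()

# HTML fragment with the code wrapped in <div class="highlight"><pre>...</pre></div>
html_fragment = highlight(code, PythonLexer(), formatter)

# CSS rules for the token classes, scoped to the .highlight wrapper
css_rules = formatter.get_style_defs(".highlight")

print(html_fragment)
```

The fragment goes into the page body and the CSS into a stylesheet, so no JavaScript is needed in the browser.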
I prefer highlight.js. It supports 112 languages.
Preview your page with this code injection from the browser console:
// Highlight 22 popular code types. TODO: Inline for speed and security.
function loadjscssfile(filename, filetype){ // http://www.javascriptkit.com/javatutors/loadjavascriptcss.shtml
    if(filetype=="js"){
        var fileref=document.createElement('script')
        fileref.setAttribute("type","text/javascript")
        fileref.setAttribute("src", filename)
    }
    else if(filetype=="css"){
        var fileref=document.createElement("link")
        fileref.setAttribute("rel", "stylesheet")
        fileref.setAttribute("type", "text/css")
        fileref.setAttribute("href", filename)
    }
    if(typeof fileref!="undefined") document.getElementsByTagName("head")[0].appendChild(fileref)
}
loadjscssfile("//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/vs.min.css", "css")
loadjscssfile("//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/highlight.min.js", "js")
setTimeout("var a = document.querySelectorAll('.code'); for(var i=0; i < a.length; ++i) hljs.highlightBlock(a[i])", 600)
Not sure if this is what you're after, but when I want syntax-highlighted code blocks in a document, I write the document in Pandoc-Markdown, then use Pandoc to process the doc into html.
You get the highlighted code block using pandoc-markdown syntax like this:
~~~{.python}
class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'
~~~
Yes. You can use SyntaxHighlighter. It's easy to use, exactly the thing you need. Just add the code tag in your pre block.
It highlights about 23 languages including Python.
First, download Prism or load it from a CDN:
<link rel="stylesheet" href="{{asset("assets/css/prism.css")}}">
<link rel="stylesheet" href="{{asset("assets/css/prism-unescaped-markup.min.css")}}">
<script src="{{asset("assets/js/prism.js")}}"></script>
<script src="{{asset("assets/js/prism-unescaped-markup.min.js")}}"></script>
With HTML (Only HTML)
<script type="text/plain" class="language-markup">
<!DOCTYPE html>
<html>
<head>
<title>Hello World</title>
</head>
<body>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec viverra nec nulla vitae mollis.</p>
</body>
</html>
</script>
With Any
<code class="language-css">
p { color: red }
</code>

Extract artist and music From text (regex)

I have written the following regex, but it's not working. Can you please help me? Thank you :-)
track_desc = '''<img src="http://images.raaga.com/catalog/cd/A/A0000102.jpg" align="right" border="0" width="100" height="100" vspace="4" hspace="4" />
<p>
</p>
<p> Artist(s) David: <br/>
Music: Ramana Gogula<br/>
</p>'''
rx = "<p><\/p><p>Artist\(s\): (.*?)<br\/>Music: (.*?)<br\/><\/p>"
m = re.search(rx, track_desc)
The output should be:
Artist(s) David
Music: Ramana Gogula
You were ignoring the whitespace:
<p>[\s\n\r]*Artist\(s\)[\s\n\r]*(.*?)[\s\n\r]*:[\s\n\r]*<br/>[\s\n\r]*Music:[\s\n\r]*(.*?)<br/>[\s\n\r]*</p>
Output is:
[1] => "David"
[2] => "Ramana Gogula"
(note that your regex didn't capture the Artist(s) and Music: prefixes either)
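In Python, the whitespace-tolerant pattern above could be applied roughly as follows. This is a sketch against the sample string from the question; since \s already covers \n and \r, the character class [\s\n\r]* is written as \s* here.

```python
import re

track_desc = '''<img src="http://images.raaga.com/catalog/cd/A/A0000102.jpg" align="right" border="0" width="100" height="100" vspace="4" hspace="4" />
<p>
</p>
<p> Artist(s) David: <br/>
Music: Ramana Gogula<br/>
</p>'''

# Same structure as the pattern above, with \s* between every token.
pattern = re.compile(
    r"<p>\s*Artist\(s\)\s*(.*?)\s*:\s*<br/>\s*Music:\s*(.*?)<br/>\s*</p>"
)

match = pattern.search(track_desc)
if match:
    artist, music = match.groups()
    print("Artist(s):", artist)
    print("Music:", music)
```

The `if match` guard matters because `re.search` returns None when the markup deviates from the expected layout.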
However, for production code I would not rely on such a rather clumsy regex (or on an equally clumsily formatted HTML source).
Seriously though, drop the idea of using regex for this if you aren't in the slightest familiar with regex (which is how it looks). You're using the wrong tool and a badly formatted data source. Parsing HTML with regex is wrong in 9 out of 10 cases (see @bgporter's comment link) and doomed to fail. Apart from that, HTML is hardly ever an appropriate data source (unless there really is no alternative source).
import lxml.html as lh
import re
track_desc = '''
<img src="http://images.raaga.com/catalog/cd/A/A0000102.jpg" align="right" border="0" width="100" height="100" vspace="4" hspace="4" />
<p>
</p>
<p> Artist(s) David: <br/>
Music: Ramana Gogula<br/>
</p>
'''
tree = lh.fromstring(track_desc)
print(re.findall(r'Artist\(s\) (.+):\s*\nMusic: (.*\w)', tree.text_content()))
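I see a few errors:
line breaks are not handled: the text spans several lines, so the pattern has to match the newlines itself, for example with \s (re.MULTILINE only changes how ^ and $ behave, so it makes no difference for this pattern)
spaces are not taken into account
Artist(s) is not followed directly by a colon; the colon comes after the artist name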
As the web page is formatted rather strangely, relying on a regex is error-prone and I wouldn't advise using it extensively.
Note that the following seems to work:
rx = r'Artist(?:\(s\))?\s+(.*?)<br/>\s+Music:\s*(.*?)<br'
print ("Art... : %s && Mus... : %s" % re.search(rx, track_desc,flags=re.MULTILINE).groups())
