Python XPath exclude item - python

I have some XML. I need to select all of it except for the first img tag:
<img width=\"1200\" height=\"673\" src=\"https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg\" class=\"webfeedsFeaturedVisual wp-post-image\" alt=\"Gilbert Arenas Dominique Wilkins Steve Kerr Magic\" style=\"display: block; margin-bottom: 5px; clear:both;max-width: 100%;\" link_thumbnail=\"\" srcset=\"https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg 1200w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-300x168.jpg 300w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-1024x574.jpg 1024w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-768x431.jpg 768w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" /><p>Even the most avid of NBA fans are unable to keep a mental record of every team all the greats have ever played in. That’s just impossible. This is especially the case when these stints are rather forgettable or possibly short-lived. Aside from Shaquille O’Neal, Dwight Howard, and Penny Hardaway, a few more stars played […]</p>\n<p>The post <a rel=\"nofollow\" href=\"https://clutchpoints.com/5-best-players-who-played-for-the-magic-that-you-forgot-about/\">5 best players who played for the Magic that you forgot about</a> appeared first on <a rel=\"nofollow\" href=\"https://clutchpoints.com\">ClutchPoints</a>.</p>\n
This does not work:
//div/*[not(img)]

Assuming this is your HTML data:
<div>
<img width="1200" height="673" src="https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="Gilbert Arenas Dominique Wilkins Steve Kerr Magic" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" srcset="https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic.jpg 1200w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-300x168.jpg 300w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-1024x574.jpg 1024w, https://clutchpoints.com/wp-content/uploads/2020/07/Gilbert-Arenas-Dominique-Wilkins-Steve-Kerr-Magic-768x431.jpg 768w" sizes="(max-width: 1200px) 100vw, 1200px"/>
<p>Even the most avid of NBA fans are unable to keep a mental record of every team all the greats have ever played in. That’s just impossible. This is especially the case when these stints are rather forgettable or possibly short-lived. Aside from Shaquille O’Neal, Dwight Howard, and Penny Hardaway, a few more stars played […]</p>
<p>The post <a rel="nofollow" href="https://clutchpoints.com/5-best-players-who-played-for-the-magic-that-you-forgot-about/">5 best players who played for the Magic that you forgot about</a> appeared first on <a rel="nofollow" href="https://clutchpoints.com">ClutchPoints</a>.</p>
</div>
To get all child elements except the img, use the self axis in combination with not():
//div/*[not(self::img)]
Output: 2 nodes (the two p elements; the two a elements are descendants of the second p, so they come along inside it)
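In Python, the expression can be evaluated with lxml. This is a minimal sketch, assuming lxml is installed and using a shortened stand-in for the markup above (the src, href and text values are placeholders, not the real feed content):

```python
from lxml import etree

# Shortened stand-in for the markup above; attribute values are placeholders.
xml = (
    "<div>"
    "<img src='featured.jpg'/>"
    "<p>First paragraph</p>"
    "<p>The post <a href='https://clutchpoints.com'>ClutchPoints</a>.</p>"
    "</div>"
)
root = etree.fromstring(xml)

# self::img is true only when the context node itself is an img element,
# so not(self::img) keeps every other child of the div.
kept = root.xpath("//div/*[not(self::img)]")
kept_tags = [el.tag for el in kept]
print(kept_tags)

# By contrast, *[not(img)] keeps children that have no img child of their own.
# The img element has no img child, so the original attempt still returns it.
attempt_tags = [el.tag for el in root.xpath("//div/*[not(img)]")]
print(attempt_tags)
```

Each kept element can be serialized back to markup with etree.tostring(el, encoding="unicode") if the goal is the XML text without the image.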

Related

Jupyter Notebook HTML export showing </div> and not formatting properly

My wife recently did a boot camp program for data analytics. She is trying to push her projects to GitHub for display. However, some of the Jupyter notebooks are not exporting to HTML properly or showing up properly on GitHub. Below are screenshots and code snippets.
I can't identify where I need to make changes to resolve this while still maintaining the comments and formatting. Most of her projects have this issue. I noticed the issue seems to start at but I have not been able to identify the specific issue. When I load up the HTML, I do notice that a lot of the conversion gets messed up, like so:
<p></div></p>
So it looks like some of the </div> tags are getting messed up during the export/conversion to proper HTML.
When loaded up into the latest version of Jupyter Notebook...
jupyter core : 4.7.1
jupyter-notebook : 6.4.10
The display looks like this
However, when I export as HTML, or upload that to GitHub, it looks like this:
Here is the HTML code as it looks in the Jupyter Notebook:
# <font color='#32cd32'><b><u> Reviewer Comment </u></b></font>
<div class="alert alert-success" >
**Hello! I am Larchenko Ksenia and I'm glad to review your project**!
You can find my comments in green, yellow and red boxes like these:
</div>
<div class="alert alert-success" >
**Success:** green color shows that everything is done perfectly;
</div>
<div class="alert alert-warning" >
**Remark & Recommendation:** yellow color highlights something to pay attention for;
</div>
<div class="alert alert-danger" >
**Needs fixing:** if the color is red, please rework.
</div>
<div class="alert alert-success" >
**Please, don't remove my comments:)**
</div>
And if I load up the raw ipynb and look at that section, it looks like this:
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hello, my name is Ivan Alexeev and I am going to review your project.\n",
"\n",
"\n",
"There may be some shortcomings in the work that I will ask you to eliminate, you fix them and I check your decisions. You can find my comments in <font color='green'>green</font>, <font color='orange'>orange</font> or <font color='red'>red</font> boxes like this:\n",
"\n",
"\n",
"<div class=\"alert alert-success\" style=\"box-shadow: 4px 4px 4px\">\n",
"Success: if everything is done successfully\n",
"</div>\n",
"\n",
"\n",
"<div class=\"alert alert-warning\" style=\"box-shadow: 4px 4px 4px\">\n",
"Remark: if I can give some recommendations or additional information\n",
"</div>\n",
"\n",
"\n",
"<div class=\"alert alert-danger\" style=\"box-shadow: 4px 4px 4px\">\n",
"Need fixing: if the block requires some corrections. Work can't be accepted with the red comments\n",
"</div>\n",
"\n",
"\n",
"Thank you for taking time to complete this project, I appreciate the amount of work you've done! There are some issues that you need to work on, but overall it is a great start! \n",
"\n",
"\n",
"Please, don't delete my comments) \n"
]
}
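To find which cells are involved without opening each notebook by hand, the .ipynb JSON can be scanned for markdown lines that contain a div tag. This is a minimal diagnostic sketch using only the standard library; the function name and the sample notebook are made up for illustration, and it only locates the lines, it does not fix them:

```python
import json

def markdown_lines_with_div(notebook):
    """Return (cell_index, line_number, line) for markdown lines containing a div tag."""
    hits = []
    for idx, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        # The notebook format stores source as a list of lines or as one string.
        if isinstance(source, str):
            source = source.splitlines(keepends=True)
        for lineno, line in enumerate(source, start=1):
            if "<div" in line or "</div>" in line:
                hits.append((idx, lineno, line.rstrip("\n")))
    return hits

# Tiny hypothetical notebook, parsed from JSON the same way a real file would be.
sample = json.loads("""
{"cells": [
  {"cell_type": "markdown", "metadata": {},
   "source": ["<div class=\\"alert alert-success\\">\\n", "Success\\n", "</div>\\n"]},
  {"cell_type": "code", "metadata": {}, "source": ["print('</div>')"]}
]}
""")
hits = markdown_lines_with_div(sample)
for cell_index, line_number, line in hits:
    print(cell_index, line_number, line)
```

For a real file, replace the sample with json.load() on the opened notebook (for example a file named project.ipynb, which is a placeholder name).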

How to add watermarks in all pages of Odoo Reports?

With the code below, the watermark only shows on the first page. I want to show the watermark on all pages.
<div class="watermark_report">
<img t-att-src="'data:image/png;base64,'+ doc.company_id.report_header_logo"/>
</div>
You already have the answer here:
Add this code for watermark in header of external layout. Its external id is report.external_layout_header:
<style>
.watermark {
position: absolute;
opacity: 0.25;
z-index: 1000;
transform: rotate(300deg);
-webkit-transform: rotate(300deg);
width: 150%;
}
</style>
<div class="watermark">
<p>WATERMARK</p>
<img t-att-src="'/module_name/static/src/img/image_name.png'" />
</div>
I have added an image stored as a file. If you are going to use a static image, I think this is the most appropriate way.
Note: Instead of using the CSS opacity property, you can use a PNG image with built-in opacity and a transparent background. That's what I had to do.
Note 2: I am afraid this does not work in Odoo v11
Update
This solution is only valid if you want to add the same image to all the reports.
There is a module developed by the OCA to add watermarks to reports. It adds a field to every report where an image (A4 size) can be added. The module name is report_qweb_pdf_watermark

Google Fusion Maps Info Window Dynamic Templating

I'm doing some web scraping with Python and the last step is to use Google Fusion maps, but as somebody who has never touched any CSS styling before, I have no idea how to do something probably incredibly simple: hide a column title in the info window if it's blank. Not all the data have entries in my Amenities column, so I would like that to be gone from the info window if it's blank.
I've read this (https://support.google.com/fusiontables/answer/3081246?hl=en&ref_topic=2575652), but it's complete gibberish to me at this stage.
This is the default HTML they provide for the info window (with my data):
<div class='googft-info-window'>
<b>Location:</b> {Location}<br>
<b>Movie Title:</b> {Movie Title}<br>
<b>Date:</b> {Date}<br>
<b>Amenities:</b> {Amenities}
</div>
This shot in the dark didn't work:
<div class='googft-info-window'>
<b>Location:</b> {Location}<br>
<b>Movie Title:</b> {Movie Title}<br>
<b>Date:</b> {Date}<br>
<b>{if Amenities.value}
Amenities:
{/if}Amenities:</b> {Amenities}<br>
</div>
This question isn't related to CSS. Try this:
{template .contents}
<div class='googft-info-window'>
<b>Location:</b> {$data.value.Location}<br/>
<b>Movie Title:</b> {$data.value['Movie Title']}<br/>
<b>Date:</b> {$data.value.Date}<br/>
{if $data.value.Amenities}
<b>Amenities:</b>{$data.value.Amenities}<br/>
{/if}
</div>
{/template}

How to translate text in a html-text?

I have a text field in HTML. The inserted text is formatted with things like lists, underlines, tables and such. Using pylupdate4, I want to translate this text. Can I wrap a self.tr("""all the html stuff""") around it and translate it? Looking into the .ts file, the parts look very weird:
<message>
<location filename="Main.py" line="3526"/>
<source><byte value="xd"/>
<br><byte value="xd"/>
<p align="left", style="margin-left: 20px; margin-right:20px; margin-top:20px;"><byte value="xd"/>
test text ipsum blabla ... and so on tbc. <byte value="xd"/>
laurem ipsum blub rocknroll.<br><byte value="xd"/>
Moreover, Elvis is among us and jackson is his son PNEIS.<br><br><byte value="xd"/>;</source>
<translation type="unfinished"></translation>
</message>
What is a good approach to translate this? Inside of a big string """...""" I can't type self.tr...
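One possible approach, offered as an assumption rather than an established answer: keep the markup in a template and pass only the plain sentences through tr, so each translatable string is short, single-line, and free of tags. The tr function below is a stand-in so the sketch runs without Qt; in the real class it would be self.tr. The template and sentences are shortened from the example above.

```python
def tr(text):
    """Stand-in for QObject.tr so the sketch runs without Qt (assumption)."""
    return text

# The markup lives in the template; it is never passed through tr().
TEMPLATE = (
    '<p align="left" style="margin-left: 20px; margin-right: 20px; margin-top: 20px;">'
    "{first}<br>{second}<br><br></p>"
)

def build_html():
    # Each sentence is its own single-line string literal, so it contains
    # neither tags nor line-ending characters from the source file.
    return TEMPLATE.format(
        first=tr("laurem ipsum blub rocknroll."),
        second=tr("Moreover, Elvis is among us."),
    )

html_text = build_html()
print(html_text)
```

The trade-off is that translators see isolated sentences instead of the whole page, so sentences that depend on each other may need a translator comment.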

Extract artist and music From text (regex)

I have written the following regex, but it's not working. Can you please help me? Thank you :-)
track_desc = '''<img src="http://images.raaga.com/catalog/cd/A/A0000102.jpg" align="right" border="0" width="100" height="100" vspace="4" hspace="4" />
<p>
</p>
<p> Artist(s) David: <br/>
Music: Ramana Gogula<br/>
</p>'''
rx = "<p><\/p><p>Artist\(s\): (.*?)<br\/>Music: (.*?)<br\/><\/p>"
m = re.search(rx, track_desc)
Output should be:
Artist(s) David
Music: Ramana Gogula
You were ignoring the whitespace:
<p>[\s\n\r]*Artist\(s\)[\s\n\r]*(.*?)[\s\n\r]*:[\s\n\r]*<br/>[\s\n\r]*Music:[\s\n\r]*(.*?)<br/>[\s\n\r]*</p>
Output is:
[1] => "David"
[2] => "Ramana Gogula"
(Note that your regex didn't capture the Artist(s) and Music: prefixes either.)
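Applied in Python, the pattern above could look like the following sketch. Since \s already covers \n and \r, each [\s\n\r]* is shortened to \s*, which is a simplification of mine and not part of the original pattern:

```python
import re

track_desc = '''<img src="http://images.raaga.com/catalog/cd/A/A0000102.jpg" align="right" border="0" width="100" height="100" vspace="4" hspace="4" />
<p>
</p>
<p> Artist(s) David: <br/>
Music: Ramana Gogula<br/>
</p>'''

# Same structure as the pattern above, with [\s\n\r]* written as \s*.
rx = re.compile(
    r'<p>\s*Artist\(s\)\s*(.*?)\s*:\s*<br/>\s*Music:\s*(.*?)<br/>\s*</p>'
)

match = rx.search(track_desc)
# Guard against a failed match instead of calling .groups() on None.
artist, music = match.groups() if match else (None, None)
print(artist, music)
```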
However, for production code I would not rely on such a rather clumsy regex (and equally clumsily formatted HTML source).
Seriously though, drop the idea of using regex for this if you aren't in the slightest familiar with regex (which is how it looks). You're using the wrong tool on a badly formatted data source. Parsing HTML with regex is wrong in 9 out of 10 cases (see @bgporter's comment link) and doomed to fail. Apart from that, HTML is hardly ever an appropriate data source (unless there really is no alternative source).
import lxml.html as lh
import re
track_desc = '''
<img src="http://images.raaga.com/catalog/cd/A/A0000102.jpg" align="right" border="0" width="100" height="100" vspace="4" hspace="4" />
<p>
</p>
<p> Artist(s) David: <br/>
Music: Ramana Gogula<br/>
</p>
'''
tree = lh.fromstring(track_desc)
print(re.findall(r'Artist\(s\) (.+):\s*\nMusic: (.*\w)', tree.text_content()))
I see a few errors:
the text spans several lines, so the pattern has to match the line breaks explicitly, for example with \s (re.MULTILINE only changes how ^ and $ behave, so it is not needed here)
spaces are not taken into account
Artist(s) is not directly followed by a colon in the text
As the web page is rather strangely presented, relying on a regex is error-prone, and I wouldn't advise using it extensively.
Note that the following seems to work (the first group keeps the trailing colon):
rx = r'Artist(?:\(s\))?\s+(.*?)<br/>\s+Music:\s*(.*?)<br'
print("Art... : %s && Mus... : %s" % re.search(rx, track_desc).groups())
