Click a menu item with the same ID using Selenium Python - python

Below are my HTML snippet and the code I tried. I need to click the Integrated Consoles menu item. I tried the following, but nothing happens and no error is raised either. Kindly help me select the specific menu item using the text inside the tag.
driver.find_element_by_xpath(".//td[contains text(),'Integrated Consoles']").click()
HTML sample snippet
<td nowrap="" id="MENU_TD110"> Integrated Consoles </td>
<td nowrap="" id="MENU_TD110"> System Information </td>
<td nowrap="" id="MENU_TD110"> More Tools </td>

The parentheses () are missing from your contains() call. Enclose the arguments as below and try again:
driver.find_element_by_xpath(".//td[contains(text(),'Integrated Consoles')]").click()
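Because all three cells share the same id, the visible text is the only thing that distinguishes them, so it can help to build the XPath from the label. The following is a minimal sketch, not the answerer's code: menu_item_xpath and click_menu_item are hypothetical helper names, driver is assumed to be an already-created WebDriver, and the find_element(By.XPATH, ...) form is the Selenium 4 spelling of the call used above.

```python
def menu_item_xpath(label):
    """Build an XPath matching a <td> whose text contains `label`."""
    # Assumption: labels never contain a single quote; quoting them safely
    # inside an XPath string literal is out of scope for this sketch.
    if "'" in label:
        raise ValueError("label must not contain a single quote")
    return ".//td[contains(text(), '{}')]".format(label)


def click_menu_item(driver, label):
    """Click the first <td> whose text contains `label`."""
    # Imported lazily so the XPath helper works without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.find_element(By.XPATH, menu_item_xpath(label)).click()
```

With a live browser session this would be called as click_menu_item(driver, "Integrated Consoles").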

Related

How to define Attributes like background colour

How can we set a background colour, or any other attribute, on a row or column in this?
There are three ways to set the background-color property: inline CSS, internal CSS, and external CSS. For example, inline:
<div style="background-color: red">
You can also apply this to a table row or a table column, as below.
<tr style="background-color: red">Table row</tr>
<td style="background-color: green">Table Column</td>

Python xhtml2pdf table cell text display to vertical

I'm using xhtml2pdf to generate a report in Django, and I would like the text in one of my cells to display vertically, but I couldn't make it work using CSS.
Here is one attempt:
.vertical-text {
    writing-mode: tb-rl;
}
<table>
<tbody>
<tr>
<td class="vertical-text" >V text</td>
</tr>
</tbody>
</table>
UPDATE
The writing-mode property is missing from the supported CSS properties. Is there any workaround?
There is another workaround that uses plain CSS (JSFiddle):
.vertical-text {
    width: 1px;
    font-family: monospace;
    white-space: pre-wrap; /* preserves whitespace, including in Firefox */
}
But there are a couple of downsides here:
There have to be spaces between the letters to ensure the text is displayed vertically.
The letters are not rotated; they keep their normal orientation.
I would suggest using some other tool that does not restrict which CSS properties you can use, such as:
PDFKit
PhantomJS - write a custom Node.js server in the backend to do it.
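If you stay with the CSS workaround, the first downside (the letters must be separated by spaces) can be handled in Python before the text reaches the template. This is only a sketch under that assumption; space_out is a hypothetical helper name, not part of xhtml2pdf or Django.

```python
def space_out(text):
    """Return `text` with one space between each non-whitespace character."""
    # Existing whitespace is dropped so every letter can wrap onto its own
    # line inside the 1px-wide, pre-wrap cell.
    return " ".join(ch for ch in text if not ch.isspace())


print(space_out("V text"))
```

The result would then be placed in the td with the vertical-text class.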

Parsing Web Page's Search Results With Python

I recently started working on a program in Python that lets the user conjugate any verb easily. To do this, I am using the urllib module to open the corresponding conjugation web page. For example, the verb "beber" would have the web page:
"http://www.spanishdict.com/conjugate/beber"
To open the page, I use the following python code:
source = urllib.urlopen("http://www.spanishdict.com/conjugate/beber").read()
This source does contain the information that I want to parse. But, when I make a BeautifulSoup object out of it like this:
soup = BeautifulSoup(source)
I appear to lose all the information I want to parse. The information lost when making the BeautifulSoup object usually looks something like this:
<tr>
<td class="verb-pronoun-row">
yo </td>
<td class="">
bebo </td>
<td class="">
bebí </td>
<td class="">
bebía </td>
<td class="">
bebería </td>
<td class="">
beberé </td>
</tr>
What am I doing wrong? I am no professional at Python or Web Parsing in general, so it may be a simple problem.
Here is my complete code (I used the "++++++" to differentiate the two):
import urllib
from bs4 import BeautifulSoup
source = urllib.urlopen("http://www.spanishdict.com/conjugate/beber").read()
soup = BeautifulSoup(source)
print source
print "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
print str(soup)
When I wrote parsers I had problems with bs: in some cases it didn't find what lxml found, and vice versa, because of broken HTML.
Try using lxml.html.
Your problem may be with encoding. I think that bs4 works with UTF-8 and that your machine has a different default encoding (one that contains Spanish letters). So urllib requests the page in your default encoding. That's fine, so the data is there in the source and even prints out correctly, but when you pass it to the UTF-8-based bs4, those characters are lost. Try setting a different encoding in bs4, if possible your default one. This is just a guess, though.
I recommend using regular expressions. I have used them for all my web crawlers. Whether this is usable for you depends on how dynamic the website is, but that problem exists even when you use bs4. You just write all your regular expressions manually and let them do the magic. You would have to work with bs4 in a similar way when looking for the information you want.
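One way to narrow this down is to check whether the table cells survive a parse that does not involve BeautifulSoup at all. The sketch below is not taken from any of the answers: it uses Python 3's standard-library html.parser, decodes the bytes explicitly as UTF-8 (an assumption about the page's encoding), and runs on a shortened copy of the row from the question instead of the live site. CellCollector and extract_cells are hypothetical names.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the stripped text of every <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._buf).strip())
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)


def extract_cells(raw_bytes, encoding="utf-8"):
    """Decode the page explicitly, then return the text of each cell."""
    parser = CellCollector()
    parser.feed(raw_bytes.decode(encoding))
    parser.close()
    return parser.cells


# Shortened copy of the row shown in the question, encoded as the bytes
# a download would return.
SAMPLE = """<tr>
<td class="verb-pronoun-row">
yo </td>
<td class="">
bebo </td>
<td class="">
bebí </td>
</tr>""".encode("utf-8")

print(extract_cells(SAMPLE))
```

If the cells show up here but not in the soup, the parser BeautifulSoup selected is the more likely culprit than the download itself.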

Python pattern matching

I'm currently in the process of converting an old bash script of mine into a Python script with added functionality. I've been able to do most things, but I'm having a lot of trouble with Python pattern matching.
In my previous script, I downloaded a web page and used sed to get the elements I wanted. The matching was done like so (for one of the values I wanted):
PM_NUMBER=`cat um.htm | LANG=sv_SE.iso88591 sed -n 's/.*ol.st.*pm.*count..\([0-9]*\).*/\1/p'`
It would match the number wrapped in <span class="count"></span> after the phrase "olästa pm". The markup I'm running this against is:
<td style="padding-left: 11px;">
<a href="/abuse_list.php">
<img src="/gfx/abuse_unread.png" width="15" height="12" alt="" title="9 anmälningar" />
</a>
</td>
<td align="center">
<a class="page_login_text" href="/pm.php" title="Du har 3 olästa pm.">
<span class="count">3</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/blogg_latest.php" title="Du har 1 ny bloggkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/user_guestbook.php" title="Min gästbok">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/forum.php?view=3" title="Du har 1 ny forumkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/user_images.php?user_id=162005&func=display_new_comments" title="Du har 1 ny albumkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/forum_favorites.php" title="Du har 2 uppdaterade trådar i "bevakade trådar"">
<span class="count">2</span>
</td>
I'm hesitant to post this, because it seems like I'm asking for a lot, but could someone please help me with a way to parse this in Python? I've been pulling my hair trying to do this, but regular expressions and I just don't match (pardon the pun). I've spent the last couple of hours experimenting and reading the Python manual on regular expressions, but I can't seem to figure it out.
Just to make it clear, what I need are 7 different expressions for matching the number within <span class="count"></span>. I need to, for example, be able to find the number of unread PMs ("olästa pm").
Don't parse the HTML yourself. Use an HTML parser written for Python to parse it:
Lightweight xml dom parser in python
Beautiful Soup
You can use lxml to pull out the values you are looking for pretty easily with XPath expressions.
lxml
xpath
Example
from lxml import html

# Parse the saved page, then select the <span> inside the PM link by its title
page = html.fromstring(open("um.htm", "r").read())
matches = page.xpath("//a[contains(@title, 'pm.') or contains(@title, 'ol')]/span")
print([elem.text for elem in matches])
Use either:
BeautifulSoup
lxml
Parsing HTML with regexes is a recipe for disaster.
It is impossible to reliably match HTML using regular expressions. It is usually possible to cobble something together that works for a specific page, but it is not advisable as even a subtle tweak to the source HTML can render all your work useless. HTML simply has a more complex structure than Regex is capable of describing.
The proper solution is to use a dedicated HTML parser. Note that even XML parsers won't do what you need, not reliably anyway. Valid XHTML is valid XML, but even valid HTML is not, even though it's quite similar. And valid HTML/XHTML is nearly impossible to find in the wild anyway.
There are a few different HTML parsers available:
BeautifulSoup is not in the standard library, but it is the most forgiving parser: it can handle almost all real-world HTML, and it's designed to do exactly what you're trying to do.
HTMLParser is included in the Python standard library, but it is fairly strict about accepting only valid HTML.
htmllib is also in the standard library, but is deprecated.
As other people have suggested, BeautifulSoup is almost certainly your best choice.
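To illustrate the parser-based approach without any third-party install, here is a minimal sketch using Python 3's standard-library html.parser, which does not raise on the unclosed a tags in this markup. It is not the method from any answer above: CountParser and count_for are hypothetical names, and the sample is a two-cell excerpt of the markup from the question. It pairs each link's title with the number in the span that follows it, so one lookup replaces the seven separate expressions.

```python
from html.parser import HTMLParser


class CountParser(HTMLParser):
    """Map each <a title="..."> to the number in the following count span."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self._title = None
        self._in_count = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._title = attrs.get("title")
        elif tag == "span" and attrs.get("class") == "count":
            self._in_count = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_count = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_count and self._title is not None and text.isdigit():
            self.counts[self._title] = int(text)


def count_for(html_text, phrase):
    """Return the count whose link title contains `phrase`, or None."""
    parser = CountParser()
    parser.feed(html_text)
    parser.close()
    return next((n for title, n in parser.counts.items() if phrase in title), None)


# Excerpt of the markup from the question.
SAMPLE = """<td align="center">
<a class="page_login_text" href="/pm.php" title="Du har 3 olästa pm.">
<span class="count">3</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/blogg_latest.php" title="Du har 1 ny bloggkommentar">
<span class="count">1</span>
</td>"""

print(count_for(SAMPLE, "olästa pm"))
```

The same call with "bloggkommentar", "forumkommentar", and so on would cover the other values.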

How do I find all cells with a particular attribute in BeautifulSoup?

I am trying to develop a script to pull some data from a large number of HTML tables. One problem is that the number of rows containing the information needed to create the column headings is indeterminate. I have discovered that in the last row of the set of header rows, each cell with a value has the border-bottom attribute, so I decided to find the cells with that attribute. As you can see, I initialized a list; I intended to find the parent of each cell that ends up in the borderCells list. However, when I run this code, only one cell (the first cell in allCells with the border-bottom attribute) is added to borderCells. For reference, allCells has 193 cells, and 9 of them have the border-bottom attribute, so I was expecting nine members in the borderCells list. Any help is appreciated.
borderCells = []
for each in allCells:
    if each.find(attrs={"style": re.compile("border-bottom")}):
        borderCells.append(each)
Is there any reason
borderCells = soup.findAll("td", style=re.compile("border-bottom"))
wouldn't work? It's kind of hard to figure out exactly what you're asking for, since your description of the original tables is pretty ambiguous, and it's not really clear what allCells is supposed to be either.
I would suggest giving a representative sample of the HTML you're working with, along with the "correct" results pulled from that table.
Well, you know, computers are always right. The answer is that the attrs are on different things in the HTML. What I was modeling on was some HTML that looked like this:
<TD nowrap align="left" valign="bottom">
<DIV style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px">
<B>Name</B>
</DIV>
</TD>
The other places in the file where style="border-bottom ..." appears look like:
<TD colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid 00000">
<B>Location</B>
</TD>
So now I have to modify the question to figure out how to identify those cells where the attr is at the td level, not the div level.
Someone took away one of their answers, though I tested it and it worked for me. Thanks for the help. Both answers worked, and I learned a little bit more about how to post questions. After I stare at the code for a while, I might learn more about Python and BeautifulSoup.
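The difference between the two kinds of cell can be made explicit by recording, for each td, whether border-bottom sits on the td itself or on an element nested inside it. The following is a sketch under stated assumptions, not the removed answer: it uses Python 3's standard-library html.parser instead of BeautifulSoup so that it runs without extra installs, BorderCellFinder and classify_cells are hypothetical names, and nested tables are not handled.

```python
from html.parser import HTMLParser


class BorderCellFinder(HTMLParser):
    """For each <td>, record where a border-bottom style was found."""

    def __init__(self):
        super().__init__()
        self.results = []  # one entry per td: "td", "child", or None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <TD> arrives as "td".
        style = dict(attrs).get("style") or ""
        has_border = "border-bottom" in style
        if tag == "td":
            self.results.append("td" if has_border else None)
            self._in_td = True
        elif self._in_td and has_border and self.results[-1] is None:
            self.results[-1] = "child"

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False


def classify_cells(html_text):
    """Return one label per td cell, in document order."""
    finder = BorderCellFinder()
    finder.feed(html_text)
    finder.close()
    return finder.results


# The two cell shapes quoted in the thread.
SAMPLE = """<TD nowrap align="left" valign="bottom">
<DIV style="border-bottom: 1px solid #000000; width: 1%; padding-bottom: 1px">
<B>Name</B>
</DIV>
</TD>
<TD colspan="2" nowrap align="center" valign="bottom" style="border-bottom: 1px solid 00000">
<B>Location</B>
</TD>"""

print(classify_cells(SAMPLE))
```

Cells labelled "td" are the ones the findAll("td", style=...) call above selects, while cells labelled "child" are the ones the original loop with each.find(...) picked up.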
