Getting the text of a paragraph element using Selenium - python

`<div id="businessCategory12">`
`<p style="margin-top: 0px;line-height:80%;margin-left:5px;font-weight: bold;color:#00004C">Business Types</p>`
`<p style="margin-top: 0px;line-height:80%;margin-left:15px;font-weight: bold;"> Minority Owned Business</p>`
`<p style="margin-top: 0px;line-height:80%;margin-left:15px;"> Black American Owned</p>`
`</div>`
I am working on a web scraping tool for a client. I need to get the text of the third paragraph above using Selenium (Python), but I am having a lot of trouble. The text should be "Black American Owned". I have tried the following, but it keeps giving me an empty value. What am I doing wrong here?
Any help, or another way to get the text, would be greatly appreciated!
`minority = driver.find_element_by_xpath("//*[@id='businessCategory12']/p[3]")`
`minority_owned = minority.text`

The node is possibly hidden. Try textContent instead of text:
minority = driver.find_element_by_xpath("//*[@id='businessCategory12']/p[3]")
minority_owned = minority.get_attribute("textContent")
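
The reason this can help: Selenium's `.text` returns only text that is rendered as visible, so an element hidden by CSS yields an empty string, while the `textContent` attribute returns the node's raw text regardless of visibility. A minimal sketch of a fallback helper follows. The function name `element_text` is hypothetical, and the only assumption is that the object passed in behaves like a Selenium WebElement (a `.text` property and a `get_attribute` method).

```python
def element_text(element):
    """Return the element's visible text, or its raw textContent if that is empty.

    `element` is assumed to behave like a Selenium WebElement.
    """
    text = element.text
    if not text or not text.strip():
        # Hidden elements report an empty .text; textContent ignores visibility.
        text = element.get_attribute("textContent") or ""
    return text.strip()
```

With the code above, this would be called as `minority_owned = element_text(minority)`.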

<div id="businessCategory12">
<p style="margin-top: 0px;line-height:80%;margin-left:5px;font-weight: bold;color:#00004C">Business Types</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;font-weight: bold;">Minority Owned Business</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;">Black American Owned</p>
</div>
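Just try this XPath (it selects a text node, so it suits XPath tools such as lxml or Scrapy; Selenium's find_element expects an element and will reject it):
//p[3]/text()
Here is a good site for experimenting with XPath: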
https://scrapinghub.github.io/xpath-playground/
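
To check the position logic without a browser, the snippet can be evaluated offline. This is a rough sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and is a stand-in here, not what Selenium runs. The `<body>` wrapper and the shortened `style` attributes are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The div from the question, wrapped in a <body> so the id lookup has a parent.
SNIPPET = """
<body>
<div id="businessCategory12">
<p style="font-weight: bold;color:#00004C">Business Types</p>
<p style="font-weight: bold;"> Minority Owned Business</p>
<p style="margin-left:15px;"> Black American Owned</p>
</div>
</body>
"""

root = ET.fromstring(SNIPPET)

# Third <p> child of the element whose id is businessCategory12.
third_p = root.find(".//*[@id='businessCategory12']/p[3]")
minority_owned = third_p.text.strip()
print(minority_owned)  # Black American Owned
```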

Related

Trying to use Python + XPath to click the highlighted text?

I'm fairly new to coding and Python. I'm trying to use find_element_by_xpath to click the highlighted text "Snoring Chin Strap by TheFamilyMarket".
time.sleep(2)
#btn = br.find_element_by_name("#Anti Snoring Chin Strap Kit")
# btn = br.find_element_by_link_text('Snoring Chin Strap')
The HTML code:
<div class="tableD">
<div class="productDiv" id="productDiv69507">
<h2 class="productTitle" id="productTitle69507" onclick="goToProduct(7)">Snoring Chin Strap by TheFamilyMarket</h2>
<img class="productImage" src="https://images-na.ssl-images-amazon.com/images/I/516fC3JruqL.jpg" onclick="goToProduct(7)">
<hr>
<h4 class="normalPrice" id="normalPrice7" onclick="goToProduct(7)">Normally: <span class="currency">$ </span>19.99</h4>
<h4 class="promoPrice" style="margin:2.5px auto;" id="promoPrice69507" onclick="goToProduct(7)">Your Amazon Price: <span class="currency">$ </span>1.99</h4>
<h3>Your Total: <span class="currency">$ </span>1.99</h3>
<p class="clickToViewP" id="cToVP69507" onclick="goToProduct(7)">Click to view and purchase!</p>
</div>
</div>
br.find_element_by_xpath("//h2[text()='Snoring Chin Strap by TheFamilyMarket']").click()
An XPath is often quick to obtain because you can copy it from the browser, which is why so many people use it. In my opinion, though, learning JavaScript and CSS selectors pays off in the long term.
The same can also be done by selecting all the h2 elements, searching their text with plain JavaScript, and passing the result back to Python:
link_you_search = br.execute_script('''
links= document.querySelectorAll("h2");
for (link of links) if (link.textContent.includes("Chin Strap")) return link;
''')
link_you_search.click()
Alternatively, you can select by class:
link_you_search = br.execute_script('''
links= document.querySelectorAll(".productDiv");
for (link of links) if (link.textContent.includes("Chin Strap")) return link;
''')
link_you_search.click()
Since your element has an id attribute, selecting by id is usually best practice: it is the fastest lookup, only one element should have that id, and ids rarely change (for example, when the page is translated). In your case it would be:
link_you_search = br.find_element_by_id('productTitle69507')
link_you_search.click()

I am trying to webscrape to pycharm using selenium

This is the HTML that produces the output -£0.03.
I'd like to find this value by XPath and print it to the PyCharm console.
<div class="pnl runner-info-elem below-runner-info" ng-if="ctrl.vm.events.shouldShowPnl(runner)" ng-class="{'below-runner-info': ctrl.vm.data.displayRaceCardInfo}">
<!---->
<mv-runner-pnl ng-repeat="(type, pnl) in ctrl.vm.data.pnl[runner.key] track by type" type="actual" values="pnl.values" separator="comma" formatter="::pnl.formatter">
<!-- PnL -->
<div class="runner-elem-pnl actual-pnl">
<span class="prefix"></span>
<span class="pnl-value-container">
<span class="pnl-value negative">-£0.03</span>
</span>
<span class="pnl-value-container hidden"><span>
I have tried the following:
driver.find_element_by_xpath('//*[@id="main-wrapper"]/div/div[3]/div/div[2]/div/div[1]/div[3]/div/div[1]/div/bf-main-market/bf-main-marketview/div/div[2]/bf-marketview-runners-list[2]/div/div/div/table/tbody/tr[1]/td[1]/div/div[3]').get_attribute(pnl.value)
print(pnl.value)
My result is:
NameError: name 'pnl' is not defined
Is this even possible? If not by xpath then by any other means?
As I said I am totally new to this and trying to learn from YouTube tutorials.
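The NameError occurs because pnl.value is evaluated as a Python expression and no variable named pnl exists; get_attribute expects the attribute name as a string. With the provided HTML, you could use:
el = driver.find_element_by_xpath("//div/mv-runner-pnl/div[@class='runner-elem-pnl actual-pnl']/span[@class='pnl-value-container']/span[@class='pnl-value negative']")
print(el.text)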
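
A short, class-based path is usually less brittle than the long absolute XPath in the question. The sketch below checks such a path offline with the standard library's `xml.etree.ElementTree`, a stand-in that supports only a subset of XPath. The closing tags are an assumption, since the question's HTML is cut off, and some attributes are trimmed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup; closing tags added so it parses.
SNIPPET = """
<div class="pnl runner-info-elem below-runner-info">
<mv-runner-pnl type="actual">
<div class="runner-elem-pnl actual-pnl">
<span class="prefix"></span>
<span class="pnl-value-container">
<span class="pnl-value negative">-£0.03</span>
</span>
</div>
</mv-runner-pnl>
</div>
"""

root = ET.fromstring(SNIPPET)

# Match on the exact class attribute value of the innermost span.
pnl_span = root.find(".//span[@class='pnl-value negative']")
pnl_text = pnl_span.text
print(pnl_text)  # -£0.03
```

In Selenium, the equivalent expression would be `//span[@class='pnl-value negative']`, with the result read through `.text` as in the answer above.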

Trouble finding element in Selenium with Python

I have been trying to collect a list of live channels/viewers on YouTube Gaming. I am using Selenium with Python to force the website to scroll down the page so it loads more than 11 channels. For reference, this is the webpage I am working on.
I have found the location of the data I want, but I am struggling with getting selenium to go there. The part I am having trouble with looks like this:
<div class="style-scope ytg-gaming-video-renderer" id="video-metadata"><span class="title ellipsis-2 style-scope ytg-gaming-video-renderer"><ytg-nav-endpoint class="style-scope ytg-gaming-video-renderer x-scope ytg-nav-endpoint-2"><a href="/watch?v=FFKSD1HHrdA" tabindex="0" class="style-scope ytg-nav-endpoint" target="_blank">
Live met Bo3
</a></ytg-nav-endpoint></span>
<div class="channel-info small layout horizontal center style-scope ytg-gaming-video-renderer">
<ytg-owner-badges class="style-scope ytg-gaming-video-renderer x-scope ytg-owner-badges-0">
<template class="style-scope ytg-owner-badges" is="dom-repeat"></template>
</ytg-owner-badges>
<ytg-formatted-string class="style-scope ytg-gaming-video-renderer">
<ytg-nav-endpoint class="style-scope ytg-formatted-string x-scope ytg-nav-endpoint-2">Rico Eeman
</ytg-nav-endpoint>
</ytg-formatted-string>
</div><span class="ellipsis-1 small style-scope ytg-gaming-video-renderer" id="video-viewership-info" hidden=""></span>
<div id="metadata-badges" class="small style-scope ytg-gaming-video-renderer">
<ytg-live-badge-renderer class="style-scope ytg-gaming-video-renderer x-scope ytg-live-badge-renderer-1">
<template class="style-scope ytg-live-badge-renderer" is="dom-if"></template>
<span aria-label="" class="text layout horizontal center style-scope ytg-live-badge-renderer">4 watching</span>
<template class="style-scope ytg-live-badge-renderer" is="dom-if"></template>
</ytg-live-badge-renderer>
</div>
</div>
Currently, I am trying:
#This part works fine. I can use the unique ID
meta_data = driver.find_element_by_id('video-metadata')
#This part is also fine. Once again, it has an ID.
viewers = meta_data.find_element_by_id('metadata-badges')
print(viewers.text)
However, I have been having trouble getting to the channel name (in this example 'Rico Eeman', which is under the first nested div tag). Because it's a compound class name, I cannot find the element by class name, and the following XPaths don't work:
name = meta_data.find_element_by_xpath('/div[@class="channel-info small layout horizontal center style-scope ytg-gaming-video-renderer"]/ytg-formatted-string')
name = meta_data.find_element_by_xpath('/div[1]')
They both raise the element not found error. I am not really sure what to do here. Does anyone have a working solution?
The name is not in the <ytg-formatted-string> tag; it's in one of its descendants. Try:
meta_data.find_element_by_css_selector('.style-scope.ytg-formatted-string.x-scope.ytg-nav-endpoint-2 > a')
Or with xpath
meta_data.find_element_by_xpath('//ytg-nav-endpoint[@class="style-scope ytg-formatted-string x-scope ytg-nav-endpoint-2"]/a')
This will get all the names. Even if your XPath had worked, using video-metadata would not get all of them: the id is repeated in the div for each user, so you need find_elements and must iterate over the returned elements:
names = driver.find_elements_by_css_selector("a.style-scope.ytg-nav-endpoint[href^='/channel/']")
print([name.get_attribute("text") for name in names])
Which gives you:
['NinjaNation Gaming', 'DURX DANIEL', 'DEMON', 'Perfection', 'The one and only jd', 'Violator Games', 'KingLuii718', 'NinjaNation Gaming', 'DURX DANIEL', 'DEMON', 'Perfection']

Webdriver - Locate Input via Label (Python)

How do I locate an input field via its label using webdriver?
I would like to test a web form that unfortunately uses dynamically generated
ids, so they're unsuitable as identifiers.
Yet, the labels associated with each web element strike me as suitable.
Unfortunately, I was not able to do it with the few suggestions
offered on the web. There is one thread here on SO, but it did not
yield an accepted answer:
Selenium WebDriver Java - Clicking on element by label not working on certain labels
To solve this problem in Java, it is commonly suggested to locate the label as an anchor via its text content and then specify the XPath to the input element:
//label[contains(text(), 'TEXT_TO_FIND')]
I am not sure how to do this in python though.
My web element:
<div class="InputText">
<label for="INPUT">
<span>
LABEL TEXT
</span>
<span id="idd" class="Required" title="required">
*
</span>
</label>
<span class="Text">
<input id="INPUT" class="Text ColouredFocus" type="text" onchange="var wcall=wicketAjaxPost(';jsessionid= ... ;" maxlength="30" name="z1013400259" value=""></input>
</span>
<div class="RequiredLabel"> … </div>
<span> … </span>
</div>
Unfortunately I was not able to use CSS or XPATH expressions
on the site. IDs and names always changed.
The only solution to my problem I found was a dirty one: parsing
the source code of the page and extracting the ids by string operations.
Certainly this is not the way webdriver was intended to be used, but
it works robustly.
Code:
lines = []
for line in driver.page_source.splitlines():
    lines.append(line)
    if 'LABEL TEXT 1' in line:
        id_l1 = lines[-2].split('"', 2)[1]
You should start with a div and check that there is a label with an appropriate span inside, then get the input element from the span tag with class Text:
//div[@class='InputText' and contains(label/span, 'TEXT_TO_FIND')]/span[@class='Text']/input
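
Another route, sketched here as an alternative to the XPath above rather than as the answerer's method, relies on the label's `for` attribute matching the input's `id` at runtime even when that id is generated dynamically. The sketch runs offline with the standard library's `xml.etree.ElementTree` on a trimmed copy of the markup from the question. The helper name `input_for_label` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (onchange handler and filler omitted).
SNIPPET = """
<div class="InputText">
<label for="INPUT">
<span>
LABEL TEXT
</span>
<span id="idd" class="Required" title="required">
*
</span>
</label>
<span class="Text">
<input id="INPUT" class="Text ColouredFocus" type="text" maxlength="30" name="z1013400259" value=""></input>
</span>
</div>
"""


def input_for_label(root, label_text):
    """Return the <input> whose id matches the `for` of the label containing label_text."""
    for label in root.iter("label"):
        # itertext() gathers the text of the nested <span> elements too.
        if label_text in "".join(label.itertext()):
            target_id = label.get("for")
            for candidate in root.iter("input"):
                if candidate.get("id") == target_id:
                    return candidate
    return None


root = ET.fromstring(SNIPPET)
field = input_for_label(root, "LABEL TEXT")
print(field.get("name"))  # z1013400259
```

With Selenium, the same two steps would roughly be: locate the label with `//label[contains(., 'LABEL TEXT')]`, read its `for` attribute, and then look up the input by that id.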

Obtain text from lxml Comment

I am trying to get the content of the _Comment. I've researched quite a bit on how to do it, but I don't know how to access the function from the td element in order to grab the text. I'm using XPaths with the Python Scrapy module, if that helps.
td = None [_Element]
<built-in function Comment> = None [_Comment]
a = None [_Element]
The HTML for the td element is:
<table class="crIFrameReviewList">
<tr>
<td>
<!-- BOUNDARY -->
<a name="R2L4AFEICL8GG6"></a><br />
<div style="margin-left:0.5em;">
<div style="margin-bottom:0.5em;">
304 of 309 people found the following review helpful
</div>
<div style="margin-bottom:0.5em;">
<span style='margin-left: -5px;'><img src="http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif" width="64" alt="5.0 out of 5 stars" title="5.0 out of 5 stars" height="12" border="0" /> </span>
<b>Great Travel Zoom</b>, <nobr>April 9, 2014</nobr>
</div>
<div style="margin-bottom:0.5em;">
<div class="tiny" style="margin-bottom:0.5em;">
<span class="crVerifiedStripe"><b class="h3color tiny" style="margin-right: 0.5em;">Verified Purchase</b><span class="tiny verifyWhatsThis">(What's this?)</span></span>
</div>
<div class="tiny" style="margin-bottom:0.5em;">
<b><span class="h3color tiny">This review is from: </span>Canon PowerShot SX700 HS Digital Camera (Black) (Electronics)</b>
</div>
For the recent few years Canon has made great efforts to improve their travel-zoom compact cameras, and the new SX700 is their next remarkable achievement on that way. It's a little bit bigger than its predecessor (SX280) but it is very well built and has an attractive look and feel (I like the black one). It also got a new front grip which makes one-hand shooting more convenient, even when shooting video, since the Video button was moved from the back to the top and you can now use your thumb solely for holding the camera.<br /><br />Here is a brief list of the new camera pros & cons:<br /><br />PROS:<br />* A very good design and build quality with the attractive finish.<br />* A new powerful 30x optical zoom lens in just a pocket-size body.<br />* Incredible range from 25mm wide to 750mm telephoto for stills and video.<br />* Zoom Framing Assist - very useful new feature to compose your pictures at long telephoto.<br />* Very effective optical Intelligent Image Stabilization for...
Read more
<div style="padding-top: 10px; clear: both; width: 100%;">
Find the div with class="reviewText" using the .//div[@class="reviewText"] XPath expression and dump the element to a string using tostring() with method="text":
import lxml.html
data = """
your html here
"""
td = lxml.html.fromstring(data)
review = td.find('.//div[@class="reviewText"]')
print(lxml.html.tostring(review, method="text", encoding="unicode"))
Prints:
54,000 RPM - It has a spinning disk drive that is way beyond our time...I bought 10 of these just for the hard drive, they blow SSD's out of the water.
Seriously though... how does a well known computer company mistype an important spec?
