I am trying to webscrape to pycharm using selenium - python

This is the HTML that produces the output -£0.03.
I'd like to find this value by XPath and print it to the PyCharm console.
<div class="pnl runner-info-elem below-runner-info" ng-if="ctrl.vm.events.shouldShowPnl(runner)" ng-class="{'below-runner-info': ctrl.vm.data.displayRaceCardInfo}">
<!---->
<mv-runner-pnl ng-repeat="(type, pnl) in ctrl.vm.data.pnl[runner.key] track by type" type="actual" values="pnl.values" separator="comma" formatter="::pnl.formatter">
<!-- PnL -->
<div class="runner-elem-pnl actual-pnl">
<span class="prefix"></span>
<span class="pnl-value-container">
<span class="pnl-value negative">-£0.03</span>
</span>
<span class="pnl-value-container hidden"><span>
I have tried the following:
driver.find_element_by_xpath('//*[@id="main-wrapper"]/div/div[3]/div/div[2]/div/div[1]/div[3]/div/div[1]/div/bf-main-market/bf-main-marketview/div/div[2]/bf-marketview-runners-list[2]/div/div/div/table/tbody/tr[1]/td[1]/div/div[3]').get_attribute(pnl.value)
print(pnl.value)
My result is:
NameError: name 'pnl' is not defined
Is this even possible? If not by XPath, then by any other means?
I am totally new to this and trying to learn from YouTube tutorials.

With the provided html, you could use:
el = driver.find_element_by_xpath("//div/mv-runner-pnl/div[@class='runner-elem-pnl actual-pnl']/span[@class='pnl-value-container']/span[@class='pnl-value negative']")
print(el.text)
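If you want to check the attribute predicates without opening a browser, here is a minimal sketch using Python's built-in ElementTree. It assumes a simplified, well-formed copy of the markup from the question (the Angular attributes and the truncated tail are dropped), and ElementTree only supports a subset of XPath, so treat it as a sanity check rather than a replacement for Selenium.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the markup from the question.
# Assumption: the Angular attributes and the truncated last line are left out.
SNIPPET = """
<div class="pnl runner-info-elem below-runner-info">
  <mv-runner-pnl type="actual">
    <div class="runner-elem-pnl actual-pnl">
      <span class="prefix"></span>
      <span class="pnl-value-container">
        <span class="pnl-value negative">-£0.03</span>
      </span>
    </div>
  </mv-runner-pnl>
</div>
""".strip()

root = ET.fromstring(SNIPPET)

# ElementTree supports the [@attr='value'] predicate used in the answer.
span = root.find(
    ".//span[@class='pnl-value-container']/span[@class='pnl-value negative']"
)
pnl_text = span.text
print(pnl_text)
```

Note that the attribute test is written with @ (as in @class), and that the match is on the full class string, so it only works while the element's class attribute is exactly 'pnl-value negative'.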

Related

Getting the text of a paragraph element using Selenium

`<div id="businessCategory12">`
`<p style="margin-top: 0px;line-height:80%;margin-left:5px;font-weight: bold;color:#00004C">Business Types</p>`
`<p style="margin-top: 0px;line-height:80%;margin-left:15px;font-weight: bold;"> Minority Owned Business</p>`
`<p style="margin-top: 0px;line-height:80%;margin-left:15px;"> Black American Owned</p>`
`</div>`
I am working on a web scraping tool for a client. I need to get the text from the third paragraph above using Selenium (Python), but I am having a lot of trouble. The text should be "Black American Owned". I have tried the following, but it keeps giving me a null value. What am I doing wrong here?
Any help or other way to get the text would be greatly appreciated!
`minority = driver.find_element_by_xpath("//*[@id='businessCategory12']/p[3]")`
`minority_owned = minority.text`
Possibly the node is hidden. Try textContent instead of text:
minority = driver.find_element_by_xpath("//*[@id='businessCategory12']/p[3]")
minority_owned = minority.get_attribute("textContent")
<div id="businessCategory12">
<p style="margin-top: 0px;line-height:80%;margin-left:5px;font-weight: bold;color:#00004C">Business Types</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;font-weight: bold;">Minority Owned Business</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;">Black American Owned</p>
</div>
Just try:
//p[3]/text()
Here is a good site for experimenting with XPath:
https://scrapinghub.github.io/xpath-playground/

Beautifulsoup Text from Table using same Span tag

When using BeautifulSoup to extract text from a table, I am unable to extract the text I need because there are multiple pieces of text inside the same <span>.
I used the following code:
results = soup.find_all('span', class_="crux-body-copy crux-body-copy--small--bold")
results[0]
I get the following results
<span class="crux-body-copy crux-body-copy--small--bold">
LATCH connections
<span class="product-model-tooltip">
<span aria-hidden="true" class="crux-icons crux-icons-help-information"></span>
<span class="product-model-tooltip-window">
<span aria-hidden="true" class="crux-icons crux-icons-close"></span>
<span class="crux-body-copy crux-body-copy--small--bold">LATCH connections</span>
<span class="crux-body-copy crux-body-copy--small">Type of LATCH connection.
</span>
</span>
</span>
</span>
Then I tried to get the text
results[0].get_text()
gives me
'\nLATCH connections\n\n\n\n\nLATCH connections\nType of LATCH connection.\n\n\n\n'
Then I used
results[0].get_text().replace('\n', '')
and I get
'LATCH connectionsLATCH connectionsType of LATCH connection.'
All I need is 'LATCH connections' and 'Type of LATCH connection' as two columns.
Can you please help.
There are many ways to do this. One is to search by class, then search by the same class name again inside the result:
parent_span = soup.find('span', class_="crux-body-copy crux-body-copy--small--bold")
result = parent_span.find('span', class_="crux-body-copy crux-body-copy--small--bold")
print(result.text)
If you only want a single tag, you don't need find_all; find is enough.
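To get both pieces as two columns, one option is to work from the get_text() output quoted in the question. The sketch below uses only that string, so it runs without BeautifulSoup; the names label and description are just illustrative. It assumes the repeated label is always an exact duplicate, which may not hold for every row.

```python
# The get_text() output quoted in the question.
raw = '\nLATCH connections\n\n\n\n\nLATCH connections\nType of LATCH connection.\n\n\n\n'

# Split on newlines and keep only the non-empty pieces.
parts = [line.strip() for line in raw.splitlines() if line.strip()]

# Drop the repeated label while keeping the original order.
unique_parts = list(dict.fromkeys(parts))

# Assumption: exactly two distinct strings remain (label and description).
label, description = unique_parts
print(label)
print(description)
```

With a real tag, results[0].stripped_strings should give you the same non-empty pieces directly, so the split step would not be needed.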

Trying to use Python + XPath to click the highlighted text?

I'm fairly new to coding and Python. I'm trying to use find_element_by_xpath to click the highlighted text "Snoring Chin Strap by TheFamilyMarket".
time.sleep(2)
#btn = br.find_element_by_name("#Anti Snoring Chin Strap Kit")
# btn = br.find_element_by_link_text('Snoring Chin Strap')
The HTML code:
<div class="tableD">
<div class="productDiv" id="productDiv69507">
<h2 class="productTitle" id="productTitle69507" onclick="goToProduct(7)">Snoring Chin Strap by TheFamilyMarket</h2>
<img class="productImage" src="https://images-na.ssl-images-amazon.com/images/I/516fC3JruqL.jpg" onclick="goToProduct(7)">
<hr>
<h4 class="normalPrice" id="normalPrice7" onclick="goToProduct(7)">Normally: <span class="currency">$ </span>19.99</h4>
<h4 class="promoPrice" style="margin:2.5px auto;" id="promoPrice69507" onclick="goToProduct(7)">Your Amazon Price: <span class="currency">$ </span>1.99</h4>
<h3>Your Total: <span class="currency">$ </span>1.99</h3>
<p class="clickToViewP" id="cToVP69507" onclick="goToProduct(7)">Click to view and purchase!</p>
</div>
</div>
br.find_element_by_xpath("//h2[text()='Snoring Chin Strap by TheFamilyMarket']").click()
XPath is often quick to obtain because you can copy it from the browser, which is why so many people use it. In my opinion, though, learning JavaScript and CSS selectors pays off in the long term.
The same thing can be done by selecting all the h2 elements, filtering on their text in plain JavaScript, and passing the result back to Python:
link_you_search = br.execute_script('''
links= document.querySelectorAll("h2");
for (link of links) if (link.textContent.includes("Chin Strap")) return link;
''')
link_you_search.click()
Alternatively, you can select by class:
link_you_search = br.execute_script('''
links= document.querySelectorAll(".productDiv");
for (link of links) if (link.textContent.includes("Chin Strap")) return link;
''')
link_you_search.click()
Since your element has an id attribute, selecting by id is usually best practice: it is the fastest lookup, there should be only one element with that id, and ids rarely change (for example, when the page is translated). In your case that would be:
link_you_search = br.find_element_by_id('productTitle69507')
link_you_search.click()

Robot Framework anchor tag value is not retrieved

<div class="m-page-nav m-bottom-2" data-hbui="mobile-nav" role="navigation">
<span aria-label="Collapse page navigation" class="a-icon m-page-nav-icon m-page-nav-icon-collapse">expand_more</span>
<span aria-label="Expand page navigation" class="a-icon m-page-nav-icon m-page-nav-icon-expand">expand_more</span>
<button data-scroll-header class="m-page-nav-button"></button>
<div data-gumshoe class="m-page-nav-list">
<a data-scroll href="#intro" tabindex="0">Intro</a>
<a data-scroll href="#courseHeadingDescription" tabindex="0">Enrolment Disclaimer</a>
<a data-scroll href="#admissionRequirements" tabindex="0">Admission Requirements</a>
<a data-scroll href="#programRequirements" tabindex="0">Program Requirements</a>
<a data-scroll href="#professionalOutcomes" tabindex="0">Professional Outcomes</a>
<a data-scroll href="#recognitionMasters" tabindex="0">Recognition of Achievement</a>
<a data-scroll href="#fees" tabindex="0">Program Fees</a>
</div>
</div>
I am fairly new to Robot Framework. I am trying to retrieve the href value of the anchor tag whose text is Intro, using the following XPath:
//*[@id="pageNavContainer"]/div/div/a[2]/@href
But when I run the script with the above XPath, I get the error below:
InvalidSelectorException: Message: invalid selector: The result of the xpath express
I also tried using getattribute, but it didn't work.
Can anyone please tell me the correct XPath for this?
Thanks in advance
The error you received says it all: the selector you provided is not valid. The problem is the /@href construct at the end.
In XPath you can certainly address a node's attribute with this syntax. In Selenium (which Robot Framework wraps), however, a locator has to point to an element, not to an attribute, so it throws the exception when it evaluates your expression.
If you change the locator to
//*[@id="pageNavContainer"]/div/div/a[2]
and then use Get Element Attribute, passing href as the target, you'll get the desired value.
To extract the text Intro you can use the following XPath:
//*[@id="pageNavContainer"]//div[@class='m-page-nav m-bottom-2']//div[@class='m-page-nav-list']//following-sibling::a[1]
Next you can use Get Element Attribute to extract the href:
${HREF}=    Get Element Attribute    ${element_xpath}@href
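The "locate the element first, then read the attribute" idea can be tried outside Robot Framework. Below is a rough Python sketch using the standard library's ElementTree on a simplified copy of the link list (the valueless data-scroll attributes are dropped so it parses as XML, and only three links are kept). It is only an illustration of the two-step approach, not how Selenium itself evaluates locators.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the navigation list from the question.
# Assumption: data-scroll / tabindex attributes are omitted for brevity.
NAV = """
<div class="m-page-nav-list">
  <a href="#intro">Intro</a>
  <a href="#courseHeadingDescription">Enrolment Disclaimer</a>
  <a href="#admissionRequirements">Admission Requirements</a>
</div>
""".strip()

nav = ET.fromstring(NAV)

# Step 1: locate the element itself (no trailing /@href in the path).
first_link = nav.find("./a[1]")

# Step 2: read the attribute and the text from that element.
href = first_link.get("href")
label = first_link.text
print(href, label)
```

Note that in the posted HTML the Intro link is the first anchor, so a[2] would point at the Enrolment Disclaimer link instead.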

Scrape the <p> tag using Python Selenium

I need to extract the text of the <p> tag from the HTML, but I have not been able to get it done.
This is the HTML:
<div class="small-12 columns" id="gennedword-container">
<p class="text-center" id="gennedword">
Press "New Word"
</p>
</div>
I tried the following, but I don't get the text.
word = driver.find_element_by_xpath('//*[@id="gennedword"]')
print(word.get_attribute('text'))
print(word.get_attribute('innerHTML'))
The first call returned None, and for innerHTML I get:
<img src="/wordgenerator/loading.gif" alt="Loading..." style="vertical-align:middle;">
As per your question, you are trying to scrape the <p> tag and extract the text Press "New Word". To achieve that you can use the following line of code:
print(driver.find_element_by_xpath("//div[@id='gennedword-container' and @class='small-12 columns']/p[@id='gennedword' and @class='text-center']").get_attribute('innerHTML'))
I was able to solve the issue.
Added a sleep for 1 sec and the text appeared.
Thanks all for the help
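A fixed sleep works here, but it can be fragile if the page takes longer than a second to load. A common alternative is to poll until the text shows up. Selenium provides WebDriverWait for this; the sketch below shows the same idea in plain Python so it can run without a browser. The helper name wait_for_text and its parameters are made up for illustration, and the "page" is faked with an iterator.

```python
import time


def wait_for_text(get_text, placeholder="", timeout=10.0, interval=0.25):
    """Poll get_text() until it returns non-empty text other than placeholder."""
    deadline = time.monotonic() + timeout
    while True:
        text = get_text().strip()
        if text and text != placeholder:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"text did not appear within {timeout} seconds")
        time.sleep(interval)


# Stand-in for the page: the first two reads are empty, as if still loading.
_reads = iter(["", "", 'Press "New Word"'])
word = wait_for_text(lambda: next(_reads), interval=0.01)
print(word)
```

With a real driver, get_text could be something like lambda: driver.find_element_by_id("gennedword").text, using the same API style as the rest of this thread.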
