Trying to use Python + XPath to click the highlighted text? - python

Fairly new to coding and Python. I'm trying to use find_element_by_xpath to click the highlighted text "Snoring Chin Strap by TheFamilyMarket".
time.sleep(2)
#btn = br.find_element_by_name("#Anti Snoring Chin Strap Kit")
# btn = br.find_element_by_link_text('Snoring Chin Strap')
The HTML code:
<div class="tableD">
<div class="productDiv" id="productDiv69507">
<h2 class="productTitle" id="productTitle69507" onclick="goToProduct(7)">Snoring Chin Strap by TheFamilyMarket</h2>
<img class="productImage" src="https://images-na.ssl-images-amazon.com/images/I/516fC3JruqL.jpg" onclick="goToProduct(7)">
<hr>
<h4 class="normalPrice" id="normalPrice7" onclick="goToProduct(7)">Normally: <span class="currency">$ </span>19.99</h4>
<h4 class="promoPrice" style="margin:2.5px auto;" id="promoPrice69507" onclick="goToProduct(7)">Your Amazon Price: <span class="currency">$ </span>1.99</h4>
<h3>Your Total: <span class="currency">$ </span>1.99</h3>
<p class="clickToViewP" id="cToVP69507" onclick="goToProduct(7)">Click to view and purchase!</p>
</div>
</div>

br.find_element_by_xpath("//h2[text()='Snoring Chin Strap by TheFamilyMarket']").click()

XPath is often quick to obtain because you can copy it from the browser, which is why so many people use it. In my opinion, though, learning JavaScript and CSS selectors pays off in the long term.
The same result can be achieved by selecting all the h2 elements, searching their text with plain JavaScript, and passing the result back to Python:
link_you_search = br.execute_script('''
links= document.querySelectorAll("h2");
for (link of links) if (link.textContent.includes("Chin Strap")) return link;
''')
link_you_search.click()
or alternatively you can select by class:
link_you_search = br.execute_script('''
links= document.querySelectorAll(".productDiv");
for (link of links) if (link.textContent.includes("Chin Strap")) return link;
''')
link_you_search.click()
Given that your element has an id attribute, selecting by id is usually the best practice: it is the fastest lookup, there should be only one element with that id, and ids tend not to change when, for example, the page is translated. In your case it would be:
link_you_search = br.find_element_by_id('productTitle69507')
link_you_search.click()
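A note for newer Selenium versions: as far as I know, the find_element_by_* helpers used above were removed in Selenium 4.3, so the id lookup is now written with a locator tuple. Below is a minimal sketch, assuming Selenium 4; the names title_locator and click_product_title are hypothetical helpers made up for this example, and the explicit wait stands in for the time.sleep(2) from the question.

```python
def title_locator(product_id):
    """Build a Selenium locator tuple for a product title by its id."""
    # "id" is the string value of selenium.webdriver.common.by.By.ID.
    return ("id", f"productTitle{product_id}")


def click_product_title(driver, product_id, timeout=10):
    """Wait until the title is clickable, then click it (hypothetical helper)."""
    # Imported lazily so title_locator() can be used without Selenium installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = title_locator(product_id)
    # until() returns the element once the condition holds, or raises
    # TimeoutException after `timeout` seconds.
    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable(locator)
    )
    element.click()
    return element
```

With a running driver this would be called as click_product_title(br, "69507").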

Related

Getting the text of a paragraph element using Selenium

<div id="businessCategory12">
<p style="margin-top: 0px;line-height:80%;margin-left:5px;font-weight: bold;color:#00004C">Business Types</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;font-weight: bold;"> Minority Owned Business</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;"> Black American Owned</p>
</div>
I am working on a webscraping tool for a client. I need to get the text from the third paragraph above using selenium (python) but I am having a lot of trouble. The text should be "Black American Owned". I have tried the following but it keeps giving me a null value. What am I doing wrong here?
Any help or other way to get the text would be greatly greatly appreciated!
minority = driver.find_element_by_xpath("//*[@id='businessCategory12']/p[3]")
minority_owned = minority.text
Possibly the node is hidden. Try textContent instead of text:
minority = driver.find_element_by_xpath("//*[@id='businessCategory12']/p[3]")
minority_owned = minority.get_attribute("textContent")
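To rule out the XPath itself as the cause, the expression can be checked offline against the posted markup. This is only a sketch: it uses the standard library's xml.etree.ElementTree (which supports a subset of XPath) rather than Selenium, drops the style attributes for brevity, and wraps the snippet in a <body> element so the div is a descendant of the root.

```python
import xml.etree.ElementTree as ET

# Markup from the question, style attributes omitted, wrapped in <body>.
HTML = """
<body>
<div id="businessCategory12">
<p>Business Types</p>
<p> Minority Owned Business</p>
<p> Black American Owned</p>
</div>
</body>
"""

root = ET.fromstring(HTML.strip())

# Same shape as the XPath in the question: third <p> under the div with this id.
third = root.find(".//*[@id='businessCategory12']/p[3]")

# The raw text keeps the leading space from the markup, so strip it.
minority_owned = third.text.strip()
print(minority_owned)
```

If the expression matches here but .text is still empty in the browser, the element is probably not displayed, which is the case where reading textContent helps.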
<div id="businessCategory12">
<p style="margin-top: 0px;line-height:80%;margin-left:5px;font-weight: bold;color:#00004C">Business Types</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;font-weight: bold;">Minority Owned Business</p>
<p style="margin-top: 0px;line-height:80%;margin-left:15px;">Black American Owned</p>
</div>
Just try:
//p[3]/text()
Here is a good site to play around with XPath:
https://scrapinghub.github.io/xpath-playground/

Creating a css selector to locate multiple ids in a single-shot

I've defined css selectors within the script to get the text within span elements and I'm getting them accordingly. However, the way I tried is definitely messy. I just separated different css selectors using a comma to let the script understand I'm after this or that.
If I opt for xpath I could have used 'div//span[.="Featured" or .="Sponsored"]' but in case of css selector I could not find anything similar to serve the same purpose. I know using 'span:contains("Featured"),span:contains("Sponsored")' I can get the text but there is the comma in between as usual.
What is the ideal way to locate the elements (within different ids) using css selectors except for comma?
My try so far with:
from lxml.html import fromstring
html = """
<div class="rest-list-information">
<a class="restaurant-header" href="/madison-wi/restaurants/pizza-hut">
Pizza Hut
</a>
<div id="featured other-dynamic-ids">
<span>Sponsored</span>
</div>
</div>
<div class="rest-list-information">
<a class="restaurant-header" href="/madison-wi/restaurants/salads-up">
Salads UP
</a>
<div id="other-dynamic-ids border">
<span>Featured</span>
</div>
</div>
"""
root = fromstring(html)
for item in root.cssselect("[id~='featured'] span,[id~='border'] span"):
    print(item.text)
You can do:
.rest-list-information div span
But I think it's a bad idea to consider the comma messy. You won't find many stylesheets that don't have commas.
If you are just looking to get all 'span' text from the HTML then the following should suffice:
root_spans = root.xpath('//span')
for span in root_spans:
    span_text = span.xpath('.//text()')[0]
    print(span_text)
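If the goal is the equivalent of the XPath "or" without a comma-separated selector, another option is to select every span once and filter by text in Python. A minimal sketch follows; it swaps lxml for the standard library's xml.etree.ElementTree so it runs without extra packages, wraps the two divs in a made-up <root> element, and the WANTED set is an assumption taken from the two labels in the question.

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in a single root so it parses as XML.
HTML = """
<root>
<div class="rest-list-information">
  <a class="restaurant-header" href="/madison-wi/restaurants/pizza-hut">Pizza Hut</a>
  <div id="featured other-dynamic-ids">
    <span>Sponsored</span>
  </div>
</div>
<div class="rest-list-information">
  <a class="restaurant-header" href="/madison-wi/restaurants/salads-up">Salads UP</a>
  <div id="other-dynamic-ids border">
    <span>Featured</span>
  </div>
</div>
</root>
"""

# The labels we care about (same as the 'or' branches of the XPath).
WANTED = {"Featured", "Sponsored"}

root = ET.fromstring(HTML.strip())

# One pass over all spans, keeping only those whose text is in the set.
labels = [span.text for span in root.iter("span") if span.text in WANTED]
print(labels)
```

The same filter works on the list returned by root.cssselect("span") in lxml, so the selector stays short and the matching logic lives in one place.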

Trouble finding element in Selenium with Python

I have been trying to collect a list of live channels/viewers on Youtube Gaming. I am using selenium with Python to force the website to scroll down the page so it loads more than 11 channels. For reference, this is the webpage I am working on.
I have found the location of the data I want, but I am struggling with getting selenium to go there. The part I am having trouble with looks like this:
<div class="style-scope ytg-gaming-video-renderer" id="video-metadata"><span class="title ellipsis-2 style-scope ytg-gaming-video-renderer"><ytg-nav-endpoint class="style-scope ytg-gaming-video-renderer x-scope ytg-nav-endpoint-2"><a href="/watch?v=FFKSD1HHrdA" tabindex="0" class="style-scope ytg-nav-endpoint" target="_blank">
Live met Bo3
</a></ytg-nav-endpoint></span>
<div class="channel-info small layout horizontal center style-scope ytg-gaming-video-renderer">
<ytg-owner-badges class="style-scope ytg-gaming-video-renderer x-scope ytg-owner-badges-0">
<template class="style-scope ytg-owner-badges" is="dom-repeat"></template>
</ytg-owner-badges>
<ytg-formatted-string class="style-scope ytg-gaming-video-renderer">
<ytg-nav-endpoint class="style-scope ytg-formatted-string x-scope ytg-nav-endpoint-2">Rico Eeman
</ytg-nav-endpoint>
</ytg-formatted-string>
</div><span class="ellipsis-1 small style-scope ytg-gaming-video-renderer" id="video-viewership-info" hidden=""></span>
<div id="metadata-badges" class="small style-scope ytg-gaming-video-renderer">
<ytg-live-badge-renderer class="style-scope ytg-gaming-video-renderer x-scope ytg-live-badge-renderer-1">
<template class="style-scope ytg-live-badge-renderer" is="dom-if"></template>
<span aria-label="" class="text layout horizontal center style-scope ytg-live-badge-renderer">4 watching</span>
<template class="style-scope ytg-live-badge-renderer" is="dom-if"></template>
</ytg-live-badge-renderer>
</div>
</div>
Currently, I am trying:
#This part works fine. I can use the unique ID
meta_data = driver.find_element_by_id('video-metadata')
#This part is also fine. Once again, it has an ID.
viewers = meta_data.find_element_by_id('metadata-badges')
print(viewers.text)
However, I have been having trouble getting to the channel name (in this example 'Rico Eeman'; it is under the first nested div tag). Because it's a compound class name, I cannot find the element by class name, and trying the following xpaths doesn't work:
name = meta_data.find_element_by_xpath('/div[@class="channel-info small layout horizontal center style-scope ytg-gaming-video-renderer"]/ytg-formatted-string')
name = meta_data.find_element_by_xpath('/div[1]')
They both raise the element not found error. I am not really sure what to do here. Does anyone have a working solution?
The name is not in the <ytg-formatted-string> tag, it's in one of its descendants. Try
meta_data.find_element_by_css_selector('.style-scope.ytg-formatted-string.x-scope.ytg-nav-endpoint-2 > a')
Or with xpath
meta_data.find_element_by_xpath('//ytg-nav-endpoint[@class="style-scope ytg-formatted-string x-scope ytg-nav-endpoint-2"]/a')
Note that even if your XPath worked, searching within video-metadata would not get all the names: the id is repeated in a div for each user, so you need find_elements and must iterate over the returned elements. The following gets all the names:
names = dr.find_elements_by_css_selector("a.style-scope.ytg-nav-endpoint[href^='/channel/']")
print([name.get_attribute("text") for name in names])
Which gives you:
['NinjaNation Gaming', 'DURX DANIEL', 'DEMON', 'Perfection', 'The one and only jd', 'Violator Games', 'KingLuii718', 'NinjaNation Gaming', 'DURX DANIEL', 'DEMON', 'Perfection']
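One more thing worth checking in the original attempts: an XPath that starts with "/" is evaluated from the document root even when it is called on an element, so paths meant to be relative to meta_data should start with "./" or ".//". Here is a small offline sketch of the "one block per channel, relative lookup inside each block" idea. It uses the standard library's xml.etree.ElementTree instead of Selenium and a trimmed copy of the posted markup, in which the name is the text of <ytg-nav-endpoint> (the live page may nest it in an <a>, as the answers above assume).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question, wrapped in <body>.
HTML = """
<body>
<div class="style-scope ytg-gaming-video-renderer" id="video-metadata">
  <div class="channel-info small layout horizontal center style-scope ytg-gaming-video-renderer">
    <ytg-formatted-string class="style-scope ytg-gaming-video-renderer">
      <ytg-nav-endpoint class="style-scope ytg-formatted-string x-scope ytg-nav-endpoint-2">Rico Eeman
      </ytg-nav-endpoint>
    </ytg-formatted-string>
  </div>
  <div id="metadata-badges" class="small style-scope ytg-gaming-video-renderer">
    <span class="text layout horizontal center style-scope ytg-live-badge-renderer">4 watching</span>
  </div>
</div>
</body>
"""

root = ET.fromstring(HTML.strip())

rows = []
# The id repeats once per channel, so collect every block, not just the first.
for block in root.findall(".//div[@id='video-metadata']"):
    # Both lookups start with "." so they stay inside the current block.
    name = block.find("./div/ytg-formatted-string/ytg-nav-endpoint")
    viewers = block.find(".//div[@id='metadata-badges']/span")
    rows.append((name.text.strip(), viewers.text.strip()))

print(rows)
```

In Selenium the same structure would be find_elements for the blocks and a "./..." XPath on each returned element.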

Python 3.5 + Selenium Scrape. Is there anyway to select <a><a/> tags?

So I'm very new to python and selenium. I'm writing a scraper to take some balances and download a txt file. So far I've managed to grab the account balances, but downloading the txt files has proven to be a difficult task.
This is a sample of the html
<td>
<div id="expoDato_msdd" class="dd noImprimible" style="width: 135px">
<div id="expoDato_title123" class="ddTitle">
<span id="expoDato_arrow" class="arrow" style="background-position: 0pt 0pt"></span>
<span id="expoDato_titletext" class="textTitle">Exportar Datos</span>
</div>
<div id="expoDato_child" class="ddChild" style="width: 133px; z-index: 50">
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=txt">txt</a>
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=pdf">PDF</a>
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=excel">Excel</a>
<a class="modal" href="#info_formatos">Información Formatos</a>
</div>
</div>
I need to click on the first "a" with class=enabled. But I just can't manage to get there by xpath, class or whatever really. Here is the last thing I tried.
#Descarga de Archivos
ddmenu2 = driver.find_element_by_id("expoDato_child")
ddmenu2.find_element_by_css_selector("txt").click()
This is more of the stuff I've already tried
#TXT = driver.select
#TXT.send_keys(Keys.RETURN)
#ddmenu2 = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]")
#Descarga = ddmenu2.find_element_by_visible_text("txt")
#Descarga.send_keys(Keys.RETURN)
Please, I would appreciate your help.
PS: English is not my native language, so I'm sorry for any confusion.
EDIT:
This was the approach that worked. I'll try your other suggestions to make the code neater. Also, it will only work if the mouse pointer is over the browser window; it doesn't matter where.
ddmenu2a = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[1]").click()
ddmenu2b = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]")
ddmenu2c = driver.find_element_by_xpath("/html/body/div[1]/div[1]/div/div/form/table/tbody/tr[2]/td/div[2]/table/tbody/tr/td[4]/div/div[2]/a[1]").click()
Pretty much brute force, but I'm getting to like python scripting.
Or simply use CSS to match on the href:
driver.find_element_by_css_selector("div#expoDato_child a.enabled[href*='txt']")
You can get all anchor elements like this:
a_list = driver.find_elements_by_tag_name('a')
This will return a list of elements. You can click on each element:
for a in a_list:
    a.click()
    driver.back()
Or try XPath for each anchor element:
a1 = driver.find_element_by_xpath('//a[@class="enabled"][1]')
a2 = driver.find_element_by_xpath('//a[@class="enabled"][2]')
a3 = driver.find_element_by_xpath('//a[@class="enabled"][3]')
Please let me know if this was helpful.
You can reach the elements directly by XPath via their text:
driver.find_element_by_xpath("//*[@id='expoDato_child' and contains(., 'txt')]").click()
driver.find_element_by_xpath("//*[@id='expoDato_child' and contains(., 'PDF')]").click()
...
If there is a public link for the page in question that would be helpful.
However, generally, I can think of two methods for this:
If you can discover the direct link, you can extract it and use Python's urllib to download the file directly.
or
Use Selenium's click function and have it click on the link in the page.
A quick search resulted thusly:
downloading-file-using-selenium
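The first method (discovering the direct link) could look roughly like the sketch below. It parses the dropdown markup from the question with the standard library, picks the href whose link text is "txt", and resolves it against a base URL. BASE_URL is a placeholder, since the real site address is not given, and an actual download would most likely also need the logged-in session's cookies, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Placeholder only: the real host is not shown in the question.
BASE_URL = "https://bank.example/"

# The dropdown markup from the question.
HTML = """
<div id="expoDato_child" class="ddChild" style="width: 133px; z-index: 50">
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=txt">txt</a>
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=pdf">PDF</a>
<a class="enabled" href="/CCOLEmpresasCartolaHistoricaWEB/exportarDatos.do;jsessionid=9817239879882871987129837882222R?tipoExportacion=excel">Excel</a>
<a class="modal" href="#info_formatos">Información Formatos</a>
</div>
"""

menu = ET.fromstring(HTML.strip())

# Map link text -> href for the export links only (class="enabled").
links = {a.text: a.get("href") for a in menu.findall("./a[@class='enabled']")}

# Turn the site-relative href into an absolute URL.
txt_url = urljoin(BASE_URL, links["txt"])
print(txt_url)
```

With Selenium, the href would come from element.get_attribute("href") on the matched anchor instead of from a parsed string.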

Webdriver - Locate Input via Label (Python)

How do I locate an input field via its label using webdriver?
I like to test a certain web form which unfortunately uses dynamically generated
ids, so they're unsuitable as identifiers.
Yet, the labels associated with each web element strike me as suitable.
Unfortunately I was not able to do it with the few suggestions
offered on the web. There is one thread here at SO, but which did not
yield an accepted answer:
Selenium WebDriver Java - Clicking on element by label not working on certain labels
To solve this problem in Java, it is commonly suggested to locate the label as an anchor via its text content and then specifying the xpath to the input element:
//label[contains(text(), 'TEXT_TO_FIND')]
I am not sure how to do this in python though.
My web element:
<div class="InputText">
<label for="INPUT">
<span>
LABEL TEXT
</span>
<span id="idd" class="Required" title="required">
*
</span>
</label>
<span class="Text">
<input id="INPUT" class="Text ColouredFocus" type="text" onchange="var wcall=wicketAjaxPost(';jsessionid= ... ;" maxlength="30" name="z1013400259" value=""></input>
</span>
<div class="RequiredLabel"> … </div>
<span> … </span>
</div>
Unfortunately I was not able to use CSS or XPATH expressions
on the site. IDs and names always changed.
The only solution to my problem I found was a dirty one - parsing
the source code of the page and extract the ids by string operations.
Certainly this is not the way webdriver was intended to be used, but
it works robustly.
Code:
lines = []
for line in driver.page_source.splitlines():
    lines.append(line)
    if 'LABEL TEXT 1' in line:
        id_l1 = lines[-2].split('"', 2)[1]
You should start with a div and check that there is a label with an appropriate span inside, then get the input element from the span tag with class Text:
//div[@class='InputText' and contains(label/span, 'TEXT_TO_FIND')]/span[@class='Text']/input
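Another route that avoids depending on the div layout is to follow the label's "for" attribute, which names the id of the input it belongs to, whatever that generated id happens to be. Below is an offline sketch of that idea using the standard library; input_for_label is a hypothetical helper, the markup is the snippet from the question wrapped in a made-up <form> element, and the truncated onchange attribute is left out.

```python
import xml.etree.ElementTree as ET

# Snippet from the question (onchange attribute omitted), wrapped in <form>.
HTML = """
<form>
<div class="InputText">
  <label for="INPUT">
    <span>LABEL TEXT</span>
    <span id="idd" class="Required" title="required">*</span>
  </label>
  <span class="Text">
    <input id="INPUT" class="Text ColouredFocus" type="text" maxlength="30" name="z1013400259" value=""></input>
  </span>
</div>
</form>
"""


def input_for_label(root, label_text):
    """Return the <input> whose id matches the 'for' of the matching label."""
    for label in root.iter("label"):
        # itertext() gathers the text of the nested spans as well.
        if label_text in "".join(label.itertext()):
            target_id = label.get("for")
            return root.find(f".//input[@id='{target_id}']")
    return None


root = ET.fromstring(HTML.strip())
field = input_for_label(root, "LABEL TEXT")
print(field.get("name"))
```

In Selenium the same two steps would be label.get_attribute("for") on the label element, followed by an id lookup with the returned value.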
