python selenium data-style-name - python

So there's a bit of html that looks like this
<a class="" data-style-name="Black" data-style-id="16360" "true" data-description="null"<img width="32" height="32"
and I was wondering if I could get the text "Black" out of it and then click it, but there's no class name to loop through and the xpath doesn't return anything

data-style-name is an attribute of your a element, and "Black" is its value.
Here is a way to access an attribute's value with Selenium and Python:
elements = driver.find_elements_by_xpath("//a[@data-style-name]")
for element in elements:
    print(element.get_attribute("data-style-name"))
If you want to select only the elements whose data-style-name attribute has the value "Black":
driver.find_elements_by_xpath("//a[@data-style-name='Black']")
More about xpath: https://www.w3.org/TR/xpath/#section-Introduction
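The attribute lookup can also be combined with the click the question asks for. The following is a minimal sketch, assuming Selenium 4, where locators are passed as find_element(By.XPATH, ...). The helper names style_xpath and click_style are made up for illustration, and driver is assumed to be an already-started WebDriver.

```python
def style_xpath(name):
    """Build an XPath for <a> elements whose data-style-name equals `name`.

    Assumes `name` contains no single quote, because it is embedded in a
    single-quoted XPath string literal.
    """
    return f"//a[@data-style-name='{name}']"


def click_style(driver, name):
    """Find the first matching link, click it, and return its attribute value."""
    # Imported lazily so style_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    link = driver.find_element(By.XPATH, style_xpath(name))
    # Read the attribute before clicking, since the click may change the page.
    value = link.get_attribute("data-style-name")
    link.click()
    return value
```

With this sketch, click_style(driver, "Black") would return "Black" when such a link exists; when it does not, find_element raises NoSuchElementException.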

Have you tried find_element_by_xpath()?
a_check = browser.find_element_by_xpath("/html/body/a[@data-style-name='Black']")
Which returns:
<selenium.webdriver.remote.webelement.WebElement (session="6c94ac24e0ec3a3320ec21b24055f4fa", element="0.1043557711542944-1")>

Related

Xpath to locate element using text() and @class as conditions

I am trying to automate adding items to cart in online shop, however, I got stuck on a loop that should differentiate whether item is available or not.
Here's the loop:
while True:
    #if ???
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='" + size.get() + "']"))).click()
        sleep(1)
        WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[text()='Add to cart']"))).click()
        sleep(1)
        print("Success!")
        break
    else:
        driver.refresh()
        sleep(3)
If the size is available, button is active:
<div class="styles__ArticleSizeItemWrapper-sc-dt4c4z-4 eQqdpu">
<button aria-checked="false" role="radio" class="styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs">
<span class="styles__StyledText-sc-cia9rt-0 styles__StyledText-sc-1n1fwgw-2 styles__ArticleSizeItemTitle-sc-1n1fwgw-3 gnSCRf cLhSqA bipwfD">XL</span>
<span class="styles__StyledText-sc-cia9rt-0 ffGzxX">
</span>
</button>
</div>
If not, button is inactive:
<div class="styles__ArticleSizeItemWrapper-sc-dt4c4z-4 eQqdpu">
<button disabled="" aria-checked="false" role="radio" class="styles__ArticleSizeButton-sc-1n1fwgw-0 fBeTLI">
<span class="styles__StyledText-sc-cia9rt-0 styles__StyledText-sc-1n1fwgw-2 styles__ArticleSizeItemTitle-sc-1n1fwgw-3 gnSCRf cLhSqA bipwfD">XXL</span>
<span class="styles__StyledText-sc-cia9rt-0 styles__StyledText-sc-1n1fwgw-2 kQJTJc cLhSqA">
</span>
</button>
</div>
The question is: what should be the condition for this loop?
I have tried something like this:
if (driver.find_elements(By.XPATH, "//*[contains(@class='styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and text()='" + e2.get() + "']")):
EDIT: Replaced "=" with "," in the above code as follows:
if (driver.find_elements(By.XPATH, "//*[contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and text()='" + e2.get() + "']")):
but I keep getting invalid xpath expression error.
EDIT: The error is gone, but the browser keeps refreshing with the else statement (element not found).
I believe your error is in the use of the contains function, which expects two parameters, a string and a substring, but you're passing it a single boolean expression (@class='styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs').
I expect this is just a typo and you actually meant to type contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') (NB comma instead of an equals sign after @class).
Also, you are looking for a button element which has a child text node (text() refers to a text node) which is equal to the size you're looking for, but that text node is actually a child of a span which is a child of the button. You can compare your size to the text value of that span.
Try something like this:
"//*[contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and span='"
+ e2.get()
+ "']"
Try looking for a button that contains that class, is not disabled, and has a span with that text:
e3 = "Some value"
x = f"//button[contains(@class,'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') and not(@disabled) and ./span[contains(text(),'{e3}')]]"
print(x)
I managed to get it working using this condition:
if (driver.find_elements(By.XPATH,
        "//*[contains(@class, 'styles__ArticleSizeButton-sc-1n1fwgw-0 jIVZOs') "
        "and .//*[text()='" + e2.get() + "']]")):
It is quite similar to the original approach; adding .//* before text() did the trick.
Without .//*, text() was tested against the button's own text nodes, so nothing matched. .//* applies the test to the button's descendant elements, which is where the size text actually lives (inside the span).
Important: the text condition is wrapped in its own [] brackets.
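The refresh-until-available loop from the question can be sketched without relying on the generated class names. This is only an illustration under a few assumptions: Selenium 4 is used, the size buttons keep the role="radio" and disabled attributes shown in the HTML above, and the names enabled_size_xpath, wait_for_size, max_refreshes and pause are hypothetical.

```python
def enabled_size_xpath(size):
    """XPath for a size button that is not disabled and whose span reads `size`.

    Assumes `size` contains no single quote.
    """
    return (
        "//button[@role='radio' and not(@disabled) "
        f"and .//span[normalize-space()='{size}']]"
    )


def wait_for_size(driver, size, max_refreshes=10, pause=3.0):
    """Refresh the page until the size button is enabled, then click it.

    Returns True if the button was clicked, False if it never became enabled.
    """
    # Imported lazily so enabled_size_xpath() works without Selenium installed.
    from time import sleep
    from selenium.webdriver.common.by import By

    for _ in range(max_refreshes):
        # find_elements returns an empty list instead of raising when nothing matches.
        buttons = driver.find_elements(By.XPATH, enabled_size_xpath(size))
        if buttons:
            buttons[0].click()
            return True
        driver.refresh()
        sleep(pause)
    return False
```

Checking not(@disabled) rather than the jIVZOs class is a design choice: the hashed class names differ between the enabled and disabled buttons in the sample HTML and may change between site builds, while the disabled attribute states the condition directly.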

Fetch href from class - selenium python

Tried extracting the href from:
<a lang="en" class="new class" href="/abc/stack.com"
tabindex="-1" data-type="itemTitles"><span><mark>Scott</mark>, CC042<br></span></a>
using elems = driver.find_elements_by_css_selector(".new class [href]") , but doesn't seem to work.
Also tried Python Selenium - get href value, but it returned an empty list.
So I want to extract all the href elements of class = "new class" as mentioned above and append them in a list
Thanks!!
Use .get_attribute('href').
by_css_selector:
elems = driver.find_elements_by_css_selector('.new.class')
for elem in elems:
    print(elem.get_attribute('href'))
Or by_xpath:
elems = driver.find_elements_by_xpath('//a[@class="new class"]')
Just change it to
elems = driver.find_elements_by_css_selector(".new.class[href]")
OR
elems = driver.find_elements_by_css_selector("[class='new class'][href]")

(Beautiful Soup) Get data inside a button tag

I'm trying to scrape an imageId from inside a button tag, and want to get the result:
"25511e1fd64e99acd991a22d6c2d6b6c".
When I try:
drawing_url = drawing_url.find_all('button', class_='inspectBut')['onclick']
it doesn't work, giving this error:
TypeError: list indices must be integers or slices, not str
Input =
for article in soup.find_all('div', class_='dojoxGridRow'):
    drawing_url = article.find('td', class_='dojoxGridCell', idx='3')
    drawing_url = drawing_url.find_all('button', class_='inspectBut')
    if drawing_url:
        for e in drawing_url:
            print(e)
Output =
<button class="inspectBut" href="#"
onclick="window.open('getImg?imageId=25511e1fd64e99acd991a22d6c2d6b6c&
timestamp=1552011572288','_blank', 'toolbar=0,
menubar=0, modal=yes, scrollbars=1, resizable=1,
height='+$(window).height()+', width='+$(window).width())"
title="Open Image" type="button">
</button>
...
...
Try this one.
import re
# for all the buttons
btn_onclick_list = [a.get('onclick') for a in soup.find_all('button')]
for click in btn_onclick_list:
    if click:  # skip buttons that have no onclick attribute
        a = re.findall(r"imageId=(\w+)", click)[0]
        print(a)
You first need to check whether the attribute is present.
tag.attrs returns a dictionary of the attributes present on the current tag.
Consider the following Code.
Code:
from bs4 import BeautifulSoup
a = """
<td>
<button class='hi' onclick="This Data"></button>
<button class='hi' onclick="This Second"></button>
</td>"""
soup = BeautifulSoup(a, 'lxml')
print([btn['onclick'] for btn in soup.find_all('button', class_='hi') if 'onclick' in btn.attrs])
Output:
['This Data', 'This Second']
or you can simply do this
[btn['onclick'] for btn in soup.find_all('button', attrs={'class' : 'hi', 'onclick' : True})]
You should be searching for
button_list = soup.find_all('button', {'class': 'inspectBut'})
That will give you the list of buttons, and you can then get the onclick value of each one with
[button['onclick'] for button in button_list]
You will still need to do some parsing to pull the imageId out of that string, but I hope this gets you on the right track.
Your mistake was indexing the list returned by find_all with a string: the attribute has to be read from each button in the list, and the imageId sits inside the onclick value rather than in an attribute of its own.
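The parsing step the answers mention can be sketched as follows. This is an illustration, not the only way to do it: extract_image_id and image_ids are hypothetical helper names, ONCLICK is a shortened copy of the onclick value from the question, and soup is assumed to be an already-parsed BeautifulSoup object.

```python
import re

# Shortened copy of the onclick value shown in the question.
ONCLICK = ("window.open('getImg?imageId=25511e1fd64e99acd991a22d6c2d6b6c&"
           "timestamp=1552011572288','_blank', 'toolbar=0')")


def extract_image_id(onclick):
    """Return the imageId value found in an onclick string, or None."""
    if not onclick:  # attribute missing (None) or empty
        return None
    match = re.search(r"imageId=(\w+)", onclick)
    return match.group(1) if match else None


def image_ids(soup):
    """Collect the imageIds of every button with class inspectBut."""
    buttons = soup.find_all('button', class_='inspectBut')
    ids = (extract_image_id(btn.get('onclick')) for btn in buttons)
    return [i for i in ids if i is not None]


print(extract_image_id(ONCLICK))  # 25511e1fd64e99acd991a22d6c2d6b6c
```

Using btn.get('onclick') instead of btn['onclick'] means a button without the attribute yields None rather than raising KeyError.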

Selenium: Use Multiple Strings in find_element_by_partial_link_text()

I want to click a link that contains either the partial strings foo OR bar in the link text.
Something like:
elem = driver.find_element_by_partial_link_text(['foo','bar']).click()
or, if it worked in a str.contains("foo|bar") style:
elem = driver.find_element_by_partial_link_text('foo|bar').click()
What's the right way to do this?
EDIT
Example HTML:
<a class="noprint" href="/Docs/Doc?request=62391270&eCode=0XrIMF9p%2BMKSvdpdpqC5Nd3VFn4fB1eLXC3X0yHiYptOxprT0N%2BtjAu0%3D" target="_blank" type="submit">foo</a>
Or one with bar
<a class="noprint" href="/DocView/Doc?request=62391270&eCode=CWJ1stkSu3qFZ1coGTEsM8ka4xqU0XrIMF9p%2BfB1eLXC3wh4xPFQnYwOqX0yHiYptOxprT0N%2BtjAu0%3D" target="_blank" type="submit">bar</a>
EDIT2 The final working code was:
elem = driver.find_element_by_xpath("//a[contains(text(), 'foo') or contains(text(), 'bar')]").click()
If you don't want to write N conditional lines, you can use the or operator in an XPath expression.
Example: //a[contains(text(), 'aaaaa') or contains(text(), 'bbbbb')]
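For more than two alternatives, the or-expression can be generated from a list. A minimal sketch, where partial_text_xpath is a made-up helper name and the list is assumed to be non-empty:

```python
def partial_text_xpath(parts, tag="a"):
    """Build an XPath matching `tag` elements whose text contains any of `parts`.

    Assumes `parts` is non-empty and that no part contains a single quote.
    """
    conditions = " or ".join(f"contains(text(), '{p}')" for p in parts)
    return f"//{tag}[{conditions}]"


print(partial_text_xpath(["foo", "bar"]))
# //a[contains(text(), 'foo') or contains(text(), 'bar')]
```

The resulting string can be passed to the same XPath lookup used in the question's final working code.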
I can get around it in the short term by doing this but it seems like a crappy solution:
try:
    elem = driver.find_element_by_partial_link_text('foo').click()
except:
    elem = driver.find_element_by_partial_link_text('bar').click()

Python print scraped data with beautifulsoup without tags

<div class="number" title="Player number">1211</div>
<div class="shirt" title="sName">Ronaldo 1211</div>
I'm scraping a website. I've managed to print out the div element. Here is my code:
web = urllib2.urlopen("WEBSITE")
soupit = BeautifulSoup(web, 'html.parser')
scrapeme = soupit.findAll("div", { "class" : "number" })
print scrapeme
It prints out:
<div class="number" title="Player number">1211</div>
I want it to print just the 1211. How can I do it?
The get_text() method of a BeautifulSoup tag does exactly that. Since findAll returns a list, call it on an item of the list:
print(scrapeme[0].get_text())
Once you have your list of elements, scrapeme, you can loop through each element in the list and print its text attribute using:
for element in scrapeme:
    print(element.text)
Since in your example the scrape only generates a list scrapeme containing one element, the output in this case will just be:
1211
