Stale elements not getting resolved after waits, python - python

I am trying to go into every link on this page with the class of "course"
<a name="hrvatski-jezik" href="/pregled/predmet/29812177240/1971997880"><div class="course">
Hrvatski jezik <br>
<span class="course-info">Tamara Čer</span>
</div>
</a>
<a name="likovna-kultura" href="/pregled/predmet/29812176230/1971998890">
<div class="course">Likovna kultura <br>
<span class="course-info">Mia Marušić</span>
</div>
</a>
<a name="glazbena-kultura" href="/pregled/predmet/29812175220/1971999900">
<div class="course">Glazbena kultura <br>
<span class="course-info">Danijel Služek</span>
</div>
</a>
<a name="engleski-jezik" href="/pregled/predmet/29820696590/1972511970">
<div class="course">Engleski jezik <br>
<span class="course-info">Nevena Genčić</span>
</div>
</a>
<a name="matematika" href="/pregled/predmet/29812174210/1972000910">
<div class="course">Matematika <br>
<span class="course-info">Ivan Tomljanović</span>
</div></a>
<a name="biologija" href="/pregled/predmet/29812173200/1972001920">
<div class="course">Biologija <br>
<span class="course-info">Antonija Milić</span>
</div>
</a>
<a name="kemija" href="/pregled/predmet/29812172190/1972002930">
<div class="course">Kemija <br>
<span class="course-info">Antonija Milić</span>
</div>
</a>
<a name="fizika" href="/pregled/predmet/29812171180/1972003940">
<div class="course">Fizika <br>
<span class="course-info">Ivan Kunac</span>
</div>
</a>
<a name="povijest" href="/pregled/predmet/29812170170/1972004950">
<div class="course">Povijest <br>
<span class="course-info">Lovorka Krajnović Tot</span>
</div>
</a>
<a name="geografija" href="/pregled/predmet/29812169160/1972005960">
<div class="course">Geografija <br>
<span class="course-info">Sunčica Podolski <strong> (na zamjeni)</strong>, Oliver Timarac</span>
</div>
</a>
<a name="tehnicka-kultura" href="/pregled/predmet/29812168150/1972006970">
<div class="course">Tehnička kultura <br>
<span class="course-info">Ivan Dorotek</span>
</div>
</a>
<a name="tjelesna-i-zdravstvena-kultura" href="/pregled/predmet/29812167140/1972007980">
<div class="course">Tjelesna i zdravstvena kultura <br>
<span class="course-info">Davor Marković, Tomislav Ruskaj</span>
</div>
</a>
<a name="informatika" href="/pregled/predmet/29821462170/1972568530">
<div class="course">Informatika (izborni) <br>
<span class="course-info">Blaženka Knežević</span>
</div>
</a>
<a name="njemacki-jezik" href="/pregled/predmet/32658461270/1972646300"><div class="course">Njemački jezik (izborni) <br>
<span class="course-info">Zdravka Marković Boto</span>
</div>
</a>
<a name="rusinski-jezik-i-kultura" href="/pregled/predmet/32658491570/1972675590">
<div class="course">Rusinski jezik i kultura (izborni) <br>
<span class="course-info">Natalija Hnatko, Ilona Hrecešin</span>
</div>
</a>
<a name="sat-razrednika" href="/pregled/predmet/32322897860/2140793120">
<div class="course">Sat razrednika <br>
<span class="course-info">Blaženka Knežević</span>
</div>
</a>
<a name="izvannastavne-aktivnosti" href="/pregled/predmet/34285616720/2324344460">
<div class="course">Izvannastavne aktivnosti (izvannastavna aktivnost) <br>
<span class="course-info">Nevena Genčić, Ivan Kunac, Davor Marković, Josip Matezović, Antonija Milić, Tomislav Ruskaj, Danijel Služek</span>
</div>
</a>
I expect the code to go into every link, then go back and repeat.
It goes once into the try block and then 16 times into the except block.
For every except it gives StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
My code:
def get_subject():
    subjects = driver.find_elements_by_xpath("//div[@class='course']")
    for subject in subjects:
        actions = ActionChains(driver)
        actions.move_to_element(subject)
        try:
            actions.click()
            actions.perform()
            driver.back()
            print("try")
            time.sleep(3)
        except Exception as e:
            subjects = driver.find_elements_by_xpath("//div[@class='course']")
            print("except")
            print(e)
I know this is a very common problem. I tried implicit and explicit waits, I still got the same error.
I tried visibility_of_element_located, presence_of_element_located, staleness_of, I tried defining "subjects" again.
Help would be really appreciated, I've been searching for a solution for some time now.

I would suggest capturing all the links first and then iterating over them.
alllinks = [link.get_attribute('href') for link in driver.find_elements_by_css_selector("a[href^='/pregled/predmet']")]
for link in alllinks:
    driver.get(link)
    # Perform your operation
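The reason this works is that the hrefs are plain strings, so they cannot go stale when the page changes. Here is a minimal offline sketch of the same idea using only the standard library instead of Selenium. The base URL, the trimmed page fragment and the extra "/odjava" link are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical base URL; the real site address is not given in the question.
BASE_URL = "https://example.invalid/"

# Trimmed copy of the question's markup plus one invented non-course link.
PAGE = """
<a name="hrvatski-jezik" href="/pregled/predmet/29812177240/1971997880"><div class="course">Hrvatski jezik</div></a>
<a name="matematika" href="/pregled/predmet/29812174210/1972000910"><div class="course">Matematika</div></a>
<a href="/odjava">Odjava</a>
"""


class CourseLinkParser(HTMLParser):
    """Collects href values of <a> tags that start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and href.startswith(self.prefix):
            self.hrefs.append(href)


def collect_course_urls(page, base_url, prefix="/pregled/predmet"):
    parser = CourseLinkParser(prefix)
    parser.feed(page)
    # Strings stay valid no matter how often the browser navigates afterwards.
    return [urljoin(base_url, href) for href in parser.hrefs]


course_urls = collect_course_urls(PAGE, BASE_URL)
for url in course_urls:
    print(url)
```

With Selenium the loop body would be driver.get(url) instead of print(url).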
If you want to continue with your own code, just re-locate the elements on each iteration, since driver.back() reloads the page and invalidates the old references.
def get_subject():
    subjects = driver.find_elements_by_xpath("//div[@class='course']")
    for subject in range(len(subjects)):
        subjects = driver.find_elements_by_xpath("//div[@class='course']")
        actions = ActionChains(driver)
        actions.move_to_element(subjects[subject])
        try:
            actions.click()
            actions.perform()
            driver.back()
            print("try")
            time.sleep(3)
        except Exception as e:
            subjects = driver.find_elements_by_xpath("//div[@class='course']")
            print("except")
            print(e)

actions.move_to_element(subject) is the culprit here.
All the element references in subjects become stale once you click a subject in your try block and the page navigates; that's why you are getting the StaleElementReferenceException.
Try changing the code as shown below.
def get_subject():
    subjects = len(driver.find_elements_by_xpath("//div[@class='course']"))
    for counter in range(subjects):
        # XPath positions are 1-based, and find_element (singular) returns one element
        subject = driver.find_element_by_xpath("(//div[@class='course'])[" + str(counter + 1) + "]")
        actions = ActionChains(driver)
        actions.move_to_element(subject)
        try:
            actions.click()
            actions.perform()
            driver.back()
            print("try")
            time.sleep(3)
        except Exception as e:
            print("except")
            print(e)
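One detail worth spelling out when building an indexed XPath from a Python counter: XPath positions start at 1, while range() starts at 0. The sketch below shows the offset with the standard library's ElementTree on a two-item stand-in for the page. The helper name nth_match_xpath is my own, not part of Selenium.

```python
import xml.etree.ElementTree as ET


def nth_match_xpath(base_xpath, zero_based_index):
    """Build '(base)[n]' where n is the 1-based XPath position."""
    return f"({base_xpath})[{zero_based_index + 1}]"


# Two-item stand-in for the course list in the question.
page = ET.fromstring(
    "<body>"
    "<div class='course'>Hrvatski jezik</div>"
    "<div class='course'>Likovna kultura</div>"
    "</body>"
)

# ElementTree also uses 1-based positions: [1] is the first match.
first = page.find("./div[1]")
second = page.find("./div[2]")
print(first.text, "|", second.text)

# The strings a zero-based loop counter should produce for Selenium.
for counter in range(2):
    print(nth_match_xpath("//div[@class='course']", counter))
```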

driver.back() is not guaranteed to work. Instead, try driver.execute_script("window.history.go(-1)") and then re-locate the elements inside your loop.

Related

Confirm preceding and following sibling in XPATH

I've got the statement below to check that 2 conditions exist in an element:
if len(driver.find_elements(By.XPATH, "//span[text()='$400.00']/../following-sibling::div/a[text()='Buy']")) > 0:
    elem = driver.find_element(By.XPATH, "//span[text()='$400.00']/../following-sibling::div/a[text()='Buy']")
I've tried a few variations, including "preceding sibling::span[text()='x'", but I can't get the syntax right, and I'm not sure I'm going about it the right way.
HTML is below. The current find_elements(By.XPATH...) correctly finds the "Total" and "Buy" classes; I would also like to add $20.00 in the "Price" class as a condition.
<ul>
<li class="List">
<div class="List-Content row">
<div class="Price">"$20.00"</div>
<div class="Quantity">10</div>
<div class="Change">0%</div>
<div class="Total">
<span>$400.00</span>
</div>
<div class="Buy">
<a class="Button">Buy</a>
</div>
</div>
</li>
</ul>
Using the built-in ElementTree:
import xml.etree.ElementTree as ET
html = '''<li class="List">
<div class="List-Content row">
<div class="Price">"$20.00"</div>
<div class="Quantity">10</div>
<div class="Change">0%</div>
<div class="Total"><span>$400.00</span></div>
<div class="Buy"><a class="Button">Buy</a></div>
</div>
<div class="List-Content row">
<div class="Price">"$27.00"</div>
<div class="Quantity">10</div>
<div class="Change">0%</div>
<div class="Total"><span>$400.00</span></div>
<div class="Buy"><a class="Button">Buy</a></div>
</div>
</li>'''
items = {'Total':'$400.00','Buy':'Buy','Price':'"$20.00"'}
root = ET.fromstring(html)
first_level_divs = root.findall('div')
for first_level_div in first_level_divs:
    results = {}
    for k,v in items.items():
        # look up the child div by class; the value may sit one level down (span / a)
        div = first_level_div.find(f'./div[@class="{k}"]')
        one_level_down = len(list(div)) > 0
        results[k] = list(div)[0].text if one_level_down else div.text
    if results == items:
        print('found')
    else:
        print('not found')
output
found
not found
Given this HTML snippet
<ul>
<li class="List">
<div class="List-Content row">
<div class="Price">"$20.00"</div>
<div class="Quantity">10</div>
<div class="Change">0%</div>
<div class="Total"><span>$400.00</span></div>
<div class="Buy"><a class="Button">Buy</a></div>
</div>
</li>
</ul>
I would use this XPath:
buy_buttons = driver.find_elements(By.XPATH, """//div[
    contains(@class, 'List-Content')
    and div[@class = 'Price'] = '$20.00'
    and div[@class = 'Total'] = '$400.00'
]//a[. = 'Buy']""")
for buy_button in buy_buttons:
    print(buy_button)
The for loop replaces your if len(buy_buttons) > 0 check. It won't run when there are no results, so the if is superfluous.
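To try the "both conditions in the same row" logic without a browser, here is a rough sketch with the standard library's ElementTree. It is not the Selenium XPath above: ElementTree supports only a subset of XPath, so the two conditions are checked in Python. The helper names are my own. I also strip the literal double quotes that the snippet shows around the Price text, which is an assumption about how that value should be compared.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<ul>
  <li class="List">
    <div class="List-Content row">
      <div class="Price">"$20.00"</div>
      <div class="Quantity">10</div>
      <div class="Change">0%</div>
      <div class="Total"><span>$400.00</span></div>
      <div class="Buy"><a class="Button">Buy</a></div>
    </div>
  </li>
</ul>"""


def cell_text(row, css_class):
    """Text of the child div with the given class, including nested tags."""
    cell = row.find(f"./div[@class='{css_class}']")
    if cell is None:
        return None
    # Assumption: surrounding double quotes are noise and can be dropped.
    return "".join(cell.itertext()).strip().strip('"')


def find_buy_links(root, price, total):
    """Return the Buy links of rows whose Price and Total both match."""
    links = []
    for row in root.iter("div"):
        if "List-Content" not in row.get("class", "").split():
            continue
        if cell_text(row, "Price") == price and cell_text(row, "Total") == total:
            links.extend(row.findall("./div[@class='Buy']/a"))
    return links


root = ET.fromstring(SNIPPET)
matches = find_buy_links(root, "$20.00", "$400.00")
for link in matches:  # runs zero times when nothing matches
    print(link.text)
```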

detecting the presence of text with BeautifulSoup

I am trying to check for the presence of text on a certain page (if you have previously sent a message, the text appears in this zone; otherwise it is blank).
html = urlopen(single_link)
parsed = BeautifulSoup.BeautifulSoup(html, 'html.parser')
lastmessages = parsed.find('div', attrs={'id': 'message-box'})
if lastmessages:
    print('Already Reached')
else:
    print('you can write your message')
<div class="lastMessage">
<div class="mine messages">
<div class="message last" id="msg-snd-1601248710299" dir="auto">
Hello, and how are you ?
</div>
<div style="clear : both ;"></div>
<div class="msg-status" id="msg-status-1601248710299" dir="rtl">
<div class="send-state">
Last message :
<span class="r2">before 7:35 </span>
</div>
<div class="read-state">
<span style="color : gray ;"> – </span>
Reading :
<span class="r2">Not yet</span>
</div>
</div>
<div style="clear : both ;"></div>
</div>
</div>
My problem is that I don't know how to check whether the text "Hello, and how are you ?" exists or not.
Simple solution
import bs4
parsed = bs4.BeautifulSoup(html, 'html.parser')
lastmessages = parsed.find('div', class_='message last')
if lastmessages:
    print(lastmessages.text.strip())
else:
    print('No message')
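If bs4 is not available, roughly the same check can be sketched with the standard library's html.parser. This swaps in a different technique from the answer above: it collects the text inside any div whose class list contains both "message" and "last". The class and function names are my own.

```python
from html.parser import HTMLParser


class MessageFinder(HTMLParser):
    """Collects text inside divs whose classes include 'message' and 'last'."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while we are inside a matching div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested div inside the message
        elif "message" in classes and "last" in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def last_message(html):
    """Return the message text, or None when the zone is blank."""
    finder = MessageFinder()
    finder.feed(html)
    text = " ".join("".join(finder.chunks).split())
    return text or None


sent_page = """<div class="lastMessage"><div class="mine messages">
<div class="message last" id="msg-snd-1601248710299" dir="auto">
Hello, and how are you ?
</div>
<div style="clear : both ;"></div>
</div></div>"""
blank_page = '<div class="lastMessage"></div>'

print(last_message(sent_page))
print(last_message(blank_page))
```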

Web scraping with Selenium and Xpath

I’m new to XPath. I’m trying to scrape a stock website to get the name and value of each element.
In my Python Selenium script I’ve embedded the main part of the web page locally in html_content, as follows.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException
dirinstall = "C:\\Program Files (x86)\\www\\mm\\"
chrome_driver = dirinstall+"\\Webdriver\\chromedriver.exe"
options = Options()
driver = webdriver.Chrome(chrome_driver, options=options)
html_content = """
<html class="ng-scope">
<head data-meta-tags="">
<title> Stock NYSE </title>
<ui-layout class="ng-isolate-scope">
<div data-ng-include="" src="layoutCtrl.template" class="ng-scope">
<app-root class="ng-scope" _nghost-rqp-c0="" ng-version="8.2.14"></app-root>
<div ng-class="{'demo-mode': $root.session.user.portfolio.account.type === 'Demo' }" class="ng-scope">
<div ng-view="" ng-class="layoutCtrl.isBannerShown ? 'banner-shown' : ''" class="main-app-view ng-scope" role="main">
<et-discovery-markets-results class="ng-scope" _nghost-rqp-c42="" ng-version="8.2.14">
<div _ngcontent-rqp-c42="" class="discover main-content no-footer" ui-fun-scroll="{'class': 'minimize', 'classEl': '.user-head-wrapper, .table-discover', 'scrollContainer': '.table-discover', 'setClassAtScroll': 200 }">
<div _ngcontent-rqp-c42="" automation-id="discover-market-results-wrapp" class="table-discover markets-table">
<et-discovery-markets-results-list _ngcontent-rqp-c42="" automation-id="discover-market-results-sub-view-list" _nghost-rqp-c44="" class="ng-star-inserted">
<div _ngcontent-rqp-c44="" class="market-list list-view" data-etoro-locale-ns="discoverMarketResultsList">
<et-instrument-mobile-row _ngcontent-rqp-c44="" automation-id="discover-market-results-row" _nghost-rqp-c18="" class="ng-star-inserted">
<et-instrument-trading-mobile-row _ngcontent-rqp-c18="" automation-id="watchlist-grid-instruments-list" _nghost-rqp-c47="" class="ng-star-inserted">
<div _ngcontent-rqp-c47="" class="row-wrap">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-list-wrapp-instrument" class="instrument-cell name-cell">
<div _ngcontent-rqp-c47="" class="avatar-img-wrap"> </div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-wrapp-instrument-info" class="avatar-info">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-name" class="symbol">A</div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-full-name" class="name positive"> 0.68 (0.90%) </div>
</div>
</div>
<et-buy-sell-buttons _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-buy-sell-container" class="instrument-cell buy-sell-buttons" _nghost-rqp-c24="">
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label positive-change" automation-id="buy-sell-button-container-sell">
<div _ngcontent-rqp-c27="" class="trade-button-title">S</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">75.<span class="after-decimal">85</span></div>
</div>
</et-buy-sell-button>
<div _ngcontent-rqp-c24="" class="space-gap"></div>
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label negative-change" automation-id="buy-sell-button-container-buy">
<div _ngcontent-rqp-c27="" class="trade-button-title">B</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">76.<span class="after-decimal">03</span></div>
</div>
</et-buy-sell-button>
</et-buy-sell-buttons>
</div>
<et-trade-item-card-action _ngcontent-rqp-c18="" _nghost-rqp-c15="">
</et-trade-item-card-action>
</et-instrument-trading-mobile-row>
</et-instrument-mobile-row>
<et-instrument-mobile-row _ngcontent-rqp-c44="" automation-id="discover-market-results-row" _nghost-rqp-c18="" class="ng-star-inserted">
<et-instrument-trading-mobile-row _ngcontent-rqp-c18="" automation-id="watchlist-grid-instruments-list" _nghost-rqp-c47="" class="ng-star-inserted">
<div _ngcontent-rqp-c47="" class="row-wrap">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-list-wrapp-instrument" class="instrument-cell name-cell">
<div _ngcontent-rqp-c47="" class="avatar-img-wrap"> </div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-wrapp-instrument-info" class="avatar-info">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-name" class="symbol">AA</div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-full-name" class="name negative"> -0.11 (-1.46%) </div>
</div>
</div>
<et-buy-sell-buttons _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-buy-sell-container" class="instrument-cell buy-sell-buttons" _nghost-rqp-c24="">
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label negative-change" automation-id="buy-sell-button-container-sell">
<div _ngcontent-rqp-c27="" class="trade-button-title">S</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">7.<span class="after-decimal">44</span></div>
</div>
</et-buy-sell-button>
<div _ngcontent-rqp-c24="" class="space-gap"></div>
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label negative-change" automation-id="buy-sell-button-container-buy">
<div _ngcontent-rqp-c27="" class="trade-button-title">B</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">7.<span class="after-decimal">47</span></div>
</div>
</et-buy-sell-button>
</et-buy-sell-buttons>
</div>
<et-trade-item-card-action _ngcontent-rqp-c18="" _nghost-rqp-c15="">
</et-trade-item-card-action>
</et-instrument-trading-mobile-row>
</et-instrument-mobile-row>
</div>
</et-discovery-markets-results-list>
</div>
</div>
</et-discovery-markets-results>
</div>
</div>
</div>
</ui-layout>
</body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
#results = driver.find_elements_by_xpath("//*[@class='ng-star-inserted']")
results = driver.find_elements_by_xpath("//*[et-instrument-mobile-row and @class='ng-star-inserted']")
print('Number of results', len(results))
I don’t know why if I search ‘et-instrument-mobile-row’ I get only 1 element instead of 2, and if I search both ‘et-instrument-mobile-row’ and 'ng-star-inserted' I get 0 elements.
Looking at the example my goal is to get the symbol and current value of buy/sell (price and after-decimal).
Something like:
[A, 75.85, 76.03]
[AA, 7.44, 7.47]
Could anyone help me? Thanks!
It looks like you may have some malformed HTML and Selenium is unsure how to parse it. I noticed this line:
<div _ngcontent-rqp-c47="" class="avatar-img-wrap"><img _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-avatar" class="avatar-img" src="https://etoro-cdn.etorostatic.com/market-avatars/a/150x150.png" alt="Agilent Technologies Inc">
This <img> tag is unclosed. You can see that the syntax highlighting also gets confused here too.
Otherwise, the XPath you are searching by looks generally well formed.
Edit: Looked at it closer. Your element name should be where the * is.
Here is your XPath:
"//et-instrument-mobile-row[@class='ng-star-inserted']"
Edit 2: The asker had an additional question about how to search within what they found with the XPath above.
To find more elements within these elements, note that according to the documentation each Selenium WebElement provides its own find_element methods. You can use those to search further within the elements we just found (be sure to start your XPaths with .// here, as you only want to traverse that specific element's content; the other find_element methods don't have this caveat).
Once you have identified the elements containing the symbols and prices, you can simply reference the text attribute on those elements. Let's look at a simpler example:
<div class="a">
<div class="b" id="1">B</div>
<div class="c" id="2">2</div>
<div class="d" id="3">22</div>
</div>
Suppose we have already found the root div here and stored it in a variable named element. Then:
symbol = element.find_element_by_xpath(".//*[@class='b']").text
integral = element.find_element_by_xpath(".//*[@class='c']").text
fractional = element.find_element_by_xpath(".//*[@class='d']").text
Generally, if you can search by something other than XPath, though, it's easier for everyone involved. Here is a more typical way you could accomplish this with class names:
symbol = element.find_element_by_class_name("b").text
integral = element.find_element_by_class_name("c").text
fractional = element.find_element_by_class_name("d").text
Edit 3: Note from author
After the precious help of @firstbass I dug deeper to get the symbol and the different sell/buy prices as follows:
for element in results:
    symbol = element.find_element_by_xpath(".//*[@class='symbol']").text
    print(str(symbol))
    sell = element.find_element_by_xpath(".//et-buy-sell-buttons//et-buy-sell-button//div[@automation-id='buy-sell-button-container-sell']")
    sell_integral = sell.find_element_by_xpath(".//*[@class='price']").text
    sell_fractional = sell.find_element_by_xpath(".//*[@class='after-decimal']").text
    print(str(sell_integral)+':'+str(sell_fractional))
    buy = element.find_element_by_xpath(".//et-buy-sell-buttons//et-buy-sell-button//div[@automation-id='buy-sell-button-container-buy']")
    buy_integral = buy.find_element_by_xpath(".//*[@class='price']").text
    buy_fractional = buy.find_element_by_xpath(".//*[@class='after-decimal']").text
    print(str(buy_integral)+':'+str(buy_fractional))
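For anyone who wants to try the "search relative to each row" idea without a browser, here is a small sketch using the standard library's ElementTree on a heavily trimmed, well-formed version of the markup. The sample keeps only the tags and attributes the lookups need, and the function names are my own. Joining all text under the price div gives the whole number (integer part plus the after-decimal span) in one step.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the page in the question.
SAMPLE = """<div class="market-list">
  <et-instrument-mobile-row class="ng-star-inserted">
    <div class="symbol">A</div>
    <div automation-id="buy-sell-button-container-sell">
      <div class="price">75.<span class="after-decimal">85</span></div>
    </div>
    <div automation-id="buy-sell-button-container-buy">
      <div class="price">76.<span class="after-decimal">03</span></div>
    </div>
  </et-instrument-mobile-row>
  <et-instrument-mobile-row class="ng-star-inserted">
    <div class="symbol">AA</div>
    <div automation-id="buy-sell-button-container-sell">
      <div class="price">7.<span class="after-decimal">44</span></div>
    </div>
    <div automation-id="buy-sell-button-container-buy">
      <div class="price">7.<span class="after-decimal">47</span></div>
    </div>
  </et-instrument-mobile-row>
</div>"""


def price_for(row, side):
    """Full price text for 'sell' or 'buy', searched only inside this row."""
    container = row.find(
        f".//div[@automation-id='buy-sell-button-container-{side}']"
    )
    price = container.find("./div[@class='price']")
    # itertext() joins '75.' with the nested '85' span.
    return "".join(price.itertext()).strip()


def scrape_rows(markup):
    root = ET.fromstring(markup)
    rows = root.findall(".//et-instrument-mobile-row[@class='ng-star-inserted']")
    return [
        [
            row.find(".//div[@class='symbol']").text.strip(),
            price_for(row, "sell"),
            price_for(row, "buy"),
        ]
        for row in rows
    ]


table = scrape_rows(SAMPLE)
for line in table:
    print(line)
```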

How to extract image/href url from div class using scrapy

I am having a hard time extracting the href URL from the following website code:
<div class="expando expando-uninitialized" style="display: none" data-cachedhtml=" <div class="media-preview" id="media-preview-66hch1" style="max-width: 534px"> <div class="media-preview-content"> <a href="https://i.redd.it/nctvpvsnbpsy.jpg" class="may-blank"> <img class="preview" src="https://i.redditmedia.com/UELqh-mbh5mwnXr67PoBbi23nwZuNl2v3flNbkmewQE.jpg?w=534&amp;s=1426be7f811e5d5043760f8882674070" width="534" height="768"> </a> </div> </div> " data-pin-condition="function() {return this.style.display != 'none';}"><span class="error">loading...</span></div>
You can probably use regular expressions for this. Here is an example:
s = """<div class="expando expando-uninitialized" style="display: none" data-cachedhtml=" <div class="media-preview" id="media-preview-66hch1" style="max-width: 534px"> <div class="media-preview-content"> <a href="https://i.redd.it/nctvpvsnbpsy.jpg" class="may-blank"> <img class="preview" src="https://i.redditmedia.com/UELqh-mbh5mwnXr67PoBbi23nwZuNl2v3flNbkmewQE.jpg?w=534&amp;s=1426be7f811e5d5043760f8882674070" width="534" height="768"> </a> </div> </div> " data-pin-condition="function() {return this.style.display != 'none';}"><span class="error">loading...</span></div>"""
import re
# The markup is shown here with its entities decoded, so match up to the closing quote.
re.search(r'href="([^"]*\.jpg)"', s).group(1)
# 'https://i.redd.it/nctvpvsnbpsy.jpg'
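An alternative to a regex is to parse twice: once to read the data-cachedhtml attribute, and once more to read the href inside it. The sketch below assumes the live page stores the nested markup entity-encoded (&lt; and &quot;) inside that attribute, which is a guess on my part; the standard library's HTMLParser decodes attribute values for you. The shortened sample string and the helper names are my own.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects the values of one attribute on one kind of tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            value = dict(attrs).get(self.attr)
            if value is not None:
                self.values.append(value)


def collect(html, tag, attr):
    collector = AttrCollector(tag, attr)
    collector.feed(html)
    return collector.values


# Shortened sample; assumes the nested HTML is entity-encoded in the attribute.
outer = (
    '<div class="expando expando-uninitialized" data-cachedhtml="'
    "&lt;div class=&quot;media-preview&quot;&gt; "
    "&lt;a href=&quot;https://i.redd.it/nctvpvsnbpsy.jpg&quot; class=&quot;may-blank&quot;&gt; "
    "&lt;/a&gt; &lt;/div&gt;"
    '"><span class="error">loading...</span></div>'
)

links = [
    href
    for inner_html in collect(outer, "div", "data-cachedhtml")  # decoded markup
    for href in collect(inner_html, "a", "href")
]
print(links)
```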

Extracting the right elements by text and span / Beautiful Soup / Python

I'm trying to scrape the following data:
Cuisine: 4.5
Service: 4.0
Quality: 4.5
But I'm having issues scraping the right data. I tried the following two approaches:
for bewertungen in soup.find_all('div', {'class' : 'histogramCommon bubbleHistogram wrap'}):
    if bewertungen.find(text='Cuisine'):
        cuisine = bewertungen.find(text='Cuisine')
        cuisine = cuisine.next_element
        print("test " + str(cuisine))
    if bewertungen.find_all(text='Service'):
        for s_bewertung in bewertungen.find_all('span', {'class':'ui_bubble_rating'}):
            s_speicher = s_bewertung['alt']
In the first if I get no result. In the second if I get the right elements, but I get all 3 results and cannot tell which one belongs to which text (Cuisine, Service, Quality).
Can someone give me advice on how to get the right data?
The HTML code is below.
<div class="histogramCommon bubbleHistogram wrap">
<div class="colTitle">\nGesamtwertung\n</div>
<ul class="barChart">
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Cuisine</span>
</div>
<div class="wrap row part ">
<span alt="4.5 of five" class="ui_bubble_rating bubble_45"></span>
</div>
</div>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Service</span>
</div>
<div class="wrap row part ">
<span alt="4.0 of five" class="ui_bubble_rating bubble_40"></span>
</div>
</div>
</li>
<li>
<div class="ratingRow wrap">
<div class="label part ">
<span class="text">Quality</span>
</div>
<div class="wrap row part "><span alt="4.5 of five" class="ui_bubble_rating bubble_45"></span></div>
</div>
</li>
</ul>
</div>
Try this. According to the snippet you have pasted above, the following code should work:
from bs4 import BeautifulSoup
soup = BeautifulSoup(content,"lxml")
for item in soup.select(".ratingRow"):
    category = item.select_one(".text").text
    rating = item.select_one(".row span")['alt'].split(" ")[0]
    print("{} : {}".format(category,rating))
Another way would be:
for item in soup.select(".ratingRow"):
    category = item.select_one(".text").text
    rating = item.select_one(".text").find_parent().find_next_sibling().select_one("span")['alt'].split(" ")[0]
    print("{} : {}".format(category,rating))
Output:
Cuisine : 4.5
Service : 4.0
Quality : 4.5
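The key point in both versions is to walk row by row, so each label stays paired with the rating next to it. A minimal standard-library sketch of that pairing, using ElementTree on a trimmed, well-formed copy of the snippet instead of BeautifulSoup (the function name is my own):

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the rating markup from the question.
SNIPPET = """<div class="histogramCommon bubbleHistogram wrap">
  <ul class="barChart">
    <li>
      <div class="ratingRow wrap">
        <div class="label part"><span class="text">Cuisine</span></div>
        <div class="wrap row part"><span alt="4.5 of five" class="ui_bubble_rating bubble_45"></span></div>
      </div>
      <div class="ratingRow wrap">
        <div class="label part"><span class="text">Service</span></div>
        <div class="wrap row part"><span alt="4.0 of five" class="ui_bubble_rating bubble_40"></span></div>
      </div>
    </li>
    <li>
      <div class="ratingRow wrap">
        <div class="label part"><span class="text">Quality</span></div>
        <div class="wrap row part"><span alt="4.5 of five" class="ui_bubble_rating bubble_45"></span></div>
      </div>
    </li>
  </ul>
</div>"""


def ratings_by_category(markup):
    """Map each category label to the rating found in the same row."""
    root = ET.fromstring(markup)
    result = {}
    for row in root.iter("div"):
        if "ratingRow" not in row.get("class", "").split():
            continue
        label = row.find(".//span[@class='text']")
        bubble = row.find(".//span[@alt]")  # the span carrying the rating
        if label is None or bubble is None:
            continue
        # '4.5 of five' -> 4.5
        result[label.text.strip()] = float(bubble.get("alt").split()[0])
    return result


ratings = ratings_by_category(SNIPPET)
for category, rating in ratings.items():
    print("{} : {}".format(category, rating))
```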
