I have tried probably every kind of selector and still cannot get this element's text.
ID, CSS selector and XPath all return nothing, but when I use the same selector in the Scrapy shell, the desired output is returned.
Any idea why the Selenium selector does not work?
I am trying to get the text of the element masterBody_trSalesDate:
発売予定日 : 7月(2021/4/21予約開始) (planned release date: July; pre-orders open 2021/4/21)
https://www.example.co.jp/10777687
try:
    hatsubai = driver.find_element_by_id('#masterBody_trSalesDate').text
I have honestly tried every combination of elements and selectors I can think of with no luck, but as mentioned, the Scrapy shell DOES return the correct data, so I am not sure what is going wrong.
Is there a way to test Selenium selectors interactively, like the Scrapy shell, without running the whole script?
Thank you if you have any advice.
When you use by_id (or by_xpath) you don't need the # character; # is CSS syntax for an id and only belongs in a CSS selector.
hatsubai = driver.find_element_by_id('masterBody_trSalesDate').text
That's all.
Minimal code that works for me:
from selenium import webdriver
url = 'https://www.1999.co.jp/10777687'
#driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get(url)
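# By ID: pass the bare id, without '#'
hatsubai = driver.find_element_by_id('masterBody_trSalesDate').text
print(hatsubai)
# By XPath: attributes are written with '@'
hatsubai = driver.find_element_by_xpath('//*[@id="masterBody_trSalesDate"]').text
print(hatsubai)
# By CSS selector: here '#' is required, because it is CSS syntax for an id
hatsubai = driver.find_element_by_css_selector('#masterBody_trSalesDate').text
print(hatsubai)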
BTW: the same applies to by_class_name; it needs only the class name, without the leading dot.
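The prefix rule can be captured in a small helper, sketched here under stated assumptions: `#name` is CSS shorthand for an id and `.name` for a class, and only the bare name belongs in an id or class-name lookup. The name `to_locator` is hypothetical (not part of Selenium); the strategy strings are the values held by Selenium's `By.ID`, `By.CLASS_NAME` and `By.CSS_SELECTOR`, so the returned pair can be unpacked into `driver.find_element(...)`.

```python
import re

# Same values as selenium.webdriver.common.by.By.ID, By.CLASS_NAME and
# By.CSS_SELECTOR, so the tuples can go straight into find_element(*locator).
BY_ID = "id"
BY_CLASS_NAME = "class name"
BY_CSS_SELECTOR = "css selector"

# A "simple" name: a letter or underscore, then letters, digits, '_' or '-'.
_SIMPLE = re.compile(r"^[A-Za-z_][\w-]*$")


def to_locator(selector):
    """Map CSS shorthand to a (strategy, value) pair (hypothetical helper).

    "#foo" -> ("id", "foo")          no '#' in an id lookup
    ".bar" -> ("class name", "bar")  no '.' in a class-name lookup
    anything else stays a CSS selector, where '#' and '.' are required.
    """
    if selector.startswith("#") and _SIMPLE.match(selector[1:]):
        return (BY_ID, selector[1:])
    if selector.startswith(".") and _SIMPLE.match(selector[1:]):
        return (BY_CLASS_NAME, selector[1:])
    return (BY_CSS_SELECTOR, selector)
```

For example, `driver.find_element(*to_locator("#masterBody_trSalesDate")).text` performs an id lookup for `masterBody_trSalesDate`. Anything more complex than a single `#id` or `.class` is left as a CSS selector, the only strategy in which `#` and `.` carry meaning.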
You need to use a CSS selector for this one:
hatsubai = driver.find_element_by_css_selector('#masterBody_trSalesDate').text
print(hatsubai)
Output:
発売予定日 : 7月(2021/4/21予約開始)
I am looking at a tag:
When I write this code:
message = soup.find("div", {"class": "text-msg-container"})
it returns None. What are the _ngcontent-vex-c62 and data-e2e-text-message-content attributes? Do I need to include them too? How should I write the code to get the div tag?
You can't, because the div isn't there when you send a GET request to get the page's HTML.
That page is built with the Angular framework, which produces an SPA (Single Page Application), so the data isn't in the response to a GET request and you can't scrape it that way.
The data is generated by JavaScript code, which has to run first to add it to the page. (The _ngcontent-* attributes are added by Angular itself; you don't need them to select the div.)
You need a different approach that lets the JavaScript run first, and only then get the data you want.
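One way to follow that advice while keeping the BeautifulSoup code is to let a browser render the page and hand its HTML to the parser. This is a minimal sketch, assuming an already-created Selenium `driver`. The function name `rendered_page_source` and the polling interval are illustrative choices. Selenium's own `WebDriverWait(driver, 10).until(EC.presence_of_element_located(...))` would do the same waiting.

```python
import time


def rendered_page_source(driver, url, css_selector, timeout=10.0, poll=0.5):
    """Load url in a real browser and return the HTML after JavaScript ran.

    Polls until at least one element matches css_selector, so the
    Angular-rendered markup is present in driver.page_source.
    Raises TimeoutError if nothing matches within timeout seconds.
    (Hypothetical helper, not part of Selenium.)
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the value of Selenium's By.CSS_SELECTOR
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no element matched %r within %.1fs" % (css_selector, timeout)
            )
        time.sleep(poll)
```

The returned string can then be parsed as before, e.g. `BeautifulSoup(html, "html.parser").find("div", class_="text-msg-container")`, because the rendered div is now part of the markup.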
If you want to find the class text-msg-container, try Selenium: it drives a real browser that runs the page's JavaScript, so it can locate elements that a plain GET request never sees.
import unittest
from selenium import webdriver
class PythonSearch(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
    def test_search(self):
        driver = self.driver
        driver.get("http://www.yoursite.com")
        elem = driver.find_element_by_css_selector(".text-msg-container")
    def tearDown(self):
        self.driver.close()
if __name__ == "__main__":
    unittest.main()
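If you are testing Chrome, use driver = webdriver.Chrome('/path/to/chromedriver'). (In Selenium 4 the driver path is passed as service=Service('/path/to/chromedriver'), with Service imported from selenium.webdriver.chrome.service.) See https://chromedriver.chromium.org/getting-started for more info.
Getting started with Selenium: https://selenium-python.readthedocs.io/getting-started.html#simple-usage
Try this:
message = soup.find("div", class_="text-msg-container")
I hope that works (it can only work if the div is present in the HTML you parsed).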
from selenium import webdriver
# Path to the chromedriver you downloaded; change it to match its location on your PC
path = "C:/chromedriver.exe"
driver = webdriver.Chrome(path)  # change this if you are not using Chrome
driver.get("website link")
out = driver.find_element_by_class_name("text-msg-container")
print(out.text)
When I use the code below, the website doesn't go on to the next page.
import unittest
from selenium import webdriver
import time
class ProductPurchase(unittest.TestCase):
    """
    Purchase the product on the website http://automationpractice.com/index.php
    """
    # Preconditions
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.get("http://automationpractice.com/index.php")
        self.driver.maximize_window()
    # unittest only calls tearDown (capital D); a method named teardown never runs
    def tearDown(self):
        self.driver.quit()
    # Buying a product on the website
    def test_wrong_agreement(self):
        driver = self.driver  # setUp stores the browser as self.driver
        time.sleep(2)
        # Click on "Quick view"
        driver.find_element_by_xpath("/html/body/div/div[2]/div/div[2]/div/div[1]/ul[1]/li[1]/div/div[1]/div/a[2]").click()
if __name__ == '__main__':
    unittest.main(verbosity=2)
It should go on to the next page, but the XPath doesn't work.
Hello, and good luck learning test automation.
The first thing I do when an XPath does not work is check it, usually with a browser extension, to make sure it is correct. It is also better to use a shorter XPath so there is less room for mistakes, e.g. "//img[@title='Printed Dress']".
Try the following XPath and use the JavaScript executor to click Quick View.
This code clicks the first Quick View element on the page. If you want to click all Quick View buttons one after another, you need to add further logic.
driver.get("http://automationpractice.com/index.php")
driver.maximize_window()
ele_quickview = driver.find_element_by_xpath('(//a[@class="quick-view"])[1]')
driver.execute_script("arguments[0].click();", ele_quickview)
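For the "click all Quick View buttons" case, one possible sketch follows. It assumes the same `driver` and the XPath from this answer. `click_each` and its `after_click` hook are hypothetical names, and what the hook must do, such as closing the Quick View popup, depends on the site and is not shown.

```python
def click_each(driver, xpath, after_click=None):
    """Click every element matching xpath via JavaScript, one at a time.

    after_click(driver, index) is an optional hook, e.g. to close the
    Quick View popup before the next click. Returns how many were clicked.
    (Hypothetical helper; "xpath" is the value of Selenium's By.XPATH.)
    """
    clicked = 0
    total = len(driver.find_elements("xpath", xpath))
    for i in range(total):
        # Re-locate on every pass: opening and closing a popup can
        # re-render the list and make earlier references stale.
        elements = driver.find_elements("xpath", xpath)
        if i >= len(elements):
            break  # the list shrank; stop instead of raising IndexError
        driver.execute_script("arguments[0].click();", elements[i])
        clicked += 1
        if after_click is not None:
            after_click(driver, i)
    return clicked
```

Usage would look like `click_each(driver, '//a[@class="quick-view"]', after_click=close_popup)`, where `close_popup` is a hypothetical function you would write for the site's popup.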
I'm trying to understand where the problem in this code is:
import unittest
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
class WebTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        binary = FirefoxBinary('/home/andrew/Downloads/firefox 45/firefox')
        cls.browser = webdriver.Firefox(firefox_binary=binary)
        cls.wait = WebDriverWait(cls.browser, 10)
        cls.browser.maximize_window()
        cls.browser.get('http://www.test.com/')
    def test_login_menu_elements(self):
        self.wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@id='menu_min']"))).click()
        check_icons(self)  # helper not shown in the question
        self.wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@id='menu_min']"))).click()
        check_fields(self)  # helper not shown in the question
    def test_add_news(self):
        self.wait.until(EC.element_to_be_clickable((By.XPATH, "//span[contains(.,'News')]"))).click()
        self.wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@href='/manager/news']"))).click()
    @classmethod
    def tearDownClass(cls):
        cls.browser.quit()
if __name__ == '__main__':
    unittest.main()
Every time I get a TimeoutException, and I really don't understand why or where the problem in the code is.
A TimeoutException can occur even when your code has no logical or syntactic errors.
It is raised when the expected condition passed to wait.until is not met before the timeout (10 seconds in your code).
Some things I have found helpful:
Isolate the XPath: in Chrome/Firefox dev tools, right-click the element and copy its XPath.
Using the XPath from the step above, make sure the chosen condition is correct (for example, presence versus clickability).
Having front-end experience, I find CSS selectors usually more intuitive and easier to understand than relative XPaths.
Check the selector you are using in the dev tools console with $x("YOUR_XPATH") (or $$("your css") for CSS) to make sure it is valid and matches the element.
For dynamic HTML, use the Python debugger (pdb) to make sure the HTML is in the expected state between expected conditions.
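The checklist above can be partly automated. Instead of a bare TimeoutException, a wait can report whether the selector matched nothing, or matched an element that is hidden or disabled. This is a minimal sketch in plain Python, assuming a Selenium `driver`. `wait_clickable` is a hypothetical helper that re-implements, rather than calls, the displayed-and-enabled checks that `element_to_be_clickable` performs. It raises the built-in `TimeoutError`, not Selenium's TimeoutException.

```python
import time


def wait_clickable(driver, by, value, timeout=10.0, poll=0.5):
    """Poll until the first element matching (by, value) is clickable.

    "Clickable" here means displayed and enabled, the same checks as
    Selenium's element_to_be_clickable. On failure the error message
    says why the wait failed. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = driver.find_elements(by, value)
        if matches and matches[0].is_displayed() and matches[0].is_enabled():
            return matches[0]
        if time.monotonic() >= deadline:
            break
        time.sleep(poll)
    if not matches:
        reason = "no element matches; check the selector in dev tools"
    elif not matches[0].is_displayed():
        reason = "%d match(es), but the first one is not displayed" % len(matches)
    else:
        reason = "%d match(es), but the first one is disabled" % len(matches)
    raise TimeoutError("%s=%r after %.1fs: %s" % (by, value, timeout, reason))
```

In the question's test this would read `wait_clickable(self.browser, "xpath", "//a[@id='menu_min']").click()`. The first line of the error then points at either the selector or the element's state.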
This question has been asked over and over again, and in spite of trying all the hacks I still can't figure out what's wrong.
I tried increasing implicitly_wait to 30 (and even up to 100), yet it did not work.
Use case: as a base case, I am trying to create a list populated with all the items on this page. I intend to bind it to a mini-module I already have in Scrapy, which has crawled all the links (pages with similar web elements), so essentially I'll be building the whole pipeline once I am done with this.
My source code (generated with Selenium IDE, exported to Python WebDriver and modified a little afterwards):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException, NoAlertPresentException
from selenium.webdriver.support.wait import WebDriverWait
import unittest, time, re
class Einstein(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.implicitly_wait(30)
        self.base_url = "http://shopap.lenovo.com/in/en/laptops/"
        self.verificationErrors = []
        self.accept_next_alert = True
    def test_einstein(self):
        driver = self.driver
        driver.get(self.base_url)
        print(driver.title)
        driver.find_element_by_link_text("T430").click()
        print(driver.title)
        # driver.find_element_by_xpath("id('facetedBrowseWrapper')/div/div/div[1]/div[2]/ul[1]/li[1]/a").click()
        driver.find_element_by_xpath("//div[@id='subseries']/div[2]/div/p[3]/a").click()
        print(driver.title)
        # driver.find_element_by_xpath("//div[@id='subseries']/div[2]/div/p[3]/a").click()
        try: self.assertEqual("Thinkpad Edge E530 (Black)", driver.find_element_by_link_text("Thinkpad Edge E530 (Black)").text)
        except AssertionError as e: self.verificationErrors.append(str(e))
        # Everything ok till here
        #**THE CODE FAILS HERE**#
        laptop1 = driver.find_element_by_link_text("Thinkpad Edge E530 (Black)").text
        print(laptop1)
        price1 = driver.find_element_by_css_selector("span.price").text
        print(price1)
        detail1 = self.is_element_present(By.CSS_SELECTOR, "div.desc.std")
        print(detail1)
    def is_element_present(self, how, what):
        try: self.driver.find_element(by=how, value=what)
        except NoSuchElementException as e: return False
        return True
    def is_alert_present(self):
        try: self.driver.switch_to_alert()
        except NoAlertPresentException as e: return False
        return True
    def close_alert_and_get_its_text(self):
        try:
            alert = self.driver.switch_to_alert()
            alert_text = alert.text
            if self.accept_next_alert:
                alert.accept()
            else:
                alert.dismiss()
            return alert_text
        finally: self.accept_next_alert = True
    def tearDown(self):
        self.driver.quit()
        self.assertEqual([], self.verificationErrors)
if __name__ == "__main__":
    unittest.main()
Errors & output:
ekta@ekta-VirtualBox:~$ python einstein.py
Laptops & Ultrabooks | Lenovo (IN)
ThinkPad T430 Laptop PC for Business Computing | Lenovo (IN)
Buy Lenovo Thinkpad Laptops | Lenovo Thinkpad Laptops Price India
E
======================================================================
ERROR: test_einstein (__main__.Einstein)
----------------------------------------------------------------------
Traceback (most recent call last):
File "einstein.py", line 27, in test_einstein
try: self.assertEqual("Thinkpad Edge E530 (Black)", driver.find_element_by_link_text("Thinkpad Edge E530 (Black)").text)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 246, in find_element_by_link_text
return self.find_element(by=By.LINK_TEXT, value=link_text)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 680, in find_element
{'using': by, 'value': value})['value']
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 165, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 158, in check_response
raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: u'Unable to locate element: {"method":"link text","selector":"Thinkpad Edge E530 (Black)"}' ; Stacktrace:
at FirefoxDriver.prototype.findElementInternal_ (file:///tmp/tmphli5Jg/extensions/fxdriver@googlecode.com/components/driver_component.js:8444)
at fxdriver.Timer.prototype.setTimeout/<.notify (file:///tmp/tmphli5Jg/extensions/fxdriver@googlecode.com/components/driver_component.js:386)
----------------------------------------------------------------------
Ran 1 test in 79.348s
FAILED (errors=1)
Questions & comments:
If you answer this question, please explain why this specific find_element_by_link_text call does not work.
(Very basic) In the Selenium IDE GUI, under "Show all available commands", why don't I see a CSS locator (find_element_by_css_selector) for every web element? Is there a way to force an element to be read as a CSS selector?
If you suggest some other locator, please mention whether it will be a consistent way to fetch elements, given the first question.
Does assert capture the exceptions and "move on"? Even after trying "verify" and "assert" loops, I still can't fetch this element with find_element_by_link_text.
I tried building an XPath for this element, but View XPath (in Firefox) shows nothing and gives no clue why (of course I removed the ":x" namespace).
Other things I tried apart from implicitly_wait(30):
find_element_by_partial_link_text("Thinkpad"), and passing the text as Unicode in case the parentheses were the problem: driver.find_element_by_link_text(u"Thinkpad Edge E530 (Black)").text. Neither worked.
Related questions:
How to use find_element_by_link_text() properly to not raise NoSuchElementException?
NoSuchElement Exception using find_element_by_link_text when implicitly_wait doesn't work?
In my experience, find_element_by_link_text sometimes works and sometimes doesn't, even for the same element, so I don't consider it a reliable way to access elements; when an element has an id, find_element_by_id is the better choice.
On your page, though, there is no id to help you. You can still try find_element_by_xpath in three ways:
1- Matching the title: find_element_by_xpath("//a[contains(@title, 'T430')]")
2- Matching the text: find_element_by_xpath("//a[contains(text(), 'T430')]")
3- Matching the href: find_element_by_xpath("//a[contains(@href, 'http://www.thedostore.com/laptops/thinkpad-laptops/thinkpad-t430-u-black-627326q.html')]")
Hope it helps.
NoSuchElementException is thrown when the element could not be found.
If you encounter this exception, check the following:
Check the selector you pass to your find_element_by_* call.
The element may not be on the screen yet at the time of the find operation.
If the web page is still loading, use selenium.webdriver.support.wait.WebDriverWait() to write a wait wrapper that waits for the element to appear.
Troubleshooting and code samples
You can add a breakpoint with pdb.set_trace() just before the failing line (don't forget to import pdb), run your test, and once the debugger stops, try the following.
You could try:
driver.find_element_by_xpath(u'//a[text()="Foo text"]')
instead of find_element_by_link_text. This is a more reliable test, so if it works, use it.
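When such an XPath is built from arbitrary link text, quotes are the usual trap: XPath 1.0 string literals have no escape character. This is a minimal sketch with hypothetical helper names. `link_xpath` compares the whitespace-normalised text of each `<a>`, which may also tolerate stray spaces or line breaks inside the link.

```python
def xpath_literal(text):
    """Quote text as an XPath 1.0 string literal (hypothetical helper).

    XPath 1.0 has no escape character, so text containing both quote
    kinds is split on ' and rebuilt with concat().
    """
    if "'" not in text:
        return "'%s'" % text
    if '"' not in text:
        return '"%s"' % text
    pieces = []
    for i, part in enumerate(text.split("'")):
        if i > 0:
            pieces.append('"\'"')  # a literal apostrophe, double-quoted
        if part:
            pieces.append("'%s'" % part)
    return "concat(%s)" % ", ".join(pieces)


def link_xpath(text):
    """XPath for an <a> whose whitespace-normalised text equals text."""
    return "//a[normalize-space(.)=%s]" % xpath_literal(text)
```

For the question's link this gives `//a[normalize-space(.)='Thinkpad Edge E530 (Black)']`, which can be passed to `driver.find_element_by_xpath(...)`.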
If that doesn't help, check whether your page has loaded properly:
(Pdb) driver.execute_script("return document.readyState")
'complete'
Sometimes, when the new page has not loaded yet, you are actually fetching elements from the old page. Even then, readyState may still report the state of the old page (especially after click()). Here is how this is explained in Harry's blog:
Since Selenium webdriver has become more advanced, clicks are much more like "real" clicks, which has the benefit of making our tests more realistic, but it also means it's hard for Selenium to be able to track the impact that a click has on the browsers' internals -- it might try to poll the browser for its page-loaded status immediately after clicking, but that's open to a race condition where the browser was multitasking, hasn't quite got round to dealing with the click yet, and it gives you the .readyState of the old page.
If you think this is happening because the page wasn't loaded properly, the "recommended" (though still ugly) solution is an explicit wait:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
old_value = browser.find_element_by_id('thing-on-old-page').text
browser.find_element_by_link_text('my link').click()
WebDriverWait(browser, 3).until(
expected_conditions.text_to_be_present_in_element(
(By.ID, 'thing-on-new-page'),
'expected new text'
)
)
The naive attempt would be something like this:
import time
def wait_for(condition_function):
    start_time = time.time()
    while time.time() < start_time + 3:
        if condition_function():
            return True
        else:
            time.sleep(0.1)
    raise Exception(
        'Timeout waiting for {}'.format(condition_function.__name__)
    )
def click_through_to_new_page(link_text):
    browser.find_element_by_link_text(link_text).click()
    def page_has_loaded():
        page_state = browser.execute_script(
            'return document.readyState;'
        )
        return page_state == 'complete'
    wait_for(page_has_loaded)
A better version (credit to @ThomasMarks):
from selenium.common.exceptions import StaleElementReferenceException
def click_through_to_new_page(link_text):
    link = browser.find_element_by_link_text(link_text)
    link.click()
    def link_has_gone_stale():
        try:
            # poll the link with an arbitrary call
            link.find_elements_by_id('doesnt-matter')
            return False
        except StaleElementReferenceException:
            return True
    wait_for(link_has_gone_stale)
The final example compares the internal ids of the old and new html elements (which may be bulletproof):
class wait_for_page_load(object):
    def __init__(self, browser):
        self.browser = browser
    def __enter__(self):
        self.old_page = self.browser.find_element_by_tag_name('html')
    def page_has_loaded(self):
        new_page = self.browser.find_element_by_tag_name('html')
        return new_page.id != self.old_page.id
    def __exit__(self, *_):
        wait_for(self.page_has_loaded)
And now we can do:
with wait_for_page_load(browser):
    browser.find_element_by_link_text('my link').click()
The code samples above are from Harry's blog.
Here is the version proposed by Tommy Beadle (using the staleness approach):
import contextlib
from selenium.webdriver import Remote
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of
class MyRemote(Remote):
    @contextlib.contextmanager
    def wait_for_page_load(self, timeout=30):
        old_page = self.find_element_by_tag_name('html')
        yield
        WebDriverWait(self, timeout).until(staleness_of(old_page))
If you think it isn't about page load, double-check whether your element is inside an iframe or a different window. If so, you have to switch to it first. To list the available windows, check driver.window_handles.
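A sketch of the iframe check follows, assuming a Selenium `driver` and only one level of iframes. `find_in_frames` is a hypothetical helper built on `driver.switch_to.default_content()` and `driver.switch_to.frame(...)`. The strategy strings are the values of `By.TAG_NAME` and the other `By` constants. Windows can be checked the same way by iterating `driver.window_handles` and calling `driver.switch_to.window(handle)`.

```python
def find_in_frames(driver, by, value):
    """Look for (by, value) in the top document, then in each top-level iframe.

    Returns (frame_index, element); frame_index is None for the top
    document. On success the driver is left switched into that frame,
    so later find calls run there. Raises LookupError if the element is
    in neither. Nested iframes are not searched. (Hypothetical helper.)
    """
    driver.switch_to.default_content()
    found = driver.find_elements(by, value)
    if found:
        return None, found[0]
    # "tag name" is the value of By.TAG_NAME
    frames = driver.find_elements("tag name", "iframe")
    for index, frame in enumerate(frames):
        # Go back to the top document before entering the next iframe
        driver.switch_to.default_content()
        driver.switch_to.frame(frame)
        found = driver.find_elements(by, value)
        if found:
            return index, found[0]
    driver.switch_to.default_content()
    raise LookupError("%s=%r not found in the page or its iframes" % (by, value))
```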
From viewing the source of the page that you provided a link to, it seems you are using an incorrect selector.
Instead, use find_elements_by_link_text(u'text here')[0] to select the first occurrence, as there may be multiple links with the same link text.
So instead of:
self.assertEqual("Thinkpad Edge E530 (Black)", driver.find_element_by_link_text("Thinkpad Edge E530 (Black)").text)
You should use:
self.assertEqual("Thinkpad Edge E530 (Black)", driver.find_elements_by_link_text("Thinkpad Edge E530 (Black)")[0].text)
Solution posted by OP:
Hack 1: Instead of identifying the element as a text-link, I identified the "bigger frame" in which this element was present.
itemlist_1 = driver.find_element_by_css_selector("li.item.first").text
This gives the whole item, along with its name, price and details (and the unwanted "Add to Cart" and "Compare" text).
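The selector in Hack 1, `li.item.first`, matches an element carrying both the `item` and `first` classes. A class-name lookup such as find_element_by_class_name takes a single class, whereas CSS joins several with dots. Below is a small hypothetical helper that builds such a selector from a copied `class` attribute value. Class names containing characters such as `:` would also need CSS escaping, which this sketch does not do.

```python
def classes_to_css(class_attr, tag=""):
    """Build a CSS selector for an element having all the given classes.

    class_attr is a space-separated class attribute value, e.g.
    "item first" with tag "li" gives "li.item.first". (Hypothetical helper.)
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("no class names given")
    # One '.' per class and no spaces in between: a space would turn the
    # selector into a descendant match instead of "has all these classes".
    return tag + "".join("." + name for name in classes)
```

The same helper reproduces the question's other selectors, e.g. `classes_to_css("desc std", "div")` gives `div.desc.std`.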
Hack 2: I found that the "Buy Now" image element, located by XPath in the code above (driver.find_element_by_xpath("//div[@id='subseries']/div[2]/div/p[3]/a").click()), could be identified and clicked faster if I added the following line before finding it by XPath: driver.find_element_by_css_selector("#subseries").text. I think this narrows down where WebDriver looks for the element.
This seems to have cut my wait on that page by at least 20 seconds. Hope that helps.
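Hack 2 still searches the whole document twice. Selenium can also search inside an element by calling `find_element` on that element, which does restrict the search to the container. This is a minimal sketch with a hypothetical helper name. The relative XPath is derived from the OP's `//div[@id='subseries']/div[2]/div/p[3]/a`. Whether this is faster on that page is untested here.

```python
def find_within(driver, container_css, inner_xpath):
    """Locate a container once, then search only inside it.

    inner_xpath must be relative (start with "."): an XPath beginning
    with "/" or "//" searches the whole document even when called on
    an element, which defeats the narrowing. (Hypothetical helper.)
    """
    if not inner_xpath.startswith("."):
        raise ValueError("use a relative XPath such as './/a', got %r" % inner_xpath)
    # "css selector" and "xpath" are the values of By.CSS_SELECTOR and By.XPATH
    container = driver.find_element("css selector", container_css)
    return container.find_element("xpath", inner_xpath)
```

For the "Buy Now" link this would be `find_within(driver, "#subseries", "./div[2]/div/p[3]/a").click()`.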