I have been trying to make a Python script that logs into my router's admin page, records all the connected devices, and then displays them by their device names.
The script I have so far looks like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep

url = "http://192.168.1.1/login/login.html"
browser = webdriver.Chrome()
wait = WebDriverWait(browser, 100)
browser.get(url)

kad = browser.find_element_by_id("AuthName")
password = browser.find_element_by_id("AuthPassword")
kad.send_keys("MyRouterLoginName")
password.send_keys("MyRouterLoginPassword")
buton = browser.find_element_by_xpath("/html/body/div[2]/div[2]/div[2]/div/ul/li/div[3]/form/fieldset/ul/li[6]/input")  # this is my login button
buton.click()

homepage = "http://192.168.1.1/index.html"
browser.get(homepage)  # the router asks to change the default password; I skip it this way
sleep(5)

verify = browser.find_element_by_css_selector('body')
print(verify.text)  # see my explanation below

xpathmethod = browser.find_element_by_xpath("/html/body/div[3]/div/div/div/div[3]/div/table/tbody/tr/td[3]/div/ul[1]/li[1]/div[2]/a")
print(xpathmethod.text)

print("Finding by css")
content = browser.find_element_by_css_selector('.addmenu')
print(content.text)
The verify line was there to make sure the page had fully loaded, but here is the problem: while the page loads, it first renders the default menu items (such as connection status, networking settings, troubleshooting, etc.) and only then loads the currently connected devices. The webdriver somehow does not recognize the connected-devices section and raises an "unable to locate element" error.
I have tried both the XPath and the CSS selector methods, but both give me the same result.
Sorry, I can't paste the HTML in full, but here is the path that Chrome gives me when I inspect the element:
html body div.index div div #mainframe html body div div #contentPanel #mapView div div table tbody tr td div #wlInfo li div a
You need something like this:
try:
    # Load the page
    browser.get("http://192.168.1.1/index.html")
    # Wait up to 10 seconds for the element to appear on the page.
    # For example, <div class="example"> -- the first div with class "example"
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, ".//div[@class='example'][1]"))
    )
except Exception as e:
    # Catch the exception in case the element never becomes available
    print("Page load error: {}".format(e))
I found another way to get the list of connected devices.
The modem has a ConnectionStatus page that gives me the full list, including MAC addresses and other details, in a single string.
Now I need to parse it. I will create another question about that.
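For reference, a regular expression should be able to pull the MAC addresses out of that string. A minimal sketch, assuming the addresses use the standard colon- or hyphen-separated notation (the status_text value here is made up, since I haven't pasted the real output):

import re

# Hypothetical example of the kind of string the ConnectionStatus page returns
status_text = "device1;AA:BB:CC:DD:EE:FF;192.168.1.10 device2;11:22:33:44:55:66;192.168.1.11"

# Match standard colon- or hyphen-separated MAC addresses
mac_pattern = re.compile(r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}")
print(mac_pattern.findall(status_text))
# ['AA:BB:CC:DD:EE:FF', '11:22:33:44:55:66']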
This page has a total of 790 products, and I wrote Selenium code to automatically click the product-load button until all 790 products have finished loading. Unfortunately, my code is not working and I am getting an error. Here is my full code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time
driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://www.billigvvs.dk/maerker/grohe/produkter?min_price=1'
driver.get(url)
time.sleep(5)
# accept cookies
try:
    driver.find_element_by_xpath("//button[@class='coi-banner__accept']").click()
except:
    print('cookies not accepted')

# Wait up to 20 seconds for the page to load.
timeout = 20
try:
    WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='productbox__info__name']")))
except TimeoutException:
    print("Timed out waiting for page to load")
    driver.quit()

# my page-load button is not working. I want to load all 790 products on this page
products_load_button = driver.find_element_by_xpath("//div[@class='filterlist__button']").click()
The error that I am getting:
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='filterlist__button']"}
(Session info: chrome=87.0.4280.88)
The error message says "Unable to locate element", but see the picture, which shows that I am selecting the right element.
You are missing a trailing space at the end of the class name; try this:
products_load_button = driver.find_element_by_xpath("//div[@class='filterlist__button ']").click()
When you work with selectors, it is always good practice to copy and paste them directly from the page; that will save a lot of headaches in the future.
Edit:
The while loop to check whether all the elements are loaded would look similar to this:
progress_bar_text = driver.find_element_by_css_selector("div.filterlist__pagination__text").text
# From here you can extract the total item count and the loaded item count.
# Note: I am doing this because I don't have access to the page; there is
# probably a better way to find out whether the items are loaded by taking
# a look at the attributes of the progress bar.
total_items = int(progress_bar_text.split()[4])
loaded_items = int(progress_bar_text.split()[1])
while loaded_items < total_items:
    # Click the product-load button until all the products are loaded
    product_load_button.click()
    # Re-read the progress bar text and update the loaded_items count
    progress_bar_text = driver.find_element_by_css_selector("div.filterlist__pagination__text").text
    loaded_items = int(progress_bar_text.split()[1])
This is a very simple example and does not cover a number of scenarios that you will need to handle to make it stable, some of which are:
The elements might disappear or reload after you click the products_load_button. For this, I'd recommend taking a look at explicit waits in the Selenium docs; a sketch follows after this list.
It is possible that the progress bar disappears or is hidden once the load is complete.
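As a rough illustration of the first point, here is a sketch of how an explicit wait could guard each re-read of the progress bar; the selector is taken from the snippet above, but the timing behaviour of this particular page is an assumption:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

def read_loaded_items(driver, timeout=10):
    # Wait until the progress bar text is present again after a click
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div.filterlist__pagination__text"))
    )
    try:
        text = driver.find_element_by_css_selector("div.filterlist__pagination__text").text
    except StaleElementReferenceException:
        # The element was re-rendered mid-read; locate it again
        text = driver.find_element_by_css_selector("div.filterlist__pagination__text").text
    return int(text.split()[1])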
I am trying to access all href links from a website, the search results to be precise. My first intention is to get all the links and then look at them further. The problem is that I get some links from the website, but not the links of the search results. Here is one version of my code:
from htmldom import htmldom

dom = htmldom.HtmlDom("myWebsite")
dom = dom.createDom()
p_links = dom.find("a")
for link in p_links:
    print("URL: " + link.attr("href"))
Here is a screenshot of the HTML of that particular website. In the screenshot, I marked the href link I am trying to access. I am open to any help, be it with Selenium, htmldom, BeautifulSoup, etc.
The data you are after is loaded with AJAX requests, so you can't scrape it directly from the page source. But the AJAX request is sent to this URL:
https://open.nrw/solr/collection1/select?q=*%3A*&fl=validated_data_dict%20title%20groups%20notes%20maintainer%20metadata_modified%20res_format%20author_email%20name%20extras_opennrw_spatial%20author%20extras_opennrw_groups%20extras_opennrw_format%20license_id&wt=json&fq=-type:harvest+&sort=title_string%20asc&indent=true&rows=20
which returns the data in JSON format. You can use the requests module to scrape this data.
import requests
BASE_URL = 'https://open.nrw/dataset/'
r = requests.get('https://open.nrw/solr/collection1/select?q=*%3A*&fl=validated_data_dict%20title%20groups%20notes%20maintainer%20metadata_modified%20res_format%20author_email%20name%20extras_opennrw_spatial%20author%20extras_opennrw_groups%20extras_opennrw_format%20license_id&wt=json&fq=-type:harvest+&sort=title_string%20asc&indent=true&rows=20')
data = r.json()
for item in data['response']['docs']:
    print(BASE_URL + item['name'])
Output:
https://open.nrw/dataset/mags-90-10-dezilsverhaeltnis-der-aequivalenzeinkommen-1512029759099
https://open.nrw/dataset/alkis-nutzungsarten-pro-baublock-wuppertal-w
https://open.nrw/dataset/allgemein-bildende-schulen-am-1510-nach-schulformen-schulen-schueler-und-lehrerbestand-w
https://open.nrw/dataset/altersgruppen-in-meerbusch-gesamt-meerb
https://open.nrw/dataset/amtliche-stadtkarte-wuppertal-raster-w
https://open.nrw/dataset/mais-anteil-abhaengig-erwerbstaetiger-mit-geringfuegiger-beschaeftigung-1477312040433
https://open.nrw/dataset/mags-anteil-der-stillen-reserve-nach-geschlecht-und-altersgruppen-1512033735012
https://open.nrw/dataset/mags-anteil-der-vermoegenslosen-in-nrw-nach-beruflicher-stellung-1512032087083
https://open.nrw/dataset/anzahl-kinderspielplatze-meerb
https://open.nrw/dataset/anzahl-der-sitzungen-von-rat-und-ausschussen-meerb
https://open.nrw/dataset/anzahl-medizinischer-anwendungen-den-oeffentlichen-baedern-duesseldorfs-seit-2006-d
https://open.nrw/dataset/arbeitslose-den-wohnquartieren-duesseldorf-d
https://open.nrw/dataset/arbeitsmarktstatistik-arbeitslose-gelsenkirchen-ge
https://open.nrw/dataset/arbeitsmarktstatistik-arbeitslose-nach-rechtskreisen-des-sgb-ge
https://open.nrw/dataset/arbeitsmarktstatistik-arbeitslose-nach-stadtteilen-gelsenkirchen-ge
https://open.nrw/dataset/arbeitsmarktstatistik-sgb-ii-rechtskreis-auf-stadtteilebene-gelsenkirchen-ge
https://open.nrw/dataset/arbeitsmarktstatistik-sozialversicherungspflichtige-auf-stadtteilebene-gelsenkirchen-ge
https://open.nrw/dataset/verkehrszentrale-arbeitsstellen-in-nordrhein-westfalen-1476688294843
https://open.nrw/dataset/mags-arbeitsvolumen-nach-wirtschaftssektoren-1512025235377
https://open.nrw/dataset/mais-armutsrisikoquoten-nach-geschlecht-und-migrationsstatus-der-personen-1477313317038
As you can see, this returned the first 20 URLs. When you first load the page, only 20 items are present, but as you scroll down, more are loaded. To get more items, you can change the query string parameter at the end of the URL: it ends with rows=20, and you can change that number to get the desired number of results.
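For example, here is a sketch that passes the same query string through the params argument of requests instead of hard-coding the URL; the fl field list is trimmed down to the fields actually used below, which is an assumption about what this Solr endpoint accepts:

import requests

BASE_URL = 'https://open.nrw/dataset/'
params = {
    'q': '*:*',
    'fl': 'name title',        # assumption: request only the fields used below
    'wt': 'json',
    'fq': '-type:harvest',
    'sort': 'title_string asc',
    'rows': 100,               # raise this number to get more results
}
r = requests.get('https://open.nrw/solr/collection1/select', params=params)
for item in r.json()['response']['docs']:
    print(BASE_URL + item['name'])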
The results appear only after the initial page load because of the AJAX request.
I managed to get the links with Selenium; however, I had to wait for the .ckantitle a elements to be loaded (these are the links you want to get).
I should mention that the webdriver waits for a page to load by default. It does not wait for loading inside frames or for AJAX requests. This means that when you use .get('url'), your browser waits until the page is completely loaded and then moves on to the next command in the code. But when an AJAX request is posted, the webdriver does not wait, and it is your responsibility to wait an appropriate amount of time for the page, or a part of it, to load; that is what the expected_conditions module is for.
Code:
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
url = 'https://open.nrw/suche'
html = None
browser = webdriver.Chrome()
browser.get(url)
delay = 3 # seconds
try:
    WebDriverWait(browser, delay).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '.ckantitle a'))
    )
    html = browser.page_source
except TimeoutException:
    print('Loading took too much time!')
finally:
    browser.quit()

if html:
    soup = BeautifulSoup(html, 'lxml')
    links = soup.select('.ckantitle a')
    for link in links:
        print(urljoin(url, link['href']))
You need to install selenium:
pip install selenium
and get a driver here.
I have a div which contains the results for a certain search query. The text contained in this div changes as a button to go to the next page is clicked.
In the text contained in this div there is also the number of the current page. After clicking the button to go to the next page, the results still take a moment to load, so I want to make the driver wait for the content to load.
As such, I want to wait until the string "Page x" appears inside the div, where x corresponds to the number of the next page to be loaded.
For this, I tried:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()

# Search page
driver.get('http://searchpage.com')

timeout = 120
current_page = 1
while True:
    try:
        # Wait until the results have been loaded
        WebDriverWait(driver, timeout).until(
            EC.text_to_be_present_in_element(
                locator=(By.ID, "SEARCHBASE"),
                text_="Page {:}".format(current_page)))
        # Click to go to the next page. This calls some JavaScript.
        driver.find_element_by_xpath('//a[@href="#"]').click()
        current_page += 1
    except:
        driver.quit()
However, this always fails to match the text. What am I doing wrong here?
Just detecting whether anything at all has changed in the page would also do the job, but I haven't found any way to do that.
Try applying the solution below, which waits for a partial text match:
WebDriverWait(driver, timeout).until(lambda driver: "Page {:}".format(current_page) in driver.find_element(By.ID, "SEARCHBASE").text)
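If you also want to detect that the results changed at all, as mentioned at the end of the question, one common pattern is to keep a reference to the old element and wait for it to go stale after the click. A sketch, reusing driver and timeout from the question and assuming the results live in the #SEARCHBASE element:

from selenium.webdriver.support import expected_conditions as EC

old_results = driver.find_element(By.ID, "SEARCHBASE")
driver.find_element_by_xpath('//a[@href="#"]').click()
# staleness_of fires once the old element is detached from the DOM,
# i.e. once the results block has been replaced
WebDriverWait(driver, timeout).until(EC.staleness_of(old_results))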
How do I wait when the page is redirected by AJAX?
I tried writing some code, but it is not perfect.
Code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

URL = "http://www.someajaxcode.com"
driver = webdriver.Firefox()
driver.get(URL)
wait = WebDriverWait(driver, 10)

driver.find_element(By.ID, 'SomeAjaxButton').click()
wait.until(lambda driver: driver.current_url != URL)
driver.get(driver.current_url)

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")
print(source_code)
As a temporary workaround, I force a reload with driver.get(driver.current_url), because otherwise the driver does not keep waiting for the redirected page to finish downloading. If I add driver.get(driver.current_url) after the wait, as above, the driver does wait until the page has downloaded completely, but I consider this a temporary hack.
I also know about
wait.until(expected_conditions.presence_of_element_located((By.ID, 'anyitem at redirect')))
but that only waits for one specific element, so I don't want to use it.
Do you know of a better approach, something that checks that all the elements have changed?
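One approach that avoids both the extra get() call and pinning the wait to a single element is to wait for the URL to change and then for the new document to report that it has finished loading. A sketch, assuming the AJAX redirect actually changes driver.current_url:

# Wait for the AJAX redirect to change the URL ...
wait.until(lambda driver: driver.current_url != URL)
# ... then wait until the new document itself reports that it is fully loaded
wait.until(lambda driver: driver.execute_script("return document.readyState") == "complete")

source_code = driver.find_element_by_xpath("//*").get_attribute("outerHTML")
print(source_code)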
I just want to write a simple login script for one website. However, I think the login page is built with JS, and it's really hard to locate the elements with Selenium.
The web page I am going to play with is:
"https://www.nike.com/snkrs/login?returnUrl=%2F"
This is what the page looks like, together with its inspect-element view:
I was trying to locate the element with the following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox()
driver.get("https://www.nike.com/snkrs/login?returnUrl=%2Fthread%2Fe9680e08e7e3cd76b8832684037a58a369cad5ed")
time.sleep(5)
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
elem = driver.find_element_by_xpath("//*[@id='ce3feab5-6156-441a-970e-23544473a623']")
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
driver.close()
This code returns an error saying that the element could not be found by [@id='ce3feab5-6156-441a-970e-23544473a623'].
I tried playing with frames, but that does not seem to work. If I go to the "view page source" page, it is full of JS code.
Is there a good way to work with such a web page in Selenium?
Try changing the code:
elem = driver.find_element_by_xpath("//*[@id='ce3feab5-6156-441a-970e-23544473a623']")
to
elem = driver.find_element_by_xpath("//*[@type='email']")
My guess (and observation) is that the id changes each time you visit the page. The id looks auto-generated, and when I go to the page multiple times, the id is different each time.
You'll need to search for something that doesn't change. For example, you can search for the name attribute, which has the seemingly static value "emailAddress":
element = driver.find_element_by_name("emailAddress")
You could also use an XPath expression to search for other attributes, such as data-componentname:
element = driver.find_element_by_xpath("//input[@data-componentname='emailAddress']")
Also, instead of a hard-coded sleep, you can simply wait for the element to be visible:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://www.nike.com/snkrs/login")
element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.NAME, "emailAddress"))
)