Selenium scraper gets empty list with XPath - Python

I am scraping price data on the website: https://fbx.freightos.com/ .
This is the code below:
from selenium import webdriver
from selenium.webdriver.edge.service import Service
from selenium.webdriver.common.by import By
s = Service(e_driver_path)
driver = webdriver.Edge(service=s)
driver.get("https://fbx.freightos.com/")
elements = driver.find_elements(By.XPATH, "/html/body/div[2]/div[1]/div[3]/div/div[1]/section/div[2]/div/div[2]/div/div[1]/div/div/div[1]/span")
content = "".join([element.text for element in elements])
print(content)
The problem is the result: it is an empty list.
It should contain the "Current FBX" value, which looks like "$3,540".
Please help.
Thanks ahead.

Try waiting for the required data:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
fbx = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[.='Current FBX']/following-sibling::span[normalize-space()]"))).text
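For completeness, a minimal end-to-end sketch of that approach (the locator is the one suggested above; the Edge setup and e_driver_path are taken from the question):
from selenium import webdriver
from selenium.webdriver.edge.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Edge(service=Service(e_driver_path))  # e_driver_path as defined in the question
driver.get("https://fbx.freightos.com/")
# Wait until the value next to the "Current FBX" label is rendered, then read it
fbx = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[.='Current FBX']/following-sibling::span[normalize-space()]"))).text
print(fbx)
driver.quit()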

Use WebDriverWait() with visibility_of_element_located() and the following CSS selectors:
driver.get("https://fbx.freightos.com/")
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".row.bottom-xs.between-xs div:nth-of-type(1)>div"))).text)
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".row.bottom-xs.between-xs div:nth-of-type(1)>span"))).text)
You need to add the following imports:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

This XPath will directly target the price web element.
Try fetching it after a wait, as mentioned by @KunduK:
//div[contains(text(),'Current FBX')]/parent::div/span

You can also use an XPath like this; simplicity is best:
//span[@class='styled__Value-sc-1puuh0x-8 hIASbB']
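A minimal sketch of how the first XPath above could be combined with an explicit wait (driver is assumed to be an already initialised WebDriver that has loaded https://fbx.freightos.com/):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Wait for the price next to the "Current FBX" label to become visible, then read it
price = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(text(),'Current FBX')]/parent::div/span"))).text
print(price)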

NoneType Error when extracting text after using select_by_index() in Selenium Python

Code trials:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
browser = webdriver.Chrome()
url = "https://ssangmun.sen.es.kr/66769/subMenu.do"
browser.get(url)
browser.maximize_window()
select = Select(browser.find_element_by_id("srhMlsvYear"))
select.options[0].text  # output: "2017"
select.select_by_index(0).text  # output: NoneType error
When I use:
select.options[0].text
it works! But when I use:
select.select_by_index(0)
it raises a NoneType error.
Why does this error occur?
With the id #srhMlsvYear there are two matches in the HTML DOM.
I would recommend you to use the below CSS selector with an explicit wait:
select#srhMlsvYear
and use it like this:
wait = WebDriverWait(browser, 30)
ele = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select#srhMlsvYear")))
select = Select(ele)
select.select_by_index(0)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
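To then read back the text of the option you just selected, the Select object's first_selected_option attribute can be used, for example:
print(select.first_selected_option.text)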
Presumably, the following option texts:
2017년
2018년
2019년 etc
are present within the DOM Tree as soon as the url is invoked. That's why when you:
select.options[0].text
You see the output:
2017
However, select_by_index(index) doesn't return anything. So when you try to extract the text after selecting the desired option as:
select.select_by_index(0).text
you get a NoneType error.
Solution
After selecting the desired option, you can get the option text through the first_selected_option attribute as follows:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://ssangmun.sen.es.kr/66769/subMenu.do")
select = Select(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select#srhMlsvYear"))))
select.select_by_index(0)
print(select.first_selected_option.text)
Console Output:
2017년

I have a problem with pressing a button by XPath using Selenium

I want to press a button with the XPath:
//*[@id="rass-action-proceed"]
I don't know how to use Selenium. Can someone help me, please?
This will allow you to do so using the respective XPath:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
elem = driver.find_element_by_xpath('//a[@id="rass-action-proceed"]')
elem.click() # Using WebElements 'click()' method for sheer simplicity
You can try this :
driver.find_element_by_id("rass-action-proceed").click()
or with Explicit wait :
WebDriverWait(driver , 10).until(EC.element_to_be_clickable((By.ID, "rass-action-proceed"))).click()
Make sure to import:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Not Able to See Entire Page Source of Website using Selenium, Python

I am trying to scrape this website
https://script.google.com/a/macros/cprindia.org/s/AKfycbzgfCVNciFRpcpo8P7joP1wTeymj9haAQnNEkNJJ2fQ4FBXEco/exec
I am using Selenium and Python. I am not able to view the entire page source. Basically, I have to scrape the table inside it and click on the next button, but the code for the next button and the table is not visible in the page source. Here is my code:
import time
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
link = "https://script.google.com/a/macros/cprindia.org/s/AKfycbzgfCVNciFRpcpo8P7joP1wTeymj9haAQnNEkNJJ2fQ4FBXEco/exec"
browser = webdriver.PhantomJS()
browser.get(link)
pass1 = browser.find_element_by_xpath("/html/body/div[2]/table[2]/tbody/tr[1]/td/div/div/div[2]/div[2]")
pass1.click()
time.sleep(30)
I am getting this error: NoSuchElementException.
There are two iframes present on the page, so you need to first switch to those iframes and then click on the element.
And you can apply explicit wait on the element so that the script waits until the element is visible on the page.
You can do it like:
browser = webdriver.PhantomJS()
browser.get(link)
browser.switch_to.frame(browser.find_element_by_id('sandboxFrame'))
browser.switch_to.frame(browser.find_element_by_id('userHtmlFrame'))
WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class,'charts-custom-button-collapse-left')]//div"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
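Once the driver has switched into the inner frame, the table markup becomes part of page_source, so it can be handed to BeautifulSoup (which the question already imports). A minimal sketch, assuming the table rows are plain <tr>/<td> elements inside that frame:
from bs4 import BeautifulSoup
# Parse the table that is rendered inside the userHtmlFrame
soup = BeautifulSoup(browser.page_source, "html.parser")
for row in soup.find_all("tr"):
    print([cell.get_text(strip=True) for cell in row.find_all("td")])
# Switch back to the top-level document before interacting with anything outside the frame
browser.switch_to.default_content()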

WebDriverWait on finding element by CSS Selector

I want to retrieve the price of the flight of this webpage using Python 3: https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o
At first I got an error which after many hours I realized was due to the fact that I wasn't giving the webdriver enough time to load all elements. So to ensure that it had enough time I added a time.sleep like so:
time.sleep(1)
This made it work! However, I've read and was advised not to use this solution and to use WebDriverWait instead. So after many hours and several tutorials I'm stuck trying to pinpoint the exact CSS class the WebDriverWait should wait for.
The closest I think I've got is:
WebDriverWait(d, 1).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price")))
Any ideas on what I'm missing?
You could use a CSS attribute = value selector to target the element, or, if that value is dynamic, a CSS selector combination to match by position.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")
#element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '[jstcache="9322"]')))
element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR , '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')))
print(element.text)
#driver.quit()
No results case:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
url ="https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o" #"https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-11-28;c:EUR;e:1;a:FR;sd:1;t:f;tt:o"
driver.get(url)
try:
    status = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'p[role=status]')))
    print(status.text)
except TimeoutException as e:
    element = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')))
    print(element.text)
#driver.quit()
I may be wrong, but I think you are trying to get the price of the flight trip.
If my assumption is correct, take a look at my approach. I find the Search Results list, then all the Itineraries inside it, loop over them and get all the price information. This is the best approach I can come up with while avoiding all the dynamic attributes.
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = 20
driver = Chrome()
driver.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/04jpl.2018-12-17;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")
# Get the Search Result List
search_results= WebDriverWait(driver, wait).until(EC.presence_of_element_located((By.CSS_SELECTOR , 'ol[class="gws-flights-results__result-list"]')))
# Loop through all the itineraries
for result in search_results.find_elements_by_css_selector('div[class*="gws-flights-results__collapsed-itinerary"]'):
    price = result.find_element_by_css_selector('div[class="gws-flights-results__itinerary-price"]')
    print(price.text)
Output
€18

Selenium can't find element, but element is on the https://login.aliexpress.com/ webpage

On the website, the Selenium script cannot find the login and password fields. I tried searching by XPath, CSS selector, name, and class name, but nothing worked.
from selenium import webdriver
from time import sleep
driver = webdriver.Firefox()
driver.get("https://login.aliexpress.com/")
driver.find_element_by_id("fm-login-id").send_keys("test_id")
driver.find_element_by_id("fm-login-password").clear()
driver.find_element_by_id("fm-login-password").send_keys("test_pass")
driver.find_element_by_id("fm-login-submit").click()`
I tried to do this with the help of Selenium IDE, and everything worked in the GUI. But after I exported the code to Python and ran it, the program gave an error that it could not find the element.
The login form is inside of a frame, you need to switch to it first.
from selenium import webdriver
from time import sleep
driver = webdriver.Firefox()
driver.get("https://login.aliexpress.com/")
frame = driver.find_element_by_id("alibaba-login-box")
driver.switch_to.frame(frame)
driver.find_element_by_id("fm-login-id").send_keys("test_id")
driver.find_element_by_id("fm-login-password").clear()
driver.find_element_by_id("fm-login-password").send_keys("test_pass")
driver.find_element_by_id("fm-login-submit").click()
However, as the desired elements are within an <iframe>, you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired elements to be clickable.
You can use the following solution:
Using CSS_SELECTOR:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://login.aliexpress.com/")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#alibaba-login-box[src^='https://passport.aliexpress.com/mini_login.htm?']")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.fm-text#fm-login-id"))).send_keys("test_id")
driver.find_element_by_css_selector("input.fm-text#fm-login-password").send_keys("test_pass")
driver.find_element_by_css_selector("input.fm-button#fm-login-submit").click()
Interim Browser Snapshot:
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Reference
You can find a relevant discussion in
Ways to deal with #document under iframe
