Problems Scraping with Selenium (xpath) in Tripadvisor - python

I am new in python and scraping.
I am trying to extract information about Tripadvisor. First of all, I need Selenium for crawling but when I run the program in diferents times the paths change.
I show you a example:
import urllib.request
import urllib.parse
from selenium import webdriver
import csv
from selenium.webdriver.common.action_chains import ActionChains
import time
from _datetime import datetime
from selenium.webdriver.common.keys import Keys
options=webdriver.ChromeOptions()
options.headless=False
prefs={"profile.default_content_setting_values.notofications" :2}
options.add_experimental_option("prefs",prefs)
chromedriver = "C:/Users/rober/OneDrive/Escritorio/tfm/chromedriver.exe"
driver=webdriver.Chrome(chromedriver)
driver.maximize_window()
time.sleep(5)
driver.get("https://www.tripadvisor.es/")
//*[#id="component_5"]/div/div/div/span[3]/div/div/div/a/span[2]
#Click Restaurants
driver.find_element_by_xpath('//*[#id="component_5"]/div/div/div/span[3]/div/div/div/a').click()
#Introduce localization
driver.find_element_by_xpath('//*[#id="BODY_BLOCK_JQUERY_REFLOW"]/div[14]/div/div/div[1]/div[1]/div/input').send_keys("madrid")
In the last part of code, sometimes div[14] is div[13] or div[15]. is it possible absolute xpath or use other form?
Thank you

You should not use Xpath with a longer path. That makes the test brittle
Please use shorter xpaths. An Xpath like this "//input[#class="Smftgery"]" should help you click on the same input field.
Also to click on Restaurantes, you can use //*[text()='Restaurantes']

Your Xpath is too specific, find some uniqueness in the deeper levels of the DOM. This uniqueness can be also a combination of multiple levels.
e.g. if there is only one input field inside BODY_BLOCK_JQUERY_REFLOW you can ignore all the levels in between:
'//*[#id="BODY_BLOCK_JQUERY_REFLOW"]//input'
or use some other attribute of input e.g. if it has a data attribute://input[#data="the-data-of-the-input-field"]

Related

Python - Selenium (XPath) "Message: no such element: Unable to locate element"

im trying to interact with a website. I want to apply some filters but i have an error, my code does not recognize the xpath.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
options=Options()
options.add_argument('--windoes-size=1920,1080')
driver=webdriver.Chrome(options=options)
driver.get("https://dexscreener.com/polygon/uniswap")
folder=driver.find_element(By.XPATH,'//button[#class="chakra-button chakra-menu__menu-button custom-tpjv8u"]')
folder.click()
folder=driver.find_element(By.XPATH,'//button[#id="menu-list-36-menuitem-33"]')
folder.click()
You should use another XPATH for option choosing.
Seems like ids for options may be generated dynamically.
So you can try following XPATHs for different filters:
//button[#value="m5"] # Last 5 minutes button
//button[#value="h1"] # Last hour
//button[#value="h6"] # Last 6 hours
//button[#value="h24"] # Last 24 hours
This way it works fine for me.
Have you tried using CSS_SELECTOR? I was working with Selenium recently, and sometimes when XPATH was not working, CSS_SELECTOR was.
folder=driver.find_element(By.CSS_SELECTOR, "selector here")

How to select a value from drop down menu in python selenium

I am writing a python script which will call a webpage and will select an option from the drop down to download that file. To do this task, I am using chropath. It is a browser extension which can give you the relative xpath or id for any button or field on the webpage and using that we can call it from python selenium script.
Above image shows the drop down menu in which I have to select 2019 as year and the download the file. In the lower part of the image, you can see that I have used chropath to get the relative xpath of the drop down menu which is //select[#id='rain']
Below is the code I am using:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("<URL>")
driver.maximize_window()
grbf = driver.find_element_by_xpath("//select[#id='rain']")
grbf.send_keys('2019')
grbf_btn = (By.XPATH, "//form[1]//input[1]")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable(grbf_btn)).click()
from the above code, you can see that I am using xpath to select the drop down grbf = driver.find_element_by_xpath("//select[#id='rain']") and then sending keys as 2019 i.e. grbf.send_keys('2019') and after that I am calling download button to download it. But for some reason, its always selecting year 1999 from the drop down. I am not able to understand what is wrong in this. Is this correct approach to solve this. Please help. Thanks
I had the same problem time ago. Try this:
from selenium.webdriver.support.ui import Select
grbf = Select(driver.find_element_by_xpath("//select[#id='rain']"))
grbf.select_by_value('2019')
In the select_by_value() you have to use the value of the element in the dropdown.
By the way, if an element has id, use it.
grbf = Select(driver.find_element_by_id('rain'))
Try below code:
select = Select(driver.find_element_by_xpath("//select[#id='rain']"))
select.select_by_visible_text('2019')
Another approches to deal with dropdown:
Using Index of Dropdown :
select.select_by_index(Paass index)
Using valueof Dropdown :
select.select_by_value('value of element')
Using visible text of Dropdown :
select.select_by_visible_text('element_text')
In my opinion, I don't think this is the correct approach. You try to select the option which is dropdown (not a text box like ), so send key command does not work.
What you need to do is try to inspect HTML changing when clicking the dropdown and try to XPath for an option that you want to select.
If you still stuck at this problem, I recommend using katalon recorder which is a chrome extension to allow you to record and do UI testing

using Selenium in Python to click on the right checkbox

I am very new to using selenium but I cannot find way around a very simple task.
I need to be able to click on the element that specifies bedrooms: 2.
I have used I don't know how many references by xpath, by id, by name, by class but selenium just won't find the element. I also have tried to browse the internet but could not find solutions that help me.
Here's the sanpshot
For instance: using
driver.find_element_by_id('agatha_bedrooms1588844814480_advancedSearch1').click()
This won't work. Selenium cannot find the element. It seems that this element is within another element but I don't understand how to access it.
Could you help me please?
Thanks a lot to you.
G
The ids seem to be dynamically generated, in which case you cannot rely on them. Try with this xpath:
driver.find_element_by_xpath("//*[#name='bedrooms' and #value='2']/following::label").click()
Although it is generally good practice to work with waits. So something like:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[#name='bedrooms' and #value='2']/following::label"))).click()
Ensure to have these imports for the wait to work
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
Thanks a lot, after multiple and multiple trials I could get around that way:
elemt = driver.find_element_by_xpath("//*#name='bedrooms']").find_element_by_xpath("//[#value='2']")
idvar = elemt.get_attribute("id")
elemt2 = driver.find_element_by_xpath("//label[#for='" + idvar + "']")
elemt2.click()
It seems that the checkbox was hidden under the label (?!) so that Selenium did not want to click on it.
If the checkbox is inside of an iframe, do this:
# basically just select the iframe any way you want
frame = driver.find_element_by_css_selector("iframe")
driver.switch_to.frame(frame)
driver.find_element_by_id('agatha_bedrooms1588844814480_advancedSearch1').click()
edit:
I've found solution. Kinda ugly but works lol
element = driver.find_elements_by_css_selector("input[name=bedrooms][value='2']")[0]
element.find_element_by_xpath("..").click()
You can try this xpath. hope its helps:
//*[#name='bedrooms']/following::*/*[text()='2']

How do I click an element using selenium and beautifulsoup?

How do I click an element using selenium and beautifulsoup in python? I got these lines of code and I find it difficult to achieve. I want to click every element in every iteration. There are no pagination or next page. There are only like about 10 elements and after clicking the last element, it should stop. Does anyone know what should I do. Here are my code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
import urllib
import urllib.request
from bs4 import BeautifulSoup
chrome_path = r"C:\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
url = 'https://www.99.co/singapore/condos-apartments/a-treasure-trove'
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html,'lxml')
details = soup.select('.FloorPlans__container__rwH_w') //Whole container of the result
for d in details:
picture = d.find('span',{'class':'Tappable-inactive'}).click() //the single element.
print(d)
driver.close()
Here is the site https://www.99.co/singapore/condos-apartments/a-treasure-trove . I want to scrape the details and the image in every floor plans section but it is difficult because the image only appears after you click the specific element. I can only get the details except for the image itself. Try it yourself so that you know what I mean.
EDIT:
I tried this method
for d in driver.find_elements_by_xpath('//*[#id="floorPlans"]/div/div/div/div/span'):
d.click()
The problem is it clicks too fast that the image couldn't load. And also im using selenium here. Is there any method like selecting a beautifulsoup like this format picture = d.find('span',{'class':'Tappable-inactive'}).click() ?
You cannot interact with website widgets by using beautifulSoup you need to work with selenium. There are 2 ways to handle this problem.
First is to get the main wrapper (class) of the 10 elements and then iterate to each child element of the main class.
You can get the element by xpath and increment the last number in xpath by one in each iteration to move to the next element.
I print some result to check your code.
"details" only has one item.
And "picture" is not element. (So it's not clickable.)
details = soup.select('.FloorPlans__container__rwH_w')
print(details)
print(len(details))
for d in details:
print(d)
picture = d.find('span',{'class':'Tappable-inactive'})
print(picture)
Output:
For your edited version, you can check images have been visible before you do click().
Use visibility_of_element_located to do.
Reference: https://selenium-python.readthedocs.io/waits.html

Selenium: How to parse through a code after using selenium to click a dropdown

I'm trying to webscrape trough this webpage https://www.sigmaaldrich.com/. Up to now I have achieved for the code to use the requests method to use the search bar. After that, I want to look for the different prices of the compounds. The html code that includes the prices is not visible until the Price dropdown has been clicked. I have achieved that by using selenium to click all the dropdowns with the desired class. But after that, I do not know how to get the html code of the webpage that is generated after clicking the dropdowns and where the price is placed.
Here's my code so far:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep
#get the desired search terms by imput
name=input("Reagent: ")
CAS=input("CAS: ")
#search using the name of the compound
data_name= {'term':name, 'interface':'Product%20Name', 'N':'0+',
'mode':'mode%20matchpartialmax', 'lang':'es','region':'ES',
'focus':'product', 'N':'0%20220003048%20219853286%20219853112'}
#search using the CAS of the compound
data_CAS={'term':CAS, 'interface':'CAS%20No.', 'N':'0','mode':'partialmax',
'lang':'es', 'region':'ES', 'focus':'product'}
#get the link of the name search
r=requests.post("https://www.sigmaaldrich.com/catalog/search/", params=data_name.items())
#get the link of the CAS search
n=requests.post("https://www.sigmaaldrich.com/catalog/search/", params=data_CAS.items())
#use selenium to click in the dropdown(only for the name search)
driver=webdriver.Chrome(executable_path=r"C:\webdrivers\chromedriver.exe")
driver.get(r.url)
dropdown=driver.find_elements_by_class_name("expandArrow")
for arrow in dropdown:
arrow.click()
As I said, after this I need to find a way to get the html code after opening the dropdowns so that I can look for the price class. I have tried different things but I don't seem to get any working solution.
Thanks for your help.
You can try using the Selenium WebDriverWait. WebDriverWait
WebDriverWait wait = new WebDriverWait(driver, 30);
WebElement element = wait.until(ExpectedConditions.presenceOfElementLocated(css));
First, You should use WebDriverWait as Austen had pointed out.
For your question try this:
from selenium import webdriver
driver=webdriver.Chrome(executable_path=r"C:\webdrivers\chromedriver.exe")
driver.get(r.url)
dropdown=driver.find_elements_by_class_name("expandArrow")
for arrow in dropdown:
arrow.click()
html_source = driver.page_source
print(html_source)
Hope this helps you!

Categories