selenium and python3: selecting search box on www.rottentomatoes.com

I have a list of movies for which I want to get the reviews from www.rottentomatoes.com, but I have run into a snag.
What I want is to be able to pass the title of each movie to the website's search box and then process the result to get the review I want.
At present, I cannot get beyond the search stage, because I have not been able to successfully locate the search box.
My code is as shown below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
browser = webdriver.Chrome('/home/zona/chromedriver')
url = 'https://www.rottentomatoes.com/'
browser.get(url)
time.sleep(10)
try:
    element = WebdriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, '//body//input[@name="search"]')))
    element = browser.find_element_by_xpath('//body//input[@name="search"]')
    element.clear()
    element.send_keys("avatar")
except:
    print("could not find search box")
time.sleep(5)
browser.quit()
I get the output:
could not find search box
Can someone please help me locate what I am doing wrong?
Apologies if this is too basic; I am new to programming and to Python.

It is just a case-sensitivity issue.
You used WebdriverWait (lower-case d) instead of WebDriverWait.
Note: I used the traceback module to print the stack trace so you can see the exception details.
Try the following code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import traceback
browser = webdriver.Chrome('/home/zona/chromedriver')
url = 'https://www.rottentomatoes.com/'
browser.get(url)
time.sleep(5)
try:
    element = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, '//body//input[@name="search"]')))
    # element = browser.find_element_by_xpath('//body//input[@name="search"]')
    element.clear()
    element.send_keys("avatar")
except:
    traceback.print_exc()
    print("could not find search box")
time.sleep(5)
browser.quit()

Related

Selenium using WebDriverWait WebElement still doesn't print any text, recognizes string PYTHON

As the title says. I couldn't find a solution online. You'll see from the print statements that the program gets the text property from the WebElement object, but it is always an empty string even though I am using WebDriverWait.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
PATH = "C:\\Users\\Anthony\\Desktop\\chromedriver.exe"
driver = webdriver.Chrome(PATH)
website = "https://www.magicspoiler.com"
driver.get(website)
el_name = '//div[@class="set-card-2 pad5"]'
try:
    main = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, el_name)))
    articles = main.find_elements(By.XPATH, el_name)
    for article in articles:
        print(type(article.text))
        print(article.text)
finally:
    driver.quit()
I checked the HTML DOM and I don't see any text in the element you are looking for. They are simply images with anchor links and therefore you are getting blank responses.
As an alternative, I tried extracting the links instead, and that worked.
So, after refactoring your code, it looks like this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
PATH = "C:\\Users\\Anthony\\Desktop\\chromedriver.exe"
driver = webdriver.Chrome(PATH)
website = "https://www.magicspoiler.com"
driver.get(website)
el_name = '//div[@class="set-card-2 pad5"]/a'
try:
    main = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, el_name)))
    articles = main.find_elements(By.XPATH, el_name)
    for article in articles:
        # print(type(article.text))
        print(article.get_attribute("href"))
finally:
    pass
driver.quit()
Here is the response I got: a list of href links.
So, please check whether you want the links or really the text. If you want the text, then there is none that I can see in the DOM, as I said. You may have to traverse each link by clicking on it and check the navigated page for what you need, as in the sketch below.
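A minimal sketch of that traversal, continuing from the refactored snippet above (run it before the final driver.quit()); it assumes the links were already collected into articles, and the h1 locator is only a guess at where text might live on the card pages:
links = [a.get_attribute("href") for a in articles]
for link in links:
    driver.get(link)
    # Wait for some text-bearing element on the navigated page; the tag name
    # here is an assumption, adjust it to whatever the card pages actually use.
    heading = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "h1")))
    print(link, "->", heading.text)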

Selenium not going to next page in scraper

I'm writing my first real scraper and although in general it's been going well, I've hit a wall using Selenium. I can't get it to go to the next page.
Below is the head of my code. The rest of the code just prints data out in the terminal for now, and that's all working fine. It simply stops scraping at the end of page 1 and shows me my terminal prompt; it never starts on page 2. I would be grateful if anyone could make a suggestion. I've tried selecting the button at the bottom of the page I'm trying to scrape using both the relative and the full XPath (you're seeing the full one here), but neither works. I'm trying to click the right-arrow button.
I built in my own error message to indicate whether the driver successfully found the element by XPath or not. The error message fires when I execute my code, so I guess it's not finding the element. I just can't understand why not.
# Importing libraries
import requests
import csv
import re
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
import time
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
driver = webdriver.Chrome("/path/to/driver", options=options)
# Yes, I do have the actual path to my driver in the original code
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
time.sleep(5)
while True:
    try:
        driver.find_element_by_xpath('/html/body/div[1]/div[3]/div/div/form/div[3]/div/div/ul[1]/li[4]/a').click()
    except (TimeoutException, WebDriverException) as e:
        print("A timeout or webdriver exception occurred.")
        break
driver.quit()
What you can do is set up Selenium expected conditions (visibility_of_element_located, element_to_be_clickable) and use a relative XPath to select the next page element. Do all of this in a loop whose range is the number of pages you have to deal with.
XPath for the next page link:
//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a
Code could look like :
## imports
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")

## count the number of pages you have
els = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='pagination ctm-pagination']/ul[1]/li[last()]/a"))).get_attribute("data-current-page")

## loop; at the end of each iteration, click through to the following page
for i in range(int(els)):
    # *** scrape what you want ***
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='pagination ctm-pagination']/ul[1]/li[last()-1]/a"))).click()
You were pretty close with the while True and try-except logic. To go to the next page using Selenium and Python you have to induce WebDriverWait for element_to_be_clickable(), and you can use the following locator strategy:
Code Block:
driver.get("https://uk.eu-supply.com/ctm/supplier/publictenders?B=UK")
while True:
try:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(#class, 'state-active')]//following::li[1]/a[#href]"))).click()
print("Clicked for next page")
WebDriverWait(driver, 10).until(EC.staleness_of(driver.find_element_by_xpath("//a[contains(#class, 'state-active')]//following::li[1]/a[#href]")))
except (TimeoutException):
print("No more pages")
break
driver.quit()
Console Output:
Clicked for next page
No more pages

Python 2.7 Selenium No Such Element on Website

I'm trying to do some webscraping from a betting website:
As part of the process, I have to click on the different buttons under the "Favourites" section on the left side to select different competitions.
Let's take the ENG Premier League button as an example. I identified the button in the DOM (screenshot omitted).
The XPath is //*[@id="SportMenuF"]/div[3] and the ID is 91.
My code for clicking on the button is as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_path = "C:\Python27\Scripts\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("URL Removed")
content = driver.find_element_by_xpath('//*[@id="SportMenuF"]/div[3]')
content.click()
Unfortunately, I always get this error message when I run the script:
"no such element: Unable to locate element:
{"method":"xpath","selector":"//*[#id="SportMenuF"]/div[3]"}"
I have tried different identifiers such as CSS selector, ID and, as shown in the example above, the XPath. I tried using waits and explicit conditions, too. None of this has worked.
I also attempted scraping some values from the website without any success:
from selenium import webdriver
from selenium.webdriver.common.by import By
chrome_path = "C:\Python27\Scripts\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("URL removed")
content = driver.find_elements_by_class_name('price-val')
for entry in content:
print entry.text
Same problem, nothing shows up.
The website embeds an iframe from a different website. Could this be the cause of my problems? I tried scraping directly from the iframe URL, too, which didn't work either.
I would appreciate any suggestions.
Sometimes elements are either hidden behind an iframe or haven't loaded yet.
For the iframe check, try:
driver.switch_to.frame(0)
For the wait check, try:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '-put the x-path here-')))
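Combining both checks, a minimal sketch could look like the following; the frame locator (the first iframe on the page) is an assumption, and the XPath is the one from the question:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r"C:\Python27\Scripts\chromedriver_win32\chromedriver.exe")
driver.get("URL removed")

# Wait for the embedded frame and switch into it (assumes the first iframe is the right one).
WebDriverWait(driver, 10).until(
    EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))

# Now wait for the button inside the frame and click it.
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="SportMenuF"]/div[3]')))
button.click()

# Switch back to the top-level document when done with the frame.
driver.switch_to.default_content()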

why is this code showing a NoSuchElementException error? I checked the Chrome DOM and my XPath is able to find the targeted tag

Why is this code showing a NoSuchElementException error? I checked the Chrome DOM and my XPath is able to find the targeted tag.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
class Firefox():
    def test(self):
        base_url = 'https://oakliquorcabinet.com/'
        driver = webdriver.Chrome(executable_path=r'C:\Users\Vicky\Downloads\chromedriver')
        driver.get(base_url)
        search = driver.find_element(By.XPATH, '//div[@class="box-footer"]/button[2]')
        search.click()

ff = Firefox()
ff.test()
Selenium by default waits for the DOM to load and then tries to find the element. But the confirmation pop-up becomes visible only some time after the main page has loaded.
Use explicit wait to fix this issue.
add these imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
Change this line in the script:
search = WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located((By.XPATH, '//div[@class="box-footer"]/button[2]')))
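For context, a minimal sketch of the script with that change applied (same locator, URL and driver path as in the question) could look like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

base_url = 'https://oakliquorcabinet.com/'
driver = webdriver.Chrome(executable_path=r'C:\Users\Vicky\Downloads\chromedriver')
driver.get(base_url)

# Wait up to 10 seconds for the pop-up button instead of searching for it immediately.
search = WebDriverWait(driver, 10).until(
    expected_conditions.presence_of_element_located(
        (By.XPATH, '//div[@class="box-footer"]/button[2]')))
search.click()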

My scraper fails to get all the items from a webpage

I've written some code in Python in combination with Selenium to parse different product names from a webpage. There are a few "load more" buttons that become visible as the browser scrolls down. The webpage displays its full content only once it has been scrolled all the way to the bottom and there is no "load more" button left to click. My scraper seems to be doing well, but I'm not getting all the results. There are around 200 products on that page but I'm only getting 90 of them. What change should I make to my scraper to get them all? Thanks in advance.
The webpage I'm dealing with: Page_Link
This is the script I'm trying with:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("put_above_url_here")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".listing_item")))
for scroll in range(17):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)
    try:
        load = driver.find_element_by_css_selector(".lm-btm")
        load.click()
    except Exception:
        pass

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^=item_]"))):
    name = item.find_element_by_css_selector(".pro-name.el2").text
    print(name)

driver.quit()
Try the code below to get the required data:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.purplle.com/search?q=hair%20fall%20shamboo")
wait = WebDriverWait(driver, 10)
header = driver.find_element_by_tag_name("header")
driver.execute_script("arguments[0].style.display='none';", header)
while True:
    try:
        page = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".listing_item")))
        driver.execute_script("arguments[0].scrollIntoView();", page)
        page.send_keys(Keys.END)
        load = wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "LOAD MORE")))
        driver.execute_script("arguments[0].scrollIntoView();", load)
        load.click()
        wait.until(EC.staleness_of(load))
    except:
        break

for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^=item_]"))):
    name = item.find_element_by_css_selector(".pro-name.el2").text
    print(name)

driver.quit()
You should only use Selenium as a last resort.
A simple look around in the webpage showed the API it called to get your data.
It returns a JSON output with all the details:
Link
You can now just loop over and store in a dataframe easily.
It's very fast and produces fewer errors than Selenium.
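The endpoint itself isn't shown here, so the URL and JSON keys below are placeholders; a rough sketch of the idea, assuming the API returns a JSON payload containing a list of products, might be:
import requests
import pandas as pd

# Hypothetical endpoint: replace with the actual API URL the page calls
# (visible in the browser's network tab) and adjust the params/keys to match.
API_URL = "https://example.com/api/search"  # placeholder, not the real endpoint
resp = requests.get(API_URL, params={"q": "hair fall shampoo"})
resp.raise_for_status()
data = resp.json()

# Flatten the (assumed) list of product dicts into a DataFrame.
df = pd.DataFrame(data.get("products", []))
print(df.head())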
