I want to build a simple app to learn Selenium in Python using a custom google search url to search Reddit and scrape the results:
https://cse.google.com/cse/publicurl?cx=011171116424399119392:skuhhpapys8
I am trying to click on the first link in the results overlay that comes up after a search on the site.
I have so far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
url = "https://cse.google.com/cse/publicurl?cx=011171116424399119392:skuhhpapys8"
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)
python_button = driver.find_element_by_class_name('gsc-search-button')
inputElement = driver.find_element_by_id("gsc-i-id1")
soup_level1=BeautifulSoup(driver.page_source, 'lxml')
search = input("Ask Reddit")
inputElement.send_keys(search)
inputElement.send_keys(Keys.ENTER)
overlay = driver.find_element_by_class_name('gsc-results-wrapper-overlay')
first_link = driver.find_element_by_css_selector('a.gsc-title').click()
and I get the error
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: a.gsc-title
Any idea what I am missing? Is there something special I have to do to deal with an overlay like that? There are multiple elements matching that CSS selector; am I not targeting the first one?
The class for a link, as far as I can see, is not gsc-title, it's gs-title. Hence your main problem. But there are also a couple of other things:
There's nothing special about the overlay; there is no need to account for it. So the overlay = driver.<...> line is not required, unless you use it to make sure the results showed up (see the next point).
Changing the implicit wait (driver.implicitly_wait(30)) is not a good idea; use an explicit wait instead:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".gsc-results-wrapper-overlay")))
first_link = driver.find_element_by_css_selector('a.gs-title').click()
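Putting it all together, a minimal end-to-end sketch (using the same Selenium 3-style API as the question; the element IDs and class names are the ones quoted above and may change if Google updates the CSE widget):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://cse.google.com/cse/publicurl?cx=011171116424399119392:skuhhpapys8"
driver = webdriver.Firefox()
driver.get(url)
# type the query into the CSE search box and submit it
input_element = driver.find_element_by_id("gsc-i-id1")
input_element.send_keys(input("Ask Reddit: "))
input_element.send_keys(Keys.ENTER)
# wait for the results overlay, then click the first result title
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".gsc-results-wrapper-overlay")))
driver.find_element_by_css_selector("a.gs-title").click()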
I am trying to scrape the blue highlighted portion (the hotel's review rating) from a Google search results page:
When I use inspect element, it shows: span class="YhemCb". I have tried various soup.find and soup.find_all commands, but nothing I have tried produces any output so far. What command should I use to scrape this part?
Google uses JavaScript to render most of its web elements, so using something like requests and BeautifulSoup is unfortunately not enough.
Instead, use Selenium! It essentially allows you to control a browser using code.
First, you will need to navigate to the Google page you wish to scrape:
google_search = 'https://www.google.com/search?q=courtyard+by+marriott+fayetteville+fort+bragg'
driver.get(google_search)
Then, you have to wait until the review page loads in the browser.
This is done using WebDriverWait: you have to specify an element that needs to appear on the page. The [data-attrid="kc:/local:one line summary"] span css selector allows me to select the review info about the hotel.
timeout = 10
expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '[data-attrid="kc:/local:one line summary"] span'))
review_element = WebDriverWait(driver, timeout).until(expectation)
And finally, print the rating
print(review_element.get_attribute('innerHTML'))
Here's the full code in case you want to play around with it
import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# setup selenium (I am using chrome here, so chrome has to be installed on your system)
chromedriver_autoinstaller.install()
options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)
# navigate to google
google_search = 'https://www.google.com/search?q=courtyard+by+marriott+fayetteville+fort+bragg'
driver.get(google_search)
# wait until the page loads
timeout = 10
expectation = EC.presence_of_element_located((By.CSS_SELECTOR, '[data-attrid="kc:/local:one line summary"] span'))
review_element = WebDriverWait(driver, timeout).until(expectation)
# print the rating
print(review_element.get_attribute('innerHTML'))
Note: Google is notoriously defensive against anyone trying to scrape it. On the first few attempts you might be successful, but eventually you will have to deal with Google's CAPTCHA.
To work around that, I would suggest using a dedicated search engine scraper; something like their quickstart guide can get you started!
Disclaimer: I work at Oxylabs.io
I’ve run into a problem trying to use Selenium ChromeDriver to scroll down the sidebar of a google maps results page. I am trying to get to the 6th result down but the result does not fully load until you scroll down. Using the find_element_by_xpath method, I am successfully able to access results 1-5 and click into them individually, but when trying to use the actions.move_to_element(link).perform() method to scroll to the 6th element, it does not work and throws an error message.
The error that I get is:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element:
However, I know this element exists because when I manually scroll and more results are loaded, the Xpath works correctly. What am I doing wrong? I’ve spent many hours trying to solve this and I haven’t been able to solve with the available content out there. I appreciate any help or insights you can offer, thank you!
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as soup
import time
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.google.com/maps")
time.sleep(7)
page = soup(driver.page_source, 'html.parser')
#find the searchbar, enter search, and hit return
search = driver.find_element_by_id('searchboxinput')
search.send_keys("dentists in Austin Texas")
search.send_keys(Keys.RETURN)
driver.maximize_window()
time.sleep(7)
#I want to get the 6th result down but it requires a sidebar scroll to load
link = driver.find_element_by_xpath("//*[@id='pane']/div/div[1]/div/div/div[4]/div[1]/div[13]/div/a")
actions = ActionChains(driver)
actions.move_to_element(link).perform()
link.click()
time.sleep(5)
driver.back()
I found a solution that works: target the element by XPath through Selenium's JavaScript interface, and then run two commands in a single instruction (locating the element and scrolling it).
driver.execute_script("var el = document.evaluate('/html/body/jsl/div[3]/div[10]/div[8]/div/div[1]/div/div/div[4]/div[1]', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue; el.scroll(0, 5000);")
This is the only solution that worked for me.
The search results in the Google map are located with the //div[contains(@aria-label,'dentists in Austin Texas')]//div[contains(@jsaction,'mouseover')] XPath.
So, to select the 6th element there, you can do the following:
from selenium.webdriver.common.action_chains import ActionChains
results = driver.find_elements_by_xpath('//div[contains(@aria-label,"dentists in Austin Texas")]//div[contains(@jsaction,"mouseover")]')
ActionChains(driver).move_to_element(results[5]).click().perform()  # index 5 is the 6th result
I was just implementing scrolling on the Google Maps sidebar; it's working on my side. Check this code, please:
# selecting scroll body
driver.find_element_by_xpath('/html/body/div[3]/div[9]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[1]').click()
#start scrolling your sidebar
html = driver.find_element_by_xpath('/html/body/div[3]/div[9]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[1]')
html.send_keys(Keys.END)
Also add the Keys import:
from selenium.webdriver.common.keys import Keys
I hope it helps.
By the way, I have implemented scraping of Google Maps with its available data and used the above code to scroll; if you run into any problem, let me know.
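If a single END key press doesn't load enough results, you can repeat it until the list is long enough. A rough sketch, assuming the driver session from the question and reusing the sidebar XPath and result XPath from the answers above (the retry count and sleep are arbitrary):
import time
from selenium.webdriver.common.keys import Keys
results_xpath = '//div[contains(@aria-label,"dentists in Austin Texas")]//div[contains(@jsaction,"mouseover")]'
sidebar = driver.find_element_by_xpath('/html/body/div[3]/div[9]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[1]')
for _ in range(5):  # press END a few times at most
    if len(driver.find_elements_by_xpath(results_xpath)) >= 6:
        break  # enough results have loaded
    sidebar.send_keys(Keys.END)  # scroll the sidebar to trigger loading of more results
    time.sleep(2)  # give the new results time to render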
There is a site that streams YouTube videos. I want to get a playlist of them, so I use the Selenium webdriver to get the needed element: the div with class name ytp-title-text, where the YouTube link is located.
It is located here, for example, when I use the browser console to find the element:
<div class="ytp-title-text"><a class="ytp-title-link yt-uix-sessionlink" target="_blank" data-sessionlink="feature=player-title" href="https://www.youtube.com/watch?v=VyCY62ElJ3g">Fears - Jono McCleery</a><div class="ytp-title-subtext"><a class="ytp-title-channel-name" target="_blank" href=""></a></div></div>
I wrote a simple script for testing:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
driver = webdriver.Firefox()
driver.get('http://awsmtv.com')
try:
    element = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.CLASS_NAME, "ytp-title-text"))
    )
finally:
    driver.quit()
But no element is found and a timeout exception is thrown. I cannot understand what actions Selenium needs to perform to get the full page source.
The required link is hidden and is also located inside an iframe. Try the below to locate it:
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it("tvPlayer_1"))
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "ytp-title-link")))
    print(element.get_attribute('href'))
finally:
    driver.quit()
Just saw this element is inside an iframe... You need to switch to the iframe first: find it by class name, iframe = ...(By.CLASS_NAME, "player"), then switch to it with driver.switch_to.frame(iframe), and you should now be able to get the wanted element :)
An XPath locator like this one will work (or your own locator): "//a[@class='ytp-title-link yt-uix-sessionlink']".
You then need to get the href property from the element for the YouTube video URL, or the element's text for the song title.
If it is still not working, I can suggest getting the page source with html = driver.page_source, which gives you the HTML of the page, and extracting the info you want with a regex.
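To make that concrete, a small sketch of the switch-then-extract approach described above, continuing from the question's script (the "player" class name is an assumption taken from this answer; adjust it to whatever the iframe actually uses):
# locate the player iframe and switch into it (class name assumed)
iframe = driver.find_element_by_class_name("player")
driver.switch_to.frame(iframe)
# the YouTube title link is now reachable
link = driver.find_element_by_xpath("//a[@class='ytp-title-link yt-uix-sessionlink']")
print(link.get_attribute("href"))  # the video URL
print(link.text)                   # the song title
driver.switch_to.default_content()  # switch back to the main page when done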
I've written a script in Python, in combination with Selenium, to get some titles out of some images on a webpage. The thing is, the content I would like to parse is located near the bottom of that page, so if I try to grab it the conventional way, the attempt fails.
So I used some JavaScript within my scraper to let the browser scroll to the bottom, and it worked.
However, I don't think that's a good solution to keep, so I tried .scrollIntoView(), but that didn't work either. What would be the ideal way to serve the purpose?
This is my script:
from selenium import webdriver
import time
URL = "https://www.99acres.com/supertech-cape-town-sector-74-noida-npxid-r922?sid=UiB8IFFTIHwgUyB8IzMxIyAgfCAxIHwgNyM0MyMgfCA4MjEyIHwjNSMgIHwg"
driver = webdriver.Chrome()
driver.get(URL)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") #I don't wish to keep this line
time.sleep(3)
for item in driver.find_elements_by_css_selector("#carousel img"):
    print(item.get_attribute("title"))
driver.quit()
Try the below code; it should allow you to scroll to the required node and scrape the images:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
banks = driver.find_element_by_id("xidBankSection")
driver.execute_script("arguments[0].scrollIntoView();", banks)
images = WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#carousel img")))
for image in images:
    print(image.get_attribute("title"))
Some explanation: initially those images are absent from the source code and are generated inside the BankSection only once you scroll to it, so you need to scroll down to the BankSection and wait until the images are generated.
You can try the below lines of code:
recentList = driver.find_elements_by_css_selector("#carousel img")
for item in recentList:
    driver.execute_script("arguments[0].scrollIntoView();", item)
    print(item.get_attribute("title"))
import webbrowser
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://www.suntrust.com/')
browser.implicitly_wait(10)
elem = browser.find_element_by_xpath('//*[@id="sign-on-3A69E29D-79E0-403E-9352-5261239ADD89-user"]')
elem.send_keys('your-username')
I'm having two problems:
1) The window doesn't open up in full screen, meaning the username field isn't physically visible. Also, how do I open the URL in a new tab instead of a new window?
2) Other posts suggest that the element is faked by JavaScript so that webdriver can't see it.
I've tried find_element_by with all the other locator strategies.
Your question should be answered by a simple line of code that you need to include:
browser.maximize_window()
This maximises your window. Another option is to set a specific window size, like:
driver.set_window_size(1280, 1024)
Either approach will get the browser open at the size you need.
Another point I would make: if you're a beginner, try using CSS selectors instead of XPath; they are generally faster. Please see the detailed post on SQA about what makes a good locator.
For your case, the CSS Selector for the sign in field would be
driver.find_element_by_css_selector('input#sign-on-3A69E29D-79E0-403E-9352-5261239ADD89-user')
For password, it would be
driver.find_element_by_css_selector('input#sign-on-3A69E29D-79E0-403E-9352-5261239ADD89-password')
For Sign On button it would be
driver.find_element_by_css_selector('button.suntrust-login-button')
Please read more about CSS Selectors and try using them more often in your code.
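Putting those pieces together, here is a minimal sketch using the selectors quoted above ('your-username' and 'your-password' are placeholders, and the field IDs are the ones from the question, so they may change if the site updates):
from selenium import webdriver
browser = webdriver.Chrome()
browser.maximize_window()  # or browser.set_window_size(1280, 1024)
browser.get('https://www.suntrust.com/')
browser.implicitly_wait(10)
# fill both fields and submit, using the CSS selectors shown above
browser.find_element_by_css_selector('input#sign-on-3A69E29D-79E0-403E-9352-5261239ADD89-user').send_keys('your-username')
browser.find_element_by_css_selector('input#sign-on-3A69E29D-79E0-403E-9352-5261239ADD89-password').send_keys('your-password')
browser.find_element_by_css_selector('button.suntrust-login-button').click()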
Here is the answer to your question:
The XPath you have constructed is not unique; it matches exactly 2 elements in the HTML DOM, so Selenium was trying to send_keys to the first matching element, which was invisible, hence the element not visible error. The XPath used in the following code block identifies the User ID field uniquely and sends the text:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
browser = webdriver.Chrome(chrome_options=options, executable_path="C:\\Utility\\BrowserDrivers\\chromedriver.exe")
browser.get('https://www.suntrust.com/')
browser.implicitly_wait(15)
elem = browser.find_element_by_xpath('//section[@role="main"]//input[@id="sign-on-3A69E29D-79E0-403E-9352-5261239ADD89-user"]')
elem.send_keys('your-username')
Let me know if this answers your question.
If you use an absolute XPath, then you can send the text to the textbox.
The below code will do that:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.maximize_window() # to open full size window
browser.get('https://www.suntrust.com/')
# browser.implicitly_wait(10)
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="sign-on-3A69E29D-79E0-403E-9352-5261239ADD89-user"]')))
elem = browser.find_element_by_xpath("//div[#id='suntrust-login-form-herosignon']/div[2]/form/div[1]/input[1]")
elem.send_keys('your-username')
elem1 = browser.find_element_by_xpath("//div[#id='suntrust-login-form-herosignon']/div[2]/form/div[2]/input[1]")
elem1.send_keys('your-password')