I've written a script in Python, in combination with Selenium, to grab the titles of some images on a webpage. The problem is that the content I'd like to parse is located near the bottom of the page, so if I try to grab it the conventional way, the browser fails.
So I used some JavaScript within my scraper to make the browser scroll to the bottom, and it worked.
However, I don't think that's a good solution to keep, so I tried .scrollIntoView(), but that didn't work either. What would be the ideal way to achieve this?
This is my script:
from selenium import webdriver
import time
URL = "https://www.99acres.com/supertech-cape-town-sector-74-noida-npxid-r922?sid=UiB8IFFTIHwgUyB8IzMxIyAgfCAxIHwgNyM0MyMgfCA4MjEyIHwjNSMgIHwg"
driver = webdriver.Chrome()
driver.get(URL)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") #I don't wish to keep this line
time.sleep(3)
for item in driver.find_elements_by_css_selector("#carousel img"):
    print(item.get_attribute("title"))
driver.quit()
Try the code below; it should allow you to scroll to the required node and scrape the images:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
banks = driver.find_element_by_id("xidBankSection")
driver.execute_script("arguments[0].scrollIntoView();", banks)
images = WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#carousel img")))
for image in images:
    print(image.get_attribute("title"))
Some explanation: initially those images are absent from the page source and are generated inside the BankSection only once you have scrolled to it, so you need to scroll down to the BankSection and wait until the images are generated.
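For completeness, here is a minimal end-to-end sketch of this approach against the question's URL (assuming the xidBankSection id and the #carousel img selector are still valid on that page):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = "https://www.99acres.com/supertech-cape-town-sector-74-noida-npxid-r922?sid=UiB8IFFTIHwgUyB8IzMxIyAgfCAxIHwgNyM0MyMgfCA4MjEyIHwjNSMgIHwg"

driver = webdriver.Chrome()
driver.get(URL)
# scroll the lazy-loaded section into view, then wait for the images
banks = driver.find_element_by_id("xidBankSection")
driver.execute_script("arguments[0].scrollIntoView();", banks)
images = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#carousel img")))
for image in images:
    print(image.get_attribute("title"))
driver.quit()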
You can try the lines of code below:
recentList = driver.find_elements_by_css_selector("#carousel img")
for item in recentList:
    driver.execute_script("arguments[0].scrollIntoView();", item)
    print(item.get_attribute("title"))
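Note that if the carousel itself is lazy-loaded, find_elements may return an empty list until the section is on screen, so it can help to scroll the containing section into view first. A sketch combining both ideas (xidBankSection as in the other answer):
section = driver.find_element_by_id("xidBankSection")
driver.execute_script("arguments[0].scrollIntoView();", section)
for item in driver.find_elements_by_css_selector("#carousel img"):
    # bring each image into the viewport so its attributes are populated
    driver.execute_script("arguments[0].scrollIntoView();", item)
    print(item.get_attribute("title"))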
I've run into a problem trying to use the Selenium ChromeDriver to scroll down the sidebar of a Google Maps results page. I am trying to get to the 6th result down, but that result does not fully load until you scroll down. Using the find_element_by_xpath method, I can successfully access results 1-5 and click into them individually, but when I try to use the actions.move_to_element(link).perform() method to scroll to the 6th element, it does not work and throws an error message.
The error that I get is:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element:
However, I know this element exists, because when I manually scroll and more results are loaded, the XPath works correctly. What am I doing wrong? I've spent many hours trying to solve this and haven't been able to with the content available out there. I appreciate any help or insights you can offer, thank you!
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as soup
import time
PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so backslashes aren't treated as escapes
driver = webdriver.Chrome(PATH)
driver.get("https://www.google.com/maps")
time.sleep(7)
page = soup(driver.page_source, 'html.parser')
#find the searchbar, enter search, and hit return
search = driver.find_element_by_id('searchboxinput')
search.send_keys("dentists in Austin Texas")
search.send_keys(Keys.RETURN)
driver.maximize_window()
time.sleep(7)
#I want to get the 6th result down but it requires a sidebar scroll to load
link = driver.find_element_by_xpath("//*[@id='pane']/div/div[1]/div/div/div[4]/div[1]/div[13]/div/a")
actions = ActionChains(driver)
actions.move_to_element(link).perform()
link.click()
time.sleep(5)
driver.back()
I found a solution that works: target the element by XPath through Selenium's JavaScript interface, then execute two commands in a single instruction (targeting and scrolling). Note that in Python the method is execute_script, not executeScript:
driver.execute_script("var el = document.evaluate('/html/body/jsl/div[3]/div[10]/div[8]/div/div[1]/div/div/div[4]/div[1]', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue; el.scroll(0, 5000);")
This is the only solution that worked for me.
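If the fixed el.scroll offset feels brittle, a hedged variant is to scroll the results panel in steps so Maps keeps appending batches (the aria-label XPath is an assumption about the current Maps DOM, borrowed from the other answer here):
import time

# hypothetical: locate the scrollable results panel by its aria-label
panel = driver.find_element_by_xpath("//div[contains(@aria-label, 'dentists in Austin Texas')]")
for _ in range(5):
    # jump to the bottom of the panel so Maps appends the next batch of results
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight;", panel)
    time.sleep(2)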
The search results in the Google Maps page are located with the //div[contains(@aria-label,'dentists in Austin Texas')]//div[contains(@jsaction,'mouseover')] XPath.
So, to select the 6th element there (remember that Python lists are 0-indexed), you can do the following:
from selenium.webdriver.common.action_chains import ActionChains
results = driver.find_elements_by_xpath('//div[contains(@aria-label,"dentists in Austin Texas")]//div[contains(@jsaction,"mouseover")]')
ActionChains(driver).move_to_element(results[5]).click().perform()
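Since the results load lazily, you may want to wrap the lookup in an explicit wait. A short sketch using the same XPath:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

results = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//div[contains(@aria-label,"dentists in Austin Texas")]//div[contains(@jsaction,"mouseover")]')))
ActionChains(driver).move_to_element(results[5]).click().perform()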
I was just implementing scrolling on the Google Maps sidebar; it's working on my side. Please check this code:
# select the scrollable sidebar body
sidebar = driver.find_element_by_xpath('/html/body/div[3]/div[9]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[1]')
sidebar.click()
# start scrolling the sidebar
sidebar.send_keys(Keys.END)
Also add the Keys import:
from selenium.webdriver.common.keys import Keys
I hope it helps.
By the way, I have implemented scraping of Google Maps with its available data and used the above code to scroll. If you run into any problem, let me know.
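If a single Keys.END doesn't load enough results (Maps appends them in batches), a hedged loop that keeps paging until the result count stops growing could look like this (the mouseover XPath is an assumption based on the other answer in this thread):
import time

previous = -1
while True:
    sidebar.send_keys(Keys.END)
    time.sleep(2)  # give Maps time to append the next batch
    count = len(driver.find_elements_by_xpath('//div[contains(@jsaction,"mouseover")]'))
    if count == previous:  # no new results loaded, stop
        break
    previous = count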
I've been learning how to use Selenium to parse data, and I've been doing alright with that process. Now I'm trying something different: I found data to parse, but the page provides an export button, which sounds like a quicker solution, so I thought I'd have a stab at it. But I'm not quite understanding why it's not working:
browser = webdriver.Chrome()
url = 'https://www.rotowire.com/football/injury-report.php'
browser.get(url)
button = browser.find_elements_by_xpath('//*[#id="injury-report"]/div[2]/div[2]/button[2]')
button.click()
browser.close()
I just want to click the export-CSV button on the page.
Also, I haven't looked yet, but my next step would be to specify where to save the CSV file it exports. Right now, it defaults to the Downloads folder. Is there a way to specify a location without changing the default? And is there a way to specify a file name?
Try the code below to click the required button:
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()
url = 'https://www.rotowire.com/football/injury-report.php'
browser.get(url)
button = wait(browser, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "is-csv")))
button.click()
browser.close()
Also check how to save a file to a specific folder.
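For the download-folder part of the question: with Chrome you can point downloads at a folder of your choice through ChromeOptions before creating the driver. A minimal sketch (the folder path is a placeholder); note that Chrome doesn't let Selenium choose the file name, so rename the file afterwards, e.g. with os.rename:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
    "download.default_directory": r"C:\my\downloads",  # placeholder target folder
})
browser = webdriver.Chrome(options=options)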
I want to build a simple app to learn Selenium in Python, using a custom Google search URL to search Reddit and scrape the results:
https://cse.google.com/cse/publicurl?cx=011171116424399119392:skuhhpapys8
I am trying to click on the first link in the overlay that comes up after a search on the site (screenshot omitted).
I have so far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
url = "https://cse.google.com/cse/publicurl?cx=011171116424399119392:skuhhpapys8"
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)
python_button = driver.find_element_by_class_name('gsc-search-button')
inputElement = driver.find_element_by_id("gsc-i-id1")
soup_level1=BeautifulSoup(driver.page_source, 'lxml')
search = input("Ask Reddit")
inputElement.send_keys(search)
inputElement.send_keys(Keys.ENTER)
overlay = driver.find_element_by_class_name('gsc-results-wrapper-overlay')
first_link = driver.find_element_by_css_selector('a.gsc-title').click()
and I get the error
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: a.gsc-title
Any idea what I am missing? Is there something special I have to do to deal with an overlay like that? There are multiple elements matching that CSS selector; am I not targeting the first?
The class for a link, as far as I can see, is not gsc-title, it's gs-title. Hence your main problem. But also a couple of other things:
There's nothing special about the overlay, no need to account for it. So the overlay = driver.<...> line is not required, unless you use it to make sure the results showed up (see the next point).
A large implicit wait (driver.implicitly_wait(30)) is not a good idea; use an explicit wait instead:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".gsc-results-wrapper-overlay")))
driver.find_element_by_css_selector('a.gs-title').click()
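Putting it together, a minimal sketch of the whole flow under those fixes (the query string is just an example):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://cse.google.com/cse/publicurl?cx=011171116424399119392:skuhhpapys8")
# wait for the search box, type a query, and submit
inputElement = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, "gsc-i-id1")))
inputElement.send_keys("learn python")  # example query
inputElement.send_keys(Keys.ENTER)
# wait until the results overlay shows up, then click the first result link
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".gsc-results-wrapper-overlay")))
driver.find_element_by_css_selector("a.gs-title").click()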
I am trying to automate logging into a website (http://www.phptravels.net/) using Selenium - Python on Chrome. This is an open website used for automation tutorials.
I am trying to click an element to open a drop-down (My Account button at the top navbar) which will then give me the option to login and redirect me to the login page.
The HTML is nested with many div and ul/li tags.
I have tried various lines of code but haven't been able to make much progress.
driver.find_element_by_id('li_myaccount').click()
driver.find_element_by_link_text(' Login').click()
driver.find_element_by_xpath("//*[@id='li_myaccount']/ul/li[1]/a").click()
These are some of the examples that I tried out. All of them failed with the error "element not visible".
How do I find these elements? Even the XPath lookup throws this error.
I have not added any waits to my code.
Any ideas how to proceed further?
Hope this code will help:
from selenium import webdriver
from selenium.webdriver.common.by import By
url="http://www.phptravels.net/"
d=webdriver.Chrome()
d.maximize_window()
d.get(url)
d.find_element(By.LINK_TEXT,'MY ACCOUNT').click()
d.find_element(By.LINK_TEXT,'Login').click()
d.find_element(By.NAME,"username").send_keys("Test")
d.find_element(By.NAME,"password").send_keys("Test")
d.find_element(By.XPATH,"//button[text()='Login']").click()
Use the best available locator on your HTML page, so that you don't need to create an XPath or CSS selector for simple operations.
You may be having issues with the page not being loaded when you try to find the element of interest. You should use the WebDriverWait class to wait until a given element is present on the page.
Adapted from the docs:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Set up your driver here....
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'li_myaccount'))
    )
    element.click()
except:
    # Handle any exceptions here
    pass
finally:
    driver.quit()
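Since the end goal here is a click, element_to_be_clickable is often a better condition than presence_of_element_located (an element can be present in the DOM but not yet visible or enabled). A one-line variant:
element = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, 'li_myaccount'))
)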
I just want to write simple login code for one website. However, I think the login page was written in JS, and it's really hard to locate the elements with Selenium.
The web page I am going to play with is:
"https://www.nike.com/snkrs/login?returnUrl=%2F"
(Screenshots of the page and its inspect-element view omitted.)
I was trying to locate the element with the following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox()
driver.get("https://www.nike.com/snkrs/login?returnUrl=%2Fthread%2Fe9680e08e7e3cd76b8832684037a58a369cad5ed")
time.sleep(5)
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
elem = driver.find_element_by_xpath("//*[@id='ce3feab5-6156-441a-970e-23544473a623']")
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
driver.close()
This code returns an error saying that the element could not be found by [@id='ce3feab5-6156-441a-970e-23544473a623'].
I tried playing with frames, but that does not seem to work. If I go to the "view page source" page, it is full of JS code.
Is there a good way to work with such a web page in Selenium?
Try changing the code:
elem = driver.find_element_by_xpath("//*[@id='ce3feab5-6156-441a-970e-23544473a623']")
to
elem = driver.find_element_by_xpath("//*[@type='email']")
My guess (and observation) is that the id changes each time you visit the page. The id looks auto-generated, and when I go to the page multiple times, the id is different each time.
You'll need to search for something that doesn't change. For example, you can search by the name attribute, which has the seemingly static value "emailAddress":
element = driver.find_element_by_name("emailAddress")
You could also use an xpath expression to search for other attributes, such as data-componentname:
element = driver.find_element_by_xpath("//input[#data-componentname='emailAddress']")
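The same attribute match works as a CSS selector too, if you prefer:
element = driver.find_element_by_css_selector("input[data-componentname='emailAddress']")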
Also, instead of a hard-coded sleep, you can simply wait for the element to be visible:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://www.nike.com/snkrs/login")
element = WebDriverWait(driver, 10).until(
EC.visibility_of_element_located((By.NAME, "emailAddress"))
)