Using Selenium and Python, I am trying to get a URL and save it by doing this:
driver = webdriver.Firefox()
driver.get("https://google.com")
elem = driver.find_element(By.XPATH, "/html/body/div/div[3]/div[1]/div/div/div/div[1]/div[1]/a")
elem.click()
url = driver.current_url
print(url)
The URL that prints is google.com and not the newly clicked link, which is Gmail.
My question is: how can I get the second URL and save it?
You are getting the current URL before the new page is loaded. Add an explicit wait to, for instance, wait for the page title to contain "Gmail":
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://google.com")
# click "Gmail" link
elem = driver.find_element(By.LINK_TEXT, "Gmail")
elem.click()
# wait for the page to load
wait = WebDriverWait(driver, 10)
wait.until(EC.title_contains("Gmail"))
url = driver.current_url
print(url)
Also note how I've improved the way to locate the Gmail link.
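If you don't know in advance what the next page's title will be, an alternative to the title wait is to capture the URL before the click and wait for it to change; a minimal sketch using EC.url_changes:
# remember the URL before the click, then wait for it to change
old_url = driver.current_url
elem.click()
wait = WebDriverWait(driver, 10)
wait.until(EC.url_changes(old_url))
print(driver.current_url)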
In Selenium, I am grabbing some search result URLs by XPath. Now I want to click them one by one so that each one opens in the same browser where the base URL is open, and I can switch between them. How can I do that? I am giving my code below.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
serv_obj = Service(r"F:\Softwares\Selenium WebDrivers\chromedriver.exe")
driver = webdriver.Chrome(service=serv_obj)
driver.maximize_window()
driver.implicitly_wait(5)
url = "https://testautomationpractice.blogspot.com/"
driver.get(url)
driver.find_element(By.XPATH, "//input[@id='Wikipedia1_wikipedia-search-input']").send_keys("selenium")
driver.find_element(By.XPATH, "//input[@type='submit']").click()
search_result = driver.find_elements(By.XPATH, "//div[@id='wikipedia-search-result-link']/a")
links = []
for item in search_result:
    url_data = item.get_attribute("href")
    links.append(url_data)
    print(url_data)
print(len(links))
print(links)
I have grabbed all the links from the search results by using a customized XPath, and I am able to print them as well. But I want to open/click every resulting link one by one in the same browser.
You can do that as follows:
Get the list of the links.
In a loop, click each grabbed link.
When a link opens in a new tab, switch the driver to the newly opened tab.
Do whatever you want to do there (I simulated this with a simple delay of 1 second).
Close the new tab.
Switch back to the first tab.
Collect the list of links again, since the previously collected elements have become stale references (an alternative that avoids this is sketched after the code).
The following code works:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 20)
url = "https://testautomationpractice.blogspot.com/"
driver.get(url)
wait.until(EC.element_to_be_clickable((By.XPATH, "//input[@id='Wikipedia1_wikipedia-search-input']"))).send_keys("selenium")
wait.until(EC.element_to_be_clickable((By.XPATH, "//input[@type='submit']"))).click()
links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@id='wikipedia-search-result-link']/a")))
for index, link in enumerate(links):
    links[index].click()
    driver.switch_to.window(driver.window_handles[1])
    time.sleep(1)
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
    links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@id='wikipedia-search-result-link']/a")))
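Alternatively, you can avoid the stale-element problem entirely by collecting the href values first (plain strings never go stale) and navigating to each URL in the same tab; a sketch, reusing the locator from above:
links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@id='wikipedia-search-result-link']/a")))
urls = [link.get_attribute("href") for link in links]
for url in urls:
    driver.get(url)  # open each result in the same tab
    time.sleep(1)    # stand-in for whatever you want to do on the page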
I am trying to scrape some tennis statistics starting from 01-01-2019.
For this I am trying to scrape the following webpage with Selenium: https://www.sofascore.com/de/tennis/2019-01-01
When I click on the first match manually, the container on the right side changes and shows the statistics.
This is what I want to access automatically.
When I try to click on the element with Selenium, it redirects me to another page.
Can anyone tell me why it does not just show the same content as manual clicking does, and how I can solve this issue?
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
import time
options = Options()
options.binary_location = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
browser = webdriver.Chrome(options=options)
url = 'https://www.sofascore.com/de/tennis/2019-01-01'
browser.get(url)
browser.maximize_window()
xpath = '/html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div'
browser.find_element_by_xpath(xpath).click()
time.sleep(2)
browser.close()
You can use the below XPath:
//div[contains(@class, 'Col-pm5mcz-')]//descendant::div[contains(@class, 'styles__StyledWidget-')]
and get its innerHTML using the get_attribute method.
Code:
from time import sleep
from selenium.webdriver.common.by import By

url = "https://www.sofascore.com/de/tennis/2019-01-01"
driver.get(url)
xpath = '/html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div'
driver.find_element(By.XPATH, xpath).click()
sleep(2)
details = driver.find_element(By.XPATH, "//div[contains(@class, 'Col-pm5mcz-')]//descendant::div[contains(@class, 'styles__StyledWidget-')]").get_attribute('innerHTML')
print(details)
The XPath that you are using is an absolute XPath: /html/body/div[1]/main/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/a/div
Try to replace it with a relative XPath.
See if this works:
tableRows = driver.find_elements(By.XPATH, ".//div[@class='ReactVirtualized__Grid ReactVirtualized__List']//following::div/a[contains(@class,'EventCellstyles__Link')]")
for e in tableRows:
    e.click()
    # you can add a wait here for the statistics section to load
    driver.find_element(By.XPATH, ".//a[text()='Statistiken']").click()
I'm practicing trying to scrape my university's course catalog. I have a few lines in Python that open the URL in Chrome and click the search button to bring up the course catalog. When I go to extract the text using find_elements_by_xpath(), it returns blank. When I use the dev tools in Chrome, there definitely is text there.
from selenium import webdriver
import time
driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)
iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)
element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()
course = driver.find_elements_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]')
print(course)
I'm trying to extract the text from the element 'OSR_CAT_SRCH_OSR_CRSE_HEADER'. I don't understand why it's not returning the text values, especially when I can see that it contains text with the dev tools.
You are not using .text; that is the reason you are not getting the text.
course = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]').text
(Note the singular find_element here: .text belongs to a single element, not a list.) Try the above change in the second-to-last line.
Below is the full code after the changes:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)
iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)
element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()
# wait 10 seconds
course = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]'))
).text
print(course)
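If you want the text of every course header rather than just the first, you can match on the shared id prefix; a sketch, assuming the ids are suffixed $0, $1, and so on:
courses = driver.find_elements(By.XPATH, '//*[starts-with(@id, "OSR_CAT_SRCH_OSR_CRSE_HEADER")]')
for course in courses:
    print(course.text)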
I'm trying to get the URL of a video, but every time it doesn't show in my output. I tried requests, urllib, and even Selenium, but part of the page just doesn't show up in my result; it's like it is blocked.
The url is https://unitplay.net/tt0089222, and here is my code:
from selenium import webdriver
browser=webdriver.Chrome('path/chromedriver.exe')
type(browser)
browser.get('https://unitplay.net/tt0089222')
elem = browser.page_source
print(elem)
browser.quit()
Here is the part that doesn't show, and I want to get the src from it:
<div class="jw-media jw-reset"><video class="jw-video jw-reset" x-webkit-airplay="allow" webkit-playsinline="" playsinline="" preload="auto" jw-loaded="data" src="https://unitplay.net//file/others/DA6BB292BA130B6A825B62B96BD929F811EBF7BFEC748F8E2609004F5D96D0F5DD7025F4450289E31279E9F621883D048C869F15520DBE571D8FA35EBCCACD75" __idm_id__="64900097" jw-played=""></video></div>
You can wait for the element to appear using selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome('path/chromedriver.exe')
browser.get('https://unitplay.net/tt0089222')
try:
    element = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "video"))
    )
    print(element.get_attribute("src"))
finally:
    browser.quit()
This should tell Selenium to wait up to 10 seconds for a video element to appear and then print out its source.
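Note that the video element can be present before its src attribute is populated. If you get an empty string back, you can wait on the attribute itself with a small lambda condition (a sketch: an empty src is falsy, so the wait keeps polling until it is set):
src = WebDriverWait(browser, 10).until(
    lambda d: d.find_element(By.TAG_NAME, "video").get_attribute("src")
)
print(src)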
I've searched through several solutions, but none of them worked.
Here's my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver,10)
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
content = driver.page_source
page = open('test.html','w')
page.write(content)
page.close()
When I debug the code step by step, it successfully returns the clicked page. But when I run the code normally, it also finishes without errors, yet it returns only the source page, not the clicked page.
One solution I found was to scroll the page down to the bottom:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
But it's the same result; it only works while debugging.
Thanks
It seems that your button initiates an AJAX request. The driver doesn't wait for it to finish, because there is no page reload, so you should add an explicit wait. Something like this:
expected_number_of_articles = 10 # enter your number
article_locator = (By.CSS_SELECTOR, 'div#article') # enter your locator
wait.until(lambda driver: len(driver.find_elements(*article_locator)) >= expected_number_of_articles)
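A common variant is to count the articles before the click and then wait for the count to grow, so you don't have to hard-code the expected number; a sketch reusing the hypothetical article_locator above:
before = len(driver.find_elements(*article_locator))
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
wait.until(lambda driver: len(driver.find_elements(*article_locator)) > before)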
Before accessing the page source, wait for a small interval to let the page load:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
driver = webdriver.Firefox()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver,10)
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
time.sleep(4)
content = driver.page_source
with open('test3.html', 'w', encoding='utf-8') as page:
    page.write(content)
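If you'd rather not hard-code the sleep, one hedged option is to poll document.readyState via execute_script before grabbing the page source (a sketch; content loaded by AJAX after that point may still need an element-based wait like the one shown above):
WebDriverWait(driver, 10).until(
    lambda d: d.execute_script("return document.readyState") == "complete"
)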