I've tried several solutions, but none of them worked.
Here's my code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.PhantomJS()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
content = driver.page_source
page = open('test.html', 'wb')
page.write(content.encode('utf-8'))  # page_source is a str; encode it since the file is opened in binary mode
page.close()
I've tried to debug the code, and when I step through it in the debugger it does return the clicked page. When I run the code normally it also finishes without errors, but it doesn't return the clicked page, just the original page source.
I also searched for solutions and tried scrolling the page down to the bottom:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
But the result is the same: it only works while debugging.
Thanks
It seems that your button initiates an AJAX request. The driver doesn't wait for it to finish, because there is no page reload, so you should add an explicit wait. Something like this:
expected_number_of_articles = 10 # enter your number
article_locator = (By.CSS_SELECTOR, 'div#article') # enter your locator
wait.until(lambda driver: len(driver.find_elements(*article_locator)) >= expected_number_of_articles)
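Putting that together with the original script, a minimal self-contained sketch might look like this (the article locator and the "new article" condition are assumptions to adjust for the actual page):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.PhantomJS()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver, 10)
article_locator = (By.CSS_SELECTOR, 'div#article')  # assumed locator for one article
count_before = len(driver.find_elements(*article_locator))  # articles already present
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
wait.until(lambda d: len(d.find_elements(*article_locator)) > count_before)  # wait for the AJAX content to appear
with open('test.html', 'wb') as page:
    page.write(driver.page_source.encode('utf-8'))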
Before accessing the page source, wait a short interval so the page has time to load:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
driver = webdriver.Firefox()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
time.sleep(4)  # crude fixed wait so the AJAX-loaded content has time to render
content = driver.page_source
page = open('test3.html', 'w')
page.write(content)
page.close()
So I am trying to scrape data from a table on several hundred pages of a website. Here is part of what I have so far:
driver.get("link")
driver.maximize_window()
window_before = driver.window_handles[0]
driver.switch_to.window(window_before)
wait = WebDriverWait(driver, 10)
driver.execute_script("window.scrollTo(0, 350)")
games = driver.find_elements(By.XPATH, '//*[@id="schedule"]/tbody/tr')
This code only works sometimes. If I run this chunk 10 times, the page actually scrolls down only about 5 of those times. I tried using this:
for i in range(0, 2):
    driver.find_element(By.XPATH, '//*[@id="meta"]/div[1]/p[1]/a').send_keys(Keys.DOWN)
but the same issue arises. Sometimes that scrolls down the amount I need, other times it does nothing, and other times it scrolls the entire page.
This part of my code navigates to the first link I need to click, and on the next page I need to scroll again, where the same issue is present. This is all part of a loop that goes through several hundred pages to read HTML tables, so even if it works the first 50 times, I won't get all the data I need.
Edit: Directly after the above snippet I have this:
for idx, game in enumerate(games):
    driver.find_element(By.XPATH, '/html/body/div[2]/div[6]/div[3]/div[2]/table/tbody/tr[' + str(idx + 1) + ']/td[6]/a').click()
Which is where I get the "element is not clickable at point (X, Y)" error.
Am I doing something wrong here, or is there a work around to accomplish my goal?
Here is one way to access the href attribute of every 'Box Score' link on that page (following OP's clarification in the comments):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('--disable-notifications')
chrome_options.add_argument("--window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
actions = ActionChains(browser)
url = 'https://www.basketball-reference.com/leagues/NBA_2014_games-october.html'
browser.get(url)
# print(browser.page_source)
# browser.maximize_window()
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@class="qc-cmp2-summary-section"]'))).click()
    print('clicked cookie parent')
    wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@mode="primary"]'))).click()
    print('accepted cookies')
except Exception as e:
    print('no cookies')
wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@id="all_schedule"]'))).location_once_scrolled_into_view
table_with_score_links = wait.until(EC.presence_of_element_located((By.XPATH, '//table[@id="schedule"]')))
# print(table_with_score_links.get_attribute('outerHTML'))
links_from_table = [x.get_attribute('href') for x in table_with_score_links.find_elements(By.TAG_NAME, 'a') if x.text == 'Box Score']
print(links_from_table)
Result printed in terminal:
clicked cookie parent
accepted cookies
['https://www.basketball-reference.com/boxscores/201310290IND.html', 'https://www.basketball-reference.com/boxscores/201310290MIA.html', 'https://www.basketball-reference.com/boxscores/201310290LAL.html', 'https://www.basketball-reference.com/boxscores/201310300CLE.html', 'https://www.basketball-reference.com/boxscores/201310300TOR.html', 'https://www.basketball-reference.com/boxscores/201310300PHI.html', 'https://www.basketball-reference.com/boxscores/201310300DET.html', 'https://www.basketball-reference.com/boxscores/201310300NYK.html', 'https://www.basketball-reference.com/boxscores/201310300NOP.html', 'https://www.basketball-reference.com/boxscores/201310300MIN.html', 'https://www.basketball-reference.com/boxscores/201310300HOU.html', 'https://www.basketball-reference.com/boxscores/201310300SAS.html', 'https://www.basketball-reference.com/boxscores/201310300DAL.html', 'https://www.basketball-reference.com/boxscores/201310300UTA.html', 'https://www.basketball-reference.com/boxscores/201310300PHO.html', 'https://www.basketball-reference.com/boxscores/201310300SAC.html', 'https://www.basketball-reference.com/boxscores/201310300GSW.html', 'https://www.basketball-reference.com/boxscores/201310310CHI.html', 'https://www.basketball-reference.com/boxscores/201310310LAC.html']
I tried to make the variable names as descriptive as possible, and I also left in some commented-out lines of code, to show the thought process and the build-up to the end goal.
You can now go through those links one by one, etc.
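For the next step, a minimal sketch of visiting each link might look like this (it assumes each box score page renders its tables without further interaction):
for link in links_from_table:
    browser.get(link)
    # wait for the box score tables to be present before reading the page
    wait.until(EC.presence_of_element_located((By.TAG_NAME, 'table')))
    html = browser.page_source
    # parse html here, e.g. with pandas.read_html or BeautifulSoup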
Selenium documentation can be found here: https://www.selenium.dev/documentation/
I'm a beginner at web scraping and I've followed a few YouTube videos about how to do this, but regardless of what I try I can't get my code to accept the cookies.
This is the code I have so far:
from selenium import webdriver
import time
driver = webdriver.Safari()
URL = "https://www.zoopla.co.uk/new-homes/property/london/?q=London&results_sort=newest_listings&search_source=new-homes&page_size=25&pn=1&view_type=list"
driver.get(URL)
time.sleep(2) # Wait a couple of seconds, so the website doesn't suspect you are a bot
try:
    driver.switch_to_frame('gdpr-consent-notice')  # This is the id of the frame
    accept_cookies_button = driver.find_element_by_xpath('//*[@id="save"]')
    accept_cookies_button.click()
except AttributeError:  # If you have the latest version of selenium, the code above won't run because "switch_to_frame" is deprecated
    driver.switch_to.frame('gdpr-consent-notice')  # This is the id of the frame
    accept_cookies_button = driver.find_element_by_xpath('//*[@id="save"]')
    accept_cookies_button.click()
except:
    pass  # If there is no cookies button, we won't find it, so we can pass
I don't have the Safari webdriver, only the Chrome one, but I think they work similarly. On Chrome you can close the cookie banner with this code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get(URL)
# wait no more than 20 seconds for the `iframe` with id `gdpr-consent-notice` to appear, then switch to it
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "gdpr-consent-notice")))
# click accept cookies button
driver.find_element(By.CSS_SELECTOR, '#save').click()
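One caveat: after this click the driver is still focused inside the consent iframe, so you will likely want to switch back to the main document before interacting with the listings; a one-line sketch:
driver.switch_to.default_content()  # leave the consent iframe and return to the main page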
I'm trying to input the text string "L62T18H029-P3215" into the search input box on this website https://lamerfashion.com and press Enter.
I have tried executing some JavaScript to change the value of the hidden element; however, I am unable to make Selenium send the ENTER key to submit.
driver = webdriver.Chrome(ChromeDriver)
driver.get("https://lamerfashion.com")
element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//a[@class="search-icon"]')))
element.click()
e = driver.execute_script("return document.getElementsByName('type')[0].value;")
print(e)
driver.execute_script("document.getElementsByName('type')[0].value='L62T18H029-P3215';")
e = driver.execute_script("return document.getElementsByName('type')[0].value;")
print(e)
Output:
product
L62T18H029-P3215
I run my scripts in Java; maybe this will help you. Try this (for reference):
WebDriver driver = new ChromeDriver();
driver.manage().window().maximize();
driver.get("https://lamerfashion.com");
WebElement newSearch = driver.findElement(By.className("search-icon"));
newSearch.click();
Thread.sleep(1000);
WebElement searchpro = driver.findElement(By.xpath("//*[@id=\"navbar\"]/div/ul[2]/li[1]/form/input[2]"));
searchpro.sendKeys("L62T18H029-P3215");
searchpro.sendKeys(Keys.ENTER);
I don't see any need for JS here.
You can simply go ahead with the send_keys method, which is already present in Selenium.
Code:
driver = webdriver.Chrome(executable_path=r'chromedriverpath')
driver.maximize_window()
driver.get("https://lamerfashion.com")
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a.search-icon'))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@class='search-icon']/following-sibling::form/input[@name='q']"))).send_keys("L62T18H029-P3215")
Imports:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
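Since the question also asks about pressing Enter to submit, the Keys import above can be used on the same input; a small sketch (re-using the locator from the code above):
search_box = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@class='search-icon']/following-sibling::form/input[@name='q']")))
search_box.send_keys(Keys.ENTER)  # press Enter to submit the search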
Looking at the website, there are two inputs, and the hidden one might actually not be the one you need.
Try the same thing but with the input named "q".
Moreover, try using the Selenium command element.send_keys('text_you_want') - that should be sufficient instead of the JS.
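A minimal sketch of that suggestion (assuming the visible search input really is the one named "q"):
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
search_input = driver.find_element(By.NAME, 'q')
search_input.send_keys('L62T18H029-P3215')  # type the product code
search_input.send_keys(Keys.ENTER)  # press Enter to submit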
Using Selenium and Python, I am trying to get a URL and save it by doing this:
driver = webdriver.Firefox()
driver.get("https://google.com")
elem = driver.find_element(By.XPATH, "/html/body/div/div[3]/div[1]/div/div/div/div[1]/div[1]/a")
elem.click()
url = driver.current_url
print(url)
The URL that prints is google.com, not the URL of the clicked link, which is Gmail.
My question is: how can I get the second URL and save it?
You are getting the current url before the new page is loaded. Add an Explicit Wait to, for instance, wait for the page title to contain "Gmail":
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://google.com")
# click "Gmail" link
elem = driver.find_element_by_link_text("Gmail")
elem.click()
# wait for the page to load
wait = WebDriverWait(driver, 10)
wait.until(EC.title_contains("Gmail"))
url = driver.current_url
print(url)
Also note how I've improved the way to locate the Gmail link.
I want to grab the page source of the page after I make a click, and then go back using the browser.back() function. But Selenium doesn't let the page fully load after the click, and the content generated by JavaScript isn't included in the page source of that page.
element[i].click()
#Need to wait here until the content is fully generated by JS.
#And then grab the page source.
scoreCardHTML = browser.page_source
browser.back()
As Alan mentioned, you can wait for some element to be loaded. Below is example code:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "element_id")))
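Dropped into your loop, that could look like this ("element_id" is a placeholder; use the id of something that only appears on the scorecard page):
element[i].click()
# wait until the JS-generated content is present before grabbing the source
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "element_id")))
scoreCardHTML = browser.page_source
browser.back()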
You can also use Selenium's staleness_of:
from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of
@contextmanager
def wait_for_page_load(browser, timeout=30):
    old_page = browser.find_element_by_tag_name('html')
    yield
    WebDriverWait(browser, timeout).until(
        staleness_of(old_page)
    )
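Usage would then be a sketch like this - the click inside the with block triggers the navigation, and the exit waits for the old page to go stale:
with wait_for_page_load(browser):
    element[i].click()
scoreCardHTML = browser.page_source
browser.back()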
You can do it using a loop of try-and-wait, an easy method to implement:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("url")
Button = ''
while not Button:
    try:
        Button = browser.find_element_by_name('NAME OF ELEMENT')
        Button.click()
    except:
        continue
Assuming "pass" is an element in the current page and won't be present at the target page.
I mostly use Id of the link I am going to click on. because it is rarely present at the target page.
while True:
    try:
        browser.find_element_by_id("pass")
    except:
        break