I want to reload an HTML page that I created locally on my computer, using Python. I have tried this:
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get('C://User/Desktop/total.html')
while True:
    time.sleep(20)
    driver.refresh()
driver.quit()
but it was throwing FileNotFoundError. Any idea how to do it? Thanks.
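A likely cause is the path format: Firefox needs a file:// URI with a full absolute path, not a bare Windows path. Here is a minimal sketch, assuming the file really is on the Desktop (the user folder name below is a placeholder, substitute your own):
from selenium import webdriver
import time

driver = webdriver.Firefox()
# Use a file:// URI; a bare Windows path raises FileNotFoundError.
# 'YourName' is a placeholder for your actual Windows user folder.
driver.get('file:///C:/Users/YourName/Desktop/total.html')
while True:
    time.sleep(20)
    driver.refresh()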
I am trying to scrape a JS website with Selenium. When Beautiful Soup reads what Selenium retrieved, I get an HTML page that says: "Cookies must be enabled in order to view this page."
If anyone could help me past this stumbling block I would appreciate it. Here is my code:
# import libraries and specify URL
import lxml as lxml
import pandas as pd
from bs4 import BeautifulSoup
import html5lib
from selenium import webdriver
import urllib.request
import csv
url = "https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2020/06/09"
#new chrome session
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(executable_path='/Users/susanwhite/PycharmProjects/Horse Racing/chromedriver', chrome_options=chrome_options)
# Implicitly wait up to 10 seconds when locating elements
driver.implicitly_wait(time_to_wait=10)
# Load the web page
driver.get(url)
cookies = driver.get_cookies()
# Parse HTML code and grab tables with Beautiful Soup
soup = BeautifulSoup(driver.page_source, 'html5lib')
print(soup)
Try removing this line: chrome_options.add_argument("--incognito"). There's no need for it, as Selenium doesn't save cookies or any other information from websites by default.
Removing the code below solved it for me, but headless mode will be disabled and the browser window will be visible.
chrome_options.add_argument("--headless")
Your issue might also be with the specific website you're accessing. I had the same problem, and after poking around, it looks like something in the way the HKJC website loads makes Selenium think the page has finished loading prematurely. I was able to get good page_source objects out of fetching the page by putting a time.sleep(30) after the get statement, so my code looks like:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options, executable_path=r'C:\Python\webdrivers\geckodriver.exe')
driver.get("https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2023/01/01&RaceNo=1")
time.sleep(30)
html = driver.page_source
with open('Date_2023-01-01_Race1.html', 'wb') as f:
    f.write(html.encode('utf-8'))
You might not have to sleep that long; manually loading the pages takes 20+ seconds for me because I have slow internet over VPNs. It also works headless for me, as above.
You do have to make sure your geckodriver is the latest (at least according to other posts; I only tried this over ~2 days, so not long enough for my installed Firefox and geckodriver to get out of sync).
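If the fixed sleep feels wasteful, an explicit wait is a common alternative: it returns as soon as a marker element appears, and only falls back to the full timeout on slow loads. A minimal sketch follows; the CSS selector is a hypothetical placeholder, so substitute one that actually appears on the loaded results page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2023/01/01&RaceNo=1")
# Wait up to 30 seconds, but continue as soon as the marker element exists.
# "div.race-results" is a placeholder selector; use one the page really renders.
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.race-results"))
)
html = driver.page_source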
I'm trying to navigate to the following page and extract the HTML, https://www.automobile.it/annunci?b=data&d=DESC, but every time I call the get() method it looks like the website redirects me to another page, always the same one: https://www.automobile.it/torrenova?radius=100&b=data&d=DESC.
Here's the simple code I'm running:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=ex_path)
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")
html=driver.page_source
If I do the same thing using the requests module, I don't get redirected:
import requests
html=requests.get("https://www.automobile.it/annunci?b=data&d=DESC")
I don't understand why it's behaving like this. Any ideas?
Use driver.delete_all_cookies()
from selenium import webdriver
driver = webdriver.Chrome(executable_path=ex_path)
driver.delete_all_cookies()
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")
html=driver.page_source
PS: also be warned that page_source will not get you the complete DOM as rendered.
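If you need the DOM as it currently stands, one common workaround is to pull outerHTML through JavaScript instead; a minimal sketch:
# page_source reflects what the driver received; executing JS
# returns the live DOM, including changes made by scripts.
rendered = driver.execute_script("return document.documentElement.outerHTML;")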
You can also clear the browser cache with the code below (assuming you are using Chrome):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome(executable_path=ex_path)
# Open Chrome's built-in "Clear browsing data" dialog.
driver.get('chrome://settings/clearBrowserData')
# Sending ENTER triggers the dialog's default "Clear data" action.
driver.find_element_by_xpath('//settings-ui').send_keys(Keys.ENTER)
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")
I used webdriver because I need to make a copy of the site after authentication.
from selenium import webdriver
import myconnutils
import re
from time import sleep
connection = myconnutils.getConnection()
#use Chrome
driver = webdriver.Chrome("/Users/User/Documents/sender/chromedriver")
#enter to site
driver.get("https://example.com/en/account")
driver.find_element_by_id("user").send_keys("userlogin")
driver.find_element_by_id("password").send_keys("passwordinput")
driver.find_element_by_id("submit").click()
What's next? How do I copy the whole page, with CSS, JS, and images?
Try using Selenium together with BeautifulSoup. You should be able to get the source code like this:
example_soup = BeautifulSoup(driver.page_source, 'html.parser')
This blog post may also help.
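To go beyond the HTML and also save assets, one possible approach is to reuse the authenticated Selenium session's cookies in requests and download each referenced image. This is only a sketch under that assumption; the output file names and the assets folder are illustrative:
import os
import requests
from bs4 import BeautifulSoup

# Save the rendered HTML to disk.
soup = BeautifulSoup(driver.page_source, 'html.parser')
with open('page.html', 'w', encoding='utf-8') as f:
    f.write(str(soup))

# Copy the logged-in session's cookies into requests so that
# protected assets can be fetched outside the browser.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

os.makedirs('assets', exist_ok=True)
for img in soup.find_all('img', src=True):
    url = img['src']
    if url.startswith('http'):  # skip relative/data URLs in this sketch
        response = session.get(url)
        filename = os.path.join('assets', os.path.basename(url.split('?')[0]))
        with open(filename, 'wb') as out:
            out.write(response.content)
The same loop can be repeated for <link> (CSS) and <script> (JS) tags; relative URLs would additionally need to be resolved against the page URL, for example with urllib.parse.urljoin.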
I have written some code to scrape a web page using Selenium. It works fine if I use the Chrome web driver, but if I change it to PhantomJS() I get a NoSuchElementException. The code is:
from bs4 import BeautifulSoup
import requests
from selenium import webdriver
from time import sleep
s = requests.session()
driver = webdriver.Chrome(r'F:\chromedriver')
driver.get("https://in.bookmyshow.com/booktickets/VMAX/2061")
sleep(40)
# To switch to frame
driver.switch_to.frame(driver.find_element_by_id("wiz-iframe"))
# Clicking on the element inside the frame
e2 = driver.find_element_by_xpath("//div[@class='wzrkPPwarp']//a")
e2.click()
# Switching back to main content
driver.switch_to.default_content()
# Only then can we access elements in the main document
e3 = driver.find_element_by_xpath("//button[@class='No thanks']")
e3.click()
This is the code written using the Chrome web driver. When I change it to:
driver = webdriver.PhantomJS()
I get the error below:
NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '//div[@class='wzrkPPwarp']//a'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"113","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:56829","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"//div[@class='wzrkPPwarp']//a\", \"sessionId\": \"ccc33320-10e5-11e8-b5fa-dbfae1ffdb07\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/ccc33320-10e5-11e8-b5fa-dbfae1ffdb07/element"}}
Screenshot: available via screen
How can I make this work? Please help. Thanks!
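One difference worth ruling out first: PhantomJS starts with a very small default viewport, and a page can serve different markup to what looks like a tiny screen, so the overlay you are targeting may never be rendered at all. Setting an explicit window size is a cheap check (the dimensions below are just a common desktop resolution):
driver = webdriver.PhantomJS()
# PhantomJS defaults to a small viewport; emulate a desktop-sized window
# so the page renders the same elements it shows in Chrome.
driver.set_window_size(1366, 768)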
I want to run tests with Selenium. IE gives me a modal error after bringing up IE 8, with the text "This is the initial start page for the WebDriver server":
from selenium import webdriver
import time
browser = webdriver.Ie() # Get local session of IE
browser.get("http://www.google.com") # Load page
time.sleep(5)
browser.close()
So I tried Chrome.
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get("http://www.google.com")
time.sleep(5)
browser.close()
and Selenium errors out because it doesn't have the right path to the chrome.exe application. Chrome is installed where expected: C:\Users\%USERNAME%\AppData\Local\Google\Chrome\Application\chrome.exe
A little help here would be greatly appreciated.
Have you downloaded ChromeDriver?
To get set up, first download the appropriate prebuilt server. Make sure the server can be located on your PATH or specify its location via the webdriver.chrome.driver system property.
Then when you run:
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get("http://www.google.com")
time.sleep(5)
browser.close()
It should work.
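If ChromeDriver isn't on your PATH, you can also point Selenium at it explicitly; the path below is illustrative, so substitute wherever you unpacked the driver:
from selenium import webdriver

# Explicit driver location, for when PATH lookup fails.
browser = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe')
browser.get("http://www.google.com")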