How to convert the Selenium with Chrome code into PhantomJS? - python

I have written some code to scrape a web page using Selenium. It works fine if I use the Chrome web driver, but if I change it to PhantomJS() I get a NoSuchElementException. The code is:
from bs4 import BeautifulSoup
import requests
from selenium import webdriver
from time import sleep
s = requests.session()
driver = webdriver.Chrome(r'F:\chromedriver')
driver.get("https://in.bookmyshow.com/booktickets/VMAX/2061")
sleep(40)
# Switch to the frame
driver.switch_to.frame(driver.find_element_by_id("wiz-iframe"))
# Click the element inside the frame
e2 = driver.find_element_by_xpath("//div[@class='wzrkPPwarp']//a")
e2.click()
# Switch back to the main content
driver.switch_to.default_content()
# Only then can we access elements outside the frame
e3 = driver.find_element_by_xpath("//button[@class='No thanks']")
e3.click()
This is the code written using the Chrome web driver. When I change this to:
driver = webdriver.PhantomJS()
I am getting the error as below:
NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '//div[@class='wzrkPPwarp']//a'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"113","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:56829","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"//div[@class='wzrkPPwarp']//a\", \"sessionId\": \"ccc33320-10e5-11e8-b5fa-dbfae1ffdb07\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/ccc33320-10e5-11e8-b5fa-dbfae1ffdb07/element"}}
Screenshot: available via screen
How do I make this work with PhantomJS? Please help. Thanks!
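A likely fix, sketched below rather than verified against the live site: PhantomJS ships with a very small default viewport and its own user agent, so many sites serve it different markup than they serve Chrome, and the popup element may never appear at all. Setting a desktop user agent and window size, and waiting for the iframe explicitly instead of sleeping, is the usual first thing to try:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Sketch, not a verified fix: make PhantomJS look like a desktop Chrome
caps = webdriver.DesiredCapabilities.PHANTOMJS.copy()
caps["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")
driver = webdriver.PhantomJS(desired_capabilities=caps)
driver.set_window_size(1366, 768)  # emulate a desktop viewport
driver.get("https://in.bookmyshow.com/booktickets/VMAX/2061")
# Wait for the iframe to be available instead of sleeping a fixed 40 seconds
WebDriverWait(driver, 40).until(
    EC.frame_to_be_available_and_switch_to_it((By.ID, "wiz-iframe")))
driver.find_element_by_xpath("//div[@class='wzrkPPwarp']//a").click()
driver.switch_to.default_content()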

Related

Selenium get() redirects to another url

I'm trying to navigate to the following page and extract the HTML: https://www.automobile.it/annunci?b=data&d=DESC. But every time I call the get() method, the website redirects me to another page, always the same one: https://www.automobile.it/torrenova?radius=100&b=data&d=DESC.
Here's the simple code I'm running:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=ex_path)
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")
html=driver.page_source
If I do the same thing using the requests module, I don't get redirected:
import requests
html = requests.get("https://www.automobile.it/annunci?b=data&d=DESC").text
I don't understand why it's behaving like this. Any ideas?
Use driver.delete_all_cookies()
from selenium import webdriver
driver = webdriver.Chrome(executable_path=ex_path)
driver.delete_all_cookies()
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")
html=driver.page_source
P.S. Also be warned: page_source will not get you the complete DOM as rendered.
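If you need the rendered state anyway, one option (a sketch reusing the driver above; the CSS selector is a placeholder, not taken from the actual page) is an explicit wait for a known element before reading page_source:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Wait until a listing element has rendered, then snapshot the DOM.
# "div.listing" is a hypothetical selector for this sketch.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.listing")))
html = driver.page_source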
Well, you can clear the browser cache using the code below. I am assuming that you are using Chrome.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path=ex_path)
# Open Chrome's Clear Browsing Data dialog and confirm it with ENTER
driver.get('chrome://settings/clearBrowserData')
driver.find_element_by_xpath('//settings-ui').send_keys(Keys.ENTER)
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")

Python webscraper click xbrl link

I'm trying to go to the EDGAR database of the SEC and click the first new 8-K filing available.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from requests import get
import chromedriver_binary
import time
# Locate the Chrome driver (raw string so the backslashes are not treated as escapes)
driver = webdriver.Chrome(r'C:\Program Files\Python38\chromedriver.exe')
#Go to the url
driver.get("https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=8-K&owner=include&count=40&action=getcurrent")
print(driver.title)
#select first element available
elem = driver.find_element_by_xpath("/html/body/div/table[2]/tbody/tr[3]/td[2]/a[1]")
#click the element located above
elem.click()
This works, giving the following result (screenshot omitted). This is where I get stuck: I am trying to get the script to click the current report. Using the Chrome dev tools I located the element (screenshot omitted); its XPath gives me: //tr[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a
XBelem = driver.find_element_by_xpath(".//tr[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a")
XBelem.click()
However, if I try to use it like I did in the previous file, it doesn't do anything, and if I add a "." in front of the //tr it just returns me to the homepage.
It might be the case that you didn't update your current URL, so the driver is still stuck in the HTML of the old URL. You can update it by:
url = driver.current_url
driver.get(url)

Element not Clickable (Chrome + Selenium + Python)

I am using the Chrome web driver / Selenium in Python. I tried several solutions (actions, maximizing the window, etc.) to get rid of this exception, without success.
The error is :
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element ... is not clickable at point (410, 513). Other element would receive the click: ...
The code :
from selenium import webdriver
import time
url = 'https://www.tmdn.org/tmview/welcome#/tmview/detail/EM500000018203824'
driver = webdriver.Chrome(executable_path = r"D:\Python\chromedriver.exe")
driver.get(url)
time.sleep(30)
driver.find_element_by_link_text('Show more').click()
I tested this code on my Linux PC with the latest libraries, Python 3, and chromedriver. It works perfectly (to my mind). So try updating everything and try again (and try not to leave Chrome while it runs). Here is the code:
from selenium import webdriver
import time
url = 'https://www.tmdn.org/tmview/welcome#/tmview/detail/EM500000018203824'
driver = webdriver.Chrome(executable_path = "chromedriver")
driver.get(url)
time.sleep(30)
driver.find_element_by_link_text('Show more').click()
P.S. chromedriver is in the same folder as the script.
Thank you for your assistance.
Actually, the issue was the panel at the footer of the web page ('We use cookies...'), which overlaps the 'Show more' link; when the driver tried to click, the click was intercepted by that panel.
The solution is to close that panel first, after which the code works fine.
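For example (a sketch reusing the question's driver; the locator for the panel's dismiss button is an assumption, not taken from the actual page):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Dismiss the cookie notice first so it no longer intercepts the click.
# "button.cookie-accept" is a hypothetical locator for this sketch.
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button.cookie-accept"))).click()
driver.find_element_by_link_text('Show more').click()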
The code is working fine, but if you manually click on some other element after the page has loaded, before the sleep time is over, then you can recreate the same error. For example, after the site loaded I clicked on the element "search for trade mark with similar image", and Selenium was then not able to find the 'Show more' element. So maybe some other part of your code is clicking on another element and loading a different URL, which makes Selenium generate this error. Your code is fine; just check for conflicting cases.

How to reload a html page using python

I want to reload an HTML page which I created locally on my computer, using Python. I have tried this:
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get('C://User/Desktop/total.html')
while True:
    time.sleep(20)
    driver.refresh()
driver.quit()
but it throws a FileNotFoundError. Any idea how to do it? Thanks.
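One likely cause, sketched below under the assumption that the file exists: get() expects a URL rather than a bare Windows path, so building a file:// URI from the path usually works:
import time
from pathlib import Path
from selenium import webdriver
# Turn the local path into a file:// URI; the path here is a hypothetical placeholder.
url = Path('C:/Users/YourName/Desktop/total.html').as_uri()
driver = webdriver.Firefox()
driver.get(url)
try:
    while True:
        time.sleep(20)
        driver.refresh()
finally:
    driver.quit()  # only reached if the loop is interrupted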

Splinter or Selenium: Can we get current html page after clicking a button?

I'm trying to crawl the website "http://everydayhealth.com". However, I found that the page is dynamically rendered: when I click the button "More", some new news items are shown, but using splinter to click the button doesn't make browser.html automatically change to the current HTML content. Is there a way to get the newest HTML source, using either splinter or selenium? My code in splinter is as follows:
import requests
from bs4 import BeautifulSoup
from splinter import Browser
browser = Browser()
browser.visit('http://everydayhealth.com')
browser.click_link_by_text("More")
print(browser.html)
Based on @Louis's answer, I rewrote the program as follows:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.get("http://www.everydayhealth.com")
more_xpath = '//a[@class="btn-more"]'
more_btn = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(more_xpath))
more_btn.click()
more_news_xpath = '(//a[@href="http://www.everydayhealth.com/recipe-rehab/5-herbs-and-spices-to-intensify-flavor.aspx"])[2]'
WebDriverWait(driver, 5).until(lambda driver: driver.find_element_by_xpath(more_news_xpath))
print(driver.execute_script("return document.documentElement.outerHTML;"))
driver.quit()
However, in the output text I still couldn't find the text from the updated page. For example, when I search for "Is Milk Your Friend or Foe?", it still returns nothing. What's the problem?
With Selenium, assuming that driver is your initialized WebDriver object, this will give you the HTML that corresponds to the state of the DOM at the time you make the call:
driver.execute_script("return document.documentElement.outerHTML;")
The return value is a string so you could do:
print(driver.execute_script("return document.documentElement.outerHTML;"))
When I use Selenium for tasks like this, I know browser.page_source does get updated.
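So an equivalent check, as a small sketch reusing the driver from the rewritten program above, is to read that property after the waits have passed:
# page_source reflects the DOM after the "More" content has loaded
html = driver.page_source
print("Is Milk Your Friend or Foe?" in html)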
