Python web scraper click XBRL link - python

I'm trying to go to the EDGAR database of the SEC and click the first new 8-K filing available.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from requests import get
import chromedriver_binary
import time
#locate Chrome Driver (raw string so the backslashes aren't treated as escape sequences)
driver = webdriver.Chrome(r'C:\Program Files\Python38\chromedriver.exe')
#Go to the url
driver.get("https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=8-K&owner=include&count=40&action=getcurrent")
print(driver.title)
#select first element available
elem = driver.find_element_by_xpath("/html/body/div/table[2]/tbody/tr[3]/td[2]/a[1]")
#click the element located above
elem.click()
giving the following result:
This is where I get stuck. I am trying to get the script to click the current report. Using the Chrome dev tools I located the element as follows:
Now I have tried to locate the XPath, which gives me: //tr[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a
XBelem = driver.find_element_by_xpath(".//tr[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a")
XBelem.click()
However, if I try to use it like I did in the previous file, it doesn't do anything.
If I add a "." in front of the //tr it just returns me to the homepage.

It might be the case that you didn't update your current URL, so the driver is still stuck on the HTML of the old URL. You can update it with (note that current_url is a property, not a method):
url = driver.current_url
driver.get(url)
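Putting it together, here is a minimal sketch of the whole flow with an explicit wait, so the second lookup runs against the filing page rather than the stale list page. The 10-second timeout and the relative XPath for the document link are assumptions for illustration, not taken from the original page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r'C:\Program Files\Python38\chromedriver.exe')
driver.get("https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=8-K&owner=include&count=40&action=getcurrent")
#click the first filing link, as in the question
driver.find_element_by_xpath("/html/body/div/table[2]/tbody/tr[3]/td[2]/a[1]").click()
#wait until the filing index page has rendered before locating the document link
#(the row/anchor position below is an assumption)
doc_link = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//table//tr[2]//a")))
doc_link.click()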

Related

Using selenium in Python, select HTML page element content with xpath?

In the Xpath Helper plugin, I was able to get the HTML tag content:
QUERY: //div[@id="cardModel"]/div[@class="modal-dialog"]/div[@class="modal-content"]//tr[1]/td[1]//tr/td[2]/div/span/text()
RESULTS (1): Enrico
The result is:
Enrico
But in Python:
from selenium import webdriver
from lxml import etree
import time  #needed for time.sleep below
driver = webdriver.Chrome()
detailUrl = 'https://www.enf.com.cn/3d-energy-1?directory=panel&utm_source=ENF&utm_medium=perc&utm_content=22196&utm_campaign=profiles_panel'
driver.get(detailUrl)
html_ele_detail = etree.HTML(driver.page_source)
time.sleep(5)
companyPhone = html_ele_detail.xpath('//div[@id="cardModel"]/div[@class="modal-dialog"]/div[@class="modal-content"]//tr[1]/td[1]//tr/td[2]/div/span/text()')
print("companyPhone = ", companyPhone)
companyPhone shows empty. What's wrong? Thank you all for solving this problem.
As you are already using the selenium library, you do not need the etree library.
For this application the selenium library is enough;
see the example below and adapt it for your purpose:
from selenium import webdriver
driver = webdriver.Chrome()
detailUrl = 'your url here'
driver.get(detailUrl)
web_element_text = driver.find_element_by_xpath('your xpath directory here').text
print(web_element_text)
See some other examples in another related topic.
Let me know if this was helpful.
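If the span is only rendered after a delay (or after the modal opens), an explicit wait is usually more reliable than a fixed time.sleep. A minimal sketch, reusing the question's URL and a shortened form of its XPath (the 10-second timeout is an arbitrary choice):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.enf.com.cn/3d-energy-1?directory=panel&utm_source=ENF&utm_medium=perc&utm_content=22196&utm_campaign=profiles_panel')
#wait until the span exists in the DOM, then read its text
xpath = '//div[@id="cardModel"]//tr/td[2]/div/span'
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, xpath)))
print(element.text)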

How to convert the Selenium with Chrome code into PhantomJS?

I have written some code to scrape a web page using Selenium. It works fine if I use the Chrome web driver, but if I change it to PhantomJS() I get a NoSuchElementException. The code is:
from bs4 import BeautifulSoup
import requests
from selenium import webdriver
from time import sleep
s = requests.session()
driver = webdriver.Chrome(r'F:\chromedriver')
driver.get("https://in.bookmyshow.com/booktickets/VMAX/2061")
sleep(40)
# To switch to frame
driver.switch_to.frame(driver.find_element_by_id("wiz-iframe"))
# Clicking on the element inside the frame
e2 = driver.find_element_by_xpath("//div[@class='wzrkPPwarp']//a")
e2.click()
# Switching back to main content
driver.switch_to_default_content()
# Then only we can access elements
e3 = driver.find_element_by_xpath("//button[@class='No thanks']")
e3.click()
This is the code written using Chrome web driver. When i change this to:
driver = webdriver.PhantomJS()
I am getting the error as below:
NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '//div[@class='wzrkPPwarp']//a'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"113","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:56829","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"//div[@class='wzrkPPwarp']//a\", \"sessionId\": \"ccc33320-10e5-11e8-b5fa-dbfae1ffdb07\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/ccc33320-10e5-11e8-b5fa-dbfae1ffdb07/element"}}
Screenshot: available via screen
How to make it correct? Please help. Thanks!
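One thing worth checking before rewriting anything: PhantomJS announces a dated user agent and uses a very small default viewport, so sites often serve it different markup than they serve Chrome, and the element genuinely is not there. A sketch of how you could make PhantomJS look more like Chrome (the user-agent string and window size here are arbitrary choices, not verified against this site):
from selenium import webdriver

caps = webdriver.DesiredCapabilities.PHANTOMJS.copy()
#pretend to be a desktop Chrome so the site serves the same markup
caps["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36")
driver = webdriver.PhantomJS(desired_capabilities=caps)
#give the headless browser a desktop-sized viewport
driver.set_window_size(1366, 768)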

Selenium Python - Getting the current URL of web browser?

I have this so far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('C:\Users\Fan\Desktop\chromedriver.exe')
url = driver.current_url
print url
It keeps saying that line 4, "driver", is invalid syntax. How would I fix this?
Also is there a way I can get all the current tabs open, and not just a single one?
EDIT: the above code works now; But I have another problem!
The code now opens a new tab, and for some reason the URL bar has "data;" in it, and it outputs data; as the print.
But I want it to take the existing URL from existing web browser already opened, how do I solve this?
In Python you do not specify the type of a variable as is required in Java, which is the reason for the error. The same error will also happen because your last line starts with String.
Calling webdriver.Chrome() returns a driver object, so the line webdriver driver = new webdriver() is not needed.
The new keyword is not used in Python to create a new object.
Try this:
from selenium import webdriver
driver = webdriver.Chrome()
url = driver.current_url
In order to extract the url of the current page from the web driver you have to call the current_url attribute:
from selenium import webdriver
import time
driver = webdriver.Chrome()
#Opens a known doi url
driver.get("https://doi.org/10.1002/rsa.1006")
#Gives the browser a few seconds to process the redirect
time.sleep(3)
#Retrieves the url after the redirect
#In this case https://onlinelibrary.wiley.com/doi/abs/10.1002/rsa.1006
url = driver.current_url
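As for the "data;" URL: that is just the blank start page of the fresh Chrome instance that ChromeDriver launches, so current_url is reporting correctly on the new browser, not on the one you already had open. Selenium cannot attach to an arbitrary browser that is already running, but if you start Chrome yourself with remote debugging enabled you can connect to that instance. A sketch, assuming Chrome was started with --remote-debugging-port=9222 (the port number is an arbitrary choice):
from selenium import webdriver

options = webdriver.ChromeOptions()
#attach to the already-running Chrome listening on the debugging port
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(chrome_options=options)
print(driver.current_url)  #now reflects the active tab of the existing browser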

How to get all links on a web page using python and selenium IDE

I want to get all links from a web page using Selenium IDE and Python.
For example, if I search for "test" or anything on the Google website, I want all the links related to it.
Here is my code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
baseurl="https://www.google.co.in/?gws_rd=ssl"
driver = webdriver.Firefox()
driver.get(baseurl)
driver.find_element_by_id("lst-ib").click()
driver.find_element_by_id("lst-ib").clear()
driver.find_element_by_id("lst-ib").send_keys("test")
link_name=driver.find_element_by_xpath(".//*[@id='rso']/div[2]/li[2]/div/h3/a")
print link_name
driver.close()
Output
<selenium.webdriver.remote.webelement.WebElement object at 0x7f0ba50c2090>
Using xpath $x(".//*[@id='rso']/div[2]/li[2]/div/h3/a") in Firebug's console.
Output
[a jtypes2.asp]
How can I get the links' content from that object?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
baseurl="https://www.google.co.in/?gws_rd=ssl"
driver = webdriver.Firefox()
driver.get(baseurl)
driver.find_element_by_id("lst-ib").click()
driver.find_element_by_id("lst-ib").clear()
driver.find_element_by_id("lst-ib").send_keys("test")
driver.find_element_by_id("lst-ib").send_keys(Keys.RETURN)
driver.implicitly_wait(2)
link_name=driver.find_elements_by_xpath(".//*[@id='rso']/div/li/div/h3/a")
for link in link_name:
    print link.get_attribute('href')
Try the above code. Your code doesn't send a RETURN key after entering the search keyword. I've also added an implicit wait of 2 seconds to let the search results load, and I've changed the XPath to match all the result links.
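If you literally want every link on the page rather than just the organic results, a tag-name lookup is a simpler sketch (no Google-specific XPath assumed):
#collect every anchor on the current page and print its href
links = driver.find_elements_by_tag_name('a')
for link in links:
    print link.get_attribute('href')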

Splinter or Selenium: Can we get current html page after clicking a button?

I'm trying to crawl the website "http://everydayhealth.com". However, I found that the page is rendered dynamically: when I click the button "More", new news items are shown, but clicking the button with splinter doesn't make browser.html change to the current HTML content. Is there a way to get the newest HTML source, using either splinter or selenium? My code in splinter is as follows:
import requests
from bs4 import BeautifulSoup
from splinter import Browser
browser = Browser()
browser.visit('http://everydayhealth.com')
browser.click_link_by_text("More")
print(browser.html)
Based on @Louis's answer, I rewrote the program as follows:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.get("http://www.everydayhealth.com")
more_xpath = '//a[@class="btn-more"]'
more_btn = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(more_xpath))
more_btn.click()
more_news_xpath = '(//a[#href="http://www.everydayhealth.com/recipe-rehab/5-herbs-and-spices-to-intensify-flavor.aspx"])[2]'
WebDriverWait(driver, 5).until(lambda driver: driver.find_element_by_xpath(more_news_xpath))
print(driver.execute_script("return document.documentElement.outerHTML;"))
driver.quit()
However, in the output text, I still couldn't find the text in the updated page. For example, when I search "Is Milk Your Friend or Foe?", it still returns nothing. What's the problem?
With Selenium, assuming that driver is your initialized WebDriver object, this will give you the HTML that corresponds to the state of the DOM at the time you make the call:
driver.execute_script("return document.documentElement.outerHTML;")
The return value is a string so you could do:
print(driver.execute_script("return document.documentElement.outerHTML;"))
When I use Selenium for tasks like this, I know the driver's page_source attribute does get updated.
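To tie both answers together, here is a minimal Selenium sketch that clicks "More" and only reads the page once new content has actually arrived. The btn-more XPath comes from the question; the //article locator used for counting items is an assumption about the page's markup:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("http://www.everydayhealth.com")
#count the items currently shown (the //article locator is an assumption)
old_count = len(driver.find_elements_by_xpath('//article'))
#click the "More" button once it is available
WebDriverWait(driver, 10).until(
    lambda d: d.find_element_by_xpath('//a[@class="btn-more"]')).click()
#wait until more items than before are in the DOM
WebDriverWait(driver, 10).until(
    lambda d: len(d.find_elements_by_xpath('//article')) > old_count)
#page_source now includes the newly loaded stories
print(driver.page_source)
driver.quit()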
