I'm using Python and Selenium in PyCharm to go to the SEC website to download a 10-K CSV file. Ideally, the program should ask for user input for a "ticker symbol", then go to the SEC's website, input the ticker symbol provided and download the 10-K and 10-Q CSV files from the page. I was using Microsoft's ticker symbol (MSFT) as an example test. The SEC's Edgar search website is this:
https://www.sec.gov/edgar/searchedgar/companysearch.html
and I am using the 'Fast Search' search engine. I created a function 'get_edgar_results' to perform this download. I'm new to web scraping, but I thought I had identified the correct HTML tags for where to put my search term. Previous questions suggested that I might need to have the program wait before searching for the HTML element, so I added code to wait. I keep getting this error:
line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: [id="Find"]
My code is below:
import selenium.webdriver.support.ui as ui
from pathlib import Path
import selenium.webdriver as webdriver

ticker_symbol = input("please provide a ticker symbol: ")

def get_edgar_results(ticker_symbol):
    url = "https://www.sec.gov/edgar/searchedgar/companysearch.html"
    driver = webdriver.Firefox(executable_path=r"C:\Program Files\JetBrains\geckodriver.exe")
    wait = ui.WebDriverWait(driver, 30)
    driver.set_page_load_timeout(30)
    driver.get(url)
    search_box = driver.find_element_by_id("Find")
    search_box.send_keys(ticker_symbol)
    search_box.submit()
    annual_links = driver.find_elements_by_class_name("10-K")
    quarterly_links = driver.find_elements_by_class_name("10-Q")
    results = []
    driver.close()
    driver.quit()
    return results

get_edgar_results(ticker_symbol)
Any help would be greatly appreciated.
Consider using implicit or explicit waits so the driver waits until the element is loaded; this way you can avoid the error shown above. Code for the explicit wait-until method is below, with the imports it needs.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")
delay = 3  # seconds
try:
    myElem = WebDriverWait(browser, delay).until(
        EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
You also seem to have an error in the following code
search_box = driver.find_element_by_id("Find")
search_box.send_keys(ticker_symbol)
search_box.submit()
The id="Find" locates the search button, not the input element, so sending a keys value to a button is incorrect. I would recommend using an XPath to uniquely locate the element of your choice.
The following will send a value to the input box and then click the SEARCH button:
driver.find_element_by_xpath('//*[@id="lesscompany"]').send_keys("your value")
driver.find_element_by_xpath('//*[@id="search_button_1"]').click()
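Putting the two fixes together, a minimal sketch of the corrected function (the ids lesscompany and search_button_1 come from the snippet above and should be verified against the live page):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_edgar_results(ticker_symbol):
    driver = webdriver.Firefox(executable_path=r"C:\Program Files\JetBrains\geckodriver.exe")
    driver.get("https://www.sec.gov/edgar/searchedgar/companysearch.html")
    # wait for the search input instead of querying for it immediately
    search_box = WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="lesscompany"]')))
    search_box.send_keys(ticker_symbol)
    driver.find_element_by_xpath('//*[@id="search_button_1"]').click()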
I am trying to scrape names and odds for a project I am working on with Selenium 4, and I'm just having an issue with the locator.
When I use driver.find_element(By.XPATH), the XPath I give it only seems to work when I have the inspect window open on that particular page. When I close the inspect window, it gives me a NoSuchElementException: no such element: Unable to locate element: error.
Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
url = 'https://www.bet365.com.au/#/AC/B13/C1/D50/E2/F163/'
driver.get(url)
Player1 = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[4]/div[3]/div/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[2]/div/div[1]/div[2]/div[1]/div/div[2]/div/div[1]/div').text
Player2 = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[4]/div[3]/div/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[2]/div/div[1]/div[2]/div[1]/div/div[2]/div/div[2]/div').text
Odds1 = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[4]/div[3]/div/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[2]/div/div[2]/div[2]/span').text
Odds2 = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[4]/div[3]/div/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[2]/div/div[3]/div[2]/span').text
print(f'{Player1}\t{Odds1}')
print(f'{Player2}\t{Odds2}')
Run just the section from Player1 onwards with the inspect window open and without it open. Hopefully you'll be able to replicate the issue.
I also ran
try:
    if driver.find_element(By.XPATH, '/html/body/div[1]/div/div[4]/div[3]/div/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[2]/div/div[1]/div[2]/div[1]/div/div[2]/div/div[1]/div').text:
        print("yay")
except:
    print("nay")
and it seemed to show the same situation. The element couldn't be found without the inspect window open. See attached image for where I got the XPATHs from.
Many thanks in advance!
I am scraping pages of the Italian website publishing new laws (Gazzetta Ufficiale) to save the final page, which holds the law text.
I have a loop that builds a list of the pages to download, and I am attaching a fully working code sample that shows the problem I'm running into (the sample is not looped; I am just doing two "gets").
What is the best way to handle the rare page which does not show the "Visualizza" (show) button but goes straight to the desired full text?
I hope the code is pretty self-explanatory and commented. Thank you in advance, and a super happy 2022!
import time
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome("/Users/bob/Documents/work/scraper/scrape_gu/chromedriver")
# showing the "normal" behaviour
driver.get(
"https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07300&tipoSerie=serie_generale&tipoVigenza=originario"
)
# this page has a "Visualizza" button, find it and click it.
bottoni = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located(
        (By.XPATH, '//*[@id="corpo_export"]/div/input[1]')
    )
)
time.sleep(5) # just to see the "normal" result with the "Visualizza" button
bottoni[0].click() # now click it and this shows the desired final webpage
time.sleep(5) # just to see the "normal" desired result
# but unfortunately some pages go directly to the end result WITHOUT the "Visualizza" button.
# as an example, see the following get:
driver.get(
"https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07249&tipoSerie=serie_generale&tipoVigenza=originario"
) # get a law page
time.sleep(5)  # as you can see we are now on the final desired full page WITHOUT the Visualizza button
# hence the following code, identical to that above, will fail and time out
bottoni = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located(
        (By.XPATH, '//*[@id="corpo_export"]/div/input[1]')
    )
)
time.sleep(5) # just to see the result
bottoni[0].click() # and this shows the desired final webpage
# and the program abends with the following message
# File "/Users/bob/Documents/work/scraper/scrape_gu/temp.py", line 33, in <module>
# bottoni = WebDriverWait(driver, 10).until(
# File "/Users/bob/opt/miniconda3/envs/scraping/lib/python3.8/site-packages/selenium/webdriver/support/wait.py", line 80, in until
# raise TimeoutException(message, screen, stacktrace)
# selenium.common.exceptions.TimeoutException: Message:
Catch the exception with a try/except block: if there is no button, extract the text directly. See Handling Exceptions.
...
urls = [
    'https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07300&tipoSerie=serie_generale&tipoVigenza=originario',
    'https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07249&tipoSerie=serie_generale&tipoVigenza=originario'
]
data = []
for url in urls:
    driver.get(url)
    try:
        bottoni = WebDriverWait(driver, 1).until(
            EC.element_to_be_clickable(
                (By.XPATH, '//input[@value="Visualizza"]')
            )
        )
        bottoni.click()
    except TimeoutException:
        print('no bottoni -')
    finally:
        data.append(driver.find_element(By.XPATH, '//body').text)
driver.close()
print(data)
...
First, using selenium for this task is overkill.
You'd be able to do the same thing using requests or aiohttp coupled with beautifulsoup, except that would be much faster and easier to code.
Now to get back to your question, there are a few solutions.
The simplest would be:
Catch the timeout exception: if the button isn't found, go straight to parsing the law.
Check if the button is present before either clicking on it or parsing the web page; in Python that check is if driver.find_elements(By.ID, "corpo_export"):, since find_elements returns an empty list when nothing matches.
But then again, you'd have a much easier time getting rid of selenium and using beautifulsoup instead.
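For illustration, a minimal requests/BeautifulSoup sketch of that presence check (the input with value="Visualizza" is taken from the Selenium code above; how to follow the button's form is left out, as the page structure beyond that is an assumption):

import requests
from bs4 import BeautifulSoup

url = 'https://www.gazzettaufficiale.it/atto/vediMenuHTML?atto.dataPubblicazioneGazzetta=2021-01-02&atto.codiceRedazionale=20A07249&tipoSerie=serie_generale&tipoVigenza=originario'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

if soup.find('input', value='Visualizza') is None:
    # no "Visualizza" button: this page already holds the full law text
    law_text = soup.get_text()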
Error message:
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: input.ytd-searchbox
I keep getting this error even though I added a sleep command, as suggested in other solutions, so that the page can load its dynamic JavaScript content, but it still cannot find the element.
import time
from selenium import webdriver
firefox = webdriver.Firefox()
firefox.get("https://www.youtube.com")
element = firefox.find_element_by_css_selector("ytd-mini-guide-entry-renderer.style-scope:nth-child(3) > a:nth-child(1)") # opens subscriptions
element.click()
time.sleep(10) # wait for page to load before finding it
searchelement = firefox.find_element_by_css_selector('input.ytd-searchbox') # search bar
searchelement.send_keys("Cute Puppies")
searchelement.submit()
I just changed the CSS selector. You did that wrong here.
Umm... how did I do that? Well, there's an easy trick for building CSS selectors:
Type the tag name first. In your case it's input.
If there's an ID present, type the ID name with # in front of it, as I did: #search.
If there's a class, use . before its name. For example: .search.
Try this, it's working:
import time
from selenium import webdriver
firefox = webdriver.Firefox(executable_path=r'C:\Users\intel\Downloads\Setups\geckodriver.exe')
firefox.get("https://www.youtube.com")
element = firefox.find_element_by_css_selector(".style-scope:nth-child(1) > #items > .style-scope:nth-child(3) > #endpoint .title") # opens subscriptions
element.click()
time.sleep(10) # wait for page to load before finding it
searchelement = firefox.find_element_by_css_selector('input#search') # search bar
searchelement.send_keys("Cute Puppies")
searchelement.submit()
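As a side note, the same explicit-wait idea from the other answers here could replace the fixed time.sleep(10) before the search; a sketch:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 s for the search bar instead of always sleeping 10 s
searchelement = WebDriverWait(firefox, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, 'input#search')))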
Set-up
I'm trying to log in to a website using Python + Selenium.
My code to load the website is,
browser = webdriver.Firefox(
    executable_path='/mypath/to/geckodriver')
url = 'https://secure6.e-boekhouden.nl/bh/'
browser.get(url)
Problem
Selenium cannot locate the element containing the account and password fields.
For example, for the field 'Gebruikersnaam',
browser.find_element_by_id('txtEmail')
browser.find_element_by_xpath('//*[@name="txtEmail"]')
browser.find_element_by_class_name('INPUTBOX')
all give NoSuchElementException: Unable to locate element.
Even worse, Selenium cannot find the body element on the page,
browser.find_element_by_xpath('/html/body')
gives NoSuchElementException: Unable to locate element: /html/body.
I'm guessing something on the page is either blocking Selenium (maybe the 'secure6' in the url) or is written in a language/form Selenium cannot handle.
Any suggestions?
All the elements are inside a frame, which is why Selenium throws the NoSuchElementException. Switch to the frame before performing any actions, as shown below.
browser = webdriver.Firefox(
    executable_path='/mypath/to/geckodriver')
url = 'https://secure6.e-boekhouden.nl/bh/'
browser.get(url)
browser.switch_to.frame(browser.find_element_by_id("mainframe"))
browser.find_element_by_id('txtEmail')
browser.find_element_by_xpath('//*[@name="txtEmail"]')
browser.find_element_by_class_name('INPUTBOX')
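Once you are done inside the frame, switch back to the top-level document before interacting with anything outside it:
browser.switch_to.default_content()  # leave the frame again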
I am trying to run a Selenium WebDriver script in Python, where I try to click on a search field, but it always throws the exception "An element could not be located on the page using the given search parameters."
Here is script:
from selenium import webdriver
from selenium.webdriver.common.by import By

class Exercise:
    def safari(self):
        driver = webdriver.Safari()
        driver.maximize_window()
        url = "https://www.airbnb.com"
        driver.implicitly_wait(15)
        Title = driver.title
        driver.get(url)
        CurrentURL = driver.current_url
        print("Current URL is " + CurrentURL)
        SearchButton = driver.find_element(By.XPATH, "//*[@id='GeocompleteController-via-SearchBarV2-SearchBarV2']")
        SearchButton.click()

note = Exercise()
note.safari()
Please tell me where I am wrong.
There appear to be two matching elements:
The one that matches the search bar is actually the second one, so you'd edit your XPath as follows:
SearchButton = driver.find_element(By.XPATH, "(//*[@id='GeocompleteController-via-SearchBarV2-SearchBarV2'])[2]")
Or simply:
SearchButton = driver.find_element_by_xpath("(//*[@id='GeocompleteController-via-SearchBarV2-SearchBarV2'])[2]")
You can paste your XPath into Chrome's Inspector tool by loading the same website in Google Chrome and hitting F12 (or just right-clicking anywhere and clicking "Inspect"). This shows you the matching elements; if you scroll to 2 of 2, it highlights the search bar, so we want the second result. XPath indices start at 1, unlike most languages (which usually start at 0), so to get the second match, wrap the entire original XPath in parentheses and append [2] to it.
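An equivalent approach, if you prefer to avoid positional XPath indices, is to collect all matches with find_elements and index the resulting list in Python (which is 0-based):

matches = driver.find_elements(By.XPATH, "//*[@id='GeocompleteController-via-SearchBarV2-SearchBarV2']")
SearchButton = matches[1]  # second match, same element as the XPath index [2]
SearchButton.click()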