I've been learning how to use selenium to parse data and I've been doing alright with that process. So I'm trying something different, in that I found data ta parse, but there is a provided export button which to me, sounds like a quicker solution, so I thought I'd have a stab at it. But I'm not quite understanding how it's not working:
browser = webdriver.Chrome()
url = 'https://www.rotowire.com/football/injury-report.php'
browser.get(url)
button = browser.find_elements_by_xpath('//*[#id="injury-report"]/div[2]/div[2]/button[2]')
button.click()
browser.close()
I just want to click on the export csv button on the page.
Also, I haven't looked yet, but my next step would be to specify where to save the csv file it exports. Right now it, defaults to the downloads folder. Is there a way to specify a location without changing the default? Also is there a way to specify a file name?
Try below code to click required button:
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()
url = 'https://www.rotowire.com/football/injury-report.php'
browser.get(url)
button = wait(browser, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "is-csv")))
button.click()
browser.close()
Also check how to save file to specific folder
Related
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome import options
import unittest
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
#link to website
website = 'https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance'
path = ('../chromedriver') #Folder location where is the chromedriver
driver = webdriver.Chrome(path)
driver.get(website)
driver.maximize_window()
#Selection of the sections where the information I am looking for is located
state_click = driver.find_element(by=By.XPATH, value='//*[#id="show-statewide"]/a').click()
time_series_click = driver.find_element(by=By.XPATH, value='.//*[#id="time-series"]/div[3]/button').click()
#selection of the years (for all files the same range of 1950 - 2021)
star_year_dropdown =Select(driver.find_element(by=By.ID, value='begyear'))
star_year_dropdown.select_by_visible_text('1950')
end_year_dropdown = Select(driver.find_element(by=By.ID, value='endyear'))
end_year_dropdown.select_by_visible_text('2021')
#selection of the parameter to download: Average temperature
parameter_dropdown = Select(driver.find_element(by=By.ID, value='parameter'))
parameter_dropdown.select_by_visible_text('Average Temperature')
#Creating a loop to loop through all the states and all the months:
#state selection
select_state = driver.find_element(by=By.XPATH, value='.//*[#id="state"]')
opcion_state = select_state.find_elements(by=By.TAG_NAME, value='option')
#month selection
select_month = driver.find_element(by=By.XPATH, value = '//*[#id="month"]')
opcion_month = select_month.find_elements(by = By.TAG_NAME, value='option')
for option in opcion_month:
option.click()
for option in opcion_state:
option.click()
time.sleep(3)
plot = driver.find_element(by=By.XPATH, value='.//input[#id="submit"]').click()
dowload = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[#id="csv-download"]'))).click()
time.sleep(3)
The code works fine, but in the plot and download functions created (at the end of the whole) when trying to click it gives an error and cannot be solved. I think it is because the web page at the time of executing that command does not display the button in the screen to plot the graph and download the csv. I have tried modifying the waiting time, but not work. Let's see if someone can help me.
Thanks in advance!!
So the problem that you are having with downloading the CSV file is not that the command is wrong (it does work), however, the issue is that the download CSV button is not visible on the page, which prevents you from clicking it.
A way to get around having to visibly see the element on the page and still click it you can do the following:
driver.execute_script("arguments[0].click();", driver.find_element(By.XPATH, '//*[#id="csv-download"]'))
This would be the preferred method, otherwise you would have to scroll to where the button is visible on the page and then click the button. The page has a scrolling effect if trying to click on a button which you can do the following (but the previous method is preferred as it is cleaner and does not take any additional time to do):
from selenium.common.exceptions import ElementClickInterceptedException
download_btn = driver.find_element(By.XPATH, '//*[#id="csv-download"]')
try:
download_btn.click() # the first time you try to click it will throw an error, while navigating the button into the page view
except ElementClickInterceptedException: # catch the exception and proceed as if clicking again
download_btn.click()
Other comments on your code
you have option.click() twice in your code, and it is unclear if you want to click the month or state option - there may be a confusion about which options are clicked. I would suggest renaming your iterator variables appropriately so that you know which buttons are being clicked
I'm testing some web scraping on Instagram with Selenium and Python.
In this case I want to upload a picture.
Normally you have to click on the upload icon and choose the file from a window. How can I manage it with Selenium?
I tried:
driver.find_element_by_class_name("coreSpriteFeedCreation").send_keys('C:\\path-to-file\\file.jpg')
and also with find_element_by_xpath but I get an exception:
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot focus element
I tried also only with click() but nothing happens.
Any Idea?
EDIT
Thanks to #homersimpson comment I tried this:
actions = ActionChains(driver)
element = driver.find_element_by_class_name("coreSpriteFeedCreation")
actions.move_to_element(element)
actions.click()
actions.send_keys('C:\\path-to-file\\file.jpg')
actions.perform()
Now the window to choose the file appears. The problem is that I would like to avoid this window and give directly the path of my file.
If right understand, you are trying to avoid handling with a native window. You can try this:
# get all inputs
inputs = driver.find_elements_by_xpath("//input[#accept = 'image/jpeg']").send_keys(os.getcwd() + "/image.png")
Now you can try all of them. I don't know which of them will work.
More about os.getcwd() is here
To be able to perform this code you have to have an element like this:
<input type="file" name="fileToUpload" id="fileToUpload2" class="fileToUpload">
EDIT:
It looks like instagram turned of input fields interaction for posts. For Account image it still works, but not for posting. I assume it is was done to prevent bots to post images. Anyway, there is a solution for this problem. You can use AutoIt like this:
import autoit
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
ActionChains(driver).move_to_element( driver.find_element_by_xpath("//path/to/upload/button")).click().perform()
handle = "[CLASS:#32770; TITLE:Open]"
autoit.win_wait(handle, 60)
autoit.control_set_text(handle, "Edit1", "\\file\\path")
autoit.control_click(handle, "Button1")
I think I may have found a solution that works for me. I found if you first have the bot click the plus icon while the browser is in the mobile view.
self.driver.find_element_by_xpath("/html/body/div[1]/section/nav[2]/div/div/div[2]/div/div/div[3]")\
.click()
after that, I would immediately send my file to an input tag in the HTML and I find you may need to play around as to which one works but I find the last input tag worked for me.
self.driver.find_element_by_xpath("/html/body/div[1]/section/nav[2]/div/div/form/input")\
.send_keys("/image/path.jpg")
The one weird thing about this is you will have a popup menu on top of the page but your code will still function with this window displayed over the window you are working on.
Addition to HumbleFox's answer. To solve his problem regarding the pop-up box not closing or the file pop-up box not closing (bug)
The solution to this is to make the browser headless here's a part of my code for example:
mobile_emulation = { "deviceName": "Pixel 2" }
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
chrome_options.binary_location = self.opt.binary_location
self.driver = webdriver.Chrome(executable_path=self.path, options=chrome_options)
I've written a script in python in combination with selenium to get some titles out of some images from a webpage. The thing is the content I would like to parse are located near the bottom of that page. So, If i try like the conventional way to grab that, the browse fails.
So, I used a javascript code within my scraper to let the browser scroll to the bottom and it worked.
However, I don't think it's a good solution to keep up so tried with .scrollIntoView() but that didn't work either. What can be the ideal way to serve the purpose?
This is my script:
from selenium import webdriver
import time
URL = "https://www.99acres.com/supertech-cape-town-sector-74-noida-npxid-r922?sid=UiB8IFFTIHwgUyB8IzMxIyAgfCAxIHwgNyM0MyMgfCA4MjEyIHwjNSMgIHwg"
driver = webdriver.Chrome()
driver.get(URL)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") #I don't wish to keep this line
time.sleep(3)
for item in driver.find_elements_by_css_selector("#carousel img"):
print(item.get_attribute("title"))
driver.quit()
Try to use below code that should allow you to scroll to required node and scrape images:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
banks = driver.find_element_by_id("xidBankSection")
driver.execute_script("arguments[0].scrollIntoView();", banks)
images = WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#carousel img")))
for image in images:
print(image.get_attribute("title"))
Some explanation: initially those images are absent in source code and generated inside BankSection once you scrolled to it, so you need to scroll down to BankSection and wait until images generated
You can try below line of code
recentList = driver.find_elements_by_css_selector("#carousel img"):
for list in recentList :
driver.execute_script("arguments[0].scrollIntoView();", list )
print(list.get_attribute("title"))
I'm new to selenium and trying to automate the download of some government data. When using the code below. I manage to navigate to the right page and enter the right parmeter in the form, but then can't find a way to click the 'submit' button. I've tried find_element_by_partial_link_text("Subm").click() and I've tried find_element_by_class_name on a number of class names. Nothing works. Any ideas?
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
main_url="http://data.stats.gov.cn/english/easyquery.htm?cn=E0101"
driver = webdriver.Firefox()
driver.get(main_url)
time.sleep(8)
driver.find_element_by_partial_link_text("Industry").click()
time.sleep(8)
driver.find_element_by_partial_link_text("Main Economic Indicat").click()
time.sleep(8)
driver.find_element_by_id("mySelect_sj").click()
time.sleep(3)
driver.find_element_by_class_name("dtText").send_keys("last72")
time.sleep(4)
try:
driver.find_element_by_class_name("dtFoot").click()
except:
driver.find_element_by_class_name("dtFoot").submit()
Solved my own problem, the key was using
driver.find_element_by_class_name(`dtTextBtn`)
instead of
driver.find_element_by_class_name(`dtTextBtn f10`)
The latter was what I saw in the source code, but the f10 blocked selenium.
I need to scrape this page (which has a form): http://kllads.kar.nic.in/MLAWise_reports.aspx, with Python preferably (if not Python, then JavaScript). I was looking at libraries like RoboBrowser (which is basically Mechanize + BeautifulSoup) and (maybe) Selenium but I'm not quite sure on how to go about it. From inspecting the element, it seems to be a WebForm that I need to fill in. After filling that in, the webpage generates some data that I need to store. How should I do this?
You can interact with the javascript web forms relatively easily in Selenium. You may need to install a webdriver quickly, but besides that all you need to do is find the form using its xpath and then have Selenium select an option from the drop down menu using the option's xpath. For the web page provided that would look something like this:
#import functions from selenium module
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# open chrome browser using webdriver
path_to_chromedriver = '/Users/Michael/Downloads/chromedriver'
browser = webdriver.Chrome(executable_path=path_to_chromedriver)
# open web page using browser
browser.get('http://kllads.kar.nic.in/MLAWise_reports.aspx')
# wait for page to load then find 'Constituency Name' dropdown and select 'Aland (46)''
const_name = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[#id="ddlconstname"]')))
browser.find_element_by_xpath('//*[#id="ddlconstname"]/option[2]').click()
# wait for the page to load then find 'Select Status' dropdown and select 'OnGoing'
sel_status = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[#id="ddlstatus1"]')))
browser.find_element_by_xpath('//*[#id="ddlstatus1"]/option[2]').click()
# wait for browser to load then click 'Generate Report'
gen_report = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[#id="BtnReport"]')))
browser.find_element_by_xpath('//*[#id="BtnReport"]').click()
Between each interaction, you are just giving the browser some time to load before attempting click the next element. Once all the forms are filled out, the page will display the data based on the options selected and you should be able to scrape the table data. I had a few issues when attempting to load data for the first Constituency Name option, but the others seemed to work fine.
You should also be able to loop through all the dropdown options available under each web form to display all the data.
Hope that helps!