I have been trying to understand how switching to a frameset works with Python Selenium, to no avail.
The website linked here is very similar to the web application that I am working on.
https://www.quackit.com/html/tutorial/frame_example_frameset_1.html
Ideally I would like to access one element (element-1) in the first frame and then move to the second frame and access another (element-2).
Here is one approach.
Load the initial page. Use an XPath expression to find the two frame elements. Then, for each of them, get its URL. Now you can use driver.get again (for each URL) to load the page corresponding to the frame, and then find the p element that you want.
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('https://www.quackit.com/html/tutorial/frame_example_frameset_1.html')
>>> for frame in driver.find_elements_by_xpath('.//frame'):
...     frame.get_attribute('src')
...
'https://www.quackit.com/html/tutorial/frame_example_left.html'
'https://www.quackit.com/html/tutorial/frame_example_right.html'
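From there, a minimal sketch of the rest of the approach (assuming the text you want sits in a p element on each frame's page; collect the URLs first, because the WebElement references go stale once you navigate away):
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.quackit.com/html/tutorial/frame_example_frameset_1.html')
# grab the frame URLs up front; the elements become stale after navigation
srcs = [f.get_attribute('src') for f in driver.find_elements_by_xpath('.//frame')]
for src in srcs:
    driver.get(src)  # load the frame's page as a normal page
    print(driver.find_element_by_xpath('.//p').text)
Alternatively, Selenium can switch into a frame in place with driver.switch_to.frame(frame_element) and back to the top document with driver.switch_to.default_content(), which avoids the extra page loads.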
Any questions, please ask. If this does what you want, please mark the answer 'accepted' since that's the protocol on SO.
I have never done website scraping before. Not even sure if this is the way to go.
I want to be able to collect data from the tables shown in the image, which update five times per second for every parameter. This data will be available on a webserver (IP-accessible) created automatically by a microchip. I want to collect and save this data to a database quickly enough to keep up.
Am I correct to be looking into Beautiful Soup/Selenium? If not, what tools can I use to collect and store data and make sure it is updated every second?
Any help much appreciated!
PS: I only know Python and SQL.
Open Inspect Element (the Network tab) and see whether it's a WS connection or simple AJAX.
If it's WS, then use https://pypi.org/project/websocket-client/
If it's AJAX, then copy that request as cURL and convert it to Python code using https://curl.trillworks.com/
PS: you do not need to use Selenium at all.
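For the WS case, a rough sketch with websocket-client (the endpoint address is a placeholder; use whatever URL shows up in the Network tab):
from websocket import create_connection  # pip install websocket-client

ws = create_connection('ws://192.168.1.50/ws')  # placeholder device address
while True:
    message = ws.recv()  # blocks until the chip pushes the next update
    print(message)
For the AJAX case, the converted code boils down to a requests call that you repeat in a loop.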
I advise you to check out cURL, whatever language you want to use.
What you want to do is possible with Selenium, though.
With Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

s = Service('chromedriver.exe')
driver = webdriver.Chrome(service=s)
url = ...
driver.get(url)
while True:
    element = driver.find_elements(By.CLASS_NAME, "nameOfElement")
    # or driver.find_elements(By.TAG_NAME, "nameOfElement"), or By.ID, etc.
    content = element[0].text
    time.sleep(1)
    # etc ...
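Since you know SQL, the storage side can be plain sqlite3 from the standard library. A minimal sketch (the table and column names are made up):
import sqlite3
import time

conn = sqlite3.connect('readings.db')
conn.execute('CREATE TABLE IF NOT EXISTS readings (ts REAL, value TEXT)')

def save(value):
    # timestamp each reading so the 5-per-second stream can be reconstructed later
    conn.execute('INSERT INTO readings VALUES (?, ?)', (time.time(), value))
    conn.commit()
Call save(content) inside the while loop above. Note that time.sleep(1) caps you at one reading per second; shorten or drop it if you really need all five updates.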
This is the site I want to scrape.
I want to scrape all the information in the table on the first page, then click through to the second page and do the same, and so on until the 51st page. I know how to use Selenium to click on page two:
from selenium import webdriver

link = "http://www.nigeriatradehub.gov.ng/Organizations"
driver = webdriver.Firefox()
driver.get(link)
xpath = '/html/body/form/div[3]/div[4]/div[1]/div/div/div[1]/div/div/div/div/div/div[2]/div[2]/span/a[1]'
driver.find_element_by_xpath(xpath).click()
But I don't know how to set the code up so that it cycles through each page. The process of me getting the xpath is a manual one in the first place (I go on to Firefox, inspect the item and copy it into the code), so I don't know how to automate that step in and of itself and then the following ones.
I tried going a level higher in the webpage HTML and choosing the entire section of the page with the elements I want, and cycling through them, but that doesn't work because what comes back is a Firefox web object (see the error below).
Using the XPath of the higher-level pager element from the page source, like so:
path = '//*[@id="dnn_ctr454_View_OrganizationsListViewDataPager"]'
driver.find_element_by_xpath(path)
and trying to see if I can cycle through it:
for i in driver.find_element_by_xpath(path):
    i.click()
I get the following error:
TypeError: 'FirefoxWebElement' object is not iterable
Any advice would be greatly appreciated.
This error message implies that you are trying to iterate through a single WebElement, whereas only list objects are iterable.
Solution
Within the for() loop, to get a list whose elements you can iterate through, you need to use find_elements* instead of find_element*. So your effective code block will be:
for i in driver.find_elements_by_xpath(path):
    i.click()
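To cycle through all 51 pages, the general pattern is: scrape the current page, click the link for the next page, and re-find everything on each iteration, because the old element references go stale once the page reloads. A sketch (the row XPath and the numbered pager links are assumptions; adjust them to the actual markup):
import time
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://www.nigeriatradehub.gov.ng/Organizations")

for page in range(1, 52):  # pages 1 through 51
    # scrape whatever you need from the current page's table
    for row in driver.find_elements_by_xpath('//table//tr'):
        print(row.text)
    if page < 51:
        # assumes the pager renders the next page number as a link
        driver.find_element_by_link_text(str(page + 1)).click()
        time.sleep(2)  # crude wait for the next page to render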
I am attempting to scrape the Census website for ACS data. I have scripted the whole process using Selenium except for the very last click. I am using Python. I need to click a download button that is in a window that pops up when the data is zipped and ready, but I can't seem to identify this button. It also seems that the button might change names based on when it was last run (for example, yui-gen2, yui-gen3, etc.), so I am thinking I might need to account for this somehow, although I normally only see yui-gen2.
Also, the tag seems to be in a "span" which might be adding to my difficulty honing in on the button I need to click.
Please help if you can shed any light on this for me.
code snippet:
#Refine search results to get tables
driver.find_element_by_id("prodautocomplete").send_keys("S0101")
time.sleep(2)
driver.find_element_by_id("prodsubmit").click()
driver.implicitly_wait(100)
time.sleep(2)
driver.find_element_by_id("check_all_btn_above").click()
driver.implicitly_wait(100)
time.sleep(2)
driver.find_element_by_id("dnld_btn_above").click()
driver.implicitly_wait(100)
driver.find_element_by_id("yui-gen0-button").click()
time.sleep(10)
driver.implicitly_wait(100)
driver.find_element_by_id("yui-gen2-button").click()
Instead of using the element id, which as you pointed out varies, you can use XPath as Nogoseke mentioned or CSS Selector. Be careful to not make the XPath/selector too specific or reliant on changing values, in this case the element id. Rather than using the id in XPath, try expressing the XPath in terms of the DOM structure (tags):
//*/div/div/div/span/span/span/button[contains(text(),'Download')]
TIL you can validate your XPath with the browser's search function rather than by running it in Selenium: right-click the webpage, choose "Inspect Element", press Ctrl+F, and type in the above XPath to confirm that it matches the Download button.
For posterity, if the above XPath is too specific, i.e. it is reliant on too many levels of the DOM structure, you can do something shorter, like
//button[contains(text(),'Download')]
although, this may not be specific enough and may require an additional field, since there may be multiple buttons on the page with the 'Download' text.
Given the HTML you provided, you should be able to use
driver.find_element_by_id("yui-gen2-button")
I know you said you tried it but you didn't say if it works at all or what error message you are getting. If that never works, you likely have an IFRAME that you need to switch to.
If it works sometimes but not consistently due to changing ID, you can use something like
driver.find_element_by_xpath("//button[.='Download']")
In Chrome's element inspection view you can right-click on the item you want to find and copy its XPath. You can then find your element by XPath in Selenium.
I'm trying to click through to "https://2018.navalny.com/hq/arkhangelsk/" from the website's main page. However, I get this error:
selenium.common.exceptions.ElementNotInteractableException: Message:
There's nothing after "Message:"
My code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox()
browser.get('https://2018.navalny.com/')
time.sleep(5)
linkElem = browser.find_element_by_xpath("//a[contains(@href,'arkhangelsk')]")
type(linkElem)
linkElem.click()
I think XPath is necessary for me because, ultimately, my goal is to click not a single link but 80 links on this webpage. I've already managed to print all the relevant links using this:
driver.find_elements_by_xpath("//a[contains(@href,'hq')]")
However, for starters, I'm trying to make it click at least a single link.
Thanks for your help,
The best way to figure out issues like this is to look at the page source using the developer tools of your preferred browser. For instance, when I go to this page, open the HTML tab of Firebug, and search for //a[contains(@href,'arkhangelsk')], I can see where the link sits in the markup.
The link is located within a div which is currently not visible (in fact the entire sub-section starting from the div with id="hqList" is hidden). Selenium will not allow you to click on invisible elements, although it will allow you to inspect them. Hence getting the element works; clicking on it does not.
What you do with it depends on what your expectations are. In this particular case it looks like you need to click on <label class="branches-map__toggle-label" for="branchesToggle">Список</label> ("Список" means "List") to make that link visible. So add this:
browser.find_element_by_link_text("Список").click()
after that you can click on any links in the list.
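To extend that to all 80 links, collect the hrefs first and navigate to each URL directly, rather than clicking stored elements (they go stale after the first navigation). A sketch, reusing the XPath from your question:
# reveal the hidden list first, as above
browser.find_element_by_link_text("Список").click()
links = browser.find_elements_by_xpath("//a[contains(@href,'hq')]")
hrefs = [link.get_attribute('href') for link in links]
for href in hrefs:
    browser.get(href)  # visit each regional page in turn
    # ... do whatever you need on the page here ...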
I want to use Python to edit an element on a webpage. I've been trying to figure out how to use selenium to do that. Right now, this is what I have so far...
driver = webdriver.Chrome()
driver.get('https://www.website.com')
elem = driver.find_element_by_id('id')
print(elem)
Reading through the documentation (http://selenium-python.readthedocs.io/getting-started.html), I noticed they do the following:
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
But I'm a little confused... is that changing the id name? I want to change a different aspect of the element I find. If you could point me in the right direction, or help me print out something more useful than elem (typical output looks like session="8428be97c843ee6fecc9038bceccbc0e", element="0.0761228464802568-1"), I'd really appreciate it!
Selenium isn't a tool for editing elements on a website. It is commonly used to automate tests that imitate user behavior on a website.
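As for printing something more useful than elem: the session=.../element=... string you are seeing is just Selenium's internal handle. You can print the element's actual properties instead (a quick sketch; 'id' stands for whatever id you located):
elem = driver.find_element_by_id('id')
print(elem.text)                        # the element's visible text
print(elem.tag_name)                    # e.g. 'div' or 'input'
print(elem.get_attribute('outerHTML'))  # the element's full HTML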