Generalised solution to find all images on a website, Selenium Python - python

I want to collect all images on a website without manually clicking through every one. For example, the first page contains a bunch of thumbnails; clicking the first one leads to another page with the story and its images, where I retrieve the image/text, and so on.
Site of interest is: https://www.homestolove.com.au/australian-house-and-garden.
I have found that all image XPaths end with /img, as below. Is there a way to do this using Selenium? Can I use regex to search the pages? I am using Selenium because of the JS-rendered content.
xpath
//*[@id="app"]/div/div/div[3]/div/div[1]/div[1]/section/div/div[1]/div[3]/article[3]/div/a/span/picture/img
Or do I need to click on every page, kinda painful.
How is this normally done?
Thanks.

Since you said you want to select the images, you can use
list_of_img = driver.find_elements_by_tag_name('img')
and then iterate through list_of_img.
So in your case,
from selenium import webdriver

# Point Chrome at the local chromedriver binary
driver = webdriver.Chrome('webdriver-meta\\chromedriver.exe')
driver.get("https://www.homestolove.com.au/australian-house-and-garden")
# Every <img> currently rendered on the page
imgs = driver.find_elements_by_tag_name('img')
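To make the iteration concrete, here is a minimal sketch. The `collect_image_sources` helper is a hypothetical name of my own, not part of the answer; it simply deduplicates the src attributes of the elements found above.

```python
def collect_image_sources(elements):
    # Pull the unique, non-empty src attributes out of <img> elements.
    seen = []
    for el in elements:
        src = el.get_attribute("src")  # works on any Selenium WebElement
        if src and src not in seen:
            seen.append(src)
    return seen

# Usage with the driver set up above:
# for src in collect_image_sources(imgs):
#     print(src)
```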

Related

How do I loop through these web pages with selenium?

I am new to programming but am getting familiar with web-scraping.
I wish to write a code which clicks on each link on the page.
In my attempted code, I have made a sample of just two links to click on to speed things up. However, my current code is only yielding the first link to be clicked on but not the second.
from selenium import webdriver
import csv

driver = webdriver.Firefox()
driver.get("https://www.betexplorer.com/baseball/usa/mlb-2018/results/?stage=KvfZSOKj&month=all")

matches = driver.find_elements_by_xpath('//td[@class="h-text-left"]')
m_samp = matches[0:2]  # sample of two links
for i in m_samp:
    i.click()
    driver.get("https://www.betexplorer.com/baseball/usa/mlb-2018/results/?stage=KvfZSOKj&month=all")
Ideally, I would like it to click the first link, then go back to the previous page, then click the second link, then go back to the previous page.
Any help is appreciated.
First collect all the clickable URLs into one list, then iterate over that list:
list_urls = ["url1", "url2"]
for i in list_urls:
    driver.get(i)
Save all the URLs up front; otherwise going back and clicking will not work, because you have only one driver instance, and the element references from the first page go stale once you navigate away from it.
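A short sketch of that approach, assuming the asker's page. `extract_hrefs` is a hypothetical helper name, and the `/a` locator is an assumption that the match links sit inside those td cells:

```python
def extract_hrefs(anchors):
    # Read the href strings up front; after driver.get() navigates away,
    # the original element references would go stale.
    return [a.get_attribute("href") for a in anchors if a.get_attribute("href")]

# Assumed usage with the asker's setup:
# anchors = driver.find_elements_by_xpath('//td[@class="h-text-left"]/a')
# for url in extract_hrefs(anchors):
#     driver.get(url)   # visit each match page in turn
```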

how to extract openload link from a website using selenium in python

I have been trying to extract the src, which is an openload link, from a website.
The src is located in an iframe which is loaded dynamically.
The website is "https://www1.fmovies.se/film/daddys-home-2.kk29w".
The problem is that the iframe is loaded dynamically. This is my code:
from selenium import webdriver

# Raw string keeps the Windows path backslashes intact
driver = webdriver.Chrome(r'C:\Users\aman krishna\Desktop\New folder(3)\chromedriver.exe')
driver.get("https://bmovies.to/film/daddys-home-2.kk29w/78vp5j")
driver.find_element_by_xpath("//iframe[contains(@src, 'https://openload.co/embed/qe3n5GZGyGo/?autostart=true')]")
I could spoon-feed you the code, as I wrote my own scraper with Python that is eerily similar to what you've posted, but that wouldn't help you in the long run.
I'll give you a hint, though: use the page's var box and var frames JavaScript variables to get what you need.
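As a hedged sketch, separate from the hint above: rather than hard-coding the full embed URL in the XPath, you could match on a partial src and read the attribute back once the iframe appears. The explicit wait and the `is_openload_embed` helper are my own assumptions, not part of the answer:

```python
def is_openload_embed(src):
    # True when an iframe src points at an openload embed URL.
    return bool(src) and "openload.co/embed/" in src

# Assumed usage -- the iframe loads dynamically, so an explicit wait is
# safer than locating it immediately:
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import WebDriverWait
# from selenium.webdriver.support import expected_conditions as EC
# iframe = WebDriverWait(driver, 15).until(EC.presence_of_element_located(
#     (By.XPATH, "//iframe[contains(@src, 'openload.co/embed/')]")))
# src = iframe.get_attribute("src")
```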

Why does trying to click with selenium brings up "ElementNotInteractableException"?

I'm trying to click through to "https://2018.navalny.com/hq/arkhangelsk/" from the website's main page. However, I get this error:
selenium.common.exceptions.ElementNotInteractableException: Message:
There's nothing after "Message:"
My code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox()
browser.get('https://2018.navalny.com/')
time.sleep(5)
linkElem = browser.find_element_by_xpath("//a[contains(@href,'arkhangelsk')]")
type(linkElem)
linkElem.click()
I think XPath is necessary for me because, ultimately, my goal is to click not on a single link but on 80 links on this webpage. I've already managed to print all the relevant links using this:
driver.find_elements_by_xpath("//a[contains(@href,'hq')]")
However, for starters, I'm trying to make it click at least a single link.
Thanks for your help,
The best way to figure out issues like this is to look at the page source using the developer tools of your preferred browser. For instance, when I go to this page, open the HTML tab of Firebug, and search for //a[contains(@href,'arkhangelsk')], I can see where the link sits in the markup.
So the link is located within div, which is currently not visible (in fact entire sub-section starting from div with id="hqList" is hidden). Selenium will not allow you to click on invisible elements, although it will allow you to inspect them. Hence getting element works, clicking on it - does not.
What you do with it depends on what your expectations are. In this particular case it looks like you need to click on <label class="branches-map__toggle-label" for="branchesToggle">Список</label> ("Список" is Russian for "List") to make that link visible. So add this:
browser.find_element_by_link_text("Список").click()
after that you can click on any links in the list.
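Putting the whole flow together as a sketch: reveal the hidden list, collect the hrefs, then visit them. `regional_hrefs` is a hypothetical helper name; collecting the href strings first avoids stale-element errors once the browser navigates away.

```python
def regional_hrefs(anchors):
    # Keep only non-empty hrefs that contain the 'hq' path segment.
    urls = []
    for a in anchors:
        href = a.get_attribute("href")
        if href and "hq" in href:
            urls.append(href)
    return urls

# Assumed usage with the browser set up in the question:
# browser.find_element_by_link_text("Список").click()   # reveal the list
# anchors = browser.find_elements_by_xpath("//a[contains(@href,'hq')]")
# for url in regional_hrefs(anchors):
#     browser.get(url)   # visit each regional page
```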

How to switch to frameset using Python Selenium?

I have been trying to understand how switching to a frameset works using py-selenium, but to no avail.
The website I am stating here is very similar to the web application that I am working on.
https://www.quackit.com/html/tutorial/frame_example_frameset_1.html
Ideally I would like to access element-1 in this image and then move to the second frame and access element-2.
Here is one approach.
Load the initial page. Use an XPath expression to find the two frame elements. Then, for each of them, get its URL. Now you can use driver.get again (for each URL) to load the page corresponding to the frame, and then find the p element that you want.
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('https://www.quackit.com/html/tutorial/frame_example_frameset_1.html')
>>> for frame in driver.find_elements_by_xpath('.//frame'):
... frame.get_attribute('src')
...
'https://www.quackit.com/html/tutorial/frame_example_left.html'
'https://www.quackit.com/html/tutorial/frame_example_right.html'
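To finish the approach as a sketch: `frame_srcs` is a hypothetical helper name, and the `<p>` locator is an assumption about which element you want from each frame.

```python
def frame_srcs(frames):
    # Return each frame's src attribute in document order.
    return [f.get_attribute("src") for f in frames]

# Assumed usage: load each frame's page directly, then grab its first <p>.
# for url in frame_srcs(driver.find_elements_by_xpath('.//frame')):
#     driver.get(url)
#     print(driver.find_element_by_tag_name('p').text)
#
# Alternatively, Selenium can enter a frame in place without reloading:
# driver.switch_to.frame(0)                # index 0 = left frame
# driver.switch_to.default_content()       # back out to the frameset
```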
Any questions, please ask. If this does what you want, please mark the answer 'accepted' since that's the protocol on SO.

Extracting info from dynamic page element in Python without "clicking" to make visible?

For the life of me I can't think of a better title...
I have a Python WebDriver-based scraper that goes to Google, enters a local search such as chiropractors+new york+ny, and, after clicking on More chiropractors+New York+NY, ends up on a results page.
The goal of the scraper is to grab the phone number and full address (including the suite number etc.) of each of the 20 results on such a page. In order to do so, I need to have WebDriver click each of the 20 entries to bring up an overlay over the Google Map.
This is mighty slow. Were it not for having to trigger each of these overlays, I would be able to do everything up to that point with the much faster lxml, by going straight to the ultimate URL of the results page and then extracting via XPath. But I appear to be stuck: I can't get data from the overlay without first clicking the link that brings it up.
Is there a way to get the data out of this page element without having to click the associated links?
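The click-driven approach the question describes can be sketched roughly as follows. Every locator below is a hypothetical placeholder, since Google's result markup is not shown here and changes often; `clean_overlay_text` is my own helper name.

```python
def clean_overlay_text(raw):
    # Strip blank lines and padding from an overlay's text blob; parsing
    # the phone number and address out of it is left to the caller.
    return [line.strip() for line in raw.splitlines() if line.strip()]

# Hypothetical click loop -- locators are placeholders, not real markup:
# results = driver.find_elements_by_xpath("//div[@class='result']")
# for r in results:
#     r.click()                                   # raises the map overlay
#     overlay = driver.find_element_by_xpath("//div[@class='overlay']")
#     print(clean_overlay_text(overlay.text))
```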
