I have a basic Selenium script that uses the chromedriver binary. I'm trying to display a page with a reCAPTCHA on it, hang until the challenge has been solved, and then store the resulting token in a variable for future use.
The roadblock I'm hitting is that I am unable to find the recaptcha element.
#!/usr/bin/env python2.7
import os
from selenium import webdriver
driverBin=os.path.expanduser("~/Desktop/chromedriver")
driver=webdriver.Chrome(driverBin)
driver.implicitly_wait(5)
driver.get('http://patrickhlauke.github.io/recaptcha/')
Is there anything special needed to be able to see this element?
Also, is there a way to grab the token after the user solves the challenge, without refreshing the page?
As it is now, the input with id recaptcha-token has type hidden. After the solve, a second recaptcha-token element is created; that is the value I wish to store in a variable. I was thinking of looping and checking the number of elements found with that id, and parsing once it is greater than 1, but I'm unsure whether the page source actually updates.
UPDATE:
With more research, it appears to have to do with the nature of the element, particularly the tag <input type="hidden">. So, to rephrase my question: how does one extract the value of a hidden element?
The element you are looking for (the input) is in an iframe. You'll need to switch to the iframe before you can locate the element and interact with it.
import os
from selenium import webdriver

driver = webdriver.Chrome()
try:
    driver.implicitly_wait(5)
    driver.get('http://patrickhlauke.github.io/recaptcha/')

    # Find the iframe and switch to it
    iframe_path = '//iframe[@title="recaptcha widget"]'
    iframe = driver.find_element_by_xpath(iframe_path)
    driver.switch_to.frame(iframe)

    # Find the input element
    input_elem = driver.find_element_by_id("recaptcha-token")
    print("Found the input element: ", input_elem)
finally:
    driver.quit()
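To answer the rephrased question directly: a hidden input exposes no visible text, but its value can still be read with get_attribute(). A minimal sketch, to be placed inside the try block above right after input_elem is found (note the token is only populated once the user has solved the challenge):

# Hidden inputs render no visible text, but the value attribute is readable
token = input_elem.get_attribute("value")
print("reCAPTCHA token:", token)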
Related
Thanks in advance for the help. I'm new to Python and spent an hour trying to correct this mistake.
I'm trying to locate the login button element. Attached is an image of the website showing the login button element; please see here.
Below is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
url = "https://www.xxxxxx.com/index.php/admin/index"
username = 'xxxx'
password = 'xxxxx'
driver = webdriver.Firefox(executable_path="C:\\Users\\kk\\AppData\\Local\\Programs\\Python\\Python38-32\\geckodriver.exe")
driver.get(url)
driver.find_element_by_name('aname').send_keys(username)
driver.find_element_by_name('apass').send_keys(password)
driver.find_elements_by_xpath('//style[@type="submit"]')
Rather than finding it with that selector, why not use find_element_by_xpath() with the full path?
To get the XPath of the element, right-click the HTML of the input in Inspect Element, hover over Copy, and you'll see "Full XPath".
The issue is your XPath:
driver.find_elements_by_xpath('//style[@type="submit"]')
Use one of the below instead:
driver.find_elements_by_xpath('//input[@type="submit"]')
or
driver.find_elements_by_xpath('//input[@value="login"]')
# the second is more precise, as many input tags could have type="submit"
Also, please use some sort of wait, as I am not sure the page will load fast enough every time you launch the URL.
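For example, a minimal explicit-wait sketch with the corrected locator (the 20-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait until the submit button is actually clickable before interacting
wait = WebDriverWait(driver, 20)
submit = wait.until(EC.element_to_be_clickable((By.XPATH, '//input[@type="submit"]')))
submit.click()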
You can identify the submit button by using either of these two:
//input[@type="submit"] or //input[@value="login"]
They should work without any problem as long as you don't have similar elements on your page (which I doubt you do).
But if you want to be more precise, you can combine the two into:
//input[@value="login" and @type="submit"]
My code:
website = browser.find_element_by_link_text('Website')
if website:
website.click()
else:
print('no website')
What I am trying to do is click the button if it is available on the page. If the button isn't available, I want it to print "no website" to the console and proceed to the next step.
I do not know what I am doing wrong. Does anyone know how to fix this?
Thanks in advance I am new to coding!
Before you can find an element, you first need to visit a website:

import os
from selenium import webdriver

driver = webdriver.Firefox()
print('Accessing web site: {}'.format(os.getenv('VISIT_URL')))
driver.get('http://www.example.com')  # driver.get() needs a full URL, including the scheme
# from here onwards you can access browser elements like buttons, links, etc.
You are giving an instance of WebElement to the if condition, where a boolean expression is expected.
1) Try checking for presence first using find_elements_by_link_text() (note the plural elements), which returns a possibly empty list:
if len(driver.find_elements_by_link_text('Website')) > 0:
    driver.find_element_by_link_text('Website').click()
2) Or use expected_conditions to check whether the element is available (see the sketch after this list); expected_conditions documentation
3) Or use a try/except block:
from selenium.common.exceptions import NoSuchElementException

try:
    website = driver.find_element_by_link_text('Website')
    website.click()
except NoSuchElementException:
    print('no website')  # code to execute if the expected element is not available
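For option 2, a minimal sketch using an explicit wait (the 10-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Wait up to 10 seconds for the link to become clickable
    website = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.LINK_TEXT, 'Website')))
    website.click()
except TimeoutException:
    print('no website')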
How do I click an element using Selenium and BeautifulSoup in Python? I have these lines of code and I am finding it difficult to achieve what I want. I want to click every element in each iteration. There is no pagination or next page; there are only about 10 elements, and after clicking the last element it should stop. Does anyone know what I should do? Here is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
import urllib
import urllib.request
from bs4 import BeautifulSoup
chrome_path = r"C:\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
url = 'https://www.99.co/singapore/condos-apartments/a-treasure-trove'
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html,'lxml')
details = soup.select('.FloorPlans__container__rwH_w')  # whole container of the results
for d in details:
    picture = d.find('span', {'class': 'Tappable-inactive'}).click()  # the single element
    print(d)
driver.close()
Here is the site https://www.99.co/singapore/condos-apartments/a-treasure-trove . I want to scrape the details and the image in every floor plans section but it is difficult because the image only appears after you click the specific element. I can only get the details except for the image itself. Try it yourself so that you know what I mean.
EDIT:
I tried this method
for d in driver.find_elements_by_xpath('//*[@id="floorPlans"]/div/div/div/div/span'):
    d.click()
The problem is that it clicks too fast for the images to load. Also, I'm using Selenium here; is there any method for selecting elements in a BeautifulSoup-like format, such as picture = d.find('span', {'class': 'Tappable-inactive'}).click()?
You cannot interact with website widgets using BeautifulSoup; you need to work with Selenium. There are two ways to handle this problem.
The first is to get the main wrapper (class) of the 10 elements and then iterate over each child element of the main class, as in the sketch below.
Alternatively, you can get the element by XPath and increment the last number in the XPath by one on each iteration to move to the next element.
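A rough sketch of the first approach, reusing the driver from the question and assuming the class names quoted there are correct (a fixed sleep is used here only for simplicity):

import time

container = driver.find_element_by_class_name('FloorPlans__container__rwH_w')
for span in container.find_elements_by_class_name('Tappable-inactive'):
    span.click()
    time.sleep(1)  # crude pause so each image can load before the next click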
I printed some results to check your code.
"details" only has one item, and "picture" is not an element (so it's not clickable):
details = soup.select('.FloorPlans__container__rwH_w')
print(details)
print(len(details))
for d in details:
    print(d)
    picture = d.find('span', {'class': 'Tappable-inactive'})
    print(picture)
Output:
For your edited version, you can check that the images have become visible before you call click().
Use visibility_of_element_located to do that.
Reference: https://selenium-python.readthedocs.io/waits.html
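A minimal sketch of that idea, reusing the XPath from the question; the image locator used in the wait is a placeholder and should be adjusted to whatever element actually appears after the click:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
for d in driver.find_elements_by_xpath('//*[@id="floorPlans"]/div/div/div/div/span'):
    d.click()
    # placeholder locator: wait until an image in the floor-plan section is visible
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#floorPlans img')))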
Disclaimer: I do not have any background in web scraping/HTML/JavaScript/CSS and the like, but I know a bit of Python.
My end goal is to download the 4th image view of every one of the 3515 car views on the ShapeNet website, WITH the associated tag.
For instance, the first of the 3515 pairs would be the image found in the collapse menu on the right of this picture (which can be loaded by clicking on the first item of the first page and then on Images), with the associated tag "sport utility", as can be seen in the first picture (first car, top left).
To do that, with the help of @DebanjanB, I wrote a snippet of code that clicks on the sport utility in the first picture, opens the iframe, clicks on Images, and then downloads the 4th picture (link to my question). The full working code is this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
import os
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.socks", "yourproxy")
profile.set_preference("network.proxy.socks_port", yourport)
#browser = webdriver.Firefox(firefox_profile=profile)
browser = webdriver.Firefox()
browser.get('https://www.shapenet.org/taxonomy-viewer')
#Page is long to load
wait = WebDriverWait(browser, 30)
element = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='02958343_anchor']")))
linkElem = browser.find_element_by_xpath("//*[@id='02958343_anchor']")
linkElem.click()
#Page is also long to display iframe
element = wait.until(EC.element_to_be_clickable((By.ID, "model_3dw_bcf0b18a19bce6d91ad107790a9e2d51")))
linkElem = browser.find_element_by_id("model_3dw_bcf0b18a19bce6d91ad107790a9e2d51")
linkElem.click()
#iframe slow to be displayed
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'viewerIframe')))
#iframe = browser.find_elements_by_id('viewerIframe')
#browser.switch_to_frame(iframe[0])
element = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[3]/div[3]/h4")))
time.sleep(10)
linkElem = browser.find_element_by_xpath("/html/body/div[3]/div[3]/h4")
linkElem.click()
img = browser.find_element_by_xpath("/html/body/div[3]/div[3]//div[@class='searchResult' and @id='image.3dw.bcf0b18a19bce6d91ad107790a9e2d51.3']/img[@class='enlarge']")
src = img.get_attribute('src')
os.system("wget %s --no-check-certificate"%src)
There are several issues with this. First, I need to know by hand the id model_3dw_bcf0b18a19bce6d91ad107790a9e2d51 for each model, and I also need to extract the tag; both can be found by inspecting every displayed image. Then I need to switch pages (there are 22 pages), and maybe even scroll down on each page to be sure I have everything. Secondly, I had to use time.sleep twice, because the other method, based on waiting for the element to be clickable, does not seem to work as intended.
I have two questions. The first one is obvious: is this the right way of proceeding? I feel that even if this could be quite fast without the time.sleep calls, it is very much what a human would do and therefore must be terribly inefficient. Secondly, if it is indeed the way to go: how could I write a double for loop over pages and items to extract the tag and model id efficiently?
EDIT 1: It seems that:
l = browser.find_elements_by_xpath("//div[starts-with(@id,'model_3dw')]")
might be the first step towards completion
EDIT 2: Almost there, but the code is filled with time.sleep calls. I still need to get the tag name and to loop through the pages.
EDIT 3: Got the tag name; I still need to loop through the pages, and will post a first draft of the solution.
So let me try to understand correctly what you mean, and then see if I can help you solve the problem. I do not know Python, so excuse my syntax errors.
You want to click on each and every one of the 183533 cars, and then download the 4th image within the iframe that pops up. Correct?
Now if this is the case, let's look at the first thing you need: the elements on the page with all the cars on it.
So to get all 160 cars on page 1, you are going to need:
elements = browser.find_elements_by_xpath("//img[@class='resultImg lazy']")
This is going to return 160 image elements, which is exactly the number of displayed images (on page 1).
Then you can say:
for el in elements:
    # here you place the code you need to download the 4th image,
    # e.g. switch to the iframe, click on the 4th image, etc.
Now, for the first page, you have made a loop which will download the 4th image for every vehicle on it.
This doesn't entirely solve your problem, as you have multiple pages. Thankfully, the page navigation buttons, previous and next, are greyed out on the first and/or last page.
So you can just say:
browser.find_element_by_xpath("//a[@class='next']").click()
Just make sure you catch the case where the element is not clickable, as it will be greyed out on the last page.
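Putting those pieces together, a rough Python sketch of the loop (the locators come from this answer; exactly which exception a greyed-out link raises depends on the page, so the except clause here is an assumption to adjust):

from selenium.common.exceptions import WebDriverException

while True:
    for el in browser.find_elements_by_xpath("//img[@class='resultImg lazy']"):
        # switch to the iframe, download the 4th image, etc.
        pass
    try:
        browser.find_element_by_xpath("//a[@class='next']").click()
    except WebDriverException:
        break  # the "next" link is greyed out on the last page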
Rather than scraping the site, you might consider examining the URLs that the webpage uses to query the data, then use the Python 'requests' package to simply make API requests directly from the server. I'm not a registered user on the site, so I can't provide you with any examples, but the paper that describes the shapenet.org site specifically mentions:
"To provide convenient access to all of the model and an-
notation data contained within ShapeNet, we construct an
index over all the 3D models and their associated annota-
tions using the Apache Solr framework. Each stored an-
notation for a given 3D model is contained within the index
as a separate attribute that can be easily queried and filtered
through a simple web-based UI. In addition, to make the
dataset conveniently accessible to researchers, we provide a
batched download capability."
This suggests that it might be easier to do what you want via API, as long as you can learn what their query language provides. A search in their QA/Forum may be productive too.
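If the site does expose a Solr query endpoint, a request might take roughly the following shape. Note that the URL and query parameters below are placeholders invented for illustration, not the site's documented API; you would need to discover the real ones, for example from the browser's network tab:

import requests

# Placeholder Solr endpoint and query: inspect the site's network traffic
# to find the real URL and field names before relying on this.
resp = requests.get(
    "https://www.shapenet.org/solr/select",
    params={"q": "category:car", "wt": "json", "rows": 100},
)
resp.raise_for_status()
print(resp.json())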
I came up with this answer, which kind of works, but I don't know how to remove the several calls to time.sleep. I will not accept my own answer until someone finds something more elegant (also, when it arrives at the end of the last page, it fails):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
import os
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.socks", "yourproxy")
profile.set_preference("network.proxy.socks_port", yourport)
#browser = webdriver.Firefox(firefox_profile=profile)
browser = webdriver.Firefox()
browser.get('https://www.shapenet.org/taxonomy-viewer')
#Page is long to load
wait = WebDriverWait(browser, 30)
element = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='02958343_anchor']")))
linkElem = browser.find_element_by_xpath("//*[@id='02958343_anchor']")
linkElem.click()
tag_names = []
page_count = 0
while True:
    if page_count > 0:
        browser.find_element_by_xpath("//a[@class='next']").click()
        time.sleep(2)
    wait.until(EC.presence_of_element_located((By.XPATH, "//div[starts-with(@id,'model_3dw')]")))
    list_of_items_on_page = browser.find_elements_by_xpath("//div[starts-with(@id,'model_3dw')]")
    list_of_ids = [e.get_attribute("id") for e in list_of_items_on_page]
    for i, item in enumerate(list_of_items_on_page):
        # Page is also long to display iframe
        current_id = list_of_ids[i]
        element = wait.until(EC.element_to_be_clickable((By.ID, current_id)))
        car_image = browser.find_element_by_id(current_id)
        original_tag_name = car_image.find_element_by_xpath("./div[@style='text-align: center']").get_attribute("innerHTML")
        count = 0
        tag_name = original_tag_name
        while tag_name in tag_names:
            tag_name = original_tag_name + "_" + str(count)
            count += 1
        tag_names.append(tag_name)
        car_image.click()
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'viewerIframe')))
        element = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[3]/div[3]/h4")))
        time.sleep(10)
        linkElem = browser.find_element_by_xpath("/html/body/div[3]/div[3]/h4")
        linkElem.click()
        img = browser.find_element_by_xpath("/html/body/div[3]/div[3]//div[@class='searchResult' and @id='image.3dw.%s.3']/img[@class='enlarge']" % current_id.split("_")[2])
        src = img.get_attribute('src')
        os.system("wget %s --no-check-certificate -O %s.png" % (src, tag_name))
        browser.switch_to.default_content()
        browser.find_element_by_css_selector(".btn-danger").click()
        time.sleep(1)
    page_count += 1
One can also import NoSuchElementException from selenium and use a while True loop with try/except to get rid of the arbitrary time.sleep calls, as in the sketch below.
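A minimal sketch of that polling idea, applied to the slow h4 lookup in the script above (the poll interval is an arbitrary choice):

import time
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        linkElem = browser.find_element_by_xpath("/html/body/div[3]/div[3]/h4")
        break
    except NoSuchElementException:
        time.sleep(0.5)  # short poll interval instead of a long fixed sleep
linkElem.click()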
from selenium import webdriver
browser=webdriver.Firefox()
browser.get("http://dollarupload.com/dl/08c646d60")
browser.find_element_by_id("reg_download").click()
elementlist=browser.find_elements_by_class_name("offer_title")
Actually, I was trying to get all the elements with the class offer_title, and with those I would like to click the links. But as I can see, elementlist is empty. Why?
I think the elements you want are inside an iframe that points to a different page.
Elements inside an iframe cannot be located from the outer document, so switch to the frame first and then search again; you might have more luck:
browser.switch_to.frame(browser.find_element_by_tag_name("iframe"))
elementlist = browser.find_elements_by_class_name("offer_title")