How to Fetch href links in Chromedriver? - python

I am trying to scrape the link from a button. If I click the button, it opens a new tab and I can't navigate in it. So I thought I'd scrape the link and go to it via webdriver.get(link) instead, since this will be a background program. I cannot find any tutorials on this for the most recent version of Selenium. This is in Python.
I tried using
wd.find_element("xpath", 'xpath here')
but that just scrapes the button title. Is there a different tag I should be using?
I've also tried just clicking the button but that opens a new tab and I don't know how to navigate on it, since it doesn't work by default and I'm still fairly new to Chromedriver.
I can't use beautifulsoup to my knowledge, since the webpage must be logged in.

You need to get the href attribute of the button. If your code gets the right button you can just use
button.get_attribute("href")
Of course if you get redirected using Javascript this is a different story, but since you didn't specify I will assume my answer works
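A minimal sketch of that approach, assuming the located element really carries an href attribute. The helper name open_button_target is hypothetical, and the string "xpath" is the same locator strategy used in the question (it is the value of By.XPATH):

```python
def open_button_target(driver, xpath):
    """Read the href of the element at `xpath` and load it in the current tab."""
    # Same locator style as in the question: strategy string plus selector.
    button = driver.find_element("xpath", xpath)
    href = button.get_attribute("href")
    if not href:
        # No href usually means the navigation is done by JavaScript instead.
        raise ValueError("element has no href attribute")
    driver.get(href)  # stays in the current tab, so no window switching is needed
    return href
```

If this raises ValueError, the element is probably not an anchor with a plain link, and the JavaScript case mentioned above applies.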

You can use switch_to to manage multiple windows (tabs) in the same session:
driver.switch_to.window(window_handle)
As an aside, if you want to read an attribute value from an element, you can use the get_attribute() function:
link_value = driver.find_element(By.XPATH, selector).get_attribute("href")
P.S.: the example code is written in Python. If you use another language, use the equivalent Selenium functions for it.
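As a rough sketch of how the window switch could be wired up after a click: record the handles that exist before clicking, then switch to whichever handle is new. The helper name switch_to_new_tab is hypothetical; window_handles and switch_to.window are the standard WebDriver members.

```python
def switch_to_new_tab(driver, known_handles):
    """Switch to the first window handle not in `known_handles` and return it."""
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    raise RuntimeError("no new tab or window was opened")
```

A typical call would be to store before = set(driver.window_handles), click the button, and then call switch_to_new_tab(driver, before). The new tab may take a moment to appear, so a short wait before the call may be needed.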

Related

How can I click "invisible" reCAPTCHA buttons using Selenium web automation?

I am using Python and Selenium to automate this website: https://prenotami.esteri.it
The script I made fills out a form and then clicks a button to advance to the next page. These actions are carried out using Selenium's find_element_by_xpath() function. Recently, the website added a reCAPTCHA that pops up after the button is clicked, and must be completed before advancing.
I have already written a Python script that is capable of surpassing this type of captchas by using the audio option. However, in this particular website, I am not able to find the xpath to the audio button of the reCAPTCHA. Although there is an iframe that contains the reCAPTCHA, there seems not to be anything inside it.
In the first attached image you can see what this website's reCAPTCHA looks like in HTML, compared to another website, shown in the second image, where a #document can be seen inside the iframe.
My intention is to run this program using headless Chrome, so I can't rely on any mouse control functions such as those offered by pyautogui.
I've been scratching my head around this problem for a while, so any advice is useful. Thanks!
Edit: after some research I have found that this type of reCAPTCHA, which doesn't require checking an "I am not a robot" checkbox, is called "invisible reCAPTCHA". The captcha only pops up if the detected activity is suspicious (for example, clicking too fast). I have tried adding random waits and movements to mimic human behaviour, but the captcha still appears after some tries. Since I don't think there is a way to prevent the captcha from appearing 100% of the time, the question of how to click the buttons using Selenium's find_element_by_xpath() function remains the same. Leaving this as a note just in case someone finds it useful.
Have you tried adding the following argument to the Chrome options?
add_argument("--auto-open-devtools-for-tabs")
I managed to interact with captcha
If the position is always fixed, you can use PyAutoGUI to move the mouse and click on it
import pyautogui
pyautogui.click(100, 100) # button coordinates
Since it is in an iframe, you need to switch Selenium's context to the iframe first and then use your XPath.
driver.switch_to.frame("c-la7g7xqfbit4")
captchaBtn = driver.find_element_by_xpath("(//button[@id='recaptcha-audio-button'])[2]")

Clicking multiple <span> elements with Selenium Python

I'm new to using Selenium, and I am having trouble figuring out how to click through all iterations of a specific element. To clarify, I can't even get it to click through one as it's a dropdown but is defined as an element.
I am trying to scrape fanduel; when clicking on a specific game you are presented with a bunch of main title bets and in order to get the information I need to click the dropdowns to get to that information. There is also another drop down that states, "See More" which is a similar problem, but assuming this gets fixed I'm assuming I will be able to figure that out.
So far, I have tried to use:
find_element_by_class_name()
find_element_by_css_selector()
I have also used them in the sense of elements, and tried to loop through and click on each index of the list, but that did not work.
If there are any ideas, they would be much appreciated.
FYI: I am using beautiful soup to scrape the website for the information, I figured Selenium would be helpful making the information that isn't currently accessible, accessible.
This image shows the dropdowns that I am trying to access, in this case the dropdown 'Win Margin'. The HTML code is shown to the left of it.
This also shows that there are multiple dropdowns, varying in amount based off the game.
You can also try using action chains from selenium
menu = driver.find_element_by_css_selector(".nav")
hidden_submenu = driver.find_element_by_css_selector(".nav #submenu1")
ActionChains(driver).move_to_element(menu).click(hidden_submenu).perform()
Source: here
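For the "click through every dropdown" part of the question, a minimal sketch is to collect all matches with find_elements (plural) and click each one. The helper name click_all is hypothetical, and the right CSS selector depends on the page's markup, which is only visible in the screenshots; "css selector" is the value of By.CSS_SELECTOR.

```python
def click_all(driver, css_selector):
    """Click every element matching `css_selector`; return how many were clicked."""
    clicked = 0
    # find_elements returns a list, which is empty when nothing matches.
    for element in driver.find_elements("css selector", css_selector):
        element.click()
        clicked += 1
    return clicked
```

If a click causes the page to re-render, the remaining elements in the list can go stale; in that case the elements would need to be located again after each click.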

Python Selenium - Switch calendar year

I'm trying to click on a button to change the year displayed by the calendar via the python module called Selenium.
I have tried all of the methods presented by other users but nothing seems to work. In fact, it seems to be impossible to click on a href containing some sort of Javascript code.
Have you ever encountered the same problem?
I'll attach 2 pictures (HTML code and UI of the calendar).
Calendar UI
HTML Code
Thanks for your help/time.
From the HTML you provided, you should be able to use either link text or an XPath and click on the link/A tag that contains "<<"
find_element_by_link_text("<<")
find_element_by_xpath("//a[.='<<']")
Either one should work.
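If the year has to move back several steps, the same locator can be used in a loop. This is only a sketch: the helper name click_link_repeatedly is hypothetical, and "link text" is the value of By.LINK_TEXT.

```python
def click_link_repeatedly(driver, link_text, times):
    """Click the link with the given visible text `times` times."""
    for _ in range(times):
        # Locate the link again on every pass, in case the calendar
        # re-renders after each click and the old element goes stale.
        driver.find_element("link text", link_text).click()
```

For example, click_link_repeatedly(driver, "<<", 3) would press the "<<" link three times.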

Selenium Python: Census ACS Data- unable to select Download button in window

I am attempting to scrape the Census website for ACS data. I have scripted the whole process using Selenium except the very last click. I am using Python. I need to click a download button in a window that pops up when the data is zipped and ready, but I can't seem to identify this button. It also seems that the button's id might change based on when it was last run (for example, yui-gen2, yui-gen3, etc.), so I am thinking I might need to account for this somehow, although I normally only see yui-gen2.
Also, the tag seems to be in a "span", which might be adding to my difficulty homing in on the button I need to click.
Please help if you can shed any light on this for me.
code snippet:
#Refine search results to get tables
driver.find_element_by_id("prodautocomplete").send_keys("S0101")
time.sleep(2)
driver.find_element_by_id("prodsubmit").click()
driver.implicitly_wait(100)
time.sleep(2)
driver.find_element_by_id("check_all_btn_above").click()
driver.implicitly_wait(100)
time.sleep(2)
driver.find_element_by_id("dnld_btn_above").click()
driver.implicitly_wait(100)
driver.find_element_by_id("yui-gen0-button").click()
time.sleep(10)
driver.implicitly_wait(100)
driver.find_element_by_id("yui-gen2-button").click()
Instead of using the element id, which as you pointed out varies, you can use XPath as Nogoseke mentioned or CSS Selector. Be careful to not make the XPath/selector too specific or reliant on changing values, in this case the element id. Rather than using the id in XPath, try expressing the XPath in terms of the DOM structure (tags):
//*/div/div/div/span/span/span/button[contains(text(),'Download')]
TIL you can validate your XPath by using the search function, rather than by running it in Selenium. I right-clicked the webpage, "inspect element", ctrl+f, and typed in the above XPath to validate that it is the Download button.
For posterity, if the above XPath is too specific, i.e. it is reliant on too many levels of the DOM structure, you can do something shorter, like
//button[contains(text(),'Download')]
although, this may not be specific enough and may require an additional field, since there may be multiple buttons on the page with the 'Download' text.
Given the HTML you provided, you should be able to use
driver.find_element_by_id("yui-gen2-button")
I know you said you tried it but you didn't say if it works at all or what error message you are getting. If that never works, you likely have an IFRAME that you need to switch to.
If it works sometimes but not consistently due to changing ID, you can use something like
driver.find_element_by_xpath("//button[.='Download']")
In the code inspection view in Chrome you can right-click the item you want to find and copy its XPath. You can then find your element by XPath in Selenium.
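To illustrate why matching on the button text is more robust than matching on the generated id, here is a small offline sketch using Python's standard xml.etree.ElementTree. The markup is hypothetical, modelled on the ids mentioned in the question, and ElementTree supports only a subset of XPath (hence the leading ".//"), whereas Selenium uses the browser's full XPath engine.

```python
import xml.etree.ElementTree as ET


def make_dialog(generated_id):
    """Build hypothetical dialog markup whose ids depend on `generated_id`."""
    return ET.fromstring(
        "<div>"
        f"<span id='{generated_id}'><span>"
        f"<button id='{generated_id}-button'>Download</button>"
        "</span></span>"
        "<button id='cancel'>Cancel</button>"
        "</div>"
    )


def find_download_ids(root):
    """Return the ids of all buttons whose text is exactly 'Download'."""
    # Match on the visible text instead of the unstable id.
    return [b.get("id") for b in root.findall(".//button[.='Download']")]
```

Whether the dialog is built with "yui-gen2" or "yui-gen3", the same text-based expression finds the button, while a lookup by a fixed id would only work for one of them.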

Selenium: How to get current url of a tab without switching to it?

I often open hundreds of tabs when using web browsers, and this slows my computer. So I want to write a browser manager in Python and Selenium , which opens tabs and can save the urls of those tabs, then I can reopen them later.
But it seems like the only way to get the url of a tab in Python Selenium is reading driver.current_url.
I'm wondering if there's a way to get the url of a tab without switching to it?
There is no other way to get the specific tab titles of the browser without switching to the specific TAB as Selenium needs focus on the DOM Tree to perform any operation.
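Given that constraint, one workable sketch is to visit each tab briefly, record its URL, and then return to the tab that was active. The helper name collect_tab_urls is hypothetical; current_window_handle, window_handles, switch_to.window and current_url are standard WebDriver members.

```python
def collect_tab_urls(driver):
    """Return {window_handle: url} for every open tab, restoring focus afterwards."""
    original = driver.current_window_handle
    urls = {}
    try:
        for handle in driver.window_handles:
            driver.switch_to.window(handle)
            urls[handle] = driver.current_url
    finally:
        # Go back to the tab that was active before the scan.
        driver.switch_to.window(original)
    return urls
```

The returned dictionary could then be written to a file and replayed later with driver.get(url) for each entry.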
Just locate the link that opens the other tab and save its href attribute in a string or list.
I am not sure about your actual scenario but we can get the list of all hyperlinks present in the current page. The idea is to collect all web elements with tag "a" and later get their "href" attribute value. Below is a sample code in Java. Kindly modify it accordingly.
//Collecting all hyperlink elements
List<WebElement> allLinks = driver.findElements(By.tagName("a"));
//For each Hyperlink element getting its target href value
for(WebElement link: allLinks)
{
System.out.println(link.getAttribute("href"));
}
Hope this helps you.
Thanks.
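Since the question is about Python, a rough Python equivalent of the Java sample might look like the following. The helper name collect_hrefs is hypothetical, "tag name" is the value of By.TAG_NAME, and unlike the Java loop this version skips anchors that have no href.

```python
def collect_hrefs(driver):
    """Return the href of every <a> element on the current page that has one."""
    links = driver.find_elements("tag name", "a")
    hrefs = (link.get_attribute("href") for link in links)
    # Anchors without an href return None and are left out.
    return [href for href in hrefs if href]
```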
My recommendation is to use an extension for that, or write/extend your own. There seem to be some extensions of that type, like
https://addons.mozilla.org/en-US/firefox/addon/export-tabs-urls-and-titles/
or
https://chrome.google.com/webstore/detail/save-all-tab-urls/bgjfbcjoaghcfdhnnnnaofkjbnelkkcm?hl=en-GB
To my knowledge, there is no way of getting/accessing the url of a webpage without first switching to it.
