Using beautiful soup in a new tab

The code I am running opens a webpage to get data from. When I open a new tab and try to scrape data from the page in that tab, it still scrapes the original webpage. Is there a command or function I should try?
The line that creates the new tab:
driver.switch_to.new_window('tab')
Any help would be greatly appreciated.

First, get all the window handles and switch to another window or tab using the code below:
windows = driver.window_handles
for window in windows:
    driver.switch_to.window(window)
If you have more than two windows or tabs:
def switch_to_windows(driver, title):
    # Switch through every window or tab until one has the expected title.
    for window in driver.window_handles:
        driver.switch_to.window(window)
        print("Window title: ", driver.title, ", Id: ", window)
        if driver.title == title:
            return True
    # No window or tab with that title was found.
    return False
Pass the page's title to make sure you are on the right window or tab. Then you can scrape the data from the correct tab.
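Since the question is about scraping, it may help to see where the HTML comes from. Here is a minimal sketch, assuming Selenium 4, where `driver.switch_to.new_window('tab')` opens the tab and also moves the driver's focus to it. The page source has to be read again after the new tab has loaded its URL, because HTML captured earlier still belongs to the original page. The helper name `html_from_new_tab` and its `url` parameter are made up for illustration.

```python
def html_from_new_tab(driver, url):
    """Open `url` in a new tab and return that tab's HTML.

    `driver` is assumed to be a Selenium 4 WebDriver; the function name is
    hypothetical and not part of Selenium.
    """
    # Open a new tab; the driver's focus moves to it.
    driver.switch_to.new_window('tab')
    # Load the page inside the new tab.
    driver.get(url)
    # Read the source only now, so it reflects the new tab's page.
    return driver.page_source
```

The returned string can then be handed to Beautiful Soup, for example `BeautifulSoup(html, 'html.parser')`, instead of reusing a soup object that was built from the first page.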

Related

Python Selenium: Closing a new tab with driver.close() is probably causing "no such window: target window already closed; web view not found" error

I recently started working with multiple tabs in Selenium and ran into a strange problem. When I execute this code:
WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[-1])
time.sleep(1)
# keyword names match the parameters of look_for_url_in_page_source()
url_in_page_source = eu.look_for_url_in_page_source(
    page_html=driver.page_source,
    left_url_delimiter='placeholder',
    right_url_delimiter='placeholder'
)
driver.close()
driver.switch_to.window(driver.window_handles[0])
# time.sleep(10)  # <--- this is fixing the error
return url_in_page_source
Immediately after the return statement, when I try to visit the extracted URL with driver.get(), I get this error:
Message: no such window: target window already closed
from unknown error: web view not found
However, I found that adding a simple time.sleep(10) just before the return statement fixes the issue. What is stranger to me is that when I lower the wait time to just under 10 seconds, the error still occurs. I have no idea why this happens. Maybe I'm doing something wrong. I would be very grateful for any help and explanations.
Edit:
Here's the source code of eu.look_for_url_in_page_source(), as requested by @JeffC:
def look_for_url_in_page_source(page_html, left_url_delimiter, right_url_delimiter):
    print('Processing URL with f:look_for_url_in_page_source()')
    # extracts multiple URLs from page_source
    extracted_urls = []
    while True:
        # check if delimiters are present in page_html
        find_left_url_delimiter = page_html.find(left_url_delimiter)
        find_right_url_delimiter = page_html.find(right_url_delimiter)
        if find_left_url_delimiter == -1 or find_right_url_delimiter == -1:
            if len(extracted_urls) > 0:
                return extracted_urls
            print('f:look_for_url_in_page_source() was not able to get any text.')
            return False
        left_url_delimiter_pos = find_left_url_delimiter + len(left_url_delimiter)
        right_url_delimiter_pos = page_html[left_url_delimiter_pos:].find(right_url_delimiter) + left_url_delimiter_pos
        extracted_url = page_html[left_url_delimiter_pos:right_url_delimiter_pos].strip()
        extracted_urls.append(extracted_url)
        page_html = page_html[right_url_delimiter_pos:]

Closing a tab and switching back to the parent browsing context involves many steps that are not visible to the naked eye. Having said that, neither
driver.switch_to.window(driver.window_handles[-1])
is an ideal way to switch to the new tab, nor is
driver.switch_to.window(driver.window_handles[0])
an ideal way to switch to the parent tab.
You can find a detailed discussion on tab switching in "Open web in new tab Selenium + Python".
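One way to avoid relying on the position of a handle in `driver.window_handles` is to remember the parent handle before the new tab opens and to pick the new tab as the handle that was not there before. The following is a sketch under that assumption, not a confirmed fix for the error in the question. `wait_for_new_handle` and its parameters are hypothetical names, and the polling loop stands in for an explicit wait.

```python
import time


def wait_for_new_handle(driver, known_handles, timeout=10.0, poll=0.2):
    """Return the first window handle that is not in `known_handles`.

    Polls `driver.window_handles` until a new handle shows up or `timeout`
    seconds pass, in which case TimeoutError is raised.
    """
    deadline = time.monotonic() + timeout
    while True:
        for handle in driver.window_handles:
            if handle not in known_handles:
                return handle
        if time.monotonic() >= deadline:
            raise TimeoutError("no new window or tab appeared")
        time.sleep(poll)
```

A caller would store `parent = driver.current_window_handle` and `known = set(driver.window_handles)` before triggering the new tab, switch to the returned handle, call `driver.close()` when done, and then call `driver.switch_to.window(parent)` with the stored handle.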
References
A few relevant discussions:
"NoSuchWindowException: no such window: window was already closed" while switching tabs using Selenium and WebDriver through Python3

pywinauto/others - getting opened tabs url from Edge

I have a question about whether it is possible to get the URLs of all open tabs in the Edge browser.
I started with pywinauto, and I can get all the tab names from Chrome or Edge:
from pywinauto import Desktop
desktop = Desktop(backend="uia")
window = desktop.windows(title_re="edge", control_type="Window")[0] # change title_re to "chrome" to get values for chrome browser
wrapper_list = window.descendants(control_type="TabItem")
tab_names = [tab.window_text() for tab in wrapper_list]
And yes, this gives me all the open Edge tabs.
I also tried the code below:
import pywinauto
app = pywinauto.Application(backend='uia')
app.connect(title_re=".*Microsoft Edge.*", found_index=0)
dlg = app.top_window()
wrapper = dlg.child_window(title="App bar", control_type="ToolBar")
url = wrapper.descendants(control_type='Edit')[0]
print(url.get_value())
This did not work: the line "url = wrapper.descendants(control_type='Edit')[0]" raises the error:
"ElementNotFound"
So how can I get the URLs of all the open tabs? Or of just one tab, using its title?
Using pywinauto is not a requirement, of course.
Have a nice day!

There is an easier way to do that, and it can do more, for example get the title of each tab:
from clicknium import clicknium as cc
if not cc.edge.extension.is_installed():
    cc.edge.extension.install_or_update()
for browser in cc.edge.browsers:
    for tab in browser.tabs:
        print(tab.url)

Refreshing the source code without refreshing the web page

I want to measure the number of cookies depending on the policy accepted by the user.
For example, on the website https://sephora.fr I first open the control panel:
driver = webdriver.Chrome(PATH)
driver.implicitly_wait(10)
driver.get(website)
driver.find_element_by_id('footer_tc_privacy_button').click()
Then I would like to click the black button ("Enregistrer"):
driver.find_element_by_id('save-consent').click()
The problem is that the HTML source is updated after the first click, but Selenium keeps the initial source, so it can't find the 'save-consent' button.
Unfortunately, I can't refresh the page because that would close the control panel.
I tried sleeping for a few seconds after the first click, but that doesn't work.
Any ideas?
Edit: I also tried switching to the iframe
frame = driver.find_element_by_xpath('//frame[@name="privacy-iframe"]')
driver.switch_to.frame(frame)
because the "Enregistrer" button is inside this iframe:
<iframe id="privacy-iframe" class="tc-reset-css tc-privacy-center-iframe" src="https://cdn.trustcommander.net/privacy-center/default/modern/index.html" title="Vos paramètres cookies" lang="fr"></iframe>
but that is not working either.
EDIT: Solution
I switched to the iframe
frame = driver.find_element_by_id('privacy-iframe')
driver.switch_to.frame(frame)
driver.find_element_by_id('save-consent').click()
and then switched back to the parent:
driver.switch_to.parent_frame()

I do not see a name attribute in the iframe tag shared by the OP. Try the ID instead:
frame = driver.find_element_by_id('privacy-iframe')
driver.switch_to.frame(frame)
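Because every lookup inside the iframe has to be followed by a switch back to the parent document, the two steps can be paired in a small context manager. This is only a sketch: `inside_frame` is a hypothetical helper, not part of Selenium, and it assumes a driver that exposes `switch_to.frame()` and `switch_to.parent_frame()`.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame` for the duration of a `with` block."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Always return to the parent document, even if a lookup fails.
        driver.switch_to.parent_frame()
```

With the `frame` element found as above, the click on 'save-consent' would go inside `with inside_frame(driver, frame):`, and the driver is back on the parent page once the block ends.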

How to open new window using robot framework, selenium?

There is a link on the page I am testing that opens in a new tab by default. I need to open the link and verify some values on the newly opened page.
As far as I could find, Selenium does not support working with tabs, so I am trying to open the link in a new window, but it still does not work.
I implemented a Python function to hold the SHIFT key (I have done this before for CTRL and it works) and then called the "click" function, but the link still opens in a new tab:
from robot.libraries.BuiltIn import BuiltIn
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
class CustomSeleniumLibrary(object):
    def __init__(self):
        self.driver = None
        self.library = None
        self.ac = None

    def get_library_instance(self):
        if self.library is None:
            self.library = BuiltIn().get_library_instance('ExtendedSelenium2Library')
        return self.library

    def get_action_chain(self):
        if self.ac is None:
            self.ac = ActionChains(self.get_library_instance()._current_browser())
        return self.ac

    def hold_shift(self):
        actionChain = self.get_action_chain()
        actionChain.key_down(Keys.SHIFT)
        actionChain.perform()
The Robot Framework keyword is:
Open project detail
    wait until element is visible    ${LINK_TO_PROJECT}
    ${project}=    get text    ${LINK_TO_PROJECT}
    hold shift
    click element    ${LINK_TO_PROJECT}
    #sleep    2s
    #release shift
    element should contain    //h3    Project Details: ${project}
I tried many variants with sleeps, releasing the key, and so on, but the link never opens in a new window. I also tried to verify the data in the newly opened tab (without trying to open a new window), but focus returns to the original tab so quickly that the DOM of the new tab has not loaded yet. Thanks for any suggestions!

You can use the code below to handle a page opened in a new tab:
current = driver.current_window_handle
driver.find_element_by_css_selector('a').click() # use own selector
new_tab = [tab for tab in driver.window_handles if tab != current][0]
driver.switch_to.window(new_tab)
# do some actions
driver.close()
driver.switch_to.window(current)
You can also use a small hack (not recommended, but...) to avoid handling new tabs and force the link to open in the current tab:
link = driver.find_element_by_css_selector('a')
driver.execute_script('arguments[0].target="_self";', link)

Downloading multiple files using Selenium click()?

Using Firefox, Python, and Selenium, I am able to use click() on a file link on a webpage to download it, and the file downloads to my Downloads folder as expected.
However, when I add more lines to click() on more than one link, the script no longer runs as expected. Instead of the files being downloaded, they all open in separate browser windows, which close after the script completes.
Is this by design, is there a way around it, or is there a better way to download multiple files from a webpage?
This is the website in question: https://www.treasury.gov/about/organizational-structure/ig/Pages/igdeskbook.aspx
I am trying to download the files linked from the Introduction and all parts of Volumes 1-4.
I have a dictionary of the locators:
IgDeskbookPageMap = dict(
    IgDeskbookBannerXpath="//div[contains(text(), 'The Inspector General Deskbook')]",
    IgDeskbookIntroId="anch_202",
    IgDeskbookVol1Part1Id="anch_203",
    IgDeskbookVol1Part2Id="anch_204",
    IgDeskbookVol1Part3Id="anch_205",
    IgDeskbookVol1Part4Id="anch_206",
    IgDeskbookVol2Id="anch_207",
    IgDeskbookVol3Id="anch_208",
    IgDeskbookVol4Part1Id="anch_209",
    IgDeskbookVol4Part2Id="anch_210",
    IgDeskbookVol4Part3Id="anch_211",
)
This is the method:
def click(self, waitTime, locatorMode, Locator):
    self.wait_until_element_clickable(waitTime, locatorMode, Locator).click()
These are the click() calls (there are more than three; I am truncating here for space):
self.click(10, "id", IgDeskbookPageMap['IgDeskbookIntroId'])
self.click(10, "id", IgDeskbookPageMap['IgDeskbookVol1Part1Id'])
self.click(10, "id", IgDeskbookPageMap['IgDeskbookVol1Part2Id'])

I added the following code for launching Firefox and now the download behavior works as expected when clicking on each file:
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.helperApps.alwaysAsk.force', False)
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf,application/x-pdf')
profile.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
profile.set_preference("pdfjs.disabled", True)
self.driver = webdriver.Firefox(profile)

If the files open in separate tabs, one way to download them is to follow these algorithmic steps in your own language:
for each such link:
    click() the PDF link
    findElement the download element
    click() the download link
    close the tab
    switch back to the last tab  // should ideally be completed with the previous step
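The steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested solution for the site in the question: `download_each` is a hypothetical helper, `download_locator` is a placeholder `(by, value)` pair because the download element is not identified in this thread, and the code assumes each click opens its tab before the next line runs (a real script would probably wait for the new handle).

```python
def download_each(driver, link_ids, download_locator):
    """Click each link, use the download control in the tab it opens, then return."""
    parent = driver.current_window_handle
    for link_id in link_ids:
        before = set(driver.window_handles)
        # click() the PDF link, located by its id
        driver.find_element("id", link_id).click()
        opened = [h for h in driver.window_handles if h not in before]
        if not opened:
            continue  # the link did not open a new tab, so there is nothing to close
        driver.switch_to.window(opened[0])
        # find the download element in the new tab and click it
        driver.find_element(*download_locator).click()
        # close the tab and switch back to the tab we started from
        driver.close()
        driver.switch_to.window(parent)
```

The link ids from `IgDeskbookPageMap` could be passed as `link_ids`. The string `"id"` is the value Selenium uses for `By.ID`, so the sketch does not need to import `By`.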
