How can I open a new window and close the previous one, then open another window and close the previous one, repeatedly?
The number of links is indeterminate.
urls = ["https://www.oxxo.com/",
"https://pypi.org/project/fake-useragent/",
"https://www.youtube.com/"]
for posts in range(len(urls)):
print(posts)
driver.get(urls[posts])
driver.implicitly_wait(val)
timestamp = datetime.datetime.now().strftime('%d_%m_%Y')
chars = [':','/','%']
image_name = driver.current_url
for char in chars:
image_name = image_name.replace(char,'_')
driver.save_screenshot(str(cont)+'_'+image_name+'_'+timestamp+'.png')
cont += 1
if(posts!=len(urls)-1):
driver.execute_script("window.open('');")
chwd = driver.window_handles
driver.switch_to.window(chwd[-1])
driver.close()
The URLs within the list urls are independent, so you neither need a reference to the previous window nor need to open them in adjacent tabs. You can simply invoke them one by one and do your task as follows:
driver = webdriver.Chrome(service=s, options=options)
urls = ["https://www.oxxo.com/",
        "https://pypi.org/project/fake-useragent/",
        "https://www.youtube.com/"]
for url in urls:
    driver.get(url)
    time.sleep(3)  # for demonstration
    print("Do whatever you wish")
driver.quit()
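If you really do need the open-a-new-tab-and-close-the-previous-one pattern from the question, a minimal sketch could look like the following. Note that you must switch to the new tab before closing the old one, because driver.close() closes the currently focused window; window_handles[-1] is assumed here to be the newest tab, which is the usual behaviour.

for url in urls:
    old_handle = driver.current_window_handle
    driver.execute_script("window.open('');")  # open a blank tab
    new_handle = driver.window_handles[-1]     # usually the newest tab
    driver.switch_to.window(new_handle)
    driver.get(url)
    # ... take the screenshot here ...
    driver.switch_to.window(old_handle)        # go back and close the old tab
    driver.close()
    driver.switch_to.window(new_handle)        # continue in the remaining tab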
Related
I wanted to extract text from multiple pages. Currently, I am able to extract data from the first page, but I want to follow the pagination and append the data from the remaining pages. I have written this simple code which extracts data from the first page; I am not able to extract the data from the other pages, whose number is dynamic.
element_list = []
opts = webdriver.ChromeOptions()
opts.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install())
base_url = "XYZ"
driver.maximize_window()
driver.get(base_url)
driver.set_page_load_timeout(50)
element = WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.ID, 'all-my-groups')))
l = []
l = driver.find_elements_by_xpath("//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
for i in l:
    print(i.text)
I have shared images of the class in case that helps with the pagination.
If we could automate this and extract from all the pages, that would be awesome. Also, I am new, so please pardon me for asking silly questions. Thanks in advance.
You have provided the code just for the previous page button. I guess you need to go to the next page until no next page exists. As I don't know what site we are talking about, I can only guess its behavior, so I'm assuming the 'Next' button disappears when no next page exists. If so, it can be done like this:
element_list = []
opts = webdriver.ChromeOptions()
opts.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install())
base_url = "XYZ"
driver.maximize_window()
driver.get(base_url)
driver.set_page_load_timeout(50)
element = WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.ID, 'all-my-groups')))
l = []
l = driver.find_elements_by_xpath("//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
while True:
    try:
        next_page = driver.find_element(By.XPATH, '//button[@label="Next page"]')
    except NoSuchElementException:
        break
    next_page.click()
    l.extend(driver.find_elements(By.XPATH, "//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]"))
for i in l:
    print(i.text)
To be able to catch the exception, this import has to be added:
from selenium.common.exceptions import NoSuchElementException
Also note that the method find_elements_by_xpath is deprecated and it would be better to replace this line:
l = driver.find_elements_by_xpath("//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
with this one:
l = driver.find_elements(By.XPATH, "//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
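If clicking "Next page" triggers a full page load, elements collected just before the click can go stale. A short explicit wait on the clicked button is one way to harden the loop; this is a sketch, assuming the button is detached from the DOM on navigation:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

while True:
    try:
        next_page = driver.find_element(By.XPATH, '//button[@label="Next page"]')
    except NoSuchElementException:
        break
    next_page.click()
    # wait until the old button is detached, i.e. the next page has started rendering
    WebDriverWait(driver, 10).until(EC.staleness_of(next_page))
    l.extend(driver.find_elements(By.XPATH, "//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]"))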
This is the link:
https://www.unibet.eu/betting/sports/filter/football/matches
Using the Selenium driver, I access this link, which shows a list of football matches on the page.
The actual task for me is to click on each of the match links. I found all those matches with:
elems = driver.find_elements_by_class_name('eb700')
When I did this:

for elem in elems:
    elem.click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
The first time, it clicked, loaded the new page, and went back to the previous page, but then it gave the following error:
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
I also tried getting the href attribute from the elem, but it gave None. Is it possible to open the page in a new tab instead of clicking the elem?
You can retry the click on the element once again, since after the navigation it is no longer present in the DOM.
Code:
driver = webdriver.Chrome("C:\\Users\\**\\Inc\\Desktop\\Selenium+Python\\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.unibet.eu/betting/sports/filter/football/matches")
wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, "OK"))).click()
sleep(2)
elements = driver.find_elements(By.XPATH, "//div[contains(#class,'_')]/div[#data-test-name='accordionLevel1']")
element_len = len(elements)
print(element_len)
counter = 0
while counter < element_len:
attempts = 0
while attempts < 2:
try:
ActionChains(driver).move_to_element(elements[counter]).click().perform()
except:
pass
attempts = attempts + 1
sleep(2)
# driver.execute_script("window.history.go(-1)") #may be get team name
#using //div[#data-test-name='teamName'] xpath
sleep(2)
# driver.refresh()
sleep(2)
counter = counter + 1
Since you move to the next page, the elements no longer exist in the DOM, so you get the StaleElementReferenceException.
What you can do is, when coming back to the same page, get all the links (elems) again, and use a while loop instead of a for loop:
elems = driver.find_elements_by_class_name('eb700')
i = 0
while i < len(elems):
    elems[i].click()
    time.sleep(2)
    driver.execute_script("window.history.go(-1)")
    time.sleep(2)
    elems = driver.find_elements_by_class_name('eb700')
    i += 1
The other solution is to remain on the same page, save all the href attributes in a list, and then use driver.get() to open each match link. (If get_attribute('href') returns None, the located element is probably not the <a> itself, so adjust the locator to target the anchor.)
matchLinks = []
elems = driver.find_elements_by_class_name('eb700')
for elem in elems:
    matchLinks.append(elem.get_attribute('href'))
for match in matchLinks:
    driver.get(match)
    # do whatever you want to do on the match page
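If you specifically want the new-tab behaviour the question asks about, here is a minimal sketch building on the matchLinks list above; the window-handle bookkeeping is the only new part:

main_handle = driver.current_window_handle
for match in matchLinks:
    # open the match link in a new tab via JavaScript
    driver.execute_script("window.open(arguments[0], '_blank');", match)
    driver.switch_to.window(driver.window_handles[-1])
    # do whatever you want on the match page
    driver.close()                        # close the match tab
    driver.switch_to.window(main_handle)  # return to the list page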
I'm trying to make a script where the program takes multiple URLs as input and then opens a tab for each of them. This is what I came up with:
s = raw_input()
l = s.split()
t = len(l)
for elements in l:
    elements = ["https://" + elements + "" for elements in l]
driver = webdriver.Chrome(r"C:/Users/mynam/Desktop/WB/chromedriver.exe")
driver.get("https://www.google.com")
for e in elements:
    driver.implicitly_wait(3)
    driver.execute_script("window.open(e,'new window')")
    print "Opened in new tab"
I get an error that e is not defined. How do we pass an argument to window.open in Selenium?
You need to open a new window, switch to it, and then load the page:
from selenium import webdriver
import os

def open_tab_page(page, page_number):
    browser.execute_script("window.open('');")
    browser.switch_to.window(browser.window_handles[page_number])
    browser.get(page)

# initialise driver
chrome_driver = os.path.abspath(os.path.dirname(__file__)) + '/chromedriver'
browser = webdriver.Chrome(chrome_driver)
browser.get("http://stackoverflow.com/")

# list of pages to open
pages_list = ['https://www.google.com', 'https://www.youtube.com/']
page_number = 1
for page in pages_list:
    open_tab_page(page, page_number)
    page_number += 1
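As for the original error: e is a Python variable, so it is not visible inside the JavaScript string. Selenium lets you pass Python values into the script as arguments, which avoids the "e is not defined" error:

for e in elements:
    # arguments[0] on the JavaScript side receives the Python value e
    driver.execute_script("window.open(arguments[0], '_blank');", e)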
I'm scraping this website using Python and Selenium. I have the code working, but it currently only scrapes the first page. I would like to iterate through all the pages and scrape them all, but they handle pagination in a weird way. How would I go through the pages and scrape them one by one?
Pagination HTML:
<div class="pagination">
First
Prev
1
<span class="current">2</span>
3
4
Next
Last
</div>
Scraper:
import re
import json
import requests
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options

options = Options()
# options.add_argument('--headless')
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options,
                          executable_path=r'/Users/weaabduljamac/Downloads/chromedriver')

url = 'https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList'
driver.get(url)
def getData():
    data = []
    rows = driver.find_element_by_xpath('//*[@id="form1"]/table/tbody').find_elements_by_tag_name('tr')
    for row in rows:
        app_number = row.find_elements_by_tag_name('td')[1].text
        address = row.find_elements_by_tag_name('td')[2].text
        proposals = row.find_elements_by_tag_name('td')[3].text
        status = row.find_elements_by_tag_name('td')[4].text
        data.append({"CaseRef": app_number, "address": address, "proposals": proposals, "status": status})
    print(data)
    return data
def main():
    all_data = []
    select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
    list_options = select.options
    for item in range(len(list_options)):
        select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
        select.select_by_index(str(item))
        driver.find_element_by_css_selector("input.formbutton#csbtnSearch").click()
        all_data.extend(getData())
        driver.find_element_by_xpath('//*[@id="form1"]/div[3]/a[4]').click()
        driver.get(url)
    with open('wiltshire.json', 'w+') as f:
        json.dump(all_data, f)
    driver.quit()

if __name__ == "__main__":
    main()
Before moving on to automating any scenario, always write down the manual steps you would perform to execute the scenario. The manual steps for what you want to do (as I understand from the question) are:
1) Go to site - https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList
2) Select first week option
3) Click search
4) Get the data from every page
5) Load the url again
6) Select second week option
7) Click search
8) Get the data from every page
.. and so on.
You have a loop to select the different weeks, but inside each loop iteration for the week option you also need a loop to iterate over all the pages. Since you are not doing that, your code returns only the data from the first page.
Another problem is with how you are locating the 'Next' button:
driver.find_element_by_xpath('//*[#id="form1"]/div[3]/a[4]').click()
You are selecting the 4th <a> element, which is of course not robust, because on different pages the Next button's index will be different. Instead, use this better locator:
driver.find_element_by_xpath("//a[contains(text(),'Next')]").click()
Logic for creating the loop which will iterate through the pages:
First you will need the number of pages. I did that by locating the <a> immediately before the "Next" button; as the pagination markup above shows, the text of this element is equal to the number of pages. I did that using the following code:
number_of_pages = int(driver.find_element_by_xpath("//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)
Now, once you have the number of pages as number_of_pages, you only need to click the "Next" button number_of_pages - 1 times!
Final code for your main function (note that it uses time.sleep, so add import time at the top of the script):
def main():
    all_data = []
    select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
    list_options = select.options
    for item in range(len(list_options)):
        select = Select(driver.find_element_by_xpath("//select[@class='formitem' and @id='selWeek']"))
        select.select_by_index(str(item))
        driver.find_element_by_css_selector("input.formbutton#csbtnSearch").click()
        number_of_pages = int(driver.find_element_by_xpath("//a[contains(text(),'Next')]/preceding-sibling::a[1]").text)
        for j in range(number_of_pages):
            all_data.extend(getData())
            if j < number_of_pages - 1:  # the last page has no further page to load
                driver.find_element_by_xpath("//a[contains(text(),'Next')]").click()
                time.sleep(1)
        driver.get(url)
    with open('wiltshire.json', 'w+') as f:
        json.dump(all_data, f)
    driver.quit()
The following approach simply worked for me:
driver.find_element_by_link_text("3").click()
driver.find_element_by_link_text("4").click()
....
driver.find_element_by_link_text("Next").click()
First get the total number of pages in the pagination, using:
ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,1')
ins.find_element_by_class_name("pagination")
source = BeautifulSoup(ins.page_source)
div = source.find_all('div', {'class': 'pagination'})
all_as = div[0].find_all('a')
total = 0
for i in range(len(all_as)):
    if 'Next' in all_as[i].text:
        total = int(all_as[i-1].text)  # the <a> just before "Next" holds the page count
        break
Now just loop through that range:

for i in range(total):
    ins.get('https://services.wiltshire.gov.uk/PlanningGIS/LLPG/WeeklyList/10702380,{}'.format(i + 1))

The page number at the end of the URL keeps incrementing; get the source code for each page and then extract the data from it.
Note: don't forget to sleep when going from one page to another.
For example, take the site whose URL appears in the code below. I need to script clicking the "close" button on a frame that appears. I've already tried using XPath and CSS selectors, still with no luck. I need to do this using a headless browser, like HtmlUnit, because there is no "a" tag.
from selenium import webdriver
from lxml import etree, html

url = "http://2gis.ru/moscow/search/%D1%81%D0%BF%D0%BE%D1%80%D1%82%D0%B8%D0%B2%D0%BD%D1%8B%D0%B5%20%D1%81%D0%B5%D0%BA%D1%86%D0%B8%D0%B8/center/37.437286%2C55.753395/tab/firms/zoom/11"

driver = webdriver.Firefox()
#driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)
driver.get(url)
content = (driver.page_source).encode('utf-8')
doc = html.fromstring(content)
elems = doc.xpath('//article[@data-module="miniCard"]')
elem = elems[0]

# get element id to click on
el1_id = elem.attrib['id']

# simulate click to open frame
el1_to_click = driver.find_element_by_xpath('//article[@id="{0}"]//\
a[contains(@class, "miniCard__headerTitle")]'.format(el1_id))
el1_to_click.click()

# some stuff
pass

# now need to close this
close = driver.find_element_by_xpath('//html/body/div[1]/div/div[1]/div[2]/div[2]/div[2]/div/div[2]/div/div[2]/div[3]/div[1]/div[2]/svg/use')
close.click()
But the last part isn't working (I can't close the frame). How can I do this?
Try this; it should work:
from selenium import webdriver
from lxml import etree, html

url = "http://2gis.ru/moscow/search/%D1%81%D0%BF%D0%BE%D1%80%D1%82%D0%B8%D0%B2%D0%BD%D1%8B%D0%B5%20%D1%81%D0%B5%D0%BA%D1%86%D0%B8%D0%B8/center/37.437286%2C55.753395/tab/firms/zoom/11"

driver = webdriver.Firefox()
driver.implicitly_wait(10)
# driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)
driver.get(url)
content = (driver.page_source).encode('utf-8')
doc = html.fromstring(content)
elems = doc.xpath('//article[@data-module="miniCard"]')
elem = elems[0]

# get element id to click on
el1_id = elem.attrib['id']

# simulate click to open frame
el1_to_click = driver.find_element_by_xpath('//article[@id="{0}"]//\
a[contains(@class, "miniCard__headerTitle")]'.format(el1_id))
el1_to_click.click()

# some stuff
pass

# now need to close this
close = driver.find_element_by_xpath(
    '//div[@class="frame _num_2 _pos_last _moverDir_left _active _show _state_visible _ready _cont_firmCard _mover"]/div/div/div[@class="frame__controlsButton _close"]')
close.click()
Or, with a less brittle locator for the same close button, based on the data-module attribute instead of the full class string:

close = driver.find_element_by_xpath('//div[@data-module="frame"]/\
div[@class="frame__content"]/div[@class="frame__controls"]/\
div[contains(@class, "frame__controlsButton")\
and contains(@class,"_close")]')
That answers the question for any driver with a visible window (after driver.set_window_size()), but I still need to find a headless option.
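For the headless requirement, modern headless Chrome (or Firefox) is usually an easier route than HtmlUnit; here is a minimal sketch, assuming headless Chrome is acceptable for this site:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')               # run without a visible window
options.add_argument('--window-size=1920,1080')  # give the page a realistic viewport

driver = webdriver.Chrome(options=options)
driver.get(url)
# the same find_element / click logic as above works unchanged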