Why doesn't my web scraping code accept cookies? - Python

I'm a beginner at web scraping and I've followed a few YouTube videos on how to do this, but no matter what I try I can't get my code to accept the cookies.
This is the code I have so far:
from selenium import webdriver
import time
driver = webdriver.Safari()
URL = "https://www.zoopla.co.uk/new-homes/property/london/?q=London&results_sort=newest_listings&search_source=new-homes&page_size=25&pn=1&view_type=list"
driver.get(URL)
time.sleep(2) # Wait a couple of seconds, so the website doesn't suspect you are a bot
try:
    driver.switch_to_frame('gdpr-consent-notice')  # This is the id of the frame
    accept_cookies_button = driver.find_element_by_xpath('//*[@id="save"]')
    accept_cookies_button.click()
except AttributeError:  # If you have the latest version of Selenium, the code above won't run because "switch_to_frame" is deprecated
    driver.switch_to.frame('gdpr-consent-notice')  # This is the id of the frame
    accept_cookies_button = driver.find_element_by_xpath('//*[@id="save"]')
    accept_cookies_button.click()
except:
    pass  # If there is no cookies button, we won't find it, so we can pass

I don't have the Safari webdriver, only the Chrome webdriver, but I think they work similarly. On Chrome you close the cookie banner with this code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver.get(URL)
# wait no more than 20 seconds for the `iframe` with id `gdpr-consent-notice` to appear, then switch to it
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "gdpr-consent-notice")))
# click the accept-cookies button
driver.find_element(By.CSS_SELECTOR, '#save').click()
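For completeness, here's the whole flow as one runnable sketch. The `gdpr-consent-notice` frame id and the `#save` selector are taken from the snippets above and assumed to still match Zoopla's current markup. Note the switch back to the default content at the end, so later lookups target the main page rather than the consent iframe:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
URL = "https://www.zoopla.co.uk/new-homes/property/london/?q=London&results_sort=newest_listings&search_source=new-homes&page_size=25&pn=1&view_type=list"
driver.get(URL)

# wait for the consent iframe to appear, switch into it, and accept
WebDriverWait(driver, 20).until(
    EC.frame_to_be_available_and_switch_to_it((By.ID, "gdpr-consent-notice")))
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "#save"))).click()

# switch back to the main document before scraping anything else
driver.switch_to.default_content()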

Related

Unable to Click to next page while crawling selenium

I'm trying to scrape the list of services from this site but I'm not able to click through to the next page.
This is what I've tried so far using Selenium & bs4:
# attempt 1
next_pg_btn = browser.find_elements(By.CLASS_NAME, 'ui-lib-pagination_item_nav')
next_pg_btn.click()  # nothing happens
# attempt 2
browser.find_element(By.XPATH, "//div[@role = 'button']").click()  # nothing happens
# attempt 3 - saw in some Stack Overflow post that sometimes we need to scroll to the
# bottom of the page to have the button clickable, so tried that
browser.execute_script("window.scrollTo(0,2500)")
browser.find_element(By.XPATH, "//div[@role = 'button']").click()  # nothing happens
I'm not very experienced with scraping; please advise how to handle this and where I'm going wrong.
Thanks
Several issues with your code:
You tried the wrong locators.
You probably need to wait for the element to load before clicking it. (If you perform some actions on the page before clicking the pagination, this isn't needed, since the web elements will already have loaded while you were scraping the page content.)
The pagination button is at the bottom of the page, so you need to scroll to bring it into the visible screen.
Some delay should be added after scrolling, as you can see in the code below.
Now the pagination element can be clicked.
The following code works:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes aren't treated as escape sequences
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 10)
url = "https://www.tamm.abudhabi/en/life-events/individual/HousingProperties"
driver.get(url)
pagination = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "ui-lib-pagination__item_nav")))
pagination.location_once_scrolled_into_view  # accessing this property scrolls the element into view
time.sleep(0.5)
pagination.click()
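If you need to click through all the pages rather than just once, the same scroll-pause-click steps can go in a loop. A minimal sketch, assuming the nav button stops being found once the last page is reached; that stop condition is an assumption about this site's markup, not something verified against it:

from selenium.common.exceptions import TimeoutException

while True:
    try:
        pagination = wait.until(EC.presence_of_element_located(
            (By.CLASS_NAME, "ui-lib-pagination__item_nav")))
        pagination.location_once_scrolled_into_view  # scrolls the button into view
        time.sleep(0.5)  # let the scroll settle before clicking
        pagination.click()
    except TimeoutException:
        break  # no nav button found any more - assume this was the last page
    # ... scrape the freshly loaded page here ...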

Changing country while scraping Aliexpress using python selenium

I'm working on a scraping project for AliExpress, and I want to change the "ship to" country using Selenium, e.g. change Spain to Australia, click the Save button, and then scrape the page. I already found an answer that works, except that I don't know how to save the change by clicking the Save button with Selenium. Any help is highly appreciated. This is the code I'm using for this task:
country_button = driver.find_element_by_class_name('ship-to')
country_button.click()
country_buttonn = driver.find_element_by_class_name('shipping-text')
country_buttonn.click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//li[@class='address-select-item ']//span[@class='shipping-text' and text()='Australia']"))).click()
Well, there are two pop-ups you need to close first in order to access any other elements. Then you can select the desired shipping destination. I used WebDriverWait for all those commands to make the code stable. I also scrolled the desired destination button into view before clicking it, and finally clicked the Save button.
The code below works.
Just pay attention that after selecting a new destination, the pop-ups can appear again.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
options = Options()
options.add_argument("--start-maximized")
s = Service(r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes aren't treated as escape sequences
driver = webdriver.Chrome(options=options, service=s)
url = 'https://www.aliexpress.com/'
wait = WebDriverWait(driver, 10)
actions = ActionChains(driver)
driver.get(url)
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//div[contains(@style,'display: block')]//img[contains(@src,'TB1')]"))).click()
except:
    pass
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//img[@class='_24EHh']"))).click()
except:
    pass
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "ship-to"))).click()
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "shipping-text"))).click()
ship_to_australia_element = driver.find_element(By.XPATH, "//li[@class='address-select-item ']//span[@class='shipping-text' and text()='Australia']")
actions.move_to_element(ship_to_australia_element).perform()
time.sleep(0.5)
ship_to_australia_element.click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[@data-role='save']"))).click()
I mostly used XPath locators here. CSS selectors could be used as well.
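For example, the final Save-button click could use a CSS attribute selector instead of the XPath, targeting the same data-role attribute:

# CSS equivalent of the XPath //button[@data-role='save']
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-role='save']"))).click()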

Unable to locate element: {"method":"xpath","selector":"/html/body/div/div[2]/div[3]/div[2]/button"}

So I'm trying to make a Discord bot that sends news from selected news services, and I've got it working on every website except Bild.de. They have a banner you have to accept before accessing the website, and I can't get past it. Like I said, I had no problems on any other website, just this one.
My code (python):
import time
from selenium import webdriver
# selenium part
url = 'https://www.bild.de/home/newsticker/news/alle-news-54190636.bild.html'
browser = webdriver.Chrome()
browser.get(url)
time.sleep(5)
# trying to accept the cookie banner
browser.find_element_by_xpath(
    "/html/body/div/div[2]/div[3]/div[2]/button").click()
Error Message
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div/div[2]/div[3]/div[2]/button"}
Things to note here:
The cookies button is inside an iframe, so first we have to switch to that iframe in Selenium.
I am using execute_script to click on it.
Remember to switch back to the default content when you are done with the iframe, as the last line of the code below does.
Code:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.bild.de/home/newsticker/news/alle-news-54190636.bild.html")
wait = WebDriverWait(driver, 20)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[title='SP Consent Message']")))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.message-component.message-row.mobile-reverse>div:nth-child(2)>button")))
driver.execute_script("arguments[0].click();", button)
driver.switch_to.default_content()  # switch back to the main page once the banner is handled
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
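Since the consent banner may not appear on every visit, you can also wrap the banner handling in a try/except (the same pattern the other answers in this thread use), so the bot carries on either way. A sketch reusing the selectors from above:

from selenium.common.exceptions import TimeoutException

try:
    wait.until(EC.frame_to_be_available_and_switch_to_it(
        (By.CSS_SELECTOR, "iframe[title='SP Consent Message']")))
    button = wait.until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, "div.message-component.message-row.mobile-reverse>div:nth-child(2)>button")))
    driver.execute_script("arguments[0].click();", button)
    driver.switch_to.default_content()  # back to the main page
except TimeoutException:
    pass  # no consent banner this time - carry on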

Selenium is normal during debugging, but the run click fails

I've tried several solutions but they didn't work.
This is my code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver,10)
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
content = driver.page_source
page = open('test.html','wb')
page.write(content)
I've tried debugging the code, and in the debugger it successfully returns the clicked page.
When I run the code normally, it also finishes without errors, but it doesn't return the clicked page, just the original page source.
I tried one of the solutions I found, scrolling the page down to the bottom:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)", element)
But it's the same result; it only works while debugging.
Thanks
It seems that your button initiates an AJAX request. The driver doesn't wait for it to finish, because there is no page reload, so you should add an explicit wait. Something like this:
expected_number_of_articles = 10 # enter your number
article_locator = (By.CSS_SELECTOR, 'div#article') # enter your locator
wait.until(lambda driver: len(driver.find_elements(*article_locator)) >= expected_number_of_articles)
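Plugged into the original flow, that could look like this (a sketch: the article locator and the expected count are placeholders you'd adjust to match the real page):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()

# wait until the AJAX request has actually appended the new articles
article_locator = (By.CSS_SELECTOR, 'div#article')  # placeholder locator
expected_number_of_articles = 10  # placeholder count
wait.until(lambda d: len(d.find_elements(*article_locator)) >= expected_number_of_articles)

content = driver.page_source
with open('test.html', 'w', encoding='utf-8') as page:
    page.write(content)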
Before accessing the page source, wait for a small interval to let the page load:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
driver = webdriver.Firefox()
driver.get('https://baijia.baidu.com')
wait = WebDriverWait(driver,10)
wait.until(EC.element_to_be_clickable((By.ID, 'getMoreArticles'))).click()
time.sleep(4)
content = driver.page_source
page = open('test3.html', 'w', encoding='utf-8')  # set an encoding so non-ASCII page content doesn't raise
page.write(content)

How to kick out "ad" popping up when browser opens?

I've written a script in Python in combination with Selenium to parse some company names from a webpage. The selectors I've defined are flawless. However, as soon as the webpage opens, an annoying ad pops up hiding the data, so I can't reach it. How can I kick it out and parse the data I'd like to? I've tried switching to several iframes available on that webpage, but none of them worked. The existing one I used within my script throws an error: Message: no such element: Unable to locate element.
This is what i tried so far:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.inc.com/inc5000/list/2017")
driver.switch_to_frame(driver.find_element_by_id("jw_player_iconic"))
for item in driver.find_elements_by_css_selector("#data-container .row"):
    company = item.find_elements_by_css_selector(".company a")[0].text
    print(company)
driver.quit()
The pop-up ad looks something like this (screenshot of the bright overlay not included here):
You can try waiting for the ad and closing it by clicking the "SKIP" button:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Chrome()
driver.get("https://www.inc.com/inc5000/list/2017")
driver.maximize_window()
try:
    ad_iframe_close = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, "//span[.='SKIP']")))
    ad_iframe_close.click()
except TimeoutException:
    pass
for item in driver.find_elements_by_css_selector("#data-container .row"):
    company = item.find_elements_by_css_selector(".company a")[0].text
    print(company)
This waits to close the ad frame, or does nothing in case it didn't appear within 3 seconds.
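If the company rows themselves render slowly, it may also help to wait for them explicitly before iterating, a minimal sketch reusing the selectors above (presence_of_all_elements_located blocks until at least one matching row exists):

# wait up to 10 seconds for the result rows to be present before scraping them
rows = wait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#data-container .row")))
for item in rows:
    company = item.find_elements_by_css_selector(".company a")[0].text
    print(company)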
