Fill forms using selenium or requests - python

I'm trying to log into this site to retrieve my bank account details. First I tried with selenium, but it only filled in the username (maybe because the page has two forms):
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("https://www.santandertotta.pt/pt_PT/Particulares.html")
user = driver.find_element_by_name("usr")
user.send_keys("user")
pas = driver.find_element_by_name("claveConsultiva")
pas.send_keys("password")
driver.find_element_by_id("login_button").click()
Then I went rambo mode :), trying to figure out why I can't fill the password field, and what the hidden values of the form are, using requests. This is the code:
url = "https://www.particulares.santandertotta.pt/pagina/indice/0,,276_1_2,00.html"
user_agent = {"user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/..."}
session = requests.session()
r = session.get(url)
soup = BeautifulSoup(r.text, "html.parser")
data = {t['name']:t.get('value') for t in soup.find_all('input', attrs={'type': 'hidden'})}
print(data)
But I just received an empty dict. What is the best approach for logging into a site and scraping it?

You cannot access the password field because it's not present on the main page. To reach it you have to click the Login button to get to the login page. You also need to switch to the iframe that contains the authentication form:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://www.santandertotta.pt/pt_PT/Particulares.html")
driver.find_element_by_xpath("//input[@title='Login de Particulares']").click()
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it("ws"))
user = driver.find_element_by_name("identificacionUsuario")
user.send_keys("user")
pas = driver.find_element_by_name("claveConsultiva")
pas.send_keys("password")
pas.submit()
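As a side note on the requests attempt: the hidden inputs came back as an empty dict because the authentication form lives inside an iframe, which is a separate document with its own URL; the page you fetched simply does not contain those fields. A rough sketch of the idea, assuming you first read the iframe's src attribute in the browser dev tools (the frame URL below is a placeholder, not the real one):
import requests
from bs4 import BeautifulSoup

session = requests.session()
frame_url = "https://example.com/login-frame"  # placeholder: use the iframe's real src here
r = session.get(frame_url)
soup = BeautifulSoup(r.text, "html.parser")
# same hidden-input scan as before, now against the frame's own document
hidden = {t.get('name'): t.get('value') for t in soup.find_all('input', attrs={'type': 'hidden'})}
print(hidden)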

Once you access the url https://www.santandertotta.pt/pt_PT/Particulares.html you first have to click on the element with the text Login; only then do the Nome and Password fields appear. But to access those fields you have to switch to the frame with id ws, inducing WebDriverWait. Next, to locate the Nome element you have to induce WebDriverWait again, as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.santandertotta.pt/pt_PT/Particulares.html")
driver.find_element_by_xpath("//input[@class='ttAH_button03']").click()
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "ws")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@class='inputlong' and @id='identificacionUsuario']"))).send_keys("your_name")
driver.find_element_by_xpath("//input[@id='claveConsultiva' and @name='claveConsultiva']").send_keys("your_password")
driver.find_element_by_link_text("Entrar no NetBanco Particulares").click()
Here you can find a relevant discussion on Ways to deal with #document under iframe
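For reference, a minimal sketch of the usual pattern for dealing with a #document under an iframe: switch into the frame before locating elements inside it, then switch back to the top-level document when done (using the frame id ws from the page above):
driver.switch_to.frame("ws")        # enter the iframe by its name/id
driver.find_element_by_name("identificacionUsuario").send_keys("user")
driver.switch_to.default_content()  # return to the top-level document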

Related

Python for loop "stale element reference: element is not attached to the page document"

Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service('F:\\work\\chromedriver_win32\\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 10)
Content = ["listtext1", "listtext2", "listtext3", "listtext4"]
for i in range(4):
    time.sleep(7)
    url = "https://quillbot.com/"
    driver.get(url)
    Text_block = driver.find_element(By.ID, "inputText")
    Text_block.send_keys(Content[i])  # Change (fetch from Search_list)
    time.sleep(2)
I made a few fixes to your code:
Inserted a user_agent (it will come in handy in other selenium experiments).
Inserted webdriver-manager so selenium runs on all operating systems.
Added accepting cookies, since you have to accept them before you can interact with the page.
Removed the unnecessary sleep.
This is the result, code tested:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
options = Options()
options.add_argument("start-maximized")
# add user_agent
user_agent = "user-agent=[Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36]"
options.add_argument(user_agent)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options) # to use over all systems
browser_delay = 2  # set it based on your connection and device speed
Content = ["listtext1", "listtext2", "listtext3", "listtext4"]
for i in range(len(Content)):
    url = "https://quillbot.com/"
    driver.get(url)
    try:
        cookie_btn = WebDriverWait(driver, browser_delay).until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler')))
        cookie_btn.click()
    except TimeoutException:
        pass  # it's a timeout, or the cookie button was already clicked
    Text_block = driver.find_element(By.ID, "inputText")
    Text_block.send_keys(Content[i])  # Change (fetch from Search_list)
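A note on the stale element error in the title: every driver.get() replaces the DOM, so any WebElement located before a navigation becomes stale. That is why the inputText lookup has to sit inside the loop, after driver.get(). A minimal illustration of the wrong and the right order, reusing the names from the snippet above:
# WRONG: locate once, then navigate -- the reference goes stale
# text_block = driver.find_element(By.ID, "inputText")
# driver.get(url)              # DOM replaced, text_block now stale
# text_block.send_keys("x")    # raises StaleElementReferenceException

# RIGHT: re-locate after every navigation
for text in Content:
    driver.get("https://quillbot.com/")
    driver.find_element(By.ID, "inputText").send_keys(text)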
Here is one way of sending those texts into that textbox, based on your existing code and how you defined waits:
[....]
content = ["listtext1", "listtext2", "listtext3", "listtext4"]
for i in content:
    driver.get('https://quillbot.com/')
    try:
        wait.until(EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))).click()
        print('accepted cookies')
    except Exception as e:
        print('no cookie button!')
    text_block = wait.until(EC.element_to_be_clickable((By.ID, "inputText")))
    text_block.send_keys(i)
    print('sent', i)
    t.sleep(5)
See Selenium documentation at https://www.selenium.dev/documentation/

Python Selenium: How to avoid being detected/ blocked?

I try to extract data from the website below, but when selenium clicks the "search" button (the last step of the code) an error is returned; it seems I am blocked by the server. (It is totally fine when I access the website manually; only with the automated Chrome browser does the attached error message come back when I click the "search" button.) How can I get around this?
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time
from selenium.webdriver.support.ui import Select

ser = Service(r"C:\Users\shekc\Documents\chromedriver.exe")
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--Referer=https://www.dahsing.com/jsp/fundPlatform/index_e.jsp")
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36")
driver = webdriver.Chrome(options=options, service=ser)
url = "https://www.dahsing.com/jsp/fundPlatform/risk_warning_e.jsp"
driver.get(url)
time.sleep(3)
# click "Agree"
driver.find_element(By.LINK_TEXT, "Agree").click()
driver.switch_to.default_content()
driver.switch_to.frame(1)
# count the fund houses (use a lowercase variable so the Select class isn't shadowed)
fund_house_select = Select(driver.find_element(By.XPATH, '//*[@id="mainContent_ddlFundHouse"]'))
FH_No = len(fund_house_select.options)
# select "all per page"
page_select = Select(driver.find_element(By.XPATH, '//*[@id="mainContent_ddlPageNumber"]'))
page_select.select_by_index(len(page_select.options) - 1)
fund_house_select = Select(driver.find_element(By.XPATH, '//*[@id="mainContent_ddlFundHouse"]'))
fund_house_select.select_by_index(1)
FH_name = fund_house_select.first_selected_option.text
# click "Search"
driver.find_element(By.LINK_TEXT, "Search").click()

Selenium Error Website Access Denied Webdriver WebdriverWait

I get this weird access denied message when I try to log in to the "offspring.co.uk" website. The denial message pops up right after clicking the login button. I heard something about Akamai bot protection on this website; maybe this protection detects my automation. Does anyone know how to prevent this access denial?
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver import ActionChains
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
def call_Website():
    # configuration
    profile = webdriver.FirefoxProfile()
    profile.accept_untrusted_certs = True
    profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:92.0) Gecko/20100101 Firefox/92.0")
    firefox_capabilities = webdriver.DesiredCapabilities.FIREFOX
    firefox_capabilities['marionette'] = True
    # start the webdriver
    browser = webdriver.Firefox(firefox_profile=profile, desired_capabilities=firefox_capabilities)
    wait = WebDriverWait(browser, 20)
    action = ActionChains(browser)
    # checking if the proxy works (old snippet)
    try:
        browser.get("https://httpbin.org/ip")
    except:
        browser.close()
        print("proxy was not working")
    time.sleep(2)
    browser.get('https://www.offspring.co.uk/view/secured/content/login')
    time.sleep(2)
    # accept cookies
    browser.find_element_by_css_selector("#onetrust-accept-btn-handler").click()
    time.sleep(1)
    # choose currency
    browser.find_element_by_css_selector("li.EUR:nth-child(2)").click()
    # fill out the username
    username_form = browser.find_element_by_css_selector('#user')
    action.move_to_element(username_form).pause(1).click().pause(0.5).send_keys('username')
    # fill out the password
    password_form = browser.find_element_by_css_selector('#loginpassword')
    action.pause(2).move_to_element(password_form).pause(1).click().pause(0.5).send_keys('password')
    # click login (the queued actions are all executed by this single perform())
    Login_Btn = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#loginButton')))
    action.move_to_element(Login_Btn).pause(1).click().perform()

if __name__ == "__main__":
    call_Website()
And here is the "Access Denied" page.
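No answer is recorded for this one either. For what it's worth, a frequently tried tweak on the Firefox side is to disable the navigator.webdriver flag through profile preferences; this is only a guess at what Akamai checks, and sophisticated protections also fingerprint TLS and behavior, which these preferences do not change:
# speculative additions to the profile above -- may not defeat Akamai
profile.set_preference("dom.webdriver.enabled", False)   # hides navigator.webdriver in older Firefox builds
profile.set_preference("useAutomationExtension", False)
profile.update_preferences()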

How to click on an element and parse text from linked xml file (python)?

I would like to parse addresses from the following website: https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/
So far I am able to go to the website and click away any pop-ups. But then I need to open the drop-down menu labeled "1163 STANDORTE", which I am not able to locate with my code.
My code so far:
import pandas as pd
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import time
import itertools
import os
import numpy as np
import csv
import pdb
os.chdir("Directory")
options = webdriver.ChromeOptions()
options.add_argument("--incognito")
driver = webdriver.Chrome('Directory/chromedriver.exe')
driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")
time.sleep(1)
try:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click()  # if there is something to click away
except:
    pass
time.sleep(4)
Then my attempts, using the span and button elements and several navigation options:
#Version 1
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='sc-hKFxyN jdMjfs']"))).click()
#Version 2
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].scrollIntoView();", element)
driver.execute_script("arguments[0].click();", element)
# Version 3
element = driver.find_element_by_class_name('sc-eCApnc kiXUNl sc-jSFjdj lcZmPE')
driver.execute_script("arguments[0].click();", element)
#Version 4
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='sc-eCApnc kiXUNl sc-jSFjdj lcZmPE']"))).click()
# Version 5
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[2]/div/main/nav/header/button[1]"))).click()
# Version 6
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='1163 STANDORTE']"))).click()
Actually, there are three problems:
If I open the link manually in Chrome, "1163 STANDORTE" appears, whereas if I open it in Chrome using python, fewer STANDORTE appear and I cannot zoom out. I crucially need ALL 1163 STANDORTE to appear.
I cannot locate the button using the class or an XPATH.
Behind the button there is probably a linked XML file, and the address information only appears after the button has been clicked. In the end I want to scrape the text from the XML file linked to that button.
Any suggestions?
My question is similar to these previous questions: How to parse several attributes of website with same class name in python? and to Selenium-Debugging: Element is not clickable at point (X,Y)
A few points:
Launch the browser in full screen mode.
Use explicit waits.
Use this xpath: //span[contains(@aria-label, 'Standorte anzeigen')]/..
Sample code :
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
#driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)
driver.get("https://filialen.migros.ch/de/center:46.8202,6.9575/zoom:8/")
try:
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='close-icon']"))).click()  # if there is something to click away
except:
    pass
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[contains(@aria-label, 'Standorte anzeigen')]/.."))).click()
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
PS: Please check in the dev tools (Google Chrome) whether the xpath has a unique entry in the HTML DOM or not.
Steps to check:
Press F12 in Chrome -> go to the Elements section -> do a CTRL + F -> then paste in the xpath and see whether your desired element gets highlighted as the 1/1 matching node.
The data you are looking for comes from a fetch/XHR call, so you can get it without scraping the page. See below.
import requests

headers = {'Origin': 'https://filialen.migros.ch',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'}
r = requests.get(
    'https://web-api.migros.ch/widgets/stores?key=loh7Diephiengaiv&aggregation_options[empty_buckets]=true&filters[markets][0][0]=super&filters[markets][0][1]=mno&filters[markets][0][2]=voi&filters[markets][0][3]=mp&filters[markets][0][4]=out&filters[markets][0][5]=spx&filters[markets][0][6]=doi&filters[markets][0][7]=mec&filters[markets][0][8]=mica&filters[markets][0][9]=res&filters[markets][0][10]=flori&filters[markets][0][11]=gour&filters[markets][0][12]=alna&filters[markets][0][13]=cof&filters[markets][0][14]=chng&verbosity=store&offset=0&limit=5000',
    headers=headers)
if r.status_code == 200:
    print('stores data below:')
    data = r.json()
    print(data)
else:
    print(f'Oops. Status code is {r.status_code}')
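If you only want the addresses out of that response, something along these lines may work; note that the key names below (stores, location, address, city) are assumptions about the payload layout, not verified fields, so inspect data first and adjust:
# key names are guesses -- print(data.keys()) to see the real structure
for store in data.get("stores", []):
    location = store.get("location", {})
    print(location.get("address"), location.get("zip"), location.get("city"))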

Python | PhantomJS not clicking on element

I have been trying to solve this for an entire week now and this is my last shot at it (asking stackoverflow).
I use phantomjs with selenium to go to the YouTube login page, fill in the credentials, and log in.
I get to the login page and it manages to fill in the email, but no matter what I try, it won't click the "next" button.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.action_chains import ActionChains
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36"
)
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.set_window_size(1920,1080)
driver.get("https://youtube.com")
driver.find_element_by_class_name("yt-uix-button-content").click()
print("Logging in...")
driver.find_element_by_id("identifierId").send_keys("email")
time.sleep(1)
driver.find_element_by_class_name("ZFr60d").click()
driver.save_screenshot('testing4.png')
Now I have tried all of these:
driver.find_element_by_xpath("""//*[#id="identifierNext"]/content/span""").click()
driver.find_element_by_css_selector("#identifierNext>content>span").click()
webdriver.ActionChains(driver).move_to_element(element).click(element).perform()
driver.find_element_by_id("identifierNext").click()
and none of them works. I tried the javascript command as well.
I would also like to add that clicking on the element works perfectly fine with selenium without PhantomJS.
I would really appreciate it if anyone here could help me.
EDIT:
This info might be helpful: after clicking "Next", it takes about a second to get to the password part; it's a sliding animation.
This question has not yet been answered.
Here is the answer to your question.
A couple of notes:
The locator you used to identify the Sign in button is not unique. Consider constructing a unique xpath for the Sign in button.
The locator you used to identify the Email or phone field also needs to be modified a bit.
You can use the id locator to identify and click on the Next button.
Here is a code block which does the same and prints Clicked on Next Button to the console:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36"
)
driver = webdriver.PhantomJS(desired_capabilities=dcap, executable_path="C:\\Utility\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe")
driver.get("https://youtube.com")
driver.find_element_by_xpath("//button[#class='yt-uix-button yt-uix-button-size-default yt-uix-button-primary']/span[#class='yt-uix-button-content']").click()
print("Logging in...")
email_phone = driver.find_element_by_xpath("//input[#id='identifierId']")
email_phone.send_keys("debanjanb")
driver.find_element_by_id("identifierNext").click()
print("Clicked on Next Button")
Let me know if this answers your query.
