I'm running a simple scraping script to pull a few lines from a website. The problem is that Python opens a new Chrome window and logs in again every time I run the bot.
So I read up online and created a Chrome profile. Now Selenium opens that Chrome profile, but the website still asks me to log in. I guess I need to save the cookies. I tried some YouTube tutorials, but I couldn't figure out how to save them. I'm a complete beginner, so can anyone explain how to do this?
This is my code:
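import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
options = Options()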
options.add_argument("user-data-dir=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 2")
driver = webdriver.Chrome(executable_path=r'C:\Program Files (x86)\chromedriver.exe', chrome_options=options)
driver.get("https://websitetologin.com")
search = driver.find_element_by_name("fm-login-id")
search.send_keys("loginemail")
search.send_keys(Keys.RETURN)
time.sleep(3)
search = driver.find_element_by_name("fm-login-password")
search.send_keys("loginpassword")
search.send_keys(Keys.RETURN)
time.sleep(3)
search = driver.find_element_by_class_name("fm-button")
search.send_keys(Keys.RETURN)
time.sleep(3)
You can also pass the Chrome option user-data-dir=selenium:
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=selenium")
driver = webdriver.Chrome(options=options)
This stores cookies and other session data in the selenium folder, so later runs that use the same folder can reuse them.
Alternatively, add previously saved cookies to the driver yourself:
driver.get('http://google.com')
# cookies: a list of cookie dicts saved earlier with driver.get_cookies()
for cookie in cookies:
    driver.add_cookie(cookie)
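The snippet above does not show where cookies comes from. A minimal sketch of one way to fill that gap, assuming the cookies are kept in a JSON file between runs: the helper names save_cookies and load_cookies and the file name cookies.json are made up for illustration, while get_cookies() and add_cookie() are the Selenium driver methods.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the browser's current cookies to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path):
    """Add previously saved cookies to the driver; return how many were added.

    The driver must already be on a page of the cookies' domain,
    otherwise Selenium rejects add_cookie().
    """
    cookie_file = Path(path)
    if not cookie_file.exists():
        return 0
    cookies = json.loads(cookie_file.read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


# Typical flow (hypothetical file name, not executed here):
#   driver.get("https://websitetologin.com")
#   if load_cookies(driver, "cookies.json") == 0:
#       ...log in once with send_keys...
#       save_cookies(driver, "cookies.json")
#   else:
#       driver.refresh()  # reload so the site sees the restored cookies
```

Whether the restored cookies are enough to skip the login depends on the site; some sessions expire quickly or are tied to other browser state.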
I am building a dynamic scraper with Selenium and Flask that can take in any URL and scrape it for cookies and other details. Now I want to check whether the page at that URL shows a cookie consent popup, but I am unable to make this check work for arbitrary sites.
I have tried PARTIAL_LINK_TEXT, but it works only for some websites:
url="https://www.spitzer-silo.com/"
desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument("--ignore-certificate-errors")
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options, desired_capabilities=desired_capabilities)
driver.get(url)
myDiv = driver.find_element(By.PARTIAL_LINK_TEXT, 'Cookie')
https://www.spitzer-silo.com/ works.
https://www.siemens.com/de/de.html doesn't work.
Also, I am searching for the keyword "Cookie", which may not be present on some websites.
As another approach, I tried using the window handles, but only one window is listed:
url="https://www.siemens.com/de/de.html"
desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
# Create the webdriver object and pass the arguments
options = webdriver.ChromeOptions()
# Chrome will start in Headless mode
options.add_argument('headless')
# Ignores any certificate errors if there is any
options.add_argument("--ignore-certificate-errors")
# Startup the chrome webdriver with executable path and
# pass the chrome options and desired capabilities as
# parameters.
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options, desired_capabilities=desired_capabilities)
# Send a request to the website and let it load
driver.get(url)
time.sleep(30)
whandle = driver.window_handles
['CDwindow-E9E6A9B1021BBA75132EF9DCA40A2824']
Is there any way to check whether a website shows a popup, and then whether that popup contains the text "cookie"?
I appreciate all the help I can get.
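A note on the second attempt: window_handles lists browser windows and tabs, and a consent banner is normally just an element inside the page, so it will not appear there. One heuristic that might be worth trying is to look for visible elements whose id or class mentions a consent keyword. The sketch below is only that, a heuristic: the function names and the keyword list are assumptions, and it will miss banners rendered inside an iframe or shadow DOM.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def consent_xpath(keywords=("cookie", "consent")):
    """Build an XPath matching elements whose id or class mentions a keyword.

    XPath 1.0 has no lower-case function, so translate() is used to make
    the comparison case-insensitive.
    """
    tests = []
    for word in keywords:
        for attr in ("@id", "@class"):
            tests.append(
                f"contains(translate({attr}, '{UPPER}', '{LOWER}'), '{word.lower()}')"
            )
    return "//*[" + " or ".join(tests) + "]"


def find_consent_banners(driver, keywords=("cookie", "consent")):
    """Return the visible elements that look like a consent banner."""
    # "xpath" is the string value of selenium's By.XPATH
    matches = driver.find_elements("xpath", consent_xpath(keywords))
    return [element for element in matches if element.is_displayed()]
```

After driver.get(url) and a short wait, a non-empty result from find_consent_banners(driver) suggests a banner is showing; an empty result does not prove there is none.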
I'm using Selenium to scrape some data from a website I signed up to. Every time I run the program, it opens a new Chrome browser and logs in to my account, and eventually I ran into a CAPTCHA. How can I make it reuse the same browser session, with my account already logged in?
Right now this is what I use:
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("website example")
Thanks!
Create a Guest user profile in Chrome.
Then add this argument to your Chrome driver options:
user-data-dir={UserProfilePath}
For me, UserProfilePath is C:\\Users\\My_Username\\AppData\\Local\\Google\\Chrome\\User Data\\Guest Profile
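One thing that may be why a profile path still starts logged out: Chrome expects user-data-dir to point at the User Data folder itself, and the profile folder inside it to be named separately with profile-directory. If user-data-dir points at the profile folder, Chrome treats it as a fresh data directory. A small sketch of splitting a full profile path into the two switches follows; the helper name chrome_profile_args is made up, and the path in the usage comment is the one from the first question.

```python
from pathlib import PureWindowsPath


def chrome_profile_args(profile_path):
    """Split a full Windows profile path into Chrome's two profile switches."""
    path = PureWindowsPath(profile_path)
    return [
        f"--user-data-dir={path.parent}",    # ...\Google\Chrome\User Data
        f"--profile-directory={path.name}",  # e.g. "Profile 2"
    ]


# Usage with Selenium (not executed here):
#   options = webdriver.ChromeOptions()
#   for arg in chrome_profile_args(
#       r"C:\Users\user\AppData\Local\Google\Chrome\User Data\Profile 2"
#   ):
#       options.add_argument(arg)
#   driver = webdriver.Chrome(options=options)
```

Chrome also locks a data directory while it is in use, so close any regular Chrome window that uses the same profile before starting the driver.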
I tried to log in and scrape a few details from my Instagram page in Python. I want to do it in headless mode because I'm going to deploy it on Heroku. When I try to log in with this code using the headless Chrome driver, the Instagram login page is not fetched. I have also provided a screenshot.
def login_insta(driver, username, password):
    driver.get("https://www.instagram.com/accounts/login")
    time.sleep(5)
    driver.save_screenshot('scrnsh.png')
    driver.find_element_by_xpath(
        "//input[@name='username']").send_keys(username)
    driver.find_element_by_xpath(
        "//input[@name='password']").send_keys(password)
    driver.find_element_by_xpath("//button/div[text()='Log In']").click()
    print("Logged in")
options = Options()
PATH = r"C:\Users\pcname\Downloads\chromedriver"
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(executable_path=PATH, chrome_options=options)
login_insta(driver,"name","pass")
The screenshot showed the message "Error Please wait a few minutes before you try again".
This error doesn't occur with the headless Firefox driver, but I don't know how to add Firefox buildpacks on Heroku. I have a recent ChromeDriver version. Please help me solve this issue.
Or, if you can suggest Firefox buildpacks for Heroku and the steps to add them, that would be very helpful.
Thank you!
I would like to log in to a website and download a file. I'm using Selenium and ChromeDriver, and I would like to know if there is a better way. Currently it opens a Chrome browser window and sends the login details. I don't want to see the browser window open and the data being sent; I just want to send it and get the returned data into a variable.
from selenium import webdriver
driver = webdriver.Chrome()
def site_login(URL, ID_username, ID_password, ID_submit, name, pas):
    driver.get(URL)
    driver.find_element_by_id(ID_username).send_keys(name)
    driver.find_element_by_id(ID_password).send_keys(pas)
    driver.find_element_by_id(ID_submit).click()
URL = "www.mywebsite.com/login"
ID_username = "name"
ID_password = "password"
ID_submit = "submit"
name = "myemail@mail.com"
pas = "mypassword"
resp=site_login(URL,ID_username,ID_password,ID_submit,name,pas)
You can run Chrome in headless mode. In that case the Chrome UI won't show up, but the browser still performs the same task. Here is an article I found on this: https://intoli.com/blog/running-selenium-with-headless-chrome/. Hope this helps.
First option: If you are able to change the driver, you can use PhantomJS as the driver. It is a headless browser that you can use with Selenium.
Second option: If the site is not dynamic (that is, not a single-page application), or you are able to trace the network requests (which can be done in Chrome DevTools), you can use requests directly, with the help of BeautifulSoup if you need to extract some data from the page.
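The second option could look roughly like the sketch below, assuming the site accepts a plain form POST for login and keeps the session in cookies. The function name fetch_after_login and the URLs and field names in the usage comment are hypothetical; the real ones have to be read from the login request in the browser's network tab. The session argument is expected to behave like requests.Session.

```python
def fetch_after_login(session, login_url, credentials, data_url):
    """POST the login form, then GET data_url with the same session cookies.

    session: an object with post() and get() like requests.Session.
    credentials: dict of form field names to values (site-specific).
    """
    response = session.post(login_url, data=credentials)
    response.raise_for_status()  # stop early if the login request failed
    page = session.get(data_url)
    page.raise_for_status()
    return page.text


# Usage with requests (hypothetical URLs and field names, not executed here):
#   import requests
#   with requests.Session() as session:
#       html = fetch_after_login(
#           session,
#           "https://www.mywebsite.com/login",
#           {"name": "myemail@mail.com", "password": "mypassword"},
#           "https://www.mywebsite.com/file",
#       )
```

Many sites add hidden form fields or tokens to the login form, in which case the credentials dict needs those too; that is where parsing the login page with BeautifulSoup comes in.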
Just add these two lines:
chrome_options = Options()
chrome_options.add_argument("--headless")
This should make Chrome run in the background.
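Putting the headless option together with the question's goal of getting the result into a variable, a rough sketch might look like this. It assumes the current find_element(by, value) style, passes the driver in explicitly, and returns driver.page_source; the helper make_headless_chrome is a made-up name, and the page may need a short wait after the click before its source reflects the logged-in page.

```python
def site_login(driver, url, id_username, id_password, id_submit, name, password):
    """Fill in the login form and return the HTML of the page that follows."""
    driver.get(url)
    # "id" is the string value of selenium's By.ID
    driver.find_element("id", id_username).send_keys(name)
    driver.find_element("id", id_password).send_keys(password)
    driver.find_element("id", id_submit).click()
    return driver.page_source


def make_headless_chrome():
    """Start Chrome without a visible window (needs selenium and Chrome)."""
    # Imported here so site_login() above can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


# Usage (not executed here):
#   driver = make_headless_chrome()
#   resp = site_login(driver, URL, ID_username, ID_password, ID_submit, name, pas)
#   driver.quit()
```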
As a regular task within my job role, I often have to download full Facebook accounts from within a user's account. I am trying to improve my workflow and automate this if I can.
I have searched the site for this topic, and although many questions cover the login part, I have yet to find one that deals with Facebook's popup windows. If I am wrong, I apologise; please amend the post accordingly.
As a starting point, I have decided to learn Python and am using it to script the process with a little help from Selenium and ChromeDriver. I have managed to write the code to log in, navigate to the correct page, and click the initial link 'Download a copy'. However, I am struggling to get the script to locate and click the 'Start My Archive' button within the popup window.
Here is the code I have used so far, including the several alternative code blocks I have tried, commented out at the bottom:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
#Global Variables
target = "john.doe.7"
username = "john#doe.com"
password = "Password"
sleep = 5
#Finds and locates the ChromeDriver
driver = webdriver.Chrome(r"C:\Python35-32\ChromeDriver\chromedriver.exe")
#Set Chrome Options
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(chrome_options=chrome_options)
#Directs browser to webpage
driver.get('http://www.facebook.com');
#code block to log in user
def logmein():
    search_box = driver.find_element_by_name('email')
    search_box.send_keys(username)
    search_box = driver.find_element_by_name('pass')
    search_box.send_keys(password)
    search_box.submit()
#code to go to specific URL
def GoToURL(URL):
    time.sleep(sleep) # Let the user actually see something!
    driver.get(URL)
logmein()
GoToURL('https://www.facebook.com/'+'settings')
link = driver.find_element_by_link_text('Download a copy')
link.click()
#driver.find_element_by.xpath("//button[contains(., 'Start My Archive')]").click()
#driver.find_element_by_css_selector('button._42ft._42fu.selected._42gz._42gy').click()
#driver.find_element_by_xpath("//*[contains(text(), 'Start My Archive')]").click()
#driver.find_element_by_css_selector('button._42ft._42fu.layerConfirm.uiOverlayButton.selected._42g-._42gy').click()
#from selenium.webdriver.common.action_chains.ActionChains import driver
#buttons = driver.find_elements_by_xpath("//title[contains(text(),'Start My Archive')]")
#actions = ActionChains(self.driver)
#time.sleep(2)
#actions.click(button)
#actions.perform()
Add the following to your code and check whether it works. Always use a CSS selector.
Note that I ran the equivalent code in Eclipse with Java and I don't know Python, so I apologise if the syntax is wrong. Just try this and see.
driver.implicitly_wait(10)
Startmyarchive = driver.find_element_by_css_selector("._42ft._42fu.selected._42gz._42gy")
Startmyarchive.click()
driver.implicitly_wait(10)
Acknowledge = driver.find_element_by_css_selector("._42ft._42fu.layerConfirm.uiOverlayButton.selected._42g-._42gy")
Acknowledge.click()
driver.implicitly_wait(10)
ClickOkay = driver.find_element_by_css_selector("._42ft._42fu.layerCancel.uiOverlayButton.selected._42g-._42gy")
ClickOkay.click()
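If the popup button is not there yet when the lookup runs, waiting for it explicitly may help. The sketch below is a hand-rolled polling loop, not Selenium's own mechanism; the built-in equivalent is WebDriverWait together with expected_conditions.element_to_be_clickable. The function name click_when_visible and its default timings are assumptions.

```python
import time


def click_when_visible(driver, css_selector, timeout=10.0, poll=0.5):
    """Poll until a visible element matches css_selector, then click it.

    Returns True if an element was clicked, False if the timeout ran out.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        for element in driver.find_elements("css selector", css_selector):
            if element.is_displayed():
                element.click()
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

For example, click_when_visible(driver, "._42ft._42fu.selected._42gz._42gy") would stand in for the first lookup and click above. Also note that implicitly_wait() sets a timeout for the whole driver session, so calling it once near the start is enough.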
Happy learning :-) Reply here if you have any questions.