I have managed to log into a website using webdriver. Now that I am logged in, I would like to navigate to a new URL on the same site using driver.get(). However, often (though not every time) doing so logs me out of the website. I have tried duplicating the cookies after navigating to the new URL, but I still get the same problem. I am unsure whether this method should work, or whether I am doing it correctly.
cookies = driver.get_cookies()
driver.get(link)
timer(time_limit)
for i in cookies:
    driver.add_cookie(i)
How can I navigate to a different part of the website (without clicking links on the screen) whilst maintaining my log-in session?
I just had to refresh the page after adding the cookies: driver.refresh()
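Putting the fix together, the working order is: navigate first (add_cookie only applies to the domain currently loaded), add the saved cookies, then refresh. A minimal sketch (the helper name is mine; the cookies argument is the list returned by driver.get_cookies() before navigating):

```python
def transfer_session(driver, link, cookies):
    """Replay saved cookies on a new page, then refresh so the
    next request is served with the session attached."""
    driver.get(link)                 # land on the cookie's domain first
    for cookie in cookies:
        driver.add_cookie(cookie)    # re-attach each saved cookie
    driver.refresh()                 # reload; the server now sees the session
```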
I am using Selenium to automate some tests on many websites. Every time, I get the cookie-wall popup.
I know I can search for the XPath of the Accept cookies button and then click it with Selenium. That solution is not convenient for me because I would have to find the button manually for each site. I want a script that accepts cookies on all sites automatically.
What I tried: getting a cookie jar by making a request to the website with Python requests and then setting those cookies in Selenium, but that did not work and raised many errors.
I found this on Stack Overflow:
fp = webdriver.FirefoxProfile()
fp.set_preference("network.cookie.cookieBehavior", 2)
driver = webdriver.Firefox(firefox_profile=fp, executable_path="./geckodriver")
This worked for google.com (no accept cookie popup appeared) but it failed with facebook.com and instagram.com
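Since blocking cookies outright does not suppress every banner, one fallback is to probe a short list of common "accept" buttons. This is my own heuristic, not from the thread, and the XPaths are guesses that will not cover every site:

```python
COMMON_ACCEPT_XPATHS = [
    "//*[@id='onetrust-accept-btn-handler']",   # OneTrust-style banners
    "//button[contains(., 'Accept')]",          # generic English label
    "//button[contains(., 'Agree')]",
]

def click_cookie_banner(driver, xpaths=COMMON_ACCEPT_XPATHS):
    """Click the first candidate that exists; return its XPath,
    or None if no banner button was found."""
    for xpath in xpaths:
        try:
            # "xpath" is Selenium 4's By.XPATH spelled out as a string
            driver.find_element("xpath", xpath).click()
            return xpath
        except Exception:   # element absent or not clickable: try the next
            continue
    return None
```

Sites rendering the banner inside an iframe or shadow DOM will still slip through, which is likely what happened with facebook.com and instagram.com.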
I am facing an issue.
I navigate the page via Selenium with Chrome. I use timeouts and WebDriverWait because I need the fully loaded page in order to extract JSON from it.
Then I click the navigation button with
driver.execute_script("arguments[0].click();", element)
because a normal click never worked.
And it is navigating OK, I see Selenium is surfing normally. No problem.
But driver.page_source still holds the first page, the one I loaded via the get method.
All timeouts are the same as for the first page. I can see the new pages rendering normally in the browser, but page_source never updates.
What am I doing wrong?
After navigating to the new page, you need to get the current URL (note that in the Python bindings, current_url and page_source are properties, not methods):
url = driver.current_url
and then:
driver.get(url)
html = driver.page_source
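An alternative to re-issuing get is to wait until the JS click has actually navigated before reading page_source. Here is a small polling helper of my own; Selenium's WebDriverWait with expected_conditions.url_changes achieves the same thing:

```python
import time

def wait_for_url_change(driver, old_url, timeout=10, poll=0.5):
    """Return True once driver.current_url differs from old_url,
    or False if it never changes within timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.current_url != old_url:
            return True
        time.sleep(poll)
    return False
```

After the execute_script click, call wait_for_url_change(driver, url_before) and only then read driver.page_source.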
I am using selenium for a crawling project, but I struggle with a specific webpage (both chrome and firefox).
I found 2 workarounds that work to an extent, but I want to know why this issue happens and how to avoid it.
1) Opening chrome manually and then opening selenium with my user profile.
If I manually start Chrome and then run:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument(r"user-data-dir=C:\Users\User\AppData\Local\Google\Chrome\User Data")
driver = webdriver.Chrome(options=options)
the page loads as intended
2) Passing a variable in the request
by appending /?anything to the URL, the page loads as intended in Selenium
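Workaround 2 can be applied programmatically before each get. A sketch (the parameter name is arbitrary; anything the server ignores will do):

```python
from urllib.parse import urlsplit, urlunsplit

def with_dummy_query(url, param="cachebust=1"):
    """Append a throwaway query parameter, preserving any existing query."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    query = f"{query}&{param}" if query else param
    return urlunsplit((scheme, netloc, path, query, fragment))
```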
For some reason the webpage has a function in the header despite not loading... I suspect this could be a clue but I do not know enough to determine the cause.
I am currently working on real-estate data and wanted to scrape some data from StreetEasy, specifically the closing price hidden behind the "Register to see what it closed for" link (the example listing closed about 2 months ago, below the listed price).
Example url
http://streeteasy.com/sale/1220187
The data I need requires login, but the login mechanism is quite different: there is no login page, the login is a pop-up. Is there any way I can use Python to authenticate and access the page after login, as in the image below?
With Selenium and PhantomJS, you get a powerful combination when it comes to scraping data.
from selenium import webdriver
host = "http://streeteasy.com/sale/1220187"
driver = webdriver.PhantomJS()
# Set the "window" wide enough so PhantomJS can "see" the right panel
driver.set_window_size(1280, 800)
driver.get(host)
driver.find_element_by_link_text("Register to see what it closed for").click()
driver.save_screenshot("output.jpg")
What you see is a small snippet of how Selenium can get you to the webpage login (verified via the JPG screencap). From there, it's a matter of toggling the login box, providing the credentials and click()ing your way in.
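That "toggling the login box" step might look roughly like the following. Every locator string here is an illustrative placeholder, not StreetEasy's real markup, and must be read from the popup's actual HTML:

```python
def login_via_popup(driver, email, password):
    """Fill the (hypothetical) popup fields and submit.
    Locator strings are placeholders only."""
    driver.find_element_by_name("user[email]").send_keys(email)
    driver.find_element_by_name("user[password]").send_keys(password)
    driver.find_element_by_css_selector("button[type=submit]").click()
```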
Oh, and be mindful of the TOS. Good luck!
I am currently automating a website and have a test which checks the functionality of the Remember Me option.
My test script logs in, entering a valid username and password and checks the Remember Me checkbox before clicking Log In button.
To test this functionality I save the cookies to file using pickle, close the browser and then reopen the browser (loading the cookies file).
def closeWebsite(self, saveCookies=False):
    if saveCookies:
        pickle.dump(self.driver.get_cookies(), open('cookies.pkl', 'wb'))
    self.driver.close()

def openWebsite(self, loadCookies=False):
    desired_caps = {}
    desired_caps['browserName'] = 'firefox'
    profile = webdriver.FirefoxProfile(firefoxProfile)
    self.driver = webdriver.Firefox(profile)
    self.driver.get(appUrl)
    if loadCookies:
        for cookie in pickle.load(open('cookies.pkl', 'rb')):
            self.driver.add_cookie(cookie)
However, when I do this, the new browser is not logged in. I understand that every time you open the browser a new session is created, and that the session ID can be obtained with driver.session_id
Is it possible, in the openWebsite method to load a driver and specify the sessionID?
When I test this manually, the Remember Me option works as expected. I'm just having trouble understanding how Selenium handles this case.
For starters, you're loading the page before adding the cookies. Even if the cookies occasionally end up in place before the page needs / queries them, that ordering isn't correct, let alone reliable.
Yet, if you try to set the cookies before any page has loaded you will get an error.
The solution seems to be this:
First of all, you need to be on the domain that the cookie will be valid for. If you are trying to preset cookies before you start interacting with
a site and your homepage is large / takes a while to load an
alternative is to find a smaller page on the site, [...]
In other words:
Navigate to your home page, or to a small entry page on the same domain as appUrl (no need to wait until it has fully loaded).
Add your cookies.
Load appUrl. From then on you should be fine.
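The three steps above can be sketched as a revised open routine. entry_url is an assumption: any lightweight page on the same domain as appUrl (the homepage, or even robots.txt) will do:

```python
def open_with_cookies(driver, entry_url, app_url, cookies):
    """Step 1: land on the domain; step 2: add the saved cookies;
    step 3: load the real page, now served with the session attached."""
    driver.get(entry_url)
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.get(app_url)
```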