I have run into a problem when running Selenium with the Python bindings.
I have a REST web service that I would like to call from the Selenium Firefox WebDriver with a pre-created session cookie. (The cookie was previously created by a python-requests request; I am just passing it on to Selenium.)
To be able to add a cookie for a specific domain, I first run a dummy request to that domain and then set the cookie for the second, real request (if I don't do that, add_cookie throws an error):
from time import sleep
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://url.com/preheat")  # dummy request so the cookie domain matches
# sleep(10)
driver.add_cookie(cookie_dict=persistedCookie)  # cookie created earlier via requests
driver.get("http://url.com/realrequest")
The problem is that when I run the code above, the web framework cannot see any cookies set. If I uncomment the sleep and wait 10 seconds after the first request before setting the cookie, everything works as desired.
(I tried using WebDriverWait on an element in the document returned by the first request, but saw the same behaviour; roughly what I tried is sketched below.)
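(The element ID here is a placeholder; the idea is to wait until the preheat page has loaded before adding the cookie.)

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait for an element of the preheat page ("some-element" is a placeholder)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "some-element"))
)
driver.add_cookie(cookie_dict=persistedCookie)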
Is this expected behaviour? If so, could anyone recommend a "deterministic" way of doing this?
Thanks,
Marcell
Related
I am quite confused about cookies.
I am trying to scrape a site through a chain of many POST/GET requests.
I notice that every step needs the 'Cookie' field set in the headers dictionary, because passing the right value is the only way I avoid access errors.
However, when I look at my cookie jar (via the .cookies attribute) as it was at the previous step, I cannot find the value I need for the current step, even though I can see it by inspecting the network data in my browser.
So how should I build up, step by step, a chain like 1) login, 2) button interaction for changing dates, 3) file downloading?
Is my mistake that I am using requests instead of Selenium?
I already use requests.Session()... but does this mean that I don't need to supply the Cookie field in the headers myself? Either way (setting or not setting Cookie in the headers), I get a server access error AFTER having correctly logged in...
Thanks,
David
If you can't access your cookies, the login credentials are probably stored in an HTTP-only cookie. This is a secure place to store them, to prevent XSS attacks.
You should try using a requests session to send these cookies along with your future requests.
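A minimal sketch of that approach (the URL and form fields are placeholders): the session's cookie jar keeps every cookie the server sets, including HTTP-only ones, and replays them automatically.

import requests

session = requests.Session()

# log in once; the session stores every cookie the server sets,
# including HTTP-only ones that JavaScript cannot read
session.post('https://example.com/login', data={'user': 'u', 'pass': 'p'})

# later requests automatically send the stored cookies;
# there is no need to fill in the 'Cookie' header by hand
r = session.get('https://example.com/protected')
print(r.status_code)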
I used Selenium in Python 3 to open a page. It does not open under Selenium, but it does open in a Firefox private window.
What is the difference, and how do I fix it?
from selenium import webdriver
from time import sleep
driver = webdriver.Firefox()
driver.get('https://google.com') # creating a google cookie
driver.get_cookies() # check google gets cookies
sleep(3.0)
url='https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1'
driver.get(url)
Creating a Google cookie is not necessary; it is not there in a Firefox private window either, yet the page works without it. Under Selenium, however, the behaviour is different.
I also see that the website returns an [HTTP/2 429 Too Many Requests 173ms] status and the page is blank white. This does not happen in Firefox private mode.
UPDATE:
I turned on the persistent log. Firefox in private mode receives a 429 response too, but the JavaScript seems to resume from another URL. This only happens the first time.
Under Selenium, however, the request does not survive the 429 response. It does report something to the cdndex website; I have blocked that site, so you do not see the request go through there. This is still a difference in behaviour between Firefox and Selenium.
Selenium with persistent log:
Firefox with persistent log:
This is just my hunch after working with Selenium and WebDriver for a while: I suspect that the default user agent of Selenium is set to something the server side recognizes, and it serves you a telling HTTP status code and a blank page as a result.
Try setting the user agent to something reasonable and/or disabling Selenium's interference with the defaults.
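A minimal sketch of overriding Firefox's user agent via a preference (the UA string below is only an example value; substitute a current one):

from selenium import webdriver

options = webdriver.FirefoxOptions()
# override the user agent Firefox sends with every request
options.set_preference(
    "general.useragent.override",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
)
driver = webdriver.Firefox(options=options)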
Another tip is to look at the request with Wireshark or a similar tool to see exactly what is sent over the wire.
429 Too Many Requests
The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests within a short period of time. The 429 status code is intended for use with rate-limiting schemes.
Root Cause
When your server detects that a user agent is trying to access a specific page too often in a short period of time, it triggers a rate-limiting feature. The most common example of this is when a user (or an attacker) repeatedly tries to log into a web application.
The server can also identify a bot by its cookies, rather than by login credentials. Requests may also be counted on a per-request basis, across your server, or across several servers. So there are a variety of situations that can result in you seeing an error like one of these:
429 Too Many Requests
429 Error
HTTP 429
Error 429 (Too Many Requests)
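If you control the client, the usual mitigation is to back off and retry. A minimal sketch (the helper name is mine, and it assumes Retry-After, when present, is given in seconds):

import time
import requests

def get_with_backoff(url, retries=5):
    """Retry a GET that may be rate-limited, honouring Retry-After."""
    for attempt in range(retries):
        r = requests.get(url)
        if r.status_code != 429:
            return r
        # fall back to exponential delays when no Retry-After is sent
        wait = int(r.headers.get('Retry-After', 2 ** attempt))
        time.sleep(wait)
    return r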
This use case
This use case seems to be a classic instance of a Selenium-driven, GeckoDriver-initiated Firefox browsing context being detected as a bot, due to the fact that:
Selenium identifies itself
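One quick way to see the flag that sites check (this only demonstrates the detection; it is not a bypass):

from selenium import webdriver

driver = webdriver.Firefox()
# Firefox under GeckoDriver exposes automation through this DOM property
print(driver.execute_script("return navigator.webdriver"))  # prints True
driver.quit()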
References
You can find a couple of relevant detailed discussions in:
How to Conceal WebDriver in Geckodriver from BotD in Java?
How can I make a Selenium script undetectable using GeckoDriver and Firefox through Python?
I don't understand why the Python requests library isn't pulling in all the cookies. For example, I am running this code:
import requests
a_session = requests.Session()
a_session.get('https://google.com/')
session_cookies = a_session.cookies
cookies_dictionary = session_cookies.get_dict()
print(cookies_dictionary)
But I only get the cookie "1P_JAR", even though there should be several cookies.
(Screenshot: list of cookies shown in the inspector panel.)
Ultimately I'm trying to figure out why it picks up only that one cookie and not the others. I'm building my own application that generates a cookie, but when I run this script against my application, I get back an empty list, even though the inspector shows that I have generated a cookie.
A cookie is set by a server response to a specific request.
Your basic google.com request only sets that cookie, which you can observe by the set-cookie header.
The other cookies are probably set by other requests or even by JavaScript code. Requests doesn't evaluate or run JavaScript and thus doesn't make any of those other requests.
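You can check this yourself by printing the header (a minimal sketch):

import requests

# only the cookies named in Set-Cookie come from this single response;
# anything else is set by follow-up requests or by JavaScript
r = requests.get('https://google.com/')
print(r.headers.get('Set-Cookie'))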
If you don't want to completely reverse engineer every single cookie, the way to go would be to simulate a browser by using Selenium + ChromeDriver or a similar solution.
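A sketch of that alternative; a real browser runs the page's JavaScript, so the JS-set cookies show up too:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://google.com/')
print(driver.get_cookies())  # the full list, unlike the bare requests call
driver.quit()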
I am running automation with Selenium and Python on the Opera web driver. When I open the specific page that I need, a request is sent to the server; it is authenticated with anti-bot content, which blocks me from making the request myself, so the only solution is to capture the returned JSON after the browser sends the request. I have checked selenium-wire, but I think it doesn't fit my needs. Is there another way to do this? Any suggestions?
You can try Titanium Web Proxy. It is a proxy server that can be installed via a NuGet package and used together with Selenium.
string body = await e.GetResponseBodyAsString(); // C#, inside the proxy's response event handler
Reference:
https://github.com/justcoding121/Titanium-Web-Proxy/issues/176
https://www.automatetheplanet.com/webdriver-capture-modify-http-traffic/#tab-con-9
Hello, there are some pages which are built so that it is impossible to automate the request.
That protection works in JavaScript, and there are companies that provide this kind of detection and close off access for bots.
So I am sorry that I cannot solve your problem; I tried to do the same as you, and there is no way.
To test my API, I need to send a request to my viewer URL, on which there is a tracking service that tells my API how much time I've spent on the page (the classic setup).
I have this small function in my tests:
import requests

def does_it_track(response, **kwargs):
    # some unrelated actions
    r = requests.get('my_viewer_url')
This request works fine, but it lasts less than a second, which doesn't allow me to test my statistics generator or my tracker's precision.
I've tried:
This SO question: how to make python request.get wait a few seconds? It didn't help.
The sleep method (but I got a "has no attribute 'sleep'" error).
Repeating the request, but that obviously creates several stats, and I only need one longer one.
Does someone know a not-too-complicated way to make my request wait on my page?
I'm on Python 2.7.
Thank you!
"how many time you've spent on the page" has nothing to do with the HTTP request/response cycle, but with your browser.
From the server's point of view, the server gets a request, returns a response and the job is over, period - and from the client's point of view once the server returned a response the HTTP transaction is over too. There's not even a notion of "page" here, only HTTP request and response.
Your "tracker" is (obviously) using javascript to send data from the browser itself (most likely by sending a request each X seconds indicating the page is still displayed in the browser). IOW, the only way to test this is to use a headless browser that will execute javascript.
Try VCR, it might help you to solve your issue.
With it you would indeed be able to record your requests and see what's happening.