I am trying to make a request to a server using Python Requests and it returns a 403. The page works fine using my browser and using urllib.
The headers are identical. I even tried using an ordered dict to make sure the header ordering is identical, but it still won't work.
Then I compared the SSL/TLS details and found that the main difference between the three (my browser, requests, and urllib) is that requests doesn't support TLS session tickets.
url="https://www.howsmyssl.com/a/check"
import requests
req = requests.get(url=url)
print(req.text)
import urllib
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
print(response.read())
The cipher suites are almost identical across the three, and the TLS version is 1.3 for all of them. But session_ticket_supported is true only for the browser and urllib (both of which work) and false for requests (which gets the 403).
So I assume that is where the problem lies.
I dug deeper and learned that requests actually uses urllib3, but I got stuck trying to confirm which SSL adapter it uses and how to configure it.
Any ideas on how to enable TLS session tickets for requests? Or maybe I am looking in the wrong place here?
P.S. I am using Python 3.9.13 and the latest versions of all packages.
P.P.S. curl also supports session tickets on my system and can access the server fine.
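For what it's worth, the closest I got was mounting a custom adapter so I can hand requests my own SSLContext. This is just a sketch of my attempt, and I'm not sure it is even the right knob for session tickets:

import ssl
import requests
from requests.adapters import HTTPAdapter

class SSLContextAdapter(HTTPAdapter):
    # Pass a custom SSLContext down to urllib3's pool manager.
    def __init__(self, ssl_context=None, **kwargs):
        self._ssl_context = ssl_context
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        if self._ssl_context is not None:
            kwargs["ssl_context"] = self._ssl_context
        return super().init_poolmanager(*args, **kwargs)

ctx = ssl.create_default_context()
session = requests.Session()
session.mount("https://", SSLContextAdapter(ssl_context=ctx))
print(session.get("https://www.howsmyssl.com/a/check").text)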
I am trying to use proxies for my web scraping project, which I built with HTTPX.
However, when I set my proxies up I still got blocked, so I tried to check whether they actually work / get used. I bought the proxies from a professional seller, so they should be working fine.
I found a website that returns the IP from which the request is being made.
I tried to test the use of proxies like this:
import httpx
import requests
#Username:PW:Hostname
proxies = {"http://": "http://username:pw.io:hostname"}
#response = requests.get('http://ipinfo.io/json',proxies=proxies)
response = httpx.get('http://ipinfo.io/json',proxies=proxies)
print(response.text)
Neither requests nor httpx works for me; the response always returns my real IP. How do I need to set up my proxies? Keep in mind that I actually want to use HTTPX and only used requests for debugging.
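A minimal sketch of how the proxy mapping is usually shaped in httpx, assuming a user:password@host:port proxy URL (all values below are placeholders for your real proxy details):

import httpx

# Placeholder credentials/host - substitute your own proxy details.
proxy_url = "http://username:password@proxyhost:8080"

proxies = {
    "http://": proxy_url,
    "https://": proxy_url,
}

with httpx.Client(proxies=proxies) as client:
    response = client.get("http://ipinfo.io/json")
    print(response.text)  # should now show the proxy's IP, not your own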
I am trying to send a request with the POST method to an API; my code looks like the following:
import urllib.request
import json
url = "https://api.cloudflareclient.com/v0a745/reg"
referrer = "e7b507ed-5256-4bfc-8f17-2652d3f0851f"
body = {"referrer": referrer}
data = json.dumps(body).encode('utf8')
headers = {'User-Agent': 'okhttp/3.12.1'}
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
status_code = response.getcode()
print(status_code)
It actually works fine, but I want to use the "requests" library instead, as it is more up to date and more flexible with proxies. Here is that code:
import requests
import json
url = "https://api.cloudflareclient.com/v0a745/reg"
referrer = "e7b507ed-5256-4bfc-8f17-2652d3f0851f"
data = {"referrer": referrer}
headers = {'User-Agent': 'okhttp/3.12.1'}
req = requests.post(url, headers=headers, json=data)
status_code = req.status_code
print(status_code)
But it returns a 403 status code. How can I fix it?
Keep in mind that this API is open to everyone and you can just run the code with no worries.
EDIT-1: I have tried removing json.dumps(body).encode('utf8'), or just the .encode('utf8') part, from the code as @tomasz-wojcik advised, but I am still getting a 403 while the first code still works!
EDIT-2: I tried the request with Postman, which made it successfully and returned a 200 status code. Postman generated the following Python code:
import requests
url = "https://api.cloudflareclient.com/v0a745/reg"
payload = "{\"referrer\": \"e7b507ed-5256-4bfc-8f17-2652d3f0851f\"}"
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'okhttp/3.12.1',
    'Host': 'api.cloudflareclient.com'
}
response = requests.request("POST", url, headers=headers, data=payload)
status_code = response.status_code
print(status_code)
If you run that code outside of Postman, it still returns a 403 status code. I'm a little confused; I'm starting to think that maybe the "requests" library isn't changing the User-Agent in the second code.
EDIT-3: I have looked into it and found out that it works on Python 2.7.16 but doesn't work on Python 3.8.5!
EDIT-4: Some developers report that the second code works on Python 3.6 too, but the main question is why it works on other versions and not on 3.8 or 3.7.
Python versions that returned a 403 status code (second code): 3.8.5 & 3.7
Python versions that returned a 200 status code (second code): 3.6 & 2.7.16
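A debugging idea I want to try (not a fix): send the same request to httpbin, which echoes back the headers and body it received, and compare the output across Python versions:

import requests

# Debugging sketch: httpbin echoes back the headers and body it received,
# so the output can be compared across Python versions.
r = requests.post(
    "https://httpbin.org/post",
    headers={'User-Agent': 'okhttp/3.12.1'},
    json={"referrer": "e7b507ed-5256-4bfc-8f17-2652d3f0851f"},
)
echo = r.json()
print(echo["headers"])
print(echo["data"])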
The issue seems to be with how the host is handling SSL. Newer versions of requests use certifi, which in your case is having issues with the host server. I downgraded requests to an earlier version (2.1.0) and it worked. You can pin that version in your requirements.txt and it should work with any Python version.
https://requests.readthedocs.io/en/master/user/advanced/#ca-certificates
Before version 2.16, Requests bundled a set of root CAs that it trusted, sourced from the Mozilla trust store.
The certificates were only updated once for each Requests version. When certifi was not installed, this led to extremely out-of-date certificate bundles when using significantly older versions of Requests.
For the sake of security we recommend upgrading certifi frequently!
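If pinning an old requests version is not an option, another thing to try (just a sketch; I have not verified it against this particular host) is to check which CA bundle certifi is providing and point requests at an explicit bundle via verify=:

import certifi
import requests

# Show which CA bundle certifi is providing on this machine.
print(certifi.where())

url = "https://api.cloudflareclient.com/v0a745/reg"
headers = {'User-Agent': 'okhttp/3.12.1'}
data = {"referrer": "e7b507ed-5256-4bfc-8f17-2652d3f0851f"}

# Point requests at an explicit CA bundle file (any PEM bundle path works here).
response = requests.post(url, headers=headers, json=data, verify=certifi.where())
print(response.status_code)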
I am using the Python 2.7 requests module to make a web crawler, but I am having trouble making requests to a site that requires a certificate. When I call requests.get(url), it throws an SSLError: certificate verify failed.
So I tried requests.get(url, verify=False). It works, but it returns a meta http-equiv="refresh" tag with a url='...' that is not the one I requested. Is there a way to solve this, or do I need to send the certificate?
I saw in the requests docs that I can send the certificate and the key. I have the certificate.crt, but I don't have the key; is there a way to get the key?
The certificate is AC certisign multipla G5 and uses TLS 1.2
After a long time of trying to solve this issue, I figured it out. The problem was not with the SSL certificate.
I was making a request to a web page that needs a session; the URL I was using is reached via a redirect from another page. To access it correctly, I had to send the request to that page and follow it to the last redirected page.
So what I did was use a Requests Session:

session = requests.Session()
response = session.get(url, verify=False)

where url is the redirecting URL.
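A slightly fuller sketch of what I mean (the starting URL here is a placeholder; the point is that the Session keeps the cookies it picks up and response.url ends up being the final redirected page):

import requests

session = requests.Session()
# Placeholder for the page that issues the redirect.
response = session.get("https://example.com/redirecting-page", verify=False)

print(response.history)  # intermediate redirect responses, if any
print(response.url)      # the final, redirected URL
print(response.text)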
I am trying to log in to a website using urllib2 and CookieJar. It saves the session ID, but when I try to open another link that requires authentication, it says that I am not logged in. What am I doing wrong?
Here's the code that fails for me:
import urllib
import urllib2
import cookielib
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
# Gives response saying that I logged in successfully
response = opener.open("http://site.com/login", "username=testuser&password=" + md5encode("testpassword"))
# Gives response saying that I am not logged in
response1 = opener.open("http://site.com/check")
Your implementation seems fine... and should work.
It should be sending the correct cookies, so I see this as a case where the site is actually not logging you in.
How can you tell that it's not sending the cookies, or that the cookies you are getting are not the ones that authenticate you?
Use response.info() to see the headers of the responses and check which cookies you are actually receiving.
The site may not be logging you in because:
It has a check on the User-Agent that you are not setting, since some sites serve only the major browsers to disallow bot access (see the sketch below).
The site might be looking for a special hidden form field that you are not sending.
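For the first point, something like this makes the opener look like a browser (just a sketch; the User-Agent string is a placeholder you would copy from a real browser):

import urllib2
import cookielib

cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
# Placeholder UA string - copy the real one from your browser.
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0')]
response = opener.open("http://site.com/check")
print response.read()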
One piece of advice:
from urllib import urlencode
# Use urlencode to encode your data
data = urlencode(dict(username='testuser', password=md5encode("testpassword")))
response = opener.open("http://site.com/login", data)
Moreover, one thing is strange here:
You are MD5-hashing your password before sending it over (strange).
Hashing is generally done by the server before comparing against the database.
Client-side hashing is possible only if site.com implements MD5 in JavaScript.
That is a very rare case; maybe 0.01% of websites do that.
Check that; it might be the problem: you would be providing the hashed form and not the actual password to the server.
The server would then be calculating an MD5 of your MD5 hash.
Check it out!
:)
I had a similar problem with my own test server, which worked fine with a browser, but not with the urllib2.build_opener solution.
The problem seems to be in urllib2. As these answers suggest, it's easy to use the more powerful mechanize library instead of urllib2:
import cookielib
import mechanize

cookieJar = cookielib.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cookieJar)
opener = mechanize.build_opener(*browser.handlers)
And the opener will work as expected!
I'm working on a simple HTML scraper for Hulu in Python 2.6 and am having problems logging in to my account. Here's my code so far:
import urllib
import urllib2
from cookielib import CookieJar
# make cookie and redirect handlers
cookies = CookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cookies)
redirect_handler = urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler, cookie_handler)  # make opener with handlers

# build the request
login_info = {'username': USER, 'password': PASS}  # USER and PASS are defined elsewhere
data = urllib.urlencode(login_info)
req = urllib2.Request("http://www.hulu.com/account/authenticate", data)  # make the request

test = opener.open(req)  # open the page
print test.read()  # print html results
The code compiles and runs, but all that prints is:
Login.onError("Please \074a href=\"/support/login_faq#cant_login\"\076enable cookies\074/a\076 and try again.");
I assume there is some error in how I'm handling cookies, but just can't seem to spot it. I've heard Mechanize is a very useful module for this type of program, but as this seems to be the only speed bump left, I was hoping to find my bug.
What you're seeing is an AJAX return. The site is probably using JavaScript to set the cookie, which is defeating your attempts to authenticate.
The error message you are getting back could be misleading. For example, the server might be looking at the User-Agent and deciding it's not one of the supported browsers, or looking at HTTP_REFERER and expecting it to come from the Hulu domain. My point is that there are too many variables in the request to keep guessing them one by one.
I recommend using an HTTP analyzer tool, e.g. Charles or the one in Firebug, to figure out what the client sends to the server (header fields, cookies, parameters) when you do the Hulu login via a browser. That will give you the exact request you need to construct in your Python code.
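For example, once you have the capture, the request can be rebuilt roughly like this (Python 2 / urllib2 to match your code; every header value below is a placeholder to be replaced with what the analyzer actually shows):

import urllib
import urllib2
from cookielib import CookieJar

cookies = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPRedirectHandler(), urllib2.HTTPCookieProcessor(cookies))

data = urllib.urlencode({'username': USER, 'password': PASS})  # USER and PASS as in your script
req = urllib2.Request("http://www.hulu.com/account/authenticate", data)
# Placeholder values - copy the real ones from the browser capture.
req.add_header('User-Agent', 'Mozilla/5.0 ...')
req.add_header('Referer', 'http://www.hulu.com/')
print opener.open(req).read()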