import requests
proxies = {'http': '203.92.33.87:80'}
# Creating the session and setting up the proxies.
s = requests.Session()
s.proxies = proxies
# Making the HTTP request through the created session.
r = s.get('https://www.trackip.net/ip')
# Check if the proxy was indeed used (the text should contain the proxy IP).
print(r.text)
In the code above I expect print to output 203.92.33.87, but it prints my real public IP instead.
In your proxies dictionary, you only specify a proxy for the http protocol, but in your s.get() you specify the https protocol. Since there is no https key in your dictionary, no proxy is used.
If 203.92.33.87:80 is, in fact, an https proxy, then change the proxies dictionary to reflect that. On the other hand, if it is an http proxy, then change s.get() to s.get('http://...').
Also, I believe you've incorrectly specified the proxy URL. According to the documentation:
Note that proxy URLs must include the scheme
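For example, a corrected sketch might look like the following. This assumes 203.92.33.87:80 is a plain HTTP proxy that can also tunnel HTTPS via CONNECT, which may or may not match your actual proxy:
import requests

# Assumption: 203.92.33.87:80 is an HTTP proxy that also tunnels HTTPS (CONNECT).
# Note that both entries include the scheme, as the documentation requires.
proxies = {
    'http': 'http://203.92.33.87:80',
    'https': 'http://203.92.33.87:80',
}

s = requests.Session()
s.proxies = proxies

# The https:// request is now routed through the proxy as well.
r = s.get('https://www.trackip.net/ip')
print(r.text)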
I'm working on a webscrape function that will be pulling HTML data from internal (non-public) servers. I have a connection through a VPN and proxy server, so when I ping any public site I get a 200 with no problem, but our internal servers return 401.
Here's my code:
http_str = f'http://{username}:{password}@proxy.yourorg.com:80'
proxyDict = {
    'http': http_str,
    'https': https_str,
    'ftp': https_str
}
html_text = requests.get(url, verify=True, proxies=proxyDict, auth=HTTPBasicAuth(user, pwd))
I've tried flushing my DNS server, using different certificate chains (that had a whole new list of problems). I'm using urllib3 on version 1.23 because that seemed to help with SSL errors. I've considered using a requests session but I'm not sure what that would change.
Also, the URLs we're trying to access DO NOT require a login. I'm not sure why it's throwing 401 errors, but I think the auth is for the proxy server. Any help or ideas are appreciated, along with questions, as at this point I'm not even sure what to ask to move this along.
Edit: the proxyDict has a string with the user and pwd passed in for each type: https, http, ftp, etc.
To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax in any of the proxy configuration entries. See apidocs.
import requests
proxyDict = {
    "http": "http://username:password@proxy.yourorg.com:80",
    "https": "http://username:password@proxy.yourorg.com:80"
}
url = 'http://myorg.com/example'
response = requests.get(url, proxies=proxyDict)
If, however, you are accessing internal URLs via VPN (i.e., internal to your organization on your intranet) then you should NOT need the proxy to access them.
Try:
import requests
url = 'http://myorg.com/example'
response = requests.get(url, verify=False)
While working with Python Requests, I ran into ConnectionRefusedError [WinError 10061], because of network settings and limitations in my network, or because my company's network software won't allow it (I think).
But I was interested in what happens when I call requests.get(). Maybe I'm not good at reading the documentation, but I could not find any description of what happens after the call.
For example, why is it fine when I access the URL in a browser, but it fails when I try to access it with requests?
What I'm asking is: what happens after I call the get() method? Does it start a server at localhost? Configure it? Build the headers? How does it send the request?
Generally, most companies use a proxy server for outgoing requests. Once it is set in the connection settings, browsers read it and apply it to each request. You can check whether a proxy is enabled by looking at your browser's settings.
However, when you're making a request from Python, you need to set the proxy on the request yourself, like this:
proxyDict = {
    "http": "192.168.100.3:8080",
    "https": "Some/Same proxy for https",
    "ftp": "Some proxy for ftp (Optional)"
}
r = requests.get(url, headers=headers, proxies=proxyDict)
Also, browsers set the content type, request headers and other such parameters. You can open your browser's developer console (for example in Google Chrome), go to the Network tab, see which parameters are sent with the request, and apply the same parameters in your requests.get(). In the case of headers, it would be:
r = requests.get(url, proxies=proxyDict, headers={'Content-type': 'application/json'})
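For instance, a request that mirrors the browser's headers might look like the sketch below; the URL and header values are placeholders that you would copy from your own browser's Network tab:
import requests

url = 'http://example.com'  # placeholder URL

proxyDict = {
    'http': 'http://192.168.100.3:8080',
    'https': 'http://192.168.100.3:8080',
}

# Placeholder header values; copy the real ones from the browser's Network tab.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept': 'text/html,application/xhtml+xml',
    'Content-Type': 'application/json',
}

r = requests.get(url, headers=headers, proxies=proxyDict)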
I have been experimenting with sessions in requests. One thing confuses me: when I reuse a session, the cookies are empty on the second request.
This short example boils it down, and the result is the same with every host I try.
import requests
import time
# ==== First Request ====
session = requests.Session()
response = session.get(url="http://www.example.com")
print(response.cookies)
# <RequestsCookieJar[<Cookie UID=759854d4058cf52df60bbbe2a19d1402f5aee (...)
time.sleep(2)
# ==== Second Request ====
response = session.get(url="http://www.example.com")
print(response.cookies)
# <RequestsCookieJar[]> (EMPTY!)
But according to documentation:
The Session object allows you to persist certain parameters across
requests. It also persists cookies across all requests made from the
Session instance (...)
What am I missing?
Edit: the answer explained what I was doing wrong, and dir(session) made me realize that the cookies were stored in session.cookies.
This is because you are checking the response's HTTP headers instead of the request's.
Your first request creates the session on the server for the first time and the server responds to your request with the Set-Cookie HTTP header. This is what you see in the printout of the first response.
In your second request, the session is already created, therefore the server doesn't need to include the cookie in its response.
Try to examine your requests instead of the responses.
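As a quick check, you can compare the cookies stored on the session with the Cookie header that was actually sent on the second request. A minimal sketch, assuming the server sets a cookie on the first response:
import requests

session = requests.Session()

# First request: the server sends Set-Cookie, so response.cookies is populated.
first = session.get('http://www.example.com')
print(first.cookies)
print(session.cookies)  # the cookie is also stored on the session itself

# Second request: no new Set-Cookie, so response.cookies is empty...
second = session.get('http://www.example.com')
print(second.cookies)
# ...but the stored cookie was sent back to the server in the request headers.
print(second.request.headers.get('Cookie'))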
Hi, I have written a few simple lines of code, but I seem to be getting an authentication error. Can anyone please suggest what credentials Python is looking for here?
Code:
import urllib2
response = urllib2.urlopen('http://google.com')
html = response.read()
Error
urllib2.HTTPError: HTTP Error 407: Proxy Authentication Required
PS: I do not have access to IE --> Advanced settings or regedit.
As advised, I've modified the code:
import urllib2
proxy_support = urllib2.ProxyHandler({'http': r'http://username:psw@IP:port'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy_support, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://google.com')
html = response.read()
Also, I have created two environment variables:
HTTP_PROXY = http://username:password@proxyserver.domain.com
HTTPS_PROXY = https://username:password@proxyserver.domain.com
But I am still getting the error:
urllib2.HTTPError: HTTP Error 407: Proxy Authentication Required
There are multiple ways to work around your problem. You may want to try defining environment variables named http_proxy and https_proxy, each set to your proxy URL. Refer to this link for more details.
Alternatively, you may want to explicitly define a ProxyHandler to work with urllib2 while handling requests through the proxy. The link is already present within the comment to your query; however I am including it here for the sake of completeness.
Hope this helps
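For completeness, here is a minimal sketch of the ProxyHandler approach for urllib2; proxy.example.com:8080 and the credentials are placeholders, not values from your setup:
import urllib2

# Placeholders: substitute your proxy host, port and credentials.
proxy_handler = urllib2.ProxyHandler({
    'http': 'http://username:password@proxy.example.com:8080',
    'https': 'http://username:password@proxy.example.com:8080',
})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)

response = urllib2.urlopen('http://google.com')
html = response.read()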
If your OS is Windows and you are behind an ISA proxy, urllib2 does not use any proxy information; instead, the "Firewall Client for ISA Server" automatically authenticates the user. That means you don't need to set the http_proxy and https_proxy environment variables. Keep the ProxyHandler empty, as follows:
proxy = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
u = urllib2.urlopen('your-url-goes-here')
data = u.read()
The error code and message suggest that the username and password failed to pass the proxy server's authentication.
The following code:
proxy_handler = urllib2.ProxyHandler({'http': 'http://username:psw@IP:port'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://google.com')
html = response.read()
should also work once the authentication succeeds.
I am using a proxy server to connect to several target servers. Some of the target servers expect http and others expect https. My http requests work swimmingly, but urllib2 ignores the proxy handler on the https requests and sends the requests directly to the target server.
I've tried a number of different things but here is one reasonably concise attempt:
import urllib2
import cookielib

cookie_handler = urllib2.HTTPCookieProcessor(cookielib.LWPCookieJar())
proxies = {'http': 'http://123.456.78.9/',
           'https': 'http://123.45.78.9/'}
proxy_handler = urllib2.ProxyHandler(proxies)
url_opener = urllib2.build_opener(proxy_handler, cookie_handler)
request = urllib2.Request('https://example.com')
response = url_opener.open(request)
I understand that urllib2 has had the ability to send https requests to a proxy server since Python 2.6.3, but I can't seem to get it to work. I'm using 2.7.3.
Thanks for any advice you can offer.
UPDATE: The code above does work. I'm not certain why it wasn't working when I asked this question. Most likely, I had a typo in the https proxy URL.