Python HEAD request with urllib and proxy doesn't work - python

I'm failing to do a HEAD request through my local Tor proxy:
import httplib
host = 'www.heise.de'
inputfilename="/newsticker/classic/"
conn = httplib.HTTPSConnection("127.0.0.1", 9151)
conn.set_tunnel(host, 443)
conn.request("HEAD", inputfilename)
res = conn.getresponse()
print res
I get a lot of error messages. What would be the correct syntax?

Your Tor proxy is a SOCKS proxy, which isn't supported by httplib.
You can use a recent version of requests (which httplib's own documentation recommends using instead, anyway).
Install requests and PySocks, e.g. pip install requests[socks].
Then you can do:
import requests
proxies = {
    'http': 'socks5://127.0.0.1:9050',
    'https': 'socks5://127.0.0.1:9050'
}
# You need to use the URL, not just the host name
url = 'http://www.heise.de'
response = requests.head(url, proxies=proxies)
print(response.headers)
# {'Vary': 'X-Forwarded-Proto, ... 'Last-Modified': 'Sun, 26 Feb 2017 09:27:45 GMT'}
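If you specifically need the HEAD request over HTTPS against the URL from the question, a minimal sketch along the same lines (the port is an assumption: a standalone Tor daemon usually listens for SOCKS on 9050, Tor Browser on 9150; the socks5h:// scheme makes DNS resolution also go through Tor):
import requests

# Assumed SOCKS port: 9050 for a standalone Tor daemon, 9150 for Tor Browser
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}

# HEAD request for the page from the question, over HTTPS through the Tor proxy
response = requests.head('https://www.heise.de/newsticker/classic/', proxies=proxies)
print(response.status_code)
print(response.headers)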

Related

Being unable to send request using different proxies in Python (using requests module)

I am trying to send a request to a URL through a proxy using the requests module of Python (3.6.5). The request completes successfully, but when I check the origin of the request (by printing req.content), it still shows my own IP. Looking through examples on the Internet, I couldn't figure out the reason behind this problem.
def send_request(url):
    header = get_random('UserAgents.txt')
    proxy = get_random('ProxyList.txt')
    print("Proxy: " + str(proxy))
    proxies = {
        'http': 'http://' + str(proxy),
    }
    try:
        session = requests.Session()
        session.proxies = proxies
        session.headers = HEADER
        req = session.get(url)
        # req = requests.get(url, headers = { 'User-Agent' : HEADER
        # }, proxies = proxies)
        print(req.content)
        req.raise_for_status()
    except Exception as e:
        print(e)
        sys.exit()
    print('Request is successful!')
    return req
There is a possibility that your particular proxy does not hide your IP. You can try one of these free proxy services (results are in JSON format)
getproxylist
gimmeproxy
pubproxy
and see if that works. By the way, if what you want is anonymous web access, a VPN is far better than a proxy.
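One quick way to check whether a given proxy actually hides your IP is to compare the address reported with and without the proxy. A minimal sketch (the proxy address is a placeholder; httpbin.org/ip simply echoes back the caller's IP as JSON):
import requests

# Placeholder proxy - substitute an entry from your ProxyList.txt
proxies = {
    'http': 'http://203.0.113.5:8080',
    'https': 'http://203.0.113.5:8080'
}

direct = requests.get('http://httpbin.org/ip').json()['origin']
proxied = requests.get('http://httpbin.org/ip', proxies=proxies).json()['origin']

print('Without proxy:', direct)
print('With proxy:   ', proxied)  # if the two match, the proxy is not hiding your IP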

407 response from proxy using python requests

Below is the code that I used. I am using the latest python-requests on Python 2.7, and I am getting a 407 response from the request below.
The strange thing is that I get a 503 response when using https instead of http in the request.
response = requests.get(query, proxies={'https': "https://username:password@104.247.XX.XX:80"}, headers=headers, timeout=30, allow_redirects=True)
print response
Output: <Response [503]>
response = requests.get(query, proxies={'http': "http://username:password@104.247.XX.XX:80"}, headers=headers, timeout=30, allow_redirects=True)
print response
Output: <Response [407]>
But the same code works on my Amazon EC2 instance; it only fails when I run it on my local machine.
import urllib2
import urllib
import portalocker
import cookielib
import requests
query = 'http://google.com/search?q=wtf&num=100&tbs=cdr:1,cd_min:2000,cd_max:2015&start=0&filter=0'
headers = {'user-agent': 'Mozilla/5.0 (X11; Linux; rv:2.0.1) Gecko/20100101 Firefox/4.0.1 Midori/0.4'}
response = requests.get(query, proxies={'http': "http://username:password@104.247.XX.XX:80"}, headers=headers, timeout=30, allow_redirects=True)
print response
from requests.auth import HTTPProxyAuth
proxyDict = {
    'http': '77.75.105.165',
    'https': '77.75.105.165'
}
auth = HTTPProxyAuth('username', 'mypassword')
r = requests.get("http://www.google.com", proxies=proxyDict, auth=auth)
The status codes might give a clue:
407 Proxy Authentication Required
503 Service Unavailable
These suggest that your proxy isn't running for https and that the username/password combination is wrong for the proxy you are using. Note that it is very unlikely that your local machine needs the same proxy as your EC2 instance.
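If you want to dig into the 407 specifically, the proxy's reply should carry a Proxy-Authenticate header naming the authentication scheme it expects, which tells you whether plain Basic credentials are even the right approach. A hedged sketch (proxy address and credentials are placeholders):
import requests
from requests.auth import HTTPProxyAuth

# Placeholder proxy and credentials - substitute your own
proxies = {'http': 'http://203.0.113.5:80', 'https': 'http://203.0.113.5:80'}
auth = HTTPProxyAuth('username', 'password')

r = requests.get('http://www.google.com', proxies=proxies, auth=auth, timeout=30)
if r.status_code == 407:
    # e.g. 'Basic realm="proxy"' or an NTLM/Negotiate challenge
    print(r.headers.get('Proxy-Authenticate'))
else:
    print(r.status_code)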

how to disable SSL authentication in python 3

I am new to Python. I have a script that tries to post something to a site. Now, how do I disable SSL certificate verification in the script?
In Python 2 you can use
requests.get('https://kennethreitz.com', verify=False)
but I don't know how to do the same in Python 3.
import urllib.parse
import urllib.request
url = 'https://something.com'
headers = {'APILOGIN': "user",
           'APITOKEN': "passwd"}
values = {"dba": "Test API Merchant", "web": "", "mids.mid": "ACH"}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # data should be bytes
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
See Verifying HTTPS certificates with urllib.request: on older Python 3 releases (before 3.4.3), not specifying either cafile or capath in your call to urlopen meant that, by default, the HTTPS connection was not verified.
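On current Python 3 (3.4.3 and later) certificate verification is enabled by default, so to disable it you have to pass an SSL context that explicitly skips verification. Here is a minimal sketch reusing the placeholder url, headers and values from the question (skipping verification is insecure and should only be used for testing):
import ssl
import urllib.parse
import urllib.request

url = 'https://something.com'
headers = {'APILOGIN': "user", 'APITOKEN': "passwd"}
values = {"dba": "Test API Merchant", "web": "", "mids.mid": "ACH"}

# Context that skips certificate and hostname checks (insecure - testing only)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req, context=ctx) as response:
    the_page = response.read()
For what it's worth, verify=False also works unchanged with requests on Python 3, exactly as in the Python 2 snippet above.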

Python proxy authentication with Requests and Urllib2

Does anyone have any ideas on why the urllib2 version returns the webpage, while the Requests version returns a connection error:
[Errno 10060] A connection attempt failed because the connected party
did not properly respond after a period of time, or established
connection failed because connected host has failed to respond.
Urllib2 code (Working):
import urllib2
proxy = urllib2.ProxyHandler({'http': 'http://login:password@proxy1.com:80'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
wPage = urllib2.urlopen('http://www.google.com/')
print wPage.read();
Requests code (Not working - Errno 10060):
import requests
proxy = {"http": "http://login:password@proxy1.com:80"}
wPage = requests.get('http://www.google.com/', proxies=proxy)
print wPage.text
The requests version returns intranet webpages, but gives an error on internet pages.
I am running Python 2.7
* Edit *
Based on m170897017's suggestion, I looked for differences in the GET requests. The only difference was in Connection and Proxy-Connection.
Urllib2 version:
header: Connection: close
header: Proxy-Connection: close
Requests version :
header: Connection: Keep-Alive
header: Proxy-Connection: Keep-Alive
I forced the Requests version to close both of those connections by modifying the headers:
header = {
    "Connection": "close",
    "Proxy-Connection": "close"
}
The GET requests for both now match; however, the Requests version still does not work.
Try this:
import urllib2
proxy = urllib2.ProxyHandler({'http': '1.1.1.1:9090'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.google.com/')
datum = response.read().decode("UTF-8")
response.close()
print datum
A little late... but for future reference, this line:
proxy = {"http": "http://login:password@proxy1.com:80"}
should also have a second key/value pair for https, even if it's not going to be used.
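For example, a hedged sketch with the same placeholder credentials and proxy host as above, reused for both schemes:
import requests

proxy = {
    "http": "http://login:password@proxy1.com:80",
    "https": "http://login:password@proxy1.com:80"  # same proxy entry, reused for https traffic
}
wPage = requests.get('http://www.google.com/', proxies=proxy)
print(wPage.text)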
Also, there is an awesome package called proxy-requests that does something very similar:
pip3 install proxy-requests
https://pypi.org/project/proxy-requests/

Python-Requests close http connection

I was wondering, how do you close a connection with Requests (python-requests.org)?
With httplib it's HTTPConnection.close(), but how do I do the same with Requests?
Code:
r = requests.post("https://stream.twitter.com/1/statuses/filter.json", data={'track': toTrack}, auth=('username', 'passwd'))
for line in r.iter_lines():
    if line:
        self.mongo['db'].tweets.insert(json.loads(line))
I think a more reliable way of closing a connection is to tell the server explicitly to close it, in a way compliant with the HTTP specification:
HTTP/1.1 defines the "close" connection option for the sender to
signal that the connection will be closed after completion of the
response. For example,
Connection: close
in either the request or the response header fields indicates that the
connection SHOULD NOT be considered `persistent' (section 8.1) after
the current request/response is complete.
The Connection: close header is added to the actual request:
r = requests.post(url=url, data=body, headers={'Connection':'close'})
I came to this question looking to solve the "too many open files" error, but I am using requests.session() in my code. A few searches later, I found an answer in the Python Requests documentation which suggests using a with block so that the session is closed even if there are unhandled exceptions:
with requests.Session() as s:
    s.get('http://google.com')
If you're not using Session you can actually do the same thing: https://2.python-requests.org/en/master/user/advanced/#session-objects
with requests.get('http://httpbin.org/get', stream=True) as r:
    ...  # Do something with r here
As discussed here, there really isn't such a thing as an HTTP connection; what httplib refers to as the HTTPConnection is really the underlying TCP connection, which doesn't know much about your requests at all. Requests abstracts that away and you won't ever see it.
The newest version of Requests does in fact keep the TCP connection alive after your request. If you do want your TCP connections to close, you can just configure requests to not use keep-alive.
s = requests.session()
s.config['keep_alive'] = False
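(On requests 1.x and later, the per-session config dict shown above was removed, so that snippet only applies to very old versions. A hedged sketch of a comparable effect on current versions is to set a default Connection: close header on the session, or simply to close the session when you are done:)
import requests

s = requests.Session()
s.headers['Connection'] = 'close'  # ask the server to close the connection after each response

r = s.get('http://httpbin.org/get')
print(r.status_code)

s.close()  # releases any pooled TCP connections held by the session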
Please use response.close() to close the connection and avoid the "too many open files" error.
For example:
r = requests.post("https://stream.twitter.com/1/statuses/filter.json", data={'track':toTrack}, auth=('username', 'passwd'))
....
r.close()
On Requests 1.X, the connection is available on the response object:
r = requests.post("https://stream.twitter.com/1/statuses/filter.json",
                  data={'track': toTrack}, auth=('username', 'passwd'))
r.connection.close()
This works for me:
res = requests.get(<url>, timeout=10).content
requests.session().close()
Based on the latest requests (2.25.1), requests.<method> will close the connection by default, because each call is made through a short-lived session:
with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)
https://github.com/psf/requests/blob/master/requests/api.py#L60
Thus, if you use the latest version of requests, it seems we don't need to close the connection ourselves.
Also, if you need to send multiple requests with the same session, it's better to use requests.Session() instead of opening and closing the connection multiple times.
EX:
with requests.Session() as s:
    r = s.get('https://example.org/1/')
    print(r.text)
    r = s.get('https://example.org/2/')
    print(r.text)
    r = s.get('https://example.org/3/')
    print(r.text)
To remove the "keep-alive" header in requests, I just created the request from a Request object and then sent it with a Session:
import requests

headers = {
    'Host': '1.2.3.4',
    'User-Agent': 'Test client (x86_64-pc-linux-gnu 7.16.3)',
    'Accept': '*/*',
    'Accept-Encoding': 'deflate, gzip',
    'Accept-Language': 'it_IT'
}
url = "https://stream.twitter.com/1/statuses/filter.json"
#r = requests.get(url, headers=headers)  # this triggers keep-alive: True
s = requests.Session()
r = requests.Request('GET', url, headers)
# Prepare and send it through the session; the prepared request carries only
# the headers given above, so no default keep-alive header is merged in
resp = s.send(r.prepare())
