I want to use urllib2 and make a request from different IP addresses.
I have checked this but I am having no luck: Source interface with Python and urllib2
Code from link:
import functools
import httplib
import urllib2

class BoundHTTPHandler(urllib2.HTTPHandler):
    def __init__(self, source_address=None, debuglevel=0):
        urllib2.HTTPHandler.__init__(self, debuglevel)
        self.http_class = functools.partial(httplib.HTTPConnection,
                                            source_address=source_address)

    def http_open(self, req):
        return self.do_open(self.http_class, req)

# test (source_address is a (host, port) tuple; port 0 lets the OS choose)
handler = BoundHTTPHandler(("192.168.1.1", 0), 0)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
urllib2.urlopen("http://google.com/").read()
Error: TypeError: __init__() got an unexpected keyword argument 'source_address'
And how would I run this code before using urllib2?
import socket
true_socket = socket.socket

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind((sourceIP, 0))
    return sock

socket.socket = bound_socket
So you have the bound_socket function, then what?
Edit: I don't believe my Python version supports source_address, which I think is why I'm getting the error.
So, let's try the socket code.
The line socket.socket = bound_socket affects all code that runs after it, in any module, globally; once you have run it, you don't need to do anything else.
The httplib.HTTPConnection class has a source_address parameter in Python 2.7, so your BoundHTTPHandler should also work there.
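For readers on Python 3, where urllib2 became urllib.request and httplib became http.client, the same handler idea might look like the sketch below. It assumes source_address is a (host, port) tuple, since the value ends up being passed to the socket's bind() call. The helper name open_from is hypothetical, and only plain HTTP is covered; HTTPS requests would still use the default handler.

```python
import functools
import http.client
import urllib.request


class BoundHTTPHandler(urllib.request.HTTPHandler):
    """HTTP handler whose outgoing connections bind to a chosen local address."""

    def __init__(self, source_address=None, debuglevel=0):
        super().__init__(debuglevel)
        # source_address is a (host, port) tuple; port 0 lets the OS pick one.
        self.http_class = functools.partial(
            http.client.HTTPConnection, source_address=source_address
        )

    def http_open(self, req):
        return self.do_open(self.http_class, req)


def open_from(source_ip, url, timeout=10):
    """Hypothetical helper: fetch url with the socket bound to source_ip."""
    opener = urllib.request.build_opener(BoundHTTPHandler((source_ip, 0)))
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A call such as open_from("192.168.1.1", "http://google.com/") only succeeds if that address is actually configured on a local interface; otherwise the bind fails with an OSError.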
Related
How to force the requests library to use a specific internet protocol version for a get request? Or can this be achieved better with another method in Python? I could but I do not want to use curl…
Example to clarify purpose:
import requests
r = requests.get('https://my-dyn-dns-service.domain/?hostname=my.domain',
auth = ('myUserName', 'my-password'))
I've found a minimalistic solution to force urllib3 to use either IPv4 or IPv6. urllib3 calls this function when creating new connections, for both HTTP and HTTPS, so you can return whichever address family you want from it.
import socket
import requests.packages.urllib3.util.connection as urllib3_cn

def allowed_gai_family():
    """
    https://github.com/shazow/urllib3/blob/master/urllib3/util/connection.py
    """
    family = socket.AF_INET
    if urllib3_cn.HAS_IPV6:
        family = socket.AF_INET6  # force ipv6 only if it is available
    return family

urllib3_cn.allowed_gai_family = allowed_gai_family
This is a hack, but you can monkey-patch getaddrinfo to filter to only IPv4 addresses:
# Monkey patch to force IPv4, since FB seems to hang on IPv6
import socket
old_getaddrinfo = socket.getaddrinfo

def new_getaddrinfo(*args, **kwargs):
    responses = old_getaddrinfo(*args, **kwargs)
    return [response
            for response in responses
            if response[0] == socket.AF_INET]

socket.getaddrinfo = new_getaddrinfo
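Because a patch like this changes name resolution for the whole process, it may be safer to limit it to the calls that need it. The following is a minimal sketch of the same filtering idea wrapped in a context manager; the name only_family is hypothetical, and the patch is still process-wide (and not thread-safe) while the block is active.

```python
import contextlib
import socket


@contextlib.contextmanager
def only_family(family):
    """Temporarily make socket.getaddrinfo return results for one family only."""
    original = socket.getaddrinfo

    def filtered(*args, **kwargs):
        return [info for info in original(*args, **kwargs) if info[0] == family]

    socket.getaddrinfo = filtered
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original
```

Usage would be along the lines of `with only_family(socket.AF_INET): requests.get(url)`. If a host has no address in the requested family, the filtered list is empty and the connection attempt fails.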
You can use this hack to force requests to use IPv4:
requests.packages.urllib3.util.connection.HAS_IPV6 = False
(This was submitted as a comment by Nulano and I thought it deserved to be a proper answer).
I've written a runtime patch for requests+urllib3+socket that allows passing the required address family optionally and on a per-request basis.
Unlike other solutions, there is no monkeypatching involved. Instead, you replace your imports of requests with the patched file, and it presents a requests-compatible interface with all exposed classes subclassed and patched and all “simple API” functions reimplemented. The only noticeable difference should be an extra family parameter that you can use to restrict the address family used during name resolution to socket.AF_INET or socket.AF_INET6. A somewhat complicated (but mostly just LoC-intensive) series of strategic method overrides then passes this value all the way down to the bottom layers of urllib3, where it is used in an alternate implementation of the socket.create_connection function call.
TL;DR usage looks like this:
import socket
from . import requests_wrapper as requests  # Use this to load the patch
# This will work (if IPv6 connectivity is available) …
requests.get("http://ip6only.me/", family=socket.AF_INET6)
# … but this won't
requests.get("http://ip6only.me/", family=socket.AF_INET)
# This one will fail as well
requests.get("http://127.0.0.1/", family=socket.AF_INET6)
# This one will work if you have IPv4 available
requests.get("http://ip6.me/", family=socket.AF_INET)
# This one will work on both IPv4 and IPv6 (the default)
requests.get("http://ip6.me/", family=socket.AF_UNSPEC)
Full link to the patch library (~350 LoC): https://gitlab.com/snippets/1900824
I took a similar approach to https://stackoverflow.com/a/33046939/5059062, but instead patched out the part in socket that makes DNS requests so it only does IPv6 or IPv4, for every request, which means this can be used in urllib just as effectively as in requests.
This might be bad if your program also uses unix pipes and other such things, so I urge caution with monkeypatching.
import requests
import socket
from unittest.mock import patch
import re

orig_getaddrinfo = socket.getaddrinfo

def getaddrinfoIPv6(host, port, family=0, type=0, proto=0, flags=0):
    return orig_getaddrinfo(host=host, port=port, family=socket.AF_INET6, type=type, proto=proto, flags=flags)

def getaddrinfoIPv4(host, port, family=0, type=0, proto=0, flags=0):
    return orig_getaddrinfo(host=host, port=port, family=socket.AF_INET, type=type, proto=proto, flags=flags)

with patch('socket.getaddrinfo', side_effect=getaddrinfoIPv6):
    r = requests.get('http://ip6.me')
    print('ipv6: '+re.search(r'\+3>(.*?)</',r.content.decode('utf-8')).group(1))

with patch('socket.getaddrinfo', side_effect=getaddrinfoIPv4):
    r = requests.get('http://ip6.me')
    print('ipv4: '+re.search(r'\+3>(.*?)</',r.content.decode('utf-8')).group(1))
and without requests:
import urllib.request
import socket
from unittest.mock import patch
import re

orig_getaddrinfo = socket.getaddrinfo

def getaddrinfoIPv6(host, port, family=0, type=0, proto=0, flags=0):
    return orig_getaddrinfo(host=host, port=port, family=socket.AF_INET6, type=type, proto=proto, flags=flags)

def getaddrinfoIPv4(host, port, family=0, type=0, proto=0, flags=0):
    return orig_getaddrinfo(host=host, port=port, family=socket.AF_INET, type=type, proto=proto, flags=flags)

with patch('socket.getaddrinfo', side_effect=getaddrinfoIPv6):
    r = urllib.request.urlopen('http://ip6.me')
    print('ipv6: '+re.search(r'\+3>(.*?)</',r.read().decode('utf-8')).group(1))

with patch('socket.getaddrinfo', side_effect=getaddrinfoIPv4):
    r = urllib.request.urlopen('http://ip6.me')
    print('ipv4: '+re.search(r'\+3>(.*?)</',r.read().decode('utf-8')).group(1))
Tested in Python 3.5.2.
This is totally untested and will probably require some tweaks, but combining answers from Using Python “requests” with existing socket connection and how to force python httplib library to use only A requests, it looks like you should be able to create an IPv6 only socket and then have requests use that for its connection pool with something like:
import socket
import requests
try:
    from http.client import HTTPConnection
except ImportError:
    from httplib import HTTPConnection

class MyHTTPConnection(HTTPConnection):
    def connect(self):
        print("This actually got called")
        self.sock = socket.socket(socket.AF_INET6)
        self.sock.connect((self.host, self.port, 0, 0))
        if self._tunnel_host:
            self._tunnel()

requests.packages.urllib3.connectionpool.HTTPConnection = MyHTTPConnection
After reading the previous answer, I had to modify the code to force IPv4 instead of IPv6. Notice that I used socket.AF_INET instead of socket.AF_INET6, and that self.sock.connect() takes a 2-item tuple argument.
I also needed to override the HTTPSConnection which is much different than HTTPConnection since requests wraps the httplib.HTTPSConnection to verify the certificate if the ssl module is available.
import socket
import ssl
import requests
try:
    from http.client import HTTPConnection
except ImportError:
    from httplib import HTTPConnection
from requests.packages.urllib3.connection import VerifiedHTTPSConnection

# HTTP
class MyHTTPConnection(HTTPConnection):
    def connect(self):
        self.sock = socket.socket(socket.AF_INET)
        self.sock.connect((self.host, self.port))
        if self._tunnel_host:
            self._tunnel()

requests.packages.urllib3.connectionpool.HTTPConnection = MyHTTPConnection
requests.packages.urllib3.connectionpool.HTTPConnectionPool.ConnectionCls = MyHTTPConnection

# HTTPS
class MyHTTPSConnection(VerifiedHTTPSConnection):
    def connect(self):
        self.sock = socket.socket(socket.AF_INET)
        self.sock.connect((self.host, self.port))
        if self._tunnel_host:
            self._tunnel()
        # Note: ssl.wrap_socket was removed in Python 3.12
        self.sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file)

requests.packages.urllib3.connectionpool.HTTPSConnection = MyHTTPSConnection
requests.packages.urllib3.connectionpool.VerifiedHTTPSConnection = MyHTTPSConnection
requests.packages.urllib3.connectionpool.HTTPSConnectionPool.ConnectionCls = MyHTTPSConnection
I'm trying to write a function which will take a URL and return the contents of that URL. There is one additional argument (useTor) which, when set to True, will use SocksiPy to route the request over a SOCKS 5 proxy server (in this case, Tor).
I can set the proxy globally for all connections just fine, but I cannot work out two things:
1. How can I move this setting into a function so that it is decided by the useTor variable? I'm unable to access socks within the function and have no idea how to do so.
2. I'm assuming that if I don't set the proxy, the next request will go direct. The SocksiPy documentation doesn't seem to give any indication as to how the proxy is reset.
Can anyone advise? My (beginner's) code is below:
import gzip
import zlib
import StringIO
import socks
import socket

def create_connection(address, timeout=None, source_address=None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock

# next line works just fine if I want to set the proxy globally
# socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket
socket.create_connection = create_connection

import urllib2
import sys

def getURL(url, useTor=False):
    if useTor:
        print "Using tor..."
        # Throws- AttributeError: 'module' object has no attribute 'setproxy'
        socks.setproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
    else:
        print "Not using tor..."
        # Not sure how to cancel the proxy, assuming it persists
    opener = urllib2.build_opener()
    usock = opener.open(url)
    url = usock.geturl()
    encoding = usock.info().get("Content-Encoding")
    if encoding in ('gzip', 'x-gzip', 'deflate'):
        content = usock.read()
        if encoding == 'deflate':
            data = StringIO.StringIO(zlib.decompress(content))
        else:
            data = gzip.GzipFile('', 'rb', 9, StringIO.StringIO(content))
        result = data.read()
    else:
        result = usock.read()
    usock.close()
    return result

# Connect to the same site both with and without using Tor
print getURL('https://check.torproject.org', False)
print getURL('https://check.torproject.org', True)
Example
Simply invoke socksocket.setproxy with no arguments; this effectively removes any previously set proxy settings.
import socks
sck = socks.socksocket ()
# use TOR
sck.setproxy (socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
# reset to normal use
sck.setproxy ()
Details
Looking at the source of socks.py and digging into socksocket.setproxy, we quickly see that to discard any previous proxy attributes we simply invoke the function with no additional arguments (besides self).
class socksocket(socket.socket):
    ... # additional functionality ignored
    def setproxy(self,proxytype=None,addr=None,port=None,rdns=True,username=None,password=None):
        """setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
        Sets the proxy to be used.
        proxytype - The type of the proxy to be used. Three types
                    are supported: PROXY_TYPE_SOCKS4 (including socks4a),
                    PROXY_TYPE_SOCKS5 and PROXY_TYPE_HTTP
        addr -      The address of the server (IP or DNS).
        port -      The port of the server. Defaults to 1080 for SOCKS
                    servers and 8080 for HTTP proxy servers.
        rdns -      Should DNS queries be preformed on the remote side
                    (rather than the local side). The default is True.
                    Note: This has no effect with SOCKS4 servers.
        username -  Username to authenticate with to the server.
                    The default is no authentication.
        password -  Password to authenticate with to the server.
                    Only relevant when username is also provided.
        """
        self.__proxy = (proxytype,addr,port,rdns,username,password)
    ... # additional functionality ignored
Note: When a new connection is about to be negotiated, the implementation will use the contents of self.__proxy unless the potentially required element is None (in which case the setting is simply ignored).
How can I use HTTPServer (or some other class) to set up an HTTP server that listens to a filesystem socket instead of an actual network socket? By "filesystem socket" I mean sockets of the AF_UNIX type.
HTTPServer inherits from SocketServer.TCPServer, so I think it's fair to say that it isn't intended for that use-case, and even if you try to work around it, you may run into problems since you are kind of "abusing" it.
That being said, however, it would be possible per se to define a subclass of HTTPServer that creates and binds Unix sockets quite simply, as such:
import socket
import SocketServer
from BaseHTTPServer import HTTPServer

class UnixHTTPServer(HTTPServer):
    address_family = socket.AF_UNIX

    def server_bind(self):
        SocketServer.TCPServer.server_bind(self)
        self.server_name = "foo"
        self.server_port = 0
Then, just pass the path you want to bind to as the server_address argument to the constructor:
server = UnixHTTPServer("/tmp/http.socket", ...)
Again, though, I can't guarantee that it will actually work well. You may have to implement your own HTTP server instead.
I followed the example from @Dolda2000 above in Python 3.5 and ran into an issue with the HTTP handler falling over with an invalid client address. You don't have a client address with Unix sockets in the same way that you do with TCP, so the code below fakes it.
import socketserver
...
class UnixSocketHttpServer(socketserver.UnixStreamServer):
    def get_request(self):
        request, client_address = super(UnixSocketHttpServer, self).get_request()
        return (request, ["local", 0])
...
server = UnixSocketHttpServer((sock_file), YourHttpHandler)
server.serve_forever()
With these changes, you can perform an HTTP request against the Unix socket with tools such as cURL.
curl --unix-socket /run/test.sock http:/test
Overview
In case it helps anyone else, I have created a complete example (made for Python 3.8) based on Roger Lucas's example:
Server
import socketserver
from http.server import BaseHTTPRequestHandler

class myHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type','text/html')
        self.end_headers()
        self.wfile.write(b"Hello world!")
        return

class UnixSocketHttpServer(socketserver.UnixStreamServer):
    def get_request(self):
        request, client_address = super(UnixSocketHttpServer, self).get_request()
        return (request, ["local", 0])

server = UnixSocketHttpServer(("/tmp/http.socket"), myHandler)
server.serve_forever()
This will listen on the Unix socket and respond with "Hello world!" to all GET requests.
Client Request
You can send a request with:
curl --unix-socket /tmp/http.socket http://any_path/abc/123
Troubleshooting
If you run into this error:
OSError: [Errno 98] Address already in use
Then delete the socket file:
rm /tmp/http.socket
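If you would rather send the request from Python than from curl, one possible approach is to subclass http.client.HTTPConnection and override connect() so that it opens an AF_UNIX socket. This is only a sketch: the names UnixHTTPConnection and unix_get are hypothetical, the "localhost" host is arbitrary (it is only used for the Host header), and AF_UNIX is assumed to be available on the platform.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # The host is only used for the Host header; "localhost" is arbitrary.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def unix_get(socket_path, path="/"):
    """Hypothetical helper: GET path over the Unix socket, return (status, body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

With the server above running, unix_get("/tmp/http.socket", "/abc/123") should return the status code and the response body as bytes.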
Code:
from socket import *
sP = 14000
servSock = socket(AF_INET,SOCK_STREAM)
servSock.bind(('',sP))
servSock.listen(1)
while 1:
    connSock, addr = servSock.accept()
    connSock.send('HTTP/1.0 200 OK\nContent-Type:text/html\nConnection:close\n<html>...</html>')
    connSock.close()
When I open localhost:14000 in the browser, I get Error 101 (ERR_CONNECTION_RESET): The connection was reset. I'm not sure why. What am I doing wrong?
Several bugs, some more severe than others. As @IanWetherbee already noted, you need an empty line before the body. You should also send \r\n, not just \n. You should use sendall to avoid short sends. Lastly, you need to close the connection once you're done sending.
Here's a slightly modified version of the above:
from socket import *
sP = 14000
servSock = socket(AF_INET,SOCK_STREAM)
servSock.bind(('',sP))
servSock.listen(1)
while 1:
    connSock, addr = servSock.accept()
    connSock.sendall('HTTP/1.0 200 OK\r\nContent-Type:text/html\r\nConnection:close\r\n\r\n<html><head>foo</head></html>\r\n')
    connSock.close()
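On Python 3 the response has to be sent as bytes rather than str. The sketch below is one way to write the same server there; it also reads the request headers before replying, on the assumption (not confirmed for the case in the question) that closing a socket while the browser's request is still unread can itself lead to a connection reset. The names RESPONSE, handle_one and serve are hypothetical.

```python
import socket

RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html><body>foo</body></html>\r\n"
)


def handle_one(server_sock):
    """Accept one connection, read the request headers, send a fixed reply."""
    conn, _addr = server_sock.accept()
    with conn:
        request = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        conn.sendall(RESPONSE)


def serve(port=14000):
    """Serve forever on the given port (14000 as in the question)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind(("", port))
        server_sock.listen(1)
        while True:
            handle_one(server_sock)
```

Calling serve() and opening localhost:14000 in a browser should then show the fixed page.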
Running your code, I get similar errors and am also unsure of their origin. However, rather than rolling your own HTTP server, have you considered a built-in one? Check out the sample below. It can also support POST (you have to add a do_POST method).
Simple HTTP Server
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

class customHTTPServer(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()
        self.wfile.write('<HTML><body>Hello World!</body></HTML>')
        return

def main():
    try:
        server = HTTPServer(('',14000),customHTTPServer)
        print 'server started at port 14000'
        server.serve_forever()
    except KeyboardInterrupt:
        server.socket.close()

if __name__=='__main__':
    main()
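As a rough illustration of the do_POST remark, here is a Python 3 sketch of the same server (BaseHTTPServer became http.server) with both methods. The class name CustomHandler, the helper _reply and the "Received N bytes" reply are assumptions made for the example, not part of the original sample.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class CustomHandler(BaseHTTPRequestHandler):
    def _reply(self, body):
        """Send a 200 response with the given bytes as an HTML body."""
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply(b"<html><body>Hello World!</body></html>")

    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        posted = self.rfile.read(length)
        message = "<html><body>Received %d bytes</body></html>" % len(posted)
        self._reply(message.encode("ascii"))


def main(port=14000):
    server = HTTPServer(("", port), CustomHandler)
    print("server started at port %d" % port)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Running main() starts the server on port 14000; a POST with a three-byte body would be answered with "Received 3 bytes".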
I created a small SimpleXMLRPCServer to check the client's IP address.
I tried this:
Server
import xmlrpclib
from SimpleXMLRPCServer import SimpleXMLRPCServer

server = SimpleXMLRPCServer(("localhost", 8000))

def MyIp():
    return "Your ip is: %s" % server.socket.getpeername()

server.register_function(MyIp)
server.serve_forever()
Client
import xmlrpclib
se = xmlrpclib.Server("http://localhost:8000")
print se.MyIp()
Error
xmlrpclib.Fault: <Fault 1: "<class 'socket.error'>:(107, 'Transport endpoint is not connected')">
How can I make client_address visible to all functions?
If you want, for example, to pass client_address as the first argument to every function, you could subclass SimpleXMLRPCRequestHandler (pass your subclass as the handler when you instantiate SimpleXMLRPCServer) and override _dispatch (to prepend self.client_address to the params tuple and then delegate the rest to the server's _dispatch). If this approach is OK and you want to see code, just ask!
I'm not sure how you'd safely use anything but the function arguments to "make client_address visible": there's no client_address as a bare name, global or otherwise, there's just the self.client_address of each instance of the request handler class (and hacks such as copying it to a global variable feel really yucky indeed, and are unsafe under threading, etc.).
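A minimal Python 3 sketch of the subclassing approach follows (SimpleXMLRPCServer lives in xmlrpc.server there). It assumes that a _dispatch method defined on the request handler is the one the server calls for each request, and that the lookup of registered functions is done by the server's own _dispatch, which is why the override delegates to self.server. The names ClientAddressHandler, my_ip and make_server are hypothetical.

```python
from xmlrpc.server import SimpleXMLRPCRequestHandler, SimpleXMLRPCServer


class ClientAddressHandler(SimpleXMLRPCRequestHandler):
    """Passes the caller's (host, port) as the first argument of every call."""

    def _dispatch(self, method, params):
        # self.client_address belongs to this request only, so no shared
        # global state is needed to make it visible to the function.
        return self.server._dispatch(method, (self.client_address,) + tuple(params))


def my_ip(client_address):
    return "Your ip is: %s" % client_address[0]


def make_server(host="localhost", port=8000):
    server = SimpleXMLRPCServer(
        (host, port), requestHandler=ClientAddressHandler, logRequests=False
    )
    server.register_function(my_ip, "MyIp")
    return server
```

The client stays unchanged: it still calls MyIp() with no arguments, and the handler supplies the address on the server side. Start the server with make_server().serve_forever().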