I am writing a crawler in Python that will run through Tor. I have Tor working and used code from this YouTube tutorial on how to route my Python requests to go through the Tor SOCKS proxy at 127.0.0.1:9050.
What I can't figure out is how to toggle this on/off within my script. Some requests I want to go through Tor and some I don't. Basically, I can't figure out the correct "close" or "shutdown" method in the socket objects I am using because I don't understand them.
Here's what happens now:
import socket
import socks
import requests
def connect_to_socks():
    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9050, True)
    socket.socket = socks.socksocket
r = requests.get('http://wtfismyip.com/text')
print r.text #prints my ordinary IP address
connect_to_socks()
r = requests.get('http://wtfismyip.com/text')
print r.text #prints my Tor IP address
How do I turn off the socket routing to the SOCKS proxy so that it goes through my ordinary internet connection?
I'm hoping to use requests instead of urllib2, as it seems a lot easier. If I have to get into the guts of urllib2 or even httplib I will, but I would prefer not to.
Figured it out by listening to this good YouTube tutorial.
I just need to call socks.setdefaultproxy() with no arguments (note that it lives in the socks module, not socket), and it brings me back to my ordinary connection.
In Python 3, you can go back to the default (non-proxied) socket behaviour like this:
socks.setdefaultproxy(None)
socket.socket = socks.socksocket
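If you need to flip between proxied and direct requests repeatedly, one option is to wrap the monkey patch in a context manager so the original socket class is always put back. This is only a sketch: the name patched_socket is made up for illustration, and it assumes you have already configured the default proxy (for example with socks.setdefaultproxy(...)) before passing socks.socksocket in.

```python
import socket
from contextlib import contextmanager


@contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket, restoring the original on exit."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        # Always restore, even if the request inside the block raises.
        socket.socket = original
```

With PySocks installed, usage would look like "with patched_socket(socks.socksocket): r = requests.get(url)"; requests made outside the with block go over the ordinary connection again. The patch is global to the process, so this is probably not safe if other threads open connections at the same time.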
Consider this working code:
import win_inet_pton
import socks
import socket
s = socks.socksocket()
s.set_proxy(socks.SOCKS5, "localhost", 9050)
s.connect(("xmh57jrzrnw6insl.onion",80))
s.send("GET / HTTP/1.1\r\n\r\n")
print 'sent'
data=s.recv(1024)
print data
The Tor service is indeed running on port 9050.
Under normal conditions, Python would perform DNS resolution through the SOCKS5 proxy, which is connected to the Tor relay. However, Tor does not handle UDP packets (it resolves the hostname carried in the TCP packet itself), so I would expect DNS resolution to fail.
How is it possible that this code works? (The equivalent code in Java, for example, fails because the DNS resolution can't be made.)
It is explained in this link: Python requests fails when trying to connect to .onion site
You simply have to use socks5h instead of socks5.
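For the requests library, the socks5h scheme goes into the proxies mapping. Below is a minimal sketch, assuming requests is installed with SOCKS support (pip install requests[socks]) and Tor listens on 127.0.0.1:9050; the helper names tor_proxies and fetch_via_tor are invented for this example.

```python
def tor_proxies(host="127.0.0.1", port=9050, remote_dns=True):
    """Build a requests-style proxies mapping for a SOCKS5 proxy."""
    # socks5h asks the proxy to resolve hostnames; socks5 resolves them locally.
    scheme = "socks5h" if remote_dns else "socks5"
    proxy_url = "{}://{}:{}".format(scheme, host, port)
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_tor(url):
    """Fetch url through the proxy and return the response body as text."""
    import requests  # imported lazily so tor_proxies() has no dependencies
    return requests.get(url, proxies=tor_proxies(), timeout=30).text
```

Because the proxy is passed per request, calls made without the proxies argument still use the normal connection, so nothing has to be monkey patched.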
I'm writing proxy software. It already supports all the standard features, but now I've hit the hardest problem: using a SOCKS proxy per domain/URL (something like "if '???' in self.host: use SOCKS") without breaking the whole script through monkey patching. I have to use the monkey-patching method because it is the only one that has worked without errors so far. But with that method my proxy uses the SOCKS server to download every page, not only the pages I want to send through it, because monkey patching "socket" changes the whole socket library to use the SOCKS proxy, and httplib and urllib are built on the socket library.
https://github.com/Anorov/PySocks
import urllib2
import socket
import socks
socks.set_default_proxy(socks.SOCKS5, "localhost")
socket.socket = socks.socksocket
urllib2.urlopen("http://www.somesite.com/") # All requests will pass through the SOCKS proxy
I must use the monkey patch anyway: the opener.open method from that page breaks a lot of pages (endless 30x redirects, TLSv1 errors, ...). The monkey patch, by contrast, just works with no bugs, but then the whole local proxy uses the SOCKS proxy, which is overkill. I want to use the SOCKS proxy per page.
After days of non-stop thinking, I came up with an idea: "If I create a fresh process using multiprocessing and monkey patch socket in that process only, I can urlopen the domain/URL as above without affecting my local proxy's main process, return the content to the main process, and display it in my web browser."
My method is roughly: "I have a proxy listening at 127.0.0.1:1111 and create another proxy listening at 127.0.0.1:2222. The proxy on port 2222 has its socket library monkey patched so that it downloads pages through my SOCKS proxy, and every time I want to use the SOCKS proxy I chain my 1111 proxy to the 2222 proxy."
Plus, if that monkey-patching method works, we could probably also do bandwidth throttling and more by monkey patching socket, without breaking the main process.
My idea may be the birth of the greatest monkey patch ever. Please help me; I would really appreciate it if you could help me write some demo code.
And here is my answer to my own question. Thanks, everybody:
import multiprocessing as mp
import urllib.request
def foo(q):
    #q.put("123")
    import socks
    import socket
    socks.set_default_proxy(socks.SOCKS5, "localhost", 10080)
    socket.socket = socks.socksocket
    r = urllib.request.urlopen("http://httpbin.org/ip")
    #print("123")
    q.put(r.read())
if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()
    r = urllib.request.urlopen("http://httpbin.org/ip")
    print(r.read())
This creates an isolated process and monkey patches the socket library only inside that process.
Output from IDLE:
b'{\n "origin": "xxx.148.2.18"\n}\n'
b'{\n "origin": "xx.xxx.113.133"\n}\n'
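To pick, per request, whether a page should be handed to the patched worker process or fetched directly, the "if '???' in self.host" check from the question could be replaced by a suffix match on the hostname. This is a rough sketch; should_use_socks and the example domain list are hypothetical, not part of any library.

```python
def should_use_socks(host, proxied_domains):
    """Return True if host equals, or is a subdomain of, a proxied domain."""
    host = host.lower().rstrip(".")
    for domain in proxied_domains:
        domain = domain.lower().lstrip(".")
        # Match "example.com" itself and "api.example.com",
        # but not "notexample.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

A plain substring test would also match unrelated hosts that happen to contain the same text, which is why the comparison above is anchored on a dot.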
How can I make my script, which uses gspread, send its connections to Google's servers through a SOCKS proxy?
SocksiPy should work for this, as per the SO question: How can I use a SOCKS 4/5 proxy with urllib2?.
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 8080)
socket.socket = socks.socksocket
import gspread
# do whatever
If this is not the desired result, you may have to create a custom instance of the bundled HTTPSession object.
I'm using Tor to proxy connections but am having difficulty proxying DNS lookups via socket.gethostbyname("www.yahoo.com"); I learned that it was not sending DNS traffic via the proxy by sniffing traffic with Wireshark. Here's a copy of the code I'm using:
import StringIO
import socket
import socks # SocksiPy module
import stem.process
from stem.util import term
SOCKS_PORT = 7000
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', SOCKS_PORT)
socket.socket = socks.socksocket
def getaddrinfo(*args):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]
socket.getaddrinfo = getaddrinfo
socket.gethostbyname("www.yahoo.com")  # <--- This line is not sending traffic via proxy
Any help is greatly appreciated!
You're calling gethostbyname in the socket module. It doesn't know anything about your SOCKS socket; it is simply interacting with your operating system's name resolution mechanisms. Setting socket.socket = socks.socksocket may affect network connections made through the socket module, but the module does not make direct connections to DNS servers to perform name resolution so replacing socket.socket has no impact on this behavior.
If you simply call the connect(...) method on a socks.socksocket object using a hostname, the proxy will perform name resolution via SOCKS:
s = socks.socksocket()
s.connect(('www.yahoo.com', 80))
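To see why passing a hostname to connect() avoids a local lookup, it may help to look at what a SOCKS5 CONNECT request with a domain-name address looks like on the wire. The sketch below builds those bytes by hand following the layout in RFC 1928; build_socks5_connect is a made-up helper for illustration and is not how the socks module is meant to be used.

```python
import struct


def build_socks5_connect(hostname, port):
    """Build a SOCKS5 CONNECT request that carries the hostname itself."""
    name = hostname.encode("idna")
    if len(name) > 255:
        raise ValueError("hostname too long for a SOCKS5 request")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), name length
    header = struct.pack("!BBBBB", 5, 1, 0, 3, len(name))
    # The destination port follows the name, in network byte order.
    return header + name + struct.pack("!H", port)
```

The hostname travels inside the TCP stream to the proxy, so the local resolver is never asked about it.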
If you actually want to perform raw DNS queries over your SOCKS connection, you'll need to find a Python DNS module to which you can provide your socksocket object.
If you resolve DNS yourself while using SOCKS5, you may leak information about your own computer. Instead, try tunneling through Proxifier and then on to Tor. Alternatively, you can use SocksiPy's SOCKS4A extension. This will make sure information is not leaked.
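For reference, SOCKS4A avoids the local lookup by sending a deliberately invalid IP address (0.0.0.x with x nonzero) followed by the hostname, which the proxy then resolves. A minimal sketch of that request layout, with build_socks4a_connect being a hypothetical helper written only to show the byte format:

```python
import struct


def build_socks4a_connect(hostname, port, user_id=b""):
    """Build a SOCKS4A CONNECT request that leaves resolution to the proxy."""
    # VER=4, CMD=1 (CONNECT), destination port in network byte order
    header = struct.pack("!BBH", 4, 1, port)
    # 0.0.0.1 tells a SOCKS4A proxy that a hostname follows the user id.
    placeholder_ip = b"\x00\x00\x00\x01"
    # Both the user id and the hostname are NUL-terminated.
    return (header + placeholder_ip + user_id + b"\x00"
            + hostname.encode("ascii") + b"\x00")
```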
I am trying to figure out how to send data to a server through a proxy. I was hoping this would be possible through Tor, but since Tor uses SOCKS it apparently isn't possible with httplib (correct me if I am wrong).
This is what I have right now:
import httplib
con = httplib.HTTPConnection("google.com")
con.set_tunnel(proxy, port)
con.send("Sent Stuff")
The problem is, it seems to freeze when the tunnel is set. Thanks for your help.
If you want to use an HTTP proxy, it should look like this:
import httplib
conn = httplib.HTTPConnection(proxyHost, proxyPort)
conn.request("POST", "http://www.google.com", params)
If you want to use a SOCKS proxy, you can use SocksiPy as in this question: How can I use a SOCKS 4/5 proxy with urllib2?
Looks like the correct answer is:
http://bugs.python.org/issue11448#msg130413
import httplib
con = httplib.HTTPConnection(proxyHost, proxyPort)
con.set_tunnel("www.google.com", 80)
con.send("Sent Stuff")
As a follow-up to Khue Vu's answer, here's a complete example. The details of getting this working with a SOCKS proxy were more complex than expected.
First install PySocks with:
pip install PySocks
Then you need to manually set up your SOCKS proxy after instantiating your HTTPConnection and informing it that it's going to be using a proxy:
from http.client import HTTPConnection
from urllib.parse import urlparse, urlencode
import socks
url = urlparse("http://final.destination.example.com:8888/")
post_data = urlencode({"key": "value"}) # placeholder form data, replace with your own
conn = HTTPConnection('127.0.0.1', 9000) # use socks proxy address
conn.set_tunnel(url.hostname, url.port) # remote host and port that you actually want to talk to
conn.sock = socks.socksocket() # manually set socket
conn.sock.set_proxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9000) # use socks proxy address
conn.sock.connect((url.hostname, url.port)) # url.hostname, not url.netloc, which would include ":8888"
request_path = "%s?%s" % (url.path, url.query)
conn.request("POST", request_path, post_data)
Note that the imports above are for Python 3.x.
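One detail to watch when splitting a URL for a proxied connection: urlparse's netloc keeps the port ("host:8888"), while hostname does not. A small helper along these lines (split_target is a name invented here, and it assumes plain http or https URLs) keeps the pieces apart and only appends "?" when there is a query string:

```python
from urllib.parse import urlparse


def split_target(url):
    """Return (hostname, port, request_path) for an http:// or https:// URL."""
    parts = urlparse(url)
    # Fall back to the scheme's default port when the URL does not give one.
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path
```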