A few days back I wanted to build a proxy that would allow me to securely and anonymously connect to websites and servers. At first it seemed like a pretty easy idea: I would create an HTTP proxy that uses SSL between the client and the proxy; the proxy would then create an SSL connection with whatever website/server the client requested and forward data between the client and the server. I spent about a day researching and writing code that would do just that. But then I realized that someone could compromise the proxy and use the session key the proxy held to decrypt and read the data being sent to and from the server.
After a little more research it seems that a SOCKS proxy is what I need. However, there is not much documentation on a Python version of a SOCKS proxy (mostly just how to connect to one). I was able to find the PySocks module and read the socks.py file. It looks great for creating a client, but I don't see how I could use it to make a proxy.
I was wondering if anyone had a simple example of a SOCKS5 proxy, or if someone could point me to some material that could help me begin learning and building one?
You create a Python server that listens on a port on IP 127.0.0.1. When a client connects to your server, it sends something like "www.facebook.com:80" -- no URL path and no HTTP scheme. If the connect fails, you send back a failure message, which may look something like "<number> Unable to connect to host.", where <number> is a specific code that signifies a failed connection attempt. On success, you send "200 Connection established". After that, data is relayed to and from the server as normal. You do not want to use an HTTP proxy because it handles only website traffic.
You may want to use an async framework for the proxy server, because it has to handle multiple connections at once.
I've read the O'Reilly ebook Using Asyncio in Python (2020) multiple times, and I re-read it every now and again to try to grasp handling multiple connections. I have also just started looking into Flask, because I want my proxy server to run alongside a webserver.
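Below is a minimal, untested sketch of the handshake described above, built on asyncio. Note this implements the ad-hoc newline-terminated "host:port" scheme from this answer, not the real SOCKS5 wire protocol; the port number (1080) and the "502" failure code are illustrative choices, not anything standardized.

import asyncio

async def pipe(reader, writer):
    # Copy bytes one way until the peer closes, then close our side.
    try:
        while not reader.at_eof():
            data = await reader.read(4096)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_client(client_reader, client_writer):
    # First (newline-terminated) line from the client is "host:port".
    request = (await client_reader.readline()).decode().strip()
    host, _, port = request.partition(':')
    try:
        remote_reader, remote_writer = await asyncio.open_connection(host, int(port))
    except (OSError, ValueError):
        # "502" is an arbitrary illustrative failure code.
        client_writer.write(b'502 Unable to connect to host.\n')
        client_writer.close()
        return
    client_writer.write(b'200 Connection established\n')
    await client_writer.drain()
    # Relay traffic in both directions until either side hangs up.
    await asyncio.gather(pipe(client_reader, remote_writer),
                         pipe(remote_reader, client_writer))

async def main():
    # 1080 is the conventional SOCKS port; any free port works.
    server = await asyncio.start_server(handle_client, '127.0.0.1', 1080)
    async with server:
        await server.serve_forever()

asyncio.run(main())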
I recommend using requesocks along with stem (this assumes Tor). stem is the official controller library provided by the Tor Project. Here's a simplified example based on a scraper that I wrote, which also uses fake_useragent so you look like a browser:
import requesocks
from fake_useragent import UserAgent
from stem import Signal
from stem.control import Controller

class Proxy(object):
    def __init__(self,
                 socks_port=9050,
                 tor_control_port=9051,
                 tor_connection_password='password'):
        self._socks_port = int(socks_port)
        self._tor_control_port = int(tor_control_port)
        self._tor_connection_password = tor_connection_password
        self._user_agent = UserAgent()
        self._session = None
        self._update_session()

    def _update_session(self):
        self._session = requesocks.session()
        # port 9050 is the default SOCKS port
        self._session.proxies = {
            'http': 'socks5://127.0.0.1:{}'.format(self._socks_port),
            'https': 'socks5://127.0.0.1:{}'.format(self._socks_port),
        }

    def _renew_tor_connection(self):
        # Ask Tor for a new identity (fresh circuit) via the control port.
        with Controller.from_port(port=self._tor_control_port) as controller:
            controller.authenticate(password=self._tor_connection_password)
            controller.signal(Signal.NEWNYM)

    def _sample_get_response(self, url):
        if not self._session:
            self._update_session()
        # generate a random user agent string for every request
        headers = {
            'User-Agent': self._user_agent.random,
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-us,en;q=0.5',
        }  # adjust as desired
        response = self._session.get(url, verify=False, headers=headers)
        return response
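A hypothetical usage example (this assumes Tor is already running and configured as described below; httpbin.org/ip simply echoes back the IP address it sees):

proxy = Proxy()
proxy._renew_tor_connection()  # ask Tor for a fresh circuit/identity
response = proxy._sample_get_response('https://httpbin.org/ip')
print(response.text)  # should print a Tor exit node's IP, not yours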
You must have the Tor service running before executing this script and you must also modify your torrc file to enable the control port (9051).
Tor puts the torrc file in /usr/local/etc/tor/torrc if you compiled Tor from source, and /etc/tor/torrc or /etc/torrc if you installed a pre-built package. If you installed Tor Browser, look for Browser/TorBrowser/Data/Tor/torrc inside your Tor Browser directory (on Mac OS X, you must right-click or command-click on the Tor Browser icon and select "Show Package Contents" before the Tor Browser directories become visible).
Once you've found your torrc file, you need to uncomment the corresponding lines:
ControlPort 9051
## If you enable the controlport, be sure to enable one of these
## authentication methods, to prevent attackers from accessing it.
HashedControlPassword 16:05834BCEDD478D1060F1D7E2CE98E9C13075E8D3061D702F63BCD674DE
Please note that the HashedControlPassword above is for the password "password". If you want to set a different password (recommended), replace the HashedControlPassword in the torrc file with the output of tor --hash-password "<new_password>", where <new_password> is the password that you want to set.
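For example, with a hypothetical new password:

tor --hash-password "my_new_password"

This prints a line starting with 16: -- paste that line into torrc as the new HashedControlPassword value.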
Once you've changed your torrc file, you will need to restart Tor for the changes to take effect (strictly speaking, sending Tor a HUP signal is enough; a full restart is not required). To restart it:
sudo service tor restart
I hope this helps and at least gets you started for what you were looking for.
Related
I'm trying to access Azure Event Hubs, but my network makes me use a proxy and allows connections only over HTTPS (port 443).
Based on https://learn.microsoft.com/en-us/python/api/azure-eventhub/azure.eventhub.aio.eventhubproducerclient?view=azure-python
I added the proxy configuration and the TransportType.AmqpOverWebsocket parameter, and my producer looks like this:
async def run():
    producer = EventHubProducerClient.from_connection_string(
        "Endpoint=sb://my_eh.servicebus.windows.net/;SharedAccessKeyName=eh-sender;SharedAccessKey=MFGf5MX6Mdummykey=",
        eventhub_name="my_eh",
        auth_timeout=180,
        http_proxy=HTTP_PROXY,
        transport_type=TransportType.AmqpOverWebsocket,
    )
and I get an error:
File "/usr/local/lib64/python3.9/site-packages/uamqp/authentication/cbs_auth_async.py", line 74, in create_authenticator_async
raise errors.AMQPConnectionError(
uamqp.errors.AMQPConnectionError: Unable to open authentication session on connection b'EHProducer-a1cc5f12-96a1-4c29-ae54-70aafacd3097'.
Please confirm target hostname exists: b'my_eh.servicebus.windows.net'
I don't know what might be the issue.
Might it be related to this one? https://github.com/Azure/azure-event-hubs-c/issues/50#issuecomment-501437753
You should be able to set up a proxy that the SDK uses to access Event Hubs. Here is a sample that shows how to set the HTTP_PROXY dictionary with the proxy information. Behind the scenes, when a proxy is passed in, the connection automatically goes over WebSockets.
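A minimal sketch of that configuration (the hostname, port, and credentials are placeholders; the dictionary keys follow the azure-eventhub documentation):

from azure.eventhub import TransportType
from azure.eventhub.aio import EventHubProducerClient

# Replace these with your proxy's actual address and credentials.
HTTP_PROXY = {
    'proxy_hostname': '127.0.0.1',   # proxy server address
    'proxy_port': 3128,              # proxy server port
    'username': 'proxy_user',        # optional
    'password': 'proxy_password',    # optional
}

producer = EventHubProducerClient.from_connection_string(
    "<connection_string>",
    eventhub_name="<eventhub_name>",
    http_proxy=HTTP_PROXY,
    transport_type=TransportType.AmqpOverWebsocket,  # proxies require WebSockets
)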
As @BrunoLucasAzure suggested, checking the ports on the proxy itself would be a good idea, because based on the error message it looks like the request made it past the proxy but can't resolve the endpoint.
I want to make an anonymous web request using Python 3 with the help of Tor, and I'm following this tutorial: https://computerscienceandfangs.blogspot.com/2018/04/setting-up-tor-for-windows-10-python-3.html.
So far I'm just testing the first part of the tutorial code (below):
import requests

def get_tor_session():
    session = requests.session()
    # Tor uses the 9050 port as the default socks port
    session.proxies = {'http': 'socks5://127.0.0.1:9050',
                       'https': 'socks5://127.0.0.1:9050'}
    return session

# Make a request through the Tor connection
# IP visible through Tor
session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)
# Above should print an IP different than your public IP

# Following prints your normal public IP
print(requests.get("http://httpbin.org/ip").text)
So when I execute print(session.get("http://httpbin.org/ip").text), it should show me an IP address different from mine. Instead I get the error:
File "C:\Program Files\Anaconda3\lib\site-packages\requests\adapters.py", line 43, in SOCKSProxyManager
try:
InvalidSchema: Missing dependencies for SOCKS support.
I've installed the packages below as per the tutorial:

1) pip install requests --upgrade
2) pip install requests[socks]
3) pip install stem

I'm using Windows 7 (64-bit), Spyder as my Python IDE, and Python 3.5.
Second question, which is more general: I'm looking to make requests on a bigger scale as part of a web-scraper project. Is the approach above (i.e. coding things manually in Python, as in the tutorial I referenced) still a good way to ensure you don't get banned or blacklisted? Or are there more advanced services out there that handle anonymous IP requests, IP rotation, and request throttling for you, without you having to write and configure your own software, and with an unlimited number of requests?
Many thanks in advance.
Are you running a Tor service from the CLI?
Your proxy settings should look like this (note the socks5h scheme, which makes DNS resolution happen through the proxy instead of locally):

session.proxies = {'http': 'socks5h://127.0.0.1:9050',
                   'https': 'socks5h://127.0.0.1:9050'}
Also, requests is not designed for making requests at the scale you describe. I would recommend the following setup instead, which uses aiohttp, aiohttp_socks, and asyncio:
import asyncio
import aiohttp
from aiohttp_socks import SocksConnector

async def get_one(url, callback):
    connector = SocksConnector.from_url('socks5://localhost:9050', rdns=True)
    # rdns=True is important!
    # 1) You can't connect to hidden services without it.
    # 2) Without it, DNS lookups go out via your real IP, not your Tor IP!
    async with aiohttp.ClientSession(connector=connector) as session:
        print(f'Starting {url}')
        async with session.get(url) as res:
            return await callback(res)

def get_all(urls, callback):
    futures = []
    for url in urls:
        task = asyncio.ensure_future(get_one(url, callback))
        futures.append(task)
    return futures

async def test_callback(res):
    print(res.status)

if __name__ == '__main__':
    urls = [
        'https://python.org',
        'https://google.com',
        # ...
    ]
    loop = asyncio.get_event_loop()
    futures = get_all(urls, test_callback)
    loop.run_until_complete(asyncio.wait(futures))
To resolve the error InvalidSchema: Missing dependencies for SOCKS support, I had to reinstall the Tor service on Windows by running the following on the command line:
tor --service remove
then
tor --service install -options ControlPort 9051
I'm using Tor, Privoxy, and Python to anonymously crawl sources on the web. Tor is configured with ControlPort 9051, while Privoxy is configured with forward-socks5 / localhost:9050 .
My scripts are working flawlessly, except when I request an API resource that I have running on port 8000 on the same machine. If I hit the API via a urllib2 opener set up with the proxy, I get an empty string response. If I hit the API using a new, non-proxy instance of urllib2, I get an HTTP Error 503: Forwarding failure.
I'm sure that if I open port 8000 to the world I'll be able to access it through the proxy. However, there must be a better way to access the resource on localhost. I'm curious how people deal with this.
I was able to switch off the proxy and hit the internal API by using the following opener:
import ssl
import urllib2

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
opener = urllib2.build_opener(urllib2.HTTPSHandler(context=ctx))
urllib2.install_opener(opener)
I'm not sure if there is a better way, but it worked.
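Another option, sketched here under the assumption that your proxied requests go through the default opener: build a second opener with an empty ProxyHandler, which urllib2 treats as "no proxies at all", and use it only for localhost calls (the URL path is a placeholder):

import urllib2

# An empty ProxyHandler dict disables proxies for this opener only,
# so requests made through it go straight to the local API on port 8000
# while everything else keeps going through Privoxy/Tor.
direct_opener = urllib2.build_opener(urllib2.ProxyHandler({}))
response = direct_opener.open('http://127.0.0.1:8000/')
print(response.read())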
I have a question about Python mechanize's proxy support. I'm writing a web client script, and I would like to add proxy support to it.
For example, if I have:
import urllib
import mechanize

params = urllib.urlencode({'id': id, 'passwd': pw})
rq = mechanize.Request('http://www.example.com', params)
rs = mechanize.urlopen(rq)
How can I add proxy support to my mechanize script? Whenever I open the www.example.com website, I would like the request to go through the proxy.
I'm not sure whether this helps or not, but you can set proxy settings on a mechanize Browser object:
from mechanize import Browser

br = Browser()
# Explicitly configure proxies (Browser will attempt to set good defaults).
# Note the userinfo ("joe:password@") and port number (":3128") are optional.
br.set_proxies({"http": "joe:password@myproxy.example.com:3128",
                "ftp": "proxy.example.com",
                })
# Add HTTP Basic/Digest auth username and password for HTTP proxy access.
# (equivalent to using the "joe:password@..." form above)
br.add_proxy_password("joe", "password")
You use mechanize.Request.set_proxy(host, type) (at least as of 0.1.11)
Assuming an HTTP proxy running at localhost:8888:

import mechanize

req = mechanize.Request("http://www.google.com")
req.set_proxy("localhost:8888", "http")
mechanize.urlopen(req)

This should work.
I am using Python's urllib2 with Tor as a proxy to access a website. When I open the site's main page it works fine, but when I try to view the login page (not actually log in, just view it) I get the following error:

URLError: <urlopen error (10060, 'Operation timed out')>

To counteract this I did the following:

import socket
socket.setdefaulttimeout(None)

I still get the same timeout error.

Does this mean the website is timing out on the server side? (I don't know much about HTTP processes, so sorry if this is a dumb question.) Is there any way I can correct it so that Python is able to view the page?
Thanks,
Rob
According to the Python socket documentation, the default is no timeout, so specifying a value of None is redundant.
There are a number of possible reasons that your connection is dropping. One could be that your user agent is "Python-urllib", which may very well be blocked. To change your user agent:

import urllib2

request = urllib2.Request('http://site.com/login')
request.add_header('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; it-IT; rv:1.9.0.2) Gecko/2008092313 Ubuntu/9.04 (jaunty) Firefox/3.5')
response = urllib2.urlopen(request)
You may also want to try overriding the proxy settings before you open the URL, using something along the lines of:
proxy = urllib2.ProxyHandler({"http":"http://127.0.0.1:8118"})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
I don't know enough about Tor to be sure, but the timeout may not happen on the server side, but on one of the Tor nodes somewhere between you and the server. In that case there is nothing you can do other than to retry the connection.
urllib2.urlopen(url[, data][, timeout])
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS, FTP and FTPS connections.
http://docs.python.org/library/urllib2.html
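For example, a minimal sketch using the timeout parameter described above, so the call fails fast instead of hanging on the global default (the URL is a placeholder):

import urllib2

# Give up after 30 seconds instead of relying on the global default timeout.
response = urllib2.urlopen('http://www.example.com/login', timeout=30)
print(response.read())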