I'm using Tor, Privoxy, and Python to anonymously crawl sources on the web. Tor is configured with ControlPort 9051, while Privoxy is configured with forward-socks5 / localhost:9050 .
My scripts work flawlessly, except when I request an API resource that I have running on port 8000 on the same machine. If I hit the API via urllib2 set up with the proxy, I get an empty string response. If I hit the API using a new, non-proxy instance of urllib2, I get an HTTP Error 503: Forwarding failure.
I'm sure that if I open port 8000 to the world I'll be able to access it through the proxy. However, there must be a better way to access the resource on localhost. I'm curious how people deal with this.
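I was able to switch off the proxy and hit the internal API by installing the following opener:
import ssl
import urllib2
# Skip certificate and hostname verification for the internal HTTPS endpoint
# (unsafe for remote hosts).
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
opener = urllib2.build_opener(urllib2.HTTPSHandler(context=ctx))
urllib2.install_opener(opener)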
I'm not sure if there is a better way, but it worked.
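Another option for this setup is to keep two openers: one that goes through Privoxy for crawling and one that bypasses every proxy for the local API. The following is only a minimal sketch in Python 3 (where urllib.request replaces urllib2). The Privoxy address 127.0.0.1:8118 is an assumption based on Privoxy's default listen address, the function names are made up for illustration, and the API URL in the comment is hypothetical.

```python
import urllib.request

# Assumption: Privoxy listens on its default address; adjust to your config.
PRIVOXY_URL = "http://127.0.0.1:8118"


def build_proxied_opener(proxy_url=PRIVOXY_URL):
    """Opener that sends HTTP and HTTPS requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def build_direct_opener():
    """Opener that ignores all proxy settings, including environment variables."""
    # An empty mapping tells ProxyHandler not to use any proxy.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


crawler = build_proxied_opener()   # use for external sites
local_api = build_direct_opener()  # use for the API on this machine

# Example (hypothetical URL, requires the local API to be running):
# body = local_api.open("http://127.0.0.1:8000/", timeout=10).read()
```

Keeping the two openers separate, rather than calling install_opener, avoids changing the behaviour of every other urlopen call in the process.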
I'm trying to scrape my own site from my local server. When I use Python requests on it, I get a 503 response. Other ordinary sites on the web work. Is there a reason or solution for this?
import requests
url = 'http://127.0.0.1:8080/full_report/a1uE0000002vu2jIAA/'
r = requests.get(url)
print r
prints out
<Response [503]>
After further investigation, I've found a similar problem to mine.
Python requests 503 erros when trying to access localhost:8000
However, I don't think he's solved it yet. I can access the local website via the web browser but can't access using the requests.get function. I'm also using Django to host the server.
python manage.py runserver 8080
When I use:
curl -vvv http://127.0.0.1:8080
* Rebuilt URL to: http://127.0.0.1:8080/
* Trying 10.37.135.39...
* Connected to proxy.kdc.[company-name].com (10.37.135.39) port 8099 (#0)
* Proxy auth using Basic with user '[company-id]'
> GET http://127.0.0.1:8080/ HTTP/1.1
> Host: 127.0.0.1:8080
> Proxy-Authorization: Basic Y2FhNTc2OnJ2YTkxQ29kZQ==
> User-Agent: curl/7.49.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: BlueCoat-Security-Appliance
< Location:http://10.118.216.201
< Connection: Close
<
<HTML>
<HEAD><TITLE>Redirection</TITLE></HEAD>
<BODY><H1>Redirect</H1></BODY>
* Closing connection 0
I cannot request a local url using python requests because the company's network software won't allow it. This is a dead end and other avenues must be pursued.
EDIT: Working Solution
>>> import requests
>>> session = requests.Session()
>>> session.trust_env = False
>>> r = session.get("http://127.0.0.1:8080")
>>> r
<Response [200]>
Maybe you should disable your proxies in your requests.
import requests
proxies = {
"http": None,
"https": None,
}
requests.get("http://127.0.0.1:8080/myfunction", proxies=proxies)
ref:
https://stackoverflow.com/a/35470245/8011839
https://2.python-requests.org//en/master/user/advanced/#proxies
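To check whether a proxy is being picked up from the environment in the first place, a small diagnostic can help. This is a sketch using only the standard library; the function name describe_proxy_routing and the proxy values are hypothetical, and requests may apply slightly different bypass rules than urllib, so treat the result as a hint rather than a definitive answer.

```python
import os
import urllib.request


def describe_proxy_routing(host):
    """Report the proxies found in the environment and whether `host` skips them."""
    proxies = urllib.request.getproxies_environment()
    bypassed = bool(urllib.request.proxy_bypass_environment(host))
    return {"proxies": proxies, "bypassed": bypassed}


# Hypothetical values for illustration; a real shell would already have these set.
os.environ["http_proxy"] = "http://proxy.example.com:8099"
os.environ["no_proxy"] = "127.0.0.1,localhost"

print(describe_proxy_routing("127.0.0.1:8080"))
print(describe_proxy_routing("example.org"))
```

If the local address is not reported as bypassed, adding it to no_proxy (or disabling proxies as shown above) is the likely fix.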
HTTP Error 503 means:
The Web server (running the Web site) is currently unable to handle the HTTP request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. Some servers in this state may also simply refuse the socket connection, in which case a different error may be generated because the socket creation timed out.
You can check the following:
Check that you can open the URL in the browser.
If the URL opens, check the domain in your code; it might be incorrect.
If it does not open in the browser either, your site may be overloaded, or the server may lack the resources to handle the request.
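A quick way to separate "the server is down" from "something in between is interfering" is to open a plain TCP connection to the port, which involves no HTTP proxy at all. This is a minimal sketch; is_port_open is a made-up helper name, and the host and port in the comment are taken from the question.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example: is_port_open("127.0.0.1", 8080)
```

If this returns True while the HTTP request still yields a 503, the response is probably coming from an intermediary rather than from the application itself.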
The most common cause of a 503 error is that a proxy host of some form is unable to communicate with the back end. For example, if you have Varnish trying to handle a request but Apache is down.
In your case, you have Django running on port 8080. (That's what the 8080 means). When you try to get content from 127.0.0.1, though, you're going to the default HTTP port (80). This means that your default server (Apache maybe? NginX?) is trying to find a host to serve 127.0.0.1 and can't find one.
You have two choices. Either you can update your server's configuration, or you can include the port in the URL.
url = 'http://127.0.0.1:8080/full_report/a1uE0000002vu2jIAA/'
A few days ago I wanted to build a proxy that would let me connect to websites and servers securely and anonymously. At first it seemed like a pretty easy idea: I would create an HTTP proxy that uses SSL between the client and the proxy. The proxy would then create an SSL connection with whatever website or server the client requested and forward data between the client and the server. I spent about a day researching and writing code that would do just that. But I then realized that someone could compromise the proxy and use the proxy's session key to decrypt and read the data being sent to and from the server.
After a little more research, it seems that a SOCKS proxy is what I need. However, there is not much documentation on writing a SOCKS proxy in Python (mostly just on how to connect to one). I was able to find the PySocks module and read the socks.py file. It looks great for creating a client, but I don't see how I could use it to make a proxy.
Does anyone have a simple example of a SOCKS5 proxy, or can someone point me to some material that could help me begin learning and building one?
Create a Python server that listens on a port on IP 127.0.0.1. When a client connects to your server, it sends the target as "www.facebook.com:80", with no URL path and no http scheme. If the connection to the target fails, the server sends a failure message, which may look something like "number Unable to connect to host.", where number is a specific code that signifies a failed connection attempt. If the connection succeeds, the server sends "200 Connection established". After that, data is sent and received as normal. You do not want to use an HTTP proxy because it accepts only website traffic.
You may want to use a framework for the proxy server because it should handle multiple connections.
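The exchange described above can be sketched with asyncio, which handles multiple connections in one process. This is not a SOCKS5 implementation; it is only the plain-text "host:port" handshake from this answer. The names parse_target, pipe, handle_client, and main, the listening port 1080, and the failure code 502 are all assumptions made for illustration.

```python
import asyncio

FAILURE_CODE = 502  # hypothetical; the description only calls it "number"


def parse_target(line):
    """Split 'host:port' (no scheme, no path) into (host, port)."""
    host, sep, port = line.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % line)
    return host, int(port)


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except OSError:
        pass  # the peer went away; fall through and close our side
    finally:
        writer.close()


async def handle_client(client_reader, client_writer):
    """Read the target line, connect to it, then relay data both ways."""
    try:
        line = (await client_reader.readline()).decode("ascii", "replace")
        host, port = parse_target(line)
        remote_reader, remote_writer = await asyncio.open_connection(host, port)
    except (ValueError, OSError):
        message = "%d Unable to connect to host.\r\n" % FAILURE_CODE
        client_writer.write(message.encode("ascii"))
        await client_writer.drain()
        client_writer.close()
        return
    client_writer.write(b"200 Connection established\r\n")
    await client_writer.drain()
    await asyncio.gather(
        pipe(client_reader, remote_writer),
        pipe(remote_reader, client_writer),
    )


async def main(listen_port=1080):
    """Serve on 127.0.0.1 only, one handle_client task per connection."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", listen_port)
    async with server:
        await server.serve_forever()


# To run the server: asyncio.run(main())
```

Note that this sketch adds no encryption or authentication of its own, so it only makes sense bound to 127.0.0.1 as described.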
I've read the O'Reilly ebook "Using Asyncio in Python" (2020) multiple times and reread it every now and again to try to grasp how to handle multiple connections. I have also just started to search for solutions using Flask because I want my proxy server to run alongside a web server.
I recommend using requesocks along with stem (assumes Tor). The official stem library is provided by Tor. Here's a simplified example based on a scraper that I wrote which also uses fake_useragent so you look like a browser:
import requesocks
from fake_useragent import UserAgent
from stem import Signal
from stem.control import Controller


class Proxy(object):
    def __init__(self,
                 socks_port=9050,
                 tor_control_port=9051,
                 tor_connection_password='password'):
        self._socks_port = int(socks_port)
        self._tor_control_port = int(tor_control_port)
        self._tor_connection_password = tor_connection_password
        self._user_agent = UserAgent()
        self._session = None
        self._update_session()

    def _update_session(self):
        self._session = requesocks.session()
        # port 9050 is the default SOCKS port
        self._session.proxies = {
            'http': 'socks5://127.0.0.1:{}'.format(self._socks_port),
            'https': 'socks5://127.0.0.1:{}'.format(self._socks_port),
        }

    def _renew_tor_connection(self):
        # ask Tor for a new circuit through the control port
        with Controller.from_port(port=self._tor_control_port) as controller:
            controller.authenticate(password=self._tor_connection_password)
            controller.signal(Signal.NEWNYM)

    def _sample_get_response(self, url):
        if not self._session:
            self._update_session()
        # generate random user agent string for every request
        headers = {
            'User-Agent': self._user_agent.random,
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-us,en;q=0.5',
        }  # adjust as desired
        response = self._session.get(url, verify=False, headers=headers)
        return response
You must have the Tor service running before executing this script and you must also modify your torrc file to enable the control port (9051).
Tor puts the torrc file in /usr/local/etc/tor/torrc if you compiled Tor from source, and /etc/tor/torrc or /etc/torrc if you installed a pre-built package. If you installed Tor Browser, look for Browser/TorBrowser/Data/Tor/torrc inside your Tor Browser directory (on Mac OS X, you must right-click or command-click on the Tor Browser icon and select "Show Package Contents" before the Tor Browser directories become visible).
Once you've found your torrc file, you need to uncomment the corresponding lines:
ControlPort 9051
## If you enable the controlport, be sure to enable one of these
## authentication methods, to prevent attackers from accessing it.
HashedControlPassword 16:05834BCEDD478D1060F1D7E2CE98E9C13075E8D3061D702F63BCD674DE
Please note that the HashedControlPassword above is for the password "password". If you want to set a different password (recommended), replace the HashedControlPassword in the torrc file by noting the output from tor --hash-password "<new_password>" where <new_password> is the password that you want to set.
Once you've changed your torrc file, you will need to restart tor for the changes to take effect (note that you actually only need to send Tor a HUP signal, not actually restart it). To restart it:
sudo service tor restart
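Once the control port is enabled, later reloads can also be requested over the control port itself, since the control protocol's RELOAD signal is the counterpart of sending HUP. The following is a minimal sketch that speaks the control protocol over a plain socket instead of using stem; the helper names build_control_commands and send_tor_signal are made up for illustration, and it assumes password authentication on 127.0.0.1:9051 as configured above.

```python
import socket


def build_control_commands(password, signal_name):
    """Return the control-port lines that authenticate, send a signal, and quit."""
    # Escape backslashes and double quotes so the password stays a valid quoted string.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [
        'AUTHENTICATE "{}"\r\n'.format(escaped),
        "SIGNAL {}\r\n".format(signal_name),
        "QUIT\r\n",
    ]


def send_tor_signal(signal_name, password, host="127.0.0.1", port=9051):
    """Send one signal (for example RELOAD or NEWNYM) to a running Tor."""
    with socket.create_connection((host, port), timeout=10) as conn:
        reader = conn.makefile("rb")
        for command in build_control_commands(password, signal_name):
            conn.sendall(command.encode("utf-8"))
            reply = reader.readline().decode("utf-8", "replace").strip()
            # Tor answers successful commands with a line starting with 250.
            if not reply.startswith("250"):
                raise RuntimeError("Tor control port replied: " + reply)


# Examples (require a running Tor with the control port enabled):
# send_tor_signal("RELOAD", "password")  # re-read torrc
# send_tor_signal("NEWNYM", "password")  # request a new circuit
```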
I hope this helps and at least gets you started for what you were looking for.
I have my web app API running.
If I go to http://127.0.0.1:5000/ via any browser I get the right response.
If I use the Advanced REST Client Chrome app and send a GET request to my app at that address I get the right response.
However this gives me a 503:
import requests
response = requests.get('http://127.0.0.1:5000/')
I read to try this for some reason:
s = requests.Session()
response = s.get('http://127.0.0.1:5000/')
But I still get a 503 response.
Other things I've tried: Not prefixing with http://, not using a port in the URL, running on a different port, trying a different API call like Post, etc.
Thanks.
Is http://127.0.0.1:5000/ your localhost? If so, try 'http://localhost:5000' instead
Just in case someone is struggling with this as well, what finally worked was running the application on my local network ip.
I.e., I just opened up the web app and changed the app.run(debug=True) line to app.run(host="my.ip.address", debug = True).
I'm guessing the requests library perhaps was trying to protect me from a localhost attack? Or our corporate proxy or firewall was preventing communication from unknown apps to the 127 address. I had set NO_PROXY to include the 127.0.0.1 address, so I don't think that was the problem. In the end I'm not really sure why it is working now, but I'm glad that it is.
I am using gh-issues-import to migrate issues between GitHub and a GitHub Enterprise server. The problem is that our GHE requires going through a VPN proxy, while GitHub's API requires an HTTPS route. I can only get one or the other, and I'm having a hell of a time finding a way to access both from the same Python project using urllib.request. Here is a scaled-down script that uses the library call that is failing in gh-issues-import...
import urllib.request
# works through VPN (notice able to use http), requires VPN
GitHubEnterpriseurl = "http://xxxxx/api/v3/"
req = urllib.request.Request(GitHubEnterpriseurl)
response = urllib.request.urlopen(req)
json_data = response.read()
print(json_data)
# does not work on VPN due to https path, but fine outside of VPN
req = urllib.request.Request("https://api.github.com")
response = urllib.request.urlopen(req)
json_data = response.read()
print(json_data)
I have tried other HTTP libraries, and it comes down to the VPN blocking access to https://api.github.com. What are some solutions for this? Can I create a script on another server that my VPN has access to, and simply clone the requests and route the data?
* I am able to connect to https://api.github.com over the VPN through a browser (Chrome / Firefox), but any command-line tool or this script fails to access it.
I have a question about Python mechanize's proxy support. I'm writing a web client script, and I would like to add proxy support to it.
For example, if I have:
params = urllib.urlencode({'id':id, 'passwd':pw})
rq = mechanize.Request('http://www.example.com', params)
rs = mechanize.urlopen(rq)
How can I add proxy support to my mechanize script?
Whenever I open this www.example.com website, I would like the request to go through the proxy.
I'm not sure whether this helps, but you can configure proxy settings on a mechanize Browser.
from mechanize import Browser
br = Browser()
# Explicitly configure proxies (Browser will attempt to set good defaults).
# Note the userinfo ("joe:password@") and port number (":3128") are optional.
br.set_proxies({"http": "joe:password@myproxy.example.com:3128",
                "ftp": "proxy.example.com",
                })
# Add HTTP Basic/Digest auth username and password for HTTP proxy access.
# (equivalent to using "joe:password@..." form above)
br.add_proxy_password("joe", "password")
You can use mechanize.Request.set_proxy(host, type) (at least as of 0.1.11).
Assuming an HTTP proxy running at localhost:8888:
req = mechanize.Request("http://www.google.com")
req.set_proxy("localhost:8888","http")
mechanize.urlopen(req)
Should work.
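For comparison, the standard library's Request object has a set_proxy method with the same (host, type) signature, so the snippet from the question could be written without mechanize. This is a sketch in Python 3 using urllib.request rather than mechanize; the form values and the proxy at localhost:8888 are placeholders.

```python
import urllib.parse
import urllib.request

# Placeholder credentials standing in for the id and pw variables in the question.
params = urllib.parse.urlencode({"id": "my_id", "passwd": "my_password"}).encode("ascii")

# Passing data makes this a POST request, as in the mechanize example.
req = urllib.request.Request("http://www.example.com", data=params)

# Route this single request through the proxy.
req.set_proxy("localhost:8888", "http")

# Sending it requires the proxy to be running:
# response = urllib.request.urlopen(req, timeout=10)
```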