I am trying to use Python to get a JSON file from the Web. If I open the URL in my browser (Mozilla or Chromium) I do see the JSON. But when I do the following in Python:
response = urllib2.urlopen(url)
data = json.loads(response.read())
I get an error message that tells me the following (translated into English): Errno 10060, a connection attempt threw an error, since the server did not react after a certain time period, or the connection was faulty, or the host did not react.
ADDED
It looks like many people have faced the described problem. There are also some answers to similar (or the same) questions. For example, here we can see the following solution:
import requests
r = requests.get("http://www.google.com", proxies={"http": "http://61.233.25.166:80"})
print(r.text)
This is already a step forward for me (I think it is very likely that the proxy is the reason for the problem). However, I still have not got it working, since I do not know the URL of my proxy and I will probably need a user name and password. How can I find them? How is it that my browsers have them and I do not?
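One way to find out (a sketch, not from the original question): Python itself can read the system-wide proxy settings, which are usually the same ones the browsers pick up:
import urllib  # Python 2; on Python 3 use: from urllib.request import getproxies

# Prints something like {'http': 'http://proxy.example.com:8080'};
# the host and port in this comment are placeholders, not real values.
print(urllib.getproxies())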
ADDED 2
I think I am now one step further. I have used this site to find out what my proxy is: http://www.whatismyproxy.com/
Then I have used the following code:
proxies = {'http':'my_proxy.blabla.com/'}
r = requests.get(url, proxies = proxies)
print(r)
As a result I get
<Response [404]>
That does not look so good, but at least I think my proxy is correct, because when I randomly change the address of the proxy I get a different error:
Cannot connect to proxy
So, I can connect to the proxy, but something is not found.
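For what it's worth, here is a sketch of a more complete proxy configuration; requests accepts credentials embedded in the proxy URL, and the proxy URL should include a scheme and, usually, a port. The host, port, user name and password below are placeholders, not values from the question:
import requests

proxies = {
    'http': 'http://username:password@my_proxy.blabla.com:8080',
    'https': 'http://username:password@my_proxy.blabla.com:8080',
}
r = requests.get(url, proxies=proxies)
print(r.status_code)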
I think something might be going wrong when you're trying to get the JSON from the online source (URL). Just to make things clear, here is a small code snippet:
#!/usr/bin/env python
try:
    # For Python 3+
    from urllib.request import urlopen
except ImportError:
    # For Python 2
    from urllib2 import urlopen
import json

def get_jsonparsed_data(url):
    # Fetch the resource at `url` and parse the body as JSON.
    response = urlopen(url)
    data = response.read().decode('utf-8')  # decode bytes to str (str() would give "b'...'" on Python 3)
    return json.loads(data)
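For example (the URL below is a stand-in for any endpoint that returns JSON, not one from the question):
url = "http://example.com/data.json"  # placeholder endpoint
print(get_jsonparsed_data(url))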
If you still get a connection error, you can try a couple of steps:
Try to urlopen() a random site from the interpreter (interactive mode), as in the sketch after this list. If you are able to grab the source code, you're good; if not, check your internet connection or try the requests module.
Check whether the JSON at the URL is syntactically correct (paste it into a JSON validator, for example).
Try the simplejson module.
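A minimal version of that interpreter check (example.com is just a placeholder site):
from urllib.request import urlopen  # on Python 2: from urllib2 import urlopen

# If this prints some HTML, basic connectivity works and the problem
# is more likely proxy- or URL-specific.
print(urlopen('http://example.com').read()[:200])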
Edit 1:
If you want to access websites using a system-wide proxy, you will have to use a proxy handler that connects to that proxy via loopback (localhost). Sample code is shown below.
proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1',
    'https': '127.0.0.1'
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
# this way you can send both http and https request using proxies
urllib2.urlopen('http://www.google.com')
urllib2.urlopen('https://www.google.com')
I have not worked a lot with ProxyHandler; I just know the theory and the code. I am sure there are better ways to access websites through proxies, ones that do not involve installing the opener every time you run the program. But hopefully this will point you in the right direction.
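One such way (a sketch, reusing the placeholder proxy address from above): call the opener directly instead of installing it globally:
import urllib2

proxy = urllib2.ProxyHandler({'http': '127.0.0.1', 'https': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
# opener.open() uses the proxy for this call only; nothing is installed globally
response = opener.open('http://www.google.com')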
Related
I am trying to access a server over my internal network under https://prodserver.de/info.
I have the code structure as below:
import requests
from requests.auth import *
username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username,password))
print(resp.status_code)
While trying to access this server via browser, it works perfectly fine.
What am I doing wrong?
By default, the requests library verifies the SSL certificate for HTTPS requests. If the certificate cannot be verified, it will raise an SSLError. You can check whether this is the issue by disabling certificate verification, passing verify=False as an argument to the get method.
import requests
from requests.auth import *
username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username,password), verify=False)
print(resp.status_code)
Try using requests' generic auth, like this:
resp = requests.get('https://prodserver.de/info/', auth=(username, password))
What am I doing wrong?
I cannot be sure without investigating your server, but I suggest checking the assumption (which you have made) that the server uses Basic authorization. There exist various authentication schemes; it is also possible that your server uses a cookie-based solution rather than a header-based one.
While trying to access this server via browser, it works perfectly fine.
You might then use the browser's developer tools to see what is actually sent with the request that does result in success.
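If it does turn out to be cookie-based, here is a sketch of that flow; the /login path and the form field names are assumptions that would have to be read off from the developer tools:
import requests

s = requests.Session()  # a Session keeps cookies between requests
s.post('https://prodserver.de/login',  # hypothetical login endpoint
       data={'username': username, 'password': password})  # hypothetical field names
resp = s.get('https://prodserver.de/info/')
print(resp.status_code)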
I am trying to redirect one page to another by using mitmproxy and Python. I can run my inline script together with mitmproxy without issues, but I am stuck when it comes to changing the URL to another URL. For example, if I went to google.com it would redirect to stackoverflow.com.
def response(context, flow):
    print("DEBUG")
    if flow.request.url.startswith("http://google.com/"):
        print("It does contain it")
        flow.request.url = "http://stackoverflow/"
This should in theory work. I see http://google.com/ in the GUI of mitmproxy (as GET) but the print("It does contain it") never gets fired.
When I try to just put flow.request.url = "http://stackoverflow.com" right under the print("DEBUG"), it doesn't work either.
What am I doing wrong? I have also tried if "google.com" in flow.request.url to check if the URL contains google.com but that won't work either.
Thanks
The following mitmproxy script will:
redirect requests from mydomain.com to newsite.mydomain.com
change the request path (supposed to be something like /getjson?...) to a new one (/getxml)
change the destination scheme
change the destination server port
overwrite the request header Host to pretend to be the original host
import mitmproxy
from mitmproxy.models import HTTPResponse
from netlib.http import Headers

def request(flow):
    if flow.request.pretty_host.endswith("mydomain.com"):
        mitmproxy.ctx.log(flow.request.path)
        method = flow.request.path.split('/')[3].split('?')[0]
        flow.request.host = "newsite.mydomain.com"
        flow.request.port = 8181
        flow.request.scheme = 'http'
        if method == 'getjson':
            flow.request.path = flow.request.path.replace(method, "getxml")
        flow.request.headers["Host"] = "newsite.mydomain.com"
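To try it, save the script to a file and start the proxy with something like mitmproxy -s script.py (script.py being whatever you named the file).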
You can set the .url attribute, which will update the underlying attributes. Looking at your code, the problem is that you change the URL in the response hook, i.e. after the request has already been made. You need to change the URL in the request hook, so that the change is applied before resources are requested from the upstream server.
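A sketch of that fix, reusing the URLs from the question:
def request(context, flow):
    # runs before the request is forwarded upstream, so the rewrite takes effect
    if flow.request.url.startswith("http://google.com/"):
        flow.request.url = "http://stackoverflow.com/"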
Setting the url attribute will not help you, as it is merely constructed from underlying data. [EDIT: I was wrong, see Maximilian’s answer. The rest of my answer should still work, though.]
Depending on what exactly you want to accomplish, there are two options.
(1) You can send an actual HTTP redirection response to the client. Assuming that the client understands HTTP redirections, it will submit a new request to the URL you give it.
from mitmproxy.models import HTTPResponse
from netlib.http import Headers
def request(context, flow):
    if flow.request.host == 'google.com':
        flow.reply(HTTPResponse('HTTP/1.1', 302, 'Found',
                                Headers(Location='http://stackoverflow.com/',
                                        Content_Length='0'),
                                b''))
(2) You can silently route the same request to a different host. The client will not see this, it will assume that it’s still talking to google.com.
def request(context, flow):
    if flow.request.url == 'http://google.com/accounts/':
        flow.request.host = 'stackoverflow.com'
        flow.request.path = '/users/'
These snippets were adapted from an example found in mitmproxy’s own GitHub repo. There are many more examples there.
For some reason, I can’t seem to make these snippets work for Firefox when used with TLS (https://), but maybe you don’t need that.
So I am making a bunch of requests to website X every second, as of now with the standard urllib packages, like so (the request returns JSON):
import urllib.request
import threading, time
import json  # needed for json.loads below

def makerequests():
    request = urllib.request.Request('http://www.X.com/Y')
    while True:
        time.sleep(0.2)
        response = urllib.request.urlopen(request)
        data = json.loads(response.read().decode('utf-8'))

for i in range(4):
    t = threading.Thread(target=makerequests)
    t.start()
However, because I'm making so many requests, after about 500 requests the website returns HTTPError 429: Too Many Requests. I was thinking it might help if I reused the initial TCP connection, but I noticed it is not possible to do this with the urllib packages.
So I did some googling and discovered that the following packages might help:
Requests
http.client
socket ?
So I have a question: which one is best suited for my situation and can someone show an example of either one of them (for Python 3)?
requests handles keep-alive automatically if you use a session. This might not actually help you if the server is rate-limiting requests; however, requests also handles parsing JSON, so that's a good reason to use it. Here's an example:
import requests
import time  # needed for time.sleep below

s = requests.Session()
while True:
    time.sleep(0.2)
    response = s.get('http://www.X.com/y')
    data = response.json()
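If the 429s persist, here is a sketch of a simple backoff; the Retry-After header is standard for 429 responses, but the 5-second fallback is an arbitrary choice and the code assumes a numeric header value (it can also be an HTTP date):
import requests
import time

s = requests.Session()
while True:
    response = s.get('http://www.X.com/y')
    if response.status_code == 429:
        # honor the server's requested delay, or wait 5 seconds if absent
        time.sleep(float(response.headers.get('Retry-After', 5)))
        continue
    data = response.json()
    time.sleep(0.2)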
I wish to access information from this website using Python, but I cannot figure out how to post my login information using urllib2:
https://www.linksyssmartwifi.com/
Can anyone explain why it isn't working?
Edit - This question has been listed as too broad. I will be more specific.
When I try to use the following code, I can't seem to 'post' the user name and password to the webpage.
import urllib2

auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm=' ??? ',
                          uri=' ??? ',
                          user='USERNAME',
                          passwd='PASSWORD')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
urllib2.urlopen('EXAMPLELINK')
I am assuming this is because there is no string in the URL containing the data and/or there is no API on this page that allows POSTs. I don't understand WHY it is not working; I don't actually get any error code. If I print the content received after urlopen, I get the HTML code of the page, but without being logged in.
But my understanding here may be incorrect.
I actually would like to find out how many people are logged in to my home network, using a remote connection; I'm assuming this is the only way. I would like to automate some things based on who is logged into my local network, and that information is available on this page after I log in.
I would preferably like to use Python or Bash scripting to do this.
This site does not use basic HTTP authentication, so the HTTPBasicAuthHandler you made will not even be called. The site just POSTs the username and password over SSLv3. I checked it out with Fiddler.
Also, you may need to create an SSL handler. I had problems with urllib over HTTPS and had to use the handler below.
import urllib2, ssl
sslv3_handler = urllib2.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib2.build_opener(sslv3_handler)
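Putting the two together, a rough sketch of a form-based login through that handler; the /login path and the form field names are guesses that would have to be taken from the site's login form or a Fiddler capture:
import urllib, urllib2, ssl

sslv3_handler = urllib2.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
# the cookie processor keeps whatever session cookie the site sets after login
opener = urllib2.build_opener(sslv3_handler, urllib2.HTTPCookieProcessor())
data = urllib.urlencode({'user': 'USERNAME', 'pwd': 'PASSWORD'})  # field names are guesses
response = opener.open('https://www.linksyssmartwifi.com/login', data)  # POST; path is a guess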
Does urllib2 in Python 2.6.1 support proxy via https?
I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:
NOTE: Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.
I'm trying to automate logging in to a web site and downloading a document; I have a valid username/password.
proxy_info = {
    'host': "axxx",  # commented out the real data
    'port': "1234"   # commented out the real data
}
proxy_handler = urllib2.ProxyHandler(
    {"http": "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
                              urllib2.HTTPHandler(debuglevel=1),
                              urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b'  # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)
I've had this working for similar pages, but not over HTTPS, and I suspect it does not get through the proxy; it just gets stuck in the same way as when I did not specify a proxy at all. I need to go out through the proxy.
I need to authenticate, but not using basic authentication. Will urllib2 figure out the authentication when going to the https site (I supply the username/password to the site via the URL)?
EDIT:
Nope, I tested with
proxies = {
    "http": "http://%(host)s:%(port)s" % proxy_info,
    "https": "https://%(host)s:%(port)s" % proxy_info
}
proxy_handler = urllib2.ProxyHandler(proxies)
And I get the error:
urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol
Fixed in Python 2.6.3 and several other branches:
http://bugs.python.org/issue1424152
http://www.python.org/download/releases/2.6.3/NEWS.txt
Issue #1424152: Fix for httplib, urllib2 to support SSL while working through proxy. Original patch by Christopher Li, changes made by Senthil Kumaran.
I'm not sure Michael Foord's article, which you quote, is updated for Python 2.6.1 -- why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https too (of course you should format it into a variable just once before you call ProxyHandler and just repeatedly use that variable in the dict). That may or may not work, but if you're not even trying, it's sure not to work!-)
In case anyone else has this issue in the future, I'd like to point out that urllib2 does support HTTPS proxying now. Make sure the proxy supports it too, or you risk running into a bug that puts the Python library into an infinite loop (this happened to me).
See the unit test in the Python source that tests HTTPS proxying support for further information:
http://svn.python.org/view/python/branches/release26-maint/Lib/test/test_urllib2.py?r1=74203&r2=74202&pathrev=74203