urllib2 IncompleteRead error - python

I have the following code:
import urllib2
urllib2.urlopen('http://muhlenberg.edu/alumni/').geturl()
This should return http://www.muhlenbergconnect.com/s/1570/index.aspx?gid=2&pgid=61/, which is the link it gets redirected to. However, I get an IncompleteRead error when running that code.
Is there a way I can prevent this error from happening and still return the correct link?
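One possible workaround (a sketch only, not guaranteed for this particular server) is to let the requests library follow the redirect and read the final URL from the response object; requests is often more tolerant of malformed responses than urllib2:
import requests

# requests follows redirects by default for GET requests; the final
# URL after all redirects is exposed on the response object.
resp = requests.get('http://muhlenberg.edu/alumni/')
print resp.url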

Related

Getting warning and error at same time in tornado proxy

It seems like I am hitting a race condition: while I am stopping my server, a request is still being made through the tornado proxy frontend.
I get the very familiar error:
WARNING:tornado.access:404 POST /request-url/eff74/36eb5e9f-def1-4689-ad58-3bf866798864/client-update (::1) 0.88ms
ERROR:tornado.general:Cannot send error response after headers written
WARNING:tornado.access:404 POST /request-url/eff74/36eb5e9f-def1-4689-ad58-3bf866798864/client-update (::1) 1.36ms
ERROR:tornado.general:Cannot send error response after headers written
WARNING:tornado.access:404 POST /request-url/eff74/36eb5e9f-def1-4689-
which is described in the tornado source code here.
Can I resolve this problem?
The point of confusion is that I am already getting a 404, so why am I getting an error afterwards?
Cheers
Actually, it was a very silly bug in my code.
The bug was that the proxy function handling the requests wasn't issuing a return statement.
Example:
def proxy_handler(self, request, response):
    # some task
    response.set_status(status_code)
    response.send("status_code: message")
    return  # this was missing earlier
So tornado was trying to send a response that had already been sent, thus producing the error:
WARNING:tornado.access:404 POST /request-url/eff74/36eb5e9f-def1-4689-ad58-3bf866798864/client-update (::1) 1.36ms
ERROR:tornado.general:Cannot send error response after headers written
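For reference, in a plain tornado.web.RequestHandler the same mistake looks roughly like this (a sketch only; backend_available and forward_request are hypothetical helpers):
import tornado.web

class ProxyHandler(tornado.web.RequestHandler):
    """Sketch of the situation described above; helper methods are hypothetical."""

    def post(self):
        if not self.backend_available():   # hypothetical availability check
            self.set_status(404)
            self.finish("backend not available")
            return  # without this early return, execution falls through
        # hypothetical forwarding logic; reached only when the backend is up
        self.forward_request()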
Cheers

Twill : requests.exceptions.MissingSchema

I'm working on a simple web scraper for a page which requires users to be logged in to see its content.
from twill.commands import *
go("https://website.com/user")
fv("1","edit-name","NICKNAME")
fv("1","edit-pass","NICKNAME")
submit('0')
That is my current code. When running it, I get the following error:
raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '/user': No schema supplied. Perhaps you meant http:///user?
What am I doing wrong?
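As a cross-check that the login itself works outside twill, here is a minimal sketch using requests directly; the form action URL and the field names in the payload are assumptions and must be taken from the page's actual form markup:
import requests

session = requests.Session()
payload = {
    'name': 'NICKNAME',    # assumed field name -- check the form's inputs
    'pass': 'NICKNAME',    # assumed field name -- check the form's inputs
}
# Assumed form action: the same absolute URL as the login page.
resp = session.post('https://website.com/user', data=payload)
print resp.status_code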

Serving a downloadable file in django throws server error

I am new to django. I am trying to serve downloadable files using django FileWrapper. But, I keep getting internal server error messages.
EDIT: My settings.py
DEBUG=True
Link in HTML:
abcd.com/downloadResult/fileName/
My urls.py
url(r'^downloadResult/(\w{0,50})/$',"myApp.views.showResult",name="Result"),
My views.py
from django.core.servers.basehttp import FileWrapper
from django.http import HttpResponse
from django.utils.encoding import smart_str

def showResult(request, fileName):
    file = open("/path/to/file/" + fileName + ".txt", "r")
    response = HttpResponse(FileWrapper(file), mimetype='application/force-download')
    response['Content-Disposition'] = 'attachment; filename=%s' % smart_str(fileName + ".txt")
    file.close()
    return response
Can someone please direct me to some discussions or point out what I am missing?
Thanks!
I figured out the problem with my code: I closed the file before returning anything. After commenting out the second-to-last line in views.py, it works fine. I found the error message in the Apache log file, but did not get any error message in the browser even though DEBUG was set to True.
Now, the problem with this solution is that files get opened but never closed.
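For small files, one way around the leaked file handle (a sketch, assuming the file fits comfortably in memory and keeping the old-style mimetype argument used above) is to read the content inside a with block, so the file is closed before the response is built:
from django.http import HttpResponse
from django.utils.encoding import smart_str

def showResult(request, fileName):
    path = "/path/to/file/" + fileName + ".txt"
    with open(path, "rb") as f:
        data = f.read()  # whole file in memory; only suitable for small downloads
    response = HttpResponse(data, mimetype='application/force-download')
    response['Content-Disposition'] = 'attachment; filename=%s' % smart_str(fileName + ".txt")
    return response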

Python requests causing error on certain urls

For some reason, when I try to get and process the following URL with python-requests, I receive an error that causes my program to fail. Other similar URLs seem to work fine:
import requests
test = requests.get('http://t.co/Ilvvq1cKjK')
print test.url, test.status_code
What could be causing this URL to fail instead of just producing a 404 status code?
The requests library has an exception hierarchy, as listed here.
So wrap your GET request in a try/except block:
import requests

try:
    test = requests.get('http://t.co/Ilvvq1cKjK')
    print test.url, test.status_code
except requests.exceptions.ConnectionError as e:
    print e.request.url, "*connection failed*"
That way you end up with behaviour similar to what you have now (you still get the redirected URL), but you cater for not being able to connect rather than printing the status code.

Trying to access the Internet using urllib2 in Python

I'm trying to write a program that will (among other things) get text or source code from a predetermined website. I'm learning Python to do this, and most sources have told me to use urllib2. Just as a test, I tried this code:
import urllib2
response = urllib2.urlopen('http://www.python.org')
html = response.read()
Instead of acting in any expected way, the shell just sits there, as if it's waiting for input. There isn't even a ">>>" or "...". The only way to exit this state is with [Ctrl]+C. When I do this, I get a whole bunch of error messages, like:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/m/mls/pkg/ix86-Linux-RHEL5/lib/python2.5/urllib2.py", line 124, in urlopen
return _opener.open(url, data)
File "/m/mls/pkg/ix86-Linux-RHEL5/lib/python2.5/urllib2.py", line 381, in open
response = self._open(req, data)
I'd appreciate any feedback. Is there a different tool than urllib2 I should use, or can you give advice on how to fix this? I'm using a networked computer at my work, and I'm not entirely sure how the shell is configured or how that might affect anything.
With 99.999% probability, it's a proxy issue. Python is incredibly bad at detecting the right http proxy to use, and when it cannot find the right one, it just hangs and eventually times out.
So first you have to find out which proxy should be used; check your browser's options (Tools -> Internet Options -> Connections -> LAN Setup... in IE, etc.). If it's using a script to autoconfigure, you'll have to fetch the script (which should be some sort of JavaScript) and find out where your request is supposed to go. If no script is specified and the "automatically determine" option is ticked, you might as well just ask an IT person at your company.
I assume you're using Python 2.x. From the Python docs on urllib:
# Use http://www.someproxy.com:3128 for http proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)
Note that the docs' point about ProxyHandler figuring out default values is what already happens when you use urlopen, so that alone is probably not going to work.
If you really want urllib2, you'll have to specify a ProxyHandler, like the example in this page. Authentication might or might not be required (usually it's not).
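A minimal sketch of that approach (the proxy address below is a placeholder; substitute the one from your browser's LAN settings):
import urllib2

# Placeholder proxy address -- replace with your company's actual proxy.
proxy_handler = urllib2.ProxyHandler({'http': 'http://www.someproxy.com:3128'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)  # make plain urlopen() use this opener

response = urllib2.urlopen('http://www.python.org')
html = response.read()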
This isn't a good answer to "How to do this with urllib2", but let me suggest python-requests. The whole reason it exists is because the author found urllib2 to be an unwieldy mess. And he's probably right.
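For what it's worth, the equivalent with requests is short; the proxy address is again a placeholder, and requests also honours the http_proxy environment variable:
import requests

proxies = {'http': 'http://www.someproxy.com:3128'}  # placeholder proxy
r = requests.get('http://www.python.org', proxies=proxies, timeout=10)
html = r.text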
That is very weird; have you tried a different URL?
Otherwise there is httplib, though it is more complicated. Here's your example using httplib:
import httplib as h
domain = h.HTTPConnection('www.python.org')
domain.connect()
domain.request('GET', '/fish.html')
response = domain.getresponse()
if response.status == h.OK:
    html = response.read()
I get a 404 error almost immediately (no hanging):
>>> import urllib2
>>> response = urllib2.urlopen('http://www.python.org/fish.html')
Traceback (most recent call last):
...
urllib2.HTTPError: HTTP Error 404: Not Found
If I try and contact an address that doesn't have an HTTP server running, it hangs for quite a while until the timeout happens. You can shorten it by passing the timeout parameter to urlopen:
>>> response = urllib2.urlopen('http://cs.princeton.edu/fish.html', timeout=5)
Traceback (most recent call last):
...
urllib2.URLError: <urlopen error timed out>
