OS: Windows 7; Python 2.7.3 using the Python GUI Shell
I'm trying to read a website through Python, and several authors use the urllib and urllib2 libraries. To store the site in a variable, I've seen the following approach proposed:
import urllib
import urllib2
g = "http://www.google.com/"
read = urllib2.urlopen(g)
The last line generates an error after 120+ seconds:
Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    r = urllib2.urlopen(o)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 418, in _open
    '_open', req)
  File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python27\lib\urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
I tried bypassing the g variable and calling urlopen("http://www.google.com/") directly, with no success either (it generates the same error after the same length of time).
Error code 10060 means the client cannot connect to the remote peer. It might be a network problem, or more likely a configuration issue on your machine, such as a proxy setting.
You could try to connect to the same host with other tools (such as ncat) and/or from another PC within your local network to find out where the problem is occurring.
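For example, a quick way to test raw TCP connectivity from the command line, assuming ncat from the Nmap suite is installed (the host and port here are just examples):

ncat -v www.google.com 80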
For proxy issues, there is some material here:
Using an HTTP PROXY - Python
Why can't I get Python's urlopen() method to work on Windows?
Hope it helps!
Answer (Basic is advance!):
Error: 10060
Adding a timeout parameter to the request solved the issue for me.
Example 1
import urllib
import urllib2
g = "http://www.google.com/"
read = urllib2.urlopen(g, timeout=20)
Example 2
A similar error also occurred while I was making a GET request with the requests library. Again, passing a timeout parameter solved the 10060 error.
import requests

response = requests.get(param_url, timeout=20)
This is because of the proxy settings.
I also had the same problem, and could not use any of the modules that fetch data from the internet.
There are simple steps to follow:
1. Open the Control Panel.
2. Open Internet Options.
3. Under the Connections tab, open LAN settings.
4. Go to Advanced settings, uncheck everything, and delete every proxy entry there. Alternatively, you can just uncheck the "Use a proxy server" checkbox, which has the same effect.
5. Save all the settings by clicking OK.
You are done.
Try to run the program again; it should work now.
It worked for me, at least.
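If you would rather not touch the system settings, here is a minimal sketch (my own addition, not part of the original answer) that tells urllib2 to ignore any configured proxy by installing an empty ProxyHandler:

import urllib2

# An empty ProxyHandler means "use no proxy at all", bypassing the system proxy settings.
no_proxy_opener = urllib2.build_opener(urllib2.ProxyHandler({}))
print no_proxy_opener.open("http://www.google.com/", timeout=20).read()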
Just change your internet connection; it should then work.
I am trying to extract data from the Civic Commons Apps link for my project. I am able to obtain the links of the pages that I need, but when I try to open the links I get "urlopen error [Errno -2] Name or service not known".
The web scraping python code:
from bs4 import BeautifulSoup
from urlparse import urlparse, parse_qs
import re
import urllib2
import pdb
base_url = "http://civiccommons.org"
url = "http://civiccommons.org/apps"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
list_of_links = []
for link_tag in soup.findAll('a', href=re.compile('^/civic-function.*')):
    string_temp_link = base_url+link_tag.get('href')
    list_of_links.append(string_temp_link)
list_of_links = list(set(list_of_links))
list_of_next_pages = []
for categorized_apps_url in list_of_links:
    categorized_apps_page = urllib2.urlopen(categorized_apps_url)
    categorized_apps_soup = BeautifulSoup(categorized_apps_page.read())
    last_page_tag = categorized_apps_soup.find('a', title="Go to last page")
    if last_page_tag:
        last_page_url = base_url+last_page_tag.get('href')
        index_value = last_page_url.find("page=") + 5
        base_url_for_next_page = last_page_url[:index_value]
        for pageno in xrange(0, int(parse_qs(urlparse(last_page_url).query)['page'][0]) + 1):
            list_of_next_pages.append(base_url_for_next_page+str(pageno))
    else:
        list_of_next_pages.append(categorized_apps_url)
I get the following error:
urllib2.urlopen(categorized_apps_url)
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno -2] Name or service not known>
Should I take care of anything specific when I perform urlopen? I don't see a problem with the HTTP links that I get.
[edit]
On a second run I got the following error:
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
The same code runs fine on my friend's Mac, but fails on my Ubuntu 12.04 machine.
I also tried running the code on ScraperWiki and it finished successfully, but a few URLs were missing (compared to the Mac run). Is there any reason for this behavior?
The code works on my Mac and on your friend's Mac. It runs fine from a virtual machine instance of Ubuntu 12.04 server. There is obviously something in your particular environment, your OS (Ubuntu Desktop?) or network, that is causing it to fail. For example, my home router's default settings throttle the number of calls to the same domain within x seconds, which could cause this kind of issue if I didn't turn it off. It could be a number of things.
At this stage I would suggest refactoring your code to catch the URLError and set aside problematic URLs for a retry. Also log/print errors if they fail after several retries. Maybe even throw in some code to time your calls between errors. That is better than having your script fail outright, and you'll get feedback as to whether it is just particular URLs causing the problem or a timing issue (i.e. does it fail after x urlopen calls, or after x urlopen calls in x amount of micro/seconds?). If it's a timing issue, a simple time.sleep(1) inserted into your loops might do the trick.
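A minimal sketch of that idea (the helper name, retry count, and delay are my own choices, not from the original code):

import time
import urllib2

def fetch_with_retries(url, retries=3, delay=1):
    # Try the URL a few times, pausing between attempts; return None if it
    # never succeeds so the caller can set the URL aside for a later pass.
    for attempt in xrange(retries):
        try:
            return urllib2.urlopen(url).read()
        except urllib2.URLError as e:
            print "attempt %d failed for %s: %s" % (attempt + 1, url, e)
            time.sleep(delay)
    return None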
SyncMaster,
I ran into the same issue recently after jumping onto an old Ubuntu box I hadn't played with in a while. This issue is actually caused by the DNS settings on your machine. I would highly recommend that you check your DNS settings (/etc/resolv.conf), add nameserver 8.8.8.8, and then try again; you should meet with success.
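For reference, the relevant line in /etc/resolv.conf would simply look like this (8.8.8.8 is Google's public DNS server, as suggested above):

nameserver 8.8.8.8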
Cheers,
I'm still getting to grips with the Python basics..
My current requirement is to develop a Python script that will test the availability of web-based interfaces on multiple devices (e.g. where you may have to enter "http://192.168.0.2:9876" in a web browser). This does not have to be overcomplicated.
I'm trying to convert from the simple bash curl command, as originally I had something like the following in a bash script:
date=`date +"%Y-%m-%d_%H-%M-%S-%N"`
curl -s --connect-timeout 1 ${ip} -o /dev/null
test=$?
if [[ $test == 0 ]]; then
    echo "${date}:webping - Web Page Up for ${ip}" >> $log
else
    echo "${date}:webping - Web Page Down for ${ip}" >> $log
fi
which worked for the original concept, but I was looking to have something similar in Python. The output can vary, within reason... does anyone have any pointers on where to start?
P.S. I have looked at some other questions on here, but they appear to give false positives: where the interface has been "taken down" (i.e. I stopped the service), they still give a status code of 200.
EDIT: Below is the code I have tried.
import urllib2

for url in ["http://www.google.co.uk", "http://192.168.0.2:8000"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print "none"
CORRECTION: I get the following results...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 391, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 409, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1173, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "C:\Python27\lib\urllib2.py", line 1148, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
I would prefer not to see the python error output.
Thanks in advance
Take a look at http://docs.python-requests.org/en/latest/index.html for a Python module providing the facilities you need with a nice friendly API. In this instance you'd do something along these lines:
import requests
...
try:
r = requests.get(url, timeout=1)
ok = (r.status_code // 100) == 2
except:
ok = False
# now use the value of ok
though I don't know whether the particular test I've used there (success means a 2xx response) is exactly what you want.
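If you would rather stay with urllib2, a similar check is possible; the key is to catch urllib2.URLError (of which HTTPError is a subclass), since a refused or timed-out connection raises URLError rather than HTTPError. A minimal sketch, reusing the URLs from the question's edit:

import urllib2

for url in ["http://www.google.co.uk", "http://192.168.0.2:8000"]:
    try:
        connection = urllib2.urlopen(url, timeout=1)
        print url, "up, status", connection.getcode()
        connection.close()
    except urllib2.URLError as e:
        # Covers refused connections, timeouts and HTTP errors alike.
        print url, "down:", e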
I've been trying to get Tor to work with Python, but I've been hitting a brick wall. I simply can't get any of the examples to work. Here is one from Stack Overflow:
import urllib2
proxy = urllib2.ProxyHandler({'http':'127.0.0.1:8118'})
opener = urllib2.build_opener(proxy)
print opener.open('http://check.torproject.org/').read()
I've installed Tor and it works fine while browsing through Aurora. However running this python script I get
Traceback (most recent call last):
File "/home/x/Tor.py", line 4, in <module>
print opener.open('http://check.torproject.org/').read()
File "/usr/lib/python2.6/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib/python2.6/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/python2.6/urllib2.py", line 1161, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.6/urllib2.py", line 1136, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
I've searched the web, but have been unable to find people with similar problems. Am I missing something totally obvious?!
I've written an article showing how to use Tor with Python (using SOCKS) on http://blog.databigbang.com/distributed-scraping-with-multiple-tor-circuits/
Hope it helps.
I have the same problem but cannot find a solution!
I am running Ubuntu. I can open Tor (latest version) with Vidalia and surf the web correctly, so Vidalia works and is connected.
If I use TorCtl in Python, I get a response from Tor saying it is live and running!
However, if I want to open a page using urllib2 as described by Loko, I get the same error.
If someone has a good idea, it would be really nice!
Tor acts as a SOCKS5 proxy. You need to configure your script with that in mind. Google "socks.py".
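For example, here is a minimal sketch using the SocksiPy module (socks.py); it assumes Tor is listening on its default SOCKS port 9050 rather than the HTTP proxy port 8118 used in the question:

import socket
import socks
import urllib2

# Route every new socket through Tor's SOCKS5 proxy before urllib2 is used.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

print urllib2.urlopen('http://check.torproject.org/').read()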
This works fine:
import urllib2
opener = urllib2.build_opener(
    urllib2.HTTPHandler(),
    urllib2.HTTPSHandler(),
    urllib2.ProxyHandler({'http': 'http://user:pass@proxy:3128'}))
urllib2.install_opener(opener)
print urllib2.urlopen('http://www.google.com').read()
But if http is changed to https:
...
print urllib2.urlopen('https://www.google.com').read()
There are errors:
Traceback (most recent call last):
File "D:\Temp\6\tmp.py", line 13, in <module>
print urllib2.urlopen('https://www.google.com').read()
File "C:\Python26\lib\urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python26\lib\urllib2.py", line 389, in open
response = self._open(req, data)
File "C:\Python26\lib\urllib2.py", line 407, in _open
'_open', req)
File "C:\Python26\lib\urllib2.py", line 367, in _call_chain
result = func(*args)
File "C:\Python26\lib\urllib2.py", line 1154, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "C:\Python26\lib\urllib2.py", line 1121, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 10060]
Why does this happen, and how can I solve it?
Change this line:
urllib2.ProxyHandler({'http': 'http://user:pass@proxy:3128'}))
to this:
urllib2.ProxyHandler({'https': 'http://user:pass@proxy:3128'}))
It works fine for me.
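Put together, a sketch of the full opener with the proxy registered for both schemes (user:pass@proxy:3128 is the placeholder from the question):

import urllib2

opener = urllib2.build_opener(
    urllib2.HTTPHandler(),
    urllib2.HTTPSHandler(),
    urllib2.ProxyHandler({
        'http': 'http://user:pass@proxy:3128',
        'https': 'http://user:pass@proxy:3128'}))
urllib2.install_opener(opener)

print urllib2.urlopen('https://www.google.com').read()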
On Windows, errno 10060 is a Winsock error meaning the connection timed out. Are you able to reach https://www.google.com from the same machine using a web browser with the proxy set to http://user:pass@proxy:3128? Are you sure your proxy server can handle both https and http on the same port?
The documentation for urllib2 says the following:
Note: Currently urllib2 does not support fetching of https locations
through a proxy. However, this can be enabled by extending urllib2 as
shown in this recipe.
I must admit the above recipe didn't work right away for Jython 2.5.3, but I'm still trying.
UPDATE: I applied this patch to Jython 2.5.3, and it worked for me. I can fetch HTTPS resources over a proxy server now.
UPDATE2: Here is the code to query HTTPS resources with Basic authentication over an HTTP proxy (DON'T FORGET TO INSTALL THE PATCH FIRST; see the previous update):
from suds.client import Client
from suds.transport.https import HttpAuthenticated
credentials = dict(username='...', password='...', proxy={'https': 'host:port', 'http': 'host:port'})
t = HttpAuthenticated(**credentials)
url = 'https://example.com/service?wsdl'
client = Client(url, transport=t)
print client.service.getFoo()