Can't get urlopen in Python 2.7 to work

Can't get urlopen in Python 2.7 to work - python

I'm trying to follow the suggestions from this thread, but its not working. At this point, I just wanted to perform a very simple test to make sure that I was actually getting the data returned from the site I'm trying to open. For my simple test I was trying to open the Yahoo weather api. I've verified that typing in this address in a web browser does indeed return data. I've tried both of these code snippets and neither one is working.
import urllib
params = urllib.urlencode({'w': 2482950})
f = urllib.urlopen("http://weather.yahooapis.com/forecastrss?%s" % params)
print f.read()
This example came straight from the Python website. This dies with:
IOError: [Errno socket error] [Errno 110] Connection timed out
I also tried using httplib like this:
import httplib
conn = httplib.HTTPConnection(host='weather.yahooapis.com', timeout=10)
req = '/forecastrss?w=2482950'
try:
conn.request('GET',req)
except:
print "Didn't work"
content = conn.getresponse().read()
print content
Trying this gives me the following error:
self.fp = sock.makefile('rb', 0)
AttributeError: 'NoneType' object has no attribute 'makefile'
It appears that I'm not ever making the connection to the remote host. Any ideas what I'm doing wrong?

The problem was indeed the firewall. I added the following code to get it to work:
proxy_support = urllib2.ProxyHandler({"http":"http://<proxy>:<port>"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen(url).read()
print html

Related

How to fix urllib2.URLError: <urlopen error [Errno 111] Connection refused> error?

I'm setting a python unshort url code and I want
I want the long version of the shorted URLs in a file to be written on the screen.
There is no hassle when a normal URL arrives.
But I get an error when a malicious URL arrives.
I use the following code:
import requests
import urllib
dosya=open("urller.txt","r")
satirlar=dosya.readlines()
for satir in satirlar:
resp = urllib.urlopen(satir)
print(resp.url)
dosya.close()

For the script to complete successfully you want to add error handling. Maybe something like the following:
import requests
import urllib
dosya=open("urller.txt","r")
satirlar=dosya.readlines()
for satir in satirlar:
try:
resp = urllib.urlopen(satir)
print(resp.url)
except urllib.error.URLError as e:
print("Failed to open URL {0} Reason: {1}".format(satir, e.reason))
dosya.close()
This code will print all valid response URLs. For any invalid URL it will instead print an error report.

Urllib returns 'No such file or directory error'

I am trying to access a certain 'third-party' database via a URL, using the urllib.request library but I get this error instead:
<urlopen error [Errno 2] No such file or directory>
I have been able to access the link several times with the same script some time before though.
import urllib.request
def read_temp():
url = "https://maturity.000webhostapp.com/api/temp/read_all.php"
response = urllib.request.urlopen(url).read()
json_obj = str(response, 'utf-8')
data = json.loads(json_obj)
I'd expected that the script should work effortlessly but sometimes it fails to connect.

The website has been taken down by the owner.

Python geoip find country using json

from urllib2 import urlopen
from contextlib import closing
import json
import time
import os
while True:
url = 'http://freegeoip.net/json/'
try:
with closing(urlopen(url)) as response:
location = json.loads(response.read())
location_city = location['city']
location_state = location['region_name']
location_country = location['country_name']
#print(location_country)
if location_country == "Germany":
print("You are now surfing from: " + location_country)
os.system(r'firefox /home/user/Documents/alert.html')
except:
print("Could not find location, searching again...")
time.sleep(1)
Its doesn't reply any country can I get help to solve the problem?

Besides of the wrong indentation, your code looks fine.
The problem seems to be that the page itself does not respond. If you try to open it in a browser for example, the connection gets refused.
Probably the api is either overloaded, or does no longer exist.

For one thing, the server appears to be down.
You would probably have noticed this but the bare except hides the fact. In general you should not catch all exceptions, but should catch those that you expect - in this case a urllib2.URLError exception would seem appropriate:
import urllib2
url = 'http://freegeoip.net/json/'
try:
response = urllib2.urlopen(url)
...
except urllib2.URLError as exc:
print('Could not find location due to exception: {}'.format(exc))
If you run the code above you might see this output:
Could not find location due to exception: <urlopen error [Errno 101] Network is unreachable>
The server might have been up earlier, and the problem might actually have a different cause, e.g. json.loads() might be failing. If you change the exception handler as shown above you will be able to see where it's failing.

Max retries exceeded with URL in requests

I'm trying to get the content of App Store > Business:
import requests
from lxml import html
page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)
flist = []
plist = []
for i in range(0, 100):
app = tree.xpath("//div[#class='column first']/ul/li/a/#href")
ap = app[0]
page1 = requests.get(ap)
When I try the range with (0,2) it works, but when I put the range in 100s it shows this error:
Traceback (most recent call last):
File "/home/preetham/Desktop/eg.py", line 17, in <module>
page1 = requests.get(ap)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

Just use requests features:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
session.get(url)
This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. backoff_factor will help to apply delays between attempts to avoid failing again in case of periodic request quota.
Take a look at urllib3.util.retry.Retry, it has many options to simplify retries.

What happened here is that itunes server refuses your connection (you're sending too many requests from same ip address in short period of time)
Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8
error trace is misleading it should be something like "No connection could be made because the target machine actively refused it".
There is an issue at about python.requests lib at Github, check it out here
To overcome this issue (not so much an issue as it is misleading debug trace) you should catch connection related exceptions like so:
try:
page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
r.status_code = "Connection refused"
Another way to overcome this problem is if you use enough time gap to send requests to server this can be achieved by sleep(timeinsec) function in python (don't forget to import sleep)
from time import sleep
All in all requests is awesome python lib, hope that solves your problem.

Just do this,
Paste the following code in place of page = requests.get(url):
import time
page = ''
while page == '':
try:
page = requests.get(url)
break
except:
print("Connection refused by the server..")
print("Let me sleep for 5 seconds")
print("ZZzzzz...")
time.sleep(5)
print("Was a nice sleep, now let me continue...")
continue
You're welcome :)

I got similar problem but the following code worked for me.
url = <some REST url>
page = requests.get(url, verify=False)
"verify=False" disables SSL verification. Try and catch can be added as usual.

pip install pyopenssl seemed to solve it for me.
https://github.com/requests/requests/issues/4246

Specifying the proxy in a corporate environment solved it for me.
page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})
The full error is:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

It is always good to implement exception handling. It does not only help to avoid unexpected exit of script but can also help to log errors and info notification. When using Python requests I prefer to catch exceptions like this:
try:
res = requests.get(adress,timeout=30)
except requests.ConnectionError as e:
print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
print(str(e))
renewIPadress()
continue
except requests.Timeout as e:
print("OOPS!! Timeout Error")
print(str(e))
renewIPadress()
continue
except requests.RequestException as e:
print("OOPS!! General Error")
print(str(e))
renewIPadress()
continue
except KeyboardInterrupt:
print("Someone closed the program")
Here renewIPadress() is a user define function which can change the IP address if it get blocked. You can go without this function.

Adding my own experience for those who are experiencing this in the future. My specific error was
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'
It turns out that this was actually because I had reach the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.

When I was writing a selenium browser test script, I encountered this error when calling driver.quit() before a usage of a JS api call.Remember that quiting webdriver is last thing to do!

i wasn't able to make it work on windows even after installing pyopenssl and trying various python versions (while it worked fine on mac), so i switched to urllib and it works on python 3.6 (from python .org) and 3.7 (anaconda)
import urllib
from urllib.request import urlopen
html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()
print(contents)

just import time
and add :
time.sleep(6)
somewhere in the for loop, to avoid sending too many request to the server in a short time.
the number 6 means: 6 seconds.
keep testing numbers starting from 1, until you reach the minimum seconds that will help to avoid the problem.

It could be network config issue also. So, for that u need to re-config ur network confgurations.
for Ubuntu :
sudo vim /etc/network/interfaces
add 8.8.8.8 in dns-nameserver and save it.
reset ur network : /etc/init.d/networking restart
Now try..

Adding my own experience :
r = requests.get(download_url)
when I tried to download a file specified in the url.
The error was
HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I corrected it by adding verify = False in the function as follows :
r = requests.get(download_url + filename)
open(filename, 'wb').write(r.content)

Check your network connection. I had this and the VM did not have a proper network connection.

I had the same error when I run the route in the browser, but in postman, it works fine. It issue with mine was that, there was no / after the route before the query string.
127.0.0.1:5000/api/v1/search/?location=Madina raise the error and removing / after the search worked for me.

This happens when you send too many requests to the public IP address of https://itunes.apple.com. It as you can see caused due to some reason which does not allow/block access to the public IP address mapping with https://itunes.apple.com. One better solution is the following python script which calculates the public IP address of any domain and creates that mapping to the /etc/hosts file.
import re
import socket
import subprocess
from typing import Tuple
ENDPOINT = 'https://anydomainname.example.com/'
ENDPOINT = 'https://itunes.apple.com/'
def get_public_ip() -> Tuple[str, str, str]:
"""
Command to get public_ip address of host machine and endpoint domain
Returns
-------
my_public_ip : str
Ip address string of host machine.
end_point_ip_address : str
Ip address of endpoint domain host.
end_point_domain : str
domain name of endpoint.
"""
# bash_command = """host myip.opendns.com resolver1.opendns.com | \
# grep "myip.opendns.com has" | awk '{print $4}'"""
# bash_command = """curl ifconfig.co"""
# bash_command = """curl ifconfig.me"""
bash_command = """ curl icanhazip.com"""
my_public_ip = subprocess.getoutput(bash_command)
my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
end_point_domain = (
ENDPOINT.replace("https://", "")
.replace("http://", "")
.replace("/", "")
)
end_point_ip_address = socket.gethostbyname(end_point_domain)
return my_public_ip, end_point_ip_address, end_point_domain
def set_etc_host(ip_address: str, domain: str) -> str:
"""
A function to write mapping of ip_address and domain name in /etc/hosts.
Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build
Parameters
----------
ip_address : str
IP address of the domain.
domain : str
domain name of endpoint.
Returns
-------
str
Message to identify success or failure of the operation.
"""
bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
output = subprocess.getoutput(bash_command)
return output
if __name__ == "__main__":
my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
print("My public IP address:", my_public_ip)
print("ENDPOINT public IP address:", end_point_ip_address)
print("ENDPOINT Domain Name:", end_point_domain )
print("Command output:", output)
You can call the above script before running your desired function :)

My situation is rather special. I tried the answers above, none of them worked. I suddenly thought whether it has something to do with my Internet proxy? You know, I'm in mainland China, and I can't access sites like google without an internet proxy. Then I turned off my Internet proxy and the problem was solved.

In my case, I am deploying some docker containers inside the python script and then calling one of the deployed services. Error is fixed when I add some delay before calling the service. I think it needs time to get ready to accept connections.
from time import sleep
#deploy containers
#get URL of the container
sleep(5)
response = requests.get(url,verify=False)
print(response.json())

First I ran the run.py file and then I ran the unit_test.py file, it works for me

Add headers for this request.
headers={
'Referer': 'https://itunes.apple.com',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
requests.get(ap, headers=headers)

I am coding a test with Gauge and I encountered this error as well, it was because I was trying to request an internal URL without activating VPN.

How do I get HTTP header info without authentication using python?

I'm trying to write a small program that will simply display the header information of a website. Here is the code:
import urllib2
url = 'http://some.ip.add.ress/'
request = urllib2.Request(url)
try:
html = urllib2.urlopen(request)
except urllib2.URLError, e:
print e.code
else:
print html.info()
If 'some.ip.add.ress' is google.com then the header information is returned without a problem. However if it's an ip address that requires basic authentication before access then it returns a 401. Is there a way to get header (or any other) information without authentication?
I've worked it out.
After try has failed due to unauthorized access the following modification will print the header information:
print e.info()
instead of:
print e.code()
Thanks for looking :)

If you want just the headers, instead of using urllib2, you should go lower level and use httplib
import httplib
conn = httplib.HTTPConnection(host)
conn.request("HEAD", path)
print conn.getresponse().getheaders()

If all you want are HTTP headers then you should make HEAD not GET request. You can see how to do this by reading Python - HEAD request with urllib2.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Can't get urlopen in Python 2.7 to work - python

The problem was indeed the firewall. I added the following code to get it to work: proxy_support = urllib2.ProxyHandler({"http":"http://<proxy>:<port>"}) opener = urllib2.build_opener(proxy_support) urllib2.install_opener(opener) html = urllib2.urlopen(url).read() print html

Related

How to fix urllib2.URLError: <urlopen error [Errno 111] Connection refused> error?

Urllib returns 'No such file or directory error'

Python geoip find country using json

Max retries exceeded with URL in requests

How do I get HTTP header info without authentication using python?

Categories

Resources