receiving 401 errors inconsistently from python urllib2.urlopen call - python

I have a python script that makes a series of url calls using urllib2. The url is on http, but requires authentication. I am currently trying to run the script such that it will make over 100 calls. Every time I run the script, some calls fail with error code 401, and some pass. All calls are for the same URL using the same username and password. (Each time I run the script it is not the same calls that fail, sometimes the first call fails, sometimes it works.)
Any ideas why a 401 might occur inconsistently?
The error message printed to the screen is...
Here is the method responsible for making the url call:
def simpleExecuteRequest(minX, minY, maxX, maxY, type):
    url = 'http://myhost.com/geowebcache/rest/seed/mylayer.xml'
    msgTemplate = """<?xml version=\"1.0\" encoding=\"UTF-8\"?>
    <seedRequest>
        <name>mylayer</name>
        <bounds>
            <coords>
                <double>%s</double>
                <double>%s</double>
                <double>%s</double>
                <double>%s</double>
            </coords>
        </bounds>
        <gridSetId>nyc</gridSetId>
        <zoomStart>0</zoomStart>
        <zoomStop>10</zoomStop>
        <format>image/png</format>
        <type>%s</type>
        <threadCount>1</threadCount>
    </seedRequest>
    """
    message = msgTemplate % (minX, minY, maxX, maxY, type)
    headers = { 'User-Agent' : "Python script", 'Content-type' : 'text/xml; charset="UTF-8"', 'Content-length': '%d' % len(message) }
    passwordManager = urllib2.HTTPPasswordMgrWithDefaultRealm()
    passwordManager.add_password(None, url, 'username', 'xxx')
    authenticationHandler = urllib2.HTTPBasicAuthHandler(passwordManager)
    proxyHandler = urllib2.ProxyHandler({})
    opener = urllib2.build_opener(proxyHandler, authenticationHandler)
    urllib2.install_opener(opener)
    try:
        request = urllib2.Request(url, message, headers)
        response = urllib2.urlopen(request)
        content = response.read()
        print 'success'
    except IOError, e:
        print e
Sometimes the output will look like this...
<urlopen error (10053, 'Software caused connection abort')>
success
success
<urlopen error (10053, 'Software caused connection abort')>
<urlopen error (10053, 'Software caused connection abort')>
...
When run 1 minute later it might look like this...
success
<urlopen error (10053, 'Software caused connection abort')>
success
success
<urlopen error (10053, 'Software caused connection abort')>
On both runs the same series of inputs for min/max x/y and type were provided in the same order.
...

The code looks correct to me, so I don't see the issue.
Here are a few thoughts on how to proceed:
I usually work out the HTTP requests at the command line using curl before translating them into a script.
The requests library is easier to use than urllib2.
When you receive a response, print out the headers so you can see what is going on (a short sketch follows below).
Instead of except IOError, e use except IOError as e. The new form protects you from hard-to-find errors.
I presume you redacted the username and password and are using the real ones in your own script ;-)
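For example, a minimal sketch of the header-printing and except ... as e suggestions, reusing the url, message, and headers variables from the question's function:

try:
    request = urllib2.Request(url, message, headers)
    response = urllib2.urlopen(request)
    # Look at the status line and response headers to see what the server actually did
    print response.getcode()
    print response.info()
    content = response.read()
    print 'success'
except IOError as e:
    # urllib2.HTTPError carries the status code of the failed call (e.g. 401)
    if hasattr(e, 'code'):
        print 'failed with HTTP status', e.code
    print e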

Related

post requests in python creates an internal server error

I'm trying to create a post request in python, but I get an internal server error when issuing the request.
I'm trying to intercept it with a try-statement, but that doesn't seem to work.
import logging
import requests

logging.basicConfig(filename='python.log', filemode='w', level=logging.DEBUG)
url = "https://redacted-url.com/my-api/check_email"
json = {"email": request.params["email"].strip(), "list": request.params["list"]}
headers = {"Content-Type": "application/json", "Accept": "text/plain"}
try:
    r = requests.post(url, headers=headers, json=json)
except requests.exceptions.RequestException as e:
    logging.error(e, exc_info=True)
A: I have no idea where that log file would be stored. Do I have to add the full server path? What if I just use «python.log»? Where would it be stored?
B: the try/except doesn't seem to work, I still get an internal server error
C: the error definitely occurs on the line r = requests.post(url, headers=headers, json=json). If I comment that out, the error doesn't occur.
D: Since I don't get an error that's meaningful: What am I doing wrong with that request? This is actually my main problem, but it would be nice to figure out how to log that error and how to intercept it.
Last but not least: If I run the same command from the terminal, the request is processed fine. WTH???

How to fix urllib2.URLError: <urlopen error [Errno 111] Connection refused> error?

I'm writing a Python script to un-shorten URLs, and I want the long version of the shortened URLs in a file to be printed to the screen.
There is no hassle when a normal URL arrives, but I get an error when a malicious URL arrives.
I use the following code:
import requests
import urllib

dosya = open("urller.txt", "r")
satirlar = dosya.readlines()
for satir in satirlar:
    resp = urllib.urlopen(satir)
    print(resp.url)
dosya.close()
For the script to complete successfully you want to add error handling. Maybe something like the following:
import urllib.request
import urllib.error

dosya = open("urller.txt", "r")
satirlar = dosya.readlines()
for satir in satirlar:
    satir = satir.strip()  # readlines() keeps the trailing newline
    try:
        resp = urllib.request.urlopen(satir)
        print(resp.url)
    except urllib.error.URLError as e:
        print("Failed to open URL {0} Reason: {1}".format(satir, e.reason))
dosya.close()
This code will print all valid response URLs. For any invalid URL it will instead print an error report.

HTTP Error 403: Forbidden while fetching html source on the server

When I run the code locally and try to fetch data from the URL and then parse it to text, everything works properly.
When I run exactly the same code on the remote server and try to fetch data from the URL, the error HTTP Error 403: Forbidden occurs.
Answers from the questions
HTTP error 403 in Python 3 Web Scraping and
urllib2.HTTPError: HTTP Error 403: Forbidden helped me when I tried it locally, and everything worked fine.
Do you know what can be different about fetching data from the remote server, when the code is the same (locally and on the server) and the way of running it is the same, but the result is completely different?
URL that I want to fetch:
url=https://bithumb.cafe/notice
Code that I was trying to use to fetch the data (it works locally, but not on the remote server):
try:
    request = urllib.request.Request(url)
    request.add_header('User-Agent', 'cheese')
    logger.info("request: {}".format(request))
    content = urllib.request.urlopen(request).read()
    logger.info('content: {}'.format(content))
    decoded = content.decode('utf-8')
    logger.info('content_decoded: {}'.format(decoded))
    return decoded
except Exception as e:
    logger.error('failed with error message: {}'.format(e))
    return ''
Second way of fetching the data (it also works locally, but not on the remote server):
class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

method:

try:
    opener = AppURLopener()
    response = opener.open(url)
    logger.info("request response: {}. response type: {}. response_dict: {}"
                .format(response, type(response), response.__dict__))
    html_response = response.read()
    logger.info("html_Response".format(html_response))
    encoding = response.headers.get_content_charset('utf-8')
    decoded_html = html_response.decode(encoding)
    logger.info('content_decoded: {}'.format(decoded_html))
    return decoded_html
except Exception as e:
    logger.error('failed with error message: {}'.format(e))
    return ''

Why do I get two different status code from conn.getresponse().status in python?

so I want to check if a URL is reachable from python, and I got this code from googling:
from urllib.parse import urlparse
import http.client

def checkUrl(url):
    p = urlparse(url)
    conn = http.client.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400
Here is my URL: https://eurotableau.nomisonline.com.
It works fine if I just pass that in to the function; resp.status is 302. However, if I add port 443 at the end of it, https://eurotableau.nomisonline.com:443, it returns False; resp.status is 400. I tried both URLs in Google Chrome and both of them work. So my question is: why is this happening? Is there any way I can include the port value and still get a valid resp.status (< 400)? Thanks.
Use http.client.HTTPSConnection instead. The plain old HTTPConnection ignores the protocol that is part of the URL.
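A minimal sketch of that change, assuming Python 3's http.client and the URL from the question:

from urllib.parse import urlparse
import http.client

def checkUrl(url):
    p = urlparse(url)
    # HTTPSConnection speaks TLS and defaults to port 443, so an explicit :443 in the URL is fine
    conn = http.client.HTTPSConnection(p.netloc)
    conn.request('HEAD', p.path or '/')
    resp = conn.getresponse()
    return resp.status < 400

print(checkUrl("https://eurotableau.nomisonline.com:443"))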
If you do not require the HEAD method but just wish to check if host is available then why not do:
from urllib2 import urlopen
try:
    u = urlopen("https://eurotableau.nomisonline.com")
    u.close()
    print "Everything fine!"
except Exception, e:
    if hasattr(e, "code"):
        print "Server is there but something is wrong with rest of URL"
    else:
        print "Server is on vacations or was never there!"
    print e
This will establish a connection with the server, but it won't download any data unless you read it. It will only read a few KB to get the headers (like when using the HEAD method) and wait for you to request more, but you close the connection right there.
So, you can catch an exception and see what the problem is, or, if there is no exception, just close the connection.
urllib2 will handle HTTPS and protocol://user@URL:PORT for you neatly.
No worries about anything.

Max retries exceeded with URL in requests

I'm trying to get the content of App Store > Business:
import requests
from lxml import html

page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)
flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)
When I try the range with (0,2) it works, but when I put the range in 100s it shows this error:
Traceback (most recent call last):
  File "/home/preetham/Desktop/eg.py", line 17, in <module>
    page1 = requests.get(ap)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
Just use requests features:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
session.get(url)
This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. backoff_factor applies a growing delay between attempts, which helps avoid failing again when the server enforces a periodic request quota.
Take a look at urllib3.util.retry.Retry, it has many options to simplify retries.
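For instance, here is a sketch of a slightly more specific retry policy; the status codes are only illustrative, and allowed_methods requires urllib3 >= 1.26 (older versions call it method_whitelist):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=5,                                     # overall retry budget
    backoff_factor=0.5,                          # exponentially growing delay between attempts
    status_forcelist=[429, 500, 502, 503, 504],  # also retry on these HTTP status codes
    allowed_methods=["GET"],                     # only retry idempotent requests
)
session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
response = session.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")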
What happened here is that the iTunes server refuses your connection (you're sending too many requests from the same IP address in a short period of time):
Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8
The error trace is misleading; it should really say something like "No connection could be made because the target machine actively refused it".
There is an issue about the python requests lib on GitHub, check it out here.
To overcome this issue (it's not so much an issue as a misleading debug trace) you should catch connection-related exceptions like so:
try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    r.status_code = "Connection refused"
Another way to overcome this problem is to leave enough of a time gap between requests to the server. This can be achieved with the sleep(timeinsec) function in Python (don't forget to import sleep):
from time import sleep
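For example, continuing from the question's loop (tree and requests as defined there; the one-second delay is just a starting point):

from time import sleep

for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)
    sleep(1)  # pause between requests so the server does not see a burst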
All in all, requests is an awesome Python lib; hope that solves your problem.
Just do this,
Paste the following code in place of page = requests.get(url):
import time

page = ''
while page == '':
    try:
        page = requests.get(url)
        break
    except:
        print("Connection refused by the server..")
        print("Let me sleep for 5 seconds")
        print("ZZzzzz...")
        time.sleep(5)
        print("Was a nice sleep, now let me continue...")
        continue
You're welcome :)
I got a similar problem, but the following code worked for me.
url = <some REST url>
page = requests.get(url, verify=False)
verify=False disables SSL verification. A try/except can be added as usual.
pip install pyopenssl seemed to solve it for me.
https://github.com/requests/requests/issues/4246
Specifying the proxy in a corporate environment solved it for me.
page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})
The full error is:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
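In a corporate environment you usually want both schemes to go through the proxy. A sketch, reusing the example proxy address from above (replace it with your real proxy):

import requests

proxies = {
    "http": "http://111.233.225.166:1234",   # example proxy address from this answer
    "https": "http://111.233.225.166:1234",  # HTTPS traffic typically uses the same proxy
}
page = requests.get("https://www.google.com", proxies=proxies)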
It is always good to implement exception handling. It not only helps avoid an unexpected exit of the script, but can also help you log errors and notifications. When using Python requests I prefer to catch exceptions like this:
try:
    res = requests.get(adress, timeout=30)
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
    print(str(e))
    renewIPadress()
    continue
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
    renewIPadress()
    continue
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
    renewIPadress()
    continue
except KeyboardInterrupt:
    print("Someone closed the program")
Here renewIPadress() is a user-defined function which can change the IP address if it gets blocked. You can leave this function out.
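If you do keep renewIPadress(), a stub like the following (purely illustrative) is enough to make the snippet run; what it actually does, such as reconnecting a VPN or rotating a proxy, depends entirely on your setup:

def renewIPadress():
    # Placeholder: reconnect the VPN, rotate a proxy, or restart the network interface here.
    pass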
Adding my own experience for those who are experiencing this in the future. My specific error was
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
It turns out that this was actually because I had reached the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
When I was writing a Selenium browser test script, I encountered this error when calling driver.quit() before a JS API call. Remember that quitting the webdriver is the last thing to do!
I wasn't able to make it work on Windows even after installing pyopenssl and trying various Python versions (while it worked fine on Mac), so I switched to urllib and it works on Python 3.6 (from python.org) and 3.7 (Anaconda):
import urllib
from urllib.request import urlopen
html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()
print(contents)
Just import time and add:
time.sleep(6)
somewhere in the for loop, to avoid sending too many requests to the server in a short time.
The number 6 means 6 seconds.
Keep testing numbers starting from 1, until you reach the minimum number of seconds that avoids the problem.
It could also be a network configuration issue, in which case you need to reconfigure your network settings.
For Ubuntu:
sudo vim /etc/network/interfaces
Add 8.8.8.8 to dns-nameservers and save it.
Restart your network: /etc/init.d/networking restart
Now try again.
Adding my own experience: I got this error from
r = requests.get(download_url)
when I tried to download a file specified in the URL.
The error was
HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I corrected it by adding verify=False to the call as follows:
r = requests.get(download_url + filename, verify=False)
open(filename, 'wb').write(r.content)
Check your network connection. I had this and the VM did not have a proper network connection.
I had the same error when I ran the route in the browser, but in Postman it worked fine. The issue in my case was that there should be no / after the route, before the query string:
127.0.0.1:5000/api/v1/search/?location=Madina raised the error, and removing the / after search worked for me.
This happens when you send too many requests to the public IP address of https://itunes.apple.com. As you can see, something is, for whatever reason, blocking access to the public IP address that https://itunes.apple.com maps to. One better solution is the following Python script, which resolves the public IP address of any domain and writes that mapping to the /etc/hosts file.
import re
import socket
import subprocess
from typing import Tuple

ENDPOINT = 'https://anydomainname.example.com/'
ENDPOINT = 'https://itunes.apple.com/'


def get_public_ip() -> Tuple[str, str, str]:
    """
    Command to get public_ip address of host machine and endpoint domain
    Returns
    -------
    my_public_ip : str
        Ip address string of host machine.
    end_point_ip_address : str
        Ip address of endpoint domain host.
    end_point_domain : str
        domain name of endpoint.
    """
    # bash_command = """host myip.opendns.com resolver1.opendns.com | \
    #     grep "myip.opendns.com has" | awk '{print $4}'"""
    # bash_command = """curl ifconfig.co"""
    # bash_command = """curl ifconfig.me"""
    bash_command = """curl icanhazip.com"""
    my_public_ip = subprocess.getoutput(bash_command)
    my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
    end_point_domain = (
        ENDPOINT.replace("https://", "")
        .replace("http://", "")
        .replace("/", "")
    )
    end_point_ip_address = socket.gethostbyname(end_point_domain)
    return my_public_ip, end_point_ip_address, end_point_domain


def set_etc_host(ip_address: str, domain: str) -> str:
    """
    A function to write mapping of ip_address and domain name in /etc/hosts.
    Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build
    Parameters
    ----------
    ip_address : str
        IP address of the domain.
    domain : str
        domain name of endpoint.
    Returns
    -------
    str
        Message to identify success or failure of the operation.
    """
    bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
    output = subprocess.getoutput(bash_command)
    return output


if __name__ == "__main__":
    my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
    output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
    print("My public IP address:", my_public_ip)
    print("ENDPOINT public IP address:", end_point_ip_address)
    print("ENDPOINT Domain Name:", end_point_domain)
    print("Command output:", output)
You can call the above script before running your desired function :)
My situation is rather special. I tried the answers above and none of them worked. I suddenly wondered whether it had something to do with my Internet proxy. You know, I'm in mainland China, and I can't access sites like Google without a proxy. Then I turned off my Internet proxy and the problem was solved.
In my case, I am deploying some Docker containers inside the Python script and then calling one of the deployed services. The error was fixed when I added some delay before calling the service; I think it needs time to get ready to accept connections.
import requests
from time import sleep

# deploy containers
# get URL of the container
sleep(5)
response = requests.get(url, verify=False)
print(response.json())
First I ran the run.py file and then I ran the unit_test.py file; it worked for me.
Add headers for this request.
headers = {
    'Referer': 'https://itunes.apple.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
requests.get(ap, headers=headers)
I am coding a test with Gauge and I encountered this error as well. It was because I was trying to request an internal URL without activating the VPN.
