PYTHON
import requests
url = "https://REDACTED/pb/s/api/auth/login"
r = requests.post(
url,
data = {
'username': 'username',
'password': 'password'
}
)
NIM
import httpclient, json
let client = newHttpClient()
client.headers = newHttpHeaders({ "Content-Type": "application/json" })
let body = %*{
"username": "username",
"password": "password"
}
let resp = client.request("https://REDACTED.com/pb/s/api/auth/login", httpMethod = httpPOST, body = $body)
echo resp.body
I'm calling an API to get some data. Running the python code I get the traceback below. However, the nim code works perfectly so there must be something wrong with the python code or setup.
I'm running Python version 2.7.15.
requests lib version 2.19.1
Traceback (most recent call last):
File "C:/Python27/testht.py", line 21, in <module>
"Referer": "https://REDACTED.com/pb/a/"
File "C:\Python27\lib\site-packages\requests\api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests\adapters.py", line 511, in send
raise SSLError(e, request=request)
SSLError: HTTPSConnectionPool(host='REDACTED.com', port=443): Max retries exceeded with url: /pb/s/api/auth/login (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))
The requests module will verify the cert it gets from the server, much like a browser would. Rather than being able to click through and say "add exception" like you would in your browser, requests will raise that exception.
There's a way around it though: try adding verify=False to your post call.
However, the nim code works perfectly so there must be something wrong with the python code or setup.
Actually, your Python code or setup is less to blame but instead the nim code or better the defaults on the httpclient library. In the documentation for nim can be seen that httpclient.request uses a SSL context returned by getDefaultSSL by default which according to this code creates a context which does not verify the certificate:
proc getDefaultSSL(): SSLContext =
result = defaultSslContext
when defined(ssl):
if result == nil:
defaultSSLContext = newContext(verifyMode = CVerifyNone)
Your Python code instead attempts to properly verify the certificate since the requests library does this by default. And it fails to verify the certificate because something is wrong - either with your setup or the server.
It is unclear who has issued the certificate for your site but if it is not in your default CA store you can use the verify argument of requests to specify the issuer CA. See this documentation for details.
If the site you are trying to access works with the browser but fails with your program it might be that it uses a special CA which was added as trusted to the browser (like a company certificate). Browsers and Python use different trust stores so this added certificate needs to be added to Python or at least to your program as trusted too. It might also be that the setup of the server has problems. Browsers can sometimes work around problems like a missing intermediate certificate but Python doesn't. In case of a public accessible site you could use SSLLabs to check what's wrong.
Im writing a web scraping program in python using mechanize. The problem I'm having is that the website I'm scraping from limits the amount of time that you can be on the website. When I was doing everything by hand, I would use a SOCKS proxy as a work-around.
What I tried to do is go to the network preferences (Macbook Pro Retina 13', mavericks) and change to the proxy. However, the program didn't respond to that change. It kept running without the proxy.
Then I added .set_proxies() so now the code to open the website looks something like this:
b=mechanize.Browser() #open browser
b.set_proxies({"http":"96.8.113.76:8080"}) #proxy
DBJ=b.open(URL) #open url
When I ran the program, I got this error:
Traceback (most recent call last):
File "GM1.py", line 74, in <module>
DBJ=b.open(URL)
File "build/bdist.macosx-10.9-intel/egg/mechanize/_mechanize.py", line 203, in open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_mechanize.py", line 230, in _mech_open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_opener.py", line 193, in open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 344, in _open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 1142, in http_open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 1118, in do_open
urllib2.URLError: <urlopen error [Errno 54] Connection reset by peer>
Im assuming that the proxy was changed and that this error is in response to that proxy.
Maybe I am misusing .set_proxies().
Im not sure if the proxy itself is the issue or the connection is really slow.
Should I even be using SOCKS proxies for this type of thing or is there a better alternative for what I am trying to do?
Any information would be extremely helpful. Thanks in advance.
A SOCKS proxy is not the same as a HTTP proxy. The protocol between client and proxy is different. The line:
b.set_proxies({"http":"96.8.113.76:8080"})
tells mechanize to use the HTTP proxy at 96.8.113.76:8080 for requests having the http scheme in the URL, e.g. a request for URL http://httpbin.org/get will be sent via the proxy at 96.8.113.76:8080. Mechanize expects this to be a HTTP proxy server, and uses the corresponding protocol. It seems that your SOCKS proxy is closing the connection because it is not receiving a valid SOCKS proxy request (because it is a actually a HTTP proxy request).
I don't think that mechanize has builtin support for SOCKS, so you may have to resort to some dirty tricks such as those in this answer. For that you will need to install the PySocks package. This might work for you:
import socks
import socket
from mechanize import Browser
SOCKS_PROXY_HOST = '96.8.113.76'
SOCKS_PROXY_PORT = 8080
def create_connection(address, timeout=None, source_address=None):
sock = socks.socksocket()
sock.connect(address)
return sock
# add username and password arguments if proxy authentication required.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, SOCKS_PROXY_HOST, SOCKS_PROXY_PORT)
# patch the socket module
socket.socket = socks.socksocket
socket.create_connection = create_connection
br = Browser()
response = br.open('http://httpbin.org/get')
>>> print response.read()
{
"args": {},
"headers": {
"Accept-Encoding": "identity",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "Python-urllib/2.7",
"X-Request-Id": "e728cd40-002c-4f96-a26a-78ce4d651fda"
},
"origin": "192.161.1.100",
"url": "http://httpbin.org/get"
}
I'm trying to get the content of App Store > Business:
import requests
from lxml import html
page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)
flist = []
plist = []
for i in range(0, 100):
app = tree.xpath("//div[#class='column first']/ul/li/a/#href")
ap = app[0]
page1 = requests.get(ap)
When I try the range with (0,2) it works, but when I put the range in 100s it shows this error:
Traceback (most recent call last):
File "/home/preetham/Desktop/eg.py", line 17, in <module>
page1 = requests.get(ap)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
Just use requests features:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
session.get(url)
This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. backoff_factor will help to apply delays between attempts to avoid failing again in case of periodic request quota.
Take a look at urllib3.util.retry.Retry, it has many options to simplify retries.
What happened here is that itunes server refuses your connection (you're sending too many requests from same ip address in short period of time)
Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8
error trace is misleading it should be something like "No connection could be made because the target machine actively refused it".
There is an issue at about python.requests lib at Github, check it out here
To overcome this issue (not so much an issue as it is misleading debug trace) you should catch connection related exceptions like so:
try:
page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
r.status_code = "Connection refused"
Another way to overcome this problem is if you use enough time gap to send requests to server this can be achieved by sleep(timeinsec) function in python (don't forget to import sleep)
from time import sleep
All in all requests is awesome python lib, hope that solves your problem.
Just do this,
Paste the following code in place of page = requests.get(url):
import time
page = ''
while page == '':
try:
page = requests.get(url)
break
except:
print("Connection refused by the server..")
print("Let me sleep for 5 seconds")
print("ZZzzzz...")
time.sleep(5)
print("Was a nice sleep, now let me continue...")
continue
You're welcome :)
I got similar problem but the following code worked for me.
url = <some REST url>
page = requests.get(url, verify=False)
"verify=False" disables SSL verification. Try and catch can be added as usual.
pip install pyopenssl seemed to solve it for me.
https://github.com/requests/requests/issues/4246
Specifying the proxy in a corporate environment solved it for me.
page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})
The full error is:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
It is always good to implement exception handling. It does not only help to avoid unexpected exit of script but can also help to log errors and info notification. When using Python requests I prefer to catch exceptions like this:
try:
res = requests.get(adress,timeout=30)
except requests.ConnectionError as e:
print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
print(str(e))
renewIPadress()
continue
except requests.Timeout as e:
print("OOPS!! Timeout Error")
print(str(e))
renewIPadress()
continue
except requests.RequestException as e:
print("OOPS!! General Error")
print(str(e))
renewIPadress()
continue
except KeyboardInterrupt:
print("Someone closed the program")
Here renewIPadress() is a user define function which can change the IP address if it get blocked. You can go without this function.
Adding my own experience for those who are experiencing this in the future. My specific error was
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'
It turns out that this was actually because I had reach the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
When I was writing a selenium browser test script, I encountered this error when calling driver.quit() before a usage of a JS api call.Remember that quiting webdriver is last thing to do!
i wasn't able to make it work on windows even after installing pyopenssl and trying various python versions (while it worked fine on mac), so i switched to urllib and it works on python 3.6 (from python .org) and 3.7 (anaconda)
import urllib
from urllib.request import urlopen
html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()
print(contents)
just import time
and add :
time.sleep(6)
somewhere in the for loop, to avoid sending too many request to the server in a short time.
the number 6 means: 6 seconds.
keep testing numbers starting from 1, until you reach the minimum seconds that will help to avoid the problem.
It could be network config issue also. So, for that u need to re-config ur network confgurations.
for Ubuntu :
sudo vim /etc/network/interfaces
add 8.8.8.8 in dns-nameserver and save it.
reset ur network : /etc/init.d/networking restart
Now try..
Adding my own experience :
r = requests.get(download_url)
when I tried to download a file specified in the url.
The error was
HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I corrected it by adding verify = False in the function as follows :
r = requests.get(download_url + filename)
open(filename, 'wb').write(r.content)
Check your network connection. I had this and the VM did not have a proper network connection.
I had the same error when I run the route in the browser, but in postman, it works fine. It issue with mine was that, there was no / after the route before the query string.
127.0.0.1:5000/api/v1/search/?location=Madina raise the error and removing / after the search worked for me.
This happens when you send too many requests to the public IP address of https://itunes.apple.com. It as you can see caused due to some reason which does not allow/block access to the public IP address mapping with https://itunes.apple.com. One better solution is the following python script which calculates the public IP address of any domain and creates that mapping to the /etc/hosts file.
import re
import socket
import subprocess
from typing import Tuple
ENDPOINT = 'https://anydomainname.example.com/'
ENDPOINT = 'https://itunes.apple.com/'
def get_public_ip() -> Tuple[str, str, str]:
"""
Command to get public_ip address of host machine and endpoint domain
Returns
-------
my_public_ip : str
Ip address string of host machine.
end_point_ip_address : str
Ip address of endpoint domain host.
end_point_domain : str
domain name of endpoint.
"""
# bash_command = """host myip.opendns.com resolver1.opendns.com | \
# grep "myip.opendns.com has" | awk '{print $4}'"""
# bash_command = """curl ifconfig.co"""
# bash_command = """curl ifconfig.me"""
bash_command = """ curl icanhazip.com"""
my_public_ip = subprocess.getoutput(bash_command)
my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
end_point_domain = (
ENDPOINT.replace("https://", "")
.replace("http://", "")
.replace("/", "")
)
end_point_ip_address = socket.gethostbyname(end_point_domain)
return my_public_ip, end_point_ip_address, end_point_domain
def set_etc_host(ip_address: str, domain: str) -> str:
"""
A function to write mapping of ip_address and domain name in /etc/hosts.
Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build
Parameters
----------
ip_address : str
IP address of the domain.
domain : str
domain name of endpoint.
Returns
-------
str
Message to identify success or failure of the operation.
"""
bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
output = subprocess.getoutput(bash_command)
return output
if __name__ == "__main__":
my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
print("My public IP address:", my_public_ip)
print("ENDPOINT public IP address:", end_point_ip_address)
print("ENDPOINT Domain Name:", end_point_domain )
print("Command output:", output)
You can call the above script before running your desired function :)
My situation is rather special. I tried the answers above, none of them worked. I suddenly thought whether it has something to do with my Internet proxy? You know, I'm in mainland China, and I can't access sites like google without an internet proxy. Then I turned off my Internet proxy and the problem was solved.
In my case, I am deploying some docker containers inside the python script and then calling one of the deployed services. Error is fixed when I add some delay before calling the service. I think it needs time to get ready to accept connections.
from time import sleep
#deploy containers
#get URL of the container
sleep(5)
response = requests.get(url,verify=False)
print(response.json())
First I ran the run.py file and then I ran the unit_test.py file, it works for me
Add headers for this request.
headers={
'Referer': 'https://itunes.apple.com',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
requests.get(ap, headers=headers)
I am coding a test with Gauge and I encountered this error as well, it was because I was trying to request an internal URL without activating VPN.
I have a pile of tasks to automate within cPanel. There is a cPanel API described at http://videos.cpanel.net/cpanel-api-automation/ but I tried what I thought was easier for me...
Based on an answer from skyronic at How do I send a HTTP POST value to a (PHP) page using Python? I tried
import urllib, urllib2, ssl
url = 'https://mysite.com:2083/login'
user_agent = 'Mozilla/5.0 meridia (Windows NT 5.1; U; en)'
values = {'name':cpaneluser,
'pass':cpanelpw}
headers = {'User-Agent':user_agent}
data = urllib.urlencode(values)
req = urllib2.Request(url,data,headers)
response = urllib2.urlopen(req)
page = response.read()
The call to urlopen() is raising NameError: global name 'HTTPSConnectionV3' is not defined.
So then based on http://bugs.python.org/issue11220 I tried preceding the code above with
import httplib
class HTTPSConnectionV3(httplib.HTTPSConnection):
def __init__(self, *args, **kwargs):
httplib.HTTPSConnection.__init__(self, *args, **kwargs)
def connect(self):
sock = socket.create_connection((self.host, self.port), self.timeout)
if self._tunnel_host:
self.sock = sock
self._tunnel()
try:
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file, \
ssl_version=ssl.PROTOCOL_SSLv3)
except ssl.SSLError, e:
print("Trying SSLv3.")
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file, \
ssl_version=ssl.PROTOCOL_SSLv23)
class HTTPSHandlerV3(urllib2.HTTPSHandler):
def https_open(self, req):
return self.do_open(HTTPSConnectionV3, req)
urllib2.install_opener(urllib2.build_opener(HTTPSHandlerV3()))
This does print the "Trying SSLv3" and raises URLError: <urlopen error [Errno 1] _ssl.c:504: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol>
And finally that led me to https://github.com/kennethreitz/requests/issues/606 where gregakespret who say he solved a similar problem using a solution from Senthil Kuaran at http://bugs.python.org/issue11220 :
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
But that raises AttributeError: 'module' object has no attribute 'request'. And indeed help(urllib) doesn't include any mention of request, and import urllib.request results in No module named request.
I'm using Python 2.7.3 within the Enthought Canopy distribution. The cPanel site is using a self-signed certificate, which I mention since it'sa an irregularity that would trip up a regular browser, though I gather that urllib and urllib2 don't actually authenticate the certificate anyway.
Thank you for reading, more so if you have a suggestion or can help me understand the problem.
I would use the requests library.
I'm the OP and it's been a while since I posted this. I've solved other related tasks (POST to use Instructure's Canvas API) using the requests library and found that code that had worked with urllib/urllib2 is now much shorter and sweeter.
Someone just upvoted this question, causing me to see that noone had answered it. My answer isn't much of one to my OP, but it is the direction I'd advise, having solved related problems since the post.
As for this question, I solved the problem by scripting in BASH on the server that had cPanel running. It was just a matter of identifying which cPanel scripts to call. But I did not get it running through the cPanel Web API.