Failed to establish a new connection - python

import requests
url = 'http://ES_search_demo.com/document/record/_search?pretty=true'
data = '{"query":{"bool":{"must":[{"text":{"record.document":"SOME_JOURNAL"}},{"text":{"record.articleTitle":"farmers"}}],"must_not":[],"should":[]}},"from":0,"size":50,"sort":[],"facets":{}}'
response = requests.get(url, data=data)
When I run this code I get this error:
Traceback (most recent call last):
File "simpleclient.py", line 6, in <module>
response = requests.get(url, data=data)
File "/home/ryan/local/lib/python2.7/site-packages/requests/api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "/home/ryan/local/lib/python2.7/site-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/home/ryan/local/lib/python2.7/site-packages/requests/sessions.py", line 471, in request
resp = self.send(prep, **send_kwargs)
File "/home/ryan/local/lib/python2.7/site-packages/requests/sessions.py", line 581, in send
r = adapter.send(request, **kwargs)
File "/home/ryan/local/lib/python2.7/site-packages/requests/adapters.py", line 481, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='es_search_demo.com', port=80): Max retries exceeded with url: /document/record/_search?pretty=true (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f544af9a5d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f8f95d196d0>: Failed to establish a new connection: [Errno -5] No address associated with hostname',))
It means the Domain Name System (DNS) server your computer queries does not know about the given host.
However, the domain name does resolve to a valid IP address if we check for it:
$ nslookup mmsdemo.cloudapp.net
(...)
Non-authoritative answer:
Name: mmsdemo.cloudapp.net
Address: 40.113.108.105
So most probably the error means that there is no connection to the internet. Check that the internet connection is set up properly on the machine that runs your code, and more specifically that Python can make use of it.
Update
My answer refers to the original question, where a different domain was mentioned. The domain now referenced in the question is unknown:
$ nslookup es_search_demo.com
(...)
** server can't find es_search_demo.com: NXDOMAIN

It seems your DNS server can't find an address associated with the hostname you are passing.
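You can reproduce the distinction from Python itself; a minimal sketch using the two hostnames from this answer:
import socket

# a resolvable name prints its address; an unknown one raises
# socket.gaierror, which requests wraps in a NewConnectionError
for host in ("mmsdemo.cloudapp.net", "es_search_demo.com"):
    try:
        print("%s -> %s" % (host, socket.gethostbyname(host)))
    except socket.gaierror as err:
        print("%s -> DNS lookup failed: %s" % (host, err))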

Related

TimeoutError: Large amount of data in requests python

I am trying to make a script that scrapes a presentation from a SlideShare link and downloads it as a PDF.
The script works fine as long as the total number of slides is under 20. Is there an alternative to requests in Python that can do the job?
Here is the script:
import requests
from bs4 import BeautifulSoup
from PIL import Image
import io

URL_LESS = "https://www.slideshare.net/angelucmex/global-warming-2373190?qid=8f04572c-48df-4f53-b2b0-0eb71021931c&v=&b=&from_search=1"
URL = "https://www.slideshare.net/tusharpanda88/python-basics-59573634?qid=03cb80ee-36f0-4241-a516-454ad64808a8&v=&b=&from_search=5"

r = requests.get(URL_LESS)
soup = BeautifulSoup(r.content, "html5lib")

# take the first URL out of each slide image's srcset attribute
imgs = soup.find_all('img', class_="slide-image")
imgSRC = [x.get("srcset").split(',')[0].strip().split(' ')[0].split('?')[0] for x in imgs]

imagesJPG = []
for img in imgSRC:
    # download each slide image and decode it with Pillow
    im = requests.get(img)
    f = io.BytesIO(im.content)
    imgJPG = Image.open(f)
    imagesJPG.append(imgJPG)

# write all slides into a single PDF
imagesJPG[0].save(f"{soup.title.string}.pdf", save_all=True, append_images=imagesJPG[1:])
Try changing URL_LESS to URL and you will get the idea.
Here is the traceback:
Traceback (most recent call last):
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
raise err
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 358, in connect
conn = self._new_conn()
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x00000259643FF820>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\adapters.py", line 440, in send
resp = conn.urlopen(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='image.slidesharecdn.com', port=443): Max retries exceeded with url: /pythonbasics-160315100530/85/python-basics-8-320.jpg (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000259643FF820>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\Work\py\scrapingScripts\slideshare\main.py", line 16, in <module>
im = requests.get(img)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='image.slidesharecdn.com', port=443): Max retries exceeded with url: /pythonbasics-160315100530/85/python-basics-8-320.jpg (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000259643FF820>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did
not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
The script worked perfectly for me with both URL and URL_LESS, so your internet connection might be the culprit here.
My guesses are:
You have a slow or inconsistent internet connection.
SlideShare is blacklisting your IP or user agent, perhaps for DDoS protection (unlikely).
You're using IPv6, which has been the culprit in these kinds of cases for me; try switching your network to IPv4 only.
As for requests, I have personally used it to scrape a fairly large amount of data over a fairly long time, so I can say it's an excellent library. If the failures are transient, wiring retries into it may help; see the sketch below.
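A minimal sketch of adding automatic retries with backoff, using a requests Session with urllib3's Retry (the retry counts, timeout, and example image URL are assumptions, not values from the question):
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# retry transient failures up to 5 times with exponential backoff
session = requests.Session()
retry = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retry))

# drop-in replacement for requests.get(img) inside the download loop;
# the URL here is only a placeholder
im = session.get("https://image.slidesharecdn.com/some-slide.jpg", timeout=30)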

Webscraping with Python3.7: ConnectionError: HTTPSConnectionPool(host='www.google.com', port=443):

I want to scrape web results from google.com. I followed the first answer to this question: Google Search Web Scraping with Python. Unfortunately I am getting a connection error. I checked other websites too; they do not connect either. Is it because of the corporate proxy settings?
Please note that I am using the virtual env "Webscraping".
from urllib.parse import urlencode, urlparse, parse_qs
from lxml.html import fromstring
from requests import get

raw = get("https://www.google.com/search?q=StackOverflow").text
page = fromstring(raw)
for result in page.cssselect(".r a"):
    url = result.get("href")
    if url.startswith("/url?"):
        url = parse_qs(urlparse(url).query)['q']
        print(url[0])
raw = get("https://www.google.com/search?q=StackOverflow").text
Traceback (most recent call last):
File "", line 1, in
raw = get("https://www.google.com/search?q=StackOverflow").text
File
"c:\users\appdata\local\programs\python\python37\webscraping\lib\site-packages\requests\api.py",
line 75, in get
return request('get', url, params=params, **kwargs)
File
"c:\users\appdata\local\programs\python\python37\webscraping\lib\site-packages\requests\api.py",
line 60, in request
return session.request(method=method, url=url, **kwargs)
File
"c:\users\appdata\local\programs\python\python37\webscraping\lib\site-packages\requests\sessions.py",
line 524, in request
resp = self.send(prep, **send_kwargs)
File
"c:\users\appdata\local\programs\python\python37\webscraping\lib\site-packages\requests\sessions.py",
line 637, in send
r = adapter.send(request, **kwargs)
File
"c:\users\appdata\local\programs\python\python37\webscraping\lib\site-packages\requests\adapters.py",
line 516, in send
raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='www.google.com', port=443):
Max retries exceeded with url: /search?q=StackOverflow (Caused by
NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object
at 0x0000021B79768748>: Failed to establish a new connection:
[WinError 10060] A connection attempt failed because the connected
party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond'))
Please advise. Thanks
EDIT: I tried pinging google.com, and it fails.
import os
hostname = "https://www.google.com" #example
response = os.system("ping -c 1 " + hostname)
#and then check the response...
if response == 0:
    print(hostname, 'is up!')
else:
    print(hostname, 'is down!')
https://www.google.com is down!
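Note that ping expects a bare hostname rather than a URL, and the -c flag is Linux-specific (Windows uses -n), so the check above reports "down" regardless of connectivity. A more portable sketch using a plain TCP connection (the hostname and port are just examples):
import socket

# a TCP connection to port 443 proves the name resolves and the host
# answers, without any platform-specific ping flags
try:
    socket.create_connection(("www.google.com", 443), timeout=5).close()
    print("www.google.com is up!")
except OSError as e:
    print("www.google.com is down:", e)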
I think you are getting this error because of your proxy settings.
Try running one of the following commands in a command prompt:
set http_proxy=http://proxy_address:port
set http_proxy=http://user:password@proxy_address:port
set https_proxy=https://proxy_address:port
set https_proxy=https://user:password@proxy_address:port
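Alternatively, the proxy can be passed to requests directly instead of through environment variables; a sketch, with proxy_address:port standing in for your corporate proxy:
import requests

# route the request through an explicit proxy rather than relying on
# the http_proxy/https_proxy environment variables
proxies = {
    "http": "http://proxy_address:port",
    "https": "http://proxy_address:port",
}
raw = requests.get("https://www.google.com/search?q=StackOverflow", proxies=proxies).text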

Name or service not known error in EasyWebDav for Python

I'm trying to set up a WebDAV connection using easywebdav in Python (using 2.7.8 for now).
import csv, easywebdav
webdav = easywebdav.connect('https://sakai.rutgers.edu/dav/restoftheurl', username='', password='')
print webdav.ls()
When I run this I get the following error message. My guess is that it has something to do with the URL using HTTPS?
Traceback (most recent call last):
File "/home/willkara/Development/SakaiStuff/WorkProjects/sakai-manager/file.py", line 4, in <module>
print webdav.ls()
File "build/bdist.linux-x86_64/egg/easywebdav/client.py", line 176, in ls
File "build/bdist.linux-x86_64/egg/easywebdav/client.py", line 97, in _send
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 456, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 559, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 375, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='https', port=80): Max retries exceeded with url: //sakai.rutgers.edu/dav/url:80/. (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
[Finished in 0.1s with exit code 1]
I find it strange that you combine the HTTPS protocol with port 80; HTTPS uses port 443.
The error message "Name or service not known" would rather indicate that the hostname sakai.rutgers.edu is not recognized on your system, though. Try pinging the host.
I noticed that you shouldn't have http:// or https:// at the beginning of your address, only the host name; you select the protocol with protocol='https'. Also, I couldn't get it to work when I added the path to the URL; I had to pass it as an argument to the operations instead, like webdav.ls('/dav/restoftheurl') or webdav.cd('/dav/restoftheurl').
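Putting that together, a sketch of the corrected call (the credentials are placeholders):
import easywebdav

# host name only; the scheme goes in protocol=, and the path is passed
# to each operation instead of being embedded in the host
webdav = easywebdav.connect('sakai.rutgers.edu',
                            protocol='https',
                            username='your_username',
                            password='your_password')
print webdav.ls('/dav/restoftheurl')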

Get content type of requested url using python

I am trying to get the headers of a URL in Python, following this tutorial: http://docs.python-requests.org/en/latest/. When I run the following code in the Python IDLE shell, I get this error:
>>> import requests
>>> r = requests.get('https://api.github.com/user')
Traceback (most recent call last):
File "<pyshell#32>", line 1, in <module>
r = requests.get('https://api.github.com/user')
File "C:\Python27\lib\site-packages\requests-2.3.0-py2.7.egg\requests\api.py", line 55, in get
return request('get', url, **kwargs)
File "C:\Python27\lib\site-packages\requests-2.3.0-py2.7.egg\requests\api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests-2.3.0-py2.7.egg\requests\sessions.py", line 456, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests-2.3.0-py2.7.egg\requests\sessions.py", line 559, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests-2.3.0-py2.7.egg\requests\adapters.py", line 375, in send
raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /user (Caused by <class 'socket.error'>: [Errno 10013] An attempt was made to access a socket in a way forbidden by its access permissions)
It looks like GitHub is denying you access to the requested page. Before attempting to request pages in Python, try typing them into the browser to see what is returned. When I did this, I was returned some JSON stating:
{
    "message": "Requires authentication",
    "documentation_url": "https://developer.github.com/v3"
}
If you want to test your code and inspect the headers of a webpage, try a publicly accessible webpage before delving into APIs.
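For example, a quick sketch against a public page (the URL is just an example):
import requests

# a publicly accessible page needs no authentication
r = requests.get('http://www.example.com')
print r.status_code
print r.headers['content-type']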

What does this Python requests error mean?

What does this Python requests error mean? Does it mean the client tried to connect to the server and couldn't? What does [Errno 8] nodename nor servname provided, or not known mean?
File "python2.7/site-packages/requests/api.py", line 55, in get
File "python2.7/site-packages/requests/api.py", line 44, in request
File "python2.7/site-packages/requests/sessions.py", line 279, in
request
File "python2.7/site-packages/requests/sessions.py", line 374, in
send
File "python2.7/site-packages/requests/adapters.py", line 209, in
send
ConnectionError: HTTPConnectionPool(host='localhost', port=8091): Max
retries exceeded with url: /pools/default (Caused by : [Errno 8] nodename nor servname provided, or not
known
The code generated this: http://github.com...
class RestConnection(object):
    def __init__(self, serverInfo):
        #serverInfo can be a json object
        if isinstance(serverInfo, dict):
            self.ip = serverInfo["ip"]
            self.username = serverInfo["username"]
            self.password = serverInfo["password"]
            self.port = serverInfo["port"]
            self.couch_api_base = serverInfo.get("couchApiBase")
        else:
            self.ip = serverInfo.ip
            self.username = serverInfo.rest_username
            self.password = serverInfo.rest_password
            self.port = serverInfo.port
            self.couch_api_base = None
        self.base_url = "http://{0}:{1}".format(self.ip, self.port)
        server_config_uri = ''.join([self.base_url, '/pools/default'])
        self.config = requests.get(server_config_uri).json()
        # if couchApiBase is not set earlier, let's look it up
        if self.couch_api_base is None:
            #couchApiBase is not in node config before Couchbase Server 2.0
            self.couch_api_base = self.config["nodes"][0].get("couchApiBase")
FYI, this error is generated by the Couchbase Python client.
It means the DNS lookup of the host name you're trying to access failed. For example, I get the same error if I try to look up an invalid host name:
>>> import requests
>>> requests.get('http://apsoapsodjaopisjdaoij.com')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 65, in get
return request('get', url, **kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/safe_mode.py", line 39, in wrapped
return function(method, url, **kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 51, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/sessions.py", line 241, in request
r.send(prefetch=prefetch)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/models.py", line 631, in send
raise ConnectionError(sockerr)
requests.exceptions.ConnectionError: [Errno 8] nodename nor servname provided, or not known
Check the value of serverInfo["ip"] and make sure it's set correctly.
Adding my own experience for those who run into this in the future.
It turns out that this was actually because I had reached the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
I was getting this error when trying to connect to .onion sites, and it turns out all I had to do was use the socks5h scheme (note the trailing "h") for my SOCKS proxies, like this:
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}
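The h suffix makes the proxy perform the DNS resolution, which .onion names require. A usage sketch (the .onion URL below is only a placeholder):
import requests  # SOCKS support needs: pip install requests[socks]

# the .onion name is resolved by the Tor proxy, not locally
r = requests.get('http://exampleonionsitexyz.onion/', proxies=proxies)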
This might not be your solution, but I am posting this answer in case someone with the same problem comes looking for it here.
Incorrect DNS can also be the cause, e.g. when using a Docker service and calling http://host.docker.internal:8000 instead of http://localhost:8000 when you are not on the Docker service itself.
If you are here because the Twitter API gave this error, try regenerating your Bearer token and it will work fine.
