Python client request fails with default timeout

The following request from a Python client to Elasticsearch fails:
2014-12-19 13:39:05,429 WARNING GET http://10.129.0.53:9200/delivery-logs-index.prod-20141218/_search?timeout=20m [status:N/A request:10.010s]
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/elasticsearch/connection/http_urllib3.py", line 46, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=headers, **kw)
File "/usr/lib/python2.6/site-packages/urllib3/connectionpool.py", line 559, in urlopen
_pool=self, _stacktrace=stacktrace)
File "/usr/lib/python2.6/site-packages/urllib3/util/retry.py", line 223, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python2.6/site-packages/urllib3/connectionpool.py", line 516, in urlopen
body=body, headers=headers)
File "/usr/lib/python2.6/site-packages/urllib3/connectionpool.py", line 336, in _make_request
self, url, "Read timed out. (read timeout=%s)" % read_timeout)
ReadTimeoutError: HTTPConnectionPool(host=u'10.129.0.53', port=9200): Read timed out. (read timeout=10)
The client is created like this:
Elasticsearch([es_host],
              sniff_on_start=True,
              max_retries=100,
              retry_on_timeout=True,
              sniff_on_connection_fail=True,
              sniff_timeout=1000)
Is there a way to increase the request timeout? Currently it seems to default to read timeout=10.

You can try passing a request_timeout with your request, for example:
res = client.search(index=blabla, search_type="count", timeout="20m", request_timeout=10000, body={...})
Note that timeout="20m" is the query timeout enforced server-side by Elasticsearch, while request_timeout is the client-side HTTP timeout, passed as a number of seconds.

You can also pass timeout=60 when instantiating the client object (60 meaning 60 seconds and of course being only an example).
This parameter overrides the 10s default specified in the Connection constructor.
https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/connection/base.py#L27
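As a rough sketch of how the two timeouts combine (reusing the host and index from the question; the kwargs are the standard elasticsearch-py ones):

from elasticsearch import Elasticsearch

# timeout=60 raises the default client-side read timeout from 10s to 60s
client = Elasticsearch(['10.129.0.53'],
                       timeout=60,
                       max_retries=100,
                       retry_on_timeout=True)

# request_timeout overrides the client default for this single call;
# timeout="20m" is the query timeout enforced by Elasticsearch itself
res = client.search(index='delivery-logs-index.prod-20141218',
                    timeout="20m",
                    request_timeout=120,
                    body={"query": {"match_all": {}}})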

Related

getting a meaningful exception with Python Retry

I'm making a request to an API and want to retry if I get a 500. Alright, simple: I just use this solution (which you can also find on SO) and it works wonders:
Create this function in the beginning:
import requests
from requests.adapters import HTTPAdapter, Retry
def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
And later use it like so:
s = requests_retry_session()
response = s.get('http://httpbin.org')
I also like this method because it keeps the main part of my code clear of try/except blocks, and readability is important to me.
This method works, retrying a failed request up to the number of retries you set.
When testing the solution against http://httpbin.org/status/500 (a URL that always returns a 500 response),
I get this exception:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\connectionpool.py", line 878, in urlopen
return self.urlopen(
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\connectionpool.py", line 878, in urlopen
return self.urlopen(
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\connectionpool.py", line 878, in urlopen
return self.urlopen(
[Previous line repeated 3 more times]
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\connectionpool.py", line 868, in urlopen
retries = retries.increment(method, url, response=response, _pool=self)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /status/500 (Caused by ResponseError('too many 500 error responses'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/user/Downloads/request_retry.py", line 25, in <module>
r = s.get('http://httpbin.org/status/500')
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\requests\adapters.py", line 556, in send
raise RetryError(e, request=request)
requests.exceptions.RetryError: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /status/500 (Caused by ResponseError('too many 500 error responses'))
My expectation was to get the exception you would get if you ran this code:
import requests
response = requests.get('http://httpbin.org/status/500')
response.raise_for_status()
(Since making a normal get request doesn't raise an exception I would use response.raise_for_status() to raise the expected exception.)
Doing that, you get this small and very readable exception (which is what I'm looking for):
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://httpbin.org/status/500
Could someone help me make sense out of these exceptions and get normal ones?
The link you provided as the working solution itself explains this. You are getting a MaxRetryError, which, as the message shows, is (Caused by ResponseError('too many 500 error responses')).
As it shows, you still need to catch the exception and handle it as you wish:
try:
    response = requests_retry_session().get(
        'http://httpbin.org/status/500',
    )
except Exception as x:
    print('It failed :(', x.__class__.__name__)  # here comes your code
else:
    print('It eventually worked', response.status_code)
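If you specifically want the short HTTPError-style message, one sketch (the printed wording is illustrative) is to catch RetryError separately and call raise_for_status() on any response that does come back:

import requests
from requests.exceptions import HTTPError, RetryError

s = requests_retry_session()
try:
    response = s.get('http://httpbin.org/status/500')
    response.raise_for_status()  # turns any remaining 4xx/5xx into an HTTPError
except RetryError as e:
    # the adapter exhausted its retries on 5xx responses
    print('Giving up after retries:', e)
except HTTPError as e:
    print('Server returned an error:', e)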

problems downloading large files with requests?

I'm trying to download a video file using an API. The equivalent curl command works without problems, and the Python code below works without error for small videos:
with requests.get("http://username:password#url/Download/", data=data, stream=True) as r:
    r.raise_for_status()
    with open("deliverables/video_output34.mp4", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)
It fails for large videos (one of ~34 MB failed), although the equivalent curl command works for that one too:
Traceback (most recent call last):
File "/home/nabil/.local/lib/python3.7/site-packages/requests/adapters.py", line 479, in send
r = low_conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nabil/.local/lib/python3.7/site-packages/requests/adapters.py", line 482, in send
r = low_conn.getresponse()
File "/usr/local/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/usr/local/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.7/http/client.py", line 265, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nabil/.local/lib/python3.7/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: Remote end closed connection without response
I've checked similar links and questions without success.
Thanks to SilentGhost on IRC #python, who pointed out that I should upgrade requests; that solved it (going from 2.22.0 to 2.24.0).
Upgrading the package is done like this:
pip install requests --upgrade
Another option that may help someone looking at this question is pycurl; a good starting point is https://github.com/rajatkhanduja/PyCurl-Downloader
You can also pass --libcurl to your curl command to get a good indication of how to use pycurl.
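For illustration, a minimal pycurl download sketch (the URL and credentials are placeholders from the question; error handling is omitted):

import pycurl

# stream the response body straight to a file as it arrives
with open("deliverables/video_output34.mp4", "wb") as f:
    c = pycurl.Curl()
    c.setopt(c.URL, "http://url/Download/")
    c.setopt(c.USERPWD, "username:password")  # HTTP basic auth
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()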

2captcha selenium out of range

I'm trying to implement 2captcha using Selenium with Python.
I just copied the example from their documentation:
https://github.com/2captcha/2captcha-api-examples/blob/master/ReCaptcha%20v2%20API%20Examples/Python%20Example/2captcha_python_api_example.py
This is my code:
from selenium import webdriver
from time import sleep
from selenium.webdriver.support.select import Select
import requests

driver = webdriver.Chrome('chromedriver.exe')
driver.get('the_url')
current_url = driver.current_url
captcha = driver.find_element_by_id("captcha-box")
captcha2 = captcha.find_element_by_xpath("//div/div/iframe").get_attribute("src")
captcha3 = captcha2.split('=')
#print(captcha3[2])

# Add these values
API_KEY = 'my_api_key'  # Your 2captcha API KEY
site_key = captcha3[2]  # site-key, read the 2captcha docs on how to get this
url = current_url  # example url
proxy = 'Myproxy'  # example proxy
proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}

s = requests.Session()

# here we post the site key to 2captcha to get the captcha ID (and we parse it here too)
captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url), proxies=proxy).text.split('|')[1]

# then we parse gresponse from the 2captcha response
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
print("solving ref captcha...")
while 'CAPCHA_NOT_READY' in recaptcha_answer:
    sleep(5)
    recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
recaptcha_answer = recaptcha_answer.split('|')[1]

# we make the payload for the post data here, use something like mitmproxy or fiddler to see what is needed
payload = {
    'key': 'value',
    'gresponse': recaptcha_answer  # the response from 2captcha, needed for the post request to go through
}

# then send the post request to the url
response = s.post(url, payload, proxies=proxy)
# And that's all there is to it, other than scraping data from the website, which is dynamic for every website.
This is my error:
solving ref captcha...
Traceback (most recent call last):
File "main.py", line 38, in
recaptcha_answer = recaptcha_answer.split('|')[1]
IndexError: list index out of range
The captcha is getting solved, because I can see it on the 2captcha dashboard, so where is the error if this is from the official documentation?
EDIT:
Sometimes, without any modification, I get the captcha solved by 2captcha, but then I get this error:
solving ref captcha...
OK|this_is_the_2captch_answer
Traceback (most recent call last):
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
self._prepare_proxy(conn)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
conn.connect()
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
self._tunnel()
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
(version, code, message) = response._read_status()
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: <html>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\retry.py", line 368, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\packages\six.py", line 685, in reraise
raise value.with_traceback(tb)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
self._prepare_proxy(conn)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
conn.connect()
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
self._tunnel()
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
(version, code, message) = response._read_status()
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
raise BadStatusLine(line)
urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine('<html>\r\n'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 49, in <module>
response = s.post(url, payload, proxies=proxy)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 581, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine('<html>\r\n'))
Why am I getting this error?
I'm setting site_key = current_url_where_captcha_is_located.
Is this correct?
Use your debugger or put a print(recaptcha_answer) before the failing line to see the value of recaptcha_answer before you call .split('|') on it. There is no | in the string, so when you try to take the second element of the resulting list with [1], it fails.
It looks like you don't provide any valid proxy connection parameters but are still passing this proxy to requests when connecting to the API.
Just comment these two lines:
#proxy = 'Myproxy' # example proxy
#proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}
Then remove proxies=proxy from the four requests calls, leaving:
captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url)).text.split('|')[1]
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
response = s.post(url, payload)
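For reference, a more defensive version of the polling loop (a sketch reusing API_KEY and captcha_id from the code above; treating every reply other than CAPCHA_NOT_READY and OK|... as an error is an assumption about the API's plain-text responses):

from time import sleep
import requests

s = requests.Session()

# poll res.php until the answer is ready: 2captcha replies "CAPCHA_NOT_READY"
# while it is still solving and "OK|<token>" on success
while True:
    answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
    if answer == 'CAPCHA_NOT_READY':
        sleep(5)
        continue
    if answer.startswith('OK|'):
        recaptcha_answer = answer.split('|', 1)[1]
        break
    # anything else (e.g. an ERROR_* code) contains no '|', so splitting
    # blindly would raise the IndexError from the question; fail loudly instead
    raise RuntimeError('2captcha error: ' + answer)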

Python requests.exceptions.ChunkedEncodingError

I am writing a Python script that performs the steps below.
Query a MongoDB database
Parse and aggregate results
Upload data to a ServiceNow table via a REST API
This works most of the time but occasionally I see this error:
requests.exceptions.ChunkedEncodingError: ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))
This error stops the script and prevents the entire data set from being captured.
What can I do to alleviate this issue?
Python 2.7.5
Code
#!/usr/bin/env python
from config import *
import os, sys
mypath = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(mypath, "api-python-client"))
from apiclient.mongo import *
from pymongo import MongoClient
import json
import requests
from bson.json_util import dumps

client = MongoClient(mongo_uri)

#Create ServiceNow URL
svcnow_url = create_svcnow_url('u_imp_cmps')

#BITSDB Nmap Collection
db = client[mongo_db]

#Aggregate - RDBMS equivalent to Alias select x as y
#Rename fields to match ServiceNow field names
computers = db['computer'].aggregate([
    {"$unwind": "$hostnames"},
    {"$project": {
        "_id": 0,
        "u_hostname": "$hostnames.name",
        "u_ipv4": "$addresses.ipv4",
        "u_status": "$status.state",
        "u_updated_timestamp": "$last_seen"
    }}
])

j = dumps({"records": computers})
#print(j)

#Set proper headers
headers = {"Content-Type": "application/json", "Accept": "application/json"}

#Build HTTP Request
response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers, data=j)

#Check for HTTP codes other than 200
if response.status_code != 200:
    print('Status:', response.status_code, 'Headers:', response.headers, 'Response Text', response.text, 'Error Response:', response.json())
    exit()

#Decode the JSON response into a dictionary and use the data
print('Status:', response.status_code, 'Headers:', response.headers, 'Response:', response.json())
Error
Traceback (most recent call last):
File "/usr/src/computer_pingable_import.py", line 50, in <module>
response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers ,data=j)
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 107, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 608, in send
r.content
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 737, in content
self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 663, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: error(104, 'Connection reset by peer')", error(104, 'Connection reset by peer'))
Another attempt and a slightly different error message:
Traceback (most recent call last):
File "/usr/src/computer_pingable_import.py", line 50, in <module>
response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers ,data=j)
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 107, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 426, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))
ServiceNow prevents REST transactions from running for longer than 60 seconds.
I'm not sure how large your dataset is, but you will want to chunk the data into smaller pieces to ensure each transaction always completes.
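A minimal sketch of that batching idea, reusing the names from the script above (the batch size of 500 is an arbitrary assumption, and the aggregation result is assumed to be an iterable of documents):

def post_in_batches(records, batch_size=500):
    # send the records in small batches so each ServiceNow
    # transaction stays well under the 60-second limit
    for i in range(0, len(records), batch_size):
        batch = dumps({"records": records[i:i + batch_size]})
        response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd),
                                 headers=headers, data=batch)
        if response.status_code != 200:
            print('Batch starting at record %d failed:' % i, response.text)

# materialize the cursor so it can be sliced
post_in_batches(list(computers))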

Timing out when I use a proxy

I am trying to implement proxies in my web crawler. Without the proxies, my code has no problem connecting to the website; however, when I try to add in proxies, it suddenly won't connect! It doesn't look like anybody has posted about this problem with python-requests, so I'm hoping you all can help me!
Background info: I'm using a Mac and using Anaconda's Python 3.4 inside of a virtual environment.
Here is my code, which works without proxies:
import requests
from bs4 import BeautifulSoup

proxyDict = {'http': 'http://10.10.1.10:3128'}

def pmc_spider(max_pages, pmid):
    start = 1
    titles_list = []
    url_list = []
    url_keys = []
    authors_list = []  # never populated in this snippet, but returned below
    while start <= max_pages:
        url = 'http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/'+str(pmid)+'/citedby/?page='+str(start)
        req = requests.get(url)  # this works
        plain_text = req.text
        soup = BeautifulSoup(plain_text, "lxml")
        for items in soup.findAll('div', {'class': 'title'}):
            title = items.get_text()
            titles_list.append(title)
            for link in items.findAll('a'):
                urlkey = link.get('href')
                url_keys.append(urlkey)  # url = base + key
                url = "http://www.ncbi.nlm.nih.gov"+str(urlkey)
                url_list.append(url)
        start += 1
    return titles_list, url_list, authors_list
Based on other posts I'm looking at, I should just be able to replace this:
req = requests.get(url)
with this:
req = requests.get(url, proxies=proxyDict, timeout=2)
But this doesn't work! :( If I run it with this line of code, the terminal gives me a timeout error:
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 578, in urlopen
chunked=chunked)
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 362, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/http/client.py", line 1137, in request
self._send_request(method, url, body, headers)
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/http/client.py", line 1182, in _send_request
self.endheaders(body)
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/http/client.py", line 1133, in endheaders
self._send_output(message_body)
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/http/client.py", line 963, in _send_output
self.send(msg)
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/http/client.py", line 898, in send
self.connect()
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 167, in connect
conn = self._new_conn()
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 147, in _new_conn
(self.host, self.timeout))
requests.packages.urllib3.exceptions.ConnectTimeoutError: (<requests.packages.urllib3.connection.HTTPConnection object at 0x1052665f8>, 'Connection to 10.10.1.10 timed out. (connect timeout=2)')
And then I get a few of these printed in the terminal with different traces but the same error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/site-packages/requests/adapters.py", line 403, in send
timeout=timeout
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 623, in urlopen
_stacktrace=sys.exc_info()[2])
File "/Users/hclent/anaconda3/envs/py34/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py", line 281, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='10.10.1.10', port=3128): Max retries exceeded with url: http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/18269575/citedby/?page=1 (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x1052665f8>, 'Connection to 10.10.1.10 timed out. (connect timeout=2)'))
Why would the addition of proxies to my code suddenly cause me to time out? I tried it on several random URLs and the same thing happened, so it seems to be a problem with the proxies rather than with my code. However, I'm at the point where I MUST use proxies now, so I need to get to the root of this and fix it. I've also tried several different IP addresses for the proxy from a VPN that I use, so I know the IP addresses are valid.
I appreciate your help so much! Thank you!
It looks like you'll need to use an HTTP or HTTPS proxy that actually responds to requests.
The 10.10.1.10:3128 in your code seems to come from the examples in the requests documentation.
Taking a proxy from the list at http://proxylist.hidemyass.com/search-1291967 (which may not be the best source), your proxyDict should look like this: {'http': 'http://209.242.141.60:8080'}
Testing this on the command line seems to work fine:
>>> proxies = {'http' : 'http://209.242.141.60:8080'}
>>> requests.get('http://google.com', proxies=proxies)
<Response [200]>
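As a quick sanity check before pointing the crawler at a proxy, a sketch like this can help (the proxy address is the placeholder from above):

import requests

proxies = {'http': 'http://209.242.141.60:8080'}  # placeholder address

def proxy_works(proxies, test_url='http://httpbin.org/ip', timeout=5):
    # True only if the proxy answers a simple GET within the timeout
    try:
        r = requests.get(test_url, proxies=proxies, timeout=timeout)
        return r.status_code == 200
    except requests.exceptions.RequestException:
        return False

print(proxy_works(proxies))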
