If request times out, skip URL (python requests)

I would like the following script to try every URL in url_list; if the URL exists, print exist(url), if not, print don't(url), and if the request times out, skip to the next URL, using the "requests" lib:
url_list = ['www.google.com', 'www.urlthatwilltimeout.com', "www.urlthatdon't exist"]

def exist():
    if request.status_code == 200:
        print("exist {0}".format(url))
    else:
        print("don't {0}".format(url))

a = 0
while a < len(url_list):
    url = url_list[a]
    try:
        request = requests.get(url, timeout=10)
    except request.timeout:  # any option that is similar?
        print("timed out")
        continue
    exist()
    a += 1

Based on this SO answer, below is code that limits the total time taken by a GET request, and also discerns other exceptions that may happen.
Note that in requests 2.4.0 and later you may specify a connection timeout and read timeout
by using the syntax:
requests.get(..., timeout=(...conn timeout..., ...read timeout...))
The read timeout, however, only specifies the timeout between individual
read calls, not a timeout for the entire request.
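As a sketch of that split-timeout syntax (the URL and helper name are illustrative, and the version threshold is as stated in the note above):

```python
import requests

def get_with_split_timeouts(url):
    # requests >= 2.4.0: the first value bounds the TCP connect,
    # the second bounds each individual socket read (not the whole body)
    return requests.get(url, timeout=(3.05, 10))
```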
Code:
import requests
import eventlet
eventlet.monkey_patch()

url_list = ['http://localhost:3000/delay/0',
            'http://localhost:3000/delay/20',
            'http://localhost:3333/',  # no server listening
            'http://www.google.com'
            ]

for url in url_list:
    try:
        with eventlet.timeout.Timeout(1):
            response = requests.get(url)
        print("OK -", url)
    except requests.exceptions.ReadTimeout:
        print("READ TIMED OUT -", url)
    except requests.exceptions.ConnectionError:
        print("CONNECT ERROR -", url)
    except eventlet.timeout.Timeout:
        print("TOTAL TIMEOUT -", url)
    except requests.exceptions.RequestException as e:
        print("OTHER REQUESTS EXCEPTION -", url, e)
And here is an express server you can use to test it:
var express = require('express');
var sleep = require('sleep');
var app = express();

app.get('/delay/:secs', function(req, res) {
  var secs = parseInt(req.params.secs);
  sleep.sleep(secs);
  res.send('Done sleeping for ' + secs + ' seconds');
});

app.listen(3000, function() {
  console.log('Example app listening on port 3000!');
});

Related

429 Too Many Requests error despite using proxies

I am using StormProxies to access Etsy data, but despite using proxies and implementing retries I am getting a 429 Too Many Requests error most of the time (~80%+). Here is my code to access the data:
import time
import traceback

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_request(url, logging, headers={}, is_proxy=True):
    r = None
    try:
        proxies = {
            'http': 'http://{}'.format(PROXY_GATEWAY_IP),
            'https': 'http://{}'.format(PROXY_GATEWAY_IP),
        }
        with requests.Session() as s:
            retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504, 429])
            s.mount('http://', HTTPAdapter(max_retries=retries))
            s.mount('https://', HTTPAdapter(max_retries=retries))
            if is_proxy:
                r = s.get(url, proxies=proxies, timeout=30, headers=headers)
            else:
                r = s.get(url, headers=headers, timeout=30)
            r.raise_for_status()
            if r.status_code != 200:
                print('Status Code = ', r.status_code)
                if logging is not None:
                    logging.info('Status Code = ' + str(r.status_code))
    except Exception as ex:
        print('Exception occurred in create_request for the url: {url}'.format(url=url))
        crash_date = time.strftime("%Y-%m-%d %H:%M:%S")
        crash_string = "".join(traceback.format_exception(type(ex), ex, ex.__traceback__))
        exception_string = '[' + crash_date + '] - ' + crash_string + '\n'
        print('Could not connect. Proxy issue or something else')
        print('==========================================================')
        print(exception_string)
    finally:
        return r
The StormProxies guys say to implement retries; this is how I have done it, but it is not working for me.
I am using Python multiprocessing and spawning 30+ threads at a time.
My recommendation is to remove the huge overhead of thread management in one process (30+ threads is really a lot).
It is more efficient to use more processes with only a few threads each (2-4 threads, depending on the I/O delay), because threads in one process have to contend for the GIL (Global Interpreter Lock). In that case it all comes down to how you configure your Python code.
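A sketch of that shape using the standard library (the names and worker count are illustrative; `fetch` stands in for whatever function performs one request):

```python
from concurrent.futures import ProcessPoolExecutor

def fetch_many(urls, fetch, workers=4):
    """Fan work out across a few processes instead of 30+ threads in one
    process, so CPU-bound parts (parsing, etc.) don't fight over one GIL."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Each worker process could still run 2-4 threads internally for I/O overlap, as suggested above.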

Checking website response within x seconds

Good day. The problem I am facing is that I want to check whether my website is up or not. This is the sample pseudo code:
Check(website.com)
if checking_time > 10 seconds:
    print "No response received"
else:
    print "Site is up"
I already tried the code below, but it is not working:
try:
    response = urllib.urlopen("http://insurance.contactnumbersph.com").getcode()
    time.sleep(5)
    if response == "" or response == "403":
        print "No response"
    else:
        print "ok"
If the website is not up and running, you will get a connection refused error, which doesn't return any status code at all. So you can catch the error in Python with simple try: and except: blocks.
import requests

URL = 'http://some-url-where-there-is-no-server'
try:
    resp = requests.get(URL)
except Exception as e:
    # handle here
    print(e)  # for example
You can also retry up to 10 times, once per second: if the request raises an exception, wait and check again.
import time
import requests

URL = 'http://some-url'
counts = 0
gotConnected = False
while counts < 10:
    try:
        resp = requests.get(URL)
        gotConnected = True
        break
    except Exception as e:
        counts += 1
        time.sleep(1)
The result will be available in gotConnected flag, which you can use later to handle appropriate actions.
Note that the timeout that gets passed around by urllib applies to the "wrong thing": each individual network operation (e.g. hostname resolution, socket connection, sending headers, reading a few bytes of the headers, reading a few more bytes of the response) gets this same timeout applied separately. Hence passing a "timeout" of 10 seconds could allow a large response to keep trickling in for hours.
If you want to stick to built-in Python code, it would be nice to use a thread for this, but it doesn't seem possible to cancel running threads nicely. An async library like trio would allow better timeout and cancellation handling, but we can make do with the multiprocessing module instead:
from urllib.request import Request, urlopen
from multiprocessing import Process
from time import perf_counter

def _http_ping(url):
    req = Request(url, method='HEAD')
    print(f'trying {url!r}')
    start = perf_counter()
    res = urlopen(req)
    secs = perf_counter() - start
    print(f'response {url!r} of {res.status} after {secs*1000:.2f}ms')
    res.close()

def http_ping(url, timeout):
    proc = Process(target=_http_ping, args=(url,))
    try:
        proc.start()
        proc.join(timeout)
        success = not proc.is_alive()
    finally:
        proc.terminate()
        proc.join()
        proc.close()
    return success
You can use https://httpbin.org/ to test this, e.g.:
http_ping('https://httpbin.org/delay/2', 1)
should print a "trying" message but not a "response" message. You can adjust the delay time and timeout to explore how this behaves.
Note that this spins up a new process for each request, but as long as you're doing fewer than a thousand pings a second it should be OK.

How to use the requests module to skip connection timeout urls

Hi, how can I use the requests module to go through a bunch of URLs, and if a URL in the list takes too long to load or hits a connection timeout, skip that particular URL and move on to the next one?
def req():
    with open('demofile.txt', 'r') as http:
        for url in http.readlines():
            req = url.strip()
            print(req)
            page = requests.get("http://" + req, verify=False)
            if page.status_code == 400:
                break
            else:
                continue
            time.sleep(1)
You can catch the exception if there is a timeout and continue in the finally block with the next request:
import requests
import logging

timeout = 0.00001
try:
    response = requests.get(url="https://google.com", timeout=timeout)
except requests.exceptions.ConnectTimeout as e:
    logging.error("Time out!")
finally:
    # continue request here
    print("hello")
# output:
ERROR:root:Time out!
hello
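Applied to a whole list of URLs, the same pattern becomes a loop that just skips the failures (a sketch; the function name is illustrative, and `requests.exceptions.Timeout` covers both connect and read timeouts):

```python
import requests

def fetch_all(urls, timeout=10):
    """Try each URL; skip the ones that time out or refuse connections."""
    results = {}
    for url in urls:
        try:
            results[url] = requests.get(url, timeout=timeout).status_code
        except requests.exceptions.Timeout:
            print("timed out, skipping:", url)
        except requests.exceptions.ConnectionError:
            print("connection failed, skipping:", url)
    return results
```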

Python - Requests module - Receiving streaming updates - Connection reset by peer

I have been building my own Python (version 3.2.1) trading application against a practice account of a Forex provider (OANDA), but I am having some issues receiving the streaming prices on a Linux Debian-based OS.
In particular, I have followed their "Python streaming rates" guide available here: http://developer.oanda.com/rest-live/sample-code/.
I have a thread calling the function 'connect_to_stream' which prints out all the ticks received from the server:
streaming_thread = threading.Thread(target=streaming.connect_to_stream, args=[])
streaming_thread.start()
The streaming.connect_to_stream function is defined as following:
def connect_to_stream():
    [..]  # provider-related info is passed here
    try:
        s = requests.Session()
        url = "https://" + domain + "/v1/prices"
        headers = {'Authorization': 'Bearer ' + access_token,
                   'Connection': 'keep-alive'}
        params = {'instruments': instruments, 'accountId': account_id}
        req = requests.Request('GET', url, headers=headers, params=params)
        pre = req.prepare()
        response = s.send(pre, stream=True, verify=False)
    except Exception as e:
        s.close()
        print("Caught exception when connecting to stream\n%s" % str(e))
        return

    if response.status_code != 200:
        print(response.text)
        return

    for line in response.iter_lines(1):
        if line:
            try:
                msg = json.loads(line)
                print(msg)
            except Exception as e:
                print("Caught exception when connecting to stream\n%s" % str(e))
                return
The msg variable contains the tick received for the streaming.
The problem is that I receive ticks for three hours on average after which the connection gets dropped and the script either hangs without receiving any ticks or throws an exception with reason "Connection Reset by Peer".
Could you please share any thoughts on where I am going wrong here? Is it anything related to the requests library (iter_lines maybe)?
I would like to receive ticks indefinitely unless a Keyboard exception is raised.
Thanks
It doesn't seem too weird to me that a service would close connections that have been alive for more than 3 hours.
That's probably a safety measure on their side to free their server sockets from ghost clients.
So you should probably just reconnect when you are disconnected.
import errno
from socket import error as SocketError

try:
    s = requests.Session()
    url = "https://" + domain + "/v1/prices"
    headers = {'Authorization': 'Bearer ' + access_token,
               'Connection': 'keep-alive'}
    params = {'instruments': instruments, 'accountId': account_id}
    req = requests.Request('GET', url, headers=headers, params=params)
    pre = req.prepare()
    resp = s.send(pre, stream=True, verify=False)
    return resp
except SocketError as e:
    if e.errno == errno.ECONNRESET:
        pass  # connection has been reset, reconnect.
except Exception as e:
    pass  # other exceptions, but you'll probably need to reconnect too.
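To make the reconnect explicit, here is a sketch of an outer loop (the `connect` callable and the parameter names are illustrative, not part of the OANDA API); it treats both a raised exception and a normal return as a dropped stream and reconnects, stopping only on KeyboardInterrupt or after an optional drop limit:

```python
import time

def stream_with_reconnect(connect, max_drops=None, backoff=1.0):
    """Run connect() (which blocks while streaming) and reconnect each
    time the stream drops, whether it raises or returns normally."""
    drops = 0
    while max_drops is None or drops < max_drops:
        try:
            connect()
        except KeyboardInterrupt:
            raise  # let the user stop the stream
        except Exception as e:
            print("stream dropped: %s" % e)
        drops += 1
        time.sleep(backoff)
    return drops
```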

Proxy Check in python

I have written a script in python that uses cookies and POST/GET. I also included proxy support in my script. However, when one enters a dead proxy, the script crashes. Is there any way to check if a proxy is dead/alive before running the rest of my script?
Furthermore, I noticed that some proxies don't handle cookies/POST headers properly. Is there any way to fix this?
The simplest way is to simply catch the IOError exception from urllib:
try:
    urllib.urlopen(
        "http://example.com",
        proxies={'http': 'http://example.com:8080'}
    )
except IOError:
    print "Connection error! (Check proxy)"
else:
    print "All was fine"
Also, from this blog post - "check status proxy address" (with some slight improvements):
for python 2
import urllib2
import socket

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib2.install_opener(opener)
        req = urllib2.Request('http://www.example.com')  # change the URL to test here
        sock = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print 'Error code: ', e.code
        return e.code
    except Exception, detail:
        print "ERROR:", detail
        return True
    return False

def main():
    socket.setdefaulttimeout(120)
    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '213.55.87.162:6588']
    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print "Bad Proxy %s" % (currentProxy)
        else:
            print "%s is working" % (currentProxy)

if __name__ == '__main__':
    main()
for python 3
import urllib.request
import urllib.error
import socket

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib.request.ProxyHandler({'http': pip})
        opener = urllib.request.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        req = urllib.request.Request('http://www.example.com')  # change the URL to test here
        sock = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        print('Error code: ', e.code)
        return e.code
    except Exception as detail:
        print("ERROR:", detail)
        return True
    return False

def main():
    socket.setdefaulttimeout(120)
    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '25.176.126.9:80']
    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print("Bad Proxy %s" % (currentProxy))
        else:
            print("%s is working" % (currentProxy))

if __name__ == '__main__':
    main()
Remember this could double the time the script takes if the proxy is down (as you will have to wait for two connection timeouts). Unless you specifically need to know that the proxy is at fault, handling the IOError is far cleaner, simpler and quicker.
You can use the proxy-checker library, which is as simple as this:
from proxy_checker import ProxyChecker

checker = ProxyChecker()
checker.check_proxy('<ip>:<port>')
output:
{
    "country": "United States",
    "country_code": "US",
    "protocols": [
        "socks4",
        "socks5"
    ],
    "anonymity": "Elite",
    "timeout": 1649
}
with the possibility of generating your own proxies and checking them with two lines of code
You can use an IP-getter website to find the IP from which your request is being sent, then check whether that IP is your proxy IP or something else. Here is a script for that purpose:
import requests

proxy_ip = "<IP>"
proxy_port = "<PORT>"
proxy_user = "<USERNAME>"
proxy_pass = "<PASSWORD>"

proxies = {
    "http": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/",
    "https": f"http://{proxy_user}:{proxy_pass}@{proxy_ip}:{proxy_port}/"
}

url = 'https://api.ipify.org'
try:
    response = requests.get(url, proxies=proxies)
    assert response.text == proxy_ip
except Exception:
    print("Proxy does not work")
I think that the better approach is, as dbr said, handling the exception.
Another solution that could be better in some cases is to use an external online proxy checker tool to verify that a proxy server is alive, and then continue using your script without any modification.
There is one nice package, Grab.
So, if it is OK for you, you can write something like this (a simple valid-proxy checker-generator):
from grab import Grab, GrabError

def get_valid_proxy(proxy_list):  # format of items e.g. '128.2.198.188:3124'
    g = Grab()
    for proxy in proxy_list:
        g.setup(proxy=proxy, proxy_type='http', connect_timeout=5, timeout=5)
        try:
            g.go('google.com')
        except GrabError:
            # logging.info("Test error")
            pass
        else:
            yield proxy
