Repeat function until true in Python

I've googled a lot but I still have no solution.
So I have a parser function:
def parse_page(url):
    req = requests.get(url, headers=headers(), proxies=dict(http='socks4://' + get_proxy()), timeout=5)
(the code is just an example)
Sometimes the proxy is dead, or another error happens (timeout, HTTP 500), but I need to make this request anyway and keep trying until it succeeds.
So how can I do that?
I tried the retrying lib but had no success.
Thank you!

How about:
import time

req = None
while not req:
    try:
        req = requests.get(url, headers=headers(), proxies=dict(http='socks4://' + get_proxy()))
    except requests.exceptions.RequestException:
        time.sleep(5)
As soon as requests.get returns a response, req becomes truthy and the loop exits. (Strictly speaking, a requests.Response is only truthy when its status code is below 400, so a 4xx/5xx reply will keep the loop going too, which is probably what you want here.)
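If you also want a cap on the attempts and a timeout so a dead proxy cannot hang the call, here is a slightly fuller sketch of the same idea (my own variant, not from the answer above; fetch, max_attempts and delay are made-up names, and headers()/get_proxy() are the helpers from the question):
# Hedged sketch building on the loop above: cap the attempts, keep the timeout
# from the question, and rotate the proxy on every failure.
import time
import requests

def fetch(url, max_attempts=10, delay=5):
    for attempt in range(1, max_attempts + 1):
        try:
            req = requests.get(url, headers=headers(),
                               proxies=dict(http='socks4://' + get_proxy()),
                               timeout=5)
            if req.ok:                       # status code < 400
                return req
        except requests.exceptions.RequestException as e:
            print('attempt %d failed: %s' % (attempt, e))
        time.sleep(delay)                    # wait, then try the next proxy
    return None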

while parse_page(url, urls[url]) == False:
    print('Something happened... Trying again...')
else:
    print(url + ' is saved... Keep going...')
You just have to make the while loop test against False and that's it...
I'll leave this here in case somebody googles it.
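For that loop to work, parse_page itself has to swallow request errors and report success or failure as a boolean. A hedged sketch of what that could look like (my own illustration; the destination argument stands in for urls[url]):
# Hedged sketch of a parse_page that returns True on success and False on any
# request problem, so the caller's `while ... == False` loop keeps retrying.
import requests

def parse_page(url, destination):
    try:
        req = requests.get(url, headers=headers(),
                           proxies=dict(http='socks4://' + get_proxy()),
                           timeout=5)
        req.raise_for_status()               # treat 4xx/5xx as failures too
    except requests.exceptions.RequestException:
        return False
    # parse req.text and save it to destination here
    return True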

Related

Strange Intermittent posting issue using python requests

I'm using a Raspberry Pi 3 and Python 2.7 with requests to post data to my LAMP server. All works great except for intermittent posting errors, which are trapped as requests.exceptions.ConnectTimeout. BTW, timeout=0.5 sec, which is 2.5x the posting time (0.2 sec). See the code below.
When the request exception occurs, I check for internet access using CheckConnection(). BTW, this takes only 0.016 sec on the Pi, so it is fast compared to other techniques. When it returns False, the code doesn't retry the post and logs the data locally.
However, I can connect remotely to the Pi using TeamViewer while this is happening! I am posting data to the same server from other installations, so it is not a case of the cloud server being down.
After several to many minutes, the issue resolves itself and posting resumes as if nothing was wrong.
Any suggestions on how I can change my code are most welcome, either to determine the root cause or to fix the issue. Thank you in advance.
******** CODE ************
def PostData(payload, retry_count=3):
    url = 'http://xxx.xxx.xxx.xxx/api/data/push/'
    try:
        response = requests.post(url, params=payload, timeout=0.5)
        if response.status_code == 200:
            return response.text
        response.raise_for_status()
    except (requests.exceptions.RequestException, requests.exceptions.ConnectTimeout) as e:
        print "Post Error..."
        x = CheckConnection()
        if x == False:
            return "Internet for Posting: " + str(x)
        if retry_count > 0:
            Reason = "Post Settings Retry: " + str(retry_count)
            print Reason
            #sleeptime = 0.05*2**(3-retry_count)
            #time.sleep(sleeptime)
            return PostData(payload, retry_count-1)
        if retry_count == 0:
            Reason = "Error! Post settings retry failed. Retry=0. Internet: " + str(x)
            return Reason
        return None
    except Exception as e:
        x = CheckConnection()
        Reason = "Error! Posting Exception: " + str(e) + "Internet: " + str(x)
        print Reason
        return None

def CheckConnection(host="8.8.8.8", port=53, timeout=0.5):
    try:
        socket.setdefaulttimeout(timeout)
        socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port))
        return True
    except Exception:
        return False
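No answer is attached above, but as a hedged sketch of one way to restructure this (my addition; make_session and post_data are illustrative names, and it assumes a requests/urllib3 combination that provides urllib3.util.retry.Retry): let the transport adapter do the retrying, with the exponential backoff the commented-out sleeptime lines hint at, instead of recursing inside PostData.
# Hedged sketch, not the poster's code: a Session whose adapter retries with
# exponential backoff, replacing the recursive PostData retry.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries=3, backoff_factor=0.05):
    session = requests.Session()
    retry = Retry(total=retries, connect=retries, read=retries,
                  backoff_factor=backoff_factor,     # exponential backoff between attempts
                  status_forcelist=(500, 502, 504))
    session.mount('http://', HTTPAdapter(max_retries=retry))
    return session

def post_data(session, payload):
    try:
        response = session.post('http://xxx.xxx.xxx.xxx/api/data/push/',
                                params=payload, timeout=0.5)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as e:
        return "Post failed after retries: " + str(e)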

Do something on connection error apart from retry with python requests

I am using Python requests to make a post request.
I am trying to do something like what is shown in the post below:
Retry with requests
When there is a connection error, or the response status code is in status_forcelist, it should retry (which is working fine). What I want is to do some other stuff after the first try, before retrying. This would be possible if I could catch an exception and handle it. But it seems that requests does not raise any exception on a connection error, or when the response code is in status_forcelist, until the retry count reaches the configured maximum. How can I achieve this?
Here is the code sample:
def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

def do_something_more():
    ## do something to tell user API failed and it will retry
    print("I am doing something more...")
Usage...
t0 = time.time()
try:
    response = requests_retry_session().get(
        'http://localhost:9999',
    )
except Exception as x:
    # Catch exception when connection error or 500 on first attempt and do something more
    do_something_more()
    print('It failed :(', x.__class__.__name__)
else:
    print('It eventually worked', response.status_code)
finally:
    t1 = time.time()
    print('Took', t1 - t0, 'seconds')
I know an exception will be raised after the maximum allowed attempts (defined by retries=3). All I want is some signal from requests or urllib3 to tell my main program that the first attempt has failed and it is about to retry, so that my program can do something based on it. If not through an exception, then something else.
The most robust way (but probably not the best and certainly not the most efficient) would be just to set retries to 0 - then the exception is raised on every failure. Then I would call the function up to three times, with a manual counter that counts how many times you tried to reconnect. Something like this (I didn't check whether it works, I just wanted to show you my way of thinking):
counter = 0
t0 = time.time()
for i in range(3):
    try:
        response = requests_retry_session(retries=0).get(   # retries=0 so every failure raises immediately
            'http://localhost:9999',
        )
    except requests.exceptions.RequestException:
        # requests wraps urllib3's MaxRetryError in its own exception types
        counter += 1
        do_something_more()
        print('It is the {} time it failed'.format(counter))
    else:
        break  # no exception means it connected successfully, so we don't have to keep looping
t1 = time.time()
print('Took', t1 - t0, 'seconds')
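If you would rather keep the built-in retrying and just be notified on each attempt, a hedged alternative (my own sketch, not from the answer; it assumes your urllib3 version routes both connection errors and status_forcelist retries through Retry.increment()) is to subclass Retry and override increment():
# Hedged sketch (not from the original post): hook urllib3's retry machinery
# so the main program is notified on every retry, without setting retries=0.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class NotifyingRetry(Retry):
    def increment(self, *args, **kwargs):
        # urllib3 calls increment() each time it consumes a retry, whether the
        # cause was a connection error or a status code in status_forcelist.
        do_something_more()
        return super(NotifyingRetry, self).increment(*args, **kwargs)

def notifying_session(retries=3, backoff_factor=0.3,
                      status_forcelist=(500, 502, 504)):
    session = requests.Session()
    retry = NotifyingRetry(total=retries, read=retries, connect=retries,
                           backoff_factor=backoff_factor,
                           status_forcelist=status_forcelist)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session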

How to check if a list of URLs exists

I'm trying to test whether each URL in a simple list of URLs exists. The code works when I'm just testing one URL, but when I try an array of URLs, it breaks.
Any idea what I'm doing wrong?
Single URL Code
import httplib

c = httplib.HTTPConnection('www.example.com')
c.request("HEAD", '')
if c.getresponse().status == 200:
    print('web site exists')
Broken Array Code
import httplib

Urls = ['www.google.ie', 'www.msn.com', 'www.fakeniallweb.com', 'www.wikipedia.org', 'www.galwaydxc.com', 'www.foxnews.com', 'www.blizzard.com', 'www.youtube.com']
for x in Urls:
    c = httplib.HTTPConnection(x)
    c.request("HEAD", '')
    if c.getresponse().status == 200:
        print('web site exists')
    else:
        print('web site ' + x + ' un-reachable')
#To prevent code from closing
input()
The problem is not that you do it as an array; it is that one of your URLs (www.fakeniallweb.com) has a different problem from your other URLs.
I think that because the DNS name cannot be resolved, you cannot request the HEAD the way you do. So you need an additional check other than just checking for response code 200.
Maybe you could do something like this:
import socket

try:
    c.request("HEAD", '')
    if c.getresponse().status == 200:
        print('web site exists')
    else:
        print('website does not exist')
except socket.gaierror as e:
    print('Error resolving DNS')
Honestly I suspect you will find other cases where a website returns different status codes. For example a website might return something in the 3xx range for a redirect, or a 403 if you cannot access it. That does not mean the website does not exist.
Hope this helps you on your way!
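Putting both points together, a hedged sketch of the loop (my own illustration, still Python 2 / httplib, with a shortened URL list): any HTTP response at all, including 3xx or 403, is treated as proof the host exists, and only resolution or connection failures count as unreachable.
# Hedged sketch combining the DNS check and the status-code advice: any HTTP
# response (2xx, 3xx, even 403) means the host exists; only resolution or
# connection failures are treated as unreachable.
import httplib
import socket

Urls = ['www.google.ie', 'www.msn.com', 'www.fakeniallweb.com']  # shortened list from the question
for x in Urls:
    c = httplib.HTTPConnection(x, timeout=5)
    try:
        c.request("HEAD", '/')
        status = c.getresponse().status
        print('%s responded with %d, so it exists' % (x, status))
    except socket.gaierror:
        print('cannot resolve hostname: ' + x)
    except (socket.error, httplib.HTTPException) as e:
        print('%s is unreachable: %s' % (x, e))
    finally:
        c.close()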
@Dries De Rydt
Thanks for your help, it was an unresolved DNS error causing it to crash out.
I ended up using Lib/socket.py.
Solution:
import socket

Urls = ['www.google.ie', 'www.msn.com', 'www.fakeniallweb.com', 'www.wikipedia.org', 'www.galwaydxc.com', 'www.foxnews.com', 'www.blizzard.com', 'www.youtube.com']
for x in Urls:
    try:
        url = socket.gethostbyname(x)
        print x + ' was reachable '
    except socket.gaierror, err:
        print "cannot resolve hostname: ", x, err
#To prevent code from closing
input()
Thanks for all the help.

Multiple simultaneous HTTP requests

I'm trying to take a list of items and check for their status change based on certain processing by the API. The list will be manually populated and can vary in size up to several thousand items.
I'm trying to write a script that makes multiple simultaneous connections to the API to keep checking for the status change. For each item, once the status changes, the attempts to check must stop. Based on reading other posts on Stack Overflow (specifically, What is the fastest way to send 100,000 HTTP requests in Python?), I've come up with the following code. But the script always stops after processing the list once. What am I doing wrong?
One additional issue I'm facing is that the keyboard interrupt never fires (I'm trying Ctrl+C, but it does not kill the script).
from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

requestURLBase = "https://example.com/api"
apiKey = "123456"
concurrent = 200
keepTrying = 1

def doWork():
    while keepTrying == 1:
        url = q.get()
        status, body, url = checkStatus(url)
        checkResult(status, body, url)
        q.task_done()

def checkStatus(ourl):
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(requestURLBase)
        conn.request("GET", url.path)
        res = conn.getresponse()
        respBody = res.read()
        conn.close()
        return res.status, respBody, ourl #Status can be 210 for error or 300 for successful API response
    except:
        print "ErrorBlock"
        print res.read()
        conn.close()
        return "error", "error", ourl

def checkResult(status, body, url):
    if "unavailable" not in body:
        print status, body, url
        keepTrying = 1
    else:
        keepTrying = 0

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for value in open('valuelist.txt'):
        fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
        print fullUrl
        q.put(fullUrl)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
I'm new to Python so there could be syntax errors as well... I'm definitely not familiar with multi-threading so perhaps I'm doing something else wrong as well.
In your code, the list is only read once. It should be something like:
try:
    while True:
        for value in open('valuelist.txt'):
            fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
            print fullUrl
            q.put(fullUrl)
        q.join()
except KeyboardInterrupt:
    sys.exit(1)
For the interrupt issue, remove the bare except line in checkStatus or make it except Exception. A bare except catches all exceptions, including SystemExit (which is what sys.exit raises), and stops the Python process from terminating.
If I may make a couple of general comments, though:
Threading is not a good implementation for such large concurrency.
Creating a new connection every time is not efficient.
What I would suggest is:
Use gevent for asynchronous network I/O.
Pre-allocate a queue of connections the same size as the concurrency number and have checkStatus grab a connection object whenever it needs to make a call. That way the connections stay alive and get reused, and there is no overhead from creating and destroying them, nor the increased memory use that goes with it.
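A minimal sketch of that suggestion (my own illustration, not from the answer; it assumes the gevent and requests packages are installed, and the URL list is made up). Here a requests.Session's connection pool stands in for the hand-rolled queue of connections: the monkey patch makes the standard socket module cooperative, the Pool caps how many requests are in flight, and the shared Session reuses connections instead of opening a new one per call.
# Hedged sketch of the gevent approach; check() and the URL list are illustrative.
from gevent import monkey
monkey.patch_all()                      # make socket/httplib cooperative

from gevent.pool import Pool
import requests

session = requests.Session()            # keep-alive: connections get reused

def check(url):
    try:
        return url, session.get(url, timeout=5).status_code
    except requests.exceptions.RequestException as exc:
        return url, exc

urls = ["https://example.com/api?key=123456&value=%d" % i for i in range(100)]
pool = Pool(200)                        # at most 200 requests in flight at once
for url, result in pool.imap_unordered(check, urls):
    print("%s -> %s" % (url, result))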

Python urllib2 HTTPBasicAuthHandler

Here is the code:
import urllib2 as URL

def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s' % user,
        passwd=passwd
    )
    opener = URL.build_opener(auth)
    URL.install_opener(opener)
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except:
        return None
It works just fine. The only problem is that when a wrong username or password is used, it takes forever to open the URL:
feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
It doesn't throw any errors; it just keeps executing the urlopen statement forever.
How can I know if the username/password is incorrect?
I thought of adding a timeout to the function, but then that would turn every error, and even a slow internet connection, into an authentication error.
It should throw an error, more precisely an urllib2.HTTPError with the code field set to 401; you can see some adapted code below. I left your general try/except structure in place, but really, do not use bare except statements; catch only what you expect could happen!
def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s' % user,
        passwd=passwd
    )
    opener = URL.build_opener(auth)
    URL.install_opener(opener)
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except URL.HTTPError, e:
        if e.code == 401:
            print "authorization failed"
        else:
            raise e  # or do something else
    except:  # A general except clause is discouraged, I left it in because you had it already
        return None
I just tested it here; it works perfectly.
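As a hedged follow-up sketch (my addition, not part of the answer above; get_unread_msgs_safe is a made-up name): if the hang itself is the worry, urlopen accepts a timeout argument, and catching the timeout separately keeps a slow connection from being mistaken for bad credentials.
# Hedged sketch: distinguish bad credentials (HTTP 401) from a slow or dead
# connection by giving urlopen a timeout and catching the failures separately.
# Assumes the basic-auth opener has already been installed as shown above.
import socket
import urllib2 as URL

def get_unread_msgs_safe(timeout=10):
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom',
                           timeout=timeout)
        return feed.read()
    except URL.HTTPError, e:
        if e.code == 401:
            print "authorization failed"
            return None
        raise
    except (URL.URLError, socket.timeout):
        print "network problem or timeout, not an auth failure"
        return None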
