Continue processing requests after failure in Python

I have a script that runs and submits URLs from a text file via GET to an API and saves the response to a text file. However, the for loop quits if I get a failure in the first section and does not continue with the others. How can I still capture the failure and continue on with the others, without the script exiting before it finishes?
sys.stdout = open("mylog.txt", "w")
for row in range(0, len(exampleData)):
    url = exampleData[row][0]
    print(url)
    response = requests.get(url, auth=(user, pwd))
    if response.status_code != 200:
        print('Failure Message {}'.format(response.text))
        work = 'failed'
        continue
    data = json.loads(response.text)
    print(data)
    work = 'succeeded'
sys.stdout.close()

Use continue instead of exit()

Use an exception to catch the failure and continue on.
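For instance, a minimal sketch of that approach (reusing exampleData, user and pwd from the question, and writing to the log file directly instead of redirecting stdout):
import json
import requests

with open("mylog.txt", "w") as log:
    for row in exampleData:
        url = row[0]
        try:
            response = requests.get(url, auth=(user, pwd))
            response.raise_for_status()  # raises HTTPError for non-2xx responses
            data = json.loads(response.text)
            print(data, file=log)
            work = 'succeeded'
        except requests.exceptions.RequestException as err:
            # covers HTTP errors, timeouts and connection problems alike
            print('Failure Message {}'.format(err), file=log)
            work = 'failed'
            continue  # move on to the next URL instead of exiting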

Now that your loop control is corrected, it must be working correctly. It will print out a failure message every time it gets an error response (not 200). If you're only seeing one error message, you're only getting one non-200 response from the other side. If that's not what you expect, the problem is on the server side. (Or in the contents of exampleData.)
You need to debug your own server-client system. Simplify this loop so that the only thing it does is print diagnostic information about the response (e.g., print status_code), and find out what's really going on.
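For example, a bare-bones diagnostic version of the loop (same exampleData, user and pwd as in the question) could be as small as this:
import requests

for row in exampleData:
    url = row[0]
    response = requests.get(url, auth=(user, pwd))
    # Print just enough to see what the server actually returns for each URL
    print(url, response.status_code, len(response.text))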

How to try accessing webpage again after http errors in python for loop

(The original question links to screenshots showing examples of 522, 525, and 504 errors when visiting the pages manually.)
I am running the following for loop, which goes through a dictionary of subreddits (keys) and URLs (values). The URLs produce a dictionary with all posts from 2022 of a given subreddit. Sometimes the for loop stops and produces an 'http error 525' or other errors.
I'm wondering how I can check for these errors when reading the URL and then retry until the error goes away before moving on to the next subreddit.
for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    page = urllib.request.urlopen(url).read()
    dict_last_posts[subredd] = page
I haven't been able to figure it out.
You can put this code in a try/except block, like this:
import urllib.request
import urllib.error

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    while True:
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # exit the while loop if the request succeeded
        except urllib.error.HTTPError as e:
            if e.code == 525 or e.code == 522 or e.code == 504:
                print("Encountered HTTP error while reading URL. Retrying...")
            else:
                raise  # re-raise the exception if it's a different error
This code will catch any HTTPError that occurs while reading the URL and check whether the error code is 525, 522, or 504. If it is, it will print a message and try reading the URL again. If it's a different error, it will re-raise the exception so that you can handle it appropriately.
NOTE: This code will retry reading the URL indefinitely until it succeeds or a different error occurs. You may want to add a counter or a timeout to prevent the loop from going on forever in case the error persists.
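As a rough sketch, a bounded variant of that loop might look like this (the max_attempts value is just an illustrative choice; dict_last_subreddit_posts and dict_last_posts are the same dictionaries as in the question):
import urllib.request
import urllib.error

max_attempts = 5  # illustrative limit, tune to taste

for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    for attempt in range(max_attempts):
        try:
            page = urllib.request.urlopen(url).read()
            dict_last_posts[subredd] = page
            break  # success, stop retrying
        except urllib.error.HTTPError as e:
            if e.code in (504, 522, 525):
                print("HTTP error {}, retry {}/{}...".format(e.code, attempt + 1, max_attempts))
            else:
                raise  # unrelated error, let it propagate
    else:
        print("Giving up on {} after {} attempts".format(subredd, max_attempts))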
It's unwise to indefinitely retry a request. Set a limit even if it's very high, but don't set it so high that it causes you to be rate limited (HTTP status 429). The backoff_factor will also have an impact on rate limiting.
Use the requests package for this. This makes it very easy to set a custom adapter for all of your requests via Session, and it includes Retry from urllib3 which takes care of retry behavior in an object you can pass to your adapter.
import requests
from requests.adapters import HTTPAdapter, Retry

s = requests.Session()
retries = Retry(
    total=5,
    backoff_factor=0.1,
    status_forcelist=[504, 522, 525]
)
s.mount('https://', HTTPAdapter(max_retries=retries))

for subredd, url in dict_last_subreddit_posts.items():
    response = s.get(url)
    dict_last_posts[subredd] = response.content
You can play around with total (maximum number of retries) and backoff_factor (adjusts wait time between retries) to get the behavior you want.
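One detail worth noting: the adapter above is mounted only for the 'https://' prefix, so any plain-HTTP URLs in the dictionary would bypass the retry logic. If that applies, the same adapter can be mounted for 'http://' as well (an assumed addition, only needed in that case):
# Retries apply per mounted prefix; cover plain-HTTP URLs too if the dictionary has any.
s.mount('http://', HTTPAdapter(max_retries=retries))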
Try something like this:
for subredd, url in dict_last_subreddit_posts.items():
    print(subredd)
    http_response = urllib.request.urlopen(url)
    while http_response.status != 200:
        if http_response.status == 503:
            http_response = urllib.request.urlopen(url)
        elif http_response.status == 523:
            pass  # enter code here
        else:
            pass  # enter code here
    dict_last_posts[subredd] = http_response.read()
But Michael Ruth's answer is better.

I suddenly receive no stream data from Twitter stream (requests.get(...2/tweets/search/stream))

I'm working on a screenshot bot for Twitter using Python.
My app collects tweets from a filtered stream and replies with an image of the tweet.
Yesterday, my bot worked well: it connected to the stream and made replies.
Today, it still connects to the stream but returns nothing.
Here is the code:
def get_stream(set):
    with requests.get(f"https://api.twitter.com/2/tweets/search/stream?tweet.fields=id,author_id&user.fields=id,username&expansions=author_id,referenced_tweets.id", auth=bearer_oauth, stream=True) as response:
        print(response.status_code)
        if response.status_code == 429:
            print(f"returned code 429, waiting for 60 seconds to try again")
            print(response.text)
            time.sleep(60)
            return
        if response.status_code != 200:
            raise Exception(
                f"Cannot get stream (HTTP {response.status_code}): {response.text}"
            )
        for response_line in response.iter_lines():
            if response_line:
                print("here")
                json_response = json.loads(response_line)
                print(json.dumps(json_response, indent=4))
I've searched everywhere I know for help on this issue. I've reduced the queries in my requests.get line and I've switched to using a with statement; nothing works.
response.text returns nothing at all, even though response.status_code returns 200.
I have also tried two different developer accounts for streaming authentication.
This is an issue currently under investigation and is not related to the specific code here; you would be best off following the Twitter Developer Community discussion.
I figured out what the problem was: it was a problem on Twitter's side that caused streaming to fail for some accounts.
It has been resolved now and everything works fine.

Capturing the response body for an HTTP error in Python

I need to capture the response body for an HTTP error in Python. I'm currently using the requests module's raise_for_status(). This method only returns the status code and description; I need a way to capture the response body for a detailed error log.
Please suggest alternatives to the Python requests module if a similar feature is present in a different module. If not, please suggest what changes can be made to the existing code to capture the response body in question.
Current implementation contains just the following:
resp.raise_for_status()
I guess I'll write this up quickly. This is working fine for me:
import requests

try:
    r = requests.get('https://www.google.com/404')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err.request.url)
    print(err)
    print(err.response.text)
You can do something like the following, which returns the content of the response in Unicode:
response.text
or
import sys
import requests

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err)
    sys.exit(1)
# 404 Client Error: Not Found for url: http://www.google.com/nothere
Here you'll get a full explanation of how to handle the exception; please check out Correct way to try/except using Python requests module?
You can log resp.text if resp.status_code >= 400.
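A minimal sketch of that idea (the example URL is just a placeholder, and a logging setup is assumed to exist):
import logging
import requests

resp = requests.get('https://example.com/api/item')  # placeholder URL
if resp.status_code >= 400:
    # Record the body alongside the status so the detail ends up in the error log
    logging.error("HTTP %s from %s: %s", resp.status_code, resp.url, resp.text)
resp.raise_for_status()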
There are also tools you could pick up, such as Fiddler, Charles, or Wireshark.
However, those tools only display the body of the response; they don't include the reason or the error stack explaining why the error was raised.

python requests module get method followed by json method getting stuck.

I am facing an issue with the 'requests' module of Python.
I have these three lines of code:
print '\n\nTrying to fetch Tweets from URL %s' % url
newTweets = requests.get(url).json()
print 'Fetched %d tweets from URL: %s' % (len(newTweets), url)
And somehow the program execution gets stuck (the program halts) on the second line. The 'url' parameter is a valid URL to our backend server, which serves valid JSON.
I just started experiencing this issue today. There are no loops in the code, so there's no scope for infinite looping. However, I still don't know exactly what happens inside the 'get' and 'json' methods of the requests module.
If anyone has an explanation for this, kindly reply.
Split your program into several steps:
newTweets = requests.get(url)
Then check the status code for whatever you expect it to return, e.g.:
if newTweets.status_code != 200:
    pass  # exception handling
return newTweets.json()
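It is also worth noting that requests has no default timeout, so a slow or unresponsive server can make requests.get() hang indefinitely. Passing an explicit timeout (the 10-second value below is just an illustrative choice, and url is the same variable as in the question) turns the hang into an exception you can handle:
import requests

try:
    newTweets = requests.get(url, timeout=10)  # seconds; illustrative value
    newTweets.raise_for_status()
    tweets = newTweets.json()
except requests.exceptions.Timeout:
    print('Request to {} timed out'.format(url))
except requests.exceptions.RequestException as err:
    print('Request to {} failed: {}'.format(url, err))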

python http status code

I'm writing my own directory buster in python, and I'm testing it against a web server of mine in a safe and secure environment. This script basically tries to retrieve common directories from a given website and, looking at the HTTP status code of the response, it is able to determine if a page is accessible or not.
As a start, the script reads a file containing all the interesting directories to be looked up, and then requests are made, in the following way:
for dir in fileinput.input('utils/Directories_Common.wordlist'):
    try:
        conn = httplib.HTTPConnection(url)
        conn.request("GET", "/"+str(dir))
        toturl = 'http://'+url+'/'+str(dir)[:-1]
        print ' Trying to get: '+toturl
        r1 = conn.getresponse()
        response = r1.read()
        print ' ', r1.status, r1.reason
        conn.close()
Then the response is parsed, and if a status code equal to 200 is returned, the page is considered accessible. I've implemented this as follows:
if r1.status == 200:
    print '\n[!] Got it! The subdirectory '+str(dir)+' could be interesting..\n\n\n'
All seems fine to me, except that the script marks as accessible pages that actually aren't. In fact, the algorithm collects only the pages that return a "200 OK", but when I manually browse to those pages I find they have been moved permanently or have restricted access. Something is going wrong, but I cannot spot exactly where I should fix the code; any help is appreciated.
I did not find any problems with your code, except that it is almost unreadable. I have rewritten it into this working snippet:
import httplib

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/' + directory)
    url = 'http://{0}/{1}'.format(host, directory)
    print ' Trying: {0}'.format(url)
    response = conn.getresponse()
    print ' Got: ', response.status, response.reason
    conn.close()
    if response.status == 200:
        print ("[!] The subdirectory '{0}' "
               "could be interesting.").format(directory)
Outputs:
$ python snippet.py
Trying: http://www.google.com/aosicdjqwe0cd9qwe0d9q2we
Got: 404 Not Found
Trying: http://www.google.com/reader
Got: 302 Moved Temporarily
Trying: http://www.google.com/news
Got: 200 OK
[!] The subdirectory 'news' could be interesting.
Also, I used a HEAD HTTP request instead of GET, as it is more efficient if you do not need the contents and are interested only in the status code.
I would advise you to use the requests library (http://docs.python-requests.org/en/latest/) for HTTP.
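For comparison, a rough requests-based sketch of the same check (using the same example host and directory list as above) might look like this:
import requests

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    url = 'http://{0}/{1}'.format(host, directory)
    # HEAD keeps it light; redirects are not followed, so 301/302 stay visible
    response = requests.head(url, allow_redirects=False)
    print(' Trying: {0} -> {1} {2}'.format(url, response.status_code, response.reason))
    if response.status_code == 200:
        print("[!] The subdirectory '{0}' could be interesting.".format(directory))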
