Python requests module get() method followed by json() method getting stuck

I am facing an issue with the 'requests' module of Python.
I have these three lines of code:
print '\n\nTrying to fetch Tweets from URL %s' % url
newTweets = requests.get(url).json()
print 'Fetched %d tweets from URL: %s' % (len(newTweets), url)
And somehow the program execution gets stuck (the program halts) on the second line. The 'url' parameter is a valid URL to our backend server, which serves valid JSON.
I only started experiencing this issue today. There are no loops in the code, so there's no possibility of infinite looping. However, I still don't know what exactly happens inside the get and json methods of the requests module.
If anyone has an explanation for this, kindly reply.

Split your program into several steps:
newTweets = requests.get(url)
Then check the status code for whatever you expect to be returned, e.g.:
if newTweets.status_code != 200:
    # exception handling, e.g. raise or log the unexpected status
    raise RuntimeError('unexpected status code: %d' % newTweets.status_code)
return newTweets.json()
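Incidentally, if get() itself never returns, a common cause worth ruling out is the lack of a timeout: requests has no default timeout, so a stalled server connection will block forever. A minimal sketch of a guarded fetch (the 10-second value is an arbitrary choice):
import requests

try:
    # without a timeout, get() can block indefinitely if the server
    # accepts the connection but never finishes responding
    response = requests.get(url, timeout=10)
except requests.exceptions.Timeout:
    raise SystemExit('request to %s timed out' % url)

newTweets = response.json()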

Related

Python fails to parse JSON in API calls

I want to write a Python script that retrieves data from a website using its API. Using the code below (which comes from this [tutorial][1]; click [here for more context][2]), I am able to successfully submit a query to the website but not to retrieve the data I need.
import requests, sys, json
WEBSITE_API = "https://rest.uniprot.org/beta"
def get_url(url, **kwargs):
    response = requests.get(url, **kwargs)
    if not response.ok:
        print(response.text)
        response.raise_for_status()
        sys.exit()
    return response
r = get_url(f"{WEBSITE_API}/uniprotkb/search?query=parkin AND (taxonomy_id:9606)&fields=id,accession,length,ft_site", headers={"Accept": "text/tsv"})
print(r.text)
Instead of printing the data I've queried, I get an error message:
{"url":"http://rest.uniprot.org/uniprotkb/search","messages":["Invalid request received. Requested media type/format not accepted: 'text/tsv'."]}
I figured out that I can print the data as a JSON-formatted string if I change the second-to-last line of my code to this:
r = get_url(f"{WEBSITE_API}/uniprotkb/search?query=parkin AND (taxonomy_id:9606)&fields=id,accession,length,ft_site")
In other words, I simply remove the following code snippet:
headers={"Accept": "text/tsv"}
I highly doubt this is an issue with the website, because the non-working code comes from a November 2021 webinar put on by an affiliated institution. I think a more likely explanation is that my Python environment lacks some library, and I would appreciate advice on how to investigate that possibility. Another solution I would welcome is code and/or library suggestions that would put me on the path to parsing the JSON string into a tab-delimited file (see the sketch after the links below).
[1]: https://colab.research.google.com/drive/1SU3j4VmXHYrYxOlrdK_NCBhajXb0iGes#scrollTo=EI91gRfPYfbc
[2]: https://drive.google.com/file/d/1qZXLl19apibjszCXroC1Jg2BGjcychqX/view
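On the last point, a rough sketch of writing the JSON response out as a tab-delimited file with the standard csv module. The "results" key and the field names below are assumptions; adjust them to the keys actually present in the response:
import csv
import json

data = json.loads(r.text)          # r is the response from get_url above
records = data.get("results", [])  # "results" is an assumed top-level key
with open("output.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["accession", "id", "length"])  # hypothetical columns
    for rec in records:
        # pick whichever fields the response actually contains
        writer.writerow([rec.get("primaryAccession"), rec.get("uniProtkbId"),
                         rec.get("sequence", {}).get("length")])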

How to make Python go through URLs in a text file, check their status codes, and exclude the ones that return a 404 error?

I tried the following script, but unfortunately the output file is identical to the input file. I'm not sure what's wrong with it.
import requests
url_lines = open('banana1.txt').read().splitlines()
remove_from_urls = []
for url in url_lines:
    remove_url = requests.get(url)
    print(remove_url.status_code)
    if remove_url.status_code == 404:
        remove_from_urls.append(url)
        continue

url_lines = [url for url in url_lines if url not in remove_from_urls]
print(url_lines)

# Save urls example
with open('banana2.txt', 'w+') as file:
    for item in url_lines:
        file.write(item + '\n')
There seems to be no error in your code, but there are a few things that would make it more readable and consistent. The first course of action should be to make sure that at least one URL actually returns a 404 status code.
Edit: after the actual URL was provided.
The 404 problem
In your case, the problem is that Twitter does not actually return a 404 error for your "not found" URL. You can test this using curl:
$ curl -o /dev/null -w "%{http_code}" "https://twitter.com/davemeltzerWON/status/1321279214365016064"
200
Or using Python:
import requests
response = requests.get("https://twitter.com/davemeltzerWON/status/1321279214365016064")
print(response.status_code)
The output for both should be 200.
Since Twitter is a JavaScript application that loads its content after the page has been processed in the browser, you cannot find the information you are looking for in the HTML response. You would need something like Selenium to process the JavaScript for you; then you would be able to look for the actual text, such as "not found", on the rendered page.
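A minimal sketch of that approach (the driver setup and the "not found" text check are assumptions about your environment and about what the rendered page contains):
from selenium import webdriver
import time

driver = webdriver.Chrome()  # assumes a Chrome driver is installed
driver.get("https://twitter.com/davemeltzerWON/status/1321279214365016064")
time.sleep(5)  # crude wait for the JavaScript to render; WebDriverWait is more robust
not_found = "not found" in driver.page_source.lower()
driver.quit()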
Code review
Please make sure to close the file properly. Also, a file object is an iterator over lines, so you can convert it to a collection very easily. Another trick for making the code more readable is to use a Python set. You can read the file like this:
with open("banana1.txt") as fid:
    url_lines = {line.strip() for line in fid}  # strip the trailing newlines
Then you simply remove all the links that do not work:
not_working = set()
for url in url_lines:
    if requests.get(url).status_code == 404:
        not_working.add(url)

working = url_lines - not_working
with open("banana2.txt", "w") as fid:
    fid.write("\n".join(working))
Also, if some of the links point to the same server, you should make use of the requests.Session class:
from requests import Session

session = Session()
Then replace requests.get with session.get; you should get a performance boost, since Session uses keep-alive connections, among other things.
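Put together, a sketch of the filtering loop using the shared session (same logic as above, only the transport changes):
from requests import Session

session = Session()  # connections to the same host are reused (keep-alive)
not_working = set()
for url in url_lines:
    if session.get(url).status_code == 404:
        not_working.add(url)
working = url_lines - not_working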

Capturing the Response Body for an HTTP Error in Python

I need to capture the response body for an HTTP error in Python. I am currently using the requests module's raise_for_status(). This method only returns the status code and description; I need a way to capture the response body for a detailed error log.
Please suggest alternatives to the requests module if a similar feature is present in a different module. If not, please suggest what changes can be made to the existing code to capture the response body.
Current implementation contains just the following:
resp.raise_for_status()
I guess I'll write this up quickly. This is working fine for me:
import requests

try:
    r = requests.get('https://www.google.com/404')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err.request.url)
    print(err)
    print(err.response.text)
You can do something like the following, which returns the content of the response in Unicode:
response.text
or
import sys
import requests

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err)
    sys.exit(1)

# 404 Client Error: Not Found for url: http://www.google.com/nothere
Here you'll find the full explanation of how to handle the exception; please check out "Correct way to try/except using Python requests module?"
You can log resp.text if resp.status_code >= 400.
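For example, a minimal sketch of that pattern using the standard logging module (logger configuration is assumed to happen elsewhere):
import logging
import requests

logger = logging.getLogger(__name__)

resp = requests.get(url)  # url defined elsewhere
if resp.status_code >= 400:
    # capture the full body for the error log before raising
    logger.error("HTTP %d from %s: %s", resp.status_code, resp.url, resp.text)
resp.raise_for_status()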
There are also tools you can pick up, such as Fiddler, Charles, or Wireshark.
However, those tools can only display the body of the response; they do not include the reason or the error stack explaining why the error was raised.

Continue Processing requests after failure in python

I have a script that submits URLs from a text file via GET to an API and saves the response to a text file. However, the for loop quits if it hits a failure in the first section and does not continue on to the others. How can I still capture the failure and continue with the others, without the script exiting before it finishes?
import sys
import json
import requests

# exampleData, user and pwd are defined elsewhere in the script
sys.stdout = open("mylog.txt", "w")

for row in range(0, len(exampleData)):
    url = exampleData[row][0]
    print(url)
    response = requests.get(url, auth=(user, pwd))
    if response.status_code != 200:
        print('Failure Message {}'.format(response.text))
        work = 'failed'
        continue
    data = json.loads(response.text)
    print(data)
    work = 'succeeded'

sys.stdout.close()
Use continue instead of exit()
Use an exception to catch the failure and continue on.
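For instance, a sketch of the loop body with an exception handler that logs the failure and moves on. RequestException is the base class for requests errors, including the HTTPError raised by raise_for_status():
for row in exampleData:
    url = row[0]
    try:
        response = requests.get(url, auth=(user, pwd))
        response.raise_for_status()  # turns non-2xx responses into HTTPError
    except requests.exceptions.RequestException as err:
        print('Failure for {}: {}'.format(url, err))
        continue  # keep going with the next URL
    print(response.json())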
Now that your loop control is corrected, it should be working. It will print a failure message every time it gets an error response (i.e., not 200). If you're only seeing one error message, you're only getting one non-200 response from the other side. If that's not what you expect, the problem is on the server side (or in the contents of exampleData).
You need to debug your own server-client system. Simplify the loop so that the only thing it does is print diagnostic information about the response (e.g., print status_code), and find out what's really going on.

How to form an anonymous request to Imgur's APIv3

A while ago, I wrote a Python function which took the URL of an image and passed it to Imgur's API v2. Since I've been notified that the v2 API is going to be deprecated, I've attempted to rewrite it using API v3.
As they say in the Imgur API documentation:
[Sending] an authorization header with your client_id along with your requests [...] also works if you'd like to upload images anonymously (without the image being tied to an account). This lets us know which application is accessing the API.
Authorization: Client-ID YOURCLIENTID
It's unclear to me (especially with the italics they put) whether they mean that the header should be {'Authorization': 'Client-ID ' + clientID}, or {'Authorization: Client-ID ': clientID}, or {'Authorization:', 'Client-ID ' + clientID}, or some other variation...
Either way, I tried and this is what I got (using Python 2.7.3):
def sideLoad(imgURL):
    img = urllib.quote_plus(imgURL)
    req = urllib2.Request('https://api.imgur.com/3/image',
                          urllib.urlencode([('image', img),
                                            ('key', clientSecret)]))
    req.add_header('Authorization', 'Client-ID ' + clientID)
    response = urllib2.urlopen(req)
    return response.geturl()
This seems to do everything Imgur wants me to do: I've got the right endpoint; passing data to urllib2.Request makes it a POST request, according to the Python docs; I'm passing the image parameter with the form-encoded URL; and I even tried giving my client secret as a POST parameter, since I got an error saying I need an ID (even though there is no mention of needing the client secret anywhere in the relevant documentation). I add the Authorization header, and it seems to be in the right form... so why am I getting an Error 400: Bad Request?
Side question: I might be able to debug this myself if I could see the actual error Imgur returns, but because it returns an erroneous HTTP status, Python dies and gives me one of those nauseating stack traces. Is there any way to have Python stop whining and give me the error-message JSON that I know Imgur returns?
Well, I'll be damned. I tried taking out the encoding functions and just straight up forming the string, and I got it to work. I guess Imgur's API expects the non-form-encoded URL?
Oh... or was it because I used both quote_plus() and urlencode(), encoding the URL twice? That seems even more likely...
This is my working solution, at long last, for something that took me a day when I thought it'd take an hour at most:
def sideLoad(imgURL):
    img = urllib.quote_plus(imgURL)
    req = urllib2.Request('https://api.imgur.com/3/image', 'image=' + img)
    req.add_header('Authorization', 'Client-ID ' + clientID)
    response = urllib2.urlopen(req)
    response = json.loads(response.read())
    return str(response[u'data'][u'link'])
It's not a final version, mind you; it still lacks some testing (I'll see whether I can get rid of quote_plus(), or whether it's preferable to use urlencode() alone) as well as error handling (especially for big GIFs, the most frequent cause of failure).
I hope this helps! I searched all over Google, Imgur, and Stack Overflow, and the information about anonymous usage of API v3 was confusing (and drowned in a sea of utterly horrifying OAuth2 stuff).
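For comparison, the same anonymous upload is considerably shorter with the requests library. This is an untested sketch; the endpoint, the image form parameter, and the Client-ID header form are all taken from the answers above:
import requests

def side_load(img_url, client_id):
    resp = requests.post(
        'https://api.imgur.com/3/image',
        headers={'Authorization': 'Client-ID ' + client_id},
        data={'image': img_url},  # requests form-encodes this exactly once
    )
    resp.raise_for_status()
    return resp.json()['data']['link']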
In Python 3.4, using urllib, I was able to do it like this:
import urllib.request
import json

opener = urllib.request.build_opener()
# note the space after "Client-ID"; without it the header value is malformed
opener.addheaders = [("Authorization", "Client-ID " + yourClientId)]
jsonStr = opener.open("https://api.imgur.com/3/image/" + pictureId).read().decode("utf-8")
jsonObj = json.loads(jsonStr)
# jsonObj is a Python dictionary of the Imgur JSON response.
