How to fix <Response 500> error in Python requests?

I am using an API which receives a PDF file and does some analysis, but I always get Response 500.
I initially tested with Postman and the request goes through, returning a 200 response with the corresponding JSON information. SSL verification should be turned off.
However, when I make the request via Python, I always get Response 500.
My Python code:
import requests
url = "https://{{BASE_URL}}/api/v1/documents"
fin = open('/home/train/aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf', 'rb')
files = {'file': fin}
r = requests.post(url, files=files, verify=False)
print(r)
# r.text is empty
Python code produced by Postman:
import requests
url = "https://{{BASE_URL}}/api/v1/documents"
payload = "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\nContent-Disposition: form-data; name=\"file\"; filename=\"aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf\"\r\nContent-Type: application/pdf\r\n\r\n\r\n------WebKitFormBoundary7MA4YWxkTrZu0gW--"
headers = {
    'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
    'Content-Type': "application/x-www-form-urlencoded",
    'cache-control': "no-cache",
    'Postman-Token': "65f888e2-c1e6-4108-ad76-f698aaf2b542"
}
response = requests.request("POST", url, data=payload, headers=headers)
print(response.text)
I have masked the API link as {{BASE_URL}} for confidentiality.
Response by Postman:
{
    "id": "5e69058e2690d5b0e519cf4006dfdbfeeb5261b935094a2173b2e79a58e80ab5",
    "name": "aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf",
    "fileIds": {
        "original": "5e69058e2690d5b0e519cf4006dfdbfeeb5261b935094a2173b2e79a58e80ab5.pdf"
    },
    "creationDate": "2019-06-20T09:41:59.5930472+00:00"
}
Response by Python:
Response<500>
UPDATE:
Tried a GET request - it works fine, and I receive the JSON response from it. I guess the problem is in posting the PDF file. Are there any other options for posting a file to an API?
Postman Response RAW:
POST /api/v1/documents
Content-Type: multipart/form-data; boundary=--------------------------375732980407830821611925
cache-control: no-cache
Postman-Token: 3e63d5a1-12cf-4f6b-8f16-3d41534549b9
User-Agent: PostmanRuntime/7.6.0
Accept: */*
Host: {{BASE_URL}}
cookie: c2b8faabe4d7f930c0f28c73aa7cafa9=736a1712f7a3dab03dd48a80403dd4ea
accept-encoding: gzip, deflate
content-length: 3123756
file=[object Object]
HTTP/1.1 200
status: 200
Date: Thu, 20 Jun 2019 10:59:55 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Location: /api/v1/files/95463e88527ecdc94393fde685ab1d05fa0ee0b924942f445b14b75e983c927e
api-supported-versions: 1.0
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Referrer-Policy: strict-origin
{"id":"95463e88527ecdc94393fde685ab1d05fa0ee0b924942f445b14b75e983c927e","name":"aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf","fileIds":{"original":"95463e88527ecdc94393fde685ab1d05fa0ee0b924942f445b14b75e983c927e.pdf"},"creationDate":"2019-06-20T10:59:55.7038573+00:00"}
CORRECT REQUEST
So, in the end, the correct code is the following:
import requests
files = {
    'file': open('/home/train/aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf', 'rb'),
}
response = requests.post('{{BASE_URL}}/api/v1/documents', files=files, verify=False)
print(response.text)
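A slightly more defensive variant of the same request, in case the server also cares about the filename or MIME type of the uploaded part (the field name 'file' and the application/pdf type are taken from the request above; this is a sketch, not a confirmed requirement of the API):
import requests

# Open the file in a context manager so it is closed after the upload, and
# pass an explicit (filename, fileobj, content_type) tuple for the form part.
with open('/home/train/aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf', 'rb') as fin:
    files = {'file': ('aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf', fin, 'application/pdf')}
    response = requests.post('{{BASE_URL}}/api/v1/documents', files=files, verify=False)
print(response.status_code, response.text)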

A 500 error indicates an internal server error, not an error with your script.
If you're receiving a 500 error (as opposed to a 400 error, which indicates a bad request), then theoretically your script is fine and it's the server-side code that needs to be adjusted.
In practice, though, it could still be due to a bad request.
If you're the one running the API, then you can check the error logs and debug the code line-by-line to figure out why the server is throwing an error.
In this case though, it sounds like it's a third-party API, correct? If so, I recommend looking through their documentation to find a working example or contacting them if you think it's an issue on their end (which is unlikely but possible).
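Either way, it usually helps to print the status code, headers, and any body the server sends back, since many APIs include an error message even on a 500. A minimal sketch (the URL and file path are placeholders):
import requests

url = "https://{{BASE_URL}}/api/v1/documents"
with open('/path/to/file.pdf', 'rb') as fin:
    r = requests.post(url, files={'file': fin}, verify=False)

print(r.status_code)                  # numeric status, e.g. 500
print(r.headers.get('Content-Type'))  # what the server says it returned
print(r.text or '<empty body>')       # many APIs return an error message even on 5xx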

Related

Python POST multipart/form-data request different behavior from Postman

I'm attempting to use this API endpoint to upload a file:
https://h.app.wdesk.com/s/cerebral-docs/?python#uploadfileusingpost
With this Python function:
import requests

def upload_file(token, filepath, table_id):
    url = "https://h.app.wdesk.com/s/wdata/prep/api/v1/file"
    headers = {
        'Accept': 'application/json',
        'Authorization': f'Bearer {token}'
    }
    files = {
        "tableId": (None, table_id),
        "file": open(filepath, "rb")
    }
    resp = requests.post(url, headers=headers, files=files)
    print(resp.request.headers)
    return resp.json()
The Content-Type and Content-Length headers are computed and added by the requests library internally as per their documentation. When assigning to the files kwarg in the post function, the library knows it's supposed to be a multipart/form-data request.
The printout of the request headers is as follows, showing the Content-Type and Content-Length that the library added. I've omitted the auth token.
{'User-Agent': 'python-requests/2.24.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive',
'Authorization': 'Bearer <omitted>', 'Content-Length': '8201', 'Content-Type': 'multipart/form-data; boundary=bb582b9071574462d44c4b43ec4d7bf3'}
The json response from the API is:
{'body': ['contentType must not be null'], 'code': 400}
The odd thing is that the same request, when made through Postman, gives a different response - which is what I expected from Python as well.
{ "code": 409, "body": "duplicate file name" }
These are the Postman request headers:
POST /s/wdata/prep/api/v1/file HTTP/1.1
Authorization: Bearer <omitted>
Accept: */*
Cache-Control: no-cache
Postman-Token: 34ed08d4-4467-4168-a4e4-c83b16ce9afb
Host: h.app.wdesk.com
Content-Type: multipart/form-data; boundary=--------------------------179907322036790253179546
Content-Length: 8279
Postman also computes the Content-Type and Content-Length headers when the request is sent; they are not user-specified.
I am quite confused as to why I'm getting two different behaviors from the API service for the same request.
There must be something I'm missing, and I can't figure out what it is.
Figured out what was wrong with my request, compared to NodeJS and Postman.
The contentType referred to in the API's error message was the file parameter's content type, not the HTTP request header Content-Type.
The upload started to work flawlessly when I updated my file parameter like so:
from pathlib import Path

files = {
    "tableId": (None, table_id),
    "file": (Path(filepath).name, open(filepath, "rb"), "text/csv", None)
}
I learned that Python's requests library will not automatically add the file's MIME type to the request body; we need to be explicit about it.
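If you'd rather not hardcode the type, the standard library can usually guess it from the filename; a small variation on the snippet above (guess_type can return None for unknown extensions, hence the fallback):
import mimetypes
from pathlib import Path

# Guess the MIME type from the file extension, falling back to a generic type.
mime_type = mimetypes.guess_type(filepath)[0] or "application/octet-stream"
files = {
    "tableId": (None, table_id),
    "file": (Path(filepath).name, open(filepath, "rb"), mime_type, None)
}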
Hope this helps someone else too.

python requests and urllib https errors

I want to read the data from the NASA Earth API; opening the URL in the browser displays the data. When I try to make a GET request with Python and urllib, it throws an error:
from urllib import request

request.urlopen("https://api.nasa.gov/planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY").read()
urllib.error.HTTPError: HTTP Error 400: Bad Request
When I try it with Requests, it returns an error.
r = requests.get("https://api.nasa.gov/planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY")
r.content is:
{"error": {"code": "HTTPS_REQUIRED", "message": "Requests must be made over HTTPS. Try accessing the API at: https://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY"}}
If I print out r.url, it is http and not https:
http://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
I don't know why this happens; I am using Python 3.7. Any help is appreciated.
I was able to reproduce your error. However, when I copied the link from the NASA website, it worked:
r = requests.get("https://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY")
r.json()
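As a side note, letting requests build the query string via the params argument avoids typos in hand-assembled URLs; a small sketch of the same call:
import requests

params = {
    "lon": 100.75,
    "lat": 1.5,
    "date": "2014-02-01",
    "api_key": "DEMO_KEY",
}
r = requests.get("https://api.nasa.gov/planetary/earth/imagery/", params=params)
print(r.url)     # the final URL requests actually sent
print(r.json())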
Are you sure your URL is correct? When I run this in requests with debug logging, I see that the first request gets an HTTP 301 redirect:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.nasa.gov:443
send: b'GET /planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY HTTP/1.1\r\nHost: api.nasa.gov\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 301 MOVED PERMANENTLY\r\n'
header: Server: openresty
header: Date: Tue, 25 Feb 2020 10:56:09 GMT
header: Content-Type: text/html; charset=utf-8
header: Content-Length: 399
header: Connection: keep-alive
header: X-RateLimit-Limit: 40
header: X-RateLimit-Remaining: 36
header: Location: http://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
The Location URL returned here is http, which then results in a request to that URL, which returns an HTTP 400 Bad Request:
send: b'GET /planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY HTTP/1.1\r\nHost: api.nasa.gov\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:urllib3.connectionpool:http://api.nasa.gov:80 "GET /planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY HTTP/1.1" 400 None
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Server: openresty
header: Date: Tue, 25 Feb 2020 10:56:09 GMT
Looking at your URL vs. the one that it tells you to use, they are different.
Your URL : https://api.nasa.gov/planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
Their URL: https://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
It looks like you are missing a / after the word imagery. When I use the URL they suggest, I get data back like:
b'{\n "date": "2014-02-04T03:30:01", \n "id": "LC8_L1T_TOA/LC81270592014035LGN00", \n "resource": {\n "dataset": "LC8_L1T_TOA", \n "planet": "earth"\n }, \n "service_version": "v1", \n "url": "https://earthengine.googleapis.com/api/thumb?thumbid=1e37797ab6e6638b5a0d02392acb479f&token=dc7d50c412dd5dcd7b014d52f0a1f91c"\n}'
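For reference, the wire-level output shown above can be reproduced by turning on http.client's debug level and standard logging before making the request; a minimal sketch:
import logging
import http.client
import requests

# Print the raw send/reply lines for every HTTP connection requests opens.
http.client.HTTPConnection.debuglevel = 1
# Surface urllib3's connection-pool debug messages as well.
logging.basicConfig(level=logging.DEBUG)

r = requests.get("https://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY")
print(r.status_code)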

Login to website using http.client

I am trying to login to a website using http.client in Python using the following code:
import urllib.parse
import http.client
payload = urllib.parse.urlencode({"username": "USERNAME-HERE",
                                  "password": "PASSWORD-HERE",
                                  "redirect": "index.php",
                                  "sid": "",
                                  "login": "Login"})
conn = http.client.HTTPConnection("osu.ppy.sh:80")
conn.request("POST", "/forum/ucp.php?mode=login", payload)
response = conn.getresponse()
data = response.read()
# print the HTML after the request
print(bytes(str(data), "utf-8").decode("unicode_escape"))
I know that a common suggestion is to just use the Requests library, and I have tried this, but I specifically want to know how to do this without using Requests.
The behavior I am looking for can be replicated with the following code that successfully logs in to the site using Requests:
import requests
payload = {"username": "USERNAME-HERE",
           "password": "PASSWORD-HERE",
           "redirect": "index.php",
           "sid": "",
           "login": "Login"}
p = requests.Session().post('https://osu.ppy.sh/forum/ucp.php?mode=login', payload)
# print the HTML after the request
print(p.text)
It seems to me that the http.client code is not "delivering" the payload, while the Requests code is.
Any insights? Am I overlooking something?
EDIT: Adding conn.set_debuglevel(1) gives the following output:
send: b'POST /forum/ucp.php?mode=login HTTP/1.1
Host: osu.ppy.sh
Accept-Encoding: identity
Content-Length: 70'
send: b'redirect=index.php&sid=&login=Login&username=USERNAME-HERE&password=PASSWORD-HERE'
reply: 'HTTP/1.1 200 OK'
header: Date
header: Content-Type
header: Transfer-Encoding
header: Connection
header: Set-Cookie
header: Cache-Control
header: Expires
header: Pragma
header: X-Frame-Options
header: X-Content-Type-Options
header: Server
header: CF-RAY
Since you are URL-encoding your payload, you must send the matching Content-Type header: application/x-www-form-urlencoded.
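A sketch of the original http.client code with only that header added (everything else unchanged):
import urllib.parse
import http.client

payload = urllib.parse.urlencode({"username": "USERNAME-HERE",
                                  "password": "PASSWORD-HERE",
                                  "redirect": "index.php",
                                  "sid": "",
                                  "login": "Login"})
# Tell the server how the body is encoded; without this it may ignore the form fields.
headers = {"Content-Type": "application/x-www-form-urlencoded"}

conn = http.client.HTTPConnection("osu.ppy.sh:80")
conn.request("POST", "/forum/ucp.php?mode=login", payload, headers)
response = conn.getresponse()
print(response.status, response.reason)
print(response.read().decode("utf-8", errors="replace"))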

Recreate curl command that sends JSON as multipart/form-data using Python-Requests

I'm trying to create a Python-Requests version of the following curl POST command (which works perfectly and provides the expected response):
curl -F 'json={"method":"update_video","params":{"video":{"id":"129263001","itemState":"INACTIVE"},"token":"jCoXH5OAMYQtXm1sg62KAF3ysG90YLagDAdlhg.."}}' https://api.somewebservice.com/services/post
Using:
curl -v -F 'json={"method":"update_video","params":{"video":{"id":"582984001","itemState":"INACTIVE"},"token":"jCoXH5OAMYQtXm1sg62KAF3ysG90YLagEECDAdlhg.."}}' https://api.somewebservice.com/services/post
I get the following (only including output after all the TLS handshakes, server certificate data, etc):
....
> POST /services/post HTTP/1.1
> User-Agent: curl/7.41.0
> Host: api.somewebservice.com
> Accept: */*
> Content-Length: 294
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=------------------------871a9aa84d3c0de2
>
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Content-Type: application/json;charset=UTF-8
< Content-Length: 1228
< Date: Sun, 10 Apr 2016 07:04:00 GMT
< Server: somewebservice
Given that the above cURL command works perfectly, and given this verbose output, am I correct in assuming that I need to take a multipart/form-data approach that sends a JSON object as a form field if I'm trying to recreate this using Python-Requests?
So far, I've tried:
import requests
import json
def deactivate_request():
    url = "https://api.somewebservice.com/services/post"
    headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
    payload = {"method":"update_video","params":{"video":{"id":"12926301","itemState":"INACTIVE"},"token":"jCoXH5OKAF3ysG90YLagEECTP16uOUSg_fEGDAdlhg.."}}
    r = requests.post(url, json=payload, headers=headers)
    print(r.text)
I've tried different variations too, like:
r = requests.post(url, data=json.dumps(payload), headers=headers)
or without headers, like this:
r = requests.post(url, data=json.dumps(payload))
or this:
r = requests.post(url, json=payload)
And nothing seems to work; I just keep getting the same error message:
{"error": {"name":"MissingJSONError","message":"Could not find JSON-RPC.","code":211}, "result": null, "id": null}
The documentation for this web service for that "211" error states that:
We got a null string for either the json parameter (for a non-multipart post) or the first part of a multipart post.
What am I doing wrong here in terms of recreating this cURL request using the Requests module? I thought that I could send the payload object as form-encoded data, and it looks like that is what the cURL command is doing with the -F argument there.
Apparently this curl command can be recreated with the following:
import requests
def deactivate_request():
    url = "https://api.somewebservice.com/services/post"
    print(url)
    #headers = {"Authorization": "Bearer " + token, "Content-Type": "application/json"}
    headers = {'Content-Type': 'application/json'}
    print(headers)
    payload = 'json={"method":"update_video", "params":{"video":{"id":"620001", "itemState":"INACTIVE"}, "token":"jCoXH5OAMYQtXm1sg62KAF3yECTP16uOUSg_fEGDAdlhg.."}}'
    # using params instead of data because we are making this POST request by
    # constructing a query string URL with key/value pairs in it
    r = requests.post(url, params=payload, headers=headers)
Not quite obvious, as the curl command uses 'multipart/form-data' in its header, whereas the above just uses 'application/json'.
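If you want to mirror the multipart/form-data body that curl -F builds more literally, requests can also send the JSON string as a named form part through the files argument. This is a sketch under the assumption that the service only needs a text part called json, as in the curl command; it is not a confirmed alternative from the service's documentation:
import json
import requests

url = "https://api.somewebservice.com/services/post"
payload = {"method": "update_video",
           "params": {"video": {"id": "620001", "itemState": "INACTIVE"},
                      "token": "jCoXH5OAMYQtXm1sg62KAF3yECTP16uOUSg_fEGDAdlhg.."}}

# A (None, value) tuple makes requests emit a plain form field named 'json'
# inside a multipart/form-data body, much like curl -F 'json=...'.
files = {"json": (None, json.dumps(payload))}
r = requests.post(url, files=files)
print(r.text)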

Retrieve ALL header data with urllib

I've scraped many websites and have often wondered why the response headers displayed in Firebug and the response headers returned by urllib.urlopen(url).info() are often different, in that Firebug reports MORE headers.
I encountered an interesting one today. I'm scraping a website by following a "search url" that fully loads (returns a 200 status code) before redirecting to a final page. The easiest way to perform the scrape would be to read the Location response header and make another request. However, that particular header is absent when I run urllib.urlopen(url).info().
Here is the difference:
Firebug headers:
Cache-Control : no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection : keep-alive
Content-Encoding : gzip
Content-Length : 2433
Content-Type : text/html
Date : Fri, 05 Oct 2012 15:59:31 GMT
Expires : Thu, 19 Nov 1981 08:52:00 GMT
Location : /catalog/display/1292/index.html
Pragma : no-cache
Server : Apache/2.0.55
Set-Cookie : PHPSESSID=9b99dd9a4afb0ef0ca267b853265b540; path=/
Vary : Accept-Encoding,User-Agent
X-Powered-By : PHP/4.4.0
Headers returned by my code:
Date: Fri, 05 Oct 2012 17:16:23 GMT
Server: Apache/2.0.55
X-Powered-By: PHP/4.4.0
Set-Cookie: PHPSESSID=39ccc547fc407daab21d3c83451d9a04; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding,User-Agent
Content-Type: text/html
Connection: close
Here's my code:
from BeautifulSoup import BeautifulSoup
import urllib
import psycopg2
import psycopg2.extras
import scrape_tools

tools = scrape_tools.tool_box()
db = tools.db_connect()
cursor = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
cursor.execute("SELECT data FROM table WHERE variable = 'Constant' ORDER BY data")
for row in cursor:
    url = 'http://www.website.com/search/' + row['data']
    headers = {
        'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Encoding' : 'gzip, deflate',
        'Accept-Language' : 'en-us,en;q=0.5',
        'Connection' : 'keep-alive',
        'Host' : 'www.website.com',
        'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1'
    }
    post_params = {
        'query' : row['data'],
        'searchtype' : 'products'
    }
    post_args = urllib.urlencode(post_params)
    soup = tools.request(url, post_args, headers)
    print tools.get_headers(url, post_args, headers)
Please note: scrape_tools is a module I wrote myself. The code contained in the module to retrieve headers is (basically) as follows:
class tool_box:
    def get_headers(self, url, post, headers):
        file_pointer = urllib.urlopen(url, post, headers)
        return file_pointer.info()
Is there a reason for the discrepancy? Am I making a silly mistake in my code? How can I retrieve the missing header data? I'm fairly new to Python, so please forgive any dumb errors.
Thanks in advance. Any advice is much appreciated!
Also...Sorry about the wall of code =\
You're not getting the same kind of response for the two requests. For example, the response to the Firefox request contains a Location: header, so it's probably a 302 Moved Temporarily or a 301. Those don't contain any actual body data, but instead cause Firefox to issue a second request to the URL in the Location: header (urllib doesn't do that).
The Firefox response also uses Connection : keep-alive while the urllib request got answered with Connection: close.
Also, the Firefox response is gzipped (Content-Encoding: gzip), while the urllib one is not. That's probably because your Firefox sends an Accept-Encoding: gzip, deflate header with its request.
Don't rely on Firebug to tell you HTTP headers (even though it does so truthfully most of the time); use a sniffer like Wireshark to inspect what's actually going over the wire.
You're obviously dealing with two different responses.
There could be several reasons for this. For one, web servers are supposed to respond differently depending on which Accept-Language, Accept-Encoding, etc. headers the client sends in its request. Then there's also the possibility that the server does some kind of User-Agent sniffing.
Either way, capture your urllib requests as well as the Firefox ones with Wireshark, and first compare the requests themselves (not the headers, but the actual GET / HTTP/1.0 part). Are they really the same? If so, move on to comparing request headers, and start manually setting the same headers on the urllib request until you figure out which headers make a difference.
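As a starting point for that comparison, here is a sketch, assuming Python 2 as in the question, of issuing the request with urllib2 instead, since urllib2.Request accepts a headers mapping and lets you print exactly what came back (the URL, form fields, and headers below are placeholders taken from the question):
import urllib
import urllib2

url = 'http://www.website.com/search/term'
post_args = urllib.urlencode({'query': 'term', 'searchtype': 'products'})
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1',
    'Accept-Encoding': 'gzip, deflate'
}

# urllib2.Request carries the headers on the actual request, so what goes over
# the wire matches what you compare against in Wireshark.
req = urllib2.Request(url, post_args, headers)
resp = urllib2.urlopen(req)

print resp.getcode()   # final status code after any redirects urllib2 followed
print resp.info()      # the response headers urllib2 received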
