python requests and urllib https errors - python

I want to read the data from the nasa earth api, opening the url in the browser displays the data. When I try to make a GET request with python and urllib it throws an error.
request.urlopen("https://api.nasa.gov/planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY").read()
urllib.error.HTTPError: HTTP Error 400: Bad Request
When I try it with Requests. It returns an error.
r = requests.get("https://api.nasa.gov/planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY")
r.content is:
{"error": {"code": "HTTPS_REQUIRED", "message": "Requests must be made over HTTPS. Try accessing the API at: https://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY"}}
If i print out r.url it is http and not https:
http://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
I dunno why it happens i am using python 3.7. Any help is appreciated

I was able to reproduce your mistake. However, when I copied the link from the Nasa website, it worked:
r = requests.get("https://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY")
r.json()

Are you sure your URL is correct? When i run this in request with debug logging i see that the first request gets HTTP 301 redirected
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.nasa.gov:443
send: b'GET /planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY HTTP/1.1\r\nHost: api.nasa.gov\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 301 MOVED PERMANENTLY\r\n'
header: Server: openresty
header: Date: Tue, 25 Feb 2020 10:56:09 GMT
header: Content-Type: text/html; charset=utf-8
header: Content-Length: 399
header: Connection: keep-alive
header: X-RateLimit-Limit: 40
header: X-RateLimit-Remaining: 36
header: Location: http://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
The URL returned here is http which then results in a request to that which returns a HTTP 400 bad request
send: b'GET /planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY HTTP/1.1\r\nHost: api.nasa.gov\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:urllib3.connectionpool:http://api.nasa.gov:80 "GET /planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY HTTP/1.1" 400 None
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Server: openresty
header: Date: Tue, 25 Feb 2020 10:56:09 GMT
Looking at your URL vs the one that it tells you to use they are differnt.
Your URL : https://api.nasa.gov/planetary/earth/imagery?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
Their URL: https://api.nasa.gov/planetary/earth/imagery/?lon=100.75&lat=1.5&date=2014-02-01&api_key=DEMO_KEY
You look like you are missing a / after the word imagery. When i use the URL they suggest i get data back like
b'{\n "date": "2014-02-04T03:30:01", \n "id": "LC8_L1T_TOA/LC81270592014035LGN00", \n "resource": {\n "dataset": "LC8_L1T_TOA", \n "planet": "earth"\n }, \n "service_version": "v1", \n "url": "https://earthengine.googleapis.com/api/thumb?thumbid=1e37797ab6e6638b5a0d02392acb479f&token=dc7d50c412dd5dcd7b014d52f0a1f91c"\n}'

Related

How to fix <Response 500> error in python requests?

I am using an API, which receives a pdf file and does some analysis, but I am receiving Response 500 always
Have initially tested using Postman and the request goes through, receiving response 200 with the corresponding JSON information. The SSL security should be turned off.
However, when I try to do request via Python, I always get Response 500
Python code written by me:
import requests
url = "https://{{BASE_URL}}/api/v1/documents"
fin = open('/home/train/aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf', 'rb')
files = {'file': fin}
r = requests.post(url, files=files, verify=False)
print (r)
#r.text is empty
Python code, produced by the Postman:
import requests
url = "https://{{BASE_URL}}/api/v1/documents"
payload = "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\nContent-Disposition: form-data; name=\"file\"; filename=\"aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf\"\r\nContent-Type: application/pdf\r\n\r\n\r\n------WebKitFormBoundary7MA4YWxkTrZu0gW--"
headers = {
'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
'Content-Type': "application/x-www-form-urlencoded",
'cache-control': "no-cache",
'Postman-Token': "65f888e2-c1e6-4108-ad76-f698aaf2b542"
}
response = requests.request("POST", url, data=payload, headers=headers)
print(response.text)
Have masked the API link as {{BASE_URL}} due to the confidentiality
Response by Postman:
{
"id": "5e69058e2690d5b0e519cf4006dfdbfeeb5261b935094a2173b2e79a58e80ab5",
"name": "aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf",
"fileIds": {
"original": "5e69058e2690d5b0e519cf4006dfdbfeeb5261b935094a2173b2e79a58e80ab5.pdf"
},
"creationDate": "2019-06-20T09:41:59.5930472+00:00"
}
Response by Python:
Response<500>
UPDATE:
Tried the GET request - works fine, as I receive the JSON response from it. I guess the problem is in posting pdf file. Is there any other options on how to post a file to an API?
Postman Response RAW:
POST /api/v1/documents
Content-Type: multipart/form-data; boundary=--------------------------375732980407830821611925
cache-control: no-cache
Postman-Token: 3e63d5a1-12cf-4f6b-8f16-3d41534549b9
User-Agent: PostmanRuntime/7.6.0
Accept: */*
Host: {{BASE_URL}}
cookie: c2b8faabe4d7f930c0f28c73aa7cafa9=736a1712f7a3dab03dd48a80403dd4ea
accept-encoding: gzip, deflate
content-length: 3123756
file=[object Object]
HTTP/1.1 200
status: 200
Date: Thu, 20 Jun 2019 10:59:55 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Location: /api/v1/files/95463e88527ecdc94393fde685ab1d05fa0ee0b924942f445b14b75e983c927e
api-supported-versions: 1.0
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Referrer-Policy: strict-origin
{"id":"95463e88527ecdc94393fde685ab1d05fa0ee0b924942f445b14b75e983c927e","name":"aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf","fileIds":{"original":"95463e88527ecdc94393fde685ab1d05fa0ee0b924942f445b14b75e983c927e.pdf"},"creationDate":"2019-06-20T10:59:55.7038573+00:00"}
CORRECT REQUEST
So, eventually - the correct code is the following:
import requests
files = {
'file': open('/home/train/aab2wieuqcnvn3g6syadumik4bsg5.0062.pdf', 'rb'),
}
response = requests.post('{{BASE_URL}}/api/v1/documents', files=files, verify=False)
print (response.text)
A 500 error indicates an internal server error, not an error with your script.
If you're receiving a 500 error (as opposed to a 400 error, which indicates a bad request), then theoretically your script is fine and it's the server-side code that needs to be adjusted.
In practice, it could still be due a bad request though.
If you're the one running the API, then you can check the error logs and debug the code line-by-line to figure out why the server is throwing an error.
In this case though, it sounds like it's a third-party API, correct? If so, I recommend looking through their documentation to find a working example or contacting them if you think it's an issue on their end (which is unlikely but possible).

Login to website using http.client

I am trying to login to a website using http.client in Python using the following code:
import urllib.parse
import http.client
payload = urllib.parse.urlencode({"username": "USERNAME-HERE",
"password": "PASSWORD-HERE",
"redirect": "index.php",
"sid": "",
"login": "Login"})
conn = http.client.HTTPConnection("osu.ppy.sh:80")
conn.request("POST", "/forum/ucp.php?mode=login", payload)
response = conn.getresponse()
data = response.read()
# print the HTML after the request
print(bytes(str(data), "utf-8").decode("unicode_escape"))
I know that a common suggestion is to just use the Requests library, and I have tried this, but I specifically want to know how to do this without using Requests.
The behavior I am looking for can be replicated with the following code that successfully logs in to the site using Requests:
import requests
payload = {"username": "USERNAME-HERE",
"password": "PASSWORD-HERE",
"redirect": "index.php",
"sid": "",
"login": "Login"}
p = requests.Session().post('https://osu.ppy.sh/forum/ucp.php?mode=login', payload)
# print the HTML after the request
print(p.text)
It seems to me that the http.client code is not "delivering" the payload, while the Requests code is.
Any insights? Am I overlooking something?
EDIT: Adding conn.set_debuglevel(1) gives the following output:
send: b'POST /forum/ucp.php?mode=login HTTP/1.1
Host: osu.ppy.sh
Accept-Encoding: identity
Content-Length: 70'
send: b'redirect=index.php&sid=&login=Login&username=USERNAME-HERE&password=PASSWORD-HERE'
reply: 'HTTP/1.1 200 OK'
header: Date
header: Content-Type
header: Transfer-Encoding
header: Connection
header: Set-Cookie
header: Cache-Control
header: Expires
header: Pragma
header: X-Frame-Options
header: X-Content-Type-Options
header: Server
header: CF-RAY
since you are urlencoding your payload, you must send the header: application/x-www-form-urlencoded

Python library requests open the wrong page

I try to open a html page with python requests library but my code open the site root folder and i don't understand how solve the problem.
import requests
scraping = requests.request("POST", url = "http://www.pollnet.it/WeeklyReport_it.aspx?ID=69")
print scraping.content
Thank you for all suggestion!
You can see easily that the server is redirecting to the main page.
➜ ~ http -v http://www.pollnet.it/WeeklyReport_it.aspx\?ID\=69
GET /WeeklyReport_it.aspx?ID=69 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: www.pollnet.it
User-Agent: HTTPie/0.9.3
HTTP/1.1 302 Found
Content-Length: 131
Content-Type: text/html; charset=utf-8
Date: Sun, 07 Feb 2016 11:24:52 GMT
Location: /default.asp
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
On further checking, it can be seen that the web server uses session cookies.
➜ ~ http -v http://www.pollnet.it/default_it.asp
HTTP/1.1 200 OK
Cache-Control: private
Content-Encoding: gzip
Content-Length: 9471
Content-Type: text/html; Charset=utf-8
Date: Sun, 07 Feb 2016 13:21:41 GMT
Server: Microsoft-IIS/7.5
Set-Cookie: ASPSESSIONIDSQTSTAST=PBHDLEIDFCNMPKIGANFDNMLK; path=/
Vary: Accept-Encoding
X-Powered-By: ASP.NET
It means that every time the main page is visited, the server sends a "Set-Cookie" header, which instructs the browser to set certain cookies. Then every time the browser asks for a Weekly Report, the server validates the session cookie.
Normally. requests package does not save cookies in between requests, but to do the scraping, we can use a Session object which will save the cookies in between page requests.
import requests
# create a Session object
s= requests.Session()
# first visit the main page
s.get("http://www.pollnet.it/default_it.asp")
# then we can visit the weekly report pages
r = s.get("http://www.pollnet.it/WeeklyReport_it.aspx?ID=69")
print(r.text)
# another page
r = s.get("http://www.pollnet.it/WeeklyReport_it.aspx?ID=89")
print(r.text)
But here is some advice - the web server may only allow opening of a fixed number of pages (maybe 10, maybe 15) with a certain Session object. Either immediately validate the results of r.text each time (maybe check the length of the request body to ensure it is not too small), or create a new Session object, for every 5 or 6 pages.
More info on Session objects here.

How is json data posted?

My flask code is as follows:
#app.route('/sheets/api',methods=["POST"])
def insert():
if request.get_json():
return "<h1>Works! </h1>"
else:
return "<h1>Does not work.</h1>"
When the request is:
POST /sheets/api HTTP/1.1
Host: localhost:10080
Cache-Control: no-cache
{'key':'value'}
The result is <h1>Does not work.</h1>.
When I added a Content-Type header:
POST /sheets/api HTTP/1.1
Host: localhost:10080
Content-Type: application/json
Cache-Control: no-cache
{'key':'value'}
I get a 400 error.
What am I doing wrong?
You are not posting valid JSON. JSON strings use double quotes:
{"key":"value"}
With single quotes the string is not valid JSON and a 400 Bad Request response is returned.
Demo against a local Flask server implementing only your route:
>>> import requests
>>> requests.post('http://localhost:5000/sheets/api', data="{'key':'value'}", headers={'content-type': 'application/json'}).text
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>400 Bad Request</title>\n<h1>Bad Request</h1>\n<p>The browser (or proxy) sent a request that this server could not understand.</p>\n'
>>> requests.post('http://localhost:5000/sheets/api', data='{"key":"value"}', headers={'content-type': 'application/json'}).text
'<h1>Works! </h1>'

Mechanize for Python and authorization without first getting a 401 error

I'm trying to use Mechanize to automate interactions with a very picky legacy system. In particular, after the first login page the authorization must be sent with every request it knocks you out of the system. Unfortunately, Mechanize seems content on only sending the authorization after first getting a 401 Unauthorized error. Is there any way to have it send authorization every time?
Here's some sample code:
br.add_password("http://example.com/securepage", "USERNAME", "PASSWORD", "/MYREALM")
br.follow_link(link_to_secure_page) # where the url is the previous URL
Here's the response I get from debugging Mechanize:
send: 'GET /securepage HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: example.com\r\nReferer: http://example.com/home\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 401 Unauthorized\r\n'
header: Server: Tandy1000Web
header: Date: Thu, 08 Dec 2011 03:08:04 GMT
header: Connection: close
header: Expires: Tue, 01 Jan 1980 06:00:00 GMT
header: Content-Type: text/html; charset=US-ASCII
header: Content-Length: 210
header: WWW-Authenticate: Basic realm="/MYREALM"
header: Cache-control: no-cache
send: 'GET /securepage HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: example.com\r\nReferer: http://example.com/home\r\nConnection: close\r\nAuthorization: Basic VVNFUk5BTUU6UEFTU1dPUkQ=\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Tandy1000Web
header: Date: Thu, 08 Dec 2011 03:08:07 GMT
header: Connection: close
header: Last-Modified: Thu, 08 Dec 2011 03:08:06 GMT
header: Expires: Tue, 01 Jan 1980 06:00:00 GMT
header: Content-Type: text/html; charset=UTF-8
header: Content-Length: 33333
header: Cache-control: no-cache
The problem is that contrary to what should happen in modern web application with a GET request, by hitting the 401 error first I get the wrong page. I've confirmed with CURL and urllib2 that if I hit the URL directly by passing in the auth header on the first request I get the correct page.
Any hints on how to tell mechanize to always send the auth headers and avoid the first 401 error? This needs to be fixed on the client side. I can't modify the server.
from base64 import b64encode
import mechanize
url = 'http://192.168.3.5/table.js'
username = 'admin'
password = 'password'
# I have had to add a carriage return ('%s:%s\n'), but
# you may not have to.
b64login = b64encode('%s:%s' % (username, password))
br = mechanize.Browser()
# # I needed to change to Mozilla for mine, but most do not
# br.addheaders= [('User-agent', 'Mozilla/5.0')]
br.addheaders.append(
('Authorization', 'Basic %s' % b64login )
)
br.open(url)
r = br.response()
data = r.read()
print data

Categories