Python 3.4 HTTP Error 505 retrieving json from url - python

I am trying to connect to a page that takes in some values and returns data in JSON format, in Python 3.4 using urllib. I want to save the values returned from the JSON into a CSV file.
This is what I tried...
import json
import urllib.request
url = 'my_link/select?wt=json&indent=true&f=value'
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
data = response.read()
I am getting the error below:
urllib.error.HTTPError: HTTP Error 505: HTTP Version Not Supported
EDIT: Found a solution to my problem. I answered it below.

You have found a server that apparently doesn't want to talk HTTP/1.1. You could try lying to it by claiming you are using an HTTP/1.0 client instead, by patching the http.client.HTTPConnection class:
import http.client
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
and re-trying your request.
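A minimal sketch of the patch plus retry; the URL is a placeholder, and the actual network call is left commented out:

```python
import http.client
import urllib.request

# Patch the underlying connection class so urllib speaks HTTP/1.0,
# since this server answers HTTP/1.1 requests with a 505.
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

url = 'http://example.com/select?wt=json&indent=true&f=value'  # placeholder
request = urllib.request.Request(url)
# response = urllib.request.urlopen(request)  # now sent as HTTP/1.0
# data = response.read()
```

Note the patch is global: every HTTPConnection in the process will use HTTP/1.0 until you restore the original attribute values.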

I used FancyURLopener and it worked. Found this useful: docs.python.org: urllib.request
url_request = urllib.request.FancyURLopener({})
with url_request.open(url) as url_opener:
    json_data = url_opener.read().decode('utf-8')
with open(file_output, 'w', encoding='utf-8') as output:
    output.write(json_data)
Hope this helps those having the same problems as mine.
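Since the original goal was to write the returned values into a CSV file, here is one way to finish the job. The JSON structure below is hypothetical (a docs key holding a list of flat dicts); adjust the key and field names to match your actual response:

```python
import csv
import json

# json_data stands in for the string downloaded above; the structure
# here is made up for illustration.
json_data = '{"docs": [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]}'
docs = json.loads(json_data)['docs']

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=sorted(docs[0]))
    writer.writeheader()
    writer.writerows(docs)
```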

Related

Artifactory AQL Query Using Python 3 'Requests'

Would anybody be able to help me identify where I am going wrong with a very basic AQL query I am trying to execute using Python 3 and the 'Requests' library?
No matter what I do, I can't seem to get past a 400: Bad Request. It is clearly something to do with the formatting of the data I am trying to POST, but I just can't see it. I'm assuming I want to pass it as a string, as when posting SQL queries they have to be in plain text.
Just as an aside, I can execute this query absolutely fine using Postman etc.
import requests
from requests.auth import HTTPBasicAuth
import json
HEADERS = {'Content-type': 'text'}
DATA = 'items.find({"type":"folder","repo":{"$eq":"BOS-Release-Builds"}}).sort({"$desc":["created"]}).limit(1)'
response = requests.post('https://local-Artifactory/artifactory/api/search/aql', headers=HEADERS, auth=HTTPBasicAuth('user', 'password'), data=DATA)
print(response.status_code)
# print(response.json())
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
jprint(response.json())
print(response.url)
print(response.encoding)
OK, after staring at the code for another few minutes I spotted my mistake.
I shouldn't have defined the Content-Type header in the request; instead I should let the 'Requests' library deal with it.
So the code should look like:
import requests
from requests.auth import HTTPBasicAuth
import json
DATA = 'items.find({"type":"folder","repo":{"$eq":"BOS-Release-Builds"}}).sort({"$desc":["created"]}).limit(1)'
response = requests.post('https://uk-artifactory.flowbird.group/artifactory/api/search/aql', auth=HTTPBasicAuth('user', 'password!'), data=DATA)
print(response.status_code)
# print(response.json())
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
jprint(response.json())
print(response.url)
print(response.encoding)
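Why dropping the header works: for a plain string body, Requests sets no Content-Type header at all (leaving the server to assume a sensible default), whereas 'text' on its own is not a valid MIME type. You can see this without any network traffic by inspecting prepared requests; the URL below is a placeholder:

```python
import requests

AQL = 'items.find({"type":"folder"})'  # trimmed-down query string

# A plain string body: requests sets no Content-Type header at all.
prepared = requests.Request(
    'POST', 'https://artifactory.example/api/search/aql', data=AQL).prepare()
print('Content-Type' in prepared.headers)  # False

# A dict body: requests form-encodes it and sets the header itself.
prepared_form = requests.Request(
    'POST', 'https://artifactory.example/api/search/aql', data={'q': 'x'}).prepare()
print(prepared_form.headers['Content-Type'])  # application/x-www-form-urlencoded
```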

cannot access URL due to SSL module not available Python

This is my first time trying to use the Python 3.6 requests library's get() function with data from quandl.com, and to load and dump the JSON.
import json
import requests
request = requests.get("https://www.quandl.com/api/v3/datasets/CHRIS/MX_CGZ2.json?api_key=api_keyxxxxx", verify=False)
request_text = request.text  # .text is a property, not a method
data = json.loads(request_text)
data_serialized = json.dumps(data)
print(data_serialized)
I have an account at quandl.com to access the data. The error when the Python program is run on the command line says "cannot connect to HTTPS URL because SSL module is not available."
import requests
import urllib3
urllib3.disable_warnings()
r = requests.get(
    "https://www.quandl.com/api/v3/datasets/CHRIS/MX_CGZ2.json?api_key=api_keyxxxxx").json()
print(r)
Output will be the following, since I don't have an API key:
{'quandl_error': {'code': 'QEAx01', 'message': 'We could not recognize your API key. Please check your API key and try again.'}}
You don't need to import the json module, as requests already has JSON support built in.
Although I had verified my Quandl API key, I received the following error when trying to retrieve data with print and json.dumps: "quandl_error": {"code": "QEAx01" ... I discovered that incorrect fonts on the quotation marks around the key code in the .env file caused this error. Check language settings and fonts before making requests after this error message.
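The "incorrect fonts" were most likely typographic (curly) quotes pasted in from a word processor; unlike ASCII quotes, they are not stripped by the .env parser and end up inside the key itself. A quick sanity check, with made-up .env lines:

```python
# Curly "smart" quotes are not the ASCII quotes a .env parser strips,
# so they become part of the API key itself. Sample lines are made up.
SMART_QUOTES = '\u201c\u201d\u2018\u2019'  # typographic double/single quotes

def has_smart_quotes(line):
    return any(ch in SMART_QUOTES for ch in line)

bad_line = 'QUANDL_API_KEY=\u201capi_keyxxxxx\u201d'  # pasted from a word processor
good_line = 'QUANDL_API_KEY="api_keyxxxxx"'

print(has_smart_quotes(bad_line))   # True
print(has_smart_quotes(good_line))  # False
```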

Indexing to elasticsearch 6.1 with python script

I am trying to index to elasticsearch using a python 2.7 script as follows:
from __future__ import print_function
import urllib, urllib2
#FORMDATA is a json format string that has been taken from a file
ELASTIC_URL = 'http://1.2.3.9:9200/indexname/entry/'
req = urllib2.Request(ELASTIC_URL)
req.add_header('contentType', 'application/x-www-form-urlencoded')
response = urllib2.urlopen(req, FORMDATA, timeout=4).read()
print(response)
I keep getting the error HTTP Error 406: Not Acceptable: HTTPError
I have also tried formatting the data with urllib.quote(FORMDATA) and get the same error. The data is not a dictionary; it is a string that, when converted to JSON, is multi-dimensional.
I think this is something to do with the fact that the request header needs to specify the correct content type, but I'm struggling to work out what that is. I managed to do this import on Elasticsearch 5.x, but now on 6.1 it doesn't seem to be working.
Any ideas?
Almost all elasticsearch API calls use Content-Type: application/json in the headers - this should be what you need here.
Also be aware that if you are submitting data, this will need to be in the form of a POST (or a PUT if generating your own id), not a GET request: https://www.elastic.co/guide/en/elasticsearch/guide/current/index-doc.html
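A sketch of both fixes together, written in Python 3 syntax (the endpoint and the document are placeholders, and the network call itself is commented out):

```python
import json
import urllib.request

ELASTIC_URL = 'http://1.2.3.9:9200/indexname/entry/'   # placeholder endpoint
doc = json.dumps({'field': 'value'}).encode('utf-8')   # hypothetical document

# POST the document with the proper header name and value.
req = urllib.request.Request(ELASTIC_URL, data=doc, method='POST')
req.add_header('Content-Type', 'application/json')  # not 'contentType'
# response = urllib.request.urlopen(req, timeout=4).read()  # network call omitted

print(req.get_method())                  # POST
print(req.get_header('Content-type'))    # application/json
```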

making a simple GET/POST with url Encoding python

I have a custom URL of the form
http://somekey:somemorekey@host.com/getthisfile.json
I have tried every way I know, but keep getting errors:
Method 1:
from httplib2 import Http
ipdb> from urllib import urlencode
h=Http()
ipdb> resp, content = h.request("3b8138fedf8:1d697a75c7e50@abc.myshopify.com/admin/shop.json")
Error:
No help on =Http()
Got this method from here
Method 2:
import urllib
urllib.urlopen(url).read()
Error:
*** IOError: [Errno url error] unknown url type: '3b8108519e5378'
I guess something is wrong with the encoding. I tried:
ipdb> url.encode('idna')
*** UnicodeError: label empty or too long
Is there any way to make this complex URL GET call easy?
You are using a PDB-based debugger instead of an interactive Python prompt. h is a command in PDB. Use ! to prevent PDB from trying to interpret the line as a command:
!h = Http()
urllib requires that you pass it a fully qualified URL; your URL is lacking a scheme:
urllib.urlopen('http://' + url).read()
Your URL does not appear to use any international characters in the domain name, so you do not need to use IDNA encoding.
You may want to look into the 3rd-party requests library; it makes interacting with HTTP servers that much easier and straightforward:
import requests
r = requests.get('http://abc.myshopify.com/admin/shop.json', auth=("3b8138fedf8", "1d697a75c7e50"))
data = r.json() # interpret the response as JSON data.
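For reference, the username:password pair embedded in the original URL maps directly onto that auth tuple; urllib.parse can split it out (Python 3 shown):

```python
from urllib.parse import urlsplit

# The userinfo section of the URL holds the credentials that requests
# expects in its auth tuple.
url = 'http://3b8138fedf8:1d697a75c7e50@abc.myshopify.com/admin/shop.json'
parts = urlsplit(url)

print(parts.username)  # 3b8138fedf8
print(parts.password)  # 1d697a75c7e50
print(parts.hostname)  # abc.myshopify.com
```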
The current de facto HTTP library for Python is Requests.
import requests
response = requests.get(
    "http://abc.myshopify.com/admin/shop.json",
    auth=("3b8138fedf8", "1d697a75c7e50")
)
response.raise_for_status()  # Raise an exception if an HTTP error occurs
print response.content  # Do something with the content.

How do I get HTTP header info without authentication using python?

I'm trying to write a small program that will simply display the header information of a website. Here is the code:
import urllib2
url = 'http://some.ip.add.ress/'
request = urllib2.Request(url)
try:
    html = urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.code
else:
    print html.info()
If 'some.ip.add.ress' is google.com then the header information is returned without a problem. However if it's an ip address that requires basic authentication before access then it returns a 401. Is there a way to get header (or any other) information without authentication?
I've worked it out.
After try has failed due to unauthorized access the following modification will print the header information:
print e.info()
instead of:
print e.code()
Thanks for looking :)
If you want just the headers, instead of using urllib2 you should go lower level and use httplib:
import httplib
conn = httplib.HTTPConnection(host)
conn.request("HEAD", path)
print conn.getresponse().getheaders()
If all you want are the HTTP headers, then you should make a HEAD, not a GET, request. You can see how to do this by reading Python - HEAD request with urllib2.
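In Python 3's urllib.request, the method can be set directly on the Request object; a sketch (the host is the placeholder from the question, so the actual call is left commented out):

```python
import urllib.request

# Build a HEAD request; the server then returns headers but no body.
req = urllib.request.Request('http://some.ip.add.ress/', method='HEAD')
print(req.get_method())  # HEAD
# response = urllib.request.urlopen(req)
# print(response.headers)
```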
