I am trying to read JSON-formatted data from the following public URL: http://ws-old.parlament.ch/factions?format=json. Unfortunately, I have not been able to convert the response to JSON, as I always get HTML-formatted content back from my request. The request seems to completely ignore the format=json parameter passed in the URL:
import urllib.request
response = urllib.request.urlopen('http://ws-old.parlament.ch/factions?format=json')
response_text = response.read()
print(response_text) #why is this HTML?
Does somebody know how I can get the JSON-formatted content as displayed in the web browser?
You need to add "Accept": "text/json" to the request headers.
For example, using the requests package:
import requests
r = requests.get('http://ws-old.parlament.ch/factions?format=json',
                 headers={'Accept': 'text/json'})
print(r.json())
Result:
[{'id': 3, 'updated': '2022-02-22T14:59:17Z', 'abbreviation': ...
Sorry to say, but these web services have a misleading implementation. The format query parameter is useless. As pointed out by @maciek97x, only the Accept: <format> header is considered for the formatting.
So you can directly call the endpoint without the ?format=json, as long as you send the header Accept: text/json, as sketched below.
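A minimal sketch of that header-only variant (same endpoint, no query parameter):
import requests
# Only the Accept header controls the response format for this service.
r = requests.get('http://ws-old.parlament.ch/factions',
                 headers={'Accept': 'text/json'})
print(r.json())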
I am trying to update an existing page in Atlassian Confluence through the Python requests module, using the requests.put() method to send the HTTP request. The page already has the title "Update Status", and I am trying to enter one line as the content of the page. The page id and the other information in the JSON payload were copied by me directly from the rest/api/content... output of the page I am trying to access.
Note: I am already able to read information from the page through requests.get, but I am not able to post information to it.
Method used to access information from the webpage which works:
response = requests.get('https://confluence.ai.com/rest/api/content/525424594?expand=body.storage',
                        auth=HTTPBasicAuth('svc-Automation@ai.com', 'AIengineering1#ai')).json()
Method used to update information on that page, which does not work; the response is an HTTP 415 (Unsupported Media Type) error.
import requests
from requests.auth import HTTPBasicAuth
url = "https://confluence.ai.com/rest/api/content/525424594"
payload = {"id": "525424594",
           "type": "page",
           "title": "new page-Update Status",
           "space": {"key": "TST"},
           "body": {"storage": {"value": "<p>This is the updated text for the new page</p>",
                                "representation": "storage"}},
           "version": {"number": 2}}
result = requests.put(url, data=payload, auth=HTTPBasicAuth('svc-Automation@ai.com', 'AIengineering1#ai'))
print(result)
I am guessing that the payload is not in the right format. Any suggestions?
Note: The link, username and password shown here are all fictional.
Try sending the data with the json named argument instead of data, so the requests module sets the Content-Type header to application/json for you.
result = requests.put(url, json=payload, auth=HTTPBasicAuth('svc-Automation@ai.com', 'AIengineering1#ai'))
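For reference, this is roughly what json= does for you under the hood; a sketch using the data argument with manual serialization and an explicit header:
import json
result = requests.put(url,
                      data=json.dumps(payload),  # serialize the dict to a JSON string
                      headers={'Content-Type': 'application/json'},  # the missing header behind the 415
                      auth=HTTPBasicAuth('svc-Automation@ai.com', 'AIengineering1#ai'))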
I'm sending a POST request, with python-requests in Python 3.5, using:
r = requests.post(apiEndpoint, data=jsonPayload, headers=headersToMergeIn)
I can inspect the headers and body after sending the request like this:
print(r.request.headers)
print(r.request.body)
Is there any way to inspect the full request headers (not just the ones I'm merging in) and body before sending the request?
Note: I need this for an API which requires me to build a hash from a subset of the headers, and another hash from the full body.
You probably want Prepared Requests.
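A minimal sketch, reusing apiEndpoint, jsonPayload and headersToMergeIn from your snippet; Session.prepare_request() merges the session-level defaults into the request, so you see the full headers and body before anything goes over the wire:
import requests
session = requests.Session()
req = requests.Request('POST', apiEndpoint, data=jsonPayload, headers=headersToMergeIn)
prepped = session.prepare_request(req)  # merges session defaults such as User-Agent and Accept-Encoding
print(prepped.headers)  # full headers, nothing sent yet
print(prepped.body)     # full body, nothing sent yet
# Build your hashes from prepped.headers / prepped.body, add them to prepped.headers, then send:
response = session.send(prepped)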
UPDATE:
It turned out to be an inconsistency in the responses of the Instagram graphql (unofficial) API, which requires authentication for some IDs but not for others on the same endpoint.
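For reference, a minimal sketch of the authenticated variant; the sessionid cookie name and its value are assumptions here, copied from a logged-in browser session:
import requests
url = 'https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058'
cookies = {'sessionid': '<value from a logged-in browser>'}  # placeholder, not a real session
response = requests.get(url, cookies=cookies)
print(response.json())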
I am issuing GET requests against the Instagram graphql endpoint. For some queries, the JSON response I get via the Python requests module is inconsistent with what I get via a browser for the same query.
For example this URL returns a JSON object containing 10 users as expected:
https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058
But when I request the same URL via the requests module like this:
import requests
url = 'https://www.instagram.com/graphql/query/?variables=%7B%22shortcode%22%3A+%22BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0%22%2C+%22first%22%3A+10%7D&query_id=17864450716183058'
response = requests.get(url)
The returned value, i.e. response.text, is {"data": {"shortcode_media": null}, "status": "ok"}, a practically empty response, which I suppose means something like the media ID did not match.
As a double check, comparing the original URL with the URL of the final response holds true, showing that the URL is not changed by the requests module in any way:
>>> response.url == url
True
This only happens for long media IDs such as BYRWPzFHUfg8r_s9UMtd6BtoI01RPGmviXaskI0. For shorter IDs, e.g. BZx5Zx9nHwS, the response returned by the requests module is the same as the one returned via the browser, as expected.
Rather than the length of the ID, I thought it might be a special character in the ID being encoded differently, such as the underscore. I tried encoding it as %5F, but that didn't work either.
Any ideas? Can it be a bug in the requests module?
I have another question, about POST requests.
This post should be almost identical to the one referenced on Stack Overflow in the question 'Using request.post to post multipart form data via python not working', but for some reason I can't get it to work. The website is http://www.camp.bicnirrh.res.in/predict/. I want to post a file that is already in FASTA format to this website and select the 'SVM' option, using requests in Python. This is based on what @NorthCat gave me previously, which worked like a charm:
import requests
session = requests.session()
file = {'file': open('Bishop/newdenovo2.txt', 'r').read()}
url = 'http://www.camp.bicnirrh.res.in/predict/hii.php'
payload = {"algo[]": "svm"}
response = session.post(url, files=file, data=payload)
print(response.text)
Since it's not working, I assumed the payload was the problem. I've been playing with the payload, but I can't get any of these to work.
payload = {'S1': str(data), 'filename': '', 'algo[]': 'svm'}  # where I tried just reading the file in, called 'data'
payload = {'svm': 'svm'}  # not actually in the headers, but I tried this too
payload = {'S1': '', 'algo[]': 'svm', 'B1': 'Submit'}
None of these payloads resulted in data.
Any help is appreciated. Thanks so much!
You need to set the file post variable name to "userfile", i.e.
file={'userfile':(open('Bishop/newdenovo2.txt','r').read())}
Note that the read() is unnecessary, but it doesn't prevent the file upload succeeding. Here is some code that should work for you:
import requests
session = requests.session()
response = session.post('http://www.camp.bicnirrh.res.in/predict/hii.php',
                        files={'userfile': ('fasta.txt', open('fasta.txt'), 'text/plain')},
                        data={'algo[]': 'svm'})
response.text contains the HTML results; save it to a file and view it in your browser, or parse it with something like Beautiful Soup to extract the results (see the sketch below).
In the request I've specified a mime type of "text/plain" for the file. This is not necessary, but it serves as documentation and might help the receiving server.
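If you go the parsing route, here is a minimal Beautiful Soup sketch; the layout of the CAMP results page is an assumption, so inspect the HTML and adjust the selection accordingly:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
# Assumption: the prediction results are rendered in HTML tables.
for table in soup.find_all('table'):
    print(table.get_text(' ', strip=True))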
The content of my fasta.txt file is:
>24.6jsd2.Tut
GGTGTTGATCATGGCTCAGGACAAACGCTGGCGGCGTGCTTAATACATGCAAGTCGAACGGGCTACCTTCGGGTAGCTAGTGGCGGACGGGTGAGTAACACGTAGGTTTTCTGCCCAATAGTGGGGAATAACAGCTCGAAAGAGTTGCTAATACCGCATAAGCTCTCTTGCGTGGGCAGGAGAGGAAACCCCAGGAGCAATTCTGGGGGCTATAGGAGGAGCCTGCGGCGGATTAGCTAGATGGTGGGGTAAAGGCCTACCATGGCGACGATCCGTAGCTGGTCTGAGAGGACGGCCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAAGGAATATTCCACAATGGCCGAAAGCGTGATGGAGCGAAACCGCGTGCGGGAGGAAGCCTTTCGGGGTGTAAACCGCTTTTAGGGGAGATGAAACGCCACCGTAAGGTGGCTAAGACAGTACCCCCTGAATAAGCATCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGATGCAAGCGTTGTCCGGATTTACTGGGCGTAAAGCGCGCGCAGGCGGCAGGTTAAGTAAGGTGTGAAATCTCCCTGCTCAACGGGGAGGGTGCACTCCAGACTGACCAGCTAGAGGACGGTAGAGGGTGGTGGAATTGCTGGTGTAGCGGTGAAATGCGTAGAGATCAGCAGGAACACCCGTGGCGAAGGCGGCCACCTGGGCCGTACCTGACGCTGAGGCGCGAAGGCTAGGGGAGCGAACGGGATTAGATACCCCGGTAGTCCTAGCAGTAAACGATGTCCACTAGGTGTGGGGGGTTGTTGACCCCTTCCGTGCCGAAGCCAACGCATTAAGTGGACCGCCTGGGGAGTACGGTCGCAAGACTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCAGCGGAGCGTGTGGTTTAATTCGATGCGACGCGAAGAACCTTACCTGGGCTTGACATGCTATCGCAACACCCTGAAAGGGGTGCCTCCTTCGGGACGGTAGCACAGATGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTGTCCTTAGTTGTATATCTAAGGAGACTGCCGGAGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCAGCATGGCTCTTACGTCCAGGGCTACACATACGCTACAATGGCCGTTACAGTGAGATGCCACACCGCGAGGTGGAGCAGATCTCCAAAGGCGGCCTCAGTTCAGATTGCACTCTGCAACCCGAGTGCATGAAGTCGGAGTTGCTAGTAACCGCGTGTCAGCATAGCGCGGTGAATATGTTCCCGGGTCTTGTACACACCGCCCGTCACGTCATGGGAGCCGGCAACACTTCGAGTCCGTGAGCTAACCCCCCCTTTCGAGGGTGTGGGAGGCAGCGGCCGAGGGTGGGGCTGGTGACTGGGACGAAGTCGTAACAAGGT
I have an existing application that uses PyCurl to download gzipped JSON data via a REST-type interface. This works well but is too slow for the desired use.
I'm trying to get an equivalent solution going that can use connection pooling. I have a simple example working with requests, but I don't see how to retrieve the attached gzipped JSON file that the returned header says is there.
My current sample code:
#!/usr/bin/python
import requests
headers = {"Authorization": "XXX thisworksIgeta200Response",
           "Content-type": "application/json",
           "Accept": "application/json"}
r = requests.get("https://longickyGUIDyURL.noname.com", headers=headers, verify=False, stream=True)
data = r.raw.read(decode_content=True)
print(data)
This produces an HTML page, not the JSON output I want. The relevant returned headers look like this:
'content-disposition': 'attachment; filename="9d5c3c68-0e88-4b2d-88b9-94534b6cb80d"',
'content-encoding': 'gzip',
So: requests or urllib3 (I tried this a bit but don't see many examples or much documentation) or something else?
Any guidance or recommendations would be most welcome!
"The Content-Disposition response-header field has been proposed as a means for the origin server to suggest a default filename if the user requests that the content is saved to a file" (RFC 2616).
The filename in the header is no more than a suggestion for what the browser should save it as. There is no other file there. The content you got back is all there is. The content-encoding: gzip header means that the content of the page was gzip-encoded for transit, but the requests module will have decoded that for you.
So, if it's HTML and you were expecting JSON, you probably have the wrong URL.
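Once you have the right URL, a minimal sketch of fetching the JSON; requests decodes the gzip content-encoding transparently, so r.json() is all you need (headers reused from the question):
import requests
headers = {"Authorization": "XXX thisworksIgeta200Response",
           "Accept": "application/json"}
r = requests.get("https://longickyGUIDyURL.noname.com", headers=headers, verify=False)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of silently parsing an error page
data = r.json()       # gzip transfer encoding already decoded by requests
print(data)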