I am querying an API and receiving the response back as plain text:
url = "https://tripadvisor1.p.rapidapi.com/reviews/list"
querystring = {"limit":"20","currency":"USD","lang":"en_US","location_id":"2269364"}
headers = {
'x-rapidapi-host': "xxx",
'x-rapidapi-key': "xxx"
}
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)
I then want to retrieve a dataframe object. Various online sources suggest this approach:
import json
from pandas import json_normalize

d = json.loads(response.text)
e = json_normalize(d)
However, what I get back is a single-row dataframe whose only columns are data, paging_results, and paging_total_results.
I would like to obtain a dataframe built from the contents of the 'data' field. Please advise if I can make this question clearer.
First off, requests already has a json() method on the response object to decode JSON responses. Second, the top-level object contains three things: data, paging_results, and paging_total_results, which is why you see those three columns.
If you want to extract the inner data contents, just pull that out before passing it to normalize:
response = ...
df = json_normalize(response.json()['data'])
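Putting the whole thing together, here is a minimal end-to-end sketch (the RapidAPI host/key are placeholders, as in the question):
import requests
from pandas import json_normalize

url = "https://tripadvisor1.p.rapidapi.com/reviews/list"
querystring = {"limit": "20", "currency": "USD", "lang": "en_US", "location_id": "2269364"}
headers = {
    'x-rapidapi-host': "xxx",  # your RapidAPI host
    'x-rapidapi-key': "xxx"    # your RapidAPI key
}

response = requests.get(url, headers=headers, params=querystring)
response.raise_for_status()  # fail early on HTTP errors

# The top level holds data, paging_results and paging_total_results;
# normalize only the inner list of reviews.
df = json_normalize(response.json()['data'])
print(df.head())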
I have found the following API endpoint from Binance, which unfortunately has no documentation on their page. They use it directly in their website UI.
I have found that a POST request to https://www.binance.com/gateway-api/v1/public/future/leverage/token/basket/list actually works. The problem is that I would like to convert this info into a dataframe.
I have tried the following; unfortunately, I cannot find a way to convert the result into a dataframe, since what comes back is raw bytes rather than parsed JSON.
Any help would be appreciated. Thanks in advance
My attempt:
import requests
from requests.structures import CaseInsensitiveDict
url = "https://www.binance.com/gateway-api/v1/public/future/leverage/token/basket/list"
headers = CaseInsensitiveDict()
headers["Content-Type"] = "application/json"
data = """
{"asset": "BTCUP", "current": 1, "size": 20}
"""
resp = requests.post(url, headers=headers, data=data)
print(resp.content)  # shows the raw body as bytes
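Since the body comes back as bytes, let requests decode the JSON for you and then normalize it. A minimal sketch, assuming this undocumented response keeps its rows under a 'data' key (verify against the actual payload):
import requests
import pandas as pd

url = "https://www.binance.com/gateway-api/v1/public/future/leverage/token/basket/list"
payload = {"asset": "BTCUP", "current": 1, "size": 20}

# json= serializes the dict and sets Content-Type: application/json for us
resp = requests.post(url, json=payload)
resp.raise_for_status()

body = resp.json()  # parses the byte content into a Python dict

# Assumption: the basket rows live under "data"; adjust to the real shape.
df = pd.json_normalize(body["data"])
print(df.head())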
So I'm able to get 200 fields for a stock symbol using the Yahoo Finance API (code is below). Is there a way to specify in the querystring which fields I want (i.e. earningsDate, epsForward, regularMarketPrice, longName, symbol) instead of returning all of them? It's very slow.
def get_quote1(self, symbol):
    url = "https://apidojo-yahoo-finance-v1.p.rapidapi.com/market/v2/get-quotes"
    # Pass the symbol string directly; wrapping it in braces built a set, not a string.
    querystring = {"region": "US", "symbols": symbol}
    headers = {
        'x-rapidapi-key': "xxx",  # key redacted
        'x-rapidapi-host': "apidojo-yahoo-finance-v1.p.rapidapi.com"
    }
    response = requests.request("GET", url, headers=headers, params=querystring)
    json_data = response.json()  # no need for json.loads(response.text)
    return json_data
A specific endpoint returns specific data according to the params we pass. Generally, we can't change the shape of the response; it's predefined. But you can always extract just the fields you want from it.
Also, the Yahoo Finance API has a lot of endpoints you can play around with. You may well find one better suited to your needs that returns the minimum data you actually want.
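In the meantime you can trim the payload client-side. A minimal sketch, assuming the usual get-quotes shape with the results under quoteResponse -> result (check your actual response):
WANTED = {"earningsDate", "epsForward", "regularMarketPrice", "longName", "symbol"}

def filter_quote(json_data, wanted=WANTED):
    # Assumption: the quotes live under quoteResponse -> result; adjust if not.
    results = json_data.get("quoteResponse", {}).get("result", [])
    return [{k: v for k, v in quote.items() if k in wanted} for quote in results]

slim = filter_quote(json_data)  # json_data as returned by get_quote1 above
Note that this only slims the data after it arrives; the full payload is still downloaded, so it won't make the request itself any faster.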
I'm building a personal web scraper and I'm still in the dev phase, but I want to start saving data. The problem is, I cannot PUT or POST from the notebook, which also means I cannot iterate through a big list of dictionaries/JSON objects.
I can, however, do it manually via Postman by just pasting the body and sending it.
Below is my code, which currently returns:
The pastebin URL is:{"message": "Could not parse request body into json: Unrecognized token 'ID': was expecting 'null', 'true', 'false' or NaN\n at [Source: (byte[])"ID=0&District=London"; line: 1, column: 4]"}
import requests
# defining the api-endpoint
API_ENDPOINT = ##############
# data to be sent to api
body = {
"ID": 0,
"District": "London"
}
# sending post request and saving response as response object
r = requests.post(url = API_ENDPOINT, data = body)
# extracting response text
pastebin_url = r.text
print("The pastebin URL is:%s"%pastebin_url)
Related question - can I use urllib instead of requests here?
Your endpoint expects JSON, but passing a dict via data= sends it form-encoded; that is the ID=0&District=London the error message is complaining about. You can try the following:
r = requests.post(url = API_ENDPOINT, json = body)
or
import json
r = requests.post(url = API_ENDPOINT, headers={"Content-Type":"application/json"}, data = json.dumps(body))
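As for the side question: yes, urllib from the standard library works too, it is just more verbose. A minimal sketch against the same (redacted) endpoint:
import json
import urllib.request

body = {"ID": 0, "District": "London"}

req = urllib.request.Request(
    url=API_ENDPOINT,  # same endpoint as above
    data=json.dumps(body).encode("utf-8"),  # urllib wants bytes, not a dict
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))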
I make an API call using requests, and get a JSON response back. The API limit is 20 results per page. I can access the first page without a problem, however I cannot figure out how to include pagination in the query. In the JSON response, at the bottom of the page, it gives me the following information.
},
"_links": {
    "first": {
        "href": "https://search-lastSale?date=20190723-20190823&page=0&size=20"
    },
    "last": {
        "href": "https://search-lastSale?date=20190723-20190823&page=4&size=20"
    },
    "next": {
        "href": "https://search-lastSale?date=20190723-20190823&page=1&size=20"
    },
    "self": {
        "href": "https://search-lastSale?date=20190723-20190823&page=0&size=20"
    }
},
"page": {
    "number": 0,
    "size": 20,
    "totalElements": 77,
    "totalPages": 4
}
I've read the docs at https://2.python-requests.org//en/latest/user/advanced/#link-headers and various other articles and posts, but everything seems very specific to people's own APIs.
I've taken my code back to just a single URL request and an old auth token, just so I can get a grasp of it, and then I'll scale it back up to my existing project. The code is below:
url = "https://search-api.corelogic.asia/search/au/property/postcode/401249/lastSale"
querystring = {"page":"0","size":"20","date":"20190723-20190823"}
headers = {
'Content-Type': "application/JSON",
'Authorization': "My Token"}
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)
As far as I can tell from the docs and reading, what I should be doing is either:
Read totalPages from the JSON response, then build a list of URLs from that count, i.e. if totalPages is 4:
https://www.search/page0
https://www.search/page1
https://www.search/page2
https://www.search/page3
https://www.search/page4
then loop through each URL and append the JSON; or
Utilise the 'next' link in the JSON response to grab the next URL, and keep going until there is no 'next' entry left, i.e.
while the response still contains a 'next' link:
    keep getting data
    append it to the JSON collected so far
Both methods make sense, however I cannot see where this pagination would fit in my code.
As you can see, the response contains a _links dict; you can use the href inside next to fetch the next page.
Or you can try to manually generate those urls:
>>> def paged_url(page: int=0, size=20) -> str:
... return ("https://search-lastSale?date=20190723-20190823&"
... f"page={page}&size={size}")
...
>>> paged_url(1)
'https://search-lastSale?date=20190723-20190823&page=1&size=20'
>>> paged_url(2)
'https://search-lastSale?date=20190723-20190823&page=2&size=20'
>>> paged_url(3, 10)
'https://search-lastSale?date=20190723-20190823&page=3&size=10'
Those URLs contain the next page you should fetch.
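Alternatively, here is a sketch of the follow-the-next-link approach, assuming every page carries the same _links block as above and that the last page simply omits 'next':
import requests

url = "https://search-api.corelogic.asia/search/au/property/postcode/401249/lastSale"
params = {"page": "0", "size": "20", "date": "20190723-20190823"}
headers = {'Authorization': "My Token"}

pages = []
while url is not None:
    body = requests.get(url, headers=headers, params=params).json()
    pages.append(body)  # or pull out just the sale records you need
    # Follow the "next" link if present; stop once the last page drops it.
    url = body.get("_links", {}).get("next", {}).get("href")
    params = None  # the next href already carries page/size/date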
I would like to get data from an API, but the API is paginated, so I have to iterate through all the pages and save the data I want in a variable.
I was trying to fetch each page in a loop and add its data to my response, but all I got was this error: "TypeError: must be str, not Response". I wanted to do it this way:
response = "https://api.dane.gov.pl/resources/17201/data?page=1"
for i in range(2,32):
url = "https://api.dane.gov.pl/resources/17201/data?page="+str(i)
response += requests.get(url)
data = response.text
Once I have the data, I want to extract and operate on it.
requests.get(url) returns a Response object. At the moment, you are trying to add the Response object to a string.
Try something like this:
responses = []
for i in range(1, 32):  # start at 1 so the first page isn't skipped
    url = "https://api.dane.gov.pl/resources/17201/data?page=" + str(i)
    responses.append(requests.get(url).text)
When that finishes running, responses will be a list of response bodies (strings) instead of Response objects.
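If you then want to operate on the records rather than the raw text, it may be easier to parse each page as JSON and merge as you go. A minimal sketch, assuming each page keeps its records under a top-level 'data' key (check the actual payload):
import requests

records = []
for i in range(1, 32):
    url = "https://api.dane.gov.pl/resources/17201/data?page=" + str(i)
    page = requests.get(url).json()
    # Assumption: the rows sit under "data"; adjust to the real structure.
    records.extend(page.get("data", []))

print(len(records))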