Transforming JSON string to Pandas DataFrame in Flask

I'm parsing JSON data from a POST request in Flask. Everything seems to work fine:
from flask import Flask
from flask import request
import io
import json
import pandas as pd
app = Flask(__name__)
@app.route('/postjson', methods=['POST'])
def postJsonHandler():
    print(request.is_json)
    content = request.get_json()
    df = pd.io.json.json_normalize(content)
    print(df)
    return 'JSON posted'
app.run(host='0.0.0.0', port=8090)
The output looks like this:
True
columns data
0 [Days, Orders] [[10/1/16, 284], [10/2/16, 633], [10/3/16, 532...
I then try to convert the JSON to a pandas DataFrame with the json_normalize() function. The result is close to a DataFrame, but not quite: the column names and the rows each end up in a single cell.
What should I change in the code to get a regular pandas DataFrame, with the columns as headers and the data as rows?
Thanks in advance.

Solved the problem. The idea is to pass the record path to json_normalize() as its second argument, something like this:
df = pd.io.json.json_normalize(content, 'data')
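As an alternative sketch, when the payload already separates column names from rows (as the output printed in the question suggests, with a "columns" key and a "data" key), the DataFrame constructor can take both directly. The sample payload below is an assumption modelled on that output, not the asker's real data.

```python
import pandas as pd

# Hypothetical payload shaped like the output printed in the question.
content = {
    "columns": ["Days", "Orders"],
    "data": [["10/1/16", 284], ["10/2/16", 633], ["10/3/16", 532]],
}

# Rows come from "data", header names from "columns".
df = pd.DataFrame(content["data"], columns=content["columns"])
print(df)
```

In the Flask handler, content would be the value returned by request.get_json().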

Related

Python API call to BigQuery using cloud functions

I'm trying to build my first Cloud Function. It should get data from an API, transform it into a DataFrame, and push it to BigQuery. I've set the function up with an HTTP trigger, using validate_http as the entry point. The problem is that the function reports that it ran successfully, but it doesn't actually write anything. It's a similar problem to the one discussed here: Passing data from http api to bigquery using google cloud function python
import pandas as pd
import json
import requests
from pandas.io import gbq
import pandas_gbq
import gcsfs
#function 1: Responding and validating any HTTP request
def validate_http(request):
    request_json = request.get_json()
    if request.args:
        get_api_data()
        return f'Data pull complete'
    elif request_json:
        get_api_data()
        return f'Data pull complete'
    else:
        get_api_data()
        return f'Data pull complete'
#function 2: Get data and transform
def get_api_data():
    import pandas as pd
    import requests
    import json
    #Setting up variables with tokens
    base_url = "https://"
    token= "&token="
    token2= "&token="
    fields = "&fields=date,id,shippingAddress,items"
    date_filter = "&filter=date in '2022-01-22'"
    data_limit = "&limit=99999999"
    #Performing API call on request with variables
    def main_requests(base_url,token,fields,date_filter,data_limit):
        req = requests.get(base_url + token + fields +date_filter + data_limit)
        return req.json()
    #Making API Call and storing in data
    data = main_requests(base_url,token,fields,date_filter,data_limit)
    #transforming the data
    df = pd.json_normalize(data['orders']).explode('items').reset_index(drop=True)
    items = df['items'].agg(pd.Series)[['id','itemNumber','colorNumber', 'amount', 'size','quantity', 'quantityReturned']]
    df = df.drop(columns=[
        'items', 'shippingAddress.id', 'shippingAddress.housenumber',
        'shippingAddress.housenumberExtension', 'shippingAddress.address2',
        'shippingAddress.name', 'shippingAddress.companyName',
        'shippingAddress.street', 'shippingAddress.postalcode',
        'shippingAddress.city', 'shippingAddress.county',
        'shippingAddress.countryId', 'shippingAddress.email',
        'shippingAddress.phone'])
    df = df.rename(columns=
        {'date' : 'Date',
         'shippingAddress.countryIso' : 'Country',
         'id' : 'order_id'})
    df = pd.concat([df, items], axis=1, join='inner')
    #Push data function
    bq_load('Return_data_api', df)
#function 3: Convert to bigquery table
def bq_load(key, value):
    project_name = '375215'
    dataset_name = 'Returns'
    table_name = key
    value.to_gbq(destination_table='{}.{}'.format(dataset_name, table_name), project_id=project_name, if_exists='replace')
The problem is that the script doesn't write to BigQuery and doesn't return any error. I know that get_api_data() works, since I tested it locally and it does seem to be able to write to BigQuery. In Cloud Functions, however, I can't seem to trigger this function and make it write data to BigQuery.
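For the transformation step, a possible alternative to explode() followed by agg(pd.Series) is to let pd.json_normalize flatten the nested items directly through its record_path and meta parameters. The sketch below runs on made-up sample orders (the values are assumptions; only the field names come from the code above) and does not call the API or BigQuery.

```python
import pandas as pd

# Made-up sample shaped like data['orders'] in the question.
orders = [
    {
        "id": 1,
        "date": "2022-01-22",
        "shippingAddress": {"countryIso": "NL"},
        "items": [{"id": 10, "quantity": 2}, {"id": 11, "quantity": 1}],
    },
    {
        "id": 2,
        "date": "2022-01-22",
        "shippingAddress": {"countryIso": "DE"},
        "items": [{"id": 12, "quantity": 5}],
    },
]

# One output row per item; order-level fields are repeated via meta.
# meta_prefix avoids a name clash between the order id and the item id.
flat = pd.json_normalize(
    orders,
    record_path="items",
    meta=["id", "date", ["shippingAddress", "countryIso"]],
    meta_prefix="order_",
)
flat = flat.rename(columns={
    "order_date": "Date",
    "order_shippingAddress.countryIso": "Country",
})
print(flat)
```

This keeps the item columns and the order columns aligned row by row, so no separate concat step is needed.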

There are a couple of things in the code that, once fixed, would set you right.
You have list data, so store it as a CSV file (in preference to JSON).
This would mean updating (and probably renaming) the JsonArrayStore class and its methods to work with CSV.
Once you have completed the above and written well-formed CSV, you can proceed to the next step.
Reading the CSV in the del_btn method would then look like this:
import csv
class ToDoGUI(tk.Tk):
    ...
    # methods
    ...
    def del_btn(self):
        a = JsonArrayStore('test1.csv')
        # read to list
        with open('test1.csv') as csvfile:
            reader = csv.reader(csvfile)
            data = list(reader)
        print(data)
Good work, you have a lot to do, if you get stuck further please post again.

What is the most effective method for adding nested data to a pandas dataframe?

I have been trying to add the JSON data from this API to a pandas data frame. Here is the code I have tried:
url = 'https://api.covid19api.com/summary'
df = pd.read_json(url)
print(df.head())
When running this code, I receive the following error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
Any advice on this would be helpful. Thanks in advance.

Hi Matt, and welcome to SO. Whenever you work with JSON, it's better to fetch the data first and have a look at it. In your particular case, the Global key has a different structure from the entries under Countries, which is why you get that error.
import urllib.request
import json
import pandas as pd
url = 'https://api.covid19api.com/summary'
response = urllib.request.urlopen(url)
# the following is the data you should explore
data = json.loads(response.read())
df = pd.DataFrame(data["Countries"])

The JSON has a few top-level elements ('Global', 'Countries' and 'Date'), so it makes sense to split it into separate DataFrames, which is not easy to do with pandas.read_json().
import pandas as pd
import requests
url = 'https://api.covid19api.com/summary'
r = requests.get(url)
data = r.json()
# 'Global' is a single dict of scalar totals, so wrap it in a list to get one row
global_data = pd.DataFrame([data['Global']])
countries = pd.DataFrame(data['Countries'])
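The split can be tried offline on a small stand-in for the API response. The payload below is invented for illustration and only mimics the three top-level keys named above; the real field names and values may differ.

```python
import pandas as pd

# Invented stand-in for the API response (not real figures).
data = {
    "Global": {"NewConfirmed": 10, "TotalConfirmed": 250},
    "Countries": [
        {"Country": "A", "NewConfirmed": 4, "TotalConfirmed": 100},
        {"Country": "B", "NewConfirmed": 6, "TotalConfirmed": 150},
    ],
    "Date": "2020-04-05T06:37:00Z",
}

# A dict of scalars has no row index, so wrap it in a list to get one row.
global_data = pd.DataFrame([data["Global"]])

# A list of dicts maps naturally to one row per dict.
countries = pd.DataFrame(data["Countries"])

print(global_data)
print(countries)
```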

Json problems to csv

I'm trying to get some stats from the NBA stats page. I'm following the idea in this tutorial:
https://towardsdatascience.com/using-python-pandas-and-plotly-to-generate-nba-shot-charts-e28f873a99cb
The basic idea is to put the data into a CSV file.
So I tried this code to fetch the JSON from the NBA website and then convert it to CSV:
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/events/?flag=3&CFID=33&CFPARAMS=2017-18&PlayerID="
player_id="202695"
shot_data_url_end="&ContextMeasure=FGA&Season=2017-18&section=player&sct=plot"
def shoy_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    json = requests.get(full_url, headers=headers).json()
    return(json)
data = json['resultSets'][0]['rowSets']
columns = json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
And this is the error that notebook shows to me:
TypeError Traceback (most recent call last)
<ipython-input-42-a3452c3a4fc8> in <module>
18
19
---> 20 data = json['resultSets'][0]['rowSets']
21 columns = json['resultSets'][0]['headers']
22
TypeError: 'module' object is not subscriptable
Can anyone help me, or does anyone know another way to get the data into a .csv or Excel file?

After import json, the name json refers to the JSON module of the Python standard library. Inside your function you assign the response to a local variable that is also called json, but that local name does not exist outside the function, so at module level json['resultSets'] tries to subscript the module itself. If you rename the variable to something else, such as response_json, and use the value the function returns, this part of your code will work.
Regarding the rest of the code, the page https://stats.nba.com/events/ doesn't return any JSON text; it is a regular web page with images, menus, a video player, etc. If you want to access the API that returns the shots in JSON format, you will have to use https://stats.nba.com/stats/shotchartdetail (with the right query string). This API endpoint is mentioned in the tutorial, in the "Chrome XHR tab and resulting json linked by url" image.
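The name clash can be reproduced in isolation with the standard library alone. In this minimal sketch the function name parse and the sample JSON text are made up for illustration:

```python
import json

def parse(text):
    # A distinct local name, so the json module is never shadowed.
    response_json = json.loads(text)
    return response_json

try:
    json["resultSets"]  # subscripting the module itself, as in the traceback
except TypeError as exc:
    error_message = str(exc)

# Use the value the function returns instead of the module name.
payload = parse('{"resultSets": [{"headers": ["A"], "rowSet": [[1]]}]}')
headers = payload["resultSets"][0]["headers"]

print(error_message)
print(headers)
```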

Ok I've changed the code like this:
import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
def shot_chart(player_id):
    full_url = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID=202695&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
    response_json = requests.get(full_url, headers=headers)
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)

import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request
shot_data_url_start="https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2019-20&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id="202330"
shot_data_url_end="&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def shot_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    response_json = requests.get(full_url).json()
    return(response_json)
data = response_json['resultSets'][0]['rowSets']
columns = response_json['resultSets'][0]['headers']
df = pd.DataFrame.from_records(data, columns=columns)
shot_chart("202330")
What is going on now? The notebook is stuck right now.

Try this out:
import pandas as pd
import requests
from pandas import DataFrame as df
shot_data_url_start = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id = "204001"
shot_data_url_end = "&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
def get_shot_data(player_id):
    full_url = shot_data_url_start + player_id + shot_data_url_end
    data = requests.get(
        full_url,
        headers={
            "User-Agent": "PostmanRuntime/7.4.0"
        }
    )
    return data.json()
shot_results = get_shot_data(player_id)
result_sets = shot_results['resultSets']
first_result_set = result_sets[0]
row_set = first_result_set['rowSet']
set_headers = first_result_set['headers']
df = pd.DataFrame.from_records(row_set, columns=set_headers)
I see how you got confused by that Medium post. You were missing the headers, and the URL for the NBA API wasn't right. That's what @pierre was trying to say in his response. If you reread the post you were following, you'll see that the author had to dig into the browser dev tools to find the actual URL to use in order to grab the JSON.
Edit: I forgot to mention that when I didn't pass a User-Agent in the headers, the request would time out. If you don't pass it, you won't get a successful response.
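Since the original goal was a CSV file, the last step can be sketched offline. The response below is made up: it only copies the resultSets / headers / rowSet nesting used in the answer, and the column names and values are placeholders. The helper name result_set_to_frame is hypothetical.

```python
import io
import pandas as pd

# Made-up response with the same nesting the answer's code reads.
shot_results = {
    "resultSets": [
        {
            "headers": ["PLAYER_ID", "SHOT_DISTANCE", "SHOT_MADE_FLAG"],
            "rowSet": [[204001, 24, 1], [204001, 3, 0]],
        }
    ]
}

def result_set_to_frame(payload, index=0):
    """Build a DataFrame from one entry of payload['resultSets']."""
    result_set = payload["resultSets"][index]
    return pd.DataFrame.from_records(
        result_set["rowSet"], columns=result_set["headers"]
    )

shots = result_set_to_frame(shot_results)

# Write CSV text to an in-memory buffer; pass a file path instead to save to disk.
buffer = io.StringIO()
shots.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real response, shot_results would be the value returned by get_shot_data(player_id).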

Creating a pandas dataframe from a JSON request

I am trying to create a machine learning application with Flask. I have created a POST API route that takes the data and transforms it into a pandas DataFrame. Here is the Flask code that calls the Python function to transform the data.
from flask import Flask, abort, request
import json
import mlFlask as ml
import pandas as pd
app = Flask(__name__)
@app.route('/test', methods=['POST'])
def test():
    if not request.json:
        abort(400)
    print(type(request.json))
    result = ml.classification(request.json)
    return json.dumps(result)
This is the file that contains the helper function.
import pandas as pd
def jsonToDataFrame(data):
    print(type(data))
    df = pd.DataFrame.from_dict(data,orient='columns')
But I am getting an error (shown below). Also, when I print the type of data, it is dict, so I don't know why it would cause an issue. It works when I orient the DataFrame by index, but not by columns.
ValueError: If using all scalar values, you must pass an index
Here is the body of the request in JSON format.
{
"updatedDate":"2012-09-30T23:51:45.778Z",
"createdDate":"2012-09-30T23:51:45.778Z",
"date":"2012-06-30T00:00:00.000Z",
"name":"Mad Max",
"Type":"SBC",
"Org":"Private",
"month":"Feb"
}
What am I doing wrong here?
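The ValueError arises because every value in the request body is a scalar, so with orient='columns' pandas has no way to infer how many rows to build. One common workaround, sketched here on the request body above, is to wrap the dict in a list so that it becomes a single row. Whether one row per request is what the classification step needs is an assumption.

```python
import pandas as pd

# The request body from the question, as a Python dict.
data = {
    "updatedDate": "2012-09-30T23:51:45.778Z",
    "createdDate": "2012-09-30T23:51:45.778Z",
    "date": "2012-06-30T00:00:00.000Z",
    "name": "Mad Max",
    "Type": "SBC",
    "Org": "Private",
    "month": "Feb",
}

# Each dict in the list becomes one row; its keys become the column names.
df = pd.DataFrame([data])
print(df.shape)
```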

Getting null from flask request python

I am writing a simple Flask application that should return the required answer in the desired format, based on my query.
The code is below:
#-*- coding: utf-8 -*-
import StringIO
import os
import pandas as pd
import numpy as np
from flask import Flask, request, Response, abort, jsonify, send_from_directory,make_response
import io
from pandas import DataFrame
import urllib2, json
import requests
from flask import session
import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")
app = Flask(__name__)
@app.route("/api/conversation/", methods=['POST'])
def chatbot():
    df = pd.DataFrame(json.load(urllib2.urlopen('http://192.168.21.245/sixthsensedata/server/Test_new.json')))
    question = request.form.get('question')
    store = []
    if question == 'What is the number of total observation of the dataset':
        store.append(df.shape)
    if question == 'What are the column names of the dataset':
        store.append(df.columns)
    return jsonify(store)
if __name__ == '__main__':
    app.debug = True
    app.run(host = '192.168.21.11',port=5000)
It runs properly, but I am getting a null response. I would like to add ~30 more questions like this and store the values in the store array, but I think the values are not getting appended to store.
In a Jupyter notebook, though, I get the proper response:
df = pd.DataFrame(json.load(urllib2.urlopen('http://192.168.21.245/sixthsensedata/server/Test_new.json')))
store = []
store.append(df.shape)
print store
[(521, 24)]
Why are the values not getting appended in Flask? I am testing my application in Postman. Please tell me what I am missing.

When the content type of the POST request is not specified, request.form evaluates to
ImmutableMultiDict([('{"question": "What is the number of total observation of the dataset"}', u'')])
and question = request.form.get('question') ends up being None.
You can either send the request with an explicit JSON content type, or force Flask to parse the body as JSON:
@app.route('/api/conversation/', methods=['POST'])
def chatbot():
    question = request.get_json(force=True).get('question')
    store = []
    if question == 'What is the number of total observation of the dataset':
        store.append("shape")
    elif question == 'What are the column names of the dataset':
        store.append("columns")
    return jsonify(store)
Curl requests:
$ curl -X POST -d '{"question": "What is the number of total observation of the dataset"}' http://127.0.0.1:5000/api/conversation/
["shape"]
$ curl -H 'Content-Type: application/json' -X POST -d '{"question": "What is the number of total observation of the dataset"}' http://127.0.0.1:5000/api/conversation/
["shape"]
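The behaviour can also be checked without Postman or curl by using Flask's built-in test client. This is a minimal sketch: the route mirrors the answer's code, and the two requests differ only in their Content-Type header.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/conversation/", methods=["POST"])
def chatbot():
    # force=True parses the body as JSON even without a JSON Content-Type.
    question = request.get_json(force=True).get("question")
    store = []
    if question == "What is the number of total observation of the dataset":
        store.append("shape")
    elif question == "What are the column names of the dataset":
        store.append("columns")
    return jsonify(store)

body = '{"question": "What are the column names of the dataset"}'
client = app.test_client()

# The content type that curl -d sends by default.
form_typed = client.post(
    "/api/conversation/",
    data=body,
    content_type="application/x-www-form-urlencoded",
)

# An explicit JSON content type.
json_typed = client.post(
    "/api/conversation/", data=body, content_type="application/json"
)

print(form_typed.get_json(), json_typed.get_json())
```

Both requests should yield the same answer, because force=True makes the handler ignore the declared content type.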
