How to convert json to pandas dataframe?

I am new to API programming. I am trying to download data from the MOEX API.
Here is the code I use:
import requests
from io import StringIO
import pandas as pd
import json
session = requests.Session()
login = "aaaa"
password = "bbbb"
# Authenticate once; the passport service sets the MicexPassportCert cookie
session.get('https://passport.moex.com/authenticate', auth=(login, password))
cookies = {'MicexPassportCert': session.cookies['MicexPassportCert']}
def api_query(engine, market, trading_session, secur, from_start, till_end):
    # Build the ISS candles URL for one security and date range
    param = 'https://iss.moex.com/iss/history/engines/{}/markets/{}/sessions/{}/securities/{}/candles.json?from={}&till={}&interval=24&start=0'.format(engine, market, trading_session, secur, from_start, till_end)
    return param
url = api_query('stock', 'bonds', 'session', 'RU000A0JVWL2', '2020-11-01', '2021-05-01')
response = requests.get(url, cookies=cookies)
As a result, I got the following data (excerpt):
'history.cursor': {'metadata': {'INDEX': {'type': 'int64'}, 'TOTAL': {'type': 'int64'}, 'PAGESIZE': {'type': 'int64'}}, 'columns': ['INDEX', 'TOTAL', 'PAGESIZE'], 'data': [[0, 32, 100]]}}
I need to convert this JSON into a pandas DataFrame. How can I do that? The result should be a DataFrame with 1 row and 3 columns.
Thanks in advance

Assuming your JSON has already been decoded into a Python dict (for example with response.json()), you could try something like this:
import pandas as pd
import numpy as np
# The parsed response (named payload so it doesn't shadow the json module)
payload = {
    'history.cursor': {
        'metadata': {'INDEX': {'type': 'int64'}, 'TOTAL': {'type': 'int64'}, 'PAGESIZE': {'type': 'int64'}},
        'columns': ['INDEX', 'TOTAL', 'PAGESIZE'],
        'data': [[0, 32, 100]]
    }
}
columns = payload['history.cursor']['columns']
data = np.array(payload['history.cursor']['data'])
metadata = payload['history.cursor']['metadata']
# Build one column at a time, cast to the type declared in metadata
d = {}
for i, column in enumerate(columns):
    d[column] = data[:, i].astype(metadata[column]['type'])
df = pd.DataFrame(d)
print(df)
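Because the block already stores columns and data side by side, the loop can also be replaced by passing them straight to the DataFrame constructor and then applying the dtypes from metadata in a single astype() call. A minimal sketch, assuming the parsed response is a dict shaped like the excerpt in the question; the helper name iss_block_to_frame is hypothetical, and astype() only accepts type names pandas recognises (such as int64), so any other type names the API declares would need mapping first:

```python
import pandas as pd


def iss_block_to_frame(block):
    """Build a DataFrame from one block holding 'columns', 'data' and (optionally) 'metadata'."""
    df = pd.DataFrame(block["data"], columns=block["columns"])
    # Map each column to the dtype declared in metadata, e.g. {"INDEX": "int64", ...}
    dtypes = {name: meta["type"] for name, meta in block.get("metadata", {}).items()}
    return df.astype(dtypes) if dtypes else df


payload = {
    "history.cursor": {
        "metadata": {"INDEX": {"type": "int64"}, "TOTAL": {"type": "int64"}, "PAGESIZE": {"type": "int64"}},
        "columns": ["INDEX", "TOTAL", "PAGESIZE"],
        "data": [[0, 32, 100]],
    }
}

cursor_df = iss_block_to_frame(payload["history.cursor"])
print(cursor_df)
```

The same helper can be applied to any other block in the response that follows the columns/data/metadata layout.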

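You could use pd.read_json() with orient='split':
pd.read_json(json_text, orient='split')
where json_text is a JSON string in the form {index -> [index], columns -> [columns], data -> [values]}. Note that read_json() parses JSON text (or a file path or file-like object), not an already-decoded dict, and the 'split' orientation only accepts the keys index, columns and data, so the metadata key has to be dropped first.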
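A minimal sketch of the read_json route, assuming the cursor block from the question has already been decoded into a dict: keep only columns and data, serialize them back to JSON text with json.dumps, and wrap the text in StringIO, since recent pandas versions deprecate passing a literal JSON string to read_json().

```python
import json
from io import StringIO

import pandas as pd

block = {
    "metadata": {"INDEX": {"type": "int64"}, "TOTAL": {"type": "int64"}, "PAGESIZE": {"type": "int64"}},
    "columns": ["INDEX", "TOTAL", "PAGESIZE"],
    "data": [[0, 32, 100]],
}

# orient="split" rejects unknown keys such as "metadata", so keep only what it understands
split_payload = {key: block[key] for key in ("columns", "data")}
df = pd.read_json(StringIO(json.dumps(split_payload)), orient="split")
print(df)
```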

Related

converting a deep nested loop from JSON into Pandas DF

I am getting data from an API, and this is the resulting JSON:
{'business_discovery': {'media': {'data': [{'media_url': 'a link',
'timestamp': '2022-01-01T01:00:00+0000',
'caption': 'Caption',
'media_type': 'type',
'media_product_type': 'product_type',
'comments_count': 1,
'like_count': 1,
'id': 'ID'},
{'media_url': 'link',
# ... and so on
# NOTE: I scrubbed the numbers with dummy data
I know I can run this script to get all the records within data:
# "a" is the json without business discovery or media, which would be this:
a = {'data': [{'media_url': 'a link',
'timestamp': '2022-01-01T01:00:00+0000',
'caption': 'Caption',
'media_type': 'type',
'media_product_type': 'product_type',
'comments_count': 1,
'like_count': 1,
'id': 'ID'},
{'media_url': 'link',
# ... and so on
import pandas as pd
media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code = [], [], [], [], [], [], [], []
for result in a['data']:
    media_url.append(result['media_url'])  # Appending all the info within the JSON to a list
    timestamp.append(result['timestamp'])
    caption.append(result['caption'])
    media_type.append(result['media_type'])
    media_product_type.append(result['media_product_type'])
    comment_count.append(result['comments_count'])
    like_count.append(result['like_count'])
    id_code.append(result['id'])  # All info exists, even when a value is 0
df = pd.DataFrame([media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code]).T
When I run the code above on the full API response, I get errors saying that data is not found.
It works on the trimmed copy a, but I'm trying to figure out a way to "hop" over both business_discovery and media to get straight to data, so I can run this more efficiently instead of copying and pasting the part below business_discovery and media.

Use pd.json_normalize:
df = pd.json_normalize(data=data["business_discovery"]["media"], record_path="data")
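A self-contained sketch of the same call, using a hypothetical payload with the shape shown in the question: indexing into business_discovery and media first means the loop and the separate lists are no longer needed, and record_path="data" tells json_normalize which list holds the rows.

```python
import pandas as pd

# Hypothetical payload with the same nesting as the API response in the question
payload = {
    "business_discovery": {
        "media": {
            "data": [
                {"media_url": "a link", "timestamp": "2022-01-01T01:00:00+0000", "caption": "Caption",
                 "media_type": "type", "media_product_type": "product_type",
                 "comments_count": 1, "like_count": 1, "id": "ID"},
                {"media_url": "link", "timestamp": "2022-01-02T01:00:00+0000", "caption": "Caption 2",
                 "media_type": "type", "media_product_type": "product_type",
                 "comments_count": 0, "like_count": 5, "id": "ID2"},
            ]
        }
    }
}

# One row per dict in the "data" list, one column per key
df = pd.json_normalize(payload["business_discovery"]["media"], record_path="data")
print(df[["id", "comments_count", "like_count"]])
```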

Group the data and convert to json data

I have a DataFrame with 150 rows; two sample rows are shown below. I need to convert the data to JSON like the desired output.
Input:
artwork_id creator_id department_id art_work creator department
0 86508 29993 21 {'id': '86508', 'accession_number': '2015.584'... {'id': '29993', 'role': 'artist', 'description... {'id': '21', 'name': 'Prints'}
1 86508 68000 21 {'id': '86508', 'accession_number': '2015.584'... {'id': '68000', 'role': 'printer', 'descriptio... {'id': '21', 'name': 'Prints'}
Desired output: (attached as an image)
I have tried the code below:
df.groupby(['artwork_id']).agg(lambda x: list(x))
df.to_json(orient = 'records')

Do you get the right format if you do the following:
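import json
result = df.to_json(orient="records")
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))
or
# groupby moves artwork_id into the index; reset_index() keeps it in the output,
# because orient="records" leaves the index out
grouped_art = df.groupby(['artwork_id']).agg(lambda x: list(x)).reset_index()
result = grouped_art.to_json(orient="records")
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))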
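A small runnable check of the grouped version, using a hypothetical two-row frame that keeps only the id columns from the question. DataFrame.to_json() also accepts an indent argument (pandas 1.0 and later), which saves the json.loads/json.dumps round trip when you only want pretty-printed output:

```python
import json

import pandas as pd

# Hypothetical sample: only the id columns from the question
df = pd.DataFrame({
    "artwork_id": [86508, 86508],
    "creator_id": [29993, 68000],
    "department_id": [21, 21],
})

# Collect each column into a list per artwork; reset_index() keeps artwork_id,
# which orient="records" would otherwise drop together with the index
grouped_art = df.groupby("artwork_id").agg(lambda s: s.tolist()).reset_index()

pretty = grouped_art.to_json(orient="records", indent=4)
print(pretty)

records = json.loads(pretty)
```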

Not able to convert json data into csv in python while fetching data through api

I read a string containing a JSON document:
d2 = json.loads(s1)
I am getting data in this format, a list of dictionaries:
[{'creati_id': 123,
'creativ_id': 234,
'status': 'adsc',
'name': 'seded',
…
'video_75_views': None,
'video_100_views': None,
'estimated': None,
'creative1': 1.0,
'creative': 'Excellent',
'value': 1.023424324}]}
How can I save this data in CSV format?

This can easily be achieved with the csv module:
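import csv
data = [
    {
        "creati_id": 123,
        "creativ_id": 234,
        "status": "adsc",
        "name": "seded",
    }
]
# newline="" stops the csv module from writing blank lines between rows on Windows
with open("data_file.csv", "w", newline="") as data_file:
    csv_writer = csv.writer(data_file)
    # Use the keys of the first dict as the header row
    header = data[0].keys()
    csv_writer.writerow(header)
    for line in data:
        csv_writer.writerow(line.values())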
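The approach above takes the header from the first dict and writes values() in order, which assumes every dict has the same keys in the same order. csv.DictWriter matches values to columns by key instead. A minimal sketch; write_records_csv and the second sample row are hypothetical, and StringIO stands in for a real file so the output is easy to inspect:

```python
import csv
import io


def write_records_csv(records, fileobj):
    """Write a list of dicts as CSV; the header is the union of keys in first-seen order."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # missing keys and None values become empty fields


records = [
    {"creati_id": 123, "creativ_id": 234, "status": "adsc", "name": "seded", "estimated": None},
    {"creati_id": 124, "creativ_id": 235, "status": "adsc", "name": "other"},  # hypothetical row
]

# For a real file use: open("data_file.csv", "w", newline="")
buf = io.StringIO()
write_records_csv(records, buf)
print(buf.getvalue())
```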

You can use the standard csv library in Python to write CSV files. From your question, I'm assuming that you have multiple rows, each having the structure you shared. If that's the case, then something like this should do the trick:
import csv
json1 = [
{'creati_id': 123, 'creativ_id': 234, 'status': 'adsc', 'name': 'seded', 'email': None, 'brand': 'adc', 'market': 'dcassca', 'channel': 'dAD'},
{'creati_id': 123, 'creativ_id': 234, 'status': 'adsc', 'name': 'seded', 'email': None, 'brand': 'adc', 'market': 'dcassca', 'channel': 'dAD'}
]
header_names = json1[0].keys()  # Extract the header names
data_rows = [row.values() for row in json1]  # Extract the values for each row
with open('output.csv', 'w', encoding='UTF8', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header_names)  # Writes the header
    writer.writerows(data_rows)  # Writes the rows
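If pandas is already in use, the same list of dicts can also go through a DataFrame, which lines values up by key and writes None as an empty field. A minimal sketch, with hypothetical sample rows trimmed from the question:

```python
import pandas as pd

# Hypothetical rows trimmed from the question's data
d2 = [
    {"creati_id": 123, "creativ_id": 234, "status": "adsc", "name": "seded", "estimated": None},
    {"creati_id": 123, "creativ_id": 234, "status": "adsc", "name": "seded", "estimated": None},
]

df = pd.DataFrame(d2)  # one row per dict, one column per key
df.to_csv("output.csv", index=False)  # index=False drops the 0..n-1 row labels
```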

Parsing multidimensional JSON in Python

I have an issue parsing a JSON file. Here is the format I have:
{'metadata': {'timezone': {'location': 'Etc/UTC'},
'serial_number': '123456',
'device_type': 'sensor'},
'timestamp': '2019-08-21T13:57:12.500Z',
'framenumber': '4866274',
'tracked_objects': [{'id': 2491,
'type': 'PERSON',
'position': {'x': -361,
'y': -2933,
'type': 'FOOT',
'coordinate_system': 'REAL_WORLD_IN_MILLIMETER'},
'person_data': {'height': 1295}},
{'id': 2492,
'type': 'PERSON',
'position': {'x': -733,
'y': -2860,
'type': 'FOOT',
'coordinate_system': 'REAL_WORLD_IN_MILLIMETER'},
'person_data': {'height': 1928}},
{'id': 2495,
'type': 'PERSON',
'position': {'x': -922,
'y': -3119,
'type': 'FOOT',
'coordinate_system': 'REAL_WORLD_IN_MILLIMETER'},
'person_data': {'height': 1716}}]}
I am trying to get the following columns into a DataFrame:
timezone, serial_number, id, x and y (which are part of position), and height.
This is the code I have used so far:
# Import dependencies
import pandas as pd
import json
# Load the JSON file. In your case you will point the data stream into json_raw
with open("C:/Users/slavi/Documents/GIT/test2.json") as infile:
    json_raw = json.load(infile)
# Function to flatten a multidimensional JSON object into a single-level dict
def flatten_json(nested_json):
    out = {}
    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x
    flatten(nested_json)
    return out
# Use the function to flatten the JSON
json_flat = flatten_json(json_raw)
# Create a DataFrame from the dictionary; orient='index' turns each key into a row
df = pd.DataFrame.from_dict(json_flat, orient='index')
# Reset the index
df.reset_index(level=0, inplace=True)
df.set_index('index', inplace=True)
df
I used the function to flatten the JSON; however, when I run the code I get results like this:
There should be 3 rows, one per tracked object, with those columns filled in for each.
Any suggestion on how to adjust my code?

For any kind of JSON-to-DataFrame parsing, get acquainted with json_normalize:
import json
import pandas as pd
with open('...', 'r') as f:
    json_raw = json.load(f)
# One row per element of tracked_objects; the meta fields are repeated on every row
df = pd.json_normalize(json_raw, record_path='tracked_objects', meta=[
    ['metadata', 'serial_number'],
    'timestamp'
])
Result:
id type position.x position.y position.type position.coordinate_system person_data.height metadata.serial_number timestamp
0 2491 PERSON -361 -2933 FOOT REAL_WORLD_IN_MILLIMETER 1295 123456 2019-08-21T13:57:12.500Z
1 2492 PERSON -733 -2860 FOOT REAL_WORLD_IN_MILLIMETER 1928 123456 2019-08-21T13:57:12.500Z
2 2495 PERSON -922 -3119 FOOT REAL_WORLD_IN_MILLIMETER 1716 123456 2019-08-21T13:57:12.500Z
Rename the columns as you wish.
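The question also asked for timezone, x, y and height. A sketch extending the same call, assuming the payload from the question: a nested meta path such as ["metadata", "timezone", "location"] copies the timezone onto every row, and a rename map (the short names are only a suggestion) trims the result to the requested columns.

```python
import pandas as pd

# Payload copied from the question
payload = {
    "metadata": {"timezone": {"location": "Etc/UTC"}, "serial_number": "123456", "device_type": "sensor"},
    "timestamp": "2019-08-21T13:57:12.500Z",
    "framenumber": "4866274",
    "tracked_objects": [
        {"id": 2491, "type": "PERSON",
         "position": {"x": -361, "y": -2933, "type": "FOOT", "coordinate_system": "REAL_WORLD_IN_MILLIMETER"},
         "person_data": {"height": 1295}},
        {"id": 2492, "type": "PERSON",
         "position": {"x": -733, "y": -2860, "type": "FOOT", "coordinate_system": "REAL_WORLD_IN_MILLIMETER"},
         "person_data": {"height": 1928}},
        {"id": 2495, "type": "PERSON",
         "position": {"x": -922, "y": -3119, "type": "FOOT", "coordinate_system": "REAL_WORLD_IN_MILLIMETER"},
         "person_data": {"height": 1716}},
    ],
}

df = pd.json_normalize(
    payload,
    record_path="tracked_objects",
    # Meta values are repeated on every row; nested paths are joined with "."
    meta=[["metadata", "timezone", "location"], ["metadata", "serial_number"]],
)

# Keep only the requested columns and give them short names
wanted = {
    "metadata.timezone.location": "timezone",
    "metadata.serial_number": "serial_number",
    "id": "id",
    "position.x": "x",
    "position.y": "y",
    "person_data.height": "height",
}
df = df[list(wanted)].rename(columns=wanted)
print(df)
```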

Parse this JSON response From App Annie in Python

I am using the requests module in Python to grab certain fields from a JSON response.
import requests
response = requests.get('http://api.appannie.com/v1/accounts/1000/apps/mysuperapp/sales?break_down=application+iap&start_date=2013-10-01&end_date=2013-10-02',
                        auth=('username', 'password'))
data = response.json()
print(data)
This works, and prints the following response:
{'prev_page': None, 'currency': 'USD', 'next_page': None, 'sales_list': [{'revenue': {'ad': '0.00', 'iap': {'refunds': '0.00', 'sales': '0.00', 'promotions': '0.00'}, 'app': {'refunds': '0.00', 'updates': '0.00', 'downloads': '0.00', 'promotions': '0.00'}},
'units': {'iap': {'refunds': 0, 'sales': 0, 'promotions': 0}, 'app': {'refunds': 0, 'updates': 0, 'downloads': 2000, 'promotions': 0}}, 'country': 'all', 'date': 'all'}], 'iap_sales': [], 'page_num': 1, 'code': 200, 'page_index': 0}
How do I parse this to get the downloads number in the 'app' block (under 'units'), namely the 2000 value?

After response.json(), data is already a dictionary (if the body were not valid JSON, response.json() would have raised an exception), so you can access it like any other dictionary.
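For example, with a trimmed copy of the response printed in the question (with the live call, data would come from response.json()), the value sits at data["sales_list"][0]["units"]["app"]["downloads"]:

```python
# Trimmed copy of the response printed in the question
data = {
    "currency": "USD",
    "sales_list": [
        {
            "units": {
                "iap": {"refunds": 0, "sales": 0, "promotions": 0},
                "app": {"refunds": 0, "updates": 0, "downloads": 2000, "promotions": 0},
            },
            "country": "all",
            "date": "all",
        }
    ],
    "code": 200,
}

# sales_list is a list, so index into it (or loop over it) before going deeper
downloads = data["sales_list"][0]["units"]["app"]["downloads"]
print(downloads)  # 2000

# Or collect the value from every entry in sales_list
all_downloads = [row["units"]["app"]["downloads"] for row in data["sales_list"]]
```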

You can use the loads() method of json on the response text:
import json
import requests
response = requests.get('http://api.appannie.com/v1/accounts/1000/apps/mysuperapp/sales?break_down=application+iap&start_date=2013-10-01&end_date=2013-10-02',
                        auth=('username', 'password'))
data = json.loads(response.text)  # data is a dictionary now
sales_list = data.get('sales_list')
for sales in sales_list:
    print(sales['units']['app']['downloads'])  # the downloads count, e.g. 2000

You can use json.loads:
import json
import requests
response = requests.get(...)
json_data = json.loads(response.text)
This parses the response body (a string) into a dictionary, which lets you access the JSON data easily in your code.
