Pandas - Convert complex json to a dataframe

I have a complex JSON data structure and have to convert it to a data frame. The JSON structure is as follows:
{'fields': [{'id': 'a', 'label': 'Particulars', 'type': 'string'},
{'id': 'b', 'label': 'States', 'type': 'string'},
{'id': 'c', 'label': 'Gender', 'type': 'string'},
{'id': 'd', 'label': ' 11-2013', 'type': 'string'},
{'id': 'e', 'label': ' 12-2013', 'type': 'string'},
{'id': 'f', 'label': ' 1-2014', 'type': 'string'},
{'id': 'g', 'label': ' 2-2014', 'type': 'string'}],
'data': [['Animal Husbandry- incl Poultry, Dairy and Herdsman',
'Andhra Pradesh',
'Men',
'156.12',
'153.18',
'163.56',
'163.56'],
['Animal Husbandry- incl Poultry, Dairy and Herdsman',
'Bihar',
'Men',
'159.39',
'149.38',
'147.24',
'155.89'],
['Animal Husbandry- incl Poultry, Dairy and Herdsman',
'Gujarat',
'Men',
'157.08',
'145',
'145',
'145']]}
I want to make a dataframe from it in the following format:
I tried using the read_json function directly, which gives me an error. Then I tried json_normalize, which does not give the desired output because I don't understand how it works. How should I use json_normalize() to get the output in my required format?

Use json_normalize and set the column names with a list comprehension:
import pandas as pd
# json_normalize is available as pd.json_normalize since pandas 1.0
df = pd.json_normalize(d, 'data')
df.columns = [x.get('label') for x in d['fields']]
print (df)
Particulars States Gender \
0 Animal Husbandry- incl Poultry, Dairy and Herd... Andhra Pradesh Men
1 Animal Husbandry- incl Poultry, Dairy and Herd... Bihar Men
2 Animal Husbandry- incl Poultry, Dairy and Herd... Gujarat Men
11-2013 12-2013 1-2014 2-2014
0 156.12 153.18 163.56 163.56
1 159.39 149.38 147.24 155.89
2 157.08 145 145 145
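Since `data` is already a list of rows, the same table can also be built with the plain `DataFrame` constructor. The following is only a sketch, assuming `d` is the parsed dict from the question (abbreviated here to two rows and five fields). Stripping the labels and converting the month columns to numbers are additions that the original answer does not make.

```python
import pandas as pd

# Abbreviated copy of the question's structure (two rows, five fields).
d = {
    'fields': [
        {'id': 'a', 'label': 'Particulars', 'type': 'string'},
        {'id': 'b', 'label': 'States', 'type': 'string'},
        {'id': 'c', 'label': 'Gender', 'type': 'string'},
        {'id': 'd', 'label': ' 11-2013', 'type': 'string'},
        {'id': 'e', 'label': ' 12-2013', 'type': 'string'},
    ],
    'data': [
        ['Animal Husbandry- incl Poultry, Dairy and Herdsman',
         'Andhra Pradesh', 'Men', '156.12', '153.18'],
        ['Animal Husbandry- incl Poultry, Dairy and Herdsman',
         'Bihar', 'Men', '159.39', '149.38'],
    ],
}

# Column names come from the 'label' of each field; strip stray spaces.
columns = [field['label'].strip() for field in d['fields']]
df = pd.DataFrame(d['data'], columns=columns)

# The month columns arrive as strings; convert them to floats.
month_cols = columns[3:]
df[month_cols] = df[month_cols].apply(pd.to_numeric)
print(df.dtypes)
```

Converting the month columns is optional, but without it any arithmetic on them would operate on strings.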

Related

How can I split an element into different columns in Python?

I am working on a python program to output a dataset of ranking of some items.
This is my code:
import pandas as pd
# Named "items" rather than "list" to avoid shadowing the built-in list type
items = [{'ranking': 1, 'sku': 'WD-0215', 'name': 'Sofa', 'price': '$1,299.00', 'detail': 'Red'},
{'ranking': 1, 'sku': 'WD-0215', 'name': 'Sofa', 'price': '$1,299.00', 'detail': 'Cottom'},
{'ranking': 1, 'sku': 'WD-0215', 'name': 'Sofa', 'price': '$1,299.00', 'detail': 'Wood Lab'},
{'ranking': 2, 'sku': 'sfr20', 'name': 'TV', 'price': '$1,861.00 – $3,699.00', 'detail': 'W1360×D750×H710'},
{'ranking': 2, 'sku': 'sfr20', 'name': 'TV', 'price': '$1,861.00 – $3,699.00', 'detail': 'LED'},
{'ranking': 2, 'sku': 'sfr20', 'name': 'TV', 'price': '$1,861.00 – $3,699.00', 'detail': 'Made in Japan'},
{'ranking': 2, 'sku': 'sfr20', 'name': 'TV', 'price': '$1,861.00 – $3,699.00', 'detail': 'Nordic'}
]
df = pd.DataFrame(items)
print(df)
df.to_csv('item.csv', encoding='utf_8_sig')
However, my expected output should look like this:
ranking | sku     | name | price             | detail1         | detail2 | detail3       | detail4
1       | WD-0215 | Sofa | $1299.00          | Red             | Cottom  | Wood Lab      | none
1       | sfr20   | TV   | $1861.00-$3699.00 | W1360×D750×H710 | LED     | Made in Japan | Nordic
How can I change the code to output this result?
Use GroupBy.cumcount to create a counter within each group and reshape with Series.unstack:
g = df.groupby(['ranking', 'sku', 'name', 'price']).cumcount().add(1)
df = df.set_index(['ranking', 'sku', 'name', 'price', g])['detail'].unstack().add_prefix('detail')
print (df)
detail1 detail2 \
ranking sku name price
1 WD-0215 Sofa $1,299.00 Red Cottom
2 sfr20 TV $1,861.00 – $3,699.00 W1360×D750×H710 LED
detail3 detail4
ranking sku name price
1 WD-0215 Sofa $1,299.00 Wood Lab NaN
2 sfr20 TV $1,861.00 – $3,699.00 Made in Japan Nordic
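To see what the counter does, here is a reduced sketch of the same cumcount/unstack approach on a smaller, made-up subset of the data (only `sku`, `name` and `detail`). The final `reset_index()` is an addition that is not in the answer above; it turns the index levels back into ordinary columns, which may be closer to the flat layout the question asks for.

```python
import pandas as pd

# Reduced, illustrative subset of the question's rows.
rows = [
    {'sku': 'WD-0215', 'name': 'Sofa', 'detail': 'Red'},
    {'sku': 'WD-0215', 'name': 'Sofa', 'detail': 'Cottom'},
    {'sku': 'sfr20', 'name': 'TV', 'detail': 'LED'},
    {'sku': 'sfr20', 'name': 'TV', 'detail': 'Nordic'},
    {'sku': 'sfr20', 'name': 'TV', 'detail': 'Made in Japan'},
]
df = pd.DataFrame(rows)
keys = ['sku', 'name']

# cumcount numbers the rows inside each group starting at 0; add(1) makes it 1-based.
counter = df.groupby(keys).cumcount().add(1)
print(counter.tolist())

# The counter becomes the innermost index level, and unstack pivots it into columns.
wide = (
    df.set_index(keys + [counter])['detail']
      .unstack()
      .add_prefix('detail')
      .reset_index()
)
print(wide)
```

Groups with fewer details than the largest group get NaN in the missing detail columns.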

I want to create a dataframe from a list that has dicts inside

[{'id': 523535,
'type': 'array',
'name': 'Index',
'value': '- rea - das - faA -\n'},
{'id': 425322,
'type': 'array',
'name': 'status',
'value': '321 - 323 - - B332\n'},
{'id': 425322, 'type': 'array', 'name': 'Index', 'value': 'I'},
{'id': 527942, 'type': 'array', 'name': 'status', 'value': 'BF'}]
I want a dataframe that contains only the name and value,
where the column names are Freigabestatus and Index
and their values are BF and I,
as you can see below.
_____________________
|Freigabestatus |Index|
_______________________
| BF |I |
_______________________
import pandas as pd
lst = [{'id': 1050881,
'type': 'array',
'name': 'Index',
'value': '- AF - H04 - SCA -\n'},
{'id': 1050882,
'type': 'array',
'name': 'Freigabestatus',
'value': 'U1 - 000 - I - BF\n'},
{'id': 1050909, 'type': 'array', 'name': 'Index', 'value': 'I'},
{'id': 1050949, 'type': 'array', 'name': 'Freigabestatus', 'value': 'BF'}]
df = pd.DataFrame({'Freigabestatus': [d['value'] for d in lst if d['name'] =='Freigabestatus']})
df['Index'] = [d['value'] for d in lst if d['name'] == 'Index']
df
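The two list comprehensions above hard-code the column names. A possible generalisation, sketched here under the assumption that every `name` occurs the same number of times, groups the values by `name` in a single pass. Stripping the trailing newline and keeping only the last row (to get just BF and I) are assumptions about what the asker wants.

```python
from collections import defaultdict

import pandas as pd

lst = [
    {'id': 1050881, 'type': 'array', 'name': 'Index', 'value': '- AF - H04 - SCA -\n'},
    {'id': 1050882, 'type': 'array', 'name': 'Freigabestatus', 'value': 'U1 - 000 - I - BF\n'},
    {'id': 1050909, 'type': 'array', 'name': 'Index', 'value': 'I'},
    {'id': 1050949, 'type': 'array', 'name': 'Freigabestatus', 'value': 'BF'},
]

# Collect the values per name; each name becomes one column.
columns = defaultdict(list)
for d in lst:
    columns[d['name']].append(d['value'].strip())

df = pd.DataFrame(dict(columns))

# Assumption: only the last pair of values (BF and I) is wanted.
last = df.tail(1).reset_index(drop=True)
print(last)
```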

Extract values from dicts inside lists

I'm trying to extract the values from this JSON file, but I'm having trouble extracting the data from the lists inside the dict values. For example, for city and state I would like to get only the name values, then create a Pandas DataFrame and select only some of the keys.
I tried some for loops with the get method, but without success.
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}
In [123]: data
Out[123]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}
In [124]: required = ['birthday', 'gender', 'id', 'name', 'city', 'state']  # keys to keep
...: data2 = {k:v for k,v in data.items() if k in required}
In [125]: data2
Out[125]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'gender': ['Feminino'],
'id': 11111111,
'name': 'My User',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}]}
In [126]: pd.DataFrame(data2).assign(
...: city_name=lambda x: x['city'].str.get('name'),
...: state_name=lambda x: x['state'].str.get('name'),
...: state_fs=lambda x: x['state'].str.get('fs')
...: ).drop(['state', 'city'], axis=1)
Out[126]:
birthday gender id name city_name state_name state_fs
0 1987-07-13T00:00:00.000Z Feminino 11111111 My User Porto Alegre Rio Grande do Sul RS
The reason data2 is needed is that a DataFrame can't have columns that differ in length. In this case, pd.DataFrame(data) won't work because rg has 0 items but birthday has 1 item.
If you are dealing with JSON files directly, pd.json_normalize is also worth a look.
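An alternative that avoids the `.str.get` accessor is to flatten the record in plain Python first and hand pandas a single, already flat row. This is only a sketch: the helper `first_or_none` is a hypothetical name introduced here, and the data is an abbreviated copy of the record above.

```python
import pandas as pd

# Abbreviated copy of the record from the question.
data = {
    'birthday': ['1987-07-13T00:00:00.000Z'],
    'rg': [],
    'gender': ['Feminino'],
    'id': 11111111,
    'name': 'My User',
    'city': [{'id': '0001', 'name': 'Porto Alegre'}],
    'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
}


def first_or_none(value):
    """Return the first item of a list, None for an empty list, else the value itself."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


# Fall back to an empty dict so .get() still works if the list is empty.
city = first_or_none(data['city']) or {}
state = first_or_none(data['state']) or {}

row = {
    'birthday': first_or_none(data['birthday']),
    'gender': first_or_none(data['gender']),
    'id': data['id'],
    'name': data['name'],
    'city_name': city.get('name'),
    'state_name': state.get('name'),
    'state_fs': state.get('fs'),
}

df = pd.DataFrame([row])
print(df)
```

Because every value in `row` is a scalar, empty lists such as `rg` no longer cause a length mismatch.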

How to parse result from YAHOO.Finance.SymbolSuggest.ssCallback in Python

I'm using Python 3 and trying to get a list of possible stock symbols from company name using YAHOO.Finance.SymbolSuggest.ssCallback.
With the below code,
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
yahoo_url = 'http://d.yimg.com/autoc.finance.yahoo.com/autoc?query=apple+inc&region=1&lang=en&callback=YAHOO.Finance.SymbolSuggest.ssCallback'
response = urllib.request.urlopen(yahoo_url)
str_response = response.read().decode('utf-8')
The result is below:
str_response
Out[136]: 'YAHOO.Finance.SymbolSuggest.ssCallback({"ResultSet":{"Query":"AAPL","Result":[{"symbol":"AAPL","name":"Apple Inc.","exch":"NMS","type":"S","exchDisp":"NASDAQ","typeDisp":"Equity"},{"symbol":"^NY2LAAPL","name":"ICE Leveraged 2x AAPL Index","exch":"NYS","type":"I","exchDisp":"NYSE","typeDisp":"Index"},{"symbol":"AAPL.BA","name":"Apple Inc.","exch":"BUE","type":"S","exchDisp":"Buenos Aires","typeDisp":"Equity"},{"symbol":"AAPL34.SA","name":"Apple Inc.","exch":"SAO","type":"S","exchDisp":"Sao Paolo","typeDisp":"Equity"},{"symbol":"AAPL.MX","name":"Apple Inc.","exch":"MEX","type":"S","exchDisp":"Mexico","typeDisp":"Equity"},{"symbol":"AAPL.MI","name":"APPLE","exch":"MIL","type":"S","exchDisp":"Milan","typeDisp":"Equity"},{"symbol":"AAPLD.BA","name":"APPLE INC","exch":"BUE","type":"S","exchDisp":"Buenos Aires","typeDisp":"Equity"},{"symbol":"AAPLC.BA","name":"APPLE INC","exch":"BUE","type":"S","exchDisp":"Buenos Aires","typeDisp":"Equity"},{"symbol":"AAPL.VI","name":"Apple Inc.","exch":"VIE","type":"S","exchDisp":"Vienna","typeDisp":"Equity"}]}});'
How do I extract only the segment below and then put it into a dict?
"Result":[{"symbol":"AAPL","name":"Apple Inc.","exch":"NMS","type":"S","exchDisp":"NASDAQ","typeDisp":"Equity"},{"symbol":"^NY2LAAPL","name":"ICE Leveraged 2x AAPL Index","exch":"NYS","type":"I","exchDisp":"NYSE","typeDisp":"Index"},{"symbol":"AAPL.BA","name":"Apple Inc.","exch":"BUE","type":"S","exchDisp":"Buenos Aires","typeDisp":"Equity"},{"symbol":"AAPL34.SA","name":"Apple Inc.","exch":"SAO","type":"S","exchDisp":"Sao Paolo","typeDisp":"Equity"},{"symbol":"AAPL.MX","name":"Apple Inc.","exch":"MEX","type":"S","exchDisp":"Mexico","typeDisp":"Equity"},{"symbol":"AAPL.MI","name":"APPLE","exch":"MIL","type":"S","exchDisp":"Milan","typeDisp":"Equity"},{"symbol":"AAPLD.BA","name":"APPLE INC","exch":"BUE","type":"S","exchDisp":"Buenos Aires","typeDisp":"Equity"},{"symbol":"AAPLC.BA","name":"APPLE INC","exch":"BUE","type":"S","exchDisp":"Buenos Aires","typeDisp":"Equity"},{"symbol":"AAPL.VI","name":"Apple Inc.","exch":"VIE","type":"S","exchDisp":"Vienna","typeDisp":"Equity"}]
Thank you in advance.
Convert the string to a dict using ast.literal_eval, after slicing off the callback wrapper:
from ast import literal_eval
result = literal_eval(str_response[39:-2])
print(type(result))
<class 'dict'>
# Result key of interest
Result = result['ResultSet']['Result']
print(Result)
[{'symbol': 'AAPL',
'name': 'Apple Inc.',
'exch': 'NMS',
'type': 'S',
'exchDisp': 'NASDAQ',
'typeDisp': 'Equity'},
{'symbol': '^NY2LAAPL',
'name': 'ICE Leveraged 2x AAPL Index',
'exch': 'NYS',
'type': 'I',
'exchDisp': 'NYSE',
'typeDisp': 'Index'},
{'symbol': 'AAPL.BA',
'name': 'Apple Inc.',
'exch': 'BUE',
'type': 'S',
'exchDisp': 'Buenos Aires',
'typeDisp': 'Equity'},
{'symbol': 'AAPL34.SA',
'name': 'Apple Inc.',
'exch': 'SAO',
'type': 'S',
'exchDisp': 'Sao Paolo',
'typeDisp': 'Equity'},
{'symbol': 'AAPL.MX',
'name': 'Apple Inc.',
'exch': 'MEX',
'type': 'S',
'exchDisp': 'Mexico',
'typeDisp': 'Equity'},
{'symbol': 'AAPL.MI',
'name': 'APPLE',
'exch': 'MIL',
'type': 'S',
'exchDisp': 'Milan',
'typeDisp': 'Equity'},
{'symbol': 'AAPLD.BA',
'name': 'APPLE INC',
'exch': 'BUE',
'type': 'S',
'exchDisp': 'Buenos Aires',
'typeDisp': 'Equity'},
{'symbol': 'AAPLC.BA',
'name': 'APPLE INC',
'exch': 'BUE',
'type': 'S',
'exchDisp': 'Buenos Aires',
'typeDisp': 'Equity'},
{'symbol': 'AAPL.VI',
'name': 'Apple Inc.',
'exch': 'VIE',
'type': 'S',
'exchDisp': 'Vienna',
'typeDisp': 'Equity'}]
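Since the text inside the callback is JSON, `json.loads` is another option, and unlike `literal_eval` it also handles JSON literals such as `true`, `false` and `null`. The sketch below works on a shortened sample of the response rather than a live request, and locates the parentheses instead of relying on the fixed offset 39. The helper name `strip_jsonp` is hypothetical.

```python
import json

# Shortened sample of the response shown in the question (one result only).
sample = (
    'YAHOO.Finance.SymbolSuggest.ssCallback({"ResultSet":{"Query":"AAPL",'
    '"Result":[{"symbol":"AAPL","name":"Apple Inc.","exch":"NMS","type":"S",'
    '"exchDisp":"NASDAQ","typeDisp":"Equity"}]}});'
)


def strip_jsonp(text):
    """Return the text between the first '(' and the last ')'."""
    start = text.index('(') + 1
    end = text.rindex(')')
    return text[start:end]


payload = json.loads(strip_jsonp(sample))
results = payload['ResultSet']['Result']  # list of dicts

# Optional: index the results by symbol for direct lookup.
by_symbol = {item['symbol']: item for item in results}
print(by_symbol['AAPL']['name'])
```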

Convert json to Pandas data frame

How can I convert this JSON into a dataframe in Python, leaving out the fields? I just need the employees data in my dataframe.
{'fields': [{'id': 'displayName', 'type': 'text', 'name': 'Display name'},
{'id': 'firstName', 'type': 'text', 'name': 'First name'},
{'id': 'gender', 'type': 'gender', 'name': 'Gender'}],
'employees': [{'id': '123', 'displayName': 'abc', 'firstName': 'abc','gender': 'Female'},
{'id': '234', 'displayName': 'xyz.', 'firstName': 'xyz','gender': 'Female'},
{'id': '345', 'displayName': 'pqr', 'firstName': 'pqr', 'gender': 'Female'}]}
If you want only the employee information, select the 'employees' key:
JSON = {'fields': [...], 'employees': [{...}]}  # abbreviated
employee_info = JSON['employees']
employee_info will be a list of dictionaries, from which you can create a dataframe as described in this answer: Convert list of dictionaries to a pandas DataFrame
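Put together, this can be sketched as follows, assuming the JSON has already been parsed into a dict (named `payload` here for illustration):

```python
import pandas as pd

payload = {
    'fields': [
        {'id': 'displayName', 'type': 'text', 'name': 'Display name'},
        {'id': 'firstName', 'type': 'text', 'name': 'First name'},
        {'id': 'gender', 'type': 'gender', 'name': 'Gender'},
    ],
    'employees': [
        {'id': '123', 'displayName': 'abc', 'firstName': 'abc', 'gender': 'Female'},
        {'id': '234', 'displayName': 'xyz.', 'firstName': 'xyz', 'gender': 'Female'},
        {'id': '345', 'displayName': 'pqr', 'firstName': 'pqr', 'gender': 'Female'},
    ],
}

# A list of dicts maps directly onto rows; the dict keys become the columns.
df = pd.DataFrame(payload['employees'])
print(df)
```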
