I have some data in JSON format. Here's a snippet:
"sell": [
{
"Rate": 0.001425,
"Quantity": 537.27713514
},
{
"Rate": 0.00142853,
"Quantity": 6.59174681
}
]
What's the easiest way to access Rate and Quantity so that I can plot it in Matplotlib? Do I have to flatten/normalize it, or create a for loop to generate an array, or can I use pandas or some other library to convert it into matplotlib friendly data automatically?
I know matplotlib can handle inputs in a few ways
plt.plot([1,2,3,4], [1,4,9,16])
plt.plot([1,1],[2,4],[3,9],[4,16])
The simplest way is to pass the list to the DataFrame constructor and then call DataFrame.plot:
import pandas as pd
d = {"sell": [
{
"Rate": 0.001425,
"Quantity": 537.27713514
},
{
"Rate": 0.00142853,
"Quantity": 6.59174681
}
]}
df = pd.DataFrame(d['sell'])
print (df)
Quantity Rate
0 537.277135 0.001425
1 6.591747 0.001429
df.plot(x='Quantity', y='Rate')
EDIT:
It is also possible to use read_json to create the DataFrame.
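For the for-loop route mentioned in the question, a minimal sketch using only the standard library might look like this. It assumes the snippet is wrapped in an outer object with a "sell" key, as in the answer above; the variable names are purely illustrative.

```python
import json

# Assumed payload: the snippet from the question wrapped in an outer object.
raw = """
{"sell": [
    {"Rate": 0.001425, "Quantity": 537.27713514},
    {"Rate": 0.00142853, "Quantity": 6.59174681}
]}
"""

orders = json.loads(raw)["sell"]

# Build one list per axis, keeping the order of the records.
rates = [order["Rate"] for order in orders]
quantities = [order["Quantity"] for order in orders]

print(rates)
print(quantities)
```

The two lists can then be passed straight to matplotlib, for example plt.plot(quantities, rates), so no flattening step is needed for data this shallow.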
I'm fairly new to Python. I've tried various approaches based on answers found here, but I can't manage to normalize my JSON file.
As I checked in Postman, it has 4 levels of nested arrays. For suppliers I want to expand all the levels of data. The problem I have is with the score_card subtree: I want to pull out all risk_groups with their name and risk_score, then all risks with their name and risk_score, and finally the indicators at the deepest level, which again contain name and risk_score.
I'm not sure whether this is possible in one go, given that the same key names are repeated across levels.
Below I'm sharing a reproducible example:
data = {"suppliers": [
{
"id": 1260391,
"name": "2712270 MANITOBA LTD",
"duns": "rm-071-7291",
"erp_number": "15189067;15189067",
"material_group_ids": [
176069
],
"business_unit_ids": [
13728
],
"responsible_user_ids": [
37761,
37760,
37759,
37758,
37757,
37756,
36520,
37587,
36494,
22060,
37742,
36446,
36289
],
"address": {
"address1": "BOX 550 NE 16-20-26",
"address2": None,
"zip_code": "R0J 1W0",
"city": "RUSSELL",
"state": None,
"country_code": "ca",
"latitude": 50.7731176,
"longitude": -101.2862461
},
"score_card": {
"risk_score": 26.13,
"risk_groups": [
{
"name": "Viability",
"risk_score": 43.33,
"risks": [
{
"name": "Financial stability supplier",
"risk_score": None,
"indicators": [
{
"name": "Acquisitions",
"risk_score": None
}
]
}
]
},
]
}
}
]
}
And here is how it should look:
expected = [[1260391,'2712270 MANITOBA LTD','rm-071-7291',176069,13728,[37761,37760,
37759,37758,37757,37756,36520,37587,36494,22060,37742,36446,36289],
'BOX 550 NE 16-20-26','None','R0J 1W0','RUSSELL','None','ca',50.7731176,
-101.2862461,26.13,'Viability',43.33,'Financial stability supplier','None',
'Acquisitions','None']]
df = pd.DataFrame(expected,columns=['id','name','duns','material_groups_ids',
'business_unit_ids','responsible_user_ids',
'address.address1','address.address2','address.zip_code',
'address.city','address.state','address.country_code','address.latitude',
'address.longitude','risk_score','risk_groups.name',
'risk_groups.risk_score','risk_groups.risks.name',
'risk_groups.risks.risk_score','risk_groups.risks.indicators.name',
'risk_groups.risks.indicators.risk_score'])
What I have been trying for the last 3 days is json_normalize:
example = pd.json_normalize(data['suppliers'],
record_path=['score_card','risk_groups','risks',
'indicators'],meta = ['id','name','duns','erp_number','material_group_ids',
'business_unit_ids','responsible_user_ids','address',['score_card','risk_score']],record_prefix='_')
but when I specify the fields for record_path, which seems to be a vital parameter here, it always returns this error:
TypeError: {'name': 'Acquisitions', 'risk_score': None} has non list value Acquisitions for path name. Must be list or null.
What I want to get is a table with the columns shown in the expected output above.
So it seems that Python doesn't know how to treat the most granular level, which contains null values.
I've tried the approach from here: Use json_normalize to normalize json with nested arrays
but without success.
I would like to get the same view that I was able to 'unpack' in Power BI, but that way of holding the data does not satisfy me.
All risk indicators are components of risks, which are in turn part of risk groups, as you can see in the attached pictures.
Going down to the most granular level, the hierarchy is Risk groups -> Risks -> Indicators, and I would like to unpack all of it.
Any help is more than desperately needed. Thank you kindly for any support here.
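One way to sidestep record_path altogether is to walk the three levels with plain loops and emit one flat row per indicator. The sketch below is an assumption about what the desired table looks like, not a tested solution for the full API response: flatten_suppliers is a hypothetical helper, the sample is a trimmed copy of the data above, and groups without risks (or risks without indicators) simply produce no rows.

```python
def flatten_suppliers(payload):
    """Yield one flat dict per indicator, repeating the parent fields."""
    for supplier in payload["suppliers"]:
        # Top-level fields are copied as they are (lists stay lists).
        base = {
            key: value
            for key, value in supplier.items()
            if key not in ("address", "score_card")
        }
        # Prefix address fields with "address." to keep the names unambiguous.
        for key, value in (supplier.get("address") or {}).items():
            base[f"address.{key}"] = value
        card = supplier.get("score_card") or {}
        base["risk_score"] = card.get("risk_score")
        for group in card.get("risk_groups") or []:
            for risk in group.get("risks") or []:
                for indicator in risk.get("indicators") or []:
                    yield {
                        **base,
                        "risk_groups.name": group.get("name"),
                        "risk_groups.risk_score": group.get("risk_score"),
                        "risk_groups.risks.name": risk.get("name"),
                        "risk_groups.risks.risk_score": risk.get("risk_score"),
                        "risk_groups.risks.indicators.name": indicator.get("name"),
                        "risk_groups.risks.indicators.risk_score": indicator.get("risk_score"),
                    }


# Trimmed copy of the sample from the question.
data = {"suppliers": [{
    "id": 1260391,
    "name": "2712270 MANITOBA LTD",
    "address": {"city": "RUSSELL", "state": None},
    "score_card": {
        "risk_score": 26.13,
        "risk_groups": [{
            "name": "Viability",
            "risk_score": 43.33,
            "risks": [{
                "name": "Financial stability supplier",
                "risk_score": None,
                "indicators": [{"name": "Acquisitions", "risk_score": None}],
            }],
        }],
    },
}]}

rows = list(flatten_suppliers(data))
print(rows[0]["risk_groups.risks.indicators.name"])  # Acquisitions
```

Because every row is a flat dict, pd.DataFrame(rows) should give a table with the dotted column names, and the repeated name and risk_score keys no longer collide because each level carries its own prefix.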
I'm trying to generate different altair charts programmatically.
I will base those chart setups on dictionaries passed to alt.Chart.from_dict().
I've reverse engineered the overall configuration of the charts with an existing chart doing chart.to_dict(), but this method serializes the data into json, whereas my data is hosted in pandas dataframes and I'm struggling to find the right syntax in the dictionary to pass the dataframe.
I've tried a few variations of the below :
d_chart_config = {
"data": df, #or df.to_dict()
"config": {
"view": {"continuousWidth": 400, "continuousHeight": 300},
"title": {"anchor": "start", "color": "#4b5c65", "fontSize": 20},
},
"mark": {"type": "bar", "size": 40},
....}
but I haven't managed to figure out how or where to insert the dataframe in the dictionary, either as a dataframe directly or as df.to_dict().
Please help if you've managed something similar.
The pure pandas way to generate a Vega-Lite data field is {"values": df.to_dict(orient="records")}, but this has problems in some cases (namely handling of datetimes, categoricals, and non-standard numeric & string types).
Altair has utilities to work around these issues that you can use directly, namely the altair.utils.data.to_values function.
For example:
import altair as alt
import pandas as pd
from altair.utils.data import to_values
df = pd.DataFrame({'a': [1, 2, 3], 'b': pd.date_range('2012', freq='Y', periods=3)})
print(to_values(df))
# {'values': [{'a': 1, 'b': '2012-12-31T00:00:00'},
# {'a': 2, 'b': '2013-12-31T00:00:00'},
# {'a': 3, 'b': '2014-12-31T00:00:00'}]}
You can use this directly within a dictionary containing a vega-lite specification and generate a valid chart:
alt.Chart.from_dict({
"data": to_values(df),
"mark": "bar",
"encoding": {
"x": {"field": "a", "type": "quantitative"},
"y": {"field": "b", "type": "ordinal", "timeUnit": "year"},
}
})
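To generate several chart configurations programmatically, the spec dictionary can be produced by a small factory function. The sketch below uses only the standard library and plain row dicts; to_inline_data and bar_spec are hypothetical helper names, and the encoding types are arbitrary choices for illustration. For real DataFrames, to_values handles more data types than this simple date conversion.

```python
import datetime as dt
import json


def to_inline_data(records):
    """Return a Vega-Lite inline data block, converting dates to ISO strings."""
    def convert(value):
        if isinstance(value, (dt.date, dt.datetime)):
            return value.isoformat()
        return value

    return {"values": [{k: convert(v) for k, v in row.items()} for row in records]}


def bar_spec(records, x_field, y_field, title=None):
    """Build a minimal bar chart specification as a plain dict."""
    spec = {
        "data": to_inline_data(records),
        "mark": {"type": "bar"},
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }
    if title is not None:
        spec["title"] = title
    return spec


records = [
    {"year": dt.date(2012, 12, 31), "a": 1},
    {"year": dt.date(2013, 12, 31), "a": 2},
]
spec = bar_spec(records, "year", "a", title="Demo")
print(json.dumps(spec, indent=2))
```

The resulting dict contains only JSON-serializable values, so it can be handed to alt.Chart.from_dict(spec) in the same way as the hand-written dictionary above.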
I am new to Python and I am trying to convert the following JSON into a pandas DataFrame.
The format of the JSON is as follows. I have reduced the columns and rows. There are around 8 columns, and each JSON file has around 20000 rows.
{
"DataFeed":[
{
"Columns":[
{
"Name":"customerID",
"Category":"Dimension",
"Type":"String"
},
{
"Name":"InvoiceID",
"Category":"Dimension",
"Type":"String"
},
{
"Name":"storeloc",
"Category":"Dimension",
"Type":"String"
}
],
"Rows":[
{
"customerID":"id128404805",
"InvoiceID":"IN3956",
"storeloc":"TX359"
},
{
"customerID":"id128404806",
"InvoiceID":"IN0054",
"storeloc":"CA235"
},
{
"customerID":"id128404807",
"InvoiceID":"IN7439",
"storeloc":"AZ2309"
}
]
}
]
}
I am trying to load it into a pandas DataFrame. The number of columns is the same in every JSON file, and the number of rows is around 10000.
I am trying to get to the rows and insert them into a table after certain calculations.
I am trying to use json_normalize, but I am struggling with navigating to the Rows level and normalizing after that. I know there must be an easy solution, but I am new to working with JSON. Thanks
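Try pd.json_normalize() with the record_path argument.
Note: pd.json_normalize is available at the top level from pandas 1.0 onwards; in older versions, import json_normalize from pandas.io.json instead.
Assuming your JSON object is j: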
df = pd.json_normalize(j,record_path=['DataFeed','Rows'])
print(df)
customerID InvoiceID storeloc
0 id128404805 IN3956 TX359
1 id128404806 IN0054 CA235
2 id128404807 IN7439 AZ2309
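Since the rows are already flat dicts, they can also be reached with plain indexing before any calculations are done. The sketch below assumes a single entry in DataFeed, as in the sample, and uses the Columns block only to fix the column order; the variable names are illustrative.

```python
# Shortened copy of the sample from the question.
j = {"DataFeed": [{
    "Columns": [
        {"Name": "customerID", "Category": "Dimension", "Type": "String"},
        {"Name": "InvoiceID", "Category": "Dimension", "Type": "String"},
        {"Name": "storeloc", "Category": "Dimension", "Type": "String"},
    ],
    "Rows": [
        {"customerID": "id128404805", "InvoiceID": "IN3956", "storeloc": "TX359"},
        {"customerID": "id128404806", "InvoiceID": "IN0054", "storeloc": "CA235"},
    ],
}]}

feed = j["DataFeed"][0]  # assumption: the file holds exactly one feed
columns = [col["Name"] for col in feed["Columns"]]
rows = feed["Rows"]      # already a list of flat dicts

# Reorder every row to follow the declared column order.
table = [[row.get(name) for name in columns] for row in rows]

print(columns)
print(table[0])
```

If a DataFrame is still wanted, pd.DataFrame(rows, columns=columns) builds it from the same two objects.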
I have a data structure like this:
data = [{
"name": "leopard",
"character": "mean",
"skills": ["sprinting", "hiding"],
"pattern": "striped",
},
{
"name": "antilope",
"character": "good",
"skills": ["running"],
},
.
.
.
]
Each key in the dictionaries has values of type integer, string or
list of strings (not all keys are in all dicts present), each
dictionary represents a row in a table; all rows are given as the list
of dictionaries.
How can I easily import this into Pandas? I tried
df = pd.DataFrame.from_records(data)
but here I get a "ValueError: arrays must all be same length" error.
The DataFrame constructor takes row-based arrays (amongst other structures) as data input. Therefore the following works:
data = [{
"name": "leopard",
"character": "mean",
"skills": ["sprinting", "hiding"],
"pattern": "striped",
},
{
"name": "antilope",
"character": "good",
"skills": ["running"],
}]
df = pd.DataFrame(data)
print(df)
Output:
character name pattern skills
0 mean leopard striped [sprinting, hiding]
1 good antilope NaN [running]
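The NaN in the pattern column comes from the missing key in the second dict. As a rough illustration of that behaviour in plain Python (a sketch of the idea, not of how pandas implements it), the rows can be aligned to the union of all keys by hand:

```python
data = [
    {"name": "leopard", "character": "mean",
     "skills": ["sprinting", "hiding"], "pattern": "striped"},
    {"name": "antilope", "character": "good", "skills": ["running"]},
]

# Union of all keys, in first-seen order.
columns = list(dict.fromkeys(key for row in data for key in row))

# Give every row every column; absent keys become None.
filled = [{col: row.get(col) for col in columns} for row in data]

print(columns)
print(filled[1]["pattern"])  # None
```

This step is not required before calling pd.DataFrame(data); it only shows why rows with different keys still fit into one table.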
Here is the example JSON I'm working with.
{
":#computed_region_amqz_jbr4": "587",
":#computed_region_d3gw_znnf": "18",
":#computed_region_nmsq_hqvv": "55",
":#computed_region_r6rf_p9et": "36",
":#computed_region_rayf_jjgk": "295",
"arrests": "1",
"county_code": "44",
"county_code_text": "44",
"county_name": "Mifflin",
"fips_county_code": "087",
"fips_state_code": "42",
"incident_count": "1",
"lat_long": {
"type": "Point",
"coordinates": [
-77.620031,
40.612749
]
}
I have been able to pull out the columns I want, except that I'm having trouble with "lat_long". So far my code looks like:
# PRINTS OUT SPECIFIED COLUMNS
col_titles = ['county_name', 'incident_count', 'lat_long']
df = df.reindex(columns=col_titles)
However, 'lat_long' is added to the data frame like this: {'type': 'Point', 'coordinates': [-75.71107, 4...
I thought that once I figured out how to properly add the coordinates to the data frame, I would then create two separate columns, one for latitude and one for longitude.
Any help with this matter would be appreciated. Thank you.
If I haven't misunderstood your requirements, you can try it this way with json_normalize. I have only added a demo for a single JSON record; you can use apply or a lambda for multiple datasets.
import pandas as pd
# A single JSON record, as in the question
record = {":#computed_region_amqz_jbr4":"587",":#computed_region_d3gw_znnf":"18",":#computed_region_nmsq_hqvv":"55",":#computed_region_r6rf_p9et":"36",":#computed_region_rayf_jjgk":"295","arrests":"1","county_code":"44","county_code_text":"44","county_name":"Mifflin","fips_county_code":"087","fips_state_code":"42","incident_count":"1","lat_long":{"type":"Point","coordinates":[-77.620031,40.612749]}}
# pd.json_normalize needs pandas 1.0+; nested keys become dotted columns such as 'lat_long.type'
df = pd.json_normalize(record)
# Copy the selection so that adding columns does not raise SettingWithCopyWarning
df_modified = df[['county_name', 'incident_count', 'lat_long.type']].copy()
# The coordinates are in GeoJSON order: [longitude, latitude]
df_modified['lng'] = df['lat_long.coordinates'][0][0]
df_modified['lat'] = df['lat_long.coordinates'][0][1]
print(df_modified)
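Here is another way to do it:
# record is the parsed JSON dict from the question, not a DataFrame
df1 = pd.json_normalize(record)
# The coordinates are in GeoJSON order: [longitude, latitude]
coords = df1['lat_long.coordinates'].apply(pd.Series).rename(columns={0: 'long', 1: 'lat'})
pd.concat([df1, coords], axis=1).drop(columns=['lat_long.coordinates', 'lat_long.type'])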
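For several records, the coordinate pair can also be split before the data reaches pandas. The sketch below assumes the lat_long field follows the GeoJSON Point layout, where longitude comes first; split_point is a hypothetical helper, and the second record is made up to show how a missing location is handled.

```python
def split_point(record):
    """Return a flat row with separate 'lat' and 'long' keys."""
    point = record.get("lat_long") or {}
    coords = point.get("coordinates") or [None, None]
    longitude, latitude = coords[0], coords[1]  # GeoJSON order: longitude first
    return {
        "county_name": record.get("county_name"),
        "incident_count": record.get("incident_count"),
        "lat": latitude,
        "long": longitude,
    }


records = [
    {"county_name": "Mifflin", "incident_count": "1",
     "lat_long": {"type": "Point", "coordinates": [-77.620031, 40.612749]}},
    # Hypothetical record without a location, for illustration only.
    {"county_name": "NoLocation", "incident_count": "2"},
]

rows = [split_point(record) for record in records]
print(rows[0])
```

The list of flat rows can then be passed to pd.DataFrame(rows) to get the three requested columns plus the split coordinates.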