Read multi-level JSON from a URL in Python with pandas
I am trying to read multi-level JSON with pandas and store the data in a DataFrame for further processing or printing. My main goal is to understand how to read the data at each level of the JSON.
Here are my first steps, which work:
import pandas as pd
import requests
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth=log)
req.raise_for_status()
d = req.json()
# What is the next step?
# Something like this? df = pd.DataFrame.from_dict(d["Data"])
Could you tell me how to read:
1st level (columns PageIndex, PageSize, TotalCount, Data)
2nd level (from Data: columns Code, Timestamp, Category, snapshots)
3rd level (from Data and snapshots: columns Code, DateFrom, DateTo, Type ...)
Any good tips for further work with the data?
Or maybe you will tell me that pandas is not the best way to read JSON.
Here is the JSON (the file is also available for download from OneDrive):
{"PageIndex":0,"PageSize":2,"TotalCount":100248,"Data":[{"Code":"859182400102974","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400102974","DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-502,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"},{"Code":"859182400102974","DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-382,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation":-382}],"ConsumptionEstimation2":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation2":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation2":-382}]}},{"Code":"859182400104897","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400104897","DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-280,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"},{"Code":"859182400104897","DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-282,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation":-282}],"ConsumptionEstimation2":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation2":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation2":-282}]}}]}
Thank you
I think pandas is not a good choice for processing this JSON, because pandas is designed for structured (tabular) data, while your example is multi-level, nested data.
But if you insist on doing it, you can extract the structured parts of your JSON. For example, you can collect the "snapshots" arrays from every element of JSON_ROOT["Data"] into a single Python list and build a pd.DataFrame from it. Otherwise, you can only store the JSON structure as a string in a single pd.DataFrame column.
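Answering the "each level" part with a minimal sketch: it assumes `d` is the dict returned by `req.json()`, and a trimmed copy of the question's sample (only a few snapshot fields kept) stands in for the live response so the snippet runs offline. Level 1 is plain dictionary access, level 2 is `pd.DataFrame(d["Data"])`, and level 3 is built either with `pd.json_normalize(..., record_path=["Data", "snapshots"])` or, as suggested above, by collecting all snapshots into one Python list first.

```python
import pandas as pd

# Trimmed copy of the sample response (assumption: the real data has the same shape).
d = {
    "PageIndex": 0, "PageSize": 2, "TotalCount": 100248,
    "Data": [
        {"Code": "859182400102974", "Timestamp": "2019-04-17T12:16:51Z", "Category": 0,
         "snapshots": [
             {"Code": "859182400102974", "DateFrom": "2016-12-31T23:00:00Z",
              "DateTo": "2017-05-09T22:00:00Z", "Type": "CCO"},
             {"Code": "859182400102974", "DateFrom": "2017-05-09T22:00:00Z",
              "DateTo": "2018-01-31T23:00:00Z", "Type": "CCO"},
         ]},
        {"Code": "859182400104897", "Timestamp": "2019-04-17T12:16:51Z", "Category": 0,
         "snapshots": [
             {"Code": "859182400104897", "DateFrom": "2016-11-18T23:00:00Z",
              "DateTo": "2017-11-05T23:00:00Z", "Type": "CCO"},
         ]},
    ],
}

# Level 1: plain dictionary access on the top-level keys.
level1 = {k: d[k] for k in ("PageIndex", "PageSize", "TotalCount")}

# Level 2: one row per element of "Data"; the nested "snapshots" list is dropped here.
level2 = pd.DataFrame(d["Data"])[["Code", "Timestamp", "Category"]]

# Level 3 via json_normalize: one row per snapshot. Each snapshot already has a
# "Code", so the parent Code is carried along as "Data.Code" (meta path joined by ".").
level3 = pd.json_normalize(d, record_path=["Data", "snapshots"],
                           meta=[["Data", "Code"]])

# Level 3 in plain Python: collect every snapshot into one list, then build the frame.
snapshots = [snap for item in d["Data"] for snap in item["snapshots"]]
level3_plain = pd.DataFrame(snapshots)
print(level3)
```

With the live data, pass `req.json()` instead of the sample dict; fields that are absent in some snapshots (for example `Address`) simply show up as NaN.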
The answers above have not made me any wiser.
So let me narrow it down to one question.
How can I get a table with these 4 columns:
Data.Code; Data.snapshots.DateFrom; Data.snapshots.Address.Street; Data.snapshots.Address.City
This is my code, but it needs correcting and I do not know how. The code runs, but it returns 30 columns instead of exactly what I want.
import pandas as pd
import requests
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth=log)
req.raise_for_status()
fin = req.json()
# pd.json_normalize is the public API since pandas 1.0 (pandas.io.json.json_normalize is deprecated)
df = pd.json_normalize(fin,
                       record_path=['Data', 'snapshots'],
                       record_prefix='Data.',
                       errors='ignore')
print(df)
Thank you for your help.
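One way to get exactly those four columns, sketched under the assumption that the response has the nesting shown in the question (the sample below is trimmed to the relevant fields): pass the parent `Code` through `meta`, prefix the record columns with `record_prefix`, then select only the wanted columns. `reindex` is used instead of `df[...]` so that, if no snapshot on a given page has an `Address`, those columns come back as NaN rather than raising a `KeyError`.

```python
import pandas as pd

# Trimmed sample (assumption: the real response has the same nesting).
fin = {
    "PageIndex": 0, "PageSize": 2, "TotalCount": 100248,
    "Data": [
        {"Code": "859182400102974",
         "snapshots": [
             {"Code": "859182400102974", "DateFrom": "2016-12-31T23:00:00Z"},
             {"Code": "859182400102974", "DateFrom": "2017-05-09T22:00:00Z"},
         ]},
        {"Code": "859182400104897",
         "snapshots": [
             {"Code": "859182400104897", "DateFrom": "2016-11-18T23:00:00Z",
              "Address": {"Street": "Okružní", "City": "Semovo Ústí", "PostCode": "39102"}},
             {"Code": "859182400104897", "DateFrom": "2017-11-05T23:00:00Z",
              "Address": {"Street": "Okružní", "City": "Semovo Ústí", "PostCode": "39102"}},
         ]},
    ],
}

df = pd.json_normalize(
    fin,
    record_path=["Data", "snapshots"],  # one row per snapshot
    meta=[["Data", "Code"]],             # parent Code -> column "Data.Code"
    record_prefix="Data.snapshots.",     # nested Address becomes "Data.snapshots.Address.City"
)

wanted = ["Data.Code", "Data.snapshots.DateFrom",
          "Data.snapshots.Address.Street", "Data.snapshots.Address.City"]
# reindex keeps this column order and fills any column missing from the page with NaN.
df = df.reindex(columns=wanted)
print(df)
```

On the live service, replace the sample `fin` with `req.json()` from the code above.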
Related
How do you run an Excel function from within Python?
We are using an Excel plugin to pull some data from an API. Our Excel file contains a column with an entity identifier, and we use an Excel formula to pull data for this entity from the internet. Is it possible to run this from within Python? I could export my pd.DataFrame to CSV, open it in Excel, append the data I want, and read it back into pandas, but is there a quicker way?
You can make the request and extract the data using the .json() method:
import pandas as pd
import requests
url = 'https://api.covid19api.com/summary'
r = requests.get(url)
json_data = r.json()
json_data
Then you have the data and just need to include it in your DataFrame.
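To "include it in your DataFrame", one option is to flatten the parsed JSON with `pd.json_normalize` and merge it onto the existing frame by the entity identifier. This is only a sketch: the `entity_id` column, the `fetch_entity` helper and the payload fields are hypothetical stand-ins for the real API, and no network call is made.

```python
import pandas as pd

# Existing frame with an entity identifier column (hypothetical data).
df = pd.DataFrame({"entity_id": ["A1", "B2"], "note": ["first", "second"]})


def fetch_entity(entity_id):
    """Hypothetical stand-in for requests.get(...).json() on the real API."""
    fake_api = {
        "A1": {"id": "A1", "price": {"last": 10.5, "currency": "USD"}},
        "B2": {"id": "B2", "price": {"last": 7.25, "currency": "USD"}},
    }
    return fake_api[entity_id]


# Fetch one JSON object per entity and flatten nested keys into columns
# ("price.last", "price.currency").
payloads = [fetch_entity(e) for e in df["entity_id"]]
flat = pd.json_normalize(payloads)

# Join the flattened fields back onto the original rows by identifier.
merged = df.merge(flat, left_on="entity_id", right_on="id", how="left")
print(merged)
```

A left merge keeps every original row even if the API returns nothing for some identifier; those rows get NaN in the new columns.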
Populating an Excel File Using an API to track Card Prices in Python
I'm a novice when it comes to Python, and to learn it I am working on a side project. My goal is to track the prices of my YGO cards using the Yu-Gi-Oh Prices API https://yugiohprices.docs.apiary.io/# I am attempting to manually enter the print tag for each card and then have the API pull the data and populate the spreadsheet with fields such as the name of the card and its trait, in addition to the price data. That way, the spreadsheet is updated every time I run the code. My idea was to use a for loop to get the API to look up each print tag, store the information in a dictionary, and then write the results to the Excel file. I added an example of the spreadsheet. Please let me know if I can clarify further. Any suggestions for the code that would help me achieve the goal of this project would be appreciated. Thanks in advance.
import requests
import response as rsp
import urllib3
import urlopen
import json
import pandas as pd
df = pd.read_excel("api_ygo.xlsx")
print(df[:5]) # See the first 5 rows
response = requests.get('http://yugiohprices.com/api/price_for_print_tag/print_tag')
print(response.json())
data = []
for i in df:
    print_tag = i[2]
    request = requests.get('http://yugiohprices.com/api/price_for_print_tag/print_tag' + print_tag)
    data.append(print_tag)
print(data)
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
jprint(response.json())
Iterating over a pandas DataFrame can be done using df.apply(). This has the added advantage that you can store the results directly in your DataFrame. First define a function that returns the desired result. Then apply that function to the relevant column while assigning the output to a new column:
import requests
import pandas as pd
import time
df = pd.DataFrame(['EP1-EN002', 'LED6-EN007', 'DRL2-EN041'], columns=['print_tag']) # just dummy data; in your case this is pd.read_excel
def get_tag(print_tag):
    request = requests.get('http://yugiohprices.com/api/price_for_print_tag/' + print_tag) # this URL works; the one in your code wasn't correct
    time.sleep(1) # sleep for a second to avoid sending too many API calls per minute
    return request.json()
df['result'] = df['print_tag'].apply(get_tag)
You can now export this column to a list of dictionaries with df['result'].tolist(). Or, even better, you can flatten the results into a new DataFrame with pd.json_normalize:
df2 = pd.json_normalize(df['result'])
df2.to_excel('output.xlsx') # save the DataFrame as a new Excel file
How do I export JSON data to CSV using python?
I'm building a site that, based on a user's input, sorts through JSON data and prints a schedule for them into an HTML table. I want to add the functionality that, once their table is created, they can export the data to a CSV/Excel file, so we don't have to store their credentials (logins and schedules) in a database. Is this possible? If so, how can I do it, preferably using Python?
This is not an exact answer, but rather the steps to follow to get a solution:
1. Read the data from JSON: some_dict = json.loads(json_string)
2. Write the appropriate code to get the data out of the dictionary (sorting, conditions, etc.) into a 2D array (a list of lists).
3. Save that list as CSV: https://realpython.com/python-csv/
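The three steps above can be sketched with only the standard library. The JSON string and its `schedule`, `day` and `course` keys are made up for illustration, and the CSV is written to an in-memory `io.StringIO` buffer so the example stays self-contained.

```python
import csv
import io
import json

# Hypothetical input: a user's schedule as a JSON string.
json_string = ('{"schedule": [{"day": "Mon", "course": "Math"},'
               ' {"day": "Tue", "course": "Art"}]}')

# 1. Read the data from JSON.
some_dict = json.loads(json_string)

# 2. Turn the records into a 2D list: a header row, then one row per entry
#    (any sorting or filtering would go here).
rows = [["day", "course"]]
rows += [[item["day"], item["course"]] for item in some_dict["schedule"]]

# 3. Save the 2D list as CSV.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, open it with `open("schedule.csv", "w", newline="")` and pass that file object to `csv.writer`; `newline=""` is what the `csv` module documentation recommends.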
I'm pretty lazy and like to use pandas for things like this. It would be something along the lines of:
import json
import pandas as pd
file = 'data.json'
with open(file) as j:
    json_data = json.load(j)
df = pd.DataFrame.from_dict(json_data, orient='index')
df.to_csv("data.csv")
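The pandas route above fits JSON that is a dict of dicts: with `orient="index"`, each top-level key becomes a row and each inner key a column. A small sketch with made-up data, where `to_csv()` is called without a path so it returns the CSV text instead of writing `data.csv`:

```python
import json

import pandas as pd

# Hypothetical JSON: one entry per user, keyed by user name.
json_data = json.loads('{"alice": {"mon": "9-12", "tue": "13-17"}, "bob": {"mon": "10-14"}}')

# Top-level keys -> row index, inner keys -> columns; missing values become NaN.
df = pd.DataFrame.from_dict(json_data, orient="index")

# Without a path, to_csv returns the CSV as a string; NaN is written as an empty field.
csv_text = df.to_csv()
print(csv_text)
```

If the JSON is instead a list of records, `pd.json_normalize(json_data)` is usually the better starting point.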
Python - Setting JSON data from itunes in Data Frame with Pandas
I am new to Python and I am facing what I believe to be a fairly simple problem with JSON and pandas. Unfortunately, it seems that my brain has stopped working, so I would appreciate your help. I want to analyse the iTunes reviews of the game Super Mario, and I want to do it with JSON. I want to retrieve the comments and all the information that comes with them, and I want to have it as a DataFrame so I can start my analysis.
Link: https://itunes.apple.com/gb/rss/customerreviews/id=1145275343/page=1/json
My code:
import json
import requests
import pandas as pd
requestpost = requests.get('https://itunes.apple.com/gb/rss/customerreviews/id=1145275343/page=1/json')
r = json.loads(requestpost.text)
r
dict_keys = r['feed'].keys()
df = pd.DataFrame(r['feed'], columns=[list(dict_keys)])
df
Output:
author entry updated rights title icon link id
I just get the columns and no data inside each column. I am following the book Python for Data Analysis, I have read the documentation, and I have gone through countless examples. I do not understand what the problem is. Any help would be very much appreciated.
Best regards
Solution
import json
import requests
import pandas as pd
response = requests.get('https://itunes.apple.com/gb/rss/customerreviews/id=1145275343/page=1/json')
json_data = json.loads(response.text)
data = json_data['feed']['entry']
pd.json_normalize(data=data)
You get your desired output the following way:
import json
import requests
import pandas as pd
response = requests.get('https://itunes.apple.com/gb/rss/customerreviews/id=1388411277/page=1/json')
json_data = json.loads(response.text)
data = json_data['feed']
pd.json_normalize(data=data) # pd.json_normalize replaces the deprecated pandas.io.json import
json_normalize returns a DataFrame.
From your data, it looks like the top-level key contains several nested keys. For example:
r['feed']["author"]
output:
{'name': {'label': 'iTunes Store'}, 'uri': {'label': 'http://www.apple.com/uk/itunes/'}}
So you will need to filter down a bit more to get your desired output.
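To see why normalizing `feed['entry']` gives one row per review while normalizing `feed` itself does not, here is a sketch on a tiny made-up dictionary that only mimics the nesting described above (values wrapped in `{'label': ...}`); the field names are assumptions, not the real feed schema.

```python
import pandas as pd

# Made-up miniature of the nesting described above (not the real iTunes feed).
r = {
    "feed": {
        "author": {"name": {"label": "iTunes Store"}},
        "entry": [
            {"author": {"name": {"label": "user1"}}, "title": {"label": "Fun"}},
            {"author": {"name": {"label": "user2"}}, "title": {"label": "Meh"}},
        ],
    }
}

# Filtering down one key at a time reaches a single value.
store_name = r["feed"]["author"]["name"]["label"]

# Normalizing the list of entries: one row per review, nested dicts flattened
# into dotted column names such as "author.name.label".
reviews = pd.json_normalize(r["feed"]["entry"])

# Normalizing the whole feed: a single row, with "entry" left as a list in one cell.
whole_feed = pd.json_normalize(r["feed"])
print(reviews)
```

This is why the solution above passes `json_data['feed']['entry']` rather than `json_data['feed']`.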
What is the correct way to convert json data (which is undefined/messy) into a DataFrame?
I am trying to understand how JSON data that is not parsed/extracted correctly can be converted into a (pandas) DataFrame. I am using Python (3.7.1) and have tried the usual way of reading the JSON data. Actually, the code works if I use transpose or the axis=1 syntax. But that completely ignores a large number of values or variables in the data, and I am 100% sure that the code may be working but is not giving the desired results.
import pandas as pd
import numpy as np
import csv
import json
sourcefile = open(r"C:\Users\jadil\Downloads\chicago-red-light-and-speed-camera-data\socrata_metadata_red-light-camera-violations.json")
json_data = json.load(sourcefile)
#print(json_data)
type(json_data) # -> dict
## this code works but is not loading/reading the complete data
df = pd.DataFrame.from_dict(json_data, orient="index")
df.head(15)
This is what I am getting for the first 15 rows with df.head(15):
0
createdAt 1407456580
description This dataset reflects the daily volume of viol...
rights [read]
flags [default, restorable, restorePossibleForType]
id spqx-js37
oid 24980316
owner {'type': 'interactive', 'profileImageUrlLarge'...
newBackend False
totalTimesRated 0
attributionLink http://www.cityofchicago.org
hideFromCatalog False
columns [{'description': 'Intersection of the location...
displayType table
indexUpdatedAt 1553164745
rowsUpdatedBy n9j5-zh
As you have seen, pandas will attempt to create a DataFrame out of JSON data even if it is not parsed or extracted correctly. If your goal is to understand exactly what pandas does when presented with a messy JSON file, you can look inside the code of pd.DataFrame.from_dict() to learn more. If your goal is to get the JSON data to convert correctly into a pandas DataFrame, you will need to provide more information about the JSON data, ideally by including a sample of the data as text in your question. If your data is sufficiently complicated, you might try the pandas json_normalize() function.
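With `from_dict(orient="index")`, each top-level key becomes one row, so nested values such as `columns` stay packed into single cells, as the output in the question shows. A sketch of the `json_normalize()` route on a hypothetical, heavily trimmed metadata dict (the keys inside `columns` are assumptions loosely modeled on the truncated output above): keep the top-level scalars as a Series, and normalize the nested list you actually care about, carrying the dataset `id` along with `meta`.

```python
import pandas as pd

# Hypothetical, heavily trimmed metadata dict (field names inside "columns" are assumed).
json_data = {
    "id": "spqx-js37",
    "displayType": "table",
    "columns": [
        {"name": "INTERSECTION", "dataTypeName": "text",
         "description": "Intersection of the location"},
        {"name": "VIOLATIONS", "dataTypeName": "number",
         "description": "Number of violations"},
    ],
}

# Top-level scalar fields fit naturally into a Series (key -> value).
top = pd.Series({k: v for k, v in json_data.items() if not isinstance(v, (list, dict))})

# One row per entry of "columns", with the dataset id repeated on every row.
cols = pd.json_normalize(json_data, record_path="columns", meta=["id"])
print(cols)
```

Each other nested list in the real file (for example `owner` or `flags`) would get its own frame the same way, rather than trying to force everything into one table.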