why i cant load json file to pandas data frame - python

My code:
import json
import requests
responseGBP=requests.get("https://public.opendatasoft.com/api/records/1.0/search/?dataset=euro-exchange-rates&sort=date&facet=currency&rows=30&facet=date&q=date:[2020-12-01+TO+2020-12-31]&refine.currency=GBP")
response_jGBP=responseGBP.content.decode("utf-8")
df = pd.read_json(response_jGBP)
df
I'm getting this error:
All arrays must be of the same length
I want to get the currency data , but i cant covert the json file to pandas dataframe.
I'm getting "All arrays must be of the same length" Error

Because you didn't analyze dictionary structure properly.
You have to understand that for all columns in pandas rows will be of same count, and you can also load specific part of response if you want.
import pandas as pd
res = requests.get("https://public.opendatasoft.com/api/records/1.0/search/?dataset=euro-exchange-rates&sort=date&facet=currency&rows=30&facet=date&q=date:[2020-12-01+TO+2020-12-31]&refine.currency=GBP")
df = pd.DataFrame(res.json()["records"])
Above code will load the data but not sure if that is you wanted, but you got the idea you have to look at first JSON structure. If needed, you have to modify it as well to load as data frame.

Related

How do I get my response to show ndjson data instead of text?

My API call returns an ndjson format that wont allow me to use "data = response.json".
My only work around is doing this. This give me a string format that is difficult to parse. After put into a pandas df, it is about 20 columns of dictionaries of dictionaries. Is there a better way of getting the ndjson data and/or parsing these columns?
import pandas as pd
from io import StringIO
import ndjson as nd
data = response.text()
df = pd.read_json(StringIO(data), lines = True)

How to convert nested JSON data from a URL into a Pandas dataframe

import json
import pandas
import requests
Convert to Pandas
I know what you're going to say, this has been asked before. But ive gone through a number of posts already and they all require importing the json file into the code already.
So with this code I've been trying to import the json data through a URL, so there is no need to save any files before hand.
Is it even possible?
Please help.
Pandas json_normalize can do just that. Here is an example for which you will have to modify to meet your specific needs:
df = pd.json_normalize(packages_json, record_path='results')
(I omitted the output from the DF because is it unwieldy)

Read multi-level json in python with pandas from url

I try to read multi-level JSON with pandas and store data in the data-frame for next work with it or for print. The main goal for me is to understand how to read data from each level of JSON.
Here you are my first steps, which works:
import pandas as pd
import requests
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
d = req.json()
#what is next step?
#something like this? df = pd.DataFrame.from_dict(d.Data)
Could you tell me, how to read:
1st level (columns PageIndex, PageSize, TotalCount, Data)
2 level (from Data columns Code, Timestamp, Category, snapshots)
3 level (from Data and snapshots columns Code, DateFrom, DateTo, Type ...)
some good tip for next work with data?
maybe you tell me, that using pandas is not the best way how to read JSON
Here is json:
my json file to download from OneDrive
{"PageIndex":0,"PageSize":2,"TotalCount":100248,"Data":[{"Code":"859182400102974","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400102974","DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-502,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"},{"Code":"859182400102974","DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-382,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation":-382}],"ConsumptionEstimation2":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation2":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation2":-382}]}},{"Code":"859182400104897","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400104897","DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-280,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"},{"Code":"859182400104897","DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-282,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation":-282}],"ConsumptionEstimation2":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation2":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation2":-282}]}}]}
Thank you
I think using pandas to process JSON is not a good choice, because pandas is trying to deal with structural data, but in your example you are dealing with multi-level unstructured data.
But if you insist to do that, you can extract structural data from your JSON structure. For example, you can extract the array in JSON_ROOT."Data"."snapshots" into an ArrayList and save it into pd.DataFrame. Otherwise, you can only save the JSON structure as a string in one column in pd.DataFrame.
From answers above I am not more clever as before.
So I try to reduce my question to one question.
How Can I get table with 4 columns:
Data.Code; Data.snapshots.DateFrom; Data.snapshots.Address.Street; Data.snapshots.Address.City
This is my code, but it is necessary to correct it, but I do not how. The Code works but it returns 30 columns and not exactly what I want.
import pandas as pd
import requests
import pandas.io.json as pd_json
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"
req = requests.get(url, auth = log)
req.raise_for_status()
fin = req.json()
df = pd_json.json_normalize(fin,
record_path=['Data','snapshots'],
record_prefix = 'Data.',
errors = 'ignore'
)
print(df)
Thank you for help.

What is the correct way to convert json data (which is undefined/messy) into a DataFrame?

I am trying to understand how JSON data which is not parsed/extracted correctly can be converted into a (Pandas) DataFrame.
I am using python (3.7.1) and have tried the usual way of reading the JSON data. Actually, the code works if I use transpose or axis=1 syntax. But using that completely ignores a large number of values or variables in the data and I am 100% sure that the maybe the code is working but is not giving the desired results.
import pandas as pd
import numpy as np
import csv
import json
sourcefile = open(r"C:\Users\jadil\Downloads\chicago-red-light-and-speed-camera-data\socrata_metadata_red-light-camera-violations.json")
json_data = json.load(sourcefile)
#print(json_data)
type(json_data)
dict
## this code works but is not loading/reading complete data
df = pd.DataFrame.from_dict(json_data, orient="index")
df.head(15)
#This is what I am getting for the first 15 rows
df.head(15)
0
createdAt 1407456580
description This dataset reflects the daily volume of viol...
rights [read]
flags [default, restorable, restorePossibleForType]
id spqx-js37
oid 24980316
owner {'type': 'interactive', 'profileImageUrlLarge'...
newBackend False
totalTimesRated 0
attributionLink http://www.cityofchicago.org
hideFromCatalog False
columns [{'description': 'Intersection of the location...
displayType table
indexUpdatedAt 1553164745
rowsUpdatedBy n9j5-zh
As you have seen, Pandas will attempt to create a data frame out of JSON data even if it is not parsed or extracted correctly. If your goal is to understand exactly what Pandas does when presented with a messy JSON file, you can look inside the code for pd.DataFrame.from_dict() to learn more. If your goal is to get the JSON data to convert correctly to a Pandas data frame, you will need to provide more information abut the JSON data, ideally by providing a sample of the data as text in your question. If your data is sufficiently complicated, you might try the json_normalize() function as described here.

read Json file using Pandas

I am trying to read a json file using pandas's read_json function and i am getting result but not what i want
My result have first row as a header (Titles) and i want to ignore first row in my result.
Below is my python code.
import json
import pandas as pd
result=pd.read_json('dummy_DB_clean.json')
print result
I tried pandas's json_normalize() function but did not get desired output.
If anyone of you , come across with this problem, please suggest me the solution.
Thanks,
Try this:
import json
import pandas as pd
df=pd.read_json('dummy_DB_clean.json')
df.drop(df.head(1).index, inplace=True)
print df

Categories