Import JSON key value into pandas dataframe error - python

I'm trying to fetch JSON from an API and load it into a pandas dataframe. The JSON I get back has the following format:
{"@odata.context":"https://was-p.bcnet.bcb.gov.br/olinda/servico/PTAX/versao/v1/odata$metadata#_CotacaoDolarPeriodo(cotacaoCompra,cotacaoVenda,dataHoraCotacao)","value":[{"cotacaoCompra":1.80030,"cotacaoVenda":1.80110,"dataHoraCotacao":"2000-01-03 19:43:00.0"},{"cotacaoCompra":1.83290,"cotacaoVenda":1.83370,"dataHoraCotacao":"2000-01-04 19:13:00.0"},{"cotacaoCompra":1.85360,"cotacaoVenda":1.85440,"dataHoraCotacao":"2000-01-05 19:11:00.0"},...{"cotacaoCompra":1.83840,"cotacaoVenda":1.83920,"dataHoraCotacao":"2000-05-25 19:14:00.0"}]}
However, I only want the contents of its 'value' key, in a pandas dataframe laid out like this:
cotacaoCompra  cotacaoVenda  dataHoraCotacao
1.80030        1.80110       2000-01-03 19:43:00.0
1.83290        1.83370       2000-01-04 19:13:00.0
...            ...           ...
1.83840        1.83920       2000-05-25 19:14:00.0
But when I run the code, the following error appears: ValueError: Invalid file path or buffer object type: <class 'dict'>
Here is the code I am using:
url = 'https://olinda.bcb.gov.br/olinda/servico/PTAX/versao/v1/odata/CotacaoDolarPeriodo(dataInicial=@dataInicial,dataFinalCotacao=@dataFinalCotacao)?@dataInicial=%2701-01-2000%27&@dataFinalCotacao=%2711-01-2020%27&$top=100&$format=json&$select=cotacaoCompra,cotacaoVenda,dataHoraCotacao'
resp = requests.get(url=url)
data = resp.json()
json.dumps(data['value'])[1:-1]
json.loads(json.dumps(data['value']))
df = pd.read_json(data)
df.head()
Can you help me load the JSON into a dataframe in the format above?
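The error message suggests that pd.read_json was handed the already-parsed dict rather than a path or text buffer. A minimal sketch of one way around it, assuming the response has the shape shown in the question: pass the list stored under 'value' straight to the DataFrame constructor. The payload below is an abbreviated copy of the sample response (with a shortened context string) so the example runs without a network call; converting dataHoraCotacao to datetime is an optional extra step, not something the question asked for.

```python
import pandas as pd

# Abbreviated stand-in for resp.json(); only two of the records are kept.
data = {
    "@odata.context": "shortened-for-illustration",
    "value": [
        {"cotacaoCompra": 1.80030, "cotacaoVenda": 1.80110,
         "dataHoraCotacao": "2000-01-03 19:43:00.0"},
        {"cotacaoCompra": 1.83290, "cotacaoVenda": 1.83370,
         "dataHoraCotacao": "2000-01-04 19:13:00.0"},
    ],
}

# data is already a Python dict, so build the frame from the list of records.
df = pd.DataFrame(data["value"])

# Optional: parse the timestamp strings into real datetimes.
df["dataHoraCotacao"] = pd.to_datetime(df["dataHoraCotacao"])

print(df.head())
```

With a live response, the same two lines should apply after data = resp.json().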

Related

how to fix error (ValueError: Expected object or value) when using read_json

import pandas as pd
df = pd.read_json('publicextract.charity.json')
csvData = df.to_csv('new.csv')
I am trying to open a JSON file and save it to a CSV. I am getting this error from read_json:
Error - loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
For example, this is what the JSON data looks like. It is from the charity file at https://register-of-charities.charitycommission.gov.uk/register/full-register-download
[{"date_of_extract":"2022-10-15T00:00:00","organisation_number":1,"registered_charity_number":200027,"linked_charity_number":1,"charity_name":"POTTERNE MISSION ROOM AND TRUST","charity_type":null,"charity_registration_status":"Removed","date_of_registration":"1962-05-17T00:00:00","date_of_removal":"2014-04-16T00:00:00","charity_reporting_status":null,"latest_acc_fin_period_start_date":null,"latest_acc_fin_period_end_date":null,"latest_income":null,"latest_expenditure":null,"charity_contact_address1":null,"charity_contact_address2":null,"charity_contact_address3":null,"charity_contact_address4":null,"charity_contact_address5":null,"charity_contact_postcode":null,"charity_contact_phone":null,"charity_contact_email":null,"charity_contact_web":null,"charity_company_registration_number":null,"charity_insolvent":false,"charity_in_administration":false,"charity_previously_excepted":null,"charity_is_cdf_or_cif":null,"charity_is_cio":null,"cio_is_dissolved":null,"date_cio_dissolution_notice":null,"charity_activities":null,"charity_gift_aid":null,"charity_has_land":null}
,{"date_of_extract":"2022-10-15T00:00:00","organisation_number":2,"registered_charity_number":200027,"linked_charity_number":2,"charity_name":"HITCHAM FREE CHURCH","charity_type":null,"charity_registration_status":"Registered","date_of_registration":"1962-05-17T00:00:00","date_of_removal":null,"charity_reporting_status":null,"latest_acc_fin_period_start_date":null,"latest_acc_fin_period_end_date":null,"latest_income":null,"latest_expenditure":null,"charity_contact_address1":null,"charity_contact_address2":null,"charity_contact_address3":null....}]
You should create the dataframe directly, like this:
import json
import pandas as pd
with open("file.json") as f:
    df = pd.DataFrame(json.load(f))
df
#output looks like this
date_of_extract organisation_number registered_charity_number
0 2022-10-15T00:00:00 1 200027
1 2022-10-15T00:00:00 2 200027

converting timestamp with letters into datetime

I have a txt file with data and values like this one:
PP C timestamp HR RMSSD SCL
PP1 1 20120918T131600000 NaN NaN 80.239727
PP1 1 20120918T131700000 61 0.061420 77.365127
and I am importing it like that:
df = pd.read_csv('data.txt', sep='\t', header=0)
which gives me a nice-looking dataframe.
Running
df.columns
shows this result Index(['PP', 'C', 'timestamp', 'HR', 'RMSSD', 'SCL'], dtype='object').
Now when I am trying to convert the timestamp column into a datetime column:
df["datetime"] = pd.to_datetime(df["timestamp"], format='%Y%m%dT%H%M%S%f')
I get this:
ValueError: time data 'timestamp' does not match format '%Y%m%dT%H%M%S%f' (match)
Any ideas would be appreciated.
First, the error message you're quoting is from the header row. It's trying to parse the literal string 'timestamp' as a timestamp, which is failing. If you're getting an error on an actual data row, show us that message.
All three of your posted data rows parse fine with your format in my testing:
>>> [pandas.to_datetime(s, format='%Y%m%dT%H%M%S%f')
...  for s in ['20120918T131600000', '20120918T131700000',
...            '20120918T131800000']]
[Timestamp('2012-09-18 13:16:00'), Timestamp('2012-09-18 13:17:00'), Timestamp('2012-09-18 13:18:00')]
I have no idea where you got format='%Y%m%dT%H%M%S%f'[:-3], which just removes the S%f from the format string, leaving it invalid. If you want to remove the last three digits of the data so that you can just use %H%M%S instead of %H%M%S%f, you need to put the [:-3] on the timestamp data value, not on the format.
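As a rough sketch of that last point, the slice can be applied to the whole column with the .str accessor before parsing. The two data rows from the question are inlined through io.StringIO so the example is self-contained, and it assumes the file really is tab-separated.

```python
import io
import pandas as pd

# The sample rows from the question, assumed to be tab-separated.
raw = (
    "PP\tC\ttimestamp\tHR\tRMSSD\tSCL\n"
    "PP1\t1\t20120918T131600000\tNaN\tNaN\t80.239727\n"
    "PP1\t1\t20120918T131700000\t61\t0.061420\t77.365127\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t", header=0)

# Drop the last three digits from the data, not from the format string,
# then parse what remains with a seconds-resolution format.
trimmed = df["timestamp"].str[:-3]
df["datetime"] = pd.to_datetime(trimmed, format="%Y%m%dT%H%M%S")

print(df[["timestamp", "datetime"]])
```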

Converting and inserting timestamp in pandas

I'm having an issue converting time. Column [0] is a timestamp, and I want to insert a new column at position [1], called timestamp2 for now. I'm then trying to use a for loop to convert column [0] to a readable time and store it in column [1]. Currently the new column gets inserted, but I get this error:
raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'int'>
I added .astype(int) to the timestamp variable but that didn't help.
Code:
import requests
import json
import pandas as pd
from datetime import datetime
url = 'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/cbse/spot/btc-usd/aggregations/count_ohlcv_vwap?interval=1h&page_size=1000'
KEY = 'xxx'
headers = {
    "X-Api-Key": KEY,
    "Accept": "application/json",
    "Accept-Encoding": "gzip"
}
res = requests.get(url, headers=headers)
j_data = res.json()
parse_data = j_data['data']
# create dataframe
df = pd.DataFrame.from_dict(pd.json_normalize(parse_data), orient='columns')
df.insert(1, 'timestamp2', ' ')
for index, row in df.iterrows():
    timestamp = df['timestamp'].astype(int)
    dt = datetime.fromtimestamp(timestamp)
    df.at[index, "timestamp2"] = dt
print(df)
df.to_csv('test.csv', index=False, encoding='utf-8')
Parsed data:
timestamp,timestamp2,open,high,low,close,volume,price,count
1611169200000,5,35260,35260.6,35202.43,35237.93,7.1160681299999995,35231.58133242965,132
1611165600000,5,34861.78,35260,34780.26,35260,1011.0965832999998,34968.5318431902,11313
1611162000000,5,34730.11,35039.98,34544.33,34855.43,1091.5246025199979,34794.45207484006,12877
In this example I set df.at[index, "timestamp2"] to 5 instead of dt, just to make sure a value is inserted in each row. It is, so I just need to convert column [0] to a readable time for column [1].
Once converted to integers, the timestamps appear to be milliseconds since the epoch, judging by the magnitude of the values.
Here are some more details on Unix time if you are interested: https://en.wikipedia.org/wiki/Unix_time
You can convert this to datetime using pd.to_datetime with unit='ms'.
It is a vectorised operation, so you don't need to loop through the dataframe. Both pd.to_numeric and pd.to_datetime can be applied to an entire series.
It's hard to debug without all your data, but the code below should work. .astype(int) is an alternative to pd.to_numeric; the only difference is that pd.to_numeric gives you more flexibility in the treatment of errors, allowing you to coerce invalid entries to NaN (not sure if this is wanted or not).
import pandas as pd
df = pd.DataFrame({'timestamp':['1611169200000']})
# convert to a numeric type. errors='coerce' sets invalid entries to NaN; how you want to treat these depends on your case.
timestamp_num = pd.to_numeric(df['timestamp'], errors='coerce')
df['timestamp2'] = pd.to_datetime(timestamp_num, unit='ms')
print(df.to_dict())
#{'timestamp': {0: '1611169200000'}, 'timestamp2': {0: Timestamp('2021-01-20 19:00:00')}}
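To place the converted values at position 1, as the question asks, the vectorised conversion can be passed directly to df.insert. This is only a sketch: the frame below is a hand-built stand-in holding two of the timestamps and open prices from the parsed data, not the real API response.

```python
import pandas as pd

# Stand-in for the normalized API data: two rows from the question's output.
df = pd.DataFrame({
    "timestamp": [1611169200000, 1611165600000],
    "open": [35260.0, 34861.78],
})

# Convert the whole column at once (milliseconds since the epoch) and
# insert the result as the second column; no row-by-row loop is needed.
df.insert(1, "timestamp2", pd.to_datetime(df["timestamp"], unit="ms"))

print(df)
```

The resulting timestamps are in UTC with no timezone attached, whereas datetime.fromtimestamp in the original loop would have used the local timezone.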

how to parse non key value json file in python

I have a json file which has data like :
"data": [[1467398683, "GB", "204.0.20", "tracks", "content-based", "b47911d0e80d1a8a959a2b726654bbfa", "Dance & Electronic", 1466640000, 413933, 413933,
I am trying to parse this non-key-value JSON file into a dataframe in Python. Can someone suggest how this can be achieved?
You have two options:
The pandas read_json method has an orient parameter; orient='values' is meant for JSON that is just an array of rows:
df = pd.read_json(path, orient='values')
Or, if you need your data as a matrix, you can do this:
df = pd.DataFrame(json.loads('{"data": [[1467398683,..your data...}')['data'])
Please see also this thread (Parsing json values in pandas read_json)
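A small sketch of the second option, assuming the file is an object whose "data" key holds a list of rows. The first row reuses the leading values from the question; the second row is made up purely so the frame has more than one line, and no column names are assigned because the question does not give any.

```python
import json
import pandas as pd

# First row: leading values from the question. Second row: hypothetical.
raw = (
    '{"data": ['
    '[1467398683, "GB", "204.0.20", "tracks"], '
    '[1467398684, "US", "204.0.21", "tracks"]'
    ']}'
)

# json.loads parses a string; json.load would be used for an open file.
rows = json.loads(raw)["data"]

# Each inner list becomes one row; columns get default integer labels.
df = pd.DataFrame(rows)

print(df)
```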

pandas read_json: "If using all scalar values, you must pass an index"

I have some difficulty in importing a JSON file with pandas.
import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')
This is the error that I get:
ValueError: If using all scalar values, you must pass an index
The file structure is simplified like this:
{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}
It is from the machine learning course of University of Washington on Coursera. You can find the file here.
Try
ser = pd.read_json('people_wiki_map_index_to_word.json', typ='series')
That file only contains key value pairs where values are scalars. You can convert it to a dataframe with ser.to_frame('count').
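A self-contained sketch of this approach, using two of the key/value pairs from the question fed through io.StringIO in place of the real file:

```python
import io
import pandas as pd

# Two entries from the sample file, inlined so no file is needed.
raw = '{"biennials": 522004, "lb915": 116290}'

# typ='series' reads the flat object as a Series indexed by its keys.
ser = pd.read_json(io.StringIO(raw), typ="series")

# Turn the Series into a one-column dataframe named 'count'.
df = ser.to_frame("count")

print(df)
```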
You can also do something like this:
import json
with open('people_wiki_map_index_to_word.json', 'r') as f:
    data = json.load(f)
Now data is a dictionary. You can pass it to a dataframe constructor like this:
df = pd.DataFrame({'count': data})
You can do as @ayhan mentions, which will give you a column-based format.
Or you can enclose the object in [ ] as shown below to get a row format, which is convenient if you are loading multiple values and planning to use a matrix for your machine learning models.
df = pd.DataFrame([data])
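To make the difference between the two layouts concrete, here is a short sketch that builds both from the same dict. The dict holds two entries from the question's sample; the variable names by_column and by_row are only illustrative.

```python
import pandas as pd

# A small slice of the word-to-index mapping from the question.
data = {"biennials": 522004, "lb915": 116290}

# Column-based: keys become the index, values go into one 'count' column.
by_column = pd.DataFrame({"count": data})

# Row-based: wrapping the dict in a list gives one row, one column per key.
by_row = pd.DataFrame([data])

print(by_column.shape)
print(by_row.shape)
```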
I think what is happening is that the data in
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')
is being read as a string instead of a json
{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}
is actually
'{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}'
A string is a scalar, so to load it as JSON you first have to convert it to a dict, which is exactly what the other response is doing.
The best way is to call json.loads on the string to convert it to a dict and then load that into pandas:
with open('people_wiki_map_index_to_word.json') as f:
    myfile = f.read()
jsonData = json.loads(myfile)
# wrap the dict in a list: a bare dict of scalars raises the same error
df = pd.DataFrame([jsonData])
For example, take a values.json file like this:
{
    "biennials": 522004,
    "lb915": 116290
}
df = pd.read_json('values.json')
By default pd.read_json builds a dataframe, so it expects a list for each key:
{
    "biennials": [522004],
    "lb915": [116290]
}
When every value is a scalar instead, it raises an error saying
If using all scalar values, you must pass an index.
You can resolve this by specifying the typ argument of pd.read_json:
map_index_to_word = pd.read_json('Datasets/people_wiki_map_index_to_word.json', typ='series')
For newer pandas, 0.19.0 and later, use the lines parameter, set it to True.
The file is read as a json object per line.
import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json', lines=True)
It fixed the following errors, which I encountered especially when some of the JSON files have only one value:
ValueError: If using all scalar values, you must pass an index
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
ValueError: Trailing data
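A brief sketch of what lines=True does with a flat object, assuming the whole object sits on a single line as in the sample file. Each line is treated as one record, so the result is a one-row dataframe with one column per key; a pretty-printed file spread over several lines would probably not parse this way.

```python
import io
import pandas as pd

# One JSON object on one line, standing in for the file's contents.
raw = '{"biennials": 522004, "lb915": 116290}\n'

# lines=True reads one JSON object per line, i.e. one row per line.
df = pd.read_json(io.StringIO(raw), lines=True)

print(df)
```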
For example
cat values.json
{
    "name": "Snow",
    "age": "31"
}
df = pd.read_json('values.json')
Chances are you will end up with this
Error: if using all scalar values, you must pass an index
Pandas looks for a list or dictionary in each value, something like
cat values.json
{
    "name": ["Snow"],
    "age": ["31"]
}
So try the following. You can then convert the result to HTML with to_html().
df = pd.DataFrame([pd.read_json(report_file, typ='series')])
result = df.to_html()
I solved this by converting it into an array like so
[{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}]
