This question already has answers here:
NumPy array is not JSON serializable
(15 answers)
Closed 4 years ago.
I have a dataframe genre_rail in which one column contains numpy.ndarray values.
An array from that column looks like this:
['SINGTEL_movie_22906' 'SINGTEL_movie_22943' 'SINGTEL_movie_24404'
'SINGTEL_movie_22924' 'SINGTEL_movie_22937' 'SINGTEL_movie_22900'
'SINGTEL_movie_24416' 'SINGTEL_movie_24422']
I tried the following code:
import json
json_content = json.dumps({'mydata': [genre_rail.iloc[i]['content_id'] for i in range(len(genre_rail))] })
But I got this error:
TypeError: array is not JSON serializable
I need output as
{"Rail2_contend_id":
["SINGTEL_movie_22894","SINGTEL_movie_22898",
"SINGTEL_movie_22896","SINGTEL_movie_24609","SINGTEL_movie_2455",
"SINGTEL_movie_24550","SINGTEL_movie_24548","SINGTEL_movie_24546"]}
How about converting the array to a plain list with the .tolist() method?
Then you can write it to JSON like this:
import json
import codecs

np_array_to_list = np_array.tolist()
json_file = "file.json"
json.dump(np_array_to_list, codecs.open(json_file, 'w', encoding='utf-8'),
          sort_keys=True, indent=4)
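Applied to the question's dataframe, that might look like the sketch below (genre_rail here is a minimal hypothetical stand-in with a single row and a two-element array):

```python
import json

import numpy as np
import pandas as pd

# Minimal stand-in for the asker's dataframe (hypothetical data)
genre_rail = pd.DataFrame({'content_id': [np.array(['SINGTEL_movie_22906',
                                                    'SINGTEL_movie_22943'])]})

# .tolist() turns each ndarray into a plain list that json can serialize
json_content = json.dumps(
    {'mydata': [genre_rail.iloc[i]['content_id'].tolist()
                for i in range(len(genre_rail))]})
print(json_content)
```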
Load all the data into a dictionary, then dump it to JSON. The code below might help you:
import json

# Data
d = ['SINGTEL_movie_22906', 'SINGTEL_movie_22943', 'SINGTEL_movie_24404',
     'SINGTEL_movie_22924', 'SINGTEL_movie_22937', 'SINGTEL_movie_22900',
     'SINGTEL_movie_24416', 'SINGTEL_movie_24422']

# Create dict
dic = {}
dic['Rail2_contend_id'] = d
print(dic)

# Dump data dict to json
j = json.dumps(dic)
Output
{'Rail2_contend_id': ['SINGTEL_movie_22906', 'SINGTEL_movie_22943', 'SINGTEL_movie_24404', 'SINGTEL_movie_22924', 'SINGTEL_movie_22937', 'SINGTEL_movie_22900', 'SINGTEL_movie_24416', 'SINGTEL_movie_24422']}
I'm writing a very small Pandas dataframe to a JSON file. In fact, the Dataframe has only one row with two columns.
To build the dataframe:
import pandas as pd
df = pd.DataFrame.from_dict(dict({'date': '2020-10-05', 'ppm': 411.1}), orient='index').T
print(df)
prints
date ppm
0 2020-10-05 411.1
The desired json output is as follows:
{
"date": "2020-10-05",
"ppm": 411.1
}
but when writing the json with pandas, I can only print it as an array with one element, like so:
[
{
"date":"2020-10-05",
"ppm":411.1
}
]
I've currently hacked my code to convert the Dataframe to a dict, and then use the json module to write the file.
import json
data = df.to_dict(orient='records')
data = data[0] # keep the only element
with open('data.json', 'w') as fp:
json.dump(data, fp, indent=2)
Is there a native way with pandas' .to_json() to keep the only dictionary item if there is only one?
I am currently using .to_json() like this, which incorrectly prints the array with one dictionary item.
df.to_json('data.json', orient='index', indent = 2)
Python 3.8.6
Pandas 1.1.3
If you want to export only one row, use iloc:
print(df.iloc[0].to_dict())
#{'date': '2020-10-05', 'ppm': 411.1}
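Relatedly, Series.to_json() on that single row already emits one JSON object rather than a one-element array, so a sketch like this (using the question's dataframe) may avoid the to_dict round-trip entirely:

```python
import json

import pandas as pd

df = pd.DataFrame.from_dict({'date': '2020-10-05', 'ppm': 411.1},
                            orient='index').T

# A Series serializes as a single object, not an array of one dict
j = df.iloc[0].to_json()
print(j)
```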
I am trying to import a (very) large JSON file (3.3M rows, 1k columns) that has multiple nested JSONs within it, some of them double nested. I have found two ways to import the JSON file into a dataframe; however, I can't get the imported JSON to be flattened and converted to strings at the same time.
The codes I am using are:
# 1: Import directly and convert to string
import pandas as pd

def Data_IMP(path):
    with open(path) as Data:
        Data_IMP = pd.read_json(Data, dtype=str)
        Data_IMP = Data_IMP.replace("nan", "", regex=True)
    return Data_IMP
The issue with the above is that it doesn't flatten the json file fully.
# 2: Import json and normalise
import json
from pandas.io.json import json_normalize

def Data_IMP(path):
    with open(path) as Data:
        d = json.load(Data)
        Data_IMP = json_normalize(d)
    return Data_IMP
The above script flattens out the json, but lets Python decide on the dtype for each column.
Is there a way to combine these approaches, so that the JSON file is flattened and all columns are read as strings?
I found a solution that worked, and was able to both import and flatten the jsons, as well as convert all text to strings.
# Function to import data from ARIC json file to dataframe
import json
import numpy as np
from pandas.io.json import json_normalize

def Data_IMP(path):
    with open(path) as Data:
        d = json.load(Data)
        Data_IMP = json_normalize(d)
    return Data_IMP

# --------------------------------------------------------------------------------------------------------- #

# Function to cleanse Data file
def Data_Cleanse(Data_IMP):
    Data_Cleanse = Data_IMP.replace(np.nan, '', regex=True)
    Data_Cleanse = Data_Cleanse.astype(str)
    return Data_Cleanse
How do I convert multiple JSON objects into one JSON object and make a dataframe with keys as columns and values as rows in Python? Help would be greatly appreciated.
My JSON doesn't have commas after each object:
data = {"name":"john",
"class":"fifth"}
{"name":"emma",
"class":"sixth"}
# my full method
from flask import Flask, jsonify, request
import cx_Oracle
import json

app = Flask(__name__)
app.debug = True

conn = cx_Oracle.connect(user='', password='', dsn=dsn_tns)
c = conn.cursor()
c.execute('''SELECT APPLE, BANANA, CARROT FROM VEGETABLES''')
for row in c:
    data = json.dumps(row, indent=4, sort_keys=True, default=str)
    print(data)
data = {"name":"john",
"class":"fifth"}
{"name":"emma",
"class":"sixth"}
I think that's invalid JSON, and I'm confident it's invalid Python. It looks like NDJSON (newline-delimited JSON)?
If your data is in a file, and if every object is guaranteed to have exactly two keys name and class:
import json
import pandas as pd

data = {'name': [], 'class': []}
with open("file.ndjson", "r") as f:
    for line in f:
        d = json.loads(line)
        data['name'].append(d['name'])
        data['class'].append(d['class'])
df = pd.DataFrame(data)
edit: You said your data are a response. This usually works for me:
r = # Response
pd.DataFrame(r.json())
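And if the file really is newline-delimited JSON with one complete object per line, pandas can read it directly with lines=True; a sketch using an in-memory stand-in for the file:

```python
import io

import pandas as pd

# Two newline-delimited JSON records (hypothetical stand-in for a file on disk)
ndjson = '{"name": "john", "class": "fifth"}\n{"name": "emma", "class": "sixth"}\n'

# lines=True tells pandas to parse one JSON object per line
df = pd.read_json(io.StringIO(ndjson), lines=True)
print(df)
```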
This question already has answers here:
Pandas read nested json
(3 answers)
Closed 4 years ago.
I have a text file that contains a series of data in the form of dictionaries.
I would like to read it and store it as a dataframe in pandas.
How would I read it?
I tried pd.read_csv, yet it does not give me the dataframe.
Can anyone help me with that?
You can download the text file Here
Thanks,
Zep,
The problem is you have a nested json. Try using json_normalize instead:
import requests #<-- requests library helps us handle http-requests
import pandas as pd
id_ = '1DbfQxBJKHvWO2YlKZCmeIN4al3xG8Wq5'
url = 'https://drive.google.com/uc?authuser=0&id={}&export=download'.format(id_)
r = requests.get(url)
df = pd.io.json.json_normalize(r.json())
print(df.columns)
Or from the hard drive; note that json_normalize wants to read a dictionary object, not a path:
import pandas as pd
import json
with open('myfile.json') as f:
jsonstr = json.load(f)
df = pd.io.json.json_normalize(jsonstr)
Returns:
Index(['average.accelerations', 'average.aerialDuels', 'average.assists',
'average.attackingActions', 'average.backPasses', 'average.ballLosses',
'average.ballRecoveries', 'average.corners', 'average.crosses',
'average.dangerousOpponentHalfRecoveries',
...
'total.successfulLongPasses', 'total.successfulPasses',
'total.successfulPassesToFinalThird', 'total.successfulPenalties',
'total.successfulSmartPasses', 'total.successfulThroughPasses',
'total.successfulVerticalPasses', 'total.throughPasses',
'total.verticalPasses', 'total.yellowCards'],
dtype='object', length=171)
Another idea would be to store the nested objects in a Series (and you can let a dictionary hold that those series).
dfs = {k: pd.Series(v) for k,v in r.json().items()}
print(dfs.keys())
# ['average', 'seasonId', 'competitionId', 'positions', 'total', 'playerId', 'percent'])
print(dfs['percent'])
Returns:
aerialDuelsWon 23.080
defensiveDuelsWon 18.420
directFreeKicksOnTarget 0.000
duelsWon 33.470
fieldAerialDuelsWon 23.080
goalConversion 22.581
headShotsOnTarget 0.000
offensiveDuelsWon 37.250
penaltiesConversion 0.000
shotsOnTarget 41.940
...
yellowCardsPerFoul 12.500
dtype: float64
The data only has one entry though.
You can read your data in as a string, parse it as JSON, and then use json_normalize() to convert it to a dataframe.
Example:
import json
from pandas.io.json import json_normalize

f = open("file.txt", "r")
contents = f.read()
contents = contents.replace("\n", "")
json_data = json.loads(contents)
df = json_normalize(json_data)
You should have your data as a dataframe after that.
Hope this helps!
I'm using df.to_json() to convert dataframe to json. But it gives me a json string and not an object.
How can I get JSON object?
Also, when I'm appending this data to an array, it adds single quote before and after the json and it ruins the json structure.
How can I export to json object and append properly?
Code Used:
array = []
array.append(df1.to_json(orient='records', lines=True))
array.append(df2.to_json(orient='records', lines=True))
Result:
['{"test":"w","param":1}', '{"test":"w2","param":2}']
Required Result:
[{"test":"w","param":1},{"test":"w2","param":2}]
Thank you!
I believe you need to create a dict and then convert it to JSON:
import json
d = df1.to_dict(orient='records')
j = json.dumps(d)
Or if possible:
j = df1.to_json(orient='records')
Here's what worked for me:
import pandas as pd
import json
df = pd.DataFrame([{"test":"w","param":1},{"test":"w2","param":2}])
print(df)
test param
0 w 1
1 w2 2
So now we convert to a json string:
d = df.to_json(orient='records')
print(d)
'[{"test":"w","param":1},{"test":"w2","param":2}]'
And now we parse this string to a list of dicts:
data = json.loads(d)
print(data)
[{'test': 'w', 'param': 1}, {'test': 'w2', 'param': 2}]