I am trying to convert a dictionary to JSON. One of the dictionary values comes from DataFrame.to_json, and I get some strange output:
Here is the code:
import json
import pandas as pd
my_dict = {}
my_dict["ClassName"] = "First class"
# get student list
df = pd.read_csv("./test.csv")
my_dict["StudentList"] = df.to_json(orient='records')
# output
with open("./output.json", 'w') as fp:
    json.dump(my_dict, fp, indent=4)
Here is the input file ./test.csv
Name,Age
Joe,20
Emily,22
John,21
Peter,23
Here is the output file ./output.json
{
"ClassName": "First class",
"StudentList": "[{\"Name\":\"Joe\",\"Age\":20},{\"Name\":\"Emily\",\"Age\":22},{\"Name\":\"John\",\"Age\":21},{\"Name\":\"Peter\",\"Age\":23}]"
}
Here is what I need:
{
"ClassName": "First class",
"StudentList": [{"Name":"Joe","Age":20},{"Name":"Emily","Age":22},{"Name":"John","Age":21},{"Name":"Peter","Age":23}]
}
Thanks for any help.
Use df.to_dict instead of df.to_json:
my_dict["StudentList"] = df.to_dict(orient='records')
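to_json returns a string of JSON text, so json.dump treats it as an ordinary string value and escapes the quotes inside it. to_dict(orient='records') returns a list of Python dicts, which json.dump serializes as a nested JSON array.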
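A minimal stdlib-only sketch of the difference. The string records_json is a hand-written stand-in for what df.to_json(orient='records') returns, and json.loads(records_json) plays the role of df.to_dict(orient='records'), since both give a list of dicts.

```python
import json

# Stand-in for df.to_json(orient='records'): a *string* containing JSON text.
records_json = '[{"Name":"Joe","Age":20},{"Name":"Emily","Age":22}]'

# Embedding the string directly: json.dumps escapes it as a JSON string value.
double_encoded = json.dumps({"ClassName": "First class", "StudentList": records_json})

# Embedding a parsed list of dicts: json.dumps nests it as a real JSON array.
nested = json.dumps({"ClassName": "First class", "StudentList": json.loads(records_json)})

print(double_encoded)
print(nested)
```

If you prefer to keep to_json, wrapping it as json.loads(df.to_json(orient='records')) also gives a list of dicts that json.dump nests correctly.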
https://github.com/Asabeneh/30-Days-Of-Python/blob/ff24ab221faaec455b664ad5bbdc6e0de76c3caf/data/countries_data.json
How can I loop through this countries_data.json file (see the link above) to get the 'languages' of each country?
I have tried:
import json
f = open("countries_data.json")
file = f.read()
# print(file)
for item in file:
    print(item)
You have almost everything set up, but you never parse the JSON: f.read() returns a string, so your loop iterates over its characters one by one. Also, there is a double space in "f = open". You also didn't pass the "r" (read) mode to open(), though "r" is the default, so it isn't strictly needed.
Correct code:
import json
f = open("countries_data.json", "r")
file = json.loads(f.read())
for item in file:
    print(item)
Hope this helped; always double-check your code.
You can see that you import the json module at the beginning, so you might as well use it.
If you look at the documentation, you will see that json.load() reads and parses a file object directly.
The result is just a list of dictionaries, so the code can be summarized as follows:
import json
with open("test/countries_data.json") as file:
    data = json.load(file)
for item in data:
    print(item["languages"])
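A small sketch of why the original loop printed characters, and of json.loads versus json.load. The text variable is a one-entry sample in the shape of countries_data.json, and io.StringIO stands in for the real open("countries_data.json") file object.

```python
import io
import json

# One-entry sample in the same shape as countries_data.json.
text = '[{"name": "Afghanistan", "languages": ["Pashto", "Uzbek", "Turkmen"]}]'

# Iterating the raw string yields single characters, as in the question's loop.
first_chars = list(text[:3])

# json.loads parses a str; json.load parses a file-like object (anything with .read()).
from_string = json.loads(text)
from_file = json.load(io.StringIO(text))  # io.StringIO stands in for open(...)

# After parsing, each item is a dict, so "languages" can be looked up by key.
languages = [lang for country in from_file for lang in country["languages"]]
print(first_chars)
print(languages)
```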
You are missing one essential step: parsing the JSON data into Python data structures.
import json
# read file
f = open("countries.json")
# parse JSON to Python data structures
countries = json.load(f)
# now you have a list of countries
print(type(countries))
# loop through list of countries
for country in countries:
    # you can access languages with country["languages"]; JSON objects are Python dictionaries now
    print(type(country))
    for language in country["languages"]:
        print(language)
f.close()
Expected output:
<class 'list'>
<class 'dict'>
Pashto
Uzbek
Turkmen
...
You can use the built-in json module to deserialize the content of that file.
A sample of usage:
import json
data = """[
{
"name": "Afghanistan",
"capital": "Kabul",
"languages": [
"Pashto",
"Uzbek",
"Turkmen"
],
"population": 27657145,
"flag": "https://restcountries.eu/data/afg.svg",
"currency": "Afghan afghani"
},
{
"name": "Åland Islands",
"capital": "Mariehamn",
"languages": [
"Swedish"
],
"population": 28875,
"flag": "https://restcountries.eu/data/ala.svg",
"currency": "Euro"
}]"""
# deserializing
print(json.loads(data))
For more complex content, have a look at json.JSONDecoder in the documentation.
EDIT:
import json
path = "countries_data.json"  # my file
with open(path, 'r') as fd:
    # iterate over the dictionaries
    for d in json.loads(fd.read()):
        print(d['languages'])
EDIT: extra - top 10 languages
import json
import itertools as it
path = "countries_data.json"  # path to file
with open(path, 'r') as fd:
    text = fd.read()
languages_from_file = list(it.chain(*(d['languages'] for d in json.loads(text))))
# get unique "list" of languages
languages_all = set(languages_from_file)
# count the repeated languages
languages_count = {l: languages_from_file.count(l) for l in languages_all}
# order them per descending value
top_ten_languages = sorted(languages_count.items(), key=lambda k: k[1], reverse=True)[:10]
print(top_ten_languages)
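The same top-ten count can be sketched with collections.Counter, which counts in a single pass instead of calling list.count once per distinct language. The inline sample is illustrative only: the Afghanistan and Åland Islands entries follow the data above, and the Finland entry is added just so one language repeats. For the real file, replace the json.loads call with json.load(fd).

```python
import json
from collections import Counter
from itertools import chain

# Illustrative sample in the shape of countries_data.json (only the "languages" key matters).
countries = json.loads("""[
  {"name": "Afghanistan", "languages": ["Pashto", "Uzbek", "Turkmen"]},
  {"name": "Åland Islands", "languages": ["Swedish"]},
  {"name": "Finland", "languages": ["Finnish", "Swedish"]}
]""")

# Flatten every country's language list into one stream and count occurrences.
language_counts = Counter(chain.from_iterable(c["languages"] for c in countries))

# most_common(n) returns (language, count) pairs, highest count first.
top_ten_languages = language_counts.most_common(10)
print(top_ten_languages)
```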
Example: Desired output
{
"id": "",
"data": {
"package": ""
}
}
Here is the little script I have put together
import pandas as pd
df=pd.read_csv('example.csv')
df1=df[['request','text']]
dfnew=df1.rename(columns={'request':'id','text':'package'})
with open('something.json','w') as f:
    f.write(dfnew.to_json(orient='records',lines=True))
Output I receive after running the script
{"id":"","package":}
I'll start with a mock dfnew since the code above it does not affect your problem.
If Pandas does not have a built-in method to export exactly what you want, you can manually manipulate the JSON before dumping it to file:
import json
import pandas as pd
dfnew = pd.DataFrame({
    'id': [''],
    'package': ['']
})
with open('something.json', 'w') as f:
    jsonString = dfnew.to_json(orient='records', lines=True)
    jsonObject = json.loads(jsonString)
    package = jsonObject.pop('package')
    jsonObject['data'] = {
        'package': package
    }
    json.dump(jsonObject, f, indent=4)
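The code above relies on dfnew having a single row: with several rows, to_json(orient='records', lines=True) produces one JSON object per line, and a single json.loads on that text raises a JSONDecodeError ("Extra data"). A minimal sketch that reshapes every row, assuming the rows are available as a list of dicts (the kind of value dfnew.to_dict(orient='records') returns). The sample ids and package names are hypothetical.

```python
import json

# Stand-in for dfnew.to_dict(orient='records'); the values are hypothetical.
rows = [
    {"id": "req-1", "package": "pkg-a"},
    {"id": "req-2", "package": "pkg-b"},
]

# Move "package" under a nested "data" object for every row.
nested = [{"id": r["id"], "data": {"package": r["package"]}} for r in rows]

# One compact JSON object per line (newline-delimited), like lines=True.
ndjson_text = "\n".join(json.dumps(obj) for obj in nested)
print(ndjson_text)
```

To write it out, open('something.json', 'w') and f.write(ndjson_text) would replace the print.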
I have a function that I apply to a json file. It works if it looks like this:
import json
def myfunction(dictionary):
    # does things
    return new_dictionary
data = """{
"_id": {
"$oid": "5e7511c45cb29ef48b8cfcff"
},
"description": "some text",
"startDate": {
"$date": "5e7511c45cb29ef48b8cfcff"
},
"completionDate": {
"$date": "2021-01-05T14:59:58.046Z"
},
"videos":[{"$oid":"5ecf6cc19ad2a4dfea993fed"}]
}"""
info = json.loads(data)
refined = myfunction(info)
new_data = json.dumps(refined)
print(new_data)
However, I need to apply it to a whole file, and the input looks like this (there are multiple elements, and they are not separated by commas; they just follow one another):
{"_id":{"$oid":"5f06cb272cfede51800b6b53"},"company":{"$oid":"5cdac819b6d0092cd6fb69d3"},"name":"SomeName","videos":[{"$oid":"5ecf6cc19ad2a4dfea993fed"}]}
{"_id":{"$oid":"5ddb781fb4a9862c5fbd298c"},"company":{"$oid":"5d22cf72262f0301ecacd706"},"name":"SomeName2","videos":[{"$oid":"5dd3f09727658a1b9b4fb5fd"},{"$oid":"5d78b5a536e59001a4357f4c"},{"$oid":"5de0b85e129ef7026f27ad47"}]}
How could I do this? I tried opening and reading the file, using load and dump instead of loads and dumps, and it still doesn't work. Do I need to read, or iterate over every line?
You are dealing with the NDJSON (newline-delimited JSON) format.
You have to read the whole data string, split it into lines, and parse each line as a JSON object, resulting in a list of dictionaries:
import json
def parse_ndjson(data):
    # skip blank lines, which json.loads would reject
    return [json.loads(l) for l in data.splitlines() if l.strip()]
with open('C:\\Users\\test.json', 'r', encoding="utf8") as handle:
    data = handle.read()
dicts = parse_ndjson(data)
for d in dicts:
    new_d = myfunction(d)
    print("New dict", new_d)
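If the objects are not each on their own line (the question says they simply follow one another), splitting by lines is not enough. A minimal sketch using json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and returns the value together with the index just past it. The helper name iter_concatenated_json is hypothetical, and the sample strings are trimmed versions of the records in the question.

```python
import json

def iter_concatenated_json(text):
    """Yield each top-level JSON value in text, whether the values are
    separated by newlines, other whitespace, or nothing at all."""
    decoder = json.JSONDecoder()
    idx = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so advance past it first
        while idx < length and text[idx].isspace():
            idx += 1
        if idx >= length:
            return
        # raw_decode returns the parsed value and the index just past it
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

sample = (
    '{"_id":{"$oid":"5f06cb272cfede51800b6b53"},"name":"SomeName"}'
    '{"_id":{"$oid":"5ddb781fb4a9862c5fbd298c"},"name":"SomeName2"}\n'
    '{"name":"SomeName3"}\n'
)
records = list(iter_concatenated_json(sample))
print([r["name"] for r in records])
```

Each yielded dict can then be passed to myfunction, exactly as in the line-based version.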
I'm getting facedetection data from an API in this form:
{"id":1,"ageMin":0,"ageMax":100,"faceConfidence":66.72220611572266,"emotion":"ANGRY","emotionConfidence":50.0'
b'2540969848633,"eyeglasses":false,"eyeglassesConfidence":50.38102722167969,"eyesOpen":true,"eyesOpenConfidence":50.20328140258789'
b',"gender":"Male","genderConfidence":50.462989807128906,"smile":false,"smileConfidence":50.15522384643555,"sunglasses":false,"sun'
b'glassesConfidence":50.446510314941406}]'
I'd like to save this to a CSV file like this:
id ageMin ageMax faceConfidence
1 0 100 66
... and so on.
I tried to do it this way:
response = requests.get(url, headers=headers)
with open('detections.csv', 'w') as f:
    writer = csv.writer(f)
    for item in response:
        writer.writerow(str(item))
That puts every char in its own cell. I've also tried to use item.id, but that gives an error: AttributeError: 'bytes' object has no attribute 'id'.
Could someone point me to the right direction?
Maybe an overkill for a small task, but you can do the following:
Convert the JSON response to Python objects (don't forget to handle exceptions, etc.); since the payload appears to be a JSON array, you get a list of dicts:
dic = response.json()
Create a dataframe, for example using pandas:
df = pandas.DataFrame(dic)
Save to csv omitting index:
df.to_csv('detections.csv', index=False, sep="\t")
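Without pandas, the standard csv module can do the same with csv.DictWriter, which writes one row per dict and picks the columns by name. In this sketch, payload is a stand-in for response.json() containing only a few of the API's fields, and io.StringIO replaces the real output file so the result can be inspected.

```python
import csv
import io
import json

# Stand-in for response.json(): a JSON array of face objects (fields trimmed).
payload = json.loads(
    '[{"id": 1, "ageMin": 0, "ageMax": 100, '
    '"faceConfidence": 66.72220611572266, "emotion": "ANGRY"}]'
)

fieldnames = ["id", "ageMin", "ageMax", "faceConfidence"]

# In a real script: open("detections.csv", "w", newline="") instead of StringIO.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
writer.writeheader()
for face in payload:
    # keys not listed in fieldnames (e.g. "emotion") are dropped
    writer.writerow(face)

print(out.getvalue())
```

extrasaction="ignore" is what lets you keep only some columns; with the default ("raise"), any extra key raises a ValueError.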
You can do this relatively easily with the pandas and json libraries.
import pandas as pd
import json
response = """{
"id": 1,
"ageMin": 0,
"ageMax": 100,
"faceConfidence": 66.72220611572266,
"emotion": "ANGRY",
"emotionConfidence": 50.0,
"eyeglasses": false,
"eyeglassesConfidence": 50.38102722167969,
"eyesOpen": true,
"eyesOpenConfidence": 50.20328140258789,
"gender": "Male",
"genderConfidence": 50.462989807128906,
"smile": false,
"smileConfidence": 50.15522384643555,
"sunglasses": false,
"glassesConfidence":50.446510314941406
}"""
data = json.loads(response)
df = pd.DataFrame({"data": data})
df.to_csv("response.csv")
This is the response formatted as CSV:
,data
ageMax,100
ageMin,0
emotion,ANGRY
emotionConfidence,50.0
eyeglasses,False
eyeglassesConfidence,50.38102722167969
eyesOpen,True
eyesOpenConfidence,50.20328140258789
faceConfidence,66.72220611572266
gender,Male
genderConfidence,50.462989807128906
glassesConfidence,50.446510314941406
id,1
smile,False
smileConfidence,50.15522384643555
sunglasses,False
I have some code that gives me an empty DataFrame with no saved tweets.
I tried to debug it by putting print(line) right after for line in json_file: and again after json_data = json.loads(line).
That resulted in a KeyError.
How do I fix it?
Thank you.
import json
import pandas as pd
list_df = list()
# read the .txt file, line by line, and append the json data in each line to the list
with open('tweet_json.txt', 'r') as json_file:
    for line in json_file:
        print(line)
        json_data = json.loads(line)
        print(line)
        tweet_id = json_data['tweet_id']
        fvrt_count = json_data['favorite_count']
        rtwt_count = json_data['retweet_count']
        list_df.append({'tweet_id': tweet_id,
                        'favorite_count': fvrt_count,
                        'retweet_count': rtwt_count})
# create a pandas DataFrame using the list
df = pd.DataFrame(list_df, columns = ['tweet_id', 'favorite_count', 'retweet_count'])
df.head()
Your comment says you're trying to save to a file, but your code suggests that you're trying to read from a file. Here are examples of how to do both:
Writing to JSON
import json
import pandas as pd
content = {  # This is just dummy data, in the form of a dictionary
"tweet1": {
"id": 1,
"msg": "Yay, first!"
},
"tweet2": {
"id": 2,
"msg": "I'm always second :("
}
}
# Write it to a file called "tweet_json.txt" in JSON
with open("tweet_json.txt", "w") as json_file:
    json.dump(content, json_file, indent=4)  # indent=4 is optional, it makes it easier to read
Note the w (as in write) in open("tweet_json.txt", "w"). You're using r (as in read), which doesn't give you permission to write anything. Also note the use of json.dump() rather than json.load(). We then get a file that looks like this:
$ cat tweet_json.txt
{
"tweet1": {
"id": 1,
"msg": "Yay, first!"
},
"tweet2": {
"id": 2,
"msg": "I'm always second :("
}
}
Reading from JSON
Let's read the file that we just wrote, using pandas read_json():
import pandas as pd
df = pd.read_json("tweet_json.txt")
print(df)
Output looks like this:
>>> df
tweet1 tweet2
id 1 2
msg Yay, first! I'm always second :(
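For the KeyError itself, a KeyError raised by json_data['tweet_id'] means that key is missing from at least one parsed line. One common cause (an assumption here, since the file's contents aren't shown) is a different key name: raw Twitter API tweet objects, for instance, store the identifier under "id" rather than "tweet_id". A minimal sketch that skips blank lines, reports missing keys with the keys that are present, and uses dict.get() so absent fields become None. The io.StringIO sample is hypothetical and stands in for open('tweet_json.txt').

```python
import io
import json

# Hypothetical stand-in for open('tweet_json.txt'): one JSON object per line.
json_file = io.StringIO(
    '{"id": 1, "favorite_count": 10, "retweet_count": 2}\n'
    '\n'
    '{"id": 2, "favorite_count": 5}\n'
)

wanted = ["id", "favorite_count", "retweet_count"]
rows = []
for line_no, line in enumerate(json_file, start=1):
    if not line.strip():
        continue  # skip blank lines, which json.loads would reject
    record = json.loads(line)
    missing = [k for k in wanted if k not in record]
    if missing:
        # show which keys would have raised KeyError, and which keys exist
        print(f"line {line_no}: missing {missing}; available keys: {sorted(record)}")
    # .get() returns None instead of raising KeyError for absent keys
    rows.append({k: record.get(k) for k in wanted})

print(rows)
```

The resulting rows list can be passed to pd.DataFrame(rows) as in the question.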