I have an array of dictionaries like so:
myDict[0] = {'date':'today', 'status': 'ok'}
myDict[1] = {'date':'yesterday', 'status': 'bad'}
and I'm trying to export this array to a json file where each dictionary is its own entry. The problem is when I try to run:
dump(myDict, open("test.json", "w"))
It outputs a json file with a number prefix before each entry
{"0": {"date": "today", "status": "ok"}, "1": {"date": "yesterday", "status": "bad"} }
which apparently isn't legal json since my json parser (protovis) is giving me error messages
Any ideas?
Thanks
Use a list instead of a dictionary; you probably used:
myDict = {}
myDict[0] = {...}
You should use:
myList = []
myList.append({...})
P.S.: It seems to be valid JSON to me anyway, but the top level is an object and not a list; maybe that is why your parser is complaining.
You should use a JSON serializer...
Also, an array of dictionaries would better serialize to something like this:
[
    {
        "date": "today",
        "status": "ok"
    },
    {
        "date": "yesterday",
        "status": "bad"
    }
]
That is, you should just use a JavaScript array.
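On the Python side, a minimal sketch of the list-based approach could look like the following. It assumes the standard-library json module; the names entries, my_dict, as_list and restored are made up for illustration, and the file is written to a temporary directory rather than next to your script.

```python
import json
import os
import tempfile

# Build a list (not a dict keyed by 0, 1, ...) so the output is a JSON array.
entries = []
entries.append({"date": "today", "status": "ok"})
entries.append({"date": "yesterday", "status": "bad"})

# If the data already lives in a dict keyed by integers, convert it first.
my_dict = {0: entries[0], 1: entries[1]}
as_list = [my_dict[k] for k in sorted(my_dict)]

# Write to a temporary directory so the sketch does not overwrite real files.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w") as out:
    json.dump(as_list, out)

# Reading the file back gives a list of dicts again.
with open(path) as src:
    restored = json.load(src)
```

Opening the file in a with block also makes sure it is closed (and flushed) before anything else reads it.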
I am trying to find duplicated JSON objects in a 30GB jsonlines file.
Given a JSON object A that looks like this:
{
"data": {
"cert_index": 691749790,
"cert_link": "http://ct.googleapis.com/icarus/ct/v1/get-entries?start=691749790&end=691749790",
"chain": [{...},{...}],
"leaf_cert": {
"all_domains": [...],
"as_der": "MIIFcjCCBFqgAwIBAgISBDD2+d1gP36/+9uUveS...",
"extensions": {...},
"fingerprint": "0C:E4:AF:24:F1:AE:B1:09:B0:42:67:CB:F8:FC:B6:AF:1C:07:D6:5B",
"not_after": 1573738488,
"not_before": 1565962488,
"serial_number": "430F6F9DD603F7EBFFBDB94BDE4BBA4EC9A",
"subject": {...}
},
"seen": 1565966152.750253,
"source": {
"name": "Google 'Icarus' log",
"url": "ct.googleapis.com/icarus/"
},
"update_type": "PrecertLogEntry"
},
"message_type": "certificate_update"
}
How can I generate an output file where each row looks like this:
{"fingerprint":"0C:E4:AF:24:F1:AE:B1:09:B0:42:67:CB:F8:FC:B6:AF:1C:07:D6:5B", "certificates":[A, B, C]}
Here A, B, and C are the full JSON object for each of the duplicates.
Keep your records in a list and, before adding a new JSON object, check whether its fingerprint is already in the list. For example:
currentFingerprint = myJson['data']['leaf_cert']['fingerprint']
for elem in arrayOfFingerprints:
    if elem['fingerprint'] == currentFingerprint:
        # Fingerprint already seen: add this certificate to its group
        elem['certificates'].append(myJson)
        break
else:
    # The loop finished without break: first time we see this fingerprint
    arrayOfFingerprints.append({'fingerprint': currentFingerprint, 'certificates': [myJson]})
I'm going to assume that you have already read the file and created a list of dicts.
from collections import defaultdict
import json
d = defaultdict(list)
for jobj in file:  # file: the list of dicts you already read
    d[jobj['data']['leaf_cert']['fingerprint']].append(jobj)
with open('file.txt', 'w') as out:
    for k, v in d.items():
        # One JSON object per output line
        json.dump({"fingerprint": k, "certificates": v}, out)
        out.write('\n')
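If you have not read the file yet, the same grouping can be fed line by line, since a jsonlines file holds one JSON object per line. This is only a sketch: the helper names group_by_fingerprint and write_duplicates are made up, and the in-memory sample stands in for the real file. It still keeps every object in memory, so for a 30GB input you may need a first pass that only counts fingerprints and a second pass that keeps just the repeated ones.

```python
import io
import json
from collections import defaultdict


def group_by_fingerprint(lines):
    """Group parsed JSON objects by data.leaf_cert.fingerprint."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        groups[obj["data"]["leaf_cert"]["fingerprint"]].append(obj)
    return groups


def write_duplicates(groups, out):
    """Write one JSON line per fingerprint that occurs more than once."""
    for fingerprint, certs in groups.items():
        if len(certs) > 1:
            row = {"fingerprint": fingerprint, "certificates": certs}
            out.write(json.dumps(row) + "\n")


# Tiny in-memory stand-in for the real file (hypothetical data).
sample = io.StringIO(
    '{"data": {"cert_index": 1, "leaf_cert": {"fingerprint": "AA"}}}\n'
    '{"data": {"cert_index": 2, "leaf_cert": {"fingerprint": "BB"}}}\n'
    '{"data": {"cert_index": 3, "leaf_cert": {"fingerprint": "AA"}}}\n'
)
result = io.StringIO()
write_duplicates(group_by_fingerprint(sample), result)
```

With a real file you would pass the open file object instead of sample, for example inside with open(...) as src.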
I am using Python and getting data from an API. The data is formatted as shown in the example below, and I am having trouble getting cust_id and name out of the API response.
Below is one of the things I tried, which was suggested in the answer by SimonR. I am sure I am doing something really dumb right now, but I get the error
TypeError: the JSON object must be str, bytes or bytearray, not dict. Thank you everyone in advance for your answers.
import json
a = {
"count": 5,
"Customers": {
"32759": {
"cust_id": "1234",
"name": "Mickey Mouse"
},
"11053": {
"cust_id": "1235",
"name": "Mini Mouse"
},
"21483": {
"cust_id": "1236",
"name": "Goofy"
},
"12441": {
"cust_id": "1237",
"name": "Pluto"
},
"16640": {
"cust_id": "1238",
"name": "Donald Duck"
}
}
}
d = json.loads(a)
customers = {v["cust_id"]: v["name"] for v in d["Customers"].values()}
Is this what you're trying to do?
import json
# a must be a JSON string here; if it is already a dict, skip json.loads and use it directly
d = json.loads(a)
customers = {v["cust_id"]: v["name"] for v in d["Customers"].values()}
outputs :
{'1234': 'Mickey Mouse',
'1235': 'Mini Mouse',
'1236': 'Goofy',
'1237': 'Pluto',
'1238': 'Donald Duck'}
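The TypeError in the question comes from passing a dict to json.loads, which only parses text. A small sketch that accepts either form, assuming the response has the shape shown in the question (the helper name customers_by_id and the shortened sample are made up for illustration):

```python
import json


def customers_by_id(response):
    """Return {cust_id: name}; accepts a JSON string or an already parsed dict."""
    if isinstance(response, (str, bytes, bytearray)):
        # Only text needs parsing; a dict is used as it is.
        response = json.loads(response)
    return {c["cust_id"]: c["name"] for c in response["Customers"].values()}


# Shortened sample in the same shape as the question's data.
as_dict = {
    "count": 2,
    "Customers": {
        "32759": {"cust_id": "1234", "name": "Mickey Mouse"},
        "11053": {"cust_id": "1235", "name": "Mini Mouse"},
    },
}
as_text = json.dumps(as_dict)

from_dict = customers_by_id(as_dict)
from_text = customers_by_id(as_text)
```

Both calls give the same mapping, so the same code works whether your API client hands you text or a decoded object.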
Well if I understood correctly you can do this:
# d is the API response in your post
# This will give you the list of customers
customers = d['Customers']
Then you can iterate over the customers dictionary and save them to any data structure you want:
# This will print out the name and cust_id
for k, v in customers.items():
    print(v['cust_id'], v['name'])
Hope it helps!
import json
# convert the JSON string to a Python dict
response = json.loads(json_string)
# loop through all customers
for key, customer in response['Customers'].items():
    # get customer id
    cust_id = customer['cust_id']
    # get customer name
    name = customer['name']
Okay, so I've been banging my head on this for the last 2 days, with no real progress. I am a beginner with Python and coding in general, but this is the first issue I haven't been able to solve myself.
I have a long JSON-formatted file with about 7000 entries from the YouTube API.
Right now I want a short script that prints certain info ('videoId') for a certain dictionary key (referred to as 'key'):
My script:
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key']['Items']['id']['videoId'])
# print(trailers['key']['videoId'] gives same response
Error:
print(trailers['key']['Items']['id']['videoId'])
TypeError: string indices must be integers
It does work when I want to print all the information for the dictionary key:
This script works
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key'])
Also print(type(trailers)) results in class 'dict', as it's supposed to.
My JSON File is formatted like this and is from the youtube API, youtube#searchListResponse.
{
"kind": "youtube#searchListResponse",
"etag": "",
"nextPageToken": "",
"regionCode": "",
"pageInfo": {
"totalResults": 1000000,
"resultsPerPage": 1
},
"items": [
{
"kind": "youtube#searchResult",
"etag": "",
"id": {
"kind": "youtube#video",
"videoId": ""
},
"snippet": {
"publishedAt": "",
"channelId": "",
"title": "",
"description": "",
"thumbnails": {
"default": {
"url": "",
"width": 120,
"height": 90
},
"medium": {
"url": "",
"width": 320,
"height": 180
},
"high": {
"url": "",
"width": 480,
"height": 360
}
},
"channelTitle": "",
"liveBroadcastContent": "none"
}
}
]
}
What other information is needed to be given for you to understand the problem?
The following code gives me all the videoIds from the provided sample data (which in fact contains no IDs at all, since every videoId is empty):
import json
with open('sampledata', 'r') as datafile:
    data = json.loads(datafile.read())
print([item['id']['videoId'] for item in data['items']])
Perhaps you can try this with more data.
Hope this helps.
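If some entries turn out not to carry a videoId at all, the list comprehension above would raise a KeyError. A hedged variant that skips such entries is sketched below; that non-video results can appear is an assumption about your data, and the IDs in the sample are placeholders.

```python
import json

# Hypothetical sample: one video result and one result without a videoId.
sample = json.loads("""
{"items": [
  {"id": {"kind": "youtube#video", "videoId": "abc123"}},
  {"id": {"kind": "youtube#channel", "channelId": "chan456"}}
]}
""")

# dict.get returns None instead of raising KeyError when "videoId" is absent,
# so entries without a (non-empty) videoId are simply skipped.
video_ids = [
    item["id"]["videoId"]
    for item in sample.get("items", [])
    if item["id"].get("videoId")
]
```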
I didn't really look into the YouTube API, but from the code and the sample you gave, it seems you left out a [0]. Looking at the structure of the JSON, the key items holds a list.
import json
f = open ('json1.json', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['items'][0]['id']['videoId'])
I've not used JSON before at all, but as I understand it, it is loaded as dicts containing more dicts, lists and so on, where applicable.
So when you do type(trailers) you get dict. Then you index it with trailers['key']. If you check the type of that, it should also be a dict if things work correctly. Checking the type at each level of nesting should eventually lead you to the error.
Python's error says you are indexing a string with something that is not an integer: strings only accept integer indices, but you are using a key as if the value were a dict. So you need to find out why you are getting a string and not a dict at that step.
Edit to add an example: if your dict holds a string under the key 'item', then you get a string back, not a new dict that you can index further. In the JSON, items, for example, seems to be a list with dicts in it, not a dict itself.
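To make that concrete, here is a small sketch that walks one level at a time through a cut-down version of the sample (the videoId value is a placeholder) and then reproduces the error on purpose:

```python
import json

data = json.loads(
    '{"items": [{"id": {"kind": "youtube#video", "videoId": "abc123"}}]}'
)

items = data["items"]               # a list, so it needs an integer index
first = items[0]                    # a dict again
video_id = first["id"]["videoId"]   # finally a plain string

# Collect the type at each level to see where the nesting changes.
types_seen = [type(data), type(items), type(first), type(video_id)]

# Indexing a str with a str key reproduces the error from the question.
try:
    video_id["videoId"]
    error = None
except TypeError as exc:
    error = exc
```

Printing types_seen (or calling type() at each step in your own script) shows exactly where a list or a string appears instead of the dict you expected.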
I've got a json file that I've pulled from a web service and am trying to parse it. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the json data in each example appears to be very simplistic in nature. Likewise, the json example data in the python docs is very simple and does not reflect what I'm trying to work with. Here is what the json looks like:
{"RecordResponse": {
"Id": blah
"Status": {
"state": "complete",
"datetime": "2016-01-01 01:00"
},
"Results": {
"resultNumber": "500",
"Summary": [
{
"Type": "blah",
"Size": "10000000000",
"OtherStuff": {
"valueOne": "first",
"valueTwo": "second"
},
"fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'w') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["Summary"]:
    print(i["fieldIWant"])
Not only am I not getting into the field I want, but I'm also getting a key error on trying to suss out "Summary".
I don't know how the indices work within the array; once I even get into the "Summary" field, do I have to issue an index manually to return the value from the field I need?
The example you posted is not valid JSON (commas are missing after object fields), so it's hard to dig in much. If it's straight from the web service, something's messed up. If you did fix it with proper commas, the "Summary" key is inside the "Results" object, which in turn is inside "RecordResponse". Also open the file for reading: mode 'w' truncates it. So you'd need to change your loop to
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["RecordResponse"]["Results"]["Summary"]:
    print(i["fieldIWant"])
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # pass keyname and fieldname along so non-default values survive the recursion
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
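The function above only descends into dicts. If the key you are after might sit inside a list of objects, a variant that also walks lists could look like this. It is a sketch, not the original answer's method: the name find_fields is made up, and the record literal is a repaired, shortened version of the question's sample with "blah" quoted for illustration.

```python
def find_fields(obj, keyname="Summary", fieldname="fieldIWant"):
    """Depth-first search for keyname; returns the fieldname values found there."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key == keyname and isinstance(val, list):
                # Keep only entries that are dicts and actually have the field.
                return [d[fieldname] for d in val
                        if isinstance(d, dict) and fieldname in d]
            found = find_fields(val, keyname, fieldname)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for element in obj:
            found = find_fields(element, keyname, fieldname)
            if found is not None:
                return found
    # Scalars, or keyname not present anywhere below obj.
    return None


record = {
    "RecordResponse": {
        "Id": "blah",
        "Status": {"state": "complete", "datetime": "2016-01-01 01:00"},
        "Results": {
            "resultNumber": "500",
            "Summary": [{"Type": "blah", "fieldIWant": "value i want is here"}],
        },
    }
}
values = find_fields(record)
```

Checking found against None rather than truthiness means an empty "Summary" list is still reported as found, which may or may not be what you want.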
I have a JSON object that I would like to filter for misspelled key names. In the example below, I would like to end up with a JSON object without the entry whose test_name key is misspelled. What is the easiest way to do this?
json_data = """{
"my_test": [{
"group_name": "group-A",
"results": [{
"test_name": "test1",
"time": "8.556",
"status": "pass"
}, {
"test_name": "test2",
"time": "45.909",
"status": "pass"
}, {
"test_nameZASSD": "test3",
"time": "9.383",
"status": "fail"
}]
}]
}"""
This is an online test, and it looks like I'm not allowed to use jsonSchema.
So far my code looks like this:
if 'test_suites' in data:
    for suites in data["test_suites"]:
        if 'results' in suites and 'suite_name' in suites:
            for result in suites["results"]:
                if 'test_name' not in result or 'time' not in result or 'status' not in result:
                    result.clear()
                else:
                    ....
        else:
            print("Check 'suite_name' and/or 'results'")
else:
    print("Check 'test_suites'")
It kind of works, but result.clear() leaves an empty {}, which gets annoying later. What can I do here?
It looks like your data have a consistent schema. So I would try using json schema to solve your problem. With that you can set up a schema and only allow objects with certain key names.
If you just want to check whether a certain key is in the dictionary, and keep only the entries that match the spec, you could do something like this:
passed = []
for item in result:  # result: the list of result dicts
    if 'test_name' in item:
        passed.append(item)
But if you have a lot of different keys to check for, it will become unwieldy. So for bigger projects I would say that json schema is the way to go.
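Without json schema, one way to keep the check manageable is to put the required key names in a set and rebuild each results list, so no empty {} is left behind. A sketch using the json_data from the question; REQUIRED_KEYS is a made-up name, and the key names my_test and results are taken from the sample rather than from the code in the question:

```python
import json

# Keys every result entry must have (made-up constant name).
REQUIRED_KEYS = {"test_name", "time", "status"}

json_data = """{
  "my_test": [{
    "group_name": "group-A",
    "results": [{
      "test_name": "test1",
      "time": "8.556",
      "status": "pass"
    }, {
      "test_name": "test2",
      "time": "45.909",
      "status": "pass"
    }, {
      "test_nameZASSD": "test3",
      "time": "9.383",
      "status": "fail"
    }]
  }]
}"""

data = json.loads(json_data)
for group in data["my_test"]:
    # Rebuild the list instead of clearing entries in place,
    # so malformed entries disappear completely.
    group["results"] = [
        r for r in group["results"] if REQUIRED_KEYS.issubset(r)
    ]

kept_names = [r["test_name"] for r in data["my_test"][0]["results"]]
```

Adding another required key is then a one-word change to the set rather than another or clause in the if.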