Parse JSON with multiple nested dictionaries in Python

I need to get the values of the keywords from the JSON file below, like:
output = ['abc', 'cde']
The JSON file structure looks like this:
d = [{
"response": {"docs": [
{"keywords": [{"value": "abc"}]},
{"keywords": [{"value": "cde"}]}
]}
}]
I have tried the code below, but it only reads the first entry (index 0) of ["response"]["docs"], so the other keywords are missed.
import json
keywords = []
data = json.load(data_file)  # data_file is the already-opened JSON file
for i in data:
    keywords.append(i["response"]["docs"][0]["keywords"])
keyword_Value = [g['value'] for d in keywords for g in d]

There's a JSON encoder/decoder built into Python. See: https://docs.python.org/2/library/json.html
Something like:
import json
with open('path/to/your_data.json') as json_data:
    data = json.load(json_data)
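Once the file is loaded, one possible way to get every keyword without an external library is a nested list comprehension that walks each level (outer list, then docs, then keywords) instead of indexing only docs[0]. This is a sketch that assumes the structure shown in the question; the sample is parsed from a string here only so the example runs on its own.

```python
import json

# Sample copied from the question; with a real file you would use
# json.load() on the opened file instead of json.loads() on a string.
data = json.loads("""
[{"response": {"docs": [
    {"keywords": [{"value": "abc"}]},
    {"keywords": [{"value": "cde"}]}
]}}]
""")

# Visit every doc and every keyword, not just docs[0].
output = [
    kw["value"]
    for item in data
    for doc in item["response"]["docs"]
    for kw in doc["keywords"]
]
print(output)  # ['abc', 'cde']
```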

If you do not mind using an external library, this task is quite easy with jmespath:
import jmespath
keywords = jmespath.search('[].response.docs[].keywords[].value', data)
Code:
data = [{
"response": {"docs": [
{"keywords": [{"value": "abc"}]},
{"keywords": [{"value": "cde"}]}
]}
}]
import jmespath
keywords = jmespath.search('[].response.docs[].keywords[].value', data)
print(keywords)
Results:
['abc', 'cde']

Related

Python: Identify the duplicate JSON values and generate an output file with sets of duplicate input rows, one row per set of duplicates

I am trying to find duplicated JSON objects in a 30GB jsonlines file.
Given a JSON object A that looks like this:
{
"data": {
"cert_index": 691749790,
"cert_link": "http://ct.googleapis.com/icarus/ct/v1/get-entries?start=691749790&end=691749790",
"chain": [{...},{...}],
"leaf_cert": {
"all_domains": [...],
"as_der": "MIIFcjCCBFqgAwIBAgISBDD2+d1gP36/+9uUveS...",
"extensions": {...},
"fingerprint": "0C:E4:AF:24:F1:AE:B1:09:B0:42:67:CB:F8:FC:B6:AF:1C:07:D6:5B",
"not_after": 1573738488,
"not_before": 1565962488,
"serial_number": "430F6F9DD603F7EBFFBDB94BDE4BBA4EC9A",
"subject": {...}
},
"seen": 1565966152.750253,
"source": {
"name": "Google 'Icarus' log",
"url": "ct.googleapis.com/icarus/"
},
"update_type": "PrecertLogEntry"
},
"message_type": "certificate_update"
}
How can I generate an output file where each row looks like this:
{"fingerprint":"0C:E4:AF:24:F1:AE:B1:09:B0:42:67:CB:F8:FC:B6:AF:1C:07:D6:5B", "certificates":[A, B, C]}
Here A, B, and C are the full JSON objects for each of the duplicates.
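You need to keep a list with one record per fingerprint. Before adding a new JSON object, check whether its fingerprint is already in the list. For example:
arrayOfFingerprints = []  # one {'fingerprint': ..., 'certificates': [...]} record per fingerprint
# myJson is the JSON object currently being processed
currentFingerprint = myJson['data']['leaf_cert']['fingerprint']
for elem in arrayOfFingerprints:
    if elem['fingerprint'] == currentFingerprint:
        elem['certificates'].append(myJson)
        break
else:
    # the loop finished without break: this fingerprint is new
    arrayOfFingerprints.append({'fingerprint': currentFingerprint, 'certificates': [myJson]})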
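I'm going to assume that you have already read the file and created a list of dicts.
from collections import defaultdict
import json
d = defaultdict(list)
for jobj in file:  # file: the list of dicts read from the input
    d[jobj['data']['leaf_cert']['fingerprint']].append(jobj)
with open('file.txt', 'w') as out:
    for k, v in d.items():
        if len(v) > 1:  # keep only fingerprints that actually have duplicates
            json.dump({"fingerprint": k, "certificates": v}, out)
            out.write('\n')  # one JSON object per line (jsonlines)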
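Both answers above keep every object in memory, which may not be practical for a 30GB file. One alternative, swapped in here and not taken from either answer, is a two-pass scan: the first pass counts fingerprints only, and the second pass keeps full objects only for fingerprints seen more than once. This sketch assumes one JSON object per line, that the set of distinct fingerprints fits in memory, and that the duplicated objects themselves fit in memory; the function names and file paths are hypothetical.

```python
import json
from collections import Counter, defaultdict


def fingerprint_of(obj):
    # Path taken from the example object in the question.
    return obj["data"]["leaf_cert"]["fingerprint"]


def write_duplicate_groups(in_path, out_path):
    # Pass 1: count fingerprints only; memory grows with the number of
    # distinct fingerprints, not with the size of the whole file.
    counts = Counter()
    with open(in_path) as f:
        for line in f:
            if line.strip():
                counts[fingerprint_of(json.loads(line))] += 1

    duplicated = {fp for fp, n in counts.items() if n > 1}

    # Pass 2: keep full objects only for fingerprints seen more than once.
    groups = defaultdict(list)
    with open(in_path) as f:
        for line in f:
            if line.strip():
                obj = json.loads(line)
                fp = fingerprint_of(obj)
                if fp in duplicated:
                    groups[fp].append(obj)

    # One output row per set of duplicates, in jsonlines format.
    with open(out_path, "w") as out:
        for fp, objs in groups.items():
            out.write(json.dumps({"fingerprint": fp, "certificates": objs}) + "\n")
```

Usage would be something like write_duplicate_groups('certs.jsonl', 'duplicates.jsonl'), with both file names being placeholders.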

Python fetch data from json file

I'm new to Python.
I'm trying to extract data from a data.json file.
How can I get "File_Names" and "project_name"?
Also, how can I manipulate the data? The "XX\XX\X" part is an extra string I want to remove.
Desired output:
File_Names = ih/1/2/3.java
ihh/11/22/33.java.java
Project_name = android/hello
File_Names = hi/1/2/3.java
hih/11/22/33.java.java
Project_name = android/helloworld
data.json
{
"changed": [
{
"prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
"added_commits": [
{
"Lines_Deleted": 28,
"File_Names": [
"1\t3\tih/1/2/3.java",
"1\t1\tihh/11/22/33.java.java"
],
"Files_Modified": 8,
"Lines_Inserted": 90
}
],
"project_name": "android/hello"
},
{
"prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
"added_commits": [
{
"Lines_Deleted": 28,
"File_Names": [
"14\t3\thi/1/2/3.java",
"1\t1\thih/11/22/33.java.java"
],
"Files_Modified": 8,
"Lines_Inserted": 90
}
],
"project_name": "android/helloworld"
}
]
}
Use import json, then json.load(open('data.json')) to read the file. It is loaded as a nested hierarchy of Python objects (dicts, lists, ints, strings, floats) that you can traverse accordingly.
Here's something to spark your imagination and communicate the concept:
import json
x = json.load(open('data.json'))
for sub_dict in x['changed']:
    print('project_name', sub_dict['project_name'])
    for entry in sub_dict['added_commits']:
        print(entry['File_Names'])
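To also strip the extra prefix and reach the desired output, one option is to split each entry on tab characters and keep the last field. This sketch assumes, based only on the sample data, that every entry is two tab-separated numbers followed by the path; files_and_projects and format_output are hypothetical helper names.

```python
def files_and_projects(data):
    # Returns a list of (project_name, [clean file paths]) pairs.
    result = []
    for change in data["changed"]:
        paths = []
        for commit in change["added_commits"]:
            for entry in commit["File_Names"]:
                # In the sample, entries look like "1\t3\tih/1/2/3.java";
                # keep only the part after the last tab.
                paths.append(entry.split("\t")[-1])
        result.append((change["project_name"], paths))
    return result


def format_output(pairs):
    # Reproduces the "File_Names = ... / Project_name = ..." layout.
    lines = []
    for project, paths in pairs:
        lines.append("File_Names = " + "\n".join(paths))
        lines.append("Project_name = " + project)
    return "\n".join(lines)
```

With the real file this would be used as: with open('data.json') as f: print(format_output(files_and_projects(json.load(f)))), after import json.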
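You can use this approach:
import json
with open('data.json') as json_file:
    data = json.load(json_file)  # json.load reads from a file object
for item in data['changed']:
    # added_commits is a list, so loop over its commits
    for commit in item['added_commits']:
        print(item['project_name'], commit['File_Names'])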
You can use something like this with the json module:
import json
f = open("file_name.json", "r")
data = f.read()
jsondata = json.loads(data)
print(jsondata)  # the whole JSON file
print(jsondata["changed"])  # the list stored under the "changed" key
print(jsondata["changed"][0])  # everything in the first element of "changed"
f.close()
From here you can take it further with whatever elements you want from the JSON.

Loading JSON file for reading and selecting data

I have a JSON file that I load into Python. The file is very big, and I want to take values for a keyword from it, like country rank or review, from info taken from the internet. I tried
json.load('filename.json')
but I am getting an error:
AttributeError: 'str' object has no attribute 'read'
What am I doing wrong?
Additionally, how do I select part of a JSON file if it is very big?
I think you need to open the file and then pass the file object to json.load(), like this:
import json
from pprint import pprint
with open('filename.json') as data:
    output = json.load(data)
pprint(output)
Try the following:
import json
json_data_file = open("json_file_path", 'r').read() # r for reading the file
json_data = json.loads(json_data_file)
Access the data using the keys as follows:
json_data['key']
json.load() expects a file object, that is, the file after it has been opened:
with open('filename.json') as datafile:
    data = json.load(datafile)
For example if your json data looked like this:
{
"maps": [
{
"id": "blabla",
"iscategorical": "0"
},
{
"id": "blabla",
"iscategorical": "0"
}
],
"masks": {
"id": "valore"
},
"om_points": "value",
"parameters": {
"id": "valore"
}
}
To access parts of the data, use:
data["maps"][0]["id"]
data["masks"]["id"]
data["om_points"]
That code can be found in this SO answer:
Parsing values from a JSON file using Python?
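To see why the call in the question fails, it may help to compare json.load() and json.loads() on the same data. This is a small sketch: io.StringIO stands in for a file returned by open(), and the JSON text is a shortened version of the example above.

```python
import io
import json

# Shortened version of the example data above.
text = '{"maps": [{"id": "blabla", "iscategorical": "0"}], "om_points": "value"}'

# json.loads() parses a str directly.
from_string = json.loads(text)

# json.load() calls .read() on its argument, so it needs a file-like object;
# io.StringIO stands in here for a file returned by open().
from_file = json.load(io.StringIO(text))

# Passing the file *name* (a plain str) reproduces the question's error,
# because str has no .read() method.
try:
    json.load("filename.json")
except AttributeError as exc:
    error_message = str(exc)  # "'str' object has no attribute 'read'"

print(from_file["maps"][0]["id"])  # blabla
```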

csv to json in python

Hey, so I have some hash ids in a CSV file, like
XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec=
XbRPhe65YbC+xtgGQ8uksrqSUJ/HhTPj1d2pL0/vuGrHM=
and I want to parse them into Python and wrap each one in some additional code, like
{"id" :"XbRPshe65YbC+xtGQ8ukqR2u2btfNeNe2gtcs72QbxPA=", "timestamp":"20150831"},
and then wrap all of that in some JSON syntax. This is then sent as a POST request. The problem is that I cannot seem to produce valid JSON: everything seems to be ordered wrong and I am getting extra backslashes (\).
import os
import pandas as pd
from pprint import pprint
df=pd.read_csv('test.csv',sep=',',header=None)
df[0] = '{"id" :"' + df[0].astype(str) + '", "timestamp":"20150831"}, '
df = df[:-1] # removes last comma
test = 'hello'
data =[ { "ids":[ df[0]],
"attributes":[
{
"name":"girl"
},
{
"name":"size"
}
]
}
]
json1 = data.to_json()
print(json1)
I agree that pandas doesn't seem to be the simplest tool for the job here. The built-in libraries will work great:
import csv
import json
with open('test.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    data = {
        "ids": [{"id": row[0], "timestamp": "20150831"} for row in csvreader],
        "attributes": [
            {"name": "girl"},
            {"name": "size"}
        ]
    }
json1 = json.dumps(data)
print(json1)
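Since the question mentions sending the result as a POST request, here is one possible sketch using only the standard library (urllib.request, swapped in because the question does not say which HTTP client is used). io.StringIO stands in for test.csv, and the endpoint URL is a hypothetical placeholder; the request is built but not sent.

```python
import csv
import io
import json
import urllib.request

# Stand-in for test.csv, using the two ids from the question.
csv_text = (
    "XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec=\n"
    "XbRPhe65YbC+xtgGQ8uksrqSUJ/HhTPj1d2pL0/vuGrHM=\n"
)

rows = csv.reader(io.StringIO(csv_text))
payload = {
    "ids": [{"id": row[0], "timestamp": "20150831"} for row in rows if row],
    "attributes": [{"name": "girl"}, {"name": "size"}],
}

# json.dumps builds the JSON text from real Python objects, so no manual
# string concatenation (and no stray backslashes) is involved.
body = json.dumps(payload).encode("utf-8")

# Hypothetical endpoint; the question does not say where the POST goes.
request = urllib.request.Request(
    "https://example.com/api",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would actually send it.
```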

python - exporting dictionary(array) to json

I have an array of dictionaries like so:
myDict[0] = {'date':'today', 'status': 'ok'}
myDict[1] = {'date':'yesterday', 'status': 'bad'}
and I'm trying to export this array to a json file where each dictionary is its own entry. The problem is when I try to run:
dump(myDict, open("test.json", "w"))
It outputs a json file with a number prefix before each entry
{"0": {"date": "today", "status": "ok"}, "1": {"date": "yesterday", "status": "bad"} }
which apparently isn't legal json since my json parser (protovis) is giving me error messages
Any ideas?
Thanks
Use a list instead of a dictionary; you probably used:
myDict = {}
myDict[0] = {...}
You should use:
myList = []
myList.append({...})
P.S.: It looks like valid JSON to me anyway, but it is an object and not an array; that may be why your parser is complaining.
You should use a JSON serializer...
Also, an array of dictionaries would better serialize to something like this:
[
{
"date": "today",
"status": "ok"
},
{
"date": "yesterday",
"status": "bad"
}
]
That is, you should just use a JavaScript array.
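A quick way to check the difference is to serialize both shapes. This sketch uses json.dumps so the output can be inspected directly; json.dump(my_list, f, indent=4) on an open file produces the indented array layout shown above.

```python
import json

# A list serializes as a JSON array.
my_list = []
my_list.append({"date": "today", "status": "ok"})
my_list.append({"date": "yesterday", "status": "bad"})
as_array = json.dumps(my_list)
# '[{"date": "today", "status": "ok"}, {"date": "yesterday", "status": "bad"}]'

# A dict keyed by 0, 1, ... serializes as a JSON object with string keys.
as_object = json.dumps({0: my_list[0], 1: my_list[1]})
# '{"0": {"date": "today", "status": "ok"}, "1": {"date": "yesterday", "status": "bad"}}'
```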
