I need to get the value of the keywords from the json file below. Like:
output = ['abc', 'cde']
Json file structure looks like :
d = [{
"response": {"docs": [
{"keywords": [{"value": "abc"}]},
{"keywords": [{"value": "cde"}]}
]}
}]
I have tried the below. I believe it's redundant though since I get only one level of ["response"]["docs"].
keywords = []
data = json.load(data_file)
for i in data:
    keywords.append(i["response"]["docs"][0]["keywords"])
keyword_Value = [g['value'] for d in keywords for g in d]
There's a JSON encoder/decoder built in to Python. See: https://docs.python.org/2/library/json.html
Something like
import json
with open('path/to/your_data.json') as json_data:
    data = json.load(json_data)
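Once the data is loaded, the values can be collected by walking every level of the structure instead of indexing only the first entry of "docs". A minimal sketch, assuming the structure shown in the question; the sample is parsed from an inline string so that no file is needed, and the variable names are illustrative:

```python
import json

# Sample data from the question, parsed from a string instead of a file.
raw = '[{"response": {"docs": [{"keywords": [{"value": "abc"}]}, {"keywords": [{"value": "cde"}]}]}}]'
data = json.loads(raw)

# Walk every top-level item, every doc, and every keyword entry.
keyword_values = [
    kw["value"]
    for item in data
    for doc in item["response"]["docs"]
    for kw in doc["keywords"]
]
print(keyword_values)  # ['abc', 'cde']
```

Because the comprehension visits every doc, it does not stop at docs[0] the way the attempt in the question does.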
If you do not mind using an external library, this task is quite easy using jmespath like:
import jmespath
keywords = jmespath.search('[].response.docs[].keywords[].value', data)
Code:
data = [{
"response": {"docs": [
{"keywords": [{"value": "abc"}]},
{"keywords": [{"value": "cde"}]}
]}
}]
import jmespath
keywords = jmespath.search('[].response.docs[].keywords[].value', data)
print(keywords)
Results:
['abc', 'cde']
I am trying to find duplicated JSON objects in a 30GB jsonlines file.
Given a JSON object A that looks like this:
{
"data": {
"cert_index": 691749790,
"cert_link": "http://ct.googleapis.com/icarus/ct/v1/get-entries?start=691749790&end=691749790",
"chain": [{...},{...}],
"leaf_cert": {
"all_domains": [...],
"as_der": "MIIFcjCCBFqgAwIBAgISBDD2+d1gP36/+9uUveS...",
"extensions": {...},
"fingerprint": "0C:E4:AF:24:F1:AE:B1:09:B0:42:67:CB:F8:FC:B6:AF:1C:07:D6:5B",
"not_after": 1573738488,
"not_before": 1565962488,
"serial_number": "430F6F9DD603F7EBFFBDB94BDE4BBA4EC9A",
"subject": {...}
},
"seen": 1565966152.750253,
"source": {
"name": "Google 'Icarus' log",
"url": "ct.googleapis.com/icarus/"
},
"update_type": "PrecertLogEntry"
},
"message_type": "certificate_update"
}
How can I generate an output file where each row looks like this:
{"fingerprint":"0C:E4:AF:24:F1:AE:B1:09:B0:42:67:CB:F8:FC:B6:AF:1C:07:D6:5B", "certificates":[A, B, C]}
Here A, B, and C are the full JSON object for each of the duplicates.
Keep a list of records, and before adding a new JSON object, check whether its fingerprint is already in the list. For example:
currentFingerprint = myJson['data']['leaf_cert']['fingerprint']
for elem in arrayOfFingerprints:
    if elem['fingerprint'] == currentFingerprint:
        # Fingerprint already seen: add this object to its group.
        elem['certificates'].append(myJson)
        break
else:
    # The loop finished without a break, so this fingerprint is new.
    arrayOfFingerprints.append({'fingerprint': currentFingerprint, 'certificates': [myJson]})
I'm going to assume that you have already read the file and created a list of dicts.
from collections import defaultdict
import json
# Group the objects by fingerprint.
d = defaultdict(list)
for jobj in file:
    d[jobj['data']['leaf_cert']['fingerprint']].append(jobj)
# Write one JSON object per row.
with open('file.txt', 'w') as out:
    for k, v in d.items():
        json.dump({"fingerprint": k, "certificates": v}, out)
        out.write('\n')
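For a 30GB file, holding every object in memory may not be feasible. One possible variation is to make two passes: count the fingerprints first, then keep only the objects whose fingerprint occurs more than once. This is a sketch, assuming that each input line is one complete JSON object and that the set of fingerprints (though not the full objects) fits in memory. The names `fingerprint_of` and `write_duplicates` and the file paths are hypothetical.

```python
import json
from collections import Counter, defaultdict


def fingerprint_of(obj):
    # Path to the fingerprint, as in the sample object from the question.
    return obj['data']['leaf_cert']['fingerprint']


def write_duplicates(in_path, out_path):
    # Pass 1: count how often each fingerprint occurs.
    counts = Counter()
    with open(in_path) as src:
        for line in src:
            if line.strip():
                counts[fingerprint_of(json.loads(line))] += 1

    # Pass 2: collect full objects only for repeated fingerprints.
    groups = defaultdict(list)
    with open(in_path) as src:
        for line in src:
            if line.strip():
                obj = json.loads(line)
                fp = fingerprint_of(obj)
                if counts[fp] > 1:
                    groups[fp].append(obj)

    # Write one JSON object per output row.
    with open(out_path, 'w') as out:
        for fp, certs in groups.items():
            out.write(json.dumps({'fingerprint': fp, 'certificates': certs}) + '\n')
```

Calling `write_duplicates('input.jsonl', 'duplicates.jsonl')` would then produce one row per duplicated fingerprint. If the duplicates themselves do not fit in memory, a different strategy (for example, sorting on disk by fingerprint) would be needed.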
I'm new to Python.
I'm trying to extract data from a data.json file.
How can I get "File_Names" and "project_name"?
Also, how do I clean up the data? The "XX\tXX\t" prefix on each file name is extra.
Desired output:
File_Names = ih/1/2/3.java
ihh/11/22/33.java.java
Project_name = android/hello
File_Names = hi/1/2/3.java
hih/11/22/33.java.java
Project_name = android/helloworld
data.json
{
"changed": [
{
"prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
"added_commits": [
{
"Lines_Deleted": 28,
"File_Names": [
"1\t3\tih/1/2/3.java",
"1\t1\tihh/11/22/33.java.java"
],
"Files_Modified": 8,
"Lines_Inserted": 90
}
],
"project_name": "android/hello"
},
{
"prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
"added_commits": [
{
"Lines_Deleted": 28,
"File_Names": [
"14\t3\thi/1/2/3.java",
"1\t1\thih/11/22/33.java.java"
],
"Files_Modified": 8,
"Lines_Inserted": 90
}
],
"project_name": "android/helloworld"
}
]
}
import json then use json.load(open('data.json')) to read the file. It will be loaded as a nested hierarchy of python objects (dictionaries, lists, ints, strings, floats) which you can parse accordingly.
Here's something to spark your imagination and communicate the concept.
import json
x = json.load(open('data.json'))
for sub_dict in x['changed']:
    print('project_name', sub_dict['project_name'])
    for entry in sub_dict['added_commits']:
        print(entry['File_Names'])
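To drop the extra prefix, note that each entry in "File_Names" is tab-separated, with the path as the last field. A minimal sketch, assuming that layout holds for every entry. It uses a trimmed inline copy of the question's data, as it would look after json.load(), and `strip_prefix` is a hypothetical helper name:

```python
# Trimmed copy of the question's data as it looks after json.load().
data = {
    "changed": [
        {
            "added_commits": [
                {"File_Names": ["1\t3\tih/1/2/3.java", "1\t1\tihh/11/22/33.java.java"]}
            ],
            "project_name": "android/hello",
        },
        {
            "added_commits": [
                {"File_Names": ["14\t3\thi/1/2/3.java", "1\t1\thih/11/22/33.java.java"]}
            ],
            "project_name": "android/helloworld",
        },
    ]
}


def strip_prefix(entry):
    # Keep only the text after the last tab, i.e. the file path.
    return entry.rsplit("\t", 1)[-1]


results = []
for project in data["changed"]:
    names = [
        strip_prefix(entry)
        for commit in project["added_commits"]
        for entry in commit["File_Names"]
    ]
    results.append((project["project_name"], names))
    print("File_Names =", "\n".join(names))
    print("Project_name =", project["project_name"])
```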
You can use this approach
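import json
with open('data.json') as json_file:
    # json.load() reads from a file object; json.loads() expects a string.
    data = json.load(json_file)
for item in data['changed']:
    # 'added_commits' is a list, so iterate over its entries.
    for commit in item['added_commits']:
        print(item['project_name'], commit['File_Names'])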
You can use something like this with the json module
import json
with open("file_name.json", "r") as f:
    data = f.read()
jsondata = json.loads(data)
print(jsondata)  # the whole JSON file
print(jsondata["changed"])  # the list under the "changed" key
print(jsondata["changed"][0])  # everything in the first occurrence within "changed"
From here you can take it further with whatever elements you want from the json.
I have a very big JSON file that I load into Python. I want to pull a single key out of it, such as a country rank or a review, from data taken from the internet. I tried
json.load('filename.json')
but I am getting an error:
AttributeError: 'str' object has no attribute 'read'
What am I doing wrong?
Additionally, how do I select part of a json file if it is very big?
I think you need to open the file first and then pass the file object to json.load, like this:
import json
from pprint import pprint
with open('filename.json') as data:
    output = json.load(data)
pprint(output)
Try the following:
import json
json_data_file = open("json_file_path", 'r').read() # r for reading the file
json_data = json.loads(json_data_file)
Access the data using the keys as follows :
json_data['key']
json.load() expects the file handle after it has been opened:
with open('filename.json') as datafile:
    data = json.load(datafile)
For example if your json data looked like this:
{
"maps": [
{
"id": "blabla",
"iscategorical": "0"
},
{
"id": "blabla",
"iscategorical": "0"
}
],
"masks": {
"id": "valore"
},
"om_points": "value",
"parameters": {
"id": "valore"
}
}
To access parts of the data, use:
data["maps"][0]["id"]
data["masks"]["id"]
data["om_points"]
That code can be found in this SO answer:
Parsing values from a JSON file using Python?
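The error in the question comes from passing a path string to json.load(), which calls .read() on its argument. A small sketch of the distinction, assuming Python 3; it writes a throwaway file to a temporary directory so that it is self-contained, and the sample content is a shortened version of the data above:

```python
import json
import os
import tempfile

# Write a small JSON file to a temporary location for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "filename.json")
with open(path, "w") as f:
    f.write('{"maps": [{"id": "blabla"}], "om_points": "value"}')

# json.load() needs a file object; a path string has no .read() method.
try:
    json.load(path)
    error = None
except AttributeError as exc:
    error = str(exc)

# Correct: open the file first, then pass the handle.
with open(path) as datafile:
    data = json.load(datafile)

# json.loads() (with an "s") parses a string that already holds JSON text.
same = json.loads('{"maps": [{"id": "blabla"}], "om_points": "value"}')

print(error)
print(data["maps"][0]["id"], data == same)
```

Note that json.load() still reads the whole file into memory, so this does not by itself help with selecting part of a very large file.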
Hey so I have some hash ids in a csv file like
XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec=
XbRPhe65YbC+xtgGQ8uksrqSUJ/HhTPj1d2pL0/vuGrHM=
and I want to read them into Python, wrap each one in some additional code like
{"id" :"XbRPshe65YbC+xtGQ8ukqR2u2btfNeNe2gtcs72QbxPA=", "timestamp":"20150831"},
and then wrap all of that in some JSON syntax. This is then sent as a POST request. The problem is that I cannot seem to make it valid JSON: everything seems to be ordered wrong and I am getting extra backslashes (\).
import os
import pandas as pd
from pprint import pprint
df=pd.read_csv('test.csv',sep=',',header=None)
df[0] = '{"id" :"' + df[0].astype(str) + '", "timestamp":"20150831"}, '
df = df[:-1] # removes last comma
test = 'hello'
data =[ { "ids":[ df[0]],
"attributes":[
{
"name":"girl"
},
{
"name":"size"
}
]
}
]
json1 = data.to_json()
print(json1)
I agree that pandas doesn't seem to be the simplest tool for the job here. The built-in libraries will work great:
import csv
import json
with open('test.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    data = {
        "ids": [{"id": row[0], "timestamp": "20150831"} for row in csvreader],
        "attributes": [
            {"name": "girl"},
            {"name": "size"}
        ]
    }
json1 = json.dumps(data)
print(json1)
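The extra backslashes in the question most likely come from serializing text that is already JSON: hand-built strings get their quotes escaped when they are encoded a second time. A short sketch of the difference, using one of the ids from the question:

```python
import json

hash_id = "XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec="

# Building JSON text by hand and then serializing it escapes the inner quotes.
hand_built = '{"id": "' + hash_id + '", "timestamp": "20150831"}'
double_encoded = json.dumps({"ids": [hand_built]})

# Building plain dicts and serializing once produces clean JSON.
clean = json.dumps({"ids": [{"id": hash_id, "timestamp": "20150831"}]})

print(double_encoded)  # contains \" sequences
print(clean)
```

Keeping the data as dicts and lists until the final json.dumps() call avoids the problem.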
I have an array of dictionaries like so:
myDict[0] = {'date':'today', 'status': 'ok'}
myDict[1] = {'date':'yesterday', 'status': 'bad'}
and I'm trying to export this array to a json file where each dictionary is its own entry. The problem is when I try to run:
dump(myDict, open("test.json", "w"))
It outputs a json file with a number prefix before each entry
{"0": {"date": "today", "status": "ok"}, "1": {"date": "yesterday", "status": "bad"} }
which apparently isn't legal json since my json parser (protovis) is giving me error messages
Any ideas?
Thanks
Use a list instead of a dictionary; you probably used:
myDict = {}
myDict[0] = {...}
You should use:
myList = []
myList.append({...})
P.S.: It seems valid json to me anyways, but it is an object and not a list; maybe this is the reason why your parser is complaining
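The difference can be seen by serializing the same entries both ways. A minimal sketch using the two dictionaries from the question; `entries` and `as_dict` are illustrative names:

```python
import json

entries = [
    {"date": "today", "status": "ok"},
    {"date": "yesterday", "status": "bad"},
]

# A dict with integer keys becomes a JSON object with string keys.
as_dict = {i: entry for i, entry in enumerate(entries)}
print(json.dumps(as_dict))
# {"0": {"date": "today", "status": "ok"}, "1": {"date": "yesterday", "status": "bad"}}

# A list becomes a JSON array.
print(json.dumps(entries))
# [{"date": "today", "status": "ok"}, {"date": "yesterday", "status": "bad"}]
```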
You should use a JSON serializer...
Also, an array of dictionaries would better serialize to something like this:
[
{
"date": "today",
"status": "ok"
},
{
"date": "yesterday",
"status": "bad"
}
]
That is, you should just use a JavaScript array.