Writing JSON data in Python (formatting problem)

I have this method that writes JSON data to a file. The title of each book is the key, and the data is the book's author, date, page count, publisher, etc. The method works fine if I only want to add one book.
Code
import json

def createJson(title,firstName,lastName,date,pageCount,publisher):
    print "\n*** Inside createJson method for " + title + "***\n";
    data = {}
    data[title] = []
    data[title].append({
        'firstName:', firstName,
        'lastName:', lastName,
        'date:', date,
        'pageCount:', pageCount,
        'publisher:', publisher
    })
    with open('data.json','a') as outfile:
        json.dump(data,outfile , default = set_default)

def set_default(obj):
    if isinstance(obj,set):
        return list(obj)

if __name__ == '__main__':
    createJson("stephen-king-it","stephen","king","1971","233","Viking Press")
JSON file after one method call (one book):
{
    "stephen-king-it": [
        ["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
    ]
}
However, if I call the method multiple times to add more books to the JSON file, the format is all wrong. For instance, if I simply call the method twice with this main block:
if __name__ == '__main__':
    createJson("stephen-king-it","stephen","king","1971","233","Viking Press")
    createJson("william-golding-lord of the flies","william","golding","1944","134","Penguin Books")
My JSON file looks like
{
    "stephen-king-it": [
        ["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
    ]
} {
    "william-golding-lord of the flies": [
        ["pageCount:134", "publisher:Penguin Books", "firstName:william","lastName:golding", "date:1944"]
    ]
}
This is obviously invalid JSON. Is there a simple way to change my method so that it produces correctly formatted JSON? I looked at many simple examples online of writing JSON data in Python, but all of them gave me format errors when I checked the output on JSONLint.com. I have been racking my brain over this problem and even edited the file by hand to make it valid, but all my efforts were to no avail. Any help is appreciated. Thank you very much.

Simply appending new objects to your file doesn't create valid JSON. You need to add your new data inside the top-level object, then rewrite the entire file.
This should work:
import json

def createJson(title,firstName,lastName,date,pageCount,publisher):
    print("\n*** Inside createJson method for " + title + " ***\n")
    # Load any existing JSON data,
    # or create an empty object if the file is not found,
    # is empty, or does not contain valid JSON
    try:
        with open('data.json') as infile:
            data = json.load(infile)
    except (FileNotFoundError, ValueError):
        data = {}
    if not data:
        data = {}
    # Use key: value pairs (a dict), not a set, so each book becomes a JSON object
    data[title] = []
    data[title].append({
        'firstName': firstName,
        'lastName': lastName,
        'date': date,
        'pageCount': pageCount,
        'publisher': publisher
    })
    # Overwrite ('w') instead of appending ('a') so the file stays a single JSON object
    with open('data.json','w') as outfile:
        json.dump(data, outfile)
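A self-contained sketch of the same load-update-rewrite idea, assuming Python 3. The function name add_book and its path parameter are hypothetical, not part of the original code. Unlike the answer above, it appends to an existing title's list instead of replacing it, and it writes to a temporary file before swapping it in with os.replace, so a crash mid-write cannot leave a truncated file. Note that a file which is not valid JSON (such as one produced by the original append mode) is treated as empty, so fix or delete that file first.

```python
import json
import os


def add_book(path, title, first_name, last_name, date, page_count, publisher):
    """Add one book to the JSON object stored at path and rewrite the file."""
    # Start from the existing object, or an empty one if the file is
    # missing, empty, or not valid JSON (JSONDecodeError is a ValueError).
    try:
        with open(path) as infile:
            data = json.load(infile)
    except (FileNotFoundError, ValueError):
        data = {}
    if not isinstance(data, dict):
        data = {}  # anything other than a JSON object is replaced

    # Each title maps to a list of book records (dicts, not sets).
    data.setdefault(title, []).append({
        "firstName": first_name,
        "lastName": last_name,
        "date": date,
        "pageCount": page_count,
        "publisher": publisher,
    })

    # Write the whole object to a temporary file, then atomically replace
    # the original so the file is always one complete JSON document.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as outfile:
        json.dump(data, outfile, indent=2)
    os.replace(tmp_path, path)
    return data
```

Calling add_book twice with the two books from the question leaves one JSON object with both titles as keys, which JSONLint accepts.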

A JSON document must contain a single top-level value, usually an array or an object (a dictionary in Python). In your case the file contains two objects, one with the key stephen-king-it and another with william-golding-lord of the flies. Either of these on its own would be valid, but placing them one after the other is not.
Using an array you could do this:
[
    { "stephen-king-it": [] },
    { "william-golding-lord of the flies": [] }
]
Or a dictionary style format (I would recommend this):
{
    "stephen-king-it": [],
    "william-golding-lord of the flies": []
}
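To see why the original output fails while the dictionary-style format passes, a quick check with json.loads is enough. The helper name is_valid_json is hypothetical; json.loads itself raises json.JSONDecodeError ("Extra data") when a second top-level value follows the first.

```python
import json

# What the original code produced after two calls: two top-level objects.
concatenated = '{"stephen-king-it": []} {"william-golding-lord of the flies": []}'

# The recommended shape: one object whose keys are the book titles.
merged = '{"stephen-king-it": [], "william-golding-lord of the flies": []}'


def is_valid_json(text):
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(concatenated))  # False
print(is_valid_json(merged))        # True
```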
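Also, the data you are appending looks like it should be key/value pairs in a dictionary (which would be ideal). Writing {'firstName:', firstName, ...} with commas creates a set of loose strings, which is why you needed set_default and why the output contains strings like "pageCount:233" in arbitrary order. You need to change it to this: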
data[title].append({
    'firstName': firstName,
    'lastName': lastName,
    'date': date,
    'pageCount': pageCount,
    'publisher': publisher
})
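The difference between the two literals is easy to check directly. This small sketch (variable names are illustrative) shows that the comma form builds a set, which the json module refuses to serialize, while the colon form builds a dict that maps straight to a JSON object.

```python
import json

first_name = "stephen"

# Commas between the items make this a set of four strings, not a dict.
as_set = {'firstName:', first_name, 'lastName:', 'king'}
# Colons between keys and values make this a dict.
as_dict = {'firstName': first_name, 'lastName': 'king'}

print(type(as_set).__name__)   # set
print(type(as_dict).__name__)  # dict

# Sets are not JSON serializable (hence set_default in the question),
# but dicts serialize directly as JSON objects.
try:
    json.dumps(as_set)
except TypeError as exc:
    print("TypeError:", exc)
print(json.dumps(as_dict))  # {"firstName": "stephen", "lastName": "king"}
```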

Related

How to print specific key:value pairs from a pickled dictionary

I made a pickled dictionary, now I only want to retrieve the values associated with a user-inputted country.
import pickle
choice = input("Choose a country: ")
choice.capitalize()
file_name = "nationDict.dat"
fileObject = open(file_name, 'rb')
countries = pickle.load(fileObject)
for choice in countries:
    print(choice)
I use this classmethod to create the dictionaries:
@classmethod
def dictMaker(cls):
    dictCont = {}
    dictPop = {}
    dictArea = {}
    dictDensity = {}
    for i in range(193):
        dictCont[Nation.country[i]] = Nation.continent[i]
        dictPop[Nation.country[i]] = Nation.population[i]
        dictArea[Nation.country[i]] = Nation.area[i]
        dictDensity[Nation.country[i]] = Nation.density[i]
    with open("nationDict.dat", 'wb') as pickUN:
        pickle.dump((dictCont, dictPop, dictArea, dictDensity), pickUN, protocol=pickle.HIGHEST_PROTOCOL)
I want to get the data for the chosen country only, but I don't understand how. I end up getting the data for every country: I do get the four different sets of information I want, but for all countries rather than one. Everything I find is about printing entire dictionaries; I can't find anything about retrieving individual values. I've tried just about every keyword to search this site.
I would consider storing your country data in a different form, such as a nested dictionary:
import pickle

# One inner dictionary per country, keyed by country name
countries = {
    Nation.country[i]: {
        "continent": Nation.continent[i],
        "population": Nation.population[i],
        "area": Nation.area[i],
        "density": Nation.density[i],
    }
    for i in range(193)
}

# Now you can pickle only one object:
with open("nationDict.dat", "wb") as fh:
    pickle.dump(countries, fh, protocol=pickle.HIGHEST_PROTOCOL)
And your script becomes:
import pickle

# capitalize() returns a new string, so assign the result
choice = input("Choose a country: ").capitalize()
file_name = "nationDict.dat"
with open(file_name, 'rb') as fh:
    countries = pickle.load(fh)
# .get() returns None instead of raising KeyError for an unknown country
print(countries.get(choice))
# {"continent": "Europe", "population": 123456789, "area": 12345, "density": 12345}
Once your script is working I recommend posting on Code Review.
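A runnable sketch of the nested-dictionary approach, with made-up placeholder data: the country names Freedonia and Sylvania and every number are hypothetical stand-ins for Nation.country, Nation.continent, and so on. It builds the inner dictionaries with zip() over the parallel lists instead of range(193), and uses an in-memory io.BytesIO buffer in place of nationDict.dat so the pickle round trip runs anywhere.

```python
import io
import pickle

# Hypothetical parallel lists standing in for Nation.country, Nation.continent, etc.
names = ["Freedonia", "Sylvania"]
continents = ["Europe", "Europe"]
populations = [100, 200]
areas = [10, 40]
densities = [10.0, 5.0]

# One inner dict per country, keyed by country name.
countries = {
    name: {"continent": cont, "population": pop, "area": area, "density": dens}
    for name, cont, pop, area, dens in zip(names, continents, populations, areas, densities)
}

# Round-trip through pickle; the buffer stands in for nationDict.dat.
buffer = io.BytesIO()
pickle.dump(countries, buffer, protocol=pickle.HIGHEST_PROTOCOL)
buffer.seek(0)
loaded = pickle.load(buffer)

choice = "sylvania".capitalize()  # "Sylvania"
print(loaded.get(choice))
# {'continent': 'Europe', 'population': 200, 'area': 40, 'density': 5.0}
print(loaded.get("Atlantis"))  # None, rather than a KeyError
```

One caveat: str.capitalize() lowercases everything after the first character, so multi-word names such as "United Kingdom" will not match unless the keys are normalized the same way.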
for countryDict in countries:
    print(countryDict[choice])
Should do the trick. The variable you have named countries is actually a tuple of dictionaries (dictCont, dictPop, dictArea, dictDensity), so the for loop iterates over each of those dicts and looks up the chosen country in each one. Note that this replaces your original loop, which reused the name choice as the loop variable and so overwrote the user's input. In this case, countries is a poor name choice: I read it and assumed it was a single dictionary mapping each country to a list of values, because I was too lazy to read your second code block. As a rule of thumb, always assume other coders are lazy. Trust me.
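If you keep the original tuple-of-dictionaries layout, unpacking the tuple gives each dictionary a descriptive name and makes the lookup explicit. In this sketch the data is hypothetical (Freedonia and its numbers are placeholders), an io.BytesIO buffer stands in for nationDict.dat, and lookup is an illustrative helper that returns None for an unknown country instead of raising KeyError.

```python
import io
import pickle

# Hypothetical data in the question's layout: four dicts keyed by country name.
dict_cont = {"Freedonia": "Europe"}
dict_pop = {"Freedonia": 100}
dict_area = {"Freedonia": 10}
dict_density = {"Freedonia": 10.0}

buffer = io.BytesIO()  # stands in for nationDict.dat
pickle.dump((dict_cont, dict_pop, dict_area, dict_density), buffer)
buffer.seek(0)

# Unpack the tuple so each dictionary gets its own name.
cont, pop, area, density = pickle.load(buffer)


def lookup(choice):
    """Return the four values for one country, or None if it is unknown."""
    if choice not in cont:
        return None
    return cont[choice], pop[choice], area[choice], density[choice]


print(lookup("Freedonia"))  # ('Europe', 100, 10, 10.0)
print(lookup("Atlantis"))   # None
```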

Syntax to load nested in nested keys of JSON files

I have a big tree in a JSON file, and I'm looking for the Python syntax to load keys nested several levels deep in this JSON.
Assume I have this:
{
    "FireWall": {
        "eth0": {
            "INPUT": {
                "PING": 1
            }
        }
    }
}
Based on the documentation and some Stack Overflow questions, I tried this (and some variations):
import json
config = open('config.json', 'r')
data = json.load('config')
config.close()
if data['{"FireWall", {"eth0", {"INPUT", {"Ping"}}}}'] == 1:
    print('This is working')
With no result. What is the right way to do this (as simple as possible)? Thank you!
You are passing the string 'config' to json.load, which expects a file object, and data['{"FireWall", {"eth0", {"INPUT", {"Ping"}}}}'] is not the right way to access a value in a nested dictionary. Index one level at a time instead (keys are case-sensitive, so use "PING" as in the file):
import json

with open('config.json', 'r') as f:
    data = json.load(f)

if data["FireWall"]["eth0"]["INPUT"]["PING"] == 1:
    print('This is working')
data is a nested dictionary, so:
data["FireWall"]["eth0"]["INPUT"]["PING"]
will be equal to 1, or at least it will once you fix your call to json.load.
Keys are case-sensitive, and the key in your file is "PING", not "Ping". Try this:
data["FireWall"]["eth0"]["INPUT"]["PING"]
This will give you the value stored under PING.
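Chained indexing raises KeyError as soon as one level is missing. When keys may be absent, a small helper that walks the path one key at a time can return a default instead. The name get_nested is hypothetical (it is not part of the json module), and the config is inlined as a string so the sketch runs without config.json.

```python
import json

data = json.loads('{"FireWall": {"eth0": {"INPUT": {"PING": 1}}}}')


def get_nested(mapping, *keys, default=None):
    """Walk nested dicts one key at a time; return default if any key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


print(get_nested(data, "FireWall", "eth0", "INPUT", "PING"))  # 1
print(get_nested(data, "FireWall", "eth0", "INPUT", "Ping"))  # None (case-sensitive)
print(get_nested(data, "FireWall", "eth1", "INPUT", "PING", default=0))  # 0
```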

Parsing json file to collect data and store in a list/array

I am trying to build an IoT setup, and I am thinking of using a JSON file to store the states of its sensors and lights.
I have created a function to test out my concept. Here is what I have written so far for the data side of things:
{
    "sensor_data": [
        {
            "sensor_id": "302CEM/lion/light1",
            "sensor_state": "on"
        },
        {
            "sensor_id": "302CEM/lion/light2",
            "sensor_state": "off"
        }
    ]
}
import json

def read_from_db():
    with open('datajson.json') as f:
        data = json.load(f)
        for sensors in data['sensor_data']:
            name = sensors['sensor_id']

read_from_db()
What I want to do is collect the sensor_id values into a list so that I can access them by index, for example sensor_name[0]. I am not sure how to go about it. I tried array.array, but it doesn't save any values, and I also tried .append, but didn't get the result I expected. Any suggestions?
If I understood correctly, all you have to do is collect the sensor IDs into a list (a list comprehension does the looping for you) and return the result:
import json

def read_from_db():
    with open('datajson.json') as f:
        data = json.load(f)
    names = [sensors['sensor_id'] for sensors in data['sensor_data']]
    return names

sensor_names = read_from_db()
# Individual entries are available by index, e.g. sensor_names[0]
for name in sensor_names:
    print(name)
This will print:
302CEM/lion/light1
302CEM/lion/light2
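If you also need each sensor's state, one option (an extension beyond the answer above) is to build a dictionary keyed by sensor_id alongside the list, so a state can be looked up by name instead of by position. The function name parse_sensors is hypothetical, and the question's sample is inlined as a string so the sketch runs without a file.

```python
import json

# The sample document from the question, inlined for a self-contained example.
sample = """
{"sensor_data": [
  {"sensor_id": "302CEM/lion/light1", "sensor_state": "on"},
  {"sensor_id": "302CEM/lion/light2", "sensor_state": "off"}
]}
"""


def parse_sensors(text):
    """Return (list of sensor IDs, dict mapping sensor ID to state)."""
    data = json.loads(text)
    names = [s["sensor_id"] for s in data["sensor_data"]]
    states = {s["sensor_id"]: s["sensor_state"] for s in data["sensor_data"]}
    return names, states


names, states = parse_sensors(sample)
print(names[0])                      # 302CEM/lion/light1
print(states["302CEM/lion/light2"])  # off
```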

(Python) merge new and existing JSON with deduplication

I'm querying an API with Python. The API sends JSON with the last X events, and I want to keep a history of what it has sent me.
This is what the API sends; my flat history file contains the same type of elements (but many more of them).
Neither the API response nor my history file has a key field on which to build a dictionary.
[{
    "Item1": "01234",
    "Item2": "Company",
    "Item3": "XXXXXXXXX",
    "Item4": "",
    "Item5": "2015-12-17T12:00:01.553",
    "Item6": "2015-12-18T12:00:00"
},
{
    "Item1": "01234",
    "Item2": "Company2",
    "Item3": "XXXXXXX",
    "Item4": null,
    "Item5": "2015-12-17T16:49:23.76",
    "Item6": "2015-12-18T11:00:00"
}]
How do I add elements from the API response only if they are not already in the original file?
I have a skeleton for opening and closing the file, but not many ideas about the processing.
main_file=open("History.json","r")
new_items=[]
api_data=requests.get(...)  # here lies the API address and the header
# here should be the deduplication/processing step
for item in api_data:
    if item not in main_file:
        new_items.append(item)
main_file.close()
try:
    file_updated = open("History.json",'w')
    file_updated.write(new_items + main_file)
    file_updated.close()
    print("File updated")
except :
    print("Error writing file")
EDIT: I used the JSON-to-object approach to do this:
from collections import namedtuple

Event = namedtuple('Event', 'Item1, Item2, Item3, Item4, Item5, Item6')

def parse_json_events(text):
    events = [ Event(**k) for k in json.loads(text) ]
    return events

if path.exists('Mainfile.json'):
    with open('Mainfile.json') as data_file:
        local_data = json.load(data_file)
        print(local_data.text) #debug purposes
        events_local=parse_json_events(local_data.text)
else:
    events_local=[]

events_api=parse_json_events(api_request.text)
inserted_events=0
for e in events_api[::-1]:
    if e not in events_local:
        events_local.insert(0, e)
        inserted_events=inserted_events+1
print("inserted elements %d" % inserted_events)
print(events_local) # this is OK, gives me a list of events
print(json.dump(events_local)) # this ... well... I want the list of object to be serialized but I get this error :
TypeError: dump() missing 1 required positional argument: 'fp'
Normally you solve this kind of problem by defining a schema, with or without a third-party tool (such as Avro or Thrift). Basically, every record you get from the API needs to be translated into an entity in the programming language you are using.
Let's take as an example this JSON object:
{
    "Item1": "01234",
    "Item2": "Company",
    "Item3": "XXXXXXXXX",
    "Item4": "",
    "Item5": "2015-12-17T12:00:01.553",
    "Item6": "2015-12-18T12:00:00"
}
If you have a schema like
class Company(object):
    company_number = ...
    name = ...
    # other fields
Then, all you need to do is to serialize and deserialize the raw data.
Ideally, you'd read the JSON response from the API and then convert each JSON object into a schema object (with or without a tool). In pseudocode:
api_client = client(http://..., )
response = api_client.get("/resources")
json = response.json
companies = parse_json_companies(json) # list of Company objects
At this point, the data you got from the API is easy to handle. Do the same for the files stored on the filesystem: load them and deserialize the records into Company objects. Then comparing records is easy, because they are ordinary Python objects that support equality checks and other comparisons.
For example:
from collections import namedtuple
import json

Company = namedtuple('Company', 'Item1, Item2, Item3, Item4, Item5, Item6')

def parse_json_companies(text):
    companies = [Company(**k) for k in json.loads(text)]
    return companies

>>> companies = parse_json_companies(response.text)
>>> companies
[Company(Item1='01234', Item2='Company', Item3='XXXXXXXXX', Item4='', Item5='2015-12-17T12:00:01.553', Item6='2015-12-18T12:00:00'), Company(Item1='01234', Item2='Company2', Item3='XXXXXXX', Item4=None, Item5='2015-12-17T16:49:23.76', Item6='2015-12-18T11:00:00')]
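Building on parse_json_companies, the deduplicating merge itself can be a short function. This is a sketch, not the answerer's code: merge_new is a hypothetical name, and it relies on namedtuples being hashable (true here because every field is a string or None), so membership checks against a set are cheap. New records keep their API order and go in front of the history, matching the question's reversed-insert loop. Company(**record) raises TypeError if the API ever adds an unexpected field.

```python
import json
from collections import namedtuple

Company = namedtuple("Company", "Item1 Item2 Item3 Item4 Item5 Item6")


def parse_json_companies(text):
    """Turn a JSON array of objects into a list of Company tuples."""
    return [Company(**record) for record in json.loads(text)]


def merge_new(history, incoming):
    """Put records from incoming that are not already in history in front of it.

    Namedtuples compare and hash by value, so a set gives fast membership checks
    and also drops duplicates within incoming itself.
    """
    seen = set(history)
    new = []
    for record in incoming:
        if record not in seen:
            new.append(record)
            seen.add(record)
    return new + history
```

Usage: merged = merge_new(parse_json_companies(history_text), parse_json_companies(api_text)), where history_text and api_text are the raw JSON strings from the file and the API.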
Update after the error on json.dump(obj, fp):
If you get that error with json.dump, please refer to the documentation. It clearly states that obj and fp are required arguments:
Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.
So you need to pass an object that supports .write() (e.g., a file opened in write mode). If you just want the JSON as a string, for example to print it, use json.dumps(obj) instead.
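A sketch of both serialization routes for the namedtuple events from the question. One detail worth knowing: a namedtuple is a tuple subclass, so json serializes it as a JSON array and the field names are lost; converting each event with its _asdict() method keeps them as object keys. save_events is a hypothetical helper name.

```python
import json
from collections import namedtuple

Event = namedtuple("Event", "Item1 Item2 Item3 Item4 Item5 Item6")

events = [Event("01234", "Company", "XXXXXXXXX", "",
                "2015-12-17T12:00:01.553", "2015-12-18T12:00:00")]

# json.dumps returns a string (no file needed), e.g. for printing.
# _asdict() turns each namedtuple into a dict so the field names are kept.
text = json.dumps([e._asdict() for e in events], indent=2)
print(text)


def save_events(path, events):
    """Write the events to path as a JSON array of objects with json.dump."""
    with open(path, "w") as fh:  # json.dump needs this writable file object
        json.dump([e._asdict() for e in events], fh, indent=2)
```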
I think the best way to solve this is to rethink your data structure. It seems you're currently using the same data structure as the API.
Is there an ID among these item fields? If so, use that field for deduplication. For this example I'll use the company name (Item2), which means history.json stores a dictionary keyed by name rather than a list.
import json
import requests

with open('history.json') as f:
    historic_data = json.load(f)  # assumed to be a dict keyed by company name

api_data = requests.get(API_URL).json()  # API_URL: placeholder for your endpoint
for item in api_data:
    historic_data[item['Item2']] = item

with open('history.json', 'w') as f:
    json.dump(historic_data, f)
If the name already exists in the dictionary, its entry is overwritten with the new item; otherwise the item is added.
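The keyed merge can be isolated as a pure function, which makes it easy to test without the API. In this sketch merge_by_key is a hypothetical name, the key defaults to Item2 as in the answer, and the sample records (including the later Item6 timestamp) are made up for illustration. Keying on a name keeps only the latest record per company, which is a different trade-off from keeping every event.

```python
def merge_by_key(history, api_records, key="Item2"):
    """Return a new dict: history updated with api_records, keyed by record[key].

    A record whose key already exists replaces the stored one; new keys are added.
    """
    merged = dict(history)  # shallow copy so the caller's dict is not modified
    for record in api_records:
        merged[record[key]] = record
    return merged


# Converting an existing list-shaped history into the keyed shape:
old_list = [{"Item2": "Company", "Item6": "2015-12-18T12:00:00"}]
history = {record["Item2"]: record for record in old_list}

api_records = [
    {"Item2": "Company", "Item6": "2015-12-19T12:00:00"},   # hypothetical update
    {"Item2": "Company2", "Item6": "2015-12-18T11:00:00"},  # new company
]
merged = merge_by_key(history, api_records)
print(sorted(merged))              # ['Company', 'Company2']
print(merged["Company"]["Item6"])  # 2015-12-19T12:00:00
```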

export list to csv and present to user via browser

Want to prompt browser to save csv
Working off the question above: the file exports correctly, but the data is not laid out correctly.
@view_config(route_name='csvfile', renderer='csv')
def csv(self):
    name = DBSession.query(table).join(othertable).filter(othertable.id == 9701).all()
    header = ['name']
    rows = []
    for item in name:
        rows = [item.id]
    return {
        'header': header,
        'rows': rows
    }
I'm getting _csv.Error: sequence expected. If I change writer.writerows(value['rows']) to writer.writerow(value['rows']) in my renderer, the file downloads via the browser just fine. The problem is that the data isn't split into rows: the entire result set is in one row, so each entry is in its own column rather than its own row.
First, I wonder whether having a return statement inside your for loop is also causing problems; in the linked example, it looks like the loop was in the statement before the return.
I think it's building a collection of rows based on table having columns with the same names as the headers. What columns does your table named table have?
name = DBSession.query(table).join(othertable).filter(othertable.id == 9701).all()
This is going to give you back essentially a collection of rows from table, as if you did a SELECT query on it.
Something like this:
name = DBSession.query(table).join(othertable).filter(othertable.id == 9701).all()
header = ['name']
rows = []
for item in name:
    # writerows() expects a sequence of rows, so wrap each value in a list
    rows.append([item.name])
return {
    'header': header,
    'rows': rows
}
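Figured it out. I kept getting Error: sequence expected, so I looked at the output and decided to try putting each result inside another list. The csv writer's writerows() expects a list of rows where each row is itself a sequence, so even a single column has to be wrapped as [value].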
@view_config(route_name='csv', renderer='csv')
def csv(self):
    d = datetime.now()
    query = DBSession.query(table, othertable).join(othertable).join(thirdtable).filter(
        thirdtable.sid == 9701)
    header = ['First Name', 'Last Name']
    rows = []
    filename = "csvreport" + d.strftime(" %m/%d").replace(' 0', '')
    for i in query:
        items = [i.table.first_name, i.table.last_name, i.othertable.login_time.strftime("%m/%d/%Y"),
                 ]
        rows.append(items)
    return {
        'header': header,
        'rows': rows,
        'filename': filename
    }
This accomplishes 3 things. Fills out the header, fills the rows, and passes through a filename.
Renderer should look like this:
import csv
from io import StringIO  # Python 3; the original used StringIO.StringIO (Python 2)

class CSVRenderer(object):
    def __init__(self, info):
        pass

    def __call__(self, value, system):
        fout = StringIO()
        # quotechar must differ from the delimiter; '"' is the csv default
        writer = csv.writer(fout, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(value['header'])
        writer.writerows(value['rows'])
        resp = system['request'].response
        resp.content_type = 'text/csv'
        resp.content_disposition = 'attachment; filename="' + value['filename'] + '.csv"'
        return fout.getvalue()
This way, you can use the same csv renderer anywhere else and be able to pass through your own filename. It's also the only way I could figure out how to get the data from one column in the database to iterate through one column in the renderer. It feels a bit hacky but it works and works well.
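The row-shape issue can be reproduced without Pyramid or a database. In this sketch rows_to_csv is a hypothetical helper that mirrors what the renderer does with value['header'] and value['rows']. Because csv.writer accepts any iterable as a row, a bare string is split into one column per character, and a bare integer raises csv.Error, which is why each value needs its own list.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Render a header and a list of rows (each a list or tuple) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


names = ["Alice", "Bob"]

# Wrap each value in its own list so every name becomes a one-column row.
print(repr(rows_to_csv(["name"], [[n] for n in names])))
# 'name\r\nAlice\r\nBob\r\n'

# Passing the bare strings makes csv treat each string as a row of characters.
print(repr(rows_to_csv(["name"], names)))
# 'name\r\nA,l,i,c,e\r\nB,o,b\r\n'
```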
