(Python) merge new and existing JSON with deduplication

I'm querying an API with Python. The API sends JSON describing the last X events, and I want to keep a history of everything it has sent me.
This is what the API sends; my flat history file contains the same type of elements (just many more of them).
Neither the API response nor my history file has a key on which to build a dictionary.
[{
    "Item1": "01234",
    "Item2": "Company",
    "Item3": "XXXXXXXXX",
    "Item4": "",
    "Item5": "2015-12-17T12:00:01.553",
    "Item6": "2015-12-18T12:00:00"
},
{
    "Item1": "01234",
    "Item2": "Company2",
    "Item3": "XXXXXXX",
    "Item4": null,
    "Item5": "2015-12-17T16:49:23.76",
    "Item6": "2015-12-18T11:00:00"
}]
How do I add elements from the API only if they are not already in the original file?
I have a skeleton for opening and closing the file, but not many ideas about the processing.
import requests
main_file = open("History.json", "r")
new_items = []
api_data = requests.get(...)  # here lies the API address and the header
# here should be the deduplication/processing step
for item in api_data:
    if item not in main_file:
        new_items.append(item)
main_file.close()
try:
    file_updated = open("History.json", 'w')
    file_updated.write(new_items + main_file)
    file_updated.close()
    print("File updated")
except:
    print("Error writing file")
EDIT: I used the JSON-to-object approach to do this:
import json
from collections import namedtuple
from os import path
Event = namedtuple('Event', 'Item1, Item2, Item3, Item4, Item5, Item6')
def parse_json_events(text):
    events = [Event(**k) for k in json.loads(text)]
    return events
if path.exists('Mainfile.json'):
    with open('Mainfile.json') as data_file:
        local_text = data_file.read()  # raw JSON text, parsed below
    print(local_text)  # debug purposes
    events_local = parse_json_events(local_text)
else:
    events_local = []
events_api = parse_json_events(api_request.text)  # api_request: the requests response from the API
inserted_events = 0
for e in events_api[::-1]:
    if e not in events_local:
        events_local.insert(0, e)
        inserted_events = inserted_events + 1
print("inserted elements %d" % inserted_events)
print(events_local)  # this is OK, gives me a list of events
print(json.dump(events_local))  # this ... well... I want the list of objects to be serialized, but I get this error:
TypeError: dump() missing 1 required positional argument: 'fp'

Normally you solve this kind of problem by defining a schema, with or without a third-party tool (like Avro, Thrift, etc.). Basically, every record you get from the API needs to be translated into an entity in the programming language you are using.
Let's take this JSON object as an example:
{
    "Item1": "01234",
    "Item2": "Company",
    "Item3": "XXXXXXXXX",
    "Item4": "",
    "Item5": "2015-12-17T12:00:01.553",
    "Item6": "2015-12-18T12:00:00"
}
If you have a schema like
class Company(object):
    company_number = ...
    name = ...
    # other fields
Then all you need to do is serialize and deserialize the raw data.
Ideally, you'd read the JSON response from the API and then turn each JSON object into a schema object (with or without a tool). In pseudocode:
api_client = client("http://...")
response = api_client.get("/resources")
raw_json = response.text  # the raw JSON string
companies = parse_json_companies(raw_json)  # list of Company objects
At this point, it's easy to handle the data you got from the API. Do the same for the files stored on the filesystem: load them and deserialize the records into Company objects. Because these behave like normal Python objects, you can then compare them directly (with ==, or with in against a list of existing records).
For example:
from collections import namedtuple
import json
Company = namedtuple('Company', 'Item1, Item2, Item3, Item4, Item5, Item6')
def parse_json_companies(text):
    companies = [Company(**k) for k in json.loads(text)]
    return companies
>>> companies = parse_json_companies(response.text)
>>> companies
[Company(Item1='01234', Item2='Company', Item3='XXXXXXXXX', Item4='', Item5='2015-12-17T12:00:01.553', Item6='2015-12-18T12:00:00'), Company(Item1='01234', Item2='Company2', Item3='XXXXXXX', Item4=None, Item5='2015-12-17T16:49:23.76', Item6='2015-12-18T11:00:00')]
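Namedtuples compare field by field and are hashable (as long as every value is, e.g. strings, numbers or None). That makes deduplication a matter of set membership. The following is a minimal sketch under those assumptions. It assumes both the history file and the API payload are JSON arrays of objects with exactly the six ItemN fields. `merge_events` is a hypothetical helper name. Mirroring the question's `insert(0, e)` loop, it places unseen API events in front of the existing history.

```python
import json
from collections import namedtuple

# Same six fields as in the example above
Company = namedtuple('Company', 'Item1, Item2, Item3, Item4, Item5, Item6')


def parse_json_companies(text):
    return [Company(**k) for k in json.loads(text)]


def merge_events(history_text, api_text):
    """Hypothetical helper: unseen API events first, then the existing history.

    Assumes both inputs are JSON arrays of objects with exactly Item1..Item6.
    """
    history = parse_json_companies(history_text)
    seen = set(history)  # namedtuples are hashable, so lookups are O(1)
    new_items = []
    for event in parse_json_companies(api_text):
        if event not in seen:
            seen.add(event)  # also drops duplicates inside the API payload itself
            new_items.append(event)
    return new_items + history


history_text = ('[{"Item1": "01234", "Item2": "Company", "Item3": "XXXXXXXXX", "Item4": "",'
                ' "Item5": "2015-12-17T12:00:01.553", "Item6": "2015-12-18T12:00:00"}]')
api_text = ('[{"Item1": "01234", "Item2": "Company", "Item3": "XXXXXXXXX", "Item4": "",'
            ' "Item5": "2015-12-17T12:00:01.553", "Item6": "2015-12-18T12:00:00"},'
            ' {"Item1": "01234", "Item2": "Company2", "Item3": "XXXXXXX", "Item4": null,'
            ' "Item5": "2015-12-17T16:49:23.76", "Item6": "2015-12-18T11:00:00"}]')

merged = merge_events(history_text, api_text)
print(len(merged))  # 2: only the Company2 event was new
```

If a field can hold a list or a nested object, the namedtuple is no longer hashable. In that case, fall back to the plain `in` test on a list, or use a string key per record.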
Update after the error on .dump(obj, fp):
If you get that error with json.dump, please refer to the documentation. It clearly states that obj and fp are required arguments:
Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.
So you need to pass an object that supports .write() (e.g., a file opened in write mode). If you only want the JSON as a string, for example to print it, use json.dumps(obj) instead.
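There is one more catch when serializing the events from the question. Namedtuples are tuples, so the json module writes each one as a JSON array and the field names are lost. The sketch below shows that behaviour and one way around it: `_asdict()` converts each event back to a dict before dumping. `save_events` and its `path` parameter are illustrative names, not part of the question's code.

```python
import json
from collections import namedtuple

Event = namedtuple('Event', 'Item1, Item2, Item3, Item4, Item5, Item6')

events = [Event('01234', 'Company2', 'XXXXXXX', None,
                '2015-12-17T16:49:23.76', '2015-12-18T11:00:00')]

# Tuples become JSON arrays: the output is a list of arrays, without field names
print(json.dumps(events))

# _asdict() restores the field names, giving back the API's object shape
as_objects = [e._asdict() for e in events]
print(json.dumps(as_objects))


def save_events(events, path):
    # json.dump needs the open file object as its second argument (fp)
    with open(path, 'w') as fp:
        json.dump([e._asdict() for e in events], fp, indent=2)
```

Reading the file back with `json.load` and `Event(**k)` then reproduces the same namedtuples.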

I think the best way to solve this is to rethink your data structure. It seems you're currently storing the data in the same structure the API uses.
Is there an ID among these item fields? If so, use that field for deduplication. For this example I'll use the company name.
import json
import requests
with open('history.json') as f:
    historic_data = json.load(f)  # must already be a dict keyed by company name
api_data = requests.get(...).json()  # the API address and headers go here
for item in api_data:
    historic_data[item['Item2']] = item
with open('history.json', 'w') as f:  # reopen in write mode to save
    f.write(json.dumps(historic_data))
Every time the name (in this case) already exists in the dictionary, its entry is overwritten; if it doesn't exist yet, it is added. Note that this assumes history.json holds a dictionary keyed by that field, not the list the API returns.
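The question says no single field is unique. In that case, one option is to use the whole record as the key. `json.dumps(item, sort_keys=True)` produces the same string for identical records, whatever order their keys arrived in. The following is a minimal sketch, assuming the history and the API payload are both lists of JSON objects. `record_key` and `merge_history` are hypothetical names.

```python
import json


def record_key(item):
    # Canonical form: identical records give identical strings,
    # regardless of key order
    return json.dumps(item, sort_keys=True)


def merge_history(history, api_items):
    """Return history plus any API items not already in it (order preserved)."""
    seen = {record_key(item) for item in history}
    merged = list(history)  # copy, so the caller's list is not modified
    for item in api_items:
        key = record_key(item)
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged


history = [{"Item1": "01234", "Item2": "Company", "Item4": ""}]
api_items = [
    {"Item2": "Company", "Item1": "01234", "Item4": ""},  # same record, different key order
    {"Item1": "01234", "Item2": "Company2", "Item4": None},
]
print(merge_history(history, api_items))
```

To persist the result, load the file with `json.load`, merge, and write the merged list back with `json.dump` to a file opened in `'w'` mode.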

Related

Access one item from a dict and store it into a variable

I am trying to get all the "uuid" values from an API, and the issue is that they are stored in a dict (I think). Here is how it looks in the API:
{
    "guild": {
        "_id": "5eba1c5f8ea8c960a61f38ed",
        "name": "Creators Club",
        "name_lower": "creators club",
        "coins": 0,
        "coinsEver": 0,
        "created": 1589255263630,
        "members": [{
            "uuid": "db03ceff87ad4909bababc0e2622aaf8",
            "rank": "Guild Master",
            "joined": 1589255263630,
            "expHistory": {
                "2020-06-01": 280,
                "2020-05-31": 4701,
                "2020-05-30": 0,
                "2020-05-29": 518,
                "2020-05-28": 1055,
                "2020-05-27": 136665,
                "2020-05-26": 34806
            }
        }]
    }
}
I'm interested in the "uuid" part. Note that there are multiple players (anywhere from 1 to 100), and I need every UUID.
This is what I have in my Python code to display the UUIDs on the website:
try:
    f = requests.get(
        "https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
    guildName = f["guild"]["name"]
    guildMembers = f["guild"]["members"]
    members = client.getPlayer(uuid=guildMembers)  # this converts a UUID to a player name
    # I need to store all UUIDs in variables and put them at "guildMembers"
That gives me all the UUID codes, and I will use client.getPlayer(uuid=---) to convert each UUID into a player name, so I have to loop through every UUID with that call. But first I need to save the UUIDs in variables. In my HTML template I have been using members.uuid to access the UUID, but I don't know how to do the .uuid part in Python.
If you need anything else, just comment :)

List comprehensions are a powerful tool here:
members = [client.getPlayer(uuid=member['uuid']) for member in guildMembers]
Edit:
If you want to insert the names back into your data (in guildMembers), use a dictionary comprehension in {uuid: member_name} format:
members = {member['uuid']: client.getPlayer(uuid=member['uuid']) for member in guildMembers}
Then you can update guildMembers with your results:
for member in guildMembers:
    member['name'] = members[member['uuid']]  # member is already the dict to update
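The following self-contained sketch shows the list and dictionary comprehensions in isolation, using a trimmed copy of the response from the question. Two assumptions apply. First, `get_player` is a hypothetical stand-in for `client.getPlayer`, whose real return type isn't shown, so it just builds a fake name. Second, the second member's UUID is made up.

```python
import json

# Trimmed copy of the question's response; the second member is made up
raw = """
{"guild": {"name": "Creators Club",
           "members": [{"uuid": "db03ceff87ad4909bababc0e2622aaf8", "rank": "Guild Master"},
                       {"uuid": "0123456789abcdef0123456789abcdef", "rank": "Member"}]}}
"""


def get_player(uuid):
    # Hypothetical stand-in for client.getPlayer(uuid=...)
    return "player-" + uuid[:6]


f = json.loads(raw)  # requests' .json() returns the same kind of dict
guild_members = f["guild"]["members"]

# 1. Every UUID, in order
uuids = [member["uuid"] for member in guild_members]

# 2. uuid -> player name, one lookup per member
names = {uuid: get_player(uuid) for uuid in uuids}

# 3. Write the names back into each member dict
for member in guild_members:
    member["name"] = names[member["uuid"]]

print(uuids)
print(guild_members[0]["name"])
```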

Assuming that guild is the main dictionary (f["guild"] in your code), containing a key called members whose value is a list of sub-dictionaries, you can try:
uuid = list()
for x in guild['members']:
    uuid.append(x['uuid'])
uuid now holds all the UUIDs.

If I understood the situation correctly, you just need to loop through all the received UUIDs and get each player's data. Something like this:
f = requests.get("https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
guildName = f["guild"]["name"]
guildMembers = f["guild"]["members"]
guildMembersData = dict()  # here we will save each member's data from the getPlayer method
for guildMember in guildMembers:
    uuid = guildMember["uuid"]
    memberData = client.getPlayer(uuid=uuid)
    guildMembersData[uuid] = memberData  # reuse the result instead of calling getPlayer twice
print(guildMembersData)  # here will be the players' data

Parsing json file to collect data and store in a list/array

I am trying to build an IoT setup and am thinking of using a JSON file to store the states of its sensors and lights.
I have created a function to test the concept. Here is what I have written so far for the data side:
{
    "sensor_data": [
        {
            "sensor_id": "302CEM/lion/light1",
            "sensor_state": "on"
        },
        {
            "sensor_id": "302CEM/lion/light2",
            "sensor_state": "off"
        }
    ]
}
import json
def read_from_db():
    with open('datajson.json') as f:
        data = json.load(f)
    for sensors in data['sensor_data']:
        name = sensors['sensor_id']
read_from_db()
What I want is to parse the sensor_id values into an array so that I can access them as, for example, sensor_name[0]. I am not sure how to go about it. I tried array.array, but it doesn't store any values; I also tried .append, but didn't get the result I expected. Any suggestions?

If I understood correctly, all you have to do is collect the sensor IDs into a list (here with a list comprehension) and return it:
import json
def read_from_db():
    with open('datajson.json') as f:
        data = json.load(f)
    names = [sensors['sensor_id'] for sensors in data['sensor_data']]
    return names
sensor_names = read_from_db()
for i in range(len(sensor_names)):
    print(sensor_names[i])
This will print:
302CEM/lion/light1
302CEM/lion/light2
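Each sensor has both an ID and a state, so a dictionary keyed by sensor_id may be handier than indexing by position. The following is a minimal sketch, assuming the layout from the question. The data is loaded from a string here so the example runs on its own; with the real file you would call `json.load(f)` inside a `with open(...)` block, as in the answer. `read_states` is a hypothetical name.

```python
import json

SAMPLE = """
{"sensor_data": [
    {"sensor_id": "302CEM/lion/light1", "sensor_state": "on"},
    {"sensor_id": "302CEM/lion/light2", "sensor_state": "off"}
]}
"""


def read_states(text):
    data = json.loads(text)
    # sensor_id -> sensor_state; dicts keep file order (Python 3.7+)
    return {s["sensor_id"]: s["sensor_state"] for s in data["sensor_data"]}


states = read_states(SAMPLE)
sensor_names = list(states)  # still indexable: sensor_names[0]
print(sensor_names[0], states[sensor_names[0]])
```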

Writing JSON data in python. Format

I have a method that writes JSON data to a file. The keys are book titles, and the data is each book's publisher, date, author, etc. The method works fine if I only want to add one book.
Code
import json
def createJson(title, firstName, lastName, date, pageCount, publisher):
    print("\n*** Inside createJson method for " + title + "***\n")
    data = {}
    data[title] = []
    data[title].append({
        'firstName:', firstName,
        'lastName:', lastName,
        'date:', date,
        'pageCount:', pageCount,
        'publisher:', publisher
    })
    with open('data.json', 'a') as outfile:
        json.dump(data, outfile, default=set_default)
def set_default(obj):
    if isinstance(obj, set):
        return list(obj)
if __name__ == '__main__':
    createJson("stephen-king-it", "stephen", "king", "1971", "233", "Viking Press")
JSON file after one method call (one book):
{
"stephen-king-it": [
["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
]
}
However, if I call the method multiple times to add more books to the JSON file, the format is all wrong. For instance, if I simply call the method twice with a main block of
if __name__ == '__main__':
    createJson("stephen-king-it", "stephen", "king", "1971", "233", "Viking Press")
    createJson("william-golding-lord of the flies", "william", "golding", "1944", "134", "Penguin Books")
my JSON file looks like this:
{
"stephen-king-it": [
["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
]
} {
"william-golding-lord of the flies": [
["pageCount:134", "publisher:Penguin Books", "firstName:william","lastName:golding", "date:1944"]
]
}
That is obviously wrong. Is there a simple way to change my method so it produces valid JSON? I looked at many simple examples online of writing JSON data in Python, but all of them gave me format errors when I checked the result on JSONLint.com. I have been racking my brain over this problem and editing the file to make it valid, but all my efforts were to no avail. Any help is appreciated. Thank you very much.

Simply appending new objects to your file doesn't create valid JSON. You need to add your new data inside the existing top-level object and then rewrite the entire file.
This should work:
def createJson(title, firstName, lastName, date, pageCount, publisher):
    print("\n*** Inside createJson method for " + title + "***\n")
    # Load any existing JSON data,
    # or create an empty object if the file is not found
    # or is empty (json.load raises ValueError on an empty file)
    try:
        with open('data.json') as infile:
            data = json.load(infile)
    except (FileNotFoundError, ValueError):
        data = {}
    if not data:
        data = {}
    data[title] = []
    data[title].append({
        'firstName:', firstName,
        'lastName:', lastName,
        'date:', date,
        'pageCount:', pageCount,
        'publisher:', publisher
    })
    # 'w' rewrites the whole file, so it stays a single JSON object
    with open('data.json', 'w') as outfile:
        json.dump(data, outfile, default=set_default)

A JSON file must contain a single top-level value, usually an array or an object (dictionary). In your case the file contains two objects, one with the key stephen-king-it and another with william-golding-lord of the flies. Either of these on its own would be valid, but placing them one after the other is not.
Using an array, you could do this:
[
{ "stephen-king-it": [] },
{ "william-golding-lord of the flies": [] }
]
Or use a dictionary-style format (which I would recommend):
{
"stephen-king-it": [],
"william-golding-lord of the flies": []
}
Also, the data you are appending looks like it should be key-value pairs in a dictionary (which would be ideal), but {'firstName:', firstName, ...} is actually a set literal, which is why set_default was needed. Change it to this:
data[title].append({
    'firstName': firstName,
    'lastName': lastName,
    'date': date,
    'pageCount': pageCount,
    'publisher': publisher
})
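Putting the two answers together, a minimal Python 3 sketch might look like the following. It loads the existing top-level object, or starts empty if the file is missing or empty. It then stores the book as a real dictionary and rewrites the whole file. Three details are choices, not requirements. The `path` parameter is an assumption added so the file name isn't hard-coded. `create_json` is a renamed version of the question's method. `indent=2` is only for readability.

```python
import json


def create_json(title, first_name, last_name, date, page_count, publisher,
                path='data.json'):
    # Load the existing top-level object; start fresh if the file is
    # missing, empty or not valid JSON
    try:
        with open(path) as infile:
            data = json.load(infile)
    except (FileNotFoundError, json.JSONDecodeError):
        data = {}

    # Real key/value pairs (a dict), so no set_default is needed
    data[title] = [{
        'firstName': first_name,
        'lastName': last_name,
        'date': date,
        'pageCount': page_count,
        'publisher': publisher,
    }]

    # Rewrite the whole file ('w', not 'a') so it stays one JSON object
    with open(path, 'w') as outfile:
        json.dump(data, outfile, indent=2)
```

Calling it once for "stephen-king-it" and once for "william-golding-lord of the flies" leaves a single object with both titles as keys, matching the dictionary-style format above.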

How to turn a list into a paginated JSON response for REST?

I'm new to Django REST Framework and have run into a problem.
I'm building a backend for a social app. The task is to return a paginated JSON response to the client. In the docs I only found how to do that for model instances, but what I have is a list:
[368625, 507694, 687854, 765213, 778491, 1004752, 1024781, 1303354, 1311339, 1407238, 1506842, 1530012, 1797981, 2113318, 2179297, 2312363, 2361973, 2610241, 3005224, 3252169, 3291575, 3333882, 3486264, 3860625, 3964299, 3968863, 4299124, 4907284, 4941503, 5120504, 5210060, 5292840, 5460981, 5622576, 5746708, 5757967, 5968243, 6025451, 6040799, 6267952, 6282564, 6603517, 7271663, 7288106, 7486229, 7600623, 7981711, 8106982, 8460028, 10471602]
Is there a nice way to do this? Do I have to serialize it in some special way?
What the client expects is:
{"users": [{"id": "368625"}, {"id": "507694"}, ...]}
The question is: how do I paginate such a response?
Any input is highly appreciated!
Best regards,
Alexey.

TLDR:
import json
data=[368625, 507694, 687854, 765213, 778491, 1004752, 1024781, 1303354, 1311339, 1407238, 1506842, 1530012, 1797981, 2113318, 2179297, 2312363, 2361973, 2610241, 3005224, 3252169, 3291575, 3333882, 3486264, 3860625, 3964299, 3968863, 4299124, 4907284, 4941503, 5120504, 5210060, 5292840, 5460981, 5622576, 5746708, 5757967, 5968243, 6025451, 6040799, 6267952, 6282564, 6603517, 7271663, 7288106, 7486229, 7600623, 7981711, 8106982, 8460028, 10471602]
print(json.dumps({"users": [{"id": str(value)} for value in data]}))
import json imports the json package, a JSON serialization/deserialization library.
json.dumps(obj) takes obj, a Python object, and serializes it to a JSON string.
[{"id": str(value)} for value in data] is a list comprehension that creates a list of Python dictionaries, mapping "id" to each value in the data list (converted to a string, since the client expects "id": "368625").
EDIT: Pagination
I'm not sure whether there's a standard for pagination, but a simple model would be:
{
    "data": {
        "prevPage": "id",
        "nextPage": "id",
        "data": [
            ...
        ]
    }
}
Honestly, implementing that in Python wouldn't be that hard:
data = [ ... ]  # your list of IDs
pages = []      # stand-in for "some database of pages"
pageSize = 5
currentPage = {"pageID": 0, "data": []}
prevPage = {"pageID": -1}
for value in data:
    currentPage["data"].append({"id": value})
    if len(currentPage["data"]) == pageSize:  # count the items, not the dict's keys
        currentPage["prevPage"] = prevPage["pageID"]
        prevPage["nextPage"] = currentPage["pageID"]
        pages.append(currentPage)  # add currentPage to some database of pages
        prevPage = currentPage
        currentPage = {"pageID": currentPage["pageID"] + 1, "data": []}
Obviously, this isn't very polished (for example, a final page with fewer than pageSize items is never stored), but it shows the basic concept.
EDIT: Pagination without storing pages
You could, of course, recreate the page every time it is requested:
def getPage(pageNum, pageSize=5):
    # calculate pageStart and pageEnd based on your own requirements (pages numbered from 0 here)
    pageStart = pageNum * pageSize
    pageEnd = pageStart + pageSize
    return [{"id": value} for value in data[pageStart:pageEnd]]
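The following self-contained illustration shows the no-storage approach. It slices the list and wraps the page in the `{"users": [{"id": "..."}]}` shape the client expects, adding previous and next page numbers. This is plain Python, not Django REST Framework's pagination API. The extra fields `page`, `previous` and `next`, and the 1-based page numbering, are assumptions for illustration. `paginate` is a hypothetical name.

```python
import json


def paginate(ids, page, page_size=5):
    """Return one page of ids in the client's format (pages start at 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    chunk = ids[start:start + page_size]
    return {
        "users": [{"id": str(value)} for value in chunk],
        "page": page,
        # None (null in JSON) marks the first / last page
        "previous": page - 1 if page > 1 else None,
        "next": page + 1 if start + page_size < len(ids) else None,
    }


data = [368625, 507694, 687854, 765213, 778491, 1004752, 1024781]
print(json.dumps(paginate(data, 2)))
```

In a DRF view you could return this dict through `Response(...)`, taking the page number from `request.query_params`; that wiring is left as an exercise.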

python iterate json file where the json structure and key values are unknown

Consider the sample JSON below.
{
"widget": {
"test": "on",
"window": {
"title": "myWidget1",
"name": "main_window"
},
"image": {
"src": "Images/wid1.png",
"name": "wid1"
}
},
"os":{
"name": "ios"
}
}
Consider the case where we don't know the structure of the JSON or any of its keys. I need to implement a Python function that iterates through all the keys and sub-keys and prints each key; that is, knowing only the JSON file name, I should be able to iterate over all keys and sub-keys. The JSON can have any structure. What I have tried is below.
import json
import os
JSON_PATH = r"D:\workspace\python\sampleJSON.json"
JSON_PATH = os.path.expanduser(JSON_PATH)
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        for child in key.get(key):
            iterateAllKeys(child)
with open(JSON_PATH) as data_file:
    data = json.load(data_file)
iterateAllKeys(data)
Here, the iterateAllKeys() function is supposed to print all the keys in the JSON file. If only the outer loop is present, i.e.
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
it prints the keys "widget" and "os". But
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        for child in key.get(key):
            iterateAllKeys(child)
raises an error: AttributeError: 'unicode' object has no attribute 'get'. My understanding is that since the value of 'child' is not a dict object, we cannot apply 'key.get()'. Is there an alternative way to iterate over the JSON file without specifying any of the key names? Thank you.

You can use recursion to iterate through multi-level dictionaries like this:
def iter_dict(dic):
    for key in dic:
        print(key)
        if isinstance(dic[key], dict):
            iter_dict(dic[key])
The keys of the top-level dictionary are iterated and each key is printed; whenever a value is itself an instance of dict, the function calls itself recursively to iterate through that nested dictionary too.
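The function above does not descend into lists, so keys inside objects nested in JSON arrays are skipped. The following is a minimal sketch that also walks lists and yields each key together with its path. The generator name `iter_keys`, the `/`-joined path format, and the extra `"tags"` entry in the sample are illustrative choices.

```python
import json


def iter_keys(node, path=()):
    """Yield (path, key) for every dict key, descending into dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path, key
            yield from iter_keys(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_keys(value, path + (index,))


# The question's sample, shortened, plus a made-up "tags" list to show list handling
sample = json.loads("""
{"widget": {"test": "on",
            "window": {"title": "myWidget1", "name": "main_window"}},
 "os": {"name": "ios"},
 "tags": [{"label": "a"}]}
""")

for path, key in iter_keys(sample):
    print("/".join(str(p) for p in path + (key,)))
```

Printing `key` alone instead of the joined path gives the same output as the recursive answer, but now including keys found inside arrays.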

You can do this with an auxiliary package such as flatten_json:
pip install flatten_json
from flatten_json import flatten
for key in flatten(your_dict).keys():
    print(key)
Output:
widget_test
widget_window_title
widget_window_name
widget_image_src
widget_image_name
os_name
If you want to show only the key without the whole path, you can do this instead (note that it gives the wrong result for keys that themselves contain underscores):
print(key.split('_')[-1])

First of all, your last function:
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        for child in key.get(key):
            iterateAllKeys(child)
key is just the dictionary key (a string), so if anything you should be using e.get(key) or e[key]:
for child in e.get(key):
That alone would not solve your problem, though. One workaround is to use try/except, as follows:
def iterateAllKeys(e):
    for key in e.iterkeys():
        print key
        try:
            iterateAllKeys(e[key])
        except AttributeError:  # e[key] is not a dict, so it has no .iterkeys()
            print "---SKIP---"
This is maybe not the best workaround, but it works.
With your data it prints the following:
widget
test
---SKIP---
window
name
---SKIP---
title
---SKIP---
image
src
---SKIP---
name
---SKIP---
os
name
---SKIP---
