The fastest way to structure data in JSON for Python

I have a complex document that I am trying to structure as conveniently and efficiently as possible with JSON in Python. I would like to be able to retrieve one of the items in my document with one line (i.e. not via a for loop).
A demo of the structure looks like this:
{
    "movies": {
        "0": {
            "name": "charles",
            "id": 0,
            "loopable": true
        },
        "1": {
            "name": "ray",
            "id": 1,
            "loopable": true
        }
    }
}
I want to be able to easily fetch a movie based on its id field. To do this, right now, I have made each key of the movies object the same as that movie's id. So when I json.load the object, to find movie 1's name I can just do movies[(id)]['name']
It seems like I should have a list of movies in the json file but it also seems like that would be more complicated. It could look like this:
{
    "movies": [
        {
            "name": "charles",
            "id": 0,
            "loopable": true
        },
        {
            "name": "ray",
            "id": 1,
            "loopable": true
        }
    ]
}
but if that were the case I would have to loop through the entire array like this:
for movie in movies:
    if movie['id'] == id:
        # Now I can get movie['name']
        name = movie['name']
        break
Is there a more efficient way of doing this?

Let 'movies' be a dict and not a list:
{
    "movies": {
        "12": {
            "name": "charles",
            "id": 12,
            "loopable": true
        },
        "39": {
            "name": "ray",
            "id": 39,
            "loopable": true
        }
    }
}
and you can access movie by id with yourjson['movies'][str(id)]
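The lookup above can be sketched as follows. This is a minimal example, assuming the document is available as a string and parsed with json.loads; the helper name get_movie and the by_id index are illustrative and not part of the original answer. JSON object keys are always strings, which is why the integer id is passed through str().

```python
import json

raw = """
{
  "movies": {
    "12": {"name": "charles", "id": 12, "loopable": true},
    "39": {"name": "ray", "id": 39, "loopable": true}
  }
}
"""

yourjson = json.loads(raw)


def get_movie(doc, movie_id):
    """Return the movie with the given id, or None if it is missing."""
    # JSON object keys are always strings, so the integer id is converted.
    return doc["movies"].get(str(movie_id))


print(get_movie(yourjson, 39)["name"])  # ray

# If the file stores a list instead, an index can be built once after loading.
movie_list = list(yourjson["movies"].values())
by_id = {movie["id"]: movie for movie in movie_list}
print(by_id[12]["name"])  # charles
```

Building the index costs one pass over the list, after which every lookup is a single dict access.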

Related

Remove duplicate JSON keys in python

I have a 2M entries json that looks like this with some keys as integers that occasionally repeat and I'm trying to figure out how to remove all duplicates in a dictionary:
{
"1": {
"id": 1,
"some_data": "some_data_1",
},
"2": {
"id": 2,
"some_data": "some_data_2",
},
"2": {
"id": 2,
"some_data": "some_data_2",
},
"3": {
"id": 3,
"some_data": "some_data_3",
},
}
So, basically, I'm looking for a generic function that will iterate over dict_keys, look for duplicates and return a clean json. Tried flicking with { each[''] : each for each in te }.values(), but to no avail.
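One thing worth knowing here: when Python's json module parses an object with repeated keys, the result is an ordinary dict, so the repeats collapse on their own and the last value wins. The sketch below assumes the trailing commas from the sample have been removed (they are not valid JSON) and uses object_pairs_hook to keep the first occurrence and record which keys repeated. The names keep_first and duplicates are illustrative.

```python
import json

raw = """
{
  "1": {"id": 1, "some_data": "some_data_1"},
  "2": {"id": 2, "some_data": "some_data_2"},
  "2": {"id": 2, "some_data": "some_data_2"},
  "3": {"id": 3, "some_data": "some_data_3"}
}
"""

duplicates = []


def keep_first(pairs):
    """Build a dict from (key, value) pairs, keeping the first value per key."""
    result = {}
    for key, value in pairs:
        if key in result:
            duplicates.append(key)  # record the repeat instead of overwriting
        else:
            result[key] = value
    return result


# The hook is called for every JSON object, including the nested ones.
clean = json.loads(raw, object_pairs_hook=keep_first)
print(list(clean))  # ['1', '2', '3']
print(duplicates)   # ['2']
print(json.dumps(clean, indent=2))
```

If the last occurrence is acceptable, a plain json.loads(raw) followed by json.dumps already yields a document without repeated keys.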

Converting from json to dataframe to sql

I'm trying to save all the json data to the sql database and I'm using python so I decided to use pandas.
Part of the JSON:
{
"stores": [
{
"ID": "123456",
"name": "Store 1",
"status": "Active",
"date": "2019-03-28T15:20:00Z",
"tagIDs": null,
"location": {
"cityID": 2,
"countryID": 4,
"geoLocation": {
"latitude": 1.13121,
"longitude": 103.4324231
},
"postcode": "123456",
"address": ""
},
"new": false
},
{
"ID": "223456",
"name": "Store 2",
"status": "Active",
"date": "2020-03-28T15:20:00Z",
"tagIDs": [
12,
35
],
"location": {
"cityID": 21,
"countryID": 5,
"geoLocation": {
"latitude": 1.12512,
"longitude": 103.23342
},
"postcode": "223456",
"address": ""
},
"new": true
}
]
}
My Code:
response = requests.get(.....)
result = response.text
data = json.loads(result)
df = pd.json_normalize(data["stores"])
.....
db_connection = sqlalchemy.create_engine(.....)
df.to_sql(con=db_connection, name="store", if_exists="append" )
Error: _mysql_connector.MySQLInterfaceError: Python type list cannot be converted
How I want the dataframe to look:
   ID      tagIDs      date
0  123456  []          2020-04-23T09:32:26Z
1  223456  [12,35]     2019-05-24T03:21:39Z
2  323456  [709,1493]  2019-03-28T15:38:39Z
I tried using different dataframes & json objects so far and they all work.
So I discovered the issue is with the json object.
Without the "tagIDs", everything else works fine.
I was thinking maybe if I converted the object to a string it can be parsed to sql but it didn't work either. How do I change the tagIDs such that I can parse everything to sql? Or is there another more efficient way to do this?
I think the tagIDs field is a list and your database does not seem to be happy with it.
I'm not sure this is the best way, but you can try converting it from a list to a string:
df['tagIDs'] = df['tagIDs'].apply(lambda x: str(x))
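A variation on the same idea, sketched with only the standard library: serializing the list with json.dumps instead of str() keeps the stored text parseable later with json.loads. The helper name serialize_tags and the shortened sample records are assumptions for illustration; the resulting rows could then be passed to pd.DataFrame(rows) before calling to_sql.

```python
import json

# Shortened sample records modelled on the "stores" list in the question.
stores = [
    {"ID": "123456", "name": "Store 1", "tagIDs": None},
    {"ID": "223456", "name": "Store 2", "tagIDs": [12, 35]},
]


def serialize_tags(record):
    """Return a copy of the record with tagIDs stored as a JSON string."""
    row = dict(record)
    # A missing or null tagIDs becomes "[]" so the column has one consistent type.
    row["tagIDs"] = json.dumps(row.get("tagIDs") or [])
    return row


rows = [serialize_tags(store) for store in stores]
print(rows[0]["tagIDs"])  # []
print(rows[1]["tagIDs"])  # [12, 35]

# Reading a value back from the database is then a plain json.loads call.
print(json.loads(rows[1]["tagIDs"]))  # [12, 35]
```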

I need help figuring out how to turn online data into a usable list that I can print data from

In a program I am working on, I use ACRCloud's music fingerprinting service. After uploading the data I need identified, I am given back this piece of data:
re = ACRCloudRecognizer(config)
data = (re.recognize_by_file('audio_name.mp3', 0))
>>>data
'{"metadata":{"timestamp_utc":"2020-05-18 23:00:59","music":[{"label":"NoCopyrightSounds","play_offset_ms":125620,"duration_ms":326609,"external_ids":{},"artists":[{"name":"Culture Code & Regoton"}],"result_from":1,"acrid":"a53ea40c6a8b4a6795ac3d799f6a4aec","title":"Waking Up","genres":[{"name":"Electro"}],"album":{"name":"Waking Up"},"score":100,"external_metadata":{},"release_date":"2014-05-25"}]},"cost_time":5.5099999904633,"status":{"msg":"Success","version":"1.0","code":0},"result_type":0}\n'
I think it's a list, but I am unable to figure out how to navigate it or grab specific information from it. I'm unsure how they set up the information, and what patterns to look for. Ideally, I would like to create a print function that would print the title, artists, and album.
Any help is much appreciated!
Formatting the JSON makes it more legible:
{
"metadata": {
"timestamp_utc": "2020-05-18 23:00:59",
"music": [
{
"label": "NoCopyrightSounds",
"play_offset_ms": 125620,
"duration_ms": 326609,
"external_ids": {},
"artists": [
{
"name": "Culture Code & Regoton"
}
],
"result_from": 1,
"acrid": "a53ea40c6a8b4a6795ac3d799f6a4aec",
"title": "Waking Up",
"genres": [
{
"name": "Electro"
}
],
"album": {
"name": "Waking Up"
},
"score": 100,
"external_metadata": {},
"release_date": "2014-05-25"
}
]
},
"cost_time": 5.5099999904633,
"status": {
"msg": "Success",
"version": "1.0",
"code": 0
},
"result_type": 0
}
It looks like you're looking for .metadata.music[0].title (presumably; music is a list), but only if .status.code is 0.
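In Python, that navigation might look like the sketch below. It assumes the value returned by the recognizer is a JSON string, as shown in the question, and that a status code of 0 means success. The function name describe_tracks is hypothetical, and the sample string is a shortened copy of the response.

```python
import json

# Shortened copy of the response string from the question.
data = (
    '{"metadata":{"music":[{"artists":[{"name":"Culture Code & Regoton"}],'
    '"title":"Waking Up","album":{"name":"Waking Up"}}]},'
    '"status":{"msg":"Success","version":"1.0","code":0},"result_type":0}\n'
)


def describe_tracks(response_text):
    """Return one 'title / artists / album' line per recognized track."""
    result = json.loads(response_text)  # the response is a JSON string, not a list
    if result["status"]["code"] != 0:
        return []
    lines = []
    for track in result["metadata"]["music"]:  # "music" is a list of matches
        artists = ", ".join(artist["name"] for artist in track["artists"])
        lines.append(f'{track["title"]} / {artists} / {track["album"]["name"]}')
    return lines


for line in describe_tracks(data):
    print(line)  # Waking Up / Culture Code & Regoton / Waking Up
```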

Find a value in JSON using Python

I’ve previously succeeded in parsing data from a JSON file, but now I’m facing a problem with the function I want to achieve. I have a list of names, identification numbers and birthdate in a JSON. What I want to get in Python is to be able to let a user input a name and retrieve his identification number and the birthdate (if present).
This is my JSON example file:
[
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": null
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
To be clear, I want to input "V410Z8" and get his name and his birthdate.
I tried to write some code in Python but I only succeed in searching for “id_number” and not for what is inside “id_number” for example "V410Z8".
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json

database = "example.json"
# Load the whole file as a list of dictionaries
with open(database) as f:
    data = json.load(f)
id_number = data[0]["id_number"]
print(id_number)
Thank you for your support, guys :)
You have to iterate over the list of dictionaries and search for the one with the given id_number. Once you find it you can print the rest of its data and break, assuming id_number is unique.
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
for i in data:
    if i['id_number'] == 'V410Z8':
        print(i['birthdate'])
        print(i['name'])
        break
If you have control over the data structure, a more efficient way would be to use the id_number as a key (again, assuming id_number is unique):
data = { "SA4784" : {"name": "Mark", "birthdate": None},
"V410Z8" : { "name": "Vincent", "birthdate": "15/02/1989"},
"CZ1094" : {"name": "Paul", "birthdate": "27/09/1994"}
}
Then all you need to do is try to access it directly:
try:
    print(data["V410Z8"]["name"])
except KeyError:
    print("ID doesn't exist")
# prints: Vincent
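If the file has to stay a list, the keyed structure can still be built once after loading. This is a minimal sketch, assuming id_number is unique; the names by_id and lookup are illustrative.

```python
data = [
    {"id_number": "SA4784", "name": "Mark", "birthdate": None},
    {"id_number": "V410Z8", "name": "Vincent", "birthdate": "15/02/1989"},
    {"id_number": "CZ1094", "name": "Paul", "birthdate": "27/09/1994"},
]

# One pass over the list; every later lookup is a single dict access.
by_id = {person["id_number"]: person for person in data}


def lookup(id_number):
    """Return a short description for the id, or a message if it is unknown."""
    person = by_id.get(id_number)
    if person is None:
        return "ID doesn't exist"
    birthdate = person["birthdate"] or "unknown"  # birthdate may be null
    return f'{person["name"]}, born {birthdate}'


print(lookup("V410Z8"))  # Vincent, born 15/02/1989
print(lookup("SA4784"))  # Mark, born unknown
print(lookup("XX0000"))  # ID doesn't exist
```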
Using lambda in Python
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
Using Lambda and filter
print(list(filter(lambda x:x["id_number"]=="CZ1094",data)))
Output
[{'id_number': 'CZ1094', 'name': 'Paul', 'birthdate': '27/09/1994'}]
You can use list comprehension:
Given
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "15/02/1989"
},
{
"id_number": "CZ1094",
"name": "Paul",
"birthdate": "27/09/1994"
}
]
to get the list item(s) with id_number equal to "V410Z8" you may use:
result = [x for x in data if x["id_number"]=="V410Z8"]
result will contain:
[{'id_number': 'V410Z8', 'name': 'Vincent', 'birthdate': '15/02/1989'}]
In case the if condition is not satisfied, result will contain an empty list: []
data = [
{
"id_number": "SA4784",
"name": "Mark",
"birthdate": None
},
{
"id_number": "V410Z8",
"name": "Vincent",
"birthdate": "14/02/1989"
},
{
"id_number": "CZ1093",
"name": "Paul",
"birthdate": "26/09/1994"
}
]
list(filter(lambda x: x["id_number"] == "CZ1093", data))
Output should be
[{
"id_number": "CZ1093",
"name": "Paul",
"birthdate": "26/09/1994"
}]
If you are only interested in one or a subset of total results, then I'd suggest a generator function as the fastest solution, since it will not unnecessarily iterate over every item regardless, and is more memory efficient:
def gen_func(data, search_term):
    for i in data:
        if i['id_number'] == search_term:
            yield i
You can then run the following to retrieve results for CZ1094:
foo = gen_func(data, 'CZ1094')
next(foo)
{'id_number': 'CZ1094', 'name': 'Paul', 'birthdate': '27/09/1994'}
NB: You'll need to handle StopIteration at end of iterable.
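One way to avoid handling StopIteration explicitly is to pass a default to next(). The sketch below uses a generator expression in place of gen_func; the function name find_first is illustrative.

```python
data = [
    {"id_number": "SA4784", "name": "Mark", "birthdate": None},
    {"id_number": "V410Z8", "name": "Vincent", "birthdate": "15/02/1989"},
    {"id_number": "CZ1094", "name": "Paul", "birthdate": "27/09/1994"},
]


def find_first(data, search_term):
    """Return the first matching record, or None when nothing matches."""
    matches = (item for item in data if item["id_number"] == search_term)
    # The second argument to next() is returned instead of raising StopIteration.
    return next(matches, None)


print(find_first(data, "CZ1094"))
print(find_first(data, "missing"))  # None
```

As with the generator function, iteration stops at the first match, so the rest of the list is never examined.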

python querying a json objectpath

I've a nested json structure, I'm using objectpath (python API version), but I don't understand how to select and filter some information (more precisely the nested information in the structure).
E.g.
I want to select the "description" of the action "reading" for the user "John".
JSON:
{
"user":
{
"actions":
[
{
"name": "reading",
"description": "blablabla"
}
]
"name": "John"
}
}
CODE:
$.user[#.name is 'John' and #.actions.name is 'reading'].actions.description
but it doesn't work (it returns an empty set, although the data is present in my JSON).
Any suggestion?
Is this what you are trying to do?
import objectpath
data = {
"user": {
"actions": {
"name": "reading",
"description": "blablabla"
},
"name": "John"
}
}
tree = objectpath.Tree(data)
result = tree.execute("$.user[#.name is 'John'].actions[#.name is 'reading'].description")
for entry in result:
    print(entry)
Output
blablabla
I had to fix your JSON. Also, tree.execute returns a generator. You could replace the for loop with print(next(result)), but the for loop seemed clearer.
from objectpath import Tree
your_json = {"name": "felix", "last_name": "diaz"}
# This json path will bring all the key-values of your json
your_json_path='$.*'
my_key_values = Tree(your_json).execute(your_json_path)
# If you want to retrieve the name node...then specify it.
my_name= Tree(your_json).execute('$.name')
# If you want to retrieve the last_name node...then specify it.
last_name= Tree(your_json).execute('$.last_name')
I believe you're just missing a comma in JSON:
{
"user":
{
"actions": [
{
"name": "reading",
"description": "blablabla"
}
],
"name": "John"
}
}
Assuming there is only one "John", with only one "reading" activity, the following query works:
$.user[#.name is 'John'].actions[0][#.name is 'reading'][0].description
If there could be multiple "John"s, with multiple "reading" activities, the following query will almost work:
$.user.*[#.name is 'John'].actions..*[#.name is 'reading'].description
I say almost because the use of .. will be problematic if there are other nested dictionaries with "name" and "description" entries, such as
{
"user": {
"actions": [
{
"name": "reading",
"description": "blablabla",
"nested": {
"name": "reading",
"description": "broken"
}
}
],
"name": "John"
}
}
For a fully correct query, see the open issue about correctly implementing queries into arrays: https://github.com/adriank/ObjectPath/issues/60
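As a point of comparison, the same selection can be written in plain Python without ObjectPath, which sidesteps the nested-dictionary problem because only the top level of each action is inspected. This is a swapped-in technique rather than an ObjectPath query, and the function name action_descriptions is hypothetical.

```python
doc = {
    "user": {
        "actions": [
            {
                "name": "reading",
                "description": "blablabla",
                "nested": {"name": "reading", "description": "broken"},
            }
        ],
        "name": "John",
    }
}


def action_descriptions(doc, user_name, action_name):
    """Return descriptions of the named action for the named user."""
    user = doc["user"]
    if user["name"] != user_name:
        return []
    # Only the top level of each action is inspected, so "nested" is ignored.
    return [
        action["description"]
        for action in user.get("actions", [])
        if action["name"] == action_name
    ]


print(action_descriptions(doc, "John", "reading"))  # ['blablabla']
```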
