How do I use simplejson to decode JSON responses to Python objects?

JSON serialization in Python using simplejson
How do I create an object so that its serialization can be optimized? I'm using simplejson.
1 and 2 are fixed variables.
3 is a fixed dict of category and score.
4 is an array of dicts; each dict has a fixed length (4 keys), and the array's length is specified at run-time.
The process needs to be as fast as possible, so I'm not sure about the best solution.
{
    "always-include": true,
    "geo": null,
    "category-score": [
        {
            "Arts-Entertainment": 0.72,
            "Business": 0.03,
            "Computers-Internet": 0.08,
            "Gaming": 0.02,
            "Health": 0.02
        }
    ],
    "discovered-entities": [
        {
            "relevance": "0.410652",
            "count": "2",
            "type": "TelevisionStation",
            "text": "Fox News"
        },
        {
            "relevance": "0.396494",
            "count": "2",
            "type": "Organization",
            "text": "NBA"
        }
    ]
}

Um...
import simplejson as json
result_object = json.loads(input_json_string)
?
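If the question is really about the encoding direction (building the structure above and serializing it quickly), the usual fast route with simplejson is to assemble plain dicts and lists and pass them straight to dumps, since those types are serialized natively. A minimal sketch, with placeholder values taken from the example above:
import simplejson as json

# Plain dicts/lists serialize natively; no custom default= hook is needed.
payload = {
    "always-include": True,
    "geo": None,
    "category-score": [
        {"Arts-Entertainment": 0.72, "Business": 0.03}
    ],
    "discovered-entities": [
        {"relevance": "0.410652", "count": "2",
         "type": "TelevisionStation", "text": "Fox News"}
    ],
}

output_json_string = json.dumps(payload)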

Related

Properly form JSON with df.to_json with dataframe containing nested JSON

I have the following situation:
id items
3b68b7b2-f42c-418b-aa88-02450d66b616 [{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, {quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}, {quantity=10.0, item_id=8dbd22f2-cc13-4776-b58c-4d6fe0f3463e, description=abc def}]
where one of my columns has a nested JSON list inside of it.
I wish to output the data of this dataframe as proper JSON, including the nested list.
So, for example, calling df.to_json(orient='records', indent=4) on the above dataframe yields:
[
{
"id": "3b68b7b2-f42c-418b-aa88-02450d66b616",
"items": "[{quantity=3.0, item_id=210defdb-de69-4d03-bddd-7db626cd501b, description=Abc}, {quantity=1.0, item_id=ff457660-5f30-4432-a5af-564a9dee0029, description=xyz . 23}, {quantity=10.0, item_id=8dbd22f2-cc13-4776-b58c-4d6fe0f3463e, description=abc def}]"
}
]
whereas I want:
[
{
"id": "3b68b7b2-f42c-418b-aa88-02450d66b616",
"items": [
{
"quantity": 3.0,
"item_id": "210defdb-de69-4d03-bddd-7db626cd501b",
"description": "Abc"
},
{
"quantity": 1.0,
"item_id": "ff457660-5f30-4432-a5af-564a9dee0029",
"description": "xyz . 23"
},
{
"quantity": 10.0,
"item_id": "8dbd22f2-cc13-4776-b58c-4d6fe0f3463e",
"description": "abc def"
}
]
}
]
Is this possible using df.to_json()? I have tried to use regex to parse the resulting string, but due to the data contained therein, it is unfortunately extremely difficult to "jsonify" the fields I want.
You don't have a list but a string, and this string is not valid JSON, so you need a bit of pre-processing.
Assuming a non-nested structure, you can use:
import json
# quote the keys and values so each string becomes valid JSON, then parse it
out = (df.assign(items=df['items'].str.replace(r'(\w+)=([^,}]+)', r'"\1": "\2"', regex=True)
                                  .apply(json.loads))
         .to_dict(orient='records'))
Output:
[{'id': '3b68b7b2-f42c-418b-aa88-02450d66b616',
'items': [{'description': 'Abc',
'item_id': '210defdb-de69-4d03-bddd-7db626cd501b',
'quantity': '3.0'},
{'description': 'xyz . 23',
'item_id': 'ff457660-5f30-4432-a5af-564a9dee0029',
'quantity': '1.0'},
{'description': 'abc def',
'item_id': '8dbd22f2-cc13-4776-b58c-4d6fe0f3463e',
'quantity': '10.0'}]}]
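If the end goal is a proper JSON string rather than a list of Python dicts, the parsed records can then be re-serialized; a small follow-up sketch using the out variable from above:
print(json.dumps(out, indent=4))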

csv to complex nested json

So, I have a huge CSV file that looks like:
PN,PCA Code,MPN Code,DATE_CODE,Supplier Code,CM Code,Fiscal YEAR,Fiscal MONTH,Usage,Defects
13-1668-01,73-2590,MPN148,1639,S125,CM1,2017,5,65388,0
20-0127-02,73-2171,MPN170,1707,S125,CM1,2017,9,11895,0
19-2472-01,73-2302,MPN24,1711,S119,CM1,2017,10,4479,0
20-0127-02,73-2169,MPN170,1706,S125,CM1,2017,9,7322,0
20-0127-02,73-2296,MPN170,1822,S125,CM1,2018,12,180193,0
15-14399-01,73-2590,MPN195,1739,S133,CM6,2018,11,1290,0
What I want to do is group up all the data by PCA Code. So, a PCA Code will have a certain number of parts, those parts are manufactured under certain MPN Codes, and the final nested JSON structure that I want looks like:
[
    {
        "PCA": {
            "code": "73-2590",
            "CM": ["CM1", "CM6"],
            "parts": [
                {
                    "number": "13-1668-01",
                    "manufacturer": [
                        {
                            "id": "MPN148",
                            "info": [
                                {
                                    "date_code": 1639,
                                    "supplier": {
                                        "id": "S125",
                                        "FYFM": "2020-9",
                                        "usage": 65388,
                                        "defects": 0
                                    }
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    }
]
So, I want this structure for multiple part numbers (PNs) having different MPNs with different Date Codes and so on.
I am currently using Pandas to do this but I'm stuck on how to proceed with the nesting.
My code so far:
import json
import pandas as pd

dataframe = pd.read_csv('files/dppm_wc.csv')
data = {'PCAs': []}
for key, group in dataframe.groupby('PCA Code'):
    for index, row in group.iterrows():
        temp_dict = {'PCA Code': key, 'CM Code': row['CM Code'], 'parts': []}

with open('output.txt', 'w') as file:
    file.write(json.dumps(data, indent=4))
How do I proceed to achieve the nested JSON format that I want? Is there a better way to do this than what I am doing?
I don't really understand what you wish to do with that structure, but I guess it could be achieved with something like this
data = {'PCAs': []}
for key, group in df.groupby('PCA Code'):
    temp_dict = {'PCA Code': key, 'CM Code': [], 'parts': []}
    for index, row in group.iterrows():
        temp_dict['CM Code'].append(row['CM Code'])
        temp_dict['parts'].append(
            {'number': row['PN'],
             'manufacturer': [
                 {
                     'id': row['MPN Code'],
                     'info': [
                         {
                             'date_code': row['DATE_CODE'],
                             'supplier': {'id': row['Supplier Code'],
                                          'FYFM': '%s-%s' % (row['Fiscal YEAR'], row['Fiscal MONTH']),
                                          'usage': row['Usage'],
                                          'defects': row['Defects']}
                         }
                     ]
                 }]
             }
        )
    data['PCAs'].append(temp_dict)
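The nested structure can then be written out the same way as in the question. One caveat (an assumption about the CSV dtypes): pandas often hands back NumPy scalar types such as numpy.int64, which json.dumps does not serialize natively, so a default= fallback is a safe addition:
import json

with open('output.txt', 'w') as file:
    # default=str covers NumPy scalars that json.dumps cannot handle on its own
    file.write(json.dumps(data, indent=4, default=str))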

How to access a MongoDB array in which key-value pairs are stored, by key name

I am working with pymongo, and after writing the aggregate query
db.collection.aggregate([{'$project': {'Id': '$ResultData.Id','data' : '$Results.Data'}}])
I received the object:
{'data': [{'key': 'valid', 'value': 'true'},
          {'key': 'number', 'value': '543543'},
          {'key': 'name', 'value': 'Saturdays cx'},
          {'key': 'message', 'value': 'it is valid.'},
          {'key': 'city', 'value': 'London'},
          {'key': 'street', 'value': 'Bigeye'},
          {'key': 'pc', 'value': '3566'}]}
Is there a way that I can access the values by key name? For example, '$Results.Data.city' would give me London. I would like to do that at the level of the MongoDB aggregate query, meaning I want to write a query of the form:
db.collection.aggregate([{'$project':
    {'Id': '$ResultData.Id',
     'data': '$Results.Data',
     'city': '$Results.Data.city',
     'name': '$Results.Data.name',
     'street': '$Results.Data.street',
     'pc': '$Results.Data.pc',
    }}])
And receive all the values of provided keys.
Using the $elemMatch projection operator in the following query from the mongo shell:
db.collection.find(
    { _id: <some_value> },
    { _id: 0, data: { $elemMatch: { key: "city" } } }
)
The output:
{ "data" : [ { "key" : "city", "value" : "London" } ] }
Using PyMongo (gets the same output):
collection.find_one(
    { '_id': <some_value> },
    { '_id': 0, 'data': { '$elemMatch': { 'key': 'city' } } }
)
Using PyMongo aggregate method (gets the same result):
import pprint

INPUT_KEY = 'city'

pipeline = [
    {
        '$project': {
            '_id': 0,
            'data': {
                '$filter': {
                    'input': '$data', 'as': 'dat',
                    'cond': { '$eq': [ '$$dat.key', INPUT_KEY ] }
                }
            }
        }
    }
]

pprint.pprint(list(collection.aggregate(pipeline)))
Calling the received object "result": if result['data'] is always a list of dictionaries with the two keys 'key' and 'value', you can convert the whole list into a single dictionary, using each 'key' entry as the key and each 'value' entry as the value. Since that statement is somewhat confusing, here's the code:
data = {pair['key']: pair['value'] for pair in result['data']}
From here, data['city'] will give you 'London', data['street'] will be 'Bigeye', and so on. Obviously, this assumes that there are no conflicts among the keys in result['data']. Note that this dictionary will (just like the original result['data']) only contain strings, so don't expect data['number'] to be an integer.
Another approach would be to dynamically create an object holding each key-value pair as an attribute, allowing you to use the syntax data.city, data.street, and so on, as sketched below. But this requires slightly more code and is a less common and less robust approach.
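For completeness, a minimal sketch of that attribute-style access using types.SimpleNamespace, built on top of the dictionary from the previous snippet (the same caveat about string values applies):
from types import SimpleNamespace

data = {pair['key']: pair['value'] for pair in result['data']}
obj = SimpleNamespace(**data)  # each key becomes an attribute

print(obj.city)    # 'London'
print(obj.street)  # 'Bigeye'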

How to best display a random string, with some strings weighted more heavily than others

I am trying to display a random string but would like some strings to occur more often than others. My current strategy is with nested dictionaries for ease of updating and the 'choices' function.
msg_list = {
    'msg_1': {
        'msg': 'Hi',
        'weight': 40,
    },
    'msg_2': {
        'msg': 'hello',
        'weight': 50,
    },
    'msg_3': {
        'msg': "What's up",
        'weight': 10,
    },
}
message = choices(msg_list['msg'], msg_list['weight'])
string = message['msg']
This obviously doesn't work, and I imagine I could build the lists with a loop, but I am curious if there is a faster way of doing this. Thanks!
You're almost there.
You just need to create lists for the 2 parameters of random.choices.
from random import choices

msg_list = {
    'msg_1': {
        'msg': 'Hi',
        'weight': 40,
    },
    'msg_2': {
        'msg': 'hello',
        'weight': 50,
    },
    'msg_3': {
        'msg': "What's up",
        'weight': 10,
    },
}

weights = [msg_list[key]['weight'] for key in msg_list.keys()]
messages = [msg_list[key]['msg'] for key in msg_list.keys()]

message = choices(messages, weights)
string = message[0]
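As a small aside, random.choices returns a list of length k (default 1), so the single draw can also be unpacked directly, or several weighted draws made at once:
string, = choices(messages, weights)     # unpack the single draw
batch = choices(messages, weights, k=5)  # five weighted draws, with replacement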

Python dictionary comprehension filtering

I have a list of dictionaries, for instance:
movies = [
    {
        "name": "The Help",
        "imdb": 8.0,
        "category": "Drama"
    },
    {
        "name": "The Choice",
        "imdb": 6.2,
        "category": "Romance"
    },
    {
        "name": "Colonia",
        "imdb": 7.4,
        "category": "Romance"
    },
    {
        "name": "Love",
        "imdb": 6.0,
        "category": "Romance"
    },
    {
        "name": "Bride Wars",
        "imdb": 5.4,
        "category": "Romance"
    },
    {
        "name": "AlphaJet",
        "imdb": 3.2,
        "category": "War"
    },
    {
        "name": "Ringing Crime",
        "imdb": 4.0,
        "category": "Crime"
    }
]
I want to filter them by IMDB rating > 5.5.
I tried this code:
[ { k:v for (k,v) in i.items() if i.get("imdb") > 5.5 } for i in movies]
and the output:
[{'name': 'The Help', 'imdb': 8.0, 'category': 'Drama'},
{'name': 'The Choice', 'imdb': 6.2, 'category': 'Romance'},
{'name': 'Colonia', 'imdb': 7.4, 'category': 'Romance'},
{'name': 'Love', 'imdb': 6.0, 'category': 'Romance'},
{},
{},
{}]
When the IMDB rating is lower than 5.5, it returns an empty dictionary. Any ideas? Thank you!
A dictionary comprehension is not necessary to filter a list of dictionaries.
You can just use a list comprehension with a condition based on a dictionary value:
res = [d for d in movies if d['imdb'] > 5.5]
The way your code is written, the dictionary comprehension produces an empty dictionary in cases where i['imdb'] <= 5.5.
An alternative to using a list comprehension is the filter function from the Python builtins. It takes in a function and an iterable, and returns a "filter object" that only keeps the items which, when passed through the function, return True.
In this case, it would be:
list(filter(lambda x:x["imdb"]>5.5, movies))
I included the list() around everything to convert the filter object to a list you can use. If you want to learn more about the filter builtin, you can read about it here.
Other answers have already provided better ways of doing this, but let's look at the approach you were taking and see what was going on.
If I delete some things from your code, I get:
[{} for i in movies]
Looking at just that should make it clear why a dictionary is created for each movie. You do have an if condition inside that dictionary comprehension, but because it is inside, it doesn't change whether the dictionary itself gets created.
To do this the way you were going about it, you'd essentially need to check twice, making the first check redundant:
[
{ k:v for (k,v) in i.items() if i.get("imdb") > 5.5 } for i in movies if i.get("imdb") > 5.5
]
which can be simplified to just
[
{ k:v for (k,v) in i.items()} for i in movies if i.get("imdb") > 5.5
]
and now, since we aren't changing the item, just:
[
i for i in movies if i.get("imdb") > 5.5
]
If you are happy to use a 3rd party library, Pandas accepts a list of dictionaries via the pd.DataFrame constructor:
import pandas as pd
df = pd.DataFrame(movies)
res = df[df['imdb'] > 5.5].to_dict('records')
Result:
[{'category': 'Drama', 'imdb': 8.0, 'name': 'The Help'},
{'category': 'Romance', 'imdb': 6.2, 'name': 'The Choice'},
{'category': 'Romance', 'imdb': 7.4, 'name': 'Colonia'},
{'category': 'Romance', 'imdb': 6.0, 'name': 'Love'}]
