logical error in python dictionary traversal - python

One of my MongoDB queries, run through PyMongo, returns:
{ "_id" : { "origin" : "ABE", "destination" : "DTW", "carrier" : "EV" }, "Ddelay" : -5.333333333333333,
"Adelay" : -12.666666666666666 }
{ "_id" : { "origin" : "ABE", "destination" : "ORD", "carrier" : "EV" }, "Ddelay" : -4, "Adelay" : 14 }
{ "_id" : { "origin" : "ABE", "destination" : "ATL", "carrier" : "EV" }, "Ddelay" : 6, "Adelay" : 14 }
I am traversing the result as shown below in my Python module, but I get only two of the three results. I believe I should not be using len(results) as I currently do. Can you please help me traverse the result correctly? I need to display all three results in the resulting JSON document on the web UI.
Thank you.
Code:
pipe = [{'$match': {'origin': {"$in": [origin_ID]}}},
        {"$group": {'_id': {'origin': "$origin", 'destination': "$dest", 'carrier': "$carrier"},
                    "Ddelay": {'$avg': "$dep_delay"}, "Adelay": {'$avg': "$arr_delay"}}},
        {"$limit": 4}]
results = connect.aggregate(pipeline=pipe)
#pdb.set_trace()
DATETIME_FORMAT = '%Y-%m-%d'
for x in range(len(results)):
    origin = (results['result'][x])['_id']['origin']
    destination = (results['result'][x])['_id']['destination']
    carrier = (results['result'][x])['_id']['carrier']
    Adelay = (results['result'][x])['Adelay']
    Ddelay = (results['result'][x])['Ddelay']
    obj = {'Origin': origin,
           'Destination': destination,
           'Carrier': carrier,
           'Avg Arrival Delay': Adelay,
           'Avg Dep Delay': Ddelay}
    json_result.append(obj)
return json.dumps(json_result, indent=2, sort_keys=False, separators=(',', ':'))

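PyMongo returns the result in this format:
{u'ok': 1.0, u'result': [...]}
So you should iterate over the list stored under 'result':
for x in results['result']:
    ...
In your code you take the length of the dict, not the length of the result list. The dict has two keys ('ok' and 'result'), so len(results) is 2 and the loop stops after two rows. Note that this dict-style reply comes from older PyMongo releases; from PyMongo 3.0 on, aggregate() returns a cursor that you iterate over directly.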
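As a minimal sketch of that fix, the loop below walks the list stored under 'result' instead of indexing by the length of the dict. The `results` value is hard-coded sample data shaped like the reply shown above (an assumption standing in for the real aggregate call), and `rows_to_json` is a hypothetical helper name.

```python
import json

# Sample data shaped like the dict-style reply shown above; it stands in
# for the real aggregate() call, which is not executed here.
results = {
    'ok': 1.0,
    'result': [
        {'_id': {'origin': 'ABE', 'destination': 'DTW', 'carrier': 'EV'},
         'Ddelay': -5.333333333333333, 'Adelay': -12.666666666666666},
        {'_id': {'origin': 'ABE', 'destination': 'ORD', 'carrier': 'EV'},
         'Ddelay': -4, 'Adelay': 14},
        {'_id': {'origin': 'ABE', 'destination': 'ATL', 'carrier': 'EV'},
         'Ddelay': 6, 'Adelay': 14},
    ],
}


def rows_to_json(results):
    """Hypothetical helper: flatten every row under 'result' into a JSON string."""
    json_result = []
    for row in results['result']:  # iterate over the rows, not the dict keys
        json_result.append({
            'Origin': row['_id']['origin'],
            'Destination': row['_id']['destination'],
            'Carrier': row['_id']['carrier'],
            'Avg Arrival Delay': row['Adelay'],
            'Avg Dep Delay': row['Ddelay'],
        })
    return json.dumps(json_result, indent=2)


print(len(results))            # 2: the number of keys in the dict
print(len(results['result']))  # 3: the number of rows
print(rows_to_json(results))
```

Iterating over the rows directly also removes the need for an index variable, so the loop keeps working whatever the number of rows is.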

Related

How to rearrange my output to be a single object and be separated by a comma ? Mongodb/python

Note: I use Flask and PyMongo.
How can I rearrange my data so that it is all output in a single object, with the entries separated by commas? (See the end of the post for an example.)
I have a collection with data similar to this, and I need to output the number of times each category appears in the collection, for example Sandwiches: 13:
{
"id":"J6qWt6XIUmIGFHX5rQJA-w",
"categories":[
{
"alias":"sandwiches",
"title":"Sandwiches"
}
]
}
With this first request:
restos.aggregate([{$unwind:"$categories"},
{$group:{_id:'$categories.title', count:{$sum:1}}},
{$project:{ type:"$_id", count: 1,_id:0}}])
I managed to get output like this:
{ "count" : 3, "type" : "Sandwiches" }
But what I want is the type as the key and the count as the value, like this: { "Sandwiches" : 3 }
I was able to partially make it work with the command below, but that's not really the format I want:
.aggregate([{'$unwind': '$categories'},
{'$group': {'_id': '$categories.title', 'count': {'$sum': 1}}},
{'$project': {'type': '$_id', 'count': 1, '_id': 0}},
{'$replaceRoot': {'newRoot': {'$arrayToObject': [[{'k': '$type', 'v': '$count'}]]}}}])
The output was:
{
"restaurants": [
{
"Polish": 1
},
{
"Salad": 3
},
{
"Convenience Stores": 1
},
{
"British": 2
}]}
But my desired output is something like this, without the array and with all the data contained in a single object:
{
"restaurants": {
"Sandwiches": 13,
"pizza": 15,
...
}
}
As for the list: I use Flask, and when I return my jsonify object I put 'restaurants': list(db.restaurants.aggregate([
but when I remove the list() call I get this error: TypeError: Object of type CommandCursor is not JSON serializable
Any idea how to do that? Thanks a lot :)
If you can get data like the following:
{ "count" : 13, "type" : "Sandwiches" }
you can do it like this:
data = [{ "count" : 13, "type" : "Sandwiches" }, { "count" : 15, "type" : "Pizza" }]
output = {}
p = {}
for d in data:  # read each item in the list
    p.update({d['type']: d['count']})  # build a p dict keyed by type
output.update({'restaurants': p})  # build an output dict with a restaurants key
print(output)
# {'restaurants': {'Sandwiches': 13, 'Pizza': 15}}
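The same merge can be written more compactly with a dict comprehension. This is only a sketch: `docs` is sample data shaped like the aggregation output, and `merge_counts` is a hypothetical helper name, not part of Flask or PyMongo.

```python
# Sample documents shaped like the aggregation output above; with PyMongo,
# the cursor returned by aggregate() could be passed in the same way,
# since the helper only needs something it can iterate over.
docs = [
    {"count": 13, "type": "Sandwiches"},
    {"count": 15, "type": "Pizza"},
]


def merge_counts(docs):
    """Hypothetical helper: fold {'type': ..., 'count': ...} documents into one dict."""
    return {d["type"]: d["count"] for d in docs}


output = {"restaurants": merge_counts(docs)}
print(output)
# {'restaurants': {'Sandwiches': 13, 'Pizza': 15}}
```

The result is a plain dict of strings and integers, so it should be JSON serializable as is, unlike the raw CommandCursor from the error message.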

python how to search a string, count values and group by in json

I have a Python program that calls an API and receives a result like the one below:
{
"result": [
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "3"
},
{
"company" : "BMW",
"model" : "7"
},
{
"company" : "AUDI",
"model" : "A3"
},
{
"company" : "AUDI",
"model" : "A7"
}
]
}
Now my task is to identify the number of occurrences of elements from the list in JSON output and group them. The expected output should look like this:
{
"BMW" :
{
"5series" : 3,
"3series" : 1,
"7series" : 1
},
"AUDI" :
{
"A3" : 1,
"A7" : 1
},
"MERCEDES":
{
"EClass" : 0,
"SClass" : 0
}
}
I need to look up each "company" from a list of names. The list may include names that are sometimes absent from the JSON response; in that case the expected output should still include them, with counts of 0. The "model" names (3, 5, 7, A3, etc.) are fixed, so we know those are the only ones that may or may not appear in the JSON API response.
For example, the list in the code below has 3 company names: companyname = ["BMW","AUDI","MERCEDES"]. However, the JSON API response may sometimes lack one or more of them. In this case "MERCEDES" is missing, but the final output should still include "MERCEDES" with its values set to 0.
Here is what I have tried so far:
def modelcount():
    companyname = ["BMW", "AUDI", "MERCEDES"]
    url = apiurl
    # Send Request
    apiresponse = requests.get(url, auth=(user, password), headers=headers, proxies=proxies)
    # Decode the JSON response into a dictionary and use the data
    data = apiresponse.json()
    print(len(data['result']))
    # Python names cannot start with a digit, so 3series is written series3, etc.
    series3 = 0
    series5 = 0
    series7 = 0
    A3 = 0
    A7 = 0
    EClass = 0
    SClass = 0
    modelcountjson = {}
    for name in companyname:
        models = {}
        for item in data['result']:
            if item['company'] == name:
                # The API returns the model as a string, so compare with '3', not 3
                if item['model'] == '3':
                    series3 = series3 + 1
                elif item['model'] == '5':
                    series5 = series5 + 1
                elif item['model'] == '7':
                    series7 = series7 + 1
        models['3series'] = series3
        models['5series'] = series5
        models['7series'] = series7
        # I still haven't written AUDI, MERCEDES above. This is where I feel I am writing inefficiently.
        modelcountjson[name] = models
    return jsonify(modelcountjson)
As the number of models grows, I am worried that the code will become redundant, with many for loops, and may cause performance overhead. I am looking for help achieving the end result in the most efficient way.
Thank you so much for your help.
A useful package for working directly with JSON-style dictionaries and lists is toolz (see documentation for more details). This way you can concisely group the data and count occurrences of each model while handling potentially missing data separately:
from toolz import itertoolz
result = {
"result": [
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "5"
},
{
"company" : "BMW",
"model" : "3"
},
{
"company" : "BMW",
"model" : "7"
},
{
"company" : "AUDI",
"model" : "A3"
},
{
"company" : "AUDI",
"model" : "A7"
},
]
}
final_output = {}
grouped_result = itertoolz.groupby('company', result['result'])
if 'MERCEDES' not in grouped_result:
    final_output['MERCEDES'] = {
        'EClass': 0,
        'SClass': 0
    }
for key, value in grouped_result.items():
    models = itertoolz.pluck('model', value)
    final_output[key] = itertoolz.frequencies(models)
The output results in:
{'AUDI': {'A3': 1, 'A7': 1}, 'BMW': {'3': 1, '5': 3, '7': 1}, 'MERCEDES': {'EClass': 0, 'SClass': 0}}
You could go for a bit of a separation of code and config:
conf = {
    'BMW': {'format': '{}series', 'keys': ['3', '5', '7']},
    'AUDI': {'format': '{}', 'keys': ['A3', 'A7']},
    'MERCEDES': {'format': '{}Class', 'keys': ['E', 'S']},
}
def modelcount():
    # retrieve `data`
    # ...
    # start every expected model at 0 so missing companies still appear
    result = {
        k: {
            v['format'].format(key): 0 for key in v['keys']
        } for k, v in conf.items()
    }
    for car in data['result']:
        com = car['company']
        mod = car['model']
        key = conf[com]['format'].format(mod)
        result[com][key] += 1
    # add a per-company total
    for com in result:
        result[com]['Total'] = sum(result[com].values())
    return result
>>> modelcount()
{'BMW': {'3series': 1, '5series': 3, '7series': 1, 'Total': 5},
 'AUDI': {'A3': 1, 'A7': 1, 'Total': 2},
 'MERCEDES': {'EClass': 0, 'SClass': 0, 'Total': 0}}
This way, for more companies and models, you will only have to touch the conf, not the code. The time complexity of this is O(m+n) with m the total number of distinct models and n the number of cars in the API response.
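A standard-library variant of the same idea, offered as a sketch rather than a drop-in replacement: count every (company, model) pair once with collections.Counter, then read the counts back for each expected label. The `EXPECTED` mapping and the `count_models` name are assumptions made for this illustration; the Mercedes model strings "E" and "S" in particular are guessed from the config above, since the sample response contains no Mercedes rows.

```python
from collections import Counter

# Expected output label -> model string as it appears in the API response.
# This mapping is an assumption built from the expected output in the question.
EXPECTED = {
    "BMW": {"3series": "3", "5series": "5", "7series": "7"},
    "AUDI": {"A3": "A3", "A7": "A7"},
    "MERCEDES": {"EClass": "E", "SClass": "S"},
}


def count_models(records, expected=EXPECTED):
    """Count (company, model) pairs in one pass, then fill in every expected label."""
    seen = Counter((r["company"], r["model"]) for r in records)
    # A Counter returns 0 for pairs it never saw, so absent models come out as 0.
    return {
        company: {label: seen[(company, model)] for label, model in labels.items()}
        for company, labels in expected.items()
    }


sample = [
    {"company": "BMW", "model": "5"},
    {"company": "BMW", "model": "5"},
    {"company": "BMW", "model": "5"},
    {"company": "BMW", "model": "3"},
    {"company": "BMW", "model": "7"},
    {"company": "AUDI", "model": "A3"},
    {"company": "AUDI", "model": "A7"},
]

print(count_models(sample))
# {'BMW': {'3series': 1, '5series': 3, '7series': 1}, 'AUDI': {'A3': 1, 'A7': 1}, 'MERCEDES': {'EClass': 0, 'SClass': 0}}
```

One design difference worth noting: a company or model that appears in the response but not in `EXPECTED` is silently ignored here, whereas the config-based version above would raise a KeyError for it.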

i want to convert sample JSON data into nested JSON using specific key-value in python

I have the sample data below in JSON format.
project_cost_details is my database result set after querying.
{
"1": {
"amount": 0,
"breakdown": [
{
"amount": 169857,
"id": 4,
"name": "SampleData",
"parent_id": "1"
}
],
"id": 1,
"name": "ABC PR"
}
}
Here is full json : https://jsoneditoronline.org/?id=2ce7ab19af6f420397b07b939674f49c
Expected output :https://jsoneditoronline.org/?id=56a47e6f8e424fe8ac58c5e0732168d7
I created this sample JSON using loops in my code, but I am stuck on how to convert it to the expected JSON format. I am getting a flat, sequential structure and need to convert it to a tree-like, nested JSON format.
My attempt in Python:
project_cost = {}
for cost in project_cost_details:
    if cost.get('Parent_Cost_Type_ID'):
        project_id = str(cost.get('Project_ID'))
        parent_cost_type_id = str(cost.get('Parent_Cost_Type_ID'))
        if project_id not in project_cost:
            project_cost[project_id] = {}
        if "breakdown" not in project_cost[project_id]:
            project_cost[project_id]["breakdown"] = []
        if 'amount' not in project_cost[project_id]:
            project_cost[project_id]['amount'] = 0
        project_cost[project_id]['name'] = cost.get('Title')
        project_cost[project_id]['id'] = cost.get('Project_ID')
        if parent_cost_type_id == cost.get('Cost_Type_ID'):
            project_cost[project_id]['amount'] += int(cost.get('Amount'))
        #if parent_cost_type_id is None:
        project_cost[project_id]["breakdown"].append(
            {
                'amount': int(cost.get('Amount')),
                'name': cost.get('Name'),
                'parent_id': parent_cost_type_id,
                'id': cost.get('Cost_Type_ID')
            }
        )
From this I get the sample JSON. It would be good to get the desired format directly from this code.
I also tried the solution mentioned here: https://adiyatmubarak.wordpress.com/2015/10/05/group-list-of-dictionary-data-by-particular-key-in-python/
I found an approach to convert the sample JSON to the expected JSON:
data = [
{ "name" : "ABC", "parent":"DEF", },
{ "name" : "DEF", "parent":"null" },
{ "name" : "new_name", "parent":"ABC" },
{ "name" : "new_name2", "parent":"ABC" },
{ "name" : "Foo", "parent":"DEF"},
{ "name" : "Bar", "parent":"null"},
{ "name" : "Chandani", "parent":"new_name", "relation": "rel", "depth": 3 },
{ "name" : "Chandani333", "parent":"new_name", "relation": "rel", "depth": 3 }
]
result = {x.get("name"): x for x in data}
#print(result)
tree = []
for a in data:
    #print(a)
    if a.get("parent") in result:
        parent = result[a.get("parent")]
    else:
        parent = ""
    if parent:
        if "children" not in parent:
            parent["children"] = []
        parent["children"].append(a)
    else:
        tree.append(a)
Reference: http://jsfiddle.net/9FqKS/ (a JavaScript solution that I converted to Python)
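The parent-lookup approach above can also be wrapped in a function that leaves its input untouched. This is a sketch under a few assumptions: names are unique, a parent value that matches no name (such as "null") marks a top-level node, and `build_tree` with its parameters is a hypothetical helper name.

```python
import copy


def build_tree(rows, name_key="name", parent_key="parent"):
    """Hypothetical helper: nest flat rows under their parents via a 'children' list."""
    # Work on copies so the caller's rows are not modified.
    nodes = {row[name_key]: copy.deepcopy(row) for row in rows}
    roots = []
    for row in rows:
        node = nodes[row[name_key]]
        parent = nodes.get(row.get(parent_key))
        if parent is None:
            roots.append(node)  # no known parent: top-level node
        else:
            parent.setdefault("children", []).append(node)
    return roots


rows = [
    {"name": "ABC", "parent": "DEF"},
    {"name": "DEF", "parent": "null"},
    {"name": "new_name", "parent": "ABC"},
    {"name": "Foo", "parent": "DEF"},
    {"name": "Bar", "parent": "null"},
]

tree = build_tree(rows)
print([node["name"] for node in tree])                # ['DEF', 'Bar']
print([child["name"] for child in tree[0]["children"]])  # ['ABC', 'Foo']
```

Because every node is looked up in a dict, the whole conversion is a single pass over the rows after the index is built, and the order of the input rows does not matter.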
It seems that you want to get a list of values from a dictionary.
result = [value for key, value in project_cost_details.items()]

MongoDB find in array of objects

I want to query MongoDB: find all users that have 'artist' = 'Iowa' in any object in the array.
Here is a Robomongo view of my collection:
In Python I'm doing:
Vkuser._get_collection().find({
    'member_of_group': 20548570,
    'my_music': {
        'items': {
            '$elemMatch': {
                'artist': 'Iowa'
            }
        }
    }
})
but this returns nothing. I also tried this:
{'member_of_group': 20548570, 'my_music': {'$elemMatch': {'$.artist': 'Iowa'}}}
and that didn't work either.
Here is part of a document with the array:
"can_see_audio" : 1,
"my_music" : {
"items" : [
{
"name" : "Anastasia Plotnikova",
"photo" : "http://cs620227.vk.me/v620227451/9c47/w_okXehPbYc.jpg",
"id" : "864451",
"name_gen" : "Anastasia"
},
{
"title" : "Ain't Talkin' 'Bout Dub",
"url" : "http://cs4964.vk.me/u14671028/audios/c5b8a0735224.mp3?extra=jgV4ZQrFrsfxZCJf4gsRgnKWvdAfIqjE0M6eMtxGFpj2yp4vjs5DYgAGImPMp4mCUSUGJzoyGeh2Es6L-H51TPa3Q_Q",
"lyrics_id" : 24313846,
"artist" : "Apollo 440",
"genre_id" : 18,
"id" : 344280392,
"owner_id" : 864451,
"duration" : 279
},
{
"title" : "Animals",
"url" : "http://cs1316.vk.me/u4198685/audios/4b9e4536e1be.mp3?extra=TScqXzQ_qaEFKHG8trrwbFyNvjvJKEOLnwOWHJZl_cW5EA6K3a9vimaMpx-Yk5_k41vRPywzuThN_IHT8mbKlPcSigw",
"lyrics_id" : 166037,
"artist" : "Nickelback",
"id" : 344280351,
"owner_id" : 864451,
"duration" : 186
},
The following query should work. You can use the dot notation to query into sub documents and arrays.
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music.items.artist':'Iowa'
})
The following query worked for me in the mongo shell
db.collection1.find({ "my_music.items.artist" : "Iowa" })
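As far as I understand MongoDB's matching rules, the first attempt in the question fails because a nested document given as a value is treated as an exact match on the whole 'my_music' subdocument, so the '$elemMatch' inside it is never evaluated as an operator. The sketch below builds two filters that avoid this: the dot-notation one from this answer, and an '$elemMatch' variant applied to the full path 'my_music.items'. `artist_filters` is a hypothetical helper name, and nothing here talks to a database.

```python
def artist_filters(group_id, artist):
    """Hypothetical helper: build two filters matching an artist inside my_music.items."""
    dotted = {
        "member_of_group": group_id,
        # Dot notation reaches through the embedded document into the array items.
        "my_music.items.artist": artist,
    }
    elem_match = {
        "member_of_group": group_id,
        # $elemMatch is attached to the array field itself, addressed by its full path.
        "my_music.items": {"$elemMatch": {"artist": artist}},
    }
    return dotted, elem_match


dotted, elem_match = artist_filters(20548570, "Iowa")
print(dotted)
print(elem_match)
# Either dict could then be passed to find(), for example:
# Vkuser._get_collection().find(dotted)
```

With a single condition the two filters should select the same documents; '$elemMatch' becomes useful when several conditions must hold for the same array element.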

Parsing json in python

I am having trouble parsing the following JSON file. I am trying to parse it using Logstash and Python.
{
"took" : 153,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 946,
"max_score" : 1.0,
"hits" : [ {
"_index" : "incoming_bytes",
"_type" : "logs",
"_id" : "lZSq4mBRSVSxO0kyTwh3fQ",
"_score" : 1.0, "_source" : {"user_id":"86c8c25d81a448c49e3d3924ea5ceddf","name":"network.incoming.bytes","resource_id":"instance-00000001-c8be5ca1-116b-45b3-accb-1b40050abc90-tapaf1e421f-c8","timestamp":"2013-11-02T07:32:36Z","resource_metadata":{"name":"tapaf1e421f-c8","parameters":{},"fref":null,"instance_id":"c8be5ca1-116b-45b3-accb-1b40050abc90","instance_type":"5c014e54-ee16-43a8-a763-54e243bd8969","mac":"fa:16:3e:67:39:29"},"volume":557462,"source":"openstack","project_id":"9ac587404bdd4fcdafe41c0b10f9f8ae","type":"cumulative","id":"f1eb19aa-4390-11e3-8bac-000c2973cfb1","unit":"B","#timestamp":"2013-11-02T07:32:38.276Z","#version":"1","host":"127.0.0.1","tags":["_grokparsefailure"],"priority":13,"severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","#type":"%{appdeliveryserver}"}
}, {
"_index" : "incoming_bytes",
"_type" : "logs",
"_id" : "073URWt5Sc-krLACxQnI3g",
"_score" : 1.0, "_source" : {"user_id":"86c8c25d81a448c49e3d3924ea5ceddf","name":"network.incoming.bytes","resource_id":"instance-00000001-c8be5ca1-116b-45b3-accb-1b40050abc90-tapaf1e421f-c8","timestamp":"2013-11-02T07:32:38Z","resource_metadata":{"name":"tapaf1e421f-c8","parameters":{},"fref":null,"instance_id":"c8be5ca1-116b-45b3-accb-1b40050abc90","instance_type":"5c014e54-ee16-43a8-a763-54e243bd8969","mac":"fa:16:3e:67:39:29"},"volume":562559,"source":"openstack","project_id":"9ac587404bdd4fcdafe41c0b10f9f8ae","type":"cumulative","id":"f31e38d4-4390-11e3-8bac-000c2973cfb1","unit":"B","#timestamp":"2013-11-02T07:32:39.001Z","#version":"1","host":"127.0.0.1","tags":["_grokparsefailure"],"priority":13,"severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","#type":"%{appdeliveryserver}"}
}]
}
}
I have used the following configuration for Logstash; however, it does not work as expected, which is to parse the individual fields in the JSON document and output them to STDOUT.
input {
  stdin {}
  file {
    path => ["/home/****/Downloads/throughput"]
    codec => "json"
  }
}
filter {
  json {
    source => "message"
    target => "throughput"
  }
}
output {
  stdout { codec => rubydebug }
}
For Python, I am trying to access the individual volume and source (IP address) fields.
I tried the following code, with the goal of mapping the individual fields for each record, and I would like to know how to proceed in order to traverse the list and extract individual elements.
import json
from pprint import pprint
json_data=open('throughput')
data = json.load(json_data)
pprint(data["hits"])
json_data.close()
Thanks
The parsed JSON is a dictionary, so you can index into it key by key or use itemgetter to drill down.
For example, for volume:
>>> for hits in data['hits']['hits']:
...     print(hits['_source']['volume'])
...
557462
562559
Or you can use map to get a list:
>>> from operator import itemgetter
>>> list(map(itemgetter('volume'), map(itemgetter('_source'), data['hits']['hits'])))
[557462, 562559]
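To pull several fields per record at once, a list comprehension over data['hits']['hits'] is one option. The sketch below uses a trimmed inline sample with the same nesting as the response in the question (only the fields it reads are kept), so it runs without the 'throughput' file; which field holds the IP address is an assumption, and both "source" and "host" are extracted for that reason.

```python
import json

# Trimmed sample with the same nesting as the response above; only the
# fields used below are kept.
raw = '''
{
  "hits": {
    "total": 946,
    "hits": [
      {"_id": "lZSq4mBRSVSxO0kyTwh3fQ",
       "_source": {"volume": 557462, "source": "openstack", "host": "127.0.0.1"}},
      {"_id": "073URWt5Sc-krLACxQnI3g",
       "_source": {"volume": 562559, "source": "openstack", "host": "127.0.0.1"}}
    ]
  }
}
'''

data = json.loads(raw)

# One (volume, source, host) tuple per hit; .get() returns None for a missing field.
records = [
    (hit["_source"].get("volume"),
     hit["_source"].get("source"),
     hit["_source"].get("host"))
    for hit in data["hits"]["hits"]
]

for volume, source, host in records:
    print(volume, source, host)
```

Using .get() instead of plain indexing keeps the loop from raising a KeyError when a record lacks one of the fields.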
