I am having trouble parsing the following json file. I am trying to parse this using logstash/python.
{
"took" : 153,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 946,
"max_score" : 1.0,
"hits" : [ {
"_index" : "incoming_bytes",
"_type" : "logs",
"_id" : "lZSq4mBRSVSxO0kyTwh3fQ",
"_score" : 1.0, "_source" : {"user_id":"86c8c25d81a448c49e3d3924ea5ceddf","name":"network.incoming.bytes","resource_id":"instance-00000001-c8be5ca1-116b-45b3-accb-1b40050abc90-tapaf1e421f-c8","timestamp":"2013-11-02T07:32:36Z","resource_metadata":{"name":"tapaf1e421f-c8","parameters":{},"fref":null,"instance_id":"c8be5ca1-116b-45b3-accb-1b40050abc90","instance_type":"5c014e54-ee16-43a8-a763-54e243bd8969","mac":"fa:16:3e:67:39:29"},"volume":557462,"source":"openstack","project_id":"9ac587404bdd4fcdafe41c0b10f9f8ae","type":"cumulative","id":"f1eb19aa-4390-11e3-8bac-000c2973cfb1","unit":"B","#timestamp":"2013-11-02T07:32:38.276Z","#version":"1","host":"127.0.0.1","tags":["_grokparsefailure"],"priority":13,"severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","#type":"%{appdeliveryserver}"}
}, {
"_index" : "incoming_bytes",
"_type" : "logs",
"_id" : "073URWt5Sc-krLACxQnI3g",
"_score" : 1.0, "_source" : {"user_id":"86c8c25d81a448c49e3d3924ea5ceddf","name":"network.incoming.bytes","resource_id":"instance-00000001-c8be5ca1-116b-45b3-accb-1b40050abc90-tapaf1e421f-c8","timestamp":"2013-11-02T07:32:38Z","resource_metadata":{"name":"tapaf1e421f-c8","parameters":{},"fref":null,"instance_id":"c8be5ca1-116b-45b3-accb-1b40050abc90","instance_type":"5c014e54-ee16-43a8-a763-54e243bd8969","mac":"fa:16:3e:67:39:29"},"volume":562559,"source":"openstack","project_id":"9ac587404bdd4fcdafe41c0b10f9f8ae","type":"cumulative","id":"f31e38d4-4390-11e3-8bac-000c2973cfb1","unit":"B","#timestamp":"2013-11-02T07:32:39.001Z","#version":"1","host":"127.0.0.1","tags":["_grokparsefailure"],"priority":13,"severity":5,"facility":1,"facility_label":"user-level","severity_label":"Notice","#type":"%{appdeliveryserver}"}
}]
}
}
I have used the following configuration for logstash, however the configuration does not work as expected which is: to parse individual fields in the JSON document and output to STDOUT.
input {
stdin{}
file {
path => ["/home/****/Downloads/throughput"]
codec => "json"
}
}
filter{
json{
source => "message"
target => "throughput"
}
}
output {
stdout {codec => rubydebug }
}
For python I am trying to access the Individual Volume and source (IP address) fields.
I tried the following code, with goal to map individual fields for each record, and I would like to know how to proceed in order to traverse and extract individual elements in the list.
import json
from pprint import pprint
json_data=open('throughput')
data = json.load(json_data)
pprint(data["hits"])
json_data.close()
Thanks
Parsed json is a dictionary, you can use itemgetter to drill down.
For example for volume
>>> for hits in data['hits']['hits']:
... print hits['_source']['volume']
...
557462
562559
or can use map to get a list:
>>> from operator import itemgetter
>>> map(itemgetter('volume'), map(itemgetter('_source'), data['hits']['hits']))
[557462, 562559]
Related
Note: i use flask / pymongo
How can i rearrange my data to output all of them in a single object separated by a comma. (see end of post for example).
I have a collection with data similar to this and i need to ouput all the number of times for example here, that sandwich is in the collection like this Sandwiches: 13 :
{
"id":"J6qWt6XIUmIGFHX5rQJA-w",
"categories":[
{
"alias":"sandwiches",
"title":"Sandwiches"
}
]
}
So with this first request :
restos.aggregate([{$unwind:"$categories"},
{$group:{_id:'$categories.title', count:{$sum:1}}},
{$project:{ type:"$_id", count: 1,_id:0}}])
I achieved to get an out put like this :
{ "count" : 3, "type" : "Sandwiches" }
But what i want is the type as a key and the count as a value, like this : { "Sandwiches" : 3 }
I was able to "partially make it works with that command but that's not really the format i want :
.aggregate([{'$unwind': '$categories'},{'$group': {'_id': '$categories.title','count':{'$sum':
1}}},{'$project': {'type': '$_id', 'count': 1, '_id': 0}}, {'$replaceRoot': {'newRoot':
{'$arrayToObject': [[{'k': '$type', 'v': '$count'}]]}}}]))
The output was :
{
"restaurants": [
{
"Polish": 1
},
{
"Salad": 3
},
{
"Convenience Stores": 1
},
{
"British": 2
}]}
But my desired output is something like this that doesn't have the array and the data is contained into only 1 object:
{
"restaurants":{
Sandwiches: 13,
pizza: 15,
...
}
for the list thing i've come to realize that i use flask and when i return my jsonify object i put 'restaurants': list(db.restaurants.aggregate([
but when i remove it i get this error : TypeError: Object of type CommandCursor is not JSON serializable
Any idea on how to do that ? thanks a lot :)
If you can get a data like the following.
{ "count" : 13, "type" : "Sandwiches" }
You can do like this:
data = [{ "count" : 13, "type" : "Sandwiches" }, { "count" : 15, "type" : "Pizza" }]
output = {}
p = {}
for d in data: # read each item in the list
p.update({d['type']: d['count']}) # build a p dict with type key
output.update({'restaurants': p}) # build an output dict with restaurants key
print(output)
# {'restaurants': {'Sandwiches': 13, 'Pizza': 15}}
I am using Python and MongoEngine to try and query the below Document in MongoDB.
I need a query to efficiently get the Documents only when they contain Embedded Documents 'Keywords' that match the following criteria:
Keywords Filtered where the Property 'SFR' is LTE '100000'
SUM the filtered keywords
Return the parent documents where SUM of the keywords matching the criteria is Greater than '9'
Example structure:
{
"_id" : ObjectId("5eae60e4055ef0e717f06a50"),
"registered_data" : ISODate("2020-05-03T16:12:51.999+0000"),
"UniqueName" : "SomeUniqueNameHere",
"keywords" : [
{
"keyword" : "carport",
"search_volume" : NumberInt(10532),
"sfr" : NumberInt(20127),
"percent_contribution" : 6.47,
"competing_product_count" : NumberInt(997),
"avg_review_count" : NumberInt(143),
"avg_review_score" : 4.05,
"avg_price" : 331.77,
"exact_ppc_bid" : 3.44,
"broad_ppc_bid" : 2.98,
"exact_hsa_bid" : 8.33,
"broad_hsa_bid" : 9.29
},
{
"keyword" : "party tent",
"search_volume" : NumberInt(6944),
"sfr" : NumberInt(35970),
"percent_contribution" : 4.27,
"competing_product_count" : NumberInt(2000),
"avg_review_count" : NumberInt(216),
"avg_review_score" : 3.72,
"avg_price" : 210.16,
"exact_ppc_bid" : 1.13,
"broad_ppc_bid" : 0.55,
"exact_hsa_bid" : 9.66,
"broad_hsa_bid" : 8.29
}
]
}
From the research I have been doing, I believe an Aggregate type query might do what I am attempting.
Unfortunately, being new to MongoDB / MongoEngine I am struggling to figure out how to structure the query and have failed in finding an example similar to what I am attempting to do (RED FLAG RIGHT????).
I did find an example of a aggregate but unsure how to structure my criteria in it, maybe something like this is getting close but does not work.
pipeline = [
{
"$lte": {
"$sum" : {
"keywords" : {
"$lte": {
"keyword": 100000
}
}
}: 9
}
}
]
data = product.objects().aggregate(pipeline)
Any guidance would be greatly appreciated.
Thanks,
Ben
you can try something like this
db.collection.aggregate([
{
$project: { // the first project to filter the keywords array
registered_data: 1,
UniqueName: 1,
keywords: {
$filter: {
input: "$keywords",
as: "item",
cond: {
$lte: [
"$$item.sfr",
100000
]
}
}
}
}
},
{
$project: { // the second project to get the length of the keywords array
registered_data: 1,
UniqueName: 1,
keywords: 1,
keywordsLength: {
$size: "$keywords"
}
}
},
{
$match: { // then do the match
keywordsLength: {
$gte: 9
}
}
}
])
you can test it here Mongo Playground
hope it helps
Note, I used sfr property only from the keywords array for simplicity
I must be really slow because I spent a whole day googling and trying to write Python code to simply list the "code" values only so my output will be Service1, Service2, Service2. I have extracted json values before from complex json or dict structure. But now I must have hit a mental block.
This is my json structure.
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.
Or, if you want the "code" stuffs :
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']
somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print([x for x in somejson["offers"].keys()])
In Python 3 you must use the list comprehension because in Python 3 keys() is a 'view', not a list.
This should probably do the trick , if you are not certain about the number of Services in the json.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
print(somejson["offers"][service]["code"])
I want to query Mongodb: find all users, that have 'artist'=Iowa in any array item of objects.
Here is Robomongo of my collection:
In Python I'm doing:
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music': {
'items': {
'$elemMatch': {
'artist': 'Iowa'
}
}
}
})
but this returns nothing. Also tried this:
{'member_of_group': 20548570, 'my_music': {'$elemMatch': {'$.artist': 'Iowa'}}} and that didn't work.
Here is part of document with array:
"can_see_audio" : 1,
"my_music" : {
"items" : [
{
"name" : "Anastasia Plotnikova",
"photo" : "http://cs620227.vk.me/v620227451/9c47/w_okXehPbYc.jpg",
"id" : "864451",
"name_gen" : "Anastasia"
},
{
"title" : "Ain't Talkin' 'Bout Dub",
"url" : "http://cs4964.vk.me/u14671028/audios/c5b8a0735224.mp3?extra=jgV4ZQrFrsfxZCJf4gsRgnKWvdAfIqjE0M6eMtxGFpj2yp4vjs5DYgAGImPMp4mCUSUGJzoyGeh2Es6L-H51TPa3Q_Q",
"lyrics_id" : 24313846,
"artist" : "Apollo 440",
"genre_id" : 18,
"id" : 344280392,
"owner_id" : 864451,
"duration" : 279
},
{
"title" : "Animals",
"url" : "http://cs1316.vk.me/u4198685/audios/4b9e4536e1be.mp3?extra=TScqXzQ_qaEFKHG8trrwbFyNvjvJKEOLnwOWHJZl_cW5EA6K3a9vimaMpx-Yk5_k41vRPywzuThN_IHT8mbKlPcSigw",
"lyrics_id" : 166037,
"artist" : "Nickelback",
"id" : 344280351,
"owner_id" : 864451,
"duration" : 186
},
The following query should work. You can use the dot notation to query into sub documents and arrays.
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music.items.artist':'Iowa'
})
The following query worked for me in the mongo shell
db.collection1.find({ "my_music.items.artist" : "Iowa" })
one of my queries in mongoDB through pymongo returns:
{ "_id" : { "origin" : "ABE", "destination" : "DTW", "carrier" : "EV" }, "Ddelay" : -5.333333333333333,
"Adelay" : -12.666666666666666 }
{ "_id" : { "origin" : "ABE", "destination" : "ORD", "carrier" : "EV" }, "Ddelay" : -4, "Adelay" : 14 }
{ "_id" : { "origin" : "ABE", "destination" : "ATL", "carrier" : "EV" }, "Ddelay" : 6, "Adelay" : 14 }
I am traversing the result as below in my python module but I am not getting all the 3 results but only two. I believe I should not use len(results) as I am doing currently. Can you please help me correctly traverse the result as I need to display all three results in the resultant json document on web ui.
Thank you.
code:
pipe = [{ '$match': { 'origin': {"$in" : [origin_ID]}}},
{"$group" :{'_id': { 'origin':"$origin", 'destination': "$dest",'carrier':"$carrier"},
"Ddelay" : {'$avg' :"$dep_delay"},"Adelay" : {'$avg' :"$arr_delay"}}}, {"$limit" : 4}]
results = connect.aggregate(pipeline=pipe)
#pdb.set_trace()
DATETIME_FORMAT = '%Y-%m-%d'
for x in range(len(results)):
origin = (results['result'][x])['_id']['origin']
destination = (results['result'][x])['_id']['destination']
carrier = (results['result'][x])['_id']['carrier']
Adelay = (results['result'][x])['Adelay']
Ddelay = (results['result'][x])['Ddelay']
obj = {'Origin':origin,
'Destination':destination,
'Carrier': carrier,
'Avg Arrival Delay': Adelay,
'Avg Dep Delay': Ddelay}
json_result.append(obj)
return json.dumps(json_result,indent= 2, sort_keys=False,separators=(',',':'))
Pymongo returns result in format:
{u'ok': 1.0, u'result': [...]}
So you should iterate over result:
for x in results['result']:
...
In your code you try to calculate length of dict, not length of result container.