EDIT: more explicit example
I would like to count the number of values of one specific field in a collection.
from pymongo import MongoClient

client = MongoClient()  # assumption: MongoDB running on the default local host/port
chosenSensors = ["CO2_BUR_NE_I_001", "CO2_CEL_SE_I_001"]
match = {'$match':{'$or':list(map(lambda x:{x:{'$exists': True}}, chosenSensors))}}
group = {'$group':{'_id':{'year':{'$year':'$timestamp'}}}}
project = {'$project':{}}
for chosenSensor in chosenSensors:
    group['$group'][chosenSensor+'-Count'] = {'$sum':{'$cond':[{'$ifNull':['$'+chosenSensor, False]}, 1, 0]}}
    project['$project'][chosenSensor+'-Count'] = True
sort = {'$sort': {"_id":1}}
pipeline = [match, group, project, sort]
for doc in client["cleanData"]["test"].aggregate(pipeline):
    print(doc)
Below is a sample of my collection. I would like to count the number of values in CO2_BUR_NE_I_001.
I expect to have a count of 4.
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226fd"),
"timestamp" : ISODate("2016-11-17T12:36:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226fe"),
"timestamp" : ISODate("2016-11-17T12:37:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226ff"),
"timestamp" : ISODate("2016-11-17T12:38:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 0
}
{
"_id" : ObjectId("593ab63a1ccb9b0c0fb3d3e5"),
"timestamp" : ISODate("2016-02-01T19:26:00.000Z"),
"CO2_CEL_SE_I_001" : 1080
}
{
"_id" : ObjectId("593ab6021ccb9b0c0fb22700"),
"timestamp" : ISODate("2016-11-17T12:39:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
{
"_id" : ObjectId("593ab6025ccb9b0c0fb226fd"),
"timestamp" : ISODate("2016-11-17T12:36:00.000Z"),
"TEM_ETG_001" : 1210
}
But I get 3. The value 0 of CO2_BUR_NE_I_001 is not counted as an existing value.
{'_id': {'year': 2016}, 'CO2_BUR_NE_I_001-Count': 3, 'CO2_CEL_SE_I_001-Count': 5}
If I replace 0 with 880 in the affected document...
{
"_id" : ObjectId("593ab6021ccb9b0c0fb226ff"),
"timestamp" : ISODate("2016-11-17T12:38:00.000Z"),
"CO2_CEL_SE_I_001" : 1210,
"CO2_BUR_NE_I_001" : 880
}
... I get the expected result:
{'_id': {'year': 2016}, 'CO2_BUR_NE_I_001-Count': 4, 'CO2_CEL_SE_I_001-Count': 5}
EDIT: Beginning of an answer...
When I use $ifNull on a field that exists, it returns the field's value. When that value is 0, it returns 0. That result is passed to $cond, which treats 0 as false, so $cond returns 0 instead of 1 to my $sum. How can I handle that?
To count the distinct values of one specific field in a collection, you can use db.collection.distinct() to get the distinct values from MongoDB and then take the length of the list; no aggregation is needed.
values = db.collection.distinct('field', conditions)  # conditions: optional filter document, e.g. {}
print(len(values))
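Keep in mind that distinct() returns each value only once: for CO2_BUR_NE_I_001 in the sample above it would return 880 and 0, so len() gives 2, not 4. If what you need is the number of documents that have a value (including 0), counting with an $exists/$ne filter is one option. A minimal sketch, assuming PyMongo 3.7+ (for count_documents); present_filter and count_present are hypothetical helper names, and the last lines only mimic the two counting rules on plain Python values without a server:

```python
def present_filter(field):
    """Query filter matching documents where `field` exists and is not null (0 still matches)."""
    return {field: {'$exists': True, '$ne': None}}


def count_present(collection, field):
    """Number of documents holding a non-null value for `field` (PyMongo 3.7+ count_documents)."""
    return collection.count_documents(present_filter(field))


# Local illustration, no server needed: the CO2_BUR_NE_I_001 values
# of the sample documents that actually have the field.
bur_values = [880, 880, 0, 880]
print(len(set(bur_values)))  # 2: what len(distinct(...)) counts
print(len(bur_values))       # 4: what a per-document count gives
```

With a real collection this would be called as count_present(client["cleanData"]["test"], "CO2_BUR_NE_I_001").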
This method relies on the fact that null sorts lower than numbers (int, double, long) in the BSON comparison order:
Documentation: Comparison/Sort Order
So I just have to compare my value with None.
{'$sum':{'$cond':[{ '$gt': ['$'+chosenSensor, None]}, 1, 0]}}
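To see why the $gt comparison fixes the count, here is a sketch that builds the same pipeline with the $gt test and, separately, reproduces both counting rules on the sample documents in plain Python. build_pipeline, truthy_count and non_null_count are hypothetical helper names; the two counters only mimic the server-side logic ($cond treating 0 as false versus comparing with null) and do not talk to MongoDB.

```python
def build_pipeline(sensors):
    """Per-year count of documents holding a non-null value for each sensor."""
    match = {'$match': {'$or': [{s: {'$exists': True}} for s in sensors]}}
    group = {'$group': {'_id': {'year': {'$year': '$timestamp'}}}}
    project = {'$project': {}}
    for s in sensors:
        # {'$gt': [value, None]} is true for any value that sorts above null,
        # so numbers (including 0) count; as the answer relies on, it is
        # false for null or a missing field.
        group['$group'][s + '-Count'] = {
            '$sum': {'$cond': [{'$gt': ['$' + s, None]}, 1, 0]}}
        project['$project'][s + '-Count'] = True
    return [match, group, project, {'$sort': {'_id': 1}}]


def truthy_count(docs, field):
    """Mimics the $ifNull version: a value of 0 ends up treated as false."""
    return sum(1 for d in docs if d.get(field))


def non_null_count(docs, field):
    """Mimics the $gt-null version: any present, non-null value counts."""
    return sum(1 for d in docs if d.get(field) is not None)


# Sensor fields of the six sample documents (timestamps and _ids omitted).
sample = [
    {'CO2_CEL_SE_I_001': 1210, 'CO2_BUR_NE_I_001': 880},
    {'CO2_CEL_SE_I_001': 1210, 'CO2_BUR_NE_I_001': 880},
    {'CO2_CEL_SE_I_001': 1210, 'CO2_BUR_NE_I_001': 0},
    {'CO2_CEL_SE_I_001': 1080},
    {'CO2_CEL_SE_I_001': 1210, 'CO2_BUR_NE_I_001': 880},
    {'TEM_ETG_001': 1210},
]
print(truthy_count(sample, 'CO2_BUR_NE_I_001'))    # 3
print(non_null_count(sample, 'CO2_BUR_NE_I_001'))  # 4
print(non_null_count(sample, 'CO2_CEL_SE_I_001'))  # 5
```

The pipeline returned by build_pipeline(chosenSensors) can be passed to aggregate() in place of the original one.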
I am trying to auto-increment a field in my Mongo collection. The field is called 'id' and holds each document's id, for example 1, 2, 3, etc.
What I want is to insert a new document, take the 'id' of the last document and add 1 to it, so that the new document gets lastID + 1.
The way I have written it, the code finds the last document, increments that document's own 'id', and then uses the result. So if the last id is 5, the new document gets 5 and the document I incremented now has an 'id' of 6.
I am not sure how to get around this, so any help would be appreciated.
Code
last_id = pokemons.find_one({}, sort=[( 'id', -1)])
last_pokemon = pokemons.find_one_and_update({'id' : last_id['id']}, {'$inc': {'id': 1}}, sort=[( 'id', -1)])
new_pokemon = {
"name" : name, "avg_spawns" : avg_spawns, "candy" : candy, "img" : img_link, "weaknesses" : [], "type" : [], "candy_count" : candy_count,
"egg" : egg, "height" : height, "multipliers" : [], "next_evolution" : [], "prev_evolution" : [],
"spawn_chance" : spawn_chance, "spawn_time" : spawn_time, "weight" : weight, "id" : last_pokemon['id'], "num" : last_pokemon['id'],
}
pokemons.insert_one(new_pokemon)
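The variables in new_pokemon don't matter; I am only having issues with the last_pokemon part.
The problem is the find_one_and_update call: it increments the 'id' of the existing last document instead of leaving it alone. Read the highest id instead (here with a normal find, sorted by id descending and limited to 1), add 1 to it in Python, and use that for the new document.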
last_id = pokemons.find({}, {"id": 1}, sort=[('id', -1)]).limit(1).next() # Will error if there are no documents in collection due to the usage of `next()`
last_id["id"] += 1
new_pokemon = {
"name" : name, "avg_spawns" : avg_spawns, "candy" : candy, "img" : img_link, "weaknesses" : [], "type" : [], "candy_count" : candy_count,
"egg" : egg, "height" : height, "multipliers" : [], "next_evolution" : [], "prev_evolution" : [],
"spawn_chance" : spawn_chance, "spawn_time" : spawn_time, "weight" : weight, "id" : last_id['id'], "num" : last_id['id'],
}
pokemons.insert_one(new_pokemon)
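Reading the highest id and adding 1 works for a single writer, but two inserts running at the same time can read the same last id. A common MongoDB pattern is a separate counters collection that is incremented atomically with find_one_and_update. A minimal sketch; the counters collection, the 'pokemon_id' counter name and the FakeCounters stand-in (used only so the example runs without a server) are assumptions, not part of the original code:

```python
def next_id(counters, name):
    """Atomically increment the counter document `name` and return its new value.

    `counters` is assumed to be a PyMongo Collection (hypothetical name
    "counters") holding documents shaped like {'_id': 'pokemon_id', 'seq': 5}.
    """
    doc = counters.find_one_and_update(
        {'_id': name},
        {'$inc': {'seq': 1}},
        upsert=True,            # create the counter on first use (seq becomes 1)
        return_document=True,   # True is ReturnDocument.AFTER: return the updated doc
    )
    return doc['seq']


class FakeCounters:
    """In-memory stand-in mimicking just enough of find_one_and_update for a local run."""

    def __init__(self):
        self.docs = {}

    def find_one_and_update(self, query, update, upsert=False, return_document=False):
        key = query['_id']
        before = self.docs.get(key)
        if before is None and not upsert:
            return None
        after = dict(before or {'_id': key, 'seq': 0})
        after['seq'] += update['$inc']['seq']
        self.docs[key] = after
        return after if return_document else before


counters = FakeCounters()
counters.docs['pokemon_id'] = {'_id': 'pokemon_id', 'seq': 5}  # seeded with the current max id
print(next_id(counters, 'pokemon_id'))  # 6
print(next_id(counters, 'pokemon_id'))  # 7
```

Against a real database you would call next_id(db.counters, 'pokemon_id') and use the result for both "id" and "num" in new_pokemon; seed the counter once with the current highest id so numbering continues from there.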
I want to remove duplicate data from my collection in MongoDB. How can I accomplish this?
Please refer to this example to understand my problem:
My collection's documents look like this:
{
"questionText" : "what is android ?",
"__v" : 0,
"_id" : ObjectId("540f346c3e7fc1234ffa7085"),
"userId" : "102"
},
{
"questionText" : "what is android ?",
"__v" : 0,
"_id" : ObjectId("540f346c3e7fc1054ffa7086"),
"userId" : "102"
}
How do I remove the duplicate questions posted by the same userId?
I'm using Python and MongoDB.
IMPORTANT: The dropDups option was removed starting with MongoDB 3.x, so this solution is only valid for MongoDB versions 2.x and before. There is no direct replacement for the dropDups option. The answers to the question posed at http://stackoverflow.com/questions/30187688/mongo-3-duplicates-on-unique-index-dropdups offer some possible alternative ways to remove duplicates in Mongo 3.x.
Duplicate records can be removed from a MongoDB collection by creating a unique index on the collection and specifying the dropDups option.
Assuming the collection includes a field named record_id that uniquely identifies a record in the collection, the command to use to create a unique index and drop duplicates is:
db.collection.ensureIndex( { record_id:1 }, { unique:true, dropDups:true } )
Here is the trace of a session that shows the contents of a collection before and after creating a unique index with dropDups. Notice that duplicate records are no longer present after the index is created.
> db.pages.find()
{ "_id" : ObjectId("52829c886602e2c8428d1d8c"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8d"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8e"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8f"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
>
> db.pages.ensureIndex( { scan_id:1, leaf_num:1 }, { unique:true, dropDups:true } )
>
> db.pages.find()
{ "_id" : ObjectId("52829c886602e2c8428d1d8c"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8e"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
>
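Because dropDups no longer exists in MongoDB 3.x and later, one alternative is to find the duplicates with an aggregation and delete them explicitly. A minimal PyMongo sketch, assuming the duplicate key is the pair of fields used in the session above (scan_id, leaf_num); the helper names are hypothetical:

```python
def duplicate_groups_pipeline(keys):
    """Group documents on `keys`, keeping only groups with more than one member."""
    return [
        {'$sort': {'_id': 1}},  # sort first so each group's ids are pushed in _id order
        {'$group': {
            '_id': {k: '$' + k for k in keys},
            'ids': {'$push': '$_id'},
            'count': {'$sum': 1},
        }},
        {'$match': {'count': {'$gt': 1}}},
    ]


def ids_to_remove(groups):
    """All _ids except the first one in each duplicate group."""
    return [doc_id for group in groups for doc_id in group['ids'][1:]]


def remove_duplicates(collection, keys):
    """Delete every duplicate but the first; returns how many documents were removed."""
    groups = collection.aggregate(duplicate_groups_pipeline(keys), allowDiskUse=True)
    ids = ids_to_remove(groups)
    if ids:
        collection.delete_many({'_id': {'$in': ids}})
    return len(ids)
```

For the example above this would be remove_duplicates(db.pages, ['scan_id', 'leaf_num']). Afterwards, db.pages.create_index([('scan_id', 1), ('leaf_num', 1)], unique=True) makes MongoDB reject new duplicates.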
Since dropDups has been removed, you can use pandas instead:
1. Select the fields you need from MongoDB.
2. Use pandas.DataFrame.duplicated to mark all duplicates as True except the first one.
3. Remove the documents marked as duplicates from the collection using their _ids.
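The three steps above can be sketched as follows, assuming pandas and PyMongo are installed and that duplicates are defined by the fields from the question (questionText and userId). The function names are hypothetical; keep='first' is what leaves the first occurrence unmarked:

```python
import pandas as pd


def duplicate_ids(docs, keys):
    """_ids of documents that repeat an earlier document on all of `keys`."""
    df = pd.DataFrame(list(docs))
    if df.empty:
        return []
    # Step 2: mark every repeat as True except its first occurrence.
    mask = df.duplicated(subset=keys, keep='first')
    return df.loc[mask, '_id'].tolist()


def drop_duplicate_docs(collection, keys):
    # Step 1: fetch only the fields needed (_id is included by default).
    docs = collection.find({}, {k: 1 for k in keys})
    ids = duplicate_ids(docs, keys)
    # Step 3: delete the marked documents by _id.
    if ids:
        collection.delete_many({'_id': {'$in': ids}})
    return ids
```

Usage would look like drop_duplicate_docs(db.questions, ['questionText', 'userId']), where db.questions stands in for your actual collection.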
I must be really slow: I spent a whole day googling and trying to write Python code that simply lists the "code" values, so that my output is Service1, Service2, Service4. I have extracted JSON values from complex JSON or dict structures before, but now I must have hit a mental block.
This is my json structure.
import json

myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
# parse the JSON string above into a Python dict
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.
Or, if you want the "code" values:
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']
somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print([x for x in somejson["offers"].keys()])
In Python 3, keys() returns a view rather than a list, so wrap it in list() (or use a list comprehension as above) if you want a plain list.
This should do the trick if you don't know in advance how many services are in the JSON:
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
# parse the JSON string above into a Python dict
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
    print(somejson["offers"][service]["code"])
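If some entries under "offers" might lack a "code" key, a slightly more defensive variant skips them instead of raising KeyError. offer_codes is a hypothetical helper name, and the sample is trimmed from the question with one extra, made-up entry that has no "code":

```python
import json


def offer_codes(document):
    """Return the "code" of every entry under "offers", skipping entries that have none."""
    offers = document.get("offers", {})
    return [offer["code"] for offer in offers.values() if "code" in offer]


sample = json.loads('''
{
  "offers": {
    "Service1": {"code": "Service1"},
    "Service2": {"code": "Service2"},
    "Service3": {"code": "Service4"},
    "Service5": {"version": "no code here"}
  }
}
''')
print(offer_codes(sample))  # ['Service1', 'Service2', 'Service4']
```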
I want to query MongoDB: find all users that have 'artist' = 'Iowa' in any object of an array.
Here is my collection as shown in Robomongo:
In Python I'm doing:
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music': {
'items': {
'$elemMatch': {
'artist': 'Iowa'
}
}
}
})
but this returns nothing. Also tried this:
{'member_of_group': 20548570, 'my_music': {'$elemMatch': {'$.artist': 'Iowa'}}} and that didn't work.
Here is part of a document with the array:
"can_see_audio" : 1,
"my_music" : {
"items" : [
{
"name" : "Anastasia Plotnikova",
"photo" : "http://cs620227.vk.me/v620227451/9c47/w_okXehPbYc.jpg",
"id" : "864451",
"name_gen" : "Anastasia"
},
{
"title" : "Ain't Talkin' 'Bout Dub",
"url" : "http://cs4964.vk.me/u14671028/audios/c5b8a0735224.mp3?extra=jgV4ZQrFrsfxZCJf4gsRgnKWvdAfIqjE0M6eMtxGFpj2yp4vjs5DYgAGImPMp4mCUSUGJzoyGeh2Es6L-H51TPa3Q_Q",
"lyrics_id" : 24313846,
"artist" : "Apollo 440",
"genre_id" : 18,
"id" : 344280392,
"owner_id" : 864451,
"duration" : 279
},
{
"title" : "Animals",
"url" : "http://cs1316.vk.me/u4198685/audios/4b9e4536e1be.mp3?extra=TScqXzQ_qaEFKHG8trrwbFyNvjvJKEOLnwOWHJZl_cW5EA6K3a9vimaMpx-Yk5_k41vRPywzuThN_IHT8mbKlPcSigw",
"lyrics_id" : 166037,
"artist" : "Nickelback",
"id" : 344280351,
"owner_id" : 864451,
"duration" : 186
},
The following query should work. You can use dot notation to query into subdocuments and arrays.
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music.items.artist':'Iowa'
})
The following query worked for me in the mongo shell
db.collection1.find({ "my_music.items.artist" : "Iowa" })
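The first attempt returned nothing because {'my_music': {'items': {...}}}, with no operator on my_music, is an exact match against the whole my_music subdocument; and '$.artist' is not a field path MongoDB recognizes inside $elemMatch. If you do want $elemMatch (for example to require several conditions on the same array element), it goes on the dotted array path. A sketch of both filters, plus a plain-Python function that mimics the matching rule on a trimmed sample document; the helper names are hypothetical:

```python
def artist_filter(group_id, artist):
    # Dot notation: matches if any element of my_music.items has this artist.
    return {'member_of_group': group_id, 'my_music.items.artist': artist}


def artist_elem_match_filter(group_id, artist):
    # Same condition written with $elemMatch on the array field itself.
    return {'member_of_group': group_id,
            'my_music.items': {'$elemMatch': {'artist': artist}}}


def has_artist(doc, artist):
    """Local mimic of the match: true if any item carries the given artist."""
    items = doc.get('my_music', {}).get('items', [])
    return any(item.get('artist') == artist for item in items)


doc = {'my_music': {'items': [
    {'name': 'Anastasia Plotnikova'},
    {'title': "Ain't Talkin' 'Bout Dub", 'artist': 'Apollo 440'},
    {'title': 'Animals', 'artist': 'Nickelback'},
]}}
print(has_artist(doc, 'Nickelback'))  # True
print(has_artist(doc, 'Iowa'))        # False
```

Either filter can then be passed to find(), e.g. Vkuser._get_collection().find(artist_filter(20548570, 'Iowa')).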
One of my aggregation queries in MongoDB, run through PyMongo, returns:
{ "_id" : { "origin" : "ABE", "destination" : "DTW", "carrier" : "EV" }, "Ddelay" : -5.333333333333333, "Adelay" : -12.666666666666666 }
{ "_id" : { "origin" : "ABE", "destination" : "ORD", "carrier" : "EV" }, "Ddelay" : -4, "Adelay" : 14 }
{ "_id" : { "origin" : "ABE", "destination" : "ATL", "carrier" : "EV" }, "Ddelay" : 6, "Adelay" : 14 }
I am traversing the result as shown below in my Python module, but I only get two of the three results. I believe I should not be using len(results) as I am now. Can you please help me traverse the result correctly? I need to display all three results in the resulting JSON document on the web UI.
Thank you.
code:
pipe = [{ '$match': { 'origin': {"$in" : [origin_ID]}}},
        {"$group" :{'_id': { 'origin':"$origin", 'destination': "$dest",'carrier':"$carrier"},
                    "Ddelay" : {'$avg' :"$dep_delay"},"Adelay" : {'$avg' :"$arr_delay"}}}, {"$limit" : 4}]
results = connect.aggregate(pipeline=pipe)
#pdb.set_trace()
DATETIME_FORMAT = '%Y-%m-%d'
for x in range(len(results)):
    origin = (results['result'][x])['_id']['origin']
    destination = (results['result'][x])['_id']['destination']
    carrier = (results['result'][x])['_id']['carrier']
    Adelay = (results['result'][x])['Adelay']
    Ddelay = (results['result'][x])['Ddelay']
    obj = {'Origin':origin,
           'Destination':destination,
           'Carrier': carrier,
           'Avg Arrival Delay': Adelay,
           'Avg Dep Delay': Ddelay}
    json_result.append(obj)
return json.dumps(json_result,indent= 2, sort_keys=False,separators=(',',':'))
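PyMongo 2.x returns the aggregation result in this format:
{u'ok': 1.0, u'result': [...]}
So you should iterate over results['result']:
for x in results['result']:
    ...
Your code calculates the length of the outer dict, which has two keys ('ok' and 'result'), instead of the length of the result list. That is why the loop runs only twice.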
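Note that this dict format is what PyMongo 2.x returns; from PyMongo 3.0 on, aggregate() returns a CommandCursor that you iterate directly. A sketch that accepts either shape and builds the same rows for the web UI; aggregate_rows and delays_to_json are hypothetical names:

```python
import json


def aggregate_rows(results):
    """Return the list of result documents from Collection.aggregate().

    PyMongo 2.x returns a dict like {'ok': 1.0, 'result': [...]};
    PyMongo 3+ returns an iterable cursor, handled by the list() branch.
    """
    if isinstance(results, dict):
        return results['result']
    return list(results)


def delays_to_json(results):
    """Build one output row per aggregated document, however many there are."""
    json_result = []
    for row in aggregate_rows(results):
        key = row['_id']
        json_result.append({
            'Origin': key['origin'],
            'Destination': key['destination'],
            'Carrier': key['carrier'],
            'Avg Arrival Delay': row['Adelay'],
            'Avg Dep Delay': row['Ddelay'],
        })
    return json.dumps(json_result, indent=2, sort_keys=False, separators=(',', ':'))
```

In the original code, return delays_to_json(connect.aggregate(pipeline=pipe)) would replace the whole loop.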