how to select single field from _id embedded field mongodb - python

Is there a way to select only the userId field from the _id embedded document?
I tried the query below.
I also want to delete all the documents matched by this query, in batches of roughly 10000 per batch, keeping the database load low so the operation does not hamper the database. Please suggest an approach.
Sample Data:
"_id" : {
"Path" : 0,
"TriggerName" : "T1",
"userId" : NumberLong(231),
"Date" : "02/09/2017",
"OfferType" : "NOOFFER"
},
"OfferCount" : NumberLong(0),
"OfferName" : "NoOffer",
"trgtm" : NumberLong("1486623660308"),
"trgtype" : "PREDEFINED",
"desktopTop-normal" : NumberLong(1)
query:
mongo --eval 'db.l.find({"_id.Date": {"$lt" : "03/09/2017"}}, {"_id.userId": 1}).limit(1).forEach(printjson)'
output:
{
"_id" : {
"Path" : 0,
"TriggerName" : "T1",
"userId" : NumberLong(231),
"Date" : "02/09/2017",
"OfferType" : "NOOFFER"
}
}
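No answer was posted to this question, but a PyMongo sketch could look like the following. The batch-delete helper is an assumption, and note a caveat: the Date values are strings, so $lt compares them lexicographically, which only behaves like a date comparison while every value uses the same DD/MM/YYYY format.

```python
def purge_in_batches(coll, query, batch_size=10000):
    """Delete matching documents in batches of `batch_size` so one huge
    delete does not hammer the database. Returns the total removed."""
    deleted = 0
    while True:
        # Fetch only the _ids of the next batch of matching documents.
        ids = [d["_id"] for d in coll.find(query, {"_id": 1}).limit(batch_size)]
        if not ids:
            return deleted
        deleted += coll.delete_many({"_id": {"$in": ids}}).deleted_count

# Usage against a live deployment (requires pymongo and a running mongod;
# the database name "test" is an assumption):
#   from pymongo import MongoClient
#   coll = MongoClient()["test"]["l"]
#   query = {"_id.Date": {"$lt": "03/09/2017"}}
#   # Project only the embedded userId; other _id subfields are suppressed:
#   for doc in coll.find(query, {"_id.userId": 1}):
#       print(doc["_id"]["userId"])
#   purge_in_batches(coll, query)
```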

Related

How to print or log the real executed query statements, the way Django logs its insert, query and other SQL statements

I searched the documentation of both libraries and found nothing about this.
I want something similar to Django, where I can execute a query such as User.objects.filter(name='123').values("id", "phone") and log the actual statement it executes: select id, phone from auth_user where name='123'
I believe you have 2 options:
1) Use pymongo.monitoring; MongoEngine's documentation shows how to configure it.
2) Turn on Profiling with db.setProfilingLevel(2), all queries will then be added to the db.system.profile collection.
E.g. executing db.users.findOne({'name': 'abc'}) will add the following document to the db.system.profile collection:
{
"op" : "query",
"ns" : "myDb.users",
"command" : {
"find" : "users",
"filter" : {
"name" : "abc"
},
"limit" : 1,
....
"$db" : "myDb"
},
"keysExamined" : 0,
"docsExamined" : 0,
"cursorExhausted" : true,
"numYield" : 0,
"locks" : {...},
"nreturned" : 0,
"responseLength" : 83,
"protocol" : "op_msg",
"millis" : 0,
"planSummary" : "EOF",
...
"ts" : ISODate("2019-11-20T19:27:13.297Z"),
"appName" : "MongoDB Shell",
"allUsers" : [ ],
"user" : ""
}
As you can see, it is unfortunately not as readable as a plain SQL statement, but the information is there.

Check and remove duplicates in python MongoDB

I want to remove duplicate data from my collection in MongoDB. How can I accomplish this?
Please refer to this example to understand my problem:
My collection contains the following documents:
{
"questionText" : "what is android ?",
"__v" : 0,
"_id" : ObjectId("540f346c3e7fc1234ffa7085"),
"userId" : "102"
},
{
"questionText" : "what is android ?",
"__v" : 0,
"_id" : ObjectId("540f346c3e7fc1054ffa7086"),
"userId" : "102"
}
How do I remove the duplicate question by the same userId? Any help?
I'm using Python and MongoDB.
IMPORTANT: The dropDups option was removed starting with MongoDB 3.x, so this solution is only valid for MongoDB versions 2.x and before. There is no direct replacement for the dropDups option. The answers to the question posed at http://stackoverflow.com/questions/30187688/mongo-3-duplicates-on-unique-index-dropdups offer some possible alternative ways to remove duplicates in Mongo 3.x.
Duplicate records can be removed from a MongoDB collection by creating a unique index on the collection and specifying the dropDups option.
Assuming the collection includes a field named record_id that uniquely identifies a record in the collection, the command to use to create a unique index and drop duplicates is:
db.collection.ensureIndex( { record_id:1 }, { unique:true, dropDups:true } )
Here is the trace of a session that shows the contents of a collection before and after creating a unique index with dropDups. Notice that duplicate records are no longer present after the index is created.
> db.pages.find()
{ "_id" : ObjectId("52829c886602e2c8428d1d8c"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8d"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8e"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8f"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
>
> db.pages.ensureIndex( { scan_id:1, leaf_num:1 }, { unique:true, dropDups:true } )
>
> db.pages.find()
{ "_id" : ObjectId("52829c886602e2c8428d1d8c"), "leaf_num" : "1", "scan_id" : "smithsoniancont251985smit", "height" : 3464, "width" : 2548 }
{ "_id" : ObjectId("52829c886602e2c8428d1d8e"), "leaf_num" : "2", "scan_id" : "smithsoniancont251985smit", "height" : 3587, "width" : 2503 }
>
Since dropDups has now been removed, you can use pandas instead:
Select the fields you need from MongoDB.
Use pandas.DataFrame.duplicated to mark all duplicates as True except the first occurrence.
Remove the marked duplicates from the collection using their _ids.
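The three steps above can be sketched like this. The field names come from the question; fetching and deleting against a live collection is shown as commented usage, and the collection/database names there are assumptions:

```python
import pandas as pd

def find_duplicate_ids(docs, subset):
    """Return the _ids of every document whose `subset` fields repeat an
    earlier document's values (the first occurrence is kept)."""
    df = pd.DataFrame(docs)
    # duplicated() marks every repeat after the first as True.
    dup_mask = df.duplicated(subset=subset, keep="first")
    return df.loc[dup_mask, "_id"].tolist()

# Usage against a live collection (requires pymongo and a running mongod):
#   from pymongo import MongoClient
#   coll = MongoClient()["test"]["questions"]
#   docs = list(coll.find({}, {"_id": 1, "questionText": 1, "userId": 1}))
#   dup_ids = find_duplicate_ids(docs, ["questionText", "userId"])
#   coll.delete_many({"_id": {"$in": dup_ids}})
```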

Extract values from oddly-nested Python

I must be really slow because I spent a whole day googling and trying to write Python code to simply list the "code" values only, so my output will be Service1, Service2, Service4. I have extracted JSON values before from complex JSON or dict structures. But now I must have hit a mental block.
This is my json structure.
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.
Or, if you want the "code" values:
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']
somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print(list(somejson["offers"].keys()))
In Python 3, keys() returns a view rather than a list, so wrap it in list() (or use a comprehension) if you want it printed as a list.
This should probably do the trick, if you are not certain about the number of services in the JSON.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
    print(somejson["offers"][service]["code"])

How to copy values from one mongo document into a nested field in another in python?

I have two collections in the Mongo database. Each document in collection2 currently holds the ID of a document from collection1. I want to copy some values from collection1 into the nested field (dataFromCollection1) of the related documents in collection2. I'm asking for help because I cannot find a way to read the values of these Mongo fields into Python variables.
Collection1:
{
"_id" : ObjectId("583d498214f89c3f08b10e2d"),
"name" : "Name",
"gender" : "men",
"secondName" : "",
"testData" : [ ],
"numberOf" : NumberInt(0),
"place" : "",
"surname" : "Surname",
"field1" : "eggs",
"field2" : "hamm",
"field3" : "foo",
"field4" : "bar"
}
Collection2:
{
"_id" : ObjectId("58b028e26900ed21d5153a36"),
"collection1" : ObjectId("583d498214f89c3f08b10e2d")
"fieldCol2_1" : "123",
"fieldCol2_2" : "332",
"fieldCol2_3" : "133",
"dataFromCollection1" : {
"name" : " ",
"surname" : " ",
"field1" : " ",
"field2" : " ",
"field3" : " ",
"field4" : " "
}
}
I think you should use the aggregate function in the pymongo package. In the aggregation you can use $lookup to match the key/value pairs and $project to keep only the required fields. This question has already been asked, so pymongo - how to match on lookup? might help you.
Then you can use the $out stage of the aggregation to create the new, updated collection you are after, or update the existing collection using pymongo's update methods.
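A sketch of such a pipeline follows. The collection names are taken from the question; the intermediate field name src is an assumption:

```python
# $lookup joins collection1 into each collection2 document, $addFields
# copies the wanted values into dataFromCollection1, and $out (commented
# out below) would materialise the result into a collection.
pipeline = [
    {"$lookup": {
        "from": "collection1",        # collection holding the source data
        "localField": "collection1",  # the ObjectId reference in collection2
        "foreignField": "_id",
        "as": "src",                  # hypothetical name for the joined array
    }},
    {"$unwind": "$src"},
    {"$addFields": {"dataFromCollection1": {
        "name": "$src.name",
        "surname": "$src.surname",
        "field1": "$src.field1",
        "field2": "$src.field2",
        "field3": "$src.field3",
        "field4": "$src.field4",
    }}},
    {"$project": {"src": 0}},
    # {"$out": "collection2_joined"},  # uncomment to write the result out
]

# Usage (requires pymongo and a running mongod; database name is an assumption):
#   from pymongo import MongoClient
#   db = MongoClient()["test"]
#   for doc in db["collection2"].aggregate(pipeline):
#       print(doc["dataFromCollection1"])
```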

MongoDB find in array of objects

I want to query MongoDB: find all users that have 'artist' = 'Iowa' in any element of an array of objects.
Here is a Robomongo screenshot of my collection:
In Python I'm doing:
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music': {
'items': {
'$elemMatch': {
'artist': 'Iowa'
}
}
}
})
but this returns nothing. Also tried this:
{'member_of_group': 20548570, 'my_music': {'$elemMatch': {'$.artist': 'Iowa'}}} and that didn't work.
Here is part of a document with the array:
"can_see_audio" : 1,
"my_music" : {
"items" : [
{
"name" : "Anastasia Plotnikova",
"photo" : "http://cs620227.vk.me/v620227451/9c47/w_okXehPbYc.jpg",
"id" : "864451",
"name_gen" : "Anastasia"
},
{
"title" : "Ain't Talkin' 'Bout Dub",
"url" : "http://cs4964.vk.me/u14671028/audios/c5b8a0735224.mp3?extra=jgV4ZQrFrsfxZCJf4gsRgnKWvdAfIqjE0M6eMtxGFpj2yp4vjs5DYgAGImPMp4mCUSUGJzoyGeh2Es6L-H51TPa3Q_Q",
"lyrics_id" : 24313846,
"artist" : "Apollo 440",
"genre_id" : 18,
"id" : 344280392,
"owner_id" : 864451,
"duration" : 279
},
{
"title" : "Animals",
"url" : "http://cs1316.vk.me/u4198685/audios/4b9e4536e1be.mp3?extra=TScqXzQ_qaEFKHG8trrwbFyNvjvJKEOLnwOWHJZl_cW5EA6K3a9vimaMpx-Yk5_k41vRPywzuThN_IHT8mbKlPcSigw",
"lyrics_id" : 166037,
"artist" : "Nickelback",
"id" : 344280351,
"owner_id" : 864451,
"duration" : 186
},
The following query should work. You can use dot notation to query into subdocuments and arrays.
Vkuser._get_collection().find({
'member_of_group': 20548570,
'my_music.items.artist':'Iowa'
})
The following query worked for me in the mongo shell
db.collection1.find({ "my_music.items.artist" : "Iowa" })
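For reference, the original query returned nothing because nesting 'items' as a plain subdocument under 'my_music' asks for an exact document match. Dot notation matches any array element; $elemMatch is the form to reach for when several conditions must hold on the same element (the genre_id value below is just taken from the sample data, as an illustration):

```python
# Matches when ANY element of my_music.items has artist == "Iowa".
dot_query = {
    "member_of_group": 20548570,
    "my_music.items.artist": "Iowa",
}

# Use $elemMatch when multiple conditions must hold on the SAME element.
elem_query = {
    "member_of_group": 20548570,
    "my_music.items": {"$elemMatch": {"artist": "Iowa", "genre_id": 18}},
}

# Usage: Vkuser._get_collection().find(dot_query)
```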
