PyMongo iterate through documents / arrays within documents, replace found values - python

I have the following data in Mongodb:
{ "_id" : 1, "items" : [ "apple", "orange", "plum" ] }
{ "_id" : 2, "items" : [ "orange", "apple", "pineapple" ] }
{ "_id" : 3, "items" : [ "cherry", "carrot", "apple" ] }
{ "_id" : 4, "items" : [ "sprouts", "pear", "lettuce" ] }
I am trying to make a function using Python / PyMongo that takes 2 arguments, an old and a new string. I would like to find all the "apple" in all of the arrays across all of the documents and replace them with the string "banana".
Below is the code I have so far:
def update(old, new):
for result in db.collection.find({"items" : old}):
for i in result["items"]:
if i == old:
db.collection.update({"_id": result["._id"]}, {"$set": {"items": new}})

Finally figured this out, it is actually really simple:
from pymongo import MongoClient
mongo_client = MongoClient(127.0.0.1, 27017)
db = mongo_client.my_databse
def update(old, new)
for result in db.collection.find({'items' : old}):
db.collection.update_one({ '_id': result['_id'], 'items': old}, { '$set' : {'items.$' : new}})

use update_many() to update multiple documents in one query instead of looping through the documents.
from pymongo import MongoClient
mongo_client = MongoClient(127.0.0.1, 27017)
db = mongo_client.my_databse
def update(old, new)
db.collection.update_many({'items': old}, { '$set' : {'items.$' : new}})
For pymongo less than 3.2 use update with multi flag
def update(old, new)
db.collection.update({'items': old}, { '$set' : {'items.$' : new}}, multi=True)

Related

MongoDB Python MongoEngine - Returning Document by filter of Embedded Documents Sum of Filtered property

I am using Python and MongoEngine to try and query the below Document in MongoDB.
I need a query to efficiently get the Documents only when they contain Embedded Documents 'Keywords' that match the following criteria:
Keywords Filtered where the Property 'SFR' is LTE '100000'
SUM the filtered keywords
Return the parent documents where SUM of the keywords matching the criteria is Greater than '9'
Example structure:
{
"_id" : ObjectId("5eae60e4055ef0e717f06a50"),
"registered_data" : ISODate("2020-05-03T16:12:51.999+0000"),
"UniqueName" : "SomeUniqueNameHere",
"keywords" : [
{
"keyword" : "carport",
"search_volume" : NumberInt(10532),
"sfr" : NumberInt(20127),
"percent_contribution" : 6.47,
"competing_product_count" : NumberInt(997),
"avg_review_count" : NumberInt(143),
"avg_review_score" : 4.05,
"avg_price" : 331.77,
"exact_ppc_bid" : 3.44,
"broad_ppc_bid" : 2.98,
"exact_hsa_bid" : 8.33,
"broad_hsa_bid" : 9.29
},
{
"keyword" : "party tent",
"search_volume" : NumberInt(6944),
"sfr" : NumberInt(35970),
"percent_contribution" : 4.27,
"competing_product_count" : NumberInt(2000),
"avg_review_count" : NumberInt(216),
"avg_review_score" : 3.72,
"avg_price" : 210.16,
"exact_ppc_bid" : 1.13,
"broad_ppc_bid" : 0.55,
"exact_hsa_bid" : 9.66,
"broad_hsa_bid" : 8.29
}
]
}
From the research I have been doing, I believe an Aggregate type query might do what I am attempting.
Unfortunately, being new to MongoDB / MongoEngine I am struggling to figure out how to structure the query and have failed in finding an example similar to what I am attempting to do (RED FLAG RIGHT????).
I did find an example of a aggregate but unsure how to structure my criteria in it, maybe something like this is getting close but does not work.
pipeline = [
{
"$lte": {
"$sum" : {
"keywords" : {
"$lte": {
"keyword": 100000
}
}
}: 9
}
}
]
data = product.objects().aggregate(pipeline)
Any guidance would be greatly appreciated.
Thanks,
Ben
you can try something like this
db.collection.aggregate([
{
$project: { // the first project to filter the keywords array
registered_data: 1,
UniqueName: 1,
keywords: {
$filter: {
input: "$keywords",
as: "item",
cond: {
$lte: [
"$$item.sfr",
100000
]
}
}
}
}
},
{
$project: { // the second project to get the length of the keywords array
registered_data: 1,
UniqueName: 1,
keywords: 1,
keywordsLength: {
$size: "$keywords"
}
}
},
{
$match: { // then do the match
keywordsLength: {
$gte: 9
}
}
}
])
you can test it here Mongo Playground
hope it helps
Note, I used sfr property only from the keywords array for simplicity

MongoDB values to Dict in python

Basically I need to connect to MongoDB documents records and put into values into dict.
**MongoDB Values**
{ "_id" : "LAC1397", "code" : "MIS", "label" : "Marshall Islands", "mappingName" : "RESIDENTIAL_COUNTRY" }
{ "_id" : "LAC1852", "code" : "COP", "label" : "Colombian peso", "mappingName" : "FOREIGN_CURRENCY_CODE"}
How do i map it to dict in the below fashion in python
**syntax :**
dict = {"mappingName|Code" : "Value" }
**Example :**
dict = { "RESIDENTIAL_COUNTRY|MIS" : "Marshall Islands" , "FOREIGN_CURRENCY_CODE|COP" : "Colombian peso" , "COMM_LANG|ENG" : "English" }
**Python Code**
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client.mongo
collection = db.masters
for post in collection.find():
Got stuck after this , not sure how to put into dict in the mentioned method
post will be a dict with the values from mongo, so you can loop the records and append to a new dictionary. As the comments mention, any duplicates would be overridden by the last found value. If this might be an issue, consider a sort() on the find() function.
Sample code:
from pymongo import MongoClient
db = MongoClient()['mydatabase']
db.mycollection.insert_one({ "_id" : "LAC1397", "code" : "MIS", "label" : "Marshall Islands", "mappingName" : "RESIDENTIAL_COUNTRY" })
db.mycollection.insert_one({ "_id" : "LAC1852", "code" : "COP", "label" : "Colombian peso", "mappingName" : "FOREIGN_CURRENCY_CODE"})
mydict = {}
for post in db.mycollection.find():
k = f"{post.get('mappingName')}|{post.get('code')}"
mydict[k] = post.get('label')
print(mydict)
Gives:
{'RESIDENTIAL_COUNTRY|MIS': 'Marshall Islands', 'FOREIGN_CURRENCY_CODE|COP': 'Colombian peso'}

Trouble understanding mongo aggregation

I am trying to list all the virtual machines (vms) in my mongo database that use a certain data store, EMC_123.
I have this script, but it list vms that do not use the data store EMC_123.
#!/usr/bin/env python
import pprint
import pymongo
def run_query():
server = '127.0.0.1'
client = pymongo.MongoClient("mongodb://%s:27017/" % server)
db = client["data_center_test"]
collection = db["data_centers"]
pipeline = [
{ "$match": { "clusters.hosts.vms.data_stores.name" : "EMC_123"}},
{ "$group": { "_id" : "$clusters.hosts.vms.name" }}
]
for doc in list(db.data_centers.aggregate(pipeline)):
pp = pprint.PrettyPrinter()
pp.pprint(doc)
pp.pprint (db.command('aggregate', 'data_centers', pipeline=pipeline, explain=True))
def main():
run_query()
return 0
# Start program
if __name__ == "__main__":
main()
I assume I there is something wrong with my pipeline.
Here is the plan that gets printed out:
{u'ok': 1.0,
u'stages': [{u'$cursor': {u'fields': {u'_id': 0,
u'clusters.hosts.vms.name': 1},
u'query': {u'clusters.hosts.vms.data_stores.name': u'EMC_123'},
u'queryPlanner': {u'indexFilterSet': False,
u'namespace': u'data_center_test.data_centers',
u'parsedQuery': {u'clusters.hosts.vms.data_stores.name': {u'$eq': u'EMC_123'}},
u'plannerVersion': 1,
u'rejectedPlans': [],
u'winningPlan': {u'direction': u'forward',
u'filter': {u'clusters.hosts.vms.data_stores.name': {u'$eq': u'EMC_123'}},
u'stage': u'COLLSCAN'}}}},
{u'$group': {u'_id': u'$clusters.hosts.vms.name'}}]}
UPDATE:
Here is a skeleton of what the document looks like:
{
"name" : "data_center_name",
"clusters" : [
{
"hosts" : [
{
"name" : "esxi-hostname",
"vms" : [
{
"data_stores" : [ { "name" : "EMC_123" } ],
"name" : "vm-name1",
"networks" : [ { "name" : "vlan334" } ]
},
{
"data_stores" : [ { "name" : "some_other_data_store" } ],
"name" : "vm-name2",
"networks" : [ { "name" : "vlan334" } ]
}
]
}
],
"name" : "cluster_name"
}
]
}
The problem I am seeing is that vm-name2 shows up in the results when it doesn't have EMC_123 as a data store.
Upate 2:
ok I am able to write a mongo shell query that does what I want. It is a little ugly:
db.data_centers.aggregate({$unwind: '$clusters'}, {$unwind: '$clusters.hosts'}, {$unwind: '$clusters.hosts.vms'}, {$unwind: '$clusters.hosts.vms.data_stores'}, {$match: {"clusters.hosts.vms.data_stores.name": "EMC_123"}})
I came about this in the second answer of this SO question: MongoDB Projection of Nested Arrays
Based on the answers in MongoDB Projection of Nested Arrays I had to change my pipeline to this:
pipeline = [
{'$unwind': '$clusters'},
{'$unwind': '$clusters.hosts'},
{'$unwind': '$clusters.hosts.vms'},
{'$unwind': '$clusters.hosts.vms.data_stores'},
{'$match': {"clusters.hosts.vms.data_stores.name": "EMC_123"}}
]

Mongo projection result as an array of selected items

For the given documents in the mongo
{"_id" : "joe":
grocerylist: [ "cheddar", "apple", "oranges" ]
}
{"_id" : "joanna":
grocerylist: [ "cheddar", "foobar" ]
}
{"_id" : "john":
grocerylist: [ "apple", "oranges" ]
}
If I search for user with cheddar in their list
find({"grocerylist" : cheddar}, fields={'_id' : 1})
I get
[{u'_id': u'joe'}, {u'_id': u'joanna'}]
Using Mongo, how can I get just a list of matched users, like this..
[u'joe', u'joanna']
Thanks.
_ids are unique across the collection, thus you can use distinct here.
collection.distinct('_id', {'grocerylist' : cheddar})
One option would be to use a list comprehension:
cursor = db.col.find({"grocerylist" : cheddar}, fields={'_id' : 1})
print([document['user'] for document in cursor])

Pymongo count elements collected out of all documents with key

I want to count all elements which occur in somekey in an MongoDB collection.
The current code looks at all elements in somekey as a whole.
from pymongo import Connection
con = Connection()
db = con.database
collection = db.collection
from bson.code import Code
reducer = Code("""
function(obj, prev){
prev.count++;
}
""")
from bson.son import SON
results = collection.group(key={"somekey":1}, condition={}, initial={"count": 0}, reduce=reducer)
for doc in results:
print doc
However, I want that it counts all elements which occur in any document with somekey.
Here is an anticipated example. The MongoDB has the following documents.
{ "_id" : 1, “somekey" : [“AB", “CD"], "someotherkey" : "X" }
{ "_id" : 2, “somekey" : [“AB", “XY”], "someotherkey" : "Y" }
The result should provide an by count ordered list with:
count: 2 "AB"
count: 1 "CD"
count: 1 "XY"
The .group() method will not work on elements that are arrays, and the closest similar thing would be mapReduce where you have more control over the emitted keys.
But really the better fit here is the aggregation framework. It is implemented in native code as does not use JavaScript interpreter processing as the other methods there do.
You wont be getting an "ordered list" from MongoDB responses, but you get a similar document result:
results = collection.aggregate([
# Unwind the array
{ "$unwind": "somekey" },
# Group the results and count
{ "$group": {
"_id": "$somekey",
"count": { "$sum": 1 }
}}
])
Gives you something like:
{ "_id": "AB", "count": 2 }
{ "_id": "CD", "count": 1 }
{ "_id": "XY", "count": 1 }

Categories