MongoDB better way of searching through collections for key - python

I'm trying to see if my string is in a specific key in my whole collection.
Example of collection:
_id: "XXX"
hwid: "XX1"
_id: "XXX"
hwid: "XX2"
_id; "XXX"
hwid: "XX3"
I want to search through all of the hwid keys in my collection and see if my string is in one of them and then return true/false. I thought it'd be a good way to do this with a for loop but it returns false everytime
client = pymongo.MongoClient("connection works")
db = client.monza
collection = db.whitelist
def search_hwid(hwid):
for x in collection.find({}, {"hwid"}):
if hwid == x['hwid']:
whitelisted = True
return whitelisted
break
else:
whitelisted = False
return whitelisted
it returns False everytime even when I am 100% sure that hwid is in my collection

//please see outcomes of find option with various options, for your query you need to find directly
//without using {} as first parameter
> db.test3.find();
{ "_id" : 1, "hwid" : "XX1" }
{ "_id" : 2, "hwid" : "XX2" }
{ "_id" : 3, "hwid" : "XX3" }
> db.test3.find({},{hwid:"XX2"});
{ "_id" : 1, "hwid" : "XX1" }
{ "_id" : 2, "hwid" : "XX2" }
{ "_id" : 3, "hwid" : "XX3" }
> db.test3.find({hwid:"XX2"});
{ "_id" : 2, "hwid" : "XX2" }
> db.test3.find({hwid:"XX3"});
{ "_id" : 3, "hwid" : "XX3" }

Related

MongoDB Python MongoEngine - Returning Document by filter of Embedded Documents Sum of Filtered property

I am using Python and MongoEngine to try and query the below Document in MongoDB.
I need a query to efficiently get the Documents only when they contain Embedded Documents 'Keywords' that match the following criteria:
Keywords Filtered where the Property 'SFR' is LTE '100000'
SUM the filtered keywords
Return the parent documents where SUM of the keywords matching the criteria is Greater than '9'
Example structure:
{
"_id" : ObjectId("5eae60e4055ef0e717f06a50"),
"registered_data" : ISODate("2020-05-03T16:12:51.999+0000"),
"UniqueName" : "SomeUniqueNameHere",
"keywords" : [
{
"keyword" : "carport",
"search_volume" : NumberInt(10532),
"sfr" : NumberInt(20127),
"percent_contribution" : 6.47,
"competing_product_count" : NumberInt(997),
"avg_review_count" : NumberInt(143),
"avg_review_score" : 4.05,
"avg_price" : 331.77,
"exact_ppc_bid" : 3.44,
"broad_ppc_bid" : 2.98,
"exact_hsa_bid" : 8.33,
"broad_hsa_bid" : 9.29
},
{
"keyword" : "party tent",
"search_volume" : NumberInt(6944),
"sfr" : NumberInt(35970),
"percent_contribution" : 4.27,
"competing_product_count" : NumberInt(2000),
"avg_review_count" : NumberInt(216),
"avg_review_score" : 3.72,
"avg_price" : 210.16,
"exact_ppc_bid" : 1.13,
"broad_ppc_bid" : 0.55,
"exact_hsa_bid" : 9.66,
"broad_hsa_bid" : 8.29
}
]
}
From the research I have been doing, I believe an Aggregate type query might do what I am attempting.
Unfortunately, being new to MongoDB / MongoEngine I am struggling to figure out how to structure the query and have failed in finding an example similar to what I am attempting to do (RED FLAG RIGHT????).
I did find an example of a aggregate but unsure how to structure my criteria in it, maybe something like this is getting close but does not work.
pipeline = [
{
"$lte": {
"$sum" : {
"keywords" : {
"$lte": {
"keyword": 100000
}
}
}: 9
}
}
]
data = product.objects().aggregate(pipeline)
Any guidance would be greatly appreciated.
Thanks,
Ben
you can try something like this
db.collection.aggregate([
{
$project: { // the first project to filter the keywords array
registered_data: 1,
UniqueName: 1,
keywords: {
$filter: {
input: "$keywords",
as: "item",
cond: {
$lte: [
"$$item.sfr",
100000
]
}
}
}
}
},
{
$project: { // the second project to get the length of the keywords array
registered_data: 1,
UniqueName: 1,
keywords: 1,
keywordsLength: {
$size: "$keywords"
}
}
},
{
$match: { // then do the match
keywordsLength: {
$gte: 9
}
}
}
])
you can test it here Mongo Playground
hope it helps
Note, I used sfr property only from the keywords array for simplicity

Extract values from oddly-nested Python

I must be really slow because I spent a whole day googling and trying to write Python code to simply list the "code" values only so my output will be Service1, Service2, Service2. I have extracted json values before from complex json or dict structure. But now I must have hit a mental block.
This is my json structure.
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
print(somejson["offers"]) # I tried so many variations to no avail.
Or, if you want the "code" stuffs :
>>> [s['code'] for s in somejson['offers'].values()]
['Service1', 'Service2', 'Service4']
somejson["offers"] is a dictionary. It seems you want to print its keys.
In Python 2:
print(somejson["offers"].keys())
In Python 3:
print([x for x in somejson["offers"].keys()])
In Python 3 you must use the list comprehension because in Python 3 keys() is a 'view', not a list.
This should probably do the trick , if you are not certain about the number of Services in the json.
import json
myjson='''
{
"formatVersion" : "ABC",
"publicationDate" : "2017-10-06",
"offers" : {
"Service1" : {
"code" : "Service1",
"version" : "1a1a1a1a",
"index" : "1c1c1c1c1c1c1"
},
"Service2" : {
"code" : "Service2",
"version" : "2a2a2a2a2",
"index" : "2c2c2c2c2c2"
},
"Service3" : {
"code" : "Service4",
"version" : "3a3a3a3a3a",
"index" : "3c3c3c3c3c3"
}
}
}
'''
#convert above string to json
somejson = json.loads(myjson)
#Without knowing the Services:
offers = somejson["offers"]
keys = offers.keys()
for service in keys:
print(somejson["offers"][service]["code"])

Mongodb document traversing

I have a query in mongo db, tried lots of solution but still not found it working. Any help will be appreciated.
How to find all keys named "channel" in document?
db.clients.find({"_id": 69})
{
"_id" : 69,
"configs" : {
"GOOGLE" : {
"drid" : "1246ABCD",
"adproviders" : {
"adult" : [
{
"type" : "landing",
"adprovider" : "abc123",
"channel" : "abc456"
},
{
"type" : "search",
"adprovider" : "xyz123",
"channel" : "xyz456"
}
],
"nonadult" : [
{
"type" : "landing",
"adprovider" : "pqr123",
"channel" : "pqr456"
},
{
"type" : "search",
"adprovider" : "lmn123",
"channel" : "lmn456"
}
]
}
},
"channel" : "ABC786",
"_cls" : "ClientGoogleDoc"
}
}
Trying to find keys with name channel
db.clients.find({"_id": 69, "channel": true})
Expecting:
{"channels": ["abc456", "xyz456", "ABC786", "xyz456", "pqr456", "lmn456", ...]}
As far as I know, you'd have to use python to recursively traverse the dictionary yourself in order to build the list that you want above:
channels = []
def traverse(my_dict):
for key, value in my_dict.items():
if isinstance(value, dict):
traverse(value)
else:
if key == "channel":
channels.append(value)
traverse({"a":{"channel":"abc123"}, "channel":"xyzzz"})
print(channels)
output:
['abc123', 'xyzzz']
However, using a thing called projections you can get sort of close to what you want (but not really, since you have to specify all of the channels manually):
db.clients.find({"_id": 69}, {"configs.channel":1})
returns:
{ "_id" : ObjectId("69"), "configs" : { "channel" : "ABC786" } }
If you want to get really fancy, you could write a generator function to generate all the keys in a given dictionary, no matter how deep:
my_dict = { "a": {
"channel":"abc123",
"key2": "jjj",
"subdict": {"deep_key": 5, "channel": "nested"}
},
"channel":"xyzzz"}
def getAllKeys(my_dict):
for key, value in my_dict.items():
yield key, value
if isinstance(value, dict):
for key, value in getAllKeys(value):
yield key, value
for key, value in getAllKeys(my_dict):
if key == "channel":
print value
output:
nested
abc123
xyzzz
You can use the $project mongodb operator to get only the value for the specific key. Check the documentation at http://docs.mongodb.org/manual/reference/operator/aggregation/project/

logical error in python dictionary traversal

one of my queries in mongoDB through pymongo returns:
{ "_id" : { "origin" : "ABE", "destination" : "DTW", "carrier" : "EV" }, "Ddelay" : -5.333333333333333,
"Adelay" : -12.666666666666666 }
{ "_id" : { "origin" : "ABE", "destination" : "ORD", "carrier" : "EV" }, "Ddelay" : -4, "Adelay" : 14 }
{ "_id" : { "origin" : "ABE", "destination" : "ATL", "carrier" : "EV" }, "Ddelay" : 6, "Adelay" : 14 }
I am traversing the result as below in my python module but I am not getting all the 3 results but only two. I believe I should not use len(results) as I am doing currently. Can you please help me correctly traverse the result as I need to display all three results in the resultant json document on web ui.
Thank you.
code:
pipe = [{ '$match': { 'origin': {"$in" : [origin_ID]}}},
{"$group" :{'_id': { 'origin':"$origin", 'destination': "$dest",'carrier':"$carrier"},
"Ddelay" : {'$avg' :"$dep_delay"},"Adelay" : {'$avg' :"$arr_delay"}}}, {"$limit" : 4}]
results = connect.aggregate(pipeline=pipe)
#pdb.set_trace()
DATETIME_FORMAT = '%Y-%m-%d'
for x in range(len(results)):
origin = (results['result'][x])['_id']['origin']
destination = (results['result'][x])['_id']['destination']
carrier = (results['result'][x])['_id']['carrier']
Adelay = (results['result'][x])['Adelay']
Ddelay = (results['result'][x])['Ddelay']
obj = {'Origin':origin,
'Destination':destination,
'Carrier': carrier,
'Avg Arrival Delay': Adelay,
'Avg Dep Delay': Ddelay}
json_result.append(obj)
return json.dumps(json_result,indent= 2, sort_keys=False,separators=(',',':'))
Pymongo returns result in format:
{u'ok': 1.0, u'result': [...]}
So you should iterate over result:
for x in results['result']:
...
In your code you try to calculate length of dict, not length of result container.

pyMongo iterate over cursor object with subitems

The function below searches a collection with a subitem projects. If there is a subitem with isManager set to 1 it should return True otherwise it will always return False.
def isMasterProject(self, pid, uid):
masterProjects = False
proj = self.collection.find({ "_id": uid, "projects": { '$elemMatch': { "projectId": _byid(pid), "isManager": 1 } } })
for value in proj:
if str(value['projects']['projectId']) == pid:
if value['projects']['isManager'] == 1:
masterProjects = True
return masterProjects
_byid is equivalent to ObjectId
It always seem to return False. Here's an example of a collection.
{
"_id" : ObjectId("52cf683306bcfc7be96a4d89"),
"firstName" : "Test",
"lastName" : "User",
"projects" : [
{
"projectId" : ObjectId("514f593c06bcfc1e96f619be"),
"isManager" : 0
},
{
"projectId" : ObjectId("511e3ed0909706a6a188953d"),
"isManager" : 1
},
{
"projectId" : ObjectId("51803baf06bcfc149116bf62"),
"isManager" : 1
},
{
"projectId" : ObjectId("514362bf121f92fb6867e58f"),
"isManager" : 1
}
],
"user" : "test.user#example.com",
"userType" : "Basic"
}
Would it be simpler to check for an empty cursor and if so how would I do that?
How about:
obj = next(proj, None)
if obj:
$elemMatch should only return results if the criteria given match a document so you should only return a cursor from find where your criteria are true.
Since you are using _id in the query and only ever expect to get one result, why not use findOne and shortcut one step.
Another gotcha for the new initiates, be aware you are returning the whole document here and not some representation with only the matching element of the array. Things that did not match will still be there, and then expecting different results by iterating over these will lead you to grief.

Categories