Pymongo count elements collected out of all documents with key - python

I want to count all the elements that occur in the somekey arrays of a MongoDB collection.
My current code treats the whole contents of somekey as a single value:
from pymongo import MongoClient
from bson.code import Code
con = MongoClient()
db = con.database
collection = db.collection
# Increment the per-group counter for every document in the group
reducer = Code("""
function(obj, prev){
    prev.count++;
}
""")
results = collection.group(key={"somekey": 1}, condition={}, initial={"count": 0}, reduce=reducer)
for doc in results:
    print(doc)
However, I want it to count every element that occurs in somekey in any document.
Here is an example. The collection contains the following documents:
{ "_id" : 1, "somekey" : ["AB", "CD"], "someotherkey" : "X" }
{ "_id" : 2, "somekey" : ["AB", "XY"], "someotherkey" : "Y" }
The result should be a list ordered by count:
count: 2 "AB"
count: 1 "CD"
count: 1 "XY"

The .group() method does not work on the individual elements of array fields; the closest alternative would be mapReduce, where you have more control over the emitted keys.
But the better fit here is really the aggregation framework. It is implemented in native code and does not use JavaScript interpreter processing as the other methods do.
You won't get an "ordered list" from MongoDB responses, but you do get a similar document result:
results = collection.aggregate([
    # Unwind the array (field paths need the "$" prefix)
    { "$unwind": "$somekey" },
    # Group the results and count
    { "$group": {
        "_id": "$somekey",
        "count": { "$sum": 1 }
    }}
])
This gives you something like:
{ "_id": "AB", "count": 2 }
{ "_id": "CD", "count": 1 }
{ "_id": "XY", "count": 1 }
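For completeness, here is a minimal PyMongo sketch of the same pipeline with an extra $sort stage, so the counts come back highest first as the question asked. Note the "$" prefix on the $unwind field path. The helper names element_count_pipeline and local_element_counts are hypothetical; running the pipeline assumes a reachable MongoDB server and a collection object like the one in the question. local_element_counts only mimics what $unwind + $group compute, using collections.Counter on in-memory documents, so the logic can be checked without a database.

```python
from collections import Counter


def element_count_pipeline(field):
    """Build a pipeline that counts every element of an array field.

    `field` is the array field name without the leading "$" (e.g. "somekey").
    """
    return [
        # One output document per array element
        {"$unwind": "$" + field},
        # Group identical elements and count them
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},
        # Highest counts first; ties broken by element value for a stable order
        {"$sort": {"count": -1, "_id": 1}},
    ]


def local_element_counts(docs, field):
    """Pure-Python equivalent of the pipeline above, for checking the logic offline."""
    counter = Counter()
    for doc in docs:
        # Assumes the field holds an array when present; documents without it
        # (or with null / an empty array) contribute nothing, as with $unwind
        counter.update(doc.get(field) or [])
    # Same ordering as the $sort stage: count descending, then _id ascending
    return [
        {"_id": value, "count": n}
        for value, n in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    ]


docs = [
    {"_id": 1, "somekey": ["AB", "CD"], "someotherkey": "X"},
    {"_id": 2, "somekey": ["AB", "XY"], "someotherkey": "Y"},
]
print(local_element_counts(docs, "somekey"))
# → [{'_id': 'AB', 'count': 2}, {'_id': 'CD', 'count': 1}, {'_id': 'XY', 'count': 1}]

# With a live server (assumption):
# for doc in collection.aggregate(element_count_pipeline("somekey")):
#     print(doc)
```

Sorting on _id as a secondary key is only there to make ties deterministic; drop it if the order of equal counts does not matter.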

Related

How can I limit text score with $gt operator in MongoDB?

I want to filter documents by text score using the $gt operator.
Using find(), I can sort the results by text score from highest to lowest, and with limit(1) I get the document with the highest score.
deneme = user.find(
{ '$text': { '$search': "dogan can" } },
{ 'score': { '$meta': "textScore" }})
deneme_sort = deneme.sort([('score', {'$meta': 'textScore'})]).limit(1)
But I don't want documents whose text score is below a value I choose to be listed.
For example, I don't want documents with a text score below 1.5 to appear. I'm trying to use the '$gt' operator for this, but I'm getting an error.
deneme = user.find(
{ '$text': { '$search': "dogan can" } },
{ 'score': { '$meta': "textScore"}}, {'score': { '$gt': 1.5 } })
TypeError: skip must be an instance of int
It gives this error because find() treats its third positional argument as skip, so the extra filter dict is rejected.
I also tried the '$and' operator, but this time it either does not recognize the '$meta' operator or complains that '$gt' must take two arguments:
deneme = user.find({ '$text': { '$search': "dogan can" }},
    {'$and': [{ 'score': { '$meta': "textScore" }}, {'score': { '$gt': 1.5 }}]})
for doc in deneme:
    print(doc)
Expression $gt takes exactly 2 arguments. 1 were passed in., full error: {'ok': 0.0, 'errmsg': 'Expression $gt takes exactly 2 arguments. 1 were passed in.', 'code': 16020, 'codeName': 'Location16020'}
I just started learning mongodb. Can you help me?
I think what you're requesting is documented here. In short, you need to use the aggregation framework with three stages:
1. Perform the initial $text search in a $match stage.
2. Persist the text score as part of the document via an $addFields stage.
3. Use an additional $match stage to apply the $gt filter to that new score field.
Given a collection with a text index on the key field and the following documents:
test> db.foo.find()
[
{ _id: 1, key: 'dogan can' },
{ _id: 2, key: 'dogan' },
{ _id: 3, key: 'can' },
{ _id: 4, key: 'abc' }
]
A text search for "dogan can" returns the first three documents:
test> db.foo.aggregate([ { $match: { $text: { $search: "dogan can" } } },{$addFields:{score:{$meta:'textScore'}}}])
[
{ _id: 3, key: 'can', score: 1.1 },
{ _id: 1, key: 'dogan can', score: 1.5 },
{ _id: 2, key: 'dogan', score: 1.1 }
]
After appending the final $match stage (with a threshold of 1.2), only one document is returned:
test> db.foo.aggregate([ { $match: { $text: { $search: "dogan can" } } },{$addFields:{score:{$meta:'textScore'}}},{$match:{score:{$gt:1.2}}}])
[
{ _id: 1, key: 'dogan can', score: 1.5 }
]
If desired, you can of course include a $sort stage on the score as well.
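The same three stages (plus the optional $sort) can be sent from PyMongo with collection.aggregate(). A minimal sketch; the helper name text_score_pipeline is hypothetical, and running it assumes a text index on the searched field, as $text always requires.

```python
def text_score_pipeline(search, min_score, limit=None):
    """Build a pipeline that keeps only $text matches scoring above min_score."""
    pipeline = [
        # 1. The $match containing $text must be the first stage of the pipeline
        {"$match": {"$text": {"$search": search}}},
        # 2. Copy the text score into a regular field
        {"$addFields": {"score": {"$meta": "textScore"}}},
        # 3. Filter on that field like any other number
        {"$match": {"score": {"$gt": min_score}}},
        # Optional: best matches first
        {"$sort": {"score": -1}},
    ]
    if limit is not None:
        pipeline.append({"$limit": limit})
    return pipeline


pipeline = text_score_pipeline("dogan can", 1.5)
print(pipeline[2])  # → {'$match': {'score': {'$gt': 1.5}}}
```

With the question's collection this would be used as `for doc in user.aggregate(text_score_pipeline("dogan can", 1.5)): print(doc)`; passing limit=1 reproduces the original "highest score only" behaviour while still honouring the threshold.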

pymongo - Update a data and access the found value

I am trying to update a value in an array stored in a MongoDB collection:
any_collection: {
{
"_id": "asdw231231",
"values": [
{
"item" : "a"
},
{
"item" : "b"
}
],
"role": "role_one"
},
...many similar
}
The idea is to access values and edit one of its entries with the following code, which I found in the MongoDB documentation:
conn.any_collection.find_one_and_update(
{
"_id": any_id,
"values.item": "b"
},
{
"$set": {
"values.$.item": "new_value" # the error occurs here, at ".$."
}
}
)
This should work, but I can't tell what the error is or what the correct syntax is in PyMongo. The error appears as soon as I add the "$".
The same update works fine in my FastAPI app:
@app.get("/find/{id}")
async def root(id: int):
    db = get_database()
    q = {'_id': 'asdw231231', 'values.item': 'b'}
    u = {'$set': {'values.$.item': 'new_value'}}
    c = db['any'].find_one_and_update(q, u)
    return {"message": c}
mongoplayground
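The filter and update above follow the documented form of the positional $ operator: in "values.$.item", the $ stands for the first element of values that matched the query condition "values.item": "b". The sketch below is a hypothetical pure-Python helper (apply_positional_set, not a PyMongo function) that mimics that rule on a plain dict to make the semantics concrete; it does not diagnose the error in the question. Separately, find_one_and_update returns the document as it was before the update unless you pass return_document=ReturnDocument.AFTER (ReturnDocument is importable from pymongo), which matters if you want to read back the modified value.

```python
import copy


def apply_positional_set(doc, array_field, match_key, match_value, new_value):
    """Mimic {"$set": {"<array_field>.$.<match_key>": new_value}} for a query on
    {"<array_field>.<match_key>": match_value}. Hypothetical helper for illustration.

    Returns (before, after) copies, echoing find_one_and_update's BEFORE/AFTER choice.
    """
    before = copy.deepcopy(doc)
    after = copy.deepcopy(doc)
    for element in after.get(array_field, []):
        # "$" refers to the FIRST element that matched the query condition
        if element.get(match_key) == match_value:
            element[match_key] = new_value
            break
    return before, after


doc = {
    "_id": "asdw231231",
    "values": [{"item": "a"}, {"item": "b"}],
    "role": "role_one",
}
before, after = apply_positional_set(doc, "values", "item", "b", "new_value")
print(before["values"])  # → [{'item': 'a'}, {'item': 'b'}]
print(after["values"])   # → [{'item': 'a'}, {'item': 'new_value'}]
```

Only the first matching element changes, which is why the positional form suits arrays where the matched value is unique; the original dict is left untouched because the helper works on deep copies.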

MongoDB Python MongoEngine - Returning Document by filter of Embedded Documents Sum of Filtered property

I am using Python and MongoEngine to try and query the below Document in MongoDB.
I need a query that efficiently returns only the documents whose embedded 'keywords' documents meet the following criteria:
Filter the keywords to those whose 'sfr' property is less than or equal to 100000
SUM (count) the filtered keywords
Return the parent documents where that count is greater than 9
Example structure:
{
"_id" : ObjectId("5eae60e4055ef0e717f06a50"),
"registered_data" : ISODate("2020-05-03T16:12:51.999+0000"),
"UniqueName" : "SomeUniqueNameHere",
"keywords" : [
{
"keyword" : "carport",
"search_volume" : NumberInt(10532),
"sfr" : NumberInt(20127),
"percent_contribution" : 6.47,
"competing_product_count" : NumberInt(997),
"avg_review_count" : NumberInt(143),
"avg_review_score" : 4.05,
"avg_price" : 331.77,
"exact_ppc_bid" : 3.44,
"broad_ppc_bid" : 2.98,
"exact_hsa_bid" : 8.33,
"broad_hsa_bid" : 9.29
},
{
"keyword" : "party tent",
"search_volume" : NumberInt(6944),
"sfr" : NumberInt(35970),
"percent_contribution" : 4.27,
"competing_product_count" : NumberInt(2000),
"avg_review_count" : NumberInt(216),
"avg_review_score" : 3.72,
"avg_price" : 210.16,
"exact_ppc_bid" : 1.13,
"broad_ppc_bid" : 0.55,
"exact_hsa_bid" : 9.66,
"broad_hsa_bid" : 8.29
}
]
}
From the research I have been doing, I believe an Aggregate type query might do what I am attempting.
Unfortunately, being new to MongoDB / MongoEngine, I am struggling to figure out how to structure the query and have failed to find an example similar to what I am attempting to do (RED FLAG RIGHT????).
I did find an example of an aggregate query, but I'm unsure how to structure my criteria in it. Maybe something like this is close, but it does not work:
pipeline = [
{
"$lte": {
"$sum" : {
"keywords" : {
"$lte": {
"keyword": 100000
}
}
}: 9
}
}
]
data = product.objects().aggregate(pipeline)
Any guidance would be greatly appreciated.
Thanks,
Ben
You can try something like this:
db.collection.aggregate([
{
$project: { // the first project to filter the keywords array
registered_data: 1,
UniqueName: 1,
keywords: {
$filter: {
input: "$keywords",
as: "item",
cond: {
$lte: [
"$$item.sfr",
100000
]
}
}
}
}
},
{
$project: { // the second project to get the length of the keywords array
registered_data: 1,
UniqueName: 1,
keywords: 1,
keywordsLength: {
$size: "$keywords"
}
}
},
{
$match: { // then keep documents with more than 9 matching keywords
keywordsLength: {
$gt: 9
}
}
}
])
You can test it on Mongo Playground.
Hope it helps.
Note: for simplicity, I only used the sfr property of the keywords array.
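If you only need the parent documents back unchanged, the $filter/$size logic can also be folded into a single $match using $expr (available since MongoDB 3.6), which avoids re-listing fields in $project. A minimal Python sketch, assuming the lowercase sfr field from the example document and "greater than 9" from the question (hence $gt); the helper names are hypothetical. With MongoEngine the list would be passed to product.objects().aggregate(...) as in the question, though the exact aggregate signature depends on the MongoEngine version.

```python
def keyword_threshold_pipeline(max_sfr=100000, min_count=9):
    """Match documents with more than `min_count` keywords whose sfr <= max_sfr."""
    matching_keywords = {
        "$filter": {
            # $ifNull guards against documents without a keywords array,
            # since $size raises an error on a missing/null value
            "input": {"$ifNull": ["$keywords", []]},
            "as": "item",
            "cond": {"$lte": ["$$item.sfr", max_sfr]},
        }
    }
    return [
        {
            "$match": {
                "$expr": {
                    # $size of the filtered array = number of qualifying keywords
                    "$gt": [{"$size": matching_keywords}, min_count]
                }
            }
        }
    ]


def count_matching_keywords(doc, max_sfr=100000):
    """Pure-Python equivalent of the $filter/$size part, for checking offline.

    Assumes every keyword has an sfr value.
    """
    return sum(1 for kw in doc.get("keywords", []) if kw["sfr"] <= max_sfr)


sample = {
    "keywords": [
        {"keyword": "carport", "sfr": 20127},
        {"keyword": "party tent", "sfr": 35970},
    ]
}
print(count_matching_keywords(sample))  # → 2, so this sample would not match (> 9 needed)
```

The single-stage form returns whole documents, whereas the $project-based answer above also exposes the filtered keywords and their count, which can be handy for debugging.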

How to find the count of the number of documents in mongodb using pymongo aggregation?

I'm trying to find the maximum value of a field across a number of documents, and I want the output to include not only that maximum value but also the total number of documents the aggregation matches.
I'm able to retrieve the "wait" field with the maximum value using the query below, but I'm stuck on how to get the count of all the documents that satisfy the query (the $match stage).
db = mongo_client[_MONGO_COLLECTION]
cursor = db.aggregate(
[
{"$match": { "owner": { "$exists": False}}},
{
"$project": {
"wait" : {
"$divide": [{"$subtract": [datetime.now(), "$creationDate"]}, 1000],
}
}
},
{
"$sort" : {
"wait": -1
}
}, {"$limit" : 1}
])
for x in cursor:
    print(x)
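Note that the CommandCursor returned by aggregate() has no count() method, and because of the $limit stage this pipeline returns at most one document, so counting its results would not give the number of matches anyway.
Instead, you can add a $count stage directly after the $match stage (in a separate aggregate call) to get the total:
{
"$count": "count" # the name of the count field
}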
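Another option, if you only need the maximum wait and the number of matching documents (not the whole top document), is to replace $sort/$limit with a single $group over all matched documents, so both values come back from one aggregation. A minimal sketch, assuming the same owner and creationDate fields as the question; the helper name is hypothetical, and now is passed in so the pipeline can be built and inspected without a server.

```python
from datetime import datetime


def max_wait_and_count_pipeline(now):
    """Return the max "wait" (seconds) and the number of unowned documents."""
    return [
        {"$match": {"owner": {"$exists": False}}},
        {
            "$project": {
                # Subtracting two dates gives milliseconds; divide by 1000 for seconds
                "wait": {"$divide": [{"$subtract": [now, "$creationDate"]}, 1000]}
            }
        },
        {
            "$group": {
                "_id": None,  # a single group containing every matched document
                "maxWait": {"$max": "$wait"},
                "count": {"$sum": 1},
            }
        },
    ]


pipeline = max_wait_and_count_pipeline(datetime.now())
# With a live server (assumption), using the object the question calls `db`:
# result = next(db.aggregate(pipeline), None)
# -> at most one document shaped like {"_id": None, "maxWait": ..., "count": ...}
```

If you also need fields of the document with the largest wait, a $facet stage running the original $sort/$limit branch next to a $count branch is the usual alternative.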

How to use mapreduce in mongodb?

I have the following code in python:
from pymongo import MongoClient
c = MongoClient()
db = c.twitter
ids = db.users_from_united_states.distinct("user.id")
for i in ids:
    # Number of documents this user has in the collection
    count = db.users_from_united_states.count_documents({"user.id": i})
    # Copy one document whose tweets_text array has exactly that many entries
    for u in db.users_from_united_states.find({"user.id": i, "tweets_text": {"$size": count}}).limit(1):
        db.my_usa_fitness_network.insert_one(u)
I need to get all the users and, for each user, find the document whose tweets_text array has as many entries as the number of times that user appears in the collection (meaning that this document contains ALL the tweets that the same user posted).
Then I need to save it in another collection, or just group it in the same collection.
When I run this code, it gives me fewer documents than there are distinct ids.
I saw something about mapReduce, but I just can't figure out how to use it in my case.
I also tried running similar code directly in the mongo shell, but it didn't work at all:
var ids = db.users_from_united_states.distinct("user.id")
for (var i = 0; i < ids.length; i++) {
    var count = db.users_from_united_states.find({"user.id": ids[i]}).count()
    db.users_from_united_states.find({"user.id": ids[i], "tweets_text": {$size: count}}).limit(1).forEach(function(doc) { db.my_usa_fitness_network.insert(doc) })
}
Can you help me please? I have a huge project and I need help. Thank you.
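Rather than mapReduce, you can use the aggregation framework: group the documents by user.id, push each tweet's text into an array, then project the user id, the number of tweets and the tweets themselves:
[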
{
"$group": {
"_id": "$user.id",
"my_fitness_data": {
"$push": "$text"
}
}
},
{
"$project": {
"UserId": "$_id",
"TweetsCount": {
"$size": "$my_fitness_data"
},
"Tweets": "$my_fitness_data"
}
}
]
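To save the grouped result into another collection, as the question asks, the same pipeline can end with an $out stage, which writes the output documents to the named collection (replacing that collection if it already exists). A minimal Python sketch, assuming each document has a text field as the pipeline above does; the target collection name comes from the question and the helper name is hypothetical.

```python
def tweets_per_user_pipeline(target_collection=None):
    """Group tweets by user.id and optionally write the result with $out."""
    pipeline = [
        # One group per user, collecting the text of each of their tweets
        {"$group": {"_id": "$user.id", "my_fitness_data": {"$push": "$text"}}},
        {
            "$project": {
                "UserId": "$_id",
                "TweetsCount": {"$size": "$my_fitness_data"},
                "Tweets": "$my_fitness_data",
            }
        },
    ]
    if target_collection is not None:
        # $out must be the last stage; it replaces target_collection if it exists
        pipeline.append({"$out": target_collection})
    return pipeline


pipeline = tweets_per_user_pipeline("my_usa_fitness_network")
# With a live server (assumption); allowDiskUse=True may be needed when the
# $group stage exceeds its memory limit on a large collection:
# from pymongo import MongoClient
# MongoClient().twitter.users_from_united_states.aggregate(pipeline, allowDiskUse=True)
```

Running one server-side aggregation this way replaces the per-user find() loop, so the whole job is a single round trip instead of two queries per distinct user.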
