How to use mapreduce in mongodb? - python

I have the following code in python:
from pymongo import Connection
import bson
c = Connection()
db = c.twitter
ids = db.users_from_united_states.distinct("user.id")
for i in ids:
    count = db.users_from_united_states.find({"user.id":i}).count()
    for u in db.users_from_united_states.find({"user.id":i, "tweets_text": {"$size": count}}).limit(1):
        db.my_usa_fitness_network.insert(u)
I need to go through all the users and, for each user, find the document whose tweets_text array has as many entries as the number of times that user appears in the collection (meaning that this document contains ALL the tweets that the user posted).
Then, I need to save it in another collection, or just group it on the same collection.
When I run this code, it produces fewer documents than there are distinct user ids.
I saw something about mapReduce but I just can't figure out how to use it in my case.
I tried to run another code directly on mongodb but it hasn't worked at all:
var ids = db.users_from_united_states.distinct("user.id")
for (i=0; i< ids.length; i++){
    var count = db.users_from_united_states.find({"user.id":ids[i]}).count()
    db.users_from_united_states.find({"user.id":ids[i], "tweets_text": {$size: count}}).limit(1).forEach(function(doc){db.my_usa_fitness_network.insert(doc)})
}
Can you help me please? I have a huge project and I need help. Thank you.

[
    {
        "$group": {
            "_id": "$user.id",
            "my_fitness_data": {
                "$push": "$text"
            }
        }
    },
    {
        "$project": {
            "UserId": "$_id",
            "TweetsCount": {
                "$size": "$my_fitness_data"
            },
            "Tweets": "$my_fitness_data"
        }
    }
]
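As a rough illustration of what the pipeline above computes, here is a minimal Python sketch. It assumes, as the pipeline does, one document per tweet with the tweet body in a `text` field and the author in `user.id`. The helper `group_tweets` and the sample documents are hypothetical and only imitate the two stages in memory; they are not part of PyMongo.

```python
from collections import defaultdict

# The same two stages as above, written as a Python list for PyMongo.
PIPELINE = [
    {"$group": {"_id": "$user.id", "my_fitness_data": {"$push": "$text"}}},
    {"$project": {
        "UserId": "$_id",
        "TweetsCount": {"$size": "$my_fitness_data"},
        "Tweets": "$my_fitness_data",
    }},
]


def group_tweets(docs):
    """In-memory imitation of PIPELINE (hypothetical helper, illustration only)."""
    tweets_by_user = defaultdict(list)
    for doc in docs:
        tweets_by_user[doc["user"]["id"]].append(doc["text"])
    return [
        {"_id": uid, "UserId": uid, "TweetsCount": len(tweets), "Tweets": tweets}
        for uid, tweets in tweets_by_user.items()
    ]


# Made-up sample data: one document per tweet.
sample = [
    {"user": {"id": 1}, "text": "ran 5k"},
    {"user": {"id": 2}, "text": "gym day"},
    {"user": {"id": 1}, "text": "rest day"},
]
grouped = group_tweets(sample)

# Against a real server (database and collection names taken from the question):
#   results = db.users_from_united_states.aggregate(PIPELINE)
```

To store the grouped documents in another collection, one option is to append an `$out` stage such as `{"$out": "my_usa_fitness_network"}` to the pipeline.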

Related

How to use the sum of two fields when searching for a document in MongoDB?

I have a collection of accounts and I am trying to find an account in which the targetAmount >= totalAmount + N
{
"_id": {
"$oid": "60d097b761484f6ad65b5305"
},
"targetAmount": 100,
"totalAmount": 0,
"highPriority": false,
"lastTimeUsed": 1624283088
}
Now I just select all accounts, iterate over them and check if the condition is met. But I'm trying to do this all in a query:
amount = 10
tasks = ProviderAccountTaskModel.objects(
__raw__={
'targetAmount': {
'$gte': {'$add': ['totalAmount', amount]}
}
}
).order_by('-highPriority', 'lastTimeUsed')
I have also tried using the $sum, but both options do not work.
Can't it be used when searching, or am I just going the wrong way?
You can use $where. Be aware that it is fairly slow (it has to execute JavaScript on every document), so combine it with indexed query conditions if you can.
db.getCollection('YourCollectionName').find( { $where: function() { return this.targetAmount >= (this.totalAmount + 10) } })
or, more compactly:
db.getCollection('YourCollectionName').find( { $where: "this.targetAmount >= this.totalAmount + 10" })
A plain query operator such as $gte cannot reference another field of the same document or perform arithmetic, so use $expr, shown here inside an aggregation pipeline.
Below is the aggregation command you are looking for. Convert it into the equivalent MongoEngine command.
db.collection.aggregate([
    {
        "$match": {
            "$expr": {
                "$gte": [
                    "$targetAmount",
                    {
                        "$sum": [
                            "$totalAmount",
                            10
                        ]
                    },
                ],
            },
        },
    },
    {
        "$sort": {
            "highPriority": -1,
            "lastTimeUsed": 1,
        },
    },
])
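For the Python side, a minimal sketch of the same condition as a query dictionary might look like the following. It uses `$add` in place of `$sum` (a substitution, not what the answer above wrote) and relies on `$expr` being accepted in a `find` filter, which needs MongoDB 3.6 or later. The names `build_filter` and `matches` and the `db.accounts` collection are hypothetical.

```python
def build_filter(amount):
    """Query filter for: targetAmount >= totalAmount + amount (server-side)."""
    return {
        "$expr": {
            "$gte": ["$targetAmount", {"$add": ["$totalAmount", amount]}]
        }
    }


def matches(doc, amount):
    """Pure-Python equivalent of the filter, used here only to check the logic."""
    return doc["targetAmount"] >= doc["totalAmount"] + amount


# The sample account from the question.
account = {"targetAmount": 100, "totalAmount": 0,
           "highPriority": False, "lastTimeUsed": 1624283088}
query = build_filter(10)

# With PyMongo (the collection name is an assumption):
#   cursor = db.accounts.find(query).sort(
#       [("highPriority", -1), ("lastTimeUsed", 1)])
```

The same dictionary could presumably be passed as `__raw__` to the model query in the question, although that has not been verified here.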

MongoDB (PyMongo) Pagination with distinct not giving consistent result

I am trying to achieve pagination with distinct using pymongo.
I have records
{
name: string,
roll: integer,
address: string,
.
.
}
I only want the name from each record. Names can be duplicated, so I want distinct names with pagination.
result = collection.aggregate([
    {'$sort':{"name":1}},
    {'$group':{"_id":"$name"}},
    {'$skip':skip},
    {'$limit':limit}
])
The problem is that each time I run this query, I get a different result for the same page number.
I looked into this answer:
Distinct() command used with skip() and limit()
but it didn't help in my case.
How do I resolve this?
Thanks in advance!
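Sorting after the $group stage instead of before it seems to solve the problem. $group does not guarantee the order of its output documents, so the $sort has to come after it for $skip and $limit to page through a stable order: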
db.collection.aggregate([
    {
        "$group": {
            "_id": "$name"
        }
    },
    {
        "$sort": {
            "_id": 1
        }
    },
    {
        "$skip": 0
    },
    {
        "$limit": 1
    }
])
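In PyMongo terms, the group-then-sort ordering could be wrapped in a small helper like the sketch below. The function names and the zero-based `page` parameter are assumptions made for illustration; `paginate_in_memory` only imitates the pipeline on a Python list so the paging logic can be checked without a server.

```python
def distinct_names_page(page, page_size):
    """Pipeline returning one page of distinct names in a stable order."""
    return [
        {"$group": {"_id": "$name"}},   # one document per distinct name
        {"$sort": {"_id": 1}},          # sort the groups, not the input
        {"$skip": page * page_size},    # page is zero-based (assumption)
        {"$limit": page_size},
    ]


def paginate_in_memory(docs, page, page_size):
    """In-memory imitation of the pipeline, for illustration only."""
    names = sorted({d["name"] for d in docs})
    start = page * page_size
    return names[start:start + page_size]


# Made-up records with a duplicated name.
records = [{"name": "bob"}, {"name": "amy"}, {"name": "bob"}, {"name": "cat"}]
first_page = paginate_in_memory(records, 0, 2)
second_page = paginate_in_memory(records, 1, 2)

# With PyMongo:
#   result = collection.aggregate(distinct_names_page(0, 2))
```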

Update nested Python list key using pop

I have a json file called pool.json which contains this:
{
"pools": {
"$poolId": {
"nodes": {
"$nodeId": {
"bcm": {
"address": {
"ip": "10.10.10.10"
},
"password": "ADMIN",
"username": "ADMIN"
}
}
}
}
}
}
This is my Python code:
pool_id = ['123456']
json_pool = json.loads(read_json_file('pool.json'))
for i in pool_id:
json_pool['pools'][i] = json_pool.pop(['pools']['$poolId'])
print('json_pool: %s' % json_pool)
I'm trying to replace $poolId with the value in pool_id (I know I've only got one pool_id; I just want to get this piece working before I do anything else). I've been trying to do this with pop but am having no success when the key is nested, as in this case. I can get it working when I want to change a top-level key. What am I doing wrong?
I think you want to execute json_pool['pools'].pop('$poolId') instead of json_pool.pop(['pools']['$poolId']).
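A minimal runnable sketch of that fix follows. The contents of pool.json are inlined as a string so the example runs on its own; the `read_json_file` helper from the question is not reproduced here.

```python
import json

# Template mirroring pool.json from the question, inlined for the sketch.
template = """
{"pools": {"$poolId": {"nodes": {"$nodeId": {"bcm": {
    "address": {"ip": "10.10.10.10"},
    "password": "ADMIN", "username": "ADMIN"}}}}}}
"""

json_pool = json.loads(template)
pool_id = ['123456']

for i in pool_id:
    # pop() removes the placeholder key from the inner dict and returns its
    # value, which is then stored under the real pool id.
    json_pool['pools'][i] = json_pool['pools'].pop('$poolId')

print('json_pool: %s' % json_pool)
```

With more than one entry in pool_id, the second iteration would raise a KeyError because the placeholder has already been removed; copying the template value for each id (for example with `copy.deepcopy`) would be one way around that.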

How to find the count of the number of documents in mongodb using pymongo aggregation?

I'm trying to find the max value of a field from a number of documents and want the output to not only reflect the max value of the field but also the total count of documents that the aggregate query will retrieve.
I'm able to retrieve the "wait" field with the max value using the query below, but I'm stuck on how to get the count of all the documents that satisfy the query (the $match stage).
db = mongo_client[_MONGO_COLLECTION]
cursor = db.aggregate(
    [
        {"$match": { "owner": { "$exists": False}}},
        {
            "$project": {
                "wait" : {
                    "$divide": [{"$subtract": [datetime.now(), "$creationDate"]}, 1000],
                }
            }
        },
        {
            "$sort" : {
                "wait": -1
            }
        }, {"$limit" : 1}
    ])
for x in cursor:
    print(x)
The cursor returned by aggregate() has no count() method, so materialize the results and take their length:
docs = list(cursor)
print(len(docs))
print(docs)
or
you can add a $count stage to the pipeline:
{
    "$count": "count"  # the name of the count field
}
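Note that in the question's pipeline the `$limit` stage leaves a single document, so anything counted after it will be 1. One possible way to get both numbers in a single query is to replace the `$sort` and `$limit` stages with a `$group` stage, as in the sketch below. The output field names `maxWait` and `count` and both helper functions are hypothetical; the in-memory version exists only to check the logic without a server.

```python
from datetime import datetime


def max_wait_and_count_pipeline(now):
    """Pipeline yielding one document with the largest wait and the match count."""
    return [
        {"$match": {"owner": {"$exists": False}}},
        {"$project": {
            # Date difference is in milliseconds; dividing by 1000 gives seconds.
            "wait": {"$divide": [{"$subtract": [now, "$creationDate"]}, 1000]}
        }},
        {"$group": {
            "_id": None,                  # a single group over all matches
            "maxWait": {"$max": "$wait"},
            "count": {"$sum": 1},
        }},
    ]


def max_wait_and_count_in_memory(docs, now):
    """In-memory imitation of the pipeline, for illustration only."""
    waits = [(now - d["creationDate"]).total_seconds()
             for d in docs if "owner" not in d]
    return {"maxWait": max(waits), "count": len(waits)}


# Made-up sample documents.
now = datetime(2021, 1, 1, 0, 1, 0)
docs = [
    {"creationDate": datetime(2021, 1, 1, 0, 0, 0)},
    {"creationDate": datetime(2021, 1, 1, 0, 0, 30)},
    {"creationDate": datetime(2021, 1, 1, 0, 0, 45), "owner": "someone"},
]
summary = max_wait_and_count_in_memory(docs, now)

# With PyMongo (the collection object is an assumption):
#   cursor = collection.aggregate(max_wait_and_count_pipeline(datetime.now()))
```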

Pymongo count elements collected out of all documents with key

I want to count all elements which occur in somekey in an MongoDB collection.
The current code looks at all elements in somekey as a whole.
from pymongo import Connection
con = Connection()
db = con.database
collection = db.collection
from bson.code import Code
reducer = Code("""
    function(obj, prev){
        prev.count++;
    }
    """)
from bson.son import SON
results = collection.group(key={"somekey":1}, condition={}, initial={"count": 0}, reduce=reducer)
for doc in results:
    print doc
However, I want that it counts all elements which occur in any document with somekey.
Here is an anticipated example. The MongoDB has the following documents.
{ "_id" : 1, "somekey" : ["AB", "CD"], "someotherkey" : "X" }
{ "_id" : 2, "somekey" : ["AB", "XY"], "someotherkey" : "Y" }
The result should be a list ordered by count:
count: 2 "AB"
count: 1 "CD"
count: 1 "XY"
The .group() method will not work on elements that are arrays, and the closest similar thing would be mapReduce where you have more control over the emitted keys.
But really the better fit here is the aggregation framework. It is implemented in native code and does not use JavaScript interpreter processing as the other methods there do.
You won't be getting an "ordered list" from MongoDB responses, but you get a similar document result:
results = collection.aggregate([
    # Unwind the array (the field path needs the "$" prefix)
    { "$unwind": "$somekey" },
    # Group the results and count
    { "$group": {
        "_id": "$somekey",
        "count": { "$sum": 1 }
    }}
])
Gives you something like:
{ "_id": "AB", "count": 2 }
{ "_id": "CD", "count": 1 }
{ "_id": "XY", "count": 1 }
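To get the counts ordered as in the question, a `$sort` stage can be appended. The sketch below shows one possible version, together with a pure-Python imitation (using `collections.Counter`) that runs without a server. The names `ELEMENT_COUNT_PIPELINE` and `count_elements` are hypothetical, and breaking ties on the element value is an added assumption.

```python
from collections import Counter

# Unwind, count per element, then order by count (descending).
ELEMENT_COUNT_PIPELINE = [
    {"$unwind": "$somekey"},
    {"$group": {"_id": "$somekey", "count": {"$sum": 1}}},
    {"$sort": {"count": -1, "_id": 1}},
]


def count_elements(docs):
    """In-memory imitation of the pipeline, for illustration only."""
    counts = Counter(item for d in docs for item in d.get("somekey", []))
    # Highest count first; ties broken alphabetically by element.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# The two documents from the question.
docs = [
    {"_id": 1, "somekey": ["AB", "CD"], "someotherkey": "X"},
    {"_id": 2, "somekey": ["AB", "XY"], "someotherkey": "Y"},
]
ordered = count_elements(docs)
for value, count in ordered:
    print("count: %d %s" % (count, value))

# With PyMongo:
#   for doc in collection.aggregate(ELEMENT_COUNT_PIPELINE):
#       print(doc)
```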
