python: how to find documents with specific fields

I am using Python and MongoDB. I have a collection that contains 40,000 documents. I have a group of coordinates and I need to find which document each coordinate belongs to. Currently I am doing:
cell_start = citymap.find({"cell_latlng":{"$geoIntersects":{"$geometry":{"type":"Point", "coordinates":orig_coord}}}})
This is a standard GeoJSON query and it works well. I also know that some documents have a field like this:
{'trips_dest':......}
The value of this field is not important, so I've omitted it. The point is that, instead of searching all 40,000 documents, I could search only the documents that have a field called 'trips_dest'.
Since only about 40% of the documents have the field 'trips_dest', I think this would improve efficiency. However, I don't know how to modify my code to do that. Any ideas?

You need the $exists query operator. Something like this (in Python the operator names are quoted strings and true is spelled True):
cell_start = citymap.find({"trips_dest": {"$exists": True},
                           "cell_latlng": {"$geoIntersects": {"$geometry": {"type": "Point", "coordinates": orig_coord}}}})
To quote the documentation:
Syntax: { field: { $exists: <boolean> } }
When <boolean> is true, $exists matches the documents that contain the field, including documents where the field value is null.
If you need to reject null values, use:
"trips_dest": {"$exists": True, "$ne": None}
As a final note, a sparse index on trips_dest might speed up such a query.
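To make the Python spelling of these operators explicit, here is a minimal sketch that builds the combined filter. The helper name build_cell_query and its require_non_null flag are hypothetical, not part of PyMongo; the function only assembles the dict you would pass to citymap.find().

```python
def build_cell_query(orig_coord, require_non_null=False):
    """Filter for documents whose cell_latlng geometry intersects the point
    orig_coord and that also have a trips_dest field.

    build_cell_query is a hypothetical helper. orig_coord is assumed to be
    a [longitude, latitude] pair, which is the order GeoJSON expects.
    """
    trips_cond = {"$exists": True}
    if require_non_null:
        # Also reject documents where trips_dest is present but null (None).
        trips_cond["$ne"] = None
    return {
        "trips_dest": trips_cond,
        "cell_latlng": {
            "$geoIntersects": {
                "$geometry": {"type": "Point", "coordinates": orig_coord}
            }
        },
    }
```

Usage would be cell_start = citymap.find(build_cell_query(orig_coord)). If you try the sparse index (citymap.create_index("trips_dest", sparse=True)), check with the cursor's .explain() whether the query planner actually uses it rather than an index on cell_latlng.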

Related

Retrieve data from couchdb (how to verify that a value is in a database)

How can I retrieve data from a CouchDB database, not from a single document but across all the documents in the database? I have already checked whether a value is in one document, but now I want to check whether a value is anywhere in a specific database.
It's hard to be specific here, since you're not providing any example data. Do you mean if a particular value exists anywhere in any document, or if a specific field holds a specific value in any document?
If you mean the former, I'd advise you to come up with a better data model, so let's focus on the latter. Let's say your documents contain a field called special, and you want to know whether any document has the value 99 in this field:
{
...
"special": 99
}
Create a view keyed on the value:
function(doc) {
if (doc && doc.special) {
emit(doc.special, null);
}
}
Now you can check whether any document in this database has a given value in the special field:
# Look for the value 99
% acurl 'https://XYZ.cloudant.com/aaa/_design/test/_view/special?key=99'
{"total_rows":3,"offset":2,"rows":[
{"id":"a3c424e99f3cc9988a2553bb680ac7f8","key":99,"value":null}
]}
In my database aaa there is one document, with the id a3c424e9..., that has the value 99 in the special field. This is a so-called reverse index.
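The same lookup can be made from Python. This is a sketch under assumptions: the helper names view_url and value_exists are hypothetical, and the base URL, database, design document and view names are taken from the example above. The one detail that matters is that CouchDB view parameters such as key are JSON values, so a string key has to be sent quoted.

```python
import json
from urllib.parse import quote


def view_url(base_url, db, ddoc, view, key):
    """Build a view URL that looks up a single key.

    The key is JSON-encoded (99 stays 99, "abc" becomes "abc" with quotes)
    and then percent-encoded for use in the query string.
    """
    encoded_key = quote(json.dumps(key))
    return f"{base_url}/{db}/_design/{ddoc}/_view/{view}?key={encoded_key}"


def value_exists(view_response):
    """Return True if a parsed view response contains at least one row."""
    return len(view_response.get("rows", [])) > 0
```

To fetch the response you could pass the URL to urllib.request.urlopen and parse it with json.load, adding whatever authentication your server requires.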

Pymongo: update document only if value of a field matches a provided value

Is it possible to update a document based on the condition that the value of a field in that document matches a value that I provide?
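I can easily do this in two steps but was wondering if there was a way to do it in a single call to MongoDB Atlas.
It can be done in a single step; you don't need two. Put the field condition in the filter of the update, so the update only applies to a document whose field holds the provided value.
In PyMongo the filter uses the same syntax as an ordinary query: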
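# doc_id, expected_value, new_value and the "status" field are placeholders
db.collection.update_one(
    {"_id": doc_id, "status": expected_value},  # filter: matches only if the field has the provided value
    {"$set": {"status": new_value}},            # fields to update
    upsert=False,                               # options: don't insert a new document if nothing matches
)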
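To tell afterwards whether the condition held, you can inspect the UpdateResult that update_one returns. A minimal sketch follows; the helper name update_if_field_matches is hypothetical, and it takes the collection as a parameter so any PyMongo Collection can be passed in.

```python
def update_if_field_matches(collection, doc_id, field, expected, updates):
    """Apply `updates` to the document with `doc_id` only if `field`
    currently equals `expected`, in a single update_one call.

    Returns True if a document matched the filter. matched_count is used
    rather than modified_count, because a matching document that already
    holds the new values is matched but not modified.
    """
    result = collection.update_one(
        {"_id": doc_id, field: expected},  # both conditions must hold
        {"$set": updates},
        upsert=False,  # never create a document when the filter fails
    )
    return result.matched_count == 1
```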

PyMongo get to find json object based on field value

I created a MongoDB collection with documents like this:
{
"FR" : {...},
"EN": {...}
}
I'm pretty new to MongoDB/PyMongo, so I was wondering whether there is a way to get the data based on the key (FR or EN)?
I've tried this: db.collection.find_one({'EN'}) but it did not work.
find() takes a filter and a projection as its first two parameters. The filter determines which documents are returned and the projection determines which fields are returned.
So to get the data you are interested in, use:
for doc in db.collection.find({}, {'EN': 1}):
    print(doc.get('EN'))
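Note that {'EN'} in the question is a Python set literal, not a dict, so it does not express "has the key EN". A small sketch that builds a proper filter and projection for one language key follows; the helper name language_query is hypothetical. The filter skips documents without that key, and the projection also suppresses _id.

```python
def language_query(lang):
    """Return (filter, projection) for fetching only the `lang` sub-document."""
    filter_ = {lang: {"$exists": True}}   # only documents that have this key
    projection = {lang: 1, "_id": 0}      # return just that key, without _id
    return filter_, projection
```

With a collection this could be used as doc = db.collection.find_one(*language_query("EN")), checking that doc is not None before reading doc["EN"].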

Pymongo aggregate: filter by count of fields number (dynamic)

Let's say I have an aggregation pipeline that for now leads to a collection with documents built like this:
{'name': 'Paul',
'football_position': 'Keeper',
'basketball_position': 4,...}
Obviously not everyone plays every sport, so some documents lack some of these fields. For someone who plays no sport at all, the document would just be:
{'name': 'Louis'}
Inside my aggregation pipeline, I want to keep only the people who play at least one sport.
I know that this is easy to check for one field with {'$match': {'football_position': {'$exists': True}}}, but I want to check if any of these fields exist.
I found a somewhat similar old question (Check for existence of multiple fields in MongoDB document), but it checks for the existence of all the fields, which, while tedious, could be achieved by chaining multiple $match stages. Also, maybe MongoDB now has a better way to handle this than writing a custom JavaScript function.
maybe mongoDB has now a better way to handle this
Yes, you can now utilise the aggregation operator $objectToArray (SERVER-23310) to turn keys into values. This makes it possible to count a 'dynamic' number of fields. Combining this operator with $addFields could be quite useful.
Both operators are available in MongoDB v3.4.4+.
Using your documents above as example:
db.sports.aggregate([
{ $addFields :
{ "numFields" :
{ $size:
{ $objectToArray:"$$ROOT"}
}
}
},
{ $match:
{ numFields:
{$gt:2}
}
}
])
The aggregation pipeline above first adds a field called numFields. Its value is the size of the array produced by $objectToArray, which has one {k, v} element per field, so the size equals the number of fields in the document. The second stage then keeps only documents with more than two fields (two because every document still has _id plus name).
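To see what the two stages compute, here is a pure-Python imitation of them on the example documents. It does not talk to MongoDB; the helper names object_to_array and num_fields and the _id values are hypothetical, chosen only to mirror the pipeline.

```python
def object_to_array(doc):
    """Imitate $objectToArray: one {"k": key, "v": value} pair per field."""
    return [{"k": k, "v": v} for k, v in doc.items()]


def num_fields(doc):
    """Imitate {"$size": {"$objectToArray": "$$ROOT"}} for one document."""
    return len(object_to_array(doc))


people = [
    {"_id": 1, "name": "Paul", "football_position": "Keeper", "basketball_position": 4},
    {"_id": 2, "name": "Louis"},
]
# Equivalent of the second stage, {"$match": {"numFields": {"$gt": 2}}}
players = [p["name"] for p in people if num_fields(p) > 2]
```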
In PyMongo, the above aggregation pipeline would look like:
cursor = collection.aggregate([
{"$addFields":{"numFields":
{"$size":{"$objectToArray":"$$ROOT"}}}},
{"$match":{"numFields":{"$gt":2}}}
])
Having said that, if possible for your use case, I would suggest reconsidering your data model for easier access, e.g. adding a field that tracks the number of sports and updating it whenever a new sport position is inserted.
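If the sport field names are known in advance, another option is a $match stage that combines one $exists test per field with $or, which answers "does any of these fields exist" directly and ignores unrelated extra fields that numFields would count. A minimal sketch; the helper name any_field_exists_stage and the field list are assumptions based on the example documents.

```python
def any_field_exists_stage(fields):
    """Build a $match stage keeping documents that have at least one of `fields`."""
    return {"$match": {"$or": [{f: {"$exists": True}} for f in fields]}}


# Hypothetical list of sport fields; extend it as new sports are added.
SPORT_FIELDS = ["football_position", "basketball_position"]
stage = any_field_exists_stage(SPORT_FIELDS)
```

The stage can be appended to an existing pipeline, e.g. collection.aggregate(pipeline + [stage]). The trade-off is that the list must be kept up to date, whereas the $objectToArray approach needs no field names.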

pymongo saving embedded objectIds, InvalidDocumentError

Using the bare pymongo driver to connect Python to MongoDB, why does using an ObjectId instance as a key in an embedded document raise an InvalidDocument error?
I am trying to link documents using ObjectIds and can't understand why I would need to convert them to strings when the ones created automatically by the driver are ObjectId instances.
from bson import ObjectId
item = collection.find_one({'x': 'foo'})  # find() returns a cursor, find_one() a dict
item['otherstuff'] = {ObjectId(): 'data about this link'}
collection.replace_one({'x': 'foo'}, item)
bson.errors.InvalidDocument: documents must have only string keys, key was ObjectId('4f0b5d4e764df61c67000000')
In practice the linked ids represent documents that contain questions, and the values in the dictionary here keyed as 'otherstuff' for example would represent this individual document's responses to that particular question.
Is there a reason ObjectIds used like this can't be encoded into BSON? Is it impossible to nest ObjectIds within documents like this to cross-reference them? Have I misunderstood their purpose?
The BSON spec dictates that keys must be strings, so PyMongo is right to reject this as an invalid document (and would be regardless of at what level an ObjectId was used as a key, whether at the top level or in an embedded document). This is necessary, among other reasons, so that the query language can be unambiguous. Imagine you had this document (and that it were a valid BSON document):
{ _id: ...,
"4f0cbe6d7f40d36b24a5c4d7": true,
ObjectId("4f0cbe6d7f40d36b24a5c4d7"): false
}
And then you attempted to query with:
db.foo.find({"4f0cbe6d7f40d36b24a5c4d7": false})
Should this return this document? Should that string be auto-boxed into an ObjectId? How would Mongo know when that can be auto-boxed, and how to disambiguate in cases like this document?
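If you want to catch such keys before the driver does, a small pre-flight check can walk the document and report every non-string key. This is a hypothetical helper, not a PyMongo function, and it covers only the non-string-key case from the question, not every rule the encoder enforces.

```python
def find_non_string_keys(doc, path=""):
    """Return dotted paths of every key in `doc` (including keys inside nested
    dicts and lists of dicts) that is not a str."""
    bad = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            here = f"{path}.{key}" if path else str(key)
            if not isinstance(key, str):
                bad.append(here)
            bad.extend(find_non_string_keys(value, here))
    elif isinstance(doc, list):
        for i, item in enumerate(doc):
            bad.extend(find_non_string_keys(item, f"{path}.{i}" if path else str(i)))
    return bad
```

In the question's case the offending key would be an ObjectId; any non-str key is reported the same way.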
A possible alternative solution to your problem is to have an array of embedded documents like:
{ answers: [
{ answer_id: ObjectId("..."), summary: "Good answer to this question" },
{ answer_id: ObjectId("..."), summary: "Bad answer to this question" }
]
}
This is valid BSON, and will also be indexable more efficiently. If you add an index on answers, you can search efficiently for exact matches on these subdocuments; if you add an index on answers.answer_id, then you can search efficiently by the ObjectId of the answer you're looking for (and so on).
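Converting the question's {id: data} mapping into that array shape is mechanical. A minimal sketch with hypothetical helper names; in real code the ids would be bson ObjectId instances, which are fine as values even though they cannot be keys.

```python
def to_answer_array(answers_by_id):
    """Turn a {answer_id: summary} mapping into a list of subdocuments,
    so each id becomes a value instead of a key."""
    return [
        {"answer_id": answer_id, "summary": summary}
        for answer_id, summary in answers_by_id.items()
    ]


def answer_filter(answer_id):
    """Filter matching documents that contain an answer with this id."""
    return {"answers.answer_id": answer_id}
```

You would store the list under answers, create the index with collection.create_index("answers.answer_id"), and query with collection.find(answer_filter(some_id)).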
