PyMongo get to find json object based on field value - python

I created a MongoDB with a collection like so:
{
"FR" : {...},
"EN": {...}
}
I'm pretty new to the world of MongoDB/PyMongo, so I was wondering if there is a way to get the data based on a key (FR or EN).
I've tried this: db.collection.find_one({'EN'}) but it did not work.
Cheers,

find() takes a filter and a projection as its first two parameters. The filter determines which documents are returned, and the projection determines which fields are returned.
So to get the data you are interested in use:
for doc in db.collection.find({}, {'EN': 1}):
    print(doc.get('EN'))
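The call in the question was rejected because {'EN'} is a Python set, not a dict, and find_one() expects a mapping as its filter. If only one top-level key is needed, a filter on that key's existence combined with a projection also works with find_one(). The following is a minimal sketch, assuming documents shaped like the one in the question; the helper name get_by_key is made up for illustration, and collection stands for any PyMongo collection object:

```python
def get_by_key(collection, key):
    """Return the value stored under a top-level key, or None if no document has it."""
    # The filter keeps only documents that contain the key;
    # the projection returns just that key and drops _id.
    doc = collection.find_one({key: {"$exists": True}}, {key: 1, "_id": 0})
    return doc.get(key) if doc else None

# Usage (assumes a connected PyMongo database object named db):
# en_data = get_by_key(db.collection, "EN")
```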

Related

Elasticsearch query to match values from list of values in Excel [Python]

I'm new to Elasticsearch. I have a list of values for example:
id_list=[1111,2222,3333,4444,5555]
Now I want to match the ids in id_list against records stored in Elasticsearch that have the same id. I'm thinking of using a for loop to run the ES query for each id, but I'm not sure exactly how to do that.
I know that a for loop can run through all values in the list:
for id in id_list:
    print(id)
I am able to search for the ids one by one using the ES query below:
query = {"bool":
    {"must":
        [{"match": {"id_list": "1111"}}]
    }}
Is there a way to loop over the list so that I don't have to key in each id manually like above?
Thanks!
You can use the Elasticsearch terms query to match a list of ids:
{
  "query": {
    "terms": {
      "id_list": [1111, 2222, 3333, 4444, 5555]
    }
  }
}
Update based on the comments:
As the documentation mentions, a terms query accepts a maximum of 65,536 terms by default:
By default, Elasticsearch limits the terms query to a maximum of
65,536 terms. This includes terms fetched using terms lookup. You can
change this limit using the index.max_terms_count setting.
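From Python, the same query body can be built directly from id_list, with no per-id loop. The sketch below also splits long lists into chunks that stay under the default limit quoted above. The function name build_terms_queries and the index name in the comment are assumptions for illustration, not part of the original answer:

```python
def build_terms_queries(ids, field="id_list", max_terms=65536):
    """Yield one terms-query body per chunk of at most max_terms ids."""
    ids = list(ids)
    for start in range(0, len(ids), max_terms):
        chunk = ids[start:start + max_terms]
        yield {"query": {"terms": {field: chunk}}}

id_list = [1111, 2222, 3333, 4444, 5555]
bodies = list(build_terms_queries(id_list))
print(bodies[0])

# Each body can then be passed to the search call of your Elasticsearch
# client; the exact call depends on the client version, for example
# (assumption): es.search(index="my-index", body=bodies[0])
```

With five ids this yields a single query body; chunking only matters once the list grows beyond max_terms.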

How to list the outer-most fields of a large document using PyMongo?

I am having difficulty understanding how to list the outer-most fields when working with a huge text dataset. I am trying to implement it with MongoDB and PyMongo. Any suggestions?
I am not sure what you need, but maybe the below can help.
Query
This query returns an array with the names of all the outer (top-level) fields:
$objectToArray converts the $$ROOT document into an array of key/value pairs
$arrayElemAt takes the first member of each pair, which is the field name
PlayMongo
aggregate(
[{"$project":
  {"_id": 0,
   "outer-fields":
   {"$map":
    {"input":
     {"$map":
      {"input": {"$objectToArray": "$$ROOT"},
       "in": ["$$m.k", "$$m.v"],
       "as": "m"}},
     "in": {"$arrayElemAt": ["$$this", 0]}}}}}])

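In PyMongo the same idea can be written as a plain list of dicts. The sketch below is a slightly shorter variant that reads the key (k) straight from each $objectToArray entry instead of building pairs first. The helper names are made up for illustration. For a document that has already been fetched, the top-level keys of the returned dict give the same answer on the client side:

```python
def outer_fields_pipeline():
    """Aggregation pipeline that lists the top-level field names of each document."""
    return [
        {"$project": {
            "_id": 0,
            "outer-fields": {
                "$map": {
                    # One {k, v} entry per top-level field of the document.
                    "input": {"$objectToArray": "$$ROOT"},
                    "as": "m",
                    "in": "$$m.k",
                }
            },
        }}
    ]

def outer_fields(doc):
    """Client-side equivalent for a document that is already in memory."""
    return list(doc.keys())

# Usage (assumes a PyMongo collection object named collection):
# for row in collection.aggregate(outer_fields_pipeline()):
#     print(row["outer-fields"])
print(outer_fields({"_id": 1, "title": "x", "body": {"a": 1}}))
```

For very large documents the server-side pipeline avoids transferring the field values, since only the names are returned.
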
Pymongo aggregate: filter by count of fields number (dynamic)

Let's say I have an aggregation pipeline that for now leads to a collection with documents built like this:
{'name': 'Paul',
'football_position': 'Keeper',
'basketball_position': 4,...}
Obviously not everyone plays every sport so for some documents there would be fields that do not exist. The document regarding them would then be
{'name': 'Louis'}
What I want to do is to filter people that play at least one sport, inside my aggregation pipeline
I know that this is easy to check for one field with {'$match': {'football_position': {'$exists': True}}}, but I want to check if any of these fields exist.
I found an old, somewhat similar question (Check for existence of multiple fields in MongoDB document), but it checks for the existence of all fields, which, while bothersome, could be achieved by chaining multiple $match stages. Plus, maybe mongoDB has now a better way to handle this than writing a custom JavaScript function.
maybe mongoDB has now a better way to handle this
Yes, you can now utilise an aggregation operator $objectToArray (SERVER-23310) to turn keys into values. It should be able to count 'dynamic' number of fields. Combining this operator with $addFields could be quite useful.
Both operators are available in MongoDB v3.4.4+
Using your documents above as example:
db.sports.aggregate([
  { $addFields :
    { "numFields" :
      { $size:
        { $objectToArray: "$$ROOT" }
      }
    }
  },
  { $match:
    { numFields:
      { $gt: 2 }
    }
  }
])
The aggregation pipeline above first adds a field called numFields, whose value is the size of the array produced by $objectToArray. That array has one entry per top-level field, so numFields is the number of fields in the document. The second stage keeps only documents with more than two fields (more than two because every document still has the _id and name fields).
In PyMongo, the above aggregation pipeline would look like:
cursor = collection.aggregate([
    {"$addFields": {"numFields":
        {"$size": {"$objectToArray": "$$ROOT"}}}},
    {"$match": {"numFields": {"$gt": 2}}}
])
Having said that, if your use case allows it, I would suggest reconsidering your data model for easier access, e.g. adding a field that keeps track of the number of sports and updating it whenever a sport position is added.
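If the names of the sport fields happen to be known in advance (an assumption, since the question describes them as dynamic), an $or over $exists conditions is an alternative to counting fields. This is only a sketch; the helper name and the field list are illustrative:

```python
def plays_any_sport_stage(sport_fields):
    """Build a $match stage that keeps documents having at least one of the fields."""
    sport_fields = list(sport_fields)
    if not sport_fields:
        # $or requires a non-empty array of conditions.
        raise ValueError("at least one field name is required")
    return {"$match": {"$or": [{f: {"$exists": True}} for f in sport_fields]}}

stage = plays_any_sport_stage(["football_position", "basketball_position"])
print(stage)

# Usage (assumes a PyMongo collection object named collection):
# cursor = collection.aggregate([stage])
```

Unlike the field count, this does not depend on how many non-sport fields (such as _id and name) each document carries.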

python: how to find documents with specific fields

I am using python and mongodb. I have a collection which contains 40000 documents. I have a group of coordinates and I need to find which document these coordinates belong to. Now I am doing:
cell_start = citymap.find({"cell_latlng":{"$geoIntersects":{"$geometry":{"type":"Point", "coordinates":orig_coord}}}})
This method is a typical geoJSON method and it works well. Now I know some documents have such a field:
{'trips_dest':......}
The value of this field is not important so I just skip that. The thing is that, instead of looking for documents from all these 40000 documents, I can just look for documents from documents which have the field called 'trips_dest'.
Since only about 40% of the documents have the field 'trips_dest', I think this would improve efficiency. However, I don't know how to modify my code to do that. Any ideas?
You need the $exists query operator. Something like this (in Python the operator name must be a quoted string and the boolean is True):
cell_start = citymap.find({"trips_dest": {"$exists": True},
                           "cell_latlng": {"$geoIntersects": {"$geometry": {"type": "Point", "coordinates": orig_coord}}}})
To quote the documentation:
Syntax: { field: { $exists: <boolean> } }
When <boolean> is true, $exists matches the documents that contain the field, including documents where the field value is null
If you need to reject null values, use:
"trips_dest": {"$exists": True, "$ne": None}
As a final note, a sparse index on trips_dest might also speed up such a query.
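Both variants of the filter can be produced by one small helper. This is a sketch only: the function name build_cell_filter and the sample coordinates are made up, while the field names come from the question:

```python
def build_cell_filter(orig_coord, reject_null=False):
    """Build a find() filter: trips_dest must exist and the cell must contain the point."""
    exists_clause = {"$exists": True}
    if reject_null:
        # Also exclude documents where trips_dest is present but null.
        exists_clause["$ne"] = None
    return {
        "trips_dest": exists_clause,
        "cell_latlng": {
            "$geoIntersects": {
                "$geometry": {"type": "Point", "coordinates": orig_coord}
            }
        },
    }

print(build_cell_filter([2.35, 48.85], reject_null=True))

# Usage (assumes the citymap collection from the question):
# cell_start = citymap.find(build_cell_filter(orig_coord))
# A sparse index could be created with:
# citymap.create_index("trips_dest", sparse=True)
```

Whether the sparse index actually helps depends on the query planner and on the existing geospatial index, so it is worth checking with explain() on real data.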

Updating a nested mongodb collection using a cursor

I am new to mongodb and am using it to store a nested document. E.g. Each document contains an array of students for each class. I am trying to update the information in each of the array elements. Is there a better way to do it than updating the array elements one at a time?
Here is my record in the collection -
{
    "_id" : "23343",
    "class" : "Physics",
    "students": [
        { "id" : "2412", "name" : "Alice", "mentor" : 0 },
        { "id" : "2413", "name" : "Bob", "mentor" : 0 }
    ]
}
There are multiple records like this in the collection.
I have a list of these courses I need to update for each record. For example I get an array of students for the above record to update like this -
{
    "_id" : "23343",
    "class" : "Physics",
    "students": [
        { "id" : "2412", "name" : "Alice", "mentor" : "Mark" },
        { "id" : "2413", "name" : "Bob", "mentor" : "Jackson" }
    ]
}
What is the best way to update the record?
I am using Python. Intuitively, I can do a find() on the collection for the course. I get a cursor for that. I can then loop over the cursor with a for loop. I believe mongodb updates the whole document on update().
for record in courseCollection.find():
    recordId = record['_id']
    updatedList = getUpdatedStudentList(record['students'])
    updatedRecord = prepareUpdatedRecord(updatedList)
    courseCollection.update({'_id': recordId}, updatedRecord)
The pymongo documentation site does not talk about the set option in the update function. Unless I use that I believe mongodb updates the whole document.
Also calling update with a query option by passing in the _id seems unnecessary because I just did the query and have a handle to the record. Can I somehow use the cursor to do the update, so that I do not have to run the query again?
You can use the $ operator with $elemMatch in an update. Let's start by inserting your document:
collection.insert({
    "_id": "23343",
    "class": "Physics",
    "students": [
        {"id": "2412", "name": "Alice", "mentor": 0},
        {"id": "2413", "name": "Bob", "mentor": 0}]})
Now I'll run two update statements, first adding the mentor "Mark", then "Jackson":
collection.update(
    # Query portion.
    {"_id": "23343", "students": {"$elemMatch": {"id": "2412"}}},
    # Update portion; $ is the position of the matching student.
    {"$set": {"students.$.mentor": "Mark"}})
collection.update(
    # Query portion.
    {"_id": "23343", "students": {"$elemMatch": {"id": "2413"}}},
    {"$set": {"students.$.mentor": "Jackson"}})
Each update statement affects just the "mentor" field of one subdocument in the "students" array.
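Note that Collection.insert() and Collection.update() were removed in PyMongo 4; the current equivalents are insert_one() and update_one(). The sketch below wraps the same positional-$ update in a loop over a mapping of student id to mentor. The helper name set_mentors and the shape of that mapping are assumptions for illustration:

```python
def set_mentors(collection, course_id, mentors_by_student_id):
    """Set the mentor of each listed student in one course document.

    Returns the number of updates that modified the document.
    """
    modified = 0
    for student_id, mentor in mentors_by_student_id.items():
        result = collection.update_one(
            # Query portion: the course, plus the array element to target.
            {"_id": course_id, "students.id": student_id},
            # Update portion: $ is the position of the matching student.
            {"$set": {"students.$.mentor": mentor}},
        )
        modified += result.modified_count
    return modified

# Usage (assumes a PyMongo collection object named collection):
# set_mentors(collection, "23343", {"2412": "Mark", "2413": "Jackson"})
```

This still issues one update per student, but each one touches only a single mentor field rather than rewriting the whole document.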
I am not so sure what the question is, exactly. In a nutshell: yes, you'll have to update the 'parent' object and yes, you can use $set or replace the entire document, which would be the default behavior. The difference is mostly a matter of locking, concurrency and ownership which is a bit complex. Here's a little more detail on some of your concerns:
Updating a nested mongodb collection using a cursor
Please note that there are no "nested collections", there are only embedded documents. That's important, because the first class citizen in mongodb is always the actual document itself. For instance, a find() will return documents, not subsets of embedded documents alone. You can do projections, but that's only an output transformation, so to speak.
I can do a find() on the collection for the course. I get a cursor for that.
You get a cursor, but since you're querying on the primary key there can only be a single match (primary keys are unique), i.e. you could use find_one() (findOne() in the mongo shell) and you don't need to iterate over the single result.
E.g. Each document contains an array of students for each class.
These should usually be references to the students, i.e. there should be a separate students collection because you don't want to lose the student because it was temporarily not assigned to any course.
The pymongo documentation site does not talk about the set option in the update function. Unless I use that I believe mongodb updates the whole document.
That's true. You can do a $set on the students array of a document. That avoids overwriting any other fields, such as class. On the other hand, if somebody else has changed the class while your user was editing the students, do the updates still make sense? Unclear ownership is my primary concern with embedded documents.
Also calling update with a query option by passing in the _id seems unnecessary because I just did the query and have a handle to the record
...but what exactly is a handle to the record? A handle is an immutable, unique and usually short identifier. Just like the id. The _id is the handle. I don't know python, but I guess you could write a method that takes a pointer to a database object and performs an update, knowing that every database object must have a field called _id. But from the database's perspective, a pointer in your code is not a handle, but the id is.
