Remove attribute from all MongoDB documents using Python and PyMongo

In my MongoDB, a bunch of these documents exist:
{ "_id" : ObjectId("5341eaae6e59875a9c80fa68"),
  "parent" : {
    "tokeep" : 0,
    "toremove" : 0
  }
}
I want to remove the parent.toremove attribute in every single one.
Using the MongoDB shell, I can accomplish this using:
db.collection.update({},{$unset: {'parent.toremove':1}},false,true)
But how do I do this within Python?
app = Flask(__name__)
mongo = PyMongo(app)
mongo.db.collection.update({},{$unset: {'parent.toremove':1}},false,true)
returns the following error:
File "myprogram.py", line 46
mongo.db.collection.update({},{$unset: {'parent.toremove':1}},false,true)
^
SyntaxError: invalid syntax

Put quotes around $unset, pass the multi parameter by name, and use Python's spelling of true (True):
mongo.db.collection.update({}, {'$unset': {'parent.toremove':1}}, multi=True)
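Note that Collection.update() was deprecated in PyMongo 3.0 and removed in PyMongo 4.0; update_many() is its multi-document replacement. Here is a minimal sketch of the same operation with the newer call. The helper name unset_field_everywhere is made up for illustration, and collection is assumed to be a PyMongo collection such as mongo.db.collection from the question:

```python
def unset_field_everywhere(collection, dotted_path):
    """Remove `dotted_path` from every document in `collection`.

    `collection` is assumed to be a pymongo Collection (PyMongo 3.0+);
    the function name is hypothetical, not part of PyMongo.
    """
    # An empty filter matches every document.
    # The value given to $unset is ignored by the server.
    result = collection.update_many({}, {'$unset': {dotted_path: ''}})
    # modified_count is the number of documents that were actually changed.
    return result.modified_count
```

With the question's setup, the call would be unset_field_everywhere(mongo.db.collection, 'parent.toremove').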

I found it odd to have to attach an arbitrary value to the field being removed, such as a small number (1) or an empty string (''), but the MongoDB documentation does specify this, with a JavaScript example:
$unset
The $unset operator deletes a particular field. Consider the following syntax:
{ $unset: { field1: "", ... } }
The specified value in the $unset expression (i.e. "") does not impact
the operation.
For Python/PyMongo, I prefer to use None as the value:
{'$unset': {'field1': None}}
So, for OP's question, it would be:
mongo.db.collection.update({}, {'$unset': {'parent.toremove': None}}, multi=True)

Related

How do I get the most recently inserted document in MongoDB with all its fields?

I'm working on a REST application in Python using Flask and the pymongo driver. Anyone who knows MongoDB well may be able to answer my question.
Suppose I'm inserting a new document into a collection, say students. I want to get the whole inserted document as soon as it is saved in the collection. Here is what I've tried so far:
res = db.students.insert_one({
    "name": args["name"],
    "surname": args["surname"],
    "student_number": args["student_number"],
    "course": args["course"],
    "mark": args["mark"]
})
If I call:
print(res.inserted_id)  # I get the id
How can I get something like:
{
    "name": "student1",
    "surname": "surname1",
    "mark": 78,
    "course": "ML",
    "student_number": 2
}
from the res object? If I print res, I get <pymongo.results.InsertOneResult object at 0x00000203F96DCA80>

Put the data to be inserted into a dictionary variable; on insert, the variable will have the _id added by pymongo.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
doc = {
    "name": "name"
}
db.students.insert_one(doc)
print(doc)
prints:
{'name': 'name', '_id': ObjectId('60ce419c205a661d9f80ba23')}

Unfortunately, the commenters are correct. insert_one() in PyMongo does not return the inserted document. You are expected to use inserted_id from the result and, if you need the full document from the collection later, run a regular query for it.
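A minimal sketch of that two-step pattern, assuming collection is a PyMongo collection such as db.students. The helper name insert_and_fetch is hypothetical; insert_one(), inserted_id and find_one() are the regular PyMongo calls:

```python
def insert_and_fetch(collection, document):
    """Insert `document`, then read the stored copy back by its _id.

    `collection` is assumed to be a pymongo Collection; the function
    name is made up for this example.
    """
    result = collection.insert_one(document)
    # inserted_id is the _id of the document that was just written.
    return collection.find_one({'_id': result.inserted_id})
```

This costs one extra round trip to the server, so the dictionary approach from the other answer is cheaper when the document you passed in is all you need.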

PyMongo Atlas Search not returning anything

I'm trying to do a full-text search using Atlas Search for MongoDB, through the PyMongo driver in Python. I'm using the aggregation pipeline with a $search stage, but it seems to return nothing.
cursor = db.collection.aggregate([
    {"$search": {"text": {"query": "hello", "path": "text_here"}}},
    {"$project": {"file_name": 1}}
])
for x in cursor:
    print(x)
With this code I am trying to search the "text_here" field of the collection for the term "hello" and return every matching document, listed by its "file_name". However, it returns nothing, which confuses me because the code is almost identical to the example in the documentation. The only thing I can think of is that the path may be incorrect, so the search can't reach the field I specified. The code raises no errors; looping through the cursor simply yields nothing.

I had the same issue. I solved it by also passing the name of the index in the query. For example:
{
    index: "name_of_the_index",
    text: {
        query: 'john doe',
        path: 'name'
    }
}
I followed the tutorials but couldn't get any result back without specifying the "index" name. I wish this was mentioned in the documentation as mandatory.
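For the PyMongo pipeline in the question, that would look roughly like the sketch below. As far as I know, when index is omitted Atlas Search looks for an index named "default", which would explain the empty result when the index has a different name. The helper build_search_pipeline and the index name are placeholders for illustration; the field names come from the question:

```python
def build_search_pipeline(index_name, query, path, fields):
    """Build an aggregation pipeline for an Atlas Search text query.

    The function name and arguments are hypothetical; the stage layout
    follows the $search stage shown in the question plus an "index" key.
    """
    return [
        {'$search': {
            'index': index_name,  # name of the Atlas Search index to use
            'text': {'query': query, 'path': path},
        }},
        # Return only the requested fields (plus _id, which is included by default).
        {'$project': {field: 1 for field in fields}},
    ]

pipeline = build_search_pipeline('name_of_the_index', 'hello', 'text_here', ['file_name'])
# With a live Atlas cluster: cursor = db.collection.aggregate(pipeline)
```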

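If you are only doing a find and a projection, you don't need an aggregate query, just a find(). Note that this uses MongoDB's $text operator, which relies on a regular text index on the field rather than an Atlas Search index. The syntax you want is: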
db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
Equivalent using aggregate:
cursor = db.collection.aggregate([
    {'$match': {'$text': {'$search': 'hello'}}},
    {'$project': {'file_name': 1}}])
Worked example:
from pymongo import MongoClient, TEXT
db = MongoClient()['mydatabase']
db.collection.create_index([('text_here', TEXT)])
db.collection.insert_one({"text_here": "hello, is it me you're looking for", "file_name": "foo.bar"})
cursor = db.collection.find({'$text': {'$search': 'hello'}}, {'file_name': 1})
for item in cursor:
    print(item)
prints:
{'_id': ObjectId('5fc81ce9a4a46710459de610'), 'file_name': 'foo.bar'}

Python Elasticsearch create index mapping

I am trying to create an ES index with a custom mapping using the elasticsearch Python client, to increase the size of text in each document:
mapping = {"mapping":{
    "properties":{
        "Apple":{"type":"text","ignore_above":1000},
        "Mango":{"type":"text","ignore_above":1000}
    }
}}
Creation:
from elasticsearch import Elasticsearch
es1 = Elasticsearch([{"host":"localhost","port":9200}])
es1.indices.create(index="hello",body=mapping)
Error:
RequestError: RequestError(400, 'mapper_parsing_exception', 'Mapping definition for [Apple] has unsupported parameters: [ignore_above : 10000]')
But I checked the elasticsearch website on how to increase the text length limit and ignore_above was the option given there.
https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html
Any suggestions on how to rectify this will be great.

The ignore_above setting applies only to keyword fields, not text fields, so remove it from your mapping and it will work. Note also that, in Elasticsearch 7 and later, the create-index body expects the top-level key to be mappings (plural):
mapping = {"mappings":{
    "properties":{
        "Apple":{"type":"text"},
        "Mango":{"type":"text"}
    }
}}
If you absolutely need to specify ignore_above, change the type to keyword, like this:
mapping = {"mappings":{
    "properties":{
        "Apple":{"type":"keyword","ignore_above":1000},
        "Mango":{"type":"keyword","ignore_above":1000}
    }
}}

JSON Parse an element inside an element in Python

I have a JSON text grabbed from an API of a website:
{"result":"true","product":{"made":{"Taiwan":"Taipei","HongKong":"KongStore","Area":"Asia"}}}
I want to capture "Taiwan" and "Taipei", but it always fails.
Here is my code:
import json
weather = urllib2.urlopen('url')
wjson = weather.read()
wjdata = json.loads(wjson)
print wjdata['product']['made'][0]['Taiwan']
I always get the following error:
Keyword 0 error
What's the correct way to parse that JSON?

You are indexing into an array, but the JSON contains none.
The JSON is the following:
{
    "result":"true",
    "product": {
        "made": {
            "Taiwan":"Taipei",
            "HongKong":"KongStore",
            "Area":"Asia"
        }
    }
}
And the above contains no arrays.
You are assuming the JSON structure to be something like this:
{
    "result":"true",
    "product": {
        "made": [
            {"Taiwan":"Taipei"},
            {"HongKong":"KongStore"},
            {"Area":"Asia"}
        ]
    }
}
From a brief look at the documentation for the json package, I found the conversion table used by json.loads.
It tells us that a JSON object translates to a dict, and a dict has a method called keys, which returns its keys (as a list in Python 2).
I suggest you try something like this:
#... omitted code
objectKeys = wjdata['product']['made'].keys()
# objectKeys now holds the keys of the "made" object.
for key in objectKeys:
    print(key)
    if key == 'Taiwan':
        print('Eureka')
I haven't tested the above code, but I think you get the gist here :)

wjdata['product']['made']['Taiwan'] works: "made" is an object (a dict in Python), so index it by key rather than by position.
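A small self-contained sketch of this, written in Python 3 syntax and using the JSON text from the question as a string literal in place of the API response (the variable names are only for illustration):

```python
import json

# The JSON text from the question, standing in for the API response body.
raw = '{"result":"true","product":{"made":{"Taiwan":"Taipei","HongKong":"KongStore","Area":"Asia"}}}'

data = json.loads(raw)

# "made" is a JSON object, so json.loads turns it into a dict.
made = data['product']['made']

# Look the value up by key, not by position.
taiwan_value = made['Taiwan']
print('Taiwan', taiwan_value)

# To walk over every key/value pair instead:
for key, value in made.items():
    print(key, value)
```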

Removing _id element from Pymongo results

I'm attempting to create a web service using MongoDB and Flask (using the pymongo driver). A query to the database returns documents with the "_id" field included, of course. I don't want to send this to the client, so how do I remove it?
Here's a Flask route:
@app.route('/theobjects')
def index():
    objects = db.collection.find()
    return str(json.dumps({'results': list(objects)},
                          default = json_util.default,
                          indent = 4))
This returns:
{
    "results": [
        {
            "whatever": {
                "field1": "value",
                "field2": "value",
            },
            "whatever2": {
                "field3": "value"
            },
            ...
            "_id": {
                "$oid": "..."
            },
            ...
        }
]}
I thought it was a dictionary and I could just delete the element before returning it:
del objects['_id']
But that returns a TypeError:
TypeError: 'Cursor' object does not support item deletion
So it isn't a dictionary, but something I have to iterate over with each result as a dictionary. So I try to do that with this code:
for object in objects:
    del object['_id']
Each object dictionary looks the way I'd like it to now, but the objects cursor is empty. So I try to create a new dictionary and after deleting _id from each, add to a new dictionary that Flask will return:
new_object = {}
for object in objects:
for key, item in objects.items():
if key == '_id':
del object['_id']
new_object.update(object)
This just returns a dictionary with the first-level keys and nothing else.
So this is sort of a standard nested dictionaries problem, but I'm also shocked that MongoDB doesn't have a way to easily deal with this.
The MongoDB documentation explains that you can exclude _id with
{ _id : 0 }
But that does nothing with pymongo. The Pymongo documentation explains that you can list the fields you want returned, but "(“_id” will always be included)". Seriously? Is there no way around this? Is there something simple and stupid that I'm overlooking here?

To exclude the _id field in a find query in pymongo, you can use:
db.collection.find({}, {'_id': False})
The documentation is somewhat misleading on this, as it says the _id field is always included. But you can exclude it as shown above.

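The answer above does not cover the case where you want specific fields and still want to leave out _id. In that case, pass the field list as the projection (the second argument), with an empty filter as the first:
db.collection.find({}, {'required_column_A': 1, 'required_col_B': 1, '_id': False})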

You are calling
del objects['_id']
on the cursor object!
The cursor object is obviously an iterable over the result set and not a single document that you can manipulate.
for obj in objects:
    del obj['_id']
is likely what you want.
So your claim is completely wrong, as the following code shows:
import pymongo
# MongoClient replaces pymongo.Connection, which newer PyMongo releases no longer provide.
c = pymongo.MongoClient()
db = c['mydb']
db.foo.delete_many({})
db.foo.insert_one({'foo' : 42})
for row in db.foo.find():
    del row['_id']
    print(row)
$ bin/python foo.py
> {'foo': 42}
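As for the empty cursor mentioned in the question: a cursor can only be consumed once, so after the for loop there is nothing left for list(objects) to return. One way around that is to build a list of cleaned copies in a single pass. A minimal sketch using plain dicts as stand-ins for the documents (the helper name without_id is made up; with PyMongo you would pass db.collection.find() as documents):

```python
def without_id(documents):
    """Return a list of copies of `documents` with the '_id' key dropped.

    `documents` can be any iterable of dicts, for example a PyMongo cursor.
    The originals are left untouched.
    """
    return [
        {key: value for key, value in doc.items() if key != '_id'}
        for doc in documents
    ]

# Plain dicts standing in for query results.
rows = [{'_id': 1, 'foo': 42}, {'_id': 2, 'whatever': {'field1': 'value'}}]
cleaned = without_id(rows)
print(cleaned)
```

The resulting list can go straight into json.dumps({'results': cleaned}), although the {'_id': False} projection from the other answer avoids fetching the field in the first place.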
