Update all MongoDB fields with their own values with PyMongo

I want to update every field of my MongoDB collection using the field's own value to do so.
Example: if I have this document: "string": "foo", a possible update would do this: "string": $string.lower(). Here, $string would be "foo", but I don't know how to do this with PyMongo.
I've tried this:
user_collection.update_many({}, { "$set": { "word": my_func("$word")}})
Which replaces everything with "$word".
I've been able to do it by iterating over each document, but it takes too long.

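As far as I know, you can't apply a Python function inside a single update statement, because the function runs on the client before the query is sent. You can either use MongoDB's own expression operators in an aggregation-pipeline update (MongoDB 4.2+; the update must be wrapped in a list, otherwise the expression is not evaluated):
user_collection.update_many({}, [{"$set": {"name": {"$concat": ["$name", "_2"]}}}])
or query and update separately with PyMongo:
for obj in user_collection.find({}):  # put your own filter here
    user_collection.update_one({"_id": obj["_id"]}, {"$set": {"name": my_func(obj["name"])}})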
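For the lowercasing case in the question, the server can do the work itself. A minimal sketch, assuming MongoDB 4.2 or newer (where update_many accepts an aggregation pipeline, that is, a list of stages); the helper name lowercase_update is hypothetical:

```python
def lowercase_update(field):
    """Build a pipeline-style update that lower-cases `field` in place.

    The update is a list (an aggregation pipeline), so "$<field>" is
    evaluated per document instead of being treated as a literal string.
    """
    return [{"$set": {field: {"$toLower": f"${field}"}}}]


# Usage, assuming `user_collection` is a PyMongo collection on MongoDB 4.2+:
# user_collection.update_many({}, lowercase_update("word"))
print(lowercase_update("word"))
```

Transformations that have no server-side operator still need the client-side loop; grouping those writes with bulk_write should reduce the number of round trips.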

Related

How to search for a value in two different type of field or index or heading of mongodb using python?

I am new to any kind of programming. This is an issue I encountered when using mongodb. Below is the collection structure of the document I imported from two different csv files.
{
"_id": {
"$oid": "61bc4217ed94f9d5fe6a350c"
},
"Telephone Number": "8429950810",
"Date of Birth": "01/01/1945"
}
{
"_id": {
"$oid": "61bc4217ed94f9d5fe6a350c"
},
"Telephone Number": "8129437810",
"Date of Birth": "01/01/1998"
}
{
"_id": {
"$oid": "61bd98d36cc90a9109ab253c"
},
"TELEPHONE_NUMBER": "9767022829",
"DATE_OF_BIRTH": "16-Jun-98"
}
{
"_id": {
"$oid": "61bd98d36cc9090109ab253c"
},
"TELEPHONE_NUMBER": "9567085829",
"DATE_OF_BIRTH": "16-Jan-91"
}
The first two entries come from one CSV file and the next two from another. I am now building a user interface where users can search for a telephone number. How do I write a find() query that searches for the telephone number under both field names (Telephone Number and TELEPHONE_NUMBER)? If that is not possible, is there a way to rename the fields to a common format while importing the CSV into the database? Alternatively, could I import each CSV into its own collection and then search both collections together, or create a compound index and search that instead? I am using pymongo for all operations.
Thank you.
You can use an $or query when the same kind of data is stored under different keys. The numbers are stored as strings, so the value must be a string in both branches:
yourmongocoll.find({"$or": [{"Telephone Number": "8429950810"}, {"TELEPHONE_NUMBER": "8429950810"}]})
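If more field-name variants turn up later, the $or filter can be generated instead of written by hand. A small sketch; the helper any_field_equals and the constant PHONE_FIELDS are hypothetical names, and the value is passed as a string because the sample documents store telephone numbers as strings:

```python
def any_field_equals(value, fields):
    """Return a filter matching documents where any of `fields` equals `value`."""
    return {"$or": [{field: value} for field in fields]}


PHONE_FIELDS = ("Telephone Number", "TELEPHONE_NUMBER")
query = any_field_equals("8429950810", PHONE_FIELDS)

# Usage, assuming `collection` is a PyMongo collection:
# results = list(collection.find(query))
print(query)
```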
Assuming you have a connection string for PyMongo, the following example queries for the telephone number "8429950810":
from pymongo import MongoClient
client = MongoClient("connection_string")
db = client["db"]
collection = db["collection"]
results = collection.find({"Telephone Number":"8429950810"})
Note that find() returns a cursor. If you would like your documents in a list, wrap the query in list() like so:
results = list(collection.find({"Telephone Number":"8429950810"}))
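The question also asks whether the field names can be unified at import time. One possible approach, sketched with the standard csv module, is to normalize every header before inserting, so that both files end up with the same keys. The helper names normalize_key and rows_with_normalized_keys are hypothetical, and the insert_many call is shown only as a comment:

```python
import csv
import io
import re


def normalize_key(header):
    """Lower-case a CSV header and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")


def rows_with_normalized_keys(csv_text):
    """Parse CSV text into dicts whose keys share one naming scheme."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{normalize_key(k): v for k, v in row.items()} for row in reader]


first = rows_with_normalized_keys(
    "Telephone Number,Date of Birth\n8429950810,01/01/1945\n"
)
second = rows_with_normalized_keys(
    "TELEPHONE_NUMBER,DATE_OF_BIRTH\n9767022829,16-Jun-98\n"
)

# Usage, assuming `collection` is a PyMongo collection:
# collection.insert_many(first + second)
print(first, second)
```

After such an import a single-field query on telephone_number would cover both sources, and one ordinary index on that field would be enough.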

Elasticsearch prevent indexing of Markdown hyperlinks

I am building a Markdown file content search using Elasticsearch. Currently the whole content inside the MD file is indexed in Elasticsearch. But the problem is it shows results like this [Mylink](https://link-url-here.org), [Mylink2](another_page.md)
in the search results.
I would like to prevent indexing of hyperlinks and references to other pages. When someone searches for "Mylink", it should return only the text without the URL. It would be great if someone could help me with the right solution for this.
You need to render the Markdown in your indexing application, then remove the HTML tags and save the resulting plain text alongside the Markdown source.
I think you have two main solutions for this problem.
First: clean the data in your own code before indexing it into Elasticsearch.
Second: have Elasticsearch clean the data for you.
The first solution is the easier one, but if you need to do this inside Elasticsearch, you need to create an ingest pipeline.
You can then use the script processor with a Painless script that finds the links with a regex and removes them.
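The first option (cleaning before indexing) can be sketched in a few lines of Python. This is only an illustration with a hypothetical helper name, strip_links; it handles inline links of the form [text](target) and does not cover images or reference-style links:

```python
import re

# [text](target): group 1 is the visible text, group 2 the URL or page path.
MD_LINK = re.compile(r"\[([^\]\[]+)\]\(([^()\s]+)\)")


def strip_links(markdown_text):
    """Replace every inline Markdown link with its visible text only."""
    return MD_LINK.sub(r"\1", markdown_text)


print(strip_links("[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"))
print(strip_links("[Mylink2](another_page.md)"))
```

The cleaned string can then be indexed in place of, or next to, the raw Markdown.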
You could use an ingest pipeline with a script processor to extract the link text:
1. Set up the pipeline
PUT _ingest/pipeline/clean_links
{
"description": "...",
"processors": [
{
"script": {
"source": """
if (ctx["content"] == null) {
// nothing to do here
return
}
def content = ctx["content"];
Pattern pattern = /\[([^\]\[]+)\](\(((?:[^\()]+)+)\))/;
Matcher matcher = pattern.matcher(content);
def purged_content = matcher.replaceAll("$1");
ctx["purged_content"] = purged_content;
"""
}
}
]
}
The regex can be tested here and is inspired by this.
2. Include the pipeline when ingesting the docs
POST my-index/_doc?pipeline=clean_links
{
"content": "[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"
}
POST my-index/_doc?pipeline=clean_links
{
"content": "[Mylink2](another_page.md)"
}
The python docs are here.
3. Verify
GET my-index/_search?filter_path=hits.hits._source
should yield
{
"hits" : {
"hits" : [
{
"_source" : {
"purged_content" : "Mylink anotherLink",
"content" : "[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"
}
},
{
"_source" : {
"purged_content" : "Mylink2",
"content" : "[Mylink2](another_page.md)"
}
}
]
}
}
You could instead replace the original content if you want to fully discard them from your _source.
In contrast, you could go a step further in the other direction and store the text + link pairs in a nested field of the form:
{
"content": "...",
"links": [
{
"text": "Mylink",
"href": "https://link-url-here.org"
},
...
]
}
so that when you later decide to make them searchable, you'll be able to do so with precision.
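If the documents are prepared in the indexing application, the text and href pairs could be collected there as well. A rough sketch, assuming inline links only; with_links is a hypothetical helper, and the index mapping would still need to declare links as a nested field:

```python
import re

# [text](target): group 1 is the visible text, group 2 the URL or page path.
MD_LINK = re.compile(r"\[([^\]\[]+)\]\(([^()\s]+)\)")


def with_links(content):
    """Return a document body holding the content plus its text/href pairs."""
    links = [
        {"text": match.group(1), "href": match.group(2)}
        for match in MD_LINK.finditer(content)
    ]
    return {"content": content, "links": links}


doc = with_links("[Mylink](https://link-url-here.org) [Mylink2](another_page.md)")
print(doc["links"])
```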
Shameless plug: you can find other hands-on ingestion guides in my Elasticsearch Handbook.

How do I get the most recently inserted document in MongoDB with all its fields?

I'm working on a REST application in Python with Flask and the pymongo driver, but anyone who knows MongoDB well may be able to answer my question.
Suppose I'm inserting a new document into a collection, say students. I want to get the whole inserted document as soon as it is saved in the collection. Here is what I've tried so far.
res = db.students.insert_one({
"name": args["name"],
"surname": args["surname"],
"student_number": args["student_number"],
"course": args["course"],
"mark": args["mark"]
})
If I call:
print(res.inserted_id)  # I get the id
How can I get something like:
{
"name": "student1",
"surname": "surname1",
"mark": 78,
"course": "ML",
"student_number": 2
}
from the res object? If I print res, I get <pymongo.results.InsertOneResult object at 0x00000203F96DCA80>.
Put the data to be inserted into a dictionary variable; on insert, the variable will have the _id added by pymongo.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
doc = {
"name": "name"
}
db.students.insert_one(doc)
print(doc)
prints:
{'name': 'name', '_id': ObjectId('60ce419c205a661d9f80ba23')}
Unfortunately, the commenters are correct. PyMongo's insert_one does not return the inserted document. You are expected to use the inserted_id from the result and, if you need the full document from the collection later, run a regular query afterwards.
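That two-step pattern can be wrapped in a small helper. A minimal sketch; insert_and_fetch is a hypothetical name, and the function only relies on the collection's insert_one and find_one methods:

```python
def insert_and_fetch(collection, doc):
    """Insert `doc`, then read it back by the id reported in the result."""
    result = collection.insert_one(doc)
    return collection.find_one({"_id": result.inserted_id})


# Usage, assuming `db` is a PyMongo database:
# student = insert_and_fetch(db.students, {"name": "student1", "mark": 78})
```

In a Flask handler the _id value is an ObjectId, so it will probably need converting (for example with str()) before the document is returned as JSON.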

Updating entire collection in MongoDB

I have a MongoDB collection with various documents in it. Every few seconds my Python script retrieves some data from an API. I want to update each document of the collection with its updated version, so the entire collection has to be updated.
result = db.main_tst.insert_one(dic)
This is how I insert the data. Now, instead of inserting dic, I should update it. How can I do this with Python in MongoDB? I know there is the update_many() method, but I've only found how to update a certain document instead of the entire collection.
It should be simple.
The following updates every matching document whose name field is 'N/A', setting it to "No name":
filterQuery = {'name': 'N/A'}
updateQuery = {"$set": {"name": "No name"}}
result = mycol.update_many(filterQuery, updateQuery)
For your requirement, where every document in the collection must be updated, pass an empty filter {} so that the update applies to all documents:
filterQuery = {}
updateQuery = { "$set": { "name": "No name" } }
result = mycol.update_many(filterQuery, updateQuery)
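update_many writes the same change to every document. If instead each document has to be swapped for its own fresh version from the API, one option is replace_one with upsert=True, once per document. A sketch under the assumption that every API record carries a stable identifier (here the key parameter, defaulting to _id); replace_all is a hypothetical helper name:

```python
def replace_all(collection, documents, key="_id"):
    """Replace each stored document with its fresh version, matched on `key`.

    upsert=True inserts documents that are not in the collection yet.
    Returns the number of documents processed.
    """
    count = 0
    for doc in documents:
        collection.replace_one({key: doc[key]}, doc, upsert=True)
        count += 1
    return count


# Usage, assuming `db` is a PyMongo database and `fresh_docs` came from the API:
# replace_all(db.main_tst, fresh_docs)
```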

pymongo include javascript in aggregate query

I'm currently tasked with researching databases and am trying various queries using the pymongo library to investigate suitability for given projects.
My timestamps are saved in millisecond integer format and I'd like to do a simple sales by day aggregated query. I understand from here (answer by Alexandre Russel) that as the timestamps weren't uploaded in BSON format I can't use date and time functions to create bins, but can manipulate timestamps using embedded javascript.
As such I've written the following query:
[{
"$project": {
"year": {
"$year": {
"$add": ["new Date(0)", "$data.horaContacto"]
}
},
"month": {
"$month": {
"$add": ["new Date(0)", "$data.horaContacto"]
}
}
}
}, {
"$group": {
"_id": {
"year": "$year",
"month": "$month"
},
"sales": {
"$sum": {
"$cond": ["$data.estadoVenta", 1, 0]
}
}
}
}]
But get this error:
pymongo.errors.OperationFailure: exception: $add only supports numeric or date types, not String
I think what's happening is that the JS "new Date(0)" is being interpreted by the mongo driver as a string, not applied as JS. If I remove the enclosing double quotes, Python tries to interpret the code and raises an error. This is just one example; I'd like to include more JS in queries in future tests but can't see a way to get it to play nicely with Python (having said this, I'm fairly new to Python too).
Does anybody know:
Am I correct in assuming the error occurs because mongo interprets the JS as a string and tries to sum it directly?
Can I indicate to mongo from Python that this is JS, without Python trying to interpret the code?
So far I've tried searching via Google and various combinations of single and double inverted commas.
Pasted below is a few rows of randomly generated test data if required:
Thanks,
James
{'_id': 0,'data': {'edad': '74','estadoVenta': True,'visits': [{'visitLength': 1819.349246663518,'visitNo': 1,'visitTime': 1480244647948.0}],'apellido2': 'Aguilar','apellido1': 'Garcia','horaContacto': 1464869545373.0,'preNombre': 'Agustin','_id': 0,'telefono': 630331272,'location': {'province': 'Aragón','city': 'Zaragoza','type': 'Point','coordinates': [-0.900203, 41.747726],'country': 'Spain'}}},
{'_id': 1,'data': {'edad': '87','estadoVenta': False,'visits': [{'visitLength': 2413.9938072105024,'visitNo': 1,'visitTime': 1465417353597.0}],'apellido2': 'Torres','apellido1': 'Acosta','horaContacto': 1473404147769.0,'preNombre': 'Sara','_id': 1,'telefono': 665968746,'location': {'province': 'Galicia','city': 'Cualedro','type': 'Point','coordinates': [-7.659321, 41.925328],'country': 'Spain'}}},
{'_id': 2,'data': {'edad': '48','estadoVenta': True,'visits': [{'visitLength': 2413.9938072105024,'visitNo': 1,'visitTime': 1465415138597.0}],'apellido2': 'Perez','apellido1': 'Sanchez','horaContacto': 1473404923569.0,'preNombre': 'Sara','_id': 2,'telefono': 665967346,'location': {'province': 'Galicia','city': 'Barcelona','type': 'Point','coordinates': [-7.659321, 41.925328],'country': 'Spain'}}}
The MongoDB aggregation framework does not evaluate JavaScript passed as a string. You must specify all the data in your aggregation pipeline using BSON. PyMongo can translate a standard Python datetime to BSON, and you can send it as part of the aggregation pipeline. Build the epoch explicitly rather than with fromtimestamp(0), which returns local time, because PyMongo treats naive datetimes as UTC:
import datetime
epoch = datetime.datetime(1970, 1, 1)
pipeline = [{
"$project": {
"year": {
"$year": {
"$add": [epoch, "$data.horaContacto"]
}
},
# the rest of your pipeline here ....
}
}]
cursor = db.collection.aggregate(pipeline)
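Putting the pieces together, the whole sales-by-month pipeline might look like the sketch below. The function name sales_by_month_pipeline is hypothetical. It also carries the sale flag through $project under the assumed name sold, since fields not listed in $project are no longer available to the following $group stage:

```python
import datetime

# Naive datetimes are sent as UTC, so this is the Unix epoch.
EPOCH = datetime.datetime(1970, 1, 1)


def sales_by_month_pipeline(ts_field="$data.horaContacto",
                            sold_field="$data.estadoVenta"):
    """Build a pipeline that counts sales per (year, month).

    `ts_field` holds milliseconds since the epoch; adding it to a date
    yields a date that $year and $month can read.
    """
    as_date = {"$add": [EPOCH, ts_field]}
    return [
        {"$project": {
            "year": {"$year": as_date},
            "month": {"$month": as_date},
            "sold": sold_field,  # keep the flag for the $group stage
        }},
        {"$group": {
            "_id": {"year": "$year", "month": "$month"},
            "sales": {"$sum": {"$cond": ["$sold", 1, 0]}},
        }},
        {"$sort": {"_id.year": 1, "_id.month": 1}},
    ]


pipeline = sales_by_month_pipeline()
# Usage, assuming `db` is a PyMongo database:
# for row in db.collection.aggregate(pipeline):
#     print(row)
```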
