I'm using Elasticsearch in Python, and I can't figure out how to get the IDs of the documents deleted by the delete_by_query() method. By default it only returns the number of documents deleted.
There is a parameter called _source which, if set to True, should return the source of the deleted documents. But nothing changes; I get no sources back.
Is there a good way to know which documents were deleted?
The delete by query endpoint only returns a macro summary of what happened during the task, mainly how many documents were deleted and some other details.
If you want to know the IDs of the documents that are going to be deleted, you can run a search (with _source: false) before the delete by query operation, and you'll get the expected IDs.
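A minimal sketch of that two-step pattern (the index name, query, and response shape below are illustrative assumptions, not part of the question; the actual client calls are left as comments):

```python
# Sketch: capture the IDs first with a _source-free search, then delete.

def search_body(query):
    """Body for a search that returns only document IDs (no _source)."""
    return {"query": query, "_source": False, "size": 10_000}

def extract_ids(response):
    """Pull the _id of every hit out of a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]

# With a real client you would run something like:
#   resp = es.search(index="logs", body=search_body({"match": {"level": "debug"}}))
#   doomed_ids = extract_ids(resp)
#   es.delete_by_query(index="logs", body={"query": {"match": {"level": "debug"}}})

# Shape of the response these helpers expect:
sample = {"hits": {"hits": [{"_id": "a1"}, {"_id": "b2"}]}}
print(extract_ids(sample))  # ['a1', 'b2']
```

Note that a search capped at 10,000 hits may miss documents on very large deletes; for those, a scroll or search_after loop over the same query would be needed.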
I want to create some kind of collection which cannot be deleted. The reason is that my website can't perform its data-creation process when the document is missing.
Is it possible to create a collection in Firestore that has an empty document?
I use Python's firebase_admin.
In Firestore, there is no such thing as an "empty collection". Collections simply appear in the console when there is a document present, and disappear when the last document is deleted. If you want to know if a collection is "empty", then you can simply query it and check that it has 0 documents.
Ideally, your code should be robust enough to handle the possibility of a missing document, because Firestore will do nothing to stop a document from being deleted if that's what you do in the console or your code.
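A small sketch of that "query and check for 0 documents" advice (the collection name is a placeholder; the Firestore calls are left as comments, and the helper itself is pure Python):

```python
# Sketch: decide whether a Firestore collection currently has any documents.

def has_any(docs):
    """True if the (possibly lazy) document iterator yields at least one item."""
    return next(iter(docs), None) is not None

# With a real client:
#   from firebase_admin import firestore
#   db = firestore.client()
#   if not has_any(db.collection("my_collection").limit(1).stream()):
#       ...  # recreate the seed document your site needs

print(has_any([]))         # False
print(has_any(["doc1"]))   # True
```

Using limit(1) keeps the emptiness check cheap: it reads at most one document instead of the whole collection.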
I've been trying to get data from Firebase into my Django app. The issue I face is that some of the documents are retrieved and some aren't. A really weird thing I noticed on the admin page: the documents that can be accessed are highlighted in a darker shade than the ones we aren't able to get from the database.
The issue is shown in the image above. The first document is highlighted but the second isn't, and the first is read by the Django function below:
def home(request, user=""):
    db = firestore.client()
    docs = db.collection(u'FIR_NCR').stream()
    for doc in docs:
        print(doc.id, end="->")
        s = db.collection(u'FIR_NCR').document(u'{}'.format(doc.id)).collection(u'all_data').get()
        print(s[0].id, end="->")
        print(s[0].to_dict())
    return render(request, "home.html", {"user": user})
Here, docs does not contain the complete list of documents, hence the issue.
It would be wonderful if someone could help me understand what I'm doing wrong. T.I.A.
The document ID isn't actually highlighted. The difference between the first and the second ID is that the second one is in italics. That means there is no actual document with that ID. The reason why the Firestore console shows you a document ID at all for a missing document is because it has a nested subcollection. You can click into that missing document, then again click into the subcollection.
In Firestore, you can have subcollections nested under documents that don't exist. This is OK. Just be aware that these missing documents can't be discovered by a normal query in the collection where you see them in the console.
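If you do need to surface those missing documents, one option is list_documents(), which (unlike a query) also yields references whose document doesn't exist. A sketch, with the Firestore calls left as comments and a pure helper that can run anywhere (the collection name is taken from the question):

```python
# Sketch: list_documents() also yields references to "missing" documents
# that only exist as containers for a subcollection; stream() does not.

def split_real_and_missing(refs_with_exists):
    """Partition (doc_id, exists) pairs into real and missing document IDs."""
    real = [doc_id for doc_id, exists in refs_with_exists if exists]
    missing = [doc_id for doc_id, exists in refs_with_exists if not exists]
    return real, missing

# With a real client:
#   pairs = [(ref.id, ref.get().exists)
#            for ref in db.collection("FIR_NCR").list_documents()]
#   real, missing = split_real_and_missing(pairs)

print(split_real_and_missing([("doc1", True), ("doc2", False)]))
# (['doc1'], ['doc2'])
```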
In this document it is mentioned that the default read_policy setting is ndb.EVENTUAL_CONSISTENCY.
After I did a bulk delete of entity items from the Datastore, versions of the app I pulled up continued to read the old data. I've tried to figure out how to change this to STRONG_CONSISTENCY, with no success, including:
entity.query().fetch(read_policy=ndb.STRONG_CONSISTENCY) and
...fetch(options=ndb.ContextOptions(read_policy=ndb.STRONG_CONSISTENCY))
The error I get is
BadArgumentError: read_policy argument invalid ('STRONG_CONSISTENCY')
How does one change this default? More to the point, how can I ensure that NDB will go to the Datastore to load a result rather than relying on an old cached value? (Note that after the bulk delete the datastore browser tells me the entity is gone.)
You cannot change that default; it is also the only option available. From the very doc you referenced (no other options are mentioned):
Description
Set this to ndb.EVENTUAL_CONSISTENCY if, instead of waiting for the
Datastore to finish applying changes to all returned results, you wish
to get possibly-not-current results faster.
The same is confirmed by inspecting the google.appengine.ext.ndb.context.py file (no STRONG_CONSISTENCY definition in it):
# Constant for read_policy.
EVENTUAL_CONSISTENCY = datastore_rpc.Configuration.EVENTUAL_CONSISTENCY
The EVENTUAL_CONSISTENCY ends up in ndb via the google.appengine.ext.ndb.__init__.py:
from context import *
__all__ += context.__all__
You might be able to avoid the error using a hack like this:
from google.appengine.datastore.datastore_rpc import Configuration
...fetch(options=ndb.ContextOptions(read_policy=Configuration.STRONG_CONSISTENCY))
However, I think that only applies to reading the entities for the keys obtained through the query, not to obtaining the list of keys themselves. That list comes from the index the query uses, which is always eventually consistent - the root cause of your deleted entities still appearing in the results (for a while, until the index is updated). From Keys-only Global Query Followed by Lookup by Key:
But it should be noted that a keys-only global query can not exclude
the possibility of an index not yet being consistent at the time of
the query, which may result in an entity not being retrieved at all.
The result of the query could potentially be generated based on
filtering out old index values. In summary, a developer may use a
keys-only global query followed by lookup by key only when an
application requirement allows the index value not yet being
consistent at the time of a query.
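The effect described above can be illustrated with a toy model in plain Python (no App Engine APIs; all names are illustrative): a stale index still lists a deleted key, and only the strongly-consistent lookup by key filters it out:

```python
# Toy model of "keys-only query followed by lookup by key". The index lags
# behind the entity store after a delete, as the quoted doc describes.

entities = {"k2": {"name": "kept"}}   # k1 was just deleted from the store
stale_index = ["k1", "k2"]            # index not yet updated; still lists k1

def query_then_lookup(index_keys, store):
    """Drop keys whose entity no longer exists (lookup by key is strongly consistent)."""
    return [store[k] for k in index_keys if k in store]

print(query_then_lookup(stale_index, entities))  # [{'name': 'kept'}]
```

The keys-only query alone would still have reported k1; it is the per-key lookup that hides the deleted entity.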
Potentially of interest: Bulk delete datastore entity older than 2 days
I am working on a kind of initialization routine for a MongoDB using mongoengine.
The documents we deliver to the user are read from several JSON files and written into the database at the start of the application using the above mentioned init routine.
Some of these documents have unique keys which would raise a mongoengine.errors.NotUniqueError error if a document with a duplicate key is passed to the DB. This is not a problem at all since I am able to catch those errors using try-except.
However, some other documents are just a bunch of values or parameters, so there is no unique key which I can check to prevent them from being inserted into the DB twice.
I thought I could read all existing documents from the desired collection like this:
docs = MyCollection.objects()
and check whether the document to be inserted is already available in docs by using:
doc = MyCollection(parameter='foo')
print(doc in docs)
This prints False even though a MyCollection(parameter='foo') document already exists in the DB.
How can I achieve a duplicate detection without using unique keys?
You can check using an if statement:
if not MyCollection.objects(parameter='foo'):
    # insert your documents
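If the source JSON files themselves can contain duplicates, it may also help to dedupe in memory before touching the DB at all. A sketch (field names are illustrative; the mongoengine lines are left as comments):

```python
# Sketch: keep only the first row for each distinct combination of the
# fields that define "the same document" (here just `parameter`).

def dedupe(rows, key_fields):
    """Drop rows whose key_fields match an earlier row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# With mongoengine, insert only what is not present yet:
#   for row in dedupe(rows, ("parameter",)):
#       if not MyCollection.objects(**row):
#           MyCollection(**row).save()

print(dedupe([{"parameter": "foo"}, {"parameter": "foo"}], ("parameter",)))
# [{'parameter': 'foo'}]
```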
I ran the following query on a collection in my MongoDB database:
db.coll.find({field_name: {$exists: true}}).count() and it returned 2437185. The total number of records reported by db.coll.find({}).count() is 2437228.
Now when I run the query db.coll.find({field_name: {$exists: false}}).count(), instead of returning 43 it returns 0.
I have the following two questions :
Does the above scenario mean that the data in my collection has become corrupt?
I had posted a question earlier about this ( Updating records in MongoDB through pymongo leads to deletion of most of them ). The person who replied said that updating data in MongoDB could blank out the data but not delete it. What does that mean?
Thank You
I believe you're running into the issue reported at SERVER-1587. What version of MongoDB are you using? If it is less than 1.9.0, you can use the following as a work-around:
db.coll.find({field_name: {$not: {$exists: true}}}).count()
As for the other question, "blanking out" in this case means that an update can change the value of, or unset, any or all fields in a document, but can't remove the document itself. The only way to remove a document is with remove().
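In the meantime, the count of documents missing the field can simply be derived from the two totals the asker already has:

```python
# Sketch: derive the "field missing" count from totals, sidestepping the
# buggy $exists: false count on affected versions (numbers from the question).

def missing_field_count(total, with_field):
    """Documents lacking the field = all documents minus those that have it."""
    return total - with_field

print(missing_field_count(2437228, 2437185))  # 43
```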