Query on wildcard Document ID - python

Is it possible to filter on document IDs based on a date? For example, documents are inserted daily and we want to delete the previous day's data. We planned to append the date to each document ID and perform the deletion by filtering on document ID with a wildcard - 20181101_* - to delete all documents whose ID starts with the matching date.
Another approach would be to insert a date field in each document and run a WHERE clause:
q = doc_ref.where(u'date', u'==', 20181101).get()
I got this working, but I'm just wondering if there is a better approach.

perform deletion with filtering on document ID with wildcard - 20181101_*
There is no way in Cloud Firestore to create a query based on wildcards.
delete all the document which their id start with some matching date.
There is also no way to query elements in a Firestore database whose IDs start with a particular date. To solve this, you should follow the guidance in the official documentation, which says:
Cloud Firestore doesn't support native indexing or search for text fields in documents. Additionally, downloading an entire collection to search for fields client-side isn't practical.
To enable full text search of your Cloud Firestore data, use a third-party search service like Algolia.
And yes, you are guessing right: the best solution is to add a new date property and query the database on it.
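For illustration, here is a minimal sketch of that approach with the google-cloud-firestore Python client; the collection name u'documents' is an assumption, not part of the original question:

from google.cloud import firestore

db = firestore.Client()

def delete_for_date(date_value):
    # Query only the documents whose 'date' field matches, then delete each one.
    docs = db.collection(u'documents').where(u'date', u'==', date_value).stream()
    deleted = 0
    for doc in docs:
        doc.reference.delete()
        deleted += 1
    return deleted

# Remove yesterday's data, stored as an integer yyyymmdd value as in the question.
delete_for_date(20181101)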

Related

Query mongoDB by date document created using pymongo

There are lots of solutions for querying MongoDB using a date/time field, but what if the Mongo doc doesn't have a date/time field?
I've noticed that when I hover the mouse over a document _id (using NoSQLBooster for MongoDB) I get a "createdAt" dropdown (see screenshot below). Just wondering if there is any way to do a query using pymongo where documents are filtered based on a date/time range using their "createdAt" metadata?
In MongoDB the _id of a document contains the timestamp of its creation; this is mentioned in this other question.
You can write a script that inserts a date field derived from this information to perform those queries, or perform the query directly against the ObjectId, as shown here.
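As a hedged sketch of the second option (querying directly on the ObjectId), assuming a pymongo client and a collection named mycollection:

from datetime import datetime
from bson.objectid import ObjectId
from pymongo import MongoClient

client = MongoClient()
collection = client.mydb.mycollection

# ObjectIds embed a creation timestamp, so a range of synthetic ObjectIds
# selects documents created within that window (1-second resolution).
start = ObjectId.from_datetime(datetime(2018, 11, 1))
end = ObjectId.from_datetime(datetime(2018, 11, 2))

created_on_nov_1 = collection.find({'_id': {'$gte': start, '$lt': end}})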

How to delete queried results from Splunk database?

My question is about deleting data from Splunk.
My requirement:
I query Splunk based on a timestamp, with a "from date" and a "to date".
After I get the list of all events between those timestamps, I want to delete those events from the Splunk database.
Each query's results are stored in a destination database, so I want to delete the queried data from the source Splunk DB so that my next query will not return repetitive results, and also to free up storage space in the source Splunk DB.
So, is there an effective way to completely delete the queried result data from the Splunk DB?
Thanks & Regards,
Dharmendra Setty
I'm not sure you can actually delete them to free up storage space.
As written here, what you can do is simply mask the results from ever showing up again in the next searches.
To do this, simply pipe your search query into the "delete" command.
BE CAREFUL: First make sure these really are the events you want to delete
Example:
index=<index-name> sourcetype=<sourcetype-name> source=<source-name>
earliest="%m/%d/%Y:%H:%M:%S" latest="%m/%d/%Y:%H:%M:%S" | delete
Where
index=<index-name> sourcetype=<sourcetype-name> source=<source-name>
earliest="%m/%d/%Y:%H:%M:%S" latest="%m/%d/%Y:%H:%M:%S"
is the search query

How to compare multiple dates on an NDB query?

I need to fetch objects in an NDB query that match a given start and end date, but I'm not able to run this traditionally simple query because NDB complains:
from google.appengine.ext import ndb
from datetime import datetime
from server.page.models import Post
now = datetime.now()
query = Post.query(
    Post.status == Post.STATUS_ACTIVE,
    Post.date_published_start <= now,
    Post.date_published_end >= now,
)
count = query.count()
Error:
BadRequestError: Only one inequality filter per query is supported.
Encountered both date_published_start and date_published_end
Are there any workarounds for this?
Dynamically obtaining a single result list that can be directly used for pagination without any further processing is not possible due to the single-inequality-filter-per-query limitation. See the related GAE issue 4301.
As Jeff mentioned, filtering by one inequality (ideally the most restrictive one) followed by further dynamic processing of the results is always an option, inefficient as you noted, but unavoidable if you need total flexibility of the search.
You could improve the performance by using a projection query - reducing the amount of data transfered from the datastore to just the relevant properties.
You could also try to perform 2 keys-only queries, one for each inequality, then compute the intersection of the results - this could give you the pagination counts and list of entities (as keys) faster. Finally you'd get the entities for the current page by direct key lookups for the keys in the page list, ideally batched (using ndb.get_multi()).
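A minimal sketch of that two-queries approach, reusing the Post model from the question (the page size of 20 is just an illustrative assumption):

# One inequality per query, so each query on its own is valid.
start_keys = set(Post.query(
    Post.status == Post.STATUS_ACTIVE,
    Post.date_published_start <= now).fetch(keys_only=True))
end_keys = set(Post.query(
    Post.status == Post.STATUS_ACTIVE,
    Post.date_published_end >= now).fetch(keys_only=True))

# Intersect the keys-only results, page them, then fetch only the current page.
matching_keys = start_keys & end_keys
page_keys = list(matching_keys)[:20]
posts = ndb.get_multi(page_keys)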
Depending on the intended use you might have other alternatives in some cases (additional work required, of course).
You could restrict the scope of the queries. Instead of querying all Post entities since the beginning of time, maybe just results from a certain year or month would suffice in certain cases. Then you could add year and/or month Post properties which you can include as equality filters in your queries, potentially reducing the number of results to process dynamically from thousands to, say, hundreds or less.
You could also avoid the queries altogether for typical, often-use cases. For example if the intended use is to generate a few kinds of monthly reports you could have some Report entities containing lists of Post keys for each such report kind/month which you could update whenever a Post entity's relevant properties change. Instead of querying Posts entities for a report you'd instead just use the already available lists from the respective Report entity. You could also store/cache the actual report upon generation, for direct re-use (instead of re-generating it at every access).
Another workaround for querying with multiple filter and inequalities is to use the Search API.
https://cloud.google.com/appengine/training/fts_adv/lesson1#query_options
From the documentation:
For example, the query job tag:"very important" sent < 2011-02-28
finds documents with the term job in any field, and also contain the
phrase very important in a tag field, and a sent date prior to
February 28, 2011.
Just put your data from Datastore query into Search documents and run your query on these documents.
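As a hedged sketch of such a Search API query, the index name 'post_index' and the field names mirror the Post properties but are assumptions here:

from google.appengine.api import search

index = search.Index(name='post_index')

# Both inequalities can live in a single Search API query string.
query_string = ('status: active '
                'AND date_published_start <= 2018-11-01 '
                'AND date_published_end >= 2018-11-01')
results = index.search(query_string)
for doc in results:
    print(doc.doc_id)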

Mongoengine: Check if document is already in DB

I am working on a kind of initialization routine for a MongoDB using mongoengine.
The documents we deliver to the user are read from several JSON files and written into the database at the start of the application using the above mentioned init routine.
Some of these documents have unique keys which would raise a mongoengine.errors.NotUniqueError error if a document with a duplicate key is passed to the DB. This is not a problem at all since I am able to catch those errors using try-except.
However, some other documents are more like a bunch of values or parameters, so there is no unique key which I can check in order to prevent them from being inserted into the DB twice.
I thought I could read all existing documents from the desired collection like this:
docs = MyCollection.objects()
and check whether the document to be inserted is already available in docs by using:
doc = MyCollection(parameter='foo')
print(doc in docs)
This prints False even if there is already a MyCollection(parameter='foo') document in the DB.
How can I achieve a duplicate detection without using unique keys?
You can check using an if statement:
if not MyCollection.objects(parameter='foo'):
    # insert your documents
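An equivalent, slightly more explicit variant of the same check using first() (the field name is taken from the question):

existing = MyCollection.objects(parameter='foo').first()
if existing is None:
    MyCollection(parameter='foo').save()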

When an entry is deleted from the datastore, is its corresponding search document also deleted?

I am using Google App Engine's Search API to index entities from the Datastore. After I create or modify an object, I have to add it to the Search index. I do this by creating a add_to_search_index method for each model whose entities are indexed, for example:
class Location(ndb.Model):
    ...
    def add_to_search_index(self):
        fields = [
            search.TextField(name="name", value=self.name),
            search.GeoField(name="location", value=search.GeoPoint(self.location.lat, self.location.lon)),
        ]
        document = search.Document(doc_id=str(self.key.id()), fields=fields)
        index = search.Index(name='Location_index')
        index.put(document)
Does the search API automatically maintain any correspondence between indexed documents and datastore entities?
I suspect it does not, meaning that the Search API will keep documents for deleted, obsolete entities in its index. If that's the case, then I suppose the best approach would be to use the NDB hook methods to create a remove_from_search_index method that is called before put (for edits/updates) and delete. Please advise if there is a better solution for maintaining correspondence between the datastore and search indices.
Since the datastore (NDB) and the Search API are separate back ends, they have to be maintained separately. I see you're using key.id() as the document id; you can use this document id to get a document or to delete it. Creating or updating the search document can be done in the model's _post_put_hook, and removing it in _post_delete_hook. You may also use the repository pattern to do this. How you do this is up to you.
index = search.Index(name='Location_index')
index.delete([doc_id])
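As a hedged sketch of the hook-based approach mentioned above, reusing the Location model and index name from the question:

class Location(ndb.Model):
    ...

    def _post_put_hook(self, future):
        # Re-index the entity after every create/update.
        self.add_to_search_index()

    @classmethod
    def _post_delete_hook(cls, key, future):
        # Remove the matching search document when the entity is deleted.
        search.Index(name='Location_index').delete([str(key.id())])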
