find_and_modify with upsert using Python-EVE

There is a common use case where you need to update or insert. For instance:
obj = db['data'].find_and_modify(
    {
        'Name': data['Name'],
        'SourcePage': data['SourcePage'],
    },
    data,
    upsert=True
)
Of course, I can split this request into a GET and then a PATCH or an insert, but maybe there is a better way?
P.S. Eve provides some nice features like document versions and metadata (_created, _updated, etc.).

upsert support is now part of the upcoming release.
One doesn't have to do anything different. The feature is "turned on" by default, so if a user tries to PUT an item that does not exist, a new item will be created; the id field sent in the payload is ignored.
If a user does not want this feature, they need to explicitly set UPSERT_ON_PUT to False. That restores the "old" behaviour: when the user tries to PUT a non-existing item, a 404 is returned.
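A minimal sketch of opting out, assuming a settings.py-based Eve app (the 'data' resource name is just an example):
# settings.py
UPSERT_ON_PUT = False  # PUT to a non-existing item returns 404 again

DOMAIN = {
    'data': {
        'resource_methods': ['GET', 'POST'],
        'item_methods': ['GET', 'PUT', 'PATCH'],
    },
}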

Related

Pymongo BulkWriteResult doesn't contain upserted_ids

Okay, so currently I'm trying to upsert something in a local MongoDB using pymongo (I check to see if the document is in the db and, if it is, update it; otherwise just insert it).
I'm using bulk_write to do that, and everything is working ok. The data is inserted/updated.
However, I would need the ids of the newly inserted/updated documents, but the upserted_ids field in the BulkWriteResult object is empty, even though it states that it inserted 14 documents.
I've added a screenshot with the variable. Is it a bug, or is there something I'm not aware of?
Finally, is there a way of getting the ids of the documents without actually searching for them in the db? (If possible, I would prefer to use bulk_write)
Thank you for your time.
EDIT:
As suggested, I added a part of the code so it's easier to get the general idea:
operations = []
for name in input_list:
    if name not in stored_names:  # completely new entry (both name and package)
        operations.append(InsertOne({"name": name, "package": [package_name]}))
if len(operations) == 0:
    print("## No new permissions to insert")
    return
bulkWriteResult = _db_insert_bulk(collection_name, operations)
and the insert function:
def _db_insert_bulk(collection_name, operations_list):
    return db[collection_name].bulk_write(operations_list)
The upserted_ids field in the pymongo BulkWriteResult only contains the ids of the records that have been inserted as part of an upsert operation, e.g. an UpdateOne or ReplaceOne with the upsert=True parameter set.
As you are performing InsertOne, which doesn't have an upsert option, the upserted_ids list will be empty.
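For contrast, here is a minimal sketch (the collection and field values are made up) of a bulk upsert that does populate upserted_ids:
from pymongo import MongoClient, UpdateOne

db = MongoClient()['mydatabase']
result = db.test.bulk_write([
    UpdateOne({'name': 'camera'},              # assume no matching document yet
              {'$set': {'package': ['app1']}},
              upsert=True),
])
print(result.upserted_ids)  # e.g. {0: ObjectId('...')}, mapping operation index -> new _id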
The lack of an inserted_ids field in pymongo's BulkWriteResult is an omission in the drivers; technically it conforms to the CRUD specification mentioned in D. SM's answer, as the field is annotated with "Drivers may choose to not provide this property."
But ... there is an answer. If you are only doing inserts as part of your bulk update (and not mixed bulk operations), just use insert_many(). It is just as efficient as a bulk write and, crucially, does provide the inserted_ids value in the InsertManyResult object.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
inserts = [{'foo': 'bar'}]
result = db.test.insert_many(inserts, ordered=False)
print(result.inserted_ids)
Prints:
[ObjectId('5fb92cafbe8be8a43bd1bde0')]
This functionality is part of the CRUD specification and should be implemented by compliant drivers, including pymongo. See the pymongo documentation for correct usage.
Example in Ruby:
irb(main):003:0> c.bulk_write([insert_one:{a:1}])
=> #<Mongo::BulkWrite::Result:0x00005579c42d7dd0 #results={"n_inserted"=>1, "n"=>1, "inserted_ids"=>[BSON::ObjectId('5fb7e4b12c97a60f255eb590')]}>
Your output shows that zero documents were upserted, therefore there wouldn't be any ids associated with the upserted documents.
Your code doesn't appear to show any upserts at all, which again means you won't see any upserted ids.

Saving and updating nested documents with MongoEngine

I want to implement this structural model to store my data in MongoDB with MongoEngine on Flask:
skills = [{"asm":"Assembly",
"flag":False,
"date": datetime},
{"java":"Java",
"flag":False,
"date": datetime}]
So I don't know how I can declare and update this kind of structure.
For updating one object I used:
User.objects(skills=form.skills.data).update_one()
However, I don't know how to update more fields in one shot.
I tried with the code below but it doesn’t work.
now = datetime.now()
User.objects(skills=form.skills).update_one(set__skills = ({'ruby':'Ruby'}, {'flag':'true'},{'date':now}))
What kind of fields should I declare on forms.py?
From what I understood, you need a nested document (skills) embedded into another one (the one that refers to User in this case). To do something like this you don't have to update a field atomically; instead, append values to the subdocument list and then save everything.
Trying to follow your example, in your case you should do something like this:
user = User.objects(email=current_user.email).get()
To get the BaseQuery that refers to user X through a certain query filter, in my example the email of the currently logged-in user.
user.kskills.append(SubDocumentClass(skillName="name_of_the_skill", status=True, date=datetime.now()))
To append a document to the subdocument list (I've appended your fields).
user.save()
To save everything
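In case it helps, a hedged sketch of how the documents themselves might be declared; SubDocumentClass and kskills mirror the snippet above, while the field types are assumptions:
from datetime import datetime
from mongoengine import (Document, EmbeddedDocument, EmbeddedDocumentField,
                         ListField, StringField, BooleanField, DateTimeField,
                         EmailField)

class SubDocumentClass(EmbeddedDocument):  # one skill entry
    skillName = StringField(required=True)
    status = BooleanField(default=False)
    date = DateTimeField(default=datetime.now)

class User(Document):
    email = EmailField(required=True, unique=True)
    kskills = ListField(EmbeddedDocumentField(SubDocumentClass))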

Python-Eve: Prevent inserting duplicates without using unique fields

I am trying to prevent inserting duplicate documents by the following approach:
Get a list of all documents from the desired endpoint which will contain all the documents in JSON-format. This list is called available_docs.
Use a pre_POST_<endpoint> hook in order to handle the request before inserting to the data. I am not using the on_insert hook since I need to do this before validation.
Since we can access the request object, use request.json to get the payload in JSON format.
Check if request.json is already contained in available_docs.
Insert the new document only if it is not a duplicate; abort otherwise.
Using this approach I got the following snippet:
def check_duplicate(request):
    if request.json not in available_docs:
        print('Not a duplicate')
    else:
        print('Duplicate')
        flask.abort(422, description='Document is a duplicate and already in database.')
The available_docs list looks like this:
available_docs = [{'foo': ObjectId('565e12c58b724d7884cd02bb'), 'bar': [ObjectId('565e12c58b724d7884cd02b9'), ObjectId('565e12c58b724d7884cd02ba')]}]
The payload request.json looks like this:
{'foo': '565e12c58b724d7884cd02bb', 'bar': ['565e12c58b724d7884cd02b9', '565e12c58b724d7884cd02ba']}
As you can see, the only difference between the document passed to the API and the document already stored in the DB is the datatype of the IDs. Due to that, the if-statement in my snippet above evaluates to True and judges the document to be inserted as not being a duplicate, whereas it definitely is one.
Is there a way to check if a passed document is already in the database? I cannot use unique fields since only the combination of all document fields needs to be unique. There is a unique identifier (which I left out in this example), but it is not suitable for the desired comparison since it is kind of a time stamp.
I think something like casting the given IDs at the keys foo and bar to ObjectIds would do the trick, but I do not know how to do this since I do not know where to get the ObjectId datatype from.
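For reference, the ObjectId type ships in the bson package that comes with pymongo, so the cast could look like this sketch based on the payload above:
from bson import ObjectId

payload = {'foo': '565e12c58b724d7884cd02bb',
           'bar': ['565e12c58b724d7884cd02b9', '565e12c58b724d7884cd02ba']}
payload['foo'] = ObjectId(payload['foo'])
payload['bar'] = [ObjectId(x) for x in payload['bar']]
# payload is now directly comparable to the documents in available_docs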
Your approach would be much slower than setting a unique rule for the field.
Since, from your example, you are going to compare ObjectIds, can't you simply use those as the _id field for the collection? In Mongo (and Eve, of course) that field is unique by default; actually, you typically don't even define it. You would not need to do anything at all, as a POST of a document with an already existing id would fail right away.
If you can't go that way (maybe you need to compare a different ObjectId field and still, for some reason, can't simply set a unique rule for it), I would look at querying the db for the field value rather than getting all the documents from the db and scanning them sequentially in code. Something like db.find({db_field: new_document_field_value}). If that returns a document, the new one is a duplicate. Make sure db_field is indexed (which usually holds true for fields tagged with the unique rule).
EDIT after the comments. A trivial implementation would probably be something like this:
def pre_POST_callback(resource, request):
    # retrieve the mongodb collection using the eve connection
    docs = app.data.driver.db['docs']
    if docs.find_one({'foo': <value>}):
        flask.abort(422, description='Document is a duplicate and already in database.')

app = Eve()
app.on_pre_POST += pre_POST_callback  # attach the generic pre-POST event hook
app.run()
Here's my approach to preventing duplicate records:
from bson import ObjectId  # ObjectId ships with pymongo's bson package
from flask import abort

def on_insert_subscription(items):
    c_subscription = app.data.driver.db['subscription']
    user = decode_token()
    if user:
        for item in items:
            if c_subscription.find_one({
                'topic': ObjectId(item['topic']),
                'client': ObjectId(user['user_id'])
            }):
                abort(422, description="Client already subscribed to this topic")
            else:
                item['client'] = ObjectId(user['user_id'])
    else:
        abort(401, description='Please provide proper credentials')
What I'm doing here is creating subscriptions for clients. If a client is already subscribed to a topic, I throw a 422.
Note: the client ID is decoded from the JWT token.
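For completeness, the callback above still has to be attached to the Eve app; a minimal sketch, assuming the resource is named subscription:
app = Eve()
app.on_insert_subscription += on_insert_subscription
app.run()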

Python-Eve: More than one additional lookup

Using additional lookups I am able to access a given document of a desired endpoint by a secondary one as stated in the docs:
Besides the standard item endpoint which defaults to
/<resource>/<ID_FIELD_value>, you can optionally define a secondary,
read-only, endpoint like /<resource>/<person_name>. You do so by
defining a dictionary comprised of two items field and url. The former
is the name of the field used for the lookup. If the field type (as
defined in the resource schema) is a string, then you put a URL rule
in url. If it is an integer, then you just omit url, as it is
automatically handled. See the code snippet below for an usage example
of this feature.
So recalling the given example from the docs:
people = {
    # 'title' tag used in item links. Defaults to the resource title minus
    # the final, plural 's' (works fine in most cases but not for 'people')
    'item_title': 'person',
    # by default, the standard item entry point is defined as
    # '/people/<ObjectId>/'. We leave it untouched, and we also enable an
    # additional read-only entry point. This way consumers can also perform
    # GET requests at '/people/<lastname>'.
    'additional_lookup': {
        'url': 'regex("[\w]+")',
        'field': 'lastname'
    },
    # We choose to override global cache-control directives for this resource.
    'cache_control': 'max-age=10,must-revalidate',
    'cache_expires': 10,
    # we only allow GET and POST at this resource endpoint.
    'resource_methods': ['GET', 'POST'],
}
Since lastname is set as a secondary lookup, I would be able to access the people endpoint either through the document id (the default) or through the lastname key of a stored document.
Assuming that each person has a unique lastname and a unique nickname: is there a possibility to define more than one additional lookup, in order to have access via both the lastname and the nickname?
This example is just to show what I am looking for. In my real use case I have a database containing different product information, and I want to be able to access that information by both the English and the German title; I can guarantee that all my additional lookups will result in unique document keys.
You can't add more than one additional lookup to the same endpoint. What you can do however, is have multiple endpoints consuming the same datasource.
Multiple API endpoints can target the same database collection. For example you can set both /admins and /users to read and write from the same people collection on the database.
The quote is from the Advanced Datasource Patterns documentation. So you could simply have /books/english/<title> and /books/german/<title>; both endpoints would still consume the same database collection.
Depending on your schema design you might also go the Sub Resources path and set up something like /products/<language>/books/<title>.
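A hedged sketch of the multiple-endpoints idea (the endpoint names, urls and the title_en/title_de fields are assumptions, not from the original answer):
books_english = {
    'url': 'books/english',
    'datasource': {'source': 'books'},  # both endpoints share one collection
    'additional_lookup': {'url': 'regex("[\w]+")', 'field': 'title_en'},
}

books_german = {
    'url': 'books/german',
    'datasource': {'source': 'books'},
    'additional_lookup': {'url': 'regex("[\w]+")', 'field': 'title_de'},
}

DOMAIN = {'books_english': books_english, 'books_german': books_german}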

How to serialize and deserialize Django ORM query (not queryset)?

My use case is that I need to store queries in the DB, retrieve them from time to time, and evaluate them. That's needed for a mailing app where every user can subscribe to website content selected by an individually customized query.
The most basic solution is to store raw SQL and use it with RawQuerySet. But I wonder, are there better solutions?
At first glance, it is really dangerous to hand the query-building job out to others, since they can do anything (even delete all the data in your database, drop entire tables, etc.).
Even if you let them build only a specific part of the query, it is still open to SQL injection. If you are OK with all those dangers, then you may try the following.
This is an old script I used to let users set a specific part of the query. The basics are string.Template and eval (the evil part).
Define your Model:
class SomeModel(Model):
    usr = ForeignKey(User)
    ct = ForeignKey(ContentType)  # we will choose the related DB table with this
    extra_params = TextField()    # store extra filtering criteria in here
Let's execute all queries belonging to a user. Say we have a User query whose extra_params are is_staff and username__icontains:
usr: somebody
ct: User
extra_params: is_staff=$stff_stat, username__icontains='$uname'
$ defines placeholders in extra_params
from string import Template

for _qry in SomeModel.objects.filter(usr='somebody'):  # filter somebody's queries
    cts = Template(_qry.extra_params)  # load the extras into a Template
    f_cts = cts.substitute(stff_stat=True, uname='Lennon')  # substitute placeholders with run-time filtering values
    # f_cts is now `is_staff=True, username__icontains='Lennon'`
    qry = Template('_qry.ct.model_class().objects.filter($f_cts)')  # place the extras into a django `filter` query; the related model is selected with `_qry.ct.model_class()`
    exec_qry = qry.substitute(f_cts=f_cts)
    # now we have `_qry.ct.model_class().objects.filter(is_staff=True, username__icontains='Lennon')`
    query = eval(exec_qry)  # let's evaluate it!
If you have all the related imports done, then you can use Q or any other query-building option in your extra_params. You can also use other methods to form Create or Update queries.
You can read more about Template there, but as I said, it is REALLY DANGEROUS to give such an option to other users.
You may also need to read about Django's ContentType framework.
Update: As @GillBates mentioned, you can use a dictionary structure to create the query. In this case you will not need Template anymore. You can use JSON for such data transfer (or any other format if you wish). Assuming you use JSON to get the data from an outer source, the following code is a sketch that uses some variables from the upper code block.
input_data = '{"is_staff": true, "username__icontains": "Lennon"}'
import json
_data = json.loads(input_data)
result_set = _qry.ct.model_class().objects.filter(**_data)
According to your answer,
User passes some content-specific parameters into a form, then the view function that receives the POST constructs the query
one option is to store the parameters (pickled, JSON-encoded, or in a model) and reconstruct the query with regular Django means, as sketched below. This is a somewhat more robust solution, since it can handle some data-structure changes.
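A minimal sketch of that idea, with all model and field names invented for illustration:
import json
from django.db import models

class SavedQuery(models.Model):  # hypothetical storage model
    user = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    params_json = models.TextField()  # e.g. '{"is_staff": true, "username__icontains": "Lennon"}'

def run_saved_query(saved_query, model_class):
    params = json.loads(saved_query.params_json)
    return model_class.objects.filter(**params)  # rebuilt without eval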
You could create a new model, user_option, and store the selections in this table.
From your question it's hard to determine whether this is a better solution, but it would make your users' choices more explicit in your data structure.
