Refresh mongodb collection structure through python mongoengine - python

I'm writing a simple Flask app, with the sole purpose of learning Python and MongoDB.
I've reached the point where all the collections are defined and CRUD operations work in general. Now, one thing that I really want to understand is how to refresh a collection after updating its structure. For example, say that I have the following model:
user.py
class User(db.Document, UserMixin):
    email = db.StringField(required=True, unique=True)
    password = db.StringField(required=True)
    active = db.BooleanField()
    first_name = db.StringField(max_length=64, required=True)
    last_name = db.StringField(max_length=64, required=True)
    # Pass the callable (no parentheses) so the default is evaluated per document,
    # not once when the module is imported.
    registered_at = db.DateTimeField(default=datetime.datetime.utcnow)
    confirmed = db.BooleanField()
    confirmed_at = db.DateTimeField()
    last_login_at = db.DateTimeField()
    current_login_at = db.DateTimeField()
    last_login_ip = db.StringField(max_length=45)
    current_login_ip = db.StringField(max_length=45)
    login_count = db.IntField()
    companies = db.ListField(db.ReferenceField('Company'), default=[])
    roles = db.ListField(db.ReferenceField(Role), default=[])
    meta = {
        'indexes': [
            {'fields': ['email'], 'unique': True}
        ]
    }
Now, I already have entries in my user collection, but I want to change companies to:
company = db.ReferenceField('Company')
How can I refresh the collection's structure, without having to bring the whole database down?
I do have a manage.py script that helps me and also provides a shell:
#!/usr/bin/python
from flask.ext.script import Manager
from flask.ext.script.commands import Shell
from app import factory
app = factory.create_app()
manager = Manager(app)
manager.add_command("shell", Shell(use_ipython=True))
# manager.add_command('run_tests', RunTests())
if __name__ == "__main__":
    manager.run()
and I have tried a couple of commands, based on information I could piece together and my basic knowledge:
>>> from app.models import db, User
>>> import mongoengine
>>> mongoengine.Document(User)
field = iter(self._fields_ordered)
AttributeError: 'Document' object has no attribute '_fields_ordered'
>>> mongoengine.Document(User).modify() # well, same result as above
Any pointers on how to achieve this?
Update
I am asking all of this because I have updated my user.py to match my new requests, but any time I interact with the db itself, since the table's structure was not refreshed, I get the following error:
FieldDoesNotExist: The field 'companies' does not exist on the
document 'User', referer: http://local.faqcolab.com/company

The solution is easier than I expected:
db.getCollection('user').update(
    // query
    {},
    // update
    {
        $rename: {
            'companies': 'company'
        }
    },
    // options
    {
        "multi" : true, // update all documents
        "upsert" : false // if true, insert a new document when no existing document matches the query
    }
);
Explanation for each of the {}:
The first is empty because I want to update all documents in the user collection.
The second contains $rename, which is the operator that renames the fields I want.
The last contains additional settings for the query to be executed.
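The same rename can be sketched from Python. This is a minimal sketch, assuming a PyMongo-style collection object that exposes update_many(filter, update); the helper name rename_field is made up for illustration and is not part of any library.

```python
def rename_field(collection, old_name, new_name):
    """Rename a field on every document of `collection` (hypothetical helper).

    `collection` is assumed to be a PyMongo-style collection exposing
    update_many(filter, update).
    """
    # An empty filter ({}) matches all documents; $rename only changes the key
    # and leaves documents that lack `old_name` untouched.
    return collection.update_many({}, {"$rename": {old_name: new_name}})
```

With MongoEngine, something like rename_field(User._get_collection(), "companies", "company") should reach the underlying collection. Note that $rename only changes the key: the stored value is still a list, so documents may need a further update before they fit a single ReferenceField.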

I have updated my user.py to match my new requests, but any time I interact with the db itself, since the table's structure was not refreshed, I get the following error
MongoDB does not have a "table structure" like relational databases do. After a document has been inserted, you can't change its schema by changing the document model.
I don't want to sound like I'm telling you that the answer is to use different tools, but seeing things like db.ListField(db.ReferenceField('Company')) makes me think you'd be much better off with a relational database (Postgres is well supported in the Flask ecosystem).
Mongo works best for storing schema-less documents (you don't know beforehand how your data is structured, or it varies significantly between documents). Unless you have data like that, it's worth looking at other options. Especially since you're just getting started with Python and Flask, there's no point in making things harder than they are.


Django fake model instantiation - No testunit [duplicate]

This question already has an answer here:
How can I create a django model instance with deferred fields without hitting the database?
(1 answer)
Closed 8 months ago.
I want to know if I can instantiate an empty fake model with just the id of a database record.
I found a way to create a mock model, but I want a production-friendly solution.
Explanation of my issue:
I want to list the settings of users who chose to be displayed in public mode:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed = True,
).only(
    'user_id',
    'is_premium',
)
user_settings_list = []
for user_displayed in user_displayed_list:
    # I have to send a User instance to the next method:
    user_settings = self.get_user_settings(user_displayed.user)
    user_settings_list.append(user_settings)
    # But 'user_displayed.user' runs a new SQL query
I know I can improve my queryset like this:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed = True,
).select_related(
    'user'
).only(
    'user',
    'is_premium',
)
But it makes a useless join, because I only need the user id field in get_user_settings().
The get_user_settings() method (it may help to understand the context):
def get_user_settings(self, user):
    user_settings = UserSettings.objects.get(user = user)
    return user_settings
In the real project, this method runs more business logic.
Is there a way to instantiate a User model instance with only the id field filled?
I don't want to use a custom empty class written for this purpose. I really want a User object.
I didn't find anything for that. If it's possible, I could use it this way:
for user_displayed in user_displayed_list:
    FakeUser = User.objects.create_fake(id = user_displayed.user_id)
    # I have to send a User instance to the next method:
    user_settings = self.get_user_settings(FakeUser)
Without seeing the complete models, I'm assuming a bit. Assuming that UserSettings has a ForeignKey to User. Same for UserPublicProfile. Or User has ForeignKey to UserSettings. Works as well.
Assuming that, I see two solutions.
Solution #1; use the ORM to full potential
Just saw your comment about the 'legacy method, used many times'.
Django relations are very smart. They accept either the object or the ID of a ForeignKey.
You'd imagine this only works with a User. But if you pass the id, Django ORM will help you out.
def get_user_settings(self, user):
    user_settings = UserSettings.objects.get(user = user)
    return user_settings
So in reality, these work the same:
UserSettings.objects.get(user=1)
UserSettings.objects.get(user_id=1)
Which means this should work, without an extra query:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed = True,
).only(
    'user_id',
    'is_premium',
)
user_settings_list = []
for user_displayed in user_displayed_list:
    # Pass the user_id instead of the User object.
    user_settings = self.get_user_settings(user_displayed.user_id)
    user_settings_list.append(user_settings)
Solution #2: chain relations
Another solution, again, still assuming quite a bit ;)
I would think you can chain the models together.
Assuming these FKs exist: UserPublicProfile -> User -> UserSetting.
You could do this:
user_displayed_list = UserPublicProfile.objects.filter(
    displayed = True,
).select_related(
    'user', 'user__usersettings',  # depends on naming of relations
).only(
    'user',
    'is_premium',
)
for user_displayed in user_displayed_list:
    # Joined, so this should cause no extra queries. Depends on naming of relations.
    user_settings = user_displayed.user.usersettings
    user_settings_list.append(user_settings)

router.register default POST method writes empty value to postgres DB

I have a Python project set up with Django 1.8.0 and PostgreSQL. My model looks like this:
class poll_db(models.Model):
    p_pk = models.AutoField(primary_key=True)
    p_name = models.CharField(max_length=256)
    p_desc = models.CharField(max_length=512)
I have a post url registered to router on urls.py:
router.register(r'newpoll', views.createPoll)
I am trying to make a default POST call with the following URL
http://localhost:8080/newpoll/
And my postBody looks like:
{
    "name": "What's the weekend plan?",
    "desc": "Poll to decide on the weekend plan"
}
The request hits the server and there is a new entry created on the DB. But when I look at the created entry, it has empty values except for the p_pk
14 | |
which means the values are passed as empty. But when I try to override the default create method on the views.py, I see the values as part of the request and add to the db is fine.
All I am trying is to skip writing a method for adding it to the DB and use the default create method.
Any help is much appreciated. Thanks!
My bad. I had read-only fields on serializer.

Update row (SQLAlchemy) with data from marshmallow

I'm using Flask, Flask-SQLAlchemy, Flask-Marshmallow + marshmallow-sqlalchemy, trying to implement REST api PUT method. I haven't found any tutorial using SQLA and Marshmallow implementing update.
Here is the code:
class NodeSchema(ma.Schema):
    # ...
class NodeAPI(MethodView):
    decorators = [login_required, ]
    model = Node
    def get_queryset(self):
        if g.user.is_admin:
            return self.model.query
        return self.model.query.filter(self.model.owner == g.user)
    def put(self, node_id):
        json_data = request.get_json()
        if not json_data:
            return jsonify({'message': 'Invalid request'}), 400
        # Here is the part which I can't make work
        data, errors = node_schema.load(json_data)
        if errors:
            return jsonify(errors), 422
        queryset = self.get_queryset()
        node = queryset.filter(Node.id == node_id).first_or_404()
        # Here I need some way to update this object
        node.update(data) #=> raises AttributeError: 'Node' object has no attribute 'update'
        # Also tried:
        # node = queryset.filter(Node.id == node_id)
        # node.update(data) <-- It doesn't know if there is any object
        # Wrote a test case where user1 tries to modify a node of user2. The node doesn't change (OK), but user1 gets status code 200 (NOT OK).
        db.session.commit()
        return jsonify(), 200
UPDATED, 2022-12-08
Extending the ModelSchema from marshmallow-sqlalchemy instead of Flask-Marshmallow you can use the load method, which is defined like this:
load(data, *, session=None, instance=None, transient=False, **kwargs)
Putting that to use, it should look like that (or similar query):
node_schema.load(json_data, session=db.session, instance=Node.query.get(node_id))
And if you want to load without all the required fields of the model, you can add the partial=True argument, like this:
node_schema.load(json_data, instance=Node.query.get(node_id), partial=True)
See the docs for more info (does not include definition of ModelSchema.load).
See the code for the load definition.
I wrestled with this issue for some time, and in consequence came back again and again to this post. In the end, what made my situation difficult was a confounding issue involving SQLAlchemy sessions. I figure this is common enough with Flask, Flask-SQLAlchemy, SQLAlchemy, and Marshmallow to put down a discussion. I certainly do not claim to be an expert on this, and yet I believe what I state below is essentially correct.
The db.session is, in fact, closely tied to the process of updating the DB with Marshmallow, and because of that I decided to give the details, but first the short of it.
Short Answer
Here is the answer I arrived at for updating the database using Marshmallow. It is a different approach from the very helpful post of Jair Perrut. I did look at the Marshmallow API and yet was unable to get his solution working in the code presented, because at the time I was experimenting with his solution I was not managing my SQLAlchemy sessions properly. To go a bit further, one might say that I wasn't managing them at all. The model can be updated in the following way:
user_model = user_schema.load(user)
db.session.add(user_model.data)
db.session.commit()
Give the session.add() a model with primary key and it will assume an update, leave the primary key out and a new record is created instead. This isn't all that surprising since MySQL has an ON DUPLICATE KEY UPDATE clause which performs an update if the key is present and creates if not.
Details
SQLAlchemy sessions are handled by Flask-SQLAlchemy during a request to the application. At the beginning of the request the session is opened, and when the request is closed that session is also closed. Flask provides hooks for setting up and tearing down the application where code for managing sessions and connections may be found. In the end, though, the SQLAlchemy session is managed by the developer, and Flask-SQLAlchemy just helps. Here is a particular case that illustrates the management of sessions.
Consider a function that gets a user dictionary as an argument and uses that with Marshmallow() to load the dictionary into a model. In this case, what is required is not the creation of a new object, but the update of an existing object. There are 2 things to keep in mind at the start:
The model classes are defined in a python module separate from any code, and these models require the session. Often the developer (Flask documentation) will put a line db = SQLAlchemy() at the head of this file to meet this requirement. This in fact, creates a session for the model.
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()
In some other separate file there may be a need for a SQLAlchemy session as well. For example, the code may need to update the model, or create a new entry, by calling a function there. Here is where one might find db.session.add(user_model) and db.session.commit(). This session is created in the same way as in the bullet point above.
There are 2 SQLAlchemy sessions created. The model sits in one (SignallingSession) and the module uses its own (scoped_session). In fact, there are 3. The Marshmallow UserSchema has sqla_session = db.session: a session is attached to it. This then is the third, and the details are found in the code below:
from marshmallow_sqlalchemy import ModelSchema
from donate_api.models.donation import UserModel
from flask_sqlalchemy import SQLAlchemy
db = SQLAlchemy()
class UserSchema(ModelSchema):
    class Meta(object):
        model = UserModel
        strict = True
        sqla_session = db.session
def some_function(user):
    user_schema = UserSchema()
    user['customer_id'] = '654321'
    user_model = user_schema.load(user)
    # Debug code:
    user_model_query = UserModel.query.filter_by(id=3255161).first()
    print(db.session.object_session(user_model_query))
    print(db.session.object_session(user_model.data))
    print(db.session)
    db.session.add(user_model.data)
    db.session.commit()
    return
At the head of this module the model is imported, which creates its session, and then the module will create its own. Of course, as pointed out there is also the Marshmallow session. This is entirely acceptable to some degree because SQLAlchemy allows the developer to manage the sessions. Consider what happens when some_function(user) is called where user['id'] is assigned some value that exists in the database.
Since the user includes a valid primary key then db.session.add(user_model.data) knows that it is not creating a new row, but updating an existing one. This behavior should not be surprising, and is to be at least somewhat expected since from the MySQL documentation:
13.2.5.2 INSERT ... ON DUPLICATE KEY UPDATE Syntax
If you specify an ON DUPLICATE KEY UPDATE clause and a row to be inserted would cause a duplicate value in a UNIQUE index or PRIMARY KEY, an UPDATE of the old row occurs.
The snippet of code is then seen to be updating the customer_id on the dictionary for the user with primary key 3255161. The new customer_id is '654321'. The dictionary is loaded with Marshmallow and a commit is done to the database. Examining the database, it can be found that it was indeed updated. You might try two ways of verifying this:
In the code: db.session.query(UserModel).filter_by(id=3255161).first()
In MySQL: select * from user
If you were to consider the following:
In the code: UserModel.query.filter_by(id=3255161).first().customer_id
You would find that the query brings back None. The model is not synchronized with the database. I have failed to manage our SQLAlchemy sessions correctly. In an attempt to bring clarity to this consider the output of the print statements when separate imports are made:
<sqlalchemy.orm.session.SignallingSession object at 0x7f81b9107b90>
<sqlalchemy.orm.session.SignallingSession object at 0x7f81b90a6150>
<sqlalchemy.orm.scoping.scoped_session object at 0x7f81b95eac50>
In this case the UserModel.query session is different from the Marshmallow session. The Marshmallow session is what gets loaded and added. This means that querying the model will not show our changes. In fact, if we do:
db.session.object_session(user_model.data).commit()
The model query will now bring back the updated customer_id! Consider the second alternative where the imports are done through flask_essentials:
from flask_sqlalchemy import SQLAlchemy
from flask_marshmallow import Marshmallow
db = SQLAlchemy()
ma = Marshmallow()
<sqlalchemy.orm.session.SignallingSession object at 0x7f00fe227910>
<sqlalchemy.orm.session.SignallingSession object at 0x7f00fe227910>
<sqlalchemy.orm.scoping.scoped_session object at 0x7f00fed38710>
And the UserModel.query session is now the same as the user_model.data (Marshmallow) session. Now the UserModel.query does reflect the change in the database: the Marshmallow and UserModel.query sessions are the same.
A note: the signalling session is the default session that Flask-SQLAlchemy uses. It extends the default session system with bind selection and modification tracking.
I have rolled my own solution. Hope it helps someone else. The solution implements an update method on the Node model.
Solution:
class Node(db.Model):
    # ...
    def update(self, **kwargs):
        # For py2 & py3 compatibility do:
        # from six import iteritems
        # for key, value in iteritems(kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)
class NodeAPI(MethodView):
    decorators = [login_required, ]
    model = Node
    def get_queryset(self):
        if g.user.is_admin:
            return self.model.query
        return self.model.query.filter(self.model.owner == g.user)
    def put(self, node_id):
        json_data = request.get_json()
        if not json_data:
            abort(400)
        data, errors = node_schema.load(json_data) # validate with marshmallow
        if errors:
            return jsonify(errors), 422
        queryset = self.get_queryset()
        node = queryset.filter(self.model.id == node_id).first_or_404()
        node.update(**data)
        db.session.commit()
        return jsonify(message='Successfully updated'), 200
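The update helper above sets whatever keys it receives. A possible variation, sketched here with a plain class standing in for the SQLAlchemy model, restricts updates to an explicit set of fields so that a request body cannot overwrite attributes such as the owner. The UPDATABLE_FIELDS name and the column names are assumptions made for illustration, not part of the original code.

```python
class Node:
    """Plain-Python stand-in for the model; only the update logic matters here."""

    # Hypothetical whitelist of attributes a client may change.
    UPDATABLE_FIELDS = {"name", "description"}

    def __init__(self, name, description, owner):
        self.name = name
        self.description = description
        self.owner = owner

    def update(self, **kwargs):
        # Reject the whole update if any key is not whitelisted, so nothing
        # is partially applied.
        unknown = set(kwargs) - self.UPDATABLE_FIELDS
        if unknown:
            raise ValueError("cannot update fields: " + ", ".join(sorted(unknown)))
        for key, value in kwargs.items():
            setattr(self, key, value)
        return self
```

In the view, the ValueError could be turned into a 422 response in the same place where the marshmallow errors are handled.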
Latest Update [2020]:
You might be facing the issue of mapping keys to the database models. Your request body contains only the updated fields, so you want to change only those without affecting the others. You could write multiple if conditions, but that's not a good approach.
Solution
You can implement patch or put methods using sqlalchemy library only.
For example:
YourModelName.query.filter_by(
    your_model_column_id = 12  # change 12: where condition to find a particular row
).update(request_data)
request_data should be a dict object. For example:
{
    "your_model_column_name_1": "Hello",
    "your_model_column_name_2": "World",
}
In the above case, only two columns will be updated: your_model_column_name_1 and your_model_column_name_2.
Update function maps request_data to the database models and creates update query for you. Checkout this: https://docs.sqlalchemy.org/en/13/core/dml.html#sqlalchemy.sql.expression.update
The previous answer seems to be outdated, as ModelSchema is now deprecated.
You should instead use SQLAlchemyAutoSchema with the proper options.
class NodeSchema(SQLAlchemyAutoSchema):
    class Meta:
        model = Node
        load_instance = True
        sqla_session = db.session
node_schema = NodeSchema()
# then when you need to update a Node ORM instance:
node_schema.load(node_data, instance=node, partial=True)
db.session.commit()  # the session has no update() method; commit() persists the changes made by load()
Below is my solution with Flask-Marshmallow + marshmallow-sqlalchemy bundle as the author requested initially.
schemas.py
from flask import current_app
from flask_marshmallow import Marshmallow
from app.models import Node
ma = Marshmallow(current_app)
class NodeSchema(ma.SQLAlchemyAutoSchema):
    class Meta:
        model = Node
        load_instance = True
load_instance is a key point here to make an update further.
routes.py
from flask import jsonify, request
from marshmallow import ValidationError
from app import db
@bp.route("/node/<node_uuid>/edit", methods=["POST"])
def edit_node(node_uuid):
    json_data = request.get_json(force=True, silent=True)
    node = Node.query.filter_by(
        node_uuid=node_uuid
    ).first()
    if node:
        try:
            schema = NodeSchema()
            json_data["node_uuid"] = node_uuid
            node = schema.load(json_data, instance=node)
            db.session.commit()
            return schema.jsonify(node)
        except ValidationError as err:
            return jsonify(err.messages), 422
    else:
        return jsonify("Not found"), 404
You have to check for existence of Node first, otherwise the new instance will be created.

Using the SQL initialization hook with ManytoManyField

I'm fairly new to Django and I'm trying to add some 'host' data to 'record' using django's hook for using SQL to initialise (a SQL file in lowercase in the app folder & sql subfolder)
Here's the models:
class Record(models.Model):
    species = models.TextField(max_length = 80)
    data = models.TextField(max_length = 700)
    hosts = models.ManyToManyField('Host')
class Host(models.Model):
    hostname = models.TextField()
I've used a ManyToManyField as each record should be able to have multiple hosts, and hosts should be 'reusable': ie be able to appear in many records.
When I'm trying to insert via SQL I have
INSERT INTO myapp_record VALUES ('Species name', 'data1', XYZ);
I'm not sure what to put for XYZ (the ManytoMany) if I wanted hosts 1, 2 and 3 for example
Separating them by commas doesn't work obviously, and I tried a tuple and neither did that.
Should I be trying to insert into the intermediary table Django makes? Does that have a similar hook to the one I'm using? If not, how can I execute SQL inserts on this table?
The use of initial SQL data files is deprecated. Instead, you should be using a data migration, which might look something like this:
from django.db import models, migrations
def create_records(apps, schema_editor):
    # We can't import the Record and Host models directly as they may be newer
    # versions than this migration expects. We use the historical versions.
    Record = apps.get_model("yourappname", "Record")
    Host = apps.get_model("yourappname", "Host")
    host1 = Host.objects.get(hostname='host1')
    # The model's field is called "species", not "name".
    record = Record.objects.create(species='Species name', data='Data')
    record.hosts.add(host1)
    # ...etc...
class Migration(migrations.Migration):
    dependencies = [
        ('yourappname', '0001_initial'),
    ]
    operations = [
        migrations.RunPython(create_records),
    ]

mongokit index does not work

I am developing a Web application using Flask and MongoDB. And I use (Flask-)MongoKit to define a schema to validate my data.
In my database, there is a collection called "users" (see below) that contains a field "email". I try to create a unique index on that field as specified in the MongoKit documentation (http://namlook.github.com/mongokit/indexes.html). However, when I check the collection indexes via MongoDB client shell, there is no index "email" at all.
I found a similar issue on the net: "unique index does not work" (https://github.com/namlook/mongokit/issues/98)
Does anyone have any idea why it does not work?
User collection:
@db.register
class User(Model):
    __collection__ = 'users'
    structure = {
        'first_name': basestring,
        'last_name': basestring,
        'email': basestring,
        'password': unicode,
        'registration_date': datetime,
    }
    required_fields = ['first_name', 'last_name', 'email', 'password', 'registration_date']
    default_values = {
        'registration_date': datetime.utcnow,
    }
    # Create a unique index on the "email" field
    indexes = [
        {
            'fields': 'email',  # note: this may be an array
            'unique': True,  # only unique values are allowed
            'ttl': 0,  # create index immediately
        },
    ]
db.users.getIndexes() output:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "youthmind.users",
        "name" : "_id_"
    },
]
Note that I also tried without 'ttl': 0, and that I was able to create an index using the following piece of code:
db.users.create_index('email', unique=True)
I think this uses the pymongo Connection object directly.
Thanks in advance for your help.
You are doing it exactly how you should be doing it. Automatic index creation has been removed from MongoKit as of version 0.7.1 (maybe version 0.8?). Here is an issue for it.
The reason behind it is that it would have to call ensureIndex on the collection. The "ensure" part of the name makes it seem like it would check and then create the index if it doesn't exist, but a developer from Mongo said that it might still wind up (re-)creating the entire index, which could be terribly expensive. The developer also said it should be considered an administrative task, instead of a development task.
The workaround is to call create_index yourself for each index in the list you've defined, as part of an upgrade/create script.
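As a rough sketch of that workaround, the loop below walks a MongoKit-style indexes list and calls create_index for each entry. It assumes a PyMongo-style collection exposing create_index(keys, **options); the helper name create_indexes is made up for illustration, and the 'ttl' key from the spec is deliberately ignored here.

```python
def create_indexes(collection, index_specs):
    """Create one index per MongoKit-style spec (hypothetical helper)."""
    created = []
    for spec in index_specs:
        fields = spec["fields"]
        # 'fields' may be a single name or a list of names / (name, direction) pairs.
        if isinstance(fields, str):
            fields = [fields]
        # Plain names default to ascending order (1).
        keys = [(f, 1) if isinstance(f, str) else tuple(f) for f in fields]
        options = {}
        if spec.get("unique"):
            options["unique"] = True
        created.append(collection.create_index(keys, **options))
    return created
```

Called from an upgrade or setup script, for example as create_indexes(db.users, User.indexes), this keeps index creation an explicit administrative step rather than something that runs on every server start.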
Right, you need to use a separate script to recreate the DB with indexes. It will be called when needed, not each time the server runs. Example:
def recreatedb(uri, database_name):
    connection = Connection(uri)
    connection.drop_database(database_name)
    #noinspection PyStatementEffect
    connection[database_name]
    connection.register(_DOCUMENTS)
    for document_name, obj in connection._registered_documents.iteritems():
        obj.generate_index(connection[database_name][obj._obj_class.__collection__])
To prevent using a database without indexes:
def init_engine(uri, database_name):
    global db
    connection = Connection(uri)
    if database_name not in connection.database_names():
        recreatedb(uri, database_name)
    connection.register(_DOCUMENTS)
    db = connection[database_name]
I use Flask-Script so it was easy to add Marboni's answer as a command to my manage script which is easy to run.
@manager.command
def setup_indexes():
    """
    create index for all the registered_documents
    """
    for doc in application.db.registered_documents:
        collection = application.db[doc.__collection__]
        doc.generate_index(collection)
I keep my database as a member of app (application.db) for various admin stuff. Now whenever I add a few indexes or change anything, I run my manager command.
./manage.py setup_indexes
You can read more about manager module here
http://flask-script.readthedocs.org/en/latest/
