How to compare SQL vs JSON in Python

I have the following problem.
I have a class User (simplified example):

class User:
    def __init__(self, name, lastname, status, id=None):
        self.id = id
        self.name = name
        self.lastname = lastname
        self.status = status

    def set_status(self, status):
        # call to the API to change status
        pass

    def get_data_from_db_by_id(self):
        # select data from db where id = self.id
        pass

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.id, self.name, self.lastname, self.status) == \
               (other.id, other.name, other.lastname, other.status)
And I have a database structure like:
id, name, lastname, status
1, Alex, Brown, free
And a JSON response from an API:

{
    "id": 1,
    "name": "Alex",
    "lastname": "Brown",
    "status": "Sleeping"
}
My questions are:
What is the best way to compare the JSON and SQL responses?
What for? It's only for testing purposes - I have to check that the API has changed the DB correctly.
How can I deserialize the JSON and the DB result to the same class? Are there any common best practices?
For now, I'm trying to use marshmallow for the JSON and SQLAlchemy for the DB, but have had no luck with it.
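For the deserialization part, a minimal sketch (the sample data here is hypothetical): since the constructor's parameter names match the JSON keys, both sources can be loaded into User and compared with the __eq__ defined above:

import json

api_response = '{"id": 1, "name": "Alex", "lastname": "Brown", "status": "free"}'
api_user = User(**json.loads(api_response))   # JSON keys match the ctor parameters

row = (1, "Alex", "Brown", "free")            # e.g. a row fetched from the DB
db_user = User(id=row[0], name=row[1], lastname=row[2], status=row[3])

assert api_user == db_user                    # uses the __eq__ defined above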

Convert the database row to a dictionary:

def row2dict(row):
    d = {}
    for column in row.__table__.columns:
        d[column.name] = str(getattr(row, column.name))
    return d

Then convert the JSON string to a dictionary:

d2 = json.loads(json_response)

And finally compare:

d2 == d
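One caveat: row2dict coerces every value to str, while json.loads keeps native JSON types, so an id of 1 from the API won't equal '1' from the DB dict. A small sketch that normalizes both sides before comparing:

import json

def normalize(d):
    # compare everything as strings so 1 (from JSON) equals '1' (from the DB dict)
    return {k: str(v) for k, v in d.items()}

d = row2dict(row)
d2 = json.loads(json_response)
assert normalize(d2) == normalize(d)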

If you are using SQLAlchemy for the database, then I would recommend using SQLAthanor (full disclosure: I am the library’s author).
SQLAthanor is a serialization and de-serialization library for SQLAlchemy that lets you configure robust rules for how to serialize / de-serialize your model instances to JSON. One way of checking your instance and JSON for equivalence is to execute the following logic in your Python code:
First, serialize your DB instance to JSON. Using SQLAthanor you can do that as simply as:
instance_as_json = my_instance.dump_to_json()
This will take your instance and dump all of its attributes to a JSON string. If you want more fine-grained control over which model attributes end up on your JSON, you can also use my_instance.to_json() which respects the configuration rules applied to your model.
Once you have your serialized JSON string, you can use the Validator-Collection to convert your JSON strings to dicts, and then check if your instance dict (from your instance JSON string) is equivalent to the JSON from the API (full disclosure: I’m also the author of the Validator-Collection library):
from validator_collection import checkers, validators
api_json_as_dict = validators.dict(api_json_as_string)
instance_json_as_dict = validators.dict(instance_as_json)
are_equivalent = checkers.are_dicts_equivalent(instance_json_as_dict, api_json_as_dict)
Depending on your specific situation and objectives, you can construct even more elaborate checks and validations as well, using SQLAthanor’s rich serialization and deserialization options.
Here are some links that you might find helpful:
SQLAthanor Documentation on ReadTheDocs
SQLAthanor on Github
.dump_to_json() documentation
.to_json() documentation
Validator-Collection Documentation
validators.dict() documentation
checkers.are_dicts_equivalent() documentation
Hope this helps!

Related

Set optional params in PUT method using FastAPI/MongoDB

I am trying to make some params optional in a PUT method of my API.
Using FastAPI and MongoDB I've built a simple API to insert and delete students; now I want to allow updating entries without making the params mandatory.
I've checked this question: Fastapi: put method, and it looks like what I'm looking for, but for MongoDB.
And this response from art049 looks similar to what I already have in my @api_router.put('/update-student/{id}', tags=['Student']) route: MongoDb with FastAPI
As an example for my question, I have this structure:
Models:

class Student(BaseModel):
    age: int
    name: str
    address: str

class UpdateStudent(BaseModel):
    age: Optional[int] = None
    name: Optional[str] = None
    address: Optional[str] = None

Schemas:

def serializeDict(a) -> dict:
    return {**{i: str(a[i]) for i in a if i == '_id'}, **{i: a[i] for i in a if i != '_id'}}

def serializeList(entity) -> list:
    return [serializeDict(a) for a in entity]
Routes:

@api_router.post('/create-student', tags=['Students'])
async def create_students(student: Student):
    client.collegedb.students_collection.insert_one(dict(student))
    return serializeList(client.collegedb.students_collection.find())

Also I know I can update the entry without problems in this way:

@api_router.put('/update-student/{id}', tags=['Student'])
async def update_student(id, ustudent: UpdateStudent):
    client.collegedb.students_collection.find_one_and_update({"_id": ObjectId(id)}, {
        "$set": dict(ustudent)
    })
    return serializeDict(client.collegedb.students_collection.find_one({"_id": ObjectId(id)}))
My problem, as you can see from my models, is that I need a way to detect which params were actually sent and update only those:
If right now I update only the age, then since the other params are not required, name and address will be stored as None (null, actually), because that's the default I set in my model.
Maybe I can do something like this:

if ustudent.age != None:
    students_collection[ObjectId(id)] = ustudent.age
if ustudent.name != None:
    students_collection[ObjectId(id)] = ustudent.name
if ustudent.address != None:
    students_collection[ObjectId(id)] = ustudent.address

I know I could do this with a simple dictionary, but I've never tried it on a MongoDB collection, since pydantic doesn't support ObjectId for iteration; that's why serializeDict was created.
I would really appreciate it if somebody could give me a hint about my concern.
You can use the exclude_unset=True argument, as suggested in the FastAPI documentation:

@api_router.put('/update-student/{id}', tags=['Student'])
async def update_student(id, ustudent: UpdateStudent):
    client.collegedb.students_collection.find_one_and_update({"_id": ObjectId(id)}, {
        "$set": ustudent.dict(exclude_unset=True)
    })
    return serializeDict(client.collegedb.students_collection.find_one({"_id": ObjectId(id)}))
Here is the documentation for exporting Pydantic models.
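To see what exclude_unset does, a small standalone sketch: it drops the fields the client never sent, so the $set document touches only those:

from typing import Optional
from pydantic import BaseModel

class UpdateStudent(BaseModel):
    age: Optional[int] = None
    name: Optional[str] = None
    address: Optional[str] = None

# only the field that was explicitly set survives exclude_unset=True
partial = UpdateStudent(age=21)
print(partial.dict())                    # {'age': 21, 'name': None, 'address': None}
print(partial.dict(exclude_unset=True))  # {'age': 21}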

Iterate over json on sqlalchemy query

I'm working on a project that has a questionnaire in it.
I'm using Python, Flask, Postgres and SQLAlchemy.
I need to build a search endpoint that filters the documents by the title or by any of the answers in the questionnaire.
The database is structured the following way:
[Client] - One to Many - [Document] - One to Many - [DocumentVersion]
So that one Client can have many Documents and each document may have many Document Versions.
### DocumentVersion Model

class DocumentVersion(db.Model):
    __tablename__ = 'document_version'

    id = db.Column(db.Integer, primary_key=True)
    document_id = db.Column(db.Integer, db.ForeignKey('document.id'), nullable=False)
    answers = db.Column(JSON, nullable=False)
    # ... other columns

    document = relationship('Document', back_populates='versions')

    @hybrid_method
    def answers_contain(self, text):
        '''returns True if the text appears in any of the answers'''
        contains_text = False
        for answer in self.answers:
            if text == str(answer['answer'].astext):
                contains_text = True
        return contains_text
Inside the [DocumentVersion] table, there is a JSONB field storing the questions and the answers.
The JSON is structured the following way:

[{
    "value": "question",
    "answer": "foo",
    ...
},
{
    "value": "question",
    "answer": "bar",
    ...
},
...
]
The filter on the document title is working fine, but I can't figure out a way to filter by the answers in the JSON.
I believe I have to iterate over the JSON to apply the filter, so I tried to create a @hybrid_method called answers_contain to do so,
but when I do for answer in self.answers in the hybrid method, the loop never actually ends. I wonder if it's possible to iterate over the JSON
while making the query. If I try len(self.answers) inside the hybrid method, I get a
TypeError: object of type 'InstrumentedAttribute' has no len().
### Search endpoint

try:
    page = int(request.args.get('page', 1))
    per_page = int(request.args.get('per_page', 20))
    search_param = str(request.args.get('search', ''))
except:
    abort(400, "invalid parameters")

paginated_query = Document.query \
    .filter_by(client_id=current_user['client_id']) \
    .join(Document.versions) \
    .filter(or_(
        Document.title.ilike(f'%{search_param}%'),
        DocumentVersion.answers_contain(f'%{search_param}%'),
    )) \
    .order_by(desc(Document.created_at)) \
    .paginate(page=page, per_page=per_page)
I also tried to filter like this:
DocumentVersion.answers.ilike(f'%{search_param}%'), which gives me an error and a hint:
HINT: No operator matches the given name and argument types. You might need to add explicit type casts. If I added explicit type casts I would have to hardcode the questions, but I can't, since they can change.
What is the best way to do this filtering? I'd like to avoid bringing all the client documents to the backend server, if possible.
Is there a way to iterate over the json while making the query, on the db server?
Thanks in advance.
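As a sketch of one possible server-side approach (an assumption, and note it matches anywhere in the JSON, including question text and keys, not just answers): cast the JSONB column to text so the ILIKE runs inside Postgres instead of in Python:

from sqlalchemy import String, cast

paginated_query = Document.query \
    .filter_by(client_id=current_user['client_id']) \
    .join(Document.versions) \
    .filter(or_(
        Document.title.ilike(f'%{search_param}%'),
        # jsonb -> text cast, so the substring match is evaluated on the DB server
        cast(DocumentVersion.answers, String).ilike(f'%{search_param}%'),
    )) \
    .order_by(desc(Document.created_at)) \
    .paginate(page=page, per_page=per_page)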

Flask-SQLAlchemy ORM/GeoAlchemy2 results to a dictionary and ultimately JSON

I am using Flask/SQLAlchemy to create a web app with a map in it, so naturally I'm using a PostGIS database. The geom column requires an ST_Transform and somehow I need to turn this column and all others into JSON. The general structure of the database is:
from app import login, db
from datetime import datetime
from geoalchemy2 import Geometry
from time import time
from flask import current_app
from sqlalchemy import func

class Streets(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    street = db.Column(db.String(50))
    geom = db.Column(Geometry(geometry_type='LINESTRING'))

    def to_dict(self):
        data = {
            'id': self.id,
            'street': self.street,
            '_geom': func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326))
        }
        return data
My API route turns this result into JSON:

return jsonify(Streets.query.get_or_404(id).to_dict())

But I keep getting this error: NameError: name 'ST_AsGeoJSON' is not defined
I also tried to create my _geom value like this:

data['_geom'] = db.session.query(func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326)))

The error message is: TypeError: Object of type 'BaseQuery' is not JSON serializable
Finally, I tried an API route like this:

data = Streets.to_dict(
    db.session.query(
        func.ST_AsGeoJSON(
            func.ST_Transform(
                Streets.geom, 4326
            )
        )
    )
    .filter(Streets.id == id))
return jsonify(data)
And I get a different error:
AttributeError: 'BaseQuery' object has no attribute 'id'
If I run this in flask shell it works:

streets = db.session.query(
    Streets.id,
    Streets.street,
    func.ST_AsGeoJSON(func.ST_Transform(Streets.geom, 4326)))
How can I perform ST_Transform and get JSON to my API route?
UPDATE
I found this in the SQLALchemy documentation that got me some progress: "orm.column_property() can be used to map a SQL expression". So I tried adding this to my class Streets(db.Model):
coords = db.column_property(func.ST_AsGeoJSON(func.ST_Transform(geom, 4326)))
Then I add it to data like this:

def to_dict(self):
    data = {
        'id': self.id,
        'street': self.street,
        'coords': self.coords
    }
    return data
But now I'm double-encoding my results: once into GeoJSON and then again when I jsonify it:

return jsonify(Streets.query.get_or_404(id).to_dict())

So my API response contains escaped quotes:
{"coords": "{\"type\":\"MultiLineString\",\"coordinates\":[[[-80.8357132798193,35.2260689001034],[-80.8347602582754,35.2252424284259]]]}"}
And using ST_AsText just turns it into text:
{"coords": "MULTILINESTRING((-80.8357132798193 35.2260689001034,-80.8347602582754 35.2252424284259))"}
I think I'm close with this update, but does anyone have a suggestion for getting correct GeoJSON with the JSON of the other fields of my database?
The first error

NameError: name 'ST_AsGeoJSON' is not defined

means that your example code is not what you were actually running: you had forgotten to access the function through func. It would not work after fixing that either, since you'd be mixing the SQL world and the Python world. func.ST_AsGeoJSON(...) creates an SQL function expression object that is supposed to be compiled to SQL and sent to the DB in a query, not passed to jsonify().
The second error
TypeError: Object of type 'BaseQuery' is not JSON serializable
should be somewhat obvious.
data['_geom'] = db.session.query(func.ST_AsGeoJSON(func.ST_Transform(self.geom, 4326)))
creates a Query, and a too broad query at that, since you've not limited it to fetch data of the current object. The Query object is not JSON serializable.
In
data = Streets.to_dict(db.session.query(...)...)
you pass the Query object as self to Streets.to_dict(), which then tries to access its id attribute in
'id': self.id,
which fails for obvious reasons – namely passing an unrelated object as the instance to a method.
The column_property() approach produces the doubly encoded JSON because SQLAlchemy does not by default expect ST_AsGeoJSON to return JSON and treats its return value as text, which is in fact what it returns. Try decoding in between manually:

import json

def to_dict(self):
    data = {
        'id': self.id,
        'street': self.street,
        'coords': json.loads(self.coords)
    }
    return data
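With the decoding in place, a plain Flask route (sketched here, since the question doesn't show the route definition) should return properly nested GeoJSON instead of an escaped string:

from flask import jsonify

@app.route('/api/streets/<int:id>')
def get_street(id):
    # coords is decoded to a dict in to_dict(), so jsonify nests it properly
    return jsonify(Streets.query.get_or_404(id).to_dict())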

Updating DataStore JSON values using endpoints (Python)

I am trying to use endpoints to update some JSON values in my datastore. I have the following Datastore in GAE...
class UsersList(ndb.Model):
    UserID = ndb.StringProperty(required=True)
    ArticlesRead = ndb.JsonProperty()
    ArticlesPush = ndb.JsonProperty()
In general what I am trying to do with the API is have the method take in a UserID and a list of articles read (with an article being represented by a dictionary holding an ID and a boolean field saying whether or not the user liked the article). My messages (centered on this logic) are the following...
class UserID(messages.Message):
    id = messages.StringField(1, required=True)

class Articles(messages.Message):
    id = messages.StringField(1, required=True)
    userLiked = messages.BooleanField(2, required=True)

class UserIDAndArticles(messages.Message):
    id = messages.StringField(1, required=True)
    items = messages.MessageField(Articles, 2, repeated=True)

class ArticleList(messages.Message):
    items = messages.MessageField(Articles, 1, repeated=True)
And my API/endpoint method that is trying to do this update is the following...

@endpoints.method(UserIDAndArticles, ArticleList,
                  name='user.update',
                  path='update',
                  http_method='GET')
def get_update(self, request):
    userID = request.id
    articleList = request.items

    queryResult = UsersList.query(UsersList.UserID == userID)

    currentList = []
    # This query always returns only one result back, and this for loop is the only way
    # I could figure out how to access the query results.
    for thing in queryResult:
        currentList = json.loads(thing.ArticlesRead)

    for item in articleList:
        currentList.append(item)

    for blah in queryResult:
        blah.ArticlesRead = json.dumps(currentList)
        blah.put()

    for thisThing in queryResult:
        pushList = json.loads(thisThing.ArticlesPush)

    return ArticleList(items=pushList)
I am having two problems with this code. The first is that I can't seem to figure out (using the localhost Google APIs Explorer) how to send a list of articles to the endpoints method using my UserIDAndArticles class. Is it possible to have a messages.MessageField() as an input to an endpoint method?
The other problem is that I am getting an error on the 'blah.ArticlesRead = json.dumps(currentList)' line. When I try to run this method with some random inputs, I get the following error...
TypeError: <Articles
id: u'hi'
userLiked: False> is not JSON serializable
I know that I have to make my own JSON encoder to get around this, but I'm not sure what the format of the incoming request.items is like and how I should encode it.
I am new to GAE and endpoints (as well as this kind of server side programming in general), so please bear with me. And thanks so much in advance for the help.
A couple things:
http_method should definitely be POST, or better yet PATCH because you're not overwriting all existing values but only modifying a list, i.e. patching.
you don't need json.loads and json.dumps, NDB does it automatically for you.
you're mixing Endpoints messages and NDB model properties.
Here's the method body I came up with:

# get the UsersList entity and raise an exception if none is found.
uid = request.id
userlist = UsersList.query(UsersList.UserID == uid).get()
if userlist is None:
    raise endpoints.NotFoundException('List for user ID %s not found' % uid)

# update the user's read-articles list, which is actually a dict.
for item in request.items:
    userlist.ArticlesRead[item.id] = item.userLiked
userlist.put()

# assuming userlist.ArticlesPush is actually a list of article IDs.
pushItems = [Articles(id=id) for id in userlist.ArticlesPush]
return ArticleList(items=pushItems)
Also, you should probably wrap this method in a transaction.
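A minimal sketch of such a transaction (an assumption on my part; the entity is fetched by key here because non-ancestor queries aren't allowed inside NDB transactions):

from google.appengine.ext import ndb

@ndb.transactional
def update_articles_read(userlist_key, items):
    # fetch by key: plain queries are not permitted inside a transaction
    userlist = userlist_key.get()
    for item in items:
        userlist.ArticlesRead[item.id] = item.userLiked
    userlist.put()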

Google App Engine Datastore Query to JSON with Python

How can I get a JSON object in Python from data queried via the Google App Engine Datastore?
I've got a model in the datastore with the following fields:

id
key_name
object
userid
created

Now I want to get all objects for one user:

query = Model.all().filter('userid', user.user_id())

How can I create a JSON object from the query so that I can write it out?
I want to get the data via an AJAX call.
Not sure if you got the answer you were looking for, but did you mean how to parse the model (entry) data in the Query object directly into a JSON object? (At least that's what I've been searching for.)
I wrote this to parse the entries from a Query object into a list of JSON-serializable dicts:

def gql_json_parser(query_obj):
    result = []
    for entry in query_obj:
        result.append(dict([(p, unicode(getattr(entry, p))) for p in entry.properties()]))
    return result
You can have your app respond to AJAX requests by encoding the result with simplejson, e.g.:

query_data = MyModel.all()
json_query_data = gql_json_parser(query_data)
self.response.headers['Content-Type'] = 'application/json'
self.response.out.write(simplejson.dumps(json_query_data))
Your app will return something like this:
[{'property1': 'value1', 'property2': 'value2'}, ...]
Let me know if this helps!
If I understood you correctly, I have implemented a system that works something like this. It sounds like you want to store an arbitrary JSON object in a GAE datastore model. To do this you need to encode the JSON into a string on the way into the database and decode it from a string into a Python data structure on the way out. You will need a JSON encoder/decoder to do this; the GAE infrastructure includes one. For example, you could use a "wrapper class" to handle the encoding/decoding, something along these lines...
class InnerClass(db.Model):
    jsonText = db.TextProperty()

    def parse(self):
        return Wrapper(self)

class Wrapper:
    def __init__(self, storage=None):
        self.storage = storage
        self.json = None
        if storage is not None:
            self.json = fromJsonString(storage.jsonText)

    def put(self):
        jsonText = ToJsonString(self.json)
        if self.storage is None:
            self.storage = InnerClass()
        self.storage.jsonText = jsonText
        self.storage.put()
Then always operate on parsed wrapper objects instead of the inner class:

def getall():
    all = db.GqlQuery("SELECT * FROM InnerClass")
    for x in all:
        yield x.parse()

(untested). See datastoreview.py for some model implementations that work like this.
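The fromJsonString/ToJsonString helpers are left undefined above; a minimal version based on the standard json module might look like this (my assumption, not part of the original answer):

import json

def fromJsonString(s):
    # decode stored JSON text back into Python data structures
    return json.loads(s) if s else None

def ToJsonString(data):
    # encode Python data structures for storage in the TextProperty
    return json.dumps(data)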
I did the following to convert the Google query object to JSON. I used the logic in gql_json_parser above, except for the part where everything is converted to unicode, because I want to preserve data types like integers, floats and null.
import json
import webapp2

class JSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj, 'isoformat'):  # handles both date and datetime objects
            return obj.isoformat()
        else:
            return json.JSONEncoder.default(self, obj)

class BaseResource(webapp2.RequestHandler):
    def to_json(self, gql_object):
        result = []
        for item in gql_object:
            result.append(dict([(p, getattr(item, p)) for p in item.properties()]))
        return json.dumps(result, cls=JSONEncoder)
Now you can subclass BaseResource and call self.to_json on the gql_object.
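For instance, a hypothetical handler (the class name, route and query are illustrative, not from the original answer):

class UserObjectsResource(BaseResource):
    def get(self):
        # fetch the requested user's objects and emit them as JSON
        query = Model.all().filter('userid', self.request.get('userid'))
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(self.to_json(query))

app = webapp2.WSGIApplication([('/api/objects', UserObjectsResource)])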
