Create dictionary of a sqlalchemy query object in Pyramid - python

I am new to Python and Pyramid. In a test application that I am using to learn more about Pyramid, I want to query a database, create a dictionary based on the results of a SQLAlchemy query object, and finally send the dictionary to the Chameleon template.
So far I have the following code (which works fine), but I wanted to know if there is a better way to create my dictionary.
...
index = 0
clients = {}
q = self.request.params['q']
for client in DBSession.query(Client).filter(Client.name.like('%%%s%%' % q)).all():
    clients[index] = {"id": client.id, "name": client.name}
    index += 1
output = {"clients": clients}
return output
While learning Python, I found a nice way to create a list in a for loop statement like the following:
myvar = [user.name for user in users]
So, the other question I had: is there a similar 'one line' way like the above to create a dictionary of a sqlalchemy query object?
Thanks in advance.

Well, yes, we can tighten this up a bit.
First, this pattern:
index = 0
for item in seq:
    frobnicate(index, item)
    index += 1
is common enough that there's a builtin function that does it automatically, enumerate(), used like this:
for index, item in enumerate(seq):
    frobnicate(index, item)
But I'm not sure you need it. Associating things with an integer index starting from zero is exactly what a list does; you don't really need a dict for that. Unless you want to have holes, or need some of the other special features of dicts, just do:
stuff = []
stuff.extend(seq)
When you're only interested in a small subset of the attributes of a database entity, it's a good idea to tell SQLAlchemy to emit a query that returns only those columns:
query = DBSession.query(Client.id, Client.name) \
    .filter(Client.name.contains(q))
In the above I've also shortened the .name.like('%%%s%%' % q) into Client.name.contains(q), since they mean the same thing (SQLAlchemy expands contains() into the correct LIKE expression for you).
Queries constructed in this way return a special object that looks like a tuple and can easily be turned into a dict by calling _asdict() on it.
So, to put it all together:
output = [row._asdict() for row in DBSession.query(Client.id, Client.name)
                                            .filter(Client.name.contains(q))]
or, if you really desperately need it to be a dict, you can use a dict comprehension:
output = {index: row._asdict()
          for index, row
          in enumerate(DBSession.query(Client.id, Client.name)
                       .filter(Client.name.contains(q)))}

@TokenMacGuy gave a nice and detailed answer to your question. However, I have a feeling you've asked the wrong question :)
You don't need to convert SQLAlchemy objects to dictionaries before passing them to the template - that would be quite inconvenient. You can pass the result of a query as is and use the SQLAlchemy-mapped objects directly in your template:
q = self.request.params['q']
clients = DBSession.query(Client).filter(Client.name.contains(q)).all()
return {'clients': clients}

If you want to turn a SqlAlchemy object into a dict, you can use this code:
from sqlalchemy import orm as sqlalchemy_orm

def obj_to_dict(obj):
    return dict((col.name, getattr(obj, col.name))
                for col in sqlalchemy_orm.class_mapper(obj.__class__).mapped_table.c)
There is another attribute of the mapped table that holds the relationships, but the code gets dicey.
You don't need to cast an object into a dict for any of the template libraries, but if you decide to persist the data (memcached, session, pickle, etc.) you'll either need to use dicts or write some code to 'merge' the persisted data back into the session.
A quick side note: if you render any of this data through JSON, you'll either need a custom JSON renderer that can handle datetime objects, or you'll have to convert those values in a function.
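For example, here is a minimal sketch of such a renderer, assuming the standard json module and that datetime/date values are the only non-serializable types you need to cover (client here stands for any mapped instance):
import json
import datetime

def json_default(value):
    # fall back to ISO 8601 strings for dates and datetimes
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError("Cannot serialize %r" % (value,))

payload = json.dumps(obj_to_dict(client), default=json_default)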

Related

SQLAlchemy - Search entire DB model columns for a pattern

I have a database from which I need to return any value that matches the query, or looks similar.
Using flask-sqlalchemy I can filter manually, but I'm having trouble getting a list of objects using list comprehensions or any other more pythonic method.
I have already tried to create a dict with all the model columns (which I have in a list) and the search value, passed to the filter_by query:
column_dict_and_value = {column_name:'value' for column_name in columns}
a = model.query.filter_by(**column_dict_and_value).all()
...but it doesn't even return an existing object, although
a = model.query.filter_by(column_name='value').all()
actually returns the object.
I tried filter() with .like('%') and it sort of works:
a = model.query.filter(model.column_name.like('valu%')).all()
returns a list of all the objects that match that pattern, but only for that column, and I want to iterate over all columns. Basically a full-blown search, since I'm not sure what the user is going to look for, and I want to show the object if the query exists in any column, or a list of objects if the query is incomplete, like the * in any search.
I tried to use list comprehensions to iterate through the model attributes but I'm not allowed to do that apparently:
a = [model.query.filter(model.column.like('valu%')) for column in columns]
...this complains that column is not an attribute of the model, which makes sense due to the syntax, but I had to try anyway, innit?
I'm a newbie regarding databases and class objects so please be gentle. I tried to look for similar answers but I can't find one that suits me. The filter_by(one_column, second_column, etc) approach doesn't look very pythonic to me, and if for any reason I change the model I need to change the query too, whereas creating a dict comprehension from the model seems more foolproof.
SOLUTION
Based on calestini's proposed answer I wrote this code that seems to do the trick. It also cleans the list, because if all() doesn't find any result (since it looks in every field) it returns an empty list that would otherwise end up in the output. I'd rather keep it clean.
Note: All my fields are text. Please check the third solution proposed by calestini if yours differ. I haven't tested it though.
columns = [
    "foo",
    "bar",
    "foobar"
]

def list_everything(search):
    d = {column: search for column in columns}
    raw = [
        model.query.filter(getattr(model, col).ilike(f"{val}%")).all()
        for col, val in d.items()
    ]
    return [item for item in raw if item]
I'll keep optimizing and update this code if I come with a better solution. Thanks a lot
Probably the reason why you are not getting any result from
a = model.query.filter_by(**column_dict_and_value).all()
is that filter_by tests for direct equality. In your case you are looking for a pattern, so you will need to loop and use filter as opposed to filter_by.
Assuming all your columns are String or Text types, you can try the following:
a = model.query
for col, val in column_dict_and_value.items():
    a = a.filter(getattr(model, col).ilike(f'{val}%'))  # ilike for case-insensitive matching
a = a.all()
Or vs And
The issue can also be that in your case you could be testing for intersection but expecting union. In other words, you are returning something only if the pattern matches in all columns. If instead you want a result in case any column matches, then you need to tweak the code a bit:
from sqlalchemy import or_

condition = or_(*[getattr(model, col).ilike(f'{val}%') for col, val in column_dict_and_value.items()])
a = model.query.filter(condition).all()
Now if you actually have other data types among the columns, you can first check each column's Python type and only apply the same logic to the text columns:
for col, val in column_dict_and_value.items():
    ## check if the field's Python equivalent is str
    if model.__table__.c[col].type.python_type is str:
        ...
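Putting the two together, a minimal sketch (assuming the same model and column_dict_and_value names as above, and column types that implement python_type):
from sqlalchemy import or_

string_conditions = [
    getattr(model, col).ilike(f'{val}%')
    for col, val in column_dict_and_value.items()
    if model.__table__.c[col].type.python_type is str  # skip non-text columns
]
a = model.query.filter(or_(*string_conditions)).all()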

Serialization optimization using Marshmallow, other solutions

This seems like it should be straightforward, but alas:
I have the following SQLAlchemy query object:
all = db.session.query(label('sid', distinct(Clinical.patient_sid))).all()
I want to serialize the output like [{'sid': 1}, {'sid': 2},...]
To do this, I am trying to use the following simple Marshmallow schema:
class TestSchema(Schema):
    sid = fields.Int()
However, when I do
schema = TestSchema()
result = schema.dump(record)
print result
pprint(result.data)
I get:
MarshalResult(data={}, errors={})
{}
for my output.
However, when I select only one row from my query, e.g.,
one_record = db.session.query(label('sid', distinct(Clinical.patient_sid))).first()
I get the desired results:
MarshalResult(data={u'sid': 1}, errors={})
{u'sid': 1}
I DO know the query with .all() is returning data, since when I print it I get a list of tuples:
[(1L,), (2L,), (3L,), ...]
I am assuming Marshmallow can handle list of tuples, since, in the documentation to marshaling.py under the serialize method, it says:
"Takes raw data (a dict, list, or other object) and a dict of..." However, this may be an incorrect assumption to think that lists of tuples could be classified as either "lists" or "other objects."
I like Marshmallow otherwise, and was hoping to use it as an optimization over serializing my SQLAlchemy output using an iterative method, like:
all = db.session.query(label('sid', distinct(Clinical.patient_sid)))
out = []
for result in all:
    data = {'sid': result.sid}
    out.append(data)
Which, for large records sets can take a while to process.
EDIT
Even if Marshmallow were able to serialize the entire record set as output by SQLAlchemy, I am not sure I would get any increase in speed, since it looks like it too iterates over the data.
Any suggestions for optimized serialization for the SQLAlchemy output, short of modifying the class definition for Clinical?
The solution to optimize my code was to go directly from my SQLAlchemy query object to a pandas data frame (I forgot to mention that I am doing some heavy lifting in pandas after I get my queried record set).
I thus was able to skip this step
out = []
for result in all:
    data = {'sid': result.sid}
    out.append(data)
by using pandas' read_sql function as follows:
import pandas as pd
pd.read_sql(all.statement, all.session.bind)
and then doing all my data manipulations and gyrations, thereby shaving off several seconds of processing time.
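If the list-of-dicts form is still needed afterwards, the same frame can be converted in one call (a sketch, assuming the query object all from above):
df = pd.read_sql(all.statement, all.session.bind)
records = df.to_dict(orient='records')  # [{'sid': 1}, {'sid': 2}, ...]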

SQLAlchemy: Execute/Add a function to a column in sqlalchemy

I'm using SQLAlchemy, and I have a query in which one of the columns I obtain holds a constant such as QUOTE_STATUS_ERROR; the values in this column are integers. Since the constant value doesn't mean anything to the end user, I'd like to convert that value within the query itself to a string, by mapping the values of that column through a dictionary I have in the app, using a function already in place for that purpose. I haven't been able to find a way to implement this, since the columns in the query are objects, not the values of the column itself. To make my question clear, this is an example of what I have:
Query:
q = meta.session.query(MyModel.id, MyModel.quote_status).join(AnotherModel).subquery("q")
Function I want to use:
def get_status_names(status_value):
    return QUOTE_STATUS_NAMES[status_value]
Is there a way to do this directly from SQLAlchemy by attaching/passing a function (get_status_names()) to the column (MyModel.quote_status)? If not, what could be the best approach? I'd prefer not to iterate over the values once I get the results, in case the list of results is extensive. I would appreciate a push in the right direction.
UPDATE: I'm joining the resulting subquery with other tables
There are a few things you can do.
Off the top of my head...
If you just want to display things, you can use a property decorator:
QUOTE_STATUS__ID_2_NAME = {}

class MyModel(object):
    id = Column()
    quote_status_id = Column()

    @property
    def quote_status_string(self):
        if self.quote_status_id:
            return QUOTE_STATUS__ID_2_NAME[self.quote_status_id]
        return None
If you want to render/accept strings and have sqlalchemy convert from string/int transparently, you can use a TypeDecorator -- http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#custom-types
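A rough sketch of that approach, assuming the QUOTE_STATUS_NAMES dict from the question plus a reverse mapping (the class name here is illustrative):
from sqlalchemy.types import TypeDecorator, Integer

QUOTE_STATUS_IDS = {name: id_ for id_, name in QUOTE_STATUS_NAMES.items()}

class QuoteStatusType(TypeDecorator):
    """Stores the integer constant, exposes the human-readable name."""
    impl = Integer

    def process_bind_param(self, value, dialect):
        # string name going into the database -> integer constant
        return QUOTE_STATUS_IDS.get(value)

    def process_result_value(self, value, dialect):
        # integer constant coming out of the database -> string name
        return QUOTE_STATUS_NAMES.get(value)
A column declared with Column(QuoteStatusType()) would then accept and return the string names transparently.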
Personally, I usually go for the property decorator.

Store dictionary in database

I am creating a Berkeley database and operating on it using the bsddb module. I need to store information in it in a structure like this:
username = '....'
notes = {'name_of_note1': {
             'password': '...',
             'comments': '...',
             'title': '...'
         },
         'name_of_note2': {
             # same keys as previous, but other values
         }
        }
This is how I open the database:
db = bsddb.btopen['data.db','c']
How do I do that?
So, first, I guess you should open your database using parentheses:
db = bsddb.btopen('data.db','c')
Keep in mind that Berkeley's pattern is key -> value, where both key and value are string objects (not unicode). The best way in your case would be to use:
db[str(username)] = json.dumps(notes)
since your notes are compatible with the json syntax.
However, this is not a very good choice if, say, you want to query only the usernames' comments. In that case you should use a relational database, such as sqlite, which is also built into Python.
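A minimal round-trip sketch of the json idea above (assuming the username and notes variables from the question; bsddb is the Python 2 module the question uses):
import json
import bsddb

db = bsddb.btopen('data.db', 'c')
db[str(username)] = json.dumps(notes)       # store the notes dict as a JSON string
notes_back = json.loads(db[str(username)])  # load it back into a dict
db.close()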
A simple solution was described by @Falvian.
For a start, there is also a column pattern in ordered key/value stores, so the plain key/value pattern is not the only option.
I think that bsddb is a viable solution when you don't want to rely on sqlite. The first approach is to create a documents = bsddb.btopen('documents.db', 'c') and store json values inside it. Regarding the keys you have several options:
Name the keys yourself, like you do "name_of_note_1", "name_of_note_2"
Generate random identifiers using uuid.uuid4 (don't forget to check it's not already used ;)
Or use a row inside this documents with key=0 to store a counter that you will use to create uids (unique identifiers).
If you use integers, don't forget to pack them with lambda x: struct.pack('>q', x) before storing them.
If you need to create an index, I recommend you have a look at my other answer introducing composite keys to build indexes in bsddb.
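A small sketch of the integer-key packing mentioned above (assuming the documents database from this answer and non-negative uids):
import json
import struct

def pack_uid(uid):
    return struct.pack('>q', uid)    # 8-byte big-endian signed integer

def unpack_uid(raw):
    return struct.unpack('>q', raw)[0]

documents[pack_uid(42)] = json.dumps({'title': '...'})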

Django models - how to filter out duplicate values by PK after the fact?

I build a list of Django model objects by making several queries. Then I want to remove any duplicates (all of these objects are of the same type with an auto_increment int PK), but I can't use set() because they aren't hashable.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
In general it's better to combine all your queries into a single query if possible, i.e.
from django.db.models import Q

q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))
instead of
q1 = Model.objects.filter(field1=f1)
q2 = Model.objects.filter(field2=f2)
If the first query is returning duplicated Models then use distinct()
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()
If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
That's exactly what I would do if you were locked into your current structure of making several queries. Then a simple dictionary.values() call will return your list back.
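A minimal sketch of that approach (assuming objects is the combined list built from the several queries):
unique = {obj.pk: obj for obj in objects}   # later duplicates overwrite earlier ones
deduped = list(unique.values())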
If you have a little more flexibility, why not use Q objects? Instead of actually making the queries, store each query in a Q object and use a bitwise or ("|") to execute a single query. This will achieve your goal and save database hits.
Django Q objects
You can use a set if you add the __hash__ function to your model definition so that it returns the id (assuming this doesn't interfere with other hash behaviour you may have in your app):
class MyModel(models.Model):
    def __hash__(self):
        return self.pk
If the order doesn't matter, use a dict.
Remove "duplicates" depends on how you define "duplicated".
If you want EVERY column (except the PK) to match, that's a pain in the neck -- it's a lot of comparing.
If, on the other hand, you have some "natural key" column (or short set of columns) then you can easily query and remove these.
master = MyModel.objects.get( id=theMasterKey )
dups = MyModel.objects.filter( fld1=master.fld1, fld2=master.fld2 )
dups.all().delete()
If you can identify some shorter set of key fields for duplicate identification, this works pretty well.
Edit
If the model objects haven't been saved to the database yet, you can make a dictionary on a tuple of these keys.
unique = {}
...
key = (anObject.fld1, anObject.fld2)
if key not in unique:
    unique[key] = anObject
I use this one:
dict(zip(map(lambda x: x.pk, items), items)).values()
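The same thing reads a bit more clearly as a dict comprehension (wrap it in list() on Python 3, where .values() returns a view):
list({item.pk: item for item in items}.values())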
