SQLAlchemy - Search entire DB model columns for a pattern

I have a database from which I need to return any value that matches the query or looks similar to it.
Using flask-sqlalchemy I can filter manually, but I'm having trouble getting a list of objects using list comprehensions or any other more pythonic method.
I have already tried creating a dict that maps every model column (I have a list of them) to the search value and passing it to the filter_by query:
column_dict_and_value = {column_name:'value' for column_name in columns}
a = model.query.filter_by(**column_dict_and_value).all()
...but it doesn't even return an existing object, although
a = model.query.filter_by(column_name='value').all()
actually returns the object.
I tried filter(.like('%')) and it sort of works:
a = model.query.filter(model.column_name.like('valu%')).all()
This returns a list of all the objects that match the pattern, but only for that one column, and I want to iterate over all columns. Basically I want a full-blown search: I'm not sure what the user is going to look for, so I want to show the object if the query exists in any column, or a list of objects if the query is incomplete, like the * in any search.
I tried to use list comprehensions to iterate through the model attributes but I'm not allowed to do that apparently:
a = [model.query.filter(model.column.like('valu%')) for column in columns]
...this complains that column is not an attribute of the model, which makes sense due to the syntax, but I had to try anyway, innit?
I'm a newbie regarding databases and class objects, so please be gentle. I tried to look for similar answers but couldn't find one that suits me. Using filter_by(one_column, second_column, etc) doesn't look very pythonic to me, and if I change the model for any reason I need to change the query too, whereas building a dict comprehension from the model seems more foolproof.
SOLUTION
Based on calestini's proposed answer I wrote this code, which seems to do the trick. It also cleans the list: since the search looks in every field, all() returns an empty list for each column without a match, and those empty results would otherwise end up in the output. I'd prefer it clean.
Note: All my fields are text. Please check the third solution proposed by calestini if yours differ. I haven't tested it, though.
columns = [
    "foo",
    "bar",
    "foobar"
]
def list_everything(search):
    d = {column: search for column in columns}
    raw = [
        model.query.filter(getattr(model, col).ilike(f"{val}%")).all()
        for col, val in d.items()
    ]
    return [item for item in raw if item]
I'll keep optimizing and update this code if I come up with a better solution. Thanks a lot!

Probably the reason why you are not getting any result from
a = model.query.filter_by(**column_dict_and_value).all()
is because filter_by is testing for direct equality. In your case you are looking for a pattern, so you will need to loop and use filter as opposed to filter_by.
Assuming all your columns are String or Text types, you can try the following:
a = model.query
for col, val in column_dict_and_value.items():
    a = a.filter(getattr(model, col).ilike(f'{val}%'))  # ilike for case-insensitive matching
a = a.all()
Or vs And
The issue can also be that in your case you could be testing for intersection but expecting union. In other words, you are returning something only if the pattern matches in all columns. If instead you want a result in case any column matches, then you need to tweak the code a bit:
from sqlalchemy import or_
condition = or_(*[getattr(model, col).ilike(f'{val}%') for col, val in column_dict_and_value.items()])
a = model.query.filter(condition).all()
Now if you actually have other data types among the columns, you can first check the column type and then apply the same logic:
for col, val in column_dict_and_value.items():
    # check whether the column's Python equivalent is a string
    # (python_type is a class, so compare with issubclass rather than isinstance)
    if issubclass(model.__table__.c[col].type.python_type, str):
        ...
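The two ideas above (an OR across columns, restricted to string-typed columns) can be combined roughly as follows. This is a minimal sketch, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database; the Item model, its columns, and the search_all_text_columns helper are hypothetical names made up for illustration, not part of the original question.

```python
from sqlalchemy import Column, Integer, String, create_engine, or_
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Item(Base):  # hypothetical model, for illustration only
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    foo = Column(String)
    bar = Column(String)
    qty = Column(Integer)


def search_all_text_columns(session, model, term):
    """Return rows where any string-typed column starts with `term`."""
    # Keep only columns whose Python equivalent is str (skips id and qty here).
    text_cols = [
        col for col in model.__table__.columns
        if issubclass(col.type.python_type, str)
    ]
    # OR the per-column conditions so a match in any column is enough.
    condition = or_(*[col.ilike(f"{term}%") for col in text_cols])
    return session.query(model).filter(condition).all()


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Item(foo="value one", bar="other", qty=1),
    Item(foo="nothing", bar="Valuable", qty=2),
    Item(foo="nothing", bar="nothing", qty=3),
])
session.commit()

matches = search_all_text_columns(session, Item, "valu")
print(sorted(item.qty for item in matches))
```

Because the conditions are combined into one query, each matching row comes back once, which avoids the list of per-column result lists that a loop of separate queries produces.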

Related

Django - calling attribute from queryset with a string

I'm trying to loop over different query sets while not repeating myself too much and have encountered a problem using the queryset class.
This is not necessarily completely a Django-problem.
What I'm trying to do is use my key list, which corresponds to a Django model's column names, to create a list of the data from those columns. What I want to do is something like this:
if needthisdata==1:
    needdata=['column1', 'column2', 'column3']
else:
    needdata=['column1', 'column4', 'column7']
entry=djangomodel.get.all().filter(identifier='id')
dictitems=[]
for n in range(0, len(needdata)):
    if n==0:
        dictitems=[entry.needdata[n]]
    else:
        dictitems.append(entry.needdata[n])
This of course doesn't work, since the queryset doesn't have a needdata attribute. Is there some way to access an attribute of a class with a string in this way?

A valid Django statement to obtain a single entry
First of all, there are some semantic problems here:
the filter keyword identifier should probably be id or pk;
you use .all immediately instead of first obtaining a manager (probably .objects); and
you use a .filter(..) on the queryset to filter on an identifier, but usually this should be a .get(..), since a filter returns an iterable that can contain zero, one, or more results.
entry = djangomodel.objects.get(id=some_id)
So now we obtain a single entry, but that of course does not resolve obtaining the columns.
If all elements are real Django columns
In case the columns are real Django fields (so no @property attributes, etc.), we can use values_list and call the list(..) constructor on the result:
dictitems = list(djangomodel.objects.values_list(*needdata).get(id=some_id))
In case some elements are @property attributes
In case not all those fields are real Django fields, then we can use attrgetter instead:
from operator import attrgetter
dictitems = list(attrgetter(*needdata)(djangomodel.objects.get(id=some_id)))
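The attribute-by-name lookup does not depend on Django, so it can be tried on any object. Below is a small sketch using a hypothetical Entry class as a stand-in for a model instance; it shows both getattr, which handles one name at a time, and attrgetter, which handles several at once and also works for properties.

```python
from operator import attrgetter


class Entry:  # hypothetical stand-in for a model instance
    def __init__(self, column1, column2):
        self.column1 = column1
        self.column2 = column2

    @property
    def label(self):
        # A computed attribute, not a stored column.
        return f"{self.column1}-{self.column2}"


entry = Entry("a", "b")
needdata = ["column1", "column2", "label"]

# One attribute at a time, looked up by its string name.
by_getattr = [getattr(entry, name) for name in needdata]

# All at once; with two or more names attrgetter returns a tuple.
by_attrgetter = list(attrgetter(*needdata)(entry))

print(by_getattr)
print(by_attrgetter)
```

One caveat: when given a single name, attrgetter returns the bare value rather than a one-element tuple, so the list comprehension with getattr is the safer choice if needdata may contain only one entry.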

Python SQL Alchemy how to query by excluding selected columns

I basically just need to know how to query by excluding selected columns. Is this possible?
Example: I have table which has id, name, age, address, location, birth, age, sex... etc.
Instead of listing the columns to retrieve, I'd like to just exclude some columns from the query (exclude age, for example).
Sample code:
db.session.query(User.username).filter_by(username = request.form['username'], password = request.form['password']).first()
Last thing I wanna do is to list down all the attributes on the query() method, since this would be pretty long especially when you have lots of attributes, thus I just wanna exclude some columns.

Not sure why you're not just fetching the model. When doing that, you can defer loading of certain columns so that they are only queried on access.
db.session.query(User).options(db.defer('location')).filter_by(...).first()
In this example, accessing User.location the first time on an instance will issue another query to get the data.
See the documentation on column deferral: http://sqlalchemy.readthedocs.org/en/rel_0_9/orm/mapper_config.html?highlight=defer#column-deferral-api
Note that unless you're loading huge amounts of data, you won't see any speedup with this. It might actually make things slower since another query will be issued later. I have queries that load thousands of rows with eager-loaded relationships in less than 200ms, so this might be a case of premature optimization.

We can use the Inspection API to get the model's columns, and then create a list of columns that we want.
import sqlalchemy as sa
exclude = {'age', 'registration_date'}
insp = sa.inspect(User)
include = [c for c in insp.columns if c.name not in exclude]
# Traditional ORM style
with Session() as s:
    q = s.query(*include)
    for row in q:
        print(row.id, row.name)
    print()
# 1.4 style
with Session() as s:
    q = sa.select(*include)
    for row in s.execute(q):
        print(row.id, row.name)
    print()
inspect returns the mapper for the model class; to work with non-column attributes like relationships use one of the mapper's other attributes, such as all_orm_descriptors.

If you're using an object deserializer like marshmallow, it is easier to omit the required fields during the deserialization.
https://marshmallow.readthedocs.io/en/latest/api_reference.html#marshmallow.EXCLUDE
The fields to be omitted can be formed dynamically and conditionally excluded. Example:
ModelSchema(exclude=(field1, field2,)).jsonify(records)

I am not aware of a method that does that directly, but you can always get the column keys, exclude your columns, then call the resulting list. You don't need to see what is in the list while doing that.
q = db.session.query(blah blah...)
exclude = ['age']
targ_cols = [x for x in q.first().keys() if x not in exclude]
q.with_entities(targ_cols).all()

Create dictionary of a sqlalchemy query object in Pyramid

I am new to Python and Pyramid. In a test application I am using to learn more about Pyramid, I want to query a database and create a dictionary based on the results of a sqlalchemy query object and finally send the dictionary to the chameleon template.
So far I have the following code (which works fine), but I wanted to know if there is a better way to create my dictionary.
...
index = 0
clients = {}
q = self.request.params['q']
for client in DBSession.query(Client).filter(Client.name.like('%%%s%%' % q)).all():
    clients[index] = { "id": client.id, "name": client.name }
    index += 1
output = { "clients": clients }
return output
While learning Python, I found a nice way to create a list in a for loop statement like the following:
myvar = [user.name for user in users]
So, the other question I had: is there a similar 'one line' way like the above to create a dictionary of a sqlalchemy query object?
Thanks in advance.

Well, yes, we can tighten this up a bit.
First, this pattern:
index = 0
for item in seq:
    frobnicate(index, item)
    index += 1
is common enough that there's a builtin function that does it automatically, enumerate(), used like this:
for index, item in enumerate(seq):
    frobnicate(index, item)
But I'm not sure you need it. Associating things with an integer index starting from zero is the functionality of a list, so you don't really need a dict for that. Unless you want to have holes, or need some of the other special features of dicts, just do:
stuff = []
stuff.extend(seq)
When you're only interested in a small subset of the attributes of a database entity, it's a good idea to tell sqlalchemy to emit a query that returns only that:
query = DBSession.query(Client.id, Client.name) \
    .filter(Client.name.contains(q))
In the above I've also shortened the .name.like('%%%s%%' % q) into .name.contains(q), since they mean the same thing (sqlalchemy expands it into the correct LIKE expression for you). Note that Python's in operator (q in Client.name) cannot be used for this on a sqlalchemy column.
Queries constructed in this way return rows that look like tuples and can easily be turned into dicts by calling _asdict() on them.
So, to put it all together:
output = [row._asdict() for row in DBSession.query(Client.id, Client.name)
                                            .filter(Client.name.contains(q))]
or, if you really desperately need it to be a dict, you can use a dict comprehension:
output = {index: row._asdict()
          for index, row
          in enumerate(DBSession.query(Client.id, Client.name)
                                .filter(Client.name.contains(q)))}
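To see the column-only query and _asdict() working end to end, here is a self-contained sketch. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the Client model below is a minimal stand-in for the one in the question, and the sample names are made up.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Client(Base):  # minimal stand-in for the question's model
    __tablename__ = "clients"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Client(id=1, name="Acme"),
    Client(id=2, name="Bolt"),
    Client(id=3, name="Core"),
])
session.commit()

q = "o"  # search term; matches "Bolt" and "Core"

# Select only the two columns we need; contains() renders a LIKE '%...%' test.
rows = (
    session.query(Client.id, Client.name)
    .filter(Client.name.contains(q))
    .order_by(Client.id)
)

# Each result row is tuple-like and can be converted with _asdict().
output = [row._asdict() for row in rows]
print(output)
```

The order_by is only there to make the output order predictable; without it the database is free to return rows in any order.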

@TokenMacGuy gave a nice and detailed answer to your question. However, I have a feeling you've asked the wrong question :)
You don't need to convert SQLAlchemy objects to dictionaries before passing them to the template; that would be quite inconvenient. You can pass the result of a query as is and use the SQLAlchemy mapped objects directly in your template:
q = self.request.params['q']
clients = DBSession.query(Client).filter(Client.name.contains(q)).all()
return {'clients': clients}

If you want to turn a SqlAlchemy object into a dict, you can use this code:
import sqlalchemy.orm as sqlalchemy_orm
def obj_to_dict(obj):
    return dict((col.name, getattr(obj, col.name)) for col in sqlalchemy_orm.class_mapper(obj.__class__).mapped_table.c)
There is another attribute of the mapped table that holds the relationships, but the code gets dicey.
You don't need to cast an object into a dict for any of the template libraries, but if you decide to persist the data (memcached, session, pickle, etc.) you'll either need to use dicts or write some code to 'merge' the persisted data back into the session.
A quick side note: if you render any of this data as JSON, you'll need either a custom JSON renderer that can handle datetime objects or a function that converts those values first.
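For the datetime caveat, one common approach with the standard library is to pass a default function to json.dumps. The sketch below is only an illustration of that idea: json_default and the sample record are hypothetical, and a Pyramid application would typically wire similar logic into its own JSON renderer instead.

```python
import json
from datetime import date, datetime


def json_default(value):
    """Fallback for json.dumps: render dates and datetimes as ISO 8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    # Anything else is still an error, as it would be without this hook.
    raise TypeError(f"Cannot serialize {type(value).__name__}")


record = {"id": 1, "created": datetime(2020, 1, 2, 3, 4, 5)}
encoded = json.dumps(record, default=json_default)
print(encoded)
```

json.dumps only calls the default function for values it cannot serialize itself, so ordinary strings and numbers are unaffected.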

Return the column names from an empty MySQL query result

I'm using Python 3.2.3, with the MySQL/Connector 1.0.7 module. Is there a way to return the column names, if the MySQL query returns an empty result?
For example. Say I have this query:
SELECT
`nickname` AS `team`,
`w` AS `won`,
`l` AS `lost`
WHERE `w`>'10'
Yet, if there's nobody over 10, it returns nothing, obviously. Now, I know I can check if the result is None, but, can I get MySQL to return the column name and a NULL value for it?
If you're curious, the reason I'm wondering if this is possible is that I'm dynamically building dicts based on the column names. So the above would end up looking something like this if nobody was over 10...
[{'team':None,'won':None,'lost':None}]
And looks like this, if it found 3 teams over 10...
[{'team':'Tigers','won':14,'lost':6},
{'team':'Cardinals','won':12,'lost':8},
{'team':'Giants','won':15,'lost':4}]
If this kind of thing is possible, then I won't have to write a ton of exception checks all over the code to handle empty dicts.

You could run DESC table_name first; the column names are in the first column of its result.
Also, you already know the keys of the dict, so you can construct it yourself and then append to it if the result has values.
[{'team':None,'won':None,'lost':None}]
But I fail to see why you need this. If you have a list of dictionaries, I am guessing you will process it with for loops. A for loop does nothing on an empty list, so you would not have to bother with exception checks.
If you have to do something like result[0]['team'], then you should definitely check that len(result)>0 first.
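Another option is the cursor's description attribute, which the Python DB-API 2.0 defines and which is filled in after a SELECT even when no rows come back. The sketch below uses the standard library's sqlite3 module as a stand-in so it can run anywhere; MySQL Connector/Python is a DB-API driver and should expose description too, but that is not verified here. The teams table name is an assumption, since the original query omits its FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the original question does not name it.
conn.execute("CREATE TABLE teams (nickname TEXT, w INTEGER, l INTEGER)")
conn.execute("INSERT INTO teams VALUES ('Tigers', 4, 6)")

cur = conn.execute(
    "SELECT nickname AS team, w AS won, l AS lost FROM teams WHERE w > 10"
)

# description holds one 7-item sequence per result column; item 0 is the name.
columns = [desc[0] for desc in cur.description]

rows = [dict(zip(columns, row)) for row in cur.fetchall()]

# Fall back to a single all-None dict when the query matched nothing.
result = rows or [dict.fromkeys(columns)]
print(result)
```

Here no team has more than 10 wins, so rows is empty and result falls back to the placeholder dict built from the column aliases.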

Django models - how to filter out duplicate values by PK after the fact?

I build a list of Django model objects by making several queries. Then I want to remove any duplicates, (all of these objects are of the same type with an auto_increment int PK), but I can't use set() because they aren't hashable.
Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.

In general it's better to combine all your queries into a single query if possible, i.e.
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2))
instead of
q1 = Model.objects.filter(field1=f1)
q2 = Model.objects.filter(field2=f2)
If the first query is returning duplicated Models then use distinct()
q = Model.objects.filter(Q(field1=f1)|Q(field2=f2)).distinct()
If your query really is impossible to execute with a single command, then you'll have to resort to using a dict or other technique recommended in the other answers. It might be helpful if you posted the exact query on SO and we could see if it would be possible to combine into a single query. In my experience, most queries can be done with a single queryset.

Is there a quick and easy way to do this? I'm considering using a dict instead of a list with the id as the key.
That's exactly what I would do if you were locked into your current structure of making several queries. Then a simple dictionary.values() will give you your list back.
If you have a little more flexibility, why not use Q objects? Instead of actually making the queries, store each query in a Q object and use a bitwise or ("|") to execute a single query. This will achieve your goal and save database hits.
Django Q objects

You can use a set if you add the __hash__ function to your model definition so that it returns the id (assuming this doesn't interfere with other hash behaviour you may have in your app):
class MyModel(models.Model):
    def __hash__(self):
        return self.pk

If the order doesn't matter, use a dict.
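A sketch of the dict approach, using a hypothetical Obj class in place of real model instances. On Python 3.7 and later, dicts keep insertion order, so the first-seen order of the primary keys is preserved as well.

```python
class Obj:  # hypothetical stand-in for a model instance with a pk
    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


items = [Obj(1, "a"), Obj(2, "b"), Obj(1, "a again"), Obj(3, "c")]

# Key each object by its pk; setdefault keeps the first object seen per pk.
unique = {}
for item in items:
    unique.setdefault(item.pk, item)

deduped = list(unique.values())
print([(obj.pk, obj.name) for obj in deduped])
```

A dict comprehension such as {item.pk: item for item in items} is shorter, but it keeps the last object seen for each pk rather than the first, which matters if the duplicates are not identical in memory.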

How you remove "duplicates" depends on how you define "duplicated".
If you want EVERY column (except the PK) to match, that's a pain in the neck -- it's a lot of comparing.
If, on the other hand, you have some "natural key" column (or short set of columns), then you can easily query for and remove these.
master = MyModel.objects.get( id=theMasterKey )
dups = MyModel.objects.filter( fld1=master.fld1, fld2=master.fld2 )
dups.all().delete()
If you can identify some shorter set of key fields for duplicate identification, this works pretty well.
Edit
If the model objects haven't been saved to the database yet, you can make a dictionary on a tuple of these keys.
unique = {}
...
key = (anObject.fld1,anObject.fld2)
if key not in unique:
    unique[key] = anObject

I use this one:
dict(zip(map(lambda x: x.pk,items),items)).values()
