Escaping special characters in django-haystack query - python

I'm trying to use Solr specific syntax with some of my django-haytack queries. For example I'd like to search: "state:Georgia", but haystack sends it to Solr as "state\:Georgia", breaking the query. A Raw query can be used but it seems to need to know the field and query beforehand like so: sqs = SearchQuerySet().filter(author=Raw('state:Georgia')), but I'm not always sure beforehand what the exact field should be. It could be state, collector, material, category or a number of others. Does anyone know a way around this or how I can access the user's actual query?
Or perhaps I'm going down the wrong path and a custom Clean method would be in order?
Thanks

Related

Flask and SQLAlchemy sort in display without new query?

I'm displaying the results from an SQLAlchemy (Flask-SQLAlchemy) query on a particular view. However the sorting/order is only set by what I originally passed into the query ( order_by(desc(SelectedTable.date_changed)) ). I'm trying to now add functionality that each column that is displayed can be selected to order the presentation.
Is there a way to alter the way a returned query object is sorted once it's returned to create this behavior? Or will I need to build custom queries for each possible column that could be sorted by and ascending/descending?
Is there a recipe for implementing something like this? I've tried google, here, the Flask, Flask-SQLAlchemy, and SQLAlchemy docs for something along these lines but haven't seen anything that touches on the subject and beginning to think that I'm going to need to use custom queries or without new queries try some JavaScript in the Jinja Template to achieve this.
Thanks!

query template in py2neo?

Does py2neo support any manner of query templating? My current need is to support simple date ranges. I could do it by inserting a WHERE clause into the query text. Before I do that, maybe there's a better way?
For example, assume certain node types have a date. I would like to be able to specify a WHERE clause that would filter the selection. But I don't always want to filter it. Think query params from an HTTP request.
Py2neo does have an extension for handling Dates.
This may help you - http://py2neo.org/2.0/ext/calendar.html

Django models - assign id instead of object

I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
user_banned = models.ForeignKey(auth.models.User,related_name="user_banned")
user_banning = models.ForeignKey(auth.models.User,related_name="user_banning")
Now, when i try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get a following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if i replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if yes, is it possible to avoid it, just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) 3 times, 2 to retrieve the two User objects and a third one to commit your desired information. This will cause an absolutelly unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
These is the default name pattern for the FK fields generated by Django ORM. This way you can set the information directly and avoid the queries.
If you wanted to query for the already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access properties in Django querysets. with a double underscore. Then you can compare to the value of the attribute.
Though very similar, they work completely different. The first one sets an atribute directly while the second one is parsed by django, that splits it at the '__', and query the database the right way, being the second part the name of an attribute.
You can always compare user_banned and user_banning with the actual User objects, instead of their ids. But there is no use for this if you don't already have those objects with you.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the db...
To avoid it, you would have to write the raw sql to do the update using method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route keep in mind you are responsible for protecting yourself from sql injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw sql will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.

Parsing SQL Query into a DOM-like tree to enable automatic permutation?

I have a large and complicated sql view that I am attempting to debug. There is a record not showing in the view and I need to determine which clause or join is causing the record to now show up. At the moment I am doing this in a very manual way, removing clauses one at a time and running the query to see if the required row shows up.
I think that it would be great if I could do this programmatically because I end up diving into queries like this about once a fortnight.
Does anybody know if there is a way to parse an SQL query into a tree of objects (for example those in sqlalchemy.sql.expression) so I am able to permuate the tree and execute the results?
If you don't already have the view defined in SQLAlchemy, I don't think it can help you.
You could try something like sqlparse which might get you some of the way there.
You could permute it's output and execute the permutations as raw sql using SQLA.

Django objects.filter, how "expensive" would this be?

I am trying to make a search view in Django. It is a search form with freetext input + some options to select, so that you can filter on years and so on. This is some of the code I have in the view so far, the part that does the filtering. And I would like some input on how expensive this would be on the database server.
soknad_list = Soknad.objects.all()
if var1:
soknad_list = soknad_list.filter(pub_date__year=var1)
if var2:
soknad_list = soknad_list.filter(muncipality__name__exact=var2)
if var3:
soknad_list = soknad_list.filter(genre__name__exact=var3)
# TEXT SEARCH
stop_word_list = re.compile(STOP_WORDS, re.IGNORECASE)
search_term = '%s' % request.GET['q']
cleaned_search_term = stop_word_list.sub('', search_term)
cleaned_search_term = cleaned_search_term.strip()
if len(cleaned_search_term) != 0:
soknad_list = soknad_list.filter(Q(dream__icontains=cleaned_search_term) | Q(tags__icontains=cleaned_search_term) | Q(name__icontains=cleaned_search_term) | Q(school__name__icontains=cleaned_search_term))
So what I do is, first make a list of all objects, then I check which variables exists (I fetch these with GET on an earlier point) and then I filter the results if they exists. But this doesn't seem too elegant, it probably does a lot of queries to achieve the result, so is there a better way to this?
It does exactly what I want, but I guess there is a better/smarter way to do this. Any ideas?
filter itself doesn't execute a query, no query is executed until you explicitly fetch items from query (e.g. get), and list( query ) also executes it.
You can see the query that will be generated by using:
soknad_list.query.as_sql()[0]
You can then put that into your database shell to see how long the query takes, or use EXPLAIN (if your database backend supports it) to see how expensive it is.
As Aaron mentioned, you should get a hold of the query text that is going to be run against the database and use an EXPLAIN (or other some method) to view the query execution plan. Once you have a hold of the execution plan for the query you can see what is going on in the database itself. There are a lot of operations that see very expensive to run through procedural code that are very trivial for any database to run, especially if you provide indexes that the database can use for speeding up your query.
If I read your question correctly, you're retrieving a result set of all rows in the Soknad table. Once you have these results back you use the filter() method to trim down your results meet your criteria. From looking at the Django documentation, it looks like this will do an in-memory filter rather than re-query the database (of course, this really depends on which data access layer you're using and not on Django itself).
The most optimal solution would be to use a full-text search engine (Lucene, ferret, etc) to handle this for you. If that is not available or practical the next best option would be to to construct a query predicate (WHERE clause) before issuing your query to the database and let the database perform the filtering.
However, as with all things that involve the database, the real answer is 'it depends.' The best suggestion is to try out several different approaches using data that is close to production and benchmark them over at least 3 iterations before settling on a final solution to the problem. It may be just as fast, or even faster, to filter in memory rather than filter in the database.

Categories