I have a queryset that I want to paginate through alphabetically.
employees = Employee.nodes.order_by('name')
I want to compare the first letter of the employee's name (name[0]) to the letter I am iterating on, but I don't know how to filter based on conditions applied to my attribute.
employees_by_letter = []
for letter in alphabet:
    employees_by_this_letter = employees.filter(name[0].lower()=letter)
    employees_by_letter.append(employees_by_this_letter)
"""error -- SyntaxError: keyword can't be an expression"""
I suppose I could iterate through each employee object and append a value for their first letter... but there has to be a better way.
Well, this is Python, and in Python parameter names are identifiers. You cannot put an expression in their place.
Django does, however, offer lookups for more advanced filtering. In your case, you want the __istartswith lookup:
employees_by_letter = [
    employees.filter(name__istartswith=letter)
    for letter in alphabet
]
This list comprehension builds a list that holds, for every letter in alphabet, the corresponding queryset.
Note, however, that since you will eventually fetch every element, I would actually recommend iterating over a single queryset (or performing a groupby, for example).
Like:
from django.db.models.functions import Lower
from itertools import groupby

employees_by_letter = {
    k: list(v)
    for k, v in groupby(
        # annotate() is called on the manager (assuming nodes is a Django manager)
        Employee.nodes.annotate(lowname=Lower('name')).order_by('lowname'),
        lambda x: x.lowname[:1]
    )
}
This constructs a dictionary whose keys are lowercase letters (or the empty string, if some employees have an empty name). Each key maps to a list of Employee instances: the employees whose name starts with that letter. That way we perform only a single query on the database. Django querysets are lazy, however, so if you plan to actually evaluate only a few of the per-letter querysets, the former approach can be more efficient.
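The grouping step itself does not depend on Django. As a minimal sketch, assuming a plain list of made-up names standing in for the ordered queryset, the same groupby pattern looks like this:

```python
from itertools import groupby

# Hypothetical stand-in for the queryset: plain name strings, one of them empty.
names = ["alice", "Bob", "", "adam", "Bill"]

# Sort by the lowercased name first; groupby only merges adjacent items.
ordered = sorted(names, key=str.lower)

# name[:1] is safe for empty strings, where name[0] would raise IndexError.
names_by_letter = {
    letter: list(group)
    for letter, group in groupby(ordered, key=lambda name: name[:1].lower())
}

print(names_by_letter)
```

The slice name[:1] is why an empty name ends up under the empty-string key instead of raising an error.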
I was trying to combine filtering Django data through a combination of dictionary and regular expression. So my dictionary looks like this:
dict = {key: r'.*?' + value + '.*?$'}
where I have key and value as string variables.
I want to filter the data so that every row I retrieve has a value in column key that contains the string value.
table.objects.all().filter(**dict)
However, this does not give me the rows I desire; in fact, it gives me no row at all while all the rows with index key should have shown. I'd appreciate any insights into what might have gone wrong here and how I can begin fixing it. Thank you!
Edit: I would prefer to use the Dictionary approach to filtering because I'm taking the keys to filter as a substring given in the URL parameters. If you think there is any other way I can retrieve the string from the URL and do something like:
table.objects.all().filter(key: r'.*?' + value '*?$' ).
I would appreciate it as well, just not sure how to do that given I only have variable key as a string!
Edit: key is "city", value is "Philadelphi", and I'm looking for rows that have index "city" = "Philadelphia", but those do not show.
You say that key is "city" and value is "Philadelphi". If we substitute these into your query, you are basically writing:
table.objects.all().filter(city=r'.*?Philadelphi.*?$')
If the problem is not yet visible: when no lookup is specified, Django assumes an exact lookup. You want either the regex or the iregex lookup instead. You should also escape your value when you put it in a regex:
import re
key = '%s__iregex' % key
value = r'.*?%s.*?$' % re.escape(value)
dict = {key: value}
table.objects.all().filter(**dict)
Also, as suggested in the comments, your regex doesn't seem to differ from a contains lookup, so unless you have some other regex in mind you should use (with the original, unescaped value):
key = '%s__contains' % key
dict = {key: value}
table.objects.all().filter(**dict)
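To see why escaping matters and why this pattern behaves like a substring test, here is a small sketch that uses Python's re module as a stand-in for the database's regex engine (whose syntax may differ in details). The city names and the "St. Louis" value are made up for illustration.

```python
import re

value = "Philadelphi"
cities = ["Philadelphia", "New Philadelphia", "Pittsburgh"]

# The pattern from the answer: anything, then the escaped value, then anything.
pattern = r'.*?%s.*?$' % re.escape(value)

regex_hits = [city for city in cities if re.match(pattern, city)]
contains_hits = [city for city in cities if value in city]
print(regex_hits == contains_hits)  # the regex selects the same rows as a substring test

# Without escaping, the "." in a value acts as a wildcard.
risky_value = "St. Louis"
unescaped = r'.*?%s.*?$' % risky_value
escaped = r'.*?%s.*?$' % re.escape(risky_value)
matches_unescaped = bool(re.match(unescaped, "StX Louis"))
matches_escaped = bool(re.match(escaped, "StX Louis"))
print(matches_unescaped, matches_escaped)
```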
Code:
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Albert', 'Steven']
for letter, name in itertools.groupby(names, first_letter):
    print(letter, list(name))
Output:
A ['Alan', 'Adam']
W ['Wes']
A ['Albert']
S ['Steven']
I want to group by the first letter, but it doesn't seem to work. What's wrong here?
As you would expect from any function in itertools, groupby operates on sequences of elements that share a common key. You have to remember that an iterator can be any source of sequential data, possibly one that doesn't store its own elements as a list does.
What this means is that if the data is not already grouped within the iterator, groupby won't work the way you expect. Put it another way, groupby starts another group whenever the key changes, regardless of whether the key has already appeared in the sequence or not.
Probably the easiest way to pre-group the data in your case is to sort it. Lists can be sorted in-place:
names = ['Alan', 'Adam', 'Wes', 'Albert', 'Steven']
names.sort()
for letter, name in itertools.groupby(names, first_letter):
    print(letter, list(name))
A similar result could be obtained by distributing your list into a dictionary. I use collections.defaultdict below because it makes adding new elements easier. You could use a regular dictionary just as easily:
import collections

grouped = collections.defaultdict(list)
for name in names:
    grouped[name[0]].append(name)
for letter, group in grouped.items():
    print(letter, group)
In either case, the point is that you can't expect groupby to do exactly what you want with the order of elements in your raw data.
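One way to package the sort-then-group step is a small helper. This is only a sketch; the name group_by_key is made up for illustration and is not part of itertools.

```python
import itertools

def group_by_key(items, key):
    """Group items by key, sorting first so that equal keys are adjacent."""
    ordered = sorted(items, key=key)
    return {k: list(g) for k, g in itertools.groupby(ordered, key=key)}

names = ['Alan', 'Adam', 'Wes', 'Albert', 'Steven']
grouped = group_by_key(names, key=lambda name: name[0])
print(grouped)
```

Sorting with the same key function that groupby uses guarantees that equal keys are adjacent. Because Python's sort is stable, the names inside each group keep their original relative order.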
I have a dictionary with each keys having multiple values in a list.
The tasks are:
1. To detect whether a given word is in the dictionary values
2. If it is true, then return the respective key from the dictionary
Task 1 is achieved by using an if condition:
if (word in dictionary[topics] for topics in dictionary.keys())
I want to get the topics when the if condition evaluates to be True. Something like
if (word in dictionary[topics] for topics in dictionary.keys()):
    print topics
You can use a list comprehension (which is like a compressed for loop). They are simpler to write and can in some circumstances be faster to compute:
topiclist = [topic for topic in dictionary if word in dictionary[topic]]
You don't need dictionary.keys() because a dict is already an iterable object; iterating over it will yield the keys anyway, and (in Python 2) in a more efficient way than dictionary.keys().
EDIT:
Here is another way to approach this (it avoids an extra dictionary look up):
topiclist = [topic for (topic, tlist) in dictionary.items() if word in tlist]
Avoiding the extra dictionary lookup may make it faster, although I haven't tested it.
In Python 2, for efficiency's sake, you may want to do:
topiclist = [topic for (topic, tlist) in dictionary.iteritems() if word in tlist]
if (word in dictionary[topics] for topics in dictionary.keys())
The problem with the above line is that the parenthesized expression creates a generator object that would yield one bool per value of dictionary, stating whether word is in that value. The if statement tests the generator object itself, not the bools it yields, and a generator object is always truthy. So this if statement will ALWAYS be true, regardless of whether the word is in the values or not. You can do two things:
Using any() will make your if statement work:
if any(word in dictionary[topics] for topics in dictionary.keys()):
However, this does not solve your initial problem of capturing the key. So instead:
Use an actual list comprehension that uses the (I assume) predefined variable word as a filter of sorts:
keys = [topics for topics in dictionary if word in dictionary[topics]]
or
use filter() (wrapped in list(), because in Python 3 filter() returns an iterator rather than a list):
keys = list(filter(lambda key: word in dictionary[key], dictionary))
These both do the same thing. As a reminder, iterating through dictionary and through dictionary.keys() is equivalent.
Note that both of these methods return a list of all the keys whose values contain word. Access each key with regular list indexing.
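The difference between testing the generator object and testing its contents can be checked directly. This is a small sketch with a made-up dictionary:

```python
# Hypothetical sample data for illustration.
dictionary = {"fruit": ["apple", "pear"], "tools": ["hammer", "saw"]}
word = "banana"  # not present in any value

# A generator object is truthy no matter what it would yield.
gen = (word in dictionary[topic] for topic in dictionary)
generator_is_truthy = bool(gen)

# any() consumes the generator and looks at the yielded bools.
found = any(word in dictionary[topic] for topic in dictionary)

# The list comprehension captures the matching keys themselves.
keys = [topic for topic in dictionary if "saw" in dictionary[topic]]

print(generator_is_truthy, found, keys)
```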
It sounds like the word you are searching for will be found in only one key. Correct?
If so, you can just iterate over the dictionary's key-value pairs until you find the key whose value contains the search word.
For Python 2:
found = False
for (topic, value) in dictionary.iteritems():
    if word in value:
        found = True
        print topic
        break
For Python 3, just replace iteritems() with items() and print topic with print(topic).
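In Python 3 the same early-exit search can be written with next() over a generator expression. This is a sketch; find_topic and the sample dictionary are made up for illustration.

```python
def find_topic(dictionary, word):
    """Return the first key whose value list contains word, or None."""
    return next(
        (topic for topic, words in dictionary.items() if word in words),
        None,  # default returned when no value contains word
    )

# Hypothetical sample data.
dictionary = {"fruit": ["apple", "pear"], "tools": ["hammer", "saw"]}
print(find_topic(dictionary, "saw"))
print(find_topic(dictionary, "banana"))
```

Like the loop with break, next() stops at the first match instead of scanning the whole dictionary.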
I have a list of filenames conforming to the pattern: s[num][alpha1][alpha2].ext
I need to sort, first by the number, then by alpha1, then by alpha2. The last two aren't alphabetical, however, but rather should reflect a custom ordering.
I've created two lists representing the ordering for alpha1 and alpha2, like so:
alpha1Order = ["Fizz", "Buzz", "Ipsum", "Dolor", "Lorem"]
alpha2Order = ["Sit", "Amet", "Test"]
What's the best way to proceed? My first thought was to tokenize (somehow) so that each filename is split into its component parts (s, num, alpha1, alpha2), then sort, but I wasn't quite sure how to perform such a complicated sort. Using a key function seemed clunky, as this sort didn't seem to lend itself to a simple ordering.
Once tokenized, your data is perfectly orderable with a key function. Just return the index of the alpha1Order and alpha2Order lists for the value. Replace them with dictionaries to make the lookup easier:
alpha1Order = {token: i for i, token in enumerate(alpha1Order)}
alpha2Order = {token: i for i, token in enumerate(alpha2Order)}
def keyfunction(filename):
    num, alpha1, alpha2 = tokenize(filename)
    return int(num), alpha1Order[alpha1], alpha2Order[alpha2]
This returns a tuple to sort on; Python will use the first value to sort on, ordering anything that has the same int(num) value by the second entry, using the 3rd to break any values tied on the first 2 entries.
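The tokenize step is left open above. One possible sketch, assuming the parts are simply concatenated with no separators (for example s2BuzzSit.txt) and that every alpha token comes from the two ordering lists, builds a regular expression from those lists. The sample filenames and the snake_case names are made up for illustration.

```python
import re

alpha1_order = ["Fizz", "Buzz", "Ipsum", "Dolor", "Lorem"]
alpha2_order = ["Sit", "Amet", "Test"]

# Map each token to its position in the custom ordering.
alpha1_rank = {token: i for i, token in enumerate(alpha1_order)}
alpha2_rank = {token: i for i, token in enumerate(alpha2_order)}

# Assumed filename layout: "s" + digits + alpha1 + alpha2 + "." + extension.
FILENAME_RE = re.compile(
    r"^s(\d+)(%s)(%s)\.\w+$"
    % ("|".join(map(re.escape, alpha1_order)),
       "|".join(map(re.escape, alpha2_order)))
)

def tokenize(filename):
    """Split a filename into (num, alpha1, alpha2) strings."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError("unexpected filename: %r" % filename)
    return match.groups()

def keyfunction(filename):
    num, alpha1, alpha2 = tokenize(filename)
    return int(num), alpha1_rank[alpha1], alpha2_rank[alpha2]

filenames = ["s10FizzSit.txt", "s2LoremAmet.txt", "s2BuzzTest.txt", "s2BuzzSit.txt"]
ordered = sorted(filenames, key=keyfunction)
print(ordered)
```

Converting num with int() makes s2 sort before s10, which a plain string comparison would get wrong.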
Trying to count the matches across all columns.
I currently use this code to copy across certain fields from a Scrapy item.
def getDbModel(self, item):
    deal = { "name":item['name'] }
    if 'imageURL' in item:
        deal["imageURL"] = item['imageURL']
    if 'highlights' in item:
        deal['highlights'] = replace_tags(item['highlights'], ' ')
    if 'fine_print' in item:
        deal['fine_print'] = replace_tags(item['fine_print'], ' ')
    if 'description' in item:
        deal['description'] = replace_tags(item['description'], ' ')
    if 'search_slug' in item:
        deal['search_slug'] = item['search_slug']
    if 'dealURL' in item:
        deal['dealurl'] = item['dealURL']
    # checkDB below relies on the returned dictionary
    return deal
Wondering how I would turn this into an OR search in mongodb.
I was looking at something like the below:
def checkDB(self,item):
    # Check if the record exists in the DB
    deal = self.getDbModel(item)
    return self.db.units.find_one({"$or":[deal]})
Firstly, is this the best way to do it?
Secondly, how would I count the number of columns matched? I am trying to limit the results to records that match at least two columns.
There is no easy way of counting the number of column matches on MongoDB's end; it just matches and then returns.
You would probably be better off doing this client side. I am unsure exactly how you intend to use this count, but there is no easy way of doing it, whether through MapReduce or the aggregation framework.
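A client-side count could look roughly like the sketch below. It assumes the candidate documents have already been fetched as plain dictionaries (for example from an $or query). The helper name and the sample data are made up for illustration.

```python
def count_matching_fields(candidate, deal):
    """Return how many fields of deal have an equal value in candidate."""
    return sum(
        1
        for field, value in deal.items()
        if field in candidate and candidate[field] == value
    )

# Hypothetical data standing in for a scraped deal and fetched documents.
deal = {"name": "Spa day", "search_slug": "spa-day", "dealurl": "http://example.com/a"}
candidates = [
    {"name": "Spa day", "search_slug": "spa-day", "dealurl": "http://example.com/b"},
    {"name": "Spa day", "search_slug": "other", "dealurl": "http://example.com/c"},
]

# Keep only the documents that agree with the deal on at least two fields.
strong_matches = [doc for doc in candidates if count_matching_fields(doc, deal) >= 2]
print(len(strong_matches))
```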
You could, in the aggregation framework, change your schema a little to put these columns within a properties field and then $sum the matches within the subdocument. This is a good approach since you can also sort on it to create a type of relevance search (if that is what you're intending).
Whether using $or is a good approach depends. When using an $or, MongoDB can use an index for each condition; this is a special case within MongoDB indexing. It does mean you should take this into consideration when writing an $or and ensure you have indexes to cover each condition.
You also have to consider that MongoDB will effectively evaluate each clause and then merge the results to remove duplicates, which can be heavy for bigger $ors or a large working set.
Of course, the format of your $or is wrong: you need an array of documents, one per field. At the moment you have a single array containing one document that holds all your attributes. When used like this, the attributes are implicitly joined by $and, so it won't work.
You could probably change your code to:
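def getDbModel(self, item):
    # Build one single-field document per attribute for use with $or
    deal = [{"name": item['name']}]
    if 'imageURL' in item:
        deal.append({"imageURL": item['imageURL']})
    if 'highlights' in item:
        deal.append({"highlights": replace_tags(item['highlights'], ' ')})
    # etc. for the remaining fields
    return deal

# Some way down, in checkDB
    deal = self.getDbModel(item)
    return self.db.units.find_one({"$or": deal})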
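If getDbModel keeps returning the flat dictionary from the question, the $or clauses can also be derived from it in one step. This is a sketch only. The helper name to_or_query is made up, and the result is a plain Python dictionary that would then be passed to find_one or find.

```python
def to_or_query(deal):
    """Turn a flat {field: value} dict into a MongoDB-style $or filter."""
    # One single-field document per attribute; $or needs a non-empty list.
    clauses = [{field: value} for field, value in deal.items()]
    if not clauses:
        raise ValueError("cannot build an $or query from an empty deal")
    return {"$or": clauses}

# Hypothetical sample data.
query = to_or_query({"name": "Spa day", "imageURL": "http://example.com/a.png"})
print(query)
```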
NB: I am not a Python programmer
Hope it helps,