Get 10 random elements in Django template [duplicate]

How do I get two distinct random records using Django? I've seen questions about how to get one but I need to get two random records and they must differ.

The order_by('?')[:2] solution suggested by other answers is actually an extraordinarily bad thing to do for tables that have large numbers of rows. It results in an ORDER BY RAND() SQL query. As an example, here's how mysql handles that (the situation is not much different for other databases). Imagine your table has one billion rows:
To accomplish ORDER BY RAND(), it needs a RAND() column to sort on.
To do that, it needs a new table (the existing table has no such column).
To do that, mysql creates a new, temporary table with the new columns and copies the existing ONE BILLION ROWS OF DATA into it.
As it does so, it does as you asked, and runs rand() for every row to fill in that value. Yes, you've instructed mysql to GENERATE ONE BILLION RANDOM NUMBERS. That takes a while. :)
A few hours/days later, when it's done, it now has to sort it. Yes, you've instructed mysql to SORT THIS ONE-BILLION-ROW, WORST-CASE-ORDERED TABLE (worst-case because the sort key is random).
A few days/weeks later, when that's done, it faithfully grabs the two measly rows you actually needed and returns them for you. Nice job. ;)
Note: just for a little extra gravy, be aware that mysql will initially try to create that temp table in RAM. When that's exhausted, it puts everything on hold to copy the whole thing to disk, so you get that extra knife-twist of an I/O bottleneck for nearly the entire process.
Doubters should look at the generated query to confirm that it's ORDER BY RAND() then Google for "order by rand()" (with the quotes).
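A quick way to confirm this from a Django shell (a sketch; the exact SQL text depends on your backend):
print(MyModel.objects.order_by('?')[:2].query)
# on MySQL this renders with ORDER BY RAND(); on SQLite/Postgres, ORDER BY RANDOM()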
A much better solution is to trade that one really expensive query for three cheap ones (limit/offset instead of ORDER BY RAND()):
import random
last = MyModel.objects.count() - 1
index1 = random.randint(0, last)
# Here's one simple way to keep an even distribution for
# index2 while still guaranteeing it won't match index1.
index2 = random.randint(0, last - 1)
if index2 == index1: index2 = last
# Indexing a queryset this way generates "LIMIT 1 OFFSET indexN"
# queries, so each returns a single record with no extraneous data.
MyObj1 = MyModel.objects.all()[index1]
MyObj2 = MyModel.objects.all()[index2]

If you specify the random operator in the ORM, I'm pretty sure it will give you two distinct random results, won't it?
MyModel.objects.order_by('?')[:2] # 2 random results.

For future readers.
Get the list of ids of all records:
my_ids = MyModel.objects.values_list('id', flat=True)
my_ids = list(my_ids)
Then pick n random ids from all of the above ids:
import random

n = 2
rand_ids = random.sample(my_ids, n)
And get records for these ids:
random_records = MyModel.objects.filter(id__in=rand_ids)
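Because the ids are read from the table first, this works even when the primary key sequence has gaps from deleted rows; the tradeoffs are that every id is loaded into memory and that filter(id__in=...) returns the rows in database order, not in the sampled order.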

Object.objects.order_by('?')[:2]
This would return two records in random order. You can add
distinct()
if there are records with the same value in your dataset.

To sample n random values from a sequence, the random lib can be used:
random.Random().sample(range(0, last), 2)
will pick 2 random samples from among the sequence elements 0 to last-1.

from django.db import models
from random import randint
from django.db.models.aggregates import Count

class ProductManager(models.Manager):
    def random(self, count=5):
        # pick a random offset, leaving room for `count` rows after it
        index = randint(0, self.aggregate(count=Count('id'))['count'] - count)
        return self.all()[index:index + count]
You can ask for a different number of objects by passing count.
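A hedged usage sketch (the Product model body is assumed for illustration, not given in the answer):
class Product(models.Model):
    name = models.CharField(max_length=100)

    objects = ProductManager()  # attach the custom manager

# Product.objects.random()   -> 5 consecutive products from a random offset
# Product.objects.random(3)  -> 3 consecutive products
Note that this returns count consecutive rows starting at a random offset, which is cheap but not the same as count independently random rows.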

class ModelName(models.Model):
    # Define model fields etc

    @classmethod
    def get_random(cls, n=2):
        """Returns a number of random objects. Pass number when calling"""
        import random
        n = int(n)  # Number of objects to return
        last = cls.objects.count() - 1
        # sample over 0..last inclusive so the final record can be picked too
        selection = random.sample(range(0, last + 1), n)
        selected_objects = []
        for each in selection:
            selected_objects.append(cls.objects.all()[each])
        return selected_objects
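Usage would then look like:
random_pair = ModelName.get_random()   # 2 objects by default
random_trio = ModelName.get_random(3)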

Related

Keeping count of values available from among multiple sets

I have the following situation:
I am generating n combinations of size 3, made from n values. Each kth combination [0...n] is pulled from a pool of values located at the kth index of a list of n sets. Each value can appear 3 times. So if I have 10 values, then I have a list of size 10, and each index holds a set of the values 0-9.
So, it seems to me that a good way to do this is to have something keeping count of all the available values from among all the sets. Then, if a value is rare (let's say there is only 1 left), and I had a structure where I could look up the rarest value and have the structure tell me which index it was located in, generating the possible combinations would be much easier.
How could I do this? What about one structure to keep count of elements, and a dictionary to keep track of list indices that contain the value?
edit: I guess I should add that a specific problem I am looking to solve here is how to update the set at every index of the list (or whatever other structures I end up using), so that when I use a value 3 times, it is made unavailable for every other combination.
Thank you.
Another edit
It seems that this may be a little too abstract to be asking for solutions when it's hard to understand what I am even asking for. I will come back with some code soon, please check back in 1.5-2 hours if you are interested.
how to update the set for every index of the list (or whatever other structures i end up using), so that when I use a value 3 times, it is made unavailable for every other combination.
I assume you want to sample the values truly randomly, right? What if you put 3 of each value into a list, shuffle it with random.shuffle, and then just keep popping values from the end of the list when you're building your combination? If I'm understanding your problem right, here's example code:
from random import shuffle

valid_values = [i for i in range(10)]  # the valid values are 0 through 9 in my example, update accordingly for yours
vals = 3 * valid_values  # I have 3 of each valid value
shuffle(vals)  # randomly shuffle them
while len(vals) != 0:
    combination = (vals.pop(), vals.pop(), vals.pop())  # combinations are 3 values?
    print(combination)
EDIT: Updated code based on the added information that you have sets of values (but this still assumes you can use more than one value from a given set):
from random import shuffle

my_sets_of_vals = [......]  # list of sets
valid_values = list()
for i in range(len(my_sets_of_vals)):
    for val in my_sets_of_vals[i]:
        valid_values.append((i, val))  # this can probably be done in a list comprehension
vals = 3 * valid_values  # I have 3 of each valid value
shuffle(vals)  # randomly shuffle them
while len(vals) != 0:
    combination = (vals.pop()[1], vals.pop()[1], vals.pop()[1])  # combinations are 3 values?
    print(combination)
Based on the edit, you could make an object for each value. It could hold the number of times you have used the element and the element itself. When you find you have used an element three times, remove it from the list.
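A minimal sketch of that idea (the class and names are illustrative, not from the answer):
class CountedValue(object):
    def __init__(self, value, uses_left=3):
        self.value = value
        self.uses_left = uses_left

    def use(self):
        # record one use; the caller drops this object once uses_left hits 0
        self.uses_left -= 1
        return self.value

available = [CountedValue(v) for v in range(10)]
# once an item's uses_left reaches 0, remove it from `available` so no
# other combination can draw it again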

How to get many numbers of random records from database?

I know this question has been asked many times, but most of the answers use this method: MyModel.objects.order_by('?')[:4] which will kill the database on the second day of production.
I'm looking for a faster and more lightweight solution. I'm using the one below for now, but I would like to get more than 1 random record (4 for example).
views.py
last = MyModel.objects.count() - 1
random_int = random.randint(0, last)
records = MyModel.objects.all()[random_int]  # one record only
Any solution?
As long as you need a reasonable number of elements (such as 4), I'd sample based on the queryset index rather than the id. The drawback is that you need 4 queries instead of 1, but this way you get a consistent performance that doesn't depend on the number and size of gaps in between your primary key values.
count = MyModel.objects.count()
sample = random.sample(range(count), 4)
records = [MyModel.objects.all()[i] for i in sample]
Indexing a queryset uses LIMIT and OFFSET, so the indexing is based on the number of rows in the database, not on the id.
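Wrapped as a small reusable helper under the same approach (a sketch; the names are illustrative):
import random

def random_records(qs, n):
    # issues n separate LIMIT 1 / OFFSET i queries, one per sampled index
    count = qs.count()
    return [qs[i] for i in random.sample(range(count), n)]

picks = random_records(MyModel.objects.all(), 4)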
Alternatively, use the in keyword and generate a random sample of ids:
random_num_sample = random.sample(range(0, last), 4)
records = MyModel.objects.filter(id__in=random_num_sample)
If the database gets frequent deletes then this solution is not viable, because sampled ids may no longer exist. If deletes are only moderate, you can still use it by sampling more than 4 ids and keeping the first 4 records returned.
You can also use the in keyword with randint:
random_num = [random.randint(0, last) for i in range(4)]
Then use
queryset_obj = MyModel.objects.filter(id__in=random_num)
Note that randint can repeat values (and some ids may not exist), so this can return fewer than 4 rows.

Evenly Distributed Slice of Django QuerySet (or Python list)

I'd like to get some values from a database and slice "n" of them, say to draw on a graph.
I'm starting with a Django QuerySet, but I don't think this is something I can reduce just via the ORM, so I'm fine with getting a list of values and then using whatever libs in Python are available to get a sample of the data.
So if I have a dataset of a thousand items, I'd like to be able to grab an evenly distributed range of non-random samples from that dataset, always including the start and end points and then evenly distributing the elements in between.
For instance, if:
data = [x for x in xrange(0, 777)]
and I wanted ten of them, how would I get a list back not of every 10th item, but exactly ten list items distributed evenly over the total number of elements in the list?
I'm trying:
number_of_results = 10
step = len(data) / number_of_results
data[::step]
But I'm hoping there's a more efficient way (and also a way that keeps the end point and returns exactly number_of_results items, even if the step between the items can't be exactly even).
There might be more efficient ways of doing it, but this is just a thought:
qs_ids = list(ModelA.objects.order_by('id').values_list('id', flat=True))
number_of_results = 10
step = len(qs_ids) / number_of_results
ids = qs_ids[::step]
qs = ModelA.objects.filter(id__in=ids).order_by('id')
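For the exact-count requirement (always number_of_results items, endpoints included), here is a sketch independent of the ORM, using the data list from the question:
def evenly_spaced(seq, n):
    """Return exactly n items from seq, always including both endpoints."""
    if n >= len(seq):
        return list(seq)
    step = (len(seq) - 1) / float(n - 1)
    return [seq[int(round(i * step))] for i in range(n)]

evenly_spaced(data, 10)  # exactly 10 items; first is data[0], last is data[776]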

Program running too slow! : Suggest Algorithmic/Implementation Optimization

I have a huge python list (A) of lists. The length of list A is around 90,000. Each inner list contains around 700 tuples of (datetime.date, string). Now, I am analyzing this data. What I am doing is taking a window of size x in the inner lists, where x = len(inner list) * (some fraction <= 1), and I am saving each ordered pair (a,b) where a occurs before b in that window (the inner lists are sorted with respect to time). I am moving this window up to the last element, adding one element at a time at one end and removing one from the other, which takes O(window-size) time since I only consider the new tuples. My code:
for i in xrange(window_size):
    j = i + 1
    while j < window_size:
        check_and_update(cur, my_list[i][1], my_list[j][1], log)
        j = j + 1

i = 1
while i <= len(my_list) - window_size:
    j = i
    k = i + window_size - 1
    while j < k:
        check_and_update(cur, my_list[j][1], my_list[k][1], log)
        j += 1
    i += 1
Here cur is actually a sqlite3 database cursor, my_list is a list containing the tuples, and I iterate this code for all the lists in A; log is an opened logfile. In the method check_and_update() I look up my database to find the tuple if it exists, or else I insert it, along with its total number of occurrences so far. Code:
def check_and_update(cur, start, end, log):
    t = str(start) + ":" + str(end)
    cur.execute("INSERT OR REPLACE INTO Extra (tuple, count) "
                "VALUES (?, coalesce((SELECT count + 1 FROM Extra WHERE tuple = ?), 1))",
                [t, t])
As expected this number of tuples is HUGE and I have previously experimented with a dictionary, which eats up memory quite fast. So I resorted to SQLite3, but now it is too slow. I have tried indexing, but with no help. Probably my program is spending way too much time querying and updating the database. Do you have any optimization ideas for this problem? Probably changing the algorithm or some different approach/tools. Thank you!
Edit: My goal here is to find the total number of tuples of strings that occur within the window, grouped by the number of different inner lists they occur in. I extract this information with this query:
for i in range(1, size + 1):
    cur.execute('select * from Extra where count = ?', (i,))  # bind the integer as a 1-tuple, not str(i)
    #other stuff
For example (I am ignoring the date entries and will write them as 'dt'):
My_list = [
    [ (dt, 'user1'), (dt, 'user2'), (dt, 'user3') ],
    [ (dt, 'user3'), (dt, 'user4') ],
    [ (dt, 'user2'), (dt, 'user3'), (dt, 'user1') ]
]
Here, if I take fraction = 1, the results are:
only 1 occurrence in window: 5 tuples (user 1-2, 1-3, 3-4, 2-1, 3-1)
only 2 occurrences in window: 1 tuple (user 2-3)
Let me get this straight.
You have up to about 22 billion potential tuples (for 90,000 lists, any of 700 entries paired with any of the following entries, 350 on average), which might be fewer depending on the window size. You want to find how many tuples there are, grouped by the number of inner lists that they appear in.
Data of this size has to live on disk. The rule for data that lives on disk due to size is, "Never randomly access, instead generate and then sort."
So I'd suggest that you write out each tuple to a log file, one tuple per line. Sort that file. Now all instances of any given tuple are in one place. Then run through the file, and for each tuple emit the count of how many times it appears in (that is how many inner lists it is in). Sort that second file. Now run through that file, and you can extract how many tuples appeared 1x, 2x, 3x, etc.
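A minimal sketch of that pipeline, assuming an input file with one "tuple<TAB>inner_list_id" line per occurrence, already de-duplicated and pre-sorted (for example with the Unix sort utility); the file layout is illustrative, not from the answer:
from itertools import groupby

def count_lists_per_tuple(sorted_path, out_path):
    # because the file is sorted, all lines for a given tuple are adjacent,
    # so one sequential pass suffices
    with open(sorted_path) as src, open(out_path, 'w') as dst:
        keys = (line.split('\t')[0] for line in src)
        for key, group in groupby(keys):
            dst.write('%s\t%d\n' % (key, sum(1 for _ in group)))
Sorting the counts file and running the same kind of pass again yields how many tuples appeared in 1, 2, 3, ... inner lists.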
If you have multiple machines, it is easy to convert this into a MapReduce. (Which is morally the same approach, but you get to parallelize a lot of stuff.)
Apache Hadoop is one of the MapReduce implementations that is suited for this kind of problem.

3 questions about appengine indexes

I got the following situation
class M(db.Model):
    a = db.ReferenceProperty(A)
    x = db.ReferenceProperty(X)
    y = db.ReferenceProperty(Y)
    z = db.ReferenceProperty(Z)
    items = db.StringListProperty()
    date = db.DateTimeProperty()
I want to make queries that filter on (a), (x, y or z) and (items), ordered by date, i.e.
mm = M.all().filter('a =', a1).filter('x =', x1).filter('items =', i).order('-date')
There will never be a query with filter on x and y at the same time, for example.
So, my questions are:
1) How many (and which) indexes should I create?
2) How many 'strings' can I add on items? (I'd like to add in the order of thousands)
3) How many index records will I have on a single "M" if there are 1000 items?
I don't quite understand this index stuff yet, and it is killing me. Your help will be very appreciated :)
This article explains indexes/exploding indexes quite well, and it actually fits your example: https://developers.google.com/appengine/docs/python/datastore/queries#Big_Entities_and_Exploding_Indexes
Your biggest issue will be the fact that you will probably run into the 5000 index entries per entity limit with thousands of items. If you take an index for a, x, items (1000 items), date: |a| * |x| * |items| * |date| == 1 * 1 * 1000 * 1 == 1000.
If you have 5001 entries in items, the put() will fail with the appropriate exception.
From the example you provided, whether you filter on x, y or anything else seems irrelevant, as there is only 1 of that property, and therefore you do not run the chance of an exploding index. 1*1 == 1.
Now, if you had two list properties, you would want to make sure that they are indexed separately, otherwise you'd get an exploding index. For example, if you had 2 list properties with 100 items each, that would produce 100*100 indexes, unless you split them, which would then result in only 200 (assuming all other properties were non-lists).
For the criteria you have given, you only need to create three compound indexes: (a, x, items, -date), (a, y, items, -date), (a, z, items, -date). Note that a list property creates an index entry for each value in the list.
There is a limit of 5000 index entries per entity in total. If you only have three compound indexes, then it's 5000/3 = 1666 (capped at 1000 for a single list property).
In the case of three compound indexes only, 3*1000 = 3000.
NOTE: the above assumes you do not have built-in indexes per property (= properties are saved as unindexed). Otherwise you need to account for built-in indexes at 2N, where N is the number of single properties (2 is for asc, desc). In your case this would be 2*(5 + no_items), since items is a list property and every entry creates an index entry.
See also https://developers.google.com/appengine/articles/indexselection, which describes App Engine's (relatively recent) improved query planning capabilities. Basically, you can reduce the number of index entries needed to: (number of filters + 1) * (number of orders).
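For the example query above (three equality filters plus the date sort), that formula works out to (3 + 1) * 1 = 4 index entries.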
Though, as the article discusses, there can be reasons that you might still use compound indexes-- essentially, there is a time/space tradeoff.
