Django get a random object - python

I am trying to get a random object from a model A
For now, it is working well with this code:
random_idx = random.randint(0, A.objects.count() - 1)
random_object = A.objects.all()[random_idx]
But I feel this code is better:
random_object = A.objects.order_by('?')[0]
Which one is better? Could the first version have a problem with deleted objects? For example, I can have 10 objects, but the object with id 10 may no longer exist. Have I misunderstood something about A.objects.all()[random_idx]?

Just been looking at this. The line:
random_object = A.objects.order_by('?')[0]
has reportedly brought down many servers.
Unfortunately, Erwan's code caused an error when accessing non-sequential ids.
There is another short way to do this:
import random
items = list(Product.objects.all())
# change 3 to how many random items you want
random_items = random.sample(items, 3)
# if you want only a single random item
random_item = random.choice(items)
The good thing about this is that it handles non-sequential ids without error.

Improving on all of the above:
from random import choice
pks = A.objects.values_list('pk', flat=True)
random_pk = choice(pks)
random_obj = A.objects.get(pk=random_pk)
We first get a list of potential primary keys without loading any Django object, then we randomly choose one primary key, and then we load the chosen object only.

The second bit of code is correct, but can be slower, because in SQL that generates an ORDER BY RANDOM() clause that shuffles the entire set of results, and then takes a LIMIT based on that.
The first bit of code still has to evaluate the entire set of results. E.g., what if your random_idx is near the last possible index?
A better approach is to pick a random ID from your database, and choose that (which is a primary key lookup, so it's fast). We can't assume that our every id between 1 and MAX(id) is available, in the case that you've deleted something. So following is an approximation that works out well:
import random
# grab the max id in the database
max_id = A.objects.order_by('-id')[0].id
# grab a random possible id. we don't know if this id does exist in the database, though
# (randint is inclusive on both ends, so max_id itself is the upper bound)
random_id = random.randint(1, max_id)
# return an object with that id, or the first object with an id greater than that one
# this is a fast lookup, because your primary key is indexed.
random_object = A.objects.filter(id__gte=random_id).order_by('id')[0]
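To see why this is only an approximation, here is a minimal sketch in plain Python (no Django involved). The sorted list `existing_ids` and the helper `pick_gte` are hypothetical stand-ins for the table and for the `id__gte` lookup; `bisect` plays the role of the index.

```python
import bisect
import random
from collections import Counter

def pick_gte(sorted_ids, rng=random):
    """Mimic 'first row with id >= random_id' on a sorted list of ids."""
    max_id = sorted_ids[-1]
    random_id = rng.randint(1, max_id)  # inclusive on both ends
    # Index of the first existing id that is >= random_id.
    return sorted_ids[bisect.bisect_left(sorted_ids, random_id)]

# Hypothetical table where ids 3..9 were deleted.
existing_ids = [1, 2, 10]
rng = random.Random(0)
counts = Counter(pick_gte(existing_ids, rng) for _ in range(10_000))
```

In this toy table, id 10 is returned whenever the random id falls anywhere in 3..10, so it is picked far more often than ids 1 or 2. The lookup is fast, but rows that follow a gap are over-represented.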

How about calculating maximal primary key and getting random pk?
The book ‘Django ORM Cookbook’ compares execution time of the following functions to get random object from a given model.
import random
from django.db.models import Max
from myapp.models import Category
def get_random():
    return Category.objects.order_by("?").first()
def get_random3():
    max_id = Category.objects.all().aggregate(max_id=Max("id"))['max_id']
    while True:
        pk = random.randint(1, max_id)
        category = Category.objects.filter(pk=pk).first()
        if category:
            return category
Test was made on a million DB entries:
In [14]: timeit.timeit(get_random3, number=100)
Out[14]: 0.20055226399563253
In [15]: timeit.timeit(get_random, number=100)
Out[15]: 56.92513192095794
See source.
After seeing those results I started using the following snippet:
from django.db.models import Max
import random
def get_random_obj_from_queryset(queryset):
    max_pk = queryset.aggregate(max_pk=Max("pk"))['max_pk']
    while True:
        obj = queryset.filter(pk=random.randint(1, max_pk)).first()
        if obj:
            return obj
So far it has done the job, as long as the model has an integer id.
Note that get_random3 (and get_random_obj_from_queryset) won't work if you replace the model id with a uuid or something else, and it fails on an empty queryset, where max_pk is None. Also, if too many instances were deleted, the while loop will slow the process down.
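The effect of deleted rows on the retry loop can be sketched without a database. In this hedged example the set `existing` stands in for the table, and `pick_by_rejection` with its `max_attempts` cap is a hypothetical helper, not part of the snippet above.

```python
import random

def pick_by_rejection(existing_ids, max_id, rng=random, max_attempts=1000):
    """Draw ids in 1..max_id until one exists; give up after max_attempts draws."""
    for attempt in range(1, max_attempts + 1):
        candidate = rng.randint(1, max_id)
        if candidate in existing_ids:  # stands in for queryset.filter(pk=...).first()
            return candidate, attempt
    raise LookupError("no existing id found within max_attempts draws")

# Hypothetical table: 1000 ids were issued, only every 10th row survives.
existing = set(range(10, 1001, 10))
rng = random.Random(42)
picked, attempts = pick_by_rejection(existing, max_id=1000, rng=rng)
```

Each draw succeeds with probability (number of rows) / max_id, so on average the loop needs about max_id / (number of rows) draws, roughly 10 in this toy case. Unlike the "first id greater than or equal" approach, every surviving row is equally likely. Capping the attempts avoids an endless loop on a nearly empty table.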

Yet another way:
from random import randint
pks = A.objects.values_list('pk', flat=True)
random_idx = randint(0, len(pks)-1)
random_obj = A.objects.get(pk=pks[random_idx])
Works even if there are larger gaps in the pks, for example if you want to filter the queryset before picking one of the remaining objects at random.
EDIT: fixed call of randint (thanks to @Quique). The stop arg is inclusive.
https://docs.python.org/3/library/random.html#random.randint

I'm sharing my latest test result with Django 2.1.7, PostgreSQL 10.
students = Student.objects.all()
for i in range(500):
student = random.choice(students)
print(student)
# 0.021996498107910156 seconds
for i in range(500):
student = Student.objects.order_by('?')[0]
print(student)
# 0.41299867630004883 seconds
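It seems that random fetching with random.choice() is roughly 19x faster here (0.022 s vs. 0.413 s). Note that the first loop evaluates the queryset once and reuses the cached results, while the second issues a new query on every iteration.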

In Python, to get a random member of a sequence such as a list or tuple, you can use the random module.
The random module has a function named choice, which takes a sequence and returns one of its members at random.
Because random.choice only needs an object that supports len() and indexing, you can also use it on a Django queryset.
first import the random module:
import random
then create a list:
my_iterable_object = [1, 2, 3, 4, 5, 6]
or create a query_set like this:
my_iterable_object = mymodel.objects.filter(name='django')
and for getting a random member of your iterable object use choice method:
random_member = random.choice(my_iterable_object)
print(random_member) # my_iterable_object is [1, 2, 3, 4, 5, 6]
3
full code:
import random
my_list = [1, 2, 3, 4, 5, 6]
random.choice(my_list)
2

import random
def get_random_obj(model, length=-1):
    if length == -1:
        length = model.objects.count()
    return model.objects.all()[random.randint(0, length - 1)]
# to use this function
random_obj = get_random_obj(A)

Related

Start a dictionary for loop at a specific key value

Here is the code:
EDIT**** Please no more "it's not possible with unordered dictionary replies". I pretty much already know that. I made this post on the off-chance that it MIGHT be possible or someone has a workable idea.
# position equals some set of two dimensional coords
for name in self.regions["regions"]:  # I want to start the iteration with 'last_region'
    # I don't want to run these next two lines over every dictionary key each time since the likelihood is that the new
    # position is still within the last region that was matched.
    rect = (self.regions["regions"][name]["pos1"], self.regions["regions"][name]["pos2"])
    if all(self.point_inside(rect, position)):
        # record the name of this region in variable- 'last_region' so I can start with it on the next search...
        # other code I want to run when I get a match
        return
return  # if code gets here, the points were not inside any of the named regions
Hopefully the comments in the code explain my situation well enough. Let's say I was last inside region "delta" (i.e., the key name is delta, and the value is the set of coordinates defining its boundaries) and I have 500 more regions. The first time I find myself in region delta, the code may not have discovered this until, let's say (hypothetically), the 389th iteration, so it made 388 all(self.point_inside(rect, position)) calculations before it found that out. Since I will probably still be in delta the next time it runs (but I must verify that each time the code runs), it would be helpful if the key "delta" was the first one that got checked by the for loop.
This particular code can be running many times a second for many different users.. so speed is critical. The design is such that very often, the user will not be in a region and all 500 records may need to be cycled through and will exit the loop with no matches, but I would like to speed the overall program up by speeding it up for those that are presently in one of the regions.
I don't want an additional overhead of sorting the dictionary in any particular order, etc.. I just want it to start looking with the last one that it successfully matched all(self.point_inside(rect, position))
Maybe this will help a bit more.. The following is the dictionary I am using (only the first record shown) so you can see the structure I coded to above... and yes, despite the name "rect" in the code, it actually checks for the point in a cubical region.
{"regions": {"shop": {"flgs": {"breakprot": true, "placeprot": true}, "dim": 0, "placeplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}, "breakplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}, "protected": true, "banplayers": {}, "pos1": [5120025, 60, 5120208], "pos2": [5120062, 73, 5120257], "ownerUuid": "4f953255-6775-4dc6-a612-fb4230588eff", "accessplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}}, more, more, more...}
You may try to implement some caching mechanism within a custom subclass of dict.
You could set a self._cache = None in __init__, add a method like set_cache(self, key) to set the cache and finally overriding __iter__ to yield self._cache before calling the default __iter__.
However, that can be kinda cumbersome, if you consider this stackoverflow answer and also this one.
For what it's written in your question, I would try, instead, to implement this caching logic in your code.
def _match_region(self, name, position):
    rect = (self.regions["regions"][name]["pos1"], self.regions["regions"][name]["pos2"])
    return all(self.point_inside(rect, position))
if self.last_region and self._match_region(self.last_region, position):
    self.code_to_run_when_match(position)
    return
for name in self.regions["regions"]:
    if self._match_region(name, position):
        self.last_region = name
        self.code_to_run_when_match(position)
        return
return  # if code gets here, the points were not inside any of the named regions
That is right, dictionary is an unordered type. Therefore OrderedDict won't help you much for what you want to do.
You could store the last region in your class. Then, on the next call, check whether the last region still matches before checking the entire dictionary.
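A minimal runnable sketch of that idea follows. The class `RegionIndex`, its `checks` counter, and the two sample regions are hypothetical, and the containment test simply assumes axis-aligned boxes given by `pos1` and `pos2`, as in the question's data.

```python
class RegionIndex:
    """Look up the region containing a point, trying the last match first."""

    def __init__(self, regions):
        # regions maps name -> {"pos1": [x, y, z], "pos2": [x, y, z]}
        self.regions = regions
        self.last_region = None
        self.checks = 0  # counts containment tests, for illustration only

    def _contains(self, name, position):
        self.checks += 1
        region = self.regions[name]
        return all(
            min(a, b) <= p <= max(a, b)
            for a, b, p in zip(region["pos1"], region["pos2"], position)
        )

    def find(self, position):
        # Fast path: the previous match is the most likely candidate.
        if self.last_region is not None and self._contains(self.last_region, position):
            return self.last_region
        for name in self.regions:
            if name != self.last_region and self._contains(name, position):
                self.last_region = name
                return name
        return None  # not inside any named region


regions = {
    "alpha": {"pos1": [0, 0, 0], "pos2": [10, 10, 10]},
    "delta": {"pos1": [100, 0, 100], "pos2": [120, 20, 120]},
}
index = RegionIndex(regions)
first = index.find((105, 5, 110))   # scans alpha, then delta
checks_after_first = index.checks
second = index.find((106, 5, 110))  # only re-checks delta
```

The dictionary itself is never reordered. The only extra state is one key name, and a miss on the cached region costs a single additional containment test before the normal scan.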
Instead of a for-loop, you could use iterators directly. Here's an example function that does something similar to what you want, using iterators:
def iterate(what, iterator):
    iterator = iterator or what.iteritems()
    try:
        while True:
            k,v = iterator.next()
            print "Trying k = ", k
            if v > 100:
                return iterator
    except StopIteration:
        return None
Instead of storing the name of the region in last_region, you would store the result of this function, which is like a "pointer" to where you left off. Then, you can use the function like this (shown as if run in the Python interactive interpreter, including the output):
>>> x = {'a':12, 'b': 42, 'c':182, 'd': 9, 'e':12}
>>> last_region = None
>>> last_region = iterate(x, last_region)
Trying k = a
Trying k = c
>>> last_region = iterate(x, last_region)
Trying k = b
Trying k = e
Trying k = d
Thus, you can easily resume from where you left off, but there's one additional caveat to be aware of:
>>> last_region = iterate(x, last_region)
Trying k = a
Trying k = c
>>> x['z'] = 45
>>> last_region = iterate(x, last_region)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in iterate
RuntimeError: dictionary changed size during iteration
As you can see, it'll raise an error if you ever add a new key. So, if you use this method, you'll need to be sure to set last_region = None any time you add a new region to the dictionary.
TigerhawkT3 is right. Dicts are unordered in the sense that there is no guaranteed order of keys in a given dictionary. You can even get a different order of keys when you iterate over the same dictionary. If you want order, you need to use either an OrderedDict or just a plain list. You can convert your dict to a list and sort it so that it represents the order you need.
Without knowing what your objects are and whether self in the example is a user instance or an environment instance it is hard to come up with a solution. But if self in the example is the environment, its Class could have a class attribute that is a dictionary of all current users and their last known position, if the user instance is hashable.
Something like this
class Thing(object):
    __user_regions = {}
    def where_ami(self, user):
        try:
            region = self.__user_regions[user]
            print 'AHA!! I know where you are!!'
        except KeyError:
            # find region
            print 'Hmmmm. let me think about that'
            region = 'foo'
            self.__user_regions[user] = region
class User(object):
    def __init__(self, position):
        self.pos = position
thing = Thing()
thing2 = Thing()
u = User((1,2))
v = User((3,4))
Now you can try to retrieve the user's region from the class attribute. If there is more than one Thing they would share that class attribute.
>>>
>>> thing._Thing__user_regions
{}
>>> thing2._Thing__user_regions
{}
>>>
>>> thing.where_ami(u)
Hmmmm. let me think about that
>>>
>>> thing._Thing__user_regions
{<__main__.User object at 0x0433E2B0>: 'foo'}
>>> thing2._Thing__user_regions
{<__main__.User object at 0x0433E2B0>: 'foo'}
>>>
>>> thing2.where_ami(v)
Hmmmm. let me think about that
>>>
>>> thing._Thing__user_regions
{<__main__.User object at 0x0433EA90>: 'foo', <__main__.User object at 0x0433E2B0>: 'foo'}
>>> thing2._Thing__user_regions
{<__main__.User object at 0x0433EA90>: 'foo', <__main__.User object at 0x0433E2B0>: 'foo'}
>>>
>>> thing.where_ami(u)
AHA!! I know where you are!!
>>>
You say that you "don't want an additional overhead of sorting the dictionary in any particular order". What overhead? Presumably OrderedDict uses some additional data structure internally to keep track of the order of keys. But unless you know that this is costing you too much memory, then OrderedDict is your solution. That means profiling your code and making sure that an OrderedDict is the source of your bottleneck.
If you want the cleanest code, just use an OrderedDict. It has a move_to_end method which takes a key and moves it either to the front of the dictionary or to the end. For example:
from collections import OrderedDict
animals = OrderedDict([('cat', 1), ('dog', 2), ('turtle', 3), ('lizard', 4)])
def check_if_turtle(animals):
    for animal in animals:
        print('Checking %s...' % animal)
        if animal == 'turtle':
            animals.move_to_end('turtle', last=False)
            return True
    else:
        return False
Our check_if_turtle function looks through an OrderedDict for a turtle key. If it doesn't find it, it returns False. If it does find it, it returns True, but not before moving the turtle key to the beginning of the OrderedDict.
Let's try it. On the first run:
>>> check_if_turtle(animals)
Checking cat...
Checking dog...
Checking turtle...
True
we see that it checked all of the keys up to turtle. Now, if we run it again:
>>> check_if_turtle(animals)
Checking turtle...
True
we see that it checked the turtle key first.

how to get some fields list by sqlalchemy?

This is how I get all of the field topicid values in Topics table.
all_topicid = [i.topicid for i in session.query(Topics)]
But when Topics table have lots of values, the vps killed this process. So is there some good method to resolve this?
Thanks everyone. I edit my code again, My code is below:
last = session.query(Topics).order_by('-topicid')[0].topicid
all_topicid = [i.topicid for i in session.query(Topics.topicid)]
all_id = range(1, last+1)
diff = list(set(all_id).difference(set(all_topicid)))
I want to get diff. Now it is faster than before. So are there other method to improve this code?
you could try by changing your query to return a list of id's with something like:
all_topic_id = session.query(Topics.topicid).all()
if the table contains duplicate topicid's you could add distinct to the above to return unique values
from sqlalchemy import distinct
all_topic_id = session.query(distinct(Topics.topicid)).all()
if this still causes an issue I would probably go for writing a stored procedure that returns the list of topicid's and have sqlalchemy call it.
for the second part I would do something like the below.
from sqlalchemy import distinct, func
all_topic_ids = [row[0] for row in session.query(distinct(Topics.topicid))]  # gets all ids as plain values
max_id = session.query(func.max(Topics.topicid)).scalar()  # gets the last id
all_ids = range(1, max_id + 1)  # creates the range of all possible ids
missing_ids = list(set(all_ids) - set(all_topic_ids))  # creates a list of missing ids
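If memory is the concern, the gaps can also be found without building two large sets. The sketch below is plain Python and assumes the ids arrive in ascending order, for example from a query ordered by Topics.topicid. The generator name `missing_ids` and the sample ids are illustrative only.

```python
def missing_ids(sorted_ids, start=1):
    """Yield every integer >= start that is absent from an ascending id stream."""
    expected = start
    for current in sorted_ids:
        # Everything between the last seen id and this one is a gap.
        while expected < current:
            yield expected
            expected += 1
        expected = current + 1

# Hypothetical ids as they might come back from an ordered query.
gaps = list(missing_ids(iter([1, 2, 5, 6, 9])))
```

Here `gaps` is `[3, 4, 7, 8]`. Only one id from the stream is held at a time, so memory use depends on the number of gaps rather than on the size of the table.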

Updating dictionary with randint performing unexpectedly

I'm trying to run a simple program in which I'm trying to run random.randint() in a loop to update a dictionary value but it seems to be working incorrectly. It always seems to be generating the same value.
The program so far is given below. I'm trying to create a uniformly distributed population, but I'm unsure why this isn't working.
import random
__author__ = 'navin'
namelist={
    "person1":{"age":23,"region":1},
    "person2":{"age":24,"region":2},
    "person3":{"age":25,"region":0}
}
def testfunction():
    default_val={"age":23,"region":1}
    for i in xrange(100):
        namelist[i]=default_val
    for index in namelist:
        x = random.randint(0, 2)
        namelist[index]['region']=x
    print namelist
if __name__ == "__main__" :
    testfunction()
I'm expecting the 103 people to be roughly uniformly distributed across region 0-2, but I'm getting everyone in region 0.
Any idea why this is happening? Have I incorrectly used randint?
It is because all your 100 dictionary entries created in the for loop refer to not only the same value, but the same object. Thus there are only 4 distinct dictionaries at all as the values - the 3 created initially and the fourth one that you add 100 times with keys 0-99.
This can be demonstrated with the id() function that returns distinct integer for each distinct object:
from collections import Counter
...
ids = [ id(i) for i in namelist.values() ]
print Counter(ids)
results in:
Counter({139830514626640: 100, 139830514505160: 1,
139830514504880: 1, 139830514505440: 1})
To get distinct dictionaries, you need to copy the default value:
namelist[i] = default_val.copy()
Or create a new dictionary on each loop
namelist[i] = {"age": 23, "region": 1}
default_val={"age":23,"region":1}
for i in xrange(100):
    namelist[i]=default_val
This doesn't mean "set every entry to a dictionary with these particular age and region values". This means "set every entry to this particular dictionary object".
for index in namelist:
    x = random.randint(0, 2)
    namelist[index]['region']=x
Since every object in namelist is really the same dictionary, all modifications in this loop happen to the same dictionary, and the last value of x wipes the others.
Evaluating a dict literal creates a new dict; assignment does not. If you want to make a new dictionary each time, put the dict literal in the loop:
for i in xrange(100):
    namelist[i]={"age":23,"region":1}
Wanted to add this as a comment but the link is too long. As others have said you have just shared the reference to the dictionary, if you want to see the visualisation you can check it out on Python Tutor it should help you grok what's happening.
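The difference between sharing one dictionary and copying it can be checked directly. This is a small Python 3 sketch; the names `shared` and `distinct` are made up for the comparison.

```python
import random

default_val = {"age": 23, "region": 1}

# Assignment stores another reference to the same dict object.
shared = {i: default_val for i in range(100)}
# .copy() (or a fresh literal) gives each key its own dict.
distinct = {i: default_val.copy() for i in range(100)}

rng = random.Random(1)
for people in (shared, distinct):
    for key in people:
        people[key]["region"] = rng.randint(0, 2)

shared_objects = len({id(v) for v in shared.values()})      # 1
distinct_objects = len({id(v) for v in distinct.values()})  # 100
shared_regions = {v["region"] for v in shared.values()}     # a single value
distinct_regions = {v["region"] for v in distinct.values()}
```

In `shared`, every key ends up with whatever region was written last, because there is only one dictionary to write to. In `distinct`, the regions are spread across 0, 1 and 2.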

Why/how does iterating over a list and calling 'pass' each time fix this function?

I have written the following function:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    for i in ids:  # I have absolutely no idea why this seems to be required:
        pass       # without this loop, the last AssessmentEntries fail to update
                   # in the following for loop.
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        start = end_mark
        rating += 1
It does what it is supposed to do (i.e. update the relevant number of entries in assessment_entries_qs with each rating (starting at lowest_rating) as specified in amounts). Here is a simple example:
>>> assessment_entries = AssessmentEntry.objects.all()
>>> print [ae.rating for ae in assessment_entries]
[None, None, None, None, None, None, None, None, None, None]
>>>
>>> auto_update_ratings((2,4,3,1), assessment_entries, 1)
>>> print [ae.rating for ae in assessment_entries]
[1, 1, 2, 2, 2, 2, 3, 3, 3, 4]
However, if I do not iterate through ids before iterating through amounts, the function only updates a subset of the queryset: with my current test data (approximately 250 AssessmentEntries in the queryset), it always results in exactly 84 AssessmentEntries not being updated.
Interestingly, it is always the last iteration of the second for loop that does not result in any updates (although the rest of the code in that iteration does execute properly), as well as a portion of the previous iteration. The querysets are ordered with order_by('?') prior to being passed to this function, and the intended results are achieved if I simply add the previous 'empty' for loop, so it does not appear to be an issue with my data.
A few more details, just in case they prove to be relevant:
AssessmentEntry.rating is a standard IntegerField(null=True,blank=True).
I am using this function purely for testing purposes, so I have only been executing it from iPython.
Test database is SQLite.
Question: Can someone please explain why I appear to need to iterate through ids, despite not actually touching the data in any way, and why without doing so the function still (sort of) executes correctly, but always fails to update the last few items in the queryset despite apparently still iterating through them?
QuerySets and QuerySet slicing are evaluated lazily. Iterating ids executes the query and makes ids behave like a static list instead of a QuerySet. So when you loop through ids, it causes entries later on to be a fixed set of values; but if you don't loop through ids, then entries is just a subquery with a LIMIT clause added to represent the slicing you do.
Here is what is happening in detail:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # assessment_entries_qs is an unevaluated QuerySet
    # from your calling code, it would probably generate a query like this:
    # SELECT * FROM assessments ORDER BY RANDOM()
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    # ids is a ValueQuerySet that adds "SELECT id"
    # to the query that assessment_entries_qs would generate.
    # So ids is now something like:
    # SELECT id FROM assessments ORDER BY RANDOM()
    # we omit the loop
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        # entries is now another QuerySet with a LIMIT clause added:
        # SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start
        # When filter() gets a QuerySet, it adds a subquery
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        # FINALLY, we now actually EXECUTE a query which is something like this:
        # UPDATE assessments SET rating=? WHERE id IN
        # (SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start)
        start = end_mark
        rating += 1
Since the subquery in entries is executed every time you update and it has a random order, the slicing you do is meaningless! This function does not have deterministic behavior.
However when you iterate ids you actually execute the query, so your slicing has deterministic behavior again and the code does what you expect.
Let's see what happens when you use a loop instead:
ids = assessment_entries_qs.values_list('id', flat=True)
# Iterating ids causes the query to actually be executed
# This query was sent to the DB:
# SELECT id FROM assessments ORDER BY RANDOM()
for id in ids:
    pass
# ids has now been "realized" and contains the *results* of the query
# e.g., [5,1,2,3,4]
# Iterating again (or slicing) will now return values rather than modify the query
for amount in amounts:
    end_mark = start + amount
    entries = ids[start:end_mark]
    # because ids was executed, entries contains definite values
    # When filter() gets actual values, it adds a simple condition
    a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
    # The query executed is something like this:
    # UPDATE assessments SET rating=? WHERE id IN (5,1)
    # "(5,1)" will change on each iteration, but it will always be a set of
    # scalar values rather than a subquery.
    start = end_mark
    rating += 1
If you ever need to eagerly evaluate a QuerySet to get all its values at a moment in time, rather than perform a do-nothing iteration just convert it to a list:
ids = list(assessment_entries_qs.values_list('id', flat=True))
Also the Django docs go into detail about when exactly a QuerySet is evaluated.
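The effect can be imitated in plain Python. The class below is only a toy analogy, not Django code: `LazyShuffled` is a hypothetical stand-in for an unevaluated, randomly ordered query that re-runs every time it is sliced.

```python
import random

class LazyShuffled:
    """Toy stand-in for an unevaluated, randomly ordered query (not Django code)."""

    def __init__(self, values, rng):
        self._values = list(values)
        self._rng = rng

    def _run(self):
        # Each "execution" returns the rows in a fresh random order.
        rows = self._values[:]
        self._rng.shuffle(rows)
        return rows

    def __getitem__(self, item):
        # Slicing re-runs the "query", like a subquery with LIMIT/OFFSET.
        return self._run()[item]

    def __iter__(self):
        return iter(self._run())


rng = random.Random(7)
lazy_ids = LazyShuffled(range(1, 11), rng)

# Evaluate once: consecutive slices of a list partition the ids exactly.
frozen = list(lazy_ids)
chunks = [frozen[0:2], frozen[2:6], frozen[6:9], frozen[9:10]]
covered = sorted(i for chunk in chunks for i in chunk)

# Slice lazily: each slice sees a different ordering, so ids may repeat or be missed.
lazy_chunks = [lazy_ids[0:2], lazy_ids[2:6], lazy_ids[6:9], lazy_ids[9:10]]
lazy_covered = sorted(i for chunk in lazy_chunks for i in chunk)
```

`covered` always contains each id from 1 to 10 exactly once. `lazy_covered` also has ten entries, but nothing guarantees they are ten different ids, which matches the symptom of some rows never being updated.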

Python random.sample not working properly?

I'm a complete newbie with Python.
But now I need a simple storage containing MyObject objects for a project.
Each object contains a few StringProperties, nothing fancy.
Now I want to get 10 random objects from my list of MyObjects and store them in another array.
So I went searching, found random.sample and started implementing it.
def get10RandomMyObjects():
    # create values
    dict = {}
    myObjectsList = []
    # fill the list
    myObjects = MyObject.all()
    randomMyObjects = random.sample(myObjects, 10)
    for o in randomMyObjects:
        dict_myObject = { }
        #some random property setting
        myObjectsList.append(dict_myObject)
    dict['myObjects'] = myObjectsList
    return dict
This is the error I get back:
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/random.py", line 314, in sample
n = len(population)
TypeError: object of type 'Query' has no len()
So obviously something is wrong with my use of random.sample, but my noobness can't decipher what it is.
Would anyone care to explain to me why I can't obtain those 10 random MyObjects I so desire?
random.sample() works on lists. Obviously, MyObject.all() does not return a list but a Query object. If Query is at least iterable then you can write:
myObjects = list(MyObject.all())
Otherwise, you have to create a list from MyObject.all() manually.
Looks like the Query object is a generator. random.sample likes to know how many items there are in order to create the sample. So the simplest thing to do is put the items to be sampled in a list:
randomMyObjects = random.sample(list(myObjects), 10)
There is nothing wrong with random.sample(). What is happening is that myObjects is not a collection.
Most likely, myObjects is an iterator. You'll have to turn it into a list before using it in random.sample():
randomMyObjects = random.sample(list(myObjects),10)
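The same failure and fix can be reproduced without any datastore. In this sketch, `fake_query` is a hypothetical generator standing in for a Query-like object that can be iterated but has no len().

```python
import random

def fake_query():
    """Stand-in for a Query-like object: iterable, but without len()."""
    for i in range(100):
        yield {"id": i}

rng = random.Random(3)
try:
    rng.sample(fake_query(), 10)
    failed = False
except TypeError:
    failed = True  # sample() needs a sequence, and a generator is not one

picked = rng.sample(list(fake_query()), 10)  # materialize first, then sample
```

Converting to a list loads every object into memory, which is fine for small result sets but worth keeping in mind for large ones.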
You may also use:
randomMyObjects = MyObject.all().order_by('?')[:10]
Which is faster because it will let the database do the random ordering and only load the 10 first objects into memory.
