I'm trying to get the length of the result of the following query:
matchingTitles = db.GqlQuery("SELECT * FROM Post WHERE title=:1",title).run()
I tried doing this:
if(len(matchingTitles)>0):
But I get the following error:
TypeError: object of type '_QueryIterator' has no len()
I've been searching all over for the _QueryIterator object docs, but can't seem to find any. Instead I just iterated over it and incremented a counter for each item in the set. Wondering if there is a better way...
Thanks!
EDIT
There's a better way to do this. Instead of running and then counting, you can simply do:
matchingTitles = db.GqlQuery("SELECT * FROM Post WHERE title=:1",title).count()
and it returns the number of entities.
This can take a lot of memory, but you could use itertools.tee:
https://docs.python.org/2/library/itertools.html#itertools.tee
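A minimal sketch of that idea, reusing the db and title names from the question (tee buffers items internally, which is where the memory cost comes from):

import itertools

matchingTitles = db.GqlQuery("SELECT * FROM Post WHERE title = :1", title).run()
# Duplicate the iterator: count one copy, keep the other for actual iteration.
count_iter, data_iter = itertools.tee(matchingTitles)
num_results = sum(1 for _ in count_iter)
if num_results > 0:
    for post in data_iter:
        pass  # process each matching Post entity here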
For anyone that comes across this question actually looking for the length of a _QueryIterator, you can try:
len(list(matchingTitles))  # loads all the results into memory before counting
# OR
sum(1 for _ in matchingTitles)  # consumes the iterator without building an intermediate list
As mentioned though - it's usually better / faster / cheaper to use the database's count functionality than loading all the records and iterating over them. There may be a reason you can't use that - in which case those two options are available.
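If all you actually need is to know whether any match exists, it may be cheaper still to fetch a single entity rather than count them (a sketch, assuming the same App Engine db API as in the question):

# get() returns the first matching entity or None, so the datastore only has
# to find one row instead of counting or loading all of them.
first_match = db.GqlQuery("SELECT * FROM Post WHERE title = :1", title).get()
if first_match is not None:
    pass  # at least one matching Post exists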
Related
I'm looking to delete only the first N results returned from a query in django. Following the django examples here, which I found while reading this SO answer, I was able to limit the resulting set using the following code:
m = Model.objects.all()[:N]
but attempting to delete it generates the following error
m.delete()
AssertionError: Cannot use 'limit' or 'offset' with delete.
Is there a way to accomplish this in django?
You cannot delete through a limit; most databases do not support this.
You can however accomplish this in two steps, like:
Model.objects.filter(id__in=list(Model.objects.values_list('pk', flat=True)[:N])).delete()
We thus first retrieve the primary keys of the first N elements, and then use this in a .filter(..) part to delete those items in bulk.
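The same query can also be written as two explicit steps, which some may find easier to read (a sketch using the same Model and N as above):

# Step 1: collect the primary keys of the first N rows.
pks_to_delete = list(Model.objects.values_list('pk', flat=True)[:N])
# Step 2: bulk-delete exactly those rows; no limit/offset is needed here.
Model.objects.filter(pk__in=pks_to_delete).delete()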
You don't have this option directly, so you have to delete in a slightly more roundabout way. For example:
not_ideal = Model.objects.all()[N:].values_list("id", flat=True)
Model.objects.exclude(pk__in=list(not_ideal)).delete()
This way you find the objects you want to keep and delete everything except them.
You can use any field besides id, but id is unique and helps keep the query efficient.
Notice that in the first line I'm getting the items from index N to the end (not from the start to N).
Try this: loop through the first N objects and delete each one.
deletable_objects = Model.objects.all()[:N]
for m in deletable_objects:
    m.delete()
You can loop through the queryset and call the delete method on each object:
for obj in m:
    obj.delete()
I'm querying my ravendb instance. My target collection contains more than 30k documents. I'm using pyravendb with python 3.
I'm querying my index using the following code:
result_ = self.store.database_commands.query(index_name="Raven/DocumentsByEntityName",
index_query=IndexQuery("Tag:MyCollection",total_size=128,skipped_results=start))
if len(result_['Results']) < 128:
    return
start being the offset variable that increments by 128 each time I query.
When I run this code the result's length is always 128 which leads to an infinite loop.
Any ideas why it acts like this?
The problem was the parameter I was using. The proper parameter is start (the offset you want to skip), not skipped_results.
The correct code is the following:
result_ = self.store.database_commands.query(
    index_name="Raven/DocumentsByEntityName",
    index_query=IndexQuery("Tag:MyCollection", total_size=128, skipped_results=0,
                           default_operator=None, start=offset))
# ... process result_['Results'] here ...
offset += 128
if len(result_['Results']) < 128:
    return
Take a look here in my commit.
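Putting it together, the paging loop would look roughly like this (an untested sketch reusing the names from the question):

offset = 0
page_size = 128
while True:
    result_ = self.store.database_commands.query(
        index_name="Raven/DocumentsByEntityName",
        index_query=IndexQuery("Tag:MyCollection", total_size=page_size,
                               skipped_results=0, default_operator=None,
                               start=offset))
    for doc in result_['Results']:
        pass  # process each document here
    if len(result_['Results']) < page_size:
        break  # last (partial) page reached
    offset += page_size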
In pyravendb v3.5.3.5 I updated the IndexQuery, and now you are able to skip, or to take fewer or more than 128 documents.
Another thing: don't use total_size or skipped_results (they are going to be removed).
I know this is not exactly going to answer your question but did you consider using RavenDB's streaming functionality? https://ravendb.net/docs/article-page/3.5/csharp/client-api/session/querying/how-to-stream-query-results
In many cases when dealing with a large number of documents this might be faster and simpler compared to iterating with Query().
However, please be aware that streamed objects will not be tracked, meaning that changes to these objects followed by a SaveChanges() call won't have any effect on the documents stored within RavenDB.
I have a dictionary called lemma_all_context_dict, and it has approximately 8000 keys. I need a list of all possible pairs of these keys.
I used:
pairs_of_words_list = list(itertools.combinations(lemma_all_context_dict.keys(), 2))
However, when using this line I get a MemoryError. I have 8GB of RAM but perhaps I get this error anyway because I've got a few very large dictionaries in this code.
So I tried a different way:
pairs_of_words_list = []
for p_one in range(len(lemma_all_context_dict.keys())):
    for p_two in range(p_one + 1, len(lemma_all_context_dict.keys())):
        pairs_of_words_list.append([lemma_all_context_dict.keys()[p_one],
                                    lemma_all_context_dict.keys()[p_two]])
But this piece of code takes around 20 minutes to run... does anyone know of a more efficient way to solve the problem? Thanks
I don't think that this question is a duplicate, because what I'm asking - and I don't think this has been asked - is how to implement this without my computer crashing :-P
Don't build a list, since that's the reason you get a memory error (you even create two lists, since that's what .keys() does in Python 2). You can iterate over the iterator directly (that's what it's for):
for a, b in itertools.combinations(lemma_all_context_dict, 2):
    print a, b
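For example, you can scan every pair without ever materializing the roughly 32 million of them (a sketch; similarity() is a hypothetical placeholder for whatever per-pair computation you need):

import itertools

best_pair, best_score = None, float('-inf')
for word_a, word_b in itertools.combinations(lemma_all_context_dict, 2):
    score = similarity(word_a, word_b)  # hypothetical per-pair computation
    if score > best_score:
        best_pair, best_score = (word_a, word_b), score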
So I've got database.objects.all() and database.objects.get('name'), but how would I go about getting one random item from the database? I'm having trouble figuring out how to select one random item.
Selecting a random element from a list of all database objects isn't a good solution, as retrieving all elements of the database can have a big impact on performance; neither is using order_by('?'), as mentioned in the django documentation.
The best solution should be to retrieve an element with a random index:
import random
random_idx = random.randint(0, Model.objects.count() - 1)
random_obj = Model.objects.all()[random_idx]
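One edge case worth guarding against: if the table is empty, count() - 1 is -1 and random.randint raises a ValueError. A small wrapper (get_random_instance is a hypothetical name) could handle that:

import random

def get_random_instance(model_cls):
    # Returns a random instance of model_cls, or None if the table is empty.
    total = model_cls.objects.count()
    if total == 0:
        return None
    return model_cls.objects.all()[random.randint(0, total - 1)]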
Aamir's solution will select all objects before discarding all but one. This is extremely wasteful and, besides, this sort of calculation should be done in the database.
model.objects.all().order_by('?')[0]
Read more here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#order-by
Edit: lazerscience's answer is indeed faster, as shown here.
I would do it slightly differently. Querysets are lazy anyway in django.
import random
def get_my_random_object():
    obj = random.choice(model.objects.all())
    return obj
https://docs.djangoproject.com/en/dev/topics/db/queries/#querysets-are-lazy
https://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated
I was surprised that sys.getsizeof(10000*[x]) is 40036 regardless of x: 0, "a", 1000*"a", {}.
Is there a deep_getsizeof which properly considers elements that share memory?
(The question came from looking at in-memory database tables like range(1000000) -> province names: list or dict?)
(Python is 2.6.4 on a mac ppc.)
Added:
10000*["Mississippi"] is 10000 pointers to one "Mississippi",
as several people have pointed out. Try this:
nstates = [AlabamatoWyoming() for j in xrange(N)]
where AlabamatoWyoming() -> a string "Alabama" .. "Wyoming".
What's deep_getsizeof(nstates)?
(How can we tell?
- a proper deep_getsizeof: difficult, ~ gc tracer
- estimate from total vm
- inside knowledge of the python implementation
- guess.)
Added 25jan:
see also when-does-python-allocate-new-memory-for-identical-strings
10000 * [x] will produce a list of 10000 references to the same object, so the sizeof is actually closer to correct than you think. However, a deep sizeof is very problematic because it's impossible to tell Python when you want to stop the measurement. Every object references a typeobject. Should the typeobject be counted? What if the reference to the typeobject is the last one, so if you deleted the object the typeobject would go away as well? What if multiple (different) objects in the list refer to the same string object? Should it be counted once, or multiple times?
In short, getting the size of a data structure is very complicated, and sys.getsizeof() should never have been added :S
Have a look at guppy/heapy; I haven't played around with it too much myself, but a few of my co-workers have used it for memory profiling with good results.
The documentation could be better, but this howto does a decent job of explaining the basic concepts.
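From what I recall, the basic entry point looks roughly like this (a sketch; see the howto for the full API):

from guppy import hpy

hp = hpy()
print(hp.heap())  # prints a summary of all objects currently on the heap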
If your list only holds objects of the same length, you could get a more accurate estimate by doing this:
import sys

def getSize(array):
    return sys.getsizeof(array) + len(array) * sys.getsizeof(array[0])
Obviously it's not going to work as well for strings with variable length.
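For example (illustrative only), every element is assumed to cost as much as the first one, so longer strings later in the list are under-counted:

words = ["a", "Mississippi", "Wyoming"]
estimate = getSize(words)  # per-element cost is based on sys.getsizeof("a")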
If you only want to calculate the size for debugging or during development and you don't care about performance, you could iterate over all items recursively and calculate the total size. Note that this solution is not going to handle multiple references to the same object correctly.
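A minimal sketch of that recursive idea, with an id()-based 'seen' set added so shared objects are at least not counted twice (it still ignores type objects, instance __dict__ attributes, and interpreter-level sharing):

import sys

def deep_getsizeof(obj, seen=None):
    # Sum getsizeof over obj and the built-in containers it references,
    # counting each distinct object only once via its id().
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

With this, 10000*["Mississippi"] is counted as one list plus one shared string, which is the behaviour the question is after.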
I wrote a tool called RememberMe exactly for this. Basic usage:
from rememberme import memory
a = [1, 2, 3]
b = [a, a, a]
print(memory(a)) # 172 bytes
print(memory(b)) # 260 bytes. Duplication counted only once.
Hope it helps.
mylist = 10000 * [x] means create a list of size 10000 with 10000 references to object x.
Object x is not copied - only a single one exists in memory!!!
So to use getsizeof, it would be: sys.getsizeof(mylist) + sys.getsizeof(x)
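A quick way to see that the list's own size excludes its elements (the exact numbers vary by Python version and platform):

import sys

x = 1000 * "a"
mylist = 10000 * [x]
# The list stores 10000 pointers to the same string object, so its own size
# does not depend on how big x is; x has to be measured separately.
print(sys.getsizeof(mylist))                     # size of the list structure only
print(sys.getsizeof(x))                          # size of the single shared string
print(sys.getsizeof(mylist) + sys.getsizeof(x))  # the combined estimate from above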