I have the following 2 lines of
CategoryContext = Somemodel.objects.values('title__categories__category').distinct()
CategoryContextSum = CategoryContext.annotate(Total_Revenue=Sum('revenue')).order_by('-Total_Revenue')
CategoryContextAvg = CategoryContext.annotate(Average_Revenue=Avg('revenue')).order_by('-Average_Revenue')
The avg query yields a querylist of objects where the category comes first, followed by the revenue. So basically:
<QuerySet [{'title__categories__category':'Category', 'Average_Revenue':Decimal('100'),}, {'title__categories__category':'Category2':'Average_Revenue':Decimal('120'), }]>
The sum query on the other hand yields the revenue followed by the category, so basically:
<QuerySet [{'Total_Revenue':Decimal('100'), 'title__categories__category':'Category'}, {'Total_Revenue':Decimal('120'), 'title__categories__category':'Category2'}]>
Now I have tried flipping the queries around and changing the variable names so far, but I cannot seem to figure out why in the heck these 2 statements are behaving so differently. Does anybody know what could influence annotation behavior in Django?
Edit:
In case you are wondering why I need to understand this: I am passing the queryset to a method that turns it into data for generating a barchart and the first object in the dataset must be the identifier of the value. I could make it so that it inverts the whole process by checking whether this indeed is the case and ivnerting otherwise, but it seems to me that this shouldnt be necessary
This has little or nothing to do with annotate. Dictionaries in Python have no conventional sense of ordering (at least not until Python 3.6), and keys can be ordered differently across different queryset results.
And this constitutes little or not problem since you'll be access required values by key and not serially (as with sequences):
for obj_dct in your_qs:
print(obj_dct[some_key])
If your plot function takes dicts, no need to worry about ordering.
Related
Recently for a course at University, our teacher asked us to re-create one of Kaggle's competitions. I chose to do this one.
I was able to follow the tutorial relatively well, until I reached the for loop they wrote to clean the text in the data frame. Here it is:
# Get the number of reviews based on the dataframe column size
num_reviews = train["review"].size
# Initialize an empty list to hold the clean reviews
clean_train_reviews = []
# Loop over each review; create an index i that goes from 0 to the length
# of the movie review list
for i in xrange( 0, num_reviews ):
# Call our function for each one, and add the result to the list of
# clean reviews
clean_train_reviews.append( review_to_words( train["review"][i] ) )
My problem is when they use 'xrange' in the loop, since that was never assigned to anything during the tutorial and is, therefore, returning an error when I try to run the code. I also checked the code provided by the author on github and they did the exact same thing. So, my question is: Is this simply a mistake on their end or am I missing something? If I am not missing anything, what should be in the place of'xrange'?
I've tried assigning the relevant dataframe column to a variable I can then use here, but I then get a TypeError, stating 'Series' object is not callable. My knowledge in Python is a bit elementary still, so I apologize if I am simply missing something obvious. Appreciate any help!
range and xrange are both python functions, the only thing, xrange is used only in python 2.
Both are implemented in different ways and have different characteristics associated with them. The points of comparison are:
Return Type:
Type 'list',
Type 'xrange'
Memory:
Big size,
Small size
Operation Usage:
As range() returns the list, all the operations that can be applied on the list can be used on it. On the other hand, as xrange() returns the xrange object, operations associated to list cannot be applied on them, hence a disadvantage.
Speed:
Because of the fact that xrange() evaluates only the generator object containing only the values that are required by lazy evaluation, therefore is faster in implementation than range().
More info:
source
You are probably having this issue because you are using python3.
Just use range function instead.
I have a dictionary that is being built while iterating through objects. Now same object can be accessed multiple times. And I'm using object itself as a key.
So if same object is accessed more than once, then key becomes not unique and my dictionary is no longer correct.
Though I need to access it by object, because later on if someone wants access contents by it, they can request to get it by current object. And it will be correct, because it will access the last active object at that time.
So I'm wondering if it is possible to wrap object somehow, so it would keep its state and all attributes the same, but the only difference would be this new kind of object which is actually unique.
For example:
dct = {}
for obj in some_objects_lst:
# Well this kind of wraps it, but it loses state, so if I would
# instantiate I would lose all information that was in that obj.
wrapped = type('Wrapped', (type(obj),), {})
dct[wrapped] = # add some content
Now if there are some better alternatives than this, I would like to hear it too.
P.S. objects being iterated would be in different context, so even if object is the same, it would be treated differently.
Update
As requested, to give better example where the problem comes from:
I have this excel reports generator module. Using it, you can generate various excel reports. For that you need to write configuration using python dictionary.
Now before report is generated, it must do two things. Get metadata (metadata here is position of each cell that will be when report is about to be created) and second, parse configuration to fill cells with content.
One of the value types that can be used in this module, is formula (excel formulas). And the problem in my question is specifically with one of the ways formula can be computed: formula values that are retrieved for parent , that are in their childs.
For example imagine this excel file structure:
A | B | C
Total Childs Name Amount
1 sum(childs)
2 child_1 10
3 child_2 20
4 sum(childs)
...
Now in this example sum on cell 1A, would need to be 10+20=30 if sum would use expression to sum their childs column (in this case C column). And all of this is working until same object (I call it iterables) is repeated. Because when building metadata it I need to store it, to retrieve later. And key is object being iterated itself. So now when it will be iterated again when parsing values, it will not see all information, because some will overwritten by same object.
For example imagine there are invoice objects, then there are partner objects which are related with invoices and there are some other arbitrary objects that given invoice and partner produce specific amounts.
So when extracting such information in excel, it goes like this:
inoice1 -> partner1 -> amount_obj1, amount_obj2
invoice2 -> partner1 -> amount_obj3, amount_obj4.
Notice that partner in example is the same. Here is the problem, because I can't store this as key, because when parsing values, I will iterate over this object twice when metadata will actually hold values for amount_obj3 and amount_obj4
P.S Don't know if I explained it better, cause there is lots of code and I don't want to put huge walls of code here.
Update2
I'll try to explain this problem from more abstract angle, because it seems being too specific just confuses everyone even more.
So given objects list and empty dictionary, dictionary is built by iterating over objects. Objects act as a key in dictionary. It contains metadata used later on.
Now same list can be iterated again for different purpose. When its done, it needs to access that dictionary values using iterated object (same objects that are keys in that dictionary). But the problem is, if same object was used more than once, it will have only latest stored value for that key.
It means object is not unique key here. But the problem is the only thing I know is the object (when I need to retrieve the value). But because it is same iteration, specific index of iteration will be the same when accessing same object both times.
So uniqueness I guess then is (index, object).
I'm not sure if I understand your problem, so here's two options. If it's object content that matters, keep object copies as a key. Something crude like
new_obj = copy.deepcopy(obj)
dct[new_obj] = whatever_you_need_to_store(new_obj)
If the object doesn't change between the first time it's checked by your code and the next, the operation is just performed the second time with no effect. Not optimal, but probably not a big problem. If it does change, though, you get separate records for old and new ones. For memory saving you will probably want to replace copies with hashes, __str__() method that writes object data or whatever. But that depends on what your object is; maybe hashing will take too much time for miniscule savings in memory. Run some tests and see what works.
If, on the other hand, it's important to keep the same value for the same object, whether the data within it have changed or not (say, object is a user session that can change its data between login and logoff), use object ids. Not the builtin id() function, because if the object gets GCed or deleted, some other object may get its id. Define an id attribute for your objects and make sure different objects cannot possibly get the same one.
I'm really new to django programming, and I'm facing a problem I don't really know how to solve:
I want to get a list of users who have many string attributes, but only the users whom none of it's attributes is equal to a given one.
I have this piece of code
all_users = list(UserProfile.objects.attribute.filter(type=given).exists())
but this code will return me the users who have that attribute, so here's the question: How I can modify this line (or what lines do I need to add) in order to get the list of users without this attribute
Ps: Maybe I didn't explained myself clearly as I don't really know how to specify my problem in english, but, if you don't know what I'm asking I can try again
Thanks all
You can use exclude:
all_users = list(UserProfile.objects.attribute.exclude(type=given).exists())
To quote the docs:
To create such a subset, you refine the initial QuerySet, adding filter conditions. The two most common ways to refine a QuerySet are:
filter(**kwargs)
Returns a new QuerySet containing objects that match the given lookup parameters.
exclude(**kwargs)
Returns a new QuerySet containing objects that do not match the given lookup parameters.
I am writing a python extension to provide access to Solaris kstat data ( in the same spirit as the shipping perl library Sun::Solaris::Kstat ) and I have a question about conditionally returning a list or a single object. The python use case would look something like:
cpu_stats = cKstats.lookup(module='cpu_stat')
cpu_stat0 = cKstats.lookup('cpu_stat',0,'cpu_stat0')
As it's currently implemented, lookup() returns a list of all kstat objects which match. The first case would result in a list of objects ( as many as there are CPUs ) and the second call specifies a single kstat completely and would return a list containing one kstat.
My question is it poor form to return a single object when there is only one match, and a list when there are many?
Thank you for the thoughtful answer! My python-fu is weak but growing stronger due to folks like you.
"My question is it poor form to return a single object when there is only one match, and a list when there are many?"
It's poor form to return inconsistent types.
Return a consistent type: List of kstat.
Most Pythonistas don't like using type(result) to determine if it's a kstat or a list of kstats.
We'd rather check the length of the list in a simple, consistent way.
Also, if the length depends on a piece of system information, perhaps an API method could provide this metadata.
Look at DB-API PEP for advice and ideas on how to handle query-like things.
I have large chunks of data, normally at around 2000+ entries, but in this report we have the ability to look as far as we want so it could be up to 10,000 records
The report is split up into: Two categories and then within each Category, we split by Currency so we have several sub categories within the list.
My issue comes in efficiently calculating the various subtotals. I am using Django and pass a templatetag the currency and category, if it applies, and then the templatetag renders the total. Note that sometimes I have a subtotal just for the category, with no currency passed.
Initially, I was using a seperate query for each subtotal by just using .filter() if there was a currency/category like so:
if currency:
entries = entries.filter(item_currency=currency)
This became a problem as I would have too many queries, and too long of a generation time (2,000+ ms), so I opted to use list(entries) to execute my query right off the bat, and then loop through it with simple list comprehensions:
totals['quantity'] = sum([e.quantity for e in entries])
My problem if you don't see it yet, lies in .. how can I efficiently add the condition for currency / category on each list comprehension? Sometimes they won't be there, sometimes they will so I can't simply type:
totals['quantity'] = sum([e.quantity for e in entries if item_currency = currency])
I could make a huge if-block, but that's not very clean and is a maintenance disaster, so I'm reaching out to the Stackoverflow community for a bit of insight .. thanks in advance :)
You could define a little inline function:
def EntryMatches(e):
if use_currency and not (e.currency == currency):
return False
if use_category and not (e.category == category):
return False
return True
then
totals['quantity'] = sum([e.quantity for e in entries if EntryMatches(e)])
EntryMatches() will have access to all variables in enclosing scope, so no need to pass in any more arguments. You get the advantage that all of the logic for which entries to use is in one place, you still get to use the list comprehension to make the sum() more readable, but you can have arbitrary logic in EntryMatches() now.