Why can the .agg() method use dictionaries with reversed syntax? - python

This is my first Python-related question so bear with me....
I'm doing the "data scientist with Python"-course at Datacamp. One of the current rows of code I'm supposed to complete looks like this:
print(____.groupby(____).agg({'income':'median'}))
And I guess this bothers me. This is "not" how that method is supposed to work according to the documentation. Although it states that dictionaries of the form "function:variable" can be passed as arguments (though annoyingly enough we have to infer this from the examples), it also states that the function must be the first argument and the variable the second. Why can the order be reversed in the above example?
Is the sequence of functions/columns in dictionaries passed as arguments totally arbitrary?

The correct syntax maps each column name (the key) to the aggregation function applied to it (the value):
df.groupby('column_name').agg({"column_name": agg_func})
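As a minimal sketch of that mapping, assuming a small made-up DataFrame (the column names region, income and age are illustrative and not from the course exercise), the dictionary keys name the columns to aggregate and the values name the functions:

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "income": [10, 30, 20, 40, 60],
    "age": [25, 35, 45, 55, 65],
})

# Keys are the columns to aggregate, values are the functions to apply.
result = df.groupby("region").agg({"income": "median", "age": "max"})
print(result)
```

Each key produces one output column, so the position of the pairs in the dictionary only affects the column order of the result, not which function is applied to which column.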

Related

How to resolve inconsistent annotation in Django?

I have the following lines of code:
CategoryContext = Somemodel.objects.values('title__categories__category').distinct()
CategoryContextSum = CategoryContext.annotate(Total_Revenue=Sum('revenue')).order_by('-Total_Revenue')
CategoryContextAvg = CategoryContext.annotate(Average_Revenue=Avg('revenue')).order_by('-Average_Revenue')
The avg query yields a queryset of dicts where the category comes first, followed by the revenue. So basically:
<QuerySet [{'title__categories__category': 'Category', 'Average_Revenue': Decimal('100')}, {'title__categories__category': 'Category2', 'Average_Revenue': Decimal('120')}]>
The sum query on the other hand yields the revenue followed by the category, so basically:
<QuerySet [{'Total_Revenue':Decimal('100'), 'title__categories__category':'Category'}, {'Total_Revenue':Decimal('120'), 'title__categories__category':'Category2'}]>
Now I have tried flipping the queries around and changing the variable names so far, but I cannot seem to figure out why in the heck these 2 statements are behaving so differently. Does anybody know what could influence annotation behavior in Django?
Edit:
In case you are wondering why I need to understand this: I am passing the queryset to a method that turns it into data for generating a bar chart, and the first object in the dataset must be the identifier of the value. I could work around it by checking whether this is indeed the case and inverting otherwise, but it seems to me that this shouldn't be necessary.
This has little or nothing to do with annotate. Dictionaries in Python had no guaranteed ordering before Python 3.6 (insertion order became a language guarantee only in 3.7), so keys can be ordered differently across different queryset results.
And this poses little or no problem, since you'll be accessing the required values by key and not by position (as with sequences):
for obj_dct in your_qs:
    print(obj_dct[some_key])
If your plot function takes dicts, no need to worry about ordering.
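A minimal sketch of key-based access, using plain dicts as stand-ins for the two querysets in the question. The helper to_chart_data is hypothetical and only illustrates that the key order inside each dict does not matter once values are looked up by name:

```python
from decimal import Decimal

# Stand-ins for the two querysets: same keys, different key order.
avg_rows = [
    {"title__categories__category": "Category", "Average_Revenue": Decimal("100")},
    {"title__categories__category": "Category2", "Average_Revenue": Decimal("120")},
]
sum_rows = [
    {"Total_Revenue": Decimal("100"), "title__categories__category": "Category"},
    {"Total_Revenue": Decimal("120"), "title__categories__category": "Category2"},
]


def to_chart_data(rows, label_key, value_key):
    """Hypothetical helper: return (label, value) pairs looked up by key."""
    return [(row[label_key], row[value_key]) for row in rows]


avg_data = to_chart_data(avg_rows, "title__categories__category", "Average_Revenue")
sum_data = to_chart_data(sum_rows, "title__categories__category", "Total_Revenue")
```

Both calls put the category first in every pair, whichever order the keys had in the source dicts.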

Python style for line length and format when unpacking many return values

Suppose that the function some_descriptively_named_function returns a 4-tuple. I want to call it, adhere to the 80-character line length limit, and unpack all 4 outputs, each into a descriptively-named variable:
some_desc_name1, some_desc_name2, some_desc_name3, some_desc_name4 = some_descriptively_named_function()
One option is:
some_desc_name1, some_desc_name2, some_desc_name3, some_desc_name4 = (
    some_descriptively_named_function()
)
With four unpacked values, though, even this can be pushing it for line length. And if I wanted to make a brief comment on each argument, it's not easy to lay it out nicely.
The following works but it's unclear if this is considered good or very bad.
(some_desc_name1,  # Comment 1
 some_desc_name2,  # Comment 2
 some_desc_name3,  # Comment 3
 some_desc_name4   # Comment 4
) = some_descriptively_named_function()
It's certainly good for line length, but it's weird trying to think of how PEP8 might apply to the parentheses happening right at the beginning of a line.
Is there an established (hopefully PEP8 related) Python style guideline for this?
Your format LGTM, but a couple suggestions:
(some_desc_name1,
 some_desc_name2,
 some_desc_name3,
 some_desc_name4) = some_descriptively_named_function()
Make the calling function more clear by pulling it out
Don't comment the unpacked variables. The docstring for some_descriptively_named_function() should define these clearly.
It would make sense that if a function returns a tuple, the values are all related and do not need individual commenting. That description would probably make more sense sitting within the function definition. Your first option would then be the best solution that quickly sets the function result to 4 different variables. Another option would be to simply use the entire tuple as a variable:
some_desc_name1 = some_descriptively_named_function()
print(some_desc_name1[0])  # Comment 1
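A small sketch of the suggestion to document the return values in the function rather than at the call site. The body and the returned values here are placeholders invented for illustration:

```python
def some_descriptively_named_function():
    """Return a 4-tuple: (name, count, ratio, is_valid).

    The values are hypothetical placeholders; the point is that they
    are described once here instead of at every call site.
    """
    return "widget", 3, 0.75, True


# Parenthesized targets let the unpacking span several short lines.
(some_desc_name1,
 some_desc_name2,
 some_desc_name3,
 some_desc_name4) = some_descriptively_named_function()
```

Every line stays well under 80 characters, and the call remains visible on the last line of the statement.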

Python function argument list from a dictionary

I'm still relatively new to Python, and sometimes something that should be relatively simple escapes me.
I'm storing the results of a POST operation to a database table as a character string formatted as a dictionary definition. I'm then taking that value and using eval() to convert it to an actual dict object, which is working great as it preserves the data types (dates, datetimes, integers, floats, strings etc.) of the dictionary data elements.
What has me flummoxed is using the resulting dictionary to construct a set of keyword arguments that can then be passed to a function or method. So far, I haven't been able to make this work, let alone figure out the best/most Pythonic way to approach it. The dictionary makes it easy to iterate over its elements and identify key/value pairs, but I'm stuck at that point, not knowing how to use these pairs as a set of keyword arguments in the function or method call.
Thanks!
I think you're just looking for func(**the_dict)?
Understanding kwargs in Python
You are looking for **kwargs. It unpacks a dictionary into keyword arguments, just like you want. In the function call, just use this:
some_func(**my_dict)
Where my_dict is the dictionary you mentioned.
@tzaman and @Alex_Thornton - thanks - your answers led me to the solution, but they weren't clear about using **kwargs in the function call rather than the function definition. It took me a while to figure that out. I had only seen **kwargs used in the function/method definition before, so this usage was new to me. The link that @tzaman included triggered the "aha" moment.
Here is the code that implements the solution:
import datetime
def do_it(model=None, mfg_date=None, mileage=0):
    # Proceed with whatever you need to do with the
    # arguments
    print('Model: {} Mfg date: {} Mileage: {}'.format(model, mfg_date, mileage))
dict_string = ("{'model':'Mustang',"
               "'mfg_date':datetime.date(2012, 11, 24),"
               "'mileage':23824}")
dict_arg = eval(dict_string)  # eval() needs datetime to be imported here
do_it(**dict_arg) # <---Here is where the **kwargs goes - IN THE CALL
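The difference between ** in a definition and ** in a call can be sketched as follows. The function names show_kwargs and describe are made up for illustration:

```python
def show_kwargs(**kwargs):
    # In a definition, ** collects keyword arguments into a dict.
    return kwargs


def describe(model=None, mileage=0):
    return "{} ({} miles)".format(model, mileage)


car = {"model": "Mustang", "mileage": 23824}

# In a call, ** unpacks a dict into keyword arguments.
collected = show_kwargs(**car)
description = describe(**car)
```

The call describe(**car) behaves like describe(model="Mustang", mileage=23824), so the dictionary keys must match the parameter names of the function being called.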

Python conciseness confuses me

I have been looking at Pandas: run length of NaN holes, and this code fragment from the comments in particular:
Series([len(list(g)) for k, g in groupby(a.isnull()) if k])
As a python newbie, I am very impressed by the conciseness but not sure how to read this. Is it short for something along the lines of
myList = []
for k, g in groupby(a.isnull()):
    if k:
        myList.append(len(list(g)))
Series(myList)
In order to understand what is going on I was trying to play around with it but get an error:
list object is not callable
so not much luck there.
It would be lovely if someone could shed some light on this.
Thanks,
Anne
You've got the translation correct. However, the code you give cannot be run because a is a free variable.
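To see the equivalence run, here is a sketch that binds the free variable. It assumes itertools.groupby and uses a plain list of booleans as a stand-in for a.isnull(); the final Series(...) wrapper is left out so that only the standard library is needed:

```python
from itertools import groupby
import math

# Stand-in for the Series `a` in the question: a list with NaN holes.
values = [1.0, math.nan, math.nan, 2.0, math.nan, 3.0]
is_null = [math.isnan(v) for v in values]  # plays the role of a.isnull()

# Comprehension form: one run length per group of consecutive True values.
runs_comprehension = [len(list(g)) for k, g in groupby(is_null) if k]

# Equivalent explicit loop.
runs_loop = []
for k, g in groupby(is_null):
    if k:
        runs_loop.append(len(list(g)))
```

Both forms give [2, 1] for this input: one hole of length two and one of length one.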
My guess is that you are getting the error because you have assigned a list object to the name list. Don't do that, because list is the built-in name for the list type.
Also, in future please always provide a full stack trace, not just one part of it. Please also provide sufficient code that at least there are no free variables.
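A sketch of how rebinding the name list would produce that error. The shadowing is confined to a function here so that it does not leak into the rest of the session; the function names are invented for illustration:

```python
from itertools import groupby

flags = [True, True, False, True]


def run_lengths_with_shadowed_list(mask):
    list = [1, 2, 3]  # rebinds `list` inside this function only
    return [len(list(g)) for k, g in groupby(mask) if k]


def run_lengths(mask):
    return [len(list(g)) for k, g in groupby(mask) if k]


try:
    run_lengths_with_shadowed_list(flags)
except TypeError as exc:
    message = str(exc)  # reports that a 'list' object is not callable

ok_result = run_lengths(flags)
```

In an interactive session, the same thing happens after a stray assignment such as list = [...], and del list makes the built-in visible again.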
If that is all of your code, then you have only a few possibilities:
myList.append is really a list
len is really a list
list is really a list
isnull is really a list
groupby is really a list
Series is really a list
The error exists somewhere behind groupby.
I'm going to go ahead and strike out myList.append (because that is impossible unless you are using your own groupby function for some reason) and Series. Unless you are importing Series from somewhere strange, or you are re-assigning the variable, we know Series can't be a list. A similar argument can be made for a.isnull.
So that leaves us with two real possibilities. Either you have re-assigned something somewhere in your script to be a list where it shouldn't be, or the error is behind groupby.
I think you're using the wrong groupby. itertools.groupby takes an array or list as an argument, whereas groupby in pandas may evaluate the first argument as a function. I especially think this because isnull() returns an array-like object.

Python Extension Returned Object Etiquette

I am writing a Python extension to provide access to Solaris kstat data (in the same spirit as the shipping Perl library Sun::Solaris::Kstat) and I have a question about conditionally returning a list or a single object. The Python use case would look something like:
cpu_stats = cKstats.lookup(module='cpu_stat')
cpu_stat0 = cKstats.lookup('cpu_stat',0,'cpu_stat0')
As it's currently implemented, lookup() returns a list of all kstat objects which match. The first case would result in a list of objects ( as many as there are CPUs ) and the second call specifies a single kstat completely and would return a list containing one kstat.
My question is it poor form to return a single object when there is only one match, and a list when there are many?
Thank you for the thoughtful answer! My python-fu is weak but growing stronger due to folks like you.
"My question is it poor form to return a single object when there is only one match, and a list when there are many?"
It's poor form to return inconsistent types.
Return a consistent type: List of kstat.
Most Pythonistas don't like using type(result) to determine if it's a kstat or a list of kstats.
We'd rather check the length of the list in a simple, consistent way.
Also, if the length depends on a piece of system information, perhaps an API method could provide this metadata.
Look at the DB-API PEP (PEP 249) for advice and ideas on how to handle query-like things.
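A pure-Python sketch of the consistent-return-type advice. The Kstat tuple, the _KSTATS table and this lookup() are hypothetical stand-ins for the extension's objects, not its real API:

```python
from collections import namedtuple

# Hypothetical stand-in for a kstat object exposed by the extension.
Kstat = namedtuple("Kstat", ["module", "instance", "name"])

_KSTATS = [
    Kstat("cpu_stat", 0, "cpu_stat0"),
    Kstat("cpu_stat", 1, "cpu_stat1"),
]


def lookup(module=None, instance=None, name=None):
    """Return a list of matching kstats: possibly empty, never a bare object."""
    return [
        k for k in _KSTATS
        if (module is None or k.module == module)
        and (instance is None or k.instance == instance)
        and (name is None or k.name == name)
    ]


cpu_stats = lookup(module="cpu_stat")
cpu_stat0 = lookup("cpu_stat", 0, "cpu_stat0")
```

Callers can always iterate over the result or check len(result), and a fully specified lookup is simply a list of length one.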
