I prefer to use long identifiers to keep my code semantically clear, but in the case of repeated references to the same identifier, I'd like for it to "get out of the way" in the current scope. Take this example in Python:
def define_many_mappings_1(self):
    self.define_bidirectional_parameter_mapping("status", "current_status")
    self.define_bidirectional_parameter_mapping("id", "unique_id")
    self.define_bidirectional_parameter_mapping("location", "coordinates")
    # etc...
Let's assume that I really want to stick with this long method name, and that these arguments are always going to be hard-coded.
Implementation 1 feels wrong because most of each line is taken up with a repetition of characters. The lines are also rather long in general, and will exceed 80 characters easily when nested inside of a class definition and/or a try/except block, resulting in ugly line wrapping. Let's try using a for loop:
def define_many_mappings_2(self):
    mappings = [("status", "current_status"),
                ("id", "unique_id"),
                ("location", "coordinates")]
    for mapping in mappings:
        self.define_parameter_mapping(*mapping)
I'm going to lump together all similar iterative techniques under the umbrella of Implementation 2, which has the improvement of separating the "unique" arguments from the "repeated" method name. However, I dislike that this has the effect of placing the arguments before the method they're being passed into, which is confusing. I would prefer to retain the "verb followed by direct object" syntax.
I've found myself using the following as a compromise:
def define_many_mappings_3(self):
    d = self.define_bidirectional_parameter_mapping
    d("status", "current_status")
    d("id", "unique_id")
    d("location", "coordinates")
In Implementation 3, the long method is aliased by an extremely short "abbreviation" variable. I like this approach because it is immediately recognizable as a set of repeated method calls at first glance while having fewer redundant characters and much shorter lines. The drawback is the usage of an extremely short and semantically unclear identifier "d".
What is the most readable solution? Is the usage of an "abbreviation variable" acceptable if it is explicitly assigned from an unabbreviated version in the local scope?
itertools to the rescue again! Try using starmap - here's a simple demo:
list(itertools.starmap(min, [(1, 2), (2, 2), (3, 2)]))
prints
[1, 2, 2]
starmap returns a lazy iterator, so to actually invoke the methods, you have to consume it, for example with list.
import itertools

def define_many_mappings_4(self):
    list(itertools.starmap(
        self.define_parameter_mapping,
        [
            ("status", "current_status"),
            ("id", "unique_id"),
            ("location", "coordinates"),
        ]))
Normally I'm not a fan of using a dummy list construction to invoke a sequence of functions, but this arrangement seems to address most of your concerns.
If define_parameter_mapping returns None, then you can replace list with any, and then all of the function calls will get made, and you won't have to construct that dummy list.
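To make the point about consuming the iterator concrete, here is a minimal sketch. The Recorder class and its define_parameter_mapping method are hypothetical stand-ins for the asker's class. The collections.deque(..., maxlen=0) variant is not from the answer above; it is another standard-library way to exhaust an iterator that does not depend on what the function returns.

```python
import collections
import itertools


class Recorder:
    """Hypothetical stand-in for the asker's class."""

    def __init__(self):
        self.mappings = {}

    def define_parameter_mapping(self, external, internal):
        # Returns None implicitly, which is what makes the any() trick work.
        self.mappings[external] = internal


PAIRS = [
    ("status", "current_status"),
    ("id", "unique_id"),
    ("location", "coordinates"),
]

# any() stops at the first truthy result; since every call returns None,
# it runs through all the pairs and then returns False.
with_any = Recorder()
any_result = any(itertools.starmap(with_any.define_parameter_mapping, PAIRS))

# A zero-length deque consumes the iterator regardless of return values.
with_deque = Recorder()
collections.deque(
    itertools.starmap(with_deque.define_parameter_mapping, PAIRS), maxlen=0
)
```

Note that the any() version silently stops early if the method ever starts returning something truthy, which is one reason a plain for loop may still be the safer choice.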
I would go with Implementation 2, but it is a close call.
I think #2 and #3 are equally readable. Imagine if you had 100s of mappings... Either way, I cannot tell what the code at the bottom is doing without scrolling to the top. In #2 you are giving a name to the data; in #3, you are giving a name to the function. It's basically a wash.
Changing the data is also a wash, since either way you just add one line in the same pattern as what is already there.
The difference comes if you want to change what you are doing to the data. For example, say you decide to add a debug message for each mapping you define. With #2, you add a statement to the loop, and it is still easy to read. With #3, you have to create a lambda or something. Nothing wrong with lambdas -- I love Lisp as much as anybody -- but I think I would still find #2 easier to read and modify.
But it is a close call, and your taste might be different.
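As a rough illustration of the debug-message scenario, here is a sketch of both variants side by side. The Mapper class and the body of define_bidirectional_parameter_mapping are assumptions made up for the example; only the method name comes from the question.

```python
import logging

logger = logging.getLogger(__name__)


class Mapper:
    """Hypothetical stand-in for the asker's class."""

    def __init__(self):
        self.mappings = {}

    def define_bidirectional_parameter_mapping(self, external, internal):
        # Placeholder body: record the mapping in both directions.
        self.mappings[external] = internal
        self.mappings[internal] = external

    def define_many_mappings_2(self):
        mappings = [
            ("status", "current_status"),
            ("id", "unique_id"),
            ("location", "coordinates"),
        ]
        for external, internal in mappings:
            # The extra behaviour is one new statement in the loop body.
            logger.debug("mapping %s <-> %s", external, internal)
            self.define_bidirectional_parameter_mapping(external, internal)

    def define_many_mappings_3(self):
        def d(external, internal):
            # With the alias, the extra behaviour needs a wrapper function.
            logger.debug("mapping %s <-> %s", external, internal)
            self.define_bidirectional_parameter_mapping(external, internal)

        d("status", "current_status")
        d("id", "unique_id")
        d("location", "coordinates")
```

Both end up with the same mappings; the difference is only in where the new statement has to live.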
I think #3 is not bad, although I might pick a slightly longer identifier than d. But often this type of thing becomes data-driven, and then you would find yourself using a variation of #2 where you are looping over the result of a database query or something from a config file.
There's no right answer, so you'll get opinions on all sides here, but I would by far prefer to see #2 in any code I was responsible for maintaining.
#1 is verbose, repetitive, and difficult to change (e.g. say you need to call two methods on each pair or add logging -- then you must change every line). But this is often how code evolves, and it is a fairly familiar and harmless pattern.
#3 suffers the same problem as #1, but is slightly more concise at the cost of requiring what is basically a macro and thus new and slightly unfamiliar terms.
#2 is simple and clear. It lays out your mappings in data form, and then iterates them using basic language constructs. To add new mappings, you only need add a line to the array. You might end up loading your mappings from an external file or URL down the line, and that would be an easy change. To change what is done with them, you only need change the body of your for loop (which itself could be made into a separate function if the need arose).
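A sketch of the "load your mappings from an external file" idea, assuming a JSON layout invented for this example. CONFIG_TEXT, load_mappings and define_many_mappings are all hypothetical names; in real code the text would come from a file or URL and define would be the asker's method.

```python
import json

# Hypothetical config contents; in practice this might come from a file or URL.
CONFIG_TEXT = """
{
    "mappings": [
        ["status", "current_status"],
        ["id", "unique_id"],
        ["location", "coordinates"]
    ]
}
"""


def load_mappings(text):
    """Parse the config text and return a list of (external, internal) pairs."""
    config = json.loads(text)
    return [tuple(pair) for pair in config["mappings"]]


def define_many_mappings(define, mappings):
    """Call define(external, internal) for every pair."""
    for external, internal in mappings:
        define(external, internal)


# A plain dict stands in for the object that records the mappings.
defined = {}
define_many_mappings(defined.__setitem__, load_mappings(CONFIG_TEXT))
```

The loop itself does not change when the data moves out of the source file, which is the maintainability argument being made for #2.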
Your complaint of #2 of "object before verb" doesn't bother me at all. In scanning that function, I would basically first assume the verb does what it's supposed to do and focus on the object, which is now clear and immediately visible and maintainable. Only if there were problems would I look at the verb, and it would be immediately evident what it is doing.
USAGE CONTEXT ADDED AT END
I often want to operate on an abstract object like a list. e.g.
def list_ish(thing):
    for i in xrange(0, len(thing)):
        print thing[i]
Now this is appropriate if thing is a list, but will fail if thing is a dict, for example. What is the pythonic way to ask "do you behave like a list?"
NOTE:
hasattr(thing, '__getitem__') and not hasattr(thing, 'keys')
This will work for all cases I can think of, but I don't like defining a duck type negatively, as I expect there could be cases that it does not catch.
Really what I want is to ask:
"Hey, do you operate on integer indices in the way I expect a list to do?" e.g.
thing[i], thing[4:7] = [...], etc.
NOTE: I do not want to simply execute my operations inside of a large try/except, since they are destructive. It is not cool to try and fail here....
USAGE CONTEXT
-- A "point-list" is a list-like thing that contains dict-like things as its elements.
-- A "matrix" is a list-like thing that contains list-like things.
-- I have a library of functions that operate on point-lists and, in an analogous way, on matrix-like things.
-- For example, from the user's point of view, destructive "spreadsheet-like" operations such as "column-slice" can operate on both matrix objects and point-list objects in an analogous way: the resulting thing is like the original one, but only has the specified columns.
-- Since this particular operation is destructive, it would not be cool to proceed as if an object were a matrix, only to find out partway through the operation that it was really a point-list or none of the above.
-- I want my 'is_matrix' and 'is_point_list' tests to be performant, since they sometimes occur inside inner loops. So I would be satisfied with a test which only investigated element zero, for example.
-- I would prefer tests that do not involve construction of temporary objects just to determine an object's type, but maybe that is not the Python way.
In general I find the whole duck typing thing to be kinda messy, and fraught with bugs and slowness, but maybe I don't yet think like a true Pythonista.
Happy to drink more kool-aid...
One thing you can do, that should work quickly on a normal list and fail on a normal dict, is taking a zero-length slice from the front:
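try:
    thing[:0]
except (TypeError, KeyError):
    pass  # probably not list-like
else:
    pass  # probably list-like
The slice fails on dicts because slices are not hashable, which raises TypeError. On Python 3.12 and later slices are hashable, so a dict raises KeyError instead; catching both covers either case.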
However, str and unicode also pass this test, and you mention that you are doing destructive edits. That means you probably also want to check for __delitem__ and __setitem__:
def supports_slices_and_editing(thing):
    if hasattr(thing, '__setitem__') and hasattr(thing, '__delitem__'):
        try:
            thing[:0]
            return True
        except (TypeError, KeyError):
            # TypeError: slices unhashable; KeyError: dict on Python 3.12+
            pass
    return False
I suggest you organize the requirements you have for your input, and the range of possible inputs you want your function to handle, more explicitly than you have so far in your question. If you really just wanted to handle lists and dicts, you'd be using isinstance, right? Maybe what your method does could only ever delete items, or only ever replace items, so you don't need to check for the other capability. Document these requirements for future reference.
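One way to act on the "only check for the capability you need" advice is to make the requirements explicit parameters. This is a sketch only; supports_slicing and its replace/delete flags are hypothetical names, not anything from the question.

```python
def supports_slicing(thing, replace=False, delete=False):
    """Return True if thing accepts slices and has the requested mutators.

    Hypothetical helper: replace/delete say which capabilities the caller
    actually needs, so nothing more than that is required of thing.
    """
    if replace and not hasattr(thing, "__setitem__"):
        return False
    if delete and not hasattr(thing, "__delitem__"):
        return False
    try:
        thing[:0]  # zero-length slice: cheap, and rejected by mappings
    except (TypeError, KeyError):
        # dicts raise TypeError where slices are unhashable, and KeyError
        # on interpreters where a slice can be used as a key.
        return False
    return True
```

A read-only caller would use supports_slicing(thing), while a destructive column-slice would ask for supports_slicing(thing, replace=True, delete=True) before touching anything.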
When dealing with built-in types, you can use the Abstract Base Classes. In your case, you may want to test against collections.Sequence or collections.MutableSequence:
if isinstance(your_thing, collections.Sequence):
    pass  # access your_thing as a list
This is supported in all Python versions after (and including) 2.6. In Python 3.3 and later these classes live in collections.abc, and the old aliases in collections were removed in Python 3.10.
If you are using your own classes to build your_thing, I'd recommend that you inherit from these abstract base classes as well (directly or indirectly). This way, you can ensure that the sequence interface is implemented correctly, and avoid all the typing mess.
And for third-party libraries, there's no simple way to check for a sequence interface, if the third-party classes didn't inherit from the built-in types or abstract classes. In this case you'll have to check for every interface that you're going to use, and only those you use. For example, your list_ish function used __len__ and __getitem__, so only check whether these two methods exist. A wrong behavior of __getitem__ (e.g. a dict) should raise an exception.
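Here is a small sketch of the abstract-base-class approach, written for Python 3 (hence collections.abc). PointList and is_editable_sequence are hypothetical names; the point is only to show what inheriting from MutableSequence requires and what isinstance then reports.

```python
from collections.abc import MutableSequence, Sequence


class PointList(MutableSequence):
    """Hypothetical list-like container built on the abstract base class."""

    def __init__(self, items=()):
        self._items = list(items)

    # MutableSequence requires these five methods; append, extend, pop,
    # index, and the rest are then provided as mixin methods.
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)


def is_editable_sequence(thing):
    """True for lists and anything else registered as a MutableSequence."""
    return isinstance(thing, MutableSequence)


# str is a Sequence but not a MutableSequence, so for destructive edits
# the mutable check is the one that rules strings out.
STR_IS_SEQUENCE = isinstance("abc", Sequence)
STR_IS_EDITABLE = is_editable_sequence("abc")
```

If a required method is missing, instantiating the subclass raises TypeError, which is how the base class helps ensure the interface is complete.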
Perhaps there is no ideal pythonic answer here, so I am proposing a 'hack' solution, but I don't know enough about the class structure of Python to know if I am getting this right:
def is_list_like(thing):
    return hasattr(thing, '__setslice__')
def is_dict_like(thing):
    return hasattr(thing, 'keys')
My reduced goals here are simply to have performant tests that:
(1) never call a dict-thing or a string-like-thing a list
(2) return the right answer for Python types
(3) return the right answer if someone implements a "full" set of core methods for a list/dict
(4) are fast (ideally do not allocate objects during the test)
EDIT: Incorporated ideas from #DanGetz
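For what it's worth, __setslice__ only exists on Python 2, so the hack above does not carry over to Python 3. Below is a sketch of the same four goals using abstract base classes instead. The function names follow the question's 'is_matrix' and 'is_point_list'; treating an empty container as neither is an assumption of this sketch, as is relying on element zero alone.

```python
from collections.abc import Mapping, MutableSequence


def is_list_like(thing):
    # str, bytes, tuple and dict are not MutableSequence, so they are rejected.
    return isinstance(thing, MutableSequence)


def is_dict_like(thing):
    return isinstance(thing, Mapping)


def is_matrix(thing):
    """List-like whose first element is list-like (only element zero is checked)."""
    return is_list_like(thing) and len(thing) > 0 and is_list_like(thing[0])


def is_point_list(thing):
    """List-like whose first element is dict-like (only element zero is checked)."""
    return is_list_like(thing) and len(thing) > 0 and is_dict_like(thing[0])
```

Custom classes pass these tests if they inherit from (or are registered with) the corresponding abstract base class, which covers goal (3) without probing individual methods.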
In my code, I am printing a menu
print(num_chars * char)
for option in options:
    print("{:d}. {:s}".format(option, options[option]))
print(num_chars * char)
The code print(num_chars * char) prints a separator/delimiter in order to "beautify" the output. I have learned from several coding tutorials that I am not allowed to write the same code more than once.
Is it really preferable to define a function
def get_char_repeated(char='*', num_chars=30):
    """
    Return the character repeated an arbitrary number of times
    """
    return num_chars * char
and call this two times in my original code?
Are there any alternatives if I need to print a nice-looking menu from a dictionary?
Thank you.
I have learned from several coding tutorials that I am not allowed to write the same code more than once.
This principle, called "don't repeat yourself" (DRY) is a good rough guideline. For every programmer who writes too many functions (splitting code into too small units), there are 20 who write too few.
Don't go overboard with it, though. The reasoning behind DRY is to make reading and changing the code later on easier. print(num_chars * char) is pretty basic already, and super-easy to understand and change, so it doesn't really pay off to factor it into a function.
If the repeated code grows to 3 lines, you can (and probably should) factor it out then.
It's not necessary at that level. What might be helpful is if you were often doing that whole block of code, you could easily change it to
def printOptions(options, char='*', num_chars=30):
    print(num_chars * char)
    for option in options:
        print("{:d}. {:s}".format(option, options[option]))
    print(num_chars * char)
The main point of functions is to save time with blocks of code you use a lot in a very similar way by not retyping/copy pasting them. But they also save time when you make changes. If you used the same function in 10 different places you still only need to change it once for all 10 uses to be updated rather than having to find all 10 manually and update them.
So if you decided you wanted to put a title header into this menu printing section and you had used it as this or a similar function in a bunch of places, you could quite easily update them all without difficulty.
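To illustrate the title-header scenario, here is a sketch of what that single change might look like. format_menu, print_menu and the title parameter are hypothetical; building the string separately from printing it is a choice made here so the result is easy to check, not something the answer prescribes.

```python
def format_menu(options, title=None, char="*", num_chars=30):
    """Return the menu as a string; options maps integer keys to labels."""
    separator = num_chars * char
    lines = [separator]
    if title is not None:
        # The new header is added once, here, for every caller.
        lines.append(title.center(num_chars))
        lines.append(separator)
    for key in sorted(options):
        lines.append("{:d}. {:s}".format(key, options[key]))
    lines.append(separator)
    return "\n".join(lines)


def print_menu(options, **kwargs):
    """Print the menu; keyword arguments are passed on to format_menu."""
    print(format_menu(options, **kwargs))
```

Existing callers that pass no title keep their old output, while any caller can opt in with print_menu(options, title="Main menu").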
I find it a good rule, that if the code you're turning into a function is used more than once and takes up more than 3 lines, it is a candidate to be turned into a function. If it is a very complex single line of code (like this x = [ i**j for i in z for j in y ]) and is used more than twice, it could be a candidate to turn into a function.
It may be a matter of preference where you draw the line but the basic idea is if it makes your code easier to read or easier to write, turning something into a function can be a good idea. If it makes your code harder to read (because every time you see the function you have to look back at the specifics of what it does), you probably should not have turned that code into a function.
I would like to have informative representations for my composite objects (i.e., objects composed of other (potentially composite) objects). However, because my code fundamentally deals with high-precision numbers (please don't ask me why I don't just use doubles), I end up with representations like the one you see here: http://pastebin.com/jpLgAfxC. Would it be better to just stick with the default __repr__?
Whether to have a verbose repr depends on what you want to accomplish. For complex or composite objects, I know which I'd prefer of the following:
Point(x=1.12, y=2.2, z=-1.9)
<__main__.Point object at 0x103011890>
They both tell me what type the object is, but only the first is clear about all of the (relevant) values involved, and avoids low-level information that is only relevant on the rarest of occasions.
I like to see the real values. But, yours is a special case, given that your values are so frightfully humongous:
72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73237863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851
That they cannot be useful for most development or debugging purposes. I'm sure there are times you need the full serialization--to send to and from files, for example. But those have to be fairly rare, no? I can't imagine you really remember all 309 digits, or can determine if the above number is the same as the one below on visual inspection:
72401317106217603290426741268390656010621951704689382948334809645
87850348552960901165648762842931879347325584704068956434195098288
38279057775096090002410493665682226331178331461681861612403032369
73327863637784679012984303024949059416189689048527978878840119376
5152408961823197987224502419157858495179687559851
They're not the same. But unless you're Spock or The Terminator, you wouldn't know that from a quick glance. (And actually, I've made it easier here, length-wrapping to avoid having to horizontally scroll.)
So I would recommend (massively) shortening their representation, to make the output more tractable. This is like printing out the entire chapter text every time you want to print a Chapter object. Overkill.
Instead, try something much shorter and easier to work with. Truncation and/or ellipsis are useful. e.g.
72401...59851
7240131710...
You can use the object id as well. If your high-precision type is HP, then:
HP(0x103011890)
At least then you will be able to tell them apart. One ugliness of using object ids, however, is that objects can be logically equivalent, but if you create multiple objects with the same logical value, they'd have different ids, thus appear different when they are not. You can get around that by creating your own short hash function. There's a bit of an art to hashing, but for reprs, even something simple would work. E.g.:
import binascii, struct

def shorthash(s):
    """
    Given a Python value, produce a short alphanumeric hash that
    helps identify it for debugging purposes. A riff on
    http://stackoverflow.com/a/2511059/240490
    Enhanced to remove trailing boilerplate, and to work
    on either Python 2 or Python 3.
    """
    hashbytes = binascii.b2a_base64(struct.pack('l', hash(s)))
    return hashbytes.decode('utf-8').rstrip().rstrip("=")
Then define your repr in the high-precision class:
def __repr__(self):
    clsname = self.__class__.__name__
    return '{0}({1})'.format(clsname, shorthash(self.value))
Where self.value is whatever local attribute, property, or method creates the multi-hundred-digit value. If you're subclassing int, this could be just self.
This gets you to:
HP(Tea+5MY0WwA)
The two massive, almost identical numbers above? Using this scheme, they render out to:
HP(XhkG0358Fx4)
HP(27CdIG5elhQ)
Which are obviously different. You can combine this with a bit of a value representation. E.g. a few alternatives:
HP(~7.24013e308 # XhkG0358Fx4)
HP(dig='72401...59851', ndigits=309, hash='XhkG0358Fx4')
You'll find these shorter values more useful in debugging contexts. You can, of course, keep around a method or property (e.g. .value, .digits, or .alldigits) for those cases in which you need every last bit, but define the common case as something more easily consumed.
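Putting the truncation, digit count and short hash together, a minimal sketch might look like the following. It assumes the high-precision type subclasses int and is called HP, as in the examples above. It also swaps the built-in hash() for a hashlib.sha1 digest of the digits, which is a substitution made for this sketch so the tag does not depend on the platform's integer size.

```python
import hashlib


class HP(int):
    """Hypothetical high-precision integer with a compact repr."""

    @property
    def digits(self):
        # Full decimal expansion, for the rare cases that need every digit.
        return str(int(self))

    def __repr__(self):
        digits = self.digits
        if len(digits) > 12:
            shown = digits[:5] + "..." + digits[-5:]
        else:
            shown = digits
        # A digest of the digits is stable across runs and processes.
        tag = hashlib.sha1(digits.encode("ascii")).hexdigest()[:8]
        return "{0}(dig={1!r}, ndigits={2}, hash={3!r})".format(
            type(self).__name__, shown, len(digits.lstrip("-")), tag
        )
```

Two values that share their leading and trailing digits still get different hash tags, so they remain distinguishable at a glance, and .digits is there when the full value is needed.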
Thank you to Demian for the pointer to https://docs.python.org/2/reference/datamodel.html#object.repr, specifically:
This is typically used for debugging, so it is important that the
representation is information-rich and unambiguous.
http://pastebin.com/jpLgAfxC is probably the best possible __repr__ in this case.
This is a question about a clean, pythonic way to juggle some different instance methods.
I have a class that operates a little differently depending on certain inputs. The differences didn't seem big enough to justify producing entirely new classes. I have to interface the class with one of several data "providers". I thought I was being smart when I introduced a dictionary:
self.interface_tools={'TYPE_A':{ ... various ..., 'data_supplier':self.current_data},
'TYPE_B':{ ... various ..., 'data_supplier':self.predicted_data} }
Then, as part of the class initialization, I have an input "source_name" and I do ...
# ... various ....
self.data_supplier = self.interface_tools[source_name]['data_supplier']
self.current_data and self.predicted_data need the same input parameter, so when it comes time to call the method, I don't have to distinguish them. I can just call
new_data = self.data_supplier(param1)
But now I need to interface with a new data source -- call it "TYPE_C" -- and it needs more input parameters. There are ways to do this, but nothing I can think of is very clean. For instance, I could just add the new parameters to the old data_suppliers and never use them, so then the call would look like
new_data = self.data_supplier(param1,param2,param3)
But I don't like that. I could add an if block
if self.data_source != 'TYPE_C':
    new_data = self.data_supplier(param1)
else:
    new_data = self.data_c_supplier(param1, param2, param3)
but avoiding if blocks like this was exactly what I was trying to do in the first place with that dictionary I came up with.
So the upshot is: I have a few "data_supplier" routines. Now that my project has expanded, they have different input lists. But I want my class to be able to treat them all the same to the extent possible. Any ideas? Thanks.
Sounds like your functions could be making use of variable length argument lists.
That said, you could also just make subclasses. They're fairly cheap to make, and would solve your problem here. This is pretty much the case they were designed for.
You could make all your data_suppliers accept a single argument and make it a dictionary or a list or even a namedtuple.
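A sketch of the variable-length-argument idea applied to the question's dictionary dispatch. DataClient, type_c_data, fetch and the supplier bodies are all placeholders invented for the example; the single-dictionary suggestion works the same way, with the dictionary passed explicitly instead of unpacked.

```python
class DataClient:
    """Hypothetical class; the supplier bodies are placeholders."""

    def __init__(self, source_name):
        suppliers = {
            "TYPE_A": self.current_data,
            "TYPE_B": self.predicted_data,
            "TYPE_C": self.type_c_data,
        }
        self.data_supplier = suppliers[source_name]

    # Each supplier names the parameters it needs and absorbs the rest
    # in **_unused, so the call site is identical for every source.
    def current_data(self, param1, **_unused):
        return ("current", param1)

    def predicted_data(self, param1, **_unused):
        return ("predicted", param1)

    def type_c_data(self, param1, param2, param3, **_unused):
        return ("type_c", param1, param2, param3)

    def fetch(self, **params):
        return self.data_supplier(**params)
```

The caller always writes client.fetch(param1=..., param2=..., param3=...), and no if block on the source type is needed. The trade-off is that a misspelled keyword is silently ignored by the suppliers that do not use it.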
Is it better not to name list variables "list", since that shadows Python's built-in list type? Then, what's a better name? "input_list" sounds kinda awkward.
I know it can be problem-specific, but, say I have a quick sort function, then quick_sort(unsorted_list) is still kinda lengthy, since list passed to sorting function is clearly unsorted by context.
Any idea?
I like to name it with the plural of whatever's in it. So, for example, if I have a list of names, I call it names, and then I can write:
for name in names:
which I think looks pretty nice. But generally for your own sanity you should name your variables so that you can know what they are just from the name. This convention has the added benefit of being type-agnostic, just like Python itself, because names can be any iterable object such as a tuple, a dict, or your very own custom (iterable) object. You can use for name in names on any of those, and if you had a tuple called names_list that would just be weird.
(Added from a comment below:) There are a few situations where you don't have to do this. Using a canonical variable like i to index a short loop is OK because i is usually used that way. If your variable is used on more than one page worth of code, so that you can't see its entire lifetime at once, you should give it a sensible name.
goats
Variable names should refer to what they are, not just what type they are.
Python stands for readability, so you should choose variable names that promote readability. See PEP 20. You should only have a general rule of consistency and should break this consistency in the following situations:
When applying the rule would make the code less readable, even for
someone who is used to reading code that follows the rules.
To be consistent with surrounding code that also breaks it (maybe for
historic reasons) -- although this is also an opportunity to clean up
someone else's mess (in true XP style)
Also, use the function naming rules: lowercase with words separated by underscores as necessary to improve readability.
All this is taken from PEP 8
Just use lst, or seq (for sequence)
I use a naming convention based on descriptive name and type. (I think I learned this from a Jeff Atwood blog post but I can't find it.)
goats_list
for goat in goats_list:
    goat.bleat()
cow_hash = {}
etc.
Anything more complicated (list_list_hash_list) I make a class.
What about L?
Why not just use unsorted? I prefer to have names, which communicate ideas, not data types. There are special cases, where the type of a variable is important. But in most cases, it's obvious from the context - like in your case. Quick sort is obviously working on a list.