Do “Clean Code”'s function argument number guidelines apply to API design? - python

I am a newbie reading Uncle Bob's Clean Code Book.
It is indeed good practice to keep the number of function arguments as small as possible. But I still come across many library functions that require a bunch of arguments. For example, in Python's pandas, there is a function with 9 arguments:
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
(And this function also violates the advice about flag arguments)
It seems that such cases are much rarer in Python standard libraries, but I still managed to find one with 4 arguments:
re.split(pattern, string, maxsplit=0, flags=0)
I understand that this is just a suggestion rather than a silver bullet, but does it apply to cases like the ones mentioned above?

Uncle Bob does not mention a hard limit on the number of arguments beyond which your code smells, but I would consider 9 arguments too many.
Today's IDEs are much better at supporting the readability of code; nevertheless, refactoring stays tricky, especially with a large number of arguments of the same type.
The suggested solution is to encapsulate the arguments in a single struct/object (depending on your language). In the given case, this could be a GroupingStrategy:
strategy = GroupingStrategy()
strategy.by = "Foo"
strategy.axis = 0
strategy.sorted = True
DataFrame.groupby(strategy)
Any attribute that is not set explicitly keeps its default value.
You could then also convert it to a fluent API:
DataFrame.groupby(GroupingStrategy.by("Foo").axis(0).sorted())
Or keep some of the arguments, if this feels better:
DataFrame.groupby("Foo", GroupingStrategy.default())
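For illustration, here is a minimal Python sketch of the parameter-object idea. GroupingStrategy and group_rows are hypothetical names invented for this example; they are not part of pandas, and the grouping works on a plain list of dicts rather than a DataFrame.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GroupingStrategy:
    """Hypothetical parameter object; NOT part of pandas."""
    by: str
    sort: bool = True      # unset fields fall back to these defaults
    dropna: bool = True


def group_rows(rows, strategy):
    """Group a list of dicts by the key named in strategy.by."""
    if strategy.dropna:
        rows = [row for row in rows if row.get(strategy.by) is not None]
    groups = {}
    for row in rows:
        groups.setdefault(row.get(strategy.by), []).append(row)
    if strategy.sort:
        # A None key (possible when dropna=False) is placed last.
        groups = dict(sorted(groups.items(),
                             key=lambda kv: (kv[0] is None, kv[0])))
    return groups


rows = [
    {"city": "Oslo", "n": 1},
    {"city": "Bergen", "n": 2},
    {"city": None, "n": 3},
    {"city": "Oslo", "n": 4},
]

default = GroupingStrategy(by="city")
grouped = group_rows(rows, default)

# Derive a variant without repeating every argument.
unsorted_strategy = replace(default, sort=False)
ungrouped_order = group_rows(rows, unsorted_strategy)
```

The call site now names only the options it changes, and a strategy can be stored, reused, or tweaked with replace() instead of repeating a long argument list.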

The first point to note is that all those arguments to groupby are relevant. You can reduce the number of arguments by having different versions of groupby but that doesn't help much when the arguments can be applied independently of each other, as is the case here. The same logic would apply to re.split.
It's true that integer "flag" arguments can be dodgy from a maintenance point of view: what happens if you want to change a flag value in your code? You have to hunt through and manually fix each case. The traditional approach is to use enums (which map numbers to words, e.g. a Day enum would have Day.Sun = 0, Day.Mon = 1, etc.). In compiled languages like C++ or C# this gives you the speed of using integers under the hood but the readability of using labels/words in your code. However, enums in Python are comparatively slow.
One rule that I think applies to any source code is to avoid "magic numbers", i.e. numbers which appear directly in the source code. The enum is one solution. Another solution is to have constant variables to represent the different flag settings. Python sort-of supports constants (uppercase variable names in a constant.py module which you then import); however, they are constant only by convention, and you can actually change their value :(
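As a rough sketch of the difference, the standard re module already exposes its flags as named enum members, so the magic number can be avoided at the call site. The GroupOption class below is a hypothetical example of defining your own flags with enum.IntFlag; it does not exist in any library.

```python
import re
from enum import IntFlag

# Magic number: the reader has to know that 2 means "ignore case".
parts_magic = re.split(r"[a-f]+", "0a3B9", flags=2)

# Named flag: same behaviour, but the intent is visible.
parts_named = re.split(r"[a-f]+", "0a3B9", flags=re.IGNORECASE)


class GroupOption(IntFlag):
    """Hypothetical flags for illustration only."""
    SORT = 1
    DROP_NA = 2


# Flags combine with | and are tested with `in`.
options = GroupOption.SORT | GroupOption.DROP_NA
wants_sort = GroupOption.SORT in options
```

If a flag's numeric value ever changes, only the enum definition has to be touched, not every call site.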

Related

Difference between Python methods that do or do not create a new variable

Some methods don't need a new variable to hold their result, e.g. lists.reverse() works like this:
lists = [123, 456, 789]
lists.reverse()
print(lists)
This method reverses the list itself (without a new variable).
Why are there various ways to produce a result in Python?
Some calls can be chained, like variable.method().method2().method3(), but type(variable) and print() cannot. Why can't we write variable.print() or variable.type()?
Is there a philosophical reason for this in Python?
You may be confused by the difference between a function and a method, and by three different purposes to them. As much as I dislike using SO for tutorial purposes, these issues can be hard to grasp from other documentation. You can look up function vs method easily enough -- once you know it's a (slightly) separate issue.
Your first question is a matter of system design. Python merely facilitates what programmers want to do, and the differentiation is common to many (most?) programming languages since ASM and FORTRAN crawled out of the binary slime pools in the days when dinosaurs roamed the earth.
When you design how your application works, you need to make a lot of implementation decisions: individual variables vs a sequence, in-line coding vs functions, separate functions vs encased functions vs classes and methods, etc. Part of this decision making is what each function should do. You've raised three main types:
(1) Process this data -- take the given data and change it, rearrange it, whatever needs doing -- but I don't need the previous version, just the improved version, so just put the new stuff where the old stuff was. This is used almost exclusively when one variable is getting processed; we don't generally take four separate variables and change each of them. In that case, we'd put them all in a list and change the list (a single variable). reverse falls into this class.
One important note is that for such a function, the argument in question must be mutable (capable of change). Python has mutable and immutable types. For instance, a list is mutable; a tuple is immutable. If you wanted to reverse a tuple, you'd need to return a new tuple; you can't change the original.
(2) Tell me something interesting -- take the given data and extract some information. However, I'm going to need the originals, so leave them alone. If I need to remember this cool new insight, I'll put it in a variable of my own. This is a function that returns a value. sqrt is one such function.
(3) Interact with the outside world -- input or output data permanently. For output, nothing in the program changes; we may present the data in an easy-to-read format, but we don't change anything internally. print is such a function.
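The three kinds of function above can be sketched in a few lines. The variable names are only illustrative.

```python
import math

# (1) Process the data in place: the list itself changes and
# the method returns None as a reminder of the side effect.
numbers = [123, 456, 789]
result = numbers.reverse()

# A tuple is immutable, so "reversing" it has to build a new tuple.
point = (1, 2, 3)
flipped = point[::-1]

# (2) Tell me something: the argument is untouched and the
# answer comes back as a return value.
side = 16
root = math.sqrt(side)

# (3) Interact with the outside world: nothing in the program changes.
print(numbers, flipped, root)
```

Here result is None because reverse() works by side effect, while point still holds its original contents after the slice.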
Much of this decision also depends on the function's designed purpose: is this a "verb" function (do something) or a noun/attribute function (look at this data and tell me what you see)?
Now you get the interesting job for yourself: learn the art of system design. You need to become familiar enough with the available programming tools that you have a feeling for how they can be combined to form useful applications.
See the documentation:
The reverse() method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence.

How to revert the functionality of a lambda/function in python

I am trying to regenerate my original data from data that has already been transformed. The transformation takes five parameters: target, source, target_key, source_key, transformer
For example:
target = {}
source = {"first_name": "tom"}
target_key = "name"
source_key = "first_name"
transformer = lambda value: value.title()
So, currently, I copy first_name to name, and the response becomes {"name": "Tom"}
Now, I am trying to reverse it. If I get {"name": "Tom"}, it should result in {"first_name": "tom"} using the same lambda or function. Similarly, there are many other keys with different transformers.
Is there any way/keyword to reverse the functionality of a lambda or any function?
Thanks,
Short answer: it is fundamentally impossible to construct an "inverse" function for an arbitrary given function.
You cannot derive the "reverse" function from a given function; whether it is a lambda expression is irrelevant. There are several aspects here:
First of all, it is possible that several inputs map on the same output. Take for instance the function lambda x : x.lower(). In that case both 'foo' and 'FOO' map to 'foo'. So even if you somehow could calculate input that maps on a given output, a question would be: "what input do you pick".
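A small sketch of that first point, using the same lambda: several distinct inputs collapse onto one output, so the output alone cannot tell you which input to return.

```python
lower = lambda x: x.lower()

# Three different inputs ...
outputs = {text: lower(text) for text in ("foo", "FOO", "Foo")}

# ... all map to the same output, so "the" input for 'foo' is ambiguous.
preimages = [text for text, out in outputs.items() if out == "foo"]
```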
Next, suppose we say that any input producing the given output would suffice. One can then ask whether finding one is possible. It still is not, since the problem is undecidable: if you provide as "expected output" a value that the function cannot generate, the hypothetical inverse function cannot know that. One can prove this using computability theory, since it would conflict with the fact that the emptiness problem E_TM is undecidable.
Is there a theoretical way to derive an object that maps to a given valid value? Yes: one could enumerate all possible inputs (the set is infinite but countable, hence enumerable), calculate the output for each and then validate it. Furthermore, the evaluation of the function calls should happen in "parallel", since it is possible that one of the calls results in an infinite loop.
Nevertheless hoping that it is realistic to construct a real function that calculates the inverse is not advisable. In a practical sense the above sketched algorithm is unfeasible. It would require an enormous amount of memory to store all the simulations of these functions. Furthermore it is possible that these have side effects (like writing to a file). As a result you should make copies of everything that might have side effects. Furthermore in practice some side effects cannot be "virtualized" or "undone". If the function for instance communicates with a web server, you cannot "undo" the HTTP request. It can also take ages before a valid input structure is entered and evaluated.
As @JohnColeman says in his comment, the fact that a function is not (feasibly) invertible is sometimes desired behavior. In asymmetric encryption, for instance, the public key is usually publicly available. Nevertheless, we do not want the original message to be (efficiently) computable from the message encrypted with the public key. A lot of today's cryptography and security depends on the fact that it is hard or impossible to perform the inverse operation of a function.
A final note is that of course it can be possible to construct an "inverse constructor" for certain families of functions. But in general (meaning an "inverse generator" that can take any kind of function as input), it is impossible.
To regenerate your data, you need to invert the mappings that were applied to it. There is no general function-inverse operator in Python or any other programming language, for the reasons that @Willem explained, but humans are pretty good at identifying and reversing simple manipulations. With enough work, it is possible to understand and reverse complicated manipulations too. This is part of how hackers reverse-engineer programs and algorithms, and it is what you need to do too if your data is worth the effort. (Of course you can partially automate the process, especially if you know the kinds of manipulations that have been applied, e.g. if you wrote them yourself.)
If you have the source code, it's relatively easy: inspect each function, write a suitable inverse (to the extent that they exist), and write a main loop that somehow determines which inverse to apply. If you don't have the source code but have the compiled program (.pyc or .pyo files), you can still disassemble them and puzzle out what they do. See the dis module (but it's not at all trivial):
>>> import dis
>>> dis.dis(transformer)
  1           0 LOAD_FAST                0 (value)
              3 LOAD_ATTR                0 (title)
              6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              9 RETURN_VALUE
So... the bottom line is, you have to do it yourself. Good luck with it.
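One way to "do it yourself" when you control the transformers is to store a hand-written inverse next to each forward function. The sketch below is only an assumption about how such a table could look; TRANSFORMS, apply_transforms and revert_transforms are made-up names, and the "age" entry is an invented second mapping.

```python
# target_key: (source_key, forward transformer, hand-written inverse)
TRANSFORMS = {
    "name": ("first_name", str.title, str.lower),
    "age_text": ("age", str, int),
}


def apply_transforms(source):
    """Build the target dict from the source dict."""
    return {
        target_key: forward(source[source_key])
        for target_key, (source_key, forward, _inverse) in TRANSFORMS.items()
    }


def revert_transforms(target):
    """Rebuild the source dict using the paired inverses."""
    return {
        source_key: inverse(target[target_key])
        for target_key, (source_key, _forward, inverse) in TRANSFORMS.items()
    }


source = {"first_name": "tom", "age": 30}
target = apply_transforms(source)
restored = revert_transforms(target)
```

Note that str.lower only undoes str.title if the original value was all lowercase to begin with; for an input such as "McDonald" the round trip loses information, which is exactly the ambiguity described in the other answer.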

What is the maximum length for an attribute name in python?

I'm writing a set of python functions that perform some sort of conformance checking on a source code project. I'd like to specify quite verbose names for these functions, e.g.: check_5_theVersionOfAllVPropsMatchesTheVersionOfTheAutolinkHeader()
Could such excessively long names be a problem for python? Is there a maximum length for attribute names?
2.3. Identifiers and keywords from The Python Language Reference:
Identifiers are unlimited in length.
But you'll be violating PEP-8 most likely, which is not really cool:
Limit all lines to a maximum of 79 characters.
Also you'll be violating PEP-20 (the Zen of Python):
Readability counts.
They could be a problem for the programmer. Keep the function names reasonably short, and use docstrings to document them.
Since attribute names are just hashed and used as keys in inst.__dict__ for 99% of the classes you'll ever encounter, there's no real limit on length. As long as the name is hashable, it'll work as an attribute name. This does not apply, though, to the other 1% of classes that fiddle with __setattr__ / __getattr__ / __getattribute__ in ways that break that guarantee.
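A quick way to convince yourself, sketched here with a deliberately absurd 10,000-character name. The Checks class is just a placeholder for this example.

```python
class Checks:
    """Placeholder class for the demonstration."""


# The function name from the question is a perfectly valid identifier.
def check_5_theVersionOfAllVPropsMatchesTheVersionOfTheAutolinkHeader():
    return True


# A far longer attribute name works too: it is just a dict key.
long_name = "check_" + "x" * 10_000
checks = Checks()
setattr(checks, long_name, True)

stored_in_dict = long_name in vars(checks)
value = getattr(checks, long_name)
```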
Of course, as others have pointed out, you will have code style and quality concerns with longer named attributes. If you are finding yourself needing such long names, it's likely indicative of a design flaw in your program, and you should probably look at giving your data more hierarchical structure and better abstracting and dividing responsibility in your functions and methods.

Why don't any and all take multiple parameters like min and max?

The functions min and max are very flexible; they can take any number of parameters, or a single parameter that is an iterable. any and all are similar in taking an iterable of any size, but they do not take more than one parameter. Is there a reason for this difference in behavior?
I realize that the question might seem unanswerable, but the process of enhancing Python is pretty open; many seemingly arbitrary design decisions are part of the public record. I've seen similar questions answered in the past, and I'm hoping this one can be as well.
Inspired by this question: Is there a builtin function version of and and/or or in Python?
A lot of the features in Python are suggested based on how much users need them, however they must also conform to the style of the language. People often need to do this:
max_val = 0
for x in seq:
    # ... do complex calculations
    max_val = max(max_val, result)
which warrants the use of multiple parameters. It also looks good. I haven't heard of anyone needing to use any(x, y, z), because any is most often used on sequences. For a small number of values you can just use the and/or logical operators, and for a lot of values you really should be using a list anyway or your code gets messy. I'm fairly certain that not much thought has gone into this because it really wouldn't benefit anyone; it hasn't been in great demand, so the Python devs don't worry about it.
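The alternatives mentioned above look like this in practice. The any_of helper at the end is a hypothetical wrapper, shown only to illustrate how little code a multi-argument version would need.

```python
x, y, z = 0, "", 5

# max accepts either separate arguments or one iterable.
largest_args = max(3, 7, 5)
largest_iterable = max([3, 7, 5])

# any accepts only one iterable, so wrap the values in a list or tuple ...
any_result = any([x, y, z])

# ... or, for a handful of values, use the operator directly.
# Note that `or` returns the first truthy operand, not a bool.
or_result = x or y or z

# Passing several arguments is a TypeError.
try:
    any(x, y, z)
except TypeError as exc:
    error = exc


def any_of(*values):
    """Hypothetical multi-argument wrapper around any()."""
    return any(values)
```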

Does it make sense to use Hungarian notation prefixes in interpreted languages? [closed]

Closed 11 years ago.
First of all, I have taken a look at the following posts to avoid duplicate question.
https://stackoverflow.com/questions/1184717/hungarian-notation
Why shouldn't I use "Hungarian Notation"?
Are variable prefixes (“Hungarian notation”) really necessary anymore?
Do people use the Hungarian Naming Conventions in the real world?
Now, all of these posts are related to C#, C++, Java - strongly typed languages.
I do understand that there is no need for the prefixes when the type is known before compilation.
Nevertheless, my question is:
Is it worthwhile to use the prefixes in interpreted languages, considering the fact that you can't see the type of the object before runtime?
Edit: If someone can make this post a community wiki, please do. I am hardly interested in the reputation (or negative reputation) from this post.
It depends on which of the two versions you refer to:
If you want to use the "real", original Hungarian notation AKA Applications Hungarian notation, denoting the logical variable type or its purpose, feel free to do so.
OTOH, the "misunderstood" version AKA Systems Hungarian notation, denoting just the physical variable type, is frowned upon and should not be used.
IMHO, it never(*) makes real sense to use Systems Hungarian (prefixing the data type). Whether you use a static language or a dynamic language, the compiler or interpreter takes care of the type system. Annotating the type of a variable by means of the variable name can only cause ambiguity (e.g. imagine a float called intSomething).
It is completely different with regard to Application Hungarian, i.e. prefixing with some kind of usage pattern. I'd argue it is good practice to use this kind of notation, e.g. 'usValue' for an unsafe (i.e. unvalidated) value. This gives a visual cue as to the usage and prevents you from mixing different uses of variables which do have the same type but are not intended to be used together (or when they are intended to be used together, you at least have an idea as to what is being used and they produce a blip on your code checking radar).
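A minimal Python sketch of that idea, assuming the prefixes us_ (unsafe, unvalidated) and s_ (safe, escaped). Both prefixes and the render_comment function are conventions invented for this example, not an established standard.

```python
import html


def render_comment(s_comment):
    """Expects an already escaped ("safe") string."""
    return "<p>" + s_comment + "</p>"


# us_ marks raw, unvalidated input; s_ marks the escaped version.
us_comment = "<b>hi</b>"
s_comment = html.escape(us_comment)

page = render_comment(s_comment)

# Both variables are plain str, so the type system cannot help here,
# but a call like render_comment(us_comment) now looks wrong at a glance.
```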
I frequently use such a thing in MATLAB, e.g. idxInterest to indicate that the array of doubles are not raw data values, but just the indexes (into another array) which are of interest in one way or the other. I regularly use selInterest (sel from select) to do the same with logical indexes (I agree this might look like borderline Systems Hungarian), but in many cases both can be used in the same context.
Similarly for iterators: I regularly use multidimensional arrays (e.g. 4D), in the odd case I run a (par)for over a dimension, the iterators are called iFoo, jBar, kBaz, ... while their upper limit is generally nFoo, nBar, nBaz, ... (or numFoo, ...). When doing more complicated index manipulation, you can easily see what index belongs to what dimension (by the prefix you know what numerical dimension is used, by the full name you know what that dimension represents). This makes the code a lot more readable.
Next to that, I regularly use dFoo=1;, dBar=2;, ... to denote the number of the dimension for a certain set of variables. That way, you can easily see that something like meanIncome = mean(income, dBar) takes the mean income over the Bars , while meanIncome = mean(income, 2) does not convey the same information. Since you also have to set the dVariables, it also serves as documentation of your variables.
While it is not technically incorrect to do something like iFoo + jBar or kBaz + dBar, it does raise some questions when these do occur in your code and they allow you to inspect that part more vigilantly. And that is what real (Applications) Hungarian Notation is all about.
(*) The only case where it might make some sense is where your complete framework/language asks you to use it. E.g. the win32 API uses it, so when you interface with that directly, you should use those standards to keep confusion to a minimum. However, I'd argue that it might make as much or even more sense to look for another framework/language.
Do note that this is something different from sigils as used in Perl, some BASIC dialects etc. These also convey the type, but in many implementations this is the type definition so no or little ambiguity is possible. It is another question whether it is good practice to use that kind of type declaration (and I'm not really sure about my own stance in this).
The reason Hungarian notation conveying type ("systems Hungarian") is frowned upon in Python is simple. It's misleading. A variable might be called iPhones (the integer number of phones, maybe :-) but because it's Python, there's nothing at all to keep you from putting something other than an integer into it! And maybe you will find you need to do that for some reason. And then all the code that uses it is very misleading to someone trying to understand it, unless of course you globally change the name of the variable.
This notation was intended to help you keep track of variable types in statically-typed languages and was arguably useful for a time. But it's obsolete now, even for statically typed languages, given the availability of IDEs that do the job in a much better way.
As it was proposed, Hungarian notation is a reasonable idea. As it was applied? It should be nuked from orbit (It's the only way to be sure.)
The accepted answer from the first question you link to applies the same to Python:
Hungarian notation has no place in Java. The Java API does not use it, and neither do most developers. Java code would not look like Java using it.
All this is also true for Python.
