Python - Implicit Boolean comparisons - python

I was reading PEP 8 (python.org), and I noticed that implicit comparisons with Booleans are preferred.
if booleanCond == True:  # Actually works
if booleanCond:  # Works too, and preferred according to PEP 8
Those two statements mean the same, but in most languages I know explicit comparison is preferred.
Can anyone explain (quickly?) why?
Thanks!

AFAIK explicit comparison is frowned upon in most languages. There is a question about this practice on the Software Engineering Stack Exchange.
The big picture is that if you need to explicitly compare your boolean condition to True, you might have a naming problem with your variable.
if is_blue: reads well (which is important in Python because it reduces the cognitive load on the programmer) and if is_blue is True: does not.
As usual, this is a heuristic and should not be applied dogmatically, but if you ever feel that you need to compare a boolean value to True or False to help your reader understand what you're doing, it might be worth questioning your naming for this variable.
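One practical reason the implicit form is the safer default: the two checks only agree when the value is an actual bool. A minimal sketch (the helper names are made up for illustration):

```python
def explicit_check(value):
    # Equality test: passes only for values equal to True (True, 1, 1.0).
    return value == True  # noqa: E712 - the style PEP 8 discourages


def implicit_check(value):
    # Truthiness test: the same thing a bare `if value:` evaluates.
    return bool(value)


# Print both results side by side for a few representative values.
for sample in (True, False, 1, 2, "yes", [], [0]):
    print(repr(sample), explicit_check(sample), implicit_check(sample))
```

For values such as 2, "yes" or [0], the explicit comparison is False while the truthiness test is True, so the two spellings are only interchangeable when the variable really holds a bool.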

Related

Python: Unable to run pseudocode in PyCharm [duplicate]

The following code is an example of what I think would qualify as pseudocode, since it does not execute in any language but the logic is correct.
string checkRubric(gpa, major)
    bool brake = false
    num lastRange
    num rangeCounter
    string assignment = "unassigned"
    array bus['business']= array('person a'=>array(0, 2.9), 'person b'=>array(3, 4))
    array cis['computer science']= array('person c'=>array(0, 2.9), 'person d'=>array(3, 4))
    array lib['english']= array('person e'=>array(0, 4))
    array rubric = array(bus, cis, lib)
    foreach (rubric as fieldAr)
        foreach (fieldAr as field => advisorAr)
            if (major == field)
                foreach (advisorAr as advisor => gpaRangeAr)
                    rangeCounter = 0
                    foreach (gpaRangeAr as gpaValue)
                        if (rangeCounter < 1)
                            lastRange = gpaValue
                        else if (gpa >= lastRange && gpa <= gpaValue)
                            assignment = advisor
                            brake = true
                            break
                        endif
                        rangeCounter++
                    endforeach
                    if (brake == true)
                        break
                    endif
                endforeach
                if (brake == true)
                    break
                endif
            endif
        endforeach
        if (brake == true)
            break
        endif
    endforeach
    return assignment
For the past couple of weeks I've been trying to create a clear definition of what pseudocode actually is. Is it relative to the programmer, or is there an actual clear-cut syntax? I say pseudocode is any code that does not execute; how about you? Thanks (links on this subject welcome)
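For comparison, here is one way the same lookup might look as runnable Python. This is only a sketch: the RUBRIC dictionary and the check_rubric name are this example's own, and the GPA ranges are copied from the listing above.

```python
# Advisors per major, each with an inclusive (low, high) GPA range.
RUBRIC = {
    "business": {"person a": (0, 2.9), "person b": (3, 4)},
    "computer science": {"person c": (0, 2.9), "person d": (3, 4)},
    "english": {"person e": (0, 4)},
}


def check_rubric(gpa, major):
    """Return the advisor whose range contains gpa, or "unassigned"."""
    for advisor, (low, high) in RUBRIC.get(major, {}).items():
        if low <= gpa <= high:
            return advisor  # returning early replaces the `brake` flag
    return "unassigned"


print(check_rubric(3.5, "business"))
```

Returning from inside the loop removes the need for the flag and the repeated break checks, which is part of why the listing reads more like real code than pseudocode.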
There is no fixed definition of pseudocode. It's any notation that you expect your audience to understand to get your point across. The important idea is that it is intended for humans to read, not computers, so it doesn't have to be precise. You can include the details that are important to your exposition, and omit the ones that are not.
Shamelessly ripped from Wikipedia:
Pseudocode is a compact and informal high-level description of a computer programming algorithm that uses the structural conventions of a programming language, but is intended for human reading rather than machine reading. Pseudocode typically omits details that are not essential for human understanding of the algorithm, such as variable declarations, system-specific code and subroutines.
There is a lot of code that does not execute. That does not mean it is pseudocode. Your "pseudocode" has a lot of extra stuff that non-programmers will not understand. Instead of being pseudocode, your "pseudocode" language is very, very close to an actual language.
Pseudocode should, in theory, be implementation independent. It presents logical steps in plain language of what to do. It is intended for human interpretation, not machine execution.
OP's example is a bit closer to actual code than pseudocode. For example, ++ is not found in all languages. It could also have a very different meaning in others.
Pseudo-code is any compact, human readable explanation of an algorithm or program. Since your program is not readable to me, I would say that it is not quite pseudo-code. Here is an example of pseudo-code:
def sum(x):
    result = 0
    for each entry in x:
        add current entry to result
    report result
Or, in a slightly different style:
sum(x):
    Let x be an array
    Let result be an integer representing the result, initially 0
    for item in x:
        result += item
    return result
You can use elements of a particular syntax (and, in fact, my pseudo-code tends to look a lot like Python), but it needs to be understandable by a wide audience and should not be obstructed by syntax. For example, I use "+=", but this is because it is highly compact and convenient, not because it is required. If you found "endforeach" helpful and convenient in your exposition, it would have been OK; however, I would argue that such a thing does not belong in pseudo-code, as it looks more stilted than helpful or explanatory.
Well, if I don't compile/link my C++ code, it won't execute, so I don't think "Code that doesn't execute" is an acceptable definition.
Likewise, scripting languages aren't executed directly; they're often interpreted.
My definition of pseudocode would be:
"[Concise] Code that is syntax agnostic, written to convey a function, behavior, or algorithm."
An outline of a program, written in a form that can easily be converted into real programming statements.
Pseudocode cannot be compiled nor executed, and there are no real formatting or syntax rules. It is simply one step - an important one - in producing the final code. The benefit of pseudocode is that it enables the programmer to concentrate on the algorithms without worrying about all the syntactic details of a particular programming language.
My two cents on this:
"I say pseudocode is any code that does not execute, how about you? Thanks (links to this subject welcome)"
That's not what I think of when thinking about its definition. Pseudocode is the steps your program will take to accomplish a task, in more detail than a description of the algorithm would give.
One thing in particular that I find extremely important about writing pseudocode is that it has to be understood by everyone, so that each individual can "port" it to their desired language. In other words, it has to be language agnostic.
As constructive criticism, I would not consider your example pseudocode for various reasons, but especially because you are using syntax and conventions that resemble a particular programming language. I say pseudocode should be programming-language agnostic so that it can be ported to several actual programming languages by different people.
EDIT:
Probably one more rule I would add to my definition is that it has to resemble human language more than a programming language. As in, equals instead of ==, assign instead of =. The reason behind this is that, for instance, assignment and equality operators differ between languages.
Pseudocode is what you'd write on the whiteboard if you want to get your ideas across quickly and clearly. In practice, for me, it's much like an untyped scripting language, but with much looser syntactical requirements. For me it looks much like C because, frankly, most programmers grok some language that is a variant on C syntax and so intuition is easier for more people (it used to look like Pascal, but that's because that was one of the first languages I learned in school).

Do “Clean Code”'s function argument number guidelines apply to API design?

I am a newbie reading Uncle Bob's Clean Code Book.
It is indeed great practice to keep the number of function arguments as small as possible. But I still come across many library functions that require a bunch of arguments. For example, in Python's pandas, there is a function with 9 arguments:
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
(And this function also violates the advice about flag arguments)
It seems that such cases are much rarer in Python standard libraries, but I still managed to find one with 4 arguments:
re.split(pattern, string, maxsplit=0, flags=0)
I understand that this is just a suggestion rather than a silver bullet, but is it applicable to cases like the ones mentioned above?
Uncle Bob does not mention a hard limit on the number of arguments beyond which your code smells, but I would consider 9 arguments too many.
Today's IDEs are much better at supporting the readability of the code; nevertheless, refactoring stays tricky, especially with a large number of arguments of the same type.
The suggested solution is to encapsulate the arguments in a single struct/object (depending on your language). In the given case, this could be a GroupingStrategy:
strategy = GroupingStrategy()
strategy.by = "Foo"
strategy.axis = 0
strategy.sorted = True
DataFrame.groupby(strategy)
Any attribute that is not set explicitly keeps its default value.
You could then also convert it to a fluent API:
DataFrame.groupby(GroupingStrategy.by("Foo").axis(0).sorted())
Or keep some of the arguments, if this feels better:
DataFrame.groupby("Foo", GroupingStrategy.default())
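A parameter object of this kind can be sketched in Python with a dataclass. Everything here is hypothetical: GroupingStrategy and group_records are illustration names, not pandas API, and the grouping works on plain dictionaries so the example stays self-contained.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupingStrategy:
    """Hypothetical parameter object bundling the grouping options."""
    by: str
    sort: bool = True      # default used when the caller does not set it
    dropna: bool = True    # skip records that lack the grouping key


def group_records(records, strategy):
    """Group a list of dicts according to the given strategy."""
    groups = defaultdict(list)
    for record in records:
        key = record.get(strategy.by)
        if key is None and strategy.dropna:
            continue
        groups[key].append(record)
    if strategy.sort:
        # Sort keys, placing a missing (None) key last.
        ordered = sorted(groups, key=lambda k: (k is None, k))
    else:
        ordered = list(groups)
    return {key: groups[key] for key in ordered}


records = [{"color": "red"}, {"color": "blue"}, {"color": "red"}, {"size": 1}]
print(group_records(records, GroupingStrategy(by="color")))
```

The call site now names only the options it changes, and adding a new option later does not alter the function signature.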
The first point to note is that all those arguments to groupby are relevant. You can reduce the number of arguments by having different versions of groupby but that doesn't help much when the arguments can be applied independently of each other, as is the case here. The same logic would apply to re.split.
It's true that integer "flag" arguments can be dodgy from a maintenance point of view: what happens if you want to change a flag value in your code? You have to hunt through and manually fix each case. The traditional approach is to use enums (which map numbers to words, e.g. a Day enum would have Day.Sun = 0, Day.Mon = 1, etc.). In compiled languages like C++ or C# this gives you the speed of using integers under the hood but the readability of using labels/words in your code. However, enums in Python are comparatively slow.
One rule that I think applies to any source code is to avoid "magic numbers", i.e. numbers which appear directly in the source code. The enum is one solution. Another solution is to have constant variables to represent different flag settings. Python sort-of supports constants (uppercase variable names in constant.py which you then import); however, they are constant only by convention, and you can actually change their value :(
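As a small illustration of named flags instead of magic numbers, here is a sketch using the standard enum.IntFlag. The Style class is a made-up example; the re.IGNORECASE flag at the end is the real standard-library constant that re.split accepts.

```python
import re
from enum import IntFlag


class Style(IntFlag):
    """Hypothetical flag set, used only for this illustration."""
    BOLD = 1
    ITALIC = 2
    UNDERLINE = 4


def describe(style):
    # List the names of the single-bit flags contained in `style`.
    return [flag.name for flag in Style if flag in style]


chosen = Style.BOLD | Style.UNDERLINE   # readable, yet still the integer 5
print(describe(chosen), int(chosen))

# The flags argument of re.split is used the same way: a named constant
# rather than a bare number.
parts = re.split(r"x", "aXbxc", flags=re.IGNORECASE)
print(parts)
```

The caller writes names, while the value passed around is still an ordinary integer, so existing integer-based interfaces keep working.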

Python: validate whether string is a float without conversion

Is there a Pythonic way to validate whether a string represents a floating-point number (any input that would be recognizable by float(), e.g. -1.6e3), without converting it (and, ideally, without resorting to throwing and catching exceptions)?
Previous questions have been submitted about how to check if a string represents an integer or a float. Answers suggest using try...except clauses together with the int() and float() built-ins, in a user-defined function.
However, these haven't properly addressed the issue of speed. While using the try...except idiom for this ties the conversion process to the validation process (to some extent rightfully), applications that go over a large amount of text for validation purposes (any schema validator, parsers) will suffer from the overhead of performing the actual conversion. Besides the slowdown due to the actual conversion of the number, there is also the slowdown caused by throwing and catching exceptions. This GitHub gist demonstrates how, compared to user-defined validation only, built-in conversion code is twice as costly (compare True cases), and exception handling time (False time minus True time for the try..except version) alone is as much as 7 validations. This answers my question for the case of integer numbers.
Valid answers will be: functions that solve the problem in a more efficient way than the try..except method, a reference to documentation for a built-in feature that will allow this in the future, a reference to a Python package that allows this now (and is more efficient than the try..except method), or an explanation pointing to documentation of why such a solution is not Pythonic, or will otherwise never be implemented. Specifically, to prevent clutter, please avoid answers such as 'No' without pointing to official documentation or mailing-list debate, and avoid reiterating the try..except method.
As @John mentioned in a comment, this appears as an answer in another question, though it is not the accepted answer in that case. Regular expressions and the fastnumbers module are two solutions to this problem.
However, it's duly noted (as @en_Knight did) that performance depends largely on the inputs. If you expect mostly valid inputs, then the EAFP approach is faster, and arguably more elegant. If you don't know what input to expect, then LBYL might be more appropriate. Validation, in essence, should expect mostly valid inputs, so it's better suited to try..except.
The fact is, for my use case (and as the writer of the question it bears relevance) of identifying types of data in a tabular data file, the try..except method was more appropriate: a column is either all float, or, if it has a non-float value, from that row on it's considered textual, so most of the inputs actually tested for float are valid in either case. I guess all those other answers were on to something.
Back to the answer: fastnumbers and regular expressions are still appealing solutions for the general case. Specifically, the fastnumbers package seems to work well for all values except for special ones, such as Infinity, Inf and NaN, as demonstrated in this GitHub gist. The same goes for the simple regular expression from the aforementioned answer (modified slightly - removed the trailing \b as it would cause some inputs to fail):
^[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+\b)?$
A bulkier version, that does recognize the special values, was used in the gist, and has equal performance:
^[-+]?(?:[Nn][Aa][Nn]|[Ii][Nn][Ff](?:[Ii][Nn][Ii][Tt][Yy])?|(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+\b)?)$
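A minimal sketch of how the bulkier pattern might be wrapped for validation, next to the try..except baseline it is compared against. The function names are this example's own. Note that the pattern is stricter than float() in at least two ways: it rejects surrounding whitespace and digit-group underscores, both of which float() accepts.

```python
import re

# The pattern from above, compiled once so repeated checks reuse it.
FLOAT_RE = re.compile(
    r"^[-+]?(?:[Nn][Aa][Nn]|[Ii][Nn][Ff](?:[Ii][Nn][Ii][Tt][Yy])?|"
    r"(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+\b)?)$"
)


def is_float_regex(text):
    # Validation only: nothing is converted and no exception is raised.
    return FLOAT_RE.match(text) is not None


def is_float_try(text):
    # Baseline: let float() decide and translate the exception.
    try:
        float(text)
    except ValueError:
        return False
    return True


for sample in ("-1.6e3", "1.", ".5", "nan", "Infinity", "abc", "1e", ""):
    print(repr(sample), is_float_regex(sample), is_float_try(sample))
```

On these samples the two functions agree; inputs such as " 1.5 " or "1_000" are where they diverge.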
The regular expression implementation is ~2.8 times slower on valid inputs, but ~2.2 times faster on invalid inputs. Invalid inputs run ~5 times slower than valid ones using try..except, or ~1.3 times faster using regular expressions. Given these results, it is favorable to use regular expressions when 40% or more of the expected inputs are invalid.
fastnumbers is merely ~1.2 times faster on valid inputs, but ~6.3 times faster on invalid inputs.
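The 40% threshold for regular expressions can be reproduced from the quoted ratios alone. The sketch below normalizes the cost of a valid try..except check to 1 and solves for the share of invalid inputs at which both methods cost the same on average; the constant and function names are this example's own.

```python
# Relative costs, with a valid try..except check as the unit.
TRY_VALID = 1.0
TRY_INVALID = 5.0                   # invalid inputs are ~5x slower with try..except
REGEX_VALID = 2.8                   # regex is ~2.8x slower on valid inputs
REGEX_INVALID = TRY_INVALID / 2.2   # regex is ~2.2x faster on invalid inputs


def expected_cost(valid_cost, invalid_cost, invalid_fraction):
    # Average cost per input for a given share of invalid inputs.
    return (1 - invalid_fraction) * valid_cost + invalid_fraction * invalid_cost


def break_even():
    # Solve expected_cost(try) == expected_cost(regex) for the invalid share p:
    # p = (rv - tv) / ((ti - tv) - (ri - rv))
    return (REGEX_VALID - TRY_VALID) / (
        (TRY_INVALID - TRY_VALID) - (REGEX_INVALID - REGEX_VALID)
    )


print(round(break_even(), 3))
```

The result lands just under 0.4, which matches the rule of thumb stated above.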
Results are described in the plot below. I ran with 10^6 repeats, with 170 valid inputs and 350 invalid inputs (weighted accordingly, so the average time is per a single input). Colors don't show because boxes are too narrow, but the ones on the left of each column describe timings for valid inputs, while invalid inputs are to the right.
NOTE The answer was edited multiple times to reflect on comments both to the question, this answer and other answers. For clarity, edits have been merged. Some of the comments refer to previous versions.
If being Pythonic is the justification, then you should just stick to The Zen of Python. Specifically these lines:
Explicit is better than implicit.
Simple is better than complex.
Readability counts.
There should be one-- and preferably only one --obvious way to do it.
If the implementation is hard to explain, it's a bad idea.
All of these are in favour of the try-except approach. The conversion is explicit, simple, readable, obvious and easy to explain.
Also, the only way to know if something is a float number is to test whether it's a float number. This may sound redundant, but it's not.
Now, if the main problem is speed when testing a very large number of supposed float strings, you could use a C extension written with Cython to test all of them at once. But I don't really think it will give you much of a speed improvement unless the number of strings to test is really big.
Edit:
Python developers tend to prefer the EAFP approach (Easier to Ask for Forgiveness than Permission), which makes the try-except approach more Pythonic (I can't find the PEP).
And here (Cost of exception handlers in Python) is a comparison between the try-except approach and the if-then approach. It turns out that in Python exception handling is not as expensive as it is in other languages, and it's only more expensive when an exception actually has to be handled. In general use cases you won't be validating strings that have a high probability of not being float numbers (unless that is your specific scenario).
Again, as I said in a comment: the entire question doesn't make much sense without a specific use case, data to test and a time measurement. For the most generic use case, try-except is the way to go; if you have an actual need that can't be satisfied fast enough with it, then you should add it to the question.
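A rough way to take such a measurement is sketched below with the standard timeit module. The sample strings and the time_inputs helper are assumptions made for this example, and the numbers will vary by machine and Python version, so no expected timings are shown.

```python
import timeit


def is_float(text):
    # The try-except check under discussion.
    try:
        float(text)
    except ValueError:
        return False
    return True


def time_inputs(inputs, repeats=10_000):
    # Average seconds per is_float call over all inputs.
    total = timeit.timeit(lambda: [is_float(s) for s in inputs], number=repeats)
    return total / (repeats * len(inputs))


valid = ["1", "-1.6e3", "3.14", ".5"]
invalid = ["abc", "1e", "--1", "1.2.3"]
print("valid:  ", time_inputs(valid))
print("invalid:", time_inputs(invalid))
```

Comparing the two printed averages shows how much the exception path costs on the inputs you actually care about.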
To prove a point: there's not that many conditions that a string has to abide by in order to be float-able. However, checking all those conditions in Python is going to be rather slow.
ALLOWED = "0123456789+-eE."
def is_float(string):
    # Note: signs are only accepted at position 0, so inputs with a signed
    # exponent such as "1e-5" are rejected by this check.
    minuses = string.count("-")
    if minuses == 1 and string[0] != "-":
        return False
    if minuses > 1:
        return False
    pluses = string.count("+")
    if pluses == 1 and string[0] != "+":
        return False
    if pluses > 1:
        return False
    points = string.count(".")
    if points > 1:
        return False
    small_es = string.count("e")
    large_es = string.count("E")
    es = small_es + large_es
    if es > 1:
        return False
    # A decimal point may not appear after the exponent marker.
    if (es == 1) and (points == 1):
        if small_es == 1:
            if string.index(".") > string.index("e"):
                return False
        else:
            if string.index(".") > string.index("E"):
                return False
    return all(char in ALLOWED for char in string)
I didn't actually test this, but I'm willing to bet that this is a lot slower than try: float(string); return True; except Exception: return False
Speedy Solution If You're Sure You Want it
Taking a look at this reference implementation - the conversion to float in python happens in C code and is executed very efficiently. If you really were worried about overhead, you could copy that code verbatim into a custom C extension, but instead of raising the error flag, return a boolean indicating success.
In particular, look at the complicated logic implemented to coerce hex into float. This is done in the C level, with a lot of error cases; it seems highly unlikely there's a shortcut here (note the 40 lines of comments arguing for one particular guarding case), or that any hand-rolled implementation will be faster while preserving these cases.
But... Necessary?
As a hypothetical, this question is interesting, but in the general case one should profile their code to confirm that the try/except method is actually adding overhead. Try/except is often idiomatic and, moreover, can be faster depending on your usage. For example, for loops in Python use exception handling by design: the loop ends when the iterator raises StopIteration.
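To make the for-loop remark concrete, here is roughly what a for loop does under the iterator protocol, written out by hand. The manual_for helper is an illustration only, not how the interpreter is literally implemented.

```python
def manual_for(iterable, action):
    """Call action(item) for each item, mimicking `for item in iterable`."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # Exhaustion is signalled by an exception, not a return value.
            break
        action(item)


collected = []
manual_for([1, 2, 3], collected.append)
print(collected)
```

Every loop that runs to completion ends with exactly one such exception, which is one reason exception handling is treated as ordinary control flow in Python.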
Alternatives and Why I Don't Like Them
To clarify, the question asks about
any input that would be recognizable by float()
Alternative #1 -- How about a regex
I find it hard to believe that you will get a regex to solve this problem in general. While a regex will be good at capturing float literals, there are a lot of corner cases. Look at all the cases on this answer - does your regex handle NaN? Exponentials? Bools (but not bool strings)?
Alternative #2: Manually Unrolled Python Check:
To summarize the tough cases that need to be captured (which Python natively does)
Case insensitive capturing of NaN
Hex matching
All of the cases enumerated in the language specification
Signs, including signs in the exponent
Booleans
I would also point you to the case listed just below floating points in the language specification: imaginary numbers. The floating method handles these elegantly by recognizing what they are, but throwing a type error on the conversion. Will your custom method emulate that behaviour?

Python usage of built in all function on list in return statement

I have a method, which needs to return the result of multiple checks for equality. It is an implementation of the __eq__ method of a class, which represents vocables in my application. Here is the code for the return statement:
return all((
    self.first_language_translations == other.first_language_translations,
    self.first_language_phonetic_scripts == other.first_language_phonetic_scripts,
    self.second_language_translations == other.second_language_translations,
    self.second_language_phonetic_scripts == other.second_language_phonetic_scripts
))
I've tested the runtime of this way of doing it and the other way, using and operators. The and operators are slightly faster, maybe 0.05s. That seems logical, because all has to build a tuple of those boolean values first and then run a function, which might do more than the corresponding and operators would have done. However, this is probably going to be executed a lot during my application's runtime.
Now I am wondering whether using all in such a case is good or bad practice and, if it is good practice, whether it is worth the slowdown. My application is all about vocables and might often need to check whether a vocable or an identical one is already in a list of vocables. It doesn't need to be super fast and I am thinking this might be micro-optimization, so I'd like to use the best practice for such a situation.
Is this a good usage of the built in all function?
No, that's not a good use of all(), since you have a small, fixed number of comparisons to make, and all() isn't even letting you represent it any more succinctly than you would when using and. Using and is more readable, and you should always put readability first unless you've profiled and performance is actually an issue. That said, using and is indeed a tiny bit faster in the worst case, and even faster on average because it'll short circuit on the first False rather than executing all the comparisons every time.
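The short-circuit difference can be observed directly by counting comparisons. In this sketch, Counted is a throwaway helper class invented for the demonstration; it is not part of the asker's code.

```python
class Counted:
    """Wraps a value and counts how often it is compared (illustration only)."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value


def equal_with_all(a, b):
    # The tuple is built first, so every comparison runs before all() starts.
    return all((a[0] == b[0], a[1] == b[1], a[2] == b[2]))


def equal_with_and(a, b):
    # `and` stops at the first comparison that is False.
    return a[0] == b[0] and a[1] == b[1] and a[2] == b[2]


left = [Counted(1), Counted(2), Counted(3)]
right = [Counted(9), Counted(2), Counted(3)]   # differs in the first field

Counted.calls = 0
equal_with_all(left, right)
calls_all = Counted.calls

Counted.calls = 0
equal_with_and(left, right)
calls_and = Counted.calls

print(calls_all, calls_and)
```

When the first field already differs, the all() version still performs all three comparisons while the and chain performs one.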

Does it make sense to use Hungarian notation prefixes in interpreted languages? [closed]

Closed 11 years ago.
First of all, I have taken a look at the following posts to avoid asking a duplicate question.
https://stackoverflow.com/questions/1184717/hungarian-notation
Why shouldn't I use "Hungarian Notation"?
Are variable prefixes (“Hungarian notation”) really necessary anymore?
Do people use the Hungarian Naming Conventions in the real world?
Now, all of these posts are related to C#, C++, Java - strongly typed languages.
I do understand that there is no need for the prefixes when the type is known before compilation.
Nevertheless, my question is:
Is it worthwhile to use the prefixes in interpreter-based languages, considering the fact that you can't see the type of the object before runtime?
Edit: If someone can make this post a community wiki, please do. I am hardly interested in the reputation (or negative reputation) from this post.
It depends on which of the two versions you refer to:
If you want to use the "real", original Hungarian notation, AKA Applications Hungarian notation, which denotes the logical type of a variable or its purpose, feel free to do so.
OTOH, the "misunderstood" version, AKA Systems Hungarian notation, which denotes just the physical variable type, is frowned upon and should not be used.
IMHO, it never(*) makes real sense to use Systems Hungarian (prefixing the data type). Whether you use a static or a dynamic language, the compiler or interpreter takes care of the type system. Annotating the type of a variable by means of the variable name can only cause ambiguity (e.g. imagine a float called intSomething).
It is completely different with regard to Application Hungarian, i.e. prefixing with some kind of usage pattern. I'd argue it is good practice to use this kind of notation, e.g. 'usValue' for an unsafe (i.e. unvalidated) value. This gives a visual cue as to the usage and prevents you from mixing different uses of variables which do have the same type but are not intended to be used together (or when they are intended to be used together, you at least have an idea as to what is being used and they produce a blip on your code checking radar).
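A small Python sketch of the unsafe/safe prefix idea follows. The us_ and s_ prefixes and the render_greeting function are illustrative conventions chosen for this example; html.escape is the standard-library function used here to do the actual sanitizing.

```python
import html


def render_greeting(s_name):
    # By convention this function only accepts values with the s_ prefix.
    return "<p>Hello, " + s_name + "</p>"


us_name = "<script>alert(1)</script>"   # us_ = unsafe: raw, unvalidated input
s_name = html.escape(us_name)           # s_ = safe: escaped for HTML output

page = render_greeting(s_name)
print(page)
```

Both variables are plain strings, so no type checker would object to render_greeting(us_name); the prefix is what makes that call look wrong at a glance.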
I frequently use such a thing in MATLAB, e.g. idxInterest to indicate that the array of doubles are not raw data values, but just the indexes (into another array) which are of interest in one way or the other. I regularly use selInterest (sel from select) to do the same with logical indexes (I agree this might look like borderline Systems Hungarian), but in many cases both can be used in the same context.
Similarly for iterators: I regularly use multidimensional arrays (e.g. 4D), in the odd case I run a (par)for over a dimension, the iterators are called iFoo, jBar, kBaz, ... while their upper limit is generally nFoo, nBar, nBaz, ... (or numFoo, ...). When doing more complicated index manipulation, you can easily see what index belongs to what dimension (by the prefix you know what numerical dimension is used, by the full name you know what that dimension represents). This makes the code a lot more readable.
Next to that, I regularly use dFoo=1;, dBar=2;, ... to denote the number of the dimension for a certain set of variables. That way, you can easily see that something like meanIncome = mean(income, dBar) takes the mean income over the Bars, while meanIncome = mean(income, 2) does not convey the same information. Since you also have to set the dVariables, it also serves as documentation of your variables.
While it is not technically incorrect to do something like iFoo + jBar or kBaz + dBar, it does raise some questions when these do occur in your code and they allow you to inspect that part more vigilantly. And that is what real (Applications) Hungarian Notation is all about.
(*) The only case where it might make some sense is when your complete framework/language asks you to use it. E.g. the win32 API uses it, so when you interface with that directly, you should use those standards to keep confusion to a minimum. However, I'd argue that it might make as much or even more sense to look for another framework/language.
Do note that this is something different from sigils as used in Perl, some BASIC dialects etc. These also convey the type, but in many implementations this is the type definition so no or little ambiguity is possible. It is another question whether it is good practice to use that kind of type declaration (and I'm not really sure about my own stance in this).
The reason Hungarian notation conveying type ("systems Hungarian") is frowned upon in Python is simple. It's misleading. A variable might be called iPhones (the integer number of phones, maybe :-) but because it's Python, there's nothing at all to keep you from putting something other than an integer into it! And maybe you will find you need to do that for some reason. And then all the code that uses it is very misleading to someone trying to understand it, unless of course you globally change the name of the variable.
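The point is easy to demonstrate in a few lines. The variable name follows the iPhones example above, written in Python's usual snake_case; describe is a helper added for this sketch.

```python
i_phones = 3          # the prefix promises an integer...
i_phones = "three"    # ...but Python happily rebinds the name to a str


def describe(value):
    # The only reliable source of the type is the object itself.
    return type(value).__name__


print(describe(i_phones))
```

Nothing in the language ties the prefix to the value, so the name can drift out of date without any error.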
This notation was intended to help you keep track of variable types in statically-typed languages and was arguably useful for a time. But it's obsolete now, even for statically typed languages, given the availability of IDEs that do the job in a much better way.
As it was proposed, Hungarian notation is a reasonable idea. As it was applied? It should be nuked from orbit (It's the only way to be sure.)
The accepted answer from the first question you link to applies equally to Python:
Hungarian notation has no place in Java. The Java API does not use it, and neither do most developers. Java code would not look like Java using it.
All this is also true for Python.
