What is the preferred way to implement 'yield' in Scala? - python

I am writing code for PhD research and starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for implementing complex iterators over large, often irregularly structured text files. Similar constructs exist in other languages (e.g. C#), for good reason.
Yes I know there have been previous threads on this. But they look like hacked-up (or at least badly explained) solutions that don't clearly work well and often have unclear limitations. I would like to write code something like this:
import generator._

def yield_values(file:String) = {
  generate {
    for (x <- Source.fromFile(file).getLines()) {
      // Scala is already using the 'yield' keyword.
      give("something")
      for (field <- ":".r.split(x)) {
        if (field contains "/") {
          for (subfield <- "/".r.split(field)) { give(subfield) }
        } else {
          // Scala has no 'continue'. IMO that should be considered
          // a bug in Scala.
          // Preferred: if (field.startsWith("#")) continue
          // Actual: Need to indent all following code
          if (!field.startsWith("#")) {
            val some_calculation = { ... do some more stuff here ... }
            if (some_calculation && field.startsWith("r")) {
              give("r")
              give(field.slice(1))
            } else {
              // Typically there will be a good deal more code here to handle different cases
              give(field)
            }
          }
        }
      }
    }
  }
}
I'd like to see the code that implements generate() and give(). BTW give() should be named yield() but Scala has taken that keyword already.
I gather that, for reasons I don't understand, Scala continuations may not work inside a for statement. If so, generate() should supply an equivalent function that works as close as possible to a for statement, because iterator code with yield almost inevitably sits inside a for loop.
Please, I would prefer not to get any of the following answers:
'yield' sucks, continuations are better. (Yes, in general you can do more with continuations. But they are hella hard to understand, and 99% of the time an iterator is all you want or need. If Scala provides lots of powerful tools but they're too hard to use in practice, the language won't succeed.)
This is a duplicate. (Please see my comments above.)
You should rewrite your code using streams, continuations, recursion, etc. etc. (Please see #1. I will also add, technically you don't need for loops either. For that matter, technically you can do absolutely everything you ever need using SKI combinators.)
Your function is too long. Break it up into smaller pieces and you won't need 'yield'. You'd have to do this in production code, anyway. (First, "you won't need 'yield'" is doubtful in any case. Second, this isn't production code. Third, for text processing like this, very often, breaking the function into smaller pieces -- especially when the language forces you to do this because it lacks the useful constructs -- only makes the code harder to understand.)
Rewrite your code with a function passed in. (Technically, yes you can do this. But the result is no longer an iterator, and chaining iterators is much nicer than chaining functions. In general, a language should not force me to write in an unnatural style -- certainly, the Scala creators believe this in general, since they provide shitloads of syntactic sugar.)
Rewrite your code in this, that, or the other way, or some other cool, awesome way I just thought of.

The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestion that does the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, and given that you're already familiar with Python and really like some of its features and design choices, why not use it instead?
Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.
Here's how you do it.
// You want to write
for (x <- xs) { /* complex yield in here */ }
// Instead you write
xs.iterator.flatMap { /* Produce iterators in here */ }
// You want to write
yield(a)
yield(b)
// Instead you write
Iterator(a,b)
// You want to write
yield(a)
/* complex set of yields in here */
// Instead you write
Iterator(a) ++ /* produce complex iterator here */
That's it! All your cases can be reduced to one of these three.
In your case, your example would look something like
Source.fromFile(file).getLines().flatMap(x =>
  Iterator("something") ++
  ":".r.split(x).iterator.flatMap(field =>
    if (field contains "/") "/".r.split(field).iterator
    else {
      if (!field.startsWith("#")) {
        /* vals, whatever */
        if (some_calculation && field.startsWith("r")) Iterator("r", field.drop(1)) // drop(1), since slice needs two indices in Scala
        else Iterator(field)
      }
      else Iterator.empty
    }
  )
)
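For a self-contained illustration of the same technique, here is a minimal, runnable sketch. The file name data.txt and the helper someCalculation are invented stand-ins for the question's placeholders:

import scala.io.Source

object FlatMapYield {
  // Hypothetical stand-in for the question's "some_calculation".
  def someCalculation(field: String): Boolean = field.nonEmpty

  def yieldValues(file: String): Iterator[String] =
    Source.fromFile(file).getLines().flatMap { x =>
      Iterator("something") ++
      ":".r.split(x).iterator.flatMap { field =>
        if (field contains "/") "/".r.split(field).iterator
        else if (field.startsWith("#")) Iterator.empty
        else if (someCalculation(field) && field.startsWith("r"))
          Iterator("r", field.drop(1))
        else Iterator(field)
      }
    }

  def main(args: Array[String]): Unit =
    yieldValues("data.txt").foreach(println)  // evaluates lazily, like a generator
}

Because the result is an Iterator, it chains with other iterators exactly the way chained generators do in Python.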
P.S. Scala does have continue; it's done like so (implemented by throwing stackless (light-weight) exceptions):
import scala.util.control.Breaks._
for (blah) { breakable { ... break ... } }
but that won't get you what you want because Scala doesn't have the yield you want.
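To make the breakable idiom concrete, here is a small runnable sketch of using it as a per-iteration continue (the list contents are invented for the example):

import scala.util.control.Breaks._

object ContinueDemo {
  def main(args: Array[String]): Unit = {
    val fields = List("#comment", "alpha", "#skip", "beta")
    for (field <- fields) breakable {
      if (field.startsWith("#")) break()  // acts like 'continue'
      println(field)                      // prints: alpha, beta
    }
  }
}

Placing breakable inside the loop body means break only abandons the current iteration, which is exactly continue's behavior.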

'yield' sucks, continuations are better
Actually, Python's yield is a continuation.
What is a continuation? A continuation saves the present point of execution with all its state, so that one can continue at that point later. That's precisely what Python's yield does, and precisely how it is implemented.
It is my understanding that Python's continuations are not delimited, however. I don't know much about that -- I might be wrong, in fact. Nor do I know what the implications of that may be.
Scala's continuations do not work at run-time -- in fact, there's a continuations library for Java that works by manipulating bytecode at run-time, which is free of the constraints that Scala's continuations have.
Scala's continuations are done entirely at compile time, which requires quite a bit of work. It also requires that the code that will be "continued" be prepared by the compiler to do so.
And that's why for-comprehensions do not work. A statement like this:
for { x <- xs } proc(x)
is translated into
xs.foreach(x => proc(x))
where foreach is a method on xs's class. Unfortunately, that class was compiled long ago, so it cannot be modified to support the continuation. As a side note, that's also why Scala doesn't have continue.
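A sketch of the type mismatch involved (simplified; _yield is the CPS-annotated yield from the generator implementation shown later on this page):

val xs = List(1, 2, 3)
// foreach's signature, in the long-since-compiled collections library:
//   def foreach[U](f: Int => U): Unit
xs.foreach(x => println(x))    // fine: a plain Int => Unit
// A CPS-transformed body would instead have a type like
// Int => (Unit @cpsParam[...]), and the untransformed bytecode of
// foreach has no way to propagate that annotation:
// xs.foreach(x => _yield(x))  // rejected by the continuations plugin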
Aside from that, yes, this is a duplicate question, and, yes, you should find a different way to write your code.

The implementation below provides a Python-like generator.
Notice that there's a function called _yield in the code below, because yield is already a keyword in Scala (which, by the way, has nothing to do with the yield you know from Python).
import scala.annotation.tailrec
import scala.collection.immutable.Stream
import scala.util.continuations._

object Generators {
  sealed trait Trampoline[+T]
  case object Done extends Trampoline[Nothing]
  case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

  class Generator[T](var cont: Unit => Trampoline[T]) extends Iterator[T] {
    def next: T = {
      cont() match {
        case Continue(r, nextCont) => cont = nextCont; r
        case _ => sys.error("Generator exhausted")
      }
    }
    // Caveat: this invokes the continuation again, so a body with side
    // effects will run them once for hasNext and once more for next.
    def hasNext = cont() != Done
  }

  type Gen[T] = cps[Trampoline[T]]

  def generator[T](body: => Unit @Gen[T]): Generator[T] = {
    new Generator((_: Unit) => reset { body; Done })
  }

  def _yield[T](t: T): Unit @Gen[T] =
    shift { (cont: Unit => Trampoline[T]) => Continue(t, cont) }
}

object TestCase {
  import Generators._

  def sectors = generator {
    def tailrec(seq: Seq[String]): Unit @Gen[String] = {
      if (!seq.isEmpty) {
        _yield(seq.head)
        tailrec(seq.tail)
      }
    }
    val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
    tailrec(list)
  }

  def main(args: Array[String]): Unit = {
    for (s <- sectors) { println(s) }
  }
}
It works pretty well, including for the typical usage of for loops.
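With the continuations compiler plugin enabled, running TestCase.main prints:

Financials
Materials
Technology
Utilities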
Caveat: we need to remember that Python and Scala differ in the way continuations are implemented. Below we see how generators are typically used in Python and compare that to the way we have to use them in Scala. Then we will see why it needs to be this way in Scala.
If you are used to writing code in Python, you've probably used generators like this:
// This is Scala code that does not compile :(
// This code naively tries to mimic the way generators are used in Python
def myGenerator = generator {
  val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
  list foreach { s => _yield(s) }
}
This code above does not compile. Skipping all convoluted theoretical aspects, the explanation is: it fails to compile because "the type of the for loop" does not match the type involved as part of the continuation. I'm afraid this explanation is a complete failure. Let me try again:
If you had coded something like shown below, it would compile fine:
def myGenerator = generator {
  _yield("Financials")
  _yield("Materials")
  _yield("Technology")
  _yield("Utilities")
}
This code compiles because the generator can be decomposed into a sequence of yields and, in this case, a yield matches the type involved in the continuation. To be more precise, the code can be decomposed into chained blocks, where each block ends with a yield. Just for the sake of clarification, we can think of the sequence of yields as being expressed like this:
{ some code here; _yield("Financials")
  { some other code here; _yield("Materials")
    { eventually even some more code here; _yield("Technology")
      { ok, fine, you've got the idea, right?; _yield("Utilities") }}}}
Again, without going deep into convoluted theory, the point is that, after a yield, you need to provide another block that ends with a yield, or close the chain otherwise. This is what we are doing in the pseudo-code above: after the first yield we open another block which in turn ends with a yield, followed by yet another block ending with another yield, and so on. Obviously this thing must end at some point. Then the only thing we are allowed to do is close the entire chain.
OK. But... how can we yield multiple pieces of information? The answer is a little obscure, but makes a lot of sense once you know it: we need to employ tail recursion, and the last statement of a block must be a yield.
def myGenerator = generator {
  def tailrec(seq: Seq[String]): Unit @Gen[String] = {
    if (!seq.isEmpty) {
      _yield(seq.head)
      tailrec(seq.tail)
    }
  }
  val list = List("Financials", "Materials", "Technology", "Utilities")
  tailrec(list)
}
Let's analyze what's going on here:
Our generator function myGenerator contains the logic that generates information. In this example, we simply use a sequence of strings.
Our generator function myGenerator calls a recursive function which is responsible for yield-ing multiple pieces of information, obtained from our sequence of strings.
The recursive function must be declared before use, otherwise the compiler crashes.
The recursive function tailrec provides the tail recursion we need.
The rule of thumb here is simple: substitute a for loop with a recursive function, as demonstrated above.
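For instance, here is a sketch of a Python-style range generator built on the Generators object above (the names range and loop are invented for the example):

import Generators._

// Python's `for i in range(from, until): yield i` becomes a
// tail-recursive helper whose last statement is a _yield.
def range(from: Int, until: Int) = generator[Int] {
  def loop(i: Int): Unit @Gen[Int] = {
    if (i < until) {
      _yield(i)
      loop(i + 1)
    }
  }
  loop(from)
}

// range(1, 4) produces 1, 2, 3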
Notice that tailrec is just a convenient name, chosen for the sake of clarity. In particular, tailrec does not need to be the last statement of our generator function. The only restriction is that you have to provide a sequence of blocks which match the type of a yield, like shown below:
def myGenerator = generator {
  def tailrec(seq: Seq[String]): Unit @Gen[String] = {
    if (!seq.isEmpty) {
      _yield(seq.head)
      tailrec(seq.tail)
    }
  }
  _yield("Before the first call")
  _yield("OK... not yet...")
  _yield("Ready... steady... go")
  val list = List("Financials", "Materials", "Technology", "Utilities")
  tailrec(list)
  _yield("done")
  _yield("long life and prosperity")
}
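Iterating this generator, e.g. with for (s <- myGenerator) println(s), prints the values in exactly the order they were yielded:

Before the first call
OK... not yet...
Ready... steady... go
Financials
Materials
Technology
Utilities
done
long life and prosperity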
One step further, you may be wondering what real-life applications look like, in particular if you are employing several generators. It would be a good idea to standardize your generators around a single pattern that proves convenient for most circumstances.
Let's examine the example below. We have three generators: sectors, industries and companies. For brevity, only sectors is shown completely. This generator employs a tailrec function like the one already demonstrated above. The trick here is that the same tailrec function is also employed by the other generators; all we have to do is supply a different body function.
type GenP = (NodeSeq, NodeSeq, NodeSeq)
type GenR = immutable.Map[String, String]

def tailrec(p: GenP)(body: GenP => GenR): Unit @Gen[GenR] = {
  val (stats, rows, header) = p
  if (!stats.isEmpty && !rows.isEmpty) {
    val heads: GenP = (stats.head, rows.head, header)
    val tails: GenP = (stats.tail, rows.tail, header)
    _yield(body(heads))
    // tail recursion
    tailrec(tails)(body)
  }
}

def sectors = generator[GenR] {
  def body(p: GenP): GenR = {
    // unpack arguments
    val (stat, row, header) = p
    // obtain name and url
    val name = (row \ "a").text
    val url  = (row \ "a" \ "@href").text
    // create map and populate fields: name and url
    val m = new scala.collection.mutable.HashMap[String, String]
    m.put("name", name)
    m.put("url", url)
    // populate other fields
    (header, stat).zipped.foreach { (k, v) => m.put(k.text, v.text) }
    // returns an immutable map (GenR)
    m.toMap
  }
  val root  : scala.xml.NodeSeq = cache.loadHTML5(urlSectors) // obtain entire page
  val header: scala.xml.NodeSeq = ... // code is omitted
  val stats : scala.xml.NodeSeq = ... // code is omitted
  val rows  : scala.xml.NodeSeq = ... // code is omitted
  // tail recursion
  tailrec((stats, rows, header))(body)
}

def industries(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
    //++ similar to 'body' demonstrated in "sectors"
    // returns a map
    m
  }
  //++ obtain NodeSeq variables, like demonstrated in "sectors"
  // tail recursion
  tailrec((stats, rows, header))(body)
}

def companies(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
    //++ similar to 'body' demonstrated in "sectors"
    // returns a map
    m
  }
  //++ obtain NodeSeq variables, like demonstrated in "sectors"
  // tail recursion
  tailrec((stats, rows, header))(body)
}
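Once the omitted NodeSeq code is filled in, these generators compose with plain for loops just like the earlier examples, e.g.:

for (m <- sectors) {
  println(m("name") + " -> " + m("url"))
}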
Credits to Rich Dougherty and huynhjl.
See this SO thread: Implementing yield (yield return) using Scala continuations
Credits to Miles Sabin, for putting some of the code above together
http://github.com/milessabin/scala-cont-jvm-coro-talk/blob/master/src/continuations/Generators.scala

Related

How does the Python @cache decorator work?

I recently learned about the @cache decorator in Python and was surprised how well it worked and how easily it could be applied to any function. Like many others before me, I tried to replicate this behavior in C++ without success (I tried to recursively calculate the Fibonacci sequence). The problem was that the internal calls didn't get cached. This is not a problem if I modify the original function, but I want it to be a decorator so that it can be applied anywhere. I am trying to decipher the Python @cache decorator from the source code, but couldn't make out a lot, so I can't figure out how (IF it is even possible) to replicate this behavior elsewhere.
Is there a way to cache the internal calls also?
This is a simple way to add memoisation to the fib function. What I want is to build a decorator so that I can wrap any function, just like the one in Python.
class CacheFib {
public:
  CacheFib() {}
  unsigned long long fib(int n) {
    auto hit = cache_pool.find(n);
    if (hit != cache_pool.end()) {
      return hit->second;
    } else if (n <= 1) {
      return n;
    } else {
      auto miss = this->fib(n - 1) + this->fib(n - 2);
      cache_pool.insert({n, miss});
      return miss;
    }
  }
  std::map<int, unsigned long long> cache_pool;  // value type widened to match fib's return type
};
The decorator approach below caches only the outermost call, meaning that if I call cachedFib(40) twice, the second time it will be O(1). It doesn't actually cache the internal recursive calls, so it doesn't help with performance.
// A PROTOTYPE IMPLEMENTATION
template <typename Func> class CacheDecorator {
public:
  CacheDecorator(Func fun) : function(fun) {}
  int operator()(int n) {
    auto hit = cache_pool.find(n);
    if (hit != cache_pool.end()) {
      return hit->second;
    } else {
      auto miss = function(n);
      cache_pool.insert({n, miss});
      return miss;
    }
  }
  std::function<Func> function;
  std::map<int, int> cache_pool;
};

int fib(int n) {
  if (n == 0 || n == 1) {
    return n;
  } else
    return fib(n - 1) + fib(n - 2);
}

// main
auto cachedFib = CacheDecorator<decltype(fib)>(fib);
cachedFib(**);
Also, any information on the @cache decorator or any C++ implementation ideas would be helpful.
So, as you're finding out, Python and C++ are different languages.
The key difference in this context is that in Python, the function name fib is looked up at run-time, even for the recursive call; meanwhile, in C++, the function name is looked up at compile-time, so by the time your CacheDecorator gets to it, it's too late.
A few possibilities:
Move the lookup of fib to run-time; you can do this either by using an explicit function pointer, or by making it a dynamic method. Either of those would mean coding the fib function differently (see the sketch after this list).
Some sort of terrible, platform-dependent hack to overwrite either the function address table or the beginning of the function itself. This is going to be deep magic, particularly in the face of optimisations; the compiler might write out multiple copies of the function, or it might turn a recursive call into a loop, for example.
Move the implementation of the CacheDecorator to compile-time, as a pre-processor macro. That's probably the best way to preserve the intent of a Python decorator.
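To sketch the first option: the trick is to route every recursive call through the caching wrapper at run-time instead of binding it to the raw function at compile-time (open recursion). For brevity the sketch below is in Scala, the language of the first question on this page; the names memoize and fib are mine, but the same shape can be reproduced in C++ with std::function:

import scala.collection.mutable

// fib receives "self" and recurses through it, so the wrapper can
// intercept every internal call and consult the cache first.
def memoize(f: (Int => Long, Int) => Long): Int => Long = {
  val cache = mutable.Map.empty[Int, Long]
  lazy val self: Int => Long = n => cache.getOrElseUpdate(n, f(self, n))
  self
}

val fib = memoize((self, n) => if (n <= 1) n else self(n - 1) + self(n - 2))
// fib(40) now fills the cache top-down: the internal calls are cached too.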
Ideally, write Python in Python and C++ in C++; the languages each have their own idioms, which don't generally translate to each other in a one-to-one fashion.
Trying to write Python-style code in C++ will always result in code that's somewhat alien, even in cases where it is possible. Much better to become fluent in the idioms of the language you're using.
I think that internal calls are not being cached because once the CacheDecorator is invoked, the CacheDecorator::operator() is only invoked once. After that, the CacheDecorator::function is recursively invoked. This is problematic because you want to check the cache at every recursive call of CacheDecorator::function; however, this does not occur because your cache checking code is in CacheDecorator::operator(). Consequently, the first and only time CacheDecorator::operator() is invoked is when you invoke it in main, which is also the first and only time the cache is checked.
I also think that you would only encounter your issue if the passed in function uses recursion to compute its return value.
I may have made a mistake, but that's what I think the issue is.
I think that one way to fix this would be to accumulate and return a vector/map of computed values. Once your CacheDecorator::function is complete, you can then cache the returned vector/map. This would require you to modify your fib function, so this may not be a good solution. This modification also does not perfectly replicate Python's @cache decorator functionality, since the programmer is essentially expected to store pre-computed values.

Writing in MIPS a function that converts a string representing an integer (in a given base) to the corresponding number in base 10

Suppose we have a string representing an integer in a given base, and that very base.
The parameters are the address (in $a0) starting from which the string is stored, and the base in $a1.
I should convert the corresponding number into base 10 and save it in $v0; in that case, 0 should be loaded into $v1. If instead the string does not correctly represent an integer in the given base, then in the end $v0 should contain -1 and $v1 should contain 1.
Also, the function that actually performs the conversion should be recursive.
I have written beforehand a Python program in such a way (you'll notice the various s0, s1, etc.) that I could transfer the thinking to MIPS, but it got really confusing when I realised I should probably perform the counting of the characters of the string, the check that the string really is "in base", and the actual conversion (gradually summing quantities into a designated variable) all together, and not separately as in the program below.
How should I go about this and write the function(s)?
Here's the Python code:
digitsDict = dict()
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper()

for i in range(26):
    digitsDict.update({lowerCase[i]: i + 10})
    digitsDict.update({upperCase[i]: i + 10})

def isInBase(string, base):
    s1 = False
    for char in string:
        if (str(char).isdigit()):
            if (int(char) >= base):
                return s1
        else:
            if (char not in digitsDict.keys() or digitsDict[char] >= base):
                return s1
        s1 = True
    return s1

def convert(string, base, k=0):
    s2 = 0
    char = string[k]
    l = len(string) - 1
    if str(char).isdigit(): s2 += int(char)*(base**(l-k))
    else: s2 += digitsDict[char]*(base**(l-k))
    if k == l: return s2
    return s2 + convert(string, base, k+1)

def strToInt(string, base):
    return (convert(string, base, 0), 0)

def main(a0String, a1Integer):
    if isInBase(a0String, a1Integer):
        v0, v1 = strToInt(a0String, a1Integer)
    else:
        v0 = -1; v1 = 1
    print(v0)
    if (v0 == -1): return 22  # MIPS error code for invalid argument
    return 0  # OK
First, you have pseudo code, so very good!
Next, as a general rule, make sure your pseudo code actually works (so test it), as debugging design issues in assembly is very difficult, and small changes to the design (in pseudo code) can require large changes in the assembly (e.g. rewriting a lot of code).
You're doing some things in Python that are rather involved in assembly language, so you ought to write your pseudo code in C first, since that will make you address those differences.
in — represents a loop in C or assembly
+= and + on strings — represents either some simple operations on a global buffer, or some complicated memory management (hint: go for the former)
.isdigit() — is a conditional conjunction or disjunction, depending on how you code it
.update() — will need some alternative translation
** — also represents a loop in C or assembly
The next big complication is the calling of functions. Function calling in MIPS assembly language is probably the most complex topic, due to the requirements of register usage and stack handling.
Recursion is a bit of a red herring for assembly language: it appears hard, but it is actually no harder and even no different in assembly language than functions calling other functions without involving recursion (this is true assuming the instruction set has a runtime stack for function calling, which MIPS does; if it didn't, you'd have to simulate a call stack).
You'll need to study up on this; it is somewhat involved. Look for other examples, such as fibonacci.
When translating to MIPS one function that calls another function, we need an analysis of variables and values that are "live across a function call". These variables need special consideration, since some registers are wiped out by calling another function. You'll see this in fibonacci, for example.
Part of recursive fib is fib(n-1)+fib(n-2). The return value from the first call to fib must be given special handling, since it is ultimately needed for the addition (+); it is live across the 2nd call to fib, and without special handling it will be lost by making that 2nd call. Also, n is live across the first call yet needed again to make the second call, and without some special handling it would similarly be wiped out by that first call.
This is a consequence of functions calling functions in MIPS, whose instruction set does not automatically stack things and requires the assembly language programmer or compiler to do so manually. Some or all of this is also needed on other architectures. This is not a consequence of recursion, so a(n)+b(n) would involve the same requirements for analysis and special handling of variables and values.
When you have a good algorithm (e.g. in C) and it works, you're ready for assembly. Translate your data first: global variables. Next translate functions, similarly: translate your local variables into MIPS, then translate each structured statement (e.g. if, for) following its assembly language pattern. Nested structured statements also nest in assembly language, and it doesn't matter whether you translate the inner statements first or the outer statements first, as long as you exactly observe the nesting.
In order to translate structured statements into assembly I first use intermediate forms in C, for example:
for ( int i = 0; i < n; i++ ) {
    ..body of for..
}
Note that in the above, the "body of the for" could easily contain control structures, like another for or an if. These, then, are nested structured statements — just translate them one at a time, each structured statement following its complete pattern, and make sure the nesting in C is equally observed in assembly.
Translate for to while:
...
int i = 0;
while ( i < n ) {
    ..body of for..
    i++;
}
...
Next, apply "if-goto-label" intermediate form:
...
int i = 0;
loop1:
    if ( i >= n ) goto endLoop1;
    ..body of for..
    i++;
    goto loop1;
endLoop1:
...
As you apply patterns, you'll need to create label names; new label names for each application of the pattern.
Then this translates into assembly fairly easily.  If the "body of for" also has control structures make sure their entire translation is embedded and located at the "body of for" section in the above pattern.
Notice how the i < n in C sometimes translates into i >= n in assembly — here because control flow in if-goto-label form says: exit the loop on the logically opposite condition of the while statement in C.
And finally, translate expressions within statements in your C code into assembly.

Finding the input dependencies of a function's outputs

I've been working on a Python program with pycparser which is supposed to generate a JSON file with the dependencies of a given function and its outputs.
For an example function:
int Test(int testInput)
{
    int b = testInput;
    return b;
}
Here I would expect b to be dependent on testInput. But of course it can get a lot more complicated, with structs and if-statements etc. The files I'm testing also have functions in a specific form that are considered inputs and outputs, as in:
int Test(int testInput)
{
    int anotherInput = DatabaseRead(VariableInDatabase);
    int b = testInput;
    int c;
    c = anotherInput + 1;
    DatabaseWrite(c);
    return b;
}
Here c would be dependent on VariableInDatabase, and b same as before.
I've run into a wall with this analysis in pycparser as mostly structs and pointers are really hard for me to handle, and it seems like there'd be a better way. I've read into ASTs and CFGs, and other analysis tools like Frama-C but I can't seem to find a clear answer if this is even a thing.
Is there a known way to do this kind of analysis, and if so, what should I be looking into?
It's supposed to handle thousands of files and be able to output these dependencies into a JSON, so plugins for editors don't seem like what I'm looking for.
You need data flow analysis of your code, and then you want to follow the data flow backwards from a result to its sources, up to some stopping point (in your case, you stopped at a function parameter but you probably also want to stop at any global variable).
This is called program slicing in the literature.
Computing data flows is pretty hard, especially if you have a complex language (C is fun: you can have data flows through indirectly called functions that read values; now you need indirect points-to analysis to support your data flow, and vice versa).
Here's a fun example:
// ocean of functions:
...
int a(){ return b; }
...
int p(){ return q; }
...
int foo( int (*x)() )
{ return (*x)(); }
Does foo depend on b? On q? You can't know unless you know whether foo calls a or p. But foo is handed a function pointer... and what might that point to?
Using just ASTs and CFGs is necessary but not sufficient; data flow analysis algorithms are hard, especially at scale (which you suggest you have); you need a lot of machinery to do this that is not easy to build. [We've done this on C programs of 16 million lines.] See my essay on Life After Parsing.

Does something like Python generator exist in objective-c?

Does something like Python generator exist in objective-c ?
I have the following code in a few places; is there some way to simplify it?
int maxWinInRow = [self maxWinInRow];
// how many wins in row
for (int i = 1; i <= maxWinInRow; i++) {
    NSString *key = [NSString stringWithFormat:@"%d", i];
    NSNumber *value = [_winsInRow valueForKey:key];
    int numberOfWinInRow = value.intValue;
    // only this line is specific
    gameScore = gameScore + (pow(6, i) * numberOfWinInRow);
}
To be specific, there is no such generator pattern built into the Objective-C programming language. However, with the introduction of "blocks" in Objective-C (and C, with LLVM), it has become somewhat possible to build your own generator pattern in Objective-C.
If you are serious about learning this, you can go through this article by Mike Ash.
However, please be cautious when working with blocks, as they occasionally incur extra memory-management overhead, or else they might result in a potential memory leak in your app. So if you want to use the described pattern, please make sure you have a proper understanding of "blocks". Otherwise, I would advise you to refactor your code into a basic method that would work as your stub, as described by Wain. Certainly that won't allow the lazy initialization that the pattern implies, but it's a better-than-nothing solution in Objective-C.

Python, why elif keyword? [closed]

I just started Python programming, and I'm wondering about the elif keyword.
Other programming languages I've used before use else if. Does anyone have an idea why the Python developers added the additional elif keyword?
Why not:
if a:
    print("a")
else if b:
    print("b")
else:
    print("c")
Far as I know, it's there to avoid excessive indentation. You could write
if x < 0:
    print 'Negative'
else:
    if x == 0:
        print 'Zero'
    else:
        print 'Positive'
but
if x < 0:
    print 'Negative'
elif x == 0:
    print 'Zero'
else:
    print 'Positive'
is just so much nicer.
Thanks to ign for the docs reference:
The keyword elif is short for 'else if', and is useful to avoid excessive indentation.
Languages with C-like syntax get else if for free without having to implement it at all.
The reason is that in that syntax control structures simply operate on the next statement, which can be a compound statement enclosed in braces if necessary (e.g. { x += 1; y += 1 }).
This means that once you've implemented if and else, else if just falls out of the grammar of the language naturally for free, with no further implementation effort. To see why, have a look at this:
if (condition) {
    if_body();
} else if (another_condition) {
    else_if_body();
} else {
    else_body();
}
This looks like an if with an else if and an else attached, each applied to a compound statement. But in fact it's not. This is actually two separate if statements, each with exactly one else case; the second if statement is inside the body of the else of the first if statement.
else if { ... } is really parsed as else applied to the next statement, which is an if statement (applied to the compound statement { else_if_body(); }). Then the final else binds to the immediately preceding if, which is the second one.
Here's the same thing written more in line with how it's parsed1:
if (condition) {
    if_body();
} else {
    if (another_condition) {
        else_if_body();
    } else {
        else_body();
    }
}
But it turns out that if the language did directly implement else if as a first-class option for if statements, it would behave exactly the same as the second independent if statement inside the else of the first! So there's no need to bother implementing else if at all; language implementers get else if for free with this style of syntax, once they've implemented if and else.
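For what it's worth, Scala (the language of the first question on this page) gets the same freebie even without mandatory braces, because if is an expression and else accepts any expression, including another if. Given some integer x, these two parse identically:

val sign =
  if (x < 0) "Negative"
  else if (x == 0) "Zero"
  else "Positive"

val sign2 =
  if (x < 0) "Negative"
  else { if (x == 0) "Zero" else "Positive" }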
Python's syntax doesn't allow this freebie.
Programmers of C-style syntax can think in terms of else if even though the language only has if with exactly zero-or-one else, but only because they can write code like my first example that is formatted in a way that looks different to a human reader than it does to the compiler.
Python, OTOH, uses indentation to indicate block structure, which forces the block structure to look the same to a human reader as it does to the interpreter2. Once you've got if and else in Python-style syntax, programmers could still write code that behaves identically to an else-if, by putting a second if statement inside the else of a first. But that comes out looking like this:
if condition:
    if_body()
else:
    if another_condition:
        else_if_body()
    else:
        else_body()
This looks ugly, and is much more complex to think in terms of than an else-if chain once you get more than one or two else-ifs. So it's worth adding an explicit language feature to get back the ability to think in terms of else-if. Even though it technically makes the language more complex, it actually makes thinking in terms of the language simpler, so it's good complexity. With a manually constructed chain of nested ifs inside elses, the reader has to read all the code and verify that every else except the last contains exactly one if statement and nothing else, in order to conclude that the whole sequence is equivalent to a linear chain of conditions checked in order, with some code to execute for the first check that succeeds.
So then. We've seen that languages with C-like syntax might as well go with else if, because they get it for free. That's the reason why that exists. Languages with Python-like syntax have to explicitly do something to get a construct that can be used as an else-if. Why did they choose elif? It's arbitrary; you'd have to actually ask the people who made the decision.
However, Python didn't invent elif; it was around in other languages long before Python existed. So I would guess that when they had to implement an explicit else-if construct, they simply picked one that programmers were already familiar with.
1 Technically, this is how people who are REALLY serious about always using braces with control structures should write their code. ;)
2 You can certainly construct counter-examples to this, but it's the general idea of indentation-based syntax.
To avoid brace^H^H^H^H^Helse if war.
In C/C++ where you have an else if, you can structure your code in many different styles:
if (...) {
    ...
} else if (...) {
    ...
}

if (...) {
    ...
}
else if (...) {
    ...
}

if (...) {
    ...
} else
    if (...) {
        ...
    }
// and so on
By having an elif instead, such a war would never happen, since there is only one way to write an elif. Also, elif is much shorter than else if.
That's just the way it is. JavaScript uses else if, PHP uses elseif, Perl uses elsif, the C preprocessor and Python use elif. None of them are wrong; they just chose slightly different syntax to do the same thing. :D
I find them helpful to help differentiate the "else-if"s from the "final else".
elif is some sort of replacement for switch in other languages but with more power
for example in C you write
switch (number) {
    case 1:
        doA();
        break;
    case 2:
        doB();
        break;
    case N:
        doN();
        break;
    default:
        doSomethingElse();
        break;
}
in Python you write
if number == 1: doA()
elif number == 2: doB()
elif number == N: doN()
else: doSomethingElse()
As you can see, elif is more powerful, since you can put more complex conditions than in a switch, plus you avoid nesting if/else statements.
Most likely it's syntactic sugar. Like the Wend of Visual Basic.
Python inherits this from Perl, where it's called elsif.
In Python's case, else if as two separate constructs like it is in C-like languages would be quite ugly, as you'd have to have else: if: with two indentation levels.
It's arguable whether special-casing the two keywords together would be better (making else if a single construct, like the not in operator).
PL/SQL also has elsif, and the C preprocessor has it spelled elif.
