How does the Python @cache decorator work? - python

I recently learned about the @cache decorator in Python and was surprised how well it worked and how easily it could be applied to any function. Like many others before me, I tried to replicate this behavior in C++ without success (I tried to recursively calculate the Fibonacci sequence). The problem was that the internal calls didn't get cached. This is not a problem if I modify the original function, but I want it to be a decorator so that it can be applied anywhere. I tried to decipher the Python @cache decorator from the source code but couldn't make out much, so I still can't figure out how (IF it is even possible) to replicate this behavior elsewhere.
Is there a way to cache the internal calls as well?
This is a simple way to add memoisation to the fib function. What I want is to build a decorator so that I can wrap any function, just like the one in Python.
class CacheFib {
public:
    CacheFib() {}
    unsigned long long fib(int n) {
        auto hit = cache_pool.find(n);
        if (hit != cache_pool.end()) {
            return hit->second;
        } else if (n <= 1) {
            return n;
        } else {
            auto miss = this->fib(n - 1) + this->fib(n - 2);
            cache_pool.insert({n, miss});
            return miss;
        }
    }
    std::map<int, unsigned long long> cache_pool;
};
This approach caches the actual call, meaning that if I call cachedFib(40) twice, the second time it will be O(1). It doesn't actually cache the internal calls to help with performance.
// A PROTOTYPE IMPLEMENTATION
template <typename Func> class CacheDecorator {
public:
    CacheDecorator(Func fun) : function(fun) {}
    int operator()(int n) {
        auto hit = cache_pool.find(n);
        if (hit != cache_pool.end()) {
            return hit->second;
        } else {
            auto miss = function(n);
            cache_pool.insert({n, miss});
            return miss;
        }
    }
    std::function<Func> function;
    std::map<int, int> cache_pool;
};
int fib(int n) {
    if (n == 0 || n == 1) {
        return n;
    } else {
        return fib(n - 1) + fib(n - 2);
    }
}
// main
auto cachedFib = CacheDecorator<decltype(fib)>(fib);
cachedFib(40);
Also, any information on the @cache decorator or any C++ implementation ideas would be helpful.

So, as you're finding out, Python and C++ are different languages.
The key difference in this context is that in Python, the function name fib is looked up at run-time, even for the recursive call; meanwhile, in C++, the function name is looked up at compile-time, so by the time your CacheDecorator gets to it, it's too late.
A few possibilities:
Move the lookup of fib to run-time; you can do this either by using an explicit function pointer, or by making it a dynamic method. Either of those would mean coding the fib function differently.
Some sort of terrible, platform-dependent hack to overwrite either the function address table or the beginning of the function itself. This is going to be deep magic, particularly in the face of optimisations; the compiler might write out multiple copies of the function, or it might turn a recursive call into a loop, for example.
Move the implementation of the CacheDecorator to compile-time, as a pre-processor macro. That's probably the best way to preserve the intent of a python decorator.
Ideally, write Python in Python and C++ in C++; the languages each have their own idioms, which don't generally translate to each other in a one-to-one fashion.
Trying to write Python-style code in C++ will always result in code that's somewhat alien, even in cases where it is possible. Much better to become fluent in the idioms of the language you're using.

I think the internal calls are not being cached because CacheDecorator::operator() is only invoked once. After that, CacheDecorator::function recursively invokes itself directly. This is a problem because you want to check the cache on every recursive call; however, that check lives in CacheDecorator::operator(), so the first and only time the cache is consulted is when you invoke the decorator in main.
I also think that you would only encounter this issue if the passed-in function uses recursion to compute its return value.
I may have made a mistake, but that's what I think the issue is.
One way to work around this would be to accumulate and return a vector/map of computed values; once CacheDecorator::function completes, you can then cache the returned vector/map. This would require you to modify your fib function, so it may not be a good solution. It also does not perfectly replicate Python's @cache decorator, since the programmer is essentially expected to store the pre-computed values.

Related

Python-like Coding in C for Pointers

I am transitioning from Python to C, so my question might appear naive. I am reading a tutorial on Python-C bindings and it mentions:
In C, all parameters are pass-by-value. If you want to allow a function to change a variable in the caller, then you need to pass a pointer to that variable.
Question: Why can't we simply re-assign the values inside the function and be free from pointers?
The following code uses pointers:
#include <stdio.h>
int i = 24;
int increment(int *j) {
    (*j)++;
    return *j;
}
int main(void) {
    increment(&i);
    printf("i = %d\n", i);
    return 0;
}
Now this can be replaced with the following code that doesn't use pointers:
#include <stdio.h>
int i = 24;
int increment(int j) {
    j++;
    return j;
}
int main(void) {
    i = increment(i);
    printf("i = %d\n", i);
    return 0;
}
You can only return one thing from a function. If you need to update multiple parameters, or you need to use the return value for something other than the updated variable (such as an error code), you need to pass a pointer to the variable.
Getting this out of the way first - pointers are fundamental to C programming. You cannot be “free” of pointers when writing C. You might as well try to never use if statements, arrays, or any of the arithmetic operators. You cannot use a substantial chunk of the standard library without using pointers.
“Pass by value” means, among other things, that the formal parameter j in increment and the actual parameter i in main are separate objects in memory, and changing one has absolutely no effect on the other. The value of i is copied to j when the function is called, but any changes to i are not reflected in j and vice-versa.
We work around this in C by using pointers. Instead of passing the value of i to increment, we pass its address (by value), and then dereference that address with the unary * operator.
This is one of the cases where we have to use pointers. The other case is when we track dynamically-allocated memory. Pointers are also useful (if not strictly required) for building containers (lists, trees, queues, stacks, etc.).
Passing a value as a parameter and returning its updated value works, but only for a single parameter. Passing multiple parameters and returning their updated values in a struct type can work, but is not good style if you’re doing it just to avoid using pointers. It’s also not possible if the function must update parameters and return some kind of status (such as the scanf library function, for example).
Similarly, using file-scope variables does not scale and creates maintenance headaches. There are times when it’s not the wrong answer, but in general it’s not a good idea.
So, imagine you need to pass a large array or another data structure that needs modification. If you apply the same approach you used to increment an integer, you create a copy of that large array on every call to the function. Obviously, making a copy is not memory-friendly; instead, we pass pointers to functions and do the updates on the single underlying array, or whatever it is.
Plus, as the other answer mentioned, if you need to update many parameters, then it is impossible to return them all the way you declared.

How are yield and return different from one another?

I frequently encounter statements that this function yields something or that function returns something. I'm trying to understand the difference and have read a few articles on Python. Then I encountered the same terms in a C++ context, which says:
some expressions yield objects but return them as rvalues, not lvalues.
Can anyone help me understand these two terms, either in a language-independent way or in enough detail that I can grasp them easily?
Edit: if they are different in the two languages, please explain for both, or for whichever one you know.
In Python, yield is used for generation. For example:
def func():
    i = 0
    while True:
        i += 1
        yield i
If I remember Python correctly, this allows the function to pause execution and be called over and over again. Since i is incremented before each yield, it generates the sequence {1, 2, 3, ...}.
On the other hand, return just returns a single value and ends execution:
def func():
    i = 0
    while True:
        i += 1
        return i
This always returns 1: the function ends execution completely, so i goes out of scope every time and the increment never accumulates.
On the other hand, C++ has no direct equivalent to yield as far as I'm aware (except, apparently, in the new C++20, which adds an equivalent), whereas it does have an equivalent (in all versions) to return here. It is, of course, called return.
That said, C++ can achieve something similar to our yield example using static variables:
int func() {
    static int i = 0;
    i += 1;
    return i;
}
However, that is not to say that static variables are replacements for yield in C++. It's just that you can sort of achieve the same thing in C++ with static variables in this (and possibly other) example(s).
So, in short, return ends execution of a function in both languages, whereas yield lets a function pause and later resume execution. There is no real equivalent for Python's yield in C++ until at least C++20.
Have you ever tried to iterate over an entire database of objects? That's what I tried my first time, and it quickly consumed all 16GB of my memory and ground my system to a halt. This is why generators exist - to load data in as needed instead of all at once (and probably a few other uses as well). Try reading this post, it has a few examples and will go into more detail.

Finding the input dependencies of a function's outputs

I've been working on a Python program with pycparser which is supposed to generate a JSON file with the dependencies of a given function and its outputs.
For an example function:
int Test(int testInput)
{
    int b = testInput;
    return b;
}
Here I would expect b to be dependent on testInput. But of course it can get a lot more complicated with structs, if-statements, etc. The files I'm testing also have functions in a specific form that are considered inputs and outputs, as in:
int Test(int testInput)
{
    int anotherInput = DatabaseRead(VariableInDatabase);
    int b = testInput;
    int c;
    c = anotherInput + 1;
    DatabaseWrite(c);
    return b;
}
Here c would be dependent on VariableInDatabase, and b same as before.
I've run into a wall with this analysis in pycparser, as structs and pointers in particular are really hard for me to handle, and it seems like there should be a better way. I've read about ASTs and CFGs, and about other analysis tools like Frama-C, but I can't seem to find a clear answer on whether this is even an established technique.
Is there a known way to do this kind of analysis, and if so, what should I be looking into?
It's supposed to run on thousands of files and output these dependencies as JSON, so plugins for editors don't seem like what I'm looking for.
You need data flow analysis of your code, and then you want to follow the data flow backwards from a result to its sources, up to some stopping point (in your case, you stopped at a function parameter but you probably also want to stop at any global variable).
This is called program slicing in the literature.
Computing data flows is pretty hard, especially if you have a complex language (C is fun: you can have data flows through indirectly called functions that read values; now you need indirect points-to analysis to support your data flow, and vice versa).
Here's a fun example:
// ocean of functions:
...
int a() { return b; }   // reads the global b
...
int p() { return q; }   // reads the global q
...
int foo(int (*x)())
{ return (*x)(); }
Does foo depend on b? On q? You can't know unless you know whether foo calls a or p. But foo is handed a function pointer... and what might that point to?
Using just ASTs and CFGs is necessary but not sufficient; data-flow analysis algorithms are hard, especially at scale (as you suggest you have). You need a lot of machinery that is not easy to build.
[We've done this on C programs of 16 million lines.] See my essay on Life After Parsing.

Does something like Python generator exist in objective-c?

Does something like a Python generator exist in Objective-C?
I have the following code in a few places; is there some way to simplify it?
int maxWinInRow = [self maxWinInRow];
// how many wins in a row
for (int i = 1; i <= maxWinInRow; i++) {
    NSString *key = [NSString stringWithFormat:@"%d", i];
    NSNumber *value = [_winsInRow valueForKey:key];
    int numberOfWinsInRow = value.intValue;
    // only this line is specific
    gameScore = gameScore + (pow(6, i) * numberOfWinsInRow);
}
To be specific, there is no generator pattern built into the Objective-C language itself. However, with the introduction of "blocks" in Objective-C (and in C, with LLVM), it has become possible to build your own generator pattern in Objective-C.
If you are serious about learning this, you can go through this article by Mike Ash.
However, please be cautious when working with blocks, as they occasionally incur extra memory-management overhead, and misuse can result in a memory leak in your app. So if you want to use the described pattern, please make sure you have a proper understanding of "blocks". Otherwise, I would advise you to refactor your code into a basic method that works as your stub, as described by Wain. Certainly that won't allow the lazy evaluation that the pattern implies, but it's a better-than-nothing solution in Objective-C.

What is the preferred way to implement 'yield' in Scala?

I am writing code for PhD research and starting to use Scala. I often have to do text processing. I am used to Python, whose yield statement is extremely useful for implementing complex iterators over large, often irregularly structured text files. Similar constructs exist in other languages (e.g. C#), for good reason.
Yes I know there have been previous threads on this. But they look like hacked-up (or at least badly explained) solutions that don't clearly work well and often have unclear limitations. I would like to write code something like this:
import generator._

def yield_values(file: String) = {
  generate {
    for (x <- Source.fromFile(file).getLines()) {
      // Scala is already using the 'yield' keyword.
      give("something")
      for (field <- ":".r.split(x)) {
        if (field contains "/") {
          for (subfield <- "/".r.split(field)) { give(subfield) }
        } else {
          // Scala has no 'continue'. IMO that should be considered
          // a bug in Scala.
          // Preferred: if (field.startsWith("#")) continue
          // Actual: need to indent all following code
          if (!field.startsWith("#")) {
            val some_calculation = { ... do some more stuff here ... }
            if (some_calculation && field.startsWith("r")) {
              give("r")
              give(field.slice(1))
            } else {
              // Typically there will be a good deal more code here to handle different cases
              give(field)
            }
          }
        }
      }
    }
  }
}
I'd like to see the code that implements generate() and give(). BTW give() should be named yield() but Scala has taken that keyword already.
I gather that, for reasons I don't understand, Scala continuations may not work inside a for statement. If so, generate() should supply an equivalent function that works as close as possible to a for statement, because iterator code with yield almost inevitably sits inside a for loop.
Please, I would prefer not to get any of the following answers:
'yield' sucks, continuations are better. (Yes, in general you can do more with continuations. But they are hella hard to understand, and 99% of the time an iterator is all you want or need. If Scala provides lots of powerful tools but they're too hard to use in practice, the language won't succeed.)
This is a duplicate. (Please see my comments above.)
You should rewrite your code using streams, continuations, recursion, etc. etc. (Please see #1. I will also add, technically you don't need for loops either. For that matter, technically you can do absolutely everything you ever need using SKI combinators.)
Your function is too long. Break it up into smaller pieces and you won't need 'yield'. You'd have to do this in production code, anyway. (First, "you won't need 'yield'" is doubtful in any case. Second, this isn't production code. Third, for text processing like this, very often, breaking the function into smaller pieces -- especially when the language forces you to do this because it lacks the useful constructs -- only makes the code harder to understand.)
Rewrite your code with a function passed in. (Technically, yes you can do this. But the result is no longer an iterator, and chaining iterators is much nicer than chaining functions. In general, a language should not force me to write in an unnatural style -- certainly, the Scala creators believe this in general, since they provide shitloads of syntactic sugar.)
Rewrite your code in this, that, or the other way, or some other cool, awesome way I just thought of.
The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, if you're already familiar with Python and really like some of its features and design choices, why not use it instead?
Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.
Here's how you do it.
// You want to write
for (x <- xs) { /* complex yield in here */ }
// Instead you write
xs.iterator.flatMap { /* Produce iterators in here */ }
// You want to write
yield(a)
yield(b)
// Instead you write
Iterator(a,b)
// You want to write
yield(a)
/* complex set of yields in here */
// Instead you write
Iterator(a) ++ /* produce complex iterator here */
That's it! All your cases can be reduced to one of these three.
In your case, your example would look something like
Source.fromFile(file).getLines().flatMap(x =>
  Iterator("something") ++
  ":".r.split(x).iterator.flatMap(field =>
    if (field contains "/") "/".r.split(field).iterator
    else {
      if (!field.startsWith("#")) {
        /* vals, whatever */
        if (some_calculation && field.startsWith("r")) Iterator("r", field.slice(1))
        else Iterator(field)
      }
      else Iterator.empty
    }
  )
)
P.S. Scala does have continue; it's done like so (implemented by throwing stackless (light-weight) exceptions):
import scala.util.control.Breaks._
for (blah) { breakable { ... break ... } }
but that won't get you what you want because Scala doesn't have the yield you want.
'yield' sucks, continuations are better
Actually, Python's yield is a continuation.
What is a continuation? A continuation saves the present point of execution, with all its state, such that one can continue at that point later. That's precisely what Python's yield does, and also precisely how it is implemented.
It is my understanding that Python's continuations are not delimited, however. I don't know much about that; I might be wrong, in fact. Nor do I know what the implications of that may be.
Scala's continuations do not work at run-time; in fact, there's a continuations library for Java that works by rewriting bytecode at run-time, which is free of the constraints that Scala's continuations have.
Scala's continuations are done entirely at compile time, which requires quite a bit of work. It also requires that the code to be "continued" be prepared by the compiler for it.
And that's why for-comprehensions do not work. A statement like this:
for { x <- xs } proc(x)
is translated into
xs.foreach(x => proc(x))
where foreach is a method on xs's class. Unfortunately, that class was compiled long ago, so it cannot be modified to support the continuation. As a side note, that's also why Scala doesn't have continue.
Aside from that, yes, this is a duplicate question, and, yes, you should find a different way to write your code.
The implementation below provides a Python-like generator.
Notice that there's a function called _yield in the code below, because yield is already a keyword in Scala; by the way, Scala's yield does not have anything to do with the yield you know from Python.
import scala.annotation.tailrec
import scala.collection.immutable.Stream
import scala.util.continuations._

object Generators {
  sealed trait Trampoline[+T]
  case object Done extends Trampoline[Nothing]
  case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

  class Generator[T](var cont: Unit => Trampoline[T]) extends Iterator[T] {
    def next: T = {
      cont() match {
        case Continue(r, nextCont) => cont = nextCont; r
        case _ => sys.error("Generator exhausted")
      }
    }
    def hasNext = cont() != Done
  }

  type Gen[T] = cps[Trampoline[T]]

  def generator[T](body: => Unit @Gen[T]): Generator[T] = {
    new Generator((Unit) => reset { body; Done })
  }

  def _yield[T](t: T): Unit @Gen[T] =
    shift { (cont: Unit => Trampoline[T]) => Continue(t, cont) }
}
object TestCase {
  import Generators._

  def sectors = generator {
    def tailrec(seq: Seq[String]): Unit @Gen[String] = {
      if (!seq.isEmpty) {
        _yield(seq.head)
        tailrec(seq.tail)
      }
    }
    val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
    tailrec(list)
  }

  def main(args: Array[String]): Unit = {
    for (s <- sectors) { println(s) }
  }
}
It works pretty well, including for the typical usage of for loops.
Caveat: we need to remember that Python and Scala differ in the way continuations are implemented. Below we see how generators are typically used in Python and compare to the way we have to use them in Scala. Then, we will see why it needs to be like so in Scala.
If you are used to writing code in Python, you've probably used generators like this:
// This is Scala code that does not compile :(
// This code naively tries to mimic the way generators are used in Python
def myGenerator = generator {
  val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
  list foreach { s => _yield(s) }
}
This code above does not compile. Skipping all convoluted theoretical aspects, the explanation is: it fails to compile because "the type of the for loop" does not match the type involved as part of the continuation. I'm afraid this explanation is a complete failure. Let me try again:
If you had coded something like the example below, it would compile fine:
def myGenerator = generator {
  _yield("Financials")
  _yield("Materials")
  _yield("Technology")
  _yield("Utilities")
}
This code compiles because the generator can be decomposed into a sequence of yields, and in this case each yield matches the type involved in the continuation. To be more precise, the code can be decomposed into chained blocks, where each block ends with a yield. Just for the sake of clarification, we can think of the sequence of yields as being expressed like this:
{ some code here; _yield("Financials")
  { some other code here; _yield("Materials")
    { eventually even some more code here; _yield("Technology")
      { ok, fine, you've got the idea, right?; _yield("Utilities") }}}}
Again, without going deep into convoluted theory, the point is that after a yield you need to either provide another block that ends with a yield, or close the chain. This is what we are doing in the pseudo-code above: after each yield we open another block, which in turn ends with a yield, and so on. Obviously this must end at some point; then the only thing we are allowed to do is close the entire chain.
OK. But... how can we yield multiple pieces of information? The answer is a little obscure, but makes a lot of sense once you know it: we need to employ tail recursion, and the last statement of a block must be a yield.
def myGenerator = generator {
  def tailrec(seq: Seq[String]): Unit @Gen[String] = {
    if (!seq.isEmpty) {
      _yield(seq.head)
      tailrec(seq.tail)
    }
  }
  val list = List("Financials", "Materials", "Technology", "Utilities")
  tailrec(list)
}
Let's analyze what's going on here:
Our generator function myGenerator contains some logic that generates information. In this example, we simply use a sequence of strings.
Our generator function myGenerator calls a recursive function which is responsible for yield-ing multiple pieces of information, obtained from our sequence of strings.
The recursive function must be declared before use, otherwise the compiler crashes.
The recursive function tailrec provides the tail recursion we need.
The rule of thumb here is simple: substitute a for loop with a recursive function, as demonstrated above.
Notice that tailrec is just a convenient name we chose, for the sake of clarification. In particular, tailrec does not need to be the last statement of our generator function. The only restriction is that you have to provide a sequence of blocks which match the type of a yield, as shown below:
def myGenerator = generator {
  def tailrec(seq: Seq[String]): Unit @Gen[String] = {
    if (!seq.isEmpty) {
      _yield(seq.head)
      tailrec(seq.tail)
    }
  }
  _yield("Before the first call")
  _yield("OK... not yet...")
  _yield("Ready... steady... go")
  val list = List("Financials", "Materials", "Technology", "Utilities")
  tailrec(list)
  _yield("done")
  _yield("long life and prosperity")
}
Going one step further, you may be wondering what real-life applications look like, in particular if you are employing several generators. It is a good idea to standardize your generators around a single pattern that proves convenient for most circumstances.
Let's examine the example below. We have three generators: sectors, industries and companies. For brevity, only sectors is completely shown. This generator employs a tailrec function as already demonstrated above. The trick here is that the same tailrec function is also employed by the other generators; all we have to do is supply a different body function.
type GenP = (NodeSeq, NodeSeq, NodeSeq)
type GenR = immutable.Map[String, String]

def tailrec(p: GenP)(body: GenP => GenR): Unit @Gen[GenR] = {
  val (stats, rows, header) = p
  if (!stats.isEmpty && !rows.isEmpty) {
    val heads: GenP = (stats.head, rows.head, header)
    val tails: GenP = (stats.tail, rows.tail, header)
    _yield(body(heads))
    // tail recursion
    tailrec(tails)(body)
  }
}
def sectors = generator[GenR] {
  def body(p: GenP): GenR = {
    // unpack arguments
    val (stat, row, header) = p
    // obtain name and url
    val name = (row \ "a").text
    val url = (row \ "a" \ "@href").text
    // create map and populate fields: name and url
    var m = new scala.collection.mutable.HashMap[String, String]
    m.put("name", name)
    m.put("url", url)
    // populate other fields
    (header, stat).zipped.foreach { (k, v) => m.put(k.text, v.text) }
    // return the map
    m
  }
  val root  : scala.xml.NodeSeq = cache.loadHTML5(urlSectors) // obtain entire page
  val header: scala.xml.NodeSeq = ... // code is omitted
  val stats : scala.xml.NodeSeq = ... // code is omitted
  val rows  : scala.xml.NodeSeq = ... // code is omitted
  // tail recursion
  tailrec((stats, rows, header))(body)
}
def industries(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
    //++ similar to 'body' demonstrated in "sectors"
    // returns a map
    m
  }
  //++ obtain NodeSeq variables, as demonstrated in "sectors"
  // tail recursion
  tailrec((stats, rows, header))(body)
}

def companies(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
    //++ similar to 'body' demonstrated in "sectors"
    // returns a map
    m
  }
  //++ obtain NodeSeq variables, as demonstrated in "sectors"
  // tail recursion
  tailrec((stats, rows, header))(body)
}
Credits to Rich Dougherty and huynhjl.
See this SO thread: Implementing yield (yield return) using Scala continuations
Credits to Miles Sabin, for putting some of the code above together
http://github.com/milessabin/scala-cont-jvm-coro-talk/blob/master/src/continuations/Generators.scala
