Finding the input dependencies of a function's outputs - python

I've been working on a Python program with pycparser which is supposed to generate a JSON file with the dependencies of a given function and its outputs.
For an example function:
int Test(int testInput)
{
int b = testInput;
return b;
}
Here I would expect b to be dependent on testInput. But of course it can get a lot more complicated with structs, if-statements, etc. The files I'm testing also have functions in a specific form that are considered inputs and outputs, as in:
int Test(int testInput)
{
int anotherInput = DatabaseRead(VariableInDatabase);
int b = testInput;
int c;
c = anotherInput + 1;
DatabaseWrite(c);
return b;
}
Here c would be dependent on VariableInDatabase, and b same as before.
I've run into a wall with this analysis in pycparser as mostly structs and pointers are really hard for me to handle, and it seems like there'd be a better way. I've read into ASTs and CFGs, and other analysis tools like Frama-C but I can't seem to find a clear answer if this is even a thing.
Is there a known way to do this kind of analysis, and if so, what should I be looking into?
It's supposed to run on thousands of files and be able to output these dependencies into a JSON file, so plugins for editors don't seem like what I'm looking for.

You need data flow analysis of your code, and then you want to follow the data flow backwards from a result to its sources, up to some stopping point (in your case, you stopped at a function parameter but you probably also want to stop at any global variable).
This is called program slicing in the literature.
Computing data flows is pretty hard, especially if you have a complex language (C is fun: you can have data flows through indirectly called functions that read values; now you need indirect points-to analysis to support your data flow, and vice versa).
Here's a fun example:
// ocean of functions:
...
int a(){ return b; }
...
int p(){ return q; }
...
int foo( int (*x)() )
{ return (*x)(); }
Does foo depend on b? On q? You can't know unless you know whether foo calls a or p. But foo is handed a function pointer... and what might that point to?
Using just ASTs and CFGs is necessary but not sufficient; data flow analysis algorithms are hard, especially at scale (as you suggest you have), and you need a lot of machinery that is not easy to build. [We've done this on C programs of 16 million lines.] See my essay on Life After Parsing.
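To make "follow the data flow backwards" concrete, here is a minimal sketch in Python. It assumes the hard part is already done: the STATEMENTS list is a hypothetical, hand-written summary of the question's second example (one definition per entry, straight-line code only), not something pycparser produces. Pointers, structs, branches, and indirect calls are exactly what this toy ignores.

```python
import json

# Hypothetical, hand-written summary of the question's second example.
# Each entry is (name defined, names read); a real tool would have to derive
# this from the AST/CFG, including pointers, structs, and branches.
STATEMENTS = [
    ("anotherInput", ["VariableInDatabase"]),  # DatabaseRead(VariableInDatabase)
    ("b", ["testInput"]),
    ("c", ["anotherInput"]),                   # c = anotherInput + 1
    ("DatabaseWrite", ["c"]),                  # treated as an output
    ("return", ["b"]),                         # treated as an output
]

def backward_sources(statements, output):
    """Follow data flow backwards from `output` to names never defined here."""
    # Start at the last statement that defines the output.
    start = max(i for i, (target, _) in enumerate(statements) if target == output)
    needed = {output}
    for target, reads in reversed(statements[: start + 1]):
        if target in needed:
            # Replace the defined name with the names it was computed from.
            needed.discard(target)
            needed.update(reads)
    return needed

report = {out: sorted(backward_sources(STATEMENTS, out))
          for out in ("DatabaseWrite", "return")}
print(json.dumps(report, indent=2))
```

The stopping points here are simply the names that no statement defines (the parameter and the database variable), which matches the "stop at parameters and globals" idea above.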

Related

How does the Python @cache decorator work?

I recently learned about the cache decorator in Python and was surprised how well it worked and how easily it could be applied to any function. Like many others before me, I tried to replicate this behavior in C++ without success (I tried to recursively calculate the Fibonacci sequence). The problem was that the internal calls didn't get cached. This is not a problem if I modify the original function, but I want it to be a decorator so that it can be applied anywhere. I am trying to decipher the Python @cache decorator from its source code, but I couldn't make out much, so I am trying to figure out how (if it is even possible) to replicate this behavior elsewhere.
Is there a way to cache the internal calls as well?
This is a simple way to add memoisation to the fib function. What I want is to build a decorator so that I can wrap any function, just like the one in Python.
class CacheFib {
public:
CacheFib() {}
unsigned long long fib(int n) {
auto hit = cache_pool.find(n);
if (hit != cache_pool.end()) {
return hit->second;
} else if (n <= 1) {
return n;
} else {
auto miss = this->fib(n - 1) + this->fib(n - 2);
cache_pool.insert({n, miss});
return miss;
}
}
std::map<int, unsigned long long> cache_pool; // value type matches fib's return type
};
The prototype below caches only the outer call: if I call cachedFib(40) twice, the second call is O(1), but the internal recursive calls are not cached, so it does not help performance.
// A PROTOTYPE IMPLEMENTATION
template <typename Func> class CacheDecorator {
public:
CacheDecorator(Func fun) : function(fun) {}
int operator()(int n) {
auto hit = cache_pool.find(n);
if (hit != cache_pool.end()) {
return hit->second;
} else {
auto miss = function(n);
cache_pool.insert({n, miss});
return miss;
}
}
std::function<Func> function;
std::map<int, int> cache_pool;
};
int fib(int n) {
if (n == 0 || n == 1) {
return n;
} else
return fib(n - 1) + fib(n - 2);
}
//main
auto cachedFib = CacheDecorator<decltype(fib)>(fib);
cachedFib(40);
Also, any information on the @cache decorator or any C++ implementation ideas would be helpful.
So, as you're finding out, Python and C++ are different languages.
The key difference in this context is that in Python, the function name fib is looked up at run-time, even for the recursive call; meanwhile, in C++, the function name is looked up at compile-time, so by the time your CacheDecorator gets to it, it's too late.
A few possibilities:
Move the lookup of fib to run-time; you can do this either by using an explicit function pointer, or by making it a dynamic method. Either of those would mean coding the fib function differently.
Some sort of terrible, platform-dependent hack to overwrite either the function address table or the beginning of the function itself. This is going to be deep magic, particularly in the face of optimisations; the compiler might write out multiple copies of the function, or it might turn a recursive call into a loop, for example.
Move the implementation of the CacheDecorator to compile-time, as a pre-processor macro. That's probably the best way to preserve the intent of a python decorator.
Ideally, write Python in Python and C++ in C++; the languages each have their own idioms, which don't generally translate to each other in a one-to-one fashion.
Trying to write Python-style code in C++ will always result in code that's somewhat alien, even in cases where it is possible. Much better to become fluent in the idioms of the language you're using.
I think that internal calls are not being cached because once the CacheDecorator is invoked, the CacheDecorator::operator() is only invoked once. After that, the CacheDecorator::function is recursively invoked. This is problematic because you want to check the cache at every recursive call of CacheDecorator::function; however, this does not occur because your cache checking code is in CacheDecorator::operator(). Consequently, the first and only time CacheDecorator::operator() is invoked is when you invoke it in main, which is also the first and only time the cache is checked.
I also think that you would only encounter your issue if the passed in function uses recursion to compute its return value.
I may have made a mistake, but that's what I think the issue is.
I think that one way to fix this would be to accumulate and return a vector/map of computed values. Once your CacheDecorator::function is complete, you can then cache the returned vector/map. This would require you to modify your fib function, so this may not be a good solution. This modification also does not perfectly replicate Python's @cache decorator functionality, since the programmer is essentially expected to store pre-computed values.
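The run-time name lookup described above can be seen from the Python side with a small experiment. This is only a sketch: memoize is a hand-rolled stand-in for functools.cache, and the names fib_plain, fib_rebound, and the calls counter are made up for illustration. Wrapping a function under a new name behaves like the C++ prototype (inner calls bypass the cache), while rebinding the function's own name, which is what the @ syntax does, routes every recursive call through the wrapper.

```python
import functools

def memoize(func):
    """Hand-rolled stand-in for functools.cache (single hashable argument)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(n):
        if n not in cache:
            cache[n] = func(n)
        return cache[n]

    return wrapper

calls = {"plain": 0, "rebound": 0}  # how often each function body runs

def fib_plain(n):
    calls["plain"] += 1
    return n if n <= 1 else fib_plain(n - 1) + fib_plain(n - 2)

# Like the C++ prototype: the wrapper has a different name, so the recursion
# inside fib_plain still calls the undecorated fib_plain.
cached_outer = memoize(fib_plain)

def fib_rebound(n):
    calls["rebound"] += 1
    return n if n <= 1 else fib_rebound(n - 1) + fib_rebound(n - 2)

# What "@memoize" does: rebind the same global name. The recursive call looks
# up "fib_rebound" at run time and therefore finds the wrapper.
fib_rebound = memoize(fib_rebound)

result_outer = cached_outer(20)
result_rebound = fib_rebound(20)
print(result_outer, result_rebound, calls)
```

Both calls return the same value, but the body of fib_rebound runs once per distinct argument, whereas fib_plain runs thousands of times.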

Writing in MIPS a function that converts a string representing an integer (in a given base), to the corresponding number in base 10

Suppose we have a string representing an integer in a given base, and that very base.
The parameters are the address (in $a0) starting from which the string is stored, and the base in $a1.
I should convert the corresponding number into base 10, and save it in $v0. In this case 0 should be loaded into $v1. If instead the string does not correctly represent an integer in the given base, then in the end $v0 should contain -1 and $v1 should contain 1.
Also, the function that actually performs the conversion should be recursive.
I have written beforehand a Python program in such a way (you'll notice the various s0, s1 etc.) that I could transfer the thinking to MIPS, but it got really confusing when I realised I probably should perform contextually the counting of the characters of the string, the investigation over the string being "in base" or not, and the actual conversion with the gradual summing of quantities to some designated variable - and not separately as in the below program.
How should I go about this and write the function(s)?
Here's the Python code:
digitsDict = dict();
lowerCase = "abcdefghijklmnopqrstuvwxyz"
upperCase = lowerCase.upper();
for i in range(26):
    digitsDict.update({lowerCase[i]:i+10})
    digitsDict.update({upperCase[i]:i+10})
def isInBase(string, base):
    s1 = False;
    for char in string:
        if (str(char).isdigit()):
            if (int(char) >= base):
                return s1
        else:
            if (char not in digitsDict.keys() or digitsDict[char] >= base):
                return s1
    s1 = True
    return s1
def convert(string, base, k=0):
    s2 = 0
    char = string[k]
    l = len(string) - 1
    if str(char).isdigit(): s2 += int(char)*(base**(l-k))
    else: s2 += digitsDict[char]*(base**(l-k))
    if k == l: return s2;
    return s2 + convert(string, base, k+1)
def strToInt(string, base):
    return (convert(string, base, 0), 0)
def main(a0String, a1Integer):
    if isInBase(a0String, a1Integer):
        v0, v1 = strToInt(a0String, a1Integer)
    else: v0 = -1; v1 = 1;
    print(v0)
    if (v0 == -1): return 22 #MIPS error code for invalid argument
    return 0 #OK
First, you have pseudo code, so very good!
Next, as a general rule, make sure your pseudo code actually works (so test it), as debugging design issues in assembly is very difficult, and small changes to the design (in pseudo code) can require large changes in the assembly (e.g. rewriting a lot of code).
You're doing some things in Python that are rather involved in assembly language, so you ought to write your pseudo code in C first, since that will make you address those differences:
in: represents a loop in C or assembly
+= and + on strings: represents either some simple operations on a global buffer or some complicated memory management (hint: go for the former)
.isdigit(): a conditional conjunction or disjunction, depending on how you code it
.update(): will need some alternative translation
**: also represents a loop in C or assembly
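As one possible illustration of these points, here is the pseudo code restructured so that each step maps to a few instructions. It is a sketch under assumptions, not the required solution: digit_value replaces the dictionary and .isdigit() with range comparisons, a running accumulator (acc = acc * base + digit) replaces the ** power, validation and conversion happen in the same recursive pass, and an empty string is treated as invalid (my assumption). The returned pair stands for ($v0, $v1), and the len() check stands in for reaching the string terminator.

```python
def digit_value(ch):
    """Map one character to its digit value, or -1 if it is not 0-9/a-z/A-Z."""
    code = ord(ch)
    if ord("0") <= code <= ord("9"):
        return code - ord("0")
    if ord("a") <= code <= ord("z"):
        return code - ord("a") + 10
    if ord("A") <= code <= ord("Z"):
        return code - ord("A") + 10
    return -1

def convert(string, base, k=0, acc=0):
    """Recursive single pass over the string; returns the pair (v0, v1)."""
    if k == len(string):
        # End of string: success, unless the string was empty (assumption).
        return (acc, 0) if k > 0 else (-1, 1)
    d = digit_value(string[k])
    if d < 0 or d >= base:
        # Not a digit of this base: signal the error as the exercise asks.
        return (-1, 1)
    # Fold the digit into the running value; no power operator needed.
    return convert(string, base, k + 1, acc * base + d)

print(convert("ff", 16))   # valid hexadecimal input
print(convert("102", 2))   # '2' is not a binary digit
```

Because the error pair is returned straight up the call chain, no separate isInBase pass or character count is needed.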
The next big complication is the calling of functions.  Function calling in MIPS assembly language is probably the most complex topic, due to the requirements for register usage and stack handling.
Recursion is a bit of a red herring for assembly language: it appears hard but it is actually no harder and even no different in assembly language than functions calling other functions without involving recursion (this is true assuming the assembly instruction set has a runtime stack for function calling, MIPS does (but if it didn't you'd have to simulate a call stack)).
You'll need to study up on this; it is somewhat involved.  Look for other examples, such as recursive Fibonacci.
When translating to MIPS one function that calls another function, we need an analysis of variables and values that are "live across a function call".  These variables need special consideration, since some registers are wiped out by calling another function.  You'll see this in fibonacci, for example.
Part of recursive fib is fib(n-1)+fib(n-2).  The return value from the first call to fib must be given special handling since it is ultimately needed for the addition (+), so it is live across the 2nd call to fib and, without special handling, will be lost by making that 2nd call.  Also, n is live across the first call yet needed again to make the second call, and without some special handling would similarly be wiped out by that first call.
This is a consequence of functions calling functions in MIPS whose instruction set does not automatically stack things and requires the assembly language programmer or compiler to do so manually.  Some or all of this is also needed on other architectures.  This is not a consequence of recursion, so a(n)+b(n) would involve the same requirements for analysis and special handling of variables and values.
When you have a good algorithm (e.g. in C) and it works, you're ready for assembly.  Translate your data first: global variables.  Next translate functions, similarly: translate your local variables into MIPS, then translate each structured statement (e.g. if, for) following its assembly language pattern.  Nested structured statements also nest in assembly language, and it doesn't matter if you translate the inner statements first or outer statements first, as long as you exactly observe the nesting.
In order to translate structured statements into assembly I first use intermediate forms in C, for example:
for ( int i = 0; i < n; i++ ) {
..body of for..
}
Note that in the above the "body of the for" could easily contain control structures, like another for or an if.  These then, are nested structure statements — just translate them one at a time, each structured statement following its complete pattern, and make sure the nesting in C is equally observed in assembly.
Translate for to while :
...
int i = 0;
while ( i < n ) {
..body of for..
i++;
}
...
Next, apply "if-goto-label" intermediate form :
...
int i = 0;
loop1:
if ( i >= n ) goto endLoop1;
..body of for..
i++;
goto loop1;
endLoop1:
...
As you apply patterns, you'll need to create label names; new label names for each application of the pattern.
Then this translates into assembly fairly easily.  If the "body of for" also has control structures make sure their entire translation is embedded and located at the "body of for" section in the above pattern.
Notice how i < n in C becomes i >= n in the if-goto-label form: there the test says when to exit the loop, which is the logical opposite of the while condition that says when to continue.
And finally, translate expressions within statements in your C code into assembly.

Python-like Coding in C for Pointers

I am transitioning from Python to C, so my question might appear naive. I am reading a tutorial on Python-C bindings, and it mentions that:
In C, all parameters are pass-by-value. If you want to allow a function to change a variable in the caller, then you need to pass a pointer to that variable.
Question: Why can't we simply re-assign the values inside the function and be free from pointers?
The following code uses pointers:
#include <stdio.h>
int i = 24;
int increment(int *j){
(*j)++;
return *j;
}
int main(void) {
increment(&i);
printf("i = %d", i);
}
Now this can be replaced with the following code that doesn't use pointers:
#include <stdio.h>
int i = 24;
int increment(int j){
j++;
return j;
}
int main(void) {
i=increment(i);
printf("i = %d", i);
}
You can only return one thing from a function. If you need to update multiple parameters, or you need to use the return value for something other than the updated variable (such as an error code), you need to pass a pointer to the variable.
Getting this out of the way first - pointers are fundamental to C programming. You cannot be “free” of pointers when writing C. You might as well try to never use if statements, arrays, or any of the arithmetic operators. You cannot use a substantial chunk of the standard library without using pointers.
“Pass by value” means, among other things, that the formal parameter j in increment and the actual parameter i in main are separate objects in memory, and changing one has absolutely no effect on the other. The value of i is copied to j when the function is called, but any changes to i are not reflected in j and vice-versa.
We work around this in C by using pointers. Instead of passing the value of i to increment, we pass its address (by value), and then dereference that address with the unary * operator.
This is one of the cases where we have to use pointers. The other case is when we track dynamically-allocated memory. Pointers are also useful (if not strictly required) for building containers (lists, trees, queues, stacks, etc.).
Passing a value as a parameter and returning its updated value works, but only for a single parameter. Passing multiple parameters and returning their updated values in a struct type can work, but is not good style if you’re doing it just to avoid using pointers. It’s also not possible if the function must update parameters and return some kind of status (such as the scanf library function, for example).
Similarly, using file-scope variables does not scale and creates maintenance headaches. There are times when it’s not the wrong answer, but in general it’s not a good idea.
So, imagine you need to modify a large data structure, such as a struct holding a big array. If you handled it the way you increment the integer, every call would copy the whole structure into the function and back out again. That wastes memory and time, so instead we pass a pointer and update the single original in place. (A bare C array cannot be passed by value at all; it is passed as a pointer to its first element.)
Plus, as the other answer mentioned, if you need to update many parameters, a single return value cannot carry them all.
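Since the question comes from a Python background, a rough analogy may help; it is an analogy only, because Python passes object references rather than raw addresses. The function names below are made up for illustration. Rebinding a parameter inside a function never affects the caller, just as with C's copied int, while mutating an object the caller also holds is visible outside, which is roughly the role a pointer plays in C.

```python
def increment_rebind(j):
    # Rebinds the local name j only; the caller's variable is untouched,
    # much like modifying a by-value int parameter in C.
    j += 1
    return j

def increment_in_place(box):
    # Mutates the list object the caller also refers to, loosely analogous
    # to writing through a pointer with (*j)++ in C.
    box[0] += 1

i = 24
increment_rebind(i)        # return value deliberately ignored
cell = [24]
increment_in_place(cell)
print(i, cell[0])
```

The first call leaves i unchanged unless the result is assigned back, which is the same trade-off as the pointer-free C version in the question.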

Does something like Python generator exist in objective-c?

Does something like Python generator exist in objective-c ?
I have the following code in a few places, so is there some way to simplify it?
int maxWinInRow = [self maxWinInRow];
// how many wins in row
for (int i=1; i <= maxWinInRow; i++) {
NSString * key = [NSString stringWithFormat:@"%d",i];
NSNumber * value = [_winsInRow valueForKey:key ];
int numberOfWinInRow = value.intValue;
// only this line is specific
gameScore = gameScore + ( pow(6,i) * numberOfWinInRow);
}
To be specific, there is no generator construct built into the Objective-C language. However, with the introduction of "blocks" in Objective-C (and in C with LLVM), it has become possible to build your own generator pattern.
If you are serious about learning this, you can go through this article by Mike Ash.
However, please be cautious when working with blocks: they add memory-management overhead, and misuse can leak memory in your app. So if you want to use the described pattern, make sure you understand blocks properly. Otherwise, I would advise you to refactor your code into a basic helper method, as described by Wain. That won't give you the lazy initialization that the generator pattern implies, but in Objective-C it is better than nothing.
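For reference, this is roughly what the question's loop looks like with an actual Python generator, which is the behavior a block-based Objective-C version would be imitating. The names wins_in_row and wins_by_streak and the sample data are invented for the sketch. The generator hides the key formatting and lookup, so only the scoring expression remains at the call site.

```python
def wins_in_row(wins_by_streak, max_streak):
    """Lazily yield (streak length, number of wins) pairs."""
    for i in range(1, max_streak + 1):
        # Keys are strings, as in the original dictionary; a missing key counts
        # as zero wins, like intValue on nil in Objective-C.
        yield i, wins_by_streak.get(str(i), 0)

# Made-up sample data: three single wins and one streak of two.
wins = {"1": 3, "2": 1}

# Only this expression is specific to the scoring rule.
game_score = sum(6 ** i * count for i, count in wins_in_row(wins, 3))
print(game_score)
```

Each place that needs the same iteration can reuse wins_in_row and supply its own per-item expression.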

What types of languages allow programmatic creation of variable names?

This question comes purely out of intellectual curiosity.
Having browsed the Python section relatively often, I've seen a number of questions similar to this, where someone is asking for a programmatic way to define global variables. Some of them are aware of the pitfalls of exec, others aren't.
However, I've recently been programming in Stata, where the following is common:
local N = 100
local i = 1
foreach x of varlist x1 - x`N' {
local `x' = `i' * `i'
++i
}
In Stata parlance, a local macro with the name N is created, and N evaluates to 100. In each iteration of the foreach loop, a value from x1 to x100 is assigned to the local macro x. Then, the line inside the loop, assigns the square of i to the expansion of x, a local macro with the same ending as i. That is, after this loop x4 expands to 4^2 and x88 expands to 88^2.
In Python, the way to do something similar would be:
squares = {}
for x in range(1,101):
squares[x] = x**2
Then squares[7] equals 7^2.
This is a pretty simple example. There are a lot of other uses for Stata macros. You can use them as a way to pass functions to be evaluated, for example:
local r1 "regress"
local r2 "corr"
foreach r of varlist r1-r2 {
``r'' y x
}
The double tickmarks around r expand that macro twice, first to r1/r2 then to regress/corr, with the result of running a linear regression with y as the dependent and x as the independent variable, and then showing the correlation between y and x. Even more complex stuff is possible.
My question is basically: does Stata fall into some larger category of languages where variable assignment/evaluation takes this form of "macro assignment/expansion"? Bonus points for any explanation of why a language would be designed like this, and/or examples of similar constructs in other languages.
It's really just a question of how much syntactic sugar is there. In any language worth its salt, you can use a map or dictionary data structure to create variable names (keys) at runtime with some value. Some languages may more transparently integrate that with ordinary variable identifiers than others.
(Sorry this is an "answer," not a comment.... people don't rate my answers, so I don't have enough points to comment on the question.)
First, let me point out that what is strange about Stata is that it translates the macro before executing that line of code. For example:
Say you type.
local x3 = 20
local y = 3
display "I am `x`y'' years old"
Internally, Stata is going to translate the locals (inner to outer) and then execute the display command.
That is, Stata will translate the command to
display "I am `x3' years old"
then
display "I am 20 years old"
then, Stata will actually execute this last line of code. You can watch all of this by first executing this command: set trace on.
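The inner-to-outer substitution can be imitated in a few lines of Python. This is only a toy model of the idea, not how Stata is implemented: the expand function and the MACRO pattern are my own, it handles plain `name' references only (no `= expression' forms), and it treats an undefined macro as empty text.

```python
import re

# Matches an innermost `name' reference: the name itself contains no backtick
# or quote, so nested references are resolved from the inside out.
MACRO = re.compile(r"`(\w+)'")

def expand(line, macros):
    """Repeatedly substitute innermost macro references until none are left."""
    while True:
        expanded = MACRO.sub(lambda m: macros.get(m.group(1), ""), line)
        if expanded == line:
            return line
        line = expanded

macros = {"x3": "20", "y": "3"}
command = "display \"I am `x`y'' years old\""
print(expand(command, macros))
```

The first pass turns `y' into 3, leaving `x3', and the second pass turns that into 20; only then would the finished text be handed to the command interpreter.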
There is a subtle difference. The ` ' brackets change the command itself. I think this is actually different from other programming languages. You can often do something in Stata with one line of code where other languages would require two lines of code (one to "expand" the macro; one to execute the line of code).
What's useful about this is that Stata can also evaluate all kinds of expressions inside the ` ' brackets (as long as they return a number or string... e.g., nothing that returns a matrix)
display "I am `= 2011 - 1991' years old"
display "I am `= floor(uniform()*`x`y'')' years old"
This is immensely useful once you get used to it. Macros make things in Stata much cleaner than they would be in, for example, SAS. SAS's %let statements aren't nearly as flexible.
I was also going to point out a few mistakes.
(1) The loop is set up wrong in these examples. x1, x2, ... , x100 are macros (locals), not variables. You can't say foreach x of varlist x1 - x100 because x1-x100 is not a variable list. If I was trying to do that, I would probably use:
local N = 100
forvalues i = 1/`N' {
local x`i' = `i' * `i'
}
The same mistake is made in the second example. r1 and r2 are not variables.
You could do this:
local mycommands regress corr
foreach r in `mycommands' {
`r' y x
}
(Although I would actually type the equivalent, foreach r of local mycommands { ... }, which is supposedly a hair faster to execute).
(2) Second, ++i is not valid. You probably meant to say local ++i.
I don't know if this is what you are looking for, but in PHP you can do:
for ($i=0; $i<10; $i++) {
${'x'.$i} = $i*$i;
}
print $x3; // prints 9
print $x4; // prints 16
I personally find this very unpleasant.
Javascript is an obvious example, although the mechanism is like Python, not Stata.
for(var i = 0; i < 100; i++)
this["var" + i] = i * i;
alert(var8); // 64
