I am trying to recover my original data from data that has already been manipulated. The mapping function takes five parameters: target, source, target_key, source_key, transformer.
For example:
target = {}
source = {'first_name': 'tom'}
target_key = 'name'
source_key = 'first_name'
transformer = lambda value: value.title()
So, currently, I map first_name to name, and the result becomes {'name': 'Tom'}.
Now, I am trying to reverse it: if I get {'name': 'Tom'}, it should result in {'first_name': 'tom'}, using the same lambda or function. Similarly, there are many other keys with different transformers.
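For reference, the forward mapping helper looks roughly like this (simplified; map_field is just an illustrative name, not my actual code):

def map_field(target, source, target_key, source_key, transformer):
    # copy one field from source to target, renaming the key and
    # applying the transformer to the value along the way
    target[target_key] = transformer(source[source_key])
    return target

# map_field({}, {'first_name': 'tom'}, 'name', 'first_name', lambda v: v.title())
# -> {'name': 'Tom'}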
Is there any way/keyword to reverse the functionality of a lambda or of any function?
Thanks,
Short answer: it is fundamentally impossible to construct an "inverse" function for a given function.
You cannot derive the "reverse" function from a given function (whether it is a lambda expression or not is irrelevant). There are several aspects here:
First of all, it is possible that several inputs map to the same output. Take for instance the function lambda x: x.lower(). In that case both 'foo' and 'FOO' map to 'foo'. So even if you somehow could calculate an input that maps to a given output, the question remains: "which input do you pick?"
Next, say we simply state that any such input would suffice; one can still ask whether constructing the inverse is possible. It is not, since the problem is also undecidable: if you provide as "expected output" a value that cannot be generated by the function at all, the hypothetical inverse function cannot know that. One can prove this with computability theory, since it would conflict with the fact that the emptiness problem E_TM is undecidable.
Is there a theoretical way to derive an object that maps to a given valid value? Yes: one could enumerate over all possible inputs (the set is infinite, but countable and therefore enumerable), calculate the output for each, and then validate it against the target value. Furthermore, the evaluation of the function calls has to happen in "parallel" (dovetailed), since it is possible that one of the calls results in an infinite loop.
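As an illustration only (and assuming the inputs are plain ASCII strings, which is an extra assumption), a brute-force "inverse by search" could look like the sketch below; it is hopeless in practice for the reasons that follow:

import itertools
import string

def search_inverse(func, wanted_output, max_len=4):
    # enumerate candidate string inputs in order of increasing length
    # and return the first one whose image under func equals wanted_output;
    # note: no protection here against func looping forever on some candidate
    for length in range(max_len + 1):
        for chars in itertools.product(string.ascii_letters, repeat=length):
            candidate = ''.join(chars)
            if func(candidate) == wanted_output:
                return candidate
    return None  # nothing found within the search bound

# search_inverse(lambda x: x.lower(), 'ab') returns the first match in
# enumeration order; 'ab', 'Ab', 'aB' and 'AB' all map to 'ab', so the
# "inverse" is not unique.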
Nevertheless, hoping that it is realistic to construct a real function that calculates the inverse is not advisable. In practical terms the algorithm sketched above is infeasible: it would require an enormous amount of memory to keep all the simulated function calls alive. Furthermore, the function may have side effects (like writing to a file), so you would have to make copies of everything that might be affected, and in practice some side effects cannot be "virtualized" or "undone": if the function communicates with a web server, for instance, you cannot "undo" the HTTP request. Finally, it can take ages before a valid input is ever enumerated and evaluated.
As @JohnColeman says in his comment, the fact that a function is not (feasibly) invertible is sometimes desired behavior. In asymmetric encryption, for instance, the public key is usually publicly available; nevertheless we do not want a message encrypted with the public key to be (efficiently) recoverable without the private key. A lot of today's cryptography and security depends on the fact that it is hard or impossible to perform the inverse operation of a function.
A final note: it can of course be possible to construct an "inverse constructor" for certain restricted families of functions. But in general (meaning an "inverse generator" that can take any kind of function as input), it is impossible.
To regenerate your data, you need to invert the mappings that were applied to it. There is no general function-inverse operator in Python or any other programming language, for the reasons that @Willem explained, but humans are pretty good at identifying and reversing simple manipulations. With enough work, it is possible to understand and reverse complicated manipulations too. This is part of how hackers reverse-engineer programs and algorithms, and it is what you need to do here if your data is worth the effort. (Of course you can partially automate the process, especially if you know the kinds of manipulations that have been applied, e.g. if you wrote them yourself.)
If you have the source code, it's relatively easy: inspect each function, write a suitable inverse (to the extent that they exist), and write a main loop that somehow determines which inverse to apply. If you don't have the source code but have the compiled program (.pyc or .pyo files), you can still disassemble them and puzzle out what they do. See the dis module (but it's not at all trivial):
>>> import dis
>>> dis.dis(transformer)
1 0 LOAD_FAST 0 (value)
3 LOAD_ATTR 0 (title)
6 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
9 RETURN_VALUE
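If you wrote (or otherwise know) the transformers, the pragmatic approach is to register a hand-written inverse alongside each one. A minimal sketch of that idea (inverse_of and reverse_field are illustrative names, not from your code):

# pair each target key with a hand-written inverse transformer and its source key
inverse_of = {
    'name': (lambda value: value.lower(), 'first_name'),
    # ... one entry per target key, with its own inverse transformer
}

def reverse_field(target, target_key):
    inverse_transformer, source_key = inverse_of[target_key]
    return {source_key: inverse_transformer(target[target_key])}

# reverse_field({'name': 'Tom'}, 'name')  ->  {'first_name': 'tom'}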
So... the bottom line is, you have to do it yourself. Good luck with it.
I am a newbie reading Uncle Bob's Clean Code Book.
It is indeed great practice to limit the number of function arguments to as few as possible. But I still come across so many functions offered in libraries that require a bunch of arguments. For example, in Python's pandas, there is a function with 9 arguments:
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
(And this function also violates the advice about flag arguments)
It seems that such cases are much rarer in Python standard libraries, but I still managed to find one with 4 arguments:
re.split(pattern, string, maxsplit=0, flags=0)
I understand that this is just a suggestion rather than a silver bullet, but is it applicable to cases like the ones mentioned above?
Uncle Bob does not mention a hard limit on the number of arguments that would make your code smell, but I would consider 9 arguments too many.
Today's IDEs are much better at supporting the readability of the code; nevertheless, refactoring stays tricky, especially with a large number of identically typed arguments.
The suggested solution is to encapsulate the arguments in a single struct/object (depending on your language). In the given case, this could be a GroupingStrategy:
strategy = GroupingStrategy()
strategy.by = "Foo"
strategy.axis = 0
strategy.sorted = True
DataFrame.groupby(strategy)
All attributes that are not mentioned will be assigned their respective default values.
You could then also convert it to a fluent API:
DataFrame.groupby(GroupingStrategy.by("Foo").axis(0).sorted())
Or keep some of the arguments, if this feels better:
DataFrame.groupby("Foo", GroupingStrategy.default())
The first point to note is that all those arguments to groupby are relevant. You can reduce the number of arguments by having different versions of groupby but that doesn't help much when the arguments can be applied independently of each other, as is the case here. The same logic would apply to re.split.
It's true that integer "flag" arguments can be dodgy from a maintenance point of view: what happens if you want to change a flag value in your code? You have to hunt through and manually fix each case. The traditional approach is to use enums (which map numbers to words, e.g. a Day enum would have Day.Sun = 0, Day.Mon = 1, etc.). In compiled languages like C++ or C# this gives you the speed of using integers under the hood but the readability of using labels/words in your code. However, enums in Python are slow.
One rule that I think applies to any source code is to avoid "magic numbers", i.e. numbers which appear directly in the source code. The enum is one solution. Another solution is to have constant variables to represent the different flag settings. Python sort-of supports constants (uppercase variable names in constant.py which you then import), however they are constant only by convention; you can actually change their value :(
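A small sketch of both options (the module name constant.py follows the convention above; the Day enum is just an example):

# constant.py -- constants by convention only (nothing stops reassignment)
MAXSPLIT_UNLIMITED = 0

# elsewhere:
import enum

class Day(enum.IntEnum):
    # IntEnum members behave like ints, so they can replace raw flag values
    SUN = 0
    MON = 1
    TUE = 2

def book_meeting(day):
    if day == Day.MON:        # readable label instead of a magic number
        print("Monday meeting booked")

book_meeting(Day.MON)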
Some methods don't need to make a new variable, e.g. list.reverse() works like this:
lists = [123, 456, 789]
lists.reverse()
print(lists)
This method reverses the list in place (without a new variable).
Why are there various ways of producing values in Python?
Some calls, like variable.method().method2().method3(), can be chained, but type(variable) and print() cannot. Why can't we write variable.print() or variable.type()?
Are there any philosophical reasons for this in Python?
You may be confused by the difference between a function and a method, and by three different purposes to them. As much as I dislike using SO for tutorial purposes, these issues can be hard to grasp from other documentation. You can look up function vs method easily enough -- once you know it's a (slightly) separate issue.
Your first question is a matter of system design. Python merely facilitates what programmers want to do, and the differentiation is common to many (most?) programming languages since ASM and FORTRAN crawled out of the binary slime pools in the days when dinosaurs roamed the earth.
When you design how your application works, you need to make a lot of implementation decisions: individual variables vs a sequence, in-line coding vs functions, separate functions vs encased functions vs classes and methods, etc. Part of this decision making is what each function should do. You've raised three main types:
(1) Process this data -- take the given data and change it, rearrange it, whatever needs doing -- but I don't need the previous version, just the improved version, so just put the new stuff where the old stuff was. This is used almost exclusively when one variable is getting processed; we don't generally take four separate variables and change each of them. In that case, we'd put them all in a list and change the list (a single variable). reverse falls into this class.
One important note is that for such a function, the argument in question must be mutable (capable of change). Python has mutable and immutable types. For instance, a list is mutable; a tuple is immutable. If you wanted to reverse a tuple, you'd need to return a new tuple; you can't change the original.
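A quick illustration of that distinction:

nums = [1, 2, 3]
nums.reverse()              # list is mutable: reversed in place, returns None
print(nums)                 # [3, 2, 1]

t = (1, 2, 3)
t_rev = tuple(reversed(t))  # tuple is immutable: build and return a new tuple
print(t_rev)                # (3, 2, 1)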
(2) Tell me something interesting -- take the given data and extract some information. However, I'm going to need the originals, so leave them alone. If I need to remember this cool new insight, I'll put it in a variable of my own. This is a function that returns a value. sqrt is one such function.
(3) Interact with the outside world -- input or output data permanently. For output, nothing in the program changes; we may present the data in an easy-to-read format, but we don't change anything internally. print is such a function.
Much of this decision also depends on the function's designed purpose: is this a "verb" function (do something) or a noun/attribute function (look at this data and tell me what you see)?
Now you get the interesting job for yourself: learn the art of system design. You need to become familiar enough with the available programming tools that you have a feeling for how they can be combined to form useful applications.
See the documentation:
The reverse() method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence.
I have a function f(x), which does something and returns values (a tuple).
I have another function that calls this function after processing parameters (the details of that operation are irrelevant to the question); and now I would like to know whether there is anything evil in returning the function call directly, vs. running the function, dumping the output into a variable, and returning the variable.
A variable has a cost, and assigning a value to a variable has a cost; but besides that, is there any sorcery happening behind the scenes that would make one better than the other?
def myfunction(self):
    # [do something]
    return f(x)
is the same as
def myfunction(self):
    # [do something]
    b = f(x)
    return b
Or is one to be preferred over the other (and why)? I am talking purely from an OOP perspective, without considering that creating variables and assigning to them has a cost in terms of memory and CPU cycles.
That doesn't return the function. Returning the function would look like return f. You're returning the result of the function call. Generally speaking, the only reason to save that result before returning it is if you plan to do some other kind of processing on it before the return, in which case it's faster to just refer to a saved value rather than recomputing it. Another reason to save it would be for clarity, turning what might be a long one-liner with extensive chaining into several steps.
There's a possibility that those two functions might produce different results if you have some kind of asynchronous process that modifies your data in the background between saving the reference and returning it, but that's something you'll have to keep in mind based on your program's situation.
In a nutshell, save it if you want to refer to it, or just return it directly otherwise.
Those are practically identical; use whichever one you think is more readable. If the performance of one versus the other actually matters for you, perhaps Python is not the best choice ;).
The cost difference between these is utterly negligible: in the worst case, one extra dictionary store, one extra dictionary lookup and one extra string in memory. In practice it won't even be that bad, since CPython stores local variables in a C array, so it's more like two C-level pointer indirections.
As a matter of style, I would usually avoid the unnecessary variable, but it's possible that it might be better in particular cases. As a guideline, think about things like whether the amalgamated version leads to an excessively long line of code, whether the extra variable has a better name than e.g. result, and how clear it is that that function call is the result you need (and if it isn't, whether/how much a variable helps).
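If you want to see the (tiny) difference, the dis module makes it visible; roughly speaking, the version with the intermediate variable adds one store and one load of a local variable (a sketch; the exact bytecode varies between Python versions):

import dis

def direct():
    return len("abc")

def via_variable():
    b = len("abc")
    return b

dis.dis(direct)        # ... CALL, RETURN_VALUE
dis.dis(via_variable)  # ... CALL, STORE_FAST b, LOAD_FAST b, RETURN_VALUE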
I am looking for a way to rename all variables in a formula according to a given substitution map. I am currently using the substitute function, but it seems to be quite slow.
Is there another function I can use which is faster than it? Is there any other way of doing this quickly?
N.B. I am only substituting fresh variables to the variables in the original formula, so there are no renaming clashes. Is there any way to perform the renaming faster under this assumption?
For instance,
# given
f = And(Int('x') > Int('y'), Or(Int('x') - 5 >= Int('z'), Int('k') > 1))
# expected result after substitution
# f = And(Int('v0') > Int('v1'), Or(Int('v0') - 5 >= Int('v2'), Int('v3') > 1))
Is there any way to do it working on the context of f?
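For reference, this is roughly how I am doing the renaming at the moment (a simplified sketch using the z3 Python API):

from z3 import Int, And, Or, substitute

x, y, z, k = Int('x'), Int('y'), Int('z'), Int('k')
f = And(x > y, Or(x - 5 >= z, k > 1))

# map every original variable to a fresh one
renaming = [(x, Int('v0')), (y, Int('v1')), (z, Int('v2')), (k, Int('v3'))]
f_renamed = substitute(f, *renaming)
# And(v0 > v1, Or(v0 - 5 >= v2, v3 > 1))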
There isn't an inherently faster way over the API. I have a few comments regarding speed:
You seem to be using the Python API, which by itself has a huge overhead. It may help to time the portion spent in Python separately from the time spent in Z3.
The implementation of the substitute function uses a class that gets allocated on the stack. It is quite possible that making this class a persisted attribute on the context would speed up amortized time, because it would not be allocating and re-allocating memory repeatedly. I would have to profile an instance to be able to tell whether this change really pays off.
The more fundamental way to perform renaming is to work with implicit renaming, i.e. not apply substitution at all, but access variables at different offsets. This low-level way of dereferencing variables is not available in any form over the API, or even in the way we represent high-level expressions, so it is not going to be an option.
If your application allows it, you may be able to work with existing terms and encode substitutions implicitly. For example in some applications one can just add equality constraints between old and new variables.
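A sketch of that last idea, in case it fits your application (the variable names are purely illustrative):

from z3 import Int, Solver

x, y = Int('x'), Int('y')
v0, v1 = Int('v0'), Int('v1')

s = Solver()
s.add(x > y)               # original constraint over the old variables
s.add(v0 == x, v1 == y)    # tie fresh variables to the old ones by equalities

# further constraints and queries can now be phrased over v0, v1
# without rewriting the original formula at all
s.add(v0 > 0)
print(s.check())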
I'm quite new to Python (2.7) and have a question about the most Pythonic way to do something; my code (part of a class) looks like this (a somewhat naive version):
def calc_pump_height(self):
    for i in range(len(self.primary_)):
        for j in range(len(self.primary_)):
            if self.connections_[i][j].sub_kind_ in [1, 4]:
                self.calc_spec_pump_height(i, j)

def calc_spec_pump_height(self, i, j):
    pass
(obviously pass will be replaced by something else, manipulating attributes of the object of this class, without generating a return value)
I'd like to ask how I should do this: I could avoid the second function and write the extra code directly into the first function, getting rid of one function (Simple is better than complex), but creating a heavily nested function at the same time (Flat is better than nested).
I could also create some sort of list comprehension to avoid the double loop, e.g.:
def calc_pump_height(self):
    ra = range(len(self.primary_))
    [self.calc_spec_pump_height(i, j) for i, j in zip(ra, ra)]
(I'd have to move the if condition into the second function; this would also create a list full of None values, but I don't care about that, since calc_spec_pump_height is supposed to manipulate the object, not return something useful)
In essence: I'm iterating over a 2D list, testing each object for a certain characteristic and then do something with that object.
Which of the above methods is 'the best'? Or is there another way that I'm missing?
The key thing about functions/methods is that they should do one thing.
calc_pump_height implements two things: it finds elements in a 2D list that match some criteria, and then it calculates a value for each of those elements. It's ok for its purpose to be combining the other two operations, if that makes sense for the object's public API, but it's not ok for it to implement either or both of them itself.
Finding the elements that match the criteria is a discrete step; that should be a function.
Calculating your value is clearly a discrete step; that should be a function.
I would implement the element matcher as a (private) generator, that takes the test condition as an argument, and yields all matching elements. It's just an iterator over your data structure, masked by the logical test. You can wrap that in a named public method called get_1_4_subkinds() or something that makes more sense in your domain. That generalises the code and gives you the flexibility to implement other conditions in the future. Also, your i and j are tightly coupled, so it makes sense to pass them around as a single concept. Then your code becomes:
def calc_pump_height(self):
    for subkind_indices in self.get_1_4_subkinds():
        self.calc_spec_pump_height(subkind_indices)
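A sketch of what the private generator and its public wrapper could look like (the name _matching_indices is illustrative):

# (inside the same class)
def _matching_indices(self, predicate):
    # private generator: yield every (i, j) whose connection passes the test
    for i in range(len(self.primary_)):
        for j in range(len(self.primary_)):
            if predicate(self.connections_[i][j]):
                yield (i, j)

def get_1_4_subkinds(self):
    # public, domain-named wrapper around the generic matcher
    return self._matching_indices(lambda conn: conn.sub_kind_ in (1, 4))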
You have misunderstood “simplicity”:
write the extra code directly into the first function, getting rid of one function (Simple is better than complex)
That's not simple. Breaking complex sequences into discrete, focussed functions increases simplicity.
In that light, I would say that yes, you should definitely prefer calc_spec_pump_height as a separate function.
You can eliminate one level of nesting in your first function by using itertools.product to generate your i and j values at the same time (itertools.product(range(len(self.primary_)), repeat=2)). The zip you use in your second version won't work correctly; it will only yield identical pairs: 0,0, 1,1, 2,2, etc.
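A sketch of that variant:

import itertools

# replaces the body of the original calc_pump_height method
def calc_pump_height(self):
    n = len(self.primary_)
    # product(..., repeat=2) yields every (i, j) pair, removing the nested loop
    for i, j in itertools.product(range(n), repeat=2):
        if self.connections_[i][j].sub_kind_ in [1, 4]:
            self.calc_spec_pump_height(i, j)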
As for the overall design, you should not use a list comprehension if you don't care about the return value from the function you're calling. Use an explicit loop when it's the looping you want (rather than a list of computed values).
If there's a non-trivial amount of code that will go in calc_spec_pump_height, it makes perfect sense to make it as a separate method. If it's a one or two liner, then it might be OK to inline within calc_pump_height, but that method's loops and condition testing may be complicated enough already to justify factoring out the inner part of the algorithm.
You should usually think about splitting a big function up when it is too long to fit onto a single screen in your editor. That is about the limit of how many details (variable names, etc.) we can keep in our mind simultaneously. On the other hand, you shouldn't waste time (either your own programming time or function call overhead at run time) by factoring out every little piece of every problem. Factor part of a function out if you're using it from more than one place, or if you can't keep the details of the whole function in your head at once otherwise.
So, other than the (marginal) improvement of itertools.product and given the limited information you've provided about what calc_spec_pump_height will do, I think your code is already about as good as it can get!