Tracking changes in python source files?

Tracking changes in python source files? - python

I'm learning python and came into a situation where I need to change the behvaviour of a function. I'm initially a java programmer so in the Java world a change in a function would let Eclipse shows that a lot of source files in Java has errors. That way I can know which files need to get modified. But how would one do such a thing in python considering there are no types?! I'm using TextMate2 for python coding.
Currently I'm doing the brute-force way. Opening every python script file and check where I'm using that function and then modify. But I'm sure this is not the way to deal with large projects!!!
Edit: as an example I define a class called Graph in a python script file. Graph has two objects variables. I created many objects (each with different name!!!) of this class in many script files and then decided that I want to change the name of the object variables! Now I'm going through each file and reading my code again in order to change the names again :(. PLEASE help!
Example: File A has objects x,y,z of class C. File B has objects xx,yy,zz of class C. Class C has two instance variables names that should be changed Foo to Poo and Foo1 to Poo1. Also consider many files like A and B. What would you do to solve this? Are you serisouly going to open each file and search for x,y,z,xx,yy,zz and then change the names individually?!!!

Sounds like you can only code inside an IDE!
Two steps to free yourself from your IDE and become a better programmer.
Write unit tests for your code.
Learn how to use grep
Unit tests will exercise your code and provide reassurance that it is always doing what you wanted it to do. They make refactoring MUCH easier.
grep, what a wonderful tool grep -R 'my_function_name' src will find every reference to your function in files under the directory src.
Also, see this rather wonderful blog post: Unix as an IDE.

Whoa, slow down. The coding process you described is not scalable.
How exactly did you change the behavior of the function? Give specifics, please.
UPDATE: This all sounds like you're trying to implement a class and its methods by cobbling together a motley patchwork of functions and local variables - like I wrongly did when I first learned OO coding in Python. The code smell is that when the type/class of some class internal changes, it should generally not affect the class methods. If you're refactoring all your code every 10 mins, you're doing something seriously wrong. Step back and think about clean decomposition into objects, methods and data members.
(Please give more specifics if you want a more useful answer.)
If you were only changing input types, there might be no need to change the calling code.
(Unless the new fn does something very different to the old one, in which case what was the argument against calling it a different name?)
If you changed the return type, and you can't find a common ancestor type or container (tuple, sequence etc.) to put the return values in, then yes you need to change its caller code. However...
...however if the function should really be a method of a class, declare that class and the method already. The previous paragraph was a code smell that your function really should have been a method, specifically a polymorphic method.
Read about code smells, anti-patterns and When do you know you're dealing with an anti-pattern?. There e.g. you will find a recommendation for the video "Recovery from Addiction - A taste of the Python programming language's concision and elegance from someone who once suffered an addiction to the Java programming language." - Sean Kelly
Also, sounds like you want to use Test-Driven Design and add some unittests.
If you give us the specifics we can critique it better.

You won't get this functionality in a text editor. I use sublime text 3, and I love it, but it doesn't have this functionality. It does however jump to files and functions via its 'Goto Anything' (Ctrl+P) functionality, and its Multiple Selections / Multi Edit is great for small refactoring tasks.
However, when it comes to IDEs, JetBrains pycharm has some of the amazing re-factoring tools that you might be looking for.
The also free Python Tools for Visual Studio (see free install options here which can use the free VS shell) has some excellent Refactoring capabilities and a superb REPL to boot.
I use all three. I spend most of my time in sublime text, I like pycharm for refactoring, and I find PT4VS excellent for very involved prototyping.
Despite python being a dynamically typed language, IDEs can still introspect to a reasonable degree. But, of course, it won't approach the level of Java or C# IDEs. Incidentally, if you are coming over from Java, you may have come across JetBrains IntelliJ, which PyCharm will feel almost identical to.
One's programming style is certainly different between a statically typed language like C# and a dynamic language like python. I find myself doing things in smaller, testable modules. The iteration speed is faster. And in a dynamic language one relies less on IDE tools and more on unit tests that cover the key functionality. If you don't have these you will break things when you refactor.

One answer only specific to your edit:
if your old code was working and does not need to be modified, you could just keep old names as alias of the new ones, resulting in your old code not to be broken. Example:
class MyClass(object):
def __init__(self):
self.t = time.time()
# creating new names
def new_foo(self, arg):
return 'new_foo', arg
def new_bar(self, arg):
return 'new_bar', arg
# now creating functions aliases
foo = new_foo
bar = new_bar
if your code need rework, rewrite your common code, execute everything, and correct any failure. You could also look for any import/instantiation of your class.

One of the tradeoffs between statically and dynamically typed languages is that the latter require less scaffolding in the form of type declarations, but also provide less help with refactoring tools and compile-time error detection. Some Python IDEs do offer a certain level of type inference and help with refactoring, but even the best of them will not be able to match the tools developed for statically typed languages.
Dynamic language programmers typically ensure correctness while refactoring in one or more of the following ways:
Use grep to look for function invocation sites, and fix them. (You would have to do that in languages like Java as well if you wanted to handle reflection.)
Start the application and see what goes wrong.
Write unit tests, if you don't already have them, use a coverage tool to make sure that they cover your whole program, and run the test suite after each change to check that everything still works.

Related

How to apply Closed-Open and Inversion of Control principles in Python?

Building out a new application now and struggling a lot with the implementation part of "Closed-Open" and "Inversion of Control" principles I following after reading Clean Architecture book by Uncle Bob.
How can I implement them in Python?
Usually, these two principles coming hand in hand and depicted in the UML as an Interface reversing control from module/package A to B.
I'm confused because:
Python does not possess Interfaces as Java and C++ do. Yes, there are ABC and #abstractmethod, but it is not a Pythonic style and redundant from my point of view if you are not developing a framework
Passing a class to the method of another one (I understood that it is a way to implement open-closed principle) is a little bit dangerous in Python, since it does not have a compiler which is catching issues may (and will) happen if one of two loosely coupled objects change
After neglecting interfaces and passing a top-level class to lower-level ones... I still need to import everything somewhere at the top module. And by that, the whole thing is violated.
So, as you can see I'm super confused and having a hard time programming according to my design. I came up with. Can you help me, please?

You just pass an object that implements the methods you need it to implement.
True, there is no "Interface" to define what those methods have to be, but that's just the way it is in python.
You pass around arguments all the time that have to be lists, maps, tuples, or whatever, and none of these are type-checked. You can write code that calls whatever you want on these things and python will not notice any kind of problem until that code is actually executed.
It's exactly the same when you need those arguments to implement whatever IoC interface you're using. Make sure you detail the requirements in comments.
Yes, this is all pretty dangerous. That's why we prefer statically typed languages for large systems that have complex interfaces.

OOP programming in python

I was reading Dietel's C++ programming book. In this book they mention how a programmer should release only the interface part of his code and not the implementation.
So carrying this over to python:
I have 2 files:
1) the implementation file = accountClass.py and
2) the interface file = useAccountClass.py
I have compiled the implementation file and have obtained the .pyc file. So when I provide my code to someone else, I would provide him with the .pyc file and the interface file, right?
Also, if I provide someone else with ONLY the .pyc file, can I expect him to write the interface on his own? I'm going to say no. But there's this one nagging doubt that I have:
The creators of numpy and scipy did not share the implementation with us end users. And I don't think they shared any interfaces either. But we can still search for the different classes and their methods inside both numpy and scipy. So, using this example of numpy and scipy, I guess what I'm trying to ask is:
Is it possible for someone else to create an interface to my code if I provide him/ her with only the compiled implementation file (in this case accountClass.pyc)? How will that person know what classes and methods I have defined in my implementation? I mean, will they use the
if __name__ = "__main__" :
blah blah
or is there some other way??

You got that entirely wrong. Or perhaps it's a horrible book whose author got something seriously wrong. Code using other code should indeed, barring significant counterarguments, adhere to an interface and not care about the details of the implementation. However, even in the world of static compilation to machine code (e.g. C++), this does not mean you should lock away the source code of the implementation.
Whether someone has access to the implementation, and whether they make use of that knowledge while writing a specific piece of code, are completely different issues. Heck, even the author of the implementation can/should still program to an interface when working on other code (e.g. other modules). Likewise, even if you lock the implementation away from someone, they may very well rely on implementation quirks which are not part of the interface. If anyone in the world of static compilation to machine code provides only headers and object files, and not the source code, it's because the projects are closed source, not to encourage good programming practices among clients.
In Python, your question makes no sense - there are no "interface" and "implementation" files, there's just code which is run and defines functions, classes, and other values. There is no such thing as an interface file you'd provide. You provide an implementation - and (hopefully) documentation which details both interface and possibly implementation details. And once a module is imported, the class objects, function objects, and other objects, contain plenty of information (including, in many cases, the text from which large parts of the documentation was generated). This is also true for extension modules like numpy. And note that their implementation is accessible, it's just not included in all distributions because it's of little use. With Python code, you practically have to distribute the source code because anything else is platform-specific.
On a side note, .pyc files are pretty high level, and easily understood when disassembled (which is as easy as importing the module and running the stdlib module dis on any function inside). I consider this a minor technicality as it's already the wrong question to ask.

Deitel's advice to C++ programmers doesn't apply to Python, for a number of reasons:
Python isn't compiled to machine code, so no matter what form you provide the program in, it will be relatively easy for someone to read the code.
Python doesn't have .h and .c files, all you can provide is the .py or .pyc files.
Treating code as a secret is kind of silly anyway. What is in your code that you need to keep hidden from others?
Numpy and Scipy are largely implemented in C, which is why you don't have the source, for your own convenience. You can get the source if you like. The "interface" to that code is the module that you can import and then call.

You should not confuse "user interface" with "class interface". If you have a useAccountClass file, that file probably performs some task using the classes and methods defined in the accountClass file, if I understood right.
If you send the file to other person, they are not supposed to "guess" what your compiled class does. That's what DOCUMENTATION is for: a description of the functions contained in the module (compiled or not), which parameters they take, which values they return, and what they are expected to do, the "meaning" of the task they perform.
As an abstract example, let's suppose you have an image processing class. If that class has the function findCircles(image), the documentation should explain that it takes an image, possibly containing circles, and returns a list or array of coordinates of the centers of circles contained in the image. HOW the circles are detected is not important, you don't need to know that to use the function. Now if the function was called like findCircles(image, gaussian_threshold=10), the caller would have to know the function uses some "gaussian_threshold" parameter, that is, the caller would NEED to know about the function's entrails, and in OOP this is Not Good. If you decided to use another algorithm in the future, every code using that function would have to be rewritten, because the gaussian_threshold most probably wouldn't make sense anymore.
So, the interface, in OOP, is the abstraction used to communicate to the object only the canonical parameters or inputs it needs to know to perform a task in the language of the problem, not in the language of the implementation (that can change anytime).
The documentation, in this sense, is a contract that assures to the user (in this case, another developer) that the function will perform as expected if sane inputs are given to it.
Now the FINAL USER, a non-technical person wanting to use your program, would need the WHOLE working program (controls and views), not only the class definitions (the model).
Hope this helps, and I must recommend the books "Code Complete 2nd ed." and "Pragmatic Programmer - From Journeyman to Master" as VERY enlightening readings on the broad topic.

When should a Python script be split into multiple files/modules?

In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?

Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.

As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)

In Java ... every class requires its own file.
On the flipside, sometimes a Java file, also, will include enums or subclasses or interfaces, within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)

Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file i.e., the .py file in Python is
equivalent to a package in Java as in it can contain many related classes and functions.
For good examples, have a look in the Python built-in modules.
Just download the source and check them out, the rule of thumb I follow is
when you have very tightly coupled classes or functions you keep them in a single file
else you break them up.

Protection from accidentally misnaming object attributes in Python?

A friend was "burned" when starting to learn Python, and now sees the language as perhaps fatally flawed.
He was using a library and changed the value of an object's attribute (the class being in the library), but he used the wrong abbreviation for the attribute name. It took him "forever" to figure out what was wrong. His objection to Python is thus that it allows one to accidentally add attributes to an object.
Unit tests don't provide a solution to this. One doesn't write unit tests against an API being used. One may have a mock for the class, but the mock could have the same typo or incorrect assumption about the attribute name.
It's possible to use __setattr__() to guard against this, but (as far as I know), no one does.
The only thing I've been able to tell my friend is that after several years of writing Python code full-time, I don't recall ever being burned by this. What else can I tell him?

"changed the value of an object's attribute" Can lead to problems. This is pretty well known. You know it, now, also. That doesn't indict the language. It simply says that you've learned an important lesson in dynamic language programming.
Unit testing absolutely discovers this. You are not forced to mock all library classes. Some folks say it's only a unit test when it's tested in complete isolation. This is silly. You have to trust the library modules -- it's a feature of your architecture. Rather than mock them, just use them. (It is important to write mocks for your own newly-developed libraries. It's also important to mock libraries that make expensive API calls.)
In most cases, you can (and should) test your classes with the real library modules. This will find the misspelled attribute name.
Also, now that you know that attributes are dynamic, it's really easy to verify that the attribute exists. How?
Use interactive Python to explore the classes before writing too much code.
Remember, Python is not Java and it's not C. You can execute Python interactively and determine immediately if you've spelled something wrong. Writing a lot of code without doing any interactive confirmation is -- simply -- the wrong way to use Python.
A little interactive exploration will find misspelled attribute names.
Finally -- for your own classes -- you can wrap updatable attributes as properties. This makes it easier to debug any misspelled attribute names. Again, you know to check for this. You can use interactive development to confirm the attribute names.
Fussing around with __setattr__ creates problems. In some cases, we actually need to add attributes to an object. Why? It's simpler than creating a whole subclass for one special case where we have to maintain more state information.
Other things you can say:
I was burned by a C program that absolutely could not be made to work because of ______. [Insert any known C-language problem you want here. No array bounds checking, for example] Does that make C fatally flawed?
I was burned by a DBA who changed a column name and all the SQL broke. It's painful to unit test all of it. Does that make the relational database fatally flawed?
I was burned by a sys admin who changed a directory's permissions and my application broke. It was nearly impossible to find. Does that make the OS fatally flawed?
I was burned by a COBOL program where someone changed the copybook, forgot to recompile the program, and we couldn't debug it because the source looked perfect. COBOL, however, actually is fatally flawed, so this isn't a good example.

There are code analyzers like pylint that will warn you if you add a attribute outside of __init__. PyDev has nice support for it. Such errors are very easy to find with a debugger too.

If the possibility to make mistakes is enough for him to consider a language "fatally flawed", I don't think you can convince him otherwise. The more you can do with a language, the more you can do wrong with the language. It's a caveat of flexibility—but that's true for any language.

You can use the __slots__ class attribute to limit the attributes that instances have. Attempting to set an attribute that's not expliticly listed will raise an AttributeError. There are some complications that arise with subclassing. See the Python data model reference for details.

A tool like pylint or pychecker may be able to detect this.

He's effectively ruling out an entire class of programming languages -- dynamically-typed languages -- because of one hard lesson learned. He can use only statically-typed languages if he wishes and still have a very productive career as a programmer, but he is certainly going to have deep frustrations with them as well. Will he then conclude that they are fatally-flawed?

I think your friend has misplaced his frustration in the language. His real problem is lack of debugging techniques. teach him how to break down a program into small pieces to examine the output. like a manual unit test, this way any inconsistency is found and any assumptions are proven or discarded.

I had a similar bad experience with Python when I first started ... took me 3 months to get over it. Having a tool which warns would be nice back then ...

What's the best way to record the type of every variable assignment in a Python program?

Python is so dynamic that it's not always clear what's going on in a large program, and looking at a tiny bit of source code does not always help. To make matters worse, editors tend to have poor support for navigating to the definitions of tokens or import statements in a Python file.
One way to compensate might be to write a special profiler that, instead of timing the program, would record the runtime types and paths of objects of the program and expose this data to the editor.
This might be implemented with sys.settrace() which sets a callback for each line of code and is how pdb is implemented, or by using the ast module and an import hook to instrument the code, or is there a better strategy? How would you write something like this without making it impossibly slow, and without runnning afoul of extreme dynamism e.g side affects on property access?

I don't think you can help making it slow, but it should be possible to detect the address of each variable when you encounter a STORE_FAST STORE_NAME STORE_* opcode.
Whether or not this has been done before, I do not know.
If you need debugging, look at PDB, this will allow you to step through your code and access any variables.
import pdb
def test():
print 1
pdb.set_trace() # you will enter an interpreter here
print 2

What if you monkey-patched object's class or another prototypical object?
This might not be the easiest if you're not using new-style classes.

You might want to check out PyChecker's code - it does (i think) what you are looking to do.

Pythoscope does something very similar to what you describe and it uses a combination of static information in a form of AST and dynamic information through sys.settrace.
BTW, if you have problems refactoring your project, give Pythoscope a try.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.