I've got a Codebase of around 5,3k LOC with around 30 different classe. The code is already very well formatted and I want to improve it further by prefixing methods that are only called in the module that were defined in with a "_", in order to indicate that. Yes it would have been a good idea to do that from the beginning on but now it's too late :D
Basically I'm searching for a tool that will tell me if a method is not called outside of the module it was defined in, I'm not looking for stuff that will automatically convert the whole thing to use underscores, just a "simple" thing that tells me where I have to look for prefixing stuff.
I'd took a look at the AST module, but there's no easy way to get a list of method definitions and calls, also parsing the plain text yields just too many false positives. I don't insist in spending day(s) on reinventing the wheel when there might be an already existing solution to my problem.
For me, this sounds like special case of coverage.
Thus I'd take a look at coverage.py or figleaf and modify it to ignore inter-module calls.
Related
I've written a fair bit of my first significant Python script. I just finished reading PEP 8, and I learned that lower_case_with_underscores is preferred for instance variable names. I've been using mixedCase for variable names throughout, and I'd like my code to be make more Pythonic by changing those to lower_case_with_underscores if that's how we do things around here.
I could probably write some script that searches for mixedCase and tries to smartly replace it, but before I potentially reinvent the wheel, my question is whether a solution for that already exists, either within a Python-savvy editor or as a standalone application; or whether there's another approach that would accomplish the task of converting all mixedCase variable names to lower_case_with_underscores. I have searched a fair bit for a solution but didn't turn up anything. Any technique that specifically would yield this result would be appreciated.
I was able to accomplish what I wanted using GNU sed:
sed -i -e :loop -re 's/(^|[^A-Za-z_])([a-z0-9_]+)([A-Z])([A-Za-z0-9_]*)'\
'([^A-Za-z0-9_]|$)/\1\2_\l\3\4\5/' -e 't loop' myFile.py
It finds every instance of mixedCase -- but not CapitalWords, so class names are left intact -- and replaces it with lower_case_with_underscores; e.g. myVariable becomes my_variable, but MyClass remains MyClass.
(On an unrelated note, now that I've done it, I think I prefer the appearance of mixedCase over lower_case_with_underscores. The appearance of so many underscores all over my code is weird. But I'll do things the Python Way and see if I get used to it, particularly if I intend for my code to be seen or worked with by others; or maybe I'll do it the way I like and I've now got a simple way of converting it to the PEP 8 way if I intend to make the code public.)
First of all, I'm not sure I'm asking for something that is doable.
Sometimes, when I have to do some refactoring in a large codebase, I need to change the output or even the signature of a certain function. Before than I do that, I have to make sure that that output is handled correctly by the other functions that are going to call it.
The way I do it is by searching for the function name, say get_timezone_from user and then I see that function is used by two other functions called format and change_timezone_for_user. And I start looking for where those two functions are used etc... until I end up with a graph where each path looks like a stacktrace.
There are two problems.
The first one is that doing this manually is pretty time consuming.
The second is that when I look at all occurrences of a function name like format, I will often find occurences of the word format in different contexts. It can be a comment or even another function defined somewhere else.
My question is simple:
Is there an IDE or a tool that allows to find where in your code a certain function is called? Even better would be if that tool is recursive so that it draws the whole graph.
I'm trying to achieve this in Python if that helps.
I have many python files.
Each file has functions. That's it. I don't do anything with classes.
Will Sphinx (or another documentation software) still create the documentation for it, and if so, what will it look like?
Yes, of course. That's one of the many normal ways to program in Python. You don't have to use classes -- in fact, many of the modules that come with Python are just collections of functions. Any Python documentation tool will be prepared to deal with this.
Ideally, Python functions and classes are all documented in a specific way that the language (and documentation-generating software) can see, called a docstring. If you put a string immediately after every function definition, like this:
def double(number):
"""Returns number times 2. May behave poorly if number isn't actually a number"""
return number * 2;
...then you can get the function double's docstring from double.__doc__.
For a more detailed description of how to write good docstrings, see PEP8. Search for "Documentation Strings" to skip to the part you care about.
In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?
Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.
As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)
In Java ... every class requires its own file.
On the flipside, sometimes a Java file, also, will include enums or subclasses or interfaces, within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)
Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file i.e., the .py file in Python is
equivalent to a package in Java as in it can contain many related classes and functions.
For good examples, have a look in the Python built-in modules.
Just download the source and check them out, the rule of thumb I follow is
when you have very tightly coupled classes or functions you keep them in a single file
else you break them up.
I have a class called Path for which there are defined about 10 methods, in a dedicated module Path.py. Recently I had a need to write 5 more methods for Path, however these new methods are quite obscure and technical and 90% of the time they are irrelevant.
Where would be a good place to put them so their context is clear? Of course I can just put them with the class definition, but I don't like that because I like to keep the important things separate from the obscure things.
Currently I have these methods as functions that are defined in a separate module, just to keep things separate, but it would be better to have them as bound methods. (Currently they take the Path instance as an explicit parameter.)
Does anyone have a suggestion?
If the method is relevant to the Path - no matter how obscure - I think it should reside within the class itself.
If you have multiple places where you have path-related functionality, it leads to problems. For example, if you want to check if some functionality already exists, how will a new programmer know to check the other, less obvious places?
I think a good practice might be to order functions by importance. As you may have heard, some suggest putting public members of classes first, and private/protected ones after. You could consider putting the common methods in your class higher than the obscure ones.
If you're keen to put those methods in a different source file at any cost, AND keen to have them at methods at any cost, you can achieve both goals by using the different source file to define a mixin class and having your Path class import that method and multiply-inherit from that mixin. So, technically, it's quite feasible.
However, I would not recommend this course of action: it's worth using "the big guns" (such as multiple inheritance) only to serve important goals (such as reuse and removing repetition), and separating methods out in this way is not really a particularly crucial goal.
If those "obscure methods" played no role you would not be implementing them, so they must have SOME importance, after all; so I'd just clarify in docstrings and comments what they're for, maybe explicitly mention that they're rarely needed, and leave it at that.
I would just prepend the names with an underscore _, to show that the reader shouldn't bother about them.
It's conventionally the same thing as private members in other languages.
Put them in the Path class, and document that they are "obscure" either with comments or docstrings. Separate them at the end if you like.
Oh wait, I thought of something -- I can just define them in the Path.py module, where every obscure method will be a one-liner that will call the function from the separate module that currently exists. With this compromise, the obscure methods will comprise of maybe 10 lines in the end of the file instead of 50% of its bulk.
I suggest making them accessible from a property of the Path class called something like "Utilties". For example: Path.Utilities.RazzleDazzle. This will help with auto-completion tools and general maintenance.