Overview of used and unused functions in a Python / Django application

The point of this question is to gather approaches: how would you
proceed to determine whether a function and/or class is in use
anywhere in an application?
Background information
The application is 3-5 years old. It was originally based on Python 2.4 (upgraded over the years to the latest 2.7.11) and Django 1.0 (upgraded over the years to 1.4.22), plus some custom frameworks that implement some Ruby on Rails magic (create a controller file with function names and they turn into HTTP endpoints; functions with _ in front are not visible to the end user). The total number of endpoints, derived from django.url*, tells me I would have to manually go through around 100 endpoints for various needs and purposes. The number of Django apps/modules is around 20; they are entangled with each other, and I know not all of them are used. But here's the thing: how would I proceed to gather information that tells me which functions are used and which are not, so I could refactor the code and reduce noise?
I've used PyCharm and its inspections, but because of how the application (and Python itself) works, some of its suggestions break the application.
Example of the above: some functions in models and views don't use self, so PyCharm thinks "well, this function can be changed to a static method", but somewhere else in the code the previous developer calls self.function_name, and that call actually means "please provide me with self and the arguments".
TL;DR: how do I proceed to weed out dead and unused code in an easy and efficient way? Thanks in advance for all input.
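One approach worth sketching (this is not from the question itself): run the application under the third-party coverage package while exercising it — via the test suite, by clicking through the site, or by replaying traffic — and then report which lines were never executed. Static dead-code finders such as vulture attack the same problem from the other direction. The package name "myproject" below is a placeholder:

# Sketch: find never-executed code by running the app under coverage.
# Assumes the third-party "coverage" package; "myproject" is a placeholder.
import coverage

cov = coverage.Coverage(source=["myproject"])
cov.start()

# ... exercise the application here: run the test suite, click through
# the site, or replay production traffic against a dev server ...

cov.stop()
cov.save()
cov.report(show_missing=True)  # lines never hit are dead-code candidates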

Related

Global Reliance Making Unit Testing Impossible

Background:
I am a hobbyist Python coder whose first and only project is a text-based Adventure Game Engine (https://github.com/zig13/scenzig).
An adventure game is a common first project, but I realised I didn't have the creativity to write an adventure, so I decided instead to write a framework so someone else could do the adventure bit.
It started out as a monolithic script with no functions - at one point 712 lines long - but as I have learnt Python it has been repeatedly rewritten.
I recently learned of the concepts of unit testing and logging and I think they would be really helpful to me going forward but I have hit a snag.
The majority of my functions need access to the Adventure files and/or Character file to function.
These are accessed as a dictionary (thanks to ConfigObj) and a reference to the dictionaries is passed among the modules like this:
def GiveAdv(a):
    global adv
    adv = a
    efunc.GiveAdv(a)

def GiveChar(c):
    global char
    char = c
    argsolve.GiveChar(char)
    efunc.GiveChar(c)
i.e. after importing another module, a module passes on its reference to the adventure files and character files. Later in the same module, adv.f['Actions'][str(action)] is used to get the information on a specific action from the Actions file, for example.
I've never liked this, and it was fine until now, but it has become a problem because:
I am trying to merge two modules, rewriting much from scratch in the process, and I have no way of testing as I go because everything is so interconnected.
Unit testing relies on a function taking arguments and producing outputs, but most of mine also expect access to information beyond their arguments.
As I see it, logging (using the logging module) would require somehow passing a reference to the log file around, and I just can't believe that people would resort to my horrible "Give" function solution, so there must be a better way.
What is the Pythonic way of giving disparate modules access to the same information, class instances and loggers?
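A sketch of the conventional answer, with invented names (GameState, describe_action): for logging, each module creates its own logger via logging.getLogger(__name__), so no hand-off is needed at all; for shared data, pass one explicit state object into the functions that need it instead of smuggling references through module globals.

import logging

# Each module gets its own logger; logging.basicConfig(), called once
# in the entry point, decides where all of them write.
log = logging.getLogger(__name__)

class GameState(object):
    """Explicit bundle of shared state instead of per-module globals."""
    def __init__(self, adv, char):
        self.adv = adv    # adventure files (the ConfigObj-backed object)
        self.char = char  # character file

def describe_action(state, action):
    """Functions take the state they need as an argument."""
    log.debug("resolving action %s", action)
    return state.adv.f['Actions'][str(action)]

A unit test can then build a tiny GameState around stub objects and call describe_action directly, with no "Give" plumbing anywhere.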

How do I get the local variables defined in a Python function?

I'm working on a project to do some static analysis of Python code. We're hoping to encode certain conventions that go beyond questions of style or detecting code duplication. I'm not sure this question is specific enough, but I'm going to post it anyway.
A few of the ideas that I have involve being able to build a certain understanding of how the various parts of source code work so we can impose these checks. For example, in part of our application that's exposing a REST API, I'd like to validate something like the fact that if a route is defined as a GET, then arguments to the API are passed as URL arguments rather than in the request body.
I'm able to get something like that to work by pulling all the routes, which are pretty nicely structured, and there are guarantees of consistency given the route has to be created as a route object. But once I know that, say, a given route is a GET, figuring out how the handler function uses arguments requires some degree of interpretation of the function source code.
Naïvely, something like inspect.getsourcelines will allow me to get the source code, but on further examination that's not the best solution because I immediately have to build interpreter-like features, such as figuring out whether a line is a comment, and then do something like use regular expressions to hunt down places where state is moved from the request context to a local variable.
Looking at tools like PyLint, they seem mostly focused on high-level "universals" of static analysis, and (at least on superficial inspection) don't have obvious ways of extracting this sort of understanding at a lower level.
Is there a more systematic way to get this representation of the source code, either with something in the standard library or with another tool? Or is the only way to do this to write a mini-interpreter that serves my purposes?
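One systematic option is the standard library's ast module, which parses source into a syntax tree without executing anything, so there is no need for comment detection or regular expressions. A minimal sketch (the handler function and its names are invented):

# Sketch: use the stdlib ast module to inspect a function without running it.
import ast
import inspect

def local_assignments(func):
    """Return names assigned to locals in func, found by walking its AST."""
    source = inspect.getsource(func)
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names

def handler(request):  # toy example handler; never actually called
    user = request.args.get('user')
    limit = int(request.args.get('limit', 10))
    return user, limit

print(local_assignments(handler))  # e.g. {'user', 'limit'}

If all you need are the names, handler.__code__.co_varnames (handler.func_code.co_varnames in Python 2) already lists the arguments and locals without any parsing at all.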

Tracking changes in python source files?

I'm learning Python and ran into a situation where I need to change the behaviour of a function. I was originally a Java programmer, and in the Java world a change to a function makes Eclipse show that a lot of Java source files have errors; that way I know which files need to be modified. But how would one do such a thing in Python, considering there are no types?! I'm using TextMate2 for Python coding.
Currently I'm doing it the brute-force way: opening every Python script file, checking where I'm using that function, and then modifying it. But I'm sure this is not the way to deal with large projects!!!
Edit: as an example, I define a class called Graph in a Python script file. Graph has two instance variables. I created many objects of this class (each with a different name!!!) in many script files and then decided that I want to change the names of the instance variables! Now I'm going through each file and reading my code again in order to change the names :(. PLEASE help!
Example: file A has objects x, y, z of class C. File B has objects xx, yy, zz of class C. Class C has two instance variables whose names should change, Foo to Poo and Foo1 to Poo1. Also consider many files like A and B. What would you do to solve this? Are you seriously going to open each file, search for x, y, z, xx, yy, zz, and then change the names individually?!!!
Sounds like you can only code inside an IDE!
Two steps to free yourself from your IDE and become a better programmer.
Write unit tests for your code.
Learn how to use grep.
Unit tests will exercise your code and provide reassurance that it is always doing what you wanted it to do. They make refactoring MUCH easier.
grep, what a wonderful tool. grep -R 'my_function_name' src will find every reference to your function in files under the directory src.
Also, see this rather wonderful blog post: Unix as an IDE.
Whoa, slow down. The coding process you described is not scalable.
How exactly did you change the behavior of the function? Give specifics, please.
UPDATE: This all sounds like you're trying to implement a class and its methods by cobbling together a motley patchwork of functions and local variables — like I wrongly did when I first learned OO coding in Python. The giveaway is that when the type/class of some class internal changes, it should generally not affect the class's methods. If you're refactoring all your code every 10 minutes, you're doing something seriously wrong. Step back and think about a clean decomposition into objects, methods and data members.
(Please give more specifics if you want a more useful answer.)
If you were only changing input types, there might be no need to change the calling code.
(Unless the new function does something very different to the old one, in which case what was the argument against giving it a different name?)
If you changed the return type, and you can't find a common ancestor type or container (tuple, sequence, etc.) to put the return values in, then yes, you need to change the calling code. However...
...however, if the function should really be a method of a class, declare that class and the method already. The situation in the previous paragraph is a code smell indicating that your function really should have been a method, specifically a polymorphic method.
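As an invented illustration of that point:

# Invented illustration: a polymorphic method lets calling code survive
# changes to the internals of individual implementations.
class Circle(object):
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r ** 2

class Square(object):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side ** 2

for shape in [Circle(2), Square(3)]:
    print(shape.area())  # caller code is identical for every shape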
Read about code smells, anti-patterns and When do you know you're dealing with an anti-pattern?. There e.g. you will find a recommendation for the video "Recovery from Addiction - A taste of the Python programming language's concision and elegance from someone who once suffered an addiction to the Java programming language." - Sean Kelly
Also, sounds like you want to use Test-Driven Design and add some unittests.
If you give us the specifics we can critique it better.
You won't get this functionality in a text editor. I use sublime text 3, and I love it, but it doesn't have this functionality. It does however jump to files and functions via its 'Goto Anything' (Ctrl+P) functionality, and its Multiple Selections / Multi Edit is great for small refactoring tasks.
However, when it comes to IDEs, JetBrains PyCharm has some of the amazing refactoring tools that you might be looking for.
Python Tools for Visual Studio, which is also free (see free install options here; it can use the free VS Shell), has some excellent refactoring capabilities and a superb REPL to boot.
I use all three. I spend most of my time in sublime text, I like pycharm for refactoring, and I find PT4VS excellent for very involved prototyping.
Despite python being a dynamically typed language, IDEs can still introspect to a reasonable degree. But, of course, it won't approach the level of Java or C# IDEs. Incidentally, if you are coming over from Java, you may have come across JetBrains IntelliJ, which PyCharm will feel almost identical to.
One's programming style is certainly different between a statically typed language like C# and a dynamic language like python. I find myself doing things in smaller, testable modules. The iteration speed is faster. And in a dynamic language one relies less on IDE tools and more on unit tests that cover the key functionality. If you don't have these you will break things when you refactor.
One answer specific to your edit only:
If your old code was working and does not need to be modified, you can just keep the old names as aliases of the new ones, so that your old code is not broken. Example:
import time

class MyClass(object):
    def __init__(self):
        self.t = time.time()

    # creating new names
    def new_foo(self, arg):
        return 'new_foo', arg

    def new_bar(self, arg):
        return 'new_bar', arg

    # now creating aliases for the old function names
    foo = new_foo
    bar = new_bar
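A quick demonstration of the aliasing:

obj = MyClass()
print(obj.foo(1))      # ('new_foo', 1) - the old name still works
print(obj.new_foo(1))  # ('new_foo', 1) - same underlying method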
If your code needs rework, rewrite your common code, execute everything, and correct any failures. You could also look for every import/instantiation of your class.
One of the tradeoffs between statically and dynamically typed languages is that the latter require less scaffolding in the form of type declarations, but also provide less help with refactoring tools and compile-time error detection. Some Python IDEs do offer a certain level of type inference and help with refactoring, but even the best of them will not be able to match the tools developed for statically typed languages.
Dynamic language programmers typically ensure correctness while refactoring in one or more of the following ways:
Use grep to look for function invocation sites, and fix them. (You would have to do that in languages like Java as well if you wanted to handle reflection.)
Start the application and see what goes wrong.
Write unit tests, if you don't already have them, use a coverage tool to make sure that they cover your whole program, and run the test suite after each change to check that everything still works.
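As a minimal sketch of that third point, using the question's Graph example (the module name and attributes are assumed here):

# test_graph.py - minimal unittest sketch (names are assumptions)
import unittest
from graph import Graph

class TestGraph(unittest.TestCase):
    def test_renamed_attributes(self):
        g = Graph()
        # After renaming Foo -> Poo, the test pins down the new interface;
        # any caller the rename missed will fail loudly when the suite runs.
        self.assertTrue(hasattr(g, 'Poo'))
        self.assertFalse(hasattr(g, 'Foo'))

if __name__ == '__main__':
    unittest.main()

Running the suite as coverage run -m unittest discover followed by coverage report (with the third-party coverage package) shows which parts of the program the tests never touched.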

Understand programmatically a python code without executing it

I am implementing a workflow management system, where the workflow developer overrides a small process function and inherits from a Workflow class. The class offers a method named add_component in order to add a component to the workflow (a component is the execution of a piece of software, or can be something more complex).
My Workflow class in order to display status needs to know what components have been added to the workflow. To do so I tried 2 things:
Execute the process function twice: the first pass gathers all the required components, the second one is the real execution. The problem is that if the workflow developer does anything other than adding components (adding a record to a database, creating a file), it will be done twice!
Parse the Python code of the function to extract only the add_component lines. This works, but if a component sits inside an if/else statement and should not be executed, it still appears in the monitoring!
I'm wondering if there is another solution (I thought about making my workflow an XML file or similar to make it easier to parse, but that is less flexible).
You cannot know what a program does without "executing" it (you could execute it in some context where you mock the things you don't want modified, but that looks like shooting at a moving target).
If you do hand-made parsing, there will always be some cases you miss.
You should break the code into two functions:
a first one where the code can only add_component(s), without any side effects, but with the possibility to run real code to check the environment etc. and decide which components to add;
a second one that can have side effects and relies on the added components.
A sketch of this split is shown below.
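Assuming a hypothetical Workflow base class (all names here are invented for illustration):

class Workflow(object):
    def __init__(self):
        self.components = []

    def add_component(self, component):
        # record the component; monitoring reads this list
        self.components.append(component)

    def plan(self):
        # to be overridden: side-effect free, may inspect the
        # environment and call add_component()
        raise NotImplementedError

    def execute(self):
        # to be overridden: the real run, side effects allowed
        raise NotImplementedError

class MyWorkflow(Workflow):
    def plan(self):
        self.add_component('align')
        self.add_component('report')

    def execute(self):
        for component in self.components:
            print('running %s' % component)

wf = MyWorkflow()
wf.plan()     # monitoring can display wf.components at this point
wf.execute()  # real execution; side effects happen exactly once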
Using XML (or any static format) is similar, except:
you are certain there are no side effects (you don't need to rely on the programmer respecting the documentation);
you get much less flexibility, so be sure you actually need that flexibility.

global variables in web2py [closed]

I want to understand how global variables are implemented in web2py web framework.
I assume that a reader knows the structure of web2py app.
I don't understand how the variable db is available in every controller .py file without any import statement.
I found that db is created in models/db.py and bound to current.db.
How is db made globally available without any import?
Thank you!
This is right there in the docs:
Models, controllers, and views are executed in an environment where the following objects are already imported for us:
(That's the reference docs; the tutorial says effectively the same thing, but without the complete list of everything that's imported.)
If you want to know how that works, the basic concept is simple: web2py doesn't just run your web app as a standalone script; it loads and executes your code the way it wants to. If you want full details, see the source. (From compileapp.py, it looks like they're part-way through publicly exposing the interfaces for loading applications and their components with an environment, but haven't gotten there yet.)
If you want to know different ways you could do something similar, there are two basic ways.
The hacky solution is to skip import entirely, and use exec to run the code within a custom globals. (Slightly better, you can compile the file (even using the standard .pyc caching mechanism if you want), and then exec the resulting code object.) This makes sense if you want to run the module directly in the top level namespace, or if you need more isolation than modules can give you and plan to build it yourself. But usually it's not what you want.
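A toy sketch of that hacky approach (this is only an illustration, not how web2py is actually implemented):

# Toy sketch: execute a "controller" file inside a prepared namespace,
# so names like db are simply present, with no import in the controller.
controller_source = """
def index():
    return db  # 'db' was injected by the framework, never imported
"""

environment = {'db': 'I am the database connection'}  # pre-populated globals
exec(controller_source, environment)                  # run controller in env

print(environment['index']())  # -> I am the database connection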
The other solution is to intercept part of the import process. For simple cases, it's just a matter of calling __import__ with a custom globals.
But a framework that's doing this often needs to customize a lot more. Python 3.3+ makes this relatively easily; if you want to be compatible with a wide range of Python versions, you end up rewriting large chunks of the import process yourself, which I'm guessing is what web2py does.
In web2py, model files are executed in an environment that has been populated with many of the framework API objects. The controller is then executed in that same environment after the model files have run, so any objects created in the models will be available in the controller (and the view).
For more details, check out the Workflow section in the book and the end of the Dispatching section.
Note, items such as db that are defined in a model file are not added to the current thread local object by web2py, though you can explicitly add them yourself in the app code.
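For example, a common web2py pattern, sketched from memory (the DAL line is just the usual model-file definition):

# In models/db.py: make db reachable from module code as well (sketch).
from gluon import current

db = DAL('sqlite://storage.sqlite')  # DAL is injected into the model environment
current.db = db  # modules can now do: from gluon import current; current.db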
