I'm working on a huge Python module, like this:
import millions, and, billions, of, modules...
...lots of functions...
def myfunc():
    ...with a huge body...
...more functions
I'd like to extract myfunc to its own module. However, tracking down all the imports I need is actually pretty tedious. Is there a way to do this automatically using Eclipse? I'm using Eclipse 3.7.0 with Aptana Studio plugin (and hence PyDev). There's an "extract method" refactoring tool, but it doesn't do this.
Ok, maybe this is easier than I thought:
1. Copy the function definition to a new file.
2. Copy the entire imports section to that new file.
3. Unused imports are shown with a wiggly yellow underline; delete them.
4. Delete the function definition from the original file, replacing it with an import of the relocated function so existing calls still resolve.
5. Now you have unused imports here too, so delete those as per step 3.
It's not automatic, but it's relatively straightforward and painless.
For years, I've known that the very definition of a Python module is as a separate file. In fact, even the official documentation states that "a module is a file containing Python definitions and statements". Yet, this online tutorial from people who seem pretty knowledgeable states that "a module usually corresponds to a single file". Where does the "usually" come from? Can a Python module consist of multiple files?
Not really.
Don't read too much into the phrasing of one short throwaway sentence, in a much larger blog post that concerns packaging and packages, both of which are by nature multi-file.
Imports do not make modules multifile
By the logic that modules are multi-file because of imports, almost any Python module is multi-file, unless the imports come from the module's own subtree, which makes no discernible difference to code using the module. That notion of subtree imports, btw, is relevant... to Python packages.
__module__, the attribute found on classes and functions, also maps to one file, as determined by import path.
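For illustration, textwrap is a single-file stdlib module:

import textwrap

def f():
    pass

print(f.__module__)                # '__main__': the module this script runs as
print(textwrap.dedent.__module__)  # 'textwrap'
print(textwrap.__file__)           # .../textwrap.py: one module, one file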
The usefulness of expanding the definition of modules that way seems… limited, and risks confusion. Let imports be imports and modules be modules (i.e. files).
But that's like, my personal opinion.
Let's go all language lawyer on it
And refer to the Python tutorial. I figure it will be talking about modules at some point and will be much more careful in its wording than a blog post that was primarily concerned with another subject.
6. Modules
To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module; definitions from a module can be imported into other modules or into the main module (the collection of variables that you have access to in a script executed at the top level and in calculator mode).
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module’s name (as a string) is available as the value of the global variable __name__.
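That last sentence is easy to see in action:

# demo.py
print(__name__)   # prints 'demo' when imported, '__main__' when run directly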
p.s. OK, what about calling it a file, instead of a module, then?
That supposes that you store Python code in a file system. But you could have an exotic environment that stores it in a database instead (or embeds it in a larger C/Rust executable?). So "module" seems better understood as a "contiguous chunk of Python code". Usually that's a file, but having a separate term allows for flexibility without changing any of the core concepts.
Yup, a Python module can involve more than one file. Basically, you would have one file with the main code of the module you are writing, and in that main file you pull in some other tools you can use.

For example, you can have the file my_splitter_module.py, in which you have, say, a function that takes a list of integers and splits it in half, producing two lists. Now say you want to multiply all the numbers in the first half with each other ([1, 2, 3] -> 1 * 2 * 3), but sum the numbers in the other half ([1, 2, 3] -> 1 + 2 + 3). Say you don't want to make the code messy, so you decide to write two more functions: one that takes a list and multiplies its items, and another that sums them.

Of course, you could put those two functions in the same my_splitter_module.py file, but in other situations, when you have big files with big classes and so on, you would rather make files like multiply_list.py and sum_list.py and import them into my_splitter_module.py.

In the end, you would import my_splitter_module.py into your main.py file, and in doing so you would also be importing the multiply_list.py and sum_list.py files, as sketched below.
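A minimal sketch of that layout (the function names here are made up for illustration):

# multiply_list.py
from functools import reduce

def multiply_items(numbers):
    return reduce(lambda a, b: a * b, numbers, 1)

# sum_list.py
def sum_items(numbers):
    return sum(numbers)

# my_splitter_module.py
from multiply_list import multiply_items
from sum_list import sum_items

def split_and_process(numbers):
    mid = len(numbers) // 2
    return multiply_items(numbers[:mid]), sum_items(numbers[mid:])

# main.py
from my_splitter_module import split_and_process

print(split_and_process([1, 2, 3, 4, 5, 6]))  # (1*2*3, 4+5+6) -> (6, 15)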
Yes, sure.
If you create a folder named mylib in a directory on sys.path (for example, the same directory as your script), it allows you to use import mylib.
Make sure to put an __init__.py in the folder and, in it, import everything you need from the other files, because a plain import only exposes the names that __init__.py itself imports.
For example:
project -+- lib -+- __init__.py
| +- bar.py
| +- bar2.py
|
+- foo.py
__init__.py :
from .bar import test, random
from .bar2 import sample
foo.py :
import lib
print(lib.test)
lib.sample()
Hope it helps.
In order to simplify my code, I have put various functions into external files which I load via:
from (external_file) import (function_name)
...which works fine.
My question though has to do with other modules, such as cv2 or numpy - do I need those listed in my external file (as well as my main file) or is there a way to just list them in my main file?
Each file you put Python code in is its own module. Each module has its own namespace. If some of your code (in any module) uses some library code, it will need some way to access the library from the namespace it is defined in.
Usually this means you need to import the library in each module it's used from. Don't worry about duplication: modules are cached when they are first loaded, so additional imports from other modules quickly find the existing module and just add a reference to it in their own namespaces.
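A minimal sketch (the module and function names are made up for illustration):

# helpers.py
import numpy as np  # needed here: this module's namespace is separate from main's

def normalize(vec):
    return vec / np.linalg.norm(vec)

# main.py
import numpy as np  # finds the cached module in sys.modules, nothing is loaded twice
from helpers import normalize

print(normalize(np.array([3.0, 4.0])))  # [0.6 0.8]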
Note that it's generally not a good idea to split up your code too much. There's certainly no need for every function or every class to have its own file. Instead, use modules to group related things together. If you have a couple of functions that interoperate a lot, put them in the same module.
I'm using Sphinx to document a project that depends on wxPython, using the autodoc extension so that it will automatically generate pages from our docstrings. The autodoc extension automatically operates on every module you import, which is fine for our packages but is a problem when we import a large external library like wxPython. Thus, instead of letting it generate everything from wxPython I'm using the unittest.mock library module (previously the external package Mock). The most basic setup works fine for most parts of wxPython, but I've run into a situation I can't see an easy way around (likely because of my relative unfamiliarity with mock until this week).
Currently, the end of my conf.py file has the following:
import sys
from unittest import mock

MOCK_MODULES = ['wx.lib.newevent'] # I've skipped irrelevant entries...
for module_name in MOCK_MODULES:
    sys.modules[module_name] = mock.Mock()
For all the wxPython modules but wx.lib.newevent, this works perfectly. However, here I'm using the newevent.NewCommandEvent() function[1] to create an event for a particular scenario. In this case, I get a warning on the NewCommandEvent() call with the note TypeError: 'Mock' object is not iterable.
While I can see how one would use patching to handle this for building out unit tests (which I will be doing in the next month!), I'm having a hard time seeing how to integrate that at a simple level in my Sphinx configuration.
Edit: I've just tried using MagicMock() as well; this still produces an error at the same point, though it now produces ValueError: need more than 0 values to unpack. That seems like a step in the right direction, but I'm still not sure how to handle this short of explicitly setting it up for this one module. Maybe that's the best solution, though?
Footnotes
Yes, that's a function, naming convention making it look like a class notwithstanding; wxPython follows the C++ naming conventions which are used throughout the wxWidgets toolkit.
From the error, it looks like it is actually executing newevent.NewCommandEvent(), so I assume that somewhere in your code you have a top-level line something like this:
import wx.lib.newevent
...
event, binder = wx.lib.newevent.NewCommandEvent()
When autodoc imports the module, it tries to run this line of code, but since NewCommandEvent is actually a Mock object, Python can't bind its output to the (event, binder) tuple. There are two possible solutions. The first is to change your code so that this is not executed on import, maybe by wrapping it inside if __name__ == '__main__'. I would recommend this solution because creating objects like this on import can often have problematic side effects.
The second solution is to tell the Mock object to return appropriate values thus:
wx.lib.newevent.NewCommandEvent = mock.Mock(return_value=(mock.Mock(), mock.Mock()))
However, if you are doing anything in your code with the returned values you might run into the same problem further down the line.
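Put together in conf.py, the whole setup might look like this (a sketch based on the configuration shown in the question):

import sys
from unittest import mock

MOCK_MODULES = ['wx.lib.newevent'] # skipping irrelevant entries, as above
for module_name in MOCK_MODULES:
    sys.modules[module_name] = mock.MagicMock()

# NewCommandEvent() is unpacked into a 2-tuple at import time,
# so make the mocked function return one:
sys.modules['wx.lib.newevent'].NewCommandEvent = mock.Mock(
    return_value=(mock.Mock(), mock.Mock()))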
I am managing a quite large Python code base (>2000 lines) that I nevertheless want to be available as a single runnable Python script. So I am searching for a method or a tool to merge a development folder, made up of different Python files, into a single running script.
The thing/method I am searching for should take code split into different files, maybe with a starting __init__.py file that contains the imports, and merge it into a single, big script.
Much like a preprocessor. Best if there's a near-native way; better still if I can also keep running from the dev folder.
I have already checked out pypp and pypreprocessor, but they don't seem to address this.
Something like a strange use of __import__(), or maybe a bunch of from foo import * lines replaced by the preprocessor with the actual code? Obviously I only want to merge my own directory, not common libraries.
Update
What I want is exactly this: maintaining the code as a package, and then being able to "compile" it into a single script that's easy to copy-paste, distribute and reuse.
It sounds like you're asking how to merge your codebase into a single 2000-plus-line source file. Are you really, really sure you want to do this? It will make your code harder to maintain. Python files correspond to modules, so unless your main script does from modname import * for all its parts, you'll lose the module structure by converting it into one file.
What I would recommend is leaving the source structured as they are, and solving the problem of how to distribute the program:
You could use PyInstaller, py2exe or something similar to generate a single executable that doesn't even need a python installation. (If you can count on python being present, see #Sebastian's comment below.)
If you want to distribute your code base for use by other python programs, you should definitely start by structuring it as a package, so it can be loaded with a single import.
To distribute a lot of Python source files easily, you can package everything into a zip archive or an "egg" (which is actually a zip archive with special housekeeping info). Python can import modules directly from a zip or egg archive; see the sketch below.
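For the zip-archive route, the standard-library zipapp module (Python 3.5+) can build a runnable archive for you. A minimal sketch, assuming your package lives in ./myapp (a hypothetical name) with a __main__.py entry point:

import zipapp

# Bundle the package directory into a single runnable file.
zipapp.create_archive(
    'myapp',                             # source directory
    target='myapp.pyz',                  # single-file output
    interpreter='/usr/bin/env python3',  # adds a shebang so ./myapp.pyz runs directly
)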
waffles seems to do exactly what you're after, although I've not tried it.
You could probably do this manually, something like:
# file1.py
from .file2 import func1, func2

def something():
    return func1() + func2()

# file2.py
def func1(): return 1
def func2(): return 2

# __init__.py
from .file1 import something

if __name__ == "__main__":
    something()
Then you can concatenate all the files together, removing any line starting with from ., and... it might work.
That said, an executable egg or regular PyPI distribution would be much simpler and more reliable!
How does one get (find the location of) the dynamically imported modules from a Python script?
So, Python, from my understanding, can dynamically (at run time) load modules, be it using __import__(module_name), using exec("from x import y"), or using imp.find_module("module_name") and then imp.load_module(param1, param2, param3, param4).
Knowing that, I want to get all the dependencies for a Python file. This would include getting (or at least trying to get) the dynamically loaded modules, whether they are loaded from hard-coded string literals or from names returned by a function/method.
For a normal import module_name and from x import y you can either scan the code manually or use modulefinder.
So if I want to copy one Python script and all its dependencies (including the custom dynamically loaded modules), how should I do that?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further confused by user-input, consider: __import__(sys.argv[1]).
There's a lot of theoretical information about the first problem, which is normally described as the halting problem; the second obviously just can't be done.
From a theoretical perspective, you can never know exactly what/where modules are being imported. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to see which files are loaded as modules are imported. This won't give you every module that could possibly be loaded, but it will catch most modules in mostly sane code.
See also: How do I find the location of Python module sources?
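For example, to see where a module came from, or to dump everything that has been loaded so far at run time:

import os
print(os.__file__)   # e.g. /usr/lib/python3.11/os.py

import sys
for name, mod in sorted(sys.modules.items()):
    print(name, getattr(mod, '__file__', '(built-in)'))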
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign a function to __builtin__.__import__ that does the following:
- Whenever called, get the current stack trace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
- Get the module of that calling function, and store that module's name together with what is being imported.
- Redirect the call to the real __import__.
After you have done this you can call your application with python -m magic_module yourapp.py. The magic module must store the information somewhere where you can retrieve it later.
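A minimal sketch of that wrapper (written for Python 3, where __builtin__ has become builtins; storing the information in an in-memory list just for illustration):

# magic_module.py
import builtins

_real_import = builtins.__import__
import_log = []   # (importing module, imported name) pairs

def _tracking_import(name, globals=None, locals=None, fromlist=(), level=0):
    importer = (globals or {}).get('__name__', '?')  # the caller's module name
    import_log.append((importer, name))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = _tracking_import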
That's quite a question.
Static analysis is about predicting all possible run-time execution paths, including whether the program halts at all for a given input.
That is equivalent to the halting problem, and unfortunately there is no general solution.
The only way to resolve dynamic dependencies is to run the code.