Python: prefer several small modules or one larger module? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I'm working on a Python web application in which I have some small modules that serve very specific functions: session.py, logger.py, database.py, etc. And by "small" I really do mean small; each of these files currently includes around 3-5 lines of code, or maybe up to 10 at most. I might have a few imports and a class definition or two in each. I'm wondering, is there any reason I should or shouldn't merge these into one module, something like misc.py?
My thoughts are that having separate modules helps with code clarity, and later on, if by some chance these modules grow to more than 10 lines, I won't feel so bad about having them separated. But on the other hand, it just seems like such a waste to have a bunch of files with only a few lines in each! And is there any significant difference in resource usage between the multi-file vs. single-file approach? (Of course I'm nowhere near the point where I should be worrying about resource usage, but I couldn't resist asking...)
I checked around to see whether this had been asked before and didn't see anything specific to Python, but if it's in fact a duplicate, I'd appreciate being pointed in the right direction.

"My thoughts are that having separate modules helps with code clarity, and later on, if by some chance these modules grow to more than 10 lines, I won't feel so bad about having them separated."
This. Keep it the way you have it.

As a user of modules, I greatly prefer when I can include the entire module via a single import. Don't make a user of your package do multiple imports unless there's some reason to allow for importing different alternates.
BTW, there's no reason a single module can't consist of multiple source files. The simplest case is to use an __init__.py file to simply load all the other code into the package's namespace.
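As a rough sketch of the __init__.py idea: the package name webapp_utils and its contents are made up for illustration, and the files are written to a temporary directory only so the example runs on its own. In a real project they would simply sit in your source tree.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout (names are illustrative, not from the question).
FILES = {
    "webapp_utils/session.py": "class Session:\n    pass\n",
    "webapp_utils/logger.py": "def log(message):\n    return f'[log] {message}'\n",
    # __init__.py re-exports the public names so users need a single import.
    "webapp_utils/__init__.py": (
        "from .session import Session\n"
        "from .logger import log\n"
        "__all__ = ['Session', 'log']\n"
    ),
}


def build_demo_package(root):
    """Write the demo package files under the given root directory."""
    for relative_path, source in FILES.items():
        target = Path(root) / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)


root = tempfile.mkdtemp()
build_demo_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

import webapp_utils  # one import gives access to both small modules

print(webapp_utils.log("started"))
print(webapp_utils.Session.__name__)
```

The small files stay separate on disk, while callers only ever write "import webapp_utils".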

Personally I find it easier to keep things like this in a single file, just for the practicality of editing a smaller number of files in my editor.
The important thing to do is treat the different pieces of code as though they were in separate files, so you ensure that you can trivially separate them later, for the reasons you cite. So for instance, don't introduce dependencies between the different pieces that will make it hard to disentangle them later.

For command line scripts there most likely will not be much difference unless each invocation invokes all files in the module, in which case there will be a slight performance cost as n files need to be opened vs one.
For mod_python there most likely will be no difference, as byte-compiled modules stay alive for the duration of the Apache process.
For Google App Engine, though, there will be a performance hit unless the service is constantly used and is "hot", as each cold start would require opening all files.
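A small illustration of why a long-lived process pays the file-opening cost only once: within one interpreter, an imported module is cached in sys.modules, so later imports reuse the same object. The json module here is just a convenient stdlib stand-in for your own modules.

```python
import importlib
import sys

# The first import loads the module; the second is served from the cache.
first = importlib.import_module("json")
second = importlib.import_module("json")

print(first is second)               # same module object both times
print(sys.modules["json"] is first)  # the cache holds that object
```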

Of course you can have as many modules as you like.
But now let us think a little about what happens when we put every small code snippet into its own file.
We will end up with hundreds of import statements in any non-trivial module. And of course you could also save a little by having everything explicit in separate files. But guess what: nobody can remember so many module names, and you might end up searching for the right file anyway ...
I try to put things that belong together in one single file (unless it becomes too big!). But when I have small functions or classes that do not belong to other components in my system, I have "util" modules or the like. I also try to group these, for example according to my application layering, or separate them by other means. One separation criterion could be: utilities that are used for the UI and those that are not.

Small.

Related

How do I track and reuse small projects I've written across many large projects I'm writing?

I have written a single, monolithic project (let's call it Monolith.py) over the course of about 3 years in python. During that time I have created a lot of large and small utilities that would be useful in other projects.
Here are a couple simple examples:
Color.py: A small script that easily allows me to essentially colorize text. I use this a lot for print().
OExplorer.py: A larger script that is a CLI object explorer that I can use to casually browse classes and objects interactively.
Stuff like that. There's probably 20 of them, mostly modules. They are all under constant development.
My question is: what is the best way to use those in another project and keep everything up to date?
I am using Visual Studio Code and I use it to handle all my git stuff. I'm guessing there's a way to have something like nested git folders? I'm worried about screwing up my main repo.
I also do not want to separate out these projects into their own vscode workspace. It is better if they are left where they are. I also do not want to pull in all the code from the monolith for another project, that would be silly.
I think there is a simple git solution here. I would appreciate it if someone could give me some direction and hold my hand a bit; git is clearly not my strong suit.

Python evaluate and grade code from students [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
For a class, I would like to automatically evaluate (parts of) the students' coding assignments. The setup I had in mind is something like:
Students get a class skeleton, which they have to fill in.
Students "upload" this class definition to a server (or via a web interface).
The server runs a script that tests specific functions, e.g. class.sigmoid(x), checks whether the output of the function is correct, and might give suggestions.
This setup brings a whole lot of problems, since you're evaluating untrusted code. However, it would be extremely useful for many of my classes, so I'm willing to spend some time thinking it through. I remember Coursera had something similar for MATLAB/Octave assignments, but I can't find the details of that.
I've looked at many online Python interfaces (e.g. codecademy.com, ideone.com, c9.io); they seem perfect for learning and/or sharing code with online evaluation, but I miss the option of keeping the evaluation script "hidden" from the students (i.e. the evaluation script should contain a correct reference implementation to compare output on randomly generated data). Moreover, the course I give requires some data (e.g. images) and packages (sklearn / numpy), which are not always available.
Specifically, my questions are
Am I missing an online environment that actually offers such functionality? (That would be easiest.)
To set this up myself, I was thinking of hosting it on (e.g.) Amazon's cloud (so no problem with infrastructure at the university), but are there any Python practices you could recommend for sandboxing the evaluation?
Thanks in advance for any suggestions!
Pity to hear that the question is not suitable for StackOverflow. Thanks to the people (partially) answering the question.
After some more feedback via other channels, I think my approach will become as follows:
Student gets skeleton and fills it in
Student also has the evaluation script.
In the script, some connections with a server are made to
login
obtain some random data
check if the output of the student's code is numerically identical to what the server expects.
In this way the student's code is evaluated locally, and only the output is sent to the server. This limits the kinds of evaluation possible, but still allows for a kind of automatic evaluation of code.
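A minimal sketch of that flow, with the server simulated by local functions so it runs on its own. The names reference_sigmoid, make_inputs and grade are hypothetical, and a real setup would add the login and network calls. Comparing with a small tolerance rather than exact equality is an assumption on my part, since two correct implementations can differ in the last floating-point bits.

```python
import math
import random


def reference_sigmoid(x):
    """Server-side reference implementation (hidden from students)."""
    return 1.0 / (1.0 + math.exp(-x))


def student_sigmoid(x):
    """Stand-in for a student's submission (mathematically equivalent form)."""
    return math.exp(x) / (1.0 + math.exp(x))


def make_inputs(seed, count=20):
    """Server side: generate reproducible random test data."""
    rng = random.Random(seed)
    return [rng.uniform(-5.0, 5.0) for _ in range(count)]


def grade(outputs, inputs, tolerance=1e-9):
    """Server side: compare submitted outputs against the reference."""
    expected = [reference_sigmoid(x) for x in inputs]
    return len(outputs) == len(expected) and all(
        math.isclose(got, want, rel_tol=tolerance, abs_tol=tolerance)
        for got, want in zip(outputs, expected)
    )


inputs = make_inputs(seed=42)                    # "obtain some random data"
outputs = [student_sigmoid(x) for x in inputs]   # evaluated locally
print(grade(outputs, inputs))                    # only outputs reach the server
```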

Sandboxing Python in general is impossible. You can try to prevent dangerous operations, which will mean significantly limiting what the student code can do. But that will likely leave open attack vectors anyway. A better option is to use OS-level sandboxing to isolate the Python process. The CodeJail library uses AppArmor to provide a safe Python eval, for example.
As an example of the difficulty of sandboxing Python, see Eval really is dangerous, or consider this input to your sandbox: 9**9**99, which will attempt to compute an integer on the order of a googolplex, consuming all of your RAM after a long time.
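To make the process-isolation idea a little more concrete, here is a minimal sketch that runs submitted code in a separate interpreter with a time limit. It is only a first layer and not a sandbox: it does nothing about file or network access, which is where AppArmor, containers or CodeJail come in. The helper name run_untrusted is made up for this example.

```python
import subprocess
import sys


def run_untrusted(source, timeout_seconds=2.0):
    """Run source in a separate interpreter; return (status, stdout)."""
    try:
        completed = subprocess.run(
            # -I runs Python in isolated mode (ignores env vars and user site).
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed when the time limit is exceeded.
        return "timeout", ""
    status = "ok" if completed.returncode == 0 else "error"
    return status, completed.stdout.strip()


print(run_untrusted("print(2 + 2)"))
print(run_untrusted("while True: pass", timeout_seconds=1.0))
```

A runaway computation such as the 9**9**99 example would hit the same time limit, although a memory limit at the OS level would still be needed to stop it from exhausting RAM first.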

This is currently a very active field in programming languages research.
I know of these two different approaches that look at the problem:
- http://arxiv.org/pdf/1409.0166.pdf
- http://research.microsoft.com/en-us/um/people/sumitg/pubs/cacm14.pdf (this is actually only one of very many papers by Sumit and his group)
You may want to look at these things to find something that could help with your problem (and edit this answer to make it more useful).

How to reduce errors in a dynamic language such as Python, and improve my code quality? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I've always had trouble with dynamic languages like Python.
Several troubles:
Typos. I can use pylint to reduce some of these errors, but there are still some errors that pylint cannot figure out.
Object type errors. I often forget what type a parameter is: int? str? some object? I also forget the type of some objects in my code.
Unit tests might help sometimes, but I don't always have enough time to write them. When I need a script to do a small job, the code is 100 - 200 lines, not big, but I don't have time to write unit tests because I need to use the script as soon as possible. So many errors appear.
So, any ideas on how to reduce the number of these troubles?

Unit testing is the best way to handle this. If you think the testing is taking too much time, ask yourself how much time you are losing on defects (identifying, diagnosing and rectifying them) after you have released the code.
In effect, you are testing in production, and there's plenty of evidence to show that defects found later in the development cycle can be orders of magnitude more expensive to fix.

In addition to unit testing (see chamila_c's answer), sticking to good conventions and coding style helps. I think I know the sort of one-use scripts you're talking about (assuming that is what you're talking about), and often writing a full test suite for them seems like overkill. A few other tips which might help:
Break your code up into functions. It is often easier to identify a problem, especially a naming problem, if you are dealing with a small, isolated piece of code. It is also much easier to write a unit test for a small function. I find that using this approach means you don't need a full testing suite to test and isolate an identified problem.
Stick to a consistent and expressive naming convention. For example, use min_value = min(all_values) rather than something arbitrary like a = min(b). The same goes for function names: use def calculate_mean(sequence) rather than def f(s).
Read and apply PEP8.
Document your code with comments. This only takes a few seconds while writing the code, and it makes it much easier to figure out what's going on when you come back to it. It's amazing how much detail you can forget in just one day!
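Put together, the tips above might look something like this in a quick script. calculate_mean comes from the naming example above, while value_range and the inline assert checks are made up for illustration. A couple of asserts at the bottom is a cheap stand-in for a test suite, not a replacement for one.

```python
def calculate_mean(sequence):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    all_values = list(sequence)
    if not all_values:
        raise ValueError("calculate_mean() needs at least one value")
    return sum(all_values) / len(all_values)


def value_range(sequence):
    """Return the difference between the largest and smallest value."""
    all_values = list(sequence)
    min_value = min(all_values)
    max_value = max(all_values)
    return max_value - min_value


# Quick sanity checks that run every time the script starts.
assert calculate_mean([1, 2, 3, 4]) == 2.5
assert value_range([3, 9, 4]) == 6
```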

Python 'theory' - constructing a multifunction program - how to plan a basic flow [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I've written a few little things in python, and I am ramping up to build something a little more challenging.
The last project I made basically ingested some text files, ran some regexes over each file, and structured the data in a useful way so I could investigate some data I have.
I found it quite tough near the end to remember what section operated on what part of the text, especially as the code grew as I 'fixed' things along the way.
In my head, I imagine my code to be a series of small interconnected modules: discrete .py files that I can leave to one side, knowing what they do and how they interoperate.
The colleague that showed me how to def functions basically meant that I ended up with one really long piece of code that I found really hard to navigate and troubleshoot.
(1) Is this the right way? Or is there an easier way of making modules that pass variables between them? I think I would find this better, as I could visualise the flow better (mainly because it's how I was used to working in MATLAB a few years ago, I guess).
(2) Can you use this method to plan out the various layers of functions beforehand to give you a 'map' to write towards?
(3) Are there any easy-to-access tutorials for this kind of stuff? I often find the tutorials suddenly jump way over my head....
Thanks.

(1) It is possible to write a fine programme in a single .py file
(2) In any style of programming, it is always (apart from special, hardware-driven cases) best to break your code up into short functions (or methods) that accomplish a discrete task.
(3) Experienced programmers will frequently write their code one way, discover a problem, write either more code or different code, and consider whether any of their existing code can be broken out into a separate function.
A sign that you need to do this is when you are sequentially assigning to variables to pass data down your function. Never copy-paste your code to another place, even with changes, unless it be to break it out as a function, and replace the original code with a call to that function.
(4) In many cases, it can be useful to organise your code into classes and objects, even when it is not technologically necessary to do so. It can help you see that you have defined a complete set of operations (or not) necessary on some collection of data.
(5) Programming is actually quite hard. Even among those who have a talent for it, it takes a while to be comfortable. As an illustration, when I was doing my master's degree, I and my (fairly talented) friends all felt only in our final year that we had begun to achieve a degree of facility and competence (and these are all people who had been programming since at least their teenage years).
The important thing is to keep learning and improving, rather than repeating the same one or two years of experience over and over.
(6) To that end, read books and articles. Try new things. Think.

Others have suggested studying other experienced programmers' code from open source projects, etc. and from tutorials and textbooks, which is sound advice. Sometimes a similar example is all you need to set you on the right path.
I also suggest to use your own frustration and experience as feedback to help yourself improve. Whenever you find yourself thinking any of the following:
It feels like I'm writing the same code over and over again with only small changes
I wrote this code myself, but I had to study it for a long time to re-learn how it works
Each time I go back and add something to this code it takes me longer to get it working again
There's a bug in this code somewhere, but I haven't a clue where
Surely somebody somewhere has solved this problem already
Why is this taking me so long to get done?
That means you have room for improvement in your technique. A lot of the difference between an expert and beginning programmer is the ability to do the following:
Don't Repeat Yourself (DRY): Instead of copy-pasting code, or writing the same code over and over with variations, write a single common routine with one or more parameters that can do all of those things. Then call that routine in multiple places.
Keep It Simple (KIS): Break up your code into simple well-defined behaviors/routines that make sense on their own, organized into classes/modules/packages, so that each part of the overall program is easy to understand and maintain. Write informative and concise comments, and document the calls even if you don't intend to publish them.
Divide & Conquer Testing: Thoroughly test each individual class, function, etc. by itself (preferably with a unit-testing framework) as you develop it, rather than only testing the entire application.
Don't Re-invent the Wheel: Use open source frameworks or other tools where possible to solve problems that are general and not specific to your application. In all but the most trivial cases, there is a risk that you do not fully understand the problem and your home-grown solution may be lacking in an important way.
Estimate Honestly: Study your own previous efforts to learn how long it takes you to do certain things. Try to work faster next time, but don't assume you will. Measure yourself and use your own experience to estimate future effort. Set expectations and bargain with scope.
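As a small illustration of the DRY point: instead of pasting a slightly different regex for every field, one parameterised routine can serve all of them. The record format and the extract_field helper are invented for this sketch.

```python
import re


def extract_field(text, label):
    """Return the value after 'label:' on its own line, or None if absent."""
    match = re.search(
        rf"^{re.escape(label)}:\s*(.+)$", text, flags=re.MULTILINE
    )
    return match.group(1).strip() if match else None


record = "name: Ada\nemail: ada@example.org\n"

# One routine, called in several places with different parameters.
print(extract_field(record, "name"))
print(extract_field(record, "email"))
print(extract_field(record, "phone"))
```

Because the routine is small and self-contained, it is also easy to test by itself, which is the Divide & Conquer point.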

It's hard to know even where to begin answering your question without a snippet of your code for reference. You might want to post your code to a free public site such as http://www.bitbucket.org/ or http://www.github.org/ and then include some specific questions about small snippets of code with links back to your repository. This allows respondents here to look at the code and comment on it. (Both of these options even include color syntax highlighting, and interested correspondents can even pull the code down, make changes and push up a patch, or create their own branch of your code and send you a "pull" request so you can look at the differences and pull selected changesets back into your branch.)
More generally there are a number of approaches to program design. You seem to be trying to re-invent a very old methodology which is referred to as "functional decomposition" --- look at the overall task at hand as a function (digest text files) and consider how that breaks down (decomposes) into smaller functions (ingest input files, parse them, prepare results, output those) and then breaking those down further until you have units which are small enough to be coded easily in your programming environment (Python).
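A rough sketch of that decomposition for a "digest text files" task. The function names follow the breakdown in the paragraph above, but the word-counting job itself is just a placeholder I picked so the example has something to do.

```python
import re
from collections import Counter


def ingest(raw_text):
    """Split raw text into non-empty, stripped lines."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def parse(lines):
    """Extract lowercase words from each line."""
    return [
        word.lower()
        for line in lines
        for word in re.findall(r"[A-Za-z']+", line)
    ]


def prepare_results(words, top_n=3):
    """Count the words and keep the most common ones."""
    return Counter(words).most_common(top_n)


def output(results):
    """Format the results as printable text."""
    return "\n".join(f"{word}: {count}" for word, count in results)


def digest_text(raw_text):
    """Top-level task, expressed purely in terms of the smaller steps."""
    return output(prepare_results(parse(ingest(raw_text))))


print(digest_text("the cat\nthe dog\n\nthe end"))
```

Each function can later move into its own .py file and be imported, without changing how digest_text reads.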
Modern approaches (and tools) tend to use object oriented design methodologies. You might try reading: http://www.itmaybeahack.com/homepage/books/oodesign/build-python/html/index.html

What refactoring tools do you use for Python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I have a bunch of classes I want to rename. Some of them have names that are small and that name is reused in other class names, where I don't want that name changed. Most of this lives in Python code, but we also have some XML code that references class names.
Simple search and replace only gets me so far. In my case, I want to rename AdminAction to AdminActionPlug and AdminActionLogger to AdminActionLoggerPlug, so the first one's search-and-replace would also hit the second, wrongly.
Does anyone have experience with Python refactoring tools? Bonus points if they can fix class names in the XML documents too.

In the meantime, I've tried two tools that have some sort of integration with vim.
The first is Rope, a python refactoring library that comes with a Vim (and emacs) plug-in. I tried it for a few renames, and that definitely worked as expected. It allowed me to preview the refactoring as a diff, which is nice. It is a bit text-driven, but that's alright for me, just takes longer to learn.
The second is Bicycle Repair Man which I guess wins points on name. Also plugs into vim and emacs. Haven't played much with it yet, but I remember trying it a long time ago.
I haven't played with either enough yet, or tried more types of refactoring, but I will do some more hacking with them.

I would strongly recommend PyCharm - not just for refactorings. Since the first PyCharm answer was posted here a few years ago the refactoring support in PyCharm has improved significantly.
Python Refactorings available in PyCharm (last checked 2016/07/27 in PyCharm 2016.2)
Change Signature
Convert to Python Package/Module
Copy
Extract Refactorings
Inline
Invert Boolean
Make Top-Level Function
Move Refactorings
Push Members down
Pull Members up
Rename Refactorings
Safe Delete
XML refactorings (I checked in context menu in an XML file):
Rename
Move
Copy
Extract Subquery as CTE
Inline
Javascript refactorings:
Extract Parameter in JavaScript
Change Signature in JavaScript
Extract Variable in JavaScript

WingIDE 4.0 (WingIDE is my Python IDE of choice) will support a few refactorings, but I just tried out the latest beta, beta6, and... there's still work to be done. Extract Method works nicely, but Rename Symbol does not.
Update: The 4.0 release has fixed all of the refactoring tools. They work great now.

I would take a look at Bowler (https://pybowler.io).
It's better suited for use directly from the command-line than rope and encourages scripting (one-off scripts).

Your IDE can support refactorings!
Check it out: Eric, Eclipse, and WingIDE have built-in tools for refactoring (including Rename). And those are very safe refactorings: if something could go wrong, the IDE won't perform the refactoring.
Also consider adding a few unit tests to ensure your code did not suffer during refactoring.

PyCharm has some refactoring features.
PYTHON REFACTORING
Rename refactoring lets you make global code changes safely and instantly. Local changes within a file are performed in-place. Refactorings work in plain Python and Django projects.
Use Introduce Variable/Field/Constant and Inline Local for improving the code structure within a method, Extract Method to break up longer methods, Extract Superclass, Push Up, Pull Down and Move to move the methods and classes.

You can use sed to perform this. The trick is to recall that regular expressions can recognize word boundaries. This works on all platforms provided you get the tools, which on Windows is Cygwin, Mac OS may require installing the dev tools, I'm not sure, and Linux has this out of the box. So grep, xargs, and sed should do the trick, after 12 hours of reading man pages and trial and error ;)
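The same word-boundary trick can be sketched in Python's re module instead of sed, using the two renames from the question. This treats the files as plain text, so it works on XML as well, but unlike a real refactoring tool it knows nothing about scopes or strings. The RENAMES table and rename_identifiers helper are illustrative names.

```python
import re

# Old name -> new name, taken from the question.
RENAMES = {
    "AdminAction": "AdminActionPlug",
    "AdminActionLogger": "AdminActionLoggerPlug",
}

# \b anchors each alternative at word boundaries, so "AdminAction" does not
# match inside "AdminActionLogger" or inside an already renamed name.
PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True))
    + r")\b"
)


def rename_identifiers(text):
    """Apply every rename in a single pass over the text."""
    return PATTERN.sub(lambda match: RENAMES[match.group(1)], text)


python_source = "class AdminActionLogger(AdminAction): pass"
xml_source = '<plug class="AdminAction"/>'

print(rename_identifiers(python_source))
print(rename_identifiers(xml_source))
```

Doing all renames in one pass matters here: running them one after another would turn a freshly renamed name into a candidate for the next substitution only if the boundaries allowed it, and a single pass avoids having to reason about that at all.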
