I am writing a program in Python 2.7 that retrieves remote files and dumps them in a directory that can be specified by the user. Currently, in order to verify that the program can in fact write to that directory, I do the following (assuming that dumpdir is a string containing the name of the directory to check):
try:
    os.mkdir(dumpdir+'/.mwcrawler')
    os.rmdir(dumpdir+'/.mwcrawler')
except:
    logging.error('Could not open %s for reading, using default', dumpdir)
But this feels even more hackish than my usual code. What's the correct way to go about this? Maybe some sort of assertion on privileges?
In general, it's better to ask for forgiveness than permission—you have to handle errors in writing each file anyway, so why check in advance?
But, when you have a user interface—even a command-line interface, where you may read a prefs file long before you get to any writing—it's often much more convenient to the user to return errors as soon as possible. As long as that's the only reason you're doing this check, there's nothing wrong with it.
However, there are many little ways you could improve the way you do the check.
First, you should almost never just use except: without specifying anything. Besides the fact that this catches different things in different versions of Python (and therefore also confuses human readers used to other versions), it means you have no way of distinguishing between not writable, a bad Unicode character, or even a typo in the code.
Plus, your error message says "not readable" if it can't write, which is pretty odd.
Second, unless you're sure nobody will ever have a file named .mwcrawler (e.g., because you refuse to transfer files starting with '.' or something), using any fixed name is just asking for trouble. A better solution is to use, e.g., tempfile.mkdtemp.
Also, you should avoid using string manipulation for paths if you want to be portable. That's what os.path (and higher-level utilities) are for—so you don't have to learn or think about Windows, zOS, etc.
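For example, here is a minimal sketch of the same probe built with os.path.join instead of string concatenation (it keeps the fixed .mwcrawler name purely for illustration):

import os

# portable equivalent of dumpdir + '/.mwcrawler', with no hard-coded separator
probe = os.path.join(dumpdir, '.mwcrawler')
os.mkdir(probe)
os.rmdir(probe)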
Putting it all together:
import logging
import os
import tempfile

try:
    d = tempfile.mkdtemp(prefix='.mwcrawler', dir=dumpdir)
except Exception as e:
    logging.error('Could not open %s for writing (%s), using default', dumpdir, e)
else:
    os.rmdir(d)
This link describes the usage of os.access, a method specifically created for your needs.
It also explains a better way of approaching rights checking.
As also rightly mentioned in the comments, os.access will have issues in a few specific cases, so to be completely sure, the "hit-and-run" approach is actually better: try writing, catch the exception, see what happened, and go from there.
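For completeness, a sketch of what the os.access variant looks like; W_OK asks about write permission and X_OK about the ability to enter the directory, but the result only reflects the permissions at the moment of the call:

import os

# True means the directory looks writable right now; it can still change
# (and the check can be misleading, e.g. on some network filesystems)
# before you actually write to it
writable = os.access(dumpdir, os.W_OK | os.X_OK)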
Related
This is a novice question.
Consider the code block below:
try:
    import os
except ImportError as error:
    print "Unable to import builtin module os"
    raise error
Do we need to add an exception block while importing Python built-in modules (like above)? What would cause importing a built-in module to fail?
Can someone point to the Python documentation that explains this?
Short answer, no.
Longer answer: it doesn't help your program much to catch exceptions that you can't do anything about. Some file is missing -- you can report it, maybe ask the user again, or perhaps it is known that this sometimes happens and you can give a clear error message explaining why. Some API call fails -- maybe it can be retried, or someone needs to receive a message that a service is down.
But something as basic as this... First, it never happens (I've never seen import os fail in twenty years). Second, if that fails, there's nothing your program can usefully do (if this fails, chances are print also fails). And also, the library documentation doesn't say that this is something that can happen.
You have to rely on the basic system working. Only catch exceptions when it is known that they could be raised and you have a way to deal with them.
There are a couple of reasons that the code in the question is pretty much pointless.
First of all, it does not add any new information. The error is just reraised. The printout adds no new information that isn't already in the error and stack trace.
Second, as @RemcoGerlich's answer points out, you are asking specifically about builtin modules. It would make sense to react to the absence of an optional module by either finding a replacement or turning off program features, but there's nothing much you can do in response to your platform being broken.
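Guarding the import of a genuinely optional module, by contrast, is a common and useful pattern; here is a minimal sketch, using simplejson as a stand-in for any optional third-party dependency:

try:
    import simplejson as json   # optional, faster third-party module
except ImportError:
    import json                 # fall back to the standard library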
Failure of builtin imports is never addressed in the documentation explicitly to the best of my knowledge. Builtin module imports can fail for any of the reasons a normal import can fail. Builtins are a collection of Python files and C-extensions (in CPython at least). Modifying, replacing, deleting any of these files can lead to anything from import failures to the interpreter not starting up at all. Setting the wrong file permissions can have a similar effect.
E.g., if I am trying to open a file, can I not simply check os.path.exists(myfile) instead of using try/except? I think the answer to why I should not rely on os.path.exists(myfile) is that there may be a number of other reasons why the file may fail to open.
Is that the logic behind why error handling using try/except should be used?
Is there a general guideline on when to use exceptions in Python?
Race conditions.
In the time between checking whether a file exists and performing an operation on it, that file might have been deleted, edited, renamed, etc.
On top of that, an exception gives you an OS error code that tells you the more specific reason why the operation failed.
Finally, it's considered Pythonic to ask for forgiveness, rather than ask for permission.
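As a rough illustration of the difference, the "forgiveness" version below (myfile is the placeholder name from the question) catches everything an up-front os.path.exists check would miss, including permission problems and race conditions:

import errno

try:
    with open(myfile) as f:
        data = f.read()
except IOError as e:
    # e.errno tells you the specific reason: missing, not permitted, ...
    if e.errno == errno.ENOENT:
        print('file does not exist')
    else:
        print('could not read file: %s' % e)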
Generally, you use try/except when you handle things that are outside of the parameters you can influence.
Within your script you can check variables for type, lists for length, etc., and be confident in the result, since you are the only one handling those objects. However, as soon as you deal with files in the file system, connect to remote hosts, and so on, you can neither influence nor check all the parameters any more, nor can you be sure that the result of a check stays valid.
As you said,
the file might exist but you may not have access rights
you might be able to ping a host address but the connection is refused
There are too many factors that could go wrong to check them all separately, and even if you did, they might still change before you actually perform your command.
With try/except you can catch every exception in general and handle the most important errors individually. You make sure the error is handled even if a check succeeds at first but the operation fails after you start running your commands.
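A small sketch of that idea for the remote-host case (the host, port and timeout are made-up values): the specific errors you care about get individual handling, and a broader handler acts as the safety net:

import socket

try:
    conn = socket.create_connection(('example.com', 80), timeout=5)
except socket.timeout:
    print('host did not answer in time')    # handled individually
except socket.error as e:
    print('connection failed: %s' % e)      # catch-all for other socket errors
else:
    conn.close()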
Say you have some metadata for a custom file format that your Python app reads. Something like a CSV with variables that can change as the file is manipulated:
var1,data1
var2,data2
var3,data3
So if the user can manipulate this metadata, do you have to worry about someone crafting a malformed metadata file that allows arbitrary code execution? The only thing I can imagine is if you made the poor choice to have var1 be a shell command that you execute with os.system(data1) somewhere in your own code. Also, if this were C you would have to worry about buffer overflows, but I don't think you have to worry about that with Python. If you're reading that data in as a string, is it possible to somehow escape it, something like "\n os.system('rm -r /')"? This SQL-injection-style example totally won't work, but is something similar possible?
If you are doing what you say there (plain text, just reading and parsing a simple format), you will be safe. As you indicate, Python is generally safe from the more mundane memory corruption errors that C developers can create if they are not careful. The SQL injection scenario you note is not a concern when simply reading in files in python.
However, if you are concerned about security, which it seems you are (interjection: good for you! A good programmer should be lazy and paranoid), here are some things to consider:
Validate all input. Make sure that each piece of data you read is of the expected size, type, range, etc. Error early, and don't propagate tainted variables elsewhere in your code.
Do you know the expected names of the vars, or at least their format? Make sure to validate that each one is the kind of thing you expect before you use it. If it should be just letters, confirm that with a regex or similar.
Do you know the expected range or format of the data? If you're expecting a number, make sure it's a number before you use it. If it's supposed to be a short string, verify the length; you get the idea.
What if you get characters or bytes you don't expect? What if someone throws unicode at you?
If any of these are paths, make sure you canonicalize and know that the path points to an acceptable location before you read or write.
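For instance, here is a hedged sketch of that kind of validation for the var/data lines above; the allowed-name pattern and the assumption that values are numeric are illustrative choices, not requirements of any particular format:

import re

VAR_NAME = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')   # assumed naming rule

def load_metadata(path):
    settings = {}
    with open(path) as f:
        for line in f:
            name, sep, value = line.strip().partition(',')
            if not sep or not VAR_NAME.match(name):
                raise ValueError('malformed metadata line: %r' % line)
            settings[name] = float(value)   # fail early if it is not a number
    return settings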
Some specific things not to do:
os.system(attackerControlledString)
eval(attackerControlledString)
__import__(attackerControlledString)
pickle/unpickle attacker controlled content (here's why)
Also, rather than rolling your own config file format, consider ConfigParser or something like JSON. A well understood format (and libraries) helps you get a leg up on proper validation.
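For example, a minimal sketch of the same kind of metadata kept in JSON (the filename and key here are invented); the parser gives you well-formedness checks for free, though you still have to validate the values yourself:

import json

with open('metadata.json') as f:
    settings = json.load(f)      # raises ValueError on malformed input

threshold = float(settings['var1'])   # validating the values is still on you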
OWASP would be my normal go-to for providing a "further reading" link, but their Input Validation page needs help. In lieu of that, this looks like a reasonably pragmatic read: "Secure Programmer: Validating Input". A slightly dated but more Python-specific one is "Dealing with User Input in Python".
Depends entirely on the way the file is processed, but generally this should be safe. In Python, you have to put in some effort if you want to treat text as code and execute it.
Python is pretty clean, and I can code neat apps quickly.
But I notice I have some minor error someplace and I don't find the error at compile time but at run time. Then I need to change and run the script again. Is there a way to have it break and let me modify and rerun?
Also, I dislike how Python has no enums. If I were to write code that needs a lot of enums and types, should I be doing it in C++? It feels like I can do it quicker in C++.
"I don't find the error at compile but at run time"
Correct. True for all non-compiled interpreted languages.
"I need to change and run the script again"
Also correct. True for all non-compiled interpreted languages.
"Is there a way to have it break and let me modify and run?"
What?
If it's a run-time error, the script breaks, you fix it and run again.
If it's not a proper error, but a logic problem of some kind, then the program finishes, but doesn't work correctly. No language can anticipate what you hoped for and break for you.
Or perhaps you mean something else.
"...code that needs a lot of enums"
You'll need to provide examples of code that needs a lot of enums. I've been writing Python for years, and have no use for enums. Indeed, I've been writing C++ with no use for enums either.
You'll have to provide code that needs a lot of enums as a specific example. Perhaps in another question along the lines of "What's a Pythonic replacement for all these enums."
It's usually polymorphic class definitions, but without an example, it's hard to be sure.
With interpreted languages you have a lot of freedom. Freedom isn't free here either. While the interpreter won't torture you into dotting every i and crossing every T before it deems your code worthy of a run, it also won't try to statically analyze your code for all those problems. So you have a few choices.
1) {Pyflakes, pychecker, pylint} will do static analysis on your code. That settles the syntax issue mostly.
2) Test-driven development with nosetests or the like will help you. If you make a code change that breaks your existing code, the tests will fail and you will know about it. This is actually better than static analysis and can be as fast. If you test-first, then you will have all your code checked at test runtime instead of program runtime.
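As a tiny illustration of that workflow (the module and function names here are invented), a test like this fails the moment a change breaks the code it covers:

# test_parser.py -- run with nosetests, or python -m unittest
import unittest
from mymodule import parse_line   # hypothetical function under test

class ParseLineTest(unittest.TestCase):
    def test_simple_pair(self):
        self.assertEqual(parse_line('var1,data1'), ('var1', 'data1'))

if __name__ == '__main__':
    unittest.main()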
Note that with 1 & 2 in place you are a bit better off than if you had just a static-typing compiler on your side. Even so, it will not create a proof of correctness.
It is possible that your tests may miss some plumbing you need for the app to actually run. If that happens, you fix it by writing more tests usually. But you still need to fire up the app and bang on it to see what tests you should have written and didn't.
You might want to look into something like nosey, which runs your unit tests periodically when you've saved changes to a file. You could also set up a save-event trigger to run your unit tests in the background whenever you save a file (possible e.g. with Komodo Edit).
That said, what I do is bind the F7 key to run unit tests in the current directory and subdirectories, and the F6 key to run pylint on the current file. Frequent use of these allows me to spot errors pretty quickly.
Python is an interpreted language; there is no compile stage, at least not one that is visible to the user. If you get an error, go back, modify the script, and try again. If your script has a long execution time and you don't want to stop and restart, you can try a debugger like pdb, with which you can inspect and fix some of your errors at runtime.
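For example, dropping this line at the point you suspect pauses the script in an interactive debugger (you can also run the whole script under it with python -m pdb script.py):

import pdb; pdb.set_trace()   # pause here: inspect variables, step, then continue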
There are a large number of ways in which you can implement enums, a quick google search for "python enums" gives everything you're likely to need. However, you should look into whether or not you really need them, and if there's a better, more 'pythonic' way of doing the same thing.
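One of the simplest of those recipes is just a class of named integer constants (the Color name here is only an example); newer Python versions also ship an enum module in the standard library:

class Color(object):
    RED, GREEN, BLUE = range(3)

print(Color.GREEN)   # 1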
One of my favorite features about python is that you can write configuration files in python that are very simple to read and understand. If you put a few boundaries on yourself, you can be pretty confident that non-pythonistas will know exactly what you mean and will be perfectly capable of reconfiguring your program.
My question is, what exactly are those boundaries? My own personal heuristic was
Avoid flow control. No functions, loops, or conditionals. Those wouldn't be in a text config file, and people aren't expecting to have to understand them. In general, the order in which your statements execute probably shouldn't matter.
Stick to literal assignments. Methods and functions called on objects are harder to think through. Anything implicit is going to be a mess. If there's something complicated that has to happen with your parameters, change how they're interpreted.
Language keywords and error handling are right out.
I guess I ask this because I came across a situation with my Django config file where it seems to be useful to break these rules. I happen to like it, but I feel a little guilty. Basically, my project is deployed through svn checkouts to a couple different servers that won't all be configured the same (some will share a database, some won't, for example). So, I throw a hook at the end:
try:
    from settings_overrides import *
    LOCALIZED = True
except ImportError:
    LOCALIZED = False
where settings_overrides is on the python path but outside the working copy. What do you think, either about this example, or about python config boundaries in general?
There is a Django wiki page, which addresses exactly the thing you're asking.
http://code.djangoproject.com/wiki/SplitSettings
Do not reinvent the wheel. Use configparser and INI files. Python files are too easy to break by someone who doesn't know Python.
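A minimal sketch of what that looks like, assuming a settings.ini file with a [database] section holding host and shared options (all of these names are invented for illustration):

import ConfigParser   # the module is called configparser in Python 3

config = ConfigParser.ConfigParser()
config.read('settings.ini')
host = config.get('database', 'host')
shared = config.getboolean('database', 'shared')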
Your heuristics are good. Rules are made so that boundaries are set, and they should only be broken when doing so is obviously a vastly better solution than the alternative.
Still, I can't help but wonder whether the site-checking code should live in the parser, with an additional configuration item that selects which option should be taken.
I don't think that in this case the alternative is so bad that breaking the rules makes sense...
-Adam
I think it's a pain vs pleasure argument.
It's not wrong to put code in a Python config file because it's all valid Python, but it does mean you could confuse a user who comes in to reconfigure an app. If you're that worried about it, rope it off with comments explaining roughly what it does and that the user shouldn't edit it, but should edit the settings_overrides.py file instead.
As for your example, that's nigh on essential for developers to test then deploy their apps. Definitely more pleasure than pain. But you should really do this instead:
LOCALIZED = False
try:
    from settings_overrides import *
except ImportError:
    pass
And in your settings_overrides.py file:
LOCALIZED = True
... if for nothing else than to make it clear what that file does. What you're doing there splits the overrides across two places.
As a general practice, see the other answers on the page; it all depends. Specifically for Django, however, I see nothing fundamentally wrong with writing code in the settings.py file... after all, the settings file IS code :-)
The Django docs on settings themselves say:
A settings file is just a Python module with module-level variables.
And they give the example of assigning settings dynamically using normal Python syntax:
MY_SETTING = [str(i) for i in range(30)]
Settings as code is also a security risk. You import your "config", but in reality you are executing whatever code is in that file. Put config in files that you parse first and you can reject nonsensical or malicious values, even if it is more work for you. I blogged about this in December 2008.