So I made a python phonebook program which allows the user to add contacts, change contact info, delete contacts, etc. and write this data to a text file which I can read from every time the program is opened again and get existing contact data. However, in my program, I write to the text file in a very specific manner so I know what the format is and I can set it up to be read very easily. Since it is all formatted in a very specific manner, I want to prevent the user from opening the file and accidentally messing the data up with even just a simple space. How can I do this?
I want to prevent the user from opening the file and accidentally messing the data up...
I will advise you not to prevent users from accessing their own files. Messing with file permissions might result in some rogue files that the user won't be able to get rid of. Trust your user. If they delete or edit a sensitive file, it is their fault. Think of it this way - you have plenty of software installed on your own computer, but how often do you open them in an editor and make some damaging changes? Even if you do edit these files, does the application developer prevent you from doing so?
If you do intent to allow users to change/modify that file give them a good documentation on how to do it. This is the most apt thing to do. Also, make a backup file during run-time (see tempfile below) as an added layer of safety. Backups are almost always a good idea.
However, you can take some precautions to hide that data, so that users can't accidentally open them in an editor by double-clicking on it. There are plenty of options to do this including
Creating a binary file in a custom format
Zipping the text file using zipfile module.
Using tempfile module to create a temporary file, and zipping it using the previous option. (easy to discard if no changes needs to be saved)
Encryption
Everything from here on is not about preventing access, but about hiding the contents of your file
Note that all the above options doesn't have to be mutually exclusive. The advantages of using a zip file is that it will save some space, and it is not easy to read and edit in a text editor (binary data). It can be easily manipulated in your Python Script:
with ZipFile('spam.zip') as myzip:
with myzip.open('eggs.txt') as myfile:
print(myfile.read())
It is as simple as that! A temp file on the other hand, is a volatile (delete=True/False) file and can be discarded once you are done with it. You can easily copy its contents to another file or zip it before you close it as mentioned above.
with open tempfile.NamedTemporaryFile() as temp:
temp.write(b"Binary Data")
Again, another easy process. However, you must zip or encrypt it to achieve the final result. Now, moving on to encryption. The easiest way is an XOR cipher. Since we are simply trying to prevent 'readability' and not concerned about security, you can do the following:
recommended solution (XOR cipher):
from itertools import cycle
def xorcize(data, key):
"""Return a string of xor mutated data."""
return "".join(chr(ord(a)^ord(b)) for a, b in zip(data, cycle(key)))
data = "Something came in the mail today"
key = "Deez Nuts"
encdata = xorcize(data, key)
decdata = xorcize(encdata, key)
print(data, encdata, decdata, sep="\n")
Notice how small that function is? It is quite convenient to include it in any of your scripts. All your data can be encrypted before writing them to a file, and save it using a file extension such as ".dat" or ".contacts" or any custom name you choose. Make sure it is not opened in an editor by default (such as ".txt", ".nfo").
It is difficult to prevent user access to your data storage completely. However, you can either make it more difficult for the user to access your data or actually make it easier not to break it. In the second case, your intention would be to make it clear to the user what the rules are hope that not destroying the data is in the user's own best interest. Some examples:
Using a well established, human-readable serialization format, e.g. JSON. This is often the best solution as it actually allows an experienced user to easily inspect the data, or even modify it. Inexperienced users are unlikely to mess with the data anyways, and an experienced user knowing the format will follow the rules. At the same time, your parser will detect inconsistencies in the file structure.
Using a non-human readable, binary format, such as Pickle. Those files are likely to be left alone by the user as it is pretty clear that they are not meant to be modified outside the program.
Using a database, such as MySQL. Databases provide special protocols for data access which can be used to ensure data consistency and also make it easier to prevent unwanted access.
Assuming that you file format has a comment character, or can be modified to have one, add these lines to the top of your text file:
# Do not edit this file. This file was automatically generated.
# Any change, no matter how slight, may corrupt this file beyond repair.
The contact file belongs to your user, not to you. The best you can do is to inform the user. The best you can hope for is that the user will make intelligent use of your product.
I think the best thing to do in your case is just to choose a new file extension for your format.
It obviously doesn't prevent editing, but it clearly states for user that it has some specific format and probably shouldn't be edited manually. And GUI won't open it by default probably (it will ask what to edit it with).
And that would be enough for any case I can imagine if what you're worrying about is user messing up their own data. I don't think you can win with user who actively tries to mess up their data. Also I doubt any program does anything more. The usual "contract" is that user's data is, well, user's so it can be destroyed by the user.
If you actually won't to prevent editing you could change permissions to forbid editing with os.chmod for example. User would still be able to lift them manually and there will be some time window when you are actually writing, so it will be neither clean nor significantly more effective. And I would expect more trouble than benefit from such a solution.
If you want to actually make it impossible for a user to read/edit a file you can run your process from a different user (or use some heavier like SELinux or other MAC mechanism) and so you could make it really impossible to damage the data (with user's permissions). But it is not worth the effort if it is only about protecting the user from the not-so-catastophic effects of being careless.
Related
I have saved the game data to an ini file, how can I protect it from being edited by the user?
Strictly speaking, it's not possible. You can't do anything to a local file that user is unable to undo.
However, you can obfuscate or encrypt (however simply) that file, so that at least a casual person with a Notepad is likely to give up. The simplest thing to do is to save data as a pickle file, so that it's easy to manipulate in Python, but looks baffling to a non-techy user/player.
Make somehow a hash / control sum of your ini-file and store at separate file. It will not prevent the ini-file from modification but you'll be able to know if modification occured and react on such user's behaviour.
its just a question to understand if maybe the function could create some problems/fails in the large file.
i have >10 users who want to read/write not exactly in the same time but nearly as a background progress with a .py script the same large file. each user has his own line where huge relation information to one other user has been written as a really large string. as example:
[['user1','user2','1'],['user6','user50','2'],['and so on']]
['user1','user2','this;is;the;really;long;string;..(i am 18k letters long)...']
['user6','user50','this;is;the;really;long;string;..(i am 16k letters long)...']
...and so on
now user 1 want just to read his line 1 and user 6 wants to remove his own line 2.
so now my questions:
i cant find any problems if all users just read the file, but if user 6 wants to delete his own line information and rewrite the line 0 with the new information and rewrite the other lines to a newline position, how would the other users >10 would read the file if user 6 needs more time to write the new file as the other users >10? they dont need so long to open the file and if they down wait to let user 6 finished his job, the others would read the wrong information for the file
would be enough to write the .py script
f = open(fileNameArr, "rw")
....
f.close()
to solve that problem? or maybe "rwb+" or what would be needed to do for that?
Should i insert some temp timeout function in the .py script as example i have to insert it in php set_time_limit(300); for long calculations and outputs?
for any input to understand a big thx up to you.
You should look up Unix file management - Unix doesn't give you a great out-of-the-box solution to this problem.
The short version is that any number of processes can read the same file at once, but under most sets of permissions, any process can overwrite the file. Unlike on, say, Windows, where the OS prevents multiple programs from editing the same file at once, on Unix any write will overwrite all previous writes - if two users start from the same base file and make different changes, then whoever calls .write() most recently will win. Yes, this does cause concurrency issues.
The answer above mentions some countermeasures - namely, enforcing file-locking at a software level in your program, which is essentially what I suggested in a comment - but to my knowledge there's no generalized solution to this issue.
Google Docs and the rest of Drive have collaborative file editing that, though the code is obviously not public, seems to use Operational Transformation as its main approach, in which, essentially, no user can directly modify the file, and instead of using typical file I/O commands each user sends the server its desired modifications and the server sorts out concurrency issues.
Maybe you should rethink the way you've designed this system? Why is all of this information stored within a single file, with each line dedicated to a specific user? Why not have multiple, smaller files, one for each user, which would cut down on the concurrency issues with reading/writing? Why not use a database to store this information instead, and let the database handle the concurrency issues? Most databases can handle arbitrarily large strings, and though some aren't easily scalable to the 30GB you mention in your question, others definitely are.
I'm looking to store some individual settings to each user's computer. Things like preferences and a license key. From what I know, saving to the registry could be one possibility. However, that won't work on Mac.
One of the easy but not so proper techniques are just saving it to a settings.txt file and reading that on load.
Is there a proper way to save this kind of data? I'm hoping to use my wx app on Windows and Mac.
There is no proper way. Use whatever works best for your particular scenario. Some common ways for storing user data include:
Text files (e.g. Windows INI, cfg files)
binary files (sometimes compressed)
Windows registry
system environment variables
online profiles
There's nothing wrong with using text files. A lot of proper applications uses them exactly for the reason that they are easy to implement, and additionally human readable. The only thing you need to worry about is to make sure you have some form of error handling in place, in case the user decides to replace you config file content with some rubbish.
Take a look at Data Persistence on python docs. One option a you said could be persist them to a simple text file. Or you can save your data using some serialization format as pickle (see previous link) or json but it will be pretty ineficient if you have several keys and values or it will be too complex.
Also, you could save user preferences in an .ini file using python's ConfigParser module as show in this SO answer.
Finally, you can use a database like sqlite3 which is simpler to handle from your code in order to save and retrieve preferences.
I have a file that I want to read. The file may at any time be overwritten by another process. I do not want to block that writing. I am prepared to manage corruption to the data that I read, but do not want my reading to be in any way change the behaviour of the writing process.
The process that is writing the file is a delphi program running locally on the server. It opens the file using fmCreate. fmCreate tries to open the file exclusively and fails if there are any other handles on the file.
I am reading the file from a python script that accesses the file remotely across our network.
I am interested in whether there is a solution, independent of whether it is supported by python or delphi. I want to know if there is any way of achieving this under windows without modifying the writing program.
Edit: To reiterate, this is not a duplicate. The other question was trying to get read access to a file that is being written to. I want to the writer to have access to a file that I have open for reading. These are different questions (although I fear the answer will be similar, that it can't be done.)
I think the real answer here, all of these years later, is to use opportunistic locks. With this, you can open the file for read access, while telling the OS that you want to be notified if another program wants to access the file. Basically, you can use the file as long as you like, and then back off if someone else needs it. This avoids the sharing/access violation that the other program would normally get, if you had just opened the file "normally".
There is an MSDN article on Opportunistic Locks. Raymond Chen also has a blog article about this, complete with sample code: Using opportunistic locks to get out of the way if somebody wants the file
The key is calling the DeviceIoControl function, with the FSCTL_REQUEST_OPLOCK flag, and passing it the handle to an event that you previously created by calling CreateEvent.
It should be straightforward to use this from Delphi, since it supports calling Windows API functions. I am not so sure about Python. But, given the arrangement in the question, it should not be necessary to modify the Python code. Just make your Delphi code use the opportunistic lock when it opens the file, and let it get out of the way when the Python script needs the file.
Also much easier and lighter weight than a filter driver or the Volume Shadow Copy service.
You can setup a filter driver which can act in two ways: (1) modify the flags when the file is opened, and (2) it can capture the data when it's written to the file and save a copy of the data elsewhere.
This approach is much more lightweight and efficient than volume shadow copy service, mentioned in comments, however it requires having a filter driver. There exist several drivers on the market (i.e. those are products which include a driver and let you write business logic in user mode) yet they are costly and can be an overkill in your case. Still, if you need the thing for private use only, contact me privately for a license for our CallbackFilter.
Update: if you want to let the writer open the file which has been already opened, then a filter which will modify flags when the file is being opened is your only option.
I am writing a script that will be polling a directory looking for new files.
In this scenario, is it necessary to do some sort of error checking to make sure the files are completely written prior to accessing them?
I don't want to work with a file before it has been written completely to disk, but because the info I want from the file is near the beginning, it seems like it could be possible to pull the data I need without realizing the file isn't done being written.
Is that something I should worry about, or will the file be locked because the OS is writing to the hard drive?
This is on a Linux system.
Typically on Linux, unless you're using locking of some kind, two processes can quite happily have the same file open at once, even for writing. There are three ways of avoiding problems with this:
Locking
By having the writer apply a lock to the file, it is possible to prevent the reader from reading the file partially. However, most locks are advisory so it is still entirely possible to see partial results anyway. (Mandatory locks exist, but a strongly not recommended on the grounds that they're far too fragile.) It's relatively difficult to write correct locking code, and it is normal to delegate such tasks to a specialist library (i.e., to a database engine!) In particular, you don't want to use locking on networked filesystems; it's a source of colossal trouble when it works and can often go thoroughly wrong.
Convention
A file can instead be created in the same directory with another name that you don't automatically look for on the reading side (e.g., .foobar.txt.tmp) and then renamed atomically to the right name (e.g., foobar.txt) once the writing is done. This can work quite well, so long as you take care to deal with the possibility of previous runs failing to correctly write the file. If there should only ever be one writer at a time, this is fairly simple to implement.
Not Worrying About It
The most common type of file that is frequently written is a log file. These can be easily written in such a way that information is strictly only ever appended to the file, so any reader can safely look at the beginning of the file without having to worry about anything changing under its feet. This works very well in practice.
There's nothing special about Python in any of this. All programs running on Linux have the same issues.
On Unix, unless the writing application goes out of its way, the file won't be locked and you'll be able to read from it.
The reader will, of course, have to be prepared to deal with an incomplete file (bearing in mind that there may be I/O buffering happening on the writer's side).
If that's a non-starter, you'll have to think of some scheme to synchronize the writer and the reader, for example:
explicitly lock the file;
write the data to a temporary location and only move it into its final place when the file is complete (the move operation can be done atomically, provided both the source and the destination reside on the same file system).
If you have some control over the writing program, have it write the file somewhere else (like the /tmp directory) and then when it's done move it to the directory being watched.
If you don't have control of the program doing the writing (and by 'control' I mean 'edit the source code'), you probably won't be able to make it do file locking either, so that's probably out. In which case you'll likely need to know something about the file format to know when the writer is done. For instance, if the writer always writes "DONE" as the last four characters in the file, you could open the file, seek to the end, and read the last four characters.
Yes it will.
I prefer the "file naming convention" and renaming solution described by Donal.