I have a file that I want to read. The file may at any time be overwritten by another process. I do not want to block that writing. I am prepared to manage corruption to the data that I read, but I do not want my reading to change the behaviour of the writing process in any way.
The process that is writing the file is a Delphi program running locally on the server. It opens the file using fmCreate. fmCreate tries to open the file exclusively and fails if there are any other handles on the file.
I am reading the file from a python script that accesses the file remotely across our network.
I am interested in whether there is a solution, independent of whether it is supported by Python or Delphi. I want to know if there is any way of achieving this under Windows without modifying the writing program.
Edit: To reiterate, this is not a duplicate. The other question was trying to get read access to a file that is being written to. I want the writer to have access to a file that I have open for reading. These are different questions (although I fear the answer will be similar: that it can't be done).
I think the real answer here, all of these years later, is to use opportunistic locks. With this, you can open the file for read access, while telling the OS that you want to be notified if another program wants to access the file. Basically, you can use the file as long as you like, and then back off if someone else needs it. This avoids the sharing/access violation that the other program would normally get, if you had just opened the file "normally".
There is an MSDN article on Opportunistic Locks. Raymond Chen also has a blog article about this, complete with sample code: Using opportunistic locks to get out of the way if somebody wants the file
The key is calling the DeviceIoControl function, with the FSCTL_REQUEST_OPLOCK flag, and passing it the handle to an event that you previously created by calling CreateEvent.
It should be straightforward to use this from Delphi, since it supports calling Windows API functions; Python can reach the same APIs through ctypes or pywin32. Given the arrangement in the question, where the writing program cannot be modified, it is the reading side that should take the opportunistic lock when it opens the file, and get out of the way when the Delphi program needs the file.
This is also much easier and lighter weight than a filter driver or the Volume Shadow Copy service.
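As a rough illustration only, here is a minimal, untested ctypes sketch of the reader side, in the spirit of Raymond Chen's C sample. The constants are transcribed from winioctl.h and the file name is hypothetical; treat this as scaffolding, not a drop-in implementation:

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateEventW.restype = wintypes.HANDLE

GENERIC_READ = 0x80000000
FILE_SHARE_ALL = 0x00000007            # share read, write and delete
OPEN_EXISTING = 3
FILE_FLAG_OVERLAPPED = 0x40000000
FSCTL_REQUEST_OPLOCK = 0x00090240
OPLOCK_LEVEL_CACHE_READ = 0x01
OPLOCK_LEVEL_CACHE_HANDLE = 0x04
REQUEST_OPLOCK_INPUT_FLAG_REQUEST = 0x01
ERROR_IO_PENDING = 997

class OVERLAPPED(ctypes.Structure):
    _fields_ = [("Internal", ctypes.c_void_p),
                ("InternalHigh", ctypes.c_void_p),
                ("Offset", wintypes.DWORD),
                ("OffsetHigh", wintypes.DWORD),
                ("hEvent", wintypes.HANDLE)]

class REQUEST_OPLOCK_INPUT_BUFFER(ctypes.Structure):
    _fields_ = [("StructureVersion", wintypes.WORD),
                ("StructureLength", wintypes.WORD),
                ("RequestedOplockLevel", wintypes.DWORD),
                ("Flags", wintypes.DWORD)]

class REQUEST_OPLOCK_OUTPUT_BUFFER(ctypes.Structure):
    _fields_ = [("StructureVersion", wintypes.WORD),
                ("StructureLength", wintypes.WORD),
                ("OriginalOplockLevel", wintypes.DWORD),
                ("NewOplockLevel", wintypes.DWORD),
                ("Flags", wintypes.DWORD),
                ("AccessMode", wintypes.DWORD),
                ("ShareMode", wintypes.WORD)]

# Open with all sharing flags so this handle never blocks the writer's open.
handle = kernel32.CreateFileW("example.txt", GENERIC_READ, FILE_SHARE_ALL,
                              None, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, None)

event = kernel32.CreateEventW(None, True, False, None)
overlapped = OVERLAPPED(hEvent=event)

inbuf = REQUEST_OPLOCK_INPUT_BUFFER(
    1, ctypes.sizeof(REQUEST_OPLOCK_INPUT_BUFFER),
    OPLOCK_LEVEL_CACHE_READ | OPLOCK_LEVEL_CACHE_HANDLE,
    REQUEST_OPLOCK_INPUT_FLAG_REQUEST)
outbuf = REQUEST_OPLOCK_OUTPUT_BUFFER()

# The request "fails" with ERROR_IO_PENDING while the oplock is held; the
# event is signalled when another process wants the file and we must back off.
ok = kernel32.DeviceIoControl(handle, FSCTL_REQUEST_OPLOCK,
                              ctypes.byref(inbuf), ctypes.sizeof(inbuf),
                              ctypes.byref(outbuf), ctypes.sizeof(outbuf),
                              None, ctypes.byref(overlapped))
if not ok and ctypes.get_last_error() == ERROR_IO_PENDING:
    pass  # read the file here; once `event` signals, close `handle` promptly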
You can set up a filter driver which can act in two ways: (1) modify the flags when the file is opened, and (2) capture the data when it is written to the file and save a copy of the data elsewhere.
This approach is much more lightweight and efficient than the Volume Shadow Copy service mentioned in the comments; however, it requires having a filter driver. There exist several such drivers on the market (products which include a driver and let you write the business logic in user mode), yet they are costly and can be overkill in your case. Still, if you need this for private use only, contact me privately for a license for our CallbackFilter.
Update: if you want to let the writer open the file which has been already opened, then a filter which will modify flags when the file is being opened is your only option.
This is just a question to understand whether this approach could cause problems or failures with a large file.
I have more than 10 users who want to read and write the same large file with a .py script, not at exactly the same time but nearly so, as a background process. Each user has his own line, where a huge amount of relation information to one other user is written as a really long string. For example:
[['user1','user2','1'],['user6','user50','2'],['and so on']]
['user1','user2','this;is;the;really;long;string;..(i am 18k letters long)...']
['user6','user50','this;is;the;really;long;string;..(i am 16k letters long)...']
...and so on
Now user 1 just wants to read his line 1, and user 6 wants to remove his own line 2.
So now my questions:
I can't see any problems if all users just read the file. But if user 6 wants to delete his own line information, rewrite line 0 with the new information, and rewrite the other lines to new positions, how would the other 10+ users read the file if user 6 needs more time to write the new file than they need to open it? If they don't wait for user 6 to finish his job, the others will read the wrong information from the file.
Would it be enough to write in the .py script
f = open(fileNameArr, "rw")
....
f.close()
to solve that problem? Or maybe "rwb+", or what else would be needed?
Should I insert some temporary timeout function in the .py script, like the set_time_limit(300); I have to insert in PHP for long calculations and outputs?
A big thanks up front for any input that helps me understand this.
You should look up Unix file management - Unix doesn't give you a great out-of-the-box solution to this problem.
The short version is that any number of processes can read the same file at once, but under most sets of permissions, any process can overwrite the file. Unlike on, say, Windows, where the OS prevents multiple programs from editing the same file at once, on Unix any write will overwrite all previous writes - if two users start from the same base file and make different changes, then whoever calls .write() most recently will win. Yes, this does cause concurrency issues.
The answer above mentions some countermeasures - namely, enforcing file-locking at a software level in your program, which is essentially what I suggested in a comment - but to my knowledge there's no generalized solution to this issue.
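To make that software-level locking concrete, here is a minimal sketch using Unix advisory locks via fcntl; the file name is made up, and every cooperating process has to take the lock for it to mean anything:

import fcntl

with open("relations.txt", "r+") as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # blocks until no other process holds the lock
    try:
        lines = f.readlines()
        # ... modify lines here, e.g. remove user 6's entry ...
        f.seek(0)
        f.truncate()
        f.writelines(lines)
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)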
Google Docs and the rest of Drive have collaborative file editing that, though the code is obviously not public, seems to use Operational Transformation as its main approach: essentially, no user can directly modify the file; instead of using typical file I/O commands, each user sends the server its desired modifications, and the server sorts out concurrency issues.
Maybe you should rethink the way you've designed this system? Why is all of this information stored within a single file, with each line dedicated to a specific user? Why not have multiple, smaller files, one for each user, which would cut down on the concurrency issues with reading/writing? Why not use a database to store this information instead, and let the database handle the concurrency issues? Most databases can handle arbitrarily large strings, and though some aren't easily scalable to the 30GB you mention in your question, others definitely are.
I have a project in mind, but there is a section that I don't know how to do. I'm using Python 3.6 and Windows 10. For example, say we have a file named "example.txt". I want to prevent the name and the content of this file from being changed.
I did research on this topic, but I could not find anything useful. Can we prevent the file's name (including its extension) or its contents from being changed? To accomplish this, I think it is necessary to run as an administrator.
Thanks.
It is possible to stop another program from editing a file by locking it in Python.
There is a module that does this called filelock. Take a look at the source code to see how it is done.
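As a quick sketch of how filelock is typically used (the file names here are hypothetical):

from filelock import FileLock

lock = FileLock("example.txt.lock")
with lock:  # other cooperating processes block here until the lock is released
    with open("example.txt", "a") as f:
        f.write("safely appended line\n")

Note that this is cooperative: it only protects against programs that also take the lock.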
It is also worth noting that more advanced ransomware will try to stop processes so they can encrypt files, so this might not work in all cases.
So I made a python phonebook program which allows the user to add contacts, change contact info, delete contacts, etc. and write this data to a text file which I can read from every time the program is opened again and get existing contact data. However, in my program, I write to the text file in a very specific manner so I know what the format is and I can set it up to be read very easily. Since it is all formatted in a very specific manner, I want to prevent the user from opening the file and accidentally messing the data up with even just a simple space. How can I do this?
I want to prevent the user from opening the file and accidentally messing the data up...
I will advise you not to prevent users from accessing their own files. Messing with file permissions might result in some rogue files that the user won't be able to get rid of. Trust your user. If they delete or edit a sensitive file, it is their fault. Think of it this way - you have plenty of software installed on your own computer, but how often do you open them in an editor and make some damaging changes? Even if you do edit these files, does the application developer prevent you from doing so?
If you do intend to allow users to change/modify that file, give them good documentation on how to do it. This is the most apt thing to do. Also, make a backup file during run-time (see tempfile below) as an added layer of safety. Backups are almost always a good idea.
However, you can take some precautions to hide that data, so that users can't accidentally open it in an editor by double-clicking on it. There are plenty of options to do this, including:
Creating a binary file in a custom format
Zipping the text file using zipfile module.
Using tempfile module to create a temporary file, and zipping it using the previous option. (easy to discard if no changes needs to be saved)
Encryption
Everything from here on is not about preventing access, but about hiding the contents of your file
Note that all the above options don't have to be mutually exclusive. The advantage of using a zip file is that it saves some space and is not easy to read and edit in a text editor (binary data). It can be easily manipulated in your Python script:
from zipfile import ZipFile

with ZipFile('spam.zip') as myzip:
    with myzip.open('eggs.txt') as myfile:
        print(myfile.read())
It is as simple as that! A temp file, on the other hand, is volatile (delete=True/False) and can be discarded once you are done with it. You can easily copy its contents to another file, or zip it before you close it as mentioned above.
import tempfile

with tempfile.NamedTemporaryFile() as temp:
    temp.write(b"Binary Data")
Again, another easy process. However, you must zip or encrypt it to achieve the final result. Now, moving on to encryption. The easiest way is an XOR cipher. Since we are simply trying to prevent 'readability' and not concerned about security, you can do the following:
recommended solution (XOR cipher):
from itertools import cycle

def xorcize(data, key):
    """Return a string of xor mutated data."""
    return "".join(chr(ord(a) ^ ord(b)) for a, b in zip(data, cycle(key)))

data = "Something came in the mail today"
key = "Deez Nuts"
encdata = xorcize(data, key)
decdata = xorcize(encdata, key)
print(data, encdata, decdata, sep="\n")
Notice how small that function is? It is quite convenient to include in any of your scripts. All your data can be encrypted before being written to a file, saved with a file extension such as ".dat" or ".contacts" or any custom name you choose. Make sure it is not an extension that opens in an editor by default (such as ".txt" or ".nfo").
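For example, writing the encrypted contacts under a custom extension could look like this (the file name and record format are made up, reusing the xorcize function and key from above):

# Encrypt before writing; the ".dat" extension keeps editors from grabbing it.
with open("contacts.dat", "w", encoding="utf-8") as f:
    f.write(xorcize("user1;555-0100\n", key))

# Decrypt after reading it back.
with open("contacts.dat", encoding="utf-8") as f:
    record = xorcize(f.read(), key)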
It is difficult to prevent user access to your data storage completely. However, you can either make it more difficult for the user to access your data, or actually make it easier not to break it. In the second case, your intention would be to make it clear to the user what the rules are, and hope that not destroying the data is in the user's own best interest. Some examples:
Using a well established, human-readable serialization format, e.g. JSON (see the sketch after this list). This is often the best solution, as it actually allows an experienced user to easily inspect the data, or even modify it. Inexperienced users are unlikely to mess with the data anyway, and an experienced user who knows the format will follow the rules. At the same time, your parser will detect inconsistencies in the file structure.
Using a non-human readable, binary format, such as Pickle. Those files are likely to be left alone by the user as it is pretty clear that they are not meant to be modified outside the program.
Using a database, such as MySQL. Databases provide special protocols for data access which can be used to ensure data consistency and also make it easier to prevent unwanted access.
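A minimal sketch of the JSON option; the file name and record structure are invented for illustration:

import json

contacts = {"user1": {"phone": "555-0100"}, "user2": {"phone": "555-0199"}}

with open("contacts.json", "w", encoding="utf-8") as f:
    json.dump(contacts, f, indent=2)   # human-readable, still machine-parseable

with open("contacts.json", encoding="utf-8") as f:
    contacts = json.load(f)   # raises json.JSONDecodeError if the structure was broken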
Assuming that your file format has a comment character, or can be modified to have one, add these lines to the top of your text file:
# Do not edit this file. This file was automatically generated.
# Any change, no matter how slight, may corrupt this file beyond repair.
The contact file belongs to your user, not to you. The best you can do is to inform the user. The best you can hope for is that the user will make intelligent use of your product.
I think the best thing to do in your case is just to choose a new file extension for your format.
It obviously doesn't prevent editing, but it clearly states to the user that the file has some specific format and probably shouldn't be edited manually. And the GUI probably won't open it by default (it will ask what to edit it with).
And that would be enough for any case I can imagine, if what you're worrying about is the user messing up their own data. I don't think you can win against a user who actively tries to mess up their data. Also, I doubt any program does anything more. The usual "contract" is that the user's data is, well, the user's, so it can be destroyed by the user.
If you actually want to prevent editing, you could change permissions to forbid it, with os.chmod for example. The user would still be able to lift them manually, and there will be some time window while you are actually writing, so it will be neither clean nor significantly more effective. I would expect more trouble than benefit from such a solution.
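A rough sketch of that permissions approach on a Unix-like system (the file name is hypothetical, and as said above, the user can undo this at any time):

import os
import stat

path = "contacts.data"
os.chmod(path, stat.S_IRUSR)                 # owner read-only between writes

# ... later, when the program itself needs to write:
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # temporarily restore write access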
If you want to actually make it impossible for a user to read/edit a file, you can run your process as a different user (or use something heavier like SELinux or another MAC mechanism), and so make it really impossible to damage the data (with the user's permissions). But it is not worth the effort if it is only about protecting the user from the not-so-catastrophic effects of being careless.
I am writing a script that will be polling a directory looking for new files.
In this scenario, is it necessary to do some sort of error checking to make sure the files are completely written prior to accessing them?
I don't want to work with a file before it has been written completely to disk, but because the info I want from the file is near the beginning, it seems like it could be possible to pull the data I need without realizing the file isn't done being written.
Is that something I should worry about, or will the file be locked because the OS is writing to the hard drive?
This is on a Linux system.
Typically on Linux, unless you're using locking of some kind, two processes can quite happily have the same file open at once, even for writing. There are three ways of avoiding problems with this:
Locking
By having the writer apply a lock to the file, it is possible to prevent the reader from reading the file partially. However, most locks are advisory, so it is still entirely possible to see partial results anyway. (Mandatory locks exist, but are strongly not recommended on the grounds that they're far too fragile.) It's relatively difficult to write correct locking code, and it is normal to delegate such tasks to a specialist library (i.e., to a database engine!). In particular, you don't want to use locking on networked filesystems; it's a source of colossal trouble when it works and can often go thoroughly wrong.
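For what it's worth, here is a sketch of the reader side under advisory locking, taking a shared lock non-blockingly and skipping the file if a writer currently holds it (the path is made up):

import fcntl

with open("incoming/data.txt") as f:
    try:
        fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        pass   # a writer holds the lock; try again on the next poll
    else:
        contents = f.read()
        fcntl.flock(f, fcntl.LOCK_UN)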
Convention
A file can instead be created in the same directory with another name that you don't automatically look for on the reading side (e.g., .foobar.txt.tmp) and then renamed atomically to the right name (e.g., foobar.txt) once the writing is done. This can work quite well, so long as you take care to deal with the possibility of previous runs failing to correctly write the file. If there should only ever be one writer at a time, this is fairly simple to implement.
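A short sketch of that convention from the writer's side; os.replace is atomic as long as the temporary name and the final name are on the same filesystem:

import os

tmp_name = ".foobar.txt.tmp"
final_name = "foobar.txt"

with open(tmp_name, "w") as f:
    f.write("complete contents\n")
    f.flush()
    os.fsync(f.fileno())   # make sure the bytes are on disk before the rename

os.replace(tmp_name, final_name)   # readers only ever see a finished file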
Not Worrying About It
The most common type of file that is frequently written is a log file. These can be easily written in such a way that information is strictly only ever appended to the file, so any reader can safely look at the beginning of the file without having to worry about anything changing under its feet. This works very well in practice.
There's nothing special about Python in any of this. All programs running on Linux have the same issues.
On Unix, unless the writing application goes out of its way, the file won't be locked and you'll be able to read from it.
The reader will, of course, have to be prepared to deal with an incomplete file (bearing in mind that there may be I/O buffering happening on the writer's side).
If that's a non-starter, you'll have to think of some scheme to synchronize the writer and the reader, for example:
explicitly lock the file;
write the data to a temporary location and only move it into its final place when the file is complete (the move operation can be done atomically, provided both the source and the destination reside on the same file system).
If you have some control over the writing program, have it write the file somewhere else (like the /tmp directory) and then when it's done move it to the directory being watched.
If you don't have control of the program doing the writing (and by 'control' I mean 'edit the source code'), you probably won't be able to make it do file locking either, so that's probably out. In which case you'll likely need to know something about the file format to know when the writer is done. For instance, if the writer always writes "DONE" as the last four characters in the file, you could open the file, seek to the end, and read the last four characters.
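A sketch of that end-marker check; "DONE" here is the hypothetical marker the writer is assumed to append when it finishes:

import os

def is_complete(path, marker=b"DONE"):
    """Return True if the file ends with the writer's completion marker."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(marker):
            return False          # too short to contain the marker yet
        f.seek(-len(marker), os.SEEK_END)
        return f.read() == marker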
Yes, it will.
I prefer the "file naming convention" and renaming solution described by Donal.
I am writing a Python logger script which writes to a CSV file in the following manner:
Open the file
Append data
Close the file (I think this is necessary to save the changes, to be safe after every logging routine.)
PROBLEM:
The file is very much accessible through Windows Explorer (I'm using XP). If the file is opened in Excel, access to it is locked by Excel. When the script tries to append data, it obviously fails and then aborts altogether.
OBJECTIVE:
Is there a way to lock a file using Python so that any access to it remains exclusive to the script? Or perhaps my methodology is poor in the first place?
Rather than closing and reopening the file after each access, just flush its buffer:
theloggingfile.flush()
This way, you keep it open for writing in Python, which should lock the file from other programs opening it for writing. I think Excel will be able to open it as read-only while it's open in Python, but I can't check that without rebooting into Windows.
EDIT: I don't think you need the step below. .flush() should send it to the operating system, and if you try to look at it in another program, the OS should give it the cached version. Use os.fsync to force the OS to really write it to the hard drive, e.g. if you're concerned about sudden power failures.
os.fsync(theloggingfile.fileno())
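Put together, the keep-open-and-flush pattern might look like this (the file name and helper are invented for illustration):

import os

theloggingfile = open("log.csv", "a")   # held open for the program's lifetime

def log_row(row):
    theloggingfile.write(row + "\n")
    theloggingfile.flush()                # push Python's buffer to the OS
    os.fsync(theloggingfile.fileno())     # optional: force it onto the disk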
As far as I know, Windows does not support file locking. In other words, applications that don't know about your file being locked can't be prevented from reading a file.
But the remaining question is: how can Excel accomplish this?
You might want to try writing to a temporary file first (one that Excel does not know about) and replacing the original file with it later on.