Reading and writing to the same file from different programs - Python

I have two programs written in Python and converted to one-file exes using auto-py-to-exe.
The first program writes to a file, which is read by the second program. The problem is that when the second program tries to read the file at the same time as it is being written, the code stops with a permission error.
The solutions that seemed possible are:
- Time management: not useful in my case, since the reading and writing times are not constant.
- Checking whether the file is accessible before reading: this might work, but I suppose it would still raise an error if the writer tries to change the file while it is being read.
- Using the size of the file to check whether writing has finished before running the reader: this does not seem logical or Pythonic.
- Using os.pipe(): I found some solutions based on it, but to be honest, I couldn't understand what the process does. If this is a solution, I would be glad to have it explained in simple English.
That's it. Any suggestions?
P.S.: The OS is Windows and I am using Python 3.9.
Solved:
Thanks to the replies and suggestions, I did not know that try/except accepts an exception type. I solved the problem by catching PermissionError: the code runs in a loop and tries to read the file again after a few seconds.
However, the drawback is that the read has to finish before the writer comes back to rewrite the file. In my case, as suggested in the replies, I combined the two programs so they run sequentially.
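A minimal sketch of that retry loop; the file name and the delay are placeholders rather than values from the original programs:

import time

def read_when_available(path, delay=2.0):
    """Keep retrying until the writer has released the file."""
    while True:
        try:
            with open(path, "r") as f:
                return f.read()
        except PermissionError:
            # The writer still holds the file open; wait and try again.
            time.sleep(delay)

data = read_when_available("shared_output.txt")  # hypothetical file name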

Related

How does the OS handle Python and the subprocesses of a Python script...?

My question is somewhat unique. I am currently working on a project for my computer forensics class. This project is aimed at hiding disk data from investigators. The method by which this is supposed to be achieved is by writing the bytes of a "clean" file over the "bad" file. Once overwritten, the "bad" file is deleted.
This concept sounds simple enough, but what my partner and I have observed is interesting. If we open a file in a Python script, we can easily overwrite the data associated with that file on disk (verified using dd). We can also easily delete a file from inside the script. However, a write followed by a delete results in no write actually taking place, only the file's removal.
This makes sense from an OS optimization standpoint. From that point, we thought it might work if we split the writing and deleting into two separate scripts, and controlled both by a third. However, it seems that even if we run the scripts as a subprocess of another script, the same thing happens. We've tried to use bash scripts for the deletion process instead of pure python, and still, nothing sticks.
This project was supposed to be a whole mess of little anti-forensics tools like this, but this particular one has captured our whole attention because of this issue. Does anyone have an idea as to why this is happening and what we can do to move forward?
We know this can be achieved in C, etc, but we want to solve this using python because of the interesting constraints it's presented.
---EDIT---
This is a snippet from our controller; it calls "ghost.py" with the associated parameters.
ghost.py prints the edited file names/paths to stdout.
Relevant code follows:
proc = subprocess.Popen(['python', 'ghost.py', '-c', 'good.txt', '-d', '/mnt/evil.txt'],
                        stdout=subprocess.PIPE)
files = proc.communicate()
for i in files:
    if i != None and i != "\n":
        os.system("./del.sh " + i)
Using subprocesses doesn't change any interesting aspect of your design, so don't use them. You probably need os.fsync(). Try this pattern:
myfile.write('all of my good data')
myfile.flush()                 # flush Python's internal buffers
os.fsync(myfile.fileno())      # ask the OS to push the data to the disk
myfile.close()
os.remove(myfile.name)         # os.remove() expects a path, not a file object
Reference: https://docs.python.org/2/library/os.html#os.fsync

Debugging a Python script which first needs to read large files. Do I have to load them anew every time?

I have a Python script which starts by reading a few large files and then does something else. Since I want to run this script multiple times, changing some of the code until I am happy with the result, it would be nice if the script did not have to read the files anew every time, because they will not change. So I mainly want to use this for debugging.
It happens too often that I run scripts with bugs in them, but I only see the error message after minutes, because the reading took so long.
Are there any tricks to do something like this?
(If it is feasible, I create smaller test files)
I'm not good at Python, but it seems to be able to dynamically reload code from a changed module: How to re import an updated package while in Python Interpreter?
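A minimal sketch of that reload idea, assuming the slow reading lives in its own module; the module and function names here are hypothetical. The data is loaded once per interactive session, and only the code being edited is reloaded:

import importlib

import slow_loader   # hypothetical module that reads the large files
import analysis      # hypothetical module you keep changing

data = slow_loader.load_all()   # expensive, but done only once per session

# After editing analysis.py, pick up the changes without re-reading the files:
importlib.reload(analysis)
analysis.run(data)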
Some other suggestions not directly related to Python.
Firstly, try to create a smaller test file. Is the whole file required to demonstrate the bug you are observing? Most probably it is only a small part of your input file that is relevant.
Secondly, are these particular files required, or will the problem show up with any large amount of data? If it shows up only with particular files, then most probably it is related to some feature of those files and will also show up with a smaller file that has the same feature. If the main reason is just the sheer amount of data, you might be able to avoid reading it by generating some random data directly in the script.
Thirdly, what is the bottleneck when reading the files? Is it just hard-drive performance, or do you do some heavy processing of the read data in your script before actually getting to the part that causes problems? In the latter case, you might be able to do that processing once, write the results to a new file, and then modify your script to load this processed data instead of redoing the processing each time (see the sketch below).
If the hard drive performance is the issue, consider a faster filesystem. On Linux, for example, you might be able to use /dev/shm.
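A rough sketch of the "process once, then load the processed data" idea from the third suggestion, assuming the parsed result can be pickled; the file names and the parsing function are placeholders:

import os
import pickle

CACHE = "parsed_data.pkl"   # hypothetical cache file

def load_data():
    # Reuse the processed data if a cache already exists on disk.
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            return pickle.load(f)
    data = read_and_parse("huge_input.dat")   # placeholder for the slow step
    with open(CACHE, "wb") as f:
        pickle.dump(data, f)
    return data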

Python open file in shared mode

I've seen a few questions related to this but nothing that definitively answers my question.
I have a short python script that does some simple tasks, then outputs some text to a log file, waits for more input, and loops.
At times, the file is opened in write mode ("w") and other times it is opened in append mode ("a") depending on the results of the other tasks. For the sake of simplicity let's say it is in write mode/append mode 50/50.
I am opening files by saying:
with open(fileName, mode) as file:
and writing to them by saying:
file.write(line)
While these files are being opened, written to, appended to, etc., I expect a command prompt to be doing some read activities on them (findstr, specifically).
1) What's going to happen if my script tries to write to the same file the command window is reading from?
2) Is there a way to explicitly set the open to shared mode?
3) Does using the 'logger' module help at all / handle this instead of just manually making my own log files?
Thanks
What you are referring to is generally called a "race condition" where two programs are trying to read / write the same file at the same time. Some operating systems can help you avoid this by implementing a file-lock mutex system, but on most operating systems you just get a corrupted file, a crashed program, or both.
Here's an interesting article talking about how to avoid race conditions in python:
http://blog.gocept.com/2013/07/15/reliable-file-updates-with-python/
One suggestion that the author makes is to copy the file to a temp file, make your writes/appends there, and then move the file back. Race conditions happen when files are kept open for a long time; this way you never actually open the main file in Python, so the only point at which a collision could occur is during the OS copy/move operations, which are much faster.
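A rough sketch of that temp-file idea; the log file name is a placeholder, and os.replace() (Python 3.3+) renames the copy over the original in one step:

import os
import shutil
import tempfile

LOG = "output.log"   # hypothetical log file

def append_line(line):
    # Work on a private copy so the real log is never held open for long.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(LOG)))
    os.close(fd)
    if os.path.exists(LOG):
        shutil.copyfile(LOG, tmp)
    with open(tmp, "a") as f:
        f.write(line + "\n")
    os.replace(tmp, LOG)   # rename the copy over the original in one step

Note that on Windows the final replace can still fail with a PermissionError if another process has the log open at that instant, in which case the same retry-on-PermissionError trick from the first question above applies.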

Linking Data between Simulink and Blender

I've been trying to determine a way to link data between a running Simulink model and Blender (or Python). I have no idea where to start on this, but I did find one piece of software that might have solved it, if I could get it to install correctly: SimServer.
I found out about SimServer on StackOverflow (the original question is here), however I cannot get it to install correctly; it errors out during mex on the httpwrapper.c file, stating "syntax error; found 'SOCKET' expecting '}'" (the same happens if I remove httpwrapper.c from the mex command; it errors out on another file the same way).
Is there a way to remedy this, or should I move on and try to find another solution? I feel as if another solution would be preferable and probably easier to install on other machines. Is there some way I can pipe information from a running Simulink model to a file and have Blender/Python watch that file for changes and update a model in the Blender Game Engine in real time?
If you are interested in writing data to a file from Simulink, there are several ways to do that. I think the easiest way would be to use add_exec_event_listener to add a callback listening to the 'PostOutputs' event of your block. Within this callback you can access the block's data and write it to a file.
You can find doc for add_exec_event_listener at http://www.mathworks.com/help/simulink/slref/add_exec_event_listener.html
Other ways to write to a file from Simulink are:
- Using a MATLAB Function block with your own "extrinsic" function that writes to a file.
- Writing an S-Function in MATLAB or C/C++.
From the external program you can watch this file for updates. Achieving real-time behaviour with this approach is doubtful: there could be lags both in writing the file to disk and in the other program noticing the changes.
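On the Blender/Python side, a simple (non-real-time) way to watch that file is to poll its modification time. A minimal sketch, where the path, poll interval, and handle_update() callback are all placeholders:

import os
import time

DATA_FILE = "simulink_out.txt"   # hypothetical file written from Simulink

def watch(path, interval=0.1):
    last_mtime = None
    while True:
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            mtime = None   # file not created yet
        if mtime is not None and mtime != last_mtime:
            last_mtime = mtime
            with open(path) as f:
                handle_update(f.read())   # hypothetical callback that updates the Blender scene
        time.sleep(interval)

In a real Blender setup the check would likely need to run once per frame or logic tick rather than in a blocking loop like this.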

Python, subprocesses and text file creation

Apologies if this kind of thing has been answered elsewhere. I am using Python to run a Windows executable file using subprocess.Popen(). The executable file produces a .txt file and some other output files as part of its operation. I then need to run another executable file using subprocess.Popen() that uses the output from the original .exe file.
The problem is that it is the .exe file, and not Python, that controls the creation of the output files, so I have no way of knowing how long the first text file takes to be written to disk before I can use it as input to the second .exe file.
Obviously I cannot run the second executable file before the first text file finishes writing to disk.
subprocess.wait() does not appear to be helpful because the first executable terminates before the text file has finished writing to disk. I also don't want to use some kind of function that waits an arbitrary period of time (say a few seconds) then proceeds with the execution of the second .exe file. This would be inefficient in that it may wait longer than necessary, and thus waste time. On the other hand it may not wait long enough if the output text file is very large.
So I guess I need some kind of listener that waits for the text file to finish being written before it moves on to execute the second subprocess.Popen() call. Is this possible?
Any help would be appreciated.
UPDATE (see Neil's suggestions, below)
The problem with os.path.getmtime() is that the modification time is updated more than once during the write, so very large text files (say, ~500 MB) require a relatively long wait between os.path.getmtime() calls. I use time.sleep() to do this. I guess this solution is workable, but it is not the most efficient use of time.
On the other hand, I am having bigger problems with trying to open the file for write access. I use the following loop:
while True:
    try:
        f = open(file, 'w')
    except:
        # For lack of something else to put in here
        # (I don't want to print anything)
        os.path.getmtime(file)
    else:
        break
This approach seems to work in that Python essentially pauses while the Windows executable is writing the file, but afterwards I go to use the text file in the next part of the code and find that the contents that were just written have been wiped.
I know they were written because I can see the file size increasing in Windows Explorer while the executable is doing its stuff, so I can only assume that the final call to open(file, 'w') (once the executable has done its job) causes the file to be wiped, somehow.
Obviously I am doing something wrong. Any ideas?
There are probably many ways to do what you want. One that springs to mind is that you could poll the modification time with os.path.getmtime() and see when it changes. If the modification date is after you called the executable, but still a couple of seconds old, you could assume it's done (sketched below).
Alternatively, you could try opening the file for write access (just without actually writing anything). If that fails, it means someone else is writing it.
This all sounds so fragile, but I assume your hands are somewhat tied, too.
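A rough sketch of the mtime-polling suggestion, assuming that "unchanged for a couple of seconds" is an acceptable definition of finished; the threshold and the file name are placeholders:

import os
import time

def wait_until_stable(path, quiet_seconds=2.0, poll=0.5):
    """Block until the file exists and its mtime has stopped changing."""
    while not os.path.exists(path):
        time.sleep(poll)
    while time.time() - os.path.getmtime(path) < quiet_seconds:
        time.sleep(poll)

wait_until_stable("output.txt")   # hypothetical output of the first .exe
# ...now it should be safe to launch the second executable...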
One suggestion that comes to mind is that the text file being written might have a recognizable end-of-file marker. I created a text file that looks like this:
BEGIN
DATA
DATA
DATA
END
Given this file, I could then tell whether "END" had been written at the end of the file by seeking relative to the end with os.SEEK_END, like this:
>>> import os
>>> fp = open('test.txt', 'rb')   # binary mode, so we can seek relative to the end
>>> fp.seek(-4, os.SEEK_END)
21
>>> fp.read()
b'END\n'
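Tying that back to the original question, a hedged sketch of waiting for such a marker before starting the second executable; the path and the marker are placeholders and would have to match whatever the first .exe actually writes:

import os
import time

def wait_for_marker(path, marker=b'END\n', poll=0.5):
    # Poll until the file ends with the agreed-upon marker.
    while True:
        try:
            with open(path, 'rb') as fp:
                fp.seek(-len(marker), os.SEEK_END)
                if fp.read() == marker:
                    return
        except OSError:
            pass   # file missing, or still shorter than the marker
        time.sleep(poll)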
