Apologies if this kind of thing has been answered elsewhere. I am using Python to run a Windows executable file using subprocess.Popen(). The executable file produces a .txt file and some other output files as part of its operation. I then need to run another executable file using subprocess.Popen() that uses the output from the original .exe file.
The problem is that it is the .exe file, not Python, that controls the creation of the output files, so I have no way of knowing how long the first text file takes to write to disk before I can use it as an input to the second .exe file.
Obviously I cannot run the second executable file before the first text file finishes writing to disk.
Popen.wait() does not appear to be helpful because the first executable terminates before the text file has finished writing to disk. I also don't want to use some kind of function that waits an arbitrary period of time (say a few seconds) and then proceeds with the execution of the second .exe file. That would be inefficient: it may wait longer than necessary and waste time, or it may not wait long enough if the output text file is very large.
So I guess I need some kind of listener that waits for the text file to finish being written before it moves on to execute the second subprocess.Popen() call. Is this possible?
Any help would be appreciated.
UPDATE (see Neil's suggestions, below)
The problem with os.path.getmtime() is that the modification time is updated more than once during the write, so very large text files (say ~500 MB) require a relatively long wait between os.path.getmtime() calls. I use time.sleep() for this. I guess this solution is workable, but it is not the most efficient use of time.
On the other hand, I am having bigger problems with trying to open the file for write access. I use the following loop:
while True:
    try:
        f = open(file, 'w')
    except:
        # For lack of something else to put in here
        # (I don't want to print anything)
        os.path.getmtime(file)
    else:
        break
This approach seems to work in that Python essentially pauses while the Windows executable is writing the file, but afterwards I go to use the text file in the next part of the code and find that the contents that were just written have been wiped.
I know they were written because I can see the file size increasing in Windows Explorer while the executable is doing its stuff, so I can only assume that the final call to open(file, 'w') (once the executable has done its job) causes the file to be wiped, somehow.
Obviously I am doing something wrong. Any ideas?
There's probably many ways to do what you want. One that springs to mind is that you could poll the modification time with os.path.getmtime(), and see when it changes. If the modification date is after you called the executable, but still a couple seconds ago, you could assume it's done.
Alternatively, you could try opening the file for write access (just without actually writing anything). If that fails, it means someone else is writing it.
This all sounds so fragile, but I assume your hands are somewhat tied, too.
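For what it's worth, a minimal sketch of the mtime-polling idea (the file name and timings are made-up placeholders; it simply waits until the modification time has stopped changing for a while):

import os
import time

def wait_until_stable(path, quiet_period=2.0, poll=0.5):
    # Wait for the file to appear, then for its mtime to stop changing
    # for quiet_period seconds before assuming the writer has finished.
    while not os.path.exists(path):
        time.sleep(poll)
    while time.time() - os.path.getmtime(path) < quiet_period:
        time.sleep(poll)

wait_until_stable('output.txt')
# ...now it should be reasonably safe to launch the second executable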
One suggestion that comes to mind is to check whether the text file that is written has a recognizable end-of-file marker. I created a text file that looks like this:
BEGIN
DATA
DATA
DATA
END
Given this file, I could then tell whether "END" had been written to the end of the file by opening it in binary mode and seeking from the end with os.SEEK_END (text-mode files don't allow end-relative seeks in Python 3), like this:
>>> import os
>>> fp = open('test.txt', 'rb')
>>> fp.seek(-4, os.SEEK_END)
21
>>> fp.read()
b'END\n'
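Building on that, a rough sketch of how the check could be wrapped in a wait loop before starting the second program (the file names, the marker, and the executable path are all placeholders):

import os
import subprocess
import time

def wait_for_marker(path, marker=b'END', poll=0.5):
    # Poll until the file exists and its tail ends with the marker.
    while True:
        try:
            with open(path, 'rb') as fp:
                fp.seek(max(os.path.getsize(path) - 16, 0))
                if fp.read().rstrip().endswith(marker):
                    return
        except OSError:
            pass              # file missing or still locked by the writer
        time.sleep(poll)

wait_for_marker('output.txt')
subprocess.Popen(['second.exe', 'output.txt'])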
Related
I am running a long Python program which prints values to a .txt file iteratively. I am trying to read the values using terminal commands (gedit/tail/less) and plot them in Gnuplot, but I am not able to read the .txt file until the whole execution is over. What is the correct way to handle the file in this situation?
Files are written to disk when they are closed, or when the buffer becomes too large to hold in memory.
That is, even when you call file.write("something"), nothing is actually written to the file until you close it or the with block ends.
with open("temp.txt","w") as w:
w.write("hey")
x=input("touch")
w.write("\nhello")
w.write(x)
Run this code and try to read the file before answering the "touch" prompt: it will be empty, but after the with block is over you can see the contents.
If you are going to access the file from many sources, then you have to be careful of this, and also not to modify it from multiple sources.
EDIT: I forgot to say: you have to repeatedly close the file and reopen it in append mode if you want some other program to read it while you are writing to it.
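As a rough illustration of that append-and-close pattern (compute() and the file name are hypothetical stand-ins), the writer could look something like:

# Hypothetical iterative writer: each value is appended and the file is
# closed again immediately, so tail/less/gnuplot can see it straight away.
for step in range(1000):
    value = compute(step)                    # placeholder for the real calculation
    with open('results.txt', 'a') as f:
        f.write('%d %f\n' % (step, value))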
I would like to check every minute whether a file like "RESULTS.ODB" has been generated, and if this file is bigger than 1.5 gigabytes, start another subprocess to get the data from it. How can I make sure that the file isn't still in the process of being written and that everything is included?
I hope you know what I mean. Any ideas on how to handle that?
Thank you very much. :)
If you have no control over the writing process, then you are at some point bound to fail somewhere.
If you do have control over the writer, a simple way to "lock" files is to create a symlink. If your symlink creation fails, there is already a write in progress. If it succeeds, you just acquired the "lock".
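A minimal sketch of that symlink lock, assuming you can add it to the writer (the names are placeholders):

import os

def try_lock(lockname='RESULTS.ODB.lock'):
    # Symlink creation is atomic: it fails if the link already exists,
    # so only one process can "acquire" it at a time.
    try:
        os.symlink('RESULTS.ODB', lockname)
        return True        # we hold the lock
    except FileExistsError:
        return False       # a write (or read) is already in progress

def unlock(lockname='RESULTS.ODB.lock'):
    os.unlink(lockname)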
But if you do not have any control over writing and creation of the file, there will be trouble. You can try the approach as outlined here: Ensuring that my program is not doing a concurrent file write
This will read timestamps of the file and "guess" from them if writing has completed or not. This is more reliable than checking the file size, as you could end up with a file over your size threshold but writing still in progress.
In this case the problem would be the writer starting to write before you have read the file in its entirety. Now your reader would fail when the file it was reading disappeared half way through.
If you are on a Unix platform, have no control over the writer and absolutely need to do this, I would do something like this:
1. Check if the file exists and, if it does, whether its "last written" timestamp is old enough to assume the file is complete.
2. Rename the file to a different name.
3. Check that the renamed file still matches your criteria.
4. Get the data from the renamed file.
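A rough sketch of those steps (Unix assumed; the names, the age threshold, and process() are all placeholders):

import os
import time

SOURCE = 'RESULTS.ODB'
CLAIMED = 'RESULTS.ODB.reading'
MIN_AGE = 120                    # seconds the file must have been left untouched

while True:
    try:
        old_enough = time.time() - os.path.getmtime(SOURCE) > MIN_AGE
    except OSError:
        old_enough = False       # the file does not exist (yet)
    if old_enough:
        os.rename(SOURCE, CLAIMED)   # step 2: move the file out of the writer's way
        process(CLAIMED)             # steps 3-4: check and read the renamed file
    time.sleep(60)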
Nevertheless, this will eventually fail and you will lose an update, as there is no way to make this atomic. Renaming removes the problem of overwriting the file before you have read it, but if the writer decides to start writing between steps 1 and 2, you will not only receive an incomplete file but might also break the writer if it does not like the file disappearing halfway through.
I would rather try to find a way to somehow chain the actions together. Either your writer triggering the read process or adding a locking mechanism. Writing 1.5GB of data is not instantaneous and eventually the unexpected will happen.
Or if you definitely cannot do anything like that, could you ensure for example that your writer writes maximum once in N minutes or so? If you could be sure it never writes twice within a 5 minute window, you would wait in your reader until the file is 3 minutes old and then rename it and read the renamed file. You could also check if you could prevent the writer from overwriting. If you can do this, then you can safely process the file in your reader when it is "old enough" and has not changed in whatever grace period you decide to give it, and when you have read it, you will delete the file allowing the next update to appear.
Without knowing more about your environment and processes involved this is the best I can come up with. But there is no universal solution to this problem. It needs a workaround that is tailored to your particular environment.
So I have a Python script (let's call it file_1.py) that overwrites the content of a text file with new content, and it works just fine. I have another Python script (file_2.py) that reads the file and performs actions on the data in it. With file_2.py I've been trying to detect when the text file is edited by file_1.py and then do some stuff with the new data as soon as it's added. I looked into the subprocess module, but I couldn't really figure out how to use it across different files. Here's what I have so far:
file_1.py:
with open('text_file.txt', 'w') as f:
    f.write(assemble(''.join(data)))  # you can ignore what assemble does, this part already works.
file_2.py:
while True:
    f = open('text_file.txt', 'r')
    data = f.read()
    function(data)
    f.close()
I thought that since I close and reopen the file every loop, the data in the file would be updated. However, it appears I was wrong, as the data remains the same even though the file was updated. So how can I go about doing this?
Are you always overwriting the data in the first file with the same data?
I mean, instead of appending or actually changing the data over time?
I see it working here when I change
with open('text_file.txt','wt') as f:
to
with open('text_file.txt','at') as f:
and I append some data. 'w' will overwrite and if data doesn't change you will see the same data over and over.
Edit:
Another possibility (as discussed in the comments to the OP's self-answer) is the need to call f.flush() after writing to the file. Although the buffers are written out automatically when a file is closed (or a with block ends), that write can take a moment, and if the file is read again before then, the updates will not be there yet. To remove that uncertainty, call flush() after updating, which pushes the buffered data out so other readers can see it.
If your reading code sleeps for enough time between readings (that is, the reads are slow enough), the manual flush might not be needed. But if in doubt, or to do it the simple way and be sure, just use flush().
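For example, a hypothetical variant of file_1.py that keeps the file open between iterations would make each update visible to file_2.py straight away by flushing (produce_data() and the loop shape are made up for illustration):

# Sketch only: produce_data() stands in for however the data is generated.
f = open('text_file.txt', 'w')
for chunk in produce_data():
    f.write(assemble(chunk))
    f.flush()                 # push the buffered text out so file_2.py sees it now
f.close()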
Okay, so it looks like I've solved my problem. According to this website, it says:
Python automatically flushes the files when closing them. But you may want to flush the data before closing any file.
Since the "automatic flushing" thing wasn't working, I tried to manually flush the I/O using file.flush(), and it worked. I call that function every time right write to the file in file_1.py.
EDIT: It seems that when time.sleep() is called between readings of the file, it interferes and you have to manually flush the buffer.
I am scraping data from multiple websites.
To do that I have written multiple web scrapers using Selenium and PhantomJS.
Those scrapers return values.
My question is: is there a way I can feed those values to a single Python program that will sort through that data in real time?
What I want to do is not save that data to analyze later, but send it to a program that will analyze it in real time.
What I have tried: I have no idea where to even start.
Perhaps a named pipe would be suitable:
mkfifo whatever (you can also do this from within your python script; os.mkfifo)
You can write to whatever like a normal file (it will block until something reads it) and read from whatever with a different process (it will block if there is no data available)
Example:
# writer.py
with open('whatever', 'w') as h:
    h.write('some data')  # Blocks until reader.py reads the data

# reader.py
with open('whatever', 'r') as h:
    print(h.read())  # Blocks until writer.py writes to the named pipe
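A small sketch of the os.mkfifo route mentioned above, tolerating the pipe already existing from a previous run:

import os

path = 'whatever'
try:
    os.mkfifo(path)            # create the named pipe (Unix only)
except FileExistsError:
    pass                       # it is already there from a previous run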
You can try writing the data you want to share to a file and have the other script read and interpret it. Have the other script run in a loop to check if there is a new file or the file has been changed.
Simply use files for data exchange and a trivial locking mechanism.
Each writer or reader (only one reader, it seems) gets a unique number.
If a writer or reader wants to access the file, it renames it to its original name + its number, then writes or reads, and renames it back afterwards.
The others wait until the file is available again under its own name and then access it by locking it in a similar way.
Of course you have shared memory and such, or memmapped files and semaphores. But this mechanism has worked flawlessly for me for over 30 years, on any OS, over any network. Since it's trivially simple.
It is in fact a poor man's mutex semaphore.
To find out if a file has changed, look to its writing timestamp.
But the locking is necessary too, otherwise you'll land into a mess.
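A bare-bones sketch of that rename trick (the file name, the suffix, and the retry interval are placeholders):

import os
import time

def claim(path, my_id='.1'):
    # Rename the shared file to path + my_id; keep retrying until we win.
    claimed = path + my_id
    while True:
        try:
            os.rename(path, claimed)     # atomic: only one process can succeed
            return claimed
        except OSError:                  # the file is currently claimed elsewhere
            time.sleep(0.1)

def release(claimed, path):
    os.rename(claimed, path)             # give the file back under its original name

# usage (hypothetical):
locked = claim('shared_data.txt')
with open(locked) as f:
    data = f.read()
release(locked, 'shared_data.txt')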
I've seen a few questions related to this but nothing that definitively answers my question.
I have a short python script that does some simple tasks, then outputs some text to a log file, waits for more input, and loops.
At times, the file is opened in write mode ("w") and other times it is opened in append mode ("a") depending on the results of the other tasks. For the sake of simplicity let's say it is in write mode/append mode 50/50.
I am opening files by saying:
with open(fileName, mode) as file:
and writing to them by saying:
file.write(line)
While these files are being opened, written to, appended to, etc., I expect a command prompt to be doing some read activities on them (findstr, specifically).
1) What's going to happen if my script tries to write to the same file the command window is reading from?
2) Is there a way to explicitly set the open to shared mode?
3) Does using the 'logging' module help at all/handle this, instead of just manually making my own log files?
Thanks
What you are referring to is generally called a "race condition" where two programs are trying to read / write the same file at the same time. Some operating systems can help you avoid this by implementing a file-lock mutex system, but on most operating systems you just get a corrupted file, a crashed program, or both.
Here's an interesting article talking about how to avoid race conditions in python:
http://blog.gocept.com/2013/07/15/reliable-file-updates-with-python/
One suggestion the author makes is to copy the file to a temp file, make your writes/appends there, and then move the file back. Race conditions happen when files are kept open for a long time; this way you never actually hold the main file open in Python, so the only point at which a collision could occur is during the OS copy/move operations, which are much faster.
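A hedged sketch of that temp-file approach (paths are placeholders; os.replace is used for the final move because, unlike os.rename on Windows, it can overwrite an existing file):

import os
import shutil
import tempfile

LOG = 'output.log'

def write_log(line, mode):
    # Copy the log aside, update the copy, then swap it back in one move.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(LOG)))
    os.close(fd)
    if mode == 'a' and os.path.exists(LOG):
        shutil.copyfile(LOG, tmp)        # start from the current contents when appending
    with open(tmp, mode) as f:
        f.write(line)
    os.replace(tmp, LOG)                 # rename over the original in a single step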