I want to read from a pipe, inspect each line one at a time and modify or ignore it, and then write it out to a file on disk.
My latest try was:
import os
outfile = open("outputfile",'w')
with os.popen('./ps1 postgres getSoxPrivs.sql') as mypipe:
    for line in mypipe:
        outfile.write(line)
outfile.close()
This never exits; I have to \q\q to get it to stop. But it does appear to write out the file. However, it adds a bunch of line feeds not in the original data.
I saw other people using subprocess but I never could get their examples to work for my case. Seems like there are multiple ways to get this done but I can't quite find the way to make this work properly for me.
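For reference, a minimal subprocess-based sketch of what such a pipeline could look like (the command is the one from the question; the filtering step is just a placeholder):

import subprocess

# A sketch, not a drop-in answer: run the command, read its stdout line by line,
# and write the lines you want to keep to a file. universal_newlines=True makes
# Python normalize \r\n line endings, which may be the source of the extra line feeds.
proc = subprocess.Popen(['./ps1', 'postgres', 'getSoxPrivs.sql'],
                        stdout=subprocess.PIPE, universal_newlines=True)
with open('outputfile', 'w') as outfile:
    for line in proc.stdout:
        # inspect/modify `line` here, or `continue` to ignore it
        outfile.write(line)
proc.wait()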
Here is my code for accessing and editing the file:
def edit_default_settings(self, setting_type, value):
    with open("cam_settings.json", "r") as f:
        cam_settings = json.load(f)
    cam_settings[setting_type] = value
    with open("cam_settings.json", 'w') as f:
        json.dump(cam_settings, f, indent=4)
I use it in a program that runs for several hours a day, and about once a week I notice that the cam_settings.json file has become empty (literally empty, the file explorer shows 0 bytes), but I can't imagine how that is possible.
Would be glad to hear some comments on what could go wrong.
I can't see any issues with the code itself, but there can be an issue with the execution environment. Are you running the code in a multi-threaded environment or running multiple instances of the same program at once?
This situation can arise if this code is executed in parallel and multiple threads/processes try to access the file at the same time. Try logging each time the function is executed and whether it completed successfully, and add exception handlers and error logging.
If this turns out to be the problem, using buffering or a singleton pattern (a single object that owns all writes to the file) can solve the issue, as sketched below.
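A minimal sketch of serializing access from multiple threads in one process, assuming the method above is the only writer (the lock name is illustrative):

import json
import threading

_settings_lock = threading.Lock()  # one lock shared by every caller

def edit_default_settings(setting_type, value):
    # Only one thread at a time may read-modify-write the settings file.
    with _settings_lock:
        with open("cam_settings.json", "r") as f:
            cam_settings = json.load(f)
        cam_settings[setting_type] = value
        with open("cam_settings.json", "w") as f:
            json.dump(cam_settings, f, indent=4)

Note that a lock only helps within a single process; separate instances of the same program would still need file locking or the atomic-replace approach described in the next answer.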
As @Chels said, the file is truncated when it's opened with 'w'. That doesn't explain why it stays that way; I can only imagine that happening if your code crashed. Maybe you need to check logs for code crashes (or change how your code is run so that crash reasons get logged, if they aren't).
But there's a way to make this process safer in case of crashes. Write to a separate file and then replace the old file with the new file, only after the new file is fully written. You can use os.replace() for this. You could do this simply with a differently-named file:
with open(".cam_settings.json.tmp", 'w') as f:
json.dump(cam_settings, f, indent=4)
os.replace(".cam_settings.json.tmp", "cam_settings.json")
Or you could use a temporary file from the tempfile module.
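A sketch of the tempfile variant, assuming cam_settings.json lives in the current directory (the temporary file is created in the same directory so os.replace() never crosses filesystems):

import json
import os
import tempfile

fd, tmp_path = tempfile.mkstemp(dir=".", suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(cam_settings, f, indent=4)
os.replace(tmp_path, "cam_settings.json")  # atomic swap; readers never see a partial file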
When opening a file with the "w" mode, the content of the file is erased as soon as it is opened (whatever was already written gets replaced).
Not sure if this is what you are looking for, but it could be one of the reasons why "cam_settings.json" becomes empty after the call to open("cam_settings.json", 'w')!
In such a case, to append some text instead, use the "a" mode:
open("cam_settings.json", 'a')
On Python 2.7, I am currently using the following code to send data via a POST request to a webpage (unfortunately, I cannot really change this). I prepare a string data according to http://everydayscripting.blogspot.co.at/2009/09/python-jquery-open-browser-and-post.html, write it to a file, and then open the file with webbrowser.open:
f = tempfile.NamedTemporaryFile(delete=False)
f.write(data)
f.close()
webbrowser.open(f.name)
time.sleep(1)
f.unlink(f.name)
However, I had to learn that sleeping a little sometimes is a little too little: I might delete the file before the data were submitted.
How can I avoid this?
One idea is, of course, to delete the file later, but when should that be? The whole thing is a method in a class - is there a method that is reliably executed on destruction? Or is it somehow possible to start the browser in a way that does not return until the tab is closed?
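One possible shape for the "delete it later" idea is to register the cleanup with atexit, so the file survives until the interpreter exits rather than a fixed one-second window; a sketch, assuming data is the prepared POST form from above:

import atexit
import os
import tempfile
import webbrowser

f = tempfile.NamedTemporaryFile(delete=False)
f.write(data)
f.close()
atexit.register(os.unlink, f.name)  # deferred cleanup at interpreter exit
webbrowser.open(f.name)

This still assumes the browser reads the file before the program ends, but it removes the guess about how long to sleep.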
I have a file 'out.txt' that is updated continuously. I need to send the contents of this file periodically to another file 'received.txt' every N minutes. I do not want the previous lines to be sent again, so the script needs to send only the new data and update 'received.txt' with the new lines of text, without repeating lines.
I'm having a hard time putting this script together. I'm guessing I need some sort of loop to do this continuously. Here is what I have so far (not in order).
EDIT: I am using Debian(Raspbian) on a Raspberry Pi
import sys
num_lines = sum(1 for line in open('out.txt')) # read the last line of the updated file
sys.stdout = open('received.txt', 'w') #write to the received.txt file
print 'test'
f = open('out.txt', 'r') #read the data from the last line
f.readline(num_lines)
for line in f:
    print line
Any advice would be extremely helpful.
Thank you
There are a few ways to do this.
The simplest is to keep looping over the file even after EOF. You could do this by just wrapping a while True: around the for line in f:, or by just looping forever around f.readline().
But this will waste a lot of CPU power and possibly even disk access checking over and over as fast as possible whether the file is still at EOF. You can fix that by sleeping whenever you get to the end of the file, like this:
while True:
    for line in f:
        print line
    time.sleep(0.5)
But if the file is not written to for a long time, you're still wasting CPU power (which may not seem like a problem, but imagine what happens when the computer wants to go to sleep, and it can't because you're making it work every half a second). And meanwhile, if the file is being written to a lot faster than twice/second, you're going to lag.
So, a better solution is to block until there's something to read.
Unfortunately, there's no easy cross-platform way to do this. Fortunately, there are relatively easy platform-specific ways to do it on most platforms, but I'd need to know your platform to help.
For example, on OS X or other *BSD systems, you can use kqueue to wait until a file has something to read:
from select import *
# the rest of your code until the reading loop
while True:
    for line in f:
        print line
    kq = kqueue()
    kq.control([kevent(f.fileno(), filter=KQ_FILTER_READ, flags=KQ_EV_ADD)], 0, 0)
    kq.control(None, 1)
    kq.close()
But that won't work on Windows, or linux, or any other platform. (Also, that's a pretty bad way to do it on BSD, it's just shorter to show this way than the right way. If you want to do this for OS X, find a good tutorial on using kqueue in Python, don't copy this code.)
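Since the edit says this is running on Raspbian (Linux), and the goal is "every N minutes" rather than instant updates, a simpler portable sketch is to remember the read offset and copy only what appeared since the last pass (the interval and filenames follow the question; adjust as needed):

import time

INTERVAL = 10 * 60  # N minutes, in seconds

offset = 0
while True:
    with open('out.txt', 'r') as src, open('received.txt', 'a') as dst:
        src.seek(offset)        # skip everything already sent
        for line in src:
            dst.write(line)     # append only the new lines
        offset = src.tell()     # remember where we stopped
    time.sleep(INTERVAL)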
I have a Python script that runs a subprocess to get some data and then processes it. What I'm trying to achieve is to have the data written to a file, and then use the data from the file to do the processing (the reason is that the subprocess is slow but can change based on the date, time, and parameters I use, and I need to run the script frequently).
I've tried various methods, including opening the file as w+ and trying to seek to the beginning after the write is done, but nothing seems to work - the file is written, but when I try to read back from it (using file.readline()) I get EOF back.
This is what I'm essentially trying to accomplish:
myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
myFile.flush() # force the file to disk
os.fsync(myFile) # ..
myFile.close()
myFile = open(fileName, "r")
while myFile.readline():
    pass # do stuff
myFile.close()
But even though the file is correctly written (after the script runs, I can see the contents of the file), readline never returns a valid line. Like I said, I also tried using the same file object and doing seek(0) on it, with no luck. This only worked when opening the file as r+, which fails when the file doesn't already exist.
Any help would be appreciated. Also, if there's a cleaner way to do this, I'm open to it :)
PS: I realize I can Popen and stdout to a pipe, read from the pipe and then write line by line the data to the file as I do that, but I'm trying to separate the creation of the data file from the reading.
The subprocess almost certainly isn't finishing before you try to read from the file. In fact, it's likely that the subprocess isn't even writing anything before you try to read from the file. For true separation you're going to have to have the subprocess write to a temporary file then replace the file you read from, so that you either read the previous version or the new version but never get to see the partially-written file from the new version.
You can do this in a number of ways; the easiest would be to change the subprocess, but I don't know if that's an option for you here. Alternatively, you can wrap it in your own separate script that manages the files. You probably don't want to call the subprocess in the script that analyses the output file either; you'll want a cron job or something to regenerate it periodically.
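A sketch of that wrapper idea, reusing the args and fileName names from the question (on POSIX, os.rename() replaces the target atomically, so a reader sees either the old file or the complete new one):

import os
import subprocess

tmp_name = fileName + ".tmp"
with open(tmp_name, "w") as tmp:
    p = subprocess.Popen(args, stdout=tmp)
    p.wait()                     # let the command finish writing
os.rename(tmp_name, fileName)    # publish the finished file in one step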
This should work as is provided the subprocess is finishing in time (see James's answer).
If you want to wait for it to finish, add p.wait() after the Popen invocation.
What is your actual while loop, though? while myFile.readline() makes it seem as if you're not actually saving the line for anything. Try this:
myFile = open(fileName, "r")
print myFile.readlines()
myFile.close()
Or, if you want to interactively examine the state of your program:
myFile = open(fileName, "r")
import pdb; pdb.set_trace()
myFile.close()
Then you can do things like print myFile.readlines() after it stops.
@James Aylett pointed me down the right path; it appears that my problem was that subprocess.Popen wasn't finished running when I called .flush().
The solution is to call p.wait() right after the subprocess.Popen call, to allow the underlying command to finish. After doing that, .flush() does the right thing (since all the data is there), and I can proceed to read from the file.
So the above code becomes:
myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
p.wait() # <-- Missing line
myFile.flush() # force the file to disk
os.fsync(myFile) # ..
myFile.close()
myFile = open(fileName, "r")
while myFile.readline():
    pass # do stuff
myFile.close()
And then it all works!
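As an aside, since a cleaner variant was welcome: subprocess.call() blocks until the command exits, so the explicit wait()/flush()/fsync() bookkeeping can be dropped; a sketch with the same args and fileName:

import subprocess

with open(fileName, "w") as out:
    subprocess.call(args, stdout=out)  # returns only after the command has exited

with open(fileName, "r") as myFile:
    for line in myFile:
        pass  # do stuff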
Do files opened like file("foo.txt") have any info about file modification time?
Basically I want to know if the file has been modified or replaced since a certain time, but if the file is replaced between checking modification time and opening the file, then you have inaccurate information.
How can I be sure?
Thanks.
UPDATE
@rubayeet: Thanks for the answer (+1), I actually didn't think of that. But... what to do if the modification time has changed? Perhaps I reload the file. But what if it changes again that time? If the file is being touched regularly, I could end up in a loop forever! What I really want is a way to just get an open file handle and a modification time to go with it, without a potential infinite loop.
PS: The answer you gave was actually plenty good enough for my purposes, as the file won't be changed regularly; it's general interest on my part now.
UPDATE 2
Thinking the previous update through (and experimenting a little), I realize that simply knowing the file modification time at the point the file was opened is not much use: if the file is modified while you are reading, you can end up with some or all of the modified data in what you read in. So you'd have to open and read/process the whole file, then check the mtime again (as per @rubayeet's answer) to see whether you may have stale data.
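A sketch of that read-then-recheck idea, with a retry limit so a constantly-touched file cannot cause an infinite loop (the path and limit are illustrative):

import os

def read_with_mtime(path, max_retries=3):
    # Re-read if the file changed while we were reading it,
    # but give up after a few attempts instead of looping forever.
    data = None
    mtime = None
    for _ in range(max_retries):
        before = os.path.getmtime(path)
        with open(path) as f:
            data = f.read()
        mtime = os.path.getmtime(path)
        if mtime == before:
            break  # nothing changed underneath us
    return data, mtime  # possibly stale if every attempt raced, but bounded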
For simple modtimes you would use:
from os.path import getmtime
modtime = getmtime('/file/to/path')
If you want something like callback functionality, you could check the inotify bindings for Python: pyinotify.
You essentially set up a WatchManager, which notifies you in an event loop if any change happens in the monitored directory. You register for specific events, like opening a file (which changes the modtime if written to).
If you are interested in exclusive access to a file, I would point you to the fcntl module, which has some low-level file-locking mechanisms on file descriptors.
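A minimal pyinotify sketch, assuming the file of interest is /path/to/file (the handler class name is illustrative):

import pyinotify

class ModifiedHandler(pyinotify.ProcessEvent):
    def process_IN_MODIFY(self, event):
        print "modified:", event.pathname

wm = pyinotify.WatchManager()
wm.add_watch('/path/to/file', pyinotify.IN_MODIFY)
notifier = pyinotify.Notifier(wm, ModifiedHandler())
notifier.loop()  # blocks, dispatching an event each time the file is written to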
import os
filepath = '/path/to/file'
modifytime1 = os.path.getmtime(filepath)
fp = open(filepath)
modifytime2 = os.path.getmtime(filepath)
if modifytime1 != modifytime2:
    print "File modified after opening"