Intercept (using Python) data being written to a file from another process - python

I am working on something where this could come in handy in the future.
Does anyone know of a way I can intercept data (using Python) being written to a file (via some other language/process)?
I would know the path of the file I want to intercept, and I would prefer a solution that works on Windows. I know watchdog can watch for file changes, but my goal is to intercept the write before it reaches the file.
For example, say I have the following script running on my computer that constantly writes to a file:
import time

filename = "testfile"
i = 1
while True:
    with open(filename, 'a') as out:
        out.write(str(i) + '\n')
    time.sleep(1)
    i += 1
Note: This is just an example. The data I want to intercept is not being written with Python. I don't know what it is written with.
In another script, I want to intercept everything being written to testfile.
I don't believe this is possible but I figured I would ask.

Using os.walk, you can count how many files are in the whole directory tree, then keep re-checking that count against a variable holding the previous count. When there is a difference, you can open the new file (for example with os.open).
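A minimal sketch of that polling idea follows. The function names, the `interval` and `max_checks` parameters are illustrative assumptions, not part of the answer. Note that this only notices a change after it has happened (it does not intercept anything), and a file count only changes when files are created or deleted, so appends to an existing file such as testfile will not register.

```python
import os
import time


def count_files(root: str) -> int:
    """Count every file below root, walking subdirectories with os.walk."""
    return sum(len(files) for _dirpath, _dirnames, files in os.walk(root))


def wait_for_new_files(root: str, interval: float = 1.0, max_checks: int = 10) -> bool:
    """Poll root until the file count changes; return True if it did.

    interval and max_checks are hypothetical knobs for this sketch.
    """
    baseline = count_files(root)
    for _ in range(max_checks):
        time.sleep(interval)
        if count_files(root) != baseline:
            return True
    return False
```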

Related

Python: JSON file becomes empty

Here is my code for accessing and editing the file:
import json

def edit_default_settings(self, setting_type, value):
    with open("cam_settings.json", "r") as f:
        cam_settings = json.load(f)
    cam_settings[setting_type] = value
    with open("cam_settings.json", 'w') as f:
        json.dump(cam_settings, f, indent=4)
I use it in a program that runs for several hours a day, and about once a week I notice that the cam_settings.json file has become empty (literally empty: File Explorer shows 0 bytes). I can't imagine how that is possible.
I would be glad to hear any comments on what could go wrong.
I can't see any issues with the code itself, but there may be an issue with the execution environment. Are you running the code in a multi-threaded environment, or running multiple instances of the same program at once?
This situation can arise if this code is executed in parallel and multiple threads or processes try to access the file at the same time. Try logging each time the function is executed and whether it completed successfully, and add exception handlers and error logging.
If this is the problem, making sure only one writer touches the file at a time (for example with a buffer or a singleton that owns all writes) can solve it.
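For the multi-threaded case, a minimal sketch is to guard the whole read-modify-write with a single `threading.Lock`. This version drops `self` and adds a `path` parameter purely for illustration, and `_settings_lock` is a hypothetical name. A thread lock only protects threads inside one process; separate processes would need an OS-level file lock instead.

```python
import json
import threading

# One lock shared by every thread in this process (hypothetical module-level name).
_settings_lock = threading.Lock()


def edit_default_settings(setting_type, value, path="cam_settings.json"):
    """Read, update and rewrite the settings file while holding the lock."""
    with _settings_lock:
        with open(path, "r") as f:
            cam_settings = json.load(f)
        cam_settings[setting_type] = value
        with open(path, "w") as f:
            json.dump(cam_settings, f, indent=4)
```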
As @Chels said, the file is truncated when it's opened with 'w'. That doesn't explain why it stays that way; I can only imagine that happening if your code crashed. You may need to check logs for crashes (or change how your code is run so that crash reasons get logged, if they aren't).
But there's a way to make this process safer in case of crashes. Write to a separate file and then replace the old file with the new file, only after the new file is fully written. You can use os.replace() for this. You could do this simply with a differently-named file:
import json
import os

with open(".cam_settings.json.tmp", 'w') as f:
    json.dump(cam_settings, f, indent=4)
os.replace(".cam_settings.json.tmp", "cam_settings.json")
Or you could use a temporary file from the tempfile module.
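A sketch of the tempfile variant, assuming the hypothetical helper name `save_settings_atomically`: `tempfile.mkstemp` creates the temporary file in the same directory as the target (os.replace cannot move across filesystems), the data is flushed and fsynced, and the file is closed before the swap, which Windows requires.

```python
import json
import os
import tempfile


def save_settings_atomically(settings: dict, path: str = "cam_settings.json") -> None:
    """Write settings to a temp file next to path, then swap it in with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp returns an OS-level descriptor plus the new file's path.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cam_settings.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(settings, f, indent=4)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        # The temp file is closed at this point, so it can be renamed on Windows too.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temp file if anything failed before the replace.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the program dies mid-write, only the temp file is incomplete; cam_settings.json still holds the previous full version.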
When opening a file with the "w" mode, the file is truncated the moment it is opened, so its previous content is erased (you replace whatever was written before).
Not sure if this is what you are looking for, but it could be one of the reasons why "cam_settings.json" becomes empty after the call to open("cam_settings.json", 'w')!
In such a case, to append some text instead, use the "a" mode:
open("cam_settings.json", 'a')
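A small demonstration of the two modes, using a hypothetical function name: opening with "w" and writing nothing (as if the program crashed right after open()) leaves a 0-byte file, while "a" keeps the existing bytes. Note that appending a second JSON document to the same file would leave it unparseable by json.load.

```python
import os


def truncation_demo(path: str) -> tuple:
    """Return (size after a normal write, size after a bare "w" open, text after "a")."""
    with open(path, "w") as f:
        f.write('{"exposure": 10}')
    size_before = os.path.getsize(path)

    # Open with "w" but write nothing: the file is already empty.
    with open(path, "w"):
        pass
    size_after_w = os.path.getsize(path)

    # Mode "a" keeps the existing content and adds to the end.
    with open(path, "w") as f:
        f.write("abc")
    with open(path, "a") as f:
        f.write("def")
    with open(path) as f:
        appended = f.read()
    return size_before, size_after_w, appended
```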

How to run two .py files concurrently and pass an updating variable to another file?

My ultimate goal is to send a real-time value from Python to Matlab (Simulink) to apply a control system.
Using separate processes, I get the value as it updates in real time.
The value type is an integer.
I want to pass this updating value to the Matlab workspace.
So I tried running this command in the Matlab workspace: pyrunfile('A.py')
However, as you can see on the 10th line of "Limitations to Python Support" at this link,
https://fr.mathworks.com/help/matlab/matlab_external/limitations-to-python-support.html
Matlab doesn't support multiprocessing.
In other words, if I try to run the Python file from the Matlab workspace, it doesn't work.
But multiprocessing is a requirement for my work (it does not work with multithreading).
So my idea:
1. Run the file A.py, which uses multiprocessing.
2. While A.py is still running, pass the updating value to another file, B.py, in a loop.
3. Export this value to the Matlab workspace.
4. Matlab workspace -> Simulink
First, I would like to know whether this sounds feasible.
If not, I would like suggestions for another workflow.
Summary:
Python -> Matlab is not possible because of multiprocessing.
Python -> ?? -> Matlab: is there any other method?
I'm not sure if this is the most efficient way, but you could write the variable to a file and read it from the other file.
# Write the value (in the script that produces it)
with open("file.txt", "w") as txt_file:
    txt_file.write(str(var))  # write() needs a string, so convert the integer

# Read the value (in the other script)
def read_value():
    with open("file.txt", "r") as txt_file:
        return txt_file.readlines()
You can pass values like that. I'm not sure how to do the rest, but I hope this helps.
Also, to run both files, just open two command-line windows and run each file separately.
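A slightly more robust sketch of the same file hand-off, with hypothetical helper names `write_value` and `read_value`: the writer fills a temporary file and swaps it in with os.replace, so the reader never sees a half-written number. On Windows, os.replace can raise PermissionError while the reader has the file open, so a real writer may need to retry.

```python
import os


def write_value(value: int, path: str = "file.txt") -> None:
    """Producer side: publish the latest integer."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(value))
    # Swap in the complete file in one step.
    os.replace(tmp_path, path)


def read_value(path: str = "file.txt"):
    """Consumer side: return the latest integer, or None if nothing was written yet."""
    try:
        with open(path) as f:
            return int(f.read())
    except FileNotFoundError:
        return None
```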

How to share a variable between two python scripts run separately

I'm an extreme noob to Python, so if there's a better way to do what I'm asking, please let me know.
I have one file that uses Flask to create markers on a map. It has an array that stores these markers. I start the file from the command prompt, and I run it multiple times. Basically, how would one run a file multiple times and have the instances share a variable? (This is not the same as having a subfile that shares variables with a superfile.) I'm okay with creating another file that starts the instances if needed, but I'm not sure how I'd do that.
Here is an example of what I'd like to accomplish. I have a file called, let's say, test.py:
global number
number += 1
print(number)
I'd like it so that when I start this through command prompt (python test.py) multiple times, it'd print the following:
1
2
3
4
5
The only difference between the above and what I have is that mine is non-terminating and runs continuously.
What you seem to be looking for is some form of inter-process communication. In Python, each process has its own memory space and its own variables, which means that if I ran
number = 0
number += 1
print(number)
multiple times, I would not get 1, 2, ..., 5 on separate lines: every run would print 1. No matter how many times I start the script, number is only global within that one process.
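To see this concretely, the sketch below starts the three-line script in several separate Python processes via subprocess and collects what each prints; the function name is hypothetical. Every process starts with its own fresh `number`.

```python
import subprocess
import sys

# The three-line script from above, passed to a fresh interpreter each time.
SCRIPT = "number = 0\nnumber += 1\nprint(number)"


def run_script_n_times(n: int) -> list:
    """Start n separate Python processes and collect each one's output."""
    outputs = []
    for _ in range(n):
        result = subprocess.run(
            [sys.executable, "-c", SCRIPT], capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout.strip())
    return outputs
```

Calling `run_script_n_times(5)` gives five separate `"1"` strings, not 1 through 5.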
There are a few ways to keep the value consistent across processes.
Writing To A File (named pipe)
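One of your scripts (generator.py) can contain:
import os
import time

INTERVAL = 1  # seconds between updates; pick whatever suits you
num = 1
try:
    os.mkfifo("temp.txt")  # Unix only: os.mkfifo does not exist on Windows
except FileExistsError:
    pass  # In case one of your other scripts already created it
while True:
    # Opening a FIFO for writing blocks until a consumer opens it for reading
    with open("temp.txt", "w") as file:
        file.write(str(num))  # write() needs a string
    # Closing the file is important: it signals EOF, so the consumer's read() returns
    num += 1
    time.sleep(INTERVAL)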
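In your other scripts (consumer.py):
import time

INTERVAL = 1  # seconds between reads
while True:
    with open("temp.txt", "r") as file:  # blocks until the generator opens the FIFO
        number = int(file.read())  # returns once the generator closes its end
    print(number)
    time.sleep(INTERVAL)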
You would start one generator and as many consumers as you want. Note: this does have a race condition that can't really be avoided. When you write to the file, you should use a serializer such as pickle or json to properly encode and decode your array object.
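A minimal sketch of the json route for the marker array, assuming (hypothetically) that each marker can be represented as a plain dict such as `{"lat": 51.5, "lng": -0.12}`. The same calls work on a regular file or on the FIFO above; objects that are not plain dicts, lists, strings or numbers need converting first.

```python
import json


def write_markers(path: str, markers: list) -> None:
    """Generator side: serialize the whole marker list as one JSON document."""
    with open(path, "w") as f:
        json.dump(markers, f)


def read_markers(path: str) -> list:
    """Consumer side: decode the JSON text back into a list of dicts."""
    with open(path) as f:
        return json.load(f)
```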
Other Ways
You can also look up how to use pipes (both named and unnamed), databases, AMQP (IMHO the best way to do it, but there is a learning curve and added dependencies), and, if you are feeling bold, mmap.
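As one example of the database option, here is a sketch using the standard-library sqlite3 module (a choice made for this illustration, not named in the answer); the table name `counter` and the function name are hypothetical. `BEGIN IMMEDIATE` takes SQLite's write lock before reading, so two processes cannot both read the same old value, which reproduces the 1, 2, 3, 4, 5 behaviour from the question.

```python
import sqlite3


def next_number(db_path: str) -> int:
    """Atomically increment and return a counter shared by every process using db_path."""
    # isolation_level=None means autocommit, so BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(db_path, timeout=10, isolation_level=None)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS counter ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), value INTEGER NOT NULL)"
        )
        # IMMEDIATE takes the write lock up front, before the value is read.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("INSERT OR IGNORE INTO counter (id, value) VALUES (1, 0)")
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        (value,) = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
        conn.execute("COMMIT")
        return value
    except BaseException:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```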
Design Change
If you are open to a design change: since you are making a Flask application that already holds the variable in memory, why not just make an endpoint that serves up your array, and check that endpoint every so often?
import json  # or pickle
from flask import Flask

app = Flask(__name__)
array = [objects]  # placeholder: your marker objects
converted = method_to_convert_to_array_of_dicts(array)  # hypothetical helper you would write

@app.route("/array")
def hello():
    return json.dumps(converted)
You will need to convert your objects into something JSON-serializable first, but then the web server can be hosted, and your clients would just need something like:
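import json
import time
import requests

while True:
    # Flask's development server listens on port 5000 by default
    result = requests.get("http://localhost:5000/array")
    array = json.loads(result.text)  # or simply result.json()
    time.sleep(5)  # poll interval in seconds; choose your own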
Your description is kind of confusing, but if I understand you correctly, one way of doing this would be to keep the value of the variable in a separate file.
When a script needs the value, read the value from the file and add one to it. If the file doesn't exist, use a default value of 1. Finally, rewrite the file with the new value.
However, you said that this value would be shared between two Python scripts, so you'd have to be careful that both scripts don't try to access the file at the same time.
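One portable way to keep two scripts from touching the file at the same time is a lock file created with `os.O_CREAT | os.O_EXCL`, which fails if the file already exists; this works on Windows and Unix. The sketch below (hypothetical names `increment_counter` and the `.lock` suffix) follows the answer's steps: read and add one, default to 1 if the file is missing, rewrite. A caveat: if a process dies while holding the lock, the stale `.lock` file must be removed by hand.

```python
import os
import time


def increment_counter(path: str, default: int = 1, timeout: float = 10.0) -> int:
    """Read the number in path, add one (or use default), write it back, return it."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    # Acquire: O_EXCL makes os.open fail if another process holds the lock file.
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(0.05)
    try:
        try:
            with open(path) as f:
                value = int(f.read()) + 1
        except FileNotFoundError:
            value = default
        with open(path, "w") as f:
            f.write(str(value))
        return value
    finally:
        # Release: close and delete the lock file so the other script can proceed.
        os.close(fd)
        os.remove(lock_path)
```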
I think you could use pickle.dump(your_array, file) to serialize the data (your array) into a file. The next time you run the script, you could load the data back with pickle.load(file).
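A minimal sketch of that save/load cycle, assuming a hypothetical file name `markers.pkl` and helper names. Pickle files must be opened in binary mode, and since unpickling can execute code, only load files your own program wrote.

```python
import os
import pickle


def save_array(array: list, path: str = "markers.pkl") -> None:
    """Serialize the array to disk with pickle.dump (binary mode is required)."""
    with open(path, "wb") as f:
        pickle.dump(array, f)


def load_array(path: str = "markers.pkl") -> list:
    """Load the array saved by a previous run, or start empty on the first run."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)
```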

Python, run commands in specific order

I'm writing a script that gets the most recently modified file from a unix directory.
I'm certain it works, but I have to create a unittest to prove it.
The problem is the setUp function. I want to be able to predict the order the files are created in.
self.filenames = ["test1.txt", "test2.txt", "test3.txt", "filename.txt", "test4"]
newest = ''
for fn in self.filenames:
    if pattern.match(fn):
        newest = fn
    with open(fn, "w") as f:
        f.write("some text")
The pattern is "test.*.txt" so it just matches the first three in the list. In multiple tests, newest sometimes returns 'test3.txt' and sometimes 'test1.txt'.
Use os.utime to explicitly set modified time on the files that you have created. That way your test will run faster.
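A sketch of the os.utime approach, outside of unittest for brevity; the helper names and the `start` timestamp are illustrative assumptions, and the pattern is the question's "test.*.txt" with the dot escaped. Each file gets an mtime exactly one second later than the previous one, so the expected newest file no longer depends on filesystem timestamp resolution.

```python
import os
import re

PATTERN = re.compile(r"test.*\.txt")


def create_files_with_distinct_mtimes(directory: str, filenames: list, start: float = 1_000_000_000.0) -> None:
    """Create each file, then stamp it one second later than the previous one."""
    for offset, fn in enumerate(filenames):
        path = os.path.join(directory, fn)
        with open(path, "w") as f:
            f.write("some text")
        mtime = start + offset
        # os.utime takes (atime, mtime); setting them explicitly avoids relying on the clock.
        os.utime(path, (mtime, mtime))


def newest_matching(directory: str) -> str:
    """Return the most recently modified file whose name matches PATTERN."""
    candidates = [fn for fn in os.listdir(directory) if PATTERN.match(fn)]
    return max(candidates, key=lambda fn: os.path.getmtime(os.path.join(directory, fn)))
```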
I doubt that the filesystem you are using supports fractional seconds on file create time.
I suggest you insert a call to time.sleep(1) in your loop so that the filesystem actually has a different timestamp on each created file.
It could be due to syncing. Just because you call write() on files in a certain order, it doesn't mean the data will be updated by the OS in that order.
Try calling f.flush() followed by os.fsync(f.fileno()) on your file object before moving on to the next file. Leaving some time between calls (using sleep()) might also help.
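The flush-then-fsync sequence looks like this; `write_and_sync` is a hypothetical helper name. `flush()` moves Python's own buffer into the OS, and `os.fsync()` asks the OS to push its buffers to the storage device.

```python
import os


def write_and_sync(path: str, text: str) -> None:
    """Write text and ask the OS to commit it to disk before returning."""
    with open(path, "w") as f:
        f.write(text)
        f.flush()             # Python buffer -> OS
        os.fsync(f.fileno())  # OS buffers -> disk, using the file's descriptor
```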

Can I get File Modification Time from a file open for reading (python)

Do files opened like file("foo.txt") have any info about file modification time?
Basically I want to know if the file has been modified or replaced since a certain time, but if the file is replaced between checking modification time and opening the file, then you have inaccurate information.
How can I be sure?
Thanks.
UPDATE
@rubayeet: Thanks for the answer (+1); I actually didn't think of that. But what should I do if the modification time has changed? Perhaps I reload the file. But what if it changes again in the meantime? If the file is being touched regularly, I could end up in an infinite loop! What I really want is a way to get an open file handle and a modification time to go with it, without a potential infinite loop.
PS: The answer you gave was actually good enough for my purposes, as the file won't be changed regularly; it's general interest on my part now.
UPDATE 2
Thinking the previous update through (and experimenting a little), I realize that simply knowing the file modification time at the point the file was opened is not much use: if the file is modified while you read it, some or all of the modified data can end up in what you read. So you'd have to open and read/process the whole file, then check the mtime again (as per @rubayeet's answer) to see whether you may have stale data.
For simple modtimes you would use:
from os.path import getmtime
modtime = getmtime('/file/to/path')
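For the "file handle plus a modification time to go with it" part of the question, `os.fstat` on the open descriptor reports the file that was actually opened rather than whatever the path points to now, so on Unix a later rename or replace of the path does not change it. A minimal sketch with a hypothetical function name; it does not protect against writes to the same file while you read it, as Update 2 points out.

```python
import os


def open_with_mtime(path: str):
    """Open path and return (file object, mtime of the file that was opened)."""
    f = open(path)
    # fstat inspects the open descriptor, not the path name.
    return f, os.fstat(f.fileno()).st_mtime
```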
If you want something like callback functionality, you could check out pyinotify, the Python bindings for inotify (Linux only).
You essentially set up a WatchManager, which notifies you in an event loop if any changes happen in the monitored directory. You register for specific events, such as a file being opened or modified (writing to a file changes its mtime).
If you are interested in exclusive access to a file, I would point you to the fcntl module (Unix only), which provides low-level file-locking mechanisms on file descriptors.
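A minimal sketch using `fcntl.flock`, with a hypothetical function name. The lock is advisory: it only keeps out other processes that also call flock on the same file, and the module is not available on Windows.

```python
import fcntl  # Unix only


def read_exclusively(path: str) -> str:
    """Read a file while holding an exclusive advisory lock on it."""
    with open(path, "r") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            return f.read()
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```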
import os

filepath = '/path/to/file'
modifytime1 = os.path.getmtime(filepath)
fp = open(filepath)
modifytime2 = os.path.getmtime(filepath)
if modifytime1 != modifytime2:
    print("File modified after opening")
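Following Update 2, a sketch that reads the whole file and then compares the mtime before and after, retrying a bounded number of times so a frequently touched file cannot cause an endless loop; the function name and `max_attempts` are hypothetical. A write that lands within the filesystem's timestamp resolution can still go unnoticed.

```python
import os


def read_consistently(path: str, max_attempts: int = 3) -> str:
    """Read the whole file; accept it only if its mtime did not change meanwhile."""
    for _ in range(max_attempts):
        before = os.stat(path).st_mtime_ns
        with open(path) as f:
            data = f.read()
        after = os.stat(path).st_mtime_ns
        if before == after:
            return data
    raise RuntimeError(f"{path} kept changing during {max_attempts} attempts")
```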
