I'm writing a script that gets the most recently modified file from a unix directory.
I'm certain it works, but I have to create a unittest to prove it.
The problem is the setUp function: I want to be able to predict the order in which the files are created.
self.filenames = ["test1.txt", "test2.txt", "test3.txt", "filename.txt", "test4"]
newest = ''
for fn in self.filenames:
    if pattern.match(fn): newest = fn
    with open(fn, "w") as f: f.write("some text")
The pattern is "test.*.txt", so it matches just the first three entries in the list. Across multiple test runs, newest sometimes comes back as 'test3.txt' and sometimes as 'test1.txt'.
Use os.utime to explicitly set the modification time on the files that you have created. That way your test will run faster.
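For example, here is a minimal sketch of that idea as a standalone loop mirroring the setUp code from the question (the one-second offsets are arbitrary):
import os
import time

filenames = ["test1.txt", "test2.txt", "test3.txt", "filename.txt", "test4"]
now = time.time()
for offset, fn in enumerate(filenames):
    with open(fn, "w") as f:
        f.write("some text")
    # Give each file a distinct, strictly increasing modification time,
    # so the "newest" file is unambiguous no matter how fast the loop runs.
    os.utime(fn, (now + offset, now + offset))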
I doubt that the filesystem you are using supports fractional seconds in file timestamps.
I suggest you insert a call to time.sleep(1) in your loop so that the filesystem actually has a different timestamp on each created file.
It could be due to syncing. Just because you call write() on files in a certain order doesn't mean the data will be flushed to disk by the OS in that order.
Try calling f.flush() followed by os.fsync() on each file object before moving on to the next file. Adding some time between calls (using sleep()) might also help.
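For instance, a small sketch of that suggestion (the file names are just placeholders):
import os
import time

for fn in ["test1.txt", "test2.txt", "test3.txt"]:
    with open(fn, "w") as f:
        f.write("some text")
        f.flush()             # push Python's user-space buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit the data to disk
    time.sleep(1)             # optional: guarantees distinct whole-second mtimes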
Here is my code for accessing and editing the file:
def edit_default_settings(self, setting_type, value):
    with open("cam_settings.json", "r") as f:
        cam_settings = json.load(f)
    cam_settings[setting_type] = value
    with open("cam_settings.json", 'w') as f:
        json.dump(cam_settings, f, indent=4)
I use it in a program that runs for several hours a day, and roughly once a week I notice that the cam_settings.json file has become empty (literally empty; the file explorer shows 0 bytes), but I can't imagine how that is possible.
I would be glad to hear some comments on what could go wrong.
I can't see any issues with the code itself, but there can be an issue with the execution environment. Are you running the code in a multi-threaded environment, or running multiple instances of the same program at once?
This situation can arise if this code is executed in parallel and multiple threads/processes try to access the file at the same time. Try logging each time the function is executed and whether it completed successfully. Use exception handlers and error logging, for example as in the sketch below.
If this is the problem, using buffers or a singleton pattern can solve the issue.
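A rough sketch of that logging suggestion, reusing the function from the question (the log file name is just an illustration):
import json
import logging

logging.basicConfig(filename="cam_settings.log", level=logging.INFO)

def edit_default_settings(setting_type, value):
    logging.info("edit_default_settings(%r, %r) called", setting_type, value)
    try:
        with open("cam_settings.json", "r") as f:
            cam_settings = json.load(f)
        cam_settings[setting_type] = value
        with open("cam_settings.json", "w") as f:
            json.dump(cam_settings, f, indent=4)
        logging.info("cam_settings.json updated successfully")
    except Exception:
        logging.exception("Failed to update cam_settings.json")
        raise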
As @Chels said, the file is truncated the moment it's opened with 'w'. That doesn't explain why it stays that way; I can only imagine that happening if your code crashed between the truncating open and the json.dump() call. Maybe you need to check logs for code crashes (or change how your code is run so that crash reasons get logged, if they aren't).
But there's a way to make this process safer in case of crashes: write to a separate file and then replace the old file with the new file only after the new file is fully written. You can use os.replace() for this. You could do this simply with a differently-named file:
with open(".cam_settings.json.tmp", 'w') as f:
json.dump(cam_settings, f, indent=4)
os.replace(".cam_settings.json.tmp", "cam_settings.json")
Or you could use a temporary file from the tempfile module.
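A sketch of the tempfile variant (here cam_settings stands in for the dict loaded in the question's function):
import json
import os
import tempfile

cam_settings = {"example_setting": 1}  # stand-in for the dict loaded from the file

# Write to a temporary file in the same directory, then atomically swap it in.
fd, tmp_path = tempfile.mkstemp(dir=".", suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(cam_settings, f, indent=4)
os.replace(tmp_path, "cam_settings.json")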
When opening a file with the "w" mode, the existing content of the file is erased immediately (everything that was already written there is discarded).
Not sure if this is what you are looking for, but it could be one of the reasons why "cam_settings.json" becomes empty after the call to open("cam_settings.json", 'w')!
In such a case, to append some text instead, use the "a" mode, as in:
open("cam_settings.json", 'a')
I use multiple Python scripts that collect data and write it into one single JSON data file.
It is not possible to combine the scripts.
The writing process is fast and errors often occur (e.g. some characters at the end get duplicated), which is fatal, especially since I am using the JSON format.
Is there a way to prevent a Python script from writing to a file while other scripts are currently trying to write to it? (It would be absolutely OK if the data that the Python script tries to write to the file gets lost, but it is important that the file's syntax does not somehow get corrupted.)
Code snippet:
This opens the file and retrieves the data:
data = json.loads(open("data.json").read())
This appends a new dictionary:
data.append(new_dict)
And the old file is overwritten:
open("data.json","w").write( json.dumps(data) )
Info: data is a list which contains dicts.
Operating system: the whole process takes place on a Linux server.
On Windows, you could try to create the file and bail out if an exception occurs (because the file is locked by another script). But on Linux, your approach is bound to fail.
Instead, I would:
- write one file per new dictionary, suffixing the filename with the process ID and a counter
- have the consuming process(es) read not a single file, but all the files sorted by modification time, and build the data from them
So in each script:
filename = "data_{}_{}.json".format(os.getpid(),counter)
counter+=1
open(filename ,"w").write( json.dumps(new_dict) )
and in the consumers (reading each dict of sorted files in a protected loop):
import glob
import json
import os

files = sorted(glob.glob("*.json"), key=os.path.getmtime)
data = []
for f in files:
    try:
        with open(f) as fh:
            data.append(json.load(fh))
    except Exception:
        # IO error, malformed json file: ignore
        pass
I will post my own solution, since it works for me:
Every single Python script checks (before opening and writing the data file) whether a file called data_check exists. If it does, the script does not try to read and write the data file and dismisses the data that was supposed to be written. If it does not, the script creates the file data_check and then starts to read and write the data file. After the writing process is done, the file data_check is removed. A rough sketch of this follows below.
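Here is a sketch of that approach (the function name is made up; data_check and data.json come from the posts above; note that the exists-then-create check still leaves a small race window):
import json
import os

LOCK_FILE = "data_check"

def try_write(new_dict, path="data.json"):
    # If another script is currently writing, just drop this record;
    # losing the data is acceptable, a corrupted file is not.
    if os.path.exists(LOCK_FILE):
        return False
    open(LOCK_FILE, "w").close()   # create the marker file
    try:
        data = json.loads(open(path).read())
        data.append(new_dict)
        open(path, "w").write(json.dumps(data))
    finally:
        os.remove(LOCK_FILE)       # always remove the marker, even on error
    return True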
I'm an extreme noob to Python, so if there's a better way to do what I'm asking, please let me know.
I have one file, which works with Flask to create markers on a map. It has an array which stores these markers. I'm starting the file through the command prompt and running it multiple times. Basically, how would one run a file multiple times and have the instances share a variable? (Not the same as having a subfile that shares variables with a superfile.) I'm okay with creating another file that starts the instances if needed, but I'm not sure how I'd do that.
Here is an example of what I'd like to accomplish. I have a file called, let's say, test.py:
global number
number += 1
print(number)
I'd like it so that when I start this through command prompt (python test.py) multiple times, it'd print the following:
1
2
3
4
5
The only difference between the above and what I actually have is that mine is non-terminating and runs continuously.
What you seem to be looking for is some form of inter-process communication. In Python, each process has its own memory space and its own variables, meaning that if I ran
number += 1
print(number)
multiple times within a single process, I would get 1, 2, ..., 5 on separate lines; but no matter how many times I start the script, each new process gets its own copy of number, so the value is never shared between runs.
There are a few ways where you can keep consistency.
Writing To A File (named pipe)
One of your scripts can have (generator.py)
import os
from time import sleep

num = 1
try:
    os.mkfifo("temp.txt")
except OSError:
    pass  # In case one of your other scripts already created it
while True:
    file = open("temp.txt", "w")
    file.write(str(num))  # write() expects a string
    file.close()  # Important because if you don't close the file,
                  # the operating system will lock your file and your other
                  # scripts won't have access
    sleep(1)  # seconds between writes; the interval is arbitrary
In your other scripts (consumer.py)
from time import sleep

while True:
    file = open("temp.txt", "r")
    number = int(file.read())
    file.close()
    print(number)
    sleep(1)  # seconds between reads; the interval is arbitrary
You would start one generator and as many consumers as you want. Note: this does have a race condition that can't really be avoided. When you write to the file, you should use a serializer like pickle or json to properly encode and decode your array object.
Other Ways
You can also look up how to use pipes (both named and unnamed), databases, AMQP (IMHO the best way to do it, but there is a learning curve and added dependencies), and, if you are feeling bold, mmap.
Design Change
If you are willing to listen to a design change: since you are making a Flask application that holds the variable in memory, why don't you just make an endpoint to serve up your array and check that endpoint every so often?
import json  # or pickle
from flask import Flask

app = Flask(__name__)
array = [objects]  # your in-memory list of markers
converted = method_to_convert_to_array_of_dicts(array)

@app.route("/array")
def hello():
    return json.dumps(converted)
You will need to convert, but then the web server can be hosted and your clients would just need something like:
import json
import requests
from time import sleep

while True:
    result = requests.get("http://localhost:5000/array")  # Flask's default port
    array = json.loads(result.text)  # or use result.json() directly
    sleep(1)  # poll interval; pick whatever suits
Your description is kind of confusing, but if I understand you correctly, one way of doing this would be to keep the value of the variable in a separate file.
When a script needs the value, read the value from the file and add one to it. If the file doesn't exist, use a default value of 1. Finally, rewrite the file with the new value.
However you said that this value would be shared among two python scripts, so you'd have to be careful that both scripts don't try to access the file at the same time.
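A minimal sketch of that read-increment-rewrite idea (the counter file name is arbitrary):
import os

COUNTER_FILE = "number.txt"

def next_number():
    # Read the previous value and add one; if the file doesn't exist yet,
    # start from the default value of 1, then write the result back.
    if os.path.exists(COUNTER_FILE):
        with open(COUNTER_FILE) as f:
            number = int(f.read()) + 1
    else:
        number = 1
    with open(COUNTER_FILE, "w") as f:
        f.write(str(number))
    return number

print(next_number())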
I think you could use pickle.dump(your_array, file) to serialize the data (your array) into a file. And the next time you run the script, you could just load the data back with pickle.load(file).
How can I securely remove a file using python? The function os.remove(path) only removes the directory entry, but I want to securely remove the file, similar to the apple feature called "Secure Empty Trash" that randomly overwrites the file.
What function securely removes a file using this method?
You can use srm to securely remove files. You can use Python's os.system() function to call srm.
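For example (this assumes the srm utility is installed on the system):
import os
import shlex

def secure_remove(path):
    # shlex.quote protects against spaces and shell metacharacters in the path
    return os.system("srm " + shlex.quote(path))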
You can very easily write a function in Python to overwrite a file with random data, even repeatedly, then delete it. Something like this:
import os

def secure_delete(path, passes=1):
    with open(path, "ba+") as delfile:
        length = delfile.tell()
    with open(path, "br+") as delfile:
        for i in range(passes):
            delfile.seek(0)
            delfile.write(os.urandom(length))
    os.remove(path)
Shelling out to srm is likely to be faster, however.
You can use srm, sure, but you can also easily implement it in Python. Refer to Wikipedia for the patterns to overwrite the file content with. Note that depending on the actual storage technology, the appropriate data patterns may be quite different. Furthermore, if your file is located on a log-structured file system, or even on a file system with copy-on-write optimisation like btrfs, your goal may be unachievable from user space.
After you are done mashing up the disk area that was used to store the file, remove the directory entry with os.remove().
If you also want to erase any trace of the file name, you can try to allocate and reallocate a whole bunch of randomly named files in the same directory, though depending on the directory's inode structure (linear, btree, hash, etc.) it may be very tough to guarantee you actually overwrote the old file name.
So, at least in Python 3, using @kindall's solution I only got it to append, meaning the entire contents of the file were still intact and every pass just added to the overall size of the file. It ended up being [Original Contents][Random Data of that Size][Random Data of that Size][Random Data of that Size], which is obviously not the desired effect.
This trickery worked for me though: I open the file in append mode to find the length, then reopen it in r+ so that I can seek to the beginning (in append mode, what seems to have caused the undesired effect is that it was not actually possible to seek to 0).
So check this out:
import os

def secure_delete(path, passes=3):
    with open(path, "ba+", buffering=0) as delfile:
        length = delfile.tell()
    delfile.close()
    with open(path, "br+", buffering=0) as delfile:
        # print("Length of file: %s" % length)
        for i in range(passes):
            delfile.seek(0, 0)
            delfile.write(os.urandom(length))
            # wait = input("Pass %s Complete" % i)
        # wait = input("All %s Passes Complete" % passes)
        delfile.seek(0)
        for x in range(length):
            delfile.write(b'\x00')
        # wait = input("Final Zero Pass Complete")
    os.remove(path)  # Note that the TRUE shred actually renames the file to all zeros,
                     # with the length of the filename considered, to thwart metadata
                     # filename collection; here I didn't really care to implement that
Un-comment the prompts to check the file after each pass. This looked good when I tested it, with the caveat that the filename is not shredded like the real shred -zu does.
The answers implementing a manual solution did not work for me. My solution is as follows; it seems to work okay.
import os

def secure_delete(path, passes=1):
    length = os.path.getsize(path)
    with open(path, "br+", buffering=-1) as f:
        for i in range(passes):
            f.seek(0)
            f.write(os.urandom(length))
    f.close()
Do files opened like file("foo.txt") have any info about file modification time?
Basically I want to know if the file has been modified or replaced since a certain time, but if the file is replaced between checking modification time and opening the file, then you have inaccurate information.
How can I be sure?
Thanks.
UPDATE
@rubayeet: Thanks for the answer (+1), I actually didn't think of that. But... what do I do if the modification time has changed? Perhaps I reload the file again, but what if it changes that time too? If the file is being touched regularly, I could end up in a loop forever! What I really want is a way to just get an open file handle and a modification time to go with it, without a potential infinite loop.
PS: The answer you gave was actually plenty good enough for my purposes, as the file won't be changed regularly; this is just general interest on my part now.
UPDATE 2
Thinking the previous update through (and experimenting a little), I realize that simply knowing the file modification time at the point the file was opened is not much use: if the file is modified while you are reading it, you can end up with some or all of the modified data in what you read. So you'd have to open and read/process the whole file, then check the mtime again (as per @rubayeet's answer) to see whether you may have stale data.
For simple modtimes you would use:
from os.path import getmtime
modtime = getmtime('/file/to/path')
If you want something like callback functionality, you could check the inotify bindings for Python: pyinotify.
You essentially set up a WatchManager, which notifies you in an event loop if any changes happen in the monitored directory. You register for specific events, such as a file being opened or written to (writing updates the modtime).
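A minimal sketch of that pyinotify pattern (the watched path and the event mask are just examples):
import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Called whenever a file in the watched directory is written and closed
        print("Modified:", event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch("/path/to/dir", pyinotify.IN_CLOSE_WRITE)
notifier = pyinotify.Notifier(wm, Handler())
notifier.loop()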
If you are interested in exclusive access to a file, I would point you to the fcntl module, which has some low-level file-locking mechanisms on file descriptors.
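A short sketch of that kind of advisory locking with fcntl (Unix only; the file name is illustrative and must already exist):
import fcntl

with open("data.txt", "r+") as f:
    fcntl.flock(f, fcntl.LOCK_EX)    # block until we hold an exclusive lock
    try:
        contents = f.read()          # read/modify/write while holding the lock
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)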
import os
filepath = '/path/to/file'
modifytime1 = os.path.getmtime(filepath)
fp = open(filepath)
modifytime2 = os.path.getmtime(filepath)
if modifytime1 != modifytime2:
    print("File modified after opening")