My question is somewhat unique. I am currently working on a project for my computer forensics class. This project is aimed at hiding disk data from investigators. The method by which this is supposed to be achieved is by writing the bytes of a "clean" file over the "bad" file. Once overwritten, the "bad" file is deleted.
This concept sounds simple enough, but what my partner and I have observed is interesting. If we open a file in a python script, we can easily overwrite the memory associated with that file on disk (verified using dd). We can also easily delete a file using from inside the script. However, a write then delete results in no write actually taking place, only the file's removal.
This makes sense from an OS optimization standpoint. From that point, we thought it might work if we split the writing and deleting into two separate scripts, and controlled both by a third. However, it seems that even if we run the scripts as a subprocess of another script, the same thing happens. We've tried to use bash scripts for the deletion process instead of pure python, and still, nothing sticks.
This project was supposed to be a whole mess of little anti-forensics tools like this, but this particular one has captured our whole attention because of this issue. Does anyone have an idea as to why this is happening and what we can do to move forward?
We know this can be achieved in C, etc, but we want to solve this using python because of the interesting constraints it's presented.
---EDIT---
This is a snippet from our controller, it calls "ghost.py" with the associated params.
ghost.py prints the edited file names/paths to stdout.
Relevant code follows:
proc = subprocess.Popen(['python', 'ghost.py', '-c', 'good.txt', '-d','/mnt/evil.txt'], stdout=subprocess.PIPE,)
files = proc.communicate()
for i in files:
if i != None and i != "\n":
os.system("./del.sh " + i)
Using a subprocess doesn't change any interesting aspect of your design, so don't use them. You probably need os.fsync(). Try this pattern:
myfile.write('all of my good data')
myfile.flush()
os.fsync(myfile.fileno())
myfile.close()
os.remove(myfile)
Reference: https://docs.python.org/2/library/os.html#os.fsync
Related
I have two programs written in python and converted to one-file exe using auto-py-to-exe.
the first program writes to a file, which is read by the second program. The problem is when the second program wants to read the file the same time as it is being written, the code stops with a permission error.
The solutions that seemed to work are:
Using time management which is not useful in my case, since the reading and writing times are not constant.
I could check if the file is accessible, which might be a solution, however, I suppose it would raise an error if while reading the file, the writer tries to change the file.
I could use the size of the file to check if writing to the file has been finished, and then execute the reader, however, this does not seem to be both logical and pythonic!
I found some solutions using os.pipe(), but to be honest, I couldn't understand what the process does. If this is a solution, I would be glad to have it explained in simple English.
That's it. Any suggestions?
P.S: OS is windows and I am using Python 3.9
Solved:
Thanks to the replies and suggestions, I didn't know that the try except commands accept ErrorType. Thus, I solved the problem by using 'except' and 'PermissionError'. the code runs in a loop and it is checked again in a few seconds.
However, the drawback is this: the reading time should be less than the time the writer comes back to rewrite the file! In my case, as suggested by friends, I combined the two programs so they are run sequentially.
I have a collection of expert advisor (EA) scripts written in the MQL5 programming language for the stock/forex trading platform, MetaTrader5. The extension of these files is mq5. I am looking for a way to programatically run these MQL5 files from my Python script on a regular basis. The EAs do some price transformations, eventually saving a set of csv files that will later be read by my Python script to apply Machine Learning models on them.
My first natural choice was the Python API for MetaTrader5. However, according to its documentation, it "is designed for convenient and fast obtaining of exchange data via interprocessor communication directly from the MetaTrader 5 terminal" and as such, it doesn't provide the functionality I need to be able to run MQL scripts using Python.
I have found some posts here on SO (such as #1, #2) about executing non-python files using Python but those posts seemed to always come with the precondition that they already had Python code written in them, only the extension differed - this is different from my goal.
I then came across Python's subprocess module and started experimenting with that.
print(os.path.isfile(os.path.join("path/to/dir","RSIcalc.mq5")))
with open(os.path.join("path/to/dir","RSIcalc.mq5")) as f:
subprocess.run([r"C:\Program Files\MetaTrader 5\terminal64.exe", f], capture_output=True)
The print statement returns True, so the mq5 file exists in the specified location. Then the code opens the MetaTrader5 terminal but nothing else happens, the EA doesn't get executed, process finishes immediately after that.
Am I even on the right track for what I'm trying to achieve here? If yes, what might be the solution for me to run these MQL5 scripts programatically from Python?
Edit:
I use Windows 10 64-bit.
subprocess is indeed the right module for what you want to achieve. But let's look at what you're doing here:
with open(os.path.join("path/to/dir","RSIcalc.mq5")) as f
You're creating a file descriptor handle called f, which is used to write or read contents from a file. If you do print(f) you'll see that it's a python object, that converted to string looks like <_io.TextIOWrapper name='RSIcalc.mq5' mode='r' encoding='UTF-8'>. It is extremely unlikely that such a string is what you want to pass as a command-line parameter to your terminal executable, which is what happens when you include it in your call to subprocess.run().
What you likely want to do is this:
full_path = os.path.abspath(os.path.join("path/to/dir","RSIcalc.mq5"))
result = subprocess.run([r"C:\Program Files\MetaTrader 5\terminal64.exe", full_path], capture_output=True)
Now, this assumes your terminal64 can execute arbitrary scripts passed as parameters. This may or may not be true - you might need extra parameters like "-f" before passing the file path, or you might have to feed script contents through the stdin pipe (unlikely, on Windows, but who knows). That's for you to figure out, but my code above should probably be your starting point.
I don’t think you need to be passing a file object to your sub process statement. In my experience. A program will run a file when the path to the file is provided as a command line argument. Try this:
subprocess.run([r"C:\\Program Files\\MetaTrader 5\\terminal64.exe", os.path.join(“path/to/dir”, “RSIcalc.mq5”], capture_output=True)
This is the same as typing C:\Program Files\MetaTrader 5\terminal64.exe path\to\dir\RSIcalc.mq5 in your terminal.
I have a project in mind, but there is a section that I don't know how to do. I'm using Python version 3.6 and windows 10. For example we have a file name of "example.txt" I want to prevent the name and its content of this file from being changed.
I did research on this topic, but I could not reach any research. Can we prevent the file's name (including its extension) from changing or its contents?To realize this, I think it is necessary to start as an administrator.
Thanks.
It is possible to stop another program from editing a file by locking it in python.
There is a module that does this called filelock. Take a look at the source code to see how it is done.
It is also worth noting that more advanced ransomware will try to stop processes so they can encrypt files, so this might not work in all cases.
Suppose I have the file test.py. I import it using the interactive interpreter, I add some strings to data in my file. How would I go about saving the file afterwards? Closing the interpreter keeps test.py in its original state. I've looked at a couple questions that told me to use pickle. However, one of the issues that was pointed out in the documentation was that pickle is not readable at all. I've tried using the open function, but I'm not sure how I would go about using it effectively.
Bottom line, how would I take data from a python script, add or remove parts of it, and then save it for use afterwards?
test.py:
# test.py
data = []
Python Shell:
>>> import test
>>> test.data.append("Hello There!")
>>> test.data
['Hello There!']
There is no way to work with Python "data" files while saving them and preserving them as scripts.
While pickle lacks human readability, it's now become my choice in handling data and saving them. This is for the current program that I am writing, as well as future programs to come.
While pickle lacks human readability, at least it is machine readable, meaning that I won't have to go through loads of useless work. It's a compromise for the better.
Thanks to everyone that helped me come to like pickle.
I am writing a script that will be polling a directory looking for new files.
In this scenario, is it necessary to do some sort of error checking to make sure the files are completely written prior to accessing them?
I don't want to work with a file before it has been written completely to disk, but because the info I want from the file is near the beginning, it seems like it could be possible to pull the data I need without realizing the file isn't done being written.
Is that something I should worry about, or will the file be locked because the OS is writing to the hard drive?
This is on a Linux system.
Typically on Linux, unless you're using locking of some kind, two processes can quite happily have the same file open at once, even for writing. There are three ways of avoiding problems with this:
Locking
By having the writer apply a lock to the file, it is possible to prevent the reader from reading the file partially. However, most locks are advisory so it is still entirely possible to see partial results anyway. (Mandatory locks exist, but a strongly not recommended on the grounds that they're far too fragile.) It's relatively difficult to write correct locking code, and it is normal to delegate such tasks to a specialist library (i.e., to a database engine!) In particular, you don't want to use locking on networked filesystems; it's a source of colossal trouble when it works and can often go thoroughly wrong.
Convention
A file can instead be created in the same directory with another name that you don't automatically look for on the reading side (e.g., .foobar.txt.tmp) and then renamed atomically to the right name (e.g., foobar.txt) once the writing is done. This can work quite well, so long as you take care to deal with the possibility of previous runs failing to correctly write the file. If there should only ever be one writer at a time, this is fairly simple to implement.
Not Worrying About It
The most common type of file that is frequently written is a log file. These can be easily written in such a way that information is strictly only ever appended to the file, so any reader can safely look at the beginning of the file without having to worry about anything changing under its feet. This works very well in practice.
There's nothing special about Python in any of this. All programs running on Linux have the same issues.
On Unix, unless the writing application goes out of its way, the file won't be locked and you'll be able to read from it.
The reader will, of course, have to be prepared to deal with an incomplete file (bearing in mind that there may be I/O buffering happening on the writer's side).
If that's a non-starter, you'll have to think of some scheme to synchronize the writer and the reader, for example:
explicitly lock the file;
write the data to a temporary location and only move it into its final place when the file is complete (the move operation can be done atomically, provided both the source and the destination reside on the same file system).
If you have some control over the writing program, have it write the file somewhere else (like the /tmp directory) and then when it's done move it to the directory being watched.
If you don't have control of the program doing the writing (and by 'control' I mean 'edit the source code'), you probably won't be able to make it do file locking either, so that's probably out. In which case you'll likely need to know something about the file format to know when the writer is done. For instance, if the writer always writes "DONE" as the last four characters in the file, you could open the file, seek to the end, and read the last four characters.
Yes it will.
I prefer the "file naming convention" and renaming solution described by Donal.