Python Disk Imaging - python

I'm trying to write a script for disk imaging (producing a .dd-style image) in Python. It originally started as a project to build another hex debugger, but I got more interested in reading raw data from the drive, which turned into wanting to image the drive first. Anyway, I've been looking around for about a week or so, and the best way to get data from smaller drives appears to be something like:
with file("/dev/sda") as f:
    i = file("~/imagingtest.dd", "wb")
    i.write(f.read(SIZE))
with SIZE being the disk size in bytes. The problem is, and this seems to be a well-known issue, that with large disks (in my case a total size of 250059350016 bytes) this fails with:
"OverflowError: Python int too large to convert to C long"
Is there a more appropriate way to get around this issue? It works fine for a small flash drive, but trying to image a full drive fails.
I've seen mention of simply iterating by sector size (512 bytes) for the number of sectors (in my case 488397168), but I'd like to verify exactly how to do this in a way that actually works.
Thanks in advance for any assistance, sorry for any ignorance you easily notice.

Yes, that's how you should do it. Though you could go higher than the sector size if you wanted.
with open("/dev/sda",'rb') as f:
with open("~/imagingtest.dd", "wb") as i:
while True:
if i.write(f.read(512)) == 0:
break

Read the data in blocks. When you reach the end of the device, .read(blocksize) will return an empty bytes object.
You can use iter() with a sentinel to do this easily in a loop:
from functools import partial

blocksize = 12345
with open("/dev/sda", 'rb') as f:
    for block in iter(partial(f.read, blocksize), b''):
        ...  # do something with the data block
You really want to open the device in binary mode ('rb') to make sure no newline translations take place.
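For completeness, a minimal sketch (not from the original answer) that combines this pattern with the write side; the device path is from the question, while the output path and block size are just example values:

from functools import partial

blocksize = 1024 * 1024  # 1 MiB; any multiple of the 512-byte sector size works

with open("/dev/sda", "rb") as src, open("imagingtest.dd", "wb") as dest:
    for block in iter(partial(src.read, blocksize), b''):
        dest.write(block)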
However, if you are trying to create a copy in another file, you want to look at shutil.copyfile():
import shutil
shutil.copyfile('/dev/sda', 'destinationfile')
and it'll take care of the opening, reading and writing for you. If you want more control over the block size used for that, use shutil.copyfileobj(): open the file objects yourself and specify a blocksize:
import shutil
blocksize = 12345
with open("/dev/sda", 'rb') as f, open('destinationfile', 'wb') as dest:
    shutil.copyfileobj(f, dest, blocksize)

Related

How to temporarily save data in Python?

I read position data from a GPS sensor into a dictionary, which I send at a cyclic interval to a server.
If I have no coverage, the data is saved in a list.
If the connection can be reestablished, all list items are transmitted.
But if a power interruption occurs, all temporary data is lost.
What would be the most Pythonic way to save this data?
I am using an SD card as storage, so I am not sure whether writing every element to a file would be the best solution.
Current implementation:
stageddata = []

position = {'lat':'1.2345', 'lon':'2.3455', 'timestamp':'2020-10-18T15:08:04'}

if not transmission(position):
    stageddata.append(position)
else:
    while stageddata:
        position = stageddata.pop()
        if not transmission(position):
            stageddata.append(position)
            return
EDIT: Finding the "best" solution may be very subjective. I agree with zvone that a power outage can be prevented; perhaps a shutdown routine should save the temporary data.
So the question may rather be: what is a Pythonic way to save a given list to a file?
A good solution for temporary storage in Python is tempfile.
You can use it, e.g., like the following:
import tempfile
with tempfile.TemporaryFile() as fp:
    # Store your variable
    fp.write(your_variable_to_temp_store)
    # Do some other stuff
    # Read the file back
    fp.seek(0)
    fp.read()
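For the GPS data from the question, a hypothetical usage might look like the following (note that TemporaryFile is opened in binary mode by default, so the list is pickled to bytes first; the data values are just examples):

import pickle
import tempfile

stageddata = [{'lat': '1.2345', 'lon': '2.3455'}]

with tempfile.TemporaryFile() as fp:
    fp.write(pickle.dumps(stageddata))   # serialize the list to bytes and store it
    fp.seek(0)
    restored = pickle.loads(fp.read())   # read it back later in the same process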
I agree with zvone's comment: in order to know the best solution, we would need more information. The following would be a robust and configurable solution.
import os
import pickle

backup_interval = 2
backup_file = 'gps_position_backup.bin'

def read_backup_data():
    file_backup_data = []
    if os.path.exists(backup_file):
        with open(backup_file, 'rb') as f:
            while True:
                try:
                    coordinates = pickle.load(f)
                except EOFError:
                    break
                file_backup_data += coordinates
    return file_backup_data

# When the script is started and backup data exists, stageddata uses it
stageddata = read_backup_data()

def write_backup_data():
    tmp_backup_file = 'tmp_' + backup_file
    with open(tmp_backup_file, 'wb') as f:
        pickle.dump(stageddata, f)
    os.replace(tmp_backup_file, backup_file)
    print('Wrote data backup!')

# Mockup variable and method
transmission_return = False

def transmission(position):
    return transmission_return

def try_transmission(position):
    if not transmission(position):
        stageddata.append(position)
        if len(stageddata) % backup_interval == 0:
            write_backup_data()
    else:
        while stageddata:
            position = stageddata.pop()
            if not transmission(position):
                stageddata.append(position)
                return
            else:
                if len(stageddata) % backup_interval == 0:
                    write_backup_data()

if __name__ == '__main__':
    # transmission_return is False, so write to backup_file
    for counter in range(10):
        position = {'lat':'1.2345', 'lon':'2.3455'}
        try_transmission(position)

    # transmission_return is True, transmit positions and "update" backup_file
    transmission_return = True
    position = {'lat':'1.2345', 'lon':'2.3455'}
    try_transmission(position)
I moved your code into some functions. With the variable backup_interval, it is possible to control how often a backup is written to disk.
Additional Notes:
I use the built-in pickle module, since the data does not have to be human readable or consumable by other programming languages. Alternatives are JSON, which is human readable, or msgpack, which might be faster but needs an extra package to be installed. A tempfile is not suitable here, as it cannot easily be retrieved if the program crashes.
stageddata is written to disk when it hits the backup_interval (obviously), but also when transmission returns True within the while loop. This is needed to "synchronize" the data on disk.
The data is written to disk completely anew every time. A more sophisticated approach would be to append only the newly added positions, but then the synchronizing part that I described before would be more complicated too. Additionally, the safer temporary-file approach (see Edit below) would not work.
Edit: I just reconsidered your use case. The main problem here is restoring the data even if the program gets interrupted at any time (due to a power interruption or whatever). My first solution just wrote the data to disk (which solves part of the problem), but it could still happen that the program crashes at the moment of writing to disk. In that case the file would probably be corrupt and the data lost. I adapted the function write_backup_data() so that it writes to a temporary file first and then replaces the old file. So now, even if a lot of data has to be written to disk and the crash happens there, the previous backup file is still available.
Maybe saving it in a binary format could help minimize the storage. The pickle and shelve modules will help with storing and serializing objects (to serialize an object means to convert its state to a byte stream so that the byte stream can be reverted back into a copy of the object). You should be careful that, when you recover from the power interruption, you do not overwrite the data you have already stored; by opening the file with open(file, "a") (a == append), you can avoid that.
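As an illustration of that append idea, here is a minimal sketch (not from the answer above; the file name and helper names are made up) that appends each position as its own pickle record, so a crash loses at most the record being written:

import os
import pickle

backup_file = 'gps_position_backup.bin'

def append_position(position):
    # 'ab' appends a new pickle record without touching earlier ones
    with open(backup_file, 'ab') as f:
        pickle.dump(position, f)

def load_positions():
    # Read back every record written so far, stopping at end of file
    positions = []
    if os.path.exists(backup_file):
        with open(backup_file, 'rb') as f:
            while True:
                try:
                    positions.append(pickle.load(f))
                except EOFError:
                    break
    return positions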

Python script that writes result to txt file - why the lag?

I'm using Windows 7 and I have a super-simple script that goes over a directory of images, checking a specified condition for each image (in my case, whether there's a face in the image, using dlib), while writing the paths of images that fulfilled the condition to a text file:
def process_dir(dir_path):
    i = 0
    with open(txt_output, 'a') as f:
        for filename in os.listdir(dir_path):
            # loading image to check whether dlib detects a face:
            image_path = os.path.join(dir_path, filename)
            opencv_img = cv2.imread(image_path)
            dets = detector(opencv_img, 1)
            if len(dets) > 0:
                f.write(image_path)
                f.write('\n')
                i = i + 1
                print i
Now the following thing happens: there seems to be a significant lag in appending lines to the file. For example, I can see the script has "finished" checking several images (i.e., the console prints ~20, meaning 20 files that fulfill the condition have been found), but the .txt file is still empty. At first I thought there was a problem with my script, but after waiting a while I saw that the lines were in fact added to the file, only it seems to be updated in "batches".
This may not seem like the most crucial issue (and it's definitely not), but still I'm wondering - what explains this behavior? As far as I understand, every time the f.write(image_path) line is executed the file is changed - then why do I see the update with a lag?
Data written to a file object won't necessarily show up on disk immediately.
In the interests of efficiency, most operating systems will buffer the writes, meaning that data is only written out to disk when a certain amount has accumulated (usually 4K).
If you want to write your data right now, use the flush() function, as others have said.
Did you try using a buffer size of 0: open(txt_output, 'a', 0)?
I'm not sure about Windows (please, someone correct me here if I'm wrong), but I believe this is because of how the write buffer is handled. Although you are requesting a write, the buffer is only flushed every so often (when it is full) and when the file is closed. You can open the file with a smaller buffer:
with open(txt_output, 'a', 0) as f:
or manually flush it at the end of the loop:
if len(dets) > 0:
    f.write(image_path)
    f.write('\n')
    f.flush()
    i = i + 1
    print i
I would personally recommend flushing manually when you need to.
It sounds like you're running into file stream buffering.
In short, writing to a file is a very slow process (relative to other sorts of things that the processor does). Modifying the hard disk is about the slowest thing you can do, other than maybe printing to the screen.
Because of this, most file I/O libraries will "buffer" your output, meaning that as you write to the file the library will save your data in an in-memory buffer instead of modifying the hard disk right away. Only when the buffer fills up will it "flush" the buffer (write the data to disk), after which point it starts filling the buffer again. This often reduces the number of actual write operations by quite a lot.
To answer your question, the first question to answer is: do you really need to append to the file immediately every time you find a face? It will probably slow down your processing by a noticeable amount, especially if you're processing a large number of files.
If you really do need to update immediately, you basically have two options:
Manually flush the write buffer each time you write to the file. In Python, this usually means calling f.flush(), as @JamieCounsell pointed out.
Tell Python to just not use a buffer, or more accurately to use a buffer of size 0. As @VikasMadhusudana pointed out, you can tell Python how big of a buffer to use with a third argument to open(): open(txt_output, 'a', 0) for a 0-byte buffer. Both options are sketched below.
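To illustrate the two options (plus line buffering), here is a minimal sketch; it assumes Python 3 syntax and an example file name, and is not from the original answers:

# Option 1: keep the default buffer, but flush explicitly after each write.
with open("results.txt", "a") as f:
    f.write("found a face\n")
    f.flush()              # force the buffered data out now

# Option 2: shrink the buffer. buffering=0 is only allowed in binary mode,
# so the closest text-mode equivalent is line buffering (buffering=1).
with open("results.txt", "a", buffering=1) as f:
    f.write("found a face\n")   # flushed at the newline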
Again, you probably don't need this; the only case I can think that might require this sort of thing is if you have some other external operation that's watching the file and triggers off of new data being added to it.
Hope that helps!
It's flush related, try:
print(image_path, file=f, flush=True)  # Python 3
or
print >> f, image_path  # Python 2
instead of:
f.write(image_path)
f.write('\n')
In Python 3, flush=True makes print flush for you (in Python 2 you still need f.flush() afterwards).
Another good thing about print is that it gives you the newline for free.

How do I replace a specific value in a file in Python?

I'm trying to replace the zeros with a value. So far this is my code, but what do I do next?
g = open("January.txt", "r+")
for i in range(3):
dat_month = g.readline()
Month: January
Item: Lawn
Total square metres purchased:
0
monthly value = 0
You could do that, but that is not the usual approach, and certainly is not the correct approach for text files.
The correct way to do it is to write another file, with the information you want updated in place, and then rename the new file to the old one. That is the only sane way of doing this with text files, since the information size in bytes for the fields is variable.
As for the impression that you are "writing 200 bytes to the disk" instead of a single byte when changing your value, don't let that fool you: at the operating system level, all file access has to be done in blocks, which are usually a couple of kilobytes long (in special cases, and on tuned filesystems, it could be a couple hundred bytes). In any case, you will never, in a user-space program, much less in a high-level language like Python, trigger a disk write of less than a few hundred bytes.
Now, for the code:
import os

my_number = <number you want to place in the line you want to rewrite>

with open("January.txt", "r") as in_file, open("newfile.txt", "w") as out_file:
    for line in in_file:
        if line.strip() == "0":
            out_file.write(str(my_number) + "\n")
        else:
            out_file.write(line)

os.unlink("January.txt")
os.rename("newfile.txt", "January.txt")
So, that is the general idea; of course you should not write code with all values hardcoded in that way (i.e. the values to be checked and written fixed in the program code, as are the filenames).
As for the with statement: it is a special construct of the language which is very appropriate for opening files and manipulating them in a block, as in this case, but it is not required.
Programming aside, the concept you have to keep in mind is this:
when you use an application that lets you edit a text file, a spreadsheet, or an image, you, as the user, may have the impression that after you are done and have saved your work, the updates are committed to the same file. In the vast, vast majority of cases, that is not what happens: the application internally uses a pattern like the one I presented above; a completely new file is written to disk and the old one is deleted or renamed. The few exceptions could be simple database applications, which might replace fixed-width fields inside the file itself on updates. Modern databases certainly do not do that, resorting to appending the most recent, updated information to the end of the file. PDF files are another kind that were not designed to be replaced entirely on each update: but also in that case, the updated information is written at the end of the file, even if the update is to take place on a page at the beginning of the rendered document.
dat_month = dat_month.replace("0", "45678")
To write to a file you do:
with open("Outfile.txt", "wt") as outfile:
And then
    outfile.write(dat_month)
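Putting those pieces together, a minimal sketch (file names are from the question and answer; the replacement value is only an example, and note that str.replace swaps every "0" in the text):

with open("January.txt", "r") as infile:
    dat_month = infile.read()

dat_month = dat_month.replace("0", "45678")   # example replacement value

with open("Outfile.txt", "wt") as outfile:
    outfile.write(dat_month)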
Try this:
import fileinput
import itertools
import sys

with fileinput.input('January.txt', inplace=True) as file:
    beginning = tuple(itertools.islice(file, 3))
    sys.stdout.writelines(beginning)
    sys.stdout.write(next(file).replace('0', 'a value'))
    sys.stdout.write(next(file).replace('0', 'a value'))
    sys.stdout.writelines(file)

How can a Python program load and read specific lines from a file?

I have a huge file of numbers in binary format, and only certain parts of it need to be parsed into an array. I looked into numpy.fromfile and open, but they don't seem to have an option to read from location A to location B in the file. Can this be done?
If you're dealing with "huge files", I would not simply read and ignore everything up to the point where you actually need the data.
Instead: file objects in Python have a .seek() method, which you can use to jump right to where you need to start parsing, efficiently bypassing everything before it.
with open('huge_file.dat', 'rb') as f:
    f.seek(1024 * 1024 * 1024)  # skip 1GB
    ...
See also: http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
If you know the precise location of the data you are interested in, you can just use the seek(<n bytes>) method on the file object, as documented. Just call it once (with the given offset) before you start to read.
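Since the question mentions numpy.fromfile, here is a minimal sketch of combining seek() with it, assuming a made-up layout (a 4096-byte header followed by float64 values):

import numpy as np

with open('huge_file.dat', 'rb') as f:
    f.seek(4096)                                          # jump past the assumed header
    chunk = np.fromfile(f, dtype=np.float64, count=1000)  # read 1000 values from there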

Python securely remove file

How can I securely remove a file using Python? The function os.remove(path) only removes the directory entry, but I want to securely remove the file, similar to the Apple feature called "Secure Empty Trash" that randomly overwrites the file's data.
What function securely removes a file using this method?
You can use srm to securely remove files. You can use Python's os.system() function to call srm.
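For example, a minimal sketch (using subprocess rather than os.system, and assuming srm is installed and on the PATH):

import subprocess

def srm_delete(path):
    # Hypothetical helper: shells out to the external srm tool.
    subprocess.run(["srm", path], check=True)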
You can very easily write a function in Python to overwrite a file with random data, even repeatedly, then delete it. Something like this:
import os

def secure_delete(path, passes=1):
    with open(path, "ba+") as delfile:
        length = delfile.tell()
    with open(path, "br+") as delfile:
        for i in range(passes):
            delfile.seek(0)
            delfile.write(os.urandom(length))
    os.remove(path)
Shelling out to srm is likely to be faster, however.
You can use srm, sure, but you can also easily implement it in Python. Refer to Wikipedia for the patterns to overwrite the file content with. Note that, depending on the actual storage technology, the appropriate data patterns may be quite different. Furthermore, if your file is located on a log-structured file system, or on a file system with copy-on-write optimisation like btrfs, your goal may be unachievable from user space.
After you are done mashing up the disk area that was used to store the file, remove the file itself with os.remove().
If you also want to erase any trace of the file name, you can try to allocate and reallocate a whole bunch of randomly named files in the same directory, though depending on the directory inode structure (linear, btree, hash, etc.) it may be very tough to guarantee you actually overwrote the old file name.
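To sketch that last idea, a hypothetical helper (names and defaults made up; as noted, there is no guarantee the old directory entry actually gets overwritten):

import os

def churn_directory_names(directory, count=100, name_length=16):
    # Create and delete many randomly named files of the same name length,
    # hoping to recycle the directory slot that held the old file name.
    for _ in range(count):
        name = os.path.join(directory, os.urandom(name_length // 2).hex())
        with open(name, "wb") as f:
            f.write(b"\x00")
        os.remove(name)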
So, at least in Python 3, using @kindall's solution I only got it to append. That means the entire contents of the file were still intact and every pass just added to the overall size of the file, so it ended up being [Original Contents][Random Data of that Size][Random Data of that Size][Random Data of that Size], which is obviously not the desired effect.
This trickery worked for me though: I open the file in append mode to find the length, then reopen in r+ so that I can seek to the beginning (in append mode, what seems to have caused the undesired effect is that it was not actually possible to seek to 0).
So check this out:
import os

def secure_delete(path, passes=3):
    with open(path, "ba+", buffering=0) as delfile:
        length = delfile.tell()
    delfile.close()
    with open(path, "br+", buffering=0) as delfile:
        # print("Length of file:%s" % length)
        for i in range(passes):
            delfile.seek(0, 0)
            delfile.write(os.urandom(length))
            # wait = input("Pass %s Complete" % i)
        # wait = input("All %s Passes Complete" % passes)
        delfile.seek(0)
        for x in range(length):
            delfile.write(b'\x00')
        # wait = input("Final Zero Pass Complete")
    os.remove(path)  # Note: the real shred also renames the file to all zeros (keeping the filename length) to thwart metadata filename collection; I didn't bother to implement that here
Un-comment the prompts to check the file after each pass. This looked good when I tested it, with the caveat that the filename is not shredded like the real shred -zu does.
The answers implementing a manual solution did not work for me. My solution is as follows; it seems to work okay.
import os

def secure_delete(path, passes=1):
    length = os.path.getsize(path)
    with open(path, "br+", buffering=-1) as f:
        for i in range(passes):
            f.seek(0)
            f.write(os.urandom(length))
        f.close()
