Before you say "There's already a thread covering that" - read further, there isn't.
I simply need to "address" the very first sector of an NTFS filesystem and read it byte after byte (raw data). I do NOT need a program which does this, I need the code.
What I got so far:
drive = r"\\.\PhysicalDrive1"
pyLog = "C:\\ohMyPy\\mft.txt"
hd = open(drive,encoding='cp850')
mft = hd.readlines(1024*10000)
with open(pyLog,'w',encoding='cp850') as f:
    f.writelines(mft)
    f.close
You need to open the files in binary mode ('rb'/'wb'); otherwise Python will modify newline characters on Windows. An encoding is not needed when a file is opened in binary mode. Also, you can open both files in the same context manager (with), as shown below.
drive_filename = r'\\.\PhysicalDrive1'
log_filename = r'C:\ohMyPy\mft.txt'
with open(drive_filename, 'rb') as drive, open(log_filename, 'wb') as logfile:
    logfile.write(drive.read(1024*10000))
I can read my MBR as follows:
drive = r"\\.\PhysicalDrive0"
hd = open(drive,'rb')
mbr = hd.read(512)
The magic is in 'rb': it opens the file for reading in binary mode, i.e. line-end characters are not changed.
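As a minimal sketch of what can be done with the sector once it has been read in binary mode: the helper names below (read_first_sector, describe_sector) are made up for illustration, and a sector size of 512 bytes is an assumption. Note that on a partitioned disk the first sector of a physical drive is normally the MBR, while an NTFS boot record sits at the start of the volume itself.

```python
SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def read_first_sector(path, sector_size=SECTOR_SIZE):
    """Return the first sector of a raw device or image file as bytes."""
    # 'rb' keeps the bytes untouched; buffering=0 requests an unbuffered
    # stream so that a single sector-sized read is issued.
    with open(path, "rb", buffering=0) as device:
        return device.read(sector_size)


def describe_sector(sector):
    """Check two well-known fields of a boot sector (illustrative only)."""
    return {
        # Bytes 510-511 of a boot sector hold the signature 0x55 0xAA.
        "boot_signature_ok": sector[510:512] == b"\x55\xaa",
        # An NTFS volume boot record carries the OEM ID "NTFS    " at offset 3.
        "is_ntfs_boot_record": sector[3:11] == b"NTFS    ",
    }
```

With a path such as r"\\.\PhysicalDrive0" this would typically need administrator rights; the same functions work unchanged on a disk image file.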
I started learning Python a week ago and have some questions about reading from and writing to the same file. I've gone through some tutorials online but I am still confused. I understand simple reading and writing:
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But if I try the following, I get a bunch of unknown text in the text file I am writing to. Can anyone explain why this happens and why I cannot use the same openFile object the way shown below?
# I get an error when I use the code below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
Let me clarify my problem. In the example above, openFile is the file object returned by open(). Writing to it the first time works. But if I then use the same openFile to read from the file or append to it, either nothing happens or I get an error. I have to create another file object (under the same or a different name) before I can perform another read or write on the same file.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I would be grateful if anyone can tell me what I did wrong here, or whether it is just a Python thing. I am using Python 2.7. Thanks!
Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is
entirely undefined unless a file-positioning operation occurs between
them (for example, a seek()). I can't guess what
you expect to happen, but seems most likely that what you
intend could be obtained reliably by inserting
fp.seek(fp.tell())
between read() and your write().
My original response demonstrates how reading and writing on the same file opened in 'r+' mode works. That behaviour apparently does not hold on Windows.
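The workaround quoted above can be sketched as follows. This is an illustration written for Python 3 in binary mode, not the code from the bug report; the function name read_then_overwrite is made up here.

```python
def read_then_overwrite(path, count, replacement):
    """Read `count` bytes, then overwrite the bytes that follow them."""
    with open(path, "r+b") as fp:
        head = fp.read(count)
        # Explicit file-positioning call between the read and the write,
        # as the quoted workaround recommends.
        fp.seek(fp.tell())
        fp.write(replacement)
    return head
```

The seek() does not move the pointer; it only makes the switch from reading to writing well defined.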
Original Response:
In 'r+' mode, write() writes the string to the file at the current position of the file pointer. In your case, it writes "Test abc" at the start of the file, overwriting whatever was there. See the example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems
that differentiate between binary and text files; on systems that
don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open
Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
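A small sketch to check these defaults yourself, assuming Python 3 and binary mode (where tell() returns a plain byte offset). The helper name open_state is hypothetical.

```python
import os


def open_state(path, mode):
    """Return (initial pointer position, file size right after opening)."""
    with open(path, mode) as f:
        return f.tell(), os.path.getsize(path)
```

For a 10-byte file, "r+b" should report position 0 and size 10, "ab" should report position 10, and "w+b" should report size 0 because the file is truncated on open.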
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leaving the file content as This IS a test file.. This is because the initial read(4) returns the first 4 characters of the file, because the pointer is at the start of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains This IS a TEST file..
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.
Reading and writing happen at the current file pointer, which advances with each read or write.
In your particular case, writing to openFile leaves the file pointer just after the written data. Reading from there returns only what follows that position, which is nothing (EOF) if the pointer is at the end of the file.
You need to reset the file pointer to the beginning of the file with seek(0) before reading from it.
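A minimal sketch of that fix, written for Python 3; the function name write_and_read_back is made up for illustration.

```python
def write_and_read_back(path, text):
    """Overwrite the start of an existing file, then read the whole file."""
    with open(path, "r+") as f:
        f.write(text)  # the pointer now sits just after the written text
        f.seek(0)      # rewind before reading
        return f.read()
```

Without the seek(0), the read would start where the write ended rather than at the beginning of the file.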
You can read, modify and save to the same file in Python, but you have to replace the whole content of the file, and before writing the updated content you must call:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of folder and edit content of the files based on some criteria, if it helps:
import io
import os

# folder_path and condition are assumed to be defined elsewhere
new_file_content = ""
for directories, subdirectories, files in os.walk(folder_path):
    for file_name in files:
        file_path = os.path.join(directories, file_name)
        # open file for reading and writing
        with io.open(file_path, "r+", encoding="utf-8") as edit_file:
            for current_line in edit_file:
                if condition in current_line:
                    # update current line
                    current_line = current_line.replace('john', 'jack')
                new_file_content += current_line
            # set the pointer to the beginning of the file in order to rewrite the content
            edit_file.seek(0)
            # delete actual file content
            edit_file.truncate()
            # rewrite updated file content
            edit_file.write(new_file_content)
        # empty the new content for the next iteration; the with block has already closed the file
        new_file_content = ""
I have a simple server on my Windows PC written in python that reads files from a directory and then sends the file to the client via TCP.
Files like HTML and Javascript are received by the client correctly (sent and original file match).
The issue is that image data is truncated.
Oddly, different images are truncated at different lengths, but it's consistent per image.
For example, a specific 1MB JPG is always received as 95 bytes. Another image which should be 7KB, is received as 120 bytes.
Opening the truncated image files in notepad++, the data that is there is correct. (The only issue is that the file ends too soon).
I do not see a pattern for where the files end. The chars/bytes immediately before and after truncation are different per image.
I've tried three different ways for the server to read the files, but they all have the same result.
Here is a snippet of the reading and sending of files:
print ("Cache size=" + str(os.stat(filename).st_size))
#1st attempt, using readlines
fileobj = open(filename, "r")
cacheBuffer = fileobj.readlines()
for i in range(0, len(cacheBuffer)):
    tcpCliSock.send(cacheBuffer[i])
#2nd attempt, using line, same result
with open(filename) as f:
    for line in f:
        tcpCliSock.send(line)
#3rd attempt, using f.read(), same result
with open(filename) as f:
    tcpCliSock.send(f.read())
The script prints to the console the size of the file read, and the number of bytes matches the original image. So this proves the problem is in sending, right?
If the issue is with sending, what can I change to have the whole image sent properly?
Since you're dealing with images, which are binary files, you need to open the files in binary mode.
open(filename, 'rb')
From the Python documentation for open():
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.)
Since your server is running on Windows, as you read the file, Python is converting every \r\n it sees to \n. For text files, this is nice: You can write platform-independent code that only deals with \n characters. For binary files, this completely corrupts your data. In Python 2 on Windows, text mode also treats the byte 0x1A (Ctrl-Z) as end-of-file, so reading stops at the first such byte; that is the likely reason each image is cut short at its own fixed length. That's why it's important to use 'b' when dealing with binary files, but also important to leave it off when dealing with text files.
Also, as TCP is a stream protocol, it's better to stream the data into the socket in smaller pieces. This avoids the need to read an entire file into memory, which will keep your memory usage down. Like this:
with open(filename, 'rb') as f:
    while True:
        data = f.read(4096)
        if len(data) == 0:
            break
        tcpCliSock.send(data)
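One detail worth knowing about the loop above: socket.send() may transmit only part of the buffer it is given and returns the number of bytes actually sent. A hedged variant that uses socket.sendall() instead could look like the sketch below. The function name send_file and the 4096-byte chunk size are illustrative choices, not part of the original answer.

```python
import socket

CHUNK_SIZE = 4096  # illustrative chunk size


def send_file(sock, filename, chunk_size=CHUNK_SIZE):
    """Stream a file over a connected socket; return the bytes sent."""
    total = 0
    with open(filename, "rb") as f:  # binary mode: bytes are sent unchanged
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            # sendall() keeps sending until the whole chunk has gone out
            # (or raises an exception), unlike send().
            sock.sendall(data)
            total += len(data)
    return total
```

The returned total can be compared with os.stat(filename).st_size to confirm that the whole file was handed to the socket.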
I have some trouble reading the .text section of a binary file.
The binary is compiled by gcc.
readelf -S binary_file
This command shows that
.text PROGBITS 0000831C 00031C 000340
The address of the .text section is 0000831c, offset = 00031c and size = 000340
I have tried
file = open('binary_file')
content = file.readlines()
But Capstone could not disassemble the result.
If the .text content looks like
f102 030e 0000 a0e3
how do I read it as
content = b'\xf1\x02\x03\x0e\x00\x00\xa0\xe3'
By default, open() opens a file in text mode. To open a file in binary mode, you need to supply the appropriate mode: 'rb' - which means open for reading in binary mode.
readlines() splits the file into lines of text, so it does not make sense to use it on a binary file.
You want something like:
file = open('binary_file', 'rb')
content = file.read()
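file.read() returns the whole file, headers included. To get only the .text bytes, one option is to seek to the offset that readelf reported and read exactly the reported size. The sketch below assumes Python 3; read_section and hexdump_to_bytes are made-up helper names, and the numbers in the usage note come from the readelf output in the question.

```python
def read_section(path, offset, size):
    """Return `size` bytes starting at file offset `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size)
    if len(data) != size:
        raise ValueError("file is shorter than the requested section")
    return data


def hexdump_to_bytes(text):
    """Turn a hex dump such as 'f102 030e 0000 a0e3' into bytes."""
    return bytes.fromhex(text)
```

For the binary in the question this would be read_section('binary_file', 0x31C, 0x340). The value 0x831C is the address the section is loaded at, which is what a disassembler usually wants as the base address rather than as a file position.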
Here is the code
def main():
    f = open("image.jpg", "rb")
    filedata = f.read()
    f.close()
    print "Creating Test Image"
    f = open("ftp_test.jpg", "w+")
    f.write(filedata)
    f.close()
    print "Done!"

if __name__ == '__main__':
    main()
I'm not sure why, but here is the original image
and here is the resulting picture from the code
I'm not sure what to do so I decided to come to the experts since I'm only 14. I am also adding more to it like TCP communication. So I can send files over the internet.
You're reading the file in binary with rb, so write back in binary too, by using wb.
f = open("ftp_test.jpg", "wb+")
From the official docs:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
I need to do some manipulation of a number of pdf files. As a first step I wanted to copy them from a single directory into a tree that supports my needs. I used the following code
for doc in docList:
    # these steps just create the directory structure I need from the file name
    fileName = doc.split('\\')[-1]
    ID = fileName.split('_')[0]
    basedate = fileName.split('.')[0].split('_')[-1].strip()
    rdate = '\\R' + basedate + '-' +'C' + basedate
    newID = str(cikDict[ID])
    newpath = basePath + newID + rdate
    # check existence of the new path
    if not os.path.isdir(newpath):
        os.makedirs(newpath)
    # reads the file in and then writes it to the new directory
    fstring = open(doc).read()
    outref = open(newpath +'\\' + fileName, 'wb')
    outref.write(fstring)
    outref.close()
When I run this code the directories are created and there are files with the correct name in each directory. However, when I click to open a file I get an error from Acrobat informing me that the file was damaged and could not be repaired.
I was able to copy the files using
shutil.copy(doc,newpath)
To replace the last four lines - but I have not been able to figure out why I can't read the file as a string and then write it in a new location.
One thing I did was compare what was read from the source to what the file content was after a read after it had been written:
>>> newstring = open(newpath + '\\' +fileName).read()
>>> newstring == fstring
True
So it does not appear the content was changed?
I have not been able to figure out why I can't read the file as a string and then write it in a new location.
Please be aware that PDF is a binary file format, not a text file format. Methods treating files (or data in general) as text may change it in different ways, especially:
Reading data as text interprets bytes and byte sequences as characters according to some character encoding. Writing text back as data again transforms according to some character encoding, too.
If the applied encodings differ, the result obviously differs from the original file. But even if the same encoding was used, differences can creep in: If the original file contains bytes which have no meaning in the applied encoding, some replacement character is used instead and the final result file contains the encoding of that replacement character, not the original byte sequence. Furthermore some encodings have multiple possible encodings for the same character. Thus, some input byte sequence may be replaced by some other sequence representing the same character in the output.
End-of-line sequences may be changed according to the preferences of the platform.
Binary files may contain different byte sequences used as end-of-line marker on one or the other platform, e.g. CR, LF, CRLF, ... Methods treating the data as text may replace all of them by the one sequence favored on the local platform. But as these bytes in binary files may have a different meaning than end-of-line, this replacement may be destructive.
Control characters in general may be ignored.
In many encodings the bytes 0..31 have meanings as control characters. Methods treating binary data as text may interpret them somehow which may result in a changed output again.
All these changes can utterly destroy binary data, e.g. compressed streams inside PDFs.
You could try using binary mode for reading files by also opening them with a b in the mode string. Using binary mode both while reading and writing may solve your issue.
One thing I did was compare what was read from the source to what the file content was after a read after it had been written:
>>> newstring = open(newpath + '\\' +fileName).read()
>>> newstring == fstring
True
So it does not appear the content was changed?
Your comparison also reads the files as text. Thus, you do not compare the actual byte contents of the original and the copied file but their interpretations according to the encoding assumed while reading them. So damage has already been done on both sides of your comparison.
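A byte-level comparison avoids that trap. The sketch below, assuming Python 3, hashes both files in binary mode and compares the digests; file_digest and same_content are hypothetical helper names.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: no decoding, no newline changes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files contain exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)
```

Comparing the original PDF with the damaged copy this way should report a difference that the text-mode comparison hides.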
You should use shutil to copy files. It is platform aware and you avoid problems like this.
But you already discovered that.
You would be better served using with to open and close files. Then the files are opened and closed automatically. It is more idiomatic:
with open(doc, 'rb') as fin, open(fn_out, 'wb') as fout:
    fout.write(fin.read()) # the ENTIRE file is read with .read()
If potentially you are dealing with a large file, read and write in chunks:
with open(doc, 'rb') as fin, open(fn_out, 'wb') as fout:
    while True:
        chunk=fin.read(1024)
        if chunk:
            fout.write(chunk)
        else:
            break
Note the 'rb' and 'wb' arguments to open(). Since you are clearly working on Windows, binary mode prevents Python from translating line endings or otherwise interpreting the bytes as text.
You should also use os.path.join rather than newpath + '\\' +fileName type operation.
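Putting the last two suggestions together, a copy step might be sketched like this. It assumes Python 3 (for exist_ok), and the function name copy_into is made up for illustration.

```python
import os
import shutil


def copy_into(src, dest_dir):
    """Copy `src` into `dest_dir` (created if missing); return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    # os.path.join picks the right separator for the platform.
    dest = os.path.join(dest_dir, os.path.basename(src))
    # shutil.copyfile copies the raw bytes without any text interpretation.
    shutil.copyfile(src, dest)
    return dest
```

In the loop from the question, this would take the place of the manual read and write, e.g. copy_into(doc, newpath).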