Here is the code
def main():
    f = open("image.jpg", "rb")
    filedata = f.read()
    f.close()
    print "Creating Test Image"
    f = open("ftp_test.jpg", "w+")
    f.write(filedata)
    f.close()
    print "Done!"

if __name__ == '__main__':
    main()
I'm not sure why, but here is the original image
and here is the resulting picture from the code
I'm not sure what to do, so I decided to come to the experts, since I'm only 14. I am also adding more to it, like TCP communication, so I can send files over the internet.
You're reading the file in binary with rb, so write back in binary too, by using wb.
f = open("ftp_test.jpg", "wb+")
From the official docs:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
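As a minimal sketch of the fix (assuming Python 3; copy_binary is a hypothetical helper name, not part of the original code), the copy could open both files in binary mode so that no byte is translated on the way through:

```python
def copy_binary(src_path, dst_path):
    """Copy a file byte for byte; both ends are opened in binary mode."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        data = src.read()   # bytes, exactly as stored on disk
        dst.write(data)     # written back without newline translation
    return len(data)
```

Calling copy_binary("image.jpg", "ftp_test.jpg") should then produce a file identical to the original on both Windows and Unix.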
Related
I have a simple server on my Windows PC, written in Python, that reads files from a directory and then sends them to the client via TCP.
Files like HTML and Javascript are received by the client correctly (sent and original file match).
The issue is that image data is truncated.
Oddly, different images are truncated at different lengths, but it's consistent per image.
For example, a specific 1MB JPG is always received as 95 bytes. Another image, which should be 7KB, is received as 120 bytes.
Opening the truncated image files in Notepad++, the data that is there is correct. (The only issue is that the file ends too soon.)
I do not see a pattern for where the files end. The chars/bytes immediately before and after truncation are different per image.
I've tried three different ways for the server to read the files, but they all have the same result.
Here is a snippet of the reading and sending of files:
print ("Cache size=" + str(os.stat(filename).st_size))
#1st attempt, using readlines
fileobj = open(filename, "r")
cacheBuffer = fileobj.readlines()
for i in range(0, len(cacheBuffer)):
    tcpCliSock.send(cacheBuffer[i])
#2nd attempt, using line, same result
with open(filename) as f:
    for line in f:
        tcpCliSock.send(line)
#3rd attempt, using f.read(), same result
with open(filename) as f:
    tcpCliSock.send(f.read())
The script prints to the console the size of the file read, and the number of bytes matches the original image. So this proves the problem is in sending, right?
If the issue is with sending, what can I change to have the whole image sent properly?
Since you're dealing with images, which are binary files, you need to open the files in binary mode.
open(filename, 'rb')
From the Python documentation for open():
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.)
Since your server is running on Windows, as you read the file in text mode, Python is converting every \r\n it sees to \n. In Python 2 on Windows, text mode also treats a Ctrl-Z byte (0x1A) as end-of-file, which is the likely reason each image stops early at its own consistent offset. For text files, the newline conversion is nice: You can write platform-independent code that only deals with \n characters. For binary files, this completely corrupts your data. That's why it's important to use 'b' when dealing with binary files, but also important to leave it off when dealing with text files.
Also, as TCP is a stream protocol, it's better to stream the data into the socket in smaller pieces. This avoids the need to read an entire file into memory, which will keep your memory usage down. Like this:
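with open(filename, 'rb') as f:
    while True:
        data = f.read(4096)
        if len(data) == 0:
            break
        # sendall() keeps sending until the whole chunk is written;
        # send() may transmit only part of it
        tcpCliSock.sendall(data)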
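The receiving side needs the same care, because a single recv() call may return only part of what was sent. Here is a minimal sketch, assuming Python 3 and assuming the sender closes the connection once the file is finished; send_file and recv_until_closed are hypothetical helper names, not part of the original server:

```python
import socket

def send_file(sock, path, chunk_size=4096):
    """Stream a file into a connected socket in binary mode."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)  # sendall() loops until the whole chunk is written

def recv_until_closed(sock, chunk_size=4096):
    """Collect bytes from a socket until the peer closes its end."""
    parts = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # empty bytes means the peer closed the connection
            break
        parts.append(chunk)
    return b"".join(parts)
```

If the connection has to stay open after the transfer, the sender would need to announce the file length first (for example as a fixed-size header) so that the receiver knows when to stop reading.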
I have another question, related to reading a .dat file.
The file format is DAT (.dat).
The content inside the file is in bytes.
When I ran the code that opens the file, the program built and ran successfully. However, the Python shell shows no output (I can't see the contents of the file).
Since the content inside the file is in bytes, should I modify the code? What code should I use for bytes?
Thank you.
There is no "DAT" file format and, as you say, the file contains bytes - as do all files.
It's possible that the file contains binary data for which it's best to open the file in binary mode. You do that by specifying b as part of the mode parameter to open(), like this:
f = open('file.dat', 'rb')
data = f.read() # read the entire file into data
print(data)
f.close()
Note that the full mode parameter is set to rb which means open the file in binary mode for reading.
A better way is to use with:
with open('file.dat', 'rb') as f:
    data = f.read()
    print(data)
No need to explicitly close the file.
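If print(data) shows a long run of escape sequences, a hex view of the first few bytes can be easier to read. This is a minimal sketch, assuming Python 3; hex_preview is a hypothetical helper name introduced only for illustration:

```python
def hex_preview(path, count=16):
    """Return the first `count` bytes of a file as a spaced hex string."""
    with open(path, "rb") as f:
        head = f.read(count)
    # Iterating over a bytes object in Python 3 yields integers 0..255
    return " ".join(f"{byte:02x}" for byte in head)
```

For example, print(hex_preview('file.dat')) prints up to 16 two-digit hex values separated by spaces.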
If you know that the file contains text, possibly encoded in some specific encoding, e.g. UTF8, then you can specify the encoding when you open the file (Python 3):
with open('file.dat', encoding='UTF8') as f:
    for line in f:
        print(line)
In Python 2 you can use io.open().
Before you say "There's already a thread covering that", read further: there isn't.
I simply need to "address" the very first sector of an NTFS filesystem and read it byte after byte (raw data). I do NOT need a program that does this; I need the code.
What I got so far:
drive = r"\\.\PhysicalDrive1"
pyLog = r"C:\ohMyPy\mft.txt"
hd = open(drive,encoding='cp850')
mft = hd.readlines(1024*10000)
with open(pyLog,'w',encoding='cp850') as f:
    f.writelines(mft)
You need to open the files in binary mode ('rb'/'wb'); otherwise Python will modify newline characters on Windows. An encoding is not needed when the file is opened in binary mode. Also, you can open both files in the same context manager (with) as shown below.
drive_filename = r'\\.\PhysicalDrive1'
log_filename = r'C:\ohMyPy\mft.txt'
with open(drive_filename, 'rb') as drive, open(log_filename, 'wb') as logfile:
    logfile.write(drive.read(1024*10000))
I can read my MBR as follows:
drive = r"\\.\PhysicalDrive0"
hd = open(drive,'rb')
mbr = hd.read(512)
The magic is in 'rb', which opens the file for reading in binary mode, i.e. line-end characters are not changed.
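One way to check that the 512 bytes really came back unmodified is to look at the last two bytes, which hold the boot signature 0x55 0xAA on an MBR (and on an NTFS boot sector). This is a minimal sketch, assuming Python 3; read_first_sector and has_boot_signature are hypothetical helper names, and opening a path such as \\.\PhysicalDrive0 typically requires administrator rights:

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"

def read_first_sector(path):
    """Read the first 512 bytes of a file or raw device in binary mode."""
    with open(path, "rb") as device:
        return device.read(SECTOR_SIZE)

def has_boot_signature(sector):
    """Return True if a 512-byte sector ends with the 0x55 0xAA signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE
```

The same functions work on an ordinary disk-image file, which is a safer way to experiment than reading a live drive.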
The question is: how do I write a string decoded from base64 to a file? I use the following piece of code:
import base64
input_file = open('Input.txt', 'r')
coded_string = input_file.read()
decoded = base64.b64decode(coded_string)
output_file = open('Output.txt', 'w')
output_file.write(decoded)
output_file.close()
Input.txt contains a base64 string (something like PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48cmV2aW). After running the script I expect to see XML in Output.txt, but the output file contains some wrong symbols (like <?xml version="1.0" encoding="UTF-8"?><review-case create®vFFSТ#2). However, if I do not read the base64 string from Input.txt but specify it in the script as coded_string = '''PD94bWwgdmVyc2lvbj0iMS4wIiBlbm...''', then Output.txt contains correct XML. Is something wrong with the UTF encoding? How can I fix this? I use Python 2.7 on Windows 7. Thanks in advance.
You have probably figured it out by now, 5 years later, but here is a solution (for Python 3, where open() accepts an encoding argument) in case anyone needs it.
import base64
with open('Input.txt', 'r') as input_file:
    coded_string = input_file.read()
decoded = base64.b64decode(coded_string)
with open('Output.txt', 'w', encoding="utf-8") as output_file:
    output_file.write(decoded.decode("utf-8"))
Under Windows, open the file with 'rb' instead of 'r'.
In your case the code should be:
input_file = open('Input.txt', 'rb')
instead of
input_file = open('Input.txt', 'r')
btw:
http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
hope it helps
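Putting the binary-mode advice together, here is a minimal sketch, assuming Python 3; decode_base64_file is a hypothetical helper name. It reads the encoded text as bytes and writes the decoded result as bytes, so nothing is translated in either direction and no text encoding has to be chosen:

```python
import base64

def decode_base64_file(src_path, dst_path):
    """Decode a base64 file to its raw bytes without any text-mode translation."""
    with open(src_path, "rb") as src:
        encoded = src.read()
    # By default b64decode() discards characters outside the base64 alphabet,
    # so trailing newlines in the input are harmless
    decoded = base64.b64decode(encoded)
    with open(dst_path, "wb") as dst:
        dst.write(decoded)
    return len(decoded)
```

Calling decode_base64_file('Input.txt', 'Output.txt') should then leave exactly the decoded bytes in Output.txt.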
According to Pydocs,
fp = file('blah.xml', 'w+b')
or
fp = file('blah.xml', 'wb')
means open the file in write and binary mode. This is an xml file, however, so why do these two chaps
http://www.pixelmender.com/2010/10/12/scraping-data-using-scrapy-framework/
and
http://doc.scrapy.org/topics/exporters.html#scrapy.contrib.exporter.XmlItemExporter
recommend doing so in their tutorial/docs pages about exporting Scrapy items? In other words, why would anyone open a new xml file in 'b' mode?
It just doesn't make sense with plain XML files.
On Unix there is no difference between binary and non-binary mode. On Windows, every '\n' you write is translated to '\r\n' if the file is not opened in binary mode.
It would make a difference if you embedded binary BLOBs, but I don't see any on the sites you mentioned.
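The translation described above can be reproduced on any platform. This is a minimal sketch, assuming Python 3, where the newline argument of open() is used to force the Windows-style behaviour; written_bytes is a hypothetical helper name:

```python
import os
import tempfile

def written_bytes(content, mode, **open_kwargs):
    """Write content to a temp file with the given mode; return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode, **open_kwargs) as f:
            f.write(content)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

xml = '<?xml version="1.0"?>\n<items/>\n'
# newline="\r\n" makes text mode translate every "\n" it writes, as on Windows
as_text = written_bytes(xml, "w", newline="\r\n", encoding="utf-8")
# Binary mode writes the encoded bytes untouched
as_binary = written_bytes(xml.encode("utf-8"), "wb")
```

Here as_text ends up two bytes longer than as_binary, one extra '\r' per line. For plain XML that is harmless, which matches the point made above.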