Read a whole file at once - Python

I need to read the whole source data from a file, something.zip (not uncompress it). I tried

    f = open('file.zip')
    s = f.read()
    f.close()
    return s

but it returns only a few bytes, not the whole source data. Any idea how to achieve this? Thanks

Use binary mode ('b') when you're dealing with a binary file:

    def read_zipfile(path):
        with open(path, 'rb') as f:
            return f.read()

BTW, use a with statement instead of closing the file manually.
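A quick usage sketch (the sample file name and its contents are made up for illustration, not a valid archive):

```python
def read_zipfile(path):
    # 'rb' returns the raw bytes of the archive without decompressing anything
    with open(path, 'rb') as f:
        return f.read()

# Create a small stand-in file; the b'PK\x03\x04' signature plus padding
# is merely illustrative.
with open('sample.zip', 'wb') as f:
    f.write(b'PK\x03\x04' + b'\x00' * 26)

data = read_zipfile('sample.zip')
print(len(data))  # 30
```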

As mentioned, there is an EOF character (0x1A) that terminates the .read() operation when the file is opened in text mode on Windows. To reproduce and demonstrate this:

    # Create a file of 256 bytes
    with open('testfile', 'wb') as fout:
        fout.write(''.join(map(chr, range(256))))

    # Text mode
    with open('testfile') as fin:
        print 'Opened in text mode is:', len(fin.read())
    # Opened in text mode is: 26

    # Binary mode - note 'rb'
    with open('testfile', 'rb') as fin:
        print 'Opened in binary mode is:', len(fin.read())
    # Opened in binary mode is: 256
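For reference, a Python 3 version of the same experiment (the 0x1A truncation is Windows text-mode behaviour; on other platforms a text-mode read of these bytes typically fails to decode instead):

```python
# Write all 256 byte values, then read them back in binary mode.
with open('testfile', 'wb') as fout:
    fout.write(bytes(range(256)))

with open('testfile', 'rb') as fin:
    data = fin.read()
print('Opened in binary mode is:', len(data))  # 256 on every platform
```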

This should do it:

    In [1]: f = open('/usr/bin/ping', 'rb')
    In [2]: bytes = f.read()
    In [3]: len(bytes)
    Out[3]: 9728

For comparison, here's the file I opened in the code above:

    -rwx------+ 1 xx yy 9.5K Jan 19 2005 /usr/bin/ping*
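As a sanity check (a sketch using a throwaway file rather than /usr/bin/ping), a binary-mode read should return exactly os.path.getsize() bytes, matching what ls reports:

```python
import os
import tempfile

# Create a throwaway 9728-byte file, mirroring the size above.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(9728))
    path = tmp.name

with open(path, 'rb') as f:
    data = f.read()

print(len(data) == os.path.getsize(path))  # True
os.remove(path)
```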


How to convert a binary string to binary in Python

I have a problem; I searched a lot and didn't find an answer.
I read from a file, for example "video.mp4", in binary mode.
I get, for example: b'\x00\x02\x1a\x00' (bytes).
I saved it as a string in a file: b'\x00\x02\x1a\x00' (string).
I read it again as a string: "b'\x00\x02\x1a\x00" (string).
I want to convert it back to binary, but with no result!
Sometimes I get it like this: b"b'\x00\x02\x1a\x00"'
Any answer?
Oh sorry, here is a simplified version, because the original code is messed up:

    #!/usr/bin/python
    FILE = open("video.mp4", "rb")
    FILE2 = open("video", "w")
    chunk = FILE.read(8)
    FILE2.write(str(chunk))
    FILE.close()
    FILE2.close()

    FILE = open("video", "r")
    line = FILE.readline()
    print(line)
    print(line == str(chunk))
    FILE.close()

    FILE = open("video_binary", "wb")
    FILE.write(line)  # Here I want line to convert to binary
    FILE.close()

But it's the same thing.
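No answer is recorded above, but the usual fix has two parts: recover bytes already stored as their repr with ast.literal_eval, and avoid the text round trip in the first place (the sample bytes below stand in for a real video file):

```python
import ast

# The original code stored str(chunk), i.e. the repr "b'\x00\x02\x1a\x00'",
# as text; ast.literal_eval parses that repr back into a bytes object.
line = "b'\\x00\\x02\\x1a\\x00'"
chunk = ast.literal_eval(line)
print(chunk == b'\x00\x02\x1a\x00')  # True

# The cleaner approach: keep everything in binary mode and never call str().
with open('video_binary', 'wb') as dst:
    dst.write(chunk)
```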

Converting a file of decimal values into hex format

I am new to Python and I am trying to write a simple script that will convert each row of my .txt file containing decimals into hex format and save the result into another .txt file. My input has 16-bit values in decimal format, such as:

    15166
    46818
    26814
    640
    44756
    27831
    2646
This is a snippet that I have so far:

    import binascii

    filename = '1.txt'
    f = open(filename, 'rb')
    content = f.read()
    f.close()
    out = binascii.hexlify(content)
    f = open('out.txt', 'wb')
    f.write(out)
    f.close()
This is the output that I am getting:

    31353136360d0a34363831380d0a32363831340d0a3634300d0a34343735360d0a32373833310d0a323634360d0a393237360d0a323238390d0a333330320d0a33393137370d0a393535340d0a363239310d0a31353438310d0a33353632300d0a35373330310d0a33323933350d0a3834380d0a34313639330d0a33353538340d0a31363936390d0a31313539300d0a31343639350d0a36333931350d0a393238340d0a33323339370d0a343235330d0a33323934320d0a31303139340d0a34393238360d0a34383430370d0a31333330350d0a3336340d0a36323735340d0a32313438310d0a35323734350d0a31303931310d0a34323835380d0a373731370d0a34393530320d0a35313034380d0a36323832330d0a34343833370d0a36313934300d0a33393137310d0a33333032320d0a32333836360d0a36313335360d0a31393038380d0a35393135340d0a36353335320d0a32343233300d0a32303936310d0a34313134330d0a35343433350d0a36343038380d0a35323334340d0a33373136370d0a32363734390d0a36353439300d0a36353236360d0a36313234320d0a33343933360d0a313532360d0a35313236310d0a33353039350d0a36303931350d0a34313336350d0a32333235370d0a333133350d0a33373433380d0a34363837350d0a363831390d0a34373034320d0a31373035380d0a363734350d0a35313135340d0a333535330d0a33343134320d0a36353334360d0a34343334310d0a35333330370d0a35333232320d0a34313336300d0a33383037300d0a32363134350d0a34343532310d0a34373836360d0a34393033360d0a36323037320d0a34373630330d0a34363337300d0a34303534360d0a31393231330d0a373930340d0a393839340d0a31383337350d0a35383231360d0a33353033380d0a31333338310d0a32313637350d0a33383333370d0a35393430340d0a31333933300d0a31353830370d0a33373434370d0a31313832370d0a34383331360d0a32393433350d0a32363831360d0a36313035360d0a34303533350d0a33383335340d0a31373037370d0a34383236360d0a31363237350d0a34343331370d0a35343836320d0a34303730370d0a32363735370d0a32353438380d0a3737320d0a32363038330d0a32373339370d0a35323934380d0a34313537340d0a32363934310d0a3433353539

So I need each entry to be separated and displayed as a list in my output file. If I loop over the output:

    for c in out:
        print(c)

I get a huge list with two digits in each entry, which seems wrong. Please post any solution for this problem.
Another way to do it is using hex(), like so:

    filename = '1.txt'
    newfile = '2.txt'
    with open(filename, 'r') as f:
        numbers = f.read().splitlines()
    with open(newfile, 'w') as n:
        for num in numbers:
            n.write('{}\n'.format(hex(int(num))))
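Since the input holds 16-bit values, a fixed-width format spec may be preferable to hex(), which drops leading zeros (whether fixed width is wanted here is an assumption):

```python
# '{:04x}' pads each 16-bit value to four hex digits.
values = [15166, 46818, 26814, 640]
hex_rows = ['{:04x}'.format(v) for v in values]
print(hex_rows)  # ['3b3e', 'b6e2', '68be', '0280']
```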

Python 3.6 - Read encoded text from file and convert to string

Hopefully someone can help me out with the following. It is probably not too complicated, but I haven't been able to figure it out. My "output.txt" file is created with:

    f = open('output.txt', 'w')
    print(tweet['text'].encode('utf-8'))
    print(tweet['created_at'][0:19].encode('utf-8'))
    print(tweet['user']['name'].encode('utf-8'))
    f.close()

If I don't encode it for writing to the file, it gives me errors. So "output.txt" contains 3 rows of UTF-8 encoded output:

    b'testtesttest'
    b'line2test'
    b'\xca\x83\xc9\x94n ke\xc9\xaan'
In "main.py", I am trying to convert this back to a string:

    f = open("output.txt", "r", encoding="utf-8")
    text = f.read()
    print(text)
    f.close()

Unfortunately, the b'' format is still not removed. Do I still need to decode it? If possible, I would like to keep the 3-row structure.
My apologies for the newbie question, this is my first one on SO :)
Thank you so much in advance!
With the help of the people answering my question, I have been able to get it to work. The solution was to change the way I write to the file:

    tweet = json.loads(data)
    tweet_text = tweet['text']                    # content of the tweet
    tweet_created_at = tweet['created_at'][0:19]  # tweet created at
    tweet_user = tweet['user']['name']            # tweet created by
    with open('output.txt', 'w', encoding='utf-8') as f:
        f.write(tweet_text + '\n')
        f.write(tweet_created_at + '\n')
        f.write(tweet_user + '\n')

Then read it like:

    f = open("output.txt", "r", encoding='utf-8')
    tweettext = f.read()
    print(tweettext)
    f.close()
Instead of specifying the encoding when opening the file, use it to decode as you read:

    f = open("output.txt", "rb")
    text = f.read().decode(encoding="utf-8")
    print(text)
    f.close()
If b and the quote ' are in your file, that means there is a problem with how the file was written: someone probably stored the printed repr of the bytes (something like write(str(line))) instead of write(line). To decode it now, you can use literal_eval; otherwise @m_callens' answer should be fine:

    import ast

    with open("b.txt", "r") as f:
        text = [ast.literal_eval(line) for line in f]
    for l in text:
        print(l.decode('utf-8'))
    # testtesttest
    # line2test
    # ʃɔn keɪn

open() and codecs.open() in Python 2.7 behave strangely differently

I have a text file with the first line in Unicode and all other lines in ASCII.
I try to read the first line into one variable and all other lines into another. However, when I use the following code:

    # -*- coding: utf-8 -*-
    import codecs
    import os

    filename = '1.txt'
    f = codecs.open(filename, 'r3', encoding='utf-8')
    print f
    names_f = f.readline().split(' ')
    data_f = f.readlines()
    print len(names_f)
    print len(data_f)
    f.close()

    print 'And now for something completely differerent:'

    g = open(filename, 'r')
    names_g = g.readline().split(' ')
    print g
    data_g = g.readlines()
    print len(names_g)
    print len(data_g)
    g.close()
I get the following output:

    <open file '1.txt', mode 'rb' at 0x01235230>
    28
    7
    And now for something completely differerent:
    <open file '1.txt', mode 'r' at 0x017875A0>
    28
    77
If I don't use readlines(), the whole file is read, not only the first 7 lines, with both codecs.open() and open().
Why does this happen? And why does codecs.open() open the file in binary mode, even though only the 'r' parameter was passed?
Upd: This is the original file: http://www1.datafilehost.com/d/0792d687
Because you used .readline() first, the codecs.open() file has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines. If you call .readlines() again, the rest of the lines are returned:

    >>> f = codecs.open(filename, 'r3', encoding='utf-8')
    >>> line = f.readline()
    >>> len(f.readlines())
    7
    >>> len(f.readlines())
    71
The work-around is to not mix .readline() and .readlines():

    f = codecs.open(filename, 'r3', encoding='utf-8')
    data_f = f.readlines()
    names_f = data_f.pop(0).split(' ')  # take the first line

This behaviour is really a bug; the Python devs are aware of it, see issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is a lot more robust and versatile than the codecs module.
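A sketch of the io.open() route (written in Python 3 print syntax; the sample file contents are invented stand-ins for the asker's data):

```python
import io

# Build a small sample '1.txt': a header line followed by data lines.
with io.open('1.txt', 'w', encoding='utf-8') as f:
    f.write(u'alpha beta\n1 2\n3 4\n')

# io.open() does not swallow lines the way codecs.open() does:
# readline() followed by readlines() returns all remaining lines.
with io.open('1.txt', 'r', encoding='utf-8') as f:
    names = f.readline().split(' ')
    data = f.readlines()

print(len(names), len(data))  # 2 2
```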

Python3 (v3.2.2) extra bits when writing binary files

I have been working on a function to map the bytes of a binary file to another set of bytes. I am reading from and writing to the same file. My problem is that every time I do it I end up with extra bytes, unless I move to the end of the file before closing it. Here is my code:

    with open(self._path, 'r+b') as source:
        for lookAt in range(0, self._size[1]*self._size[2], 1):
            source.seek(lookAt*self._size[0], 0)
            readBuffer = array.array('B')
            readBuffer.fromfile(source, self._size[0])
            newLine = array.array('B', [mappingDict[mat] for mat in readBuffer])
            source.seek(lookAt*self._size[0], 0)
            newLine.tofile(source)
        source.seek(0, 2)  # Magic line that solves stupid bug
        source.close()

I am using the array module to read and write data, since I got the same problem when I used read() and write(). I do not understand why the 'magic line' solves the problem, since the position it seeks to is never used. I will appreciate any insight I can get on this.
Comment (answer follows):
I see the same behavior as you:

    #!/usr/bin/env python3
    import os
    import sys

    filename = '/tmp/a'
    with open(filename, 'wb') as f:
        f.write(b'1234a67b8ca')
    print(open(filename, 'rb').read())

    bufsize = 3
    table = bytes.maketrans(b'abcde', b'xyzzz')  # mapping
    with open(filename, 'r+b') as f:
        for i in range(0, os.path.getsize(filename), bufsize):
            f.seek(i, os.SEEK_SET)
            b = f.read(bufsize)  # result shouldn't depend on it due to 1 -> 1
            if not b:
                break
            f.seek(i, os.SEEK_SET)
            f.write(b.translate(table))
        f.seek(0, os.SEEK_END)  # magic
    print(open(filename, 'rb').read())

Output (with the magic line, or buffering=0, or f.flush() after f.write):

    b'1234a67b8ca'
    b'1234x67y8zx'

Output (without the magic line):

    b'1234a67b8ca'
    b'1234a67b8zx1234x67y8'
Answer:
If your mapping is 1 -> 1 you could use bytes.translate():

    #!/usr/bin/env python3
    import io
    import os
    import sys

    filename = '/tmp/a'
    data = b'1234a67b8ca'*10000
    with open(filename, 'wb') as f:
        f.write(data)
    assert data == open(filename, 'rb').read()
    print(data[:10]+data[-10:])

    bufsize = io.DEFAULT_BUFFER_SIZE
    table = bytes.maketrans(b'abcde', b'xyzzz')  # mapping
    with open(filename, 'r+b') as f:
        while True:
            b = f.read(bufsize)  # result shouldn't depend on bufsize due to 1 -> 1
            if not b:
                break
            f.seek(-len(b), os.SEEK_CUR)
            f.write(b.translate(table))
            f.flush()

    tr_data = data.translate(table)
    assert tr_data == open(filename, 'rb').read()
    print(tr_data[:10]+tr_data[-10:])

It seems that io.BufferedRandom can't do interlaced read/seek/write (a bug in Python 3) without flush().
Having experimented with this a little, I conjecture that it is a bug in Python 3.
In support of my conjecture, I offer the following code (based on @J.F. Sebastian's):

    import os
    import sys

    filename = '/tmp/a'
    with open(filename, 'wb') as f:
        f.write(b'1234a67b8ca')
    print(open(filename, 'rb').read())

    bufsize = 3
    with open(filename, 'r+b') as f:
        for i in range(0, os.path.getsize(filename), bufsize):
            f.seek(i, os.SEEK_SET)
            b = f.read(bufsize)
            f.seek(i, os.SEEK_SET)
            f.write(b)
        # f.seek(0, os.SEEK_END)  # magic
    print(open(filename, 'rb').read())

When run under Python 2.7.1, it works as you'd expect, and the magic line makes no difference.
When run under Python 3.1.2, it inexplicably requires the magic no-op seek() to work as expected.
At this point I'd suggest demonstrating the code to the core Python 3 developers to get their opinion on whether this is indeed a bug.
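The buffering=0 variant mentioned in the output caption above can be sketched like this; an unbuffered raw file needs neither flush() nor the trailing no-op seek():

```python
import os

filename = 'demo.bin'
with open(filename, 'wb') as f:
    f.write(b'1234a67b8ca')

# buffering=0 yields an unbuffered raw FileIO object, so interleaved
# read/seek/write cannot leave stale buffered bytes behind.
with open(filename, 'r+b', buffering=0) as f:
    for i in range(0, os.path.getsize(filename), 3):
        f.seek(i, os.SEEK_SET)
        b = f.read(3)
        f.seek(i, os.SEEK_SET)
        f.write(b)

result = open(filename, 'rb').read()
print(result)  # b'1234a67b8ca'
```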
