Write decoded from base64 string to file - python

The question is: how do I write a string decoded from base64 to a file? I use the following piece of code:
import base64
input_file = open('Input.txt', 'r')
coded_string = input_file.read()
decoded = base64.b64decode(coded_string)
output_file = open('Output.txt', 'w')
output_file.write(decoded)
output_file.close()
Input.txt contains a base64 string (something like PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz48cmV2aW). After the script runs I expect to see XML in Output.txt, but the output file contains some wrong symbols (like <?xml version="1.0" encoding="UTF-8"?><review-case create®vFFSТ#2). At the same time, if I don't read the base64 string from Input.txt but specify it in the script as coded_string = '''PD94bWwgdmVyc2lvbj0iMS4wIiBlbm...''' then Output.txt contains the correct XML. Is something wrong with the UTF encoding? How do I fix this? I use Python 2.7 on Windows 7. Thanks in advance.

You probably figured it out, now 5 years later, but here is the solution if anyone needs it:
import base64
with open('Input.txt', 'r') as input_file:
    coded_string = input_file.read()

decoded = base64.b64decode(coded_string)

with open('Output.txt', 'w', encoding="utf-8") as output_file:
    output_file.write(decoded.decode("utf-8"))
(Note this is Python 3 code; Python 2's built-in open does not accept an encoding argument.)

Under Windows you should open the file with 'rb' instead of 'r'. In your case the code should be:
input_file = open('Input.txt', 'rb')
instead of
input_file = open('Input.txt', 'r')
btw:
http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
hope it helps
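Putting the two pieces of advice together, a minimal version-agnostic sketch reads and writes in binary mode on both sides, so Windows newline translation can never corrupt the data (the file names follow the question; the sample payload here is illustrative, and the block writes its own Input.txt so it runs standalone):

import base64

# Create a sample base64 input so the sketch is self-contained.
with open('Input.txt', 'wb') as f:
    f.write(base64.b64encode(b'<?xml version="1.0" encoding="UTF-8"?>'))

# 'rb' avoids newline translation on Windows and works on Python 2 and 3.
with open('Input.txt', 'rb') as input_file:
    coded_string = input_file.read()

decoded = base64.b64decode(coded_string)

# Write the decoded bytes back out in binary mode as well.
with open('Output.txt', 'wb') as output_file:
    output_file.write(decoded)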

Related

UTF8 conversion takes too long and crashes if file is too big

I have written the following Python code that should convert a file to UTF-8. It works well, but I noticed that if the file is too big (in this case we are talking about a 10 GB file!) the program crashes. In general it seems to take too much time: 9 minutes to convert 2 GB of text. Maybe I can make it more efficient? I think it's because I'm first reading the whole file and then saving it; could that be it?
import sys
import codecs
filename= sys.argv[1]
with codecs.open(filename, 'r', encoding='iso-8859-1') as f:
    text = f.read()
with codecs.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
Yes, this may happen because you're reading the whole file in one go. It's better to read the file in pieces, convert them to UTF-8, and then write those pieces to another file.
import sys
import codecs
BLOCKSIZE = 1048576 # or some other, desired size in bytes
sourceFileName = sys.argv[1]
targetFileName = sourceFileName + '-converted'
with codecs.open(sourceFileName, "r", "iso-8859-1") as sourceFile:
    with codecs.open(targetFileName, "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)
I took the code from this question (and modified it a bit).
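The same block-by-block idea can also be expressed with shutil.copyfileobj, which runs the read/write loop for you in fixed-size chunks, so memory use stays bounded regardless of file size. A sketch (the file names and sample contents here are illustrative; the block writes its own small Latin-1 input so it runs standalone):

import io
import shutil

# Create a small ISO-8859-1 sample file.
with open('source.txt', 'wb') as f:
    f.write(u'caf\xe9'.encode('iso-8859-1'))

# copyfileobj streams between the two text wrappers in chunks;
# decoding and re-encoding happen transparently at each end.
with io.open('source.txt', 'r', encoding='iso-8859-1') as src, \
     io.open('source.txt-converted', 'w', encoding='utf-8') as dst:
    shutil.copyfileobj(src, dst, 1048576)  # 1 MiB chunks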

In Python, how to use a file for writing bytes in it and reading as text

I would like to save bytes to a file and then read that file as text. Can I do it with one with? What mode should I use: wb, r, or wbr?
myBytesVar = b'line1\nline2'
with open('myFile.txt', 'wb') as fw:
    fw.write(myBytesVar)
with open('myFile.txt', 'r') as fr:
    myVar = fr.read()
print(myVar)
You don't need to re-read the file if you already have its contents stored in myBytesVar:
myBytesVar = b'line1\nline2'
with open('myFile.txt', 'wb') as fw:
    fw.write(myBytesVar)
myVar = myBytesVar.decode('utf-8')
The encoding Python assumes when reading files as text without an explicit encoding is platform-dependent, so I'm just assuming UTF-8 will work.
Here is some information on what mode we should use :
The default mode is 'r' (open for reading text, synonym of 'rt'). For
binary read-write access, the mode 'w+b' opens and truncates the file
to 0 bytes. 'r+b' opens the file without truncation.
Read more here.
https://docs.python.org/3/library/functions.html#open
If you want to do it with one "with": for writing, "wb" is the right mode. To read the file back, try:
myvar = open('MyVar.txt', 'r').read()
print(myvar)
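If the goal really is a single with, one option (a sketch, not the only way) is to open the file in 'w+b', which truncates it and allows both writing and reading, then seek back to the start and decode what you read:

myBytesVar = b'line1\nline2'

# 'w+b' = write + read, binary, truncating the file first.
with open('myFile.txt', 'w+b') as f:
    f.write(myBytesVar)
    f.seek(0)  # rewind to the start before reading back
    myVar = f.read().decode('utf-8')

print(myVar)

Since the file is opened in binary mode, you decode the bytes yourself; 'utf-8' is just an assumption about the data here.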

Unicode(UTF-8) can't display correctly? (Python)

I have the following code in Python:
# myFile.csv tends to look like:
# 'a1', 'ふじさん', 'c1'
# 'a2', 'ふじさん', 'c2'
# 'a3', 'ふじさん', 'c3'
s = u"unicodeText"  # unicodeText like, ふじさん بعدة أش 일본富士山Ölkələr
with codecs.open('myFile.csv', 'w+', 'utf-8') as f:  # codecs open
    f.write(s.encode('utf-8', 'ignore'))
I was using Vim to edit the code and to open "myFile.csv". The Unicode text displays fine in the terminal, but not in Excel, nor in a browser. My platform is OS X. I don't know if this is a configuration problem or if I'm coding it the wrong way; if you have any idea, please advise. Deeply appreciated!
Edit: changed open to codecs.open. Thanks for pointing out the unneeded f.close(); deleted.
Excel (at least on Windows) likes a Unicode BOM at the start of a .csv file even with UTF-8. There is a codec for that, utf-8-sig.
Also, Python 3's normal open is all that is required and no need for f.close() in a with:
#coding:utf8
data = '''\
a1,ふじさん,c1
a2,ふじさん,c2
a3,ふじさん,c3
'''
with open('myFile.csv', 'w', encoding='utf-8-sig') as f:
    f.write(data)
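A quick way to confirm the BOM actually lands in the file is to read the raw bytes back and compare against codecs.BOM_UTF8 (a sketch; the file name and sample row are illustrative):

import codecs

data = u'a1,\u3075\u3058\u3055\u3093,c1\n'  # a1,ふじさん,c1

# utf-8-sig prepends the UTF-8 BOM that Excel uses to detect the encoding.
with codecs.open('myFile.csv', 'w', 'utf-8-sig') as f:
    f.write(data)

with open('myFile.csv', 'rb') as f:
    raw = f.read()

print(raw[:3] == codecs.BOM_UTF8)  # the first three bytes are EF BB BF

Decoding the file back with 'utf-8-sig' strips the BOM again, so round-tripping through that codec is transparent.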
It seems you're trying to open the file in text mode (because you specify an encoding), but then you try to write binary data (because you encode the text before writing it to the file). You need to either open the file as binary and write encoded text, or open it as text and write text.
Furthermore, your attempt to open it as text isn't even working, because you're passing utf-8 as the buffering parameter instead of the encoding parameter. See the open() documentation.
But even if you did all that correctly, this still wouldn't really help you with an Excel file, because those have a complicated binary structure. I recommend you use something like the xlrd module to read .xls files and XlsxWriter to write them.
Here is a simple example that should work for .csv:
with open('file.csv', 'w', encoding='utf-8') as fh:
    fh.write('This >µ< is a unicode GREEK LETTER MU\n')
or alternatively
with open('file.csv', 'wb') as fh:
    fh.write('This >µ< is a unicode GREEK LETTER MU\n'.encode('utf-8'))
codecs.open opens a wrapped reader/writer which does the encoding/decoding for you, so you do not need to encode your string before writing. Pass 'ignore' as the errors parameter in your codecs.open call:
with codecs.open('myFile.csv', 'w+', 'utf-8', 'ignore') as f:
    f.write(s)
Note that you do not need to call close as you use a with statement.
Original answer, scratch that:
Third parameter of open is buffering requiring an integer.
You should pass the encoding like this:
with open('myFile.xls', 'w+', encoding='utf-8') as f:
Note that you open the file in text mode. No need to encode the string for writing.
Also your file mode 'w+' is a bit odd. I'm not sure, but I think it will truncate your file. If you want to append to the file you should use 'a' as mode.

Python reading from a file and saving to utf-8

I'm having problems reading from a file, processing its string, and saving to a UTF-8 file.
Here is the code:
try:
    filehandle = open(filename, "r")
except:
    print("Could not open file " + filename)
    quit()
text = filehandle.read()
filehandle.close()
I then do some processing on the variable text.
And then
try:
    writer = open(output, "w")
except:
    print("Could not open file " + output)
    quit()
#data = text.decode("iso 8859-15")
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()
This outputs the file perfectly, but it does so in ISO 8859-15 according to my editor. Since the same editor recognizes the input file (in the variable filename) as UTF-8, I don't know why this happens. As far as my research has shown, the commented lines should solve the problem. However, when I use those lines the resulting file has gibberish, mainly in special characters: words with tildes, as the text is in Spanish. I would really appreciate any help, as I am stumped.
Process text to and from Unicode at the I/O boundaries of your program using open with the encoding parameter. Make sure to use the (hopefully documented) encoding of the file being read. The default encoding varies by OS (specifically, locale.getpreferredencoding(False) is the encoding used), so I recommend always explicitly using the encoding parameter for portability and clarity (Python 3 syntax below):
with open(filename, 'r', encoding='utf8') as f:
    text = f.read()

# process Unicode text

with open(filename, 'w', encoding='utf8') as f:
    f.write(text)
If still using Python 2 or for Python 2/3 compatibility, the io module implements open with the same semantics as Python 3's open and exists in both versions:
import io
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()

# process Unicode text

with io.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
You can also get through it with the code below:
file = open(completefilepath, 'r', encoding='utf8', errors="ignore")
file.read()
You can't do that with Python 2's built-in open; use codecs. When you open a file with the built-in open in Python 2, you read and write raw bytes with no encoding applied. To write it in UTF-8 try this:
import codecs
file = codecs.open('data.txt','w','utf-8')
The encoding parameter is what does the trick.
my_list = ['1', '2', '3', '4']
with open('test.txt', 'w', encoding='utf8') as file:
    for i in my_list:
        file.write(i + '\n')
You can try using UTF-16; it might work:
import pandas as pd
data = pd.read_table(filename, encoding='utf-16', delimiter="\t")

How to write Unix end of line characters in Windows?

How can I write to files using Python (on Windows) and use the Unix end of line character?
e.g. When doing:
f = open('file.txt', 'w')
f.write('hello\n')
f.close()
Python automatically replaces \n with \r\n.
The modern way: use newline=''
Use the newline= keyword parameter to io.open() to use Unix-style LF end-of-line terminators:
import io
f = io.open('file.txt', 'w', newline='\n')
This works in Python 2.6+. In Python 3 you could also use the builtin open() function's newline= parameter instead of io.open().
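To verify that nothing was translated, write with newline='\n' and read the result back in binary mode, which shows the raw bytes (the file name is illustrative):

import io

with io.open('file.txt', 'w', newline='\n') as f:
    f.write(u'hello\n')

# Binary mode exposes the raw bytes: no \r\n appears, even on Windows.
with open('file.txt', 'rb') as f:
    raw = f.read()

print(b'\r' not in raw)  # True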
The old way: binary mode
The old way to prevent newline conversion, which does not work in Python 3, is to open the file in binary mode to prevent the translation of end-of-line characters:
f = open('file.txt', 'wb') # note the 'b' meaning binary
but in Python 3, binary mode will read bytes and not characters so it won't do what you want. You'll probably get exceptions when you try to do string I/O on the stream. (e.g. "TypeError: 'str' does not support the buffer interface").
For Python 2 & 3
See: The modern way: use newline='' answer on this very page.
For Python 2 only (original answer)
Open the file as binary to prevent the translation of end-of-line characters:
f = open('file.txt', 'wb')
Quoting the Python manual:
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
You'll need to use the binary pseudo-mode when opening the file.
f = open('file.txt', 'wb')
import os
import shutil

def dos2unix(inp_file, out_file=None):
    if out_file:
        out_file_tmp = out_file
    else:
        out_file_tmp = inp_file + '_tmp'
    if os.path.isfile(out_file_tmp):
        os.remove(out_file_tmp)
    # newline='\n' forces LF line endings regardless of platform
    with open(out_file_tmp, "w", newline='\n') as fout:
        with open(inp_file, "r") as fin:
            lines = fin.readlines()
            lines = map(lambda line: line.strip() + '\n', lines)
            fout.writelines(lines)
    if not out_file:
        shutil.move(out_file_tmp, inp_file)
        print(f'dos2unix() {inp_file} is overwritten with converted data !')
    else:
        print(f'dos2unix() {out_file} is created with converted data !')