Convert file to base64 string on Python 3


I need to convert an image (or any file) to a base64 string. I have tried different approaches, but the result is always bytes, not a string. Example:
import base64
file = open('test.png', 'rb')
file_content = file.read()
base64_one = base64.encodestring(file_content)
base64_two = base64.b64encode(file_content)
print(type(base64_one))
print(type(base64_two))
This prints:
<class 'bytes'>
<class 'bytes'>
How do I get a string instead of bytes? I am using Python 3.4.2.

Base64 output contains only ASCII characters, so you can simply decode the resulting bytes as ASCII:
>>> import base64
>>> example = b'\x01'*10
>>> example
b'\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01'
>>> result = base64.b64encode(example).decode('ascii')
>>> print(repr(result))
'AQEBAQEBAQEBAQ=='
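
Applied to a file, the same idea might look like the sketch below. The helper name file_to_base64_str is hypothetical and not part of the base64 module; it reads the file as bytes, encodes it, and decodes the result to str.

```python
import base64

def file_to_base64_str(path):
    """Read a file as bytes and return its Base64 encoding as a str."""
    with open(path, 'rb') as f:
        raw = f.read()
    # b64encode returns bytes; the output is pure ASCII, so decoding is safe.
    return base64.b64encode(raw).decode('ascii')
```

Calling file_to_base64_str('test.png') would then return a value of type str rather than bytes.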

I need to write base64 text in file ...
In that case, stop worrying about strings and write the bytes directly:
with open('output.b64', 'wb') as f:
    f.write(base64_one)

The following code worked for me:
import base64
# 'file' holds the path of the file to encode
with open(file, 'rb') as file_text:
    file_read = file_text.read()
file_encode = base64.encodebytes(file_read)
I initially tried base64.encodestring(), but that function has been deprecated as per this issue (it was removed entirely in Python 3.9).
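
Note that base64.encodebytes() and base64.b64encode() do not return identical output: encodebytes() inserts a newline after every 76 characters of output and at the end. A small sketch of the difference, using arbitrary sample data:

```python
import base64

data = bytes(range(100))  # arbitrary sample input

one_line = base64.b64encode(data)    # no line breaks
wrapped = base64.encodebytes(data)   # newline after every 76 output characters

# Removing the newlines from the wrapped form gives the one-line form.
same = wrapped.replace(b'\n', b'') == one_line
```

If the result is meant for a data URI or a JSON field, the unwrapped b64encode() form is usually the more convenient one.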

Related

Numpy savetxt to a string

I would like to load the result of numpy.savetxt into a string. Essentially the following code without the intermediate file:
import numpy as np
def savetxts(arr):
    np.savetxt('tmp', arr)
    with open('tmp', 'rb') as f:
        return f.read()

For Python 3.x you can use the io module:
>>> import io
>>> s = io.BytesIO()
>>> np.savetxt(s, (1, 2, 3), '%.4f')
>>> s.getvalue()
b'1.0000\n2.0000\n3.0000\n'
>>> s.getvalue().decode()
'1.0000\n2.0000\n3.0000\n'
Note: I couldn't get io.StringIO() to work. Any ideas?
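
The in-memory pattern above is not specific to NumPy. The following stdlib-only sketch shows the same flow with a hypothetical write_rows function standing in for np.savetxt; it is an illustration of the buffer technique, not of NumPy's own behaviour.

```python
import io

def write_rows(fileobj, values, fmt='%.4f'):
    """Hypothetical stand-in for a writer such as np.savetxt: writes one
    formatted value per line, as bytes, to any binary file-like object."""
    for v in values:
        fileobj.write((fmt % v).encode('ascii') + b'\n')

buf = io.BytesIO()            # in-memory replacement for the temporary file
write_rows(buf, (1, 2, 3))
text = buf.getvalue().decode('ascii')   # bytes -> str
```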

In Python 2 you can use StringIO (or cStringIO):
This module implements a file-like class, StringIO, that reads and writes a string buffer (also known as memory files).
The description of the module says it all. Just pass an instance of StringIO to np.savetxt instead of a filename:
>>> import StringIO
>>> import numpy as np
>>> s = StringIO.StringIO()
>>> np.savetxt(s, (1,2,3))
>>> s.getvalue()
'1.000000000000000000e+00\n2.000000000000000000e+00\n3.000000000000000000e+00\n'

Have a look at array_str or array_repr: http://docs.scipy.org/doc/numpy/reference/routines.io.html

This just requires extending the previous answers with a decode to UTF-8 in order to produce a string. It is very useful for exporting data to human-readable text files.
import io
import numpy as np
s = io.BytesIO()
# savetxt takes 'delimiter' and 'fmt' as keyword arguments
np.savetxt(s, np.linspace(0, 10, 30).reshape(-1, 3), delimiter=',', fmt='%.4f')
outStr = s.getvalue().decode('UTF-8')

Storing a random byte string in Python

For my project, I need to be able to store random byte strings in a file and read the byte string again later. For example, I want to store randomByteString from the following code:
>>> from os import urandom
>>> randomByteString=urandom(8)
>>> randomByteString
b'zOZ\x84\xfb\xceM~'
What would be the proper way to do this?
Edit: I forgot to mention that I also want to store 'normal' strings alongside the byte strings.

Use code like this:
>>> fh = open("e:\\test","wb")
>>> fh.write(randomByteString)
8
>>> fh.close()
Open the file in binary mode. It is also cleaner to use a with block when the file operations are close together (thanks to @Blender):
>>> with open("e:\\test","wb") as fh:
...     fh.write(randomByteString)
Update: if you want to store normal strings, encode them first and then write the result:
>>> "test".encode()
b'test'
>>> fh.write("test".encode())
Here fh is the same file handle opened previously.
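
When random bytes and encoded strings share one file, the reader needs some way to tell where each item ends. One possible approach (an assumption, not something the answer above prescribes) is to prefix every item with its length. The helper names write_record and read_record below are hypothetical.

```python
import io
import os
import struct

def write_record(fh, payload):
    """Write a 4-byte big-endian length followed by the payload bytes."""
    fh.write(struct.pack('>I', len(payload)))
    fh.write(payload)

def read_record(fh):
    """Read one record written by write_record; return None at end of file."""
    header = fh.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack('>I', header)
    return fh.read(length)

# Demo with an in-memory buffer; a file opened with 'wb' / 'rb' works the same way.
buf = io.BytesIO()
random_bytes = os.urandom(8)
write_record(buf, random_bytes)
write_record(buf, 'test'.encode('utf-8'))   # text is encoded before writing

buf.seek(0)
first = read_record(buf)                    # the random bytes, unchanged
second = read_record(buf).decode('utf-8')   # back to a str
```

The length prefix means the random bytes may contain any value, including newlines, without confusing the reader.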

Works just fine. You can't expect the output to make much sense though.
>>> import os
>>> with open("foo.txt", "wb") as fh:
...     fh.write(os.urandom(8))
...
>>> fh.close()
>>> with open("foo.txt", "r") as fh:
...     for line in fh.read():
...         print line
...
^J^JM-/
^O
R
M-9
J
~G

Python JSON preserve encoding

I have a file like this:
aarónico
aaronita
ababol
abacá
abacería
abacero
ábaco
# more words, with non-ASCII chars
When I read and print that file to the console, it prints exactly the same, as expected, but when I do:
f.write(json.dumps({word: Lookup(line)}))
This is saved instead:
{"aar\u00f3nico": ["Stuff"]}
When I expected:
{"aarónico": ["Stuff"]}
I need to get the same back when I json.loads() it, but I don't know where or how to do the encoding, or whether it is needed at all to get this to work.
EDIT
This is the code that saves the data to a file:
with open(LEMARIO_FILE, "r") as flemario:
    with open(DATA_FILE, "w") as f:
        while True:
            word = flemario.readline().strip()
            if word == "":
                break
            print word #this is correct
            f.write(json.dumps({word: RAELookup(word)}))
            f.write("\n")
And this one loads the data and returns the dictionary object:
with open(DATA_FILE, "r") as f:
    while True:
        new = f.readline().strip()
        if new == "":
            break
        print json.loads(new) #this is not
I cannot look up entries in the dictionaries if the keys are not the same as the saved ones.
EDIT 2
>>> import json
>>> f = open("test", "w")
>>> f.write(json.dumps({"héllö": ["stuff"]}))
>>> f.close()
>>> f = open("test", "r")
>>> print json.loads(f.read())
{u'h\xe9ll\xf6': [u'stuff']}
>>> "héllö" in {u'h\xe9ll\xf6': [u'stuff']}
False

This is normal and valid JSON behaviour. The \uxxxx escape is also used by Python, so make sure you don't confuse Python literal representations with the contents of the string.
Demo in Python 3.3:
>>> import json
>>> print('aar\u00f3nico')
aarónico
>>> print(json.dumps('aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps('aar\u00f3nico')))
aarónico
In Python 2.7:
>>> import json
>>> print u'aar\u00f3nico'
aarónico
>>> print(json.dumps(u'aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps(u'aar\u00f3nico')))
aarónico
When you read from and write to files, and when you write plain byte string literals (and "héllö" is a byte string in Python 2), you are not dealing with Unicode data. You need to learn the difference between encoded bytes and Unicode text first. I strongly recommend you read at least 2 of the following 3 articles:
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
You were lucky with your "héllö" byte string literal: Python managed to decode it automatically for you. The value read back from the file is perfectly normal and correct:
>>> print u'h\xe9ll\xf6'
héllö
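
If the goal is for the saved file itself to show the accented characters, json.dumps accepts ensure_ascii=False. The sketch below assumes Python 3, where str is Unicode text; the dictionary contents are taken from the question.

```python
import json

data = {'aarónico': ['Stuff']}

escaped = json.dumps(data)                        # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)   # keeps the accented letter as-is

# Both forms decode to the same dictionary, so lookups by key work either way.
round_trip = json.loads(escaped)
```

When writing the readable form to disk, open the file with an explicit encoding, for example open(path, 'w', encoding='utf-8'), so the non-ASCII characters are stored consistently.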

StringIO and pystache generate spurious null bytes

I am parsing a mustache file into a string, and after that I want to process that string with the csv module. For that I create a file-like interface to the string using StringIO. The csv module complains with:
_csv.Error: line contains NULL byte
So I made a simple test:
import pystache
from cStringIO import StringIO
txt = pystache.render('Hello {{name}}', {'name' : 'Steve'})
f = StringIO(txt)
data = f.read()
print txt.find('\x00')
print data.find('\x00')
print txt.count('\x00')
print data.count('\x00')
Which produces:
-1
1
0
33
Somehow the StringIO object is inserting NULL bytes. This does not happen if I use a string which has not been pre-processed with pystache:
from cStringIO import StringIO
txt = "Hello Steve"
f = StringIO(txt)
data = f.read()
print txt.find('\x00')
print data.find('\x00')
print txt.count('\x00')
print data.count('\x00')
The result is as expected:
-1
-1
0
0
What could the problem be?

txt = "Hello Steve" is a byte string. Could the string produced by pystache be a unicode string?
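
The numbers in the question are consistent with that guess. Assuming the rendered text was a unicode string stored internally with 4 bytes per character, and that cStringIO exposed that raw buffer, the 11 characters of "Hello Steve" would carry 3 null bytes each. The Python 3 sketch below reproduces the arithmetic with an explicit UTF-32 encoding; it illustrates the hypothesis and does not run the original pystache code.

```python
text = 'Hello Steve'                # 11 ASCII characters
raw = text.encode('utf-32-le')      # 4 bytes per character, no byte order mark

null_count = raw.count(b'\x00')     # 3 null bytes per ASCII character
first_null = raw.find(b'\x00')      # position of the first null byte
```

Here null_count is 33 and first_null is 1, matching the output reported in the question.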

Python file input string: how to handle escaped unicode characters?

In a text file (test.txt), my string looks like this:
Gro\u00DFbritannien
When I read it, Python escapes the backslash:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'
How can I have this interpreted as unicode? decode() and unicode() won't do the job.
The following code writes Gro\u00DFbritannien back to the file, but I want it to be Großbritannien
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)

You want to use the unicode_escape codec:
>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien
See the docs for the vast number of standard encodings that come as part of the Python standard library.

Use the built-in 'unicode_escape' codec:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'
You may also use codecs.open():
>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'
The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
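
The answers above use Python 2, where str has a decode() method. In Python 3 a str must be turned into bytes first. A minimal sketch, assuming the text read from the file contains only ASCII characters apart from the escapes:

```python
raw = 'Gro\\u00DFbritannien'   # literal backslash, as read from the file

# unicode_escape decodes bytes, so go through ASCII bytes first.
decoded = raw.encode('ascii').decode('unicode_escape')
```

If the input may already contain non-ASCII characters, encode('ascii') will raise an error, so this shortcut only fits files that are fully escaped.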
