StringIO and pystache generate spurious null bytes - python

I am rendering a mustache file into a string, and after that I want to process that string with the csv module. For that I create a file-like interface to the string using StringIO. The csv module complains with:
_csv.Error: line contains NULL byte
So I made a simple test:
import pystache
from cStringIO import StringIO
txt = pystache.render('Hello {{name}}', {'name' : 'Steve'})
f = StringIO(txt)
data = f.read()
print txt.find('\x00')
print data.find('\x00')
print txt.count('\x00')
print data.count('\x00')
Which produces:
-1
1
0
33
Somehow the StringIO object is inserting NULL bytes. This does not happen if I use a string which has not been pre-processed with pystache:
from cStringIO import StringIO
txt = "Hello Steve"
f = StringIO(txt)
data = f.read()
print txt.find('\x00')
print data.find('\x00')
print txt.count('\x00')
print data.count('\x00')
The result is as expected:
-1
-1
0
0
What could the problem be?

txt = "Hello Steve" is a byte string. Could the string returned by pystache be a unicode string?
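If that is the cause, the numbers in the question fit. A likely explanation (an assumption, not confirmed by the asker) is that Python 2's cStringIO reads the raw internal buffer of a unicode object, which on a wide build stores four bytes per character. The sketch below only imitates that layout with the utf-32-le codec; it is written for Python 3 and does not use cStringIO or pystache at all.

```python
# Imitation of the suspected byte layout, not the original cStringIO code path.
text = u"Hello Steve"              # 11 characters, like the rendered template

raw = text.encode("utf-32-le")     # 4 bytes per character, 3 of them zero for ASCII
first_null = raw.find(b"\x00")     # → 1
null_count = raw.count(b"\x00")    # → 33

# Encoding explicitly before wrapping the data avoids the padding bytes.
encoded = text.encode("utf-8")
clean_null_count = encoded.count(b"\x00")  # → 0
```

The position 1 and the count 33 match the output in the question, so in Python 2 encoding the rendered text first, for example StringIO(txt.encode('utf-8')), would be one way to hand clean bytes to the csv module.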

Related

String Replace in csv

Below, I am trying to replace data in a csv. The code works, but it replaces anything matching stocklevel in the file.
def updatestocklevel(quantity, stocklevel, code):
    newlevel = stocklevel - quantity
    stocklevel = str(stocklevel)
    newlevel = str(newlevel)
    s = open("stockcontrol.csv").read()
    s = s.replace(stocklevel, newlevel)  # be careful - will currently replace any number in the file matching stock level!
    f = open("stockcontrol.csv", 'w')
    f.write(s)
    f.close()
My csv looks like this;
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
Elsewhere in my program, I have a function that searches for the code (8 digits)
Is it possible to say, if the code is in the line of the csv, replace the data in the second column? (data[2])
All the occurrences of stocklevel are replaced with the value of newlevel because you are calling s.replace(stocklevel, newlevel).
string.replace(s, old, new[, maxreplace]): Return a copy of string s
with all occurrences of substring old replaced by new. If the optional
argument maxreplace is given, the first maxreplace occurrences are
replaced.
source
As you suggested, you need to get the code and use it replace the stock level.
This is a sample script which takes the 8-digit code and the new stock level as command line arguments and replaces the stock level:
import sys
import re
code = sys.argv[1]
newval= int(sys.argv[2])
f=open("stockcontrol.csv")
data=f.readlines()
print data
for i, line in enumerate(data):
    if re.search(r'%s,\d+' % code, line):  # search for the line with 8-digit code
        data[i] = '%s,%d\n' % (code, newval)  # replace the stock value with new value in the same line
f.close()
f=open("in.csv","w")
f.write("".join(data))
print data
f.close()
Another solution using the csv module of Python:
import sys
import csv
data=[]
code = sys.argv[1]
newval= int(sys.argv[2])
f=open("stockcontrol.csv")
reader=csv.DictReader(f,fieldnames=['code','level'])
for line in reader:
    if line['code'] == code:
        line['level'] = newval
    data.append('%s,%s' % (line['code'], line['level']))
f.close()
f=open("stockcontrol.csv","w")
f.write("\n".join(data))
f.close()
Warning: Keep a back up of the input file while trying out these scripts as they overwrite the input file.
If you save the script in a file called test.py then invoke it as:
python test.py 34512340 10
This should set the stock level for code 34512340 to 10.
Why not use good old regular expressions?
import re
code, new_value = '11813651', '885' # e.g, change 85 to 885 for code 11813651
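# re.M makes ^ match at the start of every line, not only at the start of the file
print(re.sub(r'(^%s,).*' % code, r'\g<1>' + new_value, open('stockcontrol.csv').read(), flags=re.M))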
Since it's a csv file I'd suggest using Python's csv module. You will need to write to a new file since reading and writing to the same file will turn out bad. You can always rename it afterwards.
This example uses StringIO (Python 2) to embed your csv data in the code and treat it as a file. Normally you would open a file to get the input.
Updated
import csv
# Python 2 and 3
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
CSV = """\
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
"""
def replace(key, value):
    fr = StringIO(CSV)
    with open('out.csv', 'w') as fw:
        r = csv.reader(fr)
        w = csv.writer(fw)
        for row in r:
            if row[0] == key:
                row[1] = value
            w.writerow(row)
replace('99823658', 42)
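For a real file, the "write to a new file, then rename" advice could be sketched roughly as follows. This is only an illustration for Python 3: the function name update_stock_level and its parameters are made up here, and it assumes the two-column layout from the question (code, level) with no header row.

```python
import csv
import os
import tempfile


def update_stock_level(path, code, new_level):
    """Set the level for `code` in the CSV at `path`; return the number of rows changed.

    Hypothetical helper: rows are copied to a temporary file in the same
    directory, which then replaces the original file.
    """
    changed = 0
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Compare the whole first column, so a level that happens to
            # contain the same digits elsewhere is never touched.
            if row and row[0] == code:
                row[1] = str(new_level)
                changed += 1
            writer.writerow(row)
    os.replace(dst.name, path)  # swap the rewritten file into place
    return changed
```

Calling update_stock_level("stockcontrol.csv", "11813651", 885) would change only the row whose first column is exactly 11813651. As with the scripts above, keep a backup while experimenting.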

Convert file to base64 string on Python 3

I need to convert an image (or any file) to a base64 string. I have tried different ways, but the result is always bytes, not a string. Example:
import base64
file = open('test.png', 'rb')
file_content = file.read()
base64_one = base64.encodestring(file_content)
base64_two = base64.b64encode(file_content)
print(type(base64_one))
print(type(base64_two))
Returned
<class 'bytes'>
<class 'bytes'>
How do I get a string, not byte? Python 3.4.2.
Base64 is an ascii encoding so you can just decode with ascii
>>> import base64
>>> example = b'\x01'*10
>>> example
b'\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01'
>>> result = base64.b64encode(example).decode('ascii')
>>> print(repr(result))
'AQEBAQEBAQEBAQ=='
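Applied to a file, the same idea might look like the sketch below. The helper name file_to_base64_text is made up for illustration; it only combines base64.b64encode with the .decode('ascii') step shown above.

```python
import base64


def file_to_base64_text(path):
    """Return the contents of the file at `path` as a base64 str (hypothetical helper)."""
    with open(path, "rb") as handle:        # read raw bytes, whatever the file type
        encoded_bytes = base64.b64encode(handle.read())
    return encoded_bytes.decode("ascii")    # base64 output is pure ASCII
```

For the question's example, file_to_base64_text('test.png') would return a str rather than bytes.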
I need to write base64 text in file ...
So then stop worrying about strings and just do that instead.
with open('output.b64', 'wb') as f:
    f.write(base64_one)
The following code worked for me:
import base64
file_text = open(file, 'rb')
file_read = file_text.read()
file_encode = base64.encodebytes(file_read)
I initially tried base64.encodestring() but that function has been deprecated as per this issue.

Python Print/Write String containing "\f"

I am trying to read a file and replace every "<a> ... </a>" with a '\footnotemark'
with open('myfile', 'r') as myfile:
    data = myfile.read()
data = re.sub('<a.+?</a>', '\footnotemark', data)
Somehow Python always makes '\footnotemark' to '\x0cootnotemark' ('\f' to '\x0c'). I tried so far
Escaping: '\\footnotemark'
raw String: r'\footnotemark' or r'"\footnotemark"'
None of these worked
Example input:
foo<a>asdasd</a> bar
Example output:
foo\footnotemark bar
Assuming Python 2, since you haven't mentioned which version you use:
#!/usr/bin/python
import re
# myfile is saved with utf-8 encoding
with open('myfile', 'r') as myfile:
    text = myfile.read()
print text
data = re.sub('<a.+?</a>', r'\\footnotemark', text)
print data
outputs
foo<a>asdasd</a> bar
foo\footnotemark bar
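The reason the earlier attempts fail is that the replacement goes through two layers of escape processing: the Python string literal, and then the re module's own replacement template, where \f also means form feed. A small Python 3 sketch of the three cases follows; the sample text is assumed from the regex in the question, and the lambda variant is an alternative that the answer above does not use.

```python
import re

text = "foo<a>asdasd</a> bar"   # sample input, assumed from the question's regex
pattern = r"<a.+?</a>"

# A raw string keeps the backslash in the literal, but re.sub then reads
# "\f" in its replacement template as a form feed.
form_feed = re.sub(pattern, r"\footnotemark", text)

# A doubled backslash in a raw string survives the template step as one backslash.
escaped = re.sub(pattern, r"\\footnotemark", text)

# A callable replacement is inserted as-is, with no template processing.
literal = re.sub(pattern, lambda match: r"\footnotemark", text)

print(repr(form_feed))
print(escaped)
print(literal)
```

The last two calls both produce the wanted text; the callable form can be handy when the replacement contains several backslashes.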

How to write a number as text while writing in csv file in python

import csv
a = ['679L', 'Z60', '033U', '0003']
z = csv.writer(open("test1.csv", "wb"))
z.writerow(a)
Consider the code above.
Output:
679L Z60 33U 3
I need the values to be written as text, exactly as given:
679L Z60 033U 0003
How can I do that?
The Python csv module does not treat strings as numbers when writing the file:
>>> import csv
>>> from StringIO import StringIO
>>> a = ['679L', 'Z60', '033U', '0003']
>>> out = StringIO()
>>> z = csv.writer(out)
>>> z.writerow(a)
>>> out.getvalue()
'679L,Z60,033U,0003\r\n'
If you are seeing 3 in some other tool when reading you need to fix that tool; Python is not at fault here.
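One way to check this without opening a spreadsheet is to read the data back with the csv module itself. The sketch below is a Python 3 variant of the session above (io.StringIO instead of the Python 2 StringIO module); the variable names are only illustrative.

```python
import csv
import io

row = ['679L', 'Z60', '033U', '0003']

# Write the row to an in-memory text buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
written = buffer.getvalue()

# Read it back: every field comes back as a string, leading zeros included.
read_back = next(csv.reader(io.StringIO(written)))
print(read_back)
```

If the values survive this round trip but show up as 33U and 3 elsewhere, the conversion is happening in the tool that displays the file.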
You can instruct the csv.writer() to put quotes around anything that is not a number; this could make it clearer to whatever reads your CSV that the column is not numeric. Set quoting to csv.QUOTE_NONNUMERIC:
>>> out = StringIO()
>>> z = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
>>> z.writerow(a)
>>> out.getvalue()
'"679L","Z60","033U","0003"\r\n'
but this won't prevent Excel from treating the column as numeric anyway.
If you are loading this into Excel then don't use the Open feature. Instead create a new empty worksheet and use the Import feature instead. This will let you designate a column as Text rather than General.

Python JSON preserve encoding

I have a file like this:
aarónico
aaronita
ababol
abacá
abacería
abacero
ábaco
#more words, with no ascii chars
When I read and print that file to the console, it prints exactly the same, as expected, but when I do:
f.write(json.dumps({word: Lookup(line)}))
This is saved instead:
{"aar\u00f3nico": ["Stuff"]}
when I expected:
{"aarónico": ["Stuff"]}
I need to get the same value back when I json.loads() it, but I don't know where or how to do the encoding, or whether it is needed at all to get this to work.
EDIT
This is the code that saves the data to a file:
with open(LEMARIO_FILE, "r") as flemario:
    with open(DATA_FILE, "w") as f:
        while True:
            word = flemario.readline().strip()
            if word == "":
                break
            print word  # this is correct
            f.write(json.dumps({word: RAELookup(word)}))
            f.write("\n")
And this one loads the data and returns the dictionary object:
with open(DATA_FILE, "r") as f:
    while True:
        new = f.readline().strip()
        if new == "":
            break
        print json.loads(new)  # this is not
I cannot lookup the dictionaries if the keys are not the same as the saved ones.
EDIT 2
>>> import json
>>> f = open("test", "w")
>>> f.write(json.dumps({"héllö": ["stuff"]}))
>>> f.close()
>>> f = open("test", "r")
>>> print json.loads(f.read())
{u'h\xe9ll\xf6': [u'stuff']}
>>> "héllö" in {u'h\xe9ll\xf6': [u'stuff']}
False
This is normal and valid JSON behaviour. The \uxxxx escape is also used by Python, so make sure you don't confuse python literal representations with the contents of the string.
Demo in Python 3.3:
>>> import json
>>> print('aar\u00f3nico')
aarónico
>>> print(json.dumps('aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps('aar\u00f3nico')))
aarónico
In python 2.7:
>>> import json
>>> print u'aar\u00f3nico'
aarónico
>>> print(json.dumps(u'aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps(u'aar\u00f3nico')))
aarónico
When reading and writing from and to files, and when specifying just raw byte strings (and "héllö" is a raw byte string) then you are not dealing with Unicode data. You need to learn about the differences between encoded and Unicode data first. I strongly recommend you read at least 2 of the following 3 articles:
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
You were lucky with your "héllö" Python byte string literal: Python managed to decode it automatically for you. The value read back from the file is perfectly normal and correct:
>>> print u'h\xe9ll\xf6'
héllö
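If the goal is also to have the accented characters readable in the saved file, json.dumps accepts ensure_ascii=False. This option is not part of the answer above; the sketch below is for Python 3, where str is already Unicode, and simply shows that both forms load back to the same key.

```python
import json

record = {"aarónico": ["Stuff"]}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(record, ensure_ascii=False)

# Both texts decode to the same dictionary, so lookups by key work either way.
same = json.loads(escaped) == json.loads(readable) == record

print(escaped)
print(readable)
```

When writing the readable form to disk, the file would need an explicit encoding such as open(path, "w", encoding="utf-8"); under Python 2 the keys would additionally have to be unicode objects rather than byte strings.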
