How to store output of os.urandom(8) in CouchDB? - python

I am trying to store some cryptographic data in CouchDB: a salt and an encrypted password. The salt is generated using Python's os.urandom(8), and sample output looks like:
'z/\xfe\xdf\xdeJ=y'
I'm using the python-couchdb API to store the document. When I try to save the document I get:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "build/bdist.macosx-10.7-intel/egg/couchdb/client.py", line 343, in __setitem__
status, headers, data = resource.put_json(body=content)
File "build/bdist.macosx-10.7-intel/egg/couchdb/http.py", line 499, in put_json
**params)
File "build/bdist.macosx-10.7-intel/egg/couchdb/http.py", line 514, in _request_json
headers=headers, **params)
File "build/bdist.macosx-10.7-intel/egg/couchdb/http.py", line 510, in _request
credentials=self.credentials)
File "build/bdist.macosx-10.7-intel/egg/couchdb/http.py", line 260, in request
body = json.encode(body).encode('utf-8')
File "build/bdist.macosx-10.7-intel/egg/couchdb/json.py", line 68, in encode
return _encode(obj)
File "build/bdist.macosx-10.7-intel/egg/couchdb/json.py", line 129, in <lambda>
dumps(obj, allow_nan=False, ensure_ascii=False)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 204, in encode
return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 3: ordinal not in range(128)

Encode it as either base64 or as hex before saving, or save it in a binary field.

Encode the output of urandom in base64 like this (Python 2 only; note that this codec appends a trailing newline to the result):
os.urandom(8).encode('base64')
As per the example in this thread
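A minimal sketch of both suggested encodings, written in Python 3 syntax because the .encode('base64') string method exists only in Python 2. The field names salt_b64 and salt_hex are made up for illustration and are not part of any CouchDB schema; the point is only that either form is plain ASCII text that a JSON encoder can serialise and that decodes back to the original bytes.

```python
import base64
import binascii
import json
import os

salt = os.urandom(8)  # 8 random bytes, generally not valid text

# Two text-safe representations of the same bytes.
salt_b64 = base64.b64encode(salt).decode("ascii")
salt_hex = binascii.hexlify(salt).decode("ascii")

# Hypothetical document layout; the values are ordinary ASCII strings.
doc = {"salt_b64": salt_b64, "salt_hex": salt_hex}
body = json.dumps(doc)

# Reading the document back recovers the original bytes.
loaded = json.loads(body)
salt_from_b64 = base64.b64decode(loaded["salt_b64"])
salt_from_hex = binascii.unhexlify(loaded["salt_hex"])
```

Hex is twice the size of the raw bytes and base64 roughly a third larger, so base64 is the more compact choice if that matters.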

Related

exception reading in large tab separated file chunked

I have a 350MB tab separated text file. If I try to read it into memory I get an out of memory exception. So I am trying something along those lines (i.e. only read in a few columns):
import pandas as pd
input_file_and_path = r'C:\Christian\ModellingData\X.txt'
column_names = [
    'X1'
    # , 'X2
]
raw_data = pd.DataFrame()
for chunk in pd.read_csv(input_file_and_path, names=column_names, chunksize=1000, sep='\t'):
    raw_data = pd.concat([raw_data, chunk], ignore_index=True)
print(raw_data.head())
Unfortunately, I get this:
Traceback (most recent call last):
File "pandas\_libs\parsers.pyx", line 1134, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1240, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1256, in pandas._libs.parsers.TextReader._string_convert
File "pandas\_libs\parsers.pyx", line 1494, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 5: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/xxxx/EdaDataPrepRange1.py", line 17, in <module>
for chunk in pd.read_csv(input_file_and_path, header=None, names=column_names, chunksize=1000, sep='\t'):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1007, in __next__
return self.get_chunk()
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1070, in get_chunk
return self.read(nrows=size)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 903, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1094, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1141, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1240, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1256, in pandas._libs.parsers.TextReader._string_convert
File "pandas\_libs\parsers.pyx", line 1494, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 5: invalid start byte
Any ideas? Also, how can I deal with large files in general and, for example, impute missing values? Ultimately, I need to read in everything to determine, for example, the median to be imputed.
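Use encoding="utf-8" while using pd.read_csv. (The traceback shows that the UTF-8 decoder is already the one failing, so if this does not help, try one of the single-byte encodings suggested below.)
In the question referenced below, the file was opened with encoding='windows-1252', as in open(file_path, encoding='windows-1252'). See if this works.
Reference: 'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
Working solution: use encoding="ISO-8859-1".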
Regarding your large file problem, just use a file handler and context manager:
with open("your_file.txt") as fileObject:
    for line in fileObject:
        do_something_with(line)
# No need to close the file, as 'with' does that automatically
This won't load the whole file into memory. Instead, it'll load a line at a time, and will 'forget' previous lines unless you store a reference.
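Also, regarding your encoding problem, pass an explicit encoding to pd.read_csv. Since the traceback shows UTF-8 decoding failing, a single-byte encoding such as encoding="ISO-8859-1" is more likely to work than encoding="utf-8".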
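The line-at-a-time idea can be combined with an explicit encoding to answer the median question without holding the whole file in memory. The following is a sketch using only the standard library, assuming the file is ISO-8859-1 (the encoding reported above as the working solution); the sample rows and the helper read_column are invented for illustration.

```python
import csv
import os
import statistics
import tempfile

# Hypothetical sample file: tab-separated, Latin-1 encoded, one missing value.
# "\u00ae" is stored as the single byte 0xae, which is not valid UTF-8.
rows = ["1.5\tcaf\u00e9", "\t\u00aeegistered", "2.5\tplain", "4.0\tmore"]
with tempfile.NamedTemporaryFile("w", encoding="ISO-8859-1", suffix=".txt",
                                 delete=False, newline="") as tmp:
    tmp.write("\n".join(rows) + "\n")
    path = tmp.name


def read_column(path, index, encoding="ISO-8859-1"):
    """Yield one column as float (None when the cell is empty), line by line."""
    with open(path, encoding=encoding, newline="") as handle:
        for record in csv.reader(handle, delimiter="\t"):
            cell = record[index].strip()
            yield float(cell) if cell else None


# First pass: only the non-missing values of one column are held in memory.
median = statistics.median(v for v in read_column(path, 0) if v is not None)

# Second pass: replace the missing entries with that median.
imputed = [median if v is None else v for v in read_column(path, 0)]

os.remove(path)
```

Two passes over the file are needed because the median is only known after every value has been seen; each pass still reads one line at a time.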

Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 12: ordinal not in range(128)

I am trying to return a file in a StreamingHttpResponse from a class-based view using Django REST framework. However, I get a very cryptic error message with a stack trace that does not contain any references to my code:
[16/Jun/2017 11:08:48] "GET /api/v1/models/49 HTTP/1.1" 200 0
Traceback (most recent call last):
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 138, in run
self.finish_response()
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 179, in finish_response
for data in self.result:
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/util.py", line 30, in __next__
data = self.filelike.read(self.blksize)
File "/Users/jonathan/anaconda/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 12: ordinal not in range(128)
[...]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 141, in run
self.handle_error()
File "/Users/jonathan/anaconda/lib/python3.6/site-packages/django/core/servers/basehttp.py", line 88, in handle_error
super(ServerHandler, self).handle_error()
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 368, in handle_error
self.finish_response()
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 180, in finish_response
self.write(data)
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 274, in write
self.send_headers()
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 331, in send_headers
if not self.origin_server or self.client_is_modern():
File "/Users/jonathan/anaconda/lib/python3.6/wsgiref/handlers.py", line 344, in client_is_modern
return self.environ['SERVER_PROTOCOL'].upper() != 'HTTP/0.9'
TypeError: 'NoneType' object is not subscriptable
My get method looks like this:
def get(self, request, pk, format=None):
    """
    Get model by primary key (pk)
    """
    try:
        model = QSARModel.objects.get(pk=pk)
    except Exception:
        raise Http404
    filename = model.pluginFileName
    chunk_size = 8192
    response = StreamingHttpResponse(
        FileWrapper(open(filename), chunk_size),
        content_type='application/zip')
    return response
From googling a bit I get the feeling that this is related to ASCII / UTF8 encoding but I don't understand how that applies to my situation. I am dealing with a binary file. In fact it is a Java jar file but that should be pretty much a zip file as far as I understand. What is going on here and what am I doing wrong?
This is related to the locale settings. The error occurs when non-ASCII file names are used with the Django storage system, so add the following lines to your Apache envvars file:
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
https://code.djangoproject.com/wiki/django_apache_and_mod_wsgi#AdditionalTweaking
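Independently of the locale settings, the traceback shows the decode step happening inside self.filelike.read(...), which is what a file object does when binary data is read in text mode. Below is a sketch without Django, using the same wsgiref.util.FileWrapper that appears in the traceback; the payload and temporary file are stand-ins for the jar file. In the view from the question, the corresponding change would be open(filename, 'rb').

```python
import os
import tempfile
from wsgiref.util import FileWrapper

# Hypothetical stand-in for the jar file: a zip-like header plus high bytes.
payload = b"PK\x03\x04" + bytes(range(200, 256)) * 100

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    filename = tmp.name

# Text mode with an ASCII codec reproduces the error from the traceback.
try:
    with open(filename, encoding="ascii") as text_file:
        text_file.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

# Binary mode yields raw bytes chunks, with no decoding step involved.
with open(filename, "rb") as binary_file:
    chunks = list(FileWrapper(binary_file, 1024))

os.remove(filename)
```

Changing the locale to UTF-8 only swaps the codec used in text mode; arbitrary binary content is not guaranteed to be valid UTF-8 either, so binary mode is the safer choice for a zip or jar file.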

UnicodeDecodeError after restart on server

I've implemented a product called Odoo/OpenERP and recently performed a restart after changing some strings in multiple files.
These strings contained standard characters with no accented characters. After the restart, any access to the site results in the below:
2015-06-24 08:09:35,884 1584 ERROR XXXXXX-Odoo-Production werkzeug: Error on request:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/werkzeug/serving.py", line 177, in run_wsgi
execute(self.server.app)
File "/usr/lib/python2.7/dist-packages/werkzeug/serving.py", line 165, in execute
application_iter = app(environ, start_response)
File "/opt/odoo-production/openerp/service/server.py", line 290, in app
return self.app(e, s)
File "/opt/odoo-production/openerp/service/wsgi_server.py", line 216, in application
return application_unproxied(environ, start_response)
File "/opt/odoo-production/openerp/service/wsgi_server.py", line 202, in application_unproxied
result = handler(environ, start_response)
File "/opt/odoo-production/openerp/http.py", line 1290, in __call__
return self.dispatch(environ, start_response)
File "/opt/odoo-production/openerp/http.py", line 1264, in __call__
return self.app(environ, start_wrapped)
File "/usr/lib/python2.7/dist-packages/werkzeug/wsgi.py", line 579, in __call__
return self.app(environ, start_response)
File "/opt/odoo-production/openerp/http.py", line 1426, in dispatch
ir_http = request.registry['ir.http']
File "/opt/odoo-production/openerp/http.py", line 346, in registry
return openerp.modules.registry.RegistryManager.get(self.db) if self.db else None
File "/opt/odoo-production/openerp/modules/registry.py", line 339, in get
update_module)
File "/opt/odoo-production/openerp/modules/registry.py", line 356, in new
registry = Registry(db_name)
File "/opt/odoo-production/openerp/modules/registry.py", line 60, in __init__
self._db = openerp.sql_db.db_connect(db_name)
File "/opt/odoo-production/openerp/sql_db.py", line 623, in db_connect
db, uri = dsn(to)
File "/opt/odoo-production/openerp/sql_db.py", line 614, in dsn
return db_or_uri, '%sdbname=%s' % (_dsn, db_or_uri)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 28: ordinal not in range(128)
I've checked the files I've made changes to, but these seem fine.
Does anyone know where I can start to debug this issue? I'm relatively new to Python and debugging/understanding this trace is something I'm not familiar with.
You have to use unicode strings: somewhere while your string is processed it needs to be handled as unicode, and that fails because its type is str. You can convert it to unicode with:
mystring = unicode(mystring, 'utf-8')
This will change the type of your string from str to unicode
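To illustrate what goes wrong, here is a small sketch in Python 3 syntax, where the decoding that Python 2 performs implicitly has to be written out. The byte string is invented for illustration (where the offending byte comes from in the Odoo setup is not known from the traceback); 0xc2 is the first byte of the UTF-8 encoding of characters U+0080 to U+00BF, such as a non-breaking space.

```python
# Hypothetical value: UTF-8 text ending in a non-breaking space (0xc2 0xa0).
raw = b"dbname=prod\xc2\xa0"

# Python 2 attempts this ASCII decode implicitly when a byte string is
# combined with a unicode string.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding explicitly with the right codec succeeds.
text = raw.decode("utf-8")
```

The start attribute of the exception gives the offset of the offending byte, which corresponds to the "position 28" in the error message and can help locate the character in the original value.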

ElementTree Unicode Encode/Decode Error

For a project I'm supposed to enhance some XML and store it in a file. The problem I encountered is that I keep getting the following error:
Traceback (most recent call last):
File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Bart\Dropbox\Studie\2013-2014\BSc-KI\cite_parser\parser.py", line 193, in parse_references
outputXML = ET.tostring(root, encoding='utf8', method='xml')
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1126, in tostring
ElementTree(element).write(file, encoding, method=method)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
ECLI:NL:RVS:2012:BY1564
File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
write(_escape_cdata(text, encoding))
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: ordinal not in range(128)
That error was generated by:
outputXML = ET.tostring(root, encoding='utf8', method='xml')
When looking for a solution to this problem I found several suggestions saying I should add .decode('utf-8') to the call, but that results in an encoding error (instead of the earlier decoding error) from the write function, so that doesn't work.
The encoding error:
Traceback (most recent call last):
File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Bart\Dropbox\Studie\2013-2014\BSc-KI\cite_parser\parser.py", line 197, in parse_references
myfile.write(outputXML)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 13559: ordinal not in range(128)
It is generated by the following code:
outputXML = ET.tostring(root, encoding='utf8', method='xml').decode('utf-8')
Source (or at least the relevant parts):
# URL encodes the parameters
encoded_parameters = urllib.urlencode({'id':ecli})
# Opens XML file
feed = urllib2.urlopen("http://data.rechtspraak.nl/uitspraken/content?"+encoded_parameters, timeout = 3)
# Parses the XML
ecliFile = ET.parse(feed)
# Fetches root element of current tree
root = ecliFile.getroot()
# Write the XML to a file without any extra indents or newlines
outputXML = ET.tostring(root, encoding='utf8', method='xml')
# Write the XML to the file
with open(file, "w") as myfile:
    myfile.write(outputXML)
And last but not least, a URL to an XML sample: http://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:RVS:2012:BY1542
The exception is caused by a byte string value.
The text variable in the traceback is supposed to be a unicode value, but if it is a plain byte string, Python will implicitly first decode it (with the ASCII codec) to Unicode just so you can then encode it again.
It is that decoding that fails.
Because you didn't actually show us what you insert into the XML tree, it is hard to tell you what to fix, other than to make sure you always use Unicode values when inserting text.
Demo:
>>> root.attrib['oops'] = u'Data with non-ASCII codepoints \u2014 (em dash)'.encode('utf8')
>>> ET.tostring(root, encoding='utf8', method='xml')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1126, in tostring
ElementTree(element).write(file, encoding, method=method)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 820, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
v = _escape_attrib(v, encoding)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/xml/etree/ElementTree.py", line 1090, in _escape_attrib
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 31: ordinal not in range(128)
>>> root.attrib['oops'] = u'Data with non-ASCII codepoints \u2014 (em dash)'
>>> ET.tostring(root, encoding='utf8', method='xml')
'<?xml version=\'1.0\' encoding=\'utf8\'?> ...'
Setting a bytestring attribute containing bytes outside the ASCII range triggers the exception; using a unicode value instead ensures the result can be produced.
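For comparison, a sketch of the same round trip in Python 3 syntax, where the split between text and bytes is explicit. The element name and text are invented for illustration; the text contains U+00EB, the character named in the UnicodeEncodeError in the question. The tree holds text, tostring() produces bytes when given a byte encoding, and the file is opened in binary mode so that no second, implicit encoding step takes place.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical element whose text is a str containing U+00EB.
root = ET.Element("uitspraak")
root.text = "pati\u00ebnt"

# With a byte encoding, tostring() returns bytes that are already encoded.
xml_bytes = ET.tostring(root, encoding="utf-8", method="xml")

# Write the bytes unchanged by opening the file in binary mode.
fd, path = tempfile.mkstemp(suffix=".xml")
os.close(fd)
with open(path, "wb") as out:
    out.write(xml_bytes)

# Read back as UTF-8 text to confirm the character survived.
with open(path, encoding="utf-8") as check:
    round_trip = check.read()
os.remove(path)
```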

Interesting situation. Database error? Python. Django

It works on the test server, but production gives the traceback below.
What is different? And what does the error mean?
The only difference is the Python version: 2.6.5 on the test server and 2.5.2 in production. How can I get rid of this error without changing the version?
True
2008-10-16 15:20:00
- did not match our database
Traceback (most recent call last):
File "./mr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/base.py", line 195, in run_from_argv
self.execute(*args, **options.__dict__)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/core/management/base.py", line 222, in execute
output = self.handle(*args, **options)
File "/usr/local/cluster/dynamic/website/video/remmedia/management/commands/remmedia.py", line 50, in handle
self.FirstTimeLoad()
File "/usr/local/cluster/dynamic/website/video/remmedia/management/commands/remmedia.py", line 117, in FirstTimeLoad
med,created=RemMedia.objects.get_or_create(index=program.Id+50000000, defaults=fields)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/manager.py", line 123, in get_or_create
return self.get_query_set().get_or_create(**kwargs)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/query.py", line 335, in get_or_create
obj.save(force_insert=True)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/base.py", line 410, in save
self.save_base(force_insert=force_insert, force_update=force_update)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/base.py", line 495, in save_base
result = manager._insert(values, return_id=update_pk)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/manager.py", line 177, in _insert
return insert_query(self.model, values, **kwargs)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/query.py", line 1087, in insert_query
return query.execute_sql(return_id)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/sql/subqueries.py", line 320, in execute_sql
cursor = super(InsertQuery, self).execute_sql(None)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/models/sql/query.py", line 2369, in execute_sql
cursor.execute(sql, params)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/django/db/backends/mysql/base.py", line 84, in execute
return self.cursor.execute(query, args)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/MySQLdb/cursors.py", line 158, in execute
query = query % db.literal(args)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/MySQLdb/connections.py", line 265, in literal
return self.escape(o, self.encoders)
File "/usr/local/cluster/dynamic/virtualenv/lib/python2.5/site-packages/MySQLdb/connections.py", line 198, in string_literal
return db.string_literal(obj)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
After some time thinking and testing, I noticed that the error appears only when I do this:
med,created=RemMedia.objects.get_or_create(index=program.Id+50000000, defaults=fields)
After some more thinking and testing, I found the error. It is in the fields dictionary. One of its values comes from BeautifulSoup; the code looks like:
soup=BeautifulSoup(program.Description.encode('utf-8'))
name=soup.find('div',{'class':'head'})
fields=dict(
    name=name.string,
    description=program.Description.encode('utf-8'),
    program_name=program.Name.encode('utf-8'),
    program_date_time=program.RealDateTime,
    topic_data_time=program.RealDateTime,
    topic_tag='',
    created=program.Updated,
    media=media
)
The problem is with the name variable in the fields dictionary.
The question remains: how do I convert it so that it does not raise an error?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)
Do you have a character with a code above 127, that is, outside the ASCII range? Probably an accented letter.
http://www.asciitable.com/
If the problem lies within the database, it's probably a setting. If it's within the python code, you may try placing this line at the top of your python file that deals with the string.
# coding:utf-8
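The code in the question encodes every text field with .encode('utf-8') except name.string, which is left as a unicode value, so the driver falls back to the ASCII codec for it. A sketch of the difference in Python 3 syntax; the title is a hypothetical value chosen because it starts with a run of non-ASCII characters, like the one reported in the error, and treating name the same way as the other fields is an assumption about the fix rather than something confirmed in the thread.

```python
# Hypothetical title beginning with nine Cyrillic letters.
title = "\u041f\u0440\u043e\u0433\u0440\u0430\u043c\u043c\u0430 1"

# Encoding with the ASCII codec fails on the non-ASCII run at the start.
try:
    title.encode("ascii")
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

# Encoding explicitly as UTF-8, as the other fields do, succeeds.
encoded = title.encode("utf-8")
```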
