Find which django model field contains a bad value - python

I'm new to Stack Overflow and to Python/Django. I have already solved my problem, but I hope to get help on how to solve it faster next time.
I have a very simple Python function that copies table records from one database to another (SQL Server to SQLite). The table has hundreds of columns. When I save the model object to SQLite, Django throws the following exception:
'utf8' codec can't decode byte ...
I understand that the data in one of the columns is problematic for utf8 conversion. What I wanted to know is which column this is. I tried different approaches, but eventually I had to write the following code to find the bad column:
build = Builds.objects.using('realdb').get(buildid=12524)
n = Builds()
for field in Builds._meta.fields:
    val = getattr(build, field.name)
    try:
        # copy one more column, then try to save the partial object
        setattr(n, field.name, val)
        n.save(using="default")
    except Exception:
        # report the first field whose value makes the save fail
        return HttpResponse(field.name + ": " + repr(val))
It basically copies column values one by one to the new model object and stops when it encounters an error. Is there a better way to do this next time? I tried breaking on exceptions in PyCharm, but it breaks on all of the many exceptions thrown within the Django framework itself.
Alon.

I don't think there's any way of determining which specific field is causing the problem without resorting to testing each and every field as you're doing here. You can try to repair the problem fields instead of returning the error.
Take a look at this section of the unicode docs. Basically, you can coerce the values by either replacing the undecodable bytes or removing them altogether.
Alternatively, if you know what encoding the strings are in, you can decode them and re-encode appropriately using string.decode and string.encode respectively.
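A minimal sketch of both ideas, written for Python 3 and using a plain dict as a stand-in for a database row. The helper names `find_undecodable_fields` and `repair`, and the sample row, are hypothetical and not part of Django. The first helper checks every byte value without saving anything; the second decodes with a known encoding, or falls back to replacing the bad bytes.

```python
def find_undecodable_fields(record, encoding="utf-8"):
    """Return the names of fields whose byte values do not decode cleanly."""
    bad = []
    for name, value in record.items():
        if isinstance(value, bytes):
            try:
                value.decode(encoding)
            except UnicodeDecodeError:
                bad.append(name)
    return bad


def repair(value, source_encoding=None):
    """Decode bytes with the known encoding, or replace undecodable bytes."""
    if not isinstance(value, bytes):
        return value  # numbers, None, already-decoded text: leave untouched
    if source_encoding:
        return value.decode(source_encoding)
    return value.decode("utf-8", errors="replace")


# Hypothetical row: "notes" holds Latin-1 bytes, which are not valid UTF-8.
row = {"buildid": 12524, "name": b"nightly", "notes": b"caf\xe9"}

print(find_undecodable_fields(row))
print(repair(row["notes"], "latin-1"))
print(repair(row["notes"]))
```

Passing `errors="ignore"` instead of `errors="replace"` drops the bad bytes rather than substituting U+FFFD.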

Related

Python error - Unicode/Ascii problems with value pulled out of MySql database

This has been asked a million times but every single thing I try hasn't worked and all are for slightly different issues. I'm losing my mind over it!
I have a Python Script which pulls data from a MySql database - all works well.
Database Information:
I believe the information in the database is correct. I am trying to parse multiple records into word documents - that is why I am not too bothered about accuracy - even if the bad characters are removed - that is fine.
The Charset of the database is UTF-8 and the field I am working with is VarChar
I am using mysql.connector python module to connect
However, I am getting errors and I've realised it's because of values with unicode in, such as this:
The value of this item is "DOMAINoardroom".
I have tried:
text = order[11].encode().decode("utf-8")
text = order[11].encode("ascii", errors="ignore").decode()
text = str(order[11].encode("utf8", errors="ignore"))
The latter does work; however, it outputs b'DOMAIN\x08oardroom' because the result is bytes.
I can print the text to the screen with print(text). However, when I try to output it to a Word document (using the docx module), it produces an error:
table = document.add_table(rows=total_orders*2, cols=1)
row = table.rows[0].cells
row[0].text = row_text
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
I am not particularly fussy over how it handles the unicode, e.g. remove it if needed, but I just need it to parse without error.
Any thoughts or advice here?
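The `\x08` in the bytes output above is a backspace, which is a control character, and that matches what the ValueError complains about. One possible approach, sketched below, is to strip the characters XML 1.0 does not allow before assigning the text. The helper name `xml_safe` is hypothetical, and the character range is an assumption based on the XML 1.0 rules (NUL and the C0 controls other than tab, newline and carriage return), not something taken from the docx module itself.

```python
import re

# NUL and C0 control characters, except tab (\x09), newline (\x0a)
# and carriage return (\x0d), which XML 1.0 permits.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def xml_safe(text):
    """Remove characters that cannot appear in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)


cleaned = xml_safe("DOMAIN\x08oardroom")
print(cleaned)
```

The cleaned string can then be assigned as in the question, for example `row[0].text = xml_safe(row_text)`.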

pickle unicode strings with non-ascii characters to mysql in django

Consider that I have a dictionary that I want to store in the db using Python's pickle.
My question is: which Django model field should I use?
So far I've been using a CharField, but there seems to be an error:
I pickle a u'\xe9' (i.e. 'é'), and I get:
Incorrect string value: '\xE1, ist...' for column 'edition' at row 1
(the ", ist..." is there because I have more text after the 'é').
I'm using
import cPickle

data = dict()
data['foo'] = input_that_has_the_caracter
to_save_in_db = cPickle.dumps(data)
Should I use a binary field and pickle with a protocol that uses binary? Because I have to change the db in order to do that, so it is better to be sure first...
You should check if you are using a proper encoding for your table AND column in your database backend (I'm assuming MySQL since your error message seems to be from it). In MySQL columns can have different encoding than the table. See if it's UTF-8.
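If changing the column type is not an option, one alternative (an assumption, not something suggested in the answer above) is to make the pickled bytes ASCII-only before storing them in a text column. A minimal Python 3 sketch using base64 follows; the helper names `dumps_text` and `loads_text` are hypothetical.

```python
import base64
import pickle


def dumps_text(obj):
    """Pickle obj and return an ASCII-only string safe for a text column."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return base64.b64encode(raw).decode("ascii")


def loads_text(text):
    """Reverse dumps_text: decode the base64 text and unpickle it."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


data = {"foo": "\xe9dition"}
stored = dumps_text(data)      # what would go into the CharField/TextField
restored = loads_text(stored)  # what comes back after reading the row
print(restored == data)
```

The encoded text is about a third longer than the raw pickle, so the column's max_length needs to allow for that.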

Django saving form strings in database with extra characters (u'string')

I've been having problems in django while trying to save the form.cleaned_data in a postgres database.
user_instance.first_name = form.cleaned_data['first_name']
The data is being saved like this: (u'Firstname',), with the 'u' prefix and parentheses, as if I were saving a tuple in the database.
I've used this many times with a MySQL database and this never happened before.
My django version is 1.3.1
UPDATE
I was using trailing commas this way:
user_profile.phone_area = phone_area,
user_profile.phone_number = phone_number,
user_profile.email = email,
I edited someone else's source code and forgot to delete the commas, that's why it was generating tuples. Thank you for your help
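The effect of the stray comma can be reproduced without Django. In this small sketch the `UserProfile` class and the sample address are hypothetical stand-ins; only plain attribute assignment matters.

```python
class UserProfile(object):
    """Stand-in for the Django model; it only needs to accept attributes."""


email = "user@example.com"

with_comma = UserProfile()
with_comma.email = email,   # trailing comma: the right-hand side is a 1-tuple

without_comma = UserProfile()
without_comma.email = email  # no comma: the string itself is assigned

print(repr(with_comma.email))
print(repr(without_comma.email))
```

When the tuple is later converted to text for the database, its repr is what gets stored, which explains the parentheses and the comma in the saved value.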
Aside from validation, form.clean_data() will perform some implicit conversions to Python data types. You can simply perform an explicit conversion by wrapping the returned value with the str() or the unicode() built-in. Afterwards, format the string using strip("(''),").

Type testing on values selected from a database

I'm working on extracting data from a SQL Server database with a latin_1 character set into a Greenplum/postgres database with a utf-8 character set. I'm trying to convert the string values immediately before insert, but when I do this:
row=[i.decode('latin_1') for i in row]
row=[i.encode('utf-8') for i in row]
I get an error stating that decode is not a member of type int. That makes sense in that there are integer values coming in. But there are also strings. In other posts of this kind I've read, the answer was always immediately and resoundingly 'you should always know what type is coming over'. In many respects I do, since it's a static query, but it seems awfully klunky, and honestly unmaintainable, to define a set of values for i in which I want to do the conversion for each and every query I write. It would seem type testing would be the clean, encapsulable, and reusable answer here, no?
Any suggestions?
I'd use a small function like this:
def convert(s):
    try:
        return s.decode('latin-1').encode('utf8')
    except AttributeError:
        return s
and then
row = map(convert, row) # or a compr if you prefer that
The advantage is that it also handles types other than int automatically.
row = [i.decode('latin_1') if not isinstance(i,int) else i for i in row]
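Both answers above are written for Python 2, where database strings arrive as byte strings with a `decode` method. A rough Python 3 equivalent, assuming the driver hands back `bytes` for text columns (which depends on the driver), can test for `bytes` explicitly. The name `to_text` and the sample row are hypothetical.

```python
def to_text(value, encoding="latin-1"):
    """Decode byte strings; pass every other type through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical row mixing an int, Latin-1 bytes, NULL and a float.
row = [42, b"caf\xe9", None, 3.5]
converted = [to_text(v) for v in row]
print(converted)
```

A list comprehension is used here because `map` returns a lazy iterator in Python 3 rather than a list.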

Django, database retrieval not working but deleting fields and adding new fields is working

I have been able to get my database queries to work properly for deleting existing entries and also adding new entries to the database but I am completely stumped as to why I am unable to retrieve anything from my database. I am trying a query such as:
from web1.polls.models import Poll
retquery = Poll.objects.all()
print retquery
--prints: "[ ]"
Also, if I try this, it just returns "poll object"
from web1.polls.models import Poll
retquery = Poll.objects.all()[0]
print retquery
--prints: "poll object"
I have looked at everything and there are definitely entries in the database, I have tried this with a number of different models where everything else is working otherwise so I don't know what I can do at this point, any advice is greatly appreciated
If you provide a proper __unicode__ method on your model, you won't have this problem.
http://www.djangoproject.com/documentation/models/str/
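To illustrate the point without a Django project, here is a sketch with plain Python 3 classes. `PlainPoll` and `NamedPoll` are hypothetical stand-ins for a model, and `__str__` plays the role that `__unicode__` plays on Python 2 Django models.

```python
class PlainPoll(object):
    """No string method: printing shows only a generic object description."""

    def __init__(self, question):
        self.question = question


class NamedPoll(PlainPoll):
    """Same data, but printing shows the question text."""

    def __str__(self):
        return self.question


plain = PlainPoll("What's up?")
named = NamedPoll("What's up?")

print(plain)  # generic description that includes the class name
print(named)
```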
I finally figured this out. I am leaving this up since it can be really confusing for someone fairly new to Python, such as myself, and Django's docs don't make it entirely clear. I'm used to printing out an object and having it list everything in it, but the object returned here does not print that way. Instead, you need to take an object from the query result and add the name of the field to the end in order to access it, such as:
If I have a field named "question" that I want to retrieve:
retquery = Poll.objects.all()[0]
print retquery.question
This will work, whereas the way I was doing it before printed nothing useful, making me think that the returned object was empty.
