pygtk spinbutton "greek" floating point - python

I'm trying to use the data collected by a form in a SQLite query. In this form I've made a spin button which accepts any numeric input (i.e. either 2,34 or 2.34) but returns it as 2,34, which Python sees as a str.
I've already tried to float() the value but it doesn't work. It seems to be a locale problem, but somehow locale.setlocale(locale.LC_ALL, '') is reported as unsupported (on WinXP).
All this happens even though I haven't set anything to Greek (language, locale, etc.), but somehow Windows does its magic.
Can someone help?
PS: Of course my script starts with # -*- coding: utf-8 -*- so that I can have Greek text (even in comments) in the code.

AFAIK, WinXP supports setlocale just fine.
If you want to do locale-aware conversions, try using locale.atof('2,34') instead of float('2,34').
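A minimal sketch of that approach, assuming the process actually runs under a Greek (or any comma-decimal) locale; the spin button value is represented here by a literal string:

import locale

# Use whatever numeric locale the OS/environment is configured with.
locale.setlocale(locale.LC_NUMERIC, '')

raw_value = '2,34'              # what the spin button hands back as a str
value = locale.atof(raw_value)  # locale-aware parse -> 2.34 as a float
print(value)

If setlocale really does fail on that machine, a cruder fallback is float(raw_value.replace(',', '.')).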

python 3 can't recognize this character

I'm using Python 3 to read through a string and extract certain elements into a list, with the following at the top of my script:
# -*- coding: utf-8 -*-
import ast
import re
It works for everything except one character: 󀕅, which in Unicode is U+C0545 and on the command line looks like:
I would just like to skip this character, but the script can't recognize it. Is there any way to skip this character?
I don't know whether this should be taken as an authoritative source, but http://www.fileformat.info/info/unicode/char/c0545/index.htm indicates that this is not a valid unicode character. Some systems may choose to represent it using some placeholder glyph, others may raise an error or behave in other strange ways.
In your python code, your best bet may be to handle the exception and do what is appropriate in the context.
Without seeing the actual source where the exception happens and the actual exception text, it's hard to guess what's really wrong.
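If the goal is simply to drop that character rather than crash, here is a small sketch; the file name and the errors='replace' handler are assumptions, not taken from the question:

# Read with a forgiving error handler, then strip the specific code point.
with open('input.txt', encoding='utf-8', errors='replace') as handle:
    text = handle.read()

cleaned = text.replace('\U000C0545', '')  # drop the unassigned code point U+C0545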

django countries encoding is not giving correct name

I am using the django_countries module for a country list; the problem is that a couple of countries have special characters, like 'Åland Islands' and 'Saint Barthélemy'.
I am calling this method to get the country name:
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
I know that country_label is a lazily translated proxy object from django.utils, but it is not giving the right name; instead it gives 'Ã…land Islands'. Any suggestions?
Django stores unicode strings as sequences of code points and identifies them as unicode for further processing.
UTF-8 encodes each code point as one to four 8-bit bytes, so the unicode string Django is working with needs to be encoded from code points to its UTF-8 byte representation at some point.
In the case of Åland Islands, what seems to be happening is that the UTF-8 bytes are being interpreted as code points when the string is converted.
The string django_countries returns is most likely u'\xc5land Islands', where \xc5 is the Unicode code point notation of Å. In UTF-8 byte notation \xc5 becomes \xc3\x85, where \xc3 and \x85 are each an 8-bit byte. See:
http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc5&mode=hex
Or you can use country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('utf-8') to go from u'\xc5land Islands' to '\xc3\x85land Islands'
If you then take each byte and use it as a code point, you'll see it gives you these characters: Ã…
See: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=xc3&mode=hex
And: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=x85&mode=hex
See the following snippet for the HTML notation of these characters.
<div id="test">&#xc3;&#x85;</div>
So I'm guessing you have two different encodings in your application. One way to get from u'\xc5land Islands' to u'\xc3\x85land Islands' would be, in a UTF-8 environment, to encode to UTF-8 (which turns u'\xc5' into '\xc3\x85') and then decode back to unicode from ISO-8859-1, which gives u'\xc3\x85land Islands'. But since that isn't in the code you're providing, I'm guessing it's happening somewhere between the moment you set country_label and the moment your output is displayed incorrectly, either automatically because of encoding settings, or through an explicit assignment somewhere.
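A short demonstration of that double-encoding round trip (Python 3 syntax; the cp1252 codec on the "wrong" side is an assumption, chosen because it maps \x85 to '…'):

label = 'Åland Islands'
mangled = label.encode('utf-8').decode('cp1252')  # UTF-8 bytes re-read with the wrong codec
print(mangled)                                    # Ã…land Islands -- the garbled text from the question
fixed = mangled.encode('cp1252').decode('utf-8')  # undo the accidental round trip
print(fixed)                                      # Åland Islands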
FIRST EDIT:
To set the encoding for your app, add # -*- coding: utf-8 -*- at the top of your .py file and <meta charset="UTF-8"> in the <head> of your template.
And to get a unicode string from a django.utils.functional.proxy object you can call unicode() on it, like this:
country_label = unicode(fields.Country(form.cleaned_data.get('country')[0:2]).name)
SECOND EDIT:
One other way to figure out where the problem is would be to use force_bytes (https://docs.djangoproject.com/en/1.8/ref/utils/#module-django.utils.encoding), like this:
from django.utils.encoding import force_bytes
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name
forced_country_label = force_bytes(country_label, encoding='utf-8', strings_only=False, errors='strict')
But since you already tried many conversions without success, maybe the problem is more complex. Can you share your version of django_countries, Python and your django app language settings?
You can also look directly in your django_countries package (it should be in your Python directory), find the file data.py and open it to see what it looks like. Maybe the data itself is corrupted.
Try:
from __future__ import unicode_literals  # place as the first import
AND / OR
country_label = fields.Country(form.cleaned_data.get('country')[0:2]).name.encode('latin1').decode('utf8')
Just this week I encountered a similar encoding error. I believe the problem is that the machine's encoding differs from the one Python uses. Try adding this to your .bashrc or .zshrc.
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
Then, open up a new terminal and run the Django app again.

Python and UTF-8: kind of confusing

I am on Google App Engine with Python 2.5. My application has to deal with multiple languages, so I have to deal with UTF-8.
I have done lots of googling but don't get what I want.
1. What's the usage of # -*- coding: utf-8 -*-?
2. What is the difference between
s=u'Witaj świecie'
s='Witaj świecie'
'Witaj świecie' is a UTF-8 string.
3. When I save the .py file as UTF-8, do I still need the u before every string?
u'blah' turns it into a different kind of string (type unicode rather than type str): it makes it a sequence of Unicode code points. Without it, it is a sequence of bytes. Only bytes can be written to disk or to a network stream, but you generally want to work in Unicode (although Python, and some libraries, will do some of the conversion for you); the encoding (utf-8) is the translation between the two. So, yes, you should use the u in front of all your literals; it will make your life much easier. See Pragmatic Unicode for a better explanation.
The coding line tells Python what encoding your file is in, so that Python can understand it. Again, reading from disk gives bytes - but Python wants to see the characters. In Py2, the default encoding for code is ASCII, so the coding line lets you put things like ś directly in your .py file in the first place - other than that, it doesn't change how your code works.
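A small Python 2 sketch of the difference, assuming the .py file itself is saved as UTF-8 and carries the coding line:

# -*- coding: utf-8 -*-
s_bytes = 'Witaj świecie'    # str: a sequence of bytes, here the UTF-8 encoding of the text
s_text = u'Witaj świecie'    # unicode: a sequence of code points

print type(s_bytes), len(s_bytes)  # <type 'str'> 14  (ś occupies two bytes in UTF-8)
print type(s_text), len(s_text)    # <type 'unicode'> 13

# Conversions between the two are always explicit:
print s_bytes.decode('utf-8') == s_text  # True
print s_text.encode('utf-8') == s_bytes  # True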

SyntaxError: Non-ASCII character '\xa3' in file when function returns '£'

Say I have a function:
def NewFunction():
    return '£'
I want to print some stuff with a pound sign in front of it. When I try to run this program, this error message is displayed:
SyntaxError: Non-ASCII character '\xa3' in file 'blah' but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
Can anyone inform me how I can include a pound sign in my return function? I'm basically using it in a class and it's within the '__str__' part that the pound sign is included.
I'd recommend reading that PEP the error gives you. The problem is that your code is trying to use the ASCII encoding, but the pound symbol is not an ASCII character. Try using UTF-8 encoding. You can start by putting # -*- coding: utf-8 -*- at the top of your .py file. To get more advanced, you can also define encodings on a string by string basis in your code. However, if you are trying to put the pound sign literal in to your code, you'll need an encoding that supports it for the entire file.
Adding the following two lines at the top of my .py script worked for me (first line was necessary):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
First add the # -*- coding: utf-8 -*- line to the beginning of the file and then use u'foo' for all your non-ASCII unicode data:
def NewFunction():
    return u'£'
or use the magic available since Python 2.6 to make it automatic:
from __future__ import unicode_literals
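Putting the pieces together for the __str__ case mentioned in the question, a hedged sketch (Python 2; the Price class and its field are made up for illustration):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

class Price(object):
    def __init__(self, amount):
        self.amount = amount

    def __str__(self):
        # In Python 2, __str__ must return a byte string, so encode explicitly.
        return (u'£%.2f' % self.amount).encode('utf-8')

print Price(9.99)  # £9.99 on a UTF-8 terminal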
The error message tells you exactly what's wrong. The Python interpreter needs to know the encoding of the non-ASCII character.
If you want to return U+00A3 then you can say
return u'\u00a3'
which represents this character in pure ASCII by way of a Unicode escape sequence. If you want to return a byte string containing the literal byte 0xA3, that's
return b'\xa3'
(where in Python 2 the b is implicit; but explicit is better than implicit).
The linked PEP in the error message instructs you exactly how to tell Python "this file is not pure ASCII; here's the encoding I'm using". If the encoding is UTF-8, that would be
# coding=utf-8
or the Emacs-compatible
# -*- coding: utf-8 -*-
If you don't know which encoding your editor uses to save this file, examine it with something like a hex editor and some googling. The Stack Overflow character-encoding tag has a tag info page with more information and some troubleshooting tips.
In so many words, outside of the 7-bit ASCII range (0x00-0x7F), Python can't and mustn't guess what string a sequence of bytes represents. https://tripleee.github.io/8bit#a3 shows 21 possible interpretations for the byte 0xA3 and that's only from the legacy 8-bit encodings; but it could also very well be the first byte of a multi-byte encoding. But in fact, I would guess you are actually using Latin-1, so you should have
# coding: latin-1
as the first or second line of your source file. Anyway, without knowledge of which character the byte is supposed to represent, a human would not be able to guess this, either.
A caveat: coding: latin-1 will definitely remove the error message (because there are no byte sequences which are not technically permitted in this encoding), but might produce completely the wrong result when the code is interpreted if the actual encoding is something else. You really have to know the encoding of the file with complete certainty when you declare the encoding.
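A quick way to see that caveat in action is to decode the same bytes with two different codecs (the byte values here are just for illustration):

pound_utf8 = b'\xc2\xa3'             # the UTF-8 encoding of '£'
print(pound_utf8.decode('utf-8'))    # £   -- correct
print(pound_utf8.decode('latin-1'))  # Â£  -- no error, but the wrong characters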
Adding the following two lines at the top of the script solved the issue for me.
#!/usr/bin/python
# coding=utf-8
Hope it helps!
You're probably trying to run Python 3 file with Python 2 interpreter. Currently (as of 2019), python command defaults to Python 2 when both versions are installed, on Windows and most Linux distributions.
But in case you're indeed working on a Python 2 script, a solution not yet mentioned on this page is to resave the file in UTF-8 with BOM encoding. That will add three special bytes to the start of the file, which explicitly inform the Python interpreter (and your text editor) about the file encoding.
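For reference, the three bytes a UTF-8 BOM adds are available from the standard library (a tiny sketch, not tied to any particular editor):

import codecs
print(repr(codecs.BOM_UTF8))  # the three BOM bytes: 0xEF 0xBB 0xBF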

If a command line program is unsure of stdout's encoding, what encoding should it output?

I have a command line program written in Python, and when I pipe it through another program on the command line, sys.stdout.encoding is None. This makes sense, I suppose -- the output could be going to another program, or to a file you're redirecting it into, or whatever, and it doesn't know what encoding is desired. But neither do I! This program will be used by many different people (humor me) in different ways. Should I play it safe and output only ASCII (replacing non-ASCII chars with question marks)? Or should I output UTF-8, since it's so widespread these days?
I suggest you use the current locale.
Python2> import locale
Python2> locale.getpreferredencoding()
'UTF-8'
The system knows what it should be, and the other side, if it also uses the current locale, will do the right thing.
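If you go that route, one hedged sketch of wrapping stdout when its encoding is unknown (Python 2 style, to match the prompt above):

import codecs
import locale
import sys

if sys.stdout.encoding is None:
    # Piped or redirected: fall back to the user's preferred locale encoding.
    preferred = locale.getpreferredencoding()
    sys.stdout = codecs.getwriter(preferred)(sys.stdout)

print(u'Witaj \u015bwiecie')  # now encoded with the locale's encoding on the way out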
You should use the value returned by locale.getpreferredencoding().
If your application doesn't really deal with a whole lot of internationalisation, ASCII should suffice. But if not, I'd say UTF-8, or better still UTF-16, should be the order of the day.
You should output UTF-8 because that's what everyone should be using. It's a bug not to be. ;)
