I want to assign a string of characters to a variable, but Python rejects it. There isn't much code to show.
I have a string that I want to assign to a variable:
d="stunning:/ˈstʌnɪŋ/"
Unsupported characters in input
or
word="stuning:/ˈstraɪkɪŋ/"
Unsupported characters in input
So basically the interpreter doesn't let me assign it to a variable, so I can't work with it in code.
How can I extract or delete those characters from the text, or is there anything I can do so that Python will support this kind of input?
I've tried converting it to other formats like ANSI, UTF, etc., but without success.
P.S.: I'm using Python 2.7
Set the source file encoding according to the actual encoding of the file, so that the interpreter knows how to parse it.
For instance, if you use UTF-8, just add this line to the header of the file:
# -*- coding: utf8 -*-
It must be the first or the second line of the file. See PEP 0263: Defining Python Source Code Encodings.
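To see the "first or second line" rule in action, here is a minimal sketch using Python 3's standard tokenize.detect_encoding. Note the assumptions: it runs on Python 3 (where the fallback is UTF-8, not ASCII as in Python 2.7), and the helper name declared_encoding and the two sample sources are made up for illustration.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3's tokenizer detects for raw source bytes."""
    # detect_encoding reads at most two lines, looking for a BOM or a
    # PEP 263 coding comment, and falls back to 'utf-8' when none is found.
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical sources: the coding comment on line 2, and on line 3 (too late).
with_cookie = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = 1\n"
cookie_too_late = b"import os\nx = 1\n# -*- coding: latin-1 -*-\n"

print(declared_encoding(with_cookie))      # normalized name for latin-1
print(declared_encoding(cookie_too_late))  # comment ignored, default used
```

The tokenizer reports normalized codec names, so latin-1 comes back as iso-8859-1.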
Just a hint (waiting for the actual code): prepend u to the string to mark it as unicode.
u"/ˈstraɪkɪŋ/"
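If the goal really is to delete the non-ASCII characters, as the question asks, one possible sketch is below. The string is written with \u escapes so the example does not depend on how the file is saved; strip_non_ascii is a hypothetical helper name, and the escapes are my reading of the phonetic symbols in the question.

```python
def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


# The dictionary entry from the question, using ASCII-only Unicode escapes.
entry = u"stunning:/\u02c8st\u028cn\u026a\u014b/"

# Split the headword from the pronunciation at the first colon.
word, _, pronunciation = entry.partition(u":")

print(word)
print(strip_non_ascii(entry))
```

Keep in mind that stripping loses information: the phonetic transcription is mostly non-ASCII, so very little of it survives.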
Related
Is there any way to define a string with accented letters in python?
An extreme example is this one:
message = "ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
Error:
SyntaxError: Non-UTF-8 code starting with '\xc2'
When source code contains anything other than ASCII, you have to add a line to tell the Python interpreter:
#!/usr/bin/env python
# encoding: utf-8
Read more in PEP-0263 for the exact rules how to include the encoding hint in a magic comment.
If you use Python 3.x you can use accented (Unicode) strings without doing anything special. If you are using Python 2.x, use the u prefix to denote Unicode:
message = u"ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
Also remember to include the following line at the top of your script:
# coding=utf-8
PEP-0263 explains this in detail:
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
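The magic comment is matched by a regular expression given in PEP 263, which is why several spellings work. Here is a small sketch that applies that pattern to a few comment styles; the helper name find_coding and the sample lines are only illustrations.

```python
import re

# The pattern PEP 263 gives for recognizing a coding comment.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_coding(line):
    """Return the declared encoding name, or None if the line declares none."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


samples = [
    "# -*- coding: utf-8 -*-",
    "# coding=utf-8",
    "# encoding: utf-8",
    "# vim: set fileencoding=latin-1 :",
    "coding = 'utf-8'",  # not a comment, so it does not count
]
for line in samples:
    print(repr(line), "->", find_coding(line))
```

The pattern only looks for "coding" followed by ":" or "=", so "encoding:" and "fileencoding=" happen to match as well.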
suffixes = {
1: ["ो", "े", "ू", "ु", "ी", "ि", "ा"]}
When I run this,
the message given by IDLE is
Unsupported characters in input
Also, the characters are not displayed in the proper font in the MS-DOS console.
What encoding is your source file in?
If it is UTF8, put the comment
# -*- coding: utf-8 -*-
at the top of the file.
If you don't declare an encoding in the first or second line of your Python source file, the Python 2 interpreter will use ASCII to decode the characters in the file. Since the characters you used can't be decoded as ASCII, the error occurs.
The solution is as @RemcoGerlich said. Here is the doc.
The encoding is used for all lexical analysis, in particular to find the end of a string, and to interpret the contents of Unicode literals. String literals are converted to Unicode for syntactical analysis, then converted back to their original encoding before interpretation starts. The encoding declaration must appear on a line of its own.
This seems to be a known bug in the 2.x IDLE console: http://bugs.python.org/issue15809. A fix was made for Python 3.x, but doesn't appear to be backported.
Instead, use an alternative console, such as IPython/Jupyter, or a fully-fledged IDE, such as PyCharm.
Say I have a function:
def NewFunction():
    return '£'
I want to print some stuff with a pound sign in front of it, but when I try to run this program, this error message is displayed:
SyntaxError: Non-ASCII character '\xa3' in file 'blah' but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
Can anyone inform me how I can include a pound sign in my return function? I'm basically using it in a class and it's within the '__str__' part that the pound sign is included.
I'd recommend reading the PEP that the error points you to. The problem is that your code is being read as ASCII, but the pound symbol is not an ASCII character. Try using UTF-8 encoding. You can start by putting # -*- coding: utf-8 -*- at the top of your .py file. To get more advanced, you can also define encodings on a string by string basis in your code. However, if you are trying to put the pound sign literal into your code, you'll need an encoding that supports it for the entire file.
Adding the following two lines at the top of my .py script worked for me (first line was necessary):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
First add the # -*- coding: utf-8 -*- line to the beginning of the file and then use u'foo' for all your non-ASCII unicode data:
def NewFunction():
    return u'£'
or use the magic available since Python 2.6 to make it automatic:
from __future__ import unicode_literals
The error message tells you exactly what's wrong. The Python interpreter needs to know the encoding of the non-ASCII character.
If you want to return U+00A3 then you can say
return u'\u00a3'
which represents this character in pure ASCII by way of a Unicode escape sequence. If you want to return a byte string containing the literal byte 0xA3, that's
return b'\xa3'
(where in Python 2 the b is implicit; but explicit is better than implicit).
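To make the difference between the text character and its bytes concrete, here is a short sketch. The escape keeps the source pure ASCII, so no coding comment is needed; format_price is a hypothetical helper, not something from the question.

```python
POUND = u"\u00a3"  # U+00A3 POUND SIGN, written with an ASCII-only escape


def format_price(amount):
    """Return the amount with a leading pound sign, as text."""
    return u"{}{:.2f}".format(POUND, amount)


label = format_price(3.5)

# The same single character becomes different bytes in different encodings.
print(len(POUND))
print(POUND.encode("utf-8"))    # two bytes: C2 A3
print(POUND.encode("latin-1"))  # one byte: A3
print(label.encode("utf-8"))
```

This is why the byte 0xA3 alone does not tell Python which character you meant: it is a pound sign only if the file is in Latin-1 or a compatible encoding.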
The linked PEP in the error message instructs you exactly how to tell Python "this file is not pure ASCII; here's the encoding I'm using". If the encoding is UTF-8, that would be
# coding=utf-8
or the Emacs-compatible
# -*- coding: utf-8 -*-
If you don't know which encoding your editor uses to save this file, examine it with something like a hex editor and some googling. The Stack Overflow character-encoding tag has a tag info page with more information and some troubleshooting tips.
In so many words, outside of the 7-bit ASCII range (0x00-0x7F), Python can't and mustn't guess what string a sequence of bytes represents. https://tripleee.github.io/8bit#a3 shows 21 possible interpretations for the byte 0xA3 and that's only from the legacy 8-bit encodings; but it could also very well be the first byte of a multi-byte encoding. But in fact, I would guess you are actually using Latin-1, so you should have
# coding: latin-1
as the first or second line of your source file. Anyway, without knowledge of which character the byte is supposed to represent, a human would not be able to guess this, either.
A caveat: coding: latin-1 will definitely remove the error message (because there are no byte sequences which are not technically permitted in this encoding), but might produce completely the wrong result when the code is interpreted if the actual encoding is something else. You really have to know the encoding of the file with complete certainty when you declare the encoding.
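A quick sketch of that caveat, assuming the file was really saved as UTF-8 but is read as Latin-1. Nothing raises, yet the text is wrong.

```python
# The pound sign as a UTF-8 editor would save it: the two bytes C2 A3.
saved_as_utf8 = u"\u00a3".encode("utf-8")

as_utf8 = saved_as_utf8.decode("utf-8")      # the intended reading
as_latin1 = saved_as_utf8.decode("latin-1")  # no error, but wrong

print([hex(ord(c)) for c in as_utf8])    # one character
print([hex(ord(c)) for c in as_latin1])  # two characters

# Latin-1 maps every possible byte value to a character, so decoding
# can never fail, whatever the real encoding of the data is.
every_byte = bytearray(range(256)).decode("latin-1")
print(len(every_byte))
```

So a missing error with latin-1 proves nothing about the file; it only means the check was switched off.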
Adding the following two lines in the script solved the issue for me.
#!/usr/bin/python
# coding=utf-8
Hope it helps !
You're probably trying to run a Python 3 file with a Python 2 interpreter. Currently (as of 2019), the python command defaults to Python 2 when both versions are installed, on Windows and most Linux distributions.
But if you are indeed working on a Python 2 script, a solution not yet mentioned on this page is to resave the file in UTF-8 with BOM encoding. That adds three special bytes to the start of the file, which explicitly inform the Python interpreter (and your text editor) about the file encoding.
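For the curious, the three BOM bytes can be inspected from Python itself. This is only a sketch of what "UTF-8 with BOM" means at the byte level, using the standard utf-8-sig codec; the one-line sample source is made up.

```python
import codecs

# A hypothetical one-line source file containing a literal pound sign.
source_text = u"price = u'\u00a3'\n"

# 'utf-8-sig' writes the BOM (EF BB BF) in front of the UTF-8 data.
with_bom = source_text.encode("utf-8-sig")

print(with_bom[:3] == codecs.BOM_UTF8)
# Decoding with 'utf-8-sig' strips the BOM again...
print(with_bom.decode("utf-8-sig") == source_text)
# ...while plain 'utf-8' leaves it in the text as U+FEFF.
print(with_bom.decode("utf-8")[0] == u"\ufeff")
```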
I have a Python script in which I specify an argument:
parser = optparse.OptionParser()
parser.add_option("-D", "--departure", dest="departure",default="", type="string",help="specify departure")
and in my script I have to do a few things with the string entered.
When I type: -D "Düsseldorf"
the string is not recognized properly in the script.
Somebody told me to use u"Düsseldorf", but I need to store "Düsseldorf" in a variable,
something like variable = u+"Düsseldorf" ... hmm, I really don't know how to do that.
Thank you for your help.
Regards.
PEP-0263 explains how to declare the source encoding so you can use Unicode in Python scripts.
Or, for lazy ones, start your script with:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
print u"Düsseldorf"
And do not forget to save it as UTF-8 without BOM.
Not only do you need to specify a character encoding for your Python source that can represent the ü character:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
But you also need to keep in mind that command line arguments (in Unix at least, I can't speak for Windows) are bytes. So you should specify the option as a byte-string not a character (Unicode) string.
For example:
parser.add_option("-D", "--departure", dest="departure",
default=u"Düsseldorf".encode('UTF-8'),
type="string",help="specify departure")
Now the default argument is a byte-string, just like all the other arguments you have passed to the add_option method.
Additionally you must ensure that if someone enters this string into their terminal, they do so with a terminal character encoding of UTF-8. If they use a different terminal character encoding, a different byte-string will show up in the command line. This is simply how Unix works, and Python has no power to change it.
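To get from the byte string on the command line to a Unicode value stored in a variable, which is what the question asks for, you could decode the option after parsing. A minimal sketch, assuming the terminal sends UTF-8; to_text and parse_departure are hypothetical helper names.

```python
import optparse


def to_text(value, encoding="utf-8"):
    """Decode a byte-string argument; pass text through unchanged."""
    # On Python 2 command-line arguments are byte strings (bytes is str);
    # on Python 3 they already arrive as text.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def parse_departure(argv, encoding="utf-8"):
    """Parse -D/--departure from argv and return it as Unicode text."""
    parser = optparse.OptionParser()
    parser.add_option("-D", "--departure", dest="departure", default="",
                      type="string", help="specify departure")
    options, _args = parser.parse_args(argv)
    return to_text(options.departure, encoding)


# The UTF-8 bytes for the city name decode to text with U+00FC in it.
city = to_text(b"D\xc3\xbcsseldorf")
print(len(city))
print(parse_departure(["-D", "Berlin"]))
```

In a real script you would call parse_departure(sys.argv[1:]). If the terminal uses a different encoding, the encoding argument has to match it.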
I have a python file that contains a long string of HTML. When I compile & run this file/script I get this error:
SyntaxError: Non-ASCII character '\x92' in file C:\Users...\GlobalVars.py on line 2509, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
I have followed the instructions and gone to the url suggested. But putting something like this at the top of my script still doesn't work:
#!/usr/bin/python
# -*- coding: latin-1 -*-
What do you think I can do to stop this compiler error from occurring?
First, to prevent problems like the one in the question, you should never use any encoding other than UTF-8 for Python source code.
This is the correct header to use
#! /usr/bin/env python
# -*- coding: utf-8 -*-
Now you have to convert the file from whatever encoding it is currently in to UTF-8; your text editor is probably able to do that.
If you wonder why I say this remember that it is impossible for a text editor to safely guess your non-unicode encoding because there is no BOM for non-unicode. For this reason most decent editors are using UTF-8 as default even when encoding is not specified. And BTW, the encoding specified in the python file header is for Python only, most editors ignore what you wrote there.
Also, as you can see Python is trying to decode a character above 128 using ASCII (not latin-1), this is supposed to fail. I am not sure why this happens but I don't even care too much because there is a much better way to solve the problem.
It must be at the top of the script that has the non-ASCII text, and it must match the actual encoding of the file. \x92 is CP1252, not Latin-1.
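You can check what the byte 0x92 means under each candidate encoding directly. A small sketch; decode_byte is a hypothetical helper name.

```python
def decode_byte(byte, encoding):
    """Return the code point as 'U+XXXX', or None if the byte is invalid."""
    try:
        char = byte.decode(encoding)
    except UnicodeDecodeError:
        return None
    return "U+{:04X}".format(ord(char))


# cp1252 gives a printable quotation mark, latin-1 a C1 control character,
# and utf-8 rejects a lone 0x92 byte altogether.
for name in ("cp1252", "latin-1", "utf-8"):
    print(name, decode_byte(b"\x92", name))
```

U+2019 is the right single quotation mark, which is what word processors and web pages typically produce for an apostrophe.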
If you just want to get rid of this error without getting into the details (which you can get from the other answers on this page), you can do the following:
1) Copy your code and paste it in Notepad++
2) Select Encoding -> Encode in UTF-8
3) Select View -> Show Symbol -> Show All Characters
Now you can see which symbol is causing the issue (x92 will be visible). Replace or remove it to solve the problem.
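The same hunt can be done without an editor. The sketch below scans raw bytes and reports where anything above 0x7F sits; find_non_ascii and the sample data are made up for illustration.

```python
def find_non_ascii(data):
    """Return (line_number, column, byte_value) for each byte above 0x7F."""
    hits = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        # bytearray yields integers on both Python 2 and Python 3.
        for column, value in enumerate(bytearray(line), start=1):
            if value > 0x7F:
                hits.append((line_number, column, value))
    return hits


# Hypothetical file content with a Windows-1252 apostrophe in the HTML.
sample = b"html = '<p>It\x92s fine</p>'\nx = 1\n"

for line_number, column, value in find_non_ascii(sample):
    print("line %d, column %d: 0x%02X" % (line_number, column, value))
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the result in, so that no decoding happens before the scan.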
Found this and hope it's helpful to the next person:
http://www.sitepoint.com/forums/showthread.php?567734-Anyone-know-what-this-error-means
Code point 0x92 (146 decimal) is the right single quotation mark, or
apostrophe (’) in Windows-1252. It's an invalid character in ISO 8859
and in UTF-8, since the 0x80-0x9F range is reserved for C1 control
characters.
Not sure if I'm busting copyright. If so please remove the blockquote.
The encoding declaration indicates that you think the file is in latin-1 encoding, but the Python interpreter is finding a char at or very near line 2509 in GlobalVars.py that is not what you think it is.
You should first confirm the encoding of GlobalVars.py. Is it really latin-1?
Next, you should check the characters near line 2509. Are they also latin-1, or were they cut and pasted from a web page or somewhere else (maybe there are UTF-8 chars mixed up in there)?
If you have chars in your source file that aren't what you think they are, then you may need to clean up the file before going any further.
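One rough way to confirm the encoding is to try a few strict codecs and see which ones accept the bytes. This is only a sketch: strict_candidates is a hypothetical helper, the candidate list is an assumption, and latin-1 is left out on purpose because it accepts any byte sequence.

```python
def strict_candidates(data, encodings=("ascii", "utf-8", "cp1252")):
    """Return the encodings from the list that decode data without error."""
    accepted = []
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


# A Windows-1252 apostrophe is rejected by ascii and utf-8.
print(strict_candidates(b"It\x92s"))
# UTF-8 bytes for a pound sign are also valid (but wrong) cp1252.
print(strict_candidates(b"\xc2\xa3"))
```

A codec that accepts the data is only a candidate, not proof; you still have to look at the decoded text and see whether it reads correctly.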
add these lines on top of your code
#! /usr/bin/env python
# -*- coding: utf-8 -*-
An easy workaround solution if your file is really in latin-1 is to change the html string with its representation.
Afaik:
\x92 => 146 in decimal => Æ => &AElig;
If your character is not Æ, then your file is not encoded in latin-1 ;-) (and you might want to check whether utf-8/cp1252 works better as a quick win)
EDIT:
Of course, you want to check your ACTUAL file encoding before trying. I might be wrong, not 100% sure \x92 is Æ in Iso8859-1 : according to this page, it doesn't seem defined.