os.system('sh ~/scripts/bingo.sh')
this gives an error of Non ascii characters.
import subprocess
subprocess.call(["sh","/full/path/bingo.sh"])
"Python 2 uses ascii as the default encoding for source files, which means you must specify another encoding at the top of the file to use non-ascii unicode characters in literals. Python 3 uses utf-8 as the default encoding for source files, so this is less of an issue"
I was able to find this by googling "Python error 'Non ascii characters'"
From my understanding this is a good answer
How to make the python interpreter correctly handle non-ASCII characters in string operations?
hope this helps
(I am new to coding and was looking into this to help my own understanding of how to avoid this problem for myself. Take what I have to say with a grain of salt.)
Is there any way to define a string with accented letters in python?
An extreme example is this one:
message = "ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
Error:
SyntaxError: Non-UTF-8 code starting with '\xc2'
When souce code contains something else than ASCII, you have to add a line to tell the python interpreter:
#!/usr/bin/env python
# encoding: utf-8
Read more in PEP-0263 for the exact rules how to include the encoding hint in a magic comment.
If you use Python 3.x you can use accented (Unicode) strings without doing anything special. If you are using Python 2.x, use u prefix to denote Unicode:
message = u"ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
Also remember to include the following line at the top of your script:
# coding=utf-8
PEP-0263 explains this in detail:
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
I'm generating file names from a list pulled out from a postgres DB with Python 2.7.9. In this list there are words with special char. Normally I use ''.join() to record the name and fire it to my loader but I have just one name that want be recognized. the .py is set for utf-8 coding, but the words are in Portuguese, I think latin-1 coding.
from pydub import AudioSegment
from pydub.playback import play
templist = ['+ Orégano','- Búfala','+ Rúcola']
count_ins = (len(templist)-1)
while (count_ins >= 0 ):
kot_istructions = AudioSegment.from_ogg('/home/effe/voice_orders/Voz/'+"".join(templist[count_ins])+'.ogg')
count_ins-=1
play(kot_istructions)
The first two files are loaded:
/home/effe/voice_orders/Voz/+ Orégano.ogg
/home/effe/voice_orders/Voz/- Búfala.ogg
The third should be:
/home/effe/voice_orders/Voz/+ Rúcola.ogg
But python is trying to load
/home/effe/voice_orders/Voz/+ R\xc3\xbacola.ogg
Why just this one? I've tried to use normalize() to remove the accent but since this is a string the method didn't work.
Print works well, as db update. Just file name creation doesn't works as expected.
Suggestions?
It seems the root cause might be that the encoding of these names in inconsisitent within your database.
If you run:
>>> 'R\xc3\xbacola'.decode('utf-8')
You get
u'R\xfacola'
which is in fact a Python unicode, correctly representing the name. So, what should you do? Although it's a really unclean programming style, you could play .encode()/.decode() whackamole, where you try to decode the raw string from your db using utf-8, and failing that, latin-1. It would look something like this:
try:
clean_unicode = dirty_string.decode('utf-8')
except UnicodeDecodeError:
clean_unicode = dirty_string.decode('latin-1')
As a general rule, always work with clean unicode objects within your own source, and only convert to an encoding on saving it out. Also, don't let people insert data into a database without specifying the encoding, as that will stop you from having this problem in the first place.
Hope that helps!
Solved: Was a problem with the file. Deleting and build it again do the job.
Recently I have referenced a file in my desktop using python 3.4 64bit GUI application. The problem I have got is as follows:
The code I tried is :
fo=open("c:\users\Ismail Nuru\Sesktop\myfile\lab.txt","r+")
string1=fo.read()
print(string1)
fo.close()
Python is trying to use the \uXXXX part of your string as a unicode escape sequence.
To fix this, you have 3 options here:
Use a raw string r'C:\users\Ismail Nuru\Sesktop\myfile\lab.txt'
Double the backslashes 'C:\\users\\Ismail Nuru\\Sesktop\\myfile\\lab.txt'
Or use forward slashes 'C:/users/Ismail Nuru/Sesktop/myfile/lab.txt')
I know similar questions have been asked a million times, but despite reading through many of them I can't find a solution that applies to my situation.
I have a django application, in which I've created a management script. This script reads some text files, and outputs them to the terminal (it will do more useful stuff with the contents later, but I'm still testing it out) and the characters come out with escape sequences like \xc3\xa5 instead of the intended å. Since that escape sequence means Ã¥, which is a common misinterpretation of å because of encoding problems, I suspect there are at least two places where this is going wrong. However, I can't figure out where - I've checked all the possible culprits I can think of:
The terminal encoding is UTF-8; echo $LANG gives en_US.UTF-8
The text files are encoded in UTF-8; file * in the directory where they reside results in all entries being listed as "UTF-8 Unicode text" except one, which does not contain any non-ASCII characters and is listed as "ASCII text". Running iconv -f ascii -t utf8 thefile.txt > utf8.txt on that file yields another file with ASCII text encoding.
The Python scripts are all UTF-8 (or, in several cases, ASCII with no non-ASCII characters). I tried inserting a comment in my management script with some special characters to force it to save as UTF-8, but it did not change the behavior. The above observations on the text files apply on all Python script files as well.
The Python script that handles the text files has # -*- encoding: utf-8 -*- at the top; the only line preceding that is #!/usr/bin/python3, but I've tried both changing to .../python for Python 2.7 or removing it entirely to leave it up to Django, without results.
According to the documentation, "Django natively supports Unicode data", so I "can safely pass around Unicode strings" anywhere in the application.
I really can't think of anywhere else to look for a non-UTF-8 link in the chain. Where could I possibly have missed a setting to change to UTF-8?
For completeness: I'm reading from the files with lines = file.readlines() and printing with the standard print() function. No manual encoding or decoding happens at either end.
UPDATE:
In response to quiestions in comments:
print(sys.getdefaultencoding(), sys.stdout.encoding, f.encoding) yields ('ascii', 'UTF-8', None) for all files.
I started compiling an SSCCE, and quickly found that the problem is only there if I try to print the value in a tuple. In other words, print(lines[0].strip()) works fine, but print(lines[0].strip(), lines[1].strip()) does not. Adding .decode('utf-8') yields a tuple where both strings are marked with a prepending u and \xe5 (the correct escape sequence for å) instead of the odd characters before - but I can't figure out how to print them as regular strings, with no escape characters. I've tested another call to .decode('utf-8') as well as wrapping in str() but both fail with UnicodeEncodeError complaining that \xe5 can't be encoded in ascii. Since a single string works correctly, I don't know what else to test.
SSCCE:
# -*- coding: utf-8 -*-
import os, sys
for root,dirs,files in os.walk('txt-songs'):
for filename in files:
with open(os.path.join(root,filename)) as f:
print(sys.getdefaultencoding(), sys.stdout.encoding, f.encoding)
lines = f.readlines()
print(lines[0].strip()) # works
print(lines[0].strip(), lines[1].strip()) # does not work
The big problem here is that you're mixing up Python 2 and Python 3. In particular, you've written Python 3 code, and you're trying to run it in Python 2.7. But there are a few other problems along the way. So, let me try to explain everything that's going wrong.
I started compiling an SSCCE, and quickly found that the problem is only there if I try to print the value in a tuple. In other words, print(lines[0].strip()) works fine, but print(lines[0].strip(), lines[1].strip()) does not.
The first problem here is that the str of a tuple (or any other collection) includes the repr, not the str, of its elements. The simple way to solve this problem is to not print collections. In this case, there is really no reason to print a tuple at all; the only reason you have one is that you've built it for printing. Just do something like this:
print '({}, {})'.format(lines[0].strip(), lines[1].strip())
In cases where you already have a collection in a variable, and you want to print out the str of each element, you have to do that explicitly. You can print the repr of the str of each with this:
print tuple(map(str, my_tuple))
… or print the str of each directly with this:
print '({})'.format(', '.join(map(str, my_tuple)))
Notice that I'm using Python 2 syntax above. That's because if you actually used Python 3, there would be no tuple in the first place, and there would also be no need to call str.
You've got a Unicode string. In Python 3, unicode and str are the same type. But in Python 2, it's bytes and str that are the same type, and unicode is a different one. So, in 2.x, you don't have a str yet, which is why you need to call str.
And Python 2 is also why print(lines[0].strip(), lines[1].strip()) prints a tuple. In Python 3, that's a call to the print function with two strings as arguments, so it will print out two strings separated by a space. In Python 2, it's a print statement with one argument, which is a tuple.
If you want to write code that works the same in both 2.x and 3.x, you either need to avoid ever printing more than one argument, or use a wrapper like six.print_, or do a from __future__ import print_function, or be very careful to do ugly things like adding in extra parentheses to make sure your tuples are tuples in both versions.
So, in 3.x, you've got str objects and you just print them out. In 2.x, you've got unicode objects, and you're printing out their repr. You can change that to print out their str, or to avoid printing a tuple in the first place… but that still won't help anything.
Why? Well, printing anything, in either version, just calls str on it and then passes it to sys.stdio.write. But in 3.x, str means unicode, and sys.stdio is a TextIOWrapper; in 2.x, str means bytes, and sys.stdio is a binary file.
So, the pseudocode for what ultimately happens is:
sys.stdio.wrapped_binary_file.write(s.encode(sys.stdio.encoding, sys.stdio.errors))
sys.stdio.write(s.encode(sys.getdefaultencoding()))
And, as you saw, those will do different things, because:
print(sys.getdefaultencoding(), sys.stdout.encoding, f.encoding) yields ('ascii', 'UTF-8', None)
You can simulate Python 3 here by using a io.TextIOWrapper or codecs.StreamWriter and then using print >>f, … or f.write(…) instead of print, or you can explicitly encode all your unicode objects like this:
print '({})'.format(', '.join(element.encode('utf-8') for element in my_tuple)))
But really, the best way to deal with all of these problems is to run your existing Python 3 code in a Python 3 interpreter instead of a Python 2 interpreter.
If you want or need to use Python 2.7, that's fine, but you have to write Python 2 code. If you want to write Python 3 code, that's great, but you have to run Python 3.3. If you really want to write code that works properly in both, you can, but it's extra work, and takes a lot more knowledge.
For further details, see What's New In Python 3.0 (the "Print Is A Function" and "Text Vs. Data Instead Of Unicode Vs. 8-bit" sections), although that's written from the point of view of explaining 3.x to 2.x users, which is backward from what you need. The 3.x and 2.x versions of the Unicode HOWTO may also help.
For completeness: I'm reading from the files with lines = file.readlines() and printing with the standard print() function. No manual encoding or decoding happens at either end.
In Python 3.x, the standard print function just writes Unicode to sys.stdout. Since that's a io.TextIOWrapper, its write method is equivalent to this:
self.wrapped_binary_file.write(s.encode(self.encoding, self.errors))
So one likely problem is that sys.stdout.encoding does not match your terminal's actual encoding.
And of course another is that your shell's encoding does not match your terminal window's encoding.
For example, on OS X, I create a myscript.py like this:
print('\u00e5')
Then I fire up Terminal.app, create a session profile with encoding "Western (ISO Latin 1)", create a tab with that session profile, and do this:
$ export LANG=en_US.UTF-8
$ python3 myscript.py
… and I get exactly the behavior you're seeing.
It seems from your comment that you are using python-2 and not python-3.
If you are using python-3, it's worth reading the unicode howto guide on reading/writing to understand what python is doing.
The basic flow if encoding is:
DECODE from encoding to unicode -> Processing -> Encode from unicode to encoding
In python3 the bytes are decoded to strings and strings are encoded to bytes.
The bytes to string decoding is handled for you with open().
[..] the built-in open() function can return a file-like object that
assumes the file’s contents are in a specified encoding and accepts
Unicode parameters for methods such as read() and write(). This works
through open()‘s encoding and errors parameters [..]
So to read in unicode from a utf-8 encoded file you should be doing this:
# python-3
with open('utf8.txt', mode='r', encoding='utf-8') as f:
lines = f.readlines() # returns unicode
If you want similar functionality using python-2, you can use codecs.open():
# python-2
import codecs
with codecs.open('utf8.txt', mode='r', encoding='utf-8') as f:
lines = f.readlines() # returns unicode