UTF-8 minus sign rejected in python command-line arguments - python

I am running Python 2.6 on Ubuntu Lucid and having trouble getting the minus sign in negative command-line arguments interpreted properly, especially when the call to the script is initiated through the OS via Rails (using backquotes). In particular, the minus sign seems to be coming in as UTF-8.
When command-line arguments are interpreted manually, as in:
lng = float(sys.argv[4])
it triggers the error:
ValueError: invalid literal for float(): ‐122.768
As a hack, I can get around this by matching on the first three bytes as '\xe2', '\x80', and '\x90', and replacing them with my own negative sign.
When command-line arguments are interpreted through argparse (ver. 1.2.1), as in:
parser.add_argument('--coords', metavar='Coord', dest='coordinates', type=float, nargs=3, help='Latitude, Longitude, and Altitude')
it triggers the error:
sC.py: error: argument --coords: invalid float value: '\xe2\x80\x90122.76838'
Any help would be appreciated!

Your input data contains a Unicode character that isn't the standard ASCII hyphen.
import unicodedata as ud
data = '\xe2\x80\x90122.76838'
unicode_data = data.decode('utf8')
print repr(ud.name(unicode_data[0]))
print repr(ud.name(u'-')) # An ascii hyphen
Output:
'HYPHEN'
'HYPHEN-MINUS'
While they may look the same when printed, they are not. Restrict or sanitize the input.
print float(unicode_data.replace(u'\N{HYPHEN}',u'-'))
Output:
-122.76838
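In Python 3, where sys.argv elements are already str, the same sanitizing idea might look like the sketch below. The set of dash characters handled here is an assumption; extend it to match whatever your input source actually emits:

```python
def parse_signed_float(s):
    # Map common Unicode dash lookalikes to the ASCII hyphen-minus
    # before handing the string to float().
    for dash in ('\N{HYPHEN}', '\N{EN DASH}', '\N{EM DASH}', '\N{MINUS SIGN}'):
        s = s.replace(dash, '-')
    return float(s)

print(parse_signed_float('\u2010122.76838'))  # the HYPHEN from the question -> -122.76838
```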

You might have to use your hack and tell argparse to expect a string.
As far as Python, your system, and RoR are concerned, the ASCII HYPHEN-MINUS and the Unicode HYPHEN aren't related in any way. If you want to solve this problem (instead of hacking around it), you have to go up to the Rails code and see where it gets its data from. Somewhere along the line, typographically fancy output was deemed important.

Related

Python: Image's path as a raw string an input to a function

Python: I want to get an image from the user as input, with its path as a raw string. I used input() to get the path. Hard-coding the path as a raw string (prefixing it with r in source code) makes the program work, but prepending 'r' to the string before passing it to Image.open() just treats the r as part of the path and produces an error. Can someone help me resolve this problem?
path=input('Please enter the path of the image')
im=Image.open(path)
get an error as no file found
if i give..
y='r'+path
im=Image.open(y)
then the error is
OSError: [Errno 22] Invalid argument: 'rC:\\Users\\User\\Desktop\.......jpeg'
I am new to python, so please help me if there is any method by which I can solve this issue.
Raw strings are a programmer's convenience; your users don't need to enter raw strings as normal input.
See the end of this post for the solution to your problem. Because you said you are new to Python, I have decided to give a detailed answer here.
Why raw strings?
Normal strings assign special meaning to the \ (backslash) character. This is fine as \ can be escaped by using \\ (two backslashes) to represent a single backslash.
However, this can sometimes become ugly.
Consider, for example, a path: C:\Users\Abhishek\test.txt. To represent this as a normal string in Python, all \ must be escaped:
string = 'C:\\Users\\Abhishek\\test.txt'
You can avoid this by using raw strings. Raw strings don't treat \ specially.
string = r'C:\Users\Abhishek\test.txt'
That's it. This is the only use of raw strings, viz., convenience.
Solution
If you are using Python 2, use raw_input instead of input. If you are using Python 3 (as you should be) input is fine. Don't try to input the path as a raw string.
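A short sketch (Python 3 here) of the point above: the r prefix is source-code syntax only, and whatever a user types into input() is already taken literally:

```python
# Both literals denote the identical string; r'' only changes how
# the *source code* is read, not what the string contains.
raw_literal = r'C:\Users\Abhishek\test.txt'
normal_literal = 'C:\\Users\\Abhishek\\test.txt'
print(raw_literal == normal_literal)  # True

# input() does no escape processing, so a user typing C:\new
# produces these six literal characters -- no newline appears.
typed = ''.join(['C', ':', '\\', 'n', 'e', 'w'])  # simulates the keystrokes
print(typed == 'C:\\new')  # True
print('\n' in typed)       # False
```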

Python - How can I convert a special character to the unicode representation?

In a dictionary, I have the following value with equals signal:
{"appVersion":"o0u5jeWA6TwlJacNFnjiTA=="}
To be explicit, I need to replace the = with its Unicode escape representation '\u003d' (basically the reverse of json.loads()). How can I set the value on a variable without storing it with two escapes (\\u003d)?
I've tried different ways, including encode/decode, repr(), unichr(61), etc., and even after a lot of searching I couldn't find anything that does this; every approach gives me the following final result (or the original string):
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
Thanks in advance for your attention.
EDIT
When I debug the code, the variable's value shows two escapes. The program will take this value and use it in the following actions, extra escape included. I'm using json.dumps() to build a JSON string, and the result it returns contains two escapes.
Below is a print of the final result after the JSON construction. I need to find a way to store the value in the variable with just one escape.
I don't know if it makes a difference, but I'm doing this for a custom Burp plugin that manipulates selected requests.
Here is an image of my POC, getting the value of the var.
The extra backslash is not actually added; the Python interpreter uses repr() to display the string, and repr() shows a backslash as \\ to distinguish a literal backslash from escapes like \t or \n:
I hope this helps:
>>> t['appVersion'] = t["appVersion"].replace('=', '\u003d')
>>> t['appVersion']
'o0u5jeWA6TwlJacNFnjiTA\\u003d\\u003d'
>>> print(t['appVersion'])
o0u5jeWA6TwlJacNFnjiTA\u003d\u003d
>>> t['appVersion'] == 'o0u5jeWA6TwlJacNFnjiTA\u003d\u003d'
True
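A Python 3 sketch of the same point: the doubled backslash exists only in the repr() display, never in the data (the dictionary value is taken from the question):

```python
version = 'o0u5jeWA6TwlJacNFnjiTA=='.replace('=', '\\u003d')

print(repr(version))  # repr escapes each backslash, so you *see* \\u003d
print(version)        # the actual content: one backslash per escape

# The data itself holds exactly one backslash per replaced '='.
print(version.count('\\'))   # 2
print('\\\\' in version)     # False
```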

How to make a created file hidden? [duplicate]

I'm trying to hide a folder, without success. I found this:
import ctypes
ctypes.windll.kernel32.SetFileAttributesW('G:\Dir\folder1', 2)
but it did not work for me. What am I doing wrong?
There are two things wrong with your code, both having to do with the folder name literal. The SetFileAttributesW() function requires a Unicode string argument. You can specify one of those by prefixing a string with the character u. Secondly, any literal backslash characters in the string will have to be doubled or you could [also] add an r prefix to it. A dual prefix is used in the code immediately below.
import ctypes
FILE_ATTRIBUTE_HIDDEN = 0x02
ret = ctypes.windll.kernel32.SetFileAttributesW(ur'G:\Dir\folder1',
                                                FILE_ATTRIBUTE_HIDDEN)
if ret:
    print('attribute set to Hidden')
else:  # return code of zero indicates failure -- raise a Windows error
    raise ctypes.WinError()
You can find Windows' system error codes here. To see the results of the attribute change in Explorer, make sure its "Show hidden files" option isn't enabled.
To illustrate what @Eryk Sun said in a comment about arranging for the conversion from byte strings to Unicode to happen automatically, you would need to perform the following assignment before calling the function, to specify the proper conversion of its arguments. @Eryk Sun also explains why this isn't the default for pointers-to-strings in the W versions of the WinAPI functions -- see the comments.
ctypes.windll.kernel32.SetFileAttributesW.argtypes = (ctypes.c_wchar_p, ctypes.c_uint32)
Then, after doing that, the following will work (note that an r prefix is still required due to the backslashes):
ret = ctypes.windll.kernel32.SetFileAttributesW(r'G:\Dir\folder1',
                                                FILE_ATTRIBUTE_HIDDEN)
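The escape problem in the original literal can be demonstrated without Windows at all (a Python 3 sketch): in 'G:\Dir\folder1' the \f sequence is silently converted into a form-feed character, which is why the raw-string prefix matters:

```python
plain = 'G:\Dir\folder1'   # \f becomes a form feed (\x0c); \D survives by luck
raw = r'G:\Dir\folder1'    # raw string: both backslashes stay literal

print(len(plain), len(raw))   # 13 14 -- one character was swallowed
print('\x0c' in plain)        # True: the path is silently corrupted
print('\x0c' in raw)          # False
```

Recent Pythons also emit an invalid-escape warning for the \D, another hint that the plain literal is wrong.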
Try this code (quoting the name so paths with spaces survive):
import os
os.system('attrib +h "your file name"')

Base.py: Unicode equal comparison failed to convert both arguments to Unicode in

I took a look at a couple of related questions that deal with the same issue, but I still haven't found a way to solve it.
It turns out that every time I execute a Django-related command, it prints the expected output plus something like this:
/Library/Python/2.7/site-packages/django/db/backends/sqlite3/base.py:307: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
return name == ":memory:" or "mode=memory" in force_text(name)
And here is the context of that line:
def is_in_memory_db(self, name):
return name == ":memory:" or "mode=memory" in force_text(name)
Although the Django server works, it's kind of annoying always having this message printed on my screen. So why is this happening, and how can it be solved?
Use decode('utf-8') to make the comparison correct:
name.decode('utf-8') == ":memory:" or "mode=memory" in force_text(name)
Useful further reading:
Unicode HOWTO
Solving Unicode Problems in Python 2.7
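For context, the warning is Python 2's way of saying it compared a byte string to a unicode string and could not decode the bytes. In Python 3 the equivalent mixed comparison just evaluates to False with no warning, which is why normalizing both sides to one type (as above) is the real fix. A small Python 3 sketch:

```python
name_bytes = b':memory:'   # a byte-string database name
name_text = ':memory:'     # the str literal Django compares against

print(name_bytes == name_text)                   # False: bytes != str, silently
print(name_bytes.decode('utf-8') == name_text)   # True: compare str to str
```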

Interpreting Unicode from Terminal

I'm having issues reading Unicode text from the shell into Python. I have a test document with the following metadata attribute:
kMDItemAuthors = (
"To\U0304ny\U0308 Sta\U030ark"
)
I see this when I run mdls -name kMDItemAuthors path/to/the/file
I am attempting to get this data into usable form within a Python script. However, I cannot get the escape-sequence text into actual Unicode characters in Python.
Here's what I am currently doing:
import unicodedata
import subprocess
import os
os.environ['LANG'] = 'en_US.UTF-8'
cmd = 'mdls -name kMDItemAuthors path/to/the/file'
proc = subprocess.Popen(cmd,
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
(stdout, stderr) = proc.communicate()
u = unicode(stdout, 'utf8')
a = unicodedata.normalize('NFC', u)
Now, when I print(a), I get the exact same string representation as above. I have tried normalizing with all of the options (NFC, NFD, NFKC, NFKD), all with the same result.
The weirder thing is, when I try this code:
print('To\U0304ny\U0308 Sta\U030ark')
I get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-7: truncated \UXXXXXXXX escape
So, when that sub-string is within the variable, there's no problem, but as a raw string, it creates an issue.
I had felt pretty strong in my understanding of Python and Unicode, but now the shell has broken me. Any help would be greatly appreciated.
PS. I am running all this in Python 2.7.X
You have multiple problems here.
Like all escape sequences, Python only interprets the \U sequence in string literals in your source code. If a file actually has a \ followed by a U in it, Python isn't going to treat that as anything other than a \ and a U, any more than it'll treat a \ followed by an n as a newline. If you want to unescape them manually, you can, by using the unicode_escape codec. (But note that this will treat your file as ASCII, not UTF-8. If you actually have both UTF-8 and \U sequences, you will have to decode it as UTF-8, then encode it with unicode_escape, then decode it back with unicode_escape.)
A Python \U sequence requires 8 digits, not 4. If you only have 4, you have to use \u. So, whatever program generated this string, it can't be parsed with unicode_escape as-is. You might be able to hack it into shape with a quick-and-dirty workaround like s.replace(r'\U', r'\U0000') or s.replace(r'\U', r'\u'), or you may have to write a simple parser for it.
In your test, you're trying to use \U escapes in a string literal. You can only do that in Unicode string literals, like print(u'To\U0304ny\U0308 Sta\U030ark'). (If you do that, of course, you'll get the previous error again.)
Also, since this appears to be a Mac, you probably shouldn't be doing os.environ['LANG'] = 'en_US.UTF-8'. If Python sees that it's on OS X, it assumes everything is UTF-8. Anything you do to try to force UTF-8 will probably do nothing, and could in theory confuse it so it doesn't notice it's on OS X. Unless you're trying to work around a driver program that intentionally sets the locale to "C" before calling your script, you're usually better off not doing this.
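In Python 3 the same unescaping needs a bytes round-trip, since only bytes objects have the unicode_escape codec. A sketch, assuming the mdls output is plain ASCII with 4-digit \U escapes as shown:

```python
import unicodedata

s = r'To\U0304ny\U0308 Sta\U030ark'   # literal backslashes, as mdls prints them

fixed = s.replace(r'\U', r'\u')       # the 4-digit escapes need \u, not \U
decoded = fixed.encode('ascii').decode('unicode_escape')
composed = unicodedata.normalize('NFC', decoded)  # fold the combining marks in

print(composed)  # Tōnÿ Stårk
```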
As mentioned in the other answers, here is a slightly more direct code example:
>>> s="To\U0304ny\U0308 Sta\U030ark"
>>> s
'To\\U0304ny\\U0308 Sta\\U030ark'
>>> s.replace("\\U","\\u").decode("unicode-escape")
u'To\u0304ny\u0308 Sta\u030ark'
>>> print s.replace("\\U","\\u").decode("unicode-escape")
Tōnÿ Stårk
>>>
\U is for characters outside the BMP, i.e. it takes 8 hex digits. For characters within the BMP use \u.
>>> print u'To\u0304ny\u0308 Sta\u030ark'
Tōnÿ Stårk
3>> print('To\u0304ny\u0308 Sta\u030ark')
Tōnÿ Stårk