String/unicode references in Python embedded dictionaries [duplicate] - python

This question already has answers here:
Why does comparing strings using either '==' or 'is' sometimes produce a different result?
(15 answers)
Python string interning
(2 answers)
Closed 5 years ago.
I have a question about Python 2.7.5 through 2.7.13. It may be
about semantics or it may be a genuine Python bug. I'm not entirely
sure which. Here is the simplest code I can construct that shows the
issue:
Python 2.7.13 |Enthought, Inc. (x86_64)| (default, Mar 2 2017, 08:20:50)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
>>> dd = {'foo': {'yy':u'Tannenbaum'}}
>>> dd['foo']['yy'] is u'Tannenbaum'
False
>>> dd['foo']['yy'] == u'Tannenbaum'
True
Note: If 'Tannenbaum' is changed from a unicode literal to a plain string, the outcome changes: both of the final tests are true. The question is: why do the two final tests differ in the unicode case? My understanding is that since unicode objects and strings are both immutable, the "is" and "==" tests should never differ in value. But I get this behavior in both Python 2.7.13 and the old 2.7.5 that came installed on my Mac. Am I relying on something I shouldn't rely on? Is the moral that I should never use "is" for string equality? If so, what is the principle that tells me that?
Postscript: I have access to Python 3.6.2 on another machine, and lo and behold, I cannot reproduce this anomaly there.
Python 3.6.2 (default, Jul 30 2017, 12:03:06)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> dd = {'foo': {'yy':u'Tannenbaum'}}
>>> dd['foo']['yy'] is u'Tannenbaum'
True
>>> dd['foo']['yy'] == u'Tannenbaum'
True

Related

Replace accented letters with the respective non-accented ones in Python 3

I am not sure that this popular answer works in Python 3, since there is no unicode type in Python 3.
Therefore, how can I replace accented letters with the respective non-accented ones in Python 3?
For example,
sentence = 'intérêt'
to
new_sentence = 'interet'
The linked answer references the third-party module unidecode, not Python 2's unicode type.
$ python3
Python 3.7.1 (default, Nov 19 2018, 13:04:22)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import unidecode
>>> unidecode.unidecode('intérêt')
'interet'
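If installing a third-party module is not an option, here is a rough standard-library sketch. It is a different technique from unidecode: it decomposes each character with unicodedata.normalize and drops the combining marks. The helper name strip_accents is made up for this example, and letters that have no decomposition (such as 'ø' or 'ß') pass through unchanged.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks from text (hypothetical helper)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("intérêt"))  # interet
```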

Division in python [duplicate]

This question already has answers here:
How can I force division to be floating point? Division keeps rounding down to 0?
(11 answers)
Closed 4 years ago.
Going through the python doc and doing the following operation:
ravi#user-ThinkCentre-M90:~$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 17/3
5
>>>
Why is the output not a float?
You should use 17/3.0.
This is a difference between Python 2 and Python 3: in Python 2, / on two integers performs floor division, while in Python 3 it returns a float.
Try dividing it this way:
17/3.0
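A short sketch of the options, assuming nothing beyond the built-in operators. Making one operand a float gives a float result on both Python 2 and Python 3, and // asks for floor division explicitly on either version. The variable names are illustrative.

```python
# A float operand forces floating-point division on Python 2 and 3 alike.
true_quotient = 17 / 3.0

# Converting an integer operand explicitly has the same effect.
converted_quotient = float(17) / 3

# Floor division is spelled // and behaves the same on both versions.
floor_quotient = 17 // 3

print(true_quotient)       # 5.666666666666667
print(converted_quotient)  # 5.666666666666667
print(floor_quotient)      # 5
```

On Python 2, putting "from __future__ import division" at the top of a module makes / behave as it does in Python 3.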

Python 3.6 variable annotations and numeric literals

The "What's New in Python 3.6" section of the Python documentation presents, among other things, variable annotations and underscores in numeric literals.
However, I tried the examples shown and not all of them worked.
Are these examples incomplete, and do they require some additional code that is assumed under the hood?
For example this statement
primes: List[int] = []
issues
NameError: name 'List' is not defined
This statement
print( 1_000_000_000_000_000 )
is also rejected as an error.
The first case works if you first import List from typing. Most types used with type hints aren't built in; they need to be imported first.
The second case also works if you are running under 3.6. On my machine it correctly prints:
Python 3.6.2 | packaged by conda-forge | (default, Jul 23 2017, 22:59:30)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print( 1_000_000_000_000_000 )
1000000000000000
If the error message you receive is "SyntaxError: invalid syntax", you're on 3.5 or earlier. If it's "SyntaxError: invalid token", you're not using the underscores correctly. I'm guessing you're receiving the first.
So you might want to double-check that you're running 3.6 (python -V).
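Putting both points together, a minimal sketch that should run on Python 3.6 or later. The append call and the name big are additions for illustration.

```python
from typing import List

# The annotation needs List to be imported from typing first.
primes: List[int] = []
primes.append(2)

# Underscores in numeric literals are only visual separators.
big = 1_000_000_000_000_000

print(primes)  # [2]
print(big)     # 1000000000000000
```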

Does python support unicode beyond basic multilingual plane?

Below is a simple test. repr seems to work fine, yet len and [x for x in ...] don't seem to split the unicode text correctly in Python 2.6 and 2.7:
In [1]: u"爨爵"
Out[1]: u'\U0002f920\U0002f921'
In [2]: [x for x in u"爨爵"]
Out[2]: [u'\ud87e', u'\udd20', u'\ud87e', u'\udd21']
The good news is that Python 3.3 does the right thing™.
Is there any hope for Python 2.x series?
Yes, provided you compiled your Python with wide-unicode support.
By default, Python is built with narrow unicode support only. Enable wide support with:
./configure --enable-unicode=ucs4
You can verify what configuration was used by testing sys.maxunicode:
import sys
# 0x10FFFF is the highest Unicode code point; narrow builds report 0xFFFF
if sys.maxunicode == 0x10FFFF:
    print('Python built with UCS4 (wide unicode) support')
else:
    print('Python built with UCS2 (narrow unicode) support')
A wide build will use UCS4 characters for all unicode values, doubling memory usage for these. Python 3.3 switched to variable width values; only enough bytes are used to represent all characters in the current value.
Quick demo showing that a wide build handles your sample Unicode string correctly:
$ python2.6
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> [x for x in u'\U0002f920\U0002f921']
[u'\U0002f920', u'\U0002f921']
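For comparison, a sketch for Python 3.3 or later, where a string is always a sequence of whole code points. Encoding to UTF-16 by hand reproduces the four surrogate code units that the narrow Python 2 build exposed in the question. The variable names are illustrative.

```python
text = "\U0002f920\U0002f921"

# Python 3.3+ iterates over whole code points, so there are two items.
print(len(text))
print([hex(ord(ch)) for ch in text])  # ['0x2f920', '0x2f921']

# Encoding to UTF-16 shows the surrogate pairs a narrow build stored.
utf16 = text.encode("utf-16-be")
code_units = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]
print(code_units)  # ['d87e', 'dd20', 'd87e', 'dd21']
```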

NaN with Python 2.5 on Windows

How do I create a NaN with Python 2.5 on Windows?
float('nan') fails with the error ValueError: invalid literal for float(): nan
Summary of the answers: Neither float('inf') nor float('nan') works with Python 2.5 and Windows. This is a bug that was fixed in Python 2.6.
If you are using numpy, then you can use numpy.inf and numpy.nan.
If you need a workaround without numpy, then you can use an expression that overflows such as 1e1000 to get an inf, and 1e1000 / 1e1000 or 1e1000 - 1e1000 to get a nan.
Another way, on platforms where float('inf') is accepted, is dividing inf by itself:
>>> float('inf') / float('inf')
nan
Or in a more obscure way, which might not work across platforms (but works around that specific bug in Python 2.5 on Windows):
>>> 1e31337 / 1e31337
nan
>>> 1e31337 - 1e31337
nan
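The overflow trick can be checked with a short sketch like the one below. It was written against Python 3, and whether an overflowing literal behaves the same way on Python 2.5 for Windows is taken from the answers above rather than verified here. math.isnan and math.isinf only exist from Python 2.6 onward; the final self-comparison check works without them.

```python
import math

# A literal too large for a double overflows to infinity.
inf = 1e1000

# inf - inf and inf / inf are both undefined, so each yields NaN.
nan_from_difference = inf - inf
nan_from_quotient = inf / inf

print(math.isinf(inf))                  # True
print(math.isnan(nan_from_difference))  # True
print(math.isnan(nan_from_quotient))    # True

# NaN is the only float that is not equal to itself.
print(nan_from_difference != nan_from_difference)  # True
```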
There is already an accepted answer to this question, but the following should work if you don't want to rely on overflow and have numpy installed (not tested, as I don't have Python 2.5 or Windows):
>>> import numpy as np
>>> np.nan
nan
>>> np.inf
inf
Upgrade your Python distribution if possible. The behavior you listed is considered a bug.
Canonically, Python is supposed to support this definition of nan in a cross-platform manner. This behavior appears to have been fixed in Python 2.6 and 3.0.
Of course, this works in the Linux versions of Python:
$ python2.4
Python 2.4.3 (#1, Sep 21 2011, 19:55:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float('nan')
nan
$ python2.5
Python 2.5.2 (r252:60911, Jun 26 2008, 10:20:40)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float('nan')
nan
