Python sys.maxint, sys.maxunicode on Linux and Windows

On 64-bit Debian Linux 6:
Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxint
9223372036854775807
>>> sys.maxunicode
1114111
On 64-bit Windows 7:
Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxint
2147483647
>>> sys.maxunicode
65535
Both operating systems are 64-bit, yet they report different values for sys.maxunicode. According to Wikipedia, there are 1,114,112 code points in Unicode. Is sys.maxunicode on Windows wrong?
And why do they have different sys.maxint values?

I don't know what your question is, but sys.maxunicode is not wrong on Windows.
See the docs:
sys.maxunicode
An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that
specifies whether Unicode characters are stored as UCS-2 or UCS-4.
Python 2 on Windows uses UCS-2, so the largest code point is 65,535 (and the supplementary-plane characters are encoded as two 16-bit code units, the "surrogate pairs").
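To make the surrogate-pair arithmetic concrete, here is a minimal sketch in Python 3. The helper name to_surrogate_pair is made up for illustration, and U+2F920 is used only as a sample supplementary-plane code point.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into two UTF-16 code units.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x2F920)
print(hex(high), hex(low))  # 0xd87e 0xdd20
```

A narrow build stores exactly these two 16-bit units, which is why a single character outside the Basic Multilingual Plane shows up as two items there.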
As for sys.maxint, it marks the point at which Python 2 switches from "simple integers" (123) to "long integers" (12345678987654321L). Simple integers are stored in a C long, which is 32 bits wide on 64-bit Windows and 64 bits wide on 64-bit Linux. Since Python 3, this has become irrelevant because the simple and long integer types have been merged into one. Therefore, sys.maxint is gone from Python 3.

Regarding the difference in sys.maxint, see What is the bit size of long on 64-bit Windows?. Python 2.x uses the C long type internally to store a small integer.
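If you want to check the width of the C long on your own platform, the following sketch (Python 3, standard library only) may help. The variable names are illustrative; the printed values depend on the platform, so none are shown.

```python
import ctypes
import sys

# Width of the platform's C long, the type Python 2 used for "simple" ints.
long_bits = ctypes.sizeof(ctypes.c_long) * 8
max_c_long = 2 ** (long_bits - 1) - 1    # what sys.maxint would have been
print(long_bits, max_c_long)

# Python 3 integers are arbitrary precision, so going past sys.maxsize
# (the largest container index) is not an overflow.
bigger = sys.maxsize + 1
print(bigger > sys.maxsize)  # True
```

On 64-bit Windows this should report 32 bits, and on 64-bit Linux 64 bits, which matches the two sys.maxint values in the question.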

Related

Replace accented letters with the respective non-accented ones in Python 3

I am not sure that this popular answer works in Python 3, since there is no unicode type in Python 3.
Therefore, how can I replace accented letters with the respective non-accented ones in Python 3?
For example,
sentence = 'intérêt'
to
new_sentence = 'interet'
The linked answer references the third-party module unidecode, not Python 2's unicode type.
$ python3
Python 3.7.1 (default, Nov 19 2018, 13:04:22)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import unidecode
>>> unidecode.unidecode('intérêt')
'interet'
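If installing a third-party module is not an option, a different technique using only the standard library is to decompose the text and drop the combining marks. This is not what unidecode does internally, and it only handles characters that decompose into a base letter plus accents; strip_accents is a name made up for this sketch.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for accents.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("intérêt"))  # interet
```

Letters without a decomposition (for example 'ø' or 'ß') pass through unchanged, whereas unidecode transliterates them.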

Changing the directory using os.chdir UNC filepath in python

I am attempting to change to a directory via its IP address or its UNC path (as I am working in Windows). This is because the external server is mapped to different drives for different users.
Using os.chdir(r'path\\to\remote\directory') does not seem to work, and I wonder if there are any alternatives that Python doesn't hate, e.g. an IP address?
Works fine for me:
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.chdir(r"\\myserver\myshare")
>>> os.getcwd()
'\\\\myserver\\myshare'
It's hard to tell if the r'path\\to\remote\directory' typo is also in your actual code and how you determined it "does not work".
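One way to check whether a string really is a UNC path, without touching the network, is to parse it with pathlib.PureWindowsPath, which works on any operating system. A small sketch; the "some\dir" suffix is a made-up example.

```python
from pathlib import PureWindowsPath

# A UNC path starts with two backslashes, then the server and share names.
unc = PureWindowsPath(r"\\myserver\myshare\some\dir")
print(repr(unc.drive))  # '\\\\myserver\\myshare'

# The string from the question has the doubled backslash in the middle,
# so it is parsed as an ordinary relative path with no drive at all.
typo = PureWindowsPath(r"path\\to\remote\directory")
print(repr(typo.drive))  # ''
```

If the drive comes back empty, os.chdir will look for the path relative to the current directory rather than on the server.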

Does python support unicode beyond basic multilingual plane?

Below is a simple test. repr seems to work fine, yet len and iteration (x for x in ...) don't seem to split the Unicode text correctly in Python 2.6 and 2.7:
In [1]: u"爨爵"
Out[1]: u'\U0002f920\U0002f921'
In [2]: [x for x in u"爨爵"]
Out[2]: [u'\ud87e', u'\udd20', u'\ud87e', u'\udd21']
Good news is Python 3.3 does the right thing ™.
Is there any hope for Python 2.x series?
Yes, provided you compiled your Python with wide-unicode support.
By default, Python is built with narrow unicode support only. Enable wide support with:
./configure --enable-unicode=ucs4
You can verify what configuration was used by testing sys.maxunicode:
import sys
if sys.maxunicode == 0x10FFFF:
    print 'Python built with UCS4 (wide unicode) support'
else:
    print 'Python built with UCS2 (narrow unicode) support'
A wide build will use UCS4 characters for all unicode values, doubling memory usage for these. Python 3.3 switched to variable width values; only enough bytes are used to represent all characters in the current value.
Quick demo showing that a wide build handles your sample Unicode string correctly:
$ python2.6
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> [x for x in u'\U0002f920\U0002f921']
[u'\U0002f920', u'\U0002f921']
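For comparison, the same check on Python 3.3 or later, where there is no narrow/wide distinction any more. This is a minimal sketch; the variable names are illustrative.

```python
import sys

text = "\U0002f920\U0002f921"

# Since Python 3.3, sys.maxunicode is always 0x10FFFF.
print(sys.maxunicode == 0x10FFFF)       # True

# len() and iteration work on code points, not on UTF-16 code units.
print(len(text))                        # 2
print([hex(ord(ch)) for ch in text])    # ['0x2f920', '0x2f921']

# Encoded as UTF-16, the same text still needs four 16-bit code units,
# which is what a narrow Python 2 build exposed directly.
utf16_units = len(text.encode("utf-16-le")) // 2
print(utf16_units)                      # 4
```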

NaN with Python 2.5 on Windows

How do I create a NaN with Python 2.5 on Windows?
float('nan') fails with the error ValueError: invalid literal for float(): nan
Summary of the answers: Neither float('inf') nor float('nan') works with Python 2.5 and Windows. This is a bug that was fixed in Python 2.6.
If you are using numpy, then you can use numpy.inf and numpy.nan.
If you need a workaround without numpy, then you can use an expression that overflows such as 1e1000 to get an inf, and 1e1000 / 1e1000 or 1e1000 - 1e1000 to get a nan.
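The overflow workaround can be sketched as follows. It is written for Python 3 here, so it has not been run on Python 2.5 on Windows; the math.isinf and math.isnan helpers were only added in Python 2.6, so on 2.5 the self-comparison check would be the portable one.

```python
import math

# A literal too large for a double overflows to infinity.
inf = 1e1000
# inf - inf (or inf / inf) has no defined value, so the result is NaN.
nan = inf - inf

print(math.isinf(inf))        # True
print(math.isnan(nan))        # True
print(math.isnan(inf / inf))  # True

# NaN is the only float that is not equal to itself.
print(nan != nan)             # True
```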
Another way is dividing inf by itself:
>>> float('inf') / float('inf')
nan
Or in a more obscure way, which might not work across platforms (but works around that specific bug in Python 2.5 on Windows):
>>> 1e31337 / 1e31337
nan
>>> 1e31337 - 1e31337
nan
There is already an accepted answer to this question, but I think the following should work if you don't want to rely on overflow and have numpy installed ... (not tested as I don't have python2.5 or windows)
>>> import numpy as np
>>> np.nan
nan
>>> np.inf
inf
Upgrade your Python distribution if possible. The behavior you listed is considered a bug. (Note: Cython link.)
Canonically, Python is supposed to support this definition of nan in a cross-platform manner. This behavior appears to have been fixed in Python 2.6 and 3.0.
Of course, this works in the Linux versions of Python:
$ python2.4
Python 2.4.3 (#1, Sep 21 2011, 19:55:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float('nan')
nan
$ python2.5
Python 2.5.2 (r252:60911, Jun 26 2008, 10:20:40)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float('nan')
nan

Python 2.4 and 2.6 behave differently for os.path.getmtime() on Windows

I am getting two different modification times for the same file from different Python versions on Windows XP.
Python2.4
C:\Copy of elisp>c:\python24\python
Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.getmtime("auto-complete-emacs-lisp.el")
1251684178
>>> ^Z
Python2.6
C:\Copy of elisp>C:\Python26\python
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.getmtime("auto-complete-emacs-lisp.el")
1251687778.0
>>>
There is a difference of 3600 seconds between the values reported by Python 2.6 and Python 2.4.
What is the reason for this strange behavior?
It's a bug in Microsoft's implementation of the C standard library. Python 2.4 used to use the stdlib fstat call to get file information, and hence could end up an hour out in locales that use DST.
In Python 2.5 and later, os.stat calls the direct Win32-only API to get file information when running on Windows, resulting in the correct output. See this thread for more.
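To see that the two values from the question are exactly one hour apart, they can be converted to UTC datetimes. A small Python 3 sketch; the variable names are illustrative and the timestamps are copied from the transcripts above.

```python
from datetime import datetime, timezone

mtime_py24 = 1251684178      # value reported by Python 2.4
mtime_py26 = 1251687778.0    # value reported by Python 2.6

difference = mtime_py26 - mtime_py24
print(difference)  # 3600.0

# Interpreting both as seconds since the epoch shows the one-hour shift.
for label, mtime in (("2.4", mtime_py24), ("2.6", mtime_py26)):
    print(label, datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat())
```

A shift of exactly 3600 seconds is the signature of a daylight-saving adjustment being applied in one code path and not in the other.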
There is a difference of 3600 seconds ...
This should be the kicker. It's a timezone problem, pure and simple.
Now all you have to do is find out why 2.4 and 2.6 are using different timezone information :-)
