I've hit what should be a basic question, but my Google searches are failing me and I need a sanity check.
If I run the following in my Python shell:
>>> import sys
>>> sys.version
from two different Python environments, I get:
'2.7.8 (default, Nov 10 2014, 08:19:18) \n[GCC 4.9.2 20141101 (Red Hat 4.9.2-1)]'
and...
'2.7.8 (default, Apr 15 2015, 09:26:43) \n[GCC 4.9.2 20150212 (Red Hat 4.9.2-6)]'
Does that mean the two environments are actually running slightly different Python internals, or is the '2.7.8' part of the version string enough for me to be confident these are identical Python interpreters?
If I am guaranteed they are the same, then what's the significance of the date and other parts of that version output string?
All you need to compare is the first bit, the 2.7.8 string.
The differences you see are due to the compiler used to build the binary, and when the binary was built. That shouldn't really make a difference here.
The string is composed of information you can find in machine-readable form elsewhere; specifically:
platform.python_version()
Returns the Python version as string 'major.minor.patchlevel'.
platform.python_build()
Returns a tuple (buildno, builddate) stating the Python build number and date as strings.
platform.python_compiler()
Returns a string identifying the compiler used for compiling Python.
For your sample strings, what differs is the date the binary was built (the second value of the platform.python_build() tuple) and the exact revision of the GCC compiler used (from the platform.python_compiler() string). Only when there are specific problems with the compiler would this matter.
You should normally only care about the Python version information, which is more readily available as the sys.version_info tuple.
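For example, a quick way to pull those pieces apart (the example values in the comments are assumed from the strings above):

import sys
import platform

print(platform.python_version())   # e.g. '2.7.8'
print(platform.python_build())     # e.g. ('default', 'Apr 15 2015 09:26:43')
print(platform.python_compiler())  # e.g. 'GCC 4.9.2 20150212 (Red Hat 4.9.2-6)'
print(sys.version_info)            # e.g. sys.version_info(major=2, minor=7, micro=8, releaselevel='final', serial=0)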
I am currently running Python 3.8.5 and created a pandas DataFrame with some data. After performing some calculations I ended up with a single float value, which was .007. I then multiplied the value by 100 to get the percent. However, I noticed that the value returned was 0.7000000000000001 instead of the expected .7. I have not tested in other Python environments but have tried it in PyCharm and JupyterLab. Both return the same result.
So, I restarted the kernel and tried print(.007*100) and the returned result was the same. I also tried print(.006*100) and got the expected result: .6
Is this a bug in python 3.8.5? Has anyone else experienced / can replicate this issue?
Python 3.8.5 (default, Sep 4 2020, 02:22:02)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
print(.007*100)
0.7000000000000001
print(.006*100)
0.6
print(.008*100)
0.8
Is this a bug in python 3.8.5?
This issue is not specific to version 3.8.5 and is caused by how floats work in Python. The Python decimal module docs discuss this. You can use decimal.Decimal to avoid the problem; consider the following example:
import decimal

# Plain binary floats accumulate representation error:
print(0.1 + 0.1 + 0.1 == 0.3)
# decimal.Decimal does exact decimal arithmetic, so the same sum compares equal:
print(decimal.Decimal("0.1") + decimal.Decimal("0.1") + decimal.Decimal("0.1") == decimal.Decimal("0.3"))
output
False
True
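Not from the original answer, but two other common ways to cope with this when you stay with plain floats are rounding for display and comparing with a tolerance:

import math

value = 0.007 * 100
print(value)                     # 0.7000000000000001
print(round(value, 10))          # 0.7 -- round away the representation noise for display
print("{:.1f}".format(value))    # 0.7 -- or format to a fixed number of decimals
print(math.isclose(value, 0.7))  # True -- compare with a tolerance instead of ==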
I am trying to use code which was written for Python 2 and may run with Python 3.6.0, but it does not run with Python 3.6.4. It imports the IN module and uses IN.IP_RECVERR. I tried to google it, but it is a 'bit' hard to find anything about a module called IN (naming fail?). To demonstrate in the REPL that it works in Python 2 but not in 3.6.4:
$ python2
Python 2.7.14 (default, Jan 5 2018, 10:41:29)
[GCC 7.2.1 20171224] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import IN
>>> IN.IP_RECVERR
11
>>>
$ python3
Python 3.6.4 (default, Jan 5 2018, 02:35:40)
[GCC 7.2.1 20171224] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import IN
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'IN'
>>>
What is the replacement for this IN module in newer versions of python 3?
This is presumably the private plat-linux/IN.py module, which was never intended to be used. There have been plans to remove these plat-* files for a few zillion years, but it looks like it finally happened in issue 28027 for 3.6. As mentioned in What's New in Python 3.6:
The undocumented IN, CDROM, DLFCN, TYPES, CDIO, and STROPTS modules have been removed. They had been available in the platform specific Lib/plat-*/ directories, but were chronically out of date, inconsistently available across platforms, and unmaintained. The script that created these modules is still available in the source distribution at Tools/scripts/h2py.py.
Most of the useful constants that are at least somewhat portable (as in you can expect them to be available and work the same on your old laptop's linux and your brand-new Galaxy's linux, if not on OS X or Solaris) have long been made available through other places in the stdlib.
I think the specific constant you're looking for is an example of one that's not completely useless, but not portable enough to put anywhere safe: Linux documents the existence of IP_RECVERR, but not its value. So you really need the version from your own system's IP headers.
The way to do this safely, if you actually need the IN module, is to run Tools/scripts/h2py.py with the Python version you're using, on the specific platform you need. That will generate an IN.py from the appropriate headers on your system (or on your cross-compilation target), which you can then use on that system. If you want to distribute your code, you'd probably need to put a step to do that into the setup.py so it runs at install time (and at wheel-building time for people who install pre-built wheels, but you may need to be careful to make sure the targets are specific enough).
If you don't need to be particularly portable, and you just need to access the one value in a few scripts that you're only deploying on your laptop or your company's set of identical containers or the like, you may be better off hardcoding the values (with a nice scare comment explaining the details).
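For the hardcoding route, a minimal sketch (the value 11 comes from the question's Python 2 session and is Linux-specific; whether the socket module exposes the constant depends on your build, hence the getattr fallback):

import socket

# Linux-specific value observed via the old plat-linux IN.py (see the Python 2
# session above). Other platforms may use a different value or lack it entirely.
IP_RECVERR = getattr(socket, "IP_RECVERR", 11)

# Typical use: ask the kernel to queue extended error messages on a UDP socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, IP_RECVERR, 1)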
I have a quick question that I cannot seem to clarify, despite my research on Stack Overflow and beyond. My question involves the Windows SystemParametersInfo function and its variants SystemParametersInfoW (Unicode) and SystemParametersInfoA (ANSI) in relation to a Python 3.x script.
While writing a Python script, I came across two different explanations of when to use these variants. This answer to a question says that for 64-bit machines you must use SystemParametersInfoW while for 32-bit machines you must use SystemParametersInfoA, and thus you should run a function to determine which bitness the script is running on. However, another answer here (and I've seen more people advocate for this type of answer) and here says that SystemParametersInfoW must be used with Python 3.x since it passes a Unicode string, while SystemParametersInfoA is used for Python 2.x and below since it passes a byte string consistent with ANSI.
So what is the right answer here, as I would need to proceed differently with my script? Again, I am using Python 3.5, so it would make sense that the second answer fits, but is there any truth to the machine's bitness being a factor in choosing between SystemParametersInfoW and SystemParametersInfoA? Is it a mixture of both answers, or should I go ahead and use SystemParametersInfoW regardless of whether it will be used on a 32- or 64-bit machine? Do I even need to determine the bitness of the machine the script is running on? Thank you for your help in clarifying this issue!
Internally, Windows uses Unicode. The SystemParametersInfoA function converts ANSI parameter strings to Unicode and internally calls SystemParametersInfoW. You can call either from Python whether 32- or 64-bit, in Python 2.x or 3.x. Usually you want the W version to pass and retrieve Unicode strings since Windows is internally Unicode. The A version can lose information.
Example that works in Python 2 or 3, 32- or 64-bit. Note that the W version returns a Unicode string in the buffer, while the A version returns a byte string.
from __future__ import print_function
from ctypes import *
import sys

print(sys.version)

SPI_GETDESKWALLPAPER = 0x0073
dll = WinDLL('user32')

buf = create_string_buffer(200)    # byte buffer for the ANSI (A) call
ubuf = create_unicode_buffer(200)  # wide-char buffer for the Unicode (W) call

if dll.SystemParametersInfoA(SPI_GETDESKWALLPAPER, 200, buf, 0):
    print(buf.value)
if dll.SystemParametersInfoW(SPI_GETDESKWALLPAPER, 200, ubuf, 0):
    print(ubuf.value)
Output (Python 2.X 32-bit and Python 3.X 64-bit):
C:\>py -2 test.py
2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)]
c:\windows\web\wallpaper\theme1\img1.jpg
c:\windows\web\wallpaper\theme1\img1.jpg
C:\>py -3 test.py
3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)]
b'c:\\windows\\web\\wallpaper\\theme1\\img1.jpg'
c:\windows\web\wallpaper\theme1\img1.jpg
On Windows 3.x/95/98/ME it is likely that only SystemParametersInfoA works correctly. On all other systems both the A and W flavor will work regardless of the OS bitness.
Assuming you only support "recent" versions of Windows, you should just pick the flavor most comfortable for you to use in your language and that usually means the flavor that matches the default string type in your language.
If you want to support both Python v2 & v3 you would have to choose at run-time which function to call if you are using the default string type.
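A minimal sketch of that run-time choice, reusing the wallpaper example above and using the desired string type (bytes vs. unicode) as the deciding factor:

import ctypes

SPI_GETDESKWALLPAPER = 0x0073
user32 = ctypes.WinDLL('user32')

def get_wallpaper(want_bytes=False):
    # Pick the A or W flavor at run time based on the string type wanted.
    if want_bytes:
        buf = ctypes.create_string_buffer(260)
        ok = user32.SystemParametersInfoA(SPI_GETDESKWALLPAPER, 260, buf, 0)
    else:
        buf = ctypes.create_unicode_buffer(260)
        ok = user32.SystemParametersInfoW(SPI_GETDESKWALLPAPER, 260, buf, 0)
    return buf.value if ok else None

print(get_wallpaper())      # unicode string (native str on Python 3)
print(get_wallpaper(True))  # byte string (native str on Python 2)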
Below is a simple test. repr seems to work fine, yet len and x for x in don't seem to split the Unicode text correctly in Python 2.6 and 2.7:
In [1]: u"爨爵"
Out[1]: u'\U0002f920\U0002f921'
In [2]: [x for x in u"爨爵"]
Out[2]: [u'\ud87e', u'\udd20', u'\ud87e', u'\udd21']
The good news is that Python 3.3 does the right thing™.
Is there any hope for the Python 2.x series?
Yes, provided you compiled your Python with wide-unicode support.
By default, Python is built with narrow unicode support only. Enable wide support with:
./configure --enable-unicode=ucs4
You can verify what configuration was used by testing sys.maxunicode:
import sys

if sys.maxunicode == 0x10FFFF:
    print 'Python built with UCS4 (wide unicode) support'
else:
    print 'Python built with UCS2 (narrow unicode) support'
A wide build will use UCS-4 for all unicode values, doubling memory usage for those strings. Python 3.3 switched to a flexible representation: each string uses only as many bytes per character as its widest character requires.
Quick demo showing that a wide build handles your sample Unicode string correctly:
$ python2.6
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> [x for x in u'\U0002f920\U0002f921']
[u'\U0002f920', u'\U0002f921']
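On Python 3.3 or later you can see that flexible representation at work; a small sketch (exact sizes vary by platform, but the per-character growth is visible):

import sys

# Each str picks the narrowest element size that can hold its widest character:
print(sys.getsizeof('a' * 100))           # ASCII-only: 1 byte per character
print(sys.getsizeof('\u0100' * 100))      # needs 2 bytes per character
print(sys.getsizeof('\U0002f920' * 100))  # outside the BMP: 4 bytes per character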
In answering another question, I suggested to use timeit to test the difference between indexing a list with positive integers vs. negative integers. Here's the code:
import timeit
t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=10000000)
print (t)
t=timeit.timeit('mylist[-1]',setup='mylist=list(range(100))',number=10000000)
print (t)
I ran this code with python 2.6:
$ python2.6 test.py
0.587687015533
0.586369991302
Then I ran it with python 3.2:
$ python3.2 test.py
0.9212150573730469
1.0225799083709717
Then I scratched my head, did a little google searching and decided to post these observations here.
Operating system: OS-X (10.5.8) -- Intel Core2Duo
That seems like a pretty significant difference to me (a factor of over 1.5). Does anyone have an idea why Python 3 is so much slower -- especially for such a common operation?
EDIT
I've run the same code on my Ubuntu Linux desktop (Intel i7) and achieved comparable results with python2.6 and python 3.2. It seems that this is an issue which is operating system (or processor) dependent (Other users are seeing the same behavior on Linux machines -- See comments).
EDIT 2
The startup banner was requested in one of the answers, so here goes:
Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
and:
Python 3.2 (r32:88452, Feb 20 2011, 10:19:59)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
UPDATE
I've just installed fresh versions of python2.7.3 and python3.2.3 from http://www.python.org/download/
In both cases, I took the
"Python x.x.3 Mac OS X 32-bit i386/PPC Installer (for Mac OS X 10.3 through 10.6 [2])"
since I am on OS X 10.5. Here are the new timings (which are reasonably consistent through multiple trials):
python 2.7
$python2.7 test.py
0.577006101608
0.590042829514
python 3.2.3
$python3.2 test.py
0.8882801532745361
1.034242868423462
This appears to be an artifact of some builds of Python 3.2. The best hypothesis at this point is that all 32-bit Intel builds have the slowdown, but no 64-bit ones do. Read on for further details.
You didn't run nearly enough tests to determine anything. Repeating your test a bunch of times, I got values ranging from 0.31 to 0.54 for the same test, which is a huge variation.
So, I ran your test with 10x the number, and repeat=10, using a bunch of different Python2 and Python3 installs. Throwing away the top and bottom results, averaging the other 8, and dividing by 10 (to get a number equivalent to your tests), here's what I saw:
1. 0.52/0.53 Lion 2.6
2. 0.49/0.50 Lion 2.7
3. 0.48/0.48 MacPorts 2.7
4. 0.39/0.49 MacPorts 3.2
5. 0.39/0.48 HomeBrew 3.2
So, it looks like 3.2 is actually slightly faster with [99], and about the same speed with [-1].
However, on a 10.5 machine, I got these results:
1. 0.98/1.02 MacPorts 2.6
2. 1.47/1.59 MacPorts 3.2
Back on the original (Lion) machine, I ran in 32-bit mode, and got this:
1. 0.50/0.48 Homebrew 2.7
2. 0.75/0.82 Homebrew 3.2
So, it seems like 32-bitness is what matters, and not Leopard vs. Lion, gcc 4.0 vs. gcc 4.2 or clang, hardware differences, etc. It would help to test 64-bit builds under Leopard, with different compilers, etc., but unfortunately my Leopard box is a first-gen Intel Mini (with a 32-bit Core Solo CPU), so I can't do that test.
As further circumstantial evidence, I ran a whole slew of other quick tests on the Lion box, and it looks like 32-bit 3.2 is ~50% slower than 2.x, while 64-bit 3.2 is maybe a little faster than 2.x. But if we really want to back that up, someone needs to pick and run a real benchmark suite.
Anyway, my best guess at this point is that when optimizing the 3.x branch, nobody put much effort into 32-bit i386 Mac builds. Which is actually a reasonable choice for them to have made.
Or, alternatively, they didn't even put much effort into 32-bit i386, period. That possibility might explain why the OP saw 2.x and 3.2 giving similar results on a Linux box, while Otto Allmendinger saw 3.2 being similarly slower than 2.6 on a Linux box. But since neither of them mentioned whether they were running 32-bit or 64-bit Linux, it's hard to know whether that's relevant.
There are still lots of other different possibilities that we haven't ruled out, but this seems like the best one.
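A minimal sketch of the repeat-and-trim procedure described above, assuming the same statements and setup as the question (each run is long, since it uses 10x the question's number):

import timeit

setup = 'mylist = list(range(100))'

def trimmed_average(stmt, repeat=10, number=100000000):
    # Time the statement `repeat` times, drop the best and worst runs,
    # average the remaining eight, and divide by 10 so the result is
    # comparable to the question's number=10000000 timings.
    results = sorted(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
    trimmed = results[1:-1]
    return sum(trimmed) / len(trimmed) / 10.0

print(trimmed_average('mylist[99]'))
print(trimmed_average('mylist[-1]'))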
Here is some code that illustrates at least part of the answer:
$ python
Python 2.7.3 (default, Apr 20 2012, 22:44:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
2.55517697334
>>> t=timeit.timeit('mylist[99L]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.89904499054
$ python3
Python 3.2.3 (default, May 3 2012, 15:54:42)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.9906489849090576
Python 3 does not have the old machine-word int type; its int is Python 2's long.
Python 3 range() is the Python 2 xrange(). If you want to simulate the Python 2 range() in Python 3 code, you have to use list(range(num)). The bigger num is, the bigger the difference that will be observed with your original code.
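A small illustration of that difference, assuming Python 3:

# Python 3: range() is lazy, like Python 2's xrange(); no list is built.
r = range(100)
print(r[99])  # indexing works without materializing the list

# list(range(...)) materializes the list, matching Python 2's range().
mylist = list(range(100))
print(mylist[99])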
Indexing should be independent of what is stored inside the list, as the list stores only references to the target objects. The references are untyped and all of the same kind. The list type is therefore a homogeneous data structure -- technically. Indexing means turning the index value into the start address plus an offset. Calculating the offset is very efficient, with at most one subtraction. This is a very cheap extra operation compared with the other operations.
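To make the "at most one subtraction" point concrete, a tiny sketch:

mylist = list(range(100))

# A negative index just costs one extra addition of len(mylist) before the
# usual base-address + offset lookup, so it should cost about the same.
assert mylist[-1] == mylist[len(mylist) - 1] == 99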