List indexing efficiency (python 2 vs python 3)

In answering another question, I suggested using timeit to test the difference between indexing a list with positive integers vs. negative integers. Here's the code:
import timeit
t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=10000000)
print (t)
t=timeit.timeit('mylist[-1]',setup='mylist=list(range(100))',number=10000000)
print (t)
I ran this code with python 2.6:
$ python2.6 test.py
0.587687015533
0.586369991302
Then I ran it with python 3.2:
$ python3.2 test.py
0.9212150573730469
1.0225799083709717
Then I scratched my head, did a little google searching and decided to post these observations here.
Operating system: OS-X (10.5.8) -- Intel Core2Duo
That seems like a pretty significant difference to me (a factor of over 1.5 difference). Does anyone have an idea why python3 is so much slower -- especially for such a common operation?
EDIT
I've run the same code on my Ubuntu Linux desktop (Intel i7) and achieved comparable results with python2.6 and python 3.2. It seems that this is an issue which is operating system (or processor) dependent (Other users are seeing the same behavior on Linux machines -- See comments).
EDIT 2
The startup banner was requested in one of the answers, so here goes:
Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
and:
Python 3.2 (r32:88452, Feb 20 2011, 10:19:59)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
UPDATE
I've just installed fresh versions of python2.7.3 and python3.2.3 from http://www.python.org/download/
In both cases, I took the
"Python x.x.3 Mac OS X 32-bit i386/PPC Installer (for Mac OS X 10.3 through 10.6 [2])"
since I am on OS X 10.5. Here are the new timings (which are reasonably consistent through multiple trials):
python 2.7
$ python2.7 test.py
0.577006101608
0.590042829514
python 3.2.3
$ python3.2 test.py
0.8882801532745361
1.034242868423462

This appears to be an artifact of some builds of Python 3.2. The best hypothesis at this point is that all 32-bit Intel builds have the slowdown, but no 64-bit ones do. Read on for further details.
You didn't run nearly enough tests to determine anything. Repeating your test a bunch of times, I got values ranging from 0.31 to 0.54 for the same test, which is a huge variation.
So, I ran your test with 10x the number, and repeat=10, using a bunch of different Python2 and Python3 installs. Throwing away the top and bottom results, averaging the other 8, and dividing by 10 (to get a number equivalent to your tests), here's what I saw:
1. 0.52/0.53 Lion 2.6
2. 0.49/0.50 Lion 2.7
3. 0.48/0.48 MacPorts 2.7
4. 0.39/0.49 MacPorts 3.2
5. 0.39/0.48 HomeBrew 3.2
So, it looks like 3.2 is actually slightly faster with [99], and about the same speed with [-1].
However, on a 10.5 machine, I got these results:
1. 0.98/1.02 MacPorts 2.6
2. 1.47/1.59 MacPorts 3.2
Back on the original (Lion) machine, I ran in 32-bit mode, and got this:
1. 0.50/0.48 Homebrew 2.7
2. 0.75/0.82 Homebrew 3.2
So, it seems like 32-bitness is what matters, and not Leopard vs. Lion, gcc 4.0 vs. gcc 4.2 or clang, hardware differences, etc. It would help to test 64-bit builds under Leopard, with different compilers, etc., but unfortunately my Leopard box is a first-gen Intel Mini (with a 32-bit Core Solo CPU), so I can't do that test.
As further circumstantial evidence, I ran a whole slew of other quick tests on the Lion box, and it looks like 32-bit 3.2 is ~50% slower than 2.x, while 64-bit 3.2 is maybe a little faster than 2.x. But if we really want to back that up, someone needs to pick and run a real benchmark suite.
Anyway, my best guess at this point is that when optimizing the 3.x branch, nobody put much effort into 32-bit i386 Mac builds. Which is actually a reasonable choice for them to have made.
Or, alternatively, they didn't put much effort into 32-bit i386 at all. That possibility might explain why the OP saw 2.x and 3.2 giving similar results on a linux box, while Otto Allmendinger saw 3.2 being similarly slower than 2.6 on a linux box. But since neither of them mentioned whether they were running 32-bit or 64-bit linux, it's hard to know whether that's relevant.
There are still lots of other different possibilities that we haven't ruled out, but this seems like the best one.
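The measurement procedure described in this answer (many repeats, discard the fastest and slowest run, average the rest, rescale) could be sketched roughly as follows. The helper name trimmed_timing and its parameters are made up for illustration, and the demo uses far smaller counts than the answer did so that it finishes quickly.

```python
import struct
import timeit


def trimmed_timing(stmt, setup, number, repeat=10, scale=1.0):
    """Time stmt `repeat` times, drop the fastest and slowest run,
    average the remaining runs, and divide the result by `scale`."""
    if repeat < 3:
        raise ValueError("need at least 3 repeats to drop two outliers")
    runs = sorted(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    kept = runs[1:-1]  # discard the minimum and the maximum
    return sum(kept) / len(kept) / scale


# Small illustrative counts. The answer used 10x the question's number
# (100000000) with repeat=10 and then divided by 10 (scale=10).
setup = "mylist = list(range(100))"
print("pointer width: %d-bit" % (struct.calcsize("P") * 8))
for stmt in ("mylist[99]", "mylist[-1]"):
    print(stmt, trimmed_timing(stmt, setup, number=100000, repeat=5))
```

Printing the pointer width alongside the timings makes it easier to compare results between 32-bit and 64-bit builds, which is the variable this answer suspects.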

Here is some code that illustrates at least part of the answer:
$ python
Python 2.7.3 (default, Apr 20 2012, 22:44:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
2.55517697334
>>> t=timeit.timeit('mylist[99L]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.89904499054
$ python3
Python 3.2.3 (default, May 3 2012, 15:54:42)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.9906489849090576
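Python 3 does not have the old int type; every integer there is what Python 2 called a long, so indexing with 99 in Python 3 behaves like the 99L case in Python 2.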
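As a small illustration of that point (Python 3 only, with variable names chosen just for this sketch), a machine-word-sized value and a very large value have the same type, and either kind can be used as an index:

```python
small = 99
big = 2 ** 100  # far beyond any machine word

# In Python 3 both values share one type; there is no separate long.
same_type = type(small) is type(big)
print(type(small), type(big), same_type)

# A list index is just an int, whatever its magnitude, as long as it is in range.
mylist = list(range(100))
print(mylist[small])
```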

Python 3 range() is the Python 2 xrange(). If you want to simulate the Python 2 range() in Python 3 code, you have to use list(range(num)). The bigger num is, the bigger the difference you will observe with your original code.
Indexing should be independent of what is stored inside the list, as the list stores only references to the target objects. The references are untyped and all of the same kind. The list type is therefore a homogeneous data structure, technically speaking. Indexing means turning the index value into start address + offset. Calculating the offset is very efficient, with at most one subtraction for a negative index. This is a very cheap extra operation compared with the other operations involved.
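To make the "one extra arithmetic step" concrete, here is a pure-Python sketch of how a negative index can be mapped to a position. The function name normalize_index is hypothetical, and this only mirrors the idea; the real work happens in C inside the interpreter.

```python
def normalize_index(index, length):
    """Map a possibly negative index to a position in [0, length)."""
    if index < 0:
        index += length  # the only extra step for negative indices
    if not 0 <= index < length:
        raise IndexError("list index out of range")
    return index


mylist = list(range(100))
pos = normalize_index(-1, len(mylist))
print(pos, mylist[pos] == mylist[-1])
```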

Related

What is the replacement for python IN package?

I am trying to use some code which was written for python 2 and may run with python 3.6.0, but it does not run with python 3.6.4. It imports the IN module, and uses IN.IP_RECVERR. I tried to google it, but it is a 'bit' hard to find anything about a module called IN (naming fail?). To demonstrate in the REPL that it works in python 2, but not in 3.6.4:
$ python2
Python 2.7.14 (default, Jan 5 2018, 10:41:29)
[GCC 7.2.1 20171224] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import IN
>>> IN.IP_RECVERR
11
>>>
$ python3
Python 3.6.4 (default, Jan 5 2018, 02:35:40)
[GCC 7.2.1 20171224] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import IN
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'IN'
>>>
What is the replacement for this IN module in newer versions of python 3?

This is presumably the private plat-linux/IN.py module, which was never intended to be used. There have been plans to remove these plat-* files for a few zillion years, but it looks like it finally happened in issue 28027 for 3.6. As mentioned in What's New in Python 3.6:
The undocumented IN, CDROM, DLFCN, TYPES, CDIO, and STROPTS modules have been removed. They had been available in the platform specific Lib/plat-*/ directories, but were chronically out of date, inconsistently available across platforms, and unmaintained. The script that created these modules is still available in the source distribution at Tools/scripts/h2py.py.
Most of the useful constants that are at least somewhat portable (as in you can expect them to be available and work the same on your old laptop's linux and your brand-new Galaxy's linux, if not on OS X or Solaris) have long been made available through other places in the stdlib.
I think this specific one you're looking for is an example of not completely useless, but not portable enough to put anywhere safe, because linux documents the existence of IP_RECVERR, but not its value. So, you really need the version from your own system's ip headers.
The way to do this safely, if you actually need the IN module, is to run Tools/scripts/h2py.py with the Python version you're using, on the specific platform you need. That will generate an IN.py from the appropriate headers on your system (or on your cross-compilation target), which you can then use on that system. If you want to distribute your code, you'd probably need to put a step to do that into the setup.py, so it'll be run at install time (and at wheel-building time for people who install pre-built wheels, but you may need to be careful to make sure the targets are specific enough).
If you don't need to be particularly portable, and just need to access the one value in a few scripts that you're only deploying on your laptop or your company's set of identical containers or the like, you may be better off hardcoding the value (with a nice scare comment explaining the details).
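A minimal sketch of the hardcoding option might look like the following. The constant name LINUX_IP_RECVERR and the helper enable_recverr are invented for this example, and the value 11 is simply what the old IN module reported on the Linux system shown in the question, so treat it as an assumption to verify against your own headers.

```python
import socket

# WARNING: hardcoded platform constant.
# 11 is the value IN.IP_RECVERR had on the Linux system in the question.
# It comes from that system's IP headers and is NOT guaranteed elsewhere;
# re-check it (for example with Tools/scripts/h2py.py) before reusing it.
LINUX_IP_RECVERR = 11


def enable_recverr(sock):
    """Set the IP_RECVERR option on a socket (intended for Linux only)."""
    sock.setsockopt(socket.IPPROTO_IP, LINUX_IP_RECVERR, 1)
```

Calling enable_recverr on a platform other than the one the value was taken from may fail or set an unrelated option, which is exactly the risk the scare comment is there to flag.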

Python Version (sys.version) Meaning?

I've hit what should be a basic question... but my googles are failing me and I need a sanity check.
If I run the following in my Python shell:
>>> import sys
>>> sys.version
from two different Python environments, I get:
'2.7.8 (default, Nov 10 2014, 08:19:18) \n[GCC 4.9.2 20141101 (Red Hat 4.9.2-1)]'
and...
'2.7.8 (default, Apr 15 2015, 09:26:43) \n[GCC 4.9.2 20150212 (Red Hat 4.9.2-6)]'
Does that mean the two environments are actually running slightly different Python guts or is it enough that the '2.7.8' bit in that version string is the same so I can be confident these are 1:1 identical Python interpreters?
If I am guaranteed they are the same, then what's the significance of the date and other parts of that version output string?

All you need to compare is the first bit, the 2.7.8 string.
The differences you see are due to the compiler used to build the binary, and when the binary was built. That shouldn't really make a difference here.
The string is comprised of information you can find in machine-readable form elsewhere; specifically:
platform.python_version()
Returns the Python version as string 'major.minor.patchlevel'.
platform.python_build()
Returns a tuple (buildno, builddate) stating the Python build number and date as strings.
platform.python_compiler()
Returns a string identifying the compiler used for compiling Python.
For your sample strings, what differs is the date the binary was built (second value of the platform.python_build() tuple) and the exact revision of the GCC compiler used (from the platform.python_compiler() string). Only when there are specific problems with the compiler would this matter.
You should normally only care about the Python version information, which is more readily available as the sys.version_info tuple.
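For illustration, the pieces mentioned above can be collected in one place like this. The function name interpreter_summary is made up for the sketch; the platform and sys calls are the ones named in this answer.

```python
import platform
import sys


def interpreter_summary():
    """Gather the machine-readable parts that sys.version is built from."""
    return {
        "version": platform.python_version(),    # 'major.minor.patchlevel'
        "build": platform.python_build(),        # (buildno, builddate)
        "compiler": platform.python_compiler(),  # compiler identification
        "version_info": tuple(sys.version_info[:3]),
    }


summary = interpreter_summary()
for key, value in summary.items():
    print(key, "=", value)
```

Comparing summary["version_info"] between two environments is usually enough; the build and compiler entries are where the two sample strings in the question would differ.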

How can I use future division in the IDLE startup file

In Python 2.7, how can I make the IDLE app use __future__ division without typing from __future__ import division manually every time I start IDLE?
If I put from __future__ import division at the top of my .idlestartup file it is ignored, even though the other things in .idlestartup get executed. For example:
~> cat >.idlestartup
from __future__ import division
print("Executing .idlestartup")
~> idle -s
Here's what my IDLE window looks like after I try dividing:
Python 2.7.8 |Anaconda 2.1.0 (x86_64)| (default, Aug 21 2014, 15:21:46)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>>
Executing .idlestartup
>>> 2/3
0
>>>
I am using Mac OS X 10.9.5 Mavericks (also had the same problem on earlier versions of OS X). Note that the command line version above was included to make it easier to show what I'm talking about, but the version I'm more interested in is running the IDLE app from the GUI.
The solution suggested by Ashwini Chaudhary below worked for running the Anaconda version from the command line but not for running the IDLE app.
I was finally able to get future division working automatically in the IDLE app by adding "sys.argv.insert(1, '-Qnew')" to /Applications/IDLE.app/Contents/MacOS/IDLE. Both that and Ashwini Chaudhary's solution below seem brittle. I wonder if there is a cleaner way.

Adding the __future__ statement at the top of /usr/lib/python2.7/idlelib/PyShell.py did the job for me.
I am on Ubuntu, the path may vary for other OS:
>>> import idlelib
>>> idlelib.PyShell.__file__
'/usr/lib/python2.7/idlelib/PyShell.py'

Why is python so much slower on windows?

I learned about pystones today and so I decided to see what my various environments were like. I ran pystones on my laptop that is running windows on the bare metal and got these results
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from test import pystone
>>> for i in range(0,10):
...     pystone.pystones()
...
(1.636334799754252, 30556.094026423627)
(2.1157907919853756, 23631.82607155689)
(2.5324817108003685, 19743.479207278437)
(2.541626695533182, 19672.4405231788)
(2.536022267835051, 19715.915208695682)
(2.540327088340973, 19682.50475676099)
(2.544761766911506, 19648.20465716261)
(2.540296805235016, 19682.739393664764)
(2.533851636391205, 19732.804905346253)
(2.536483186973612, 19712.3325148696)
Then I ran it on some of our linux VMs and got 2.7-3.4 times better performance. So I fired up my vmware Linux VM on my laptop and reran the same test and got these results:
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(0,10):
...     pystone.pystones()
...
(1.75, 28571.428571428572)
(1.17, 42735.042735042734)
(1.6600000000000001, 30120.48192771084)
(1.8399999999999999, 27173.913043478264)
(1.8200000000000003, 27472.52747252747)
(1.8099999999999987, 27624.30939226521)
(1.3099999999999987, 38167.938931297744)
(1.7800000000000011, 28089.88764044942)
(1.8200000000000038, 27472.527472527414)
(1.490000000000002, 33557.04697986573)
I can't quite understand how the linux VM running inside the same windows is actually FASTER than python running on the same bare metal under windows.
What is so different about python on windows that it performs slower on the bare OS than it does inside a VM running Linux on the same box?
More details
Windows platform Win7x64
32 bit python running on both platforms
32 bit linux VM running on the windows platform in VMWare

I had a similar problem on Windows 10; it was caused by Windows Defender.
I had to exclude the Python directories and process in the Windows Defender settings and restart the computer.
Before, I had to wait around 20 seconds to run any Python code; now it takes milliseconds.

I can't answer your question, but consider this list of things that could be making a difference:
You're using different versions of Python. "2.7.2+" indicates that your linux Python was built from a version control checkout rather than a release.
They were compiled with different compilers (and conceivably meaningfully different optimization levels).
You haven't mentioned reproducing this much. It's conceivable it was a fluke if you haven't.
Your VM might be timing inaccurately.
You're linking different implementations of Python's dependencies, notably libc as Ignacio Vazquez-Abrams points out.
I don't know what pystone's actual benchmarks are like, but many things work differently--things like unicode handling or disk IO could be system-dependent factors.

Do you run antivirus software on that Windows box? This perhaps could explain it. I personally like to add Python, Cygwin and my sources directory to antivirus exclusion list - I think I get a small, but noticeable speedup. Maybe that explains your results.

Benchmark your startup, but there are just simply some slow modules to initialize on windows. A tiny hack that saves me a second on startup every time:
import os
import mimetypes  # mimetypes gets imported later in dep chain
if __name__ == "__main__":
    # stub this out, so registry db wont ever be read, not needed
    mimetypes._winreg = None
Another source of slowness is that multiple standard library modules compile and cache their regexes at import time, and re.compile just seems to be slow on Windows.
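To act on the "benchmark your startup" advice, one rough approach is to time individual imports. This is only a sketch for Python 3; the helper name time_import is invented here, and a module that is already cached in sys.modules will report close to zero.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name (near zero if cached)."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Run this in a fresh interpreter so the modules are not already loaded.
for name in ("mimetypes", "re", "os"):
    print("%-10s %.6f s" % (name, time_import(name)))
```

On Python 3.7 and later, running the interpreter with -X importtime gives a per-module breakdown without any extra code.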

Finding the architectures that Python was built for, but from within Python itself

Essentially I am looking for a way to find the following, but from within Python without having to run system commands:
$ file `which python2.7`
/Library/.../2.7/bin/python2.7: Mach-O universal binary with 2 architectures
/Library/.../2.7/bin/python2.7 (for architecture i386): Mach-O executable i386
/Library/.../2.7/bin/python2.7 (for architecture x86_64): Mach-O 64-bit executable x86_64
Something like:
>>> get_mac_python_archs()
['i386', 'x86_64']
>>>
Possible?

As far as I know, there is no truly reliable way other than to examine the executable files themselves to see which architectures have been lipo-ed together, in other words, what file does. While the distutils.util.get_platform() noted elsewhere probably comes the closest, it is based on configuration information at Python build time, and the criteria used have changed between releases and even among distributions of the same release.
For example, if you built a Python 2.6 on OS X 10.6 with the 4-way universal option (ppc, ppc64, i386, x86_64), get_platform() should report macosx-10.6-universal. However, the Apple-supplied Python 2.6 in OS X 10.6 reports the same string even though it is only a 3-way build (no ppc64). EDIT: That may not be the best example since, come to think of it, you probably couldn't build a ppc64 variant with the 10.6 SDK. However, the point still holds that the platform string is too context-dependent to be totally reliable. It may be reliable enough for some needs, though. Otherwise, calling out to file or otool etc. is likely the best way to go.

The function platform.machine() returns just the architecture that is currently running:
Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.machine()
'i386'
The nearest I can get is by using distutils.util.get_platform:
>>> import distutils.util
>>> distutils.util.get_platform()
'macosx-10.3-fat'
which is the full answer if you are using Python 2.7/3.2, as you can see in the documentation: "Starting from Python 2.7 and Python 3.2 the architecture fat3 is used for a 3-way universal build (ppc, i386, x86_64) and intel is used for a universal build with the i386 and x86_64 architectures."
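Pulling the in-process options together, a sketch like the one below reports what the running interpreter knows about itself. The function name running_arch_info is hypothetical, and sysconfig.get_platform() is used here as the sysconfig counterpart of the distutils call above. Note that none of this lists every slice in a universal binary; it only shows the slice in use and the build-time platform string.

```python
import platform
import struct
import sysconfig


def running_arch_info():
    """Describe the architecture of the interpreter that is running now."""
    return {
        "machine": platform.machine(),                # e.g. 'i386' or 'x86_64'
        "pointer_bits": struct.calcsize("P") * 8,     # 32 or 64 for this process
        "build_platform": sysconfig.get_platform(),   # build-time platform string
    }


info = running_arch_info()
for key, value in info.items():
    print(key, "=", value)
```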
