Why is Python so much slower on Windows?

I learned about pystones today, so I decided to see what my various environments were like. I ran pystones on my laptop, which runs Windows on bare metal, and got these results:
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from test import pystone
>>> for i in range(0,10):
...     pystone.pystones()
...
(1.636334799754252, 30556.094026423627)
(2.1157907919853756, 23631.82607155689)
(2.5324817108003685, 19743.479207278437)
(2.541626695533182, 19672.4405231788)
(2.536022267835051, 19715.915208695682)
(2.540327088340973, 19682.50475676099)
(2.544761766911506, 19648.20465716261)
(2.540296805235016, 19682.739393664764)
(2.533851636391205, 19732.804905346253)
(2.536483186973612, 19712.3325148696)
Then I ran it on some of our Linux VMs and got 2.7-3.4 times better performance. So I fired up the VMware Linux VM on my laptop, reran the same test, and got these results:
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(0,10):
...     pystone.pystones()
...
(1.75, 28571.428571428572)
(1.17, 42735.042735042734)
(1.6600000000000001, 30120.48192771084)
(1.8399999999999999, 27173.913043478264)
(1.8200000000000003, 27472.52747252747)
(1.8099999999999987, 27624.30939226521)
(1.3099999999999987, 38167.938931297744)
(1.7800000000000011, 28089.88764044942)
(1.8200000000000038, 27472.527472527414)
(1.490000000000002, 33557.04697986573)
I can't understand how a Linux VM running inside Windows on the same machine can actually be FASTER than Python running directly under Windows on the same bare metal.
What is so different about python on windows that it performs slower on the bare OS than it does inside a VM running Linux on the same box?
More details:
Windows platform: Win7 x64
32-bit Python running on both platforms
32-bit Linux VM running in VMware on the Windows host

I had a similar problem on Windows 10; it was caused by Windows Defender.
I had to exclude the Python directories and the Python process in the Windows Defender settings and restart the computer.
Before, I had to wait about 20 seconds to run any Python code; now it takes milliseconds.

I can't answer your question, but consider this list of things that could be making a difference:
You're using different versions of Python. "2.7.2+" indicates that your linux Python was built from a version control checkout rather than a release.
They were compiled with different compilers (and conceivably meaningfully different optimization levels).
You haven't mentioned reproducing this much. It's conceivable it was a fluke if you haven't.
Your VM might be timing inaccurately.
You're linking different implementations of Python's dependencies, notably libc as Ignacio Vazquez-Abrams points out.
I don't know what pystone's actual benchmarks are like, but many things work differently across systems: unicode handling or disk I/O, for example, could be system-dependent factors.
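To act on the first few points, it helps to record exactly which interpreter each environment is running and to time the same workload with both a wall clock and a CPU clock. This is a minimal sketch assuming Python 3.3+ (time.perf_counter and time.process_time do not exist on the 2.7 interpreters in the question); busy_loop is a hypothetical stand-in for pystone. One detail that may matter here: Python 2's pystone timed itself with time.clock(), which measured wall-clock time on Windows but CPU time on Unix, and every Linux figure above is a multiple of 0.01 s, which is consistent with a coarse CPU clock. Treat that as a lead to check, not a conclusion.

```python
import platform
import sys
import time


def describe_interpreter():
    """Collect the build details that may differ between two environments."""
    return {
        "version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "compiler": platform.python_compiler(),
        "bits": platform.architecture()[0],
        "pointer_size_64": sys.maxsize > 2**32,
        "system": platform.system(),
    }


def busy_loop(n=200000):
    """Hypothetical CPU-bound workload standing in for pystone."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_both_clocks(func, *args):
    """Return (wall_seconds, cpu_seconds) for one call of func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("%s: %s" % (key, value))
    wall, cpu = time_both_clocks(busy_loop)
    print("wall %.4fs, cpu %.4fs" % (wall, cpu))
```

If the wall and CPU numbers diverge a lot inside the VM but not on the host, the clocks rather than the interpreters are the first thing to suspect.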

Do you run antivirus software on that Windows box? That could perhaps explain it. I like to add Python, Cygwin and my source directories to the antivirus exclusion list; I think I get a small but noticeable speedup.

Benchmark your startup first, but be aware that some modules are simply slow to initialize on Windows. Here is a tiny hack that saves me a second on startup every time (it relies on a private attribute of mimetypes, so it may not work on every Python version):
import mimetypes  # mimetypes gets imported later in the dependency chain anyway
if __name__ == "__main__":
    # Stub this out so the registry database is never read; it isn't needed.
    mimetypes._winreg = None
Another source of slowness is that several standard library modules compile and cache their regexes at import time, and re.compile just seems to be slow on Windows.
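To benchmark startup before trying hacks like the one above, one option is to launch fresh interpreters and keep the best of a few runs. A minimal sketch, assuming Python 3.3+ for time.perf_counter; the mimetypes.init() comparison is only an example workload. On Python 3.7+, running python -X importtime -c "import mimetypes" gives a per-module breakdown instead.

```python
import subprocess
import sys
import time


def startup_seconds(code="pass", runs=5):
    """Best wall-clock time to start a fresh interpreter that runs `code`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # sys.executable is the interpreter running this script.
        subprocess.check_call([sys.executable, "-c", code])
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    baseline = startup_seconds("pass")
    with_mimetypes = startup_seconds("import mimetypes; mimetypes.init()")
    print("bare startup:       %.3fs" % baseline)
    print("+ mimetypes.init(): %.3fs" % with_mimetypes)
```

Taking the minimum rather than the mean filters out runs that were slowed by unrelated activity, such as an antivirus scan of the interpreter.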

Related

What is the replacement for python IN package?

I am trying to use code that was written for Python 2 and may run with Python 3.6.0, but it does not run with Python 3.6.4. It imports the IN module and uses IN.IP_RECVERR. I tried to google it, but it is a bit hard to find anything about a module called IN (naming fail?). To demonstrate in the REPL that it works in Python 2 but not in 3.6.4:
$ python2
Python 2.7.14 (default, Jan 5 2018, 10:41:29)
[GCC 7.2.1 20171224] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import IN
>>> IN.IP_RECVERR
11
>>>
$ python3
Python 3.6.4 (default, Jan 5 2018, 02:35:40)
[GCC 7.2.1 20171224] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import IN
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'IN'
>>>
What is the replacement for this IN module in newer versions of python 3?
This is presumably the private plat-linux/IN.py module, which was never intended to be used. There have been plans to remove these plat-* files for a few zillion years, but it looks like it finally happened in issue 28027 for 3.6. As mentioned in What's New in Python 3.6:
The undocumented IN, CDROM, DLFCN, TYPES, CDIO, and STROPTS modules have been removed. They had been available in the platform specific Lib/plat-*/ directories, but were chronically out of date, inconsistently available across platforms, and unmaintained. The script that created these modules is still available in the source distribution at Tools/scripts/h2py.py.
Most of the useful constants that are at least somewhat portable (as in you can expect them to be available and work the same on your old laptop's linux and your brand-new Galaxy's linux, if not on OS X or Solaris) have long been made available through other places in the stdlib.
I think the specific constant you're looking for is an example of one that isn't completely useless but isn't portable enough to put anywhere safe: Linux documents the existence of IP_RECVERR, but not its value. So you really need the value from your own system's IP headers.
The way to do this safely, if you actually need the IN module, is to run Tools/scripts/h2py.py with the Python version you're using, on the specific platform you need. That will generate an IN.py from the appropriate headers on your system (or on your cross-compilation target), which you can then use on that system. If you want to distribute your code, you'd probably need to put a step to do that into the setup.py, so it'll be run at install time (and at wheel-building time for people who install pre-built wheels, but you may need to be careful to make sure the targets are specific enough).
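To make the h2py idea concrete, here is a tiny sketch (not h2py itself, which handles far more cases) that pulls simple integer #define constants out of C header lines. The function name is hypothetical, and anything beyond a plain decimal, hex, or octal literal is deliberately skipped.

```python
import re

# Matches object-like macros with a plain integer value, e.g. "#define IP_RECVERR 11".
# Function-like macros and expressions are not matched.
DEFINE_RE = re.compile(
    r"^\s*#\s*define\s+([A-Za-z_]\w*)\s+(0[xX][0-9a-fA-F]+|\d+)\b"
)


def _c_int(literal):
    """Convert a C integer literal (decimal, 0x hex, or leading-0 octal)."""
    if literal.lower().startswith("0x"):
        return int(literal, 16)
    if len(literal) > 1 and literal.startswith("0"):
        return int(literal, 8)
    return int(literal)


def parse_int_defines(lines):
    """Return {name: int} for the simple integer #defines in an iterable of lines."""
    constants = {}
    for line in lines:
        match = DEFINE_RE.match(line)
        if match:
            name, value = match.groups()
            constants[name] = _c_int(value)
    return constants
```

For example, with open("/usr/include/linux/in.h") as f: print(parse_int_defines(f).get("IP_RECVERR")). The header path is an assumption; which header defines IP_RECVERR varies by distribution and libc.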
If you don't need to be particularly portable (you just need to access the one value in a few scripts that you're only deploying on your laptop, your company's set of identical containers, or the like), you may be better off hardcoding the values (with a nice scare comment explaining the details).
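A minimal sketch of the hardcoding approach. The value 11 is what IN.IP_RECVERR printed in the Python 2 session on Linux above; the getattr lookup is there in case your Python's socket module happens to expose the constant (whether it does depends on the version, so don't rely on it). The function name is hypothetical.

```python
import socket
import sys

# SCARE COMMENT: 11 is IP_RECVERR taken from the Linux IP headers (it matches
# IN.IP_RECVERR in the Python 2 session above). Linux documents the option but
# not its numeric value, so re-check this against your own system's headers
# before deploying anywhere new. Do NOT assume it on other operating systems.
_LINUX_IP_RECVERR = 11


def ip_recverr():
    """Return the IP_RECVERR socket option number for this platform."""
    # Prefer the stdlib constant if this Python build provides it.
    value = getattr(socket, "IP_RECVERR", None)
    if value is not None:
        return value
    if sys.platform.startswith("linux"):
        return _LINUX_IP_RECVERR
    raise OSError("IP_RECVERR value unknown on %s" % sys.platform)
```

Usage would look like sock.setsockopt(socket.IPPROTO_IP, ip_recverr(), 1) on an AF_INET socket.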

How can I use __future__ division in the IDLE startup file

In Python 2.7, how can I make the IDLE app use __future__ division without typing from __future__ import division manually every time I start IDLE?
If I put from __future__ import division at the top of my .idlestartup file, it is ignored, even though the other things in .idlestartup get executed. For example:
~> cat >.idlestartup
from __future__ import division
print("Executing .idlestartup")
~> idle -s
Here's what my IDLE window looks like after I try dividing:
Python 2.7.8 |Anaconda 2.1.0 (x86_64)| (default, Aug 21 2014, 15:21:46)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>>
Executing .idlestartup
>>> 2/3
0
>>>
I am using Mac OS X 10.9.5 Mavericks (also had the same problem on earlier versions of OS X). Note that the command line version above was included to make it easier to show what I'm talking about, but the version I'm more interested in is running the IDLE app from the GUI.
The solution suggested by Ashwini Chaudhary below worked for running the Anaconda version from the command line but not for running the IDLE app.
I was finally able to get future division working automatically in the IDLE app by adding "sys.argv.insert(1, '-Qnew')" to /Applications/IDLE.app/Contents/MacOS/IDLE. Both that and Ashwini Chaudhary's solution below seem brittle. I wonder if there is a cleaner way.
Adding the __future__ statement at the top of /usr/lib/python2.7/idlelib/PyShell.py did the job for me.
I am on Ubuntu; the path may vary on other operating systems:
>>> import idlelib.PyShell
>>> idlelib.PyShell.__file__
'/usr/lib/python2.7/idlelib/PyShell.py'

Print colorized output - working from console but not from script

I have a weird problem that I cannot put my finger on. There is a program that I use (and contribute to from time to time) that has colorized console output. Everything worked great until I reinstalled Windows. Now I cannot get colorized output.
This is the script that is used for colorizing.
I have managed to narrow the problem down to a more or less simple situation, but I have no idea what is wrong.
This is a console session that works as expected (the string test is printed in red):
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.insert(0, r'c:\bin\SV\tea\src')
>>> from tea.console.color import cprint, Color
>>> cprint('test\n', Color.red)
test
>>>
But when I run the following script with the same version of Python, I get the output test, but not in red (there is no color, just the default console color):
import sys
sys.path.insert(0, r'c:\bin\SV\tea\src')
from tea.console.color import cprint, Color
cprint('test\n', Color.red)
The same setup worked before I reinstalled my system.
I have checked: the environment variables in interactive mode and in the script are the same.
I have tried this in the standard Windows command prompt and in Console, the program that I usually use.
The OS in question is Windows 8, and before the reinstall it was also Windows 8.
The same code with the same setup works on my computer at work (Windows 7).
I have Python 2.7 and Python 3.3 installed (as I did before). I have tried running the script by calling the Python interpreter directly (c:\Python27\python.exe) or with py -2, but it does not help.
IPython and Mercurial colorize output as they should.
Any ideas what can I try to make this work?
Edit:
Maybe it was not clear, but the script I use to colorize output is linked in the question. Here it is once again:
https://bitbucket.org/alefnula/tea/src/dc14009a19d66f92463549332a321b29c71d47b8/src/tea/console/color.py?at=default
I have found the problem and solution.
I believe the problem was a bug in the x64 ctypes module. I had Python 2.7 x64 installed, and with that version the following line (from the script linked in the question):
ctypes.windll.kernel32.SetConsoleTextAttribute(std_out_handle, code)
returns error code 6 with the description "The handle is invalid". After some investigation, I deduced that the problem might be the x64 version of Python, so I installed the 32-bit version and everything works as expected.
Since this solves my problem, and I do not have the time for deeper analysis I will leave it at this, just wanted to give some kind of resolution for question.
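An alternative explanation worth checking (an assumption on my part; I have not verified it against the linked color.py): ctypes treats a foreign function as returning a C int unless its restype is set, so on 64-bit Python a HANDLE returned by GetStdHandle can be truncated, which would be consistent with "The handle is invalid". A minimal sketch of declaring the types explicitly, with a hypothetical function name; the constants are the standard Windows API values.

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Windows API constant for standard output
FOREGROUND_RED = 0x0004
FOREGROUND_INTENSITY = 0x0008


def set_console_attribute(code):
    """Set the console text attribute; returns False when not on Windows."""
    if sys.platform != "win32":
        return False
    from ctypes import wintypes  # imported here: only reliable on Windows

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Without these declarations ctypes assumes a C int return value,
    # which can truncate a 64-bit HANDLE on 64-bit Python.
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.SetConsoleTextAttribute.argtypes = [wintypes.HANDLE, wintypes.WORD]
    kernel32.SetConsoleTextAttribute.restype = wintypes.BOOL

    # DWORD is unsigned, so pass -11 in its 32-bit two's-complement form.
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE & 0xFFFFFFFF)
    if not kernel32.SetConsoleTextAttribute(handle, code):
        raise ctypes.WinError(ctypes.get_last_error())
    return True
```

If this works on the 64-bit build, it would suggest the handle declaration, rather than ctypes itself, was the problem.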

List indexing efficiency (python 2 vs python 3)

In answering another question, I suggested using timeit to test the difference between indexing a list with positive integers vs. negative integers. Here's the code:
import timeit
t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=10000000)
print (t)
t=timeit.timeit('mylist[-1]',setup='mylist=list(range(100))',number=10000000)
print (t)
I ran this code with python 2.6:
$ python2.6 test.py
0.587687015533
0.586369991302
Then I ran it with python 3.2:
$ python3.2 test.py
0.9212150573730469
1.0225799083709717
Then I scratched my head, did a little Google searching, and decided to post these observations here.
Operating system: OS X (10.5.8), Intel Core 2 Duo
That seems like a pretty significant difference to me (a factor of over 1.5 difference). Does anyone have an idea why python3 is so much slower -- especially for such a common operation?
EDIT
I've run the same code on my Ubuntu Linux desktop (Intel i7) and achieved comparable results with python2.6 and python 3.2. It seems that this is an issue which is operating system (or processor) dependent (Other users are seeing the same behavior on Linux machines -- See comments).
EDIT 2
The startup banner was requested in one of the answers, so here goes:
Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
and:
Python 3.2 (r32:88452, Feb 20 2011, 10:19:59)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
UPDATE
I've just installed fresh versions of python2.7.3 and python3.2.3 from http://www.python.org/download/
In both cases, I took the
"Python x.x.3 Mac OS X 32-bit i386/PPC Installer (for Mac OS X 10.3 through 10.6 [2])"
since I am on OS X 10.5. Here are the new timings (which are reasonably consistent through multiple trials):
python 2.7
$python2.7 test.py
0.577006101608
0.590042829514
python 3.2.3
$python3.2 test.py
0.8882801532745361
1.034242868423462
This appears to be an artifact of some builds of Python 3.2. The best hypothesis at this point is that all 32-bit Intel builds have the slowdown, but no 64-bit ones do. Read on for further details.
You didn't run nearly enough tests to determine anything. Repeating your test a bunch of times, I got values ranging from 0.31 to 0.54 for the same test, which is a huge variation.
So, I ran your test with 10x the number, and repeat=10, using a bunch of different Python2 and Python3 installs. Throwing away the top and bottom results, averaging the other 8, and dividing by 10 (to get a number equivalent to your tests), here's what I saw:
1. 0.52/0.53 Lion 2.6
2. 0.49/0.50 Lion 2.7
3. 0.48/0.48 MacPorts 2.7
4. 0.39/0.49 MacPorts 3.2
5. 0.39/0.48 HomeBrew 3.2
So, it looks like 3.2 is actually slightly faster with [99], and about the same speed with [-1].
However, on a 10.5 machine, I got these results:
1. 0.98/1.02 MacPorts 2.6
2. 1.47/1.59 MacPorts 3.2
Back on the original (Lion) machine, I ran in 32-bit mode, and got this:
1. 0.50/0.48 Homebrew 2.7
2. 0.75/0.82 Homebrew 3.2
So, it seems like 32-bitness is what matters, and not Leopard vs. Lion, gcc 4.0 vs. gcc 4.2 or clang, hardware differences, etc. It would help to test 64-bit builds under Leopard, with different compilers, etc., but unfortunately my Leopard box is a first-gen Intel Mini (with a 32-bit Core Solo CPU), so I can't do that test.
As further circumstantial evidence, I ran a whole slew of other quick tests on the Lion box, and it looks like 32-bit 3.2 is ~50% slower than 2.x, while 64-bit 3.2 is maybe a little faster than 2.x. But if we really want to back that up, someone needs to pick and run a real benchmark suite.
Anyway, my best guess at this point is that when optimizing the 3.x branch, nobody put much effort into 32-bit i386 Mac builds. Which is actually a reasonable choice for them to have made.
Or, alternatively, they didn't even put much effort into 32-bit i386 period. That possibility might explain why the OP saw 2.x and 3.2 giving similar results on a linux box, while Otto Allmendinger saw 3.2 being similarly slower to 2.6 on a linux box. But since neither of them mentioned whether they were running 32-bit or 64-bit linux, it's hard to know whether that's relevant.
There are still lots of other different possibilities that we haven't ruled out, but this seems like the best one.
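The measurement procedure described in this answer (10 times the number of loops, repeat=10, drop the fastest and slowest runs, average the remaining 8, divide by 10) can be sketched with timeit.repeat as follows. The function name and defaults are my own; with the defaults each statement runs 10 x 100 million times, so expect the script to take a while.

```python
import timeit


def trimmed_mean_timing(stmt, setup, number=10000000, repeat=10, scale=10):
    """Run stmt `repeat` times with number*scale loops each, drop the fastest
    and slowest run, average the rest, and divide by `scale` so the result is
    comparable to a single timeit.timeit(number=number) call."""
    if repeat < 3:
        raise ValueError("need at least 3 runs to drop the extremes")
    runs = sorted(timeit.repeat(stmt, setup=setup,
                                number=number * scale, repeat=repeat))
    middle = runs[1:-1]
    return sum(middle) / len(middle) / scale


if __name__ == "__main__":
    setup = "mylist = list(range(100))"
    for stmt in ("mylist[99]", "mylist[-1]"):
        print(stmt, trimmed_mean_timing(stmt, setup))
```

Dropping the extremes makes the average less sensitive to a single run disturbed by other activity on the machine, which matters given the 0.31 to 0.54 spread reported above.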
Here is some code that illustrates at least part of the answer:
$ python
Python 2.7.3 (default, Apr 20 2012, 22:44:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
2.55517697334
>>> t=timeit.timeit('mylist[99L]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.89904499054
$ python3
Python 3.2.3 (default, May 3 2012, 15:54:42)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.9906489849090576
Python 3 does not have the old fixed-size int type: every integer is what Python 2 called long, so mylist[99] in Python 3 performs like mylist[99L] in Python 2 (compare the timings above).
Python 3's range() is Python 2's xrange(). If you want to simulate Python 2's range() in Python 3 code, you have to use list(range(num)). The bigger num is, the bigger the difference you will observe with your original code.
Indexing should be independent of what is stored inside the list, as the list stores only references to the target objects. The references are untyped and all of the same kind, so the list type is, technically, a homogeneous data structure. Indexing means turning the index value into the start address plus an offset. Calculating the offset is very efficient, with at most one extra arithmetic step for a negative index. This is a very cheap extra operation compared with the other operations.
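A pure-Python model of the offset calculation described above (a sketch of the idea only, not CPython's actual C code; the function name is hypothetical): a negative index costs one extra addition of the length, and after that both cases go through the same range check.

```python
def normalize_index(index, length):
    """Turn a (possibly negative) sequence index into an offset from the start."""
    if index < 0:
        index += length  # the single extra step that negative indexes pay
    if not 0 <= index < length:
        raise IndexError("list index out of range")
    return index
```

For a 100-element list, normalize_index(99, 100) and normalize_index(-1, 100) both return 99, which is why [99] and [-1] should cost almost the same.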

Python Module To Detect Linux Distro Version

Is there an existing Python module that can be used to detect which Linux distribution, and which version of it, is currently installed?
For example:
RedHat Enterprise 5
Fedora 11
Suse Enterprise 11
etc....
I can make my own module by parsing various files like /etc/redhat-release but I was wondering if a module already exists?
Cheers,
Ivan
Look up the docs for the platform module: http://docs.python.org/library/platform.html
Example:
>>> import platform
>>> platform.uname()
('Linux', 'localhost', '2.6.31.5-desktop-1mnb', '#1 SMP Fri Oct 23 00:05:22 EDT 2009', 'x86_64', 'AMD Athlon(tm) 64 X2 Dual Core Processor 3600+')
>>> platform.linux_distribution()
('Mandriva Linux', '2010.0', 'Official')
I've written a package called distro (now used by pip) which aims to replace platform.linux_distribution. It works on many distributions for which platform might return weird or empty tuples.
https://github.com/nir0s/distro (distro, on pypi)
It provides a much more elaborate API to retrieve distribution related information.
$ python
Python 2.7.12 (default, Nov 7 2016, 11:55:55)
[GCC 6.2.1 20160830] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import distro
>>> distro.linux_distribution()
(u'Antergos Linux', '', u'ARCHCODE')
By the way, platform.linux_distribution was deprecated in Python 3.5 and removed in Python 3.8.
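Since platform.linux_distribution is gone, a best-effort sketch that prefers the distro package when it is installed and otherwise falls back to platform.freedesktop_os_release(), which reads /etc/os-release and exists only on Python 3.10+. The function name and the empty-string fallback are my own choices.

```python
import platform


def detect_distro():
    """Return (name, version) for the running Linux distribution, best effort."""
    try:
        import distro  # third-party package: pip install distro
        return distro.name(), distro.version()
    except ImportError:
        pass
    if hasattr(platform, "freedesktop_os_release"):  # Python 3.10+
        try:
            info = platform.freedesktop_os_release()
            return info.get("NAME", ""), info.get("VERSION_ID", "")
        except OSError:  # no os-release file on this system
            pass
    return "", ""
```

Returning empty strings rather than raising keeps the caller's code simple on systems (or containers) that ship no os-release file.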
The above answer doesn't work on RHEL 5.x. The quickest way on a Red Hat-like system is to read the /etc/redhat-release file. This file is updated every time you run an update and the system gets upgraded by a minor release number.
$ python
>>> open('/etc/redhat-release','r').read().split(' ')[6].split('.')
['5', '5']
If you take out the split calls, it will just give you the whole string. It's not a module like you asked for, but I figured it was short and elegant enough that you may find it useful.
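Picking word 6 only works while the file's wording stays exactly the same. A slightly sturdier sketch, assuming the file contains the word "release" followed by the version number (true for a line like "Red Hat Enterprise Linux Server release 5.5 (Tikanga)", but not guaranteed for every Red Hat-like distribution), searches for that pattern instead. The function name is hypothetical.

```python
import re


def parse_redhat_release(text):
    """Extract (major, minor) from text like
    'Red Hat Enterprise Linux Server release 5.5 (Tikanga)'.
    minor is None when the line has no minor version."""
    match = re.search(r"release\s+(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no 'release N[.M]' found in %r" % text)
    return match.group(1), match.group(2)


if __name__ == "__main__":
    with open("/etc/redhat-release") as release_file:
        print(parse_redhat_release(release_file.read()))
```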
Might not be the best way, but I used subprocess to execute 'uname -v' and then looked for the distro name in the output.
import subprocess
process = subprocess.Popen(['uname', '-v'], stdout=subprocess.PIPE)
stdout = process.communicate()[0]
# decode() turns the raw bytes into text (needed on Python 3)
uname_output = stdout.decode().rstrip("\n")
if 'FreeBSD' in uname_output:
    print("It's FreeBSD")
elif 'Ubuntu' in uname_output:
    print("It's Ubuntu")
elif 'Darwin' in uname_output:
    print("It's a Mac")
else:
    print("Unknown distro")
