I am new to Python. I am adapting someone else's code from Python 2.x to 3.5. The code loads a file via cPickle. I changed all occurrences of "cPickle" to "pickle", as I understand pickle superseded cPickle in Python 3. I get this execution error:
NameError: name 'cPickle' is not defined
Pertinent code:
import pickle
import gzip
...
def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f, fix_imports=True)
    f.close()
    return (training_data, validation_data, test_data)
The error occurs on the pickle.load line when load_data() is called by another function. However, a) neither cPickle nor cpickle appears anywhere in the project's source files (searched globally), and b) the error does not occur if I run the lines within load_data() individually in the Python shell (though I do get a different data format error). Is pickle calling cPickle, and if so, how do I stop it?
Shell:
Python 3.5.0 |Anaconda 2.4.0 (x86_64)| (default, Oct 20 2015, 14:39:26)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
IDE: IntelliJ 15.0.1, Python 3.5.0, anaconda
Unclear how to proceed. Any help appreciated. Thanks.
Actually, pickled objects written by Python 2.x can generally be read by Python 3.x. Pickled objects written by Python 3.x can also generally be read by Python 2.x, but only if they were dumped with protocol 2 or less.
Python 2.7.10 (default, Sep 2 2015, 17:36:25)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> x = [1,2,3,4,5]
>>> import math
>>> y = math.sin
>>>
>>> import pickle
>>> f = open('foo.pik', 'w')
>>> pickle.dump(x, f)
>>> pickle.dump(y, f)
>>> f.close()
>>>
dude#hilbert>$ python3.5
Python 3.5.0 (default, Sep 15 2015, 23:57:10)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('foo.pik', 'rb') as f:
... x = pickle.load(f)
... y = pickle.load(f)
...
>>> x
[1, 2, 3, 4, 5]
>>> y
<built-in function sin>
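Going the other direction, here's a minimal sketch (my own example, not part of the session above) of dumping from Python 3 so that Python 2 can read it:
import pickle

x = [1, 2, 3, 4, 5]

# protocol 2 is the highest protocol Python 2 understands;
# files written with protocol 3 or higher cannot be unpickled by Python 2
with open('foo3.pik', 'wb') as f:
    pickle.dump(x, f, protocol=2)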
Also, if you are looking for cPickle, it's now _pickle, not pickle.
>>> import _pickle
>>> _pickle
<module '_pickle' from '/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload/_pickle.cpython-35m-darwin.so'>
>>>
You also asked how to stop pickle from using the built-in (C) version. You can do this by using _dump and _load, or the _Pickler class if you prefer to work with class objects. Confused? The old cPickle is now _pickle; however, dump, load, dumps, and loads all point to _pickle, while _dump, _load, _dumps, and _loads point to the pure-python version. For instance:
>>> import pickle
>>> # _dumps is a python function
>>> pickle._dumps
<function _dumps at 0x109c836a8>
>>> # dumps is a built-in (C)
>>> pickle.dumps
<built-in function dumps>
>>> # the Pickler points to _pickle (C)
>>> pickle.Pickler
<class '_pickle.Pickler'>
>>> # the _Pickler points to pickle (pure python)
>>> pickle._Pickler
<class 'pickle._Pickler'>
>>>
So if you don't want to use the built-in version, then you can use pickle._loads and the like.
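As a quick check (my own example), you can round-trip an object entirely through the pure-python implementation:
>>> import pickle
>>> data = pickle._dumps({'a': 1})   # pure-python pickler
>>> pickle._loads(data)              # pure-python unpickler
{'a': 1}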
It looks like the pickled data that you're trying to load was generated by a version of the program that was running on Python 2.7. It is the data that contains the references to cPickle.
The problem is that pickle, as a serialization format, assumes that your standard library (and to a lesser extent your code) won't change layout between serialization and deserialization. The standard library changed a lot between Python 2 and 3, and when that happens, pickle has no migration path.
Do you have access to the program that generated mnist.pkl.gz? If so, port it to Python 3 and re-run it to regenerate a Python 3-compatible version of the file.
If not, you'll have to write a Python 2 program that loads that file and exports it to a format that can be loaded from Python 3 (depending on the shape of your data, JSON and CSV are popular choices), then write a Python 3 program that loads that format and dumps it as a Python 3 pickle. You can then load that pickle file from your original program.
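As a rough sketch of that two-step conversion (the file names, and the assumption that each part is a tuple of numpy arrays, are mine; adapt to your data's actual shape):
# convert_export.py -- run under Python 2: load the old pickle, export JSON
import gzip, json, cPickle

with gzip.open('../data/mnist.pkl.gz', 'rb') as f:
    parts = cPickle.load(f)  # (training, validation, test)

with open('mnist.json', 'w') as f:
    # JSON can't store numpy arrays directly, so convert to plain lists
    json.dump([[a.tolist() for a in part] for part in parts], f)

# convert_import.py -- run under Python 3: load the JSON, write a new pickle
import json, pickle

with open('mnist.json') as f:
    training, validation, test = json.load(f)

with open('mnist_py3.pkl', 'wb') as f:
    pickle.dump((training, validation, test), f)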
Of course, what you should really do is stop at the point where you have the ability to load the exported format from Python 3, and use the aforementioned format as your actual, long-term storage format.
Using pickle for anything other than short-term serialization between trusted programs (loading a pickle is equivalent to running arbitrary code in your Python VM) is something you should actively avoid, among other reasons because of the exact situation you find yourself in.
In Anaconda Python 3.5, one can access cPickle as:
import _pickle as cPickle
(credits to Mike McKerns)
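With that alias in place, legacy Python 2 call sites should work unchanged; for example (a small sketch of my own, reusing the foo.pik file from above):
import _pickle as cPickle

with open('foo.pik', 'rb') as f:
    obj = cPickle.load(f)  # the same call the old Python 2 code made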
This bypasses the technical issues, but there might be a Python 3 version of that file named mnist_py3k.pkl.gz. If so, try opening that file instead.
There is code on GitHub that does it: https://gist.github.com/rebeccabilbro/2c7bb4d1acfbcdcf9156e7b9b7577cba
I have tried it and it worked. You just need to specify the encoding; in this case it is 'latin1':
pickle.load(open('mnist.pkl', 'rb'), encoding='latin1')
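Applied to the original load_data() (a sketch assuming the same gzipped MNIST file as in the question):
import pickle
import gzip

def load_data():
    # encoding='latin1' lets Python 3 decode the byte strings inside
    # a pickle that was written by Python 2
    with gzip.open('../data/mnist.pkl.gz', 'rb') as f:
        training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    return (training_data, validation_data, test_data)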
Related
I am creating an interpolation object with the "tri" module of matplotlib and would like to pickle it, because in the actual application it takes a long time to generate. Unfortunately, when I call the unpickled object, Python 2.7 crashes with a segfault.
I would like to know three things: 1) How do I pickle this LinearTriInterpolator object successfully? 2) Is the segfault due to my ignorance, a problem in matplotlib, or in pickle? 3) What is causing it?
I have created a simple test code; when called, the first interpolation returns 1.0, and the second, using the unpickled object, causes a segfault. cPickle shows the same behavior.
from pylab import *
from numpy import *
import cPickle as pickle
import matplotlib.tri as tri

# make points
x = array([0.0, 1.0, 1.0, 0.0])
y = array([0.0, 0.0, 1.0, 1.0])
z = x + y

# make triangulation
triPnts = tri.Triangulation(x, y)
theInterper = tri.LinearTriInterpolator(triPnts, z)

# test interpolator
print 'Iterped value is ', theInterper([0.5], [0.5])

# now pickle and unpickle interper
pickle.dump(theInterper, open('testPickle.pckl', 'wb'), -1)

# load pickle
unpickled_Interper = pickle.load(open('testPickle.pckl', 'rb'))

# and test
print 'Iterped value is ', unpickled_Interper([0.5], [0.5])
My python is:
Enthought Canopy Python 2.7.6 | 64-bit | (default, Sep 15 2014,
17:43:19) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
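A possible workaround (my own suggestion, not from this thread): rather than pickling the interpolator itself, pickle only its inputs and rebuild it on load. Keeping the precomputed triangle indices avoids re-running the Delaunay step:
# store the plain arrays plus the computed triangle indices;
# these pickle safely, unlike the interpolator object
pickle.dump((x, y, triPnts.triangles, z),
            open('testArrays.pckl', 'wb'), -1)

# later: rebuild the interpolator from the unpickled arrays
x2, y2, triangles, z2 = pickle.load(open('testArrays.pckl', 'rb'))
rebuilt = tri.LinearTriInterpolator(
    tri.Triangulation(x2, y2, triangles=triangles), z2)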
I'm using the Python requests library. My application performs a simple GET request to a site and prints the results.
The site requires NTLM authorization. Fortunately I can rely on HttpNtlmAuth, which works fine.
import requests
from requests_ntlm import HttpNtlmAuth

session = requests.Session()
session.auth = HttpNtlmAuth(domain + "\\" + username,
                            password,
                            session)
But if the application is executed several times, I need to ask for the username and password each time. This is very inconvenient. Storing credentials is undesirable.
Could I store the session object itself and reuse it several times? From the server's point of view, this should be fine.
Is there a way to pickle and unpickle session?
If you use the dill package, you should be able to pickle the session where pickle itself fails.
>>> import dill as pickle
>>> pickled = pickle.dumps(session)
>>> restored = pickle.loads(pickled)
Get dill here: https://github.com/uqfoundation/dill
Actually, dill also makes it easy to store your python session across restarts, so you could pickle your entire python session like this:
>>> pickle.dump_session('session.pkl')
Then restart python, and pick up where you left off.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill as pickle
>>> pickle.load_session('session.pkl')
>>> restored
<requests.sessions.Session object at 0x10c012690>
My understanding is that in PySide, QString has been dropped. One can write a Python string into a QLineEdit, and when the QLineEdit is read, it is returned as a unicode string (16 bits per character).
Trying to write this string from my GUI process to a sub-process started using QProcess does not seem to work and just returns 0L (see below). If one changes the unicode string back to a Python string using the str() function, then self.myprocess.write(str(u'test')) now returns 4L. This behaviour does not seem correct to me.
Would it be possible for someone to explain why QProcess.write() does not seem to work on unicode strings?
(Pdb) PySide.QtCore.QString()
*** AttributeError: 'module' object has no attribute 'QString'
(Pdb) self.myprocess.write(u'test')
0L
(Pdb) self.myprocess.write(str(u'test'))
4L
(Pdb)
PySide has never provided classes like QString, QStringList, QVariant, etc. It has always done implicit conversion to and from the equivalent python types - that is, in PyQt terminology, it only implements the v2 API (see PSEP 101 for more details).
However, the behaviour of QProcess when attempting to write unicode strings seems somewhat broken in PySide compared with PyQt4. Here's a simple test in PyQt4:
Python 2.7.8 (default, Sep 24 2014, 18:26:21)
[GCC 4.9.1 20140903 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyQt4 import QtCore
>>> QtCore.PYQT_VERSION_STR
'4.11.2'
>>> p = QtCore.QProcess()
>>> p.start('cat'); p.waitForStarted()
True
>>> p.write(u'fóó'); p.waitForReadyRead()
3L
True
>>> p.readAll()
PyQt4.QtCore.QByteArray('f\xf3\xf3')
So it seems that PyQt will implicitly encode unicode strings as 'latin-1' before passing them to QProcess.write() (which of course expects either const char * or a QByteArray). If you want a different encoding, it must be done explicitly:
>>> p.write(u'fóó'.encode('utf-8')); p.waitForReadyRead()
5L
True
>>> p.readAll()
PyQt4.QtCore.QByteArray('f\xc3\xb3\xc3\xb3')
Now let's see what happens with PySide:
Python 2.7.8 (default, Sep 24 2014, 18:26:21)
[GCC 4.9.1 20140903 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from PySide import QtCore, __version__
>>> __version__
'1.2.2'
>>> p = QtCore.QProcess()
>>> p.start('cat'); p.waitForStarted()
True
>>> p.write(u'fóó'); p.waitForReadyRead()
0L
^C
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyboardInterrupt
So: no implicit encoding, and the process just blocks instead of raising an error (which would seem to be a bug). However, re-trying with explicit encoding works as expected:
>>> p.start('cat'); p.waitForStarted()
True
>>> p.write(u'fóó'.encode('utf-8')); p.waitForReadyRead()
5L
True
>>> p.readAll()
PySide.QtCore.QByteArray('fóó')
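Until that's fixed, the safe pattern in PySide is to always encode explicitly before writing. A small helper of my own (the name and utf-8 default are assumptions) makes that hard to forget:
def qprocess_write(proc, text, encoding='utf-8'):
    # QProcess.write() expects bytes; PySide silently writes nothing
    # for unicode input, so encode up front
    return proc.write(text.encode(encoding))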
Can anyone explain why importing cv and numpy would change the behaviour of python's struct.unpack? Here's what I observe:
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from struct import pack, unpack
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
This is correct
>>> import cv
libdc1394 error: Failed to initialize libdc1394
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
Still ok, after importing cv
>>> import numpy
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
And OK after importing cv and then numpy
Now I restart python:
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from struct import pack, unpack
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
>>> import numpy
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
So far so good, but now I import cv AFTER importing numpy:
>>> import cv
libdc1394 error: Failed to initialize libdc1394
>>> unpack("f",pack("I",31))[0]
0.0
I've repeated this a number of times, including on multiple servers, and it always goes the same way. I've also tried it with struct.unpack and struct.pack, which also makes no difference.
I can't understand how importing numpy and cv could have any impact at all on the output of struct.unpack (pack remains the same, btw).
The "libdc1394" thing is, I believe, a red-herring: ctypes error: libdc1394 error: Failed to initialize libdc1394
Any ideas?
tl;dr: importing numpy and then opencv changes the behaviour of struct.unpack.
UPDATE: Paulo's answer below shows that this is reproducible. Seborg's comment suggests that it's something to do with the way python handles subnormals, which sounds plausible. I looked into Contexts, but that didn't seem to be the problem, as the context was the same after the imports as it was before them.
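A quick way to test the subnormal theory (my own check, in the same style as the sessions above): unpack a bit pattern that is guaranteed subnormal and see whether it comes back as 0.0:
>>> from struct import pack, unpack
>>> # bit pattern 1 is the smallest positive subnormal float32; if the
>>> # FPU flushes subnormals to zero (FTZ/DAZ), this returns 0.0 instead
>>> unpack("f", pack("I", 1))[0]
1.401298464324817e-45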
This isn't an answer, but it's too big for a comment. I played with the values a bit to find the limits.
Without loading numpy and cv:
>>> unpack("f", pack("i", 8388608))
(1.1754943508222875e-38,)
>>> unpack("f", pack("i", 8388607))
(1.1754942106924411e-38,)
After loading numpy and cv, the first line is the same, but the second:
>>> unpack("f", pack("i", 8388607))
(0.0,)
You'll notice that the first result is the smallest normal 32-bit float; anything below it is subnormal. I then tried the same with d.
Without loading the libraries:
>>> unpack("d", pack("xi", 1048576))
(2.2250738585072014e-308,)
>>> unpack("d", pack("xi", 1048575))
(2.2250717365114104e-308,)
And after loading the libraries:
>>> unpack("d",pack("xi", 1048575))
(0.0,)
Now the first result is the smallest normal 64-bit float.
It seems that for some reason, loading the numpy and cv libraries, in that order, constrains unpack to normal 32- and 64-bit precision, returning 0 for the subnormal values below those limits.
I'm trying to write a Python program to deal with RSS; however, I'm having some issues downloading the files directly from the internet.
I am using urllib.request.urlopen() to get the files. Here is the bit of code that I am having trouble with:
import xml.etree.ElementTree as et
import urllib.request as urlget
self.sourceUrl = sourceUrl #sourceUrl was an argument
self.root = et.fromstring(urlget.urlopen(sourceUrl).read())
I have tracked the problem down to a single line:
urllib.request.urlopen calls urllib.request.opener.open()
which then calls self._open()
which then calls self._call_chain()
which then calls urllib.request.HTTPHandler.http_open()
which then calls urllib.request.AbstractHTTPHandler.do_open()
which then calls http.client.HTTPConnection.getresponse()
which then calls http.client.HTTPResponse.begin()
which then calls self._read_status()
Problem line (found by being the only line to appear upon pausing execution many times):
Python33\Lib\http\client.py Line 317
if len(line) > _MAXLINE:
I can continue the code, but only if I babysit it with Step Over until I get back to my own code.
In my tests this problem never occurred, so I can't think of why I am getting it now.
Thanks in advance for any help!
EDIT: The source can be found here. I lost motivation to work on this project quite some time ago and haven't touched it since. I might redo the entire thing if I get some more motivation, but I don't expect to any time soon. If you wish to answer, I invite you to have at it; it might be beneficial to others. Be warned, however, that the code is terrible, as at the time I had relatively little experience. I can't really find my way around it myself, but I've figured out that you have to look at data/code/functions.py
Also note that, as far as I can remember, it wasn't raising an error; the program was just hanging for minutes at a time before I got impatient.
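If the request itself is what hangs, one thing worth trying (my suggestion, not from the thread) is passing a timeout so a stalled server raises an error instead of blocking indefinitely:
import urllib.request as urlget

# raises socket.timeout / URLError instead of hanging forever
data = urlget.urlopen(sourceUrl, timeout=10).read()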
Without more code, it will be hard to help you. What is the URL of your feed? What does it return when you try to simply access it?
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as et
>>> import urllib.request as urlget
>>> sourceurl = "http://www.la-grange.net/feed"
>>> root = et.fromstring(urlget.urlopen(sourceurl).read())
>>> root
<Element '{http://www.w3.org/2005/Atom}feed' at 0x1013a82b8>
>>>