Store python requests session in persistent storage

I'm using the python requests library. My application performs a simple GET request to a site and prints the results.
The site requires NTLM authorization. Fortunately I can rely on HttpNtlmAuth, which works fine.
import requests
from requests_ntlm import HttpNtlmAuth

session = requests.Session()
session.auth = HttpNtlmAuth(domain + "\\" + username,
                            password,
                            session)
But each time the application runs, I need to ask for the username and password again, which is inconvenient, and storing the credentials is undesirable.
Could I store the session object itself and reuse it several times? From the server's point of view that should be fine.
Is there a way to pickle and unpickle a session?

If you use the dill package, you should be able to pickle the session where pickle itself fails.
>>> import dill as pickle
>>> pickled = pickle.dumps(session)
>>> restored = pickle.loads(pickled)
Get dill here: https://github.com/uqfoundation/dill
Actually, dill also makes it easy to store your entire python interpreter session across restarts, so you could pickle it like this:
>>> pickle.dump_session('session.pkl')
Then restart python, and pick up where you left off.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill as pickle
>>> pickle.load_session('session.pkl')
>>> restored
<requests.sessions.Session object at 0x10c012690>
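To persist the session between runs, the pickled bytes can be written to disk. A minimal sketch (the save/load helper names and the file name are illustrative; it falls back to the standard pickle when dill is not installed):

```python
import pickle

try:
    import dill as pickler  # dill can serialize objects plain pickle cannot
except ImportError:
    pickler = pickle

def save_session(obj, path="session.pkl"):
    # write the pickled object to disk so it survives restarts
    with open(path, "wb") as f:
        pickler.dump(obj, f)

def load_session(path="session.pkl"):
    # read the pickled object back from disk
    with open(path, "rb") as f:
        return pickler.load(f)
```

Note that a pickled session carries the NTLM auth object (credentials included) inside it, so the file should be protected as carefully as a saved password would be.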

Related

Python pickle calls cPickle?

I am new to Python. I am adapting someone else's code from Python 2.X to 3.5. The code loads a file via cPickle. I changed all "cPickle" occurrences to "pickle", as I understand pickle superseded cPickle in 3.5. I get this execution error:
NameError: name 'cPickle' is not defined
Pertinent code:
import pickle
import gzip
...
def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f, fix_imports=True)
    f.close()
    return (training_data, validation_data, test_data)
The error occurs in the pickle.load line when load_data() is called by another function. However, (a) neither cPickle nor cpickle appears in any source file anywhere in the project (searched globally), and (b) the error does not occur if I run the lines within load_data() individually in the Python shell (though I do get another data-format error). Is pickle calling cPickle, and if so, how do I stop it?
Shell:
Python 3.5.0 |Anaconda 2.4.0 (x86_64)| (default, Oct 20 2015, 14:39:26)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
IDE: IntelliJ 15.0.1, Python 3.5.0, anaconda
Unclear how to proceed. Any help appreciated. Thanks.
Actually, if you have objects pickled by Python 2.x, they can generally be read by Python 3.x. Likewise, objects pickled by Python 3.x can generally be read by Python 2.x, but only if they were dumped with protocol 2 or lower.
Python 2.7.10 (default, Sep 2 2015, 17:36:25)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> x = [1,2,3,4,5]
>>> import math
>>> y = math.sin
>>>
>>> import pickle
>>> f = open('foo.pik', 'w')
>>> pickle.dump(x, f)
>>> pickle.dump(y, f)
>>> f.close()
>>>
dude#hilbert>$ python3.5
Python 3.5.0 (default, Sep 15 2015, 23:57:10)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('foo.pik', 'rb') as f:
... x = pickle.load(f)
... y = pickle.load(f)
...
>>> x
[1, 2, 3, 4, 5]
>>> y
<built-in function sin>
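In Python 3 today the same idea is expressed by pinning the protocol when dumping, so the result stays readable from Python 2; a minimal sketch:

```python
import pickle

data = {"x": [1, 2, 3], "y": "hello"}

# protocol 2 is the highest protocol Python 2.x can read
payload = pickle.dumps(data, protocol=2)

# round-trips normally on Python 3 as well
restored = pickle.loads(payload)
```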
Also, if you are looking for cPickle, it's now _pickle, not pickle.
>>> import _pickle
>>> _pickle
<module '_pickle' from '/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload/_pickle.cpython-35m-darwin.so'>
>>>
You also asked how to stop pickle from using the built-in (C) version. You can do this by using _dump and _load, or the _Pickler class if you prefer to work with the class objects. Confused? The old cPickle is now _pickle; however, dump, load, dumps, and loads all point to _pickle, while _dump, _load, _dumps, and _loads point to the pure-python version. For instance:
>>> import pickle
>>> # _dumps is a python function
>>> pickle._dumps
<function _dumps at 0x109c836a8>
>>> # dumps is a built-in (C)
>>> pickle.dumps
<built-in function dumps>
>>> # the Pickler points to _pickle (C)
>>> pickle.Pickler
<class '_pickle.Pickler'>
>>> # the _Pickler points to pickle (pure python)
>>> pickle._Pickler
<class 'pickle._Pickler'>
>>>
So if you don't want to use the built-in version, then you can use pickle._loads and the like.
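The two implementations produce interchangeable pickles, so you can serialize with one and deserialize with the other; a quick sketch:

```python
import pickle

obj = {"a": [1, 2, 3]}

# serialize with the pure-python implementation...
data = pickle._dumps(obj)

# ...and deserialize with the C one, and vice versa
assert pickle.loads(data) == obj
assert pickle._loads(pickle.dumps(obj)) == obj
```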
It's looking like the pickled data that you're trying to load was generated by a version of the program that was running on Python 2.7. The data is what contains the references to cPickle.
The problem is that Pickle, as a serialization format, assumes that your standard library (and to a lesser extent your code) won't change layout between serialization and deserialization. Which it did -- a lot -- between Python 2 and 3. And when that happens, Pickle has no path for migration.
Do you have access to the program that generated mnist.pkl.gz? If so, port it to Python 3 and re-run it to regenerate a Python 3-compatible version of the file.
If not, you'll have to write a Python 2 program that loads that file and exports it to a format that can be loaded from Python 3 (depending on the shape of your data, JSON and CSV are popular choices), then write a Python 3 program that loads that format then dumps it as Python 3 pickle. You can then load that Pickle file from your original program.
Of course, what you should really do is stop at the point where you have ability to load the exported format from Python 3 -- and use the aforementioned format as your actual, long-term storage format.
Using Pickle for anything other than short-term serialization between trusted programs (loading Pickle is equivalent to running arbitrary code in your Python VM) is something you should actively avoid, among other things because of the exact case you find yourself in.
In Anaconda Python 3.5, one can access cPickle as:
import _pickle as cPickle
credits to Mike McKerns
This bypasses the technical issues, but there might be a py3 version of that file, named mnist_py3k.pkl.gz. If so, try opening that file instead.
There is code on GitHub that does it: https://gist.github.com/rebeccabilbro/2c7bb4d1acfbcdcf9156e7b9b7577cba
I have tried it and it worked. You just need to specify the encoding; in this case it is 'latin1':
pickle.load(open('mnist.pkl','rb'), encoding = 'latin1')
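The same encoding argument works on bytes produced by any Python 2 pickle; a minimal sketch (the bytes here are simulated with protocol 2, the highest protocol Python 2 could write, since a real mnist.pkl is not at hand):

```python
import pickle

# simulate a file written by Python 2
legacy_bytes = pickle.dumps({"images": [0, 1, 2]}, protocol=2)

# encoding='latin1' maps Python 2 str bytes onto Python 3 str losslessly,
# which is what numpy-laden pickles like the MNIST file need
data = pickle.loads(legacy_bytes, encoding='latin1')
```

In practice you would pass a file object opened in 'rb' mode to pickle.load instead of bytes to pickle.loads.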

OpenCV and Numpy interacting badly

Can anyone explain why importing cv and numpy would change the behaviour of python's struct.unpack? Here's what I observe:
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from struct import pack, unpack
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
This is correct
>>> import cv
libdc1394 error: Failed to initialize libdc1394
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
Still ok, after importing cv
>>> import numpy
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
And OK after importing cv and then numpy
Now I restart python:
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from struct import pack, unpack
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
>>> import numpy
>>> unpack("f",pack("I",31))[0]
4.344025239406933e-44
So far so good, but now I import cv AFTER importing numpy:
>>> import cv
libdc1394 error: Failed to initialize libdc1394
>>> unpack("f",pack("I",31))[0]
0.0
I've repeated this a number of times, including on multiple servers, and it always goes the same way. I've also tried it with the fully qualified struct.unpack and struct.pack, which makes no difference.
I can't understand how importing numpy and cv could have any impact at all on the output of struct.unpack (pack remains the same, btw).
The "libdc1394" thing is, I believe, a red herring: ctypes error: libdc1394 error: Failed to initialize libdc1394
Any ideas?
tl;dr: importing numpy and then opencv changes the behaviour of struct.unpack.
UPDATE: Paulo's answer below shows that this is reproducible. Seborg's comment suggests that it's something to do with the way python handles subnormals, which sounds plausible. I looked into Contexts but that didn't seem to be the problem, as the context was the same after the imports as it had been before them.
This isn't an answer, but it's too big for a comment. I played with the values a bit to find the limits.
Without loading numpy and cv:
>>> unpack("f", pack("i", 8388608))
(1.1754943508222875e-38,)
>>> unpack("f", pack("i", 8388607))
(1.1754942106924411e-38,)
After loading numpy and cv, the first line is the same, but the second:
>>> unpack("f", pack("i", 8388607))
(0.0,)
You'll notice that the first result is the smallest positive normal 32-bit float. I then tried the same with d.
Without loading the libraries:
>>> unpack("d", pack("xi", 1048576))
(2.2250738585072014e-308,)
>>> unpack("d", pack("xi", 1048575))
(2.2250717365114104e-308,)
And after loading the libraries:
>>> unpack("d",pack("xi", 1048575))
(0.0,)
Now the first result is the smallest positive normal 64-bit float.
It seems that loading the numpy and cv libraries, in that order, apparently causes subnormal (denormal) values to be flushed to zero, so unpack returns 0.0 for anything below the smallest normal 32- or 64-bit float.
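The boundary in those experiments is exactly where IEEE-754 subnormals begin; a sketch of the bit patterns involved (run on an interpreter where nothing has enabled flush-to-zero):

```python
import struct

# 0x00800000: exponent field 1, mantissa 0 -> smallest normal float32, 2**-126
smallest_normal = struct.unpack("<f", struct.pack("<I", 0x00800000))[0]

# 0x007FFFFF: exponent field 0, mantissa all ones -> largest subnormal float32,
# the first value that flush-to-zero mode collapses to 0.0
largest_subnormal = struct.unpack("<f", struct.pack("<I", 0x007FFFFF))[0]
```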

PYTHON 3.3.1 - Using urllib to directly open a file, code gets stuck at a specific line

I'm trying to write a Python program to deal with RSS, however I'm having some issues downloading the files directly from the internet.
I am using urllib.request.urlopen() to get the files. Here is the bit of code that I am having trouble with:
import xml.etree.ElementTree as et
import urllib.request as urlget
self.sourceUrl = sourceUrl #sourceUrl was an argument
self.root = et.fromstring(urlget.urlopen(sourceUrl).read())
I have tracked the problem down to a single line:
urllib.request.urlopen calls urllib.request.opener.open()
which then calls self._open()
which then calls self._call_chain()
which then calls urllib.request.HTTPHandler.http_open()
which then calls urllib.request.AbstractHTTPHandler.do_open()
which then calls http.client.HTTPConnection.getresponse()
which then calls http.client.HTTPResponse.begin()
which then calls self._read_status()
Problem line (found by being the only line to appear upon pausing execution many times):
Python33\Lib\http\client.py Line 317
if len(line) > _MAXLINE:
I can continue the code, but only if I babysit it through Step Over until I get back to my code.
In my tests, this problem never occurred, so I can't think of why I am getting it now.
Thanks in advance for any help!
EDIT: Source can be found here. I lost motivation to work on this project quite some time ago, and haven't touched it since. I might redo the entire thing if I get some more motivation, but I don't expect to any time soon. If you wish to answer, I invite you to have at it, it might be beneficial to others. Be warned, however, that the code is terrible, as at the time I had relatively little experience. I can't really find my way around it, but I've figured out that you have to look at data/code/functions.py
Also note, that, as far as I can remember, it wasn't calling an error, it was just that the program was hanging for minutes at a time before I got impatient.
Without more code, it will be hard to help you. What is the URL of your feed? What does it return when you simply try to access it?
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as et
>>> import urllib.request as urlget
>>> sourceurl = "http://www.la-grange.net/feed"
>>> root = et.fromstring(urlget.urlopen(sourceurl).read())
>>> root
<Element '{http://www.w3.org/2005/Atom}feed' at 0x1013a82b8>
>>>
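If the server is slow, urlopen will happily block for minutes; passing a timeout turns the hang into an exception you can handle. A sketch (the fetch_feed helper is illustrative, and the demo parses an inline Atom sample so it runs without network access):

```python
import urllib.request
import xml.etree.ElementTree as et

def fetch_feed(url, timeout=10):
    # raises socket.timeout / URLError instead of hanging indefinitely
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return et.fromstring(resp.read())

# offline demonstration of the parsing step
sample = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>demo</title>
</feed>"""
root = et.fromstring(sample)
```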

How to implement Semaphores in python 2.x when not using default RUN method

The code flow is something as below.
import threading

result = []

def Discover(myList=[]):
    for item in myList:
        t = threading.Thread(target=myFunc, args=[item])
        t.start()

def myFunc(item):
    result.append(item + item)
Now this will start multiple threads, and in the current scenario the threads do some memory-intensive tasks. I therefore want to include semaphores so that myList behaves like a queue and the number of threads stays limited. What is the best way to do that?
Never use mutable objects as default parameter value in function definition. In your case: def Discover(myList=[])
Use Queue.Queue instead of list to provide myList, if it's necessary to update the list of "tasks" while the threads are running. Or use multiprocessing.pool.ThreadPool in order to limit the number of threads running at the same time.
Use Queue.Queue instead of list for the results variable as well. The list implementation is not thread-safe, so you will probably run into problems with it.
You can find some examples in other SO questions, i.e. here.
P.S. ThreadPool available in Python 2.7+
$ python
Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.pool import ThreadPool
>>> ThreadPool
<class 'multiprocessing.pool.ThreadPool'>
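A sketch of the ThreadPool approach applied to the question's code (the pool size of 4 is arbitrary); map blocks until every item is processed, and its return values replace the shared result list entirely:

```python
from multiprocessing.pool import ThreadPool

def myFunc(item):
    # return the value instead of appending to a shared list;
    # the pool collects results in a thread-safe way, in input order
    return item + item

def Discover(myList):
    pool = ThreadPool(4)  # at most 4 threads run at once
    try:
        return pool.map(myFunc, myList)
    finally:
        pool.close()
        pool.join()
```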

Does configuring django's setting.TIME_ZONE affect datetime.datetime.now()?

The documentation says:
http://docs.djangoproject.com/en/dev/ref/settings/#time-zone
Note that this is the time zone to which Django will convert all dates/times -- not necessarily the timezone of the server. For example, one server may serve multiple Django-powered sites, each with a separate time-zone setting.
Normally, Django sets the os.environ['TZ'] variable to the time zone you specify in the TIME_ZONE setting. Thus, all your views and models will automatically operate in the correct time zone.
I've read this several times and it's not clear to me what's going on with the TIME_ZONE setting.
Should I be managing UTC offsets myself if I want a model's datetime stamp to display in the user's local time zone?
For example, on save use datetime.datetime.utcnow() instead of datetime.datetime.now(), and in the view do something like:
display_datetime = model.date_time + datetime.timedelta(USER_UTC_OFFSET)
Much to my surprise, it does appear to.
web81:~/webapps/dominicrodger2/dominicrodger$ python2.5 manage.py shell
Python 2.5.4 (r254:67916, Aug 5 2009, 12:42:40)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import settings
>>> settings.TIME_ZONE
'Europe/London'
>>> from datetime import datetime
>>> datetime.now()
datetime.datetime(2009, 10, 15, 6, 29, 58, 85662)
>>> exit()
web81:~/webapps/dominicrodger2/dominicrodger$ date
Thu Oct 15 00:31:10 CDT 2009
And yes, I did get distracted whilst writing this answer :-)
I use the TIME_ZONE setting so that my automatically added timestamps on object creation (using auto_now_add, which I believe is soon to be deprecated) show creation times in the timezone I set.
If you want to convert those times into the timezones of your website visitors, you'll need to do a bit more work, as per the example you gave. If you want to do lots of timezone conversion to display times in your website visitors' timezones, then I'd strongly advise you to set your TIME_ZONE settings to store times in UTC, because it'll make your life easier in the long run (you can just use UTC-offsets, rather than having to worry about daylight savings).
If you're interested, I believe the timezone is set from the TIME_ZONE setting here.
Edit, per your comment that it doesn't work on Windows, this is because of the following in the Django source:
if hasattr(time, 'tzset'):
    # Move the time zone info into os.environ. See ticket #2315 for why
    # we don't do this unconditionally (breaks Windows).
    os.environ['TZ'] = self.TIME_ZONE
    time.tzset()
Windows:
C:\Documents and Settings\drodger>python
ActivePython 2.6.1.1 (ActiveState Software Inc.) based on
Python 2.6.1 (r261:67515, Dec 5 2008, 13:58:38) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> hasattr(time, 'tzset')
False
Linux:
web81:~$ python2.5
Python 2.5.4 (r254:67916, Aug 5 2009, 12:42:40)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> hasattr(time, 'tzset')
True
With TIME_ZONE as UTC, utcnow() and now() are the same. This is probably what you want. Then you can record times as now/utcnow and functions like timesince will work perfectly for every user. To display absolute times to specific users, you can use utc offsets as you suggest.
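On modern Python (3.2+) the manual UTC-offset arithmetic from the question can be expressed with aware datetimes instead of a raw timedelta; a sketch (the -5:00 offset stands in for a particular visitor's offset, which Django does not provide on its own):

```python
from datetime import datetime, timedelta, timezone

# store timestamps in UTC, as recommended above
stored = datetime(2009, 10, 15, 6, 30, tzinfo=timezone.utc)

# convert to a given visitor's offset only for display
visitor_tz = timezone(timedelta(hours=-5))
display_datetime = stored.astimezone(visitor_tz)
```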
