How to deepcopy shelve objects in Python

Is it possible to deepcopy a shelve object in Python? When I try to deepcopy it, I get the following error:
import shelve,copy
input = shelve.open("test.dict", writeback=True)
input.update({"key1": 1, "key2": 2})
newinput = copy.deepcopy(input)
TypeError: object.__new__(DB) is not safe, use DB.__new__()
Does this mean shelves are not copyable?
Edit: Maybe it would be better if I elaborate on my problem: I am keeping a large dictionary as a shelve object, and I want to save the whole shelve object (that is, all key/value pairs generated so far) to a separate file while I keep adding new items to the original dict.
I could probably sync the shelve first and then copy the shelve file on disk explicitly, but I don't like that approach.

No, I don't think they are copyable (unless you monkey-patch the class or convert to a dict). Here's why:
copy.copy() and copy.deepcopy() call the __copy__() and __deepcopy__() methods for instances that are not of a "standard" type (the atomic types, list, tuple, and instance methods). If the class does not define those methods, copying falls back to __reduce_ex__ and __reduce__ (see copy.py in your Python sources).
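To make the protocol concrete, here is a minimal sketch (the Wrapper class is hypothetical, not from the question) of a class that opts in to deep copying by defining __deepcopy__ itself:
import copy

class Wrapper:
    def __init__(self, data):
        self.data = data
    def __deepcopy__(self, memo):
        # copy.deepcopy() calls this hook instead of falling back
        # to __reduce_ex__/__reduce__
        return Wrapper(copy.deepcopy(self.data, memo))

w = Wrapper({"key1": 1})
w2 = copy.deepcopy(w)
print(w2.data == w.data, w2.data is w.data)   # True False
Shelf defines no such hook, which is why deepcopy falls through to the __reduce_ex__ path and hits the error above.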
Unfortunately, the shelve class Shelf is based on UserDict.DictMixin, which does not define copy() (and neither does Shelf):
class DictMixin:
# Mixin defining all dictionary methods for classes that already have
# a minimum dictionary interface including getitem, setitem, delitem,
# and keys. Without knowledge of the subclass constructor, the mixin
# does not define __init__() or copy(). In addition to the four base
# methods, progressively more efficiency comes with defining
# __contains__(), __iter__(), and iteritems().
It may be a good idea to submit an issue to the shelve module bug tracker.

You could obtain a shallow copy with dict(input) and deepcopy that. Then create another shelve on a new file and populate it via the update method:
newinput = shelve.open("newtest.dict")
newinput.update(copy.deepcopy(dict(input)))
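Putting that together for the snapshot use case from the question, a sketch along these lines should work (the helper name and snapshot filename are illustrative):
import copy
import shelve

def snapshot(shelf, filename):
    # Write a deep-copied snapshot of the shelf's current contents
    # to a separate shelve file, leaving the original untouched.
    snap = shelve.open(filename)
    snap.update(copy.deepcopy(dict(shelf)))
    snap.close()

snapshot(input, "snapshot.dict")  # keep adding items to `input` afterwards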

Related

How do I define a postfix function in Python?

I know that if you create your own object you can define your own methods on that object.
my_object_instance.mymethod()
I also know you can define infix functions with the infix package.
obj1 |func| obj2
What I want is the ability to define a function which accepts an existing type in postfix notation.
For example given a list l we may want to check if it is sorted. Defining a typical function might give us
if is_sorted(l): # do something
but it might be more idiomatic if one could write
if l.is_sorted(): # do something
Is this possible without creating a custom type?
The correct way is inheritance: create a custom type by inheriting from list and adding the new functionality. Monkey-patching is not a strength of Python. But since you specifically asked:
Is this possible without creating a custom type?
What kindall mentioned stands: Python does not allow it. But since nothing in the implementation is truly read-only, you can approximate the result by hacking the class dict.
>>> def is_sorted(my_list):
... return sorted(my_list) == my_list
...
>>> import gc
>>> gc.get_referents(list.__dict__)[0]['is_sorted'] = is_sorted
>>> [1,2,3].is_sorted()
True
>>> [1,3,2].is_sorted()
False
The new "method" will appear in vars(list), the name will be there in dir([]), and it will also be available/usable on instances which were created before the monkeypatch was applied.
This approach uses the garbage collector interface to obtain, via the class mappingproxy, a reference to the underlying dict. And garbage collection by reference counting is a CPython implementation detail. Suffice it to say, this is dangerous/fragile and you should not use it in any serious code.
If you like this kind of feature, you might enjoy Ruby as a programming language.
Python does not generally allow monkey-patching of built-in types because the common built-in types aren't written in Python (but rather C) and do not allow the class dictionary to be modified. You have to subclass them to add methods as you want to.
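For completeness, the subclassing route the answers recommend looks like this, as a minimal sketch:
class SortableList(list):
    def is_sorted(self):
        # True when the list is already in ascending order
        return sorted(self) == list(self)

l = SortableList([1, 2, 3])
if l.is_sorted():
    print("sorted")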

If Python's obj.__dict__ is an attribute, is it lazily evaluated or does Python eagerly compose it?

I just read that Python's __dict__ is actually an attribute, not a method, and that vars(obj) is the same as obj.__dict__.
If __dict__ is an attribute, does Python actively maintain it (which seems unlikely), or lazily compose it -- and how?
The way I picture it is that an object stores its attributes in a dictionary, which you can access via
obj.__dict__
That is a dictionary, not a string. It is print(obj.__dict__) or str(obj.__dict__) that creates a string from a dictionary.
The attributes can also be accessed by name; the lookup is translated into a method call and, effectively, a dictionary access.
obj.a => obj.__getattribute__('a') <=> obj.__dict__['a']
And as you note vars(obj) is another way of fetching this dictionary.
The interpreter maintains a large number of dictionaries, including one attached to each object (with a few exceptions). But don't confuse maintaining a dictionary with the act of displaying it. The dictionary exists whether you print it or not.
And as discussed in the comments, one object's dictionary can contain references to other objects, each of which has its own dictionary of attributes.
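A small script illustrates that the dictionary is a live object that Python maintains, not something composed for display:
class Obj:
    pass

obj = Obj()
obj.a = 1
print(obj.__dict__)                 # {'a': 1}
print(vars(obj) is obj.__dict__)    # True: vars() returns the very same dict
obj.__dict__['b'] = 2               # writing to the dict adds an attribute
print(obj.b)                        # 2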

For a class derived from dict, can I deepcopy only the aspect that is derived from dict?

I have an object of a class derived from dict, with several additions of functionality. However, after a certain point it is used by parts of the program only as a dict, so the additional functionality plays no role for those parts of the program.
So in order to make sure that no callee would be able to manipulate the actual dict aspect of my class instance, I was thinking of "extracting" its dict aspect using copy.deepcopy.
Is that at all possible?
How do I pull it off?
In C++ I'd be able to do a dynamic_cast<> to get a type that I want.
The dict constructor will accept any iterable that yields (key, value) tuples, or any dict-like object. Presuming that iteration has not been modified on the custom object, you should be able to do something like this:
import copy
dict_version = copy.deepcopy(dict(dict_like_object))
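A quick check with a hypothetical subclass shows the callee ends up with a plain dict, fully detached from the original:
import copy

class FancyDict(dict):
    def extra(self):
        return "additional functionality"

d = FancyDict(a=[1, 2])
plain = copy.deepcopy(dict(d))
print(type(plain))        # <class 'dict'>
plain["a"].append(3)
print(d["a"])             # [1, 2] -- the original is untouched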

list vs UserList and dict vs UserDict

Coding these days, which of the above is preferred and recommended (in both Python 2 and 3) for subclassing?
I read that UserList and UserDict have been introduced because in the past list and dict couldn't be subclassed, but since this isn't an issue anymore, is it encouraged to use them?
Depending on your use case, these days you'd either subclass list and dict directly, or you can subclass collections.MutableSequence and collections.MutableMapping; these options exist in addition to using the User* objects.
The User* objects were moved to the collections module in Python 3, but any code in the Python 2 stdlib that used them has been replaced with the collections.abc abstract base classes. Even in Python 2, UserList and UserDict are augmented collections.* implementations, adding the methods that list and dict provide beyond the basic interface.
The collections classes make it clearer what must be implemented for your subclass to be a complete implementation, and also let you implement smaller subsets (such as collections.Mapping, implementing a read-only mapping, or collections.Sequence for a tuple-like object).
The User* implementations should be used when you need to implement everything beyond the basic interface too; e.g. if you need to support addition, sorting, reversing and counting just like list does.
For anything else you are almost always better off using the collections abstract base classes as a basis; the built-in types are optimised for speed and are not that subclass-friendly. For example, you'll need to override just about every method on list where normally a new list is returned, to ensure your subclass is returned instead.
Only if you need to build code that insists on using a list or dict object (tested by using isinstance()) is subclassing the built-in types an option to consider. This is why collections.OrderedDict is a subclass of dict, for example. A sketch of the ABC route follows below.
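To illustrate that route, here is a minimal sketch (the LowerDict class is hypothetical): implement the five abstract methods and the rest of the mapping interface comes for free.
from collections.abc import MutableMapping  # Python 2: from collections import MutableMapping

class LowerDict(MutableMapping):
    # A mapping that lower-cases its string keys.
    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)   # update() is supplied by the ABC
    def __getitem__(self, key):
        return self._data[key.lower()]
    def __setitem__(self, key, value):
        self._data[key.lower()] = value
    def __delitem__(self, key):
        del self._data[key.lower()]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

d = LowerDict(Foo=1)
print(d["FOO"], len(d))   # 1 1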
No, they are not encouraged anymore. You should not use the UserDict class, as it is deprecated. The docs say you can just subclass dict directly. The UserDict module is gone in Python 3.0.

Best way to store and use a large text-file in python

I'm creating a networked server for a boggle-clone I wrote in Python, which accepts users, solves the boards, and scores the player input. The dictionary file I'm using is 1.8MB (the ENABLE2K dictionary), and I need it to be available to several game solver classes. Right now, each class iterates through the file line by line and generates a hash table (associative array), but the more solver classes I instantiate, the more memory it takes up.
What I would like to do is import the dictionary file once and pass it to each solver instance as they need it. But what is the best way to do this? Should I import the dictionary in the global space, then access it in the solver class as globals()['dictionary']? Or should I import the dictionary then pass it as an argument to the class constructor? Is one of these better than the other? Is there a third option?
If you create a dictionary.py module, containing code which reads the file and builds a dictionary, this code will only be executed the first time it is imported. Further imports will return a reference to the existing module instance. As such, your classes can:
import dictionary
dictionary.words[whatever]
where dictionary.py has:
words = {}
# Read the word list once at import time; the filename is illustrative.
with open("enable2k.txt") as f:
    words.update((line.strip(), True) for line in f)
Even though it is essentially a singleton at this point, the usual arguments against globals apply. For a Pythonic singleton substitute, look up the "Borg" object.
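For reference, a minimal Borg sketch (the class name and loader are assumptions) shares state between instances instead of enforcing a single instance:
class DictionaryBorg:
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state   # every instance aliases one dict
        if not hasattr(self, "words"):
            self.words = self._load_words()

    @staticmethod
    def _load_words():
        # Hypothetical loader; the filename is an assumption.
        with open("enable2k.txt") as f:
            return set(line.strip() for line in f)
Every DictionaryBorg() you construct sees the same words attribute, and the file is read only once.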
That's really the only difference. Once the dictionary object is created, you are only binding new references as you pass it along, unless you explicitly perform a deep copy. It makes sense for it to be constructed centrally once and only once, so long as each solver instance does not require a private copy for modification.
Adam, remember that in Python when you say:
a = read_dict_from_file()
b = a
... you are not actually copying a, and thus using more memory; you are merely making b another reference to the same object.
So basically any of the solutions you propose will be far better in terms of memory usage. Basically, read in the dictionary once and then hang on to a reference to that. Whether you do it with a global variable, or pass it to each instance, or something else, you'll be referencing the same object and not duplicating it.
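A quick interactive session makes the point:
>>> a = {"word": 1}
>>> b = a
>>> b is a               # one object, two names
True
>>> b["other"] = 2
>>> a                    # the change is visible through either name
{'word': 1, 'other': 2}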
Which one is most Pythonic? That's a whole 'nother can of worms, but here's what I would do personally:
def main(args):
    run_initialization_stuff()
    dictionary = read_dictionary_from_file()
    solvers = [Solver(dictionary=dictionary) for _ in range(number_of_solvers)]
HTH.
Depending on what your dict contains, you may be interested in the shelve or anydbm modules. They give you dict-like interfaces (just strings as keys and items for anydbm; strings as keys and any Python object as item for shelve), but the data actually lives in a DBM file (gdbm, ndbm, dbhash, bsddb, depending on what's available on the platform). You probably still want to share the actual database between classes as you are asking for, but it would avoid the parsing-the-text-file step as well as the keeping-it-all-in-memory bit.
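For example, a sketch along those lines (both filenames are assumptions):
import shelve

def build_word_db(textfile="enable2k.txt", dbfile="words.db"):
    # Run once to convert the word list into a DBM-backed shelve file.
    db = shelve.open(dbfile)
    with open(textfile) as f:
        for line in f:
            db[line.strip()] = True   # shelve keys must be strings
    db.close()

# Each solver can then open the database read-only instead of parsing
# the text file and holding a private copy in memory:
words = shelve.open("words.db", flag="r")
print("python" in words)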
