I don't understand when I should use dictionaries instead of classes in Python. If a class can do everything a dictionary can and more, why do we use dictionaries? For example, the next two snippets do the same thing:
Using class:
class PiggyBank:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents
piggy1 = PiggyBank(2, 2)
print(piggy1.dollars) # 2
print(piggy1.__dict__) # {'dollars': 2, 'cents': 2}
Using dictionary:
def create_piggy(dollars, cents):
    return {'dollars': dollars, 'cents': cents}
piggy2 = create_piggy(2, 2)
print(piggy2['dollars']) # 2
print(piggy2) # {'dollars': 2, 'cents': 2}
So at the end, I am creating two static objects with the same information. When should I use a class or a function for creating an instance?
You can use a dictionary if it suffices for your use case and nothing forces you to use a class. But if you need some of the added benefits of a class (like those below), you may write it as a class.
Some cases where you might need a class:
When you need to update code frequently
When you need encapsulation
When you need to reuse the same interface,code or logic
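To make the encapsulation point concrete, here is a minimal sketch (not from the question; the deposit and total_cents methods are hypothetical names chosen for illustration). It assumes the piggy bank should always keep cents below 100, a rule that a plain dict cannot enforce by itself:

```python
class PiggyBank:
    """Keeps the invariant 0 <= cents < 100 on every change."""

    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents
        self._normalize()

    def _normalize(self):
        # Carry whole dollars out of the cents field.
        extra_dollars, self.cents = divmod(self.cents, 100)
        self.dollars += extra_dollars

    def deposit(self, dollars=0, cents=0):
        self.dollars += dollars
        self.cents += cents
        self._normalize()

    def total_cents(self):
        return self.dollars * 100 + self.cents


piggy = PiggyBank(2, 2)
piggy.deposit(cents=150)
print(piggy.dollars, piggy.cents)  # 3 52
```

With a dict, every piece of code that touches 'cents' would have to remember to apply the same rule.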
dicts are often called "associative arrays". They're like regular lists in Python, except that instead of using an integer as an index, you can use strings (or any hashable type).
You can certainly implement list-like and dict-like objects (as well as set-like objects and str-like objects) with Python classes, but why would you if there's already an object type that does exactly what you need?
So if all you're looking for is an associative array, a dict will most likely serve you just fine. But if you jump through hoops to implement a class that already does what a dict does, you'll be "re-inventing the wheel," as well as giving the future readers of your code extra work trying to figure out that all your class is doing is re-implementing a dict with no real extra functionality.
Basically, if all that you need is a dict, just use a dict. Otherwise, you'll be writing a lot of extra code (that may be prone to bugs) for no real gain.
You would typically want to use a dictionary when you're doing something dynamic -- something where you don't know all of the keys or information when you're writing the code. Note also that classes have restrictions: attribute names must be valid identifiers, so you cannot use a number as an attribute with dot notation.
As a classic example (which has better solutions using collections.Counter, but we'll still keep it for its educational value), if you have a list of values that you want to count you can use a dictionary to do so efficiently:
items_to_count = ['a', 'a', 'a', 'b', 5, 5]
count = {}
for item in items_to_count:
    if item in count:
        count[item] += 1
    else:
        count[item] = 1
# count == {'a': 3, 'b': 1, 5: 2}
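For comparison, the collections.Counter version mentioned above might look like this. It is a short sketch using the same input list; Counter is a dict subclass, so the result still compares equal to the hand-built dictionary:

```python
from collections import Counter

items_to_count = ['a', 'a', 'a', 'b', 5, 5]
count = Counter(items_to_count)

print(count['a'])            # 3
print(count['missing'])      # 0, missing keys count as zero
print(count.most_common(1))  # [('a', 3)]
```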
why do we use dictionaries?
Dictionaries are built-in containers with quite a bit of functionality that can be used at no extra coding cost to us. Since they are built-in a lot of that functionality is probably optimized.
Tl;dr is bold-faced text.
I'm working with an image dataset that comes with boolean "one-hot" image annotations (Celeba to be specific). The annotations encode facial features like bald, male, young. Now I want to make a custom one-hot list (to test my GAN model). I want to provide a literate interface. I.e., rather than specifying features[12]=True knowing that 12 - counting from zero - corresponds to the male feature, I want something like features[male]=True or features.male=True.
Suppose the header of my .txt file is
Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Chubby Male Wearing_Necktie Young
and I want to codify Young, Bald, and Chubby. The expected output is
[ 0. 0. 0. 1. 0. 1. 0. 0. 1.]
since Bald is the fourth entry of the header, Chubby is the sixth, and so on. What is the clearest way to do this without expecting a user to know Bald is the fourth entry, etc.?
I'm looking for a Pythonic way, not necessarily the fastest way.
Ideal Features
In rough order of importance:
1. A way to accomplish my stated goal that is already standard in the Python community will take precedence.
2. A user/programmer should not need to count to an attribute in the .txt header. This is the point of what I'm trying to design.
3. A user should not be expected to have non-standard libraries like aenum.
4. A user/programmer should not need to reference the .txt header for attribute names/available attributes. One example: if a user wants to specify the gender attribute but does not know whether to use male or female, it should be easy to find out.
5. A user/programmer should be able to find out the available attributes via documentation (ideally generated by Sphinx api-doc). That is, point 4 should be possible while reading as little code as possible. Attribute exposure with dir() sufficiently satisfies this point.
6. The programmer should find the indexing tool natural. Specifically, zero-indexing should be preferred over subtracting from one-indexing.
7. Between two otherwise completely identical solutions, one with better performance would win.
Examples:
I'm going to compare and contrast the ways that immediately came to my mind. All examples use:
import numpy as np
header = ("Arched_Eyebrows Attractive Bags_Under_Eyes "
"Bald Bangs Chubby Male Wearing_Necktie Young")
NUM_CLASSES = len(header.split()) # 9
1: Dict Comprehension
Obviously we could use a dictionary to accomplish this:
binary_label = np.zeros([NUM_CLASSES])
classes = {head: idx for (idx, head) in enumerate(header.split())}
binary_label[[classes["Young"], classes["Bald"], classes["Chubby"]]] = True
print(binary_label)
For what it's worth, this has the fewest lines of code and is the only one that doesn't rely on a standard library over builtins. As for negatives, it isn't exactly self-documenting. To see the available options, you must print(classes.keys()) - it's not exposed with dir(). This borders on not satisfying feature 5 because it requires a user to know classes is a dict to expose the features, AFAIK.
2: Enum:
Since I'm learning C++ right now, Enum is the first thing that came to mind:
import enum
binary_label = np.zeros([NUM_CLASSES])
Classes = enum.IntEnum("Classes", header)
features = [Classes.Young, Classes.Bald, Classes.Chubby]
zero_idx_feats = [feat-1 for feat in features]
binary_label[zero_idx_feats] = True
print(binary_label)
This gives dot notation and the image options are exposed with dir(Classes). However, enum uses one-indexing by default (the reason is documented). The work-around makes me feel like enum is not the Pythonic way to do this, and entirely fails to satisfy feature 6.
3: Named Tuple
Here's another one out of the standard Python library:
import collections
binary_label = np.zeros([NUM_CLASSES])
clss = collections.namedtuple(
"Classes", header)._make(range(NUM_CLASSES))
binary_label[[clss.Young, clss.Bald, clss.Chubby]] = True
print(binary_label)
Using namedtuple, we again get dot notation and self-documentation with dir(clss). But, the namedtuple class is heavier than enum. By this I mean, namedtuple has functionality I do not need. This solution appears to be a leader among my examples, but I do not know if it satisfies feature 1 or if an alternative could "win" via feature 7.
4: Custom Enum
I could really break my back:
binary_label = np.zeros([NUM_CLASSES])
class Classes(enum.IntEnum):
    Arched_Eyebrows = 0
    Attractive = 1
    Bags_Under_Eyes = 2
    Bald = 3
    Bangs = 4
    Chubby = 5
    Male = 6
    Wearing_Necktie = 7
    Young = 8
binary_label[
[Classes.Young, Classes.Bald, Classes.Chubby]] = True
print(binary_label)
This has all the advantages of Ex. 2. But it comes with obvious drawbacks. I have to write out all the features (there are 40 in the real dataset) just to zero-index! Sure, this is how to make an enum in C++ (AFAIK), but it shouldn't be necessary in Python. This is a slight failure on feature 6.
Summary
There are many ways to accomplish literate zero-indexing in Python. Would you provide a code snippet of how you would accomplish what I'm after and tell me why your way is right?
(edit:) Or explain why one of my examples is the right tool for the job?
Status Update:
I'm not ready to accept an answer yet in case anyone wants to address the following feedback/update, or any new solution appears. Maybe another 24 hours? All the responses have been helpful, so I upvoted everyone's so far. You may want to look over this repo I'm using to test solutions. Feel free to tell me if my following remarks are (in)accurate or unfair:
zero-enum:
Oddly, Sphinx documents this incorrectly (one-indexed in docs), but it does document it! I suppose that "issue" doesn't fail any ideal feature.
dotdict:
I feel that Map is overkill, but dotdict is acceptable. Thanks to both answerers that got this solution working with dir(). However, it doesn't appear that it "works seamlessly" with Sphinx.
Numpy record:
As written, this solution takes significantly longer than the other solutions. It comes in at 10x slower than a namedtuple (fastest behind pure dict) and 7x slower than standard IntEnum (slowest behind numpy record). That's not drastic at current scale, nor a priority, but a quick Google search indicates np.in1d is in fact slow. Let's stick with
_label = np.zeros([NUM_CLASSES])
_label[[header_rec[key].item() for key in ["Young", "Bald", "Chubby"]]] = True
unless I've implemented something wrong in the linked repo. This brings the execution speed into a range that compares with the other solutions. Again, no Sphinx.
namedtuple (and rassar's critiques)
I'm not convinced of your enum critique. It seems to me that you believe I'm approaching the problem wrong. It's fine to call me out on that, but I don't see how using the namedtuple is fundamentally different from "Enum [which] will provide separate values for each constant." Have I misunderstood you?
Regardless, namedtuple appears in Sphinx (correctly numbered, for what it's worth). On the Ideal Features list, this chalks up identically to zero-enum and profiles ahead of zero-enum.
Accepted Rationale
I accepted the zero-enum answer because the answer gave me the best challenger for namedtuple. By my standards, namedtuple is marginally the best solution. But salparadise wrote the answer that helped me feel confident in that assessment. Thanks to all who answered.
How about a factory function to create a zero indexed IntEnum since that is the object that suits your needs, and Enum provides flexibility in construction:
from enum import IntEnum
def zero_indexed_enum(name, items):
    # splits on space, so it won't take any iterable. Easy to change depending on need.
    return IntEnum(name, ((item, value) for value, item in enumerate(items.split())))
Then:
In [43]: header = ("Arched_Eyebrows Attractive Bags_Under_Eyes "
...: "Bald Bangs Chubby Male Wearing_Necktie Young")
In [44]: Classes = zero_indexed_enum('Classes', header)
In [45]: list(Classes)
Out[45]:
[<Classes.Arched_Eyebrows: 0>,
<Classes.Attractive: 1>,
<Classes.Bags_Under_Eyes: 2>,
<Classes.Bald: 3>,
<Classes.Bangs: 4>,
<Classes.Chubby: 5>,
<Classes.Male: 6>,
<Classes.Wearing_Necktie: 7>,
<Classes.Young: 8>]
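As a possible shortcut to the factory above, the functional Enum API accepts a start keyword (available from Python 3.5 onward), which gives zero-based values without enumerating by hand. The sketch below assumes that version and uses a plain list instead of a numpy array so that it runs without extra dependencies:

```python
import enum

header = ("Arched_Eyebrows Attractive Bags_Under_Eyes "
          "Bald Bangs Chubby Male Wearing_Necktie Young")

# start=0 overrides the default one-based numbering.
Classes = enum.IntEnum("Classes", header, start=0)

binary_label = [0.0] * len(Classes)
for feat in (Classes.Young, Classes.Bald, Classes.Chubby):
    binary_label[feat] = 1.0  # IntEnum members work directly as indices

print(binary_label)
```

The members are still exposed through dir(Classes), so the self-documentation property of the enum approach is kept.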
You can use a custom class, which I like to call DotMap, or Map as it is called in these SO discussions:
https://stackoverflow.com/a/32107024/2598661 (Map, longer complete version)
https://stackoverflow.com/a/23689767/2598661 (dotdict, shorter lighter version)
About Map:
It has the features of a dictionary since the input to a Map/DotMap is a dict. You can access attributes using features['male'].
Additionally you can access the attributes using dot i.e. features.male and the attributes will be exposed when you do dir(features).
It is only as heavy as it needs to be in order to enable the dot functionality.
Unlike namedtuple you don't need to pre-define it and you can add and remove keys willy nilly.
The Map function described in the SO question is not Python3 compatible because it uses iteritems(). Just replace it with items() instead.
About dotdict:
dotdict provides the same advantages as Map, except that it does not override the __dir__() method, so you will not be able to obtain the attributes for documentation. @SigmaPiEpsilon has provided a fix for this here.
It uses the dict.get method instead of dict.__getitem__, so it returns None instead of raising KeyError when you access attributes that don't exist.
It does not recursively apply dotdict-iness to nested dicts therefore you won't be able to use features.foo.bar.
Here's the updated version of dotdict which solves the first two issues:
class dotdict(dict):
    __getattr__ = dict.__getitem__ # __getitem__ instead of get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
    def __dir__(self): # by @SigmaPiEpsilon for documentation
        return self.keys()
Update
Map and dotdict don't have the same behavior as pointed out by @SigmaPiEpsilon so I added separate descriptions for both.
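One caveat with aliasing __getattr__ directly to dict.__getitem__ is that a missing attribute raises KeyError rather than AttributeError, so hasattr() and getattr() with a default do not behave as usual. A hedged variant (the class name DotDict is hypothetical, chosen to avoid clashing with the versions above) translates the exception:

```python
class DotDict(dict):
    """dict with attribute access; missing names raise AttributeError."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    __setattr__ = dict.__setitem__

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Expose only the stored keys to dir().
        return list(self.keys())


features = DotDict(Male=6, Young=8)
features.Bald = 3
print(features.Male)               # 6
print(hasattr(features, "nope"))   # False
print(dir(features))               # ['Bald', 'Male', 'Young']
```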
Of your examples, 3 is the most pythonic answer to your question.
1, as you said, does not even answer your question, since the names are not explicit.
2 uses enums, which though being in the standard library are not pythonic and generally not used in these scenarios in Python.
(Edit): In this case you only really need two different constants - the target values and the other ones. An Enum will provide separate values for each constant, which is not what the goal of your program is and seems to be a roundabout way of approaching the problem.
4 is just not maintainable if a client wants to add options, and even as it is it's painstaking work.
3 uses well-known classes from the standard library in a readable and succinct way. Also, it does not have any drawbacks, as it is perfectly explicit. Being too "heavy" doesn't matter if you don't care about performance, and anyway the lag will be unnoticeable with your input size.
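For reference, example 3 can be exercised without numpy. This sketch uses a plain list for the label (an assumption made only to keep it dependency-free) and shows the introspection that namedtuple offers through _fields and dir():

```python
import collections

header = ("Arched_Eyebrows Attractive Bags_Under_Eyes "
          "Bald Bangs Chubby Male Wearing_Necktie Young")
names = header.split()

# Each field holds its own zero-based position in the header.
Classes = collections.namedtuple("Classes", names)
clss = Classes._make(range(len(names)))

binary_label = [0.0] * len(names)
for idx in (clss.Young, clss.Bald, clss.Chubby):
    binary_label[idx] = 1.0

print(clss._fields[clss.Chubby])  # Chubby
print(binary_label)
```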
Your requirements if I understand correctly can be divided into two parts:
Access the position of header elements in the .txt by name in the most pythonic way possible and with minimum external dependencies
Enable dot access to the data structure containing the names of the headers to be able to call dir() and setup easy interface with Sphinx
Pure Python Way (no external dependencies)
The most pythonic way to solve the problem is of course the method using dictionaries (dictionaries are at the heart of python). Looking up a dictionary by key is also much faster than other methods. The only problem is this prevents dot access. Another answer mentions Map and dotdict as alternatives. dotdict is simpler but it only enables dot access; it will not help in the documentation aspect with dir(), since dir() calls the __dir__() method, which is not overridden in these cases. Hence it will only return the attributes of Python dict and not the header names. See below:
>>> class dotdict(dict):
...     __getattr__ = dict.get
...     __setattr__ = dict.__setitem__
...     __delattr__ = dict.__delitem__
...
>>> somedict = {'a' : 1, 'b': 2, 'c' : 3}
>>> somedotdict = dotdict(somedict)
>>> somedotdict.a
1
>>> 'a' in dir(somedotdict)
False
There are two options to get around this problem.
Option 1: Override the __dir__() method like below. But this will only work when you call dir() on the instances of the class. To make the changes apply for the class itself you have to create a metaclass for the class. See here
    # add this to dotdict
    def __dir__(self):
        return self.keys()
>>> somedotdictdir = dotdictdir(somedict)
>>> somedotdictdir.a
1
>>> dir(somedotdictdir)
['a', 'b', 'c']
Option 2: A second option which makes it much closer to user-defined object with attributes is to update the __dict__ attribute of the created object. This is what Map also uses. A normal python dict does not have this attribute. If you add this then you can call dir() to get attributes/keys and also all the additional methods/attributes of python dict. If you just want the stored attribute and values you can use vars(somedotdictdir) which is also useful for documentation.
class dotdictdir(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.__dict__.update({k : v for k,v in self.items()})
    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self.__dict__.update({key : value})
    __getattr__ = dict.get #replace with dict.__getitem__ if want raise error on missing key access
    __setattr__ = __setitem__
    __delattr__ = dict.__delitem__
>>> somedotdictdir = dotdictdir(somedict)
>>> somedotdictdir
{'a': 3, 'c': 6, 'b': 4}
>>> vars(somedotdictdir)
{'a': 3, 'c': 6, 'b': 4}
>>> 'a' in dir(somedotdictdir)
True
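If only dot access plus dir() and vars() support is needed, the standard library's types.SimpleNamespace may already be enough. This is an alternative that the answers above do not mention, sketched here on the assumption that dict-style indexing is not required:

```python
from types import SimpleNamespace

header = ("Arched_Eyebrows Attractive Bags_Under_Eyes "
          "Bald Bangs Chubby Male Wearing_Necktie Young")

# Map each header name to its zero-based position.
classes = SimpleNamespace(
    **{name: idx for idx, name in enumerate(header.split())})

print(classes.Bald)            # 3
print("Bald" in dir(classes))  # True
print(vars(classes)["Young"])  # 8
```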
Numpy way
Another option will be to use a numpy record array which allows dot access. I noticed in your code you are already using numpy. In this case too __dir__() has to be overridden to get the attributes. This may result in faster operations (not tested) for data with lots of other numeric values.
>>> headers = "Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Chubby Male Wearing_Necktie Young".split()
>>> header_rec = np.array([tuple(range(len(headers)))], dtype = zip(headers, [int]*len(headers)))
>>> header_rec.dtype.names
('Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Chubby', 'Male', 'Wearing_Necktie', 'Young')
>>> np.in1d(header_rec.item(), [header_rec[key].item() for key in ["Young", "Bald", "Chubby"]]).astype(int)
array([0, 0, 0, 1, 0, 1, 0, 0, 1])
In Python 3, you will need to use dtype=list(zip(headers, [int]*len(headers))) since zip became its own object.
Update:
As of CPython 3.6, dictionaries have a version (thank you pylang for showing this to me).
If they added the same version to list and made it public, all 3 asserts from my original post would pass! It would definitely meet my needs. Their implementation differs from what I envisioned, but I like it.
As it is, I don't feel I can use dictionary version:
It isn't public. Jake Vanderplas shows how to expose it in a post, but he cautions: definitely not code you should use for any purpose beyond simply having fun. I agree with his reasons.
In all of my use cases, the data is conceptually arrays of elements each of which has the same structure. A list of tuples is a natural fit. Using a dictionary would make the code less natural and probably more cumbersome.
Does anyone know if there are plans to add version to list?
Are there plans to make it public?
If there are plans to add version to list and make it public, I would feel awkward putting forward an incompatible VersionedList now. I would just implement the bare minimum I need and get by.
Original post below
Turns out that many of the times I wanted an immutable list, a VersionedList would have worked almost as well (sometimes even better).
Has anyone implemented a versioned list?
Is there a better, more Pythonic, concept that meets my needs? (See motivation below.)
What I mean by a versioned list is:
A class that behaves like a list
Any change to an instance or elements in the instance results in instance.version() being updated. So, if alist is a normal list:
a = VersionedList(alist)
a_version = a.version()
change(a)
assert a_version != a.version()
reverse_last_change(a)
If a list was hashable, hash() would achieve the above and meet all the needs identified in the motivation below. We need to define 'version()' in a way that doesn't have all of the same problems as 'hash()'.
If identical data in two lists is highly unlikely to ever happen except at initialization, we aren't going to have a reason to test for deep equality. From (https://docs.python.org/3.5/reference/datamodel.html#object.hash) The only required property is that objects which compare equal have the same hash value. If we don't impose this requirement on 'version()', it seems likely that 'version()' won't have all of the same problems that make lists unhashable. So unlike hash, identical contents doesn't mean the same version:
#contents of 'a' are now identical to original, but...
assert a_version != a.version()
b = VersionedList(alist)
c = VersionedList(alist)
assert b.version() != c.version()
For VersionedList, it would be good if any attempt to modify the result of __getitem__ automatically resulted in a copy instead of modifying the underlying implementation data. I think that the only other option would be to have __getitem__ always return a copy of the elements, and this would be very inefficient for all of the use cases I can think of. I think we need to restrict the elements to immutable objects (deeply immutable, for example: exclude tuples with list elements). I can think of 3 ways to achieve this:
Only allow elements that can't contain mutable elements (int, str, etc are fine, but exclude tuples). (This is far too limiting for my cases)
Add code to __init__, __setitem__, etc. to traverse inputs to deeply check for mutable sub-elements. (expensive, any way to avoid this?)
Also allow more complex elements, but require that they are deeply immutable. Perhaps require that they expose a deeply_immutable attribute. (This turns out to be easy for all the use cases I have)
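As a rough illustration of the idea, here is a minimal sketch of such a class built on collections.abc.MutableSequence. It assumes the elements are immutable (nothing here detects in-place changes to a mutable element) and uses a module-level counter, so two instances never share a version even with identical contents. This is one possible reading of the requirements above, not an established implementation:

```python
import itertools
from collections.abc import MutableSequence

_next_version = itertools.count(1)  # shared by all instances


class VersionedList(MutableSequence):
    def __init__(self, iterable=()):
        self._data = list(iterable)
        self._version = next(_next_version)

    def version(self):
        return self._version

    def _bump(self):
        self._version = next(_next_version)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value
        self._bump()

    def __delitem__(self, index):
        del self._data[index]
        self._bump()

    def __len__(self):
        return len(self._data)

    def insert(self, index, value):
        self._data.insert(index, value)
        self._bump()


alist = [1, 2, 3]
a = VersionedList(alist)
a_version = a.version()
a.append(4)             # a change
changed = a.version()
a.pop()                 # reverse the last change
```

The mixin methods that MutableSequence supplies (append, pop, extend, reverse and so on) are all routed through the five methods defined here, so each of them updates the version as well.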
Motivation:
If I am analyzing a dataset, I often have to perform multiple steps that return large datasets (note: since the dataset is ordered, it is best represented by a List not a set).
If at the end of several steps (ex: 5) it turns out that I need to perform different analysis (ex: back at step 4), I want to know that the dataset from step 3 hasn't accidentally been changed. That way I can start at step 4 instead of repeating steps 1-3.
I have functions (control-points, first-derivative, second-derivative, offset, outline, etc) that depend on and return array-valued objects (in the linear algebra sense). The base 'array' is knots.
control-points() depends on: knots, algorithm_enum
first-derivative() depends on: control-points(), knots
offset() depends on: first-derivative(), control-points(), knots, offset_distance
outline() depends on: offset(), end_type_enum
If offset_distance changes, I want to avoid having to recalculate first-derivative() and control-points(). To avoid recalculation, I need to know that nothing has accidentally changed the resultant 'arrays'.
If 'knots' changes, I need to recalculate everything and not depend on the previous resultant 'arrays'.
To achieve this, knots and all of the 'array-valued' objects could be VersionedList.
FYI: I had hoped to take advantage of an efficient class like numpy.ndarray. In most of my use cases, the elements logically have structure. Having to mentally keep track of multi-dimensions of indexes meant implementing and debugging the algorithms was many times more difficult with ndarray. An implementation based on lists of namedtuples of namedtuples turned out to be much more sustainable.
Private dicts in 3.6
In Python 3.6, dictionaries now have a private version tag (PEP 509) and are compact (issue 27350), which track versions and preserve order respectively. These features are presently true when using the CPython 3.6 implementation. Despite the challenge, Jake VanderPlas gives a detailed demonstration in his blog post of exposing this versioning feature from CPython within normal Python. We can use his approach to:
determine when a dictionary has been updated
preserve the order
Example
import numpy as np
d = {"a": np.array([1,2,3]),
     "c": np.array([1,2,3]),
     "b": np.array([8,9,10]),
    }
for i in range(3):
    print(d.get_version()) # monkey-patch
# 524938
# 524938
# 524938
Notice the version number does not change until the dictionary is updated, as shown below:
d.update({"c": np.array([10, 11, 12])})
d.get_version()
# 534448
In addition, the insertion order is preserved (the following was tested in restarted sessions of Python 3.5 and 3.6):
list(d.keys())
# ['a', 'c', 'b']
You may be able to take advantage of this new dictionary behavior, saving you from implementing a new datatype.
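The ordering half of this needs no patching. Insertion order was a CPython implementation detail in 3.6 and has been a language guarantee since Python 3.7. A small sketch, with plain integers standing in for the arrays:

```python
d = {"a": 1, "c": 2, "b": 3}

d["c"] = 20   # updating an existing key keeps its position
d["z"] = 4    # new keys go to the end
del d["a"]
d["a"] = 5    # a deleted and re-inserted key moves to the end

print(list(d))  # ['c', 'b', 'z', 'a']
```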
Details
For those interested, the get_version() used above is a monkey-patched method for any dictionary, implemented in Python 3.6 using the following modified code derived from Jake VanderPlas' blog post. This code was run prior to calling get_version().
import types
import ctypes
import sys
assert (3, 6) <= sys.version_info < (3, 7) # valid only in Python 3.6
py_ssize_t = ctypes.c_ssize_t
# Emulate the PyObjectStruct from CPython
class PyObjectStruct(ctypes.Structure):
    _fields_ = [('ob_refcnt', py_ssize_t),
                ('ob_type', ctypes.c_void_p)]
# Create a DictStruct class to wrap existing dictionaries
class DictStruct(PyObjectStruct):
    _fields_ = [("ma_used", py_ssize_t),
                ("ma_version_tag", ctypes.c_uint64),
                ("ma_keys", ctypes.c_void_p),
                ("ma_values", ctypes.c_void_p),
               ]
    def __repr__(self):
        return (f"DictStruct(size={self.ma_used}, "
                f"refcount={self.ob_refcnt}, "
                f"version={self.ma_version_tag})")
    @classmethod
    def wrap(cls, obj):
        assert isinstance(obj, dict)
        return cls.from_address(id(obj))
assert object.__basicsize__ == ctypes.sizeof(PyObjectStruct)
assert dict.__basicsize__ == ctypes.sizeof(DictStruct)
# Code for monkey-patching existing dictionaries
class MappingProxyStruct(PyObjectStruct):
    _fields_ = [("mapping", ctypes.POINTER(DictStruct))]
    @classmethod
    def wrap(cls, D):
        assert isinstance(D, types.MappingProxyType)
        return cls.from_address(id(D))
assert types.MappingProxyType.__basicsize__ == ctypes.sizeof(MappingProxyStruct)
def mappingproxy_setitem(obj, key, val):
    """Set an item in a read-only mapping proxy"""
    proxy = MappingProxyStruct.wrap(obj)
    ctypes.pythonapi.PyDict_SetItem(proxy.mapping,
                                    ctypes.py_object(key),
                                    ctypes.py_object(val))
mappingproxy_setitem(dict.__dict__,
                     'get_version',
                     lambda self: DictStruct.wrap(self).ma_version_tag)
Firstly, you should know that I am incredibly new to programming, so I will love any detailed explanations.
So what I am attempting to make is a program that basically creates people. This includes unique characteristics such as their name, income, job, etc. And since I planned to make a large number of 'people,' I hoped I could merely state how many people I wanted made, and I would get each of them as an object of a class. To name them I figured I could do 'person1,' 'person2,' and so on. My trouble came when I found out you can't make strings into objects. (Or rather, it is heavily frowned upon.)
After researching I was able to make each person a dictionary, with a key like 'income' and a value like '60000.' However, when it comes to manipulating the data created it seems much better to use classes and methods instead.
Thank you, and sorry if this is bad or if I am overlooking something.
Edit: I realized I could ask this better, how can I instantiate a large number of persons, or how do I make the needed variables to instantiate? I suck at explaining things...
It seems to me that you are asking two distinct questions (correct me if I'm wrong). The first - how should you store your data. The second - how can you do that repeatedly with ease.
There are a couple of ways you can store the data. I don't know your exact usecase so I can't say exactly which one would work best (you mentioned creating objects in your question so I'll use that for further examples)
Objects
class Person(object):
    def __init__(self, name, income):
        self.name = name
        self.income = income
Namedtuples
>>> from collections import namedtuple
>>> a = namedtuple("person", ['name', 'income'])
>>> a
<class '__main__.person'>
>>> ab = a("Dannnnno", 100)
>>> ab
person(name='Dannnnno', income=100)
>>> ab.name
'Dannnnno'
>>> ab.income
100
Dictionaries
someperson = {0 : {"name": "Dannnno", "income": 100}}
someotherperson = {1: {"name": "kcd", "income": 100}}
As for creating large numbers of them - either create a class like GroupOfPeople or use a function.
Using the Classes example from above (I assume you could translate the other two examples appropriately)
class GroupOfPeople(object):
    def __init__(self, num_people):
        self.people = [Person("Default", 0) for i in range(num_people)]
####
def MakeLotsOfPeople(num_people):
    return [Person("Default", 0) for i in range(num_people)]
You could then edit those separate Person instances to whatever you want. You could also edit the class/function to accept another input (like a filename perhaps) that stored all of your name/income/etc data.
If you want a dictionary of the group of people just replace the list comprehensions with a dictionary comprehension, like so
{i : Person("Default", 0) for i in range(num_people)}
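To connect this with the idea of feeding in real data instead of defaults, here is a small sketch. The records list and the by_name lookup are hypothetical stand-ins for whatever a file or other source would provide:

```python
class Person(object):
    def __init__(self, name, income):
        self.name = name
        self.income = income


# Hypothetical input data; in practice this might be read from a file.
records = [("Ann", 60000), ("Bob", 45000), ("Cy", 52000)]

# One Person per record, kept in a list instead of person1, person2, ...
people = [Person(name, income) for name, income in records]

# Optional lookup by name when positions are not convenient.
by_name = {person.name: person for person in people}

print(people[1].name)         # Bob
print(by_name["Cy"].income)   # 52000
```

The individual objects never need their own variable names; the list index or dictionary key plays that role.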
Look up Object Oriented Programming. This is the concept you are trying to wrap your head around.
http://en.wikipedia.org/wiki/Object-oriented_programming
As a way to get used to python, I am trying to translate some of my code to python from Autohotkey_L.
I am immediately running into tons of choices for collection objects. Can you help me figure out a built in type or a 3rd party contributed type that has as much as possible, the functionality of the AutoHotkey_L object type and its methods.
AutoHotkey_L Objects have features of a python dict, list, and a class instance.
I understand that there are tradeoffs for space and speed, but I am just interested in functionality rather than optimization issues.
Don't write Python as <another-language>. Write Python as Python.
The data structure should be chosen just to have the minimal ability you need to use.
list: an ordered sequence of elements, with 1 flexible end.
collections.deque: an ordered sequence of elements, with 2 flexible ends (e.g. a queue).
set / frozenset: an unordered collection of unique elements.
collections.Counter: an unordered collection of non-unique elements (a multiset).
dict: a key-value relationship (unordered before Python 3.7, insertion-ordered since).
collections.OrderedDict: an ordered key-value relationship.
bytes / bytearray: a list of bytes.
array.array: a homogeneous list of primitive types.
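A few of the less familiar types from this list in action, as a brief sketch with sample values chosen only for illustration:

```python
from collections import Counter, OrderedDict, deque

# deque: cheap appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
first = queue.popleft()   # 0
last = queue.pop()        # 4

# Counter: a multiset that tallies repeated elements
letters = Counter("banana")

# OrderedDict: popitem can take from either end
od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
newest = od.popitem()             # ('c', 3)
oldest = od.popitem(last=False)   # ('a', 1)

print(list(queue), letters.most_common(1), list(od))
```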
Looking at the interface of Object,
dict would be the most suitable for finding a value by key
collections.OrderedDict would be the most suitable for the push/pop stuff.
A sorted key-value relationship (e.g. a red-black tree) is required when you need MinIndex / MaxIndex. There's no such type in the standard library, but there are 3rd party implementations.
It would be impossible to recommend a particular class without knowing how you intend on using it. If you are using this particular object as an ordered sequence where elements can be repeated, then you should use a list; if you are looking up values by their key, then use a dictionary. You will get very different algorithmic runtime complexity with the different data types. It really does not take that much time to determine when to use which type.... I suggest you give it some further consideration.
If you really can't decide, though, here is a possibility:
class AutoHotKeyObject(object):
    def __init__(self):
        self.list_value = []
        self.dict_value = {}
    def getDict(self):
        return self.dict_value
    def getList(self):
        return self.list_value
With the above, you could use both the list and dictionary features, like so:
obj = AutoHotKeyObject()
obj.getList().append(1)
obj.getList().append(2)
obj.getList().append(3)
print(obj.getList()) # Prints [1, 2, 3]
obj.getDict()['a'] = 1
obj.getDict()['b'] = 2
print(obj.getDict()) # Prints {'a': 1, 'b': 2}
Suppose I've got two dicts in Python:
mydict = { 'a': 0 }
defaults = {
'a': 5,
'b': 10,
'c': 15
}
I want to be able to expand mydict using the default values from defaults, such that 'a' remains the same but 'b' and 'c' are filled in. I know about dict.setdefault() and dict.update(), but each only do half of what I want - with dict.setdefault(), I have to loop over each variable in defaults; but with dict.update(), defaults will blow away any pre-existing values in mydict.
Is there some functionality I'm not finding built into Python that can do this? And if not, is there a more Pythonic way of writing a loop to repeatedly call dict.setdefault() than this:
for key in defaults.keys():
    mydict.setdefault(key, defaults[key])
Context: I'm writing up some data in Python that controls how to parse an XML tree. There's a dict for each node (i.e., how to process each node), and I'd rather the data I write up be sparse, but filled in with defaults. The example code is just an example... real code has many more key/value pairs in the default dict.
(I realize this whole question is but a minor quibble, but it's been bothering me, so I was wondering if there was a better way to do this that I am not aware of.)
Couldn't you make mydict a copy of defaults? That way, mydict would have all the correct values to start with.
mydict = defaults.copy()
If you don't mind creating a new dictionary in the process, this will do the trick:
newdict = dict(defaults)
newdict.update(mydict)
Now newdict contains what you need.
Since Python 3.9, you can do:
mydict = defaults | mydict
I found this solution to be the most elegant in my usecase.
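On versions before 3.9, dictionary unpacking (Python 3.5+) gives the same result as the | operator. The sketch below uses the dictionaries from the question and also shows an in-place fill for when mydict must remain the same object:

```python
mydict = {"a": 0}
defaults = {"a": 5, "b": 10, "c": 15}

# New dictionary: later entries win, so mydict overrides defaults.
merged = {**defaults, **mydict}
print(merged)  # {'a': 0, 'b': 10, 'c': 15}

# In place: only keys missing from mydict are filled in.
for key, value in defaults.items():
    mydict.setdefault(key, value)
print(mydict)  # {'a': 0, 'b': 10, 'c': 15}
```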
You can do this the same way Python's collections.defaultdict works:
class MultiDefaultDict(dict):
    def __init__(self, defaults, **kwargs):
        self.defaults = defaults
        self.update(kwargs)
    def __missing__(self, key):
        return self.defaults[key]
>>> mydict2 = MultiDefaultDict(defaults, a=0)
>>> mydict2['a']
0
>>> mydict2['b']
10
>>> mydict2
{'a': 0}
The other solutions posted so far duplicate all the default values; this one shares them, as requested. You may or may not want to override other dict methods like __contains__(), __iter__(), items(), keys(), values() -- this class as defined here iterates over the non-default items only.
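The standard library's collections.ChainMap offers similar sharing without a custom class. It looks keys up in mydict first and falls back to defaults, and unlike the class above it also reports the default keys when iterated. A sketch using the question's dictionaries:

```python
from collections import ChainMap

mydict = {"a": 0}
defaults = {"a": 5, "b": 10, "c": 15}

# Lookups search mydict first, then defaults; nothing is copied.
settings = ChainMap(mydict, defaults)
print(settings["a"], settings["b"])  # 0 10

# Changes to the shared defaults are visible immediately.
defaults["c"] = 99
print(settings["c"])  # 99

# Writes go to the first mapping only.
settings["d"] = 1
print(mydict)  # {'a': 0, 'd': 1}
```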
defaults.update(mydict)
Personally I like to extend the dictionary object by subclassing it. It works mostly like a dictionary except that you have to create the object first.
class d_dict(dict):
    'Dictionary object with easy defaults.'
    def __init__(self, defaults={}):
        self.setdefault(defaults)
    def setdefault(self, defaults):
        # items() replaces the Python 2-only iteritems()
        for key, value in defaults.items():
            if key not in self:
                dict.__setitem__(self, key, value)
This provides the exact same functionality as the dict type except that it overrides the setdefault() method and will take a dictionary containing one or more items. You can set the defaults at creation.
This is just a personal preference. As I understand all that dict.setdefault() does is set the items which haven't been set yet. So probably the simplest in place option is:
new_dict = default_dict.copy()
new_dict.update({'a':0})
However, if you do this more than once you might make a function out of this. At this point it may just be easier to use a custom dict object, rather than constantly adding defaults to your dictionaries.