This question already has answers here:
Should I use 'has_key()' or 'in' on Python dicts? [duplicate]
(9 answers)
Closed last month.
There are several different ways to check if a Python dictionary contains a specific key, i.e.
d = {}
if key in d:
if d.contains(key):
if d.has_key(key):
It seems silly for a language to allow you to do the same thing several different ways, UNLESS each of the methods does something entirely different. Could someone please contrast the three techniques above: how are they different?
They're all the same and they're all around for historical reasons, but you should use key in d.
Method #1 is the accepted way to do it. Method #2 doesn't actually exist, at least in any versions of Python that I'm aware of; I'd be interested to see where you found that. Method #3 used to be the accepted way, but is now deprecated.
So there really is just one way.
d.__contains__(key) is what is called when you write key in d (the in operator calls the dictionary's __contains__ method).
has_key is deprecated and does the same as __contains__
key in d is the accepted way to do it.
__contains__ is the ‘“magic” attribute’ (ref) that implements the above syntax. Most, if not all, special syntax is implemented via such methods. E.g., the with statement is implemented via __enter__ and __exit__. Such methods exist for allowing special functionality to be provided for user-defined classes.
the has_key method no longer exists in Python 3 and is deprecated in Python 2.
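To illustrate the points above, here is a minimal sketch of the idiomatic check next to its equivalents (the dictionary contents are invented for illustration):

```python
d = {"spam": 1, "eggs": 2}

# The idiomatic membership test: the `in` operator
# calls d.__contains__ under the hood.
print("spam" in d)   # True
print("bacon" in d)  # False

# Equivalent, but not idiomatic -- calling the magic method directly:
print(d.__contains__("spam"))  # True

# d.has_key("spam") worked in Python 2 but raises
# AttributeError in Python 3, so it is not shown executing here.
```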
Related
Pandas defines a FrozenList object, described in its documentation as a
[c]ontainer that doesn't allow setting item but because it's technically non-hashable, will be used for lookups, appropriately, etc.
I don't understand what benefit there is in having an immutable-but-non-hashable container. Could someone give me an example where the hashability of a standard Python tuple would cause problems?
The answer to this similar question does not give any reason for why one would want an immutable-but-non-hashable container. In particular, it does not explain why a Python tuple's hashability makes it unsuitable for whatever task the FrozenLists are being used for. Therefore,
THIS QUESTION IS NOT A DUPLICATE.
This class is indeed a bit confusing, because the discussion on that question indicates that it is mutable via pandas internals, but I don't see how that ability is actually used. Being unhashable has no specific benefit. The benefit is that the object is mutable (although, again, as that question states, it is mutable only via the C internals, not via Python code). If an object is mutable, it cannot be hashable. (Or rather, it cannot have its hash value depend on its mutable state; for an object like a list/tuple that has no real state except its mutable contents, there's no sensible way to make it hashable if it's mutable.)
In addition, there is this comment in the source code:
# Sidenote: This has to be of type list, otherwise it messes up PyTables
# typechecks
As the other question also says, there are other reasons to use a custom class instead of tuple. For instance, FrozenList has a custom __repr__.
Basically, it does not appear that the class was written because someone thought "I need an immutable but nonhashable container". Rather, it appears it was written because someone thought "I need a class that is mutable, but only secretly, and I need to be able to give it custom methods, and I need it to be a subclass of list to avoid breaking this other library." Some of the comments on that other question/answer suggest that nonhashability per se may never have been a necessary criterion, and even if it was, it may no longer be necessary due to other changes in how pandas works. I didn't write the class, so I can't be sure, but it seems to me that the comment you quoted in your question is misleading about what the real impetus was for making such a class.
This question already has answers here:
What's the difference between a Python "property" and "attribute"?
(7 answers)
Closed 2 months ago.
Just a quick question: I'm having a little difficulty understanding where to use properties vs. where to use plain old attributes. The distinction is a bit blurry to me. Any resources on the subject would be superb, thank you!
Properties are more flexible than attributes, since you can define functions that describe what is supposed to happen when setting, getting or deleting them. If you don't need this additional flexibility, use attributes – they are easier to declare and faster.
In languages like Java, it is usually recommended to always write getters and setters, in order to have the option to replace these functions with more complex versions in the future. This is not necessary in Python, since the client code syntax to access attributes and properties is the same, so you can always choose to use properties later on, without breaking backwards compatibility.
The point is that the syntax is interchangeable. Always start with attributes. If you find you need additional calculations when accessing an attribute, replace it with a property.
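A sketch of that progression: start with a plain attribute, then swap in a property later without changing client code (the class names and the validation rule here are invented for illustration):

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius  # plain attribute: no getter/setter needed


class ValidatedTemperature:
    """Same client-facing syntax, but now backed by a validating property."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value


# Client code is identical for both classes:
t = ValidatedTemperature(20)
t.celsius = 25
print(t.celsius)  # 25
```

Callers never see the difference, which is exactly why Python code can defer the decision.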
In addition to what Daniel Roseman said, I often use properties when I'm wrapping something, i.e. when I don't store the information myself but the wrapped object does. Then properties make excellent accessors.
Properties are attributes + a posteriori encapsulation.
When you turn an attribute into a property, you just define a getter and setter that you "attach" to it, which hook into the data access. Then you don't need to rewrite the rest of your code: the way of accessing the data is the same, whether your attribute is a property or not.
Thanks to this very clever and powerful encapsulation mechanism, in Python you can usually go with attributes (without a priori encapsulation, so without any getters or setters), unless you need to do special things when accessing the data.
If so, you can define getters and setters only where needed and "attach" them to the attribute, turning it into a property, without any impact on the rest of your code (whereas in Java, the first thing you usually do when creating a field, usually private, is to create its associated getter and setter methods).
Nice page about attributes, properties and descriptors here
I'm currently going through a basic compsci course. We use Python's in operator a lot. I'm curious how it's implemented: what does the code that powers in look like?
I can think of how my implementation of such a thing would work, but something I've learned after turning in a couple of homework assignments is that my ways of doing things are usually pretty terrible and inefficient. So I want to start investigating 'good' code.
The thing about builtin functions and types and operators and so on is that they are not implemented in Python. Rather, they're implemented in C, which is a much more painful and verbose programming language that won't always translate well to Python (usually because things are easier some other way in Python.)
With that said, you can investigate all of Python's implementation online, via their public source repository.
The implementation for in is scattered -- there's one implementation per type, plus a more general implementation that calls the type-specific implementation (more on that later). For lists, for example, we'd look at the implementation of the list type. In the Python source tree, the source for all builtin objects is in the Objects directory. In that directory you'll find listobject.c, which contains the implementation of the list object and all its methods.
On the repository at the time of answering, if you look at line 393 you'll find the implementation of the in operator (also known as the __contains__ method, which explains the name of the function). It's fairly straightforward: it just loops through all the elements of the list until either the element is found or there are no more elements, and returns the result of the search. :)
If it helps, in Python the idiomatic way to write this would be:
def __contains__(self, obj):
    for item in self:
        if item == obj:
            return True
    return False
I said earlier that there was a more general implementation. That can be seen in the implementation of PySequence_Contains in abstract.c. It tries to call the type-specific version, and if that fails, resorts to regular iteration. That loop there is what a regular Python for loop looks like when you write it in C (using the Python C-API).
From the Data Model section of the Python Language Reference:
The membership test operators (in and not in) are normally implemented as an iteration through a sequence. However, container objects can supply the following special method with a more efficient implementation, which also does not require the object be a sequence.

object.__contains__(self, item)

Called to implement membership test operators. Should return true if item is in self, false otherwise. For mapping objects, this should consider the keys of the mapping rather than the values or the key-item pairs.

For objects that don't define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__(); see this section in the language reference.
So, by default Python iterates over a sequence to implement the in operator. If an object defines the __contains__ method, Python uses it instead of iterating. So what happens in the __contains__ method? To know exactly, you would have to browse the source. But I can tell you that Python's lists implement __contains__ using iteration. Python dictionaries and sets are implemented as hash tables, and therefore support faster membership testing.
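A small sketch of that dispatch behavior, with both paths shown (the class names here are invented for illustration):

```python
class WithContains:
    def __contains__(self, item):
        # Called directly by the `in` operator.
        return item == "magic"


class IterOnly:
    def __iter__(self):
        # No __contains__ defined: `in` falls back to iterating
        # and comparing each element.
        return iter([1, 2, 3])


print("magic" in WithContains())  # True
print(2 in IterOnly())            # True
print(9 in IterOnly())            # False
```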
Python's built-in methods are written in the C language - you can see their code by checking out Python's source code yourself.
However, if you want to take a look at an equivalent implementation of all the methods in Python itself, you can check PyPy, which features a Python implementation written 100% in Python and a restricted subset of it (RPython).
The in operator calls the __contains__ method on a string object, so you can check the implementation of the string type in both projects, but the actual searching code will be buried deeper.
Here is some of the code in CPython for it, for example:
http://hg.python.org/cpython/file/c310233b1d64/Objects/stringlib/fastsearch.h
You can browse the Python-Source online: http://hg.python.org/
A good start is to clone the repository you need and then use grep to find the things you need.
My karate instructor is fond of saying, "a block is a lock is a throw is a blow." What he means is this: When we come to a technique in a form, although it might seem to look like a block, a little creativity and examination shows that it can also be seen as some kind of joint lock, or some kind of throw, or some kind of blow.
So it is with the way the django template syntax uses the dot (".") character. It perceives it first as a dictionary lookup, but it will also treat it as a class attribute, a method, or list index - in that order. The assumption seems to be that, one way or another, we are looking for a piece of knowledge. Whatever means may be employed to store that knowledge, we'll treat it in such a way as to get it into the template.
Why doesn't python do the same? If there's a case where I might have assigned a dictionary term spam['eggs'], but know for sure that spam has an attribute eggs, why not let me just write spam.eggs and sort it out the way django templates do?
Otherwise, I have to catch an AttributeError and add three additional lines of code.
I'm particularly interested in the philosophy that drives this setup. Is it regarded as part of strong typing?
Django templates and Python are two unrelated languages. They also have different target audiences.
In Django templates, the target audience is designers, who probably don't want to learn 4 different ways of doing roughly the same thing (a dictionary lookup). Thus there is a single syntax in Django templates that performs the lookup in several possible ways.
Python has quite a different audience. Developers actually make use of the many different ways of doing similar things, and overload each with distinct meaning. When one fails, it should fail, because that is what the developer means for it to do.
JUST MY correct OPINION's opinion is indeed correct. I can't say why Guido did it this way but I can say why I'm glad that he did.
I can look at code and know right away if some expression is accessing the 'b' key in a dict-like object a, the 'b' attribute on the object a, the method b being called on a, or the b index into the sequence a.
Python doesn't have to try all of the above options every time there is an attribute lookup. Imagine if every time one indexed into a list, Python had to try three other options first. List intensive programs would drag. Python is slow enough!
It means that when I'm writing code, I have to know what I'm doing. I can't just toss objects around and hope that I'll get the information somewhere somehow. I have to know that I want to lookup a key, access an attribute, index a list or call a method. I like it that way because it helps me think clearly about the code that I'm writing. I know what the identifiers are referencing and what attributes and methods I'm expecting the object of those references to support.
Of course, Guido van Rossum might have just flipped a coin for all I know (he probably didn't), so you would have to ask him yourself if you really want to know.
As for your comment about having to surround these things with try blocks, it probably means that you're not writing very robust code. Generally, you want your code to expect to get some piece of information from a dict-like object, list-like object or a regular object. You should know which way it's going to do it and let anything else raise an exception.
The exception to this is that it's OK to conflate attribute access and method calls using the property decorator and more general descriptors. This is only good if the method doesn't take arguments.
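For example, a property can make a computed value look like a plain attribute, which works precisely because no arguments are involved (the Circle class here is invented for illustration):

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def area(self):
        # Looks like plain attribute access at the call site,
        # but actually runs a computation each time.
        return math.pi * self.radius ** 2


c = Circle(2)
print(c.area)  # accessed without parentheses; no arguments are possible
```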
The different methods of accessing attributes do different things. If you have a function foo, the two lines of code

a = foo
a = foo()

do two very different things. Without distinct syntax to reference and call functions, there would be no way for Python to know whether the variable should be a reference to foo or the result of running foo. The () syntax removes the ambiguity.
Lists and dictionaries are two very different data structures. One of the things that determines which one is appropriate in a given situation is how its contents can be accessed (key vs. index). Having separate syntax for both of them reinforces the notion that these two things are not the same and neither one is always appropriate.
It makes sense for these distinctions to be ignored in a template language: the person writing the HTML doesn't care, and the template language doesn't have function pointers, so it knows you don't want one. Programmers who write the Python that drives the template, however, do care about these distinctions.
In addition to the points already posted, consider this. Python uses special member variables and functions to provide metadata about the object. Both the interpreter and programmers make heavy use of these. For example, both dicts and lists have a __len__ member function. Now, if a dict's data were accessed by using the . operator, a potential ambiguity arises if the dict has a key called __len__. You could special-case these, but many objects have a __dict__ attribute which is a mapping of member names and values. If that object happened to be a container, which also defined a __len__ attribute, you would end up with an utter mess.
Problems like this would end up turning Python into a mishmash of special cases that the programmer would have to constantly be aware of. This would detract from the reason why many people use Python in the first place, i.e., its elegant simplicity.
Now, consider that new users often shadow built-ins (if the code in SO questions is any indication) and having something like this starts to look like a really bad idea, since it would exacerbate the problem many-fold.
In addition to the responses above, it's not practical to merge dictionary lookup and object lookup in general because of the restrictions on object members.
What if your key has whitespace? What if it's an int, or a frozenset, etc.? Dot notation can't account for these discrepancies, so while it's an acceptable tradeoff for a templating language, it's unacceptable for a general-purpose programming language like Python.
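A quick sketch of keys that dot notation could never express (the dictionary contents are invented for illustration):

```python
d = {
    "two words": 1,         # whitespace in the key
    42: 2,                  # int key
    frozenset({1, 2}): 3,   # frozenset key
    "__len__": 4,           # would shadow the real __len__ if keys were attributes
}

# Bracket lookup handles all of these; d.two words or d.42 would be
# syntax errors, and d.__len__ already means something else entirely.
print(d["two words"], d[42], d[frozenset({1, 2})], d["__len__"])
print(len(d))  # the real __len__ is unaffected: 4
```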
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 13 years ago.
As function overloading says:
Function overloading is absent in Python.
As far as I can tell, this is a big handicap, since Python is also an object-oriented (OO) language. Initially I found that being unable to differentiate between the argument types was difficult, but the dynamic nature of Python made it easy (e.g. lists, tuples, and strings are much alike).
However, counting the number of arguments passed and then doing the job feels like overkill.
Now, unless you're trying to write C++ code using Python syntax, what would you need overloading for?
I think it's exactly the opposite. Overloading is only necessary to make strongly-typed languages act more like Python. In Python you have keyword arguments, and you have *args and **kwargs.
See for example: What is a clean, Pythonic way to have multiple constructors in Python?
As unwind noted, keyword arguments with default values can go a long way.
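For instance, default values let one function cover the call patterns you might otherwise write as separate overloads (the function name and defaults here are invented for illustration):

```python
def connect(host, port=80, timeout=30.0, use_tls=False):
    # One definition covers what would be several overloaded
    # signatures in Java or C++.
    return (host, port, timeout, use_tls)


print(connect("example.com"))                 # all defaults
print(connect("example.com", 8080))           # positional override
print(connect("example.com", use_tls=True))   # keyword override
```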
I'll also state that in my opinion, it goes against the spirit of Python to worry a lot about what types are passed into methods. In Python, I think it's more accepted to use duck typing -- asking what an object can do, rather than what it is.
Thus, if your method may accept a string or a tuple, you might do something like this:
def print_names(names):
    """Takes a space-delimited string or an iterable"""
    try:
        for name in names.split():  # string case
            print(name)
    except AttributeError:
        for name in names:
            print(name)
Then you could do either of these:
print_names("Ryan Billy")
print_names(("Ryan", "Billy"))
Although an API like that sometimes indicates a design problem.
You don't need function overloading, as you have the *args and **kwargs arguments.
The fact is that function overloading is based on the idea that by passing different types you will execute different code. In a dynamically typed language like Python, you should not distinguish by type; instead, you should deal with interfaces and their compliance with the code you write.
For example, if you have code that can handle either an integer, or a list of integers, you can try iterating on it and if you are not able to, then you assume it's an integer and go forward. Of course it could be a float, but as far as the behavior is concerned, if a float and an int appear to be the same, then they can be interchanged.
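A sketch of that try-first approach (the function name is invented for illustration):

```python
def total(value):
    # Handle either a single number or an iterable of numbers
    # by behavior, not by explicit type checks.
    try:
        return sum(value)   # works if value is iterable
    except TypeError:
        return value        # not iterable: assume a single number


print(total([1, 2, 3]))  # 6
print(total(10))         # 10
print(total(2.5))        # 2.5 -- floats interchange freely with ints here
```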
Oftentimes you see the suggestion to use keyword arguments, with default values, instead. Look into that.
You can pass a mutable container datatype into a function, and it can contain anything you want.
If you need a different functionality, name the functions differently, or if you need the same interface, just write an interface function (or method) that calls the functions appropriately based on the data received.
It took me a while to get adjusted to this coming from Java, but it really isn't a "big handicap".