A more idiomatic way to fill a dictionary in Python

A more idiomatic way to fill a dictionary in Python - python

Currently, I'm trying to fill a dictionary in Python but I think what I'm doing is a little bit redundant. Is there a more pythonic way to do the following:
if not pattern_file_map.get(which_match):
pattern_file_map[which_match] = [line]
else:
pattern_file_map[which_match].append(line)
where pattern_file_map is a dictionary.
I know that there is a certain idiom to use when checking if there is a key in a dictionary, like
this question, but I I just want to fill this dictionary with a lists.

You could use
pattern_file_map.setdefault(which_match, []).append(line)
instead.
Other people might suggest using a collections.defaultdict(list) instead, which is another option, but be warned that this might hide errors, since it will silently create all keys you access.

You could try using a collections.defaultdict.

Since you're adding to a maybe existing value:
pattern_file_map.setdefault(which_match, []).append(line)
If using dict.get():
li = pattern_file_map.get(which_match, [])
li.append(line)
pattern_file_map[which_match] = li
Of course, both cases are restricted to the case where your dict's values are lists.

Related

Logic behind accessing dictionary values in python

Rookie here and I couldn't find a proper explanation for this.
We have a simple dict:
a_dict = {'color': 'blue', 'fruit': 'apple', 'pet': 'dog'}
to loop through and access the values of this dict I have to call
for key in a_dict:
print(key, '->', a_dict[key])
I am saying about
a_dict[key]
specifically. Why python use this convention? Where is a logic behind this? When I want to get values of a dictionary I should call it something like
a_dict[value] or a_dict[values] etc
instead (thinking logically).
Could anyone explain it to make more sense please?
edit:
to be clear: why python use a_dict[key] to access dict VALUE instead of a_dict[value]. LOGICALLY.

according to your question, I think you meant why python does not use index instead of key to reach values in the dict.
Please take note that there are 4 main data container in python, and each for its usage. (there are also other containers like counter and ...)
for example elements of list and tuple is reachable by their indices.
a = [1,2,3,4,5]
print(a[0]) would print 1
but dictionary as its name shows, maps from some objects (keys in python terminology) to some other objects(values in python terminology). so we would call the key instead of index and the output would be the value.
a = { 'a':1 , 'b':2 }
print(a['a']) would print 1
I hope it makes it a bit more clear for you.

I think you are misunderstanding some terminology around dictionaries:
In your example, your keys are color, fruit, and pet.
Your values are blue, apple, and dog.
In python, you access your values by calling a_dict[key], for example a_dict["color"] will return "blue".
If python instead used your suggested method of a_dict[value], you would have to know what your value was before trying to access it, e.g. a_dict["blue"] would be needed to get "blue", which makes very little sense.
As in Feras's answer, try reading up more on how dictionaries work here

Its because, a dictionary in python, maps the keys and values with a hash function internally in the memory.
Thus, to get the value, you've to pass in the key.
You can sort of think it like indices of the list vs the elements of the list, now to extract a particular element, you would use lst[index]; this is the same way dictionaries work; instead of passing in index you would've to pass in
the key you used in the dictionary, like dict[key].
One more comparison is the dictionary (the one with words and meanings), in that the meanings are mapped to the words, now you would of course search for the word and not the meaning given to the word, directly.

You are searching for a value wich you don't know if it exists or not in the dict, so the a_dict[key] is logic and correct

List expressions in Python

I'm parsing a xml file ... so there's a field called case: Sometimes it's a single OrderedDict, other times it's a list of OrderedDict. That's it:
OrderedDict([(u'duration', u'2.111'), (u'className', u'foo'), (u'testName', u'runTest'), (u'skipped', u'false'), (u'failedSince', u'0')])
[OrderedDict([(u'duration', u'0.062'), (u'className', u'foo'), (u'testName', u'runTest'), (u'skipped', u'false'), (u'failedSince', u'0')]), OrderedDict([(u'duration', u'0.461'), (u'className', u'bar'), (u'testName', u'runTest'), (u'skipped', u'false'), (u'failedSince', u'0')])]
I want to always have that expression as a single list. The reason is to have a for loop to take care of that. I thought about doing something like:
[case]
But as the later I would have [[case]]. I don't think list joins or concatenations would help me. A trivial solution would be to check if case is of the type list or OrderedDict, however I was looking for a simpler, one line, pythonic solution like the one I described above. How can I accomplish that?

Since list and OrderedDict are both kinds of containers, checking the type sounds like it might be the simplest solution, if you're sure that the xml parse will always use the list type.
There's no reason you can't do this in a one-liner:
case = [case] if not isinstance(case, list) else case

Look up python dict value by expression

I have a dict that has unix epoch timestamps for keys, like so:
lookup_dict = {
1357899: {} #some dict of data
1357910: {} #some other dict of data
}
Except, you know, millions and millions and millions of entries. I'd like to subset this dict, over and over again. Ideally, I'd love to be able to write something like I can in R, like:
lookup_value = 1357900
dict_subset = lookup_dict[key >= lookup_value]
# dict_subset now contains {1357910: {}}
But I confess, I can't find any actual proof that this is something Python can do without having, one way or the other, to iterate over every row. If I understand Python correctly (and I might not), key lookup of the form key in dict uses binary search, and is thus very fast; any way to do a binary search, on dict keys?

To do this without iterating, you're going to need the keys in sorted order. Then you just need to do a binary search for the first one >= lookup_value, instead of checking each one for >= lookup_value.
If you're willing to use a third-party library, there are plenty out there. The first two that spring to mind are bintrees (which uses a red-black tree, like C++, Java, etc.) and blist (which uses a B+Tree). For example, with bintrees, it's as simple as this:
dict_subset = lookup_dict[lookup_value:]
And this will be as efficient as you'd hope—basically, it adds a single O(log N) search on top of whatever the cost of using that subset. (Of course usually what you want to do with that subset is iterate the whole thing, which ends up being O(N) anyway… but maybe you're doing something different, or maybe the subset is only 10 keys out of 1000000.)
Of course there is a tradeoff. Random access to a tree-based mapping is O(log N) instead of "usually O(1)". Also, your keys obviously need to be fully ordered, instead of hashable (and that's a lot harder to detect automatically and raise nice error messages on).
If you want to build this yourself, you can. You don't even necessarily need a tree; just a sorted list of keys alongside a dict. You can maintain the list with the bisect module in the stdlib, as JonClements suggested. You may want to wrap up bisect to make a sorted list object—or, better, get one of the recipes on ActiveState or PyPI to do it for you. You can then wrap the sorted list and the dict together into a single object, so you don't accidentally update one without updating the other. And then you can extend the interface to be as nice as bintrees, if you want.

Using the following code will work out
some_time_to_filter_for = # blah unix time
# Create a new sub-dictionary
sub_dict = {key: val for key, val in lookup_dict.items()
if key >= some_time_to_filter_for}
Basically we just iterate through all the keys in your dictionary and given a time to filter out for we take all the keys that are greater than or equal to that value and place them into our new dictionary

Python equiv. of PHP foreach []?

I am fetching rows from the database and wish to populate a multi-dimensional dictionary.
The php version would be roughly this:
foreach($query as $rows):
$values[$rows->id][] = $rows->name;
endforeach;
return $values;
I can't seem to find out the following issues:
What is the python way to add keys to a dictionary using an automatically numbering e.g. $values[]
How do I populate a Python dictionary using variables; using, for example, values[id] = name, will not add keys, but override existing.
I totally have no idea how to achieve this, as I am a Python beginner (programming in general, actually).

values = collections.defaultdict(list)
for rows in query:
values[rows.id].append(rows.name)
return values

Just a general note:
Python's dictionaries are mappings without order, while adding numerical keys would allow "sequential" access, in case of iteration there's no guarantee that order will coincide with the natural order of keys.
It's better not to translate from PHP to Python (or any other language), but rather right code idiomatic to that particular language. Have a look at the many open-source code that does the same/similar things, you might even find a useful module (library).

all_rows=[]
for row in query:
all_rows.append(row['col'])
print(all_rows)

You can do:
from collections import defaultdict
values = defaultdict(list)
for row in query:
values[row.id].append(row.name)
return values
Edit: forgot to return the values.

How do I know what data type to use in Python?

I'm working through some tutorials on Python and am at a position where I am trying to decide what data type/structure to use in a certain situation.
I'm not clear on the differences between arrays, lists, dictionaries and tuples.
How do you decide which one is appropriate - my current understanding doesn't let me distinguish between them at all - they seem to be the same thing.
What are the benefits/typical use cases for each one?

How do you decide which data type to use? Easy:
You look at which are available and choose the one that does what you want. And if there isn't one, you make one.
In this case a dict is a pretty obvious solution.

Tuples first. These are list-like things that cannot be modified. Because the contents of a tuple cannot change, you can use a tuple as a key in a dictionary. That's the most useful place for them in my opinion. For instance if you have a list like item = ["Ford pickup", 1993, 9995] and you want to make a little in-memory database with the prices you might try something like:
ikey = tuple(item[0], item[1])
idata = item[2]
db[ikey] = idata
Lists, seem to be like arrays or vectors in other programming languages and are usually used for the same types of things in Python. However, they are more flexible in that you can put different types of things into the same list. Generally, they are the most flexible data structure since you can put a whole list into a single list element of another list, but for real data crunching they may not be efficient enough.
a = [1,"fred",7.3]
b = []
b.append(1)
b[0] = "fred"
b.append(a) # now the second element of b is the whole list a
Dictionaries are often used a lot like lists, but now you can use any immutable thing as the index to the dictionary. However, unlike lists, dictionaries don't have a natural order and can't be sorted in place. Of course you can create your own class that incorporates a sorted list and a dictionary in order to make a dict behave like an Ordered Dictionary. There are examples on the Python Cookbook site.
c = {}
d = ("ford pickup",1993)
c[d] = 9995
Arrays are getting closer to the bit level for when you are doing heavy duty data crunching and you don't want the frills of lists or dictionaries. They are not often used outside of scientific applications. Leave these until you know for sure that you need them.
Lists and Dicts are the real workhorses of Python data storage.

Best type for counting elements like this is usually defaultdict
from collections import defaultdict
s = 'asdhbaklfbdkabhvsdybvailybvdaklybdfklabhdvhba'
d = defaultdict(int)
for c in s:
d[c] += 1
print d['a'] # prints 7

Do you really require speed/efficiency? Then go with a pure and simple dict.

Personal:
I mostly work with lists and dictionaries.
It seems that this satisfies most cases.
Sometimes:
Tuples can be helpful--if you want to pair/match elements. Besides that, I don't really use it.
However:
I write high-level scripts that don't need to drill down into the core "efficiency" where every byte and every memory/nanosecond matters. I don't believe most people need to drill this deep.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.