Most efficient way of List/Dict Lookups in Python - python

I have a list of dictionaries, which looks something like this:
abc = [{"name":"bob",
"age": 33},
{"name":"fred",
"age": 18},
{"name":"mary",
"age": 64}]
Let's say I want to look up bob's age. I know I can run a for loop over the list, but my question is whether there are any quicker ways of doing this.
One thought is to use a loop but break out of it once the lookup (in this case the age for bob) has been completed.
The reason for this question is that my datasets are thousands of lines long, so I'm looking for any performance gains I can get.
Edit: I can see you can use the following generator expression, but I'm not sure whether it would still iterate over all items of the list or only iterate until the first dict containing the name bob is found?
next(item for item in abc if item["name"] == "bob")
Thanks,

Depending on how many times you want to perform this operation, it might be worth building a dictionary that maps names to the corresponding age (or to a list of corresponding ages if more than one person can share the same name).
A dictionary comprehension can help you:
abc_dict = {x["name"]:x["age"] for x in abc}
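If several people can share a name, a plain name-to-age dict silently keeps only the last one. Here is a minimal sketch of the list-valued variant, assuming the same data shape as the question; the second "bob" entry and the name `ages_by_name` are made up for illustration.

```python
from collections import defaultdict

# Same shape as the question's data; the second "bob" is a made-up
# entry added only to show the duplicate-name case.
abc = [
    {"name": "bob", "age": 33},
    {"name": "fred", "age": 18},
    {"name": "mary", "age": 64},
    {"name": "bob", "age": 51},
]

# Build the index once: this pass is O(n).
ages_by_name = defaultdict(list)
for person in abc:
    ages_by_name[person["name"]].append(person["age"])

# Each lookup afterwards is O(1) on average.
print(ages_by_name["bob"])   # [33, 51]
print(ages_by_name["fred"])  # [18]
```

Note that indexing a defaultdict with a missing name creates an empty list for it; use `ages_by_name.get(name)` if you would rather get None back.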

I'd consider making another dictionary and then using that for multiple age lookups:
age_by_name = {}
for person in abc:
    age_by_name[person['name']] = person['age']
age_by_name['bob']
# this is a quick lookup!
Edit: This is equivalent to the dict comprehension listed in Josay's answer

Try indexing it first (once), and then using the index (many times).
You can index it, e.g., by using a dict (the keys would be what you are searching by, and the values what you are searching for), or by putting the data in a database. That should cover the case where you have a lot more lookups and rarely need to modify the data.

Define a dictionary of dictionaries, keyed by name, like this:
peoples = {"bob": {"name": "bob", "age": 33},
           "fred": {"name": "fred", "age": 18},
           "mary": {"name": "mary", "age": 64}}
person = peoples["bob"]
persons_age = person["age"]
First look up "bob", then look up "age" on the result.
This is correct, no?

You might write a helper function. Here's a take.
import itertools
# first returns the first element encountered in an iterable which
# matches the predicate.
#
# If the element is never found, StopIteration is raised.
# Args:
#   pred  The predicate which determines a matching element.
#   seq   The iterable to search.
#
first = lambda pred, seq: next(itertools.dropwhile(lambda x: not pred(x), seq))
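On the question's side point about whether next() walks the whole list: a generator expression is lazy, so next() stops at the first match. Here is a small sketch that records which items get examined; the `checked` list and the `matches` helper are illustrative names, not part of the original code.

```python
abc = [
    {"name": "bob", "age": 33},
    {"name": "fred", "age": 18},
    {"name": "mary", "age": 64},
]

checked = []  # records every name the predicate looks at


def matches(item, name):
    checked.append(item["name"])
    return item["name"] == name


# Stops as soon as "fred" is found; "mary" is never examined.
fred = next(item for item in abc if matches(item, "fred"))
print(fred["age"])  # 18
print(checked)      # ['bob', 'fred']

# A second argument to next() is returned instead of raising
# StopIteration when nothing matches.
nobody = next((item for item in abc if item["name"] == "zed"), None)
print(nobody)       # None
```

This is still a linear scan in the worst case, so for many lookups a dict index remains the better fit.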

Related

How to find maximum element from a list and its index?

I have a list with ordered dictionaries. These ordered dictionaries have different sizes and can also have the same size(for example, 10 dictionaries can have the length of 30 and 20 dictionaries can have the length of 32). I want to find the maximum number of items a dictionary from the list has. I have tried this, which gets me the correct maximum length:
maximum_len= max(len(dictionary_item) for dictionary_item in item_list)
But how can I find the dictionary fields for which the maximum_len is given? Say that the maximum_len is 30, I want to also have the dictionary with the 30 keys printed. It can be any dictionary with the size 30, not a specific one. I just need the keys of that dictionary.
Well, you can always use filter:
output_dics = filter(lambda x: len(x) == maximum_len, item_list)
Then you have all the dictionaries that satisfy the condition; pick a random one or the first one. (In Python 3, filter returns a lazy iterator, so wrap it in list() or call next() on it.)
I don't know if this is the easiest or most elegant way to do it, but you could write a simple function that returns two values: the max_length you already calculated, and the dict itself, which you can get by calling .index on the list of lengths.
I'm talking about something like this:
def get_max(list_of_dict):
    plot = []
    for dictionary in list_of_dict:
        plot.append(len(dictionary))
    return max(plot), list_of_dict[plot.index(max(plot))]
maximum_len, max_dict = get_max(test)
I tested it and it works for my case, although I only made myself a test list with 5 dicts of different lengths.
EDIT:
Changed the variable "dict" to "dictionary" so that it does not shadow the built-in dict.
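As an alternative sketch, max() accepts a key function, so the longest dictionary can be picked in a single pass without building a separate list of lengths. The sample `item_list` below is hypothetical and stands in for the question's data.

```python
# Hypothetical sample data standing in for the question's item_list.
item_list = [
    {"a": 1, "b": 2},
    {"a": 1, "b": 2, "c": 3},
    {"x": 1},
    {"p": 1, "q": 2, "r": 3},
]

# max() compares the items by len(); on ties it returns the first one seen.
max_dict = max(item_list, key=len)
maximum_len = len(max_dict)

print(maximum_len)     # 3
print(list(max_dict))  # ['a', 'b', 'c']
```

This works the same way for OrderedDict instances, since len() and key iteration behave like those of a regular dict.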

how to sort values from a dictionary in a list

I'm just starting to learn python. We were told about data types (integer, float, boolean), and also about sets, strings, lists etc. A little more about cycles (for/while). In my homework I need code which returns a filtered list of geo_logs containing only visits from India.
I need to do this without difficult functions, alpha or something like that. Only standard cycles.
geo_logs = [
{'visit1': ['Moscow', 'Russia']},
{'visit2': ['Delhi', 'India']},
{'visit3': ['Bangalore', 'India']},
{'visit4': ['Lisbon', 'Portugal']},
{'visit5': ['Paris', 'France']},
{'visit6': ['Mumbai', 'India']},
]
for visit in geo_logs:
    if visit.values() == 'India':
        print(visit.values())
but this does not return anything.
If possible, write a code and explain it. I want to understand how python works, and not just do homework.
.values() returns a view of all the values in the entire dictionary. Since each of your values is already a list, you end up with a list nested inside that view, like dict_values([['Delhi', 'India']]).
Obviously, that does not equal 'India'.
Try if 'India' in list(visit.values())[0] instead.
This data structure is a bit confusing -- why do you have different keys visit1, visit2 etc. when the data is in separate dictionaries anyway? Either make them all have the same key visit, or combine them into one large dictionary.
Try this:
for visits in geo_logs:
    for visit, [City, Country] in visits.items():
        if Country == 'India':
            print(visits)
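Since the homework asks for a filtered list rather than printed output, here is a sketch that uses only plain for loops and collects the matching entries. The name `india_logs` is an assumption for illustration.

```python
geo_logs = [
    {'visit1': ['Moscow', 'Russia']},
    {'visit2': ['Delhi', 'India']},
    {'visit3': ['Bangalore', 'India']},
    {'visit4': ['Lisbon', 'Portugal']},
    {'visit5': ['Paris', 'France']},
    {'visit6': ['Mumbai', 'India']},
]

india_logs = []
for visit in geo_logs:
    # Each dict holds one key; its value is a [city, country] list,
    # which is unpacked into two names here.
    for city, country in visit.values():
        if country == 'India':
            india_logs.append(visit)

print(india_logs)
```

The inner loop runs once per dictionary because each one has a single key, so the whole thing is still one pass over the data.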

Pythonic way to create a dictionary by iterating

I'm trying to write something that answers "what are the possible values in every column?"
I created a dictionary called all_col_vals and iterate from 1 to however many columns my dataframe has. However, when reading about this online, someone stated this looked too much like Java and the more pythonic way would be to use zip. I can't see how I could use zip here.
all_col_vals = {}
for index in range(RCSRdf.shape[1]):
    all_col_vals[RCSRdf.iloc[:,index].name] = set(RCSRdf.iloc[:,index])
The output looks like 'CFN Network': {nan, 'N521', 'N536', 'N401', 'N612', 'N204'}, 'Exam': {'EXRC', 'MXRN', 'HXRT', 'MXRC'} and shows all the possible values for that specific column. The key is the column name.
I think @piRSquared's comment is the best option, so I'm going to steal it as an answer and add some explanation.
Answer
Assuming you don't have duplicate columns, use the following:
{k : {*df[k]} for k in df}
Explanation
k represents a column name in df. You don't have to use the .columns attribute to access them, because iterating over a pandas.DataFrame yields its column labels, much like iterating over a Python dict yields its keys.
df[k] represents the series k.
{*df[k]} unpacks the values from the series and places them in a set literal ({...}), which by definition only keeps distinct elements.
Lastly, using a dict comprehension to create the dict is typically faster than defining an empty dict and adding new keys to it via a for-loop.

Using a for loop to print keys and/or values in a dictionary for python. Looking for logical thinking explanation thanks :D

My problem is understanding why these certain lines of code do what they do. Basically why it works logically. I am using PyCharm python 3 I think.
house_Number = {
"Luca": 1, "David": 2, "Alex": 3, "Kaden": 4, "Kian": 5
}
for item in house_Number:
    print(house_Number[item]) # Why does this print the values tied with the key?
    print(item) # Why does this print the key?
This is my first question so sorry I don't know how to format the code to make it look nice. My question is why when you use the for loop to print the dictionary key or value the syntax to print the key is to print every item? And what does it even mean to print(house_Number[item]).
They both work to print key or value but I really want to know a logical answer as to why it works this way. Thanks :D
I'm not working on any projects, just starting to learn off of Codecademy.
In Python, iteration over a dictionary (for item in dict) is defined as iteration over that dictionary's keys. This is simply how the language was designed -- other languages and collection classes do it differently, iterating, for example, over key-value tuples, templated Pair<X,Y> objects, or what have you.
house_Number[item] accesses the value in house_Number referenced by the key item. [...] is the syntax for indexing in Python (and most other languages); an_array[2] gives the third element of an_array and house_Number[item] gives the value corresponding to the key item in the dictionary house_Number.
Just a side note: Python naming conventions would dictate house_number, not house_Number. Capital letters are generally only used in CamelCasedClassNames and CONSTANTS.
In Python, values inside a dictionary object are accessed using dictionary_name['KEY'].
In your case you are iterating over the keys of the dictionary.
Hope this helps
for item in dic:
    print(item) # key
    print(dic[item]) # value
Dictionaries are basically containers holding some items (keys), which are stored using hashing. These keys map to the values (dic[key]).
As with a set, if you traverse a dictionary with a for loop, you get the keys from it. A set has no guaranteed order because its elements are hashed; a dict yields its keys in insertion order as of Python 3.7 (before that, the order was arbitrary as well). You can think of a dictionary as a set with a value associated with each element, so it makes sense that iterating gives you the keys, as with sets.
Read more about dictionaries here https://docs.python.org/3/tutorial/datastructures.html#dictionaries and hopefully that will answer your question. Specifically, look at the .items() method of the dictionary object.
When you type for item in house_Number, you don't specify whether item is the key or the value of house_Number. Python then takes it to mean the key.
So when you call print(house_Number[item]), you're printing the value, because you're taking the key and looking up its value. In other words, you take each key once and find its value: 1, 2, 3, 4, 5.
print(item) just prints the item itself, which is the key: "Luca", "David", "Alex", "Kaden", "Kian".
Because print(house_Number[item]) and print(item) alternate, you get the values and keys alternating, each on a new line.
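To make the three ways of iterating concrete, here is a short sketch using the question's dictionary (renamed `house_number` to follow the naming convention mentioned above). The printed order assumes Python 3.7 or later, where dicts preserve insertion order.

```python
house_number = {"Luca": 1, "David": 2, "Alex": 3, "Kaden": 4, "Kian": 5}

# Iterating the dict itself yields its keys.
keys = [name for name in house_number]

# .values() yields only the values.
values = list(house_number.values())

# .items() yields (key, value) pairs, so no second lookup is needed.
for name, number in house_number.items():
    print(name, number)

print(keys)    # ['Luca', 'David', 'Alex', 'Kaden', 'Kian']
print(values)  # [1, 2, 3, 4, 5]
```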

Should I use dict or list?

I would like to loop through a big two dimension list:
authors = [["Bob", "Lisa"], ["Alice", "Bob"], ["Molly", "Jim"], ... ]
and get a list that contains all the names that occurs in authors.
When I loop through the list, I need a container to store names I've already seen, I'm wondering if I should use a list or a dict:
with a list:
seen = []
for author_list in authors:
    for author in author_list:
        if not author in seen:
            seen.append(author)
result = seen
with a dict:
seen = {}
for author_list in authors:
    for author in author_list:
        if not author in seen:
            seen[author] = True
result = seen.keys()
which one is faster? or is there better solutions?
You really want a set. Sets only hold unique, hashable elements and are implemented as hash tables, which allow membership testing (if element in my_set) in O(1) time on average. This contrasts with lists, where the only way to check if an element is in the list is to check every element of the list in turn (in O(n) time).
A dict is similar to a set in that both allow unique keys only, and both are implemented as hash tables. They both allow O(1) membership testing. The difference is that a set only has keys, while a dict has both keys and values (which is extra overhead you don't need in this application.)
Using a set, and replacing the nested for loop with an itertools.chain() to flatten the 2D list to a 1D list:
import itertools
seen = set()
for author in itertools.chain(*authors):
    seen.add(author)
Or shorter:
import itertools
seen = set( itertools.chain(*authors) )
Edit (thanks, @jamylak): more memory efficient for large lists:
import itertools
seen = set( itertools.chain.from_iterable(authors) )
Example on a list of lists:
>>> a = [[1,2],[1,2],[1,2],[3,4]]
>>> set(itertools.chain(*a))
{1, 2, 3, 4}
P.S. : If, instead of finding all the unique authors, you want to count the number of times you see each author, use a collections.Counter, a special kind of dictionary optimised for counting things.
Here's an example of counting characters in a string:
>>> a = "DEADBEEF CAFEBABE"
>>> import collections
>>> collections.Counter(a)
Counter({'E': 5, 'A': 3, 'B': 3, 'D': 2, 'F': 2, ' ': 1, 'C': 1})
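The same idea applied to the authors data might look like the sketch below. It assumes only the three sublists shown in the question, and `counts` is an illustrative name.

```python
from collections import Counter
from itertools import chain

authors = [["Bob", "Lisa"], ["Alice", "Bob"], ["Molly", "Jim"]]

# Flatten the nested lists lazily and count each name.
counts = Counter(chain.from_iterable(authors))

print(counts["Bob"])          # 2
print(counts.most_common(1))  # [('Bob', 2)]

# The keys of the Counter are exactly the unique names.
unique_names = set(counts)
print(len(unique_names))      # 5
```

A Counter returns 0 for names it has never seen, so `counts["Zed"]` does not raise a KeyError.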
set should be faster.
>>> authors = [["Bob", "Lisa"], ["Alice", "Bob"], ["Molly", "Jim"]]
>>> from itertools import chain
>>> set(chain(*authors))
{'Lisa', 'Bob', 'Jim', 'Molly', 'Alice'}
Using a dict or a set is way faster than using a list:
import itertools
result = set(itertools.chain.from_iterable(authors))
You can use the built-in set:
seen = set()
for author_list in authors:
    for author in author_list:
        seen.add(author)
result = seen
This way you avoid the "if" check entirely, so the solution should be faster.
If you care about the performance of lookups, lookups in lists are O(n), while lookups in dictionaries are amortised to O(1).
You can find more information here.
Lists just store a bunch of items in a particular order. Think of your list of authors as a long line of pigeonhole boxes with author's names on bits of papers in the boxes. The names stay in the order you put them in, and you can find the author in any particular pigeonhole very easily, but if you want to know if a particular author is in any pigeonhole, then you have to look through each one until you find the name you're after. You can also have the same name in any number of pigeonholes.
Dictionaries are a bit more like a phone book. Given the author's name, you can very quickly check to see whether the author is listed in the phone book, and find the phone number listed with it. But you can only include each author once (with exactly one phone number), and you can't rearrange the entries into any order you like. In a real phone book, that order is alphabetical; in Python dictionaries it is the order of insertion as of Python 3.7 (in older versions it was arbitrary and could change when you added or removed things), and lookups by name never depend on it. Python can find entries even faster in a dictionary than you could in a phone book.
Sets, on the other hand, are like phone books that just list names, not phone numbers. You still can't have the same name included more than once, it's either in the set or not. And you still can't use the order in which names are in the set for anything useful. But you can very quickly check whether a name is in the set.
Given your use case, a set would appear to be the obvious choice. You don't care about the order in which you've seen authors, or how many times you've seen each author, only that you can quickly check whether you've seen a particular author before.
Lists are bad for this case; they go to the effort of remembering duplicates in whatever order you specify, and they're slow to search. But you also don't have any need to map keys to values, which is what a dictionary does. To go back to the phone book analogy, you don't have anything equivalent to a "phone number"; in your dictionary example you're doing the equivalent of writing a phone book in which everybody's number is listed as True, so why bother listing the phone numbers at all?
A set, OTOH, does exactly what you need.
