How to get a subset from an OrderedDict? - python

I have an OrderedDict in Python, and I only want to get the first few key-value pairs. How can I get them? For example, to get the first 4 elements, I did the following:
subdict = {}
for index, pair in enumerate(my_ordered_dict.items()):
    if index < 4:
        subdict[pair[0]] = pair[1]
Is this a good way to do it?

That approach runs over the whole dictionary even though you only need the first four elements, checks the index on every iteration, and unpacks the pairs manually.
Making it short-circuit is easy:
subdict = {}
for index, pair in enumerate(my_ordered_dict.items()):
    if index >= 4:
        break  # Ends the loop without iterating all of my_ordered_dict
    subdict[pair[0]] = pair[1]
and you can nest the unpacking to get nicer names:
subdict = {}
# Inner parentheses mandatory for nested unpacking
for index, (key, val) in enumerate(my_ordered_dict.items()):
    if index >= 4:
        break  # Ends the loop
    subdict[key] = val
but you can improve on that with itertools.islice to remove the manual index checking:
from itertools import islice  # At top of file
subdict = {}
# islice lazily produces the first four pairs then stops for you
for key, val in islice(my_ordered_dict.items(), 4):
    subdict[key] = val
at which point you can actually one-line the whole thing (because now you have an iterable of exactly the four pairs you want, and the dict constructor accepts an iterable of pairs):
subdict = dict(islice(my_ordered_dict.items(), 4))
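If the subset should itself remain an OrderedDict rather than a plain dict, the same islice call can feed the OrderedDict constructor instead. This is only a minimal sketch; the sample my_ordered_dict below is invented for illustration.

```python
from collections import OrderedDict
from itertools import islice

# Sample data invented for illustration
my_ordered_dict = OrderedDict((letter, ord(letter)) for letter in "abcdef")

# islice stops after four pairs; OrderedDict accepts an iterable of pairs, just like dict
sub_ordered = OrderedDict(islice(my_ordered_dict.items(), 4))
print(list(sub_ordered))  # ['a', 'b', 'c', 'd']

# Asking for more pairs than exist is harmless: islice simply stops early
everything = OrderedDict(islice(my_ordered_dict.items(), 100))
print(len(everything))  # 6
```

Note that islice never raises when the dictionary is shorter than the requested count, so no length check is needed beforehand.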

You can use the map function, like this:
item = dict(map(lambda x: (x, my_ordered_dict[x]), [*my_ordered_dict][:4]))

Here is one approach:
sub_dict = dict(pair for i, pair in zip(range(4), my_ordered_dict.items()))
zip(a, b) stops as soon as the shorter of a and b is exhausted, so if my_ordered_dict.items() has more than 4 items, zip(range(4), my_ordered_dict.items()) takes just the first 4. These key-value pairs are passed to the dict builtin to make a new dict.

Related

Using multiple variables in a for loop in Python

I am trying to get a deeper understanding of how for loops work for different data types in Python. The simplest way of using a for loop and iterating over an array is
for i in range(len(array)):
    do_something(array[i])
I also know that I can do
for i in array:
    do_something(i)
What I would like to know is what this does
for i, j in range(len(array)):
    # What is i and j here?
or
for i, j in array:
    # What is i and j in this case?
And what happens if I try using this same idea with dictionaries or tuples?
The simplest and best way is the second one, not the first one!
for i in array:
    do_something(i)
Never do this; it needlessly complicates the code:
for i in range(len(array)):
    do_something(array[i])
If you need the index in the array for some reason (usually you don't), then do this instead:
for i, element in enumerate(array):
    print("working with index", i)
    do_something(element)
This is just an error: you will get TypeError: 'int' object is not iterable when trying to unpack one integer into two names:
for i, j in range(len(array)):
    # What is i and j here?
This one might work, assuming the array is "two-dimensional":
for i, j in array:
    # What is i and j in this case?
An example of a two-dimensional array would be a list of pairs:
>>> for i, j in [(0, 1), ('a', 'b')]:
...     print('i:', i, 'j:', j)
...
i: 0 j: 1
i: a j: b
Note: ['these', 'structures'] are called lists in Python, not arrays.
Your third loop will not work: it will throw a TypeError because an int is not iterable. You are trying to "unpack" the int that is the array's index into i and j, which is not possible. An example of unpacking is like so:
tup = (1,2)
a,b = tup
where you assign a to be the first value in the tuple and b to be the second. This is also useful when you may have a function return a tuple of values and you want to unpack them immediately when calling the function. Like,
train_X, train_Y, validate_X, validate_Y = make_data(data)
The more common loop cases that I believe you are referring to are iterating over an array's items together with their indices:
for i, e in enumerate(array):
    ...
and
for k, v in d.items():
    ...
when iterating over the items in a dictionary. Furthermore, if you have two lists, l1 and l2, you can iterate over the contents of both like so:
for e1, e2 in zip(l1, l2):
    ...
Note that this will truncate the longer list in the case of unequal lengths while iterating. Or say that you have a list of lists where the outer list is of length m and the inner lists are of length n, and you would rather iterate over the elements of the inner lists grouped together by index. This is effectively iterating over the transpose of the matrix, and you can use zip to perform this operation as well.
for inner_joined in zip(*matrix):  # will run n times
    # len(inner_joined) == m
    ...
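To make the transpose concrete: with m outer lists of n elements each, zip(*matrix) yields n tuples, each of length m. A small sketch, using a made-up 2 x 3 matrix:

```python
# Hypothetical matrix: m = 2 outer lists, each holding n = 3 elements
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# Each tuple groups the elements that share the same inner index
columns = list(zip(*matrix))
print(columns)  # [(1, 4), (2, 5), (3, 6)]

for inner_joined in zip(*matrix):  # runs n = 3 times
    assert len(inner_joined) == len(matrix)  # each tuple has m = 2 elements
```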
Python's for loop is an iterator-based loop (that's why bruno desthuilliers says that it "works for all iterables (lists, tuples, sets, dicts, iterators, generators etc)"). A string is another common type of iterable.
Let's say you have a list of tuples. Using that nomenclature you shared, one can iterate through both the keys and values simultaneously. For instance:
tuple_list = [(1, "Countries, Cities and Villages"),(2,"Animals"),(3, "Objects")]
for k, v in tuple_list:
    print(k, v)
will give you the output:
1 Countries, Cities and Villages
2 Animals
3 Objects
If you use a dictionary, you'll be able to do this too. The difference here is the need for .items():
dictionary = {1: "Countries, Cities and Villages", 2: "Animals", 3: "Objects"}
for k, v in dictionary.items():
    print(k, v)
The difference between dictionary and dictionary.items() is the following
dictionary: {1: 'Countries, Cities and Villages', 2: 'Animals', 3: 'Objects'}
dictionary.items(): dict_items([(1, 'Countries, Cities and Villages'), (2, 'Animals'), (3, 'Objects')])
Using dictionary.items() we get a view object that yields the dictionary's key-value pairs as tuples. In other words, iterating over dictionary.items() is like iterating over a list of tuples. If you don't use it, you'll get
TypeError: cannot unpack non-iterable int object
If you want to get the same output using a simple list, you'll have to use something like enumerate():
simple_list = ["Countries, Cities and Villages", "Animals", "Objects"]
for k, v in enumerate(simple_list, 1):  # 1 means that I want to start from 1 instead of 0
    print(k, v)
If you don't, you'll get
ValueError: too many values to unpack (expected 2)
So, this naturally raises the question: do I always need a list of tuples? No. enumerate() returns an enumerate object, which yields (index, item) tuples as you iterate over it.
Actually, "the simplest way of using a for loop and iterating over an array" (the Python type is named "list" BTW) is the second one, i.e.
for item in somelist:
    do_something_with(item)
which FWIW works for all iterables (lists, tuples, sets, dicts, iterators, generators etc).
The range-based C-style version is considered highly unpythonic, and will only work with lists or list-like iterables.
What I would like to know is what this does
for i, j in range(len(array)):
    # What is i and j here?
Well, you could just test it by yourself... But the result is obvious: it will raise a TypeError because unpacking only works on iterables and ints are not iterable.
or
for i, j in array:
    # What is i and j in this case?
Depends on what array is and what it yields when iterating over it. If it's a list of 2-tuples or an iterator yielding 2-tuples, i and j will be the elements of the current iteration item, i.e.:
array = [(letter, ord(letter)) for letter in "abcdef"]
for letter, letter_ord in array:
    print("{} : {}".format(letter, letter_ord))
Else, it will most probably raise a TypeError too.
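Which exception you actually see depends on what each item is: a non-iterable item gives a TypeError, while an iterable item with the wrong number of elements gives a ValueError. A quick sketch to check this yourself; the try_unpack helper is hypothetical and only exists for this demonstration.

```python
def try_unpack(iterable):
    """Return the name of the exception raised by two-name unpacking, or None."""
    try:
        for i, j in iterable:
            pass
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


print(try_unpack(range(3)))          # TypeError: an int cannot be unpacked
print(try_unpack([(1, 2, 3)]))       # ValueError: three values for two names
print(try_unpack(["ab", "cd"]))      # None: a 2-character string unpacks fine
print(try_unpack([(1, 2), (3, 4)]))  # None
```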
Note that if you want to have both the item and index, the solution is the builtin enumerate(sequence), which yields an (index, item) tuple for each item:
array = list("abcdef")
for index, letter in enumerate(array):
    print("{} : {}".format(index, letter))
I understood your question; here is an explanation using different examples.
1. Multiple Assignments using -> Dictionary
dict1 = {1: "Bitcoin", 2: "Ethereum"}
for key, value in dict1.items():
    print(f"Key {key} has value {value}")
print(dict1.items())
Output:
Key 1 has value Bitcoin
Key 2 has value Ethereum
dict_items([(1, 'Bitcoin'), (2, 'Ethereum')])
Explaining dict1.items():
dict1.items() creates values dict_items([(1, 'Bitcoin'), (2, 'Ethereum')])
It comes in pairs (key, value) for each iteration.
2. Multiple Assignments using -> enumerate() Function
coins = ["Bitcoin", "Ethereum", "Cardano"]
prices = [48000, 2585, 2]
for i, coin in enumerate(coins):
    price = prices[i]
    print(f"${price} for 1 {coin}")
Output:
$48000 for 1 Bitcoin
$2585 for 1 Ethereum
$2 for 1 Cardano
Explaining enumerate(coins):
enumerate(coins) creates values ((0, 'Bitcoin'), (1, 'Ethereum'), (2, 'Cardano'))
It comes in pairs (index, value) for each (one) iteration
3. Multiple Assignments using -> zip() Function
coins = ["Bitcoin", "Ethereum", "Cardano"]
prices = [48000, 2585, 2]
for coin, price in zip(coins, prices):
    print(f"${price} for 1 {coin}")
Output:
$48000 for 1 Bitcoin
$2585 for 1 Ethereum
$2 for 1 Cardano
Explaining
zip(coins, prices) creates values (('Bitcoin', 48000), ('Ethereum', 2585), ('Cardano', 2))
It comes in pairs (value-list1, value-list2) for each (one) iteration.
I just wanted to add that, even in Python, you can get the effect of a for-in loop using a classic C-style loop by just using a local variable:
l = len(mylist)  # I often need to use this more than once anyway
for n in range(l):
    i = mylist[n]
    print("iterator:", n, " item:", i)

Parse a list, check if it has elements from another list and print out these elements

I have a list populated from entries of a log; for sake of simplicity, something like
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"......]
This list can have an arbitrary number of entries, which may or may not be in sequence, since I run multiple operations asynchronously.
Then I have another list, which I use as reference to get only the list of entries; which may be like
list_template = ["entry1", "entry2", "entry3"]
I am trying to use the second list to get sequences of entries, so I can isolate a single sequence, taking only the first instance found of each entry.
Since I am not dealing with numbers, I can't use set, so I tried a loop inside a loop, comparing the values in each list.
This does not work, because another entry may appear before the one I am looking for (say I want entry1, entry2, entry3, and the loop finds entry1 but then finds entry3; since I compare every element of each list, it will be happy to find any matching element):
for item in listlog:
    entry, value = item.split(":")
    for reference_entry in list_template:
        if entry == reference_entry:
            print(item)
            break
I have to, in a nutshell, find a sequence as in the template list, while these items are not necessarily in order. I am trying to parse the list once, otherwise I could do a very expensive multi-pass for each element of the template list, until I find the first occurrence and bail out. I thought that doing the loop in the loop is more efficient, since my reference list is always smaller than the log list, which is usually few elements.
How would you approach this problem, in the most efficient and pythonic way? All that I can think of, is multiple passes on the log list
You can use a dict:
>>> listlog
['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
>>> list_template
['entry1', 'entry2', 'entry3']
>>> my_dict = {}
>>> for x in listlog:
...     key, value = x.split(":")
...     if key not in my_dict and key in list_template:
...         my_dict[key] = value
...
>>> my_dict
{'entry2': 'abbds', 'entry3': 'orieqor', 'entry1': 'abcde'}
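Since the entry names are strings, and strings are hashable, the template can also be turned into a set for fast membership tests, and the single pass can stop as soon as every template entry has been seen. A minimal sketch along those lines; the names wanted and first_seen are invented here, and splitting on the first colon only is an assumption about the log format.

```python
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo",
           "entry3:orieqor", "entry2:iroewiow"]
list_template = ["entry1", "entry2", "entry3"]

wanted = set(list_template)  # set membership tests are O(1) on average
first_seen = {}

for item in listlog:
    # Split on the first colon only, in case a value contains one (assumption)
    key, value = item.split(":", 1)
    if key in wanted and key not in first_seen:
        first_seen[key] = value
        if len(first_seen) == len(wanted):
            break  # every template entry found; skip the rest of the log

# Report in template order, skipping entries that never appeared
for key in list_template:
    if key in first_seen:
        print(key, first_seen[key])
```

This still reads the log only once, and on a long log it can finish well before the end.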
Disclaimer : This answer could use someone's insight on performance. Sure, list/dict comprehensions and zip are pythonic but the following may very well be a poor use of those tools.
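You could use zip, filtering keys and values together so that each key stays paired with its own value:
>>> data = ["a:12", "b:32", "c:54"]
>>> ref = ['c', 'b']
>>> keys, vals = zip(*[item.split(':') for item in data])
>>> matches = [(key, val) for key, val in zip(keys, vals) if key in ref]
>>> for k, v in matches:
...     print("{}:{}".format(k, v))
...
b:32
c:54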
Here's another (worse? I'm not sure, performance-wise) way to get around this:
>>> data = ["a:12", "b:32", "c:54"]
>>> ref = ['c', 'b']
>>> data_dict = {x: y for x, y in [item.split(':') for item in data]}
>>> ["{}:{}".format(key, val) for key, val in data_dict.items() if key in ref]
['b:32', 'c:54']
Explanation :
Convert your initial list into a dict using a dict comprehension
For each pair of (key, val) found in the dict, join both in a string if the key is found in the 'ref' list
You can use a list comprehension something like this:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
print([item for item in listlog if re.search('entry', item)])
# ['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
Then you can split them as you wish and create a dictionary if you want:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
mylist = [item for item in listlog if re.search('entry', item)]
def create_dict(string, dict_splitter=':'):
    _dict = {}
    temp = string.split(dict_splitter)
    key = temp[0]
    value = temp[1]
    _dict[key] = value
    return _dict
mydictionary = {}
for x in mylist:
    x = str(x)
    mydictionary.update(create_dict(x))
for k, v in mydictionary.items():
    print(k, v)
# entry1 eorieo
# entry2 iroewiow
# entry3 orieqor
As you can see, this method needs a tweak, because it overwrites the dictionary value whenever a key repeats, so the last occurrence wins. That's bad here; it would be better to keep the first value stored for each key. But that's much easier than you might think.
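One way to keep the first value instead of the last, offered as a sketch rather than as what the answer's author had in mind, is dict.setdefault, which only stores a value when the key is not already present. The name first_values is made up for this example.

```python
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo",
           "entry3:orieqor", "entry2:iroewiow"]

first_values = {}
for item in listlog:
    key, value = item.split(":", 1)
    # setdefault leaves an existing entry untouched, so the first value wins
    first_values.setdefault(key, value)

for k, v in first_values.items():
    print(k, v)
# entry1 abcde
# entry2 abbds
# entry3 orieqor
```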

Creating a list by iterating over a dictionary

I defined a dictionary like this (list is a list of integers):
my_dictionary = {'list_name' : list, 'another_list_name': another_list}
Now, I want to create a new list by iterating over this dictionary. In the end, I want it to look like this:
my_list = [list_name_list_item1, list_name_list_item2,
list_name_list_item3, another_list_name_another_list_item1]
And so on.
So my question is: How can I realize this?
I tried
for key in my_dictionary.keys():
    k = my_dictionary[key]
    for value in my_dictionary.values():
        v = my_dictionary[value]
        v = str(v)
        my_list.append(k + '_' + v)
But instead of the desired output I receive a TypeError (unhashable type: 'list') in line 4 of this example.
You're trying to get a dictionary item by its value whereas you already have your value.
Do it in one line using a list comprehension:
my_dictionary = {'list_name' : [1,4,5], 'another_list_name': [6,7,8]}
my_list = [k+"_"+str(v) for k,lv in my_dictionary.items() for v in lv]
print(my_list)
result:
['another_list_name_6', 'another_list_name_7', 'another_list_name_8', 'list_name_1', 'list_name_4', 'list_name_5']
Note that before Python 3.7 the order of a dictionary was not guaranteed, so the order of the list wasn't either (the output above comes from such a version). You could fix the order by sorting the items according to keys:
my_list = [k+"_"+str(v) for k,lv in sorted(my_dictionary.items()) for v in lv]
Try this:
my_list = []
for key in my_dictionary:
    for item in my_dictionary[key]:
        my_list.append(str(key) + '_' + str(item))
Hope this helps.
Your immediate problem is that my_dictionary.values() iterates over the values from the dictionary, not the keys, so when you attempt to do a lookup on line 4, it fails (in this case) because the values in the dictionary are lists, which are unhashable and can't be used as keys. In another case, say {1:2, 3:4}, it would fail with a KeyError, and {1:2, 2:1} would not raise an error, but would likely give confusing behaviour.
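Those three cases can be reproduced in a few lines. The lookup_by_value helper below is hypothetical; it just mimics the failing inner loop by using each value as if it were a key.

```python
def lookup_by_value(d):
    """Mimic the failing inner loop: use each value as if it were a key."""
    return [d[value] for value in d.values()]


samples = (
    {'a': [1, 2]},   # list value: unhashable, so TypeError
    {1: 2, 3: 4},    # value 2 is not a key, so KeyError
    {1: 2, 2: 1},    # values happen to be keys: no error, confusing result
)

for sample in samples:
    try:
        print(lookup_by_value(sample))
    except (TypeError, KeyError) as exc:
        print(type(exc).__name__, exc)
```

The last sample prints [1, 2], which looks plausible but is just the values looked up through each other.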
As for your actual question, lists do not attribute any names to data, like dictionaries do; they simply store the index.
def f():
    a = 1
    b = 2
    c = 3
    l = [a, b, c]
    return l
Calling f() will return [1, 2, 3], with any concept of a, b, and c being lost entirely.
If you want to simply concatenate the lists in your dictionary, making a copy of the first, then calling .extend() on it will suffice:
my_list = my_dictionary['list_name'][:]
my_list.extend(my_dictionary['another_list_name'])
If you're looking to keep the order of the lists' items, while still referring to them by name, look into the OrderedDict class in collections.
You've written an outer loop over keys, then an inner loop over values, and tried to use each value as a key, which is where the program failed. Simply use the dictionary's items method to iterate over key,value pairs instead:
["{}_{}".format(k,v) for k,v in d.items()]
Oops, failed to parse the format desired; we were to produce each item in the inner list. Not to worry...
import itertools
d = {1: [1, 2, 3], 2: [4, 5, 6]}
list(itertools.chain(*(
    ["{}_{}".format(k, i) for i in l]
    for (k, l) in d.items())))
This is a little more complex. We again take key,value pairs from the dictionary, then make an inner loop over the list that was the value and format those into strings. This produces inner sequences, so we flatten it using chain and *, and finally save the result as one list.
Edit: Turns out Python 3.4.3 gets quite confused when doing this nested as generator expressions; I had to turn the inner one into a list, or it would replace some combination of k and l before doing the formatting.
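A likely explanation for that confusion, offered here as an assumption rather than something confirmed in the thread, is late binding: chain(*...) consumes the whole outer generator before any inner generator runs, so by then k holds its final value. chain.from_iterable pulls the inner generators one at a time and avoids this. A small sketch:

```python
from itertools import chain

d = {1: [1, 2, 3], 2: [4, 5, 6]}

# The * unpacking exhausts the outer generator first, so every lazy inner
# generator looks up k only after it has reached its last value.
eager_outer = list(chain(*(
    ("{}_{}".format(k, i) for i in l)
    for k, l in d.items())))
print(eager_outer)  # ['2_1', '2_2', '2_3', '2_4', '2_5', '2_6']

# chain.from_iterable drains each inner generator before advancing the outer one.
lazy_outer = list(chain.from_iterable(
    ("{}_{}".format(k, i) for i in l)
    for k, l in d.items()))
print(lazy_outer)  # ['1_1', '1_2', '1_3', '2_4', '2_5', '2_6']
```

Turning the inner generator into a list, as described above, works for the same reason: the formatting happens while k still has the right value.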
Edit again: As someone posted in a since deleted answer (which confuses me), I'm overcomplicating things. You can do the flattened nesting in a chained comprehension:
["{}_{}".format(k,v) for k,l in d.items() for v in l]
That method was also posted by Jean-François Fabre.
Use list comprehensions like this
d = {"test1":[1,2,3,],"test2":[4,5,6],"test3":[7,8,9]}
new_list = [str(item[0])+'_'+str(v) for item in d.items() for v in item[1]]
Output:
new_list:
['test1_1',
'test1_2',
'test1_3',
'test3_7',
'test3_8',
'test3_9',
'test2_4',
'test2_5',
'test2_6']
Let's initialize our data
In [1]: l0 = [1, 2, 3, 4]
In [2]: l1 = [10, 20, 30, 40]
In [3]: d = {'name0': l0, 'name1': l1}
Note that in my example, unlike yours, the lists' contents are not strings... aren't lists heterogeneous containers?
That said, you cannot simply join the keys and the lists' items; you'd better cast these values to strings using the str(...) builtin.
Now comes the solution to your problem... I use a list comprehension with two loops: the outer loop comes first and runs over the items (i.e., key-value pairs) in the dictionary, and the inner loop comes second and runs over the items in the corresponding list.
In [4]: res = ['_'.join((str(k), str(i))) for k, l in d.items() for i in l]
In [5]: print(res)
['name0_1', 'name0_2', 'name0_3', 'name0_4', 'name1_10', 'name1_20', 'name1_30', 'name1_40']
In [6]:
In your case, using str(k)+'_'+str(i) would be fine as well, but the current idiom for joining strings with a fixed 'text' is the 'text'.join(...) method. Note that .join takes a SINGLE argument, an iterable, and hence in the list comprehension I used join((..., ...)) to collect the joinands in a single argument.

Dictionary with lists as values - find longest list

I have a dictionary where the values are lists. I need to find which key has the longest list as its value, after removing the duplicates. If I just find the longest list, this won't work, as there may be a lot of duplicates. I have tried several things, but nothing is remotely close to being correct.
d = # your dictionary of lists
max_key = max(d, key= lambda x: len(set(d[x])))
# here's the short version. I'll explain....
max(                    # the function that grabs the biggest value
    d,                  # this is the dictionary, it iterates through and grabs each key...
    key =               # this overrides the default behavior of max
        lambda x:       # defines a lambda to handle new behavior for max
            len(        # the length of...
                set(    # the set containing (sets have no duplicates)
                    d[x]  # the list defined by key `x`
                )
            )
)
max iterates over the dictionary's keys (that's what iterating over a dictionary yields, by the by: for x in d: print(x) will print each key in d), so it returns the key that gives the highest result when the function we passed as key= (that's what the lambda is) is applied to it. You could literally do ANYTHING here, that's the beauty of it. However, if you wanted the key AND the value, you might be able to do something like this....
d = # your dictionary
max_key, max_value = max(d.items(), key = lambda k,v: len(set(v)))
# THIS DOESN'T WORK, SEE MY NOTE AT BOTTOM
This differs because instead of passing d, which is a dictionary, we pass d.items(), which is a list of tuples built from d's keys and values. As example:
d = {"foo":"bar", "spam":['green','eggs','and','ham']}
print(d.items())
# [ ("foo", "bar"),
# ("spam", ["green","eggs","and","ham"])]
We're not looking at a dictionary anymore, but all the data is still there! It makes it easier to deal with using the unpack statement I used: max_key, max_value =. This works the same way as if you did WIDTH, HEIGHT = 1024, 768. max still works as usual, it iterates through the new list we built with d.items() and passes those values to its key function (the lambda k,v: len(set(v))). You'll also notice we don't have to do len(set(d[k])) but instead are operating directly on v, that's because d.items() has already created the d[k] value, and using lambda k,v is using that same unpack statement to assign the key to k and the value to v.
Magic! Magic that doesn't work, apparently. I didn't dig deep enough here, and lambdas cannot, in fact, unpack values on their own. Instead, do:
max_key, max_value = max(d.items(), key = lambda x: len(set(x[1])))
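As a quick check of that last form, here is a run on a small made-up dictionary. The sample data is invented, and the sketch assumes the list elements are hashable, since set() requires that.

```python
# Invented sample data: "a" is the longest list, but "b" has the most distinct values
d = {
    "a": [1, 1, 1, 1, 2],  # 2 distinct values
    "b": [1, 2, 3],        # 3 distinct values
    "c": [7, 7],           # 1 distinct value
}

# Each item is a (key, value) tuple, so pair[1] is the list
max_key, max_value = max(d.items(), key=lambda pair: len(set(pair[1])))
print(max_key, max_value)  # b [1, 2, 3]
```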
For less advanced users, this can be a solution:
longest = max(len(item) for item in your_dict.values())
result = [item for item in your_dict.values() if len(item) == longest]
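The same two-step idea can be adapted to what the question asks for, namely the keys (rather than the lists) whose values are longest after duplicates are removed, including ties. A sketch under those assumptions, with invented sample data and a helper name, unique_counts, that is not from the thread:

```python
# Invented sample data: "a" and "b" tie with two distinct values each
your_dict = {"a": [1, 1, 2], "b": [3, 4], "c": [5, 5, 5]}

# Count distinct elements per key once, then pick every key that hits the maximum
unique_counts = {key: len(set(values)) for key, values in your_dict.items()}
longest = max(unique_counts.values())
result = [key for key, count in unique_counts.items() if count == longest]
print(result)  # ['a', 'b']
```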

iterating quickly through list of tuples

I wonder whether there's a quicker and less time consuming way to iterate over a list of tuples, finding the right match. What I do is:
# this is a very long list.
my_list = [ (old1, new1), (old2, new2), (old3, new3), ... (oldN, newN)]
# go through entire list and look for match
for j in my_list:
    if j[0] == VALUE:
        PAIR_FOUND = True
        MATCHING_VALUE = j[1]
        break
This code can take quite some time to execute, depending on the number of items in the list. I'm sure there's a better way of doing this.
I think that you can use
for j, k in my_list:
    [ ... stuff ... ]
Assuming a bit more memory usage is not a problem and if the first item of your tuple is hashable, you can create a dict out of your list of tuples and then looking up the value is as simple as looking up a key from the dict. Something like:
dct = dict(tuples)
val = dct.get(key) # None if item not found else the corresponding value
EDIT: To create a reverse mapping, use something like:
revDct = dict((val, key) for (key, val) in tuples)
The question is dead but still knowing one more way doesn't hurt:
my_list = [ (old1, new1), (old2, new2), (old3, new3), ... (oldN, newN)]
for first, *args in my_list:
    if first == VALUE:
        PAIR_FOUND = True
        MATCHING_VALUE = args
        break
The code can be cleaned up, but if you are using a list to store your tuples, any such lookup will be O(N).
If lookup speed is important, you should use a dict to store your tuples. The key should be the 0th element of your tuples, since that's what you're searching on. You can easily create a dict from your list:
my_dict = dict(my_list)
Then, (VALUE, my_dict[VALUE]) will give you your matching tuple (assuming VALUE exists).
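To also cover the case where VALUE is missing, the PAIR_FOUND / MATCHING_VALUE logic from the question can be written with `in` and dict.get. A minimal sketch with placeholder strings standing in for the real tuples:

```python
# Placeholder data standing in for the real (old, new) tuples
my_list = [("old1", "new1"), ("old2", "new2"), ("old3", "new3")]

my_dict = dict(my_list)  # built once in O(N); each lookup is then O(1) on average

VALUE = "old2"
PAIR_FOUND = VALUE in my_dict
MATCHING_VALUE = my_dict.get(VALUE)  # None when VALUE is not a key
print(PAIR_FOUND, MATCHING_VALUE)    # True new2

# If first elements can repeat, dict(my_list) keeps the LAST pair for a key,
# whereas the original loop stops at the FIRST; reversing the input restores that.
first_wins = dict(reversed(my_list))
```

The dictionary only pays off if it is built once and reused for many lookups; for a single lookup the plain loop does the same amount of work.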
I wonder whether the below method is what you want.
You can use defaultdict.
>>> from collections import defaultdict
>>> s = [('red',1), ('blue',2), ('red',3), ('blue',4), ('red',1), ('blue',4)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4, 4]), ('red', [1, 3, 1])]
