I have a dictionary variable with several thousand items. While writing and debugging code, I want to temporarily reduce its size so it is easier to work with (e.g. to check its contents by printing). I don't really care which items get removed. I tried to keep only the first 10 keys with this code:
i = 0
for x in dict1:
    if i >= 10:
        dict1.pop(x)
    i += 1
but I get the error:
RuntimeError: dictionary changed size during iteration
What is the best way to do it?
You could just rewrite the dictionary selecting a slice from its items.
dict(list(dict1.items())[:10])
Pick the keys to delete at random first, then iterate over that list and remove them.
import random
# Sample every key except 10, so that 10 items remain (assumes len(dict1) >= 10).
keys = random.sample(list(dict1), k=len(dict1) - 10)
for k in keys:
    dict1.pop(k)
You can convert the dictionary into a list of items, split, and convert back to a dictionary like this:
splitPosition = 10
subDict = dict(list(dict1.items())[:splitPosition])
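If building a full list of items feels wasteful for a dictionary with thousands of entries, itertools.islice can take the first few items lazily instead. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order) and a made-up dict1 for illustration:

```python
from itertools import islice

# Hypothetical stand-in for the large dictionary in the question.
dict1 = {f"key{n}": n for n in range(5000)}

# Take the first 10 (key, value) pairs without copying all items into a list.
small = dict(islice(dict1.items(), 10))

print(len(small))       # 10
print(list(small)[:3])  # ['key0', 'key1', 'key2']
```

The original dict1 is left untouched, so the full data is still available once debugging is done.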
I have a list of ordered dictionaries. These ordered dictionaries have different sizes, and several can have the same size (for example, 10 dictionaries can have length 30 and 20 dictionaries can have length 32). I want to find the maximum number of items any dictionary in the list has. I have tried this, which gets me the correct maximum length:
maximum_len = max(len(dictionary_item) for dictionary_item in item_list)
But how can I find the dictionary fields for which the maximum_len is given? Say that the maximum_len is 30, I want to also have the dictionary with the 30 keys printed. It can be any dictionary with the size 30, not a specific one. I just need the keys of that dictionary.
Well, you can always use filter:
output_dics = list(filter(lambda x: len(x) == maximum_len, item_list))
Then you have all the dictionaries that satisfy the condition; pick a random one or the first one.
I don't know if this is the easiest or most elegant way to do it, but you could write a simple function that returns two values: the maximum length you already calculated, and the dictionary that has that length, which you can look up with the list's .index method.
I'm talking about something like this:
def get_max(list_of_dict):
    plot = []
    for dict_index, dictionary in enumerate(list_of_dict):
        plot.append(len(dictionary))
    return max(plot), list_of_dict[plot.index(max(plot))]
maximum_len, max_dict = get_max(test)
I tested it and it works for my case, although I only made myself a test list with 5 dicts of different lengths.
EDIT:
Renamed the variable "dict" to "dictionary" so that it does not shadow the built-in dict.
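Another option is to let max compare the dictionaries by length directly, which returns one dictionary of maximal size in a single pass. A small sketch; item_list here is made-up sample data, not the asker's:

```python
from collections import OrderedDict

# Made-up sample data standing in for the list from the question.
item_list = [
    OrderedDict(a=1, b=2),
    OrderedDict(a=1, b=2, c=3),
    OrderedDict(x=1),
]

# max with key=len returns the first dictionary that has the largest length.
longest = max(item_list, key=len)
maximum_len = len(longest)

print(maximum_len)           # 3
print(list(longest.keys()))  # ['a', 'b', 'c']
```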
I currently have a list which stores the URLs that I have read from a file. I then made a dictionary by mapping those URLs to a simple key (0,1,2,3, etc.).
Now I want to make sure that if the same URL shows up again, it doesn't get mapped to a different key. So I am trying to write a conditional statement to check for that. Basically, I want to check whether the item in the list (the URL) is the same as the value of any of the keys in the dictionary. If it is, I don't want to add it again since that would be redundant.
I'm not sure what to put inside the if conditional statement for this to work.
pairs = {} #my dictionary
for i in list1:
    if ( i == t for t in pairs ):
        i = i +1
    else:
        pairs[j] = i
        j = j + 1
Any help would be appreciated!
Thank you.
This might be what you're looking for. It adds the unique values to pairs, numbering them starting at zero. Duplicate values are ignored and do not affect the numbering:
pairs = {}
v = 0
for k in list1:
    if k not in pairs:
        pairs[k] = v
        v += 1
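To map items in a list to increasing integer keys there's this delightful idiom, which uses a collections.defaultdict with its own length as a default factory:
import collections
# Named "mapping" rather than "map" to avoid shadowing the built-in map().
mapping = collections.defaultdict()
items = 'aaaabdcdvsaafvggddd'
mapping.default_factory = mapping.__len__
for x in items:
    mapping[x]  # a missing key is inserted with the current length as its value
print(mapping)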
You can access all values in your dict (pairs) by a simple pairs.values(). In your if condition, you can just add another condition that checks if the new item already exists in the values of your dictionary.
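As a sketch of that check, keeping the question's number-to-URL layout (list1 below is made-up sample data). Note that `in pairs.values()` scans every stored value, so it gets slow for large inputs; the URL-to-number mapping in the earlier answer avoids that.

```python
# Made-up sample data standing in for the URLs read from the file.
list1 = ["http://a.example", "http://b.example", "http://a.example"]

pairs = {}  # maps an integer key to a URL
j = 0
for url in list1:
    # Only add the URL if it is not already stored as a value.
    if url not in pairs.values():
        pairs[j] = url
        j += 1

print(pairs)  # {0: 'http://a.example', 1: 'http://b.example'}
```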
I have a dictionary like one below (but with 10k key-value pairs):
test_dict={'2*foo*+':['5','10'],'3*bar*-':['15','20']}
Is there a way in Python to find an element for which key.split("*")[0]==2, key.split("*")[2]=="+" and val[1]<15 without looping through the dictionary? It's easy to do with a for loop, but in my case this is part of a bigger piece of code that is nested inside another for loop, so it will take very long to finish.
Thanks,
As asked, the answer is no. There is no way to test the keys and values of a dictionary without looking at each one in turn until you find a match.
However, if you build a more complex data structure (possibly consisting of a series of dicts) so that entries are also indexed by key.split("*")[0], then you would only have to loop over the elements that share that first part.
(It does sound like you are trying to build an in-memory database though - you might well be better off just using a proper database, and relying on the caching to keep most of it in memory.)
You can use filter with a lambda (note that this still visits every item):
test_dict={'2*foo*+':['5','10'],'3*bar*-':['15','20']}
possibilities = list(filter(lambda x: int(x[0].split("*")[0]) == 2 and x[0].split("*")[2] == "+" and int(x[1][1]) < 15, test_dict.items()))
You note that "this is a part of a bigger code which is nested into another for loop" so I suggest that you build an index of key parts before your outer loop. Your indexes will contain sets of keys matching individual conditions. Because they contain sets you can find fast intersections to find keys that satisfy your key condition.
from collections import defaultdict
key_index_num = defaultdict(set)
key_index_word = defaultdict(set)
key_index_sign = defaultdict(set)
for key in test_dict:
    num, word, sign = key.split('*')
    key_index_num[num].add(key)
    key_index_word[word].add(key)
    key_index_sign[sign].add(key)
Then it will be easy to find keys in your inner loop. Let's say you want to find all keys that have num == '2' and sign == '+'. Find the keys by doing:
keys = key_index_num['2'].intersection(key_index_sign['+'])
Note: I have built three indexes, but if a given value can only ever appear in one position of the key (so the number, word and sign parts never collide), you can build a single key index. The code would then look like this:
from collections import defaultdict
key_index = defaultdict(set)
for key in test_dict:
    for key_part in key.split('*'):
        key_index[key_part].add(key)
And keys search would look like:
keys = key_index['2'].intersection(key_index['+'])
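Putting the pieces together for the original query (first part 2, last part "+", second value below 15), a sketch might look like this. It uses the separate per-position indexes, so it does not depend on the key parts being distinct; the value check still has to be done for each matching key.

```python
from collections import defaultdict

test_dict = {'2*foo*+': ['5', '10'], '3*bar*-': ['15', '20']}

# Build the indexes once, before the outer loop.
key_index_num = defaultdict(set)
key_index_sign = defaultdict(set)
for key in test_dict:
    num, word, sign = key.split('*')
    key_index_num[num].add(key)
    key_index_sign[sign].add(key)

# Inside the loop: intersect the candidate sets, then filter on the value.
candidates = key_index_num['2'] & key_index_sign['+']
matches = [k for k in candidates if int(test_dict[k][1]) < 15]

print(matches)  # ['2*foo*+']
```

Only the keys in the intersection are examined, so the per-iteration cost depends on the number of candidates rather than on the size of the whole dictionary.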
I want to go through an x number of the most recently added entries of an ordered dictionary. So far, the only way I can think of is this:
listLastKeys = orderedDict.keys()[-x:]
for key in listLastKeys:
    #do stuff with orderedDict[key]
But it feels redundant and somewhat wasteful to make another list and go through the ordered dictionary with that list when the ordered dictionary should already know what order it is in. Is there an alternative way? Thanks!
Iterate over the dict in reverse and apply an itertools.islice:
from itertools import islice
for key in islice(reversed(your_ordered_dict), 5):
    pass  # do something with the last 5 keys
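For example, with a small made-up OrderedDict, taking the last x keys this way yields them newest first:

```python
from collections import OrderedDict
from itertools import islice

# Made-up data for illustration.
ordered = OrderedDict([("a", 1), ("b", 2), ("c", 3), ("d", 4)])
x = 2

# reversed() walks the keys from newest to oldest; islice stops after x of them.
last_keys = list(islice(reversed(ordered), x))

print(last_keys)  # ['d', 'c']
```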
Instead of slicing the key list like you are, you can loop through the dictionary in reverse order using reversed(). Example:
D = {0 : 'h', 1: 'i', 2:'j'}
x = 1
# reversed() on a plain dict's keys needs Python 3.8+
for key in reversed(D.keys()):
    if x == key:
        break
You could keep a list of the keys present in the dictionary last run.
I don't know the exact semantics of your program, but this is a function that will check for new keys.
keys = []  # this list should be global
def checkNewKeys(myDict):
    for key, item in myDict.items():
        if key not in keys:
            # do what you want with new keys
            keys.append(key)
This basically keeps track of what was in the dictionary over the whole run of your program, without needing to create a new list every time.
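Since dictionary keys are always hashable, a set makes the membership test cheaper than a list. A sketch of the same idea; seen_keys and check_new_keys are names chosen here for illustration:

```python
seen_keys = set()  # keys observed in earlier calls

def check_new_keys(my_dict):
    """Return the keys of my_dict that were not seen before, then remember them."""
    # dict.keys() supports set operations, so this is a set difference.
    new_keys = my_dict.keys() - seen_keys
    seen_keys.update(new_keys)
    return new_keys

data = {"a": 1, "b": 2}
print(sorted(check_new_keys(data)))  # ['a', 'b']
data["c"] = 3
print(sorted(check_new_keys(data)))  # ['c']
```

Note that a set does not preserve the order in which the new keys were added.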
I have a dictionary and a list. The list is made up of values. The dictionary has all of the values plus some more values.
I'm trying to count the number of times the values in the list show up in the dictionary per key/values pair.
It looks something like this:
for k in dict:
    count = 0
    for value in dict[k]:
        if value in list:
            count += 1
            list.remove(value)
    dict[k].append(count)
I have something like ~1 million entries in the list so searching through each time is ultra slow.
Is there some faster way to do what I'm trying to do?
Thanks,
Rohan
You're going to have all manner of trouble with this code, since you're both removing items from your list and using an index into it. Also, you're using list as a variable name, which gets you into interesting trouble as list is also a type.
You should be able to get a huge performance improvement (once you fix the other defects in your code) by using a set instead of a list. What you lose by using a set is the ordering of the items and the ability to have an item appear in the list more than once. (Also your items have to be hashable.) What you gain is O(1) lookup time.
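If duplicates in the list matter, collections.Counter keeps the O(1) average lookup while still tracking how many copies remain. A sketch under that assumption, with made-up data and names (values_list, groups, counts) that are not from the question:

```python
from collections import Counter

# Made-up sample data.
values_list = ["a", "b", "b", "c"]
groups = {"k1": ["a", "b", "x"], "k2": ["b", "c", "c"]}

remaining = Counter(values_list)  # value -> copies still available
counts = {}
for key, values in groups.items():
    count = 0
    for value in values:
        if remaining[value] > 0:   # O(1) lookup on average
            count += 1
            remaining[value] -= 1  # consume one copy, like list.remove(value)
    counts[key] = count

print(counts)  # {'k1': 2, 'k2': 2}
```

The counts are collected in a separate dictionary here, rather than appended to each value list, to keep the sketch easy to check.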
If you are searching in a list, convert the list to a set first; the lookups will be much faster:
listSet = set(list)
for k, values in dict.items():
    count = 0
    for value in values:
        if value in listSet:
            count += 1
            listSet.remove(value)
    dict[k].append(count)
# rebuild the original list without the removed elements
list = [elem for elem in list if elem in listSet]
for val in my_list:
    if val in my_dict:
        my_dict[val] = my_dict[val] + 1
    else:
        my_dict[val] = 0
What you still need:
Handle the case when val is not in the dict.
I changed the last line to append to the dictionary. It's a defaultdict(list). Hopefully that clears up some of the questions. Thanks again.