Searching a python list quickly? - python

I have a dictionary and a list. The list is made up of values. The dictionary has all of the values plus some more values.
I'm trying to count the number of times the values in the list show up in the dictionary per key/values pair.
It looks something like this:
for k in dict:
    count = 0
    for value in dict[k]:
        if value in list:
            count += 1
            list.remove(value)
    dict[k].append(count)
I have something like ~1 million entries in the list so searching through each time is ultra slow.
Is there some faster way to do what I'm trying to do?
Thanks,
Rohan

You're going to have all manner of trouble with this code, since you're both removing items from your list and using an index into it. Also, you're using list as a variable name, which gets you into interesting trouble as list is also a type.
You should be able to get a huge performance improvement (once you fix the other defects in your code) by using a set instead of a list. What you lose by using a set is the ordering of the items and the ability to have an item appear in the list more than once. (Also your items have to be hashable.) What you gain is O(1) lookup time.
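If duplicates in the list do matter, one possible alternative is a collections.Counter, which keeps the O(1) average lookup but also remembers how many times each value occurs. This is only a sketch under assumptions: count_matches, groups, and values are made-up names, and it assumes each list entry should be matched at most once, as the list.remove() call in the question suggests.

```python
from collections import Counter


def count_matches(groups, values):
    """For each key in groups, count how many of its items appear in values.

    Each occurrence in values is consumed at most once, mirroring the
    list.remove() call in the question.
    """
    remaining = Counter(values)  # value -> occurrences not yet matched
    counts = {}
    for key, items in groups.items():
        matched = 0
        for item in items:
            # Counter returns 0 for missing keys, so no KeyError here.
            if remaining[item] > 0:
                remaining[item] -= 1
                matched += 1
        counts[key] = matched
    return counts


# Hypothetical sample data, not from the question.
groups = {"a": [1, 2, 2, 9], "b": [2, 3]}
values = [2, 2, 3, 7]
result = count_matches(groups, values)
print(result)
```

Returning the counts in a separate dict avoids appending to the lists that are being iterated over; whether that fits depends on how the counts are used afterwards.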

If you search the list repeatedly, convert it to a set first; the lookups will be much faster:
listSet = set(list)
for k, values in dict.items():  # use dict.iteritems() on Python 2
    count = 0
    for value in values:
        if value in listSet:
            count += 1
            listSet.remove(value)
    dict[k].append(count)
list = [elem for elem in list if elem in listSet]
# return the original list without removed elements

for val in my_list:
    if val in my_dict:
        my_dict[val] = my_dict[val] + 1
    else:
        my_dict[val] = 0
What you still need: handle the case when val is not in the dict.

I changed the last line to append to the dictionary. It's a defaultdict(list). Hopefully that clears up some of the questions. Thanks again.

Related

How to remove random excess keys and values from dictionary in Python

I have a dictionary variable with several thousands of items. For the purpose of writing code and debugging, I want to temporarily reduce its size to more easily work with it (i.e. check contents by printing). I don't really care which items get removed for this purpose. I tried to keep only 10 first keys with this code:
i = 0
for x in dict1:
    if i >= 10:
        dict1.pop(x)
    i += 1
but I get the error:
RuntimeError: dictionary changed size during iteration
What is the best way to do it?
You could just rewrite the dictionary selecting a slice from its items.
dict(list(dict1.items())[:10])
Select some random keys to delete first, then iterate over that list and remove them.
import random
keys = random.sample(list(dict1.keys()), k=10)
for k in keys:
    dict1.pop(k)
You can convert the dictionary into a list of items, split, and convert back to a dictionary like this:
splitPosition = 10
subDict = dict(list(dict1.items())[:splitPosition])
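A variation on the slicing answers, in case the dictionary is large: itertools.islice can take the first few items without first copying every item into a list. This is just a sketch; the contents of dict1 below are hypothetical stand-in data.

```python
from itertools import islice

# Hypothetical stand-in for the real dictionary with thousands of items.
dict1 = {n: n * n for n in range(1000)}

# Lazily take the first 10 (key, value) pairs and build a new dict from them.
# dict1 itself is left untouched.
small = dict(islice(dict1.items(), 10))
print(small)
```

Since nothing is removed while iterating, this avoids the "dictionary changed size during iteration" error entirely.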

Fastest way to find and add Integers in List in Python

I have a lot of large integers and I want to reduce the cost of processing them by adding them to a list and referring to each integer by its index in that list.
For example I add the Integer 656853 to a new list. Its Index is 0 now, since it's the first entry. For the next integer I check the list to see if the value already exists. If not, I add the integer to the list. This should be done with multiple integers.
What is the fastest way to find and add integers to a List?
Is it a good idea to sort them, so they are quicker to find?
Use a set(). This will make sure you will have only one entry for given int.
int_set = set()
int_set.add(7)
int_set.add(7)
# there is only one 7 in the int set
unique_list = list(set(huge_list_of_integers))
If you need indexed reference:
for i,num in enumerate(unique_list):
    print(i, num)
Or map to a dict of index:value:
unique_dict = dict(enumerate(set(huge_list_of_integers)))
for i,v in unique_dict.items():
    print(i, v)
Dict by value:index:
unique_dict = {v:i for i,v in enumerate(set(huge_list_of_integers))}
for v,i in unique_dict.items():
    print(v,i)
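If the integers arrive one at a time and each should keep the index it got when first seen (as in the 656853 example in the question), a dict and a list can be kept side by side. This is a sketch under that assumption; build_index, index_of, and values are made-up names.

```python
def build_index(numbers):
    """Assign each distinct integer the index at which it was first seen."""
    index_of = {}  # integer -> index (O(1) average lookup)
    values = []    # index -> integer
    for n in numbers:
        if n not in index_of:
            index_of[n] = len(values)
            values.append(n)
    return index_of, values


# Hypothetical input with repeats.
index_of, values = build_index([656853, 12, 656853, 99, 12])
print(index_of)
print(values)
```

With the dict doing the lookups there should be no need to sort the list just to find entries faster.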

How do I compare the values mapped to keys in a dictionary with elements in list to avoid redundancy?

I currently have a list which stores the URLs that I have read from a file. I then made a dictionary by mapping those URLs to a simple key (0,1,2,3, etc.).
Now I want to make sure that if the same URL shows up again that it doesn't get mapped to a different key. So I am trying to make a conditional statement to check for that. Basically I want to check if the item in the list( the URL) is the same as the value of any of the keys in the dictionary. If it is I don't want to add it again since that would be redundant.
I'm not sure what to put inside the if conditional statement for this to work.
pairs = {} #my dictionary
for i in list1:
    if ( i == t for t in pairs ):
        i = i +1
    else:
        pairs[j] = i
        j = j + 1
Any help would be appreciated!
Thank you.
This might be what you're looking for. It adds the unique values to pairs, numbering them starting at zero. Duplicate values are ignored and do not affect the numbering:
pairs = {}
v = 0
for k in list1:
    if k not in pairs:
        pairs[k] = v
        v += 1
To map items in a list to increasing integer keys there's this delightful idiom, which uses a collections.defaultdict with its own length as a default factory:
import collections
map = collections.defaultdict()
items = 'aaaabdcdvsaafvggddd'
map.default_factory = map.__len__
for x in items:
    map[x]
print(map)
You can access all values in your dict (pairs) by a simple pairs.values(). In your if condition, you can just add another condition that checks if the new item already exists in the values of your dictionary.
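A minimal sketch of that pairs.values() check, keeping the number -> URL layout from the question. The URLs in list1 are made up for illustration.

```python
# Hypothetical URLs; in the question these are read from a file.
list1 = ["a.com", "b.com", "a.com", "c.com"]

pairs = {}
j = 0
for url in list1:
    # Linear scan over the values already stored in the dict.
    if url not in pairs.values():
        pairs[j] = url
        j += 1

print(pairs)
```

Note that each `in pairs.values()` test scans every stored value, so for a long list the URL -> number layout from the other answers is likely to be faster.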

Iterating through a dictionary that starts at the first key value each iteration

I have a dictionary that contains usernames in the key, and dates in the value. Because I cannot sort the dictionary alphabetically, I want to iterate through the dictionary keys (username) to match an item in a separate alphabetical ordered list that holds all the usernames and then print the associated value (date) in a spreadsheet. The problem I am running into is that when it is iterating through the dictionary, it doesn't start at the beginning dictionary key each time and the next item to match in the list has already been iterated through in the dictionary. Is there a specific iteration tool that will start at the beginning each time?
Code I have so far:
x=0
nd=0
for k, v in namedatedict.items():
    if k == usernamelist[x]:
        sh1.write(nd, col3_name, v)
        x += 1
        nd += 1
To give you some background on what I am doing, I am trying to find the date that matches the specific username and print it next to the associated username in a spreadsheet. There may be a completely different way to iterate through these value but this is the only way I could think of. Any help/guidance would be appreciated.
Based on your answer to my comment, it seems that really the only reason you have the alphabetical list is to get sort-order. In that case, this problem becomes much simpler; Python supports sorting lists of tuples by one of their elements via the built-in sorted() function. And dictionaries can provide their key-value tuples via the items() method. So, this just becomes:
for name, date in sorted(namedatedict.items(), key=lambda entry: entry[0]):
    # do whatever you need with the name and date. Names will be alphabetical.
    pass
What about something like this:
for k in usernamelist:
    sh1.write(nd, col3_name, namedatedict[k])
    x += 1
    nd += 1
This will iterate through the list of keys ("username") -- and get each value from the dictionary based on that key.
If usernamelist is not a subset of the keys of namedatedict, you should use get to fetch the value and be prepared to handle the case of a non-existent key:
for k in usernamelist:
    value = namedatedict.get(k)
    if value is not None:
        sh1.write(nd, col3_name, value)
        x += 1
        nd += 1
You could also handle that with an exception:
for k in usernamelist:
    try:
        sh1.write(nd, col3_name, namedatedict[k])
        x += 1
        nd += 1
    except KeyError:
        pass
Finally, if you don't need to maintain a separate list to get keys in order, you could use sorted on the keys:
for k in sorted(namedatedict.keys()):
    sh1.write(nd, col3_name, namedatedict[k])
    x += 1
    nd += 1
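To try these lookups without a spreadsheet object, here is a small self-contained sketch that collects (name, date) rows in a plain list instead of calling sh1.write. The names and dates are invented sample data.

```python
# Invented sample data standing in for the real dictionary and list.
namedatedict = {"carol": "2014-03-02", "alice": "2014-01-15", "bob": "2014-02-20"}
usernamelist = ["alice", "bob", "carol", "dave"]  # "dave" has no date

# Look each username up directly; skip names missing from the dict.
rows = [(name, namedatedict[name]) for name in usernamelist if name in namedatedict]

# Without the separate list: sort the dictionary items by key instead.
rows_sorted = sorted(namedatedict.items())

print(rows)
print(rows_sorted)
```

Either way the dictionary is accessed by key, so there is no need to restart an iteration over it for each username.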

Dedupe python list based on multiple criteria

I have a list:
mylist = [('Item A','CA','10'),('Item B','CT','12'),('Item A','CA','14'),('Item A','NH','10')]
I would like to remove duplicates based on column 1 and 2. So my desired output would be:
[('Item A','CA','10'),('Item B','CT','12'),('Item A','NH','10')]
I'm not really sure how to go about this, so I haven't posted any code, but am just looking for some help :)
Use a dict. The other answer is good. For variety, here's a single expression that will give you the uniq'd list (though the order of elements is not preserved).
{ tuple(item[0:2]):item for item in mylist[::-1] }.values()
This creates a dict from the elements of mylist using elements 0 and 1 as the key (implicitly removing duplicates). Because mylist is iterated in reverse order and later assignments overwrite earlier ones, the element that remains for each duplicate key is the one that appears first in the original mylist.
Dict keys can be of any hashable type. Create a dict with the first two columns of each item as the key, and only add to unique if those columns haven't been seen before.
unique = {}
for item in mylist:
    if item[0:2] not in unique:
        unique[item[0:2]] = item
print(list(unique.values()))  # on Python 2: print unique.values()
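If the result has to be a list in the original order, one possible variant keeps a set of keys already seen. This is a sketch, not the only way to do it; dedupe_by_columns and its columns parameter are made-up names.

```python
def dedupe_by_columns(rows, columns=(0, 1)):
    """Keep the first row for each distinct combination of the given columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


mylist = [('Item A', 'CA', '10'), ('Item B', 'CT', '12'),
          ('Item A', 'CA', '14'), ('Item A', 'NH', '10')]
deduped = dedupe_by_columns(mylist)
print(deduped)
```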
