How to update a large dictionary using a list quickly? - python

I am looking for a fast way to update the values in a (ordered) dictionary, which contains tens of millions of values, where the updated values are stored in a list/array.
The program I am writing takes the list of keys from the original dictionary (which are numerical tuples) as a numpy array, and passes them through a function which returns an array of new numbers (one for each key). This array is then multiplied with the corresponding dictionary values (through element-wise array multiplication), and it is this returned 1-D array of values that we wish to use to update the dictionary. The entries in the new array are stored in the order of the corresponding keys, so I could use a loop to go through the dictionary and update the values one by one. But this is too inefficient. Is there a faster way to update the values in this dictionary which doesn't use loops?
An example of a similar problem would be if the keys in a dictionary represent the x and y-coordinates of points in space, and the values represent the forces being applied at that point. If we want to calculate the torque experienced at each point from the origin, we would first need a function like:
def euclid(xy):
    return (xy[0]**2 + xy[1]**2)**0.5
Which, if xy represents the x, y-tuple, would return the Euclidean distance from the origin. We could then multiply this by the corresponding dictionary value to return the torque, like so:
for xy in dict.keys():
    dict[xy] = euclid(xy)*dict[xy]
But this loop is slow, and we could take advantage of array algebra to get the new values in one operation:
new_dict_values = euclid(np.array(list(dict.keys())).T)*np.array(list(dict.values()))
And it is here that we wish to find a fast method to update the dictionary, instead of utilising:
i = 0
for key in dict.keys():
    dict[key] = new_dict_values[i]
    i += 1

That last piece of code isn't just slow. I don't think it does what you want it to do:
for key in dict.keys():
    for i in range(len(new_dict_values)):
        dict[key] = new_dict_values[i]
For every key in the dictionary, you iterate through the entire list new_dict_values and assign each element to that key, overwriting the value assigned in the previous iteration of the inner loop. This gives you a dictionary where every key has the value of the last element of new_dict_values, which I don't think is what you want.
If you are certain that the order of the keys in the dictionary is the same as the order of the values in new_dict_values, then you can do this:
for key, value in zip(dict.keys(), new_dict_values):
    dict[key] = value
Edit: Also, for future reference, in Python there is usually no need to iterate over a range of indices just to access the elements of a list. This:
for i in range(len(new_dict_values)):
    dict[key] = new_dict_values[i]
is equivalent to this:
for i in new_dict_values:
    dict[key] = i
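If the goal is to keep as much work as possible out of Python-level loop bodies, one option is to compute the new values with NumPy and hand the pairing to zip and dict.update. The following is a minimal sketch, not a benchmarked solution: the names forces, key_list and new_values and the sample data are hypothetical, and it assumes the new values are in the same order as the dictionary's keys.

```python
import numpy as np

# Hypothetical sample data: (x, y) keys mapped to force magnitudes.
forces = {(3.0, 4.0): 2.0, (6.0, 8.0): 0.5, (0.0, 1.0): 10.0}

# Snapshot the keys once so that the arrays and the update share one order.
key_list = list(forces)
keys = np.array(key_list)  # shape (N, 2): one row per key
values = np.fromiter(forces.values(), dtype=float, count=len(forces))

# Distance from the origin for every key at once, then scale the values.
new_values = np.hypot(keys[:, 0], keys[:, 1]) * values

# Pair each key with its new value; update() consumes the pairs directly.
forces.update(zip(key_list, new_values.tolist()))
print(forces)
```

Whether this is actually faster than the plain zip loop depends on the data, so it is worth timing both on a realistic sample.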

Related

Only the last result of a for-loop iteration is being saved in a dictionary

I am very new to Python and I am currently trying to store results from a for-loop in a dictionary. I am aiming to get "ratios" as keys and "frequency" as values. I want to iterate through a list of unique ratios and count how often each one occurs among the values of a dictionary called comparison_dict. I have done that part, and to do it I first created a list from the comparison_dict values (list_orig_ratios).
frequencies = dict()
sorted_unique_ratios = sorted(unique_ratios)
list_orig_ratios = list(comparison_dict.values())
for ratio in sorted_unique_ratios:
    freq = list_orig_ratios.count(ratio)
    frequencies = {ratio:freq}
    print(frequencies)
When I add the print command at the end of my for-loop I can see all pairs of ratios and their counts but each of them is a separate dictionary. I would like to have them all as a single dictionary, with ratios as keys and frequencies (counts) as values. If I run the print command outside of the loop I see that only the last key:value pair is saved there.
How can I store the results of this for-loop in a single dictionary?
You are assigning a new dict to the name frequencies in each iteration, rather than adding a new key-value pair to the existing dict.
for ratio in sorted_unique_ratios:
    freq = list_orig_ratios.count(ratio)
    frequencies[ratio] = freq
print(frequencies)
A simpler solution, though, is to use a dictionary comprehension:
frequencies = {ratio: list_orig_ratios.count(ratio) for ratio in sorted(unique_ratios)}
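Calling list.count inside a loop or comprehension rescans the whole list once per unique ratio. A minimal sketch of an alternative uses collections.Counter from the standard library, which tallies every value in a single pass. The sample comparison_dict below is hypothetical.

```python
from collections import Counter

# Hypothetical sample data standing in for the question's comparison_dict.
comparison_dict = {"a": 0.5, "b": 0.25, "c": 0.5, "d": 1.0, "e": 0.5}

# Count every ratio in a single pass over the values.
counts = Counter(comparison_dict.values())

# Build a plain dict with the ratios in sorted order, as in the question.
frequencies = {ratio: counts[ratio] for ratio in sorted(counts)}
print(frequencies)
```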

How to compare the values from a dictionary and list through iteration

I am trying to compare the value of each key in a dictionary to each value in a list. Unfortunately, I cannot figure out a way to do that: in the second for loop the variable "column" is a string key, and I want to step through my "row" list in parallel, one element per column. Lists can only be indexed by integers, and I don't think dictionaries can be indexed by position. How can I write this so that I iterate through both the dictionary and the list and compare them?
The size of the list is the same as the size of the dictionary, in terms of instances within them.
for row in data:
    for column in dna:
        if dna[column] != int(row['?????']):
            break
    else:
        print(row[0])
        break
else:
    print("No match")
The builtin function zip can be used to iterate in parallel over multiple iterables. You can achieve the desired result using it as follows:
for dict_key, list_item in zip(my_dict, my_list):
    # your code here
To get access to both the keys and the values, you should use:
for key, value in your_dictionary.items():
    # your code here
When you use your_dictionary.items(), you get an iterable view that looks like dict_items([(key1, value1), (key2, value2), ...])
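Combining the two suggestions, zip can pair each dictionary value with the list element in the same position, and all() reports whether every pair matches. This is a minimal sketch, assuming the dictionary and the list have the same length and corresponding order. The names dna_counts, rows and find_match, and the sample data, are hypothetical rather than the asker's actual code.

```python
# Hypothetical data: repeat counts keyed by sequence name, and candidate
# rows holding a name followed by counts stored as strings.
dna_counts = {"AGAT": 4, "AATG": 1, "TATC": 5}
rows = [
    ["Alice", "2", "8", "3"],
    ["Bob", "4", "1", "5"],
]

def find_match(counts, candidates):
    """Return the name of the first row whose counts all match, else None."""
    for row in candidates:
        # Pair each dictionary value with the row entry in the same position.
        if all(count == int(cell) for count, cell in zip(counts.values(), row[1:])):
            return row[0]
    return None

print(find_match(dna_counts, rows) or "No match")
```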

Selecting Key,Value pairs from Key values in another dictionary

I'm trying to select all of the key,values from one dictionary (imgs_dict) that have a key that appears in another dictionary (train_data_labels) and save the result into a third dictionary (training_images). One of the two original dictionaries has keys that are stored as numeric values (train_data_labels) and one is stored as a string (imgs_dict), so I first try to convert the key values of train_data_labels to a string and add leading zeros to match structure of imgs_dict, but this creates a list rather than a dictionary.
# Add leading zeros and convert key values to strings to allow referencing the image dictionary
training_ids=[str(k).zfill(5) for k in train_data_labels]
After this I am trying to pull the image ids appearing in training_ids from imgs_dict, this appears to work but creates a list rather than a dictionary.
# Pull the training images from the total image set and save in a training images dictionary
training_images=[imgs_dict[i] for i in training_ids if i in imgs_dict]
It seems like there should be a straightforward way to save a new dictionary with altered keys, then select certain items from another dictionary based on a list or dictionary, but I'm having trouble finding it.
You could use a dictionary comprehension instead of the second list comp.
training_images = {key: imgs_dict[key] for key in training_ids if key in imgs_dict}
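The key conversion and the selection can also be done in a single comprehension, so no intermediate list is needed. A minimal sketch with hypothetical sample data follows; the result keeps the zero-padded string keys used by imgs_dict.

```python
# Hypothetical sample data standing in for the two dictionaries.
train_data_labels = {1: "cat", 42: "dog", 999: "bird"}
imgs_dict = {"00001": "img_a", "00042": "img_b", "00007": "img_c"}

# Convert each numeric key to a zero-padded string, then keep only the
# entries that exist in imgs_dict.
training_images = {
    key: imgs_dict[key]
    for key in (str(k).zfill(5) for k in train_data_labels)
    if key in imgs_dict
}
print(training_images)
```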

Calculating distance between two points using dictionary in python

I am trying to calculate a distance between two locations, using their coordinates. However I don't know how I can access the coordinate values, since they are in a dictionary.
I am very new to coding, and didn't understand any of the code I found regarding this problem, since it's too advanced for me. I don't really know where to start. My main function creates the dictionary: (Edit)
def main():
    filename = input("Enter the filename:\n")
    file = open(filename, 'r')
    rows = file.readlines()
    d = {}
    list = []
    for x in rows:
        list.append(x)
    #print(list)
    for elem in list:
        row = elem.split(";")
        d[row[3]] = {row[0], row[1]} #these are the indexes that the name and latitude & longitude have in the file
{'Location1': {'40.155444793742276', '28.950292890004903'}, 'Location2': ... }
The dictionary is like this, so the key is the name and then the coordinates are the values. Here is the function, which contains barely anything so far:
def calculate_distance(dictionary, location1, location2):
    distance_x = dictionary[location1] - dictionary[location2]
    # Here I don't know how I can get the values from the dictionary,
    # since there are two values, longitude and latitude...
    distance_y = ...
    distance = ... # Here I will use the pythagorean theorem
    return distance
Basically I just need to know how to work with the dictionary, since I don't know how I can get the values out so I can use them to calculate the distance.
--> How to search a key from a dictionary and get the values to my use. Thank you for answering my stupid question. :)
Well, you are just starting out, so it's normal that this seems difficult.
So let's see: you have a function that builds a dictionary where the keys are locations and the values are coordinate pairs.
First, let's talk about the data types that you use.
location_map={'Location1': {'40.155444793742276', '28.950292890004903'}, 'Location2': ... }
I think there is an issue with your values: they appear to be sets of strings. This causes two main problems for your goal.
First, set objects do not support indexing, so you cannot use location_map['Location1'][0] to get the first coordinate; trying this raises a TypeError. Sets are also unordered, so you cannot rely on which element is the latitude and which is the longitude. Using tuples when creating your map would allow you to index. You can do this by defining the coordinates as tuple([longitude,latitude]), or simply (longitude, latitude), instead of {longitude,latitude}.
Second, your coordinates are strings. To perform arithmetic on them you need a numeric type such as integers or, in your case, floats. If you are reading the longitude and latitude values as strings, you can convert them with float(longitude) and float(latitude).
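Here is a minimal sketch of how the distance function could look once the coordinates are stored as tuples of floats. The sample values are hypothetical, and the calculation treats latitude and longitude as plain planar coordinates, as the question's Pythagorean approach does, so the result is in degrees rather than a real-world distance.

```python
import math

# Hypothetical data: name -> (latitude, longitude) as a tuple of floats.
location_map = {
    "Location1": (40.0, 28.0),
    "Location2": (43.0, 32.0),
}

def calculate_distance(dictionary, location1, location2):
    """Straight-line distance between two named points (planar assumption)."""
    lat1, lon1 = dictionary[location1]  # tuple unpacking
    lat2, lon2 = dictionary[location2]
    return math.hypot(lat1 - lat2, lon1 - lon2)

print(calculate_distance(location_map, "Location1", "Location2"))
```

When building the dictionary from the file, an assignment such as d[row[3]] = (float(row[0]), float(row[1])) would produce values of this shape.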
There are multiple ways to do it; a few are listed below:
# option 1
for i, v in data.items():  # get each key and value from the dict
    for k in v:  # get each element of the value (it's a set)
        print(k)
# option 2
for i, v in data.items():  # get each key and value from the dict
    value_data = list(v)  # convert the set to a list (sets are unordered, so the element order is not guaranteed)
    print(i, value_data[0], value_data[1])  # use the values from here
I would suggest going through the Python documentation to get more in-depth knowledge.

How to calculate Euclidian of dictionary with tuple as key

I have created a matrix by using a dictionary with a tuple as the key (e.g. {(user, place) : 1 } )
I need to calculate the Euclidian for each place in the matrix.
I've created a method to do this, but it is extremely inefficient because it iterates through the entire matrix for each place.
def calculateEuclidian(self, place):
    count = 0
    for key, value in self.matrix.items():
        if key[1] == place and value == 1:
            count += 1
    euclidian = math.sqrt(count)
    return euclidian
Is there a way to do this more efficiently?
I need the result to be in a dictionary with the place as a key, and the euclidian as the value.
You can use a dictionary comprehension (a comprehension or generator expression is often somewhat faster than an explicit for loop, although it still visits every item) and accumulate the result of the conditionals (0 or 1) before taking the square root:
def calculateEuclidian(self, place):
    return {place: math.sqrt(sum(p == place and val == 1 for (_, p), val in self.matrix.items()))}
With your current data structure, I doubt there is any way you can avoid iterating through the entire dictionary.
If you cannot use another way (or an auxiliary way) of representing your data, iterating through every element of the dict is as efficient as you can get (asymptotically), since there is no way to ask a dict with tuple keys to give you all elements with keys matching (_, place) (where _ denotes "any value"). There are other, and more succinct, ways of writing the iteration code, but you cannot escape the asymptotic efficiency limitation.
If this is your most common operation, and you can in fact use another way of representing your data, you can use a dict[Place, list[User]] instead. That way, you can, in O(1) time, get the list of all users at a certain place, and all you would need to do is count the items in the list using the len(...) function which is also O(1). Obviously, you'll still need to take the sqrt in the end.
There may be ways to make it more Pythonic, but I do not think you can change the overall complexity since you are making a query based off both key and value. I think you have to search the whole matrix for your instances.
You may want to build a second dictionary from your current one, which isn't suited to this kind of search: use the place as the key and a list of (user, value) tuples as the value.
Get the tuple list under the place key (that will be fast), then count how many times the value is 1 (linear, but on a small set of data).
Keep the original dictionary for the euclidian distance computation. Hopefully you don't change the data too often in the program, because you would need to keep both dicts in sync.
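The restructuring suggested in these answers can be sketched as a single pass that tallies the entries equal to 1 per place, after which each lookup is a plain dictionary access. This is a minimal sketch with a hypothetical sample matrix and a hypothetical function name euclidian_by_place, not the asker's class.

```python
import math
from collections import Counter

# Hypothetical sample matrix keyed by (user, place).
matrix = {
    ("u1", "cafe"): 1, ("u2", "cafe"): 1, ("u3", "cafe"): 1, ("u4", "cafe"): 1,
    ("u1", "park"): 1, ("u2", "park"): 0,
}

def euclidian_by_place(matrix):
    """Map each place to sqrt(number of entries equal to 1) in one pass."""
    counts = Counter(place for (_, place), value in matrix.items() if value == 1)
    return {place: math.sqrt(n) for place, n in counts.items()}

print(euclidian_by_place(matrix))
```

Places with no entries equal to 1 are simply absent from the result, so callers may want to use .get(place, 0.0) when looking one up.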
