Get dictionary mapping values to reference ID - python

I have a list of numpy arrays like the following:
list_list = [np.array([53, 5, 2, 5, 5, 2, 1, 5, 9]), np.array([6, 4, 1,2, 53, 23, 1, 4])]
and a list of IDs for each array above:
ID = [6, 2]
How can I get a dictionary that maps each unique value in list_list to a list of the IDs of the arrays that contain it?
For example, for this very simple example, I want something like:
{53: [6, 2], 5: [6], 2: [6, 2], 1: [6, 2], etc}
My actual list_list is over 1000 lists long, with each numpy array containing around 10 million values, so efficiency in the solution is key.
I know that dict(zip(ID, list_list)) will give me a dictionary mapping each ID to all of its values, but it won't give me a mapping from each value to its IDs, which is what I want.
Thanks!

The best way to approach a problem like this is to break it into smaller steps. Describe these in a combination of English and pseudo-python as seems appropriate. You seem to have the right idea to get started with zip(ID, list_list). This creates the association between the two lists as we discussed in the comments.
So what next? Well, we want to build a dictionary with the values in list_list as keys. To do so, we need to iterate over the list returned by zip():
for id, list in zip(ID, list_list):
    pass
And then we need to iterate over the elements of list to determine the keys of the dictionary:
for id, list in zip(ID, list_list):
    for x in list:
        pass
Now we need an empty dictionary to add things to:
d = {}
for id, list in zip(ID, list_list):
    for x in list:
        pass
Next, we need to get a list for the dictionary if it exists. If it doesn't exist, we can use an empty list instead. Then append the id to the list and put it into the dictionary:
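d = {}
for id, list in zip(ID, list_list):
    for x in list:
        l = d.get(x, [])  # existing list for this value, or a new empty one
        l.append(id)  # runs once per occurrence, so a repeated value records the id repeatedly
        d[x] = l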
Notice how I describe in words what to do at each step and then translate it into Python. Breaking a problem into small steps like this is a key part of developing your skills as a programmer.

We iterate over zip(ID, list_list) and, to get only the unique elements of each array (lis), create a set from it.
Then we iterate through this set: if an element is not already present in the dictionary, we add it with a new list containing the id; if it is already present, we append the id to its list.
import numpy as np
list_list = [np.array([53, 5, 2, 5, 5, 2, 1, 5, 9]), np.array([6, 4, 1,2, 53, 23, 1, 4])]
ID = [6, 2]
dic={}
for id,lis in zip(ID,list_list):
    lis=set(lis)
    for ele in lis:
        if ele not in dic:
            dic[ele]=[id]
        else:
            dic[ele].append(id)
print(dic)
{1: [6, 2], 2: [6, 2], 5: [6], 9: [6], 53: [6, 2], 4: [2], 6: [2], 23: [2]}
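A possible variation on the loop above, offered as a sketch rather than a tuned solution. The function name values_to_ids is made up for illustration. It assumes each input is either a NumPy array (whose tolist() method returns plain Python numbers, so the dictionary keys are ordinary ints) or a plain list, and it uses collections.defaultdict to drop the if/else branch. The distinct values are still visited in a Python-level loop, so for arrays of around 10 million values it may well not be fast enough on its own.

```python
from collections import defaultdict

def values_to_ids(ids, arrays):
    """Map each distinct value to the list of ids whose array contains it."""
    mapping = defaultdict(list)
    for ref_id, arr in zip(ids, arrays):
        # NumPy arrays expose tolist(), which yields plain Python numbers;
        # plain lists and tuples are used as they are.
        values = arr.tolist() if hasattr(arr, "tolist") else arr
        for value in set(values):  # set() so each id is recorded once per value
            mapping[value].append(ref_id)
    return dict(mapping)

ID = [6, 2]
# Plain lists stand in for the NumPy arrays here to keep the sketch dependency-free.
list_list = [[53, 5, 2, 5, 5, 2, 1, 5, 9], [6, 4, 1, 2, 53, 23, 1, 4]]
result = values_to_ids(ID, list_list)
print(result[53], result[5])  # [6, 2] [6]
```

Because the ids are processed in the order given, each value's list keeps the order of ID.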

Related

Adding Two Lists into One Dictionary (Python)

I have two lists, one containing a list of keys, and another containing the values to be assigned to each key, chronologically, by key.
For example;
key_list = ['cat', 'dog', 'salamander']
value_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
I'm looking to make a quick method that takes these two lists, and from it can spit out a dictionary that looks like this:
key_value_pairs = {
'cat': [1, 4, 7],
'dog': [2, 5, 8],
'salamander': [3, 6, 9]
}
Regardless of the length of value_list, I'm looking for a way to iterate through each value and append it to a dictionary containing one entry for each item in key_list. Any ideas?
key_value_pairs = {k: [v for v_i, v in enumerate(value_list) if v_i % len(key_list) == k_i] for k_i, k in enumerate(key_list)}
Edit: that's a fun one-liner, but it has worse time complexity than the following solution, which doesn't use any nested loops:
lists = [[] for _ in key_list]
for i, v in enumerate(value_list):
    lists[i % len(key_list)].append(v)
key_value_pairs = dict(zip(key_list, lists))
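Another way to express the same round-robin assignment is with extended slicing, shown here as a minimal sketch. The function name round_robin_dict is hypothetical, and the sketch assumes value_list is a sequence that supports slicing, such as a list.

```python
def round_robin_dict(key_list, value_list):
    """Give key number i every len(key_list)-th value, starting at index i."""
    step = len(key_list)
    # value_list[i::step] picks indices i, i + step, i + 2 * step, ...
    return {key: value_list[i::step] for i, key in enumerate(key_list)}

key_list = ['cat', 'dog', 'salamander']
value_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
key_value_pairs = round_robin_dict(key_list, value_list)
print(key_value_pairs)  # {'cat': [1, 4, 7], 'dog': [2, 5, 8], 'salamander': [3, 6, 9]}
```

Each slice walks only its own share of value_list, so the total work stays proportional to the number of values.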

Populate Python dictionaries with pre-assigned keys and values of a particular length

Say for example I have the following dictionary in Python:
memory_map = {'data': [1,2,3], 'theta': [4,5,6,7]}
I would like to create another random dictionary that is identical to memory_map, has the same keys and the same lengths of lists as their values however the elements of the list are populated with random values using the np.random.default_rng(seed).uniform(low, high, size) function.
An example would be: random_dict = {'data': [5,3,1], 'theta': [7,3,4,8]}.
Moreover, if the names of the keys or the lengths of the lists in values change, this should automatically be reflected in the random_dict that is created.
If I add a new key or remove a key from the memory_map this should also be reflected in the random_dict.
So far, I have random_dict = {item: [] for item in list(memory_map.keys())} but am unsure of how to populate the empty list with the random values of the same length.
Thanks.
Looks like you want something like this.
import random
import itertools
def shuffle_map(input_map):
    # Grab a list of all values
    all_values = list(itertools.chain(*input_map.values()))
    random.shuffle(all_values)  # Shuffle it
    new_map = {}  # Initialize a new map
    i = 0  # Keep track of how many items we've consumed
    for key, value in input_map.items():
        n = len(value)  # How many values per this key
        new_map[key] = all_values[i : i + n]  # Assign a slice
        i += n
    return new_map
memory_map = {"data": [1, 2, 3], "theta": [4, 5, 6, 7]}
print(shuffle_map(memory_map))
This prints out (e.g., consecutive runs)
{'data': [1, 5, 7], 'theta': [2, 4, 3, 6]}
{'data': [6, 7, 1], 'theta': [5, 4, 3, 2]}
{'data': [5, 2, 3], 'theta': [7, 6, 4, 1]}
For the random lists, take a look at the random module in the standard library, specifically random.sample or random.choices, depending on your needs.
For the second request, automatically updating the dictionary based on changes to the first, the easiest way is to create a wrapper around the first dict that inherits from the collections.UserDict class.
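If the goal is fresh random numbers rather than a reshuffle of the existing ones, a sketch along the following lines may be closer to what the question describes. The function name random_like and its parameters are invented for illustration. To stay dependency-free it swaps in the standard library's random.Random(seed).uniform(low, high) for the NumPy generator named in the question; with NumPy available, a single np.random.default_rng(seed).uniform(low, high, size) call per key would fill the same role.

```python
import random

def random_like(memory_map, low, high, seed=None):
    """Build a dict with the same keys and list lengths, filled with uniform draws."""
    rng = random.Random(seed)  # private generator, so the seed leaves global state alone
    return {
        key: [rng.uniform(low, high) for _ in values]  # one draw per existing element
        for key, values in memory_map.items()
    }

memory_map = {"data": [1, 2, 3], "theta": [4, 5, 6, 7]}
random_dict = random_like(memory_map, low=0.0, high=10.0, seed=42)
print({key: len(values) for key, values in random_dict.items()})  # {'data': 3, 'theta': 4}
```

The result is rebuilt from memory_map on every call, so added or removed keys and changed list lengths show up the next time the function is called; it does not track changes automatically in between.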

Can a loop reference a list without naming it, within a global frame?

I have been tasked with grouping a list by frequency. This is a very common question on SOF and so far the forum has been very educational. However, of all the examples given, only one follows these parameters:
Sort the given iterable so that its elements end up in the decreasing frequency order.
If two elements have the same frequency, they should end up in the same order as the first appearance in the iterable.
Using these two lists:
[4, 6, 2, 2, 6, 4, 4, 4]
[17, 99, 42]
The following common snippets given as solutions to this question have failed.
from collections import Counter
freq = Counter(items)
# Ex 1
# The items don't stay grouped in the final list :(
sorted(items, key = items.count, reverse=True)
sorted(items, key=lambda x: -freq[x])
[4, 4, 4, 4, 6, 2, 2, 6]
# Ex 2
# The order that the items appear in the list gets rearranged :(
sorted(sorted(items), key=freq.get, reverse=True)
[4, 4, 4, 4, 2, 2, 6, 6]
# Ex 3
# With a list of integers, after the quantity gets sorted,
# the int value gets sorted :(
sorted(items, key=lambda x: (freq[x], x), reverse=True)
[99, 42, 17]
I did find a solution that works great though:
s_list = sorted(freq, key=freq.get, reverse=True)
new_list = []
for num in s_list:
    for rep in range(freq[num]):
        new_list.append(num)
print(new_list)
I can't figure out how the second loop references the number of occurrences though.
I ran the process through pythontutor to visualize it and the code seems to simply know that there are four "4", two "6" and two "2" in the 'items' list. The only solution I can think of is that python can reference a list in a global frame without it being named. Or perhaps being able to utilize the value from the "freq" dictionary. Is this correct?
referenced thread:
Sort list by frequency in python
Yes, the values of freq are the ones making the second loop work.
freq is a Counter:
It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values.
In other words, freq is a dictionary whose keys are the unique elements of items, each mapped to the number of times it appears in items.
And to illustrate your example:
>>> from collections import Counter
>>> items = [4, 6, 2, 2, 6, 4, 4, 4]
>>> freq = Counter(items)
>>> freq
Counter({4: 4, 6: 2, 2: 2})
So when range(freq[num]) is iterated over in your second loop, it simply iterates as many times as num appeared in items.
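To make the role of the stored counts explicit, the working solution from the question can be wrapped up roughly as follows. This is a sketch: the function name frequency_sort is made up, and it assumes Python 3.7 or later, where Counter (being a dict subclass) keeps its keys in order of first appearance.

```python
from collections import Counter

def frequency_sort(items):
    """Group equal items, most frequent first; ties keep first-appearance order."""
    freq = Counter(items)  # keys are stored in order of first appearance
    # sorted() is stable, also with reverse=True, so equally frequent keys
    # stay in the order in which Counter first saw them.
    ordered_keys = sorted(freq, key=freq.get, reverse=True)
    # freq[key] is the stored count, so each key is repeated that many times.
    return [key for key in ordered_keys for _ in range(freq[key])]

print(frequency_sort([4, 6, 2, 2, 6, 4, 4, 4]))  # [4, 4, 4, 4, 6, 6, 2, 2]
print(frequency_sort([17, 99, 42]))  # [17, 99, 42]
```

Nothing in the function refers back to the original list once freq has been built; the counts alone drive the inner repetition.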
Edit 2019-02-13: Additional information and example for Python Tutor
It looks like Python Tutor represents simple built-in types (integers, strings, ...) as-is, and not as "objects" in their own cell.
You can see the references clearly if you use new objects instead of integers. For instance, if you were to wrap the integers like this:
from collections import Counter
class MyIntWrapper:
    def __init__(self, value):
        self.value = value
items = [4, 6, 2, 2, 6, 4, 4, 4]
items_wrapped = [MyIntWrapper(item) for item in items]
freq = Counter(items_wrapped)
s_list = sorted(freq, key=freq.get, reverse=True)
new_list = []
for num in s_list:
    for rep in range(freq[num]):
        new_list.append(num)

Best algorithm to get same element index of a list of object in python

I have a list of objects, each of which has a key and a value. What is the best algorithm to get the lists of indices of the elements that share the same key?
This is the data:
parentElement = [
    {"key":"K","val":"sr"},
    {"key":"L","val":"sw"},
    {"key":"M","val":"se"},
    {"key":"M","val":"ss"},
    {"key":"M","val":"sq"},
    {"key":"K","val":"sf"},
    {"key":"L","val":"sv"},
    {"key":"M","val":"sf"},
    {"key":"K","val":"sv"},
]
and I want to get output something like:
{0: [5, 8], 1: [6], 2: [3, 4, 7]} or [[0,5,8],[1,6],[2,3,4,7]]
I have tried to write a script and it works, but it may not be good enough to handle a lot of data, and it takes a lot of time:
childKey = "key"
parentElement = [
    {"key":"K","val":"sr"},
    {"key":"L","val":"sw"},
    {"key":"M","val":"se"},
    {"key":"M","val":"ss"},
    {"key":"M","val":"sq"},
    {"key":"K","val":"sf"},
    {"key":"L","val":"sv"},
    {"key":"M","val":"sf"},
    {"key":"K","val":"sv"},
]
identicalElementIndex = {}
for pid1, p1 in enumerate(parentElement):
    for pid2, p2 in enumerate(parentElement):
        if p1 and p2:
            # Compare indices by value (!=) rather than by identity (is not)
            if p1[childKey] == p2[childKey] and pid1 != pid2:
                flag = True
                for ieIndex in identicalElementIndex.values():
                    if pid1 in ieIndex: flag = False
                if flag:
                    # "not in" replaces dict.has_key(), which was removed in Python 3
                    if pid1 not in identicalElementIndex:
                        identicalElementIndex.update({pid1:[pid2]})
                    else:
                        identicalElementIndex[pid1].append(pid2)
        elif not p1: parentElement.pop(pid1)
        elif not p2: parentElement.pop(pid2)
print(identicalElementIndex)
Can anyone suggest a better and faster way to do this?
Thank you
Your algorithm is O(n^2) since it uses two nested loops: for every step of the outer loop, you loop through the entire list again in the inner loop. This means the running time grows quadratically as the list length increases.
It is also unnecessary. Instead, you can loop through once and just note which key value has been seen at each list index.
This works efficiently in a single pass:
from collections import OrderedDict
od=OrderedDict()
for i, ld in enumerate(parentElement):
    od.setdefault(ld['key'], []).append(i)
>>> od
OrderedDict([('K', [0, 5, 8]), ('L', [1, 6]), ('M', [2, 3, 4, 7])])
>>> list(od.values())
[[0, 5, 8], [1, 6], [2, 3, 4, 7]]
Since we are using an ordered dict, which maintains insertion order, no post-sorting is required to create a list in the order of the original list. (From Python 3.7 on, a plain dict also preserves insertion order, so OrderedDict is not strictly necessary.)
Or, if you want the alternate representation:
>>> {l[0]:l[1:] for l in od.values()}
{0: [5, 8], 1: [6], 2: [3, 4, 7]}
With this form, you do not need to use an Ordered Dict since the order does not matter for the mapping of the first element to the remainder elements.
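For reference, the same single-pass idea with a plain dict could look like the sketch below. The function name group_indices and its field parameter are invented for illustration, and the sketch assumes Python 3.7 or later so that the groups come out in order of first appearance.

```python
def group_indices(records, field):
    """Map each distinct records[i][field] to the list of indices i where it occurs."""
    groups = {}  # plain dict: keeps insertion order on Python 3.7+
    for index, record in enumerate(records):
        groups.setdefault(record[field], []).append(index)
    return groups

parentElement = [
    {"key": "K", "val": "sr"},
    {"key": "L", "val": "sw"},
    {"key": "M", "val": "se"},
    {"key": "M", "val": "ss"},
    {"key": "M", "val": "sq"},
    {"key": "K", "val": "sf"},
    {"key": "L", "val": "sv"},
    {"key": "M", "val": "sf"},
    {"key": "K", "val": "sv"},
]
groups = group_indices(parentElement, "key")
print(list(groups.values()))  # [[0, 5, 8], [1, 6], [2, 3, 4, 7]]
# First index of each group mapped to the remaining indices:
print({idx[0]: idx[1:] for idx in groups.values()})  # {0: [5, 8], 1: [6], 2: [3, 4, 7]}
```

Each record is visited exactly once, so the work grows linearly with the length of the list.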

How to find 2 items in a list that are the same in Python

I have a list populated with ~100 names. The names in this list either occur once or twice and I would like to go through the list to find names that occur twice and names that only occur once. I will also need the position of the reoccurring names in the list and the positions of the names that only appear once.
I'm not sure how I would go about doing this because all the methods I can think of are inefficient as they would go through the whole list even if they have already found a match. Other methods that I can think of would return two duplicate positions. The names that occur twice will not necessarily be adjacent to each other.
For example, if this was the list:
mylist = [ 1, 2, 3, 1, 4, 4, 5, 6]
I would need something that outputs (something like):
[[0,3],[1],[2],[4,5],[6],[7]]
With those numbers being the positions of the duplicate names and the position of the names that occur once.
I am by no means an expert so any help would be appreciated.
You can use enumerate to get pairs of each element's index and the element itself, then loop over them and store the items as keys and their indices as values, using a collections.OrderedDict (to preserve the order) and the dict.setdefault method:
>>> from collections import OrderedDict
>>> d=OrderedDict()
>>> for i,j in enumerate(mylist):
...     d.setdefault(j,[]).append(i)
...
>>> list(d.values())
[[0, 3], [1], [2], [4, 5], [6], [7]]
I would use a dictionary:
mylist = [1,2,3,1,4,4,5,6]
dic = {}
for i in range(0,len(mylist)):
    if mylist[i] in dic:
        dic[mylist[i]].append(i)
    else:
        dic[mylist[i]] = [i]
print(list(dic.values()))
# prints [[0, 3], [1], [2], [4, 5], [6], [7]]
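Since the question also asks to tell the names that occur twice apart from those that occur once, the position lists can be split in a second step. The following is a minimal sketch: the function name split_by_occurrence is hypothetical, and it assumes the names are hashable (strings, numbers) so they can serve as dictionary keys.

```python
def split_by_occurrence(names):
    """Return (once, repeated): dicts mapping each name to its list of positions."""
    positions = {}
    for index, name in enumerate(names):  # single pass over the list
        positions.setdefault(name, []).append(index)
    once = {name: idx for name, idx in positions.items() if len(idx) == 1}
    repeated = {name: idx for name, idx in positions.items() if len(idx) > 1}
    return once, repeated

mylist = [1, 2, 3, 1, 4, 4, 5, 6]
once, repeated = split_by_occurrence(mylist)
print(repeated)  # {1: [0, 3], 4: [4, 5]}
print(once)  # {2: [1], 3: [2], 5: [6], 6: [7]}
```

The list is scanned once to collect positions, and the split afterwards only touches the distinct names.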
