Simple set-up: I have a list (roughly 40,000 entries) containing lists of strings (each with 2-15 elements). I want to compare all of the sublists to check if they have a common element (they share at most one). At the end, I want to create a dictionary (graph if you wish) where the index of each sublist is used as a key, and its values are the indices of the other sublists with which it shares common elements.
For example
lst = [['dam', 'aam','adm', 'ada', 'adam'], ['va','ea','ev','eva'], ['va','aa','av','ava']]
should give the following:
dic = {0: [], 1: [2], 2: [1]}
My problem is that I found a solution, but it's very computationally expensive. First, I wrote a function to compute the intersection of two lists:
def intersection(lst1, lst2):
    temp = set(lst2)
    lst3 = [value for value in lst1 if value in temp]
    return lst3
Then I would loop over all the lists to check for intersections:
dic = {}
iter_range = range(len(lst))
# loop over all pairs of lists where k != i
for i in iter_range:
    # create a range that doesn't contain i
    new_range = list(iter_range)
    new_range.remove(i)
    neighbours = []  # a separate name, so the input list lst is not overwritten
    for k in new_range:
        # check whether the lists at positions i and k intersect
        if len(intersection(lst[i], lst[k])) > 0:
            neighbours.append(k)
    # fill the dictionary
    dic[i] = neighbours
I know that for loops are slow, and that I'm looping over the list unnecessarily often (in the above example, I compare 1 with 2, then 2 with 1), but I don't know how to change it to make the program run faster.
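One way to avoid comparing each pair twice is to visit only the pairs (i, k) with i < k, for example via itertools.combinations, and record each match in both directions. A minimal sketch, assuming the sublists are converted to sets once up front; the function name build_graph_pairwise is hypothetical. It still performs n(n-1)/2 comparisons, so for 40,000 sublists it roughly halves the work but remains quadratic.

```python
from itertools import combinations


def build_graph_pairwise(lists):
    """Map each sublist index to the indices of sublists sharing an element with it."""
    sets = [set(sub) for sub in lists]          # convert each sublist to a set once
    graph = {i: [] for i in range(len(lists))}  # every index gets a key, even isolated ones
    for i, k in combinations(range(len(lists)), 2):  # each unordered pair (i < k) exactly once
        if not sets[i].isdisjoint(sets[k]):     # isdisjoint can stop at the first shared element
            graph[i].append(k)
            graph[k].append(i)                  # record the edge in both directions
    return graph


lst = [['dam', 'aam', 'adm', 'ada', 'adam'], ['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']]
print(build_graph_pairwise(lst))  # {0: [], 1: [2], 2: [1]}
```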
You can create a dict word_occurs_in that records, for each word, the indices of the lists it occurs in. For your sample, that would be:
{'dam': [0], 'aam': [0], 'adm': [0], 'ada': [0], 'adam': [0], 'va': [1, 2], 'ea': [1], 'ev': [1], 'eva': [1], 'aa': [2], 'av': [2], 'ava': [2]}
Then you can create a new dict, let's call it result, in which you should store the final result, e.g. {0: [], 1: [2], 2: [1]} in your case.
Now, to get result from word_occurs_in, traverse the values of word_occurs_in and check whether each list has more than one element. If it does, then for each index in that list, add all the other indices from the same list to that index's entry in result. For instance, when checking the value [1, 2] (for key 'va'), you will add 2 to the entry for key 1 in result and 1 to the entry for key 2. I hope this helps.
In my understanding, most of the cost in your code comes from the nested loop over the 40K entries (every list is compared with every other list). This approach iterates over the list only once, but uses a bit more memory.
Maybe I didn't explain myself sufficiently, so here is the code:
from collections import defaultdict
lst = [['dam', 'aam', 'adm', 'ada', 'adam'], ['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']]
word_occurs_in = defaultdict(list)
for idx, l in enumerate(lst):
    for i in l:
        word_occurs_in[i].append(idx)
print(word_occurs_in)
result = defaultdict(list)
for v in word_occurs_in.values():
    if len(v) > 1:
        for j in v:
            result[j].extend([k for k in v if k != j])
print(result)
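The code above only creates keys for sublists that share a word, so index 0 is absent from its output (reading result[0] still returns [] because it is a defaultdict), and if two sublists ever shared more than one word, the same neighbour would be added twice. A minimal sketch that handles both cases, assuming sets are acceptable for collecting neighbours; the function name build_graph_inverted is hypothetical. The work grows with the number of pairs that actually share a word rather than with all pairs of sublists.

```python
from collections import defaultdict


def build_graph_inverted(lists):
    """Inverted-index version: word -> sublist indices, then connect indices sharing a word."""
    word_occurs_in = defaultdict(list)
    for idx, sub in enumerate(lists):
        for word in set(sub):  # set() so a word repeated inside one sublist counts once
            word_occurs_in[word].append(idx)

    neighbours = {idx: set() for idx in range(len(lists))}  # include sublists with no neighbours
    for indices in word_occurs_in.values():
        if len(indices) > 1:
            for j in indices:
                neighbours[j].update(k for k in indices if k != j)

    # sorted lists give a deterministic, easy-to-compare result
    return {idx: sorted(ks) for idx, ks in neighbours.items()}


lst = [['dam', 'aam', 'adm', 'ada', 'adam'], ['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']]
print(build_graph_inverted(lst))  # {0: [], 1: [2], 2: [1]}
```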
Related
I know how to write something simple and slow with a loop, but I need it to run very fast at large scale.
Input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
Desired output:
d = {1: ["txt1", "txt2"], 2: "txt3"}
Is there something built into Python that makes dict() extend a key's value instead of replacing it? This is what I tried, but it keeps only the last value for each duplicate key:
dict(list(zip(lst[0], lst[1])))
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the element itself for singleton lists, one way is to add a condition that checks for it while you build the output dictionary:
out = {}
for k, v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v
Or, if lst[0] is sorted (as it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])
    if length > 1:
        out[k] = lst[1][pos:pos+length]
    else:
        out[k] = lst[1][pos]
    pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
But as @timgeb notes, it's probably not something you want: afterwards, you'll have to check the data type (list or not) every time you access a value in this dictionary, which is an unnecessary problem that you can avoid by making all values lists.
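The groupby version relies on lst[0] being sorted, or more precisely on equal keys being adjacent: itertools.groupby starts a new group every time the key changes. A small illustration with made-up keys; with unsorted input, the second group for 1 would overwrite the first entry for 1 in out. Sorting lst[0] alone would not be enough either, since lst[1] would have to be reordered the same way.

```python
from itertools import groupby

keys = [1, 2, 1]  # hypothetical unsorted keys: the two 1s are not adjacent

# groupby only merges consecutive equal keys, so 1 shows up as two separate groups
runs = [(k, len(list(g))) for k, g in groupby(keys)]
print(runs)  # [(1, 1), (2, 1), (1, 1)]

# after sorting, equal keys are adjacent and form a single group
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(keys))]
print(sorted_runs)  # [(1, 2), (2, 1)]
```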
If you're dealing with large datasets, it may be useful to add a pandas solution.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']) which I highly recommend. Having both lists and strings as possible values will result in bugs because both of those types are iterable. You'd need to remember to check the type each time you process a dict-value.
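A short illustration of that pitfall, using the mixed-type dict from the question: code that loops over every value does not raise an error, it silently treats the string as a sequence of characters.

```python
out = {1: ['txt1', 'txt2'], 2: 'txt3'}  # mixed value types, as in the question's desired output

flattened = []
for value in out.values():
    # iterating a list yields its items, but iterating a str yields its characters
    flattened.extend(value)
print(flattened)  # ['txt1', 'txt2', 't', 'x', 't', '3']
```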
You can use a defaultdict to group the strings by their corresponding key, then make a second pass through the list to extract the strings from singleton lists. Regardless of what you do, you'll need to access every element in both lists at least once, so some iteration structure is necessary (and even if you don't explicitly use iteration, whatever you use will almost definitely use iteration under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)
for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]
print(dict(result))  # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}
I have a performance problem with my latest Python script.
In essence, I have two lists, e.g.
List1:
([a,1],[b,2])
List2:
([a,3],[b,4])
In the example above I have provided two entries in each list; in reality, however, there are about 150,000.
In my current script, I take the first field [a] of an entry in List1 and loop through the entire List2 until there is a match. The two entries are then combined.
The final result would be:
([a,1,3],[b,2,4])
However, given the size of the lists this is taking forever.
Is there a way I can use the first field of List1 ([a]) to retrieve, in constant time, all entries in List2 that have [a]?
I have seen some answers online suggesting sets, but I am unsure how to implement one and use it to solve the problem above.
Any help would be appreciated.
Further example:
l1=(['abc123','hi'], ['efg456','bye']) - l1 has around 2000 tuples
l2=(['abc123','letter'],['abc123','john'],['abc123','leaf']) - l2 has around 100,000+ tuples
Output:
l3=(['abc123','hi','letter'],['abc123','hi','john'],['abc123','hi','leaf'])
Not so hard, just use a dict for list1 and a for loop for list2.
dict1 = {key1: [value1] for key1, value1 in list1}  # convert list1 to a dict
# each value is wrapped in a list so that matching values from list2 can be appended to it
for key2, value2 in list2:
    try:
        dict1[key2].append(value2)
    except KeyError:
        continue  # I'm not sure what you want to do if a key in list2 doesn't exist in list1, so just ignore it
list3 = tuple([key3, *value3] for key3, value3 in dict1.items())
print(list3)
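Note that the code above merges every matching list2 value into a single row per key (for the further example it would produce ['abc123', 'hi', 'letter', 'john', 'leaf']), whereas the desired output has one row per list2 entry. A minimal sketch producing that shape, assuming the first fields in l1 are unique (a dict keeps only the last value per key) and that l2 entries without a match in l1 should be dropped:

```python
l1 = (['abc123', 'hi'], ['efg456', 'bye'])
l2 = (['abc123', 'letter'], ['abc123', 'john'], ['abc123', 'leaf'])

lookup = dict(l1)  # first field -> second field, average O(1) lookup per key

# one output row per l2 entry whose key appears in l1; unmatched l2 entries are skipped
l3 = tuple([key, lookup[key], value] for key, value in l2 if key in lookup)
print(l3)
# (['abc123', 'hi', 'letter'], ['abc123', 'hi', 'john'], ['abc123', 'hi', 'leaf'])
```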
If your a and b values are unique, you can convert the "lists" (what you have is actually a tuple of lists, not a list of lists) into dictionaries and then merge them. For example:
l1 = (['a', 1], ['b', 2], ['c', 5])
l2 = (['a', 3], ['b', 4])
d1 = { k : [v] for [k, v] in l1 }
d2 = { k : [v] for [k, v] in l2 }
for k in d1.keys():
    d1[k] += d2.get(k, [])
print(d1)
Output:
{'a': [1, 3], 'b': [2, 4], 'c': [5]}
You can convert that dictionary back to a tuple of lists using a comprehension:
print(tuple([k, *v] for k, v in d1.items()))
Output:
(['a', 1, 3], ['b', 2, 4], ['c', 5])
So I have tried many methods to do this but could not find a working solution. I have two Python lists and I would like to join them into one big dictionary. It would go something like this:
list1 = [
    [2, "ford"],
    [4, "Ferrari"],
    [3, "Mercedes"],
    [1, "BMW"]
]
list2 = [
    [4, "mustang"],
    [3, "LaFerrari"],
    [2, "CLA"],
    [1, "M5"],
    [6, "opel"]
]
The result that I would like to have is a dictionary that looks like this:
result = {
    1: ["BMW", "M5"], 2: ["ford", "CLA"], 3: ["Mercedes", "LaFerrari"], 4: ["Ferrari", "mustang"], 6: ["opel"]
}
So it basically needs to merge these two lists based on the "key" (the element at index [0] of each inner list).
This looks like a task for collections.defaultdict. I would do:
import collections
list1 = [
[1, "ford"],
[2,"Ferrari"],
[3, "Mercedes"],
[4, "BMW"]
]
list2 = [
[1, "mustang"],
[2,"LaFerrari"],
[3,"CLA"],
[4,"M5"]
]
result = collections.defaultdict(list)
for key, value in list1:
    result[key].append(value)
for key, value in list2:
    result[key].append(value)
result = dict(result)
print(result)
Output:
{1: ['ford', 'mustang'], 2: ['Ferrari', 'LaFerrari'], 3: ['Mercedes', 'CLA'], 4: ['BMW', 'M5']}
Here I used defaultdict with list: unlike a plain dict, if you try to do something with the value under a key that does not exist yet, it first stores list() (i.e. an empty list) under that key and then performs the requested action (appending, in this case). At the end I convert it into a dict just to fulfill your requirement (create a Python dictionary).
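A small side-by-side illustration of that difference, with a made-up key name: the plain dict raises KeyError on the first append, while the defaultdict creates the empty list on demand.

```python
from collections import defaultdict

plain = {}
try:
    plain['missing'].append('x')  # a plain dict has no entry for this key yet
except KeyError as exc:
    print('KeyError:', exc)       # KeyError: 'missing'

auto = defaultdict(list)
auto['missing'].append('x')       # list() is called to create [] before appending
print(dict(auto))                 # {'missing': ['x']}
```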
Use collections.defaultdict
from collections import defaultdict
result = defaultdict(list)
for k, v in list1 + list2:
    result[k].append(v)
print(dict(result))
#{2: ['ford', 'CLA'], 4: ['Ferrari', 'mustang'], 3: ['Mercedes', 'LaFerrari'], 1: ['BMW', 'M5'], 6: ['opel']}
I am also pretty new to Python, but I think something like this should work if both lists have the same keys:
list1 = [
[1, "ford"],
[2, "Ferrari"],
[3, "Mercedes"],
[4, "BMW"]
]
list2 = [
[1, "mustang"],
[2, "LaFerrari"],
[3, "CLA"],
[4, "M5"]
]
dict1 = dict(list1)
dict2 = dict(list2)
result = {}
for key, val in dict1.items():
    result[key] = [val]
for key, val in dict2.items():
    result[key].append(val)
print(result)
Output:
{1: ['ford', 'mustang'], 2: ['Ferrari', 'LaFerrari'], 3: ['Mercedes', 'CLA'], 4: ['BMW', 'M5']}
As already mentioned, I am a newbie too, so there is probably a more "pythonic" way of doing this.
First, create a dict using the values in list1. Then update the lists in the dict with the values from list2, or create new lists for the keys in list2 which don't exist in list1:
result = {i: [j] for i, j in list1}  # create the initial dict from all values in list1
for i, j in list2:
    if i in result:
        result[i].append(j)  # add to the preexisting list for this key
    else:
        result[i] = [j]  # create a new list for this key
If the sublists can hold more than one value after the key, you can use this version, which handles the add logic in a separate function:
result = {}
def add_to_dict(d, key, val):
    if key in d:
        d[key].append(val)
    else:
        d[key] = [val]
for el in (list1 + list2):
    key, *vals = el
    for val in vals:
        add_to_dict(result, key, val)
Here, rather than assuming each sublist has only 2 elements, we unpack the first element as the key and the remaining elements into a list called vals. Then we iterate over vals and apply the same adding logic to each value.
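For example, with hypothetical sublists where some entries carry more than one value after the key, every value ends up in that key's list. The helper is repeated so the sketch runs on its own:

```python
def add_to_dict(d, key, val):
    if key in d:
        d[key].append(val)
    else:
        d[key] = [val]


list1 = [[1, "BMW", "M3"], [2, "ford"]]  # hypothetical data: some sublists have extra values
list2 = [[1, "M5"], [3, "opel", "astra"]]

result = {}
for el in list1 + list2:
    key, *vals = el  # first element is the key, the rest go into vals
    for val in vals:
        add_to_dict(result, key, val)

print(result)  # {1: ['BMW', 'M3', 'M5'], 2: ['ford'], 3: ['opel', 'astra']}
```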
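If you're sure that both lists contain the same number of items and that each item has a matching first element (1, 2, 3, 4 in your example), a dict comprehension works; building each dict once, rather than on every iteration, keeps it linear:
d1, d2 = dict(list1), dict(list2)
result = {k: [d1[k], d2[k]] for k in d1}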
I am pretty new to all of this, so this might be a newbie question, but I am looking to find the length of dictionary values and I do not know how this can be done.
So for example,
d = {'key':['hello', 'brave', 'morning', 'sunset', 'metaphysics']}
Is there a way I can find the len, i.e. the number of items, of a dictionary value?
Thanks
Sure. In this case, you'd just do:
length_key = len(d['key']) # length of the list stored at `'key'` ...
It's hard to say why you actually want this, but perhaps it would be useful to create another dict that maps the keys to the lengths of their values:
length_dict = {key: len(value) for key, value in d.items()}
length_key = length_dict['key'] # length of the list stored at `'key'` ...
Let's do some experimentation to see how we can get and interpret the lengths of list values in a dict.
Create our test dict (see list and dict comprehensions):
>>> my_dict = {x:[i for i in range(x)] for x in range(4)}
>>> my_dict
{0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2]}
Get the length of the value of a specific key:
>>> my_dict[3]
[0, 1, 2]
>>> len(my_dict[3])
3
Get a dict of the lengths of the values of each key:
>>> key_to_value_lengths = {k: len(v) for k, v in my_dict.items()}
>>> key_to_value_lengths
{0: 0, 1: 1, 2: 2, 3: 3}
>>> key_to_value_lengths[2]
2
Get the sum of the lengths of all values in the dict:
>>> [len(x) for x in my_dict.values()]
[0, 1, 2, 3]
>>> sum([len(x) for x in my_dict.values()])
6
To find all of the lengths of the values in a dictionary you can do this:
lengths = [len(v) for v in d.values()]
A common use case I have is a dictionary of numpy arrays or lists where I know they're all the same length, and I just need to know one of them (e.g. I'm plotting timeseries data and each timeseries has the same number of timesteps). I often use this:
length = len(next(iter(d.values())))
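If you're not certain the values really all have the same length, a cheap safeguard is to collect the distinct lengths in a set first. A minimal sketch with made-up data: it raises ValueError if the lengths differ, rather than silently reporting the first value's length, and also if the dict is empty.

```python
d = {'a': [1, 2, 3], 'b': [4, 5, 6]}  # hypothetical data

lengths = {len(v) for v in d.values()}  # set of distinct value lengths
if len(lengths) != 1:
    raise ValueError(f'expected exactly one common length, got {sorted(lengths)}')
length = lengths.pop()
print(length)  # 3
```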
Let the dictionary be:
d = {'key': ['value1', 'value2']}
If you know the key:
print(len(d['key']))
Otherwise:
val = [len(i) for i in d.values()]
print(val[0])
# prints the length of the first key's value, which is the length of every value if all keys have the same number of values
d = {1: 'a', 2: 'b'}
count = 0
for key in d:
    count += 1
print(count)
Output: 2
I'm going through a list of individual words and creating a dictionary where the word is the key, and the index of the word is the value.
dictionary = {}
for x in wordlist:
    dictionary[x] = wordlist.index(x)
This works fine at the moment, but I want more indexes to be added for when the same word is found a second, or third time etc. So if the phrase was "I am going to go to town", I would be looking to create a dictionary like this:
{'I': 0, 'am' : 1, 'going' : 2, 'to': (3, 5), 'go' : 4, 'town' : 6}
So I suppose I need lists inside the dictionary? And then to append more indexes to them? Any advice on how to accomplish this would be great!
You can do it this way:
dictionary = {}
for i, x in enumerate(wordlist):
    dictionary.setdefault(x, []).append(i)
Explanation:
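You do not need the call to index(): enumerate() gives you each position directly, which is more efficient (index() scans the list from the start on every call) and cleaner.
dict.setdefault() uses the first argument as the key. If the key is not found, it inserts the second argument as its value; otherwise it ignores the second argument. Then it returns the (possibly newly inserted) value.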
list.append() appends the item to the list.
You will get something like this:
{'I': [0], 'am' : [1], 'going' : [2], 'to': [3, 5], 'go' : [4], 'town' : [6]}
This uses lists instead of tuples, and a list even when there is only one element. I really think it is better this way.
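To see why index() cannot produce the repeated positions: list.index() returns only the first matching position, so the second 'to' would also be reported as 3. A quick check with the phrase from the question:

```python
wordlist = 'I am going to go to town'.split()

first = wordlist.index('to')  # index() scans from the start and stops at the first match
every = [i for i, w in enumerate(wordlist) if w == 'to']  # enumerate sees every position

print(first)  # 3
print(every)  # [3, 5]
```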
UPDATE:
Shamelessly inspired by the comment from @millimoose on the question (thanks!), this code is nicer and faster, because it does not build a lot of empty lists that are never inserted into the dictionary:
import collections
dictionary = collections.defaultdict(list)
for i, x in enumerate(wordlist):
    dictionary[x].append(i)
>>> wl = ['I', 'am', 'going', 'to', 'go', 'to', 'town']
>>> {w: [i for i, x in enumerate(wl) if x == w] for w in wl}
{'town': [6], 'I': [0], 'am': [1], 'to': [3, 5], 'going': [2], 'go': [4]}
Objects are objects, regardless of where they are.
dictionary[x] = []
...
dictionary[x].append(y)
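Put differently, the dict holds a reference to the list object, so appending through dictionary[x] changes that same list in place and no reassignment is needed. A small demonstration (same_list is just an illustrative second name for the same object):

```python
dictionary = {}
dictionary['to'] = []                 # the dict now refers to this list object
same_list = dictionary['to']          # another name for the very same object

dictionary['to'].append(3)            # mutate through the dict...
same_list.append(5)                   # ...or through the other name

print(dictionary)                     # {'to': [3, 5]}
print(same_list is dictionary['to'])  # True
```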
import collections
dictionary = collections.defaultdict(list)
for i, x in enumerate(wordlist):
    dictionary[x].append(i)
A possible solution:
dictionary = {}
for i, x in enumerate(wordlist):
    if x not in dictionary:
        dictionary[x] = []
    dictionary[x].append(i)