I'm going through a list of individual words and creating a dictionary where the word is the key, and the index of the word is the value.
dictionary = {}
for x in wordlist:
    dictionary[x] = wordlist.index(x)
This works fine at the moment, but I want more indexes to be added when the same word is found a second or third time, etc. So if the phrase was "I am going to go to town", I would be looking to create a dictionary like this:
{'I': 0, 'am' : 1, 'going' : 2, 'to': (3, 5), 'go' : 4, 'town' : 6}
So I suppose I need lists inside the dictionary? And then to append more indexes to them? Any advice on how to accomplish this would be great!
You can do it this way:
dictionary = {}
for i, x in enumerate(wordlist):
    dictionary.setdefault(x, []).append(i)
Explanation:
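You do not need the call to index(). It is more efficient to use enumerate(): index() rescans the list from the start on every call, and it always returns the position of the first occurrence, so a repeated word would never get its later indices.
dict.setdefault() uses the first argument as the key. If that key is not present, it inserts the second argument as its value; otherwise it ignores the second argument. Either way, it returns the value now stored under the key (possibly the one just inserted).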
list.append() appends the item to the list.
You will get something like this:
{'I': [0], 'am' : [1], 'going' : [2], 'to': [3, 5], 'go' : [4], 'town' : [6]}
This uses lists instead of tuples, and uses a list even when there is only one element. I really think it is better this way, since every value then has the same type and can be appended to.
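To see what setdefault() returns at each step, here is a small sketch; the names `counts`, `first` and `second` are illustrative only.

```python
counts = {}
first = counts.setdefault("to", [])   # key missing: inserts [] and returns it
first.append(3)
second = counts.setdefault("to", [])  # key present: this new [] is ignored
second.append(5)
print(first is second)  # True: both names refer to the list stored in counts
print(counts)           # {'to': [3, 5]}
```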
UPDATE:
Shamelessly inspired by @millimoose's comment on the question (thanks!), this code is nicer and faster, because it does not build a new empty [] on every iteration that is then thrown away whenever the key already exists:
import collections
dictionary = collections.defaultdict(list)
for i, x in enumerate(wordlist):
    dictionary[x].append(i)
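If you do need the exact shape asked for in the question (a bare int for a word seen once, a tuple for a repeated word), one option is to collect lists first and convert at the end. This is a sketch; `index_words` is a hypothetical helper name. Keep in mind that mixing ints and tuples means every caller has to check the type of each value, which is one reason lists everywhere are easier to work with.

```python
import collections

def index_words(wordlist):
    """Map each word to its index, or to a tuple of indices if it repeats."""
    positions = collections.defaultdict(list)
    for i, x in enumerate(wordlist):
        positions[x].append(i)
    # Unwrap single-element lists; turn the longer ones into tuples.
    return {w: idx[0] if len(idx) == 1 else tuple(idx)
            for w, idx in positions.items()}

print(index_words("I am going to go to town".split()))
# {'I': 0, 'am': 1, 'going': 2, 'to': (3, 5), 'go': 4, 'town': 6}
```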
>>> wl = ['I', 'am', 'going', 'to', 'go', 'to', 'town']
>>> {w: [i for i, x in enumerate(wl) if x == w] for w in wl}
{'town': [6], 'I': [0], 'am': [1], 'to': [3, 5], 'going': [2], 'go': [4]}
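That comprehension scans the whole of wl once for every word, so it does O(n²) work; for long word lists the single-pass loop with collections.defaultdict is preferable. A quick sketch checking that the two produce the same mapping:

```python
import collections

wl = ['I', 'am', 'going', 'to', 'go', 'to', 'town']

# Quadratic: one full scan of wl per word (repeated words are rescanned too).
by_comprehension = {w: [i for i, x in enumerate(wl) if x == w] for w in wl}

# Linear: a single pass, appending each index as it is seen.
by_single_pass = collections.defaultdict(list)
for i, x in enumerate(wl):
    by_single_pass[x].append(i)

print(by_comprehension == dict(by_single_pass))  # True
```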
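Objects are objects, regardless of where they are: a list stored as a dictionary value is still an ordinary list, so you can create it and then append to it: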
dictionary[x] = []
...
dictionary[x].append(y)
import collections
dictionary = collections.defaultdict(list)
for i, x in enumerate(wordlist):
    dictionary[x].append(i)
A possible solution:
dictionary = {}
for i, x in enumerate(wordlist):
    if x not in dictionary:
        dictionary[x] = []
    dictionary[x].append(i)
Simple set-up: I have a list (roughly 40,000 entries) containing lists of strings (each with 2-15 elements). I want to compare all of the sublists to check if they have a common element (they share at most one). At the end, I want to create a dictionary (graph if you wish) where the index of each sublist is used as a key, and its values are the indices of the other sublists with which it shares common elements.
For example
lst = [['dam', 'aam','adm', 'ada', 'adam'], ['va','ea','ev','eva'], ['va','aa','av','ava']]
should give the following:
dic = {0: [], 1: [2], 2: [1]}
My problem is that I found a solution, but it's very computationally expensive. First, I wrote a function to compute the intersection of two lists:
def intersection(lst1, lst2):
    temp = set(lst2)
    lst3 = [value for value in lst1 if value in temp]
    return lst3
Then I would loop over all the lists to check for intersections:
dic = {}
iter_range = range(len(lst))
# loop over all lists where k != i
for i in iter_range:
    # create range that doesn't contain i
    new_range = list(iter_range)
    new_range.remove(i)
    neighbours = []  # separate name so the input list lst is not overwritten
    for k in new_range:
        # check if the lists at position i and k intersect
        if len(intersection(lst[i], lst[k])) > 0:
            neighbours.append(k)
    # fill dictionary
    dic[i] = neighbours
I know that for loops are slow, and that I'm looping over the list unnecessarily often (in the above example, I compare 1 with 2, then 2 with 1), but I don't know how to change it to make the program run faster.
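A first step that keeps the pairwise approach but addresses both points raised here: convert each sublist to a set once, and visit each unordered pair only once with itertools.combinations, recording the match in both directions. This is still O(n²) in the number of sublists; the word-to-lists index approach in the answer below avoids that. The function name `overlap_graph` is hypothetical.

```python
import itertools

def overlap_graph(lists):
    """Return {index: [indices of other sublists sharing an element]}."""
    sets = [set(sub) for sub in lists]          # build each set once
    graph = {i: [] for i in range(len(lists))}  # keep sublists with no overlap too
    for i, k in itertools.combinations(range(len(sets)), 2):  # only pairs with i < k
        if not sets[i].isdisjoint(sets[k]):     # no intermediate intersection set is built
            graph[i].append(k)
            graph[k].append(i)                  # the relation is symmetric
    return graph

lst = [['dam', 'aam', 'adm', 'ada', 'adam'], ['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']]
print(overlap_graph(lst))  # {0: [], 1: [2], 2: [1]}
```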
You can create a dict word_occurs_in that records which lists each word occurs in; for your sample that would be:
{'dam': [0], 'aam': [0], 'adm': [0], 'ada': [0], 'adam': [0],
 'va': [1, 2], 'ea': [1], 'ev': [1], 'eva': [1],
 'aa': [2], 'av': [2], 'ava': [2]}
Then you can create a new dict, let's call it result, in which you should store the final result, e.g. {0: [], 1: [2], 2: [1]} in your case.
Now, to get result from word_occurs_in, traverse the values of word_occurs_in and check whether each list has more than one element. If it does, add every other index in that list to each index's entry in result. For instance, when checking the value [1, 2] (for key 'va'), you will add 1 to the list stored under key 2 in result, and add 2 to the list stored under key 1. I hope this helps.
As I understand it, the main cost in your code comes from the nested loop over the 40K entries, which compares every pair of sublists (roughly 1.6 billion comparisons). This approach iterates over the list only once, but uses a bit more space.
Maybe I didn't explain myself sufficiently, so here is the code:
from collections import defaultdict
lst = [['dam', 'aam', 'adm', 'ada', 'adam'], ['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']]
word_occurs_in = defaultdict(list)
for idx, l in enumerate(lst):
    for i in l:
        word_occurs_in[i].append(idx)
print(word_occurs_in)
# start every index with an empty list so sublists with no overlap (like 0) still appear
result = {idx: [] for idx in range(len(lst))}
for v in word_occurs_in.values():
    if len(v) > 1:
        for j in v:
            result[j].extend([k for k in v if k != j])
print(result)
I want to do something like this:
myList = [10, 20, 30]
yourList = myList.append(40)
Unfortunately, list append does not return the modified list.
So, how can I allow append to return the new list?
See also: Why do these list operations (methods) return None, rather than the resulting list?
Don't use append but concatenation instead:
yourList = myList + [40]
This returns a new list; myList will not be affected. If you need myList to be modified as well, use .append() anyway and then assign yourList separately from (a copy of) myList.
In Python 3.5+ you can create a new list by unpacking the old one and adding the new element:
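A quick check of that behavior: concatenation builds a new list and leaves the original alone, whereas append() mutates the list in place and returns None.

```python
myList = [10, 20, 30]
yourList = myList + [40]       # new list; myList is untouched
print(myList, yourList)        # [10, 20, 30] [10, 20, 30, 40]

result = myList.append(40)     # mutates myList in place
print(result, myList)          # None [10, 20, 30, 40]
```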
a = [1,2,3]
b = [*a,4] # b = [1,2,3,4]
When you do:
myList + [40]
you actually have three lists: myList, the temporary [40], and the new list that is returned.
list.append is a method of a built-in type and therefore cannot be changed. But if you're willing to use something other than append, you could try +:
In [106]: myList = [10,20,30]
In [107]: yourList = myList + [40]
In [108]: print(myList)
[10, 20, 30]
In [109]: print(yourList)
[10, 20, 30, 40]
Of course, the downside is that a new list is created: every element is copied, so this takes O(n) time, compared with amortized O(1) for append.
Hope this helps
Try using itertools.chain(myList, [40]). That returns an iterator rather than allocating a new list. It yields all of the elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Note that the iterator can only be consumed once and does not support indexing or len().
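A sketch of what the chain() approach gives you: a lazy iterator, so you can loop over it or pass it to list(), but a second pass finds it empty.

```python
import itertools

myList = [10, 20, 30]
combined = itertools.chain(myList, [40])  # no new list is allocated here

print(list(combined))  # [10, 20, 30, 40]
print(list(combined))  # [] (the iterator is already exhausted)
print(myList)          # [10, 20, 30]
```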
Unfortunately, none of the answers here solve exactly what was asked. Here is a simple approach:
lst = [1, 2, 3]
lst.append(4) or lst # the returned value here would be the OP's `yourList`
# [1, 2, 3, 4]
One may question the real need for this; it tends to come up in things like reducing RAM usage or micro-benchmarks, which are usually not worth the effort. However, sometimes someone really is "asking what was asked" (I don't know if that is the case here), and reality is more diverse than we can know. So here is a (contrived, because out of context) usage...
Instead of doing this:
dic = {"a": [1], "b": [2], "c": [3]}
key, val = "d", 4 # <- example
if key in dic:
    dic[key].append(val)
else:
    dic[key] = [val]
dic
# {'a': [1], 'b': [2], 'c': [3], 'd': [4]}
key, val = "b", 5 # <- example
if key in dic:
    dic[key].append(val)
else:
    dic[key] = [val]
dic
# {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
One can use the OR expression above in any place an expression is needed (instead of a statement):
key, val = "d", 4 # <- example
dic[key] = dic[key].append(val) or dic[key] if key in dic else [val]
# {'a': [1], 'b': [2], 'c': [3], 'd': [4]}
key, val = "b", 5 # <- example
dic[key] = dic[key].append(val) or dic[key] if key in dic else [val]
# {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
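The one-liner relies on operator precedence: the conditional expression binds more loosely than `or`, so it parses as `(dic[key].append(val) or dic[key]) if key in dic else [val]`. Since append() always returns None, the `or` then yields the (already mutated) list. A self-contained check with the parentheses written out:

```python
dic = {"a": [1], "b": [2], "c": [3]}

for key, val in [("d", 4), ("b", 5)]:
    # Explicit parentheses show how Python groups the original expression.
    dic[key] = (dic[key].append(val) or dic[key]) if key in dic else [val]

print(dic)  # {'a': [1], 'b': [2, 5], 'c': [3], 'd': [4]}
```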
Or, equivalently, when there are no falsy values in the lists, one can try dic.get(key, <default value>) in some better way.
You can subclass the built-in list type and redefine the 'append' method. Or even better, create a new one which will do what you want it to do. Below is the code for a redefined 'append' method.
#!/usr/bin/env python
class MyList(list):
    def append(self, element):
        # return a new MyList instead of modifying self
        return MyList(self + [element])
def main():
    l = MyList()
    l1 = l.append(1)
    l2 = l1.append(2)
    l3 = l2.append(3)
    print("Original list: %s, type %s" % (l, l.__class__.__name__))
    print("List 1: %s, type %s" % (l1, l1.__class__.__name__))
    print("List 2: %s, type %s" % (l2, l2.__class__.__name__))
    print("List 3: %s, type %s" % (l3, l3.__class__.__name__))
if __name__ == '__main__':
    main()
Hope that helps.
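If subclassing feels heavy, a plain function gives the same "return the new list" behavior without changing what append() means for any code that receives the object and expects normal list semantics. A sketch; the name `appended` is hypothetical:

```python
def appended(items, element):
    """Return a new list with element added; the input list is not modified."""
    return [*items, element]

myList = [10, 20, 30]
yourList = appended(myList, 40)
print(myList)    # [10, 20, 30]
print(yourList)  # [10, 20, 30, 40]
```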
Just to expand on Storstamp's answer
You only need to do
myList.append(40)
It appends to the original list in place; you can then return the variable that holds the original list.
If you are working with very large lists, this is the way to go, since no copy is made.
You only need to do
myList.append(40)
It will append it to the original list, not return a new list.
I wrote this code to act as a simple search engine over a list of strings, like the example below:
mii(['hello world','hello','hello cat','hellolot of cats']) == {'hello': {0, 1, 2}, 'cat': {2}, 'of': {3}, 'world': {0}, 'cats': {3}, 'hellolot': {3}}
but I constantly get the error
'dict' object has no attribute 'add'
How can I fix it?
def mii(strlist):
    word={}
    index={}
    for str in strlist:
        for str2 in str.split():
            if str2 in word==False:
                word.add(str2)
                i={}
                for (n,m) in list(enumerate(strlist)):
                    k=m.split()
                    if str2 in k:
                        i.add(n)
                index.add(i)
    return { x:y for (x,y) in zip(word,index)}
In Python, when you initialize an object as word = {} you're creating a dict object and not a set object (which I assume is what you wanted). In order to create a set, use:
word = set()
You might have been confused by Python's Set Comprehension, e.g.:
myset = {e for e in [1, 2, 3, 1]}
which results in a set containing elements 1, 2 and 3. Similarly Dict Comprehension:
mydict = {k: v for k, v in [(1, 2)]}
results in a dictionary with key-value pair 1: 2.
x = [1, 2, 3] # is a literal that creates a list (mutable array).
x = [] # creates an empty list.
x = (1, 2, 3) # is a literal that creates a tuple (constant list).
x = () # creates an empty tuple.
x = {1, 2, 3} # is a literal that creates a set.
x = {} # confusingly creates an empty dictionary (hash array), NOT a set, because dictionaries were there first in python.
Use
x = set() # to create an empty set.
Also note that
x = {"first": 1, "unordered": 2, "hash": 3} # is a literal that creates a dictionary, just to mix things up.
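A quick way to confirm which type each literal produces is to ask type() directly; the last lines show why the original code failed.

```python
print(type({}))            # <class 'dict'>
print(type(set()))         # <class 'set'>
print(type({1, 2, 3}))     # <class 'set'>
print(type({"a": 1}))      # <class 'dict'>

word = set()
word.add("hello")          # works: sets have add()
# {}.add("hello") would raise AttributeError: 'dict' object has no attribute 'add'
```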
I see lots of issues in your function -
In Python, {} is an empty dictionary, not a set. To create a set, use the built-in function set().
The condition if str2 in word==False: will never be True because of comparison chaining: Python evaluates it as if str2 in word and word==False. Example showing this behavior:
>>> 'a' in 'abcd'==False
False
>>> 'a' in 'abcd'==True
False
In the line for (n,m) in list(enumerate(strlist)), you do not need to convert the result of enumerate() to a list; you can iterate over it directly, since it is already an iterator.
Sets have no order, so when you do zip(word,index) there is no guarantee that the elements are paired up the way you want.
Do not use str as a variable name.
Given all this, you are better off building the dictionary directly from the start rather than using separate sets.
Code -
def mii(strlist):
    word={}
    for i, s in enumerate(strlist):
        for s2 in s.split():
            word.setdefault(s2,set()).add(i)
    return word
Demo -
>>> def mii(strlist):
...     word={}
...     for i, s in enumerate(strlist):
...         for s2 in s.split():
...             word.setdefault(s2,set()).add(i)
...     return word
...
>>> mii(['hello world','hello','hello cat','hellolot of cats'])
{'cats': {3}, 'world': {0}, 'cat': {2}, 'hello': {0, 1, 2}, 'hellolot': {3}, 'of': {3}}
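The same function can also be written with collections.defaultdict(set), which avoids creating a throwaway set() on every call to setdefault(). A sketch (`mii_defaultdict` is a hypothetical name); converting back to a plain dict at the end keeps later lookups of missing words from silently inserting empty sets.

```python
import collections

def mii_defaultdict(strlist):
    word = collections.defaultdict(set)
    for i, s in enumerate(strlist):
        for s2 in s.split():
            word[s2].add(i)       # a missing key starts as an empty set
    return dict(word)

print(mii_defaultdict(['hello world', 'hello', 'hello cat', 'hellolot of cats']))
```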
def mii(strlist):
    word_list = {}
    for index, s in enumerate(strlist):
        for word in s.split():
            if word not in word_list:
                word_list[word] = [index]
            else:
                word_list[word].append(index)
    return word_list
print(mii(['hello world','hello','hello cat','hellolot of cats']))
Output:
{'of': [3], 'cat': [2], 'cats': [3], 'hellolot': [3], 'world': [0], 'hello': [0, 1, 2]}
I think this is what you wanted.
How do you take a pre-existing dictionary and essentially add an item from a list into the dictionary as a tuple using a for loop? I made this example below. I want to take color_dict and reformat it so that each item would be in the format 'R':['red',1].
I got as far as below, but then couldn't figure out how to do the last part.
lista = {'red':'R', 'orange':'O', 'yellow':'Y', 'green':'G',
'blue':'B', 'indigo':'I', 'violet':'V'}
color_dict = {'R':1, 'O':2, 'Y':3, 'G':4, 'B':5, 'I':6, 'V':7}
a = color_dict.keys()
color_keys = []
color_vals = []
for x in lista[0::2]:
    color_keys.append(x)
for x in lista[1::2]:
    color_vals.append(x)
new = zip(color_keys, color_vals)
new_dict = dict(new)
print new_dict
If anyone has any other suggestions, that would be great; I don't understand how to use dict comprehensions.
Basically, what you want to do is loop through the items in lista and, for each pair color: colkey, find the respective value in color_dict (indexed by colkey). Then you just need to stitch everything together: colkey: [color, color_dict[colkey]] is the new item in the new dict for each item in the lista dict.
You can use a dict comprehension to build this:
>>> new_dict = {colkey: [color, color_dict[colkey]] for color, colkey in lista.items()}
>>> new_dict
{'O': ['orange', 2], 'Y': ['yellow', 3], 'V': ['violet', 7], 'R': ['red', 1], 'G': ['green', 4], 'B': ['blue', 5], 'I': ['indigo', 6]}
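Since the question asked about a for loop, here is the same construction written out as an explicit loop; it builds the same dict as the comprehension above.

```python
lista = {'red': 'R', 'orange': 'O', 'yellow': 'Y', 'green': 'G',
         'blue': 'B', 'indigo': 'I', 'violet': 'V'}
color_dict = {'R': 1, 'O': 2, 'Y': 3, 'G': 4, 'B': 5, 'I': 6, 'V': 7}

new_dict = {}
for color, colkey in lista.items():
    # colkey ('R', 'O', ...) becomes the new key; its number comes from color_dict.
    new_dict[colkey] = [color, color_dict[colkey]]

print(new_dict['R'])  # ['red', 1]
```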
I am trying to create a list of lists based on hashes. That is, I want a list of lists of items that hash the same. Is this possible in a single-line comprehension?
Here is the simple code that works without comprehensions:
from collections import defaultdict
def list_of_lists(items):
    items_by_hash = defaultdict(list)
    for item in items:
        items_by_hash[hash(item)].append(item)
    return list(items_by_hash.values())
For example, let's say we have this simple hash function:
def hash(string):
    import builtins  # Python 3 name of the Python 2 __builtin__ module
    return builtins.hash(string) % 10
Then,
>>> l = ['sam', 'nick', 'nathan', 'mike']
>>> [hash(x) for x in l]
[4, 3, 2, 2]
>>>
>>> list_of_lists(l)
[['nathan', 'mike'], ['nick'], ['sam']]
Is there any way I could do this in a comprehension? I need to be able to reference the dictionary I'm building mid-comprehension, in order to append the next item to the list-value.
This is the best I've got, but it doesn't work:
>>> { hash(word) : [word] for word in l }.values()
[['mike'], ['nick'], ['sam']]
It obviously creates a new list every time, which is not what I want. I want something like
{ hash(word) : __this__[hash(word)] + [word] for word in l }.values()
or
>>> dict([ (hash(word), word) for word in l ])
{2: 'mike', 3: 'nick', 4: 'sam'}
but this causes the same problem.
import itertools
import operator
[[y[1] for y in x[1]] for x in itertools.groupby(
    sorted((hash(y), y) for y in items), operator.itemgetter(0))]
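groupby() only merges *consecutive* items with equal keys, which is why the input is sorted by hash first. Below is a variant of the same idea that passes a key function instead of building (hash, item) tuples. `bucket` is a hypothetical stand-in for the question's `hash(...) % 10`; because Python 3 randomizes string hashes per process, the exact grouping can differ between runs. This costs O(n log n) for the sort, versus a single pass for the defaultdict version in the question.

```python
import itertools

def bucket(item):
    # Hypothetical bucketing function, standing in for the question's hash() % 10.
    return hash(item) % 10

def list_of_lists(items):
    ordered = sorted(items, key=bucket)  # items with equal buckets become adjacent
    return [list(group) for _, group in itertools.groupby(ordered, key=bucket)]

print(list_of_lists(['sam', 'nick', 'nathan', 'mike']))
```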