add items to dictionary of list - python

I have files with lines like the following, where every row has an index (a, b) followed by a list of numbers associated with it:
a\t12|123|08340|4985
b\t3856|12|276
What I want is this output:
12 a
123 a
8340 a
4985 a
3856 b
276 b
Note that I only want to output each gene number once, paired with the index of the row where it first occurs, in case the same number appears in more than one row.
I went about it in this way: by trying to add the numbers to a dictionary with the letter as keys, and the numbers as values. Finally, only outputting the set() of the numbers together with the corresponding letter.
from collections import defaultdict

uniqueval = set()
d = defaultdict(list)
for line in file:
    fields = line.strip().split("\t")
    Idx = fields[0]
    Values = fields[1].split("|")
    for Val in Values:
        uniqueval.add(Val)
        d[Idx] += Val
for u in uniqueval:
    print u,"\t", [key for key in d.keys() if u in d.values()]
The script runs, but when I look into the dictionary, the Val's are all split by character, as such:
{'a': ['1','2','1'....], 'b': ['3', '8',....]}
I don't understand why the Values get split since it's in a for loop, I thought it was going to take each Val as a new value to add to the dict. Could you help me understand this issue?
Thank you.

You are extending your lists with Val:
d[Idx] += Val
This adds each character in Val as a separate element.
Use append() instead:
d[Idx].append(Val)
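The difference between the two operations can be seen in a small sketch. The helper name first_occurrences and the sample rows are only illustrative (they mirror the two lines from the question), and the code assumes Python 3.7+, where dictionaries keep insertion order. It is one possible way to get the "first occurrence" output, not necessarily what the asker ended up using.

```python
from collections import defaultdict

d = defaultdict(list)
d["a"] += "123"        # a string is iterable, so += extends the list: ['1', '2', '3']
d["b"].append("123")   # append adds the whole string as one element: ['123']

def first_occurrences(lines):
    """Map each number to the index of the first row it appears in (hypothetical helper)."""
    seen = {}
    for line in lines:
        idx, values = line.strip().split("\t")
        for val in values.split("|"):
            # setdefault keeps the existing index if the number was seen before
            seen.setdefault(val, idx)
    return list(seen.items())

rows = ["a\t12|123|08340|4985", "b\t3856|12|276"]
pairs = first_occurrences(rows)
for number, idx in pairs:
    print(number, idx, sep="\t")
```

The numbers stay strings here, so a leading zero as in 08340 is kept; converting with int() would drop it.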


Hello, I am getting an error with this code: list index out of range

I am trying to use the key_list elements as keys and the test_list elements as values to create a new dictionary dic.
test_list = ["a",1,"b",2,"c",3,"d",4,"f",5]
key_list = ["name", "number"]
dic = {}
for i in range(0,len(test_list),2):
    for j in range(i,i+2):
        dic[key_list[j]] = test_list[j]
print(dic)
This is probably what you meant to do:
for i in range(0,len(test_list) - 1 ,2):
    for j in range(i,i+2):
        dic[key_list[j%2] + str(i//2)] = test_list[j]
Three notes: (1) key_list has only two elements, so I changed the index to j%2. (2) The inner loop reads index i+1, so the outer loop stops at len(test_list) - 1 to keep that index in range even if the list has an odd number of elements. (3) A dictionary holds only one value per key, so I appended str(i//2) to each key, because I assume you want all the values and not just the last pair.
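For reference, here is the corrected loop as a runnable sketch together with the dictionary it builds. The second part, records, is an alternative layout that is not from the answer: it assumes you may prefer one small dictionary per name/number pair instead of numbered keys.

```python
test_list = ["a", 1, "b", 2, "c", 3, "d", 4, "f", 5]
key_list = ["name", "number"]

dic = {}
for i in range(0, len(test_list) - 1, 2):
    for j in range(i, i + 2):
        # j % 2 cycles through key_list; i // 2 numbers the pairs 0, 1, 2, ...
        dic[key_list[j % 2] + str(i // 2)] = test_list[j]
print(dic)

# Alternative (an assumption about the desired shape): one dict per pair
records = [dict(zip(key_list, test_list[i:i + 2]))
           for i in range(0, len(test_list) - 1, 2)]
print(records)
```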

Matching values of list type in a Nested Dictionary with certain conditions in Python

I'm sorry if this is a stupid question as I'm new to python and coding.
Do ask if there are any doubts in understanding my problem.
I have a nested dictionary as
Nested_Dictionary = {0:{'item':['Cat','Dog'],'set':'a'},1:{'item':['Living','Dog'],'set':'b'},2:{'item':['Mouse','Cat'],'set':'c'},3:{'item':['Cat','Dog'],'set':'d'},4:{'item':['Cat','Living'],'set':'e'},5:{'item':['Mouse','Cat'],'set':'f'},6:{'item':['Living','Cat'],'set':'g'},7:{'item':['Pigeon','Living'],'set':'h'},8:{'item':['Cat','Dog'],'set':'i'},9:{'item':['Living','Dog'],'set':'j'},10:{'item':['Mouse','Cat'],'set':'k'},11:{'item':['Living','Living'],'set':'l'}}
I want to match the list values in the nested 'item' key, get their upper keys, and create a new dictionary with the 'item' list values as keys and the list of all matching upper keys as values. The catch is that if 'Living' is in one position of the 'item' list, only the other position is used to find the match; and if 'Living' is in both positions, its key is appended to all the matches.
The output should be as:
{'Cat,Dog':[0,3,8,1,9,4,11], 'Mouse,Cat':[2,5,10,6,11], 'Pigeon,Living':[7,11]}
'Pigeon,Living' is unique as Pigeon is there only once. 'Living,Living' is appended to all.
So far I have been able to match the items exactly, without handling 'Living' in a single position, and to append the keys of items with 'Living,Living' to every match, like this:
from collections import defaultdict

compari = defaultdict(list)
for k,v in Nested_Dictionary.items():
    for k1,v1 in v.items():
        if k1 == 'item':
            compari[v1[0]+','+v1[1]].append(k) #To match items and append their upper keys
compari = {key:value for key,value in compari.items()}
p=compari.get('Living,Living') #To get list of keys in 'Living,Living'
del compari['Living,Living'] #Deleting the 'Living,Living' key
for x,y in compari.items():
    compari[x].extend(p) #appending list of keys in 'Living,Living' to all elements
print(compari)
I'm getting the output as:
{'Cat,Dog': [0,3,8,11], 'Living,Dog': [1,9,11], 'Mouse,Cat': [2,5,10,11], 'Cat,Living': [4,11], 'Living,Cat': [6,11], 'Pigeon,Living': [7,11]}
I'm stuck at finding the matches where only one position of the item list is 'Living', where I have to match based on the other position. Please also suggest a better way of doing what I have already done, if there is one. Thanks.
You can use collections.defaultdict to accumulate the original dictionary keys for each match. This answer splits the input into groups based on the item content: items without 'Living' are stored directly in d (and recorded in full), and any item with at least one 'Living' goes in p:
from collections import defaultdict
Nested_Dictionary = {0:{'item':['Cat','Dog'],'set':'a'},1:{'item':['Living','Dog'],'set':'b'},2:{'item':['Mouse','Cat'],'set':'c'},3:{'item':['Cat','Dog'],'set':'d'},4:{'item':['Cat','Living'],'set':'e'},5:{'item':['Mouse','Cat'],'set':'f'},6:{'item':['Living','Cat'],'set':'g'},7:{'item':['Pigeon','Living'],'set':'h'},8:{'item':['Cat','Dog'],'set':'i'},9:{'item':['Living','Dog'],'set':'j'},10:{'item':['Mouse','Cat'],'set':'k'},11:{'item':['Living','Living'],'set':'l'}}
d, full, p = defaultdict(list), set(), {'partial':set(), 'full':set()}
#separate complete `item`s from those that have "Living"
for a, b in Nested_Dictionary.items():
    if 'Living' in b['item']:
        p['full' if sum(i == 'Living' for i in b['item']) == 2 else 'partial'].add((a, tuple(b['item'])))
    else:
        d[(x:=tuple(b['item']))].append(a)
        full.add(x)
#match `item`s that have any partial occurrences of "Living"
for v, i in p['partial']:
    if (m:=[b for b in full if any(j == k for j, k in zip(i, b))]):
        for x in m:
            d[x].append(v)
    else:
        d[i].append(v)
        full.add(i)
#update `d` with key value of any `item`s that just have "Living"
for v, i in p['full']:
    for x in full:
        d[x].append(v)
print({','.join(a):b for a, b in d.items()})
Output:
{'Cat,Dog': [0, 3, 8, 1, 4, 9, 11], 'Mouse,Cat': [2, 5, 10, 6, 11], 'Pigeon,Living': [7, 11]}
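One caveat: the answer keeps the partial items in a set, and set iteration order for tuples of strings can vary between runs. If ('Pigeon', 'Living') happens to be processed before ('Cat', 'Living'), the two can match each other on the shared 'Living' position. The following is a sketch of a variant that uses lists and only compares partial items against items that contain no 'Living' at all. The function name group_items and the wildcard parameter are hypothetical, and a shortened sample dictionary name (nested) is used for readability.

```python
from collections import defaultdict

nested = {
    0: {'item': ['Cat', 'Dog']},      1: {'item': ['Living', 'Dog']},
    2: {'item': ['Mouse', 'Cat']},    3: {'item': ['Cat', 'Dog']},
    4: {'item': ['Cat', 'Living']},   5: {'item': ['Mouse', 'Cat']},
    6: {'item': ['Living', 'Cat']},   7: {'item': ['Pigeon', 'Living']},
    8: {'item': ['Cat', 'Dog']},      9: {'item': ['Living', 'Dog']},
    10: {'item': ['Mouse', 'Cat']},   11: {'item': ['Living', 'Living']},
}

def group_items(nested, wildcard='Living'):
    groups = defaultdict(list)   # item pair -> list of outer keys
    partial, wild = [], []       # lists keep the input order
    for key, entry in nested.items():
        pair = tuple(entry['item'])
        count = pair.count(wildcard)
        if count == 0:
            groups[pair].append(key)
        elif count == len(pair):
            wild.append(key)
        else:
            partial.append((key, pair))
    # Snapshot of the pairs without any wildcard; partial items match only these
    complete = list(groups)
    for key, pair in partial:
        matches = [c for c in complete
                   if any(a == b for a, b in zip(pair, c))]
        # An unmatched partial item becomes its own group
        for target in matches or [pair]:
            groups[target].append(key)
    # Items made only of wildcards are appended to every group
    for key in wild:
        for target in groups:
            groups[target].append(key)
    return {','.join(pair): keys for pair, keys in groups.items()}

result = group_items(nested)
print(result)
```

Like the answer, this compares position by position, so ('Living', 'Cat') matches ('Mouse', 'Cat') but not ('Cat', 'Dog').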

Creating replica of data using for loop

I have a dictionary that contains 36 data items. I want to replicate each record 100 times, so the total would be 3600 records.
def createDataReplication(text_list):
    data_item = {}
    print(len(text_list))
    for k,v in text_list.iteritems():
        for i in range(0,100):
            data_item[k+str(i)] = v
    print(len(data_item))
output
36
3510
Why is it 3510 and not 3600? Am I making a mistake?
The concatenation k+str(i) produces the same string for some combinations of k and i. Dictionary keys must be unique, so each later assignment overwrites the existing entry.
I suggest you use tuple keys instead which, in addition, aligns data structure with your logic:
for k, v in text_list.iteritems():
    for i in range(100):
        data_item[(k, i)] = v
Consider that a key like '110' could be created in two ways:
k+str(i) = '1' + str(10) or
k+str(i) = '11' + str(0).
You need to replace k+str(i) with something that is guaranteed to create unique key values. One way to do that is make the key a tuple: (k, i):
data_item[k,i] = v
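The collision is easy to reproduce with a small sketch. The two-entry dictionary below is made up for illustration (the asker's actual keys are not shown), and the code uses Python 3's items() instead of iteritems().

```python
# Hypothetical data: two keys chosen so that their concatenations collide
text_list = {"1": "first", "11": "second"}

by_concat = {}
by_tuple = {}
for k, v in text_list.items():
    for i in range(100):
        by_concat[k + str(i)] = v   # '1' + '10' and '11' + '0' both give '110'
        by_tuple[(k, i)] = v        # ('1', 10) and ('11', 0) stay distinct

print(len(by_concat))  # fewer than 200 because colliding keys were overwritten
print(len(by_tuple))   # 200
```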

Iterate over dict from value to value

I have a dictionary like this :
data = {1: [u'-', u's'], 2: [u'je', u'co', u'na'], ...}
The KEY is the LENGTH of the words that belong to it. I want to call a function (that will compute the Levenshtein distance) for words that are longer than X and shorter than Y. How would I do it?
The main problem is getting the dictionary length because len() returns the number of items in the dictionary, not keys.
Now I am doing it like this:
for k in data:
    if k >= len(word)-distance and k <= len(word)+distance:
        for item in data[k]:
            if levenshtein(word, item) == distance:
                words.append(item)
return words
data.keys() will give you the keys, and you can iterate over them.
You can get a list of all words with length between X and Y with
sum((words for key, words in data.items() if X<key<Y), [])
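As a sketch of the same idea, itertools.chain.from_iterable flattens the selected lists without building an intermediate list for every addition, which sum(..., []) does. The function name words_between and the entries for keys 3 and 4 are hypothetical; the question elides the rest of the dictionary.

```python
from itertools import chain

# Keys 1 and 2 come from the question; keys 3 and 4 are made-up sample data
data = {1: ["-", "s"], 2: ["je", "co", "na"], 3: ["ale", "pro"], 4: ["jako"]}

def words_between(data, low, high):
    """Return all words whose length key satisfies low < key < high."""
    selected = (words for key, words in data.items() if low < key < high)
    return list(chain.from_iterable(selected))

candidates = words_between(data, 1, 4)
print(candidates)
```

The result can then be passed word by word to the distance function.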

Retrieve the unique keys (second dimension) of a 2 dimensional dictionary in python?

I have a 2D dictionary (named d2_dic). I know how to get the unique keys of the first dimension (they are always unique) with d2_dic.keys(). But how do I get the unique keys of the second dimension?
from collections import defaultdict
d2_dic = defaultdict(dict)
d2_dic['1']['a'] = 'Hi'
d2_dic['1']['b'] = 'there'
d2_dic['2']['a'] = '.'
To get the unique keys in the first dimension, I just need to call d2_dic.keys() (output: 1, 2).
How do I retrieve the unique keys of the second dimension?
I need an output of [a, b].
The entity d2_dic['1'] is itself a dictionary (same with d2_dic['2']). So you can use d2_dic['1'].keys() to get the keys for that dictionary. If you want a list of all the possible keys in the second dimension then you could do the following.
mykeys = []
for k in d2_dic.keys() :
    mykeys.extend(d2_dic[k].keys())
# this removes duplicates but destroys order
mykeys = list(set(mykeys))
print mykeys # ['a', 'b']
You can also do this in one line with a set comprehension, as per the comment by @vaultah: mykeys = list({x for d in d2_dic.values() for x in d.keys()}).
You have to be careful with this though, because d2_dic['2'][mykeys[1]] will result in KeyError: 'b'. You may want to wrap some of your code in try and except statements. For example:
for i in d2_dic.keys() :
    for j in mykeys :
        try :
            d2_dic[i][j]
        except KeyError :
            d2_dic[i][j] = None
        print i, j, d2_dic[i][j]
Note that these print statements won't work in Python 3, where print is a function and needs parentheses.
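A Python 3 sketch of the same approach follows. It builds the second-level keys with set().union and reads missing entries with dict.get instead of try/except; the name second_keys is only illustrative. Unlike the loop above, get does not insert None into the inner dictionaries.

```python
from collections import defaultdict

d2_dic = defaultdict(dict)
d2_dic['1']['a'] = 'Hi'
d2_dic['1']['b'] = 'there'
d2_dic['2']['a'] = '.'

# Iterating a dict yields its keys, so union() collects all second-level keys;
# sorted() gives a stable order, since sets are unordered
second_keys = sorted(set().union(*d2_dic.values()))
print(second_keys)

for outer, inner in d2_dic.items():
    for key in second_keys:
        # get() returns None for a missing key instead of raising KeyError
        print(outer, key, inner.get(key))
```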
