I have a dictionary which contains 36 data items. I want to replicate each record 100 times, so the total should be 3600 records.
def createDataReplication(text_list):
    data_item = {}
    print(len(text_list))
    for k,v in text_list.iteritems():
        for i in range(0,100):
            data_item[k+str(i)] = v
    print(len(data_item))
Output:
36
3510
Why is it 3510 and not 3600? Am I making a mistake?
The concatenation k+str(i) produces the same string for some combinations of k and i. Dictionary keys must be unique, so each repeated key overwrites an existing entry instead of adding a new one.
I suggest you use tuple keys instead which, in addition, aligns data structure with your logic:
for k, v in text_list.iteritems():
    for i in range(100):
        data_item[(k, i)] = v
Consider that a key like '110' could be created in two ways:
k+str(i) = '1' + str(10) or
k+str(i) = '11' + str(0).
You need to replace k+str(i) with something that is guaranteed to create unique key values. One way to do that is make the key a tuple: (k, i):
data_item[k,i] = v
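To make the collision concrete, here is a minimal sketch written for Python 3 (so `items()` stands in for `iteritems()`). The three keys `'1'`, `'11'` and `'2'` are hypothetical, since the question does not show its real keys; they are chosen only because `'1' + '10'` and `'11' + '0'` both give `'110'`.

```python
# Hypothetical input: the real keys from the question are not shown.
text_list = {"1": "x", "11": "y", "2": "z"}

concat_keys = {}
tuple_keys = {}
for k, v in text_list.items():
    for i in range(100):
        concat_keys[k + str(i)] = v   # '1'+'10' and '11'+'0' both become '110'
        tuple_keys[(k, i)] = v        # ('1', 10) and ('11', 0) stay distinct

print(len(concat_keys))  # fewer than 300, because some keys were overwritten
print(len(tuple_keys))   # exactly 3 * 100
```

With these keys, ten concatenated strings (`'110'` through `'119'`) are produced twice, so ten entries are lost, while the tuple-keyed dictionary keeps all of them.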
I am trying to use the key_list elements as keys and the test_list elements as values to create a new dictionary dic.
test_list = ["a",1,"b",2,"c",3,"d",4,"f",5]
key_list = ["name", "number"]
dic = {}
for i in range(0,len(test_list),2):
    for j in range(i,i+2):
        dic[key_list[j]] = test_list[j]
print(dic)
This is probably what you meant to do:
for i in range(0,len(test_list) - 1 ,2):
    for j in range(i,i+2):
        dic[key_list[j%2] + str(i//2)] = test_list[j]
Three notes: (1) key_list has only 2 elements, so I changed the indexing to j%2. (2) Your inner loop runs up to i+1, so the outer loop has to stop one index earlier; otherwise an odd-length test_list would be read past its end. (3) A dictionary holds only one value per key, so I've added + str(i//2) to the key, because I assume you want all the values and not just the last one.
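For reference, the answer's loop can be run end to end as below (Python 3). The second variant is not from the answer: it is a sketch that assumes one small dictionary per name/number pair may be closer to what was intended, and `records` is a name introduced here for illustration.

```python
test_list = ["a", 1, "b", 2, "c", 3, "d", 4, "f", 5]
key_list = ["name", "number"]

# Variant 1: one flat dictionary with numbered keys, as in the answer.
dic = {}
for i in range(0, len(test_list) - 1, 2):
    for j in range(i, i + 2):
        dic[key_list[j % 2] + str(i // 2)] = test_list[j]

# Variant 2 (assumed intent): a list with one dictionary per pair.
records = [dict(zip(key_list, test_list[i:i + 2]))
           for i in range(0, len(test_list) - 1, 2)]

print(dic)
print(records)
```

Variant 1 gives keys such as `name0` and `number0`; variant 2 keeps the plain keys `name` and `number` and separates the pairs by list position instead.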
I have files with lines like the ones below, where every row has an index (a, b) followed by a list of numbers associated with it:
a\t12|123|08340|4985
b\t3856|12|276
What I want is to get this output:
12 a
123 a
8340 a
4985 a
3856 b
276 b
Note that I only want to output a unique set of the genes (the numbers), each with the index of its first occurrence in case the same number appears in more than one row.
I went about it in this way: by trying to add the numbers to a dictionary with the letter as keys, and the numbers as values. Finally, only outputting the set() of the numbers together with the corresponding letter.
from collections import defaultdict

uniqueval = set()
d = defaultdict(list)
for line in file:
    fields = line.strip().split("\t")
    Idx = fields[0]
    Values = fields[1].split("|")
    for Val in Values:
        uniqueval.add(Val)
        d[Idx] += Val
for u in uniqueval:
    print u,"\t", [key for key in d.keys() if u in d.values()]
The script runs, but when I look into the dictionary, the Val's are all split by character, as such:
{'a': ['1','2','1'....], 'b': ['3', '8',....]}
I don't understand why the Values get split since it's in a for loop, I thought it was going to take each Val as a new value to add to the dict. Could you help me understand this issue?
Thank you.
You are extending your lists with Val:
d[Idx] += Val
This adds each character in Val as a separate element.
Use append() instead:
d[Idx].append(Val)
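The difference can be seen in isolation in the short Python 3 sketch below. The two sample rows are copied from the question; the lowercase names `lines`, `extended`, `idx` and `val` are introduced here for illustration only.

```python
from collections import defaultdict

# Sample rows from the question (tab-separated index and |-separated values).
lines = ["a\t12|123|08340|4985", "b\t3856|12|276"]

# += on a list extends it with every item of the right-hand iterable;
# a string iterates character by character.
extended = []
extended += "123"
print(extended)

# append() adds the whole string as a single element.
d = defaultdict(list)
for line in lines:
    idx, values = line.strip().split("\t")
    for val in values.split("|"):
        d[idx].append(val)
print(dict(d))
```

In other words, `some_list += "123"` behaves like `some_list.extend("123")`, which is why the values ended up split into single characters.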
In Python I currently have a dictionary with a composite key. In this dictionary there are multiple occurrences of these keys. (The keys are comma-separated):
(A,B), (A,C), (A,B), (A,D), (C,A), (A,B), (C,A), (C,B), (C,B)
I already have something that totals the unique occurrences and counts the duplicates which gives me a print-out similar to this:
(A,B) with a count of 4, (A,C) with a count of 2, (B,C) with a count of 6, etc.
I would like to know how to code a loop that would give me the following:
Print out the first occurrence of the first part of the key and its associated values and counts.
Name: A:
Type Count
B 4
C 2
Total 6
Name: B:
Type Count
A 3
B 2
C 3
Total 8
I know I need to create a loop where the first statement = the first statement and do the following, but have no real idea how to approach/code this.
Here's a slightly slow algorithm that'll get it done:
import collections

def convert(myDict):
    keys = myDict.keys()
    answer = collections.defaultdict(dict)
    for key in keys:
        # keys are tuples, so compare their first elements
        for k in [k for k in keys if k[0] == key[0]]:
            answer[key[0]][k[1]] = myDict[k]
    return answer
Ultimately, I think what you're after is a trie.
It's a little misleading to say that your dictionary has multiple values for a given key. Python doesn't allow that. Instead, what you have are keys that are tuples. You want to unpack those tuples and rebuild a nested dictionary.
Here's how I'd do it:
import collections
# rebuild data structure
nested = collections.defaultdict(dict)
for k, v in myDict.items():
    k1, k2 = k # unpack key tuple
    nested[k1][k2] = v
# print out data in the desired format (with totals)
for k1, inner in nested.items():
    print("%s\tType\tCount" % k1)
    total = 0
    for k2, v in inner.items():
        print("\t%s\t%d" % (k2, v))
        total += v
    print("\tTotal\t%d" % total)
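As a self-contained sketch of the same idea in Python 3, the block below starts from the list of pairs shown in the question, counts duplicates with `collections.Counter`, and then nests the counts by the first part of the key. The variable names and the exact report layout are assumptions made for illustration, and the counts come from the nine pairs listed in the question rather than from its sample printout.

```python
from collections import Counter, defaultdict

# The pairs listed in the question, written as tuples.
pairs = [("A", "B"), ("A", "C"), ("A", "B"), ("A", "D"), ("C", "A"),
         ("A", "B"), ("C", "A"), ("C", "B"), ("C", "B")]

# Count how often each (name, type) pair occurs.
counts = Counter(pairs)

# Regroup the counts under the first element of each key.
nested = defaultdict(dict)
for (name, kind), count in counts.items():
    nested[name][kind] = count

# Build the report, one section per name, with a total per section.
report = []
for name, inner in nested.items():
    report.append("Name: %s" % name)
    report.append("Type\tCount")
    for kind, count in inner.items():
        report.append("%s\t%d" % (kind, count))
    report.append("Total\t%d" % sum(inner.values()))
print("\n".join(report))
```

Each pass over the data is linear in the number of keys, which avoids the nested scan of the slower version.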
I have a file with a list of paired entries (keys) that goes like this:
6416 2318
84665 88
90 2339
2624 5371
6118 6774
And I've got another file with the values to those keys:
266743 Q8IUM7
64343 H7BXU6
64343 Q9H6S1
64343 C9JB40
23301 Q8NDI1
23301 A8K930
As you can see, the same key can have more than one value. What I'm trying to do is create a dictionary by automatically creating the initial k, v pair and then appending more values for each entry that is already in the dictionary, like this:
Program finds "266743: 'Q8IUM7'", then "64343: 'H7BXU6'". And when it finds "64343: 'Q9H6S1'" it does this: "64343: ['H7BXU6', 'Q9H6S1']".
This is what I have so far:
# Create dictionary
data = {}
for line in inmap:
    value = []
    k, v = [x.strip() for x in line.split('\t')]
    data[k] = value.append(v)
    if k in data.viewkeys() == True and v in data.viewvalues() == False:
        data[k] = value.append(v)
But the if statement seems to not be working. That or having the value = [] inside the for loop. Any thoughts?
This is not a good idea. You should be using a list from the start and expand that list as you go along, not change from "string" to "list of strings" when more than one value is found for the key.
For this, you can simply use
from collections import defaultdict
data = defaultdict(list)
for line in inmap:
k, v = (x.strip() for x in line.split('\t'))
data[k].append(v)
This works because a defaultdict of type list will automatically create a key together with an empty list as its value when you try to reference a key that doesn't yet exist. Otherwise, it behaves just like a normal dictionary.
Result:
>>> data
defaultdict(<type 'list'>, {'23301': ['Q8NDI1', 'A8K930'],
'64343': ['H7BXU6', 'Q9H6S1', 'C9JB40'], '266743': ['Q8IUM7']})
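As for why the original attempt misbehaves, two Python details are worth seeing in isolation. The Python 3 sketch below uses a few key/value pairs taken from the question; `pairs`, `result` and `chained` are names introduced here, and `setdefault` is shown only as a plain-dict alternative to `defaultdict`, not as part of the answer above.

```python
# 1. list.append() changes the list in place and returns None,
#    so `data[k] = value.append(v)` stores None under k.
value = []
result = value.append("Q8IUM7")
print(result, value)

# 2. Comparison operators chain: `a in b == True` is evaluated as
#    `(a in b) and (b == True)`, which is False for any dictionary b.
chained = "x" in {"x": 1} == True
print(chained)

# Plain-dict alternative: setdefault() returns the existing list for k,
# or inserts and returns a new empty list if k is not present yet.
pairs = [("64343", "H7BXU6"), ("64343", "Q9H6S1"), ("23301", "Q8NDI1")]
data = {}
for k, v in pairs:
    data.setdefault(k, []).append(v)
print(data)
```

Because every key maps to a list from the start, no membership test is needed before appending.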
How can I take a list of values (percentages):
example = [(1,100), (1,50), (2,50), (1,100), (3,100), (2,50), (3,50)]
and return a dictionary:
example_dict = {1:250, 2:100, 3:150}
and recalculate by dividing by sum(example_dict.values())/100:
final_dict = {1:50, 2:20, 3:30}
The methods I have tried for mapping the list of values to a dictionary results in values being iterated over rather than summed.
Edit:
Since it was asked, here are some attempts (after just writing over old values) that went nowhere and demonstrate my 'noviceness' with Python:
{k: +=v if k==w[x][0] for x in range(0,len(w),1)}
invalid
for i in w[x][0] in range(0,len(w),1):
for item in r:
+=v (don't know where I was going on that one)
invalid again.
another similar one that was invalid, nothing on google, then to SO.
You could try something like this:
total = float(sum(v for k,v in example))
example_dict = {}
for k,v in example:
    example_dict[k] = example_dict.get(k, 0) + v * 100 / total
See it working online: ideone
Use the Counter class:
from collections import Counter
totals = Counter()
for k, v in example:
    totals.update({k: v})
total = sum(totals.values())
final_dict = {k: 100 * v // total for k, v in totals.items()}
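The two steps (sum per key, then rescale so the values add up to 100) can be wrapped in one function, sketched below for Python 3. The function name `normalize_percentages` is hypothetical. It uses true division, so the results are floats; keep `//` instead if whole-number percentages are wanted.

```python
from collections import Counter

def normalize_percentages(pairs):
    """Sum the values per key, then rescale so all values total 100."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value          # accumulate instead of overwriting
    grand_total = sum(totals.values())
    return {key: 100 * value / grand_total for key, value in totals.items()}

example = [(1, 100), (1, 50), (2, 50), (1, 100), (3, 100), (2, 50), (3, 50)]
final_dict = normalize_percentages(example)
print(final_dict)
```

For the list from the question, the per-key sums are 250, 100 and 150, and the grand total is 500.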