Printing a text file as a dictionary python

Text file I am working on
3:d:10:i:30
1:d:10:i:15
4:r:30
1:r:15
2:d:12:r:8
4:l:20
5:i:15
3:l:20:r:22
4:d:30
5:l:15:r:15
I am trying to build and print a dictionary from the text file. It should come out looking like this:
{1: {'d': 10, 'i': 15, 'r': 15},
2: {'d': 12, 'r': 8},
3: {'d': 10, 'i': 30, 'l': 20, 'r': 22},
4: {'d': 30, 'l': 20, 'r': 30},
5: { 'i': 15, 'l': 15, 'r': 15}}
Instead, my code is overriding each line in the file and only keeps the most recent line as the value in the dictionary, so it looks like:
{'3': {'l': '20', 'r': '22'},
'1': {'r': '15'},
'4': {'d': '30'},
'2': {'d': '12', 'r': '8'},
'5': {'l': '15', 'r': '15'}})
This is what I have so far
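from collections import defaultdict

def read_db(file):
    d = defaultdict(dict)
    for line in open('db1.txt'):
        z = line.rstrip().split(':')
        # Plain assignment replaces whatever was stored for this key before
        d[z[0]] = dict(zip(z[1::2], z[2::2]))
    print(d)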
I tried using += but that operator does not work on a dict. I am a bit stuck. Thanks for any help.

This is one approach using a simple iteration and dict.setdefault.
Ex:
res = {}
with open(filename) as infile:  # Open file to read
    for line in infile:  # Iterate over each line
        line = line.strip().split(":")  # Split by colon
        res.setdefault(line.pop(0), {}).update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}}
Or using collections.defaultdict
Ex:
from collections import defaultdict
res = defaultdict(dict)
with open(filename) as infile:
    for line in infile:
        line = line.strip().split(":")
        res[line.pop(0)].update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
defaultdict(<class 'dict'>,
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}})
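Both outputs above keep the keys and values as strings, while the result asked for in the question uses integers. Below is a minimal sketch of the same merge with the conversion added. It assumes every record id and every value in the file is a valid integer; the name parse_records and the in-memory sample are illustrative only and not part of the original answer.

```python
from collections import defaultdict

def parse_records(lines):
    """Merge 'id:key:value[:key:value...]' lines into {id: {key: value}}."""
    res = defaultdict(dict)
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        record_id, fields = int(parts[0]), parts[1:]
        # Pair each key with the value that follows it, then merge the pairs
        # into whatever is already stored for this id.
        res[record_id].update(
            {key: int(value) for key, value in zip(fields[::2], fields[1::2])}
        )
    return dict(res)

# A few lines from the question's file, used here as sample input
sample = ["3:d:10:i:30", "1:d:10:i:15", "4:r:30", "1:r:15", "3:l:20:r:22"]
result = parse_records(sample)
print(result)
```

An open file object can be passed in place of the list, since iterating over a file also yields lines.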

Related

iterating over a list of dictionaries with a for loop

I have a variable that looks like this; it contains multiple lists, and each list holds multiple dictionaries. What I need to do now is:
combine the lists into one big list
if two dictionaries have the same key, combine them (keep one of the keys and add their values)
I know I need to use a for loop, but how do I reference the dictionaries inside a list, and how do I reference the lists stored in the variable?
I tried doing something like this:
for list in bigram_lists:
    for list1 in bigram_lists:
        list.append(list1)
It gives back the error that a dict object has no attribute append.
Help would be appreciated.

import ast
x = "[{'a': 1850}, {'b': 397}, {'c': 811}, {'d': 990}, {'e': 3198}, {'f': 605}, {'g': 435}, {'h': 1339}, {'i': 1904}, {'j': 59}, {'k': 138}, {'l': 946}, {'m': 652}, {'n': 1691}, {'o': 1813}, {'p': 510}, {'q': 13}, {'r': 1469}, {'s': 1695}, {'t': 2322}, {'u': 516}, {'v': 285}, {'w': 353}, {'x': 49}, {'y': 393}, {'z': 23}] [{'a': 3815}, {'b': 716}, {'c': 1989}, {'d': 1904}, {'e': 5429}, {'f': 908}, {'g': 836}, {'h': 1902}, {'i': 3340}, {'j': 42}, {'k': 148}, {'l': 1818}, {'m': 1156}, {'n': 3782}, {'o': 3365}, {'p': 992}, {'q': 98}, {'r': 2683}, {'s': 3125}, {'t': 3708}, {'u': 1123}, {'v': 335}, {'w': 399}, {'x': 153}, {'y': 706}, {'z': 85}] [{'a': 5087}, {'b': 823}, {'c': 1949}, {'d': 2366}, {'e': 6904}, {'f': 1322}, {'g': 1128}, {'h': 2756}, {'i': 3754}, {'j': 138}, {'k': 346}, {'l': 2709}, {'m': 1618}, {'n': 4391}, {'o': 4675}, {'p': 1321}, {'q': 74}, {'r': 3681}, {'s': 3554}, {'t': 5438}, {'u': 1658}, {'v': 519}, {'w': 1012}, {'x': 128}, {'y': 718}, {'z': 53}]"
strs = x.replace(']','],')[:-1]
strs = "[" + strs + "]"
listOfLists = ast.literal_eval(strs)
finalDict = {}
for ls in listOfLists:
    for dct in ls:
        if (list(dct.keys())[0]) in finalDict:
            finalDict[list(dct.keys())[0]] += dct[list(dct.keys())[0]]
        else:
            finalDict[list(dct.keys())[0]] = dct[list(dct.keys())[0]]
print(finalDict)
gives you
{'a': 10752, 'b': 1936, 'c': 4749, 'd': 5260, 'e': 15531, 'f': 2835, 'g': 2399, 'h': 5997, 'i': 8998, 'j': 239, 'k': 632, 'l': 5473, 'm': 3426, 'n': 9864, 'o': 9853, 'p': 2823, 'q': 185, 'r': 7833, 's': 8374, 't': 11468, 'u': 3297, 'v': 1139, 'w': 1764, 'x': 330, 'y': 1817, 'z': 161}

Working with x as a list of lists, I created a single dictionary with multiple keys, which you can split later if you want. Each key holds the sum of that key's values across all the lists:
result = {}
for sublist in x:
    for elem in sublist:
        for key, value in elem.items():
            if key not in result:
                result[key] = value
            else:
                result[key] += value
>>> print(result)
{'a': 10752, 'b': 1936, 'c': 4749, 'd': 5260, 'e': 15531, 'f': 2835, 'g': 2399, 'h': 5997, 'i': 8998, 'j': 239, 'k': 632, 'l': 5473, 'm': 3426, 'n': 9864, 'o': 9853, 'p': 2823, 'q': 185, 'r': 7833, 's': 8374, 't': 11468, 'u': 3297, 'v': 1139, 'w': 1764, 'x': 330, 'y': 1817, 'z': 161}

Having corrected the x input as a list of lists:
x = [[{'a': 1850}, {'b': 397}, {'c': 811}, {'d': 990}, {'e':
3198}, {'f': 605}, {'g': 435}, {'h': 1339}, {'i': 1904}, {'j':
59}, {'k': 138}, {'l': 946}, {'m': 652}, {'n': 1691}, {'o':
1813}, {'p': 510}, {'q': 13}, {'r': 1469}, {'s': 1695}, {'t':
2322}, {'u': 516}, {'v': 285}, {'w': 353}, {'x': 49}, {'y': 393},
{'z': 23}],
[{'a': 3815}, {'b': 716}, {'c': 1989}, {'d': 1904}, {'e': 5429},
{'f': 908}, {'g': 836}, {'h': 1902}, {'i': 3340}, {'j': 42},
{'k': 148}, {'l': 1818}, {'m': 1156}, {'n': 3782}, {'o': 3365},
{'p': 992}, {'q': 98}, {'r': 2683}, {'s': 3125}, {'t': 3708},
{'u': 1123}, {'v': 335}, {'w': 399}, {'x': 153}, {'y': 706},
{'z': 85}],
[{'a': 5087}, {'b': 823}, {'c': 1949}, {'d': 2366}, {'e': 6904},
{'f': 1322}, {'g': 1128}, {'h': 2756}, {'i': 3754}, {'j': 138},
{'k': 346}, {'l': 2709}, {'m': 1618}, {'n': 4391}, {'o': 4675},
{'p': 1321}, {'q': 74}, {'r': 3681}, {'s': 3554}, {'t': 5438},
{'u': 1658}, {'v': 519}, {'w': 1012}, {'x': 128}, {'y': 718},
{'z': 53}]]
this:
R = []
for ld in x:
    result = {}
    for d in ld:
        result.update(d)
    R.append(result)
D = dict.fromkeys(R[0].keys(), 0)
for d in R:
    for k in R[0].keys():
        D[k] += d[k]
will give you the answer you wanted.
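The answers above all add up values per key by hand. As a hedged alternative, the standard library's collections.Counter does the same addition when it is updated with a mapping. The sketch below assumes the values are numbers and that the input is already a real list of lists; the name sum_counts and the small sample are illustrative only.

```python
from collections import Counter

def sum_counts(list_of_lists):
    """Add up the values of equal keys across every dict in every sublist."""
    total = Counter()
    for sublist in list_of_lists:
        for d in sublist:
            # Counter.update adds to existing counts instead of replacing them
            total.update(d)
    return dict(total)

# Small illustrative input with the same shape as x
sample = [[{'a': 1}, {'b': 2}], [{'a': 10}, {'c': 3}], [{'b': 5}]]
combined = sum_counts(sample)
print(combined)
```

Unlike the last answer above, this does not require every sublist to contain the same set of keys.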

Normalization of a nested dictionary in python

I am new to Python and I have a nested dictionary for which I want to normalize the values of the dictionary. For example:
nested_dictionary={'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
And I would like to get the normalization as
Normalized_result={'D': {'D': '0.47', 'B': '0.24', 'C': '0.00', 'A': '0.24', 'K': '0.00', 'J': '0.04'}, 'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
I have seen the example in Normalizing dictionary values, which only handles a single dictionary, but I want to go further and handle a nested one.
I have tried to flatten nested_dictionary and apply the normalization as follows:
import flatdict
d = flatdict.FlatDict(nested_dictionary, delimiter='_')
dd=dict(d)
newDict = dict(zip(dd.keys(), [float(value) for value in dd.values()]))
def normalize(d, target=1.0):
    global factor
    raw = sum(d.values())
    print(raw)
    if raw == 0:
        factor = 0
        #print('ok')
    else:
        # print('kok')
        factor = target/raw
    return {key: value*factor for key, value in d.items()}
normalize(newDict)
And I get the result as
{'D_D': 0.2578125,
'D_B': 0.1328125,
'D_C': 0.0,
'D_A': 0.1328125,
'D_K': 0.0,
'D_J': 0.023437499999999997,
'A_A': 0.39062499999999994,
'A_K': 0.0,
'A_J': 0.06249999999999999}
But what I want is the Normalized_result as above
Thanks in advance.

nested_dictionary = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
In this example, your dict values are str type, so we need to convert to float:
nested_dictionary = dict([b, dict([a, float(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': 0.33, 'B': 0.17, 'C': 0.0, 'A': 0.17, 'K': 0.0, 'J': 0.03},
'A': {'A': 0.5, 'K': 0.0, 'J': 0.08}}
The loop below is adapted from the link you provided.
It goes through the inner dictionaries, calculates the factor for each one and updates the values in place.
for _, d in nested_dictionary.items():
    factor = 1.0/sum(d.values())
    for k in d:
        d[k] = d[k] * factor
nested_dictionary
{'D': {'D': 0.47142857142857136,
'B': 0.24285714285714285,
'C': 0.0,
'A': 0.24285714285714285,
'K': 0.0,
'J': 0.04285714285714285},
'A': {'A': 0.8620689655172414, 'K': 0.0, 'J': 0.13793103448275865}}
If you need to convert back to str, use the function below:
nested_dictionary = dict([b, dict([a, "{:.2f}".format(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': '0.47',
'B': '0.24',
'C': '0.00',
'A': '0.24',
'K': '0.00',
'J': '0.04'},
'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}

This code would do:
def normalize(d, target=1.0):
    raw = sum(float(number) for number in d.values())
    factor = (target/raw if raw else 0)
    return {key: f'{float(value)*factor:.2f}' for key, value in d.items()}
{key: normalize(dct) for key, dct in nested_dictionary.items()}

Turn the string-values in your inner dicts into floats.
Take one of the solutions from the duplicate, for example really_safe_normalise_in_place.
Use the solution on each dict.
Example:
d = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
d = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in d.items()}
for v in d.values():
    really_safe_normalise_in_place(v)
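The helper really_safe_normalise_in_place comes from the linked duplicate and is not reproduced on this page. To make the steps above runnable on their own, here is a hypothetical stand-in named normalise_in_place. It is only a sketch of the idea (scale each inner dict so its values sum to 1) and not the code from the linked answer. Leaving all-zero dicts untouched is an assumption made here.

```python
def normalise_in_place(d, target=1.0):
    """Hypothetical stand-in: scale the float values of d so they sum to target."""
    total = sum(d.values())
    if total == 0:
        return  # assumption: leave an all-zero dict unchanged
    factor = target / total
    for key in d:
        d[key] *= factor  # values change, keys do not, so iterating is safe

nested = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
          'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
# Step 1: turn the string values into floats
nested = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in nested.items()}
# Steps 2 and 3: normalise every inner dict separately
for inner in nested.values():
    normalise_in_place(inner)
print(nested)
```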

Creating Max and Min list for Rolling Sublists

I have a list that contains sublists:
[{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
I wish to work my way from left to right, finding the max for 'h' and the min for 'l' within the decreasing number of remaining sublists, including the current sublist being referenced. The result should be as follows.
[{'h': '30', 'l': '3'}, {'h': '30', 'l': '3'}, {'h': '25', 'l': '3'}, {'h': '22', 'l': '3'}, {'h': '22', 'l': '3'}]
Finding the max and min of the whole list is easy enough, but I cannot figure out the best way to "discard" the preceding sublists and only use the remaining sublists in creating the new list.

This code solves your question:
inp = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
l1 = inp[::-1]
l2 = []
max1 = int(l1[0]['h'])
min1 = int(l1[0]['l'])
for item in l1:
    max1 = int(item['h']) if int(item['h'])>max1 else max1
    min1 = int(item['l']) if int(item['l'])<min1 else min1
    l2.append({'h':str(max1),'l':str(min1)})
l2 = l2[::-1]
print(l2)
Output
[{'h': '30', 'l': '3'}, {'h': '30', 'l': '3'}, {'h': '25', 'l': '3'}, {'h': '22', 'l': '3'}, {'h': '22', 'l': '3'}]
More info
First, I reversed the input list and named it l1. Then I iterated over l1, keeping the running maximum of h in max1 and the running minimum of l in min1, and appended both to l2 on each iteration. Finally, I reversed l2.

You can use pandas for a little help. First you should cast your numbers as ints as they are strings currently.
l = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
l = [{k: int(v) for k, v in x.items()} for x in l]
Then you can convert them to a dataframe and use cummax and cummin. You'll have to reverse the order to get it how you describe then reverse that output:
import pandas as pd

df = pd.DataFrame(l).iloc[::-1]
df['h'] = df['h'].cummax()
df['l'] = df['l'].cummin()
df = df.iloc[::-1]
Use to_dict to go back to your original format:
df.to_dict('records')
[{'h': 30, 'l': 3},
{'h': 30, 'l': 3},
{'h': 25, 'l': 3},
{'h': 22, 'l': 3},
{'h': 22, 'l': 3}]

You can solve this problem in O(n).
Suppose you want to solve this only for the max value (the min value works the same way). You just need to iterate from the end of the list to the beginning, recording each new max value together with its index. For your example, this list would be:
[(22,4), (25,2), (30,1)].
Then, to produce the answer, loop a counter from 0 to 4. While the counter is less than or equal to 1, the answer is 30. After that, while the counter is less than or equal to 2, the answer is 25, and after that, while the counter is less than or equal to 4, the answer for the max value is 22.
The same theory applies to finding the minimum value.
You can see my solution code below:
data = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'}, {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
mx_list = []
mx = 0 # Suppose numbers are Natural
mn_list = []
mn = 1000000 # Suppose it is bigger than all of our numbers
for i, d in enumerate(reversed(data)):
    index = len(data) - i - 1
    print(int(d['h']))
    if int(d['h']) > mx:
        mx_list.append((int(d['h']), index))
        mx = int(d['h'])
for i, d in enumerate(reversed(data)):
    index = len(data) - i - 1
    print(int(d['l']))
    if int(d['l']) < mn:
        mn_list.append((int(d['l']), index))
        mn = int(d['l'])
answer_list = []
for i in range(len(data)):
    if i <= mx_list[len(mx_list)-1][1]:
        cur_mx = mx_list[len(mx_list)-1][0]
    else:
        mx_list.pop()
        cur_mx = mx_list[len(mx_list)-1][0]
    if i <= mn_list[len(mn_list)-1][1]:
        cur_mn = mn_list[len(mn_list)-1][0]
    else:
        mn_list.pop()
        cur_mn = mn_list[len(mn_list)-1][0]
    answer_list.append({'h': cur_mx, 'l': cur_mn})
print(answer_list)
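The right-to-left running max and min described in these answers can also be written with itertools.accumulate from the standard library. This is a compact sketch of the same idea, not a replacement for the code above. It assumes every 'h' and 'l' value is a string holding an integer; the name rolling_extremes is illustrative.

```python
from itertools import accumulate

def rolling_extremes(rows):
    """For each position, max of 'h' and min of 'l' over that row and all later rows."""
    highs = [int(r['h']) for r in rows]
    lows = [int(r['l']) for r in rows]
    # Running max/min computed from the right end, then flipped back into order
    suffix_max = list(accumulate(reversed(highs), max))[::-1]
    suffix_min = list(accumulate(reversed(lows), min))[::-1]
    return [{'h': str(h), 'l': str(l)} for h, l in zip(suffix_max, suffix_min)]

data = [{'h': '20', 'l': '9'}, {'h': '30', 'l': '20'}, {'h': '25', 'l': '7'},
        {'h': '18', 'l': '19'}, {'h': '22', 'l': '3'}]
rolled = rolling_extremes(data)
print(rolled)
```

Each list is traversed a constant number of times, so this is also linear in the length of the input.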

Count values in dict of pandas series

I have a pandas series of dicts like this:
print(df['genres'])
0 {'0': '1', '1': '4', '2': '23'}
1 {'0': '1', '1': '25', '2': '4', '3': '37'}
2 {'0': '9'}
print(type(df['genres']))
<class 'pandas.core.series.Series'>
print(type(df['genres'][0]))
<class 'dict'>
I want to count the values to get something like this:
{'1': 2, '4': 2, '9': 1, '23': 1, '25': 1, '37': 1}
I tried the following:
print(Counter(chain.from_iterable(df.genres.values)))
Counter({'0': 3, '1': 2, '2': 2, '3': 1})
print(pd.Series(df['genres']).value_counts())
{'0': '1', '1': '4', '2': '23'} 1
{'0': '1', '1': '25', '2': '4', '3': '37'} 1
{'0': '9'} 1
I think it is pretty easy for someone more experienced than me. But I really don't get it ...

Try:
pd.DataFrame(list(df.genres)).stack().value_counts().to_dict()
Output:
{'1': 2, '4': 2, '37': 1, '9': 1, '23': 1, '25': 1}
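The Counter attempt in the question counted '0', '1', '2' and '3' because iterating over a dict yields its keys. Asking each dict for its values instead gives the wanted counts without building a DataFrame. The sketch below uses a plain list of dicts as a stand-in for df['genres'], on the assumption that iterating over the Series yields the same dicts; the name count_values is illustrative.

```python
from collections import Counter
from itertools import chain

def count_values(dicts):
    """Count how often each value occurs across an iterable of dicts."""
    # d.values() is needed here: chaining the dicts themselves would count keys
    return Counter(chain.from_iterable(d.values() for d in dicts))

# Stand-in for df['genres']
genres = [{'0': '1', '1': '4', '2': '23'},
          {'0': '1', '1': '25', '2': '4', '3': '37'},
          {'0': '9'}]
counts = count_values(genres)
print(dict(counts))
```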

Parse a text file in an array python

A C G T
A 2 -1 -1 -1
C -1 2 -1 -1
G -1 -1 2 -1
T -1 -1 -1 2
This is a tab-separated text file, and I want to map it into a structure of the following form in Python:
{'A': {'A': 91, 'C': -114, 'G': -31, 'T': -123},
'C': {'A': -114, 'C': 100, 'G': -125, 'T': -31},
'G': {'A': -31, 'C': -125, 'G': 100, 'T': -114},
'T': {'A': -123, 'C': -31, 'G': -114, 'T': 91}}
I have tried very hard but I cannot figure out how to do this, as I am new to Python.
Please help.
My code so far:
seq = flines[0]
newseq = []
j = 0
while(l < 4):
    i = 2
    while(o < 4):
        newseq[i][j] = seqLine[i]
        i = i + 1;
        o = o + 1
    j = j + 1
    l = l + 1
print (seq)
print(seqLine)

I think this is what you want:
import csv
data = {}
with open('myfile.csv', 'rb') as csvfile:
    ntreader = csv.reader(csvfile, delimiter="\t", quotechar='"')
    for rowI, rowData in enumerate(ntreader):
        if rowI == 0:
            headers = rowData[1:]
        else:
            data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])}
print data
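To make life easy I use the csv module with tab as the delimiter. I grab the column headers from the first row and use them to label the values in every other row.
This produces output of the following shape (the listing shows the values as strings; with the int(v) conversion above they are integers):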
{'A ': {'A': '2', 'C': '-1', 'T': '-1 ', 'G': '-1'},
'C': {'A': '-1', 'C': '2', 'T': '-1', 'G': '-1'},
'T': {'A': '-1', 'C': '-1', 'T': '2', 'G': '-1'},
'G': {'A': '-1', 'C': '-1', 'T': '-1', 'G': '2'}}
Edit*
For Python <2.7 it should work if you replace the dictionary comprehension line (data[rowData[0]] = ...) above with a simple loop in the same place:
rowDict = dict()
for k, v in zip(headers, rowData[1:]):
    rowDict[k] = int(v)
data[rowData[0]] = rowDict

Using csv.DictReader gets you most of the way there on your own:
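import csv
# DictReader needs an open file (or another iterable of lines), not a file name
reader = csv.DictReader(open('file.csv', 'rb'), delimiter='\t')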
#dictdata = {row['']: row for row in reader} # <-- python 2.7+ only
dictdata = dict((row[''], row) for row in reader) # <-- python 2.6 safe
Outputs:
{'A': {None: [''], '': 'A', 'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
'C': {'': 'C', 'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
'G': {'': 'G', 'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
'T': {'': 'T', 'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
Cleaning up the extraneous keys got messy, and I needed to rebuild the inner dict. To do that, replace the last line with this:
dictdata = {row['']: {key: value for key, value in row.iteritems() if key} for row in reader}
Outputs:
{'A': {'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
'C': {'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
'G': {'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
'T': {'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
Edit: for Python <2.7
Dictionary comprehensions were added in 2.7. For 2.6 and lower, use the dict constructor:
dictdata = dict((row[''], dict((key, value) for key, value in row.iteritems() if key)) for row in reader)
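The answers above are written for Python 2 (print statement, 'rb' mode, iteritems). For reference, here is a hedged Python 3 sketch of the same csv.reader approach. It assumes the header row starts with an empty cell (a leading tab) and that every data cell holds an integer; the name read_matrix and the in-memory sample are illustrative only.

```python
import csv
import io

def read_matrix(fileobj):
    """Read a tab-separated score matrix into {row_label: {column_label: int}}."""
    reader = csv.reader(fileobj, delimiter="\t")
    # First row: skip the empty corner cell, keep the column labels
    headers = [h.strip() for h in next(reader)[1:]]
    matrix = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        label = row[0].strip()
        # "if h" drops the empty column produced by a trailing tab
        matrix[label] = {h: int(v) for h, v in zip(headers, row[1:]) if h}
    return matrix

# In-memory stand-in for the file from the question
sample = io.StringIO(
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)
matrix = read_matrix(sample)
print(matrix)
```

With a real file, open it in text mode with newline='' as the csv documentation recommends, for example open('file.csv', newline='').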
