Parse a text file into an array in Python

A C G T
A 2 -1 -1 -1
C -1 2 -1 -1
G -1 -1 2 -1
T -1 -1 -1 2
This is a tab-separated text file, and I want to map it into a structure like the following in Python:
{'A': {'A': 91, 'C': -114, 'G': -31, 'T': -123},
'C': {'A': -114, 'C': 100, 'G': -125, 'T': -31},
'G': {'A': -31, 'C': -125, 'G': 100, 'T': -114},
'T': {'A': -123, 'C': -31, 'G': -114, 'T': 91}}
I have tried very hard but cannot figure out how to do this, as I am new to Python.
Please help.
My code so far:
seq = flines[0]
newseq = []
j = 0
while(l < 4):
    i = 2
    while(o < 4):
        newseq[i][j] = seqLine[i]
        i = i + 1;
        o = o + 1
    j = j + 1
    l = l + 1
print (seq)
print(seqLine)

I think this is what you want:
import csv
data = {}
with open('myfile.csv', 'rb') as csvfile:
    ntreader = csv.reader(csvfile, delimiter="\t", quotechar='"')
    for rowI, rowData in enumerate(ntreader):
        if rowI == 0:
            headers = rowData[1:]
        else:
            data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])}
print data
To keep things simple I use the csv module with tab as the delimiter, take the column headers from the first row, and use them to label the values in every other row.
This produces:
{'A': {'A': 2, 'C': -1, 'T': -1, 'G': -1},
'C': {'A': -1, 'C': 2, 'T': -1, 'G': -1},
'T': {'A': -1, 'C': -1, 'T': 2, 'G': -1},
'G': {'A': -1, 'C': -1, 'T': -1, 'G': 2}}
Edit:
For Python <2.7, replace the dictionary comprehension line (data[rowData[0]] = ...) above with a simple loop in the same place:
rowDict = dict()
for k, v in zip(headers, rowData[1:]):
    rowDict[k] = int(v)
data[rowData[0]] = rowDict
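The code above targets Python 2. Under Python 3 the same header-then-rows approach might look like the sketch below. It assumes the header row starts with an empty tab-separated field, and it uses an in-memory string (the hypothetical name `sample`) in place of the real file; with a file on disk you would pass `open(path, newline='')` to `csv.reader` instead.

```python
import csv
import io

# Stand-in for the tab-separated file (assumed layout: empty first header cell).
sample = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)

matrix = {}
reader = csv.reader(io.StringIO(sample), delimiter="\t")
headers = next(reader)[1:]  # column labels, skipping the empty corner cell
for row in reader:
    # row[0] is the row label; the remaining cells pair up with the headers
    matrix[row[0]] = {k: int(v) for k, v in zip(headers, row[1:])}

print(matrix)
```

Lookups then read naturally, for example `matrix['A']['C']` for the score of A against C.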

Using csv.DictReader gets you most of the way there on your own:
from csv import DictReader
reader = DictReader(open('file.csv'), delimiter='\t') # DictReader needs a file object, not a file name
#dictdata = {row['']: row for row in reader} # <-- python 2.7+ only
dictdata = dict((row[''], row) for row in reader) # <-- python 2.6 safe
Outputs:
{'A': {None: [''], '': 'A', 'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
'C': {'': 'C', 'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
'G': {'': 'G', 'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
'T': {'': 'T', 'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
Cleaning up the extraneous keys is a bit messy, because the inner dict has to be rebuilt. Replace the last line with this:
dictdata = {row['']: {key: value for key, value in row.iteritems() if key} for row in reader}
Outputs:
{'A': {'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
'C': {'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
'G': {'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
'T': {'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
Edit: for Python <2.7
Dictionary comprehensions were added in 2.7. For 2.6 and lower, use the dict constructor:
dictdata = dict((row[''], dict((key, value) for key, value in row.iteritems() if key)) for row in reader)
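For Python 3, where `iteritems()` no longer exists, the DictReader variant could be sketched roughly as follows. The in-memory `sample` string and the name `scores` are illustrative assumptions, as is the integer conversion, which the answer above does not perform.

```python
import csv
import io

# Stand-in for the tab-separated file; the empty first header cell becomes the key ''.
sample = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
# "if key" drops both the '' label column and any None key created by surplus cells.
scores = {
    row[""]: {key: int(value) for key, value in row.items() if key}
    for row in reader
}

print(scores)
```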

Related

I want to convert file data into 3d dictionary using python

I want to build this kind of dictionary by reading a file:
table = {
0: {'A': '1', 'B': '2', 'C': '3'},
1: {'A': '4', 'B': '5', 'C': '6'},
2: {'A': '7', 'B': '8', 'C': '9'}
}
Or this would be enough:
table = {
{'A': '1', 'B': '2', 'C': '3'},
{'A': '4', 'B': '5', 'C': '6'},
{'A': '7', 'B': '8', 'C': '9'}
}
I have a file, let's call it file.txt, which has data like
A B C
1 2 3
4 5 6
7 8 9
I am trying, but I didn't get the result I want. The following is my attempt;
it gives me the output {'A': '7', 'B': '8', 'C': '9'}
I know it's obvious that it will not give me a 3d dict, but I don't know how to get there.
array=[]
with open("file.txt") as f:
    for line in f:
        array = line.split()
        break #it will give me array=['A','B','C']
v=[]
dic = {}
for i in range(0,len(array)):
    for line in open("file.txt"):
        x=0
        v = line.split()
        dic[ array[i] ] = v[i]
print(dic)
You can use Pandas
# Python env: pip install pandas
# Anaconda env: conda install pandas
import pandas as pd
df = pd.read_table('file.txt', sep=' ')
table = df.to_dict('index')
print(table)
# Output
{0: {'A': 1, 'B': 2, 'C': 3},
1: {'A': 4, 'B': 5, 'C': 6},
2: {'A': 7, 'B': 8, 'C': 9}}
If you want to use just built-in modules, you can use csv.DictReader:
import csv
with open("data.csv", "r") as f_in:
    reader = csv.DictReader(f_in, delimiter=" ")
    # if the file contains floats use float(v) instead of int(v)
    # if you want the values as plain strings you can do:
    # data = list(reader)
    data = [{k: int(v) for k, v in row.items()} for row in reader]
print(data)
Prints:
[{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 5, "C": 6}, {"A": 7, "B": 8, "C": 9}]
Try to use the following code:
table = {}
with open("file.txt") as f:
    headers = next(f).split() # get the headers from the first line
    for i, line in enumerate(f):
        row = {}
        for j, value in enumerate(line.split()):
            row[headers[j]] = value
        table[i] = row
print(table)
You should get output in this format:
{
0: {'A': '1', 'B': '2', 'C': '3'},
1: {'A': '4', 'B': '5', 'C': '6'},
2: {'A': '7', 'B': '8', 'C': '9'}
}
If you only want the inner dictionaries and not the outer structure, you can use a list instead of a dictionary to store the rows:
table = []
with open("file.txt") as f:
    headers = next(f).split() # get the headers from the first line
    for line in f:
        row = {}
        for j, value in enumerate(line.split()):
            row[headers[j]] = value
        table.append(row)
print(table)
This will give you the following output:
[
{'A': '1', 'B': '2', 'C': '3'},
{'A': '4', 'B': '5', 'C': '6'},
{'A': '7', 'B': '8', 'C': '9'}
]
DictReader from the csv module will give you what you seem to need - i.e., a list of dictionaries.
import csv
with open('file.txt', newline='') as data:
    result = list(csv.DictReader(data, delimiter=' '))
print(result)
Output:
[{'A': '1', 'B': '2', 'C': '3'}, {'A': '4', 'B': '5', 'C': '6'}, {'A': '7', 'B': '8', 'C': '9'}]
Optionally:
If you have an aversion to module imports you could achieve the same objective as follows:
result = []
with open('file.txt') as data:
    columns = data.readline().strip().split()
    for line in map(str.strip, data):
        result.append(dict(zip(columns, line.split())))
print(result)
Output:
[{'A': '1', 'B': '2', 'C': '3'}, {'A': '4', 'B': '5', 'C': '6'}, {'A': '7', 'B': '8', 'C': '9'}]
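If the index-keyed form from the question is wanted rather than a list, one possible way to get there from a list of row dictionaries is `dict(enumerate(...))`. The sketch below is only an illustration: it reads from an in-memory string (the assumed name `sample`) instead of file.txt.

```python
import csv
import io

# Stand-in for file.txt from the question.
sample = "A B C\n1 2 3\n4 5 6\n7 8 9\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter=" "))

# enumerate() pairs each row with its position, and dict() turns those
# (index, row) pairs into the outer dictionary.
table = dict(enumerate(rows))

print(table)
```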

Normalization of a nested dictionary in python

I am new to Python and I have a nested dictionary whose values I want to normalize. For example:
nested_dictionary={'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
And I would like to get the normalization as
Normalized_result={'D': {'D': '0.47', 'B': '0.24', 'C': '0.00', 'A': '0.24', 'K': '0.00', 'J': '0.04'}, 'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
I have seen the example in "Normalizing dictionary values", which only handles a single dictionary, but I want to go further and handle a nested one.
I have tried to flatten the nested_dictionary and apply the normalization as
import flatdict
d = flatdict.FlatDict(nested_dictionary, delimiter='_')
dd=dict(d)
newDict = dict(zip(dd.keys(), [float(value) for value in dd.values()]))
def normalize(d, target=1.0):
    global factor
    raw = sum(d.values())
    print(raw)
    if raw==0:
        factor=0
        #print('ok')
    else:
        # print('kok')
        factor = target/raw
    return {key:value*factor for key,value in d.items()}
normalize(newDict)
And I get the result as
{'D_D': 0.2578125,
'D_B': 0.1328125,
'D_C': 0.0,
'D_A': 0.1328125,
'D_K': 0.0,
'D_J': 0.023437499999999997,
'A_A': 0.39062499999999994,
'A_K': 0.0,
'A_J': 0.06249999999999999}
But what I want is the Normalized_result as above
Thanks in advance.
nested_dictionary = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
In this example, your dict values are str type, so we need to convert to float:
nested_dictionary = dict([b, dict([a, float(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': 0.33, 'B': 0.17, 'C': 0.0, 'A': 0.17, 'K': 0.0, 'J': 0.03},
'A': {'A': 0.5, 'K': 0.0, 'J': 0.08}}
The function below is adapted from the link you provided.
It loops through the inner dictionaries, calculates the factor for each one, and updates the values in place.
for _, d in nested_dictionary.items():
    factor = 1.0/sum(d.values())
    for k in d:
        d[k] = d[k] * factor
nested_dictionary
{'D': {'D': 0.47142857142857136,
'B': 0.24285714285714285,
'C': 0.0,
'A': 0.24285714285714285,
'K': 0.0,
'J': 0.04285714285714285},
'A': {'A': 0.8620689655172414, 'K': 0.0, 'J': 0.13793103448275865}}
If you need to convert back to str, use the expression below:
nested_dictionary = dict([b, dict([a, "{:.2f}".format(x)] for a, x in y.items())] for b, y in nested_dictionary.items())
nested_dictionary
{'D': {'D': '0.47',
'B': '0.24',
'C': '0.00',
'A': '0.24',
'K': '0.00',
'J': '0.04'},
'A': {'A': '0.86', 'K': '0.00', 'J': '0.14'}}
This code would do:
def normalize(d, target=1.0):
    raw = sum(float(number) for number in d.values())
    factor = (target/raw if raw else 0)
    return {key: f'{float(value)*factor:.2f}' for key, value in d.items()}
{key: normalize(dct) for key, dct in nested_dictionary.items()}
Turn the string-values in your inner dicts into floats.
Take one of the solutions from the duplicate, for example really_safe_normalise_in_place.
Use the solution on each dict.
Example:
d = {'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'}, 'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'}}
d = {k: {kk: float(vv) for kk, vv in v.items()} for k, v in d.items()}
for v in d.values():
    really_safe_normalise_in_place(v)
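The helper from the linked duplicate is not reproduced here, so the sketch below substitutes a hypothetical stand-in, `normalise_in_place`, to make the three steps runnable end to end. It is not the linked function; it simply scales each inner dictionary so its values sum to 1 and leaves an all-zero dictionary untouched.

```python
def normalise_in_place(d, target=1.0):
    """Hypothetical stand-in: scale d's values to sum to target, skipping zero sums."""
    total = sum(d.values())
    if total == 0:
        return  # nothing sensible to scale by
    factor = target / total
    for key in d:
        d[key] *= factor


nested = {
    'D': {'D': '0.33', 'B': '0.17', 'C': '0.00', 'A': '0.17', 'K': '0.00', 'J': '0.03'},
    'A': {'A': '0.50', 'K': '0.00', 'J': '0.08'},
}

# Step 1: strings to floats.
as_floats = {outer: {k: float(v) for k, v in inner.items()} for outer, inner in nested.items()}

# Steps 2 and 3: normalise each inner dictionary separately.
for inner in as_floats.values():
    normalise_in_place(inner)

# Optional: format back to two-decimal strings, as in the question.
normalised = {outer: {k: f"{v:.2f}" for k, v in inner.items()} for outer, inner in as_floats.items()}
print(normalised)
```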

Multiple Dictionary in list Manipulation using Python

Input in Python: a list of multiple dictionaries
input = [{'a': '1', 'b':'2','c':'10'},{'a': '1', 'b':'3','c':'11'},{'a':'2','b':'19','c':'100'}]
output = [ {'1':{'b': ('2','3'),'c':('10','11')},'2':{'b':(19),'c':(100)}}]
import pandas as pd

def dic_to_output(dic, col):
    df = pd.DataFrame(dic)
    # group on the key column and collect every other column's values into lists
    df = df.groupby(col).agg(lambda x: list(x)).reset_index().set_index(col).T
    return df.to_dict()
input = [{'a': '1', 'b':'2','c':'10'},{'a': '1', 'b':'3','c':'11'},{'a':'2','b':'19','c':'100'}]
dic_to_output(input, 'a')
output:
{'1': {'b': ['2', '3'], 'c': ['10', '11']}, '2': {'b': ['19'], 'c': ['100']}}
def func(mylist):
    t = {}
    for i in mylist:  # iterate over the argument, not a global
        itr = iter(i)
        k = i[next(itr)]  # value of the first key is the grouping key
        tmp = t.get(k, {})
        for m in itr:
            n = i[m]
            if tmp.get(m) is None:
                tmp[m] = tuple()
            if n not in set(tmp[m]):
                tmp[m] += (n,)
        t[k] = tmp
    print([t])
d = [{'a': '1', 'b':'2','c':'10'},{'a': '1', 'b':'3','c':'11'},{'a':'2','b':'19','c':'100'}]
func(d)
d = [{'a': '1', 'b':'2','c':'10'},{'a': '1', 'b':'3','c':'11'}]
func(d)
d = [{'a': '1', 'd': 4, 'b':'2','c':'10'},{'a': '1', 'd': 4, 'b':'3','c':'11'}]
func(d)
d = [{'a': '1', 'd': 4, 'b':'2','c':'10'},{'a': '1', 'b':'3', 'd': 4,'c':'11'}]
func(d)
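Without pandas, the same grouping can also be sketched with `collections.defaultdict`. The function name `group_by_key` is an assumption for illustration; unlike `func` above, it takes the grouping key explicitly, collects values into lists, and does not remove duplicates.

```python
from collections import defaultdict


def group_by_key(records, key):
    """Group a list of dicts by records[i][key], collecting the other fields into lists."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        for field, value in record.items():
            if field != key:
                grouped[record[key]][field].append(value)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {k: dict(v) for k, v in grouped.items()}


records = [
    {'a': '1', 'b': '2', 'c': '10'},
    {'a': '1', 'b': '3', 'c': '11'},
    {'a': '2', 'b': '19', 'c': '100'},
]
result = group_by_key(records, 'a')
print(result)
```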

Printing a text file as a dictionary python

Text file I am working on
3:d:10:i:30
1:d:10:i:15
4:r:30
1:r:15
2:d:12:r:8
4:l:20
5:i:15
3:l:20:r:22
4:d:30
5:l:15:r:15
I am trying to print a dictionary from a text file that should come out looking like :
{1: {'d': 10, 'i': 15, 'r': 15},
2: {'d': 12, 'r': 8},
3: {'d': 10, 'i': 30, 'l': 20, 'r': 22},
4: {'d': 30, 'l': 20, 'r': 30},
5: { 'i': 15, 'l': 15, 'r': 15}}
Instead, my code overwrites the entry for each key every time it appears in the file, so only the most recent line for that key ends up as the value in the dictionary. It looks like:
{'3': {'l': '20', 'r': '22'},
'1': {'r': '15'},
'4': {'d': '30'},
'2': {'d': '12', 'r': '8'},
'5': {'l': '15', 'r': '15'}})
This is what I have so far
def read_db(file):
    d = defaultdict(dict)
    for line in open('db1.txt'):
        z = line.rstrip().split(':')
        d[z[0]]=dict( zip(z[1::2],z[2::2]))
    print(d)
I tried using += but that operator does not work on a dict. I am a bit stuck. Thanks for any help.
This is one approach using a simple iteration and dict.setdefault.
Ex:
res = {}
with open(filename) as infile: #Open file to read
    for line in infile: #Iterate Each line
        line = line.strip().split(":") #Split by colon
        res.setdefault(line.pop(0), {}).update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}}
Or using collections.defaultdict
Ex:
from collections import defaultdict
res = defaultdict(dict)
with open(filename) as infile:
    for line in infile:
        line = line.strip().split(":")
        res[line.pop(0)].update(dict(zip(line[::2], line[1::2])))
print(res)
Output:
defaultdict(<class 'dict'>,
{'1': {'d': '10', 'i': '15', 'r': '15'},
'2': {'d': '12', 'r': '8'},
'3': {'d': '10', 'i': '30', 'l': '20', 'r': '22'},
'4': {'d': '30', 'l': '20', 'r': '30'},
'5': {'i': '15', 'l': '15', 'r': '15'}})
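Both versions above keep keys and values as strings, while the output requested in the question uses integers. A minimal sketch of the same merge with integer conversion is shown below; it assumes every id and every value in the file is a valid integer, and it reads from an in-memory string (the assumed name `sample`) rather than the real file.

```python
import io

# Stand-in for the text file from the question.
sample = """3:d:10:i:30
1:d:10:i:15
4:r:30
1:r:15
2:d:12:r:8
4:l:20
5:i:15
3:l:20:r:22
4:d:30
5:l:15:r:15
"""

db = {}
for line in io.StringIO(sample):
    parts = line.strip().split(":")
    if len(parts) < 3:
        continue  # skip blank or malformed lines
    record_id = int(parts[0])
    # parts[1::2] are the letters, parts[2::2] the numbers that follow them
    pairs = {name: int(value) for name, value in zip(parts[1::2], parts[2::2])}
    # setdefault + update merges into any existing entry instead of overwriting it
    db.setdefault(record_id, {}).update(pairs)

# Sort by id for display
print(dict(sorted(db.items())))
```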

Count value in dictionary relative to another value

I have a python dictionary like the one below:
{'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
....
}
For each inner key, I want to count the number of 'NaN' values, and also how many of those 'NaN' values occur in entries where 'E' is True.
So I would like to create a dictionary like this:
{'A': {'NaN': 0, 'E': 0},
'B': {'NaN': 1, 'E': 1},
'C': {'NaN': 2, 'E': 1},
'D': {'NaN': 2, 'E': 2}}
I have this code that returns a dictionary with the count of NaN
NaNs = {}
for k,v in dict.iteritems():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i]=0
for k,v in dict.iteritems():
    for i in v:
        if v[i] == 'NaN':
            NaNs[i]+=1
print NaNs
How can I add the count of E:True to it?
Ok, why don't you try this:
dict = {'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
'John': {'A': 250, 'B': '34', 'C':98, 'D': 59, 'E': False},
'Steve': {'A': 230, 'B': '45', 'C':'NaN', 'D': 67, 'E': False},
'Louis': {'A': 220, 'B': '37', 'C':'NaN', 'D': 'Nan', 'E': True},
}
NaNs = {}
for k,v in dict.iteritems():
    for i in v:
        if i != 'E':
            NaNs[i]={'NaN': 0, 'E': 0}
for k,v in dict.iteritems():
    for i in v:
        if str(v[i]).lower() == 'nan':
            NaNs[i]['NaN']+=1
            if v['E'] == True:
                NaNs[i]['E']+=1
print NaNs
I shouldn't really be going around calling variables dict and NaNs, but I tried to change your code as little as possible.
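A Python 3 take on the same idea, with the variables renamed so the built-in `dict` is not shadowed, might look like the sketch below. The names `records` and `counts` are illustrative choices, and the counting is folded into a single pass.

```python
records = {
    'Jason': {'A': 200, 'B': 'NaN', 'C': 34, 'D': 'NaN', 'E': True},
    'John': {'A': 250, 'B': '34', 'C': 98, 'D': 59, 'E': False},
    'Steve': {'A': 230, 'B': '45', 'C': 'NaN', 'D': 67, 'E': False},
    'Louis': {'A': 220, 'B': '37', 'C': 'NaN', 'D': 'Nan', 'E': True},
}

counts = {}
for person, fields in records.items():
    for name, value in fields.items():
        if name == 'E':
            continue  # 'E' is the flag, not a column to count
        entry = counts.setdefault(name, {'NaN': 0, 'E': 0})
        # Case-insensitive match so 'NaN' and 'Nan' are both counted
        if str(value).lower() == 'nan':
            entry['NaN'] += 1
            if fields['E'] is True:
                entry['E'] += 1

print(counts)
```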
