I'm trying to convert a text file containing DNA sequences into a dictionary in Python. The file is set up in two columns:
TTT F
TCT S
TAT Y
TGT C
TTC F
import os.path
if os.path.isfile("GeneticCode_2.txt"):
    f = open('GeneticCode_2.txt', 'r')
    my_dict = eval(f.read())
I'm trying to get:
my_dict = {'TTT': 'F', 'TCT': 'S', 'TAT': 'Y'}
You can pass the dict constructor an iterable of pairs (2-tuples or 2-item lists), such as the split lines of your file:
with open('GeneticCode_2.txt', 'r') as f:
    my_dict = dict(line.split() for line in f)
# works only if file only contains lines that split into exactly 2 tokens
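You can see the limitation in that comment by feeding the same expression an in-memory file. This sketch uses io.StringIO as a stand-in for GeneticCode_2.txt, purely for demonstration. A single blank line is enough to make the constructor fail, and filtering on the token count avoids the failure:

```python
import io

# In-memory stand-in for GeneticCode_2.txt (rows taken from the question).
good = io.StringIO("TTT F\nTCT S\nTAT Y\n")
codons = dict(line.split() for line in good)
print(codons)  # {'TTT': 'F', 'TCT': 'S', 'TAT': 'Y'}

# A blank line splits into [], which is not a 2-item pair, so dict() raises.
bad = io.StringIO("TTT F\n\nTCT S\n")
raised = False
try:
    dict(line.split() for line in bad)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Keeping only lines that split into exactly two tokens avoids the error.
bad.seek(0)
pairs = (line.split() for line in bad)
codons = dict(p for p in pairs if len(p) == 2)
print(codons)  # {'TTT': 'F', 'TCT': 'S'}
```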
d = {}
with open("GeneticCode_2.txt") as infile:
    for line in infile:
        k, v = line.strip().split()
        d[k] = v
This isn't the most compact way of doing it, but it is very readable.
my_dict = {}
with open('GeneticCode_2.txt') as f:
    for line in f:
        parts = line.strip().split()
        if len(parts) >= 2:  # skip blank or malformed lines
            my_dict[parts[0]] = parts[1]
I have a text file named file.txt with some numbers like the following:
1 79 8.106E-08 2.052E-08 3.837E-08
1 80 -4.766E-09 9.003E-08 4.812E-07
1 90 4.914E-08 1.563E-07 5.193E-07
2 2 9.254E-07 5.166E-06 9.723E-06
2 3 1.366E-06 -5.184E-06 7.580E-06
2 4 2.966E-06 5.979E-07 9.702E-08
2 5 5.254E-07 0.166E-02 9.723E-06
3 23 1.366E-06 -5.184E-03 7.580E-06
3 24 3.244E-03 5.239E-04 9.002E-08
I want to build a Python dictionary in which the first number in each row is the key, the second number is always ignored, and the last three numbers are the values. But a dictionary cannot contain repeated keys, so when I run my code (included at the end of the question), I get:
'1' : [ '90' '4.914E-08' '1.563E-07' '5.193E-07' ]
'2' : [ '5' '5.254E-07' '0.166E-02' '9.723E-06' ]
'3' : [ '24' '3.244E-03' '5.239E-04' '9.002E-08' ]
All the other numbers are lost, and only the last row for each key is kept as its values. I need all the numbers belonging to a key, say 1, to be appended to that key's list. For example, I need:
'1' : ['8.106E-08' '2.052E-08' '3.837E-08' '-4.766E-09' '9.003E-08' '4.812E-07' '4.914E-08' '1.563E-07' '5.193E-07']
Is it possible to do this elegantly in Python? The code I have right now is the following:
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        diction[pa[0]] = pa[1:]
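An equivalent dict comprehension has the same problem, because later rows still overwrite earlier ones:
with open('file.txt') as f:
    diction = {pa[0]: pa[1:] for pa in map(str.split, f)}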
You can use a defaultdict.
from collections import defaultdict
data = defaultdict(list)
with open("file.txt", "r") as f:
    for line in f:
        fields = line.split()
        data[fields[0]].extend(fields[2:])
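To check the defaultdict approach against the expected output in the question, here is a self-contained run on the first rows of the sample data. The file is replaced by io.StringIO for demonstration only:

```python
import io
from collections import defaultdict

# First rows of the sample data from the question; io.StringIO stands in for file.txt.
sample = io.StringIO(
    "1 79 8.106E-08 2.052E-08 3.837E-08\n"
    "1 80 -4.766E-09 9.003E-08 4.812E-07\n"
    "1 90 4.914E-08 1.563E-07 5.193E-07\n"
    "2 2 9.254E-07 5.166E-06 9.723E-06\n"
)

data = defaultdict(list)
for line in sample:
    fields = line.split()
    # fields[0] is the key, fields[1] is ignored, fields[2:] are the values
    data[fields[0]].extend(fields[2:])

print(data["1"])
# ['8.106E-08', '2.052E-08', '3.837E-08', '-4.766E-09', '9.003E-08',
#  '4.812E-07', '4.914E-08', '1.563E-07', '5.193E-07']
```

The values stay as strings, as in the question. If you need numbers, extend with map(float, fields[2:]) instead. Call dict(data) if you want a plain dictionary at the end.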
Try this:
from collections import defaultdict
diction = defaultdict(list)
with open("file.txt") as f:
    for line in f:
        key, _, *values = line.strip().split()
        diction[key].extend(values)
print(diction)
This solution requires Python 3, because extended unpacking such as a, *b = tuple1 is a syntax error in Python 2. If you are using Python 2, look at @cha0site's solution.
Make the value of each key in diction a list, and extend that list on each iteration. As your code is written now, diction[pa[0]] = pa[1:] overwrites the value in diction[pa[0]] every time the key appears, which explains the behavior you're seeing.
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        try:
            diction[pa[0]].extend(pa[2:])  # pa[2:] skips the ignored second number
        except KeyError:
            diction[pa[0]] = pa[2:]
In this code, each value of diction is a list. On each iteration, if the key already exists, its list is extended with the new values from pa. Otherwise, the key is created with those values. The result is a list of all the values for each key.
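The try/except KeyError pattern can also be written with dict.setdefault. That method returns the list already stored for a key, or inserts and returns the default if the key is missing. This is a minimal sketch that uses three of the sample rows, with io.StringIO standing in for file.txt, and skips the second column as the question asks:

```python
import io

# Three sample rows from the question; io.StringIO stands in for file.txt.
sample = io.StringIO(
    "2 2 9.254E-07 5.166E-06 9.723E-06\n"
    "2 3 1.366E-06 -5.184E-06 7.580E-06\n"
    "3 23 1.366E-06 -5.184E-03 7.580E-06\n"
)

diction = {}
for line in sample:
    pa = line.split()
    # setdefault stores [] the first time a key is seen, then returns
    # the stored list so it can be extended in place.
    diction.setdefault(pa[0], []).extend(pa[2:])

print(diction)
```

There is one small cost. The [] argument is evaluated on every call, even when the key already exists. collections.defaultdict(list) avoids this by creating the list only for missing keys.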
To do this with a plain for loop:
with open('file.txt') as f:
    return_dict = {}
    for item_list in map(str.split, f):
        if item_list[0] not in return_dict:
            return_dict[item_list[0]] = []
        return_dict[item_list[0]].extend(item_list[2:])  # skip the ignored second column
# return_dict now maps each key to all of its values
Or, more compactly, with a defaultdict, which removes the need for the membership check:
from collections import defaultdict
with open('file.txt') as f:
    return_dict = defaultdict(list)
    for item_list in map(str.split, f):
        return_dict[item_list[0]].extend(item_list[2:])
I have a text file that has word frequencies in the format:
word<space>freq
where freq is a number. I want to sort the file so that the frequencies are in descending order. To do that, I have tried the following:
Read the file into a dictionary:
kvp = {}
d = {}
with open("/home/melvyn/word_freq.txt") as myfile:
    for line in myfile:
        word, freq = line.partition(" ")[::2]
        kvp[word.strip()] = int(freq)
Sort the dictionary by values:
d = sorted(kvp.items(), key=lambda x:x[1])
Write the sorted dictionary into another text file:
with open('/home/melvyn/word_freq_sorted.txt', 'w') as f:
    json.dump(d, f)
I have the following questions:
1. Sorting is not happening. Why?
2. How can I add a newline between every key-value pair when using json.dump? Is there a cleaner way to write the dictionary contents to the text file?
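sorted() does sort, but in ascending order by default. Pass reverse=True to get the highest frequencies first. Also note that sorted() returns a list of (word, freq) tuples, not a dictionary. Instead of json.dump, write each pair to the file with file.write, formatting the strings as needed: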
kvp = {}
with open("a.txt", "r") as f:
    for line in f:
        word, freq = line.partition(" ")[::2]
        kvp[word.strip()] = int(freq)
# reverse=True puts the highest frequencies first
d = sorted(kvp.items(), key=lambda x: x[1], reverse=True)
with open("b.txt", "w") as f:
    for word, freq in d:
        f.write(f"{word} {freq}\n")
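To get descending order with a predictable order for ties, the sort key can negate the frequency and add the word as a tie-breaker. This is a minimal sketch that assumes the same word<space>freq format. io.StringIO stands in for the input file, the sample words are made up, and the output is collected in a string rather than written to b.txt:

```python
import io

# Stand-in for the word<space>freq input file (made-up sample words).
source = io.StringIO("apple 3\nbanana 7\ncherry 3\ndate 1\n")

kvp = {}
for line in source:
    word, _, freq = line.partition(" ")
    kvp[word.strip()] = int(freq)  # int() ignores the trailing newline

# Highest frequency first; words with equal frequency in alphabetical order.
ranked = sorted(kvp.items(), key=lambda item: (-item[1], item[0]))

out = io.StringIO()
for word, freq in ranked:
    out.write(f"{word} {freq}\n")

print(out.getvalue())
```

With a real file, replace out with the handle from open("b.txt", "w") and keep the same loop.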
Let's say I want to convert the following file to a dictionary where the first column holds the keys and the second column holds the values.
http://pastebin.com/29bXkYhd
The following code works for this (assume romEdges.txt is the name of the file):
f = open('romEdges.txt')
dic = {}
for l in f:
    k, v = l.split()
    if k in dic:
        dic[k].extend(v)
    else:
        dic[k] = [v]
f.close()
That works. But why doesn't the same code work for this file?
http://pastebin.com/Za0McsAM
If anyone can tell me the correct code for the 2nd text file to work as well I would appreciate it.
Thanks in advance.
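You should use append instead of extend. list.extend(v) iterates over its argument, so with a string it adds each character as a separate element. append(v) adds the whole string as one element.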
from collections import defaultdict
d = defaultdict(list)
with open("romEdges.txt") as fin:
    for line in fin:
        k, v = line.strip().split()
        d[k].append(v)
print(d)
Or use sets to prevent duplicates:
d = defaultdict(set)
with open("romEdges.txt") as fin:
    for line in fin:
        k, v = line.strip().split()
        d[k].add(v)
print(d)
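The difference between extend and append is easy to see on a single string value: list.extend iterates over its argument, and iterating over a string yields its characters. This is a small illustration with made-up ID pairs:

```python
from collections import defaultdict

extended = []
extended.extend("0100360")   # adds each character as its own element
appended = []
appended.append("0100360")   # adds the whole string as one element

print(extended)  # ['0', '1', '0', '0', '3', '6', '0']
print(appended)  # ['0100360']

# Grouping made-up (key, value) pairs with append keeps every value for a repeated key.
edges = [("0100464", "0100360"), ("0100464", "0100361"), ("0100317", "0100039")]
d = defaultdict(list)
for k, v in edges:
    d[k].append(v)
print(dict(d))  # {'0100464': ['0100360', '0100361'], '0100317': ['0100039']}
```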
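If you want to add the data to the dictionary, you can also use update, as long as you call it on dic itself and pass the extended list:
f = open('your file name')
dic = {}
for l in f:
    k, v = l.split()
    if k in dic:
        # call update on dic; dict.update({k: v}) only updates a throwaway dict
        dic.update({k: dic[k] + [v]})
    else:
        dic[k] = [v]
print(dic)
f.close()
Output posted with this answer: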
{'0100464': ['0100360'], '0100317': ['0100039'], '0100405': ['0100181'], '0100545': ['0100212'], '0100008': ['0000459'], '0100073': ['0100072'], '0100044': ['0100426'], '0100062': ['0100033'], '0100061': ['0000461'], '0100066': ['0100067'], '0100067': ['0100164'], '0100064': ['0100353'], '0100080': ['0100468'], '0100566': ['0100356'], '0100048': ['0100066'], '0100005': ['0100448'], '0100007': ['0100008'], '0100318': ['0100319'], '0100045': ['0100046'], '0100238': ['0100150'], '0100040': ['0100244'], '0100024': ['0100394'], '0100025': ['0100026'], '0100022': ['0100419'], '0100009': ['0100010'], '0100020': ['0100021'], '0100313': ['0100350'], '0100297': ['0100381'], '0100490': ['0100484'], '0100049': ['0100336'], '0100075': ['0100076'], '0100074': ['0100075'], '0100077': ['0000195'], '0100071': ['0100072'], '0100265': ['0000202'], '0100266': ['0000201'], '0100035': ['0100226'], '0100079': ['0100348'], '0100050': ['0100058'], '0100017': ['0100369'], '0100030': ['0100465'], '0100033': ['0100322'], '0100058': ['0100056'], '0100013': ['0100326'], '0100036': ['0100463'], '0100321': ['0100320'], '0100323': ['0100503'], '0100003': ['0100004'], '0100056': ['0100489'], '0100055': ['0100033'], '0100053': ['0100495'], '0100286': ['0100461'], '0100285': ['0100196'], '0100482': ['0100483']}
I have a file named report_data.csv that contains the following:
user,score
a,10
b,15
c,10
a,10
a,5
b,10
I am creating a dictionary from this file using this code:
import csv
with open('report_data.csv') as f:
    f.readline()  # Skip over the column titles
    mydict = dict(csv.reader(f, delimiter=','))
After running this code, mydict is:
mydict = {'a': '5', 'b': '10', 'c': '10'}
But I want it to be:
mydict = {'a':25,'b':25,'c':10}
In other words, whenever a key that already exists in mydict is encountered while reading a line of the file, the new value in mydict associated with that key should be the sum of the old value and the integer that appears on that line of the file. How can I do this?
The most straightforward way is to use defaultdict or Counter from the collections module.
from collections import Counter
summary = Counter()
with open('report_data.csv') as f:
    f.readline()  # skip the header row
    for line in f:
        lbl, n = line.split(",")
        summary[lbl] += int(n)  # int() ignores the trailing newline
One of the most useful features of the Counter class is the most_common() method, which plain dictionaries and defaultdict lack.
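This sketch runs on the question's data. It combines csv.reader, which also handles quoted fields, with Counter, and then calls most_common(). io.StringIO stands in for report_data.csv:

```python
import csv
import io
from collections import Counter

# Stand-in for report_data.csv, copied from the question.
report = io.StringIO("user,score\na,10\nb,15\nc,10\na,10\na,5\nb,10\n")

reader = csv.reader(report)
next(reader)  # skip the header row ("user,score")

totals = Counter()
for user, score in reader:
    totals[user] += int(score)

print(dict(totals))          # {'a': 25, 'b': 25, 'c': 10}
print(totals.most_common())  # [('a', 25), ('b', 25), ('c', 10)]
```

most_common() sorts by count in descending order. Because the sort is stable, users with equal totals keep the order in which they were first seen.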
This should work for you:
import csv
with open('report_data.csv') as f:
    f.readline()  # skip the header row
    mydict = {}
    for line in csv.reader(f, delimiter=','):
        mydict[line[0]] = mydict.get(line[0], 0) + int(line[1])
Try this:
import csv
mydict = {}
with open('report_data.csv') as f:
    f.readline()  # skip the header row
    x = csv.reader(f, delimiter=',')
    for x1 in x:
        if x1[0] in mydict:
            mydict[x1[0]] += int(x1[1])
        else:
            mydict[x1[0]] = int(x1[1])
print(mydict)
I'm trying to find out how to get certain data from a file in the easiest way possible. I have searched all over the internet but can't find anything. I want to be able to do this:
File.txt:
data1 = 1
data2 = 2
But I want to get only data1, like so:
p = open('file.txt')
f = p.get(data1)
print(f)
Any ideas? Thanks in advance.
You can do:
with open("file.txt", "r") as f:
    for line in f:
        key, val = line.split('=', 1)
        key = key.strip()
        val = val.strip()
        if key == 'data1':  # works even if data1 is not on the first line
            print(val)  # do something with the value
            break
Using map:
from operator import methodcaller
with open("file.txt", "r") as f:
    for line in f:
        # methodcaller("strip") calls .strip() with no argument, so the newline is removed too
        key, val = map(methodcaller("strip"), line.split('=', 1))
        if key == "data1":
            print(val)  # do something with the value
            break
with open("file.txt", "r") as f:
    key, val = f.readline().split('=')
    if key.strip() == 'data1':  # only checks the first line
        print(val.strip())  # do something with the value
If you know that data1 is on the first line and it's the only value you want, you can do:
with open('file.txt', 'r') as f:
    key, val = tuple(x.strip() for x in f.readline().split('='))
The generator expression strips the whitespace from each string.
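The question asks for something like p.get(data1). You can approximate that by reading every key=value line into a dictionary once and then looking keys up. In this minimal sketch, the helper name read_settings is hypothetical and io.StringIO stands in for file.txt:

```python
import io

def read_settings(lines):
    """Hypothetical helper: parse 'key = value' lines into a dict of strings."""
    settings = {}
    for line in lines:
        if "=" not in line:
            continue  # skip blank lines and lines without an '='
        key, val = line.split("=", 1)  # split on the first '=' only
        settings[key.strip()] = val.strip()
    return settings

# Stand-in for file.txt, copied from the question.
p = io.StringIO("data1 = 1\ndata2 = 2\n")
settings = read_settings(p)

print(settings.get("data1"))    # 1
print(settings.get("missing"))  # None
```

With a real file, use: with open("file.txt") as p: settings = read_settings(p). The values come back as strings, so wrap them in int() if you need numbers.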